• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Paul Clapham
  • Jeanne Boyarsky
  • Ron McLeod
  • Tim Cooke
Sheriffs:
  • Devaka Cooray
  • paul wheaton
  • Mark Herschberg
Saloon Keepers:
  • Tim Moores
  • Tim Holloway
  • Stephan van Hulst
  • Frits Walraven
  • Jj Roberts
Bartenders:
  • Carey Brown
  • salvin francis
  • Piet Souris

Compare rows within a group in a dataframe

 
Greenhorn
Posts: 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hello everyone, and thank you for seeing my post.
What I need seems kind of basic, but I'm struggling with it, so I would appreciate your help.

This code:

   data = pd.DataFrame({'id' : ['a100', 'a100', 'a100', 'a200', 'a200', 'a200','a300','a300', 'a300', 'a400', 'a400', 'a400', 'a500', 'a500', 'a500', 'a600', 'a600', 'a600', 'a700', 'a700', 'a700', 'a800', 'a800', 'a800', 'a900', 'a900', 'a900'],
                    'type': ['euro', 'dollar', 'yen', 'euro', 'dollar', 'yen','euro', 'dollar', 'yen', 'euro', 'dollar', 'yen', 'euro', 'dollar', 'yen', 'euro', 'dollar', 'yen', 'euro', 'dollar', 'yen', 'euro', 'dollar', 'yen', 'euro', 'dollar', 'yen'],
                   'model': ['EQ', 'EQ', 'EQ', 'MC', 'MC', 'MC', 'EQ','EQ', 'EQ', 'MC', 'MC', 'MC', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ'],
                    'status_ant': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'A', 'A', 'A', 'B', 'C', 'B', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
                    'status': ['B', 'C', 'A', 'B', 'C', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'B', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'A', 'B', 'A', 'C', 'C']
                    })

results in this dataframe:

           id     type   model   status_ant status
    0   a100    euro    EQ           A        B
    1   a100  dollar     EQ          A        C
    2   a100     yen    EQ           A        A
    3   a200    euro    MC          A        B
    4   a200  dollar    MC           A        C
    5   a200     yen    MC           A        A
    6   a300    euro    EQ           B        A
    7   a300  dollar    EQ           C        A
    8   a300     yen    EQ           A        A
    9   a400    euro    MC          B        A
   10  a400  dollar    MC           C        A
   11  a400     yen    MC           A        A
   12  a500    euro    EQ           A        B
   13  a500  dollar    EQ           A        C
   14  a500     yen    EQ           A        B
   15  a600    euro    EQ           B        A
   16  a600  dollar    EQ           C        A
   17  a600     yen    EQ           B        A
   18  a700    euro    EQ           A        A
   19  a700  dollar    EQ           A        A
   20  a700     yen    EQ           A        B
   21  a800    euro    EQ           A        B
   22  a800  dollar    EQ           A        A
   23  a800     yen    EQ           A        B
   24  a900    euro    EQ           A        A
   25  a900  dollar    EQ           A        C
   26  a900     yen    EQ           A        C



I need to filter the rows I want using some conditions. The logic I need is this one:


* First of all, these conditions I will mention should only apply to rows where model == 'EQ'.

* If there's a row where:

status_ant == 'A' and status == 'B' and type == 'euro'

AND there's another row WITHIN the same group (by id) where
 
status_ant == 'A' and status == 'C' and type == 'dollar'

drop those two rows (else, keep them). The same logic applies for rows were:

status_ant == 'B' and status == 'A' and type == 'euro'

AND there's another row WITHIN the same group (by id) where

status_ant == 'C' and status == 'A' and type == 'dollar'

The row with type == 'yen' shouldn't be dropped whatever the values in status_ant and status are.


Note: I know it might be confusing, but the resulting dataframe I want is this one (it exemplifies very well all the conditions I mentioned):


         id     type   model   status_ant status
    2   a100     yen    EQ          A       A
    3   a200    euro    MC          A       B
    4   a200  dollar    MC          A       C
    5   a200     yen    MC          A       A
    8   a300     yen    EQ           A       A
    9   a400    euro    MC          B       A
    10  a400  dollar    MC          C       A
    11  a400     yen    MC          A       A
    14  a500     yen    EQ          A       B
    17  a600     yen    EQ          B       A
    18  a700    euro    EQ          A       A
    19  a700  dollar    EQ          A       A
    20  a700     yen    EQ          A       B
    21  a800    euro    EQ          A       B
    22  a800  dollar    EQ          A       A
    23  a800     yen    EQ          A       B
    24  a900    euro    EQ          A       A
    25  a900  dollar    EQ          A       C
    26  a900     yen    EQ          A       C

I was looking for the answer and found something similar here: [Pandas: Comparing rows within groups](https://stackoverflow.com/questions/48819644/pandas-comparing-rows-within-groups)
but couldn't make it work for my code. I do think an apply and groupby is what I need.

Really appreciate your help and your time. Thank you

 
I hired a bunch of ninjas. The fridge is empty, but I can't find them to tell them the mission.
the value of filler advertising in 2021
https://coderanch.com/t/730886/filler-advertising
reply
    Bookmark Topic Watch Topic
  • New Topic