Hello everyone, and thank you for seeing my post.
What I need seems kind of basic, but I'm struggling with it, so I would appreciate your help.
This code:
data = pd.DataFrame({'id' : ['a100', 'a100', 'a100', 'a200', 'a200', 'a200','a300','a300', 'a300', 'a400', 'a400', 'a400', 'a500', 'a500', 'a500', 'a600', 'a600', 'a600', 'a700', 'a700', 'a700', 'a800', 'a800', 'a800', 'a900', 'a900', 'a900'],
'type': ['euro', 'dollar', 'yen', 'euro', 'dollar', 'yen','euro', 'dollar', 'yen', 'euro', 'dollar', 'yen', 'euro', 'dollar', 'yen', 'euro', 'dollar', 'yen', 'euro', 'dollar', 'yen', 'euro', 'dollar', 'yen', 'euro', 'dollar', 'yen'],
'model': ['EQ', 'EQ', 'EQ', 'MC', 'MC', 'MC', 'EQ','EQ', 'EQ', 'MC', 'MC', 'MC', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ'],
'status_ant': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'A', 'A', 'A', 'B', 'C', 'B', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
'status': ['B', 'C', 'A', 'B', 'C', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'B', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'A', 'B', 'A', 'C', 'C']
})
results in this dataframe:
id type model status_ant status
0 a100 euro EQ A B
1 a100 dollar EQ A C
2 a100 yen EQ A A
3 a200 euro MC A B
4 a200 dollar MC A C
5 a200 yen MC A A
6 a300 euro EQ B A
7 a300 dollar EQ C A
8 a300 yen EQ A A
9 a400 euro MC B A
10 a400 dollar MC C A
11 a400 yen MC A A
12 a500 euro EQ A B
13 a500 dollar EQ A C
14 a500 yen EQ A B
15 a600 euro EQ B A
16 a600 dollar EQ C A
17 a600 yen EQ B A
18 a700 euro EQ A A
19 a700 dollar EQ A A
20 a700 yen EQ A B
21 a800 euro EQ A B
22 a800 dollar EQ A A
23 a800 yen EQ A B
24 a900 euro EQ A A
25 a900 dollar EQ A C
26 a900 yen EQ A C
I need to filter the rows I want using some conditions. The logic I need is this one:
* First of all, these conditions I will mention should only apply to rows where model == 'EQ'.
* If there's a row where:
status_ant == 'A' and status == 'B' and type == 'euro'
AND there's another row WITHIN the same group (by id) where
status_ant == 'A' and status == 'C' and type == 'dollar'
drop those two rows (else, keep them). The same logic applies for rows were:
status_ant == 'B' and status == 'A' and type == 'euro'
AND there's another row WITHIN the same group (by id) where
status_ant == 'C' and status == 'A' and type == 'dollar'
The row with type == 'yen' shouldn't be dropped whatever the values in status_ant and status are.
Note: I know it might be confusing, but the resulting dataframe I want is this one (it exemplifies very well all the conditions I mentioned):
id type model status_ant status
2 a100 yen EQ A A
3 a200 euro MC A B
4 a200 dollar MC A C
5 a200 yen MC A A
8 a300 yen EQ A A
9 a400 euro MC B A
10 a400 dollar MC C A
11 a400 yen MC A A
14 a500 yen EQ A B
17 a600 yen EQ B A
18 a700 euro EQ A A
19 a700 dollar EQ A A
20 a700 yen EQ A B
21 a800 euro EQ A B
22 a800 dollar EQ A A
23 a800 yen EQ A B
24 a900 euro EQ A A
25 a900 dollar EQ A C
26 a900 yen EQ A C
I was looking for the answer and found something similar here: [Pandas: Comparing rows within groups](
https://stackoverflow.com/questions/48819644/pandas-comparing-rows-within-groups)
but couldn't make it work for my code. I do think an apply and groupby is what I need.
Really appreciate your help and your time. Thank you