How to filter a dataframe?

 
Ranch Hand
Posts: 70
1
I'm using the joinery library to work with a DataFrame in my project. I was able to use it to read a CSV file, but now I'm trying to figure out how to filter my DataFrame.

For example, the CSV has a height column whose value is either t (tall) or s (short). I'm trying to figure out a way to get a DataFrame for each of those, something like:
When I try this I get

The type of the expression must be an array type but it resolved to DataFrame<Object>



Any ideas how I can filter the dataframe properly?

Thanks so much!
 
Saloon Keeper
Posts: 22668
153
Android Eclipse IDE Tomcat Server Redhat Java Linux
  • Likes 1
Looks like it did exactly what you told it to: it created a single DataFrame<Object>.

You didn't say DataFrame<Object>[] or List<DataFrame<Object>>; you only specified a single DataFrame<Object>.
 
Glenda Karen
Ranch Hand
Posts: 70
1
Thank you!
So I tried changing it to:

but now I'm getting

type mismatch: cannot convert from DataFrame<Object> to DataFrame<Object>[]


I also got an error regarding the height:

height cannot be resolved or is not a field

 
Tim Holloway
Saloon Keeper
Posts: 22668
153
Android Eclipse IDE Tomcat Server Redhat Java Linux
  • Likes 1
I don't suppose you have a JavaDoc link for that library?

I took a quick dive into the code, and a DataFrame is a container for a collection of data. I think you're actually supposed to define a class that describes each object (read from a CSV row, in your case) and use that.

So, for example, define a class Rectangle, load a DataFrame<Rectangle> from the rectangle co-ordinates/dimensions, and work with it that way.
 
Glenda Karen
Ranch Hand
Posts: 70
1
Sorry, I don't think I have a JavaDoc link, but I did find this API documentation.

So if I understand correctly I'd make a class for reading csv and then I could use that to achieve the filtering?

Something like:
 
Bartender
Posts: 4109
156
In the API link that you show, have a look at the 'select' method. If you click on that word, you will see an example.
 
Tim Holloway
Saloon Keeper
Posts: 22668
153
Android Eclipse IDE Tomcat Server Redhat Java Linux

Piet Souris wrote:In the API link that you show, have a look at the 'select' method. If you click on that word, you will see an example.



Actually, I think that is the JavaDoc! The stuff that we're used to seeing only uses the bare minimum of the extensive capabilities of the JavaDoc system. These people used it more effectively.
 
Tim Holloway
Saloon Keeper
Posts: 22668
153
Android Eclipse IDE Tomcat Server Redhat Java Linux
Well, I did a little more reading and I'm getting more confused. Either their terminology is fuzzy or their explanations are fuzzy. They talk about things they call "columns" that seem like they should be rows, but then they talk about "rows", so they can't simply be using the wrong name. And I'm suspecting that in some cases at least, a "column" is a column header, but why data values should directly attach to them is a puzzle.

It sounds like a really great framework if someone could just explain it in a way that makes sense.
 
Glenda Karen
Ranch Hand
Posts: 70
1

Piet Souris wrote:In the API link that you show, have a look at the 'select' method. If you click on that word, you will see an example.



Thank you!
So maybe I could do something like:
 
Glenda Karen
Ranch Hand
Posts: 70
1

Tim Holloway wrote:

Piet Souris wrote:In the API link that you show, have a look at the 'select' method. If you click on that word, you will see an example.



Actually, I think that is the JavaDoc! The stuff that we're used to seeing only uses the bare minimum of the extensive capabilities of the JavaDoc system. These people used it more effectively.



Oh neat, glad it was:)
 
Glenda Karen
Ranch Hand
Posts: 70
1

Tim Holloway wrote:Well, I did a little more reading and I'm getting more confused. Either their terminology is fuzzy or their explanations are fuzzy. They talk about things they call "columns" that seem like they should be rows, but then they talk about "rows", so they can't simply be using the wrong name. And I'm suspecting that in some cases at least, a "column" is a column header, but why data values should directly attach to them is a puzzle.

It sounds like a really great framework if someone could just explain it in a way that makes sense.


Thanks so much for looking through it. Yes, it looks like there is a lot to it, but I was having trouble understanding how to implement the different functionalities. I was looking for a Java replacement for Python's pandas library, and this one looks pretty promising; I just got stuck trying to figure out how to filter in a similar way to pandas. In Python I can do:
 
Tim Holloway
Saloon Keeper
Posts: 22668
153
Android Eclipse IDE Tomcat Server Redhat Java Linux
The one thing I did get an impression of was that the DataFrame is a numeric value storage and analytic system. So you wouldn't - as I interpret it - search for "tall", you'd search for height exceeding whatever your idea of tall was.
 
Piet Souris
Bartender
Posts: 4109
156
  • Likes 1

Glenda wrote:short_and_tall = pd.DataFrame(pd.read_csv(name_of_csv))
only_Tall = short_and_tall[short_and_tall.height == "t"]


Don't know about pandas, but this is more or less how you would do it in R.

By the look of it, it seems that joinery's DataFrame does not support filtering like this. The example they give is as follows:

> DataFrame<Object> df = new DataFrame<>("name", "value");
> for (int i = 0; i < 10; i++)
>     df.append(Arrays.asList("name" + i, i));
> df.select(new Predicate<Object>() {
>         @Override
>         public Boolean apply(List<Object> values) {
>             return Integer.class.cast(values.get(1)).intValue() % 2 == 0;
>         }
>     })
>   .col(1);
[0, 2, 4, 6, 8]

Apparently, the parameter 'List<Object> values' represents a row in the DataFrame. In this case, values.get(1) gets the second element (i.e. "value"), casts it to an Integer and tests whether it is even or not. So, in your example:

say that "height" is the third column; you would then get something like:

But be warned: I have not tested this!
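
The row-wise predicate idea described above can be sketched with plain Java collections. This is not joinery code: `RowSelect`, its `select` helper, and the sample rows are hypothetical stand-ins, and the sketch assumes the height code sits in the third column (index 2), as in the post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java stand-in for a data frame: each row is a List<Object>.
final class RowSelect {

    // Keep only the rows for which the predicate returns true.
    static List<List<Object>> select(List<List<Object>> rows,
                                     Predicate<List<Object>> keep) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> row : rows) {
            if (keep.test(row)) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical rows: name, age, height code ("t" or "s").
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList("a", 30, "t"),
                Arrays.<Object>asList("b", 25, "s"),
                Arrays.<Object>asList("c", 41, "t"));

        // Index 2 is the third column; compare with equals(), not ==.
        List<List<Object>> tall = select(rows, row -> "t".equals(row.get(2)));
        System.out.println(tall);
    }
}
```

Each surviving row keeps all of its columns, which is the behaviour the thread later attributes to joinery's `select`.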
 
Tim Holloway
Saloon Keeper
Posts: 22668
153
Android Eclipse IDE Tomcat Server Redhat Java Linux
Is this not a bit redundantly redundant?

Why "Integer.class.cast()" instead of just "Integer.cast()"?

Then again, why cast(record.get(2)).intValue() instead of simply "((int) record.get(2))"?

For that matter, you should be able to define a DataFrame<Integer> as I interpret it, although I'm not sure precisely how. In which case, casting would not apply at all.
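
For reference, a small plain-Java sketch of the spellings discussed here. `cast` is an instance method of `Class<T>`, so it is reached through the class literal `Integer.class`; there is no static `Integer.cast()`. The class name `CastDemo` and the sample record are illustrative assumptions, not taken from the library.

```java
import java.util.Arrays;
import java.util.List;

final class CastDemo {

    // Reflective spelling, as used in the library's example.
    static int viaClassCast(List<Object> record, int i) {
        return Integer.class.cast(record.get(i)).intValue();
    }

    // Ordinary checked cast to the wrapper, then auto-unboxing.
    static int viaWrapperCast(List<Object> record, int i) {
        return (Integer) record.get(i);
    }

    // Java 7 and later: casting Object to int casts to Integer and unboxes.
    static int viaPrimitiveCast(List<Object> record, int i) {
        return (int) record.get(i);
    }

    public static void main(String[] args) {
        List<Object> record = Arrays.<Object>asList("name4", 4);
        System.out.println(viaClassCast(record, 1));
        System.out.println(viaWrapperCast(record, 1));
        System.out.println(viaPrimitiveCast(record, 1));
        // All three throw ClassCastException if the slot holds a String.
    }
}
```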
 
Glenda Karen
Ranch Hand
Posts: 70
1

Tim Holloway wrote:The one thing I did get an impression of was that the DataFrame is a numeric value storage and analytic system. So you wouldn't - as I interpret it - search for "tall", you'd search for height exceeding whatever your idea of tall was.



Thank you! The only problem is that I have it saved in the csv as t or s, I don't actually know the initial values for tall and short.
 
Glenda Karen
Ranch Hand
Posts: 70
1

Apparently, the parameter 'List<Object> values' represents a row in the DataFrame. In this case, values.get(1) gets the second element (i.e. "value"), casts it to an Integer and tests whether it is even or not. So, in your example:

say, that "height" is the third column, and so you would get something like:

But be warned: I have not tested this!



Thanks so much, I'm going to try to implement something like this. Since I'm working with strings and I don't know the original values used to categorize something as tall or short, perhaps I should check out other libraries. Or maybe there is a way to support strings; I'm wondering if I could convert t and s to their numeric positions in the alphabet and use that somehow.
 
Glenda Karen
Ranch Hand
Posts: 70
1
So I tried adding:


I debugged it, and it seems record returned all the rows in the second column.
So say the csv looked something like:

I only got back
Running the next line:

Gets an error:
which I think is related to the string and integer.
and also returns

 
Glenda Karen
Ranch Hand
Posts: 70
1
I was able to tweak some stuff for strings, but I'm still not sure how to get all tall into one dataframe and all short into another.



Debugging this shows record as each row. As I keep resuming the debugger, I see it going through each row, and height shows either "t" or "s" as the value.

In the console I'm getting:


Now I need to figure out how I can get the rest of the columns corresponding to each height and save them into a dataframe accordingly.
 
Piet Souris
Bartender
Posts: 4109
156
Hi Glenda,

Yes, that seems like it. In my previous example (the one that gives this ClassCastException), I assumed that column 2 was an integer, but that was wrong, and I am not sure what "0.11v" means.

A DataFrame has a method "columns()" that will get you all the column names. It also has methods to show you a few of the rows, so that you can see what the content of a row is.

In this case, the column that you want is column 1, where the height is denoted as either "t" or "s". That is clearly a String, so indeed you should cast column 1 to a String. And you are using a much shorter way than in the example.

And about Tim's suggestion to have a DataFrame<Integer>: as you can see from the context, that is impossible here. It is possible to drop all non-integer columns and cast the remaining DataFrame to a DataFrame<Integer>, but I doubt if that is very useful.

Anyway: by the look of it, you seem to have mastered that DataFrame issue! But compared to R (especially that filtering), it seems a bit lengthy.
 
Piet Souris
Bartender
Posts: 4109
156
  • Likes 1
Oh yeah: I overlooked your latest reply. Use "equals()" to test for String equality, not "==".
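
A short plain-Java illustration of why this matters for values read at run time. `StringEquality` and `isTall` are hypothetical names, and `new String("t")` is used only to stand in for a value parsed from a file.

```java
final class StringEquality {

    // Value comparison: true when the argument is a String containing "t".
    // Putting the literal first also makes this safe for null.
    static boolean isTall(Object height) {
        return "t".equals(height);
    }

    public static void main(String[] args) {
        // new String(...) forces a distinct object, much like a value
        // parsed from a CSV file at run time.
        String parsed = new String("t");

        System.out.println(parsed == "t");   // false: two different objects
        System.out.println(isTall(parsed));  // true: same characters
    }
}
```

`==` compares object identity, so it can appear to work with literals in a quick test and then fail on data loaded from a file.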
 
Glenda Karen
Ranch Hand
Posts: 70
1
Thanks so much for your reply, I have changed it to:
This returns true or false, so I can probably do an if true...

It is possible to drop all non-integer columns and cast the remaining DataFrame to a DataFrame<Integer>, but I doubt if that is very useful.


So does that mean if I wanted to return all rows and columns of tall into a dataframe it wouldn't work?

and I am not sure what "0.11v" means

that's just placeholder text for other columns in the csv; I have a few columns containing some other data about each tall or short item.
 
Piet Souris
Bartender
Posts: 4109
156

Glenda Karen wrote:Thanks so much for your reply, I have changed it to:


Yes, that looks correct to me.

Glenda wrote:
So does that mean if I wanted to return all rows and columns of tall into a dataframe it wouldn't work?


Can you elaborate on this a little? I'm not sure what you mean.

But have you got that selection working now?
 
Glenda Karen
Ranch Hand
Posts: 70
1

Can you elaborate on this a little? I'm not sure what you mean.

What I mean is: is it possible to get all the rows with "t" in the height column, together with the other columns for those rows? For example, if there are 3 columns (height, name and age) and 3 rows have "t" in the height column, can I get the name and age for those rows? Say the csv looks like:


Am I able to filter for "t" and get a new dataframe with:


But have you got that selection working now?


Not to get only "t" or "s", only to get the height column with all the data pertaining to it.
 
Piet Souris
Bartender
Posts: 4109
156
  • Likes 1

Glenda Karen wrote:Am I able to filter for "t" and get a new dataframe with:


Yes, that is the result of the select: you get all rows for which the Predicate returns 'true'. If you want fewer columns, then have a look at the method 'retain'.
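
To make the select-then-retain idea concrete without depending on the library, here is a plain-Java sketch. `FrameSplit`, `splitByHeight`, and this `retain` helper are hypothetical (the last borrows joinery's method name only for readability), and the sample data assumes the height, name and age columns from the question above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class FrameSplit {

    // Partition rows on the height column: true -> "t" rows, false -> the rest.
    static Map<Boolean, List<List<Object>>> splitByHeight(
            List<List<Object>> rows, int heightCol) {
        return rows.stream()
                .collect(Collectors.partitioningBy(
                        row -> "t".equals(row.get(heightCol))));
    }

    // Keep only the listed column positions of every row.
    static List<List<Object>> retain(List<List<Object>> rows, int... cols) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> row : rows) {
            List<Object> kept = new ArrayList<>();
            for (int c : cols) {
                kept.add(row.get(c));
            }
            out.add(kept);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical data with columns: height, name, age.
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList("t", "ann", 31),
                Arrays.<Object>asList("s", "bob", 27),
                Arrays.<Object>asList("t", "cy", 45));

        Map<Boolean, List<List<Object>>> parts = splitByHeight(rows, 0);
        System.out.println(parts.get(true));                // full "t" rows
        System.out.println(parts.get(false));               // full "s" rows
        System.out.println(retain(parts.get(true), 1, 2));  // name and age only
    }
}
```

Filtering first and dropping columns afterwards keeps the two concerns separate: the predicate decides which rows survive, and the column list decides what each surviving row shows.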
 
Glenda Karen
Ranch Hand
Posts: 70
1
Amazing, thanks so much for your help!
 