
How to get index of DataFrame?

 
Ranch Hand
Posts: 52
I posted recently about the joinery library (thanks so much for all the help I received). I'm now experimenting with another library called nRo/DataFrame, and I'm trying to get the index of the DataFrame.

I've searched through the javadoc but I haven't quite found what I'm looking for.

I've tried a few different methods, such as size() (which gave me the number of rows) and getRows() (which gave me a specific row), but they didn't give me what I needed.
Using getColumns() got me something like:

but I need to get something more similar to :

In the joinery library I was able to access it by calling .index(), which returned something like:

or in python pandas dataFrame.index :

Basically what I have so far is:

Thanks so much!
 
Bartender
Posts: 4103
A DataFrame has a method 'getHeader()' and that header has a method 'getIndex()'.
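A minimal plain-Java sketch of what such a header lookup returns, assuming the header is essentially an ordered list of column names. HeaderSketch and the column names "name", "weight" and "height" are hypothetical; this is not the nRo/DataFrame header class, only an illustration that the result is the position of a column, not of a row.

```java
import java.util.List;

// Hypothetical stand-in for a DataFrame header: an ordered list of column names.
// NOT the nRo/DataFrame API; it only shows what a name-to-position lookup returns.
class HeaderSketch {
    private final List<String> columnNames;

    HeaderSketch(List<String> columnNames) {
        this.columnNames = List.copyOf(columnNames);
    }

    // Returns the zero-based position of the named column, or -1 if there is no such column.
    int getIndex(String columnName) {
        return columnNames.indexOf(columnName);
    }

    public static void main(String[] args) {
        // Hypothetical column order: "height" is the third column.
        HeaderSketch header = new HeaderSketch(List.of("name", "weight", "height"));
        System.out.println(header.getIndex("height")); // prints 2: a column position, not a row position
    }
}
```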
 
Glenda Karen
Ranch Hand
Posts: 52
Thank you!
I tried:
And I got back 2, which is the position of the height column, but I also need a way to get the position of each row: for every "t" in the height column, which row it is in, or, as pandas defines it, "The index (row labels) of the DataFrame."
 
Piet Souris
Bartender
Posts: 4103
hi Glenda,

I looked through the API but found nothing useful so far. But I can think of three things for now:

1) perhaps the dataframe contains a column with unique values, so that in a selection you can identify the selected rows.

2) add an IntegerColumn to the dataframe, containing the values 0, 1, ..., dataframe.size() - 1.  In the selection you will see these numbers.

3) get the column you're interested in. That column implements Iterable<T>, so you could do something like:
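A minimal sketch of approach 3, assuming only that the column can be iterated (it implements Iterable<T>). Here T is a type parameter standing for the type of the column's values, so the same method works for a column of Strings such as a height column holding "s" and "t". A plain List<String> stands in for the library column so the sketch runs without nRo/DataFrame; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class RowPositions {
    // Returns the zero-based row positions at which the column holds the given value.
    // T is the type of the column's values (e.g. String), not a literal letter.
    static <T> List<Integer> positionsOf(Iterable<T> column, T value) {
        List<Integer> positions = new ArrayList<>();
        int row = 0;
        for (T cell : column) {
            if (Objects.equals(cell, value)) {
                positions.add(row);
            }
            row++; // advance the counter for every row, matched or not
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical height column standing in for the DataFrame column.
        List<String> height = List.of("t", "s", "s", "t", "t", "s", "t");
        System.out.println(positionsOf(height, "t")); // prints [0, 3, 4, 6]
    }
}
```

The counter is what turns a plain iteration into row positions; approach 2 stores those same positions as a column instead, so they survive a selection.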
   
 
Glenda Karen
Ranch Hand
Posts: 52
Thanks so much!

I wonder which approach would be best so that I can:
1. shuffle all the indexes,
2. keep only a few of the shuffled indexes, and
3. build a new dataframe containing only the rows at those indexes.

For example, say the indexes of my current dataframe are [3, 6, 7, 9, 10, 11]. I shuffle them and get [7, 3, 9, 11, 10, 6], then keep only the first 3, so I'm left with [7, 3, 9]. Finally, I build a new dataframe from the rows at those 3 indexes in the original dataframe.
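Steps 1 to 3 can be sketched in plain Java with Collections.shuffle and subList. The "dataframe" here is a hypothetical List<String[]> of rows, not the nRo/DataFrame type, so building a real DataFrame from the chosen rows is not shown; the seeded Random is an added assumption that makes runs repeatable, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ShuffleAndSelect {
    // Shuffles a copy of the given row indexes and keeps the first k of them.
    static List<Integer> shuffledSample(List<Integer> indexes, int k, Random random) {
        List<Integer> shuffled = new ArrayList<>(indexes); // copy so the caller's list is untouched
        Collections.shuffle(shuffled, random);
        return new ArrayList<>(shuffled.subList(0, Math.min(k, shuffled.size())));
    }

    // Builds a new "frame" containing only the rows at the given positions, in that order.
    static List<String[]> selectRows(List<String[]> rows, List<Integer> positions) {
        List<String[]> selected = new ArrayList<>();
        for (int position : positions) {
            selected.add(rows.get(position));
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical frame with 12 rows of [name, height].
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            rows.add(new String[] {"row" + i, i % 2 == 0 ? "t" : "s"});
        }
        List<Integer> indexes = List.of(3, 6, 7, 9, 10, 11); // the example indexes from the post
        List<Integer> sample = shuffledSample(indexes, 3, new Random(42));
        List<String[]> subset = selectRows(rows, sample);
        System.out.println(sample + " -> " + subset.size() + " rows");
    }
}
```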

Piet Souris wrote: 2) add an IntegerColumn to the dataframe, containing the values 0, 1, ..., dataframe.size() - 1. In the selection you will see these numbers.


Perhaps this approach would be best suited for the needs mentioned above.

Additionally, I do have unique string values in one of the columns. Could I do something like what is quoted below to get an array of numbers?

Piet Souris wrote:

 
Glenda Karen
Ranch Hand
Posts: 52
I tried implementing but I got back for height
 
Piet Souris
Bartender
Posts: 4103

Glenda Karen wrote: I tried implementing but I got back for height


hi Glenda,

I should be more careful when giving an example. The "T" I used stood for the type of the values in the column; it was not meant literally as a T.

Now, you said that the values of the "height" column are "s" and "t", so we should start with

and then follow it with:

If StringColumn fails, then use getColumn("height"). Unfortunately, I have not downloaded this DataFrame library, so I cannot test anything.

As for your previous reply: can you tell me what you are trying to achieve with all these indices? It is not clear to me.
 
Glenda Karen
Ranch Hand
Posts: 52
Hi Piet,

Thanks so much, the StringColumn worked for getting the index.


Piet Souris wrote: As far as your previous reply concerns: can you tell me what you are trying to achieve with all these indices?



Basically, what I'm trying to do is split the dataframe into 2 dataframes, one containing all the shorts and the other containing all the talls. I'm using the indices so that I can shuffle them (they correspond to the height column of the dataframes) and make a test set and a training set for data science predictions.

For example, I make a talls dataframe by selecting all rows with "t" from the original dataframe (which contains both shorts and talls). Say the "t" values in the height column are at indexes [0, 3, 4, 6, 8, 9] of the original dataframe. To get an accurate test and training set, I shuffle that array, so after shuffling it might look like [3, 9, 4, 0, 8, 6]. I then want to split the array of indexes (which correspond to rows of the original dataframe) so that about 20% of the rows containing "t" go to the test set and about 80% go to the training set.
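The plan above is a stratified split: shuffle the row indexes of each class separately, then send roughly 20% of each class to the test set and the rest to the training set, so both sets keep the short/tall mix. A minimal sketch, assuming the row positions of each class were already collected (e.g. the "t" positions from the height column); the 0.2 fraction, rounding to the nearest whole row, the "s" positions, and the StratifiedSplit/addClass names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class StratifiedSplit {
    final List<Integer> train = new ArrayList<>();
    final List<Integer> test = new ArrayList<>();

    // Shuffles one class's row positions and moves about testFraction of them into the test set.
    void addClass(List<Integer> classRows, double testFraction, Random random) {
        List<Integer> shuffled = new ArrayList<>(classRows); // copy so the input list is untouched
        Collections.shuffle(shuffled, random);
        int testCount = (int) Math.round(shuffled.size() * testFraction);
        test.addAll(shuffled.subList(0, testCount));
        train.addAll(shuffled.subList(testCount, shuffled.size()));
    }

    public static void main(String[] args) {
        Random random = new Random(7); // fixed seed so runs are repeatable
        StratifiedSplit split = new StratifiedSplit();
        // "t" rows from the post; the "s" rows are hypothetical.
        split.addClass(List.of(0, 3, 4, 6, 8, 9), 0.2, random);
        split.addClass(List.of(1, 2, 5, 7, 10, 11, 12, 13, 14, 15), 0.2, random);
        System.out.println("test:  " + split.test);
        System.out.println("train: " + split.train);
    }
}
```

Splitting each class separately is what guarantees that the test set contains both shorts and talls; shuffling the whole frame at once would only give that mix on average.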

I want to be able to take the test set and predict whether each row is short or tall.
 