
Metadata extraction using Tika

 
sudheshna Iyer
Ranch Hand
Posts: 71
1. I have a few questions about metadata extraction, so I would like to join the mailing list of the Tika user group. Can you please provide the email address for it?

2. How do I extract the metadata from a file? For example, I need the author information, but for different files it comes from different fields, such as:
Author, meta:author, citation_author

Which one should I take? I also need to extract about 15 predefined metadata fields, such as publication year and DOI, from the Metadata object.
What is the best way to extract these fields? Metadata.names() contains elements like "citation_doi".
Should I iterate through the metadata names and test each one individually?

Is there a better way to extract the metadata?
 
Ulf Dittmer
Rancher
Posts: 42972
So I would like to join the mailing list of the Tika user group. Can you please provide the email address for it?

With all due respect, that is far too easy to find yourself for us to answer here. Which search phrase suggests itself?

How do I extract the metadata from a file? For example, I need the author information, but for different files it comes from different fields, such as:
Author, meta:author, citation_author

A quick search for "meta data extraction tika" found http://blog.jeroenreijn.com/2010/04/metadata-extraction-with-apache-tika.html.
 
sudheshna Iyer
Ranch Hand
Posts: 71
Thank you for your reply.

But to get a single metadata value, I have to pass the tag name, such as "Model". How do I know which name to use? That is what I was asking about in my code:
metadata.get("Model").equals("Canon EOS 350D DIGITAL");
 
Ulf Dittmer
Rancher
Posts: 42972
Each file type will have to be handled differently. We can't tell you how, exactly, as we don't know what corpus of files you're dealing with.
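
One way to cope with the per-file-type differences is to keep a small alias table that maps each field you care about to the metadata names it may appear under, and take the first one that is present. The following is only a rough sketch under some assumptions: it works on a plain Map<String, String> standing in for the extracted metadata (with Tika you would fill it from Metadata.names() and Metadata.get(name)), the class and method names are made up for illustration, and the priority order of the author aliases is a guess that you would have to adjust for your own files.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Tika.
final class MetadataFields {

    // Assumed alias table: target field -> candidate metadata names, highest priority first.
    // The names come from the question above; the ordering is an assumption.
    static final Map<String, List<String>> ALIASES = new LinkedHashMap<>();
    static {
        ALIASES.put("author", Arrays.asList("citation_author", "meta:author", "Author"));
        ALIASES.put("doi", Arrays.asList("citation_doi"));
    }

    // Returns the trimmed value of the first candidate name that has a non-blank value.
    static Optional<String> firstPresent(Map<String, String> raw, List<String> names) {
        for (String name : names) {
            String value = raw.get(name);
            if (value != null && !value.trim().isEmpty()) {
                return Optional.of(value.trim());
            }
        }
        return Optional.empty();
    }

    // Builds a map of target field -> value; fields with no matching name are left out.
    static Map<String, String> normalize(Map<String, String> raw) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : ALIASES.entrySet()) {
            firstPresent(raw, entry.getValue())
                    .ifPresent(value -> out.put(entry.getKey(), value));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for the names and values a parser reported for one file.
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("Author", "A. Writer");
        raw.put("citation_doi", "10.1000/example");
        System.out.println(normalize(raw));
    }
}
```

To find out which names to put in the table, it helps to dump every name and value for a few sample files of each type first, and only then decide which aliases each of your ~15 fields needs.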
 