Forum:
Beginning Java
TreeMap question#1
Anthony Alexander
Greenhorn
Posts: 15
posted 18 years ago
How come when I have an array of TreeMaps and I try to view some values, I only see the last entry?
for (int yy=0; yy<N; yy++) //Loop through each treemap in array
{
    for(String xx : WeightData[yy].keySet( ))
    {
        System.out.println(xx + " " + WeightData[yy].get(xx));
    }
}
The other relevant part of the code:
public static TreeMap<String, Double> []WeightData = new TreeMap[Docs.length];
Sorry, I should have said: N is defined in the main program as 10, and the WeightData array of TreeMaps has already been populated at this point.
Keith Lynn
Ranch Hand
Posts: 2412
posted 18 years ago
Can you show all of the code?
Anthony Alexander
Greenhorn
Posts: 15
posted 18 years ago
OK, here goes. It is an information retrieval program. You will need ten text files (words.txt, words2.txt ... words10.txt) to run it.
It is still in progress, so it may be hard to follow.
// Run.java v1.60
// ---------------
// Obtained Vector Lengths for Query and Document(s).
//
// Only two things left:
// 1. Dot Products (Q.D1, Q.D2...Q.D10) - Loop of some sort.
// 2. Calculate Similarity Values -
//    "Sim[x] = DotProducts[x] / VLQuery * VLDoc[x]"
////////////////////////////////////////////////////////////////////////

import java.util.*;   // Provides TreeMap, Iterator, Scanner
import java.io.*;     // Provides FileReader, FileNotFoundException
import javax.swing.*;
import java.lang.*;

public class Run
{
    public static void main(String[ ] args)
    {
        // Introduction...
        Intro();

        // Create DocId Tree Map
        TreeMap<String, List<Integer>> DocData = new TreeMap<String, List<Integer>>( );
        readWordFile(DocData);

        // Ending.......
        Ending();
    }

    public static int getCount (String word, TreeMap<String, Integer> DocData)
    {
        if (DocData.containsKey(word))
        {
            // The word has occurred before, so get its count from the map
            return DocData.get(word); // Auto-unboxed
        }
        else
        {
            // No occurrences of this word
            return 0;
        }
    }

    public static void readWordFile(TreeMap<String, List<Integer>> DocData)
    {
        Scanner wordFile;
        String word;     // A word read from the file
        Integer count;   // The number of occurrences of the word

        TreeMap<String, Integer> []FreqData = new TreeMap[Docs.length];// initialize the array

        JOptionPane.showMessageDialog(null,"Firstly, going to read index terms from database\ninto"
            + "\n<FreqData>\nTreeMap array. Index terms are set to lower case. unwanted symbols are"
            + " removed");

        //**FOR LOOP TO READ THE DOCUMENTS**
        for (int x=0; x<Docs.length; x++)
        {
            FreqData[x] = new TreeMap<String, Integer>(); //initialise array

            try
            {
                wordFile = new Scanner(new FileReader(Docs[x]));
            }
            catch (FileNotFoundException e)
            {
                System.err.println(e);
                return;
            }

            while (wordFile.hasNext( ))
            {
                // Read the next word and get rid of the end-of-line marker if needed:
                word = wordFile.next( );

                // This makes the Word lower case.
                word = word.toLowerCase();

                // Lets Clean The Word.
                // CleanWord(word); // NOT USED AT THE MOMENT.

                //Remove unwanted symbols from the word -
                //(Cannot replace brackets (){}[] or math operators =+*)
                word = word.replaceAll("!", "");word = word.replaceAll("'", "");
                word = word.replaceAll(",", "");word = word.replaceAll("�", "");
                word = word.replaceAll("\"", "");word = word.replaceAll("$", "");
                word = word.replaceAll("%", "");word = word.replaceAll("^", "");
                word = word.replaceAll("-", "");word = word.replaceAll("&", "");
                word = word.replaceAll("|", "");word = word.replaceAll("@", "");
                word = word.replaceAll("`", "");word = word.replaceAll("�", "");
                word = word.replaceAll(":", "");word = word.replaceAll(";", "");
                word = word.replaceAll("#", "");word = word.replaceAll("~", "");
                word = word.replace (".", ""); //replaceAll removes whole word.
                word = word.replace ("?", "");//replace All crashes.

                // Get the current count of this word, add one, and then store the new count:
                count = getCount(word, FreqData[x]) + 1;

                // Put the word and the frequency in table
                FreqData[x].put(word, count);

                //Add DocIDs to current word
                List<Integer> DocIDs = DocData.get(word );
                if( DocIDs == null )//If there is no array list.
                {
                    DocIDs = new ArrayList<Integer>( ); //Create a new array list
                    DocData.put( word, DocIDs );//put the word and array list in treemap
                }
                DocIDs.add( x ); //Add Document ID to the List

                //Need a check here. Maybe a loop that checks that the
                //same value of DocID does not already exist.
                if (DocIDs.size() >=2 )
                {
                    //If current DocID is the same as previous
                    if (DocIDs.get(DocIDs.size()-1) == DocIDs.get(DocIDs.size()-2) )
                    {
                        //Remove the current entry
                        DocIDs.remove(DocIDs.get(DocIDs.size()-1));
                    }
                }
            }

            //SHOW CONTENTS OF CURRENT TREEMAP! - Should be a class
            System.out.println("\n\nTreeMap FreqData["+x+"]" + "\n\t TERM \tTERM FREQUENCY");
            for(String y : FreqData[x].keySet( ))
            {
                System.out.printf("%15s %10d\n", y, FreqData[x].get(y));
                //Add Term to Term Table
                TermTable.add(y);
            }
        }

        // PRINT OUT THE DOCUMENT DOC-ID TREE MAP.
        // IDF STUFF IS DONE HERE AS WELL.
        JOptionPane.showMessageDialog(null,"<Document ID TreeMap>\nIDF is calculated here by\n"
            + "Log[base10]( N / M ).\nWhere N is Total Number of Docs and"
            + "\nM is Docs Term appears in.");
        showDocID( DocData );

        // Show weights of each - Term Frequency * IDF
        JOptionPane.showMessageDialog(null,"<Weights TreeMap>\nTo be calculated by\nTermFrequency * IDF."
            + "\nVector Length of each Document also to be calculated\n"
            + "by the square root of (SUM of weight.squared)");

        System.out.println("\n\nCalculate Weights");
        for (int q=0; q<N; q++)
        {
            System.out.println("\nWeights from Document "+q+"\n");
            VL = 0.0;
            for(String v : FreqData[q].keySet( ))
            {
                WeightData[q] = new TreeMap<String, Double>(); //initialise array
                //System.out.printf("%15s %10d\n", v, FreqData[q].get(v));

                // W = TermFrequency * IDF
                Double W = FreqData[q].get(v) * IDFData.get(v);
                //System.out.println(FreqData[q].get(v));
                //System.out.println(IDFData.get(v));
                //System.out.println(FreqData[q].get(v)* IDFData.get(v));

                //Display Weight
                System.out.println(q + " : " + v + " : " + W );

                //Add to WeightData TreeMap
                WeightData[q].put(v, W);

                // **************Vector Length*****************************************
                //Lets Calculate Vector Length within Document
                Double Wsq = W*W;
                VL = VL + Wsq; //VL=+(W*W); // VL= VL + Weight squared
            }
            System.out.println("\nSquare root of " + VL + " gives us ");
            VL = Math.sqrt(VL); //Square root
            System.out.println("Doc["+q+"] Vector Length : " + VL +"\n\n");

            //Store value into Vector Length array
            VLDoc[q] = VL;
            // *********************************************************************
        }

        ///////////////////////////////////////////////////////////////////////////////
        // Enter User query.....
        //Window to enter search term
        String Temp = (String)JOptionPane.showInputDialog(null, "Enter search","Java");

        //System output of entered word
        System.out.println("\n\nUser entered ["+Temp+"]");

        String Word;
        Word = Temp.toLowerCase(); //Make entered query lower case

        //Break search word into terms by use of symbols
        String delimiters = "[' ,.!?]";

        // analyse the string & break into words
        String[] QueryTerms = Word.split(delimiters, 0);

        //display string as it is in table form
        System.out.println("Query Words,");
        int u=0; // used in loop.

        //Loop output of SearchWords...
        for(String q : QueryTerms)
        {
            System.out.println("Query Term " + u + " : " + q);
            u++;
            QueryTable.add(q); //add word to query table, why? not sure
        }

        ////////////////////////////////////////////////////////////////////////////////
        // 1. Set the Query TreeMap
        // ------------------------
        // Loop through IDF TreeMap, find Query Word(s).
        // Get the IDF for each Query Term.
        // Put Query Term and IDF (Query Weight?) into QueryData Tree Map.
        JOptionPane.showMessageDialog(null, "View IDF Data.\nLook for Query words in IDF data.");

        //Something here, add word to Query TreeMap?
        System.out.println("\n\nIDF Tree Map\n-------------");
        for(String xx : IDFData.keySet( ))
        {
            System.out.println(xx + " " + IDFData.get(xx));
            for (int tv=0; tv<QueryTerms.length; tv++)
            {
                if (QueryTerms[tv].equals(xx)) //If current query term is in IDF Table
                {
                    System.out.print("****Match!*****\n\n"); //Show match.
                    JOptionPane.showMessageDialog(null, "[ " + xx + " ] found with\nWeight : "
                        + IDFData.get(xx)); //Pop Window Display

                    //Put Query word and weight into Tree Map
                    QueryData.put(xx, IDFData.get(xx));
                }
            }
        }

        //Lets see the Query TreeMap.
        JOptionPane.showMessageDialog(null, "Look at QueryData TreeMap");
        System.out.println("\n\nQueryData TreeMap"+"\n--------------------------");
        for(String xx : QueryData.keySet( ))
        {
            System.out.println(xx + " " + QueryData.get(xx));

            //**************Vector Length*****************************************
            //Lets Calculate Vector Length for the Query
            Double Wsq2 = QueryData.get(xx)*QueryData.get(xx);
            VLQuery = VLQuery + Wsq2;
        }
        System.out.println("\nSquare root of " + VLQuery + " gives us ");
        VLQuery = Math.sqrt(VLQuery); //Square root and store
        System.out.println("Query Vector Length : " + VLQuery +"\n\n");
        //*********************************************************************

        // 2. Vector Lengths
        // -----------------
        // Have to compute the vector lengths now.
        // Going to have to compute for the query now, Documents are done already.
        // Have to square all weights and then square root the total.
        JOptionPane.showMessageDialog(null, "Now we have computed the Vector length\n"
            + "For The Query and also for the documents.\n\n"
            + "The next thing to do is:\nDot products for each document"
            + " which are calculated by taking\nSUM of Query word weight"
            + " Multiplied by term weight in document.");

        // 3. Dot Products
        // ----------------
        // Dot products for each document are calculated by taking query word weight
        // Multiplied by term weight in document.
        // Will be using QueryData treemap, WeightData[] treemap and storing in DotProduct[].
        System.out.println("\n\nDot Product\n--------------\n");

        //Test
        for (int yy=0; yy<N; yy++) //Loop through each document
        {
            for(String xx : WeightData[yy].keySet( ))
            {
                System.out.println(xx + " " + WeightData[yy].get(xx));
            }
        }
        //**ERROR ABOVE********
    } // end of readWordFile

    //Prints out TreeMap with DocID
    public static void showDocID( Map<String,List<Integer>> m )
    {
        System.out.println("\n\nDOC ID TREEMAP");
        for( int i = 0; i < INDENT*2; i++ ) //Indent now is dash (*2)
            System.out.print( "-" );

        Set<String> keys = m.keySet( );
        for( String k : keys )
        {
            System.out.print( "\n"+k );
            System.out.print( ":" );
            for( int i = k.length( ); i < INDENT; i++ ) //Indent is just 30 spaces
                System.out.print( " " );

            List<Integer> lines = m.get( k );
            Iterator<Integer> itr = lines.iterator( );
            System.out.print( itr.next( ) ); //Show the frequencies
            while( itr.hasNext( ) )
                System.out.print( ", " + itr.next( ) );
            System.out.println( );

            //IDF STUFF
            M = lines.size();
            IDF = Math.log10(N/M);
            System.out.println("\nDOC FREQ " + M + ", IDF " + IDF);

            //Add to IDF TreeMap
            IDFData.put(k, IDF);

            //Add IDF to IDF Table
            IDFTable.add(IDF);

            //Add Doc Frequency to Table
            DocFreqTable.add(M);

            for( int i = 0; i < INDENT*2; i++ ) //Indent now is dash (*2)
                System.out.print( "-" );
        }
    }

    // Clean word, make it user friendly
    public static String CleanWord(String word)
    {
        //Trim blank spaces from word (Index word)
        //word = word.trim();

        //Remove unwanted symbols from the word -
        //(Cannot replace brackets (){}[] or math operators =+*)
        word = word.replaceAll("!", "");word = word.replaceAll("'", "");
        word = word.replaceAll(",", "");word = word.replaceAll("�", "");
        word = word.replaceAll("\"", "");word = word.replaceAll("$", "");
        word = word.replaceAll("%", "");word = word.replaceAll("^", "");
        word = word.replaceAll("-", "");word = word.replaceAll("&", "");
        word = word.replaceAll("|", "");word = word.replaceAll("@", "");
        word = word.replaceAll("`", "");word = word.replaceAll("�", "");
        word = word.replaceAll(":", "");word = word.replaceAll(";", "");
        word = word.replaceAll("#", "");word = word.replaceAll("~", "");
        word = word.replace (".", ""); //replaceAll removes whole word

        //JOptionPane.showMessageDialog(null, word);
        return word; //Return 'cleaned' word.
    }

    public static void Intro()
    {
        JOptionPane.showMessageDialog(null, "Conrad McLaughlin\n524318",
            "Final Year BEng Project", JOptionPane.INFORMATION_MESSAGE);
        JOptionPane.showMessageDialog(null, "Run! (Beta Version 1.25)\n\n- Created an Inverted Index"
            + "\n- Computed Term Frequency\n- Calculated Inverse Document Frequency"
            + "\n- Worked out the weight of each term after 2nd pass through index.");
    }

    public static void Ending()
    {
        JOptionPane.showMessageDialog(null, "End");
    }

    ////////////////////////////////////////////////////////////////////////////////////////
    public static final int INDENT = 30;

    // Array of documents
    static String Docs [] = {"words.txt", "words2.txt","words3.txt", "words4.txt", "words5.txt",
                             "words6.txt","words7.txt", "words8.txt", "words9.txt", "words10.txt",};

    //IDF Stuff
    public static int N = Docs.length; //Size of database
    public static int M = 0;//Document frequency
    public static double IDF;//IDF

    //Table to hold IDFs.
    public static ArrayList IDFTable = new ArrayList <Integer>();

    //Table to hold all terms.
    public static ArrayList TermTable = new ArrayList <String>();

    //Table to hold Doc Freqs.
    public static ArrayList DocFreqTable = new ArrayList <String>();

    //Table to hold Weights.
    public static ArrayList WeightTable = new ArrayList <Double>();

    // IDF data TreeMap
    public static TreeMap<String, Double> IDFData = new TreeMap ();

    // Weight data TreeMap.
    public static TreeMap<String, Double> []WeightData = new TreeMap[Docs.length];

    // Query TreeMap
    public static TreeMap<String, Double> QueryData = new TreeMap ();

    //Table to hold Query terms
    public static ArrayList QueryTable = new ArrayList <String>();

    //Array of Vector Lengths from Document
    public static Double [] VLDoc = new Double [N]; //Same size as Docs Length
    public static Double VL; //Temp. Vector Length for Document

    // Double variable for query vector length
    public static Double VLQuery = 0.0; //Only one vector length value for Query

    //Array of Dot Products for each Document
    public static Double [] DotProduct = new Double [N]; //Same size as Docs Length
    public static Double DP; //Temp. Dot Product
    /////////////////////////////////////////////////////////////////////////////////////////
}
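Judging only from the code as posted, one likely cause of the "only the last entry" symptom is that `WeightData[q] = new TreeMap<String, Double>();` sits inside the inner `for(String v : ...)` loop, so a fresh map replaces the previous one on every term. The sketch below reproduces that pattern in isolation. The class name `WeightMapDemo`, both method names, and the sample terms are hypothetical, and the weight is simply the frequency as a double rather than TF * IDF.

```java
import java.util.Map;
import java.util.TreeMap;

public class WeightMapDemo {

    // Mirrors the posted loop: the map is re-created for every term,
    // so each put() lands in a fresh, empty map.
    static TreeMap<String, Double> buildInsideInnerLoop(Map<String, Integer> freq) {
        TreeMap<String, Double> weights = null;
        for (String term : freq.keySet()) {
            weights = new TreeMap<String, Double>(); // discards earlier entries
            weights.put(term, freq.get(term) * 1.0);
        }
        return weights;
    }

    // Creates the map once, before iterating over the terms.
    static TreeMap<String, Double> buildBeforeInnerLoop(Map<String, Integer> freq) {
        TreeMap<String, Double> weights = new TreeMap<String, Double>();
        for (String term : freq.keySet()) {
            weights.put(term, freq.get(term) * 1.0);
        }
        return weights;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for one FreqData[q] map
        TreeMap<String, Integer> freq = new TreeMap<String, Integer>();
        freq.put("apple", 2);
        freq.put("banana", 1);
        freq.put("cherry", 3);

        System.out.println(buildInsideInnerLoop(freq));  // {cherry=3.0}
        System.out.println(buildBeforeInnerLoop(freq));  // {apple=2.0, banana=1.0, cherry=3.0}
    }
}
```

If this is indeed the cause, moving the `WeightData[q] = new TreeMap<String, Double>();` line up to just after `VL = 0.0;`, outside the inner loop, should leave every term in each document's map.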