Regular Expression any char between html-tags

 
Ranch Hand
Posts: 136
I want to fetch any data between a <td> start tag and a </td> end tag. I only find examples for getting the tags and not the data...
 
Ranch Hand
Posts: 781
Netbeans IDE Ubuntu Java
Hi,

You don't say whether you have other tags within your <td> tags, whether there can be a new line within a <td> tag, or whether more than one <td> can appear on one line, so take the following as a starting point.

import java.util.regex.*;

public class Test20040913
{
    public static void main(String[] args)
    {
        // Capture the text between <td> and </td>. [^<]* stops at the next tag,
        // so a cell that contains a nested tag is not matched.
        // MULTILINE only changes how ^ and $ behave, so it has no effect on this pattern.
        Pattern pattern = Pattern.compile("<td>([^<]*)</td>", Pattern.MULTILINE);

        String lines = "some rubbish <td>value 1</td> some further \nrubbish <td>value \n2</td> and more again";
        Matcher matcher = pattern.matcher(lines);
        // Each call to find() resumes where the previous match ended.
        while (matcher.find())
        {
            System.out.println(" Value found at " + matcher.start(1) + " with value [" + matcher.group(1) + "]");
        }
    }
}
 
Sebastian Green
Ranch Hand
Posts: 136
Well, in my td tags there won't be another HTML tag. No new lines, just a-z and 0-9. Your code didn't do the trick. Here is my String:


The values I want are "7 r", "2 st" and "6 194 834 p".

Thanks for your time!
 
Ranch Hand
Posts: 135
This might be overkill, but you could use an XML parser and get the values from the 'TD' elements.
 
James Sabre
Ranch Hand
Posts: 781
Netbeans IDE Ubuntu Java
Change the pattern to

Pattern pattern = Pattern.compile("<td[^>]*>([^<]*)</td>", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

This still makes several strong assumptions about your requirements. One can only write an effective regular expression if the requirements are well specified; without a good specification one is only guessing.
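To make those assumptions concrete, here is a minimal sketch that applies the revised pattern to a made-up table row. The original String was not preserved in this thread, so the input below, the class name TdTextExtractor and the helper extract are all hypothetical. It shows that attributes and upper-case tags are now accepted, and that a cell containing a nested tag is still skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TdTextExtractor {
    // The revised pattern from the post above, flags unchanged.
    // [^>]* allows attributes inside the opening tag; [^<]* still forbids nested tags.
    private static final Pattern TD =
            Pattern.compile("<td[^>]*>([^<]*)</td>", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the text of every matching cell, in order.
    static List<String> extract(String html) {
        List<String> values = new ArrayList<>();
        Matcher matcher = TD.matcher(html);
        while (matcher.find()) {
            values.add(matcher.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical input, not the poster's actual String.
        String row = "<tr><TD class=\"a\">7 r</TD><td align=\"right\">2 st</td><td>6 194 834 p</td></tr>";
        System.out.println(extract(row));                    // prints [7 r, 2 st, 6 194 834 p]

        // A nested tag breaks the [^<]* assumption, so nothing is matched.
        System.out.println(extract("<td><b>bold</b></td>")); // prints []
    }
}
```

If your cells can contain markup, the second case is the reason to look at a parser instead of extending the expression further.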

You are reaching the point where you might do better to use a tolerant XML parser to generate a DOM document. A Google search could be effective.
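As a rough sketch of the DOM approach, the following uses the XML parser that ships with the JDK. Note the assumption: this parser is strict, not tolerant, so it only works if the markup is well-formed XML (every tag closed, attributes quoted, lower-case td). Real-world HTML would need a tolerant HTML parser in front of it. The class name TdDomExtractor, the helper extract and the sample table are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TdDomExtractor {
    // Hypothetical helper: parses well-formed markup and returns the text of each td element.
    static List<String> extract(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            // Element names are case-sensitive in XML, so this finds "td" but not "TD".
            NodeList cells = doc.getElementsByTagName("td");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < cells.getLength(); i++) {
                // getTextContent() also collects text inside nested elements such as <b>.
                values.add(cells.item(i).getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            // Malformed input ends up here; a tolerant parser would be needed for it.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, well-formed input.
        String table = "<table><tr><td class=\"a\">7 r</td><td><b>2 st</b></td><td>6 194 834 p</td></tr></table>";
        System.out.println(extract(table)); // prints [7 r, 2 st, 6 194 834 p]
    }
}
```

Unlike the regular expression, this version does not care about attributes or nested tags, at the cost of requiring input the parser will accept.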
 
James Sabre
Ranch Hand
Posts: 781
Netbeans IDE Ubuntu Java
Sorry Rovas, I had my blinkers on and I didn't spot your post!

It looks like you and I agree that it might be better to use an XML parser!
 