Regular Expression any char between html-tags

 
Ranch Hand
Posts: 136
I want to fetch any data between a <td> start tag and a </td> end tag. I can only find examples for getting the tags, not the data...
 
Ranch Hand
Posts: 781
Java Netbeans IDE Ubuntu
Hi,

You don't say whether your <td> tags contain other tags, whether a <td> element can span a new line, or whether more than one can appear on a single line, so take the following as a starting point.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test20040913
{
    public static void main(String[] args)
    {
        // Capture the text between <td> and </td>. [^<]* stops at the next tag
        // but does match across line breaks.
        Pattern pattern = Pattern.compile("<td>([^<]*)</td>", Pattern.MULTILINE);

        String lines = "some rubbish <td>value 1</td> some further \nrubbish <td>value \n2</td> and more again";
        Matcher matcher = pattern.matcher(lines);
        // find() resumes after the previous match, so no explicit start index is needed.
        while (matcher.find())
        {
            System.out.println(" Value found at " + matcher.start(1) + " with value [" + matcher.group(1) + "]");
        }
    }
}
 
Sebastian Green
Ranch Hand
Posts: 136
Well, in my td tag there won't be another HTML tag. No new lines, just a-z and 0-9. Your code didn't do the trick. Here is my String:


The values I want are "7 r", "2 st" and "6 194 834 p".

Thanks for your time!
 
Ranch Hand
Posts: 135
This might be overkill, but you could use an XML parser and get the values from the 'TD' elements.
 
James Sabre
Ranch Hand
Posts: 781
Java Netbeans IDE Ubuntu
Change the pattern to

Pattern pattern = Pattern.compile("<td[^>]*>([^<]*)</td>", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

This still makes several strong assumptions about your requirements. One can only write an effective regular expression when the requirements are well specified; without a good specification one is only guessing.
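For what it's worth, here is a rough sketch of how the revised pattern behaves. The input string is made up (the original poster's string does not appear in the thread), and the class and method names are only for illustration. It shows that the pattern accepts attributes and upper-case tags, and that it silently skips any cell containing a nested tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TdCellRegexDemo {
    // Same pattern and flags as posted above. MULTILINE has no effect here
    // because the pattern contains no ^ or $ anchors.
    private static final Pattern TD_CELL = Pattern.compile(
            "<td[^>]*>([^<]*)</td>", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

    /** Returns the trimmed text of every td cell that contains no nested tag. */
    static List<String> extractCells(String html) {
        List<String> cells = new ArrayList<>();
        Matcher matcher = TD_CELL.matcher(html);
        while (matcher.find()) {
            cells.add(matcher.group(1).trim());
        }
        return cells;
    }

    public static void main(String[] args) {
        // Hypothetical input, not the poster's real data.
        String html = "<tr><TD class=\"a\">7 r</TD><td align=\"right\">2 st</td>"
                + "<td>6 194 834 p</td><td><b>skipped</b></td></tr>";
        System.out.println(extractCells(html));
    }
}
```

The last cell is dropped because `[^<]*` cannot cross the `<b>` tag, which is one of the assumptions mentioned above.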

You are reaching the point where you might do better to use a tolerant XML parser to generate a DOM document. A Google search could be effective.
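As a minimal sketch of the DOM route, assuming the markup is already well-formed XML: this uses the JDK's built-in DocumentBuilder, which is strict rather than tolerant, so real-world HTML would first need a tolerant parser or a clean-up step. The class name, method name and sample input are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TdCellDomDemo {
    /** Returns the trimmed text content of every td element in well-formed markup. */
    static List<String> extractCells(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            // Element names are case-sensitive in XML, so this finds "td" but not "TD".
            NodeList cells = doc.getElementsByTagName("td");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < cells.getLength(); i++) {
                // getTextContent() also collects text inside nested tags such as <b>.
                values.add(cells.item(i).getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            // The strict parser rejects anything that is not well-formed XML.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input, not the poster's real data.
        String xhtml = "<table><tr><td>7 r</td><td>2 st</td>"
                + "<td><b>6 194 834 p</b></td></tr></table>";
        System.out.println(extractCells(xhtml));
    }
}
```

Unlike the regular expression, this picks up the text of a cell even when it is wrapped in another tag.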
 
James Sabre
Ranch Hand
Posts: 781
Java Netbeans IDE Ubuntu
Sorry Rovas, I had my blinkers on and I didn't spot your post!

It looks like you and I agree that it is possibly better to use an XML parser!
 