
Stripping out HTML from String

 
Greenhorn
Posts: 26
Hello,

I'm writing a web app in Java. I have code that generates HTML from a template and a database. The function returns this HTML as a String, and I need to strip the <head> tag out of it.

My code looks like this:



As you can see, I need to remove <head>...</head> and the <body ...> tag from my code, and keep everything inside the body.
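Since the goal is to keep only what is inside the body, one alternative to deleting the unwanted parts is to capture the body content directly with a group. This is a minimal sketch, assuming the page has a single <body> element and that no attribute value contains a '>' character; BodyExtractor and bodyContent are made-up names for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BodyExtractor {
    // <body\b[^>]*>  opening body tag with any attributes (up to the first '>')
    // (.*)           group 1: everything inside the body
    // </body\s*>     closing tag; the greedy .* means the LAST </body> is used
    // CASE_INSENSITIVE also matches <BODY>; DOTALL lets '.' cross newlines.
    private static final Pattern BODY = Pattern.compile(
            "<body\\b[^>]*>(.*)</body\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the inner content of the body, or the input unchanged if no body is found. */
    static String bodyContent(String html) {
        Matcher m = BODY.matcher(html);
        return m.find() ? m.group(1) : html;
    }

    public static void main(String[] args) {
        String html = "<html>\n<head><title>T</title></head>\n"
                + "<body bgcolor=\"white\" onload=\"init()\">\n<p>Hello</p>\n</body>\n</html>";
        System.out.println(bodyContent(html));
    }
}
```

Extracting with find() and group(1) avoids having to write a pattern for every piece you want to throw away.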
 
Ranch Hand
Posts: 143
Do you know how to search a file? Basically, search through the file looking for the content you want to remove (or the first line you want to keep). You'll need to create a temporary file, copy the contents you want to keep from the original into it, and when you're done, write the new content back to the original file (or to a new file).
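The line-by-line, temporary-file approach described above might look roughly like this sketch using java.nio.file. It assumes the <body ...> and </body> tags each sit on a line of their own and that the file is UTF-8; FileBodyFilter and keepBodyLines are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FileBodyFilter {

    /**
     * Keeps only the lines strictly between the first line containing "<body"
     * and the next line containing "</body>". Assumes (hypothetically) that
     * each of those tags sits on its own line.
     */
    static void keepBodyLines(Path file) throws IOException {
        List<String> kept = new ArrayList<>();
        boolean inBody = false;
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String lower = line.toLowerCase(Locale.ROOT);
            if (!inBody) {
                inBody = lower.contains("<body");   // start copying after this line
            } else if (lower.contains("</body>")) {
                break;                              // stop at the closing tag
            } else {
                kept.add(line);
            }
        }
        // Write the wanted lines to a temporary file in the same directory,
        // then move it over the original.
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "body", ".tmp");
        Files.write(tmp, kept, StandardCharsets.UTF_8);
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path page = Files.createTempFile("page", ".html");
        Files.write(page, List.of("<html>", "<head><title>T</title></head>",
                "<body bgcolor=\"white\">", "<p>Hello</p>", "</body>", "</html>"),
                StandardCharsets.UTF_8);
        keepBodyLines(page);
        System.out.println(Files.readAllLines(page, StandardCharsets.UTF_8)); // [<p>Hello</p>]
        Files.delete(page);
    }
}
```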
 
Maksim Ustinov
Greenhorn
Posts: 26
That's not a problem. I already have the file, and its content is in a String. Now I just need a regular expression to remove the tags with replaceAll(), but I don't know how to write that regex.
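For reference, the String method that takes a regular expression is replaceAll(String regex, String replacement); String has no removeAll(). Because Java strings are immutable, the result has to be assigned back. A small sketch (ReplaceAllDemo and removeHtmlTags are illustrative names only):

```java
public class ReplaceAllDemo {

    // "</?html>" is a regular expression: '<', an optional '/', then "html>",
    // so it matches both <html> and </html>.
    static String removeHtmlTags(String html) {
        return html.replaceAll("</?html>", "");
    }

    public static void main(String[] args) {
        String page = "<html><body><p>Hi</p></body></html>";

        // replaceAll returns a NEW string and leaves the original untouched,
        // so calling it without using the result changes nothing.
        page.replaceAll("</?html>", "");
        System.out.println(page);   // <html><body><p>Hi</p></body></html>

        page = removeHtmlTags(page);
        System.out.println(page);   // <body><p>Hi</p></body>
    }
}
```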
 
author & internet detective
Posts: 41878
Maksim,
You are correct that a regular expression is the best way to approach this. Whenever I use regular expressions, I start small and make sure the expression does what I expect at each step.

For example, can you write a regular expression to:
1) Remove <head>?
2) Remove <head>...</head>?
3) Remove <body withABunchOfAttributes>?
4) Remove </body>?
5) Combine steps 2-4? (Hint: you need grouping parentheses for this one if you want to do it in one regular expression.)
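The exercises above could be worked through roughly as follows. This is one possible set of answers, not the only correct one; it assumes lowercase tags, and the (?s) prefix is the inline form of the DOTALL flag, so it works with String.replaceAll, which takes no flags argument.

```java
public class StepByStep {
    // 1) the literal opening tag only
    static final String STEP1 = "<head>";
    // 2) the whole head element; .*? is reluctant, so it stops at the first </head>
    static final String STEP2 = "(?s)<head>.*?</head>";
    // 3) an opening body tag with any attributes (anything up to the first '>')
    static final String STEP3 = "<body[^>]*>";
    // 4) the closing body tag
    static final String STEP4 = "</body>";
    // 5) steps 2-4 combined: the parentheses group the three alternatives
    static final String COMBINED = "(?s)(<head>.*?</head>|<body[^>]*>|</body>)";

    public static void main(String[] args) {
        String html = "<head>\n<title>T</title>\n</head>\n"
                + "<body class=\"main\" onload=\"init()\">\n<p>Hi</p>\n</body>";
        String[] steps = {STEP1, STEP2, STEP3, STEP4, COMBINED};
        for (int i = 0; i < steps.length; i++) {
            System.out.println("Step " + (i + 1) + ":");
            System.out.println(html.replaceAll(steps[i], ""));
        }
    }
}
```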

This sounds like a strange requirement. Do you really want to remove all the HTML rather than just the head and body tags? In particular, do you want the <html> and <table> tags present?

Also, take a look at the Pattern.DOTALL flag, since you are matching across multiple lines; without it, the dot does not match newline characters. I know about this flag and use it frequently, yet I still forget it on my first attempt most of the time.
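A quick way to see what the DOTALL flag changes (a sketch; the sample HTML and the helper name findsHead are made up):

```java
import java.util.regex.Pattern;

public class DotAllDemo {

    // Returns true if "<head>" ... "</head>" is found anywhere in html
    // when the pattern is compiled with the given flags.
    static boolean findsHead(String html, int flags) {
        return Pattern.compile("<head>.*</head>", flags).matcher(html).find();
    }

    public static void main(String[] args) {
        String html = "<head>\n<title>T</title>\n</head>";

        // Without DOTALL, '.' stops at line terminators, so nothing is found.
        System.out.println(findsHead(html, 0));               // false
        // With DOTALL, '.' also matches newlines.
        System.out.println(findsHead(html, Pattern.DOTALL));  // true
        // The embedded flag (?s) does the same inside the regex string itself.
        System.out.println(html.replaceAll("(?s)<head>.*</head>", "").isEmpty()); // true
    }
}
```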
 
Maksim Ustinov
Greenhorn
Posts: 26
Thanks, Jeanne, for your response.
Yes, I do need to delete the <html> and </html> tags, but that's not a problem; the problem is the <HEAD> tags.

Here is what I came up with to take out those tags, but I'm not sure it is correct.



Please let me know how it can be optimized so that it allows any number of spaces and newlines and ignores everything in between.
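One way to allow arbitrary whitespace, newlines, and uppercase tags such as <HEAD> is to combine \s* with the CASE_INSENSITIVE and DOTALL flags. This is a sketch under assumptions: the page contains one head element, and no comment or script inside it contains the text </head>. HeadStripper and stripHead are hypothetical names.

```java
import java.util.regex.Pattern;

public class HeadStripper {
    // <\s*head\b[^>]*>   opening tag: optional whitespace after '<', the word
    //                    "head" (\b keeps <header> from matching), any attributes
    // .*?                the content, reluctant so it ends at the first closing tag
    // <\s*/\s*head\s*>   closing tag, tolerating whitespace around the '/'
    // \s*                any trailing spaces or newlines after the closing tag
    private static final Pattern HEAD = Pattern.compile(
            "<\\s*head\\b[^>]*>.*?<\\s*/\\s*head\\s*>\\s*",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String stripHead(String html) {
        return HEAD.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<HTML>\n<HEAD >\n   <TITLE>Report</TITLE>\n\n</ HEAD>\n\n<BODY>hi</BODY>\n</HTML>";
        System.out.println(stripHead(html));
    }
}
```

Compiling the Pattern once and reusing it avoids recompiling the regex on every call, which String.replaceAll does internally. Regular expressions work well for predictable, generated markup like this; for arbitrary HTML, an HTML parser is usually the more robust choice.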
 
Maksim Ustinov
Greenhorn
Posts: 26
I made a few modifications to my RegEx, and here is what I've got:



One small question: how do I modify the <head> part?
 
Jeanne Boyarsky
author & internet detective
Posts: 41878
Maksim,
Are you trying to delete everything between the head tags? (I think that's what you are trying to accomplish, but your regular expression is far too complicated for that, so I second-guessed my understanding.)

This matches everything between the head tags, whatever it contains:
 