This week's book giveaway is in the Java 9 forum.
We're giving away four copies of Java 9 Modularity: Patterns and Practices for Developing Maintainable Applications and have Sander Mak & Paul Bakker on-line!
See this thread for details.
Win a copy of Java 9 Modularity: Patterns and Practices for Developing Maintainable Applications this week in the Java 9 forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

Issue with Web Harvest removing spaces after closing tags  RSS feed

 
Ajay Dhar
Ranch Hand
Posts: 30
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
How do I prevent Web Harvest from removing the space after closing tags when I convert html to xml? My configuration file is shown below:



I'm using Web Harvest to extract the paragraphs (<p></p>) from an HTML page. But there's an issue. Web Harvest is removing the space after the closing tags like </b> and </a>. When I remove the HTML tags using JSoup from the results of Web Harvest there is no space between the text of a link and the following word. The same happens for text that was in bold.


Help is greatly appreciated.
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!