Issue with Web Harvest removing spaces after closing tags
How do I prevent Web Harvest from removing the space after closing tags when I convert HTML to XML? My configuration file is shown below:
I'm using Web Harvest to extract the paragraphs (<p></p>) from an HTML page, but it removes the space that follows closing tags such as </b> and </a>. As a result, when I strip the HTML tags from Web Harvest's output with JSoup, the text of a link runs straight into the word that follows it. The same happens after bold text.
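The symptom can be reproduced without Web Harvest at all. Below is a minimal sketch that uses only the JDK's built-in DOM parser, which stands in for JSoup here. It takes two made-up XML fragments. In the first, the whitespace after `</a>` and `</b>` is intact. In the second, that whitespace has been removed. The sketch then joins each fragment's text nodes, the same way a tag stripper does. The class name `TextExtractionDemo` and the method `extractText` are hypothetical. The point is that once the whitespace text node is missing from the XML, a step that only joins text nodes cannot know a space belonged there. So the space has to be preserved during the HTML-to-XML conversion itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class TextExtractionDemo {

    // Hypothetical helper: parses an XML fragment and returns the text of the
    // root element. getTextContent() concatenates all descendant text nodes
    // as-is, so markup disappears but no separator is added between them.
    static String extractText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // The whitespace text nodes after </a> and </b> are still present.
        String kept = "<p>Read <a href=\"#\">this link</a> and <b>this</b> too</p>";
        // The same markup after the whitespace following the closing tags was removed.
        String dropped = "<p>Read <a href=\"#\">this link</a>and <b>this</b>too</p>";

        System.out.println(extractText(kept));    // Read this link and this too
        System.out.println(extractText(dropped)); // Read this linkand thistoo
    }
}
```

The second line of output shows the same merged words described above. Both strings go through identical extraction logic, so the difference comes entirely from the input XML.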