asit dhal wrote: I need to remove all tags (HTML tags and JavaScript code) from a web page.
Can somebody tell me how to do this?
I suggest you look at a SAX or DOM parser.
Java has implementations of both. SAX is generally the easier of the two to use, and I'm pretty sure it will do what you want. However, you may need to convert the HTML to XHTML first, because an XML parser will reject HTML that isn't well-formed (unclosed <p> or <br> tags, unquoted attributes, and so on). For that conversion there is a utility called JTidy, which I believe has its own SAX-like parser built in; but I've never used it, so I have no idea how easy it is.
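Here is a minimal sketch of the SAX route. It assumes the page has already been converted to well-formed XHTML with no DOCTYPE and no HTML-only entities such as &nbsp;, since a plain XML parser would reject those entities or try to fetch the DTD. It uses only the JDK's built-in javax.xml.parsers / org.xml.sax API. The class and method names (HtmlTextExtractor, extractText) are hypothetical. It also assumes that "removing JavaScript" means dropping the contents of <script> elements, and it drops <style> contents too.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: extracts the visible text from a well-formed XHTML string.
public class HtmlTextExtractor {

    // SAX callback handler: keeps character data, except inside <script>/<style>.
    private static class TextHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private int skipDepth = 0; // > 0 while we are inside a script or style element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if (isSkipped(qName)) skipDepth++;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (isSkipped(qName)) skipDepth--;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Tags never reach this method; only the text between them does.
            if (skipDepth == 0) text.append(ch, start, length);
        }

        private static boolean isSkipped(String name) {
            return name.equalsIgnoreCase("script") || name.equalsIgnoreCase("style");
        }
    }

    public static String extractText(String xhtml) throws Exception {
        // Factory is not namespace-aware, so qName holds the plain element name.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextHandler handler = new TextHandler();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><style>p { color: red; }</style>"
                + "<script>var x = 1;</script></head>"
                + "<body><p>Hello <b>world</b>!</p></body></html>";
        System.out.println(extractText(page)); // prints: Hello world!
    }
}
```

Because the parser, not a pattern, decides where each element starts and ends, the handler always knows whether it is inside a <script>. That structural awareness is what a regex lacks.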
Tip: DON'T think about a regex-based solution if the job requires any "awareness" of structure, such as knowing whether a piece of text is inside a <script> element. Regexes are very powerful, but they are not well suited to hierarchical logic.
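To illustrate the problem, here is a hypothetical naive approach that deletes anything shaped like <...> using String.replaceAll. On a script containing comparison operators, it both keeps the JavaScript and wrongly eats part of it:

```java
// Hypothetical naive approach, shown only to illustrate why regexes fall short here.
public class NaiveRegexStrip {

    public static String stripTags(String html) {
        // Removes every "<...>" run; it has no idea which element the text belongs to.
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String page = "<p>Hi</p><script>if (a < b && b > c) alert('x');</script>";
        System.out.println(stripTags(page)); // prints: Hiif (a  c) alert('x');
    }
}
```

The script body survives because the pattern only sees angle brackets, not elements. The fragment "< b && b >" is deleted because it happens to look like a tag.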
Winston