Once you have converted the webpage to an image (that part is quite abstract to me),
you should probably convert it to grayscale and then apply a custom filter that raises the contrast to a very high value.
The result would be a black-and-white image with the text ready for OCR.
I do not know of any current APIs that support OCR.
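
The grayscale and high-contrast step could look roughly like the sketch below. It is only an illustration under a few assumptions: the image is already available as rows of (r, g, b) tuples, the function names are made up for this example, and "very high contrast" is approximated by a simple fixed threshold of 128. In practice you would load the pixels with an imaging library and tune the threshold for your pages.

```python
def to_grayscale(pixels):
    """Convert rows of (r, g, b) tuples to rows of 0-255 brightness values."""
    # Weighted sum using the common ITU-R BT.601 luma coefficients.
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in pixels
    ]


def binarize(gray, threshold=128):
    """Push contrast to the extreme: each pixel becomes pure black or white."""
    # The threshold value is an assumption; adjust it for your images.
    return [[255 if value >= threshold else 0 for value in row] for row in gray]


def preprocess_for_ocr(pixels, threshold=128):
    """Grayscale first, then threshold, giving a black-and-white image."""
    return binarize(to_grayscale(pixels), threshold)


if __name__ == "__main__":
    # Tiny hypothetical 2x2 image: light background pixels next to dark "text".
    sample = [
        [(250, 250, 250), (20, 20, 20)],
        [(200, 180, 190), (60, 40, 50)],
    ]
    for row in preprocess_for_ocr(sample):
        print(row)
```

A fixed threshold is the simplest choice; pages with uneven backgrounds may need a threshold picked per image or per region.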