For text files, you can check that the file contains only characters from the character sets you want to allow. Often that means allowing only characters below 256, but Unicode (or whatever other encoding might be in use) complicates the matter a bit.
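As a rough sketch of that character check, something like the following could work. The class name `TextCheck`, both method names, and the exact policy (tab, line feed, carriage return, and non-control characters below 256) are assumptions made for illustration, so adjust them to the character sets you actually want to allow.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class TextCheck {

    // Returns true if every character is tab, LF, CR, or a non-control
    // character below 256. This policy is an assumption; widen or narrow it.
    static boolean hasOnlyAllowedChars(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean whitespace = c == '\t' || c == '\n' || c == '\r';
            boolean printable = c >= 0x20 && c < 0x100 && c != 0x7F;
            if (!whitespace && !printable) {
                return false;
            }
        }
        return true;
    }

    // Decodes raw bytes as ISO-8859-1, which maps each byte to the character
    // with the same value, then applies the check above.
    static boolean looksLikeText(byte[] data) {
        return hasOnlyAllowedChars(new String(data, StandardCharsets.ISO_8859_1));
    }
}
```

Note that a file stored in a multi-byte encoding such as UTF-8 or UTF-16 would need to be decoded with the right charset first, which is where the Unicode complication comes in.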
HTML files should have a DOCTYPE declaration at the beginning, although not all do.
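A minimal sketch of the DOCTYPE test might look like this. `HtmlCheck` and `startsWithDoctype` are hypothetical names, and skipping a leading byte order mark and whitespace is an assumption about how lenient you want to be.

```java
// Hypothetical helper, not part of any library.
final class HtmlCheck {

    // Returns true if the content begins with a DOCTYPE declaration,
    // ignoring case, a leading byte order mark, and surrounding whitespace.
    static boolean startsWithDoctype(String content) {
        String s = content;
        if (s.startsWith("\uFEFF")) {
            s = s.substring(1); // drop a leading byte order mark
        }
        String head = s.trim();
        String marker = "<!DOCTYPE";
        return head.regionMatches(true, 0, marker, 0, marker.length());
    }
}
```

Since not every HTML file carries a DOCTYPE, a `false` result here does not prove the file is not HTML; it only means this particular signal is missing.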
You can test for image files of a particular type with something like the ImageInfo class.
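If you would rather not pull in a library, the same idea can be approximated by looking at the first few bytes of the file. The sketch below is not the ImageInfo API; it is a standard-library stand-in that only knows the well-known PNG, JPEG, and GIF signatures, and the names `ImageSniffer` and `detect` are made up for illustration.

```java
// Hypothetical helper, not part of any library and not the ImageInfo API.
final class ImageSniffer {

    // Returns "png", "jpeg", or "gif" based on the leading bytes,
    // or null if none of the known signatures match.
    static String detect(byte[] header) {
        if (startsWith(header, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "png";
        }
        if (startsWith(header, 0xFF, 0xD8, 0xFF)) {
            return "jpeg";
        }
        if (startsWith(header, 'G', 'I', 'F', '8', '7', 'a')
                || startsWith(header, 'G', 'I', 'F', '8', '9', 'a')) {
            return "gif";
        }
        return null;
    }

    // Compares the first bytes of data, treated as unsigned, with prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A matching signature only says the file claims to be that format; a dedicated class that parses the header further can also catch truncated or malformed images.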