I think you underestimate how big a task this is. To name one well-known provider of address hygiene, standardization and related services: Pitney Bowes reported revenue of about 3.4 billion USD back in 2016, and services like the one you wish to implement are part of what it sells.
Now, I'm not a data scientist, but there are techniques such as fuzzy matching that handle the cases you mentioned, where characters are misspelled, transposed, removed, doubled, you name it. That is (very) complex in its own right, though. And for consolidation tasks, where you want to consolidate the data by address, you would first need to clean and standardize the addresses in some way, which brings us back to where we started: it's not simple.
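To give a rough idea of what fuzzy matching looks like in practice, here is a minimal sketch using Python's standard `difflib` module. The `similarity` helper and the sample addresses are made up for illustration, and the score is only one of many possible similarity measures, not a recommendation for production use.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 (1.0 means identical).

    Case is ignored; everything else (spaces, punctuation) still counts.
    """
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


# Hypothetical sample data: a typo variant should score higher than
# an unrelated address.
typo_score = similarity("12 Main Street", "12 Main Stret")
other_score = similarity("12 Main Street", "7 Oak Avenue")
print(typo_score > other_score)
```

You would still have to pick a cut-off score yourself, and any cut-off will produce some false matches and miss some real ones.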
So if this has to be an in-house implementation, I think you are left with a few tricks, along these lines:
1. Take the address, remove all spaces (and maybe special characters), sort the characters in lexicographic order, and calculate the Levenshtein distance between the two addresses. That's the first blunt solution that comes to my mind. It is not immune to false positives, though: two addresses in the same block that differ only in, e.g., the flat number would have a distance of 1, so what do you do then? You would get the same distance if one copy of an otherwise identical address were missing some insignificant character, so the decision is yours. You could add more complexity to this logic, for example by checking the distance only on the non-numeric characters. Many options, many things to consider.
2. Just removing special characters from the addresses and upper-casing them could also cover some percentage of cases, but would leave out many others, e.g. missing or misplaced characters.
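The two tricks above could be sketched roughly like this. It is only an illustration under my own assumptions: the function names (`normalize`, `sorted_key`, `letters_only`, `levenshtein`) are invented for this post, and treating everything except A-Z and 0-9 as a "special character" is a simplification that would not suit non-English addresses.

```python
import re


def normalize(address: str) -> str:
    """Trick 2: upper-case and drop everything except A-Z and 0-9."""
    return re.sub(r"[^A-Z0-9]", "", address.upper())


def sorted_key(address: str) -> str:
    """Trick 1: normalize, then sort the characters lexicographically."""
    return "".join(sorted(normalize(address)))


def letters_only(address: str) -> str:
    """Variant: keep only the letters, so house/flat numbers are ignored."""
    return re.sub(r"[^A-Z]", "", address.upper())


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def address_distance(a: str, b: str) -> int:
    """Distance between the sorted, normalized forms of two addresses."""
    return levenshtein(sorted_key(a), sorted_key(b))
```

With these definitions, "Flat 1, 10 Main Street" and "Flat 2, 10 Main Street" come out at distance 1, and so does "Flat 1, 10 Main Stret" with its dropped letter, which is exactly the ambiguity described in point 1. Comparing `letters_only` forms instead gives 0 for the two different flats and 1 for the typo, but then genuinely different house numbers on the same street also collapse to 0.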
Venkattaramana Santhababu wrote: But if there is some basic library that I can use to solve 50% of this problem
I'm not aware of one. Maybe there is, google it. Google has published a phone number library (libphonenumber) that deals with phone normalization and standardization, but it doesn't deal with missing or misplaced digits within a phone number (which is essentially what you are looking for), as that would be guesswork.