I am trying to develop an Android application that translates the gestures deaf people use to communicate into text displayed on the screen, or into speech, using the mobile camera. The camera will capture the gestures, then the application will handle the process of converting these movements into the proper text or voice.

Where should I start? What Java classes or interfaces should I implement? As far as I know, image processing software has to be developed. Can I rely on existing Java classes and interfaces to implement my application? I need ideas. Thank you.
That is a really neat idea. There would need to be a lot of processing involved, so the first thing to do would be to search for Android-compatible image processing libraries. I wouldn't be surprised if Google has one available (one they use in their Goggles and Translate applications), but I am not sure. And don't limit yourself to just Java. Java might be the easiest to use, but C++ can be used with the NDK, and there are several good C++ image processing libraries, some of which are Android-compatible (OpenCV is, I know; there are others).
What I would do, were I you, is take a few of the image processing libraries you find and write a normal Java prototype application. Take some 'ideal' images and see if you can process them. There are two ways you could go about it:

1) Object detection with a set of rules (sort of like face detection, but with more rules and shapes to be found), or

2) Feature/2D-shape detection and matching against 'knowns'.

I bring up these two methods because I know OpenCV can do them. See how well you get along on the computer, how much time it takes, and how small you can make the detection/matching rules and references while still staying reliable. I do a lot of image processing on PCs, and I find that making the images as simple as possible (black-and-white binary images rock!) makes comparisons much faster and more resilient to minor shape variations (from noise, angle, or shading) than gray or color images. So your strategy might be to simplify the images until you get a binary image of the hands and arms, if possible.

This also makes time-resolved comparison easier. You can generate multiple binary images with values of 0 or 1, add 100 to the first image, then subtract the second image from the first. This highlights the motion from the first location to the second in a very obvious manner: background and areas occupied in both frames have a value of 100, areas occupied in the first frame but not the second have a value of 101 and represent the areas moved away from, and areas occupied in the second frame but not the first have a value of 99 and represent the areas moved into. Of course you don't need to add 100 (you could add any value), but 100 lets you process multiple frames like this into a single time projection.
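That frame-differencing trick can be sketched in plain Java. This is just a minimal illustration of the arithmetic on raw `int` arrays; the class and method names are made up for the example, and a real app would work on pixel buffers from the camera:

```java
import java.util.Arrays;

public class MotionDiff {
    // Combine two binary frames (values 0 or 1) into one motion map:
    // add 100 to the first frame, then subtract the second.
    //   100 = background, or occupied in both frames
    //   101 = occupied in frame 1 only (area moved away from)
    //    99 = occupied in frame 2 only (area moved into)
    public static int[][] diff(int[][] first, int[][] second) {
        int rows = first.length, cols = first[0].length;
        int[][] out = new int[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                out[r][c] = first[r][c] + 100 - second[r][c];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] frame1 = {
            {0, 1, 1, 0},
            {0, 1, 1, 0}
        };
        // the same shape, shifted one pixel to the right
        int[][] frame2 = {
            {0, 0, 1, 1},
            {0, 0, 1, 1}
        };
        for (int[] row : diff(frame1, frame2)) {
            System.out.println(Arrays.toString(row));
        }
        // prints:
        // [100, 101, 100, 99]
        // [100, 101, 100, 99]
    }
}
```

The column the shape left shows 101 and the column it entered shows 99, so the direction of motion falls straight out of the numbers with no extra bookkeeping.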
Once you have the binary images, comparison and analysis become relatively easy; it is getting to the binary results that is tough.
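The simplest way to get a binary image is a fixed-cutoff threshold on a grayscale frame, sketched below on plain `int` arrays (the names and the cutoff value are assumptions for illustration). Segmenting hands against a real, cluttered background would need something smarter, such as skin-color segmentation or background subtraction, which is exactly where a library like OpenCV earns its keep:

```java
import java.util.Arrays;

public class Binarize {
    // Turn a grayscale frame (pixel values 0-255) into a binary image:
    // 1 where the pixel meets or exceeds the cutoff, 0 elsewhere.
    public static int[][] threshold(int[][] gray, int cutoff) {
        int[][] out = new int[gray.length][gray[0].length];
        for (int r = 0; r < gray.length; r++) {
            for (int c = 0; c < gray[r].length; c++) {
                out[r][c] = (gray[r][c] >= cutoff) ? 1 : 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] gray = {
            {200,  30, 180},
            { 90, 250,  10}
        };
        for (int[] row : threshold(gray, 128)) {
            System.out.println(Arrays.toString(row));
        }
        // prints:
        // [1, 0, 1]
        // [0, 1, 0]
    }
}
```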
Once you have a working routine, I would port the result over to Android. The concern is that the amount of processing might be too much for the phone, but I am not sure: Google Goggles and Translate behave reasonably well, as do various barcode/QR-code scanners, and almost every camera app seems to have face detection these days. If you find the phone can't handle the processing, you could always develop it as a web service with your phone app as a front end. It wouldn't be 'real time', but it might be fast enough if you use low-enough-resolution images (no need for 8 MP!).