Thanks Thushan. Handwriting recognition and image-to-text conversion are a few use cases I could cite that are similar to image classification/segmentation and that combine text and images.
My mental picture of a CNN is that it recognizes an object by zooming in and out, and then classifies or segments it based on the recognized features.
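To make that picture concrete, here is a minimal sketch in plain Python, assuming a toy 5x5 image and a hand-picked 2x2 edge filter (both are my own illustrative choices, not anything from a real model). The convolution looks at small local patches ("zooming in"), and max pooling keeps only the strongest response per region ("zooming out").

```python
def conv2d(image, kernel):
    """Slide the kernel over the image with no padding and stride 1.

    Like most deep learning libraries, this is technically cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in cols
        ]
        for i in rows
    ]


# Toy image (assumption): dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked filter (assumption): responds to a dark-to-bright vertical edge.
edge_kernel = [[-1, 1],
               [-1, 1]]

features = conv2d(image, edge_kernel)  # 4x4 map, strong only at the edge
pooled = max_pool(features)            # 2x2 summary of where the edge is
```

In a real CNN the filters are learned rather than hand-written, and many such layers are stacked, but the local-then-summarize pattern is the same.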
For RNNs, it makes great sense to take an LSTM model and think of how we read text and analyze the context of a sentence or a paragraph. We can also go forward and backward over sentences, as in bidirectional networks. And of course we have seq2seq, many-to-one, one-to-many, and many-to-many models.
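As a rough sketch of how those variants differ, assuming a scalar vanilla RNN with made-up fixed weights (a stand-in for an LSTM, which has extra gates), the same recurrent loop gives many-to-many, many-to-one, or bidirectional output depending only on which hidden states you keep. The function names and weights below are hypothetical.

```python
import math


def rnn_step(x, h, w_x=0.5, w_h=0.8):
    """One step of a scalar vanilla RNN (toy weights, not an LSTM)."""
    return math.tanh(w_x * x + w_h * h)


def run_rnn(xs):
    """Read the sequence left to right; return the hidden state at every step."""
    h = 0.0
    states = []
    for x in xs:
        h = rnn_step(x, h)
        states.append(h)
    return states


def many_to_many(xs):
    """One output per input step (e.g. tagging every word)."""
    return run_rnn(xs)


def many_to_one(xs):
    """Only the final state (e.g. classifying a whole sentence)."""
    return run_rnn(xs)[-1]


def bidirectional(xs):
    """Pair each step's forward state with its backward state."""
    forward = run_rnn(xs)
    # Read right to left, then flip so index t lines up with forward[t].
    backward = run_rnn(xs[::-1])[::-1]
    return list(zip(forward, backward))


sequence = [1.0, 2.0, 3.0]
per_step = many_to_many(sequence)   # three hidden states
summary = many_to_one(sequence)     # a single hidden state
both_ways = bidirectional(sequence) # three (forward, backward) pairs
```

One-to-many and seq2seq follow the same idea: one-to-many feeds a single input and keeps unrolling the loop, while seq2seq uses the many-to-one summary of an encoder as the starting state of a second, decoding RNN.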
Beyond these general examples, which map onto our own vision, reading, and listening skills, is it possible to apply these models to strategy games, like what OpenAI did? How complex would such a system be, and what compute resources would it generally require in production?