You might ask: “How on Earth is computer vision related to RIA?” My answer: “just as Microsoft Surface is.”
Computer vision is the technique used to make machines understand the physical environment through images. It is of course used a lot in robotics, but thanks to its affordability it is becoming more and more common as an “input device” for computers, allowing a more natural interaction with software (without mouse or keyboard). The recent boom of computer vision is in fact due to multi-touch interfaces, like the ones built by Jeff Han and later by Microsoft (Surface and Touch Wall), which have many users dreaming of one day browsing their files like Tom Cruise in Minority Report.
With Surface, Microsoft is trying to create a solution where the computer tracks not only fingers but also objects. This is probably done through fiducials, presumably just stuck onto the objects as stickers (something easily achievable with reacTIVision).
A big advantage of using cameras for input and projectors for output is that the interactive surface scales very well: its size is basically limited only by the projection/camera distance and angle.
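As a rough illustration of why distance and angle are the limits, assume a simple model where the projector (or camera) points straight at the surface with a fixed horizontal field of view θ. The width W it covers at distance d is then:

$$
W = 2\,d\,\tan\!\left(\frac{\theta}{2}\right)
$$

For example, with a hypothetical θ = 60° at d = 2 m, W = 2 · 2 · tan 30° ≈ 2.31 m; doubling d doubles W. In practice brightness and camera resolution per centimetre drop as W grows, so the setup still needs tuning.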
The images retrieved by the camera have to be analyzed according to the information we want to extract from them. External factors like lighting or background can make the process very hard, so a good setup, when possible, is crucial. The analysis relies on general image-processing algorithms, from blur or median filters to dilation and erosion, which can be very useful, for instance, when looking for connected components (here are some fundamentals of image processing).
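To make the erosion/dilation/connected-components idea concrete, here is a minimal pure-Python sketch (no OpenCV), assuming a tiny grayscale frame stored as nested lists, a fixed threshold of 128, a 3x3 square structuring element and 4-connectivity. All function names are hypothetical. An "opening" (erosion followed by dilation) removes an isolated noisy pixel, then the remaining bright region is labelled as a blob and its centroid is computed, which is roughly what a finger tracker needs.

```python
from collections import deque

def threshold(gray, level):
    """Turn a grayscale image (values 0-255) into a binary one (1 = foreground)."""
    return [[1 if v >= level else 0 for v in row] for row in gray]

def _window(img, y, x):
    """Values of the 3x3 window around (y, x); pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    return [img[j][i] if 0 <= j < h and 0 <= i < w else 0
            for j in range(y - 1, y + 2) for i in range(x - 1, x + 2)]

def erode(img):
    """A pixel stays 1 only if its whole 3x3 window is 1 (removes small specks)."""
    return [[1 if all(_window(img, y, x)) else 0 for x in range(len(img[0]))]
            for y in range(len(img))]

def dilate(img):
    """A pixel becomes 1 if any pixel in its 3x3 window is 1 (grows regions back)."""
    return [[1 if any(_window(img, y, x)) else 0 for x in range(len(img[0]))]
            for y in range(len(img))]

def connected_components(img):
    """Label 4-connected foreground regions with a BFS; return one pixel list per blob."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                blobs.append(pixels)
    return blobs

def centroid(pixels):
    """Average (x, y) position of a blob's pixels."""
    ys, xs = zip(*pixels)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Usage: a bright 3x3 "fingertip" plus one noisy pixel at (row 3, column 6).
gray = [
    [0,   0,   0,   0, 0, 0,   0, 0],
    [0, 200, 210, 220, 0, 0,   0, 0],
    [0, 205, 230, 215, 0, 0,   0, 0],
    [0, 198, 225, 201, 0, 0, 250, 0],
    [0,   0,   0,   0, 0, 0,   0, 0],
]
binary = threshold(gray, 128)
raw_blobs = connected_components(binary)             # fingertip + noise
opened = dilate(erode(binary))                       # opening removes the speck
blobs = connected_components(opened)
print(len(raw_blobs), len(blobs), centroid(blobs[0]))
```

The pure-Python loops are only for readability; a real tracker would use an optimized library for the same operations.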
Luckily, there are already libraries that hide many complex algorithms from you, but it is good to know how they work and how best to prepare the images for them.
In C++, THE library is OpenCV; in the Java world, JMyron is quite popular.
With higher-level front ends like Flash, the most common approach is to use a proxy application that analyzes the images and sends the results over a socket. Touchlib is one example.
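A minimal sketch of the proxy's output side, assuming the front end is a Flash client reading with XMLSocket, which expects each XML message to be terminated by a zero byte. The `<blobs>`/`<blob>` schema and the function names are hypothetical; this is not the format Touchlib itself uses.

```python
import socket

def encode_blobs(blobs):
    """Serialize (id, x, y) tuples as one XML message ending with a zero byte,
    the framing XMLSocket expects. The element names are a made-up schema."""
    items = "".join(f'<blob id="{i}" x="{x:.1f}" y="{y:.1f}"/>' for i, x, y in blobs)
    return f"<blobs>{items}</blobs>".encode("utf-8") + b"\0"

def send_frame(sock, blobs):
    """Push one frame's worth of tracking results to the connected front end."""
    sock.sendall(encode_blobs(blobs))

def read_message(sock):
    """Read bytes up to the zero terminator, as the client side would."""
    buf = bytearray()
    while True:
        chunk = sock.recv(1)
        if not chunk or chunk == b"\0":
            return buf.decode("utf-8")
        buf += chunk

# Usage: a connected socket pair stands in for the proxy/Flash TCP connection.
proxy_end, client_end = socket.socketpair()
send_frame(proxy_end, [(1, 120.0, 80.5)])
msg = read_message(client_end)
print(msg)  # <blobs><blob id="1" x="120.0" y="80.5"/></blobs>
proxy_end.close()
client_end.close()
```

In a real setup the proxy would listen on a TCP port and the Flash movie would connect to it; keeping the analysis in a native process lets it use fast libraries while Flash only handles the presentation.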
That said, since Flash can drive a webcam and lets you perform low-level image processing, it can already be used on its own to build a complete solution, from input to output. I'm sure BitmapBlobDetection, after a couple of bitmap filters, can help you build a multi-touch system in no time and, most important, with ActionScript only!
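The "couple of bitmap filters" step can be sketched as background subtraction: compare the live frame with a reference frame of the empty surface, keep only pixels that changed enough, and take the bounding box of what is left. The sketch below is plain Python on nested lists, assuming grayscale frames and a hypothetical threshold of 40; in Flash the same steps map roughly onto `BitmapData.draw()` with `BlendMode.DIFFERENCE`, `BitmapData.threshold()` and `BitmapData.getColorBoundsRect()`. A single bounding box only works for one touch; several fingers would need per-region labelling (connected components).

```python
def difference(frame, background):
    """Absolute per-pixel difference between the live frame and the empty-surface frame."""
    return [[abs(a - b) for a, b in zip(r1, r2)] for r1, r2 in zip(frame, background)]

def bounds_rect(mask):
    """Smallest (x, y, width, height) box containing every 1 pixel, or None if empty."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)

def detect_touch(frame, background, level=40):
    """Difference, threshold, then bounding box: a crude single-touch detector."""
    diff = difference(frame, background)
    mask = [[1 if v >= level else 0 for v in row] for row in diff]
    return bounds_rect(mask)

# Usage: a uniform background, a bright touch and some small lighting flicker.
background = [[10] * 6 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][2] = frame[1][3] = frame[2][2] = 200   # the touch
frame[3][5] = 25                                # flicker below the threshold
print(detect_touch(frame, background))          # (2, 1, 2, 2)
```

Capturing the background once with nothing on the surface is the simplest choice; it breaks if room lighting changes, which is why a controlled setup matters.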
For further reading, I recommend following my related tag on del.icio.us.