Server-side video processing of life-casting robots (Day #2, idea week)
I love iRobot's history of openness. The Roomba API is completely open: anyone with a serial cable and some know-how can control the robot. The ConnectR should eventually be open as well.
I'd like to see this taken to the next level with live-casting from a robot. Justin.tv meets ConnectR.
The robot would either have to be managed by a human when in public, or it could wander alone in a private space. Community voting could control where the robot goes and where it directs its gaze. Ideally, the robot would be placed where average people can't be, like backstage at a concert or fashion show. Dangerous places are also ideal. Let the robot go where humans shouldn't.
Robots with internet-enabled cameras can do more than normal robots: server-side processing means the robot doesn't need expensive on-board hardware for its intelligence.
Obstacle avoidance, mapping, localization, face & pedestrian detection, object detection, and object tracking & motion modeling can all be done with today's technology using a single camera stream. Some processing would need to stay local given the constraints of network bandwidth, but plenty could be offloaded. It's also a good model for premium services: pay more to get faster connections, faster processing, and more capabilities.
Automated surveillance with alerts for intruders could be a killer app for wifi/webcam robots. Today's choices for home security are immobile, cost thousands of dollars, or both.
Server-side image processing is itself a viable idea. It's the next step in online video. Today, we just stream compressed pixels. Tomorrow, we'll calculate and stream information about the scene. You could send a Flickr image stream to a service that finds faces, builds a corpus of data to identify the people from context, and performs face recognition. Face recognition companies like Animetrics could be tapped to do the hard part.
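To make the "stream information, not pixels" idea concrete, here's a minimal sketch of the kind of compact message a server might emit per frame after a detector has run. The field names and the (label, x, y, w, h) detection format are my own assumptions, not an existing protocol:

```python
import json

def describe_scene(frame_id, detections):
    """Turn raw detections into a small JSON message for the client.

    detections: list of (label, x, y, w, h) tuples from some upstream
    detector (face, pedestrian, object), not implemented here.
    """
    return json.dumps({
        "frame": frame_id,
        "objects": [
            {"label": label, "box": [x, y, w, h]}
            for label, x, y, w, h in detections
        ],
    })
```

A message like this is a few hundred bytes instead of a full frame, which is the whole appeal: the robot uplinks video once, and many clients can subscribe to the cheap scene descriptions.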
Motion detection for surveillance applications could be done entirely online, with companies like Intellivid and ObjectVideo having already optimized the image processing component.
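The core of basic motion detection is simple enough to sketch: difference consecutive frames and raise an alert when enough pixels change. A toy pure-Python version, nothing like the optimized commercial systems above; the frame format (2D lists of grayscale values) and thresholds are illustrative:

```python
def motion_detected(prev, curr, pixel_thresh=25, area_thresh=0.01):
    """Flag motion when enough pixels change between two grayscale frames.

    prev, curr: 2D lists of 0-255 intensity values, same dimensions.
    pixel_thresh: minimum per-pixel intensity change to count as "changed".
    area_thresh: fraction of changed pixels needed to declare motion.
    """
    changed = 0
    total = 0
    for row_p, row_c in zip(prev, curr):
        for p, c in zip(row_p, row_c):
            total += 1
            if abs(p - c) > pixel_thresh:
                changed += 1
    return (changed / total) > area_thresh
```

A real deployment would add background modeling and lighting compensation, but even this two-frame difference is enough to drive an "email me a snapshot" alert.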
Street Views in maps can be combined with GPS-tagged digital camera shots to build super-high-resolution aerial imagery, and eventually 3D.
Services like Fauxto, which aim to be Photoshop online, could build an interesting API where any image can be sent along with instructions for processing.
The point is that all of these services require sending video and images to a server, where some intelligent processing occurs. Often, the processing will involve the same software modules, so each problem is not unique. Tap the long tail of software development and allow 3rd parties to build their own processing streams that live on your servers. This could be made simple and standard using tools like Python with image processing modules like OpenCV, PIL, NumPy, and SciPy.
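Here's one sketch of what those 3rd-party processing streams might look like: a registry of named stages, with each uploaded frame run through whatever chain the client requests. The stage names and the list-of-lists frame format are hypothetical; a real service would wire in OpenCV or NumPy modules behind the same interface:

```python
# Registry mapping stage names to processing functions.
PROCESSORS = {}

def processor(name):
    """Decorator that registers a processing stage under a name."""
    def register(fn):
        PROCESSORS[name] = fn
        return fn
    return register

@processor("grayscale")
def grayscale(frame):
    # Average the RGB channels of each pixel into one intensity value.
    return [[sum(px) // 3 for px in row] for row in frame]

@processor("threshold")
def threshold(frame):
    # Binarize: 255 where intensity is above 127, else 0.
    return [[255 if px > 127 else 0 for px in row] for row in frame]

def run_pipeline(frame, stages):
    """Apply the requested stages in order; unknown names raise KeyError."""
    for name in stages:
        frame = PROCESSORS[name](frame)
    return frame
```

The appeal of the registry pattern is that a 3rd party only writes the function; the service owns the video plumbing, billing, and compute, which is exactly the premium-tier model described above.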