Reality Flythrough

Overview

This project is the foundation for a much larger research project on Reality Flythrough. The goal is to support virtual exploration through reality using only photographs (or live video feeds) and the camera positions as input. The application has no special knowledge about the geometry of the space. The executable provided here uses 14 photographs of my living room. Enjoy the tour!

Introduction

Reality Flythrough is a recreation of reality for the purpose of virtual exploration. In its ideal, Reality Flythrough is a live, immersive representation of a physical space. There are numerous applications for such a system, but perhaps the most compelling involves disaster response. Imagine first responders equipped with head-mounted wireless cameras encountering the chaos of a disaster site. As they fan out through the site, they continuously broadcast what they see to a Reality Flythrough server. Central command could virtually explore the site by attaching to these video feeds and get some sense of the big picture. Medics could locate the injured, firefighters could see potential flare-ups, and engineers could see structural weaknesses. As more people enter the site and fixed cameras are set up, the naturalness of the flythrough is enhanced until ultimately the entire space is covered and central command can zip around the site looking for hot spots, unencumbered by physical forces.

There are many other applications for Reality Flythrough ranging from improving the quality of life of the disabled to allowing people to fly ahead of their vehicles to see what traffic jams can be avoided.

Reality Flythrough (also known as Tele-Reality) was first described in the academic literature by Szeliski (94) as the ideal for immersive real-time live flythroughs through reality. Much research has been done since in texturing virtual reality with photos, but little has been accomplished in using live video feeds for the texturing. The main reason for this is that live texturing is a very difficult problem.

I propose to tackle this problem by assuming a high density of cameras and avoiding texturing altogether. Given enough cameras, it is possible to use the cameras alone to achieve Reality Flythrough simply by choosing the camera whose image is used for each frame of the rendered video.

My purpose for taking CSE167 was to learn how computer graphics could help with providing smooth transitions between cameras. This project is my first crack at providing these smooth transitions. I have made a number of simplifications in this proof of concept:

  1. Still photographs are used instead of live video.
  2. Camera positions are statically defined (the goal is to support moving cameras).
  3. Camera position, rotation, and attitude are measured with reasonable accuracy.

Despite these simplifications, the results are very exciting and suggest that still photographs can be a substitute for live video in many cases and can help reduce the requirement of having high camera density.

Disclaimer: I'm a grad student and this project has been supporting my research all quarter, so I've spent far more than the 20-30 hours suggested.

Screen Shots

The input to the program is 14 camera positions (position, rotation, and attitude) and 14 640x480 images taken from those positions. The images were taken at various locations in my living room. Here are some sample transitions between two positions.

Photo A:
Pos(4.63, 5.00, 10.00) 35.0 deg screenshot_rotation1.jpg
Photo B:
Pos(4.63, 5.00, 10.00) 75.0 deg screenshot_rotation2.jpg

Transition from A to B screenshot_rotation3.jpg

Photo A:
Pos(4.63, 5.00, 10.00) 35.0 deg screenshot_rotation1.jpg
Photo I:
Pos(13.63, 5.00, 4.00) 295.0 deg screenshot_rotation2.jpg

Transition from A to I screenshot_rotation3.jpg

The screen shots don't really do it justice. You have to actually see the motion to appreciate it.

Instructions for Running

Warning: This requires a pretty fast computer with a good graphics card.

First download and install the app:

There are several different interfaces that can be used for exploring my living room. The simplest to use is one of the permutations of the "camera to camera" transitions.

Camera to Camera Transitions

To perform a camera to camera transition, you select the camera that you want to move to by pressing a letter in the range of 'a' to 'o'.

There are three different kinds of camera to camera transitions. The default does a straight one to one transition. You select the transition mode by entering a number from 1 to 3. The modes are described below:

  1. one to one: This transition does a straight transition between two cameras and doesn't make any effort to help the user make sense of the amount of rotation or the distance travelled. Many frames may be all black during one of these transitions. You select this mode by pressing 1.
  2. multi camera straight path: This transition attempts to fill in the blank space that might show up during long transitions. The algorithm for choosing images to display along a path needs a lot of work, but this is my first stab at trying to present images that make sense. You'll see that in many cases the additional information actually hurts the user's ability to make sense of the transition. You select this mode by pressing 2.
  3. multi camera walk forward: This was my first attempt to improve mode 2 by mimicking how people actually walk. People usually walk facing forward. This transition mode performs a rotation to get the user facing the walking direction, moves straight along the path, and then does a final rotation to point the user in the same direction as the target camera. I only use this mode if the amount of translation (walking along the path) is more than the amount of rotation. There aren't many transitions in my living room that meet this criterion, so in most cases you'll see the same results as mode 2. You select this mode by pressing 3.

Free Walking

Free walking allows the user to use cursor keys to explore the space. It takes a little practice to get used to this mode. I don't have any boundary checking, so you can get yourself lost in no-man's-land. Since the transitions require constant motion in order to be effective (the poor quality of the screen shots above proves this point), I keep the user in constant motion. The user has to explicitly press a key to change direction or stop. This mode always uses mode 2 type transitions, so the transition mode you're in does not have any impact on what you see. These are the controls:

An example scenario: you might begin rotating left by pressing the '[' key, and then when you see a destination you want to go to, you press the 'up arrow' to move forward. This cancels the rotation and starts a forward motion. Once you reach your destination, you can hit 'enter' to move to the camera that best shows that image.

As I said, it takes practice, and I don't have enough images of my living room to fill in all the detail. This mode also reveals all of the problems with my algorithm for selecting the best camera at any position.

How does it Work?

By now you probably have a pretty good sense of how the program works. The underlying idea is very simple. I paint each image in an appropriately sized rectangle at a position in space that is determined by treating a camera as a projector. I look directly down the camera's lens and essentially project the image onto a virtual wall that is "focus depth" feet away. The "focus depth" is pre-calculated by estimating the distance from the most dominant object in the image to the camera. (This is going to be one of the most difficult hurdles to overcome when dealing with live cameras whose position has not been predetermined.) Once all of the images have been drawn in space, performing a transition is simply a matter of moving the view from looking down one camera lens to looking down another. OpenGL takes care of the rest.

Almost. I spent some time playing around with different blending methods to make the transition appear smooth and to obscure some of the artifacts that arise from using 2D images to represent space. The blending method that I settled on turns off depth testing and draws the "TO" image on top of the "FROM" image, with the opacity of the "TO" image increasing from 40% to 100%.
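The projector idea and the opacity ramp can be sketched in a few lines of Python. This is only an illustration, not the program's code: the function names, the horizontal field of view, the coordinate convention (y up, heading 0 looking down -z, rotation about the vertical axis only), and the linear shape of the ramp are all assumptions. Only the 640x480 image size, the "focus depth" wall, and the 40% to 100% range come from the description above.

```python
import math

def image_quad(cam_pos, heading_deg, focus_depth, hfov_deg=50.0, aspect=640 / 480):
    """Corners of the rectangle an image is painted on, treating the camera
    as a projector aimed at a wall focus_depth away.
    Returns bottom-left, bottom-right, top-right, top-left."""
    x, y, z = cam_pos
    h = math.radians(heading_deg)
    # Assumed convention: y is up, heading 0 looks down -z.
    forward = (math.sin(h), 0.0, -math.cos(h))
    right = (math.cos(h), 0.0, math.sin(h))
    # The wall must be wide enough to fill the (assumed) field of view.
    half_w = focus_depth * math.tan(math.radians(hfov_deg) / 2.0)
    half_h = half_w / aspect
    # Center of the wall, straight down the lens.
    cx = x + forward[0] * focus_depth
    cy = y
    cz = z + forward[2] * focus_depth

    def corner(sx, sy):
        return (cx + sx * half_w * right[0],
                cy + sy * half_h,
                cz + sx * half_w * right[2])

    return [corner(-1, -1), corner(1, -1), corner(1, 1), corner(-1, 1)]

def to_image_alpha(frame, total_frames, start=0.40, end=1.00):
    """Opacity of the "TO" image at a given frame of a transition.
    A linear ramp is assumed; only the endpoints come from the text."""
    t = frame / (total_frames - 1) if total_frames > 1 else 1.0
    return start + (end - start) * t

if __name__ == "__main__":
    for c in image_quad((4.63, 5.00, 10.00), 35.0, focus_depth=8.0):
        print(tuple(round(v, 2) for v in c))
    print([round(to_image_alpha(f, 5), 2) for f in range(5)])
```

In an OpenGL renderer the four corners would become the vertices of a textured quad, and the alpha value would be applied to the "TO" quad with depth testing disabled so it always draws over the "FROM" quad.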

To handle multi-camera transitions, I draw a virtual line segment from the starting point to the ending point and find all cameras that are within a delta of the line segment. I then compute a fitness value for each camera based on distance and camera angle. Ordering by fitness, I place the cameras at the appropriate frames, making sure to keep the number of frames between each camera above a specified minimum.
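The selection steps above might look roughly like the following Python sketch. The fitness formula is not given in this write-up, so the one here (distance from the path plus a weighted heading mismatch, lower is better) is an assumption, as are the names plan_cameras, delta, min_gap, and angle_weight and the choice to work in the 2D floor plane.

```python
import math

def dist_to_segment(p, a, b):
    """Distance from point p to segment a-b in the floor plane, plus the
    fraction t (0..1) along the segment of the closest point."""
    dx, dz = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dz * dz
    if seg_len2 == 0:
        t = 0.0
    else:
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dz) / seg_len2
        t = max(0.0, min(1.0, t))
    cx, cz = a[0] + t * dx, a[1] + t * dz
    return math.hypot(p[0] - cx, p[1] - cz), t

def signed_turn(from_deg, to_deg):
    """Shortest signed rotation from one heading to another, in degrees."""
    return (to_deg - from_deg + 180.0) % 360.0 - 180.0

def plan_cameras(cameras, start, end, start_heading, end_heading,
                 total_frames, delta=2.0, min_gap=10, angle_weight=0.05):
    """Map frame numbers to camera names for one straight-path transition.
    cameras: {name: ((x, z), heading_deg)}."""
    candidates = []
    for name, (pos, heading) in cameras.items():
        d, t = dist_to_segment(pos, start, end)
        if d > delta:
            continue  # too far from the path to be useful
        # Heading the viewer is assumed to have at this point on the path.
        view_heading = start_heading + t * signed_turn(start_heading, end_heading)
        # Hypothetical fitness: lower is better.
        fitness = d + angle_weight * abs(signed_turn(view_heading, heading))
        candidates.append((fitness, round(t * (total_frames - 1)), name))

    schedule, placed = {}, []
    for fitness, frame, name in sorted(candidates):
        # Best cameras claim their frames first; later ones must keep the gap.
        if all(abs(frame - f) >= min_gap for f in placed):
            placed.append(frame)
            schedule[frame] = name
    return dict(sorted(schedule.items()))

if __name__ == "__main__":
    cams = {
        "a": ((0.0, 0.0), 0.0),
        "b": ((5.0, 0.5), 0.0),
        "c": ((5.2, 0.1), 90.0),   # close to the path but facing sideways
        "d": ((5.0, 10.0), 0.0),   # well outside delta
        "e": ((10.0, 0.0), 0.0),
    }
    print(plan_cameras(cams, (0.0, 0.0), (10.0, 0.0), 0.0, 0.0, total_frames=101))
```

Placing the fittest cameras first means a poorly aimed camera that happens to sit near a better one is dropped rather than shown for only a few frames.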

The walk forward transition is composed of three of the multi-camera transitions: one that does the initial rotation along with as much of the translation as completes in that time; one that does the remaining translation; and one that does the final rotation.
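A rough Python sketch of how the three phases could be derived from a start and end pose follows. The turn and walk rates, the function names, and the heading convention (heading 0 looks down -z in the floor plane) are assumptions made for illustration. The test for when this mode is used at all (translation greater than rotation) is left out because the write-up does not say how the two quantities are compared.

```python
import math

def signed_turn(from_deg, to_deg):
    """Shortest signed rotation from one heading to another, in degrees."""
    return (to_deg - from_deg + 180.0) % 360.0 - 180.0

def walk_forward_phases(start, start_heading, end, end_heading,
                        turn_rate=3.0, walk_rate=0.1):
    """Split a transition into three (label, from_pose, to_pose) phases.
    Poses are ((x, z), heading_deg); rates are per frame (assumed values)."""
    dx, dz = end[0] - start[0], end[1] - start[1]
    distance = math.hypot(dx, dz)
    # Heading that faces along the path (assumed: heading 0 looks down -z).
    walk_heading = math.degrees(math.atan2(dx, -dz)) % 360.0

    # Phase 1 also covers whatever distance is walked while turning.
    first_turn = signed_turn(start_heading, walk_heading)
    frames_turning = math.ceil(abs(first_turn) / turn_rate)
    covered = min(distance, frames_turning * walk_rate)
    frac = covered / distance if distance else 1.0
    mid = (start[0] + frac * dx, start[1] + frac * dz)

    return [
        ("rotate to walking direction", (start, start_heading), (mid, walk_heading)),
        ("walk", (mid, walk_heading), (end, walk_heading)),
        ("rotate to target direction", (end, walk_heading), (end, end_heading)),
    ]

if __name__ == "__main__":
    for label, a, b in walk_forward_phases((0.0, 0.0), 0.0, (10.0, 0.0), 180.0):
        print(label, a, "->", b)
```

Each phase is an ordinary start pose and end pose, so it can be handed to the same multi-camera machinery used for mode 2.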

The bulk of the work for this project was spent on making the application scalable. The goal is to support thousands of live video cameras. I had to make special considerations for memory utilization and speed. One example of this is how I handle texture maps. I have a texture map resource allocator, and currently there are only two textures created. Each time a camera gains focus, it acquires one of the texture maps and loads its current image into it (remember, the goal is to support live video, so the images will always be changing). Because speed is an issue when loading these images, I do not use gluBuild2DMipmaps(). I also do not use gluScaleImage() to get my image size to be a power of 2. Instead, I place my image in the upper left of a block of memory that is large enough to hold an image whose width and height are powers of 2, and then specify the right and bottom coordinates of my texture map as the ratio of the original image size to the new image size.
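The padding trick can be illustrated with a small Python sketch. It is not the program's code: the function names are hypothetical, pixels are plain nested lists rather than a real texture buffer, and "upper left" is taken to mean row 0, column 0 of the buffer.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def pad_to_power_of_two(pixels, width, height, fill=0):
    """Copy an image into the corner of a power-of-two buffer without
    scaling it. pixels is a list of `height` rows of `width` values.
    Returns the buffer, its size, and the texture coordinates of the
    image's right and bottom edges."""
    tex_w = next_power_of_two(width)
    tex_h = next_power_of_two(height)
    padded = [[fill] * tex_w for _ in range(tex_h)]
    for row in range(height):
        padded[row][:width] = pixels[row]
    # The image covers only this fraction of the texture in each direction.
    max_s = width / tex_w
    max_t = height / tex_h
    return padded, tex_w, tex_h, max_s, max_t

if __name__ == "__main__":
    image = [[1] * 640 for _ in range(480)]
    _, w, h, s, t = pad_to_power_of_two(image, 640, 480)
    print(w, h, s, t)
```

For a 640x480 image this gives a 1024x512 buffer, with the image's right edge at 0.625 and its bottom edge at 0.9375 in texture coordinates. The copy is a plain row-by-row move, which avoids the per-frame resampling cost that scaling would add.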

Todo

There are a number of improvements that are planned: