I built a network infrastructure that supported one-to-many broadcasts with under 700 ms of camera-to-monitor latency across the public internet. Latency was an interesting problem because it required optimizing every stage of the pipeline: image acquisition, video compression, the autoscaling edge-origin network architecture, and video decompression and playback. Each step had to be both fast and parallel to support 4K video at 60 fps, while taking as little time as possible to hand data off to the next stage of the pipeline.
The bulk of the latency in live video comes from the networking protocol used to send the compressed video to the target computer. Most established live video systems use HLS or RTMP, which have latencies of 3 to 30 seconds. To minimize latency, I used the SRT protocol for all networking. SRT lets you tune the balance between latency and reliability: you can get a more responsive stream at the cost of making video corruption more likely. At the time, SRT was a very new protocol that wasn't quite ready for production use, so I worked with its creators to fix some bugs and to polish a GStreamer plugin.
The downside to using SRT, however, was that few systems supported it properly. Because of this, we spent a lot of time building our own SRT-compatible systems. One example is our Unity video player, which on Android actually outperformed VLC.
Virtual Reality
I fell in love with VR after I tried the Oculus Rift DK1 back in 2013. I've released a number of VR titles, including gvr.tv, a VR game in which you could drive a car with a VR camera mounted on it around my actual living room. You could even buy treats for Gonzo, my friend's dog. It had a 97% positive rating on Steam, and a few users logged over 100 hours.
I also developed a multiplayer engine called Gamelodge, in which all the games were made by users. The core idea was that making multiplayer games is hard: hard to design, hard to debug, and hard to attract enough players to get past the chicken-and-egg problem. Gamelodge abstracted away the networking and let users record and share a binary file of the networked state, so you could replay a session to observe a bug, then edit your code and verify that you had fixed it. It was also a nice way to replay fun moments from different angles.
I also worked on a novel approach to streaming live VR that resulted in a patent.
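Gamelodge's actual file format isn't described here. As a minimal sketch of the record-and-replay idea, assuming each tick's networked state can be reduced to a dictionary of entity IDs to positions, a recorder could append length-prefixed snapshots to a binary stream and a replayer could read them back in order. `StateRecorder`, `read_replay`, and the record layout are hypothetical.

```python
import io
import json
import struct
from typing import BinaryIO, Iterator, Tuple

# Hypothetical record header: tick number (uint32) + payload length (uint32), little-endian.
HEADER = struct.Struct("<II")


class StateRecorder:
    """Appends one snapshot of networked state per tick to a binary stream."""

    def __init__(self, stream: BinaryIO) -> None:
        self.stream = stream

    def record(self, tick: int, state: dict) -> None:
        # JSON keeps the sketch simple; a real format would likely use a compact binary encoding.
        payload = json.dumps(state, sort_keys=True).encode("utf-8")
        self.stream.write(HEADER.pack(tick, len(payload)))
        self.stream.write(payload)


def read_replay(stream: BinaryIO) -> Iterator[Tuple[int, dict]]:
    """Yields (tick, state) pairs in the order they were recorded."""
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            return  # end of the recording
        tick, length = HEADER.unpack(header)
        yield tick, json.loads(stream.read(length).decode("utf-8"))


if __name__ == "__main__":
    buf = io.BytesIO()
    recorder = StateRecorder(buf)
    recorder.record(0, {"player1": [0.0, 0.0, 0.0]})
    recorder.record(1, {"player1": [0.5, 0.0, 0.0]})
    buf.seek(0)
    for tick, state in read_replay(buf):
        print(tick, state)
```

Replaying the stream re-applies the same states in the same order, which is the property that makes such a recording useful for reproducing a bug or re-rendering a moment from a different camera.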
Computer Vision
I built an ArUco marker detection system that asynchronously locates markers in a video stream, sends their locations over the network, and plays those locations back, interpolated so that they stay synchronized with the video stream. We used this to display virtual elements, such as buttons, videos, or 3D avatars of other users, on top of a piece of paper with a printed ArUco marker on it.
I also worked on a system to dewarp stereoscopic fisheye video for playback. Most people assume that fisheye video is equirectangular, because this makes the dewarping math much simpler. However, this assumption causes visible warping that is especially prominent around the edges of the video. To improve on this, I obtained the lens distortion data from the manufacturer, took 50 images of a checkerboard, and ran them through OpenCV's fisheye calibration, using the manufacturer's lens data as the initial state for the solver. This produced distortion parameters that let me build a very high-quality display system. The final viewer was implemented as an HLSL shader on desktop, for the highest-fidelity dewarping possible, and as a mesh on mobile, where performance was critical.
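As a minimal sketch of the playback side, assuming each detection arrives tagged with the capture timestamp of the frame it came from, the player can buffer detections and linearly interpolate a marker's corners at the timestamp of the frame being displayed. `MarkerTrack` and its methods are hypothetical, and linear interpolation is an assumption; the original system's interpolation scheme isn't specified.

```python
import bisect
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class MarkerTrack:
    """Buffers timestamped corner detections for one marker and interpolates between them."""

    def __init__(self) -> None:
        self.times: List[float] = []
        self.corners: List[List[Point]] = []

    def add(self, timestamp: float, corners: List[Point]) -> None:
        # Detections may arrive out of order over the network, so insert in sorted position.
        i = bisect.bisect(self.times, timestamp)
        self.times.insert(i, timestamp)
        self.corners.insert(i, corners)

    def at(self, frame_time: float) -> Optional[List[Point]]:
        """Corner positions at a video frame's timestamp, or None outside the buffered range."""
        if not self.times or frame_time < self.times[0] or frame_time > self.times[-1]:
            return None  # don't extrapolate; wait for more detections
        i = bisect.bisect_left(self.times, frame_time)
        if self.times[i] == frame_time:
            return list(self.corners[i])
        t0, t1 = self.times[i - 1], self.times[i]
        a = (frame_time - t0) / (t1 - t0)
        return [
            (x0 + a * (x1 - x0), y0 + a * (y1 - y0))
            for (x0, y0), (x1, y1) in zip(self.corners[i - 1], self.corners[i])
        ]
```

Interpolating at the displayed frame's timestamp, rather than drawing the most recent detection, keeps the overlay aligned with the image on screen even when detections arrive late.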
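The original shader isn't shown here, but a dewarping viewer ultimately evaluates a lens model per pixel. As an illustration, the function below implements the polynomial fisheye model fitted by OpenCV's `cv2.fisheye` module, in which the distorted angle is θ_d = θ(1 + k1θ² + k2θ⁴ + k3θ⁶ + k4θ⁸), with the skew term omitted. Setting all four coefficients to zero gives the ideal equidistant fisheye (r = f·θ); the calibrated coefficients capture how a real lens deviates from it. In OpenCV's Python API, seeding the solver with known lens data corresponds to passing initial `K` and `D` arrays to `cv2.fisheye.calibrate` with the `cv2.fisheye.CALIB_USE_INTRINSIC_GUESS` flag, though the original setup may have differed. `project_fisheye` is a hypothetical name, and the intrinsics in the tests are placeholders, not real lens values.

```python
import math
from typing import Sequence, Tuple


def project_fisheye(
    ray: Tuple[float, float, float],
    fx: float, fy: float, cx: float, cy: float,
    k: Sequence[float] = (0.0, 0.0, 0.0, 0.0),
) -> Tuple[float, float]:
    """Project a camera-space ray to pixel coordinates with OpenCV's fisheye model (no skew).

    With k = (0, 0, 0, 0) this reduces to the ideal equidistant model, r = f * theta.
    """
    x, y, z = ray
    rho = math.hypot(x, y)
    if rho == 0.0:
        return cx, cy  # ray along the optical axis
    # Angle from the optical axis; equals OpenCV's atan(rho / z) when z > 0,
    # and also handles rays at or beyond 90 degrees.
    theta = math.atan2(rho, z)
    k1, k2, k3, k4 = k
    t2 = theta * theta
    theta_d = theta * (1 + k1 * t2 + k2 * t2**2 + k3 * t2**3 + k4 * t2**4)
    # Distorted normalized coordinates: the ray's direction, scaled to length theta_d.
    xd, yd = theta_d * x / rho, theta_d * y / rho
    return fx * xd + cx, fy * yd + cy
```

For dewarping, this forward projection is what gets evaluated per output pixel: compute the viewing ray for that pixel, project it into the fisheye image, and sample the texture there, so the polynomial never has to be inverted numerically. The same mapping can be baked into mesh UVs when a per-pixel shader is too expensive.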
Unity3D
I've been working with Unity since roughly 2014. In that time I've built a number of applications with it, both for my own projects and for companies I've worked for. For example, in 2015 I released a small game called "3D Othello" on Oculus Share. It was a multiplayer game in which you and an opponent played a version of Othello on a 3D grid of spaces instead of a 2D board. It included an AI that, I am proud to say, frequently beat me at my own game.
I've also worked on some open-source assets for Unity, including Mumble-Unity, which lets Unity applications connect to and talk with users of the popular Mumble voice chat application. It features smooth audio broadcasting and playback with negligible overhead and virtually no runtime allocations.
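The game's exact rules aren't spelled out here. Assuming the natural generalization of Othello to 3D, where captures run along all 26 directions through a cell (6 face, 12 edge, and 8 corner neighbors) instead of 8, move validation could look like the sketch below. `flips_for_move` and the board representation are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Cell = Tuple[int, int, int]
# Every nonzero step in {-1, 0, 1}^3: the 26 directions through a cell.
DIRECTIONS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def _inside(x: int, y: int, z: int, size: int) -> bool:
    return 0 <= x < size and 0 <= y < size and 0 <= z < size


def flips_for_move(board: Dict[Cell, int], size: int, cell: Cell, player: int) -> List[Cell]:
    """Return the opponent pieces captured by `player` placing at `cell` (empty = illegal move).

    `board` maps occupied cells to a player id; cells absent from the dict are empty.
    """
    if cell in board:
        return []
    captured: List[Cell] = []
    for dx, dy, dz in DIRECTIONS:
        run: List[Cell] = []
        x, y, z = cell[0] + dx, cell[1] + dy, cell[2] + dz
        # Walk over a contiguous run of opponent pieces...
        while _inside(x, y, z, size) and board.get((x, y, z)) not in (None, player):
            run.append((x, y, z))
            x, y, z = x + dx, y + dy, z + dz
        # ...and keep it only if it is capped by one of the player's own pieces.
        if run and _inside(x, y, z, size) and board.get((x, y, z)) == player:
            captured.extend(run)
    return captured
```

A move is legal only if the returned list is non-empty; applying it means assigning `player` to `cell` and to every captured cell.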
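Mumble-Unity is written in C#, and its actual buffering code isn't shown here. As a language-neutral sketch of how a playback path can avoid per-frame allocations, assuming float32 PCM samples, a fixed-capacity ring buffer can be allocated once and afterwards only copied into and out of, with the caller supplying the output array. `AudioRingBuffer` is a hypothetical name; in Python, slicing still creates small view objects, so this shows the structure rather than truly allocation-free behavior.

```python
import numpy as np


class AudioRingBuffer:
    """Fixed-capacity float32 sample FIFO; all sample storage is allocated once in __init__."""

    def __init__(self, capacity: int) -> None:
        self.buf = np.zeros(capacity, dtype=np.float32)
        self.capacity = capacity
        self.read_pos = 0
        self.count = 0

    def write(self, samples: np.ndarray) -> int:
        """Copy as many samples as fit; returns how many were written (the excess is dropped)."""
        n = min(len(samples), self.capacity - self.count)
        start = (self.read_pos + self.count) % self.capacity
        first = min(n, self.capacity - start)  # portion that fits before wrapping around
        self.buf[start:start + first] = samples[:first]
        self.buf[:n - first] = samples[first:n]
        self.count += n
        return n

    def read(self, out: np.ndarray) -> int:
        """Fill the caller-owned `out` array, padding with silence on underrun."""
        n = min(len(out), self.count)
        first = min(n, self.capacity - self.read_pos)
        out[:first] = self.buf[self.read_pos:self.read_pos + first]
        out[first:n] = self.buf[:n - first]
        out[n:] = 0.0  # underrun: output silence rather than stale audio
        self.read_pos = (self.read_pos + n) % self.capacity
        self.count -= n
        return n
```

Dropping excess samples on overflow and padding with silence on underrun keeps the buffer's footprint fixed no matter how bursty the network is.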
Geophysics
As an undergraduate at the University of Maryland, I worked on a novel approach to locating lunar seismic events based on the polarization of P and S waves. The approach worked well on terrestrial data, but the lunar crust added so much noise that the method was unreliable.
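The original method isn't detailed here. As a textbook-style illustration of the P-wave half of the idea only: P-wave particle motion is polarized along the direction of travel, so the dominant eigenvector of the covariance of the two horizontal components within a P-wave window points toward or away from the source. That gives the azimuth up to a 180° ambiguity, which is usually resolved with the vertical component. `p_wave_azimuth` and the synthetic signal are assumptions for the example.

```python
import numpy as np


def p_wave_azimuth(north: np.ndarray, east: np.ndarray) -> float:
    """Azimuth (degrees clockwise from north, in [0, 180)) of the dominant horizontal
    particle motion in a P-wave window. The 180-degree ambiguity is left unresolved."""
    data = np.vstack([north - north.mean(), east - east.mean()])
    cov = data @ data.T / data.shape[1]
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    n, e = eigvecs[:, -1]             # principal axis of the particle motion
    return float(np.degrees(np.arctan2(e, n)) % 180.0)


if __name__ == "__main__":
    # Synthetic decaying pulse arriving along azimuth 30 degrees, plus a little noise.
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 500)
    pulse = np.sin(2 * np.pi * 5 * t) * np.exp(-5 * t)
    az = np.radians(30.0)
    north = pulse * np.cos(az) + 0.01 * rng.standard_normal(t.size)
    east = pulse * np.sin(az) + 0.01 * rng.standard_normal(t.size)
    print(round(p_wave_azimuth(north, east), 1))
```

The estimate degrades as the ratio of the smaller to the larger eigenvalue grows, which is one way to quantify how much scattering noise, like that in lunar records, has smeared the polarization.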
Machine Learning
Admittedly, my understanding of machine learning is rather facile: I know enough to decide whether a problem is amenable to machine learning, and enough to solve a problem using, for example, TensorFlow, but not enough to design a network from scratch.
The first project where I used machine learning was my research at UC Davis. It used a neural network with a single hidden layer to help identify steel balls in images that earlier processing steps had failed to detect.
The second time I used machine learning was for a system that took two images of a person and created a realistic avatar of that person. It worked by passing the facial images to dlib, which generated a descriptor vector for the face. This dlib vector was then used as the input layer of a deep neural network in TensorFlow, which output facial morphs for the user's avatar.
Condensed Matter Physics
In the summer of 2014 I was an undergraduate researcher at UC Davis, where my research focused on identifying patterns in granular dynamics. Imagine slowly tilting a jar of rice: eventually the rice begins to slide. What's interesting is that the angle at which it begins to slide is extremely variable. One time it may begin to slide at 20 degrees, and another time at 40 degrees. We still don't understand what contributes to the stability of granular materials like rice. The idea behind the research was that if we generated thousands of images of granular sliding, we could analyze the parsed data to discover what contributes to stability. My main contribution was improving the detection of steel balls in a glass drum. Although I found the research interesting, honestly, it was the project that helped me decide that I don't want a career in research.
Image Signal Processing
Initially, the imagery coming out of my video pipelines was soft and muted. After looking into the problem, I found a number of causes of the blurriness, and ways to fix them. One contributor was the demosaicing algorithm applied to the sensor data. Most digital camera sensors measure only one of red, green, or blue at each pixel, and software then uses algorithms to fill in the two missing color channels at each pixel. As a result, there is a large body of research on how best to perform this interpolation. The default implementation provided by the camera manufacturer used bilinear demosaicing, which is simple to implement but produces soft images with frequent artifacts. To obviate this blurriness, I implemented Demosaicing With Directional Filtering and a Posteriori Decision in CUDA. This algorithm uses information about where the edges are in an image to decide which direction to interpolate along. The result was imagery that looked vibrant and crisp.
The second-largest contributor to the loss of visual fidelity was H.264 compression. My initial plan was to upgrade to H.265 using a custom encoding chip, but after integrating it, I learned that some of our target GPUs didn't support hardware H.265 decoding, and software decoding was too slow for our resolution and frame rate. For prerecorded videos, we could run day-long software encodes and get outstanding video quality, but live video had to use hardware encoding, which prioritizes speed and throughput over video fidelity. As a result, I focused on preprocessing the image so that the encoder had an easier image to compress. The best way I found to do this was to denoise the image, so that the encoder had less high-frequency spatial information to handle. I worked on a novel image denoiser that used the slow but powerful BM3D to train a low-pass filter to distinguish high-frequency noise from high-frequency structure. This worked to some degree in my initial tests, but due to time constraints I ended up using a temporal filter that averaged groups of pixels over time, provided that they hadn't changed considerably. If I revisited this problem, I'd like to try leveraging the motion vector engine on GPUs to determine where a pixel was in previous frames, and then average each pixel across its history.
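The full DFAPD algorithm is more involved: it builds both horizontal and vertical green estimates with directional filters, chooses between them a posteriori, and then reconstructs and refines red and blue. As a much simpler sketch of the same core idea (not DFAPD itself), the function below fills in green at the red and blue sites of a Bayer mosaic by averaging along whichever direction has the smaller gradient, so interpolation runs along edges rather than across them. The RGGB layout and the function name are assumptions.

```python
import numpy as np


def interpolate_green_directional(raw: np.ndarray) -> np.ndarray:
    """Estimate the full green channel of an RGGB Bayer mosaic (height and width >= 2).

    At red/blue sites, average the two green neighbors along the direction (horizontal or
    vertical) with the smaller green gradient, falling back to the 4-neighbor mean on ties.
    """
    h, w = raw.shape
    green = raw.astype(np.float64).copy()
    # Reflect padding mirrors without repeating the edge, so every neighbor is still green.
    p = np.pad(green, 1, mode="reflect")
    rows, cols = np.indices((h, w))
    # In RGGB, green sits where (row + col) is odd; red and blue sites need interpolation.
    missing = (rows + cols) % 2 == 0
    left, right = p[1:-1, :-2], p[1:-1, 2:]
    up, down = p[:-2, 1:-1], p[2:, 1:-1]
    grad_h = np.abs(left - right)
    grad_v = np.abs(up - down)
    horiz = (left + right) / 2
    vert = (up + down) / 2
    est = np.where(grad_h < grad_v, horiz, np.where(grad_v < grad_h, vert, (horiz + vert) / 2))
    green[missing] = est[missing]
    return green
```

On a horizontal edge, bilinear interpolation would average across the edge and soften it, while this version averages along it. Red and blue would then typically be reconstructed from color differences against this green estimate.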
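As a minimal sketch of that kind of temporal filter, assuming grayscale float frames in 0-255 units: each pixel keeps an exponential running average that is reset to the incoming value wherever the frame differs from the average by more than a threshold, so static regions are smoothed while moving regions pass through unblurred. The exponential weighting and per-pixel test are swapped-in simplifications; the original averaged groups of pixels, and its exact weighting isn't described. `TemporalDenoiser`, `alpha`, and `threshold` are hypothetical.

```python
import numpy as np


class TemporalDenoiser:
    """Running temporal average that resets wherever the scene has visibly changed."""

    def __init__(self, alpha: float = 0.25, threshold: float = 12.0) -> None:
        self.alpha = alpha          # weight of the newest frame in the average (hypothetical)
        self.threshold = threshold  # per-pixel change that counts as motion (hypothetical)
        self.average = None

    def process(self, frame: np.ndarray) -> np.ndarray:
        frame = frame.astype(np.float32)
        if self.average is None or self.average.shape != frame.shape:
            self.average = frame.copy()
            return self.average.copy()
        blended = self.average + self.alpha * (frame - self.average)
        moved = np.abs(frame - self.average) > self.threshold
        # Static pixels accumulate noise-reducing history; changed pixels restart from the new frame.
        self.average = np.where(moved, frame, blended)
        return self.average.copy()
```

The threshold trades ghosting against noise: set it too high and moving edges smear, set it too low and sensor noise itself registers as motion. Motion-compensated averaging, as proposed above, would let moving pixels keep their history too.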