Saturday, November 7, 2015

DRSSTC Δt5: MIDI Streaming and Shutter Syncing

Until last week, I hadn't really touched my Tesla coil setup since I moved out to Seattle. Maybe because the next step was a whole bunch of software writing. As of Δt3, I had written a little test program that could send some basic frequency (resonant and pulse generation) and pulse shaping commands to the driver. But it was just for a fixed frequency of pulse generation and of course I really wanted to make a multi-track MIDI driver for it.

The number I had in mind was three tracks, to match the output capabilities of MIDI Scooter. While the concept was cool, the parsing/streaming implementation was flawed and the range of notes that you can play with a motor is kinda limited by the power electronics and motor RPM. So I reworked the parser and completely scrapped and rebuilt the streaming component of it. (More on that later.) Plus I did a lot of preliminary thinking on how best to play three notes using discrete pulses. As it turns out, the way that works best in most scenarios is also the easiest to code, since it uses the inherent interrupt prioritization and preemption capabilities that most microcontrollers have.

So despite my hesitation to start on the new software, it actually turned out to be a pretty simple coding task. It did require a lot of communication protocol code on both the coil driver and the GUI, to support sending commands and streaming MIDI events at the same time. But it went pretty smoothly. I don't think I've ever written that many lines of code and had them mostly work on the first try. And the result is a MIDI parser/streamer that I can finally be proud of. Here it is undergoing my MIDI torture test, a song known only as "Track 1" from the SNES game Top Gear.

The note density is so high that it makes a really good test song for MIDI streaming. I only wish I had even more tracks...

The workflow from .mid file to coil driver is actually pretty similar to MIDI scooter. First, I load and parse the MIDI, grouping events by track/channel. Then, I pick the three track/channel combinations I want to make up the notes for the coil. These get restructured into an array with absolute timestamps (still in MIDI ticks). The array is streamed wirelessly, 64 bytes at a time, to a circular buffer on the coil driver. The coil driver reports back what event index is currently playing, so the streamer can wait if the buffer is full.
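
The flow control on the streaming side is simple: the sender just stops whenever it would get a full buffer ahead of the event index the driver reports back. Here's a rough sketch in C of that idea (the struct layout, packet helper, and buffer size are made up for illustration; the actual GUI is a .NET app):

#include <stdint.h>

/* One streamed event: absolute timestamp in MIDI ticks plus the note data. */
typedef struct {
    uint32_t tick;     /* absolute time, in MIDI ticks */
    uint8_t  slot;     /* which of the three note timers this event targets */
    uint8_t  note;     /* MIDI note number, 0 = note off */
    uint8_t  velocity;
    uint8_t  pad;
} midi_event_t;        /* 8 bytes, so 8 events fit in a 64-byte packet */

#define DRIVER_BUF_EVENTS 256                /* size of the driver's circular buffer */
#define EVENTS_PER_PACKET (64 / sizeof(midi_event_t))

extern void send_packet_64B(const midi_event_t *events);  /* wireless link (hypothetical) */
extern int  get_reported_play_index(void);                /* driver reports this back */

void stream_events(const midi_event_t *events, int n_events)
{
    int sent = 0;
    while (sent < n_events) {
        /* Don't get more than one buffer's worth ahead of what's playing. */
        if (sent - get_reported_play_index() >
            DRIVER_BUF_EVENTS - (int)EVENTS_PER_PACKET)
            continue;                        /* buffer full: wait (a real
                                                implementation would sleep) */
        send_packet_64B(&events[sent]);      /* 64 bytes at a time */
        sent += EVENTS_PER_PACKET;           /* (last partial packet omitted) */
    }
}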

On the coil driver itself, there are five timers. In order of interrupt priority (a rough sketch of the scheme, in C, follows the list):

  • The pulse timer, which controls the actual gate drive, runs in the 120-170kHz range and just cycles through a pre-calculated array of duty cycles for pulse shaping. It's always running, but it only sets duty cycles when a pulse is active. 
  • Then, there are three note timers that run at lower priority. Their rollover frequencies are the three MIDI note frequencies. When they roll over, they configure and start a new pulse and then wait for the pulse to end (including ringdown time). They're all equal priority interrupts, so they can't preempt each other. This ensures no overlap of pulses.
  • Lastly, there's the MIDI timer, which runs at the MIDI tick frequency and has the lowest interrupt priority. It checks for new MIDI events and updates the note timers accordingly. I'm actually using SysTick for this (sorry, SysTick) since I ran out of normal timers.
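
Here is that sketch. The names (and helpers like set_gate_duty() or wait_for_pulse_end_and_ringdown()) are hypothetical stand-ins for the real timer and interrupt controller code; the priority relationships are the important part:

/* Hypothetical stand-ins for the real timer/NVIC/driver state: */
extern volatile int pulse_active;
extern float duty_table[];
extern volatile int duty_index;
extern void set_gate_duty(float duty);
extern void configure_pulse(int slot);
extern void wait_for_pulse_end_and_ringdown(void);
extern int  next_buffered_event_due(midi_event_t *e);   /* midi_event_t as above */
extern void set_note_timer_period(int slot, uint32_t period);
extern uint32_t period_from_midi_note(uint8_t note);

/* Priority 0 (highest): pulse timer, 120-170kHz, always running. */
void pulse_timer_isr(void)
{
    if (pulse_active)
        set_gate_duty(duty_table[duty_index++]);   /* pre-calculated pulse shape */
}

/* Priority 1: the three note timers, one per track, each rolling over at its
   note's frequency. They share one priority level, so they can't preempt each
   other - a new pulse can't start until the previous one (plus ringdown) is
   done. The pulse timer can still preempt them, so gate drive never stalls. */
void note_timer_isr(int slot)
{
    configure_pulse(slot);
    pulse_active = 1;
    wait_for_pulse_end_and_ringdown();
    pulse_active = 0;
}

/* Priority 2 (lowest): SysTick at the MIDI tick rate. */
void systick_isr(void)
{
    midi_event_t e;
    while (next_buffered_event_due(&e))            /* pull from the circular buffer */
        set_note_timer_period(e.slot, period_from_midi_note(e.note));
}
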
There are three levels of volume control involved as well. Relative channel volume is set by configuring the pulse length (how many resonant periods each pulse lasts). But since the driver was designed to be hard-switched, I'm also using duty cycle control for individual note volume. And there is a master volume that scales all the duty cycles equally. All of this is controlled through the GUI, which can send commands simultaneously while streaming notes, as shown in the video.
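
Continuing the sketch, the three volume controls might combine something like this (again, the names and the MAX_PERIODS cap are assumptions, not the actual firmware):

#define MAX_PERIODS 48            /* arbitrary cap on pulse length */

extern float channel_volume[3];   /* relative channel volume, 0 to 1 */
extern float note_volume[3];      /* individual note volume, 0 to 1 */
extern float master_volume;       /* master volume, 0 to 1 */
extern const float pulse_shape[MAX_PERIODS];  /* normalized shaping envelope */
extern float duty_table[MAX_PERIODS];
extern volatile int pulse_periods;

void configure_pulse(int ch)
{
    /* 1. Relative channel volume sets the pulse length, in resonant periods. */
    pulse_periods = (int)(channel_volume[ch] * MAX_PERIODS);

    /* 2. Individual note volume and 3. master volume scale the duty cycles
       that the pulse timer ISR steps through while this pulse is active. */
    for (int i = 0; i < pulse_periods; i++)
        duty_table[i] = master_volume * note_volume[ch] * pulse_shape[i];
}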

It's really nice to have such a high level of control over the pulse generation. For example, I also added a test mode that produces a single long pulse with gradually ramped duty cycle. This allows for making longer, quieter sparks with low power...good for testing.

I also got to set up an experiment that I've wanted to do ever since I got my Grasshopper 3 camera. The idea is to use the global shutter and external trigger capabilities of the industrial camera to image a Tesla coil arc at a precise time. Taking it one step further, I also have my favorite Tektronix 2445 analog oscilloscope and a current transformer. I thought that it would be really cool to have the scope trace of primary current and the arc in the same image at the same time, and then to sweep the trigger point along the duration of the pulse to see how they both evolve.

The setup for this was a lot of fun.

Camera is in the foreground, taped to a tripod because I lost my damn tripod adapter.
Using a picture frame glass as a reflective surface with a black background (and a neon bunny!).
I knew I wanted to keep the scope relatively far from the arc itself, but still wanted the image of the scope trace to appear near the spark and be in focus. So, I set up a reflective surface at a 45º angle and placed the scope about the same distance to the left as the arc is behind the mirror, so they could both be in focus. When imaged straight on, they appear side by side, but the scope trace is horizontally flipped, which is why the pulse progresses from right to left.

This picture is a longer exposure, so you can see the entire pulse on the scope. To make it sweep out the pulse and show the arc condition, I set the exposure to 20-50μs and had the trigger point sweep from the very start of the pulse to the end on successive pulses. So, each frame is actually a different pulse (should be clear from the arcs being differently-shaped) but the progression still shows the life cycle of the spark, including the ring-up period before the arc even forms. The pulse timer fires the trigger at the right point in the pulse through a GPIO on the microcontroller. Luckily, the trigger input on the camera is already optocoupled, so it didn't seem to have any issues with EMI.
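
The trigger generation is a small addition to the pulse timer sketch from earlier in this post. The pin number, sweep step, and GPIO helper below are made up, but the idea is just to fire the GPIO a set number of resonant periods into the pulse and advance that offset between frames:

extern volatile int pulse_active;              /* from the earlier sketch */
extern float duty_table[];
extern volatile int duty_index;
extern void set_gate_duty(float duty);
extern void gpio_write(int pin, int level);    /* hypothetical GPIO helper */

#define CAMERA_TRIGGER_PIN   5                 /* arbitrary pin number */
#define SWEEP_STEP_PERIODS   1                 /* trigger advance per frame */

static volatile int period_count;
static volatile int trigger_offset_periods;

/* Revised pulse timer ISR: also fires the camera trigger partway through. */
void pulse_timer_isr(void)
{
    if (pulse_active) {
        set_gate_duty(duty_table[duty_index++]);
        if (++period_count == trigger_offset_periods)
            gpio_write(CAMERA_TRIGGER_PIN, 1);     /* optocoupled camera input */
    }
}

/* Called when each pulse (and its ringdown) finishes. */
void pulse_finished(void)
{
    gpio_write(CAMERA_TRIGGER_PIN, 0);
    period_count = 0;
    trigger_offset_periods += SWEEP_STEP_PERIODS;  /* next frame: later in pulse */
}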

Seeing the pulse shape and how it relates to arc length is really interesting. It might be useful for tuning to be able to see the primary current waveform and arc images as different things are adjusted. No matter what, the effect is cool and I like that it can only really be done with old-school analog methods and mirrors (no smoke, yet).

Monday, September 21, 2015

GS3 / SurfaceCam Multipurpose Adapter

It's been a while since I've made anything purely mechanical, so I had a bit of fun this weekend putting together a multipurpose adapter for my Grasshopper 3 camera.

The primary function of the adapter is to attach the camera to a Microsoft Surface Pro tablet, which acts as the monitor and recorder via USB 3.0. I was going to make a simple adapter but got caught up in linkage design and figured out a way to make it pivot 180º to fold flat in either direction.

Some waterjet-cut parts and a few hours of assembly later, and it's ready to go. The camera cage has 1/4-20 mounts on top and bottom for mounting to a tripod, or attaching accessories, like an audio recorder in this case. There's even a MōVI adapter for attaching just the Surface Pro 2 to an M5 handlebar for stabilizer use. (The camera head itself goes on the gimbal inside a different cage, if I can ever find a suitable USB 3.0 cable.)

Anyway, quick build, quick post. Here are some more recent videos I've done with the GS3 and my custom capture and color processing software.

Plane spotting at SeaTac using the multipurpose adapter and a 75mm lens (250mm equivalent).

Slow motion weed trimming while testing out an ALTA prototype. No drones were harmed in the making of this video.

Freefly BBQ aftermath. My custom color processing software was still a bit of a WIP at this point.

Saturday, January 17, 2015

Three-Phase Color

I was doing a bit more work on my DirectX-based .raw image viewer when I came across a nice mathematical overlap with three-phase motor control theory. It has to do with conversion from red/green/blue (RGB) to hue/saturation/lightness (HSL), two different ways of representing color. Most of the conversion methods are piecewise-linear, with max(), min(), and conditionals to break up the color space. But I figured motors are round and color wheels are round, so maybe I would try applying a motor phase transform to [R, G, B] to see what happens.

The transform of interest is the Clarke transform, which converts a three-phase signal into two orthogonal components (α and β) and a zero-sequence component (γ) that is just the average of the three phases. In motor control with symmetric three-phase signals, γ is usually zero. Applied to [R, G, B], it's just the intensity, one measure of lightness.

In motor control, it's common to find the phase and magnitude of the vector defined by α and β, for example to determine the amplitude and electrical angle of sinusoidal back EMF in a PMSM. It turns out the phase and magnitude are useful in color space as well, representing the hue and saturation, respectively. It might not be exactly adherent to the definition of these terms, but rather than rambling on about hexagons and circles, I'll just say it is close enough for me. (The Wikipedia article's alternate non-hexagon hue (H2) and chroma (C2) calculation is exactly the Clarke transform and magnitude/phase math.)
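
Concretely, the amplitude-invariant Clarke transform applied to [R, G, B], with hue and saturation taken as the phase and magnitude of the (α, β) vector, is exactly what the shader code below computes:

$$\begin{bmatrix} \alpha \\ \beta \\ \gamma \end{bmatrix} =
\begin{bmatrix} \tfrac{2}{3} & -\tfrac{1}{3} & -\tfrac{1}{3} \\ 0 & \tfrac{1}{\sqrt{3}} & -\tfrac{1}{\sqrt{3}} \\ \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix},
\qquad H = \operatorname{atan2}(\beta, \alpha), \qquad S = \sqrt{\alpha^2 + \beta^2}$$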

So I added this hue and saturation adjustment method to the raw viewer's pixel shader:

I'm particularly happy about the fact that it occupies barely 15 lines of HLSL code:

// Clarke Transform Color Processing:
// alpha and beta are the two orthogonal chroma components; gamma is the
// zero-sequence component (the average of R, G, B), i.e. the intensity.
c_alpha = 0.6667f * tempcolor.r - 0.3333f * tempcolor.g - 0.3333f * tempcolor.b;
c_beta = 0.5774f * tempcolor.g - 0.5774f * tempcolor.b;
c_gamma = 0.3333f * tempcolor.r + 0.3333f * tempcolor.g + 0.3333f * tempcolor.b;
// Hue and saturation are the phase and magnitude of the (alpha, beta) vector;
// adjust them, then convert back to alpha and beta.
c_hue = atan2(c_beta, c_alpha);
c_sat = sqrt(pow(abs(c_alpha), 2) + pow(abs(c_beta), 2));
c_sat *= saturation;
c_hue += hue_shift;
c_alpha = c_sat * cos(c_hue);
c_beta = c_sat * sin(c_hue);

// Inverse Clarke transform back to RGB.
tempcolor.r = c_alpha + c_gamma;
tempcolor.g = -0.5f * c_alpha + 0.8660f * c_beta + c_gamma;
tempcolor.b = -0.5f * c_alpha - 0.8660f * c_beta + c_gamma;

I doubt it's the most computationally efficient way to do it (with the trig and all), but it does avoid a bunch of conditionals from the piecewise methods. And as I mentioned in the last post, the pixel shader is far from the performance bottleneck of the viewer right now.

Updated HLSL Source: debayercolor.fx

Updated Viewer Source (VB 2012 Project):
Built for .NET 4.0 64-bit. Requires the SlimDX SDK.

And for fun, here's some 150fps video of a new kitchen appliance I just received and hope to put to good use soon:

Saturday, December 27, 2014

Fun with Pixel Shaders

One of the things that saved my ass for MIT Mini Maker Faire was SlimDX, a modern (.NET 4.0) replacement for Microsoft's obsolete Managed DirectX framework. I was only using it to replace the managed DirectInput wrapper I had for interfacing to USB HID joysticks for controlling Twitch and 4pcb. For that it was almost a drop-in replacement. But the SlimDX framework also allows for accessing almost all of DirectX from managed .NET code.

I've never really messed with DirectX, even though at one point long ago I wanted to program video games and 3D stuff. The setup always seemed daunting. But with a managed .NET wrapper for it, I decided to give it a go. This time I'm not using it to make video games, though. I'm using it to access GPU horsepower for simple image processing.

The task is as follows:

The most efficient way to capture and save images from my USB3.0 camera is as raw images. The file is a binary stream of 8-bit or 12-bit pixel brightnesses straight from the Bayer-masked sensor. If one were to convert these raw pixel values to a grayscale image, it would look like this:

Zoom in to see the checkerboard Bayer pattern. It's lost a bit in translation to a grayscale JPEG, but you can still see it on the car and in the sky.

The Bayer filter encodes color not on a per-pixel basis, but in the average brightness of nearby pixels dedicated to each color (red, green, and blue). This always seemed like cheating to me; to arrive at a full color, full resolution image, 200% more information is generated by interpolation than was originally captured by the sensor. But the eye is more sensitive to brightness than to color, so it's a way to sense and encode the information more efficiently.

Anyway, deriving the full color image from the raw Bayer-masked data is a bit of a computationally-intensive process. In the most serial implementation, it would involve four nested for loops to scan through each pixel looking at the color information from its neighboring pixels in each direction. In pseudo-code:

// Scan all pixels.
for y = 0 to (height - 1)
 for x = 0 to (width - 1)
  Reset weighted averages.  

  // Scan a local window of pixels.
  for dx = -N to +N
   for dy = -N to +N
    brightness = GetBrightness(x+dx, y+dy)
    Add brightness to weighted averages.

  Set colors of output (x, y) by weighted averages.


The window size (2N+1)x(2N+1) could be 3x3 or 5x5, depending on the algorithm used. More complex algorithms might also have conditionals or derivatives inside the nested for loop. But the real computational burden comes from serially scanning through x and y. For a 1920x1080 pixel image, that's 2,073,600 iterations. Nest a 5x5 for loop inside of that and you have 51,840,000 times through. This is a disaster for a CPU. (And by disaster, I mean it would take a second or two...)

But the GPU is ideally-suited for this task since it can break up the outermost for loops and put them onto a crapload of miniature parallel processors with nothing better to do. This works because each pixel's color output is independent - it depends only on the raw input image. The little piece of software that handles each pixel's calculations is called a pixel shader, and they're probably the most exciting new software tool I've learned in a long time.
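
To make the contrast with the serial pseudo-code concrete, here's roughly what the per-pixel function looks like, written as C (bayer_color() and kernel[] are stand-ins; the real thing is the HLSL shader linked at the end of this post). The GPU runs something like this once per output pixel, with no outer x/y loops:

#define N     2                      /* 5x5 window */
#define WIDTH 1920

extern int bayer_color(int x, int y);            /* 0=R, 1=G, 2=B at that site */
extern const float kernel[3][2*N+1][2*N+1];      /* interpolation weights */

void shade_pixel(int x, int y, const unsigned char *raw, float rgb_out[3])
{
    float sum[3] = {0, 0, 0}, weight[3] = {0, 0, 0};

    /* Weighted average of nearby same-color sites, per output color.
       (Edge handling omitted for brevity.) */
    for (int dy = -N; dy <= N; dy++) {
        for (int dx = -N; dx <= N; dx++) {
            int c = bayer_color(x + dx, y + dy);
            float k = kernel[c][dy + N][dx + N];
            sum[c]    += k * raw[(y + dy) * WIDTH + (x + dx)];
            weight[c] += k;
        }
    }
    for (int c = 0; c < 3; c++)
        rgb_out[c] = sum[c] / weight[c];
}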

For my very first pixel shader, I've written a simple raw image processor. I know good solutions for this already exist, and previously I would use IrfanView's Formats plug-in to do it. But it's much more fun to build it from scratch. I'm sure I'm doing most of the processing incorrectly or inefficiently, but at least I know what's going on under the hood.

The shader I wrote has two passes. The first pass takes as its input the raw Bayer-masked image and calculates R, G, and B values for each pixel based on the technique presented in this excellent Microsoft Research technical article. It then does simple brightness, color correction, contrast, and saturation adjustment on each pixel. This step is a work-in-progress as I figure out how to define the order of operations and what techniques work best. But the framework is there for any amount of per-pixel color processing. One nice thing is that the pixel shader works natively in single-precision floating point, so there's no need to worry about bit depth of the intermediate data for a small number of processing steps.

The second pass implements an arbitrary 5x5 convolution kernel, which can be used for any number of effects, including sharpening. The reason this is done as a second pass is because it requires the full-color output image of the first pass as its input. It can't be done as part of a single per-pixel operation with only the raw input image. So, the first pass renders its result to a texture (the storage type for a 2D image), and the second pass references this texture for its 5x5 window. The output of the second pass can either be rendered to the screen, or to another texture to be saved as a processed image file, or both.
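
The per-pixel work in the second pass looks something like this (again a C sketch of the idea, not the actual HLSL; in the real shader the first pass's output is sampled from a texture rather than an array):

#define WIDTH 1920
extern const float kernel5x5[5][5];          /* e.g. a sharpening kernel */

void convolve_pixel(int x, int y, const float *rgb_in, float rgb_out[3])
{
    /* rgb_in is the full-color result of the first pass; edge handling
       omitted for brevity. */
    for (int c = 0; c < 3; c++) {
        float acc = 0.0f;
        for (int dy = -2; dy <= 2; dy++)
            for (int dx = -2; dx <= 2; dx++)
                acc += kernel5x5[dy + 2][dx + 2] *
                       rgb_in[((y + dy) * WIDTH + (x + dx)) * 3 + c];
        rgb_out[c] = acc;                    /* one output color channel */
    }
}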

What a lovely Seattle winter day.
Even though the pixel shader does all of the exciting work, there's still the matter of wrapping the whole thing in a .NET project with SlimDX providing the interface to DirectX. I did this with a simple VB program that has a viewport, folder browser, and some numeric inputs. For my purposes, a folder full of raw images goes together as a video clip. So being able to enumerate the files, scan through them in the viewer, and batch convert them to JPEGs was the goal.

Hrm, looks kinda like all my other GUIs...

As it turns out, the pixel shader itself is more than fast enough for real-time (30fps) processing. The time consuming parts are loading and saving files. Buffering into RAM would help if the only goal was real-time playback, but outputting to JPEGs is never going to be fast. As it is, for a 1920x1200 image on my laptop, the timing is roughly 30ms to load the file, an immeasurably short amount of time to actually run both pixel shader passes, and then 60ms to save the file. To batch convert an entire folder of 1000 raw images to JPEG, including all image processing steps, took 93s (10.75fps), compared to 264s (3.79fps) in IrfanView.

Real-time scrubbing on the MS Surface Pro 2, including file load and image processing, but not including saving JPEGs.

There are probably ways to speed up file loading and saving a bit, but even as-is it's a good tool for setting up JPEG image sequences to go into a video editor. The opportunity to do some image processing in floating point, before compression, is also useful, and it takes some of the load off the video editor's color correction.

I'm mostly excited about the ability to write GPU code in general. It's a programming skill that still feels like a superpower to me. Maybe I'll get back into 3D, or use it for simulation purposes, or vision processing. Modern GPUs have more methods available for using memory that make the cores more useful for general computing, not just graphics. As usual with new tools, I don't really know what I'll use it for, I just know I'll use it.

If you're interested in the VB project and the shader source (both very much works-in-progress), they are here:

VB 2012 Project:
Built for .NET 4.0 64-bit. Requires the SlimDX SDK.

HLSL Source: debayercolor.fx

P.S. I chose DirectX because it's what I have the most experience with. (I know somebody will say, "You should use OpenGL.") I'm sure all of this could be done in OpenGL / GLSL as well.

Saturday, October 18, 2014

MIT Mini Maker Faire 2014

I made a quick trip back to Boston/Cambridge for the first ever MIT Mini Maker Faire. Recap and pictures below, but first here is some video from the event:

As expected from an MIT Maker Faire, there were lots of electric go-karts, Tesla coils, 3D printers, robots, and... scooter...things.
To this I contributed a set of long-time Maker Faire veteran projects (Pneu Scooter, 4pcb, Twitch) and a couple of new things (Talon v2 multirotor, Grasshopper3 Camera Rig). I always like to bring enough projects that if some don't work or can't be demonstrated, I have plenty of back-ups. Fixing stuff on the spot isn't really possible when you have a constant stream of visitors. But I've been to a number of Maker Faires and decided the maximum number of projects I can keep track of is five. Especially since this time I had to be completely mobile, as in airline mobile.

Arriving at the venue, MIT's North Court, luggage in the foreground, MIT Stata Center in the background.
The travel requirement meant that, unfortunately, tinyKart transport was out. (Although it is theoretically feasible to transport it for free via airline except for the battery and the seat...) But Pneu Scooter is eminently flyable and in fact has been all over the world in checked baggage already. It collapses to about 30" long and weighs 18lbs. The battery is well within TSA travel limits for rechargeable lithium ion batteries installed in a device. Oh, and Twitch fits right between the deck and the handlebar:

It was definitely designed that way...
Pneu Scooter and Twitch are really all I should ever bring to Maker Faires. They are low-maintenance and very reliable; both have survived years of abuse. In fact, Pneu Scooter is almost four years old now...still running the original motor and A123 battery pack, and still has a decent five-mile range. (I range-tested it before I left.) It's been through a number of motor controllers and wheels though. Because the tires are tiny, it's always been a pain in the ass accessing the Schrader valves with normal bike pumps. Turns out it just took five minutes of Amazon Googling (Is that a thing?) to solve that problem:

Right-angle Schrader check valve. Why have I not had this forever?
Pneu Scooter survived the rainy Faire with no issues - it's been through much worse. I participated in the small EV race featuring 2- and 3-wheel vehicles. Unfortunately I didn't get any video of it, but Pneu Scooter came in third or something...I wasn't keeping track and nobody else was either. Mostly I was occupied by trying to avoid being on the outside of the drift trike:

Yes, those red wheels are casters...
But if I had to pick one project that I could pretty much always count on for Maker Faire duty, it's Twitch.

Despite the plastic Vex wheels, Twitch has been pretty durable over the years. I had planned to spend a few days rebuilding it since I thought one of the motors was dead, but when I took it off the shelf to inspect, it was all working fine. In fact, the only holdup for getting it Faire-ready was that the DirectInput drivers I have been using since .NET 1.0 to read in Twitch's USB gamepad controller are no longer supported by Windows 7/8. Yes, Twitch outlasted a Microsoft product lifecycle... Anyway, after much panic, I found a great free library called SlimDX that offers an API very similar to the old Managed DirectX library, so I was back in action.

Basically, Twitch is an infinite source of entertainment. I spent a lot of the Faire just driving it around the crowd from afar and watching people wonder if it's autonomous... I would also drive it really slowly in one orientation, wait for a little kid to attempt to step on it (they always do), and then change it to the other orientation and dart off sideways. And then there is just the normal drifting and driving that is unlike any other robot most people have seen. I found an actual use for the linkage drive too - when it would get stuck with two wheels on pavement and two wheels on grass, it was very easy to just rotate the wheels 90º and get back on the walkway. Seattle drivers need this feature for when it snows...

Twitch is definitely my favorite robot. Every time I take it out, it gets more fun to drive. I have 75% of the parts I need to make a second, more formidable version... This Maker Faire was enough to convince me that it needs to be finished.

4pcb was a bit of a dud. I don't know if my standards for flying machines have just gotten higher or if it always flew as bad as it did during my pre-Faire flight test. It still suffers from a really, really bad structural resonance that kills the efficiency and messes with the gyros.

It was one of the first, or maybe the first PCB quadrotor with brushless motor drivers. But the Toshiba TB6588FG drivers are limited in what they can do, as is the Arduino Pro Mini that runs the flight control. Basically, it's time for a v2 that leverages some new technology and also improves the mechanical design - maybe going to 5" props as well. We'll see...

And unfortunately, because of the rain and crowds, I didn't get to do any aerial video with my new Talon copter. But it looks good and works quite nicely, for a ~$300 all-up build. (Not including the GoPro.) Here's some video I shot with my dad in North Carolina that I had queued up to show people at the Faire.

Talon v2, son of Kramnikopter.
Electric linkage drive scooter my Kickstarter plz.
The last thing I brought for the Faire was my Grasshopper 3 camera setup with the custom recording software I've been working on for the Microsoft Surface. With this and the Edgerton Center's new MōVI M5, I got to do a bit of high speed go-kart filming and other Maker Faire documentation. The videos above were all created with this setup.

I had a stand, but this seemed easier at the time...
As a mobile, stabilized, medium-speed camera (150fps @ 1080p), it really works quite nicely. I know the iPhone 6+ now has slow-mo and O.I.S., but it's way more fun to play with gimbals and raw 350MB/s HD over USB3. Of course it meant I had 200GB of raw video to go through by the end of the Faire. I did all of the video editing in Lightworks using a JPEG timeline. (Each frame is a JPEG in a folder...somehow it manages to handle this without rendering.)

And that's pretty much it. It was much like other Maker Faires I've been to: lots of people asking questions with varying degrees of incisiveness ("Is that a drone?"), crazy old guys who come out right before the Faire ends to talk to you about their invention, and little kids trying to ride or touch things that they shouldn't be trying to ride or touch while their parents encourage them. Although I did get one or two very insightful kids who came by on their own and asked the most relevant questions, which gives me hope for the future. It was great to return to Cambridge and see everyone's cool new projects as well.

My MIT Maker Faire 2014 fleet by the numbers:
Projects: 5
Total Weight: ~75lbs
Total Number of Wheels: 6 (not including omniwheel rollers)
Total Number of Props: 8
Total Number of Motors: 18 (not including servos), 1 of my own design
Total Number of Motor Controllers: 18 (duh...), 16 of my own design!
Total Number of Cameras: 2

And here are some more pictures from the Faire:

I'm not sure what this is.
Dane's segboard, Flying Nimbus, which I got to ride. It actually has recycled Segstick parts!
Good old MITERS, where you can't tell where the shelves end and the floor begins!
Ed Moviing. I finally figured out a good way to power wireless HD transmitters...can you see?
A small portion of the EVs, lining up for a picture or a race or something.
Of course there were Tesla coils.
Flying out of Boston after the Faire, got a great Sunday morning view.

Sunday, September 7, 2014

Grasshopper3: Circular Buffer, Post Triggering, and Continuous Modes

Previously I have implemented a bare-bones RAM buffering program for the Grasshopper3 USB3 camera. The idea was to strip out all operations other than transferring raw image data from the USB3 port to RAM, so that the full frame rate of the camera can be buffered even on a Microsoft Surface 2 tablet. While the RAM buffer is filling, no image conversion or saving to disk operations are going on. The GUI is running on another thread, and the image preview rate is held to 30Hz.

One-shot linear buffer with pre-trigger: 1) After triggering, frames are transferred into RAM (yellow). 2) RAM buffer is full. 3) Images are converted and saved to disk (not in real-time).
At the time I also tried to implement a circular buffer, where the oldest images are continuously overwritten in RAM. This allows for post-triggering, a common feature of high-speed cameras. The motivation for post-triggering is that the buffer is short (order of seconds) but you don't know when the exciting thing is going to happen. So you capture image data at full frame rate, continuously overwriting the oldest images in the buffer, until something exciting happens. The trigger can come after the action, stopping the image data capture and locking the previous N image frames in the buffer. The entire buffer can then be saved starting from the oldest image and ending at the trigger.
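
In sketch form (C here just to show the logic; the actual program is C#), the circular buffer and post-trigger look something like this, with save_raw_to_disk() standing in for the real conversion/saving step:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define N_FRAMES 1024                 /* buffer length, set by available RAM */
static uint8_t *frames[N_FRAMES];     /* raw frame slots, constructed on first pass */
static int head = 0;                  /* next slot to (over)write */
static int filled = 0;                /* number of slots holding valid data */

extern void save_raw_to_disk(const uint8_t *frame, size_t len);  /* hypothetical */

void buffer_frame(const uint8_t *img, size_t len)
{
    if (frames[head] == NULL)         /* first time through: construct the slot */
        frames[head] = malloc(len);
    memcpy(frames[head], img, len);   /* later passes: just copy data in */
    head = (head + 1) % N_FRAMES;
    if (filled < N_FRAMES)
        filled++;
}

/* Called on the post-trigger: save oldest-first, ending at the trigger. */
void save_buffer(size_t len)
{
    int start = (head - filled + N_FRAMES) % N_FRAMES;
    for (int i = 0; i < filled; i++)
        save_raw_to_disk(frames[(start + i) % N_FRAMES], len);
}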

Circular buffer with post-trigger: 1) Buffer begins filling with frames. 2) After full, the oldest frame is overwritten with the newest. 3) A post-trigger stops buffering new frames and starts saving the previous N frames to disk.
It didn't work the first time I tried it; the frame rate would drop after the first time through the buffer. But I did a little code cleanup - now the first time through it constructs the array elements that make up the frame buffer, but on subsequent passes it assumes the structures are already in place and just transfers in data. This makes a wonderful flat-top trapezoidal wave of RAM usage that corresponds exactly with the allocated buffer size:

Post-triggering is not the only thing a circular buffer structure is good for. It can also be used as the basis for robust (buffered) continuous saving to disk. Assuming a fast enough write speed, frames can be continuously taken out of the buffer on a First-In First-Out (FIFO) basis and written to disk. I say "disk" but in order to get fast enough write speeds it really does need to be a solid-state drive. And even then it is a challenge.

For one, the sequential write speed of even the fastest SSDs struggles to keep up with USB3. To achieve maximum frame rate, the saved images must be in a raw format, both to keep the size down (one color per pixel, not de-Bayered) and to avoid having the processor bottleneck the entire process during image conversion. Luckily there is an option to just spit out raw Bayer data in 8- or 12-bit-per-pixel formats. IrfanView (my favorite image viewer) has a plug-in that is capable of parsing and color-processing raw images. The plug-in also works with the batch conversion portion of IrfanView, so you can convert an entire folder of raw frames.

The other challenge is that the operations required to save to disk take up processor time. In the FlyCap2 software that comes with the camera, the image capture loop has no trouble running at full frame rate, but turning on the saving operation causes the frame processing rate to drop on my laptop and especially on the MS Surface 2. To try to combat this problem, I did something I've never done before: actually intentionally write a multi-threaded application the right way. The image capture loop runs on one thread while the save loop runs on a separate thread. (And the GUI runs on an entirely different thread...) This way, a slow-down on the saving thread ideally doesn't cause a frame rate drop on the capture thread. The FIFO might fill up a little, but it can catch up later.
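
The hand-off between the two threads can be sketched as a single-producer, single-consumer FIFO (again in C just to show the logic; the real code is the C# project built on the FlyCapture2 SDK, and the names here are made up):

#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>

#define FIFO_SIZE 1024
static uint8_t *fifo[FIFO_SIZE];
static atomic_int write_idx, read_idx;    /* one writer thread, one reader thread */

/* Capture thread: push every frame as it arrives from the camera. */
int fifo_push(uint8_t *frame)
{
    int w = atomic_load(&write_idx);
    if ((w + 1) % FIFO_SIZE == atomic_load(&read_idx))
        return 0;                         /* FIFO full: capture is too far ahead */
    fifo[w] = frame;
    atomic_store(&write_idx, (w + 1) % FIFO_SIZE);
    return 1;
}

/* Save thread: pop and write to disk as fast as the SSD allows. */
uint8_t *fifo_pop(void)
{
    int r = atomic_load(&read_idx);
    if (r == atomic_load(&write_idx))
        return NULL;                      /* FIFO empty: saving has caught up */
    uint8_t *frame = fifo[r];
    atomic_store(&read_idx, (r + 1) % FIFO_SIZE);
    return frame;
}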

Continuous saving: Images are put into the RAM buffer on one thread (yellow) and removed from it to write to disk on another thread (green). This happens simultaneously and continuously as long as the disk write can keep up.
There's another interesting twist to the continuous-saving circular buffer: the frame rate in doesn't necessarily have to equal the frame rate out. For example, it's possible to buffer into RAM at 150fps but save only every 5th frame, for 30fps output. Then, if something exciting happens, the outgoing rate can be switched to 150fps temporarily to capture the high-speed action. If the write-to-disk thread can't keep up, the FIFO grows in size. As long as the outgoing rate is switched back to 30fps before the buffer is full, the excess FIFO elements can be unloaded later.
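
In sketch form, the rate divider is just a counter in front of the FIFO from above (burst_active and the divide-by-5 are placeholders for whatever the GUI actually sets):

extern int burst_active;                  /* set by the post-trigger in the GUI */

void on_frame_buffered(uint8_t *frame)
{
    static int count = 0;
    int divider = burst_active ? 1 : 5;   /* 150fps vs 30fps to disk */
    if (count++ % divider == 0)
        fifo_push(frame);                 /* queue for the save thread */
    /* If the save thread falls behind, the FIFO just grows toward full;
       dropping back to the divided rate lets it catch up later. */
}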

The key parameter for this continuous saving mode is the number of frames of delay between the incoming and the outgoing thread. The target delay could be zero, but then you would have to know in advance if you want to switch to high-speed saving. Setting the target delay to some number part-way through the buffer allows for post-triggering of the high-speed saving period, which seems more useful. I added a buffer graphic to the GUI to show both the target and the actual saving delay during continuous saving. My mind still has trouble thinking about when to turn the frame rate divider on and off, but I think it could be useful in some cases.

Here's some video I took to try out these new modes. It's all captured on the Microsoft Surface 2, so no fancy hardware required and it's all still very portable.

This is a simple test of the circular buffer at 1080p150 with post-trigger. The coin in particular is a good example of when it's nice to just leave the buffer running and post-trigger after a lucky spin gets the coin to land in frame.

More coin spinning, but this time using the continuous saving mode. Frames go into RAM at 150fps, but normally they are only written to disk at 30fps. When something interesting happens (such as the coin actually being in frame...), a burst of 150fps writing to disk is triggered. On the Surface 2, the write thread is slower than the read thread, so it can only proceed for a little while until the FIFO gets too full. Switching back to 30fps saving allows the FIFO to catch up.

Finally, a quick test of lower resolution / higher frame rate operation. At 480p, the frame rate can get up to 360+ fps. Buffering is fine at this frame rate (the overall data rate is actually lower). It actually doesn't require an insane amount of light either - the iPhone display is the only source of light here. You can see its IR proximity sensor LED flashing, as well as individual frame transitions on the display, behind the water stream. The maximum frame rate goes all the way up to 1100+ fps at 120p, something I have yet to try out.

That's it for now. The program (which started out as the FlyCapture2SimpleGUI source that comes with the camera) has a nice VC# GUI:

I can't distribute the source since it's derived from the proprietary SDK, but now you know it's possible and relatively easy to get it to capture and save efficiently with a bit of good programming. It was a fun project since I've never intentionally written interacting multi-threaded programs, other than maybe separating GUI threads from other things. I guess I'm only ten years or so behind on my application programming skills now...

Monday, May 5, 2014

Grasshopper3 Mobile Setup

I've now got a complete mobile setup working for the Grasshopper3 camera that I started playing with last week, and I took it for a spin during the Freefly company barbecue. (It's Seattle, so barbecues are indoor events. And it's Freefly, so there are RC drift cars everywhere.)

Since the camera communicates to a Windows or Linux machine over USB 3.0, I went looking for small USB 3.0-capable devices to go with it. There are a few interesting options. My laptop, which I carry around 90% of the time anyway, is the default choice. It is technically portable, but it's not really something you could use like a handheld camcorder. 

The smallest and least expensive device I found, thanks to a tip in last post's comments, is the ODROID-XU. At first I was skeptical that this small embedded Linux board could work with the camera, but there is actually a Point Grey app note describing how to set it up. The fixed 2GB of RAM would be limiting for buffering at full frame rate. And there is no SATA, since the single USB3.0 interface is intended for fast hard drives. So it would be limited to recording short bursts or at low frame rates, I think. But for the price it may still be interesting to explore. I will have to become a Linux hacker some day.

The Intel NUC, with a 4"x4" footprint, is another interesting choice if I want to turn it into a boxed camera, with up to 16GB of RAM and a spot for an SSD. The camera's drivers are known to work well on Intel chipsets, so this would be a good place to start. It would need a battery to go with it, but even so the resulting final package would be pretty compact and powerful. The only thing that's missing is an external monitor via HDMI out.

My first idea, and the one I ended up going with, is the Microsoft Surface Pro 2:

The Grasshopper3 takes better pictures at 150fps than my phone does stills.
Other than a brief mention in a Point Grey app note, there wasn't any documentation that convinced me the Surface Pro 2 would work, but it has an Intel i5-4300 series, 8GB of RAM, and USB 3.0, so it seemed likely. And it did work, although at first not quite as well as my laptop (which is an i7-3740QM with 16GB of RAM). Using the FlyCapture2 Viewer, I could reliably record 120fps on the laptop, and sometimes if I kill all the background processes and the wind is blowing in the right direction, 150fps. On the Surface, those two numbers were more like 90fps and 120fps. Understandable, if the limitation really is processing power.

I also could not install the Point Grey USB 3.0 driver on the Surface. I tried every trick I know for getting third-party drivers to install in Windows: disabling driver signing (even though they are signed drivers), modifying the .INF to trick Windows into accepting that it was in fact a USB 3.0 driver, turning off Secure Boot and UEFI mode, and forcing the issue by uninstalling the old driver. No matter what, Windows 8.1 would not let me change drivers. I read on that internet thing that Windows 8 has its own integrated USB 3.0 driver, even though it still says Intel in the driver name. Anyway, after a day of cursing at Windows 8 refusing to let me do a simple thing, I gave up on that approach and started looking at software.

The FlyCapture2 Viewer is a convenient GUI for streaming and saving images, but it definitely has some weird quirks. It tries to display images on screen at full frame rate, which is silly at 150fps. Most monitors can't keep up with that, and it's using processing power to convert the image to a GDI bitmap and draw graphics. The program also doesn't allow pure RAM buffering. It always tries to convert and save images to disk, filling the RAM buffer only if it is unable to do so fast enough. At 150fps, this leads to an interesting memory and processor usage waveform:

Discontinuous-mode RAM converter.
During the up slope of the memory usage plot, the program is creating a FIFO buffer in RAM and simultaneously pulling images out, converting them to their final still or video format, and writing them to disk. During the down slope, recording has stopped and the program finishes converting and saving the buffer. You can also see from the processor usage that even just streaming and displaying images without recording (when the RAM slope is zero) takes up a lot of processor time.

The difference between the up and down slopes is the reason why there needs to be a buffer. Hard disk speed can't keep up with the raw image data. An SSD like the one on the Surface Pro 2 has more of a chance, but it still can't record 1:1 at 150fps. It can, however, operate continuously at 30fps and possibly faster with some tweaking.

But to achieve actual maximum frame rate (USB 3.0 bus or sensor limited), I wanted to be able to 1) drop display rate down to 30fps and 2) only buffer into RAM, without trying to convert and save images at the same time. This is how high-speed cameras I've used in the past have worked. It means you get a limited record time based on available memory, but it's much easier on the processor. Converting and saving is deferred until after recording has finished. You could also record into a circular RAM buffer and use a post trigger after something exciting happens. Unfortunately, as far as I could tell, the stock FlyCapture2 Viewer program doesn't have these options.

The FlyCapture2 SDK, though, is extensive and has tons of example code. I dug around for a while and found the SimpleGUI example project was the easiest to work with. It's a Visual C# .NET project, a language I haven't used before but since I know C and VB.NET, it was easy enough to pick up. The project has only an image viewer and access to the standard camera control dialog, no capacity to buffer or save. So that part I have been adding myself. It's a work-in-progress still, so I won't post any source yet, but you can see the interface on this contraption:

Part of the motivation for choosing the Surface was so I could make the most absurd Mōvi monitor ever.
To the SimpleGUI I have just added a field for frame buffer size, a Record button, and a Save Buffer button. In the background, I have created an array of images that is dynamically allocated in RAM as it fills up with raw images from the camera. I also modified the display code to only run at a fraction of the camera frame rate. (The code is well-written and places display and image capture on different threads, but I still think lowering the display rate helps.)

Once the buffered record is finished, Save Buffer starts the processor-intensive work of converting the image to its final format (including doing color processing and compression). It writes the images to a folder and clears out the RAM buffer as it goes. With the Surface's SSD, the write process is relatively quick. Not quite 150fps quick, but not far off either. So you record for 10-20 seconds, then save for a bit longer than that. Of course, you can still record continuously at lower frame rates using the normal FlyCapture2 Viewer. But this allows even the Surface to hit maximum frame rate.

All hail USB 3.0.
I just have to worry about cracking the screen now.
There are still a number of things I want to add to the program. I tested the circular buffer with post-trigger idea but couldn't get it working quite the way I wanted yet. I think that is achievable, though, and would make capturing unpredictable events much easier. I also want to attempt to write my own simultaneous buffering and converting/saving code to see if it can be any faster than the stock Viewer. I doubt it will but it's worth a try. Maybe saving raw images without trying to convert formats or do color processing is possible at faster rates. And there are some user interface functions to improve on. But in general I'm happy with the performance of the modified SimpleGUI program.

And I'm happy with the Grasshopper3 + Surface Pro 2 combo in general. They work quite nicely together, since the Surface combines the functions of monitor and recorder into one relatively compact device. The real enabler here is USB 3.0, though. It's hard to even imagine the transfer speeds at work. At 350MB/s, at any given point in time there are 10 bits, more than an entire pixel, contained in the two feet of USB 3.0 cable going from the camera to the Surface.

The sheer amount of data being generated is mind-boggling. For maximum frame rate, the RAM buffer must save raw images, which are 8 bits per pixel at 1920x1200 resolution. Each pixel has a color defined by the Bayer mask. (Higher bit depths and more advanced image processing modes are available at lower frame rates.) On the Surface, this means about 18 seconds of 150fps buffering, at most.
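
The arithmetic works out roughly like this (the usable RAM figure is my estimate; the rest follows from the numbers above):

$$1920 \times 1200 \times 1\,\mathrm{B} \approx 2.3\,\mathrm{MB/frame}, \qquad 2.3\,\mathrm{MB/frame} \times 150\,\mathrm{fps} \approx 350\,\mathrm{MB/s}$$

$$\sim\!6\,\mathrm{GB\ usable\ RAM} \div 2.3\,\mathrm{MB/frame} \approx 2600\ \mathrm{frames} \;\Rightarrow\; 2600 \div 150\,\mathrm{fps} \approx 18\,\mathrm{s}$$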

There are a variety of options available for color processing the raw image, and after color processing it can be saved as a standard 24-bit bitmap, meaning 8 bits of red, green, and blue for each pixel. In this format, each frame is a 6.6MB file. This fills up the 256GB SSD after just four minutes of video... So a better option might be to save the frames as high-quality JPEGs, which seems to offer about a 10:1 compression. Still, getting frame data off the Surface and onto my laptop for editing seemed like it would be a challenge.

Enter the RAGE.
USB 3.0 comes to the rescue here as well, though. There exist many extremely fast USB 3.0 thumb drives now. This 64GB one has a write speed of 50MB/s and a read speed nearing 200MB/s (almost as fast as the camera streams data). And it's not even nearly the fastest one available. The read speed is so fast that it's actually way better for me to just edit off of this drive than transfer anything to my laptop's hard drive.

Solid- I mean, Lightworks.
Lightworks seems to handle .bmp or .jpg frame folders nicely, importing continuously-numbered image sequences without a hassle. If they're on a fast enough disk, previewing them is no problem either. So I can just leave the source folders on the thumb drive - a really nice workflow, actually. When the editing is all done, I can archive the entire drive and/or just wipe it and start again.

While I was grabbing the USB 3.0 thumb drive, I also found this awesome thing:

It's a ReTrak USB 3.0 cable, which is absolutely awesome for Mōvi use - even better than the relatively flexible $5 eBay cable I bought to replace the suspension bridge cable that Point Grey sells. It's extremely thin and flexible, so as to impart minimum force onto the gimbal's stabilized payload. I'm not sure the shielding is sufficient for some of the things I plan to do with it, though, so I'll keep the other cables around just in case...

That's it for now. Here's some water and watermelon: