Thursday, June 16, 2016

Twitch X - Servoless Linkage Drive

I've got a new bot.

*Twitch* 
I'm not ready to call it 100% finished yet, but, most importantly, it drives!


It can pull off the trickiest bit of linkage drive maneuvering, Translation While axis-swITCHing, which proves it is possible to control all four degrees of freedom at the same time with just the four motors. That's a much better ratio of actuators to degrees of freedom than Twitch, Jr, which relied on two giant servos to steer the linkages. To steal a term from another Twitch, it's a "holonom-ish" drivetrain: able to fully control all three of its planar degrees of freedom, sort-of, in some cases, with an extra degree of freedom just for fun (and for generating more pushing power in a particular direction when desired). In reality, it doesn't have a practical advantage over some other drivetrains, but I've driven lots of different types of robots and this is by far the most fun.

I've Missed Going Home with Aluminum Chips in My Hair

The build was relatively quick and easy.

It helps when most of your parts are topologically similar waterjet-cut plates.
There were a few minor issues...can you spot the one in this picture? (Not the small linkages. Those are just from Twitch, Jr. for comparison.)
The main actual fabrication required was making a few turned parts on my tinyLathe.

Poor tinyLathe.
These were the posts and keyed hubs for the wheels. The posts hold the thrust and radial bearings on which the drive units swivel. They're straight from tinyKart's steering system, so they should be very overkill for this robot.


Other than that, there was just a lot of finish-drilling and countersinking. Oh, and some sketchy sheet metal bending that came out surprisingly well (using 5052 aluminum instead of 6061, for better forming properties).


The dimensional accuracy of the build wasn't quite as good as I was hoping, mostly due to lazy machining on my part. The top and bottom plate don't quite drop on with a satisfying zero-force slip fit, but with the bearings it really doesn't matter much. The main mechanical problem I ran into was not leaving any clearance for the rounded-off linkage ends to the inside surface of the motor mounting blocks. This led to a bit of binding that I thought was due to frame alignment issues but in fact was easily solved with a belt sander.

By virtue of careful design and forethought (read: complete luck), I actually mitigated one of the biggest deficiencies of Twitch X relative to Twitch, Jr.: the relative difficulty of changing out wheels. As it turns out, because of the way the wheel closeouts are shaped, it's just barely possible to put on and take off a wheel without removing the top and bottom plates. This is a huge win because the top and bottom plates have the most hardware, the trickiest alignment, and are attached to the two linkage position-sensing potentiometers (so, taking them off would usually require re-calibration).

You might also notice the magnetic hubs. More on these in a later post, I think.

It's also possible to access most of the linkage shoulder screws from the wheel wells by rotating the linkages to different positions, so if one comes loose it doesn't necessarily require taking the whole robot apart to fix it. The center of the robot is also relatively accessible (for adjusting a linkage pot or soldering a motor lead, for example) thanks to the lack of giant servos in the middle and the fact that the battery and controller are outside of the central section.

It's so...empty.
The Mystery of the Sin/Cos Link

Where the two servos would be are just two potentiometers, one attached to each diagonal linkage. In Twitch, Jr., the diagonal linkages are independent, but in order for servoless linkage drive to work, they need to be tied together (to reduce the total number of degrees of freedom to four). I was a bit naive in naming the link that ties them together the sin/cos link, thinking that it just caused one to sweep out asin(x) while the other swept out acos(1-x), or something like that. The actual trajectory is not that simple.

If you can figure out the function f, you will win a cookie.

In what might be a first for this blog, I actually don't have an analytical solution for it. I'm sure one exists, but I think it would be a messy bit of trig. Through some random guesswork and not wanting to rename the linkage, I found that the following parameterization is very nearly perfect:
x = [sin(θ1)]^K = 1 - [cos(θ2)]^K
The exponent K is determined by the geometry of the linkage and can be found by fitting to CAD-solved angles. For Twitch X, it's somewhere around 1.23456789. (I'm not joking.) You can have an extra cookie if you can explain this parameterization and how the exponent can be derived from the geometry. I actually haven't worked this out. (For the purpose of verification, L1 = 1.00in and L3 = 7.50in on Twitch X.)


The parameter x is actually extremely convenient for controlling the linkage degree of freedom. It's easy to measure θ1 and θ2 using the pots, but it would be awkward to control one or the other. As the linkage sweeps through its range of motion, the sensitivity of the two angles to wheel rotation changes. Near the ends of travel, one angle is barely changing. By converting both angles to x and taking a weighted average based on the sensitivity, which is itself a function of x, a much better control variable is made available, one that ranges from 0.0 to 1.0 as the linkage degree of freedom sweeps from full forward to full sideways.
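To make the conversion concrete, here is a minimal Python sketch (not the robot's actual firmware) of turning the two pot angles into x and blending the two estimates. The exponent is the one quoted above; the function names are hypothetical, and the linear `blend_weight` is only a placeholder assumption, since the real sensitivity function isn't given in this post.

```python
import math

K = 1.23456789  # exponent quoted above for Twitch X (fit to CAD-solved angles)

def _clamp(v, lo, hi):
    return max(lo, min(hi, v))

def x_from_theta1(theta1, k=K):
    """x = sin(theta1)^K, theta1 in radians (clamped to 0..90 degrees)."""
    theta1 = _clamp(theta1, 0.0, math.pi / 2)
    return math.sin(theta1) ** k

def x_from_theta2(theta2, k=K):
    """x = 1 - cos(theta2)^K, theta2 in radians (clamped to 0..90 degrees)."""
    theta2 = _clamp(theta2, 0.0, math.pi / 2)
    return 1.0 - math.cos(theta2) ** k

def blend_weight(x):
    """ASSUMPTION: placeholder weight on the theta1-derived estimate.
    The real weighting depends on the linkage sensitivity, not shown here."""
    return _clamp(1.0 - x, 0.0, 1.0)

def estimate_x(theta1, theta2, k=K):
    """Blend the two per-pot estimates of x into one control variable in 0..1."""
    x1 = x_from_theta1(theta1, k)
    x2 = x_from_theta2(theta2, k)
    w = blend_weight(0.5 * (x1 + x2))  # evaluate the weight at a rough guess of x
    return _clamp(w * x1 + (1.0 - w) * x2, 0.0, 1.0)
```

When the two pots agree, any weighting returns the same x; the weighting only matters for how noise and calibration error in each pot leak into the result.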

One other really nice thing about the parameter x is that at 0.5, the wheels are perpendicular. This is true for any exponent K. This gives a simple target for the linkage controller to get into the traditional diamond-layout omnidirectional drive configuration. Because of the weird geometry, the wheels are not at 45º angles to the chassis in this mode the way they were on Twitch, Jr. But, they are perpendicular to each other, which is the necessary condition for properly-constrained driving with four omniwheels. The driving coordinate system is actually rotated about 10º from the chassis at this point.

At x = 0.5, any exponent will give a linkage angle sum of 90º...very convenient.
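The reason is that at x = 0.5 both sides of the parameterization give sin(θ1) = cos(θ2) = 0.5^(1/K), so θ2 = 90º - θ1 regardless of K. A quick numeric check, written as a small Python sketch (the function name is hypothetical and the list of exponents is arbitrary):

```python
import math

def angles_from_x(x, k):
    """Invert x = sin(theta1)^k = 1 - cos(theta2)^k for both angles (radians)."""
    theta1 = math.asin(x ** (1.0 / k))
    theta2 = math.acos((1.0 - x) ** (1.0 / k))
    return theta1, theta2

for k in (0.8, 1.0, 1.23456789, 2.0, 3.5):
    t1, t2 = angles_from_x(0.5, k)
    # The sum should be 90 degrees to within floating-point error for every k
    print(k, math.degrees(t1), math.degrees(t2), math.degrees(t1 + t2))
```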

I was mistaken in my last post when I said that it was possible to write a continuous mixer for all the in-between states that aren't linkage angle sums of {0º, 90º, 180º}, corresponding to the {forward, omni, sideways} driving. As it turns out these are all over-constrained and don't have a roll-without-slipping solution for wheel rotational velocity. (And I'm not talking about sideways slipping, I mean true tangential slipping.) So, I used a three-state mixer similar to Twitch, Jr. to handle the three driving modes. The pot-derived x parameter determines which state it's in. 
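As an illustration only, picking the mixer state from x could look like the sketch below. The 0.25 and 0.75 thresholds and the state names are assumptions for the example; the post doesn't say where the actual boundaries sit.

```python
def drive_state(x, lo=0.25, hi=0.75):
    """Pick one of three mixer states from the linkage parameter x (0..1).

    ASSUMPTION: simple nearest-state thresholds; the real boundaries
    (and any hysteresis) are not given in the post.
    """
    if x < lo:
        return "forward"
    if x > hi:
        return "sideways"
    return "omni"
```

In practice some hysteresis around the thresholds would probably be wanted so the mixer doesn't chatter between states while the linkage is moving.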

Twitch Drive

The quad H-bridge board I designed for Twitch came in and went together pretty easily.



I don't know yet if it's worthy of being its own separate thing, but I really do like the layout and the modularity of the design. Besides the four H-bridges, there's some power conversion, an STM32F3 microcontroller, an MPU-6050 IMU, and headers for an XBee. The optocoupled gate drive works the same way it always does: without any drama.

Mmmmm, free deadtime.
The main issues I had were with relatively low-quality boards causing some soldering mishaps requiring blue wire micro-surgery. I've already ordered some spares from my absolute favorite board place, OSHPark, so if this one eventually dies I have some really nice ones to replace it. The board doesn't quite fit the way I wanted it to in the front wedge. It was supposed to mount vertically, but the capacitors and wiring take up too much space. I spent several hours trying to figure out how to modify the chassis to fit it either vertically or horizontally before I realized that I don't have to do either...

Yes, I felt stupid.
Since I'm only using the vertical gyro, it doesn't matter which way the board is oriented. The extra trig just goes into the rotation controller's gain scaling anyway. Also, this mounting allows me to put some padding around the board to protect against impacts and vibration a bit more. The batteries will still go in the opposite wedge, and all the wiring runs in a tidy channel down the middle of the bottom plate.

Control and Controllers

As I mentioned, Twitch X has four degrees of freedom, all of which can be controlled independently by the four actuators. The "mixer" handles assigning wheel velocities based on the outputs of four degree of freedom controllers. In general, all four wheels are involved in each degree of freedom:

Driving forward and turning are the obvious ones and are the same as any other 4WD "tank steer" or "skid steer" robot. Driving sideways requires that the linkage be moved to the sideways position and then it's the same as driving forward (although two of the wheels have reversed their "forward" direction). Omnidirectional drive in the x = 0.5 "diamond" state also has a well-known mixer. The last degree of freedom is moving the linkage itself. This is accomplished by driving pairs of wheels against each other. (Which pairs depends on which way you want the linkage to move.)

Might help picture it...

Forward and sideways translation are easy enough to control manually, so the mixer just forwards commands for these directly to the correct wheels, depending on the linkage position. The other two degrees of freedom are much better handled by a closed-loop controller. For rotation, the vertical gyro is used in a feedback loop to control an exact rate of rotation, commanded by the driver. This helps keep the bot straight even if the wheels slip a little, and is crucial to this type of drivetrain. Likewise, the linkage degree of freedom is feedback-controlled off of the x parameter, as measured by the two potentiometers.
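For readers who want to picture the rotation loop, here is a minimal rate-controller sketch in Python. The post doesn't say what control law is used, so the PI structure, the class name, the gains, and the output limit are all assumptions for illustration.

```python
class RatePI:
    """Hypothetical PI loop: driver-commanded yaw rate vs. gyro-measured yaw rate."""

    def __init__(self, kp, ki, out_limit):
        self.kp = kp
        self.ki = ki
        self.out_limit = out_limit
        self.integral = 0.0

    def update(self, rate_cmd, gyro_rate, dt):
        """Return the rotation command to feed into the mixer."""
        error = rate_cmd - gyro_rate
        candidate = self.integral + error * dt
        out = self.kp * error + self.ki * candidate
        if abs(out) <= self.out_limit:
            # Only accumulate the integral while the output is not saturated
            self.integral = candidate
            return out
        return max(-self.out_limit, min(self.out_limit, out))
```

The same shape of loop would apply to the linkage degree of freedom, with the measured x in place of the gyro rate.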

Everyone asks what the operator interface is like - I use a Playstation 4 controller with the following layout:


I don't know why I chose this layout originally, but I've been training on it since Twitch, Jr. and have the maneuvers all in muscle memory. The coolest tricks are ones involving all four degrees of freedom at the same time, like Translation While axis swITCHing, where the bot travels in a straight line but rotates and changes linkage orientations on the fly.

There are still some improvements to be made - I haven't gotten around to finishing the magnetic wheel hubs yet or really tying down all the loose parts and getting it ready to take abuse. But I also enjoy driving robots even more so than I do building them, so I couldn't resist doing some test drives of the new servoless system as soon as it was functional.

Friday, April 29, 2016

Twitch X - Did I used to be a MechE?

A combination of FRC season and BattleBots Season 2 got me thinking about how Saturday morning robot blitz building was a staple of my life back in the day, so I'm getting back into robots for a bit. I actually had a major "Battle-ready" redesign of Twitch, Jr. in the works a while ago, but couldn't really fit it into any sort of combat robotics framework and decided it wouldn't really be competitive without making major design sacrifices. That, and I got distracted by many other EE and software projects. Now, though, I've decided to try to remember how to MechE and go ahead and build it exactly the way I want.


There are actually at least three linkage drive robots named Twitch already. The OG version, which was my inspiration, was a 2008 FRC robot by Team 1561. There's also this one that I recently found. I can't find the documentation for it but it looks like it could be a mechanical relative of Twitch, Jr., with more modern electronics. And then there's this clever one (not named Twitch) that uses linkages and gears to achieve a similar wheel trajectory. Other than that I haven't seen any linkage drive robots; it remains a rare and uniquely entertaining drivetrain configuration. I've decided the world needs one more, so I present:

Twitch X

When I first drew up a new version of Twitch, I had in mind creating a wedge+lifter battlebot out of it. I made some rough geometry concepts for an Infiniwedge, which would be a wedge from every angle even if flipped over.


Get it?
This was a cool concept but of questionable competitiveness and legality in any particular league, and ultimately just too big to be practical. The footprint compared to the wheel placement just doesn't make sense. Also, even though it can vector traction into any pushing direction, omniwheels just aren't robust or sticky enough for a sumo match. And in any other type of competition the whole thing would just get exploded anyway. Linkage drive works fine with Colson wheels, if you ditch the 45º-angle omnidirectional state, but I think that would be too much of a compromise on what makes it a neat drivetrain.

I still like the idea of making a ruggedized, closed-wheel version of a linkage drive, though. That addresses the major failure point of Twitch, Jr., which is that if you hit something at high speed (which you will do), the exposed plastic omnis just break. When I break things, I have a habit of overdoing the next version a bit. With that in mind, here's a look at the new drive block:


First off, it uses the newer 4" Vex Pro omniwheels, which look to be much more durable than the previous version, made with fiber-reinforced plastic. These are designed for 130lb FRC bots, so I think they'll handle a ~20lb bot just fine. They're also completely enclosed by the chassis this time. To help increase impact tolerance, I'm attaching the wheels to the motor shaft via a magnetically-coupled hub.


1/2" disk magnets go in the pockets on the wheel and the hub cap. When attached, the hub cap soft-couples the wheel to the hub via the magnets (and friction). I'll probably stuff the inside of the wheel with something soft too so it floats, radially. The magnets should slip at around 20lbs of force, providing rotational and translational shock absorption but still allowing the wheels to break traction with torque to spare.

The gearbox is a Banebots P60 16:1 planetary, similar in size to what's on Twitch, Jr. But, Twitch X uses RS550 motors instead of 370s. Top speed should be around 20fps and it should be able to break traction at around 10A per motor. (So, plenty of overhead for pushing, if ever actually needed.)
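As a sanity check on those numbers, the arithmetic below works backward from the 20fps target through the 4" wheel and 16:1 gearbox to the motor shaft speed it implies. It only uses figures from this post; the function name is hypothetical, and no motor datasheet values are assumed.

```python
import math

def motor_rpm_for_speed(speed_fps, wheel_diameter_in=4.0, gear_ratio=16.0):
    """Motor shaft speed (rpm) needed for a given ground speed, ignoring slip."""
    circumference_ft = math.pi * wheel_diameter_in / 12.0
    wheel_rev_per_s = speed_fps / circumference_ft
    return wheel_rev_per_s * 60.0 * gear_ratio

print(round(motor_rpm_for_speed(20.0)))  # roughly 18,300 rpm at the motor
```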

The entire drive block rides on a radial and a thrust bearing set similar to what tinyKart uses for its steering. This means the whole bot sandwich can be properly preloaded without binding the linkage. In other words, I'm pretty sure I'll be able to bot-surf on it.

The linkages themselves have also been sized up, cut out of 1/4" plate and using 3/16" steel shoulder screws as pins (versus 1/8" plate linkages with 1/8" brass shoulder screw pins on Twitch, Jr.). You may wonder what massive metal-gear quarter-scale airplane servos are needed to swing these linkages. The answer is...none.

No Servos

While it's still a theory until proven in operation, I figured out some time ago that Twitch really doesn't need separate actuators to steer the linkages, which is great because the RC servos were the second most frequently broken part on Twitch, Jr. It's down to degrees of freedom and actuation. Twitch, being a holonomic drive in some configurations, generally has two translational and one rotational degree of freedom. But it has four independent actuators. There should be, then, one "extra" control mode that can steer the linkages if they're all tied together into one remaining degree of freedom.

As it turns out, this was how Twitch, Jr.'s linkages were originally designed. There is an extra linkage (or two kinematically redundant ones) that I call the sin/cos linkage, that ties the two diagonal wheel sets together. They don't rotate by the same angle - one follows a cosine, the other a sine, but they both go from 0º to 90º. I always considered this linkage optional, since with two servos to steer the diagonal wheel sets anyway, it didn't really do anything. But, in servo-less linkage drive, at least one of the sin/cos links is necessary.


The position of the last degree of freedom will be measured by one or two potentiometers mounted where the servos would have gone. This will go to a feedback-controlled output that gets mixed in with the other three degrees of control. All four control outputs can contribute to each motor's commanded speed through this mixer. It should also more smoothly handle all positions between forward and sideways driving, versus Twitch, Jr.'s discrete states.

Twitch Drive

Since there's only the four motors to control, I could probably have just gone with off-the-shelf H-bridge controllers like I did with Twitch, Jr. But PCBs are cheap and Cap Kart taught me how to make indestructible H-bridges, so I just did that. I haven't updated the circuit in any way...I just find different ways to redraw the schematic.


It's the same H-bridge design that's running my Tesla coil, just with the normal-sized HCPL-3120s instead of the supersized ACNW-3190s, and normal FETs (IPB014N06 for this build). The optos are wired in reverse parallel with an RC filter, which sets the LED forward current and the deadtime. It's possible, with a logic inverter, to present each phase to the microcontroller as a single tri-state optical input, high for high, low for low, high-Z for off, with hardware-based deadtime. It really doesn't get any simpler...

I jammed the H-bridge module into the corner of the board and copy-pasted it three more times.



Each H-bridge also gets a pair of low-side current-sense shunts, read through a differential op-amp. The FET thermal path isn't very good, but with good current control they should have no trouble driving RS550s on a robot this size with minimal heat sinking. In the middle is an STM32F3 (with its abundance of independent ADCs and Timers), an XBee, and an IMU for feedback control of the rotational degree of freedom (pretty much a necessity for linkage drive).

Parts are incoming just in time for a weekend robot build, I hope!

Wednesday, January 20, 2016

GS3 / Surfacecam GPU-Accelerated Preview

I've been further evolving the FlyCapture-based capture software for my Grasshopper3 camera system. This step involved merging in some work I had done to create a GPU-accelerated RAW file viewer. The viewer opens RAW files of the type saved out by the capture software and processes them through debayer, color correction, and convolution (sharpen/blur) pixel shaders. It was my first GPU-accelerated coding experience and I gained an appreciation for just how fast the GPU could do image processing tasks that take many milliseconds if done in software.

Some of the early modifications I made to the FlyCapture demo program were to reduce the frequency of the GDI-based UI thread, limit the size of the preview image, and force the preview to use only the easiest debayer algorithm (nearest-neighbor). This cut down the CPU utilization enough that the capture thread could buffer to RAM at full frame rate without getting bogged down by the UI thread. This was especially important on the Microsoft Surface Pro 2, my actual capture device, which has fewer cores to work with.

Adding the GPU debayer and color correction into the capture software lets me undo a lot of those restrictions. The GPU can run a nice debayer algorithm (this is the one I like a lot), do full color correction and sharpening, and render the preview to an arbitrarily large viewport. The CPU is no longer needed to do software debayer or GDI bitmap drawing. Its only responsibility is shuttling raw data to the GPU in the form of a texture. More on that later. This is the new, optimized architecture:


Camera capture and saving to the RAM buffer is unaffected. RAM buffer writing to disk is also unaffected. (Images are still written in RAW format, straight from the RAM buffer. GPU processing is for preview only.) I did simplify things a lot by eliminating all other modes of operation and making the RAM buffer truly circular, with one thread for filling it and one for emptying it. It's nice when you can delete 75% of your code and have the remaining bits still work just as well.

The UI thread has the new DirectX-based GPU interface. Its primary job now is to shuttle raw image data from the camera to the GPU. The mechanism for doing this is via a bound texture - a piece of memory representing a 2D image that both the CPU and the GPU have access to (not at the same time). This would normally be projected onto a 3D object but in this case it just gets rendered to a full-screen rectangle. The fastest way to get the data into a texture is to marshal it in with raw pointers, something C# allows you to do only within the context of the "unsafe" keyword...I wonder if they're trying to tell you something.


The textures usually directly represent a bitmap. So, most of the texture formats are multi-channel color pixel formats such as R32G32B32A32 (32 bits of floating point each for Red, Green, Blue, and Alpha). Since the data from the camera represents raw Bayer-masked pixels, I have had to abuse the pixel formats a little. For 8-bit data, it's not too bad. I am using R8_UNorm format, which just takes an unsigned 8-bit value and normalizes it to the range 0.0f to 1.0f.

12-bit is significantly more complicated, since there are no 12- or 24-bit pixel formats into which one or two pixels can be stuffed cleanly. Instead, I'm using R32G32B32_UInt and Texture.Load() instead of Texture.Sample(). This allows direct bitwise unpacking of the 96-bit pixel data, which actually contains eight adjacent 12-bit pixels. And I do mean bitwise...the data goes through two layers of rearrangement on its way into the texture, each with its own quirks and endianness, so there's no clean way to sort it out without bitwise operations.

This might be something like what's actually going on.
In order to accommodate both 8-bit and 12-bit data, I added an unpacking step that is just another pixel shader that converts the raw data into a common 16-bit single-color format before it goes into the debayer, color correction, and convolution shader passes just like in the RAW file viewer. The shader file is linked below for anyone who's interested.
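To give a feel for what 12-bit unpacking involves, here is a CPU-side Python sketch of the general idea: eight 12-bit pixels pulled out of 12 bytes (96 bits). The byte layout used here (two pixels per three bytes, most significant bits first) is an assumption for illustration only; as noted, the real camera data has its own bit ordering and endianness quirks, and the actual work happens in a pixel shader.

```python
def unpack12(packed):
    """Unpack 12-bit pixels from bytes, two pixels per three bytes.

    ASSUMED layout (not the camera's): byte0 = p0[11:4],
    byte1 = p0[3:0] << 4 | p1[11:8], byte2 = p1[7:0].
    """
    if len(packed) % 3 != 0:
        raise ValueError("packed length must be a multiple of 3 bytes")
    pixels = []
    for i in range(0, len(packed), 3):
        b0, b1, b2 = packed[i], packed[i + 1], packed[i + 2]
        pixels.append((b0 << 4) | (b1 >> 4))
        pixels.append(((b1 & 0x0F) << 8) | b2)
    return pixels

def normalize(pixels, bits):
    """Scale integer pixel values to 0.0..1.0, like a UNorm texture read."""
    full_scale = (1 << bits) - 1
    return [p / full_scale for p in pixels]

# One 96-bit texel's worth of data holds eight 12-bit pixels
texel = bytes([0xAB, 0xC1, 0x23] * 4)
print(unpack12(texel))
```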

The end result of all this is I get a cheaply-rendered high-quality preview in the capture program, up to full screen, which looks great on the Surface Pro 2:


Once the image is in the GPU, there's almost no end to the amount of fast processing that can be done with it. Shown in the video above is a feature I developed for the RAW viewer, saturation detection. Any pixel that is clipped in red, green, or blue value because of overexposure or over-correction gets its saturated color(s) inverted. In real time, this is useful for setting up the exposure. Edge detection for focus assist can also be done fairly easily with the convolution shader. The thread timing diagnostics show just how fast the UI thread is now. Adding a bit more to the shaders will be no problem, I think.
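The saturation-detection idea is simple enough to sketch per pixel in Python. This is only an illustration of the logic described above, not the shader itself; the function name and the choice of 1.0 as the clipping limit for normalized color are assumptions.

```python
def mark_saturated(rgb, limit=1.0):
    """Invert any channel that has clipped at or above `limit`.

    rgb is a tuple of normalized channel values; unclipped channels pass through.
    """
    return tuple(1.0 - min(c, 1.0) if c >= limit else c for c in rgb)

print(mark_saturated((1.0, 0.5, 0.2)))  # red is clipped, so it flips to 0.0
```

A blown-out white pixel (all three channels clipped) comes out black, which is what makes overexposed regions so obvious in the preview.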

For now, here is the shader file that does 8-bit and 12-bit unpacking (12-bit is only valid for this camera...other cameras may have different bit-ordering!), as well as the debayer, color correction, and convolution shaders.

Shaders: debayercolor_preview.fx

Saturday, November 7, 2015

DRSSTC Δt5: MIDI Streaming and Shutter Syncing

Until last week, I hadn't really touched my Tesla coil setup since I moved out to Seattle. Maybe because the next step was a whole bunch of software writing. As of Δt3, I had written a little test program that could send some basic frequency (resonant and pulse generation) and pulse shaping commands to the driver. But it was just for a fixed frequency of pulse generation and of course I really wanted to make a multi-track MIDI driver for it.

The number I had in mind was three tracks, to match the output capabilities of MIDI Scooter. While the concept was cool, the parsing/streaming implementation was flawed and the range of notes that you can play with a motor is kinda limited by the power electronics and motor RPM. So I reworked the parser and completely scrapped and rebuilt the streaming component of it. (More on that later.) Plus I did a lot of preliminary thinking on how best to play three notes using discrete pulses. As it turns out, the way that works best in most scenarios is also the easiest to code, since it uses the inherent interrupt prioritization and preemption capabilities that most microcontrollers have.


So despite my hesitation to start on the new software, it actually turned out to be a pretty simple coding task. It did require a lot of communication protocol code on both the coil driver and the GUI, to support sending commands and streaming MIDI events at the same time. But it went pretty smoothly. I don't think I've written as many lines of code before and had them mostly work on the first try. And the result is a MIDI parser/streamer that I can finally be proud of. Here it is undergoing my MIDI torture test, a song known only as "Track 1" from the SNES game Top Gear.




The note density is so high that it makes a really good test song for MIDI streaming. I only wish I had even more tracks...


The workflow from .mid file to coil driver is actually pretty similar to MIDI scooter. First, I load and parse the MIDI, grouping events by track/channel. Then, I pick the three track/channel combinations I want to make up the notes for the coil. These get restructured into an array with absolute timestamps (still in MIDI ticks). The array is streamed wirelessly, 64 bytes at a time, to a circular buffer on the coil driver. The coil driver reports back what event index is currently playing, so the streamer can wait if the buffer is full.
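The flow control on the streaming side can be sketched roughly as follows, in Python. The 64-byte packet size is from the description above; the 8-byte event size, the 256-event buffer capacity, and the function names are hypothetical values chosen for the example.

```python
PACKET_BYTES = 64     # from the post: streamed 64 bytes at a time
EVENT_BYTES = 8       # ASSUMPTION: size of one packed event
BUFFER_EVENTS = 256   # ASSUMPTION: capacity of the driver's circular buffer

def free_slots(next_to_send, now_playing, capacity=BUFFER_EVENTS):
    """Events from now_playing up to next_to_send - 1 still occupy the buffer."""
    return capacity - (next_to_send - now_playing)

def next_packet(events, next_to_send, now_playing):
    """Return the slice of events to send now; empty if the buffer is full."""
    per_packet = PACKET_BYTES // EVENT_BYTES
    count = min(per_packet,
                free_slots(next_to_send, now_playing),
                len(events) - next_to_send)
    if count <= 0:
        return []
    return events[next_to_send:next_to_send + count]
```

The streamer would call `next_packet` each time the driver reports its current event index, and simply wait whenever the result is empty.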


On the coil driver itself, there are five timers. In order of interrupt priority:

  • The pulse timer, which controls the actual gate drive, runs in the 120-170kHz range and just cycles through a pre-calculated array of duty cycles for pulse shaping. It's always running, but it only sets duty cycles when a pulse is active. 
  • Then, there are three note timers that run at lower priority. Their rollover frequencies are the three MIDI note frequencies. When they roll over, they configure and start a new pulse and then wait for the pulse to end (including ringdown time). They're all equal priority interrupts, so they can't preempt each other. This ensures no overlap of pulses.
  • Lastly, there's the MIDI timer, which runs at the MIDI tick frequency and has the lowest interrupt priority. It checks for new MIDI events and updates the note timers accordingly. I'm actually using SysTick for this (sorry, SysTick) since I ran out of normal timers.
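For the note timers, the rollover value follows from the standard equal-temperament mapping of MIDI note number to frequency (A4 = note 69 = 440Hz). A small Python sketch of that calculation is below; the 1MHz timer clock and the function names are assumptions, since the actual prescaler settings aren't given.

```python
def midi_note_to_hz(note):
    """Equal-temperament frequency for a MIDI note number (69 -> 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def note_timer_reload(note, timer_clock_hz=1_000_000):
    """Auto-reload value so the timer rolls over at the note frequency.

    ASSUMPTION: timer counts from 0 to reload at timer_clock_hz.
    """
    return round(timer_clock_hz / midi_note_to_hz(note)) - 1

print(midi_note_to_hz(60), note_timer_reload(60))  # middle C
```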
There are three levels of volume control involved as well. Relative channel volume is set by configuring the pulse length (how many resonant periods each pulse lasts). But since the driver was designed to be hard-switched, I'm also using duty cycle control for individual note volume. And there is a master volume that scales all the duty cycles equally. All of this is controlled through the GUI, which can send commands simultaneously while streaming notes, as shown in the video.


It's really nice to have such a high level of control over the pulse generation. For example, I also added a test mode that produces a single long pulse with gradually ramped duty cycle. This allows for making longer, quieter sparks with low power...good for testing.
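A ramped test pulse like that amounts to a different pre-calculated duty cycle table for the pulse timer to step through. As a sketch only, assuming a linear ramp and a hypothetical function name (the actual ramp shape isn't specified):

```python
def ramp_duty_table(n_cycles, max_duty, master_volume=1.0):
    """Duty cycle per resonant cycle, rising linearly to max_duty * master_volume.

    ASSUMPTION: linear ramp; the real test mode may use a different shape.
    """
    if n_cycles < 1:
        raise ValueError("n_cycles must be at least 1")
    peak = max_duty * master_volume
    return [peak * (i + 1) / n_cycles for i in range(n_cycles)]

print(ramp_duty_table(4, 0.4))
```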

I also got to set up an experiment that I've wanted to do ever since I got my Grasshopper 3 camera. The idea is to use the global shutter and external trigger capabilities of the industrial camera to image a Tesla coil arc at a precise time. Taking it one step further, I also have my favorite Tektronix 2445 analog oscilloscope and a current transformer. I thought that it would be really cool to have the scope trace of primary current and the arc in the same image at the same time, and then to sweep the trigger point along the duration of the pulse to see how they both evolve.




The setup for this was a lot of fun.

Camera is in the foreground, taped to a tripod because I lost my damn tripod adapter.
Using a picture frame glass as a reflective surface with a black background (and a neon bunny!).
I knew I wanted to keep the scope relatively far from the arc itself, but still wanted the image of the scope trace to appear near the spark and be in focus. So, I set up a reflective surface at a 45º angle and placed the scope about the same distance to the left as the arc is behind the mirror, so they could both be in focus. When imaged straight on, they appear side by side, but the scope trace is horizontally flipped, which is why the pulse progresses from right to left.


This picture is a longer exposure, so you can see the entire pulse on the scope. To make it sweep out the pulse and show the arc condition, I set the exposure to 20-50μs and had the trigger point sweep from the very start of the pulse to the end on successive pulses. So, each frame is actually a different pulse (should be clear from the arcs being differently-shaped) but the progression still shows the life cycle of the spark, including the ring-up period before the arc even forms. The pulse timer fires the trigger at the right point in the pulse through a GPIO on the microcontroller. Luckily, the trigger input on the camera is already optocoupled, so it didn't seem to have any issues with EMI.
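The sweep itself is just a trigger delay that advances a little on each pulse. A minimal Python sketch of that schedule, assuming an evenly spaced sweep and a hypothetical function name:

```python
def trigger_delays_us(pulse_length_us, n_frames):
    """Trigger delay (microseconds after pulse start) for each successive frame.

    ASSUMPTION: evenly spaced steps from the start of the pulse to its end.
    """
    if n_frames < 2:
        raise ValueError("need at least two frames to sweep")
    step = pulse_length_us / (n_frames - 1)
    return [i * step for i in range(n_frames)]

print(trigger_delays_us(300.0, 4))
```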

Seeing the pulse shape and how it relates to arc length is really interesting. It might be useful for tuning to be able to see primary current waveform and arc images as different things are adjusted. No matter what, the effect is cool and I like that it can only really be done with old-school analog methods and mirrors (no smoke, yet).

Monday, September 21, 2015

GS3 / SurfaceCam Multipurpose Adapter

It's been a while since I've made anything purely mechanical, so I had a bit of fun this weekend putting together a multipurpose adapter for my Grasshopper 3 camera.


The primary function of the adapter is to attach the camera to a Microsoft Surface Pro tablet, which acts as the monitor and recorder via USB 3.0. I was going to make a simple adapter but got caught up in linkage design and figured out a way to make it pivot 180º to fold flat in either direction.



Some waterjet-cut parts and a few hours of assembly later, and it's ready to go. The camera cage has 1/4-20 mounts on top and bottom for mounting to a tripod, or attaching accessories, like an audio recorder in this case. There's even a MōVI adapter for attaching just the Surface Pro 2 to an M5 handlebar for stabilizer use. (The camera head itself goes on the gimbal inside a different cage, if I can ever find a suitable USB 3.0 cable.)

Anyway, quick build, quick post. Here are some more recent videos I've done with the GS3 and my custom capture and color processing software.


Plane spotting at SeaTac using the multipurpose adapter and a 75mm lens (250mm equivalent).

Slow motion weed trimming while testing out an ALTA prototype. No drones were harmed in the making of this video.


Freefly BBQ aftermath. My custom color processing software was still a bit of a WIP at this point.

Saturday, January 17, 2015

Three-Phase Color

I was doing a bit more work on my DirectX-based .raw image viewer when I came across a nice mathematical overlap with three-phase motor control theory. It has to do with conversion from red/green/blue (RGB) to hue/saturation/lightness (HSL), two different ways of representing color. Most of the conversion methods are piecewise-linear, with max(), min(), and conditionals to break up the color space. But I figured motors are round and color wheels are round, so maybe I would try applying a motor phase transform to [R, G, B] to see what happens.


The transform of interest is the Clarke transform, which converts a three-phase signal into two orthogonal components (α and β) and a zero-sequence component (γ) that is just the average of the three phases. In motor control with symmetric three-phase signals, γ is usually zero. Applied to [R, G, B], it's just the intensity, one measure of lightness.

In motor control, it's common to find the phase and magnitude of the vector defined by α and β, for example to determine the amplitude and electrical angle of sinusoidal back EMF in a PMSM. It turns out the phase and magnitude are useful in color space as well, representing the hue and saturation, respectively. This might not adhere exactly to the definitions of these terms, but rather than rambling on about hexagons and circles, I'll just say it is close enough for me. (The Wikipedia article's alternate non-hexagon hue (H2) and chroma (C2) calculation is exactly the Clarke transform and magnitude/phase math.)
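To make the hue-as-phase idea concrete, here is a minimal CPU-side sketch in Python (not the shader code). The function name `clarke_hue_sat` is made up for illustration, and it assumes RGB components in the 0 to 1 range. It uses the same 2/3-scaled Clarke transform as above and compares the resulting hue against the standard library's piecewise `colorsys.rgb_to_hls` for the six primary and secondary colors, where the two definitions land on the same angles.

```python
import colorsys
import math


def clarke_hue_sat(r, g, b):
    """Return (hue_degrees, saturation, intensity) for RGB components in [0, 1]."""
    alpha = (2.0 * r - g - b) / 3.0
    beta = (g - b) / math.sqrt(3.0)
    gamma = (r + g + b) / 3.0  # zero-sequence term: the plain average
    hue = math.degrees(math.atan2(beta, alpha)) % 360.0  # phase of (alpha, beta)
    sat = math.hypot(alpha, beta)                        # magnitude of (alpha, beta)
    return hue, sat, gamma


if __name__ == "__main__":
    names = ["red", "yellow", "green", "cyan", "blue", "magenta"]
    colors = [(1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 1, 1), (0, 0, 1), (1, 0, 1)]
    for name, rgb in zip(names, colors):
        hue, sat, _ = clarke_hue_sat(*rgb)
        piecewise = colorsys.rgb_to_hls(*rgb)[0] * 360.0
        print(f"{name:8s} clarke={hue:6.1f}  piecewise={piecewise:6.1f}  sat={sat:.3f}")
```

With this scaling a fully saturated primary comes out with a magnitude of 2/3 rather than 1, and any gray input collapses to zero magnitude, so the hue of a gray pixel is not meaningful.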

So I added this hue and saturation adjustment method to the raw viewer's pixel shader:






I'm particularly happy about the fact that it occupies barely 15 lines of HLSL code:

// Clarke Transform Color Processing:
// Forward transform: [R, G, B] -> alpha/beta (color plane) and gamma (average intensity).
c_alpha = 0.6667f * tempcolor.r - 0.3333f * tempcolor.g - 0.3333f * tempcolor.b;
c_beta = 0.5774f * tempcolor.g - 0.5774f * tempcolor.b;
c_gamma = 0.3333f * tempcolor.r + 0.3333f * tempcolor.g + 0.3333f * tempcolor.b;
// Polar form: the phase is the hue, the magnitude is the saturation.
c_hue = atan2(c_beta, c_alpha);
c_sat = sqrt(pow(abs(c_alpha), 2) + pow(abs(c_beta), 2));
// Apply the adjustments: scale the saturation, rotate the hue (radians).
c_sat *= saturation;
c_hue += hue_shift;
c_alpha = c_sat * cos(c_hue);
c_beta = c_sat * sin(c_hue);

// Inverse transform back to [R, G, B].
tempcolor.r = c_alpha + c_gamma;
tempcolor.g = -0.5f * c_alpha + 0.8660f * c_beta + c_gamma;
tempcolor.b = -0.5f * c_alpha - 0.8660f * c_beta + c_gamma;

I doubt it's the most computationally efficient way to do it (with the trig and all), but it does avoid a bunch of conditionals from the piecewise methods. And as I mentioned in the last post, the pixel shader is far from the performance bottleneck of the viewer right now.
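For checking the math outside of the shader, the same sequence of steps can be ported to a few lines of Python. This is only a sketch for sanity-testing: `adjust_hue_sat` is a hypothetical name, the constants are written as exact fractions instead of the rounded shader literals, and no clamping of the output is done.

```python
import math


def adjust_hue_sat(rgb, saturation=1.0, hue_shift=0.0):
    """CPU-side port of the Clarke-based adjustment; hue_shift is in radians."""
    r, g, b = rgb
    # Forward Clarke transform.
    alpha = (2.0 * r - g - b) / 3.0
    beta = (g - b) / math.sqrt(3.0)
    gamma = (r + g + b) / 3.0
    # Polar form, then apply the two adjustments.
    hue = math.atan2(beta, alpha) + hue_shift
    sat = math.hypot(alpha, beta) * saturation
    alpha = sat * math.cos(hue)
    beta = sat * math.sin(hue)
    # Inverse Clarke transform.
    s = math.sqrt(3.0) / 2.0
    return (alpha + gamma,
            -0.5 * alpha + s * beta + gamma,
            -0.5 * alpha - s * beta + gamma)


if __name__ == "__main__":
    print(adjust_hue_sat((1.0, 0.0, 0.0), hue_shift=2.0 * math.pi / 3.0))
    print(adjust_hue_sat((0.2, 0.5, 0.9), saturation=0.0))
```

With the default arguments the function hands back its input (to within rounding), a hue shift of 120 degrees rotates pure red into pure green, and a saturation of zero leaves every channel at the average intensity.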

Updated HLSL Source: debayercolor.fx

Updated Viewer Source (VB 2012 Project): RawView_v0_2.zip
Built for .NET 4.0 64-bit. Requires the SlimDX SDK.

And for fun, here's some 150fps video of a new kitchen appliance I just received and hope to put to good use soon:

Saturday, December 27, 2014

Fun with Pixel Shaders

One of the things that saved my ass for MIT Mini Maker Faire was SlimDX, a modern (.NET 4.0) replacement for Microsoft's obsolete Managed DirectX framework. I was only using it to replace the managed DirectInput wrapper I had for interfacing to USB HID joysticks for controlling Twitch and 4pcb. For that it was almost a drop-in replacement. But the SlimDX framework also allows for accessing almost all of DirectX from managed .NET code.

I've never really messed with DirectX, even though at one point long ago I wanted to program video games and 3D stuff. The setup always seemed daunting. But with a managed .NET wrapper for it, I decided to give it a go. This time I'm not using it to make video games, though. I'm using it to access GPU horsepower for simple image processing.

The task is as follows:

The most efficient way to capture and save images from my USB3.0 camera is as raw images. The file is a binary stream of 8-bit or 12-bit pixel brightnesses straight from the Bayer-masked sensor. If one were to convert these raw pixel values to a grayscale image, it would look like this:

Zoom in to see the checkerboard Bayer pattern...it's lost a bit in translation to a grayscale JPEG, but you can still see it on the car and in the sky.
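As a rough illustration of how simple the 8-bit case is, here is a Python sketch that turns such a byte stream into rows of brightness values. It assumes a headerless, row-major file with one byte per pixel, and the name `unpack_raw8` is made up for this example. The 12-bit case depends on how the camera packs the bits, so it is left out.

```python
def unpack_raw8(data, width, height):
    """Split a flat 8-bit raw buffer into rows of brightness values (0-255)."""
    if len(data) != width * height:
        raise ValueError("buffer size does not match width * height")
    return [list(data[y * width:(y + 1) * width]) for y in range(height)]


if __name__ == "__main__":
    # Stand-in for the contents of a tiny 4x3 raw file.
    fake_file = bytes(range(12))
    for row in unpack_raw8(fake_file, 4, 3):
        print(row)
```

Rendering those values directly as gray levels is what produces the checkerboard look, since neighboring pixels sit behind different color filters.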


The Bayer filter encodes color not on a per-pixel basis, but in the average brightness of nearby pixels dedicated to each color (red, green, and blue). This always seemed like cheating to me; to arrive at a full color, full resolution image, 200% more information is generated by interpolation than was originally captured by the sensor. But the eye is more sensitive to brightness than to color, so it's a way to sense and encode the information more efficiently.
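The "200% more" figure falls out of a quick sketch. The Python below assumes an RGGB tile layout (the actual sensor may start on a different corner of the tile), and `bayer_color` and `mosaic` are hypothetical helper names. It throws away two of the three color samples at each pixel, which is effectively what the filter does optically.

```python
def bayer_color(x, y):
    """Color of the filter over pixel (x, y), assuming an RGGB tile layout."""
    if y % 2 == 0:
        return "R" if x % 2 == 0 else "G"
    return "G" if x % 2 == 0 else "B"


def mosaic(rgb_image):
    """Keep only the one color sample per pixel that the sensor would record."""
    index = {"R": 0, "G": 1, "B": 2}
    return [[pixel[index[bayer_color(x, y)]] for x, pixel in enumerate(row)]
            for y, row in enumerate(rgb_image)]


if __name__ == "__main__":
    width, height = 4, 4
    captured = width * height          # one sample per pixel
    full_color = 3 * width * height    # three samples per pixel after demosaicing
    print("interpolated / captured =", (full_color - captured) / captured)
    print(mosaic([[(10, 20, 30)] * 2] * 2))
```

Half of the samples in each 2x2 tile are green under this layout, with red and blue getting a quarter each.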

Anyway, deriving the full color image from the raw Bayer-masked data is a computationally intensive process. In the most serial implementation, it would involve four nested for loops to scan through each pixel looking at the color information from its neighboring pixels in each direction. In pseudo-code:

// Scan all pixels.
for y = 0 to (height - 1)
 for x = 0 to (width - 1)
  
  Reset weighted averages.  

  // Scan a local window of pixels.
  for dx = -N to +N
   for dy = -N to +N
    
    brightness = GetBrightness(x+dx, y+dy)
    Add brightness to weighted averages.
   
   next
  next

  Set colors of output (x, y) by weighted averages.

 next
next

The window size (2N+1)x(2N+1) could be 3x3 or 5x5, depending on the algorithm used. More complex algorithms might also have conditionals or derivatives inside the nested for loop. But the real computational burden comes from serially scanning through x and y. For a 1920x1080 pixel image, that's 2,073,600 iterations. Nest a 5x5 for loop inside of that and you have 51,840,000 times through. This is a disaster for a CPU. (And by disaster, I mean it would take a second or two...)
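For anyone who wants to run the serial version, here is a Python sketch of the same four nested loops. It is deliberately the crudest possible demosaic, a plain average of the same-colored samples in the window, and not the Microsoft Research method. The RGGB layout and the names `bayer_color` and `demosaic_average` are assumptions for illustration.

```python
def bayer_color(x, y):
    """Channel index (0=R, 1=G, 2=B) at (x, y), assuming an RGGB tile layout."""
    if y % 2 == 0:
        return 0 if x % 2 == 0 else 1
    return 1 if x % 2 == 0 else 2


def demosaic_average(raw, n=1):
    """Average the same-colored samples in a (2n+1) x (2n+1) window."""
    height, width = len(raw), len(raw[0])
    out = []
    for y in range(height):                      # scan all pixels
        row = []
        for x in range(width):
            sums, counts = [0.0] * 3, [0] * 3    # reset the averages
            for dy in range(-n, n + 1):          # scan the local window
                for dx in range(-n, n + 1):
                    xx, yy = x + dx, y + dy
                    if 0 <= xx < width and 0 <= yy < height:
                        c = bayer_color(xx, yy)
                        sums[c] += raw[yy][xx]
                        counts[c] += 1
            # Guard against a window that happens to contain no sample of a color.
            row.append(tuple(s / k if k else 0.0 for s, k in zip(sums, counts)))
        out.append(row)
    return out


if __name__ == "__main__":
    pixels = 1920 * 1080
    print(pixels, "outer iterations,", pixels * 5 * 5, "with a 5x5 window")
    print(demosaic_average([[100] * 4 for _ in range(4)])[0][0])
```

Pixels near the border simply average over whatever part of the window falls inside the image.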

But the GPU is ideally suited for this task since it can break up the outermost for loops and put them onto a crapload of miniature parallel processors with nothing better to do. This works because each pixel's color output is independent - it depends only on the raw input image. The little piece of software that handles each pixel's calculations is called a pixel shader, and it's probably the most exciting new software tool I've learned in a long time.

For my very first pixel shader, I've written a simple raw image processor. I know good solutions for this already exist, and previously I would use IrfanView's Formats plug-in to do it. But it's much more fun to build it from scratch. I'm sure I'm doing most of the processing incorrectly or inefficiently, but at least I know what's going on under the hood.

The shader I wrote has two passes. The first pass takes as its input the raw Bayer-masked image and calculates R, G, and B values for each pixel based on the technique presented in this excellent Microsoft Research technical article. It then does simple brightness, color correction, contrast, and saturation adjustment on each pixel. This step is a work-in-progress as I figure out how to define the order of operations and what techniques work best. But the framework is there for any amount of per-pixel color processing. One nice thing is that the pixel shader works natively in single-precision floating point, so there's no need to worry about bit depth of the intermediate data for a small number of processing steps.


The second pass implements an arbitrary 5x5 convolution kernel, which can be used for any number of effects, including sharpening. It's done as a second pass because it requires the full-color output image of the first pass as its input. It can't be done as part of a single per-pixel operation with only the raw input image. So, the first pass renders its result to a texture (the storage type for a 2D image), and the second pass references this texture for its 5x5 window. The output of the second pass can either be rendered to the screen, or to another texture to be saved as a processed image file, or both.
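The second pass boils down to a weighted sum over a 5x5 neighborhood. A minimal single-channel Python sketch is below. The edge handling (clamping reads to the nearest valid pixel), the name `convolve5x5`, and the example `SHARPEN` kernel are all assumptions for illustration, not the values used in the shader.

```python
def convolve5x5(image, kernel):
    """Apply a 5x5 kernel to a 2D list of floats, clamping reads at the edges.

    The kernel is applied without flipping, which is identical to a true
    convolution for the symmetric kernels used here.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky in range(5):
                for kx in range(5):
                    yy = min(max(y + ky - 2, 0), height - 1)
                    xx = min(max(x + kx - 2, 0), width - 1)
                    acc += kernel[ky][kx] * image[yy][xx]
            out[y][x] = acc
    return out


IDENTITY = [[1.0 if (kx, ky) == (2, 2) else 0.0 for kx in range(5)]
            for ky in range(5)]

# A hypothetical sharpening kernel: boost the center, subtract the 4 neighbors.
SHARPEN = [row[:] for row in IDENTITY]
SHARPEN[2][2] = 5.0
for ky, kx in [(1, 2), (3, 2), (2, 1), (2, 3)]:
    SHARPEN[ky][kx] = -1.0

if __name__ == "__main__":
    step = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
    print(convolve5x5(step, SHARPEN)[2])
```

Because the sharpening weights sum to 1, flat regions pass through unchanged and only edges get exaggerated. For a color image the same kernel would be applied to each channel.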

What a lovely Seattle winter day.
Even though the pixel shader does all of the exciting work, there's still the matter of wrapping the whole thing in a .NET project with SlimDX providing the interface to DirectX. I did this with a simple VB program that has a viewport, folder browser, and some numeric inputs. For my purposes, a folder full of raw images goes together as a video clip. So being able to enumerate the files, scan through them in the viewer, and batch convert them to JPEGs was the goal.

Hrm, looks kinda like all my other GUIs...

As it turns out, the pixel shader itself is more than fast enough for real-time (30fps) processing. The time-consuming parts are loading and saving files. Buffering into RAM would help if the only goal was real-time playback, but outputting to JPEGs is never going to be fast. As it is, for a 1920x1200 image on my laptop, the timing is roughly 30ms to load the file, an immeasurably short amount of time to actually run both pixel shader passes, and then 60ms to save the file. To batch convert an entire folder of 1000 raw images to JPEG, including all image processing steps, took 93s (10.75fps), compared to 264s (3.79fps) in IrfanView.
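The numbers hang together if the batch conversion is treated as purely I/O-bound. A quick Python check, using only the figures quoted above (the variable names are just for illustration):

```python
def fps(frames, seconds):
    """Average throughput in frames per second."""
    return frames / seconds


FRAMES = 1000
rawview_fps = fps(FRAMES, 93.0)       # custom viewer, full batch convert
irfanview_fps = fps(FRAMES, 264.0)    # IrfanView on the same folder
speedup = 264.0 / 93.0

# Per-frame estimate from the rough timings: 30 ms load + 60 ms save,
# with the two shader passes treated as negligible.
io_bound_fps = 1000.0 / (30.0 + 60.0)

if __name__ == "__main__":
    print(f"{rawview_fps:.2f} fps vs {irfanview_fps:.2f} fps ({speedup:.1f}x)")
    print(f"I/O-only estimate: {io_bound_fps:.1f} fps")
```

Load plus save comes to about 90 ms per frame, which lines up with the measured 93 s for 1000 frames and leaves very little time attributable to the GPU.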

Real-time scrubbing on the MS Surface Pro 2, including file load and image processing, but not including saving JPEGs.

There are probably ways to speed up file loading and saving a bit, but even as-is it's a good tool for setting up JPEG image sequences to go into a video editor. The opportunity to do some image processing in floating point, before compression, is also useful, and it takes some of the load off the video editor's color correction.

I'm mostly excited about the ability to write GPU code in general. It's a programming skill that still feels like a superpower to me. Maybe I'll get back into 3D, or use it for simulation purposes, or vision processing. Modern GPUs have more methods available for using memory that make the cores more useful for general computing, not just graphics. As usual with new tools, I don't really know what I'll use it for, I just know I'll use it.

If you're interested in the VB project and the shader source (both very much works-in-progress), they are here:

VB 2012 Project: RawView_v0_1.zip
Built for .NET 4.0 64-bit. Requires the SlimDX SDK.

HLSL Source: debayercolor.fx

P.S. I chose DirectX because it's what I have the most experience with. (I know somebody will say, "You should use OpenGL.") I'm sure all of this could be done in OpenGL / GLSL as well.