Raw UVC on Android

It was time to integrate support for the Nreal Light's stereo camera into our Android build, but our go-to library (nokhwa) does not work on Android. In fact, nothing seemed to support this specific camera on Android... so it was time to write our own USB Video Class (UVC) implementation (but not really).

Webcam support on Android

As far as I gathered, external camera support on Android was always kind of an afterthought. It was officially integrated in Android P, but support for it is optional (i.e. it depends on the manufacturer enabling it when building the firmware).

According to this very detailed stackoverflow answer, it was not very well supported even on Android 11 phones. The official APIs did work on my test Samsung S9+, but that's running Android 13 (LineageOS), so it's not really representative.

Oh, by the way, permission handling has been broken for 4 years now.

What API to use

In native code on Android, you get access to all cameras (including the external ones) through the official NdkCamera API, which is the C binding of the Camera2 API (which in turn is not recommended and CameraX should be used in java/kotlin land).

I usually go with the "when in Rome" way and use the official recommendation, but NdkCamera is needlessly complicated to use. Like, just take a look at their "basic" example. When you see a 500 line Manager class in a "basic" example, you know you're in for some "fun" coding.

My problems specifically:

As a side note, OpenCV does have an Android video backend, so you can actually use the nice OpenCV interfaces.

Trying the stereo webcam with some apps

So it turns out none of the above is relevant anyway, because when I tried out the Light's cameras with some apps, only the RGB camera worked. The stereo cam either didn't show up as an option, showed no picture, or straight up crashed the app.

I assume the apps use the official API, because what else would they use.

Except the one just called "USB Camera", which could actually read and display the picture. How does it do it?

Apparently, instead of going through the kernel driver and all the Android cruft they put around it, it uses a bundled libuvc and communicates directly with the camera through libusb. This solution was also mentioned by the stackoverflow answer I linked above. Here's a similar project on github.


In the past I've already looked at the USB HID specs in detail, and at my previous job I had to go through some way more involved protocols (I even looked at some 3GPP stuff, but I will spare you that horror), so I jumped into the UVC specifications.

I have to say, these USB specs are pretty good. The docs are well written and easy to find, and the protocols themselves are very simple to implement if you only need basic functionality.

In this case, it turns out that all you have to do is bulk read the USB endpoint, and you get raw frames. That's it: no callbacks, no manager object, no ImageReader, just a synchronous libusb_bulk_transfer, and you get a raw frame.

This is especially true with the stereo cam of the Nreal Light, because the USB stream is served by custom firmware on an OV580 DSP, which serves a single stream format and exposes practically no controls.

So I looked at the USB descriptor of the camera, issued the bulk read on the relevant endpoint, and...



Looking at libuvc

I guess I have no choice but to clone the libuvc repo and look at what I actually need to do.

After cloning, I cmake-d the project and tried out the example to verify that it actually works:

$ cd libuvc
$ mkdir build
$ cd build
$ cmake ..
$ make -j
$ ./example
UVC initialized
Device found
Device opened
DEVICE CONFIGURATION (05a9:0680/OV0001) ---
Status: idle
        bcdUVC: 0x0110
        bEndpointAddress: 129
                  bits per pixel: 16
                  GUID: 5955593200001000800000aa00389b71 (YUY2)
                  default frame: 1
                  aspect ratio: 0x0
                  interlace flags: 00
                  copy protect: 00
                          capabilities: 00
                          size: 640x481
                          bit rate: 147763200-147763200
                          max frame size: 615680
                          default interval: 1/30
                          interval[0]: 1/30

First format: (YUY2) 640x481 30fps
bmHint: 0001
bFormatIndex: 1
bFrameIndex: 1
dwFrameInterval: 333333
wKeyFrameRate: 0
wPFrameRate: 0
wCompQuality: 0
wCompWindowSize: 0
wDelay: 101
dwMaxVideoFrameSize: 615680
dwMaxPayloadTransferSize: 32768
bInterfaceNumber: 1
Enabling auto exposure ...
 ... enabled auto exposure
callback! frame_format = 3, width = 640, height = 481, length = 615680, ptr = 0x3039
callback! frame_format = 3, width = 640, height = 481, length = 615680, ptr = 0x3039
callback! frame_format = 3, width = 640, height = 481, length = 615680, ptr = 0x3039
callback! frame_format = 3, width = 640, height = 481, length = 615680, ptr = 0x3039

Cool, it seems to be working.

Now, looking at the code: while the initialization is simple, it does a bit more than just opening the fd, reading the descriptors, and jumping right to the bulk read. It also sends a kind of "start stream" packet. In uvc_start_streaming, before the uvc_stream_start function is called (which does the actual bulk reading), it calls uvc_stream_open_ctrl, which in turn calls uvc_stream_ctrl to send a control packet containing the stream format requested from the device, previously filled in by querying the supported frame formats.

Since we know that the device only supports a single format that is unlikely to change, in our implementation we could fill this control structure out based on the specs and the read descriptor.

Or we could just dump it from the libuvc example using gdb LMAO:

$ gdb example
(gdb) break uvc_stream_open_ctrl
(gdb) r
Thread 1 "example" hit Breakpoint 1, uvc_stream_open_ctrl (devh=0x555555578b80, strmhp=0x7fffffffd6b0, ctrl=0x7fffffffd720) at /tmp/libuvc/src/stream.c:1007
(gdb) break libusb_control_transfer
(gdb) c
Thread 1 "example" hit Breakpoint 2, 0x00007ffff7d75b10 in libusb_control_transfer () from /usr/lib/libusb-1.0.so.0
(gdb) bt
#0  0x00007ffff7d75b10 in libusb_control_transfer () from /usr/lib/libusb-1.0.so.0
#1  0x00007ffff7fb98b0 in uvc_query_stream_ctrl (devh=0x555555578b80, ctrl=0x7fffffffd720, probe=0 '\000', req=UVC_SET_CUR) at /tmp/libuvc/src/stream.c:235
#2  0x00007ffff7fb9e5b in uvc_stream_ctrl (strmh=0x5555555a2500, ctrl=0x7fffffffd720) at /tmp/libuvc/src/stream.c:402
#3  0x00007ffff7fbaf9f in uvc_stream_open_ctrl (devh=0x555555578b80, strmhp=0x7fffffffd6b0, ctrl=0x7fffffffd720) at /tmp/libuvc/src/stream.c:1037
#4  0x00007ffff7fbad74 in uvc_start_streaming (devh=0x555555578b80, ctrl=0x7fffffffd720, cb=0x555555555279 <cb>, user_ptr=0x3039, flags=0 '\000') at /tmp/libuvc/src/stream.c:941
#5  0x0000555555555610 in main (argc=1, argv=0x7fffffffd868) at /tmp/libuvc/src/example.c:174
(gdb) frame 1
(gdb) p/x buf
$1 = {0x1, 0x0, 0x1, 0x1, 0x15, 0x16, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x65, 0x0, 0x0, 0x65, 0x9, 0x0, 0x0, 0x80, 0x0, 0x0, 0x80, 0xd1, 0xf0, 0x8, 0x8, 0xf0, 0x69, 0x38}
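Those 34 bytes are just the video streaming probe/commit control structure from the UVC 1.1 spec, laid out little-endian. A quick sketch to decode the dump (field names from the spec; only a few fields printed, and they match the libuvc example's output above):

```python
import struct

# The commit block dumped from gdb above (34 bytes, little-endian)
COMMIT = bytes([
    0x01, 0x00, 0x01, 0x01, 0x15, 0x16, 0x05, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x65, 0x00, 0x00, 0x65, 0x09, 0x00, 0x00, 0x80,
    0x00, 0x00, 0x80, 0xd1, 0xf0, 0x08, 0x08, 0xf0,
    0x69, 0x38,
])

# Field layout of the UVC 1.1 VS probe/commit control
NAMES = [
    "bmHint", "bFormatIndex", "bFrameIndex", "dwFrameInterval",
    "wKeyFrameRate", "wPFrameRate", "wCompQuality", "wCompWindowSize",
    "wDelay", "dwMaxVideoFrameSize", "dwMaxPayloadTransferSize",
    "dwClockFrequency", "bmFramingInfo", "bPreferredVersion",
    "bMinVersion", "bMaxVersion",
]
fields = dict(zip(NAMES, struct.unpack("<HBBIHHHHHIIIBBBB", COMMIT)))

print(fields["bFormatIndex"])              # 1
print(fields["dwFrameInterval"])           # 333333 (100 ns units, i.e. 30 fps)
print(fields["dwMaxVideoFrameSize"])       # 615680
print(fields["dwMaxPayloadTransferSize"])  # 32768
```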

Test implementation

I put the setup packet in my test python "implementation":

import libusb
import ctypes

libusb.init(None)
handle = libusb.open_device_with_vid_pid(None, 0x05A9, 0x0680)
libusb.set_auto_detach_kernel_driver(handle, 1)
libusb.claim_interface(handle, 1)

# The commit block dumped from libuvc with gdb
STREAM_ENABLE = [
    0x1, 0x0, 0x1, 0x1, 0x15, 0x16, 0x5, 0x0, 0x0,
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,  0x65, 0x0,
    0x0, 0x65, 0x9, 0x0, 0x0, 0x80, 0x0, 0x0, 0x80,
    0xd1, 0xf0, 0x8, 0x8, 0xf0, 0xa9, 0x18
]
stream_enable_buffer = (
    ctypes.c_uint8 * len(STREAM_ENABLE)
)(*STREAM_ENABLE)
# SET_CUR on the VS_COMMIT_CONTROL of streaming interface 1
libusb.control_transfer(
    handle,
    0x21,    # bmRequestType: class-specific, interface, host-to-device
    0x01,    # bRequest: SET_CUR
    2 << 8,  # wValue: VS_COMMIT_CONTROL in the high byte
    1,       # wIndex: the streaming interface number
    stream_enable_buffer, len(STREAM_ENABLE), 1000
)

PACKET_SIZE = 0x8000  # dwMaxPayloadTransferSize from the descriptor
frame_buffer = (PACKET_SIZE * ctypes.c_uint8)()
read_actual = ctypes.c_int(0)
while True:
    libusb.bulk_transfer(handle, 0x81, frame_buffer,
                         PACKET_SIZE, ctypes.byref(read_actual), 1000)
    print("Header:", list(frame_buffer[:12]))
    print("Buffer data:", list(frame_buffer[12:50]), "...")

And I got my frames.

Now all that was left to do was to remove the 12-byte payload header that's added every 0x8000 bytes (as indicated in the frame format descriptor), and we've got our sweet 640x481 YUYV raw frames. I leave this as an exercise to the reader.
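For the impatient, a minimal sketch of what that de-headering could look like (assuming a fixed 12-byte header at the start of every 0x8000-byte payload, as described above; strip_payload_headers is a made-up helper name):

```python
def strip_payload_headers(stream: bytes, payload_size: int = 0x8000,
                          header_len: int = 12) -> bytes:
    """Drop the payload header at the start of each max-size payload."""
    out = bytearray()
    for off in range(0, len(stream), payload_size):
        out += stream[off + header_len : off + payload_size]
    return bytes(out)

# Toy example with 16-byte payloads: 12 header bytes + 4 data bytes each
fake = (bytes(12) + b"abcd") * 2
print(strip_payload_headers(fake, payload_size=16))  # b'abcdabcd'
```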

Oh, and it worked out of the box on Android too.

(Note: the frames are not actually YUY2, that's a lie: they are really 640x2x480 grayscale, plus a fake last line that contains some metadata, most importantly the IMU timestamp. But this is a very Nreal Light-specific quirk.)
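A sketch of carving that layout up, based on the sizes above (how the left and right sensors' pixels are arranged within the image rows is camera-specific, so I don't guess at that here; split_frame is a made-up helper name):

```python
WIDTH = 640 * 2   # two sensors' worth of pixels per row
HEIGHT = 480      # real image rows; the "481st" row is metadata

def split_frame(frame: bytes):
    """Separate the grayscale pixel data from the trailing metadata row."""
    assert len(frame) == WIDTH * (HEIGHT + 1)  # 615680, per the descriptor
    image = frame[: WIDTH * HEIGHT]      # 614400 bytes of pixels
    metadata = frame[WIDTH * HEIGHT :]   # 1280 bytes, incl. the IMU timestamp
    return image, metadata

image, metadata = split_frame(bytes(WIDTH * (HEIGHT + 1)))
print(len(image), len(metadata))  # 614400 1280
```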

The final code in Rust can be seen here, along with an example.


Implementing your own USB device classes is good fun if you do it for very specific hardware. Can recommend.

If you need Augmented Reality problem solving, or want help implementing an AR or VR idea, drop us a mail at info@voidcomputing.hu