More AR glasses USB protocols: the Worse, the Better and the Prettier

We've found a drop-in replacement for the Nreal Light, called the Grawoow G530 (or Metavision M53, and who knows how many other names), so we finally have 3 more protocols to write about in this blog.

Background

The previous blog post on the topic has become somewhat of a reference in the community, and has actually driven some sales for the company, so it seemed like a good idea to write about our more recent findings and share them with anyone interested.

The post itself will probably be a bit dry for the casual reader. Sorry about that.

The Worse: Grawoow G530

G530 on a dog

We started searching for a replacement for the XREAL Light from Day 1, because it is not supported or manufactured by XREAL anymore. We needed glasses with stereo cams and active support.

Some months ago I got contacted on LinkedIn by a Chinese seller, and after a bit of talking, we bought a test piece. It wasn't as easy as just going on a webshop (and I had to do all kinds of import paperwork), but it was still smooth: send some mails, wire money, receive glasses.

I call it the Worse because it is a tiny bit worse in every regard: it looks cheaper, the plastic parts fit worse, and the protocol is missing some information that is crucial for good support. The main thing it has going for it is that it's still available from distributors.

The architecture is extremely similar to the XREAL Light, so much so that I'm only going to link the draft architecture pic. Main components:

Gotta love Chinese copying culture. By the way, the glasses seem to be widely white-labeled; the Metavision M53 seems to be the same hardware. Even the firmware and SDK say G530 and not M53.

USB interfaces

The device comes up as two hubs (one USB 3 and one USB 2) and 5 devices:

MCU Protocol

The MCU control is done predominantly through control packets, although there is also an interrupt endpoint for the forehead detector event.

The control protocol is always two control packets, one to send the command and one to receive the result. The magic libusb parameters are:

Send: 
    bmRequestType: 0x21 (CTRL_TYPE_CLASS|CTRL_RECIPIENT_INTERFACE|ENDPOINT_OUT)
    bRequest:      9
    wValue:        0x201
    wIndex:        0

Receive: 
    bmRequestType: 0xa1 (CTRL_TYPE_CLASS|CTRL_RECIPIENT_INTERFACE|ENDPOINT_IN)
    bRequest:      1
    wValue:        0x102
    wIndex:        0

Note that these are the standard SetReport and GetReport HID requests (see Section 7.2 in the HID Device Class definition), so these might be available with some standard report-based HID APIs.
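
If you want to drive this from libusb directly, the exchange maps onto one write and one read control transfer. Below is a minimal sketch using the rusb crate; the 0x40 buffer size, the timeout, and the device setup (opening the device, claiming the interface, detaching the kernel HID driver) are my placeholders, not values specified by the protocol.

    // Minimal sketch of the two-step command exchange using the rusb crate.
    // Assumes `handle` is an opened rusb::DeviceHandle with the MCU interface
    // already claimed (and the kernel HID driver detached on Linux).
    use std::time::Duration;

    fn mcu_command(
        handle: &rusb::DeviceHandle<rusb::GlobalContext>,
        packet: &[u8],
    ) -> rusb::Result<Vec<u8>> {
        let timeout = Duration::from_millis(250);
        // SetReport: send the command packet
        handle.write_control(0x21, 9, 0x201, 0, packet, timeout)?;
        // GetReport: read the result back
        let mut buf = [0u8; 0x40]; // assumed report size
        let received = handle.read_control(0xa1, 1, 0x102, 0, &mut buf, timeout)?;
        Ok(buf[..received].to_vec())
    }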

The packet structure is as follows:

Example packets:

Commands (data is empty for "Get" commands here):

Command                 ID      Data
Get firmware version    0xffe1  8 bytes, unknown format
Get serial number       0x8005  The serial number as a UTF-8 string
Set serial number       0x8004  Same as above
Get display mode        0x8007  Display mode as a single byte: 0 is mirrored, any nonzero value is SBS 60Hz
Set display mode        0x8008  Same as above
Get display brightness  0x801d  Brightness as a single byte, 0-4
Set display brightness  0x801e  Same as above

There are some commands I haven't listed (or really tested), but they can be easily obtained from the SDK libraries:

You can also continuously read the interrupt endpoint 0x85, where you should get key and distance sensor events (in the same 0xaa 0xbb format as the control packets), but I only ever got the "glasses taken off" event, so it was not worth implementing.

Getting the calibration data

As opposed to the XREAL protocols, where you can get the calibration JSON from the OV580, here you actually have to do it over the above MCU protocol, using command IDs 0x8009 (metadata) and 0x800a (actual calibration data).

The metadata response looks something like this:
[0, 0, 0, 241, 0, 0, 10, 210, 3, 142]

The "get calibration data" packet needs additional data: a 0 byte, and then 4 byte offset, in big endian. So it's [0, 0, 0, 0, 0] for the first packet, [0, 0, 0, 0, 241] for the next, and so on.

Response is the same 5 bytes followed by a 0 byte (so 6 in total), and then the actual data. If you request more data than the calibration file size, the packet will be smaller, or even empty. So requesting the metadata is kind of useless, you can just request data until you get an empty response.
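
Put together, downloading the calibration blob is a loop over increasing offsets until an empty chunk comes back. The sketch below takes the command-sending routine as a closure, since the full packet framing isn't reproduced here; the assumption that the next offset equals the number of bytes received so far is my reading of the 0 → 241 progression above.

    // Sketch of the calibration download loop. `send_command` is expected to
    // wrap command 0x800a in the MCU packet format and return the response
    // payload with the 6 prefix bytes already stripped.
    fn read_calibration<F>(mut send_command: F) -> Vec<u8>
    where
        F: FnMut(u16, &[u8]) -> Vec<u8>,
    {
        let mut calibration = Vec::new();
        loop {
            let mut request = vec![0u8]; // leading zero byte
            // 4-byte offset, big endian (assumed to be the byte count so far)
            request.extend_from_slice(&(calibration.len() as u32).to_be_bytes());
            let chunk = send_command(0x800a, &request);
            if chunk.is_empty() {
                break; // ran past the end of the file
            }
            calibration.extend_from_slice(&chunk);
        }
        calibration
    }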

IMU protocol

Fortunately this is another pair of glasses that gives you an IMU stream out of the box, so you don't need to fight for it. All you have to do is continuously read 0x80-byte chunks on the HID interrupt endpoint 0x89 of the OV580 device.

It is a large packet, and the SDK only parses the raw accelerometer, gyroscope and temperature data. A lot of the packet seems to be fixed bytes, and the only other things that change (besides the data we already know about) are two sequence numbers. Yeah, sequence numbers, not even proper timestamps.

All data are transferred as little endian signed ints. The conversion factors are the same as in the Invensense MPU6050 docs.

Data          Offset  Size  Conversion
Acceleration  0x58    3*4   Divide by 16384.0 and then convert g to m/s²
Gyroscope     0x3c    3*4   Divide by 16.4 and then convert °/s to rad/s
Temperature   0x2a    2     Divide by 326.8, then add 25.0
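
As a concrete example, one packet can be decoded roughly like this, using only the offsets and conversion factors above (with 9.80665 m/s² per g and π/180 for the unit conversions):

    // Sketch of parsing one IMU packet read from interrupt endpoint 0x89.
    fn parse_imu_packet(packet: &[u8; 0x80]) -> ([f32; 3], [f32; 3], f32) {
        let i32_at = |off: usize| i32::from_le_bytes(packet[off..off + 4].try_into().unwrap());
        let i16_at = |off: usize| i16::from_le_bytes(packet[off..off + 2].try_into().unwrap());

        // Accelerometer: raw / 16384 g, converted to m/s^2
        let accel = [0usize, 4, 8].map(|i| i32_at(0x58 + i) as f32 / 16384.0 * 9.80665);
        // Gyroscope: raw / 16.4 deg/s, converted to rad/s
        let gyro = [0usize, 4, 8].map(|i| i32_at(0x3c + i) as f32 / 16.4 * std::f32::consts::PI / 180.0);
        // Temperature: raw / 326.8 + 25.0 (degrees Celsius)
        let temperature = i16_at(0x2a) as f32 / 326.8 + 25.0;

        (accel, gyro, temperature)
    }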

The Better: Rokid Max

Rokid Max on a dog

The Rokid Max is a logical evolution of the Rokid Air. Better design, better fit, better protocol, and the DisplayPort part is apparently 2 ms quicker, reducing motion-to-photon latency. Everything else is pretty much the same, so much so that most of it can be handled by the same code. They even kept the gimmicky focal adjustment knobs (even though they are still unusable for people with astigmatism).

Protocol

The main new protocol element is "sensor data marker = 17" in the IMU data packets, which combines all previous packets into one. Its structure looks like this:

Index  Bytes  Description
0x00   1      Sensor data marker (17)
0x01   8      Timestamp (little endian)
0x09   3x4    Gyroscope x, y and z reading in f32 format
0x15   3x4    Accelerometer x, y and z reading in f32 format
0x21   3x4    Magnetometer x, y and z reading in f32 format
0x2d   1      Physical key statuses (bitfield)
0x2e   1      Proximity sensor status (near=0, far=1)
0x2f   1      ?
0x30   8      Timestamp of last VSYNC (little endian)
0x38   3      ???
0x3b   1      Display brightness
0x3c   1      Volume
0x3d   3      ???
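
A parsing sketch for this packet, based only on the offsets above; the floats are assumed to be little endian, like the integer fields:

    struct SensorPacket {
        timestamp: u64,
        gyro: [f32; 3],
        accel: [f32; 3],
        mag: [f32; 3],
        keys: u8,
        proximity_far: bool,
        vsync_timestamp: u64,
        brightness: u8,
        volume: u8,
    }

    fn parse_combined_packet(p: &[u8]) -> Option<SensorPacket> {
        if p.len() < 0x40 || p[0] != 17 {
            return None; // not a combined sensor data packet
        }
        let u64_at = |off: usize| u64::from_le_bytes(p[off..off + 8].try_into().unwrap());
        let vec3_at = |off: usize| {
            [0usize, 4, 8].map(|i| f32::from_le_bytes(p[off + i..off + i + 4].try_into().unwrap()))
        };
        Some(SensorPacket {
            timestamp: u64_at(0x01),
            gyro: vec3_at(0x09),
            accel: vec3_at(0x15),
            mag: vec3_at(0x21),
            keys: p[0x2d],
            proximity_far: p[0x2e] == 1,
            vsync_timestamp: u64_at(0x30),
            brightness: p[0x3b],
            volume: p[0x3c],
        })
    }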

Display modes

The Max added a bunch of new display modes:

Mode  SBS   Resolution  Refresh rate
0           1920x1080   60Hz
1     Yes   3840x1080   60Hz
2     Yes*  1920x1080   60Hz
3           1920x1080   120Hz
4     Yes   3840x1200   90Hz
5     Yes   3840x1200   60Hz

*: This is a "half SBS" mode, meaning that it splits the regular HD image in half, and then stretches each half horizontally over each of the glasses.

Modes above 6 are equivalent to mode 3.

The Prettier: XREAL Air

XREAL Air on a dog

The XREAL Air is not an evolution of the Light; it is much more like the Rokid Max, but with a way better design. And I mean a lot better: the thing actually looks like a regular (albeit a bit big) pair of sunglasses. It is the first pair of AR glasses that passes the "Tram #4 test": I could wear it on Tram #4 and people wouldn't really notice. Well, maybe the cable hanging down.

Unfortunately it doesn't have a camera, so no inside-out 6DOF anymore.

On the other hand, it has the absolute lowest display delay out of all six glasses we have described in these blog posts, so the image is rock stable even with dynamic head movements.

The protocol is weird. They kept the separate USB interfaces for the MCU and IMU + DSP pair. Both are different from the Light's. Unfortunately this post was written way after I finished work on the Air, so I'm writing it based on the code of ar-drivers-rs.

MCU protocol

Packets are sent over regular HID read() and write() primitives, over interface 4 (endpoints 0x86 and 0x07). Packet size is 0x40 both ways.

Index  Bytes  Description
0x00   1      Header (0xfd)
0x01   4      Checksum (see below)
0x05   2      Length of additional data
0x07   4      Request ID (not checked by the MCU, only used to identify answers; can be anything)
0x0b   4      Timestamp (also not checked, can be 0)
0x0f   2      Command ID
0x11   5      Zeros (probably)
0x16   n      Additional data

Every int is Little Endian.

The checksum is CRC32 (Adler), like the Light's. The checksummed data runs from byte 5 to the end of the packet (i.e. the value of the length field + 17 bytes).
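
As a sketch, building one such packet looks roughly like this. The CRC32 implementation is passed in as a parameter, since all we know here is that it matches the Light's:

    // Build a 0x40-byte MCU command packet for the Air.
    fn build_mcu_packet(
        crc32: impl Fn(&[u8]) -> u32,
        request_id: u32,
        command_id: u16,
        data: &[u8],
    ) -> [u8; 0x40] {
        assert!(data.len() <= 0x40 - 0x16);
        let mut p = [0u8; 0x40];
        p[0] = 0xfd; // header
        p[5..7].copy_from_slice(&(data.len() as u16).to_le_bytes()); // length of additional data
        p[7..11].copy_from_slice(&request_id.to_le_bytes()); // only used to match the answer
        // Bytes 0x0b..0x0f: timestamp, not checked by the MCU, left at zero
        p[0x0f..0x11].copy_from_slice(&command_id.to_le_bytes());
        p[0x16..0x16 + data.len()].copy_from_slice(data);
        // Checksum covers byte 5 up to the end of the data (length field + 17 bytes)
        let checksum = crc32(&p[5..0x16 + data.len()]);
        p[1..5].copy_from_slice(&checksum.to_le_bytes());
        p
    }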

Again, there is no need to individually enable events or hardware, so we only need the bare minimum commands:

Command             ID      Data
Get MCU FW version  0x0026  Version as UTF-8 string
Get serial number   0x0015  The serial number as UTF-8 string
Get display mode    0x0007  Display mode as a single byte
Set display mode    0x0008  Same as above

There is a .js file in the official app that describes a lot more commands for both the Air and the Light. There isn't much of interest in there, just a couple of version strings, firmware update, reboot, and fiddling with the display.

Some asynchronous events also arrive on the same channel (sometimes between a command and its reply). They use the same packet format as the commands and replies. The only one worth looking for is ID 0x6c05, which is the key press (more precisely, key release) event.
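
In practice this means a reply-reading loop has to tolerate unrelated packets. A small sketch follows; it assumes replies echo the request ID we sent at offset 0x07 (which is how I match them), and the read primitive is passed in as a closure:

    // Read packets until the reply for `sent_request_id` shows up, skipping
    // asynchronous events (key releases have command ID 0x6c05).
    fn wait_for_reply<F>(mut read_packet: F, sent_request_id: u32) -> [u8; 0x40]
    where
        F: FnMut() -> [u8; 0x40],
    {
        loop {
            let p = read_packet();
            let request_id = u32::from_le_bytes(p[0x07..0x0b].try_into().unwrap());
            let command_id = u16::from_le_bytes(p[0x0f..0x11].try_into().unwrap());
            if command_id == 0x6c05 {
                // Key release event, handle it elsewhere
                continue;
            }
            if request_id == sent_request_id {
                return p;
            }
            // Some other asynchronous packet: ignore it here
        }
    }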

Display modes

They also added a lot more display modes:

Mode  SBS   Resolution  Refresh rate
1           1920x1080   60Hz
3     Yes   3840x1080   60Hz
4     Yes   3840x1080   72Hz
5           1920x1080   72Hz
8     Yes*  1920x1080   60Hz
9     Yes   3840x1080   90Hz
10          1920x1080   90Hz
11          1920x1080   120Hz

*: This is a "half SBS" mode, meaning that it splits the regular HD image in half, and then stretches each half horizontally over each of the glasses. This is the replacement for Mode 1, which was vertically stretched half-SBS on the Light.

Invalid display modes cause an error (yes, I checked all 256 possible values).

The IMU protocol

IMU packets are also sent/received with regular HID read() and write(), over interface 3 (endpoints 0x84 and 0x05), with 0x40-sized packets.

Index  Bytes  Description
0x00   1      Header (0xaa)
0x01   4      Checksum (same as MCU checksum)
0x05   2      Length of additional data
0x07   1      Command ID
0x08   n      Additional data

Every int is Little Endian.

Interestingly, while the packet format is very different, the commands are exactly the same as the Light's:

Command                      ID    Command data
Get calibration file length  0x14  Calibration file id according to the SDK, doesn't seem to affect anything. Can be empty
Get calibration file part    0x15  Should be block number. Doesn't do anything, can be empty
Enable IMU stream            0x19  0: disable, 1: enable
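
As an illustration, enabling the IMU stream then boils down to building a small packet in the format above and writing it to endpoint 0x05. The exact span covered by the checksum is my assumption (the same "from the length field to the end of the data" rule as the MCU packets), and the CRC32 function is again a parameter:

    // Build an IMU-interface command packet,
    // e.g. build_imu_packet(crc, 0x19, &[1]) for "enable IMU stream".
    fn build_imu_packet(crc32: impl Fn(&[u8]) -> u32, command_id: u8, data: &[u8]) -> [u8; 0x40] {
        assert!(data.len() <= 0x40 - 8);
        let mut p = [0u8; 0x40];
        p[0] = 0xaa; // header
        p[5..7].copy_from_slice(&(data.len() as u16).to_le_bytes()); // length of additional data
        p[7] = command_id;
        p[8..8 + data.len()].copy_from_slice(data);
        // Assumed checksum span: byte 5 up to the end of the additional data
        let checksum = crc32(&p[5..8 + data.len()]);
        p[1..5].copy_from_slice(&checksum.to_le_bytes());
        p
    }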

The calibration file format is similar, although this time they didn't stuff 3 different files in there; you only get the JSON.

The IMU packet format is different and more compact, but the logic is the same:

Index  Bytes  Description
0x00   2      Header (0x01, 0x02)
0x02   2      Temperature (raw data from the ICM-20602)
0x04   8      Timestamp (nanoseconds)
0x0c   2      Gyroscope multiplier
0x0e   4      Gyroscope divisor
0x12   3      Gyroscope X reading
0x15   3      Gyroscope Y reading
0x18   3      Gyroscope Z reading
0x1b   2      Accelerometer multiplier
0x1d   4      Accelerometer divisor
0x21   3      Accelerometer X reading
0x24   3      Accelerometer Y reading
0x27   3      Accelerometer Z reading
0x2a   2      Magnetometer offset
0x2c   4      Magnetometer divisor
0x30   2      Magnetometer X reading
0x32   2      Magnetometer Y reading
0x34   2      Magnetometer Z reading

Yes, there are 3-byte signed integers in there. They are encoded the same way as "regular" 4-byte integers (little endian, two's complement), just on 3 bytes. Thankfully the Rust parsing library I use has built-in support for these, because converting them manually is a pain.
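
If you are not using such a library, decoding boils down to sign-extending the 3 bytes into an i32 and applying the multiplier/divisor pair. Note that the "value = raw * multiplier / divisor" scaling is my reading of the format rather than something spelled out above, and the signedness of the multiplier is an assumption too:

    // Sign-extend a 3-byte little endian integer into an i32.
    fn i24_le(bytes: &[u8]) -> i32 {
        let raw = (bytes[0] as i32) | ((bytes[1] as i32) << 8) | ((bytes[2] as i32) << 16);
        (raw << 8) >> 8 // shift up and back down to propagate the sign bit
    }

    // Assumed scaling: raw reading * multiplier / divisor.
    fn scale(raw: i32, multiplier: i16, divisor: i32) -> f32 {
        raw as f32 * multiplier as f32 / divisor as f32
    }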

One thing to note is that the coordinate system of the raw sensor readings is different from the calibration file's coordinate system.

Extra: Monado driver for the Rokid Max

I always wanted to support the Rokid Max, but I didn't really want to buy one just to do it. Thankfully, a kind soul from Canada got in contact with me on GitHub and paid for both the glasses and my time to do it. Thanks again, Mauve.

The only extra work was that I also had to make a Monado driver. Monado is a nice piece of software that implements the OpenXR API, so any OpenXR-using apps (major 3D engines and some AR desktops, for example) can use any Monado-supported hardware. They have a very friendly Discord, and the code is of very good quality; it was a joy to work with, and my code got reviewed basically instantly. Once the review comments were addressed, it was in trunk the next day.

Support for the Rokid Max has been merged to main. Some people are working on supporting the Nreal Air, and (as of writing) it works well, but there are some kinks to be ironed out.

Maybe you can help :)



If you need Augmented Reality problem solving, or want help implementing an AR or VR idea, drop us a mail at info@voidcomputing.hu