More AR glasses USB protocols: the Worse, the Better and the Prettier
We've found a drop-in replacement for the Nreal Light, called the Grawoow G530 (or Metavision M53, and who knows how many other names), so we finally have 3 more protocols to write about in this blog.
Background
The previous blog post on the topic has become somewhat of a reference in the community, and has actually driven some sales for the company, so it seemed like a good idea to write up our more recent findings and share them with anyone interested.
The post itself will probably be a bit dry for the casual reader. Sorry about that.
The Worse: Grawoow G530
We started searching for a replacement for the XREAL Light from Day 1, because it is not supported or manufactured by XREAL anymore. We needed glasses with stereo cams and active support.
Some months ago I got contacted on LinkedIn by a Chinese seller, and after a bit of talking, we bought a test piece. It wasn't as easy as just going on a webshop (and I had to do all kinds of import paperwork), but it was still smooth: send some mails, wire money, receive glasses.
I call it worse, because it is a tiny bit worse in every regard: it looks cheaper, the plastic parts fit worse, and the protocol is missing some information that is crucial for good support. The main thing it has going for it is that it's still available from distributors.
The architecture is extremely similar to the XREAL Light, so much so that I'm only going to link the draft architecture pic. Main components:
- All the standard DP-over-USB-C stuff, driving two micro OLED displays over two of the USB 3 lanes
- A USB 3 RGB cam on the remaining two lanes
- Stereo grayscale cams and IMU driven by an OV-580 (exactly the same as the Light, even the protocol... mostly)
- Distance sensor at the forehead
- Four physical buttons: brightness up/down, and volume up/down
- Judging by the USB device descriptors, the MCU and audio functionalities are done by the same chip.
Gotta love Chinese copying culture. By the way, the glasses seem to be widely white-labeled; the Metavision M53 seems to be the same hardware. Even the firmware and SDK say G530 and not M53.
USB interfaces
The device comes up as two hubs (one USB 3 and one USB 2) and 5 devices:
- Realtek Semiconductor Corp. RGB Camera
  - VID: 0bda, PID: 5880
  - Bog-standard USB3 camera, capable of full HD at some pretty high frame rates
- CVT Electronics.Co.,Ltd G530
  - VID: 1ff7, PID: 0ff4
  - Interfaces
    - 0: HID. This is the glasses control (MCU) endpoint.
    - 1,2,3: Audio
- OmniVision Technologies, Inc. USB Camera-OV580
  - VID: 05a9, PID: 0f87
  - The OV580 with a UVC (stereo cam) and a HID (IMU) interface
MCU Protocol
The MCU control is predominantly through control packets, although there is an interrupt endpoint for the forehead detector event.
The control protocol is always two control packets, one to send the command and one to receive the result. The magic libusb parameters are:
Send:
bmRequestType: 0x21 (CTRL_TYPE_CLASS|CTRL_RECIPIENT_INTERFACE|ENDPOINT_OUT)
bRequest: 9
wValue: 0x201
wIndex: 0
Receive:
bmRequestType: 0xa1 (CTRL_TYPE_CLASS|CTRL_RECIPIENT_INTERFACE|ENDPOINT_IN)
bRequest: 1
wValue: 0x102
wIndex: 0
Note that these are the standard SetReport and GetReport HID requests (see Section 7.2 in the HID Device Class definition), so these might be available with some standard report-based HID APIs.
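A minimal sketch of that two-step exchange in Python. It assumes a pyusb-style device object, meaning anything that exposes ctrl_transfer(bmRequestType, bRequest, wValue, wIndex, data_or_wLength). The function name mcu_exchange and the 256-byte read length are assumptions made for illustration, not something taken from the SDK.

```python
# Parameters of the two control transfers, as listed above.
SET_REPORT = (0x21, 9, 0x201, 0)  # bmRequestType, bRequest, wValue, wIndex
GET_REPORT = (0xA1, 1, 0x102, 0)


def mcu_exchange(dev, request: bytes, read_len: int = 256) -> bytes:
    """Send one command packet, then read back the reply.

    `dev` is assumed to behave like pyusb's usb.core.Device.
    `read_len` is an assumption; pick whatever your reads need.
    """
    # Host-to-device: the last argument is the payload to send.
    dev.ctrl_transfer(*SET_REPORT, request)
    # Device-to-host: the last argument is the number of bytes to read.
    reply = dev.ctrl_transfer(*GET_REPORT, read_len)
    return bytes(reply)
```

With pyusb, the device object would typically come from usb.core.find(idVendor=0x1ff7, idProduct=0x0ff4), possibly after detaching the kernel HID driver from interface 0.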
The packet structure is as follows:
- Header: 2 bytes, fixed 0xaa, 0xbb
- Command: 2 bytes, big endian
- Additional data size: 2 bytes, big endian
- Additional data: variable, can be 0 bytes
- Checksum: 1 byte, the low 8 bits of the sum of all previous bytes, excluding the 0xaa, 0xbb part.
Example packets:
- Get serial number: [0xaa, 0xbb, 0x80, 5, 0, 0, 0x85]
- Response: [0xaa, 0xbb, 0x80, 5, 0, 5, 0x33, 0x31, 0x33, 0x33, 0x37, 0x8b]
  - (I accidentally overwrote the serial before I could read the original one.)
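The packet layout can be sketched in a few lines of Python. The function names are made up for this example, and the one-byte, modulo-256 checksum is an assumption inferred from the two example packets above (it reproduces both of them).

```python
import struct

HEADER = bytes([0xAA, 0xBB])


def checksum(body: bytes) -> int:
    # Sum of everything after the 0xaa 0xbb header, truncated to one byte
    # (assumption inferred from the example packets).
    return sum(body) & 0xFF


def build_packet(command: int, data: bytes = b"") -> bytes:
    # Command and data size are both 2 bytes, big endian.
    body = struct.pack(">HH", command, len(data)) + data
    return HEADER + body + bytes([checksum(body)])


def parse_packet(raw: bytes):
    """Return (command, data); trailing padding after the checksum is ignored."""
    if raw[:2] != HEADER:
        raise ValueError("bad header")
    command, size = struct.unpack_from(">HH", raw, 2)
    end = 6 + size  # index of the checksum byte
    if len(raw) < end + 1:
        raise ValueError("truncated packet")
    if checksum(raw[2:end]) != raw[end]:
        raise ValueError("bad checksum")
    return command, bytes(raw[6:end])
```

For instance, build_packet(0x8005) gives the "get serial number" request shown above, and parse_packet() on the example response returns command 0x8005 with the data b"31337".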
Commands (data is empty for "Get" commands here):
Command | ID | Data |
---|---|---|
Get firmware version | 0xffe1 | 8 bytes, unknown format |
Get serial number | 0x8005 | The serial number as UTF-8 string |
Set serial number | 0x8004 | Same as above |
Get display mode | 0x8007 | Display mode as a single byte: 0 is mirrored, any nonzero value is SBS 60Hz |
Set display mode | 0x8008 | Same as above |
Get display brightness | 0x801d | Brightness as a single byte, 0-4 |
Set display brightness | 0x801e | Same as above |
Some commands I haven't listed (or tested really), but can be easily obtained from the SDK libraries:
- Sensor and camera enable/disable: All sensors and cameras are enabled and streaming by default, so no need to touch those
- Display settings, like brightness contrast per channel, and all kinds of low-level DP stuff
- Audio volume
- Firmware update
- Sketchy stuff like getting and setting a HDCP key
You can also continuously read the interrupt endpoint 0x85, where you should get key and distance sensor events (in the same 0xaa 0xbb format as the control packets), but I only ever got the "glasses taken off" event, and it was not worth implementing.
Getting the calibration data
As opposed to the XREAL protos, where you can get the calibration JSON from the OV580, you actually have to do it over the above MCU protocol, using command IDs 0x8009 (metadata) and 0x800a (actual calibration data).
The metadata response looks something like this: [0, 0, 0, 241, 0, 0, 10, 210, 3, 142]
- 2 bytes header, which should be 0
- 2 bytes is the "max packet size" (big endian). We'll be doing 256 byte control packets anyway, but good to know I guess?
- 4 bytes data size (big endian)
- 2 more unknown bytes.
The "get calibration data" packet needs additional data: a 0 byte, and then a 4-byte offset, in big endian. So it's [0, 0, 0, 0, 0] for the first packet, [0, 0, 0, 0, 241] for the next, and so on.
The response is the same 5 bytes followed by a 0 byte (so 6 in total), and then the actual data. If you request data past the end of the calibration file, the packet will be smaller, or even empty. So requesting the metadata is kind of useless: you can just request data until you get an empty response.
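The download loop can be sketched as follows. The exchange argument is a hypothetical callable that sends a command ID plus additional data over the MCU protocol and returns the additional data of the reply; it, the function names, and the max_size safety cap are assumptions for illustration.

```python
import struct

GET_CALIBRATION_META = 0x8009
GET_CALIBRATION_DATA = 0x800A


def parse_calibration_meta(payload: bytes):
    """Split the metadata reply into (max packet size, total data size)."""
    # 2 bytes header, 2 bytes max packet size, 4 bytes size, 2 unknown bytes.
    _header, max_packet, total_size, _unknown = struct.unpack(">HHI2s", payload[:10])
    return max_packet, total_size


def read_calibration(exchange, max_size: int = 1 << 20) -> bytes:
    """Request chunks at increasing offsets until an empty one arrives."""
    blob = bytearray()
    while len(blob) < max_size:  # safety cap, not part of the protocol
        # Additional data: a 0 byte, then the 4-byte big-endian offset.
        request = b"\x00" + struct.pack(">I", len(blob))
        # The reply echoes those 5 bytes plus a 0 byte, then the data.
        chunk = exchange(GET_CALIBRATION_DATA, request)[6:]
        if not chunk:
            break
        blob += chunk
    return bytes(blob)
```

Fed with the example metadata response above, parse_calibration_meta() returns a max packet size of 241 and a data size of 2770 bytes.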
IMU protocol
Fortunately, this is another pair of glasses that gives you an IMU stream out of the box, and you don't need to fight for it. All you have to do is continuously read 0x80-byte chunks on the HID interrupt endpoint 0x89 of the OV580 device.
It is a large packet, and the SDK only parses the raw accelerometer, gyro and temperature data. A lot of the packet seems to be fixed bytes, and the only things that change (other than what we already know) are two sequence numbers. Yeah, sequence numbers, not even proper timestamps.
All data are transferred as little endian signed ints. The conversion factors are the same as in the Invensense MPU6050 docs.
Data | Offset | Size | Conversion |
---|---|---|---|
Acceleration | 0x58 | 3*4 | Divide by 16384.0 and then convert from g to m/s² |
Gyroscope | 0x3c | 3*4 | Divide by 16.4 and then convert °/s to rad/s |
Temperature | 0x2a | 2 | Divide by 326.8, then add 25.0 |
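As a rough illustration of the table above, here is how one 0x80-byte packet could be decoded in Python. The function name is made up, and using standard gravity (9.80665 m/s²) for the g conversion is an assumption.

```python
import math
import struct

STANDARD_GRAVITY = 9.80665  # m/s^2 per g (assumed conversion constant)


def parse_g530_imu(packet: bytes):
    """Return (accel in m/s^2, gyro in rad/s, temperature in deg C)."""
    if len(packet) < 0x80:
        raise ValueError("short IMU packet")
    # Three little-endian signed 32-bit ints each for accel and gyro.
    accel_raw = struct.unpack_from("<3i", packet, 0x58)
    gyro_raw = struct.unpack_from("<3i", packet, 0x3C)
    # One little-endian signed 16-bit int for the temperature.
    (temp_raw,) = struct.unpack_from("<h", packet, 0x2A)

    accel = tuple(v / 16384.0 * STANDARD_GRAVITY for v in accel_raw)
    gyro = tuple(math.radians(v / 16.4) for v in gyro_raw)
    temperature = temp_raw / 326.8 + 25.0
    return accel, gyro, temperature
```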
The Better: Rokid Max
The Rokid Max is a logical evolution of the Rokid Air. Better design, better fit, better protocol, and the DisplayPort part is apparently 2ms quicker, reducing motion-to-photon latency. Everything else is pretty much the same, so much so that most of it can be handled by the same code. They even kept the gimmicky focal adjustment knobs (even though they're still unusable for people who have astigmatism).
Protocol
The main new protocol element is "sensor data marker = 17" in the IMU data packets, which combines all previous packets into one. Its structure looks like this:
Index | Bytes | Description |
---|---|---|
0x00 | 1 | Sensor data marker (17) |
0x01 | 8 | Timestamp (little endian) |
0x09 | 3x4 | Gyroscope x, y and z reading in f32 format |
0x15 | 3x4 | Accelerometer x, y and z reading in f32 format |
0x21 | 3x4 | Magnetometer x, y and z reading in f32 format |
0x2d | 1 | Physical key statuses (bitfield) |
0x2e | 1 | Proximity sensor status (near=0, far=1) |
0x2f | 1 | ? |
0x30 | 8 | Timestamp of last VSYNC (little endian) |
0x38 | 3 | ??? |
0x3b | 1 | Display brightness |
0x3c | 1 | Volume |
0x3d | 3 | ??? |
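The table maps directly onto a Python struct format string; a minimal sketch follows. The names are made up for this example, the unknown fields are skipped as padding, and reading both timestamps as unsigned is an assumption.

```python
import struct
from collections import namedtuple

# marker, timestamp, gyro xyz, accel xyz, mag xyz, keys, proximity,
# 1 unknown byte, vsync timestamp, 3 unknown bytes, brightness, volume,
# 3 unknown bytes. Little endian, no alignment padding: 64 bytes total.
ROKID_COMBINED = struct.Struct("<BQ3f3f3fBBxQ3xBB3x")

RokidSample = namedtuple(
    "RokidSample",
    "timestamp gyro accel mag keys proximity_far vsync_timestamp brightness volume",
)


def parse_rokid_max(packet: bytes):
    """Decode a 'sensor data marker = 17' packet, or return None for others."""
    if len(packet) < ROKID_COMBINED.size or packet[0] != 17:
        return None
    f = ROKID_COMBINED.unpack_from(packet)
    return RokidSample(
        timestamp=f[1],
        gyro=f[2:5],
        accel=f[5:8],
        mag=f[8:11],
        keys=f[11],                 # bitfield of physical keys
        proximity_far=bool(f[12]),  # near=0, far=1
        vsync_timestamp=f[13],
        brightness=f[14],
        volume=f[15],
    )
```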
Display modes
The Max added a bunch of new display modes:
Mode | SBS | Resolution | Refresh rate |
---|---|---|---|
0 | No | 1920x1080 | 60Hz |
1 | Yes | 3840x1080 | 60Hz |
2 | Yes* | 1920x1080 | 60Hz |
3 | No | 1920x1080 | 120Hz |
4 | Yes | 3840x1200 | 90Hz |
5 | Yes | 3840x1200 | 60Hz |
*: This is a "half SBS" mode, meaning that it splits the regular HD image in half, and then stretches each half horizontally over each of the glasses.
Modes above 6 are equivalent to mode 3.
The Prettier: XREAL Air
The XREAL Air is not an evolution of the Light; it is much more like the Rokid Max, but with a way better design. And I mean a lot better: the thing actually looks like regular (albeit a bit big) sunglasses. These are the first AR glasses that pass the "Tram #4 test": I could wear them on Tram #4 and people wouldn't really notice. Maybe the cable hanging down.
Unfortunately it doesn't have a camera, so no inside-out 6DOF anymore.
On the other hand, it has the absolute lowest display delay out of all 6 we described in these blog posts, so the image is rock stable even with dynamic head movements.
The protocol is weird. They kept the separate USB interfaces for the MCU and IMU + DSP pair. Both are different from the Light's. Unfortunately this post was written way after I finished work on the Air, so I'm writing it based on the code of ar-drivers-rs.
MCU protocol
Packets are sent over regular HID read() and write() primitives, over interface 4 (endpoints 0x86 and 0x07). Packet size is 0x40 both ways.
Index | Bytes | Description |
---|---|---|
0x00 | 1 | Header (0xfd) |
0x01 | 4 | Checksum (see below) |
0x05 | 2 | Length of additional data |
0x07 | 4 | Request ID (not checked by the MCU, only used to identify answers. Can be anything) |
0x0b | 4 | Timestamp (also not checked, can be 0) |
0x0f | 2 | Command ID |
0x11 | 5 | Zeros (probably) |
0x16 | n | Additional data |
Every int is Little Endian.
The checksum is CRC32(Adler) like the Light's. The checksum data is from byte 5 to the end of the packet (i.e. the length field + 17).
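Here is a hedged Python sketch of building such a packet. Two things are assumptions, not facts from the glasses: "CRC32(Adler)" is taken to mean the standard zlib CRC-32 (not the Adler-32 checksum), and since the text leaves open whether the length field counts only the additional data or also the 17 fixed bytes, that is a switch here. The function name and the default request ID are made up.

```python
import struct
import zlib

PACKET_SIZE = 0x40
FIXED_LEN = 17  # length(2) + request id(4) + timestamp(4) + command(2) + zeros(5)


def build_air_mcu_packet(command_id: int, data: bytes = b"",
                         request_id: int = 0x1337, timestamp: int = 0,
                         length_counts_fixed_part: bool = False) -> bytes:
    if len(data) > PACKET_SIZE - 0x16:
        raise ValueError("additional data does not fit in one packet")
    # Assumption: pick whichever length convention your device accepts.
    length = len(data) + (FIXED_LEN if length_counts_fixed_part else 0)
    # Everything from offset 0x05 on: little endian, 5 zero bytes via '5x'.
    body = struct.pack("<HIIH5x", length, request_id, timestamp, command_id) + data
    # Assumption: plain zlib CRC-32 over the 17 fixed bytes plus the data.
    checksum = zlib.crc32(body) & 0xFFFFFFFF
    packet = struct.pack("<BI", 0xFD, checksum) + body
    # Pad with zeros to the fixed 0x40-byte HID packet size.
    return packet.ljust(PACKET_SIZE, b"\x00")
```

If replies from the glasses fail to verify with this function's checksum, the two assumptions above are the first things to revisit.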
Again, there is no need to individually enable events or hardware, so we only need the bare minimum commands:
Command | ID | Data |
---|---|---|
Get MCU FW version | 0x0026 | Version as UTF-8 string |
Get serial number | 0x0015 | The serial number as UTF-8 string |
Get display mode | 0x0007 | Display mode as a single byte |
Set display mode | 0x0008 | Same as above |
There is a .js file in the official app that describes a lot more commands for both the Air and the Light. There aren't many interesting things, just a couple version strings, firmware update, reboot, and fiddling with the display.
Some asynchronous events also arrive on the same channel (sometimes between command and its reply). They use the same packet format as the commands and replies. The only one worth looking for is ID 0x6c05, which is the key press (more precisely key release) event.
Display modes
They also added a lot more display modes:
Mode | SBS | Resolution | Refresh rate |
---|---|---|---|
1 | No | 1920x1080 | 60Hz |
3 | Yes | 3840x1080 | 60Hz |
4 | Yes | 3840x1080 | 72Hz |
5 | No | 1920x1080 | 72Hz |
8 | Yes* | 1920x1080 | 60Hz |
9 | Yes | 3840x1080 | 90Hz |
10 | No | 1920x1080 | 90Hz |
11 | No | 1920x1080 | 120Hz |
*: This is a "half SBS" mode, meaning that it splits the regular HD image in half, and then stretches each half horizontally over each of the glasses. This is the replacement for Mode 1, which was vertically stretched half-SBS on the Light.
Invalid display modes cause an error, and I checked all 256 values.
The IMU protocol
IMU packets are also sent/received with regular HID read() and write(), over interface 3 (endpoints 0x84 and 0x05), with 0x40-sized packets.
Index | Bytes | Description |
---|---|---|
0x00 | 1 | Header (0xaa) |
0x01 | 4 | Checksum (same as MCU checksum) |
0x05 | 2 | Length of additional data |
0x07 | 1 | Command ID |
0x08 | n | Additional data |
Every int is Little Endian.
Interestingly, while the packet format is very different, the commands are exactly the same as the Light's:
Command | Id | Command data |
---|---|---|
Get calibration file length | 0x14 | Calibration file id according to the SDK, doesn't seem to affect anything. Can be empty |
Get calibration file part | 0x15 | Should be block number. Doesn't do anything, can be empty. |
Enable IMU stream | 0x19 | 0: disable, 1: Enable |
The calibration file format is similar, although this time they didn't stuff 3 different files in there: you only have the JSON.
The IMU packet format is different, more compact, but the logic is the same:
Index | Bytes | Description |
---|---|---|
0x00 | 2 | Header (0x01, 0x02) |
0x02 | 2 | Temperature (raw data from the ICM-20602) |
0x04 | 8 | Timestamp (nanoseconds) |
0x0c | 2 | Gyroscope multiplier |
0x0e | 4 | Gyroscope divisor |
0x12 | 3 | Gyroscope X reading |
0x15 | 3 | Gyroscope Y reading |
0x18 | 3 | Gyroscope Z reading |
0x1b | 2 | Accelerometer multiplier |
0x1d | 4 | Accelerometer divisor |
0x21 | 3 | Accelerometer X reading |
0x24 | 3 | Accelerometer Y reading |
0x27 | 3 | Accelerometer Z reading |
0x2a | 2 | Magnetometer offset |
0x2c | 4 | Magnetometer divisor |
0x30 | 2 | Magnetometer X reading |
0x32 | 2 | Magnetometer Y reading |
0x34 | 2 | Magnetometer Z reading |
Yes, there are 3-byte signed integers in there. They are encoded the same way as "regular" 4-byte integers (little endian, two's complement), but on 3 bytes. Thankfully the Rust parsing library I use has built-in support for these, because converting manually is a pain.
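For comparison, here is a small Python sketch of decoding this packet, where int.from_bytes() handles the 3-byte case as little-endian two's complement. The names are made up, and several details are assumptions: the scaled value is taken to be reading * multiplier / divisor, the multipliers and divisors are read as unsigned, and the magnetometer fields are returned raw because the text does not say how offset and divisor are applied.

```python
import struct


def i24(buf: bytes, offset: int) -> int:
    # 3-byte little-endian signed integer.
    return int.from_bytes(buf[offset:offset + 3], "little", signed=True)


def parse_air_imu(packet: bytes):
    """Decode one IMU stream packet into a dict, or None if it is not one."""
    if len(packet) < 0x36 or packet[0:2] != b"\x01\x02":
        return None
    temperature_raw, timestamp_ns = struct.unpack_from("<hQ", packet, 0x02)
    gyro_mul, gyro_div = struct.unpack_from("<HI", packet, 0x0C)
    accel_mul, accel_div = struct.unpack_from("<HI", packet, 0x1B)
    mag_offset, mag_div = struct.unpack_from("<HI", packet, 0x2A)
    if gyro_div == 0 or accel_div == 0:
        raise ValueError("zero divisor in IMU packet")
    return {
        "temperature_raw": temperature_raw,
        "timestamp_ns": timestamp_ns,
        # Assumed scaling: reading * multiplier / divisor.
        "gyro": tuple(i24(packet, 0x12 + 3 * i) * gyro_mul / gyro_div for i in range(3)),
        "accel": tuple(i24(packet, 0x21 + 3 * i) * accel_mul / accel_div for i in range(3)),
        # Left unscaled on purpose (formula not given in the text).
        "mag_raw": struct.unpack_from("<3h", packet, 0x30),
        "mag_offset": mag_offset,
        "mag_divisor": mag_div,
    }
```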
One thing to note is that the coordinate system of the raw sensor readings is different from the calibration file's coordinate system.
Extra: Monado driver for the Rokid Max
I always wanted to support the Rokid Max, but I didn't really want to buy one just to do it. Thankfully, a kind soul from Canada got in contact with me on GitHub and paid for both the glasses and my time to do it. Thanks again, Mauve.
The only extra was that I had to also make a Monado driver. Monado is a nice piece of software that implements the OpenXR API, so any OpenXR-using apps (major 3D engines, some AR desktops for example) can use any Monado-supported hardware. They have a very friendly Discord, and the code is of very good quality. It was a joy to work with, and my code got reviewed basically instantly. Once the comments were fixed, it was in trunk the next day.
Support for the Rokid Max has been merged to main. Some people are working on supporting the Nreal Air, and (as of writing) it works well, but there are some kinks to be ironed out.
Maybe you can help :)
New site design
If you need Augmented Reality problem solving, or want help implementing an AR or VR idea, drop us a mail at info@voidcomputing.hu