Writing an open source GPU driver – without the hardware

After six months of reverse-engineering, the new Arm “Valhall” GPUs (Mali-G57, Mali-G78) are getting free and open source Panfrost drivers. With a new compiler, driver patches, and some kernel hacking, these new GPUs are almost ready for upstream.

In 2021, there were no Valhall devices running mainline Linux. While a lack of devices poses an obvious obstacle to device driver development, there is no better time to write drivers than before hardware reaches end-users. Developing and distributing production-quality drivers takes time, and we don’t want users to be reliant on closed source blobs. If development doesn’t start until a device hits shelves, that device could reach “end-of-life” by the time there are mature open drivers. But with a head start, we can have drivers ready by the time devices reach end users.

Let’s see how.

Reverse-engineering without root

Over the summer, Collabora purchased an Android phone with a Mali-G78. The phone isn’t rooted, so we can’t replace its graphics drivers with our own. However, we can put it in developer mode, run test applications against the proprietary graphics driver, and inject our own code with LD_PRELOAD. That lets us inspect the graphics memory prepared by the proprietary driver and “passively” reverse-engineer the hardware. This memory includes compiled shader binaries in the Valhall instruction set, as well as Valhall’s data structures controlling graphics state like textures, blending, and culling.
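
To give a feel for the approach (this is a generic sketch, not Panfrost’s actual tooling), an LD_PRELOAD library can interpose libc’s ioctl so that every command the proprietary userspace driver sends to its kernel driver can be logged, and the GPU memory it references can be dumped. Only the libc/dlfcn calls below are real; the logging is illustrative.

/* shim.c – hypothetical LD_PRELOAD interposer. Build with:
 *   gcc -shared -fPIC -o shim.so shim.c -ldl
 * and load it into a test application with LD_PRELOAD=./shim.so
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/ioctl.h>

int ioctl(int fd, unsigned long request, ...)
{
    /* Look up the real ioctl the first time we are called. */
    static int (*real_ioctl)(int, unsigned long, ...);
    if (!real_ioctl)
        real_ioctl = (int (*)(int, unsigned long, ...)) dlsym(RTLD_NEXT, "ioctl");

    va_list ap;
    va_start(ap, request);
    void *arg = va_arg(ap, void *);
    va_end(ap);

    /* Log the request; a real tool would decode it and snoop the GPU
     * memory that arg points at before and after the call. */
    fprintf(stderr, "ioctl(%d, 0x%lx, %p)\n", fd, request, arg);

    return real_ioctl(fd, request, arg);
}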

Reverse-engineering “actively” is possible, too. We can modify compiled shaders and GPU data structures, allowing us to experiment with individual bits. We can go further, constructing our own shaders and data structures and validating them against the hardware.

To motivate this technique, consider the reverse-engineering of Valhall’s “buffer descriptor”. This new data structure describes a buffer of memory, accessed by a new “load buffer” instruction (LD_BUFFER). After guessing the layout of the buffer descriptor and encoding of LD_BUFFER, we can build our own buffer descriptor and write a shader using LD_BUFFER to validate our guess and probe the low-level semantics.
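
For illustration only – none of the field positions below are the real Valhall encoding – the “guess” step might start with a candidate layout like this, which we then point at a buffer full of known values and read back with a hand-written LD_BUFFER shader:

#include <stdint.h>
#include <string.h>

/* Entirely hypothetical guess at a buffer descriptor layout, shown only to
 * illustrate the method: fill a buffer with a known pattern, build a
 * descriptor pointing at it, run a shader that loads through it with
 * LD_BUFFER, and check whether the values that come back match. */
struct guessed_buffer_descriptor {
    uint64_t address;   /* guess: GPU address of the buffer */
    uint32_t size;      /* guess: size in bytes */
    uint32_t flags;     /* guess: unknown bits, try zero first */
};

static void
build_guessed_descriptor(void *out, uint64_t gpu_address, uint32_t size)
{
    struct guessed_buffer_descriptor desc = {
        .address = gpu_address,
        .size = size,
        .flags = 0,
    };

    memcpy(out, &desc, sizeof(desc));
}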

When reverse-engineering Valhall’s new data structures, we have legacy to guide us. While Valhall reorganizes its data structures to reduce Vulkan driver overhead, the bit-level contents resemble older Mali GPUs. If we find the “contours” of new data structures, we can fill in the details by comparing with older hardware.

As we learn about the data structures, we document our findings in a formal XML hardware description. This file has the same format as the XML for older Mali architectures already supported by Panfrost. Since the Valhall data structures descend from these older architectures, we can fork an older Mali’s XML to save us some typing and keep naming consistent.

After enough reverse-engineering, we can slot our XML into Panfrost, automatically generating code to pack and unpack the data structures. Thanks to tireless work by Collaboran Boris Brezillon, Panfrost’s performance-critical code is specialized at compile-time to the target architecture, allowing us to add new architectures without adding overhead to existing hardware. So with our XML in hand, we can get started writing a Valhall driver.
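
As a rough illustration of what falls out of the XML (the descriptor name, fields, and layout here are invented; the real layouts live in Panfrost’s genxml files), each struct in the XML becomes a C struct plus a pack routine that writes the bit-exact hardware encoding:

#include <stdint.h>

/* Hypothetical output of the XML code generator – it only shows the shape
 * of the generated code, not a real Valhall descriptor. */
struct MALI_EXAMPLE_BUFFER {
    uint64_t address;   /* GPU virtual address of the buffer */
    uint32_t size;      /* size in bytes */
};

static inline void
MALI_EXAMPLE_BUFFER_pack(uint32_t *restrict cl,
                         const struct MALI_EXAMPLE_BUFFER *restrict values)
{
    cl[0] = (uint32_t) values->address;
    cl[1] = (uint32_t) (values->address >> 32);
    cl[2] = values->size;
    cl[3] = 0;   /* reserved, must be zero in this made-up layout */
}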

Writing drivers without hardware

It is November 2021. I’ve written a Valhall compiler. I’ve reverse-engineered enough to write a driver. I still have no Linux hardware to test my code.

That’s a major road block.

Good thing I know a detour.

We can develop the driver on any Linux machine, without testing against real hardware. To pull that off, unit testing is mandatory. With no hardware, we can’t run integration tests, but unit tests can run on any hardware. For the Valhall compiler, I wrote unit tests for everything from instruction packing to optimization. Although the coverage isn’t exhaustive, it caught numerous bugs early on.
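
A minimal sketch of what such a test looks like (the opcode, fields, and expected encoding are invented; the real tests check the compiler’s output against encodings observed in the proprietary driver’s shader binaries):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical encoder: opcode in the top byte, two 8-bit registers below. */
static uint32_t
pack_move(uint8_t dest, uint8_t src)
{
    return (0x91u << 24) | ((uint32_t) dest << 8) | src;
}

/* Unit test: packing one instruction must produce the expected bit pattern. */
static void
test_pack_move(void)
{
    assert(pack_move(2, 5) == 0x91000205);
}

int
main(void)
{
    test_pack_move();
    printf("All packing tests passed\n");
    return 0;
}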

There is a caveat: unit testing can’t tell us if our expectations of the hardware are correct. However, it can confirm that our code matches our expectations. If our reverse-engineering is thorough, these expectations should be correct.

Even so, unit testing alone isn’t enough.

Enter drm-shim.

drm-shim

Mesa drivers like Panfrost can be run against mock hardware with drm-shim, a small library which stubs out the system calls userspace graphics drivers use to communicate with the kernel. With drm-shim, an unmodified userspace driver thinks it’s running against real hardware – including Valhall hardware.

Graphics guru Emma Anholt designed drm-shim to run Mesa’s compilers as cross-compilers for use in continuous integration (CI). Outside of CI, drm-shim allows testing compilers on our development machines, which may be significantly faster than the embedded devices we target. But it’s not limited to compilers; we can run entire test suites under drm-shim, “cross-testing” for any hardware we please. The tests won’t pass, since drm-shim does no rendering; it is a shim, not an emulator. But it allows us to exercise new driver code paths without the constraints of real hardware.

As drm-shim runs on any Linux machine, I wanted to use the fastest Linux machine I own: my Apple M1. Bizarrely, drm-shim didn’t work on my M1 Linux box, although it works on everyone else’s computers. That calls for a debugging session.

After some poking around, I stumbled on the offending code:

bo->addr = util_vma_heap_alloc(&heap, size, 4096);
mmap(NULL, ..., bo->addr);

This code allocates a chunk of address space aligned to 4096 bytes and uses the resulting address as the offset in a call to mmap. On my system, the mmap call fails, so I consulted the man page for mmap:

offset must be a multiple of the page size as returned by sysconf(_SC_PAGE_SIZE).

The mmap in drm-shim works because the page size on Linux is 4096 bytes (4K)…

Until it isn’t.

Apple’s input/output memory management unit uses larger, 16384-byte (16K) pages. As a consequence, when we run Linux bare metal on Apple platforms, we configure Linux to use 16K pages everywhere to keep life simple. That means that on Apple platforms running Linux, sysconf(_SC_PAGE_SIZE) returns 16384, so the mmap fails. The fix is easy:

bo->addr = util_vma_heap_alloc(&heap, size, sysconf(_SC_PAGE_SIZE));
mmap(NULL, ..., bo->addr);

With that, drm-shim works on systems with page sizes larger than 4K, including my M1. That means I can compile thousands of shaders per second with the Valhall compiler, far more than any system with a Mali GPU could. I can also run Khronos’s OpenGL ES Conformance Test Suite:

PAN_MESA_DEBUG=valhall,trace LIBGL_DRIVERS_PATH=~/lib/dri/ \
LD_PRELOAD=~/mesa/build/src/panfrost/drm-shim/libpanfrost_noop_drm_shim.so \
PAN_GPU_ID=9091 EGL_PLATFORM=surfaceless \
./deqp-gles31 --deqp-surface-type=pbuffer --deqp-gl-config-name=rgba8888d24s8ms0 \
    --deqp-surface-width=256 --deqp-surface-height=256

Long commands like this one run tests and produce pretty-printed dumps of GPU memory, ready for manual inspection. If the dumps look like the dumps from the proprietary driver, there’s a good chance the tests will pass on real hardware, too.

Code sharing

Since Valhall is similar to its predecessors, the years we’ve spent nurturing Panfrost mean we only need to modify the driver in areas where Valhall introduces breaking changes.

For example, Valhall’s instruction set resembles the older “Bifrost” instruction set, so we may embed the Valhall compiler as an additional backend in the existing Bifrost compiler. Shared compiler passes like instruction selection and register allocation “just work” on Valhall, even though they were developed and debugged for Bifrost.

Once we adapt Panfrost for Valhall, we’ll have a conformant, performant driver ready out-of-the-box.

…In theory.

Real hardware, real pain

I couldn’t test on real Valhall hardware until early January, when I procured a Chromebook with a MediaTek MT8192 system-on-chip and a matching serial cable. MT8192 sports a Valhall “Mali-G57” GPU, compatible with the Mali-G78 I’m reverse-engineering. Mainline kernel support for MT8192 is sparse, but Linux does boot. With patches by other Collaborans, USB works too. That’s enough to get to work on the GPU. Sure, the display doesn’t work, but who needs that?!

We’ll start by teaching Linux how to find the GPU. On desktops, ACPI and UEFI let the operating system discover any connected hardware. While these standards exist for Arm, in practice Arm systems require a device tree describing the hardware: what parts there are, which registers and clocks they use, and how they’re connected. We don’t know much about MT8192, but ChromeOS supports it, so ChromeOS has a complete device tree. Adapting that device tree for mainline, we soon see signs of life:

[  1.942843] panfrost 13000000.gpu: unknown id 0x9093 major 0x0 minor 0x0 status 0x0

The kernel cannot identify the connected Mali GPU, but that’s expected – after all, it has never seen a Mali-G57 before. We need to add a mapping from Mali-G57’s hardware ID to its name, feature list, and hardware bug list. Then the driver loads.

[  1.942843] panfrost 13000000.gpu: mali-g57 id 0x9093 major 0x0 minor 0x0 status 0x0
[  1.982322] [drm] Initialized panfrost 1.2.0 20180908 for 13000000.gpu on minor 0
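
Conceptually, the kernel-side change is one more entry in a lookup table keyed on the hardware ID from the log; the structure below is a simplified sketch rather than the actual Panfrost kernel source.

#include <stdint.h>

/* Simplified sketch of mapping a GPU ID to its name, feature list, and
 * hardware bug list; the field types and names are illustrative only. */
struct gpu_model {
    const char *name;
    uint32_t id;
    uint64_t features;    /* bitmask of supported hardware features */
    uint64_t issues;      /* bitmask of known hardware bugs to work around */
};

static const struct gpu_model gpu_models[] = {
    /* ... existing Midgard and Bifrost entries ... */
    { .name = "mali-g57", .id = 0x9093, .features = 0, .issues = 0 },
};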

Based on the downstream kernel module released by Arm, we know the parts of Valhall relevant to the kernel are backwards-compatible with Mali GPUs from a decade ago. Panfrost supports existing Mali hardware, so in theory, we can test drive the Mali-G57 right now.

When it comes to hardware, theory and practice never agree.

Let’s try submitting a “null job” to the hardware, a simple job that does nothing whatsoever:

struct mali_job_descriptor_header job = {
    .job_type = MALI_JOB_TYPE_NULL,
    .job_index = 1
};

Only two bits are set in the entire data structure. We can even hard-code this job into the kernel and submit it as soon as the hardware powers on. Since the job is valid, the hardware should run it without complaint.

[   2.094748] panfrost 13000000.gpu: js fault, js=1, status=DATA_INVALID_FAULT, head=0x6087000, tail=0x6087000

What? The hardware claims the job is invalid, even though the job is clearly valid. Apparently, the hardware is reading something different from memory than we wrote.

That symptom is eerily familiar. When Collaboran Tomeu Vizoso and I added support for Mali-G52 two years ago, we observed the same symptoms on an Amlogic system-on-chip. The culprit was an Amlogic-specific cache coherency issue. That fix doesn’t apply here, so it’s time to hunt for MediaTek-specific bugs.

Crawling through ChromeOS code, I found that MediaTek submitted an unexplained change to the GPU driver, setting a single bit belonging to a clock on MT8192 in order to “disable ACP”, fixing bus faults. This change is the embodiment of a “fix everything” magic bit, the kind only rumoured to exist and the stuff of reverse-engineers’ nightmares.

…But setting that bit in our kernel makes our null job complete successfully.

…Wait, what?

It turns out ACP is the “Accelerator Coherency Port”, responsible for managing cache coherency between the CPU and the GPU. Apparently, ACP was never supposed to be enabled on MT8192, but a hardware bug left it enabled. The kernel must set this bit to disable ACP as a workaround.

Again, what?

Pressing on, we can submit the same null job from userspace. To the hardware, kernelspace and userspace are the same, so this must work.

It does not.

The job times out before completing. Inspecting the kernel log, we notice an earlier timeout, waiting for the GPU to wake up after being reset.

After littering the kernel with printks, we eventually find that the GPU is powered off once Linux boots, and nothing we do will power it back on. No wonder everything times out.

For some problems, we can only hope for a leprechaun to whisper the solution in our ear. Our leprechaun comes in the form of kernel wizard Heiko Stuebner. Heiko suggested that Linux might be powering off the GPU. To save power, Linux turns off unused clocks and power domains. If Linux doesn’t know a clock or power domain is used by the GPU, it’ll turn off the GPU inadvertently.

For debugging, we can disable this mechanism by setting the clk_ignore_unused and pd_ignore_unused kernel arguments. Doing so makes our userspace tests work.

Sometimes the simplest solutions are in front of us.

What is the root cause? MediaTek has a complicated hierarchy of clocks and power domains, and we missed some in our device tree. We’ll need to update our code to teach Linux about the extra clocks and power domains to fix the issue properly.

Nevertheless, we can now test our driver on real hardware. It’s a rough start: the first job we submit returns a Data Invalid Fault. Experimenting, we find that Valhall requires its data structures to be aligned more strictly than Bifrost did. Increasing the alignment at which we allocate fixes the faults, and decreasing it again lets us determine the minimum required alignment. This information is easy to obtain once we can run code on the hardware, but inaccessible when studying the hardware in vitro. Reverse-engineering and driver development are better together.
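
A sketch of that experiment (the 64-byte figure is a stand-in, not the measured Valhall requirement): bump the alignment used when sub-allocating descriptors out of a pool, retest, then walk it back down until the faults return.

#include <stdint.h>

/* Trial alignment for descriptor sub-allocations; adjust and retest to
 * find the minimum the hardware accepts. 64 is a placeholder value. */
#define TRIAL_DESC_ALIGN 64

static uint64_t
alloc_descriptor(uint64_t *pool_offset, uint64_t size)
{
    /* Round the current pool offset up to the trial alignment. */
    uint64_t addr = (*pool_offset + (TRIAL_DESC_ALIGN - 1)) &
                    ~(uint64_t)(TRIAL_DESC_ALIGN - 1);

    *pool_offset = addr + size;
    return addr;
}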

Success at last

With these fixes, we finally see our first passing test, running on real hardware, with data structures prepared by our open source Mesa driver and shaders compiled by our Valhall compiler. Woo!

It only took a few days after getting the hardware and a serial cable to pass hundreds of tests on the new architecture. Months of speculative driver development paid off in a big way.

Sounds like we’ll have Valhall drivers in time for end-users after all.
