So I'm a big fan of opensrc graphics.. and one thing that has frustrated me for a long time is lack of open graphics on ARM platforms. I'm a big fan of open source in general, and that is why I love TI (and Linaro). TI has been very focused on publishing public TRMs getting support for the OMAP platform in the upstream kernel tree. I can build Linus's kernel tree and get something pretty well functional on my pandaboard. The display and omapdrm support in the upstream kernel is progressing pretty well. Which is great. The rpmsg framework is merged in the mainline for 3.4, which is the first step in getting multimedia (video decode/encode) support in the upstream kernel.
But one area where our hands are tied is graphics acceleration. I'd love nothing more than to be working on an opensrc and upstream driver for the SGX GPU used on OMAP platforms. But due to what I know and have access to about the inner workings of the IMGtech GPU's, that would not be possible without IMG's approval. I hope someday they warm up to the open source community, but for now I am forced to look elsewhere to contribute.
But wait.. what about the GPL pvr kernel driver? Well, the fact is that userspace and kernel are not independent. I love not only the linux kernel but the whole gnu/linux system of which a userspace that is developed in a collaborative open fashion is an integral part. And this is especially true in the realm of graphics drivers.. no where else are there such complex interactions between userspace and kernel. I am not strictly against having a closed userspace GL stack provided there is an open userspace alternative that is at least able to exercise the same kernel APIs. If there is an open userspace, that gives anyone who wants to, the freedom to start hacking and contributing and making things better. That is the great thing about the open source! With only a closed userspace, there is no freedom to fix the kernel parts. And the interaction between userspace and kernel parts of a graphics are too complex to be able to accept and properly review a kernel driver for acceptance into the upstream kernel tree without some open userspace that can exercise those APIs provided by the kernel part of the driver. Simply slapping some GPL headers on a kernel module that is ridden with OS abstraction layers and NIH re-invention of infrastructure provided by the upstream kernel isn't going to cut it here. And without an open userspace, there is no room for the open source community to refactor and fix anything.
But I'm not one to sit around and complain about a problem indefinitely without eventually trying to do something about it. One thing that gave me a glimmer of hope is the lima project. The first real (non vaporware) opensrc graphics effort on ARM. With that as a piece of needed inspiration, what could I do to help the cause? Well, with ARM as a member company of Linaro, and coming into contact with ARM folks working on mali, as well as engineers from other Linaro member companies who use mali, it seemed like direct contribution to the lima project might be a bit of a gray area. I don't think I really know any internal s3cr3ts of how mali works (and certainly not more than the lima folks have already figured out). But I don't want to get Linaro in trouble with it's member companies and it seemed like a potential conflict of interest. So what could I do? Pick another ARM platform that I know nothing about, and go to town!
This really leaves two big players. Of the two, I had a friend who could loan me a dragonboard to hack on, so that pretty much clinched the deal. (Although I have hopes that someday someone will figure out how to get something based on the nouveau driver running on tegra.)
Methodology
The approach I took is quite similar to, and strongly inspired by, the approach that Luc Verhaegen took with the lima driver project. It basically amounts to using a LD_PRELOAD shim to intercept system calls, digging through the kernel code to understand the existing userspace<->kernel API, and figuring out how to observe and log the interesting bits.
I've started with 2d acceleration support, mainly because that seemed like a good "warm-up" exercise, and also because there is currently no publicly available acceleration for x11 for the snapdragon platform (binary blob or otherwise). Most of the time so far has gone into figuring out the kernel APIs, and writing some utility code to log and post-process the results of running some simple test apps using the closed src binaries available for android, obtained from a cyanogenmod filesystem (because qualcomm does not provide any userspace support for gnu-linux (non-android) userspace, at least not to the general public). I used some linker tricks to link the test code against the android binary blob libs, and android libc, etc, within a ubuntu 11.10 filesystem. (Fwiw, I use 11.10 because that was prior to the switch over to armhf and based on the 3.0 kernel, which was what I had available from codeaurora git trees.) The good news is, from what I've been able to figure out from the GPL kernel driver, a lot of the infrastructure like pixel and cmdstream buffer allocation, and cmdstream submission, appear to be similar for 2d and 3d, so I think a lot of the work done so far for 2d accel will be useful when it comes to working on the 3d part.
The libwrap code I wrote logs information about the blits (cmdstream, and various parameters like gpu addresses, surface dimensions, blit coords) to a simple .rd log file (which amounts to a sequence of type/length/value fields). These .rd files get processed with a utility I wrote called "redump", to generate a reports showing side-by-side comparisons of the cmdstream, with similarities and parts of dwords that appeared to match surface and blit parameters highlighted. It isn't a perfect disassembly of the command stream, but it certainly helps to spot patterns.
Once I had a reasonable collection of tests for the types of blit operations which are important for an x11 EXA driver, I started varying parameters to figure out the limits, ie. what is the largest blit x, y, width, height, max surface width, height, stride, etc, to establish how many bits are used to encode different fields in the command stream. In some cases, I noticed there were multiple encoding options so parameters could be packed if fewer dwords if less bits where needed to encode the parameters. (For the current EXA driver I'm pretty much using the worst case encoding options so far, to keep things simple.)
With these tests, and the corresponding redump reports, I started work on implementing the EXA accel fxns for the xf86-video-freedreno driver. The work on the EXA driver really only started about 1.5 weekends ago (and most of the time at the beginning was just getting a skeletal driver setup, which is based on a stripped down and simplified xf86-video-msm).
Current Status
So far, I've got the basic solid/copy/composite operations implemented. There are some limitations still in the composite code, such as operations with masks are rejected. (There is an awkward limitation in libC2D2 that there is no way to specific independently mask and src coordinates.. I'm not sure yet if this is a limitation of the hw, but we will be a bit on our own to figure out this via experimentation with the cmdstream. One option to deal with this is ptr arithmetic on the mask surface gpu addr.) And there are still some lesser used color formats that I haven't tackled.
The next big thing, however, will be to deal properly with submission of multiple blits at a time, and not having to block until submitted blits are completely. Without this, performance is (as you would expect) quite bad. But that is easy enough to fix later. There is some awkwardness with the current kernel interface (see NOTES in freedreno tree about how context switch restore works). But that can be fixed by enhancing the kernel part to take separate ptrs in a single ioctl. And of course deciphering the context restore packet would be needed to properly support context switching if you have multiple processes using 2d (but this isn't too important for having a single xserver running so I think we can come back to it later).
A quick note on the kernel: the existing driver from qualcomm is what I'd call a semi-DRM driver. It is using GEM buffers, so it gives us what we'd need eventually for DRI2 and 3d. But not mode setting (which is handled via fbdev driver, also opened by xserver), and not a batchbuffer sort of interface for cmd submission.. cmd submission is handled via separate kgsl-2d/3d devices which are not aware of GEM buffer handles, so mapping buffers to the GPU cannot be handled as part of the cmd submission. So far I'm leaving the kernel driver mostly as-is (sans maybe some minor backwards compatible enhancements), because it is essential to be able to run test code based on the existing binary blob libraries back to back with work-in-progress xorg/mesa drivers. One approach to cleaning up the kernel part might be to provide an emulation layer to emulate the old interfaces, although for now there are enough other things to do that I haven't given this much thought yet. Of course, volunteers are always welcome ;-)
And an IRC channel on freenode at #freedreno
So far there are no mailing lists (I'm not really sure where they could be hosted) or web page other than the wiki pages at gitorious.
Disclaimer
This is a project that I've been working on in my own free time, not using the resources or time of my employeer or Linaro. It is something I've been working on of my own accord because quite simply I want to see advancement of the state of open source graphics in linux. I hope that Linaro will be supportive of this effort, and of open source graphics on all ARM platforms. And I know that a lot of the individual people that make up Linaro are quite passionate about open source. But I realize that dealing with the business concerns of all various member and potential member companies is a difficult balancing act. And as always, the opinions expressed here in my blog are those of my own and not necessarily those of my employer or of Linaro.