Blogging the Monkey: 2012

Monday, December 10, 2012

es2gears!

Another quick update, since it has been a while. I've been working for the last month or so on a gallium driver for freedreno, and now it is finally showing signs of life with es2gears:

There are still a lot of little missing pieces (textures, not translating a lot of tgsi shader opcodes), and it won't work for anything that isn't glClear()ing the color/depth/stencil buffers each frame. These shouldn't be too hard to add; it just takes time.

The work-in-progress mesa code is on github:
git://github.com/freedreno/mesa.git

Tuesday, October 23, 2012

freedreno update

Just a quick update, since it has been a while and I haven't had time to update the old blog:

I managed to get stencil buffer working just before XDC, a bit over a month ago.. no exciting looking demo, but the stencil test (which is based on the stencil example in opengles-book-samples) is now working
Since then I have been starting to get things into shape in order to start on a gallium driver. I've written a libdrm_freedreno module to include the common code for interfacing with the kernel driver, needed by xf86-video-freedreno, fdre, and the eventual gallium driver. Eventually we will need a proper DRM kernel driver, but libdrm_freedreno abstracts this well enough that I can concentrate on the 2d/3d userspace parts. The eventual plan is that when there is a proper DRM kernel driver (supporting a pushbuffer type interface, and synchronization between 2d and 3d pipes), libdrm_freedreno will be the point where we support talking to either the new driver or the existing QCOM kernel driver.
I've fixed the problems that were preventing batching in the 2d driver, so now the performance is starting to be decent for xf86-video-freedreno. There is still some room for improvement: we fall back to sw for some sorts of blits that the hardware should be able to support but libC2D2 does not, primarily blits with mask surfaces. But I think I understand enough of the 2d registers to implement the missing blits. There is also probably some room to optimize the cmdstream a bit for batches of blits.
I've implemented DRI2 support in xf86-video-freedreno and fdre, so now we have dri2 lolscat:

So the next step, which I should hopefully be starting on in a week or two is the gallium driver! Of course this is still going to be a lot of work, but things are coming along nicely so far :-)

Wednesday, August 15, 2012

Open Source lolscat!

Last weekend I fixed a few issues w/ the assembler to make more complex fragment shaders work, and added support for VBOs. And now I've pushed support to enable GL_BLEND. As a result, I give you open source lolscat:

The model was borrowed from glmark2, and the fragment shader uses Phong lighting. Two textures are alpha blended in front to give the text.

Sunday, August 5, 2012

textured cube (fullscreen!)

managed to bang out some basic texture support.. and also multi-tile rendering so I'm not limited to render targets that can fit in GMEM (512KiB on the a220 that I have in my touchpad), and of course this is using the freedreno shader assembler (no need for the binary blob):

maybe some day I'll figure out how to make better quality videos :-P

Thursday, August 2, 2012

freedreno moves to github

just a quick update: I've moved freedreno to github: http://freedreno.github.com/

I've started to add some wiki pages to document what I've learned so far, and the gitorious wiki was just too useless to deal with.

Sunday, July 29, 2012

freedreno update: first renders shader assembler!

For the last month or so, I've been working on deciphering the adreno shader instruction set and creating a disassembler, and now an assembler. I've hooked it up in fdre and now I can run all the test apps with shaders assembled from the fdasm assembler syntax, rather than requiring pre-compiled shaders from the adreno binary blob! This means now I can run 3d tests with no dependency on the binary blob!

No new pictures, everything looks the same as before, including the spinning cube. I've updated the cube test with some comments in the shader assembly to make it a bit easier to follow.

Monday, June 25, 2012

First renders

Just a quick update about the freedreno project.. after a month or so of not having much time to work on it due to travel and other projects, last weekend I finally had some time to spend on 3d and fdre, a simple library to drive the gpu (similar to limare) and some test apps, and now we have first renders for a few simple test apps.

quad-flat:

triangle-quad:

and triangle-smoothed:

The shaders are currently just binary, extracted via cffdump from egl/gles test apps written to use the same shaders. And for now I'm sticking to smaller render target sizes which can be handled in a single pass without resorting to binning (basically splitting the render up into parts that can each fit within the built in GMEM).
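To make the binning idea a bit more concrete, here is a rough sketch of splitting a render target into tiles that each fit in GMEM. This is not what fdre does; `tile_grid` and its parameters are hypothetical names, and it assumes a single buffer with a fixed number of bytes per pixel and no hardware alignment constraints (the real hw almost certainly has some).

```python
def tile_grid(width, height, bytes_per_pixel, gmem_bytes):
    """Return (tiles_x, tiles_y, tile_w, tile_h) so that one tile fits in GMEM.

    Hypothetical helper: assumes one buffer, no alignment rules.
    """
    if bytes_per_pixel > gmem_bytes:
        raise ValueError("a single pixel does not fit in GMEM")
    tiles_x = tiles_y = 1
    while True:
        # Ceiling division: the tile size needed to cover the whole target.
        tile_w = -(-width // tiles_x)
        tile_h = -(-height // tiles_y)
        if tile_w * tile_h * bytes_per_pixel <= gmem_bytes:
            return tiles_x, tiles_y, tile_w, tile_h
        # Still too big: split along the longer tile dimension.
        if tile_w >= tile_h:
            tiles_x += 1
        else:
            tiles_y += 1


if __name__ == "__main__":
    # 512 KiB is only an example GMEM size.
    print(tile_grid(1024, 768, 4, 512 * 1024))
```

A small target comes back as a single tile, which corresponds to the single pass case above; anything bigger gets rendered one tile at a time.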

It is of course a long way from a full blown mesa driver, but I'm making some good progress on figuring out the command stream. Nowhere near the progress of the lima guys, who already have textures working and are pretty far on the vertex and shader compilers.

Saturday, April 14, 2012

Fighting back against binary blobs!

So I'm a big fan of opensrc graphics.. and one thing that has frustrated me for a long time is lack of open graphics on ARM platforms. I'm a big fan of open source in general, and that is why I love TI (and Linaro). TI has been very focused on publishing public TRMs and getting support for the OMAP platform in the upstream kernel tree. I can build Linus's kernel tree and get something pretty well functional on my pandaboard. The display and omapdrm support in the upstream kernel is progressing pretty well. Which is great. The rpmsg framework is merged in the mainline for 3.4, which is the first step in getting multimedia (video decode/encode) support in the upstream kernel.

But one area where our hands are tied is graphics acceleration. I'd love nothing more than to be working on an opensrc and upstream driver for the SGX GPU used on OMAP platforms. But due to what I know and have access to about the inner workings of the IMGtech GPU's, that would not be possible without IMG's approval. I hope someday they warm up to the open source community, but for now I am forced to look elsewhere to contribute.

But wait.. what about the GPL pvr kernel driver? Well, the fact is that userspace and kernel are not independent. I love not only the linux kernel but the whole gnu/linux system of which a userspace that is developed in a collaborative open fashion is an integral part. And this is especially true in the realm of graphics drivers.. nowhere else are there such complex interactions between userspace and kernel. I am not strictly against having a closed userspace GL stack provided there is an open userspace alternative that is at least able to exercise the same kernel APIs. If there is an open userspace, that gives anyone who wants it the freedom to start hacking and contributing and making things better. That is the great thing about open source! With only a closed userspace, there is no freedom to fix the kernel parts. And the interactions between the userspace and kernel parts of a graphics driver are too complex to be able to properly review a kernel driver for acceptance into the upstream kernel tree without some open userspace that can exercise the APIs provided by the kernel part of the driver. Simply slapping some GPL headers on a kernel module that is ridden with OS abstraction layers and NIH re-invention of infrastructure provided by the upstream kernel isn't going to cut it here. And without an open userspace, there is no room for the open source community to refactor and fix anything.

But I'm not one to sit around and complain about a problem indefinitely without eventually trying to do something about it. One thing that gave me a glimmer of hope is the lima project. The first real (non vaporware) opensrc graphics effort on ARM. With that as a piece of needed inspiration, what could I do to help the cause? Well, with ARM as a member company of Linaro, and coming into contact with ARM folks working on mali, as well as engineers from other Linaro member companies who use mali, it seemed like direct contribution to the lima project might be a bit of a gray area. I don't think I really know any internal s3cr3ts of how mali works (and certainly not more than the lima folks have already figured out). But I don't want to get Linaro in trouble with its member companies and it seemed like a potential conflict of interest. So what could I do? Pick another ARM platform that I know nothing about, and go to town!

This really leaves two big players. Of the two, I had a friend who could loan me a dragonboard to hack on, so that pretty much clinched the deal. (Although I have hopes that someday someone will figure out how to get something based on the nouveau driver running on tegra.)

Methodology

The approach I took is quite similar to, and strongly inspired by, the approach that Luc Verhaegen took with the lima driver project. It basically amounts to using a LD_PRELOAD shim to intercept system calls, digging through the kernel code to understand the existing userspace<->kernel API, and figuring out how to observe and log the interesting bits.

I've started with 2d acceleration support, mainly because that seemed like a good "warm-up" exercise, and also because there is currently no publicly available acceleration for x11 for the snapdragon platform (binary blob or otherwise). Most of the time so far has gone into figuring out the kernel APIs, and writing some utility code to log and post-process the results of running some simple test apps using the closed src binaries available for android, obtained from a cyanogenmod filesystem (because qualcomm does not provide any userspace support for gnu-linux (non-android) userspace, at least not to the general public). I used some linker tricks to link the test code against the android binary blob libs, and android libc, etc, within a ubuntu 11.10 filesystem. (Fwiw, I use 11.10 because that was prior to the switch over to armhf and based on the 3.0 kernel, which was what I had available from codeaurora git trees.) The good news is, from what I've been able to figure out from the GPL kernel driver, a lot of the infrastructure like pixel and cmdstream buffer allocation, and cmdstream submission, appear to be similar for 2d and 3d, so I think a lot of the work done so far for 2d accel will be useful when it comes to working on the 3d part.

The libwrap code I wrote logs information about the blits (cmdstream, and various parameters like gpu addresses, surface dimensions, blit coords) to a simple .rd log file (which amounts to a sequence of type/length/value fields). These .rd files get processed with a utility I wrote called "redump", to generate reports showing side-by-side comparisons of the cmdstream, with similarities and parts of dwords that appeared to match surface and blit parameters highlighted. It isn't a perfect disassembly of the command stream, but it certainly helps to spot patterns.
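For anyone not familiar with type/length/value logs, a minimal reader could look something like the sketch below. This is not the redump code: the record layout (a little-endian 32-bit type followed by a 32-bit length) is an assumption for illustration, and `iter_tlv` and `make_record` are hypothetical helpers.

```python
import struct

# Assumed header layout: 32-bit type, 32-bit length, little-endian.
HEADER = struct.Struct("<II")


def make_record(rec_type, value):
    """Encode one type/length/value record (hypothetical helper)."""
    return HEADER.pack(rec_type, len(value)) + value


def iter_tlv(data):
    """Yield (type, value) pairs from a buffer of back-to-back records."""
    offset = 0
    while offset < len(data):
        if offset + HEADER.size > len(data):
            raise ValueError("truncated record header")
        rec_type, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        if offset + length > len(data):
            raise ValueError("truncated record value")
        yield rec_type, data[offset:offset + length]
        offset += length


if __name__ == "__main__":
    log = make_record(1, b"cmdstream bytes") + make_record(2, b"")
    for rec_type, value in iter_tlv(log):
        print(rec_type, len(value))
```

The nice property of this sort of format is that a reader can skip over record types it doesn't know about, so new kinds of records can be added to the log without breaking older tools.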

Once I had a reasonable collection of tests for the types of blit operations which are important for an x11 EXA driver, I started varying parameters to figure out the limits, ie. what is the largest blit x, y, width, height, max surface width, height, stride, etc, to establish how many bits are used to encode different fields in the command stream. In some cases, I noticed there were multiple encoding options so parameters could be packed in fewer dwords if fewer bits were needed to encode the parameters. (For the current EXA driver I'm pretty much using the worst case encoding options so far, to keep things simple.)
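The parameter-varying idea can be sketched roughly as follows. This is an illustration rather than the actual test code: `accepts` is a hypothetical callback standing in for "run the test blit with this value and see whether it still works", and the search assumes that if a value is accepted then every smaller value is accepted too.

```python
def largest_accepted(accepts, upper_bound=1 << 32):
    """Binary search for the largest value that accepts() allows.

    Assumes accepts(0) is True, accepts(upper_bound) would be False,
    and that acceptance is monotonic in between.
    """
    if not accepts(0):
        raise ValueError("even zero is rejected")
    lo, hi = 0, upper_bound  # lo is known good, hi is assumed bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if accepts(mid):
            lo = mid
        else:
            hi = mid
    return lo


def field_width_bits(max_value):
    """Smallest number of bits that can hold max_value (at least 1)."""
    return max(1, max_value.bit_length())


def changed_bits(dword_a, dword_b):
    """Mask of the bits that differ between two captured 32-bit dwords."""
    return (dword_a ^ dword_b) & 0xFFFFFFFF


if __name__ == "__main__":
    # Stand-in for real hardware: pretend the field is 11 bits wide.
    limit = largest_accepted(lambda v: v < (1 << 11))
    print(limit, field_width_bits(limit))
```

Comparing the same dword across two captures with `changed_bits` gives a second hint about where a field lives, which helps when a parameter is packed together with other fields.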

With these tests, and the corresponding redump reports, I started work on implementing the EXA accel fxns for the xf86-video-freedreno driver. The work on the EXA driver really only started about 1.5 weekends ago (and most of the time at the beginning was just getting a skeletal driver setup, which is based on a stripped down and simplified xf86-video-msm).

Current Status

So far, I've got the basic solid/copy/composite operations implemented. There are some limitations still in the composite code, such as operations with masks are rejected. (There is an awkward limitation in libC2D2 that there is no way to specify mask and src coordinates independently.. I'm not sure yet if this is a limitation of the hw, but we will be a bit on our own to figure this out via experimentation with the cmdstream. One option to deal with this is ptr arithmetic on the mask surface gpu addr.) And there are still some lesser used color formats that I haven't tackled.
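The ptr arithmetic option would amount to something like the sketch below. It is untested against the hardware and assumes a simple linear (untiled) mask surface; `shifted_mask_addr` and its parameters are hypothetical names.

```python
def shifted_mask_addr(mask_base, stride, bytes_per_pixel, src_xy, mask_xy):
    """Return a mask base address that makes src coords hit the wanted mask pixel.

    If the blit can only use one (x, y) for both src and mask, then moving
    the mask base by the coordinate delta means pixel (sx, sy) of the
    shifted surface is pixel (mx, my) of the real one.
    Assumes a linear layout; the result can point before the real buffer,
    so the blit must never touch pixels outside the original surface.
    """
    sx, sy = src_xy
    mx, my = mask_xy
    return mask_base + (my - sy) * stride + (mx - sx) * bytes_per_pixel


if __name__ == "__main__":
    # 8-bit alpha mask, 256 byte stride, made-up gpu address.
    print(hex(shifted_mask_addr(0x1000, 256, 1, (0, 0), (4, 2))))
```

Whether the hw is happy with an unaligned or out-of-range base address is exactly the sort of thing that would need to be checked by experiment.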

The next big thing, however, will be to deal properly with submission of multiple blits at a time, and not having to block until submitted blits are complete. Without this, performance is (as you would expect) quite bad. But that is easy enough to fix later. There is some awkwardness with the current kernel interface (see NOTES in freedreno tree about how context switch restore works). But that can be fixed by enhancing the kernel part to take separate ptrs in a single ioctl. And of course deciphering the context restore packet would be needed to properly support context switching if you have multiple processes using 2d (but this isn't too important for having a single xserver running so I think we can come back to it later).

A quick note on the kernel: the existing driver from qualcomm is what I'd call a semi-DRM driver. It is using GEM buffers, so it gives us what we'd need eventually for DRI2 and 3d. But not mode setting (which is handled via fbdev driver, also opened by xserver), and not a batchbuffer sort of interface for cmd submission.. cmd submission is handled via separate kgsl-2d/3d devices which are not aware of GEM buffer handles, so mapping buffers to the GPU cannot be handled as part of the cmd submission. So far I'm leaving the kernel driver mostly as-is (sans maybe some minor backwards compatible enhancements), because it is essential to be able to run test code based on the existing binary blob libraries back to back with work-in-progress xorg/mesa drivers. One approach to cleaning up the kernel part might be to provide an emulation layer to emulate the old interfaces, although for now there are enough other things to do that I haven't given this much thought yet. Of course, volunteers are always welcome ;-)

The git trees can be found at: https://gitorious.org/freedreno/

And an IRC channel on freenode at #freedreno

So far there are no mailing lists (I'm not really sure where they could be hosted) or web page other than the wiki pages at gitorious.

Disclaimer

This is a project that I've been working on in my own free time, not using the resources or time of my employer or Linaro. It is something I've been working on of my own accord because quite simply I want to see advancement of the state of open source graphics in linux. I hope that Linaro will be supportive of this effort, and of open source graphics on all ARM platforms. And I know that a lot of the individual people that make up Linaro are quite passionate about open source. But I realize that dealing with the business concerns of all the various member and potential member companies is a difficult balancing act. And as always, the opinions expressed here in my blog are my own and not necessarily those of my employer or of Linaro.

Saturday, January 14, 2012

ubuntu-tv 1080p on omap4 panda

you can find a higher resolution video at: http://www.youtube.com/watch?v=HJuyNrVOS1I
needs a bit of cleanup, but patches are here: https://github.com/robclark/qtmobility-1.1.0

update: and as usual Ricardo has a nicer looking video at his blog

Thursday, January 12, 2012

omap5 at CES

http://www.engadget.com/2012/01/12/ti-omap-5-exclusive-demo-laptops-ultrabooks-ces-2012-video/

Thursday, January 5, 2012

xbmc update

of course I've been too busy to write anything, but Ricardo has updated his blog w/ a note about xbmc progress:

http://rsalveti.wordpress.com/2012/01/06/hw-video-decode-and-xbmc-ubuntu-linaro/
