Blogging the Monkey: Tales of a code monkey<br />
Infrequent freedreno update (Rob, 2018-02-11)<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial, Helvetica, sans-serif;">As is usually the case, I'm long overdue for an update. So this covers the last six(ish) months or so. The first part might be old news if you follow <a href="https://www.phoronix.com/" target="_blank">phoronix</a>.</span><br />
<br />
<h3 style="text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">Older News</span></h3>
<span style="font-family: Arial, Helvetica, sans-serif;">In the <a href="http://bloggingthemonkey.blogspot.com/2017/06/long-overdue-update.html" target="_blank">last update</a>, I mentioned basic a5xx compute shader support. Late last year (and landing in the mesa 18.0 branch) I had a chance to revisit compute support for a5xx, and finished:</span><br />
<ul style="text-align: left;">
<li><span style="font-family: Arial, Helvetica, sans-serif;">image support</span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;">shared variable support</span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;">barriers, which involved some improvements to the ir3 instruction scheduler so barriers could be scheduled in the correct order (ie. for various types of barriers, certain instructions can't be moved before/after the related barrier)</span></li>
</ul>
<span style="font-family: Arial, Helvetica, sans-serif;">There were also some semi-related SSBO fixes, and additional r/e of instruction encodings, in particular for barriers (new cat7 group of instructions) and image vs SSBO (where different variations of the cat6 instruction encoding are used for images vs SSBOs).</span><br />
<br />
<span style="font-family: Arial, Helvetica, sans-serif;">Also I r/e'd and added support for indirect compute, indirect draw, texture-gather, stencil textures, and <a href="https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_framebuffer_no_attachments.txt" target="_blank">ARB_framebuffer_no_attachments</a> on a5xx. Which brings us pretty close to gles31 support. And over the holiday break I r/e'd and implemented tiled texture support, because moar fps ;-)</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">Ilia Mirkin also implemented indirect draw, stencil texture, and <a href="https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_framebuffer_no_attachments.txt" target="_blank">ARB_framebuffer_no_attachments</a> for a4xx. Ilia and Wladimir J. van der Laan also landed a handful of a2xx and a20x fixes. (But there are more a20x fixes hanging out on a branch which we still need to rebase and merge.) It is definitely nice seeing older hw, which the blob driver has long since dropped support for, getting some attention.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<h3 style="text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">Other News</span></h3>
<span style="font-family: Arial, Helvetica, sans-serif;">Not exactly freedreno related, but probably of some interest to freedreno users.. in the 4.14 kernel, my qcom_iommu driver finally landed! This was the last piece to having the gpu working on a vanilla upstream kernel on the <a href="https://www.96boards.org/product/dragonboard410c/" target="_blank">dragonboard 410c</a>. In addition, the camera driver also landed in 4.14, and venus, the v4l2 mem-to-mem driver for hw video decode/encode landed in 4.13. (The venus driver also already has support for db820c.)</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">Fwiw, the v4l2 mem-to-mem driver interface is becoming the de facto standard for hw video decode/encode on SoC's. GStreamer has had support for a long time now. And more recently <a href="https://github.com/FFmpeg/FFmpeg/commit/1ef7752d64cbe9af2f27cc65aba3a2ca3831c128#diff-143a78d3b451c891d27ad5a4ce321515" target="_blank">ffmpeg (v3.4)</a> and kodi have gained support:</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Arial, Helvetica, sans-serif;"><iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/gEEmCsgioII/0.jpg" src="https://www.youtube.com/embed/gEEmCsgioII?feature=player_embedded" frameborder="0" allowfullscreen></iframe></span></div>
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<br />
<span style="font-family: Arial, Helvetica, sans-serif;">When I first started on freedreno, qcom support for upstream kernel was pretty dire (ie. I think serial console support might have worked on some ancient SoC). When I started, the only kernel that I could use to get the gpu running was old downstream msm android kernels (initially 2.6.35, and on later boards 3.4 and 3.10). The ifc6410 was the first board that I (eventually) could run an upstream kernel (after starting out with an msm-3.4 kernel), and the db410c was the first board I got where I never even used a downstream android kernel. Initially db410c was upstream kernel with a pile of patches, although the size of the patchset dropped over time. With db820c, that pattern is repeating again (ie. the patchset is already small enough that I managed to easily rebase it myself for after 4.14). Linaro and qcom have been working quietly in the background to upstream all the various drivers that something like drm/msm depend on to work (clk, genpd, gpio, i2c, and other lower level platform support). This is awesome to see, and the linaro/qcom developers behind this progress deserve all the thanks. Without much fanfare, snapdragon has gone from a hopeless case (from upstream perspective) to one of the better supported platforms!</span><br />
<br />
<span style="font-family: Arial, Helvetica, sans-serif;">Thanks to the upstream kernel support, and u-boot/UEFI support which I've <a href="http://bloggingthemonkey.blogspot.com/2017/06/long-overdue-update.html" target="_blank">mentioned</a> before, Fedora 27 supports db410c <a href="https://nullr0ute.com/2017/11/getting-started-with-fedora-on-the-96boards-dragonboard/" target="_blank">out of the box</a> (and the situation should be similar with other distro's that have new enough kernel (and gst/ffmpeg/kodi if you care about hw video decode)). Note that the firmware for db410c (and db820c) has been merged in linux-firmware since that blog post.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<h3 style="text-align: left;">
<span style="font-family: Arial, Helvetica, sans-serif;">More Recent News</span></h3>
<span style="font-family: Arial, Helvetica, sans-serif;">More recently, I have been working on a batch of (mostly) compiler related enhancements to improve performance with things that have more complex shaders. In particular:</span><br />
<ul style="text-align: left;">
<li><span style="font-family: Arial, Helvetica, sans-serif;">Switch over to NIR's support for lowering phi-web's to registers, instead of dealing with <a href="https://en.wikipedia.org/wiki/Static_single_assignment_form" target="_blank">phi</a> instructions in ir3. NIR has a much more sophisticated pass for coming out of SSA, which does a better job at avoiding the need to insert extra MOV instructions, although a bunch of RA (register allocation) related fixes were required. The end result is fewer instructions in resulting shader, and more importantly a reduction in register usage.</span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;">Using NIR's peephole_select pass to lower if/else, instead of our own pass. This was a pretty small change (although it took some work to arrive at a decent threshold). Previously the <span style="font-family: "Courier New", Courier, monospace;">ir3_nir_lower_if_else</span> pass would try to lower <i>all</i> if/else to select instructions, but in extreme cases this is counter-productive as it increases register pressure. (Background: in simple cases for a GPU, executing both sides of an if/else and using a select instruction to choose the results makes sense, since GPUs tend to be a SIMT arch, and if you aren't executing both sides, you are stalling threads in a warp that took the opposite direction in the if/else.. but in extreme cases this increases register usage which reduces the # of warps in flight.) End result was 4x speedup in alu2 benchmark, although in the real world it tends to matter less (ie. most shaders aren't that complex).</span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;">Better handling of sync flags across basic blocks</span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;">Better instruction scheduling across basic blocks</span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;">Better instruction scheduling for SFU instructions (ie. sqrt, rsqrt, sin, cos, etc) to avoid stalls on SFU.</span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;">R/e and add support for <span style="font-family: "Courier New", Courier, monospace;">(sat)</span>urate flag (to avoid extra sequence of min.f + max.f instructions to clamp a result)</span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;">And a few other tweaks. </span></li>
</ul>
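<span style="font-family: Arial, Helvetica, sans-serif;">As a rough illustration of the if/else trade-off described in the list above, here is a toy Python sketch. It is not the NIR peephole_select implementation: the <span style="font-family: "Courier New", Courier, monospace;">should_flatten</span> helper, the instruction counts, and the threshold of 8 are all made up for illustration.</span><br />

```python
# Toy model of flattening an if/else into a select.
# All names and the threshold value are hypothetical, not mesa code.

def branchy(cond, a, b):
    # Only one side executes. On SIMT hardware, threads that took the
    # other side of the branch would sit idle in the meantime.
    if cond:
        return a * 2.0
    else:
        return b + 1.0


def flattened(cond, a, b):
    # Both sides execute unconditionally, then a select picks the result.
    # Both temporaries are live at once, which costs registers.
    then_val = a * 2.0
    else_val = b + 1.0
    return then_val if cond else else_val


def should_flatten(then_instrs, else_instrs, threshold=8):
    # Flatten only when the two sides together are small, so the extra
    # live values do not push register usage up too far.
    return then_instrs + else_instrs <= threshold


# Both forms compute the same result for either condition.
for cond in (True, False):
    assert branchy(cond, 3.0, 5.0) == flattened(cond, 3.0, 5.0)
```

<span style="font-family: Arial, Helvetica, sans-serif;">The point of the threshold is only that small branches get flattened and large ones stay as real branches; picking the actual value is the part that took some work.</span><br />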
<span style="font-family: Arial, Helvetica, sans-serif;">The end results tend to depend on how complex the shaders that a game/benchmark uses are. At the extreme high end, 4x improvement for alu2. On the other hand, probably doesn't make much difference for older games like xonotic. Supertuxkart and most of the other gfxbench benchmarks show something along the lines of 10-20% improvement. Supertuxkart, in particular, with advanced pipeline, the combination of compiler improvements with previous lrz and tiled texture (ie. <span style="font-family: "Courier New", Courier, monospace;">FD_MESA_DEBUG=lrz,ttile</span>) is a 30% improvement! Some of the more complex shaders I've been looking at, like shadertoy <a href="https://www.shadertoy.com/view/ldl3zN" target="_blank">piano</a>, show 25% improvement on the compiler changes alone. (Shadertoy isn't likely to benefit from lrz/ttile since it is basically just drawing a quad with all the rendering logic in the fragment shader.)</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">In other news, things are starting to get interesting for snapdragon 845 (sdm845). Initial patches for a6xx GPU support have been posted (although I still need to get my hands on a6xx hw to start r/e for userspace, so those probably won't be merged soon). And qcom has drm/msm display support buried away in their msm-4.9 tree (expect to see first round of patches for upstream soon.. it's a <i>lot</i> of code, so expect some refactoring before it is merged, but good to get this process started now).</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></div>
About shader compilers, IR's, and where the time is spent (Rob, 2017-08-27)<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: "arial" , "helvetica" , sans-serif;">Occasionally the question comes up about why we convert between various IR's (intermediate representations), like glsl to <a href="http://www.jlekstrand.net/jason/projects/mesa/nir-notes/" target="_blank">NIR</a>, in the process of compiling a shader. Wouldn't it be faster if we just skipped a step and went straight from glsl to "the final thing", which would be ir3 (freedreno), codegen (nouveau), or LLVM (radeonsi/radv). It is a reasonable question, since most people haven't worked on compilers and we probably haven't done a good job at explaining all the various passes involved in compiling a shader or presenting a breakdown of where the time is spent.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So I spent a bit of time this morning with perf to profile a shader-db run (or rather a subset of a full run to keep the <span style="font-family: "courier new" , "courier" , monospace;">perf.data</span> size manageable, see notes at end).</span><br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhw89nofbDtphrVseWWh53Fu_bMW7OXeyL9BXaZtoYj0DqKNRVN3hUmTCFo1y7AZbKnSnnCgM5DH6vuVVKvaI38TJqDK29ocC9WFD3MpAmUW0WhVmcokbE0AL_lYDEPBhFOMiRu2rXrlg_j/s1600/flamegraph.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="900" data-original-width="1600" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhw89nofbDtphrVseWWh53Fu_bMW7OXeyL9BXaZtoYj0DqKNRVN3hUmTCFo1y7AZbKnSnnCgM5DH6vuVVKvaI38TJqDK29ocC9WFD3MpAmUW0WhVmcokbE0AL_lYDEPBhFOMiRu2rXrlg_j/s320/flamegraph.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-family: "arial" , "helvetica" , sans-serif;">A flamegraph from the shader-db run, since every blog post needs a catchy picture.</span></td></tr>
</tbody></table>
<span style="font-family: "arial" , "helvetica" , sans-serif;"></span><br />
<h3 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">Breakdown:</span></h3>
<ul style="text-align: left;">
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">parser, into glsl: 9.98%</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">glsl to nir: 1.3%</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">nir opt/lowering passes: 21.4%</span></li>
<ul>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">CSE: 6.9%</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">opt algebraic: 3.5%</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">conversion to SSA: 2.1%</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">DCE: 2.0%</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">copy propagation: 1.3%</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">other lowering passes: 5.6% </span></li>
</ul>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">nir to ir3: 1.5% </span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">ir3 passes: 21.5%</span></li>
<ul>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">register allocation: 5.1%</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">sched: 14.3%</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">other: 2.1% </span></li>
</ul>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">assembly (ir3->binary): 0.66%</span></li>
</ul>
<span style="font-family: "arial" , "helvetica" , sans-serif;">This is ignoring some of the fixed overheads of shader-db runner, and also doesn't capture individually a bunch of NIR lowering passes. NIR has ~40 lowering passes, some that are gl related like <span style="font-family: "courier new" , "courier" , monospace;">nir_lower_draw_pixels</span> and <span style="font-family: "courier new" , "courier" , monospace;">nir_lower_wpos_ytransform</span> (because for hysterical reasons textures and therefore FBO's are upside down in gl). For gallium drivers using NIR, these gl specific passes are called from mesa state-tracker.</span><br />
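<span style="font-family: "arial" , "helvetica" , sans-serif;">As a quick sanity check on the breakdown above, the sub-items can be summed against their parent entries. The numbers below are copied from the list; the dictionary layout and the <span style="font-family: "courier new" , "courier" , monospace;">total</span> helper are just for illustration.</span><br />

```python
# Percentages copied from the breakdown list; structure is illustrative.
top_level = {
    "parser, into glsl": 9.98,
    "glsl to nir": 1.3,
    "nir opt/lowering passes": 21.4,
    "nir to ir3": 1.5,
    "ir3 passes": 21.5,
    "assembly (ir3->binary)": 0.66,
}
nir_passes = {
    "CSE": 6.9,
    "opt algebraic": 3.5,
    "conversion to SSA": 2.1,
    "DCE": 2.0,
    "copy propagation": 1.3,
    "other lowering passes": 5.6,
}
ir3_passes = {
    "register allocation": 5.1,
    "sched": 14.3,
    "other": 2.1,
}


def total(items):
    # Round to two decimals to hide floating point noise.
    return round(sum(items.values()), 2)


print("nir sub-items:", total(nir_passes))
print("ir3 sub-items:", total(ir3_passes))
print("listed stages:", total(top_level))
print("not listed:   ", round(100 - total(top_level), 2))
```

<span style="font-family: "arial" , "helvetica" , sans-serif;">The sub-items add up to their parent entries (21.4% and 21.5%), and the listed stages together cover about 56% of the samples. The rest is presumably the fixed shader-db runner overhead and other time that the list does not capture.</span><br />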
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">The other lowering passes are not gl specific but tend to be specific to general GPU shader features (ie. things that you wouldn't find in a C compiler for a cpu) and things that are needed by multiple different drivers. One example is <span style="font-family: "courier new" , "courier" , monospace;">nir_lower_tex</span>, which handles sampling from YUV textures (ie. inserting the instructions to do YUV->RGB conversion, since GLES and android strongly assume this is a thing that hardware can always do), lowering RECT textures, and clamping texture coords. These lowering passes are called from the driver backend, so the driver is in control of which lowering passes are needed, including configuring individual features in passes which handle multiple things, based on what the hardware does not support directly.</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">These lowering passes are mostly O(n), and lost in the noise.</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">Also note that freedreno, along with the other drivers that can consume NIR directly, disable a bunch of opt passes that were originally done in glsl, but that NIR (or LLVM) can do more efficiently. For freedreno, disabling the glsl opt passes shaved ~30% runtime off of a shader-db run, so spending 1.3% to convert into NIR is way more than offset.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">For other drivers, the breakdown may be different. I expect radeonsi/radv skips some of the general opt passes in NIR which have a counterpart in LLVM, but re-uses other lowering passes which do not have a counterpart in LLVM.</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"></span><br />
<h3 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">Is it still a gallium driver?</span></h3>
<span style="font-family: "arial" , "helvetica" , sans-serif;"></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">This is a related question that comes up sometimes: is it a gallium driver if it doesn't use TGSI? Yes.</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">The drivers that can consume NIR and implement the gallium pipe driver interface, freedreno a3xx+, vc4, vc5, and radeonsi (optionally), are gallium drivers. They still have to accept TGSI for state trackers which do not support NIR, and various built-in shaders (blits, mipmap generation, etc). Most use the shared <span style="font-family: "courier new" , "courier" , monospace;">tgsi_to_nir</span> pass for TGSI shaders. Note that currently <span style="font-family: "courier new" , "courier" , monospace;">tgsi_to_nir</span> does not support all the TGSI features, but just features needed by internal shaders, and what is needed for gl3/gles3 (ie. basically what freedreno and vc4 needed before mesa state-tracker grew support for <span style="font-family: "courier new" , "courier" , monospace;">glsl_to_nir</span>).</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"></span><br />
<h3 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">Notes:</span></h3>
<span style="font-family: "arial" , "helvetica" , sans-serif;"></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">Collected from shader-db run (glamor + supertuxkart + 0ad shaders) with a debug mesa build (to have debug syms and prevent inlining) but with <span style="font-family: "courier new" , "courier" , monospace;">NIR_VALIDATE=0</span> (otherwise results with debug builds are highly skewed). A subset of all shader-db shaders was used to keep the <span style="font-family: "courier new" , "courier" , monospace;">perf.data</span> size manageable.</span></div>
long overdue update (Rob, 2017-06-25, updated 2017-06-26)<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Since it has been a while since the last update, I guess it is a good time to post an update on some of the progress that has been happening with freedreno and upstream support for snapdragon boards.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: large;"><b>freedreno / mesa</b></span></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">While the 17.1 release included enabling <a href="http://bloggingthemonkey.blogspot.com/2016/07/dirty-tricks-for-moar-fps.html" target="_blank">reorder</a> support by default, there have been many other interesting features landed since the 17.1 branch point (so they will be included in the future 17.2 release). Many, but not all, are related to <a href="https://www.phoronix.com/scan.php?page=news_item&px=Freedreno-A5xx-Bringup" target="_blank">a5xx</a>. (Something that I just realized I forgot to blog about, but have <a href="http://armdevices.net/2017/03/14/freedreno-enables-linux-distros-on-dragonboard-820c-96boards/" target="_blank">demoed</a> here and there.)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>GL/GLES Compute Shaders:</b></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So far this is only a5xx (although a4xx seems to work similarly, and would probably be not too hard to get working if someone had the right hardware and a bit of time). SSBOs and atomics are supported, but image support (an important part of compute shaders) is still TODO (and some r/e required, although it seems to share a lot in common with SSBOs). Adreno 3xx support for compute shaders appears to be more work (ie. less in common with a4xx/a5xx, and probably part of the reason that qualcomm never bothered adding support in android blob driver). Patches welcome, but for now a3xx compute support is far enough down my TODO list that it might not otherwise happen.<br /> </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">I know there is a lot of interest in open source OpenCL support for freedreno, and hopefully that is something that will come in the future. But there is the big challenge of how to get opencl shaders (kernels) into a form that can be consumed by freedreno's <a href="https://github.com/freedreno/freedreno/wiki/A3xx-shader-instruction-set-architecture" target="_blank">ir3</a> shader compiler backend. While there is some potential to re-use spirv_to_nir at some point, there are some complicated details. For compute kernels (ie. OpenCL) there are some restrictions lifted on SPIRV that spirv_to_nir relies on. (Little details like lack of requirement for structured flow control.)</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>A5xx HW Binning Support:</b></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Traditionally hw binning support, while a pretty big perf boost, has been kinda difficult (translation: lot of things can be done wrong to lead to difficult to debug GPU lockups). This time around it wasn't so hard. I guess experience on <a href="http://bloggingthemonkey.blogspot.com/2014/01/freedreno-update-new-year-edition.html" target="_blank">a3xx</a>/<a href="http://bloggingthemonkey.blogspot.com/2016/05/freedreno-not-so-periodic-update.html" target="_blank">a4xx</a> has helped. And everyone loves ~30% fps boost in your favorite game!</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">This has brought performance roughly up to the same level as ifc6540/a420. Which sounds bad, but remember we are comparing apples and oranges. On ifc6540 (snapdragon 805), we don't yet have upstream kernel support so this was using a 3.10 android kernel (with bus-scaling and all the downstream tricks to optimize memory bandwidth and overall SoC performance). But on a530 (dragonboard820c), I never had a working downstream kernel (or had to bother backporting the upstream drm/msm driver to some ancient android kernel.. hurray!). The upshot is that any perf #'s for a5xx don't include bus-scaling, cpufreq, etc. I expect a pretty big performance boost on a530 once we have a way to clock up memory/interconnects. (Ie. on micro-benchmarks a530 is >2x faster than a420 on alu limited workloads, but still a bit slower than a420 on bandwidth limited workloads, despite having a higher theoretical bandwidth.)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Side note, linaro is working on an upstream solution for <a href="https://lwn.net/Articles/716059/" target="_blank">bus-scaling</a>. This is a very important improvement needed upstream for ARM SoC's, especially ones that optimize so strongly for battery life. (Keep in mind that interconnects, which span across the SoC, and memory, are a big power consumer in a modern SoC.. so a lot of qualcomm's good performance + battery life in phones comes down to these systemwide optimizations.) It is equivalent to slow memory clockings on some generations of nouveau, except in this case it is outside the gpu driver (ie. we aren't talking about vram on a discrete gpu), and the reason is to enable a high end phone SoC to last a couple days on battery, rather than keeping your video card from melting.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>A5xx gles3.0/gl3.1 support:</b></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Probably it would have made sense to spend time on this before compute shaders (since they are otherwise only exposed with <span style="font-family: "courier new" , "courier" , monospace;">$MESA_GL_VERSION_OVERRIDE</span> tricks.. but hey, I was curious about how compute shaders worked). After an assortment of small things to r/e and implement, we were just a few (~50) texture/vbo/fb formats away from gl3.1. Nothing really exciting. Mostly just a few weekends probing unknown format #'s and seeing which piglit format tests started passing. The sort of thing that would have taken approximately 10 minutes with docs.. but hey, it needed to be done.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Switching to NIR by default:</b></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">This is one thing that benefits a3xx and a4xx as well as a5xx. While freedreno has had NIR support for <a href="https://www.phoronix.com/scan.php?page=news_item&px=Freedreno-Adds-NIR-Compiler" target="_blank">a while</a>, it hasn't been enabled by default until more <a href="https://www.phoronix.com/scan.php?page=news_item&px=Freedreno-NIR-Default" target="_blank">recently</a>. The issue was handling of complex <a href="https://cgit.freedesktop.org/mesa/mesa/commit/?id=caa64b24ce4ca32addfae2bcd93b59b1e5225d82" target="_blank">dereferences</a> (multi-dimensional arrays, arrays of structs, etc). The problem was that freedreno's ir3 backend preferred to keep things in SSA form (since that gives the instruction scheduler more flexibility, which is pretty important in the a3xx+ instruction set architecture (<a href="https://github.com/freedreno/freedreno/wiki/A3xx-shader-instruction-set-architecture" target="_blank">ir3</a>)). Adding support to lower arrays to regs allowed moving the deref offset calculation to NIR so that we wouldn't regress by turning NIR on by default. This is useful since it cuts shader compilation time, but also because tgsi_to_nir doesn't support SSBOs, atomics, and other new shiny glsl features. (Now we only rely on tgsi_to_nir for various legacy paths and built-in blit shaders which don't need new shiny glsl features.)</span><br />
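<span style="font-family: "arial" , "helvetica" , sans-serif;">To give a feel for what a "deref offset calculation" amounts to, here is a toy Python sketch that flattens a multi-dimensional array index into a single offset. This is not NIR or ir3 code; the <span style="font-family: "courier new" , "courier" , monospace;">flat_offset</span> helper and its parameters are hypothetical.</span><br />

```python
def flat_offset(dims, indices, elem_size=1):
    """Row-major offset of one element in a multi-dimensional array.

    dims      -- array lengths, outermost first, e.g. (4, 3) for a[4][3]
    indices   -- one index per dimension
    elem_size -- slots per element (e.g. 4 for a vec4, or a struct size)
    """
    if len(dims) != len(indices):
        raise ValueError("need exactly one index per dimension")
    offset = 0
    for length, idx in zip(dims, indices):
        if not 0 <= idx < length:
            raise IndexError("index out of range")
        # Each step scales what came before by the next dimension's
        # length and adds that dimension's index.
        offset = offset * length + idx
    return offset * elem_size


# a[2][1] in an array declared as a[4][3]
print(flat_offset((4, 3), (2, 1)))
# same element when every entry takes 4 slots (think vec4)
print(flat_offset((4, 3), (2, 1), elem_size=4))
```

<span style="font-family: "arial" , "helvetica" , sans-serif;">Once the index is a single offset like this, the array can be treated as a flat range of registers, which is roughly the situation the backend has to handle.</span><br />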
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>A5xx HW Query Support:</b></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Adreno 5xx changed how hw <a href="https://github.com/freedreno/freedreno/wiki/Queries" target="_blank">queries</a> (ie. occlusion query and time-elapsed query, etc) work. For the <a href="https://github.com/freedreno/freedreno/wiki/A5xx-Queries" target="_blank">better</a>,
since now we can accumulate per-tile results on the GPU. But it
required some new support in freedreno for a different sort of query,
and some r/e about how this actually worked. And while we had
previously lied about occlusion query support (mostly to expose more
than gl1.4 support), that isn't a very good long term solution. In
addition, time-elapsed query is useful for performance/profiling work,
so helpful for some of the following projects.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>A5xx LRZ Support:</b></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Adreno 5xx adds another cute optimization called "LRZ". (Presumably "low resolution Z (depth buffer)".) I've spent some time r/e'ing this feature and <a href="https://cgit.freedesktop.org/mesa/mesa/commit/?id=5b60004525876616c4719bb790108db4650b1f49" target="_blank">implementing</a> support for it in freedreno. It is a neat new hw trick that a5xx has, which serves two purposes.</span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The basic idea is to have a per-quad depth value so that in the binning pass primitives can be rejected (per tile) based on depth (ie. reject more early). But then recycle the LRZ buffer in draw phase to function as for-free depth pre-pass (ie. reject earlier primitives based on the z value of later primitives).</span></div>
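<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">A toy Python model of that idea follows. It assumes a "less than" depth test and a conservative per-quad depth bound, which is the generic coarse-Z technique rather than anything confirmed about the a5xx hardware. The <span style="font-family: "courier new" , "courier" , monospace;">LowResZ</span> class and the primitive tuples are made up for illustration.</span></div>

```python
# Toy low-resolution Z buffer: one conservative depth bound per quad.
# Assumes smaller z is nearer; all names here are hypothetical.

class LowResZ:
    def __init__(self, quads_w, quads_h, far=1.0):
        # Every quad starts at the far plane, so nothing is rejected yet.
        self.bound = [[far] * quads_w for _ in range(quads_h)]

    def test_and_update(self, qx, qy, z_min, z_max, covers_quad):
        """Return True if the primitive may be visible in this quad."""
        if z_min >= self.bound[qy][qx]:
            return False  # entirely behind what is already known
        if covers_quad:
            # Only a primitive covering the whole quad can tighten the
            # bound, and only to its farthest depth.
            self.bound[qy][qx] = min(self.bound[qy][qx], z_max)
        return True


def run_pass(lrz, prims):
    """Return the names of primitives that survive the coarse test."""
    return [name for name, z_min, z_max, covers in prims
            if lrz.test_and_update(0, 0, z_min, z_max, covers)]


# Submitted back to front: the far wall first, then the near wall.
prims = [("far_wall", 0.8, 0.9, True), ("near_wall", 0.2, 0.3, True)]
lrz = LowResZ(1, 1)

binning_survivors = run_pass(lrz, prims)  # first pass fills the buffer
draw_survivors = run_pass(lrz, prims)     # second pass reuses it

print(binning_survivors)
print(draw_survivors)
```

<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">In the first pass both primitives survive, since the far wall is submitted before anything is known to be in front of it. In the second pass the buffer already holds the near wall's depth, so the far wall is rejected even though it comes first in submission order.</span></div>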
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The benefit depends on how well optimized the game is. Ie. games that are well optimized for traditional GPU architectures (ie. sorting geometry, already doing depth pre-passes, etc) won't benefit as much.. but this helps a lot for badly written games that relied on per-pixel deferred rendering.</span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Overall, for things like <a href="http://blog.supertuxkart.net/" target="_blank">stk</a>/xonotic, it seems like a ~5-10% win.</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">edit: I forgot to mention, this isn't enabled by default as it causes some issues (which seem like a sort of z-fighting) with <a href="https://play0ad.com/" target="_blank">0ad</a>. Other than that, I haven't found anything that it doesn't work with. To enable: <span style="font-family: "courier new" , "courier" , monospace;">FD_MESA_DEBUG=lrz</span>. It would be nice if there were some way to have driver specific flags in driconf to control things like this.</span><br />
<br /></div>
<div>
</div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The main remaining performance trick for a5xx is UBWC (ie. bandwidth compression) + tiled textures. I've worked out mostly how UBWC works (in particular texture layout, at least for 2d textures + mipmap, but I think we can infer how 2d arrays, 3d, etc, work from that). Most of the infrastructure for upload/download blits (to convert to/from linear) should be easier thanks to the <a href="http://bloggingthemonkey.blogspot.com/2016/07/dirty-tricks-for-moar-fps.html" target="_blank">reorder</a> support. We'll see if I actually find time to implement it before the mesa 17.2 branch point.</span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: large;"><b>Standardized Embedded Nonsense Hacks</b></span></span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Anyone who has dealt with arm (non-server) devices should be familiar with the silly-embedded-nonsense-hacks world. In particular the non-standard boot-chain, which makes it difficult for distros to support the plethora of arm boards (let alone phones/tablets/etc) out there without per-board support. Which was fine in the early days, but with N boards times M distros, it really doesn't scale.</span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Thanks to work by Mateusz Kulikowski, we now have u-boot support for dragonboard 410c. It's been on my TODO list to play with for a while. But more recently I realized that u-boot, thanks to the work of many <a href="https://www.suse.com/docrep/documents/a1f0ledpbe/UEFI%20on%20Top%20of%20U-Boot.pdf" target="_blank">others</a>, can provide enough of the EFI runtime-services interface for grub to work. This means that it is a path forward for standardized distros on aarch64 (like fedora and opensuse), which expect UEFI, to boot on boards which don't otherwise have UEFI firmware.</span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So I decided to spend a bit of time pretending to be a <a href="https://www.happyassassin.net/2013/05/03/a-day-in-the-life-of-a-firmware-engineer/" target="_blank">crack smoking firmware engineer</a>. (Not literally, of course.. that would be stupid!)</span></div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">After fixing some linker script bugs with u-boot's db410c support vs efi_runtime section, and debugging some issues with grub finding the boot disk with the help of <a href="https://fedoraproject.org/wiki/User:Pjones" target="_blank">Peter Jones</a> (the resident grub/EFI expert who conveniently sits near me), and a couple other misc u-boot fixes, I had a fedora 26 alpha image booting on the db410c.</span><br />
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The next step was figuring out display, so we could have a grub boot menu on screen, like you would expect on a grown-up platform. As it turns out, on most devices, lk (little kernel, ie. what normally loads the kernel+initrd on snapdragon android devices) already supports lighting up the display, since most/all android devices put up the initial splash-screen before the kernel is loaded. Unfortunately this was not the case with the db410c's lk. But Archit (qcom engineer who has contributed a whole lot of drm/msm and other drm patches) pointed me at a different lk branch (among the 100's) which had msm8916 display + adv7533 dsi->hdmi bridge (like what db410c uses). After digging through a convoluted git history, I was able to track down the relevant gpio/i2c/adv7533 patches to port to the lk branch used on db410c.</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">After that, I added support for lk to populate a framebuffer node, using the <a href="https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/display/simple-framebuffer.txt" target="_blank">simple-framebuffer</a> bindings to pass the pre-configured scanout buffer (+dimensions) to u-boot. This plus a new <a href="https://lists.denx.de/pipermail/u-boot/2017-June/296388.html" target="_blank">simplefb</a> video driver for u-boot, enables u-boot to expose display support to grub via the EFI <a href="http://wiki.phoenix.com/wiki/index.php/EFI_GRAPHICS_OUTPUT_PROTOCOL" target="_blank">GOP</a> protocol. (Along the way I had to add 32bpp rgb support to lk since u-boot and grub don't understand packed 24bpp rgb.)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">All this got to the point of:</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/6jePbdpTHRA/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/6jePbdpTHRA?feature=player_embedded" width="320"></iframe></span></div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">This is a fedora image, booting off of usb disk (ie. not just rootfs on usb disk, but also grub/kernel/initrd/dtb). With graphical grub menu to select which kernel to boot, just like you would expect on a PC. The grubaa64.efi here is vanilla distro boot-loader, and from the point of view of the distro image, lk/u-boot is just the platform's firmware which somehow provides the UEFI interface the distro media expects. It is worth pointing out some advantages over a traditional lk->kernel boot chain:</span><br />
<ul style="text-align: left;">
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">booting from USB, network, etc (which lk cannot do)</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">doesn't require kernel packed in custom boot.img partition which is board specific</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">booting installer image (ie. from sd-card or network) </span></li>
</ul>
<span style="font-family: "arial" , "helvetica" , sans-serif;">When the kernel starts, in early boot, it is using efifb, just like it would on a PC. (Ie. so you can see what is going on on-screen before hw specific drm driver kernel module is loaded).</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">There are still a few rough edges. The drm/msm driver and msm clk drivers are a bit surprised when some clks are already enabled when the kernel starts, and the display is already lit up.. now we have a good reason to fix some of those issues. And right now we don't have a good way to load a newer device tree binary (dtb) after a distro kernel update (ie. without updating u-boot, aka "the firmware"). (For simple SoC's maybe a pre-baked dtb for the life of the board is sufficient... I have my doubts about that for SoCs as complex as the various snapdragon's, if for no other reason than that we haven't even figured out how to model all the features of the existing SoCs in devicetree.) One idea is for u-boot to pass to grub the name of the board dtb file to load via EFI variables. I've sent a very early <a href="https://lists.denx.de/pipermail/u-boot/2017-June/296387.html" target="_blank">RFC</a> to add EFI variable support in u-boot. We'll see how this goes; in the meantime there might be more "firmware" upgrades needed than you'd normally expect on a mature platform like x86.</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">For now, my lk + u-boot work is here:</span><br />
<ul style="text-align: left;">
<li><a href="https://github.com/robclark/u-boot/commits/db410c-display"><span style="font-family: "arial" , "helvetica" , sans-serif;">https://github.com/robclark/u-boot/commits/db410c-display</span></a></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;"><a href="https://github.com/robclark/lk/commits/db410c-display" target="_blank">https://github.com/robclark/lk/commits/db410c-display </a></span></li>
</ul>
<span style="font-family: "arial" , "helvetica" , sans-serif;">and prebuilt "firmware" is <a href="https://people.freedesktop.org/~robclark/db410c/" target="_blank">here</a>. For now you will need to edit distro grub.cfg to add 'devicetree' commands to load appropriate dtb since what is included with u-boot.img is a very minimal fdt (ie. just enough for the drivers in u-boot).</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
</div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com4tag:blogger.com,1999:blog-8201318254944513910.post-24318728403922853572016-11-16T07:59:00.000-08:002016-11-16T07:59:28.269-08:00a quick note for users/distros<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">At this point, I haven't pushed a new release tag for xf86-video-freedreno to update to latest xserver ABI. I'm inclined not to. If you are using a modern xserver you probably want to be using xf86-video-modesetting + glamor. It has more features (dri3, xv, etc) and better performance. And GL support on a3xx/a4xx is pretty solid. So distros with a modern xserver might as well drop the xf86-video-freedreno package.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">The one case where xf86-video-freedreno is still useful is bringing up a new generation of adreno, since it can do dri2 with pure-sw fallbacks for all the EXA ops. But if that is what you are doing, I guess you know how to git clone and build.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">The possible alternative is to push a patch that makes xf86-video-freedreno still build, but only probe (with latest xserver ABI) if some "ForceLoad" type option is given in xorg.conf, otherwise fall back to modesetting/glamor. I can't think of a good reason to do this at the moment. But as always, questions/comments/suggestions welcome.</span><br />
<br /></div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com0tag:blogger.com,1999:blog-8201318254944513910.post-83566854008793836792016-07-30T12:12:00.000-07:002016-08-01T05:28:11.164-07:00dirty tricks for moar fps!<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: "arial" , "helvetica" , sans-serif;">This weekend I landed a patchset in mesa to add support for resource shadowing and batch re-ordering in freedreno. What this is, will take a bit of explaining, but the tl;dr: is a nice fps boost in many games/apps.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">But first, a bit of background about <a href="https://en.wikipedia.org/wiki/Tiled_rendering" target="_blank">tiling gpu</a>'s: the basic idea of a tiler is to render N draw calls a tile at a time, with a tile's worth of the "framebuffer state" (ie. each of the MRT color bufs + depth/stencil) resident in an internal tile buffer. The idea is that most of your memory traffic is to/from your color and z/s buffers. So rather than rendering each of your draw calls in its entirety, you split the screen up into tiles and repeat each of the N draws for each tile to fast internal/on-chip memory. This avoids going back to main memory for each of the color and z/s buffer accesses, and enables a tiler to do more with less memory bandwidth. But it means there is never a single point in time at which a given draw in the sequence happens.. ie. draw #1 for tile #2 could happen after draw #2 for tile #1. (Also, that is why GL_TIMESTAMP queries are bonkers for tilers.)</span><br />
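<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">To make the ordering concrete, here is a toy Python model of a tile-pass. It only illustrates the loop nesting; the function name is made up for the example and has nothing to do with the actual driver code.</span><br />
<pre>
```python
# Toy model of a tile-pass: every draw is replayed once per tile.
def tiler_order(num_draws, num_tiles):
    order = []
    for tile in range(1, num_tiles + 1):
        for draw in range(1, num_draws + 1):
            order.append((draw, tile))   # (draw number, tile number)
    return order


order = tiler_order(num_draws=2, num_tiles=2)
print(order)
# draw #2 for tile #1 is executed before draw #1 for tile #2
```
</pre>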
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">For purpose of discussion (and also how things are named in the code, if you look), I will define a tile-pass, ie. rendering of N draws for each tile in succession (or even if multiple tiles are rendered in parallel) as a "batch".</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Unfortunately, many games/apps are not written with tilers in mind. There are a handful of common anti-patterns which force a driver for a tiling gpu to flush the current batch. Examples are unnecessary FBO switches, and texture or UBO uploads mid-batch.</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">For example, with a 1920x1080 r8g8b8a8 render target, with z24s8 depth/stencil buffer, an unnecessary batch flush costs you 16MB of write memory bandwidth, plus another 16MB of read when we later need to pull the data back into the tile buffer. That number can easily get much bigger with games using float16 or float32 (rather than 8 bits per component) intermediate buffers, and/or multiple render targets. Ie. two MRT's with float16 internal-format plus z24s8 z/s would be 40MB write + 40MB read per extra flush.</span><br />
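<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">The arithmetic behind those numbers can be checked with a few lines of Python. This is just a back-of-the-envelope sketch: the helper name is made up, and it assumes the float16 render targets are four-component RGBA (8 bytes per pixel), which the text above does not spell out.</span><br />
<pre>
```python
def flush_cost_bytes(width, height, bytes_per_pixel_per_buffer):
    """Bytes written when every attachment of a batch is flushed to memory."""
    return width * height * sum(bytes_per_pixel_per_buffer)


MB = 1_000_000

# r8g8b8a8 color (4 bytes/pixel) + z24s8 depth/stencil (4 bytes/pixel)
simple = flush_cost_bytes(1920, 1080, [4, 4])

# two float16 RGBA render targets (8 bytes/pixel each, an assumption)
# + z24s8 depth/stencil (4 bytes/pixel)
mrt = flush_cost_bytes(1920, 1080, [8, 8, 4])

print(simple / MB, mrt / MB)
```
</pre>
<span style="font-family: "arial" , "helvetica" , sans-serif;">That works out to roughly 16.6MB and 41.5MB in decimal megabytes (about 15.8 and 39.6 in MiB), in line with the rounded 16MB and 40MB figures, and the same amount again for the read back into the tile buffer.</span><br />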
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So, take the example of a UBO update, at a point where you are not otherwise needing to flush the batch (ie. swapbuffers or FBO switch). A straightforward gl driver for a tiler would need to flush the current batch, so each of the draws before the UBO update would see the old state, and each of the draws after the UBO update would see the new state.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Enter resource shadowing and batch reordering. Two reasonably big (ie. touches a lot of the code) changes in the driver which combine to avoid these extra batch flushes, as much as possible.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Resource shadowing is allocating a new backing GEM buffer object (BO) for the resource (texture/UBO/VBO/etc), and if necessary copying parts of the BO contents to the new buffer (back-blit).</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So for the example of the UBO update, rather than taking the 16MB+16MB (or more) hit of a tile flush, why not just create two versions of the UBO? It might involve copying a few KB's of UBO (ie. whatever was not overwritten by the game), but that is a lot less than 32MB.</span><br />
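<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">A minimal Python sketch of the shadowing idea, using a bytearray as a stand-in for a GEM BO. The <span style="font-family: "courier new" , "courier" , monospace;">Resource</span> class and its <span style="font-family: "courier new" , "courier" , monospace;">busy</span> flag are hypothetical and much simpler than what the driver does; in particular the sketch copies the whole buffer, where a real back-blit would only copy the parts that are not about to be overwritten.</span><br />
<pre>
```python
class Resource:
    """A UBO-like resource whose backing storage can be swapped (shadowed)."""

    def __init__(self, size):
        self.bo = bytearray(size)   # stand-in for a GEM buffer object
        self.busy = False           # True while a pending batch reads self.bo

    def write(self, offset, data):
        if self.busy:
            # Shadow: the pending batch keeps the old BO, and the old
            # contents are copied into a fresh BO (the "back-blit").
            old = self.bo
            self.bo = bytearray(old)
            self.busy = False
        self.bo[offset:offset + len(data)] = data


ubo = Resource(8)
ubo.write(0, b"AAAAAAAA")
pending_batch_view = ubo.bo      # draws already queued reference this BO
ubo.busy = True

ubo.write(0, b"BBBB")            # mid-batch update, no flush needed
print(bytes(pending_batch_view), bytes(ubo.bo))
```
</pre>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The draws queued before the update still see the old contents, and draws after it see the new ones, without flushing the batch in between.</span><br />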
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">But of course, it is not that simple. Was the buffer or texture level mapped with <span style="font-family: "courier new" , "courier" , monospace;">GL_MAP_INVALIDATE_BUFFER_BIT</span> or <span style="font-family: "courier new" , "courier" , monospace;">GL_MAP_INVALIDATE_RANGE_BIT</span>? (Or GL API that implies the equivalent, although fortunately as a gallium driver we don't have to care so much about all the various different GL paths that amount to the same thing for the hw.) For a texture with mipmap levels, we unfortunately don't know at the time where we need to create the new shadow BO whether the next GL calls will <span style="font-family: "courier new" , "courier" , monospace;">glGenerateMipmap()</span> or upload the remaining mipmap levels. So there is a bit of complexity in handling all the cases properly. There may be a few more cases we could handle without falling back to flushing the current batch, but for now we handle all the common cases.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The batch re-ordering component of this allows any potential back-blits from the shadow'd BO to the new BO (when resource shadowing kicks in), to be split out into a separate batch. The resource/dependency tracking between batches and resources (ie. if various batches need to read from a given resource, we need to know that so they can be executed before something writes to the resource) lets us know which order to flush various in-flight batches to achieve correct results. Note that this is partly because we use <span style="font-family: "courier new" , "courier" , monospace;">util_blitter</span>, which turns any internally generated resource-shadowing back-blits into normal draw calls (since we don't have a dedicated blit pipe).. but this approach also handles the unnecessary FBO switch case for free.</span><br />
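<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">The dependency-ordered flush can be sketched in Python as a depth-first walk. This is only an illustration under simplified assumptions: the <span style="font-family: "courier new" , "courier" , monospace;">Batch</span> class and <span style="font-family: "courier new" , "courier" , monospace;">flush()</span> helper are invented for the example, and dependencies are recorded directly between batches rather than derived from per-resource read/write tracking.</span><br />
<pre>
```python
class Batch:
    def __init__(self, name):
        self.name = name
        self.deps = []        # batches that must be flushed before this one
        self.flushed = False


def flush(batch, out):
    """Flush a batch after everything it depends on (depth-first)."""
    if batch.flushed:
        return
    batch.flushed = True      # mark first, so a batch is emitted only once
    for dep in batch.deps:
        flush(dep, out)
    out.append(batch.name)


# A back-blit batch fills the new shadow BO; the main batch samples it.
back_blit = Batch("back-blit")
main = Batch("main")
main.deps.append(back_blit)

submitted = []
flush(main, submitted)
flush(back_blit, submitted)   # already flushed, so this is a no-op
print(submitted)
```
</pre>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Flushing the main batch pulls the back-blit batch in ahead of it, so the submission order is correct no matter which batch the application happens to trigger first.</span><br />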
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">Unfortunately, the batch re-ordering required a bit of an overhaul of how cmdstream buffers are handled, which required changes in all layers of the stack (mesa + libdrm + kernel). The kernel changes are in drm-next for 4.8 and libdrm parts are in the latest libdrm release. And while things will continue to work with a new userspace and old kernel, all these new optimizations will be disabled.</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">(And, while there is a growing number of snapdragon/adreno SBC's and phones/tablets <a href="https://lwn.net/Articles/680109/" target="_blank">getting upstream attention</a>, if you are stuck on a downstream 3.10 kernel, look <a href="https://github.com/freedreno/kernel-msm/commits/ifc6540-4.4.4-drm-v4.8" target="_blank">here</a>.)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">And for now, even with a new enough kernel, reorder support is not enabled by default. There are a couple more piglit tests remaining to investigate, but I'll probably flip it to be enabled by default (if you have a new enough kernel) before the next mesa release branch. Until then, use <span style="font-family: "courier new" , "courier" , monospace;">FD_MESA_DEBUG=reorder</span> (and once the default is switched, that would be <span style="font-family: "courier new" , "courier" , monospace;">FD_MESA_DEBUG=noreorder</span> to disable).</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">I'll cover the implementation and tricks to keep the CPU overhead of all this extra bookkeeping small later (probably at <a href="https://www.x.org/wiki/Events/XDC2016/" target="_blank">XDC2016</a>), since this post is already getting rather long. But the juicy bits: ~30% gain in supertuxkart (new render engine) and ~20% gain in manhattan are the big winners. In general at least a few percent gain in most things I looked at, generally in the 5-10% range.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com4tag:blogger.com,1999:blog-8201318254944513910.post-25652692062150380202016-05-04T11:42:00.000-07:002016-05-04T16:38:32.028-07:00Freedreno (not so) periodic update<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: "arial" , "helvetica" , sans-serif;">Since I seem to be not so good at finding time for blog updates recently, this update probably covers a greater timespan than it should, and some of this is already old news ;-)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Already quite some time ago, but in case you didn't already notice: with the mesa 11.1 release, freedreno now supports up to (desktop) gl3.1 on both a3xx and a4xx (in addition to <a href="http://bloggingthemonkey.blogspot.com/2015/08/freedreno-mesa-110-progress-update.html" target="_blank">gles3</a>). Which is high enough to show up on the front page at <a href="https://people.freedesktop.org/~imirkin/glxinfo/glxinfo.html" target="_blank">glxinfo</a>. (Which, btw, is a useful tool to see exactly which gl/gles extensions are supported by which version of mesa on various different hw.)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">A couple months back, I spent a bit of time starting to look at performance. On master now (so will be in 11.3), we have timestamp and time-elapsed query support for a4xx, and I may expose a few more performance counters (mostly for the benefit of <a href="http://www.phoronix.com/scan.php?page=news_item&px=MTMzNTI" target="_blank">gallium HUD</a>). I still need to add support for a3xx, but already this is useful to help profile. In addition, I've cobbled together a simple <a href="https://github.com/freedreno/envytools/blob/master/rnn/fdperf.c" target="_blank">fdperf</a> cmdline tool:</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<iframe height="480" src="https://showterm.io/a6cbdbd41aadd66346043#" width="640"></iframe>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">I also got around to (finally) implementing <a href="https://github.com/freedreno/freedreno/wiki/Adreno-tiling#optimized-approach" target="_blank">hw binning</a> support for a4xx, which for *some* games can have a pretty big perf boost:</span><br />
<ul style="text-align: left;">
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">glmark2 'refract' bench (an extreme example): 31fps -> 124fps</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">xonotic (med): 44.4fps -> 50.3fps</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">supertuxkart (new render engine): 15fps -> 19fps </span></li>
</ul>
<span style="font-family: "arial" , "helvetica" , sans-serif;">More recently I've started to run the <a href="https://source.android.com/devices/graphics/testing.html" target="_blank">dEQP</a> gles3 tests against freedreno. Initially the results were not too good, but since then I've fixed a couple thousand test cases.. fortunately it was just a few bugs and a couple missing workarounds for hw bugs/limitations (depending on how you look at it) which accounted for the bulk of the fails. Now we are at 98.9% pass (or 99.5% if you don't count the 'skips' against the pass ratio). These fixes have also helped piglit, where we are now up to 98.3% pass. These figures are a4xx, but most of the fixes apply to a3xx as well.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">I've also made some improvements in ir3 (shader compiler for a3xx and later) so the code it generates is starting to be pretty decent. The <a href="https://cgit.freedesktop.org/mesa/mesa/commit/?id=173871dfb988c3e9fb74a8016d2b024619a5d918" target="_blank">immediate->const lowering</a> that I just pushed helps reduce register pressure in a lot of cases. We still need support for spilling, but at least now <a href="https://www.shadertoy.com/" target="_blank">shadertoy</a> (which is some sort of cruel joke against shader compiler writers) isn't a complete horror show:</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjLVDeORFJVMkrWjdZRz4dnqies0biDTTbc2S3TRKDH4ws-FQ0GG3HndJzywIzOVerEdpfot6e1IWjHOVkFP0pXg27rns662moPE0LG4VhLVhU-rf5xS4uKpq7Zw5kOKkdU2ymGp4BODLRX/s1600/flappybird.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjLVDeORFJVMkrWjdZRz4dnqies0biDTTbc2S3TRKDH4ws-FQ0GG3HndJzywIzOVerEdpfot6e1IWjHOVkFP0pXg27rns662moPE0LG4VhLVhU-rf5xS4uKpq7Zw5kOKkdU2ymGp4BODLRX/s320/flappybird.png" width="320" /></a></span></div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">In other cool news, in case you had not already seen: Rob Herring and John Stultz from linaro have been doing some cool work, with Rob getting <a href="https://plus.google.com/+RobHerring/posts/MmwvhwQ6T9x" target="_blank">android running on an upstream kernel plus mesa running on db410c and qemu</a> (with freedreno and virtgl), and John taking all that, and getting it all <a href="https://plus.google.com/111524780435806926688/posts/fkQ1BMjNNcn" target="_blank">running on a nexus7 tablet</a>. (And more recently, getting <a href="https://plus.google.com/111524780435806926688/posts/7gNfnn4tpqe" target="_blank">wifi</a> working as well.) I had the opportunity to see this in person when I was at Linaro Connect in March. It might not seem impressive if you are unfamiliar with the extent to which android device kernels diverge from upstream, but to see an upstream kernel running on an actual device with only ~50 patches is quite a feat:</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgef9vf_sB29iz4lIMLJEY5NjzUqRkrd9kCN459psQANDTxAglpNiOF6BRZf1tkHNAKW1IVfKfYkhjQfbr_QPPLMjmmVjoOu9x4DAOF1KhjzGzkkpuzhhYXJhFqXnBb9O_yzfX8Zp-GsyWR/s1600/IMG_20160311_114648.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgef9vf_sB29iz4lIMLJEY5NjzUqRkrd9kCN459psQANDTxAglpNiOF6BRZf1tkHNAKW1IVfKfYkhjQfbr_QPPLMjmmVjoOu9x4DAOF1KhjzGzkkpuzhhYXJhFqXnBb9O_yzfX8Zp-GsyWR/s320/IMG_20160311_114648.jpg" width="240" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"></span></div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The UI was actually reasonably fast, despite not yet using overlays to bypass GPU for composition. But as ongoing work in drm/kms for <a href="https://lists.freedesktop.org/archives/dri-devel/2016-April/105859.html" target="_blank">explicit fencing</a>, and mesa <a href="https://lists.freedesktop.org/archives/mesa-dev/2016-April/111574.html" target="_blank">EGL_ANDROID_native_fence_sync</a> land, we should be able to get hw composition working.</span><br />
<br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com5tag:blogger.com,1999:blog-8201318254944513910.post-81763287375490330172015-08-15T13:21:00.001-07:002015-08-15T15:15:00.750-07:00freedreno - mesa 11.0 progress update, OpenGLES3 and more<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">So the big news for the upcoming mesa 11.0 release is gl4.x support for radeon and nouveau. Which has been in the works for a long time, and a pretty tremendous milestone (and the reason that the next mesa release is 11.0 rather than 10.7). But on the freedreno side of things, we haven't been sitting still either. In fact, with the transform-feedback support I landed a couple weeks ago (for a3xx+a4xx), plus MRT+z32s8 support for a4xx (Ilia landed the a3xx parts of those a while back), we now support OpenGLES 3.0[1] on both adreno 3xx and 4xx!!</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">In addition, with the TBO support that landed a few days ago, plus a handful of other fixes in the last few days, we have the new <a href="http://supertuxkart.sourceforge.net/Antarctica:_Technical_Details">antarctica</a> gl3.1 render engine for supertuxkart working!</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Arial,Helvetica,sans-serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKMB0Fhmihyphenhyphenf8OgpMDZqWzzV9UohDYDJk8_9kYxE9nQowLNYa6D9RFHDWPAt2uJM6mDA6KTXOo9xJyqFW8vZqQlxqNcV_Xe8XrqI6BtEa1D-Xz_CFz-VLkF80O0ZraecIyLjrwUI53eroz/s1600/stk-cocoa_temple-4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKMB0Fhmihyphenhyphenf8OgpMDZqWzzV9UohDYDJk8_9kYxE9nQowLNYa6D9RFHDWPAt2uJM6mDA6KTXOo9xJyqFW8vZqQlxqNcV_Xe8XrqI6BtEa1D-Xz_CFz-VLkF80O0ZraecIyLjrwUI53eroz/s320/stk-cocoa_temple-4.png" width="320" /></a></span></div>
<span style="font-family: Arial,Helvetica,sans-serif;">Note that you need to use <span style="font-family: "Courier New",Courier,monospace;">MESA_GL_VERSION_OVERRIDE=3.1</span> and <span style="font-family: "Courier New",Courier,monospace;">MESA_GLSL_VERSION_OVERRIDE=140</span>, since while we support everything that stk needs, we don't yet support everything needed to advertise gl3.1. (But hey, according to qualcomm, adreno 3xx doesn't even support higher than gles3.0.. I guess we'll have to show them ;-)) </span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">The nice thing about seeing this work is that it utilizes pretty much all of the recent freedreno features (transform feedback, MRT, UBOs, TBOs, etc).</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">Of course, the new render engine is considerably more heavyweight compared to older versions of stk. But I think there is some <a href="https://github.com/supertuxkart/stk-code/issues/2288">low hanging fruit</a> on the stk engine side of things to reclaim some of those lost fps.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">update: oh, and the first time around, I completely forgot to mention that qualcomm has recently published *some* gpu <a href="https://developer.qualcomm.com/hardware/dragonboard-410c/tools">docs</a>, for a3xx, for the dragonboard 410c. Not quite as extensive as what broadcom has published for vc4, but it gives us all the a3xx registers, which is quite a lot more than any other SoC vendor has done to date :-)</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-size: x-small;"><br /></span>
<span style="font-size: x-small;"><span style="font-family: Arial,Helvetica,sans-serif;">[1]
minus MSAA.. There is a bigger task, which is on the TODO list, to teach mesa st about some
extensions to support MSAA resolve on tile->mem.. such as
EXT_multisampled_render_to_texture, plus perhaps a driconf option to
enable it for apps that are not aware, which would make MSAA much more
useful on a tiling gpu. Until then, mesa doesn't check MSAA for gles3,
and if it did we could advertise PIPE_CAP_FAKE_SW_MSAA. Plus, who really cares about MSAA on a 5" 4k screen?</span></span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com2tag:blogger.com,1999:blog-8201318254944513910.post-40711839450847764952015-07-04T10:07:00.000-07:002015-07-04T11:54:13.127-07:00happy (gpu) independence day<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">So, I realized it has been a while since posting about freedreno progress, so in honor of US independence day I figured it was as good an excuse as any for an update about independence from the gpu blob driver for snapdragon/adreno..</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">Back at the end of March 2015, I gave a <a href="https://lwn.net/Articles/638908/">freedreno update</a> <a href="http://people.freedesktop.org/~robclark/freedreno-elc-2015.html">presentation</a> at ELC, listing the following major tasks left for gles3 support:</span><br />
<ul style="text-align: left;">
<li><span style="font-family: Arial,Helvetica,sans-serif;">Uniform Buffer Objects (UBO)</span></li>
<li><span style="font-family: Arial,Helvetica,sans-serif;">Transform Feedback (TF)</span></li>
<li><span style="font-family: Arial,Helvetica,sans-serif;">Multi-Render-Target (MRT)</span></li>
<li><span style="font-family: Arial,Helvetica,sans-serif;">advanced flow control in shader compiler</span></li>
</ul>
<span style="font-family: Arial,Helvetica,sans-serif;"> and additionally for gl3: </span><br />
<div>
<ul style="text-align: left;">
<li><span style="font-family: Arial,Helvetica,sans-serif;">Multisample anti-aliasing (MSAA)</span></li>
<li><span style="font-family: Arial,Helvetica,sans-serif;">NV_conditional_render</span></li>
<li><span style="font-family: Arial,Helvetica,sans-serif;">32b depth (z32 and z32_s8) (which I forgot to mention in the presentation) </span></li>
</ul>
<span style="font-family: Arial,Helvetica,sans-serif;">EDIT: Ilia pointed out that 32b depth is needed for gles3 too, and gl3 additionally needs clipdist/etc (which we'll have to emulate, but hopefully can do in a generic nir pass) and rgtc (which will need sw decompression hopefully in mesa core so other drivers for gles class hw can reuse). Original list was based on what mesa's compute_version() code was checking quite some time back.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"> </span><br />
<span style="font-family: Arial,Helvetica,sans-serif;">Since then, we've gained support for UBOs (a3xx by Ilia Mirkin, and a4xx), MRT (for a3xx and core, again thanks to Ilia.. still needs to be wired up for a4xx), 32b depth (a3xx and core, again thanks to Ilia), and I've finished up shader compiler for loops/flow-control for ir3 (a3xx/a4xx). The shader compiler work was a somewhat larger task than I expected (and I did expect it to be a lot of work), but it also involved moving over to <a href="http://www.jlekstrand.net/jason/projects/mesa/nir-notes/">NIR</a>, in addition to re-writing the scheduler and register allocation passes, as well as a lot of re-org to ir3 in order to support multiple basic blocks. The move to NIR was not strictly required, but it brings a lot of benefits in the form of shared support for conversion to SSA, scalarizing, CSE, DCE, constant folding, and algebraic optimizations. And I figured it was less work in the long run to move to NIR first and drop the TGSI frontend, before doing all the refactoring needed to support loops and non-lowerable flow-control. Incidentally, the compiler work should make the shader-compiler part of TF easier (since we need to generate a conditional write to TF buffer iff not overwriting past the end of the TF buffer).</span></div>
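<span style="font-family: Arial,Helvetica,sans-serif;">To make the "conditional write iff not past the end" part concrete, here is a minimal sketch of the bounds check, written in plain Python rather than as ir3 instructions. The function name and the byte-level layout are assumptions for illustration only; the real thing is code the shader compiler emits into the vertex shader.</span><br />

```python
def tf_write_vertex(tf_buffer, vertex_index, stride, payload):
    """Write one vertex's captured outputs, unless that would overrun the buffer.

    Returns True if the write happened, False if it was skipped.
    """
    start = vertex_index * stride
    end = start + len(payload)
    if end > len(tf_buffer):
        # Past the end of the TF buffer: skip the write instead of overflowing.
        return False
    tf_buffer[start:end] = payload
    return True


# Example: a 32-byte buffer holds two 16-byte vertices; the third is dropped.
buf = bytearray(32)
results = [tf_write_vertex(buf, i, 16, bytes([i + 1]) * 16) for i in range(3)]
print(results)  # prints [True, True, False]
```

<span style="font-family: Arial,Helvetica,sans-serif;">The point is that the check has to happen per write, which is why it wants real flow-control support in the compiler.</span><br />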
<div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial,Helvetica,sans-serif;">In the meantime, freedreno and drm/msm have also gained support for the a306 gpu found in the new <a href="https://www.96boards.org/products/ce/dragonboard410c/">dragonboard 410c</a>. This board is a nice new low cost ($75) snapdragon community board based on the 64bit snapdragon 410. And thanks to a lot of work by linaro and qualcomm, the upstream kernel situation for this board is looking pretty good. It is shipping initially with a <a href="https://git.linaro.org/?p=landing-teams/working/qualcomm/kernel.git;a=shortlog;h=refs/heads/release/qcomlt-4.0">4.0 based kernel</a> (with patches on top for stuff that hadn't yet been merged for 4.0, including a lot of stuff backported from 4.1 and 4.2), including gpu/display/audio/video-codec/etc. I believe that the 4.1 kernel was the first version where a vanilla kernel could boot on db410c with basic stuff (like serial console) working. The kernel support for the gpu and display (other than the adv7533 hdmi bridge chip) landed in 4.2. There is still more work to get *everything* (including audio, vidc, etc) merged upstream, but work continues in that direction, making this quite an exciting board.</span></div>
<div>
</div>
<div>
<span style="font-family: Arial,Helvetica,sans-serif;">Also, we have a GSoC student, Varad, working on <a href="https://varadgautam.wordpress.com/2015/06/27/initial-support-for-freedreno-on-android/">freedreno support for android</a>. It is still in early stages, with some debugging still to do, but he has made a lot of progress and things are starting to work.</span></div>
<div>
</div>
<div>
<span style="font-family: Arial,Helvetica,sans-serif;">And since no blog post is complete without some nice screenshots... the other day someone pointed me at a post in the dolphin forums about <a href="https://forums.dolphin-emu.org/Thread-adreno-420-speed-is-amazing-but?pid=376831#pid376831">how dolphin was running on a420</a> (same device as in the <a href="http://www.inforcecomputing.com/products/single-board-computers/6540-single-board-computer-sbc">ifc6540</a>). We all had a good laugh about the rendering issues with the blob driver. But, since dolphin was the first gl3 game that worked with freedreno, I was curious how freedreno would do.. so I fired up the ifc6540 and replayed some dolphin fifo logs that would let me render approximately the same scenes:</span></div>
<div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Arial,Helvetica,sans-serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmq2-NeexUypChZeHJCrfKGXiM7g7sE_4RKirHNtwC1KeGXLN88Pg2oQuRqQ_1Zxs7b5NSvq1g_Jw6BJ2czRBq48wVBwAmFXWm13HlVCNyeSHpnDeNjHmj6mNQjgTDkPwcpeoKNqVLw83b/s1600/SSBM-Yoshi.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmq2-NeexUypChZeHJCrfKGXiM7g7sE_4RKirHNtwC1KeGXLN88Pg2oQuRqQ_1Zxs7b5NSvq1g_Jw6BJ2czRBq48wVBwAmFXWm13HlVCNyeSHpnDeNjHmj6mNQjgTDkPwcpeoKNqVLw83b/s320/SSBM-Yoshi.png" width="320" /></a></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Arial,Helvetica,sans-serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjft6qnCPvoK-aA5F0TfcWWr98qAD4R1cSjuwMv8_Yb3eGfJAy9iwpfzG9VQoXIsUO_ggz3q5PwFiThRAFzyNLp-Ag8LZ4g-te0r6cK8hSTn1BGSHMA7vD_OqfpI6MhoKJanPMa0Uyatwdr/s1600/Title.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjft6qnCPvoK-aA5F0TfcWWr98qAD4R1cSjuwMv8_Yb3eGfJAy9iwpfzG9VQoXIsUO_ggz3q5PwFiThRAFzyNLp-Ag8LZ4g-te0r6cK8hSTn1BGSHMA7vD_OqfpI6MhoKJanPMa0Uyatwdr/s320/Title.png" width="320" /></a></span></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Arial,Helvetica,sans-serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirXbfFidNHjc35jLwFEge9KW7VfgHma1aSq66F8qLMOf2Y_sbt_jnCRBbReQdfh70_xahfZRXLKfWmod6Z5Jfz0CSYAl59MRVLw5IKQ7YAsnQAHW4bKD1jnL7Erx51WApsIvvdSz6NCZGD/s1600/CreationScreen.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirXbfFidNHjc35jLwFEge9KW7VfgHma1aSq66F8qLMOf2Y_sbt_jnCRBbReQdfh70_xahfZRXLKfWmod6Z5Jfz0CSYAl59MRVLw5IKQ7YAsnQAHW4bKD1jnL7Erx51WApsIvvdSz6NCZGD/s320/CreationScreen.png" width="320" /></a></span></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Arial,Helvetica,sans-serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2JMyTv6z3904zn09XmBOGIPNEsHdEaKNxJNaEsfCyTprdgvHBx25wsNNlO1rC7IK_XHqilgQto43L87EbTHotGxFRBjNxTGOZt_u9_DXUUKoKbLV7_OZZdBci5To0-ZMIEpdunGAtrXCD/s1600/Ingame.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2JMyTv6z3904zn09XmBOGIPNEsHdEaKNxJNaEsfCyTprdgvHBx25wsNNlO1rC7IK_XHqilgQto43L87EbTHotGxFRBjNxTGOZt_u9_DXUUKoKbLV7_OZZdBci5To0-ZMIEpdunGAtrXCD/s320/Ingame.png" width="320" /></a></span></div>
<div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
<div>
</div>
<div>
<span style="font-family: Arial,Helvetica,sans-serif;">Yoshi looks to be rendering pretty well.. digimon has a bit of corruption, but nowhere near as bad as the blob driver. I suspect the issue with digimon is an instruction scheduling issue in the shader compiler (well, no rest for the gpu driver writers), but nice to see that it is already in pretty good shape.</span></div>
<div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial,Helvetica,sans-serif;">Now we just need steam store or some unigine demos for arm linux :-P</span></div>
<div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
<div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
<div>
</div>
</div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com0tag:blogger.com,1999:blog-8201318254944513910.post-23056195257142372732015-02-22T07:41:00.003-08:002015-02-22T07:41:56.345-08:00a4xx shaping up<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">So, I finally figured out the bug that was causing some incorrect rendering in xonotic (and, it turns out, the same bug was plaguing a lot of other games/webgl/etc). The fix is pushed to upstream mesa master (and I guess I should probably push it to the 10.5 stable branch too). Now that xonotic renders correctly, I think I can finally call freedreno a4xx support usable:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/ZGWplbPCOmE/0.jpg" src="http://www.youtube.com/embed/ZGWplbPCOmE?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">Also, for fun, a little comparison between the ifc6540 board (snapdragon 805, aka apq8084), and my laptop (i5-4310U). Both have 1920x1080 resolution, both running gnome-shell and firefox (with identical settings). The laptop is fedora f21 while the ifc6540 is rawhide, but it is quite close to an apples-to-apples comparison:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/c7sHPU6L79k/0.jpg" src="http://www.youtube.com/embed/c7sHPU6L79k?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">Obviously not a rigorous benchmark, so please don't read too much into the results. The intel is still faster overall (as it should be at its size/price/power budget), but amazing that the gap is becoming so small between something that can be squeezed into a cell phone and dedicated laptop class chips.</span><br />
<br /></div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com12tag:blogger.com,1999:blog-8201318254944513910.post-15906407404937558052014-12-20T12:03:00.001-08:002014-12-20T12:03:18.167-08:00a4xx in the holiday spirit<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">Just in time for the upcoming break, we have figured out how to do alpha-test, and now supertuxkart is rendering properly:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRVRrrfQfY7nhbUTQO8ADqoWDpom-_5ZQkn-MAHwFTfT-6JEEty6U9gpUNhH1OapPVk9_CXezPVJtlb6hE5KEcLN1Ml8aaqwhyiTdQfzexIayF_i05ZyuD1nF7Hsfv7ItyGVdzoKVhZr7f/s1600/stk-a4xx.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRVRrrfQfY7nhbUTQO8ADqoWDpom-_5ZQkn-MAHwFTfT-6JEEty6U9gpUNhH1OapPVk9_CXezPVJtlb6hE5KEcLN1Ml8aaqwhyiTdQfzexIayF_i05ZyuD1nF7Hsfv7ItyGVdzoKVhZr7f/s1600/stk-a4xx.png" height="180" width="320" /></a></div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">If you are wondering about the new stk beta, I have a build from a few weeks back which seems to render properly as well.. a few rough edges, but I think that is just from using a random git commit-id for stk. But we don't have enough gl3 features yet (on a3xx or a4xx) to be using the new rendering paths.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">And gnome-shell works nicely too. Still some rendering issues with xonotic. And a little ways behind a3xx in piglit results, but not quite as much as I would have expected at this early stage.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">Still missing are some optimizations that are important for certain use-cases (hw-binning support for games, GMEM bypass for UI/mipmap-generation/etc). But the a420 in apq8084 (ifc6540 board) is surprisingly fast all the same.</span><br />
</div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com2tag:blogger.com,1999:blog-8201318254944513910.post-61188216627438572922014-11-15T07:45:00.000-08:002014-11-15T07:49:11.113-08:00freedreno a4xx<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">A couple weeks ago, qualcomm (quic) surprised some by <a href="http://www.phoronix.com/scan.php?page=news_item&px=MTgyNzM">sending kernel patches</a> to enable the new adreno 4xx family of GPUs found in their latest SoCs. Such as the apq8084 powering my <a href="http://inforcecomputing.com/blog/?p=279">ifc6540</a> board with the a420 GPU. Note that qualcomm had already sent patches to enable display support for apq8084, merged in 3.17. And I'm looking forward to more good things from their upstream efforts in the future.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">So in the last weeks, in between various other kernel work (<a href="http://blog.ffwll.ch/2014/11/atomic-modeset-support-for-kms-drivers.html">atomic-helper</a> conversion and few other misc things for 3.19) and RHEL stuff, I've managed to bang out <a href="http://cgit.freedesktop.org/mesa/mesa/commit/?id=61c68b69d704b5faa5ff9d2b73b24bebf7e19412">initial gallium support for a4xx</a>. There are still plenty of missing things, or stuff hard-coded, etc. But yesterday I managed to get textures working, and fix RGBA/BGRA confusion, so now enough works for 'gears and maybe about half of glmark2:</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQKAkIe2np0cp9G8vhLJ4dX_O3gCbK8xCvFHz3JvZZELh3cOoBDe7rxDv7_aqeO_SvarLffpsjJVym6veB89-tOS4JbYmU2L-LkNwXwrq46-4CFJBCj5oM-kjj_i7C59cyKq7VSvHh-wcB/s1600/gears.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQKAkIe2np0cp9G8vhLJ4dX_O3gCbK8xCvFHz3JvZZELh3cOoBDe7rxDv7_aqeO_SvarLffpsjJVym6veB89-tOS4JbYmU2L-LkNwXwrq46-4CFJBCj5oM-kjj_i7C59cyKq7VSvHh-wcB/s1600/gears.png" height="200" width="185" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnQTljtiXeKF9kWIEkZbagFXSRyR1qA9gyhZI_Z_W5S9Ua2IJBvjsEdC_UwC0WzUJ8qbL4GppTXaHcjpG6m4nSR21Oez8KowdYx1M_jp4XufO4eCMzmic_ZacCfS4qdmystx-zv2uM_uKL/s1600/build.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnQTljtiXeKF9kWIEkZbagFXSRyR1qA9gyhZI_Z_W5S9Ua2IJBvjsEdC_UwC0WzUJ8qbL4GppTXaHcjpG6m4nSR21Oez8KowdYx1M_jp4XufO4eCMzmic_ZacCfS4qdmystx-zv2uM_uKL/s1600/build.png" height="155" width="200" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNyxerdg-s17wta89HszI90wwx_ZdXrZHVQGgiszVsU7VYzG-kp1-BenNjFdYKzWbiIQ-xURGYQmepYCZGDvSaUVOmx_G36hvX1HEDpdBWWLmATZ-FiOYEiWKZbz4aeF3bbGddcNDSdtD4/s1600/bump.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNyxerdg-s17wta89HszI90wwx_ZdXrZHVQGgiszVsU7VYzG-kp1-BenNjFdYKzWbiIQ-xURGYQmepYCZGDvSaUVOmx_G36hvX1HEDpdBWWLmATZ-FiOYEiWKZbz4aeF3bbGddcNDSdtD4/s1600/bump.png" height="155" width="200" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhEhsGXnWytMWqVtf1aYg0e7r0BoWO3J2VUHiQkmpoITCWMhxXnW2koGLpSvABcbVFnE-C6ViUzty4pwbbp2sMvH2HZFY-JUvVMEOOttwuwKqgnOSOkTyww05yh4F_yQHXfk3KevMk_5YTc/s1600/shading.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhEhsGXnWytMWqVtf1aYg0e7r0BoWO3J2VUHiQkmpoITCWMhxXnW2koGLpSvABcbVFnE-C6ViUzty4pwbbp2sMvH2HZFY-JUvVMEOOttwuwKqgnOSOkTyww05yh4F_yQHXfk3KevMk_5YTc/s1600/shading.png" height="155" width="200" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgoNq9oWLnPvv8uWNBFnSucVvmJu0SpwVhrBxKrageIeSwNG4VhpUWA-Z0TAGLozVvR32z4-I60UxXBieF2DKOav9Hd7U-_9csn6DcDmnm6tL0qc-EIvqfbaS_XenWZEuTbfo5oTGo_1o1D/s1600/texture.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgoNq9oWLnPvv8uWNBFnSucVvmJu0SpwVhrBxKrageIeSwNG4VhpUWA-Z0TAGLozVvR32z4-I60UxXBieF2DKOav9Hd7U-_9csn6DcDmnm6tL0qc-EIvqfbaS_XenWZEuTbfo5oTGo_1o1D/s1600/texture.png" height="155" width="200" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1l9J5j-0wgv2HvNktTMB6A6RBb77cfx6t-LGA569kiaTa1vgWHAu5y7HhWHTVR9pWML6TJXyBubbIGP_rCaWsVf8qK2vAVxN5h4dzNIzPh2H5EHxRnpwQJlvJwIhfL0KlEkKAl54QuXce/s1600/buffer.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1l9J5j-0wgv2HvNktTMB6A6RBb77cfx6t-LGA569kiaTa1vgWHAu5y7HhWHTVR9pWML6TJXyBubbIGP_rCaWsVf8qK2vAVxN5h4dzNIzPh2H5EHxRnpwQJlvJwIhfL0KlEkKAl54QuXce/s1600/buffer.png" height="155" width="200" /></a></div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">I've intentionally pushed it (just now) after the mesa 10.4 branch point, since it isn't quite ready to be enabled by default in distro mesa builds. When it gets to the point of at least being able to run a desktop environment (gnome-shell / compiz / etc), I may backport to 10.4. But there is still a lot of work to do. The good news is that so far it seems quite fast (and that is without hw binning or XA yet even!)</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com2tag:blogger.com,1999:blog-8201318254944513910.post-42026313703057252992014-10-13T08:01:00.000-07:002014-10-13T08:01:06.893-07:00Silly r/e tool nonsense hacks<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">In the process of reverse engineering work for freedreno, I've cobbled together some interesting <a href="https://github.com/freedreno/freedreno/wiki/Reverse-engineering-tools" target="_blank">tools</a>. The earliest and most useful of which is <a href="https://github.com/freedreno/freedreno/wiki/Reverse-engineering-tools#cffdump" target="_blank">cffdump</a>. (Named after some command-stream dumping debug code in the old kgsl android kernel driver, which originally inspired it.) The cffdump tool knows how to parse out the "toplevel" command-stream stored as an .rd (re-dump) file, finding packets that load state memory, write registers, IB (indirect branch), etc. The .rd file contains snapshots of gpu buffers, in order to chase gpu pointers at decode time. It links in librnn from the nouveau envytools project for the decoding of individual registers, and a few other things. It also calls out to the freedreno disassembler code to show inline disassembly of shaders, decodes vertex and constant (uniform) buffers, etc. And even generates pretty color output (thanks to librnn):</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Arial,Helvetica,sans-serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXj3Wd4nTFhog5UcnhHfzulx9hx0t8uyQ7JKJJOU2ARluLgf9_jQErYTXOudCe40hQ3xWOMlP9PmxHCOpX855aK3V56iSEwjng_ATprEtZ9Vm4D6SXVm4BMTwy1vHm0QzFm0fjIKGTRbxW/s1600/cffdump.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXj3Wd4nTFhog5UcnhHfzulx9hx0t8uyQ7JKJJOU2ARluLgf9_jQErYTXOudCe40hQ3xWOMlP9PmxHCOpX855aK3V56iSEwjng_ATprEtZ9Vm4D6SXVm4BMTwy1vHm0QzFm0fjIKGTRbxW/s1600/cffdump.png" height="167" width="320" /></a></span></div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">A few months back, I added some basic <a href="http://www.lua.org/">lua</a> scripting support to cffdump, mostly to assist in r/e work for adreno a4xx. When invoked with the --script argument, cffdump would load the specified lua script, and call the 'draw' function it defines on each <span style="font-family: "Courier New",Courier,monospace;">CP_DRAW_INDX</span> opcode. The choice of lua was mostly because it seemed fairly easy to integrate with .c code.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">Since then, I've had the thought in the back of my mind that adding script bindings to integrate rnn register decode to lua would be useful for much more. Such as writing a command-stream validator to check for inconsistent programming. There are a number of places where inconsistencies between various register settings and such will result in gpu lockup. The general adreno design philosophy appears to be to not ever dedicate transistors to making the driver writer's life easier... which for a SoC gpu is certainly the right choice, but it doesn't make things any easier for me. Over time, I've discovered many of these rules, but they are mostly all in my head at the moment. And from time to time, when adding new features to the gallium driver, I inadvertently break one or more of the rules and end up wasting time studying cmdstream dumps from the freedreno gallium driver to figure out what I did wrong.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">So, on the way to <a href="http://www.x.org/wiki/Events/XDC2014/" target="_blank">XDC2014</a> I started hacking up support for <a href="https://github.com/freedreno/freedreno/commit/45aabc195e889bba90a905dbbbd5a7e999dfd363" target="_blank">register decoding</a> from lua scripts. It turns out that time in airports and airplanes, where I can't exactly break out an ifc6410 and hdmi monitor to do some driver work, is a good time to catch up on this sort of project. Now I can do nifty things like:</span><br />
<blockquote class="tr_bq">
<span style="background-color: white;"><span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></span>
<span style="font-size: xx-small;"><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;">-- load rnn database file for a320:</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;">r = rnn.init("a320")</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;"></span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;">function start_cmdstream(name)</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;"> io.write("START: " .. name .. 
"\n")</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;">end</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;"></span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;">function draw(primtype, nindx)</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;"> -- simple full register access:</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;"> io.write("GRAS_CL_VPORT_XOFFSET: " .. r.GRAS_CL_VPORT_XOFFSET .. "\n")</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;"> -- access boolean bitfield Z_ENABLE in RB_DEPTH_CONTROL register:</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;"> io.write("RB_DEPTH_CONTROL.Z_ENABLE: " .. tostring(r.RB_DEPTH_CONTROL.Z_ENABLE) .. 
"\n")</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;"> -- access ROP_CONTROL bitfield inside CONTROL register inside RB_MRT[] array:</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;"> io.write("RB_MRT[0].CONTROL.ROP_CODE: " .. r.RB_MRT[0].CONTROL.ROP_CODE .. "\n")</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;">end</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;"></span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;">function end_cmdstream()</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;"> io.write("END\n")</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;">end</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;"></span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;">function finish()</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier 
New",Courier,monospace;"><span style="background-color: white;"> io.write("FINISH\n")</span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"><span style="background-color: white;">end</span></span></span></span><br /><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;"></span></span></blockquote>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span><span style="font-family: Arial,Helvetica,sans-serif;">which will generate output like:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<blockquote class="tr_bq">
<span style="font-size: xx-small;"><span style="font-family: "Courier New",Courier,monospace;">[robclark@thunkpad:~/src/freedreno (master)]$ ./cffdump --script test.lua piglit.rd <br />Reading piglit.rd...<br />START: piglit.rd</span></span><br />
<span style="font-size: xx-small;"><span style="font-family: "Courier New",Courier,monospace;">GRAS_CL_VPORT_XOFFSET: 79.5<br />RB_DEPTH_CONTROL.Z_ENABLE: true<br />RB_MRT[0].CONTROL.ROP_CODE: 12</span> </span></blockquote>
<br /><span style="font-family: Arial,Helvetica,sans-serif;">Currently it should handle all of the rnndb constructs that are used for adreno. Ie. simple registers, arrays of simple registers, arrays of groups of registers, etc. No support for "stripes" yet since those are not used for freedreno.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">At the moment, all the script bindings are in <a href="https://github.com/freedreno/freedreno/blob/45aabc195e889bba90a905dbbbd5a7e999dfd363/util/script.c">freedreno.git/util/script.c</a> but if there is some interest in this from nouveau or anyone else using librnn then it would be a good idea to try to refactor some of this into more generic code in librnn. It would still need a bit of glue from the tool linking librnn to get at the actual register values.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">Still needed are a few more script hooks (such as <span style="font-family: "Courier New",Courier,monospace;">CP_LOAD_STATE</span>) to do everything I need for a validator script. Hopefully I find some time to work on that before the next conference ;-)</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">PS. I hope this post is at least a bit coherent.. I am still a bit jetlagged.. </span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"> </span><br />
<br /><span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com0tag:blogger.com,1999:blog-8201318254944513910.post-27158202267080620512014-10-04T07:08:00.000-07:002014-10-04T07:08:54.660-07:00Freedreno Update<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">A number of people have recently asked what is new with freedreno. It had been a while since posting an update.. and, well, not everyone watches mesa commit logs for fun, or watches #freedreno on freenode, so it seemed like time for another semi-irregular freedreno blog post.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">The tl;dr version: recently it has been a lot of robustness work, bug fixes, and smaller features implemented for piglit, etc. No one big exciting feature this time.. but lots of little things adding up to make freedreno on a3xx more complete and mature.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">And an obligatory screenshot, just because:</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHDXIFgwu2JKFUSf7n95-pVjVSDjyLRL6KV1_hDpaSHxJKockQMBQVvYPLdYG_G3DuPpNSkQXWSQi6hcgF2piDubxMidrtDifA-EciyhQRSSNcFH77h6AO2Z5CZKxad_07G_OQwlG3xUVO/s1600/webgl.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHDXIFgwu2JKFUSf7n95-pVjVSDjyLRL6KV1_hDpaSHxJKockQMBQVvYPLdYG_G3DuPpNSkQXWSQi6hcgF2piDubxMidrtDifA-EciyhQRSSNcFH77h6AO2Z5CZKxad_07G_OQwlG3xUVO/s1600/webgl.png" height="180" width="320" /></a></div>
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">(Yeah, webgl should probably be faster in chrome/chromium.. but it is not packaged for fedora, and the chrome build system was invented by someone who wants to make compiling their src as difficult as possible.)</span><br />
<br /><b><span style="font-family: Arial,Helvetica,sans-serif;">Mesa..</span></b><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">On the mesa/gallium driver front, the big news is that earlier this week we finally achieved a 90% pass ratio for <a href="http://piglit.freedesktop.org/" target="_blank">piglit</a>. (In fact, 90.4%) To put this in perspective, a little over six months ago freedreno was at just 50% pass. Since June, we have added around 600 passing tests. In fact in the last week, an additional ~50 tests are passing, which bumps us up to 91% pass.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">For those who are not familiar with it, piglit is an open source OpenGL test suite. Since the mesa developers are quite good about adding new test cases to piglit whenever adding a new feature/extension to mesa, it is a very comprehensive test suite. The down side, if you could call it that, is that it has a lot more OpenGL tests compared to OpenGLES (at least for GLES &lt; 3.0). So getting the pass ratio up involved implementing (and in some cases emulating) a number of features that the blob ES-only driver does not support. Fortunately enough of the registers and bitfields are known at this point that trial and error with educated guesses (and then seeing which guesses make piglit tests pass) has worked out reasonably well for some features. Other features, like GL_CLAMP and two sided color, we need to emulate in the shader, which was implemented as a TGSI to TGSI pass in order to hopefully be useful for other gallium drivers for GLES class hardware. (And, in fact both of those are things that at least some of the desktop drivers need to emulate as well.)</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">And big thanks to Ilia Mirkin for a lot of advice and some patches for the failing piglits. Ilia has also started sending a lot of patches for the compiler to flesh out integer support, add new instructions (in particular texture sample instructions), and other things that will be needed for GL3/GLES3. In fact as a result of his work, we are already at ~85% pass for GL3 despite missing some bullet-point features!</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<b><span style="font-family: Arial,Helvetica,sans-serif;">DDX..</span></b><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">On the xf86-video-freedreno front, over the last few months we have gained server managed fd's and OutputClass support (so that a sufficiently new xserver can auto-pick the correct driver, like we have had for a long time on desktop/pci systems). And a hot-off-the-presses 1.3.0 release with a handful of robustness fixes. I strongly recommend upgrading.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<b><span style="font-family: Arial,Helvetica,sans-serif;">Kernel..</span></b><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">These last few kernel releases have seen a significant improvement in the state of apq8064/ifc6410 support upstream. As of the 3.17 kernel, the main things missing to work on a pure-upstream[1] kernel are the rpm/rpm-regulators and iommu drivers. The linaro folks have been a big help there. In particular, their <a href="https://git.linaro.org/landing-teams/working/qualcomm/kernel.git/shortlog/refs/heads/integration-linux-qcomlt" target="_blank">integration branch</a>, which consists of latest upstream plus in-flight patches, is significantly easier than tracking all the relevant kernel mailing lists.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">For drm/msm, the last few kernel releases have seen: some basic gpu perf and logging debugfs features, DT support for mdp4 (display controller version in apq8064), LVDS and multi-monitor support for mdp4, and mdp5 v1.3 support from qcom for upcoming devices. And of course bug fixes!</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">[1] Ie. Linus's tree... kernel-msm or AOSP is not upstream, for any android types who were confused about that.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com0tag:blogger.com,1999:blog-8201318254944513910.post-82087494527139056902014-06-23T16:52:00.001-07:002014-06-23T16:54:06.000-07:00Fire in the (root) hole!<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial,Helvetica,sans-serif;">This will, I
think, be the first time blogging about something quite so
retroactively, but for reasons which should be apparent, I could not blog about this little adventure until now. This is the story
of <a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-0972" target="_blank">CVE-2014-0972</a> (</span><span style="font-family: Arial,Helvetica,sans-serif;"><a href="https://www.codeaurora.org/unprivileged-gpu-command-streams-can-change-iommu-page-table-cve-2014-0972" target="_blank">QCIR-2014-00004-1</a>), and (at least part of) how I was able to install fedora on my firetv:</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Arial,Helvetica,sans-serif;"></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span><iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.youtube.com/embed/PEruWaKKviQ?feature=player_embedded' frameborder='0'></iframe></div>
<h4 style="text-align: left;">
<span style="font-family: Arial,Helvetica,sans-serif;"></span><span style="font-family: Arial,Helvetica,sans-serif;">Introduction..</span></h4>
<h4>
<span style="font-family: Arial,Helvetica,sans-serif;"></span></h4>
<span style="font-family: Arial,Helvetica,sans-serif;">Back in April, I bought myself a <a href="http://en.wikipedia.org/wiki/Amazon_Fire_TV" target="_blank">Fire TV</a>, with the thought that it would make a nice fedora xbmc htpc setup, complete with open src drivers, to replace my aging pandaboard. But, of course, as delivered the Fire TV is locked down with no root access.<br /><br />At the same time, there was a feature of the downstream android kernel gpu driver (kgsl), per-context pagetables, which had been on my TODO list for the upstream drm/msm driver for a while now. But, I needed to understand better what kgsl was doing and the interactions with the hardware, in particular the behaviour of the CP (command processor), in order to convince myself that such a feature was safe. People generally frown on introducing root holes in the upstream kernel, and I didn't exactly have documentation about the hardware. So it was time to roll up my sleeves and get some hands-on experience (translation: try to poke and crash the gpu in lots of different ways and try to make sense of the result). </span><br />
<h4 style="text-align: left;">
<span style="font-family: Arial,Helvetica,sans-serif;">Into the rabbit hole..</span></h4>
<h3>
<span style="font-family: Arial,Helvetica,sans-serif;"></span></h3>
<span style="font-family: Arial,Helvetica,sans-serif;">The modern snapdragon SoCs use IOMMUs everywhere. Including the GPU. To implement per-context gpu pagetables, basically all the driver needs to do is to bang a few IOMMU registers to change the pagetable base addr and invalidate the TLB. But this must be done when you are sure the GPU is not still trying to access memory mapped in the old page tables. Since a GPU is a highly asynchronous device, it would be a big performance hit to stall until GPU ringbuffer drains, then reprogram IOMMU, then resume the GPU with commands from the new context. To avoid this performance hit, kgsl maps some of the IOMMU registers into the GPU's virtual address space, and emits commands into the ringbuffer for the CP to write the necessary registers to switch pagetables and invalidate TLB.<br /><br />It was this reprogramming of IOMMU from the GPU itself which I needed to understand better. Anyone who understands GPUs would have the initial reaction that this is extremely dangerous. But kgsl was, it seemed, taking some protections. However, I needed to be sure I properly understood how this worked, to see if there was something that was overlooked.<br /><br />The GPU, in fact, has two hw contexts which it can switch between. Essentially it is in some ways similar to supervisor vs user context on a CPU. The way kgsl uses this is to map the IOMMU registers into the supervisor context, but not user contexts. The ringbuffer is mapped into all the user contexts, plus supervisor context, at the same device virtual address. The idea being that if the ringbuffer is mapped in the same position in all contexts, you can safely context switch from commands in the ringbuffer.<br /><br />To do this, kgsl emits commands for the CP to write a special bit in <span style="font-family: "Courier New",Courier,monospace;">CP_STATE_DEBUG_INDEX</span> to switch to the "supervisor" context.
Then commands to write IOMMU registers, followed by write to <span style="font-family: "Courier New",Courier,monospace;">CP_STATE_DEBUG_INDEX</span> to switch back to user context. (I'm over-simplifying slightly, as there are some barriers needed to account for asynchronous writes.) But userspace constructed commands never execute from the ringbuffer; instead the kernel puts an IB (indirect branch) into the ringbuffer to jump to the userspace constructed cmdstream buffer. This userspace cmdstream buffer is never mapped into supervisor context, or into other users' contexts. So in theory, if userspace tried to write <span style="font-family: "Courier New",Courier,monospace;">CP_STATE_DEBUG_INDEX</span> to switch to supervisor mode (and gain access to the IOMMU registers), the GPU would immediately page fault, since the cmdstream it was in the middle of executing is no longer mapped. Ok, so far, so good.</span><br />
<h4 style="text-align: left;">
<span style="font-family: Arial,Helvetica,sans-serif;">Where it breaks down..</span></h4>
<h4>
<span style="font-family: Arial,Helvetica,sans-serif;"></span></h4>
<span style="font-family: Arial,Helvetica,sans-serif;">From my attempts at switching to supervisor mode from IB1, and deciphering the fault address where the gpu crashed, and iommu register dumps, I could tell that the next few commands after the switch to supervisor mode were executed without problem.. there is some prefetch/pipelining!<br /><br />But much more conveniently, while poking around, I realized that there were a couple pages mapped globally (in supervisor and all user contexts), which were mapped writable in user contexts. I used the so called "setstate" buffer. So I simply had to construct a cmdstream buffer to write the commands I wanted to execute into the setstate buffer, and then do an IB to that buffer and do the supervisor switch in IB2.<br /><br />Ok.. but to do anything useful with this, I'd need a reasonable chunk of physically contiguous pages, at a known physical address.. in particular 16K for first level pagetables and 16K second level pagetables. Fortunately ION comes to the rescue here, with its physically contiguous carveouts at known physical addresses. In this case, allocate from the multimedia pool when there is no video playback, etc, going on. This way ION allocates from the beginning of the carveout pool, a known address.<br /><br />Into this buffer, construct a new set of pagetables, which map whatever physical address you want to read/write (hint, any of kernel lowmem), plus a replacement page for the setstate buffer (since we don't know the original setstate buffer's physical address.. which means we actually have two copies of the commands copied into setstate buffer, one copied via gpu to original setstate page, and one written directly by cpu in the replacement setstate page).</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">The proof of concept that I made simply copied the string "Kilroy was here" into a kernel buffer. But quite easily any random app downloaded from an untrusted source could access any memory, become root, etc. Not the sort of thing you want falling into the wrong hands.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">Once I managed to prove to myself that I understood properly how the hw was working, I wrote up a short report, and submitted it (plus proof of concept) to the qualcomm security team.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">Now that the vulnerability is no longer embargoed, I've made available the proof of concept and report <a href="https://github.com/robclark/kilroy" target="_blank">here</a>.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br />Originally I planned to (once fixes were pushed out, so as to not put someone who did <i>not</i> intend to root their device at risk) release a jailbreak based on this vulnerability. But once <a href="http://towelroot.com/" target="_blank">towelroot</a> was released, there was no longer a need for me to turn this into an actual firetv jailbreak. Which saves me from having to figure out how to make an apk. </span><br />
<h4 style="text-align: left;">
<span style="font-family: Arial,Helvetica,sans-serif;">Parting thoughts..</span></h4>
<h4>
<span style="font-family: Arial,Helvetica,sans-serif;"></span></h4>
<ol style="text-align: left;">
<li><span style="font-family: Arial,Helvetica,sans-serif;">Well, knowledge about physical addresses and contiguous memory in userspace, while it might not be a security problem in and of itself, sure helps turn other theoretical exploits into actual exploits.</span></li>
<li><span style="font-family: Arial,Helvetica,sans-serif;">As far as downstream vendor drivers go, the kgsl driver is actually pretty decent, in terms of code quality, etc. I've seen far worse. Admittedly this was not a trivial hole. But imagine what issues lurk in other downstream gpu/camera/video/etc drivers. Security is often not simple, and I really doubt whether the other downstream drivers are getting a critical look (from good-guys who will report the issue responsibly).</span></li>
<li><span style="font-family: Arial,Helvetica,sans-serif;">I used to think of the whole one-kernel-branch-per-device wild-west ways of android as a bit of a headache. Now I realize it is a security nightmare. An important part of platform security is being able to react quickly when (not if) vulnerabilities are found. In the desktop/server world, CVEs are usually not embargoed for more than a week.. that is all you need, since fortunately we don't need a different kernel for each different make and model of server, laptop, etc. In the mobile device world, it is quite a different story!</span></li>
</ol>
<div style="text-align: left;">
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
</div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com16tag:blogger.com,1999:blog-8201318254944513910.post-81180969565975492822014-05-13T16:08:00.000-07:002014-05-17T05:05:33.335-07:00Freedreno turns gl 2.0 today!<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">I've just pushed to upstream mesa support for occlusion query, which means that freedreno now advertises OpenGL 2.0:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<br />
<blockquote class="tr_bq">
<span style="font-family: "Courier New",Courier,monospace;">OpenGL vendor string: freedreno<br />OpenGL renderer string: Gallium 0.4 on FD320<br />OpenGL version string: 2.0 Mesa 10.3.0-devel (git-00fcf8b)<br />OpenGL shading language version string: 1.20</span></blockquote>
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">Note that this is desktop OpenGL. Freedreno has supported OpenGLES 2.0 for quite a long time now.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">Implementing occlusion query was a bit interesting due to the way the tiling works on adreno. We have to track query results per tile. I've written up a bit of a description about how it works on the wiki: <a href="https://github.com/freedreno/freedreno/wiki/Queries#hardware-queries" target="_blank">Hardware Queries</a></span><br />
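<span style="font-family: Arial,Helvetica,sans-serif;">To illustrate the per-tile bookkeeping, here is a minimal Python sketch (the driver itself is C). The function name, and the idea of snapshotting a running sample counter at the start and end of each tile, are assumptions made for illustration; see the wiki page for how the hardware queries really work:</span><br />

```python
def occlusion_query_result(tile_snapshots):
    """Total samples passed for a query that spans several tiles.

    tile_snapshots holds one (start, end) pair per tile: the value of a
    running sample counter when the query becomes active in that tile and
    when it stops. This layout is an assumption for illustration only.
    """
    return sum(end - start for start, end in tile_snapshots)


# Hypothetical counter snapshots for a frame rendered as four tiles.
snapshots = [(0, 0), (0, 128), (128, 165), (165, 165)]
total = occlusion_query_result(snapshots)
print("samples passed:", total)
print("anything visible:", total > 0)
```

<span style="font-family: Arial,Helvetica,sans-serif;">The point is only that no single tile knows the answer; the result is assembled from every tile the query was active in.</span><br />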
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">Looks like next up is sRGB support which gets us up to GL 2.1. And then the fun begins with work on GL/GLES 3.0 :-)</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">EDIT: turns out sRGB texture support is pretty easy. So now we are GL 2.1. (GL/GLES 3.0 also needs sRGB render target support which is a bit more involved. But that is just one of several <a href="https://github.com/freedreno/freedreno/wiki/TODO" target="_blank">features needed</a> for 3.0). </span><br />
<br />
<br /></div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com5tag:blogger.com,1999:blog-8201318254944513910.post-17553821568010571942014-03-07T05:02:00.002-08:002014-03-07T05:02:20.456-08:00mesa git repo for f20<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">a quick PSA:</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">For those using my prebuilt freedreno binaries for fedora, there is now a much better way. Nicolas Chauvet has created a repo w/ latest mesa which will work with freedreno:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<a href="http://blog.kwizart.fr/post/2014/03/02/163-mesa-10.2-from-git-for-Fedora-20"><span style="font-family: Arial,Helvetica,sans-serif;">http://blog.kwizart.fr/post/2014/03/02/163-mesa-10.2-from-git-for-Fedora-20</span></a><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">Big thanks Nicolas!</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com2tag:blogger.com,1999:blog-8201318254944513910.post-59012692281061415412014-02-05T15:53:00.002-08:002014-02-05T15:53:54.200-08:00freedreno: new compiler<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">Complementing the hw binning support which landed <a href="http://bloggingthemonkey.blogspot.com/2014/01/freedreno-update-new-year-edition.html" target="_blank">earlier this year</a>, and is now enabled by default, I've recently pushed the initial round of new-compiler work to mesa. Initially I was going to keep it on a branch until I had a chance to sort out a better register allocation (RA) algorithm, but the improved instruction scheduling fixed so many bugs that I decided it should be merged in its current form.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">Or explained another way, ever since fedora updated to supertuxkart 0.8.1, about half the tracks had rendering problems and/or triggered gpu hangs. The new compiler fixed all those problems (and more). And I like supertuxkart :-)</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"><b>Background:</b></span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">The original a3xx compiler was more of a simple TGSI translator. It translated each TGSI opcode into a simple sequence of one or more native instructions. There was a fixed (per-shader) mapping between TGSI <span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">INPUT</span></span>, <span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">OUTPUT</span></span>, and <span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">TEMP</span></span> vec4 register files to the native (flat) scalar register file. A not-insignificant part of the code was relatively generic, in concept but not implementation, lowering of TGSI opcodes that relate more closely to old ARB shader instructions, (<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">SCS</span></span> - Sine Cosine, <span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">LIT</span></span> - Light Coefficients, etc) </span><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: Arial,Helvetica,sans-serif;">than the instruction set of any modern GPU</span>.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">The simple TGSI translator approach works fine with simple shader ISAs. It worked ok for a2xx, other than slightly suboptimal register usage. But the problem is that a3xx (and a4xx) is not such a simple instruction set architecture. In particular, the <a href="https://github.com/freedreno/freedreno/wiki/A3xx-shader-instruction-set-architecture#wiki-scheduling" target="_blank">instruction scheduling</a> requires that the compiler be aware of the shader instruction pipeline(s). </span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">This was obvious pretty early on in the reverse engineering stage. But in the early days of the gallium a3xx support, there were too many other things to do... spending the needed time on the compiler then was not really an option. Instead the "use lots of <span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">nop</span></span>'s and hope for the best" strategy was employed.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">And while it worked as a stop-gap solution, it turns out that there are a lot of edge cases where "hope for the best" does not really work out that well in practice. After debugging a number of rendering bugs and piglit failures which all traced back to instruction scheduling problems, it was becoming clear that it was time for a more permanent solution.</span><br />
<br />
<b><span style="font-family: Arial,Helvetica,sans-serif;">In with the new:</span></b><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">The first thing I wanted to do before adding a lot more complexity was to rip out a bunch of code. With that in mind I implemented a generic TGSI lowering pass, to replace about a dozen opcodes with sequences of equivalent simpler instructions. This probably should be made configurable and moved to util; I think most of the lowerings would be useful to other gallium drivers.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">Once the handling of the now unneeded TGSI opcodes was removed, I copied fd3_compiler to fd3_compiler_old. Originally the plan was to remove this before pushing upstream. I just wanted a way to compare the results from the original compiler to the new compiler to help during testing and debugging. But currently shaders with relative addressing need to fall back to the old compiler, so it stays for now.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">The next step was to turn </span><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-size: x-small;"><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;">ir3</span></span></span> (the a3xx IR), which originates from the <a href="https://github.com/freedreno/freedreno/tree/master/fdre-a3xx" target="_blank">fdre-a3xx</a> shader assembler, into something more useful. The approach I settled on (mostly to ease the transition) was to add a few extra "meta-instructions" to hold some additional information which would be needed in later passes, including </span><span style="font-family: Arial,Helvetica,sans-serif;">Φ (Phi) instructions where a result depends on flow control. Plus a few extra instruction and register flags, the important one being <span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">IR3_REG_SSA</span></span>, used for src register nodes to indicate that the register node points to the dependent instruction. Now what used to be the compiler (well, roughly 2/3rds of it) is the front-end. Instead of producing a linear sequence of instructions fed directly to the assembler/codegen, the frontend now generates a graph of instructions, which subsequent passes modify until we have something suitable for codegen. </span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">For each output, we keep the pointer to the instruction which generates that value (at the scalar level), which in turn has the pointer to the instructions generating its srcs/inputs, and so on. As before, the front end is generating sequences of scalar instructions for each (written) component in a TGSI vector instruction. Although now instructions whose result is not used simply have nobody pointing to them so they naturally vanish.</span><br />
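<span style="font-family: Arial,Helvetica,sans-serif;">A rough Python sketch of why unused instructions "naturally vanish" (this is not the mesa code, which is C; the <span style="font-family: "Courier New",Courier,monospace;">Instr</span> class, the helper name and the opcode strings are made up for illustration). Later passes only ever walk the graph starting from the outputs, so anything nobody points at is never visited:</span><br />

```python
class Instr:
    """Hypothetical scalar instruction node; srcs point at producer nodes."""
    def __init__(self, opc, *srcs):
        self.opc = opc
        self.srcs = list(srcs)


def live_instructions(outputs):
    """Collect every instruction reachable from the shader outputs."""
    seen = set()
    order = []

    def visit(instr):
        if id(instr) in seen:
            return
        seen.add(id(instr))
        for src in instr.srcs:
            visit(src)
        order.append(instr)  # producers are listed before their consumers

    for out in outputs:
        visit(out)
    return order


a = Instr("input")
b = Instr("input")
used = Instr("add.f", a, b)
unused = Instr("mul.f", a, b)   # result is never consumed

live = live_instructions([used])
print([i.opc for i in live])
print(unused in live)
```

<span style="font-family: Arial,Helvetica,sans-serif;">No explicit dead-code-elimination step is needed in this model; reachability from the outputs does the job.</span><br />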
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">At the same time, mostly to preserve my sanity while debugging, but partially also to make nifty pictures, I implemented an "ir3 dumper" which would dump out the graph in .dot syntax:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" target="_blank"><img border="0" src="http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot.png" height="20" width="320" /></a></div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">The first pass eliminates some redundant moves (some of which come from the front end, some from TGSI itself). Probably the front end could be a bit more clever about not inserting unneeded moves, but since TGSI has separate <span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">INPUT</span></span>/<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">OUTPUT</span></span>/<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">TEMP</span></span> register files, there will always be some extra moves which need eliminating.</span><br />
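<span style="font-family: Arial,Helvetica,sans-serif;">As a sketch of what such a pass does (in Python rather than the driver's C, with hypothetical names, and assuming a plain <span style="font-family: "Courier New",Courier,monospace;">mov</span> with no type conversion or modifiers, which a real pass would have to check for): every src that points at a mov is re-pointed at whatever the mov was copying.</span><br />

```python
class Instr:
    """Hypothetical scalar instruction node; srcs point at producer nodes."""
    def __init__(self, opc, *srcs):
        self.opc = opc
        self.srcs = list(srcs)


def skip_movs(instr):
    """Follow a chain of plain movs back to the instruction being copied."""
    while instr.opc == "mov" and len(instr.srcs) == 1:
        instr = instr.srcs[0]
    return instr


def eliminate_movs(outputs):
    """Re-point outputs and all srcs past redundant movs; returns new outputs."""
    outputs = [skip_movs(out) for out in outputs]
    seen = set()

    def visit(instr):
        if id(instr) in seen:
            return
        seen.add(id(instr))
        instr.srcs = [skip_movs(src) for src in instr.srcs]
        for src in instr.srcs:
            visit(src)

    for out in outputs:
        visit(out)
    return outputs


x = Instr("input")
copy = Instr("mov", Instr("mov", x))      # e.g. INPUT -> TEMP -> TEMP
add = Instr("add.f", copy, x)
outs = eliminate_movs([Instr("mov", add)])  # TEMP -> OUTPUT
print(outs[0].opc, [s.opc for s in outs[0].srcs])
```

<span style="font-family: Arial,Helvetica,sans-serif;">Once nothing points at the movs any more, they drop out of the graph the same way any other unused instruction does.</span><br />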
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">After that, I calculate a "depth" for each instruction, where the depth is the number of instruction cycles/slots required to compute that value:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> dd(instr, n): depth(instr->src[n]) + delay(instr->src[n], </span></span><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">instr</span></span>)</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> depth(instr): 1 + max(dd(instr, 0), ..., dd(instr, N))</span></span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">where <span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">delay(p,c)</span></span> gives the required number of instruction slots between an instruction which produces a value and an instruction which consumes a value.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
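<span style="font-family: Arial,Helvetica,sans-serif;">The depth formula above can be sketched in a few lines of Python. This is an illustration only: the <span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">Instr</span></span> class is hypothetical, the delay values are made-up numbers rather than real a3xx delay slots, and giving an instruction with no srcs a depth of 1 is an assumption about the base case:</span><br />
<br />
```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class Instr:
    """Hypothetical scalar instruction node."""
    opc: str
    srcs: list = field(default_factory=list)


def delay(producer, consumer):
    """Slots needed between producer and consumer (made-up values for the example)."""
    return 3 if producer.opc in ("mul", "add") else 0


def depth(instr, memo=None):
    """depth(instr) = 1 + max over srcs of (depth(src) + delay(src, instr))."""
    memo = {} if memo is None else memo
    if instr not in memo:
        dd = [depth(src, memo) + delay(src, instr) for src in instr.srcs]
        # Assumption: an instruction with no srcs has depth 1.
        memo[instr] = 1 + max(dd, default=0)
    return memo[instr]


x = Instr("input")
y = Instr("input")
mul = Instr("mul", [x, y])
add = Instr("add", [mul, x])  # add waits on the mul result
```
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">With these made-up delays, the inputs have depth 1, the mul has depth 2, and the add has depth 6 (1 + depth of mul + 3 delay slots). The memo table matters because the same src is often shared by many consumers.</span><br />
<br />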
<span style="font-family: Arial,Helvetica,sans-serif;">The depth is used for scheduling. The short version of how it works is to recursively schedule output instructions with the greatest depth until no more instructions can be scheduled (more delay slots needed). For instructions with multiple inputs/srcs, the unscheduled src instruction with the greatest depth is scheduled first. Once we hit a point where there are some delay slots to fill, we switch to the next deepest output, and so on until the needed delay slots are filled. If there are no instructions that can be scheduled, then we insert <span style="font-family: "Courier New",Courier,monospace;">nop</span>'s.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">Once the graph is scheduled, we have a linear sequence of instructions, at which point we do RA (register assignment). I won't say too much about that now, since it is already a long post and I'll probably change the algorithm. It is worth noting that some register assignment algorithms can coalesce unneeded moves. But moves factor into the scheduling decisions for the a3xx ISA, so I'm not really sure that this is too useful to me.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">The end result: thanks to a combination of removing scalar instructions that calculate unused TGSI vec4 register components, removing unnecessary moves, and scheduling other instructions rather than filling with no-ops everywhere, for non-trivial shaders it is not uncommon to see the new compiler use ~33% of the number of instructions, and half the number of registers.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<b><span style="font-family: Arial,Helvetica,sans-serif;">Testing/Debugging:</span></b><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">Validating compilers is hard. Piglit has a number of tests to exercise relatively specific features. But with games, it isn't always the case that an incorrect shader produces (visually) incorrect results. And visually incorrect results are not always straightforward to trace back to the problem. Ie. games typically have many shaders and many draw calls, so tracking down the problematic draw and its shaders is not always easy.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">So I wrote a very simplistic <a href="https://github.com/freedreno/mesa/commit/f24b351db9b89939df6331894136ecfa95fe4a30" target="_blank">emulator</a> for testing the output of the compiler. I captured the TGSI dumps of all the shaders from various apps (<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">ST_DEBUG=tgsi</span></span>). The test app would assemble the TGSI, feed it into both the old and new compiler, then run the same sets of randomized inputs through the resulting shaders and compare the outputs.</span><br />
<br />
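<span style="font-family: Arial,Helvetica,sans-serif;">The shape of that test loop is roughly the following. This is a sketch in Python under stated assumptions, not the actual test app: <span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">compare_shaders()</span></span> is a hypothetical helper, and the three lambdas merely stand in for "shader compiled by the old compiler", "same shader from the new compiler", and "a miscompiled shader" running in the emulator:</span><br />
<br />
```python
import random


def compare_shaders(run_old, run_new, n_inputs, trials=100, seed=0, tol=1e-5):
    """Feed identical random inputs to both shaders and collect any mismatches."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    mismatches = []
    for _ in range(trials):
        inputs = [rng.uniform(-10.0, 10.0) for _ in range(n_inputs)]
        old_out = run_old(inputs)
        new_out = run_new(inputs)
        same = len(old_out) == len(new_out) and all(
            abs(a - b) <= tol for a, b in zip(old_out, new_out)
        )
        if not same:
            mismatches.append((inputs, old_out, new_out))
    return mismatches


# Stand-ins for emulated shaders (hypothetical, for illustration only)
old_shader = lambda v: [v[0] * v[1] + v[2]]
new_shader = lambda v: [v[2] + v[1] * v[0]]     # reordered, same result
broken_shader = lambda v: [v[0] * v[1] - v[2]]  # simulated compiler bug

ok = compare_shaders(old_shader, new_shader, n_inputs=3)
bad = compare_shaders(old_shader, broken_shader, n_inputs=3)
```
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">Each recorded mismatch keeps the inputs that triggered it, which is the useful part when debugging: the failing case can be replayed against a single shader.</span><br />
<br />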
<span style="font-family: Arial,Helvetica,sans-serif;">There are a few cases where differing output is expected, since the new compiler has slightly more well defined undefined behaviour for shaders that use uninitialized values... to avoid invalid pointers in the graph produced by the front-end, uninitialized values get a '<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">mov Rdst, immed{0.0}</span></span>' instruction. So there are some cases where the resulting shader needs to be manually validated. But in general this let me test (and debug) the new compiler with 100's of shaders in a relatively short amount of time.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<b><span style="font-family: Arial,Helvetica,sans-serif;">Performance:</span></b><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">So the obvious question, what does this all mean in terms of performance? Well, start with the easy results, es2gears[1]:</span><br />
<ul style="text-align: left;">
<li><span style="font-family: Arial,Helvetica,sans-serif;">original compiler: ~435fps</span></li>
<li><span style="font-family: Arial,Helvetica,sans-serif;">new compiler: ~539fps</span></li>
</ul>
<span style="font-family: Arial,Helvetica,sans-serif;">With supertuxkart, the result is a bit easier to show in pictures. Part of the problem is that the tracks that are heavy enough on the GPU to not be purely CPU limited, didn't actually work before with the original compiler. That plus, as far as I know, there is no simple benchmark mode which spits out a number at the end, as with xonotic. So I used the trace points + timechart approach, mentioned in a previous <a href="http://bloggingthemonkey.blogspot.com/2013/09/freedreno-update-moar-fps.html" target="_blank">post</a>.</span><span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> supertuxkart -f --track fortmagma --profile-laps=1</span></span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">I manually took one second long captures, in as close to the same spot as possible (just after light turns green):</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> ./perf timechart record -a -g -o stk-apq8074-opt+bin-1.data sleep 1</span></span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">In this case I was running on an apq8074/a330 device, fwiw. Our starting point is:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://people.freedesktop.org/~robclark/a3xx/stk-apq8074-1.svg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" target="_blank"><img border="0" src="http://people.freedesktop.org/~robclark/a3xx/stk-apq8074-1.png" height="153" width="320" /></a></div>
<div style="text-align: center;">
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
<span style="font-family: Arial,Helvetica,sans-serif;">Then once hw binning is in place, we are starting to look more CPU limited than anything:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://people.freedesktop.org/~robclark/a3xx/stk-apq8074-bin-1.svg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" target="_blank"><img border="0" src="http://people.freedesktop.org/~robclark/a3xx/stk-apq8074-bin-1.png" height="153" width="320" /></a></div>
<div style="text-align: center;">
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
<span style="font-family: Arial,Helvetica,sans-serif;">And with addition of new compiler, the GPU is idle more of the time, but since the GPU is no longer the bottleneck (on the less demanding tracks) there isn't too much change in framerate:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://people.freedesktop.org/~robclark/a3xx/stk-apq8074-opt+bin-1.svg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" target="_blank"><img border="0" src="http://people.freedesktop.org/~robclark/a3xx/stk-apq8074-opt+bin-1.png" height="153" width="320" /></a></div>
<div style="text-align: center;">
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
<span style="font-family: Arial,Helvetica,sans-serif;">Still, it could help power if the GPU can shut off sooner, and other levels which push the GPU harder benefit.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">With binning plus the improved compiler, there should not be any more huge performance gaps compared to the blob compiler. Without linux blob drivers, there is no way to make a real apples to apples comparison, but the remaining things that could be improved should be worth a few percent here and there. Which is a good thing. There are still plenty of missing features and undiscovered bugs, I'm sure. But I'm hopeful that we can at least have things in good shape for a3xx before the first a4xx devices ship ;-)</span><br />
<br />
<br />
-----<br />
[1] <span style="font-family: Arial,Helvetica,sans-serif;">Windowed apps
would benefit somewhat from XA support in DDX, avoiding stall for GPU to
complete before sw blit (memcpy) to front buffer.. but the small
default window size for 'gears means that hw binning does not have much
impact. The remaining figures are for fullscreen 1280x720.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com14tag:blogger.com,1999:blog-8201318254944513910.post-24684449123134893192014-01-08T15:37:00.000-08:002014-01-08T15:37:25.762-08:00freedreno update: new year edition<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">Time for another freedreno update. hw binning support, and fun with gallium HUD.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"><b>Mesa/Gallium:</b></span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">The big news is that <a href="https://github.com/freedreno/freedreno/wiki/Adreno-tiling#optimized-approach" target="_blank">hw binning</a> pass support (for a3xx) is working. This is a pre-pass for all the draws which generates a visibility stream (ie. basically which vertices apply to which tiles) used to speed up the tile rendering step by filtering out non visible vertices for a given tile.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
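<span style="font-family: Arial,Helvetica,sans-serif;">As a rough mental model of what a visibility stream buys you, here is a conceptual sketch in Python. It is not how the hardware encodes or computes the stream: <span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">bin_triangles()</span></span> is a made-up helper that uses a simple (conservative) bounding-box test, and the screen and tile sizes are arbitrary:</span><br />
<br />
```python
import math


def bin_triangles(triangles, width, height, tile_w, tile_h):
    """Map each tile (tx, ty) to the indices of triangles whose bbox touches it."""
    tiles_x = -(-width // tile_w)   # ceiling division
    tiles_y = -(-height // tile_h)
    visibility = {(tx, ty): [] for ty in range(tiles_y) for tx in range(tiles_x)}
    for idx, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Bounding box in tile coordinates, clamped to the screen
        tx0 = max(0, math.floor(min(xs)) // tile_w)
        tx1 = min(tiles_x - 1, math.floor(max(xs)) // tile_w)
        ty0 = max(0, math.floor(min(ys)) // tile_h)
        ty1 = min(tiles_y - 1, math.floor(max(ys)) // tile_h)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                visibility[(tx, ty)].append(idx)
    return visibility


# 64x64 render target split into four 32x32 tiles
triangles = [
    [(2, 2), (10, 2), (2, 10)],       # sits entirely in the top-left tile
    [(20, 40), (50, 40), (20, 60)],   # spans both bottom tiles
]
vis = bin_triangles(triangles, 64, 64, 32, 32)
```
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">When rendering tile (1, 0) in this example, nothing needs to be processed at all, and the top-left tile only sees the first triangle. Without the binning pass, every tile has to process every draw.</span><br />
<br />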
<span style="font-family: Arial,Helvetica,sans-serif;">tl;dr: games or anything with a healthy vertex load (ie. not window managers) are showing a 35-45% fps boost.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">Currently it is not enabled by default. I'd like some time for it to get more testing before it is enabled by default. For now, use the FD_MESA_DEBUG environment variable to enable it, ie:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: "Courier New",Courier,monospace;"> FD_MESA_DEBUG=binning supertuxkart </span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">Also, since I was looking for a way to correlate fps with various other statistics (in particular batches per second vs frames per second), I started playing with the gallium performance monitor HUD (heads-up-display). With the addition of a few driver custom queries, I had what I needed:</span><br />
<a href="http://people.freedesktop.org/~robclark/stk.png" target="_blank"><span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></a>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://people.freedesktop.org/~robclark/stk.png" target="_blank"><img alt="http://people.freedesktop.org/~robclark/stk.png" border="0" src="http://people.freedesktop.org/~robclark/stk.png" height="180" title="" width="320" /></a></div>
<div style="text-align: center;">
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">The driver custom queries:</span><br />
<ul style="text-align: left;">
<li><span style="font-family: "Courier New",Courier,monospace;">draw-calls</span></li>
<li><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;">batches</span> - number of batches per second, sum of <span style="font-family: "Courier New",Courier,monospace;">batches-sysmem</span> plus <span style="font-family: "Courier New",Courier,monospace;">batches-gmem</span></span></li>
<li><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;">batches-gmem</span> - a set of tiles in GMEM rendered, for each tile (optionally) system mem -> gmem (restore), plus N draws, plus gmem -> system mem (resolve); value in batches per second</span></li>
<li><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;">batches-sysmem</span> - draws to system memory (GMEM bypass) per second</span></li>
<li><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;">restores</span> - number of GMEM batches that required restore per second</span></li>
</ul>
<span style="font-family: Arial,Helvetica,sans-serif;">So above screenshot was generated with:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: "Courier New",Courier,monospace;"> export GALLIUM_HUD=cpu0+cpu1+cpu2+cpu3,fps+batches-sysmem+batches-gmem+restores,draw-calls</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> export FD_MESA_DEBUG=binning</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> supertuxkart -s 1280x720 --demo-mode 1</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">The binning and query support are on mesa master.</span><br />
</div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com5tag:blogger.com,1999:blog-8201318254944513910.post-53347446907174656422013-11-24T07:58:00.000-08:002013-11-24T07:58:44.159-08:00freedreno update<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">It's been a while since I've posted an update about the progress of freedreno.. so no major/big headlines, just lots of small stuff.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;"><b>Mesa 10</b></span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">I finally polished up the support for emulating (via index buffer) GL_QUADS and other desktop GL primitives which aren't supported in hardware by adreno. This is needed for gnome-shell and compiz (and probably other compositing window managers using opengl). The u_primconvert utility could be handy in case any of the other upcoming drivers for SoC GPU's need to emulate any GL primitives which are not in GLES. This, plus some other fixes needed for the latest gnome-shell in fedora 20, were merged prior to the mesa 10.0 branch point, meaning that once Mesa 10 trickles into distributions, you should be able to use distro packaged freedreno rather than needing to rebuild mesa from git.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"><b>Piglit</b></span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">Since the last blog post, I've added support for relative addressing (needed by chromium gl rendering, and a bunch of piglit tests), and fixed a whole bunch of little bugs or missing bits. And I've started publishing piglit <a href="http://people.freedesktop.org/~robclark/summary/all_es2/A320/index.html" target="_blank">results</a>. Don't read too much into the absolute numbers: the all_es2 tests from Tom Gall's <a href="https://git.linaro.org/gitweb?p=people/tomgall/piglit.git;a=shortlog;h=refs/heads/gles2-all" target="_blank">gles2-all</a> branch still include a number of bogus tests (ie. shaders with precision specifier issues, etc), so not all the failures are freedreno bugs. But there has been an increase in passes (and no more crashers) over the last few months.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">I do really badly need a better collection of GLES2 tests ;-) </span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"><b>Boards</b></span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">The <a href="http://bloggingthemonkey.blogspot.com/2013/07/freedreno-update-drmkms-and-ifc6410.html" target="_blank">IFC6410</a> is finally shipping out in larger numbers, as more folks in #freedreno are starting to receive their boards. This board has been my primary freedreno dev platform for a while now. If you are looking for a nice small SBC type ARM board with open source graphics, this is a pretty sweet little board. Pico-itx, APQ8064 (1.5GHz quad core krait + adreno 320), 2GiB DDR3, SATA and gigabit-ethernet (hooked up via pci-e, not usb :-)). Only downside is upstream kernel support for APQ8064 is pretty non-existent[1], there is only a downstream msm-3.4 based kernel (see <a href="https://github.com/freedreno/kernel-msm/commits/ifc6410-drm" target="_blank">ifc6410-drm</a> branch).</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">And more recently I received a <a href="http://www.braincorporation.com/portfolio_page/bstem/" target="_blank">bStem</a> board. This board is more targeted at robotics (bunch of sensors, FPGA, and various add on boards for motor/RC control, etc). But it has APQ8060A (1.7GHz dual core krait + adreno 320), and the typical hdmi and usb connectors. I've pushed initial kernel msm drm/kms support to the <a href="https://github.com/freedreno/kernel-msm/commits/bstem-drm" target="_blank">bstem-drm</a> branch. I'm using the same Fedora 20 filesystem that I use with the ifc.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">Notes:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;">[1]
APQ8x74 (aka snapdragon 800) seems to be getting into better shape in
upstream kernel, so hopefully we start seeing APQ8074 versions of some
of these boards at some point.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"><b>Adreno 4xx</b></span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">Last week qualcomm <a href="http://www.engadget.com/2013/11/20/qualcomm-unveils-snapdragon-805-processor-ultra-HD/" target="_blank">announced</a> their first adreno 420 device. We knew this was coming, since support has been starting to show up in qualcomm's downstream android kernel driver (kgsl) in the last few months. It unfortunately doesn't contain nearly as many useful hints as kgsl did for 2xx and 3xx, but it does give us a few register names. And fwiw, more recent versions of the android blob userspace GLES drivers appear to have support for 4xx.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">The recent announcements don't give too many details, but previously leaked specs indicate a DX11 feature-set, and this seems to be backed up by the handful of register names we can see in the downstream kgsl driver (ie. hull/tessellator/domain/geometry shaders, etc).</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">From what I can tell so far, 4xx appears to use the same shader ISA as 3xx (phew!), but pretty much all registers change or at least move, and there are a lot more features in hw. So hopefully it shouldn't take as long to figure out compared to 3xx (which had both a new shader ISA plus register reshuffling).. at least for getting the basics running.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">Since the recent blob drivers have 4xx support, it should be possible to make a reasonable amount of progress on 4xx r/e before we can get our hands on actual devices. Of course, there is still much to do on 3xx, so for the time being 4xx is not a priority. </span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"><b>Mailing List, etc</b></span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br />Since more folks are starting to play with freedreno (on IFC6410 and other devices), the whole email-questions-directly-to-rob thing is starting to look like it might not scale too well in the long run. And, asking questions on IRC doesn't work out too well if you don't have a bip or screen setup to keep your connection alive until someone has a chance to wake up and answer. So now we have a mailing list:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"> <a href="http://lists.freedesktop.org/mailman/listinfo/freedreno">http://lists.freedesktop.org/mailman/listinfo/freedreno</a></span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">That plus steadily improving docs and info on the <a href="https://github.com/freedreno/freedreno/wiki" target="_blank">wiki</a> should hopefully help.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com1tag:blogger.com,1999:blog-8201318254944513910.post-43933372003771802172013-09-14T14:48:00.000-07:002013-09-14T14:48:22.393-07:00freedreno update: moar fps!<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">Now that msm drm/kms kernel driver is merged upstream, I've spent the last few weeks on a bit of a debugging / fixing spree. (Yes, an odd way to start a post about performance/profiling.) I added proper support for mipmaps/cubemaps/etc (multi-slice resources), killed a few gpu lockup bugs, installed a bunch of games and went looking for and fixing rendering issues. I've put together a <a href="https://github.com/freedreno/freedreno/wiki#status" target="_blank">status</a> table on the freedreno wiki.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">In the process, I noticed some games, such as <a href="http://supertuxkart.sourceforge.net/" target="_blank">supertuxkart</a>, which had low fps, also had unusually low gpu utilization (30-50%). Now, a new graphics driver stack will always have lots of room for optimization (which is certainly true of freedreno). The key is to know which optimization to work on first. It does no good to make the shader compiler generate 2x faster shaders (which I think is currently possible) if that is just going to take you from 30-50% utilization to 15-25% utilization at roughly the same fps. So before we get to the fun optimizations, we need to take care of any of the cpu side bottlenecks in the driver.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">Now the linux <a href="https://perf.wiki.kernel.org/" target="_blank">perf tool</a> is pretty nice just for identifying purely cpu bottlenecks. In fact it showed me pretty quickly that the upstream IOMMU framework struggles with gpu type workloads. Mapping/unmapping individual pages is not really the way to do it. On the downstream msm-3.4 based android kernel, we have <span style="font-family: "Courier New",Courier,monospace;">iommu_map_range()</span> and </span><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",Courier,monospace;">iommu_unmap_range()</span></span>[<a href="#fn01a" id="fn01b">1</a>]... using these instead is worth 2-3 fps in xonotic, and probably more in supertuxkart, but we'll come back to that.</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">But perf tool does not really help much with gpu or cpu/gpu interactions, at least not by itself. So, first I added some <a href="https://github.com/freedreno/kernel-msm/commit/e915a428b25642fcd5f0537cf447b6e776df7d7e" target="_blank">trace points</a> in the kernel drm/kms driver.. in particular, I put tracepoints:</span><br />
<ol style="text-align: left;">
<li><span style="font-family: Arial,Helvetica,sans-serif;">tracing the fence # when work is submitted to the gpu, and when we get the completion interrupt.</span></li>
<li><span style="font-family: Arial,Helvetica,sans-serif;">tracing the fence # when cpu waits on a fence and when it finishes waiting</span></li>
<li><span style="font-family: Arial,Helvetica,sans-serif;">and when pageflip is requested and when it completes (after rendering completes and after vsync)</span></li>
</ol>
<span style="font-family: Arial,Helvetica,sans-serif;">And then I <a href="https://github.com/freedreno/kernel-msm/commit/67b267b5b5e584e3c81ecb3ab970492e13df961c" target="_blank">hacked up</a> the perf timechart tool to display gpu information in the timechart, for a nice timeline overview. Currently I have it looking for the msm trace events, but I think that it would be useful to have a small set of generic trace events which all the drm drivers can use, so that tools won't have to be looking for driver specific traces. I think what I have is a reasonable start, but probably needs a bit of work to handle gpu's that have multiple rings, etc.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">With that, I fired up supertuxkart again (in demo mode so it will drive itself), and then <span style="font-family: "Courier New",Courier,monospace;">perf timechart record</span> for a couple seconds to capture a short trace:</span>
<a href="http://people.freedesktop.org/~robclark/perf-supertuxkart-discard.png" target="_blank"><span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
</a><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://people.freedesktop.org/~robclark/perf-supertuxkart.svg" target="_blank"><img border="0" height="203" src="http://people.freedesktop.org/~robclark/perf-supertuxkart.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">You can see above, there is a new bar at the top, below the cpu bars, for the gpu, showing when the gpu is active. And a green overlay bar on the gpu showing where pageflip has been requested (typically right after rendering is submitted), and when pageflip completes (next vblank after rendering completes). And below, in the per-process bars, a yellow overlay marker when the process is pending on a fence (waiting for some gpu rendering to complete).</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">And immediately we can see that the bottleneck is a fence that supertuxkart is stalling on before it is able to submit rendering for the next frame. After a little bit of poking, I realized that I should implement support for <span style="font-family: "Courier New",Courier,monospace;">PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE</span> in the freedreno gallium driver. If this usage bit is set, it is a hint to the gallium driver that the previous buffer contents do not need to be preserved after the upload. So in cases where the backing gem buffer object (bo) is still busy (referenced by previous rendering which is not yet complete), it is better to just delete the bo and create a new one, rather than stalling the cpu. The drm driver holds a ref for bo's that are associated with gpu rendering which has not yet completed, so the pages for the old bo don't go away until the gpu is finished with them.</span><br />
<br />
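<span style="font-family: Arial,Helvetica,sans-serif;">The decision described above boils down to something like the following. This is a sketch in Python with made-up names (<span style="font-family: "Courier New",Courier,monospace;">Bo</span>, <span style="font-family: "Courier New",Courier,monospace;">Resource</span>, <span style="font-family: "Courier New",Courier,monospace;">map_for_write()</span>), not the gallium driver code, and the <span style="font-family: "Courier New",Courier,monospace;">busy</span> flag stands in for "the gpu still references this bo":</span><br />
<br />
```python
class Bo:
    """Hypothetical buffer object; busy means pending gpu work still references it."""
    def __init__(self, size):
        self.size = size
        self.busy = False


class Resource:
    """Hypothetical resource backed by a single bo."""
    def __init__(self, size):
        self.bo = Bo(size)


def map_for_write(rsc, discard_whole_resource, wait_idle):
    """Return a bo the cpu can write to, stalling only when contents must be kept."""
    if rsc.bo.busy:
        if discard_whole_resource:
            # Old contents are not needed: swap in a fresh bo instead of waiting.
            # (In the real stack the kernel's ref keeps the old pages alive
            # until the gpu is done with them.)
            rsc.bo = Bo(rsc.bo.size)
        else:
            wait_idle(rsc.bo)  # cpu stalls until the gpu finishes
    return rsc.bo


stalls = []

def fake_wait(bo):
    """Stand-in for waiting on a fence; just records that a stall happened."""
    stalls.append(bo)
    bo.busy = False


rsc = Resource(4096)
old_bo = rsc.bo
old_bo.busy = True  # previous frame still rendering from it
new_bo = map_for_write(rsc, True, fake_wait)
```
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">With the discard hint the caller gets a different bo and no stall is recorded; without it, the same call has to wait on the busy bo.</span><br />
<br />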
<span style="font-family: Arial,Helvetica,sans-serif;">With this change, things have improved, but there is still a bottleneck:</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://people.freedesktop.org/~robclark/perf-supertuxkart-discard.svg" target="_blank"><img border="0" height="191" src="http://people.freedesktop.org/~robclark/perf-supertuxkart-discard.png" width="400" /></a><span id="goog_599990708"></span><span id="goog_599990709"></span></div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">(note that the timescale differs between these three timecharts, since the capture duration differed)</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">Oddly we see a lot of activity on kworker (workqueue worker thread in the kernel). This is mainly retire_worker, in particular releasing the reference that the driver holds to bo's for rendering which is now completed. After a bit more digging, it turns out that supertuxkart is creating on the order of 150-200 transient buffers per frame. Unref'ing these, unmapping from IOMMU and cpu, and deleting backing pages for that many buffers takes some time. Even with some optimization in the kernel, there is still going to be a lot of overhead in the associated vma setup/teardown (since many of these buffers are used for vertex/attribute upload, and will need to be mmap'd), zeroing out pages before the next allocation, etc.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">So borrowing an idea from i915, I implemented a bo cache in userspace, in libdrm_freedreno. On new allocations, we round up to the next bucket size, and if there is an unused buffer in the bucket cache which is not still busy, we take that buffer instead of allocating a new one. (If I add a BO_FOR_RENDERING flag, like i915, I could take a still-busy gem bo for cases where I know cpu access will not be needed... by the time the gpu starts writing to the buffer, it will be no longer busy.)</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">With this, things look much better:</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://people.freedesktop.org/~robclark/perf-supertuxkart-bocache.svg" target="_blank"><img border="0" height="191" src="http://people.freedesktop.org/~robclark/perf-supertuxkart-bocache.png" width="400" /></a></div>
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">As you can see, the gpu is nearly continuously occupied. And a nice benefit is a drop in cpu utilization. To do this properly, I need to add a MADVISE style ioctl in the msm drm/kms driver, so userspace can advise the kernel that it is keeping a bo around in a cache, and that the kernel is free to free the backing pages under memory pressure, tear down the cpu mapping, etc. This will prevent the wrath of the OOM killer :-)</span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">So now with the bottlenecks in the driver worked out, future work to make the gpu render faster (ie, hw binning pass, shader compiler optimizations, etc) will actually bring a meaningful benefit. </span><br />
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">Notes: </span><br />
<span style="font-family: Arial,Helvetica,sans-serif;">[<a href="#fn01b" id="fn01a">1</a>] just fwiw, the ideal IOMMU API would give me a way to make multiple map/unmap updates without tlb/etc flush. This should be even better than the map/unmap_range variants. I know when I'm submitting rendering jobs which reference the buffers to the GPU, so I have good points for a batch IOMMU update flush.</span><br />
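<span style="font-family: Arial,Helvetica,sans-serif;">A toy model of what I mean, in Python. This is not the Linux IOMMU API; <code>ToyIommu</code>, <code>submit</code>, and the <code>flush</code> argument are made-up names, and the "page table" is just a dict. It only shows the difference between flushing on every update and flushing once at submit time.</span><br />

```python
class ToyIommu:
    """Toy model: a page table dict plus a counter of (expensive) tlb flushes."""

    def __init__(self):
        self.table = {}
        self.flushes = 0
        self.dirty = False

    def map(self, iova, paddr, flush=True):
        self.table[iova] = paddr
        self._changed(flush)

    def unmap(self, iova, flush=True):
        self.table.pop(iova, None)
        self._changed(flush)

    def _changed(self, flush):
        self.dirty = True
        if flush:
            self.flush()

    def flush(self):
        # Only count a flush when there is something pending.
        if self.dirty:
            self.flushes += 1
            self.dirty = False


def submit(iommu, updates):
    """Apply queued (op, iova, paddr) updates, then flush once before the gpu runs."""
    for op, iova, paddr in updates:
        if op == "map":
            iommu.map(iova, paddr, flush=False)
        else:
            iommu.unmap(iova, flush=False)
    iommu.flush()


eager = ToyIommu()
for i in range(4):
    eager.map(i * 4096, i)          # one flush per call: 4 flushes

batched = ToyIommu()
submit(batched, [("map", i * 4096, i) for i in range(4)])   # 1 flush total
```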
</div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com6tag:blogger.com,1999:blog-8201318254944513910.post-50030114480273783912013-08-25T12:59:00.003-07:002013-08-25T12:59:48.424-07:00freedreno wiki update<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">I've spent the morning cleaning up and adding some useful information to the freedreno <a href="https://github.com/freedreno/freedreno/wiki" target="_blank">wiki</a> (such as <a href="https://github.com/freedreno/freedreno/wiki/A3xx-shader-instruction-set-architecture" target="_blank">a3xx shader isa</a>, how <a href="https://github.com/freedreno/freedreno/wiki/Adreno-tiling" target="_blank">tiling</a> works on adreno, and how to use the various <a href="https://github.com/freedreno/freedreno/wiki/Reverse-engineering-tools" target="_blank">tools</a>). So if you want to learn how adreno works and/or start to contribute yourself, now you have no excuse ;-)</span></div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com0tag:blogger.com,1999:blog-8201318254944513910.post-16120422009842674442013-07-30T10:24:00.001-07:002013-07-30T10:24:36.102-07:00freedreno update: drm/kms and ifc6410<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">About a month ago, I received a new ARM dev board, an <a href="http://www.inforcecomputing.com/product/moreinfo/ifc6410.html" target="_blank">IFC6410</a>! Despite the boring-sounding name, it is quite an impressive bit of kit. About $150, quad-core krait, 2G DDR, SATA, gigabit ethernet.. and adreno a320. It is basically the same SoC that is in the nexus4 (or the new nexus7), but in a more convenient form factor for development.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">And it is with this board that I've been developing a new msm drm/kms driver. For a while now, freedreno has been limping along with the msm fbdev and kgsl drivers from the qcom android kernel tree, while I focused on the userspace gallium driver and ddx (xf86-video-freedreno). But that was always a short-term solution.. with the qcom android drivers, I can't really handle synchronization between processes, which gets really crazy w/ x11 and a compositing window manager where you have sharing in both directions (as texture and/or render target), I can't handle page flipping (let alone page flipping synchronized with the GPU), and there are general robustness issues.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">Unfortunately, the msm android fbdev driver code is a real mess (at least the mdp4 parts), even by android / vendor kernel standards, which are pretty low to begin with. And I don't have any docs on the display controller. In the end, I instrumented the code to trace all the register reads/writes, etc, wrote a small parser tool using envytools/librnn, and started writing an rnndb register database for the display controller registers. It was a lot easier to get a general picture of how the hardware works that way! Plus I can generate register level headers from rnndb in the same way I do for the gallium driver.</span><br />
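<span style="font-family: Arial,Helvetica,sans-serif;">To give a flavor of the trace-then-decode approach, here is a small Python sketch. It is not the actual tool (which uses envytools/librnn and the rnndb xml): the one-line-per-access trace format, the <code>REGS</code> table, and the register names in it are placeholders invented for this example, not real mdp4 registers.</span><br />

```python
import re

# Hypothetical register database: offset -> name (placeholders, not real registers).
REGS = {0x0000: "EXAMPLE_CTRL", 0x0004: "EXAMPLE_STATUS"}

# Hypothetical trace line format: OP OFFSET VALUE, e.g. "W 0x0004 0x00000001".
LINE = re.compile(r"^([RW])\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)$")


def decode(line):
    """Turn one raw trace line into a readable register access, or None if unparseable."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    op = m.group(1)
    offset = int(m.group(2), 16)
    value = int(m.group(3), 16)
    # Registers not yet in the database still show up, so they can be added later.
    name = REGS.get(offset, "UNKNOWN_%04x" % offset)
    verb = "read" if op == "R" else "write"
    return "%s %s = 0x%08x" % (verb, name, value)


print(decode("W 0x0004 0x00000001"))   # write EXAMPLE_STATUS = 0x00000001
print(decode("R 0x0010 0xdeadbeef"))   # read UNKNOWN_0010 = 0xdeadbeef
```

<span style="font-family: Arial,Helvetica,sans-serif;">The useful part is the loop this enables: every UNKNOWN that shows up in a decoded trace is a hint about what to document next in the register database.</span><br />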
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">So, earlier in the month, I sent the first round of RFC patches, with just basic KMS support. A couple weeks ago I sent the 2nd round, which added a3xx gpu support and got basic kmscube working. Since then I've fixed a few things, added HW cursor support and more gpu debugfs bits to help when things go wrong. And added kms support in xf86-video-freedreno. And so the 3rd (and hopefully final) RFC round of patches will go out soon. But now, time for some eye candy:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">gnome-shell running on freedreno + msm drm/kms:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.youtube.com/embed/RycBWfvdwoE?feature=player_embedded' frameborder='0'></iframe></div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<br />
<span style="font-family: Arial,Helvetica,sans-serif;">and, now that we have drm/kms support, we can use wayland/weston drm compositor:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiL2GoF3kkr9yiclh1KQJ9Cz-qZAz25iJ5BLO2Paq4mjOHxl3aNpqViO2TEkfZJyvZX_pf3nhBDoTkgBbLn6FudaG-zWqy5ZdA78bzSRzHM3DSNOXalxHPosJcyvjv9Npz8At3rM-jByNMl/s1600/2013-07-30_12-01-05_104.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiL2GoF3kkr9yiclh1KQJ9Cz-qZAz25iJ5BLO2Paq4mjOHxl3aNpqViO2TEkfZJyvZX_pf3nhBDoTkgBbLn6FudaG-zWqy5ZdA78bzSRzHM3DSNOXalxHPosJcyvjv9Npz8At3rM-jByNMl/s320/2013-07-30_12-01-05_104.jpg" width="320" /></a></div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">so, as <a href="https://fedoraproject.org/wiki/Changes/Wayland" target="_blank">gnome-shell as-a wayland compositor</a> work progresses, freedreno should be in good shape for the next generation of linux desktop :-)</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">-----</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">NOTE: If you look on the msm-drm branches in the libdrm and xf86-video-freedreno trees, you'll notice that I've structured things to work on either the current android drivers (with a couple small patches), or on the msm drm/kms driver. This is mainly because it is unlikely that I'll be able to support every random lcd panel on every snapdragon phone/tablet that someone might want to try out freedreno on. Time permitting, I'll eventually add support for the LCD panels on devices I have (HP touchpad, nexus4), and support for some of the older generation adreno gpus.. although patches are certainly welcome ;-)</span></div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com22tag:blogger.com,1999:blog-8201318254944513910.post-65446019847939599962013-06-05T13:38:00.001-07:002013-06-05T13:38:37.813-07:00fedora 19 installer for nexus4<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">As promised in my previous post, now there is an f19 installer for nexus4:</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"><a href="https://github.com/freedreno/nexus4-fedora">https://github.com/freedreno/nexus4-fedora</a></span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">So if you have an n4 and a bit of free space, you can play around with accelerated open-source gpu goodness :-)</span><br />
<br /></div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com15tag:blogger.com,1999:blog-8201318254944513910.post-72098149091864536282013-06-02T16:42:00.001-07:002013-06-02T16:42:38.346-07:00freedreno + gnome-shell on nexus4/a320<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">We don't need no binary blobz :-)</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.youtube.com/embed/xXSOp-4Pyvg?feature=player_embedded' frameborder='0'></iframe></div>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">gallium/freedreno + xf86-video-freedreno using the XA gallium state tracker on fedora F18. Gnome-shell, compiz, xonotic, ioquake all work. Just need to clean up the patches for XA and freedreno a bit more before they are ready for upstream. And hopefully in the next couple days I'll have some time to put together a sort of makeshift installer for anyone else who wants to try.</span></div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com7tag:blogger.com,1999:blog-8201318254944513910.post-17863783176558557292013-05-19T18:06:00.002-07:002013-05-19T18:06:30.283-07:00gallium a3xx es2gears<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Arial,Helvetica,sans-serif;">So, I've been working on the freedreno gallium support for adreno a3xx for the past few weeks or so, and now it is starting to take shape:</span><br />
<div style="text-align: center;">
<br /></div>
<div style="text-align: center;">
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<iframe allowfullscreen="" frameborder="0" height="360" src="http://www.youtube.com/embed/-rIGbkPSzp8" width="480"></iframe>
</div>
<div style="text-align: center;">
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"> </span></div>
<span style="font-family: Arial,Helvetica,sans-serif;">The full video is <a href="http://youtu.be/-rIGbkPSzp8" target="_blank">here</a>.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">The code is on the a3xx branch on github.. still has a ways to go before I'm ready to go upstream with it, but now we're getting into the fun bits :-)</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;">Special thanks to Benjamin Tissoires for getting the touchscreen going for me on the nexus4.</span><br />
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span>
<span style="font-family: Arial,Helvetica,sans-serif;"><br /></span></div>
Robhttp://www.blogger.com/profile/00061851853178706566noreply@blogger.com4