Monday, August 23, 2010

ffvp8 + neon = 720p24

It's been a long time since I had a chance to update the blog, so I thought I'd drop a quick note about something I've been playing with for the last few weekends: the new ffvp8 decoder!

I've started writing the neon dsp functions for the VP8 decoder, as an excuse to learn a bit more about sw video codecs, neon, and VP8. At this point not all of the dsp functions are implemented, but all the ones that matter for the VP8 clips I can find are (loop filter, bicubic MC functions, and some other misc functions). Most of the other major ones, such as the bilinear MC functions, don't seem to be used in the clips that I can find, but should not be too hard to add when I find clips to test with.
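For anyone wondering what a bilinear MC function computes, here is a rough plain-C sketch of the horizontal case. It assumes the usual VP8 scheme of a 2-tap filter with weights in eighths of a pixel; the name `bilinear_h` and its signature are made up for illustration and are not the ffvp8 API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not FFmpeg's API: horizontal bilinear prediction
 * for a w x h block. mx is the sub-pixel offset in eighths (0..7).
 * The caller must make sure src has w + 1 readable pixels per row. */
static void bilinear_h(uint8_t *dst, ptrdiff_t dst_stride,
                       const uint8_t *src, ptrdiff_t src_stride,
                       int w, int h, int mx)
{
    assert(mx >= 0 && mx < 8);
    const int a = 8 - mx, b = mx;   /* the two weights always sum to 8 */

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            /* weighted average of two neighbours, rounded to nearest */
            dst[x] = (uint8_t)((a * src[x] + b * src[x + 1] + 4) >> 3);
        dst += dst_stride;
        src += src_stride;
    }
}
```

With `mx = 4` each output pixel is simply the rounded average of its two horizontal neighbours. A neon version would do the same multiply-accumulate on a whole row of pixels at once.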

The result is some 15-20% faster than libvpx. That is mostly thanks to ffvp8 being more cache friendly than the libvpx decoder and not doing silly things like memcpy of reference frames, rather than to my hard-core neon optimizing skills.. And this is even without ffvp8 being a multi-threaded decoder, which is something that would benefit an SMP cortex-a9 platform like OMAP4 if done properly. It should be possible to get all this a bit faster by spending some time tweaking the instruction order to avoid stalls, and some other tricks like that. (And hopefully I'll learn a few tricks in the process as the patches are reviewed.)

The result so far is here:


Current status is that it is all working, and producing bit-exact output compared to the plain 'C' versions of the DSP functions for all the test clips I have. I'll update again when I add more or when the patches are in upstream ffmpeg.
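A bit-exact check of this kind boils down to feeding the same input to both versions and comparing the output with memcmp. The sketch below shows the idea on a small clamped DC-add; all the names are hypothetical, and the table-based `dc_add_opt` is only a stand-in for a neon routine so that the example builds on any machine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define N 64

/* Reference version: add a DC value to each pixel, clamping to 0..255. */
static void dc_add_ref(uint8_t *dst, int dc, int n)
{
    for (int i = 0; i < n; i++) {
        int v = dst[i] + dc;
        dst[i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
}

/* Stand-in for the optimized version: same result via a clamp table. */
static void dc_add_opt(uint8_t *dst, int dc, int n)
{
    static uint8_t clamp[256 * 3];
    static int init;

    if (!init) {
        for (int i = 0; i < 256 * 3; i++)
            clamp[i] = (uint8_t)(i < 256 ? 0 : i < 512 ? i - 256 : 255);
        init = 1;
    }
    assert(dc >= -255 && dc <= 255);
    for (int i = 0; i < n; i++)
        dst[i] = clamp[256 + dst[i] + dc];
}

/* Returns how many of `runs` random trials produced differing output. */
static int count_mismatches(int runs, unsigned seed)
{
    uint8_t a[N], b[N];
    int bad = 0;

    srand(seed);
    for (int r = 0; r < runs; r++) {
        for (int i = 0; i < N; i++)
            a[i] = b[i] = (uint8_t)(rand() & 0xff);
        int dc = rand() % 511 - 255;    /* -255..255 */
        dc_add_ref(a, dc, N);
        dc_add_opt(b, dc, N);
        if (memcmp(a, b, N) != 0)
            bad++;
    }
    return bad;
}
```

Calling `count_mismatches(1000, 1)` should return 0 if the two versions agree on every trial. Random input is only a sanity check; decoding real clips with both code paths and comparing the frames exercises the functions the way the decoder actually calls them.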

I also have some work-in-progress patches for gst-ffmpeg to avoid a memcpy for codecs that don't support edge emulation, although these depend on rowstride and some of the other related features that we've added to GStreamer for omap4.

3 comments:

  1. IIRC, samples 3, 4 and 7 from the testsuite (http://samples.mplayerhq.hu/fate-suite/vp8-test-vectors-r1/) use the bilin MC filter, but I wouldn't worry about it since it's only used for profile=1 or profile=2, and that's not the default for libvpx encoding (or in other words: you won't find any such clips in the wild, so apart from the academic exercise there really isn't too much of a point).

    Cool work! Will you submit this patch to the FFmpeg team?

  2. Ahh, cool, thx for the links to the clips. Good to know that those won't be found in the wild.. although if those functions were implemented in neon a clip encoded for profile 1 or 2 should be a bit faster I guess.

    Yeah, it is my intention that the patches can be taken in by FFmpeg. I've sent them to mru, who is going to review them, but I guess he's been busy. I guess I should ping him again.

  3. Rock 'n' roll Robbo. Nice work, again!

    Next time I suggest you paint fiery streaks down the sides of your patchset to really let people know you mean business.
