public inbox for linux-kernel@vger.kernel.org
* libva decoding performance regression with kernel 4.0-rc
From: Olivier Crête @ 2015-04-10  1:00 UTC
  To: Brad Volkin
  Cc: Daniel Vetter, Jani Nikula, David Airlie, intel-gfx, dri-devel,
	linux-kernel

Hello,

Using an Atom E3845 board, we had a pretty bad performance regression
when upgrading to 4.0-rc6 from 3.19. With the help of git bisect, I
traced it back to commit 78a42377. Reverting this commit and subsequent
related commits (b9ffd80, 71745376, etc) fixes the performance
regression for me.
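
A minimal sketch of how one bisect step could be classified, assuming the
v3.19 and v4.0-rc6 tags as endpoints and a hypothetical 2.0 s threshold on
`sys` time, picked to fall between the two single-stream figures reported
further down. The `verdict` helper is illustrative and was not part of the
original report.

```shell
#!/bin/sh
# Classify one benchmark run as "good" or "bad" for a manual bisect step.
# $1 = measured sys time in seconds, $2 = threshold in seconds (an assumption).
verdict() {
    awk -v s="$1" -v t="$2" 'BEGIN { if (s + 0 > t + 0) print "bad"; else print "good" }'
}

# In a kernel tree the bisect itself would look roughly like this (not run here):
#   git bisect start
#   git bisect bad v4.0-rc6
#   git bisect good v3.19
#   ... build, boot, run the benchmark, then mark the result:
#   git bisect "$(verdict "$sys_seconds" 2.0)"

# The two single-stream sys times from this report, as sample inputs.
echo "sys=1.640 -> $(verdict 1.640 2.0)"
echo "sys=2.680 -> $(verdict 2.680 2.0)"
```

Each step needs a rebuild and reboot, so the marking is shown as a manual
command rather than through `git bisect run`.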

Without those patches, I can play 8-9 1080p MPEG2 streams; with them,
it's down to 5-6.

I tested using a libdrm checkout from Feb 16, and the latest git master
of libva, libva-intel-driver and gst-plugins-vaapi. The "identity
drop-probability=1" element drops every buffer before the sink, so
nothing is displayed and the test measures purely decoding performance.

Pure decode, single stream not displayed:
time gst-launch-1.0 filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! identity drop-probability=1 ! vaapisink

With kernel 3.18.0-rc7-01052-g493018d
real	0m11.429s
user	0m6.516s
sys	0m1.640s

With kernel 3.18.0-rc7-01053-g78a4237
real	0m12.694s
user	0m6.744s
sys	0m2.680s


8 simultaneous streams displayed:
time gst-launch-1.0 filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
  filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
  filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
  filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
  filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
  filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
  filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
  filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0
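
The repeated branches above can also be generated in a loop. This is a
minimal sketch, assuming a POSIX shell; the `build_pipeline` helper and the
`FILE` and `N` variables are illustrative names, while the branch itself is
copied from the command above. It prints the command instead of running it.

```shell
#!/bin/sh
# Build N identical decode branches for one gst-launch-1.0 invocation.
# FILE and N are illustrative defaults, not settings from the original report.
FILE=${FILE:-18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t}
N=${N:-8}

build_pipeline() {
    file=$1
    count=$2
    pipeline=""
    i=0
    while [ "$i" -lt "$count" ]; do
        # One branch per stream, same elements as in the report.
        pipeline="$pipeline filesrc location=$file ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0"
        i=$((i + 1))
    done
    # Strip the single leading space left by the first iteration.
    printf '%s\n' "${pipeline# }"
}

# Print the full command so it can be inspected before being run.
echo "time gst-launch-1.0 $(build_pipeline "$FILE" "$N")"
```

Running it with `N=9` would produce nine branches, which makes it easy to
probe how many streams a given kernel can sustain.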

With kernel 3.18.0-rc7-01052-g493018d
real	2m45.317s
user	1m21.296s
sys	0m51.080s

With kernel 3.18.0-rc7-01053-g78a4237
real	3m1.275s
user	1m24.336s
sys	1m38.360s
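
To put the two 8-stream runs side by side, the `time` figures can be turned
into percentage changes. This is a minimal sketch, assuming POSIX sh and
awk; the helper names `to_seconds` and `pct_change` are illustrative.

```shell
#!/bin/sh
# Convert a `time`-style duration such as "1m38.360s" to seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Percentage change from an old duration to a new one, both in `time` format.
pct_change() {
    old=$(to_seconds "$1")
    new=$(to_seconds "$2")
    awk -v o="$old" -v n="$new" 'BEGIN { printf "%.1f\n", (n - o) / o * 100 }'
}

# Figures from the 8-stream run above (g493018d vs g78a4237).
echo "real: +$(pct_change 2m45.317s 3m1.275s)%"   # prints "real: +9.7%"
echo "sys:  +$(pct_change 0m51.080s 1m38.360s)%"  # prints "sys:  +92.6%"
```

On these numbers nearly all of the slowdown is in `sys` time, which roughly
doubles, while `user` time barely moves.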


-- 
Olivier Crête
olivier.crete@collabora.com



* Re: libva decoding performance regression with kernel 4.0-rc
From: Chris Wilson @ 2015-04-10  6:23 UTC
  To: Olivier Crête
  Cc: Brad Volkin, Daniel Vetter, Jani Nikula, David Airlie, intel-gfx,
	dri-devel, linux-kernel

On Thu, Apr 09, 2015 at 09:00:43PM -0400, Olivier Crête wrote:
> Hello,
> 
> Using an Atom E3845 board, we had a pretty bad performance regression
> when upgrading to 4.0-rc6 from 3.19. With the help of git bisect, I
> traced it back to commit 78a42377. Reverting this commit and subsequent
> related commits (b9ffd80, 71745376, etc) fixes the performance
> regression for me.

Can you please test

http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=for-olivier-crete

on your setup?

First, test
http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-olivier-crete&id=044307a99b418258ac0d775460d73b20b80277c1
to get a baseline with nightly, as that contains some fine-tuning of the
batch allocations, which is pretty significant for libva on Atom (only
double-clflushing one or two pages per batch rather than 128). Then test
http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-olivier-crete&id=0a24802a5b61403b887ce401ce3efd52f5fd1eac
to see if the command parser tuning helps.

Hope this helps,
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: libva decoding performance regression with kernel 4.0-rc
From: Olivier Crête @ 2015-04-10 23:25 UTC
  To: Chris Wilson
  Cc: Daniel Vetter, Jani Nikula, David Airlie, intel-gfx, dri-devel,
	linux-kernel

Hello,

Thanks for the quick reply!

With my real use cases:

1. 9x 720p60 MPEG2 videos
 - 4.0-rc6: ~12 frames per second are on time
 - 4.0-rc6 + reverts: a stable 45 frames per second are on time
 - 044307a9: 40-45 frames per second are on time
 - 0a24802a: 45-46 frames per second are on time

2. 1080i30 MPEG2 videos
 - 4.0-rc6:  5 videos
 - 044307a9: 10 videos
 - 0a24802a: 10 videos

So you basically beat my baseline too. Good job, and thanks a lot! Any
chance you can sneak this into 4.0?

Olivier

-- 
Olivier Crête
olivier.crete@collabora.com

