* libva decoding performance regression with kernel 4.0-rc
From: Olivier Crête @ 2015-04-10 1:00 UTC (permalink / raw)
To: Brad Volkin
Cc: Daniel Vetter, Jani Nikula, David Airlie, intel-gfx, dri-devel,
linux-kernel
Hello,
On an Atom E3845 board, we hit a pretty bad performance regression
when upgrading from 3.19 to 4.0-rc6. With the help of git bisect, I
traced it back to commit 78a42377. Reverting this commit and the
subsequent related commits (b9ffd80, 71745376, etc.) fixes the
performance regression for me.
Without those patches, I can play 8-9 1080p MPEG-2 streams; with them,
it's down to 5-6.
I tested with a libdrm checkout from Feb 16 and the latest git master
of libva, libva-intel-driver and gst-plugins-vaapi. The "identity
drop-probability=1" element drops every buffer before it reaches the
sink, so nothing is displayed and only decoding performance is measured.
Pure decode, single stream not displayed:
time gst-launch-1.0 filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! identity drop-probability=1 ! vaapisink
With kernel 3.18.0-rc7-01052-g493018d
real 0m11.429s
user 0m6.516s
sys 0m1.640s
With kernel 3.18.0-rc7-01053-g78a4237
real 0m12.694s
user 0m6.744s
sys 0m2.680s
8 simultaneous streams displayed:
time gst-launch-1.0 filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0 \
filesrc location=18Mbps_CBR_MPEG2_Main-High_1920x1080p_16x9_29-97fps.m2t ! tsdemux ! mpegvideoparse ! vaapidecode ! vaapisink sync=0
With kernel 3.18.0-rc7-01052-g493018d
real 2m45.317s
user 1m21.296s
sys 0m51.080s
With kernel 3.18.0-rc7-01053-g78a4237
real 3m1.275s
user 1m24.336s
sys 1m38.360s
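To put the bisect numbers above in relative terms, here is a small Python sketch (not part of the original report) that converts the `time` output to seconds and computes the increase from the last good build (g493018d) to the first bad one (g78a4237). The helper names `to_seconds` and `slowdown_percent` are illustrative only.

```python
# Relative slowdown between the last good commit (493018d) and the
# first bad one (78a4237), using the `time` output reported above.

def to_seconds(t: str) -> float:
    """Convert a `time` value such as '2m45.317s' to seconds."""
    minutes, seconds = t.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

# (good, bad) pairs copied from the measurements in this message
RUNS = {
    "1 stream, not displayed": {
        "real": ("0m11.429s", "0m12.694s"),
        "user": ("0m6.516s", "0m6.744s"),
        "sys": ("0m1.640s", "0m2.680s"),
    },
    "8 streams, displayed": {
        "real": ("2m45.317s", "3m1.275s"),
        "user": ("1m21.296s", "1m24.336s"),
        "sys": ("0m51.080s", "1m38.360s"),
    },
}

def slowdown_percent(good: str, bad: str) -> float:
    """Relative increase from the good to the bad kernel, in percent."""
    g, b = to_seconds(good), to_seconds(bad)
    return (b - g) / g * 100.0

if __name__ == "__main__":
    for name, metrics in RUNS.items():
        for metric, (good, bad) in metrics.items():
            print(f"{name:24} {metric:4} {slowdown_percent(good, bad):+6.1f}%")
```

Run as-is, it shows real time growing by roughly 10-11% in both tests, while system time grows by roughly 63% (one stream) and 93% (eight streams), so most of the added CPU time is system (kernel) time rather than user time.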
--
Olivier Crête
olivier.crete@collabora.com
* Re: libva decoding performance regression with kernel 4.0-rc
From: Chris Wilson @ 2015-04-10 6:23 UTC (permalink / raw)
To: Olivier Crête
Cc: Brad Volkin, Daniel Vetter, Jani Nikula, David Airlie, intel-gfx,
dri-devel, linux-kernel
On Thu, Apr 09, 2015 at 09:00:43PM -0400, Olivier Crête wrote:
> Hello,
>
> Using an Atom E3845 board, we had a pretty bad performance regression
> when upgrading to 4.0-rc6 from 3.19. With the help of git bisect, I
> traced it back to commit 78a42377. Reverting this commit and subsequent
> related commits (b9ffd80, 71745376, etc) fixes the performance
> regression for me.
Can you please test
http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=for-olivier-crete
on your setup.
First test
http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-olivier-crete&id=044307a99b418258ac0d775460d73b20b80277c1
to get a baseline against nightly, as it contains some fine-tuning of the
batch allocations. That is pretty significant for libva on Atom (only one
or two pages are double-clflushed per batch rather than 128). Then test
http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-olivier-crete&id=0a24802a5b61403b887ce401ce3efd52f5fd1eac
to see if the command parser tuning helps.
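As a rough illustration of why trimming the flushed range matters, the sketch below counts clflush instructions per batch. It assumes 4 KiB pages, 64-byte cache lines and one clflush per cache line, and reads "double clflushing" as two full passes over the pages; none of these details come from the thread, and `clflushes_per_batch` is a hypothetical helper.

```python
# Rough count of cache-line flushes per batch before and after the
# batch-allocation tuning. ASSUMPTIONS (not stated in the thread):
# 4 KiB pages, 64-byte cache lines, one clflush per cache line, and
# "double clflushing" meaning two passes over every flushed page.
PAGE_SIZE = 4096
CACHELINE = 64
PASSES = 2  # "double clflushing"

def clflushes_per_batch(pages: int) -> int:
    """Number of clflush instructions needed to flush `pages` pages PASSES times."""
    return pages * (PAGE_SIZE // CACHELINE) * PASSES

if __name__ == "__main__":
    for pages in (128, 2, 1):
        print(f"{pages:3} pages -> {clflushes_per_batch(pages):5} clflushes per batch")
```

Under those assumptions, flushing 128 pages twice costs 16384 clflush instructions per batch, against 256 for two pages, a 64x reduction.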
Hope this helps,
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: libva decoding performance regression with kernel 4.0-rc
From: Olivier Crête @ 2015-04-10 23:25 UTC (permalink / raw)
To: Chris Wilson
Cc: Daniel Vetter, Jani Nikula, David Airlie, intel-gfx, dri-devel,
linux-kernel
Hello,
Thanks for the quick reply!
With my real use cases:
1. 9x 720p60 MPEG-2 videos
- 4.0-rc6: ~12 frames per second are on time
- 4.0-rc6 + reverts: a stable 45 frames per second are on time
- 044307a9: 40-45 frames per second are on time
- 0a24802a: 45-46 frames per second are on time
2. 1080i30 MPEG-2 videos
- 4.0-rc6: 5 videos
- 044307a9: 10 videos
- 0a24802a: 10 videos
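The figures above can be turned into speedup factors with a few lines of Python. This is only an illustrative calculation on the reported numbers; the dictionary names are made up for the example, and ranges such as "40-45" are kept as (low, high) pairs.

```python
# On-time frames per second for use case 1, as (low, high) pairs.
CASE1_FPS = {
    "4.0-rc6": (12, 12),            # reported as "~12"
    "4.0-rc6 + reverts": (45, 45),
    "044307a9": (40, 45),
    "0a24802a": (45, 46),
}
# Number of 1080i30 videos handled in use case 2.
CASE2_VIDEOS = {"4.0-rc6": 5, "044307a9": 10, "0a24802a": 10}

def speedup(new: float, old: float) -> float:
    """Ratio of the new result to the old one."""
    return new / old

if __name__ == "__main__":
    low, high = CASE1_FPS["0a24802a"]
    base = CASE1_FPS["4.0-rc6"][0]
    print(f"case 1, 0a24802a vs 4.0-rc6: "
          f"{speedup(low, base):.2f}x to {speedup(high, base):.2f}x")
    print(f"case 2, 0a24802a vs 4.0-rc6: "
          f"{speedup(CASE2_VIDEOS['0a24802a'], CASE2_VIDEOS['4.0-rc6']):.1f}x")
```

For use case 1, 0a24802a delivers about 3.75x to 3.83x the on-time frame rate of plain 4.0-rc6; for use case 2 it doubles the number of videos (5 to 10).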
So you basically beat my baseline too. Good job, and thanks a lot! Any
chance you can sneak this into 4.0?
Olivier
--
Olivier Crête
olivier.crete@collabora.com