From: Nicolas Dufresne <nicolas.dufresne@collabora.com>
To: "Arnd Bergmann" <arnd@arndb.de>,
"Arnd Bergmann" <arnd@kernel.org>,
"Detlev Casanova" <detlev.casanova@collabora.com>,
"Ezequiel Garcia" <ezequiel@vanguardiasur.com.ar>,
"Mauro Carvalho Chehab" <mchehab@kernel.org>,
"Heiko Stübner" <heiko@sntech.de>,
"Nathan Chancellor" <nathan@kernel.org>,
"Hans Verkuil" <hverkuil+cisco@kernel.org>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>,
Bill Wendling <morbo@google.com>,
Justin Stitt <justinstitt@google.com>,
linux-media@vger.kernel.org, linux-rockchip@lists.infradead.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, llvm@lists.linux.dev
Subject: Re: [PATCH 1/2] media: rkvdec: reduce excessive stack usage in assemble_hw_pps()
Date: Mon, 02 Feb 2026 10:12:53 -0500 [thread overview]
Message-ID: <da9f044152383cacd50989b025fdce08a654bbe3.camel@collabora.com> (raw)
In-Reply-To: <3b89635f-1c1c-4e4e-b0a9-2bbd0f21bc90@app.fastmail.com>
[-- Attachment #1: Type: text/plain, Size: 4529 bytes --]
Hi Arnd,
Le lundi 02 février 2026 à 15:09 +0100, Arnd Bergmann a écrit :
> On Mon, Feb 2, 2026, at 14:42, Nicolas Dufresne wrote:
> > Le lundi 02 février 2026 à 10:47 +0100, Arnd Bergmann a écrit :
> > > From: Arnd Bergmann <arnd@arndb.de>
> > >
> > > The rkvdec_pps had a large set of bitfields, all of which
> > > as misaligned. This causes clang-21 and likely other versions to
> > > produce absolutely awful object code and a warning about very
> > > large stack usage, on targets without unaligned access:
> > >
> > > drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c:966:12: error: stack frame size (1472) exceeds limit (1280) in 'rkvdec_vp9_start' [-Werror,-Wframe-larger-than]
> >
> > We had already addressed and validated that on clang-21, which indicates me that
> > we likely are missing an architecture (or a config) in our CI. Can you document
> > which architecture, configuration and flags was affected so we can add it on our
> > side ?
> >
> > Our media pipeline before sending to Linus and the clang builds trace are in the
> > following link, in case it matters.
> >
> > https://gitlab.freedesktop.org/linux-media/media-committers/-/pipelines/1588731
> > https://gitlab.freedesktop.org/linux-media/media-committers/-/jobs/91604655
>
> The configuration that hit this for me was an ARMv7-M NOMMU build. I'm
> doing 'randconfig' builds here, so I inevitably hit some corner cases
> that all deterministic CI systems miss. I don't think that you should
> add ARMv7-M here, since that would take up useful build resources
> from something more important. There are no drviers/media/ actual
> users on ARMv7-M, and next time it is going to be something else.
>
> > > Part of the problem here is how all the bitfield accesses are
> > > inlined into a function that already has large structures on
> > > the stack.
> >
> > Another observation is that you had to enable ASAN to make it miss-behave on for
> > loop unrolling (with complex bitfield writes). All I've obtained by visiting
> > the Link: is that its armv7-a architecture.
>
> Right, this randconfig build likely got closer to the warning
> limit because of the inherent overhead in KASAN, but the problem
> with the unaligned bitfields was something that I could later
> reproduce without KASAN, on ARMv5 and MIPS32r2.
>
> This is something we should fix in clang.
All fair comments. I plan to take this into fixes (no changes needed), hopefully
for rc-2.
Performance wise, this code is to replace read/mask/write into hardware
registers which was significantly slower for this amount of registers (~200
32bit integers) and this type of IP (its not sram). This is run once per frame.
In practice, if we hand code the read/mask/write, the performance should
eventually converge to using bitfield and letting the compiler do this masking,
I was being optimistic on how the compiler would behave. If performance of that
is truly a problem, we can always just prepare the ram register ahead of the
operation queue (instead of doing it in the executor).
One thing to remind, you can't optimize the data structure layout, since they
need to match the register layout. But while fixing some of the stack report
previously, I did endup up moving few things out of loops (which is not clearly
feasible in this patch). I did not checked all the code (only the failing one).
One of the bad pattern which costed stack (and overhead probably) was the use of
switch() statement to pick one of the unaligned register location, with that
switch being part of an unrolled loop. If you ever spot these, and have time,
please just manually unroll the switch out of the loop (its actually less code).
>
> > > Mark set_field_order_cnt() as noinline_for_stack, and split out
> > > the following accesses in assemble_hw_pps() into another noinline
> > > function, both of which now using around 800 bytes of stack in the
> > > same configuration.
> > >
> > > There is clearly still something wrong with clang here, but
> > > splitting it into multiple functions reduces the risk of stack
> > > overflow.
> >
> > We've tried really hard to avoid this noninline_for_stack just because compilers
> > are buggy. I'll have a look again in case I find some ideas, but meanwhile, with
> > failing architecture in the commit message:
> >
> > Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
>
> Thanks!
>
> Arnd
thanks to you,
Nicolas
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
next prev parent reply other threads:[~2026-02-02 15:13 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-02 9:47 [PATCH 1/2] media: rkvdec: reduce excessive stack usage in assemble_hw_pps() Arnd Bergmann
2026-02-02 9:47 ` [PATCH 2/2] media: rkvdec: reduce stack usage in rkvdec_init_v4l2_vp9_count_tbl() Arnd Bergmann
2026-02-02 16:32 ` Nicolas Dufresne
2026-02-02 13:42 ` [PATCH 1/2] media: rkvdec: reduce excessive stack usage in assemble_hw_pps() Nicolas Dufresne
2026-02-02 14:09 ` Arnd Bergmann
2026-02-02 15:12 ` Nicolas Dufresne [this message]
2026-02-02 15:59 ` Arnd Bergmann
2026-02-02 16:31 ` Nicolas Dufresne
2026-02-02 22:45 ` David Laight
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=da9f044152383cacd50989b025fdce08a654bbe3.camel@collabora.com \
--to=nicolas.dufresne@collabora.com \
--cc=arnd@arndb.de \
--cc=arnd@kernel.org \
--cc=detlev.casanova@collabora.com \
--cc=ezequiel@vanguardiasur.com.ar \
--cc=heiko@sntech.de \
--cc=hverkuil+cisco@kernel.org \
--cc=justinstitt@google.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-media@vger.kernel.org \
--cc=linux-rockchip@lists.infradead.org \
--cc=llvm@lists.linux.dev \
--cc=mchehab@kernel.org \
--cc=morbo@google.com \
--cc=nathan@kernel.org \
--cc=nick.desaulniers+lkml@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox