From: Boris Brezillon <boris.brezillon@collabora.com>
To: Iago Toral <itoral@igalia.com>
Cc: dri-devel@lists.freedesktop.org,
"Steven Price" <steven.price@arm.com>,
"Liviu Dudau" <liviu.dudau@arm.com>,
"Adrián Larumbe" <adrian.larumbe@collabora.com>,
lima@lists.freedesktop.org, "Qiang Yu" <yuq825@gmail.com>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"Dmitry Osipenko" <dmitry.osipenko@collabora.com>,
"Alyssa Rosenzweig" <alyssa@rosenzweig.io>,
"Christian Koenig" <christian.koenig@amd.com>,
"Faith Ekstrand" <faith.ekstrand@collabora.com>,
kernel@collabora.com
Subject: Re: [PATCH] drm/doc: Start documenting aspects specific to tile-based renderers
Date: Mon, 28 Apr 2025 10:13:02 +0200
Message-ID: <20250428101302.2df2f9cb@collabora.com>
In-Reply-To: <123343432f17913452ba9cbef6161837cc3c07d8.camel@igalia.com>
Hi Iago,
On Mon, 28 Apr 2025 08:55:07 +0200
Iago Toral <itoral@igalia.com> wrote:
> Hi,
>
> Pitching in to describe the situation for v3d:
Thanks for chiming in.
>
> On Fri, 18-04-2025 at 14:25 +0200, Boris Brezillon wrote:
>
> (...)
> > +For all these reasons, the tiler usually allocates memory dynamically, but
> > +DRM has not been designed with this use case in mind. Drivers will address
> > +these problems differently based on the functionality provided by their
> > +hardware, but all of them almost certainly have to deal with this somehow.
> > +
> > +The easy solution is to statically allocate a huge buffer to pick from when
> > +tiler memory is needed, and fail the rendering when this buffer is depleted.
> > +Some drivers try to be smarter to avoid reserving a lot of memory upfront.
> > +Instead, they start with an almost empty buffer and progressively populate it
> > +when the GPU faults on an address sitting in the tiler buffer range. This
> > +works okay most of the time but it falls short when the system is under
> > +memory pressure, because the memory request is not guaranteed to be satisfied.
> > +In that case, the driver either fails the rendering, or, if the hardware
> > +allows it, it tries to flush the primitives that have been processed and
> > +triggers a fragment job that will consume those primitives and free up some
> > +memory to be recycled and make further progress on the tiling step. This is
> > +usually referred as partial/incremental rendering (it might have other names).
>
> In our case, user space allocates some memory up front hoping to avoid
> running out of memory during tiling, but if the tiler does run out of
> memory we get an interrupt and the tiler hw will stop and wait for the
> kernel driver to write back an address where more memory is made
> available (via register write), which we will try to allocate at that
> point. This can happen any number of times until the tiler job completes.
Sounds very much like how the new Mali-CSF hardware works, except
Mali-CSF also has a fallback for when the allocation can't be satisfied.
>
> I am not sure that we are handling allocation failure on this path
> nicely at the moment since we don't try to fail and cancel the job,
> that's maybe something we should fix, although I don't personally
> recall any reports of us running into this situation either.
Yeah, I'd say you're pretty much in the same place Panfrost/Panthor are
at the moment: we're not playing by the dma_fence rules, but no user has
complained so far. BTW, that doesn't necessarily mean the problem
doesn't occur, just that it hasn't been identified as being a KMD issue
:-).
>
>
> > +
> > +Compute based emulation of geometry stages
> > +------------------------------------------
> > +
> > +More and more hardware vendors don't bother providing hardware support for
> > +geometry/tesselation/mesh stages, since those can be emulated with compute
> > +shaders. But the same problem we have with tiler memory exists with those
> > +intermediate compute-emulated stages, because transient data shared between
> > +stages need to be stored in memory for the next stage to consume, and this
> > +bubbles up until the tiling stage is reached, because ultimately, what the
> > +tiling stage will need to process is a set of vertices it can turn into
> > +primitives, like would happen if the application had emulated the geometry,
> > +tesselation or mesh stages with compute.
> > +
> > +Unlike tiling, where the hardware can provide a fallback to recycle memory,
> > +there is no way the intermediate primitives can be flushed up to the framebuffer,
> > +because it's a purely software emulation here. This being said, the same
> > +"start small, grow on-demand" can be applied to avoid over-allocating memory
> > +upfront.
>
> FWIW, v3d has geometry and tessellation hardware.
Yep, Alyssa mentioned that. I'll change this section to specifically
mention Arm/Mali as being the outlier here.
>
>
> > +
> > +On-demand memory allocation
> > +---------------------------
> > +
> > +As explained in previous sections, on-demand allocation is a central piece
> > +of tile-based renderer if we don't want to over-allocate, which is bad for
> > +integrated GPUs who share their memory with the rest of the system.
> > +
> > +The problem with on-demand allocation is that suddenly, GPU accesses can
> > +fail on OOM, and the DRM components (drm_gpu_scheduler and drm_gem mostly)
> > +were not designed for that. Those are assuming that buffers memory is
> > +populated at job submission time, and will stay around for the job lifetime.
> > +If a GPU fault happens, it's the user fault, and the context can be flagged
> > +unusable. On-demand allocation is usually implemented as allocation-on-fault,
> > +and the dma_fence contract prevents us from blocking on allocations in that
> > +path (GPU fault handlers are in the dma-fence signalling path).
>
> As I described above, v3d is not quite an allocation-on-fault mechanism
> but rather, we get a dedicated interrupt from the hw when it needs more
> memory, which I believe happens a bit before it completely runs out of
> memory actually. Maybe that changes the picture since we don't exactly
> use a fault handler?
Not really. Any mechanism relying on on-demand allocation in the
dma_fence signalling path is problematic. The fact that it's based on a
fault handler might add extra problems on top, but both designs violate
the dma_fence contract, which states that no non-fallible allocation
should be done in the dma_fence signalling path (that is, any allocation
happening between the moment the job is queued to the drm_sched_entity
and the moment the job fence is signalled).
Given the description you gave, I think we can add v3d to the list of
problematic drivers :-(.
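
To make the constraint a bit more concrete, here is a rough userspace C
sketch of what a tiler out-of-memory handler playing by those rules could
look like. This is not v3d, Panfrost or Panthor code: struct tiler_heap,
try_alloc_chunk() and handle_tiler_oom() are hypothetical names, and the
chunk counter merely stands in for an allocation that is allowed to fail
instead of blocking (in the kernel, I'd assume something in the spirit of
a GFP_NOWAIT allocation).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a tiler heap, not taken from any real driver. */
struct tiler_heap {
	size_t chunks_in_use;	/* chunks currently owned by the tiler */
	size_t chunks_avail;	/* chunks obtainable without blocking */
	bool can_flush;		/* HW supports partial/incremental rendering */
};

enum oom_action {
	OOM_GROW,	/* got a new chunk, the tiler can resume */
	OOM_FLUSH,	/* no memory: flush processed primitives, recycle */
	OOM_FAIL_JOB,	/* no memory, no fallback: fail the job */
};

/*
 * Stand-in for a fallible, non-blocking allocation: it either succeeds
 * immediately or reports failure, and never waits for memory.
 */
bool try_alloc_chunk(struct tiler_heap *heap)
{
	if (heap->chunks_avail == 0)
		return false;

	heap->chunks_avail--;
	heap->chunks_in_use++;
	return true;
}

/*
 * Called when the tiler reports it needs more memory (fault or dedicated
 * interrupt, the distinction doesn't matter here). Every branch returns
 * without blocking, so the job fence can always make progress.
 */
enum oom_action handle_tiler_oom(struct tiler_heap *heap)
{
	if (try_alloc_chunk(heap))
		return OOM_GROW;

	if (heap->can_flush) {
		/*
		 * Model of the fallback: a fragment job consumes the
		 * primitives processed so far, and the chunks they used
		 * become available to the tiler again.
		 */
		heap->chunks_avail += heap->chunks_in_use;
		heap->chunks_in_use = 0;
		return OOM_FLUSH;
	}

	return OOM_FAIL_JOB;
}
```

The point is simply that the handler never waits for memory: it either
grows the heap, falls back to flushing if the hardware can do that, or
fails the job so the fence still gets signalled.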
Regards,
Boris