From: Daniel Vetter <daniel@ffwll.ch>
To: Jason Ekstrand <jason@jlekstrand.net>
Cc: Intel GFX <intel-gfx@lists.freedesktop.org>,
Maling list - DRI developers <dri-devel@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [PATCH 3/6] drm/i915: Always call i915_globals_exit() from i915_exit()
Date: Wed, 21 Jul 2021 13:26:21 +0200 [thread overview]
Message-ID: <YPgEXZ4zXuR4374o@phenom.ffwll.local> (raw)
In-Reply-To: <CAOFGe973b3YPNs+0aGThP0uFTu13t94To=wu3_ZwX5DJhSoGMw@mail.gmail.com>
On Tue, Jul 20, 2021 at 09:55:22AM -0500, Jason Ekstrand wrote:
> On Tue, Jul 20, 2021 at 9:18 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Mon, Jul 19, 2021 at 01:30:44PM -0500, Jason Ekstrand wrote:
> > > If the driver was not fully loaded, we may still have globals lying
> > > around. If we don't tear those down in i915_exit(), we'll leak a bunch
> > > of memory slabs. This can happen two ways: use_kms = false and if we've
> > > run mock selftests. In either case, we have an early exit from
> > > i915_init which happens after i915_globals_init() and we need to clean
> > > up those globals. While we're here, add an explicit boolean instead of
> > > using a random field from i915_pci_device to detect partial loads.
> > >
> > > The mock selftests case gets especially sticky. The load isn't entirely
> > > a no-op. We actually do quite a bit inside those selftests including
> > > allocating a bunch of mock objects and running tests on them. Once all
> > > those tests are complete, we exit early from i915_init(). Perviously,
> > > i915_init() would return a non-zero error code on failure and a zero
> > > error code on success. In the success case, we would get to i915_exit()
> > > and check i915_pci_driver.driver.owner to detect if i915_init exited early
> > > and do nothing. In the failure case, we would fail i915_init() but
> > > there would be no opportunity to clean up globals.
> > >
> > > The most annoying part is that you don't actually notice the failure as
> > > part of the self-tests since leaking a bit of memory, while bad, doesn't
> > > result in anything observable from userspace. Instead, the next time we
> > > load the driver (usually for next IGT test), i915_globals_init() gets
> > > invoked again, we go to allocate a bunch of new memory slabs, those
> > > implicitly create debugfs entries, and debugfs warns that we're trying
> > > to create directories and files that already exist. Since this all
> > > happens as part of the next driver load, it shows up in the dmesg-warn
> > > of whatever IGT test ran after the mock selftests.
> > >
> > > While the obvious thing to do here might be to call i915_globals_exit()
> > > after selftests, that's not actually safe. The dma-buf selftests call
> > > i915_gem_prime_export which creates a file. We call dma_buf_put() on
> > > the resulting dmabuf which calls fput() on the file. However, fput()
> > > isn't immediate and gets flushed right before syscall returns. This
> > > means that all the fput()s from the selftests don't happen until right
> > > before the module load syscall used to fire off the selftests returns
> > > which is after i915_init(). If we call i915_globals_exit() in
> > > i915_init() after selftests, we end up freeing slabs out from under
> > > objects which won't get released until fput() is flushed at the end of
> > > the module load.
> > >
> > > The solution here is to let i915_init() return success early and detect
> > > the early success in i915_exit() and only tear down globals and nothing
> > > else. This way the module loads successfully, regardless of the success
> > > or failure of the tests. Because we've not enumerated any PCI devices,
> > > no device nodes are created and it's entirely useless from userspace.
> > > The only thing the module does at that point is hold on to a bit of
> > > memory until we unload it and i915_exit() is called. Importantly, this
> > > means that everything from our selftests has the ability to properly
> > > flush out between i915_init() and i915_exit() because there are a couple
> > > syscall boundaries in between.
> > >
> > > Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
> > > Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global")
> > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > ---
> > > drivers/gpu/drm/i915/i915_pci.c | 32 +++++++++++++++++++++++++-------
> > > 1 file changed, 25 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> > > index 4e627b57d31a2..24e4e54516936 100644
> > > --- a/drivers/gpu/drm/i915/i915_pci.c
> > > +++ b/drivers/gpu/drm/i915/i915_pci.c
> > > @@ -1194,18 +1194,31 @@ static struct pci_driver i915_pci_driver = {
> > > .driver.pm = &i915_pm_ops,
> > > };
> > >
> > > +static bool i915_fully_loaded = false;
> > > +
> > > static int __init i915_init(void)
> > > {
> > > bool use_kms = true;
> > > int err;
> > >
> > > + i915_fully_loaded = false;
> > > +
> > > err = i915_globals_init();
> > > if (err)
> > > return err;
> > >
> > > + /* i915_mock_selftests() only returns zero if no mock subtests were
> > > + * run. If we get any non-zero error code, we return early here.
> > > + * We always return success because selftests may have allocated
> > > + * objects from slabs which will get cleaned up by i915_exit(). We
> > > + * could attempt to clean up immediately and fail module load but,
> > > + * thanks to interactions with other parts of the kernel (struct
> > > + * file, in particular), it's safer to let the module fully load
> > > + * and then clean up on unload.
> > > + */
> > > err = i915_mock_selftests();
> > > if (err)
> > > - return err > 0 ? 0 : err;
> > > + return 0;
> >
> > At least the module options still claim that you can run selftests and
> > still load the driver. Which makes sense for perf/hw selftests, since
> > those need the driver, but would result in the same old bug resurfacing
> > that you're trying to fix there.
> >
> > Is that description just confused and needs some fixing, or do we have a
> > gap here?
>
> I don't think there's real need for a fully loaded driver after mock
> selftests. They exist entirely to run against a mock driver, not the
> real one.
Can you pls update the module option help then for the next round?
-Daniel
>
> > Patch itself looks reasonable, with the nits from Tvrtko addressed:
>
> Done
>
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> Thanks
>
> --Jason
>
> > >
> > > /*
> > > * Enable KMS by default, unless explicitly overriden by
> > > @@ -1225,6 +1238,12 @@ static int __init i915_init(void)
> > > return 0;
> > > }
> > >
> > > + /* After this point, i915_init() must either fully succeed or
> > > + * properly tear everything down and fail. We don't have separate
> > > + * flags for each set-up bit.
> > > + */
> > > + i915_fully_loaded = true;
> > > +
> > > i915_pmu_init();
> > >
> > > err = pci_register_driver(&i915_pci_driver);
> > > @@ -1240,12 +1259,11 @@ static int __init i915_init(void)
> > >
> > > static void __exit i915_exit(void)
> > > {
> > > - if (!i915_pci_driver.driver.owner)
> > > - return;
> > > -
> > > - i915_perf_sysctl_unregister();
> > > - pci_unregister_driver(&i915_pci_driver);
> > > - i915_pmu_exit();
> > > + if (i915_fully_loaded) {
> > > + i915_perf_sysctl_unregister();
> > > + pci_unregister_driver(&i915_pci_driver);
> > > + i915_pmu_exit();
> > > + }
> > > i915_globals_exit();
> > > }
> > >
> > > --
> > > 2.31.1
> > >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2021-07-21 11:26 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-19 18:30 [Intel-gfx] [PATCH 0/6] Fix the debugfs splat from mock selftests Jason Ekstrand
2021-07-19 18:30 ` [Intel-gfx] [PATCH 1/6] drm/i915: Call i915_globals_exit() after i915_pmu_exit() Jason Ekstrand
2021-07-19 18:30 ` [Intel-gfx] [PATCH 2/6] drm/i915: Call i915_globals_exit() if pci_register_device() fails Jason Ekstrand
2021-07-20 14:11 ` Daniel Vetter
2021-07-19 18:30 ` [Intel-gfx] [PATCH 3/6] drm/i915: Always call i915_globals_exit() from i915_exit() Jason Ekstrand
2021-07-20 8:25 ` Tvrtko Ursulin
2021-07-20 14:53 ` Jason Ekstrand
2021-07-20 15:22 ` Tvrtko Ursulin
2021-07-20 15:30 ` Tvrtko Ursulin
2021-07-20 15:05 ` Jason Ekstrand
2021-07-20 15:25 ` Tvrtko Ursulin
2021-07-20 14:18 ` Daniel Vetter
2021-07-20 14:55 ` Jason Ekstrand
2021-07-21 11:26 ` Daniel Vetter [this message]
2021-07-21 15:20 ` Jason Ekstrand
2021-07-19 18:30 ` [Intel-gfx] [PATCH 4/6] drm/ttm: Force re-init if ttm_global_init() fails Jason Ekstrand
2021-07-19 19:21 ` Christian König
2021-07-19 18:30 ` [Intel-gfx] [PATCH 5/6] drm/ttm: Initialize debugfs from ttm_global_init() Jason Ekstrand
2021-07-20 14:22 ` Daniel Vetter
2021-07-19 18:30 ` [Intel-gfx] [PATCH 6/6] drm/i915: Make the kmem slab for i915_buddy_block a global Jason Ekstrand
2021-07-19 18:38 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Fix the debugfs splat from mock selftests Patchwork
2021-07-19 18:38 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2021-07-19 18:42 ` [Intel-gfx] ✗ Fi.CI.DOCS: " Patchwork
2021-07-19 19:05 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-07-19 23:32 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YPgEXZ4zXuR4374o@phenom.ffwll.local \
--to=daniel@ffwll.ch \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=jason@jlekstrand.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox