public inbox for linux-kernel@vger.kernel.org
From: Peter Zijlstra <peterz@infradead.org>
To: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>,
	intel-gfx@lists.freedesktop.org,
	linux-perf-users@vger.kernel.org,
	Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	dri-devel@lists.freedesktop.org, Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 6/7] drm/i915/pmu: Lazy unregister
Date: Wed, 24 Jul 2024 14:41:05 +0200
Message-ID: <20240724124105.GB13387@noisy.programming.kicks-ass.net>
In-Reply-To: <xsuzfv4rzb4c25sibt5gjskn7xyfwf33wgwaw4nkz5jlnvl2ke@ekur5xvhec3z>

On Tue, Jul 23, 2024 at 10:30:08AM -0500, Lucas De Marchi wrote:
> On Tue, Jul 23, 2024 at 09:03:25AM GMT, Tvrtko Ursulin wrote:
> > 
> > On 22/07/2024 22:06, Lucas De Marchi wrote:
> > > Instead of calling perf_pmu_unregister() when unbinding, defer that to
> > > the destruction of i915 object. Since perf itself holds a reference in
> > > the event, this only happens when all events are gone, which guarantees
> > > i915 is not unregistering the pmu with live events.
> > > 
> > > Previously, running the following sequence would crash the system after
> > > ~2 tries:
> > > 
> > > 	1) bind device to i915
> > > 	2) wait for events to show up in sysfs
> > > 	3) start perf stat -I 1000 -e i915/rcs0-busy/
> > > 	4) unbind driver
> > > 	5) kill perf
> > > 
> > > Most of the time this crashes in perf_pmu_disable() while accessing the
> > > percpu pmu_disable_count. This happens because perf_pmu_unregister()
> > > destroys it with free_percpu(pmu->pmu_disable_count).
> > > 
> > > With a lazy unbind, the pmu is only unregistered after (5) as opposed to
> > > after (4). The downside is that if a new bind operation is attempted for
> > > the same device/driver without killing the perf process, i915 will fail
> > > to register the pmu (but still load successfully). This seems better
> > > than completely crashing the system.
> > 
> > So this effectively allows unbind to succeed without fully unbinding the
> > driver from the device? That sounds like a significant drawback and if
> > so, I wonder if a more complicated solution wouldn't be better after
> > all. Or is there precedent for allowing userspace to keep their paws on
> > unbound devices in this way?
> 
> keeping the resources alive but "unplugged" while the hardware
> disappeared is a common thing to do... it's the whole point of the
> drmm-managed resource for example. If you bind the driver and then
> unbind it while userspace is holding a ref, next time you try to bind it
> will come up with a different card number. A similar thing that could be
> done is to adjust the name of the event - currently we add the mangled
> pci slot.
> 
> That said, I agree a better approach would be to allow
> perf_pmu_unregister() to do its job even when there are open events. On
> top of that (or as a way to help achieve that), make perf core replace
> the callbacks with stubs when pmu is unregistered - that would even kill
> the need for i915's checks on pmu->closed (and fix the lack thereof in
> other drivers).
> 
> It can be a can of worms though and may be pushed back by perf core
> maintainers, so it'd be good to have their feedback.

I don't think I understand the problem. I also don't understand drivers
much -- so that might be the problem.

So the problem appears to be that the device just disappears without
warning? How can a GPU go away like that?

Since you have a notion of this device, can't you do this stubbing you
talk about? That is, if your internal device reference becomes NULL, let
the PMU methods turn into no-ops that preserve the existing state.

And then when the last event goes away, tear down the whole thing.

Again, I'm not sure I'm following.
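
To make the stubbing idea above concrete, here is a minimal userspace
sketch of the pattern. It is only a model under stated assumptions, not
kernel or i915 code: struct fake_device, struct fake_pmu and every
fake_pmu_*() helper are hypothetical names made up for illustration, and
locking and the real perf callbacks are left out entirely. It shows the
methods turning into no-ops once the device reference is NULL, and the
teardown being deferred until the last event goes away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names below are hypothetical; none of them exist in the kernel. */
struct fake_device {
	unsigned long busy_ticks;	/* stands in for a hardware counter */
};

struct fake_pmu {
	struct fake_device *dev;	/* NULL once the driver is unbound */
	unsigned int nr_events;		/* events that still reference us */
	unsigned long last_count;	/* last value read while dev was alive */
	int *freed;			/* set to 1 on teardown, for the demo */
};

static struct fake_pmu *fake_pmu_create(struct fake_device *dev, int *freed)
{
	struct fake_pmu *pmu = calloc(1, sizeof(*pmu));

	if (!pmu)
		return NULL;
	pmu->dev = dev;
	pmu->freed = freed;
	return pmu;
}

static void fake_pmu_teardown(struct fake_pmu *pmu)
{
	assert(pmu->nr_events == 0);
	*pmu->freed = 1;
	free(pmu);
}

/* Refuse new events after unbind (a real driver might return -ENODEV). */
static int fake_pmu_event_init(struct fake_pmu *pmu)
{
	if (!pmu->dev)
		return -1;
	pmu->nr_events++;
	return 0;
}

/* Stubbed read: without a device, report the preserved state. */
static unsigned long fake_pmu_read(struct fake_pmu *pmu)
{
	if (pmu->dev)
		pmu->last_count = pmu->dev->busy_ticks;
	return pmu->last_count;
}

/* Unbind only drops the device; open events keep the pmu alive. */
static void fake_pmu_unbind(struct fake_pmu *pmu)
{
	pmu->dev = NULL;
	if (pmu->nr_events == 0)
		fake_pmu_teardown(pmu);
}

/* When the last event goes away after unbind, tear everything down. */
static void fake_pmu_event_destroy(struct fake_pmu *pmu)
{
	assert(pmu->nr_events > 0);
	if (--pmu->nr_events == 0 && !pmu->dev)
		fake_pmu_teardown(pmu);
}

/* Walks the bind / open event / unbind / close event sequence. */
int fake_pmu_demo(void)
{
	struct fake_device dev = { .busy_ticks = 42 };
	int freed = 0;
	struct fake_pmu *pmu = fake_pmu_create(&dev, &freed);

	if (!pmu || fake_pmu_event_init(pmu) != 0)
		return 1;
	if (fake_pmu_read(pmu) != 42)
		return 2;
	fake_pmu_unbind(pmu);		/* device gone, event still open */
	dev.busy_ticks = 1000;		/* must no longer be observed */
	if (freed || fake_pmu_read(pmu) != 42)
		return 3;
	if (fake_pmu_event_init(pmu) == 0)
		return 4;		/* new events must be refused */
	fake_pmu_event_destroy(pmu);	/* last event: now tear down */
	return freed ? 0 : 5;
}
```

Calling fake_pmu_demo() returns 0 when the sequence behaves as described:
the read after unbind still reports the preserved value, new events are
refused, and the structure is freed only when the last event is
destroyed. Whether this maps onto the real i915 and perf core lifetimes
is exactly the open question in this thread.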
