All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"David Rientjes" <rientjes@google.com>,
	"Ben Gardon" <bgardon@google.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Dimitri Sivanich" <dimitri.sivanich@hpe.com>
Subject: Re: [PATCH] mm/oom_kill: Ensure MMU notifier range_end() is paired with range_start()
Date: Wed, 10 Mar 2021 17:20:01 -0800	[thread overview]
Message-ID: <YElwQU9mPUNwPg7q@google.com> (raw)
In-Reply-To: <20210311002807.GQ444867@ziepe.ca>

On Wed, Mar 10, 2021, Jason Gunthorpe wrote:
> On Wed, Mar 10, 2021 at 01:31:17PM -0800, Sean Christopherson wrote:
> > Invoke the MMU notifier's .invalidate_range_end() callbacks even if one
> > of the .invalidate_range_start() callbacks failed.  If there are multiple
> > notifiers, the notifier that did not fail may have performed actions in
> > its ...start() that it expects to unwind via ...end().  Per the
> > mmu_notifier_ops documentation, ...start() and ...end() must be paired.
> 
> No this is not OK, if invalidate_start returns EBUSY invalidate_end
> should *not* be called.
> 
> As you observed:
>  
> > The only in-kernel usage that is fatally broken is the SGI UV GRU driver,
> > which effectively blocks and sleeps fault handlers during ...start(), and
> > unblocks/wakes the handlers during ...end().  But, the only users that
> > can fail ...start() are the i915 and Nouveau drivers, which are unlikely
> > to collide with the SGI driver.
> 
> It used to be worse but I've since moved most of the other problematic
> users to the itree notifier which doesn't have the problem.
> 
> > KVM is the only other user of ...end(), and while KVM also blocks fault
> > handlers in ...start(), the fault handlers do not sleep and originate in
> 
> KVM will have its mmu_notifier_count become imbalanced:
> 
> static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
>                                         const struct mmu_notifier_range *range)
> {
>         kvm->mmu_notifier_count++;
> 
> static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
>                                         const struct mmu_notifier_range *range)
> {
>         kvm->mmu_notifier_count--;
> 
> Which I believe is fatal to kvm? These notifiers certainly do not only
> happen at process exit.

My point about the process dying is that the existing bug that causes
mmu_notifier_count to become imbalanced is benign only because the process is
being killed, and thus KVM will stop running its vCPUs.

> So, both of the remaining _end users become corrupted with this patch!

I don't follow.  mn_hlist_invalidate_range_start() iterates over all notifiers,
even if a notifier earlier in the chain failed.  How will KVM become imbalanced?

The existing _end users never fail their _start.  If KVM started failing its
start, then yes, it could get corrupted.  But my assumption/expection is that,
if KVM were to ever reject _start, it would be responsible for knowing that it
must also skip _end.  I'm happy to kick that one down the road though, as I
can't think of a scenario where KVM would _need_ to sleep.

> I've tried to fix this before, the only thing that seems like it will
> work is to sort the hlist and only call ends that have succeeded their
> starts by comparing pointers with <.
> 
> This is because the hlist can have items removed concurrently under
> SRCU so there is no easy way to compute the subset that succeeded in
> calling start.
> 
> I had a prior effort to just ban more than 1 hlist notifier with end,
> but it turns out kvm on ARM uses two all the time (IIRC)
> 
> > Found by inspection.  Verified by adding a second notifier in KVM
> > that
> 
> AFAIK it is a non-problem in real life because kvm is not mixed with
> notifier_start's that fail (and GRU is dead?). Everything else was
> fixed by moving to itree.
> 
> Jason


  reply	other threads:[~2021-03-11  1:20 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-10 21:31 [PATCH] mm/oom_kill: Ensure MMU notifier range_end() is paired with range_start() Sean Christopherson
2021-03-10 22:08 ` Ben Gardon
2021-03-11  0:06 ` Andrew Morton
2021-03-11  0:28 ` Jason Gunthorpe
2021-03-11  1:20   ` Sean Christopherson [this message]
2021-03-11  1:50     ` Jason Gunthorpe
2021-03-11  7:22       ` Sean Christopherson
2021-03-11 16:20   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YElwQU9mPUNwPg7q@google.com \
    --to=seanjc@google.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=bgardon@google.com \
    --cc=dimitri.sivanich@hpe.com \
    --cc=hannes@cmpxchg.org \
    --cc=jgg@ziepe.ca \
    --cc=jglisse@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.