Re: [PATCH RFC] mm: migrate: Fix races of __find_get_block() and page migration

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mel Gorman <mgorman@suse.de>
To: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, mhocko@suse.cz, stable@vger.kernel.org
Subject: Re: [PATCH RFC] mm: migrate: Fix races of __find_get_block() and page migration
Date: Fri, 12 Jul 2019 11:10:43 +0100	[thread overview]
Message-ID: <20190712101042.GJ13484@suse.de> (raw)
In-Reply-To: <20190712091746.GB906@quack2.suse.cz>

On Fri, Jul 12, 2019 at 11:17:46AM +0200, Jan Kara wrote:
> On Thu 11-07-19 17:04:55, Andrew Morton wrote:
> > On Thu, 11 Jul 2019 14:58:38 +0200 Jan Kara <jack@suse.cz> wrote:
> > 
> > > buffer_migrate_page_norefs() can race with bh users in a following way:
> > > 
> > > CPU1					CPU2
> > > buffer_migrate_page_norefs()
> > >   buffer_migrate_lock_buffers()
> > >   checks bh refs
> > >   spin_unlock(&mapping->private_lock)
> > > 					__find_get_block()
> > > 					  spin_lock(&mapping->private_lock)
> > > 					  grab bh ref
> > > 					  spin_unlock(&mapping->private_lock)
> > >   move page				  do bh work
> > > 
> > > This can result in various issues like lost updates to buffers (i.e.
> > > metadata corruption) or use after free issues for the old page.
> > > 
> > > Closing this race window is relatively difficult. We could hold
> > > mapping->private_lock in buffer_migrate_page_norefs() until we are
> > > finished with migrating the page but the lock hold times would be rather
> > > big. So let's revert to a more careful variant of page migration requiring
> > > eviction of buffers on migrated page. This is effectively
> > > fallback_migrate_page() that additionally invalidates bh LRUs in case
> > > try_to_free_buffers() failed.
> > 
> > Is this premature optimization?  Holding ->private_lock while messing
> > with the buffers would be the standard way of addressing this.  The
> > longer hold times *might* be an issue, but we don't know this, do we? 
> > If there are indeed such problems then they could be improved by, say,
> > doing more of the newpage preparation prior to taking ->private_lock.
> 
> I didn't check how long the private_lock hold times would actually be, it
> just seems there's a lot of work done before the page is fully migrated a
> we could release the lock. And since the lock blocks bh lookup,
> set_page_dirty(), etc. for the whole device, it just seemed as a bad idea.
> I don't think much of a newpage setup can be moved outside of private_lock
> - in particular page cache replacement, page copying, page state migration
> all need to be there so that bh code doesn't get confused.
> 
> But I guess it's fair to measure at least ballpark numbers of what the lock
> hold times would be to get idea whether the contention concern is
> substantiated or not.
> 

I think it would be tricky to measure and quantify how much the contention
is an issue. While it would be possible to construct a microbenchmark that
should illustrate the problem, it would tell us relatively little about
how much of a problem it is generally. It would be relatively difficult
to detect the contention and stalls in block lookups due to migration
would be tricky to spot. Careful use of lock_stat might help but
enabling that has consequences of its own.

However, a rise in allocation failures due to dirty pages not being
migrated is relatively easy to detect and the consequences are relatively
benign -- failed high-order allocation that is usually ok versus a stall
on block lookups that could have a wider general impact.

On that basis, I think the patch you proposed is the more appropriate as
a fix for the race which has the potential for data corruption. So;

Acked-by: Mel Gorman <mgorman@techsingularity.net>

> Finally, I guess I should mention there's one more approach to the problem
> I was considering: Modify bh code to fully rely on page lock instead of
> private_lock for bh lookup. That would make sense scalability-wise on its
> own. The problem with it is that __find_get_block() would become a sleeping
> function. There aren't that many places calling the function and most of
> them seem fine with it but still it is non-trivial amount of work to do the
> conversion and it can have some fallout so it didn't seem like a good
> solution for a data-corruption issue that needs to go to stable...
> 

Maybe *if* it's shown there is a major issue with increased high-order
allocation failures, it would be worth looking into but right now, I
think it's overkill with relatively high risk and closing the potential
race is more important.

-- 
Mel Gorman
SUSE Labs

next prev parent reply	other threads:[~2019-07-12 10:10 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-11 12:58 [PATCH RFC] mm: migrate: Fix races of __find_get_block() and page migration Jan Kara
2019-07-12  0:04 ` Andrew Morton
2019-07-12  8:04   ` Mel Gorman
2019-07-12  9:17   ` Jan Kara
2019-07-12 10:10     ` Mel Gorman [this message]
2019-07-12 11:20       ` Jan Kara
2019-07-12 12:39         ` Mel Gorman
2019-07-12 21:21           ` Andrew Morton
2019-07-14 21:20             ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190712101042.GJ13484@suse.de \
    --to=mgorman@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=jack@suse.cz \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.