Re: [PATCH -mm] mm, swap: Fix race between swapoff and some swap operations

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Minchan Kim" <minchan@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Hugh Dickins" <hughd@google.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Tim Chen" <tim.c.chen@linux.intel.com>,
	"Shaohua Li" <shli@fb.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"David Rientjes" <rientjes@google.com>,
	"Rik van Riel" <riel@redhat.com>, "Jan Kara" <jack@suse.cz>,
	"Dave Jiang" <dave.jiang@intel.com>,
	"Aaron Lu" <aaron.lu@intel.com>
Subject: Re: [PATCH -mm] mm, swap: Fix race between swapoff and some swap operations
Date: Tue, 12 Dec 2017 19:27:25 -0800
Message-ID: <20171213032725.GJ7829@linux.vnet.ibm.com>
In-Reply-To: <87indbnzga.fsf@yhuang-dev.intel.com>

On Wed, Dec 13, 2017 at 10:17:41AM +0800, Huang, Ying wrote:
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
> 
> > On Tue, Dec 12, 2017 at 09:12:20AM +0800, Huang, Ying wrote:
> >> Hi, Paul,
> >> 
> >> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
> >> 
> >> > On Mon, Dec 11, 2017 at 01:30:03PM +0800, Huang, Ying wrote:
> >> >> Andrew Morton <akpm@linux-foundation.org> writes:
> >> >> 
> >> >> > On Fri, 08 Dec 2017 16:41:38 +0800 "Huang\, Ying" <ying.huang@intel.com> wrote:
> >> >> >
> >> >> >> > Why do we need srcu here? Is it enough with rcu like below?
> >> >> >> >
> >> >> >> > It might have a bug/room to be optimized about performance/naming.
> >> >> >> > I just wanted to show my intention.
> >> >> >> 
> >> >> >> Yes.  rcu should work too.  But if we use rcu, it may need to be called
> >> >> >> several times to make sure the swap device under us doesn't go away, for
> >> >> >> example, when checking si->max in __swp_swapcount() and
> >> >> >> add_swap_count_continuation().  And I found we need rcu to protect swap
> >> >> >> cache radix tree array too.  So I think it may be better to use a
> >> >> >> single call to srcu_read_lock/unlock() instead of multiple calls to
> >> >> >> rcu_read_lock/unlock().
> >> >> >
> >> >> > Or use stop_machine() ;)  It's very crude but it sure is simple.  Does
> >> >> > anyone have a swapoff-intensive workload?
> >> >> 
> >> >> Sorry, I don't know how to solve the problem with stop_machine().
> >> >> 
> >> >> The problem we are trying to resolve is that we have a swap entry, but
> >> >> that swap entry can become invalid because of swapoff between the time
> >> >> we check it and the time we use it.  So we need to prevent swapoff from
> >> >> running between the check and the use.
> >> >> 
> >> >> I don't know how to use stop_machine() in swapoff to wait for all users
> >> >> of the swap entry to finish.  Can anyone help me with this?
> >> >
> >> > You can think of stop_machine() as being sort of like a reader-writer
> >> > lock.  The readers can be any section of code with preemption disabled,
> >> > and the writer is the function passed to stop_machine().
> >> >
> >> > Users running real-time applications on Linux don't tend to like
> >> > stop_machine() much, but perhaps it is nevertheless the right tool
> >> > for this particular job.
> >> 
> >> Thanks a lot for the explanation!  Now I understand this.
> >> 
> >> Another question, for this specific problem, I think both stop_machine()
> >> based solution and rcu_read_lock/unlock() + synchronize_rcu() based
> >> solution work.  If so, what is the difference between them?  I guess rcu
> >> based solution will be a little better for real-time applications?  So
> >> what is the advantage of stop_machine() based solution?
> >
> > The stop_machine() solution places similar restrictions on readers as
> > does rcu_read_lock/unlock() + synchronize_rcu(), if that is what you
> > are asking.
> >
> > More precisely, the stop_machine() solution places exactly the
> > same restrictions on readers as does preempt_disable/enable() and
> > synchronize_sched().
> >
> > I would expect stop_machine() to be faster than any of synchronize_rcu(),
> > synchronize_sched(), or synchronize_srcu(), but stop_machine() operates
> > by making each CPU spin with interrupts disabled until all the other CPUs
> > arrive.  This normally does not make real-time people happy.
> >
> > A compromise position is available in the form of
> > synchronize_rcu_expedited() and synchronize_sched_expedited().  These
> > are faster than their non-expedited counterparts, and only momentarily
> > disturb each CPU, rather than spinning with interrupts disabled.  However,
> > stop_machine() is probably a bit faster.
> >
> > Finally, synchronize_srcu_expedited() is reasonably fast, but
> > avoids disturbing other CPUs.  Last I checked, not quite as fast as
> > synchronize_rcu_expedited() and synchronize_sched_expedited(), though.
> >
> > You asked!  ;-)
> 
> Thanks a lot, Paul!  That exceeds my expectations!
> 
> The performance of swapoff() isn't very important, so it's probably not
> necessary to accelerate it at the cost of real-time behavior.  I think it
> is better to use an RCU- or SRCU-based solution.  I think the read-side
> cost should be almost the same for RCU and SRCU?  To use SRCU, we need
> to select CONFIG_SRCU when CONFIG_SWAP is enabled in Kconfig.  I think
> that should be OK?

The thing to do is to try SRCU and see if you can see significant
performance degradation.  Given that there is swapping involved, I
would be surprised if the added read-side overhead of SRCU was even
measurable, but then again I have been surprised before.
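
To make the check-then-use race discussed above concrete, here is a
userspace toy model.  It is not kernel code and it is not how SRCU is
implemented; it only illustrates the shape of the fix: the reader does
its validity check and its use inside one read-side section, and the
"swapoff" side first marks the device invalid and then waits for
in-flight readers before freeing anything.  All names (toy_swap,
toy_swapon, toy_swap_count, toy_swapoff) are made up for illustration.
In the kernel, the read side would be srcu_read_lock()/srcu_read_unlock()
and the wait would be synchronize_srcu().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a swap device; not the kernel's structure. */
struct toy_swap {
	atomic_bool valid;	/* cleared first by toy_swapoff() */
	atomic_int readers;	/* readers currently inside a read-side section */
	unsigned long max;	/* number of slots in map */
	unsigned char *map;	/* per-slot counts, freed by toy_swapoff() */
};

/* Set up a device with nr slots, all counts zero.  False on failure. */
bool toy_swapon(struct toy_swap *si, unsigned long nr)
{
	si->map = calloc(nr, 1);
	if (!si->map)
		return false;
	si->max = nr;
	atomic_init(&si->readers, 0);
	atomic_init(&si->valid, true);
	return true;
}

/* Reader: the validity check and the use share one read-side section. */
int toy_swap_count(struct toy_swap *si, unsigned long offset)
{
	int count = -1;	/* -1 means "entry is not (or no longer) valid" */

	atomic_fetch_add(&si->readers, 1);	/* enter read side */
	if (atomic_load(&si->valid) && offset < si->max)
		count = si->map[offset];
	atomic_fetch_sub(&si->readers, 1);	/* leave read side */
	return count;
}

/* Writer: hide the device, wait for in-flight readers, then free. */
void toy_swapoff(struct toy_swap *si)
{
	atomic_store(&si->valid, false);
	while (atomic_load(&si->readers) != 0)
		;	/* crude stand-in for waiting out a grace period */
	free(si->map);
	si->map = NULL;
	si->max = 0;
}
```

A reader that saw valid == true is guaranteed to have been counted
before the writer samples the counter, so the writer cannot free the map
under it; a reader that arrives later sees valid == false and backs off.
The single shared counter in this toy is touched by every reader, which
is exactly the sort of read-side cost one would want to measure rather
than guess at.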

And yes, just select CONFIG_SRCU when you need it.
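
As a rough sketch only, and assuming the SWAP entry in init/Kconfig
looks roughly like the fragment below (the prompt string and the
dependencies are reproduced from memory, the help text is omitted, and
both may differ in your tree), the only new line would be the select.
Note that Kconfig symbols are written without the CONFIG_ prefix:

```kconfig
config SWAP
	bool "Support for paging of anonymous memory (swap)"
	depends on MMU && BLOCK
	default y
	# Assumed addition: build SRCU whenever swap support is built.
	select SRCU
```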

							Thanx, Paul