From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Hugh Dickins <hughd@google.com>, Minchan Kim <minchan@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Tim Chen <tim.c.chen@linux.intel.com>, Shaohua Li <shli@fb.com>,
Mel Gorman <mgorman@techsingularity.net>,
JXrXme Glisse <jglisse@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
David Rientjes <rientjes@google.com>,
Rik van Riel <riel@redhat.com>, Jan Kara <jack@suse.cz>,
Dave Jiang <dave.jiang@intel.com>, Aaron Lu <aaron.lu@intel.com>
Subject: Re: [PATCH -mm -V2] mm, swap: Fix race between swapoff and some swap operations
Date: Thu, 14 Dec 2017 21:49:29 -0800 [thread overview]
Message-ID: <20171215054929.GW7829@linux.vnet.ibm.com> (raw)
In-Reply-To: <871sjwn5bk.fsf@yhuang-dev.intel.com>
On Fri, Dec 15, 2017 at 09:33:03AM +0800, Huang, Ying wrote:
> Michal Hocko <mhocko@kernel.org> writes:
>
> > On Thu 14-12-17 21:38:32, Huang, Ying wrote:
> >> From: Huang Ying <ying.huang@intel.com>
> >>
> >> When the swapin is performed, after getting the swap entry information
> >> from the page table, system will swap in the swap entry, without any
> >> lock held to prevent the swap device from being swapoff. This may
> >> cause the race like below,
> >>
> >> CPU 1 CPU 2
> >> ----- -----
> >> do_swap_page
> >> swapin_readahead
> >> __read_swap_cache_async
> >> swapoff swapcache_prepare
> >> p->swap_map = NULL __swap_duplicate
> >> p->swap_map[?] /* !!! NULL pointer access */
> >>
> >> Because swap off is usually done when system shutdown only, the race
> >> may not hit many people in practice. But it is still a race need to
> >> be fixed.
> >>
> >> To fix the race, get_swap_device() is added to prevent swap device
> >> from being swapoff until put_swap_device() is called. When
> >> get_swap_device() is called, the caller should have some locks (like
> >> PTL, page lock, or swap_info_struct->lock) held to guarantee the swap
> >> entry is valid, or check the origin of swap entry again to make sure
> >> the swap device hasn't been swapoff already.
> >>
> >> Because swapoff() is very race code path, to make the normal path runs
> >
> > s@race@rare@ I suppose
>
> Oops, thanks for pointing this out!
>
> >> as fast as possible, SRCU instead of reference count is used to
> >> implement get/put_swap_device(). From get_swap_device() to
> >> put_swap_device(), the reader side of SRCU is locked, so
> >> synchronize_srcu() in swapoff() will wait until put_swap_device() is
> >> called.
> >
> > It is quite unfortunate to pull SRCU as a dependency to the core kernel.
> > Different attempts to do this have failed in the past. This one is
> > slightly different though because I would suspect that those tiny
> > systems do not configure swap. But who knows, maybe they do.
>
> I remember Paul said there is a tiny implementation of SRCU which can
> fit this requirement.
>
> Hi, Paul, whether my memory is correct?
Yes, if you build with CONFIG_SMP=n, then you will get Tiny SRCU, which
is quite compact.
Thanx, Paul
> > Anyway, if you are worried about performance then I would expect some
> > numbers to back that worry. So why don't simply start with simpler
> > ref count based and then optimize it later based on some actual numbers.
>
> My -V1 is based on ref count. I think the performance difference should
> be not measurable. The idea is that swapoff() is so rare, so we should
> accelerate normal path as much as possible, even if this will cause slow
> down in swapoff. If we cannot use SRCU in the end, we may try RCU,
> preempt off (for stop_machine()), etc.
>
> > Btw. have you considered pcp refcount framework. I would suspect that
> > this would give you close to SRCU performance.
>
> No. I think pcp refcount doesn't fit here. You should hold a initial
> refcount for pcp refcount, it isn't the case here.
>
> Best Regards,
> Huang, Ying
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Hugh Dickins <hughd@google.com>, Minchan Kim <minchan@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Tim Chen <tim.c.chen@linux.intel.com>, Shaohua Li <shli@fb.com>,
Mel Gorman <mgorman@techsingularity.net>,
JXrXme Glisse <jglisse@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
David Rientjes <rientjes@google.com>,
Rik van Riel <riel@redhat.com>, Jan Kara <jack@suse.cz>,
Dave Jiang <dave.jiang@intel.com>, Aaron Lu <aaron.lu@intel.com>
Subject: Re: [PATCH -mm -V2] mm, swap: Fix race between swapoff and some swap operations
Date: Thu, 14 Dec 2017 21:49:29 -0800 [thread overview]
Message-ID: <20171215054929.GW7829@linux.vnet.ibm.com> (raw)
In-Reply-To: <871sjwn5bk.fsf@yhuang-dev.intel.com>
On Fri, Dec 15, 2017 at 09:33:03AM +0800, Huang, Ying wrote:
> Michal Hocko <mhocko@kernel.org> writes:
>
> > On Thu 14-12-17 21:38:32, Huang, Ying wrote:
> >> From: Huang Ying <ying.huang@intel.com>
> >>
> >> When the swapin is performed, after getting the swap entry information
> >> from the page table, system will swap in the swap entry, without any
> >> lock held to prevent the swap device from being swapoff. This may
> >> cause the race like below,
> >>
> >> CPU 1 CPU 2
> >> ----- -----
> >> do_swap_page
> >> swapin_readahead
> >> __read_swap_cache_async
> >> swapoff swapcache_prepare
> >> p->swap_map = NULL __swap_duplicate
> >> p->swap_map[?] /* !!! NULL pointer access */
> >>
> >> Because swap off is usually done when system shutdown only, the race
> >> may not hit many people in practice. But it is still a race need to
> >> be fixed.
> >>
> >> To fix the race, get_swap_device() is added to prevent swap device
> >> from being swapoff until put_swap_device() is called. When
> >> get_swap_device() is called, the caller should have some locks (like
> >> PTL, page lock, or swap_info_struct->lock) held to guarantee the swap
> >> entry is valid, or check the origin of swap entry again to make sure
> >> the swap device hasn't been swapoff already.
> >>
> >> Because swapoff() is very race code path, to make the normal path runs
> >
> > s@race@rare@ I suppose
>
> Oops, thanks for pointing this out!
>
> >> as fast as possible, SRCU instead of reference count is used to
> >> implement get/put_swap_device(). From get_swap_device() to
> >> put_swap_device(), the reader side of SRCU is locked, so
> >> synchronize_srcu() in swapoff() will wait until put_swap_device() is
> >> called.
> >
> > It is quite unfortunate to pull SRCU as a dependency to the core kernel.
> > Different attempts to do this have failed in the past. This one is
> > slightly different though because I would suspect that those tiny
> > systems do not configure swap. But who knows, maybe they do.
>
> I remember Paul said there is a tiny implementation of SRCU which can
> fit this requirement.
>
> Hi, Paul, whether my memory is correct?
Yes, if you build with CONFIG_SMP=n, then you will get Tiny SRCU, which
is quite compact.
Thanx, Paul
> > Anyway, if you are worried about performance then I would expect some
> > numbers to back that worry. So why don't simply start with simpler
> > ref count based and then optimize it later based on some actual numbers.
>
> My -V1 is based on ref count. I think the performance difference should
> be not measurable. The idea is that swapoff() is so rare, so we should
> accelerate normal path as much as possible, even if this will cause slow
> down in swapoff. If we cannot use SRCU in the end, we may try RCU,
> preempt off (for stop_machine()), etc.
>
> > Btw. have you considered pcp refcount framework. I would suspect that
> > this would give you close to SRCU performance.
>
> No. I think pcp refcount doesn't fit here. You should hold a initial
> refcount for pcp refcount, it isn't the case here.
>
> Best Regards,
> Huang, Ying
>
next prev parent reply other threads:[~2017-12-15 5:49 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-14 13:38 [PATCH -mm -V2] mm, swap: Fix race between swapoff and some swap operations Huang, Ying
2017-12-14 13:38 ` Huang, Ying
2017-12-14 15:17 ` Michal Hocko
2017-12-14 15:17 ` Michal Hocko
2017-12-14 20:42 ` Andrew Morton
2017-12-14 20:42 ` Andrew Morton
2017-12-15 1:57 ` Huang, Ying
2017-12-15 10:04 ` Michal Hocko
2017-12-15 10:04 ` Michal Hocko
2017-12-15 22:27 ` Andrew Morton
2017-12-15 22:27 ` Andrew Morton
2017-12-15 1:33 ` Huang, Ying
2017-12-15 5:49 ` Paul E. McKenney [this message]
2017-12-15 5:49 ` Paul E. McKenney
2017-12-15 6:37 ` Huang, Ying
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171215054929.GW7829@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=aarcange@redhat.com \
--cc=aaron.lu@intel.com \
--cc=akpm@linux-foundation.org \
--cc=dave.jiang@intel.com \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=jack@suse.cz \
--cc=jglisse@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=mhocko@kernel.org \
--cc=minchan@kernel.org \
--cc=riel@redhat.com \
--cc=rientjes@google.com \
--cc=shli@fb.com \
--cc=tim.c.chen@linux.intel.com \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.