All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Hugh Dickins" <hughd@google.com>,
	"Paul E . McKenney" <paulmck@linux.vnet.ibm.com>,
	"Minchan Kim" <minchan@kernel.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Tim Chen" <tim.c.chen@linux.intel.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"David Rientjes" <rientjes@google.com>,
	"Rik van Riel" <riel@redhat.com>, "Jan Kara" <jack@suse.cz>,
	"Dave Jiang" <dave.jiang@intel.com>,
	"Daniel Jordan" <daniel.m.jordan@oracle.com>,
	"Andrea Parri" <andrea.parri@amarulasolutions.com>
Subject: Re: [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap operations
Date: Wed, 13 Feb 2019 21:38:05 -0500	[thread overview]
Message-ID: <20190214023805.GA19090@redhat.com> (raw)
In-Reply-To: <20190211083846.18888-1-ying.huang@intel.com>

Hello everyone,

On Mon, Feb 11, 2019 at 04:38:46PM +0800, Huang, Ying wrote:
> @@ -2386,7 +2463,17 @@ static void enable_swap_info(struct swap_info_struct *p, int prio,
>  	frontswap_init(p->type, frontswap_map);
>  	spin_lock(&swap_lock);
>  	spin_lock(&p->lock);
> -	 _enable_swap_info(p, prio, swap_map, cluster_info);
> +	setup_swap_info(p, prio, swap_map, cluster_info);
> +	spin_unlock(&p->lock);
> +	spin_unlock(&swap_lock);
> +	/*
> +	 * Guarantee swap_map, cluster_info, etc. fields are used
> +	 * between get/put_swap_device() only if SWP_VALID bit is set
> +	 */
> +	stop_machine(swap_onoff_stop, NULL, cpu_online_mask);

Should cpu_online_mask be read while holding cpus_read_lock?

	cpus_read_lock();
	err = __stop_machine(swap_onoff_stop, NULL, cpu_online_mask);
	cpus_read_unlock();

I missed what the exact motivation was for the switch from
rcu_read_lock()/syncrhonize_rcu() to preempt_disable()/stop_machine().

It looks like the above stop_machine all it does is to reach a
quiescent point, when you've RCU that already can reach the quiescent
point without an explicit stop_machine.

The reason both implementations are basically looking the same is that
stop_machine dummy call of swap_onoff_stop() { /* noop */ } will only
reach a quiescent point faster than RCU, but it's otherwise
functionally identical to RCU, but it's extremely more expensive. If
it wasn't functionally identical stop_machine() couldn't be used as a
drop in replacement of synchronize_sched() in the previous patch.

I don't see the point of worrying about the synchronize_rcu latency in
swapoff when RCU is basically identical and not more complex.

So to be clear, I'm not against stop_machine() but with stop_machine()
method invoked in all CPUs, you can actually do more than RCU and you
can remove real locking not just reach a quiescent point.

With stop_machine() the code would need reshuffling around so that the
actual p->swap_map = NULL happens inside stop_machine, not outside
like with RCU.

With RCU all code stays concurrent at all times, simply the race is
controlled, as opposed with stop_machine() you can make fully
serialize and run like in UP temporarily (vs all preempt_disable()
section at least).

For example nr_swapfiles could in theory become a constant under
preempt_disable() with stop_machine() without having to take a
swap_lock.

swap_onoff_stop can be implemented like this:

enum {
	FIRST_STOP_MACHINE_INIT,
	FIRST_STOP_MACHINE_START,
	FIRST_STOP_MACHINE_END,
};
static int first_stop_machine;
static int swap_onoff_stop(void *data)
{
	struct swap_stop_machine *swsm = (struct swap_stop_machine *)data;
	int first;

	first = cmpxchg(&first_stop_machine, FIRST_STOP_MACHINE_INIT,
			FIRST_STOP_MACHINE_START);
	if (first == FIRST_STOP_MACHINE_INIT) {
		swsm->p->swap_map = NULL;
		/* add more stuff here until swap_lock goes away */
		smp_wmb();
		WRITE_ONCE(first_stop_machine, FIRST_STOP_MACHINE_END);
	} else {
		do {
			cpu_relax();
		} while (READ_ONCE(first_stop_machine) !=
			 FIRST_STOP_MACHINE_END);
		smp_rmb();
	}

	return 0;
}

stop_machine invoked with a method like above, will guarantee while we
set p->swap_map to NULL (and while we do nr_swapfiles++) nothing else
can run, no even interrupts, so some lock may just disappear. Only NMI
and SMI could possibly run concurrently with the swsm->p->swap_map =
NULL operation.

If we've to keep swap_onoff_stop() a dummy function run on all CPUs
just to reach a quiescent point, then I don't see why
the synchronize_rcu() (or synchronize_sched or synchronize_kernel or
whatever it is called right now, but still RCU) solution isn't
preferable.

Thanks,
Andrea


  parent reply	other threads:[~2019-02-14  2:38 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-11  8:38 [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap operations Huang, Ying
2019-02-11 19:06 ` Daniel Jordan
2019-02-12  3:21   ` Andrea Parri
2019-02-12  6:47     ` Huang, Ying
2019-02-12 17:58       ` Tim Chen
2019-02-13  3:23         ` Huang, Ying
2019-02-12 20:06     ` Daniel Jordan
2019-02-12  6:40   ` Huang, Ying
2019-02-12 10:13 ` Andrea Parri
2019-02-15  6:34   ` Huang, Ying
2019-02-14  2:38 ` Andrea Arcangeli [this message]
2019-02-14  8:07   ` Huang, Ying
2019-02-14 21:47     ` Andrea Arcangeli
2019-02-15  7:50       ` Huang, Ying
2019-02-14 14:33 ` Michal Hocko
2019-02-14 20:30   ` Andrew Morton
2019-02-14 21:22     ` Andrea Arcangeli
2019-02-15  7:08   ` Huang, Ying
2019-02-15 13:11     ` Michal Hocko
2019-02-18  0:51       ` Huang, Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190214023805.GA19090@redhat.com \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrea.parri@amarulasolutions.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=dave.jiang@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jglisse@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.