Re: [PATCH V2] lib/group_cpus.c: avoid to acquire cpu hotplug lock in group_cpus_evenly


From: Ming Lei <ming.lei@redhat.com>
To: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Jens Axboe <axboe@kernel.dk>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, Keith Busch <kbusch@kernel.org>,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	Yi Zhang <yi.zhang@redhat.com>,
	Guangwu Zhang <guazhang@redhat.com>
Subject: Re: [PATCH V2] lib/group_cpus.c: avoid to acquire cpu hotplug lock in group_cpus_evenly
Date: Fri, 18 Aug 2023 21:58:36 +0800
Message-ID: <ZN95DCe2Ipt2FW75@fedora>
In-Reply-To: <a60de9ff-6dad-f243-6bd0-56810ef57c85@linux.dev>

On Fri, Aug 18, 2023 at 02:59:13PM +0800, Chengming Zhou wrote:
> Hi,
> 
> On 2023/8/18 09:52, Ming Lei wrote:
> > group_cpus_evenly() can be called from a storage driver's error handler,
> > such as the nvme driver's, which may run during CPU hotplug. In that case
> > the storage queue has to drain its pending IOs because all CPUs associated
> > with the queue are offline and the queue is becoming inactive, and
> > handling those IOs needs the error handler to provide forward progress.
> > 
> > This causes a deadlock:
> > 
> > 1) inside CPU hotplug handler, CPU hotplug lock is held, and blk-mq's
> > handler is waiting for inflight IO
> > 
> > 2) error handler is waiting for CPU hotplug lock
> > 
> > 3) inflight IO can't be completed in blk-mq's CPU hotplug handler because
> > error handling can't provide forward progress.
> > 
> > Solve the deadlock by not holding the CPU hotplug lock in group_cpus_evenly(),
> > which spreads CPUs in two stages: 1) the 1st stage covers all present
> > CPUs; 2) the 2nd stage covers all other CPUs.
> > 
> > It turns out the two-stage spread only needs a consistent view of
> > 'cpu_present_mask', so the CPU hotplug lock can be dropped by copying the
> > mask into a local cache. This does not affect correctness, because all
> > CPUs are still covered.
> > 
> > Cc: Keith Busch <kbusch@kernel.org>
> > Cc: linux-nvme@lists.infradead.org
> > Cc: linux-block@vger.kernel.org
> > Reported-by: Yi Zhang <yi.zhang@redhat.com>
> > Reported-by: Guangwu Zhang <guazhang@redhat.com>
> > Tested-by: Guangwu Zhang <guazhang@redhat.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > V2:
> > 	- fix "Cc: block list"
> > 	- add tested-by tag
> > 
> >  lib/group_cpus.c | 22 ++++++++++++++++------
> >  1 file changed, 16 insertions(+), 6 deletions(-)
> > 
> > diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> > index aa3f6815bb12..15006e79196f 100644
> > --- a/lib/group_cpus.c
> > +++ b/lib/group_cpus.c
> > @@ -348,6 +348,7 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
> >  {
> >  	unsigned int curgrp = 0, nr_present = 0, nr_others = 0;
> >  	cpumask_var_t *node_to_cpumask;
> > +	cpumask_var_t local_cpu_present_mask;
> >  	cpumask_var_t nmsk, npresmsk;
> >  	int ret = -ENOMEM;
> >  	struct cpumask *masks = NULL;
> > @@ -355,6 +356,16 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
> >  	if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
> >  		return NULL;
> >  
> > +	if (!zalloc_cpumask_var(&local_cpu_present_mask, GFP_KERNEL))
> > +		goto fail_local_pres_mask;
> > +
> > +	/*
> > +	 * Make a local cache of 'cpu_present_mask', so the two stages
> > +	 * spread can observe consistent 'cpu_present_mask' without holding
> > +	 * cpu hotplug lock.
> > +	 */
> > +	cpumask_copy(local_cpu_present_mask, cpu_present_mask);
> > +
> 
> Maybe we can reuse npresmsk instead of allocating another cpumask?
> In the first stage: npresmsk = cpu_present_mask
> In the second stage: npresmsk = cpu_possible_mask & ~npresmsk

Good idea!
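
A minimal userspace sketch of that suggestion, assuming a 64-bit integer
stands in for a cpumask. The names mask_t, struct stage_masks and
split_stages() are hypothetical and exist only for illustration; they are
not part of lib/group_cpus.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel cpumask: one bit per CPU. */
typedef uint64_t mask_t;

struct stage_masks {
	mask_t first;	/* stage 1: snapshot of the present CPUs */
	mask_t second;	/* stage 2: possible CPUs outside that snapshot */
};

/*
 * 'present' is read once into npresmsk, and the same variable is then
 * reused to hold the second-stage mask, so no extra mask is needed.
 */
struct stage_masks split_stages(mask_t possible, mask_t present)
{
	struct stage_masks s;
	mask_t npresmsk;

	/* First stage: npresmsk = cpu_present_mask (snapshot). */
	npresmsk = present;
	s.first = npresmsk;

	/* Second stage: npresmsk = cpu_possible_mask & ~npresmsk. */
	npresmsk = possible & ~npresmsk;
	s.second = npresmsk;

	/* The two stages never overlap and together cover every possible CPU. */
	assert((s.first & s.second) == 0);
	assert(((s.first | s.second) & possible) == possible);

	return s;
}
```

In the kernel this would presumably map to cpumask_copy() for the first
stage and cpumask_andnot() for the second, both operating on npresmsk.
Because the second stage is derived from the snapshot rather than from a
second read of cpu_present_mask, a CPU that becomes present in between
still lands in exactly one stage.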


Thanks, 
Ming

