Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each possisble CPU

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Ming Lei <ming.lei@redhat.com>
To: "jianchao.wang" <jianchao.w.wang@oracle.com>
Cc: Jens Axboe <axboe@fb.com>,
	linux-block@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Stefan Haberland <sth@linux.vnet.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	James Smart <james.smart@broadcom.com>,
	Keith Busch <keith.busch@intel.com>,
	Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme@lists.infradead.org
Subject: Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each possisble CPU
Date: Wed, 17 Jan 2018 14:22:52 +0800	[thread overview]
Message-ID: <20180117062251.GC9487@ming.t460p> (raw)
In-Reply-To: <8c8efce8-ea02-0a9e-8369-44c885f4731d@oracle.com>

Hi Jianchao,

On Wed, Jan 17, 2018 at 01:24:23PM +0800, jianchao.wang wrote:
> Hi ming
> 
> Thanks for your kindly response.
> 
> On 01/17/2018 11:52 AM, Ming Lei wrote:
> >> It is here.
> >> __blk_mq_run_hw_queue()
> >> ....
> >>     WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
> >>         cpu_online(hctx->next_cpu));
> > I think this warning is triggered after the CPU of hctx->next_cpu becomes
> > online again, and it should have been dealt with by the following trick,
> > and this patch is against the previous one, please test it and see if
> > the warning can be fixed.
> > 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 23f0f3ddffcf..0620ccb65e4e 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -1452,6 +1452,9 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
> >  			tried = true;
> >  			goto select_cpu;
> >  		}
> > +
> > +		/* handle after this CPU of hctx->next_cpu becomes online again */
> > +		hctx->next_cpu_batch = 1;
> >  		return WORK_CPU_UNBOUND;
> >  	}
> >  	return hctx->next_cpu;
> > 
> 
> The WARNING still could be triggered.
> 
> [  282.194532] WARNING: CPU: 0 PID: 222 at /home/will/u04/source_code/linux-block/block/blk-mq.c:1315 __blk_mq_run_hw_queue+0x92/0xa0
> [  282.194534] Modules linked in: ....
> [  282.194561] CPU: 0 PID: 222 Comm: kworker/2:1H Tainted: G        W        4.15.0-rc7+ #17
> 

This warning can't be removed completely, for example, the CPU figured
in blk_mq_hctx_next_cpu(hctx) can be put on again just after the
following call returns and before __blk_mq_run_hw_queue() is scheduled
to run.

	kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work, msecs_to_jiffies(msecs))

Just be curious how you trigger this issue? And is it triggered in CPU
hotplug stress test? Or in a normal use case?

> >> ....
> >>
> >> To eliminate this risk totally, we could blk_mq_hctx_next_cpu return the cpu  even if the cpu is offlined and modify the cpu_online above to cpu_active.
> >> The kworkers of the per-cpu pool must have be migrated back when the cpu is set active.
> >> But there seems to be issues in DASD as your previous comment.
> > Yes, we can't break DASD.
> > 
> >> That is the original version of this patch, and both Christian and Stefan
> >> reported that system can't boot from DASD in this way[2], and I changed
> >> to AND with cpu_online_mask, then their system can boot well
> >> On the other hand, there is also risk in 
> >>
> >> @@ -440,7 +440,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
> >>  		blk_queue_exit(q);
> >>  		return ERR_PTR(-EXDEV);
> >>  	}
> >> -	cpu = cpumask_first(alloc_data.hctx->cpumask);
> >> +	cpu = cpumask_first_and(alloc_data.hctx->cpumask, cpu_online_mask);
> >>  	alloc_data.ctx = __blk_mq_get_ctx(q, cpu);
> >>
> >> what if the cpus in alloc_data.hctx->cpumask are all offlined ?
> > This one is crazy, and is used by NVMe only, it should be fine if
> > the passed 'hctx_idx' is retrieved by the current running CPU, such
> > as the way of blk_mq_map_queue(). But if not, bad thing may happen.
> Even if retrieved by current running cpu, the task still could be migrated away when the cpu is offlined.
> 

I just checked NVMe code, looks only nvmf_connect_io_queue() passes
one specific 'qid' and calls blk_mq_alloc_request_hctx(); and for others,
NVME_QID_ANY is passed, then blk_mq_alloc_request() is called in
nvme_alloc_request().

CC NVMe list and guys.

Hello James, Keith, Christoph and Sagi,

Looks nvmf_connect_io_queue() is run in the following fragile way:

	for (i = 1; i < ctrl->ctrl.queue_count; i++) {
		...
		nvmf_connect_io_queue(&ctrl->ctrl, i);
		...
	}

The hardware queue idx is passed to nvmf_connect_io_queue() directly
and finally blk_mq_alloc_request_hctx() is called to allocate request
for the queue, but all CPUs mapped to this hw queue may become offline
when running in blk_mq_alloc_request_hctx(), so what is the supposed way
to handle this situation?

Is it fine to return failure simply in blk_mq_alloc_request_hctx()
under this situation? But the CPU may become online later...

Thanks,
Ming

WARNING: multiple messages have this Message-ID (diff)

From: ming.lei@redhat.com (Ming Lei)
Subject: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each possisble CPU
Date: Wed, 17 Jan 2018 14:22:52 +0800	[thread overview]
Message-ID: <20180117062251.GC9487@ming.t460p> (raw)
In-Reply-To: <8c8efce8-ea02-0a9e-8369-44c885f4731d@oracle.com>

Hi Jianchao,

On Wed, Jan 17, 2018@01:24:23PM +0800, jianchao.wang wrote:
> Hi ming
> 
> Thanks for your kindly response.
> 
> On 01/17/2018 11:52 AM, Ming Lei wrote:
> >> It is here.
> >> __blk_mq_run_hw_queue()
> >> ....
> >>     WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
> >>         cpu_online(hctx->next_cpu));
> > I think this warning is triggered after the CPU of hctx->next_cpu becomes
> > online again, and it should have been dealt with by the following trick,
> > and this patch is against the previous one, please test it and see if
> > the warning can be fixed.
> > 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 23f0f3ddffcf..0620ccb65e4e 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -1452,6 +1452,9 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
> >  			tried = true;
> >  			goto select_cpu;
> >  		}
> > +
> > +		/* handle after this CPU of hctx->next_cpu becomes online again */
> > +		hctx->next_cpu_batch = 1;
> >  		return WORK_CPU_UNBOUND;
> >  	}
> >  	return hctx->next_cpu;
> > 
> 
> The WARNING still could be triggered.
> 
> [  282.194532] WARNING: CPU: 0 PID: 222 at /home/will/u04/source_code/linux-block/block/blk-mq.c:1315 __blk_mq_run_hw_queue+0x92/0xa0
> [  282.194534] Modules linked in: ....
> [  282.194561] CPU: 0 PID: 222 Comm: kworker/2:1H Tainted: G        W        4.15.0-rc7+ #17
> 

This warning can't be removed completely, for example, the CPU figured
in blk_mq_hctx_next_cpu(hctx) can be put on again just after the
following call returns and before __blk_mq_run_hw_queue() is scheduled
to run.

	kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work, msecs_to_jiffies(msecs))

Just be curious how you trigger this issue? And is it triggered in CPU
hotplug stress test? Or in a normal use case?

> >> ....
> >>
> >> To eliminate this risk totally, we could blk_mq_hctx_next_cpu return the cpu  even if the cpu is offlined and modify the cpu_online above to cpu_active.
> >> The kworkers of the per-cpu pool must have be migrated back when the cpu is set active.
> >> But there seems to be issues in DASD as your previous comment.
> > Yes, we can't break DASD.
> > 
> >> That is the original version of this patch, and both Christian and Stefan
> >> reported that system can't boot from DASD in this way[2], and I changed
> >> to AND with cpu_online_mask, then their system can boot well
> >> On the other hand, there is also risk in 
> >>
> >> @@ -440,7 +440,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
> >>  		blk_queue_exit(q);
> >>  		return ERR_PTR(-EXDEV);
> >>  	}
> >> -	cpu = cpumask_first(alloc_data.hctx->cpumask);
> >> +	cpu = cpumask_first_and(alloc_data.hctx->cpumask, cpu_online_mask);
> >>  	alloc_data.ctx = __blk_mq_get_ctx(q, cpu);
> >>
> >> what if the cpus in alloc_data.hctx->cpumask are all offlined ?
> > This one is crazy, and is used by NVMe only, it should be fine if
> > the passed 'hctx_idx' is retrieved by the current running CPU, such
> > as the way of blk_mq_map_queue(). But if not, bad thing may happen.
> Even if retrieved by current running cpu, the task still could be migrated away when the cpu is offlined.
> 

I just checked NVMe code, looks only nvmf_connect_io_queue() passes
one specific 'qid' and calls blk_mq_alloc_request_hctx(); and for others,
NVME_QID_ANY is passed, then blk_mq_alloc_request() is called in
nvme_alloc_request().

CC NVMe list and guys.

Hello James, Keith, Christoph and Sagi,

Looks nvmf_connect_io_queue() is run in the following fragile way:

	for (i = 1; i < ctrl->ctrl.queue_count; i++) {
		...
		nvmf_connect_io_queue(&ctrl->ctrl, i);
		...
	}

The hardware queue idx is passed to nvmf_connect_io_queue() directly
and finally blk_mq_alloc_request_hctx() is called to allocate request
for the queue, but all CPUs mapped to this hw queue may become offline
when running in blk_mq_alloc_request_hctx(), so what is the supposed way
to handle this situation?

Is it fine to return failure simply in blk_mq_alloc_request_hctx()
under this situation? But the CPU may become online later...

Thanks,
Ming

next prev parent reply	other threads:[~2018-01-17  6:23 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-12  2:53 [PATCH 0/2] blk-mq: support physical CPU hotplug Ming Lei
2018-01-12  2:53 ` [PATCH 1/2] genirq/affinity: assign vectors to all possible CPUs Ming Lei
2018-01-12 19:35   ` Thomas Gleixner
2018-01-12  2:53 ` [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each possisble CPU Ming Lei
2018-01-16 10:00   ` Stefan Haberland
2018-01-16 10:12   ` jianchao.wang
2018-01-16 12:10     ` Ming Lei
2018-01-16 14:31       ` jianchao.wang
2018-01-16 15:11         ` jianchao.wang
2018-01-16 15:32         ` Ming Lei
2018-01-17  2:56           ` jianchao.wang
2018-01-17  3:52             ` Ming Lei
2018-01-17  5:24               ` jianchao.wang
2018-01-17  6:22                 ` Ming Lei [this message]
2018-01-17  6:22                   ` Ming Lei
2018-01-17  8:09                   ` jianchao.wang
2018-01-17  8:09                     ` jianchao.wang
2018-01-17  9:57                     ` Ming Lei
2018-01-17  9:57                       ` Ming Lei
2018-01-17 10:07                       ` Christian Borntraeger
2018-01-17 10:07                         ` Christian Borntraeger
2018-01-17 10:14                         ` Christian Borntraeger
2018-01-17 10:14                           ` Christian Borntraeger
2018-01-17 10:17                         ` Ming Lei
2018-01-17 10:17                           ` Ming Lei
2018-01-19  3:05                       ` jianchao.wang
2018-01-19  3:05                         ` jianchao.wang
2018-01-26  9:31                         ` Ming Lei
2018-01-26  9:31                           ` Ming Lei
2018-01-12  8:12 ` [PATCH 0/2] blk-mq: support physical CPU hotplug Christian Borntraeger
2018-01-12 10:47   ` Johannes Thumshirn
2018-01-12 10:47     ` Johannes Thumshirn
2018-01-12 18:02 ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180117062251.GC9487@ming.t460p \
    --to=ming.lei@redhat.com \
    --cc=axboe@fb.com \
    --cc=borntraeger@de.ibm.com \
    --cc=hch@infradead.org \
    --cc=hch@lst.de \
    --cc=james.smart@broadcom.com \
    --cc=jianchao.w.wang@oracle.com \
    --cc=keith.busch@intel.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    --cc=sth@linux.vnet.ibm.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.