All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Bart Van Assche <bvanassche@acm.org>
Cc: Sagi Grimberg <sagi@grimberg.me>, Jens Axboe <axboe@kernel.dk>,
	James Smart <james.smart@broadcom.com>,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	Keith Busch <kbusch@kernel.org>, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH RFC 0/3] blk-mq/nvme: use blk_mq_alloc_request() for NVMe's connect request
Date: Sun, 17 Nov 2019 12:12:33 +0800	[thread overview]
Message-ID: <20191117041233.GA30615@ming.t460p> (raw)
In-Reply-To: <4a39a98e-19bc-0a9a-3d92-ceab2c656037@acm.org>

On Sat, Nov 16, 2019 at 05:24:05PM -0800, Bart Van Assche wrote:
> On 2019-11-15 23:17, Ming Lei wrote:
> > Now blk-mq takes a static queue mapping between CPU and hw queues, given
> > CPU hotplug may happen any time, so the specified hw queue may become
> > inactive any time.
> 
> Hi Ming,
> 
> I can trigger a race between blk_mq_alloc_request_hctx() and
> CPU-hotplugging by running blktests. The patch below fixes that race
> on my setup. Does this patch also fix the race(s) that you ran into?

The following problem has been triggered in my regular test for years,
is it same with yours?

[ 2248.751675] nvme nvme1: creating 2 I/O queues.
[ 2248.752351] BUG: unable to handle page fault for address: 0000607d064434a8
[ 2248.753348] #PF: supervisor write access in kernel mode
[ 2248.754106] #PF: error_code(0x0002) - not-present page
[ 2248.754846] PGD 0 P4D 0
[ 2248.755230] Oops: 0002 [#1] PREEMPT SMP PTI
[ 2248.755838] CPU: 7 PID: 16293 Comm: kworker/u18:3 Not tainted 5.4.0-rc7_96b95eff4a59_master+ #1
[ 2248.757089] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
[ 2248.758863] Workqueue: nvme-reset-wq nvme_loop_reset_ctrl_work [nvme_loop]
[ 2248.759857] RIP: 0010:blk_mq_get_request+0x2a8/0x31c
[ 2248.760654] Code: c7 83 08 01 00 00 00 00 00 00 48 c7 83 10 01 00 00 00 00 00 00 48 8b 55 18 45 84 ed 74 0c 31 c0 41 81 e5 00 08 06 00 0f 95 c0 <48> ff 44 c2 68 c7 83 d4 00 00 00 01 00 00 00 f7 45 10 00 00 06 00
[ 2248.763375] RSP: 0018:ffffc900012dbc80 EFLAGS: 00010246
[ 2248.764130] RAX: 0000000000000000 RBX: ffff888170d70000 RCX: 0000000000000017
[ 2248.765156] RDX: 0000607d06443440 RSI: 0000020bb36c554e RDI: 0000020bb3837c3f
[ 2248.766034] RBP: ffffc900012dbcc0 R08: 00000000f461df07 R09: 00000000000000a8
[ 2248.767084] R10: ffffc900012dbe50 R11: 0000000000000002 R12: 0000000000000000
[ 2248.768109] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 2248.769134] FS:  0000000000000000(0000) GS:ffff88827bd80000(0000) knlGS:0000000000000000
[ 2248.770294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2248.771125] CR2: 0000607d064434a8 CR3: 0000000272866001 CR4: 0000000000760ee0
[ 2248.772152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2248.773179] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2248.774204] PKRU: 55555554
[ 2248.774603] Call Trace:
[ 2248.774983]  blk_mq_alloc_request_hctx+0xc5/0x10e
[ 2248.775674]  nvme_alloc_request+0x42/0x71
[ 2248.776263]  __nvme_submit_sync_cmd+0x49/0x1b2
[ 2248.776910]  nvmf_connect_io_queue+0x12c/0x195 [nvme_fabrics]
[ 2248.777663]  ? nvme_loop_connect_io_queues+0x2f/0x54 [nvme_loop]
[ 2248.778481]  nvme_loop_connect_io_queues+0x2f/0x54 [nvme_loop]
[ 2248.779325]  nvme_loop_reset_ctrl_work+0x62/0xd4 [nvme_loop]
[ 2248.780144]  process_one_work+0x1a8/0x2a1
[ 2248.780727]  ? process_scheduled_works+0x2c/0x2c
[ 2248.781398]  process_scheduled_works+0x27/0x2c
[ 2248.782046]  worker_thread+0x1b1/0x23f
[ 2248.782594]  kthread+0xf5/0xfa
[ 2248.783048]  ? kthread_unpark+0x62/0x62
[ 2248.783608]  ret_from_fork+0x35/0x40

> 
> Thanks,
> 
> Bart.
> 
> 
> Subject: [PATCH] blk-mq: Fix a race between blk_mq_alloc_request_hctx() and
>  CPU hot-plugging
> 
> ---
>  block/blk-mq.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 20a71dcdc339..16057aa2307f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -442,13 +442,15 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
>  	if (WARN_ON_ONCE(!(flags & BLK_MQ_REQ_NOWAIT)))
>  		return ERR_PTR(-EINVAL);
> 
> -	if (hctx_idx >= q->nr_hw_queues)
> -		return ERR_PTR(-EIO);
> -
>  	ret = blk_queue_enter(q, flags);
>  	if (ret)
>  		return ERR_PTR(ret);
> 
> +	if (hctx_idx >= q->nr_hw_queues) {
> +		blk_queue_exit(q);
> +		return ERR_PTR(-EIO);
> +	}
> +

Not sure how this patch can make a difference since blk_queue_enter()
never checks if hctx is active, the problem is that the hctx represented
by 'hctx_idx' becomes inactive when calling blk_mq_alloc_request_hctx()(
all CPUs of this hctx becomes offline).

The problem simply is in the following code:

        cpu = cpumask_first_and(alloc_data.hctx->cpumask, cpu_online_mask);
        alloc_data.ctx = __blk_mq_get_ctx(q, cpu);

        rq = blk_mq_get_request(q, NULL, &alloc_data);
        blk_queue_exit(q);

'cpu' becomes 'nr_cpu_ids', then kernel oops will be triggered in
blk_mq_get_request().


Thanks,
Ming


WARNING: multiple messages have this Message-ID (diff)
From: Ming Lei <ming.lei@redhat.com>
To: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Axboe <axboe@kernel.dk>, Sagi Grimberg <sagi@grimberg.me>,
	James Smart <james.smart@broadcom.com>,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	Keith Busch <kbusch@kernel.org>, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH RFC 0/3] blk-mq/nvme: use blk_mq_alloc_request() for NVMe's connect request
Date: Sun, 17 Nov 2019 12:12:33 +0800	[thread overview]
Message-ID: <20191117041233.GA30615@ming.t460p> (raw)
In-Reply-To: <4a39a98e-19bc-0a9a-3d92-ceab2c656037@acm.org>

On Sat, Nov 16, 2019 at 05:24:05PM -0800, Bart Van Assche wrote:
> On 2019-11-15 23:17, Ming Lei wrote:
> > Now blk-mq takes a static queue mapping between CPU and hw queues, given
> > CPU hotplug may happen any time, so the specified hw queue may become
> > inactive any time.
> 
> Hi Ming,
> 
> I can trigger a race between blk_mq_alloc_request_hctx() and
> CPU-hotplugging by running blktests. The patch below fixes that race
> on my setup. Does this patch also fix the race(s) that you ran into?

The following problem has been triggered in my regular test for years,
is it same with yours?

[ 2248.751675] nvme nvme1: creating 2 I/O queues.
[ 2248.752351] BUG: unable to handle page fault for address: 0000607d064434a8
[ 2248.753348] #PF: supervisor write access in kernel mode
[ 2248.754106] #PF: error_code(0x0002) - not-present page
[ 2248.754846] PGD 0 P4D 0
[ 2248.755230] Oops: 0002 [#1] PREEMPT SMP PTI
[ 2248.755838] CPU: 7 PID: 16293 Comm: kworker/u18:3 Not tainted 5.4.0-rc7_96b95eff4a59_master+ #1
[ 2248.757089] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
[ 2248.758863] Workqueue: nvme-reset-wq nvme_loop_reset_ctrl_work [nvme_loop]
[ 2248.759857] RIP: 0010:blk_mq_get_request+0x2a8/0x31c
[ 2248.760654] Code: c7 83 08 01 00 00 00 00 00 00 48 c7 83 10 01 00 00 00 00 00 00 48 8b 55 18 45 84 ed 74 0c 31 c0 41 81 e5 00 08 06 00 0f 95 c0 <48> ff 44 c2 68 c7 83 d4 00 00 00 01 00 00 00 f7 45 10 00 00 06 00
[ 2248.763375] RSP: 0018:ffffc900012dbc80 EFLAGS: 00010246
[ 2248.764130] RAX: 0000000000000000 RBX: ffff888170d70000 RCX: 0000000000000017
[ 2248.765156] RDX: 0000607d06443440 RSI: 0000020bb36c554e RDI: 0000020bb3837c3f
[ 2248.766034] RBP: ffffc900012dbcc0 R08: 00000000f461df07 R09: 00000000000000a8
[ 2248.767084] R10: ffffc900012dbe50 R11: 0000000000000002 R12: 0000000000000000
[ 2248.768109] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 2248.769134] FS:  0000000000000000(0000) GS:ffff88827bd80000(0000) knlGS:0000000000000000
[ 2248.770294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2248.771125] CR2: 0000607d064434a8 CR3: 0000000272866001 CR4: 0000000000760ee0
[ 2248.772152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2248.773179] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2248.774204] PKRU: 55555554
[ 2248.774603] Call Trace:
[ 2248.774983]  blk_mq_alloc_request_hctx+0xc5/0x10e
[ 2248.775674]  nvme_alloc_request+0x42/0x71
[ 2248.776263]  __nvme_submit_sync_cmd+0x49/0x1b2
[ 2248.776910]  nvmf_connect_io_queue+0x12c/0x195 [nvme_fabrics]
[ 2248.777663]  ? nvme_loop_connect_io_queues+0x2f/0x54 [nvme_loop]
[ 2248.778481]  nvme_loop_connect_io_queues+0x2f/0x54 [nvme_loop]
[ 2248.779325]  nvme_loop_reset_ctrl_work+0x62/0xd4 [nvme_loop]
[ 2248.780144]  process_one_work+0x1a8/0x2a1
[ 2248.780727]  ? process_scheduled_works+0x2c/0x2c
[ 2248.781398]  process_scheduled_works+0x27/0x2c
[ 2248.782046]  worker_thread+0x1b1/0x23f
[ 2248.782594]  kthread+0xf5/0xfa
[ 2248.783048]  ? kthread_unpark+0x62/0x62
[ 2248.783608]  ret_from_fork+0x35/0x40

> 
> Thanks,
> 
> Bart.
> 
> 
> Subject: [PATCH] blk-mq: Fix a race between blk_mq_alloc_request_hctx() and
>  CPU hot-plugging
> 
> ---
>  block/blk-mq.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 20a71dcdc339..16057aa2307f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -442,13 +442,15 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
>  	if (WARN_ON_ONCE(!(flags & BLK_MQ_REQ_NOWAIT)))
>  		return ERR_PTR(-EINVAL);
> 
> -	if (hctx_idx >= q->nr_hw_queues)
> -		return ERR_PTR(-EIO);
> -
>  	ret = blk_queue_enter(q, flags);
>  	if (ret)
>  		return ERR_PTR(ret);
> 
> +	if (hctx_idx >= q->nr_hw_queues) {
> +		blk_queue_exit(q);
> +		return ERR_PTR(-EIO);
> +	}
> +

Not sure how this patch can make a difference since blk_queue_enter()
never checks if hctx is active, the problem is that the hctx represented
by 'hctx_idx' becomes inactive when calling blk_mq_alloc_request_hctx()(
all CPUs of this hctx becomes offline).

The problem simply is in the following code:

        cpu = cpumask_first_and(alloc_data.hctx->cpumask, cpu_online_mask);
        alloc_data.ctx = __blk_mq_get_ctx(q, cpu);

        rq = blk_mq_get_request(q, NULL, &alloc_data);
        blk_queue_exit(q);

'cpu' becomes 'nr_cpu_ids', then kernel oops will be triggered in
blk_mq_get_request().


Thanks,
Ming


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

  reply	other threads:[~2019-11-17  4:12 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-15 10:42 [PATCH RFC 0/3] blk-mq/nvme: use blk_mq_alloc_request() for NVMe's connect request Ming Lei
2019-11-15 10:42 ` Ming Lei
2019-11-15 10:42 ` [PATCH RFC 1/3] block: reuse one scheduler/flush field for private request's data Ming Lei
2019-11-15 10:42   ` Ming Lei
2019-11-15 10:42 ` [PATCH RFC 2/3] nvme: don't use blk_mq_alloc_request_hctx() for allocating connect request Ming Lei
2019-11-15 10:42   ` Ming Lei
2019-11-15 10:42 ` [PATCH RFC 3/3] blk-mq: kill blk_mq_alloc_request_hctx() Ming Lei
2019-11-15 10:42   ` Ming Lei
2019-11-15 22:38 ` [PATCH RFC 0/3] blk-mq/nvme: use blk_mq_alloc_request() for NVMe's connect request Sagi Grimberg
2019-11-15 22:38   ` Sagi Grimberg
2019-11-16  7:17   ` Ming Lei
2019-11-16  7:17     ` Ming Lei
2019-11-17  1:24     ` Bart Van Assche
2019-11-17  1:24       ` Bart Van Assche
2019-11-17  4:12       ` Ming Lei [this message]
2019-11-17  4:12         ` Ming Lei
2019-11-18 23:27         ` Bart Van Assche
2019-11-18 23:27           ` Bart Van Assche
2019-11-19  0:05     ` Sagi Grimberg
2019-11-19  0:05       ` Sagi Grimberg
2019-11-19  0:34       ` Keith Busch
2019-11-19  0:34         ` Keith Busch
2019-11-19  1:43         ` Sagi Grimberg
2019-11-19  1:43           ` Sagi Grimberg
2019-11-19  2:38         ` Ming Lei
2019-11-19  2:38           ` Ming Lei
2019-11-19  2:33       ` Ming Lei
2019-11-19  2:33         ` Ming Lei
2019-11-19 17:56       ` James Smart
2019-11-19 17:56         ` James Smart
2019-11-20  6:35         ` Ming Lei
2019-11-20  6:35           ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191117041233.GA30615@ming.t460p \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=hch@lst.de \
    --cc=james.smart@broadcom.com \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.