From: Ming Lei <ming.lei@redhat.com>
To: "jianchao.wang" <jianchao.w.wang@oracle.com>
Cc: Jens Axboe <axboe@fb.com>,
	linux-block@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Stefan Haberland <sth@linux.vnet.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each possisble CPU
Date: Tue, 16 Jan 2018 23:32:54 +0800
Message-ID: <20180116153248.GA3018@ming.t460p>
In-Reply-To: <7c24e321-2d3b-cdec-699a-f58c34300aa9@oracle.com>

On Tue, Jan 16, 2018 at 10:31:42PM +0800, jianchao.wang wrote:
> Hi minglei
> 
> On 01/16/2018 08:10 PM, Ming Lei wrote:
> >>> -		next_cpu = cpumask_next(hctx->next_cpu, hctx->cpumask);
> >>> +		next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask,
> >>> +				cpu_online_mask);
> >>>  		if (next_cpu >= nr_cpu_ids)
> >>> -			next_cpu = cpumask_first(hctx->cpumask);
> >>> +			next_cpu = cpumask_first_and(hctx->cpumask,cpu_online_mask);
> >> next_cpu here could be >= nr_cpu_ids when none of the CPUs in hctx->cpumask is online.
> > That is not supposed to happen, because a storage device (blk-mq hw queue)
> > generally follows a C/S model, which means the queue only becomes active
> > when there is an online CPU mapped to it.
> > 
> > But that won't be true for non-block-IO queues, such as HPSA's queues[1] and
> > network controller RX queues.
> > 
> > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__marc.info_-3Fl-3Dlinux-2Dkernel-26m-3D151601867018444-26w-3D2&d=DwIBAg&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ&m=tCZdQH6JUW1dkNCN92ycoUoKfDU_qWj-7EsUoYpOeJ0&s=vgHC9sbjYQb7mtY9MUJzbVXyVEyjoNJPWEx4_rfrHxU&e=
> > 
> > One thing I am still not sure about (though generic irq affinity is supposed
> > to handle it well) is that the CPU may go offline just after the IO is
> > submitted; where does the IRQ controller then deliver the interrupt of this
> > hw queue?
> > 
> >> This could be reproduced on NVMe with a patch that holds some rqs on ctx->rq_list,
> >> while a script onlines and offlines the cpus. Then a panic occurred in __queue_work().
> > That shouldn't happen: when a CPU goes offline, the rqs in ctx->rq_list
> > are dispatched directly; please see blk_mq_hctx_notify_dead().
> 
> Yes, I know. blk_mq_hctx_notify_dead() will be invoked after the cpu has been set offline.
> Please refer to the following diagram.
> 
> CPU A                      CPU T
>                  kick  
>   _cpu_down()     ->       cpuhp_thread_fun (cpuhpT kthread)
>                                AP_ACTIVE           (clear cpu_active_mask)
>                                  |
>                                  v
>                                AP_WORKQUEUE_ONLINE (unbind workers)
>                                  |
>                                  v
>                                TEARDOWN_CPU        (stop_machine)
>                                     ,                   | execute
>                                      \_ _ _ _ _ _       v
>                                         preempt  V  take_cpu_down ( migration kthread)
>                                                     set_cpu_online(smp_processor_id(), false) (__cpu_disable)  ------> Here !!!
>                                                     TEARDOWN_CPU
>                                                         |
>              cpuhpT kthread is   |                      v
>              migrated away       ,                    AP_SCHED_STARTING (migrate_tasks)
>                  _ _ _ _ _ _ _ _/                       |
>                 V                                       v
>               CPU X                                   AP_OFFLINE
>                                                         
>                                                         |
>                                                         ,
>                                              _ _ _ _ _ /
>                                             V
>                                       do_idle (idle task)
>  <_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ cpuhp_report_idle_dead
>                          complete st->done_down
>            __cpu_die (cpuhpT kthread, teardown_cpu) 
> 
>  AP_OFFLINE
>    |
>    v
>  BRINGUP_CPU
>    |
>    v
>  BLK_MQ_DEAD    -------> Here !!!
>    |
>    v
>  OFFLINE
> 
> The cpu has already been cleared in cpu_online_mask when blk_mq_hctx_notify_dead() is invoked.
> If the device is an NVMe device with only one cpu mapped to the hctx,
> cpumask_first_and(hctx->cpumask,cpu_online_mask) will return a bad value (>= nr_cpu_ids).

Hi Jianchao,

OK, I see it now. This should be the only corner case in which all CPUs
mapped to this hctx are offline. I believe the following patch addresses
it; could you give it a test?

---
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c376d1b6309a..23f0f3ddffcf 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1416,21 +1416,44 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
  */
 static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 {
+	bool tried = false;
+
 	if (hctx->queue->nr_hw_queues == 1)
 		return WORK_CPU_UNBOUND;
 
 	if (--hctx->next_cpu_batch <= 0) {
 		int next_cpu;
+select_cpu:
 
 		next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask,
 				cpu_online_mask);
 		if (next_cpu >= nr_cpu_ids)
 			next_cpu = cpumask_first_and(hctx->cpumask,cpu_online_mask);
 
-		hctx->next_cpu = next_cpu;
+		/*
+		 * No online CPU can be found here when running from
+		 * blk_mq_hctx_notify_dead(), so make sure hctx->next_cpu
+		 * is set correctly.
+		 */
+		if (next_cpu >= nr_cpu_ids)
+			hctx->next_cpu = cpumask_first_and(hctx->cpumask,
+					cpu_possible_mask);
+		else
+			hctx->next_cpu = next_cpu;
 		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
 	}
 
+	/*
+	 * Do unbound schedule if we can't find an online CPU for this hctx,
+	 * and it should happen only if hctx->next_cpu is becoming DEAD.
+	 */
+	if (!cpu_online(hctx->next_cpu)) {
+		if (!tried) {
+			tried = true;
+			goto select_cpu;
+		}
+		return WORK_CPU_UNBOUND;
+	}
 	return hctx->next_cpu;
 }
 

Thanks,
Ming
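
For readers who want to step through the fallback outside the kernel, the
selection logic of the patch above can be modeled in userspace roughly as
follows. This is only a sketch under stated assumptions: struct hctx_model,
mask_next_and(), mask_first_and() and hctx_next_cpu() are hypothetical
stand-ins (plain 64-bit bitmasks instead of struct cpumask), NR_CPU_IDS and
the WORK_CPU_UNBOUND value are model constants, and the nr_hw_queues == 1
shortcut is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPU_IDS        8           /* model value, not the kernel's nr_cpu_ids */
#define WORK_CPU_UNBOUND  NR_CPU_IDS  /* model value standing in for the kernel constant */
#define CPU_WORK_BATCH    8           /* stands in for BLK_MQ_CPU_WORK_BATCH */

/* Hypothetical stand-in for cpumask_next_and(): first CPU after n set in
 * both masks, or NR_CPU_IDS if there is none. */
static int mask_next_and(int n, uint64_t a, uint64_t b)
{
	for (int cpu = n + 1; cpu < NR_CPU_IDS; cpu++)
		if (((a & b) >> cpu) & 1)
			return cpu;
	return NR_CPU_IDS;
}

/* Hypothetical stand-in for cpumask_first_and(). */
static int mask_first_and(uint64_t a, uint64_t b)
{
	return mask_next_and(-1, a, b);
}

/* Only the fields of the hw queue context that the selection logic touches. */
struct hctx_model {
	uint64_t cpumask;
	int next_cpu;
	int next_cpu_batch;
};

static int hctx_next_cpu(struct hctx_model *h, uint64_t online, uint64_t possible)
{
	bool tried = false;
	int next_cpu;

	/* Assumption of the model: the hctx maps at least one possible CPU. */
	assert(h->cpumask & possible);

	if (--h->next_cpu_batch <= 0) {
select_cpu:
		next_cpu = mask_next_and(h->next_cpu, h->cpumask, online);
		if (next_cpu >= NR_CPU_IDS)
			next_cpu = mask_first_and(h->cpumask, online);

		/* No online CPU mapped: keep next_cpu valid via the possible mask. */
		if (next_cpu >= NR_CPU_IDS)
			h->next_cpu = mask_first_and(h->cpumask, possible);
		else
			h->next_cpu = next_cpu;
		h->next_cpu_batch = CPU_WORK_BATCH;
	}

	/* Retry the selection once, then fall back to an unbound schedule. */
	if (!((online >> h->next_cpu) & 1)) {
		if (!tried) {
			tried = true;
			goto select_cpu;
		}
		return WORK_CPU_UNBOUND;
	}
	return h->next_cpu;
}
```

With a single CPU mapped to the hctx (the NVMe case described above), taking
that CPU out of the online mask makes the model return WORK_CPU_UNBOUND while
next_cpu stays a valid CPU number. With two CPUs mapped, it moves over to the
remaining online one.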

