All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] NVMe: Remove remnants of cpu hotplug
@ 2014-12-04 17:43 Sam Bradshaw
  2014-12-04 17:57 ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Sam Bradshaw @ 2014-12-04 17:43 UTC (permalink / raw)


This patch cleans up some code inherited from the cpu hotplug handling, 
including correcting a bug where hctx->cpumask included non-schedulable 
cpus (backtrace attached).

Signed-off-by: Selvan Mani <smani at micron.com>
Signed-off-by: Sam Bradshaw <sbradshaw at micron.com>
---
diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index bcbdf83..57ed698 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -76,7 +76,6 @@ static LIST_HEAD(dev_list);
 static struct task_struct *nvme_thread;
 static struct workqueue_struct *nvme_workq;
 static wait_queue_head_t nvme_kthread_wait;
-static struct notifier_block nvme_nb;
 
 static void nvme_reset_failed_dev(struct work_struct *ws);
 static int nvme_process_cq(struct nvme_queue *nvmeq);
@@ -1981,7 +1980,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	struct pci_dev *pdev = dev->pci_dev;
 	int result, i, vecs, nr_io_queues, size;
 
-	nr_io_queues = num_possible_cpus();
+	nr_io_queues = num_online_cpus();
 	result = set_queue_count(dev, nr_io_queues);
 	if (result <= 0)
 		return result;
@@ -2878,7 +2877,6 @@ static int __init nvme_init(void)
 static void __exit nvme_exit(void)
 {
 	pci_unregister_driver(&nvme_driver);
-	unregister_hotcpu_notifier(&nvme_nb);
 	unregister_blkdev(nvme_major, "nvme");
 	destroy_workqueue(nvme_workq);
 	BUG_ON(nvme_thread && !IS_ERR(nvme_thread));
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bt
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20141204/645d4d01/attachment-0001.ksh>

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH] NVMe: Remove remnants of cpu hotplug
  2014-12-04 17:43 [PATCH] NVMe: Remove remnants of cpu hotplug Sam Bradshaw
@ 2014-12-04 17:57 ` Jens Axboe
  2014-12-04 21:54   ` Sam Bradshaw (sbradshaw)
  2014-12-09 20:50   ` Sam Bradshaw (sbradshaw)
  0 siblings, 2 replies; 5+ messages in thread
From: Jens Axboe @ 2014-12-04 17:57 UTC (permalink / raw)


On 12/04/2014 10:43 AM, Sam Bradshaw wrote:
> This patch cleans up some code inherited from the cpu hotplug handling,
> including correcting a bug where hctx->cpumask included non-schedulable
> cpus (backtrace attached).

Getting rid of the notifier_block is definitely right, that should not 
be there anymore.

I'm curious with the num_possible -> num_online change. This should be 
handled by blk-mq, so no worries there. But how did it trigger the unset 
CPU in the run mask? Ran into the same issue here on a different test 
case, just curious if you looked into this.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] NVMe: Remove remnants of cpu hotplug
  2014-12-04 17:57 ` Jens Axboe
@ 2014-12-04 21:54   ` Sam Bradshaw (sbradshaw)
  2014-12-09 20:50   ` Sam Bradshaw (sbradshaw)
  1 sibling, 0 replies; 5+ messages in thread
From: Sam Bradshaw (sbradshaw) @ 2014-12-04 21:54 UTC (permalink / raw)




> -----Original Message-----
> From: Jens Axboe [mailto:axboe at kernel.dk]
> Sent: Thursday, December 04, 2014 9:57 AM
> To: Sam Bradshaw (sbradshaw); linux-nvme at lists.infradead.org
> Cc: Selvan Mani (smani) [CONT - Type 2]
> Subject: Re: [PATCH] NVMe: Remove remnants of cpu hotplug
> 
> On 12/04/2014 10:43 AM, Sam Bradshaw wrote:
> > This patch cleans up some code inherited from the cpu hotplug
> > handling, including correcting a bug where hctx->cpumask included
> > non-schedulable cpus (backtrace attached).
> 
> Getting rid of the notifier_block is definitely right, that should not
> be there anymore.
> 
> I'm curious with the num_possible -> num_online change. This should be
> handled by blk-mq, so no worries there. But how did it trigger the
> unset CPU in the run mask? Ran into the same issue here on a different
> test case, just curious if you looked into this.

The systems that exhibit the problem are Dell R620 and R720 with only 1 cpu socket populated out of 2, which creates a difference between cpus online (24) and possible (32).  I don't have direct access to these systems at the moment to verify this but my thinking is that a sw ctx is getting associated to an uninitialized hctx in the mapping step, which appears to be possible if the zeroth cpu in the cpus possible map is offline.  In that case, the sw ctx maps to the zeroth hctx, which may not have that (or any) cpu in the hctx->cpu_mask.  Maybe the default hctx shouldn't be the zeroth but instead the first that has at least one cpu associated with it?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] NVMe: Remove remnants of cpu hotplug
  2014-12-04 17:57 ` Jens Axboe
  2014-12-04 21:54   ` Sam Bradshaw (sbradshaw)
@ 2014-12-09 20:50   ` Sam Bradshaw (sbradshaw)
  2014-12-09 20:52     ` Jens Axboe
  1 sibling, 1 reply; 5+ messages in thread
From: Sam Bradshaw (sbradshaw) @ 2014-12-09 20:50 UTC (permalink / raw)



> I'm curious with the num_possible -> num_online change. This should be
> handled by blk-mq, so no worries there. But how did it trigger the
> unset CPU in the run mask? Ran into the same issue here on a different
> test case, just curious if you looked into this.

After looking into it, the issue has been fixed on your for-3.19/core 
branch due to:

[PATCH] blk-mq: prevent unmapped hw queue from being scheduled

It's straightforward enough to test that the fix works by setting 
possible_cpus on the kernel command line to something larger than 
online cpus.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] NVMe: Remove remnants of cpu hotplug
  2014-12-09 20:50   ` Sam Bradshaw (sbradshaw)
@ 2014-12-09 20:52     ` Jens Axboe
  0 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2014-12-09 20:52 UTC (permalink / raw)


On 12/09/2014 01:50 PM, Sam Bradshaw (sbradshaw) wrote:
> 
>> I'm curious with the num_possible -> num_online change. This should be
>> handled by blk-mq, so no worries there. But how did it trigger the
>> unset CPU in the run mask? Ran into the same issue here on a different
>> test case, just curious if you looked into this.
> 
> After looking into it, the issue has been fixed on your for-3.19/core 
> branch due to:
> 
> [PATCH] blk-mq: prevent unmapped hw queue from being scheduled
> 
> It's straightforward enough to test that the fix works by setting 
> possible_cpus on the kernel command line to something larger than 
> online cpus.

Great, that's much preferred :-)

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-12-09 20:52 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-04 17:43 [PATCH] NVMe: Remove remnants of cpu hotplug Sam Bradshaw
2014-12-04 17:57 ` Jens Axboe
2014-12-04 21:54   ` Sam Bradshaw (sbradshaw)
2014-12-09 20:50   ` Sam Bradshaw (sbradshaw)
2014-12-09 20:52     ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.