From: Jens Axboe <axboe@kernel.dk>
To: Meelis Roos <mroos@linux.ee>
Cc: Christoph Hellwig <hch@infradead.org>,
linux-scsi@vger.kernel.org, sparclinux@vger.kernel.org,
David Miller <davem@davemloft.net>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: Another (ESP?) scsi blk-mq problem on sparc64
Date: Fri, 14 Nov 2014 16:29:39 -0700
Message-ID: <54669063.6050307@kernel.dk>
In-Reply-To: <alpine.LRH.2.11.1411150033400.26813@adalberg.ut.ee>
[-- Attachment #1: Type: text/plain, Size: 2010 bytes --]
On 2014-11-14 15:59, Meelis Roos wrote:
>>>> The second oops is in blk_mq_map_queue() which is a trivial
>>>> two level cpu lookup. I wonder if there's something odd about
>>>> cpu numbers on these big old sparc systems?
>>>
>>> CPU numbers are sparse - they are determined by hardware slot number and
>>> some models only fill every other mainboard slot, and first slots can be
>>> free. I have first board offline and currently have CPUs numbered
>>> 10,11,14,15 online.
>>>
>>> Here is the debug output with Jens's patch:
>>
>>> [ 133.971050] CPU 11: synchronized TICK with master CPU (last diff -1 cycles, maxerr 516 cycles)
>>> [ 133.975491] CPU 14: synchronized TICK with master CPU (last diff -3 cycles, maxerr 531 cycles)
>>> [ 133.979943] CPU 15: synchronized TICK with master CPU (last diff -3 cycles, maxerr 531 cycles)
>>> [ 133.980146] Brought up 4 CPUs
>>
>> So it looks like this might be the issue. On a boot with scsi-mq disabled,
>> you have 4 CPUs, but how are they numbered?
>
> The numbers are always the same.
I would hope so; my question was really about which CPU numbers you see. But
I guess they are 10, 11, 14, and 15?
> But everything seems to be mapped to queue 0?
As it should be; scsi-mq only supports a single hw queue for now.
>> We might need Christoph's debug patch on top of this to fully know...
>
> Applied it too, dmesg is below. Yes, it does spam the log a lot, and over
> a 9600 bps console it's somewhat slow :)
>
> There is another detail to note: this server contains a faulty disk as
> sdc that times out on spin-up. I left it in the server because it helped to
> pinpoint and fix a previous error in the esp scsi driver. This could be a
> factor here too, via the error handling paths.
It could be. So we have tons of mappings from CPU10 to queue 0, but then
we see this:
> [ 256.236742] cpu: 10
> [ 256.236749] queue: 809119744
and it turns to crap (with a single hw queue, the only valid index is 0).
This is pretty weird. Try this debug patch, and drop the other ones first.
It should reduce your noise level too.
--
Jens Axboe
[-- Attachment #2: debug.patch --]
[-- Type: text/x-patch, Size: 1154 bytes --]
diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 1065d7c65fa1..9200e2aee746 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -81,6 +81,9 @@ int blk_mq_update_queue_map(unsigned int *map, unsigned int nr_queues)
 			map[i] = map[first_sibling];
 	}
 
+	for (i = 0; i < queue; i++)
+		printk(KERN_ERR "cpumap %d -> %d\n", i, map[i]);
+
 	free_cpumask_var(cpus);
 	return 0;
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 68929bad9a6a..1678da3505ea 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1265,12 +1265,25 @@ run_queue:
 	blk_mq_put_ctx(data.ctx);
 }
 
+static int did_warn;
+
 /*
  * Default mapping to a software queue, since we use one per CPU.
  */
 struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q, const int cpu)
 {
-	return q->queue_hw_ctx[q->mq_map[cpu]];
+	int i;
+
+	i = q->mq_map[cpu];
+	if (!i || did_warn)
+		return q->queue_hw_ctx[0];
+
+	printk(KERN_ERR "blk-mq: cpu %u got queue %u\n", cpu, i);
+	for_each_online_cpu(i)
+		printk(KERN_ERR " cpu%d -> queue index %u\n", i, q->mq_map[i]);
+
+	did_warn = 1;
+	return q->queue_hw_ctx[0];
 }
 EXPORT_SYMBOL(blk_mq_map_queue);
 