* Duplicate tag error with 5.2
@ 2019-07-18 0:41 Benjamin Herrenschmidt
2019-07-18 1:13 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-18 0:41 UTC (permalink / raw)
Hi !
So I'm getting occasional abrupt shutdowns on the Mac Mini 2018. I
found out that it's the T2 chip that is panic'ing and shutting the
system down.
Interestingly, when booting into MacOS, we can get some kind of error
log of what happened. It looks like the NVME implementation on that
thing is a SW emulation done by the T2 chip :-) It's failing on an
assert due to a duplicate tag error...
Does that ring any bell ? Any idea what might be causing that ? (tag is
tag 8 if that means anything).
Cheers,
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2
2019-07-18 0:41 Duplicate tag error with 5.2 Benjamin Herrenschmidt
@ 2019-07-18 1:13 ` Benjamin Herrenschmidt
2019-07-18 1:29 ` Benjamin Herrenschmidt
[not found] ` <CAOSXXT5jh0Yi0xPbsLu9V=KVmee_Pto6KNRWxUbfk_8=UGGU3A@mail.gmail.com>
0 siblings, 2 replies; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-18 1:13 UTC (permalink / raw)
On Thu, 2019-07-18@10:41 +1000, Benjamin Herrenschmidt wrote:
> Hi !
>
> So I'm getting occasional abrupt shutdowns on the Mac Mini 2018. I
> found out that it's the T2 chip that is panic'ing and shutting the
> system down.
>
> Interestingly, when booting into MacOS, we can get some kind of error
> log of what happened. It looks like the NVME implementation on that
> thing is a SW emulation done by the T2 chip :-) It's failing on an
> assert due to a duplicate tag error...
>
> Does that ring any bell ? Any idea what might be causing that ? (tag
> is
> tag 8 if that means anything).
Note: It seems to happen often during boot, when systemd starts all
services... It looks like a service hangs for a few seconds (I see the
systemd "waiting for service..." prompt show up for a bit, then the
machine dies abruptly.
I have yet to find a way to reproduce this on a successfully booted
system however. Once it's gone all the way, it's been stable so far, at
least doing plenty of -j5 kernel compiles.
Trying various stressers now...
Also the tag number changes from occurrence to occurrence, no pattern.
The IO queue depth is 129.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2
2019-07-18 1:13 ` Benjamin Herrenschmidt
@ 2019-07-18 1:29 ` Benjamin Herrenschmidt
[not found] ` <CAOSXXT6Z=zEpWqac2k1ydk2LynAEtFr-4jXJVCtTa5yn8H7f3Q@mail.gmail.com>
[not found] ` <CAOSXXT5jh0Yi0xPbsLu9V=KVmee_Pto6KNRWxUbfk_8=UGGU3A@mail.gmail.com>
1 sibling, 1 reply; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-18 1:29 UTC (permalink / raw)
On Thu, 2019-07-18@11:13 +1000, Benjamin Herrenschmidt wrote:
>
> Note: It seems to happen often during boot, when systemd starts all
> services... It looks like a service hangs for a few seconds (I see the
> systemd "waiting for service..." prompt show up for a bit, then the
> machine dies abruptly.
>
> I have yet to find a way to reproduce this on a successfully booted
> system however. Once it's gone all the way, it's been stable so far, at
> least doing plenty of -j5 kernel compiles.
>
> Trying various stressers now...
>
> Also the tag number changes from occurrence to occurrence, no pattern.
> The IO queue depth is 129.
Allright, might have caught us red handed ... unless I'm missing a path
where we recycle IDs. So this is with 5.1.16. I've added printk's to
nvme_setup_cmd() and to nvme_process_cq() printing the command_id of
the commands we setup and the commands we find completions for.
Now, I dont' have a serial console or a working netconsole on that
thing, so I'm just copying the last few entries from a screenshot taken
with my phone camera...
setup id 5
setup id 37
setup id 6
setup id 4
got id 37
got id 6
got id 4
setup id 7
setup id 5 <---- duplicate
setup id 38
got id 5 <--- probably completing the first one
At this point, we see a handful more setup id but no more interrupt nor
completion from the device, and it then abruptly kills the box.
Later, booting into MacOS, I retreived the log, and the duplicate tag
it's complaining about is ... guess what ? 5 !
Any idea what might be going on here ? Can we recycle tags via some
other path ?
Cheers,
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2
[not found] ` <CAOSXXT5jh0Yi0xPbsLu9V=KVmee_Pto6KNRWxUbfk_8=UGGU3A@mail.gmail.com>
@ 2019-07-18 1:31 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-18 1:31 UTC (permalink / raw)
On Thu, 2019-07-18@10:21 +0900, Keith Busch wrote:
>
> On Thu, Jul 18, 2019, 10:13 AM Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> > On Thu, 2019-07-18@10:41 +1000, Benjamin Herrenschmidt wrote:
> > > Hi !
> > >
> > > So I'm getting occasional abrupt shutdowns on the Mac Mini 2018. I
> > > found out that it's the T2 chip that is panic'ing and shutting the
> > > system down.
> > >
> > > Interestingly, when booting into MacOS, we can get some kind of error
> > > log of what happened. It looks like the NVME implementation on that
> > > thing is a SW emulation done by the T2 chip :-) It's failing on an
> > > assert due to a duplicate tag error...
> > >
> > > Does that ring any bell ? Any idea what might be causing that ? (tag
> > > is
> > > tag 8 if that means anything).
> >
> > Note: It seems to happen often during boot, when systemd starts all
> > services... It looks like a service hangs for a few seconds (I see the
> > systemd "waiting for service..." prompt show up for a bit, then the
> > machine dies abruptly.
> >
> > I have yet to find a way to reproduce this on a successfully booted
> > system however. Once it's gone all the way, it's been stable so far, at
> > least doing plenty of -j5 kernel compiles.
> >
> > Trying various stressers now...
> >
> > Also the tag number changes from occurrence to occurrence, no pattern.
> > The IO queue depth is 129
>
> Does this one need that QD1 quirk?
You mean queue depth 1 ? I suppose it would help but it would kill
performance :-) From the reverse engineering guy, it looks like MacOS
is happily using the whole queue. See my other message, I seem to have
caught us sending a duplicate tag indeed, though it's unclear how that
happened.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2
[not found] ` <CAOSXXT6Z=zEpWqac2k1ydk2LynAEtFr-4jXJVCtTa5yn8H7f3Q@mail.gmail.com>
@ 2019-07-18 4:53 ` Benjamin Herrenschmidt
2019-07-18 5:27 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-18 4:53 UTC (permalink / raw)
On Thu, 2019-07-18@12:58 +0900, Keith Busch wrote:
> It doesn't look possible to reuse an active tag on the same queue,
> but we can have duplicate tags on different hardware queues. Could
> you the queue ID in addition to the tag in the prints?
There's only one IO queue on this, and my testing is on qid 1 only.
That said, I wonder whether we have a different issue that's indirectly
causing this...
I noticed the qdepth is strange ... 129. That doesn't look right to me.
I wonder if Apple is not properly 0-basing the capability, and thus as
a result, we might get bogus completions or something like this.
I'll dig a bit more.
Cheers,
Ben,
^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2
2019-07-18 4:53 ` Benjamin Herrenschmidt
@ 2019-07-18 5:27 ` Benjamin Herrenschmidt
2019-07-18 6:00 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-18 5:27 UTC (permalink / raw)
On Thu, 2019-07-18@14:53 +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2019-07-18@12:58 +0900, Keith Busch wrote:
> > It doesn't look possible to reuse an active tag on the same queue,
> > but we can have duplicate tags on different hardware queues. Could
> > you the queue ID in addition to the tag in the prints?
>
> There's only one IO queue on this, and my testing is on qid 1 only.
>
> That said, I wonder whether we have a different issue that's indirectly
> causing this...
>
> I noticed the qdepth is strange ... 129. That doesn't look right to me.
> I wonder if Apple is not properly 0-basing the capability, and thus as
> a result, we might get bogus completions or something like this.
>
> I'll dig a bit more.
Actually the original printk's ignored the qid. I have a theory, trying
to test it now.
The problem tends to happens around when smartd starts. I wonder if the
Apple drive is assuming a unique tag set accross both IO and admin
queue...
I'll do more digging.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2
2019-07-18 5:27 ` Benjamin Herrenschmidt
@ 2019-07-18 6:00 ` Benjamin Herrenschmidt
2019-07-18 7:13 ` Damien Le Moal
0 siblings, 1 reply; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-18 6:00 UTC (permalink / raw)
On Thu, 2019-07-18@15:27 +1000, Benjamin Herrenschmidt wrote:
>
> Actually the original printk's ignored the qid. I have a theory, trying
> to test it now.
>
> The problem tends to happens around when smartd starts. I wonder if the
> Apple drive is assuming a unique tag set accross both IO and admin
> queue...
>
> I'll do more digging.
Ok. I think that's it.
I reduced the IO queue to 4 entries (3 usable) and the admin queue to 6
(4 usable), and added printk's of submitted and completed tags.
I can trigger the problem easily now running smartctl -c /dev/nvme0n1
and doing a bit of IOs. It seems to happen when the IO and Admin queue
use the same tag.
I verified after the crash, going to MacOS and getting the log from the
T2 chip, that the tag it complains about *is* the one that we happened
to have used accross both queues at the point where it dies.
Now, I am not that familiar with the tag management/allocation code,
how hard would it be to split the tags accross the queues ? I suppose I
could easily add a quirk to do +32 to the tags used on the IO queue...
(Provided they support arbitrary sized tags up to 16 bits).
Cheers,
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2
2019-07-18 6:00 ` Benjamin Herrenschmidt
@ 2019-07-18 7:13 ` Damien Le Moal
2019-07-18 7:39 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 17+ messages in thread
From: Damien Le Moal @ 2019-07-18 7:13 UTC (permalink / raw)
On 2019/07/18 15:01, Benjamin Herrenschmidt wrote:
> On Thu, 2019-07-18@15:27 +1000, Benjamin Herrenschmidt wrote:
>>
>> Actually the original printk's ignored the qid. I have a theory, trying
>> to test it now.
>>
>> The problem tends to happens around when smartd starts. I wonder if the
>> Apple drive is assuming a unique tag set accross both IO and admin
>> queue...
>>
>> I'll do more digging.
>
> Ok. I think that's it.
>
> I reduced the IO queue to 4 entries (3 usable) and the admin queue to 6
> (4 usable), and added printk's of submitted and completed tags.
>
> I can trigger the problem easily now running smartctl -c /dev/nvme0n1
> and doing a bit of IOs. It seems to happen when the IO and Admin queue
> use the same tag.
So isn't it that you are getting a completion cqe for an admin queue command in
an IO completion queue ? Or the reverse ? Given how weird this NVMe device seems
to be, it may be a possibility. In addition to the command ID (tag), if you
print the cqe queue ID (le16_to_cpu(cqe->sq_id)), what do you see ?
> I verified after the crash, going to MacOS and getting the log from the
> T2 chip, that the tag it complains about *is* the one that we happened
> to have used accross both queues at the point where it dies.
>
> Now, I am not that familiar with the tag management/allocation code,
> how hard would it be to split the tags accross the queues ? I suppose I
> could easily add a quirk to do +32 to the tags used on the IO queue...
The admin queue has its own tagset (nvme_dev->admin_tagset), separate from IO
queues (nvme_dev->tagset). So I do not see how this would solve the problem.
> (Provided they support arbitrary sized tags up to 16 bits).
>
> Cheers,
> Ben.
>
>
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2
2019-07-18 7:13 ` Damien Le Moal
@ 2019-07-18 7:39 ` Benjamin Herrenschmidt
2019-07-19 0:39 ` Damien Le Moal
0 siblings, 1 reply; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-18 7:39 UTC (permalink / raw)
On Thu, 2019-07-18@07:13 +0000, Damien Le Moal wrote:
> > I can trigger the problem easily now running smartctl -c /dev/nvme0n1
> > and doing a bit of IOs. It seems to happen when the IO and Admin queue
> > use the same tag.
>
> So isn't it that you are getting a completion cqe for an admin queue command in
> an IO completion queue ? Or the reverse ? Given how weird this NVMe device seems
> to be, it may be a possibility. In addition to the command ID (tag), if you
> print the cqe queue ID (le16_to_cpu(cqe->sq_id)), what do you see ?
Ah I can add code to validate that it's coming into the right queue,
good idea.
> > I verified after the crash, going to MacOS and getting the log from the
> > T2 chip, that the tag it complains about *is* the one that we happened
> > to have used accross both queues at the point where it dies.
> >
> > Now, I am not that familiar with the tag management/allocation code,
> > how hard would it be to split the tags accross the queues ? I suppose I
> > could easily add a quirk to do +32 to the tags used on the IO queue...
>
> The admin queue has its own tagset (nvme_dev->admin_tagset), separate from IO
> queues (nvme_dev->tagset). So I do not see how this would solve the problem.
I can easily do that by offsetting command_id locally to pci.c but
doing just that didn't completely fix it. It just made it much harder
to hit. I suspect they might also not like tags > 128, so I've added a
hack to clamp the IO queue depth to 64 as well, it's still running,
I'll let it run overnight (smartctl in a loop + dd's)
Cheers,
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2
2019-07-18 7:39 ` Benjamin Herrenschmidt
@ 2019-07-19 0:39 ` Damien Le Moal
2019-07-19 0:44 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 17+ messages in thread
From: Damien Le Moal @ 2019-07-19 0:39 UTC (permalink / raw)
On 2019/07/18 16:40, Benjamin Herrenschmidt wrote:
> On Thu, 2019-07-18@07:13 +0000, Damien Le Moal wrote:
>>> I can trigger the problem easily now running smartctl -c /dev/nvme0n1
>>> and doing a bit of IOs. It seems to happen when the IO and Admin queue
>>> use the same tag.
>>
>> So isn't it that you are getting a completion cqe for an admin queue command in
>> an IO completion queue ? Or the reverse ? Given how weird this NVMe device seems
>> to be, it may be a possibility. In addition to the command ID (tag), if you
>> print the cqe queue ID (le16_to_cpu(cqe->sq_id)), what do you see ?
>
> Ah I can add code to validate that it's coming into the right queue,
> good idea.
If the completion really shows up into the wrong queue, a fix may be simply to
hack this code in nvme_handle_cqe():
req = blk_mq_tag_to_rq(*nvmeq->tags, cqe->command_id);
trace_nvme_sq(req, cqe->sq_head, nvmeq->sq_tail);
nvme_end_request(req, cqe->status, cqe->result);
to use a different nvmeq pointer, that is the one that corresponds to cqe->sq_id
queue used for submission, which would lead to the correct tagset to be
referenced and suppress the false "duplicate" tag issue. Locking of queues/hctx
may need careful checking with such a change though.
>
>>> I verified after the crash, going to MacOS and getting the log from the
>>> T2 chip, that the tag it complains about *is* the one that we happened
>>> to have used accross both queues at the point where it dies.
>>>
>>> Now, I am not that familiar with the tag management/allocation code,
>>> how hard would it be to split the tags accross the queues ? I suppose I
>>> could easily add a quirk to do +32 to the tags used on the IO queue...
>>
>> The admin queue has its own tagset (nvme_dev->admin_tagset), separate from IO
>> queues (nvme_dev->tagset). So I do not see how this would solve the problem.
>
> I can easily do that by offsetting command_id locally to pci.c but
> doing just that didn't completely fix it. It just made it much harder
> to hit. I suspect they might also not like tags > 128, so I've added a
> hack to clamp the IO queue depth to 64 as well, it's still running,
> I'll let it run overnight (smartctl in a loop + dd's)
>
> Cheers,
> Ben.
>
>
>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2
2019-07-19 0:39 ` Damien Le Moal
@ 2019-07-19 0:44 ` Benjamin Herrenschmidt
2019-07-19 0:53 ` Damien Le Moal
0 siblings, 1 reply; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-19 0:44 UTC (permalink / raw)
On Fri, 2019-07-19@00:39 +0000, Damien Le Moal wrote:
> On 2019/07/18 16:40, Benjamin Herrenschmidt wrote:
> > On Thu, 2019-07-18@07:13 +0000, Damien Le Moal wrote:
> > > > I can trigger the problem easily now running smartctl -c /dev/nvme0n1
> > > > and doing a bit of IOs. It seems to happen when the IO and Admin queue
> > > > use the same tag.
> > >
> > > So isn't it that you are getting a completion cqe for an admin queue command in
> > > an IO completion queue ? Or the reverse ? Given how weird this NVMe device seems
> > > to be, it may be a possibility. In addition to the command ID (tag), if you
> > > print the cqe queue ID (le16_to_cpu(cqe->sq_id)), what do you see ?
> >
> > Ah I can add code to validate that it's coming into the right queue,
> > good idea.
>
> If the completion really shows up into the wrong queue, a fix may be simply to
> hack this code in nvme_handle_cqe():
>
> req = blk_mq_tag_to_rq(*nvmeq->tags, cqe->command_id);
> trace_nvme_sq(req, cqe->sq_head, nvmeq->sq_tail);
> nvme_end_request(req, cqe->status, cqe->result);
>
> to use a different nvmeq pointer, that is the one that corresponds to cqe->sq_id
> queue used for submission, which would lead to the correct tagset to be
> referenced and suppress the false "duplicate" tag issue. Locking of queues/hctx
> may need careful checking with such a change though.
But if the completion arrived in the wrong queue, wouldn't we be
missing the completions for admin requests and thus having all sort of
issues ?
Things are now solid with those two changes I've done locally, I'll
send a tentative patche:
- Offset all tags in the IO queue with 32 so they don't overlap (I do
it at the point of writing into the submission queue, and undo it when
processing the CQs).
- Reduce the IO queue depth by 32
Without the latter, I occasionally (but more rarely) still got the
error, but always with tags > 127. I suspect that not only Apple
implementation actually *uses* the tags we specify for their own
internal tracking, but they also only support values 0...127.
With those two changes it's been solid. Thankfully the resulting quirk
is reasonably simple and self contained to pci.c. I'll clean things up
and post.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2
2019-07-19 0:44 ` Benjamin Herrenschmidt
@ 2019-07-19 0:53 ` Damien Le Moal
2019-07-19 1:00 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 17+ messages in thread
From: Damien Le Moal @ 2019-07-19 0:53 UTC (permalink / raw)
On 2019/07/19 9:44, Benjamin Herrenschmidt wrote:
> On Fri, 2019-07-19@00:39 +0000, Damien Le Moal wrote:
>> On 2019/07/18 16:40, Benjamin Herrenschmidt wrote:
>>> On Thu, 2019-07-18@07:13 +0000, Damien Le Moal wrote:
>>>>> I can trigger the problem easily now running smartctl -c /dev/nvme0n1
>>>>> and doing a bit of IOs. It seems to happen when the IO and Admin queue
>>>>> use the same tag.
>>>>
>>>> So isn't it that you are getting a completion cqe for an admin queue command in
>>>> an IO completion queue ? Or the reverse ? Given how weird this NVMe device seems
>>>> to be, it may be a possibility. In addition to the command ID (tag), if you
>>>> print the cqe queue ID (le16_to_cpu(cqe->sq_id)), what do you see ?
>>>
>>> Ah I can add code to validate that it's coming into the right queue,
>>> good idea.
>>
>> If the completion really shows up into the wrong queue, a fix may be simply to
>> hack this code in nvme_handle_cqe():
>>
>> req = blk_mq_tag_to_rq(*nvmeq->tags, cqe->command_id);
>> trace_nvme_sq(req, cqe->sq_head, nvmeq->sq_tail);
>> nvme_end_request(req, cqe->status, cqe->result);
>>
>> to use a different nvmeq pointer, that is the one that corresponds to cqe->sq_id
>> queue used for submission, which would lead to the correct tagset to be
>> referenced and suppress the false "duplicate" tag issue. Locking of queues/hctx
>> may need careful checking with such a change though.
>
> But if the completion arrived in the wrong queue, wouldn't we be
> missing the completions for admin requests and thus having all sort of
> issues ?
If the admin req completion comes in the wrong queue, then the admin queue
completion queue should not be changing and not need any head/doorbell tweaks.
It simply stays as is. Since I think that nvme_handle_cqe is used for all
commands regardless of submission type (normal, polled, sync, etc), this should
work in all cases. Not sure at all for all this. Some side effects are
definitely possible. But worth a try I think.
>
> Things are now solid with those two changes I've done locally, I'll
> send a tentative patche:
>
> - Offset all tags in the IO queue with 32 so they don't overlap (I do
> it at the point of writing into the submission queue, and undo it when
> processing the CQs).
>
> - Reduce the IO queue depth by 32
>
> Without the latter, I occasionally (but more rarely) still got the
> error, but always with tags > 127. I suspect that not only Apple
> implementation actually *uses* the tags we specify for their own
> internal tracking, but they also only support values 0...127.
>
> With those two changes it's been solid. Thankfully the resulting quirk
> is reasonably simple and self contained to pci.c. I'll clean things up
> and post.
>
> Cheers,
> Ben.
>
>
>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2
2019-07-19 0:53 ` Damien Le Moal
@ 2019-07-19 1:00 ` Benjamin Herrenschmidt
2019-07-19 1:09 ` Damien Le Moal
0 siblings, 1 reply; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-19 1:00 UTC (permalink / raw)
On Fri, 2019-07-19@00:53 +0000, Damien Le Moal wrote:
>
> If the admin req completion comes in the wrong queue, then the admin queue
> completion queue should not be changing and not need any head/doorbell tweaks.
> It simply stays as is. Since I think that nvme_handle_cqe is used for all
> commands regardless of submission type (normal, polled, sync, etc), this should
> work in all cases. Not sure at all for all this. Some side effects are
> definitely possible. But worth a try I think.
I've printed the queue I get completions from and they arrive in the
right queues. So that doesn't seem to be the issue.
Apple's implementation is a SW one running on the T2 chip, I suspect
it's just completely broken and they don't care because they have their
own driver.
I'll clean up my latest fix, test more and send it. It applies on top
of the 3 patches I sent earlier.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2
2019-07-19 1:00 ` Benjamin Herrenschmidt
@ 2019-07-19 1:09 ` Damien Le Moal
2019-07-19 1:20 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 17+ messages in thread
From: Damien Le Moal @ 2019-07-19 1:09 UTC (permalink / raw)
On 2019/07/19 10:00, Benjamin Herrenschmidt wrote:
> On Fri, 2019-07-19@00:53 +0000, Damien Le Moal wrote:
>>
>> If the admin req completion comes in the wrong queue, then the admin queue
>> completion queue should not be changing and not need any head/doorbell tweaks.
>> It simply stays as is. Since I think that nvme_handle_cqe is used for all
>> commands regardless of submission type (normal, polled, sync, etc), this should
>> work in all cases. Not sure at all for all this. Some side effects are
>> definitely possible. But worth a try I think.
>
> I've printed the queue I get completions from and they arrive in the
> right queues. So that doesn't seem to be the issue.
OK... But then how is it possible to get a bad tag error ? if the commands are
issued and completes in the correct queues, the tagsets are independent and
should not clash in any way. This is really weird.
> Apple's implementation is a SW one running on the T2 chip, I suspect
> it's just completely broken and they don't care because they have their
> own driver.
Got it, but if the prints show correct cqueues, Linux normal driver should
work... Or is it a problem with a buggy T2 controller implementation resulting
in some completions being queued twice ? Really weird HW :)
>
> I'll clean up my latest fix, test more and send it. It applies on top
> of the 3 patches I sent earlier.
>
> Cheers,
> Ben.
>
>
>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2
2019-07-19 1:09 ` Damien Le Moal
@ 2019-07-19 1:20 ` Benjamin Herrenschmidt
2019-07-19 1:24 ` Damien Le Moal
0 siblings, 1 reply; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-19 1:20 UTC (permalink / raw)
On Fri, 2019-07-19@01:09 +0000, Damien Le Moal wrote:
> On 2019/07/19 10:00, Benjamin Herrenschmidt wrote:
> > On Fri, 2019-07-19@00:53 +0000, Damien Le Moal wrote:
> > >
> > > If the admin req completion comes in the wrong queue, then the admin queue
> > > completion queue should not be changing and not need any head/doorbell tweaks.
> > > It simply stays as is. Since I think that nvme_handle_cqe is used for all
> > > commands regardless of submission type (normal, polled, sync, etc), this should
> > > work in all cases. Not sure at all for all this. Some side effects are
> > > definitely possible. But worth a try I think.
> >
> > I've printed the queue I get completions from and they arrive in the
> > right queues. So that doesn't seem to be the issue.
>
> OK... But then how is it possible to get a bad tag error ? if the commands are
> issued and completes in the correct queues, the tagsets are independent and
> should not clash in any way. This is really weird.
>
> > Apple's implementation is a SW one running on the T2 chip, I suspect
> > it's just completely broken and they don't care because they have their
> > own driver.
>
> Got it, but if the prints show correct cqueues, Linux normal driver should
> work... Or is it a problem with a buggy T2 controller implementation resulting
> in some completions being queued twice ? Really weird HW :)
No, I think the problem is that Apple implementation (which as I said
is just a bit of SW running off the PCIe slave bit of the T2 chip, on
the ARM core), somewhat uses the tags we provide for internal tracking,
and they don't append the qid to them, so internally they assume all
tags are unique regardless of what queue they come from. It's crazy but
that seem to be what they did.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2
2019-07-19 1:20 ` Benjamin Herrenschmidt
@ 2019-07-19 1:24 ` Damien Le Moal
2019-07-19 1:34 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 17+ messages in thread
From: Damien Le Moal @ 2019-07-19 1:24 UTC (permalink / raw)
On 2019/07/19 10:20, Benjamin Herrenschmidt wrote:
> On Fri, 2019-07-19@01:09 +0000, Damien Le Moal wrote:
>> On 2019/07/19 10:00, Benjamin Herrenschmidt wrote:
>>> On Fri, 2019-07-19@00:53 +0000, Damien Le Moal wrote:
>>>>
>>>> If the admin req completion comes in the wrong queue, then the admin queue
>>>> completion queue should not be changing and not need any head/doorbell tweaks.
>>>> It simply stays as is. Since I think that nvme_handle_cqe is used for all
>>>> commands regardless of submission type (normal, polled, sync, etc), this should
>>>> work in all cases. Not sure at all for all this. Some side effects are
>>>> definitely possible. But worth a try I think.
>>>
>>> I've printed the queue I get completions from and they arrive in the
>>> right queues. So that doesn't seem to be the issue.
>>
>> OK... But then how is it possible to get a bad tag error ? if the commands are
>> issued and completes in the correct queues, the tagsets are independent and
>> should not clash in any way. This is really weird.
>>
>>> Apple's implementation is a SW one running on the T2 chip, I suspect
>>> it's just completely broken and they don't care because they have their
>>> own driver.
>>
>> Got it, but if the prints show correct cqueues, Linux normal driver should
>> work... Or is it a problem with a buggy T2 controller implementation resulting
>> in some completions being queued twice ? Really weird HW :)
>
> No, I think the problem is that Apple implementation (which as I said
> is just a bit of SW running off the PCIe slave bit of the T2 chip, on
> the ARM core), somewhat uses the tags we provide for internal tracking,
> and they don't append the qid to them, so internally they assume all
> tags are unique regardless of what queue they come from. It's crazy but
> that seem to be what they did.
OK. So the device essentially operates with a shared tagset across all queues..
And indeed that does not play well with Linux driver. Looking at the code,
having the admin queue and io queues share the same tagset would be the cleanest
fix, but that would also mean essentially rewriting completely pci.c and
implementing an "apple.c" host driver :)
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2
2019-07-19 1:24 ` Damien Le Moal
@ 2019-07-19 1:34 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-19 1:34 UTC (permalink / raw)
On Fri, 2019-07-19@01:24 +0000, Damien Le Moal wrote:
> > No, I think the problem is that Apple implementation (which as I said
> > is just a bit of SW running off the PCIe slave bit of the T2 chip, on
> > the ARM core), somewhat uses the tags we provide for internal tracking,
> > and they don't append the qid to them, so internally they assume all
> > tags are unique regardless of what queue they come from. It's crazy but
> > that seem to be what they did.
>
> OK. So the device essentially operates with a shared tagset across all queues..
> And indeed that does not play well with Linux driver. Looking at the code,
> having the admin queue and io queues share the same tagset would be the cleanest
> fix, but that would also mean essentially rewriting completely pci.c and
> implementing an "apple.c" host driver :)
Yeah, that's why my approach, while sub-optimal (I have to shrink the
IO queue) is the simplest, I just "offset" the IO queue tag space to
avoid overlaps. The offset is basically the patch I posted yesterday,
just with another hack missing I'll send later to also shrink the queue
to the resulting tags don't go over 127.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2019-07-19 1:34 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-07-18 0:41 Duplicate tag error with 5.2 Benjamin Herrenschmidt
2019-07-18 1:13 ` Benjamin Herrenschmidt
2019-07-18 1:29 ` Benjamin Herrenschmidt
[not found] ` <CAOSXXT6Z=zEpWqac2k1ydk2LynAEtFr-4jXJVCtTa5yn8H7f3Q@mail.gmail.com>
2019-07-18 4:53 ` Benjamin Herrenschmidt
2019-07-18 5:27 ` Benjamin Herrenschmidt
2019-07-18 6:00 ` Benjamin Herrenschmidt
2019-07-18 7:13 ` Damien Le Moal
2019-07-18 7:39 ` Benjamin Herrenschmidt
2019-07-19 0:39 ` Damien Le Moal
2019-07-19 0:44 ` Benjamin Herrenschmidt
2019-07-19 0:53 ` Damien Le Moal
2019-07-19 1:00 ` Benjamin Herrenschmidt
2019-07-19 1:09 ` Damien Le Moal
2019-07-19 1:20 ` Benjamin Herrenschmidt
2019-07-19 1:24 ` Damien Le Moal
2019-07-19 1:34 ` Benjamin Herrenschmidt
[not found] ` <CAOSXXT5jh0Yi0xPbsLu9V=KVmee_Pto6KNRWxUbfk_8=UGGU3A@mail.gmail.com>
2019-07-18 1:31 ` Benjamin Herrenschmidt
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.