* Duplicate tag error with 5.2 @ 2019-07-18 0:41 Benjamin Herrenschmidt 2019-07-18 1:13 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 17+ messages in thread From: Benjamin Herrenschmidt @ 2019-07-18 0:41 UTC (permalink / raw) Hi ! So I'm getting occasional abrupt shutdowns on the Mac Mini 2018. I found out that it's the T2 chip that is panic'ing and shutting the system down. Interestingly, when booting into MacOS, we can get some kind of error log of what happened. It looks like the NVME implementation on that thing is a SW emulation done by the T2 chip :-) It's failing on an assert due to a duplicate tag error... Does that ring any bell ? Any idea what might be causing that ? (tag is tag 8 if that means anything). Cheers, Ben. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2 2019-07-18 0:41 Duplicate tag error with 5.2 Benjamin Herrenschmidt @ 2019-07-18 1:13 ` Benjamin Herrenschmidt 2019-07-18 1:29 ` Benjamin Herrenschmidt [not found] ` <CAOSXXT5jh0Yi0xPbsLu9V=KVmee_Pto6KNRWxUbfk_8=UGGU3A@mail.gmail.com> 0 siblings, 2 replies; 17+ messages in thread From: Benjamin Herrenschmidt @ 2019-07-18 1:13 UTC (permalink / raw) On Thu, 2019-07-18@10:41 +1000, Benjamin Herrenschmidt wrote: > Hi ! > > So I'm getting occasional abrupt shutdowns on the Mac Mini 2018. I > found out that it's the T2 chip that is panic'ing and shutting the > system down. > > Interestingly, when booting into MacOS, we can get some kind of error > log of what happened. It looks like the NVME implementation on that > thing is a SW emulation done by the T2 chip :-) It's failing on an > assert due to a duplicate tag error... > > Does that ring any bell ? Any idea what might be causing that ? (tag > is > tag 8 if that means anything). Note: It seems to happen often during boot, when systemd starts all services... It looks like a service hangs for a few seconds (I see the systemd "waiting for service..." prompt show up for a bit, then the machine dies abruptly. I have yet to find a way to reproduce this on a successfully booted system however. Once it's gone all the way, it's been stable so far, at least doing plenty of -j5 kernel compiles. Trying various stressers now... Also the tag number changes from occurrence to occurrence, no pattern. The IO queue depth is 129. Cheers, Ben. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2 2019-07-18 1:13 ` Benjamin Herrenschmidt @ 2019-07-18 1:29 ` Benjamin Herrenschmidt [not found] ` <CAOSXXT6Z=zEpWqac2k1ydk2LynAEtFr-4jXJVCtTa5yn8H7f3Q@mail.gmail.com> [not found] ` <CAOSXXT5jh0Yi0xPbsLu9V=KVmee_Pto6KNRWxUbfk_8=UGGU3A@mail.gmail.com> 1 sibling, 1 reply; 17+ messages in thread From: Benjamin Herrenschmidt @ 2019-07-18 1:29 UTC (permalink / raw) On Thu, 2019-07-18@11:13 +1000, Benjamin Herrenschmidt wrote: > > Note: It seems to happen often during boot, when systemd starts all > services... It looks like a service hangs for a few seconds (I see the > systemd "waiting for service..." prompt show up for a bit, then the > machine dies abruptly. > > I have yet to find a way to reproduce this on a successfully booted > system however. Once it's gone all the way, it's been stable so far, at > least doing plenty of -j5 kernel compiles. > > Trying various stressers now... > > Also the tag number changes from occurrence to occurrence, no pattern. > The IO queue depth is 129. Allright, might have caught us red handed ... unless I'm missing a path where we recycle IDs. So this is with 5.1.16. I've added printk's to nvme_setup_cmd() and to nvme_process_cq() printing the command_id of the commands we setup and the commands we find completions for. Now, I dont' have a serial console or a working netconsole on that thing, so I'm just copying the last few entries from a screenshot taken with my phone camera... setup id 5 setup id 37 setup id 6 setup id 4 got id 37 got id 6 got id 4 setup id 7 setup id 5 <---- duplicate setup id 38 got id 5 <--- probably completing the first one At this point, we see a handful more setup id but no more interrupt nor completion from the device, and it then abruptly kills the box. Later, booting into MacOS, I retreived the log, and the duplicate tag it's complaining about is ... guess what ? 5 ! Any idea what might be going on here ? Can we recycle tags via some other path ? Cheers, Ben. ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <CAOSXXT6Z=zEpWqac2k1ydk2LynAEtFr-4jXJVCtTa5yn8H7f3Q@mail.gmail.com>]
* Duplicate tag error with 5.2 [not found] ` <CAOSXXT6Z=zEpWqac2k1ydk2LynAEtFr-4jXJVCtTa5yn8H7f3Q@mail.gmail.com> @ 2019-07-18 4:53 ` Benjamin Herrenschmidt 2019-07-18 5:27 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 17+ messages in thread From: Benjamin Herrenschmidt @ 2019-07-18 4:53 UTC (permalink / raw) On Thu, 2019-07-18@12:58 +0900, Keith Busch wrote: > It doesn't look possible to reuse an active tag on the same queue, > but we can have duplicate tags on different hardware queues. Could > you the queue ID in addition to the tag in the prints? There's only one IO queue on this, and my testing is on qid 1 only. That said, I wonder whether we have a different issue that's indirectly causing this... I noticed the qdepth is strange ... 129. That doesn't look right to me. I wonder if Apple is not properly 0-basing the capability, and thus as a result, we might get bogus completions or something like this. I'll dig a bit more. Cheers, Ben, ^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2 2019-07-18 4:53 ` Benjamin Herrenschmidt @ 2019-07-18 5:27 ` Benjamin Herrenschmidt 2019-07-18 6:00 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 17+ messages in thread From: Benjamin Herrenschmidt @ 2019-07-18 5:27 UTC (permalink / raw) On Thu, 2019-07-18@14:53 +1000, Benjamin Herrenschmidt wrote: > On Thu, 2019-07-18@12:58 +0900, Keith Busch wrote: > > It doesn't look possible to reuse an active tag on the same queue, > > but we can have duplicate tags on different hardware queues. Could > > you the queue ID in addition to the tag in the prints? > > There's only one IO queue on this, and my testing is on qid 1 only. > > That said, I wonder whether we have a different issue that's indirectly > causing this... > > I noticed the qdepth is strange ... 129. That doesn't look right to me. > I wonder if Apple is not properly 0-basing the capability, and thus as > a result, we might get bogus completions or something like this. > > I'll dig a bit more. Actually the original printk's ignored the qid. I have a theory, trying to test it now. The problem tends to happens around when smartd starts. I wonder if the Apple drive is assuming a unique tag set accross both IO and admin queue... I'll do more digging. Cheers, Ben. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2 2019-07-18 5:27 ` Benjamin Herrenschmidt @ 2019-07-18 6:00 ` Benjamin Herrenschmidt 2019-07-18 7:13 ` Damien Le Moal 0 siblings, 1 reply; 17+ messages in thread From: Benjamin Herrenschmidt @ 2019-07-18 6:00 UTC (permalink / raw) On Thu, 2019-07-18@15:27 +1000, Benjamin Herrenschmidt wrote: > > Actually the original printk's ignored the qid. I have a theory, trying > to test it now. > > The problem tends to happens around when smartd starts. I wonder if the > Apple drive is assuming a unique tag set accross both IO and admin > queue... > > I'll do more digging. Ok. I think that's it. I reduced the IO queue to 4 entries (3 usable) and the admin queue to 6 (4 usable), and added printk's of submitted and completed tags. I can trigger the problem easily now running smartctl -c /dev/nvme0n1 and doing a bit of IOs. It seems to happen when the IO and Admin queue use the same tag. I verified after the crash, going to MacOS and getting the log from the T2 chip, that the tag it complains about *is* the one that we happened to have used accross both queues at the point where it dies. Now, I am not that familiar with the tag management/allocation code, how hard would it be to split the tags accross the queues ? I suppose I could easily add a quirk to do +32 to the tags used on the IO queue... (Provided they support arbitrary sized tags up to 16 bits). Cheers, Ben. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2 2019-07-18 6:00 ` Benjamin Herrenschmidt @ 2019-07-18 7:13 ` Damien Le Moal 2019-07-18 7:39 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 17+ messages in thread From: Damien Le Moal @ 2019-07-18 7:13 UTC (permalink / raw) On 2019/07/18 15:01, Benjamin Herrenschmidt wrote: > On Thu, 2019-07-18@15:27 +1000, Benjamin Herrenschmidt wrote: >> >> Actually the original printk's ignored the qid. I have a theory, trying >> to test it now. >> >> The problem tends to happens around when smartd starts. I wonder if the >> Apple drive is assuming a unique tag set accross both IO and admin >> queue... >> >> I'll do more digging. > > Ok. I think that's it. > > I reduced the IO queue to 4 entries (3 usable) and the admin queue to 6 > (4 usable), and added printk's of submitted and completed tags. > > I can trigger the problem easily now running smartctl -c /dev/nvme0n1 > and doing a bit of IOs. It seems to happen when the IO and Admin queue > use the same tag. So isn't it that you are getting a completion cqe for an admin queue command in an IO completion queue ? Or the reverse ? Given how weird this NVMe device seems to be, it may be a possibility. In addition to the command ID (tag), if you print the cqe queue ID (le16_to_cpu(cqe->sq_id)), what do you see ? > I verified after the crash, going to MacOS and getting the log from the > T2 chip, that the tag it complains about *is* the one that we happened > to have used accross both queues at the point where it dies. > > Now, I am not that familiar with the tag management/allocation code, > how hard would it be to split the tags accross the queues ? I suppose I > could easily add a quirk to do +32 to the tags used on the IO queue... The admin queue has its own tagset (nvme_dev->admin_tagset), separate from IO queues (nvme_dev->tagset). So I do not see how this would solve the problem. > (Provided they support arbitrary sized tags up to 16 bits). > > Cheers, > Ben. > > > > _______________________________________________ > Linux-nvme mailing list > Linux-nvme at lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-nvme > -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2 2019-07-18 7:13 ` Damien Le Moal @ 2019-07-18 7:39 ` Benjamin Herrenschmidt 2019-07-19 0:39 ` Damien Le Moal 0 siblings, 1 reply; 17+ messages in thread From: Benjamin Herrenschmidt @ 2019-07-18 7:39 UTC (permalink / raw) On Thu, 2019-07-18@07:13 +0000, Damien Le Moal wrote: > > I can trigger the problem easily now running smartctl -c /dev/nvme0n1 > > and doing a bit of IOs. It seems to happen when the IO and Admin queue > > use the same tag. > > So isn't it that you are getting a completion cqe for an admin queue command in > an IO completion queue ? Or the reverse ? Given how weird this NVMe device seems > to be, it may be a possibility. In addition to the command ID (tag), if you > print the cqe queue ID (le16_to_cpu(cqe->sq_id)), what do you see ? Ah I can add code to validate that it's coming into the right queue, good idea. > > I verified after the crash, going to MacOS and getting the log from the > > T2 chip, that the tag it complains about *is* the one that we happened > > to have used accross both queues at the point where it dies. > > > > Now, I am not that familiar with the tag management/allocation code, > > how hard would it be to split the tags accross the queues ? I suppose I > > could easily add a quirk to do +32 to the tags used on the IO queue... > > The admin queue has its own tagset (nvme_dev->admin_tagset), separate from IO > queues (nvme_dev->tagset). So I do not see how this would solve the problem. I can easily do that by offsetting command_id locally to pci.c but doing just that didn't completely fix it. It just made it much harder to hit. I suspect they might also not like tags > 128, so I've added a hack to clamp the IO queue depth to 64 as well, it's still running, I'll let it run overnight (smartctl in a loop + dd's) Cheers, Ben. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2 2019-07-18 7:39 ` Benjamin Herrenschmidt @ 2019-07-19 0:39 ` Damien Le Moal 2019-07-19 0:44 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 17+ messages in thread From: Damien Le Moal @ 2019-07-19 0:39 UTC (permalink / raw) On 2019/07/18 16:40, Benjamin Herrenschmidt wrote: > On Thu, 2019-07-18@07:13 +0000, Damien Le Moal wrote: >>> I can trigger the problem easily now running smartctl -c /dev/nvme0n1 >>> and doing a bit of IOs. It seems to happen when the IO and Admin queue >>> use the same tag. >> >> So isn't it that you are getting a completion cqe for an admin queue command in >> an IO completion queue ? Or the reverse ? Given how weird this NVMe device seems >> to be, it may be a possibility. In addition to the command ID (tag), if you >> print the cqe queue ID (le16_to_cpu(cqe->sq_id)), what do you see ? > > Ah I can add code to validate that it's coming into the right queue, > good idea. If the completion really shows up into the wrong queue, a fix may be simply to hack this code in nvme_handle_cqe(): req = blk_mq_tag_to_rq(*nvmeq->tags, cqe->command_id); trace_nvme_sq(req, cqe->sq_head, nvmeq->sq_tail); nvme_end_request(req, cqe->status, cqe->result); to use a different nvmeq pointer, that is the one that corresponds to cqe->sq_id queue used for submission, which would lead to the correct tagset to be referenced and suppress the false "duplicate" tag issue. Locking of queues/hctx may need careful checking with such a change though. > >>> I verified after the crash, going to MacOS and getting the log from the >>> T2 chip, that the tag it complains about *is* the one that we happened >>> to have used accross both queues at the point where it dies. >>> >>> Now, I am not that familiar with the tag management/allocation code, >>> how hard would it be to split the tags accross the queues ? I suppose I >>> could easily add a quirk to do +32 to the tags used on the IO queue... >> >> The admin queue has its own tagset (nvme_dev->admin_tagset), separate from IO >> queues (nvme_dev->tagset). So I do not see how this would solve the problem. > > I can easily do that by offsetting command_id locally to pci.c but > doing just that didn't completely fix it. It just made it much harder > to hit. I suspect they might also not like tags > 128, so I've added a > hack to clamp the IO queue depth to 64 as well, it's still running, > I'll let it run overnight (smartctl in a loop + dd's) > > Cheers, > Ben. > > > -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2 2019-07-19 0:39 ` Damien Le Moal @ 2019-07-19 0:44 ` Benjamin Herrenschmidt 2019-07-19 0:53 ` Damien Le Moal 0 siblings, 1 reply; 17+ messages in thread From: Benjamin Herrenschmidt @ 2019-07-19 0:44 UTC (permalink / raw) On Fri, 2019-07-19@00:39 +0000, Damien Le Moal wrote: > On 2019/07/18 16:40, Benjamin Herrenschmidt wrote: > > On Thu, 2019-07-18@07:13 +0000, Damien Le Moal wrote: > > > > I can trigger the problem easily now running smartctl -c /dev/nvme0n1 > > > > and doing a bit of IOs. It seems to happen when the IO and Admin queue > > > > use the same tag. > > > > > > So isn't it that you are getting a completion cqe for an admin queue command in > > > an IO completion queue ? Or the reverse ? Given how weird this NVMe device seems > > > to be, it may be a possibility. In addition to the command ID (tag), if you > > > print the cqe queue ID (le16_to_cpu(cqe->sq_id)), what do you see ? > > > > Ah I can add code to validate that it's coming into the right queue, > > good idea. > > If the completion really shows up into the wrong queue, a fix may be simply to > hack this code in nvme_handle_cqe(): > > req = blk_mq_tag_to_rq(*nvmeq->tags, cqe->command_id); > trace_nvme_sq(req, cqe->sq_head, nvmeq->sq_tail); > nvme_end_request(req, cqe->status, cqe->result); > > to use a different nvmeq pointer, that is the one that corresponds to cqe->sq_id > queue used for submission, which would lead to the correct tagset to be > referenced and suppress the false "duplicate" tag issue. Locking of queues/hctx > may need careful checking with such a change though. But if the completion arrived in the wrong queue, wouldn't we be missing the completions for admin requests and thus having all sort of issues ? Things are now solid with those two changes I've done locally, I'll send a tentative patche: - Offset all tags in the IO queue with 32 so they don't overlap (I do it at the point of writing into the submission queue, and undo it when processing the CQs). - Reduce the IO queue depth by 32 Without the latter, I occasionally (but more rarely) still got the error, but always with tags > 127. I suspect that not only Apple implementation actually *uses* the tags we specify for their own internal tracking, but they also only support values 0...127. With those two changes it's been solid. Thankfully the resulting quirk is reasonably simple and self contained to pci.c. I'll clean things up and post. Cheers, Ben. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2 2019-07-19 0:44 ` Benjamin Herrenschmidt @ 2019-07-19 0:53 ` Damien Le Moal 2019-07-19 1:00 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 17+ messages in thread From: Damien Le Moal @ 2019-07-19 0:53 UTC (permalink / raw) On 2019/07/19 9:44, Benjamin Herrenschmidt wrote: > On Fri, 2019-07-19@00:39 +0000, Damien Le Moal wrote: >> On 2019/07/18 16:40, Benjamin Herrenschmidt wrote: >>> On Thu, 2019-07-18@07:13 +0000, Damien Le Moal wrote: >>>>> I can trigger the problem easily now running smartctl -c /dev/nvme0n1 >>>>> and doing a bit of IOs. It seems to happen when the IO and Admin queue >>>>> use the same tag. >>>> >>>> So isn't it that you are getting a completion cqe for an admin queue command in >>>> an IO completion queue ? Or the reverse ? Given how weird this NVMe device seems >>>> to be, it may be a possibility. In addition to the command ID (tag), if you >>>> print the cqe queue ID (le16_to_cpu(cqe->sq_id)), what do you see ? >>> >>> Ah I can add code to validate that it's coming into the right queue, >>> good idea. >> >> If the completion really shows up into the wrong queue, a fix may be simply to >> hack this code in nvme_handle_cqe(): >> >> req = blk_mq_tag_to_rq(*nvmeq->tags, cqe->command_id); >> trace_nvme_sq(req, cqe->sq_head, nvmeq->sq_tail); >> nvme_end_request(req, cqe->status, cqe->result); >> >> to use a different nvmeq pointer, that is the one that corresponds to cqe->sq_id >> queue used for submission, which would lead to the correct tagset to be >> referenced and suppress the false "duplicate" tag issue. Locking of queues/hctx >> may need careful checking with such a change though. > > But if the completion arrived in the wrong queue, wouldn't we be > missing the completions for admin requests and thus having all sort of > issues ? If the admin req completion comes in the wrong queue, then the admin queue completion queue should not be changing and not need any head/doorbell tweaks. It simply stays as is. Since I think that nvme_handle_cqe is used for all commands regardless of submission type (normal, polled, sync, etc), this should work in all cases. Not sure at all for all this. Some side effects are definitely possible. But worth a try I think. > > Things are now solid with those two changes I've done locally, I'll > send a tentative patche: > > - Offset all tags in the IO queue with 32 so they don't overlap (I do > it at the point of writing into the submission queue, and undo it when > processing the CQs). > > - Reduce the IO queue depth by 32 > > Without the latter, I occasionally (but more rarely) still got the > error, but always with tags > 127. I suspect that not only Apple > implementation actually *uses* the tags we specify for their own > internal tracking, but they also only support values 0...127. > > With those two changes it's been solid. Thankfully the resulting quirk > is reasonably simple and self contained to pci.c. I'll clean things up > and post. > > Cheers, > Ben. > > > -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2 2019-07-19 0:53 ` Damien Le Moal @ 2019-07-19 1:00 ` Benjamin Herrenschmidt 2019-07-19 1:09 ` Damien Le Moal 0 siblings, 1 reply; 17+ messages in thread From: Benjamin Herrenschmidt @ 2019-07-19 1:00 UTC (permalink / raw) On Fri, 2019-07-19@00:53 +0000, Damien Le Moal wrote: > > If the admin req completion comes in the wrong queue, then the admin queue > completion queue should not be changing and not need any head/doorbell tweaks. > It simply stays as is. Since I think that nvme_handle_cqe is used for all > commands regardless of submission type (normal, polled, sync, etc), this should > work in all cases. Not sure at all for all this. Some side effects are > definitely possible. But worth a try I think. I've printed the queue I get completions from and they arrive in the right queues. So that doesn't seem to be the issue. Apple's implementation is a SW one running on the T2 chip, I suspect it's just completely broken and they don't care because they have their own driver. I'll clean up my latest fix, test more and send it. It applies on top of the 3 patches I sent earlier. Cheers, Ben. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2 2019-07-19 1:00 ` Benjamin Herrenschmidt @ 2019-07-19 1:09 ` Damien Le Moal 2019-07-19 1:20 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 17+ messages in thread From: Damien Le Moal @ 2019-07-19 1:09 UTC (permalink / raw) On 2019/07/19 10:00, Benjamin Herrenschmidt wrote: > On Fri, 2019-07-19@00:53 +0000, Damien Le Moal wrote: >> >> If the admin req completion comes in the wrong queue, then the admin queue >> completion queue should not be changing and not need any head/doorbell tweaks. >> It simply stays as is. Since I think that nvme_handle_cqe is used for all >> commands regardless of submission type (normal, polled, sync, etc), this should >> work in all cases. Not sure at all for all this. Some side effects are >> definitely possible. But worth a try I think. > > I've printed the queue I get completions from and they arrive in the > right queues. So that doesn't seem to be the issue. OK... But then how is it possible to get a bad tag error ? if the commands are issued and completes in the correct queues, the tagsets are independent and should not clash in any way. This is really weird. > Apple's implementation is a SW one running on the T2 chip, I suspect > it's just completely broken and they don't care because they have their > own driver. Got it, but if the prints show correct cqueues, Linux normal driver should work... Or is it a problem with a buggy T2 controller implementation resulting in some completions being queued twice ? Really weird HW :) > > I'll clean up my latest fix, test more and send it. It applies on top > of the 3 patches I sent earlier. > > Cheers, > Ben. > > > -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2 2019-07-19 1:09 ` Damien Le Moal @ 2019-07-19 1:20 ` Benjamin Herrenschmidt 2019-07-19 1:24 ` Damien Le Moal 0 siblings, 1 reply; 17+ messages in thread From: Benjamin Herrenschmidt @ 2019-07-19 1:20 UTC (permalink / raw) On Fri, 2019-07-19@01:09 +0000, Damien Le Moal wrote: > On 2019/07/19 10:00, Benjamin Herrenschmidt wrote: > > On Fri, 2019-07-19@00:53 +0000, Damien Le Moal wrote: > > > > > > If the admin req completion comes in the wrong queue, then the admin queue > > > completion queue should not be changing and not need any head/doorbell tweaks. > > > It simply stays as is. Since I think that nvme_handle_cqe is used for all > > > commands regardless of submission type (normal, polled, sync, etc), this should > > > work in all cases. Not sure at all for all this. Some side effects are > > > definitely possible. But worth a try I think. > > > > I've printed the queue I get completions from and they arrive in the > > right queues. So that doesn't seem to be the issue. > > OK... But then how is it possible to get a bad tag error ? if the commands are > issued and completes in the correct queues, the tagsets are independent and > should not clash in any way. This is really weird. > > > Apple's implementation is a SW one running on the T2 chip, I suspect > > it's just completely broken and they don't care because they have their > > own driver. > > Got it, but if the prints show correct cqueues, Linux normal driver should > work... Or is it a problem with a buggy T2 controller implementation resulting > in some completions being queued twice ? Really weird HW :) No, I think the problem is that Apple implementation (which as I said is just a bit of SW running off the PCIe slave bit of the T2 chip, on the ARM core), somewhat uses the tags we provide for internal tracking, and they don't append the qid to them, so internally they assume all tags are unique regardless of what queue they come from. It's crazy but that seem to be what they did. Cheers, Ben. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2 2019-07-19 1:20 ` Benjamin Herrenschmidt @ 2019-07-19 1:24 ` Damien Le Moal 2019-07-19 1:34 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 17+ messages in thread From: Damien Le Moal @ 2019-07-19 1:24 UTC (permalink / raw) On 2019/07/19 10:20, Benjamin Herrenschmidt wrote: > On Fri, 2019-07-19@01:09 +0000, Damien Le Moal wrote: >> On 2019/07/19 10:00, Benjamin Herrenschmidt wrote: >>> On Fri, 2019-07-19@00:53 +0000, Damien Le Moal wrote: >>>> >>>> If the admin req completion comes in the wrong queue, then the admin queue >>>> completion queue should not be changing and not need any head/doorbell tweaks. >>>> It simply stays as is. Since I think that nvme_handle_cqe is used for all >>>> commands regardless of submission type (normal, polled, sync, etc), this should >>>> work in all cases. Not sure at all for all this. Some side effects are >>>> definitely possible. But worth a try I think. >>> >>> I've printed the queue I get completions from and they arrive in the >>> right queues. So that doesn't seem to be the issue. >> >> OK... But then how is it possible to get a bad tag error ? if the commands are >> issued and completes in the correct queues, the tagsets are independent and >> should not clash in any way. This is really weird. >> >>> Apple's implementation is a SW one running on the T2 chip, I suspect >>> it's just completely broken and they don't care because they have their >>> own driver. >> >> Got it, but if the prints show correct cqueues, Linux normal driver should >> work... Or is it a problem with a buggy T2 controller implementation resulting >> in some completions being queued twice ? Really weird HW :) > > No, I think the problem is that Apple implementation (which as I said > is just a bit of SW running off the PCIe slave bit of the T2 chip, on > the ARM core), somewhat uses the tags we provide for internal tracking, > and they don't append the qid to them, so internally they assume all > tags are unique regardless of what queue they come from. It's crazy but > that seem to be what they did. OK. So the device essentially operates with a shared tagset across all queues.. And indeed that does not play well with Linux driver. Looking at the code, having the admin queue and io queues share the same tagset would be the cleanest fix, but that would also mean essentially rewriting completely pci.c and implementing an "apple.c" host driver :) -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 17+ messages in thread
* Duplicate tag error with 5.2 2019-07-19 1:24 ` Damien Le Moal @ 2019-07-19 1:34 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 17+ messages in thread From: Benjamin Herrenschmidt @ 2019-07-19 1:34 UTC (permalink / raw) On Fri, 2019-07-19@01:24 +0000, Damien Le Moal wrote: > > No, I think the problem is that Apple implementation (which as I said > > is just a bit of SW running off the PCIe slave bit of the T2 chip, on > > the ARM core), somewhat uses the tags we provide for internal tracking, > > and they don't append the qid to them, so internally they assume all > > tags are unique regardless of what queue they come from. It's crazy but > > that seem to be what they did. > > OK. So the device essentially operates with a shared tagset across all queues.. > And indeed that does not play well with Linux driver. Looking at the code, > having the admin queue and io queues share the same tagset would be the cleanest > fix, but that would also mean essentially rewriting completely pci.c and > implementing an "apple.c" host driver :) Yeah, that's why my approach, while sub-optimal (I have to shrink the IO queue) is the simplest, I just "offset" the IO queue tag space to avoid overlaps. The offset is basically the patch I posted yesterday, just with another hack missing I'll send later to also shrink the queue to the resulting tags don't go over 127. Cheers, Ben. ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <CAOSXXT5jh0Yi0xPbsLu9V=KVmee_Pto6KNRWxUbfk_8=UGGU3A@mail.gmail.com>]
* Duplicate tag error with 5.2 [not found] ` <CAOSXXT5jh0Yi0xPbsLu9V=KVmee_Pto6KNRWxUbfk_8=UGGU3A@mail.gmail.com> @ 2019-07-18 1:31 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 17+ messages in thread From: Benjamin Herrenschmidt @ 2019-07-18 1:31 UTC (permalink / raw) On Thu, 2019-07-18@10:21 +0900, Keith Busch wrote: > > On Thu, Jul 18, 2019, 10:13 AM Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > On Thu, 2019-07-18@10:41 +1000, Benjamin Herrenschmidt wrote: > > > Hi ! > > > > > > So I'm getting occasional abrupt shutdowns on the Mac Mini 2018. I > > > found out that it's the T2 chip that is panic'ing and shutting the > > > system down. > > > > > > Interestingly, when booting into MacOS, we can get some kind of error > > > log of what happened. It looks like the NVME implementation on that > > > thing is a SW emulation done by the T2 chip :-) It's failing on an > > > assert due to a duplicate tag error... > > > > > > Does that ring any bell ? Any idea what might be causing that ? (tag > > > is > > > tag 8 if that means anything). > > > > Note: It seems to happen often during boot, when systemd starts all > > services... It looks like a service hangs for a few seconds (I see the > > systemd "waiting for service..." prompt show up for a bit, then the > > machine dies abruptly. > > > > I have yet to find a way to reproduce this on a successfully booted > > system however. Once it's gone all the way, it's been stable so far, at > > least doing plenty of -j5 kernel compiles. > > > > Trying various stressers now... > > > > Also the tag number changes from occurrence to occurrence, no pattern. > > The IO queue depth is 129 > > Does this one need that QD1 quirk? You mean queue depth 1 ? I suppose it would help but it would kill performance :-) From the reverse engineering guy, it looks like MacOS is happily using the whole queue. See my other message, I seem to have caught us sending a duplicate tag indeed, though it's unclear how that happened. Cheers, Ben. ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2019-07-19 1:34 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-07-18 0:41 Duplicate tag error with 5.2 Benjamin Herrenschmidt
2019-07-18 1:13 ` Benjamin Herrenschmidt
2019-07-18 1:29 ` Benjamin Herrenschmidt
[not found] ` <CAOSXXT6Z=zEpWqac2k1ydk2LynAEtFr-4jXJVCtTa5yn8H7f3Q@mail.gmail.com>
2019-07-18 4:53 ` Benjamin Herrenschmidt
2019-07-18 5:27 ` Benjamin Herrenschmidt
2019-07-18 6:00 ` Benjamin Herrenschmidt
2019-07-18 7:13 ` Damien Le Moal
2019-07-18 7:39 ` Benjamin Herrenschmidt
2019-07-19 0:39 ` Damien Le Moal
2019-07-19 0:44 ` Benjamin Herrenschmidt
2019-07-19 0:53 ` Damien Le Moal
2019-07-19 1:00 ` Benjamin Herrenschmidt
2019-07-19 1:09 ` Damien Le Moal
2019-07-19 1:20 ` Benjamin Herrenschmidt
2019-07-19 1:24 ` Damien Le Moal
2019-07-19 1:34 ` Benjamin Herrenschmidt
[not found] ` <CAOSXXT5jh0Yi0xPbsLu9V=KVmee_Pto6KNRWxUbfk_8=UGGU3A@mail.gmail.com>
2019-07-18 1:31 ` Benjamin Herrenschmidt
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.