From: benh@kernel.crashing.org (Benjamin Herrenschmidt)
Subject: Duplicate tag error with 5.2
Date: Fri, 19 Jul 2019 10:44:33 +1000
Message-ID: <50c35ab3db7745875476c0966bf191ab42de4dd1.camel@kernel.crashing.org>
In-Reply-To: <BYAPR04MB581667EE6FB45D86881529E2E7CB0@BYAPR04MB5816.namprd04.prod.outlook.com>

On Fri, 2019-07-19 at 00:39 +0000, Damien Le Moal wrote:
> On 2019/07/18 16:40, Benjamin Herrenschmidt wrote:
> > On Thu, 2019-07-18 at 07:13 +0000, Damien Le Moal wrote:
> > > > I can trigger the problem easily now running smartctl -c /dev/nvme0n1
> > > > and doing a bit of IOs. It seems to happen when the IO and Admin queue
> > > > use the same tag.
> > > 
> > > So isn't it that you are getting a completion cqe for an admin queue command in
> > > an IO completion queue ? Or the reverse ? Given how weird this NVMe device seems
> > > to be, it may be a possibility. In addition to the command ID (tag), if you
> > > print the cqe queue ID (le16_to_cpu(cqe->sq_id)), what do you see ?
> > 
> > Ah I can add code to validate that it's coming into the right queue,
> > good idea.
> 
> If the completion really shows up into the wrong queue, a fix may be simply to
> hack this code in nvme_handle_cqe():
> 
> 	req = blk_mq_tag_to_rq(*nvmeq->tags, cqe->command_id);
> 	trace_nvme_sq(req, cqe->sq_head, nvmeq->sq_tail);
> 	nvme_end_request(req, cqe->status, cqe->result);
> 
> to use a different nvmeq pointer, that is the one that corresponds to cqe->sq_id
> queue used for submission, which would lead to the correct tagset to be
> referenced and suppress the false "duplicate" tag issue. Locking of queues/hctx
> may need careful checking with such a change though.

But if the completion arrived in the wrong queue, wouldn't we be
missing the completions for admin requests and thus having all sorts of
issues?
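
For reference, the check discussed above boils down to comparing the
completion's sq_id with the ID of the queue being processed. A minimal
userspace sketch follows; the structs and the function name are
hypothetical stand-ins, not the kernel's definitions, and the real driver
would need le16_to_cpu() on the cqe fields:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a completion queue entry (fields in CPU order). */
struct sketch_cqe {
	uint16_t sq_id;       /* submission queue the command was issued on */
	uint16_t command_id;  /* tag chosen by the host at submission time */
};

/* Hypothetical stand-in for a queue pair; qid 0 is the admin queue. */
struct sketch_queue {
	uint16_t qid;
};

/* True when the completion belongs to the queue we are currently processing. */
static bool cqe_on_expected_queue(const struct sketch_queue *q,
				  const struct sketch_cqe *cqe)
{
	return cqe->sq_id == q->qid;
}
```

A mismatch here would mean a completion showed up in the wrong queue; a
match with a still-duplicated tag points at the tags themselves.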

Things are now solid with the two changes I've made locally; I'll
send a tentative patch:

 - Offset all tags in the IO queue by 32 so they don't overlap with the
admin queue's tags (I do it at the point of writing into the submission
queue, and undo it when processing the CQs).

 - Reduce the IO queue depth by 32

Without the latter, I still got the error occasionally (though more
rarely), but always with tags > 127. I suspect that not only does Apple's
implementation actually *use* the tags we specify for its own
internal tracking, but it also only supports values 0...127.
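
The arithmetic behind the two changes can be sketched in plain C. This is
only an illustration, not the patch itself: the macro and function names
are hypothetical, and the 128-entry IO queue used in the note below is an
assumption chosen to match the 0...127 limit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; the value 32 is the offset described above. */
#define QUIRK_TAG_OFFSET 32u

/* Tag as written into the submission queue: IO queues (qid != 0) are offset. */
static inline uint16_t quirk_tag_to_wire(uint16_t qid, uint16_t tag)
{
	return qid ? (uint16_t)(tag + QUIRK_TAG_OFFSET) : tag;
}

/* Undo the offset when processing a completion from an IO queue. */
static inline uint16_t quirk_tag_from_wire(uint16_t qid, uint16_t command_id)
{
	return qid ? (uint16_t)(command_id - QUIRK_TAG_OFFSET) : command_id;
}

/* Reduce the IO queue depth by the same amount so offset tags stay in range. */
static inline uint16_t quirk_io_queue_depth(uint16_t depth)
{
	assert(depth > QUIRK_TAG_OFFSET);
	return (uint16_t)(depth - QUIRK_TAG_OFFSET);
}
```

Assuming a 128-entry IO queue, the depth becomes 96, host tags run 0...95,
and the device sees 32...127, which stays within the range it seems to
support while never colliding with admin tags below 32.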

With those two changes it's been solid. Thankfully the resulting quirk
is reasonably simple and self-contained in pci.c. I'll clean things up
and post.

Cheers,
Ben.
