From: sagi@grimberg.me (Sagi Grimberg)
Subject: [PATCH] nvme-rdma: Always signal fabrics private commands
Date: Sun, 26 Jun 2016 19:41:39 +0300
Message-ID: <577005C3.4000802@grimberg.me>
In-Reply-To: <20160624070740.GB4252@infradead.org>
>> Some RDMA adapters were observed to have some issues
>> with selective completion signaling which might cause
>> a use-after-free condition when the device accidentally
>> reports a completion when the caller context (wr_cqe)
>> was already freed.
>
> I'd really love to fully root cause this issue and find a way
> to fix it in the driver or core.
> This isn't really something a ULP should have to care about, and I'm
> trying to understand how the existing ULPs get away without this.
It's a cxgb4-specific workaround (the only device this was observed
on). I'm not sure how this can be addressed in the core. We could add
a comment saying so in the code.
> I think we should apply this anyway for now unless we can come up
> with something better, but I'm not exactly happy about it.
>
>> The first time this was detected was for flush requests
>> that were not allocated from the tagset, now we see that
>> in the error path of fabrics connect (admin). The normal
>> I/O selective signaling is safe because we free the tagset
>> only when all the queue-pairs were drained.
>
> So for flush we needed this because the flush request is allocated
> as part of the hctx, but pass through requests aren't really
> special in terms of allocation. What's the reason we need to
> treat these special?
OK, here's what I think is going on. We allocate the rdma queue and
issue the admin connect (unsignaled). The connect fails, and we tear
down the admin queue in order (free the tagset, put the device and
free the queue).
The cxgb4 driver is responsible for flushing pending work requests and
has no way of telling what the HW processed other than the head/tail
indexes (which are probably updated at completion time). So when it
sees the admin connect in the send queue, it doesn't know if the HW
did anything with it, and it flushes it anyway.
Our error path frees the tagset before we free the queue (draining
the qp), so we hit a use-after-free condition (->done() is in freed
tag memory).
Note that we must allocate the qp before we allocate the tagset because
we need the device when the init_request callouts come. So we allocate
it before and free it after. An alternative fix would be to free the
queue before the tagset even though we allocated it first (as Steve
suggested).
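
To make the ordering argument concrete, here is a small userspace
sketch of the failed-connect teardown. It is only a model: every
sim_* name is hypothetical, nothing here is nvme-rdma or cxgb4 code,
and the "flush every pending WR on drain" behaviour is assumed from
the description above rather than taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the tagset and the rdma queue. */
struct sim_tagset {
    bool live;                 /* false once the tag memory is freed */
};

struct sim_queue {
    bool qp_live;
    int  pending_unsignaled;   /* WRs posted without a completion request */
    int  stale_completions;    /* flushes whose ->done() hit freed tags */
};

enum teardown_order { TAGSET_THEN_QUEUE, QUEUE_THEN_TAGSET };

static void sim_free_tagset(struct sim_tagset *ts)
{
    ts->live = false;
}

/*
 * Freeing the queue drains the qp. Assumption: the device flushes
 * every WR still in the send queue, signaled or not.
 */
static void sim_free_queue(struct sim_queue *q, const struct sim_tagset *ts)
{
    assert(q->qp_live);
    while (q->pending_unsignaled > 0) {
        q->pending_unsignaled--;
        if (!ts->live)
            q->stale_completions++;   /* ->done() in freed tag memory */
    }
    q->qp_live = false;
}

/* Returns how many completions would have touched freed memory. */
static int sim_failed_connect(enum teardown_order order)
{
    struct sim_queue q = { .qp_live = true };   /* qp allocated first */
    struct sim_tagset ts = { .live = true };    /* tagset allocated second */

    q.pending_unsignaled = 1;                   /* unsignaled admin connect */

    if (order == TAGSET_THEN_QUEUE) {           /* current error path */
        sim_free_tagset(&ts);
        sim_free_queue(&q, &ts);
    } else {                                    /* alternative ordering */
        sim_free_queue(&q, &ts);
        sim_free_tagset(&ts);
    }
    return q.stale_completions;
}
```

In this model, sim_failed_connect(TAGSET_THEN_QUEUE) counts one stale
completion, while sim_failed_connect(QUEUE_THEN_TAGSET) counts none,
which is the difference between the current error path and the
alternative ordering.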