From: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
Keith Busch <keith.busch@gmail.com>
Subject: Re: [RFC 2/2] io_uring: acquire ctx->uring_lock before calling io_issue_sqe()
Date: Tue, 28 Jan 2020 12:34:49 -0800
Message-ID: <f56a8767-c754-b2e9-bfea-1ced197a05d7@oracle.com>
In-Reply-To: <a316d3fe-4162-8274-a74a-2d13a4caf011@kernel.dk>

On 1/16/2020 1:26 PM, Jens Axboe wrote:
> On 1/16/20 2:04 PM, Bijan Mottahedeh wrote:
>> On 1/16/2020 12:02 PM, Jens Axboe wrote:
>>> On 1/16/20 12:08 PM, Bijan Mottahedeh wrote:
>>>> On 1/16/2020 8:22 AM, Jens Axboe wrote:
>>>>> On 1/15/20 9:42 PM, Jens Axboe wrote:
>>>>>> On 1/15/20 9:34 PM, Jens Axboe wrote:
>>>>>>> On 1/15/20 7:37 PM, Bijan Mottahedeh wrote:
>>>>>>>> io_issue_sqe() calls io_iopoll_req_issued() which manipulates poll_list,
>>>>>>>> so acquire ctx->uring_lock beforehand similar to other instances of
>>>>>>>> calling io_issue_sqe().
>>>>>>> Is the below not enough?
>>>>>> This should be better, we have two that set ->in_async, and only one
>>>>>> doesn't hold the mutex.
>>>>>>
>>>>>> If this works for you, can you resend patch 2 with that? Also add a:
>>>>>>
>>>>>> Fixes: 8a4955ff1cca ("io_uring: sqthread should grab ctx->uring_lock for submissions")
>>>>>>
>>>>>> to it as well. Thanks!
>>>>> I tested and queued this up:
>>>>>
>>>>> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.5&id=11ba820bf163e224bf5dd44e545a66a44a5b1d7a
>>>>>
>>>>> Please let me know if this works, it sits on top of the ->result patch you
>>>>> sent in.
>>>>>
>>>> That works, thanks.
>>>>
>>>> I'm however still seeing a use-after-free error in the request
>>>> completion path in nvme_unmap_data(). It happens only when testing with
>>>> large block sizes in fio, typically > 128k, e.g. bs=256k will always hit it.
>>>>
>>>> This is the error:
>>>>
>>>> DMA-API: nvme 0000:00:04.0: device driver tries to free DMA memory it
>>>> has not allocated [device address=0x6b6b6b6b6b6b6b6b] [size=1802201963
>>>> bytes]
>>>>
>>>> and this warning occasionally:
>>>>
>>>> WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IDLE);
>>>>
>>>> It seems like a request might be issued multiple times but I can't see
>>>> anything in io_uring code that would account for it.
>>> Both of them indicate reuse, and I agree I don't think it's io_uring. It
>>> really feels like an issue with nvme when a poll queue is shared, but I
>>> haven't been able to pin point what it is yet.
>>>
>>> The 128K is interesting, that would seem to indicate that it's related to
>>> splitting of the IO (which would create > 1 IO per submitted IO).
>>>
>> Where does the split take place? I had suspected that it might be
>> related to the submit_bio() loop in __blkdev_direct_IO() but I don't
>> think I saw multiple submit_bio() calls or maybe I missed something.
> See the path from blk_mq_make_request() -> __blk_queue_split() ->
> blk_bio_segment_split(). The bio is built and submitted, then split if
> it violates any size constraints. The splits are submitted through
> generic_make_request(), so that might be why you didn't see multiple
> submit_bio() calls.
>
I think the problem is in __blkdev_direct_IO() and is not related to
request size:

        qc = submit_bio(bio);

        if (polled)
                WRITE_ONCE(iocb->ki_cookie, qc);

The first call to submit_bio() when dio->is_sync is not set won't have
acquired a bio ref through bio_get(), and so the bio/dio could already
be freed by the time ki_cookie is set.

With the specific io_uring test, this happens because
blk_mq_make_request()->blk_mq_get_request() fails and so the request is
terminated.
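
The ordering hazard described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not
kernel code: struct model_bio, model_submit_fail_inline(), MODEL_COOKIE
and the other model_* names are hypothetical stand-ins, and "freed" is a
flag rather than a real free() so the state can be inspected.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_COOKIE 42u /* arbitrary stand-in for the value submit_bio() returns */

/* Hypothetical user-space stand-in for the bio/dio pair. */
struct model_bio {
	int refs;        /* stand-in for the bio reference count */
	bool freed;      /* set instead of really freeing, so it can be inspected */
	unsigned cookie; /* stand-in for iocb->ki_cookie */
};

static void model_bio_get(struct model_bio *b)
{
	b->refs++;
}

static void model_bio_put(struct model_bio *b)
{
	assert(b->refs > 0);
	if (--b->refs == 0)
		b->freed = true;
}

/*
 * Models a submission that fails immediately: the completion runs
 * inline and drops the submission reference before returning.
 */
static unsigned model_submit_fail_inline(struct model_bio *b)
{
	model_bio_put(b);
	return MODEL_COOKIE;
}

/*
 * Returns true if the cookie store after submission would have
 * touched an already-freed object.
 */
static bool model_submit(struct model_bio *b, bool take_extra_ref)
{
	bool touched_freed;
	unsigned qc;

	b->refs = 1;
	b->freed = false;

	if (take_extra_ref)
		model_bio_get(b); /* analogous to calling bio_get() first */

	qc = model_submit_fail_inline(b);

	touched_freed = b->freed;
	if (!touched_freed)
		b->cookie = qc; /* safe only while a reference is still held */

	if (take_extra_ref)
		model_bio_put(b);

	return touched_freed;
}
```

In this model, model_submit(&b, false) reports that the cookie store
would hit freed memory, while model_submit(&b, true) does not, because
the extra reference keeps the object alive until after the store.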

As for the fix for the polled io (!is_sync) case, I'm wondering if
dio->multi_bio is really necessary in __blkdev_direct_IO().  Can we call
bio_get() unconditionally after the call to bio_alloc_bioset(), set
dio->ref = 1, and increment it for additional submit_bio() calls?  Would
it make sense to do away with multi_bio?
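
The counting scheme proposed above might look roughly like the following
user-space sketch. It is an illustration under assumptions, not the
kernel implementation: struct model_dio and the model_dio_* helpers are
hypothetical names, and C11 atomics stand in for the kernel's atomic
operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the dio; only the fields the sketch needs. */
struct model_dio {
	atomic_int ref;  /* one count per outstanding bio */
	bool completed;  /* set when the last bio finishes */
};

/* First bio: start the count at 1 unconditionally (no multi_bio flag). */
static void model_dio_init(struct model_dio *d)
{
	atomic_init(&d->ref, 1);
	d->completed = false;
}

/* Each additional submitted bio takes one more count. */
static void model_dio_add_bio(struct model_dio *d)
{
	atomic_fetch_add(&d->ref, 1);
}

/*
 * Called once per bio completion. Returns true only for the
 * completion that drops the last count, which is the one that
 * may complete the dio as a whole.
 */
static bool model_dio_bio_done(struct model_dio *d)
{
	int prev = atomic_fetch_sub(&d->ref, 1);

	assert(prev > 0);
	if (prev == 1) {
		d->completed = true;
		return true;
	}
	return false;
}
```

With one initial bio and two additional ones, the first two calls to
model_dio_bio_done() return false and only the third marks the dio as
completed, so single-bio and multi-bio requests take the same path.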

Also, I'm not clear on how the is_sync + multi_bio case is supposed to work.
__blkdev_direct_IO() polls for *a* completion in the request's hctx and
not *the* request completion itself, so what does that tell us for
multi_bio + is_sync?  Is the polling supposed to guarantee that all
constituent bios for a multi_bio request have completed before return?

--bijan

PS I couldn't see 256k requests being split via __blk_queue_split(),
still not sure how that works.