From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>,
Martin Raiber <martin@urbackup.org>,
io-uring@vger.kernel.org
Subject: Re: Fixed buffers have out-dated content
Date: Sat, 16 Jan 2021 12:39:54 -0700
Message-ID: <fb386017-3362-9cc6-8a9a-fe2233c9525d@kernel.dk>
In-Reply-To: <25f75e49-9d5e-fcab-e24b-8ad908254c2e@gmail.com>
On 1/16/21 12:30 PM, Pavel Begunkov wrote:
> On 14/01/2021 21:50, Martin Raiber wrote:
>> On 10.01.2021 17:50 Martin Raiber wrote:
>>> On 09.01.2021 21:32 Pavel Begunkov wrote:
>>>> On 09/01/2021 16:58, Martin Raiber wrote:
>>>>> On 09.01.2021 17:23 Jens Axboe wrote:
>>>>>> On 1/8/21 4:39 PM, Martin Raiber wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have a gnarly issue with io_uring and fixed buffers (fixed
>>>>>>> read/write). It seems the contents of those buffers contain old data in
>>>>>>> some rare cases under memory pressure after a read/during a write.
>>>>>>>
>>>>>>> Specifically I use io_uring with fuse and to confirm this is not some
>>>>>>> user space issue let fuse print the unique id it adds to each request.
>>>>>>> Fuse adds this request data to a pipe, and when the pipe buffer is later
>>>>>>> copied to the io_uring fixed buffer it has the id of a fuse request
>>>>>>> returned earlier using the same buffer while returning the size of the
>>>>>>> new request. Or I set the unique id in the buffer, write it to fuse (via
>>>>>>> writing to a pipe, then splicing) and then fuse returns with e.g.
>>>>>>> ENOENT, because the unique id is not correct because in kernel it reads
>>>>>>> the id of the previous, already completed, request using this buffer.
>>>>>>>
>>>>>>> To make reproducing this faster running memtester (which mlocks a
>>>>>>> configurable amount of memory) with a large amount of user memory every
>>>>>>> 30s helps. So it has something to do with swapping? It seems to not
>>>>>>> occur if no swap space is active. Problem occurs without warning when
>>>>>>> the kernel is built with KASAN and slab debugging.
>>>>>>>
>>>>>>> If I don't use the _FIXED opcodes (which is easy to do), the problem
>>>>>>> does not occur.
>>>>>>>
>>>>>>> Problem occurs with 5.9.16 and 5.10.5.
>>>>>> Can you mention more about what kind of IO you are doing, I'm assuming
>>>>>> it's O_DIRECT? I'll see if I can reproduce this.
>>>>> It's writing to/reading from pipes (nonblocking, no O_DIRECT).
>>>> A blind guess, does it handle short reads and writes? If not, can you
>>>> check whether they happen or not?
>>>
>>> Something like this was what I suspected at first as well. It does check
>>> for short reads/writes, and at some point I added code for retrying short
>>> reads (unnecessary, because the fuse request structure is 40 bytes and it
>>> does io in page sizes). I also checked that the pipes are empty before
>>> they are used, and let the kernel log allocation failures (the idea was
>>> that it was short pipe reads/writes because of allocation failure, or that
>>> something doesn't get rewound properly in this case). Beyond that, three
>>> things make a user space problem unlikely:
>>>
>>> - occurs only when using fixed buffers and does not occur when running same code without fixed buffer opcodes
>>> - doesn't occur when there is no memory pressure
>>> - I added print(k/f) logging that pointed me in this direction as well
>>>
>>>>> I can reproduce it with https://github.com/uroni/fuseuring on e.g. a 2GB VPS. Modify bench.sh so that fio loops. Add swap, then run 1400M memtester while it runs (so it swaps, I guess). I can try further reducing the reproducer, but I wanted to avoid that work in case it is something obvious. The next step would be to remove fuse from the equation -- it does try to move the pages from the pipe when splicing to it, for example.
>>
>> When I use 5.10.7 with 09854ba94c6aad7886996bfbee2530b3d8a7f4f4 ("mm: do_wp_page() simplification"), 1a0cf26323c80e2f1c58fc04f15686de61bfab0c ("mm/ksm: Remove reuse_ksm_page()") and be068f29034fb00530a053d18b8cf140c32b12b3 ("mm: fix misplaced unlock_page in do_wp_page()") reverted, the issue doesn't seem to occur.
>
> Thanks for tracking it down. Was it reported to Linus and Peter?
That seems very strange and should then affect a bunch of other stuff,
too... Do you have a test case? I'd love to dive into this and figure
out what is going on, and a test case would save me a lot of time.
--
Jens Axboe
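
For readers following the thread, the short read/write handling that Pavel
asks about can be sketched roughly as below. This is only an illustration
under stated assumptions, not the code from the fuseuring reproducer: it uses
plain blocking read(2)/write(2) on a pipe rather than io_uring, and the names
read_full, write_full, demo_pipe_roundtrip and REQ_HDR_SIZE are hypothetical
(the 40 comes from the fuse request size Martin mentions). With nonblocking
pipes, as in the report, EAGAIN would additionally need a poll/retry path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define REQ_HDR_SIZE 40 /* illustrative: fuse request size quoted in the thread */

/* Read until `len` bytes arrived, EOF, or a hard error.
 * Returns bytes read (less than len only at EOF) or -1 with errno set. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n > 0)
            done += (size_t)n;      /* short read: keep going */
        else if (n == 0)
            break;                  /* EOF */
        else if (errno != EINTR)
            return -1;              /* hard error */
    }
    return (ssize_t)done;
}

/* Write all `len` bytes, retrying after short writes and EINTR.
 * Returns len on success or -1 with errno set. */
ssize_t write_full(int fd, const void *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, (const char *)buf + done, len - done);
        if (n >= 0)
            done += (size_t)n;      /* short write: keep going */
        else if (errno != EINTR)
            return -1;              /* hard error */
    }
    return (ssize_t)done;
}

/* Hypothetical demo: send one header through a pipe in two pieces and
 * check that the reader gets the complete, matching bytes. Returns 0 on
 * success, -1 otherwise. */
int demo_pipe_roundtrip(void)
{
    int fds[2];
    char out[REQ_HDR_SIZE], in[REQ_HDR_SIZE];
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;
    memset(out, 0xAB, sizeof(out));
    memset(in, 0, sizeof(in));

    if (write_full(fds[1], out, 16) != 16 ||
        write_full(fds[1], out + 16, sizeof(out) - 16) != (ssize_t)(sizeof(out) - 16)) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    n = read_full(fds[0], in, sizeof(in));
    close(fds[0]);
    close(fds[1]);
    if (n != (ssize_t)sizeof(in))
        return -1;
    return memcmp(in, out, sizeof(in)) == 0 ? 0 : -1;
}
```

A caller that sees fewer bytes than requested from read_full knows it hit EOF
rather than a transient short read, which is the distinction the question in
the thread is probing for.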
Thread overview: 13+ messages
2021-01-08 23:39 Fixed buffer have out-dated content Martin Raiber
2021-01-09 16:23 ` Jens Axboe
2021-01-09 16:58 ` Martin Raiber
2021-01-09 20:32 ` Pavel Begunkov
2021-01-10 16:50 ` Martin Raiber
2021-01-14 21:50 ` Fixed buffers " Martin Raiber
2021-01-16 19:30 ` Pavel Begunkov
2021-01-16 19:39 ` Jens Axboe [this message]
2021-01-16 22:12 ` Jens Axboe
2021-01-16 23:05 ` Linus Torvalds
2021-01-16 23:34 ` Linus Torvalds
2021-01-17 20:07 ` Martin Raiber
2021-01-17 20:14 ` Linus Torvalds