Re: [Bug 208827] [fio io_uring] io_uring write data crc32c verify failed

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Jens Axboe <axboe@kernel.dk>
To: Dave Chinner <david@fromorbit.com>, bugzilla-daemon@bugzilla.kernel.org
Cc: linux-xfs@vger.kernel.org
Subject: Re: [Bug 208827] [fio io_uring] io_uring write data crc32c verify failed
Date: Mon, 10 Aug 2020 21:01:17 -0600	[thread overview]
Message-ID: <a01b791f-c6e9-c554-5293-b4baff01e9fd@kernel.dk> (raw)
In-Reply-To: <cd94fcfb-6a8f-b0f4-565e-74733d71d7c3@kernel.dk>

On 8/10/20 8:01 PM, Jens Axboe wrote:
> On 8/10/20 7:50 PM, Jens Axboe wrote:
>> On 8/10/20 7:15 PM, Jens Axboe wrote:
>>> On 8/10/20 3:08 AM, Dave Chinner wrote:
>>>> On Mon, Aug 10, 2020 at 05:08:07PM +1000, Dave Chinner wrote:
>>>>> [cc Jens]
>>>>>
>>>>> [Jens, data corruption w/ io_uring and simple fio reproducer. see
>>>>> the bz link below.]
>>>>>
>>>>> On Mon, Aug 10, 2020 at 01:56:05PM +1000, Dave Chinner wrote:
>>>>>> On Mon, Aug 10, 2020 at 10:09:32AM +1000, Dave Chinner wrote:
>>>>>>> On Fri, Aug 07, 2020 at 03:12:03AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
>>>>>>>> --- Comment #1 from Dave Chinner (david@fromorbit.com) ---
>>>>>>>> On Thu, Aug 06, 2020 at 04:57:58AM +0000, bugzilla-daemon@bugzilla.kernel.org
>>>>>>>> wrote:
>>>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=208827
>>>>>>>>>
>>>>>>>>>             Bug ID: 208827
>>>>>>>>>            Summary: [fio io_uring] io_uring write data crc32c verify
>>>>>>>>>                     failed
>>>>>>>>>            Product: File System
>>>>>>>>>            Version: 2.5
>>>>>>>>>     Kernel Version: xfs-linux xfs-5.9-merge-7 + v5.8-rc4
>>>>>>>
>>>>>>> FWIW, I can reproduce this with a vanilla 5.8 release kernel,
>>>>>>> so this isn't related to contents of the XFS dev tree at all...
>>>>>>>
>>>>>>> In fact, this bug isn't a recent regression. AFAICT, it was
>>>>>>> introduced between in 5.4 and 5.5 - 5.4 did not reproduce, 5.5 did
>>>>>>> reproduce. More info once I've finished bisecting it....
>>>>>>
>>>>>> f67676d160c6ee2ed82917fadfed6d29cab8237c is the first bad commit
>>>>>> commit f67676d160c6ee2ed82917fadfed6d29cab8237c
>>>>>> Author: Jens Axboe <axboe@kernel.dk>
>>>>>> Date:   Mon Dec 2 11:03:47 2019 -0700
>>>>>>
>>>>>>     io_uring: ensure async punted read/write requests copy iovec
>>>>>
>>>>> ....
>>>>>
>>>>> Ok, I went back to vanilla 5.8 to continue debugging and adding
>>>>> tracepoints, and it's proving strangely difficult to reproduce now.
>>>>
>>>> Which turns out to be caused by a tracepoint I inserted to try to
>>>> narrow down if this was an invalidation race. I put this in
>>>> invalidate_complete_page:
>>>>
>>>>
>>>> --- a/mm/truncate.c
>>>> +++ b/mm/truncate.c
>>>> @@ -257,8 +257,11 @@ int invalidate_inode_page(struct page *page)
>>>>         struct address_space *mapping = page_mapping(page);
>>>>         if (!mapping)
>>>>                 return 0;
>>>> -       if (PageDirty(page) || PageWriteback(page))
>>>> +       if (PageDirty(page) || PageWriteback(page)) {
>>>> +               trace_printk("ino 0x%lx page %p, offset 0x%lx\n",
>>>> +                       mapping->host->i_ino, page, page->index * PAGE_SIZE);
>>>>                 return 0;
>>>> +       }
>>>>         if (page_mapped(page))
>>>>                 return 0;
>>>>         return invalidate_complete_page(mapping, page);
>>>>
>>>>
>>>> And that alone, without even enabling tracepoints, made the
>>>> corruption go completely away. So I suspect a page state race
>>>> condition and look at POSIX_FADV_DONTNEED, which fio is issuing
>>>> before running it's verification reads. First thing that does:
>>>>
>>>> 	if (!inode_write_congested(mapping->host))
>>>> 		__filemap_fdatawrite_range(mapping, offset, endbyte,
>>>> 					   WB_SYNC_NONE);
>>>>
>>>> It starts async writeback of the dirty pages. There's 256MB of dirty
>>>> pages on these inodes, and iomap tracing indicates the entire 256MB
>>>> immediately runs through the trace_iomap_writepage() tracepoint.
>>>> i.e. every page goes Dirty -> Writeback and is submitted for async
>>>> IO.
>>>>
>>>> Then the POSIX_FADV_DONTNEED code goes and runs
>>>> invalidate_mapping_pages(), which ends up try-locking each page and
>>>> then running invalidate_inode_page() on the page, which is where the
>>>> trace debug I put in on pages under writeback gets hit. So if
>>>> changing the invalidation code for pages under writeback makes the
>>>> problem go away, then stopping invalidate_mapping_pages() from
>>>> racing with page writeback should make the problem go away, too.
>>>>
>>>> This does indeed make the corruption go away:
>>>>
>>>> --- a/mm/fadvise.c
>>>> +++ b/mm/fadvise.c
>>>> @@ -109,9 +109,8 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
>>>>         case POSIX_FADV_NOREUSE:
>>>>                 break;
>>>>         case POSIX_FADV_DONTNEED:
>>>>                 if (!inode_write_congested(mapping->host))
>>>> -                       __filemap_fdatawrite_range(mapping, offset, endbyte,
>>>> -                                                  WB_SYNC_NONE);
>>>> +                       filemap_write_and_wait_range(mapping, offset, endbyte);
>>>>  
>>>>                 /*
>>>>                  * First and last FULL page! Partial pages are deliberately
>>>>
>>>> by making the invalidation wait for the pages to go fully to the
>>>> clean state before starting.
>>>>
>>>> This, however, only fixes the specific symptom being tripped over
>>>> here.  To further test this, I removed this writeback from
>>>> POSIX_FADV_DONTNEED completely so I could trigger writeback via
>>>> controlled background writeback. And, as I expected, whenever
>>>> background writeback ran to write back these dirty files, the
>>>> verification failures triggered again. It is quite reliable.
>>>>
>>>> So it looks like there is some kind of writeback completion vs page
>>>> invalidation race condition occurring, but more work is needed to
>>>> isolate it further. I don't know what part the async read plays in
>>>> the corruption yet, because I don't know how we are getting pages in
>>>> the cache where page->index != the file offset stamped in the data.
>>>> That smells of leaking PageUptodate flags...
>>>
>>> The async read really isn't doing anything that you could not do with
>>> separate threads. Unfortunately it's not that easy to have multiple
>>> threads working on the same region with fio, or we could've reproduced
>>> it with a job file written to use that instead.
>>>
>>> I'll dig a bit here...
>>
>> Have we verified that the actual page cache is inconsistent, or is that
>> just an assumption? I'm asking since I poked a bit on the fio side, and
>> suspiciously the failed verification was a short IO. At least
>> originally, fio will retry those, but it could be a bug in the io_uring
>> engine for fio.
>>
>> I'll poke some more.
> 
> The on-disk state seems sane. I added a hack that clears the rest of the
> buffer to 0x5a when we get a short read. When the verify fails, the io_u
> that was attempted verified looks like this:
> 
> 0000fe0 116b b418 d14b 0477 822d 6dcd 201d 1316
> 0000ff0 3045 eca3 0d1c 1a4f e608 0571 6b52 015e
> 0001000 5a5a 5a5a 5a5a 5a5a 5a5a 5a5a 5a5a 5a5a
> *
> 0010000
> 
> where fio dumps this as the expected data:
> 
> 0000fe0 116b b418 d14b 0477 822d 6dcd 201d 1316
> 0000ff0 3045 eca3 0d1c 1a4f e608 0571 6b52 015e
> 0001000 3cc1 4daa cab1 12ba c798 0b54 b281 0a05
> 0001010 98f3 bd9e 30a5 1728 531e 6b3a 2745 1877
> 
> Fio says the offset is 62652416 in the file, and reading
> that 64k block from the file and dumping it:
> 
> 0000fe0 116b b418 d14b 0477 822d 6dcd 201d 1316
> 0000ff0 3045 eca3 0d1c 1a4f e608 0571 6b52 015e
> 0001000 3cc1 4daa cab1 12ba c798 0b54 b281 0a05
> 0001010 98f3 bd9e 30a5 1728 531e 6b3a 2745 1877
> 
> So it seems to me like the file state is consistent, at least after the
> run, and that this seems more likely to be a fio issue with short
> read handling.
> 
> Poking along...

I've pushed some fixes for fio for this, it now works fine for me. JFYI.

-- 
Jens Axboe

next prev parent reply	other threads:[~2020-08-11  3:01 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-06  4:57 [Bug 208827] New: [fio io_uring] io_uring write data crc32c verify failed bugzilla-daemon
2020-08-07  2:42 ` Dave Chinner
2020-08-07  3:12 ` [Bug 208827] " bugzilla-daemon
2020-08-10  0:09   ` Dave Chinner
2020-08-10  3:56     ` Dave Chinner
2020-08-10  7:08       ` Dave Chinner
2020-08-10  9:08         ` Dave Chinner
2020-08-11  1:15           ` Jens Axboe
2020-08-11  1:50             ` Jens Axboe
2020-08-11  2:01               ` Jens Axboe
2020-08-11  3:01                 ` Jens Axboe [this message]
2020-08-11 20:56                 ` Jeff Moyer
2020-08-11 22:09                   ` Dave Chinner
2020-08-12 15:13                     ` Jens Axboe
2020-08-12 15:24                       ` Jeff Moyer
2020-08-12 15:26                         ` Jens Axboe
2020-08-11  2:00           ` Dave Chinner
2020-08-11  2:19             ` Jens Axboe
2020-08-11  5:53               ` Dave Chinner
2020-08-11  7:05               ` Dave Chinner
2020-08-11 13:10                 ` Jens Axboe
2020-08-11 21:59                   ` Dave Chinner
2020-08-11 23:00                     ` Dave Chinner
2020-08-12 15:19                       ` Jens Axboe
2020-08-11  1:07         ` Jens Axboe
2020-08-10  0:09 ` bugzilla-daemon
2020-08-10  3:56 ` bugzilla-daemon
2020-08-10  7:08 ` bugzilla-daemon
2020-08-10  9:09 ` bugzilla-daemon
2020-08-11  1:07 ` bugzilla-daemon
2020-08-11  1:15 ` bugzilla-daemon
2020-08-11  1:50 ` bugzilla-daemon
2020-08-11  2:00 ` bugzilla-daemon
2020-08-11  2:01 ` bugzilla-daemon
2020-08-11  2:20 ` bugzilla-daemon
2020-08-11  3:01 ` bugzilla-daemon
2020-08-11  5:53 ` bugzilla-daemon
2020-08-11  7:05 ` bugzilla-daemon
2020-08-11 13:10 ` bugzilla-daemon
2020-08-11 16:16 ` bugzilla-daemon
2020-08-11 20:56 ` bugzilla-daemon
2020-08-11 21:59 ` bugzilla-daemon
2020-08-11 22:09 ` bugzilla-daemon
2020-08-11 23:00 ` bugzilla-daemon
2020-08-12  3:15 ` bugzilla-daemon
2020-08-12 15:14 ` bugzilla-daemon
2020-08-12 15:19 ` bugzilla-daemon
2020-08-12 15:24 ` bugzilla-daemon
2020-08-12 15:26 ` bugzilla-daemon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a01b791f-c6e9-c554-5293-b4baff01e9fd@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=bugzilla-daemon@bugzilla.kernel.org \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox