From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.0 required=3.0 tests=BAYES_00,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0ECE0C433DF for ; Tue, 11 Aug 2020 01:50:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DAD9520716 for ; Tue, 11 Aug 2020 01:50:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727985AbgHKBuV convert rfc822-to-8bit (ORCPT ); Mon, 10 Aug 2020 21:50:21 -0400 Received: from mail.kernel.org ([198.145.29.99]:33046 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728160AbgHKBuV (ORCPT ); Mon, 10 Aug 2020 21:50:21 -0400 From: bugzilla-daemon@bugzilla.kernel.org To: linux-xfs@vger.kernel.org Subject: [Bug 208827] [fio io_uring] io_uring write data crc32c verify failed Date: Tue, 11 Aug 2020 01:50:19 +0000 X-Bugzilla-Reason: None X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: AssignedTo filesystem_xfs@kernel-bugs.kernel.org X-Bugzilla-Product: File System X-Bugzilla-Component: XFS X-Bugzilla-Version: 2.5 X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: axboe@kernel.dk X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P1 X-Bugzilla-Assigned-To: filesystem_xfs@kernel-bugs.kernel.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8BIT X-Bugzilla-URL: https://bugzilla.kernel.org/ Auto-Submitted: auto-generated MIME-Version: 1.0 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org https://bugzilla.kernel.org/show_bug.cgi?id=208827 --- Comment #8 from Jens Axboe (axboe@kernel.dk) --- On 8/10/20 7:15 PM, Jens Axboe wrote: > On 8/10/20 3:08 AM, Dave Chinner wrote: >> On Mon, Aug 10, 2020 at 05:08:07PM +1000, Dave Chinner wrote: >>> [cc Jens] >>> >>> [Jens, data corruption w/ io_uring and simple fio reproducer. see >>> the bz link below.] >>> >>> On Mon, Aug 10, 2020 at 01:56:05PM +1000, Dave Chinner wrote: >>>> On Mon, Aug 10, 2020 at 10:09:32AM +1000, Dave Chinner wrote: >>>>> On Fri, Aug 07, 2020 at 03:12:03AM +0000, >>>>> bugzilla-daemon@bugzilla.kernel.org wrote: >>>>>> --- Comment #1 from Dave Chinner (david@fromorbit.com) --- >>>>>> On Thu, Aug 06, 2020 at 04:57:58AM +0000, >>>>>> bugzilla-daemon@bugzilla.kernel.org >>>>>> wrote: >>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=208827 >>>>>>> >>>>>>> Bug ID: 208827 >>>>>>> Summary: [fio io_uring] io_uring write data crc32c verify >>>>>>> failed >>>>>>> Product: File System >>>>>>> Version: 2.5 >>>>>>> Kernel Version: xfs-linux xfs-5.9-merge-7 + v5.8-rc4 >>>>> >>>>> FWIW, I can reproduce this with a vanilla 5.8 release kernel, >>>>> so this isn't related to contents of the XFS dev tree at all... >>>>> >>>>> In fact, this bug isn't a recent regression. AFAICT, it was >>>>> introduced between in 5.4 and 5.5 - 5.4 did not reproduce, 5.5 did >>>>> reproduce. More info once I've finished bisecting it.... >>>> >>>> f67676d160c6ee2ed82917fadfed6d29cab8237c is the first bad commit >>>> commit f67676d160c6ee2ed82917fadfed6d29cab8237c >>>> Author: Jens Axboe >>>> Date: Mon Dec 2 11:03:47 2019 -0700 >>>> >>>> io_uring: ensure async punted read/write requests copy iovec >>> >>> .... >>> >>> Ok, I went back to vanilla 5.8 to continue debugging and adding >>> tracepoints, and it's proving strangely difficult to reproduce now. >> >> Which turns out to be caused by a tracepoint I inserted to try to >> narrow down if this was an invalidation race. I put this in >> invalidate_complete_page: >> >> >> --- a/mm/truncate.c >> +++ b/mm/truncate.c >> @@ -257,8 +257,11 @@ int invalidate_inode_page(struct page *page) >> struct address_space *mapping = page_mapping(page); >> if (!mapping) >> return 0; >> - if (PageDirty(page) || PageWriteback(page)) >> + if (PageDirty(page) || PageWriteback(page)) { >> + trace_printk("ino 0x%lx page %p, offset 0x%lx\n", >> + mapping->host->i_ino, page, page->index * >> PAGE_SIZE); >> return 0; >> + } >> if (page_mapped(page)) >> return 0; >> return invalidate_complete_page(mapping, page); >> >> >> And that alone, without even enabling tracepoints, made the >> corruption go completely away. So I suspect a page state race >> condition and look at POSIX_FADV_DONTNEED, which fio is issuing >> before running it's verification reads. First thing that does: >> >> if (!inode_write_congested(mapping->host)) >> __filemap_fdatawrite_range(mapping, offset, endbyte, >> WB_SYNC_NONE); >> >> It starts async writeback of the dirty pages. There's 256MB of dirty >> pages on these inodes, and iomap tracing indicates the entire 256MB >> immediately runs through the trace_iomap_writepage() tracepoint. >> i.e. every page goes Dirty -> Writeback and is submitted for async >> IO. >> >> Then the POSIX_FADV_DONTNEED code goes and runs >> invalidate_mapping_pages(), which ends up try-locking each page and >> then running invalidate_inode_page() on the page, which is where the >> trace debug I put in on pages under writeback gets hit. So if >> changing the invalidation code for pages under writeback makes the >> problem go away, then stopping invalidate_mapping_pages() from >> racing with page writeback should make the problem go away, too. >> >> This does indeed make the corruption go away: >> >> --- a/mm/fadvise.c >> +++ b/mm/fadvise.c >> @@ -109,9 +109,8 @@ int generic_fadvise(struct file *file, loff_t offset, >> loff_t len, int advice) >> case POSIX_FADV_NOREUSE: >> break; >> case POSIX_FADV_DONTNEED: >> if (!inode_write_congested(mapping->host)) >> - __filemap_fdatawrite_range(mapping, offset, endbyte, >> - WB_SYNC_NONE); >> + filemap_write_and_wait_range(mapping, offset, >> endbyte); >> >> /* >> * First and last FULL page! Partial pages are deliberately >> >> by making the invalidation wait for the pages to go fully to the >> clean state before starting. >> >> This, however, only fixes the specific symptom being tripped over >> here. To further test this, I removed this writeback from >> POSIX_FADV_DONTNEED completely so I could trigger writeback via >> controlled background writeback. And, as I expected, whenever >> background writeback ran to write back these dirty files, the >> verification failures triggered again. It is quite reliable. >> >> So it looks like there is some kind of writeback completion vs page >> invalidation race condition occurring, but more work is needed to >> isolate it further. I don't know what part the async read plays in >> the corruption yet, because I don't know how we are getting pages in >> the cache where page->index != the file offset stamped in the data. >> That smells of leaking PageUptodate flags... > > The async read really isn't doing anything that you could not do with > separate threads. Unfortunately it's not that easy to have multiple > threads working on the same region with fio, or we could've reproduced > it with a job file written to use that instead. > > I'll dig a bit here... Have we verified that the actual page cache is inconsistent, or is that just an assumption? I'm asking since I poked a bit on the fio side, and suspiciously the failed verification was a short IO. At least originally, fio will retry those, but it could be a bug in the io_uring engine for fio. I'll poke some more. -- You are receiving this mail because: You are watching the assignee of the bug.