From: Jacek Tomaka <Jacek.Tomaka@poczta.fm>
To: NeilBrown <neilb@suse.de>
Cc: "trond.myklebust@hammerspace.com"
<trond.myklebust@hammerspace.com>,
"anna.schumaker@netapp.com" <anna.schumaker@netapp.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: NFS data corruption on congested network
Date: Mon, 26 Feb 2024 12:58:16 +0100
Message-ID: <flfkkydzpicimncinmba@mlpw>
In-Reply-To: <170890314859.24797.16728369357798855399@noble.neil.brown.name>
Hi NeilBrown,
> though if your kernel is older than 6.3, that will be
> redirty_for_writepage(wbc, page);
Things are looking good. I have run it on 15 machines for a good couple of hours and I do not see the problem. Usually I would see it after 1-3 iterations, but now they are reaching 20 iterations without the problem.
Thank you for the fix.
Regards.
Jacek Tomaka
Subject: Re: NFS data corruption on congested network
Date: 2024-02-26 0:19
From: "NeilBrown" <neilb@suse.de>
To: "Jacek Tomaka" <Jacek.Tomaka@poczta.fm>;
Cc: trond.myklebust@hammerspace.com; anna.schumaker@netapp.com; linux-nfs@vger.kernel.org;
>
> On Mon, 26 Feb 2024, NeilBrown wrote:
>> On Fri, 23 Feb 2024, Jacek Tomaka wrote:
>>> Hello,
>>> I ran into an issue where an NFS file ends up being corrupted on
>>> disk. We started noticing it on certain, quite old hardware after upgrading
>>> the OS from CentOS 6 to Rocky 9.2. We do see it on Rocky 9.3 but not on 9.1.
>>>
>>> After some investigation we have reason to believe that the
>>> change was introduced by the following commit:
>>>
>>> https://github.com/torvalds/linux/commit/6df25e58532be7a4cd6fb15bcd85805947402d91
>>
>> Thanks for the report.
>> Can you try a change to your kernel?
>>
>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>> index bb79d3a886ae..08a787147bd2 100644
>> --- a/fs/nfs/write.c
>> +++ b/fs/nfs/write.c
>> @@ -668,8 +668,10 @@ static int nfs_writepage_locked(struct folio *folio,
>>  	int err;
>>
>>  	if (wbc->sync_mode == WB_SYNC_NONE &&
>> -	    NFS_SERVER(inode)->write_congested)
>> +	    NFS_SERVER(inode)->write_congested) {
>> +		folio_redirty_for_writepage(wbc, folio);
>>  		return AOP_WRITEPAGE_ACTIVATE;
>> +	}
>>
>>  	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE);
>>  	nfs_pageio_init_write(&pgio, inode, 0, false,
>
> Actually this is only needed before linux 6.8 as only nfs_writepage()
> can call nfs_writepage_locked() with sync_mode of WB_SYNC_NONE.
> So v5.18 through v6.7 might need fixing.
>
> NeilBrown
>
>
>>
>>
>> though if your kernel is older than 6.3, that will be
>> redirty_for_writepage(wbc, page);
>>
>> Thanks,
>> NeilBrown
>>
>>
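The two version remarks above (the fix matters for v5.18 through v6.7, and kernels older than 6.3 use the page-based helper) can be summarised in a small userspace sketch. This is only an illustration: `kver`, `needs_redirty_fix` and `redirty_helper` are hypothetical names, not kernel APIs, and distribution kernels with backports, such as the Rocky 9.x kernels in the report, may not map onto upstream version numbers.

```c
#include <stdbool.h>

/* Hypothetical helper: pack major.minor into one comparable integer. */
static int kver(int major, int minor)
{
	return major * 1000 + minor;
}

/* True for the upstream range named in the thread: v5.18 through v6.7. */
bool needs_redirty_fix(int major, int minor)
{
	int v = kver(major, minor);

	return v >= kver(5, 18) && v <= kver(6, 7);
}

/* Which redirty call the patch would use, per the thread (6.3 is the cut-over). */
const char *redirty_helper(int major, int minor)
{
	return kver(major, minor) < kver(6, 3)
		? "redirty_for_writepage(wbc, page)"
		: "folio_redirty_for_writepage(wbc, folio)";
}
```

For example, `needs_redirty_fix(6, 2)` is true and `redirty_helper(6, 2)` names the page-based call, while `needs_redirty_fix(6, 8)` is false.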
>>>
>>> We write a number of files on a single thread. Each file is up to
>>> 4GB. Before closing we call fdatasync. Sometimes the file ends up being
>>> corrupted. The corruption takes the form of a number (more than 3k pages in
>>> one case) of zero-filled pages.
>>> When this happens the file cannot be deleted from the client
>>> machine which created the file, even when the process which wrote the file
>>> completed successfully.
>>>
>>> The machines have about 128GB of memory, I think, and probably a
>>> network that leaves something to be desired.
>>>
>>> My reproducer is currently tied to our internal software, but I
>>> suspect that setting the write_congested flag randomly should make it
>>> possible to reproduce the issue.
>>>
>>> Regards.
>>> Jacek Tomaka
>>>
>>
>>
>>
>
>
>
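As a rough illustration of why the patch in this thread helps, here is a simplified userspace model, not kernel code. It assumes that writeback clears a page's dirty flag before calling the writepage routine, so a routine that declines to write because of congestion has to mark the page dirty again or the data is never sent. All names (`model_page`, `model_writepage`, `model_lost_pages`, `MODEL_WRITEPAGE_ACTIVATE`) are hypothetical stand-ins chosen for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

enum { NPAGES = 8 };
#define MODEL_WRITEPAGE_ACTIVATE 1 /* stand-in for AOP_WRITEPAGE_ACTIVATE */

struct model_page {
	bool dirty;     /* holds data not yet sent to the server */
	bool on_server; /* data has reached the server */
};

struct model_state {
	struct model_page pages[NPAGES];
	bool write_congested;
};

/* Stand-in for nfs_writepage_locked(): decline to write while congested. */
static int model_writepage(struct model_state *st, struct model_page *pg,
			   bool redirty_on_congestion)
{
	if (st->write_congested) {
		if (redirty_on_congestion)
			pg->dirty = true; /* the step the patch adds */
		return MODEL_WRITEPAGE_ACTIVATE;
	}
	pg->on_server = true;
	return 0;
}

/* One writeback pass: the dirty flag is cleared before writepage runs. */
static void model_writeback_pass(struct model_state *st, bool redirty)
{
	for (size_t i = 0; i < NPAGES; i++) {
		struct model_page *pg = &st->pages[i];

		if (!pg->dirty)
			continue;
		pg->dirty = false;
		model_writepage(st, pg, redirty);
	}
}

/* Count pages that never reach the server after a congested pass
 * followed by an uncongested one (for example, a later sync). */
size_t model_lost_pages(bool redirty)
{
	struct model_state st = { .write_congested = true };
	size_t lost = 0;

	for (size_t i = 0; i < NPAGES; i++)
		st.pages[i].dirty = true;

	model_writeback_pass(&st, redirty); /* congested: nothing is written */
	st.write_congested = false;
	model_writeback_pass(&st, redirty); /* only sees pages still dirty */

	for (size_t i = 0; i < NPAGES; i++)
		if (!st.pages[i].on_server)
			lost++;
	return lost;
}
```

In this model `model_lost_pages(false)` loses every page, which is consistent with the zero-filled pages described in the report, while `model_lost_pages(true)` loses none. Forcing `write_congested` on at random, as the report suggests for a reproducer, would exercise the same path.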
Thread overview: 6+ messages
2024-02-22 14:54 NFS data corruption on congested network Jacek Tomaka
2024-02-25 23:02 ` NeilBrown
2024-02-25 23:19 ` NeilBrown
2024-02-26 8:39 ` Cedric Blancher
2024-02-26 11:58 ` Jacek Tomaka [this message]
2024-02-27 22:59 ` NeilBrown