From: Jacek Tomaka <Jacek.Tomaka@poczta.fm>
To: NeilBrown <neilb@suse.de>
Cc: "trond.myklebust@hammerspace.com"
<trond.myklebust@hammerspace.com>,
"anna.schumaker@netapp.com" <anna.schumaker@netapp.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: NFS data corruption on congested network
Date: Mon, 26 Feb 2024 12:58:16 +0100
Message-ID: <flfkkydzpicimncinmba@mlpw>
In-Reply-To: <170890314859.24797.16728369357798855399@noble.neil.brown.name>
Hi NeilBrown,
> though if your kernel is older than 6.3, that will be
> redirty_for_writepage(wbc, page);
Things are looking good. I have run it on 15 machines for a good couple of hours and I do not see the problem. Usually I would see it after 1-3 iterations, but now they are reaching 20 iterations without the problem.
Thank you for the fix.
Regards.
Jacek Tomaka
Subject: Re: NFS data corruption on congested network
Date: 2024-02-26 0:19
From: "NeilBrown" <neilb@suse.de>
To: "Jacek Tomaka" <Jacek.Tomaka@poczta.fm>;
Cc: trond.myklebust@hammerspace.com; anna.schumaker@netapp.com; linux-nfs@vger.kernel.org;
>
>> On Mon, 26 Feb 2024, NeilBrown wrote:
>> On Fri, 23 Feb 2024, Jacek Tomaka wrote:
>>> Hello,
>>> I ran into an issue where an NFS file ends up corrupted on disk.
>>> We started noticing it on certain, quite old hardware after upgrading
>>> the OS from CentOS 6 to Rocky 9.2. We do see it on Rocky 9.3 but not on 9.1.
>>>
>>> After some investigation, we have reason to believe that the
>>> regression was introduced by the following commit:
>>>
>>> https://github.com/torvalds/linux/commit/6df25e58532be7a4cd6fb15bcd85805947402d91
>>
>> Thanks for the report.
>> Can you try a change to your kernel?
>>
>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>> index bb79d3a886ae..08a787147bd2 100644
>> --- a/fs/nfs/write.c
>> +++ b/fs/nfs/write.c
>> @@ -668,8 +668,10 @@ static int nfs_writepage_locked(struct folio *folio,
>>  	int err;
>>
>>  	if (wbc->sync_mode == WB_SYNC_NONE &&
>> -	    NFS_SERVER(inode)->write_congested)
>> +	    NFS_SERVER(inode)->write_congested) {
>> +		folio_redirty_for_writepage(wbc, folio);
>>  		return AOP_WRITEPAGE_ACTIVATE;
>> +	}
>>
>>  	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE);
>>  	nfs_pageio_init_write(&pgio, inode, 0, false,
>
> Actually, this is only needed before Linux 6.8, as only nfs_writepage()
> can call nfs_writepage_locked() with a sync_mode of WB_SYNC_NONE.
> So v5.18 through v6.7 might need fixing.
>
> NeilBrown
>
>
>>
>>
>> though if your kernel is older than 6.3, that will be
>> redirty_for_writepage(wbc, page);
>>
>> Thanks,
>> NeilBrown
>>
>>
>>>
>>> We write a number of files on a single thread. Each file is up to
>>> 4GB. Before closing each file, we call fdatasync. Sometimes the file
>>> ends up corrupted. The corruption takes the form of a number of
>>> zero-filled pages (more than 3k pages in one case).
>>> When this happens, the file cannot be deleted from the client
>>> machine that created it, even when the process that wrote the file
>>> completed successfully.
>>>
>>> The machines have about 128GB of memory, I think, and probably a
>>> network that leaves something to be desired.
>>>
>>> My reproducer is currently tied to our internal software, but I
>>> suspect that setting the write_congested flag randomly would be enough
>>> to reproduce the issue.
>>>
>>> Regards.
>>> Jacek Tomaka
>>>
>>
>>
>>
>
>
>
Thread overview: 6+ messages
2024-02-22 14:54 NFS data corruption on congested network Jacek Tomaka
2024-02-25 23:02 ` NeilBrown
2024-02-25 23:19 ` NeilBrown
2024-02-26 8:39 ` Cedric Blancher
2024-02-26 11:58 ` Jacek Tomaka [this message]
2024-02-27 22:59 ` NeilBrown