From: "Darrick J. Wong" <djwong@kernel.org>
To: Jose M Calhariz <jose.calhariz@tecnico.ulisboa.pt>
Cc: linux-xfs@vger.kernel.org
Subject: Re: Data corruption with XFS on Debian 11 and 12 under heavy load.
Date: Tue, 29 Aug 2023 16:41:36 -0700
Message-ID: <20230829234136.GF28186@frogsfrogsfrogs>
In-Reply-To: <ZO4nuHNg+KFzZ2Qz@calhariz.com>

On Tue, Aug 29, 2023 at 06:15:36PM +0100, Jose M Calhariz wrote:
>
> Hi,
>
> I have been chasing a data corruption problem under heavy load on 4
> servers that I have in my care. At first I suspected a hardware
> problem because it only happens with RAID 6 disks. So I reported it
> to Debian:
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1032391
>
> Further research pointed to XFS as the common factor, not a
> hardware issue. So I informally asked a friend at a software
> house that relies heavily on XFS for his thoughts on this issue. He
> referred to several problems fixed in kernel 6.2 and a
> discussion on this mailing list about backporting the fixes to the
> 6.1 kernel.
>
> With this information I tried the latest kernel available at that
> time from Debian testing on a Debian 12 system, and I could not
> reproduce the problem. So I filed another bug report:
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1040416
>
> My questions to this mailing list:
>
> - Has anyone experienced corruption under heavy load on XFS, under
> Debian or with vanilla kernels?
Yes. There was a rash of corruption problems that got fixed in 6.2:

https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/tag/?h=xfs-6.2-merge-8

My guess, with no other information, is either the write invalidation
problem in iomap or maybe COW extent allocations racing with the log.

Most of these haven't been backported to 6.1 because our only choices as
a community were (a) let a dumb bot shovel in patches with zero QA or
(b) try to scare up volunteers to backport things to LTS kernels. (a)
wasn't acceptable, but then with (b)...
> - Should I stop waiting for the fixes to be backported to vanilla
> 6.1 and run the latest kernel from Debian testing anyway? Note
> that kernels from testing receive security updates less promptly
> than stable kernels, especially for security issues with limited
> disclosure.
...there isn't really a designated 6.1 LTS backport engineer right now.
A couple folks from Cloudflare; Amir Goldstein; and Ted Ts'o have been
sharing the work when they have spare time.

--D
> I am happy to provide more info about my setup or my stability tests
> that fail under XFS.
>
>
> Kind regards
> Jose M Calhariz
>
> --
> --
> A false friend never curses you out
>
> A true friend has already called you every
> swear word there is - and even invented a few new ones
Thread overview (3+ messages):
2023-08-29 17:15 Data corruption with XFS on Debian 11 and 12 under heavy load Jose M Calhariz
2023-08-29 21:54 ` Dave Chinner
2023-08-29 23:41 ` Darrick J. Wong [this message]