From: Jens Axboe <axboe@kernel.dk>
To: Matthew Wilcox <willy@infradead.org>, Chris Mason <clm@meta.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Dave Chinner <david@fromorbit.com>,
Christian Theune <ct@flyingcircus.io>,
linux-mm@kvack.org,
"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
Daniel Dao <dqminh@cloudflare.com>,
regressions@lists.linux.dev, regressions@leemhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
Date: Wed, 18 Sep 2024 00:37:02 -0600
Message-ID: <5bee194c-9cd3-47e7-919b-9f352441f855@kernel.dk>
In-Reply-To: <ZumDPU7RDg5wV0Re@casper.infradead.org>

On 9/17/24 7:25 AM, Matthew Wilcox wrote:
> On Tue, Sep 17, 2024 at 01:13:05PM +0200, Chris Mason wrote:
>> On 9/17/24 5:32 AM, Matthew Wilcox wrote:
>>> On Mon, Sep 16, 2024 at 10:47:10AM +0200, Chris Mason wrote:
>>>> I've got a bunch of assertions around incorrect folio->mapping and I'm
>>>> trying to bash on the ENOMEM for readahead case. There's a GFP_NOWARN
>>>> on those, and our systems do run pretty short on ram, so it feels right
>>>> at least. We'll see.
>>>
>>> I've been running with some variant of this patch the whole way across
>>> the Atlantic, and not hit any problems. But maybe with the right
>>> workload ...?
>>>
>>> There are two things being tested here. One is whether we have a
>>> cross-linked node (ie a node that's in two trees at the same time).
>>> The other is whether the slab allocator is giving us a node that already
>>> contains non-NULL entries.
>>>
>>> If you could throw this on top of your kernel, we might stand a chance
>>> of catching the problem sooner. If it is one of these problems and not
>>> something weirder.
>>>
>>
>> This fires in roughly 10 seconds for me on top of v6.11. Since array seems
>> to always be 1, I'm not sure if the assertion is right, but hopefully you
>> can trigger yourself.
>
> Whoops.
>
> $ git grep XA_RCU_FREE
> lib/xarray.c:#define XA_RCU_FREE ((struct xarray *)1)
> lib/xarray.c: node->array = XA_RCU_FREE;
>
> so you walked into a node which is currently being freed by RCU. Which
> isn't a problem, of course. I don't know why I do that; it doesn't seem
> like anyone tests it. The jetlag is seriously kicking in right now,
> so I'm going to refrain from saying anything more because it probably
> won't be coherent.

Based on a modified reproducer from Chris (N threads reading from a
file, M threads dropping pages), I can pretty quickly reproduce the
xas_descend() spin on 6.9 in a vm with 128 cpus. Here's some debugging
output with a modified version of your patch too, that ignores
XA_RCU_FREE:

node ffff8e838a01f788 max 59 parent 0000000000000000 shift 0 count 0 values 0 array ffff8e839dfa86a0 list ffff8e838a01f7a0 ffff8e838a01f7a0 marks 0 0 0
WARNING: CPU: 106 PID: 1554 at lib/xarray.c:405 xas_alloc.cold+0x26/0x4b

which is:

XA_NODE_BUG_ON(node, memchr_inv(&node->slots, 0, sizeof(void *) * XA_CHUNK_SIZE));

and:

node ffff8e838a01f788 offset 59 parent ffff8e838b0419c8 shift 0 count 252 values 0 array ffff8e839dfa86a0 list ffff8e838a01f7a0 ffff8e838a01f7a0 marks 0 0 0

which is:

XA_NODE_BUG_ON(node, node->count > XA_CHUNK_SIZE);

and for this particular run, 2 threads spinning:

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-1 rcu_node (CPUs 16-31): P1555
rcu: Tasks blocked on level-1 rcu_node (CPUs 64-79): P1556
rcu: (detected by 97, t=2102 jiffies, g=7821, q=293800 ncpus=128)
task:reader state:R running task stack:0 pid:1555 tgid:1551 ppid:1 flags:0x00004006
Call Trace:
<TASK>
? __schedule+0x37f/0xaa0
? sysvec_apic_timer_interrupt+0x96/0xb0
? asm_sysvec_apic_timer_interrupt+0x16/0x20
? xas_load+0x74/0xe0
? xas_load+0x10/0xe0
? xas_find+0x162/0x1b0
? find_lock_entries+0x1ac/0x360
? find_lock_entries+0x76/0x360
? mapping_try_invalidate+0x5d/0x130
? generic_fadvise+0x110/0x240
? xfd_validate_state+0x1e/0x70
? ksys_fadvise64_64+0x50/0x90
? __x64_sys_fadvise64+0x18/0x20
? do_syscall_64+0x5d/0x180
? entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
task:reader state:R running task stack:0 pid:1556 tgid:1551 ppid:1 flags:0x00004006
The reproducer takes ~30 seconds, and will lead to anywhere from 1..N
threads spinning here.
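
For readers who want the rough shape of such a workload, here is a
minimal sketch. It is not the reproducer used in this thread. It assumes
the "dropping pages" threads call posix_fadvise(POSIX_FADV_DONTNEED) on
the whole file, which is what the fadvise frames in the trace above
suggest. run_repro(), struct repro and the 4 KiB read size are
hypothetical names and values chosen for illustration.

```c
/* Illustrative sketch only: N readers vs. M page-cache droppers. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

struct repro {                 /* hypothetical shared state */
	int fd;                /* file under test */
	off_t size;            /* file size in bytes */
	atomic_bool stop;      /* set to end the run */
	atomic_long reads;     /* pread() calls that did not fail */
	atomic_long drops;     /* posix_fadvise() calls that returned 0 */
};

/* Reader: pull random 4 KiB chunks through the page cache. */
static void *reader(void *arg)
{
	struct repro *r = arg;
	char buf[4096];
	unsigned int seed = (unsigned int)(uintptr_t)buf;
	off_t pages = r->size / 4096 + 1;

	while (!atomic_load(&r->stop)) {
		off_t off = (off_t)(rand_r(&seed) % pages) * 4096;

		if (pread(r->fd, buf, sizeof(buf), off) >= 0)
			atomic_fetch_add(&r->reads, 1);
	}
	return NULL;
}

/* Dropper: ask the kernel to drop the file's cached pages. */
static void *dropper(void *arg)
{
	struct repro *r = arg;

	while (!atomic_load(&r->stop)) {
		/* offset 0, len 0 covers the whole file */
		if (posix_fadvise(r->fd, 0, 0, POSIX_FADV_DONTNEED) == 0)
			atomic_fetch_add(&r->drops, 1);
	}
	return NULL;
}

/* Run both thread groups for 'seconds'; returns 0 on success, -1 on error. */
int run_repro(const char *path, int n_readers, int m_droppers,
	      unsigned int seconds, long *reads, long *drops)
{
	struct repro r = { .fd = -1 };
	struct stat st;
	int total = n_readers + m_droppers, started = 0;
	pthread_t *tids;

	assert(path && n_readers > 0 && m_droppers > 0);
	r.fd = open(path, O_RDONLY);
	if (r.fd < 0)
		return -1;
	tids = calloc((size_t)total, sizeof(*tids));
	if (!tids || fstat(r.fd, &st) < 0) {
		free(tids);
		close(r.fd);
		return -1;
	}
	r.size = st.st_size;

	for (int i = 0; i < total; i++) {
		void *(*fn)(void *) = i < n_readers ? reader : dropper;

		if (pthread_create(&tids[started], NULL, fn, &r) != 0)
			break;
		started++;
	}
	sleep(seconds);
	atomic_store(&r.stop, true);
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	if (reads)
		*reads = atomic_load(&r.reads);
	if (drops)
		*drops = atomic_load(&r.drops);
	free(tids);
	close(r.fd);
	return started == total ? 0 : -1;
}
```

Something like run_repro("/path/to/file", 64, 64, 30, &reads, &drops)
would approximate the ~30 second run described above. The path and
thread counts are placeholders, not values from the actual test.
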

Now for the kicker - this doesn't reproduce in 6.10 and onwards. There
are only a few changes here that are relevant, seemingly, and the prime
candidates are:

commit a4864671ca0bf51c8e78242951741df52c06766f
Author: Kairui Song <kasong@tencent.com>
Date:   Tue Apr 16 01:18:55 2024 +0800

    lib/xarray: introduce a new helper xas_get_order

and the followup filemap change:

commit 6758c1128ceb45d1a35298912b974eb4895b7dd9
Author: Kairui Song <kasong@tencent.com>
Date:   Tue Apr 16 01:18:56 2024 +0800

    mm/filemap: optimize filemap folio adding

and reverting those two on 6.10 hits it again almost immediately. Didn't
look into these commits, but it looks like they inadvertently also fixed
this corruption issue.

--
Jens Axboe