public inbox for linux-raid@vger.kernel.org
* parity raid and ext4 get stuck in writes
@ 2023-12-22 20:48 Carlos Carvalho
  2023-12-22 23:00 ` eyal
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Carlos Carvalho @ 2023-12-22 20:48 UTC (permalink / raw)
  To: linux-ext4, linux-raid

This is, at last, a summary of a long-standing problem. When lots of writes to
many files are issued in a short time, the kernel gets stuck and stops sending
write requests to the disks. Sometimes it recovers and eventually writes the
modified pages to permanent storage; sometimes it does not, and eventually
other functions degrade and the machine crashes.

A simple way to reproduce: expand a kernel source tree, like
xzcat linux-6.5.tar.xz | tar x -f -

With the default vm settings for dirty_background_ratio and dirty_ratio, the
extraction finishes quickly, leaving ~1.5GB of dirty pages in RAM and ~100k
inodes to be written, and then the kernel gets stuck.
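
To watch the backlog while the extraction runs, the Dirty and Writeback
counters in /proc/meminfo can be polled. This is only a minimal sketch: the
function name dirty_status and the 5-second interval are illustrative
assumptions, not something used in the tests reported here.

```shell
# Print the Dirty and Writeback lines from a meminfo-style file.
# Defaults to /proc/meminfo; another path can be given as the first argument.
# (dirty_status is a hypothetical helper name.)
dirty_status() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2, $3 }' \
        "${1:-/proc/meminfo}"
}

# Example: poll every 5 seconds (arbitrary interval) during the extraction.
# while sleep 5; do dirty_status; done
```

If the Dirty value stays high and Writeback stays near zero long after tar has
exited, that would match the stall described above.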

The bug exists in all 6.* kernels; I've tested the latest release of each of
6.[1-6]. However, some conditions must exist for the problem to appear:

- there must be many inodes to be flushed; writing many bytes to just a few
  files doesn't show the problem
- it happens only with ext4 on a parity raid array

I've moved one of our arrays to xfs and everything works fine, so the problem
is either specific to ext4 or at least does not affect xfs. When the lockup
happens, the flush kworker starts using 100% CPU permanently. I have not
observed the bug in raid10, only in raid[56].

The problem is more easily triggered with 6.[56] but 6.1 is also affected.

Limiting dirty_bytes and dirty_background_bytes to low values reduces the
probability of a lockup, probably because the process generating writes is
stopped before too many files are created.
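
For reference, a sketch of how these two limits can be set. The helper name
set_dirty_limits and the byte values in the example are placeholders chosen
for illustration only; the values actually used are not stated here. Note that
writing dirty_bytes makes the kernel ignore dirty_ratio (and likewise for the
background pair).

```shell
# Set the writeback thresholds, in bytes. Needs root on a real system.
# $1 = dirty_bytes (writers are throttled above this)
# $2 = dirty_background_bytes (background writeback starts above this)
# $3 = sysctl directory; defaults to /proc/sys/vm and exists only so the
#      function can be tried against a scratch directory.
# (set_dirty_limits is a hypothetical helper name.)
set_dirty_limits() {
    dir="${3:-/proc/sys/vm}"
    echo "$2" > "$dir/dirty_background_bytes" &&
    echo "$1" > "$dir/dirty_bytes"
}

# Example with placeholder values (256 MiB and 64 MiB):
# set_dirty_limits $((256 * 1024 * 1024)) $((64 * 1024 * 1024))
```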


Thread overview: 7+ messages
2023-12-22 20:48 parity raid and ext4 get stuck in writes Carlos Carvalho
2023-12-22 23:00 ` eyal
2023-12-25  7:39 ` Daniel Dawson
2023-12-25 10:15   ` Peter Grandi
2023-12-25 13:38     ` Carlos Carvalho
2024-01-04  6:11   ` Ojaswin Mujoo
2024-01-04  6:08 ` Ojaswin Mujoo