From: Carlos Carvalho <carlos@fisica.ufpr.br>
To: linux-ext4@vger.kernel.org, linux-raid@vger.kernel.org
Subject: parity raid and ext4 get stuck in writes
Date: Fri, 22 Dec 2023 17:48:01 -0300
Message-ID: <ZYX2AS8isUHtbMXe@fisica.ufpr.br>

This is, finally, a summary of a long-standing problem. When many writes to
many files are issued in a short time, the kernel gets stuck and stops sending
write requests to the disks. Sometimes it recovers and eventually writes the
modified pages to permanent storage; sometimes it does not, and eventually
other functions degrade and the machine crashes.

A simple way to reproduce it is to unpack a kernel source tree, for example:

  xzcat linux-6.5.tar.xz | tar x -f -

With the default vm settings for dirty_background_ratio and dirty_ratio, this
finishes quickly, leaving ~1.5GB of dirty pages in RAM and ~100k inodes to be
written, and then the kernel gets stuck.
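
One way to watch this happen is to sample the Dirty and Writeback counters
in /proc/meminfo while the tarball unpacks. The following is only a minimal
sketch: the function name dirty_kb and its optional file argument are made up
for illustration and are not part of the original report.

```shell
#!/bin/sh
# dirty_kb: print the Dirty and Writeback counters (in kB) from a
# meminfo-style file. Defaults to /proc/meminfo; the function name and
# the optional file argument are illustrative, not from the report.
dirty_kb() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { printf "%s %s kB\n", $1, $2 }' \
        "${1:-/proc/meminfo}"
}

# Sample once per second while the reproducer runs, e.g.:
#   while sleep 1; do dirty_kb; done
```

If Dirty stays high and Writeback stays near zero after tar has exited, that
would be consistent with the stall described above.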

The bug exists in all the 6.* kernels I have tested: the latest release of
each of 6.[1-6]. However, some conditions must hold for the problem to appear:

- there must be many inodes to be flushed; many bytes in just a few files do
  not trigger the problem
- it happens only with ext4 on a parity raid array

I've moved one of our arrays to xfs and everything works fine, so the problem
is either specific to ext4 or at least does not affect xfs. When the lockup
happens, the flush kworker starts using 100% cpu permanently. I have not
observed the bug in raid10, only in raid[56].

The problem is more easily triggered with 6.[56], but 6.1 is also affected.

Limiting dirty_bytes and dirty_background_bytes to low values reduces the
probability of lockup, probably because the process generating writes is
stopped before too many files are created.
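
For anyone wanting to try this mitigation, here is a hedged sketch that only
prints the sysctl commands rather than applying them. The helper name
print_dirty_limits and the 256 MiB / 64 MiB values are placeholders chosen
for illustration; the limits actually tested are not stated above.

```shell
#!/bin/sh
# print_dirty_limits: emit sysctl commands that cap dirty memory.
# Arguments are in MiB. The helper name and the values used below are
# placeholders; the report does not say which limits were tested.
print_dirty_limits() {
    dirty_mib=$1
    background_mib=$2
    echo "sysctl -w vm.dirty_bytes=$((dirty_mib * 1024 * 1024))"
    echo "sysctl -w vm.dirty_background_bytes=$((background_mib * 1024 * 1024))"
}

# Review the output, then run the printed commands as root to apply them.
print_dirty_limits 256 64
```

Note that writing vm.dirty_bytes makes vm.dirty_ratio read back as 0 (and
likewise for the background pair), since only one of each pair is in effect
at a time.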

