* parity raid and ext4 get stuck in writes
@ 2023-12-22 20:48 Carlos Carvalho
2023-12-22 23:00 ` eyal
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Carlos Carvalho @ 2023-12-22 20:48 UTC (permalink / raw)
To: linux-ext4, linux-raid
This is finally a summary of a long standing problem. When lots of writes to
many files are sent in a short time the kernel gets stuck and stops sending
write requests to the disks. Sometimes it recovers and finally sends the
modified pages to permanent storage, sometimes not and eventually other
functions degrade and the machine crashes.
A simple way to reproduce: expand a kernel source tree, like
xzcat linux-6.5.tar.xz | tar x -f -
With the default vm settings for dirty_background_ratio and dirty_ratio this
will finish quickly with ~1.5GB of dirty pages in ram and ~100k inodes to be
written and the kernel gets stuck.
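To see why the whole extraction fits in the page cache under the defaults, here is a sketch that converts the percentage knobs into absolute bytes. The 16 GiB RAM figure and the default ratio values (10/20) are assumptions for illustration, not measurements from this report:

```shell
#!/bin/sh
# Sketch: convert vm.dirty_background_ratio / vm.dirty_ratio percentages
# into absolute byte thresholds. RAM size and ratios are assumed defaults.
mem_bytes=$((16 * 1024 * 1024 * 1024))  # assumed machine RAM: 16 GiB
bg_ratio=10                             # default dirty_background_ratio
ratio=20                                # default dirty_ratio
echo "background writeback starts at $((mem_bytes * bg_ratio / 100)) bytes"
echo "writers are throttled at       $((mem_bytes * ratio / 100)) bytes"
# ~1.5 GB of dirty pages is well under the ~3.4 GB throttle point, so the
# tar extraction completes into the page cache before any throttling.
```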
The bug exists in all 6.* kernels; I've tested the latest release of all
6.[1-6]. However some conditions must exist for the problem to appear:
- there must be many inodes to be flushed; just many bytes in a few files don't
show the problem
- it happens only with ext4 on a parity raid array
I've moved one of our arrays to xfs and everything works fine, so it's either
specific to ext4 or xfs is not affected. When the lockup happens the flush
kworker starts using 100% cpu permanently. I have not observed the bug in
raid10, only in raid[56].
The problem is more easily triggered with 6.[56] but 6.1 is also affected.
Limiting dirty_bytes and dirty_background_bytes to low values reduces the
probability of lockup, probably because the process generating writes is
stopped before too many files are created.
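A sketch of that mitigation; the byte limits below are hypothetical examples, not the values used on the affected server:

```shell
#!/bin/sh
# Sketch of the dirty-page cap described above. Values are hypothetical.
bg_bytes=$((256 * 1024 * 1024))    # 256 MiB: start background writeback early
max_bytes=$((1024 * 1024 * 1024))  # 1 GiB: throttle writers at this point
# Note: setting the *_bytes knobs overrides the corresponding *_ratio knobs.
echo "vm.dirty_background_bytes = $bg_bytes"
echo "vm.dirty_bytes = $max_bytes"
# Apply as root with e.g.: sysctl -w vm.dirty_bytes=$max_bytes
```

Lower limits throttle the writing process sooner, trading burst throughput for a smaller writeback backlog, which matches the observation that lockups become less likely.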
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: parity raid and ext4 get stuck in writes
2023-12-22 20:48 parity raid and ext4 get stuck in writes Carlos Carvalho
@ 2023-12-22 23:00 ` eyal
2023-12-25 7:39 ` Daniel Dawson
2024-01-04 6:08 ` Ojaswin Mujoo
2 siblings, 0 replies; 7+ messages in thread
From: eyal @ 2023-12-22 23:00 UTC (permalink / raw)
To: linux-raid, linux-ext4, carlos
On 23/12/23 07:48, Carlos Carvalho wrote:
> This is finally a summary of a long standing problem. When lots of writes to
> many files are sent in a short time the kernel gets stuck and stops sending
> write requests to the disks. Sometimes it recovers and finally sends the
> modified pages to permanent storage, sometimes not and eventually other
> functions degrade and the machine crashes.
>
> A simple way to reproduce: expand a kernel source tree, like
> xzcat linux-6.5.tar.xz | tar x -f -
>
> With the default vm settings for dirty_background_ratio and dirty_ratio this
> will finish quickly with ~1.5GB of dirty pages in ram and ~100k inodes to be
> written and the kernel gets stuck.
>
> The bug exists in all 6.* kernels; I've tested the latest release of all
> 6.[1-6]. However some conditions must exist for the problem to appear:
>
> - there must be many inodes to be flushed; just many bytes in a few files don't
> show the problem
> - it happens only with ext4 on a parity raid array
This may be unrelated but there is an open problem that looks somewhat similar.
It is tracked at
https://bugzilla.kernel.org/show_bug.cgi?id=217965
If your fs is mounted with a non-zero 'stripe=' (as RAID arrays usually are),
try to get around the issue with
$ sudo mount -o remount,stripe=0 YourFS
If it makes a difference then you may be looking at a similar issue.
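One way to check whether a filesystem currently has a non-zero stripe. The options string below is a made-up stand-in; on a live system read the real one from /proc/mounts:

```shell
#!/bin/sh
# Sketch: extract the stripe= value from an ext4 mount-options string.
# Example string only; on a real system use something like:
#   opts=$(awk '$2 == "/YourFS" {print $4}' /proc/mounts)
opts="rw,relatime,stripe=256,data=ordered"
stripe=$(echo "$opts" | tr ',' '\n' | sed -n 's/^stripe=//p')
echo "stripe=${stripe:-0}"   # non-zero means the workaround may apply
```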
> I've moved one of our arrays to xfs and everything works fine, so it's either
> specific to ext4 or xfs is not affected. When the lockup happens the flush
> kworker starts using 100% cpu permanently. I have not observed the bug in
> raid10, only in raid[56].
>
> The problem is more easily triggered with 6.[56] but 6.1 is also affected.
The issue was seen in kernels 6.5 and later but not in 6.4, so maybe not the same thing.
> Limiting dirty_bytes and dirty_background_bytes to low values reduce the
> probability of lockup, probably because the process generating writes is
> stopped before too many files are created.
HTH
--
Eyal at Home (eyal@eyal.emu.id.au)
* Re: parity raid and ext4 get stuck in writes
2023-12-22 20:48 parity raid and ext4 get stuck in writes Carlos Carvalho
2023-12-22 23:00 ` eyal
@ 2023-12-25 7:39 ` Daniel Dawson
2023-12-25 10:15 ` Peter Grandi
2024-01-04 6:11 ` Ojaswin Mujoo
2024-01-04 6:08 ` Ojaswin Mujoo
2 siblings, 2 replies; 7+ messages in thread
From: Daniel Dawson @ 2023-12-25 7:39 UTC (permalink / raw)
To: Carlos Carvalho, linux-ext4, linux-raid
On 12/22/23 12:48 PM, Carlos Carvalho wrote:
> This is finally a summary of a long standing problem. When lots of writes to
> many files are sent in a short time the kernel gets stuck and stops sending
> write requests to the disks. Sometimes it recovers and finally sends the
> modified pages to permanent storage, sometimes not and eventually other
> functions degrade and the machine crashes.
>
> A simple way to reproduce: expand a kernel source tree, like
> xzcat linux-6.5.tar.xz | tar x -f -
This sounds almost exactly like a problem I was having, right down to
triggering it by writing the files of a kernel tree, though the details
in my case are slightly different. I meant to report it, but I never
managed to get a good enough handle on it, and now I've changed my
setup such that it doesn't happen anymore.
> - it happens only with ext4 on a parity raid array
This is where it differs for me. I experienced it only with btrfs. But I
had two arrays with it, one on SSDs and one on HDDs. The HDD array
exhibited the problem almost exclusively (the SSDs, I think, exhibited
it once in several months, while the HDDs did pretty much every time I
tried to compile a new kernel (until I started working around it), and
even from some other things, which was a couple of times a week). I
imagine that's because HDDs are much slower and therefore allow more data
to get cached.
Now that I've switched the HDD array to ext4, I haven't experienced the
issue even once. But the new setup also has better performance, so maybe
it's just because it flushes its writes faster.
--
PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A
* Re: parity raid and ext4 get stuck in writes
2023-12-25 7:39 ` Daniel Dawson
@ 2023-12-25 10:15 ` Peter Grandi
2023-12-25 13:38 ` Carlos Carvalho
2024-01-04 6:11 ` Ojaswin Mujoo
1 sibling, 1 reply; 7+ messages in thread
From: Peter Grandi @ 2023-12-25 10:15 UTC (permalink / raw)
To: list Linux RAID
>> [...] a long standing problem. When lots of writes to many
>> files are sent in a short time the kernel gets stuck and
>> stops sending write requests to the disks. [...] A simple way
>> to reproduce: expand a kernel source tree, like xzcat
>> linux-6.5.tar.xz | tar x -f -
That is a well known (ideally...) consequence of misconfiguring
both physical storage and the Linux flusher cache so there is a
high chance of post-saturation congestion under load.
https://www.sabi.co.uk/blog/anno05-4th.html?051105#051105
> [...] I had two arrays with it, one on SSDs and one on HDDs.
> The HDD array exhibited the problem almost exclusively [...]
If an HDD set is misconfigured so that it reaches post-saturation
congestion of IOPS much sooner, then problems with that and the
consequences of flusher cache misconfiguration will happen much more
frequently. Usually, to sustain the same number of workload IOPS, one
needs at least 10 times more HDDs than SSDs.
* Re: parity raid and ext4 get stuck in writes
2023-12-25 10:15 ` Peter Grandi
@ 2023-12-25 13:38 ` Carlos Carvalho
0 siblings, 0 replies; 7+ messages in thread
From: Carlos Carvalho @ 2023-12-25 13:38 UTC (permalink / raw)
To: list Linux RAID
Peter Grandi (pg@mdraid.list.sabi.co.UK) wrote on Mon, Dec 25, 2023 at 07:15:16AM -03:
> >> [...] a long standing problem. When lots of writes to many
> >> files are sent in a short time the kernel gets stuck and
> >> stops sending write requests to the disks. [...] A simple way
> >> to reproduce: expand a kernel source tree, like xzcat
> >> linux-6.5.tar.xz | tar x -f -
>
> That is a well known (ideally...) consequence of misconfiguring
> both physical storage and the Linux flusher cache so there is a
> high chance of post-saturation congestion under load.
>
> https://www.sabi.co.uk/blog/anno05-4th.html?051105#051105
No.
It's not a configuration problem, it's a kernel bug. Of course we can reduce
the number and size of dirty pages, as I mentioned myself in the post, but the
bug continues to exist. I even did it to keep a critical server alive. It is a
nuisance though because bursts of disk writes take much longer to complete.
Even with dirty pages restrained, after about 7-10 days that critical
machine still gets stuck and needs a reboot... As time goes by the machine
becomes more susceptible to the bug. Maybe because of memory fragmentation?
This is only a "wild guess"; I have no idea if it makes sense or agrees
with Ojaswin's findings.
* Re: parity raid and ext4 get stuck in writes
2023-12-22 20:48 parity raid and ext4 get stuck in writes Carlos Carvalho
2023-12-22 23:00 ` eyal
2023-12-25 7:39 ` Daniel Dawson
@ 2024-01-04 6:08 ` Ojaswin Mujoo
2 siblings, 0 replies; 7+ messages in thread
From: Ojaswin Mujoo @ 2024-01-04 6:08 UTC (permalink / raw)
To: Carlos Carvalho; +Cc: linux-ext4, linux-raid
On Fri, Dec 22, 2023 at 05:48:01PM -0300, Carlos Carvalho wrote:
> This is finally a summary of a long standing problem. When lots of writes to
> many files are sent in a short time the kernel gets stuck and stops sending
> write requests to the disks. Sometimes it recovers and finally sends the
> modified pages to permanent storage, sometimes not and eventually other
> functions degrade and the machine crashes.
>
> A simple way to reproduce: expand a kernel source tree, like
> xzcat linux-6.5.tar.xz | tar x -f -
>
> With the default vm settings for dirty_background_ratio and dirty_ratio this
> will finish quickly with ~1.5GB of dirty pages in ram and ~100k inodes to be
> written and the kernel gets stuck.
>
> The bug exists in all 6.* kernels; I've tested the latest release of all
> 6.[1-6]. However some conditions must exist for the problem to appear:
>
> - there must be many inodes to be flushed; just many bytes in a few files don't
> show the problem
> - it happens only with ext4 on a parity raid array
>
> I've moved one of our arrays to xfs and everything works fine, so it's either
> specific to ext4 or xfs is not affected. When the lockup happens the flush
> kworker starts using 100% cpu permanently. I have not observed the bug in
> raid10, only in raid[56].
>
> The problem is more easily triggered with 6.[56] but 6.1 is also affected.
>
> Limiting dirty_bytes and dirty_background_bytes to low values reduce the
> probability of lockup, probably because the process generating writes is
> stopped before too many files are created.
Hey Carlos,
Thanks for sharing this. So as per your comment on the kernel bugzilla,
it seems like the issue gets fixed for you with stripe=0 as well, so it
might actually be the same issue. However, most of the people there are
not able to replicate this in kernels before 6.5, so I'm interested in
your statement that you see this in 6.1 as well.
Would it be possible to replicate this on 6.1 or any pre-6.5 kernel with
some perf probes and share the report? I've added the steps to add the
probes on pre-6.4 kernels here [1] (although they should hopefully work
with 6.1 - 6.3 as well, since I don't think there'll be much change in
the functions probed there). The probes would be helpful to confirm
whether the issue we see in 6.5+ kernels and the one you are seeing in
6.1 are the same.
Thanks,
ojaswin
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217965#c36
* Re: parity raid and ext4 get stuck in writes
2023-12-25 7:39 ` Daniel Dawson
2023-12-25 10:15 ` Peter Grandi
@ 2024-01-04 6:11 ` Ojaswin Mujoo
1 sibling, 0 replies; 7+ messages in thread
From: Ojaswin Mujoo @ 2024-01-04 6:11 UTC (permalink / raw)
To: Daniel Dawson; +Cc: Carlos Carvalho, linux-ext4, linux-raid
On Sun, Dec 24, 2023 at 11:39:05PM -0800, Daniel Dawson wrote:
> On 12/22/23 12:48 PM, Carlos Carvalho wrote:
> > This is finally a summary of a long standing problem. When lots of writes to
> > many files are sent in a short time the kernel gets stuck and stops sending
> > write requests to the disks. Sometimes it recovers and finally sends the
> > modified pages to permanent storage, sometimes not and eventually other
> > functions degrade and the machine crashes.
> >
> > A simple way to reproduce: expand a kernel source tree, like
> > xzcat linux-6.5.tar.xz | tar x -f -
> This sounds almost exactly like a problem I was having, right down to
> triggering it by writing the files of a kernel tree, though the details in
> my case are slightly different. I wanted to report it, but wanted to get a
> better handle on it and never managed it, and now I've changed my setup such
> that it doesn't happen anymore.
> > - it happens only with ext4 on a parity raid array
>
> This is where it differs for me. I experienced it only with btrfs. But I had
Hi Daniel,
So I think there are some other people noticing something similar on
btrfs as well [1]. Maybe this is related to the issue you are seeing,
although they have not mentioned anything about raid in btrfs.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2242391
Regards,
ojaswin
> two arrays with it, one on SSDs and one on HDDs. The HDD array exhibited the
> problem almost exclusively (the SSDs, I think, exhibited it once in several
> months, while the HDDs did pretty much every time I tried to compile a new
> kernel (until I started working around it), and even from some other things,
> which was a couple of times a week). I imagine because HDDs much slower and
> therefore allow more data to get cached.
>
> Now that I've switched the HDD array to ext4, I haven't experienced the
> issue even once. But the setup has better performance, so maybe it's just
> because it flushes its writes faster.
>
> --
> PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A
>
end of thread, other threads:[~2024-01-04 6:11 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-22 20:48 parity raid and ext4 get stuck in writes Carlos Carvalho
2023-12-22 23:00 ` eyal
2023-12-25 7:39 ` Daniel Dawson
2023-12-25 10:15 ` Peter Grandi
2023-12-25 13:38 ` Carlos Carvalho
2024-01-04 6:11 ` Ojaswin Mujoo
2024-01-04 6:08 ` Ojaswin Mujoo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox