public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Nikolai Zhubr <zhubr.2@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>,
	linux-ext4@vger.kernel.org, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org, jack@suse.cz
Subject: Re: ext4 damage suspected in between 5.15.167 - 5.15.170
Date: Mon, 16 Dec 2024 13:59:41 +0100	[thread overview]
Message-ID: <20241216125941.pr2eufott5pmqyyh@quack3> (raw)
In-Reply-To: <ce9055d7-7301-0abe-3609-3a4e2e7b1e5e@gmail.com>

Hi Nikolai!

On Sat 14-12-24 22:58:24, Nikolai Zhubr wrote:
> On 12/13/24 19:12, Theodore Ts'o wrote:
> > Note that some hardware errors can be caused by one-off errors, such
> > as cosmic rays causing a bit-flip in memory DIMM.  If that happens,
> > RAID won't save you, since the error was introduced before an updated
> 
> Certainly cosmic rays is a possibility, but based on previous episodes I'd
> still rather bet on a more usual "subtle interaction" problem, either exact
> same or some similar to [1].
> I even tried to run an existing test for this particular case as described
> in [2] but it is not too user-friendly and somehow exits abnormally without
> actually doing any interesting work. I'll get back to it later when I have
> some time.
> 
> [1] https://lore.kernel.org/stable/20231205122122.dfhhoaswsfscuhc3@quack3/
> [2] https://lwn.net/Articles/954364/
> 
> > The location of block allocation bitmaps never gets changed, so this
> > sort of thing only happens due to hardware-induced corruption.
> 
> Well, unless e.g. some modified sectors start being flushed to random wrong
> offsets, like in [1] above, or something similar.

Note that above bug led to writing file data to another position in that
file. As such it cannot really lead to metadata corruption. Corrupting data
in a file is relatively frequent event (given the wide variety of
manipulations we do with file data). OTOH I've never seen corrupting
metadata like this (in particular because ext4 has additional sanity checks
that newly allocated blocks don't overlap with critical fs metadata). In
theory, there could be software bug leading to writing sector to a wrong
position but frankly, in all the cases I've investigated so far such bugs
ended up being HW related.

> > Otherwise, I strongly encourage you to learn, and to take
> > responsibility for the health of your own system.  And ideally, you
> > can also use that knowledge to help other users out, which is the only
> > way the free-as-in-beer ecosystem can flurish; by having everybody
> 
> True. Generally I try to follow that, as much as appears possible.
> It is sad a direct communication end-user-to-developer for solving issues is
> becoming increasingly problematic here.

On one hand I understand you, on the other hand back in the good old days
(and I remember those as well ;) you wouldn't get much help when running
over three years old kernel either. And I understand you're running a stable
kernel that gets at least some updates but that's meant more for companies
that build their products on top of that and have teams available for
debugging issues. For an enduser I find some distribution kernels (Debian,
Ubuntu, openSUSE, Fedora) more suitable as they get much more scrutiny
before being released than -stable and also people there are more willing
to look at issues with older kernels (that are still supported by the
distro).

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2024-12-16 12:59 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-12 18:31 ext4 damage suspected in between 5.15.167 - 5.15.170 Nikolai Zhubr
2024-12-12 19:16 ` Theodore Ts'o
2024-12-13 10:49   ` Nikolai Zhubr
2024-12-13 16:12     ` Theodore Ts'o
2024-12-14 19:58       ` Nikolai Zhubr
2024-12-16 12:59         ` Jan Kara [this message]
2024-12-16 15:16         ` David Laight
2024-12-16 19:31           ` Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241216125941.pr2eufott5pmqyyh@quack3 \
    --to=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=zhubr.2@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox