From: Salvatore Bonaccorso <carnil@debian.org>
To: Yu Kuai <yukuai1@huaweicloud.com>, 1104460@bugs.debian.org
Cc: "Antoine Beaupré" <anarcat@debian.org>,
	"Moritz Mühlenhoff" <jmm@inutil.org>,
	"Melvin Vermeeren" <vermeeren@vermwa.re>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Coly Li" <colyli@kernel.org>, "Sasha Levin" <sashal@kernel.org>,
	stable <stable@vger.kernel.org>,
	regressions@lists.linux.dev, "yukuai (C)" <yukuai3@huawei.com>
Subject: Re: Bug#1104460: [regression 6.1.y] discard/TRIM through RAID10 blocking
Date: Tue, 6 May 2025 08:00:34 +0200
Message-ID: <aBmlgkHrbTYzwjj4@eldamar.lan>
In-Reply-To: <4762cbe1-30a2-e5cd-52e1-f2de7714da1e@huaweicloud.com>

Hi Yu,

Thanks for your followups.

On Tue, May 06, 2025 at 09:25:50AM +0800, Yu Kuai wrote:
> Hi,
> 
> On 2025/05/06 4:59, Antoine Beaupré wrote:
> > On 2025-05-05 22:36:07, Salvatore Bonaccorso wrote:
> > > Hi Antoine,
> > > 
> > > On Mon, May 05, 2025 at 02:50:32PM -0400, Antoine Beaupré wrote:
> > > > On 2025-05-05 18:02:37, Salvatore Bonaccorso wrote:
> > > > > On Mon, May 05, 2025 at 04:00:31PM +0200, Salvatore Bonaccorso wrote:
> > > > > > Hi Moritz,
> > > > > > 
> > > > > > On Mon, May 05, 2025 at 01:47:15PM +0200, Moritz Mühlenhoff wrote:
> > > > > > > Am Wed, Apr 30, 2025 at 05:55:20PM +0200 schrieb Salvatore Bonaccorso:
> > > > > > > > Hi
> > > > > > > > 
> > > > > > > > We got a regression report in Debian after the update from 6.1.133 to
> > > > > > > > 6.1.135. Melvin is reporting that discard/TRIM through a RAID10 array
> > > > > > > > stalls indefinitely. The full report is inlined below and originates
> > > > > > > > from https://bugs.debian.org/1104460 .
> > > > > > > 
> > > > > > > JFTR, we ran into the same problem with a few Wikimedia servers running
> > > > > > > 6.1.135 and RAID 10: The servers started to lock up once fstrim.service
> > > > > > > got started. Full oops messages are available at
> > > > > > > https://phabricator.wikimedia.org/P75746
> > > > > > 
> > > > > > Thanks for these additional data points. Assuming you won't be able to
> > > > > > test the other stable series where the commit d05af90d6218
> > > > > > ("md/raid10: fix missing discard IO accounting") went in, might you at
> > > > > > least be able to test the 6.1.y branch with the commit reverted again
> > > > > > and manually trigger the issue?
> > > > > > 
> > > > > > If needed I can provide a test Debian package of 6.1.135 (or 6.1.137)
> > > > > > with the patch reverted.
> > > > > 
> > > > > One additional data point, as several Debian users were reporting
> > > > > back being affected: one user did upgrade to 6.12.25 (where the
> > > > > commit was backported as well) and is not able to reproduce the issue
> > > > > there.
> > > > 
> > > > That would be me.
> > > > 
> > > > I can reproduce the issue as outlined by Moritz above fairly reliably in
> > > > 6.1.135 (debian package 6.1.0-34-amd64). The reproducer is simple, on a
> > > > RAID-10 host:
> > > > 
> > > >   1. reboot
> > > >   2. systemctl start fstrim.service
> > > > 
> > > > We're tracking the issue internally in:
> > > > 
> > > > https://gitlab.torproject.org/tpo/tpa/team/-/issues/42146
> > > > 
> > > > I've managed to workaround the issue by upgrading to the Debian package
> > > > from testing/unstable (6.12.25), as Salvatore indicated above. There,
> > > > fstrim doesn't cause any crash and completes successfully. In stable, it
> > > > just hangs there forever. The kernel doesn't completely panic and the
> > > > machine is otherwise somewhat still functional: my existing SSH
> > > > connection keeps working, for example, but new ones fail. And an `apt
> > > > install` of another kernel hangs forever.
> > > 
> > > So likely at least in 6.1.y there are missing pre-requisites causing
> > > the behaviour.
> > > 
> > > If you can test 6.1.135-1 with the commit
> > > 4a05f7ae33716d996c5ce56478a36a3ede1d76f2 reverted then you can fetch
> > > built packages at:
> > > 
> > > https://people.debian.org/~carnil/tmp/linux/1104460/
> 
> Can you also test with 4a05f7ae33716d996c5ce56478a36a3ede1d76f2 not
> reverted, and also cherry-pick c567c86b90d4715081adfe5eb812141a5b6b4883?

Thank you.

Antoine, Moritz,
https://people.debian.org/~carnil/tmp/linux/1104460-2/ contains a
build with 4a05f7ae33716d996c5ce56478a36a3ede1d76f2 *not* reverted and
with c567c86b90d4715081adfe5eb812141a5b6b4883 cherry-picked, can you
test this one as well?
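
For anyone else trying the test builds, here is a minimal shell sketch of
the reproducer Antoine described (reboot, then start fstrim.service), with
a guard that first looks for a RAID10 array. It assumes the usual
/proc/mdstat line layout ("md0 : active raid10 sda1[0] ..."); the helper
names list_raid10 and trigger_trim are hypothetical and not part of any
Debian tooling. Only run trigger_trim on a machine you can power-cycle: on
an affected 6.1.135 kernel the trim can hang indefinitely and new SSH
logins may fail.

```shell
#!/bin/sh
# Sketch of the reproducer from this thread: after a fresh boot, check
# for RAID10 md arrays, then start fstrim.service.
# Assumption: array lines in /proc/mdstat look like
#   "md0 : active raid10 sda1[0] sdb1[1] ..."
# (possibly with "(auto-read-only)" between "active" and the level).

# Print the name of every md array whose level is raid10.
# Reads /proc/mdstat by default, or an mdstat-format file given as $1.
list_raid10() {
    awk '$2 == ":" {
        for (i = 3; i <= NF; i++)
            if ($i == "raid10") { print $1; break }
    }' "${1:-/proc/mdstat}"
}

# Start the trim pass exactly as in the reported reproducer.
# WARNING: on an affected kernel this may block indefinitely.
trigger_trim() {
    if [ -z "$(list_raid10)" ]; then
        echo "no RAID10 array found; the reported hang needs RAID10" >&2
        return 1
    fi
    systemctl start fstrim.service
}
```

Source the file, run list_raid10 right after a reboot to confirm the host
has a RAID10 array, then call trigger_trim. Whether the trim completes or
hangs is the data point of interest when comparing the two test builds.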

Regards,
Salvatore

Thread overview: 13+ messages
     [not found] <174602441004.174814.6400502946223473449.reportbug@talos.vermwa.re>
2025-04-30 15:55 ` [regression 6.1.y] discard/TRIM through RAID10 blocking (was: Re: Bug#1104460: linux-image-6.1.0-34-powerpc64le: Discard broken) with RAID10: BUG: kernel tried to execute user page (0) - exploit attempt? Salvatore Bonaccorso
2025-05-05 11:47   ` Moritz Mühlenhoff
2025-05-05 14:00     ` Salvatore Bonaccorso
2025-05-05 16:02       ` Salvatore Bonaccorso
2025-05-05 18:50         ` Bug#1104460: " Antoine Beaupré
2025-05-05 20:36           ` Salvatore Bonaccorso
2025-05-05 20:59             ` Antoine Beaupré
2025-05-06  1:25               ` Bug#1104460: [regression 6.1.y] discard/TRIM through RAID10 blocking Yu Kuai
2025-05-06  6:00                 ` Salvatore Bonaccorso [this message]
2025-05-06 13:12                   ` Antoine Beaupré
2025-05-06  1:11   ` Yu Kuai
2025-05-06  1:19     ` Yu Kuai
2025-05-06 15:16   ` [regression 6.1.y] discard/TRIM through RAID10 blocking (was: Re: Bug#1104460: linux-image-6.1.0-34-powerpc64le: Discard broken) with RAID10: BUG: kernel tried to execute user page (0) - exploit attempt? Melvin Vermeeren