Linux kernel -stable discussions
From: "Antoine Beaupré" <anarcat@debian.org>
To: Salvatore Bonaccorso <carnil@debian.org>,
	Yu Kuai <yukuai1@huaweicloud.com>,
	1104460@bugs.debian.org
Cc: "Moritz Mühlenhoff" <jmm@inutil.org>,
	"Melvin Vermeeren" <vermeeren@vermwa.re>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Coly Li" <colyli@kernel.org>, "Sasha Levin" <sashal@kernel.org>,
	stable <stable@vger.kernel.org>,
	regressions@lists.linux.dev, "yukuai (C)" <yukuai3@huawei.com>
Subject: Re: Bug#1104460: [regression 6.1.y] discard/TRIM through RAID10 blocking
Date: Tue, 06 May 2025 09:12:19 -0400
Message-ID: <87wmatvq6k.fsf@angela.anarc.at>
In-Reply-To: <aBmlgkHrbTYzwjj4@eldamar.lan>

On 2025-05-06 08:00:34, Salvatore Bonaccorso wrote:
> Hi Yu,
>
> Thanks for your followups.
>
> On Tue, May 06, 2025 at 09:25:50AM +0800, Yu Kuai wrote:
>> Hi,
>> 
>> On 2025/05/06 4:59, Antoine Beaupré wrote:
>> > On 2025-05-05 22:36:07, Salvatore Bonaccorso wrote:
>> > > Hi Antoine,
>> > > 
>> > > On Mon, May 05, 2025 at 02:50:32PM -0400, Antoine Beaupré wrote:
>> > > > On 2025-05-05 18:02:37, Salvatore Bonaccorso wrote:
>> > > > > On Mon, May 05, 2025 at 04:00:31PM +0200, Salvatore Bonaccorso wrote:
>> > > > > > Hi Moritz,
>> > > > > > 
>> > > > > > On Mon, May 05, 2025 at 01:47:15PM +0200, Moritz Mühlenhoff wrote:
>> > > > > > > On Wed, Apr 30, 2025 at 05:55:20PM +0200, Salvatore Bonaccorso wrote:
>> > > > > > > > Hi
>> > > > > > > > 
>> > > > > > > > We got a regression report in Debian after the update from 6.1.133 to
>> > > > > > > > 6.1.135. Melvin is reporting that discard/trim through a RAID10 array
>> > > > > > > > stalls indefinitely. The full report is inlined below and originates
>> > > > > > > > from https://bugs.debian.org/1104460 .
>> > > > > > > 
>> > > > > > > JFTR, we ran into the same problem with a few Wikimedia servers running
>> > > > > > > 6.1.135 and RAID 10: The servers started to lock up once fstrim.service
>> > > > > > > got started. Full oops messages are available at
>> > > > > > > https://phabricator.wikimedia.org/P75746
>> > > > > > 
>> > > > > > Thanks for this additional data point. Assuming you won't be able to
>> > > > > > test the other stable series where the commit d05af90d6218
>> > > > > > ("md/raid10: fix missing discard IO accounting") went in, might you at
>> > > > > > least be able to test the 6.1.y branch with the commit reverted again
>> > > > > > and manually trigger the issue?
>> > > > > > 
>> > > > > > If needed I can provide a test Debian package of 6.1.135 (or 6.1.137)
>> > > > > > with the patch reverted.
>> > > > > 
>> > > > > So one additional data point as several Debian users were reporting
>> > > > > back being affected: One user did upgrade to 6.12.25 (where the
>> > > > > commit was backported as well) and is not able to reproduce the issue
>> > > > > there.
>> > > > 
>> > > > That would be me.
>> > > > 
>> > > > I can reproduce the issue as outlined by Moritz above fairly reliably in
>> > > > 6.1.135 (debian package 6.1.0-34-amd64). The reproducer is simple, on a
>> > > > RAID-10 host:
>> > > > 
>> > > >   1. reboot
>> > > >   2. systemctl start fstrim.service
>> > > > 
>> > > > We're tracking the issue internally in:
>> > > > 
>> > > > https://gitlab.torproject.org/tpo/tpa/team/-/issues/42146
>> > > > 
>> > > > I've managed to work around the issue by upgrading to the Debian package
>> > > > from testing/unstable (6.12.25), as Salvatore indicated above. There,
>> > > > fstrim doesn't cause any crash and completes successfully. In stable, it
>> > > > just hangs there forever. The kernel doesn't completely panic and the
>> > > > machine is otherwise somewhat still functional: my existing SSH
>> > > > connection keeps working, for example, but new ones fail. And an `apt
>> > > > install` of another kernel hangs forever.
>> > > 
>> > > So likely at least in 6.1.y there are missing pre-requisites causing
>> > > the behaviour.
>> > > 
>> > > If you can test 6.1.135-1 with the commit
>> > > 4a05f7ae33716d996c5ce56478a36a3ede1d76f2 reverted then you can fetch
>> > > built packages at:
>> > > 
>> > > https://people.debian.org/~carnil/tmp/linux/1104460/
>> 
>> Can you also test with 4a05f7ae33716d996c5ce56478a36a3ede1d76f2 not
>> reverted, and also cherry-pick c567c86b90d4715081adfe5eb812141a5b6b4883?
>
> Thank you.
>
> Antoine, Moritz,
> https://people.debian.org/~carnil/tmp/linux/1104460-2/ contains a
> build with 4a05f7ae33716d996c5ce56478a36a3ede1d76f2 *not* reverted and
> with c567c86b90d4715081adfe5eb812141a5b6b4883 cherry-picked, can you
> test this one as well?

I tested this one, and could successfully run fstrim.service without
problems.
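
For anyone repeating this test on another RAID-10 host, here is a minimal
sketch of a hang check around the reproducer quoted above. The helper
name run_with_timeout and the 600-second limit are assumptions made for
illustration, not something taken from this thread. It also assumes that
`systemctl start fstrim.service` blocks until the trim has finished. The
helper deliberately does not wait for the child after a timeout, since a
process stuck in the kernel may never exit.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and classify the outcome as 'ok', 'failed' or 'hung'.

    Hypothetical helper: the name and the return values are illustrative.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        returncode = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Send SIGKILL but do not wait again: a task blocked in
        # uninterruptible sleep cannot be reaped, and waiting would hang us.
        proc.kill()
        return "hung"
    return "ok" if returncode == 0 else "failed"


if __name__ == "__main__":
    # Assumed limit of 10 minutes; adjust to the size of the array.
    print(run_with_timeout(["systemctl", "start", "fstrim.service"], 600))
```

A "hung" result on an affected kernel would match the behaviour described
earlier in the thread, while "ok" is what one would expect from the
1104460-2 build.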

A.

-- 
The trouble with the great human family is that everyone wants to
be its father.
                        - Mafalda
