From: Niklas Cassel <cassel@kernel.org>
To: Tim Teichmann <teichmanntim@outlook.de>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Christian Heusel <christian@heusel.eu>,
regressions@lists.linux.dev, x86@kernel.org,
stable@vger.kernel.org, Hans de Goede <hdegoede@redhat.com>,
linux-ide@vger.kernel.org, Damien Le Moal <dlemoal@kernel.org>,
Jens Axboe <axboe@kernel.dk>
Subject: Re: [REGRESSION][BISECTED] Scheduling errors with the AMD FX 8300 CPU
Date: Tue, 28 May 2024 22:19:21 +0200
Message-ID: <ZlY8SbGVMHho-dLz@ryzen.lan>
In-Reply-To: <f3b909f3-de1d-4781-aa7a-1967abe24125@kernel.dk>
Hello Tim,
On Tue, May 28, 2024 at 01:17:51PM -0600, Jens Axboe wrote:
> (Adding Damien, he's the ATA guy these days - leaving the below intact)
>
> On 5/28/24 1:15 PM, Thomas Gleixner wrote:
> > Tim!
> >
> > On Tue, May 28 2024 at 17:43, Tim Teichmann wrote:
> >> On 24/05/27 07:17pm, Thomas Gleixner wrote:
> >> I've just tested the fix you've provided in the previous email.
> >> The exact patches are attached to the ticket in the archlinux bugtracker[0].
> >
> > Thanks! I will write a proper changelog and ship it.
> >
> > >> The error regarding CPU scheduling disappeared for both kernel versions[0].
> >> However, the ATA bus error still occurs.
> >>
> >> Also, I suppose that the ATA bus error is the same as the previous one,
> >> because the only value that changes in the exception message is SAct.
> >>
> >> This is the message of the ATA error before the patch:
> >>
> >>>> May 23 23:36:49 archlinux kernel: smpboot: x86: Booting SMP configuration:
> >>>> May 23 23:36:49 archlinux kernel: .... node #0, CPUs: #2 #4 #6
> >>>> May 23 23:36:49 archlinux kernel: __common_interrupt: 2.55 No irq handler for vector
> >>>> May 23 23:36:49 archlinux kernel: __common_interrupt: 4.55 No irq handler for vector
> >>>> May 23 23:36:49 archlinux kernel: __common_interrupt: 6.55 No irq handler for vector
> >>>>
> >>>> ATA stuff:
> >>>>
> >>>> May 23 23:36:59 archlinux kernel: ata2.00: exception Emask 0x10 SAct 0x1fffe000 SErr 0x40d0002 action 0xe frozen
> >>>
> >>> That's probably just the fallout of the above.
> >
> > It's in reality not related and I saw some other AHCI fallout fly by.
> >
> >> And that's the message after the patch:
> >>
> >> [ 4.877584] ata2.00: exception Emask 0x10 SAct 0x80000000 SErr 0x40d0002 action 0xe frozen
> >>
> >> The full dmesg outputs are in the attachments.
> >
> > Cc'ed the AHCI people and left the info around for them.
In kernel v6.9 we enabled LPM (link power management) for all AHCI
controllers, provided that:
- The AHCI controller reports that it supports LPM, and
- The drive reports that it supports LPM (DIPM), and
- CONFIG_SATA_MOBILE_LPM_POLICY=3, and
- The port is not defined as external in the per port PxCMD register, and
- The port is not defined as hotplug capable in the per port PxCMD register.

However, there appear to be some drives (usually cheap ones that we've never
heard of) that report DIPM support but stop working once it is actually
turned on.
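
For illustration only, the conditions listed above can be written as a
single predicate. This is a userspace sketch, not the libata code: the
struct lpm_inputs and the function lpm_would_be_enabled() are made-up names,
and the fields simply mirror the five bullet points.

```c
#include <stdbool.h>

/* Hypothetical summary of one port/drive pair; not a kernel structure. */
struct lpm_inputs {
	bool hba_supports_lpm;    /* controller reports LPM support */
	bool drive_supports_dipm; /* drive reports DIPM support */
	int  mobile_lpm_policy;   /* value of CONFIG_SATA_MOBILE_LPM_POLICY */
	bool port_external;       /* port marked external in PxCMD */
	bool port_hotplug;        /* port marked hotplug capable in PxCMD */
};

/* Returns true only when all five conditions from the list hold. */
static bool lpm_would_be_enabled(const struct lpm_inputs *in)
{
	return in->hba_supports_lpm &&
	       in->drive_supports_dipm &&
	       in->mobile_lpm_policy == 3 &&
	       !in->port_external &&
	       !in->port_hotplug;
}
```

Note that the drive's own DIPM claim is one of the inputs, so a drive that
advertises DIPM but cannot handle it passes this check.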
Looking at the dmesg, you seem to have two SATA drives:
> >> [ 0.957220] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> >> [ 0.957984] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> >> [ 0.958027] ata3.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
> >> [ 0.958069] ata2.00: ATA-11: Apacer AS340 120GB, AP612PE0, max UDMA/133
ata3 (TOSHIBA HDWD110) appears to work correctly.
ata2 (Apacer AS340 120GB) shows command timeouts, and PxSERR.DIAG.X
("a change in device presence has been detected") gets set:
> >> [ 2.964262] ata2.00: exception Emask 0x10 SAct 0x80 SErr 0x40d0002 action 0xe frozen
> >> [ 2.964274] ata2.00: irq_stat 0x00000040, connection status changed
> >> [ 2.964279] ata2: SError: { RecovComm PHYRdyChg CommWake 10B8B DevExch }
> >> [ 2.964288] ata2.00: failed command: READ FPDMA QUEUED
> >> [ 2.964291] ata2.00: cmd 60/08:38:80:ff:f1/00:00:0d:00:00/40 tag 7 ncq dma 4096 in
> >> res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
> >> [ 2.964307] ata2.00: status: { DRDY }
> >> [ 2.964318] ata2: hard resetting link
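
As a cross-check on the SError line in the log above, here is a small
userspace sketch that decodes an SErr value into flag names. It is not the
libata implementation; serr_bits and serr_decode() are made-up names. The
bit positions follow the SATA SError register layout as I understand it, and
the names for bits that do not appear in this log are filled in from memory,
so they may not match the spelling libata prints.

```c
#include <stdint.h>
#include <stdio.h>

struct serr_bit {
	uint32_t mask;
	const char *name;
};

/* ERR field in bits 0-15, DIAG field in bits 16-31 (assumed layout). */
static const struct serr_bit serr_bits[] = {
	{ 1u << 0,  "RecovData" },
	{ 1u << 1,  "RecovComm" },
	{ 1u << 8,  "UnrecovData" },
	{ 1u << 9,  "Persist" },
	{ 1u << 10, "Proto" },
	{ 1u << 11, "HostInt" },
	{ 1u << 16, "PHYRdyChg" },
	{ 1u << 17, "PHYInt" },
	{ 1u << 18, "CommWake" },
	{ 1u << 19, "10B8B" },
	{ 1u << 20, "Dispar" },
	{ 1u << 21, "BadCRC" },
	{ 1u << 22, "Handshk" },
	{ 1u << 23, "LinkSeq" },
	{ 1u << 24, "TrStaTrns" },
	{ 1u << 25, "UnrecFIS" },
	{ 1u << 26, "DevExch" },
};

/* Write the space-separated names of all set bits into buf. */
static void serr_decode(uint32_t serr, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return;
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(serr_bits) / sizeof(serr_bits[0]); i++) {
		if (!(serr & serr_bits[i].mask))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? " " : "", serr_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break; /* buffer too small, stop appending */
		used += (size_t)n;
	}
}
```

Calling serr_decode(0x40d0002, buf, sizeof(buf)) with a 128-byte buffer
yields "RecovComm PHYRdyChg CommWake 10B8B DevExch", the same set of flags
as in the dmesg line, with DevExch being bit 26 (DIAG.X).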
Could you please try the following patch (quirk):
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index c449d60d9bb9..24ebcad65b65 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4199,6 +4199,9 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
 						ATA_HORKAGE_ZERO_AFTER_TRIM |
 						ATA_HORKAGE_NOLPM },
 
+	/* Apacer models with LPM issues */
+	{ "Apacer AS340*", NULL, ATA_HORKAGE_NOLPM },
+
 	/* These specific Samsung models/firmware-revs do not handle LPM well */
 	{ "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", ATA_HORKAGE_NOLPM },
 	{ "SAMSUNG SSD PM830 mSATA *", "CXM13D1Q", ATA_HORKAGE_NOLPM },
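
To show what the new entry is meant to catch, here is a rough userspace
sketch of the table lookup. It is an assumption-laden stand-in, not the
kernel code: libata has its own pattern matcher, and POSIX fnmatch(3) is
used here in its place. QUIRK_NOLPM, struct quirk_entry and quirk_lookup()
are made-up names; only the three pattern rows are taken from the patch
context above.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical flag standing in for ATA_HORKAGE_NOLPM. */
#define QUIRK_NOLPM (1u << 0)

struct quirk_entry {
	const char *model_pat; /* glob pattern for the model string */
	const char *fw_pat;    /* glob pattern for the firmware rev, NULL = any */
	unsigned int flags;
};

static const struct quirk_entry quirks[] = {
	{ "Apacer AS340*",              NULL,       QUIRK_NOLPM },
	{ "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", QUIRK_NOLPM },
	{ "SAMSUNG SSD PM830 mSATA *",  "CXM13D1Q", QUIRK_NOLPM },
};

/* Return the flags of the first matching entry, or 0 if none matches. */
static unsigned int quirk_lookup(const char *model, const char *fw)
{
	for (size_t i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		if (fnmatch(quirks[i].model_pat, model, 0) != 0)
			continue;
		if (quirks[i].fw_pat && fnmatch(quirks[i].fw_pat, fw, 0) != 0)
			continue;
		return quirks[i].flags;
	}
	return 0;
}
```

With the model and firmware strings from your dmesg, "Apacer AS340 120GB" /
"AP612PE0" matches the first row regardless of firmware revision, while
"TOSHIBA HDWD110" / "MS2OA8J0" matches nothing and keeps LPM enabled.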
Kind regards,
Niklas