linux-pci.vger.kernel.org archive mirror
* Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
@ 2025-06-25 19:17 Bandhan Pramanik
  2025-06-25 20:20 ` Bjorn Helgaas
  2025-07-12 19:18 ` Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567 Askar Safin
  0 siblings, 2 replies; 21+ messages in thread
From: Bandhan Pramanik @ 2025-06-25 19:17 UTC (permalink / raw)
  To: linux-pci, linux-acpi; +Cc: ath10k, linux-wireless, stable

Hello,

The following is the original thread, where a bug was reported to the
linux-wireless and ath10k mailing lists. The specific bug has been
detailed clearly here.

https://lore.kernel.org/linux-wireless/690B1DB2-C9DC-4FAD-8063-4CED659B1701@gmail.com/T/#t

There is also a Bugzilla report by me, which was opened later:
https://bugzilla.kernel.org/show_bug.cgi?id=220264

As stated there, please check all the logs, especially the IRQ #16
line in /proc/interrupts.

Here is where all the logs are:
https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180
(these logs are taken from an Arch liveboot)

On my daily driver, I found these on my IRQ #16:

  16:     173210          0          0          0 IR-IO-APIC 16-fasteoi   i2c_designware.0, idma64.0, i801_smbus
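
For anyone comparing this row against their own machine, here is a
minimal sketch that splits one /proc/interrupts row into its parts.
The function name is hypothetical, and the four CPU columns are an
assumption taken from the sample row above.

```python
def parse_interrupt_line(line, num_cpus):
    """Split one /proc/interrupts row into IRQ label, counts, and handlers.

    Hypothetical helper: num_cpus must match the number of CPU columns.
    """
    label, _, rest = line.partition(":")
    fields = rest.split()
    counts = [int(x) for x in fields[:num_cpus]]
    # After the counters come the controller, the trigger type, and the
    # comma-separated list of handlers sharing the line.
    tail = fields[num_cpus:]
    handlers = [h.strip() for h in " ".join(tail[2:]).split(",")]
    return {
        "irq": label.strip(),
        "counts": counts,
        "controller": tail[0],
        "trigger": tail[1],
        "handlers": handlers,
    }


sample = ("  16:     173210          0          0          0 "
          "IR-IO-APIC 16-fasteoi   i2c_designware.0, idma64.0, i801_smbus")
row = parse_interrupt_line(sample, num_cpus=4)
```

On the sample row this shows every interrupt counted on the first CPU
and three handlers sharing the line.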

The fixes stated on the Reddit post for this Wi-Fi card didn't quite
work. (But git-cloning the firmware files did give me some more time
to have stable internet)

This time, I had to go for the GRUB kernel parameters.

Right now, I'm using "irqpoll" to curb the errors.
"intel_iommu=off" did not help; the Wi-Fi kept crashing even with it
set. I did not try "pci=noaer" this time.
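
When juggling several of these parameters, it helps to confirm which
ones the running kernel actually booted with. A minimal sketch,
assuming the usual space-separated /proc/cmdline format (quoted values
containing spaces are not handled); the function names and the example
command line are made up for illustration.

```python
def parse_cmdline(cmdline):
    """Map kernel parameters to values; bare flags such as irqpoll map to True."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def read_cmdline(path="/proc/cmdline"):
    """Parse the command line of the running kernel."""
    with open(path) as f:
        return parse_cmdline(f.read())


# Hypothetical command line, not taken from the reporter's machine.
example = parse_cmdline("BOOT_IMAGE=/vmlinuz root=/dev/sda2 ro irqpoll pci=noaer")
```

Here example["irqpoll"] is True and example["pci"] is "noaer", while
"intel_iommu" is absent because it was not on the command line.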

If it's of any concern, there is a very weird error in Chromium-based
browsers which has only happened after I started using irqpoll. When I
Google something, the background of the individual result boxes shows
as pure black, while the surrounding space is the usual
greyish-blackish, like we see in Dark Mode. Here is a picture of the
exact thing I'm experiencing: https://files.catbox.moe/mjew6g.png

If you notice anything in my logs/bug reports, please let me know.
(It seems like the Wi-Fi errors are just a red herring and the real
cause is some ACPI- or PCIe-related problem in computers of this
model. That is just naive speculation, though.)

Thanking you,
Bandhan Pramanik


* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-06-25 19:17 Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567 Bandhan Pramanik
@ 2025-06-25 20:20 ` Bjorn Helgaas
  2025-06-25 22:50   ` Bandhan Pramanik
  2025-07-12 19:18 ` Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567 Askar Safin
  1 sibling, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2025-06-25 20:20 UTC (permalink / raw)
  To: Bandhan Pramanik, Jeff Johnson
  Cc: linux-pci, linux-acpi, ath10k, linux-wireless, stable

[+cc Jeff, ath10k maintainer]

On Thu, Jun 26, 2025 at 12:47:49AM +0530, Bandhan Pramanik wrote:
> Hello,
> 
> The following is the original thread, where a bug was reported to the
> linux-wireless and ath10k mailing lists. The specific bug has been
> detailed clearly here.
> 
> https://lore.kernel.org/linux-wireless/690B1DB2-C9DC-4FAD-8063-4CED659B1701@gmail.com/T/#t
> 
> There is also a Bugzilla report by me, which was opened later:
> https://bugzilla.kernel.org/show_bug.cgi?id=220264
> 
> As stated, it is highly encouraged to check out all the logs,
> especially the line of IRQ #16 in /proc/interrupts.
> 
> Here is where all the logs are:
> https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180
> (these logs are taken from an Arch liveboot)
> 
> On my daily driver, I found these on my IRQ #16:
> 
>   16:     173210          0          0          0 IR-IO-APIC 16-fasteoi   i2c_designware.0, idma64.0, i801_smbus
> 
> The fixes stated on the Reddit post for this Wi-Fi card didn't quite
> work. (But git-cloning the firmware files did give me some more time
> to have stable internet)
> 
> This time, I had to go for the GRUB kernel parameters.
> 
> Right now, I'm using "irqpoll" to curb the errors caused.
> "intel_iommu=off" did not work, and the Wi-Fi was constantly crashing
> even then. Did not try out "pci=noaer" this time.
> 
> If it's of any concern, there is a very weird error in Chromium-based
> browsers which has only happened after I started using irqpoll. When I
> Google something, the background of the individual result boxes shows
> as pure black, while the surrounding space is the usual
> greyish-blackish, like we see in Dark Mode. Here is a picture of the
> exact thing I'm experiencing: https://files.catbox.moe/mjew6g.png
> 
> If you notice anything in my logs/bug reports, please let me know.
> (Because it seems like Wi-Fi errors are just a red herring, there are
> some ACPI or PCIe-related errors in the computers of this model - just
> a naive speculation, though.)

Your dmesg log is incomplete, and we would need to see the entire
thing.  It should start with something like this:

  Linux version 6.8.0-60-generic (buildd@lcy02-amd64-054) (x86_64-linux-gnu-gcc-13 (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #63-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 15 19:04:15 UTC 2025 (Ubuntu 6.8.0-60.63-generic 6.8.12)
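
A quick way to check a saved log before posting it is to look at its
first non-empty line. This is only a sketch: the function name is
hypothetical, and it assumes that a complete log begins with the
"Linux version" banner shown above.

```python
def starts_at_boot(log_text):
    """Return True if the first non-empty line carries the kernel banner."""
    for line in log_text.splitlines():
        if line.strip():
            # A timestamp or journal prefix may precede the banner text.
            return "Linux version " in line
    return False  # an empty log is not complete


complete = "[    0.000000] Linux version 6.8.0-60-generic (buildd@lcy02-amd64-054)\n"
truncated = "[ 1146.810055] pcieport 0000:00:1c.0: AER: Uncorrectable (Fatal) error\n"
```

If the check fails, the start of the ring buffer has most likely been
overwritten and the log needs to be captured again.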

Your lspci output doesn't include the necessary PCI details; collect
it with "sudo lspci -vv".

We should pick the most serious problem and focus on that instead of
trying to solve everything at once.

It sounds like the ath10k issue might be the biggest problem?  If
"options ath10k_core skip_otp=y" is a workaround for this problem, it
looks like some ath10k firmware thing, probably unrelated to the PCI
core.

Bjorn



* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-06-25 20:20 ` Bjorn Helgaas
@ 2025-06-25 22:50   ` Bandhan Pramanik
  2025-06-26 17:53     ` Bandhan Pramanik
  0 siblings, 1 reply; 21+ messages in thread
From: Bandhan Pramanik @ 2025-06-25 22:50 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jeff Johnson, linux-pci, linux-acpi, ath10k, linux-wireless,
	stable

Please ignore the last email (I haven't replied to everyone). Also,
here's the actual updated dmesg (the previous one was the old one):
https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180/raw/78460e6931a055b6776afe756a95d467913d5ebd/updated-dmesg

On Thu, Jun 26, 2025 at 4:16 AM Bandhan Pramanik
<bandhanpramanik06.foss@gmail.com> wrote:
>
> Hello Bjorn,
>
> First of all, thanks a LOT for replying.
>
> I have included the files in my previous GitHub Gist. Sharing the raw
> files for easier analysis.
>
> lspci -vv: https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180/raw/78460e6931a055b6776afe756a95d467913d5ebd/detailed-lspci.txt
> dmesg: https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180/raw/78460e6931a055b6776afe756a95d467913d5ebd/dmesg.log
>
> On a different note, I had to use pci=noaer, so that the ring buffer
> wouldn't get cleared that fast.
>
> Regarding the ath10k thing, none of the fixes worked this time. Only
> irqpoll worked. I don't know if it's because of a disparity b/w GNOME
> and KDE (because my daily driver is Fedora 42), but I'm 300% sure that
> it's not just the Wi-Fi that's the issue here. It's most probably a
> lot of issues here, and the harder issues to fix are usually the ones
> closer to the hardware.
>
> Anyway, if you get something, please let me know.
>
> Bandhan
>


* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-06-25 22:50   ` Bandhan Pramanik
@ 2025-06-26 17:53     ` Bandhan Pramanik
  2025-06-26 23:21       ` Bandhan Pramanik
  0 siblings, 1 reply; 21+ messages in thread
From: Bandhan Pramanik @ 2025-06-26 17:53 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jeff Johnson, linux-pci, linux-acpi, ath10k, linux-wireless,
	stable

Hello everyone,

I think I found it. I used irqpoll and I didn't experience any hiccups with my mouse performance. But the Wi-Fi was still malfunctioning.

To linux-pci and linux-acpi:

It's an ath10k problem, sure, but there's something definitely problematic happening if, in the normal state, these Wi-Fi bugs hamper the touchpad movement.

To ath10k and linux-wireless:

I tried "options ath10k_core rawmode=0" along with "skip_otp=y", and the Wi-Fi seems to work perfectly as of now. It might be the fix, or it might not. But I think there's a more important question: is there any good documentation on what the different key-value pairs mean? In other words, what documentation do people use to arrive at "rawmode=0" or "skip_otp=y"?
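
On the documentation question: module parameters are declared in the
driver source, and "modinfo -p ath10k_core" lists them with their
one-line descriptions. The values a loaded module is currently using
are normally visible under /sys/module/<module>/parameters/. A minimal
sketch for reading them, with a hypothetical function name and the
sysfs root passed in so the code can be tried against a test
directory:

```python
import os


def read_module_params(module, sysfs_root="/sys/module"):
    """Return {name: value} for the parameters a loaded module exposes."""
    param_dir = os.path.join(sysfs_root, module, "parameters")
    values = {}
    if not os.path.isdir(param_dir):
        return values  # module not loaded, or it exposes no parameters
    for name in sorted(os.listdir(param_dir)):
        try:
            with open(os.path.join(param_dir, name)) as f:
                values[name] = f.read().strip()
        except OSError:
            values[name] = None  # not readable by the current user
    return values
```

Calling read_module_params("ath10k_core") after loading the module
would show whether the settings from the modprobe options file took
effect.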



Bandhan




* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-06-26 17:53     ` Bandhan Pramanik
@ 2025-06-26 23:21       ` Bandhan Pramanik
  2025-07-04 19:30         ` Bandhan Pramanik
  0 siblings, 1 reply; 21+ messages in thread
From: Bandhan Pramanik @ 2025-06-26 23:21 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jeff Johnson, linux-pci, linux-acpi, ath10k, linux-wireless,
	stable

Just a small update: it's not the fix. Back to square 1. 



* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-06-26 23:21       ` Bandhan Pramanik
@ 2025-07-04 19:30         ` Bandhan Pramanik
  2025-07-05 13:50           ` Bjorn Helgaas
  0 siblings, 1 reply; 21+ messages in thread
From: Bandhan Pramanik @ 2025-07-04 19:30 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jeff Johnson, linux-pci, linux-acpi, ath10k, linux-wireless,
	stable

Hi everyone,

Here after a week. I did my research.

I talked to some folks on IRC and the glaring issue was basically this: 

> [ 1146.810055] pcieport 0000:00:1c.0: AER: Uncorrectable (Fatal) error message received from 0000:01:00.0

This basically means that the root port (that 1c thing written with colons) of PCIe is the main problem here. 

One particular note: this issue can be reproduced on other laptops of this same model, so it likely affects most if not all of them.

For starters, the root port basically manages the communication between the CPU and the device. Now, this root port itself is reporting fatal errors.

This is not a Wi-Fi error, but something deeper. 

Any tips on what to do?

Bandhan


* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-07-04 19:30         ` Bandhan Pramanik
@ 2025-07-05 13:50           ` Bjorn Helgaas
  2025-07-05 15:00             ` Bandhan Pramanik
  0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2025-07-05 13:50 UTC (permalink / raw)
  To: Bandhan Pramanik
  Cc: Jeff Johnson, linux-pci, linux-acpi, ath10k, linux-wireless,
	stable

On Sat, Jul 05, 2025 at 01:00:23AM +0530, Bandhan Pramanik wrote:
> Hi everyone,
> 
> Here after a week. I did my research.
> 
> I talked to some folks on IRC and the glaring issue was basically this: 
> 
> > [ 1146.810055] pcieport 0000:00:1c.0: AER: Uncorrectable (Fatal) error message received from 0000:01:00.0

Where is the complete dmesg log from which this is extracted?

> This basically means that the root port (that 1c thing written with
> colons) of PCIe is the main problem here. 
> 
> One particular note: this issue can be reproduced on the models of
> this same laptop. Therefore, this happens in most if not all of the
> laptops of the same model.
> 
> For starters, the root port basically manages the communication
> between the CPU and the device. Now, this root port itself is
> reporting fatal errors.
> 
> This is not a Wi-Fi error, but something deeper. 

Devices that support AER have extra log registers to capture details
about an error.  A device that detects an error sends a PCIe Error
Message upstream to a Root Port.  The Root Port generates an
interrupt, which is handled by the aer driver.  In this case, the
01:00.0 device detected an error and sent an ERR_FATAL message
upstream, and the 00:1c.0 Root Port received it and generated an
interrupt.  The ERR_FATAL message doesn't contain any details about
the error itself, so the aer driver looks for the AER registers in the
01:00.0 device and logs those details to the dmesg log.  Normally
there would be a few lines after the one you quoted that would include
those details.

Bjorn


* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-07-05 13:50           ` Bjorn Helgaas
@ 2025-07-05 15:00             ` Bandhan Pramanik
  2025-07-05 19:58               ` Bjorn Helgaas
  0 siblings, 1 reply; 21+ messages in thread
From: Bandhan Pramanik @ 2025-07-05 15:00 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jeff Johnson, linux-pci, linux-acpi, ath10k, linux-wireless,
	stable

Hello,

The dmesg log (the older one) is present here:
https://gist.githubusercontent.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180/raw/78460e6931a055b6776afe756a95d467913d5ebd/dmesg.log

The newer dmesg log includes the first line and is not overwritten by
the ring buffer (used pci=noaer in this case):
https://gist.githubusercontent.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180/raw/78460e6931a055b6776afe756a95d467913d5ebd/updated-dmesg
 (The newer one doesn't have the error recorded).

You should check out the older dmesg, the quoted line was taken from
there verbatim, including any additional details.

Bandhan



* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-07-05 15:00             ` Bandhan Pramanik
@ 2025-07-05 19:58               ` Bjorn Helgaas
  2025-07-06 23:01                 ` Bandhan Pramanik
  0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2025-07-05 19:58 UTC (permalink / raw)
  To: Bandhan Pramanik
  Cc: Jeff Johnson, linux-pci, linux-acpi, ath10k, linux-wireless,
	stable

On Sat, Jul 05, 2025 at 08:30:46PM +0530, Bandhan Pramanik wrote:
> Hello,
> 
> The dmesg log (the older one) is present here:

[1]:
> https://gist.githubusercontent.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180/raw/78460e6931a055b6776afe756a95d467913d5ebd/dmesg.log
> 
> The newer dmesg log includes the first line and is not overwritten by
> the ring buffer (used pci=noaer in this case):
> https://gist.githubusercontent.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180/raw/78460e6931a055b6776afe756a95d467913d5ebd/updated-dmesg
>  (The newer one doesn't have the error recorded).
> 
> You should check out the older dmesg, the quoted line was taken from
> there verbatim, including any additional details.
> 
> Bandhan
> 
> On Sat, Jul 5, 2025 at 7:20 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Sat, Jul 05, 2025 at 01:00:23AM +0530, Bandhan Pramanik wrote:
> > > Hi everyone,
> > >
> > > Here after a week. I did my research.
> > >
> > > I talked to some folks on IRC and the glaring issue was basically this:
> > >
> > > > [ 1146.810055] pcieport 0000:00:1c.0: AER: Uncorrectable (Fatal) error message received from 0000:01:00.0

From [1]:

  [ 1146.810055] pcieport 0000:00:1c.0: AER: Uncorrectable (Fatal) error message received from 0000:01:00.0
  [ 1146.810069] ath10k_pci 0000:01:00.0: AER: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Inaccessible, (Unregistered Agent ID)
  [ 1146.813130] ath10k_pci 0000:01:00.0: AER: can't recover (no error_detected callback)
  [ 1146.948066] pcieport 0000:00:1c.0: AER: Root Port link has been reset (0)
  [ 1146.948112] pcieport 0000:00:1c.0: AER: device recovery failed
  [ 1146.949480] ath10k_pci 0000:01:00.0: failed to wake target for read32 at 0x0003a028: -110
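
To pull the same pairing out of a longer log, the two kinds of AER
lines above can be matched mechanically. This is a rough sketch: the
regular expressions are written against the exact wording of this
excerpt, which may differ between kernel versions, and the function
name is hypothetical.

```python
import re

BDF = r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]"
# Line logged for the Root Port that received the error message.
RECEIVED = re.compile(
    rf"pcieport (?P<port>{BDF}): AER: (?P<kind>.+?) "
    rf"error message received from (?P<source>{BDF})"
)
# Follow-up line logged for the device that reported the error.
DETAIL = re.compile(
    rf"(?P<driver>\S+) (?P<dev>{BDF}): AER: PCIe Bus Error: "
    r"severity=(?P<severity>.+?), type=(?P<type>[^,]+), \((?P<agent>[^)]+)\)"
)


def summarize_aer(log_text):
    """Pair each 'received from' line with the detail line that follows it."""
    events = []
    for line in log_text.splitlines():
        m = RECEIVED.search(line)
        if m:
            events.append({"port": m["port"], "source": m["source"],
                           "kind": m["kind"], "detail": None})
            continue
        m = DETAIL.search(line)
        if m and events and events[-1]["source"] == m["dev"]:
            events[-1]["detail"] = {"driver": m["driver"],
                                    "severity": m["severity"],
                                    "type": m["type"],
                                    "agent": m["agent"]}
    return events


excerpt = """\
[ 1146.810055] pcieport 0000:00:1c.0: AER: Uncorrectable (Fatal) error message received from 0000:01:00.0
[ 1146.810069] ath10k_pci 0000:01:00.0: AER: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Inaccessible, (Unregistered Agent ID)
"""
events = summarize_aer(excerpt)
```

For this excerpt the result is a single event: received by
0000:00:1c.0, reported by 0000:01:00.0, with type "Inaccessible".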

I think Linux is not doing a very good job of extracting error
information.  I think is_error_source() read PCI_ERR_UNCOR_STATUS from
01:00.0 and saw an error logged, but aer_get_device_error_info()
declined to read PCI_ERR_UNCOR_STATUS again because we thought the
link was unusable, so aer_print_error() didn't have any info to print,
hence the "Inaccessible" message.

Are you able to rebuild a kernel with the patch below?  This is based
on v6.16-rc1 and likely wouldn't apply cleanly to your v6.14 kernel.
But if you are able to build v6.16-rc1 with this patch, or adapt it to
v6.14, I'd be interested in the output.

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 70ac66188367..99acb1e1946e 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -990,6 +990,8 @@ static bool is_error_source(struct pci_dev *dev, struct aer_err_info *e_info)
 	if ((PCI_BUS_NUM(e_info->id) != 0) &&
 	    !(dev->bus->bus_flags & PCI_BUS_FLAGS_NO_AERSID)) {
 		/* Device ID match? */
+		pci_info(dev, "%s: bus_flags %#x e_info->id %#04x\n",
+			 __func__, dev->bus->bus_flags, e_info->id);
 		if (e_info->id == pci_dev_id(dev))
 			return true;
 
@@ -1025,6 +1027,10 @@ static bool is_error_source(struct pci_dev *dev, struct aer_err_info *e_info)
 		pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
 		pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, &mask);
 	}
+	pci_info(dev, "%s: %s STATUS %#010x MASK %#010x\n",
+		 __func__,
+		 e_info->severity == AER_CORRECTABLE ? "COR" : "UNCOR",
+		 status, mask);
 	if (status & ~mask)
 		return true;
 
@@ -1368,6 +1374,8 @@ int aer_get_device_error_info(struct aer_err_info *info, int i)
 	aer = dev->aer_cap;
 	type = pci_pcie_type(dev);
 
+	pci_info(dev, "%s: type %#x cap %#04x\n", __func__, type, aer);
+
 	/* Must reset in this function */
 	info->status = 0;
 	info->tlp_header_valid = 0;
@@ -1383,16 +1391,14 @@ int aer_get_device_error_info(struct aer_err_info *info, int i)
 			&info->mask);
 		if (!(info->status & ~info->mask))
 			return 0;
-	} else if (type == PCI_EXP_TYPE_ROOT_PORT ||
-		   type == PCI_EXP_TYPE_RC_EC ||
-		   type == PCI_EXP_TYPE_DOWNSTREAM ||
-		   info->severity == AER_NONFATAL) {
-
+	} else {
 		/* Link is still healthy for IO reads */
 		pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS,
 			&info->status);
 		pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_MASK,
 			&info->mask);
+		pci_info(dev, "%s: UNCOR STATUS %#010x MASK %#010x\n",
+			 __func__, info->status, info->mask);
 		if (!(info->status & ~info->mask))
 			return 0;
 
@@ -1471,6 +1477,8 @@ static void aer_isr_one_error(struct pci_dev *root,
 {
 	u32 status = e_src->status;
 
+	pci_info(root, "%s: ROOT_STATUS %#010x ROOT_ERR_SRC %#010x\n",
+		 __func__, e_src->status, e_src->id);
 	pci_rootport_aer_stats_incr(root, e_src);
 
 	/*
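
When reading the output of the debug patch above, note that it prints
e_info->id as a 16-bit number, which the code compares against
pci_dev_id(dev). That value packs the bus number into the high byte
and the device and function numbers into the low byte. A small sketch
for converting between that number and the bus:device.function
notation used elsewhere in the log; the helper names are hypothetical.

```python
def parse_bdf(text):
    """Split 'dddd:bb:dd.f' into (domain, bus, device, function)."""
    domain, bus, rest = text.split(":")
    device, function = rest.split(".")
    return int(domain, 16), int(bus, 16), int(device, 16), int(function, 16)


def requester_id(bus, device, function):
    """Pack bus (8 bits), device (5 bits), and function (3 bits) into 16 bits."""
    return (bus << 8) | (device << 3) | function


def decode_requester_id(value):
    """Inverse of requester_id(): return (bus, device, function)."""
    return (value >> 8) & 0xFF, (value >> 3) & 0x1F, value & 0x07


_, bus, device, function = parse_bdf("0000:01:00.0")
wifi_id = requester_id(bus, device, function)   # 0x0100
port_id = requester_id(0x00, 0x1c, 0)           # 0x00e0
```

So an id of 0x0100 in the new log lines would point at 01:00.0, and
0x00e0 at the 00:1c.0 Root Port.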


* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-07-05 19:58               ` Bjorn Helgaas
@ 2025-07-06 23:01                 ` Bandhan Pramanik
  2025-07-07  6:11                   ` Manivannan Sadhasivam
  0 siblings, 1 reply; 21+ messages in thread
From: Bandhan Pramanik @ 2025-07-06 23:01 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jeff Johnson, linux-pci, linux-acpi, ath10k, linux-wireless,
	stable

Hi Bjorn,

I have downloaded 6.16-rc4, and I have a bootable pendrive having the arch iso, but I really don't know how to rebuild the kernel on a bootable drive.

Any tips on how to do that?

Bandhan


* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-07-06 23:01                 ` Bandhan Pramanik
@ 2025-07-07  6:11                   ` Manivannan Sadhasivam
  2025-07-09 17:30                     ` Bandhan Pramanik
  0 siblings, 1 reply; 21+ messages in thread
From: Manivannan Sadhasivam @ 2025-07-07  6:11 UTC (permalink / raw)
  To: Bandhan Pramanik
  Cc: Bjorn Helgaas, Jeff Johnson, linux-pci, linux-acpi, ath10k,
	linux-wireless, stable

On Mon, Jul 07, 2025 at 04:31:22AM GMT, Bandhan Pramanik wrote:
> Hi Bjorn,
> 
> I have downloaded 6.16-rc4, and I have a bootable pendrive having the arch iso, but I really don't know how to rebuild the kernel on a bootable drive.
> 

You don't need to reinstall Arch to install a custom kernel. Refer to the Arch
Linux wiki on how to install a custom kernel from source:

https://wiki.archlinux.org/title/Kernel/Arch_build_system
https://wiki.archlinux.org/title/Kernel/Traditional_compilation

- Mani

-- 
மணிவண்ணன் சதாசிவம்


* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-07-07  6:11                   ` Manivannan Sadhasivam
@ 2025-07-09 17:30                     ` Bandhan Pramanik
  2025-07-10 19:06                       ` Bandhan Pramanik
  0 siblings, 1 reply; 21+ messages in thread
From: Bandhan Pramanik @ 2025-07-09 17:30 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Bjorn Helgaas, Jeff Johnson, linux-pci, linux-acpi, ath10k,
	linux-wireless, stable

Hello, 

I was actually a bit distracted by problems caused by Fedora's Automatic Partitioning. I'll report that in the Fedora Bugzilla... anyway.

I realised that building the modules would take 8-9 hours, and I didn't have much success even then (not all of the modules loaded properly; in particular, the firmware-N.bin files couldn't be found).

But I'll try to recompile the kernel, I'll just have to give it overnight time.

Bandhan


* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-07-09 17:30                     ` Bandhan Pramanik
@ 2025-07-10 19:06                       ` Bandhan Pramanik
  2025-07-11 12:15                         ` Bjorn Helgaas
  0 siblings, 1 reply; 21+ messages in thread
From: Bandhan Pramanik @ 2025-07-10 19:06 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Bjorn Helgaas, Jeff Johnson, linux-pci, linux-acpi, ath10k,
	linux-wireless, stable

Ok, we did it. Could reproduce the errors properly.

Here are the journalctl logs:

Kernel level: https://gist.githubusercontent.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180/raw/a9e93c4ba41fb0b3d7602e6bfddce9aa5f3a19b2/KERNEL%2520journalctl%2520v6.16-rc4
User level: https://gist.githubusercontent.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180/raw/a9e93c4ba41fb0b3d7602e6bfddce9aa5f3a19b2/NON-KERNEL%2520journalctl%2520v6.16-rc4

Just so you know, I have used v6.16-rc4.

Bandhan.



* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-07-10 19:06                       ` Bandhan Pramanik
@ 2025-07-11 12:15                         ` Bjorn Helgaas
  2025-07-11 16:04                           ` Bandhan Pramanik
  0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2025-07-11 12:15 UTC (permalink / raw)
  To: Bandhan Pramanik
  Cc: Manivannan Sadhasivam, Jeff Johnson, linux-pci, linux-acpi,
	ath10k, linux-wireless, stable

On Fri, Jul 11, 2025 at 12:36:12AM +0530, Bandhan Pramanik wrote:
> Ok, we did it. Could reproduce the errors properly.
> 
> Here are the journalctl logs:
> 
> Kernel level: https://gist.githubusercontent.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180/raw/a9e93c4ba41fb0b3d7602e6bfddce9aa5f3a19b2/KERNEL%2520journalctl%2520v6.16-rc4
> User level: https://gist.githubusercontent.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180/raw/a9e93c4ba41fb0b3d7602e6bfddce9aa5f3a19b2/NON-KERNEL%2520journalctl%2520v6.16-rc4

Thanks.  These logs look like the kernel doesn't include the patch I
sent at https://lore.kernel.org/r/20250705195846.GA2011829@bhelgaas

Can you please try with that patch?

> Just so you know, I have used v6.16-rc4.
> 
> Bandhan.
> 
> On Wed, Jul 9, 2025 at 11:00 PM Bandhan Pramanik
> <bandhanpramanik06.foss@gmail.com> wrote:
> >
> > Hello,
> >
> > I was actually a bit distracted by the things caused by the Automatic Partitioning of Fedora. I'll inform that in Fedora Bugzilla... anyway.
> >
> > I realised that making the modules will take 8-9 hours, I didn't even have much of a success (because all the modules didn't properly load, particularly the firmware-N.bin files couldn't be found).
> >
> > But I'll try to recompile the kernel, I'll just have to give it overnight time.
> >
> > Bandhan


* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-07-11 12:15                         ` Bjorn Helgaas
@ 2025-07-11 16:04                           ` Bandhan Pramanik
  2025-07-11 16:36                             ` Bjorn Helgaas
  0 siblings, 1 reply; 21+ messages in thread
From: Bandhan Pramanik @ 2025-07-11 16:04 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Manivannan Sadhasivam, Jeff Johnson, linux-pci, linux-acpi,
	ath10k, linux-wireless, stable

Hello,

I really couldn't find on the internet how to compile a single file now that I have compiled the whole kernel.

Any ways to do that?


* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-07-11 16:04                           ` Bandhan Pramanik
@ 2025-07-11 16:36                             ` Bjorn Helgaas
  2025-07-12  6:48                               ` Bandhan Pramanik
  0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2025-07-11 16:36 UTC (permalink / raw)
  To: Bandhan Pramanik
  Cc: Manivannan Sadhasivam, Jeff Johnson, linux-pci, linux-acpi,
	ath10k, linux-wireless, stable

On Fri, Jul 11, 2025 at 09:34:43PM +0530, Bandhan Pramanik wrote:
> Hello,
> 
> I really couldn't find on the internet how to compile a single file
> now that I have compiled the whole kernel.
> 
> Any ways to do that?

If you apply the patch (cd to the linux/ directory, then "patch -p1 <
email-file"), then run whatever "make" command you used before, it
should rebuild that file and relink the whole kernel.
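
The steps above can be sketched as a short script. This is only an
illustration under assumptions: the function names, the patch file
name, and the default make command are hypothetical, and the dry-run
step is an extra precaution rather than something required. "patch -i
FILE" reads the patch from FILE, which is equivalent to redirecting it
on stdin.

```python
import subprocess


def rebuild_steps(patch_file, make_cmd=("make",)):
    """Commands to check a patch, apply it, and rerun the earlier build."""
    return [
        # Report problems without touching any files.
        ["patch", "-p1", "--dry-run", "-i", patch_file],
        # Same effect as: patch -p1 < patch_file
        ["patch", "-p1", "-i", patch_file],
        # Whatever make command was used for the first build.
        list(make_cmd),
    ]


def run_steps(steps, src_dir):
    """Run each command from the top of the kernel tree, stopping on failure."""
    for argv in steps:
        subprocess.run(argv, cwd=src_dir, check=True)


# Hypothetical file name and make invocation.
steps = rebuild_steps("aer-debug.patch", make_cmd=("make", "-j4"))
```

Calling run_steps(steps, "linux") would then execute the commands in
order from the linux/ directory.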

Bjorn

* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-07-11 16:36                             ` Bjorn Helgaas
@ 2025-07-12  6:48                               ` Bandhan Pramanik
  2025-07-29 17:35                                 ` Bandhan Pramanik
  0 siblings, 1 reply; 21+ messages in thread
From: Bandhan Pramanik @ 2025-07-12  6:48 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Manivannan Sadhasivam, Jeff Johnson, linux-pci, linux-acpi,
	ath10k, linux-wireless, stable

Compiled the usual way: the bzImage compiled within 4-5 minutes
(compared to 1 hour previously), and the modules compiled within 1
hour (compared to 8 hours previously). Strangely, the congestion
didn't happen this time; the connection just silently went to
"No Internet".

I didn't attach a separate kernel-only journalctl log because I'm sure
that the normal journalctl output includes the kernel messages too:
https://gist.githubusercontent.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180/raw/07b34aa3fa19da5afa4bb161454e3cb2081b9880/journalctl%2520v6.16-rc4-PATCH1
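
If a kernel-only view is wanted later, a minimal sketch follows.
"journalctl -k -b" limits the output to kernel messages from the
current boot; the helper name filter_wifi_errors and its search
patterns are only illustrative, not taken from the logs.

```shell
#!/bin/sh
# Keep only log lines that mention the driver or the errors discussed in
# this thread. The patterns are illustrative guesses, not an exhaustive list.
filter_wifi_errors() {
    grep -i -E 'ath10k|pcie bus error|irq 16'
}

# Usage (kernel messages from the current boot only):
#   journalctl -k -b | filter_wifi_errors
```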

Please let me know what you think of the logs.

Bandhan

* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-06-25 19:17 Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567 Bandhan Pramanik
  2025-06-25 20:20 ` Bjorn Helgaas
@ 2025-07-12 19:18 ` Askar Safin
  2025-07-13 16:04   ` Bandhan Pramanik
  1 sibling, 1 reply; 21+ messages in thread
From: Askar Safin @ 2025-07-12 19:18 UTC (permalink / raw)
  To: bandhanpramanik06.foss
  Cc: ath10k, linux-acpi, linux-pci, linux-wireless, stable

I saw problems with Atheros on my Dell Inspiron, too.

These instructions helped me to reset the device without a reboot:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730331/comments/40

I used a modified script based on the one above (run as root):

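set -e
# Unload the ath10k driver stack; ignore errors if a module is not loaded.
rmmod ath10k_pci 2> /dev/null || :
rmmod ath10k_core 2> /dev/null || :
rmmod ath 2> /dev/null || :
# Remove the Wi-Fi device from the PCI bus (0000:03:00.0 on my machine).
{ echo 1 > /sys/bus/pci/devices/0000\:03\:00.0/remove; } 2> /dev/null || :
# Wait briefly, then rescan the bus so the device is detected again.
sleep 2
echo 1 > /sys/bus/pci/rescan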

Try both scripts; one of them should work.

If it still doesn't work, run the original script and then hibernate; if it still doesn't work after that, run the script again.
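
The PCI address 0000:03:00.0 is machine-specific. A minimal sketch for looking it up, assuming the card reports the Qualcomm Atheros vendor ID 0x168c (the function name find_atheros is made up for illustration):

```shell
#!/bin/sh
# Print the PCI addresses of all devices whose vendor ID is 0x168c
# (Qualcomm Atheros).
# $1 = sysfs PCI device directory; defaults to the real one.
find_atheros() {
    root=${1:-/sys/bus/pci/devices}
    for dev in "$root"/*; do
        # Skip entries without a readable vendor file.
        [ -r "$dev/vendor" ] || continue
        if [ "$(cat "$dev/vendor")" = "0x168c" ]; then
            basename "$dev"
        fi
    done
}
```

Running find_atheros with no arguments prints one address per matching device; that address is what goes into the /sys/bus/pci/devices/.../remove path.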

I was finally able to solve my problem by replacing the Wi-Fi adapter. :) Here is my new Wi-Fi adapter:

[    7.136347] iwlwifi 0000:03:00.0: Detected Intel(R) Dual Band Wireless AC 3160, REV=0x164

--
Askar Safin

* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-07-12 19:18 ` Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567 Askar Safin
@ 2025-07-13 16:04   ` Bandhan Pramanik
  0 siblings, 0 replies; 21+ messages in thread
From: Bandhan Pramanik @ 2025-07-13 16:04 UTC (permalink / raw)
  To: Askar Safin; +Cc: ath10k, linux-acpi, linux-pci, linux-wireless, stable

Hello Askar,

I appreciate your response. However, we're mainly trying to find out
exactly why this problem occurs. You could call it a kind of root
cause analysis, so that this error can be fixed on Inspiron laptops
once and for all. If you read the rest of the messages in the other
thread, you'll see just how deep this error goes.

But still, thanks a lot for the response.

Bandhan

* Re: Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
  2025-07-12  6:48                               ` Bandhan Pramanik
@ 2025-07-29 17:35                                 ` Bandhan Pramanik
  2025-08-17  9:38                                   ` [PATCH TEST] ath10k: Testing Mani's ASPM patch (QCA9377, v6.16-rc1) Bandhan Pramanik
  0 siblings, 1 reply; 21+ messages in thread
From: Bandhan Pramanik @ 2025-07-29 17:35 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Manivannan Sadhasivam, Jeff Johnson, linux-pci, linux-acpi,
	ath10k, linux-wireless, stable

Hello everyone,

I hope this email finds you all in good health.

This short email is a humble request to look at the logs I sent
previously and to see whether anything can be done with them.

Thanks,

Bandhan Pramanik

বন্ধন প্রামাণিক


* [PATCH TEST] ath10k: Testing Mani's ASPM patch (QCA9377, v6.16-rc1)
  2025-07-29 17:35                                 ` Bandhan Pramanik
@ 2025-08-17  9:38                                   ` Bandhan Pramanik
  0 siblings, 0 replies; 21+ messages in thread
From: Bandhan Pramanik @ 2025-08-17  9:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Manivannan Sadhasivam, Jeff Johnson, linux-pci, linux-acpi,
	ath10k, linux-wireless, stable

Hello everyone,

I was actually trying to build a kernel for someone else using build 
automation scripts. They still didn't compile. (Here's the relevant 
thread I'm referring to: 
https://lore.kernel.org/ath10k/176B76BC-6801-4C3F-A774-9611B43ED4AF@gmail.com/T/#t) 

Regardless, I tested the patches by Mani 
(https://lore.kernel.org/r/20250716-ath-aspm-fix-v1-0-dd3e62c1b692@oss.qualcomm.com).

Here are the logs:

Without pcie_aspm=off (did not work): 
https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180/raw/84edf89740919e3cba2ac33e567110b14e2d3627/patch-dmesg

With pcie_aspm=off (worked, no issues observed): 
https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180/raw/84edf89740919e3cba2ac33e567110b14e2d3627/patch-dmesg-pcie_aspm_off
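
To confirm which of the two cases a given boot falls into, something
like the following sketch can check the kernel command line. The
function name has_boot_param is hypothetical; /proc/cmdline is the
standard location of the boot parameters.

```shell
#!/bin/sh
# Succeed if the given parameter appears on the kernel command line.
# $1 = parameter to look for, $2 = command-line file
# (defaults to /proc/cmdline).
has_boot_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put each parameter on its own line, then require an exact match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Usage:
#   has_boot_param pcie_aspm=off && echo "ASPM was disabled at boot"
```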

I'm not fully sure what the logs imply, but thanks: it kept working
even after two hours of playing YouTube videos (though I cannot be
certain it will keep working in the future). Earlier, in every kernel
I had tested, the Wi-Fi turned off within minutes because IRQ #16 was
being flooded. Now it works.
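
One way to check whether IRQ #16 is still being flooded is to sample
its counter in /proc/interrupts twice and compare. This is only a
sketch; the helper name irq_total is made up for illustration.

```shell
#!/bin/sh
# Print the interrupt count for one IRQ, summed over all CPUs.
# $1 = IRQ number, $2 = interrupts file (defaults to /proc/interrupts).
irq_total() {
    irq=$1
    file=${2:-/proc/interrupts}
    awk -v irq="$irq:" '$1 == irq {
        total = 0
        # Per-CPU counters follow the IRQ label; stop at the first
        # non-numeric field (the interrupt controller name).
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        print total
    }' "$file"
}

# Usage: interrupts received on IRQ 16 over ten seconds.
#   before=$(irq_total 16); sleep 10; after=$(irq_total 16)
#   echo "$((after - before)) interrupts in 10 s"
```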

The second log does show some messages from a point where it took a
while to connect to Wi-Fi, but I think it is mostly fine.

Thanks to those who prepared the patches and to those who bore with
me. There is one more request I still need to make.

If anyone has built patched kernels for Ubuntu and packaged them as
DEB files, I would appreciate any guidance or pointers on how to do
the same, as my attempts to use "make deb-pkg" have failed:
https://github.com/BandhanPramanik/ath10k-patched-kernel-ppa/actions/runs/16975573959/job/48123563881
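
For reference, a minimal sketch of the commonly used alternative,
"make bindeb-pkg", which builds only the binary packages. It assumes
the running kernel's config is available under /boot and that the
usual build dependencies are installed; the function names are
hypothetical, and this has not been checked against the failing run
linked above. Setting DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# Sketch only: build .deb packages from an already patched kernel tree.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# $1 = patched kernel source tree (placeholder path).
build_debs() {
    tree=$1
    cd "$tree" || return 1
    # Start from the running distro kernel's configuration
    # (assumed to be installed as /boot/config-<release>).
    run cp "/boot/config-$(uname -r)" .config || return 1
    # Accept the defaults for options that are new in this tree.
    run make olddefconfig || return 1
    # Build binary packages only; the .deb files are normally written
    # to the parent directory of the tree.
    run make -j"$(nproc)" bindeb-pkg
}
```

Example: DRY_RUN=1 build_debs ~/linux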


Bandhan Pramanik
বন্ধন প্রামাণিক


end of thread, other threads:[~2025-08-17  9:38 UTC | newest]

Thread overview: 21+ messages
2025-06-25 19:17 Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567 Bandhan Pramanik
2025-06-25 20:20 ` Bjorn Helgaas
2025-06-25 22:50   ` Bandhan Pramanik
2025-06-26 17:53     ` Bandhan Pramanik
2025-06-26 23:21       ` Bandhan Pramanik
2025-07-04 19:30         ` Bandhan Pramanik
2025-07-05 13:50           ` Bjorn Helgaas
2025-07-05 15:00             ` Bandhan Pramanik
2025-07-05 19:58               ` Bjorn Helgaas
2025-07-06 23:01                 ` Bandhan Pramanik
2025-07-07  6:11                   ` Manivannan Sadhasivam
2025-07-09 17:30                     ` Bandhan Pramanik
2025-07-10 19:06                       ` Bandhan Pramanik
2025-07-11 12:15                         ` Bjorn Helgaas
2025-07-11 16:04                           ` Bandhan Pramanik
2025-07-11 16:36                             ` Bjorn Helgaas
2025-07-12  6:48                               ` Bandhan Pramanik
2025-07-29 17:35                                 ` Bandhan Pramanik
2025-08-17  9:38                                   ` [PATCH TEST] ath10k: Testing Mani's ASPM patch (QCA9377, v6.16-rc1) Bandhan Pramanik
2025-07-12 19:18 ` Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567 Askar Safin
2025-07-13 16:04   ` Bandhan Pramanik
