public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region
@ 2026-04-22  8:03 dayou5941
  2026-04-22 14:36 ` Niklas Cassel
  2026-04-30  0:59 ` kernel test robot
  0 siblings, 2 replies; 10+ messages in thread
From: dayou5941 @ 2026-04-22  8:03 UTC (permalink / raw)
  To: dlemoal, cassel; +Cc: linux-ide, linux-kernel, liyouhong

From: liyouhong <liyouhong@kylinos.cn>

Commit 18ee7c49f75b ("ata: ahci: Introduce firmware-specific caps
initialization") introduced a regression where the driver would
attempt to access port registers beyond the mapped MMIO region when
HOST_PORTS_IMPL and HOST_CAP contains invalid values.

This occurs in scenarios where:
1. The AHCI controller is disabled by BIOS, causing registers to read
   as 0xFFFFFFFF.
2. HOST_PORTS_IMPL indicates more ports than physically implemented.
3. The actual MMIO region is smaller than needed for the indicated
ports.

When the driver iterates through ports indicated in HOST_PORTS_IMPL and
accesses PORT_CMD registers, it may calculate port offsets that exceed
the mapped MMIO region. On ARM64 systems, this causes a kernel panic.
when attempting to read from the unmapped address.

When mmio_size is 0x1000 and all ports need to be accessed.
The issue manifests as a panic in __raw_readl() when i=30, as the
calculated offset (0x1000) equals the typical MMIO region size (0x1000),
placing the access exactly at the boundary of the mapped region.

The log is as follows:
[    0.389751][    T1] Unable to handle kernel paging request at virtual address ffff800082916018
[    0.389752][    T1] Mem abort info:
[    0.389753][    T1]   ESR = 0x0000000096000007
[    0.389754][    T1]   EC = 0x25: D
ABT (current EL), IL = 32 bits
[    0.389756][    T1]   SET = 0, FnV = 0
[    0.389757][    T1]   EA = 0, S1PTW = 0
[    0.389758][    T1]   FSC = 0x07: level 3 translation fault
[    0.389759][    T1] Data abort info:
[    0.389760][    T1]   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
[    0.389761][    T1]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[    0.389762][    T1] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    0.389764][    T1] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000e16b1000
[    0.389766][    T1] [ffff800082916018] pgd=10000000e264d003, p4d=10000000e264d003, pud=10000000e264e003, pmd=10000000e2653003, pte=0000000000000000
[    0.389772][    T1] Internal error: Oops: 0000000096000007 [#1]  SMP
[    0.391912][    T1] Modules linked in:
[    0.397407][   T71] nvme nvme0: allocated 16 MiB host memory buffer (1 segment).
[    0.406067][   T71] nvme nvme0: 8/0/0 default/read/poll queues
[    0.426950][   T12]  nvme0n1: p1 p2 p3 p4 p5 p6
[    0.427171][   T71] probe of 0000:01:00.0 returned 0 after 37797 usecs
[    0.428770][   T80] Freeing initrd memory: 39012K
[   12.823718][    T1] CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-dirty #3 PREEMPTLAZY 
[   12.832490][    T1] Hardware name: ISOFTSTONE COMPUTER TianYaoW600 Series/EM_FC19M, BIOS KL4.2B.EM.D.R003.RTHF.260414.R 04/14/2026 16:26:14
[   12.844992][    T1] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   12.852635][    T1] pc : ahci_init_one+0x6a8/0xf68
[   12.857417][    T1] lr : ahci_init_one+0x668/0xf68
[   12.862194][    T1] sp : ffff8000828aba20
[   12.866189][    T1] x29: ffff8000828aba20 x28: 000000000000001e x27: ffff800081742300
[   12.874007][    T1] x26: ffff0020289b8000 x25: 0000000000000001 x24: ffff002020f710d0
[   12.881823][    T1] x23: 0000000000000005 x22: ffff002026d9ee00 x21: ffff002026d9f280
[   12.889639][    T1] x20: ffff002020f71000 x19: 0000000000000000 x18: 00000000ffffffff
[   12.897456][    T1] x17: ffff002021c7c000 x16: ffff002020d6ee00 x15: ffff8000828ab850
[   12.905272][    T1] x14: ffff0020289ba42c x13: ffff0020289ba3f9 x12: 0101010101010101
[   12.913088][    T1] x11: 7f7f7f7f7f7f7f7f x10: 0000000000000078 x9 : ffff800080bdf1b0
[   12.920905][    T1] x8 : 0000000000000030 x7 : 0000000000000008 x6 : ffff800081271928
[   12.928721][    T1] x5 : 0000000000000030 x4 : 0000000000000030 x3 : 0000000000000000
[   12.936537][    T1] x2 : ffff002026d9f280 x1 : 0000000000001018 x0 : ffff800082916018
[   12.944354][    T1] Call trace:
[   12.947481][    T1]  ahci_init_one+0x6a8/0xf68 (P)
[   12.952260][    T1]  local_pci_probe+0x44/0xb0
[   12.956693][    T1]  pci_device_probe+0xd4/0x268
[   12.961298][    T1]  really_probe+0xc4/0x3e8
[   12.965557][    T1]  __driver_probe_device+0xd4/0x188
[   12.970595][    T1]  driver_probe_device+0x40/0x118
[   12.975460][    T1]  __driver_attach+0xe8/0x218
[   12.979977][    T1]  bus_for_each_dev+0x7c/0xe0
[   12.984494][    T1]  driver_attach+0x28/0x38
[   12.988751][    T1]  bus_add_driver+0x120/0x248
[   12.993269][    T1]  driver_register+0x60/0x128
[   12.997787][    T1]  __pci_register_driver+0x50/0x60
[   13.002739][    T1]  ahci_pci_driver_init+0x28/0x38
[   13.007605][    T1]  do_one_initcall+0x58/0x450
[   13.012124][    T1]  kernel_init_freeable+0x1fc/0x560
[   13.017164][    T1]  kernel_init+0x28/0x1f8
[   13.021337][    T1]  ret_from_fork+0x10/0x20
[   13.025596][    T1] Code: 53196021 91046021 f9400840 8b010000 (b9400000) 
[   13.032370][    T1] ---[ end trace 0000000000000000 ]---
[   13.037737][    T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[   13.046077][    T1] SMP: stopping secondary CPUs
[   13.050684][    T1] Kernel Offset: disabled
[   13.054852][    T1] CPU features: 0x0000000,000e0005,40230521,0401720b
[   13.061366][    T1] Memory Limit: none
[   13.065102][    T1] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Fix this by adding bounds checking in ahci_save_initial_config(). Before
accessing each port's PORT_CMD register, verify that the port's register
block offset is within the mapped MMIO region. Skip any ports that would
require accessing memory beyond the mapped region.

Fixes: 18ee7c49f75b ("ata: ahci: Introduce firmware-specific caps initialization")
Signed-off-by: liyouhong <liyouhong@kylinos.cn>
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index 2d1ca9f2f546..69de3560aeec 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -468,6 +468,8 @@ void ahci_save_initial_config(struct device *dev, struct ahci_host_priv *hpriv)
 	void __iomem *port_mmio;
 	unsigned long port_map;
 	u32 cap, cap2, vers;
+	unsigned long long mmio_size = 0;
+	bool is_pci_dev = false;
 	int i;
 
 	/* make sure AHCI mode is enabled before accessing CAP */
@@ -595,6 +597,13 @@ void ahci_save_initial_config(struct device *dev, struct ahci_host_priv *hpriv)
 		hpriv->saved_port_map = port_map;
 	}
 
+	is_pci_dev = dev_is_pci(dev);
+	if (is_pci_dev) {
+		struct pci_dev *pdev = to_pci_dev(dev);
+
+		mmio_size = (unsigned long long)pci_resource_len(pdev, 5);
+	}
+
 	/*
 	 * Preserve the ports capabilities defined by the platform. Note there
 	 * is no need in storing the rest of the P#.CMD fields since they are
@@ -605,6 +614,29 @@ void ahci_save_initial_config(struct device *dev, struct ahci_host_priv *hpriv)
 			continue;
 
 		port_mmio = __ahci_port_base(hpriv, i);
+
+		/* Calculate offset from MMIO base */
+		unsigned long long port_offset = (unsigned long long)port_mmio -
+						 (unsigned long long)mmio;
+		/* Check if port register block is within MMIO region */
+		if (is_pci_dev && port_offset >= mmio_size) {
+			/*
+			 * Port registers exceed MMIO region boundary.
+			 * Since ports are sequentially mapped (0x100 + i*0x80),
+			 * all subsequent ports will also exceed the boundary.
+			 *
+			 * Update port_map to exclude this and all higher ports,
+			 * then break out of the loop.
+			 */
+			dev_warn(dev, "Port %d (offset 0x%llx) exceeds MMIO region (0x%llx),
+				       truncating port map at port %d\n",
+				       i, port_offset, mmio_size, i-1);
+
+			port_map = (1UL << i) - 1;
+			hpriv->saved_port_map = port_map;
+			break;
+		}
+
 		hpriv->saved_port_cap[i] =
 			readl(port_mmio + PORT_CMD) & PORT_CMD_CAP;
 	}
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region
  2026-04-22  8:03 [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region dayou5941
@ 2026-04-22 14:36 ` Niklas Cassel
       [not found]   ` <55809835.8838.19db9be1205.Coremail.dayou5941@163.com>
  2026-04-30  0:59 ` kernel test robot
  1 sibling, 1 reply; 10+ messages in thread
From: Niklas Cassel @ 2026-04-22 14:36 UTC (permalink / raw)
  To: dayou5941; +Cc: dlemoal, linux-ide, linux-kernel, liyouhong

Hello liyouhong,

On Wed, Apr 22, 2026 at 04:03:22PM +0800, dayou5941@163.com wrote:
> From: liyouhong <liyouhong@kylinos.cn>
> 
> Commit 18ee7c49f75b ("ata: ahci: Introduce firmware-specific caps
> initialization") introduced a regression where the driver would
> attempt to access port registers beyond the mapped MMIO region when
> HOST_PORTS_IMPL and HOST_CAP contains invalid values.
> 
> This occurs in scenarios where:
> 1. The AHCI controller is disabled by BIOS, causing registers to read
>    as 0xFFFFFFFF.
> 2. HOST_PORTS_IMPL indicates more ports than physically implemented.
> 3. The actual MMIO region is smaller than needed for the indicated
> ports.

I'm not a fan of this patch for two reasons:

1) The AHCI specification, 1.3.1, "10.1.1 Firmware Specific Initialization":

Firmware shall always initialize the following registers and values:
 CAP.SSS (support for staggered spin-up)
 CAP.SMPS (support for mechanical presence switches)
 PI (ports implemented)
  ...

Thus, it is a spec violation to not initialize these registers correctly.

If you have an AHCI controller that is a PCIe card, it will surely expose
the Ports Implemented register correctly.

So this problem should be limited to on-boad AHCI controllers, where
BIOS/platform firmware has violated the specification, and not initialized
the registers required by the specification.

If you can disable the controller via BIOS, then the most logical thing is
for these on-boards AHCI controllers to actually get disabled and not show
up in lspci at all. As far as I know, that is how it works on Intel and
AMD systems.


2) Why limit this to AHCI controllers connected via PCI?

Surely device tree platforms with a AHCI controller connected directly to
the AXI bus can also violate the specification and not initialize the PI
(Ports Implemented) register.

Sure, for AHCI controllers connected without a PCI bus, there is the
"ports-implemented" device tree property, that can be used to initialize
this register correctly for platforms without any embedded firmware.

But if we should implement a workaround, why add code that is explcitly
limited to PCI?




I think a better solution is to limit Ports Implemented to CAP.NP.
As Ports Implemented is a bitmask with up to CAP.NP bits.

Ports Implemented is not allowed to be larger than CAP.NP, it can
only mark ports within the CAP.NP MMIO region as "unimplemented".

So the highest bit set in Ports Implemented can not be higher than CAP.NP.
(I.e. a per port MMIO offset is not allowed to be larger than
CAP.NP * MMIO registers per port.)


Looking at the libata code, it looks like we have code for this already:
https://github.com/torvalds/linux/blob/v7.0/drivers/ata/libahci.c#L561-L569

Could you perhaps debug to see why this code in no longer working as
intended?

If you revert 18ee7c49f75b ("ata: ahci: Introduce firmware-specific caps
initialization"), do you see the
"implemented port map (0x%lx) contains more ports than nr_ports (%u), using nr_ports"
warning in the log?

Please share the equivalent to these prints after reverting the commit:
[1825468.337982] ahci 0000:00:17.0: AHCI vers 0001.0301, 32 command slots, 6 Gbps, SATA mode
[1825468.350108] ahci 0000:00:17.0: 8/8 ports implemented (port mask 0xff)


> When mmio_size is 0x1000 and all ports need to be accessed.
> The issue manifests as a panic in __raw_readl() when i=30, as the
> calculated offset (0x1000) equals the typical MMIO region size (0x1000),
> placing the access exactly at the boundary of the mapped region.

AFAICT, port_map gets limited to CAP.NP, and we only access ports that are
within port_map (and this offset < CAP.NP):

	for_each_set_bit(i, &port_map, AHCI_MAX_PORTS) {
		if (hpriv->saved_port_cap[i])
			continue;

		port_mmio = __ahci_port_base(hpriv, i);



Wait a second:

> 3. The actual MMIO region is smaller than needed for the indicated
> ports.

I hope you are not saying that your PCI device is only exposing a BAR
that is smaller than needed to contain the number of ports defined in
CAP.NP in this controller?

That sounds like a seriously buggy controller...

Could you please share the output of (with the PCI BDF of you controller
of course):

$ sudo lspci -vvvnns 0000:00:17.0
00:17.0 SATA controller [0106]: Intel Corporation C620 Series Chipset Family SATA Controller [AHCI mode] [8086:a182] (rev 0a) (prog-if 01 [AHCI 1.0])
        Subsystem: Super Micro Computer Inc Device [15d9:1b62]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 138
        NUMA node: 0
        IOMMU group: 28
        Region 0: Memory at a5984000 (32-bit, non-prefetchable) [size=8K]
        Region 1: Memory at a598a000 (32-bit, non-prefetchable) [size=256]
        Region 2: I/O ports at 4050 [size=8]
        Region 3: I/O ports at 4040 [size=4]
        Region 4: I/O ports at 4000 [size=32]
        Region 5: Memory at a5700000 (32-bit, non-prefetchable) [size=512K]

So we can see PCI vendor ID, PCI device ID, and size of the BARs?

From 3.3 Port Registers (one set per port):

Port offset = 100h + (PI Asserted Bit Position * 80h)

CAP.NP can represent at most 32 ports. (Note that the register field itself
is 0's based, so a value of 0 means 1 port and a value of 31 means 32 ports.)

If you really have 32 ports, then you would need a BAR that has size:
0x100+(31*0x80) + 0x80 (start offset + size of the per port register itself): 0x1100

Then we would need to add a quirk for your PCI device and PCI vendor ID,
which limits the number of ports to 30 (by setting CAP.NP to 29) for your
AHCI controller.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region
       [not found]   ` <55809835.8838.19db9be1205.Coremail.dayou5941@163.com>
@ 2026-04-23 17:19     ` Niklas Cassel
  2026-04-24  2:43       ` Damien Le Moal
       [not found]       ` <26f59adf.571d.19dbe310f41.Coremail.dayou5941@163.com>
  0 siblings, 2 replies; 10+ messages in thread
From: Niklas Cassel @ 2026-04-23 17:19 UTC (permalink / raw)
  To: 李佑鸿; +Cc: dlemoal, linux-ide, linux-kernel, liyouhong

Hello liyouhong,

On Thu, Apr 23, 2026 at 05:48:54PM +0800, 李佑鸿  wrote:
> However, I have already confirmed with the BIOS supplier that when the 
> BIOS disables all SATA ports, it does indeed initialize the values of 
> the HOST_CAP and HOSTPorts_IMPL registers in accordance with the specifications.
> 
> 
> /* Register values after disabling SATA in BIOS */
> HOST_CAP (0x00) = 0xffffffff
> HOST_PORTS_IMPL (0x0c) = 0xffffffff
> HOST_VERSION (0x10) = 0xffffffff
> MMIO_SIZE = 4096

I am actually very surprised to see e.g. CAP (0x00) and AHCI VERSION (0x10)
being uninitialized.

These registers are not mentioned in:
AHCI 1.3.1 - 10.1.1 Firmware Specific Initialization

And I would have expected them to have fixed value regardless if BIOS
has enabled the controller or not, because e.g. CAP.NP (number of ports)
and the AHCI version must obviously be known when synthesizing the IP.


> lspci -vvvnns 05:00.0
> 05:00.0 SATA controller [0106]: Phytium Technology Co., Ltd. Device [1db7:d001] (rev 01) (prog-if 01 [AHCI 1.0])
> Region 5: Memory at 5b800000 (32-bit, non-prefetchable) [size=4K]

So the BAR size is 4K (when the controller is disabled).

Could you please enable the controller in BIOS, and then dump the value of
the CAP (0x00) register.

And also double check that the BAR size is still 4K when the controller is
enabled?

We could then add a quirk to drivers/ata/ahci.c, something like:

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 1d73a53370cf..e7250781e9b7 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -2003,6 +2003,10 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
        if (pdev->vendor == 0x177d && pdev->device == 0xa01c)
                hpriv->irq_handler = ahci_thunderx_irq_handler;
+
+       /* Phytium SATA controller has empty CAP register */
+       if (pdev->vendor == 0x1db7 && pdev->device == 0xd001)
+               hpriv->saved_cap = 0xC734FF02;
 #endif
 
        /* save initial config */


Where you replace 0xC734FF02 with whatever you get when reading the
CAP (0x00) register when the controller is enabled.


Kind regards,
Niklas

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region
  2026-04-23 17:19     ` Niklas Cassel
@ 2026-04-24  2:43       ` Damien Le Moal
       [not found]         ` <13d7d471.6389.19dbe594e14.Coremail.dayou5941@163.com>
       [not found]       ` <26f59adf.571d.19dbe310f41.Coremail.dayou5941@163.com>
  1 sibling, 1 reply; 10+ messages in thread
From: Damien Le Moal @ 2026-04-24  2:43 UTC (permalink / raw)
  To: Niklas Cassel, 李佑鸿
  Cc: linux-ide, linux-kernel, liyouhong

On 4/24/26 02:19, Niklas Cassel wrote:
> Hello liyouhong,
> 
> On Thu, Apr 23, 2026 at 05:48:54PM +0800, 李佑鸿  wrote:
>> However, I have already confirmed with the BIOS supplier that when the 
>> BIOS disables all SATA ports, it does indeed initialize the values of 
>> the HOST_CAP and HOSTPorts_IMPL registers in accordance with the specifications.
>>
>>
>> /* Register values after disabling SATA in BIOS */
>> HOST_CAP (0x00) = 0xffffffff
>> HOST_PORTS_IMPL (0x0c) = 0xffffffff
>> HOST_VERSION (0x10) = 0xffffffff
>> MMIO_SIZE = 4096
> 
> I am actually very surprised to see e.g. CAP (0x00) and AHCI VERSION (0x10)
> being uninitialized.

What I am surprised of here is that we even see that device on the PCI bus at
all when it is disabled in the BIOs. If that device is disabled, why are we even
seeing it by scanning the PCI ports ? The adapter should simply not be visible
at all.

Trying to debug register values when we should not even be seeing the device in
the first place does not make sense to me. If anything, I would take a really
big hammer here and try to completely ignore that adapter if we can somehow
detect that it has been in fact disabled in the BIOS. But that detection may be
challenging to do since it seems we are dealing with a very buggy BIOS.

So maybe we should simply warn and exit probe early if we see a PCI BAR size
that is broken.

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Re: Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region
       [not found]       ` <26f59adf.571d.19dbe310f41.Coremail.dayou5941@163.com>
@ 2026-04-24 11:07         ` Niklas Cassel
  0 siblings, 0 replies; 10+ messages in thread
From: Niklas Cassel @ 2026-04-24 11:07 UTC (permalink / raw)
  To: 李佑鸿; +Cc: dlemoal, linux-ide, linux-kernel, liyouhong

On Fri, Apr 24, 2026 at 02:32:59PM +0800, 李佑鸿  wrote:
> However, in the following code:
> cap = readl(mmio + HOST_CAP); /* When cap = 0xFFFFFFFF */
> if (hpriv->saved_cap)
>     cap = (cap & ~(HOST_CAP_SSS | HOST_CAP_MPS)) | hpriv->saved_cap; /* When hpriv->saved_cap = 0xEF36FF81 */

Sorry for that. From the name hpriv->saved_cap, I assumed that it would
actually override hpriv->cap, but it seems that it only allows you to
set additional caps, not to override/clear caps that are broken...

I guess we could theoretically do:

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 1d73a53370cf..94c3c740134a 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -2003,6 +2003,10 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
        if (pdev->vendor == 0x177d && pdev->device == 0xa01c)
                hpriv->irq_handler = ahci_thunderx_irq_handler;
+
+       /* Phytium SATA controller has bogus CAP register */
+       if (pdev->vendor == 0x1db7 && pdev->device == 0xd001)
+               writel(0xEF36FF81, hpriv->mmio + HOST_CAP);
 #endif
 
        /* save initial config */



However, that said, I agree with Damien, I think a better solution is to
fail the probe(). Will reply to that mail with more details.


Kind regards,
Niklas

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region
       [not found]         ` <13d7d471.6389.19dbe594e14.Coremail.dayou5941@163.com>
@ 2026-04-24 11:07           ` Niklas Cassel
  2026-04-24 11:15             ` Damien Le Moal
  2026-04-25  6:15             ` Re:Re: " 李佑鸿 
  0 siblings, 2 replies; 10+ messages in thread
From: Niklas Cassel @ 2026-04-24 11:07 UTC (permalink / raw)
  To: 李佑鸿
  Cc: Damien Le Moal, linux-ide, linux-kernel, liyouhong

On Fri, Apr 24, 2026 at 03:16:56PM +0800, 李佑鸿  wrote:
> At 2026-04-24 10:43:11, "Damien Le Moal" <dlemoal@kernel.org> wrote:
> >
> >What I am surprised of here is that we even see that device on the PCI bus at
> >all when it is disabled in the BIOs. If that device is disabled, why are we even
> >seeing it by scanning the PCI ports ? The adapter should simply not be visible
> >at all.

I agree.
E.g. both AMD and Intel make sure that the AHCI controller PCI device does
not show up on the PCI bus if you disable it using the BIOS.

It would have been nice if the Phytium BIOS also worked like that.


> 3. **BAR Size Mismatch**: `CAP.NP` indicates more ports than physically fit in the BAR
> - If CAP claims 32 ports but BAR is only 4KB, this is physically impossible
> - 32 ports require at least 0x1100 bytes (0x100 + 32 * 0x80)

The reason why I did not suggest failing the probe() originally, was because
李佑鸿 sent a patch with a Fixes tag, so I assumed that he considered it a
regression, and wanted the hardware to continue working, even though it was
marked as disabled in BIOS, because that is how it was before the commit in
the Fixes tag was introduced.

That said, I fully agree that I think it is better to modify the AHCI driver
to fail the probe() for this AHCI controller when it has not been properly
initialized (because it is marked as disabled in BIOS).

I prefer option 3.
Look at CAP.NP and look at the BAR size, if it is too small, just fail the
probe().


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region
  2026-04-24 11:07           ` Niklas Cassel
@ 2026-04-24 11:15             ` Damien Le Moal
  2026-04-25  6:15             ` Re:Re: " 李佑鸿 
  1 sibling, 0 replies; 10+ messages in thread
From: Damien Le Moal @ 2026-04-24 11:15 UTC (permalink / raw)
  To: Niklas Cassel, 李佑鸿
  Cc: linux-ide, linux-kernel, liyouhong

On 4/24/26 20:07, Niklas Cassel wrote:
> On Fri, Apr 24, 2026 at 03:16:56PM +0800, 李佑鸿  wrote:
>> At 2026-04-24 10:43:11, "Damien Le Moal" <dlemoal@kernel.org> wrote:
>>>
>>> What I am surprised of here is that we even see that device on the PCI bus at
>>> all when it is disabled in the BIOs. If that device is disabled, why are we even
>>> seeing it by scanning the PCI ports ? The adapter should simply not be visible
>>> at all.
> 
> I agree.
> E.g. both AMD and Intel make sure that the AHCI controller PCI device does
> not show up on the PCI bus if you disable it using the BIOS.
> 
> It would have been nice if the Phytium BIOS also worked like that.
> 
> 
>> 3. **BAR Size Mismatch**: `CAP.NP` indicates more ports than physically fit in the BAR
>> - If CAP claims 32 ports but BAR is only 4KB, this is physically impossible
>> - 32 ports require at least 0x1100 bytes (0x100 + 32 * 0x80)
> 
> The reason why I did not suggest failing the probe() originally, was because
> 李佑鸿 sent a patch with a Fixes tag, so I assumed that he considered it a
> regression, and wanted the hardware to continue working, even though it was
> marked as disabled in BIOS, because that is how it was before the commit in
> the Fixes tag was introduced.
> 
> That said, I fully agree that I think it is better to modify the AHCI driver
> to fail the probe() for this AHCI controller when it has not been properly
> initialized (because it is marked as disabled in BIOS).
> 
> I prefer option 3.
> Look at CAP.NP and look at the BAR size, if it is too small, just fail the
> probe().

Sounds good (and a lot safer) to me.


> 
> 
> Kind regards,
> Niklas


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re:Re: Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region
  2026-04-24 11:07           ` Niklas Cassel
  2026-04-24 11:15             ` Damien Le Moal
@ 2026-04-25  6:15             ` 李佑鸿 
  1 sibling, 0 replies; 10+ messages in thread
From: 李佑鸿  @ 2026-04-25  6:15 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: Damien Le Moal, linux-ide, linux-kernel, liyouhong







Hi Niklas, Damien,



At 2026-04-24 19:07:33, "Niklas Cassel" <cassel@kernel.org> wrote:
>On Fri, Apr 24, 2026 at 03:16:56PM +0800, 李佑鸿  wrote:
>> At 2026-04-24 10:43:11, "Damien Le Moal" <dlemoal@kernel.org> wrote:
>> >
>> >What I am surprised of here is that we even see that device on the PCI bus at
>> >all when it is disabled in the BIOs. If that device is disabled, why are we even
>> >seeing it by scanning the PCI ports ? The adapter should simply not be visible
>> >at all.
>
>I agree.
>E.g. both AMD and Intel make sure that the AHCI controller PCI device does
>not show up on the PCI bus if you disable it using the BIOS.
>
>It would have been nice if the Phytium BIOS also worked like that.
>
>
>> 3. **BAR Size Mismatch**: `CAP.NP` indicates more ports than physically fit in the BAR
>> - If CAP claims 32 ports but BAR is only 4KB, this is physically impossible
>> - 32 ports require at least 0x1100 bytes (0x100 + 32 * 0x80)
>
>The reason why I did not suggest failing the probe() originally, was because
>李佑鸿 sent a patch with a Fixes tag, so I assumed that he considered it a
>regression, and wanted the hardware to continue working, even though it was
>marked as disabled in BIOS, because that is how it was before the commit in
>the Fixes tag was introduced.
>
>That said, I fully agree that I think it is better to modify the AHCI driver
>to fail the probe() for this AHCI controller when it has not been properly
>initialized (because it is marked as disabled in BIOS).
>
>I prefer option 3.
>Look at CAP.NP and look at the BAR size, if it is too small, just fail the

>probe().



Thank you for the clear direction. I agree that checking BAR size against
CAP.NP is the cleanest solution.


I will prepare a v2 patch implementing this check:


1. In ahci_init_one(), after mapping MMIO but before any port access.
2. Calculate: required_size = 0x100 + (CAP.NP * 0x80).
3. Compare with pci_resource_len(pdev, 5).
4. If required_size > BAR size, fail probe with -ENODEV and clear error.


The patch will include:
- A new ahci_validate_bar_size() helper function
- Clear error message: "AHCI: BAR5 too small for %u ports..."


I'll send the v2 patch shortly.


Best regards,
liyouhong>
>
>Kind regards,

>Niklas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region
  2026-04-22  8:03 [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region dayou5941
  2026-04-22 14:36 ` Niklas Cassel
@ 2026-04-30  0:59 ` kernel test robot
  2026-04-30 12:48   ` Niklas Cassel
  1 sibling, 1 reply; 10+ messages in thread
From: kernel test robot @ 2026-04-30  0:59 UTC (permalink / raw)
  To: dayou5941, dlemoal, cassel
  Cc: llvm, oe-kbuild-all, linux-ide, linux-kernel, liyouhong

Hi,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v7.1-rc1 next-20260429]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/dayou5941-163-com/ata-libahci-fix-panic-when-accessing-ports-beyond-MMIO-region/20260422-192119
base:   linus/master
patch link:    https://lore.kernel.org/r/20260422080322.1006592-1-dayou5941%40163.com
patch subject: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region
config: arm64-randconfig-003-20260430 (https://download.01.org/0day-ci/archive/20260430/202604300815.6kpEidbJ-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project 5bac06718f502014fade905512f1d26d578a18f3)
rustc: rustc 1.88.0 (6b00bc388 2025-06-23)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260430/202604300815.6kpEidbJ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604300815.6kpEidbJ-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/ata/libahci.c:614:18: warning: missing terminating '"' character [-Winvalid-pp-token]
     614 |                         dev_warn(dev, "Port %d (offset 0x%llx) exceeds MMIO region (0x%llx),
         |                                       ^
   drivers/ata/libahci.c:615:44: warning: missing terminating '"' character [-Winvalid-pp-token]
     615 |                                        truncating port map at port %d\n",
         |                                                                        ^
>> drivers/ata/libahci.c:614:18: error: expected expression
     614 |                         dev_warn(dev, "Port %d (offset 0x%llx) exceeds MMIO region (0x%llx),
         |                                       ^
   2 warnings and 1 error generated.


vim +614 drivers/ata/libahci.c

   428	
   429	/**
   430	 *	ahci_save_initial_config - Save and fixup initial config values
   431	 *	@dev: target AHCI device
   432	 *	@hpriv: host private area to store config values
   433	 *
   434	 *	Some registers containing configuration info might be setup by
   435	 *	BIOS and might be cleared on reset.  This function saves the
   436	 *	initial values of those registers into @hpriv such that they
   437	 *	can be restored after controller reset.
   438	 *
   439	 *	If inconsistent, config values are fixed up by this function.
   440	 *
   441	 *	If it is not set already this function sets hpriv->start_engine to
   442	 *	ahci_start_engine.
   443	 *
   444	 *	LOCKING:
   445	 *	None.
   446	 */
   447	void ahci_save_initial_config(struct device *dev, struct ahci_host_priv *hpriv)
   448	{
   449		void __iomem *mmio = hpriv->mmio;
   450		void __iomem *port_mmio;
   451		unsigned long port_map;
   452		u32 cap, cap2, vers;
   453		unsigned long long mmio_size = 0;
   454		bool is_pci_dev = false;
   455		int i;
   456	
   457		/* make sure AHCI mode is enabled before accessing CAP */
   458		ahci_enable_ahci(mmio);
   459	
   460		/*
   461		 * Values prefixed with saved_ are written back to the HBA and ports
   462		 * registers after reset. Values without are used for driver operation.
   463		 */
   464	
   465		/*
   466		 * Override HW-init HBA capability fields with the platform-specific
   467		 * values. The rest of the HBA capabilities are defined as Read-only
   468		 * and can't be modified in CSR anyway.
   469		 */
   470		cap = readl(mmio + HOST_CAP);
   471		if (hpriv->saved_cap)
   472			cap = (cap & ~(HOST_CAP_SSS | HOST_CAP_MPS)) | hpriv->saved_cap;
   473		hpriv->saved_cap = cap;
   474	
   475		/* CAP2 register is only defined for AHCI 1.2 and later */
   476		vers = readl(mmio + HOST_VERSION);
   477		if ((vers >> 16) > 1 ||
   478		   ((vers >> 16) == 1 && (vers & 0xFFFF) >= 0x200))
   479			hpriv->saved_cap2 = cap2 = readl(mmio + HOST_CAP2);
   480		else
   481			hpriv->saved_cap2 = cap2 = 0;
   482	
   483		/* some chips have errata preventing 64bit use */
   484		if ((cap & HOST_CAP_64) && (hpriv->flags & AHCI_HFLAG_32BIT_ONLY)) {
   485			dev_info(dev, "controller can't do 64bit DMA, forcing 32bit\n");
   486			cap &= ~HOST_CAP_64;
   487		}
   488	
   489		if ((cap & HOST_CAP_NCQ) && (hpriv->flags & AHCI_HFLAG_NO_NCQ)) {
   490			dev_info(dev, "controller can't do NCQ, turning off CAP_NCQ\n");
   491			cap &= ~HOST_CAP_NCQ;
   492		}
   493	
   494		if (!(cap & HOST_CAP_NCQ) && (hpriv->flags & AHCI_HFLAG_YES_NCQ)) {
   495			dev_info(dev, "controller can do NCQ, turning on CAP_NCQ\n");
   496			cap |= HOST_CAP_NCQ;
   497		}
   498	
   499		if ((cap & HOST_CAP_PMP) && (hpriv->flags & AHCI_HFLAG_NO_PMP)) {
   500			dev_info(dev, "controller can't do PMP, turning off CAP_PMP\n");
   501			cap &= ~HOST_CAP_PMP;
   502		}
   503	
   504		if ((cap & HOST_CAP_SNTF) && (hpriv->flags & AHCI_HFLAG_NO_SNTF)) {
   505			dev_info(dev,
   506				 "controller can't do SNTF, turning off CAP_SNTF\n");
   507			cap &= ~HOST_CAP_SNTF;
   508		}
   509	
   510		if ((cap2 & HOST_CAP2_SDS) && (hpriv->flags & AHCI_HFLAG_NO_DEVSLP)) {
   511			dev_info(dev,
   512				 "controller can't do DEVSLP, turning off\n");
   513			cap2 &= ~HOST_CAP2_SDS;
   514			cap2 &= ~HOST_CAP2_SADM;
   515		}
   516	
   517		if (!(cap & HOST_CAP_FBS) && (hpriv->flags & AHCI_HFLAG_YES_FBS)) {
   518			dev_info(dev, "controller can do FBS, turning on CAP_FBS\n");
   519			cap |= HOST_CAP_FBS;
   520		}
   521	
   522		if ((cap & HOST_CAP_FBS) && (hpriv->flags & AHCI_HFLAG_NO_FBS)) {
   523			dev_info(dev, "controller can't do FBS, turning off CAP_FBS\n");
   524			cap &= ~HOST_CAP_FBS;
   525		}
   526	
   527		if (!(cap & HOST_CAP_ALPM) && (hpriv->flags & AHCI_HFLAG_YES_ALPM)) {
   528			dev_info(dev, "controller can do ALPM, turning on CAP_ALPM\n");
   529			cap |= HOST_CAP_ALPM;
   530		}
   531	
   532		if ((cap & HOST_CAP_SXS) && (hpriv->flags & AHCI_HFLAG_NO_SXS)) {
   533			dev_info(dev, "controller does not support SXS, disabling CAP_SXS\n");
   534			cap &= ~HOST_CAP_SXS;
   535		}
   536	
   537		/* Override the HBA ports mapping if the platform needs it */
   538		port_map = readl(mmio + HOST_PORTS_IMPL);
   539		if (hpriv->saved_port_map && port_map != hpriv->saved_port_map) {
   540			dev_info(dev, "forcing port_map 0x%lx -> 0x%x\n",
   541				 port_map, hpriv->saved_port_map);
   542			port_map = hpriv->saved_port_map;
   543		} else {
   544			hpriv->saved_port_map = port_map;
   545		}
   546	
   547		/* mask_port_map not set means that all ports are available */
   548		if (hpriv->mask_port_map) {
   549			dev_warn(dev, "masking port_map 0x%lx -> 0x%lx\n",
   550				port_map,
   551				port_map & hpriv->mask_port_map);
   552			port_map &= hpriv->mask_port_map;
   553		}
   554	
   555		/* cross check port_map and cap.n_ports */
   556		if (port_map) {
   557			int map_ports = 0;
   558	
   559			for (i = 0; i < AHCI_MAX_PORTS; i++)
   560				if (port_map & (1 << i))
   561					map_ports++;
   562	
   563			/* If PI has more ports than n_ports, whine, clear
   564			 * port_map and let it be generated from n_ports.
   565			 */
   566			if (map_ports > ahci_nr_ports(cap)) {
   567				dev_warn(dev,
   568					 "implemented port map (0x%lx) contains more ports than nr_ports (%u), using nr_ports\n",
   569					 port_map, ahci_nr_ports(cap));
   570				port_map = 0;
   571			}
   572		}
   573	
   574		/* fabricate port_map from cap.nr_ports for < AHCI 1.3 */
   575		if (!port_map && vers < 0x10300) {
   576			port_map = (1 << ahci_nr_ports(cap)) - 1;
   577			dev_warn(dev, "forcing PORTS_IMPL to 0x%lx\n", port_map);
   578	
   579			/* write the fixed up value to the PI register */
   580			hpriv->saved_port_map = port_map;
   581		}
   582	
   583		is_pci_dev = dev_is_pci(dev);
   584		if (is_pci_dev) {
   585			struct pci_dev *pdev = to_pci_dev(dev);
   586	
   587			mmio_size = (unsigned long long)pci_resource_len(pdev, 5);
   588		}
   589	
   590		/*
   591		 * Preserve the ports capabilities defined by the platform. Note there
   592		 * is no need in storing the rest of the P#.CMD fields since they are
   593		 * volatile.
   594		 */
   595		for_each_set_bit(i, &port_map, AHCI_MAX_PORTS) {
   596			if (hpriv->saved_port_cap[i])
   597				continue;
   598	
   599			port_mmio = __ahci_port_base(hpriv, i);
   600	
   601			/* Calculate offset from MMIO base */
   602			unsigned long long port_offset = (unsigned long long)port_mmio -
   603							 (unsigned long long)mmio;
   604			/* Check if port register block is within MMIO region */
   605			if (is_pci_dev && port_offset >= mmio_size) {
   606				/*
   607				 * Port registers exceed MMIO region boundary.
   608				 * Since ports are sequentially mapped (0x100 + i*0x80),
   609				 * all subsequent ports will also exceed the boundary.
   610				 *
   611				 * Update port_map to exclude this and all higher ports,
   612				 * then break out of the loop.
   613				 */
 > 614				dev_warn(dev, "Port %d (offset 0x%llx) exceeds MMIO region (0x%llx),
   615					       truncating port map at port %d\n",
   616					       i, port_offset, mmio_size, i-1);
   617	
   618				port_map = (1UL << i) - 1;
   619				hpriv->saved_port_map = port_map;
   620				break;
   621			}
   622	
   623			hpriv->saved_port_cap[i] =
   624				readl(port_mmio + PORT_CMD) & PORT_CMD_CAP;
   625		}
   626	
   627		/* record values to use during operation */
   628		hpriv->cap = cap;
   629		hpriv->cap2 = cap2;
   630		hpriv->version = vers;
   631		hpriv->port_map = port_map;
   632	
   633		if (!hpriv->start_engine)
   634			hpriv->start_engine = ahci_start_engine;
   635	
   636		if (!hpriv->stop_engine)
   637			hpriv->stop_engine = ahci_stop_engine;
   638	
   639		if (!hpriv->irq_handler)
   640			hpriv->irq_handler = ahci_single_level_irq_intr;
   641	}
   642	EXPORT_SYMBOL_GPL(ahci_save_initial_config);
   643	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region
  2026-04-30  0:59 ` kernel test robot
@ 2026-04-30 12:48   ` Niklas Cassel
  0 siblings, 0 replies; 10+ messages in thread
From: Niklas Cassel @ 2026-04-30 12:48 UTC (permalink / raw)
  To: kernel test robot
  Cc: dayou5941, dlemoal, llvm, oe-kbuild-all, linux-ide, linux-kernel,
	liyouhong

On Thu, Apr 30, 2026 at 08:59:57AM +0800, kernel test robot wrote:
> Hi,
> 
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on linus/master]
> [also build test ERROR on v7.1-rc1 next-20260429]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/dayou5941-163-com/ata-libahci-fix-panic-when-accessing-ports-beyond-MMIO-region/20260422-192119
> base:   linus/master
> patch link:    https://lore.kernel.org/r/20260422080322.1006592-1-dayou5941%40163.com
> patch subject: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region
> config: arm64-randconfig-003-20260430 (https://download.01.org/0day-ci/archive/20260430/202604300815.6kpEidbJ-lkp@intel.com/config)
> compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project 5bac06718f502014fade905512f1d26d578a18f3)
> rustc: rustc 1.88.0 (6b00bc388 2025-06-23)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260430/202604300815.6kpEidbJ-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202604300815.6kpEidbJ-lkp@intel.com/
> 
> All errors (new ones prefixed by >>):
> 
>    drivers/ata/libahci.c:614:18: warning: missing terminating '"' character [-Winvalid-pp-token]
>      614 |                         dev_warn(dev, "Port %d (offset 0x%llx) exceeds MMIO region (0x%llx),
>          |                                       ^
>    drivers/ata/libahci.c:615:44: warning: missing terminating '"' character [-Winvalid-pp-token]
>      615 |                                        truncating port map at port %d\n",
>          |                                                                        ^
> >> drivers/ata/libahci.c:614:18: error: expected expression
>      614 |                         dev_warn(dev, "Port %d (offset 0x%llx) exceeds MMIO region (0x%llx),
>          |                                       ^
>    2 warnings and 1 error generated.

Thank you dear kernel test robot,

However, this report is for V1, we have applied V5 of this patch:
https://lore.kernel.org/linux-ide/177735961706.2183577.4033506737290437474.b4-ty@b4/T/#t

Which does not have this problem.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2026-04-30 12:48 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-22  8:03 [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region dayou5941
2026-04-22 14:36 ` Niklas Cassel
     [not found]   ` <55809835.8838.19db9be1205.Coremail.dayou5941@163.com>
2026-04-23 17:19     ` Niklas Cassel
2026-04-24  2:43       ` Damien Le Moal
     [not found]         ` <13d7d471.6389.19dbe594e14.Coremail.dayou5941@163.com>
2026-04-24 11:07           ` Niklas Cassel
2026-04-24 11:15             ` Damien Le Moal
2026-04-25  6:15             ` Re:Re: " 李佑鸿 
     [not found]       ` <26f59adf.571d.19dbe310f41.Coremail.dayou5941@163.com>
2026-04-24 11:07         ` Niklas Cassel
2026-04-30  0:59 ` kernel test robot
2026-04-30 12:48   ` Niklas Cassel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox