* [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region
@ 2026-04-22 8:03 dayou5941
2026-04-22 14:36 ` Niklas Cassel
2026-04-30 0:59 ` kernel test robot
0 siblings, 2 replies; 10+ messages in thread
From: dayou5941 @ 2026-04-22 8:03 UTC (permalink / raw)
To: dlemoal, cassel; +Cc: linux-ide, linux-kernel, liyouhong
From: liyouhong <liyouhong@kylinos.cn>
Commit 18ee7c49f75b ("ata: ahci: Introduce firmware-specific caps
initialization") introduced a regression where the driver would
attempt to access port registers beyond the mapped MMIO region when
HOST_PORTS_IMPL and HOST_CAP contains invalid values.
This occurs in scenarios where:
1. The AHCI controller is disabled by BIOS, causing registers to read
as 0xFFFFFFFF.
2. HOST_PORTS_IMPL indicates more ports than physically implemented.
3. The actual MMIO region is smaller than needed for the indicated
ports.
When the driver iterates through ports indicated in HOST_PORTS_IMPL and
accesses PORT_CMD registers, it may calculate port offsets that exceed
the mapped MMIO region. On ARM64 systems, this causes a kernel panic.
when attempting to read from the unmapped address.
When mmio_size is 0x1000 and all ports need to be accessed.
The issue manifests as a panic in __raw_readl() when i=30, as the
calculated offset (0x1000) equals the typical MMIO region size (0x1000),
placing the access exactly at the boundary of the mapped region.
The log is as follows:
[ 0.389751][ T1] Unable to handle kernel paging request at virtual address ffff800082916018
[ 0.389752][ T1] Mem abort info:
[ 0.389753][ T1] ESR = 0x0000000096000007
[ 0.389754][ T1] EC = 0x25: D
ABT (current EL), IL = 32 bits
[ 0.389756][ T1] SET = 0, FnV = 0
[ 0.389757][ T1] EA = 0, S1PTW = 0
[ 0.389758][ T1] FSC = 0x07: level 3 translation fault
[ 0.389759][ T1] Data abort info:
[ 0.389760][ T1] ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
[ 0.389761][ T1] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 0.389762][ T1] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 0.389764][ T1] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000e16b1000
[ 0.389766][ T1] [ffff800082916018] pgd=10000000e264d003, p4d=10000000e264d003, pud=10000000e264e003, pmd=10000000e2653003, pte=0000000000000000
[ 0.389772][ T1] Internal error: Oops: 0000000096000007 [#1] SMP
[ 0.391912][ T1] Modules linked in:
[ 0.397407][ T71] nvme nvme0: allocated 16 MiB host memory buffer (1 segment).
[ 0.406067][ T71] nvme nvme0: 8/0/0 default/read/poll queues
[ 0.426950][ T12] nvme0n1: p1 p2 p3 p4 p5 p6
[ 0.427171][ T71] probe of 0000:01:00.0 returned 0 after 37797 usecs
[ 0.428770][ T80] Freeing initrd memory: 39012K
[ 12.823718][ T1] CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-dirty #3 PREEMPTLAZY
[ 12.832490][ T1] Hardware name: ISOFTSTONE COMPUTER TianYaoW600 Series/EM_FC19M, BIOS KL4.2B.EM.D.R003.RTHF.260414.R 04/14/2026 16:26:14
[ 12.844992][ T1] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.852635][ T1] pc : ahci_init_one+0x6a8/0xf68
[ 12.857417][ T1] lr : ahci_init_one+0x668/0xf68
[ 12.862194][ T1] sp : ffff8000828aba20
[ 12.866189][ T1] x29: ffff8000828aba20 x28: 000000000000001e x27: ffff800081742300
[ 12.874007][ T1] x26: ffff0020289b8000 x25: 0000000000000001 x24: ffff002020f710d0
[ 12.881823][ T1] x23: 0000000000000005 x22: ffff002026d9ee00 x21: ffff002026d9f280
[ 12.889639][ T1] x20: ffff002020f71000 x19: 0000000000000000 x18: 00000000ffffffff
[ 12.897456][ T1] x17: ffff002021c7c000 x16: ffff002020d6ee00 x15: ffff8000828ab850
[ 12.905272][ T1] x14: ffff0020289ba42c x13: ffff0020289ba3f9 x12: 0101010101010101
[ 12.913088][ T1] x11: 7f7f7f7f7f7f7f7f x10: 0000000000000078 x9 : ffff800080bdf1b0
[ 12.920905][ T1] x8 : 0000000000000030 x7 : 0000000000000008 x6 : ffff800081271928
[ 12.928721][ T1] x5 : 0000000000000030 x4 : 0000000000000030 x3 : 0000000000000000
[ 12.936537][ T1] x2 : ffff002026d9f280 x1 : 0000000000001018 x0 : ffff800082916018
[ 12.944354][ T1] Call trace:
[ 12.947481][ T1] ahci_init_one+0x6a8/0xf68 (P)
[ 12.952260][ T1] local_pci_probe+0x44/0xb0
[ 12.956693][ T1] pci_device_probe+0xd4/0x268
[ 12.961298][ T1] really_probe+0xc4/0x3e8
[ 12.965557][ T1] __driver_probe_device+0xd4/0x188
[ 12.970595][ T1] driver_probe_device+0x40/0x118
[ 12.975460][ T1] __driver_attach+0xe8/0x218
[ 12.979977][ T1] bus_for_each_dev+0x7c/0xe0
[ 12.984494][ T1] driver_attach+0x28/0x38
[ 12.988751][ T1] bus_add_driver+0x120/0x248
[ 12.993269][ T1] driver_register+0x60/0x128
[ 12.997787][ T1] __pci_register_driver+0x50/0x60
[ 13.002739][ T1] ahci_pci_driver_init+0x28/0x38
[ 13.007605][ T1] do_one_initcall+0x58/0x450
[ 13.012124][ T1] kernel_init_freeable+0x1fc/0x560
[ 13.017164][ T1] kernel_init+0x28/0x1f8
[ 13.021337][ T1] ret_from_fork+0x10/0x20
[ 13.025596][ T1] Code: 53196021 91046021 f9400840 8b010000 (b9400000)
[ 13.032370][ T1] ---[ end trace 0000000000000000 ]---
[ 13.037737][ T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 13.046077][ T1] SMP: stopping secondary CPUs
[ 13.050684][ T1] Kernel Offset: disabled
[ 13.054852][ T1] CPU features: 0x0000000,000e0005,40230521,0401720b
[ 13.061366][ T1] Memory Limit: none
[ 13.065102][ T1] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
Fix this by adding bounds checking in ahci_save_initial_config(). Before
accessing each port's PORT_CMD register, verify that the port's register
block offset is within the mapped MMIO region. Skip any ports that would
require accessing memory beyond the mapped region.
Fixes: 18ee7c49f75b ("ata: ahci: Introduce firmware-specific caps initialization")
Signed-off-by: liyouhong <liyouhong@kylinos.cn>
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index 2d1ca9f2f546..69de3560aeec 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -468,6 +468,8 @@ void ahci_save_initial_config(struct device *dev, struct ahci_host_priv *hpriv)
void __iomem *port_mmio;
unsigned long port_map;
u32 cap, cap2, vers;
+ unsigned long long mmio_size = 0;
+ bool is_pci_dev = false;
int i;
/* make sure AHCI mode is enabled before accessing CAP */
@@ -595,6 +597,13 @@ void ahci_save_initial_config(struct device *dev, struct ahci_host_priv *hpriv)
hpriv->saved_port_map = port_map;
}
+ is_pci_dev = dev_is_pci(dev);
+ if (is_pci_dev) {
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ mmio_size = (unsigned long long)pci_resource_len(pdev, 5);
+ }
+
/*
* Preserve the ports capabilities defined by the platform. Note there
* is no need in storing the rest of the P#.CMD fields since they are
@@ -605,6 +614,29 @@ void ahci_save_initial_config(struct device *dev, struct ahci_host_priv *hpriv)
continue;
port_mmio = __ahci_port_base(hpriv, i);
+
+ /* Calculate offset from MMIO base */
+ unsigned long long port_offset = (unsigned long long)port_mmio -
+ (unsigned long long)mmio;
+ /* Check if port register block is within MMIO region */
+ if (is_pci_dev && port_offset >= mmio_size) {
+ /*
+ * Port registers exceed MMIO region boundary.
+ * Since ports are sequentially mapped (0x100 + i*0x80),
+ * all subsequent ports will also exceed the boundary.
+ *
+ * Update port_map to exclude this and all higher ports,
+ * then break out of the loop.
+ */
+ dev_warn(dev, "Port %d (offset 0x%llx) exceeds MMIO region (0x%llx),
+ truncating port map at port %d\n",
+ i, port_offset, mmio_size, i-1);
+
+ port_map = (1UL << i) - 1;
+ hpriv->saved_port_map = port_map;
+ break;
+ }
+
hpriv->saved_port_cap[i] =
readl(port_mmio + PORT_CMD) & PORT_CMD_CAP;
}
--
2.33.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region 2026-04-22 8:03 [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region dayou5941 @ 2026-04-22 14:36 ` Niklas Cassel [not found] ` <55809835.8838.19db9be1205.Coremail.dayou5941@163.com> 2026-04-30 0:59 ` kernel test robot 1 sibling, 1 reply; 10+ messages in thread From: Niklas Cassel @ 2026-04-22 14:36 UTC (permalink / raw) To: dayou5941; +Cc: dlemoal, linux-ide, linux-kernel, liyouhong Hello liyouhong, On Wed, Apr 22, 2026 at 04:03:22PM +0800, dayou5941@163.com wrote: > From: liyouhong <liyouhong@kylinos.cn> > > Commit 18ee7c49f75b ("ata: ahci: Introduce firmware-specific caps > initialization") introduced a regression where the driver would > attempt to access port registers beyond the mapped MMIO region when > HOST_PORTS_IMPL and HOST_CAP contains invalid values. > > This occurs in scenarios where: > 1. The AHCI controller is disabled by BIOS, causing registers to read > as 0xFFFFFFFF. > 2. HOST_PORTS_IMPL indicates more ports than physically implemented. > 3. The actual MMIO region is smaller than needed for the indicated > ports. I'm not a fan of this patch for two reasons: 1) The AHCI specification, 1.3.1, "10.1.1 Firmware Specific Initialization": Firmware shall always initialize the following registers and values: CAP.SSS (support for staggered spin-up) CAP.SMPS (support for mechanical presence switches) PI (ports implemented) ... Thus, it is a spec violation to not initialize these registers correctly. If you have an AHCI controller that is a PCIe card, it will surely expose the Ports Implemented register correctly. So this problem should be limited to on-boad AHCI controllers, where BIOS/platform firmware has violated the specification, and not initialized the registers required by the specification. If you can disable the controller via BIOS, then the most logical thing is for these on-boards AHCI controllers to actually get disabled and not show up in lspci at all. As far as I know, that is how it works on Intel and AMD systems. 2) Why limit this to AHCI controllers connected via PCI? Surely device tree platforms with a AHCI controller connected directly to the AXI bus can also violate the specification and not initialize the PI (Ports Implemented) register. Sure, for AHCI controllers connected without a PCI bus, there is the "ports-implemented" device tree property, that can be used to initialize this register correctly for platforms without any embedded firmware. But if we should implement a workaround, why add code that is explcitly limited to PCI? I think a better solution is to limit Ports Implemented to CAP.NP. As Ports Implemented is a bitmask with up to CAP.NP bits. Ports Implemented is not allowed to be larger than CAP.NP, it can only mark ports within the CAP.NP MMIO region as "unimplemented". So the highest bit set in Ports Implemented can not be higher than CAP.NP. (I.e. a per port MMIO offset is not allowed to be larger than CAP.NP * MMIO registers per port.) Looking at the libata code, it looks like we have code for this already: https://github.com/torvalds/linux/blob/v7.0/drivers/ata/libahci.c#L561-L569 Could you perhaps debug to see why this code in no longer working as intended? If you revert 18ee7c49f75b ("ata: ahci: Introduce firmware-specific caps initialization"), do you see the "implemented port map (0x%lx) contains more ports than nr_ports (%u), using nr_ports" warning in the log? Please share the equivalent to these prints after reverting the commit: [1825468.337982] ahci 0000:00:17.0: AHCI vers 0001.0301, 32 command slots, 6 Gbps, SATA mode [1825468.350108] ahci 0000:00:17.0: 8/8 ports implemented (port mask 0xff) > When mmio_size is 0x1000 and all ports need to be accessed. > The issue manifests as a panic in __raw_readl() when i=30, as the > calculated offset (0x1000) equals the typical MMIO region size (0x1000), > placing the access exactly at the boundary of the mapped region. AFAICT, port_map gets limited to CAP.NP, and we only access ports that are within port_map (and this offset < CAP.NP): for_each_set_bit(i, &port_map, AHCI_MAX_PORTS) { if (hpriv->saved_port_cap[i]) continue; port_mmio = __ahci_port_base(hpriv, i); Wait a second: > 3. The actual MMIO region is smaller than needed for the indicated > ports. I hope you are not saying that your PCI device is only exposing a BAR that is smaller than needed to contain the number of ports defined in CAP.NP in this controller? That sounds like a seriously buggy controller... Could you please share the output of (with the PCI BDF of you controller of course): $ sudo lspci -vvvnns 0000:00:17.0 00:17.0 SATA controller [0106]: Intel Corporation C620 Series Chipset Family SATA Controller [AHCI mode] [8086:a182] (rev 0a) (prog-if 01 [AHCI 1.0]) Subsystem: Super Micro Computer Inc Device [15d9:1b62] Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 138 NUMA node: 0 IOMMU group: 28 Region 0: Memory at a5984000 (32-bit, non-prefetchable) [size=8K] Region 1: Memory at a598a000 (32-bit, non-prefetchable) [size=256] Region 2: I/O ports at 4050 [size=8] Region 3: I/O ports at 4040 [size=4] Region 4: I/O ports at 4000 [size=32] Region 5: Memory at a5700000 (32-bit, non-prefetchable) [size=512K] So we can see PCI vendor ID, PCI device ID, and size of the BARs? From 3.3 Port Registers (one set per port): Port offset = 100h + (PI Asserted Bit Position * 80h) CAP.NP can represent at most 32 ports. (Note that the register field itself is 0's based, so a value of 0 means 1 port and a value of 31 means 32 ports.) If you really have 32 ports, then you would need a BAR that has size: 0x100+(31*0x80) + 0x80 (start offset + size of the per port register itself): 0x1100 Then we would need to add a quirk for your PCI device and PCI vendor ID, which limits the number of ports to 30 (by setting CAP.NP to 29) for your AHCI controller. Kind regards, Niklas ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <55809835.8838.19db9be1205.Coremail.dayou5941@163.com>]
* Re: Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region [not found] ` <55809835.8838.19db9be1205.Coremail.dayou5941@163.com> @ 2026-04-23 17:19 ` Niklas Cassel 2026-04-24 2:43 ` Damien Le Moal [not found] ` <26f59adf.571d.19dbe310f41.Coremail.dayou5941@163.com> 0 siblings, 2 replies; 10+ messages in thread From: Niklas Cassel @ 2026-04-23 17:19 UTC (permalink / raw) To: 李佑鸿; +Cc: dlemoal, linux-ide, linux-kernel, liyouhong Hello liyouhong, On Thu, Apr 23, 2026 at 05:48:54PM +0800, 李佑鸿 wrote: > However, I have already confirmed with the BIOS supplier that when the > BIOS disables all SATA ports, it does indeed initialize the values of > the HOST_CAP and HOSTPorts_IMPL registers in accordance with the specifications. > > > /* Register values after disabling SATA in BIOS */ > HOST_CAP (0x00) = 0xffffffff > HOST_PORTS_IMPL (0x0c) = 0xffffffff > HOST_VERSION (0x10) = 0xffffffff > MMIO_SIZE = 4096 I am actually very surprised to see e.g. CAP (0x00) and AHCI VERSION (0x10) being uninitialized. These registers are not mentioned in: AHCI 1.3.1 - 10.1.1 Firmware Specific Initialization And I would have expected them to have fixed value regardless if BIOS has enabled the controller or not, because e.g. CAP.NP (number of ports) and the AHCI version must obviously be known when synthesizing the IP. > lspci -vvvnns 05:00.0 > 05:00.0 SATA controller [0106]: Phytium Technology Co., Ltd. Device [1db7:d001] (rev 01) (prog-if 01 [AHCI 1.0]) > Region 5: Memory at 5b800000 (32-bit, non-prefetchable) [size=4K] So the BAR size is 4K (when the controller is disabled). Could you please enable the controller in BIOS, and then dump the value of the CAP (0x00) register. And also double check that the BAR size is still 4K when the controller is enabled? We could then add a quirk to drivers/ata/ahci.c, something like: diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 1d73a53370cf..e7250781e9b7 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -2003,6 +2003,10 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) if (pdev->vendor == 0x177d && pdev->device == 0xa01c) hpriv->irq_handler = ahci_thunderx_irq_handler; + + /* Phytium SATA controller has empty CAP register */ + if (pdev->vendor == 0x1db7 && pdev->device == 0xd001) + hpriv->saved_cap = 0xC734FF02; #endif /* save initial config */ Where you replace 0xC734FF02 with whatever you get when reading the CAP (0x00) register when the controller is enabled. Kind regards, Niklas ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region 2026-04-23 17:19 ` Niklas Cassel @ 2026-04-24 2:43 ` Damien Le Moal [not found] ` <13d7d471.6389.19dbe594e14.Coremail.dayou5941@163.com> [not found] ` <26f59adf.571d.19dbe310f41.Coremail.dayou5941@163.com> 1 sibling, 1 reply; 10+ messages in thread From: Damien Le Moal @ 2026-04-24 2:43 UTC (permalink / raw) To: Niklas Cassel, 李佑鸿 Cc: linux-ide, linux-kernel, liyouhong On 4/24/26 02:19, Niklas Cassel wrote: > Hello liyouhong, > > On Thu, Apr 23, 2026 at 05:48:54PM +0800, 李佑鸿 wrote: >> However, I have already confirmed with the BIOS supplier that when the >> BIOS disables all SATA ports, it does indeed initialize the values of >> the HOST_CAP and HOSTPorts_IMPL registers in accordance with the specifications. >> >> >> /* Register values after disabling SATA in BIOS */ >> HOST_CAP (0x00) = 0xffffffff >> HOST_PORTS_IMPL (0x0c) = 0xffffffff >> HOST_VERSION (0x10) = 0xffffffff >> MMIO_SIZE = 4096 > > I am actually very surprised to see e.g. CAP (0x00) and AHCI VERSION (0x10) > being uninitialized. What I am surprised of here is that we even see that device on the PCI bus at all when it is disabled in the BIOs. If that device is disabled, why are we even seeing it by scanning the PCI ports ? The adapter should simply not be visible at all. Trying to debug register values when we should not even be seeing the device in the first place does not make sense to me. If anything, I would take a really big hammer here and try to completely ignore that adapter if we can somehow detect that it has been in fact disabled in the BIOS. But that detection may be challenging to do since it seems we are dealing with a very buggy BIOS. So maybe we should simply warn and exit probe early if we see a PCI BAR size that is broken. -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <13d7d471.6389.19dbe594e14.Coremail.dayou5941@163.com>]
* Re: Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region [not found] ` <13d7d471.6389.19dbe594e14.Coremail.dayou5941@163.com> @ 2026-04-24 11:07 ` Niklas Cassel 2026-04-24 11:15 ` Damien Le Moal 2026-04-25 6:15 ` Re:Re: " 李佑鸿 0 siblings, 2 replies; 10+ messages in thread From: Niklas Cassel @ 2026-04-24 11:07 UTC (permalink / raw) To: 李佑鸿 Cc: Damien Le Moal, linux-ide, linux-kernel, liyouhong On Fri, Apr 24, 2026 at 03:16:56PM +0800, 李佑鸿 wrote: > At 2026-04-24 10:43:11, "Damien Le Moal" <dlemoal@kernel.org> wrote: > > > >What I am surprised of here is that we even see that device on the PCI bus at > >all when it is disabled in the BIOs. If that device is disabled, why are we even > >seeing it by scanning the PCI ports ? The adapter should simply not be visible > >at all. I agree. E.g. both AMD and Intel make sure that the AHCI controller PCI device does not show up on the PCI bus if you disable it using the BIOS. It would have been nice if the Phytium BIOS also worked like that. > 3. **BAR Size Mismatch**: `CAP.NP` indicates more ports than physically fit in the BAR > - If CAP claims 32 ports but BAR is only 4KB, this is physically impossible > - 32 ports require at least 0x1100 bytes (0x100 + 32 * 0x80) The reason why I did not suggest failing the probe() originally, was because 李佑鸿 sent a patch with a Fixes tag, so I assumed that he considered it a regression, and wanted the hardware to continue working, even though it was marked as disabled in BIOS, because that is how it was before the commit in the Fixes tag was introduced. That said, I fully agree that I think it is better to modify the AHCI driver to fail the probe() for this AHCI controller when it has not been properly initialized (because it is marked as disabled in BIOS). I prefer option 3. Look at CAP.NP and look at the BAR size, if it is too small, just fail the probe(). Kind regards, Niklas ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region 2026-04-24 11:07 ` Niklas Cassel @ 2026-04-24 11:15 ` Damien Le Moal 2026-04-25 6:15 ` Re:Re: " 李佑鸿 1 sibling, 0 replies; 10+ messages in thread From: Damien Le Moal @ 2026-04-24 11:15 UTC (permalink / raw) To: Niklas Cassel, 李佑鸿 Cc: linux-ide, linux-kernel, liyouhong On 4/24/26 20:07, Niklas Cassel wrote: > On Fri, Apr 24, 2026 at 03:16:56PM +0800, 李佑鸿 wrote: >> At 2026-04-24 10:43:11, "Damien Le Moal" <dlemoal@kernel.org> wrote: >>> >>> What I am surprised of here is that we even see that device on the PCI bus at >>> all when it is disabled in the BIOs. If that device is disabled, why are we even >>> seeing it by scanning the PCI ports ? The adapter should simply not be visible >>> at all. > > I agree. > E.g. both AMD and Intel make sure that the AHCI controller PCI device does > not show up on the PCI bus if you disable it using the BIOS. > > It would have been nice if the Phytium BIOS also worked like that. > > >> 3. **BAR Size Mismatch**: `CAP.NP` indicates more ports than physically fit in the BAR >> - If CAP claims 32 ports but BAR is only 4KB, this is physically impossible >> - 32 ports require at least 0x1100 bytes (0x100 + 32 * 0x80) > > The reason why I did not suggest failing the probe() originally, was because > 李佑鸿 sent a patch with a Fixes tag, so I assumed that he considered it a > regression, and wanted the hardware to continue working, even though it was > marked as disabled in BIOS, because that is how it was before the commit in > the Fixes tag was introduced. > > That said, I fully agree that I think it is better to modify the AHCI driver > to fail the probe() for this AHCI controller when it has not been properly > initialized (because it is marked as disabled in BIOS). > > I prefer option 3. > Look at CAP.NP and look at the BAR size, if it is too small, just fail the > probe(). Sounds good (and a lot safer) to me. > > > Kind regards, > Niklas -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re:Re: Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region 2026-04-24 11:07 ` Niklas Cassel 2026-04-24 11:15 ` Damien Le Moal @ 2026-04-25 6:15 ` 李佑鸿 1 sibling, 0 replies; 10+ messages in thread From: 李佑鸿 @ 2026-04-25 6:15 UTC (permalink / raw) To: Niklas Cassel; +Cc: Damien Le Moal, linux-ide, linux-kernel, liyouhong Hi Niklas, Damien, At 2026-04-24 19:07:33, "Niklas Cassel" <cassel@kernel.org> wrote: >On Fri, Apr 24, 2026 at 03:16:56PM +0800, 李佑鸿 wrote: >> At 2026-04-24 10:43:11, "Damien Le Moal" <dlemoal@kernel.org> wrote: >> > >> >What I am surprised of here is that we even see that device on the PCI bus at >> >all when it is disabled in the BIOs. If that device is disabled, why are we even >> >seeing it by scanning the PCI ports ? The adapter should simply not be visible >> >at all. > >I agree. >E.g. both AMD and Intel make sure that the AHCI controller PCI device does >not show up on the PCI bus if you disable it using the BIOS. > >It would have been nice if the Phytium BIOS also worked like that. > > >> 3. **BAR Size Mismatch**: `CAP.NP` indicates more ports than physically fit in the BAR >> - If CAP claims 32 ports but BAR is only 4KB, this is physically impossible >> - 32 ports require at least 0x1100 bytes (0x100 + 32 * 0x80) > >The reason why I did not suggest failing the probe() originally, was because >李佑鸿 sent a patch with a Fixes tag, so I assumed that he considered it a >regression, and wanted the hardware to continue working, even though it was >marked as disabled in BIOS, because that is how it was before the commit in >the Fixes tag was introduced. > >That said, I fully agree that I think it is better to modify the AHCI driver >to fail the probe() for this AHCI controller when it has not been properly >initialized (because it is marked as disabled in BIOS). > >I prefer option 3. >Look at CAP.NP and look at the BAR size, if it is too small, just fail the >probe(). Thank you for the clear direction. I agree that checking BAR size against CAP.NP is the cleanest solution. I will prepare a v2 patch implementing this check: 1. In ahci_init_one(), after mapping MMIO but before any port access. 2. Calculate: required_size = 0x100 + (CAP.NP * 0x80). 3. Compare with pci_resource_len(pdev, 5). 4. If required_size > BAR size, fail probe with -ENODEV and clear error. The patch will include: - A new ahci_validate_bar_size() helper function - Clear error message: "AHCI: BAR5 too small for %u ports..." I'll send the v2 patch shortly. Best regards, liyouhong> > >Kind regards, >Niklas ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <26f59adf.571d.19dbe310f41.Coremail.dayou5941@163.com>]
* Re: Re: Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region [not found] ` <26f59adf.571d.19dbe310f41.Coremail.dayou5941@163.com> @ 2026-04-24 11:07 ` Niklas Cassel 0 siblings, 0 replies; 10+ messages in thread From: Niklas Cassel @ 2026-04-24 11:07 UTC (permalink / raw) To: 李佑鸿; +Cc: dlemoal, linux-ide, linux-kernel, liyouhong On Fri, Apr 24, 2026 at 02:32:59PM +0800, 李佑鸿 wrote: > However, in the following code: > cap = readl(mmio + HOST_CAP); /* When cap = 0xFFFFFFFF */ > if (hpriv->saved_cap) > cap = (cap & ~(HOST_CAP_SSS | HOST_CAP_MPS)) | hpriv->saved_cap; /* When hpriv->saved_cap = 0xEF36FF81 */ Sorry for that. From the name hpriv->saved_cap, I assumed that it would actually override hpriv->cap, but it seems that it only allows you to set additional caps, not to override/clear caps that are broken... I guess we could theoretically do: diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 1d73a53370cf..94c3c740134a 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -2003,6 +2003,10 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) if (pdev->vendor == 0x177d && pdev->device == 0xa01c) hpriv->irq_handler = ahci_thunderx_irq_handler; + + /* Phytium SATA controller has bogus CAP register */ + if (pdev->vendor == 0x1db7 && pdev->device == 0xd001) + writel(0xEF36FF81, hpriv->mmio + HOST_CAP); #endif /* save initial config */ However, that said, I agree with Damien, I think a better solution is to fail the probe(). Will reply to that mail with more details. Kind regards, Niklas ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region 2026-04-22 8:03 [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region dayou5941 2026-04-22 14:36 ` Niklas Cassel @ 2026-04-30 0:59 ` kernel test robot 2026-04-30 12:48 ` Niklas Cassel 1 sibling, 1 reply; 10+ messages in thread From: kernel test robot @ 2026-04-30 0:59 UTC (permalink / raw) To: dayou5941, dlemoal, cassel Cc: llvm, oe-kbuild-all, linux-ide, linux-kernel, liyouhong Hi, kernel test robot noticed the following build errors: [auto build test ERROR on linus/master] [also build test ERROR on v7.1-rc1 next-20260429] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/dayou5941-163-com/ata-libahci-fix-panic-when-accessing-ports-beyond-MMIO-region/20260422-192119 base: linus/master patch link: https://lore.kernel.org/r/20260422080322.1006592-1-dayou5941%40163.com patch subject: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region config: arm64-randconfig-003-20260430 (https://download.01.org/0day-ci/archive/20260430/202604300815.6kpEidbJ-lkp@intel.com/config) compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project 5bac06718f502014fade905512f1d26d578a18f3) rustc: rustc 1.88.0 (6b00bc388 2025-06-23) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260430/202604300815.6kpEidbJ-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202604300815.6kpEidbJ-lkp@intel.com/ All errors (new ones prefixed by >>): drivers/ata/libahci.c:614:18: warning: missing terminating '"' character [-Winvalid-pp-token] 614 | dev_warn(dev, "Port %d (offset 0x%llx) exceeds MMIO region (0x%llx), | ^ drivers/ata/libahci.c:615:44: warning: missing terminating '"' character [-Winvalid-pp-token] 615 | truncating port map at port %d\n", | ^ >> drivers/ata/libahci.c:614:18: error: expected expression 614 | dev_warn(dev, "Port %d (offset 0x%llx) exceeds MMIO region (0x%llx), | ^ 2 warnings and 1 error generated. vim +614 drivers/ata/libahci.c 428 429 /** 430 * ahci_save_initial_config - Save and fixup initial config values 431 * @dev: target AHCI device 432 * @hpriv: host private area to store config values 433 * 434 * Some registers containing configuration info might be setup by 435 * BIOS and might be cleared on reset. This function saves the 436 * initial values of those registers into @hpriv such that they 437 * can be restored after controller reset. 438 * 439 * If inconsistent, config values are fixed up by this function. 440 * 441 * If it is not set already this function sets hpriv->start_engine to 442 * ahci_start_engine. 443 * 444 * LOCKING: 445 * None. 446 */ 447 void ahci_save_initial_config(struct device *dev, struct ahci_host_priv *hpriv) 448 { 449 void __iomem *mmio = hpriv->mmio; 450 void __iomem *port_mmio; 451 unsigned long port_map; 452 u32 cap, cap2, vers; 453 unsigned long long mmio_size = 0; 454 bool is_pci_dev = false; 455 int i; 456 457 /* make sure AHCI mode is enabled before accessing CAP */ 458 ahci_enable_ahci(mmio); 459 460 /* 461 * Values prefixed with saved_ are written back to the HBA and ports 462 * registers after reset. Values without are used for driver operation. 463 */ 464 465 /* 466 * Override HW-init HBA capability fields with the platform-specific 467 * values. The rest of the HBA capabilities are defined as Read-only 468 * and can't be modified in CSR anyway. 469 */ 470 cap = readl(mmio + HOST_CAP); 471 if (hpriv->saved_cap) 472 cap = (cap & ~(HOST_CAP_SSS | HOST_CAP_MPS)) | hpriv->saved_cap; 473 hpriv->saved_cap = cap; 474 475 /* CAP2 register is only defined for AHCI 1.2 and later */ 476 vers = readl(mmio + HOST_VERSION); 477 if ((vers >> 16) > 1 || 478 ((vers >> 16) == 1 && (vers & 0xFFFF) >= 0x200)) 479 hpriv->saved_cap2 = cap2 = readl(mmio + HOST_CAP2); 480 else 481 hpriv->saved_cap2 = cap2 = 0; 482 483 /* some chips have errata preventing 64bit use */ 484 if ((cap & HOST_CAP_64) && (hpriv->flags & AHCI_HFLAG_32BIT_ONLY)) { 485 dev_info(dev, "controller can't do 64bit DMA, forcing 32bit\n"); 486 cap &= ~HOST_CAP_64; 487 } 488 489 if ((cap & HOST_CAP_NCQ) && (hpriv->flags & AHCI_HFLAG_NO_NCQ)) { 490 dev_info(dev, "controller can't do NCQ, turning off CAP_NCQ\n"); 491 cap &= ~HOST_CAP_NCQ; 492 } 493 494 if (!(cap & HOST_CAP_NCQ) && (hpriv->flags & AHCI_HFLAG_YES_NCQ)) { 495 dev_info(dev, "controller can do NCQ, turning on CAP_NCQ\n"); 496 cap |= HOST_CAP_NCQ; 497 } 498 499 if ((cap & HOST_CAP_PMP) && (hpriv->flags & AHCI_HFLAG_NO_PMP)) { 500 dev_info(dev, "controller can't do PMP, turning off CAP_PMP\n"); 501 cap &= ~HOST_CAP_PMP; 502 } 503 504 if ((cap & HOST_CAP_SNTF) && (hpriv->flags & AHCI_HFLAG_NO_SNTF)) { 505 dev_info(dev, 506 "controller can't do SNTF, turning off CAP_SNTF\n"); 507 cap &= ~HOST_CAP_SNTF; 508 } 509 510 if ((cap2 & HOST_CAP2_SDS) && (hpriv->flags & AHCI_HFLAG_NO_DEVSLP)) { 511 dev_info(dev, 512 "controller can't do DEVSLP, turning off\n"); 513 cap2 &= ~HOST_CAP2_SDS; 514 cap2 &= ~HOST_CAP2_SADM; 515 } 516 517 if (!(cap & HOST_CAP_FBS) && (hpriv->flags & AHCI_HFLAG_YES_FBS)) { 518 dev_info(dev, "controller can do FBS, turning on CAP_FBS\n"); 519 cap |= HOST_CAP_FBS; 520 } 521 522 if ((cap & HOST_CAP_FBS) && (hpriv->flags & AHCI_HFLAG_NO_FBS)) { 523 dev_info(dev, "controller can't do FBS, turning off CAP_FBS\n"); 524 cap &= ~HOST_CAP_FBS; 525 } 526 527 if (!(cap & HOST_CAP_ALPM) && (hpriv->flags & AHCI_HFLAG_YES_ALPM)) { 528 dev_info(dev, "controller can do ALPM, turning on CAP_ALPM\n"); 529 cap |= HOST_CAP_ALPM; 530 } 531 532 if ((cap & HOST_CAP_SXS) && (hpriv->flags & AHCI_HFLAG_NO_SXS)) { 533 dev_info(dev, "controller does not support SXS, disabling CAP_SXS\n"); 534 cap &= ~HOST_CAP_SXS; 535 } 536 537 /* Override the HBA ports mapping if the platform needs it */ 538 port_map = readl(mmio + HOST_PORTS_IMPL); 539 if (hpriv->saved_port_map && port_map != hpriv->saved_port_map) { 540 dev_info(dev, "forcing port_map 0x%lx -> 0x%x\n", 541 port_map, hpriv->saved_port_map); 542 port_map = hpriv->saved_port_map; 543 } else { 544 hpriv->saved_port_map = port_map; 545 } 546 547 /* mask_port_map not set means that all ports are available */ 548 if (hpriv->mask_port_map) { 549 dev_warn(dev, "masking port_map 0x%lx -> 0x%lx\n", 550 port_map, 551 port_map & hpriv->mask_port_map); 552 port_map &= hpriv->mask_port_map; 553 } 554 555 /* cross check port_map and cap.n_ports */ 556 if (port_map) { 557 int map_ports = 0; 558 559 for (i = 0; i < AHCI_MAX_PORTS; i++) 560 if (port_map & (1 << i)) 561 map_ports++; 562 563 /* If PI has more ports than n_ports, whine, clear 564 * port_map and let it be generated from n_ports. 565 */ 566 if (map_ports > ahci_nr_ports(cap)) { 567 dev_warn(dev, 568 "implemented port map (0x%lx) contains more ports than nr_ports (%u), using nr_ports\n", 569 port_map, ahci_nr_ports(cap)); 570 port_map = 0; 571 } 572 } 573 574 /* fabricate port_map from cap.nr_ports for < AHCI 1.3 */ 575 if (!port_map && vers < 0x10300) { 576 port_map = (1 << ahci_nr_ports(cap)) - 1; 577 dev_warn(dev, "forcing PORTS_IMPL to 0x%lx\n", port_map); 578 579 /* write the fixed up value to the PI register */ 580 hpriv->saved_port_map = port_map; 581 } 582 583 is_pci_dev = dev_is_pci(dev); 584 if (is_pci_dev) { 585 struct pci_dev *pdev = to_pci_dev(dev); 586 587 mmio_size = (unsigned long long)pci_resource_len(pdev, 5); 588 } 589 590 /* 591 * Preserve the ports capabilities defined by the platform. Note there 592 * is no need in storing the rest of the P#.CMD fields since they are 593 * volatile. 594 */ 595 for_each_set_bit(i, &port_map, AHCI_MAX_PORTS) { 596 if (hpriv->saved_port_cap[i]) 597 continue; 598 599 port_mmio = __ahci_port_base(hpriv, i); 600 601 /* Calculate offset from MMIO base */ 602 unsigned long long port_offset = (unsigned long long)port_mmio - 603 (unsigned long long)mmio; 604 /* Check if port register block is within MMIO region */ 605 if (is_pci_dev && port_offset >= mmio_size) { 606 /* 607 * Port registers exceed MMIO region boundary. 608 * Since ports are sequentially mapped (0x100 + i*0x80), 609 * all subsequent ports will also exceed the boundary. 610 * 611 * Update port_map to exclude this and all higher ports, 612 * then break out of the loop. 613 */ > 614 dev_warn(dev, "Port %d (offset 0x%llx) exceeds MMIO region (0x%llx), 615 truncating port map at port %d\n", 616 i, port_offset, mmio_size, i-1); 617 618 port_map = (1UL << i) - 1; 619 hpriv->saved_port_map = port_map; 620 break; 621 } 622 623 hpriv->saved_port_cap[i] = 624 readl(port_mmio + PORT_CMD) & PORT_CMD_CAP; 625 } 626 627 /* record values to use during operation */ 628 hpriv->cap = cap; 629 hpriv->cap2 = cap2; 630 hpriv->version = vers; 631 hpriv->port_map = port_map; 632 633 if (!hpriv->start_engine) 634 hpriv->start_engine = ahci_start_engine; 635 636 if (!hpriv->stop_engine) 637 hpriv->stop_engine = ahci_stop_engine; 638 639 if (!hpriv->irq_handler) 640 hpriv->irq_handler = ahci_single_level_irq_intr; 641 } 642 EXPORT_SYMBOL_GPL(ahci_save_initial_config); 643 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region 2026-04-30 0:59 ` kernel test robot @ 2026-04-30 12:48 ` Niklas Cassel 0 siblings, 0 replies; 10+ messages in thread From: Niklas Cassel @ 2026-04-30 12:48 UTC (permalink / raw) To: kernel test robot Cc: dayou5941, dlemoal, llvm, oe-kbuild-all, linux-ide, linux-kernel, liyouhong On Thu, Apr 30, 2026 at 08:59:57AM +0800, kernel test robot wrote: > Hi, > > kernel test robot noticed the following build errors: > > [auto build test ERROR on linus/master] > [also build test ERROR on v7.1-rc1 next-20260429] > [If your patch is applied to the wrong git tree, kindly drop us a note. > And when submitting patch, we suggest to use '--base' as documented in > https://git-scm.com/docs/git-format-patch#_base_tree_information] > > url: https://github.com/intel-lab-lkp/linux/commits/dayou5941-163-com/ata-libahci-fix-panic-when-accessing-ports-beyond-MMIO-region/20260422-192119 > base: linus/master > patch link: https://lore.kernel.org/r/20260422080322.1006592-1-dayou5941%40163.com > patch subject: [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region > config: arm64-randconfig-003-20260430 (https://download.01.org/0day-ci/archive/20260430/202604300815.6kpEidbJ-lkp@intel.com/config) > compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project 5bac06718f502014fade905512f1d26d578a18f3) > rustc: rustc 1.88.0 (6b00bc388 2025-06-23) > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260430/202604300815.6kpEidbJ-lkp@intel.com/reproduce) > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > the same patch/commit), kindly add following tags > | Reported-by: kernel test robot <lkp@intel.com> > | Closes: https://lore.kernel.org/oe-kbuild-all/202604300815.6kpEidbJ-lkp@intel.com/ > > All errors (new ones prefixed by >>): > > drivers/ata/libahci.c:614:18: warning: missing terminating '"' character [-Winvalid-pp-token] > 614 | dev_warn(dev, "Port %d (offset 0x%llx) exceeds MMIO region (0x%llx), > | ^ > drivers/ata/libahci.c:615:44: warning: missing terminating '"' character [-Winvalid-pp-token] > 615 | truncating port map at port %d\n", > | ^ > >> drivers/ata/libahci.c:614:18: error: expected expression > 614 | dev_warn(dev, "Port %d (offset 0x%llx) exceeds MMIO region (0x%llx), > | ^ > 2 warnings and 1 error generated. Thank you dear kernel test robot, However, this report is for V1, we have applied V5 of this patch: https://lore.kernel.org/linux-ide/177735961706.2183577.4033506737290437474.b4-ty@b4/T/#t Which does not have this problem. Kind regards, Niklas ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-04-30 12:48 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-22 8:03 [PATCH] ata: libahci: fix panic when accessing ports beyond MMIO region dayou5941
2026-04-22 14:36 ` Niklas Cassel
[not found] ` <55809835.8838.19db9be1205.Coremail.dayou5941@163.com>
2026-04-23 17:19 ` Niklas Cassel
2026-04-24 2:43 ` Damien Le Moal
[not found] ` <13d7d471.6389.19dbe594e14.Coremail.dayou5941@163.com>
2026-04-24 11:07 ` Niklas Cassel
2026-04-24 11:15 ` Damien Le Moal
2026-04-25 6:15 ` Re:Re: " 李佑鸿
[not found] ` <26f59adf.571d.19dbe310f41.Coremail.dayou5941@163.com>
2026-04-24 11:07 ` Niklas Cassel
2026-04-30 0:59 ` kernel test robot
2026-04-30 12:48 ` Niklas Cassel
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox