linux-block.vger.kernel.org archive mirror
* WARNING: drivers/iommu/io-pgtable-arm.c:639
@ 2025-12-09 11:43 Sebastian Ott
  2025-12-09 11:50 ` Robin Murphy
  2025-12-10  5:02 ` Keith Busch
  0 siblings, 2 replies; 16+ messages in thread
From: Sebastian Ott @ 2025-12-09 11:43 UTC (permalink / raw)
  To: linux-nvme, iommu, linux-block, linux-kernel, linux-xfs
  Cc: Jens Axboe, Christoph Hellwig, Will Deacon, Robin Murphy,
	Carlos Maiolino

Hi,

got the following warning after a kernel update on Thursday, leading to a
panic and fs corruption. I didn't capture the first warning but I'm pretty
sure it was the same. It's reproducible but I didn't bisect since it
borked my fs. The only hint I can give is that v6.18 worked. Is this a
known issue? Anything I should try?

[64906.234244] WARNING: drivers/iommu/io-pgtable-arm.c:639 at __arm_lpae_unmap+0x358/0x3d0, CPU#94: kworker/94:0/494
[64906.234247] Modules linked in: mlx5_ib ib_uverbs ib_core qrtr rfkill sunrpc mlx5_core cdc_eem usbnet mii acpi_ipmi ipmi_ssif ipmi_devintf ipmi_msghandler mlxfw arm_cmn psample arm_spe_pmu arm_dmc620_pmu vfat fat arm_dsu_pmu cppc_cpufreq fuse loop dm_multipath nfnetlink zram xfs nvme mgag200 ghash_ce sbsa_gwdt nvme_core i2c_algo_bit xgene_hwmon scsi_dh_rdac scsi_dh_emc scsi_dh_alua i2c_dev
[64906.234269] CPU: 94 UID: 0 PID: 494 Comm: kworker/94:0 Tainted: G        W           6.18.0+ #1 PREEMPT(voluntary) 
[64906.234271] Tainted: [W]=WARN
[64906.234271] Hardware name: HPE ProLiant RL300 Gen11/ProLiant RL300 Gen11, BIOS 1.50 12/18/2023
[64906.234272] Workqueue: xfs-buf/nvme1n1p1 xfs_buf_ioend_work [xfs]
[64906.234383] pstate: 804000c9 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[64906.234385] pc : __arm_lpae_unmap+0x358/0x3d0
[64906.234386] lr : __arm_lpae_unmap+0x100/0x3d0
[64906.234387] sp : ffff800083d4bad0
[64906.234388] x29: ffff800083d4bad0 x28: 00000000f3460000 x27: ffff800081bb28e8
[64906.234391] x26: 0000000000001000 x25: ffff800083d4be00 x24: 00000000f3460000
[64906.234393] x23: 0000000000001000 x22: ffff07ff85de9c20 x21: 0000000000000001
[64906.234395] x20: 0000000000000000 x19: ffff07ff9d540300 x18: 0000000000000300
[64906.234398] x17: ffff887cbd289000 x16: ffff800083d48000 x15: 0000000000001000
[64906.234400] x14: 0000000000000fc4 x13: 0000000000000820 x12: 0000000000001000
[64906.234402] x11: 0000000000000006 x10: ffff07ffa1b9c300 x9 : 0000000000000009
[64906.234405] x8 : 0000000000000060 x7 : 000000000000000c x6 : ffff07ffa1b9c000
[64906.234407] x5 : 0000000000000003 x4 : 0000000000000001 x3 : 0000000000001000
[64906.234409] x2 : 0000000000000000 x1 : ffff800083d4be00 x0 : 0000000000000000
[64906.234411] Call trace:
[64906.234412]  __arm_lpae_unmap+0x358/0x3d0 (P)
[64906.234414]  __arm_lpae_unmap+0x100/0x3d0
[64906.234415]  __arm_lpae_unmap+0x100/0x3d0
[64906.234417]  __arm_lpae_unmap+0x100/0x3d0
[64906.234418]  arm_lpae_unmap_pages+0x74/0x90
[64906.234420]  arm_smmu_unmap_pages+0x24/0x40
[64906.234422]  __iommu_unmap+0xe8/0x2a0
[64906.234424]  iommu_unmap_fast+0x18/0x30
[64906.234426]  __iommu_dma_iova_unlink+0xe4/0x280
[64906.234428]  dma_iova_destroy+0x30/0x58
[64906.234431]  nvme_unmap_data+0x88/0x248 [nvme]
[64906.234434]  nvme_poll_cq+0x1d4/0x3e0 [nvme]
[64906.234438]  nvme_irq+0x28/0x70 [nvme]
[64906.234441]  __handle_irq_event_percpu+0x84/0x370
[64906.234444]  handle_irq_event+0x4c/0xb0
[64906.234447]  handle_fasteoi_irq+0x110/0x1a8
[64906.234449]  handle_irq_desc+0x3c/0x68
[64906.234451]  generic_handle_domain_irq+0x24/0x40
[64906.234454]  gic_handle_irq+0x5c/0xe0
[64906.234455]  call_on_irq_stack+0x30/0x48
[64906.234457]  do_interrupt_handler+0xdc/0xe0
[64906.234459]  el1_interrupt+0x38/0x60
[64906.234462]  el1h_64_irq_handler+0x18/0x30
[64906.234464]  el1h_64_irq+0x70/0x78
[64906.234466]  arm_lpae_init_pte+0x228/0x238 (P)
[64906.234467]  __arm_lpae_map+0x2f8/0x378
[64906.234469]  __arm_lpae_map+0x114/0x378
[64906.234470]  __arm_lpae_map+0x114/0x378
[64906.234472]  __arm_lpae_map+0x114/0x378
[64906.234473]  arm_lpae_map_pages+0x108/0x240
[64906.234475]  arm_smmu_map_pages+0x24/0x40
[64906.234477]  iommu_map_nosync+0x124/0x310
[64906.234479]  iommu_map+0x2c/0xb0
[64906.234481]  __iommu_dma_map+0xbc/0x1b0
[64906.234484]  iommu_dma_map_phys+0xf0/0x1c0
[64906.234486]  dma_map_phys+0x190/0x1b0
[64906.234488]  dma_map_page_attrs+0x50/0x70
[64906.234490]  nvme_map_data+0x21c/0x318 [nvme]
[64906.234493]  nvme_prep_rq+0x60/0x200 [nvme]
[64906.234496]  nvme_queue_rq+0x48/0x180 [nvme]
[64906.234499]  blk_mq_dispatch_rq_list+0xfc/0x4d0
[64906.234502]  __blk_mq_sched_dispatch_requests+0xa4/0x1b0
[64906.234504]  blk_mq_sched_dispatch_requests+0x38/0xa0
[64906.234506]  blk_mq_run_hw_queue+0x2f0/0x3d0
[64906.234509]  blk_mq_issue_direct+0x12c/0x280
[64906.234511]  blk_mq_dispatch_queue_requests+0x258/0x318
[64906.234514]  blk_mq_flush_plug_list+0x68/0x170
[64906.234515]  __blk_flush_plug+0xf0/0x140
[64906.234518]  blk_finish_plug+0x34/0x50
[64906.234520]  xfs_buf_submit_bio+0x158/0x1a8 [xfs]
[64906.234630]  xfs_buf_submit+0x80/0x268 [xfs]
[64906.234739]  xfs_buf_ioend_handle_error+0x254/0x480 [xfs]
[64906.234848]  __xfs_buf_ioend+0x18c/0x218 [xfs]
[64906.234957]  xfs_buf_ioend_work+0x24/0x60 [xfs]
[64906.235066]  process_one_work+0x22c/0x658
[64906.235069]  worker_thread+0x1ac/0x360
[64906.235072]  kthread+0x110/0x138
[64906.235074]  ret_from_fork+0x10/0x20
[64906.235075] ---[ end trace 0000000000000000 ]---

Thanks,
Sebastian


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
  2025-12-09 11:43 WARNING: drivers/iommu/io-pgtable-arm.c:639 Sebastian Ott
@ 2025-12-09 11:50 ` Robin Murphy
  2025-12-09 17:29   ` Chaitanya Kulkarni
  2025-12-09 21:05   ` Sebastian Ott
  2025-12-10  5:02 ` Keith Busch
  1 sibling, 2 replies; 16+ messages in thread
From: Robin Murphy @ 2025-12-09 11:50 UTC (permalink / raw)
  To: Sebastian Ott, linux-nvme, iommu, linux-block, linux-kernel,
	linux-xfs
  Cc: Jens Axboe, Christoph Hellwig, Will Deacon, Carlos Maiolino

On 2025-12-09 11:43 am, Sebastian Ott wrote:
> Hi,
> 
> got the following warning after a kernel update on Thursday, leading to a
> panic and fs corruption. I didn't capture the first warning but I'm pretty
> sure it was the same. It's reproducible but I didn't bisect since it
> borked my fs. The only hint I can give is that v6.18 worked. Is this a
> known issue? Anything I should try?

nvme_unmap_data() is attempting to unmap an IOVA that was never mapped, 
or has already been unmapped by someone else. That's a usage bug.

Thanks,
Robin.




* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
  2025-12-09 11:50 ` Robin Murphy
@ 2025-12-09 17:29   ` Chaitanya Kulkarni
  2025-12-09 17:34     ` Robin Murphy
  2025-12-09 21:05   ` Sebastian Ott
  1 sibling, 1 reply; 16+ messages in thread
From: Chaitanya Kulkarni @ 2025-12-09 17:29 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Jens Axboe, Christoph Hellwig, Will Deacon, Carlos Maiolino,
	iommu@lists.linux.dev, linux-xfs@vger.kernel.org,
	linux-nvme@lists.infradead.org, Sebastian Ott,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org

On 12/9/25 03:50, Robin Murphy wrote:
> On 2025-12-09 11:43 am, Sebastian Ott wrote:
>> Hi,
>>
>> got the following warning after a kernel update on Thursday, leading
>> to a
>> panic and fs corruption. I didn't capture the first warning but I'm 
>> pretty
>> sure it was the same. It's reproducible but I didn't bisect since it
>> borked my fs. The only hint I can give is that v6.18 worked. Is this a
>> known issue? Anything I should try?
>
> nvme_unmap_data() is attempting to unmap an IOVA that was never 
> mapped, or has already been unmapped by someone else. That's a usage bug.
>
> Thanks,
> Robin.

Ankit A. also reported this.

Apart from unmapping, do we by any chance need this?

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index e6626004b323..05d63fe92e43 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -637,7 +637,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
  	pte = READ_ONCE(*ptep);
  	if (!pte) {
  		WARN_ON(!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_NO_WARN));
-		return -ENOENT;
+		return 0;
  	}
  
  	/* If the size matches this level, we're in the right place */
-- 
2.40.0

Disclaimer:

THIS PATCH IS COMPLETELY UNTESTED AND MAY BE INCORRECT.
PLEASE REVIEW CAREFULLY.

-ck




* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
  2025-12-09 17:29   ` Chaitanya Kulkarni
@ 2025-12-09 17:34     ` Robin Murphy
  2025-12-09 17:59       ` Chaitanya Kulkarni
  0 siblings, 1 reply; 16+ messages in thread
From: Robin Murphy @ 2025-12-09 17:34 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Jens Axboe, Christoph Hellwig, Will Deacon, Carlos Maiolino,
	iommu@lists.linux.dev, linux-xfs@vger.kernel.org,
	linux-nvme@lists.infradead.org, Sebastian Ott,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org

On 2025-12-09 5:29 pm, Chaitanya Kulkarni wrote:
> On 12/9/25 03:50, Robin Murphy wrote:
>> On 2025-12-09 11:43 am, Sebastian Ott wrote:
>>> Hi,
>>>
>> got the following warning after a kernel update on Thursday, leading
>>> to a
>>> panic and fs corruption. I didn't capture the first warning but I'm
>>> pretty
>>> sure it was the same. It's reproducible but I didn't bisect since it
>>> borked my fs. The only hint I can give is that v6.18 worked. Is this a
>>> known issue? Anything I should try?
>>
>> nvme_unmap_data() is attempting to unmap an IOVA that was never
>> mapped, or has already been unmapped by someone else. That's a usage bug.
>>
>> Thanks,
>> Robin.
> 
> Ankit A. also reported this.
> 
> Apart from unmapping, do we by any chance need this?
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index e6626004b323..05d63fe92e43 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -637,7 +637,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>    	pte = READ_ONCE(*ptep);
>    	if (!pte) {
>    		WARN_ON(!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_NO_WARN));
> -		return -ENOENT;
> +		return 0;
>    	}
>    
>    	/* If the size matches this level, we're in the right place */

Oh, indeed - I also happened to notice that the other week and was 
intending to write up a fix, but apparently I completely forgot about it 
already :(

If you're happy to write that up and send a proper patch, please do - 
otherwise I'll try to get it done before I forget again...

Thanks,
Robin.


* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
  2025-12-09 17:34     ` Robin Murphy
@ 2025-12-09 17:59       ` Chaitanya Kulkarni
  0 siblings, 0 replies; 16+ messages in thread
From: Chaitanya Kulkarni @ 2025-12-09 17:59 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Jens Axboe, Christoph Hellwig, Will Deacon, Carlos Maiolino,
	iommu@lists.linux.dev, linux-xfs@vger.kernel.org,
	linux-nvme@lists.infradead.org, Sebastian Ott,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org

On 12/9/25 09:34, Robin Murphy wrote:
> On 2025-12-09 5:29 pm, Chaitanya Kulkarni wrote:
>> On 12/9/25 03:50, Robin Murphy wrote:
>>> On 2025-12-09 11:43 am, Sebastian Ott wrote:
>>>> Hi,
>>>>
>>>> got the following warning after a kernel update on Thursday, leading
>>>> to a
>>>> panic and fs corruption. I didn't capture the first warning but I'm
>>>> pretty
>>>> sure it was the same. It's reproducible but I didn't bisect since it
>>>> borked my fs. The only hint I can give is that v6.18 worked. Is this a
>>>> known issue? Anything I should try?
>>>
>>> nvme_unmap_data() is attempting to unmap an IOVA that was never
>>> mapped, or has already been unmapped by someone else. That's a usage 
>>> bug.
>>>
>>> Thanks,
>>> Robin.
>>
>> Ankit A. also reported this.
>>
>> Apart from unmapping, do we by any chance need this?
>>
>> diff --git a/drivers/iommu/io-pgtable-arm.c 
>> b/drivers/iommu/io-pgtable-arm.c
>> index e6626004b323..05d63fe92e43 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -637,7 +637,7 @@ static size_t __arm_lpae_unmap(struct 
>> arm_lpae_io_pgtable *data,
>>        pte = READ_ONCE(*ptep);
>>        if (!pte) {
>>            WARN_ON(!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_NO_WARN));
>> -        return -ENOENT;
>> +        return 0;
>>        }
>>           /* If the size matches this level, we're in the right place */
>
> Oh, indeed - I also happened to notice that the other week and was 
> intending to write up a fix, but apparently I completely forgot about 
> it already :(
>
> If you're happy to write that up and send a proper patch, please do - 
> otherwise I'll try to get it done before I forget again...
>
> Thanks,
> Robin.

sounds good, I'll send a patch and continue debugging the
problem further.

-ck




* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
  2025-12-09 11:50 ` Robin Murphy
  2025-12-09 17:29   ` Chaitanya Kulkarni
@ 2025-12-09 21:05   ` Sebastian Ott
  2025-12-10  2:30     ` Chaitanya Kulkarni
  1 sibling, 1 reply; 16+ messages in thread
From: Sebastian Ott @ 2025-12-09 21:05 UTC (permalink / raw)
  To: Robin Murphy
  Cc: linux-nvme, iommu, linux-block, linux-kernel, linux-xfs,
	Jens Axboe, Christoph Hellwig, Will Deacon, Carlos Maiolino

On Tue, 9 Dec 2025, Robin Murphy wrote:
> On 2025-12-09 11:43 am, Sebastian Ott wrote:
>>  Hi,
>>
>>  got the following warning after a kernel update on Thursday, leading to a
>>  panic and fs corruption. I didn't capture the first warning but I'm pretty
>>  sure it was the same. It's reproducible but I didn't bisect since it
>>  borked my fs. The only hint I can give is that v6.18 worked. Is this a
>>  known issue? Anything I should try?
>
> nvme_unmap_data() is attempting to unmap an IOVA that was never mapped, or 
> has already been unmapped by someone else. That's a usage bug.

OK, that's what I suspected - thanks for the confirmation!

I did another repro and tried:

good: 44fc84337b6e Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
bad:  cc25df3e2e22 Merge tag 'for-6.19/block-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

I'll start bisecting between these 2 - hoping it doesn't fork up my root
fs again...

Thanks,
Sebastian



* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
  2025-12-09 21:05   ` Sebastian Ott
@ 2025-12-10  2:30     ` Chaitanya Kulkarni
  2025-12-10  4:05       ` Keith Busch
  0 siblings, 1 reply; 16+ messages in thread
From: Chaitanya Kulkarni @ 2025-12-10  2:30 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: linux-nvme@lists.infradead.org, iommu@lists.linux.dev,
	Robin Murphy, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
	Jens Axboe, Christoph Hellwig, Will Deacon, Carlos Maiolino

Sebastian,

On 12/9/25 13:05, Sebastian Ott wrote:
> On Tue, 9 Dec 2025, Robin Murphy wrote:
>> On 2025-12-09 11:43 am, Sebastian Ott wrote:
>>>  Hi,
>>>
>>>  got the following warning after a kernel update on Thursday,
>>> leading to a
>>>  panic and fs corruption. I didn't capture the first warning but I'm 
>>> pretty
>>>  sure it was the same. It's reproducible but I didn't bisect since it
>>>  borked my fs. The only hint I can give is that v6.18 worked. Is this a
>>>  known issue? Anything I should try?
>>
>> nvme_unmap_data() is attempting to unmap an IOVA that was never 
>> mapped, or has already been unmapped by someone else. That's a usage 
>> bug.
>
> OK, that's what I suspected - thanks for the confirmation!
>
> I did another repro and tried:
>
> good: 44fc84337b6e Merge tag 'arm64-upstream' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
> bad:  cc25df3e2e22 Merge tag 'for-6.19/block-20251201' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
>
> I'll start bisecting between these 2 - hoping it doesn't fork up my root
> fs again...
>
> Thanks,
> Sebastian
>
>
Can you see if this fixes your problem?


==========
WARNING/DISCLOSURE:

These patches may cause system instability or crashes during testing.
Test only on non-production systems with proper backups in place.
==========


 From 0d180e8055e98d91174ba8fdd47ab934a7a88bef Mon Sep 17 00:00:00 2001
From: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Date: Tue, 9 Dec 2025 01:23:51 -0800
Subject: [PATCH 1/2 COMPILE TESTED ONLY] iommu/io-pgtable-arm: fix size_t signedness bug in
  unmap path

__arm_lpae_unmap() returns size_t but was returning -ENOENT (negative
error code) when encountering an unmapped PTE. Since size_t is unsigned,
-ENOENT (typically -2) becomes a huge positive value (0xFFFFFFFFFFFFFFFE
on 64-bit systems).

This corrupted value propagates through the call chain:
   __arm_lpae_unmap() returns -ENOENT as size_t
   -> arm_lpae_unmap_pages() returns it
   -> __iommu_unmap() adds it to iova address
   -> iommu_pgsize() triggers BUG_ON due to corrupted iova

The corruption causes:
1. IOVA address overflow in __iommu_unmap() loop
2. BUG_ON in iommu_pgsize() from invalid address alignment
3. Kernel panic on ARM64 systems with SMMU

Fix by returning 0 instead of -ENOENT. The WARN_ON already signals
the error condition, and returning 0 (meaning "nothing unmapped")
is the correct semantic for size_t return type. This matches the
behavior of other io-pgtable implementations (io-pgtable-arm-v7s,
io-pgtable-dart) which return 0 on error conditions.

Kernel splat observed:

  ------------[ cut here ]------------
  kernel BUG at drivers/iommu/iommu.c:2464!
  Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
  Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
   nf_reject_ipv4 xt_tcpudp nft_compat x_tables nft_chain_nat nf_nat
   nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge
   stp llc nvme_fabrics binfmt_misc nls_iso8859_1 ipmi_ssif arm_smmuv3_pmu
   cdc_subset arm_spe_pmu spi_nor acpi_power_meter acpi_ipmi ipmi_devintf
   cppc_cpufreq ipmi_msghandler sch_fq_codel dm_multipath scsi_dh_rdac
   scsi_dh_emc scsi_dh_alua arm_cspmu_module efi_pstore autofs4 btrfs
   blake2b libblake2b raid10 raid456 async_raid6_recov async_memcpy
   async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 mlx5_ib
   ib_uverbs ib_core cdc_ether usbnet mlx5_core ghash_ce dax_hmem sm4_ce_gcm
   ast cxl_acpi sm4_ce_ccm drm_shmem_helper sm4_ce cxl_port drm_client_lib
   i2c_smbus sm4_ce_cipher mlxfw drm_kms_helper cxl_core sm4 nvme psample
   igb sm3_ce arm_smccc_trng einj drm nvme_core i2c_algo_bit xhci_pci_renesas
   tls i2c_tegra aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
  CPU: 26 UID: 0 PID: 0 Comm: swapper/26 Tainted: G        W          6.19.0+ #98
  Tainted: [W]=WARN
  Hardware name: NVIDIA GB200 NVL/P3809-BMC, BIOS 02.05.10 20251010
  pstate: 234000c9 (nzCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
  pc : iommu_pgsize.isra.0+0xe8/0xf8
  lr : __iommu_unmap+0xe0/0x308
  sp : ffff80008034fca0
  x29: ffff80008034fca0 x28: 000000000000fffe x27: ffffc6e7950e60b0
  x26: 00000000f9740000 x25: ffffc6e794b2cde8 x24: ffffc6e7967916a8
  x23: ffff80008034fdb8 x22: 0000000000030000 x21: ffff000030949220
  x20: 00000000f974fffe x19: fffffffffffffffe x18: 0000000000000000
  x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
  x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
  x11: 000000000000003f x10: 0000000020010000 x9 : 00000000f974fffe
  x8 : 000000000000003f x7 : 0000000000000000 x6 : ffffffffffffffff
  x5 : 0000000000000000 x4 : ffff80008034fd00 x3 : 0000000000020002
  x2 : 00000000f974fffe x1 : 00000000f974fffe x0 : 0000000020010000
  Call trace:
   iommu_pgsize.isra.0+0xe8/0xf8 (P)
   iommu_unmap_fast+0x18/0x40
   __iommu_dma_iova_unlink+0xec/0x2e8
   dma_iova_destroy+0x30/0xa0
   nvme_unmap_data+0x200/0x2e8 [nvme]
   nvme_pci_complete_batch+0x58/0xa8 [nvme]
   nvme_irq+0x98/0xa8 [nvme]
   __handle_irq_event_percpu+0xbc/0x498
   handle_irq_event+0x54/0xe0
   handle_fasteoi_irq+0x12c/0x1c8
   handle_irq_desc+0x54/0x90
   generic_handle_domain_irq+0x24/0x48
   gic_handle_irq+0x200/0x410
   call_on_irq_stack+0x30/0x48
   do_interrupt_handler+0xa8/0xb8
   el1_interrupt+0x4c/0xd0
   el1h_64_irq_handler+0x18/0x38
   el1h_64_irq+0x84/0x88
   cpuidle_enter_state+0x110/0x6a8 (P)
   cpuidle_enter+0x40/0x70
   do_idle+0x264/0x310
   cpu_startup_entry+0x3c/0x50
   secondary_start_kernel+0x13c/0x180
   __secondary_switched+0xc0/0xc8
  Code: d2800009 d280000a d280000b d65f03c0 (d4210000)
  ---[ end trace 0000000000000000 ]---

Fixes: 3318f7b5cefb ("iommu/io-pgtable-arm: Add quirk to quiet WARN_ON()")
Cc: stable@vger.kernel.org
Reported-by: Ankit Agrawal <ankita@nvidia.com>
Reported-by: Sebastian Ott <sebott@redhat.com>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---

=======================================================================
DISCLOSURE: DUE TO LACK OF H/W THIS PATCH IS COMPLETELY UNTESTED AND
BASED SOLELY ON THEORETICAL ANALYSIS. PLEASE REVIEW CAREFULLY.
=======================================================================

---
  drivers/iommu/io-pgtable-arm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index e6626004b323..05d63fe92e43 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -637,7 +637,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
  	pte = READ_ONCE(*ptep);
  	if (!pte) {
  		WARN_ON(!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_NO_WARN));
-		return -ENOENT;
+		return 0;
  	}
  
  	/* If the size matches this level, we're in the right place */
-- 
2.40.0

######################################################################


 From aa540bb77f7d4460c87b0a317df264de748a3b3c Mon Sep 17 00:00:00 2001
From: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Date: Tue, 9 Dec 2025 17:01:15 -0800
Subject: [PATCH 2/2 COMPILE TESTED ONLY] block: fix partial IOVA mapping cleanup in
  blk_rq_dma_map_iova

When dma_iova_link() fails partway through mapping a request's
scatter-gather list, the function would break out of the loop without
cleaning up the already-mapped portions. This leads to a map/unmap
size mismatch when the request is later completed.

The completion path (via dma_iova_destroy or nvme_unmap_data) attempts
to unmap the full expected size, but only a partial size was actually
mapped. This triggers "unmapped PTE" warnings in the ARM LPAE io-pgtable
code and can cause IOVA address corruption.

Fix by adding an out_unlink error path that calls dma_iova_unlink()
to clean up any partial mapping before returning failure. This ensures
that when an error occurs:
1. All partially-mapped IOVA ranges are properly unmapped
2. The completion path won't attempt to unmap non-existent mappings
3. No map/unmap size mismatch occurs

Fixes: 858299dc6160 ("block: add scatterlist-less DMA mapping helpers")
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---

=======================================================================
DISCLOSURE: DUE TO LACK OF H/W THIS PATCH IS COMPLETELY UNTESTED AND
BASED SOLELY ON THEORETICAL ANALYSIS. PLEASE REVIEW CAREFULLY.
=======================================================================

---
  block/blk-mq-dma.c | 19 ++++++++++++++-----
  1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index b6dbc9767596..eb8b5b6b595c 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -126,17 +126,26 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
  		error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
  				vec->len, dir, attrs);
  		if (error)
-			break;
+			goto out_unlink;
  		mapped += vec->len;
  	} while (blk_map_iter_next(req, &iter->iter, vec));
  
  	error = dma_iova_sync(dma_dev, state, 0, mapped);
-	if (error) {
-		iter->status = errno_to_blk_status(error);
-		return false;
-	}
+	if (error)
+		goto out_unlink;
  
  	return true;
+
+out_unlink:
+	/*
+	 * Unlink any partial mapping to avoid unmap mismatch later.
+	 * If we mapped some bytes but not all, we must clean up now
+	 * to prevent attempting to unmap more than was actually mapped.
+	 */
+	if (mapped)
+		dma_iova_unlink(dma_dev, state, 0, mapped, dir, attrs);
+	iter->status = errno_to_blk_status(error);
+	return false;
  }
  
  static inline void blk_rq_map_iter_init(struct request *rq,
-- 
2.40.0


-ck




* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
  2025-12-10  2:30     ` Chaitanya Kulkarni
@ 2025-12-10  4:05       ` Keith Busch
  2025-12-10  4:59         ` Chaitanya Kulkarni
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2025-12-10  4:05 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Sebastian Ott, linux-nvme@lists.infradead.org,
	iommu@lists.linux.dev, Robin Murphy, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
	Jens Axboe, Christoph Hellwig, Will Deacon, Carlos Maiolino

On Wed, Dec 10, 2025 at 02:30:50AM +0000, Chaitanya Kulkarni wrote:
> @@ -126,17 +126,26 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
>   		error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
>   				vec->len, dir, attrs);
>   		if (error)
> -			break;
> +			goto out_unlink;
>   		mapped += vec->len;
>   	} while (blk_map_iter_next(req, &iter->iter, vec));
>   
>   	error = dma_iova_sync(dma_dev, state, 0, mapped);
> -	if (error) {
> -		iter->status = errno_to_blk_status(error);
> -		return false;
> -	}
> +	if (error)
> +		goto out_unlink;
>   
>   	return true;
> +
> +out_unlink:
> +	/*
> +	 * Unlink any partial mapping to avoid unmap mismatch later.
> +	 * If we mapped some bytes but not all, we must clean up now
> +	 * to prevent attempting to unmap more than was actually mapped.
> +	 */
> +	if (mapped)
> +		dma_iova_unlink(dma_dev, state, 0, mapped, dir, attrs);
> +	iter->status = errno_to_blk_status(error);
> +	return false;
>   }

It does look like a bug to continue on when dma_iova_link() fails as the
caller thinks the entire mapping was successful, but I think you also
need to call dma_iova_free() to undo the earlier dma_iova_try_alloc(),
otherwise iova space is leaked.

I'm a bit doubtful this error condition was hit though: this sequence
is largely the same as it was in v6.18 before the regression. The only
difference since then should just be for handling P2P DMA across a host
bridge, which I don't think applies to the reported bug since that's a
pretty unusual thing to do.


* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
  2025-12-10  4:05       ` Keith Busch
@ 2025-12-10  4:59         ` Chaitanya Kulkarni
  2025-12-10 17:12           ` Sebastian Ott
  0 siblings, 1 reply; 16+ messages in thread
From: Chaitanya Kulkarni @ 2025-12-10  4:59 UTC (permalink / raw)
  To: Keith Busch, Sebastian Ott
  Cc: linux-nvme@lists.infradead.org, iommu@lists.linux.dev,
	Robin Murphy, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
	Jens Axboe, Christoph Hellwig, Will Deacon, Carlos Maiolino,
	Leon Romanovsky

(+ Leon Romanovsky)

On 12/9/25 20:05, Keith Busch wrote:
> On Wed, Dec 10, 2025 at 02:30:50AM +0000, Chaitanya Kulkarni wrote:
>> @@ -126,17 +126,26 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
>>    		error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
>>    				vec->len, dir, attrs);
>>    		if (error)
>> -			break;
>> +			goto out_unlink;
>>    		mapped += vec->len;
>>    	} while (blk_map_iter_next(req, &iter->iter, vec));
>>    
>>    	error = dma_iova_sync(dma_dev, state, 0, mapped);
>> -	if (error) {
>> -		iter->status = errno_to_blk_status(error);
>> -		return false;
>> -	}
>> +	if (error)
>> +		goto out_unlink;
>>    
>>    	return true;
>> +
>> +out_unlink:
>> +	/*
>> +	 * Unlink any partial mapping to avoid unmap mismatch later.
>> +	 * If we mapped some bytes but not all, we must clean up now
>> +	 * to prevent attempting to unmap more than was actually mapped.
>> +	 */
>> +	if (mapped)
>> +		dma_iova_unlink(dma_dev, state, 0, mapped, dir, attrs);
>> +	iter->status = errno_to_blk_status(error);
>> +	return false;
>>    }
> It does look like a bug to continue on when dma_iova_link() fails as the
> caller thinks the entire mapping was successful, but I think you also
> need to call dma_iova_free() to undo the earlier dma_iova_try_alloc(),
> otherwise iova space is leaked.

Thanks for catching that; see the updated version of this patch [1].

> I'm a bit doubtful this error condition was hit though: this sequence
> is largely the same as it was in v6.18 before the regression. The only
> difference since then should just be for handling P2P DMA across a host
> bridge, which I don't think applies to the reported bug since that's a
> pretty unusual thing to do.

That's why I've asked the reporter to test it.

Either way, IMO both of the patches are still needed.

-ck

[1]

 From 726687876a334cb699247584102e491e98f8fdc4 Mon Sep 17 00:00:00 2001
From: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Date: Tue, 9 Dec 2025 17:01:15 -0800
Subject: [PATCH 2/2] block: fix partial IOVA mapping cleanup in
  blk_rq_dma_map_iova

When dma_iova_link() fails partway through mapping a request's
scatter-gather list, the function breaks out of the loop without
cleaning up the already-mapped portions. This leads to a map/unmap
size mismatch when the request is later completed.

The completion path (via dma_iova_destroy or nvme_unmap_data) attempts
to unmap the full expected size, but only a partial size was actually
mapped. This triggers "unmapped PTE" warnings in the ARM LPAE io-pgtable
code and can cause IOVA address corruption.

Fix by adding an out_unlink error path that calls dma_iova_unlink()
to clean up any partial mapping before returning failure. This ensures
that when an error occurs:
1. All partially-mapped IOVA ranges are properly unmapped
2. The completion path won't attempt to unmap non-existent mappings
3. No map/unmap size mismatch occurs

Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
---
  block/blk-mq-dma.c | 21 ++++++++++++++++-----
  1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index b6dbc9767596..ecfd53ed6984 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -126,17 +126,28 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
  		error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
  				vec->len, dir, attrs);
  		if (error)
-			break;
+			goto out_unlink;
  		mapped += vec->len;
  	} while (blk_map_iter_next(req, &iter->iter, vec));
  
  	error = dma_iova_sync(dma_dev, state, 0, mapped);
-	if (error) {
-		iter->status = errno_to_blk_status(error);
-		return false;
-	}
+	if (error)
+		goto out_unlink;
  
  	return true;
+
+out_unlink:
+	/*
+	 * Clean up partial mapping and free the entire IOVA reservation.
+	 * dma_iova_unlink() detaches any linked bytes, dma_iova_free()
+	 * returns the full IOVA window allocated by dma_iova_try_alloc()
+	 * (state->__size tracks the original allocation size).
+	 */
+	if (mapped)
+		dma_iova_unlink(dma_dev, state, 0, mapped, dir, attrs);
+	dma_iova_free(dma_dev, state);
+	iter->status = errno_to_blk_status(error);
+	return false;
  }
  
  static inline void blk_rq_map_iter_init(struct request *rq,
-- 
2.40.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
  2025-12-09 11:43 WARNING: drivers/iommu/io-pgtable-arm.c:639 Sebastian Ott
  2025-12-09 11:50 ` Robin Murphy
@ 2025-12-10  5:02 ` Keith Busch
  2025-12-10  5:33   ` Keith Busch
  2025-12-10 11:08   ` Sebastian Ott
  1 sibling, 2 replies; 16+ messages in thread
From: Keith Busch @ 2025-12-10  5:02 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: linux-nvme, iommu, linux-block, linux-kernel, linux-xfs,
	Jens Axboe, Christoph Hellwig, Will Deacon, Robin Murphy,
	Carlos Maiolino

On Tue, Dec 09, 2025 at 12:43:31PM +0100, Sebastian Ott wrote:
> got the following warning after a kernel update on Thurstday, leading to a
> panic and fs corruption. I didn't capture the first warning but I'm pretty
> sure it was the same. It's reproducible but I didn't bisect since it
> borked my fs. The only hint I can give is that v6.18 worked. Is this a
> known issue? Anything I should try?

Could you check if your nvme device supports SGLs? There are some new
features in 6.19 that allow IO merges that couldn't have happened
before. You can check from the command line:

  # nvme id-ctrl /dev/nvme0 | grep sgl

Replace "nvme0" with whatever your instance was named if it's not using
the 0 suffix.

What I think happened is that at one point you had an IO that could be
coalesced in IOVA space; that request completed and its structure was
later reused. The new request merged bios that could not be coalesced,
and the problem is that we never reinitialize the IOVA state, so we end
up using the old context. If that is what's happening, here's a quick
fix:

---
diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index e9108ccaf4b06..7bff480d666e2 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -199,6 +199,7 @@ static bool blk_dma_map_iter_start(struct request *req, struct device *dma_dev,
 	if (blk_can_dma_map_iova(req, dma_dev) &&
 	    dma_iova_try_alloc(dma_dev, state, vec.paddr, total_len))
 		return blk_rq_dma_map_iova(req, dma_dev, state, iter, &vec);
+	state->__size = 0;
 	return blk_dma_map_direct(req, dma_dev, iter, &vec);
 }

--

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
  2025-12-10  5:02 ` Keith Busch
@ 2025-12-10  5:33   ` Keith Busch
  2025-12-10 11:08   ` Sebastian Ott
  1 sibling, 0 replies; 16+ messages in thread
From: Keith Busch @ 2025-12-10  5:33 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: linux-nvme, iommu, linux-block, linux-kernel, linux-xfs,
	Jens Axboe, Christoph Hellwig, Will Deacon, Robin Murphy,
	Carlos Maiolino

On Wed, Dec 10, 2025 at 02:02:43PM +0900, Keith Busch wrote:
> On Tue, Dec 09, 2025 at 12:43:31PM +0100, Sebastian Ott wrote:
> > got the following warning after a kernel update on Thurstday, leading to a
> > panic and fs corruption. I didn't capture the first warning but I'm pretty
> > sure it was the same. It's reproducible but I didn't bisect since it
> > borked my fs. The only hint I can give is that v6.18 worked. Is this a
> > known issue? Anything I should try?
> 
> Could you check if your nvme device supports SGLs? There are some new
> features in 6.19 that would allow merging IO that wouldn't have happened
> before. You can check from command line:

Actually, SGL support is probably unnecessary on arm64 if your IOMMU
granule is 64k. That setup could also lead to an uninitialized
"state" and the type of corruption you're observing. But the same patch
below is still the proposed fix either way.
 
> ---
> diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
> index e9108ccaf4b06..7bff480d666e2 100644
> --- a/block/blk-mq-dma.c
> +++ b/block/blk-mq-dma.c
> @@ -199,6 +199,7 @@ static bool blk_dma_map_iter_start(struct request *req, struct device *dma_dev,
>  	if (blk_can_dma_map_iova(req, dma_dev) &&
>  	    dma_iova_try_alloc(dma_dev, state, vec.paddr, total_len))
>  		return blk_rq_dma_map_iova(req, dma_dev, state, iter, &vec);
> +	state->__size = 0;
>  	return blk_dma_map_direct(req, dma_dev, iter, &vec);
>  }
> 
> --
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
  2025-12-10  5:02 ` Keith Busch
  2025-12-10  5:33   ` Keith Busch
@ 2025-12-10 11:08   ` Sebastian Ott
  2025-12-10 11:21     ` Keith Busch
  1 sibling, 1 reply; 16+ messages in thread
From: Sebastian Ott @ 2025-12-10 11:08 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, iommu, linux-block, linux-kernel, linux-xfs,
	Jens Axboe, Christoph Hellwig, Will Deacon, Robin Murphy,
	Carlos Maiolino

On Wed, 10 Dec 2025, Keith Busch wrote:
> On Tue, Dec 09, 2025 at 12:43:31PM +0100, Sebastian Ott wrote:
>> got the following warning after a kernel update on Thurstday, leading to a
>> panic and fs corruption. I didn't capture the first warning but I'm pretty
>> sure it was the same. It's reproducible but I didn't bisect since it
>> borked my fs. The only hint I can give is that v6.18 worked. Is this a
>> known issue? Anything I should try?
>
> Could you check if your nvme device supports SGLs? There are some new
> features in 6.19 that would allow merging IO that wouldn't have happened
> before. You can check from command line:
>
>  # nvme id-ctrl /dev/nvme0 | grep sgl

# nvme id-ctrl /dev/nvme0n1 | grep sgl
sgls      : 0xf0002

Thanks,
Sebastian


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
  2025-12-10 11:08   ` Sebastian Ott
@ 2025-12-10 11:21     ` Keith Busch
  2025-12-10 16:57       ` Sebastian Ott
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2025-12-10 11:21 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: linux-nvme, iommu, linux-block, linux-kernel, linux-xfs,
	Jens Axboe, Christoph Hellwig, Will Deacon, Robin Murphy,
	Carlos Maiolino

On Wed, Dec 10, 2025 at 12:08:36PM +0100, Sebastian Ott wrote:
> On Wed, 10 Dec 2025, Keith Busch wrote:
> > On Tue, Dec 09, 2025 at 12:43:31PM +0100, Sebastian Ott wrote:
> > > got the following warning after a kernel update on Thurstday, leading to a
> > > panic and fs corruption. I didn't capture the first warning but I'm pretty
> > > sure it was the same. It's reproducible but I didn't bisect since it
> > > borked my fs. The only hint I can give is that v6.18 worked. Is this a
> > > known issue? Anything I should try?
> > 
> > Could you check if your nvme device supports SGLs? There are some new
> > features in 6.19 that would allow merging IO that wouldn't have happened
> > before. You can check from command line:
> > 
> >  # nvme id-ctrl /dev/nvme0 | grep sgl
> 
> # nvme id-ctrl /dev/nvme0n1 | grep sgl
> sgls      : 0xf0002

Oh neat, so you *do* support SGL. Not that it was required: arm64 can
have IOMMU granularities larger than the NVMe PRP unit, so you could
have hit the bug either way (assuming the SMMU was configured with a
64k IO page size).

Anyway, thanks for the report, and sorry for the fs trouble the bug
caused you. I'm working on a blktest to specifically target this
condition so we don't regress again. I just need to make sure to run it
on a system with iommu enabled (usually it's off on my test machine).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
  2025-12-10 11:21     ` Keith Busch
@ 2025-12-10 16:57       ` Sebastian Ott
  0 siblings, 0 replies; 16+ messages in thread
From: Sebastian Ott @ 2025-12-10 16:57 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, iommu, linux-block, linux-kernel, linux-xfs,
	Jens Axboe, Christoph Hellwig, Will Deacon, Robin Murphy,
	Carlos Maiolino

On Wed, 10 Dec 2025, Keith Busch wrote:
> On Wed, Dec 10, 2025 at 12:08:36PM +0100, Sebastian Ott wrote:
>> On Wed, 10 Dec 2025, Keith Busch wrote:
>>> On Tue, Dec 09, 2025 at 12:43:31PM +0100, Sebastian Ott wrote:
>>>> got the following warning after a kernel update on Thurstday, leading to a
>>>> panic and fs corruption. I didn't capture the first warning but I'm pretty
>>>> sure it was the same. It's reproducible but I didn't bisect since it
>>>> borked my fs. The only hint I can give is that v6.18 worked. Is this a
>>>> known issue? Anything I should try?
>>>
>>> Could you check if your nvme device supports SGLs? There are some new
>>> features in 6.19 that would allow merging IO that wouldn't have happened
>>> before. You can check from command line:
>>>
>>>  # nvme id-ctrl /dev/nvme0 | grep sgl
>>
>> # nvme id-ctrl /dev/nvme0n1 | grep sgl
>> sgls      : 0xf0002
>
> Oh neat, so you *do* support SGL. Not that it was required as arm64
> can support iommu granularities larger than the NVMe PRP unit, so the
> bug was possible to hit in either case for you (assuming the smmu was
> configured with 64k io page size).
>
> Anyway, thanks for the report, and sorry for the fs trouble the bug
> caused you.

No worries, it was a test system in need of an upgrade anyway.
Thanks for the quick fix!

> I'm working on a blktest to specifically target this
> condition so we don't regress again. I just need to make sure to run it
> on a system with iommu enabled (usually it's off on my test machine).

Great!


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
  2025-12-10  4:59         ` Chaitanya Kulkarni
@ 2025-12-10 17:12           ` Sebastian Ott
  2025-12-10 21:12             ` Chaitanya Kulkarni
  0 siblings, 1 reply; 16+ messages in thread
From: Sebastian Ott @ 2025-12-10 17:12 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Keith Busch, linux-nvme@lists.infradead.org,
	iommu@lists.linux.dev, Robin Murphy, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
	Jens Axboe, Christoph Hellwig, Will Deacon, Carlos Maiolino,
	Leon Romanovsky

On Wed, 10 Dec 2025, Chaitanya Kulkarni wrote:
> (+ Leon Romanovsky)
>
> On 12/9/25 20:05, Keith Busch wrote:
>> On Wed, Dec 10, 2025 at 02:30:50AM +0000, Chaitanya Kulkarni wrote:
>>> @@ -126,17 +126,26 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
>>>    		error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
>>>    				vec->len, dir, attrs);
>>>    		if (error)
>>> -			break;
>>> +			goto out_unlink;
>>>    		mapped += vec->len;
>>>    	} while (blk_map_iter_next(req, &iter->iter, vec));
>>>
>>>    	error = dma_iova_sync(dma_dev, state, 0, mapped);
>>> -	if (error) {
>>> -		iter->status = errno_to_blk_status(error);
>>> -		return false;
>>> -	}
>>> +	if (error)
>>> +		goto out_unlink;
>>>
>>>    	return true;
>>> +
>>> +out_unlink:
>>> +	/*
>>> +	 * Unlink any partial mapping to avoid unmap mismatch later.
>>> +	 * If we mapped some bytes but not all, we must clean up now
>>> +	 * to prevent attempting to unmap more than was actually mapped.
>>> +	 */
>>> +	if (mapped)
>>> +		dma_iova_unlink(dma_dev, state, 0, mapped, dir, attrs);
>>> +	iter->status = errno_to_blk_status(error);
>>> +	return false;
>>>    }
>> It does look like a bug to continue on when dma_iova_link() fails as the
>> caller thinks the entire mapping was successful, but I think you also
>> need to call dma_iova_free() to undo the earlier dma_iova_try_alloc(),
>> otherwise iova space is leaked.
>
> Thanks for catching that, see updated version of this patch [1].
>
>> I'm a bit doubtful this error condition was hit though: this sequence
>> is largely the same as it was in v6.18 before the regression. The only
>> difference since then should just be for handling P2P DMA across a host
>> bridge, which I don't think applies to the reported bug since that's a
>> pretty unusual thing to do.
>
> That's why I've asked reporter to test it.
>
> Either way, IMO both of the patches are still needed.
>

The patch Keith posted fixes the issue for me. Should I do another run
with only these 2 applied?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
  2025-12-10 17:12           ` Sebastian Ott
@ 2025-12-10 21:12             ` Chaitanya Kulkarni
  0 siblings, 0 replies; 16+ messages in thread
From: Chaitanya Kulkarni @ 2025-12-10 21:12 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: Keith Busch, linux-nvme@lists.infradead.org,
	iommu@lists.linux.dev, Robin Murphy, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
	Jens Axboe, Christoph Hellwig, Will Deacon, Carlos Maiolino,
	Leon Romanovsky

On 12/10/25 09:12, Sebastian Ott wrote:
> On Wed, 10 Dec 2025, Chaitanya Kulkarni wrote:
>> (+ Leon Romanovsky)
>>
>> On 12/9/25 20:05, Keith Busch wrote:
>>> On Wed, Dec 10, 2025 at 02:30:50AM +0000, Chaitanya Kulkarni wrote:
>>>> @@ -126,17 +126,26 @@ static bool blk_rq_dma_map_iova(struct 
>>>> request *req, struct device *dma_dev,
>>>>            error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
>>>>                    vec->len, dir, attrs);
>>>>            if (error)
>>>> -            break;
>>>> +            goto out_unlink;
>>>>            mapped += vec->len;
>>>>        } while (blk_map_iter_next(req, &iter->iter, vec));
>>>>
>>>>        error = dma_iova_sync(dma_dev, state, 0, mapped);
>>>> -    if (error) {
>>>> -        iter->status = errno_to_blk_status(error);
>>>> -        return false;
>>>> -    }
>>>> +    if (error)
>>>> +        goto out_unlink;
>>>>
>>>>        return true;
>>>> +
>>>> +out_unlink:
>>>> +    /*
>>>> +     * Unlink any partial mapping to avoid unmap mismatch later.
>>>> +     * If we mapped some bytes but not all, we must clean up now
>>>> +     * to prevent attempting to unmap more than was actually mapped.
>>>> +     */
>>>> +    if (mapped)
>>>> +        dma_iova_unlink(dma_dev, state, 0, mapped, dir, attrs);
>>>> +    iter->status = errno_to_blk_status(error);
>>>> +    return false;
>>>>    }
>>> It does look like a bug to continue on when dma_iova_link() fails as 
>>> the
>>> caller thinks the entire mapping was successful, but I think you also
>>> need to call dma_iova_free() to undo the earlier dma_iova_try_alloc(),
>>> otherwise iova space is leaked.
>>
>> Thanks for catching that, see updated version of this patch [1].
>>
>>> I'm a bit doubtful this error condition was hit though: this sequence
>>> is largely the same as it was in v6.18 before the regression. The only
>>> difference since then should just be for handling P2P DMA across a host
>>> bridge, which I don't think applies to the reported bug since that's a
>>> pretty unusual thing to do.
>>
>> That's why I've asked reporter to test it.
>>
>> Either way, IMO both of the patches are still needed.
>>
>
> The patch Keith posted fixes the issue for me. Should I do another run
> with only these 2 applied?
>
No need for another run; these fixes are needed anyway.

I'll send formal patches for these.

Thanks for reporting this.

-ck

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-12-10 21:12 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-09 11:43 WARNING: drivers/iommu/io-pgtable-arm.c:639 Sebastian Ott
2025-12-09 11:50 ` Robin Murphy
2025-12-09 17:29   ` Chaitanya Kulkarni
2025-12-09 17:34     ` Robin Murphy
2025-12-09 17:59       ` Chaitanya Kulkarni
2025-12-09 21:05   ` Sebastian Ott
2025-12-10  2:30     ` Chaitanya Kulkarni
2025-12-10  4:05       ` Keith Busch
2025-12-10  4:59         ` Chaitanya Kulkarni
2025-12-10 17:12           ` Sebastian Ott
2025-12-10 21:12             ` Chaitanya Kulkarni
2025-12-10  5:02 ` Keith Busch
2025-12-10  5:33   ` Keith Busch
2025-12-10 11:08   ` Sebastian Ott
2025-12-10 11:21     ` Keith Busch
2025-12-10 16:57       ` Sebastian Ott
