* WARNING: drivers/iommu/io-pgtable-arm.c:639
From: Sebastian Ott @ 2025-12-09 11:43 UTC
To: linux-nvme, iommu, linux-block, linux-kernel, linux-xfs
Cc: Jens Axboe, Christoph Hellwig, Will Deacon, Robin Murphy, Carlos Maiolino

Hi,

got the following warning after a kernel update on Thursday, leading to a
panic and fs corruption. I didn't capture the first warning but I'm pretty
sure it was the same. It's reproducible but I didn't bisect since it
borked my fs. The only hint I can give is that v6.18 worked. Is this a
known issue? Anything I should try?

[64906.234244] WARNING: drivers/iommu/io-pgtable-arm.c:639 at __arm_lpae_unmap+0x358/0x3d0, CPU#94: kworker/94:0/494
[64906.234247] Modules linked in: mlx5_ib ib_uverbs ib_core qrtr rfkill sunrpc mlx5_core cdc_eem usbnet mii acpi_ipmi ipmi_ssif ipmi_devintf ipmi_msghandler mlxfw arm_cmn psample arm_spe_pmu arm_dmc620_pmu vfat fat arm_dsu_pmu cppc_cpufreq fuse loop dm_multipath nfnetlink zram xfs nvme mgag200 ghash_ce sbsa_gwdt nvme_core i2c_algo_bit xgene_hwmon scsi_dh_rdac scsi_dh_emc scsi_dh_alua i2c_dev
[64906.234269] CPU: 94 UID: 0 PID: 494 Comm: kworker/94:0 Tainted: G W 6.18.0+ #1 PREEMPT(voluntary)
[64906.234271] Tainted: [W]=WARN
[64906.234271] Hardware name: HPE ProLiant RL300 Gen11/ProLiant RL300 Gen11, BIOS 1.50 12/18/2023
[64906.234272] Workqueue: xfs-buf/nvme1n1p1 xfs_buf_ioend_work [xfs]
[64906.234383] pstate: 804000c9 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[64906.234385] pc : __arm_lpae_unmap+0x358/0x3d0
[64906.234386] lr : __arm_lpae_unmap+0x100/0x3d0
[64906.234387] sp : ffff800083d4bad0
[64906.234388] x29: ffff800083d4bad0 x28: 00000000f3460000 x27: ffff800081bb28e8
[64906.234391] x26: 0000000000001000 x25: ffff800083d4be00 x24: 00000000f3460000
[64906.234393] x23: 0000000000001000 x22: ffff07ff85de9c20 x21: 0000000000000001
[64906.234395] x20: 0000000000000000 x19: ffff07ff9d540300 x18: 0000000000000300
[64906.234398] x17: ffff887cbd289000 x16: ffff800083d48000 x15: 0000000000001000
[64906.234400] x14: 0000000000000fc4 x13: 0000000000000820 x12: 0000000000001000
[64906.234402] x11: 0000000000000006 x10: ffff07ffa1b9c300 x9 : 0000000000000009
[64906.234405] x8 : 0000000000000060 x7 : 000000000000000c x6 : ffff07ffa1b9c000
[64906.234407] x5 : 0000000000000003 x4 : 0000000000000001 x3 : 0000000000001000
[64906.234409] x2 : 0000000000000000 x1 : ffff800083d4be00 x0 : 0000000000000000
[64906.234411] Call trace:
[64906.234412]  __arm_lpae_unmap+0x358/0x3d0 (P)
[64906.234414]  __arm_lpae_unmap+0x100/0x3d0
[64906.234415]  __arm_lpae_unmap+0x100/0x3d0
[64906.234417]  __arm_lpae_unmap+0x100/0x3d0
[64906.234418]  arm_lpae_unmap_pages+0x74/0x90
[64906.234420]  arm_smmu_unmap_pages+0x24/0x40
[64906.234422]  __iommu_unmap+0xe8/0x2a0
[64906.234424]  iommu_unmap_fast+0x18/0x30
[64906.234426]  __iommu_dma_iova_unlink+0xe4/0x280
[64906.234428]  dma_iova_destroy+0x30/0x58
[64906.234431]  nvme_unmap_data+0x88/0x248 [nvme]
[64906.234434]  nvme_poll_cq+0x1d4/0x3e0 [nvme]
[64906.234438]  nvme_irq+0x28/0x70 [nvme]
[64906.234441]  __handle_irq_event_percpu+0x84/0x370
[64906.234444]  handle_irq_event+0x4c/0xb0
[64906.234447]  handle_fasteoi_irq+0x110/0x1a8
[64906.234449]  handle_irq_desc+0x3c/0x68
[64906.234451]  generic_handle_domain_irq+0x24/0x40
[64906.234454]  gic_handle_irq+0x5c/0xe0
[64906.234455]  call_on_irq_stack+0x30/0x48
[64906.234457]  do_interrupt_handler+0xdc/0xe0
[64906.234459]  el1_interrupt+0x38/0x60
[64906.234462]  el1h_64_irq_handler+0x18/0x30
[64906.234464]  el1h_64_irq+0x70/0x78
[64906.234466]  arm_lpae_init_pte+0x228/0x238 (P)
[64906.234467]  __arm_lpae_map+0x2f8/0x378
[64906.234469]  __arm_lpae_map+0x114/0x378
[64906.234470]  __arm_lpae_map+0x114/0x378
[64906.234472]  __arm_lpae_map+0x114/0x378
[64906.234473]  arm_lpae_map_pages+0x108/0x240
[64906.234475]  arm_smmu_map_pages+0x24/0x40
[64906.234477]  iommu_map_nosync+0x124/0x310
[64906.234479]  iommu_map+0x2c/0xb0
[64906.234481]  __iommu_dma_map+0xbc/0x1b0
[64906.234484]  iommu_dma_map_phys+0xf0/0x1c0
[64906.234486]  dma_map_phys+0x190/0x1b0
[64906.234488]  dma_map_page_attrs+0x50/0x70
[64906.234490]  nvme_map_data+0x21c/0x318 [nvme]
[64906.234493]  nvme_prep_rq+0x60/0x200 [nvme]
[64906.234496]  nvme_queue_rq+0x48/0x180 [nvme]
[64906.234499]  blk_mq_dispatch_rq_list+0xfc/0x4d0
[64906.234502]  __blk_mq_sched_dispatch_requests+0xa4/0x1b0
[64906.234504]  blk_mq_sched_dispatch_requests+0x38/0xa0
[64906.234506]  blk_mq_run_hw_queue+0x2f0/0x3d0
[64906.234509]  blk_mq_issue_direct+0x12c/0x280
[64906.234511]  blk_mq_dispatch_queue_requests+0x258/0x318
[64906.234514]  blk_mq_flush_plug_list+0x68/0x170
[64906.234515]  __blk_flush_plug+0xf0/0x140
[64906.234518]  blk_finish_plug+0x34/0x50
[64906.234520]  xfs_buf_submit_bio+0x158/0x1a8 [xfs]
[64906.234630]  xfs_buf_submit+0x80/0x268 [xfs]
[64906.234739]  xfs_buf_ioend_handle_error+0x254/0x480 [xfs]
[64906.234848]  __xfs_buf_ioend+0x18c/0x218 [xfs]
[64906.234957]  xfs_buf_ioend_work+0x24/0x60 [xfs]
[64906.235066]  process_one_work+0x22c/0x658
[64906.235069]  worker_thread+0x1ac/0x360
[64906.235072]  kthread+0x110/0x138
[64906.235074]  ret_from_fork+0x10/0x20
[64906.235075] ---[ end trace 0000000000000000 ]---

Thanks,
Sebastian

* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
From: Robin Murphy @ 2025-12-09 11:50 UTC
To: Sebastian Ott, linux-nvme, iommu, linux-block, linux-kernel, linux-xfs
Cc: Jens Axboe, Christoph Hellwig, Will Deacon, Carlos Maiolino

On 2025-12-09 11:43 am, Sebastian Ott wrote:
> Hi,
>
> got the following warning after a kernel update on Thursday, leading to a
> panic and fs corruption. I didn't capture the first warning but I'm pretty
> sure it was the same. It's reproducible but I didn't bisect since it
> borked my fs. The only hint I can give is that v6.18 worked. Is this a
> known issue? Anything I should try?

nvme_unmap_data() is attempting to unmap an IOVA that was never mapped, or
has already been unmapped by someone else. That's a usage bug.

Thanks,
Robin.

> [64906.234244] WARNING: drivers/iommu/io-pgtable-arm.c:639 at
> __arm_lpae_unmap+0x358/0x3d0, CPU#94: kworker/94:0/494
> [ rest of the splat and call trace quoted in full in the original mail;
>   identical to the report above ]

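Aside: a minimal userspace sketch of the condition behind the WARN at
io-pgtable-arm.c:639 that Robin describes: the unmap walk reads the PTE for
the requested IOVA and finds it already empty. The flat table layout and the
unmap_one() helper below are invented for illustration; only the
"empty PTE on unmap" semantics and the file/line come from the thread.

#include <stdint.h>
#include <stdio.h>

typedef uint64_t pte_t;

/* Walks one (flattened) level and clears the PTE for the given IOVA. */
static size_t unmap_one(pte_t *table, unsigned long iova)
{
	pte_t pte = table[iova >> 12];	/* hypothetical 4K granule */

	if (!pte) {
		/* io-pgtable-arm.c:639 WARNs at this point: the caller is
		 * unmapping an IOVA for which nothing is currently mapped. */
		fprintf(stderr, "unmap of unmapped IOVA %#lx\n", iova);
		return 0;		/* nothing unmapped */
	}
	table[iova >> 12] = 0;
	return 4096;
}

int main(void)
{
	pte_t table[16] = { 0 };

	table[2] = 0xdeadb000 | 3;	/* pretend IOVA 0x2000 is mapped */
	unmap_one(table, 0x2000);	/* fine */
	unmap_one(table, 0x2000);	/* double unmap: the warning path */
	return 0;
}
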
* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
From: Chaitanya Kulkarni @ 2025-12-09 17:29 UTC
To: Robin Murphy
Cc: Jens Axboe, Christoph Hellwig, Will Deacon, Carlos Maiolino,
    iommu@lists.linux.dev, linux-xfs@vger.kernel.org,
    linux-nvme@lists.infradead.org, Sebastian Ott,
    linux-kernel@vger.kernel.org, linux-block@vger.kernel.org

On 12/9/25 03:50, Robin Murphy wrote:
> On 2025-12-09 11:43 am, Sebastian Ott wrote:
>> Hi,
>>
>> got the following warning after a kernel update on Thursday, leading to a
>> panic and fs corruption. I didn't capture the first warning but I'm pretty
>> sure it was the same. It's reproducible but I didn't bisect since it
>> borked my fs. The only hint I can give is that v6.18 worked. Is this a
>> known issue? Anything I should try?
>
> nvme_unmap_data() is attempting to unmap an IOVA that was never mapped,
> or has already been unmapped by someone else. That's a usage bug.
>
> Thanks,
> Robin.

Ankit A. also reported this.

Apart from the unmapping, do we by any chance need this?

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index e6626004b323..05d63fe92e43 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -637,7 +637,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 	pte = READ_ONCE(*ptep);
 	if (!pte) {
 		WARN_ON(!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_NO_WARN));
-		return -ENOENT;
+		return 0;
 	}
 
 	/* If the size matches this level, we're in the right place */
-- 
2.40.0

Disclaimer: THIS PATCH IS COMPLETELY UNTESTED AND MAY BE INCORRECT.
PLEASE REVIEW CAREFULLY.

-ck

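Aside: the patch above hinges on __arm_lpae_unmap() having a size_t return
type. A standalone demonstration of why returning -ENOENT from such a
function misbehaves; this is ordinary userspace C, not kernel code:

#include <errno.h>
#include <stdio.h>
#include <stddef.h>

static size_t before(void) { return -ENOENT; }	/* the removed line */
static size_t after(void)  { return 0; }	/* the patched line */

int main(void)
{
	/* -ENOENT is -2; converted to size_t it wraps to SIZE_MAX - 1. */
	printf("before: %zu (%#zx)\n", before(), before());
	printf("after:  %zu\n", after());
	return 0;
}

Returning 0 ("nothing unmapped") also matches the convention the later
commit message cites for io-pgtable-arm-v7s and io-pgtable-dart.
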
* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
From: Robin Murphy @ 2025-12-09 17:34 UTC
To: Chaitanya Kulkarni
Cc: Jens Axboe, Christoph Hellwig, Will Deacon, Carlos Maiolino,
    iommu@lists.linux.dev, linux-xfs@vger.kernel.org,
    linux-nvme@lists.infradead.org, Sebastian Ott,
    linux-kernel@vger.kernel.org, linux-block@vger.kernel.org

On 2025-12-09 5:29 pm, Chaitanya Kulkarni wrote:
> On 12/9/25 03:50, Robin Murphy wrote:
>> On 2025-12-09 11:43 am, Sebastian Ott wrote:
>>> Hi,
>>>
>>> got the following warning after a kernel update on Thursday, leading to a
>>> panic and fs corruption. I didn't capture the first warning but I'm pretty
>>> sure it was the same. It's reproducible but I didn't bisect since it
>>> borked my fs. The only hint I can give is that v6.18 worked. Is this a
>>> known issue? Anything I should try?
>>
>> nvme_unmap_data() is attempting to unmap an IOVA that was never mapped,
>> or has already been unmapped by someone else. That's a usage bug.
>>
>> Thanks,
>> Robin.
>
> Ankit A. also reported this.
>
> Apart from the unmapping, do we by any chance need this?
>
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index e6626004b323..05d63fe92e43 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -637,7 +637,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>  	pte = READ_ONCE(*ptep);
>  	if (!pte) {
>  		WARN_ON(!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_NO_WARN));
> -		return -ENOENT;
> +		return 0;
>  	}
> 
>  	/* If the size matches this level, we're in the right place */

Oh, indeed - I also happened to notice that the other week and was
intending to write up a fix, but apparently I completely forgot about it
already :(

If you're happy to write that up and send a proper patch, please do -
otherwise I'll try to get it done before I forget again...

Thanks,
Robin.

* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
From: Chaitanya Kulkarni @ 2025-12-09 17:59 UTC
To: Robin Murphy
Cc: Jens Axboe, Christoph Hellwig, Will Deacon, Carlos Maiolino,
    iommu@lists.linux.dev, linux-xfs@vger.kernel.org,
    linux-nvme@lists.infradead.org, Sebastian Ott,
    linux-kernel@vger.kernel.org, linux-block@vger.kernel.org

On 12/9/25 09:34, Robin Murphy wrote:
> On 2025-12-09 5:29 pm, Chaitanya Kulkarni wrote:
>> On 12/9/25 03:50, Robin Murphy wrote:
>>> On 2025-12-09 11:43 am, Sebastian Ott wrote:
>>>> Hi,
>>>>
>>>> got the following warning after a kernel update on Thursday, leading
>>>> to a panic and fs corruption. I didn't capture the first warning but
>>>> I'm pretty sure it was the same. It's reproducible but I didn't bisect
>>>> since it borked my fs. The only hint I can give is that v6.18 worked.
>>>> Is this a known issue? Anything I should try?
>>>
>>> nvme_unmap_data() is attempting to unmap an IOVA that was never
>>> mapped, or has already been unmapped by someone else. That's a usage
>>> bug.
>>>
>>> Thanks,
>>> Robin.
>>
>> Ankit A. also reported this.
>>
>> Apart from the unmapping, do we by any chance need this?
>>
>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> index e6626004b323..05d63fe92e43 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -637,7 +637,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>  	pte = READ_ONCE(*ptep);
>>  	if (!pte) {
>>  		WARN_ON(!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_NO_WARN));
>> -		return -ENOENT;
>> +		return 0;
>>  	}
>>  
>>  	/* If the size matches this level, we're in the right place */
>
> Oh, indeed - I also happened to notice that the other week and was
> intending to write up a fix, but apparently I completely forgot about
> it already :(
>
> If you're happy to write that up and send a proper patch, please do -
> otherwise I'll try to get it done before I forget again...
>
> Thanks,
> Robin.

Sounds good, I'll send a patch and continue debugging the problem further.

-ck

* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
From: Sebastian Ott @ 2025-12-09 21:05 UTC
To: Robin Murphy
Cc: linux-nvme, iommu, linux-block, linux-kernel, linux-xfs, Jens Axboe,
    Christoph Hellwig, Will Deacon, Carlos Maiolino

On Tue, 9 Dec 2025, Robin Murphy wrote:
> On 2025-12-09 11:43 am, Sebastian Ott wrote:
>> Hi,
>>
>> got the following warning after a kernel update on Thursday, leading to a
>> panic and fs corruption. I didn't capture the first warning but I'm pretty
>> sure it was the same. It's reproducible but I didn't bisect since it
>> borked my fs. The only hint I can give is that v6.18 worked. Is this a
>> known issue? Anything I should try?
>
> nvme_unmap_data() is attempting to unmap an IOVA that was never mapped,
> or has already been unmapped by someone else. That's a usage bug.

OK, that's what I suspected - thanks for the confirmation!

I did another repro and tried:

good: 44fc84337b6e Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
bad:  cc25df3e2e22 Merge tag 'for-6.19/block-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

I'll start bisecting between these 2 - hoping it doesn't fork up my root
fs again...

Thanks,
Sebastian

* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
From: Chaitanya Kulkarni @ 2025-12-10 2:30 UTC
To: Sebastian Ott
Cc: linux-nvme@lists.infradead.org, iommu@lists.linux.dev, Robin Murphy,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-xfs@vger.kernel.org, Jens Axboe, Christoph Hellwig, Will Deacon,
    Carlos Maiolino

Sebastian,

On 12/9/25 13:05, Sebastian Ott wrote:
> On Tue, 9 Dec 2025, Robin Murphy wrote:
>> On 2025-12-09 11:43 am, Sebastian Ott wrote:
>>> Hi,
>>>
>>> got the following warning after a kernel update on Thursday, leading to a
>>> panic and fs corruption. I didn't capture the first warning but I'm pretty
>>> sure it was the same. It's reproducible but I didn't bisect since it
>>> borked my fs. The only hint I can give is that v6.18 worked. Is this a
>>> known issue? Anything I should try?
>>
>> nvme_unmap_data() is attempting to unmap an IOVA that was never mapped,
>> or has already been unmapped by someone else. That's a usage bug.
>
> OK, that's what I suspected - thanks for the confirmation!
>
> I did another repro and tried:
>
> good: 44fc84337b6e Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
> bad:  cc25df3e2e22 Merge tag 'for-6.19/block-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
>
> I'll start bisecting between these 2 - hoping it doesn't fork up my root
> fs again...
>
> Thanks,
> Sebastian

Can you see if this fixes your problem?

==========
WARNING/DISCLOSURE: These patches may cause system instability or crashes
during testing. Test only on non-production systems with proper backups in
place.
==========

From 0d180e8055e98d91174ba8fdd47ab934a7a88bef Mon Sep 17 00:00:00 2001
From: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Date: Tue, 9 Dec 2025 01:23:51 -0800
Subject: [PATCH 1/2 COMPILE TESTED ONLY] iommu/io-pgtable-arm: fix size_t
 signedness bug in unmap path

__arm_lpae_unmap() returns size_t but was returning -ENOENT (a negative
error code) when encountering an unmapped PTE. Since size_t is unsigned,
-ENOENT (typically -2) becomes a huge positive value
(0xFFFFFFFFFFFFFFFE on 64-bit systems).

This corrupted value propagates through the call chain:

  __arm_lpae_unmap() returns -ENOENT as size_t
   -> arm_lpae_unmap_pages() returns it
   -> __iommu_unmap() adds it to the iova address
   -> iommu_pgsize() triggers BUG_ON due to the corrupted iova

The corruption causes:

1. IOVA address overflow in the __iommu_unmap() loop
2. BUG_ON in iommu_pgsize() from invalid address alignment
3. Kernel panic on ARM64 systems with SMMU

Fix by returning 0 instead of -ENOENT. The WARN_ON already signals the
error condition, and returning 0 (meaning "nothing unmapped") is the
correct semantic for the size_t return type.

This matches the behavior of other io-pgtable implementations
(io-pgtable-arm-v7s, io-pgtable-dart) which return 0 on error conditions.

Kernel splat observed:

------------[ cut here ]------------
kernel BUG at drivers/iommu/iommu.c:2464!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
 nf_reject_ipv4 xt_tcpudp nft_compat x_tables nft_chain_nat nf_nat
 nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp
 llc nvme_fabrics binfmt_misc nls_iso8859_1 ipmi_ssif arm_smmuv3_pmu
 cdc_subset arm_spe_pmu spi_nor acpi_power_meter acpi_ipmi ipmi_devintf
 cppc_cpufreq ipmi_msghandler sch_fq_codel dm_multipath scsi_dh_rdac
 scsi_dh_emc scsi_dh_alua arm_cspmu_module efi_pstore autofs4 btrfs blake2b
 libblake2b raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
 async_tx xor xor_neon raid6_pq raid1 raid0 mlx5_ib ib_uverbs ib_core
 cdc_ether usbnet mlx5_core ghash_ce dax_hmem sm4_ce_gcm ast cxl_acpi
 sm4_ce_ccm drm_shmem_helper sm4_ce cxl_port drm_client_lib i2c_smbus
 sm4_ce_cipher mlxfw drm_kms_helper cxl_core sm4 nvme psample igb sm3_ce
 arm_smccc_trng einj drm nvme_core i2c_algo_bit xhci_pci_renesas tls
 i2c_tegra aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
CPU: 26 UID: 0 PID: 0 Comm: swapper/26 Tainted: G W 6.19.0+ #98
Tainted: [W]=WARN
Hardware name: NVIDIA GB200 NVL/P3809-BMC, BIOS 02.05.10 20251010
pstate: 234000c9 (nzCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : iommu_pgsize.isra.0+0xe8/0xf8
lr : __iommu_unmap+0xe0/0x308
sp : ffff80008034fca0
x29: ffff80008034fca0 x28: 000000000000fffe x27: ffffc6e7950e60b0
x26: 00000000f9740000 x25: ffffc6e794b2cde8 x24: ffffc6e7967916a8
x23: ffff80008034fdb8 x22: 0000000000030000 x21: ffff000030949220
x20: 00000000f974fffe x19: fffffffffffffffe x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
x11: 000000000000003f x10: 0000000020010000 x9 : 00000000f974fffe
x8 : 000000000000003f x7 : 0000000000000000 x6 : ffffffffffffffff
x5 : 0000000000000000 x4 : ffff80008034fd00 x3 : 0000000000020002
x2 : 00000000f974fffe x1 : 00000000f974fffe x0 : 0000000020010000
Call trace:
 iommu_pgsize.isra.0+0xe8/0xf8 (P)
 iommu_unmap_fast+0x18/0x40
 __iommu_dma_iova_unlink+0xec/0x2e8
 dma_iova_destroy+0x30/0xa0
 nvme_unmap_data+0x200/0x2e8 [nvme]
 nvme_pci_complete_batch+0x58/0xa8 [nvme]
 nvme_irq+0x98/0xa8 [nvme]
 __handle_irq_event_percpu+0xbc/0x498
 handle_irq_event+0x54/0xe0
 handle_fasteoi_irq+0x12c/0x1c8
 handle_irq_desc+0x54/0x90
 generic_handle_domain_irq+0x24/0x48
 gic_handle_irq+0x200/0x410
 call_on_irq_stack+0x30/0x48
 do_interrupt_handler+0xa8/0xb8
 el1_interrupt+0x4c/0xd0
 el1h_64_irq_handler+0x18/0x38
 el1h_64_irq+0x84/0x88
 cpuidle_enter_state+0x110/0x6a8 (P)
 cpuidle_enter+0x40/0x70
 do_idle+0x264/0x310
 cpu_startup_entry+0x3c/0x50
 secondary_start_kernel+0x13c/0x180
 __secondary_switched+0xc0/0xc8
Code: d2800009 d280000a d280000b d65f03c0 (d4210000)
---[ end trace 0000000000000000 ]---

Fixes: 3318f7b5cefb ("iommu/io-pgtable-arm: Add quirk to quiet WARN_ON()")
Cc: stable@vger.kernel.org
Reported-by: Ankit Agrawal <ankita@nvidia.com>
Reported-by: Sebastian Ott <sebott@redhat.com>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
=======================================================================
DISCLOSURE: DUE TO LACK OF H/W THIS PATCH IS COMPLETELY UNTESTED AND
BASED SOLELY ON THEORETICAL ANALYSIS. PLEASE REVIEW CAREFULLY.
=======================================================================
---
 drivers/iommu/io-pgtable-arm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index e6626004b323..05d63fe92e43 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -637,7 +637,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 	pte = READ_ONCE(*ptep);
 	if (!pte) {
 		WARN_ON(!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_NO_WARN));
-		return -ENOENT;
+		return 0;
 	}
 
 	/* If the size matches this level, we're in the right place */
-- 
2.40.0

######################################################################

From aa540bb77f7d4460c87b0a317df264de748a3b3c Mon Sep 17 00:00:00 2001
From: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Date: Tue, 9 Dec 2025 17:01:15 -0800
Subject: [PATCH 2/2 COMPILE TESTED ONLY] block: fix partial IOVA mapping
 cleanup in blk_rq_dma_map_iova

When dma_iova_link() fails partway through mapping a request's
scatter-gather list, the function would break out of the loop without
cleaning up the already-mapped portions. This leads to a map/unmap size
mismatch when the request is later completed.

The completion path (via dma_iova_destroy or nvme_unmap_data) attempts
to unmap the full expected size, but only a partial size was actually
mapped. This triggers "unmapped PTE" warnings in the ARM LPAE io-pgtable
code and can cause IOVA address corruption.

Fix by adding an out_unlink error path that calls dma_iova_unlink() to
clean up any partial mapping before returning failure. This ensures that
when an error occurs:

1. All partially-mapped IOVA ranges are properly unmapped
2. The completion path won't attempt to unmap non-existent mappings
3. No map/unmap size mismatch occurs

Fixes: 858299dc6160 ("block: add scatterlist-less DMA mapping helpers")
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
=======================================================================
DISCLOSURE: DUE TO LACK OF H/W THIS PATCH IS COMPLETELY UNTESTED AND
BASED SOLELY ON THEORETICAL ANALYSIS. PLEASE REVIEW CAREFULLY.
=======================================================================
---
 block/blk-mq-dma.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index b6dbc9767596..eb8b5b6b595c 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -126,17 +126,26 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
 		error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
 				vec->len, dir, attrs);
 		if (error)
-			break;
+			goto out_unlink;
 		mapped += vec->len;
 	} while (blk_map_iter_next(req, &iter->iter, vec));
 
 	error = dma_iova_sync(dma_dev, state, 0, mapped);
-	if (error) {
-		iter->status = errno_to_blk_status(error);
-		return false;
-	}
+	if (error)
+		goto out_unlink;
 
 	return true;
+
+out_unlink:
+	/*
+	 * Unlink any partial mapping to avoid unmap mismatch later.
+	 * If we mapped some bytes but not all, we must clean up now
+	 * to prevent attempting to unmap more than was actually mapped.
+	 */
+	if (mapped)
+		dma_iova_unlink(dma_dev, state, 0, mapped, dir, attrs);
+	iter->status = errno_to_blk_status(error);
+	return false;
 }
-- 
2.40.0

-ck

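Aside: the propagation described in the patch 1/2 commit message can be
reproduced in miniature. Note that x19 in the splat above is
0xfffffffffffffffe, exactly (size_t)-ENOENT; the starting IOVA below is
borrowed from x26 in the same splat, while everything else is invented to
show the arithmetic:

#include <errno.h>
#include <stdio.h>
#include <stddef.h>

int main(void)
{
	unsigned long iova = 0xf9740000UL;	/* IOVA being unmapped (x26) */
	size_t unmapped = (size_t)-ENOENT;	/* what __arm_lpae_unmap returned */

	printf("unmapped  = %#zx\n", unmapped);	/* 0xfffffffffffffffe, cf. x19 */

	/* __iommu_unmap advances its cursor by the amount reportedly unmapped */
	iova += unmapped;
	printf("next iova = %#lx\n", iova);	/* 0xf973fffe: no longer page-aligned */

	/* Per the commit message, iommu_pgsize() then finds no supported page
	 * size fitting such a misaligned address and hits its BUG_ON, which is
	 * the "kernel BUG at drivers/iommu/iommu.c:2464" shown above. */
	return 0;
}
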
* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
From: Keith Busch @ 2025-12-10 4:05 UTC
To: Chaitanya Kulkarni
Cc: Sebastian Ott, linux-nvme@lists.infradead.org, iommu@lists.linux.dev,
    Robin Murphy, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-xfs@vger.kernel.org, Jens Axboe, Christoph Hellwig, Will Deacon,
    Carlos Maiolino

On Wed, Dec 10, 2025 at 02:30:50AM +0000, Chaitanya Kulkarni wrote:
> @@ -126,17 +126,26 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
>  		error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
>  				vec->len, dir, attrs);
>  		if (error)
> -			break;
> +			goto out_unlink;
>  		mapped += vec->len;
>  	} while (blk_map_iter_next(req, &iter->iter, vec));
> 
>  	error = dma_iova_sync(dma_dev, state, 0, mapped);
> -	if (error) {
> -		iter->status = errno_to_blk_status(error);
> -		return false;
> -	}
> +	if (error)
> +		goto out_unlink;
> 
>  	return true;
> +
> +out_unlink:
> +	/*
> +	 * Unlink any partial mapping to avoid unmap mismatch later.
> +	 * If we mapped some bytes but not all, we must clean up now
> +	 * to prevent attempting to unmap more than was actually mapped.
> +	 */
> +	if (mapped)
> +		dma_iova_unlink(dma_dev, state, 0, mapped, dir, attrs);
> +	iter->status = errno_to_blk_status(error);
> +	return false;
>  }

It does look like a bug to continue on when dma_iova_link() fails, as the
caller thinks the entire mapping was successful, but I think you also
need to call dma_iova_free() to undo the earlier dma_iova_try_alloc(),
otherwise iova space is leaked.

I'm a bit doubtful this error condition was hit though: this sequence
is largely the same as it was in v6.18 before the regression. The only
difference since then should just be for handling P2P DMA across a host
bridge, which I don't think applies to the reported bug since that's a
pretty unusual thing to do.

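Aside: a stubbed userspace model of the pairing contract Keith points out:
every successful dma_iova_try_alloc() must eventually be undone, and on the
error path that means dma_iova_free() in addition to dma_iova_unlink(). The
function bodies below are stand-ins invented for this sketch; only the names
and the ordering mirror the kernel API discussed in the thread:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stddef.h>

struct dma_iova_state { size_t size; size_t linked; };

static bool dma_iova_try_alloc(struct dma_iova_state *st, size_t size)
{
	st->size = size;	/* reserve an IOVA window */
	st->linked = 0;
	return true;
}

static int dma_iova_link(struct dma_iova_state *st, size_t off, size_t len)
{
	if (off == 0x3000)	/* simulate a mapping failure partway through */
		return -ENOMEM;
	st->linked = off + len;
	return 0;
}

static void dma_iova_unlink(struct dma_iova_state *st, size_t off, size_t len)
{
	printf("unlink [%#zx, %#zx)\n", off, off + len);
	st->linked -= len;
}

static void dma_iova_free(struct dma_iova_state *st)
{
	printf("free IOVA window of %#zx bytes\n", st->size);
	st->size = 0;
}

int main(void)
{
	struct dma_iova_state st;
	size_t mapped = 0, chunk = 0x1000;

	if (!dma_iova_try_alloc(&st, 0x8000))
		return 1;
	for (int i = 0; i < 8; i++) {
		if (dma_iova_link(&st, mapped, chunk))
			goto unwind;
		mapped += chunk;
	}
	return 0;	/* success: completion later calls dma_iova_destroy() */

unwind:
	if (mapped)
		dma_iova_unlink(&st, 0, mapped);	/* only what was linked */
	dma_iova_free(&st);	/* undo try_alloc so IOVA space isn't leaked */
	return 1;
}
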
* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
From: Chaitanya Kulkarni @ 2025-12-10 4:59 UTC
To: Keith Busch, Sebastian Ott
Cc: linux-nvme@lists.infradead.org, iommu@lists.linux.dev, Robin Murphy,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-xfs@vger.kernel.org, Jens Axboe, Christoph Hellwig, Will Deacon,
    Carlos Maiolino, Leon Romanovsky

(+ Leon Romanovsky)

On 12/9/25 20:05, Keith Busch wrote:
> On Wed, Dec 10, 2025 at 02:30:50AM +0000, Chaitanya Kulkarni wrote:
>> [ v1 out_unlink hunk quoted in full; see the previous message ]
>
> It does look like a bug to continue on when dma_iova_link() fails, as the
> caller thinks the entire mapping was successful, but I think you also
> need to call dma_iova_free() to undo the earlier dma_iova_try_alloc(),
> otherwise iova space is leaked.

Thanks for catching that, see the updated version of this patch [1].

> I'm a bit doubtful this error condition was hit though: this sequence
> is largely the same as it was in v6.18 before the regression. The only
> difference since then should just be for handling P2P DMA across a host
> bridge, which I don't think applies to the reported bug since that's a
> pretty unusual thing to do.

That's why I've asked the reporter to test it.

Either way, IMO both of the patches are still needed.

-ck

[1]

From 726687876a334cb699247584102e491e98f8fdc4 Mon Sep 17 00:00:00 2001
From: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Date: Tue, 9 Dec 2025 17:01:15 -0800
Subject: [PATCH 2/2] block: fix partial IOVA mapping cleanup in
 blk_rq_dma_map_iova

When dma_iova_link() fails partway through mapping a request's
scatter-gather list, the function would break out of the loop without
cleaning up the already-mapped portions. This leads to a map/unmap size
mismatch when the request is later completed.

The completion path (via dma_iova_destroy or nvme_unmap_data) attempts
to unmap the full expected size, but only a partial size was actually
mapped. This triggers "unmapped PTE" warnings in the ARM LPAE io-pgtable
code and can cause IOVA address corruption.

Fix by adding an out_unlink error path that calls dma_iova_unlink() to
clean up any partial mapping before returning failure. This ensures that
when an error occurs:

1. All partially-mapped IOVA ranges are properly unmapped
2. The completion path won't attempt to unmap non-existent mappings
3. No map/unmap size mismatch occurs

Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
---
 block/blk-mq-dma.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index b6dbc9767596..ecfd53ed6984 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -126,17 +126,28 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
 		error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
 				vec->len, dir, attrs);
 		if (error)
-			break;
+			goto out_unlink;
 		mapped += vec->len;
 	} while (blk_map_iter_next(req, &iter->iter, vec));
 
 	error = dma_iova_sync(dma_dev, state, 0, mapped);
-	if (error) {
-		iter->status = errno_to_blk_status(error);
-		return false;
-	}
+	if (error)
+		goto out_unlink;
 
 	return true;
+
+out_unlink:
+	/*
+	 * Clean up partial mapping and free the entire IOVA reservation.
+	 * dma_iova_unlink() detaches any linked bytes, dma_iova_free()
+	 * returns the full IOVA window allocated by dma_iova_try_alloc()
+	 * (state->__size tracks the original allocation size).
+	 */
+	if (mapped)
+		dma_iova_unlink(dma_dev, state, 0, mapped, dir, attrs);
+	dma_iova_free(dma_dev, state);
+	iter->status = errno_to_blk_status(error);
+	return false;
 }
-- 
2.40.0

* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
From: Sebastian Ott @ 2025-12-10 17:12 UTC
To: Chaitanya Kulkarni
Cc: Keith Busch, linux-nvme@lists.infradead.org, iommu@lists.linux.dev,
    Robin Murphy, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-xfs@vger.kernel.org, Jens Axboe, Christoph Hellwig, Will Deacon,
    Carlos Maiolino, Leon Romanovsky

On Wed, 10 Dec 2025, Chaitanya Kulkarni wrote:
> (+ Leon Romanovsky)
>
> On 12/9/25 20:05, Keith Busch wrote:
>> On Wed, Dec 10, 2025 at 02:30:50AM +0000, Chaitanya Kulkarni wrote:
>>> [ v1 out_unlink hunk quoted in full; see upthread ]
>>
>> It does look like a bug to continue on when dma_iova_link() fails, as the
>> caller thinks the entire mapping was successful, but I think you also
>> need to call dma_iova_free() to undo the earlier dma_iova_try_alloc(),
>> otherwise iova space is leaked.
>
> Thanks for catching that, see the updated version of this patch [1].
>
>> I'm a bit doubtful this error condition was hit though: this sequence
>> is largely the same as it was in v6.18 before the regression. The only
>> difference since then should just be for handling P2P DMA across a host
>> bridge, which I don't think applies to the reported bug since that's a
>> pretty unusual thing to do.
>
> That's why I've asked the reporter to test it.
>
> Either way, IMO both of the patches are still needed.

The patch Keith posted fixes the issue for me. Should I do another run
with only these 2 applied?

* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
From: Chaitanya Kulkarni @ 2025-12-10 21:12 UTC
To: Sebastian Ott
Cc: Keith Busch, linux-nvme@lists.infradead.org, iommu@lists.linux.dev,
    Robin Murphy, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-xfs@vger.kernel.org, Jens Axboe, Christoph Hellwig, Will Deacon,
    Carlos Maiolino, Leon Romanovsky

On 12/10/25 09:12, Sebastian Ott wrote:
> On Wed, 10 Dec 2025, Chaitanya Kulkarni wrote:
>> (+ Leon Romanovsky)
>>
>> On 12/9/25 20:05, Keith Busch wrote:
>>> On Wed, Dec 10, 2025 at 02:30:50AM +0000, Chaitanya Kulkarni wrote:
>>>> [ v1 out_unlink hunk quoted in full; see upthread ]
>>>
>>> It does look like a bug to continue on when dma_iova_link() fails, as
>>> the caller thinks the entire mapping was successful, but I think you
>>> also need to call dma_iova_free() to undo the earlier
>>> dma_iova_try_alloc(), otherwise iova space is leaked.
>>
>> Thanks for catching that, see the updated version of this patch [1].
>>
>>> I'm a bit doubtful this error condition was hit though: this sequence
>>> is largely the same as it was in v6.18 before the regression. The only
>>> difference since then should just be for handling P2P DMA across a host
>>> bridge, which I don't think applies to the reported bug since that's a
>>> pretty unusual thing to do.
>>
>> That's why I've asked the reporter to test it.
>>
>> Either way, IMO both of the patches are still needed.
>
> The patch Keith posted fixes the issue for me. Should I do another run
> with only these 2 applied?

No need for another run - these fixes are needed anyway. I'll send formal
patches for them. Thanks for reporting this.

-ck

* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
From: Keith Busch @ 2025-12-10 5:02 UTC
To: Sebastian Ott
Cc: linux-nvme, iommu, linux-block, linux-kernel, linux-xfs, Jens Axboe,
    Christoph Hellwig, Will Deacon, Robin Murphy, Carlos Maiolino

On Tue, Dec 09, 2025 at 12:43:31PM +0100, Sebastian Ott wrote:
> got the following warning after a kernel update on Thursday, leading to a
> panic and fs corruption. I didn't capture the first warning but I'm pretty
> sure it was the same. It's reproducible but I didn't bisect since it
> borked my fs. The only hint I can give is that v6.18 worked. Is this a
> known issue? Anything I should try?

Could you check if your nvme device supports SGLs? There are some new
features in 6.19 that would allow merging IO that wouldn't have happened
before. You can check from the command line:

  # nvme id-ctrl /dev/nvme0 | grep sgl

Replace "nvme0" with whatever your instance was named if it's not using
the 0 suffix.

What I'm thinking happened is that you had an IO that could be coalesced
in IOVA space at one point, and that request was then completed and later
reused. The new request merged bios that could not coalesce, and the
problem with that is that we never reinitialize the iova state, so we're
using the old context.

And if that is what's happening, here's a quick fix:

---
diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index e9108ccaf4b06..7bff480d666e2 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -199,6 +199,7 @@ static bool blk_dma_map_iter_start(struct request *req, struct device *dma_dev,
 	if (blk_can_dma_map_iova(req, dma_dev) &&
 	    dma_iova_try_alloc(dma_dev, state, vec.paddr, total_len))
 		return blk_rq_dma_map_iova(req, dma_dev, state, iter, &vec);
+	state->__size = 0;
 	return blk_dma_map_direct(req, dma_dev, iter, &vec);
 }
--

* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
From: Keith Busch @ 2025-12-10 5:33 UTC
To: Sebastian Ott
Cc: linux-nvme, iommu, linux-block, linux-kernel, linux-xfs, Jens Axboe,
    Christoph Hellwig, Will Deacon, Robin Murphy, Carlos Maiolino

On Wed, Dec 10, 2025 at 02:02:43PM +0900, Keith Busch wrote:
> On Tue, Dec 09, 2025 at 12:43:31PM +0100, Sebastian Ott wrote:
> > got the following warning after a kernel update on Thursday, leading to a
> > panic and fs corruption. I didn't capture the first warning but I'm pretty
> > sure it was the same. It's reproducible but I didn't bisect since it
> > borked my fs. The only hint I can give is that v6.18 worked. Is this a
> > known issue? Anything I should try?
>
> Could you check if your nvme device supports SGLs? There are some new
> features in 6.19 that would allow merging IO that wouldn't have happened
> before. You can check from the command line:

Actually the SGL support is probably unnecessary for ARM if your iommu
granularity is 64k. That setup could also lead to an uninitialized
"state" and the type of corruption you're observing. But the same patch
below is still the proposed fix for it anyway.

> ---
> diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
> index e9108ccaf4b06..7bff480d666e2 100644
> --- a/block/blk-mq-dma.c
> +++ b/block/blk-mq-dma.c
> @@ -199,6 +199,7 @@ static bool blk_dma_map_iter_start(struct request *req, struct device *dma_dev,
>  	if (blk_can_dma_map_iova(req, dma_dev) &&
>  	    dma_iova_try_alloc(dma_dev, state, vec.paddr, total_len))
>  		return blk_rq_dma_map_iova(req, dma_dev, state, iter, &vec);
> +	state->__size = 0;
>  	return blk_dma_map_direct(req, dma_dev, iter, &vec);
>  }
> --

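Aside: a toy model of the stale-state hazard the one-line fix addresses.
struct dma_iova_state is reduced to the single field that matters here, and
dma_use_iova() is written to mirror the kernel helper of the same name; the
rest is invented to show how a reused, uncleared state makes the completion
path pick the wrong unmap route:

#include <stdbool.h>
#include <stdio.h>
#include <stddef.h>

struct dma_iova_state { size_t __size; };

/* Nonzero __size means "this request used the IOVA path". */
static bool dma_use_iova(const struct dma_iova_state *s)
{
	return s->__size != 0;
}

static void map_request(struct dma_iova_state *s, bool iova_alloc_ok, size_t len)
{
	if (iova_alloc_ok) {
		s->__size = len;	/* dma_iova_try_alloc() succeeded */
		return;
	}
	/* Keith's fix: clear the state before falling back to direct
	 * mapping, so a value left over from the previous request that
	 * used this (reused) structure cannot leak into completion. */
	s->__size = 0;
}

int main(void)
{
	struct dma_iova_state s = { 0 };

	map_request(&s, true, 0x20000);		/* request A: IOVA path */
	printf("A completes via %s\n",
	       dma_use_iova(&s) ? "dma_iova_destroy" : "dma_unmap");

	map_request(&s, false, 0x1000);		/* request B reuses the state */
	printf("B completes via %s\n",
	       dma_use_iova(&s) ? "dma_iova_destroy" : "dma_unmap");
	/* Without the fix, B would wrongly take the dma_iova_destroy path
	 * and unmap an IOVA that was never mapped: the WARN in this thread. */
	return 0;
}
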
* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
From: Sebastian Ott @ 2025-12-10 11:08 UTC
To: Keith Busch
Cc: linux-nvme, iommu, linux-block, linux-kernel, linux-xfs, Jens Axboe,
    Christoph Hellwig, Will Deacon, Robin Murphy, Carlos Maiolino

On Wed, 10 Dec 2025, Keith Busch wrote:
> On Tue, Dec 09, 2025 at 12:43:31PM +0100, Sebastian Ott wrote:
>> got the following warning after a kernel update on Thursday, leading to a
>> panic and fs corruption. I didn't capture the first warning but I'm pretty
>> sure it was the same. It's reproducible but I didn't bisect since it
>> borked my fs. The only hint I can give is that v6.18 worked. Is this a
>> known issue? Anything I should try?
>
> Could you check if your nvme device supports SGLs? There are some new
> features in 6.19 that would allow merging IO that wouldn't have happened
> before. You can check from the command line:
>
>   # nvme id-ctrl /dev/nvme0 | grep sgl

  # nvme id-ctrl /dev/nvme0n1 | grep sgl
  sgls : 0xf0002

Thanks,
Sebastian

* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
From: Keith Busch @ 2025-12-10 11:21 UTC
To: Sebastian Ott
Cc: linux-nvme, iommu, linux-block, linux-kernel, linux-xfs, Jens Axboe,
    Christoph Hellwig, Will Deacon, Robin Murphy, Carlos Maiolino

On Wed, Dec 10, 2025 at 12:08:36PM +0100, Sebastian Ott wrote:
> On Wed, 10 Dec 2025, Keith Busch wrote:
> > On Tue, Dec 09, 2025 at 12:43:31PM +0100, Sebastian Ott wrote:
> > > got the following warning after a kernel update on Thursday, leading to a
> > > panic and fs corruption. I didn't capture the first warning but I'm pretty
> > > sure it was the same. It's reproducible but I didn't bisect since it
> > > borked my fs. The only hint I can give is that v6.18 worked. Is this a
> > > known issue? Anything I should try?
> >
> > Could you check if your nvme device supports SGLs? There are some new
> > features in 6.19 that would allow merging IO that wouldn't have happened
> > before. You can check from the command line:
> >
> >   # nvme id-ctrl /dev/nvme0 | grep sgl
>
> # nvme id-ctrl /dev/nvme0n1 | grep sgl
> sgls : 0xf0002

Oh neat, so you *do* support SGLs. Not that it was required, as arm64 can
support iommu granularities larger than the NVMe PRP unit, so the bug was
possible to hit in either case for you (assuming the smmu was configured
with a 64k io page size).

Anyway, thanks for the report, and sorry for the fs trouble the bug
caused you. I'm working on a blktest to specifically target this
condition so we don't regress again. I just need to make sure to run it
on a system with the iommu enabled (usually it's off on my test machine).

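Aside: a quick decode of the reported value. The low two bits of the NVMe
Identify Controller SGLS field indicate SGL support, and the Linux nvme
driver gates SGL use on exactly those bits; the meaning of the upper bits
set in 0xf0002 is left to the NVMe specification rather than guessed at
here:

#include <stdio.h>

int main(void)
{
	unsigned int sgls = 0xf0002;	/* value from `nvme id-ctrl` above */

	/* Mirrors the driver's support check: bits 1:0 nonzero means the
	 * controller accepts SGL-described data transfers. */
	printf("SGLs supported: %s\n", (sgls & 0x3) ? "yes" : "no");
	return 0;
}
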
* Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
From: Sebastian Ott @ 2025-12-10 16:57 UTC
To: Keith Busch
Cc: linux-nvme, iommu, linux-block, linux-kernel, linux-xfs, Jens Axboe,
    Christoph Hellwig, Will Deacon, Robin Murphy, Carlos Maiolino

On Wed, 10 Dec 2025, Keith Busch wrote:
> On Wed, Dec 10, 2025 at 12:08:36PM +0100, Sebastian Ott wrote:
>> On Wed, 10 Dec 2025, Keith Busch wrote:
>>> On Tue, Dec 09, 2025 at 12:43:31PM +0100, Sebastian Ott wrote:
>>>> got the following warning after a kernel update on Thursday, leading to a
>>>> panic and fs corruption. I didn't capture the first warning but I'm pretty
>>>> sure it was the same. It's reproducible but I didn't bisect since it
>>>> borked my fs. The only hint I can give is that v6.18 worked. Is this a
>>>> known issue? Anything I should try?
>>>
>>> Could you check if your nvme device supports SGLs? There are some new
>>> features in 6.19 that would allow merging IO that wouldn't have happened
>>> before. You can check from the command line:
>>>
>>>   # nvme id-ctrl /dev/nvme0 | grep sgl
>>
>> # nvme id-ctrl /dev/nvme0n1 | grep sgl
>> sgls : 0xf0002
>
> Oh neat, so you *do* support SGLs. Not that it was required, as arm64 can
> support iommu granularities larger than the NVMe PRP unit, so the bug was
> possible to hit in either case for you (assuming the smmu was configured
> with a 64k io page size).
>
> Anyway, thanks for the report, and sorry for the fs trouble the bug
> caused you.

No worries, it was a test system in need of an upgrade anyway. Thanks for
the quick fix!

> I'm working on a blktest to specifically target this
> condition so we don't regress again. I just need to make sure to run it
> on a system with the iommu enabled (usually it's off on my test machine).

Great!
