* Error when running fio against nvme-of rdma target (mlx5 driver)
@ 2022-02-09 2:50 Martin Oliveira
2022-02-09 8:41 ` Chaitanya Kulkarni via iommu
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Martin Oliveira @ 2022-02-09 2:50 UTC (permalink / raw)
To: linux-nvme@lists.infradead.org, iommu@lists.linux-foundation.org,
linux-rdma@vger.kernel.org
Cc: Kelly Ursenbach, Lee, Jason, Logan Gunthorpe
Hello,
We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.
Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
When running an fio job that directly targets the fabrics devices (no filesystem; see the script at the end), we start seeing errors like this within a minute or so:
[ 408.368677] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d08000 flags=0x0000]
[ 408.372201] infiniband mlx5_0: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
[ 408.380181] infiniband mlx5_0: dump_cqe:272:(pid 0): dump error cqe
[ 408.380187] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 408.380189] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 408.380191] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 408.380192] 00000030: 00 00 00 00 a9 00 56 04 00 00 01 e9 00 54 e8 e2
[ 408.380230] nvme nvme15: RECV for CQE 0x00000000ce392ed9 failed with status local protection error (4)
[ 408.380235] nvme nvme15: starting error recovery
[ 408.380238] nvme_ns_head_submit_bio: 726 callbacks suppressed
[ 408.380246] block nvme15n2: no usable path - requeuing I/O
[ 408.380284] block nvme15n5: no usable path - requeuing I/O
[ 408.380298] block nvme15n1: no usable path - requeuing I/O
[ 408.380304] block nvme15n11: no usable path - requeuing I/O
[ 408.380304] block nvme15n11: no usable path - requeuing I/O
[ 408.380330] block nvme15n1: no usable path - requeuing I/O
[ 408.380350] block nvme15n2: no usable path - requeuing I/O
[ 408.380371] block nvme15n6: no usable path - requeuing I/O
[ 408.380377] block nvme15n6: no usable path - requeuing I/O
[ 408.380382] block nvme15n4: no usable path - requeuing I/O
[ 408.380472] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d09000 flags=0x0000]
[ 408.391265] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d0a000 flags=0x0000]
[ 415.125967] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
[ 415.131898] nvmet: ctrl 1 fatal error occurred!
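The IO_PAGE_FAULT entries in a log like the one above can be pulled out of a saved dmesg capture with a short script. The following is a minimal Python sketch and not part of the original report: the regular expression is derived only from the log lines shown in this thread, and `parse_io_page_faults` is a hypothetical helper name.

```python
import re

# Pattern derived from the AMD-Vi lines quoted in this thread (an assumption:
# other kernel versions may format the event differently). Whitespace is
# matched loosely so that lines wrapped by a mail client still parse.
FAULT_RE = re.compile(
    r"(?P<driver>\S+)\s+"
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+AMD-Vi:\s+"
    r"Event\s+logged\s+\[IO_PAGE_FAULT\s+"
    r"domain=0x(?P<domain>[0-9a-f]+)\s+"
    r"address=0x(?P<address>[0-9a-f]+)\s+"
    r"flags=0x(?P<flags>[0-9a-f]+)\]"
)


def parse_io_page_faults(log_text):
    """Return one dict per IO_PAGE_FAULT event found in log_text."""
    faults = []
    for match in FAULT_RE.finditer(log_text):
        faults.append(
            {
                "driver": match.group("driver"),
                "device": match.group("device"),
                "domain": int(match.group("domain"), 16),
                "address": int(match.group("address"), 16),
                "flags": int(match.group("flags"), 16),
            }
        )
    return faults


# Lines copied verbatim from the report above.
SAMPLE = """\
[ 408.368677] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d08000 flags=0x0000]
[ 408.372201] infiniband mlx5_0: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
[ 408.380472] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d09000 flags=0x0000]
[ 408.391265] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d0a000 flags=0x0000]
"""

for fault in parse_io_page_faults(SAMPLE):
    print(fault["device"], hex(fault["domain"]), hex(fault["address"]))
```

On the three events in this log, the faulting addresses are consecutive 4 KiB pages (0x24d08000, 0x24d09000, 0x24d0a000), all in domain 0x002f.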
Occasionally, we've seen the following stack trace:
[ 1158.152464] kernel BUG at drivers/iommu/amd/io_pgtable.c:485!
[ 1158.427696] invalid opcode: 0000 [#1] SMP NOPTI
[ 1158.432228] CPU: 51 PID: 796 Comm: kworker/51:1H Tainted: P OE 5.13.0-eid-athena-g6fb4e704d11c-dirty #14
[ 1158.443867] Hardware name: GIGABYTE R272-Z32-00/MZ32-AR0-00, BIOS R21 10/08/2020
[ 1158.451252] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[ 1158.456884] RIP: 0010:iommu_v1_unmap_page+0xed/0x100
[ 1158.461849] Code: 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 1d 48 83 c4 10 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 8d 46 ff 4c 85 f0 74 d6 <0f> 0b e8 1c 38 46 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44
[ 1158.480589] RSP: 0018:ffffabb520587bd0 EFLAGS: 00010206
[ 1158.485812] RAX: 0001000000061fff RBX: 0000000000100000 RCX: 0000000000000027
[ 1158.492938] RDX: 0000000030562000 RSI: ffff000000000000 RDI: 0000000000000000
[ 1158.500071] RBP: ffffabb520587c08 R08: ffffabb520587bd0 R09: 0000000000000000
[ 1158.507202] R10: 0000000000000001 R11: 000ffffffffff000 R12: ffff9984abd9e318
[ 1158.514326] R13: ffff9984abd9e310 R14: 0001000000062000 R15: 0001000000000000
[ 1158.521452] FS: 0000000000000000(0000) GS:ffff99a36c8c0000(0000) knlGS:0000000000000000
[ 1158.529540] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1158.535286] CR2: 00007f75b04f1000 CR3: 00000001eddd8000 CR4: 0000000000350ee0
[ 1158.542419] Call Trace:
[ 1158.544877] amd_iommu_unmap+0x2c/0x40
[ 1158.548653] __iommu_unmap+0xc4/0x170
[ 1158.552344] iommu_unmap_fast+0xe/0x10
[ 1158.556100] __iommu_dma_unmap+0x85/0x120
[ 1158.560115] iommu_dma_unmap_sg+0x95/0x110
[ 1158.564213] dma_unmap_sg_attrs+0x42/0x50
[ 1158.568225] rdma_rw_ctx_destroy+0x6e/0xc0 [ib_core]
[ 1158.573201] nvmet_rdma_rw_ctx_destroy+0xa7/0xc0 [nvmet_rdma]
[ 1158.578944] nvmet_rdma_read_data_done+0x5c/0xf0 [nvmet_rdma]
[ 1158.584683] __ib_process_cq+0x8e/0x150 [ib_core]
[ 1158.589398] ib_cq_poll_work+0x2b/0x80 [ib_core]
[ 1158.594027] process_one_work+0x220/0x3c0
[ 1158.598038] worker_thread+0x4d/0x3f0
[ 1158.601696] kthread+0x114/0x150
[ 1158.604928] ? process_one_work+0x3c0/0x3c0
[ 1158.609114] ? kthread_park+0x90/0x90
[ 1158.612783] ret_from_fork+0x22/0x30
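As an aside on the trace above: the bytes just before the trapping instruction in the Code: line, `49 8d 46 ff 4c 85 f0 74 d6 <0f> 0b`, decode to `lea -0x1(%r14),%rax; test %r14,%rax; je ...; ud2`, which has the shape of a power-of-two check on R14. This is an inference from the register dump and not something stated in the report, and which source-level value R14 holds is an assumption. A small Python sketch of the same check on the dumped register values:

```python
def is_power_of_two(value: int) -> bool:
    """Same test the instruction sequence performs: value & (value - 1) == 0."""
    return value > 0 and (value & (value - 1)) == 0


# Register values copied from the oops above.
R14 = 0x0001000000062000
RAX = 0x0001000000061FFF

# RAX holds R14 - 1, matching the result of the lea instruction.
print(RAX == R14 - 1)

# The value in R14 has several bits set, so the check fails and the
# ud2 (the BUG) is reached instead of the je branch being taken.
print(is_power_of_two(R14))
```

If R14 is the size being unmapped (an assumption), a value like 0x1000000062000 would not be a plausible mapping size, which would fit with a corrupted or stale scatterlist entry.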
We first saw this on a 5.13 kernel but could reproduce with 5.17-rc2.
We found a possibly related bug report [1] that suggested disabling the IOMMU could help, but even after I disabled it (amd_iommu=off iommu=off) I still get errors (nvme IO timeouts). Another thread from 2016 [2] suggested that disabling some kernel debug options could work around the "local protection error", but that didn't help either.
As far as I can tell, the disks are fine, as running the same fio job targeting the real physical devices works fine.
Any suggestions are appreciated.
Thanks,
Martin
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=210177
[2]: https://lore.kernel.org/all/6BBFD126-877C-4638-BB91-ABF715E29326@oracle.com/
fio script:
[global]
name=fio-seq-write
rw=write
bs=1M
direct=1
numjobs=32
time_based
group_reporting=1
runtime=18000
end_fsync=1
size=10G
ioengine=libaio
iodepth=16

[file1]
filename=/dev/nvme0n1

[file2]
filename=/dev/nvme0n2

[file3]
filename=/dev/nvme0n3

[file4]
filename=/dev/nvme0n4

[file5]
filename=/dev/nvme0n5

[file6]
filename=/dev/nvme0n6

[file7]
filename=/dev/nvme0n7

[file8]
filename=/dev/nvme0n8

[file9]
filename=/dev/nvme0n9

[file10]
filename=/dev/nvme0n10

[file11]
filename=/dev/nvme0n11

[file12]
filename=/dev/nvme0n12
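The twelve per-namespace sections in the script above follow one pattern, so the job file can be generated when trying a reproduction with a different number of namespaces. A minimal Python sketch: `build_fio_job` and its parameters are hypothetical names, and the global options are copied unchanged from the script above.

```python
# Global options copied verbatim from the fio script in the report.
GLOBAL_OPTIONS = [
    "name=fio-seq-write",
    "rw=write",
    "bs=1M",
    "direct=1",
    "numjobs=32",
    "time_based",
    "group_reporting=1",
    "runtime=18000",
    "end_fsync=1",
    "size=10G",
    "ioengine=libaio",
    "iodepth=16",
]


def build_fio_job(namespaces: int, controller: str = "nvme0") -> str:
    """Return fio job-file text with one [fileN] section per namespace."""
    lines = ["[global]"] + GLOBAL_OPTIONS
    for ns in range(1, namespaces + 1):
        lines.append("")  # blank line between sections
        lines.append(f"[file{ns}]")
        lines.append(f"filename=/dev/{controller}n{ns}")
    return "\n".join(lines) + "\n"


# Reproduces the 12-namespace layout used in the report.
print(build_fio_job(12))
```

Passing a smaller count, for example `build_fio_job(3)`, gives a reduced job file for narrowing down how many devices are needed to trigger the error.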
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
2022-02-09 2:50 Error when running fio against nvme-of rdma target (mlx5 driver) Martin Oliveira
@ 2022-02-09 8:41 ` Chaitanya Kulkarni via iommu
2022-02-10 23:58 ` Martin Oliveira
2022-02-09 12:48 ` Robin Murphy
2024-01-31 9:18 ` Arthur Muller
2 siblings, 1 reply; 8+ messages in thread
From: Chaitanya Kulkarni via iommu @ 2022-02-09 8:41 UTC (permalink / raw)
To: Martin Oliveira
Cc: Kelly Ursenbach, linux-rdma@vger.kernel.org, Lee, Jason,
linux-nvme@lists.infradead.org, iommu@lists.linux-foundation.org,
Logan Gunthorpe
On 2/8/22 6:50 PM, Martin Oliveira wrote:
> Hello,
>
> We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.
>
> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
>
Thanks for reporting this. If you can bisect the problem on your setup,
it will help others to help you better.
-ck
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
2022-02-09 2:50 Error when running fio against nvme-of rdma target (mlx5 driver) Martin Oliveira
2022-02-09 8:41 ` Chaitanya Kulkarni via iommu
@ 2022-02-09 12:48 ` Robin Murphy
2024-01-31 9:18 ` Arthur Muller
2 siblings, 0 replies; 8+ messages in thread
From: Robin Murphy @ 2022-02-09 12:48 UTC (permalink / raw)
To: Martin Oliveira, linux-nvme@lists.infradead.org,
iommu@lists.linux-foundation.org, linux-rdma@vger.kernel.org
Cc: Kelly Ursenbach, Lee, Jason, Logan Gunthorpe
On 2022-02-09 02:50, Martin Oliveira wrote:
> Hello,
>
> We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.
>
> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
>
> When running an fio job targeting directly the fabrics devices (no filesystem, see script at the end), within a minute or so we start seeing errors like this:
>
> [ 408.368677] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d08000 flags=0x0000]
> [ 408.372201] infiniband mlx5_0: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
> [ 408.380181] infiniband mlx5_0: dump_cqe:272:(pid 0): dump error cqe
> [ 408.380187] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 408.380189] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 408.380191] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 408.380192] 00000030: 00 00 00 00 a9 00 56 04 00 00 01 e9 00 54 e8 e2
> [ 408.380230] nvme nvme15: RECV for CQE 0x00000000ce392ed9 failed with status local protection error (4)
> [ 408.380235] nvme nvme15: starting error recovery
> [ 408.380238] nvme_ns_head_submit_bio: 726 callbacks suppressed
> [ 408.380246] block nvme15n2: no usable path - requeuing I/O
> [ 408.380284] block nvme15n5: no usable path - requeuing I/O
> [ 408.380298] block nvme15n1: no usable path - requeuing I/O
> [ 408.380304] block nvme15n11: no usable path - requeuing I/O
> [ 408.380304] block nvme15n11: no usable path - requeuing I/O
> [ 408.380330] block nvme15n1: no usable path - requeuing I/O
> [ 408.380350] block nvme15n2: no usable path - requeuing I/O
> [ 408.380371] block nvme15n6: no usable path - requeuing I/O
> [ 408.380377] block nvme15n6: no usable path - requeuing I/O
> [ 408.380382] block nvme15n4: no usable path - requeuing I/O
> [ 408.380472] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d09000 flags=0x0000]
> [ 408.391265] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d0a000 flags=0x0000]
> [ 415.125967] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
> [ 415.131898] nvmet: ctrl 1 fatal error occurred!
>
> Occasionally, we've seen the following stack trace:
FWIW this indicates that the scatterlist passed to dma_unmap_sg_attrs()
was wrong - specifically it looks like an attempt to unmap a region
that's already unmapped (or was never mapped in the first place).
Whatever race or data corruption issue is causing that is almost
certainly happening much earlier, since the IO_PAGE_FAULT logs further
imply that either some pages have been spuriously unmapped while the
device was still accessing them, or some DMA address in the scatterlist
was already bogus by the time it was handed off to the device.
Robin.
> [ 1158.152464] kernel BUG at drivers/iommu/amd/io_pgtable.c:485!
> [ 1158.427696] invalid opcode: 0000 [#1] SMP NOPTI
> [ 1158.432228] CPU: 51 PID: 796 Comm: kworker/51:1H Tainted: P OE 5.13.0-eid-athena-g6fb4e704d11c-dirty #14
> [ 1158.443867] Hardware name: GIGABYTE R272-Z32-00/MZ32-AR0-00, BIOS R21 10/08/2020
> [ 1158.451252] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> [ 1158.456884] RIP: 0010:iommu_v1_unmap_page+0xed/0x100
> [ 1158.461849] Code: 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 1d 48 83 c4 10 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 8d 46 ff 4c 85 f0 74 d6 <0f> 0b e8 1c 38 46 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44
> [ 1158.480589] RSP: 0018:ffffabb520587bd0 EFLAGS: 00010206
> [ 1158.485812] RAX: 0001000000061fff RBX: 0000000000100000 RCX: 0000000000000027
> [ 1158.492938] RDX: 0000000030562000 RSI: ffff000000000000 RDI: 0000000000000000
> [ 1158.500071] RBP: ffffabb520587c08 R08: ffffabb520587bd0 R09: 0000000000000000
> [ 1158.507202] R10: 0000000000000001 R11: 000ffffffffff000 R12: ffff9984abd9e318
> [ 1158.514326] R13: ffff9984abd9e310 R14: 0001000000062000 R15: 0001000000000000
> [ 1158.521452] FS: 0000000000000000(0000) GS:ffff99a36c8c0000(0000) knlGS:0000000000000000
> [ 1158.529540] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1158.535286] CR2: 00007f75b04f1000 CR3: 00000001eddd8000 CR4: 0000000000350ee0
> [ 1158.542419] Call Trace:
> [ 1158.544877] amd_iommu_unmap+0x2c/0x40
> [ 1158.548653] __iommu_unmap+0xc4/0x170
> [ 1158.552344] iommu_unmap_fast+0xe/0x10
> [ 1158.556100] __iommu_dma_unmap+0x85/0x120
> [ 1158.560115] iommu_dma_unmap_sg+0x95/0x110
> [ 1158.564213] dma_unmap_sg_attrs+0x42/0x50
> [ 1158.568225] rdma_rw_ctx_destroy+0x6e/0xc0 [ib_core]
> [ 1158.573201] nvmet_rdma_rw_ctx_destroy+0xa7/0xc0 [nvmet_rdma]
> [ 1158.578944] nvmet_rdma_read_data_done+0x5c/0xf0 [nvmet_rdma]
> [ 1158.584683] __ib_process_cq+0x8e/0x150 [ib_core]
> [ 1158.589398] ib_cq_poll_work+0x2b/0x80 [ib_core]
> [ 1158.594027] process_one_work+0x220/0x3c0
> [ 1158.598038] worker_thread+0x4d/0x3f0
> [ 1158.601696] kthread+0x114/0x150
> [ 1158.604928] ? process_one_work+0x3c0/0x3c0
> [ 1158.609114] ? kthread_park+0x90/0x90
> [ 1158.612783] ret_from_fork+0x22/0x30
>
> We first saw this on a 5.13 kernel but could reproduce with 5.17-rc2.
>
> We found a possibly related bug report [1] that suggested disabling the IOMMU could help, but even after I disabled it (amd_iommu=off iommu=off) I still get errors (nvme IO timeouts). Another thread from 2016[2] suggested that disabling some kernel debug options could workaround the "local protection error" but that didn't help either.
>
> As far as I can tell, the disks are fine, as running the same fio job targeting the real physical devices works fine.
>
> Any suggestions are appreciated.
>
> Thanks,
> Martin
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=210177
> [2]: https://lore.kernel.org/all/6BBFD126-877C-4638-BB91-ABF715E29326@oracle.com/
>
> fio script:
> [global]
> name=fio-seq-write
> rw=write
> bs=1M
> direct=1
> numjobs=32
> time_based
> group_reporting=1
> runtime=18000
> end_fsync=1
> size=10G
> ioengine=libaio
> iodepth=16
>
> [file1]
> filename=/dev/nvme0n1
>
> [file2]
> filename=/dev/nvme0n2
>
> [file3]
> filename=/dev/nvme0n3
>
> [file4]
> filename=/dev/nvme0n4
>
> [file5]
> filename=/dev/nvme0n5
>
> [file6]
> filename=/dev/nvme0n6
>
> [file7]
> filename=/dev/nvme0n7
>
> [file8]
> filename=/dev/nvme0n8
>
> [file9]
> filename=/dev/nvme0n9
>
> [file10]
> filename=/dev/nvme0n10
>
> [file11]
> filename=/dev/nvme0n11
>
> [file12]
> filename=/dev/nvme0n12
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
2022-02-09 8:41 ` Chaitanya Kulkarni via iommu
@ 2022-02-10 23:58 ` Martin Oliveira
2022-02-11 11:35 ` Robin Murphy
0 siblings, 1 reply; 8+ messages in thread
From: Martin Oliveira @ 2022-02-10 23:58 UTC (permalink / raw)
To: Chaitanya Kulkarni
Cc: Kelly Ursenbach, linux-rdma@vger.kernel.org, Lee, Jason,
linux-nvme@lists.infradead.org, iommu@lists.linux-foundation.org,
Logan Gunthorpe
On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
> On 2/8/22 6:50 PM, Martin Oliveira wrote:
> > Hello,
> >
> > We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.
> >
> > Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
> >
>
> Thanks for reporting this, if you can bisect the problem on your setup
> it will help others to help you better.
>
> -ck
Hi Chaitanya,
I went back to a kernel as old as 4.15 and the problem was still there, so I don't know of a good commit to start from.
I also learned that I can reproduce this with as few as 3 cards, and I updated the firmware on the Mellanox cards to the latest version.
I'd be happy to try any tests if someone has any suggestions.
Thanks,
Martin
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
2022-02-10 23:58 ` Martin Oliveira
@ 2022-02-11 11:35 ` Robin Murphy
2022-05-17 8:26 ` Mark Ruijter
0 siblings, 1 reply; 8+ messages in thread
From: Robin Murphy @ 2022-02-11 11:35 UTC (permalink / raw)
To: Martin Oliveira, Chaitanya Kulkarni
Cc: Kelly Ursenbach, linux-rdma@vger.kernel.org, Lee, Jason,
linux-nvme@lists.infradead.org, iommu@lists.linux-foundation.org,
Logan Gunthorpe
On 2022-02-10 23:58, Martin Oliveira wrote:
> On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
>> On 2/8/22 6:50 PM, Martin Oliveira wrote:
>>> Hello,
>>>
>>> We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions.
>>>
>>> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
>>>
>>
>> Thanks for reporting this, if you can bisect the problem on your setup
>> it will help others to help you better.
>>
>> -ck
>
> Hi Chaitanya,
>
> I went back to a kernel as old as 4.15 and the problem was still there, so I don't know of a good commit to start from.
>
> I also learned that I can reproduce this with as little as 3 cards and I updated the firmware on the Mellanox cards to the latest version.
>
> I'd be happy to try any tests if someone has any suggestions.
The IOMMU is probably your friend here - one thing that might be worth
trying is capturing the iommu:map and iommu:unmap tracepoints to see if
the address reported in subsequent IOMMU faults was previously mapped as
a valid DMA address (be warned that there will likely be a *lot* of
trace generated). With 5.13 or newer, booting with "iommu.forcedac=1"
should also make it easier to tell real DMA IOVAs from rogue physical
addresses or other nonsense, as real DMA addresses should then look more
like 0xffff24d08000.
That could at least help narrow down whether it's some kind of
use-after-free race or a completely bogus address creeping in somehow.
Robin.
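The check described above, whether a faulting address was previously mapped as a valid DMA address, can be sketched as a small post-processing script over a saved trace. This is a hedged Python sketch and not something posted in the thread: the text format assumed for the iommu:map and iommu:unmap events is an assumption that may differ between kernel versions, `classify_fault` is a hypothetical helper name, and the sample trace lines (task names, timestamps, paddr) are invented for illustration. Only the fault address 0x24d08000 comes from the log in this thread.

```python
import re

# Assumed trace text for the two events. An optional " - 0x<end>" after the
# iova is tolerated because the exact format varies by kernel version.
MAP_RE = re.compile(r"\bmap: IOMMU: iova=0x([0-9a-fA-F]+).*?\bsize=(\d+)")
UNMAP_RE = re.compile(
    r"\bunmap: IOMMU: iova=0x([0-9a-fA-F]+).*?\bunmapped_size=(\d+)"
)


def classify_fault(trace_lines, fault_addr):
    """Classify fault_addr against the map/unmap events seen before the fault.

    Returns "mapped" if the address is inside a live mapping, "stale" if it
    was mapped earlier but has since been unmapped (use-after-free-like), or
    "never-mapped" if no mapping ever covered it (bogus address).
    """
    live = []  # (start, end) ranges that are currently mapped
    ever_mapped = False
    for line in trace_lines:
        m = MAP_RE.search(line)
        if m:
            start, size = int(m.group(1), 16), int(m.group(2))
            live.append((start, start + size))
            if start <= fault_addr < start + size:
                ever_mapped = True
            continue
        m = UNMAP_RE.search(line)
        if m:
            start, size = int(m.group(1), 16), int(m.group(2))
            end = start + size
            # Simplification: drop any live range that overlaps the unmap.
            live = [r for r in live if not (r[0] < end and start < r[1])]
    if any(start <= fault_addr < end for start, end in live):
        return "mapped"
    return "stale" if ever_mapped else "never-mapped"


# Hypothetical trace excerpt, for illustration only.
TRACE = [
    "kworker/51:1H-796 [051] .... 400.000001: map: IOMMU: iova=0x0000000024d08000 paddr=0x0000000100000000 size=4096",
    "kworker/51:1H-796 [051] .... 405.000001: unmap: IOMMU: iova=0x0000000024d08000 size=4096 unmapped_size=4096",
]

print(classify_fault(TRACE, 0x24D08000))
```

Only trace lines recorded before the fault timestamp should be passed in; a "stale" result points toward the use-after-free race, and "never-mapped" toward a bogus address.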
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
2022-02-11 11:35 ` Robin Murphy
@ 2022-05-17 8:26 ` Mark Ruijter
2022-05-17 11:16 ` Max Gurtovoy via iommu
0 siblings, 1 reply; 8+ messages in thread
From: Mark Ruijter @ 2022-05-17 8:26 UTC (permalink / raw)
To: Robin Murphy, Martin Oliveira, Chaitanya Kulkarni
Cc: Kelly Ursenbach, linux-rdma@vger.kernel.org, Lee, Jason,
linux-nvme@lists.infradead.org, iommu@lists.linux-foundation.org,
Logan Gunthorpe
Hi Robin,
I ran into the exact same problem while testing with 4 ConnectX-6 cards on kernel 5.18-rc6.
[ 4878.273016] nvme nvme0: Successfully reconnected (3 attempts)
[ 4879.122015] nvme nvme0: starting error recovery
[ 4879.122028] infiniband mlx5_4: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
[ 4879.122035] infiniband mlx5_4: dump_cqe:272:(pid 0): dump error cqe
[ 4879.122037] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122039] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122040] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122040] 00000030: 00 00 00 00 a9 00 56 04 00 00 00 ed 0d da ff e2
[ 4881.085547] nvme nvme3: Reconnecting in 10 seconds...
I assume this means that the problem has still not been resolved?
If so, I'll try to diagnose the problem.
Thanks,
--Mark
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
2022-05-17 8:26 ` Mark Ruijter
@ 2022-05-17 11:16 ` Max Gurtovoy via iommu
0 siblings, 0 replies; 8+ messages in thread
From: Max Gurtovoy via iommu @ 2022-05-17 11:16 UTC (permalink / raw)
To: Mark Ruijter, Robin Murphy, Martin Oliveira, Chaitanya Kulkarni
Cc: Kelly Ursenbach, linux-rdma@vger.kernel.org, Lee, Jason,
linux-nvme@lists.infradead.org, iommu@lists.linux-foundation.org,
Logan Gunthorpe
Hi,
Can you please send the original scenario, setup details and dumps?
I can't find it in my mailbox.
You can send it directly to me to avoid spam.
-Max.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Error when running fio against nvme-of rdma target (mlx5 driver)
2022-02-09 2:50 Error when running fio against nvme-of rdma target (mlx5 driver) Martin Oliveira
2022-02-09 8:41 ` Chaitanya Kulkarni via iommu
2022-02-09 12:48 ` Robin Murphy
@ 2024-01-31 9:18 ` Arthur Muller
2 siblings, 0 replies; 8+ messages in thread
From: Arthur Muller @ 2024-01-31 9:18 UTC (permalink / raw)
To: Martin Oliveira, linux-nvme@lists.infradead.org, iommu,
linux-rdma@vger.kernel.org
Cc: Kelly Ursenbach, Lee, Jason, Logan Gunthorpe
This is a resend of an email that bounced from the old iommu list. It
now includes the new iommu@lists.linux.dev address instead. The first
message was submitted to lore and is published at:
https://lore.kernel.org/all/9a40e66eb8ffc48a2e3765cf77f49914d57c55e7.camel@gmx.net/
Dear all,
We've encountered a similar issue. In our case, we are using the Lustre
file system instead of NVMe-oF to connect our storage over the network.
Our setup involves an AMD EPYC 7282 machine paired with Mellanox
MT28908 cards. Following the guidelines in the Nvidia documentation:
https://docs.nvidia.com/networking/display/mlnxenv584150lts/installing+mlnx_en#src-2477565014_InstallingMLNX_EN-InstallationModes
we compiled the MLNX_EN 5.8 LTS driver using VMA. Additionally, we
experimented with the latest MLNX_EN 23.10 driver, encountering the
same issue.
We use kernel 5.15.0 now. This problem first appeared after upgrading
our systems from Ubuntu 20.04 LTS to Ubuntu 22.04 LTS and, thus, from
kernel 5.4 to 5.15. Unfortunately, we are not completely sure about the
MLNX_EN driver version prior to the upgrade, but strongly assume <=
5.8.
The failure appears to start with the following error message:
mlx5_core 0000:63:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0003 address=0x200020f758 flags=0x0020]
This error results in timeouts and read operation faults within the
Lustre file system on both the client and storage ends, potentially
leading to a complete storage failure.
Subsequently, the IO_PAGE_FAULT error is often followed by a "local
protection error," although this is not consistent. The error tends to
manifest after a prolonged period of constant network operation,
typically after approximately one day of continuous read and write
operations involving a Postgres database on the file system.
Regrettably, we have been unable to trigger this error manually.
Kind regards,
Arthur Müller
^ permalink raw reply [flat|nested] 8+ messages in thread
Thread overview: 8+ messages
2022-02-09 2:50 Error when running fio against nvme-of rdma target (mlx5 driver) Martin Oliveira
2022-02-09 8:41 ` Chaitanya Kulkarni via iommu
2022-02-10 23:58 ` Martin Oliveira
2022-02-11 11:35 ` Robin Murphy
2022-05-17 8:26 ` Mark Ruijter
2022-05-17 11:16 ` Max Gurtovoy via iommu
2022-02-09 12:48 ` Robin Murphy
2024-01-31 9:18 ` Arthur Muller