public inbox for linux-block@vger.kernel.org
* [bug report] IOMMU reports data translation fault for fio testing
@ 2022-05-13 12:01 John Garry
  2022-05-14  2:28 ` Bart Van Assche
From: John Garry @ 2022-05-13 12:01 UTC
  To: linux-block, linux-scsi
  Cc: chenxiang66@hisilicon.com >> Xiang Chen, liyihang (E)

Hi guys,

My colleague Yihang Li noticed this issue when testing throughput for
the hisi SAS arm64 controller on v5.18-rc6:

estuary:/$ bash ./create_fio_task.sh 4k read 128 1
test2  4k
my_runtime 1500
Creat 4k_read_depth128_fiotest file successfully
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
fio-2.2.10
Starting 10 processes
[  304.798377] arm-smmu-v3 arm-smmu-v3.3.auto: event 0x10 received:
[  304.804431] arm-smmu-v3 arm-smmu-v3.3.auto:         0x0000741000000010
[  304.810404] arm-smmu-v3 arm-smmu-v3.3.auto:         0x000012000000007a
[  304.816379] arm-smmu-v3 arm-smmu-v3.3.auto:         0x00000000abff6000
[  304.822354] arm-smmu-v3 arm-smmu-v3.3.auto:         0x00000000abff6000
[  304.828330] arm-smmu-v3 arm-smmu-v3.3.auto: event 0x10 received:
[  304.834392] arm-smmu-v3 arm-smmu-v3.3.auto:         0x0000741000000010
[  304.840368] arm-smmu-v3 arm-smmu-v3.3.auto:         0x0000120000000058
[  304.846344] arm-smmu-v3 arm-smmu-v3.3.auto:         0x00000000abff6100
[  304.852320] arm-smmu-v3 arm-smmu-v3.3.auto:         0x00000000abff6000
[  304.858297] arm-smmu-v3 arm-smmu-v3.3.auto: event 0x10 received:
[  304.864361] arm-smmu-v3 arm-smmu-v3.3.auto:         0x0000741000000010
[  304.870337] arm-smmu-v3 arm-smmu-v3.3.auto:         0x000012000000004a
[  304.876313] arm-smmu-v3 arm-smmu-v3.3.auto:         0x00000000abff62c0
[  304.882289] arm-smmu-v3 arm-smmu-v3.3.auto:         0x00000000abff6000

Event 0x10 is a translation fault, meaning the DMA mapping is probably
misconfigured.
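
For reference, here is a minimal sketch of how the four 64-bit words of
each event record in the log could be decoded. The field positions (event
ID in bits [7:0] and StreamID in bits [63:32] of word 0, input address in
word 2) are my reading of the Linux arm-smmu-v3 driver's event queue
definitions and should be treated as assumptions. The helper names are
hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helpers. The field layout is an assumption based on the
 * arm-smmu-v3 driver, not something stated in this thread.
 */

/* Event ID: bits [7:0] of word 0. */
static inline unsigned int evt_id(const uint64_t evt[4])
{
	return (unsigned int)(evt[0] & 0xff);
}

/* StreamID of the faulting device: bits [63:32] of word 0. */
static inline uint32_t evt_sid(const uint64_t evt[4])
{
	return (uint32_t)(evt[0] >> 32);
}

/* Input (I/O virtual) address that failed translation: word 2. */
static inline uint64_t evt_input_addr(const uint64_t evt[4])
{
	return evt[2];
}

/* Print one decoded record on a single line. */
static inline void evt_print(const uint64_t evt[4])
{
	printf("event 0x%02x sid 0x%x iova 0x%llx\n",
	       evt_id(evt), (unsigned int)evt_sid(evt),
	       (unsigned long long)evt_input_addr(evt));
}
```

Under those assumptions, the first record decodes to event ID 0x10,
StreamID 0x7410 and input address 0xabff6000. The three records shown
differ only in word 1 and in the low bits of the input address (0x000,
0x100, 0x2c0).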

I don't think it's an IOMMU issue as I tested that separately with a DMA 
mapping benchmark driver.

I'm told v5.17-rc7 does not have the issue. Any idea on the possible 
cause or if there is a fix in waiting? It could be an issue with the 
SCSI hba driver.

I'll bisect in the meantime.

thanks,
John


* Re: [bug report] IOMMU reports data translation fault for fio testing
  2022-05-13 12:01 [bug report] IOMMU reports data translation fault for fio testing John Garry
@ 2022-05-14  2:28 ` Bart Van Assche
  2022-05-14  9:49   ` John Garry
From: Bart Van Assche @ 2022-05-14  2:28 UTC
  To: John Garry, linux-block, linux-scsi
  Cc: chenxiang66@hisilicon.com >> Xiang Chen, liyihang (E)

On 5/13/22 05:01, John Garry wrote:
> It could be an issue with the SCSI hba driver.

That seems likely to me.

Thanks,

Bart.


* Re: [bug report] IOMMU reports data translation fault for fio testing
  2022-05-14  2:28 ` Bart Van Assche
@ 2022-05-14  9:49   ` John Garry
  2022-05-16 10:51     ` John Garry
From: John Garry @ 2022-05-14  9:49 UTC
  To: Bart Van Assche, linux-block, linux-scsi
  Cc: chenxiang66@hisilicon.com >> Xiang Chen, liyihang (E)

On 14/05/2022 03:28, Bart Van Assche wrote:
> On 5/13/22 05:01, John Garry wrote:
>> It could be an issue with the SCSI hba driver.
> 
> That seems likely to me.

Sure, that would be common wisdom. However, the commit just before any
driver-related changes were added for 5.18 is also bad. It could be
pre-existing, but that starts to seem unlikely. Or it could still be an
IOMMU issue - we already have a performance issue there.

This issue can take more than 15 minutes to occur, so is pretty painful 
to bisect...

Thanks,
John




* Re: [bug report] IOMMU reports data translation fault for fio testing
  2022-05-14  9:49   ` John Garry
@ 2022-05-16 10:51     ` John Garry
From: John Garry @ 2022-05-16 10:51 UTC
  To: Bart Van Assche, linux-block, linux-scsi
  Cc: chenxiang66@hisilicon.com >> Xiang Chen, liyihang (E)

On 14/05/2022 10:49, John Garry wrote:
>>> It could be an issue with the SCSI hba driver.
>>
>> That seems likely to me.
> 

Actually it is an LLDD problem. Sometimes it takes 45 minutes to trigger,
though, so it is not nice to bisect.

This looks to be the problematic patch:

author John Garry <john.garry@huawei.com> 2022-02-10 18:43:24 +0800
committer Martin K. Petersen <martin.petersen@oracle.com> 2022-02-11 17:02:50 -0500
commit 26fc0ea74fcb9b76b41f5e9b89728cd1c01559cd (patch)
scsi: libsas: Drop SAS_TASK_AT_INITIATOR

If interested, this looks like the issue:

 void hisi_sas_task_deliver(struct hisi_hba *hisi_hba,
 		break;
 	}
 
-	spin_lock_irqsave(&task->task_state_lock, flags);
-	task->task_state_flags |= SAS_TASK_AT_INITIATOR;
-	spin_unlock_irqrestore(&task->task_state_lock, flags);
-
 	WRITE_ONCE(slot->ready, 1);

Dropping the spinlock also drops the barrier semantics it provided, so
this is a memory ordering issue: nothing now orders the preceding stores
ahead of the write to slot->ready.
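
To illustrate the kind of ordering problem meant here, a minimal
user-space sketch in C11 follows. It is not the hisi_sas code and not the
eventual fix (this thread does not show one). `struct slot_demo`,
`slot_publish()` and `slot_try_consume()` are hypothetical names. The
point is only that the store to the ready flag needs release semantics so
that whatever was written before it is visible to anyone who observes
ready == 1.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a delivery slot; not the driver's struct. */
struct slot_demo {
	int payload;		/* stands in for the slot/command setup */
	atomic_int ready;	/* stands in for slot->ready */
};

/*
 * Producer side: fill in the payload, then publish it.
 * The release store keeps the payload write ordered before the flag
 * write as seen by a consumer that uses an acquire load.
 */
static inline void slot_publish(struct slot_demo *s, int value)
{
	s->payload = value;
	atomic_store_explicit(&s->ready, 1, memory_order_release);
}

/*
 * Consumer side: only read the payload after observing ready == 1.
 * Returns false, leaving *out untouched, if the slot is not ready yet.
 */
static inline bool slot_try_consume(struct slot_demo *s, int *out)
{
	if (!atomic_load_explicit(&s->ready, memory_order_acquire))
		return false;
	*out = s->payload;
	return true;
}
```

In kernel code the analogous tools would be smp_store_release(), or an
explicit barrier such as smp_wmb() or dma_wmb() ahead of the WRITE_ONCE().
Which one is appropriate for this driver depends on who reads
slot->ready, which is not stated in this thread.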

> Sure, that would be common wisdom. However the commit before anything 
> related to driver was added for 5.18 is also bad. It could be 
> pre-existing, but that starts to seem unlikely. Or it could still be an 
> IOMMU issue - we already have a performance issue there.
> 
> This issue can take more than 15 minutes to occur, so is pretty painful 
> to bisect...

