* broken CRCs at NVMeF target with SIW & NVMe/TCP transports
@ 2020-03-16 16:20 Krishnamraju Eraparaju
2020-03-17 9:31 ` Bernard Metzler
2020-03-17 12:45 ` Christoph Hellwig
0 siblings, 2 replies; 15+ messages in thread
From: Krishnamraju Eraparaju @ 2020-03-16 16:20 UTC (permalink / raw)
To: Bernard Metzler, sagi, hch
Cc: linux-nvme, linux-rdma, Nirranjan Kirubaharan,
Potnuri Bharat Teja
I'm seeing broken CRCs at the NVMeF target while running the below program
at the host. Here the RDMA transport is SoftiWARP, but I'm seeing the
same issue with NVMe/TCP as well.
It appears to me that the same buffer is being rewritten by the
application/ULP before the completion for the previous request arrives.
HW-based transports (like iw_cxgb4) do not show this issue because
they copy/DMA the data first and then compute the CRC on the copied buffer.
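The ordering I suspect can be shown with a trivial userspace sketch
(illustration only, not the transport code; the toy checksum below is
just a stand-in for CRC32C/data digest):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* toy checksum standing in for the transport's CRC32C/data digest */
static uint32_t toy_digest(const uint8_t *buf, size_t len)
{
        uint32_t sum = 0;
        size_t i;

        for (i = 0; i < len; i++)
                sum = (sum << 1) ^ buf[i];
        return sum;
}

int main(void)
{
        uint8_t page[4096];
        uint32_t hdr_digest, wire_digest;

        memset(page, 'A', sizeof(page));

        /* "transport": digest computed while the ULP still owns the page */
        hdr_digest = toy_digest(page, sizeof(page));

        /* "ULP": rewrites part of the same page before the data is copied out */
        memset(page, 'B', 16);

        /* what actually goes on the wire is the modified page */
        wire_digest = toy_digest(page, sizeof(page));

        printf("digest in header 0x%08x, digest of wire data 0x%08x -> %s\n",
               (unsigned)hdr_digest, (unsigned)wire_digest,
               hdr_digest == wire_digest ? "match" : "MISMATCH");
        return 0;
}

This is the same class of mismatch as the "data digest error: recv ...
expected ..." lines in the dmesg below.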
Please share your thoughts/comments/suggestions on this.
Commands used:
--------------
#nvme connect -t tcp -G -a 102.1.1.6 -s 4420 -n nvme-ram0   ==> for NVMe/TCP
#nvme connect -t rdma -a 102.1.1.6 -s 4420 -n nvme-ram0     ==> for SoftiWARP
#mkfs.ext3 -F /dev/nvme0n1   (the issue occurs more frequently with ext3 than with ext4)
#mount /dev/nvme0n1 /mnt
#Then run the below program:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main() {
        int i;
        char* line1 = "123";
        FILE* fp;

        while (1) {
                fp = fopen("/mnt/tmp.txt", "w");
                if (fp == NULL)
                        exit(1);
                /* unbuffered stdio: every fwrite() is issued as a separate write() */
                setvbuf(fp, NULL, _IONBF, 0);
                for (i = 0; i < 100000; i++)
                        if (fwrite(line1, 1, strlen(line1), fp) != strlen(line1))
                                exit(1);
                if (fclose(fp) != 0)
                        exit(1);
        }
        return 0;
}
DMESG at NVMe/TCP Target:
[ +5.119267] nvmet_tcp: queue 2: cmd 83 pdu (6) data digest error: recv
0xb1acaf93 expected 0xcd0b877d
[ +0.000017] nvmet: ctrl 1 fatal error occurred!
Thanks,
Krishna.
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-16 16:20 broken CRCs at NVMeF target with SIW & NVMe/TCP transports Krishnamraju Eraparaju @ 2020-03-17 9:31 ` Bernard Metzler 2020-03-17 12:26 ` Tom Talpey 2020-03-17 12:45 ` Christoph Hellwig 1 sibling, 1 reply; 15+ messages in thread From: Bernard Metzler @ 2020-03-17 9:31 UTC (permalink / raw) To: Krishnamraju Eraparaju Cc: sagi, hch, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja -----"Krishnamraju Eraparaju" <krishna2@chelsio.com> wrote: ----- >To: "Bernard Metzler" <BMT@zurich.ibm.com>, sagi@grimberg.me, >hch@lst.de >From: "Krishnamraju Eraparaju" <krishna2@chelsio.com> >Date: 03/16/2020 05:20PM >Cc: linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, >"Nirranjan Kirubaharan" <nirranjan@chelsio.com>, "Potnuri Bharat >Teja" <bharat@chelsio.com> >Subject: [EXTERNAL] broken CRCs at NVMeF target with SIW & NVMe/TCP >transports > >I'm seeing broken CRCs at NVMeF target while running the below >program >at host. Here RDMA transport is SoftiWARP, but I'm also seeing the >same issue with NVMe/TCP aswell. > >It appears to me that the same buffer is being rewritten by the >application/ULP before getting the completion for the previous >requests. >getting the completion for the previous requests. HW based >HW based trasports(like iw_cxgb4) are not showing this issue because >they copy/DMA and then compute the CRC on copied buffer. > Thanks Krishna! Yes, I see those errors as well. For TCP/NVMeF, I see it if the data digest is enabled, which is functional similar to have CRC enabled for iWarp. This appears to be your suggested '-G' command line switch during TCP connect. For SoftiWarp at host side and iWarp hardware at target side, CRC gets enabled. Then I see that problem at host side for SEND type work requests: A page of data referenced by the SEND gets sometimes modified by the ULP after CRC computation and before the data gets handed over (copied) to TCP via kernel_sendmsg(), and far before the ULP reaps a work completion for that SEND. So the ULP sometimes touches the buffer after passing ownership to the provider, and before getting it back by a matching work completion. With siw and CRC switched off, this issue goes undetected, since TCP copies the buffer at some point in time, and only computes its TCP/IP checksum on a stable copy, or typically even offloaded. Another question is if it is possible that we are finally placing stale data, or if closing the file recovers the error by re-sending affected data. With my experiments, until now I never detected broken file content after file close. Thanks, Bernard. >Please share your thoughts/comments/suggestions on this. 
> >Commands used: >-------------- >#nvme connect -t tcp -G -a 102.1.1.6 -s 4420 -n nvme-ram0 ==> for >NVMe/TCP >#nvme connect -t rdma -a 102.1.1.6 -s 4420 -n nvme-ram0 ==> for >SoftiWARP >#mkfs.ext3 -F /dev/nvme0n1 (issue occuring frequency is more with >ext3 >than ext4) >#mount /dev/nvme0n1 /mnt >#Then run the below program: >#include <stdlib.h> >#include <stdio.h> >#include <string.h> >#include <unistd.h> > >int main() { > int i; > char* line1 = "123"; > FILE* fp; > while(1) { > fp = fopen("/mnt/tmp.txt", "w"); > setvbuf(fp, NULL, _IONBF, 0); > for (i=0; i<100000; i++) > if ((fwrite(line1, 1, strlen(line1), fp) != >strlen(line1))) > exit(1); > > if (fclose(fp) != 0) > exit(1); > } >return 0; >} > >DMESG at NVMe/TCP Target: >[ +5.119267] nvmet_tcp: queue 2: cmd 83 pdu (6) data digest error: >recv >0xb1acaf93 expected 0xcd0b877d >[ +0.000017] nvmet: ctrl 1 fatal error occurred! > > >Thanks, >Krishna. > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-17 9:31 ` Bernard Metzler @ 2020-03-17 12:26 ` Tom Talpey 0 siblings, 0 replies; 15+ messages in thread From: Tom Talpey @ 2020-03-17 12:26 UTC (permalink / raw) To: Bernard Metzler, Krishnamraju Eraparaju Cc: sagi, hch, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja On 3/17/2020 5:31 AM, Bernard Metzler wrote: > -----"Krishnamraju Eraparaju" <krishna2@chelsio.com> wrote: ----- > >> To: "Bernard Metzler" <BMT@zurich.ibm.com>, sagi@grimberg.me, >> hch@lst.de >> From: "Krishnamraju Eraparaju" <krishna2@chelsio.com> >> Date: 03/16/2020 05:20PM >> Cc: linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, >> "Nirranjan Kirubaharan" <nirranjan@chelsio.com>, "Potnuri Bharat >> Teja" <bharat@chelsio.com> >> Subject: [EXTERNAL] broken CRCs at NVMeF target with SIW & NVMe/TCP >> transports >> >> I'm seeing broken CRCs at NVMeF target while running the below >> program >> at host. Here RDMA transport is SoftiWARP, but I'm also seeing the >> same issue with NVMe/TCP aswell. >> >> It appears to me that the same buffer is being rewritten by the >> application/ULP before getting the completion for the previous >> requests. >> getting the completion for the previous requests. HW based >> HW based trasports(like iw_cxgb4) are not showing this issue because >> they copy/DMA and then compute the CRC on copied buffer. >> > > Thanks Krishna! > > Yes, I see those errors as well. For TCP/NVMeF, I see it if > the data digest is enabled, which is functional similar to > have CRC enabled for iWarp. This appears to be your suggested > '-G' command line switch during TCP connect. > > For SoftiWarp at host side and iWarp hardware at target side, > CRC gets enabled. Then I see that problem at host side for > SEND type work requests: A page of data referenced by the > SEND gets sometimes modified by the ULP after CRC computation > and before the data gets handed over (copied) to TCP via > kernel_sendmsg(), and far before the ULP reaps a work > completion for that SEND. So the ULP sometimes touches the > buffer after passing ownership to the provider, and before > getting it back by a matching work completion. Well, that's a plain ULP bug. It's the very definition of a send queue work request completion that the buffer has been accepted by the LLP. Would the ULP read a receive buffer before getting a completion? Same thing. Would the ULP complain if its application consumer wrote data into async i/o O_DIRECT buffers, or while it computed a krb5i hash? Yep. > With siw and CRC switched off, this issue goes undetected, > since TCP copies the buffer at some point in time, and > only computes its TCP/IP checksum on a stable copy, or > typically even offloaded. An excellent test, and I'd love to know what ULPs/apps you caught with it. Tom. > Another question is if it is possible that we are finally > placing stale data, or if closing the file recovers the > error by re-sending affected data. With my experiments, > until now I never detected broken file content after > file close. > > > Thanks, > Bernard. > > > >> Please share your thoughts/comments/suggestions on this. 
>> >> Commands used: >> -------------- >> #nvme connect -t tcp -G -a 102.1.1.6 -s 4420 -n nvme-ram0 ==> for >> NVMe/TCP >> #nvme connect -t rdma -a 102.1.1.6 -s 4420 -n nvme-ram0 ==> for >> SoftiWARP >> #mkfs.ext3 -F /dev/nvme0n1 (issue occuring frequency is more with >> ext3 >> than ext4) >> #mount /dev/nvme0n1 /mnt >> #Then run the below program: >> #include <stdlib.h> >> #include <stdio.h> >> #include <string.h> >> #include <unistd.h> >> >> int main() { >> int i; >> char* line1 = "123"; >> FILE* fp; >> while(1) { >> fp = fopen("/mnt/tmp.txt", "w"); >> setvbuf(fp, NULL, _IONBF, 0); >> for (i=0; i<100000; i++) >> if ((fwrite(line1, 1, strlen(line1), fp) != >> strlen(line1))) >> exit(1); >> >> if (fclose(fp) != 0) >> exit(1); >> } >> return 0; >> } >> >> DMESG at NVMe/TCP Target: >> [ +5.119267] nvmet_tcp: queue 2: cmd 83 pdu (6) data digest error: >> recv >> 0xb1acaf93 expected 0xcd0b877d >> [ +0.000017] nvmet: ctrl 1 fatal error occurred! >> >> >> Thanks, >> Krishna. >> >> > > > ^ permalink raw reply [flat|nested] 15+ messages in thread
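The contract Tom describes can be spelled out at the verbs level. A
minimal, hypothetical sketch (userspace flavor for brevity; qp/cq/mr
setup and most error handling omitted, names made up) -- the buffer is
off limits from ibv_post_send() until the matching completion is reaped:

#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

static int send_and_wait(struct ibv_qp *qp, struct ibv_cq *cq,
                         struct ibv_mr *mr, void *buf, uint32_t len)
{
        struct ibv_sge sge = {
                .addr   = (uintptr_t)buf,
                .length = len,
                .lkey   = mr->lkey,
        };
        struct ibv_send_wr wr = {
                .wr_id      = 1,
                .sg_list    = &sge,
                .num_sge    = 1,
                .opcode     = IBV_WR_SEND,
                .send_flags = IBV_SEND_SIGNALED,
        }, *bad_wr;
        struct ibv_wc wc;
        int n;

        if (ibv_post_send(qp, &wr, &bad_wr))
                return -1;

        /* buffer is owned by the provider here -- the ULP must not touch it */

        do {
                n = ibv_poll_cq(cq, 1, &wc);
        } while (n == 0);

        if (n < 0 || wc.status != IBV_WC_SUCCESS)
                return -1;

        /* only now may the caller rewrite/reuse buf */
        memset(buf, 0, len);
        return 0;
}

The same obligation applies to kernel ULPs using ib_post_send() and
ib_poll_cq(); a hardware HCA that DMAs the page early merely makes a
violation invisible, as noted earlier in the thread.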
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports
2020-03-16 16:20 broken CRCs at NVMeF target with SIW & NVMe/TCP transports Krishnamraju Eraparaju
2020-03-17 9:31 ` Bernard Metzler
@ 2020-03-17 12:45 ` Christoph Hellwig
2020-03-17 13:17 ` Bernard Metzler
2020-03-17 16:03 ` Sagi Grimberg
1 sibling, 2 replies; 15+ messages in thread
From: Christoph Hellwig @ 2020-03-17 12:45 UTC (permalink / raw)
To: Krishnamraju Eraparaju
Cc: Bernard Metzler, sagi, hch, linux-nvme, linux-rdma,
Nirranjan Kirubaharan, Potnuri Bharat Teja

On Mon, Mar 16, 2020 at 09:50:10PM +0530, Krishnamraju Eraparaju wrote:
>
> I'm seeing broken CRCs at NVMeF target while running the below program
> at host. Here RDMA transport is SoftiWARP, but I'm also seeing the
> same issue with NVMe/TCP aswell.
>
> It appears to me that the same buffer is being rewritten by the
> application/ULP before getting the completion for the previous requests.
> getting the completion for the previous requests. HW based
> HW based trasports(like iw_cxgb4) are not showing this issue because
> they copy/DMA and then compute the CRC on copied buffer.

For TCP we can set BDI_CAP_STABLE_WRITES.  For RDMA I don't think that
is a good idea as pretty much all RDMA block drivers rely on the
DMA behavior above.  The answer is to bounce buffer the data in
SoftiWARP / SoftRoCE.

^ permalink raw reply	[flat|nested] 15+ messages in thread
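For reference, the bounce-buffer approach Christoph suggests boils down
to something like the sketch below (hypothetical code, not actual
siw/rxe; sw_tx_bounce() and the omitted digest trailer handling are made
up): snapshot the payload into a transport-private buffer, then compute
the CRC and transmit from that copy, so later ULP writes to the original
pages cannot make the digest and the wire data diverge.

#include <linux/crc32c.h>
#include <linux/errno.h>
#include <linux/net.h>
#include <linux/slab.h>
#include <linux/socket.h>
#include <linux/string.h>
#include <linux/uio.h>

static int sw_tx_bounce(struct socket *sock, const void *ulp_buf,
                        size_t len, u32 *ddgst)
{
        struct msghdr msg = { };
        struct kvec iov;
        void *copy;
        int rv;

        copy = kmalloc(len, GFP_KERNEL);
        if (!copy)
                return -ENOMEM;

        memcpy(copy, ulp_buf, len);     /* snapshot taken at one point in time */
        *ddgst = crc32c(~0, copy, len); /* CRC over the stable copy */

        iov.iov_base = copy;
        iov.iov_len  = len;
        /* partial sends and the digest trailer are not handled in this sketch */
        rv = kernel_sendmsg(sock, &msg, &iov, 1, len);

        kfree(copy);
        return rv < 0 ? rv : 0;
}

The price is exactly Bernard's objection in the next message: one more
copy of every payload byte.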
* Re: Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-17 12:45 ` Christoph Hellwig @ 2020-03-17 13:17 ` Bernard Metzler 2020-03-17 16:03 ` Sagi Grimberg 1 sibling, 0 replies; 15+ messages in thread From: Bernard Metzler @ 2020-03-17 13:17 UTC (permalink / raw) To: Christoph Hellwig Cc: Krishnamraju Eraparaju, sagi, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja -----"Christoph Hellwig" <hch@lst.de> wrote: ----- >To: "Krishnamraju Eraparaju" <krishna2@chelsio.com> >From: "Christoph Hellwig" <hch@lst.de> >Date: 03/17/2020 01:45PM >Cc: "Bernard Metzler" <BMT@zurich.ibm.com>, sagi@grimberg.me, >hch@lst.de, linux-nvme@lists.infradead.org, >linux-rdma@vger.kernel.org, "Nirranjan Kirubaharan" ><nirranjan@chelsio.com>, "Potnuri Bharat Teja" <bharat@chelsio.com> >Subject: [EXTERNAL] Re: broken CRCs at NVMeF target with SIW & >NVMe/TCP transports > >On Mon, Mar 16, 2020 at 09:50:10PM +0530, Krishnamraju Eraparaju >wrote: >> >> I'm seeing broken CRCs at NVMeF target while running the below >program >> at host. Here RDMA transport is SoftiWARP, but I'm also seeing the >> same issue with NVMe/TCP aswell. >> >> It appears to me that the same buffer is being rewritten by the >> application/ULP before getting the completion for the previous >requests. >> getting the completion for the previous requests. HW based >> HW based trasports(like iw_cxgb4) are not showing this issue >because >> they copy/DMA and then compute the CRC on copied buffer. > >For TCP we can set BDI_CAP_STABLE_WRITES. For RDMA I don't think >that Hmm, can you elaborate a little more here? I see that flag being set for data digest enabled (e.g. nvme/host/core.c:nvme_alloc_ns()). But enabling that data digest CRC is exactly when the NVMeF/TCP target detects the issue and drops the frame and disconnects...? The current situation for NVMeF/TCP is that the data digest is not enabled per default and buffer changes are not detected then. Krishna first detected it with using siw against hardware iWarp target, since the CRC gets negotiated then. >is a good idea as pretty much all RDMA block drivers rely on the >DMA behavior above. The answer is to bounce buffer the data in >SoftiWARP / SoftRoCE. > > Another extra copy of user data isn't really charming. Can we somehow let the ULP have its fingers crossed until the buffer got transferred, as signaled back? Best, Bernard. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-17 12:45 ` Christoph Hellwig 2020-03-17 13:17 ` Bernard Metzler @ 2020-03-17 16:03 ` Sagi Grimberg 2020-03-17 16:29 ` Bernard Metzler 1 sibling, 1 reply; 15+ messages in thread From: Sagi Grimberg @ 2020-03-17 16:03 UTC (permalink / raw) To: Christoph Hellwig, Krishnamraju Eraparaju Cc: Bernard Metzler, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja > On Mon, Mar 16, 2020 at 09:50:10PM +0530, Krishnamraju Eraparaju wrote: >> >> I'm seeing broken CRCs at NVMeF target while running the below program >> at host. Here RDMA transport is SoftiWARP, but I'm also seeing the >> same issue with NVMe/TCP aswell. >> >> It appears to me that the same buffer is being rewritten by the >> application/ULP before getting the completion for the previous requests. >> getting the completion for the previous requests. HW based >> HW based trasports(like iw_cxgb4) are not showing this issue because >> they copy/DMA and then compute the CRC on copied buffer. > > For TCP we can set BDI_CAP_STABLE_WRITES. For RDMA I don't think that > is a good idea as pretty much all RDMA block drivers rely on the > DMA behavior above. The answer is to bounce buffer the data in > SoftiWARP / SoftRoCE. We already do, see nvme_alloc_ns. ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-17 16:03 ` Sagi Grimberg @ 2020-03-17 16:29 ` Bernard Metzler 2020-03-17 16:39 ` Sagi Grimberg 0 siblings, 1 reply; 15+ messages in thread From: Bernard Metzler @ 2020-03-17 16:29 UTC (permalink / raw) To: Sagi Grimberg Cc: Christoph Hellwig, Krishnamraju Eraparaju, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja -----"Sagi Grimberg" <sagi@grimberg.me> wrote: ----- >To: "Christoph Hellwig" <hch@lst.de>, "Krishnamraju Eraparaju" ><krishna2@chelsio.com> >From: "Sagi Grimberg" <sagi@grimberg.me> >Date: 03/17/2020 05:04PM >Cc: "Bernard Metzler" <BMT@zurich.ibm.com>, >linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, >"Nirranjan Kirubaharan" <nirranjan@chelsio.com>, "Potnuri Bharat >Teja" <bharat@chelsio.com> >Subject: [EXTERNAL] Re: broken CRCs at NVMeF target with SIW & >NVMe/TCP transports > >> On Mon, Mar 16, 2020 at 09:50:10PM +0530, Krishnamraju Eraparaju >wrote: >>> >>> I'm seeing broken CRCs at NVMeF target while running the below >program >>> at host. Here RDMA transport is SoftiWARP, but I'm also seeing the >>> same issue with NVMe/TCP aswell. >>> >>> It appears to me that the same buffer is being rewritten by the >>> application/ULP before getting the completion for the previous >requests. >>> getting the completion for the previous requests. HW based >>> HW based trasports(like iw_cxgb4) are not showing this issue >because >>> they copy/DMA and then compute the CRC on copied buffer. >> >> For TCP we can set BDI_CAP_STABLE_WRITES. For RDMA I don't think >that >> is a good idea as pretty much all RDMA block drivers rely on the >> DMA behavior above. The answer is to bounce buffer the data in >> SoftiWARP / SoftRoCE. > >We already do, see nvme_alloc_ns. > > Krishna was getting the issue when testing TCP/NVMeF with -G during connect. That enables data digest and STABLE_WRITES I think. So to me it seems we don't get stable pages, but pages which are touched after handover to the provider. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-17 16:29 ` Bernard Metzler @ 2020-03-17 16:39 ` Sagi Grimberg 2020-03-17 19:17 ` Krishnamraju Eraparaju 0 siblings, 1 reply; 15+ messages in thread From: Sagi Grimberg @ 2020-03-17 16:39 UTC (permalink / raw) To: Bernard Metzler Cc: Christoph Hellwig, Krishnamraju Eraparaju, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja >>> For TCP we can set BDI_CAP_STABLE_WRITES. For RDMA I don't think >> that >>> is a good idea as pretty much all RDMA block drivers rely on the >>> DMA behavior above. The answer is to bounce buffer the data in >>> SoftiWARP / SoftRoCE. >> >> We already do, see nvme_alloc_ns. >> >> > > Krishna was getting the issue when testing TCP/NVMeF with -G > during connect. That enables data digest and STABLE_WRITES > I think. So to me it seems we don't get stable pages, but > pages which are touched after handover to the provider. Non of the transports modifies the data at any point, both will scan it to compute crc. So surely this is coming from the fs, Krishna does this happen with xfs as well? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-17 16:39 ` Sagi Grimberg @ 2020-03-17 19:17 ` Krishnamraju Eraparaju 2020-03-17 19:33 ` Sagi Grimberg 0 siblings, 1 reply; 15+ messages in thread From: Krishnamraju Eraparaju @ 2020-03-17 19:17 UTC (permalink / raw) To: Sagi Grimberg Cc: Bernard Metzler, Christoph Hellwig, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja On Tuesday, March 03/17/20, 2020 at 09:39:39 -0700, Sagi Grimberg wrote: > > >>>For TCP we can set BDI_CAP_STABLE_WRITES. For RDMA I don't think > >>that > >>>is a good idea as pretty much all RDMA block drivers rely on the > >>>DMA behavior above. The answer is to bounce buffer the data in > >>>SoftiWARP / SoftRoCE. > >> > >>We already do, see nvme_alloc_ns. > >> > >> > > > >Krishna was getting the issue when testing TCP/NVMeF with -G > >during connect. That enables data digest and STABLE_WRITES > >I think. So to me it seems we don't get stable pages, but > >pages which are touched after handover to the provider. > > Non of the transports modifies the data at any point, both will > scan it to compute crc. So surely this is coming from the fs, > Krishna does this happen with xfs as well? Yes, but rare(took ~15min to recreate), whereas with ext3/4 its almost immediate. Here is the error log for NVMe/TCP with xfs. dmesg at Host: [ +0.000323] nvme nvme2: creating 12 I/O queues. [ +0.008991] nvme nvme2: Successfully reconnected (1 attempt) [ +25.277733] blk_update_request: I/O error, dev nvme2n1, sector 0 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0 [ +6.043879] XFS (nvme2n1): Mounting V5 Filesystem [ +0.017745] XFS (nvme2n1): Ending clean mount [ +0.000174] xfs filesystem being mounted at /mnt supports timestamps until 2038 (0x7fffffff) [Mar18 00:14] nvme nvme2: Reconnecting in 10 seconds... [ +0.000453] nvme nvme2: creating 12 I/O queues. [ +0.009216] nvme nvme2: Successfully reconnected (1 attempt) [Mar18 00:43] nvme nvme2: Reconnecting in 10 seconds... [ +0.000383] nvme nvme2: creating 12 I/O queues. [ +0.009239] nvme nvme2: Successfully reconnected (1 attempt) dmesg at Target: [Mar18 00:14] nvmet_tcp: queue 9: cmd 17 pdu (4) data digest error: recv 0x8e85d882 expected 0x9a46fac3 [ +0.000011] nvmet: ctrl 1 fatal error occurred! [ +10.240266] nvmet: creating controller 1 for subsystem nvme-ram0 for NQN nqn.2014-08.org.nvmexpress.chelsio. [Mar18 00:42] nvmet_tcp: queue 7: cmd 89 pdu (4) data digest error: recv 0xc0ce3dfd expected 0x7ee136b5 [ +0.000012] nvmet: ctrl 1 fatal error occurred! [Mar18 00:43] nvmet: creating controller 1 for subsystem nvme-ram0 for NQN nqn.2014-08.org.nvmexpress.chelsio. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports
2020-03-17 19:17 ` Krishnamraju Eraparaju
@ 2020-03-17 19:33 ` Sagi Grimberg
2020-03-17 20:31 ` Krishnamraju Eraparaju
0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2020-03-17 19:33 UTC (permalink / raw)
To: Krishnamraju Eraparaju
Cc: Bernard Metzler, Christoph Hellwig, linux-nvme, linux-rdma,
Nirranjan Kirubaharan, Potnuri Bharat Teja

>>>>> For TCP we can set BDI_CAP_STABLE_WRITES. For RDMA I don't think
>>>> that
>>>>> is a good idea as pretty much all RDMA block drivers rely on the
>>>>> DMA behavior above. The answer is to bounce buffer the data in
>>>>> SoftiWARP / SoftRoCE.
>>>>
>>>> We already do, see nvme_alloc_ns.
>>>>
>>>>
>>>
>>> Krishna was getting the issue when testing TCP/NVMeF with -G
>>> during connect. That enables data digest and STABLE_WRITES
>>> I think. So to me it seems we don't get stable pages, but
>>> pages which are touched after handover to the provider.
>>
>> Non of the transports modifies the data at any point, both will
>> scan it to compute crc. So surely this is coming from the fs,
>> Krishna does this happen with xfs as well?
> Yes, but rare(took ~15min to recreate), whereas with ext3/4
> its almost immediate. Here is the error log for NVMe/TCP with xfs.

Thanks Krishna,

I assume that this makes the issue go away?
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 11e10fe1760f..cc93e1949b2c 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -889,7 +889,7 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req)
 			flags |= MSG_MORE;
 
 		/* can't zcopy slab pages */
-		if (unlikely(PageSlab(page))) {
+		if (unlikely(PageSlab(page)) || queue->data_digest) {
 			ret = sock_no_sendpage(queue->sock, page, offset, len,
 					flags);
 		} else {
--

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-17 19:33 ` Sagi Grimberg @ 2020-03-17 20:31 ` Krishnamraju Eraparaju 2020-03-18 16:49 ` Sagi Grimberg 0 siblings, 1 reply; 15+ messages in thread From: Krishnamraju Eraparaju @ 2020-03-17 20:31 UTC (permalink / raw) To: Sagi Grimberg Cc: Bernard Metzler, Christoph Hellwig, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja On Tuesday, March 03/17/20, 2020 at 12:33:44 -0700, Sagi Grimberg wrote: > > >>>>>For TCP we can set BDI_CAP_STABLE_WRITES. For RDMA I don't think > >>>>that > >>>>>is a good idea as pretty much all RDMA block drivers rely on the > >>>>>DMA behavior above. The answer is to bounce buffer the data in > >>>>>SoftiWARP / SoftRoCE. > >>>> > >>>>We already do, see nvme_alloc_ns. > >>>> > >>>> > >>> > >>>Krishna was getting the issue when testing TCP/NVMeF with -G > >>>during connect. That enables data digest and STABLE_WRITES > >>>I think. So to me it seems we don't get stable pages, but > >>>pages which are touched after handover to the provider. > >> > >>Non of the transports modifies the data at any point, both will > >>scan it to compute crc. So surely this is coming from the fs, > >>Krishna does this happen with xfs as well? > >Yes, but rare(took ~15min to recreate), whereas with ext3/4 > >its almost immediate. Here is the error log for NVMe/TCP with xfs. > > Thanks Krishna, > > I assume that this makes the issue go away? > -- > diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c > index 11e10fe1760f..cc93e1949b2c 100644 > --- a/drivers/nvme/host/tcp.c > +++ b/drivers/nvme/host/tcp.c > @@ -889,7 +889,7 @@ static int nvme_tcp_try_send_data(struct > nvme_tcp_request *req) > flags |= MSG_MORE; > > /* can't zcopy slab pages */ > - if (unlikely(PageSlab(page))) { > + if (unlikely(PageSlab(page)) || queue->data_digest) { > ret = sock_no_sendpage(queue->sock, page, > offset, len, > flags); > } else { > -- Unfortunately, issue is still occuring with this patch also. Looks like the integrity of the data buffer right after the CRC computation(data digest) is what causing this issue, despite the buffer being sent via sendpage or no_sendpage. Thanks, Krishna. ^ permalink raw reply [flat|nested] 15+ messages in thread
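One way to pin that observation down (a debugging sketch only -- the
nvme_tcp_dbg_* names are made up and none of this is in the driver) is
to recompute the digest over the same page range right before the data
is handed to the socket and warn if it no longer matches what was
recorded when the PDU digest was computed:

#include <linux/crc32c.h>
#include <linux/highmem.h>
#include <linux/kernel.h>
#include <linux/mm.h>

struct nvme_tcp_dbg_snap {
        u32 ddgst;                      /* digest recorded at PDU build time */
};

static void nvme_tcp_dbg_record(struct nvme_tcp_dbg_snap *snap,
                                struct page *page, size_t off, size_t len)
{
        void *addr = kmap_atomic(page);

        snap->ddgst = crc32c(~0, addr + off, len);
        kunmap_atomic(addr);
}

static void nvme_tcp_dbg_check(struct nvme_tcp_dbg_snap *snap,
                               struct page *page, size_t off, size_t len)
{
        void *addr = kmap_atomic(page);
        u32 now = crc32c(~0, addr + off, len);

        kunmap_atomic(addr);
        WARN_ONCE(now != snap->ddgst,
                  "payload changed after digest: 0x%08x -> 0x%08x\n",
                  snap->ddgst, now);
}

If the WARN fires, the page was modified between digest computation and
transmission, which would match what Bernard reported seeing on the siw
side.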
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-17 20:31 ` Krishnamraju Eraparaju @ 2020-03-18 16:49 ` Sagi Grimberg 2020-03-20 14:35 ` Krishnamraju Eraparaju 0 siblings, 1 reply; 15+ messages in thread From: Sagi Grimberg @ 2020-03-18 16:49 UTC (permalink / raw) To: Krishnamraju Eraparaju Cc: Bernard Metzler, Christoph Hellwig, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja >> Thanks Krishna, >> >> I assume that this makes the issue go away? >> -- >> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c >> index 11e10fe1760f..cc93e1949b2c 100644 >> --- a/drivers/nvme/host/tcp.c >> +++ b/drivers/nvme/host/tcp.c >> @@ -889,7 +889,7 @@ static int nvme_tcp_try_send_data(struct >> nvme_tcp_request *req) >> flags |= MSG_MORE; >> >> /* can't zcopy slab pages */ >> - if (unlikely(PageSlab(page))) { >> + if (unlikely(PageSlab(page)) || queue->data_digest) { >> ret = sock_no_sendpage(queue->sock, page, >> offset, len, >> flags); >> } else { >> -- > > Unfortunately, issue is still occuring with this patch also. > > Looks like the integrity of the data buffer right after the CRC > computation(data digest) is what causing this issue, despite the > buffer being sent via sendpage or no_sendpage. I assume this happens with iSCSI as well? There is nothing special we are doing with respect to digest. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-18 16:49 ` Sagi Grimberg @ 2020-03-20 14:35 ` Krishnamraju Eraparaju 2020-03-20 20:49 ` Sagi Grimberg 0 siblings, 1 reply; 15+ messages in thread From: Krishnamraju Eraparaju @ 2020-03-20 14:35 UTC (permalink / raw) To: Sagi Grimberg Cc: Bernard Metzler, Christoph Hellwig, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja On Wednesday, March 03/18/20, 2020 at 09:49:07 -0700, Sagi Grimberg wrote: > > >>Thanks Krishna, > >> > >>I assume that this makes the issue go away? > >>-- > >>diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c > >>index 11e10fe1760f..cc93e1949b2c 100644 > >>--- a/drivers/nvme/host/tcp.c > >>+++ b/drivers/nvme/host/tcp.c > >>@@ -889,7 +889,7 @@ static int nvme_tcp_try_send_data(struct > >>nvme_tcp_request *req) > >> flags |= MSG_MORE; > >> > >> /* can't zcopy slab pages */ > >>- if (unlikely(PageSlab(page))) { > >>+ if (unlikely(PageSlab(page)) || queue->data_digest) { > >> ret = sock_no_sendpage(queue->sock, page, > >>offset, len, > >> flags); > >> } else { > >>-- > > > >Unfortunately, issue is still occuring with this patch also. > > > >Looks like the integrity of the data buffer right after the CRC > >computation(data digest) is what causing this issue, despite the > >buffer being sent via sendpage or no_sendpage. > > I assume this happens with iSCSI as well? There is nothing special > we are doing with respect to digest. I don't see this issue with iscsi-tcp. May be blk-mq is causing this issue? I assume iscsi-tcp does not have blk_mq support yet upstream to verify with blk_mq enabled. I tried on Ubuntu 19.10(which is based on Linux kernel 5.3), note that RHEL does not support DataDigest. The reason that I'm seeing this issue only with NVMe(tcp/softiwarp) & iSER(softiwarp) is becuase of NVMeF&ISER using blk-mq? Anyhow, I see the content of the page is being updated by upper layers while the tranport driver is computing CRC on that page content and this needs a fix. one could very easily recreate this issue running the below simple program over NVMe/TCP. #include <stdlib.h> #include <stdio.h> #include <string.h> #include <unistd.h> int main() { int i; char* line1 = "123"; FILE* fp; while(1) { fp = fopen("/mnt/tmp.txt", "w"); setvbuf(fp, NULL, _IONBF, 0); for (i=0; i<100000; i++) if ((fwrite(line1, 1, strlen(line1), fp) != strlen(line1))) exit(1); if (fclose(fp) != 0) exit(1); } return 0; } ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-20 14:35 ` Krishnamraju Eraparaju @ 2020-03-20 20:49 ` Sagi Grimberg 2020-03-21 4:02 ` Krishnamraju Eraparaju 0 siblings, 1 reply; 15+ messages in thread From: Sagi Grimberg @ 2020-03-20 20:49 UTC (permalink / raw) To: Krishnamraju Eraparaju Cc: Bernard Metzler, Christoph Hellwig, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja >> I assume this happens with iSCSI as well? There is nothing special >> we are doing with respect to digest. > > I don't see this issue with iscsi-tcp. > > May be blk-mq is causing this issue? I assume iscsi-tcp does not have > blk_mq support yet upstream to verify with blk_mq enabled. > I tried on Ubuntu 19.10(which is based on Linux kernel 5.3), note that > RHEL does not support DataDigest. > > The reason that I'm seeing this issue only with NVMe(tcp/softiwarp) & > iSER(softiwarp) is becuase of NVMeF&ISER using blk-mq? > > Anyhow, I see the content of the page is being updated by upper layers > while the tranport driver is computing CRC on that page content and > this needs a fix. Krishna, do you happen to run with nvme multipath enabled? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-20 20:49 ` Sagi Grimberg @ 2020-03-21 4:02 ` Krishnamraju Eraparaju 0 siblings, 0 replies; 15+ messages in thread From: Krishnamraju Eraparaju @ 2020-03-21 4:02 UTC (permalink / raw) To: Sagi Grimberg Cc: Bernard Metzler, Christoph Hellwig, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja On Friday, March 03/20/20, 2020 at 13:49:25 -0700, Sagi Grimberg wrote: > > >>I assume this happens with iSCSI as well? There is nothing special > >>we are doing with respect to digest. > > > >I don't see this issue with iscsi-tcp. > > > >May be blk-mq is causing this issue? I assume iscsi-tcp does not have > >blk_mq support yet upstream to verify with blk_mq enabled. > >I tried on Ubuntu 19.10(which is based on Linux kernel 5.3), note that > >RHEL does not support DataDigest. > > > >The reason that I'm seeing this issue only with NVMe(tcp/softiwarp) & > >iSER(softiwarp) is becuase of NVMeF&ISER using blk-mq? > > > >Anyhow, I see the content of the page is being updated by upper layers > >while the tranport driver is computing CRC on that page content and > >this needs a fix. > > Krishna, do you happen to run with nvme multipath enabled? Yes Sagi, issue occurs with nvme multipath enabled also.. dmesg at initiator: [ +10.671996] EXT4-fs (nvme0n1): mounting ext3 file system using the ext4 subsystem [ +0.004643] EXT4-fs (nvme0n1): mounted filesystem with ordered data mode. Opts: (null) [ +15.955424] block nvme0n1: no usable path - requeuing I/O [ +0.000142] block nvme0n1: no usable path - requeuing I/O [ +0.000135] block nvme0n1: no usable path - requeuing I/O [ +0.000119] block nvme0n1: no usable path - requeuing I/O [ +0.000108] block nvme0n1: no usable path - requeuing I/O [ +0.000111] block nvme0n1: no usable path - requeuing I/O [ +0.000118] block nvme0n1: no usable path - requeuing I/O [ +0.000158] block nvme0n1: no usable path - requeuing I/O [ +0.000130] block nvme0n1: no usable path - requeuing I/O [ +0.000138] block nvme0n1: no usable path - requeuing I/O [ +0.011754] nvme nvme0: Reconnecting in 10 seconds... [ +10.261223] nvme_ns_head_make_request: 5 callbacks suppressed [ +0.000002] block nvme0n1: no usable path - requeuing I/O [ +0.000240] block nvme0n1: no usable path - requeuing I/O [ +0.000107] block nvme0n1: no usable path - requeuing I/O [ +0.000107] block nvme0n1: no usable path - requeuing I/O [ +0.000107] block nvme0n1: no usable path - requeuing I/O [ +0.000108] block nvme0n1: no usable path - requeuing I/O [ +0.000132] block nvme0n1: no usable path - requeuing I/O [ +0.000010] nvme nvme0: creating 12 I/O queues. [ +0.000110] block nvme0n1: no usable path - requeuing I/O [ +0.000232] block nvme0n1: no usable path - requeuing I/O [ +0.000122] block nvme0n1: no usable path - requeuing I/O [ +0.008407] nvme nvme0: Successfully reconnected (1 attempt) dmesg at target: [Mar21 09:24] nvmet_tcp: queue 3: cmd 38 pdu (6) data digest error: recv 0x21e59730 expected 0x2b88fed0 [ +0.000029] nvmet: ctrl 1 fatal error occurred! [ +10.280101] nvmet: creating controller 1 for subsystem nvme-ram0 for NQN nqn.2014-08.org.nvmexpress.chelsio. ^ permalink raw reply [flat|nested] 15+ messages in thread