* broken CRCs at NVMeF target with SIW & NVMe/TCP transports
@ 2020-03-16 16:20 Krishnamraju Eraparaju
2020-03-17 9:31 ` Bernard Metzler
2020-03-17 12:45 ` Christoph Hellwig
0 siblings, 2 replies; 15+ messages in thread
From: Krishnamraju Eraparaju @ 2020-03-16 16:20 UTC (permalink / raw)
To: Bernard Metzler, sagi, hch
Cc: linux-nvme, linux-rdma, Nirranjan Kirubaharan,
Potnuri Bharat Teja
I'm seeing broken CRCs at the NVMeF target while running the below program
at the host. Here the RDMA transport is SoftiWARP, but I'm seeing the
same issue with NVMe/TCP as well.
It appears to me that the same buffer is being rewritten by the
application/ULP before the completion for the previous request arrives.
HW-based transports (like iw_cxgb4) do not show this issue because
they copy/DMA the data first and then compute the CRC on the copied buffer.
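The ordering I suspect can be shown with a trivial userspace sketch
(illustration only, not the transport code; the toy checksum below is
just a stand-in for CRC32C/data digest):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* toy checksum standing in for the transport's CRC32C/data digest */
static uint32_t toy_digest(const uint8_t *buf, size_t len)
{
        uint32_t sum = 0;
        size_t i;

        for (i = 0; i < len; i++)
                sum = (sum << 1) ^ buf[i];
        return sum;
}

int main(void)
{
        uint8_t page[4096];
        uint32_t hdr_digest, wire_digest;

        memset(page, 'A', sizeof(page));

        /* "transport": digest computed while the ULP still owns the page */
        hdr_digest = toy_digest(page, sizeof(page));

        /* "ULP": rewrites part of the same page before the data is copied out */
        memset(page, 'B', 16);

        /* what actually goes on the wire is the modified page */
        wire_digest = toy_digest(page, sizeof(page));

        printf("digest in header 0x%08x, digest of wire data 0x%08x -> %s\n",
               (unsigned)hdr_digest, (unsigned)wire_digest,
               hdr_digest == wire_digest ? "match" : "MISMATCH");
        return 0;
}

This is the same class of mismatch as the "data digest error: recv ...
expected ..." lines in the dmesg below.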
Please share your thoughts/comments/suggestions on this.
Commands used:
--------------
#nvme connect -t tcp -G -a 102.1.1.6 -s 4420 -n nvme-ram0   ==> for NVMe/TCP
#nvme connect -t rdma -a 102.1.1.6 -s 4420 -n nvme-ram0     ==> for SoftiWARP
#mkfs.ext3 -F /dev/nvme0n1   (the issue occurs more frequently with ext3 than with ext4)
#mount /dev/nvme0n1 /mnt
#Then run the below program:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main() {
        int i;
        char* line1 = "123";
        FILE* fp;

        while (1) {
                fp = fopen("/mnt/tmp.txt", "w");
                if (fp == NULL)
                        exit(1);
                /* unbuffered stdio: every fwrite() is issued as a separate write() */
                setvbuf(fp, NULL, _IONBF, 0);
                for (i = 0; i < 100000; i++)
                        if (fwrite(line1, 1, strlen(line1), fp) != strlen(line1))
                                exit(1);
                if (fclose(fp) != 0)
                        exit(1);
        }
        return 0;
}
DMESG at NVMe/TCP Target:
[ +5.119267] nvmet_tcp: queue 2: cmd 83 pdu (6) data digest error: recv
0xb1acaf93 expected 0xcd0b877d
[ +0.000017] nvmet: ctrl 1 fatal error occurred!
Thanks,
Krishna.
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-16 16:20 broken CRCs at NVMeF target with SIW & NVMe/TCP transports Krishnamraju Eraparaju @ 2020-03-17 9:31 ` Bernard Metzler 2020-03-17 12:26 ` Tom Talpey 2020-03-17 12:45 ` Christoph Hellwig 1 sibling, 1 reply; 15+ messages in thread From: Bernard Metzler @ 2020-03-17 9:31 UTC (permalink / raw) To: Krishnamraju Eraparaju Cc: sagi, hch, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja -----"Krishnamraju Eraparaju" <krishna2@chelsio.com> wrote: ----- >To: "Bernard Metzler" <BMT@zurich.ibm.com>, sagi@grimberg.me, >hch@lst.de >From: "Krishnamraju Eraparaju" <krishna2@chelsio.com> >Date: 03/16/2020 05:20PM >Cc: linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, >"Nirranjan Kirubaharan" <nirranjan@chelsio.com>, "Potnuri Bharat >Teja" <bharat@chelsio.com> >Subject: [EXTERNAL] broken CRCs at NVMeF target with SIW & NVMe/TCP >transports > >I'm seeing broken CRCs at NVMeF target while running the below >program >at host. Here RDMA transport is SoftiWARP, but I'm also seeing the >same issue with NVMe/TCP aswell. > >It appears to me that the same buffer is being rewritten by the >application/ULP before getting the completion for the previous >requests. >getting the completion for the previous requests. HW based >HW based trasports(like iw_cxgb4) are not showing this issue because >they copy/DMA and then compute the CRC on copied buffer. > Thanks Krishna! Yes, I see those errors as well. For TCP/NVMeF, I see it if the data digest is enabled, which is functional similar to have CRC enabled for iWarp. This appears to be your suggested '-G' command line switch during TCP connect. For SoftiWarp at host side and iWarp hardware at target side, CRC gets enabled. Then I see that problem at host side for SEND type work requests: A page of data referenced by the SEND gets sometimes modified by the ULP after CRC computation and before the data gets handed over (copied) to TCP via kernel_sendmsg(), and far before the ULP reaps a work completion for that SEND. So the ULP sometimes touches the buffer after passing ownership to the provider, and before getting it back by a matching work completion. With siw and CRC switched off, this issue goes undetected, since TCP copies the buffer at some point in time, and only computes its TCP/IP checksum on a stable copy, or typically even offloaded. Another question is if it is possible that we are finally placing stale data, or if closing the file recovers the error by re-sending affected data. With my experiments, until now I never detected broken file content after file close. Thanks, Bernard. >Please share your thoughts/comments/suggestions on this. 
> >Commands used: >-------------- >#nvme connect -t tcp -G -a 102.1.1.6 -s 4420 -n nvme-ram0 ==> for >NVMe/TCP >#nvme connect -t rdma -a 102.1.1.6 -s 4420 -n nvme-ram0 ==> for >SoftiWARP >#mkfs.ext3 -F /dev/nvme0n1 (issue occuring frequency is more with >ext3 >than ext4) >#mount /dev/nvme0n1 /mnt >#Then run the below program: >#include <stdlib.h> >#include <stdio.h> >#include <string.h> >#include <unistd.h> > >int main() { > int i; > char* line1 = "123"; > FILE* fp; > while(1) { > fp = fopen("/mnt/tmp.txt", "w"); > setvbuf(fp, NULL, _IONBF, 0); > for (i=0; i<100000; i++) > if ((fwrite(line1, 1, strlen(line1), fp) != >strlen(line1))) > exit(1); > > if (fclose(fp) != 0) > exit(1); > } >return 0; >} > >DMESG at NVMe/TCP Target: >[ +5.119267] nvmet_tcp: queue 2: cmd 83 pdu (6) data digest error: >recv >0xb1acaf93 expected 0xcd0b877d >[ +0.000017] nvmet: ctrl 1 fatal error occurred! > > >Thanks, >Krishna. > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-17 9:31 ` Bernard Metzler @ 2020-03-17 12:26 ` Tom Talpey 0 siblings, 0 replies; 15+ messages in thread From: Tom Talpey @ 2020-03-17 12:26 UTC (permalink / raw) To: Bernard Metzler, Krishnamraju Eraparaju Cc: sagi, hch, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja On 3/17/2020 5:31 AM, Bernard Metzler wrote: > -----"Krishnamraju Eraparaju" <krishna2@chelsio.com> wrote: ----- > >> To: "Bernard Metzler" <BMT@zurich.ibm.com>, sagi@grimberg.me, >> hch@lst.de >> From: "Krishnamraju Eraparaju" <krishna2@chelsio.com> >> Date: 03/16/2020 05:20PM >> Cc: linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, >> "Nirranjan Kirubaharan" <nirranjan@chelsio.com>, "Potnuri Bharat >> Teja" <bharat@chelsio.com> >> Subject: [EXTERNAL] broken CRCs at NVMeF target with SIW & NVMe/TCP >> transports >> >> I'm seeing broken CRCs at NVMeF target while running the below >> program >> at host. Here RDMA transport is SoftiWARP, but I'm also seeing the >> same issue with NVMe/TCP aswell. >> >> It appears to me that the same buffer is being rewritten by the >> application/ULP before getting the completion for the previous >> requests. >> getting the completion for the previous requests. HW based >> HW based trasports(like iw_cxgb4) are not showing this issue because >> they copy/DMA and then compute the CRC on copied buffer. >> > > Thanks Krishna! > > Yes, I see those errors as well. For TCP/NVMeF, I see it if > the data digest is enabled, which is functional similar to > have CRC enabled for iWarp. This appears to be your suggested > '-G' command line switch during TCP connect. > > For SoftiWarp at host side and iWarp hardware at target side, > CRC gets enabled. Then I see that problem at host side for > SEND type work requests: A page of data referenced by the > SEND gets sometimes modified by the ULP after CRC computation > and before the data gets handed over (copied) to TCP via > kernel_sendmsg(), and far before the ULP reaps a work > completion for that SEND. So the ULP sometimes touches the > buffer after passing ownership to the provider, and before > getting it back by a matching work completion. Well, that's a plain ULP bug. It's the very definition of a send queue work request completion that the buffer has been accepted by the LLP. Would the ULP read a receive buffer before getting a completion? Same thing. Would the ULP complain if its application consumer wrote data into async i/o O_DIRECT buffers, or while it computed a krb5i hash? Yep. > With siw and CRC switched off, this issue goes undetected, > since TCP copies the buffer at some point in time, and > only computes its TCP/IP checksum on a stable copy, or > typically even offloaded. An excellent test, and I'd love to know what ULPs/apps you caught with it. Tom. > Another question is if it is possible that we are finally > placing stale data, or if closing the file recovers the > error by re-sending affected data. With my experiments, > until now I never detected broken file content after > file close. > > > Thanks, > Bernard. > > > >> Please share your thoughts/comments/suggestions on this. 
>> >> Commands used: >> -------------- >> #nvme connect -t tcp -G -a 102.1.1.6 -s 4420 -n nvme-ram0 ==> for >> NVMe/TCP >> #nvme connect -t rdma -a 102.1.1.6 -s 4420 -n nvme-ram0 ==> for >> SoftiWARP >> #mkfs.ext3 -F /dev/nvme0n1 (issue occuring frequency is more with >> ext3 >> than ext4) >> #mount /dev/nvme0n1 /mnt >> #Then run the below program: >> #include <stdlib.h> >> #include <stdio.h> >> #include <string.h> >> #include <unistd.h> >> >> int main() { >> int i; >> char* line1 = "123"; >> FILE* fp; >> while(1) { >> fp = fopen("/mnt/tmp.txt", "w"); >> setvbuf(fp, NULL, _IONBF, 0); >> for (i=0; i<100000; i++) >> if ((fwrite(line1, 1, strlen(line1), fp) != >> strlen(line1))) >> exit(1); >> >> if (fclose(fp) != 0) >> exit(1); >> } >> return 0; >> } >> >> DMESG at NVMe/TCP Target: >> [ +5.119267] nvmet_tcp: queue 2: cmd 83 pdu (6) data digest error: >> recv >> 0xb1acaf93 expected 0xcd0b877d >> [ +0.000017] nvmet: ctrl 1 fatal error occurred! >> >> >> Thanks, >> Krishna. >> >> > > > ^ permalink raw reply [flat|nested] 15+ messages in thread
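The contract Tom describes can be spelled out at the verbs level. A
minimal, hypothetical sketch (userspace flavor for brevity; qp/cq/mr
setup and most error handling omitted, names made up) -- the buffer is
off limits from ibv_post_send() until the matching completion is reaped:

#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

static int send_and_wait(struct ibv_qp *qp, struct ibv_cq *cq,
                         struct ibv_mr *mr, void *buf, uint32_t len)
{
        struct ibv_sge sge = {
                .addr   = (uintptr_t)buf,
                .length = len,
                .lkey   = mr->lkey,
        };
        struct ibv_send_wr wr = {
                .wr_id      = 1,
                .sg_list    = &sge,
                .num_sge    = 1,
                .opcode     = IBV_WR_SEND,
                .send_flags = IBV_SEND_SIGNALED,
        }, *bad_wr;
        struct ibv_wc wc;
        int n;

        if (ibv_post_send(qp, &wr, &bad_wr))
                return -1;

        /* buffer is owned by the provider here -- the ULP must not touch it */

        do {
                n = ibv_poll_cq(cq, 1, &wc);
        } while (n == 0);

        if (n < 0 || wc.status != IBV_WC_SUCCESS)
                return -1;

        /* only now may the caller rewrite/reuse buf */
        memset(buf, 0, len);
        return 0;
}

The same obligation applies to kernel ULPs using ib_post_send() and
ib_poll_cq(); a hardware HCA that DMAs the page early merely makes a
violation invisible, as noted earlier in the thread.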
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports
2020-03-16 16:20 broken CRCs at NVMeF target with SIW & NVMe/TCP transports Krishnamraju Eraparaju
2020-03-17 9:31 ` Bernard Metzler
@ 2020-03-17 12:45 ` Christoph Hellwig
2020-03-17 13:17 ` Bernard Metzler
2020-03-17 16:03 ` Sagi Grimberg
1 sibling, 2 replies; 15+ messages in thread
From: Christoph Hellwig @ 2020-03-17 12:45 UTC (permalink / raw)
To: Krishnamraju Eraparaju
Cc: Bernard Metzler, sagi, hch, linux-nvme, linux-rdma,
Nirranjan Kirubaharan, Potnuri Bharat Teja

On Mon, Mar 16, 2020 at 09:50:10PM +0530, Krishnamraju Eraparaju wrote:
>
> I'm seeing broken CRCs at NVMeF target while running the below program
> at host. Here RDMA transport is SoftiWARP, but I'm also seeing the
> same issue with NVMe/TCP aswell.
>
> It appears to me that the same buffer is being rewritten by the
> application/ULP before getting the completion for the previous requests.
> getting the completion for the previous requests. HW based
> HW based trasports(like iw_cxgb4) are not showing this issue because
> they copy/DMA and then compute the CRC on copied buffer.

For TCP we can set BDI_CAP_STABLE_WRITES.  For RDMA I don't think that
is a good idea as pretty much all RDMA block drivers rely on the
DMA behavior above.  The answer is to bounce buffer the data in
SoftiWARP / SoftRoCE.

^ permalink raw reply	[flat|nested] 15+ messages in thread
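For reference, the bounce-buffer approach Christoph suggests boils down
to something like the sketch below (hypothetical code, not actual
siw/rxe; sw_tx_bounce() and the omitted digest trailer handling are made
up): snapshot the payload into a transport-private buffer, then compute
the CRC and transmit from that copy, so later ULP writes to the original
pages cannot make the digest and the wire data diverge.

#include <linux/crc32c.h>
#include <linux/errno.h>
#include <linux/net.h>
#include <linux/slab.h>
#include <linux/socket.h>
#include <linux/string.h>
#include <linux/uio.h>

static int sw_tx_bounce(struct socket *sock, const void *ulp_buf,
                        size_t len, u32 *ddgst)
{
        struct msghdr msg = { };
        struct kvec iov;
        void *copy;
        int rv;

        copy = kmalloc(len, GFP_KERNEL);
        if (!copy)
                return -ENOMEM;

        memcpy(copy, ulp_buf, len);     /* snapshot taken at one point in time */
        *ddgst = crc32c(~0, copy, len); /* CRC over the stable copy */

        iov.iov_base = copy;
        iov.iov_len  = len;
        /* partial sends and the digest trailer are not handled in this sketch */
        rv = kernel_sendmsg(sock, &msg, &iov, 1, len);

        kfree(copy);
        return rv < 0 ? rv : 0;
}

The price is exactly Bernard's objection in the next message: one more
copy of every payload byte.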
* Re: Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-17 12:45 ` Christoph Hellwig @ 2020-03-17 13:17 ` Bernard Metzler 2020-03-17 16:03 ` Sagi Grimberg 1 sibling, 0 replies; 15+ messages in thread From: Bernard Metzler @ 2020-03-17 13:17 UTC (permalink / raw) To: Christoph Hellwig Cc: Krishnamraju Eraparaju, sagi, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja -----"Christoph Hellwig" <hch@lst.de> wrote: ----- >To: "Krishnamraju Eraparaju" <krishna2@chelsio.com> >From: "Christoph Hellwig" <hch@lst.de> >Date: 03/17/2020 01:45PM >Cc: "Bernard Metzler" <BMT@zurich.ibm.com>, sagi@grimberg.me, >hch@lst.de, linux-nvme@lists.infradead.org, >linux-rdma@vger.kernel.org, "Nirranjan Kirubaharan" ><nirranjan@chelsio.com>, "Potnuri Bharat Teja" <bharat@chelsio.com> >Subject: [EXTERNAL] Re: broken CRCs at NVMeF target with SIW & >NVMe/TCP transports > >On Mon, Mar 16, 2020 at 09:50:10PM +0530, Krishnamraju Eraparaju >wrote: >> >> I'm seeing broken CRCs at NVMeF target while running the below >program >> at host. Here RDMA transport is SoftiWARP, but I'm also seeing the >> same issue with NVMe/TCP aswell. >> >> It appears to me that the same buffer is being rewritten by the >> application/ULP before getting the completion for the previous >requests. >> getting the completion for the previous requests. HW based >> HW based trasports(like iw_cxgb4) are not showing this issue >because >> they copy/DMA and then compute the CRC on copied buffer. > >For TCP we can set BDI_CAP_STABLE_WRITES. For RDMA I don't think >that Hmm, can you elaborate a little more here? I see that flag being set for data digest enabled (e.g. nvme/host/core.c:nvme_alloc_ns()). But enabling that data digest CRC is exactly when the NVMeF/TCP target detects the issue and drops the frame and disconnects...? The current situation for NVMeF/TCP is that the data digest is not enabled per default and buffer changes are not detected then. Krishna first detected it with using siw against hardware iWarp target, since the CRC gets negotiated then. >is a good idea as pretty much all RDMA block drivers rely on the >DMA behavior above. The answer is to bounce buffer the data in >SoftiWARP / SoftRoCE. > > Another extra copy of user data isn't really charming. Can we somehow let the ULP have its fingers crossed until the buffer got transferred, as signaled back? Best, Bernard. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-17 12:45 ` Christoph Hellwig 2020-03-17 13:17 ` Bernard Metzler @ 2020-03-17 16:03 ` Sagi Grimberg 2020-03-17 16:29 ` Bernard Metzler 1 sibling, 1 reply; 15+ messages in thread From: Sagi Grimberg @ 2020-03-17 16:03 UTC (permalink / raw) To: Christoph Hellwig, Krishnamraju Eraparaju Cc: Bernard Metzler, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja > On Mon, Mar 16, 2020 at 09:50:10PM +0530, Krishnamraju Eraparaju wrote: >> >> I'm seeing broken CRCs at NVMeF target while running the below program >> at host. Here RDMA transport is SoftiWARP, but I'm also seeing the >> same issue with NVMe/TCP aswell. >> >> It appears to me that the same buffer is being rewritten by the >> application/ULP before getting the completion for the previous requests. >> getting the completion for the previous requests. HW based >> HW based trasports(like iw_cxgb4) are not showing this issue because >> they copy/DMA and then compute the CRC on copied buffer. > > For TCP we can set BDI_CAP_STABLE_WRITES. For RDMA I don't think that > is a good idea as pretty much all RDMA block drivers rely on the > DMA behavior above. The answer is to bounce buffer the data in > SoftiWARP / SoftRoCE. We already do, see nvme_alloc_ns. ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-17 16:03 ` Sagi Grimberg @ 2020-03-17 16:29 ` Bernard Metzler 2020-03-17 16:39 ` Sagi Grimberg 0 siblings, 1 reply; 15+ messages in thread From: Bernard Metzler @ 2020-03-17 16:29 UTC (permalink / raw) To: Sagi Grimberg Cc: Christoph Hellwig, Krishnamraju Eraparaju, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja -----"Sagi Grimberg" <sagi@grimberg.me> wrote: ----- >To: "Christoph Hellwig" <hch@lst.de>, "Krishnamraju Eraparaju" ><krishna2@chelsio.com> >From: "Sagi Grimberg" <sagi@grimberg.me> >Date: 03/17/2020 05:04PM >Cc: "Bernard Metzler" <BMT@zurich.ibm.com>, >linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, >"Nirranjan Kirubaharan" <nirranjan@chelsio.com>, "Potnuri Bharat >Teja" <bharat@chelsio.com> >Subject: [EXTERNAL] Re: broken CRCs at NVMeF target with SIW & >NVMe/TCP transports > >> On Mon, Mar 16, 2020 at 09:50:10PM +0530, Krishnamraju Eraparaju >wrote: >>> >>> I'm seeing broken CRCs at NVMeF target while running the below >program >>> at host. Here RDMA transport is SoftiWARP, but I'm also seeing the >>> same issue with NVMe/TCP aswell. >>> >>> It appears to me that the same buffer is being rewritten by the >>> application/ULP before getting the completion for the previous >requests. >>> getting the completion for the previous requests. HW based >>> HW based trasports(like iw_cxgb4) are not showing this issue >because >>> they copy/DMA and then compute the CRC on copied buffer. >> >> For TCP we can set BDI_CAP_STABLE_WRITES. For RDMA I don't think >that >> is a good idea as pretty much all RDMA block drivers rely on the >> DMA behavior above. The answer is to bounce buffer the data in >> SoftiWARP / SoftRoCE. > >We already do, see nvme_alloc_ns. > > Krishna was getting the issue when testing TCP/NVMeF with -G during connect. That enables data digest and STABLE_WRITES I think. So to me it seems we don't get stable pages, but pages which are touched after handover to the provider. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-17 16:29 ` Bernard Metzler @ 2020-03-17 16:39 ` Sagi Grimberg 2020-03-17 19:17 ` Krishnamraju Eraparaju 0 siblings, 1 reply; 15+ messages in thread From: Sagi Grimberg @ 2020-03-17 16:39 UTC (permalink / raw) To: Bernard Metzler Cc: Christoph Hellwig, Krishnamraju Eraparaju, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja >>> For TCP we can set BDI_CAP_STABLE_WRITES. For RDMA I don't think >> that >>> is a good idea as pretty much all RDMA block drivers rely on the >>> DMA behavior above. The answer is to bounce buffer the data in >>> SoftiWARP / SoftRoCE. >> >> We already do, see nvme_alloc_ns. >> >> > > Krishna was getting the issue when testing TCP/NVMeF with -G > during connect. That enables data digest and STABLE_WRITES > I think. So to me it seems we don't get stable pages, but > pages which are touched after handover to the provider. Non of the transports modifies the data at any point, both will scan it to compute crc. So surely this is coming from the fs, Krishna does this happen with xfs as well? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-17 16:39 ` Sagi Grimberg @ 2020-03-17 19:17 ` Krishnamraju Eraparaju 2020-03-17 19:33 ` Sagi Grimberg 0 siblings, 1 reply; 15+ messages in thread From: Krishnamraju Eraparaju @ 2020-03-17 19:17 UTC (permalink / raw) To: Sagi Grimberg Cc: Bernard Metzler, Christoph Hellwig, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja On Tuesday, March 03/17/20, 2020 at 09:39:39 -0700, Sagi Grimberg wrote: > > >>>For TCP we can set BDI_CAP_STABLE_WRITES. For RDMA I don't think > >>that > >>>is a good idea as pretty much all RDMA block drivers rely on the > >>>DMA behavior above. The answer is to bounce buffer the data in > >>>SoftiWARP / SoftRoCE. > >> > >>We already do, see nvme_alloc_ns. > >> > >> > > > >Krishna was getting the issue when testing TCP/NVMeF with -G > >during connect. That enables data digest and STABLE_WRITES > >I think. So to me it seems we don't get stable pages, but > >pages which are touched after handover to the provider. > > Non of the transports modifies the data at any point, both will > scan it to compute crc. So surely this is coming from the fs, > Krishna does this happen with xfs as well? Yes, but rare(took ~15min to recreate), whereas with ext3/4 its almost immediate. Here is the error log for NVMe/TCP with xfs. dmesg at Host: [ +0.000323] nvme nvme2: creating 12 I/O queues. [ +0.008991] nvme nvme2: Successfully reconnected (1 attempt) [ +25.277733] blk_update_request: I/O error, dev nvme2n1, sector 0 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0 [ +6.043879] XFS (nvme2n1): Mounting V5 Filesystem [ +0.017745] XFS (nvme2n1): Ending clean mount [ +0.000174] xfs filesystem being mounted at /mnt supports timestamps until 2038 (0x7fffffff) [Mar18 00:14] nvme nvme2: Reconnecting in 10 seconds... [ +0.000453] nvme nvme2: creating 12 I/O queues. [ +0.009216] nvme nvme2: Successfully reconnected (1 attempt) [Mar18 00:43] nvme nvme2: Reconnecting in 10 seconds... [ +0.000383] nvme nvme2: creating 12 I/O queues. [ +0.009239] nvme nvme2: Successfully reconnected (1 attempt) dmesg at Target: [Mar18 00:14] nvmet_tcp: queue 9: cmd 17 pdu (4) data digest error: recv 0x8e85d882 expected 0x9a46fac3 [ +0.000011] nvmet: ctrl 1 fatal error occurred! [ +10.240266] nvmet: creating controller 1 for subsystem nvme-ram0 for NQN nqn.2014-08.org.nvmexpress.chelsio. [Mar18 00:42] nvmet_tcp: queue 7: cmd 89 pdu (4) data digest error: recv 0xc0ce3dfd expected 0x7ee136b5 [ +0.000012] nvmet: ctrl 1 fatal error occurred! [Mar18 00:43] nvmet: creating controller 1 for subsystem nvme-ram0 for NQN nqn.2014-08.org.nvmexpress.chelsio. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports
2020-03-17 19:17 ` Krishnamraju Eraparaju
@ 2020-03-17 19:33 ` Sagi Grimberg
2020-03-17 20:31 ` Krishnamraju Eraparaju
0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2020-03-17 19:33 UTC (permalink / raw)
To: Krishnamraju Eraparaju
Cc: Bernard Metzler, Christoph Hellwig, linux-nvme, linux-rdma,
Nirranjan Kirubaharan, Potnuri Bharat Teja

>>>>> For TCP we can set BDI_CAP_STABLE_WRITES. For RDMA I don't think
>>>> that
>>>>> is a good idea as pretty much all RDMA block drivers rely on the
>>>>> DMA behavior above. The answer is to bounce buffer the data in
>>>>> SoftiWARP / SoftRoCE.
>>>>
>>>> We already do, see nvme_alloc_ns.
>>>>
>>>>
>>>
>>> Krishna was getting the issue when testing TCP/NVMeF with -G
>>> during connect. That enables data digest and STABLE_WRITES
>>> I think. So to me it seems we don't get stable pages, but
>>> pages which are touched after handover to the provider.
>>
>> Non of the transports modifies the data at any point, both will
>> scan it to compute crc. So surely this is coming from the fs,
>> Krishna does this happen with xfs as well?
> Yes, but rare(took ~15min to recreate), whereas with ext3/4
> its almost immediate. Here is the error log for NVMe/TCP with xfs.

Thanks Krishna,

I assume that this makes the issue go away?
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 11e10fe1760f..cc93e1949b2c 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -889,7 +889,7 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req)
 			flags |= MSG_MORE;
 
 		/* can't zcopy slab pages */
-		if (unlikely(PageSlab(page))) {
+		if (unlikely(PageSlab(page)) || queue->data_digest) {
 			ret = sock_no_sendpage(queue->sock, page, offset, len,
 					flags);
 		} else {
--

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-17 19:33 ` Sagi Grimberg @ 2020-03-17 20:31 ` Krishnamraju Eraparaju 2020-03-18 16:49 ` Sagi Grimberg 0 siblings, 1 reply; 15+ messages in thread From: Krishnamraju Eraparaju @ 2020-03-17 20:31 UTC (permalink / raw) To: Sagi Grimberg Cc: Bernard Metzler, Christoph Hellwig, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja On Tuesday, March 03/17/20, 2020 at 12:33:44 -0700, Sagi Grimberg wrote: > > >>>>>For TCP we can set BDI_CAP_STABLE_WRITES. For RDMA I don't think > >>>>that > >>>>>is a good idea as pretty much all RDMA block drivers rely on the > >>>>>DMA behavior above. The answer is to bounce buffer the data in > >>>>>SoftiWARP / SoftRoCE. > >>>> > >>>>We already do, see nvme_alloc_ns. > >>>> > >>>> > >>> > >>>Krishna was getting the issue when testing TCP/NVMeF with -G > >>>during connect. That enables data digest and STABLE_WRITES > >>>I think. So to me it seems we don't get stable pages, but > >>>pages which are touched after handover to the provider. > >> > >>Non of the transports modifies the data at any point, both will > >>scan it to compute crc. So surely this is coming from the fs, > >>Krishna does this happen with xfs as well? > >Yes, but rare(took ~15min to recreate), whereas with ext3/4 > >its almost immediate. Here is the error log for NVMe/TCP with xfs. > > Thanks Krishna, > > I assume that this makes the issue go away? > -- > diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c > index 11e10fe1760f..cc93e1949b2c 100644 > --- a/drivers/nvme/host/tcp.c > +++ b/drivers/nvme/host/tcp.c > @@ -889,7 +889,7 @@ static int nvme_tcp_try_send_data(struct > nvme_tcp_request *req) > flags |= MSG_MORE; > > /* can't zcopy slab pages */ > - if (unlikely(PageSlab(page))) { > + if (unlikely(PageSlab(page)) || queue->data_digest) { > ret = sock_no_sendpage(queue->sock, page, > offset, len, > flags); > } else { > -- Unfortunately, issue is still occuring with this patch also. Looks like the integrity of the data buffer right after the CRC computation(data digest) is what causing this issue, despite the buffer being sent via sendpage or no_sendpage. Thanks, Krishna. ^ permalink raw reply [flat|nested] 15+ messages in thread
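One way to pin that observation down (a debugging sketch only -- the
nvme_tcp_dbg_* names are made up and none of this is in the driver) is
to recompute the digest over the same page range right before the data
is handed to the socket and warn if it no longer matches what was
recorded when the PDU digest was computed:

#include <linux/crc32c.h>
#include <linux/highmem.h>
#include <linux/kernel.h>
#include <linux/mm.h>

struct nvme_tcp_dbg_snap {
        u32 ddgst;                      /* digest recorded at PDU build time */
};

static void nvme_tcp_dbg_record(struct nvme_tcp_dbg_snap *snap,
                                struct page *page, size_t off, size_t len)
{
        void *addr = kmap_atomic(page);

        snap->ddgst = crc32c(~0, addr + off, len);
        kunmap_atomic(addr);
}

static void nvme_tcp_dbg_check(struct nvme_tcp_dbg_snap *snap,
                               struct page *page, size_t off, size_t len)
{
        void *addr = kmap_atomic(page);
        u32 now = crc32c(~0, addr + off, len);

        kunmap_atomic(addr);
        WARN_ONCE(now != snap->ddgst,
                  "payload changed after digest: 0x%08x -> 0x%08x\n",
                  snap->ddgst, now);
}

If the WARN fires, the page was modified between digest computation and
transmission, which would match what Bernard reported seeing on the siw
side.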
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-17 20:31 ` Krishnamraju Eraparaju @ 2020-03-18 16:49 ` Sagi Grimberg 2020-03-20 14:35 ` Krishnamraju Eraparaju 0 siblings, 1 reply; 15+ messages in thread From: Sagi Grimberg @ 2020-03-18 16:49 UTC (permalink / raw) To: Krishnamraju Eraparaju Cc: Bernard Metzler, Christoph Hellwig, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja >> Thanks Krishna, >> >> I assume that this makes the issue go away? >> -- >> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c >> index 11e10fe1760f..cc93e1949b2c 100644 >> --- a/drivers/nvme/host/tcp.c >> +++ b/drivers/nvme/host/tcp.c >> @@ -889,7 +889,7 @@ static int nvme_tcp_try_send_data(struct >> nvme_tcp_request *req) >> flags |= MSG_MORE; >> >> /* can't zcopy slab pages */ >> - if (unlikely(PageSlab(page))) { >> + if (unlikely(PageSlab(page)) || queue->data_digest) { >> ret = sock_no_sendpage(queue->sock, page, >> offset, len, >> flags); >> } else { >> -- > > Unfortunately, issue is still occuring with this patch also. > > Looks like the integrity of the data buffer right after the CRC > computation(data digest) is what causing this issue, despite the > buffer being sent via sendpage or no_sendpage. I assume this happens with iSCSI as well? There is nothing special we are doing with respect to digest. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-18 16:49 ` Sagi Grimberg @ 2020-03-20 14:35 ` Krishnamraju Eraparaju 2020-03-20 20:49 ` Sagi Grimberg 0 siblings, 1 reply; 15+ messages in thread From: Krishnamraju Eraparaju @ 2020-03-20 14:35 UTC (permalink / raw) To: Sagi Grimberg Cc: Bernard Metzler, Christoph Hellwig, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja On Wednesday, March 03/18/20, 2020 at 09:49:07 -0700, Sagi Grimberg wrote: > > >>Thanks Krishna, > >> > >>I assume that this makes the issue go away? > >>-- > >>diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c > >>index 11e10fe1760f..cc93e1949b2c 100644 > >>--- a/drivers/nvme/host/tcp.c > >>+++ b/drivers/nvme/host/tcp.c > >>@@ -889,7 +889,7 @@ static int nvme_tcp_try_send_data(struct > >>nvme_tcp_request *req) > >> flags |= MSG_MORE; > >> > >> /* can't zcopy slab pages */ > >>- if (unlikely(PageSlab(page))) { > >>+ if (unlikely(PageSlab(page)) || queue->data_digest) { > >> ret = sock_no_sendpage(queue->sock, page, > >>offset, len, > >> flags); > >> } else { > >>-- > > > >Unfortunately, issue is still occuring with this patch also. > > > >Looks like the integrity of the data buffer right after the CRC > >computation(data digest) is what causing this issue, despite the > >buffer being sent via sendpage or no_sendpage. > > I assume this happens with iSCSI as well? There is nothing special > we are doing with respect to digest. I don't see this issue with iscsi-tcp. May be blk-mq is causing this issue? I assume iscsi-tcp does not have blk_mq support yet upstream to verify with blk_mq enabled. I tried on Ubuntu 19.10(which is based on Linux kernel 5.3), note that RHEL does not support DataDigest. The reason that I'm seeing this issue only with NVMe(tcp/softiwarp) & iSER(softiwarp) is becuase of NVMeF&ISER using blk-mq? Anyhow, I see the content of the page is being updated by upper layers while the tranport driver is computing CRC on that page content and this needs a fix. one could very easily recreate this issue running the below simple program over NVMe/TCP. #include <stdlib.h> #include <stdio.h> #include <string.h> #include <unistd.h> int main() { int i; char* line1 = "123"; FILE* fp; while(1) { fp = fopen("/mnt/tmp.txt", "w"); setvbuf(fp, NULL, _IONBF, 0); for (i=0; i<100000; i++) if ((fwrite(line1, 1, strlen(line1), fp) != strlen(line1))) exit(1); if (fclose(fp) != 0) exit(1); } return 0; } ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-20 14:35 ` Krishnamraju Eraparaju @ 2020-03-20 20:49 ` Sagi Grimberg 2020-03-21 4:02 ` Krishnamraju Eraparaju 0 siblings, 1 reply; 15+ messages in thread From: Sagi Grimberg @ 2020-03-20 20:49 UTC (permalink / raw) To: Krishnamraju Eraparaju Cc: Bernard Metzler, Christoph Hellwig, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja >> I assume this happens with iSCSI as well? There is nothing special >> we are doing with respect to digest. > > I don't see this issue with iscsi-tcp. > > May be blk-mq is causing this issue? I assume iscsi-tcp does not have > blk_mq support yet upstream to verify with blk_mq enabled. > I tried on Ubuntu 19.10(which is based on Linux kernel 5.3), note that > RHEL does not support DataDigest. > > The reason that I'm seeing this issue only with NVMe(tcp/softiwarp) & > iSER(softiwarp) is becuase of NVMeF&ISER using blk-mq? > > Anyhow, I see the content of the page is being updated by upper layers > while the tranport driver is computing CRC on that page content and > this needs a fix. Krishna, do you happen to run with nvme multipath enabled? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: broken CRCs at NVMeF target with SIW & NVMe/TCP transports 2020-03-20 20:49 ` Sagi Grimberg @ 2020-03-21 4:02 ` Krishnamraju Eraparaju 0 siblings, 0 replies; 15+ messages in thread From: Krishnamraju Eraparaju @ 2020-03-21 4:02 UTC (permalink / raw) To: Sagi Grimberg Cc: Bernard Metzler, Christoph Hellwig, linux-nvme, linux-rdma, Nirranjan Kirubaharan, Potnuri Bharat Teja On Friday, March 03/20/20, 2020 at 13:49:25 -0700, Sagi Grimberg wrote: > > >>I assume this happens with iSCSI as well? There is nothing special > >>we are doing with respect to digest. > > > >I don't see this issue with iscsi-tcp. > > > >May be blk-mq is causing this issue? I assume iscsi-tcp does not have > >blk_mq support yet upstream to verify with blk_mq enabled. > >I tried on Ubuntu 19.10(which is based on Linux kernel 5.3), note that > >RHEL does not support DataDigest. > > > >The reason that I'm seeing this issue only with NVMe(tcp/softiwarp) & > >iSER(softiwarp) is becuase of NVMeF&ISER using blk-mq? > > > >Anyhow, I see the content of the page is being updated by upper layers > >while the tranport driver is computing CRC on that page content and > >this needs a fix. > > Krishna, do you happen to run with nvme multipath enabled? Yes Sagi, issue occurs with nvme multipath enabled also.. dmesg at initiator: [ +10.671996] EXT4-fs (nvme0n1): mounting ext3 file system using the ext4 subsystem [ +0.004643] EXT4-fs (nvme0n1): mounted filesystem with ordered data mode. Opts: (null) [ +15.955424] block nvme0n1: no usable path - requeuing I/O [ +0.000142] block nvme0n1: no usable path - requeuing I/O [ +0.000135] block nvme0n1: no usable path - requeuing I/O [ +0.000119] block nvme0n1: no usable path - requeuing I/O [ +0.000108] block nvme0n1: no usable path - requeuing I/O [ +0.000111] block nvme0n1: no usable path - requeuing I/O [ +0.000118] block nvme0n1: no usable path - requeuing I/O [ +0.000158] block nvme0n1: no usable path - requeuing I/O [ +0.000130] block nvme0n1: no usable path - requeuing I/O [ +0.000138] block nvme0n1: no usable path - requeuing I/O [ +0.011754] nvme nvme0: Reconnecting in 10 seconds... [ +10.261223] nvme_ns_head_make_request: 5 callbacks suppressed [ +0.000002] block nvme0n1: no usable path - requeuing I/O [ +0.000240] block nvme0n1: no usable path - requeuing I/O [ +0.000107] block nvme0n1: no usable path - requeuing I/O [ +0.000107] block nvme0n1: no usable path - requeuing I/O [ +0.000107] block nvme0n1: no usable path - requeuing I/O [ +0.000108] block nvme0n1: no usable path - requeuing I/O [ +0.000132] block nvme0n1: no usable path - requeuing I/O [ +0.000010] nvme nvme0: creating 12 I/O queues. [ +0.000110] block nvme0n1: no usable path - requeuing I/O [ +0.000232] block nvme0n1: no usable path - requeuing I/O [ +0.000122] block nvme0n1: no usable path - requeuing I/O [ +0.008407] nvme nvme0: Successfully reconnected (1 attempt) dmesg at target: [Mar21 09:24] nvmet_tcp: queue 3: cmd 38 pdu (6) data digest error: recv 0x21e59730 expected 0x2b88fed0 [ +0.000029] nvmet: ctrl 1 fatal error occurred! [ +10.280101] nvmet: creating controller 1 for subsystem nvme-ram0 for NQN nqn.2014-08.org.nvmexpress.chelsio. ^ permalink raw reply [flat|nested] 15+ messages in thread