From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from smtp.opengridcomputing.com ([72.48.136.20]:53019 "EHLO
 smtp.opengridcomputing.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751002AbbJSO7B (ORCPT ); Mon, 19 Oct 2015 10:59:01 -0400
From: "Steve Wise"
To: , ,
Cc:
References: <1445128019150223@kroah.com>
In-Reply-To: <1445128019150223@kroah.com>
Subject: RE: FAILED: patch "[PATCH] svcrdma: handle rdma read with a non-zero initial page offset" failed to apply to 4.1-stable tree
Date: Mon, 19 Oct 2015 09:59:02 -0500
Message-ID: <001801d10a7e$b16999e0$143ccda0$@opengridcomputing.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Language: en-us
Sender: stable-owner@vger.kernel.org
List-ID:

> -----Original Message-----
> From: gregkh@linuxfoundation.org [mailto:gregkh@linuxfoundation.org]
> Sent: Saturday, October 17, 2015 7:27 PM
> To: swise@opengridcomputing.com; bfields@redhat.com; chuck.lever@oracle.com
> Cc: stable@vger.kernel.org
> Subject: FAILED: patch "[PATCH] svcrdma: handle rdma read with a non-zero initial page offset" failed to apply to 4.1-stable tree
>
>
> The patch below does not apply to the 4.1-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to .
>
> thanks,
>
> greg k-h
>

Hey Greg,

I submitted the backport patch to linux-stable.

Thanks,

Steve.

> ------------------ original commit in Linus's tree ------------------
>
> >From c91aed9896946721bb30705ea2904edb3725dd61 Mon Sep 17 00:00:00 2001
> From: Steve Wise
> Date: Mon, 28 Sep 2015 16:46:06 -0500
> Subject: [PATCH] svcrdma: handle rdma read with a non-zero initial page offset
>
> The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions
> were not taking into account the initial page_offset when determining
> the rdma read length.
> This resulted in a read whose starting address
> and length exceeded the base/bounds of the frmr.
>
> The server gets an async error from the rdma device and kills the
> connection, and the client then reconnects and resends. This repeats
> indefinitely, and the application hangs.
>
> Most workloads don't tickle this bug apparently, but one test hit it
> every time: building the linux kernel on a 16 core node with 'make -j
> 16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA.
>
> This bug seems to only be tripped with devices having small fastreg page
> list depths. I didn't see it with mlx4, for instance.
>
> Fixes: 0bf4828983df ('svcrdma: refactor marshalling logic')
> Signed-off-by: Steve Wise
> Tested-by: Chuck Lever
> Cc: stable@vger.kernel.org
> Signed-off-by: J. Bruce Fields
>
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index cb5174284074..5f6ca47092b0 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -136,7 +136,8 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
>          ctxt->direction = DMA_FROM_DEVICE;
>          ctxt->read_hdr = head;
>          pages_needed = min_t(int, pages_needed, xprt->sc_max_sge_rd);
> -        read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
> +        read = min_t(int, (pages_needed << PAGE_SHIFT) - *page_offset,
> +                     rs_length);
>
>          for (pno = 0; pno < pages_needed; pno++) {
>                  int len = min_t(int, rs_length, PAGE_SIZE - pg_off);
> @@ -235,7 +236,8 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
>          ctxt->direction = DMA_FROM_DEVICE;
>          ctxt->frmr = frmr;
>          pages_needed = min_t(int, pages_needed, xprt->sc_frmr_pg_list_len);
> -        read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
> +        read = min_t(int, (pages_needed << PAGE_SHIFT) - *page_offset,
> +                     rs_length);
>
>          frmr->kva = page_address(rqstp->rq_arg.pages[pg_no]);
>          frmr->direction = DMA_FROM_DEVICE;
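
For anyone reading this thread later, here is a minimal userspace sketch of the length arithmetic the patch changes. It is not kernel code: it assumes a 4 KiB page, and SKETCH_PAGE_SHIFT, read_len_old(), read_len_fixed(), fits_in_mapping() and sketch_selfcheck() are hypothetical names made up for illustration. It only mirrors the two min_t() expressions from the diff and checks whether a read that starts page_offset bytes into the first page stays inside the mapped pages.

```c
#include <assert.h>

/* Assumption: 4 KiB pages; the real PAGE_SHIFT depends on the architecture. */
#define SKETCH_PAGE_SHIFT 12

static inline int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Pre-patch length: ignores the offset into the first page. */
static inline int read_len_old(int pages_needed, int page_offset, int rs_length)
{
	(void)page_offset;
	return min_int(pages_needed << SKETCH_PAGE_SHIFT, rs_length);
}

/* Post-patch length: the first page only contributes the bytes after page_offset. */
static inline int read_len_fixed(int pages_needed, int page_offset, int rs_length)
{
	return min_int((pages_needed << SKETCH_PAGE_SHIFT) - page_offset, rs_length);
}

/* Nonzero if a read of `read` bytes starting at page_offset stays within the mapped pages. */
static inline int fits_in_mapping(int pages_needed, int page_offset, int read)
{
	return page_offset + read <= (pages_needed << SKETCH_PAGE_SHIFT);
}

/* Worked case: two pages mapped, read starts 512 bytes into the first page. */
static inline void sketch_selfcheck(void)
{
	/* Old: 8192 bytes requested from offset 512, which runs 512 bytes past the mapping. */
	assert(!fits_in_mapping(2, 512, read_len_old(2, 512, 100000)));
	/* Fixed: 7680 bytes requested, ending exactly at the end of the second page. */
	assert(fits_in_mapping(2, 512, read_len_fixed(2, 512, 100000)));
}
```

When rs_length is small enough that it wins the min, both versions return the same value. That would be consistent with the bug only showing up when the page limit is the smaller term, as with small fastreg page list depths.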