stable.vger.kernel.org archive mirror
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Chuck Lever <chuck.lever@oracle.com>,
	"J. Bruce Fields" <bfields@redhat.com>
Subject: [PATCH 4.8 63/92] svcrdma: Tail iovec leaves an orphaned DMA mapping
Date: Thu, 17 Nov 2016 11:32:36 +0100
Message-ID: <20161117103226.926244532@linuxfoundation.org>
In-Reply-To: <20161117103224.218007793@linuxfoundation.org>

4.8-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Chuck Lever <chuck.lever@oracle.com>

commit cace564f8b6260e806f5e28d7f192fd0e0c603ed upstream.

The ctxt's count field is overloaded to mean the number of pages in
the ctxt->page array and the number of SGEs in the ctxt->sge array.
Typically these two numbers are the same.

However, when an inline RPC reply is constructed from an xdr_buf
with a tail iovec, the head and tail often occupy the same page,
but each is DMA mapped independently. In that case, ->count equals
the number of pages, but it does not equal the number of SGEs.
There's one more SGE, for the tail iovec. Hence there is one more
DMA mapping than there are pages in the ctxt->page array.

This isn't a real problem until the server's IOMMU is enabled. Then
each RPC reply that has content in the tail iovec orphans a DMA
mapping that consists of real resources.

krb5i and krb5p always populate that tail iovec. After a couple
million sent krb5i/p RPC replies, the NFS server starts behaving
erratically. A reboot is needed to clear the problem.

Fixes: 9d11b51ce7c1 ("svcrdma: Fix send_reply() scatter/gather set-up")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/sunrpc/svc_rdma.h            |    9 +++++++++
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |    2 +-
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c    |    2 +-
 net/sunrpc/xprtrdma/svc_rdma_sendto.c      |   22 ++++------------------
 net/sunrpc/xprtrdma/svc_rdma_transport.c   |   18 ++++++++++++------
 5 files changed, 27 insertions(+), 26 deletions(-)

--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -86,6 +86,7 @@ struct svc_rdma_op_ctxt {
 	unsigned long flags;
 	enum dma_data_direction direction;
 	int count;
+	unsigned int mapped_sges;
 	struct ib_sge sge[RPCSVC_MAXPAGES];
 	struct page *pages[RPCSVC_MAXPAGES];
 };
@@ -193,6 +194,14 @@ struct svcxprt_rdma {
 
 #define RPCSVC_MAXPAYLOAD_RDMA	RPCSVC_MAXPAYLOAD
 
+/* Track DMA maps for this transport and context */
+static inline void svc_rdma_count_mappings(struct svcxprt_rdma *rdma,
+					   struct svc_rdma_op_ctxt *ctxt)
+{
+	ctxt->mapped_sges++;
+	atomic_inc(&rdma->sc_dma_used);
+}
+
 /* svc_rdma_backchannel.c */
 extern int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt,
 				    struct rpcrdma_msg *rmsgp,
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -129,7 +129,7 @@ static int svc_rdma_bc_sendto(struct svc
 		ret = -EIO;
 		goto out_unmap;
 	}
-	atomic_inc(&rdma->sc_dma_used);
+	svc_rdma_count_mappings(rdma, ctxt);
 
 	memset(&send_wr, 0, sizeof(send_wr));
 	ctxt->cqe.done = svc_rdma_wc_send;
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -159,7 +159,7 @@ int rdma_read_chunk_lcl(struct svcxprt_r
 					   ctxt->sge[pno].addr);
 		if (ret)
 			goto err;
-		atomic_inc(&xprt->sc_dma_used);
+		svc_rdma_count_mappings(xprt, ctxt);
 
 		ctxt->sge[pno].lkey = xprt->sc_pd->local_dma_lkey;
 		ctxt->sge[pno].length = len;
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -280,7 +280,7 @@ static int send_write(struct svcxprt_rdm
 		if (ib_dma_mapping_error(xprt->sc_cm_id->device,
 					 sge[sge_no].addr))
 			goto err;
-		atomic_inc(&xprt->sc_dma_used);
+		svc_rdma_count_mappings(xprt, ctxt);
 		sge[sge_no].lkey = xprt->sc_pd->local_dma_lkey;
 		ctxt->count++;
 		sge_off = 0;
@@ -489,7 +489,7 @@ static int send_reply(struct svcxprt_rdm
 			    ctxt->sge[0].length, DMA_TO_DEVICE);
 	if (ib_dma_mapping_error(rdma->sc_cm_id->device, ctxt->sge[0].addr))
 		goto err;
-	atomic_inc(&rdma->sc_dma_used);
+	svc_rdma_count_mappings(rdma, ctxt);
 
 	ctxt->direction = DMA_TO_DEVICE;
 
@@ -505,7 +505,7 @@ static int send_reply(struct svcxprt_rdm
 		if (ib_dma_mapping_error(rdma->sc_cm_id->device,
 					 ctxt->sge[sge_no].addr))
 			goto err;
-		atomic_inc(&rdma->sc_dma_used);
+		svc_rdma_count_mappings(rdma, ctxt);
 		ctxt->sge[sge_no].lkey = rdma->sc_pd->local_dma_lkey;
 		ctxt->sge[sge_no].length = sge_bytes;
 	}
@@ -523,23 +523,9 @@ static int send_reply(struct svcxprt_rdm
 		ctxt->pages[page_no+1] = rqstp->rq_respages[page_no];
 		ctxt->count++;
 		rqstp->rq_respages[page_no] = NULL;
-		/*
-		 * If there are more pages than SGE, terminate SGE
-		 * list so that svc_rdma_unmap_dma doesn't attempt to
-		 * unmap garbage.
-		 */
-		if (page_no+1 >= sge_no)
-			ctxt->sge[page_no+1].length = 0;
 	}
 	rqstp->rq_next_page = rqstp->rq_respages + 1;
 
-	/* The loop above bumps sc_dma_used for each sge. The
-	 * xdr_buf.tail gets a separate sge, but resides in the
-	 * same page as xdr_buf.head. Don't count it twice.
-	 */
-	if (sge_no > ctxt->count)
-		atomic_dec(&rdma->sc_dma_used);
-
 	if (sge_no > rdma->sc_max_sge) {
 		pr_err("svcrdma: Too many sges (%d)\n", sge_no);
 		goto err;
@@ -692,7 +678,7 @@ void svc_rdma_send_error(struct svcxprt_
 		svc_rdma_put_context(ctxt, 1);
 		return;
 	}
-	atomic_inc(&xprt->sc_dma_used);
+	svc_rdma_count_mappings(xprt, ctxt);
 
 	/* Prepare SEND WR */
 	memset(&err_wr, 0, sizeof(err_wr));
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -198,6 +198,7 @@ struct svc_rdma_op_ctxt *svc_rdma_get_co
 
 out:
 	ctxt->count = 0;
+	ctxt->mapped_sges = 0;
 	ctxt->frmr = NULL;
 	return ctxt;
 
@@ -221,22 +222,27 @@ out_empty:
 void svc_rdma_unmap_dma(struct svc_rdma_op_ctxt *ctxt)
 {
 	struct svcxprt_rdma *xprt = ctxt->xprt;
-	int i;
-	for (i = 0; i < ctxt->count && ctxt->sge[i].length; i++) {
+	struct ib_device *device = xprt->sc_cm_id->device;
+	u32 lkey = xprt->sc_pd->local_dma_lkey;
+	unsigned int i, count;
+
+	for (count = 0, i = 0; i < ctxt->mapped_sges; i++) {
 		/*
 		 * Unmap the DMA addr in the SGE if the lkey matches
 		 * the local_dma_lkey, otherwise, ignore it since it is
 		 * an FRMR lkey and will be unmapped later when the
 		 * last WR that uses it completes.
 		 */
-		if (ctxt->sge[i].lkey == xprt->sc_pd->local_dma_lkey) {
-			atomic_dec(&xprt->sc_dma_used);
-			ib_dma_unmap_page(xprt->sc_cm_id->device,
+		if (ctxt->sge[i].lkey == lkey) {
+			count++;
+			ib_dma_unmap_page(device,
 					    ctxt->sge[i].addr,
 					    ctxt->sge[i].length,
 					    ctxt->direction);
 		}
 	}
+	ctxt->mapped_sges = 0;
+	atomic_sub(count, &xprt->sc_dma_used);
 }
 
 void svc_rdma_put_context(struct svc_rdma_op_ctxt *ctxt, int free_pages)
@@ -600,7 +606,7 @@ int svc_rdma_post_recv(struct svcxprt_rd
 				     DMA_FROM_DEVICE);
 		if (ib_dma_mapping_error(xprt->sc_cm_id->device, pa))
 			goto err_put_ctxt;
-		atomic_inc(&xprt->sc_dma_used);
+		svc_rdma_count_mappings(xprt, ctxt);
 		ctxt->sge[sge_no].addr = pa;
 		ctxt->sge[sge_no].length = PAGE_SIZE;
 		ctxt->sge[sge_no].lkey = xprt->sc_pd->local_dma_lkey;
