All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Chuck Lever <chuck.lever@oracle.com>
Subject: [PATCH 4.19 16/46] svcrdma: Fix trace point use-after-free race
Date: Fri,  1 May 2020 15:22:41 +0200	[thread overview]
Message-ID: <20200501131504.178553519@linuxfoundation.org> (raw)
In-Reply-To: <20200501131457.023036302@linuxfoundation.org>

From: Chuck Lever <chuck.lever@oracle.com>

commit e28b4fc652c1830796a4d3e09565f30c20f9a2cf upstream.

I hit this while testing nfsd-5.7 with kernel memory debugging
enabled on my server:

Mar 30 13:21:45 klimt kernel: BUG: unable to handle page fault for address: ffff8887e6c279a8
Mar 30 13:21:45 klimt kernel: #PF: supervisor read access in kernel mode
Mar 30 13:21:45 klimt kernel: #PF: error_code(0x0000) - not-present page
Mar 30 13:21:45 klimt kernel: PGD 3601067 P4D 3601067 PUD 87c519067 PMD 87c3e2067 PTE 800ffff8193d8060
Mar 30 13:21:45 klimt kernel: Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
Mar 30 13:21:45 klimt kernel: CPU: 2 PID: 1933 Comm: nfsd Not tainted 5.6.0-rc6-00040-g881e87a3c6f9 #1591
Mar 30 13:21:45 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
Mar 30 13:21:45 klimt kernel: RIP: 0010:svc_rdma_post_chunk_ctxt+0xab/0x284 [rpcrdma]
Mar 30 13:21:45 klimt kernel: Code: c1 83 34 02 00 00 29 d0 85 c0 7e 72 48 8b bb a0 02 00 00 48 8d 54 24 08 4c 89 e6 48 8b 07 48 8b 40 20 e8 5a 5c 2b e1 41 89 c6 <8b> 45 20 89 44 24 04 8b 05 02 e9 01 00 85 c0 7e 33 e9 5e 01 00 00
Mar 30 13:21:45 klimt kernel: RSP: 0018:ffffc90000dfbdd8 EFLAGS: 00010286
Mar 30 13:21:45 klimt kernel: RAX: 0000000000000000 RBX: ffff8887db8db400 RCX: 0000000000000030
Mar 30 13:21:45 klimt kernel: RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000246
Mar 30 13:21:45 klimt kernel: RBP: ffff8887e6c27988 R08: 0000000000000000 R09: 0000000000000004
Mar 30 13:21:45 klimt kernel: R10: ffffc90000dfbdd8 R11: 00c068ef00000000 R12: ffff8887eb4e4a80
Mar 30 13:21:45 klimt kernel: R13: ffff8887db8db634 R14: 0000000000000000 R15: ffff8887fc931000
Mar 30 13:21:45 klimt kernel: FS:  0000000000000000(0000) GS:ffff88885bd00000(0000) knlGS:0000000000000000
Mar 30 13:21:45 klimt kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 30 13:21:45 klimt kernel: CR2: ffff8887e6c279a8 CR3: 000000081b72e002 CR4: 00000000001606e0
Mar 30 13:21:45 klimt kernel: Call Trace:
Mar 30 13:21:45 klimt kernel: ? svc_rdma_vec_to_sg+0x7f/0x7f [rpcrdma]
Mar 30 13:21:45 klimt kernel: svc_rdma_send_write_chunk+0x59/0xce [rpcrdma]
Mar 30 13:21:45 klimt kernel: svc_rdma_sendto+0xf9/0x3ae [rpcrdma]
Mar 30 13:21:45 klimt kernel: ? nfsd_destroy+0x51/0x51 [nfsd]
Mar 30 13:21:45 klimt kernel: svc_send+0x105/0x1e3 [sunrpc]
Mar 30 13:21:45 klimt kernel: nfsd+0xf2/0x149 [nfsd]
Mar 30 13:21:45 klimt kernel: kthread+0xf6/0xfb
Mar 30 13:21:45 klimt kernel: ? kthread_queue_delayed_work+0x74/0x74
Mar 30 13:21:45 klimt kernel: ret_from_fork+0x3a/0x50
Mar 30 13:21:45 klimt kernel: Modules linked in: ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue ib_umad ib_ipoib mlx4_ib sb_edac x86_pkg_temp_thermal iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel glue_helper crypto_simd cryptd pcspkr rpcrdma i2c_i801 rdma_ucm lpc_ich mfd_core ib_iser rdma_cm iw_cm ib_cm mei_me raid0 libiscsi mei sg scsi_transport_iscsi ioatdma wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs libcrc32c mlx4_en sd_mod sr_mod cdrom mlx4_core crc32c_intel igb nvme i2c_algo_bit ahci i2c_core libahci nvme_core dca libata t10_pi qedr dm_mirror dm_region_hash dm_log dm_mod dax qede qed crc8 ib_uverbs ib_core
Mar 30 13:21:45 klimt kernel: CR2: ffff8887e6c279a8
Mar 30 13:21:45 klimt kernel: ---[ end trace 87971d2ad3429424 ]---

It's absolutely not safe to use resources pointed to by the @send_wr
argument of ib_post_send() _after_ that function returns. Those
resources are typically freed by the Send completion handler, which
can run before ib_post_send() returns.

Thus the trace points currently around ib_post_send() in the
server's RPC/RDMA transport are a hazard, even when they are
disabled. Rearrange them so that they touch the Work Request only
_before_ ib_post_send() is invoked.

Fixes: bd2abef33394 ("svcrdma: Trace key RDMA API events")
Fixes: 4201c7464753 ("svcrdma: Introduce svc_rdma_send_ctxt")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/trace/events/rpcrdma.h        |   50 ++++++++++++++++++++++++----------
 net/sunrpc/xprtrdma/svc_rdma_rw.c     |    3 --
 net/sunrpc/xprtrdma/svc_rdma_sendto.c |   16 ++++++----
 3 files changed, 46 insertions(+), 23 deletions(-)

--- a/include/trace/events/rpcrdma.h
+++ b/include/trace/events/rpcrdma.h
@@ -1322,17 +1322,15 @@ DECLARE_EVENT_CLASS(svcrdma_sendcomp_eve
 
 TRACE_EVENT(svcrdma_post_send,
 	TP_PROTO(
-		const struct ib_send_wr *wr,
-		int status
+		const struct ib_send_wr *wr
 	),
 
-	TP_ARGS(wr, status),
+	TP_ARGS(wr),
 
 	TP_STRUCT__entry(
 		__field(const void *, cqe)
 		__field(unsigned int, num_sge)
 		__field(u32, inv_rkey)
-		__field(int, status)
 	),
 
 	TP_fast_assign(
@@ -1340,12 +1338,11 @@ TRACE_EVENT(svcrdma_post_send,
 		__entry->num_sge = wr->num_sge;
 		__entry->inv_rkey = (wr->opcode == IB_WR_SEND_WITH_INV) ?
 					wr->ex.invalidate_rkey : 0;
-		__entry->status = status;
 	),
 
-	TP_printk("cqe=%p num_sge=%u inv_rkey=0x%08x status=%d",
+	TP_printk("cqe=%p num_sge=%u inv_rkey=0x%08x",
 		__entry->cqe, __entry->num_sge,
-		__entry->inv_rkey, __entry->status
+		__entry->inv_rkey
 	)
 );
 
@@ -1410,26 +1407,23 @@ TRACE_EVENT(svcrdma_wc_receive,
 TRACE_EVENT(svcrdma_post_rw,
 	TP_PROTO(
 		const void *cqe,
-		int sqecount,
-		int status
+		int sqecount
 	),
 
-	TP_ARGS(cqe, sqecount, status),
+	TP_ARGS(cqe, sqecount),
 
 	TP_STRUCT__entry(
 		__field(const void *, cqe)
 		__field(int, sqecount)
-		__field(int, status)
 	),
 
 	TP_fast_assign(
 		__entry->cqe = cqe;
 		__entry->sqecount = sqecount;
-		__entry->status = status;
 	),
 
-	TP_printk("cqe=%p sqecount=%d status=%d",
-		__entry->cqe, __entry->sqecount, __entry->status
+	TP_printk("cqe=%p sqecount=%d",
+		__entry->cqe, __entry->sqecount
 	)
 );
 
@@ -1525,6 +1519,34 @@ DECLARE_EVENT_CLASS(svcrdma_sendqueue_ev
 DEFINE_SQ_EVENT(full);
 DEFINE_SQ_EVENT(retry);
 
+TRACE_EVENT(svcrdma_sq_post_err,
+	TP_PROTO(
+		const struct svcxprt_rdma *rdma,
+		int status
+	),
+
+	TP_ARGS(rdma, status),
+
+	TP_STRUCT__entry(
+		__field(int, avail)
+		__field(int, depth)
+		__field(int, status)
+		__string(addr, rdma->sc_xprt.xpt_remotebuf)
+	),
+
+	TP_fast_assign(
+		__entry->avail = atomic_read(&rdma->sc_sq_avail);
+		__entry->depth = rdma->sc_sq_depth;
+		__entry->status = status;
+		__assign_str(addr, rdma->sc_xprt.xpt_remotebuf);
+	),
+
+	TP_printk("addr=%s sc_sq_avail=%d/%d status=%d",
+		__get_str(addr), __entry->avail, __entry->depth,
+		__entry->status
+	)
+);
+
 #endif /* _TRACE_RPCRDMA_H */
 
 #include <trace/define_trace.h>
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -331,8 +331,6 @@ static int svc_rdma_post_chunk_ctxt(stru
 		if (atomic_sub_return(cc->cc_sqecount,
 				      &rdma->sc_sq_avail) > 0) {
 			ret = ib_post_send(rdma->sc_qp, first_wr, &bad_wr);
-			trace_svcrdma_post_rw(&cc->cc_cqe,
-					      cc->cc_sqecount, ret);
 			if (ret)
 				break;
 			return 0;
@@ -345,6 +343,7 @@ static int svc_rdma_post_chunk_ctxt(stru
 		trace_svcrdma_sq_retry(rdma);
 	} while (1);
 
+	trace_svcrdma_sq_post_err(rdma, ret);
 	set_bit(XPT_CLOSE, &xprt->xpt_flags);
 
 	/* If even one was posted, there will be a completion. */
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -310,15 +310,17 @@ int svc_rdma_send(struct svcxprt_rdma *r
 		}
 
 		svc_xprt_get(&rdma->sc_xprt);
+		trace_svcrdma_post_send(wr);
 		ret = ib_post_send(rdma->sc_qp, wr, NULL);
-		trace_svcrdma_post_send(wr, ret);
-		if (ret) {
-			set_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags);
-			svc_xprt_put(&rdma->sc_xprt);
-			wake_up(&rdma->sc_send_wait);
-		}
-		break;
+		if (ret)
+			break;
+		return 0;
 	}
+
+	trace_svcrdma_sq_post_err(rdma, ret);
+	set_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags);
+	svc_xprt_put(&rdma->sc_xprt);
+	wake_up(&rdma->sc_send_wait);
 	return ret;
 }
 



  parent reply	other threads:[~2020-05-01 13:53 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-01 13:22 [PATCH 4.19 00/46] 4.19.120-rc1 review Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 01/46] remoteproc: Fix wrong rvring index computation Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 02/46] mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 03/46] include/uapi/linux/swab.h: fix userspace breakage, use __BITS_PER_LONG for swap Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 04/46] binder: take read mode of mmap_sem in binder_alloc_free_page() Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 05/46] usb: dwc3: gadget: Do link recovery for SS and SSP Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 06/46] usb: gadget: udc: bdc: Remove unnecessary NULL checks in bdc_req_complete Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 07/46] iio:ad7797: Use correct attribute_group Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 08/46] ASoC: q6dsp6: q6afe-dai: add missing channels to MI2S DAIs Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 09/46] ASoC: tas571x: disable regulators on failed probe Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 10/46] ASoC: wm8960: Fix wrong clock after suspend & resume Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 11/46] nfsd: memory corruption in nfsd4_lock() Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 12/46] i2c: altera: use proper variable to hold errno Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 13/46] rxrpc: Fix DATA Tx to disable nofrag for UDP on AF_INET6 socket Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 14/46] net/cxgb4: Check the return from t4_query_params properly Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 15/46] xfs: acquire superblock freeze protection on eofblocks scans Greg Kroah-Hartman
2020-05-01 13:22 ` Greg Kroah-Hartman [this message]
2020-05-01 13:22 ` [PATCH 4.19 17/46] svcrdma: Fix leak of svc_rdma_recv_ctxt objects Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 18/46] PCI: Avoid ASMedia XHCI USB PME# from D0 defect Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 19/46] PCI: Move Apex Edge TPU class quirk to fix BAR assignment Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 20/46] ARM: dts: bcm283x: Disable dsi0 node Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 21/46] cpumap: Avoid warning when CONFIG_DEBUG_PER_CPU_MAPS is enabled Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 22/46] net/mlx5: Fix failing fw tracer allocation on s390 Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 23/46] perf/core: fix parent pid/tid in task exit events Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 24/46] bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extension Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 25/46] mm: shmem: disable interrupt when acquiring info->lock in userfaultfd_copy path Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 26/46] xfs: clear PF_MEMALLOC before exiting xfsaild thread Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 27/46] bpf, x86: Fix encoding for lower 8-bit registers in BPF_STX BPF_B Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 28/46] net: fec: set GPR bit on suspend by DT configuration Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 29/46] x86: hyperv: report value of misc_features Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 30/46] xfs: fix partially uninitialized structure in xfs_reflink_remap_extent Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 31/46] ALSA: hda: Keep the controller initialization even if no codecs found Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 32/46] ALSA: hda: Explicitly permit using autosuspend if runtime PM is supported Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 33/46] scsi: target: fix PR IN / READ FULL STATUS for FC Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.19 34/46] scsi: target: tcmu: reset_ring should reset TCMU_DEV_BIT_BROKEN Greg Kroah-Hartman
2020-05-01 13:23 ` [PATCH 4.19 35/46] objtool: Fix CONFIG_UBSAN_TRAP unreachable warnings Greg Kroah-Hartman
2020-05-01 13:23 ` [PATCH 4.19 36/46] objtool: Support Clang non-section symbols in ORC dump Greg Kroah-Hartman
2020-05-01 13:23 ` [PATCH 4.19 37/46] xen/xenbus: ensure xenbus_map_ring_valloc() returns proper grant status Greg Kroah-Hartman
2020-05-01 13:23 ` [PATCH 4.19 38/46] ALSA: hda: call runtime_allow() for all hda controllers Greg Kroah-Hartman
2020-05-01 13:23 ` [PATCH 4.19 39/46] arm64: Delete the space separator in __emit_inst Greg Kroah-Hartman
2020-05-01 13:23 ` [PATCH 4.19 40/46] ext4: use matching invalidatepage in ext4_writepage Greg Kroah-Hartman
2020-05-01 13:23 ` [PATCH 4.19 41/46] ext4: increase wait time needed before reuse of deleted inode numbers Greg Kroah-Hartman
2020-05-01 13:23 ` [PATCH 4.19 42/46] ext4: convert BUG_ONs to WARN_ONs in mballoc.c Greg Kroah-Hartman
2020-05-01 13:23 ` [PATCH 4.19 43/46] hwmon: (jc42) Fix name to have no illegal characters Greg Kroah-Hartman
2020-05-01 13:23 ` [PATCH 4.19 44/46] bpf, x86_32: Fix clobbering of dst for BPF_JSET Greg Kroah-Hartman
2020-05-01 13:23 ` [PATCH 4.19 45/46] qed: Fix use after free in qed_chain_free Greg Kroah-Hartman
2020-05-01 13:23 ` [PATCH 4.19 46/46] ext4: check for non-zero journal inum in ext4_calculate_overhead Greg Kroah-Hartman
     [not found] ` <20200501131457.023036302-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
2020-05-01 15:16   ` [PATCH 4.19 00/46] 4.19.120-rc1 review Jon Hunter
2020-05-01 15:16     ` Jon Hunter
2020-05-01 22:05 ` Naresh Kamboju
2020-05-01 22:11 ` Guenter Roeck
2020-05-02 23:17 ` shuah

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200501131504.178553519@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=chuck.lever@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.