All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@redhat.com>
To: Chuck Lever <chuck.lever@oracle.com>, bcodding@redhat.com
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH RFC] svcrdma: Preserve CB send buffer across retransmits
Date: Sat, 14 Oct 2017 06:38:12 -0400	[thread overview]
Message-ID: <1507977492.4831.2.camel@redhat.com> (raw)
In-Reply-To: <20171008225546.16538.99253.stgit@klimt.1015granger.net>

On Sun, 2017-10-08 at 19:13 -0400, Chuck Lever wrote:
> During each NFSv4 callback Call, an RDMA Send completion frees the
> page that contains the RPC Call message. If the upper layer determines
> that a retransmit is necessary, this is too soon.
> 
> One possible symptom: after a GARBAGE_ARGS response an NFSv4.1
> callback request, the following BUG fires on the NFS server:
> 
> kernel: BUG: Bad page state in process kworker/0:2H  pfn:7d3ce2
> kernel: page:ffffea001f4f3880 count:-2 mapcount:0 mapping:          (null) index:0x0
> kernel: flags: 0x2fffff80000000()
> kernel: raw: 002fffff80000000 0000000000000000 0000000000000000 fffffffeffffffff
> kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
> kernel: page dumped because: nonzero _refcount
> kernel: Modules linked in: cts rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
> ocfs2_nodemanager ocfs2_stackglue rpcrdm a ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
> rdma_cm ib_cm iw_cm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
> kvm irqbypass crct10dif_pc lmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt
> iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr lpc_ich i2c_i801
> mei_me mf d_core mei raid0 sg wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler shpchp
> acpi_power_meter acpi_pad nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs
> libcrc32c mlx4_en mlx4_ib mlx5_ib ib_core sd_mod sr_mod cdrom ast drm_kms_helper
> syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crc32c_intel libahci drm
> mlx5_core igb libata mlx4_core dca i2c_algo_bit i2c_core nvme
> kernel: ptp nvme_core pps_core dm_mirror dm_region_hash dm_log dm_mod dax
> kernel: CPU: 0 PID: 11495 Comm: kworker/0:2H Not tainted 4.14.0-rc3-00001-g577ce48 #811
> kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
> kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> kernel: Call Trace:
> kernel: dump_stack+0x62/0x80
> kernel: bad_page+0xfe/0x11a
> kernel: free_pages_check_bad+0x76/0x78
> kernel: free_pcppages_bulk+0x364/0x441
> kernel: ? ttwu_do_activate.isra.61+0x71/0x78
> kernel: free_hot_cold_page+0x1c5/0x202
> kernel: __put_page+0x2c/0x36
> kernel: svc_rdma_put_context+0xd9/0xe4 [rpcrdma]
> kernel: svc_rdma_wc_send+0x50/0x98 [rpcrdma]
> 
> This issue exists all the way back to v4.5, but refactoring and code
> re-organization prevents this simple patch from applying to kernels
> older than v4.12. The fix is the same, however, if someone needs to
> backport it.
> 
> Reported-by: Ben Coddington <bcodding@redhat.com>
> Fixes: 5d252f90a800 ('svcrdma: Add class for RDMA backwards ... ')
> Cc: stable@vger.kernel.org # v4.12
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  net/sunrpc/xprtrdma/svc_rdma_backchannel.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> Hi Ben, Jeff-
> 
> I was able to reproduce the problem here with a rigged Linux client.
> This is my proposed fix. Fix also tested with a prototype Solaris
> client similar to the one that exposed the problem.
> 
> 
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> index ec37ad8..1854db2 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> @@ -132,6 +132,10 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
>  	if (ret)
>  		goto out_err;
>  
> +	/* Bump page refcnt so Send completion doesn't release
> +	 * the rq_buffer before all retransmits are complete.
> +	 */
> +	get_page(virt_to_page(rqst->rq_buffer));

Looks fairly reasonable, but is this enough? Could rq_buffer be larger
than a page?

>  	ret = svc_rdma_post_send_wr(rdma, ctxt, 1, 0);
>  	if (ret)
>  		goto out_unmap;
> @@ -164,7 +168,6 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
>  		return -EINVAL;
>  	}
>  
> -	/* svc_rdma_sendto releases this page */
>  	page = alloc_page(RPCRDMA_DEF_GFP);
>  	if (!page)
>  		return -ENOMEM;
> @@ -183,6 +186,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
>  {
>  	struct rpc_rqst *rqst = task->tk_rqstp;
>  
> +	put_page(virt_to_page(rqst->rq_buffer));
>  	kfree(rqst->rq_rbuffer);
>  }
>  
> 

-- 
Jeff Layton <jlayton@redhat.com>

  parent reply	other threads:[~2017-10-14 10:38 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-08 23:13 [PATCH RFC] svcrdma: Preserve CB send buffer across retransmits Chuck Lever
2017-10-13 21:49 ` Chuck Lever
2017-10-14 10:38 ` Jeff Layton [this message]
2017-10-14 13:29   ` Chuck Lever
2017-10-14 13:35     ` Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1507977492.4831.2.camel@redhat.com \
    --to=jlayton@redhat.com \
    --cc=bcodding@redhat.com \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.