From: "J. Bruce Fields" <bfields@fieldses.org>
To: Neil Brown <neilb@suse.de>
Cc: Tom Tucker <tom@opengridcomputing.com>,
linux-nfs@vger.kernel.org, Wei Yongjun <yjwei@cn.fujitsu.com>
Subject: Re: The recent kref_put warning (was: [PATCH] sunrpc: remove unnecessary svc_xprt_put)
Date: Sun, 28 Feb 2010 16:05:53 -0500 [thread overview]
Message-ID: <20100228210553.GA27965@fieldses.org> (raw)
In-Reply-To: <20100227165913.53718449-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
On Sat, Feb 27, 2010 at 04:59:13PM +1100, Neil Brown wrote:
> On Sat, 27 Feb 2010 12:35:37 +1100
> Neil Brown <neilb@suse.de> wrote:
>
> > So if XPT_CLOSE is set while xpo_recvfrom is being called, which I think
> > is possible, and if ->xpo_recvfrom returns non-zero, then we end up
> > processing a request on a dead socket, which doesn't sound like the right
> > thing to do. I don't think it can cause the present problem, but
> > it looks wrong. That last 'if' should just be an 'else'.
> > I guess that would effectively reverse b0401d7253, though - not that
> > that patch seems entirely right to me - if there is a problem I probably
> > would have fixed it differently, though I'm not sure how.
> > So maybe change "if (XPT_CLOSE)" to "if (len <= 0 && XPT_CLOSE)" ???
>
> OK, I think I've nailed it. I think b0401d7253 is the culprit.
> Now let me see if I can convince you (and me).
Thanks, Neil, for the explanation.
> Firstly, why is this patch wrong.
> It claims:
>
> sunrpc: "Move close processing to a single place"
> (d7979ae4a050a45b78af51832475001b68263d2a) moved the
> close processing before the recvfrom method. This may
> cause the close processing never to execute. So this
> patch moves it to the right place.
>
> The referenced commit did *not* move the close processing before the
> recvfrom method - it was already there. The close processing was previously
> at the top of the individual recvfrom methods. It was split out and common
> code with placed before the now-slightly-reduced recvfrom methods.
> This is functionally a null change.
>
> However that doesn't explain why sometimes "the close processing [would]]
> never .. execute".
> The reason for this is subtle. One the changes in commit d7979ae4a is
>
> err_delete:
> - svc_delete_socket(svsk);
> + set_bit(SK_CLOSE, &svsk->sk_flags);
> return -EAGAIN;
>
> This is insufficient. The recvfrom methods must always call svc_xprt_received
> on completion so that the socket gets re-queued if there is any more work to
> do. This particular path did not make that call because it actually
> destroyed the svsk, making requeue pointless. When the svc_delete_socket was
> change to just set a bit, we should have added a call to svc_xprt_received,
> but we didn't. Sorry. As I said, it was subtle.
>
> So how is the b0401d7253 patch causing a problem?
>
> svc_tcp_state_change - which can be called at any time - sets XPT_CLOSE.
> If this happens while svc_tcp_recvfrom is running and before one of the
> calls to svc_xprt_received, then svc_xprt_received will requeue the svsk for
> further processing (to handle the close).
> As soon a svc_tcp_recvfrom completes, svc_recv will notice that XPT_CLOSE is
> set and will close the socket, dropping the last refcount.
OK, so the rule (or one of the rules) that was violated here was that we
end up calling svc_delete_xprt() without XPT_BUSY?
> Subsequently the
> thread which the socket was queued to wakes up, calls svc_recv, and triggers
> the warning.
>
> So the fix I propose is:
> - make the XPT_CLOSE case in svc_recv once more exclusive with the
> ->recvfrom case
> - make sure all paths out of all recvfrom methods call svc_xprt_received.
> Maybe it should be called after the ->xpo_recvfrom call instead.
>
> So something like this?
Makes sense to me on a first look. It would be helpful if people that
can reproduce this could test....
> I've made quite a few changes here - it might be worth splitting them
> up.
Probably so.
> One worth noting is that we now don't re-queue a udp
> socket at the earliest opportunity, but possibly do a
> csum_partial_copy_to_xdr before the requeue which could reduce performance
> slightly with udp on a multiprocessor.
Just because we're slower to get another CPU working on the next
incoming packet?
> I have no idea what the actual
> performance effect would be, but I think it makes the code a lot more robust
> (move the svc_xprt_received to just one place).
OK.
--b.
>
> NeilBrown
>
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 4f30336..2d99fb8 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -699,8 +699,12 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
> spin_unlock_bh(&pool->sp_lock);
>
> len = 0;
> - if (test_bit(XPT_LISTENER, &xprt->xpt_flags) &&
> - !test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
> +
> + if (test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
> + dprintk("svc_recv: found XPT_CLOSE\n");
> + svc_delete_xprt(xprt);
> + return 0;
> + } else if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
> struct svc_xprt *newxpt;
> newxpt = xprt->xpt_ops->xpo_accept(xprt);
> if (newxpt) {
> @@ -726,23 +730,19 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
> svc_xprt_received(newxpt);
> }
> svc_xprt_received(xprt);
> - } else if (!test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
> - dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
> - rqstp, pool->sp_id, xprt,
> - atomic_read(&xprt->xpt_ref.refcount));
> - rqstp->rq_deferred = svc_deferred_dequeue(xprt);
> - if (rqstp->rq_deferred) {
> - svc_xprt_received(xprt);
> - len = svc_deferred_recv(rqstp);
> - } else
> - len = xprt->xpt_ops->xpo_recvfrom(rqstp);
> - dprintk("svc: got len=%d\n", len);
> + return 0;
> }
>
> - if (test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
> - dprintk("svc_recv: found XPT_CLOSE\n");
> - svc_delete_xprt(xprt);
> - }
> + dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
> + rqstp, pool->sp_id, xprt,
> + atomic_read(&xprt->xpt_ref.refcount));
> + rqstp->rq_deferred = svc_deferred_dequeue(xprt);
> + if (rqstp->rq_deferred)
> + len = svc_deferred_recv(rqstp);
> + else
> + len = xprt->xpt_ops->xpo_recvfrom(rqstp);
> + dprintk("svc: got len=%d\n", len);
> + svc_xprt_received(xprt);
>
> /* No data, incomplete (TCP) read, or accept() */
> if (len == 0 || len == -EAGAIN) {
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 870929e..22d9904 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -547,7 +547,6 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
> dprintk("svc: recvfrom returned error %d\n", -err);
> set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
> }
> - svc_xprt_received(&svsk->sk_xprt);
> return -EAGAIN;
> }
> len = svc_addr_len(svc_addr(rqstp));
> @@ -562,11 +561,6 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
> svsk->sk_sk->sk_stamp = skb->tstamp;
> set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); /* there may be more data... */
>
> - /*
> - * Maybe more packets - kick another thread ASAP.
> - */
> - svc_xprt_received(&svsk->sk_xprt);
> -
> len = skb->len - sizeof(struct udphdr);
> rqstp->rq_arg.len = len;
>
> @@ -917,7 +911,6 @@ static int svc_tcp_recv_record(struct svc_sock *svsk, struct svc_rqst *rqstp)
> if (len < want) {
> dprintk("svc: short recvfrom while reading record "
> "length (%d of %d)\n", len, want);
> - svc_xprt_received(&svsk->sk_xprt);
> goto err_again; /* record header not complete */
> }
>
> @@ -953,7 +946,6 @@ static int svc_tcp_recv_record(struct svc_sock *svsk, struct svc_rqst *rqstp)
> if (len < svsk->sk_reclen) {
> dprintk("svc: incomplete TCP record (%d of %d)\n",
> len, svsk->sk_reclen);
> - svc_xprt_received(&svsk->sk_xprt);
> goto err_again; /* record not complete */
> }
> len = svsk->sk_reclen;
> @@ -961,10 +953,9 @@ static int svc_tcp_recv_record(struct svc_sock *svsk, struct svc_rqst *rqstp)
>
> return len;
> error:
> - if (len == -EAGAIN) {
> + if (len == -EAGAIN)
> dprintk("RPC: TCP recv_record got EAGAIN\n");
> - svc_xprt_received(&svsk->sk_xprt);
> - }
> +
> return len;
> err_delete:
> set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> @@ -1109,18 +1100,14 @@ out:
> svsk->sk_tcplen = 0;
>
> svc_xprt_copy_addrs(rqstp, &svsk->sk_xprt);
> - svc_xprt_received(&svsk->sk_xprt);
> if (serv->sv_stats)
> serv->sv_stats->nettcpcnt++;
>
> return len;
>
> err_again:
> - if (len == -EAGAIN) {
> + if (len == -EAGAIN)
> dprintk("RPC: TCP recvfrom got EAGAIN\n");
> - svc_xprt_received(&svsk->sk_xprt);
> - return len;
> - }
> error:
> if (len != -EAGAIN) {
> printk(KERN_NOTICE "%s: recvfrom returned errno %d\n",
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index f92e37e..0194de8 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -566,7 +566,6 @@ static int rdma_read_complete(struct svc_rqst *rqstp,
> ret, rqstp->rq_arg.len, rqstp->rq_arg.head[0].iov_base,
> rqstp->rq_arg.head[0].iov_len);
>
> - svc_xprt_received(rqstp->rq_xprt);
> return ret;
> }
>
> @@ -665,7 +664,6 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
> rqstp->rq_arg.head[0].iov_len);
> rqstp->rq_prot = IPPROTO_MAX;
> svc_xprt_copy_addrs(rqstp, xprt);
> - svc_xprt_received(xprt);
> return ret;
>
> close_out:
> @@ -678,6 +676,5 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
> */
> set_bit(XPT_CLOSE, &xprt->xpt_flags);
> defer:
> - svc_xprt_received(xprt);
> return 0;
> }
next prev parent reply other threads:[~2010-02-28 21:04 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-02-26 22:33 [PATCH] sunrpc: remove unnecessary svc_xprt_put Neil Brown
[not found] ` <19336.19524.469529.431210-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2010-02-26 22:44 ` J. Bruce Fields
2010-02-26 22:54 ` J. Bruce Fields
2010-02-27 0:40 ` Tom Tucker
2010-02-27 1:35 ` Neil Brown
[not found] ` <20100227123537.6289e326-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2010-02-27 2:38 ` Tom Tucker
2010-03-01 4:23 ` Neil Brown
[not found] ` <20100301152310.750f3504-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2010-03-01 14:44 ` J. Bruce Fields
2010-02-27 5:59 ` The recent kref_put warning (was: [PATCH] sunrpc: remove unnecessary svc_xprt_put) Neil Brown
[not found] ` <20100227165913.53718449-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2010-02-28 0:46 ` The recent kref_put warning Tom Tucker
2010-02-28 21:05 ` J. Bruce Fields [this message]
2010-02-28 22:07 ` The recent kref_put warning (was: [PATCH] sunrpc: remove unnecessary svc_xprt_put) J. Bruce Fields
2010-02-28 23:57 ` Neil Brown
[not found] ` <20100301105734.7fe935b0-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2010-03-01 3:46 ` J. Bruce Fields
2010-03-01 3:48 ` J. Bruce Fields
2010-03-01 5:51 ` Neil Brown
[not found] ` <20100301165114.74d2797b-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2010-03-01 14:50 ` J. Bruce Fields
2010-03-01 23:19 ` J. Bruce Fields
2010-03-01 23:20 ` J. Bruce Fields
2010-04-28 21:43 ` J. Bruce Fields
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100228210553.GA27965@fieldses.org \
--to=bfields@fieldses.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.de \
--cc=tom@opengridcomputing.com \
--cc=yjwei@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.