All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Luís Henriques" <lhenriques@suse.de>
To: xiubli@redhat.com
Cc: jlayton@kernel.org, idryomov@gmail.com, vshankar@redhat.com,
	gfarnum@redhat.com, ceph-devel@vger.kernel.org
Subject: Re: [PATCH v3] ceph: stop forwarding the request when exceeding 256 times
Date: Wed, 30 Mar 2022 11:59:31 +0100	[thread overview]
Message-ID: <874k3fr9ik.fsf@brahms.olymp> (raw)
In-Reply-To: <20220330012521.170962-1-xiubli@redhat.com> (xiubli@redhat.com's message of "Wed, 30 Mar 2022 09:25:21 +0800")

xiubli@redhat.com writes:

> From: Xiubo Li <xiubli@redhat.com>
>
> The type of 'num_fwd' in ceph 'MClientRequestForward' is 'int32_t',
> while in 'ceph_mds_request_head' the type is '__u8'. So in case
> the request bounces between MDSes exceeding 256 times, the client
> will get stuck.
>
> In this case it's ususally a bug in MDS and continue bouncing the
> request makes no sense.
>
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>
> V3:
> - avoid usig the hardcode of 256
>
> V2:
> - s/EIO/EMULTIHOP/
> - Fixed dereferencing NULL seq bug
> - Removed the out lable
>
>
>  fs/ceph/mds_client.c | 39 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 34 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index a89ee866ebbb..e11d31401f12 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -3293,6 +3293,7 @@ static void handle_forward(struct ceph_mds_client *mdsc,
>  	int err = -EINVAL;
>  	void *p = msg->front.iov_base;
>  	void *end = p + msg->front.iov_len;
> +	bool aborted = false;
>  
>  	ceph_decode_need(&p, end, 2*sizeof(u32), bad);
>  	next_mds = ceph_decode_32(&p);
> @@ -3301,16 +3302,41 @@ static void handle_forward(struct ceph_mds_client *mdsc,
>  	mutex_lock(&mdsc->mutex);
>  	req = lookup_get_request(mdsc, tid);
>  	if (!req) {
> +		mutex_unlock(&mdsc->mutex);
>  		dout("forward tid %llu to mds%d - req dne\n", tid, next_mds);
> -		goto out;  /* dup reply? */
> +		return;  /* dup reply? */
>  	}
>  
>  	if (test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags)) {
>  		dout("forward tid %llu aborted, unregistering\n", tid);
>  		__unregister_request(mdsc, req);
>  	} else if (fwd_seq <= req->r_num_fwd) {
> -		dout("forward tid %llu to mds%d - old seq %d <= %d\n",
> -		     tid, next_mds, req->r_num_fwd, fwd_seq);
> +		/*
> +		 * The type of 'num_fwd' in ceph 'MClientRequestForward'
> +		 * is 'int32_t', while in 'ceph_mds_request_head' the
> +		 * type is '__u8'. So in case the request bounces between
> +		 * MDSes exceeding 256 times, the client will get stuck.
> +		 *
> +		 * In this case it's ususally a bug in MDS and continue
> +		 * bouncing the request makes no sense.
> +		 *
> +		 * In future this could be fixed in ceph code, so avoid
> +		 * using the hardcode here.
> +		 */
> +		int max = sizeof_field(struct ceph_mds_request_head, num_fwd);
> +		max = 1 << (max * BITS_PER_BYTE);
> +		if (req->r_num_fwd >= max) {
> +			mutex_lock(&req->r_fill_mutex);
> +			req->r_err = -EMULTIHOP;
> +			set_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags);
> +			mutex_unlock(&req->r_fill_mutex);
> +			aborted = true;
> +			pr_warn_ratelimited("forward tid %llu seq overflow\n",
> +					    tid);
> +		} else {
> +			dout("forward tid %llu to mds%d - old seq %d <= %d\n",
> +			     tid, next_mds, req->r_num_fwd, fwd_seq);
> +		}
>  	} else {
>  		/* resend. forward race not possible; mds would drop */
>  		dout("forward tid %llu to mds%d (we resend)\n", tid, next_mds);
> @@ -3322,9 +3348,12 @@ static void handle_forward(struct ceph_mds_client *mdsc,
>  		put_request_session(req);
>  		__do_request(mdsc, req);
>  	}
> -	ceph_mdsc_put_request(req);
> -out:
>  	mutex_unlock(&mdsc->mutex);
> +
> +	/* kick calling process */
> +	if (aborted)
> +		complete_request(mdsc, req);
> +	ceph_mdsc_put_request(req);
>  	return;
>  
>  bad:
> -- 
>
> 2.27.0
>

Yeah, looks good to me.  Thanks.

Reviewed-by: Luís Henriques <lhenriques@suse.de>

Cheers,
-- 
Luís

      parent reply	other threads:[~2022-03-30 10:59 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-30  1:25 [PATCH v3] ceph: stop forwarding the request when exceeding 256 times xiubli
2022-03-30 10:35 ` Jeff Layton
2022-03-30 10:59 ` Luís Henriques [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874k3fr9ik.fsf@brahms.olymp \
    --to=lhenriques@suse.de \
    --cc=ceph-devel@vger.kernel.org \
    --cc=gfarnum@redhat.com \
    --cc=idryomov@gmail.com \
    --cc=jlayton@kernel.org \
    --cc=vshankar@redhat.com \
    --cc=xiubli@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.