From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-yw1-f174.google.com (mail-yw1-f174.google.com [209.85.128.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5156A12F5A1 for ; Thu, 18 Jul 2024 14:44:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.174 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1721313875; cv=none; b=jPTvMDNYQMcROfvHwJa4K/WZjGufWGOKi8Z6RPTjMk7979D9KSRL6fl9oQ8WVJczEOCtMsvqCq11biWVgO1HVIPZDq6TIEYDdNglriMIn/cy8e/jessmLu0AVimB0TPpvOgwFJBbhxDnkKXDKg4LlLSlD2ieyhdrKuxJny1oOUU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1721313875; c=relaxed/simple; bh=DGvXyYV3ua+SqArf5bR3exBTqFl4attv9HXBDfclRQk=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=PUjsKdKMAMUrr+ynOnLR/A//CejujIi3iNOFKNyexcdkRjNxJawvcu8NermWZuwbEzhttFxlZBJgyIGTPAtNNogclOP6hYf7m6lY3Mt5rEqXxa9jdboRO5oZcHDVBFGpmSTd0ummpQgCGrAYOZgUngMOFE7nCqpiB1OlRE+iNUo= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=toxicpanda.com; spf=none smtp.mailfrom=toxicpanda.com; dkim=pass (2048-bit key) header.d=toxicpanda-com.20230601.gappssmtp.com header.i=@toxicpanda-com.20230601.gappssmtp.com header.b=ZhOVqh3k; arc=none smtp.client-ip=209.85.128.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=toxicpanda.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=toxicpanda.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=toxicpanda-com.20230601.gappssmtp.com header.i=@toxicpanda-com.20230601.gappssmtp.com header.b="ZhOVqh3k" Received: by mail-yw1-f174.google.com with SMTP id 00721157ae682-65f8626780aso8199127b3.3 for ; Thu, 18 Jul 2024 07:44:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=toxicpanda-com.20230601.gappssmtp.com; s=20230601; t=1721313872; x=1721918672; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=MKhFNPjJUpLjnLuRbSgJY4qyi1z8LEzoBAkugdLbcKw=; b=ZhOVqh3klttsfvnC2opSOXBxkzN489dwvsjo0o2pKbqZthcjeGWsRaWRSqp7kCppOZ jRiJEoksfDSNsd3mEQWCPhPYYV/hABv8whw0wc2KKNqb2QMbN45E+iY4vxrP0XuMnKR8 5q6jLcRmdkRENRvzzazLbevjZOJhVo11H3nS8vaMz+upZNryc+IfB+dpAWUi9LRsvFtC SqKTFq8OOGvyhxbysmN0BIVjS8J5BNtnBnV8L66w2SyL+lbpPcpxsMSS/N3W1Zdzkc+E xNt3Nf4TAYNrn/wk974Kk46d6OfL1+/tE8COImsgDUuojtfq2fvcboIszWjji/AoeNmj fZUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721313872; x=1721918672; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=MKhFNPjJUpLjnLuRbSgJY4qyi1z8LEzoBAkugdLbcKw=; b=OSrYx1MGCgtq9iuKdmRdmu6VDIBvc0g3k49ZyWEv4jlq4d6UxwYdT32f1kYevsaVVi 6lC13woCHrT2BQscNeIrVwcoOr//wDf8s5oUKRRrqbbLwLY+y6yj5XA7ZREZcDFnqxu4 v8+Eyp/QDbGge4yP1zKM7hJbp//wxRYQFOno7eZjL+CIMSWNdn1QjCPlRo+v0D7lppjv zyoBP/duPzNQcbU65rvVu+rkaEIKKIi0apGrdr8TIqsp37fCbejxNhmNRSApUSzz3S5P oy26P9x47tnZ+wLCF0gZ5Pu5YwWR8RvKslTQ44+Lcm8h2sBC99owRKbZ8f6knMW6ej0p C1+Q== X-Forwarded-Encrypted: i=1; AJvYcCUCiHCpAHCmuX9wH/RQSJnoH1fQ6qRVO9yRXFUgNS4Tl0QMTs/bRtdd8+kdxZofpkPXOcmk3iU4UVz05cTR4Mn/tTsG5obQGKeEXi1SqQ== X-Gm-Message-State: AOJu0YyUNUEx+hWX1Cgb4cF4AtMqqwRyJ3DE30DsYAeog/ITsS/tF+X+ r7Wla8GcRLQgoKlOewI5IKAxxjI6x2hcJvCgmfPH+1hb1NfA4gbIUQx+Ze2hpy0= X-Google-Smtp-Source: AGHT+IHKmwQjv2YanE2T9boGh0Yh605UeDRW731VbpfrcsHrn3pq/Ma/XFTr4SW2Njd2fIBK2u7YBw== X-Received: by 2002:a05:690c:3203:b0:646:7b75:5c2c with SMTP id 00721157ae682-66601cce447mr32485437b3.16.1721313872074; Thu, 18 Jul 2024 07:44:32 -0700 (PDT) Received: from localhost (syn-076-182-020-124.res.spectrum.com. [76.182.20.124]) by smtp.gmail.com with ESMTPSA id 00721157ae682-66601ad9f79sm3502717b3.27.2024.07.18.07.44.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Jul 2024 07:44:31 -0700 (PDT) Date: Thu, 18 Jul 2024 10:44:30 -0400 From: Josef Bacik To: Joanne Koong Cc: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org, osandov@osandov.com, kernel-team@meta.com Subject: Re: [PATCH] fuse: add optional kernel-enforced timeout for fuse requests Message-ID: <20240718144430.GA2099026@perftesting> References: <20240717213458.1613347-1-joannelkoong@gmail.com> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240717213458.1613347-1-joannelkoong@gmail.com> On Wed, Jul 17, 2024 at 02:34:58PM -0700, Joanne Koong wrote: > There are situations where fuse servers can become unresponsive or take > too long to reply to a request. Currently there is no upper bound on > how long a request may take, which may be frustrating to users who get > stuck waiting for a request to complete. > > This commit adds a daemon timeout option (in seconds) for fuse requests. > If the timeout elapses before the request is replied to, the request will > fail with -ETIME. > > There are 3 possibilities for a request that times out: > a) The request times out before the request has been sent to userspace > b) The request times out after the request has been sent to userspace > and before it receives a reply from the server > c) The request times out after the request has been sent to userspace > and the server replies while the kernel is timing out the request > > Proper synchronization must be added to ensure that the request is > handled correctly in all of these cases. To this effect, there is a new > FR_PROCESSING bit added to the request flags, which is set atomically by > either the timeout handler (see fuse_request_timeout()) which is invoked > after the request timeout elapses or set by the request reply handler > (see dev_do_write()), whichever gets there first. > > If the reply handler and the timeout handler are executing simultaneously > and the reply handler sets FR_PROCESSING before the timeout handler, then > the request is re-queued onto the waitqueue and the kernel will process the > reply as though the timeout did not elapse. If the timeout handler sets > FR_PROCESSING before the reply handler, then the request will fail with > -ETIME and the request will be cleaned up. > > Proper acquires on the request reference must be added to ensure that the > timeout handler does not drop the last refcount on the request while the > reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is > still accessing the request. (By "forwarder handler", this is the handler > that forwards the request to userspace). > > Currently, this is the lifecycle of the request refcount: > > Request is created: > fuse_simple_request -> allocates request, sets refcount to 1 > __fuse_request_send -> acquires refcount > queues request and waits for reply... > fuse_simple_request -> drops refcount > > Request is freed: > fuse_dev_do_write > fuse_request_end -> drops refcount on request > > The timeout handler drops the refcount on the request so that the > request is properly cleaned up if a reply is never received. Because of > this, both the forwarder handler and the reply handler must acquire a refcount > on the request while it accesses the request, and the refcount must be > acquired while the lock of the list the request is on is held. > > There is a potential race if the request is being forwarded to > userspace while the timeout handler is executing (eg FR_PENDING has > already been cleared but dev_do_read() hasn't finished executing). This > is a problem because this would free the request but the request has not > been removed from the fpq list it's on. To prevent this, dev_do_read() > must check FR_PROCESSING at the end of its logic and remove the request > from the fpq list if the timeout occurred. > > There is also the case where the connection may be aborted or the > device may be released while the timeout handler is running. To protect > against an extra refcount drop on the request, the timeout handler > checks the connected state of the list and lets the abort handler drop the > last reference if the abort is running simultaneously. Similarly, the > timeout handler also needs to check if the req->out.h.error is set to > -ESTALE, which indicates that the device release is cleaning up the > request. In both these cases, the timeout handler will return without > dropping the refcount. > > Please also note that background requests are not applicable for timeouts > since they are asynchronous. > > Signed-off-by: Joanne Koong > --- > fs/fuse/dev.c | 177 ++++++++++++++++++++++++++++++++++++++++++++--- > fs/fuse/fuse_i.h | 12 ++++ > fs/fuse/inode.c | 7 ++ > 3 files changed, 188 insertions(+), 8 deletions(-) > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > index 9eb191b5c4de..7dd7b244951b 100644 > --- a/fs/fuse/dev.c > +++ b/fs/fuse/dev.c > @@ -331,6 +331,69 @@ void fuse_request_end(struct fuse_req *req) > } > EXPORT_SYMBOL_GPL(fuse_request_end); > > +/* fuse_request_end for requests that timeout */ > +static void fuse_request_timeout(struct fuse_req *req) > +{ > + struct fuse_conn *fc = req->fm->fc; > + struct fuse_iqueue *fiq = &fc->iq; > + struct fuse_pqueue *fpq; > + > + spin_lock(&fiq->lock); > + if (!fiq->connected) { > + spin_unlock(&fiq->lock); > + /* > + * Connection is being aborted. The abort will release > + * the refcount on the request > + */ > + req->out.h.error = -ECONNABORTED; > + return; > + } > + if (test_bit(FR_PENDING, &req->flags)) { > + /* Request is not yet in userspace, bail out */ > + list_del(&req->list); > + spin_unlock(&fiq->lock); > + req->out.h.error = -ETIME; > + __fuse_put_request(req); Why is this safe? We could be the last holder of the reference on this request correct? The only places using __fuse_put_request() would be where we are in a path where the caller already holds a reference on the request. Since this is async it may not be the case right? If it is safe then it's just confusing and warrants a comment. > + return; > + } > + if (test_bit(FR_INTERRUPTED, &req->flags)) > + list_del_init(&req->intr_entry); > + > + fpq = req->fpq; > + spin_unlock(&fiq->lock); > + > + if (fpq) { > + spin_lock(&fpq->lock); > + if (!fpq->connected && (!test_bit(FR_PRIVATE, &req->flags))) { ^^ You don't need the extra () there. > + spin_unlock(&fpq->lock); > + /* > + * Connection is being aborted. The abort will release > + * the refcount on the request > + */ > + req->out.h.error = -ECONNABORTED; > + return; > + } > + if (req->out.h.error == -ESTALE) { > + /* > + * Device is being released. The fuse_dev_release call > + * will drop the refcount on the request > + */ > + spin_unlock(&fpq->lock); > + return; > + } > + if (!test_bit(FR_PRIVATE, &req->flags)) > + list_del_init(&req->list); > + spin_unlock(&fpq->lock); > + } > + > + req->out.h.error = -ETIME; > + > + if (test_bit(FR_ASYNC, &req->flags)) > + req->args->end(req->fm, req->args, req->out.h.error); > + > + fuse_put_request(req); > +} Just a general styling thing, we have two different states for requests here, PENDING and !PENDING correct? I think it may be better to do something like if (test_bit(FR_PENDING, &req->flags)) timeout_pending_req(); else timeout_inflight_req(); and then in timeout_pending_req() you do spin_lock(&fiq->lock); if (!test_bit(FR_PENDING, &req->flags)) { spin_unlock(&fiq_lock); timeout_inflight_req(); return; } This will keep the two different state cleanup functions separate and a little cleaner to grok. > + > static int queue_interrupt(struct fuse_req *req) > { > struct fuse_iqueue *fiq = &req->fm->fc->iq; > @@ -361,6 +424,62 @@ static int queue_interrupt(struct fuse_req *req) > return 0; > } > > +enum wait_type { > + WAIT_TYPE_INTERRUPTIBLE, > + WAIT_TYPE_KILLABLE, > + WAIT_TYPE_NONINTERRUPTIBLE, > +}; > + > +static int fuse_wait_event_interruptible_timeout(struct fuse_req *req) > +{ > + struct fuse_conn *fc = req->fm->fc; > + > + return wait_event_interruptible_timeout(req->waitq, > + test_bit(FR_FINISHED, > + &req->flags), > + fc->daemon_timeout); > +} > +ALLOW_ERROR_INJECTION(fuse_wait_event_interruptible_timeout, ERRNO); > + > +static int wait_answer_timeout(struct fuse_req *req, enum wait_type type) > +{ > + struct fuse_conn *fc = req->fm->fc; > + int err; > + > +wait_answer_start: > + if (type == WAIT_TYPE_INTERRUPTIBLE) > + err = fuse_wait_event_interruptible_timeout(req); > + else if (type == WAIT_TYPE_KILLABLE) > + err = wait_event_killable_timeout(req->waitq, > + test_bit(FR_FINISHED, &req->flags), > + fc->daemon_timeout); > + > + else if (type == WAIT_TYPE_NONINTERRUPTIBLE) > + err = wait_event_timeout(req->waitq, test_bit(FR_FINISHED, &req->flags), > + fc->daemon_timeout); > + else > + WARN_ON(1); This will leak some random value for err, so initialize err to something that will be dealt with, like -EINVAL; > + > + /* request was answered */ > + if (err > 0) > + return 0; > + > + /* request was not answered in time */ > + if (err == 0) { > + if (test_and_set_bit(FR_PROCESSING, &req->flags)) > + /* request reply is being processed by kernel right now. > + * we should wait for the answer. > + */ Format for multiline comments is /* * blah * blah */ and since this is a 1 line if statement put it above the if statement. > + goto wait_answer_start; > + > + fuse_request_timeout(req); > + return 0; > + } > + > + /* else request was interrupted */ > + return err; > +} > + > static void request_wait_answer(struct fuse_req *req) > { > struct fuse_conn *fc = req->fm->fc; > @@ -369,8 +488,11 @@ static void request_wait_answer(struct fuse_req *req) > > if (!fc->no_interrupt) { > /* Any signal may interrupt this */ > - err = wait_event_interruptible(req->waitq, > - test_bit(FR_FINISHED, &req->flags)); > + if (fc->daemon_timeout) > + err = wait_answer_timeout(req, WAIT_TYPE_INTERRUPTIBLE); > + else > + err = wait_event_interruptible(req->waitq, > + test_bit(FR_FINISHED, &req->flags)); > if (!err) > return; > > @@ -383,8 +505,11 @@ static void request_wait_answer(struct fuse_req *req) > > if (!test_bit(FR_FORCE, &req->flags)) { > /* Only fatal signals may interrupt this */ > - err = wait_event_killable(req->waitq, > - test_bit(FR_FINISHED, &req->flags)); > + if (fc->daemon_timeout) > + err = wait_answer_timeout(req, WAIT_TYPE_KILLABLE); > + else > + err = wait_event_killable(req->waitq, > + test_bit(FR_FINISHED, &req->flags)); > if (!err) > return; > > @@ -404,7 +529,10 @@ static void request_wait_answer(struct fuse_req *req) > * Either request is already in userspace, or it was forced. > * Wait it out. > */ > - wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); > + if (fc->daemon_timeout) > + wait_answer_timeout(req, WAIT_TYPE_NONINTERRUPTIBLE); > + else > + wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); > } > > static void __fuse_request_send(struct fuse_req *req) > @@ -1268,6 +1396,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > req = list_entry(fiq->pending.next, struct fuse_req, list); > clear_bit(FR_PENDING, &req->flags); > list_del_init(&req->list); > + /* Acquire a reference since fuse_request_timeout may also be executing */ > + __fuse_get_request(req); > + req->fpq = fpq; > spin_unlock(&fiq->lock); > There's a race here with completion. If we timeout a request right here, we can end up sending that same request below. You are going to need to check test_bit(FR_PROCESSING) after you take the fpq->lock just below here to make sure you didn't race with the timeout handler and time the request out already. > args = req->args; > @@ -1280,6 +1411,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > if (args->opcode == FUSE_SETXATTR) > req->out.h.error = -E2BIG; > fuse_request_end(req); > + fuse_put_request(req); > goto restart; > } > spin_lock(&fpq->lock); > @@ -1316,13 +1448,23 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > } > hash = fuse_req_hash(req->in.h.unique); > list_move_tail(&req->list, &fpq->processing[hash]); > - __fuse_get_request(req); > set_bit(FR_SENT, &req->flags); > spin_unlock(&fpq->lock); > /* matches barrier in request_wait_answer() */ > smp_mb__after_atomic(); > if (test_bit(FR_INTERRUPTED, &req->flags)) > queue_interrupt(req); > + > + /* Check if request timed out */ > + if (test_bit(FR_PROCESSING, &req->flags)) { > + spin_lock(&fpq->lock); > + if (!test_bit(FR_PRIVATE, &req->flags)) > + list_del_init(&req->list); > + spin_unlock(&fpq->lock); > + fuse_put_request(req); > + return -ETIME; > + } This isn't quite right, we could have FR_PROCESSING set because we completed the request before we got here. If you put a schedule_timeout(HZ); right above this you could easily see where a request gets completed by userspace, but now you've fimed it out. Additionally if we have FR_PROCESSING set from the timeout, shouldn't this cleanup have been done already? I don't understand why we need to handle this here, we should just return and whoever is waiting on the request will get the error. Thanks, Josef