From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EBF8710957; Thu, 6 Nov 2025 00:19:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762388356; cv=none; b=T/2/3wE6JQ3A3EdlZPZ3kWofIOIih2jhSmobT2e1BPFUSOiQUpZQW/P6RgRwxRcN+Cn2nKTAZijN4r+1EtktgU5RJLicQig942kh7Ef+Kt2A66xxDirdorBkjs2FKGHNHqGOXDQFn0Rl6rMgJa7JBugPuY2GHLWiiLvAdNnPVeI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762388356; c=relaxed/simple; bh=yuz8cRWpXPaa88jkV7mgefpgHkZ19yoPK0xCS4Ul/VI=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=M7w+Ptp2P9FIfHRFkhBXSQi2pg0pvvpxrOQhiL1tSIfx/oIZ58AONqdZQ18ZI3GQIbFtFC69jaYpdDLRaxk0cCO8azFPNkbG2pcWOxkIaqb2S3zJTcQTWL4EgEdggURPDk+vRtG+yh3XWzunROASUd5tAtXjmw0yF12UNyrS6FE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=R1A5OGiW; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="R1A5OGiW" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 229EBC4CEF5; Thu, 6 Nov 2025 00:19:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1762388355; bh=yuz8cRWpXPaa88jkV7mgefpgHkZ19yoPK0xCS4Ul/VI=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=R1A5OGiWP68niBI+otgdSkB33I3FvHoFZwPtlQ2+S2ILENyq30G7KJcdjvLPFVrmS 9/aan1z5hYSsu4PHxcYDmrM74iZ2ZP7DK0kpKf6MBBqv6EhIOc48qRKj5gL8XpCZFI PHDBZ3dNLnzyAT6IfCfSGgXh2wCJKS+o7NFgaUpcfVWom/c3Ave7gz8SibOfk2EaVz xxKrYZNVH4uUqNwrIIk6LKsvkPA1Dz9u49MnmGxYPNuDxFH4ZqzLAi1D8M6kkcfCWG gd/4Fk4As0CzOl2wlcjffbHt2mVsRyvzf3og4/RfnJhPWirGFPIJz2xvdDOYG3rBKd k3aDV+/PPQyyA== Date: Wed, 5 Nov 2025 16:19:14 -0800 From: "Darrick J. Wong" To: Bernd Schubert Cc: Joanne Koong , miklos@szeredi.hu, neal@gompa.dev, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: Re: [PATCH 1/5] fuse: flush pending fuse events before aborting the connection Message-ID: <20251106001914.GI196358@frogsfrogsfrogs> References: <176169809222.1424347.16562281526870178424.stgit@frogsfrogsfrogs> <176169809274.1424347.4813085698864777783.stgit@frogsfrogsfrogs> <20251103221349.GE196370@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-ext4@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: On Tue, Nov 04, 2025 at 10:47:52PM +0100, Bernd Schubert wrote: > > > On 11/4/25 20:22, Joanne Koong wrote: > > On Mon, Nov 3, 2025 at 2:13 PM Darrick J. Wong wrote: > >> > >> On Mon, Nov 03, 2025 at 09:20:26AM -0800, Joanne Koong wrote: > >>> On Tue, Oct 28, 2025 at 5:43 PM Darrick J. Wong wrote: > >>>> > >>>> From: Darrick J. Wong > >>>> > >>>> generic/488 fails with fuse2fs in the following fashion: > >>>> > >>>> generic/488 _check_generic_filesystem: filesystem on /dev/sdf is inconsistent > >>>> (see /var/tmp/fstests/generic/488.full for details) > >>>> > >>>> This test opens a large number of files, unlinks them (which really just > >>>> renames them to fuse hidden files), closes the program, unmounts the > >>>> filesystem, and runs fsck to check that there aren't any inconsistencies > >>>> in the filesystem. > >>>> > >>>> Unfortunately, the 488.full file shows that there are a lot of hidden > >>>> files left over in the filesystem, with incorrect link counts. Tracing > >>>> fuse_request_* shows that there are a large number of FUSE_RELEASE > >>>> commands that are queued up on behalf of the unlinked files at the time > >>>> that fuse_conn_destroy calls fuse_abort_conn. Had the connection not > >>>> aborted, the fuse server would have responded to the RELEASE commands by > >>>> removing the hidden files; instead they stick around. > >>>> > >>>> For upper-level fuse servers that don't use fuseblk mode this isn't a > >>>> problem because libfuse responds to the connection going down by pruning > >>>> its inode cache and calling the fuse server's ->release for any open > >>>> files before calling the server's ->destroy function. > >>>> > >>>> For fuseblk servers this is a problem, however, because the kernel sends > >>>> FUSE_DESTROY to the fuse server, and the fuse server has to close the > >>>> block device before returning. This means that the kernel must flush > >>>> all pending FUSE_RELEASE requests before issuing FUSE_DESTROY. > >>>> > >>>> Create a function to push all the background requests to the queue and > >>>> then wait for the number of pending events to hit zero, and call this > >>>> before sending FUSE_DESTROY. That way, all the pending events are > >>>> processed by the fuse server and we don't end up with a corrupt > >>>> filesystem. > >>>> > >>>> Note that we use a wait_event_timeout() loop to cause the process to > >>>> schedule at least once per second to avoid a "task blocked" warning: > >>>> > >>>> INFO: task umount:1279 blocked for more than 20 seconds. > >>>> Not tainted 6.17.0-rc7-xfsx #rc7 > >>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this messag. > >>>> task:umount state:D stack:11984 pid:1279 tgid:1279 ppid:10690 > >>>> > >>>> Earlier in the threads about this patch there was a (self-inflicted) > >>>> dispute as to whether it was necessary to call touch_softlockup_watchdog > >>>> in the loop body. Because the process goes to sleep, it's not necessary > >>>> to touch the softlockup watchdog because we're not preventing another > >>>> process from being scheduled on a CPU. > >>>> > >>>> Signed-off-by: "Darrick J. Wong" > >>>> --- > >>>> fs/fuse/fuse_i.h | 5 +++++ > >>>> fs/fuse/dev.c | 35 +++++++++++++++++++++++++++++++++++ > >>>> fs/fuse/inode.c | 11 ++++++++++- > >>>> 3 files changed, 50 insertions(+), 1 deletion(-) > >>>> > >>>> > >>>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h > >>>> index c2f2a48156d6c5..aaa8574fd72775 100644 > >>>> --- a/fs/fuse/fuse_i.h > >>>> +++ b/fs/fuse/fuse_i.h > >>>> @@ -1274,6 +1274,11 @@ void fuse_request_end(struct fuse_req *req); > >>>> void fuse_abort_conn(struct fuse_conn *fc); > >>>> void fuse_wait_aborted(struct fuse_conn *fc); > >>>> > >>>> +/** > >>>> + * Flush all pending requests and wait for them. > >>>> + */ > >>>> +void fuse_flush_requests_and_wait(struct fuse_conn *fc); > >>>> + > >>>> /* Check if any requests timed out */ > >>>> void fuse_check_timeout(struct work_struct *work); > >>>> > >>>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > >>>> index 132f38619d7072..ecc0a5304c59d1 100644 > >>>> --- a/fs/fuse/dev.c > >>>> +++ b/fs/fuse/dev.c > >>>> @@ -24,6 +24,7 @@ > >>>> #include > >>>> #include > >>>> #include > >>>> +#include > >>>> > >>>> #include "fuse_trace.h" > >>>> > >>>> @@ -2430,6 +2431,40 @@ static void end_polls(struct fuse_conn *fc) > >>>> } > >>>> } > >>>> > >>>> +/* > >>>> + * Flush all pending requests and wait for them. Only call this function when > >>>> + * it is no longer possible for other threads to add requests. > >>>> + */ > >>>> +void fuse_flush_requests_and_wait(struct fuse_conn *fc) > >>>> +{ > >>>> + spin_lock(&fc->lock); > >>> > >>> Do we need to grab the fc lock? fc->connected is protected under the > >>> bg_lock, afaict from fuse_abort_conn(). > >> > >> Oh, heh. Yeah, it does indeed take both fc->lock and fc->bg_lock. > >> Will fix that, thanks. :) > >> > >> FWIW I don't think it's a big deal if we see a stale connected==1 value > >> because the events will all get cancelled and the wait loop won't run > >> anyway, but I agree with being consistent about lock ordering. :) > >> > >>>> + if (!fc->connected) { > >>>> + spin_unlock(&fc->lock); > >>>> + return; > >>>> + } > >>>> + > >>>> + /* Push all the background requests to the queue. */ > >>>> + spin_lock(&fc->bg_lock); > >>>> + fc->blocked = 0; > >>>> + fc->max_background = UINT_MAX; > >>>> + flush_bg_queue(fc); > >>>> + spin_unlock(&fc->bg_lock); > >>>> + spin_unlock(&fc->lock); > >>>> + > >>>> + /* > >>>> + * Wait for all pending fuse requests to complete or abort. The fuse > >>>> + * server could take a significant amount of time to complete a > >>>> + * request, so run this in a loop with a short timeout so that we don't > >>>> + * trip the soft lockup detector. > >>>> + */ > >>>> + smp_mb(); > >>>> + while (wait_event_timeout(fc->blocked_waitq, > >>>> + !fc->connected || atomic_read(&fc->num_waiting) == 0, > >>>> + HZ) == 0) { > >>>> + /* empty */ > >>>> + } > >>> > >>> I'm wondering if it's necessary to wait here for all the pending > >>> requests to complete or abort? > >> > >> I'm not 100% sure what the fuse client shutdown sequence is supposed to > >> be. If someone kills a program with a large number of open unlinked > >> files and immediately calls umount(), then the fuse client could be in > >> the process of sending FUSE_RELEASE requests to the server. > >> > >> [background info, feel free to speedread this paragraph] > >> For a non-fuseblk server, unmount aborts all pending requests and > >> disconnects the fuse device. This means that the fuse server won't see > >> all the FUSE_REQUESTs before libfuse calls ->destroy having observed the > >> fusedev shutdown. The end result is that (on fuse2fs anyway) you end up > >> with a lot of .fuseXXXXX files that nobody cleans up. > >> > >> If you make ->destroy release all the remaining open files, now you run > >> into a second problem, which is that if there are a lot of open unlinked > >> files, freeing the inodes can collectively take enough time that the > >> FUSE_DESTROY request times out. > >> > >> On a fuseblk server with libfuse running in multithreaded mode, there > >> can be several threads reading fuse requests from the fusedev. The > >> kernel actually sends its own FUSE_DESTROY request, but there's no > >> coordination between the fuse workers, which means that the fuse server > >> can process FUSE_DESTROY at the same time it's processing FUSE_RELEASE. > >> If ->destroy closes the filesystem before the FUSE_RELEASE requests are > >> processed, you end up with the same .fuseXXXXX file cleanup problem. > > > > imo it is the responsibility of the server to coordinate this and make > > sure it has handled all the requests it has received before it starts > > executing the destruction logic. imo the only responsibility of the > > kernel is to actually send the background requests before it sends the > > FUSE_DESTROY. I think non-fuseblk servers should also receive the > > FUSE_DESTROY request. > > Hmm, good idea, I guess we can add that in libfuse, maybe with some kind > of timeout. > > There is something I don't understand though, how can FUSE_DESTROY > happen before FUSE_RELEASE is completed? > > ->release / fuse_release > fuse_release_common > fuse_file_release > fuse_file_put > fuse_simple_background > > > fuse_release_end > iput() > > I.e. how can it release the superblock (which triggers FUSE_DESTROY) The short answer is that fuse_file_put doesn't wait for the backgrounded release request to complete and returns; and that FUSE_DESTROY is sent synchronously and with args->force = true so it jumps the queue. (See my longer reply to Joanne for more details) --D > > Thanks, > Bernd >