From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from fout-b8-smtp.messagingengine.com (fout-b8-smtp.messagingengine.com [202.12.124.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B21D92BDC29; Tue, 4 Nov 2025 21:47:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.12.124.151 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762292879; cv=none; b=r38SXU47SJ3XfzhxgjO6QokZDimRZukxp4jpk/TmQq1YKI6ixsn/wloF9z8fm8ILrsPCxGByHjc+EB2HP6c3LMUHmaNOB9/mOJ98PK7TO/JK9TvOERKgUhwvFuo/MhwTrjkBuHu0IIzaQGK0yN3IYLe5ER6cBVM7ZZBp6XBaI5k= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762292879; c=relaxed/simple; bh=CBRzfv5gGBC5Uc9eN6jFEVJRkhOC4OpydzBhjBIX7Js=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=BQGNlGJVnc8SXZBGhodhusbmBX4nepw1jpxxN2o9QIYejp7cYSL0aZ13fZkyDyB4kNc8koLTJiBhN8GfENrod3+eb48vRYTzOTlcrsX1+c8L4ApBZiS61DhoW3QzYnCKqqP7Rfs4L5d3s3u2AsE3Ba3nBW01hcqxvFsAtaZYPrM= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=bsbernd.com; spf=pass smtp.mailfrom=bsbernd.com; dkim=pass (2048-bit key) header.d=bsbernd.com header.i=@bsbernd.com header.b=NiAD7qnd; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=ROpU75sE; arc=none smtp.client-ip=202.12.124.151 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=bsbernd.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bsbernd.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bsbernd.com header.i=@bsbernd.com header.b="NiAD7qnd"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="ROpU75sE" Received: from phl-compute-09.internal (phl-compute-09.internal [10.202.2.49]) by mailfout.stl.internal (Postfix) with ESMTP id C6D5E1D0010A; Tue, 4 Nov 2025 16:47:55 -0500 (EST) Received: from phl-mailfrontend-02 ([10.202.2.163]) by phl-compute-09.internal (MEProxy); Tue, 04 Nov 2025 16:47:55 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsbernd.com; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm1; t=1762292875; x=1762379275; bh=UZMkskdZGEy/ZOhbY3vVwwld6Sejbwuvlzt3XY//SHU=; b= NiAD7qndnvk0POVlL5bhlXqiRi9EhQ2cWOCtR6Jhq8hY9qnZxEm2qDwSAzQUQYQL o5tjJwYBmYpUAPssZP9tRIomoHzNwK1qifRTxX4lWKRCvqFiKlZ34pdfRQhywymI TYO0ikCv6MVo8eEZc1+h2yIZ/M/Pg2JgMpaYpzPPNwDcBkldU9nh9/cO4F1Mu87P GJE9lFpDpid9CG8/lfQQ5gesDOmNMO3A8dkcOOt8YfWTYt28py/y9mG2YPXrU5nP UBLBRgBXmZeWl8f+iiKJ2HcoXEBad03U+sGvxk2GSx0DnQ4uLQzr9njVuoetUL+F IucJbsXdCeoLwmuvVoXI3w== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1762292875; x= 1762379275; bh=UZMkskdZGEy/ZOhbY3vVwwld6Sejbwuvlzt3XY//SHU=; b=R OpU75sET0evUjVM3eqSCE+JlF7u//GfZlrweHiidZG86aiUKkC6Hi9pV4hyKHEUp 4poSlmC6DvVZslBQPe0RSYEJs5EAm/0AnSzSfpdRbDiIg+fREyNK1jZq9knJYPK0 SHHBLIIs7G1Fe/oasStNfSpVhTY+ptz8dhkvMgYEXQ/tCN6l5uHBdCpM8h3pRwi6 8Yxb6LtlVzkLLw+FlhVtgx5AnBELwP+pexM9Dcp9siS/M3kjzfjXkhB3YoS77Kct gs4V06j1Eifapp2zHHXl7zxqWoW36KFpFIvd6akDjF7sZDZ5ce6S0esTVUDyat5g /ov5LPWoduYTAx90+5GXA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeeffedrtdeggddukedvudegucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhepkfffgggfuffvvehfhfgjtgfgsehtkeertddtvdejnecuhfhrohhmpeeuvghrnhgu ucfutghhuhgsvghrthcuoegsvghrnhgusegsshgsvghrnhgurdgtohhmqeenucggtffrrg htthgvrhhnpeefgeegfeffkeduudelfeehleelhefgffehudejvdfgteevvddtfeeiheef lefgvdenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpe gsvghrnhgusegsshgsvghrnhgurdgtohhmpdhnsggprhgtphhtthhopeeipdhmohguvgep shhmthhpohhuthdprhgtphhtthhopehjohgrnhhnvghlkhhoohhnghesghhmrghilhdrtg homhdprhgtphhtthhopegujhifohhngheskhgvrhhnvghlrdhorhhgpdhrtghpthhtohep mhhikhhlohhssehsiigvrhgvughirdhhuhdprhgtphhtthhopehnvggrlhesghhomhhprg druggvvhdprhgtphhtthhopehlihhnuhigqdgvgihtgeesvhhgvghrrdhkvghrnhgvlhdr ohhrghdprhgtphhtthhopehlihhnuhigqdhfshguvghvvghlsehvghgvrhdrkhgvrhhnvg hlrdhorhhg X-ME-Proxy: Feedback-ID: i5c2e48a5:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 4 Nov 2025 16:47:53 -0500 (EST) Message-ID: Date: Tue, 4 Nov 2025 22:47:52 +0100 Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 1/5] fuse: flush pending fuse events before aborting the connection To: Joanne Koong , "Darrick J. Wong" Cc: miklos@szeredi.hu, neal@gompa.dev, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org References: <176169809222.1424347.16562281526870178424.stgit@frogsfrogsfrogs> <176169809274.1424347.4813085698864777783.stgit@frogsfrogsfrogs> <20251103221349.GE196370@frogsfrogsfrogs> From: Bernd Schubert Content-Language: en-US, de-DE, fr In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit On 11/4/25 20:22, Joanne Koong wrote: > On Mon, Nov 3, 2025 at 2:13 PM Darrick J. Wong wrote: >> >> On Mon, Nov 03, 2025 at 09:20:26AM -0800, Joanne Koong wrote: >>> On Tue, Oct 28, 2025 at 5:43 PM Darrick J. Wong wrote: >>>> >>>> From: Darrick J. Wong >>>> >>>> generic/488 fails with fuse2fs in the following fashion: >>>> >>>> generic/488 _check_generic_filesystem: filesystem on /dev/sdf is inconsistent >>>> (see /var/tmp/fstests/generic/488.full for details) >>>> >>>> This test opens a large number of files, unlinks them (which really just >>>> renames them to fuse hidden files), closes the program, unmounts the >>>> filesystem, and runs fsck to check that there aren't any inconsistencies >>>> in the filesystem. >>>> >>>> Unfortunately, the 488.full file shows that there are a lot of hidden >>>> files left over in the filesystem, with incorrect link counts. Tracing >>>> fuse_request_* shows that there are a large number of FUSE_RELEASE >>>> commands that are queued up on behalf of the unlinked files at the time >>>> that fuse_conn_destroy calls fuse_abort_conn. Had the connection not >>>> aborted, the fuse server would have responded to the RELEASE commands by >>>> removing the hidden files; instead they stick around. >>>> >>>> For upper-level fuse servers that don't use fuseblk mode this isn't a >>>> problem because libfuse responds to the connection going down by pruning >>>> its inode cache and calling the fuse server's ->release for any open >>>> files before calling the server's ->destroy function. >>>> >>>> For fuseblk servers this is a problem, however, because the kernel sends >>>> FUSE_DESTROY to the fuse server, and the fuse server has to close the >>>> block device before returning. This means that the kernel must flush >>>> all pending FUSE_RELEASE requests before issuing FUSE_DESTROY. >>>> >>>> Create a function to push all the background requests to the queue and >>>> then wait for the number of pending events to hit zero, and call this >>>> before sending FUSE_DESTROY. That way, all the pending events are >>>> processed by the fuse server and we don't end up with a corrupt >>>> filesystem. >>>> >>>> Note that we use a wait_event_timeout() loop to cause the process to >>>> schedule at least once per second to avoid a "task blocked" warning: >>>> >>>> INFO: task umount:1279 blocked for more than 20 seconds. >>>> Not tainted 6.17.0-rc7-xfsx #rc7 >>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this messag. >>>> task:umount state:D stack:11984 pid:1279 tgid:1279 ppid:10690 >>>> >>>> Earlier in the threads about this patch there was a (self-inflicted) >>>> dispute as to whether it was necessary to call touch_softlockup_watchdog >>>> in the loop body. Because the process goes to sleep, it's not necessary >>>> to touch the softlockup watchdog because we're not preventing another >>>> process from being scheduled on a CPU. >>>> >>>> Signed-off-by: "Darrick J. Wong" >>>> --- >>>> fs/fuse/fuse_i.h | 5 +++++ >>>> fs/fuse/dev.c | 35 +++++++++++++++++++++++++++++++++++ >>>> fs/fuse/inode.c | 11 ++++++++++- >>>> 3 files changed, 50 insertions(+), 1 deletion(-) >>>> >>>> >>>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h >>>> index c2f2a48156d6c5..aaa8574fd72775 100644 >>>> --- a/fs/fuse/fuse_i.h >>>> +++ b/fs/fuse/fuse_i.h >>>> @@ -1274,6 +1274,11 @@ void fuse_request_end(struct fuse_req *req); >>>> void fuse_abort_conn(struct fuse_conn *fc); >>>> void fuse_wait_aborted(struct fuse_conn *fc); >>>> >>>> +/** >>>> + * Flush all pending requests and wait for them. >>>> + */ >>>> +void fuse_flush_requests_and_wait(struct fuse_conn *fc); >>>> + >>>> /* Check if any requests timed out */ >>>> void fuse_check_timeout(struct work_struct *work); >>>> >>>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c >>>> index 132f38619d7072..ecc0a5304c59d1 100644 >>>> --- a/fs/fuse/dev.c >>>> +++ b/fs/fuse/dev.c >>>> @@ -24,6 +24,7 @@ >>>> #include >>>> #include >>>> #include >>>> +#include >>>> >>>> #include "fuse_trace.h" >>>> >>>> @@ -2430,6 +2431,40 @@ static void end_polls(struct fuse_conn *fc) >>>> } >>>> } >>>> >>>> +/* >>>> + * Flush all pending requests and wait for them. Only call this function when >>>> + * it is no longer possible for other threads to add requests. >>>> + */ >>>> +void fuse_flush_requests_and_wait(struct fuse_conn *fc) >>>> +{ >>>> + spin_lock(&fc->lock); >>> >>> Do we need to grab the fc lock? fc->connected is protected under the >>> bg_lock, afaict from fuse_abort_conn(). >> >> Oh, heh. Yeah, it does indeed take both fc->lock and fc->bg_lock. >> Will fix that, thanks. :) >> >> FWIW I don't think it's a big deal if we see a stale connected==1 value >> because the events will all get cancelled and the wait loop won't run >> anyway, but I agree with being consistent about lock ordering. :) >> >>>> + if (!fc->connected) { >>>> + spin_unlock(&fc->lock); >>>> + return; >>>> + } >>>> + >>>> + /* Push all the background requests to the queue. */ >>>> + spin_lock(&fc->bg_lock); >>>> + fc->blocked = 0; >>>> + fc->max_background = UINT_MAX; >>>> + flush_bg_queue(fc); >>>> + spin_unlock(&fc->bg_lock); >>>> + spin_unlock(&fc->lock); >>>> + >>>> + /* >>>> + * Wait for all pending fuse requests to complete or abort. The fuse >>>> + * server could take a significant amount of time to complete a >>>> + * request, so run this in a loop with a short timeout so that we don't >>>> + * trip the soft lockup detector. >>>> + */ >>>> + smp_mb(); >>>> + while (wait_event_timeout(fc->blocked_waitq, >>>> + !fc->connected || atomic_read(&fc->num_waiting) == 0, >>>> + HZ) == 0) { >>>> + /* empty */ >>>> + } >>> >>> I'm wondering if it's necessary to wait here for all the pending >>> requests to complete or abort? >> >> I'm not 100% sure what the fuse client shutdown sequence is supposed to >> be. If someone kills a program with a large number of open unlinked >> files and immediately calls umount(), then the fuse client could be in >> the process of sending FUSE_RELEASE requests to the server. >> >> [background info, feel free to speedread this paragraph] >> For a non-fuseblk server, unmount aborts all pending requests and >> disconnects the fuse device. This means that the fuse server won't see >> all the FUSE_REQUESTs before libfuse calls ->destroy having observed the >> fusedev shutdown. The end result is that (on fuse2fs anyway) you end up >> with a lot of .fuseXXXXX files that nobody cleans up. >> >> If you make ->destroy release all the remaining open files, now you run >> into a second problem, which is that if there are a lot of open unlinked >> files, freeing the inodes can collectively take enough time that the >> FUSE_DESTROY request times out. >> >> On a fuseblk server with libfuse running in multithreaded mode, there >> can be several threads reading fuse requests from the fusedev. The >> kernel actually sends its own FUSE_DESTROY request, but there's no >> coordination between the fuse workers, which means that the fuse server >> can process FUSE_DESTROY at the same time it's processing FUSE_RELEASE. >> If ->destroy closes the filesystem before the FUSE_RELEASE requests are >> processed, you end up with the same .fuseXXXXX file cleanup problem. > > imo it is the responsibility of the server to coordinate this and make > sure it has handled all the requests it has received before it starts > executing the destruction logic. imo the only responsibility of the > kernel is to actually send the background requests before it sends the > FUSE_DESTROY. I think non-fuseblk servers should also receive the > FUSE_DESTROY request. Hmm, good idea, I guess we can add that in libfuse, maybe with some kind of timeout. There is something I don't understand though, how can FUSE_DESTROY happen before FUSE_RELEASE is completed? ->release / fuse_release fuse_release_common fuse_file_release fuse_file_put fuse_simple_background fuse_release_end iput() I.e. how can it release the superblock (which triggers FUSE_DESTROY) Thanks, Bernd