Date: Fri, 15 May 2026 11:38:27 -0700
From: Boris Burkov
To: Johannes Thumshirn
Cc: linux-btrfs@vger.kernel.org, Filipe Manana, David Sterba, Hans Holmberg, Damien Le Moal, Naohiro Aota, Christoph Hellwig
Subject: Re: [RFC PATCH 7/7] btrfs: zoned: add RECLAIM_ZONES and RESET_ZONES to first async reclaim loop
Message-ID: <20260515183827.GE1197064@zen.localdomain>
References: <20260513123445.43197-1-johannes.thumshirn@wdc.com> <20260513123445.43197-8-johannes.thumshirn@wdc.com>
In-Reply-To: <20260513123445.43197-8-johannes.thumshirn@wdc.com>

On Wed, May 13, 2026 at 02:34:45PM +0200, Johannes Thumshirn wrote:
> On zoned filesystems, when waiting for space tickets during data
> relocation, the async reclaim flush state machine may starve if
> RECLAIM_ZONES and RESET_ZONES states are not executed early in the flush
> sequence.
>
> Currently do_async_reclaim_data_space() only executes RECLAIM_ZONES and
> RESET_ZONES in later flush states (FLUSH_DELALLOC and beyond), but by
> the time these states are reached, the ticket wait may have already
> deadlocked waiting for space that can only be freed by zone reset.

This explanation is a bit confusing to me. Does your previous fix prevent
all known deadlocks? If not, can you describe the remaining deadlock in
more detail? If having these flush states in the general flush state list
causes a deadlock, we should not leave them there, even if we add this
earlier pass.

I assume the issue is that some other flusher can't make progress when we
are out of zones and also lands on a ticket, so async reclaim is stuck on
a ticket and the only way to make progress is to reset a zone (hopefully)
or reclaim a zone (painfully?).

Maybe we need some high-level flushing logic like "needs zoned help now
please"? I.e., if we are low on or out of free zones, do only zone
flushing; otherwise do regular flushing (including the zoned steps if
necessary/wise?).

>
> Fix this by adding RECLAIM_ZONES and RESET_ZONES to the first async
> reclaim loop (FLUSH_ALLOC) for zoned filesystems, ensuring zone reset
> happens early enough to free space for pending allocation tickets.
>
> Signed-off-by: Johannes Thumshirn
> ---
>
> This patch was AI assisted and I'm not sure this is the correct thing to
> do (the flushing, not the use of AI), hence the RFC tag.
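To make the "needs zoned help now please" idea a bit more concrete, here is a rough sketch of picking the flush sequence up front instead of appending zone steps to one fixed list. It is a toy model, not btrfs code: the enum, the three sequences and the free_zones/low_water inputs are all hypothetical; only the RESET_ZONES and RECLAIM_ZONES names come from the patch. Reset is placed before reclaim on the assumption that resetting already-unused zones is the cheap step and relocation is the expensive one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy flush steps (hypothetical, not the btrfs enum). Only the zone
 * steps mirror states named in the patch.
 */
enum toy_flush {
	TOY_ALLOC_CHUNK,
	TOY_FLUSH_DELALLOC,
	TOY_COMMIT_TRANS,
	TOY_RESET_ZONES,   /* assumed cheap: reset zones that are already unused */
	TOY_RECLAIM_ZONES, /* assumed expensive: relocate data out of used zones */
};

/* "Needs zoned help now": zone flushing only, cheapest step first. */
static const enum toy_flush toy_zoned_emergency[] = {
	TOY_RESET_ZONES, TOY_RECLAIM_ZONES,
};

/* Regular sequence on a zoned fs, zone steps kept at the end. */
static const enum toy_flush toy_zoned_regular[] = {
	TOY_ALLOC_CHUNK, TOY_FLUSH_DELALLOC, TOY_COMMIT_TRANS,
	TOY_RESET_ZONES, TOY_RECLAIM_ZONES,
};

/* Regular sequence on a non-zoned fs. */
static const enum toy_flush toy_regular[] = {
	TOY_ALLOC_CHUNK, TOY_FLUSH_DELALLOC, TOY_COMMIT_TRANS,
};

#define TOY_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Pick a flush sequence and report its length in *nr. free_zones and
 * low_water are hypothetical inputs; a real version would need a cheap
 * way to know how many zones are still free.
 */
static const enum toy_flush *toy_pick_states(bool zoned,
					      unsigned int free_zones,
					      unsigned int low_water,
					      size_t *nr)
{
	assert(nr != NULL);
	if (zoned && free_zones <= low_water) {
		*nr = TOY_ARRAY_SIZE(toy_zoned_emergency);
		return toy_zoned_emergency;
	}
	if (zoned) {
		*nr = TOY_ARRAY_SIZE(toy_zoned_regular);
		return toy_zoned_regular;
	}
	*nr = TOY_ARRAY_SIZE(toy_regular);
	return toy_regular;
}
```

With one free zone and a low-water mark of two, this returns only the two zone steps; with plenty of free zones it falls back to the regular sequence, and a non-zoned fs never sees the zone steps at all.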
>
>  fs/btrfs/space-info.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index ec811a77ebb1..a1235f114f3e 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -1451,6 +1451,17 @@ static void do_async_reclaim_data_space(struct btrfs_space_info *space_info)
>
>  	while (!space_info->full) {
>  		flush_space(space_info, U64_MAX, ALLOC_CHUNK_FORCE, false);
> +		/*
> +		 * For zoned filesystems, also run RECLAIM_ZONES and RESET_ZONES
> +		 * in the first loop to avoid starvation. Zoned filesystems have
> +		 * sequential write requirements, so space cannot be reused until
> +		 * zones are reset. Running these states early ensures zones are
> +		 * reclaimed and reset before we get into a starvation situation.
> +		 */
> +		if (btrfs_is_zoned(fs_info)) {
> +			flush_space(space_info, U64_MAX, RECLAIM_ZONES, false);
> +			flush_space(space_info, U64_MAX, RESET_ZONES, false);
> +		}

Just to establish common ground on the existing algorithm: the current
logic is to allocate a chunk (which may satisfy tickets) and then check
whether any tickets are left. If no tickets are left, great, we're done.
Otherwise, allocate another chunk (until the space_info is full). Finally,
if we can't allocate a chunk, go into the various flushers.

The way you have changed it, you tack the two zoned-specific flushes on
right after allocating a chunk, regardless of whether tickets are still
present. That feels off to me. I don't know enough about zoned to judge
accurately how you want to order it, but I think the question to answer
for yourself is: if there are totally free block groups (bgs) on a zoned
fs, do I want to run reclaim/reset zones before or after allocating them?
Given the fixed number of zones, I would assume that reset zones, at
least, should come before grabbing a fresh bg? (Unless that fails in a
free-zone-aware way?)
OTOH, if you put zoned reclaim before chunk allocation, we may block data
allocations on pretty expensive reclaim work when we could make progress
right now by allocating a chunk.

Long term, I am planning to refactor space flushing to make the separate
work less sequential and more driven by the demand for each particular
type of flushing, but that is much longer term than your immediate need.
I mention it only to make the pain of the "ordering" aspect clearer in a
broader context; it's not zoned-specific. (For example, it's bad to keep
running delalloc first if we have a bunch of ordered extents outstanding
and should instead run delayed refs or commit a transaction to unpin.)

>  		spin_lock(&space_info->lock);
>  		if (list_empty(&space_info->tickets)) {
>  			space_info->flush = false;
> --
> 2.54.0
>