Date: Sun, 5 Apr 2026 22:49:24 -0700
From: Christoph Hellwig
To: Jeff Layton
Cc: Christoph Hellwig, Alexander Viro, Christian Brauner, Jan Kara,
    "Matthew Wilcox (Oracle)", Andrew Morton, David Hildenbrand,
    Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka, Mike Rapoport,
    Suren Baghdasaryan, Michal Hocko, Mike Snitzer, Chuck Lever,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-nfs@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 2/4] mm: add atomic flush guard for IOCB_DONTCACHE writeback
References: <20260401-dontcache-v1-0-1f5746fab47a@kernel.org>
 <20260401-dontcache-v1-2-1f5746fab47a@kernel.org>
 <629f21c6591903512eb2f3f3c4d6b14a9ac7b91a.camel@kernel.org>
In-Reply-To: <629f21c6591903512eb2f3f3c4d6b14a9ac7b91a.camel@kernel.org>

On Thu, Apr 02, 2026 at 08:49:45AM -0400, Jeff Layton wrote:
> > Have you considered not doing in-caller writeback for
> > IOCB_DONTCACHE and just leaving it to the writeback daemon?
> >
> > Either by totally disabling the writeback and just leaving the
> > dropbehind bit, or by queuing up wb_writeback_work instances for
> > the ranges, or by just increasing the pressure for the writeback
> > daemon. Note that with all schemes, including the one in this
> > patch, we might eventually run into writeback scalability limits,
> > which will require multiple writeback workers.
> 
> I did test a "dropbehind" mode that just set the dropbehind bit
> without doing the flush at the end of the write. It was better than
> stock dontcache, but the tail latencies were still pretty bad.
> 
> I think having each writer do some writeback submission work makes a
> lot of sense. It helps keep the dirty pages below the dirty
> thresholds and doesn't seem to tax each writing task _too_ much. The
> trick is avoiding lock contention while doing it.

Well, any time you hit a shared resource from multiple threads you
create lock contention. That is why in file system and writeback land
we've moved away from random user processes doing both data and
metadata (e.g. XFS AIL) writeback, as it leads to these scalability
issues. At some point we might run out of steam in a single thread,
although so far that has mostly been because it does stupid things
(e.g. writeback on file systems doing complex allocator work).

> I think what would be ideal would be to have some (lockless)
> mechanism to say "there is enough data touched by the range just
> written to kick off a write that's a suitable size for the backing
> store". Each writer could check that and then kick off writeback for
> an appropriate range.

And that is called the writeback thread. So what we should do there
is to make sure we queue up writeback on it for each dontcache write.
Initially queuing up a wb_writeback_work for each range might be a
first approximation, although we should probably find a way to just
increase a threshold if going down that road.
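[Editorial sketch, not part of the original thread: to make the
"submit asynchronously and leave the I/O to the flusher" idea
concrete, something along these lines could replace the synchronous
flush at the end of a dontcache write. dontcache_kick_range() is a
hypothetical helper; __filemap_fdatawrite_range() with WB_SYNC_NONE
is the existing kernel primitive that starts writeback on a byte
range without waiting for it.]

```c
/*
 * Hypothetical helper: instead of the writer flushing and waiting,
 * kick async writeback for just the range it dirtied.  The folios
 * keep their dropbehind flag, so writeback completion will drop
 * them from the page cache.
 */
static void dontcache_kick_range(struct address_space *mapping,
				 loff_t pos, size_t count)
{
	/* WB_SYNC_NONE: submit I/O but do not wait for it. */
	__filemap_fdatawrite_range(mapping, pos, pos + count - 1,
				   WB_SYNC_NONE);
}
```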
> I think this could even be beneficial in the normal buffered write
> codepath too.

Yes, we've had lots of observations that the current 30s timeout is
actively harmful. Especially on SSDs, but even on HDDs just keeping
the device active might make sense.