Date: Mon, 10 Nov 2025 12:57:41 -0500
From: Mike Snitzer
To: Chuck Lever
Cc: NeilBrown, Christoph Hellwig, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs@vger.kernel.org, axboe@kernel.dk
Subject: Re: [PATCH v11 2/3] NFSD: Implement NFSD_IO_DIRECT for NFS WRITE
In-Reply-To: <2b024928-e078-4414-a062-bbeedfeea5d9@oracle.com>

On Mon, Nov 10, 2025 at 11:41:09AM -0500, Chuck Lever wrote:
> On 11/7/25 9:01 PM, Mike Snitzer wrote:
> > Q: Case 2 uses DONTCACHE, so case 1 should too right?
> >
> > A: NO. There is legit benefit to having case 1 use cached buffered IO
> > when issuing its 2 subpage IOs; and that benefit doesn't cause harm
> > because order-0 page management is not causing the MM problems that
> > NFSD_IO_DIRECT sets out to avoid (whereas higher order cached buffered
> > IO is exactly what both DONTCACHE and NFSD_IO_DIRECT aim to avoid.
> > Otherwise MM spins and spins trying to find adequate free pages,
> > cannot so does dirty writeback and reclaim which causes kswapd and
> > kcompactd to burn cpu, etc).
>
> Paraphrasing: Each unaligned end (case 1) is always smaller than a page,
> thus it will stick with order-0 allocations (if that byte range is not
> already in the page cache), allocations that are, practically speaking,
> reliable.
>
> However it might be a stretch to claim that an order-0 allocation
> /never/ drives memory reclaim.

That's fair.

> It still comes down to "it's faster and generally not harmful... and
> clients have to issue WRITEs that are arbitrarily aligned, so let's help
> them out."
>
> What we still don't know is exactly what the extra cost of setting
> DONTCACHE is, even on small writes. Maybe DONTCACHE should be cleared
> for /all/ segments that are smaller than a page?

I think so. Maybe Jens has thoughts on this (now cc'd)?

> Sidebar: I resist calling such writes poorly formed or misaligned, as
> there seems to be a little (unintended) moralism in those terms. Clients
> must write only what data has changed. Aligning the payload byte ranges
> (using RMW) is incredibly inefficient. So those writes are just as
> correct and valid as writes that are optimally aligned.
>
> When I hear "poorly formed" write, I think of an NFS WRITE that has
> invalid arguments or corrupted XDR.

Very useful sidebar point. NFSD is just trying to accommodate IO that it
has no choice but to service from the NFS clients.

> > Let's please not get hung up of intent of O_DIRECT because
> > NFSD_IO_DIRECT achieves that intent very well
>
> I think we need to have a clear idea what that intent is, because it
> is explicitly referenced in a code comment as the rationale for setting
> DONTCACHE.

Sure, there is a duality: DONTCACHE is useful for large buffered IO but
detrimental for subpage IO. That duality just means DONTCACHE isn't a
silver bullet for when we need to fall back to buffered IO (but our
overall intent is to avoid page cache contention).

> It might be better to replace that comment with a reason for
> using DONTCACHE that does not mention "the intent of using DIRECT".
> Something like:
>
>  * Mark the I/O buffer as evict-able to reduce memory contention.

Sure, that'd work.

I'm really reassured by how well you understand all this, Chuck.

Restating for the benefit of others (Jens in particular):

My point was that even in the case where an IO is split into 3
segments (1: subpage head, 2: DIO-aligned middle, 3: subpage tail), the
intent of O_DIRECT (avoiding cached buffered IO's page cache use) is
met as best we can (ideally the IO is aligned, so there are no subpage
end segments).

The other extreme this NFSD_IO_DIRECT mode must handle is where the
entire IO has no segment that is DIO-aligned, so it must be issued
with buffered IO. We want to do that without opening ourselves up to
memory contention, and DONTCACHE gives us the best option for those
IOs.

Mike
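
For anyone following along, the two cases discussed above could be sketched roughly as below. This is a standalone userspace illustration, not the NFSD code: split_write(), struct seg, enum seg_mode and the single align parameter are hypothetical names made up for this sketch. It assumes the DIO alignment is a power of two equal to the page size (so head and tail are always subpage) and that off + len does not overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; illustration only, not NFSD code. */
enum seg_mode {
	SEG_BUFFERED_CACHED,    /* subpage head/tail beside a DIO middle */
	SEG_BUFFERED_DONTCACHE, /* whole IO when nothing is DIO-aligned */
	SEG_DIRECT,             /* DIO-aligned middle */
};

struct seg {
	uint64_t off;
	uint64_t len;
	enum seg_mode mode;
};

/*
 * Split [off, off + len) into at most three segments and return how
 * many were written to out[]. Assumes align is a power of two equal
 * to the page size and that off + len does not overflow.
 */
static size_t split_write(uint64_t off, uint64_t len, uint64_t align,
			  struct seg out[3])
{
	uint64_t end, mid_start, mid_end;
	size_t n = 0;

	assert(align != 0 && (align & (align - 1)) == 0);

	end = off + len;
	mid_start = (off + align - 1) & ~(align - 1); /* round off up */
	mid_end = end & ~(align - 1);                 /* round end down */

	/* Case 2: no aligned middle, so one buffered DONTCACHE segment. */
	if (mid_start >= mid_end) {
		out[n++] = (struct seg){ off, len, SEG_BUFFERED_DONTCACHE };
		return n;
	}

	/* Case 1: optional subpage head, aligned middle, optional tail. */
	if (off < mid_start)
		out[n++] = (struct seg){ off, mid_start - off,
					 SEG_BUFFERED_CACHED };
	out[n++] = (struct seg){ mid_start, mid_end - mid_start, SEG_DIRECT };
	if (mid_end < end)
		out[n++] = (struct seg){ mid_end, end - mid_end,
					 SEG_BUFFERED_CACHED };
	return n;
}
```

With align = 4096, a write at offset 100 of length 10000 yields a cached buffered head, a direct middle and a cached buffered tail, while a 200-byte write at offset 100 stays a single DONTCACHE buffered segment. Whether DONTCACHE should also be cleared for that small single-segment case is the open question raised earlier in the thread; the sketch does not attempt to answer it.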