Date: Wed, 28 Feb 2024 17:33:54 -0600
From: "Theodore Ts'o"
To: Matthew Wilcox
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm
Subject: Re: [LSF/MM/BPF TOPIC] untorn buffered writes
Message-ID: <20240228233354.GC177082@mit.edu>
References: <20240228061257.GA106651@mit.edu>

On Wed, Feb 28, 2024 at 02:11:06PM +0000, Matthew Wilcox wrote:
> I'm not entirely sure that it does become a mess.  If our implementation
> of this ensures that each write ends up in a single folio (even if the
> entire folio is larger than the write), then we will have satisfied the
> semantics of the flag.

What if we do a 32k write which spans two folios?  And what if the
physical pages for those 32k in the buffer cache are not contiguous?
Are you going to have to join the two 16k folios together, or maybe
two 8k folios and a 16k folio, and relocate pages to make a
contiguous 32k folio when we do a buffered RWF_ATOMIC write of size
32k?

Folios have to consist of physically contiguous pages, right?  But we
can send a single 32k write request using scatter-gather even if the
pages are not physically contiguous.  So it seems to me that trying
to overload the folio size to represent the "atomic write guarantee"
of RWF_ATOMIC is unwise.

(And yes, the database might not need it to be a 32k untorn write,
but what if it sends a 32k write, for example because it's writing a
set of pages to the database journal file?  The RWF_ATOMIC interface
doesn't *know* what is really required; the only thing it knows is
the overly strong guarantees that we set in the definition of that
interface.  Or are we going to make the RWF_ATOMIC interface fail all
writes that aren't exactly 16k?  That seems... baroque.)

> I think we'd be better off treating RWF_ATOMIC like it's a bs>PS device.
> That takes two somewhat special cases and makes them use the same code
> paths, which probably means fewer bugs as both camps will be testing
> the same code.

But for a bs > PS device, where the logical block size is greater
than the page size, you don't need the RWF_ATOMIC flag at all.  All
direct I/O writes *must* be a multiple of the logical sector size,
and buffered writes, if they are smaller than the block size, *must*
be handled as a read-modify-write, since you can't send writes to the
device smaller than the logical sector size.

This is why I claim that LBS devices and untorn writes are largely
orthogonal; for LBS devices no special API is needed at all, and
certainly not the highly problematic RWF_ATOMIC API that has been
proposed.
(Well, not problematic for Direct I/O, which is what we had
originally focused upon, but highly problematic for buffered I/O.)

					- Ted
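
For readers following the arithmetic in this message, here is a small
illustrative sketch of the two cases discussed above.  It is not kernel
code and not part of any proposed patch: the function names
folios_touched() and rmw_range() are hypothetical, and the model assumes
fixed-size, naturally aligned folios, which is a simplification (the
real page cache can mix folio sizes).

```python
# Illustrative model only; names and the fixed-folio-size assumption
# are hypothetical and do not correspond to any kernel interface.

def folios_touched(offset, length, folio_size):
    """Count the naturally aligned, fixed-size folios a write covers."""
    if length <= 0:
        return 0
    first = offset // folio_size
    last = (offset + length - 1) // folio_size
    return last - first + 1


def rmw_range(offset, length, block_size):
    """Return the block-aligned (start, end) byte range that has to be
    read, modified and written back when a buffered write is smaller
    than (or misaligned with) the device's logical block size."""
    start = (offset // block_size) * block_size
    # Round the end of the write up to the next block boundary.
    end = -(-(offset + length) // block_size) * block_size
    return start, end
```

Under these assumptions, folios_touched(0, 32768, 16384) gives 2, which
is the "32k write spanning two 16k folios" case, and
rmw_range(4096, 4096, 16384) gives (0, 16384): on a device with a 16k
logical block size, a 4k buffered write expands to a full 16k block
without any special flag being involved.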