Date: Mon, 4 Nov 2024 16:09:56 -0800
From: "Darrick J. Wong"
To: Dave Chinner
Cc: Ritesh Harjani, Theodore Ts'o, John Garry, linux-ext4@vger.kernel.org,
    Jan Kara, Christoph Hellwig, Ojaswin Mujoo, linux-kernel@vger.kernel.org,
    linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 5/6] iomap: Lift blocksize restriction on atomic writes
Message-ID: <20241105000956.GJ21832@frogsfrogsfrogs>

On Mon, Nov 04, 2024 at 12:52:42PM +1100, Dave Chinner wrote:
> On Thu, Oct 31, 2024 at 02:36:40PM -0700, Darrick J. Wong wrote:
> > On Sat, Oct 26, 2024 at 10:05:44AM +0530, Ritesh Harjani wrote:
> > > > This gets me to the third and much less general solution -- only allow
> > > > untorn writes if we know that the ioend only ever has to run a single
> > > > transaction. That's why untorn writes are limited to a single fsblock
> > > > for now -- it's a simple solution so that we can get our downstream
> > > > customers to kick the tires and start on the next iteration instead of
> > > > spending years on waterfalling.
> > > >
> > > > Did you notice that in all of these cases, the capabilities of the
> > > > filesystem's ioend processing determines the restrictions on the number
> > > > and type of mappings that ->iomap_begin can give to iomap?
> > > >
> > > > Now that we have a second system trying to hook up to the iomap support,
> > > > it's clear to me that the restrictions on mappings are specific to each
> > > > filesystem. Therefore, the iomap directio code should not impose
> > > > restrictions on the mappings it receives unless they would prevent the
> > > > creation of the single aligned bio.
> > > >
> > > > Instead, xfs_direct_write_iomap_begin and ext4_iomap_begin should return
> > > > EINVAL or something if they look at the file mappings and discover that
> > > > they cannot perform the ioend without risking torn mapping updates. In
> > > > the long run, ->iomap_begin is where this iomap->len <= iter->len check
> > > > really belongs, but hold that thought.
> > > >
> > > > For the multi fsblock case, the ->iomap_begin functions would have to
> > > > check that only one metadata update would be necessary in the ioend.
> > > > That's where things get murky, since ext4/xfs drop their mapping locks
> > > > between calls to ->iomap_begin. So you'd have to check all the mappings
> > > > for unsupported mixed state every time. Yuck.
> > > >
> > >
> > > Thanks Darrick for taking time summarizing what all has been done
> > > and your thoughts here.
> > >
> > > > It might be less gross to retain the restriction that iomap accepts only
> > > > one mapping for the entire file range, like Ritesh has here.
> > >
> > > less gross :) sure.
> > >
> > > I would like to think of this as, being less restrictive (compared to
> > > only allowing a single fsblock) by adding a constraint on the atomic
> > > write I/O request i.e.
> > >
> > > "Atomic write I/O request to a region in a file is only allowed if that
> > > region has no partially allocated extents. Otherwise, the file system
> > > can fail the I/O operation by returning -EINVAL."
> > >
> > > Essentially by adding this constraint to the I/O request, we are
> > > helping the user to prevent atomic writes from accidentally getting
> > > torned and also allowing multi-fsblock writes. So I still think that
> > > might be the right thing to do here or at least a better start. FS can
> > > later work on adding such support where we don't even need above
> > > such constraint on a given atomic write I/O request.
> >
> > On today's ext4 call, Ted and Ritesh and I realized that there's a bit
> > more to it than this -- it's not possible to support untorn writes to a
> > mix of written/(cow,unwritten) mappings even if they all point to the
> > same physical space. If the system fails after the storage device
> > commits the write but before any of the ioend processing is scheduled, a
> > subsequent read of the previously written blocks will produce the new
> > data, but reads to the other areas will produce the old contents (or
> > zeroes, or whatever). That's a torn write.
>
> I'm *really* surprised that people are only realising that IO
> completion processing for atomic writes *must be atomic*.

I've been saying that for a while; it's just that I didn't realize until
last week that there were more rules than "can't do an untorn write if
you need to do more than 1 mapping update":

Untorn writes are not possible if:
1. More than 1 mapping update is needed
2. 1 mapping update is needed but the range also contains a written block

(1) can be worked around with a log intent item for ioend processing,
but I don't think (2) can at all. That's why I went back to saying that
untorn writes require that there can only be one mapping.

> This was the foundational constraint that the forced alignment
> proposal for XFS was intended to address. i.e. to prevent fs
> operations from violating atomic write IO constraints (e.g. punching
> sub-atomic write size holes in the file) so that the physical IO can
> be done without tearing and the IO completion processing that
> exposes the new data can be done atomically.
>
> > Therefore, iomap ought to stick to requiring that ->iomap_begin returns
> > a single iomap to cover the entire file range for the untorn write. For
> > an unwritten extent, the post-recovery read will see either zeroes or
> > the new contents; for a single-mapping COW it'll see old or new contents
> > but not both.
>
> I'm pretty sure we enforced that in the XFS mapping implemention for
> atomic writes using forced alignment. i.e. we have to return a
> correctly aligned, contiguous mapping to iomap or we have to return
> -EINVAL to indicate atomic write mapping failed.

I think you guys did, it's just the ext4 bigalloc thing that started
this back up again. :/

> Yes, we can check this in iomap, but it's really the filesystem that
> has to implement and enforce it...

I think we ought to keep the "1 mapping per untorn write" check in iomap
until someone decides that it's worth the trouble to make a filesystem
handle mixed states correctly. Mostly as a guard against the
implementations.

--D

> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
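
For illustration only, the "1 mapping per untorn write" rule discussed in
this thread can be sketched in userspace C. This is not the iomap code:
struct file_mapping and untorn_write_allowed() are hypothetical names
invented for this sketch, and it only checks that one mapping covers the
whole write range. Alignment and device limits are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the single mapping a filesystem hands back
 * for a write; it describes file bytes [offset, offset + len).
 * This is NOT the kernel's struct iomap.
 */
struct file_mapping {
	uint64_t offset;
	uint64_t len;
};

/*
 * Return true only if the write [pos, pos + len) lies entirely inside
 * the one mapping @m. A range that spills past the mapping would need a
 * second mapping (possibly in a different state), so it is rejected; a
 * caller would typically turn that into -EINVAL.
 */
static bool untorn_write_allowed(const struct file_mapping *m,
				 uint64_t pos, uint64_t len)
{
	uint64_t skip;

	if (len == 0 || pos < m->offset)
		return false;

	/* Bytes of the mapping that precede the start of the write. */
	skip = pos - m->offset;
	if (skip >= m->len)
		return false;

	/* Compare against the remainder to avoid pos + len overflow. */
	return len <= m->len - skip;
}
```

With a 16 KiB mapping at offset 0, a 16 KiB write at 0 passes, while the
same write against a 4 KiB mapping fails because the rest of the range
would have to come from some other mapping.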