linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Garry <john.g.garry@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Ojaswin Mujoo <ojaswin@linux.ibm.com>,
	linux-ext4@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
	Ritesh Harjani <ritesh.list@gmail.com>,
	linux-kernel@vger.kernel.org,
	"Darrick J . Wong" <djwong@kernel.org>,
	linux-block@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, dchinner@redhat.com
Subject: Re: [RFC 1/7] iomap: Don't fall back to buffered write if the write is atomic
Date: Thu, 7 Dec 2023 12:43:12 +0000	[thread overview]
Message-ID: <fe6a98a8-5f53-47d4-8ae3-08e47dd383c2@oracle.com> (raw)
In-Reply-To: <ZWpZJicSjW2XqMmp@dread.disaster.area>

On 01/12/2023 22:07, Dave Chinner wrote:
> RWF_ATOMIC is no different to RWF_NOWAIT. The API doesn't decide
> what can be supported - the filesystems themselves decide what part
> of the API they can support and implement those pieces.
> 
> TO go back to RWF_NOWAIT, for a long time we (XFS) only supported
> RWF_NOWAIT on DIO, and buffered reads and writes were given
> -EOPNOTSUPP by the filesystem. Then other filesystems started
> supporting DIO with RWF_NOWAIT. Then buffered read support was added
> to the page cache and XFS, and as other filesystems were converted
> they removed the RWF_NOWAIT exclusion check from their read IO
> paths.
> 
> We are now in the same place with buffered write support for
> RWF_NOWAIT. XFS, the page cache and iomap allow buffered writes w/
> RWF_NOWAIT, but ext4, btrfs and f2fs still all return -EOPNOTSUPP
> because they don't support non-blocking buffered writes yet.
> 
> This is the same model we should be applying with RWF_ATOMIC - we
> know that over time we'll be able to expand support for atomic
> writes across both direct and buffered IO, so we should not be
> restricting the API or infrastructure to only allow RWF_ATOMIC w/
> DIO. Just have the filesystems reject RWF_ATOMIC w/ -EOPNOTSUPP if
> they don't support it, and for those that do it is conditional on
> whther the filesystem supports it for the given type of IO being
> done.
> 
> Seriously - an application can easily probe for RWF_ATOMIC support
> without needing information to be directly exposed in statx() - just
> open a O_TMPFILE, issue the type of RWF_ATOMIC IO you require to be
> supported, and if it returns -EOPNOTSUPP then it you can't use
> RWF_ATOMIC optimisations in the application....

Hi Dave,

For rejecting RWF_ATOMIC when not supported for a file, how about 
something like this:

--->8----

diff --git a/block/fops.c b/block/fops.c
index 273bd8f5a370..d9563ef29dde 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -639,6 +637,9 @@ static int blkdev_open(struct inode *inode, struct 
file *filp)
  	if (IS_ERR(handle))
  		return PTR_ERR(handle);

+	if (queue_atomic_write_unit_max_bytes(bdev_get_queue(handle->bdev)))
+		filp->f_mode |= FMODE_CAN_ATOMIC_WRITE;
+
  	if (bdev_nowait(handle->bdev))
  		filp->f_mode |= FMODE_NOWAIT;

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4256ec184461..d725c194243c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -185,6 +185,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, 
loff_t offset,
  /* File supports async nowait buffered writes */
  #define FMODE_BUF_WASYNC	((__force fmode_t)0x80000000)

+/* File supports atomic writes */
+#define FMODE_CAN_ATOMIC_WRITE	((__force fmode_t)0x100000000)
+
  /*
   * Attribute flags.  These should be or-ed together to figure out what
   * has been changed!
@@ -3266,6 +3269,10 @@ static inline int kiocb_set_rw_flags(struct kiocb 
*ki, rwf_t flags)
  			return -EOPNOTSUPP;
  		kiocb_flags |= IOCB_NOIO;
  	}
+	if (flags & RWF_ATOMIC) {
+		if (!(ki->ki_filp->f_mode & FMODE_CAN_ATOMIC_WRITE))
+			return -EOPNOTSUPP;
+	}
  	kiocb_flags |= (__force int) (flags & RWF_SUPPORTED);
  	if (flags & RWF_SYNC)
  		kiocb_flags |= IOCB_DSYNC;
diff --git a/include/linux/types.h b/include/linux/types.h
index 253168bb3fe1..49c754fde1d6 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -153,7 +153,7 @@ typedef u32 dma_addr_t;

  typedef unsigned int __bitwise gfp_t;
  typedef unsigned int __bitwise slab_flags_t;
-typedef unsigned int __bitwise fmode_t;
+typedef unsigned long __bitwise fmode_t;

  #ifdef CONFIG_PHYS_ADDR_T_64BIT
  typedef u64 phys_addr_t;


----8<------

My concern is that we need to increase fmode_t in size as all available 
32 bits are used up.

Thanks,
John

  parent reply	other threads:[~2023-12-07 12:43 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-30 13:53 [RFC 0/7] ext4: Allocator changes for atomic write support with DIO Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 1/7] iomap: Don't fall back to buffered write if the write is atomic Ojaswin Mujoo
2023-11-30 21:10   ` Dave Chinner
2023-12-01 10:42     ` John Garry
2023-12-01 13:27       ` Matthew Wilcox
2023-12-01 19:06         ` John Garry
2023-12-01 22:07       ` Dave Chinner
2023-12-04  9:02         ` John Garry
2023-12-04 18:17           ` Darrick J. Wong
2023-12-04 18:34             ` John Garry
2023-12-07 12:43         ` John Garry [this message]
2023-11-30 13:53 ` [RFC 2/7] ext4: Factor out size and start prediction from ext4_mb_normalize_request() Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 3/7] ext4: add aligned allocation support in mballoc Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 4/7] ext4: allow inode preallocation for aligned alloc Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 5/7] block: export blkdev_atomic_write_valid() and refactor api Ojaswin Mujoo
2023-12-01 10:47   ` John Garry
2023-12-11 10:57     ` Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 6/7] ext4: Add aligned allocation support for atomic direct io Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 7/7] ext4: Support atomic write for statx Ojaswin Mujoo
2023-12-04 10:36 ` [RFC 0/7] ext4: Allocator changes for atomic write support with DIO John Garry
2023-12-04 13:38   ` Ojaswin Mujoo
2023-12-04 14:44     ` John Garry
2023-12-11 10:54       ` Ojaswin Mujoo
2023-12-12  7:46         ` John Garry
2023-12-12 13:10           ` Christoph Hellwig
2023-12-12 15:16             ` Theodore Ts'o
2023-12-12 15:19               ` Christoph Hellwig
2023-12-12 16:10             ` John Garry
2023-12-13  5:59           ` Ojaswin Mujoo
2023-12-13  9:17             ` John Garry
2023-12-13  6:42         ` Ojaswin Mujoo
2023-12-13  9:20           ` John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fe6a98a8-5f53-47d4-8ae3-08e47dd383c2@oracle.com \
    --to=john.g.garry@oracle.com \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=ojaswin@linux.ibm.com \
    --cc=ritesh.list@gmail.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).