linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/6] Restore data lifetime support
@ 2024-02-02 20:39 Bart Van Assche
  2024-02-02 20:39 ` [PATCH v2 1/6] fs: Fix rw_hint validation Bart Van Assche
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Bart Van Assche @ 2024-02-02 20:39 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro
  Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Martin K . Petersen,
	Bart Van Assche


Hi Christian,

UFS devices are widely used in mobile applications, e.g. in smartphones.
UFS vendors need data lifetime information to achieve good performance.
Providing data lifetime information to UFS devices can result in up to 40%
lower write amplification. Hence this patch series that restores the
bi_write_hint member in struct bio. After this patch series has been merged,
patches that implement data lifetime support in the SCSI disk (sd) driver
will be sent to the Linux kernel SCSI maintainer.

The following changes are included in this patch series:
 - Improvements for the F_GET_RW_HINT and F_SET_RW_HINT fcntls.
 - Move enum rw_hint into a new header file.
 - Support F_SET_RW_HINT for block devices to make it easy to test data
   lifetime support.
 - Restore the bio.bi_write_hint member and restore support in the VFS layer
   and also in the block layer for data lifetime information.

The shell script that has been used to test the patch series combined with
the SCSI patches is available at the end of this cover letter.

Please consider this patch series for the next merge window.

Thanks,

Bart.

Changes compared to v1:
 - Rebased this patch series on top of the vfs.all branch.
 - Use READ_ONCE() / WRITE_ONCE() to access i_write_hint in the fcntl
   implementation.

Bart Van Assche (6):
  fs: Fix rw_hint validation
  fs: Verify write lifetime constants at compile time
  fs: Split fcntl_rw_hint()
  fs: Move enum rw_hint into a new header file
  fs: Propagate write hints to the struct block_device inode
  block, fs: Restore the per-bio/request data lifetime fields

 block/bio.c                 |  2 ++
 block/blk-crypto-fallback.c |  1 +
 block/blk-merge.c           |  8 +++++
 block/blk-mq.c              |  2 ++
 block/bounce.c              |  1 +
 block/fops.c                |  3 ++
 fs/buffer.c                 | 12 ++++---
 fs/direct-io.c              |  2 ++
 fs/f2fs/f2fs.h              |  1 +
 fs/fcntl.c                  | 64 +++++++++++++++++++++++--------------
 fs/inode.c                  |  1 +
 fs/iomap/buffered-io.c      |  1 +
 fs/iomap/direct-io.c        |  1 +
 fs/mpage.c                  |  1 +
 include/linux/blk-mq.h      |  2 ++
 include/linux/blk_types.h   |  2 ++
 include/linux/fs.h          | 16 ++--------
 include/linux/rw_hint.h     | 24 ++++++++++++++
 18 files changed, 102 insertions(+), 42 deletions(-)
 create mode 100644 include/linux/rw_hint.h


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/6] fs: Fix rw_hint validation
  2024-02-02 20:39 [PATCH v2 0/6] Restore data lifetime support Bart Van Assche
@ 2024-02-02 20:39 ` Bart Van Assche
  2024-02-02 20:39 ` [PATCH v2 2/6] fs: Verify write lifetime constants at compile time Bart Van Assche
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2024-02-02 20:39 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro
  Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Martin K . Petersen,
	Bart Van Assche, Kanchan Joshi, Jeff Layton, Chuck Lever,
	Stephen Rothwell

Reject values that are valid rw_hints after truncation but not before
truncation by passing an untruncated value to rw_hint_valid().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 5657cb0797c4 ("fs/fcntl: use copy_to/from_user() for u64 types")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 fs/fcntl.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index c80a6acad742..3ff707bf2743 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -268,7 +268,7 @@ static int f_getowner_uids(struct file *filp, unsigned long arg)
 }
 #endif
 
-static bool rw_hint_valid(enum rw_hint hint)
+static bool rw_hint_valid(u64 hint)
 {
 	switch (hint) {
 	case RWH_WRITE_LIFE_NOT_SET:
@@ -288,19 +288,17 @@ static long fcntl_rw_hint(struct file *file, unsigned int cmd,
 {
 	struct inode *inode = file_inode(file);
 	u64 __user *argp = (u64 __user *)arg;
-	enum rw_hint hint;
-	u64 h;
+	u64 hint;
 
 	switch (cmd) {
 	case F_GET_RW_HINT:
-		h = inode->i_write_hint;
-		if (copy_to_user(argp, &h, sizeof(*argp)))
+		hint = inode->i_write_hint;
+		if (copy_to_user(argp, &hint, sizeof(*argp)))
 			return -EFAULT;
 		return 0;
 	case F_SET_RW_HINT:
-		if (copy_from_user(&h, argp, sizeof(h)))
+		if (copy_from_user(&hint, argp, sizeof(hint)))
 			return -EFAULT;
-		hint = (enum rw_hint) h;
 		if (!rw_hint_valid(hint))
 			return -EINVAL;
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/6] fs: Verify write lifetime constants at compile time
  2024-02-02 20:39 [PATCH v2 0/6] Restore data lifetime support Bart Van Assche
  2024-02-02 20:39 ` [PATCH v2 1/6] fs: Fix rw_hint validation Bart Van Assche
@ 2024-02-02 20:39 ` Bart Van Assche
  2024-02-02 20:39 ` [PATCH v2 3/6] fs: Split fcntl_rw_hint() Bart Van Assche
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2024-02-02 20:39 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro
  Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Martin K . Petersen,
	Bart Van Assche, Johannes Thumshirn

The code in fs/fcntl.c converts RWH_* constants to and from WRITE_LIFE_*
constants using casts. Verify at compile time that these casts will yield
the intended effect.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 fs/fcntl.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 3ff707bf2743..f3bc4662455f 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -270,6 +270,13 @@ static int f_getowner_uids(struct file *filp, unsigned long arg)
 
 static bool rw_hint_valid(u64 hint)
 {
+	BUILD_BUG_ON(WRITE_LIFE_NOT_SET != RWH_WRITE_LIFE_NOT_SET);
+	BUILD_BUG_ON(WRITE_LIFE_NONE != RWH_WRITE_LIFE_NONE);
+	BUILD_BUG_ON(WRITE_LIFE_SHORT != RWH_WRITE_LIFE_SHORT);
+	BUILD_BUG_ON(WRITE_LIFE_MEDIUM != RWH_WRITE_LIFE_MEDIUM);
+	BUILD_BUG_ON(WRITE_LIFE_LONG != RWH_WRITE_LIFE_LONG);
+	BUILD_BUG_ON(WRITE_LIFE_EXTREME != RWH_WRITE_LIFE_EXTREME);
+
 	switch (hint) {
 	case RWH_WRITE_LIFE_NOT_SET:
 	case RWH_WRITE_LIFE_NONE:

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 3/6] fs: Split fcntl_rw_hint()
  2024-02-02 20:39 [PATCH v2 0/6] Restore data lifetime support Bart Van Assche
  2024-02-02 20:39 ` [PATCH v2 1/6] fs: Fix rw_hint validation Bart Van Assche
  2024-02-02 20:39 ` [PATCH v2 2/6] fs: Verify write lifetime constants at compile time Bart Van Assche
@ 2024-02-02 20:39 ` Bart Van Assche
  2024-02-02 20:39 ` [PATCH v2 4/6] fs: Move enum rw_hint into a new header file Bart Van Assche
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2024-02-02 20:39 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro
  Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Martin K . Petersen,
	Bart Van Assche, Kanchan Joshi, Jeff Layton, Chuck Lever,
	Stephen Rothwell

Split fcntl_rw_hint() such that there is one helper function per fcntl.
Use READ_ONCE() and WRITE_ONCE() to access the i_write_hint member
instead of protecting such accesses with the inode lock. READ_ONCE() is
not used in I/O path code that reads i_write_hint. Users who want
F_SET_RW_HINT to affect I/O need to make sure that F_SET_RW_HINT has
completed before I/O is submitted that should use the configured write
hint.

Cc: Christoph Hellwig <hch@lst.de>
Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: Kanchan Joshi <joshi.k@samsung.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 fs/fcntl.c | 45 ++++++++++++++++++++++++---------------------
 1 file changed, 24 insertions(+), 21 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index f3bc4662455f..d2b15351ab8e 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -290,32 +290,33 @@ static bool rw_hint_valid(u64 hint)
 	}
 }
 
-static long fcntl_rw_hint(struct file *file, unsigned int cmd,
-			  unsigned long arg)
+static long fcntl_get_rw_hint(struct file *file, unsigned int cmd,
+			      unsigned long arg)
 {
 	struct inode *inode = file_inode(file);
 	u64 __user *argp = (u64 __user *)arg;
-	u64 hint;
+	u64 hint = READ_ONCE(inode->i_write_hint);
 
-	switch (cmd) {
-	case F_GET_RW_HINT:
-		hint = inode->i_write_hint;
-		if (copy_to_user(argp, &hint, sizeof(*argp)))
-			return -EFAULT;
-		return 0;
-	case F_SET_RW_HINT:
-		if (copy_from_user(&hint, argp, sizeof(hint)))
-			return -EFAULT;
-		if (!rw_hint_valid(hint))
-			return -EINVAL;
+	if (copy_to_user(argp, &hint, sizeof(*argp)))
+		return -EFAULT;
+	return 0;
+}
 
-		inode_lock(inode);
-		inode->i_write_hint = hint;
-		inode_unlock(inode);
-		return 0;
-	default:
+static long fcntl_set_rw_hint(struct file *file, unsigned int cmd,
+			      unsigned long arg)
+{
+	struct inode *inode = file_inode(file);
+	u64 __user *argp = (u64 __user *)arg;
+	u64 hint;
+
+	if (copy_from_user(&hint, argp, sizeof(hint)))
+		return -EFAULT;
+	if (!rw_hint_valid(hint))
 		return -EINVAL;
-	}
+
+	WRITE_ONCE(inode->i_write_hint, hint);
+
+	return 0;
 }
 
 static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
@@ -421,8 +422,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		err = memfd_fcntl(filp, cmd, argi);
 		break;
 	case F_GET_RW_HINT:
+		err = fcntl_get_rw_hint(filp, cmd, arg);
+		break;
 	case F_SET_RW_HINT:
-		err = fcntl_rw_hint(filp, cmd, arg);
+		err = fcntl_set_rw_hint(filp, cmd, arg);
 		break;
 	default:
 		break;

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 4/6] fs: Move enum rw_hint into a new header file
  2024-02-02 20:39 [PATCH v2 0/6] Restore data lifetime support Bart Van Assche
                   ` (2 preceding siblings ...)
  2024-02-02 20:39 ` [PATCH v2 3/6] fs: Split fcntl_rw_hint() Bart Van Assche
@ 2024-02-02 20:39 ` Bart Van Assche
  2024-02-02 20:39 ` [PATCH v2 5/6] fs: Propagate write hints to the struct block_device inode Bart Van Assche
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2024-02-02 20:39 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro
  Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Martin K . Petersen,
	Bart Van Assche, Chao Yu, Jan Kara

Move enum rw_hint into a new header file to prepare for using this data
type in the block layer. Add the attribute __packed to reduce the space
occupied by instances of this data type from four bytes to one byte.
Change the data type of i_write_hint from u8 into enum rw_hint.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Chao Yu <chao@kernel.org> # for the F2FS part
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 fs/f2fs/f2fs.h          |  1 +
 fs/fcntl.c              |  1 +
 fs/inode.c              |  1 +
 include/linux/fs.h      | 16 ++--------------
 include/linux/rw_hint.h | 24 ++++++++++++++++++++++++
 5 files changed, 29 insertions(+), 14 deletions(-)
 create mode 100644 include/linux/rw_hint.h

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index dda0aed95464..4a4e60cdac4e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -24,6 +24,7 @@
 #include <linux/blkdev.h>
 #include <linux/quotaops.h>
 #include <linux/part_stat.h>
+#include <linux/rw_hint.h>
 #include <crypto/hash.h>
 
 #include <linux/fscrypt.h>
diff --git a/fs/fcntl.c b/fs/fcntl.c
index d2b15351ab8e..00be0a710bba 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -27,6 +27,7 @@
 #include <linux/memfd.h>
 #include <linux/compat.h>
 #include <linux/mount.h>
+#include <linux/rw_hint.h>
 
 #include <linux/poll.h>
 #include <asm/siginfo.h>
diff --git a/fs/inode.c b/fs/inode.c
index 6d0d54230363..585d79d40158 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -20,6 +20,7 @@
 #include <linux/ratelimit.h>
 #include <linux/list_lru.h>
 #include <linux/iversion.h>
+#include <linux/rw_hint.h>
 #include <trace/events/writeback.h>
 #include "internal.h"
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3ce900e34f52..4a953eb8b0d1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -43,6 +43,7 @@
 #include <linux/cred.h>
 #include <linux/mnt_idmapping.h>
 #include <linux/slab.h>
+#include <linux/rw_hint.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -309,19 +310,6 @@ struct address_space;
 struct writeback_control;
 struct readahead_control;
 
-/*
- * Write life time hint values.
- * Stored in struct inode as u8.
- */
-enum rw_hint {
-	WRITE_LIFE_NOT_SET	= 0,
-	WRITE_LIFE_NONE		= RWH_WRITE_LIFE_NONE,
-	WRITE_LIFE_SHORT	= RWH_WRITE_LIFE_SHORT,
-	WRITE_LIFE_MEDIUM	= RWH_WRITE_LIFE_MEDIUM,
-	WRITE_LIFE_LONG		= RWH_WRITE_LIFE_LONG,
-	WRITE_LIFE_EXTREME	= RWH_WRITE_LIFE_EXTREME,
-};
-
 /* Match RWF_* bits to IOCB bits */
 #define IOCB_HIPRI		(__force int) RWF_HIPRI
 #define IOCB_DSYNC		(__force int) RWF_DSYNC
@@ -677,7 +665,7 @@ struct inode {
 	spinlock_t		i_lock;	/* i_blocks, i_bytes, maybe i_size */
 	unsigned short          i_bytes;
 	u8			i_blkbits;
-	u8			i_write_hint;
+	enum rw_hint		i_write_hint;
 	blkcnt_t		i_blocks;
 
 #ifdef __NEED_I_SIZE_ORDERED
diff --git a/include/linux/rw_hint.h b/include/linux/rw_hint.h
new file mode 100644
index 000000000000..309ca72f2dfb
--- /dev/null
+++ b/include/linux/rw_hint.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_RW_HINT_H
+#define _LINUX_RW_HINT_H
+
+#include <linux/build_bug.h>
+#include <linux/compiler_attributes.h>
+#include <uapi/linux/fcntl.h>
+
+/* Block storage write lifetime hint values. */
+enum rw_hint {
+	WRITE_LIFE_NOT_SET	= RWH_WRITE_LIFE_NOT_SET,
+	WRITE_LIFE_NONE		= RWH_WRITE_LIFE_NONE,
+	WRITE_LIFE_SHORT	= RWH_WRITE_LIFE_SHORT,
+	WRITE_LIFE_MEDIUM	= RWH_WRITE_LIFE_MEDIUM,
+	WRITE_LIFE_LONG		= RWH_WRITE_LIFE_LONG,
+	WRITE_LIFE_EXTREME	= RWH_WRITE_LIFE_EXTREME,
+} __packed;
+
+/* Sparse ignores __packed annotations on enums, hence the #ifndef below. */
+#ifndef __CHECKER__
+static_assert(sizeof(enum rw_hint) == 1);
+#endif
+
+#endif /* _LINUX_RW_HINT_H */

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 5/6] fs: Propagate write hints to the struct block_device inode
  2024-02-02 20:39 [PATCH v2 0/6] Restore data lifetime support Bart Van Assche
                   ` (3 preceding siblings ...)
  2024-02-02 20:39 ` [PATCH v2 4/6] fs: Move enum rw_hint into a new header file Bart Van Assche
@ 2024-02-02 20:39 ` Bart Van Assche
  2024-02-02 20:39 ` [PATCH v2 6/6] block, fs: Restore the per-bio/request data lifetime fields Bart Van Assche
  2024-02-06 13:35 ` [PATCH v2 0/6] Restore data lifetime support Christian Brauner
  6 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2024-02-02 20:39 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro
  Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Martin K . Petersen,
	Bart Van Assche, Jeff Layton, Chuck Lever, Kanchan Joshi

Write hints applied with F_SET_RW_HINT on a block device affect the
block device inode only. Propagate these hints to the inode associated
with struct block_device because that is the inode used when writing
back dirty pages.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 fs/fcntl.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 00be0a710bba..8eb6d64a985b 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -317,6 +317,13 @@ static long fcntl_set_rw_hint(struct file *file, unsigned int cmd,
 
 	WRITE_ONCE(inode->i_write_hint, hint);
 
+	/*
+	 * file->f_mapping->host may differ from inode. As an example,
+	 * blkdev_open() modifies file->f_mapping.
+	 */
+	if (file->f_mapping->host != inode)
+		WRITE_ONCE(file->f_mapping->host->i_write_hint, hint);
+
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 6/6] block, fs: Restore the per-bio/request data lifetime fields
  2024-02-02 20:39 [PATCH v2 0/6] Restore data lifetime support Bart Van Assche
                   ` (4 preceding siblings ...)
  2024-02-02 20:39 ` [PATCH v2 5/6] fs: Propagate write hints to the struct block_device inode Bart Van Assche
@ 2024-02-02 20:39 ` Bart Van Assche
  2024-02-05 20:46   ` Bart Van Assche
  2024-02-06 13:35 ` [PATCH v2 0/6] Restore data lifetime support Christian Brauner
  6 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2024-02-02 20:39 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro
  Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Martin K . Petersen,
	Bart Van Assche, Kanchan Joshi

Restore support for passing data lifetime information from filesystems to
block drivers. This patch reverts commit b179c98f7697 ("block: Remove
request.write_hint") and commit c75e707fe1aa ("block: remove the
per-bio/request write hint").

This patch does not modify the size of struct bio because the new
bi_write_hint member fills a hole in struct bio. pahole reports the
following for struct bio on an x86_64 system with this patch applied:

        /* size: 112, cachelines: 2, members: 20 */
        /* sum members: 110, holes: 1, sum holes: 2 */
        /* last cacheline: 48 bytes */

Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/bio.c                 |  2 ++
 block/blk-crypto-fallback.c |  1 +
 block/blk-merge.c           |  8 ++++++++
 block/blk-mq.c              |  2 ++
 block/bounce.c              |  1 +
 block/fops.c                |  3 +++
 fs/buffer.c                 | 12 ++++++++----
 fs/direct-io.c              |  2 ++
 fs/iomap/buffered-io.c      |  1 +
 fs/iomap/direct-io.c        |  1 +
 fs/mpage.c                  |  1 +
 include/linux/blk-mq.h      |  2 ++
 include/linux/blk_types.h   |  2 ++
 13 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index b9642a41f286..c9223e9d31da 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -251,6 +251,7 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
 	bio->bi_opf = opf;
 	bio->bi_flags = 0;
 	bio->bi_ioprio = 0;
+	bio->bi_write_hint = 0;
 	bio->bi_status = 0;
 	bio->bi_iter.bi_sector = 0;
 	bio->bi_iter.bi_size = 0;
@@ -813,6 +814,7 @@ static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
 {
 	bio_set_flag(bio, BIO_CLONED);
 	bio->bi_ioprio = bio_src->bi_ioprio;
+	bio->bi_write_hint = bio_src->bi_write_hint;
 	bio->bi_iter = bio_src->bi_iter;
 
 	if (bio->bi_bdev) {
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index e6468eab2681..b1e7415f8439 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -172,6 +172,7 @@ static struct bio *blk_crypto_fallback_clone_bio(struct bio *bio_src)
 	if (bio_flagged(bio_src, BIO_REMAPPED))
 		bio_set_flag(bio, BIO_REMAPPED);
 	bio->bi_ioprio		= bio_src->bi_ioprio;
+	bio->bi_write_hint	= bio_src->bi_write_hint;
 	bio->bi_iter.bi_sector	= bio_src->bi_iter.bi_sector;
 	bio->bi_iter.bi_size	= bio_src->bi_iter.bi_size;
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2d470cf2173e..2a06fd33039d 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -810,6 +810,10 @@ static struct request *attempt_merge(struct request_queue *q,
 	if (rq_data_dir(req) != rq_data_dir(next))
 		return NULL;
 
+	/* Don't merge requests with different write hints. */
+	if (req->write_hint != next->write_hint)
+		return NULL;
+
 	if (req->ioprio != next->ioprio)
 		return NULL;
 
@@ -937,6 +941,10 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
 	if (!bio_crypt_rq_ctx_compatible(rq, bio))
 		return false;
 
+	/* Don't merge requests with different write hints. */
+	if (rq->write_hint != bio->bi_write_hint)
+		return false;
+
 	if (rq->ioprio != bio_prio(bio))
 		return false;
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index aa87fcfda1ec..34ceb15d2ea4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2585,6 +2585,7 @@ static void blk_mq_bio_to_request(struct request *rq, struct bio *bio,
 		rq->cmd_flags |= REQ_FAILFAST_MASK;
 
 	rq->__sector = bio->bi_iter.bi_sector;
+	rq->write_hint = bio->bi_write_hint;
 	blk_rq_bio_prep(rq, bio, nr_segs);
 
 	/* This can't fail, since GFP_NOIO includes __GFP_DIRECT_RECLAIM. */
@@ -3185,6 +3186,7 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
 	}
 	rq->nr_phys_segments = rq_src->nr_phys_segments;
 	rq->ioprio = rq_src->ioprio;
+	rq->write_hint = rq_src->write_hint;
 
 	if (rq->bio && blk_crypto_rq_bio_prep(rq, rq->bio, gfp_mask) < 0)
 		goto free_and_out;
diff --git a/block/bounce.c b/block/bounce.c
index 7cfcb242f9a1..d6a5219f29dd 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -169,6 +169,7 @@ static struct bio *bounce_clone_bio(struct bio *bio_src)
 	if (bio_flagged(bio_src, BIO_REMAPPED))
 		bio_set_flag(bio, BIO_REMAPPED);
 	bio->bi_ioprio		= bio_src->bi_ioprio;
+	bio->bi_write_hint	= bio_src->bi_write_hint;
 	bio->bi_iter.bi_sector	= bio_src->bi_iter.bi_sector;
 	bio->bi_iter.bi_size	= bio_src->bi_iter.bi_size;
 
diff --git a/block/fops.c b/block/fops.c
index 93bae17ce660..63566a3db186 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -73,6 +73,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 		bio_init(&bio, bdev, vecs, nr_pages, dio_bio_write_op(iocb));
 	}
 	bio.bi_iter.bi_sector = pos >> SECTOR_SHIFT;
+	bio.bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
 	bio.bi_ioprio = iocb->ki_ioprio;
 
 	ret = bio_iov_iter_get_pages(&bio, iter);
@@ -203,6 +204,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 
 	for (;;) {
 		bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
+		bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
 		bio->bi_private = dio;
 		bio->bi_end_io = blkdev_bio_end_io;
 		bio->bi_ioprio = iocb->ki_ioprio;
@@ -321,6 +323,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	dio->flags = 0;
 	dio->iocb = iocb;
 	bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
+	bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
 	bio->bi_end_io = blkdev_bio_end_io_async;
 	bio->bi_ioprio = iocb->ki_ioprio;
 
diff --git a/fs/buffer.c b/fs/buffer.c
index b55dea034a5d..d6b64124977a 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -55,7 +55,7 @@
 
 static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
 static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
-			  struct writeback_control *wbc);
+			  enum rw_hint hint, struct writeback_control *wbc);
 
 #define BH_ENTRY(list) list_entry((list), struct buffer_head, b_assoc_buffers)
 
@@ -1889,7 +1889,8 @@ int __block_write_full_folio(struct inode *inode, struct folio *folio,
 	do {
 		struct buffer_head *next = bh->b_this_page;
 		if (buffer_async_write(bh)) {
-			submit_bh_wbc(REQ_OP_WRITE | write_flags, bh, wbc);
+			submit_bh_wbc(REQ_OP_WRITE | write_flags, bh,
+				      inode->i_write_hint, wbc);
 			nr_underway++;
 		}
 		bh = next;
@@ -1944,7 +1945,8 @@ int __block_write_full_folio(struct inode *inode, struct folio *folio,
 		struct buffer_head *next = bh->b_this_page;
 		if (buffer_async_write(bh)) {
 			clear_buffer_dirty(bh);
-			submit_bh_wbc(REQ_OP_WRITE | write_flags, bh, wbc);
+			submit_bh_wbc(REQ_OP_WRITE | write_flags, bh,
+				      inode->i_write_hint, wbc);
 			nr_underway++;
 		}
 		bh = next;
@@ -2756,6 +2758,7 @@ static void end_bio_bh_io_sync(struct bio *bio)
 }
 
 static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
+			  enum rw_hint write_hint,
 			  struct writeback_control *wbc)
 {
 	const enum req_op op = opf & REQ_OP_MASK;
@@ -2783,6 +2786,7 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
 	fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
 
 	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
+	bio->bi_write_hint = write_hint;
 
 	__bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
 
@@ -2802,7 +2806,7 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
 
 void submit_bh(blk_opf_t opf, struct buffer_head *bh)
 {
-	submit_bh_wbc(opf, bh, NULL);
+	submit_bh_wbc(opf, bh, WRITE_LIFE_NOT_SET, NULL);
 }
 EXPORT_SYMBOL(submit_bh);
 
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 60456263a338..62c97ff9e852 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -410,6 +410,8 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 		bio->bi_end_io = dio_bio_end_io;
 	if (dio->is_pinned)
 		bio_set_flag(bio, BIO_PAGE_PINNED);
+	bio->bi_write_hint = file_inode(dio->iocb->ki_filp)->i_write_hint;
+
 	sdio->bio = bio;
 	sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
 }
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 2ad0e287c704..7f3ee4112e2a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1689,6 +1689,7 @@ static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
 			       GFP_NOFS, &iomap_ioend_bioset);
 	bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
 	bio->bi_end_io = iomap_writepage_end_bio;
+	bio->bi_write_hint = inode->i_write_hint;
 	wbc_init_bio(wbc, bio);
 
 	ioend = iomap_ioend_from_bio(bio);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index bcd3f8cf5ea4..f3b43d223a46 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -380,6 +380,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
 					  GFP_KERNEL);
 		bio->bi_iter.bi_sector = iomap_sector(iomap, pos);
+		bio->bi_write_hint = inode->i_write_hint;
 		bio->bi_ioprio = dio->iocb->ki_ioprio;
 		bio->bi_private = dio;
 		bio->bi_end_io = iomap_dio_bio_end_io;
diff --git a/fs/mpage.c b/fs/mpage.c
index 738882e0766d..fa8b99a199fa 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -605,6 +605,7 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
 				GFP_NOFS);
 		bio->bi_iter.bi_sector = first_block << (blkbits - 9);
 		wbc_init_bio(wbc, bio);
+		bio->bi_write_hint = inode->i_write_hint;
 	}
 
 	/*
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 7a8150a5f051..492b0128b5d9 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -8,6 +8,7 @@
 #include <linux/scatterlist.h>
 #include <linux/prefetch.h>
 #include <linux/srcu.h>
+#include <linux/rw_hint.h>
 
 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -135,6 +136,7 @@ struct request {
 	struct blk_crypto_keyslot *crypt_keyslot;
 #endif
 
+	enum rw_hint write_hint;
 	unsigned short ioprio;
 
 	enum mq_rq_state state;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index f288c94374b3..12d87cef2c03 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -10,6 +10,7 @@
 #include <linux/bvec.h>
 #include <linux/device.h>
 #include <linux/ktime.h>
+#include <linux/rw_hint.h>
 
 struct bio_set;
 struct bio;
@@ -269,6 +270,7 @@ struct bio {
 						 */
 	unsigned short		bi_flags;	/* BIO_* below */
 	unsigned short		bi_ioprio;
+	enum rw_hint		bi_write_hint;
 	blk_status_t		bi_status;
 	atomic_t		__bi_remaining;
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 6/6] block, fs: Restore the per-bio/request data lifetime fields
  2024-02-02 20:39 ` [PATCH v2 6/6] block, fs: Restore the per-bio/request data lifetime fields Bart Van Assche
@ 2024-02-05 20:46   ` Bart Van Assche
  0 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2024-02-05 20:46 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christian Brauner, Alexander Viro, linux-fsdevel,
	Christoph Hellwig, Martin K . Petersen, Kanchan Joshi

On 2/2/24 12:39, Bart Van Assche wrote:
> Restore support for passing data lifetime information from filesystems to
> block drivers. This patch reverts commit b179c98f7697 ("block: Remove
> request.write_hint") and commit c75e707fe1aa ("block: remove the
> per-bio/request write hint").
> 
> This patch does not modify the size of struct bio because the new
> bi_write_hint member fills a hole in struct bio. pahole reports the
> following for struct bio on an x86_64 system with this patch applied:
> 
>          /* size: 112, cachelines: 2, members: 20 */
>          /* sum members: 110, holes: 1, sum holes: 2 */
>          /* last cacheline: 48 bytes */

Hi Jens,

Since this patch affects both the block layer and VFS code, can you please
help with reviewing this patch?

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 0/6] Restore data lifetime support
  2024-02-02 20:39 [PATCH v2 0/6] Restore data lifetime support Bart Van Assche
                   ` (5 preceding siblings ...)
  2024-02-02 20:39 ` [PATCH v2 6/6] block, fs: Restore the per-bio/request data lifetime fields Bart Van Assche
@ 2024-02-06 13:35 ` Christian Brauner
  2024-02-06 18:25   ` Bart Van Assche
  6 siblings, 1 reply; 10+ messages in thread
From: Christian Brauner @ 2024-02-06 13:35 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christian Brauner, linux-fsdevel, linux-kernel, Jens Axboe,
	Christoph Hellwig, Martin K . Petersen

On Thu, 11 Jan 2024 17:22:40 +1100, David Disseldorp wrote:
> If initrd_start cpio extraction fails, CONFIG_BLK_DEV_RAM triggers
> fallback to initrd.image handling via populate_initrd_image().
> The populate_initrd_image() call follows successful extraction of any
> built-in cpio archive at __initramfs_start, but currently performs
> built-in archive extraction a second time.
> 
> Prior to commit b2a74d5f9d446 ("initramfs: remove clean_rootfs"),
> the second built-in initramfs unpack call was used to repopulate entries
> removed by clean_rootfs(), but it's no longer necessary now the contents
> of the previous extraction are retained.
> 
> [...]

I've pulled this in. There was a minor merge-conflict with
fs/iomap/buffer_write.c that I've resolved. Please double-check that
it's sane. I'll treat this branch as stable by Friday since I know you
want to rely on it.

We can do it right now but I reckon that there might still be an ack or
two incoming that you wanted.

---

Applied to the vfs.rw branch of the vfs/vfs.git tree.
Patches in the vfs.rw branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.rw

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 0/6] Restore data lifetime support
  2024-02-06 13:35 ` [PATCH v2 0/6] Restore data lifetime support Christian Brauner
@ 2024-02-06 18:25   ` Bart Van Assche
  0 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2024-02-06 18:25 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, linux-kernel, Jens Axboe, Christoph Hellwig,
	Martin K . Petersen

On 2/6/24 05:35, Christian Brauner wrote:
> On Thu, 11 Jan 2024 17:22:40 +1100, David Disseldorp wrote:
>> If initrd_start cpio extraction fails, CONFIG_BLK_DEV_RAM triggers
>> fallback to initrd.image handling via populate_initrd_image().
>> The populate_initrd_image() call follows successful extraction of any
>> built-in cpio archive at __initramfs_start, but currently performs
>> built-in archive extraction a second time.
>>
>> Prior to commit b2a74d5f9d446 ("initramfs: remove clean_rootfs"),
>> the second built-in initramfs unpack call was used to repopulate entries
>> removed by clean_rootfs(), but it's no longer necessary now the contents
>> of the previous extraction are retained.
>>
>> [...]
> 
> I've pulled this in. There was a minor merge-conflict with
> fs/iomap/buffer_write.c that I've resolved. Please double-check that
> it's sane. I'll treat this branch as stable by Friday since I know you
> want to rely on it.

Thanks! The patches on the vfs.rw branch look good to me and pass my regression
tests.

> We can do it right now but I reckon that there might still be an ack or
> two incoming that you wanted.
I'm not sure acks will be incoming despite earlier promises. The following
was promised almost two years ago: "If at some point there's a desire to
actually try and upstream this support, then we'll be happy to review that
patchset." [ ... ] "As I've said multiple times, whenever code is available,
it'll be reviewed and discussed."

Sources:
* https://lore.kernel.org/linux-block/ef77ef36-df95-8658-ff54-7d8046f5d0e7@kernel.dk/,
   March 2022.
* https://lore.kernel.org/linux-block/95588225-b2af-72b6-2feb-811a3b346f9f@kernel.dk/,
   March 2022.

Bart.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-02-06 18:25 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-02 20:39 [PATCH v2 0/6] Restore data lifetime support Bart Van Assche
2024-02-02 20:39 ` [PATCH v2 1/6] fs: Fix rw_hint validation Bart Van Assche
2024-02-02 20:39 ` [PATCH v2 2/6] fs: Verify write lifetime constants at compile time Bart Van Assche
2024-02-02 20:39 ` [PATCH v2 3/6] fs: Split fcntl_rw_hint() Bart Van Assche
2024-02-02 20:39 ` [PATCH v2 4/6] fs: Move enum rw_hint into a new header file Bart Van Assche
2024-02-02 20:39 ` [PATCH v2 5/6] fs: Propagate write hints to the struct block_device inode Bart Van Assche
2024-02-02 20:39 ` [PATCH v2 6/6] block, fs: Restore the per-bio/request data lifetime fields Bart Van Assche
2024-02-05 20:46   ` Bart Van Assche
2024-02-06 13:35 ` [PATCH v2 0/6] Restore data lifetime support Christian Brauner
2024-02-06 18:25   ` Bart Van Assche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).