linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] Restore data lifetime support
@ 2024-01-31 20:52 Bart Van Assche
  2024-01-31 20:52 ` [PATCH 1/6] fs: Fix rw_hint validation Bart Van Assche
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Bart Van Assche @ 2024-01-31 20:52 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro
  Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Martin K . Petersen,
	Bart Van Assche

Hi Christian,

UFS devices are widely used in mobile applications, e.g. in smartphones.
UFS vendors need data lifetime information to achieve good performance.
Providing data lifetime information to UFS devices can result in up to 40%
lower write amplification. Hence this patch series that restores the
bi_write_hint member in struct bio. After this patch series has been merged,
patches that implement data lifetime support in the SCSI disk (sd) driver
will be sent to the Linux kernel SCSI maintainer.

The following changes are included in this patch series:
 - Improvements for the F_GET_RW_HINT and F_SET_RW_HINT fcntls.
 - Move enum rw_hint into a new header file.
 - Support F_SET_RW_HINT for block devices to make it easy to test data
   lifetime support.
 - Restore the bio.bi_write_hint member and restore support in the VFS layer
   and also in the block layer for data lifetime information.

The shell script that has been used to test the patch series combined with
the SCSI patches is available at the end of this cover letter.

Please consider this patch series for the next merge window.

Thanks,

Bart.

Bart Van Assche (6):
  fs: Fix rw_hint validation
  fs: Verify write lifetime constants at compile time
  fs: Split fcntl_rw_hint()
  fs: Move enum rw_hint into a new header file
  fs: Propagate write hints to the struct block_device inode
  block, fs: Restore the per-bio/request data lifetime fields

 block/bio.c                 |  2 ++
 block/blk-crypto-fallback.c |  1 +
 block/blk-merge.c           |  8 +++++
 block/blk-mq.c              |  2 ++
 block/bounce.c              |  1 +
 block/fops.c                |  3 ++
 fs/buffer.c                 | 12 ++++---
 fs/direct-io.c              |  2 ++
 fs/f2fs/f2fs.h              |  1 +
 fs/fcntl.c                  | 64 +++++++++++++++++++++++++------------
 fs/inode.c                  |  1 +
 fs/iomap/buffered-io.c      |  2 ++
 fs/iomap/direct-io.c        |  1 +
 fs/mpage.c                  |  1 +
 include/linux/blk-mq.h      |  2 ++
 include/linux/blk_types.h   |  2 ++
 include/linux/fs.h          | 16 ++--------
 include/linux/rw_hint.h     | 24 ++++++++++++++
 18 files changed, 106 insertions(+), 39 deletions(-)
 create mode 100644 include/linux/rw_hint.h

This patch series, combined with the SCSI patches, has been tested with the
following shell script:

#!/bin/bash
# SPDX-License-Identifier: GPL-3.0+
# Copyright (C) 2022 Google LLC

. tests/zbd/rc
. common/null_blk
. common/scsi_debug

DESCRIPTION="test block write hint support"
QUICK=1

requires() {
	_have_fio
	_have_scsi_debug_group_number_stats
}

submit_io() {
	local stats_attr=/sys/bus/pseudo/drivers/scsi_debug/group_number_stats
	echo "$1 ($3)"
	echo "$*" >>"${FULL}"
	local direct_io=$2
	echo 0 > "${stats_attr}" &&
	local fio_args wh &&
	for wh in none short medium long extreme; do
		if [ "${direct_io}" = 0 ]; then
			sync
			echo 1 > /proc/sys/vm/drop_caches
		fi
		fio_args=(
			--bs=4K
			--buffer_pattern='"'"$wh"'"'
			--direct="${direct_io}"
			--disable_clat=1
			--disable_slat=1
			--end_fsync=$((1 - direct_io))
			--filename="${dev}"
			--group_reporting=1
			--gtod_reduce=1
			--ioengine="$3"
			--ioscheduler=none
			--name=whint_"$wh"
			--norandommap
			--rw=randwrite
			--size=4M
			--thread=1
			--write_hint="$wh"
		)
		echo "fio ${fio_args[*]}" >>"${FULL}" 2>&1
		fio "${fio_args[@]}" >>"${FULL}" 2>&1 || return $?
	done &&
	grep -v ' 0$' "${stats_attr}" >> "${FULL}"
	while read -r group count; do
		if [ "$count" -gt 0 ]; then echo "$group"; fi
	done < "${stats_attr}"
}

test() {
	echo "Running ${TEST_NAME}"

	local scsi_debug_params=(
		delay=0
		dev_size_mb=1024
		sector_size=4096
	)
	_init_scsi_debug "${scsi_debug_params[@]}" &&
	local dev="/dev/${SCSI_DEBUG_DEVICES[0]}" fail &&
	ls -ldi "${dev}" >>"${FULL}" &&
	submit_io "Direct I/O" 1 pvsync &&
	submit_io "Direct I/O" 1 libaio &&
	submit_io "Direct I/O" 1 io_uring &&
	submit_io "Buffered I/O" 0 pvsync ||
	fail=true

	_exit_scsi_debug

	if [ -z "$fail" ]; then
		echo "Test complete"
	else
		echo "Test failed"
		return 1
	fi
}

Expected output:

Running scsi/097
Direct I/O (pvsync)
1
2
3
4
5
Direct I/O (libaio)
1
2
3
4
5
Direct I/O (io_uring)
1
2
3
4
5
Buffered I/O (pvsync)
1
2
3
4
5
Test complete

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/6] fs: Fix rw_hint validation
  2024-01-31 20:52 [PATCH 0/6] Restore data lifetime support Bart Van Assche
@ 2024-01-31 20:52 ` Bart Van Assche
  2024-01-31 20:52 ` [PATCH 2/6] fs: Verify write lifetime constants at compile time Bart Van Assche
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2024-01-31 20:52 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro
  Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Martin K . Petersen,
	Bart Van Assche, Kanchan Joshi, Jeff Layton, Chuck Lever,
	Stephen Rothwell

Reject values that are valid rw_hints after truncation but not before
truncation by passing an untruncated value to rw_hint_valid().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 5657cb0797c4 ("fs/fcntl: use copy_to/from_user() for u64 types")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 fs/fcntl.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index c80a6acad742..3ff707bf2743 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -268,7 +268,7 @@ static int f_getowner_uids(struct file *filp, unsigned long arg)
 }
 #endif
 
-static bool rw_hint_valid(enum rw_hint hint)
+static bool rw_hint_valid(u64 hint)
 {
 	switch (hint) {
 	case RWH_WRITE_LIFE_NOT_SET:
@@ -288,19 +288,17 @@ static long fcntl_rw_hint(struct file *file, unsigned int cmd,
 {
 	struct inode *inode = file_inode(file);
 	u64 __user *argp = (u64 __user *)arg;
-	enum rw_hint hint;
-	u64 h;
+	u64 hint;
 
 	switch (cmd) {
 	case F_GET_RW_HINT:
-		h = inode->i_write_hint;
-		if (copy_to_user(argp, &h, sizeof(*argp)))
+		hint = inode->i_write_hint;
+		if (copy_to_user(argp, &hint, sizeof(*argp)))
 			return -EFAULT;
 		return 0;
 	case F_SET_RW_HINT:
-		if (copy_from_user(&h, argp, sizeof(h)))
+		if (copy_from_user(&hint, argp, sizeof(hint)))
 			return -EFAULT;
-		hint = (enum rw_hint) h;
 		if (!rw_hint_valid(hint))
 			return -EINVAL;
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/6] fs: Verify write lifetime constants at compile time
  2024-01-31 20:52 [PATCH 0/6] Restore data lifetime support Bart Van Assche
  2024-01-31 20:52 ` [PATCH 1/6] fs: Fix rw_hint validation Bart Van Assche
@ 2024-01-31 20:52 ` Bart Van Assche
  2024-01-31 20:52 ` [PATCH 3/6] fs: Split fcntl_rw_hint() Bart Van Assche
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2024-01-31 20:52 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro
  Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Martin K . Petersen,
	Bart Van Assche, Johannes Thumshirn

The code in fs/fcntl.c converts RWH_* constants to and from WRITE_LIFE_*
constants using casts. Verify at compile time that these casts will yield
the intended effect.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 fs/fcntl.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 3ff707bf2743..f3bc4662455f 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -270,6 +270,13 @@ static int f_getowner_uids(struct file *filp, unsigned long arg)
 
 static bool rw_hint_valid(u64 hint)
 {
+	BUILD_BUG_ON(WRITE_LIFE_NOT_SET != RWH_WRITE_LIFE_NOT_SET);
+	BUILD_BUG_ON(WRITE_LIFE_NONE != RWH_WRITE_LIFE_NONE);
+	BUILD_BUG_ON(WRITE_LIFE_SHORT != RWH_WRITE_LIFE_SHORT);
+	BUILD_BUG_ON(WRITE_LIFE_MEDIUM != RWH_WRITE_LIFE_MEDIUM);
+	BUILD_BUG_ON(WRITE_LIFE_LONG != RWH_WRITE_LIFE_LONG);
+	BUILD_BUG_ON(WRITE_LIFE_EXTREME != RWH_WRITE_LIFE_EXTREME);
+
 	switch (hint) {
 	case RWH_WRITE_LIFE_NOT_SET:
 	case RWH_WRITE_LIFE_NONE:

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/6] fs: Split fcntl_rw_hint()
  2024-01-31 20:52 [PATCH 0/6] Restore data lifetime support Bart Van Assche
  2024-01-31 20:52 ` [PATCH 1/6] fs: Fix rw_hint validation Bart Van Assche
  2024-01-31 20:52 ` [PATCH 2/6] fs: Verify write lifetime constants at compile time Bart Van Assche
@ 2024-01-31 20:52 ` Bart Van Assche
  2024-01-31 21:07   ` Dave Chinner
  2024-01-31 20:52 ` [PATCH 4/6] fs: Move enum rw_hint into a new header file Bart Van Assche
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2024-01-31 20:52 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro
  Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Martin K . Petersen,
	Bart Van Assche, Kanchan Joshi, Jeff Layton, Chuck Lever,
	Stephen Rothwell

Split fcntl_rw_hint() such that there is one helper function per fcntl. No
functionality is changed by this patch.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 fs/fcntl.c | 47 ++++++++++++++++++++++++++---------------------
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index f3bc4662455f..5fa2d95114bf 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -290,32 +290,35 @@ static bool rw_hint_valid(u64 hint)
 	}
 }
 
-static long fcntl_rw_hint(struct file *file, unsigned int cmd,
-			  unsigned long arg)
+static long fcntl_get_rw_hint(struct file *file, unsigned int cmd,
+			      unsigned long arg)
 {
 	struct inode *inode = file_inode(file);
 	u64 __user *argp = (u64 __user *)arg;
-	u64 hint;
+	u64 hint = inode->i_write_hint;
 
-	switch (cmd) {
-	case F_GET_RW_HINT:
-		hint = inode->i_write_hint;
-		if (copy_to_user(argp, &hint, sizeof(*argp)))
-			return -EFAULT;
-		return 0;
-	case F_SET_RW_HINT:
-		if (copy_from_user(&hint, argp, sizeof(hint)))
-			return -EFAULT;
-		if (!rw_hint_valid(hint))
-			return -EINVAL;
+	if (copy_to_user(argp, &hint, sizeof(*argp)))
+		return -EFAULT;
+	return 0;
+}
 
-		inode_lock(inode);
-		inode->i_write_hint = hint;
-		inode_unlock(inode);
-		return 0;
-	default:
+static long fcntl_set_rw_hint(struct file *file, unsigned int cmd,
+			      unsigned long arg)
+{
+	struct inode *inode = file_inode(file);
+	u64 __user *argp = (u64 __user *)arg;
+	u64 hint;
+
+	if (copy_from_user(&hint, argp, sizeof(hint)))
+		return -EFAULT;
+	if (!rw_hint_valid(hint))
 		return -EINVAL;
-	}
+
+	inode_lock(inode);
+	inode->i_write_hint = hint;
+	inode_unlock(inode);
+
+	return 0;
 }
 
 static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
@@ -421,8 +424,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		err = memfd_fcntl(filp, cmd, argi);
 		break;
 	case F_GET_RW_HINT:
+		err = fcntl_get_rw_hint(filp, cmd, arg);
+		break;
 	case F_SET_RW_HINT:
-		err = fcntl_rw_hint(filp, cmd, arg);
+		err = fcntl_set_rw_hint(filp, cmd, arg);
 		break;
 	default:
 		break;

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/6] fs: Move enum rw_hint into a new header file
  2024-01-31 20:52 [PATCH 0/6] Restore data lifetime support Bart Van Assche
                   ` (2 preceding siblings ...)
  2024-01-31 20:52 ` [PATCH 3/6] fs: Split fcntl_rw_hint() Bart Van Assche
@ 2024-01-31 20:52 ` Bart Van Assche
  2024-01-31 20:52 ` [PATCH 5/6] fs: Propagate write hints to the struct block_device inode Bart Van Assche
  2024-01-31 20:52 ` [PATCH 6/6] block, fs: Restore the per-bio/request data lifetime fields Bart Van Assche
  5 siblings, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2024-01-31 20:52 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro
  Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Martin K . Petersen,
	Bart Van Assche, Chao Yu, Jan Kara

Move enum rw_hint into a new header file to prepare for using this data
type in the block layer. Add the attribute __packed to reduce the space
occupied by instances of this data type from four bytes to one byte.
Change the data type of i_write_hint from u8 into enum rw_hint.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Chao Yu <chao@kernel.org> # for the F2FS part
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 fs/f2fs/f2fs.h          |  1 +
 fs/fcntl.c              |  1 +
 fs/inode.c              |  1 +
 include/linux/fs.h      | 16 ++--------------
 include/linux/rw_hint.h | 24 ++++++++++++++++++++++++
 5 files changed, 29 insertions(+), 14 deletions(-)
 create mode 100644 include/linux/rw_hint.h

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index dda0aed95464..4a4e60cdac4e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -24,6 +24,7 @@
 #include <linux/blkdev.h>
 #include <linux/quotaops.h>
 #include <linux/part_stat.h>
+#include <linux/rw_hint.h>
 #include <crypto/hash.h>
 
 #include <linux/fscrypt.h>
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 5fa2d95114bf..fc73c5fae43c 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -27,6 +27,7 @@
 #include <linux/memfd.h>
 #include <linux/compat.h>
 #include <linux/mount.h>
+#include <linux/rw_hint.h>
 
 #include <linux/poll.h>
 #include <asm/siginfo.h>
diff --git a/fs/inode.c b/fs/inode.c
index 6d0d54230363..585d79d40158 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -20,6 +20,7 @@
 #include <linux/ratelimit.h>
 #include <linux/list_lru.h>
 #include <linux/iversion.h>
+#include <linux/rw_hint.h>
 #include <trace/events/writeback.h>
 #include "internal.h"
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ebce4763b4bb..f3ba0895bcd7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -43,6 +43,7 @@
 #include <linux/cred.h>
 #include <linux/mnt_idmapping.h>
 #include <linux/slab.h>
+#include <linux/rw_hint.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -309,19 +310,6 @@ struct address_space;
 struct writeback_control;
 struct readahead_control;
 
-/*
- * Write life time hint values.
- * Stored in struct inode as u8.
- */
-enum rw_hint {
-	WRITE_LIFE_NOT_SET	= 0,
-	WRITE_LIFE_NONE		= RWH_WRITE_LIFE_NONE,
-	WRITE_LIFE_SHORT	= RWH_WRITE_LIFE_SHORT,
-	WRITE_LIFE_MEDIUM	= RWH_WRITE_LIFE_MEDIUM,
-	WRITE_LIFE_LONG		= RWH_WRITE_LIFE_LONG,
-	WRITE_LIFE_EXTREME	= RWH_WRITE_LIFE_EXTREME,
-};
-
 /* Match RWF_* bits to IOCB bits */
 #define IOCB_HIPRI		(__force int) RWF_HIPRI
 #define IOCB_DSYNC		(__force int) RWF_DSYNC
@@ -677,7 +665,7 @@ struct inode {
 	spinlock_t		i_lock;	/* i_blocks, i_bytes, maybe i_size */
 	unsigned short          i_bytes;
 	u8			i_blkbits;
-	u8			i_write_hint;
+	enum rw_hint		i_write_hint;
 	blkcnt_t		i_blocks;
 
 #ifdef __NEED_I_SIZE_ORDERED
diff --git a/include/linux/rw_hint.h b/include/linux/rw_hint.h
new file mode 100644
index 000000000000..309ca72f2dfb
--- /dev/null
+++ b/include/linux/rw_hint.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_RW_HINT_H
+#define _LINUX_RW_HINT_H
+
+#include <linux/build_bug.h>
+#include <linux/compiler_attributes.h>
+#include <uapi/linux/fcntl.h>
+
+/* Block storage write lifetime hint values. */
+enum rw_hint {
+	WRITE_LIFE_NOT_SET	= RWH_WRITE_LIFE_NOT_SET,
+	WRITE_LIFE_NONE		= RWH_WRITE_LIFE_NONE,
+	WRITE_LIFE_SHORT	= RWH_WRITE_LIFE_SHORT,
+	WRITE_LIFE_MEDIUM	= RWH_WRITE_LIFE_MEDIUM,
+	WRITE_LIFE_LONG		= RWH_WRITE_LIFE_LONG,
+	WRITE_LIFE_EXTREME	= RWH_WRITE_LIFE_EXTREME,
+} __packed;
+
+/* Sparse ignores __packed annotations on enums, hence the #ifndef below. */
+#ifndef __CHECKER__
+static_assert(sizeof(enum rw_hint) == 1);
+#endif
+
+#endif /* _LINUX_RW_HINT_H */

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 5/6] fs: Propagate write hints to the struct block_device inode
  2024-01-31 20:52 [PATCH 0/6] Restore data lifetime support Bart Van Assche
                   ` (3 preceding siblings ...)
  2024-01-31 20:52 ` [PATCH 4/6] fs: Move enum rw_hint into a new header file Bart Van Assche
@ 2024-01-31 20:52 ` Bart Van Assche
  2024-01-31 20:52 ` [PATCH 6/6] block, fs: Restore the per-bio/request data lifetime fields Bart Van Assche
  5 siblings, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2024-01-31 20:52 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro
  Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Martin K . Petersen,
	Bart Van Assche, Kanchan Joshi, Jeff Layton, Chuck Lever

Write hints applied with F_SET_RW_HINT on a block device affect the
block device inode only. Propagate these hints to the inode associated
with struct block_device because that is the inode used when writing
back dirty pages.

Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 fs/fcntl.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index fc73c5fae43c..cfb52c3a4577 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -319,6 +319,17 @@ static long fcntl_set_rw_hint(struct file *file, unsigned int cmd,
 	inode->i_write_hint = hint;
 	inode_unlock(inode);
 
+	/*
+	 * file->f_mapping->host may differ from inode. As an example,
+	 * blkdev_open() modifies file->f_mapping.
+	 */
+	if (file->f_mapping->host != inode) {
+		inode = file->f_mapping->host;
+		inode_lock(inode);
+		inode->i_write_hint = hint;
+		inode_unlock(inode);
+	}
+
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 6/6] block, fs: Restore the per-bio/request data lifetime fields
  2024-01-31 20:52 [PATCH 0/6] Restore data lifetime support Bart Van Assche
                   ` (4 preceding siblings ...)
  2024-01-31 20:52 ` [PATCH 5/6] fs: Propagate write hints to the struct block_device inode Bart Van Assche
@ 2024-01-31 20:52 ` Bart Van Assche
  5 siblings, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2024-01-31 20:52 UTC (permalink / raw)
  To: Christian Brauner, Alexander Viro
  Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Martin K . Petersen,
	Bart Van Assche, Kanchan Joshi

Restore support for passing data lifetime information from filesystems to
block drivers. This patch reverts commit b179c98f7697 ("block: Remove
request.write_hint") and commit c75e707fe1aa ("block: remove the
per-bio/request write hint").

This patch does not modify the size of struct bio because the new
bi_write_hint member fills a hole in struct bio. pahole reports the
following for struct bio on an x86_64 system with this patch applied:

        /* size: 112, cachelines: 2, members: 20 */
        /* sum members: 110, holes: 1, sum holes: 2 */
        /* last cacheline: 48 bytes */

Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/bio.c                 |  2 ++
 block/blk-crypto-fallback.c |  1 +
 block/blk-merge.c           |  8 ++++++++
 block/blk-mq.c              |  2 ++
 block/bounce.c              |  1 +
 block/fops.c                |  3 +++
 fs/buffer.c                 | 12 ++++++++----
 fs/direct-io.c              |  2 ++
 fs/iomap/buffered-io.c      |  2 ++
 fs/iomap/direct-io.c        |  1 +
 fs/mpage.c                  |  1 +
 include/linux/blk-mq.h      |  2 ++
 include/linux/blk_types.h   |  2 ++
 13 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index b9642a41f286..c9223e9d31da 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -251,6 +251,7 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
 	bio->bi_opf = opf;
 	bio->bi_flags = 0;
 	bio->bi_ioprio = 0;
+	bio->bi_write_hint = 0;
 	bio->bi_status = 0;
 	bio->bi_iter.bi_sector = 0;
 	bio->bi_iter.bi_size = 0;
@@ -813,6 +814,7 @@ static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
 {
 	bio_set_flag(bio, BIO_CLONED);
 	bio->bi_ioprio = bio_src->bi_ioprio;
+	bio->bi_write_hint = bio_src->bi_write_hint;
 	bio->bi_iter = bio_src->bi_iter;
 
 	if (bio->bi_bdev) {
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index e6468eab2681..b1e7415f8439 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -172,6 +172,7 @@ static struct bio *blk_crypto_fallback_clone_bio(struct bio *bio_src)
 	if (bio_flagged(bio_src, BIO_REMAPPED))
 		bio_set_flag(bio, BIO_REMAPPED);
 	bio->bi_ioprio		= bio_src->bi_ioprio;
+	bio->bi_write_hint	= bio_src->bi_write_hint;
 	bio->bi_iter.bi_sector	= bio_src->bi_iter.bi_sector;
 	bio->bi_iter.bi_size	= bio_src->bi_iter.bi_size;
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2d470cf2173e..2a06fd33039d 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -810,6 +810,10 @@ static struct request *attempt_merge(struct request_queue *q,
 	if (rq_data_dir(req) != rq_data_dir(next))
 		return NULL;
 
+	/* Don't merge requests with different write hints. */
+	if (req->write_hint != next->write_hint)
+		return NULL;
+
 	if (req->ioprio != next->ioprio)
 		return NULL;
 
@@ -937,6 +941,10 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
 	if (!bio_crypt_rq_ctx_compatible(rq, bio))
 		return false;
 
+	/* Don't merge requests with different write hints. */
+	if (rq->write_hint != bio->bi_write_hint)
+		return false;
+
 	if (rq->ioprio != bio_prio(bio))
 		return false;
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index aa87fcfda1ec..34ceb15d2ea4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2585,6 +2585,7 @@ static void blk_mq_bio_to_request(struct request *rq, struct bio *bio,
 		rq->cmd_flags |= REQ_FAILFAST_MASK;
 
 	rq->__sector = bio->bi_iter.bi_sector;
+	rq->write_hint = bio->bi_write_hint;
 	blk_rq_bio_prep(rq, bio, nr_segs);
 
 	/* This can't fail, since GFP_NOIO includes __GFP_DIRECT_RECLAIM. */
@@ -3185,6 +3186,7 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
 	}
 	rq->nr_phys_segments = rq_src->nr_phys_segments;
 	rq->ioprio = rq_src->ioprio;
+	rq->write_hint = rq_src->write_hint;
 
 	if (rq->bio && blk_crypto_rq_bio_prep(rq, rq->bio, gfp_mask) < 0)
 		goto free_and_out;
diff --git a/block/bounce.c b/block/bounce.c
index 7cfcb242f9a1..d6a5219f29dd 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -169,6 +169,7 @@ static struct bio *bounce_clone_bio(struct bio *bio_src)
 	if (bio_flagged(bio_src, BIO_REMAPPED))
 		bio_set_flag(bio, BIO_REMAPPED);
 	bio->bi_ioprio		= bio_src->bi_ioprio;
+	bio->bi_write_hint	= bio_src->bi_write_hint;
 	bio->bi_iter.bi_sector	= bio_src->bi_iter.bi_sector;
 	bio->bi_iter.bi_size	= bio_src->bi_iter.bi_size;
 
diff --git a/block/fops.c b/block/fops.c
index 0cf8cf72cdfa..ab0e37d1dc48 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -73,6 +73,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 		bio_init(&bio, bdev, vecs, nr_pages, dio_bio_write_op(iocb));
 	}
 	bio.bi_iter.bi_sector = pos >> SECTOR_SHIFT;
+	bio.bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
 	bio.bi_ioprio = iocb->ki_ioprio;
 
 	ret = bio_iov_iter_get_pages(&bio, iter);
@@ -203,6 +204,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 
 	for (;;) {
 		bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
+		bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
 		bio->bi_private = dio;
 		bio->bi_end_io = blkdev_bio_end_io;
 		bio->bi_ioprio = iocb->ki_ioprio;
@@ -321,6 +323,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	dio->flags = 0;
 	dio->iocb = iocb;
 	bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
+	bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
 	bio->bi_end_io = blkdev_bio_end_io_async;
 	bio->bi_ioprio = iocb->ki_ioprio;
 
diff --git a/fs/buffer.c b/fs/buffer.c
index b55dea034a5d..d6b64124977a 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -55,7 +55,7 @@
 
 static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
 static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
-			  struct writeback_control *wbc);
+			  enum rw_hint hint, struct writeback_control *wbc);
 
 #define BH_ENTRY(list) list_entry((list), struct buffer_head, b_assoc_buffers)
 
@@ -1889,7 +1889,8 @@ int __block_write_full_folio(struct inode *inode, struct folio *folio,
 	do {
 		struct buffer_head *next = bh->b_this_page;
 		if (buffer_async_write(bh)) {
-			submit_bh_wbc(REQ_OP_WRITE | write_flags, bh, wbc);
+			submit_bh_wbc(REQ_OP_WRITE | write_flags, bh,
+				      inode->i_write_hint, wbc);
 			nr_underway++;
 		}
 		bh = next;
@@ -1944,7 +1945,8 @@ int __block_write_full_folio(struct inode *inode, struct folio *folio,
 		struct buffer_head *next = bh->b_this_page;
 		if (buffer_async_write(bh)) {
 			clear_buffer_dirty(bh);
-			submit_bh_wbc(REQ_OP_WRITE | write_flags, bh, wbc);
+			submit_bh_wbc(REQ_OP_WRITE | write_flags, bh,
+				      inode->i_write_hint, wbc);
 			nr_underway++;
 		}
 		bh = next;
@@ -2756,6 +2758,7 @@ static void end_bio_bh_io_sync(struct bio *bio)
 }
 
 static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
+			  enum rw_hint write_hint,
 			  struct writeback_control *wbc)
 {
 	const enum req_op op = opf & REQ_OP_MASK;
@@ -2783,6 +2786,7 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
 	fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
 
 	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
+	bio->bi_write_hint = write_hint;
 
 	__bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
 
@@ -2802,7 +2806,7 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
 
 void submit_bh(blk_opf_t opf, struct buffer_head *bh)
 {
-	submit_bh_wbc(opf, bh, NULL);
+	submit_bh_wbc(opf, bh, WRITE_LIFE_NOT_SET, NULL);
 }
 EXPORT_SYMBOL(submit_bh);
 
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 60456263a338..62c97ff9e852 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -410,6 +410,8 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 		bio->bi_end_io = dio_bio_end_io;
 	if (dio->is_pinned)
 		bio_set_flag(bio, BIO_PAGE_PINNED);
+	bio->bi_write_hint = file_inode(dio->iocb->ki_filp)->i_write_hint;
+
 	sdio->bio = bio;
 	sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
 }
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 093c4515b22a..c961511bece1 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1667,6 +1667,7 @@ iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
 			       REQ_OP_WRITE | wbc_to_write_flags(wbc),
 			       GFP_NOFS, &iomap_ioend_bioset);
 	bio->bi_iter.bi_sector = sector;
+	bio->bi_write_hint = inode->i_write_hint;
 	wbc_init_bio(wbc, bio);
 
 	ioend = container_of(bio, struct iomap_ioend, io_inline_bio);
@@ -1697,6 +1698,7 @@ iomap_chain_bio(struct bio *prev)
 	new = bio_alloc(prev->bi_bdev, BIO_MAX_VECS, prev->bi_opf, GFP_NOFS);
 	bio_clone_blkg_association(new, prev);
 	new->bi_iter.bi_sector = bio_end_sector(prev);
+	new->bi_write_hint = prev->bi_write_hint;
 
 	bio_chain(prev, new);
 	bio_get(prev);		/* for iomap_finish_ioend */
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index bcd3f8cf5ea4..f3b43d223a46 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -380,6 +380,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
 					  GFP_KERNEL);
 		bio->bi_iter.bi_sector = iomap_sector(iomap, pos);
+		bio->bi_write_hint = inode->i_write_hint;
 		bio->bi_ioprio = dio->iocb->ki_ioprio;
 		bio->bi_private = dio;
 		bio->bi_end_io = iomap_dio_bio_end_io;
diff --git a/fs/mpage.c b/fs/mpage.c
index 738882e0766d..fa8b99a199fa 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -605,6 +605,7 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
 				GFP_NOFS);
 		bio->bi_iter.bi_sector = first_block << (blkbits - 9);
 		wbc_init_bio(wbc, bio);
+		bio->bi_write_hint = inode->i_write_hint;
 	}
 
 	/*
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 7a8150a5f051..492b0128b5d9 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -8,6 +8,7 @@
 #include <linux/scatterlist.h>
 #include <linux/prefetch.h>
 #include <linux/srcu.h>
+#include <linux/rw_hint.h>
 
 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -135,6 +136,7 @@ struct request {
 	struct blk_crypto_keyslot *crypt_keyslot;
 #endif
 
+	enum rw_hint write_hint;
 	unsigned short ioprio;
 
 	enum mq_rq_state state;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index f288c94374b3..12d87cef2c03 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -10,6 +10,7 @@
 #include <linux/bvec.h>
 #include <linux/device.h>
 #include <linux/ktime.h>
+#include <linux/rw_hint.h>
 
 struct bio_set;
 struct bio;
@@ -269,6 +270,7 @@ struct bio {
 						 */
 	unsigned short		bi_flags;	/* BIO_* below */
 	unsigned short		bi_ioprio;
+	enum rw_hint		bi_write_hint;
 	blk_status_t		bi_status;
 	atomic_t		__bi_remaining;
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/6] fs: Split fcntl_rw_hint()
  2024-01-31 20:52 ` [PATCH 3/6] fs: Split fcntl_rw_hint() Bart Van Assche
@ 2024-01-31 21:07   ` Dave Chinner
  2024-02-01 17:38     ` Bart Van Assche
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2024-01-31 21:07 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christian Brauner, Alexander Viro, linux-fsdevel, Jens Axboe,
	Christoph Hellwig, Martin K . Petersen, Kanchan Joshi,
	Jeff Layton, Chuck Lever, Stephen Rothwell

On Wed, Jan 31, 2024 at 12:52:34PM -0800, Bart Van Assche wrote:
> Split fcntl_rw_hint() such that there is one helper function per fcntl. No
> functionality is changed by this patch.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Suggested-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
> Cc: Jeff Layton <jlayton@kernel.org>
> Cc: Chuck Lever <chuck.lever@oracle.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  fs/fcntl.c | 47 ++++++++++++++++++++++++++---------------------
>  1 file changed, 26 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index f3bc4662455f..5fa2d95114bf 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -290,32 +290,35 @@ static bool rw_hint_valid(u64 hint)
>  	}
>  }
>  
> -static long fcntl_rw_hint(struct file *file, unsigned int cmd,
> -			  unsigned long arg)
> +static long fcntl_get_rw_hint(struct file *file, unsigned int cmd,
> +			      unsigned long arg)
>  {
>  	struct inode *inode = file_inode(file);
>  	u64 __user *argp = (u64 __user *)arg;
> -	u64 hint;
> +	u64 hint = inode->i_write_hint;
>  
> -	switch (cmd) {
> -	case F_GET_RW_HINT:
> -		hint = inode->i_write_hint;
> -		if (copy_to_user(argp, &hint, sizeof(*argp)))
> -			return -EFAULT;
> -		return 0;
> -	case F_SET_RW_HINT:
> -		if (copy_from_user(&hint, argp, sizeof(hint)))
> -			return -EFAULT;
> -		if (!rw_hint_valid(hint))
> -			return -EINVAL;
> +	if (copy_to_user(argp, &hint, sizeof(*argp)))
> +		return -EFAULT;
> +	return 0;
> +}
>  
> -		inode_lock(inode);
> -		inode->i_write_hint = hint;
> -		inode_unlock(inode);
> -		return 0;
> -	default:
> +static long fcntl_set_rw_hint(struct file *file, unsigned int cmd,
> +			      unsigned long arg)
> +{
> +	struct inode *inode = file_inode(file);
> +	u64 __user *argp = (u64 __user *)arg;
> +	u64 hint;
> +
> +	if (copy_from_user(&hint, argp, sizeof(hint)))
> +		return -EFAULT;
> +	if (!rw_hint_valid(hint))
>  		return -EINVAL;
> -	}
> +
> +	inode_lock(inode);
> +	inode->i_write_hint = hint;
> +	inode_unlock(inode);

What is this locking serialising against? The inode may or may not
be locked when we access this in IO path, so why isn't this just
WRITE_ONCE() here and READ_ONCE() in the IO paths?

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/6] fs: Split fcntl_rw_hint()
  2024-01-31 21:07   ` Dave Chinner
@ 2024-02-01 17:38     ` Bart Van Assche
  0 siblings, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2024-02-01 17:38 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christian Brauner, Alexander Viro, linux-fsdevel, Jens Axboe,
	Christoph Hellwig, Martin K . Petersen, Kanchan Joshi,
	Jeff Layton, Chuck Lever, Stephen Rothwell

On 1/31/24 13:07, Dave Chinner wrote:
> On Wed, Jan 31, 2024 at 12:52:34PM -0800, Bart Van Assche wrote:
>> +	inode_lock(inode);
>> +	inode->i_write_hint = hint;
>> +	inode_unlock(inode);
> 
> What is this locking serialising against? The inode may or may not
> be locked when we access this in IO path, so why isn't this just
> WRITE_ONCE() here and READ_ONCE() in the IO paths?

How about using WRITE_ONCE()/READ_ONCE() in the fcntl implementations and
regular reads in the I/O paths? Using F_SET_RW_HINT while I/O is ongoing
is racy - there are no guarantees about how F_SET_RW_HINT will affect I/O
that has already been submitted. Hence, I think that it is acceptable to
use regular reads for i_write_hint in the I/O paths.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-02-01 17:38 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-31 20:52 [PATCH 0/6] Restore data lifetime support Bart Van Assche
2024-01-31 20:52 ` [PATCH 1/6] fs: Fix rw_hint validation Bart Van Assche
2024-01-31 20:52 ` [PATCH 2/6] fs: Verify write lifetime constants at compile time Bart Van Assche
2024-01-31 20:52 ` [PATCH 3/6] fs: Split fcntl_rw_hint() Bart Van Assche
2024-01-31 21:07   ` Dave Chinner
2024-02-01 17:38     ` Bart Van Assche
2024-01-31 20:52 ` [PATCH 4/6] fs: Move enum rw_hint into a new header file Bart Van Assche
2024-01-31 20:52 ` [PATCH 5/6] fs: Propagate write hints to the struct block_device inode Bart Van Assche
2024-01-31 20:52 ` [PATCH 6/6] block, fs: Restore the per-bio/request data lifetime fields Bart Van Assche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).