Linux io-uring development
From: Hannes Reinecke <hare@suse.de>
To: Kanchan Joshi <joshi.k@samsung.com>,
	axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
	martin.petersen@oracle.com, brauner@kernel.org,
	viro@zeniv.linux.org.uk, jack@suse.cz, jaegeuk@kernel.org,
	bcrl@kvack.org, dhowells@redhat.com, bvanassche@acm.org,
	asml.silence@gmail.com
Cc: linux-nvme@lists.infradead.org, linux-fsdevel@vger.kernel.org,
	io-uring@vger.kernel.org, linux-block@vger.kernel.org,
	linux-aio@kvack.org, gost.dev@samsung.com, vishak.g@samsung.com,
	javier.gonz@samsung.com, Nitesh Shetty <nj.shetty@samsung.com>
Subject: Re: [PATCH v6 3/3] io_uring: enable per-io hinting capability
Date: Wed, 25 Sep 2024 07:57:31 +0200
Message-ID: <28419703-681c-4d8c-9450-bdc2aff19d56@suse.de>
In-Reply-To: <20240924092457.7846-4-joshi.k@samsung.com>

On 9/24/24 11:24, Kanchan Joshi wrote:
> With F_SET_RW_HINT fcntl, user can set a hint on the file inode, and
> all the subsequent writes on the file pass that hint value down.
> This can be limiting for large files (and for block device) as all the
> writes can be tagged with only one lifetime hint value.
> Concurrent writes (with different hint values) are hard to manage.
> Per-IO hinting solves that problem.
> 
> Allow userspace to pass the write hint type and its value in the SQE.
> Two new fields are carved in the leftover space of SQE:
> 	__u8 hint_type;
> 	__u64 hint_val;
> 
> Adding the hint_type helps in keeping the interface extensible for future
> use.
> At this point only one type TYPE_WRITE_LIFETIME_HINT is supported. With
> this type, user can pass the lifetime hint values that are currently
> supported by F_SET_RW_HINT fcntl.
> 
> The write handlers (io_prep_rw, io_write) process the hint type/value
> and hint value is passed to lower-layer using kiocb. This is good for
> supporting direct IO, but not when kiocb is not available (e.g.,
> buffered IO).
> 
> In general, per-io hints take the precedence on per-inode hints.
> Three cases to consider:
> 
> Case 1: When hint_type is 0 (explicitly, or implicitly as SQE fields are
> initialized to 0), this means user did not send any hint. The per-inode
> hint values are set in the kiocb (as before).
> 
> Case 2: When hint_type is TYPE_WRITE_LIFETIME_HINT, the hint_value is
> set into the kiocb after sanity checking.
> 
> Case 3: When hint_type is anything else, this is flagged as an error
> and write is failed.
> 
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> ---
>   fs/fcntl.c                    | 22 ----------------------
>   include/linux/rw_hint.h       | 24 ++++++++++++++++++++++++
>   include/uapi/linux/io_uring.h | 10 ++++++++++
>   io_uring/rw.c                 | 21 ++++++++++++++++++++-
>   4 files changed, 54 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index 081e5e3d89ea..2eb78035a350 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -334,28 +334,6 @@ static int f_getowner_uids(struct file *filp, unsigned long arg)
>   }
>   #endif
>   
> -static bool rw_hint_valid(u64 hint)
> -{
> -	BUILD_BUG_ON(WRITE_LIFE_NOT_SET != RWH_WRITE_LIFE_NOT_SET);
> -	BUILD_BUG_ON(WRITE_LIFE_NONE != RWH_WRITE_LIFE_NONE);
> -	BUILD_BUG_ON(WRITE_LIFE_SHORT != RWH_WRITE_LIFE_SHORT);
> -	BUILD_BUG_ON(WRITE_LIFE_MEDIUM != RWH_WRITE_LIFE_MEDIUM);
> -	BUILD_BUG_ON(WRITE_LIFE_LONG != RWH_WRITE_LIFE_LONG);
> -	BUILD_BUG_ON(WRITE_LIFE_EXTREME != RWH_WRITE_LIFE_EXTREME);
> -
> -	switch (hint) {
> -	case RWH_WRITE_LIFE_NOT_SET:
> -	case RWH_WRITE_LIFE_NONE:
> -	case RWH_WRITE_LIFE_SHORT:
> -	case RWH_WRITE_LIFE_MEDIUM:
> -	case RWH_WRITE_LIFE_LONG:
> -	case RWH_WRITE_LIFE_EXTREME:
> -		return true;
> -	default:
> -		return false;
> -	}
> -}
> -
>   static long fcntl_get_rw_hint(struct file *file, unsigned int cmd,
>   			      unsigned long arg)
>   {
> diff --git a/include/linux/rw_hint.h b/include/linux/rw_hint.h
> index 309ca72f2dfb..f4373a71ffed 100644
> --- a/include/linux/rw_hint.h
> +++ b/include/linux/rw_hint.h
> @@ -21,4 +21,28 @@ enum rw_hint {
>   static_assert(sizeof(enum rw_hint) == 1);
>   #endif
>   
> +#define	WRITE_LIFE_INVALID	(RWH_WRITE_LIFE_EXTREME + 1)
> +
> +static inline bool rw_hint_valid(u64 hint)
> +{
> +	BUILD_BUG_ON(WRITE_LIFE_NOT_SET != RWH_WRITE_LIFE_NOT_SET);
> +	BUILD_BUG_ON(WRITE_LIFE_NONE != RWH_WRITE_LIFE_NONE);
> +	BUILD_BUG_ON(WRITE_LIFE_SHORT != RWH_WRITE_LIFE_SHORT);
> +	BUILD_BUG_ON(WRITE_LIFE_MEDIUM != RWH_WRITE_LIFE_MEDIUM);
> +	BUILD_BUG_ON(WRITE_LIFE_LONG != RWH_WRITE_LIFE_LONG);
> +	BUILD_BUG_ON(WRITE_LIFE_EXTREME != RWH_WRITE_LIFE_EXTREME);
> +
> +	switch (hint) {
> +	case RWH_WRITE_LIFE_NOT_SET:
> +	case RWH_WRITE_LIFE_NONE:
> +	case RWH_WRITE_LIFE_SHORT:
> +	case RWH_WRITE_LIFE_MEDIUM:
> +	case RWH_WRITE_LIFE_LONG:
> +	case RWH_WRITE_LIFE_EXTREME:
> +		return true;
> +	default:
> +		return false;
> +	}
> +}
> +
>   #endif /* _LINUX_RW_HINT_H */
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index 1fe79e750470..e21a74dd0c49 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -98,6 +98,11 @@ struct io_uring_sqe {
>   			__u64	addr3;
>   			__u64	__pad2[1];
>   		};
> +		struct {
> +			/* To send per-io hint type/value with write command */
> +			__u64	hint_val;
> +			__u8	hint_type;
> +		};
Why is 'hint_val' 64 bits? Everything else is 8 bits, so wouldn't it
be better to shorten that? As it stands the new struct will introduce
a hole of 7 bytes after 'hint_type'.
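
For reference, the layout can be checked from userspace. The sketch
below is only an illustration: 'struct hint_fields' and
'struct addr3_fields' are hypothetical stand-ins for the two anonymous
structs in the union, not the real io_uring_sqe, and
hint_tail_padding() and print_layout() are helper names made up here.
Assuming a typical 64-bit ABI where a 64-bit integer is 8-byte aligned,
it reports 7 bytes of padding after 'hint_type' and a 16-byte struct,
the same size as the addr3/__pad2 pair.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the new anonymous struct in the SQE union. */
struct hint_fields {
	uint64_t hint_val;
	uint8_t  hint_type;
};

/* Hypothetical stand-in for the existing addr3/__pad2 union member. */
struct addr3_fields {
	uint64_t addr3;
	uint64_t pad2[1];
};

/* Bytes the compiler adds after hint_type to pad the struct out. */
static size_t hint_tail_padding(void)
{
	return sizeof(struct hint_fields)
		- offsetof(struct hint_fields, hint_type)
		- sizeof(uint8_t);
}

/* Print the sizes and offsets so they can be compared by eye. */
static void print_layout(void)
{
	printf("sizeof(hint_fields)  = %zu\n", sizeof(struct hint_fields));
	printf("sizeof(addr3_fields) = %zu\n", sizeof(struct addr3_fields));
	printf("offsetof(hint_type)  = %zu\n",
	       offsetof(struct hint_fields, hint_type));
	printf("padding after type   = %zu\n", hint_tail_padding());
}
```

Calling print_layout() from a small main() is enough to see the
numbers. On ABIs with weaker alignment for 64-bit integers (32-bit x86,
for example) the padding may differ, so treat the figures as
ABI-dependent.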

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

