Linux block layer
 help / color / mirror / Atom feed
* Re: [PATCHv2 3/3] block: relax direct io memory alignment
From: Eric Biggers @ 2022-05-19  1:53 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, linux-fsdevel, linux-block, axboe, Kernel Team, hch,
	bvanassche, damien.lemoal
In-Reply-To: <YoWWtwsiKGqoTbVU@kbusch-mbp.dhcp.thefacebook.com>

On Wed, May 18, 2022 at 07:00:39PM -0600, Keith Busch wrote:
> On Wed, May 18, 2022 at 05:14:49PM -0700, Eric Biggers wrote:
> > On Wed, May 18, 2022 at 10:11:31AM -0700, Keith Busch wrote:
> > > diff --git a/block/fops.c b/block/fops.c
> > > index b9b83030e0df..d8537c29602f 100644
> > > --- a/block/fops.c
> > > +++ b/block/fops.c
> > > @@ -54,8 +54,9 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
> > >  	struct bio bio;
> > >  	ssize_t ret;
> > >  
> > > -	if ((pos | iov_iter_alignment(iter)) &
> > > -	    (bdev_logical_block_size(bdev) - 1))
> > > +	if ((pos | iov_iter_count(iter)) & (bdev_logical_block_size(bdev) - 1))
> > > +		return -EINVAL;
> > > +	if (iov_iter_alignment(iter) & bdev_dma_alignment(bdev))
> > >  		return -EINVAL;
> > 
> > The block layer makes a lot of assumptions that bios can be split at any bvec
> > boundary.  With this patch, bios whose length isn't a multiple of the logical
> > block size can be generated by splitting, which isn't valid.
> 
> How? This patch ensures every segment is block size aligned.

No, it doesn't.  It ensures that the *total* length of each bio is logical block
size aligned.  It doesn't ensure that for the individual bvecs.  By decreasing
the required memory alignment to below the logical block size, you're allowing
logical blocks to span a page boundary.  Whenever the two pages involved aren't
physically contiguous, the data of the block will be split across two bvecs.

> > Also some devices aren't compatible with logical blocks spanning bdevs at all.
> > dm-crypt errors out in this case, for example.
> 
> I'm sorry, but I am not understanding this.

I meant to write bvecs, not bdevs.

- Eric

^ permalink raw reply

* Re: [PATCH -next v2 6/6] nbd: use pr_err to output error message
From: yukuai (C) @ 2022-05-19  1:06 UTC (permalink / raw)
  To: Joe Perches, josef, axboe, ming.lei
  Cc: linux-block, nbd, linux-kernel, yi.zhang
In-Reply-To: <f0acebb66b9b46ad472e0d0989dc0f5810cac3dd.camel@perches.com>

在 2022/05/19 2:12, Joe Perches 写道:
> On Wed, 2022-05-18 at 20:26 +0800, Yu Kuai wrote:
>> Instead of using the long printk(KERN_ERR "nbd: ...") to
>> output error message, defining pr_fmt and using
>> the short pr_err("") to do that. The replacemen is done
>> by using the following command:
>>
>>    sed -i 's/printk(KERN_ERR "nbd: /pr_err("/g' \
>> 		  drivers/block/nbd.c
> 
> It's also good to rewrap to 80 columns where possible.
> 
>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> []
>> @@ -2130,13 +2130,13 @@ static int nbd_genl_disconnect(struct sk_buff *skb, struct genl_info *info)
>>   	nbd = idr_find(&nbd_index_idr, index);
>>   	if (!nbd) {
>>   		mutex_unlock(&nbd_index_mutex);
>> -		printk(KERN_ERR "nbd: couldn't find device at index %d\n",
>> +		pr_err("couldn't find device at index %d\n",
>>   		       index);
> 
> like here
> 
>>   		return -EINVAL;
>>   	}
>>   	if (!refcount_inc_not_zero(&nbd->refs)) {
>>   		mutex_unlock(&nbd_index_mutex);
>> -		printk(KERN_ERR "nbd: device at index %d is going down\n",
>> +		pr_err("device at index %d is going down\n",
>>   		       index);
> 
> and here and below...
Hi, Josef

Thanks for your advice, I'll send a new version.
> 
>> @@ -2170,7 +2170,7 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
>>   	nbd = idr_find(&nbd_index_idr, index);
>>   	if (!nbd) {
>>   		mutex_unlock(&nbd_index_mutex);
>> -		printk(KERN_ERR "nbd: couldn't find a device at index %d\n",
>> +		pr_err("couldn't find a device at index %d\n",
>>   		       index);
>>   		return -EINVAL;
>>   	}
>> @@ -2192,7 +2192,7 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
>>   	}
>>   	if (!refcount_inc_not_zero(&nbd->refs)) {
>>   		mutex_unlock(&nbd_index_mutex);
>> -		printk(KERN_ERR "nbd: device at index %d is going down\n",
>> +		pr_err("device at index %d is going down\n",
>>   		       index);
>>   		return -EINVAL;
>>   	}
> 
> 
> 
> .
> 

^ permalink raw reply

* Re: [PATCHv2 0/3] direct io alignment relax
From: Chaitanya Kulkarni @ 2022-05-19  1:02 UTC (permalink / raw)
  To: Keith Busch, Eric Biggers
  Cc: Keith Busch, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, axboe@kernel.dk, Kernel Team,
	hch@lst.de, bvanassche@acm.org, damien.lemoal@opensource.wdc.com
In-Reply-To: <YoWUnTxag7TsCBwa@kbusch-mbp.dhcp.thefacebook.com>

On 5/18/22 17:51, Keith Busch wrote:
> On Wed, May 18, 2022 at 11:26:20PM +0000, Eric Biggers wrote:
>> On Wed, May 18, 2022 at 10:11:28AM -0700, Keith Busch wrote:
>>> From: Keith Busch <kbusch@kernel.org>
>>>
>>> Including the fs list this time.
>>>
>>> I am still working on a better interface to report the dio alignment to
>>> an application. The most recent suggestion of using statx is proving to
>>> be less straight forward than I thought, but I don't want to hold this
>>> series up for that.
>>>
>>
>> Note that I already implemented the statx support and sent it out for review:
>> https://lore.kernel.org/linux-fsdevel/20220211061158.227688-1-ebiggers@kernel.org/T/#u
>> However, the patch series only received one comment.  I can send it out again if
>> people have become interested in it again...
> 
> Thanks, I didn't see that the first time around, but I'll be sure to look at
> your new version. It sounds like you encountered the same problem I did
> regarding block device handles: the devtmpfs inodes for the raw block device
> handles are not the bdev inodes. I do think it's useful the alignment
> attributes are accessible through the block device files, though.

Irrespective of above problem, as per my review comment [1] on the
initial version of Eric's series I really want to see the generic
interface that can accommodate exposing optimal values for different
operations REQ_OP_DISCARD/REQ_OP_WRITE_ZEROES/REQ_OP_VERIFY etc.
and not only for read/write.

-ck

https://lore.kernel.org/linux-fsdevel/20220211061158.227688-1-ebiggers@kernel.org/T/#r3ffe9183c372fb97a9753e286f9cf6400e8ec272



^ permalink raw reply

* Re: [PATCHv2 3/3] block: relax direct io memory alignment
From: Keith Busch @ 2022-05-19  1:00 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Keith Busch, linux-fsdevel, linux-block, axboe, Kernel Team, hch,
	bvanassche, damien.lemoal
In-Reply-To: <YoWL+T8JiIO5Ln3h@sol.localdomain>

On Wed, May 18, 2022 at 05:14:49PM -0700, Eric Biggers wrote:
> On Wed, May 18, 2022 at 10:11:31AM -0700, Keith Busch wrote:
> > diff --git a/block/fops.c b/block/fops.c
> > index b9b83030e0df..d8537c29602f 100644
> > --- a/block/fops.c
> > +++ b/block/fops.c
> > @@ -54,8 +54,9 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
> >  	struct bio bio;
> >  	ssize_t ret;
> >  
> > -	if ((pos | iov_iter_alignment(iter)) &
> > -	    (bdev_logical_block_size(bdev) - 1))
> > +	if ((pos | iov_iter_count(iter)) & (bdev_logical_block_size(bdev) - 1))
> > +		return -EINVAL;
> > +	if (iov_iter_alignment(iter) & bdev_dma_alignment(bdev))
> >  		return -EINVAL;
> 
> The block layer makes a lot of assumptions that bios can be split at any bvec
> boundary.  With this patch, bios whose length isn't a multiple of the logical
> block size can be generated by splitting, which isn't valid.

How? This patch ensures every segment is block size aligned. We can always
split a bio in the middle of a bvec for any lower level hardware constraints,
and I'm not finding any splitting criteria that would try to break a bio on a
non-block aligned boundary.

> Also some devices aren't compatible with logical blocks spanning bdevs at all.
> dm-crypt errors out in this case, for example.

I'm sorry, but I am not understanding this.

^ permalink raw reply

* Re: [PATCHv2 0/3] direct io alignment relax
From: Keith Busch @ 2022-05-19  0:51 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Keith Busch, linux-fsdevel, linux-block, axboe, Kernel Team, hch,
	bvanassche, damien.lemoal
In-Reply-To: <YoWAnDR/XOwegQNZ@gmail.com>

On Wed, May 18, 2022 at 11:26:20PM +0000, Eric Biggers wrote:
> On Wed, May 18, 2022 at 10:11:28AM -0700, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > 
> > Including the fs list this time.
> > 
> > I am still working on a better interface to report the dio alignment to
> > an application. The most recent suggestion of using statx is proving to
> > be less straight forward than I thought, but I don't want to hold this
> > series up for that.
> > 
> 
> Note that I already implemented the statx support and sent it out for review:
> https://lore.kernel.org/linux-fsdevel/20220211061158.227688-1-ebiggers@kernel.org/T/#u
> However, the patch series only received one comment.  I can send it out again if
> people have become interested in it again...

Thanks, I didn't see that the first time around, but I'll be sure to look at
your new version. It sounds like you encountered the same problem I did
regarding block device handles: the devtmpfs inodes for the raw block device
handles are not the bdev inodes. I do think it's useful the alignment
attributes are accessible through the block device files, though.

^ permalink raw reply

* Re: [GIT PULL] Block fix for 5.18-final
From: pr-tracker-bot @ 2022-05-19  0:35 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, linux-block@vger.kernel.org
In-Reply-To: <9adae644-3856-a84e-b3d4-47106c91e09f@kernel.dk>

The pull request you sent on Wed, 18 May 2022 17:11:50 -0600:

> git://git.kernel.dk/linux-block.git tags/block-5.18-2022-05-18

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/f993aed406eaf968ba3867a76bb46c95336a33d0

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply

* Re: [PATCHv2 3/3] block: relax direct io memory alignment
From: Eric Biggers @ 2022-05-19  0:14 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, Keith Busch
In-Reply-To: <20220518171131.3525293-4-kbusch@fb.com>

On Wed, May 18, 2022 at 10:11:31AM -0700, Keith Busch wrote:
> diff --git a/block/fops.c b/block/fops.c
> index b9b83030e0df..d8537c29602f 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -54,8 +54,9 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
>  	struct bio bio;
>  	ssize_t ret;
>  
> -	if ((pos | iov_iter_alignment(iter)) &
> -	    (bdev_logical_block_size(bdev) - 1))
> +	if ((pos | iov_iter_count(iter)) & (bdev_logical_block_size(bdev) - 1))
> +		return -EINVAL;
> +	if (iov_iter_alignment(iter) & bdev_dma_alignment(bdev))
>  		return -EINVAL;

The block layer makes a lot of assumptions that bios can be split at any bvec
boundary.  With this patch, bios whose length isn't a multiple of the logical
block size can be generated by splitting, which isn't valid.

Also some devices aren't compatible with logical blocks spanning bdevs at all.
dm-crypt errors out in this case, for example.

What am I missing?

- Eric

^ permalink raw reply

* [RFC PATCH v2 4/7] f2fs: move f2fs_force_buffered_io() into file.c
From: Eric Biggers @ 2022-05-18 23:50 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, linux-api, linux-fscrypt,
	linux-block, linux-kernel, Keith Busch
In-Reply-To: <20220518235011.153058-1-ebiggers@kernel.org>

From: Eric Biggers <ebiggers@google.com>

f2fs_force_buffered_io() is only used in file.c, so move it into there.
No behavior change.  This makes it easier to review later patches.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/f2fs/f2fs.h | 45 ---------------------------------------------
 fs/f2fs/file.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 271509b1c7928..2d6492c016ad6 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4442,17 +4442,6 @@ static inline void f2fs_i_compr_blocks_update(struct inode *inode,
 	f2fs_mark_inode_dirty_sync(inode, true);
 }
 
-static inline int block_unaligned_IO(struct inode *inode,
-				struct kiocb *iocb, struct iov_iter *iter)
-{
-	unsigned int i_blkbits = READ_ONCE(inode->i_blkbits);
-	unsigned int blocksize_mask = (1 << i_blkbits) - 1;
-	loff_t offset = iocb->ki_pos;
-	unsigned long align = offset | iov_iter_alignment(iter);
-
-	return align & blocksize_mask;
-}
-
 static inline bool f2fs_allow_multi_device_dio(struct f2fs_sb_info *sbi,
 								int flag)
 {
@@ -4463,40 +4452,6 @@ static inline bool f2fs_allow_multi_device_dio(struct f2fs_sb_info *sbi,
 	return sbi->aligned_blksize;
 }
 
-static inline bool f2fs_force_buffered_io(struct inode *inode,
-				struct kiocb *iocb, struct iov_iter *iter)
-{
-	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	int rw = iov_iter_rw(iter);
-
-	if (!fscrypt_dio_supported(inode))
-		return true;
-	if (fsverity_active(inode))
-		return true;
-	if (f2fs_compressed_file(inode))
-		return true;
-
-	/* disallow direct IO if any of devices has unaligned blksize */
-	if (f2fs_is_multi_device(sbi) && !sbi->aligned_blksize)
-		return true;
-	/*
-	 * for blkzoned device, fallback direct IO to buffered IO, so
-	 * all IOs can be serialized by log-structured write.
-	 */
-	if (f2fs_sb_has_blkzoned(sbi))
-		return true;
-	if (f2fs_lfs_mode(sbi) && (rw == WRITE)) {
-		if (block_unaligned_IO(inode, iocb, iter))
-			return true;
-		if (F2FS_IO_ALIGNED(sbi))
-			return true;
-	}
-	if (is_sbi_flag_set(F2FS_I_SB(inode), SBI_CP_DISABLED))
-		return true;
-
-	return false;
-}
-
 static inline bool f2fs_need_verity(const struct inode *inode, pgoff_t idx)
 {
 	return fsverity_active(inode) &&
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 5b89af0f27f05..67f2e21ffbd67 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -807,6 +807,51 @@ int f2fs_truncate(struct inode *inode)
 	return 0;
 }
 
+static int block_unaligned_IO(struct inode *inode, struct kiocb *iocb,
+			      struct iov_iter *iter)
+{
+	unsigned int i_blkbits = READ_ONCE(inode->i_blkbits);
+	unsigned int blocksize_mask = (1 << i_blkbits) - 1;
+	loff_t offset = iocb->ki_pos;
+	unsigned long align = offset | iov_iter_alignment(iter);
+
+	return align & blocksize_mask;
+}
+
+static inline bool f2fs_force_buffered_io(struct inode *inode,
+				struct kiocb *iocb, struct iov_iter *iter)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	int rw = iov_iter_rw(iter);
+
+	if (!fscrypt_dio_supported(inode))
+		return true;
+	if (fsverity_active(inode))
+		return true;
+	if (f2fs_compressed_file(inode))
+		return true;
+
+	/* disallow direct IO if any of devices has unaligned blksize */
+	if (f2fs_is_multi_device(sbi) && !sbi->aligned_blksize)
+		return true;
+	/*
+	 * for blkzoned device, fallback direct IO to buffered IO, so
+	 * all IOs can be serialized by log-structured write.
+	 */
+	if (f2fs_sb_has_blkzoned(sbi))
+		return true;
+	if (f2fs_lfs_mode(sbi) && (rw == WRITE)) {
+		if (block_unaligned_IO(inode, iocb, iter))
+			return true;
+		if (F2FS_IO_ALIGNED(sbi))
+			return true;
+	}
+	if (is_sbi_flag_set(F2FS_I_SB(inode), SBI_CP_DISABLED))
+		return true;
+
+	return false;
+}
+
 int f2fs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 		 struct kstat *stat, u32 request_mask, unsigned int query_flags)
 {
-- 
2.36.1


^ permalink raw reply related

* [RFC PATCH v2 7/7] f2fs: support STATX_IOALIGN
From: Eric Biggers @ 2022-05-18 23:50 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, linux-api, linux-fscrypt,
	linux-block, linux-kernel, Keith Busch
In-Reply-To: <20220518235011.153058-1-ebiggers@kernel.org>

From: Eric Biggers <ebiggers@google.com>

Add support for STATX_IOALIGN to f2fs, so that I/O alignment information
is exposed to userspace in a consistent and easy-to-use way.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/f2fs/file.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index c32f7722ba6b0..f89a190949c59 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -835,6 +835,21 @@ static bool f2fs_force_buffered_io(struct inode *inode)
 	return false;
 }
 
+/* Return the maximum value of io_opt across all the filesystem's devices. */
+static unsigned int f2fs_max_io_opt(struct inode *inode)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	int io_opt = 0;
+	int i;
+
+	if (!f2fs_is_multi_device(sbi))
+		return bdev_io_opt(sbi->sb->s_bdev);
+
+	for (i = 0; i < sbi->s_ndevs; i++)
+		io_opt = max(io_opt, bdev_io_opt(FDEV(i).bdev));
+	return io_opt;
+}
+
 int f2fs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 		 struct kstat *stat, u32 request_mask, unsigned int query_flags)
 {
@@ -851,6 +866,22 @@ int f2fs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 		stat->btime.tv_nsec = fi->i_crtime.tv_nsec;
 	}
 
+	/*
+	 * Return the I/O alignment information if requested.  We only return
+	 * this information when requested, since on encrypted files it might
+	 * take a fair bit of work to get if the file wasn't opened recently.
+	 */
+	if ((request_mask & STATX_IOALIGN) && S_ISREG(inode->i_mode)) {
+		unsigned int bsize = i_blocksize(inode);
+
+		stat->result_mask |= STATX_IOALIGN;
+		if (!f2fs_force_buffered_io(inode)) {
+			stat->mem_align_dio = bsize;
+			stat->offset_align_dio = bsize;
+		}
+		stat->offset_align_optimal = max(f2fs_max_io_opt(inode), bsize);
+	}
+
 	flags = fi->i_flags;
 	if (flags & F2FS_COMPR_FL)
 		stat->attributes |= STATX_ATTR_COMPRESSED;
-- 
2.36.1


^ permalink raw reply related

* [RFC PATCH v2 6/7] f2fs: simplify f2fs_force_buffered_io()
From: Eric Biggers @ 2022-05-18 23:50 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, linux-api, linux-fscrypt,
	linux-block, linux-kernel, Keith Busch
In-Reply-To: <20220518235011.153058-1-ebiggers@kernel.org>

From: Eric Biggers <ebiggers@google.com>

f2fs only allows direct I/O that is aligned to the filesystem block
size.  Given that fact, simplify f2fs_force_buffered_io() by removing
the redundant call to block_unaligned_IO().

This makes it easier to reuse this code for STATX_IOALIGN.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/f2fs/file.c | 24 ++++--------------------
 1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 68947fe16ea35..c32f7722ba6b0 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -807,19 +807,7 @@ int f2fs_truncate(struct inode *inode)
 	return 0;
 }
 
-static int block_unaligned_IO(struct inode *inode, struct kiocb *iocb,
-			      struct iov_iter *iter)
-{
-	unsigned int i_blkbits = READ_ONCE(inode->i_blkbits);
-	unsigned int blocksize_mask = (1 << i_blkbits) - 1;
-	loff_t offset = iocb->ki_pos;
-	unsigned long align = offset | iov_iter_alignment(iter);
-
-	return align & blocksize_mask;
-}
-
-static inline bool f2fs_force_buffered_io(struct inode *inode,
-				struct kiocb *iocb, struct iov_iter *iter)
+static bool f2fs_force_buffered_io(struct inode *inode)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 
@@ -839,12 +827,8 @@ static inline bool f2fs_force_buffered_io(struct inode *inode,
 	 */
 	if (f2fs_sb_has_blkzoned(sbi))
 		return true;
-	if (f2fs_lfs_mode(sbi)) {
-		if (block_unaligned_IO(inode, iocb, iter))
-			return true;
-		if (F2FS_IO_ALIGNED(sbi))
-			return true;
-	}
+	if (f2fs_lfs_mode(sbi) && F2FS_IO_ALIGNED(sbi))
+		return true;
 	if (is_sbi_flag_set(F2FS_I_SB(inode), SBI_CP_DISABLED))
 		return true;
 
@@ -4283,7 +4267,7 @@ static bool f2fs_should_use_dio(struct inode *inode, struct kiocb *iocb,
 	if (!(iocb->ki_flags & IOCB_DIRECT))
 		return false;
 
-	if (f2fs_force_buffered_io(inode, iocb, iter))
+	if (f2fs_force_buffered_io(inode))
 		return false;
 
 	/*
-- 
2.36.1


^ permalink raw reply related

* [RFC PATCH v2 2/7] fscrypt: change fscrypt_dio_supported() to prepare for STATX_IOALIGN
From: Eric Biggers @ 2022-05-18 23:50 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, linux-api, linux-fscrypt,
	linux-block, linux-kernel, Keith Busch
In-Reply-To: <20220518235011.153058-1-ebiggers@kernel.org>

From: Eric Biggers <ebiggers@google.com>

To prepare for STATX_IOALIGN support, make two changes to
fscrypt_dio_supported().

First, remove the filesystem-block-alignment check and make the
filesystems handle it instead.  It previously made sense to have it in
fs/crypto/; however, to support STATX_IOALIGN the alignment requirement
would have to be returned to filesystems.  It ends up being simpler if
filesystems handle this part themselves, especially for f2fs which only
allows fs-block-aligned DIO in the first place.

Second, make fscrypt_dio_supported() work on inodes whose encryption key
hasn't been set up yet, by making it set up the key if needed.  This is
required for statx(), since statx() doesn't require a file descriptor.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/crypto/inline_crypt.c | 48 +++++++++++++++++++++-------------------
 fs/ext4/file.c           |  9 ++++++--
 fs/f2fs/f2fs.h           |  2 +-
 include/linux/fscrypt.h  |  7 ++----
 4 files changed, 35 insertions(+), 31 deletions(-)

diff --git a/fs/crypto/inline_crypt.c b/fs/crypto/inline_crypt.c
index 93c2ca8580923..82df4c0b9903c 100644
--- a/fs/crypto/inline_crypt.c
+++ b/fs/crypto/inline_crypt.c
@@ -370,43 +370,45 @@ bool fscrypt_mergeable_bio_bh(struct bio *bio,
 EXPORT_SYMBOL_GPL(fscrypt_mergeable_bio_bh);
 
 /**
- * fscrypt_dio_supported() - check whether a DIO (direct I/O) request is
- *			     supported as far as encryption is concerned
- * @iocb: the file and position the I/O is targeting
- * @iter: the I/O data segment(s)
+ * fscrypt_dio_supported() - check whether DIO (direct I/O) is supported on an
+ *			     inode, as far as encryption is concerned
+ * @inode: the inode in question
  *
  * Return: %true if there are no encryption constraints that prevent DIO from
  *	   being supported; %false if DIO is unsupported.  (Note that in the
  *	   %true case, the filesystem might have other, non-encryption-related
- *	   constraints that prevent DIO from actually being supported.)
+ *	   constraints that prevent DIO from actually being supported.  Also, on
+ *	   encrypted files the filesystem is still responsible for only allowing
+ *	   DIO when requests are filesystem-block-aligned.)
  */
-bool fscrypt_dio_supported(struct kiocb *iocb, struct iov_iter *iter)
+bool fscrypt_dio_supported(struct inode *inode)
 {
-	const struct inode *inode = file_inode(iocb->ki_filp);
-	const unsigned int blocksize = i_blocksize(inode);
+	int err;
 
 	/* If the file is unencrypted, no veto from us. */
 	if (!fscrypt_needs_contents_encryption(inode))
 		return true;
 
-	/* We only support DIO with inline crypto, not fs-layer crypto. */
-	if (!fscrypt_inode_uses_inline_crypto(inode))
-		return false;
-
 	/*
-	 * Since the granularity of encryption is filesystem blocks, the file
-	 * position and total I/O length must be aligned to the filesystem block
-	 * size -- not just to the block device's logical block size as is
-	 * traditionally the case for DIO on many filesystems.
+	 * We only support DIO with inline crypto, not fs-layer crypto.
 	 *
-	 * We require that the user-provided memory buffers be filesystem block
-	 * aligned too.  It is simpler to have a single alignment value required
-	 * for all properties of the I/O, as is normally the case for DIO.
-	 * Also, allowing less aligned buffers would imply that data units could
-	 * cross bvecs, which would greatly complicate the I/O stack, which
-	 * assumes that bios can be split at any bvec boundary.
+	 * To determine whether the inode is using inline crypto, we have to set
+	 * up the key if it wasn't already done.  This is because in the current
+	 * design of fscrypt, the decision of whether to use inline crypto or
+	 * not isn't made until the inode's encryption key is being set up.  In
+	 * the DIO read/write case, the key will always be set up already, since
+	 * the file will be open.  But in the case of statx(), the key might not
+	 * be set up yet, as the file might not have been opened yet.
 	 */
-	if (!IS_ALIGNED(iocb->ki_pos | iov_iter_alignment(iter), blocksize))
+	err = fscrypt_require_key(inode);
+	if (err) {
+		/*
+		 * Key unavailable or couldn't be set up.  This edge case isn't
+		 * worth worrying about; just report that DIO is unsupported.
+		 */
+		return false;
+	}
+	if (!fscrypt_inode_uses_inline_crypto(inode))
 		return false;
 
 	return true;
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 6feb07e3e1eb5..de153b508b20a 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -40,8 +40,13 @@ static bool ext4_dio_supported(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
 
-	if (!fscrypt_dio_supported(iocb, iter))
-		return false;
+	if (IS_ENCRYPTED(inode)) {
+		if (!fscrypt_dio_supported(inode))
+			return false;
+		if (!IS_ALIGNED(iocb->ki_pos | iov_iter_alignment(iter),
+				i_blocksize(inode)))
+			return false;
+	}
 	if (fsverity_active(inode))
 		return false;
 	if (ext4_should_journal_data(inode))
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8c570de21ed5a..271509b1c7928 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4469,7 +4469,7 @@ static inline bool f2fs_force_buffered_io(struct inode *inode,
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	int rw = iov_iter_rw(iter);
 
-	if (!fscrypt_dio_supported(iocb, iter))
+	if (!fscrypt_dio_supported(inode))
 		return true;
 	if (fsverity_active(inode))
 		return true;
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 50d92d805bd8c..6ca89461d48dc 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -714,7 +714,7 @@ bool fscrypt_mergeable_bio(struct bio *bio, const struct inode *inode,
 bool fscrypt_mergeable_bio_bh(struct bio *bio,
 			      const struct buffer_head *next_bh);
 
-bool fscrypt_dio_supported(struct kiocb *iocb, struct iov_iter *iter);
+bool fscrypt_dio_supported(struct inode *inode);
 
 u64 fscrypt_limit_io_blocks(const struct inode *inode, u64 lblk, u64 nr_blocks);
 
@@ -747,11 +747,8 @@ static inline bool fscrypt_mergeable_bio_bh(struct bio *bio,
 	return true;
 }
 
-static inline bool fscrypt_dio_supported(struct kiocb *iocb,
-					 struct iov_iter *iter)
+static inline bool fscrypt_dio_supported(struct inode *inode)
 {
-	const struct inode *inode = file_inode(iocb->ki_filp);
-
 	return !fscrypt_needs_contents_encryption(inode);
 }
 
-- 
2.36.1


^ permalink raw reply related

* [RFC PATCH v2 3/7] ext4: support STATX_IOALIGN
From: Eric Biggers @ 2022-05-18 23:50 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, linux-api, linux-fscrypt,
	linux-block, linux-kernel, Keith Busch
In-Reply-To: <20220518235011.153058-1-ebiggers@kernel.org>

From: Eric Biggers <ebiggers@google.com>

Add support for STATX_IOALIGN to ext4, so that I/O alignment information
is exposed to userspace in a consistent and easy-to-use way.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/ext4/ext4.h  |  1 +
 fs/ext4/file.c  | 15 ++++-----------
 fs/ext4/inode.c | 31 +++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index a743b1e3b89ec..7c43428901632 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3020,6 +3020,7 @@ extern struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
 extern int  ext4_write_inode(struct inode *, struct writeback_control *);
 extern int  ext4_setattr(struct user_namespace *, struct dentry *,
 			 struct iattr *);
+extern u32  ext4_dio_alignment(struct inode *inode);
 extern int  ext4_getattr(struct user_namespace *, const struct path *,
 			 struct kstat *, u32, unsigned int);
 extern void ext4_evict_inode(struct inode *);
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index de153b508b20a..ba2271e5287b2 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -39,19 +39,12 @@
 static bool ext4_dio_supported(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
+	u32 dio_align = ext4_dio_alignment(inode);
 
-	if (IS_ENCRYPTED(inode)) {
-		if (!fscrypt_dio_supported(inode))
-			return false;
-		if (!IS_ALIGNED(iocb->ki_pos | iov_iter_alignment(iter),
-				i_blocksize(inode)))
-			return false;
-	}
-	if (fsverity_active(inode))
-		return false;
-	if (ext4_should_journal_data(inode))
+	if (!dio_align)
 		return false;
-	if (ext4_has_inline_data(inode))
+	if (dio_align > bdev_logical_block_size(inode->i_sb->s_bdev) &&
+	    !IS_ALIGNED(iocb->ki_pos | iov_iter_alignment(iter), dio_align))
 		return false;
 	return true;
 }
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 646ece9b3455f..5af2598aa170d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5533,6 +5533,22 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 	return error;
 }
 
+u32 ext4_dio_alignment(struct inode *inode)
+{
+	if (fsverity_active(inode))
+		return 0;
+	if (ext4_should_journal_data(inode))
+		return 0;
+	if (ext4_has_inline_data(inode))
+		return 0;
+	if (IS_ENCRYPTED(inode)) {
+		if (!fscrypt_dio_supported(inode))
+			return 0;
+		return i_blocksize(inode);
+	}
+	return bdev_logical_block_size(inode->i_sb->s_bdev);
+}
+
 int ext4_getattr(struct user_namespace *mnt_userns, const struct path *path,
 		 struct kstat *stat, u32 request_mask, unsigned int query_flags)
 {
@@ -5548,6 +5564,21 @@ int ext4_getattr(struct user_namespace *mnt_userns, const struct path *path,
 		stat->btime.tv_nsec = ei->i_crtime.tv_nsec;
 	}
 
+	/*
+	 * Return the I/O alignment information if requested.  We only return
+	 * this information when requested, since on encrypted files it might
+	 * take a fair bit of work to get if the file wasn't opened recently.
+	 */
+	if ((request_mask & STATX_IOALIGN) && S_ISREG(inode->i_mode)) {
+		u32 dio_align = ext4_dio_alignment(inode);
+		unsigned int io_opt = bdev_io_opt(inode->i_sb->s_bdev);
+
+		stat->result_mask |= STATX_IOALIGN;
+		stat->mem_align_dio = dio_align;
+		stat->offset_align_dio = dio_align;
+		stat->offset_align_optimal = max(io_opt, i_blocksize(inode));
+	}
+
 	flags = ei->i_flags & EXT4_FL_USER_VISIBLE;
 	if (flags & EXT4_APPEND_FL)
 		stat->attributes |= STATX_ATTR_APPEND;
-- 
2.36.1


^ permalink raw reply related

* [RFC PATCH v2 5/7] f2fs: don't allow DIO reads but not DIO writes
From: Eric Biggers @ 2022-05-18 23:50 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, linux-api, linux-fscrypt,
	linux-block, linux-kernel, Keith Busch
In-Reply-To: <20220518235011.153058-1-ebiggers@kernel.org>

From: Eric Biggers <ebiggers@google.com>

Currently, if an f2fs filesystem is mounted with the mode=lfs and
io_bits mount options, DIO reads are allowed but DIO writes are not.
Allowing DIO reads but not DIO writes is an unusual restriction, which
is likely to be surprising to applications, namely any application that
both reads and writes from a file (using O_DIRECT).  Given this, let's
drop the support for DIO reads in this configuration.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/f2fs/file.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 67f2e21ffbd67..68947fe16ea35 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -822,7 +822,6 @@ static inline bool f2fs_force_buffered_io(struct inode *inode,
 				struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	int rw = iov_iter_rw(iter);
 
 	if (!fscrypt_dio_supported(inode))
 		return true;
@@ -840,7 +839,7 @@ static inline bool f2fs_force_buffered_io(struct inode *inode,
 	 */
 	if (f2fs_sb_has_blkzoned(sbi))
 		return true;
-	if (f2fs_lfs_mode(sbi) && (rw == WRITE)) {
+	if (f2fs_lfs_mode(sbi)) {
 		if (block_unaligned_IO(inode, iocb, iter))
 			return true;
 		if (F2FS_IO_ALIGNED(sbi))
-- 
2.36.1


^ permalink raw reply related

* [RFC PATCH v2 1/7] statx: add I/O alignment information
From: Eric Biggers @ 2022-05-18 23:50 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, linux-api, linux-fscrypt,
	linux-block, linux-kernel, Keith Busch
In-Reply-To: <20220518235011.153058-1-ebiggers@kernel.org>

From: Eric Biggers <ebiggers@google.com>

Traditionally, the conditions for when DIO (direct I/O) is supported
were fairly simple: filesystems either supported DIO aligned to the
block device's logical block size, or didn't support DIO at all.

However, due to filesystem features that have been added over time (e.g,
data journalling, inline data, encryption, verity, compression,
checkpoint disabling, log-structured mode), the conditions for when DIO
is allowed on a file have gotten increasingly complex.  Whether a
particular file supports DIO, and with what alignment, can depend on
various file attributes and filesystem mount options, as well as which
block device(s) the file's data is located on.

XFS has an ioctl XFS_IOC_DIOINFO which exposes this information to
applications.  However, as discussed
(https://lore.kernel.org/linux-fsdevel/20220120071215.123274-1-ebiggers@kernel.org/T/#u),
this ioctl is rarely used and not known to be used outside of
XFS-specific code.  It also was never intended to indicate when a file
doesn't support DIO at all, and it only exposes the minimum I/O
alignment, not the optimal I/O alignment which has been requested too.

Therefore, let's expose this information via statx().  Add the
STATX_IOALIGN flag and three fields associated with it:

* stx_mem_align_dio: the alignment (in bytes) required for user memory
  buffers for DIO, or 0 if DIO is not supported on the file.

* stx_offset_align_dio: the alignment (in bytes) required for file
  offsets and I/O segment lengths for DIO, or 0 if DIO is not supported
  on the file.  This will only be nonzero if stx_mem_align_dio is
  nonzero, and vice versa.

* stx_offset_align_optimal: the alignment (in bytes) suggested for file
  offsets and I/O segment lengths to get optimal performance.  This
  applies to both DIO and buffered I/O.  It differs from stx_blocksize
  in that stx_offset_align_optimal will contain the real optimum I/O
  size, which may be a large value.  In contrast, for compatibility
  reasons stx_blocksize is the minimum size needed to avoid page cache
  read/write/modify cycles, which may be much smaller than the optimum
  I/O size.  For more details about the motivation for this field, see
  https://lore.kernel.org/r/20220210040304.GM59729@dread.disaster.area

Note that as with other statx() extensions, if STATX_IOALIGN isn't set
in the returned statx struct, then these new fields won't be filled in.
This will happen if the filesystem doesn't support STATX_IOALIGN, or if
the file isn't a regular file.  (It might be supported on block device
files in the future.)  It might also happen if the caller didn't include
STATX_IOALIGN in the request mask, since statx() isn't required to
return information that wasn't requested.

This commit adds the VFS-level plumbing for STATX_IOALIGN.  Individual
filesystems will still need to add code to support it.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/stat.c                 | 3 +++
 include/linux/stat.h      | 3 +++
 include/uapi/linux/stat.h | 9 +++++++--
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/stat.c b/fs/stat.c
index 5c2c94464e8b0..9d477218545b8 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -611,6 +611,9 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
 	tmp.stx_dev_major = MAJOR(stat->dev);
 	tmp.stx_dev_minor = MINOR(stat->dev);
 	tmp.stx_mnt_id = stat->mnt_id;
+	tmp.stx_mem_align_dio = stat->mem_align_dio;
+	tmp.stx_offset_align_dio = stat->offset_align_dio;
+	tmp.stx_offset_align_optimal = stat->offset_align_optimal;
 
 	return copy_to_user(buffer, &tmp, sizeof(tmp)) ? -EFAULT : 0;
 }
diff --git a/include/linux/stat.h b/include/linux/stat.h
index 7df06931f25d8..48b8b1ad1567c 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -50,6 +50,9 @@ struct kstat {
 	struct timespec64 btime;			/* File creation time */
 	u64		blocks;
 	u64		mnt_id;
+	u32		mem_align_dio;
+	u32		offset_align_dio;
+	u32		offset_align_optimal;
 };
 
 #endif
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 1500a0f58041a..f822b23e81091 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -124,9 +124,13 @@ struct statx {
 	__u32	stx_dev_minor;
 	/* 0x90 */
 	__u64	stx_mnt_id;
-	__u64	__spare2;
+	__u32	stx_mem_align_dio;	/* Memory buffer alignment for direct I/O */
+	__u32	stx_offset_align_dio;	/* File offset alignment for direct I/O */
 	/* 0xa0 */
-	__u64	__spare3[12];	/* Spare space for future expansion */
+	__u32	stx_offset_align_optimal; /* Optimal file offset alignment for I/O */
+	__u32	__spare2;
+	/* 0xa8 */
+	__u64	__spare3[11];	/* Spare space for future expansion */
 	/* 0x100 */
 };
 
@@ -152,6 +156,7 @@ struct statx {
 #define STATX_BASIC_STATS	0x000007ffU	/* The stuff in the normal stat struct */
 #define STATX_BTIME		0x00000800U	/* Want/got stx_btime */
 #define STATX_MNT_ID		0x00001000U	/* Got stx_mnt_id */
+#define STATX_IOALIGN		0x00002000U	/* Want/got IO alignment info */
 
 #define STATX__RESERVED		0x80000000U	/* Reserved for future struct statx expansion */
 
-- 
2.36.1


^ permalink raw reply related

* [RFC PATCH v2 0/7] make statx() return I/O alignment information
From: Eric Biggers @ 2022-05-18 23:50 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, linux-api, linux-fscrypt,
	linux-block, linux-kernel, Keith Busch

This patchset makes the statx() system call return I/O alignment
information, roughly following the design that was suggested at
https://lore.kernel.org/linux-fsdevel/20220120071215.123274-1-ebiggers@kernel.org/T/#u

This feature solves two problems: (a) it allows userspace to determine
when a file supports direct I/O, and with what alignment restrictions;
and (b) it allows userspace to determine the optimum I/O alignment for a
file.  For more details, see patch 1.

This is an RFC.  I'd greatly appreciate any feedback on the UAPI, as
that obviously needs to be gotten right from the beginning.  E.g., does
the proposed set of fields make sense?  Am I including the right
information in stx_offset_align_optimal?

Patch 1 adds the VFS support for STATX_IOALIGN.  The remaining patches
wire it up to ext4 and f2fs.  Support for other filesystems can be added
later.  We could also support this on block device files; however, since
block device nodes have different inodes from the block devices
themselves, it wouldn't apply to statx("/dev/$foo") but rather just to
'fd = open("/dev/foo"); statx(fd)'.  I'm unsure how useful that would be.

Note, f2fs has one corner case where DIO reads are allowed but not DIO
writes.  The proposed statx fields can't represent this.  My proposal
(patch 5) is to just eliminate this case, as it seems much too weird.
But I'd appreciate any feedback on that part.

This patchset applies to v5.18-rc7.

No changes since v1, which I sent a few months ago; I'm resending this
because people seem interested in it again
(https://lore.kernel.org/r/20220518171131.3525293-1-kbusch@fb.com).

Eric Biggers (7):
  statx: add I/O alignment information
  fscrypt: change fscrypt_dio_supported() to prepare for STATX_IOALIGN
  ext4: support STATX_IOALIGN
  f2fs: move f2fs_force_buffered_io() into file.c
  f2fs: don't allow DIO reads but not DIO writes
  f2fs: simplify f2fs_force_buffered_io()
  f2fs: support STATX_IOALIGN

 fs/crypto/inline_crypt.c  | 48 +++++++++++++++---------------
 fs/ext4/ext4.h            |  1 +
 fs/ext4/file.c            | 10 +++----
 fs/ext4/inode.c           | 31 ++++++++++++++++++++
 fs/f2fs/f2fs.h            | 45 -----------------------------
 fs/f2fs/file.c            | 61 ++++++++++++++++++++++++++++++++++++++-
 fs/stat.c                 |  3 ++
 include/linux/fscrypt.h   |  7 ++---
 include/linux/stat.h      |  3 ++
 include/uapi/linux/stat.h |  9 ++++--
 10 files changed, 136 insertions(+), 82 deletions(-)


base-commit: 42226c989789d8da4af1de0c31070c96726d990c
-- 
2.36.1


^ permalink raw reply

* Re: [PATCHv2 0/3] direct io alignment relax
From: Eric Biggers @ 2022-05-18 23:26 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, Keith Busch
In-Reply-To: <20220518171131.3525293-1-kbusch@fb.com>

On Wed, May 18, 2022 at 10:11:28AM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Including the fs list this time.
> 
> I am still working on a better interface to report the dio alignment to
> an application. The most recent suggestion of using statx is proving to
> be less straight forward than I thought, but I don't want to hold this
> series up for that.
> 

Note that I already implemented the statx support and sent it out for review:
https://lore.kernel.org/linux-fsdevel/20220211061158.227688-1-ebiggers@kernel.org/T/#u
However, the patch series only received one comment.  I can send it out again if
people have become interested in it again...

- Eric

^ permalink raw reply

* Re: [PATCH 3/3] blk-mq: remove the done argument to blk_execute_rq_nowait
From: kernel test robot @ 2022-05-18 23:23 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: kbuild-all, Ming Lei, linux-block, linux-nvme, linux-scsi,
	target-devel
In-Reply-To: <20220517064901.3059255-4-hch@lst.de>

Hi Christoph,

I love your patch! Yet something to improve:

[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on next-20220518]
[cannot apply to mkp-scsi/for-next jejb-scsi/for-next linus/master v5.18-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Christoph-Hellwig/blk-mq-remove-__blk_execute_rq_nowait/20220517-154900
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: arc-allyesconfig (https://download.01.org/0day-ci/archive/20220519/202205190712.zyCIh9kG-lkp@intel.com/config)
compiler: arceb-elf-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/2dc03b4b4f1f1aa542a1ab6d6ff64be3d9db050c
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Christoph-Hellwig/blk-mq-remove-__blk_execute_rq_nowait/20220517-154900
        git checkout 2dc03b4b4f1f1aa542a1ab6d6ff64be3d9db050c
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=arc SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/scsi/ufs/ufshpb.c: In function 'ufshpb_execute_map_req':
>> drivers/scsi/ufs/ufshpb.c:721:41: error: expected ';' before 'hpb'
     721 |         blk_execute_rq_nowait(req, true)
         |                                         ^
         |                                         ;
     722 | 
     723 |         hpb->stats.map_req_cnt++;
         |         ~~~                              


vim +721 drivers/scsi/ufs/ufshpb.c

   683	
   684	static int ufshpb_execute_map_req(struct ufshpb_lu *hpb,
   685					  struct ufshpb_req *map_req, bool last)
   686	{
   687		struct request_queue *q;
   688		struct request *req;
   689		struct scsi_cmnd *scmd;
   690		int mem_size = hpb->srgn_mem_size;
   691		int ret = 0;
   692		int i;
   693	
   694		q = hpb->sdev_ufs_lu->request_queue;
   695		for (i = 0; i < hpb->pages_per_srgn; i++) {
   696			ret = bio_add_pc_page(q, map_req->bio, map_req->rb.mctx->m_page[i],
   697					      PAGE_SIZE, 0);
   698			if (ret != PAGE_SIZE) {
   699				dev_err(&hpb->sdev_ufs_lu->sdev_dev,
   700					   "bio_add_pc_page fail %d - %d\n",
   701					   map_req->rb.rgn_idx, map_req->rb.srgn_idx);
   702				return ret;
   703			}
   704		}
   705	
   706		req = map_req->req;
   707	
   708		blk_rq_append_bio(req, map_req->bio);
   709	
   710		req->end_io_data = map_req;
   711		req->end_io = ufshpb_map_req_compl_fn;
   712	
   713		if (unlikely(last))
   714			mem_size = hpb->last_srgn_entries * HPB_ENTRY_SIZE;
   715	
   716		scmd = blk_mq_rq_to_pdu(req);
   717		ufshpb_set_read_buf_cmd(scmd->cmnd, map_req->rb.rgn_idx,
   718					map_req->rb.srgn_idx, mem_size);
   719		scmd->cmd_len = HPB_READ_BUFFER_CMD_LENGTH;
   720	
 > 721		blk_execute_rq_nowait(req, true)
   722	
   723		hpb->stats.map_req_cnt++;
   724		return 0;
   725	}
   726	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply

* [GIT PULL] Block fix for 5.18-final
From: Jens Axboe @ 2022-05-18 23:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-block@vger.kernel.org

Hi Linus,

Just a small fix for a missing fifo time assigment for the head
insertion case in mq-deadline.

Please pull!


The following changes since commit f1c8781ac9d87650ccf45a354c0bbfa3f9230371:

  s390/dasd: Use kzalloc instead of kmalloc/memset (2022-05-05 20:08:27 -0600)

are available in the Git repository at:

  git://git.kernel.dk/linux-block.git tags/block-5.18-2022-05-18

for you to fetch changes up to 725f22a1477c9c15aa67ad3af96fe28ec4fe72d2:

  block/mq-deadline: Set the fifo_time member also if inserting at head (2022-05-13 17:02:46 -0600)

----------------------------------------------------------------
block-5.18-2022-05-18

----------------------------------------------------------------
Bart Van Assche (1):
      block/mq-deadline: Set the fifo_time member also if inserting at head

 block/mq-deadline.c | 1 +
 1 file changed, 1 insertion(+)

-- 
Jens Axboe


^ permalink raw reply

* Re: [dm-devel] [PATCH v4 00/13] support non power of 2 zoned devices
From: Luis Chamberlain @ 2022-05-18 23:06 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Christoph Hellwig, Pankaj Raghav, axboe, pankydev8, gost.dev,
	damien.lemoal, jiangbo.365, linux-nvme, linux-kernel, linux-block,
	linux-fsdevel, dm-devel, dsterba, linux-btrfs
In-Reply-To: <YoPAnj9ufkt5nh1G@mit.edu>

On Tue, May 17, 2022 at 11:34:54AM -0400, Theodore Ts'o wrote:
> On Tue, May 17, 2022 at 10:10:48AM +0200, Christoph Hellwig wrote:
> > I'm a little surprised about all this activity.
> > 
> > I though the conclusion at LSF/MM was that for Linux itself there
> > is very little benefit in supporting this scheme.  It will massively
> > fragment the supported based of devices and applications, while only
> > having the benefit of supporting some Samsung legacy devices.
> 
> FWIW,
> 
> That wasn't my impression from that LSF/MM session, but once the
> videos become available, folks can decide for themselves.

Agreed, contrary to conventional storage devices, with the zone storage
ecosystem we simply have a requirement of zone drive replacements matching
zone size. That requirement exists for po2 or npo2. The work in this patch
set proves that supporting npo2 was in the end straight forward. As the one
putting together the BoF I can say that there were no sticking points raised
to move forward with this when the topic came up. So I am very surprised to
hear about any other perceived conclusion.

  Luis

^ permalink raw reply

* Re: [PATCHv2 0/3] direct io alignment relax
From: Jens Axboe @ 2022-05-18 22:45 UTC (permalink / raw)
  To: Keith Busch, linux-fsdevel, linux-block
  Cc: Kernel Team, hch, bvanassche, damien.lemoal, Keith Busch
In-Reply-To: <20220518171131.3525293-1-kbusch@fb.com>

On 5/18/22 11:11 AM, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Including the fs list this time.
> 
> I am still working on a better interface to report the dio alignment to
> an application. The most recent suggestion of using statx is proving to
> be less straight forward than I thought, but I don't want to hold this
> series up for that.

This looks good to me. Anyone object to queueing this one up?

-- 
Jens Axboe


^ permalink raw reply

* Re: cleanup blk_execute_rq*
From: Jens Axboe @ 2022-05-18 22:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ming Lei, linux-block, linux-nvme, linux-scsi, target-devel
In-Reply-To: <20220517064901.3059255-1-hch@lst.de>

On 5/17/22 12:48 AM, Christoph Hellwig wrote:
> Hi Jens,
> 
> this series cleans up the blk_execute_rq* helpers.  It simplifies the
> plugging mess a bit, fixes the sparse __bitwise warnings and simplifies
> the blk_execute_rq_nowait API a bit.

Looks good to me, but let's do this series post flushing out the
initial bits. It ends up depending on the passthrough changes,
yet also conflicts with the nvme changes on the driver side.

-- 
Jens Axboe


^ permalink raw reply

* Re: [PATCH 1/3] blk-mq: remove __blk_execute_rq_nowait
From: Jens Axboe @ 2022-05-18 22:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ming Lei, linux-block, linux-nvme, linux-scsi, target-devel
In-Reply-To: <20220517064901.3059255-2-hch@lst.de>

On 5/17/22 12:48 AM, Christoph Hellwig wrote:
> +	} else {
> +		/*
> +		 * Prevent hang_check timer from firing at us during very long
> +		 * I/O
> +		 */
> +		unsigned long hang_check = 
> +			sysctl_hung_task_timeout_secs;

Trailing whitespace and odd formatting here, I fixed it up.

-- 
Jens Axboe


^ permalink raw reply

* [PATCH] blk-cgroup: delete rcu_read_lock_held() WARN_ON_ONCE()
From: Jens Axboe @ 2022-05-18 22:35 UTC (permalink / raw)
  To: linux-block@vger.kernel.org; +Cc: Marek Szyprowski

A previous commit got rid of unnecessary rcu_read_lock() inside the
IRQ disabling queue_lock, but this debug statement was left. It's now
firing since we are indeed not inside a RCU read lock, but we don't
need to be as we're still preempt safe.

Get rid of the check, as we have a lockdep assert for holding the
queue lock right after it anyway.

Link: https://lore.kernel.org/linux-block/46253c48-81cb-0787-20ad-9133afdd9e21@samsung.com/
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Fixes: 77c570a1ea85 ("blk-cgroup: Remove unnecessary rcu_read_lock/unlock()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 0676cf7a41b5..40161a3f68d0 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -298,7 +298,6 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 	struct blkcg_gq *blkg;
 	int i, ret;
 
-	WARN_ON_ONCE(!rcu_read_lock_held());
 	lockdep_assert_held(&q->queue_lock);
 
 	/* request_queue is dying, do not create/recreate a blkg */

-- 
Jens Axboe


^ permalink raw reply related

* Re: [PATCH] blk-cgroup: Remove unnecessary rcu_read_lock/unlock()
From: Jens Axboe @ 2022-05-18 22:31 UTC (permalink / raw)
  To: Marek Szyprowski, bh1scw, tj
  Cc: cgroups, linux-block, linux-kernel, songmuchun
In-Reply-To: <1dad86bb-ae31-5bf8-5810-9e81c68be8ff@kernel.dk>

On 5/18/22 4:29 PM, Jens Axboe wrote:
> On 5/18/22 1:28 PM, Marek Szyprowski wrote:
>> On 16.05.2022 19:39, bh1scw@gmail.com wrote:
>>> From: Fanjun Kong <bh1scw@gmail.com>
>>>
>>> spin_lock_irq/spin_unlock_irq contains preempt_disable/enable().
>>> Which can serve as RCU read-side critical region, so remove
>>> rcu_read_lock/unlock().
>>>
>>> Signed-off-by: Fanjun Kong <bh1scw@gmail.com>
>>
>> This patch landed in today's linux next-20220518 as commit 77c570a1ea85 
>> ("blk-cgroup: Remove unnecessary rcu_read_lock/unlock()").
>>
>> Unfortunately it triggers the following warning on ARM64 based Raspberry 
>> Pi 4B board:>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 1 at block/blk-cgroup.c:301 blkg_create+0x398/0x4e0
> 
> Should this use rcu_read_lock_any_held() rather than
> rcu_read_lock_held()?

I think the better alternative is just to delete the WARN_ON(), we have
a:

lockdep_assert_held(&q->queue_lock);

right after it. Since the queue_lock is IRQ disabling, having two checks
serves no purpose. I'll kill the line.

-- 
Jens Axboe


^ permalink raw reply

* Re: [PATCH] blk-cgroup: Remove unnecessary rcu_read_lock/unlock()
From: Jens Axboe @ 2022-05-18 22:29 UTC (permalink / raw)
  To: Marek Szyprowski, bh1scw, tj
  Cc: cgroups, linux-block, linux-kernel, songmuchun
In-Reply-To: <46253c48-81cb-0787-20ad-9133afdd9e21@samsung.com>

On 5/18/22 1:28 PM, Marek Szyprowski wrote:
> On 16.05.2022 19:39, bh1scw@gmail.com wrote:
>> From: Fanjun Kong <bh1scw@gmail.com>
>>
>> spin_lock_irq/spin_unlock_irq contains preempt_disable/enable().
>> Which can serve as RCU read-side critical region, so remove
>> rcu_read_lock/unlock().
>>
>> Signed-off-by: Fanjun Kong <bh1scw@gmail.com>
> 
> This patch landed in today's linux next-20220518 as commit 77c570a1ea85 
> ("blk-cgroup: Remove unnecessary rcu_read_lock/unlock()").
> 
> Unfortunately it triggers the following warning on ARM64 based Raspberry 
> Pi 4B board:>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1 at block/blk-cgroup.c:301 blkg_create+0x398/0x4e0

Should this use rcu_read_lock_any_held() rather than rcu_read_lock_held()?

-- 
Jens Axboe


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox