public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2 0/2] block: try to make aligned bio in case of big chunk IO
@ 2023-11-09  8:28 Ming Lei
  2023-11-09  8:28 ` [PATCH V2 1/2] block: don't call into iov_iter_revert if nothing is left Ming Lei
  2023-11-09  8:28 ` [PATCH V2 2/2] block: try to make aligned bio in case of big chunk IO Ming Lei
  0 siblings, 2 replies; 6+ messages in thread
From: Ming Lei @ 2023-11-09  8:28 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Ed Tsai, Ming Lei

Hello,

The 1st adds check to avoid to call into iov_iter_revert() with zero
'unroll'.

The 2nd patch improves big chunk sequential IO performance by aligning
bio with max io size.

V2:
	- add patch 1/2
	- add 'max_size' hint to iov_iter_extract_pages() so that bio/iov
	  iter revert can be minimized, as suggested by Christoph

BTW, the check on 'bio->bi_bdev' isn't removed because there are more
such patterns(P2PDMA, block size check) in __bio_iov_iter_get_pages(),
and we need to audit that bio->bi_bdev is really non-NULL first, so it
shouldn't be part of this patchset

Ming Lei (2):
  block: don't call into iov_iter_revert if nothing is left
  block: try to make aligned bio in case of big chunk IO

 block/bio.c            | 126 +++++++++++++++++++++++++++++++++++++++--
 include/linux/blkdev.h |   5 ++
 2 files changed, 125 insertions(+), 6 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH V2 1/2] block: don't call into iov_iter_revert if nothing is left
  2023-11-09  8:28 [PATCH V2 0/2] block: try to make aligned bio in case of big chunk IO Ming Lei
@ 2023-11-09  8:28 ` Ming Lei
  2023-11-09 14:35   ` Christoph Hellwig
  2023-11-09  8:28 ` [PATCH V2 2/2] block: try to make aligned bio in case of big chunk IO Ming Lei
  1 sibling, 1 reply; 6+ messages in thread
From: Ming Lei @ 2023-11-09  8:28 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Ed Tsai, Ming Lei

Most of times all pages are just added to bio, and nothing is left, so
not necessary to call into iov_iter_revert().

Same with block size alignment handling.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bio.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 816d412c06e9..09a5e71a0372 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1262,8 +1262,11 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 
 	if (bio->bi_bdev) {
 		size_t trim = size & (bdev_logical_block_size(bio->bi_bdev) - 1);
-		iov_iter_revert(iter, trim);
-		size -= trim;
+
+		if (trim) {
+			iov_iter_revert(iter, trim);
+			size -= trim;
+		}
 	}
 
 	if (unlikely(!size)) {
@@ -1286,7 +1289,8 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		offset = 0;
 	}
 
-	iov_iter_revert(iter, left);
+	if (left)
+		iov_iter_revert(iter, left);
 out:
 	while (i < nr_pages)
 		bio_release_page(bio, pages[i++]);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH V2 2/2] block: try to make aligned bio in case of big chunk IO
  2023-11-09  8:28 [PATCH V2 0/2] block: try to make aligned bio in case of big chunk IO Ming Lei
  2023-11-09  8:28 ` [PATCH V2 1/2] block: don't call into iov_iter_revert if nothing is left Ming Lei
@ 2023-11-09  8:28 ` Ming Lei
  2023-11-09 14:39   ` Christoph Hellwig
  2023-11-09 19:30   ` kernel test robot
  1 sibling, 2 replies; 6+ messages in thread
From: Ming Lei @ 2023-11-09  8:28 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Ed Tsai, Ming Lei

In case of big chunk sequential IO, bio size is often not aligned with
queue's max IO size because of multipage bvec, and unaligned & small bio
can be caused by bio split, then sequential IO perf drops, so try to align
bio with max IO size for avoiding this issue.

Provide 'max_size' hint to iov_iter_extract_pages() when this bio is close
to be full, and try to keep bio aligned with max IO size, so that we can
minimize bio & iov_iter revert. In my 1GB IO test over VM with 2G ram,
when memory becomes highly fragmented, revert ratio(revert bytes/buf size)
can be kept as small as 0.5% with this algorithm.

Ed Tsai reported that this change improves 64MB read/write by > 15%~25% in
Antutu V10 Storage Test.

Reported-by: Ed Tsai <ed.tsai@mediatek.com>
Closes: https://lore.kernel.org/linux-block/20231025092255.27930-1-ed.tsai@mediatek.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bio.c            | 116 +++++++++++++++++++++++++++++++++++++++--
 include/linux/blkdev.h |   5 ++
 2 files changed, 118 insertions(+), 3 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 09a5e71a0372..e360ac052764 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1210,6 +1210,57 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
 	return 0;
 }
 
+/*
+ * Figure out max_size hint of iov_iter_extract_pages() for minimizing
+ * bio & iov iter revert so that bio can be aligned with max io size.
+ */
+static unsigned int bio_get_buffer_size_hint(const struct bio *bio,
+		unsigned int left)
+{
+	unsigned int nr_bvecs = bio->bi_max_vecs - bio->bi_vcnt;
+	unsigned int size, predicted_space, max_bytes;
+	unsigned int space = nr_bvecs << PAGE_SHIFT;
+	unsigned int align_deviation;
+
+	/* If we have enough space really, just try to get all pages */
+	if (!bio->bi_bdev || nr_bvecs >= (bio->bi_max_vecs / 4) ||
+			!bio->bi_vcnt || left <= space)
+		return UINT_MAX - size;
+
+	max_bytes = bdev_max_io_bytes(bio->bi_bdev);
+	size = bio->bi_iter.bi_size;
+
+	/*
+	 * One bvec can hold physically continuous page frames with
+	 * multipage bvec and bytes in these pages can be pretty big, so
+	 * predict the available space by averaging bytes on all bvecs
+	 */
+	predicted_space = size * nr_bvecs / bio->bi_vcnt;
+	/*
+	 * If predicted space is bigger than max io bytes and at least two
+	 * vectors left, ask for all pages
+	 */
+	if (predicted_space > max_bytes && nr_bvecs > 2)
+		return UINT_MAX - size;
+
+	/*
+	 * This bio is close to be full, and stop to add pages if it is
+	 * basically aligned, otherwise just get & add pages if the bio
+	 * can be kept as aligned, so that we can minimize bio & iov iter
+	 * revert
+	 */
+	align_deviation = max_t(unsigned int, 16U * 1024, max_bytes / 16);
+	if ((size & (max_bytes - 1)) > align_deviation) {
+		unsigned aligned_bytes = max_bytes - (size & (max_bytes - 1));
+
+		/* try to keep bio aligned if we have enough data and space */
+		if (aligned_bytes <= left && aligned_bytes <= predicted_space)
+			return aligned_bytes;
+	}
+
+	return 0;
+}
+
 #define PAGE_PTRS_PER_BVEC     (sizeof(struct bio_vec) / sizeof(struct page *))
 
 /**
@@ -1229,7 +1280,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
-	ssize_t size, left;
+	ssize_t size, left, max_size;
 	unsigned len, i = 0;
 	size_t offset;
 	int ret = 0;
@@ -1245,6 +1296,10 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	if (bio->bi_bdev && blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue))
 		extraction_flags |= ITER_ALLOW_P2PDMA;
 
+	max_size = bio_get_buffer_size_hint(bio, iov_iter_count(iter));
+	if (!max_size)
+		return -E2BIG;
+
 	/*
 	 * Each segment in the iov is required to be a block size multiple.
 	 * However, we may not be able to get the entire segment if it spans
@@ -1252,8 +1307,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	 * result to ensure the bio's total size is correct. The remainder of
 	 * the iov data will be picked up in the next bio iteration.
 	 */
-	size = iov_iter_extract_pages(iter, &pages,
-				      UINT_MAX - bio->bi_iter.bi_size,
+	size = iov_iter_extract_pages(iter, &pages, max_size,
 				      nr_pages, extraction_flags, &offset);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
@@ -1298,6 +1352,46 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	return ret;
 }
 
+/* should only be called before submission */
+static void bio_shrink(struct bio *bio, unsigned bytes)
+{
+	unsigned int size = bio->bi_iter.bi_size;
+	int idx;
+
+	if (bytes >= size)
+		return;
+
+	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
+
+	idx = bio->bi_vcnt - 1;
+	bio->bi_iter.bi_size -= bytes;
+	while (bytes > 0) {
+		struct bio_vec *bv = &bio->bi_io_vec[idx];
+		unsigned int len = min_t(unsigned, bv->bv_len, bytes);
+
+		bytes -= len;
+		bv->bv_len -= len;
+		if (!bv->bv_len) {
+			bio_release_page(bio, bv->bv_page);
+			idx--;
+		}
+	}
+	WARN_ON_ONCE(idx < 0);
+	bio->bi_vcnt = idx + 1;
+}
+
+static unsigned bio_align_with_io_size(struct bio *bio)
+{
+	unsigned int size = bio->bi_iter.bi_size;
+	unsigned int trim = size & (bdev_max_io_bytes(bio->bi_bdev) - 1);
+
+	if (trim && trim != size) {
+		bio_shrink(bio, trim);
+		return trim;
+	}
+	return 0;
+}
+
 /**
  * bio_iov_iter_get_pages - add user or kernel pages to a bio
  * @bio: bio to add pages to
@@ -1337,6 +1431,22 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		ret = __bio_iov_iter_get_pages(bio, iter);
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
 
+
+	/*
+	 * If we still have data and bio is full, this bio size may not be
+	 * aligned with max io size, small bio can be caused by split, try
+	 * to avoid this situation by aligning bio with max io size.
+	 *
+	 * Big chunk of sequential IO workload can benefit from this way.
+	 */
+	if (!ret && iov_iter_count(iter) && bio->bi_bdev &&
+			bio_op(bio) != REQ_OP_ZONE_APPEND) {
+		unsigned trim = bio_align_with_io_size(bio);
+
+		if (trim)
+			iov_iter_revert(iter, trim);
+	}
+
 	return bio->bi_vcnt ? 0 : ret;
 }
 EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index eef450f25982..2d275cdc39d8 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1151,6 +1151,11 @@ static inline unsigned queue_logical_block_size(const struct request_queue *q)
 	return retval;
 }
 
+static inline unsigned int bdev_max_io_bytes(struct block_device *bdev)
+{
+	return queue_max_bytes(bdev_get_queue(bdev));
+}
+
 static inline unsigned int bdev_logical_block_size(struct block_device *bdev)
 {
 	return queue_logical_block_size(bdev_get_queue(bdev));
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH V2 1/2] block: don't call into iov_iter_revert if nothing is left
  2023-11-09  8:28 ` [PATCH V2 1/2] block: don't call into iov_iter_revert if nothing is left Ming Lei
@ 2023-11-09 14:35   ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2023-11-09 14:35 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Christoph Hellwig, Ed Tsai

>  	if (bio->bi_bdev) {
>  		size_t trim = size & (bdev_logical_block_size(bio->bi_bdev) - 1);
> -		iov_iter_revert(iter, trim);
> -		size -= trim;
> +
> +		if (trim) {
> +			iov_iter_revert(iter, trim);
> +			size -= trim;
> +		}

We also need to fix this.  This isn't supposed to be called without
a bdev set, but Kent broke this.  But if we don't have a bdev
we need to pass the block size manually and still do the revert if needed.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH V2 2/2] block: try to make aligned bio in case of big chunk IO
  2023-11-09  8:28 ` [PATCH V2 2/2] block: try to make aligned bio in case of big chunk IO Ming Lei
@ 2023-11-09 14:39   ` Christoph Hellwig
  2023-11-09 19:30   ` kernel test robot
  1 sibling, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2023-11-09 14:39 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Christoph Hellwig, Ed Tsai

> +/*
> + * Figure out max_size hint of iov_iter_extract_pages() for minimizing
> + * bio & iov iter revert so that bio can be aligned with max io size.
> + */
> +static unsigned int bio_get_buffer_size_hint(const struct bio *bio,
> +		unsigned int left)
> +{
> +	unsigned int nr_bvecs = bio->bi_max_vecs - bio->bi_vcnt;
> +	unsigned int size, predicted_space, max_bytes;
> +	unsigned int space = nr_bvecs << PAGE_SHIFT;
> +	unsigned int align_deviation;
> +
> +	/* If we have enough space really, just try to get all pages */
> +	if (!bio->bi_bdev || nr_bvecs >= (bio->bi_max_vecs / 4) ||
> +			!bio->bi_vcnt || left <= space)

We need to either decided if the bdev is mandatory and we can rely
on it, or stop adding conditionals based on it.  NAK for anything
adding more if bio->bi_bdev in this path.

But more important I'm not sure why we need all this extra code to
start with.  If you just did an extra for more space than we want
to submit, just release the pages that we grabbed but don't want to
add instead of adding them to be bio_vec.

Remember this is an absolute fast path for high IOPS I/O, we should
avoid adding cruft to it.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH V2 2/2] block: try to make aligned bio in case of big chunk IO
  2023-11-09  8:28 ` [PATCH V2 2/2] block: try to make aligned bio in case of big chunk IO Ming Lei
  2023-11-09 14:39   ` Christoph Hellwig
@ 2023-11-09 19:30   ` kernel test robot
  1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2023-11-09 19:30 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: llvm, oe-kbuild-all, linux-block, Christoph Hellwig, Ed Tsai,
	Ming Lei

Hi Ming,

kernel test robot noticed the following build warnings:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on linus/master v6.6 next-20231109]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Ming-Lei/block-don-t-call-into-iov_iter_revert-if-nothing-is-left/20231109-190100
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/20231109082827.2276696-3-ming.lei%40redhat.com
patch subject: [PATCH V2 2/2] block: try to make aligned bio in case of big chunk IO
config: arm-davinci_all_defconfig (https://download.01.org/0day-ci/archive/20231110/202311100354.HYfqOQ7o-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231110/202311100354.HYfqOQ7o-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311100354.HYfqOQ7o-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> block/bio.c:1228:21: warning: variable 'size' is uninitialized when used here [-Wuninitialized]
    1228 |                 return UINT_MAX - size;
         |                                   ^~~~
   block/bio.c:1221:19: note: initialize the variable 'size' to silence this warning
    1221 |         unsigned int size, predicted_space, max_bytes;
         |                          ^
         |                           = 0
   1 warning generated.

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for PINCTRL_SINGLE
   Depends on [n]: PINCTRL [=n] && OF [=y] && HAS_IOMEM [=y]
   Selected by [y]:
   - ARCH_DAVINCI [=y] && ARCH_MULTI_V5 [=y] && CPU_LITTLE_ENDIAN [=y]


vim +/size +1228 block/bio.c

  1212	
  1213	/*
  1214	 * Figure out max_size hint of iov_iter_extract_pages() for minimizing
  1215	 * bio & iov iter revert so that bio can be aligned with max io size.
  1216	 */
  1217	static unsigned int bio_get_buffer_size_hint(const struct bio *bio,
  1218			unsigned int left)
  1219	{
  1220		unsigned int nr_bvecs = bio->bi_max_vecs - bio->bi_vcnt;
  1221		unsigned int size, predicted_space, max_bytes;
  1222		unsigned int space = nr_bvecs << PAGE_SHIFT;
  1223		unsigned int align_deviation;
  1224	
  1225		/* If we have enough space really, just try to get all pages */
  1226		if (!bio->bi_bdev || nr_bvecs >= (bio->bi_max_vecs / 4) ||
  1227				!bio->bi_vcnt || left <= space)
> 1228			return UINT_MAX - size;
  1229	
  1230		max_bytes = bdev_max_io_bytes(bio->bi_bdev);
  1231		size = bio->bi_iter.bi_size;
  1232	
  1233		/*
  1234		 * One bvec can hold physically continuous page frames with
  1235		 * multipage bvec and bytes in these pages can be pretty big, so
  1236		 * predict the available space by averaging bytes on all bvecs
  1237		 */
  1238		predicted_space = size * nr_bvecs / bio->bi_vcnt;
  1239		/*
  1240		 * If predicted space is bigger than max io bytes and at least two
  1241		 * vectors left, ask for all pages
  1242		 */
  1243		if (predicted_space > max_bytes && nr_bvecs > 2)
  1244			return UINT_MAX - size;
  1245	
  1246		/*
  1247		 * This bio is close to be full, and stop to add pages if it is
  1248		 * basically aligned, otherwise just get & add pages if the bio
  1249		 * can be kept as aligned, so that we can minimize bio & iov iter
  1250		 * revert
  1251		 */
  1252		align_deviation = max_t(unsigned int, 16U * 1024, max_bytes / 16);
  1253		if ((size & (max_bytes - 1)) > align_deviation) {
  1254			unsigned aligned_bytes = max_bytes - (size & (max_bytes - 1));
  1255	
  1256			/* try to keep bio aligned if we have enough data and space */
  1257			if (aligned_bytes <= left && aligned_bytes <= predicted_space)
  1258				return aligned_bytes;
  1259		}
  1260	
  1261		return 0;
  1262	}
  1263	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-11-09 19:31 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-09  8:28 [PATCH V2 0/2] block: try to make aligned bio in case of big chunk IO Ming Lei
2023-11-09  8:28 ` [PATCH V2 1/2] block: don't call into iov_iter_revert if nothing is left Ming Lei
2023-11-09 14:35   ` Christoph Hellwig
2023-11-09  8:28 ` [PATCH V2 2/2] block: try to make aligned bio in case of big chunk IO Ming Lei
2023-11-09 14:39   ` Christoph Hellwig
2023-11-09 19:30   ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox