* [PATCH v7 0/4] block: add larger order folio instead of pages
From: Kundan Kumar @ 2024-07-04 7:03 UTC (permalink / raw)
To: axboe, hch, willy, kbusch
Cc: linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti,
gost.dev, Kundan Kumar
User-space memory is mapped into the kernel as an array of pages. These
pages are iterated over and added to a BIO; in the process they are also
checked for contiguity and merged.
When mTHP is enabled the pages generally belong to a larger-order folio.
This patch series enables adding a large folio to a bio: for each page in
the page array, fetch its containing folio. The page may start at an
offset within the folio, possibly a multiple of PAGE_SIZE, and subsequent
pages in the array may belong to the same folio. Using the folio size,
the folio offset and the remaining size, determine the length within the
folio that can be added to the bio. Then check whether the pages are
contiguous and belong to the same folio; if so, skip further processing
for those contiguous pages.
This scheme reduces the overhead of iterating through the pages one at a
time.
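(For illustration only: a condensed sketch of the per-page loop body
introduced in patch 3, not the actual diff. Helper names match the
patches below; the contiguity/COW check is omitted here for brevity.)

	struct folio *folio = page_folio(pages[i]);
	size_t folio_offset = ((size_t)folio_page_idx(folio, pages[i]) <<
			       PAGE_SHIFT) + offset;
	size_t len = min_t(size_t, folio_size(folio) - folio_offset, left);
	unsigned int num_pages = DIV_ROUND_UP(offset + len, PAGE_SIZE);

	/* one bio entry now covers num_pages pages of the same folio ... */
	bio_iov_add_folio(bio, folio, len, folio_offset);
	/* ... so the loop skips the pages that entry already covers */
	i += num_pages - 1;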
perf diff before and after this change (with mTHP enabled):
Perf diff for write I/O with 128K block size:
1.24% -0.20% [kernel.kallsyms] [k] bio_iov_iter_get_pages
1.71% [kernel.kallsyms] [k] bvec_try_merge_page
Perf diff for read I/O with 128K block size:
4.03% -1.59% [kernel.kallsyms] [k] bio_iov_iter_get_pages
5.14% [kernel.kallsyms] [k] bvec_try_merge_page
Patch 1: Adds folio-lized version of bvec_try_merge_hw_page()
Patch 2: Adds folio-lized version of bio_add_hw_page()
Patch 3: Adds bigger size from folio to BIO
Patch 4: Unpin user pages belonging to folio at once
Changes since v6:
- folio-lize bvec_try_merge_hw_page() to bvec_try_merge_hw_folio()
- restructured the code as per comments
- typecast to size_t while calculating the offset within the folio
Changes since v5:
- Made offset and len size_t in bio_add_hw_folio()
- Avoid unpinning skipped pages at submission; instead unpin all pages at
  once on IO completion
Changes since v4:
- folio-lize bio_add_hw_page() to bio_add_hw_folio()
- make bio_add_hw_page() as a wrapper around bio_add_hw_folio()
- make new functions bio_release_folio() and unpin_user_folio()
- made a helper function to check for contiguous pages of folio
- changed &folio->page to folio_page(folio, 0)
- reworded comments
Changes since v3:
- Added a check that pages are contiguous and belong to the same folio;
  if not, avoid skipping pages. (Suggested by Matthew Wilcox)
Changes since v2:
- Made separate patches
- Corrected code as per kernel coding style
- Removed size_folio variable
Changes since v1:
- Changed functions bio_iov_add_page() and bio_iov_add_zone_append_page()
to accept a folio
- Removed the branch and calculate folio_offset and len in the same
  fashion for both order-0 and larger folios
- Added change in NVMe driver to use nvme_setup_prp_simple() by
ignoring multiples of PAGE_SIZE in offset
- Added a change to unpin user pages that were added as folios. Also
  stopped unpinning pages one by one from __bio_release_pages()
  (Suggested by Keith)
Kundan Kumar (4):
block: Added folio-lized version of bvec_try_merge_hw_page()
block: Added folio-lized version of bio_add_hw_page()
block: introduce folio awareness and add a bigger size from folio
block: unpin user pages belonging to a folio at once
block/bio.c | 133 ++++++++++++++++++++++++++++++++++++---------
block/blk.h | 15 +++++
include/linux/mm.h | 1 +
mm/gup.c | 13 +++++
4 files changed, 135 insertions(+), 27 deletions(-)
--
2.25.1
* [PATCH v7 1/4] block: Added folio-lized version of bvec_try_merge_hw_page()
From: Kundan Kumar @ 2024-07-04 7:03 UTC (permalink / raw)
To: axboe, hch, willy, kbusch
Cc: linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti,
gost.dev, Kundan Kumar
Add a new function, bvec_try_merge_hw_folio(). This is a prep patch.
Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
---
block/bio.c | 17 +++++++++++++++++
block/blk.h | 4 ++++
2 files changed, 21 insertions(+)
diff --git a/block/bio.c b/block/bio.c
index e9e809a63c59..c10f5fa0ba27 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -952,6 +952,23 @@ bool bvec_try_merge_hw_page(struct request_queue *q, struct bio_vec *bv,
struct page *page, unsigned len, unsigned offset,
bool *same_page)
{
+ struct folio *folio = page_folio(page);
+
+ return bvec_try_merge_hw_folio(q, bv, folio, len,
+ ((size_t)folio_page_idx(folio, page) << PAGE_SHIFT) +
+ offset, same_page);
+}
+
+/*
+ * Try to merge a folio into a segment, while obeying the hardware segment
+ * size limit. This is not for normal read/write bios, but for passthrough
+ * or Zone Append operations that we can't split.
+ */
+bool bvec_try_merge_hw_folio(struct request_queue *q, struct bio_vec *bv,
+ struct folio *folio, size_t len, size_t offset,
+ bool *same_page)
+{
+ struct page *page = folio_page(folio, 0);
unsigned long mask = queue_segment_boundary(q);
phys_addr_t addr1 = page_to_phys(bv->bv_page) + bv->bv_offset;
phys_addr_t addr2 = page_to_phys(page) + offset + len - 1;
diff --git a/block/blk.h b/block/blk.h
index 47dadd2439b1..17478657c5ef 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -94,6 +94,10 @@ bool bvec_try_merge_hw_page(struct request_queue *q, struct bio_vec *bv,
struct page *page, unsigned len, unsigned offset,
bool *same_page);
+bool bvec_try_merge_hw_folio(struct request_queue *q, struct bio_vec *bv,
+ struct folio *folio, size_t len, size_t offset,
+ bool *same_page);
+
static inline bool biovec_phys_mergeable(struct request_queue *q,
struct bio_vec *vec1, struct bio_vec *vec2)
{
--
2.25.1
* [PATCH v7 2/4] block: Added folio-lized version of bio_add_hw_page()
From: Kundan Kumar @ 2024-07-04 7:03 UTC (permalink / raw)
To: axboe, hch, willy, kbusch
Cc: linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti,
gost.dev, Kundan Kumar
Add a new bio_add_hw_folio() function. This is a prep patch.
Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
---
block/bio.c | 32 +++++++++++++++++++++++++++-----
block/blk.h | 4 ++++
2 files changed, 31 insertions(+), 5 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index c10f5fa0ba27..05d624f016f0 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -996,6 +996,30 @@ bool bvec_try_merge_hw_folio(struct request_queue *q, struct bio_vec *bv,
int bio_add_hw_page(struct request_queue *q, struct bio *bio,
struct page *page, unsigned int len, unsigned int offset,
unsigned int max_sectors, bool *same_page)
+{
+ struct folio *folio = page_folio(page);
+
+ return bio_add_hw_folio(q, bio, folio, len,
+ ((size_t)folio_page_idx(folio, page) << PAGE_SHIFT) +
+ offset, max_sectors, same_page);
+}
+
+/**
+ * bio_add_hw_folio - attempt to add a folio to a bio with hw constraints
+ * @q: the target queue
+ * @bio: destination bio
+ * @folio: folio to add
+ * @len: vec entry length
+ * @offset: vec entry offset in the folio
+ * @max_sectors: maximum number of sectors that can be added
+ * @same_page: return if the segment has been merged inside the same folio
+ *
+ * Add a folio to a bio while respecting the hardware max_sectors, max_segment
+ * and gap limitations.
+ */
+int bio_add_hw_folio(struct request_queue *q, struct bio *bio,
+ struct folio *folio, size_t len, size_t offset,
+ unsigned int max_sectors, bool *same_page)
{
unsigned int max_size = max_sectors << SECTOR_SHIFT;
@@ -1009,8 +1033,8 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
if (bio->bi_vcnt > 0) {
struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
- if (bvec_try_merge_hw_page(q, bv, page, len, offset,
- same_page)) {
+ if (bvec_try_merge_hw_folio(q, bv, folio, len, offset,
+ same_page)) {
bio->bi_iter.bi_size += len;
return len;
}
@@ -1027,9 +1051,7 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
return 0;
}
- bvec_set_page(&bio->bi_io_vec[bio->bi_vcnt], page, len, offset);
- bio->bi_vcnt++;
- bio->bi_iter.bi_size += len;
+ bio_add_folio_nofail(bio, folio, len, offset);
return len;
}
diff --git a/block/blk.h b/block/blk.h
index 17478657c5ef..0c8857fe4079 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -534,6 +534,10 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
struct page *page, unsigned int len, unsigned int offset,
unsigned int max_sectors, bool *same_page);
+int bio_add_hw_folio(struct request_queue *q, struct bio *bio,
+ struct folio *folio, size_t len, size_t offset,
+ unsigned int max_sectors, bool *same_page);
+
/*
* Clean up a page appropriately, where the page may be pinned, may have a
* ref taken on it or neither.
--
2.25.1
* [PATCH v7 3/4] block: introduce folio awareness and add a bigger size from folio
From: Kundan Kumar @ 2024-07-04 7:03 UTC (permalink / raw)
To: axboe, hch, willy, kbusch
Cc: linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti,
gost.dev, Kundan Kumar
Add a bigger size from the folio to the bio and skip merge processing for
the covered pages.
Fetch the offset of the page within its folio. Depending on the folio
size and the folio offset, fetch a larger length; this length may span
multiple contiguous pages if the folio is multi-order.
Using this length, calculate the number of pages that will be added to
the bio and increment the loop counter to skip those pages.
Using a helper function, check that the pages are contiguous and belong
to the same folio; this is needed because a COW may occur and break the
contiguous mapping of the folio's pages.
This avoids the overhead of merging pages that belong to the same
large-order folio.
Also folio-lize the functions bio_iov_add_page() and
bio_iov_add_zone_append_page().
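A worked example with made-up numbers (assuming 4K PAGE_SIZE; the numbers
are illustrative, not taken from the patch): pages[i] points at the third
page (index 2) of a 64K order-4 folio and offset is 512, so

	folio_offset = (2 << PAGE_SHIFT) + 512 = 8704
	len          = min(folio_size - folio_offset, left)
	             = min(65536 - 8704, left)
	num_pages    = DIV_ROUND_UP(offset + len, PAGE_SIZE)

If left is large enough, num_pages = 14: a single bio_iov_add_folio()
call covers 14 pages that would previously each have gone through
bvec_try_merge_page().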
Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
---
block/bio.c | 77 ++++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 61 insertions(+), 16 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 05d624f016f0..32c9c6d80384 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1243,8 +1243,8 @@ void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
bio_set_flag(bio, BIO_CLONED);
}
-static int bio_iov_add_page(struct bio *bio, struct page *page,
- unsigned int len, unsigned int offset)
+static int bio_iov_add_folio(struct bio *bio, struct folio *folio, size_t len,
+ size_t offset)
{
bool same_page = false;
@@ -1253,30 +1253,61 @@ static int bio_iov_add_page(struct bio *bio, struct page *page,
if (bio->bi_vcnt > 0 &&
bvec_try_merge_page(&bio->bi_io_vec[bio->bi_vcnt - 1],
- page, len, offset, &same_page)) {
+ folio_page(folio, 0), len, offset,
+ &same_page)) {
bio->bi_iter.bi_size += len;
if (same_page)
- bio_release_page(bio, page);
+ bio_release_page(bio, folio_page(folio, 0));
return 0;
}
- __bio_add_page(bio, page, len, offset);
+ bio_add_folio_nofail(bio, folio, len, offset);
return 0;
}
-static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
- unsigned int len, unsigned int offset)
+static int bio_iov_add_zone_append_folio(struct bio *bio, struct folio *folio,
+ size_t len, size_t offset)
{
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
bool same_page = false;
- if (bio_add_hw_page(q, bio, page, len, offset,
+ if (bio_add_hw_folio(q, bio, folio, len, offset,
queue_max_zone_append_sectors(q), &same_page) != len)
return -EINVAL;
if (same_page)
- bio_release_page(bio, page);
+ bio_release_page(bio, folio_page(folio, 0));
return 0;
}
+static unsigned int get_contig_folio_len(unsigned int *num_pages,
+ struct page **pages, unsigned int i,
+ struct folio *folio, size_t left,
+ size_t offset)
+{
+ size_t bytes = left;
+ size_t contig_sz = min_t(size_t, PAGE_SIZE - offset, bytes);
+ unsigned int j;
+
+ /*
+ * We might COW a single page in the middle of
+ * a large folio, so we have to check that all
+ * pages belong to the same folio.
+ */
+ bytes -= contig_sz;
+ for (j = i + 1; j < i + *num_pages; j++) {
+ size_t next = min_t(size_t, PAGE_SIZE, bytes);
+
+ if (page_folio(pages[j]) != folio ||
+ pages[j] != pages[j - 1] + 1) {
+ break;
+ }
+ contig_sz += next;
+ bytes -= next;
+ }
+ *num_pages = j - i;
+
+ return contig_sz;
+}
+
#define PAGE_PTRS_PER_BVEC (sizeof(struct bio_vec) / sizeof(struct page *))
/**
@@ -1296,9 +1327,9 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
struct page **pages = (struct page **)bv;
- ssize_t size, left;
- unsigned len, i = 0;
- size_t offset;
+ ssize_t size;
+ unsigned int i = 0, num_pages;
+ size_t offset, folio_offset, left, len;
int ret = 0;
/*
@@ -1340,15 +1371,29 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
for (left = size, i = 0; left > 0; left -= len, i++) {
struct page *page = pages[i];
+ struct folio *folio = page_folio(page);
+
+ folio_offset = ((size_t)folio_page_idx(folio, page) <<
+ PAGE_SHIFT) + offset;
+
+ len = min_t(size_t, (folio_size(folio) - folio_offset), left);
+
+ num_pages = DIV_ROUND_UP(offset + len, PAGE_SIZE);
+
+ if (num_pages > 1)
+ len = get_contig_folio_len(&num_pages, pages, i,
+ folio, left, offset);
- len = min_t(size_t, PAGE_SIZE - offset, left);
if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
- ret = bio_iov_add_zone_append_page(bio, page, len,
- offset);
+ ret = bio_iov_add_zone_append_folio(bio, folio, len,
+ folio_offset);
if (ret)
break;
} else
- bio_iov_add_page(bio, page, len, offset);
+ bio_iov_add_folio(bio, folio, len, folio_offset);
+
+ /* Skip the pages which got added */
+ i = i + (num_pages - 1);
offset = 0;
}
--
2.25.1
* [PATCH v7 4/4] block: unpin user pages belonging to a folio at once
From: Kundan Kumar @ 2024-07-04 7:03 UTC (permalink / raw)
To: axboe, hch, willy, kbusch
Cc: linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti,
gost.dev, Kundan Kumar
Add a new function, bio_release_folio(), and use it to drop npages
references on a folio at once.
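In effect (a simplified before/after sketch of the __bio_release_pages()
hunk below, shown only to summarize the change):

	/* before: unpin each page of the folio segment individually */
	do {
		bio_release_page(bio, page++);
	} while (--nr_pages != 0);

	/* after: a single call drops all nr_pages pins on the folio */
	bio_release_folio(bio, fi.folio, nr_pages);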
Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
---
block/bio.c | 7 +------
block/blk.h | 7 +++++++
include/linux/mm.h | 1 +
mm/gup.c | 13 +++++++++++++
4 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 32c9c6d80384..0d923140006e 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1205,20 +1205,15 @@ void __bio_release_pages(struct bio *bio, bool mark_dirty)
struct folio_iter fi;
bio_for_each_folio_all(fi, bio) {
- struct page *page;
size_t nr_pages;
-
if (mark_dirty) {
folio_lock(fi.folio);
folio_mark_dirty(fi.folio);
folio_unlock(fi.folio);
}
- page = folio_page(fi.folio, fi.offset / PAGE_SIZE);
nr_pages = (fi.offset + fi.length - 1) / PAGE_SIZE -
fi.offset / PAGE_SIZE + 1;
- do {
- bio_release_page(bio, page++);
- } while (--nr_pages != 0);
+ bio_release_folio(bio, fi.folio, nr_pages);
}
}
EXPORT_SYMBOL_GPL(__bio_release_pages);
diff --git a/block/blk.h b/block/blk.h
index 0c8857fe4079..18520b05c6ce 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -548,6 +548,13 @@ static inline void bio_release_page(struct bio *bio, struct page *page)
unpin_user_page(page);
}
+static inline void bio_release_folio(struct bio *bio, struct folio *folio,
+ unsigned long npages)
+{
+ if (bio_flagged(bio, BIO_PAGE_PINNED))
+ unpin_user_folio(folio, npages);
+}
+
struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id);
int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9849dfda44d4..b902c6c39e2b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1618,6 +1618,7 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
bool make_dirty);
void unpin_user_pages(struct page **pages, unsigned long npages);
+void unpin_user_folio(struct folio *folio, unsigned long npages);
static inline bool is_cow_mapping(vm_flags_t flags)
{
diff --git a/mm/gup.c b/mm/gup.c
index ca0f5cedce9b..bc96efa43d1b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -488,6 +488,19 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
}
EXPORT_SYMBOL(unpin_user_pages);
+/**
+ * unpin_user_folio() - release pages of a folio
+ * @folio: pointer to folio to be released
+ * @npages: number of pages of same folio
+ *
+ * Release npages of the folio
+ */
+void unpin_user_folio(struct folio *folio, unsigned long npages)
+{
+ gup_put_folio(folio, npages, FOLL_PIN);
+}
+EXPORT_SYMBOL(unpin_user_folio);
+
/*
* Set the MMF_HAS_PINNED if not set yet; after set it'll be there for the mm's
* lifecycle. Avoid setting the bit unless necessary, or it might cause write
--
2.25.1
* Re: [PATCH v7 1/4] block: Added folio-lized version of bvec_try_merge_hw_page()
From: Christoph Hellwig @ 2024-07-06 8:01 UTC (permalink / raw)
To: Kundan Kumar
Cc: axboe, hch, willy, kbusch, linux-block, joshi.k, mcgrof, anuj20.g,
nj.shetty, c.gameti, gost.dev
I'd say folio-ized in the subject (also for the next patch), but I'm not
a native speaker and this is all pretty odd terminology anyway.
> index e9e809a63c59..c10f5fa0ba27 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -952,6 +952,23 @@ bool bvec_try_merge_hw_page(struct request_queue *q, struct bio_vec *bv,
> struct page *page, unsigned len, unsigned offset,
> bool *same_page)
> {
> + struct folio *folio = page_folio(page);
> +
> + return bvec_try_merge_hw_folio(q, bv, folio, len,
> + ((size_t)folio_page_idx(folio, page) << PAGE_SHIFT) +
> + offset, same_page);
> +}
> +
> +/*
> + * Try to merge a folio into a segment, while obeying the hardware segment
> + * size limit. This is not for normal read/write bios, but for passthrough
> + * or Zone Append operations that we can't split.
> + */
> +bool bvec_try_merge_hw_folio(struct request_queue *q, struct bio_vec *bv,
> + struct folio *folio, size_t len, size_t offset,
> + bool *same_page)
Placing the wrapper above the guts of the implementation is a bit odd.
Can you reorder these?
Otherwise looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH v7 2/4] block: Added folio-lized version of bio_add_hw_page()
From: Christoph Hellwig @ 2024-07-06 8:02 UTC (permalink / raw)
To: Kundan Kumar
Cc: axboe, hch, willy, kbusch, linux-block, joshi.k, mcgrof, anuj20.g,
nj.shetty, c.gameti, gost.dev
Modulo the same comments as for the last patch:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH v7 3/4] block: introduce folio awareness and add a bigger size from folio
From: Christoph Hellwig @ 2024-07-06 8:03 UTC (permalink / raw)
To: Kundan Kumar
Cc: axboe, hch, willy, kbusch, linux-block, joshi.k, mcgrof, anuj20.g,
nj.shetty, c.gameti, gost.dev
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH v7 4/4] block: unpin user pages belonging to a folio at once
From: Christoph Hellwig @ 2024-07-06 8:10 UTC (permalink / raw)
To: Kundan Kumar
Cc: axboe, hch, willy, kbusch, linux-block, joshi.k, mcgrof, anuj20.g,
nj.shetty, c.gameti, gost.dev
> nr_pages = (fi.offset + fi.length - 1) / PAGE_SIZE -
> fi.offset / PAGE_SIZE + 1;
> + bio_release_folio(bio, fi.folio, nr_pages);
> }
> }
> EXPORT_SYMBOL_GPL(__bio_release_pages);
> diff --git a/block/blk.h b/block/blk.h
> index 0c8857fe4079..18520b05c6ce 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -548,6 +548,13 @@ static inline void bio_release_page(struct bio *bio, struct page *page)
> unpin_user_page(page);
> }
>
> +static inline void bio_release_folio(struct bio *bio, struct folio *folio,
> + unsigned long npages)
> +{
> + if (bio_flagged(bio, BIO_PAGE_PINNED))
> + unpin_user_folio(folio, npages);
> +}
This is only used in __bio_release_pages, and given that
__bio_release_pages is only called when BIO_PAGE_PINNED is set there
is no need to check it inside the loop again.
Also this means we know the loop doesn't do anything if mark_dirty is
false, which is another trivial check that can move into
bio_release_pages. As this optimization already applies as-is I'll
send a prep patch for it.
so that we can avoid the npages calculation for the !BIO_PAGE_PINNED
case. Moreover, having the BIO_PAGE_PINNED knowledge there means we
can skip the entire loop for !BIO_PAGE_PINNED &&
> +/**
> + * unpin_user_folio() - release pages of a folio
> + * @folio: pointer to folio to be released
> + * @npages: number of pages of same folio
> + *
> + * Release npages of the folio
> + */
> +void unpin_user_folio(struct folio *folio, unsigned long npages)
> +{
> + gup_put_folio(folio, npages, FOLL_PIN);
> +}
> +EXPORT_SYMBOL(unpin_user_folio);
Please don't hide a new MM API inside a block patch, but split it out
with a mm prefix.
* Re: [PATCH v7 4/4] block: unpin user pages belonging to a folio at once
From: Christoph Hellwig @ 2024-07-06 8:12 UTC (permalink / raw)
To: Kundan Kumar
Cc: axboe, hch, willy, kbusch, linux-block, joshi.k, mcgrof, anuj20.g,
nj.shetty, c.gameti, gost.dev
On Sat, Jul 06, 2024 at 10:10:21AM +0200, Christoph Hellwig wrote:
> Also this means we know the loop doesn't do anything if mark_dirty is
> false, which is another trivial check that can move into
> bio_release_pages. As this optimization already applies as-is I'll
> send a prep patch for it.
>
> so that we can avoid the npages calculation for the !BIO_PAGE_PINNED
> case. Morover having the BIO_PAGE_PINNED knowledge there means we
> can skip the entire loop for !BIO_PAGE_PINNED &&
Braino - we're obviously already skipping it for the !BIO_PAGE_PINNED
case. So discard most of this, except for the fact that we should
skip the wrapper doing the extra BIO_PAGE_PINNED checks.
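(For illustration only, one possible reading of the suggestion above,
not Christoph's actual prep patch: since __bio_release_pages() only runs
for BIO_PAGE_PINNED bios, the loop could call unpin_user_folio() directly
instead of going through a bio_release_folio() wrapper that re-checks the
flag:)

	bio_for_each_folio_all(fi, bio) {
		size_t nr_pages;

		if (mark_dirty) {
			folio_lock(fi.folio);
			folio_mark_dirty(fi.folio);
			folio_unlock(fi.folio);
		}
		nr_pages = (fi.offset + fi.length - 1) / PAGE_SIZE -
			   fi.offset / PAGE_SIZE + 1;
		unpin_user_folio(fi.folio, nr_pages);
	}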
* Re: [PATCH v7 3/4] block: introduce folio awareness and add a bigger size from folio
From: kernel test robot @ 2024-07-08 8:24 UTC (permalink / raw)
To: Kundan Kumar
Cc: oe-lkp, lkp, linux-block, ltp, axboe, hch, willy, kbusch, joshi.k,
mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev, Kundan Kumar,
oliver.sang
Hello,
kernel test robot noticed "WARNING:at_mm/gup.c:#try_grab_page" on:
commit: 69b40318d4fdb6a9ac6bb833618e4cd954db4946 ("[PATCH v7 3/4] block: introduce folio awareness and add a bigger size from folio")
url: https://github.com/intel-lab-lkp/linux/commits/Kundan-Kumar/block-Added-folio-lized-version-of-bvec_try_merge_hw_page/20240705-055633
base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/all/20240704070357.1993-4-kundan.kumar@samsung.com/
patch subject: [PATCH v7 3/4] block: introduce folio awareness and add a bigger size from folio
in testcase: ltp
version: ltp-x86_64-14c1f76-1_20240706
with following parameters:
disk: 1HDD
fs: ext4
test: ltp-aiodio.part2-00
compiler: gcc-13
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202407081641.6a640f9e-oliver.sang@intel.com
kern :warn : [ 327.605962] ------------[ cut here ]------------
kern :warn : [ 327.606706] WARNING: CPU: 1 PID: 5867 at mm/gup.c:229 try_grab_page (mm/gup.c:229 (discriminator 1))
kern :warn : [ 327.607701] Modules linked in: intel_rapl_msr intel_rapl_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 rapl btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c crc32c_intel sd_mod sg intel_cstate ahci mei_me nvme libahci ipmi_devintf ipmi_msghandler intel_uncore intel_wmi_thunderbolt wmi_bmof mxm_wmi wdat_wdt libata nvme_core i2c_i801 ioatdma mei i2c_smbus dca wmi binfmt_misc drm fuse loop dm_mod ip_tables
kern :warn : [ 327.613419] CPU: 1 PID: 5867 Comm: dio_sparse Not tainted 6.10.0-rc6-00246-g69b40318d4fd #1
kern :warn : [ 327.614547] Hardware name: Gigabyte Technology Co., Ltd. X299 UD4 Pro/X299 UD4 Pro-CF, BIOS F8a 04/27/2021
kern :warn : [ 327.615788] RIP: 0010:try_grab_page (mm/gup.c:229 (discriminator 1))
kern :warn : [ 327.616621] Code: 40 f6 c5 01 0f 84 1a fe ff ff 48 83 ed 01 e9 14 fe ff ff be 04 00 00 00 4c 89 e7 e8 bd 68 14 00 f0 41 ff 04 24 e9 67 ff ff ff <0f> 0b b8 f4 ff ff ff 5b 5d 41 5c 41 5d c3 cc cc cc cc e8 9c 68 14
All code
========
0: 40 f6 c5 01 test $0x1,%bpl
4: 0f 84 1a fe ff ff je 0xfffffffffffffe24
a: 48 83 ed 01 sub $0x1,%rbp
e: e9 14 fe ff ff jmp 0xfffffffffffffe27
13: be 04 00 00 00 mov $0x4,%esi
18: 4c 89 e7 mov %r12,%rdi
1b: e8 bd 68 14 00 call 0x1468dd
20: f0 41 ff 04 24 lock incl (%r12)
25: e9 67 ff ff ff jmp 0xffffffffffffff91
2a:* 0f 0b ud2 <-- trapping instruction
2c: b8 f4 ff ff ff mov $0xfffffff4,%eax
31: 5b pop %rbx
32: 5d pop %rbp
33: 41 5c pop %r12
35: 41 5d pop %r13
37: c3 ret
38: cc int3
39: cc int3
3a: cc int3
3b: cc int3
3c: e8 .byte 0xe8
3d: 9c pushf
3e: 68 .byte 0x68
3f: 14 .byte 0x14
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: b8 f4 ff ff ff mov $0xfffffff4,%eax
7: 5b pop %rbx
8: 5d pop %rbp
9: 41 5c pop %r12
b: 41 5d pop %r13
d: c3 ret
e: cc int3
f: cc int3
10: cc int3
11: cc int3
12: e8 .byte 0xe8
13: 9c pushf
14: 68 .byte 0x68
15: 14 .byte 0x14
kern :warn : [ 327.619059] RSP: 0018:ffffc9000d61f2f8 EFLAGS: 00010246
kern :warn : [ 327.619993] RAX: 0000000000000000 RBX: ffffea000bc10000 RCX: ffffffff8194dcbb
kern :warn : [ 327.621085] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffea000bc10034
kern :warn : [ 327.622179] RBP: ffffea000bc10000 R08: 0000000000000000 R09: fffff94001782006
kern :warn : [ 327.623273] R10: ffffea000bc10037 R11: 00007fd4b2aa6fff R12: ffffea000bc10034
kern :warn : [ 327.624366] R13: 0000000000290000 R14: ffff8881267fec40 R15: 0000000000000000
kern :warn : [ 327.625460] FS: 00007fd4b2929740(0000) GS:ffff889f8ae80000(0000) knlGS:0000000000000000
kern :warn : [ 327.626636] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kern :warn : [ 327.627622] CR2: 00007f517b030000 CR3: 0000002081e22001 CR4: 00000000003706f0
kern :warn : [ 327.628727] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kern :warn : [ 327.629832] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kern :warn : [ 327.631001] Call Trace:
kern :warn : [ 327.631743] <TASK>
kern :warn : [ 327.632445] ? __warn (kernel/panic.c:693)
kern :warn : [ 327.633243] ? try_grab_page (mm/gup.c:229 (discriminator 1))
kern :warn : [ 327.634128] ? report_bug (lib/bug.c:180 lib/bug.c:219)
kern :warn : [ 327.634985] ? handle_bug (arch/x86/kernel/traps.c:239 (discriminator 1))
kern :warn : [ 327.635835] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
kern :warn : [ 327.636695] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
kern :warn : [ 327.637587] ? try_grab_page (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/atomic/atomic-instrumented.h:33 include/linux/page_ref.h:67 include/linux/page_ref.h:89 mm/gup.c:229)
kern :warn : [ 327.638509] ? try_grab_page (mm/gup.c:229 (discriminator 1))
kern :warn : [ 327.639418] ? try_grab_page (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/atomic/atomic-instrumented.h:33 include/linux/page_ref.h:67 include/linux/page_ref.h:89 mm/gup.c:229)
kern :warn : [ 327.640285] follow_huge_pmd (mm/gup.c:809)
kern :warn : [ 327.641162] follow_pmd_mask+0x3f8/0x7e0
kern :warn : [ 327.642090] ? __pfx_follow_pmd_mask+0x10/0x10
kern :warn : [ 327.643125] ? find_vma (mm/mmap.c:1944)
kern :warn : [ 327.643948] ? __pfx_find_vma (mm/mmap.c:1944)
kern :warn : [ 327.644804] follow_page_mask (mm/gup.c:1116 mm/gup.c:1162)
kern :warn : [ 327.645674] __get_user_pages (mm/gup.c:1588 (discriminator 1))
kern :warn : [ 327.646601] ? __pfx___get_user_pages (mm/gup.c:1522)
kern :warn : [ 327.647533] ? down_read_killable (arch/x86/include/asm/atomic64_64.h:20 include/linux/atomic/atomic-arch-fallback.h:2629 include/linux/atomic/atomic-long.h:79 include/linux/atomic/atomic-instrumented.h:3224 kernel/locking/rwsem.c:176 kernel/locking/rwsem.c:181 kernel/locking/rwsem.c:249 kernel/locking/rwsem.c:241 kernel/locking/rwsem.c:1249 kernel/locking/rwsem.c:1273 kernel/locking/rwsem.c:1551)
kern :warn : [ 327.648428] ? __pfx_down_read_killable (kernel/locking/rwsem.c:1547)
kern :warn : [ 327.649345] __gup_longterm_locked (mm/gup.c:1859 mm/gup.c:2556)
kern :warn : [ 327.650237] gup_fast_fallback (mm/gup.c:3476)
kern :warn : [ 327.651111] ? __pfx__raw_spin_lock_irqsave (kernel/locking/spinlock.c:161)
kern :warn : [ 327.652030] ? __pfx_gup_fast_fallback (mm/gup.c:3439)
kern :warn : [ 327.652890] ? __link_object (include/linux/rculist.h:79 (discriminator 1) include/linux/rculist.h:128 (discriminator 1) mm/kmemleak.c:728 (discriminator 1))
kern :warn : [ 327.653674] iov_iter_extract_pages (lib/iov_iter.c:1584 (discriminator 1) lib/iov_iter.c:1646 (discriminator 1))
kern :warn : [ 327.654501] __bio_iov_iter_get_pages (block/bio.c:1353)
kern :warn : [ 327.655382] ? __pfx___bio_iov_iter_get_pages (block/bio.c:1324)
kern :warn : [ 327.656273] ? bio_init (arch/x86/include/asm/atomic.h:28 include/linux/atomic/atomic-arch-fallback.h:503 include/linux/atomic/atomic-instrumented.h:68 block/bio.c:279)
kern :warn : [ 327.656980] ? bio_alloc_bioset (block/bio.c:578)
kern :warn : [ 327.657721] bio_iov_iter_get_pages (block/bio.c:1446 (discriminator 3))
kern :warn : [ 327.658474] iomap_dio_bio_iter (fs/iomap/direct-io.c:388)
kern :warn : [ 327.659248] __iomap_dio_rw (fs/iomap/direct-io.c:501 fs/iomap/direct-io.c:660)
kern :warn : [ 327.659982] ? __pfx___iomap_dio_rw (fs/iomap/direct-io.c:544)
kern :warn : [ 327.660729] ? __pfx_ext4_dio_write_checks (fs/ext4/file.c:424)
kern :warn : [ 327.661518] ? __pfx_ext4_dio_write_end_io (fs/ext4/file.c:376)
kern :warn : [ 327.662299] ? iomap_dio_complete (fs/iomap/direct-io.c:133)
kern :warn : [ 327.663052] iomap_dio_rw (fs/iomap/direct-io.c:749)
kern :warn : [ 327.663723] ext4_dio_write_iter (fs/ext4/file.c:577)
kern :warn : [ 327.664439] ? __pfx_ext4_dio_write_iter (fs/ext4/file.c:499)
kern :warn : [ 327.665185] ? folio_unlock (arch/x86/include/asm/bitops.h:101 include/asm-generic/bitops/instrumented-lock.h:80 include/linux/page-flags.h:762 mm/filemap.c:1508)
kern :warn : [ 327.665838] ? do_wp_page (include/linux/vmstat.h:71 (discriminator 1) mm/memory.c:3193 (discriminator 1) mm/memory.c:3663 (discriminator 1))
kern :warn : [ 327.666494] ? __pfx___might_resched (kernel/sched/core.c:10151)
kern :warn : [ 327.667254] vfs_write (fs/read_write.c:497 fs/read_write.c:590)
kern :warn : [ 327.667915] ? __pfx___might_resched (kernel/sched/core.c:10151)
kern :warn : [ 327.668657] ? __pfx_vfs_write (fs/read_write.c:571)
kern :warn : [ 327.669342] ? __pfx___might_resched (kernel/sched/core.c:10151)
kern :warn : [ 327.670062] ? __pfx_put_timespec64 (kernel/time/time.c:904)
kern :warn : [ 327.670774] ksys_write (fs/read_write.c:643)
kern :warn : [ 327.671409] ? __pfx_ksys_write (fs/read_write.c:633)
kern :warn : [ 327.672129] ? __pfx___x64_sys_clock_gettime (kernel/time/posix-timers.c:1132)
kern :warn : [ 327.672939] ? fpregs_restore_userregs (arch/x86/include/asm/bitops.h:75 include/asm-generic/bitops/instrumented-atomic.h:42 include/linux/thread_info.h:94 arch/x86/kernel/fpu/context.h:79)
kern :warn : [ 327.673697] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
kern :warn : [ 327.674363] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
kern :warn : [ 327.675145] RIP: 0033:0x7fd4b2a24240
kern :warn : [ 327.675803] Code: 40 00 48 8b 15 c1 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 23 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
All code
========
0: 40 00 48 8b rex add %cl,-0x75(%rax)
4: 15 c1 9b 0d 00 adc $0xd9bc1,%eax
9: f7 d8 neg %eax
b: 64 89 02 mov %eax,%fs:(%rdx)
e: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
15: eb b7 jmp 0xffffffffffffffce
17: 0f 1f 00 nopl (%rax)
1a: 80 3d a1 23 0e 00 00 cmpb $0x0,0xe23a1(%rip) # 0xe23c2
21: 74 17 je 0x3a
23: b8 01 00 00 00 mov $0x1,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 58 ja 0x8a
32: c3 ret
33: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
3a: 48 83 ec 28 sub $0x28,%rsp
3e: 48 rex.W
3f: 89 .byte 0x89
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 58 ja 0x60
8: c3 ret
9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
10: 48 83 ec 28 sub $0x28,%rsp
14: 48 rex.W
15: 89 .byte 0x89
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240708/202407081641.6a640f9e-oliver.sang@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki