* [PATCH v7 0/4] block: add larger order folio instead of pages [not found] <CGME20240704071118epcas5p141ef3c2cbcde0ce31d342b5743a7dcf1@epcas5p1.samsung.com> @ 2024-07-04 7:03 ` Kundan Kumar [not found] ` <CGME20240704071122epcas5p12a52967e486a69375ee77db86f2594a0@epcas5p1.samsung.com> ` (3 more replies) 0 siblings, 4 replies; 11+ messages in thread From: Kundan Kumar @ 2024-07-04 7:03 UTC (permalink / raw) To: axboe, hch, willy, kbusch Cc: linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev, Kundan Kumar User space memory is mapped in kernel in form of pages array. These pages are iterated and added to BIO. In process, pages are also checked for contiguity and merged. When mTHP is enabled the pages generally belong to larger order folio. This patch series enables adding large folio to bio. It fetches folio for page in the page array. The page might start from an offset in the folio which could be multiples of PAGE_SIZE. Subsequent pages in page array might belong to same folio. Using the length of folio, folio_offset and remaining size, determine length in folio which can be added to the bio. Check if pages are contiguous and belong to same folio. If yes then skip further processing for the contiguous pages. This complete scheme reduces the overhead of iterating through pages. perf diff before and after this change(with mTHP enabled): Perf diff for write I/O with 128K block size: 1.24% -0.20% [kernel.kallsyms] [k] bio_iov_iter_get_pages 1.71% [kernel.kallsyms] [k] bvec_try_merge_page Perf diff for read I/O with 128K block size: 4.03% -1.59% [kernel.kallsyms] [k] bio_iov_iter_get_pages 5.14% [kernel.kallsyms] [k] bvec_try_merge_page Patch 1: Adds folio-lized version of bvec_try_merge_hw_page() Patch 2: Adds folio-lized version of bio_add_hw_page() Patch 3: Adds bigger size from folio to BIO Patch 4: Unpin user pages belonging to folio at once Changes since v6: - folio-lize bvec_try_merge_hw_page() to bvec_try_merge_hw_folio() - restructured the code as per comments - typecast with size_t while calculating the offset in folio. Changes since v5: - Made offset and len as size_t in function bio_add_hw_folio() - Avoid unpinning skipped pages at submission, rather unpin all pages at once on IO completion Changes since v4: - folio-lize bio_add_hw_page() to bio_add_hw_folio() - make bio_add_hw_page() as a wrapper around bio_add_hw_folio() - make new functions bio_release_folio() and unpin_user_folio() - made a helper function to check for contiguous pages of folio - changed &folio->page to folio_page(folio, 0) - reworded comments Changes since v3: - Added change to see if pages are contiguous and belong to same folio. If not then avoid skipping of pages.(Suggested by Matthew Wilcox) Changes since v2: - Made separate patches - Corrected code as per kernel coding style - Removed size_folio variable Changes since v1: - Changed functions bio_iov_add_page() and bio_iov_add_zone_append_page() to accept a folio - Removed branch and calculate folio_offset and len in same fashion for both 0 order and larger folios - Added change in NVMe driver to use nvme_setup_prp_simple() by ignoring multiples of PAGE_SIZE in offset - Added a change to unpin_user_pages which were added as folios. Also stopped the unpin of pages one by one from __bio_release_pages() (Suggested by Keith) Kundan Kumar (4): block: Added folio-lized version of bvec_try_merge_hw_page() block: Added folio-lized version of bio_add_hw_page() block: introduce folio awareness and add a bigger size from folio block: unpin user pages belonging to a folio at once block/bio.c | 133 ++++++++++++++++++++++++++++++++++++--------- block/blk.h | 15 +++++ include/linux/mm.h | 1 + mm/gup.c | 13 +++++ 4 files changed, 135 insertions(+), 27 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <CGME20240704071122epcas5p12a52967e486a69375ee77db86f2594a0@epcas5p1.samsung.com>]
* [PATCH v7 1/4] block: Added folio-lized version of bvec_try_merge_hw_page() [not found] ` <CGME20240704071122epcas5p12a52967e486a69375ee77db86f2594a0@epcas5p1.samsung.com> @ 2024-07-04 7:03 ` Kundan Kumar 2024-07-06 8:01 ` Christoph Hellwig 0 siblings, 1 reply; 11+ messages in thread From: Kundan Kumar @ 2024-07-04 7:03 UTC (permalink / raw) To: axboe, hch, willy, kbusch Cc: linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev, Kundan Kumar Added new function bvec_try_merge_hw_folio(). This is a prep patch. Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com> --- block/bio.c | 17 +++++++++++++++++ block/blk.h | 4 ++++ 2 files changed, 21 insertions(+) diff --git a/block/bio.c b/block/bio.c index e9e809a63c59..c10f5fa0ba27 100644 --- a/block/bio.c +++ b/block/bio.c @@ -952,6 +952,23 @@ bool bvec_try_merge_hw_page(struct request_queue *q, struct bio_vec *bv, struct page *page, unsigned len, unsigned offset, bool *same_page) { + struct folio *folio = page_folio(page); + + return bvec_try_merge_hw_folio(q, bv, folio, len, + ((size_t)folio_page_idx(folio, page) << PAGE_SHIFT) + + offset, same_page); +} + +/* + * Try to merge a folio into a segment, while obeying the hardware segment + * size limit. This is not for normal read/write bios, but for passthrough + * or Zone Append operations that we can't split. + */ +bool bvec_try_merge_hw_folio(struct request_queue *q, struct bio_vec *bv, + struct folio *folio, size_t len, size_t offset, + bool *same_page) +{ + struct page *page = folio_page(folio, 0); unsigned long mask = queue_segment_boundary(q); phys_addr_t addr1 = page_to_phys(bv->bv_page) + bv->bv_offset; phys_addr_t addr2 = page_to_phys(page) + offset + len - 1; diff --git a/block/blk.h b/block/blk.h index 47dadd2439b1..17478657c5ef 100644 --- a/block/blk.h +++ b/block/blk.h @@ -94,6 +94,10 @@ bool bvec_try_merge_hw_page(struct request_queue *q, struct bio_vec *bv, struct page *page, unsigned len, unsigned offset, bool *same_page); +bool bvec_try_merge_hw_folio(struct request_queue *q, struct bio_vec *bv, + struct folio *folio, size_t len, size_t offset, + bool *same_page); + static inline bool biovec_phys_mergeable(struct request_queue *q, struct bio_vec *vec1, struct bio_vec *vec2) { -- 2.25.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v7 1/4] block: Added folio-lized version of bvec_try_merge_hw_page() 2024-07-04 7:03 ` [PATCH v7 1/4] block: Added folio-lized version of bvec_try_merge_hw_page() Kundan Kumar @ 2024-07-06 8:01 ` Christoph Hellwig 0 siblings, 0 replies; 11+ messages in thread From: Christoph Hellwig @ 2024-07-06 8:01 UTC (permalink / raw) To: Kundan Kumar Cc: axboe, hch, willy, kbusch, linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev I'd say folio-ized, in the subject (also for the next patch), but I'm not a native speaker and this is all pretty odd terminology anyway.. > index e9e809a63c59..c10f5fa0ba27 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -952,6 +952,23 @@ bool bvec_try_merge_hw_page(struct request_queue *q, struct bio_vec *bv, > struct page *page, unsigned len, unsigned offset, > bool *same_page) > { > + struct folio *folio = page_folio(page); > + > + return bvec_try_merge_hw_folio(q, bv, folio, len, > + ((size_t)folio_page_idx(folio, page) << PAGE_SHIFT) + > + offset, same_page); > +} > + > +/* > + * Try to merge a folio into a segment, while obeying the hardware segment > + * size limit. This is not for normal read/write bios, but for passthrough > + * or Zone Append operations that we can't split. > + */ > +bool bvec_try_merge_hw_folio(struct request_queue *q, struct bio_vec *bv, > + struct folio *folio, size_t len, size_t offset, > + bool *same_page) Placing the wrapper above the guts of the implementation is a bit odd. Can you reorder these? Otherwise looks good: Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <CGME20240704071126epcas5p2572342c5d25c3292a9a39cb8c798a42c@epcas5p2.samsung.com>]
* [PATCH v7 2/4] block: Added folio-lized version of bio_add_hw_page() [not found] ` <CGME20240704071126epcas5p2572342c5d25c3292a9a39cb8c798a42c@epcas5p2.samsung.com> @ 2024-07-04 7:03 ` Kundan Kumar 2024-07-06 8:02 ` Christoph Hellwig 0 siblings, 1 reply; 11+ messages in thread From: Kundan Kumar @ 2024-07-04 7:03 UTC (permalink / raw) To: axboe, hch, willy, kbusch Cc: linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev, Kundan Kumar Added new bio_add_hw_folio() function. This is a prep patch. Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com> --- block/bio.c | 32 +++++++++++++++++++++++++++----- block/blk.h | 4 ++++ 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/block/bio.c b/block/bio.c index c10f5fa0ba27..05d624f016f0 100644 --- a/block/bio.c +++ b/block/bio.c @@ -996,6 +996,30 @@ bool bvec_try_merge_hw_folio(struct request_queue *q, struct bio_vec *bv, int bio_add_hw_page(struct request_queue *q, struct bio *bio, struct page *page, unsigned int len, unsigned int offset, unsigned int max_sectors, bool *same_page) +{ + struct folio *folio = page_folio(page); + + return bio_add_hw_folio(q, bio, folio, len, + ((size_t)folio_page_idx(folio, page) << PAGE_SHIFT) + + offset, max_sectors, same_page); +} + +/** + * bio_add_hw_folio - attempt to add a folio to a bio with hw constraints + * @q: the target queue + * @bio: destination bio + * @folio: folio to add + * @len: vec entry length + * @offset: vec entry offset in the folio + * @max_sectors: maximum number of sectors that can be added + * @same_page: return if the segment has been merged inside the same folio + * + * Add a folio to a bio while respecting the hardware max_sectors, max_segment + * and gap limitations. + */ +int bio_add_hw_folio(struct request_queue *q, struct bio *bio, + struct folio *folio, size_t len, size_t offset, + unsigned int max_sectors, bool *same_page) { unsigned int max_size = max_sectors << SECTOR_SHIFT; @@ -1009,8 +1033,8 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio, if (bio->bi_vcnt > 0) { struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1]; - if (bvec_try_merge_hw_page(q, bv, page, len, offset, - same_page)) { + if (bvec_try_merge_hw_folio(q, bv, folio, len, offset, + same_page)) { bio->bi_iter.bi_size += len; return len; } @@ -1027,9 +1051,7 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio, return 0; } - bvec_set_page(&bio->bi_io_vec[bio->bi_vcnt], page, len, offset); - bio->bi_vcnt++; - bio->bi_iter.bi_size += len; + bio_add_folio_nofail(bio, folio, len, offset); return len; } diff --git a/block/blk.h b/block/blk.h index 17478657c5ef..0c8857fe4079 100644 --- a/block/blk.h +++ b/block/blk.h @@ -534,6 +534,10 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio, struct page *page, unsigned int len, unsigned int offset, unsigned int max_sectors, bool *same_page); +int bio_add_hw_folio(struct request_queue *q, struct bio *bio, + struct folio *folio, size_t len, size_t offset, + unsigned int max_sectors, bool *same_page); + /* * Clean up a page appropriately, where the page may be pinned, may have a * ref taken on it or neither. -- 2.25.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v7 2/4] block: Added folio-lized version of bio_add_hw_page() 2024-07-04 7:03 ` [PATCH v7 2/4] block: Added folio-lized version of bio_add_hw_page() Kundan Kumar @ 2024-07-06 8:02 ` Christoph Hellwig 0 siblings, 0 replies; 11+ messages in thread From: Christoph Hellwig @ 2024-07-06 8:02 UTC (permalink / raw) To: Kundan Kumar Cc: axboe, hch, willy, kbusch, linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev Module the same comments as for the last patch: Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <CGME20240704071130epcas5p131b210c30237386f2c786e81c88355f6@epcas5p1.samsung.com>]
* [PATCH v7 3/4] block: introduce folio awareness and add a bigger size from folio [not found] ` <CGME20240704071130epcas5p131b210c30237386f2c786e81c88355f6@epcas5p1.samsung.com> @ 2024-07-04 7:03 ` Kundan Kumar 2024-07-06 8:03 ` Christoph Hellwig 2024-07-08 8:24 ` kernel test robot 0 siblings, 2 replies; 11+ messages in thread From: Kundan Kumar @ 2024-07-04 7:03 UTC (permalink / raw) To: axboe, hch, willy, kbusch Cc: linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev, Kundan Kumar Add a bigger size from folio to bio and skip merge processing for pages. Fetch the offset of page within a folio. Depending on the size of folio and folio_offset, fetch a larger length. This length may consist of multiple contiguous pages if folio is multiorder. Using the length calculate number of pages which will be added to bio and increment the loop counter to skip those pages. Using a helper function check if pages are contiguous and belong to same folio, this is done as a COW may happen and change contiguous mapping of pages of folio. This technique helps to avoid overhead of merging pages which belong to same large order folio. Also folio-lize the functions bio_iov_add_page() and bio_iov_add_zone_append_page() Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com> --- block/bio.c | 77 ++++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 61 insertions(+), 16 deletions(-) diff --git a/block/bio.c b/block/bio.c index 05d624f016f0..32c9c6d80384 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1243,8 +1243,8 @@ void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter) bio_set_flag(bio, BIO_CLONED); } -static int bio_iov_add_page(struct bio *bio, struct page *page, - unsigned int len, unsigned int offset) +static int bio_iov_add_folio(struct bio *bio, struct folio *folio, size_t len, + size_t offset) { bool same_page = false; @@ -1253,30 +1253,61 @@ static int bio_iov_add_page(struct bio *bio, struct page *page, if (bio->bi_vcnt > 0 && bvec_try_merge_page(&bio->bi_io_vec[bio->bi_vcnt - 1], - page, len, offset, &same_page)) { + folio_page(folio, 0), len, offset, + &same_page)) { bio->bi_iter.bi_size += len; if (same_page) - bio_release_page(bio, page); + bio_release_page(bio, folio_page(folio, 0)); return 0; } - __bio_add_page(bio, page, len, offset); + bio_add_folio_nofail(bio, folio, len, offset); return 0; } -static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page, - unsigned int len, unsigned int offset) +static int bio_iov_add_zone_append_folio(struct bio *bio, struct folio *folio, + size_t len, size_t offset) { struct request_queue *q = bdev_get_queue(bio->bi_bdev); bool same_page = false; - if (bio_add_hw_page(q, bio, page, len, offset, + if (bio_add_hw_folio(q, bio, folio, len, offset, queue_max_zone_append_sectors(q), &same_page) != len) return -EINVAL; if (same_page) - bio_release_page(bio, page); + bio_release_page(bio, folio_page(folio, 0)); return 0; } +static unsigned int get_contig_folio_len(unsigned int *num_pages, + struct page **pages, unsigned int i, + struct folio *folio, size_t left, + size_t offset) +{ + size_t bytes = left; + size_t contig_sz = min_t(size_t, PAGE_SIZE - offset, bytes); + unsigned int j; + + /* + * We might COW a single page in the middle of + * a large folio, so we have to check that all + * pages belong to the same folio. + */ + bytes -= contig_sz; + for (j = i + 1; j < i + *num_pages; j++) { + size_t next = min_t(size_t, PAGE_SIZE, bytes); + + if (page_folio(pages[j]) != folio || + pages[j] != pages[j - 1] + 1) { + break; + } + contig_sz += next; + bytes -= next; + } + *num_pages = j - i; + + return contig_sz; +} + #define PAGE_PTRS_PER_BVEC (sizeof(struct bio_vec) / sizeof(struct page *)) /** @@ -1296,9 +1327,9 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt; struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt; struct page **pages = (struct page **)bv; - ssize_t size, left; - unsigned len, i = 0; - size_t offset; + ssize_t size; + unsigned int i = 0, num_pages; + size_t offset, folio_offset, left, len; int ret = 0; /* @@ -1340,15 +1371,29 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) for (left = size, i = 0; left > 0; left -= len, i++) { struct page *page = pages[i]; + struct folio *folio = page_folio(page); + + folio_offset = ((size_t)folio_page_idx(folio, page) << + PAGE_SHIFT) + offset; + + len = min_t(size_t, (folio_size(folio) - folio_offset), left); + + num_pages = DIV_ROUND_UP(offset + len, PAGE_SIZE); + + if (num_pages > 1) + len = get_contig_folio_len(&num_pages, pages, i, + folio, left, offset); - len = min_t(size_t, PAGE_SIZE - offset, left); if (bio_op(bio) == REQ_OP_ZONE_APPEND) { - ret = bio_iov_add_zone_append_page(bio, page, len, - offset); + ret = bio_iov_add_zone_append_folio(bio, folio, len, + folio_offset); if (ret) break; } else - bio_iov_add_page(bio, page, len, offset); + bio_iov_add_folio(bio, folio, len, folio_offset); + + /* Skip the pages which got added */ + i = i + (num_pages - 1); offset = 0; } -- 2.25.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v7 3/4] block: introduce folio awareness and add a bigger size from folio 2024-07-04 7:03 ` [PATCH v7 3/4] block: introduce folio awareness and add a bigger size from folio Kundan Kumar @ 2024-07-06 8:03 ` Christoph Hellwig 2024-07-08 8:24 ` kernel test robot 1 sibling, 0 replies; 11+ messages in thread From: Christoph Hellwig @ 2024-07-06 8:03 UTC (permalink / raw) To: Kundan Kumar Cc: axboe, hch, willy, kbusch, linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev Looks good: Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v7 3/4] block: introduce folio awareness and add a bigger size from folio 2024-07-04 7:03 ` [PATCH v7 3/4] block: introduce folio awareness and add a bigger size from folio Kundan Kumar 2024-07-06 8:03 ` Christoph Hellwig @ 2024-07-08 8:24 ` kernel test robot 1 sibling, 0 replies; 11+ messages in thread From: kernel test robot @ 2024-07-08 8:24 UTC (permalink / raw) To: Kundan Kumar Cc: oe-lkp, lkp, linux-block, ltp, axboe, hch, willy, kbusch, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev, Kundan Kumar, oliver.sang Hello, kernel test robot noticed "WARNING:at_mm/gup.c:#try_grab_page" on: commit: 69b40318d4fdb6a9ac6bb833618e4cd954db4946 ("[PATCH v7 3/4] block: introduce folio awareness and add a bigger size from folio") url: https://github.com/intel-lab-lkp/linux/commits/Kundan-Kumar/block-Added-folio-lized-version-of-bvec_try_merge_hw_page/20240705-055633 base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next patch link: https://lore.kernel.org/all/20240704070357.1993-4-kundan.kumar@samsung.com/ patch subject: [PATCH v7 3/4] block: introduce folio awareness and add a bigger size from folio in testcase: ltp version: ltp-x86_64-14c1f76-1_20240706 with following parameters: disk: 1HDD fs: ext4 test: ltp-aiodio.part2-00 compiler: gcc-13 test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory (please refer to attached dmesg/kmsg for entire log/backtrace) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202407081641.6a640f9e-oliver.sang@intel.com kern :warn : [ 327.605962] ------------[ cut here ]------------ kern :warn : [ 327.606706] WARNING: CPU: 1 PID: 5867 at mm/gup.c:229 try_grab_page (mm/gup.c:229 (discriminator 1)) kern :warn : [ 327.607701] Modules linked in: intel_rapl_msr intel_rapl_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 rapl btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c crc32c_intel sd_mod sg intel_cstate ahci mei_me nvme libahci ipmi_devintf ipmi_msghandler intel_uncore intel_wmi_thunderbolt wmi_bmof mxm_wmi wdat_wdt libata nvme_core i2c_i801 ioatdma mei i2c_smbus dca wmi binfmt_misc drm fuse loop dm_mod ip_tables kern :warn : [ 327.613419] CPU: 1 PID: 5867 Comm: dio_sparse Not tainted 6.10.0-rc6-00246-g69b40318d4fd #1 kern :warn : [ 327.614547] Hardware name: Gigabyte Technology Co., Ltd. X299 UD4 Pro/X299 UD4 Pro-CF, BIOS F8a 04/27/2021 kern :warn : [ 327.615788] RIP: 0010:try_grab_page (mm/gup.c:229 (discriminator 1)) kern :warn : [ 327.616621] Code: 40 f6 c5 01 0f 84 1a fe ff ff 48 83 ed 01 e9 14 fe ff ff be 04 00 00 00 4c 89 e7 e8 bd 68 14 00 f0 41 ff 04 24 e9 67 ff ff ff <0f> 0b b8 f4 ff ff ff 5b 5d 41 5c 41 5d c3 cc cc cc cc e8 9c 68 14 All code ======== 0: 40 f6 c5 01 test $0x1,%bpl 4: 0f 84 1a fe ff ff je 0xfffffffffffffe24 a: 48 83 ed 01 sub $0x1,%rbp e: e9 14 fe ff ff jmp 0xfffffffffffffe27 13: be 04 00 00 00 mov $0x4,%esi 18: 4c 89 e7 mov %r12,%rdi 1b: e8 bd 68 14 00 call 0x1468dd 20: f0 41 ff 04 24 lock incl (%r12) 25: e9 67 ff ff ff jmp 0xffffffffffffff91 2a:* 0f 0b ud2 <-- trapping instruction 2c: b8 f4 ff ff ff mov $0xfffffff4,%eax 31: 5b pop %rbx 32: 5d pop %rbp 33: 41 5c pop %r12 35: 41 5d pop %r13 37: c3 ret 38: cc int3 39: cc int3 3a: cc int3 3b: cc int3 3c: e8 .byte 0xe8 3d: 9c pushf 3e: 68 .byte 0x68 3f: 14 .byte 0x14 Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: b8 f4 ff ff ff mov $0xfffffff4,%eax 7: 5b pop %rbx 8: 5d pop %rbp 9: 41 5c pop %r12 b: 41 5d pop %r13 d: c3 ret e: cc int3 f: cc int3 10: cc int3 11: cc int3 12: e8 .byte 0xe8 13: 9c pushf 14: 68 .byte 0x68 15: 14 .byte 0x14 kern :warn : [ 327.619059] RSP: 0018:ffffc9000d61f2f8 EFLAGS: 00010246 kern :warn : [ 327.619993] RAX: 0000000000000000 RBX: ffffea000bc10000 RCX: ffffffff8194dcbb kern :warn : [ 327.621085] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffea000bc10034 kern :warn : [ 327.622179] RBP: ffffea000bc10000 R08: 0000000000000000 R09: fffff94001782006 kern :warn : [ 327.623273] R10: ffffea000bc10037 R11: 00007fd4b2aa6fff R12: ffffea000bc10034 kern :warn : [ 327.624366] R13: 0000000000290000 R14: ffff8881267fec40 R15: 0000000000000000 kern :warn : [ 327.625460] FS: 00007fd4b2929740(0000) GS:ffff889f8ae80000(0000) knlGS:0000000000000000 kern :warn : [ 327.626636] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kern :warn : [ 327.627622] CR2: 00007f517b030000 CR3: 0000002081e22001 CR4: 00000000003706f0 kern :warn : [ 327.628727] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 kern :warn : [ 327.629832] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 kern :warn : [ 327.631001] Call Trace: kern :warn : [ 327.631743] <TASK> kern :warn : [ 327.632445] ? __warn (kernel/panic.c:693) kern :warn : [ 327.633243] ? try_grab_page (mm/gup.c:229 (discriminator 1)) kern :warn : [ 327.634128] ? report_bug (lib/bug.c:180 lib/bug.c:219) kern :warn : [ 327.634985] ? handle_bug (arch/x86/kernel/traps.c:239 (discriminator 1)) kern :warn : [ 327.635835] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) kern :warn : [ 327.636695] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621) kern :warn : [ 327.637587] ? try_grab_page (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/atomic/atomic-instrumented.h:33 include/linux/page_ref.h:67 include/linux/page_ref.h:89 mm/gup.c:229) kern :warn : [ 327.638509] ? try_grab_page (mm/gup.c:229 (discriminator 1)) kern :warn : [ 327.639418] ? try_grab_page (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/atomic/atomic-instrumented.h:33 include/linux/page_ref.h:67 include/linux/page_ref.h:89 mm/gup.c:229) kern :warn : [ 327.640285] follow_huge_pmd (mm/gup.c:809) kern :warn : [ 327.641162] follow_pmd_mask+0x3f8/0x7e0 kern :warn : [ 327.642090] ? __pfx_follow_pmd_mask+0x10/0x10 kern :warn : [ 327.643125] ? find_vma (mm/mmap.c:1944) kern :warn : [ 327.643948] ? __pfx_find_vma (mm/mmap.c:1944) kern :warn : [ 327.644804] follow_page_mask (mm/gup.c:1116 mm/gup.c:1162) kern :warn : [ 327.645674] __get_user_pages (mm/gup.c:1588 (discriminator 1)) kern :warn : [ 327.646601] ? __pfx___get_user_pages (mm/gup.c:1522) kern :warn : [ 327.647533] ? down_read_killable (arch/x86/include/asm/atomic64_64.h:20 include/linux/atomic/atomic-arch-fallback.h:2629 include/linux/atomic/atomic-long.h:79 include/linux/atomic/atomic-instrumented.h:3224 kernel/locking/rwsem.c:176 kernel/locking/rwsem.c:181 kernel/locking/rwsem.c:249 kernel/locking/rwsem.c:241 kernel/locking/rwsem.c:1249 kernel/locking/rwsem.c:1273 kernel/locking/rwsem.c:1551) kern :warn : [ 327.648428] ? __pfx_down_read_killable (kernel/locking/rwsem.c:1547) kern :warn : [ 327.649345] __gup_longterm_locked (mm/gup.c:1859 mm/gup.c:2556) kern :warn : [ 327.650237] gup_fast_fallback (mm/gup.c:3476) kern :warn : [ 327.651111] ? __pfx__raw_spin_lock_irqsave (kernel/locking/spinlock.c:161) kern :warn : [ 327.652030] ? __pfx_gup_fast_fallback (mm/gup.c:3439) kern :warn : [ 327.652890] ? __link_object (include/linux/rculist.h:79 (discriminator 1) include/linux/rculist.h:128 (discriminator 1) mm/kmemleak.c:728 (discriminator 1)) kern :warn : [ 327.653674] iov_iter_extract_pages (lib/iov_iter.c:1584 (discriminator 1) lib/iov_iter.c:1646 (discriminator 1)) kern :warn : [ 327.654501] __bio_iov_iter_get_pages (block/bio.c:1353) kern :warn : [ 327.655382] ? __pfx___bio_iov_iter_get_pages (block/bio.c:1324) kern :warn : [ 327.656273] ? bio_init (arch/x86/include/asm/atomic.h:28 include/linux/atomic/atomic-arch-fallback.h:503 include/linux/atomic/atomic-instrumented.h:68 block/bio.c:279) kern :warn : [ 327.656980] ? bio_alloc_bioset (block/bio.c:578) kern :warn : [ 327.657721] bio_iov_iter_get_pages (block/bio.c:1446 (discriminator 3)) kern :warn : [ 327.658474] iomap_dio_bio_iter (fs/iomap/direct-io.c:388) kern :warn : [ 327.659248] __iomap_dio_rw (fs/iomap/direct-io.c:501 fs/iomap/direct-io.c:660) kern :warn : [ 327.659982] ? __pfx___iomap_dio_rw (fs/iomap/direct-io.c:544) kern :warn : [ 327.660729] ? __pfx_ext4_dio_write_checks (fs/ext4/file.c:424) kern :warn : [ 327.661518] ? __pfx_ext4_dio_write_end_io (fs/ext4/file.c:376) kern :warn : [ 327.662299] ? iomap_dio_complete (fs/iomap/direct-io.c:133) kern :warn : [ 327.663052] iomap_dio_rw (fs/iomap/direct-io.c:749) kern :warn : [ 327.663723] ext4_dio_write_iter (fs/ext4/file.c:577) kern :warn : [ 327.664439] ? __pfx_ext4_dio_write_iter (fs/ext4/file.c:499) kern :warn : [ 327.665185] ? folio_unlock (arch/x86/include/asm/bitops.h:101 include/asm-generic/bitops/instrumented-lock.h:80 include/linux/page-flags.h:762 mm/filemap.c:1508) kern :warn : [ 327.665838] ? do_wp_page (include/linux/vmstat.h:71 (discriminator 1) mm/memory.c:3193 (discriminator 1) mm/memory.c:3663 (discriminator 1)) kern :warn : [ 327.666494] ? __pfx___might_resched (kernel/sched/core.c:10151) kern :warn : [ 327.667254] vfs_write (fs/read_write.c:497 fs/read_write.c:590) kern :warn : [ 327.667915] ? __pfx___might_resched (kernel/sched/core.c:10151) kern :warn : [ 327.668657] ? __pfx_vfs_write (fs/read_write.c:571) kern :warn : [ 327.669342] ? __pfx___might_resched (kernel/sched/core.c:10151) kern :warn : [ 327.670062] ? __pfx_put_timespec64 (kernel/time/time.c:904) kern :warn : [ 327.670774] ksys_write (fs/read_write.c:643) kern :warn : [ 327.671409] ? __pfx_ksys_write (fs/read_write.c:633) kern :warn : [ 327.672129] ? __pfx___x64_sys_clock_gettime (kernel/time/posix-timers.c:1132) kern :warn : [ 327.672939] ? fpregs_restore_userregs (arch/x86/include/asm/bitops.h:75 include/asm-generic/bitops/instrumented-atomic.h:42 include/linux/thread_info.h:94 arch/x86/kernel/fpu/context.h:79) kern :warn : [ 327.673697] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) kern :warn : [ 327.674363] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) kern :warn : [ 327.675145] RIP: 0033:0x7fd4b2a24240 kern :warn : [ 327.675803] Code: 40 00 48 8b 15 c1 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 23 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89 All code ======== 0: 40 00 48 8b rex add %cl,-0x75(%rax) 4: 15 c1 9b 0d 00 adc $0xd9bc1,%eax 9: f7 d8 neg %eax b: 64 89 02 mov %eax,%fs:(%rdx) e: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax 15: eb b7 jmp 0xffffffffffffffce 17: 0f 1f 00 nopl (%rax) 1a: 80 3d a1 23 0e 00 00 cmpb $0x0,0xe23a1(%rip) # 0xe23c2 21: 74 17 je 0x3a 23: b8 01 00 00 00 mov $0x1,%eax 28: 0f 05 syscall 2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction 30: 77 58 ja 0x8a 32: c3 ret 33: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) 3a: 48 83 ec 28 sub $0x28,%rsp 3e: 48 rex.W 3f: 89 .byte 0x89 Code starting with the faulting instruction =========================================== 0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax 6: 77 58 ja 0x60 8: c3 ret 9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) 10: 48 83 ec 28 sub $0x28,%rsp 14: 48 rex.W 15: 89 .byte 0x89 The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20240708/202407081641.6a640f9e-oliver.sang@intel.com -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <CGME20240704071134epcas5p2ec6160369e9092de98a051e05750bd4f@epcas5p2.samsung.com>]
* [PATCH v7 4/4] block: unpin user pages belonging to a folio at once [not found] ` <CGME20240704071134epcas5p2ec6160369e9092de98a051e05750bd4f@epcas5p2.samsung.com> @ 2024-07-04 7:03 ` Kundan Kumar 2024-07-06 8:10 ` Christoph Hellwig 0 siblings, 1 reply; 11+ messages in thread From: Kundan Kumar @ 2024-07-04 7:03 UTC (permalink / raw) To: axboe, hch, willy, kbusch Cc: linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev, Kundan Kumar Add a new function bio_release_folio() and use it to put refs by npages count. Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com> --- block/bio.c | 7 +------ block/blk.h | 7 +++++++ include/linux/mm.h | 1 + mm/gup.c | 13 +++++++++++++ 4 files changed, 22 insertions(+), 6 deletions(-) diff --git a/block/bio.c b/block/bio.c index 32c9c6d80384..0d923140006e 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1205,20 +1205,15 @@ void __bio_release_pages(struct bio *bio, bool mark_dirty) struct folio_iter fi; bio_for_each_folio_all(fi, bio) { - struct page *page; size_t nr_pages; - if (mark_dirty) { folio_lock(fi.folio); folio_mark_dirty(fi.folio); folio_unlock(fi.folio); } - page = folio_page(fi.folio, fi.offset / PAGE_SIZE); nr_pages = (fi.offset + fi.length - 1) / PAGE_SIZE - fi.offset / PAGE_SIZE + 1; - do { - bio_release_page(bio, page++); - } while (--nr_pages != 0); + bio_release_folio(bio, fi.folio, nr_pages); } } EXPORT_SYMBOL_GPL(__bio_release_pages); diff --git a/block/blk.h b/block/blk.h index 0c8857fe4079..18520b05c6ce 100644 --- a/block/blk.h +++ b/block/blk.h @@ -548,6 +548,13 @@ static inline void bio_release_page(struct bio *bio, struct page *page) unpin_user_page(page); } +static inline void bio_release_folio(struct bio *bio, struct folio *folio, + unsigned long npages) +{ + if (bio_flagged(bio, BIO_PAGE_PINNED)) + unpin_user_folio(folio, npages); +} + struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id); int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode); diff --git a/include/linux/mm.h b/include/linux/mm.h index 9849dfda44d4..b902c6c39e2b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1618,6 +1618,7 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages, void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages, bool make_dirty); void unpin_user_pages(struct page **pages, unsigned long npages); +void unpin_user_folio(struct folio *folio, unsigned long npages); static inline bool is_cow_mapping(vm_flags_t flags) { diff --git a/mm/gup.c b/mm/gup.c index ca0f5cedce9b..bc96efa43d1b 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -488,6 +488,19 @@ void unpin_user_pages(struct page **pages, unsigned long npages) } EXPORT_SYMBOL(unpin_user_pages); +/** + * unpin_user_folio() - release pages of a folio + * @folio: pointer to folio to be released + * @npages: number of pages of same folio + * + * Release npages of the folio + */ +void unpin_user_folio(struct folio *folio, unsigned long npages) +{ + gup_put_folio(folio, npages, FOLL_PIN); +} +EXPORT_SYMBOL(unpin_user_folio); + /* * Set the MMF_HAS_PINNED if not set yet; after set it'll be there for the mm's * lifecycle. Avoid setting the bit unless necessary, or it might cause write -- 2.25.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v7 4/4] block: unpin user pages belonging to a folio at once 2024-07-04 7:03 ` [PATCH v7 4/4] block: unpin user pages belonging to a folio at once Kundan Kumar @ 2024-07-06 8:10 ` Christoph Hellwig 2024-07-06 8:12 ` Christoph Hellwig 0 siblings, 1 reply; 11+ messages in thread From: Christoph Hellwig @ 2024-07-06 8:10 UTC (permalink / raw) To: Kundan Kumar Cc: axboe, hch, willy, kbusch, linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev > nr_pages = (fi.offset + fi.length - 1) / PAGE_SIZE - > fi.offset / PAGE_SIZE + 1; > + bio_release_folio(bio, fi.folio, nr_pages); > } > } > EXPORT_SYMBOL_GPL(__bio_release_pages); > diff --git a/block/blk.h b/block/blk.h > index 0c8857fe4079..18520b05c6ce 100644 > --- a/block/blk.h > +++ b/block/blk.h > @@ -548,6 +548,13 @@ static inline void bio_release_page(struct bio *bio, struct page *page) > unpin_user_page(page); > } > > +static inline void bio_release_folio(struct bio *bio, struct folio *folio, > + unsigned long npages) > +{ > + if (bio_flagged(bio, BIO_PAGE_PINNED)) > + unpin_user_folio(folio, npages); > +} This is only used in __bio_release_pages, and given that __bio_release_pages is only called when BIO_PAGE_PINNED is set there is no need to check it inside the loop again. Also this means we know the loop doesn't do anything if mark_dirty is false, which is another trivial check that can move into bio_release_pages. As this optimization already applies as-is I'll send a prep patch for it. so that we can avoid the npages calculation for the !BIO_PAGE_PINNED case. Morover having the BIO_PAGE_PINNED knowledge there means we can skip the entire loop for !BIO_PAGE_PINNED && > +/** > + * unpin_user_folio() - release pages of a folio > + * @folio: pointer to folio to be released > + * @npages: number of pages of same folio > + * > + * Release npages of the folio > + */ > +void unpin_user_folio(struct folio *folio, unsigned long npages) > +{ > + gup_put_folio(folio, npages, FOLL_PIN); > +} > +EXPORT_SYMBOL(unpin_user_folio); Please don't hide a new MM API inside a block patch, but split it out with a mm prefix. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v7 4/4] block: unpin user pages belonging to a folio at once 2024-07-06 8:10 ` Christoph Hellwig @ 2024-07-06 8:12 ` Christoph Hellwig 0 siblings, 0 replies; 11+ messages in thread From: Christoph Hellwig @ 2024-07-06 8:12 UTC (permalink / raw) To: Kundan Kumar Cc: axboe, hch, willy, kbusch, linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev On Sat, Jul 06, 2024 at 10:10:21AM +0200, Christoph Hellwig wrote: > Also this means we know the loop doesn't do anything if mark_dirty is > false, which is another trivial check that can move into > bio_release_pages. As this optimization already applies as-is I'll > send a prep patch for it. > > so that we can avoid the npages calculation for the !BIO_PAGE_PINNED > case. Morover having the BIO_PAGE_PINNED knowledge there means we > can skip the entire loop for !BIO_PAGE_PINNED && Braino - we're obviously already skipping it for all !BIO_PAGE_PINNED case. So discard most of this except for that fact that we should skip the wrapper doing the extra BIO_PAGE_PINNED checks. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-07-08 8:24 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CGME20240704071118epcas5p141ef3c2cbcde0ce31d342b5743a7dcf1@epcas5p1.samsung.com>
2024-07-04 7:03 ` [PATCH v7 0/4] block: add larger order folio instead of pages Kundan Kumar
[not found] ` <CGME20240704071122epcas5p12a52967e486a69375ee77db86f2594a0@epcas5p1.samsung.com>
2024-07-04 7:03 ` [PATCH v7 1/4] block: Added folio-lized version of bvec_try_merge_hw_page() Kundan Kumar
2024-07-06 8:01 ` Christoph Hellwig
[not found] ` <CGME20240704071126epcas5p2572342c5d25c3292a9a39cb8c798a42c@epcas5p2.samsung.com>
2024-07-04 7:03 ` [PATCH v7 2/4] block: Added folio-lized version of bio_add_hw_page() Kundan Kumar
2024-07-06 8:02 ` Christoph Hellwig
[not found] ` <CGME20240704071130epcas5p131b210c30237386f2c786e81c88355f6@epcas5p1.samsung.com>
2024-07-04 7:03 ` [PATCH v7 3/4] block: introduce folio awareness and add a bigger size from folio Kundan Kumar
2024-07-06 8:03 ` Christoph Hellwig
2024-07-08 8:24 ` kernel test robot
[not found] ` <CGME20240704071134epcas5p2ec6160369e9092de98a051e05750bd4f@epcas5p2.samsung.com>
2024-07-04 7:03 ` [PATCH v7 4/4] block: unpin user pages belonging to a folio at once Kundan Kumar
2024-07-06 8:10 ` Christoph Hellwig
2024-07-06 8:12 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).