Subject: [PATCH v4 0/2] block: add larger order folio instead of pages
From: Kundan Kumar
Date: 2024-06-05  9:24 UTC
To: axboe, hch, willy, kbusch
Cc: linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev, Kundan Kumar

User-space memory is mapped into the kernel as an array of pages. These
pages are iterated over and added to the bio; in the process they are
also checked for contiguity and merged. When mTHP is enabled, the pages
generally belong to a larger-order folio.

This patch series enables adding a large folio to a bio. For each page
in the array it fetches the containing folio. The page may start at an
offset within the folio, which can be a multiple of PAGE_SIZE, and
subsequent pages in the array may belong to the same folio. From the
folio size, the folio offset, and the remaining I/O size, it determines
the length within the folio that can be added to the bio. It then
checks whether the pages are contiguous and belong to the same folio;
if so, further per-page processing is skipped for them. This scheme
reduces the overhead of iterating through pages.

perf diff before and after this change:

Perf diff for write I/O with 128K block size:
    1.32%  -0.33%  [kernel.kallsyms]  [k] bio_iov_iter_get_pages
    1.78%          [kernel.kallsyms]  [k] bvec_try_merge_page

Perf diff for read I/O with 128K block size:
    3.99%  -1.61%  [kernel.kallsyms]  [k] bio_iov_iter_get_pages
    5.21%          [kernel.kallsyms]  [k] bvec_try_merge_page

Patch 1: adds a larger-order folio to the bio instead of looping
through its pages.
Patch 2: when a large folio is added, releases the subsequent pages of
that folio up front, which avoids per-page calculations at I/O
completion.

Changes since v3:
- Check whether pages are contiguous and belong to the same folio; if
  not, do not skip them. (Suggested by Matthew Wilcox)

Changes since v2:
- Split into separate patches
- Corrected code as per kernel coding style
- Removed the size_folio variable

Changes since v1:
- Changed bio_iov_add_page() and bio_iov_add_zone_append_page() to
  accept a folio
- Removed a branch; folio_offset and len are now calculated the same
  way for both order-0 and larger folios
- Changed the NVMe driver to use nvme_setup_prp_simple() by ignoring
  multiples of PAGE_SIZE in the offset
- Unpin user pages that were added as part of a folio, and stop
  unpinning pages one by one in __bio_release_pages() (Suggested by
  Keith Busch)

Kundan Kumar (2):
  block: add folio awareness instead of looping through pages
  block: unpin user pages belonging to a folio

 block/bio.c | 75 +++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 52 insertions(+), 23 deletions(-)

--
2.25.1
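In outline, the per-page loop collapses into one step per folio. A
condensed sketch of the idea, simplified from patch 1 below (the
zone-append path, error handling, and the contiguity check for COWed
pages are omitted here):

	for (left = size, i = 0; left > 0; left -= len, i++) {
		struct folio *folio = page_folio(pages[i]);
		/* turn the page offset into an offset within the folio */
		size_t folio_offset = (folio_page_idx(folio, pages[i])
					<< PAGE_SHIFT) + offset;

		/* take the rest of the folio, capped by the remaining I/O */
		len = min_t(size_t, folio_size(folio) - folio_offset, left);
		num_pages = DIV_ROUND_UP(offset + len, PAGE_SIZE);

		bio_add_folio_nofail(bio, folio, len, folio_offset);

		i += num_pages - 1;	/* skip the pages covered above */
		offset = 0;
	}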
Subject: [PATCH v4 1/2] block: add folio awareness instead of looping through pages
From: Kundan Kumar
Date: 2024-06-05  9:24 UTC
To: axboe, hch, willy, kbusch
Cc: linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev, Kundan Kumar

Add a larger length from the folio to the bio and skip the per-page
merge processing. Fetch the offset of the page within its folio;
depending on the folio size and the folio offset, fetch a larger
length. This length may span multiple contiguous pages if the folio is
multi-order. From this length, calculate the number of pages that get
added to the bio and advance the loop counter to skip those pages.

Also check that the pages are contiguous and belong to the same folio:
a COW may occur and break the contiguous mapping of the folio's pages.

This technique avoids the overhead of merging pages that belong to the
same large-order folio.

Also folio-ize the functions bio_iov_add_page() and
bio_iov_add_zone_append_page().

Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
---
 block/bio.c | 62 +++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 48 insertions(+), 14 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index e9e809a63c59..7857b9ca5957 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1204,7 +1204,7 @@ void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
 	bio_set_flag(bio, BIO_CLONED);
 }
 
-static int bio_iov_add_page(struct bio *bio, struct page *page,
+static int bio_iov_add_folio(struct bio *bio, struct folio *folio,
 		unsigned int len, unsigned int offset)
 {
 	bool same_page = false;
@@ -1214,27 +1214,27 @@ static int bio_iov_add_page(struct bio *bio, struct page *page,
 
 	if (bio->bi_vcnt > 0 &&
 	    bvec_try_merge_page(&bio->bi_io_vec[bio->bi_vcnt - 1],
-				page, len, offset, &same_page)) {
+				&folio->page, len, offset, &same_page)) {
 		bio->bi_iter.bi_size += len;
 		if (same_page)
-			bio_release_page(bio, page);
+			bio_release_page(bio, &folio->page);
 		return 0;
 	}
 
-	__bio_add_page(bio, page, len, offset);
+	bio_add_folio_nofail(bio, folio, len, offset);
 	return 0;
 }
 
-static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
+static int bio_iov_add_zone_append_folio(struct bio *bio, struct folio *folio,
 		unsigned int len, unsigned int offset)
 {
 	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 	bool same_page = false;
 
-	if (bio_add_hw_page(q, bio, page, len, offset,
+	if (bio_add_hw_page(q, bio, &folio->page, len, offset,
 			queue_max_zone_append_sectors(q), &same_page) != len)
 		return -EINVAL;
 	if (same_page)
-		bio_release_page(bio, page);
+		bio_release_page(bio, &folio->page);
 	return 0;
 }
 
@@ -1258,9 +1258,9 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
 	ssize_t size, left;
-	unsigned len, i = 0;
-	size_t offset;
-	int ret = 0;
+	unsigned int len, i = 0, j;
+	size_t offset, folio_offset;
+	int ret = 0, num_pages;
 
 	/*
 	 * Move page array up in the allocated memory for the bio vecs as far as
@@ -1301,15 +1301,49 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 
 	for (left = size, i = 0; left > 0; left -= len, i++) {
 		struct page *page = pages[i];
+		struct folio *folio = page_folio(page);
+
+		/* Calculate the offset of page in folio */
+		folio_offset = (folio_page_idx(folio, page) << PAGE_SHIFT) +
+				offset;
+
+		len = min_t(size_t, (folio_size(folio) - folio_offset), left);
+
+		num_pages = DIV_ROUND_UP(offset + len, PAGE_SIZE);
+
+		if (num_pages > 1) {
+			ssize_t bytes = left;
+			size_t contig_sz = min_t(size_t, PAGE_SIZE - offset,
+						 bytes);
+
+			/*
+			 * Check if pages are contiguous and belong to the
+			 * same folio.
+			 */
+			for (j = i + 1; j < i + num_pages; j++) {
+				size_t next = min_t(size_t, PAGE_SIZE, bytes);
+
+				if (page_folio(pages[j]) != folio ||
+				    pages[j] != pages[j - 1] + 1) {
+					break;
+				}
+				contig_sz += next;
+				bytes -= next;
+			}
+			num_pages = j - i;
+			len = contig_sz;
+		}
 
-		len = min_t(size_t, PAGE_SIZE - offset, left);
 		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-			ret = bio_iov_add_zone_append_page(bio, page, len,
-					offset);
+			ret = bio_iov_add_zone_append_folio(bio, folio, len,
+					folio_offset);
 			if (ret)
 				break;
 		} else
-			bio_iov_add_page(bio, page, len, offset);
+			bio_iov_add_folio(bio, folio, len, folio_offset);
+
+		/* Skip the pages which got added */
+		i = i + (num_pages - 1);
 
 		offset = 0;
 	}
--
2.25.1
Subject: Re: [PATCH v4 1/2] block: add folio awareness instead of looping through pages
From: Matthew Wilcox
Date: 2024-06-05 18:45 UTC
To: Kundan Kumar
Cc: axboe, hch, kbusch, linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev

On Wed, Jun 05, 2024 at 02:54:54PM +0530, Kundan Kumar wrote:
> @@ -1214,27 +1214,27 @@ static int bio_iov_add_page(struct bio *bio, struct page *page,
> 
>  	if (bio->bi_vcnt > 0 &&
>  	    bvec_try_merge_page(&bio->bi_io_vec[bio->bi_vcnt - 1],
> -				page, len, offset, &same_page)) {
> +				&folio->page, len, offset, &same_page)) {

Let's make that folio_page(folio, 0)

> +static int bio_iov_add_zone_append_folio(struct bio *bio, struct folio *folio,
>  		unsigned int len, unsigned int offset)
>  {
>  	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
>  	bool same_page = false;
> 
> -	if (bio_add_hw_page(q, bio, page, len, offset,
> +	if (bio_add_hw_page(q, bio, &folio->page, len, offset,

Likewise.

> @@ -1301,15 +1301,49 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> 
>  	for (left = size, i = 0; left > 0; left -= len, i++) {
>  		struct page *page = pages[i];
> +		struct folio *folio = page_folio(page);
> +
> +		/* Calculate the offset of page in folio */
> +		folio_offset = (folio_page_idx(folio, page) << PAGE_SHIFT) +
> +				offset;

No it doesn't.  I'd drop the comment entirely, but if you must have a
comment, it turns the page offset into a folio offset.

> +		len = min_t(size_t, (folio_size(folio) - folio_offset), left);
> +
> +		num_pages = DIV_ROUND_UP(offset + len, PAGE_SIZE);
> +
> +		if (num_pages > 1) {
> +			ssize_t bytes = left;
> +			size_t contig_sz = min_t(size_t, PAGE_SIZE - offset,
> +						 bytes);
> +
> +			/*
> +			 * Check if pages are contiguous and belong to the
> +			 * same folio.
> +			 */

You do like writing obvious comments, don't you?

> +			for (j = i + 1; j < i + num_pages; j++) {
> +				size_t next = min_t(size_t, PAGE_SIZE, bytes);
> +
> +				if (page_folio(pages[j]) != folio ||
> +				    pages[j] != pages[j - 1] + 1) {
> +					break;
> +				}
> +				contig_sz += next;
> +				bytes -= next;
> +			}
> +			num_pages = j - i;
> +			len = contig_sz;
> +		}
Subject: Re: [PATCH v4 1/2] block: add folio awareness instead of looping through pages
From: Christoph Hellwig
Date: 2024-06-06  4:54 UTC
To: Matthew Wilcox
Cc: Kundan Kumar, axboe, hch, kbusch, linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev

On Wed, Jun 05, 2024 at 07:45:23PM +0100, Matthew Wilcox wrote:
> >  	if (bio->bi_vcnt > 0 &&
> >  	    bvec_try_merge_page(&bio->bi_io_vec[bio->bi_vcnt - 1],
> > -				page, len, offset, &same_page)) {
> > +				&folio->page, len, offset, &same_page)) {
> 
> Let's make that folio_page(folio, 0)

Or just pass a folio to bvec_try_merge_page (and rename it) if we want
to go all the way.  But given that this would reach quite a bit down
into xen and zone-device code, maybe it's not worth it for now, since
we should eventually convert it to a plain phys_addr_t anyway.

> >  	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> >  	bool same_page = false;
> > 
> > -	if (bio_add_hw_page(q, bio, page, len, offset,
> > +	if (bio_add_hw_page(q, bio, &folio->page, len, offset,
> 
> Likewise.

OTOH, replacing / augmenting bio_add_hw_page with a bio_add_hw_folio
should be worthwhile, so I'd love to see that as a prep patch.
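For illustration, a minimal sketch of such a bio_add_hw_folio() prep
helper (hypothetical at this point in the thread), assuming it can
simply defer to the existing bio_add_hw_page():

static int bio_add_hw_folio(struct request_queue *q, struct bio *bio,
		struct folio *folio, unsigned int len, unsigned int offset,
		unsigned int max_sectors, bool *same_page)
{
	/* offset is relative to the start (head page) of the folio */
	return bio_add_hw_page(q, bio, folio_page(folio, 0), len, offset,
			       max_sectors, same_page);
}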
Subject: Re: [PATCH v4 1/2] block: add folio awareness instead of looping through pages
From: Matthew Wilcox
Date: 2024-06-05 21:23 UTC
To: Kundan Kumar
Cc: axboe, hch, kbusch, linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev

On Wed, Jun 05, 2024 at 02:54:54PM +0530, Kundan Kumar wrote:
> +			/*
> +			 * Check if pages are contiguous and belong to the
> +			 * same folio.
> +			 */

Oh, a useful comment here would be *why* we do this, not *what* we do.

			/*
			 * We might COW a single page in the middle of
			 * a large folio, so we have to check that all
			 * pages belong to the same folio.
			 */

for example.

> +			for (j = i + 1; j < i + num_pages; j++) {
> +				size_t next = min_t(size_t, PAGE_SIZE, bytes);
> +
> +				if (page_folio(pages[j]) != folio ||
> +				    pages[j] != pages[j - 1] + 1) {
> +					break;
> +				}
> +				contig_sz += next;
> +				bytes -= next;
> +			}
Subject: Re: [PATCH v4 1/2] block: add folio awareness instead of looping through pages
From: Christoph Hellwig
Date: 2024-06-06  4:56 UTC
To: Kundan Kumar
Cc: axboe, hch, willy, kbusch, linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev

> @@ -1301,15 +1301,49 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> 
>  	for (left = size, i = 0; left > 0; left -= len, i++) {
>  		struct page *page = pages[i];
> +		struct folio *folio = page_folio(page);
> +
> +		/* Calculate the offset of page in folio */
> +		folio_offset = (folio_page_idx(folio, page) << PAGE_SHIFT) +
> +				offset;
> +
> +		len = min_t(size_t, (folio_size(folio) - folio_offset), left);
> +
> +		num_pages = DIV_ROUND_UP(offset + len, PAGE_SIZE);
> +
> +		if (num_pages > 1) {

I still hate having this logic in the block layer.  Even if it is just
a dumb wrapper with the same logic as here, I'd much prefer an
iov_iter_extract_folios, which can then shift down into a
pin_user_folios_fast and into the MM slowly, rather than adding this
logic to the caller.
Subject: Re: [PATCH v4 1/2] block: add folio awareness instead of looping through pages
From: Kundan Kumar
Date: 2024-06-10  9:26 UTC
To: Christoph Hellwig
Cc: axboe, willy, kbusch, linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev

On 06/06/24 06:56AM, Christoph Hellwig wrote:
> > @@ -1301,15 +1301,49 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> > 
> >  	for (left = size, i = 0; left > 0; left -= len, i++) {
> >  		struct page *page = pages[i];
> > +		struct folio *folio = page_folio(page);
> > +
> > +		/* Calculate the offset of page in folio */
> > +		folio_offset = (folio_page_idx(folio, page) << PAGE_SHIFT) +
> > +				offset;
> > +
> > +		len = min_t(size_t, (folio_size(folio) - folio_offset), left);
> > +
> > +		num_pages = DIV_ROUND_UP(offset + len, PAGE_SIZE);
> > +
> > +		if (num_pages > 1) {
> 
> I still hate having this logic in the block layer.  Even if it is just
> a dumb wrapper with the same logic as here, I'd much prefer an
> iov_iter_extract_folios, which can then shift down into a
> pin_user_folios_fast and into the MM slowly, rather than adding this
> logic to the caller.

iov_iter_extract_folios would require allocating a folio_vec array,
which would then be filled while processing the pages[] array. That
introduces an extra allocation cost in the hot path.

--
Kundan
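For context, the interface being debated might look roughly like the
following. This is purely illustrative: neither iov_iter_extract_folios()
nor struct folio_vec exists at this point, and both the names and the
signature are assumptions modeled on the existing
iov_iter_extract_pages():

/*
 * Hypothetical folio-range descriptor the helper would fill in;
 * the array of these is the allocation whose cost is the concern
 * raised above.
 */
struct folio_vec {
	struct folio	*fv_folio;
	size_t		fv_offset;
	size_t		fv_len;
};

/* Hypothetical folio counterpart to iov_iter_extract_pages() */
ssize_t iov_iter_extract_folios(struct iov_iter *i, struct folio_vec **fvecs,
				size_t maxsize, unsigned int maxfolios,
				iov_iter_extraction_t extraction_flags,
				size_t *offset0);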
Subject: Re: [PATCH v4 1/2] block: add folio awareness instead of looping through pages
From: kernel test robot
Date: 2024-06-11 13:50 UTC
To: Kundan Kumar
Cc: oe-lkp, lkp, linux-block, ltp, axboe, hch, willy, kbusch, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev, Kundan Kumar, oliver.sang

Hello,

kernel test robot noticed "WARNING:at_lib/iov_iter.c:#iov_iter_revert" on:

commit: d245fbe9b4ca466a625094e538ce363e4dbfde94 ("[PATCH v4 1/2] block: add folio awareness instead of looping through pages")
url: https://github.com/intel-lab-lkp/linux/commits/Kundan-Kumar/block-add-folio-awareness-instead-of-looping-through-pages/20240605-182718
base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/all/20240605092455.20435-2-kundan.kumar@samsung.com/
patch subject: [PATCH v4 1/2] block: add folio awareness instead of looping through pages

in testcase: ltp
version: ltp-x86_64-14c1f76-1_20240608
with following parameters:
	test: lvm.local-00

compiler: gcc-13
test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (Kaby Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)

If you fix the issue in a separate patch/commit (i.e. not just a new
version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202406111633.69b1950e-lkp@intel.com

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240611/202406111633.69b1950e-lkp@intel.com

[  130.031598][ T3261] ------------[ cut here ]------------
[  130.037219][ T3261] WARNING: CPU: 3 PID: 3261 at lib/iov_iter.c:552 iov_iter_revert+0x1e7/0x250
[  130.045995][ T3261] Modules linked in: vfat fat xfs ext2 btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal sd_mod intel_powerclamp t10_pi coretemp crc64_rocksoft_generic crc64_rocksoft crc64 kvm_intel sg ipmi_devintf ipmi_msghandler i915 kvm crct10dif_pclmul drm_buddy crc32_pclmul intel_gtt crc32c_intel ghash_clmulni_intel drm_display_helper sha512_ssse3 mei_wdt ttm rapl drm_kms_helper wmi_bmof ahci mei_me libahci video intel_cstate i2c_designware_platform i2c_designware_core intel_uncore idma64 i2c_i801 libata mei i2c_smbus pinctrl_sunrisepoint wmi acpi_pad binfmt_misc fuse loop drm dm_mod ip_tables
[  130.104614][ T3261] CPU: 3 PID: 3261 Comm: pvcreate Tainted: G S 6.10.0-rc1-00017-gd245fbe9b4ca #1
[  130.114995][ T3261] Hardware name: Dell Inc. OptiPlex 7050/062KRH, BIOS 1.2.0 12/22/2016
[  130.123127][ T3261] RIP: 0010:iov_iter_revert+0x1e7/0x250
[  130.128576][ T3261] Code: 48 89 f8 48 c1 e8 03 42 0f b6 04 30 84 c0 74 04 3c 03 7e 31 8b 45 08 49 83 c4 01 4d 89 65 20 48 39 d8 72 d1 49 89 6d 10 eb 89 <0f> 0b eb 8c 4c 89 ef e8 ed 5b 6a ff e9 74 fe ff ff e8 73 5c 6a ff
[  130.148044][ T3261] RSP: 0018:ffffc90000f4f8e8 EFLAGS: 00010286
[  130.154025][ T3261] RAX: 0000000000000000 RBX: fffffffffffff200 RCX: 1ffff1103df41cfd
[  130.161898][ T3261] RDX: 1ffff1103df41d04 RSI: fffffffffffff200 RDI: ffffc90000f4fbf0
[  130.169767][ T3261] RBP: 0000000000000002 R08: 0000000000099000 R09: 0000000000001200
[  130.177636][ T3261] R10: 0000000000000002 R11: ffffea000f6a0000 R12: dffffc0000000000
[  130.185502][ T3261] R13: ffff8881efa0e7c0 R14: ffffc90000f4fbf0 R15: fffffffffffff200
[  130.193371][ T3261] FS:  00007fc2df84fd40(0000) GS:ffff88876c180000(0000) knlGS:0000000000000000
[  130.202190][ T3261] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  130.208672][ T3261] CR2: 000055bf19c62500 CR3: 0000000807f5c001 CR4: 00000000003706f0
[  130.216537][ T3261] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  130.224413][ T3261] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  130.232292][ T3261] Call Trace:
[  130.235486][ T3261]  <TASK>
[  130.238335][ T3261]  ? __warn+0xcc/0x260
[  130.242311][ T3261]  ? iov_iter_revert+0x1e7/0x250
[  130.247138][ T3261]  ? report_bug+0x261/0x2c0
[  130.251500][ T3261]  ? handle_bug+0x6d/0x90
[  130.255683][ T3261]  ? exc_invalid_op+0x17/0x40
[  130.260210][ T3261]  ? asm_exc_invalid_op+0x1a/0x20
[  130.265085][ T3261]  ? iov_iter_revert+0x1e7/0x250
[  130.269887][ T3261]  __bio_iov_iter_get_pages+0x868/0xcb0
[  130.275282][ T3261]  ? bio_associate_blkg_from_css+0x2f5/0xbb0
[  130.281111][ T3261]  ? __pfx___bio_iov_iter_get_pages+0x10/0x10
[  130.287021][ T3261]  ? bio_init+0x34a/0x5c0
[  130.291199][ T3261]  ? bio_alloc_bioset+0x28f/0x950
[  130.296073][ T3261]  bio_iov_iter_get_pages+0xb6/0x470
[  130.301205][ T3261]  __blkdev_direct_IO_async+0x3be/0x7f0
[  130.306597][ T3261]  ? blkdev_direct_IO+0x1d7/0x310
[  130.311467][ T3261]  blkdev_write_iter+0x5cd/0xac0
[  130.316254][ T3261]  aio_write+0x344/0x730
[  130.320346][ T3261]  ? __pfx_aio_write+0x10/0x10
[  130.324961][ T3261]  ? _raw_spin_lock_irqsave+0x8b/0xf0
[  130.330182][ T3261]  ? io_submit_one+0x213/0x9a0
[  130.334794][ T3261]  io_submit_one+0x213/0x9a0
[  130.339231][ T3261]  ? __pfx_io_submit_one+0x10/0x10
[  130.344190][ T3261]  ? __pfx___might_resched+0x10/0x10
[  130.349319][ T3261]  __x64_sys_io_submit+0x167/0x380
[  130.354279][ T3261]  ? __pfx___x64_sys_io_submit+0x10/0x10
[  130.359762][ T3261]  ? fpregs_restore_userregs+0xe3/0x1f0
[  130.365153][ T3261]  do_syscall_64+0x5f/0x170
[  130.369506][ T3261]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  130.375243][ T3261] RIP: 0033:0x7fc2dfd41719
[  130.379508][ T3261] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b7 06 0d 00 f7 d8 64 89 01 48
[  130.398931][ T3261] RSP: 002b:00007ffc819ab548 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
[  130.407175][ T3261] RAX: ffffffffffffffda RBX: 00007fc2df84fb10 RCX: 00007fc2dfd41719
[  130.414988][ T3261] RDX: 00007ffc819ab5f0 RSI: 0000000000000001 RDI: 00007fc2e00cf000
[  130.422803][ T3261] RBP: 00007fc2e00cf000 R08: 00007ffc819ab490 R09: 0000000000000008
[  130.430616][ T3261] R10: 000055bf19955608 R11: 0000000000000246 R12: 0000000000000001
[  130.438425][ T3261] R13: 0000000000000002 R14: 00007ffc819ab5f0 R15: 00007fc2e010e020
[  130.446238][ T3261]  </TASK>
[  130.449114][ T3261] ---[ end trace 0000000000000000 ]---
[  130.455322][ T3261] ------------[ cut here ]------------
[  130.460954][ T3261] WARNING: CPU: 7 PID: 3261 at lib/iov_iter.c:552 iov_iter_revert+0x1e7/0x250

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Subject: [PATCH v4 2/2] block: unpin user pages belonging to a folio
From: Kundan Kumar
Date: 2024-06-05  9:24 UTC
To: axboe, hch, willy, kbusch
Cc: linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev, Kundan Kumar

Unpin the pages that belong to the same folio as soon as the folio is
added to the bio. This lets us release whole folios at I/O completion
rather than looping through their pages.

Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
---
 block/bio.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 7857b9ca5957..28418170a14a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1166,20 +1166,12 @@ void __bio_release_pages(struct bio *bio, bool mark_dirty)
 	struct folio_iter fi;
 
 	bio_for_each_folio_all(fi, bio) {
-		struct page *page;
-		size_t nr_pages;
-
 		if (mark_dirty) {
 			folio_lock(fi.folio);
 			folio_mark_dirty(fi.folio);
 			folio_unlock(fi.folio);
 		}
-		page = folio_page(fi.folio, fi.offset / PAGE_SIZE);
-		nr_pages = (fi.offset + fi.length - 1) / PAGE_SIZE -
-			   fi.offset / PAGE_SIZE + 1;
-		do {
-			bio_release_page(bio, page++);
-		} while (--nr_pages != 0);
+		bio_release_page(bio, &fi.folio->page);
 	}
 }
 EXPORT_SYMBOL_GPL(__bio_release_pages);
@@ -1342,6 +1334,9 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		} else
 			bio_iov_add_folio(bio, folio, len, folio_offset);
 
+		if (bio_flagged(bio, BIO_PAGE_PINNED) && num_pages > 1)
+			unpin_user_pages(pages + i, num_pages - 1);
+
 		/* Skip the pages which got added */
 		i = i + (num_pages - 1);
 
--
2.25.1
Subject: Re: [PATCH v4 2/2] block: unpin user pages belonging to a folio
From: Matthew Wilcox
Date: 2024-06-05 21:22 UTC
To: Kundan Kumar
Cc: axboe, hch, kbusch, linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev

On Wed, Jun 05, 2024 at 02:54:55PM +0530, Kundan Kumar wrote:
> -		page = folio_page(fi.folio, fi.offset / PAGE_SIZE);
> -		nr_pages = (fi.offset + fi.length - 1) / PAGE_SIZE -
> -			   fi.offset / PAGE_SIZE + 1;
> -		do {
> -			bio_release_page(bio, page++);
> -		} while (--nr_pages != 0);
> +		bio_release_page(bio, &fi.folio->page);
>  	}

Why can't we have ...

	bio_release_folio(bio, fi.folio, nr_pages);

which is implemented as:

static inline void bio_release_folio(struct bio *bio, struct folio *folio,
		unsigned long nr_pages)
{
	if (bio_flagged(bio, BIO_PAGE_PINNED))
		gup_put_folio(folio, nr_pages, FOLL_PIN);
}

Sure, we'd need to make gup_put_folio() unstatic, but this seems far
more sensible.
Subject: Re: [PATCH v4 2/2] block: unpin user pages belonging to a folio
From: Christoph Hellwig
Date: 2024-06-06  4:58 UTC
To: Matthew Wilcox
Cc: Kundan Kumar, axboe, hch, kbusch, linux-block, joshi.k, mcgrof, anuj20.g, nj.shetty, c.gameti, gost.dev

On Wed, Jun 05, 2024 at 10:22:11PM +0100, Matthew Wilcox wrote:
> Why can't we have ...
> 
> 	bio_release_folio(bio, fi.folio, nr_pages);
> 
> which is implemented as:
> 
> static inline void bio_release_folio(struct bio *bio, struct folio *folio,
> 		unsigned long nr_pages)
> {
> 	if (bio_flagged(bio, BIO_PAGE_PINNED))
> 		gup_put_folio(folio, nr_pages, FOLL_PIN);
> }
> 
> Sure, we'd need to make gup_put_folio() unstatic, but this seems far
> more sensible.

Yes.  Although maybe an unpin_user_folio wrapper that hides FOLL_PIN,
which we're trying to keep private, would be the slightly nicer
variant.
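A sketch of the wrapper Christoph suggests (hypothetical at this point
in the thread), assuming gup_put_folio() stays private to mm/gup.c and
only the wrapper is exported:

/* mm/gup.c */
void unpin_user_folio(struct folio *folio, unsigned long npages)
{
	/* drop npages FOLL_PIN references without exposing FOLL_PIN */
	gup_put_folio(folio, npages, FOLL_PIN);
}
EXPORT_SYMBOL_GPL(unpin_user_folio);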