linux-xfs.vger.kernel.org archive mirror
From: Wang Yugui <wangyugui@e16-tech.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: Creating large folios in iomap buffered write path
Date: Sat, 20 May 2023 21:35:32 +0800	[thread overview]
Message-ID: <20230520213531.38CB.409509F4@e16-tech.com> (raw)
In-Reply-To: <ZGeX9Oc5vTkrceLZ@casper.infradead.org>

Hi,

> On Fri, May 19, 2023 at 10:55:29AM +0800, Wang Yugui wrote:
> > Hi,
> > 
> > > On Thu, May 18, 2023 at 10:46:43PM +0100, Matthew Wilcox wrote:
> > > > -struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos)
> > > > +struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len)
> > > >  {
> > > >  	unsigned fgp = FGP_WRITEBEGIN | FGP_NOFS;
> > > > +	struct folio *folio;
> > > >  
> > > >  	if (iter->flags & IOMAP_NOWAIT)
> > > >  		fgp |= FGP_NOWAIT;
> > > > +	fgp |= fgp_order(len);
> > > >  
> > > > -	return __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
> > > > +	folio = __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
> > > >  			fgp, mapping_gfp_mask(iter->inode->i_mapping));
> > > > +	if (!IS_ERR(folio) && folio_test_large(folio))
> > > > +		printk("index:%lu len:%zu order:%u\n", (unsigned long)(pos / PAGE_SIZE), len, folio_order(folio));
> > > > +	return folio;
> > > >  }
> > > 
> > > Forgot to take the debugging out.  This should read:
> > > 
> > > -struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos)
> > > +struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len)
> > >  {
> > >  	unsigned fgp = FGP_WRITEBEGIN | FGP_NOFS;
> > >  	if (iter->flags & IOMAP_NOWAIT)
> > >  		fgp |= FGP_NOWAIT;
> > > +	fgp |= fgp_order(len);
> > >  
> > >  	return __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
> > >  			fgp, mapping_gfp_mask(iter->inode->i_mapping));
> > >  }
> > 
> > I tested it (attached file) on 6.4.0-rc2.
> > fio -name write-bandwidth -rw=write -bs=1024Ki -size=32Gi -runtime=30 -iodepth 1 -ioengine sync -zero_buffers=1 -direct=0 -end_fsync=1 -numjobs=4 -directory=/mnt/test
> > 
> > fio  WRITE: bw=2430MiB/s.
> > expected value: > 6000MiB/s
> > so there is still no fio write bandwidth improvement.
> 
> That's basically unchanged.  There's no per-page or per-block work being
> done in start/end writeback, so if Dave's investigation is applicable
> to your situation, I'd expect to see an improvement.
> 
> Maybe try the second version of the patch I sent with the debug in,
> to confirm you really are seeing large folios being created (you might
> want to use printk_ratelimit() instead of printk to ensure it doesn't
> overwhelm your system)?  That fio command you were using ought to create
> them, but there's always a chance it doesn't.
> 
> Perhaps you could use perf (the command Dave used) to see where the time
> is being spent.

Test results with the second version of the patch (debug printk included):

# dmesg |grep 'index\|suppressed'
[   89.376149] index:0 len:292 order:2
[   97.862938] index:0 len:4096 order:2
[   98.340665] index:0 len:4096 order:2
[   98.346633] index:0 len:4096 order:2
[   98.352323] index:0 len:4096 order:2
[   98.359952] index:0 len:4096 order:2
[   98.364015] index:3 len:4096 order:2
[   98.368943] index:0 len:4096 order:2
[   98.374285] index:0 len:4096 order:2
[   98.379163] index:3 len:4096 order:2
[   98.384760] index:0 len:4096 order:2
[  181.103751] iomap_get_folio: 342 callbacks suppressed
[  181.103761] index:0 len:292 order:2


'perf report -g' result:
Samples: 344K of event 'cycles', Event count (approx.): 103747884662
  Children      Self  Command          Shared Object            Symbol
+   58.73%     0.01%  fio              [kernel.kallsyms]        [k] entry_SYSCALL_64_after_hwframe
+   58.72%     0.01%  fio              [kernel.kallsyms]        [k] do_syscall_64
+   58.53%     0.00%  fio              libpthread-2.17.so       [.] 0x00007f83e400e6fd
+   58.47%     0.01%  fio              [kernel.kallsyms]        [k] ksys_write
+   58.45%     0.02%  fio              [kernel.kallsyms]        [k] vfs_write
+   58.41%     0.03%  fio              [kernel.kallsyms]        [k] xfs_file_buffered_write
+   57.96%     0.57%  fio              [kernel.kallsyms]        [k] iomap_file_buffered_write
+   27.57%     1.29%  fio              [kernel.kallsyms]        [k] iomap_write_begin
+   25.32%     0.43%  fio              [kernel.kallsyms]        [k] iomap_get_folio
+   24.84%     0.70%  fio              [kernel.kallsyms]        [k] __filemap_get_folio.part.69
+   20.11%     0.00%  kworker/u98:15-  [kernel.kallsyms]        [k] ret_from_fork
+   20.11%     0.00%  kworker/u98:15-  [kernel.kallsyms]        [k] kthread
+   20.11%     0.00%  kworker/u98:15-  [kernel.kallsyms]        [k] worker_thread
+   20.11%     0.00%  kworker/u98:15-  [kernel.kallsyms]        [k] process_one_work
+   20.11%     0.00%  kworker/u98:15-  [kernel.kallsyms]        [k] wb_workfn
+   20.11%     0.00%  kworker/u98:15-  [kernel.kallsyms]        [k] wb_writeback
+   20.11%     0.00%  kworker/u98:15-  [kernel.kallsyms]        [k] __writeback_inodes_wb
+   20.11%     0.00%  kworker/u98:15-  [kernel.kallsyms]        [k] writeback_sb_inodes
+   20.11%     0.00%  kworker/u98:15-  [kernel.kallsyms]        [k] __writeback_single_inode
+   20.11%     0.00%  kworker/u98:15-  [kernel.kallsyms]        [k] do_writepages
+   20.11%     0.00%  kworker/u98:15-  [kernel.kallsyms]        [k] xfs_vm_writepages
+   20.10%     0.00%  kworker/u98:15-  [kernel.kallsyms]        [k] iomap_writepages
+   20.10%     0.69%  kworker/u98:15-  [kernel.kallsyms]        [k] write_cache_pages
+   16.95%     0.39%  fio              [kernel.kallsyms]        [k] copy_page_from_iter_atomic
+   16.53%     0.10%  fio              [kernel.kallsyms]        [k] copyin


'perf report' result:

Samples: 335K of event 'cycles', Event count (approx.): 108508755052
Overhead  Command          Shared Object        Symbol
  17.70%  fio              [kernel.kallsyms]    [k] rep_movs_alternative
   2.89%  kworker/u98:2-e  [kernel.kallsyms]    [k] native_queued_spin_lock_slowpath
   2.88%  fio              [kernel.kallsyms]    [k] get_page_from_freelist
   2.67%  fio              [kernel.kallsyms]    [k] native_queued_spin_lock_slowpath
   2.59%  fio              [kernel.kallsyms]    [k] asm_exc_nmi
   2.25%  swapper          [kernel.kallsyms]    [k] intel_idle
   1.59%  kworker/u98:2-e  [kernel.kallsyms]    [k] __folio_start_writeback
   1.52%  fio              [kernel.kallsyms]    [k] xas_load
   1.45%  fio              [kernel.kallsyms]    [k] lru_add_fn
   1.44%  fio              [kernel.kallsyms]    [k] xas_descend
   1.32%  fio              [kernel.kallsyms]    [k] iomap_write_begin
   1.29%  fio              [kernel.kallsyms]    [k] __filemap_add_folio
   1.08%  kworker/u98:2-e  [kernel.kallsyms]    [k] folio_clear_dirty_for_io
   1.07%  fio              [kernel.kallsyms]    [k] __folio_mark_dirty
   0.94%  fio              [kernel.kallsyms]    [k] iomap_set_range_uptodate.part.24
   0.93%  fio              [kernel.kallsyms]    [k] node_dirty_ok
   0.92%  kworker/u98:2-e  [kernel.kallsyms]    [k] _raw_spin_lock_irqsave
   0.83%  fio              [kernel.kallsyms]    [k] xas_start
   0.83%  fio              [kernel.kallsyms]    [k] __alloc_pages
   0.83%  fio              [kernel.kallsyms]    [k] _raw_spin_lock_irqsave
   0.81%  kworker/u98:2-e  [kernel.kallsyms]    [k] asm_exc_nmi
   0.79%  fio              [kernel.kallsyms]    [k] percpu_counter_add_batch
   0.75%  kworker/u98:2-e  [kernel.kallsyms]    [k] iomap_writepage_map
   0.74%  kworker/u98:2-e  [kernel.kallsyms]    [k] __mod_lruvec_page_state
   0.70%  fio              fio                  [.] 0x000000000001b1ac
   0.70%  fio              [kernel.kallsyms]    [k] filemap_dirty_folio
   0.69%  kworker/u98:2-e  [kernel.kallsyms]    [k] write_cache_pages
   0.69%  fio              [kernel.kallsyms]    [k] __filemap_get_folio.part.69
   0.67%  kworker/1:0-eve  [kernel.kallsyms]    [k] native_queued_spin_lock_slowpath
   0.66%  fio              [kernel.kallsyms]    [k] __mod_lruvec_page_state
   0.64%  fio              [kernel.kallsyms]    [k] __mod_node_page_state
   0.64%  fio              [kernel.kallsyms]    [k] folio_add_lru
   0.64%  fio              [kernel.kallsyms]    [k] balance_dirty_pages_ratelimited_flags
   0.62%  fio              [kernel.kallsyms]    [k] __mod_memcg_lruvec_state
   0.61%  fio              [kernel.kallsyms]    [k] iomap_write_end
   0.60%  fio              [kernel.kallsyms]    [k] xas_find_conflict
   0.59%  fio              [kernel.kallsyms]    [k] bad_range
   0.58%  kworker/u98:2-e  [kernel.kallsyms]    [k] xas_load
   0.57%  fio              [kernel.kallsyms]    [k] iomap_file_buffered_write
   0.56%  kworker/u98:2-e  [kernel.kallsyms]    [k] percpu_counter_add_batch
   0.49%  fio              [kernel.kallsyms]    [k] __might_resched


'perf top -g -U' result:
Samples: 78K of event 'cycles', 4000 Hz, Event count (approx.): 29400815085 lost: 0/0 drop: 0/0
  Children      Self  Shared Object       Symbol
+   62.59%     0.03%  [kernel]            [k] entry_SYSCALL_64_after_hwframe
+   60.15%     0.02%  [kernel]            [k] do_syscall_64
+   59.45%     0.02%  [kernel]            [k] vfs_write
+   59.09%     0.54%  [kernel]            [k] iomap_file_buffered_write
+   57.41%     0.00%  [kernel]            [k] ksys_write
+   57.36%     0.01%  [kernel]            [k] xfs_file_buffered_write
+   37.82%     0.00%  libpthread-2.17.so  [.] 0x00007fce6f20e6fd
+   26.83%     1.20%  [kernel]            [k] iomap_write_begin
+   24.65%     0.45%  [kernel]            [k] iomap_get_folio
+   24.15%     0.74%  [kernel]            [k] __filemap_get_folio.part.69
+   20.17%     0.00%  [kernel]            [k] __writeback_single_inode
+   20.17%     0.65%  [kernel]            [k] write_cache_pages
+   17.66%     0.43%  [kernel]            [k] copy_page_from_iter_atomic
+   17.18%     0.12%  [kernel]            [k] copyin
+   17.08%    16.71%  [kernel]            [k] rep_movs_alternative
+   16.97%     0.00%  [kernel]            [k] ret_from_fork
+   16.97%     0.00%  [kernel]            [k] kthread
+   16.87%     0.00%  [kernel]            [k] worker_thread
+   16.84%     0.00%  [kernel]            [k] process_one_work
+   14.86%     0.17%  [kernel]            [k] filemap_add_folio
+   13.83%     0.77%  [kernel]            [k] iomap_writepage_map
+   11.90%     0.33%  [kernel]            [k] iomap_finish_ioend
+   11.57%     0.23%  [kernel]            [k] folio_end_writeback
+   11.51%     0.73%  [kernel]            [k] iomap_write_end
+   11.30%     2.14%  [kernel]            [k] __folio_end_writeback
+   10.70%     0.00%  [kernel]            [k] wb_workfn
+   10.70%     0.00%  [kernel]            [k] wb_writeback
+   10.70%     0.00%  [kernel]            [k] __writeback_inodes_wb
+   10.70%     0.00%  [kernel]            [k] writeback_sb_inodes
+   10.70%     0.00%  [kernel]            [k] do_writepages
+   10.70%     0.00%  [kernel]            [k] xfs_vm_writepages
+   10.70%     0.00%  [kernel]            [k] iomap_writepages
+   10.19%     2.68%  [kernel]            [k] _raw_spin_lock_irqsave
+   10.17%     1.35%  [kernel]            [k] __filemap_add_folio
+    9.94%     0.00%  [unknown]           [k] 0x0000000001942a70
+    9.94%     0.00%  [unknown]           [k] 0x0000000001942ac0
+    9.94%     0.00%  [unknown]           [k] 0x0000000001942b30

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/05/20




Thread overview (20+ messages):
2023-05-08  9:24 performance regression between 6.1.x and 5.15.x Wang Yugui
2023-05-08 14:46 ` Wang Yugui
2023-05-08 22:32   ` Dave Chinner
2023-05-08 23:25     ` Wang Yugui
2023-05-09  1:36       ` Dave Chinner
2023-05-09 12:37         ` Wang Yugui
2023-05-09 22:14           ` Dave Chinner
2023-05-10  5:46             ` Wang Yugui
2023-05-10  7:27               ` Dave Chinner
2023-05-10  8:50                 ` Wang Yugui
2023-05-11  1:34                   ` Dave Chinner
2023-05-17 13:07                     ` Wang Yugui
2023-05-17 22:11                       ` Dave Chinner
2023-05-18 18:36                       ` Creating large folios in iomap buffered write path Matthew Wilcox
2023-05-18 21:46                         ` Matthew Wilcox
2023-05-18 22:03                           ` Matthew Wilcox
2023-05-19  2:55                             ` Wang Yugui
2023-05-19 15:38                               ` Matthew Wilcox
2023-05-20 13:35                                 ` Wang Yugui [this message]
2023-05-20 16:35                                   ` Matthew Wilcox
