linux-btrfs.vger.kernel.org archive mirror
* question about the performance of 'btrfs send'
@ 2022-10-15 12:35 Wang Yugui
  2022-10-17 13:02 ` David Sterba
From: Wang Yugui @ 2022-10-15 12:35 UTC
  To: linux-btrfs

Hi,

I have a question about the performance of 'btrfs send'.

The output speed of 'btrfs send' is about 700MiB/s in all 3 of the
following cases:
1) kernel 5.15.73 + 'btrfs send --proto 1'
2) kernel 6.0.1 (with btrfs-devel misc-6.1) + 'btrfs send --proto 1'
3) kernel 6.0.1 (with btrfs-devel misc-6.1) + 'btrfs send --proto 2'
btrfs-progs: 6.0

The output of 'perf report':
    Overhead  Command  Shared Object      Symbol
*1    40.63%  btrfs    [kernel.kallsyms]  [k] __crc32c_le
*2     9.97%  btrfs    [kernel.kallsyms]  [k] memcpy_erms
*3     9.25%  btrfs    [kernel.kallsyms]  [k] send_extent_data
*4     5.40%  btrfs    [kernel.kallsyms]  [k] asm_exc_nmi
*5     2.73%  btrfs    [kernel.kallsyms]  [k] __alloc_pages
       1.14%  btrfs    [kernel.kallsyms]  [k] __rmqueue_pcplist
       0.92%  btrfs    [kernel.kallsyms]  [k] bad_range
       0.88%  btrfs    [kernel.kallsyms]  [k] get_page_from_freelist

What I expected:
I expected *1) __crc32c_le to take >60% of the time and the output
speed to be > 1GiB/s.
The *1) __crc32c_le is a necessary operation, and its speed seems OK:
2GB/s * 40% = 800MB/s, which is close to the observed 700MiB/s.
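
The estimate above can be written out explicitly. This is only a minimal sketch, assuming that crc32c on its own would sustain about 2GB/s (the figure used above, read as decimal gigabytes) and that the perf percentage approximates the share of run time spent checksumming. The function name is made up for illustration.

```python
MIB = 1024 * 1024

def estimated_throughput_mib(crc_rate_bytes_per_s, crc_fraction):
    """Overall stream rate in MiB/s.

    If checksumming processes crc_rate_bytes_per_s while it holds the CPU
    and it holds the CPU for crc_fraction of the run time, the overall
    rate is the product of the two.
    """
    return crc_rate_bytes_per_s * crc_fraction / MIB

CRC_RATE = 2e9  # assumed standalone crc32c speed in bytes/s

measured = estimated_throughput_mib(CRC_RATE, 0.4063)  # share seen in perf
hoped = estimated_throughput_mib(CRC_RATE, 0.60)       # share that was expected

print(f"crc32c at 40.63% of CPU time: {measured:.0f} MiB/s")
print(f"crc32c at 60% of CPU time:    {hoped:.0f} MiB/s")
```

Under these assumptions the first figure comes out near 775MiB/s, a little above the observed 700MiB/s, while a 60% share would put the stream past 1GiB/s.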

Question:
The 9.25% for *3) send_extent_data is difficult to understand. Any advice?

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/10/15


* Re: question about the performance of 'btrfs send'
  2022-10-15 12:35 question about the performance of 'btrfs send' Wang Yugui
@ 2022-10-17 13:02 ` David Sterba
From: David Sterba @ 2022-10-17 13:02 UTC
  To: Wang Yugui; +Cc: linux-btrfs

On Sat, Oct 15, 2022 at 08:35:01PM +0800, Wang Yugui wrote:
> Hi,
> 
> I have a question about the performance of 'btrfs send'.
> 
> The output speed of 'btrfs send' is about 700MiB/s in all 3 of the
> following cases:
> 1) kernel 5.15.73 + 'btrfs send --proto 1'
> 2) kernel 6.0.1 (with btrfs-devel misc-6.1) + 'btrfs send --proto 1'
> 3) kernel 6.0.1 (with btrfs-devel misc-6.1) + 'btrfs send --proto 2'
> btrfs-progs: 6.0
> 
> The output of 'perf report':
>     Overhead  Command  Shared Object      Symbol
> *1    40.63%  btrfs    [kernel.kallsyms]  [k] __crc32c_le
> *2     9.97%  btrfs    [kernel.kallsyms]  [k] memcpy_erms
> *3     9.25%  btrfs    [kernel.kallsyms]  [k] send_extent_data
> *4     5.40%  btrfs    [kernel.kallsyms]  [k] asm_exc_nmi
> *5     2.73%  btrfs    [kernel.kallsyms]  [k] __alloc_pages
>        1.14%  btrfs    [kernel.kallsyms]  [k] __rmqueue_pcplist
>        0.92%  btrfs    [kernel.kallsyms]  [k] bad_range
>        0.88%  btrfs    [kernel.kallsyms]  [k] get_page_from_freelist
> 
> What I expected:
> I expected *1) __crc32c_le to take >60% of the time and the output
> speed to be > 1GiB/s.
> The *1) __crc32c_le is a necessary operation, and its speed seems OK:
> 2GB/s * 40% = 800MB/s, which is close to the observed 700MiB/s.
> 
> Question:
> The 9.25% for *3) send_extent_data is difficult to understand. Any advice?

The perf report does not include IO, right? It only shows CPU time spent.
That the time is accounted only to send_extent_data would also mean there's
some function inlining involved, so the symbol does not point exactly to
where the time is spent. I'd say it's the main loop around send_write that
emits the commands and works with in-memory data.

What could be suboptimal is the call to get_cur_path in send_write, which
rebuilds the path each time even though it's for the same file.
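
To illustrate the kind of saving this hints at, here is a toy model in Python. It is not kernel code and does not mirror the real get_cur_path; all class and field names are hypothetical. It only compares walking the parent chain for every write command against reusing the last path while the inode stays the same. A real implementation would presumably also have to drop the cached path whenever the inode is renamed or moved.

```python
class PathBuilder:
    """Rebuilds a path by walking parent links on every call."""

    def __init__(self, parents):
        # parents maps inode number -> (parent inode number, name);
        # an inode missing from the map is treated as the root.
        self.parents = parents
        self.walks = 0  # how many full path rebuilds were performed

    def build(self, ino):
        self.walks += 1
        parts = []
        while ino in self.parents:
            parent, name = self.parents[ino]
            parts.append(name)
            ino = parent
        return "/".join(reversed(parts))


class CachedPathBuilder(PathBuilder):
    """Remembers the path of the most recently requested inode."""

    def __init__(self, parents):
        super().__init__(parents)
        self._last = None  # (ino, path) of the previous request

    def build(self, ino):
        if self._last is not None and self._last[0] == ino:
            return self._last[1]
        path = super().build(ino)
        self._last = (ino, path)
        return path


# Hypothetical layout: root (1) -> "dir" (2) -> "file.bin" (3)
layout = {2: (1, "dir"), 3: (2, "file.bin")}

plain = PathBuilder(layout)
cached = CachedPathBuilder(layout)
for _ in range(1000):  # 1000 write commands for the same file
    plain.build(3)
    cached.build(3)

print(plain.walks, cached.walks)
```

In this model the uncached builder walks the chain once per write command, while the cached one walks it once per file.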
