From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Hans van Kranenburg <hans@knorrie.org>, linux-btrfs@vger.kernel.org
Subject: Re: Debugging abysmal write performance with 100% cpu kworker/u16:X+flush-btrfs-2
Date: Mon, 27 Jul 2020 19:09:44 +0800
Message-ID: <f3cba5f0-8cc4-a521-3bba-2c02ff6c93a2@gmx.com>
In-Reply-To: <2523ce77-31a3-ecec-f36d-8d74132eae02@knorrie.org>



On 2020/7/25 10:24 PM, Hans van Kranenburg wrote:
> Hi,
>
> I have a filesystem here that I'm filling up with data from elsewhere.
> Most of it is done by rsync, and part by send/receive. So, receiving
> data over the network, and then writing the files to disk. There can be
> a dozen of these processes running in parallel.
>
> Now, when doing so, the kworker/u16:X+flush-btrfs-2 process (with
> varying X) often is using nearly 100% cpu, while enormously slowing down
> disk writes. This shows as disk IO wait for the rsync and btrfs receive
> processes.

The name is in fact pretty strange.
It doesn't follow the btrfs workqueue naming scheme.

There are two types of kernel threads used by btrfs:
- kthread
  This includes:
  * btrfs-uuid
  * btrfs-cleaner
  * btrfs-transaction
  * btrfs-devrepl
  * btrfs-ino-cache-%llu (deprecated)
  * btrfs-balance

- btrfs_workqueue (a wrapper around the kernel workqueue)
  This mostly includes the following ones; some of them may have a
  companion workqueue with a "-high" suffix:
  * btrfs-worker
  * btrfs-delalloc
  * btrfs-flush_delalloc
  * btrfs-cache
  * btrfs-fixup
  * btrfs-endio
  * btrfs-endio-meta
  * btrfs-endio-meta-write
  * btrfs-endio-raid56
  * btrfs-rmw
  * btrfs-endio-write
  * btrfs-freespace-write
  * btrfs-delayed-meta
  * btrfs-readahead
  * btrfs-qgroup-rescan
  * btrfs-scrub
  * btrfs-scrubwrc
  * btrfs-scrubparity

  (No wonder Linus is not happy with so many work queues)

As you can see, none of them is named like "flush-btrfs".
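
If you want to double check the names on a live system, something like
the following can list them (just a rough sketch of mine: it assumes
kernel threads are the /proc entries with an empty cmdline, and the
"+flush-..." decoration only shows up in comm while the worker is
executing that work item):

----
#!/usr/bin/env python3
# List kernel threads whose name mentions btrfs or flush.
import os

for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        # Kernel threads have an empty cmdline; skip userspace.
        with open("/proc/%s/cmdline" % pid, "rb") as f:
            if f.read():
                continue
        with open("/proc/%s/comm" % pid) as f:
            comm = f.read().strip()
    except OSError:
        # The process exited while we were scanning.
        continue
    if "btrfs" in comm or "flush" in comm:
        print(pid, comm)
----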

Thus I guess it comes from some other part of the stack.

Also, the call trace shows that the kernel thread is only doing page
writeback, which calls back into the page writeback hooks of btrfs.

So I guess it may not be btrfs itself, but something else trying to do
all the writeback.

But still, the CPU usage is a problem; it shouldn't cost this much CPU
time just to write back pages from btrfs.
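
If you want a quick number for that, here is a rough bcc sketch
(assuming the bcc python bindings are available; "extent_writepages" is
just picked from your stack trace below, adjust to whatever shows up
hottest): it sums the wall time each thread spends inside btrfs'
->writepages hook. If that number stays small while the kworker burns
100% CPU, the cycles are going to the generic writeback code above it.

----
#!/usr/bin/env python3
# Rough sketch: time spent inside extent_writepages per thread.
# Needs root and the bcc python bindings.
from time import sleep
from bcc import BPF

prog = r"""
BPF_HASH(start, u32, u64);     // tid -> entry timestamp
BPF_HASH(total_ns, u32, u64);  // tid -> accumulated time

int on_entry(struct pt_regs *ctx) {
    u32 tid = (u32)bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);
    return 0;
}

int on_return(struct pt_regs *ctx) {
    u32 tid = (u32)bpf_get_current_pid_tgid();
    u64 *ts = start.lookup(&tid);
    if (!ts)
        return 0;
    u64 delta = bpf_ktime_get_ns() - *ts;
    start.delete(&tid);
    total_ns.increment(tid, delta);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="extent_writepages", fn_name="on_entry")
b.attach_kretprobe(event="extent_writepages", fn_name="on_return")

sleep(10)
for tid, ns in b["total_ns"].items():
    print("tid %d: %.1f ms inside extent_writepages over 10s"
          % (tid.value, ns.value / 1e6))
----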


>
> The underlying storage (iSCSI connected over 10Gb/s network) can easily
> eat a few hundred MiB/s. When looking at actual disk busyness on the
> storage device, percentages <5% utilization are reported for the actual
> disks.
>
> It's clearly kworker/u16:X+flush-btrfs-2 which is the bottleneck here.
>
> I just did a 'perf record -g -a sleep 60' while disk writes were down to
> under 1MiB (!) per second and then 'perf report'. Attached some 'screen
> shot' of it. Also attached an example of what nmon shows to give an idea
> about the situation.
>
> If the kworker/u16:X+flush-btrfs-2 cpu usage decreases, I immediately
> see network and disk write speed ramping up, easily over 200 MiB/s,
> until it soon plummets again.
>
> I see the same behavior with a recent 4.19 kernel and with 5.7.6 (which
> is booted now).
>
> So, what I'm looking for is:
> * Does anyone else see this happen when doing a lot of concurrent writes?
> * What does this flush thing do?
> * Why is it using 100% cpu all the time?
> * How can I debug this more?

bcc-based runtime probes, I guess?

Since it's almost a busy loop burning CPU, a regular sleep-based lockup
detector won't help much.

You can try trace events first, to see which trace event gets executed
most frequently, then add probe points to pin down the real cause.
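
E.g. a rough sketch like this (assuming tracefs is mounted at
/sys/kernel/debug/tracing and you run it as root) enables the whole
btrfs event group for ~10 seconds and tallies which events fire most:

----
#!/usr/bin/env python3
# Rough sketch: count btrfs trace events by parsing trace_pipe.
import collections
import time

TRACEFS = "/sys/kernel/debug/tracing"

def set_btrfs_events(val):
    with open(TRACEFS + "/events/btrfs/enable", "w") as f:
        f.write(val)

set_btrfs_events("1")
counts = collections.Counter()
deadline = time.time() + 10
try:
    with open(TRACEFS + "/trace_pipe") as pipe:
        # Lines look like:
        #   <task>-<pid> [cpu] flags <timestamp>: <event>: <details>
        # Reads block, so the deadline is only checked per line; fine
        # here since events should be flowing constantly anyway.
        for line in pipe:
            parts = line.split(": ", 2)
            if len(parts) >= 2:
                counts[parts[1].strip().rstrip(":")] += 1
            if time.time() > deadline:
                break
finally:
    set_btrfs_events("0")

for name, n in counts.most_common(10):
    print("%10d  %s" % (n, name))
----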

But personally speaking, it's better to shrink the workload: find a
minimal workload that still reproduces the 100% CPU burn, so that you
need fewer probes and less time to pin down the problem.

> * Ultimately, of course... how can we improve this?
>
> I can recompile the kernel image to e.g. put more trace points in, in
> different places.
>
> I just have no idea where to start.
>
> Thanks,
> Hans
>
> P.S. /proc/<pid>/stack for the kworker/u16:X+flush-btrfs-2 mostly shows
> nothing at all, and sometimes when it does show some output, it mostly
> looks like this:
>
> ----
> [<0>] rq_qos_wait+0xfa/0x170
> [<0>] wbt_wait+0x98/0xe0
> [<0>] __rq_qos_throttle+0x23/0x30
> [<0>] blk_mq_make_request+0x12a/0x5d0
> [<0>] generic_make_request+0xcf/0x310
> [<0>] submit_bio+0x42/0x1c0
> [<0>] btrfs_map_bio+0x1c0/0x380 [btrfs]
> [<0>] btrfs_submit_bio_hook+0x8c/0x180 [btrfs]
> [<0>] submit_one_bio+0x31/0x50 [btrfs]
> [<0>] submit_extent_page+0x102/0x210 [btrfs]
> [<0>] __extent_writepage_io+0x1cf/0x380 [btrfs]
> [<0>] __extent_writepage+0x101/0x300 [btrfs]
> [<0>] extent_write_cache_pages+0x2bb/0x440 [btrfs]
> [<0>] extent_writepages+0x44/0x90 [btrfs]
> [<0>] do_writepages+0x41/0xd0

Yeah, it's just trying to write dirty pages for btrfs.

I don't really believe it's btrfs causing the 100% CPU burn.

Maybe you could try running btrfs on bare storage, without iSCSI, just
to verify whether btrfs is to blame?

Thanks,
Qu

> [<0>] __writeback_single_inode+0x3d/0x340
> [<0>] writeback_sb_inodes+0x1e5/0x480
> [<0>] __writeback_inodes_wb+0x5d/0xb0
> [<0>] wb_writeback+0x25f/0x2f0
> [<0>] wb_workfn+0x2fe/0x3f0
> [<0>] process_one_work+0x1ad/0x370
> [<0>] worker_thread+0x30/0x390
> [<0>] kthread+0x112/0x130
> [<0>] ret_from_fork+0x1f/0x40
>
> ----
> [<0>] rq_qos_wait+0xfa/0x170
> [<0>] wbt_wait+0x98/0xe0
> [<0>] __rq_qos_throttle+0x23/0x30
> [<0>] blk_mq_make_request+0x12a/0x5d0
> [<0>] generic_make_request+0xcf/0x310
> [<0>] submit_bio+0x42/0x1c0
> [<0>] btrfs_map_bio+0x1c0/0x380 [btrfs]
> [<0>] btrfs_submit_bio_hook+0x8c/0x180 [btrfs]
> [<0>] submit_one_bio+0x31/0x50 [btrfs]
> [<0>] extent_writepages+0x5d/0x90 [btrfs]
> [<0>] do_writepages+0x41/0xd0
> [<0>] __writeback_single_inode+0x3d/0x340
> [<0>] writeback_sb_inodes+0x1e5/0x480
> [<0>] __writeback_inodes_wb+0x5d/0xb0
> [<0>] wb_writeback+0x25f/0x2f0
> [<0>] wb_workfn+0x2fe/0x3f0
> [<0>] process_one_work+0x1ad/0x370
> [<0>] worker_thread+0x30/0x390
> [<0>] kthread+0x112/0x130
> [<0>] ret_from_fork+0x1f/0x40
>
> ----
> [<0>] rq_qos_wait+0xfa/0x170
> [<0>] wbt_wait+0x98/0xe0
> [<0>] __rq_qos_throttle+0x23/0x30
> [<0>] blk_mq_make_request+0x12a/0x5d0
> [<0>] generic_make_request+0xcf/0x310
> [<0>] submit_bio+0x42/0x1c0
> [<0>] btrfs_map_bio+0x1c0/0x380 [btrfs]
> [<0>] btrfs_submit_bio_hook+0x8c/0x180 [btrfs]
> [<0>] submit_one_bio+0x31/0x50 [btrfs]
> [<0>] submit_extent_page+0x102/0x210 [btrfs]
> [<0>] __extent_writepage_io+0x1cf/0x380 [btrfs]
> [<0>] __extent_writepage+0x101/0x300 [btrfs]
> [<0>] extent_write_cache_pages+0x2bb/0x440 [btrfs]
> [<0>] extent_writepages+0x44/0x90 [btrfs]
> [<0>] do_writepages+0x41/0xd0
> [<0>] __writeback_single_inode+0x3d/0x340
> [<0>] writeback_sb_inodes+0x1e5/0x480
> [<0>] __writeback_inodes_wb+0x5d/0xb0
> [<0>] wb_writeback+0x25f/0x2f0
> [<0>] wb_workfn+0x2fe/0x3f0
> [<0>] process_one_work+0x1ad/0x370
> [<0>] worker_thread+0x30/0x390
> [<0>] kthread+0x112/0x130
> [<0>] ret_from_fork+0x1f/0x40
>

