Slow snapshot deletion

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Slow snapshot deletion
@ 2011-07-28 20:04 Bruce Guenter
  2011-07-28 20:28 ` Chris Mason
  0 siblings, 1 reply; 12+ messages in thread
From: Bruce Guenter @ 2011-07-28 20:04 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1227 bytes --]

At work we have a backup server system running btrfs.  The main backup
pool is a 1.3TB partition (on LVM).  Every night, a series of backups
are done with rsync, with each backup onto a separate subvolume, and a
snapshot made of each backup.  Compression is used to minimize disk
space used.

When the system was first provisioned, all the operations ran
impressively fast, but now it is almost unusably slow.  It has taken 2
days to delete snapshots that recovered 30GB of space, and some of the
backups are taking longer than 24 hours to complete.

Is there any way to monitor the progress of the snapshot deletions?

What could be causing it to run so slow?

What could we do to speed things up?

Kernel is 2.6.39-gentoo-r3.  It was last rebooted 2 days ago, and
rebooting does not seem to improve the situation.  Mount options
according to /proc are "rw,nosuid,noatime,nodiratime,compress=zlib,noacl"
btrfs filesystem df shows:

Data: total=1.15TB, used=1.10TB
System, DUP: total=8.00MB, used=184.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=49.38GB, used=31.59GB
Metadata: total=8.00MB, used=0.00

-- 
Bruce Guenter <bruce@untroubled.org>                http://untroubled.org/

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Slow snapshot deletion
  2011-07-28 20:04 Slow snapshot deletion Bruce Guenter
@ 2011-07-28 20:28 ` Chris Mason
  2011-07-28 20:51   ` Chris Mason
  2011-08-01 22:26   ` Bruce Guenter
  0 siblings, 2 replies; 12+ messages in thread
From: Chris Mason @ 2011-07-28 20:28 UTC (permalink / raw)
  To: Bruce Guenter; +Cc: linux-btrfs

Excerpts from Bruce Guenter's message of 2011-07-28 16:04:45 -0400:
> 
> At work we have a backup server system running btrfs.  The main backup
> pool is a 1.3TB partition (on LVM).  Every night, a series of backups
> are done with rsync, with each backup onto a separate subvolume, and a
> snapshot made of each backup.  Compression is used to minimize disk
> space used.
> 
> When the system was first provisioned, all the operations ran
> impressively fast, but now it is almost unusably slow.  It has taken 2
> days to delete snapshots that recovered 30GB of space, and some of the
> backups are taking longer than 24 hours to complete.
> 
> Is there any way to monitor the progress of the snapshot deletions?
> 
> What could be causing it to run so slow?
> 
> What could we do to speed things up?
> 
> Kernel is 2.6.39-gentoo-r3.  It was last rebooted 2 days ago, and
> rebooting does not seem to improve the situation.  Mount options
> according to /proc are "rw,nosuid,noatime,nodiratime,compress=zlib,noacl"
> btrfs filesystem df shows:
> 
> Data: total=1.15TB, used=1.10TB
> System, DUP: total=8.00MB, used=184.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=49.38GB, used=31.59GB
> Metadata: total=8.00MB, used=0.00
> 

The slow performance is probably coming from reading in the metadata
associated with the snapshot extents.  The new readahead extentions from
Arne should help once we've adapted them to it.  The easiest way to make
sure is to hit sysrq-w a few times while it is slow (or run a patched
latencytop).

-chris

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Slow snapshot deletion
  2011-07-28 20:28 ` Chris Mason
@ 2011-07-28 20:51   ` Chris Mason
  2011-08-04 16:35     ` Bruce Guenter
  2011-08-01 22:26   ` Bruce Guenter
  1 sibling, 1 reply; 12+ messages in thread
From: Chris Mason @ 2011-07-28 20:51 UTC (permalink / raw)
  To: Chris Mason; +Cc: Bruce Guenter, linux-btrfs

Excerpts from Chris Mason's message of 2011-07-28 16:28:04 -0400:
> Excerpts from Bruce Guenter's message of 2011-07-28 16:04:45 -0400:
> > 
> > At work we have a backup server system running btrfs.  The main backup
> > pool is a 1.3TB partition (on LVM).  Every night, a series of backups
> > are done with rsync, with each backup onto a separate subvolume, and a
> > snapshot made of each backup.  Compression is used to minimize disk
> > space used.
> > 
> > When the system was first provisioned, all the operations ran
> > impressively fast, but now it is almost unusably slow.  It has taken 2
> > days to delete snapshots that recovered 30GB of space, and some of the
> > backups are taking longer than 24 hours to complete.
> > 
> > Is there any way to monitor the progress of the snapshot deletions?
> > 
> > What could be causing it to run so slow?
> > 
> > What could we do to speed things up?
> > 
> > Kernel is 2.6.39-gentoo-r3.  It was last rebooted 2 days ago, and
> > rebooting does not seem to improve the situation.  Mount options
> > according to /proc are "rw,nosuid,noatime,nodiratime,compress=zlib,noacl"
> > btrfs filesystem df shows:
> > 
> > Data: total=1.15TB, used=1.10TB
> > System, DUP: total=8.00MB, used=184.00KB
> > System: total=4.00MB, used=0.00
> > Metadata, DUP: total=49.38GB, used=31.59GB
> > Metadata: total=8.00MB, used=0.00
> > 
> 
> The slow performance is probably coming from reading in the metadata
> associated with the snapshot extents.  The new readahead extentions from
> Arne should help once we've adapted them to it.  The easiest way to make
> sure is to hit sysrq-w a few times while it is slow (or run a patched
> latencytop).

Sorry, hit send too soon.  Here is the latencytop patch, after you
recompile run latencytop -c for a few minutes.  Send the output here.

http://oss.oracle.com/~mason/latencytop.patch

-chris

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Slow snapshot deletion
  2011-07-28 20:28 ` Chris Mason
  2011-07-28 20:51   ` Chris Mason
@ 2011-08-01 22:26   ` Bruce Guenter
  2011-08-01 22:41     ` cwillu
  1 sibling, 1 reply; 12+ messages in thread
From: Bruce Guenter @ 2011-08-01 22:26 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 894 bytes --]

On Thu, Jul 28, 2011 at 04:28:04PM -0400, Chris Mason wrote:
> The slow performance is probably coming from reading in the metadata
> associated with the snapshot extents.  The new readahead extentions from
> Arne should help once we've adapted them to it.  The easiest way to make
> sure is to hit sysrq-w a few times while it is slow (or run a patched
> latencytop).

We shut down the rsync backups for now, and are waiting for the
snapshots to complete deletion so we can unmount the filesystem to
reboot (missing kernel options for latencytop).  In the mean time I hit
SysRq-W quite a number of times, and nothing is showing up in kernel
messages.  However, btrfs-cleaner and/or btrfs-transaction (sometimes
both, sometimes just one) have been running high in top, taking up all
the CPU time.

-- 
Bruce Guenter <bruce@untroubled.org>                http://untroubled.org/

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Slow snapshot deletion
  2011-08-01 22:26   ` Bruce Guenter
@ 2011-08-01 22:41     ` cwillu
  2011-08-01 22:59       ` Bruce Guenter
  0 siblings, 1 reply; 12+ messages in thread
From: cwillu @ 2011-08-01 22:41 UTC (permalink / raw)
  To: linux-btrfs

On Mon, Aug 1, 2011 at 4:26 PM, Bruce Guenter <bruce@untroubled.org> wr=
ote:
> On Thu, Jul 28, 2011 at 04:28:04PM -0400, Chris Mason wrote:
>> The slow performance is probably coming from reading in the metadata
>> associated with the snapshot extents. =C2=A0The new readahead extent=
ions from
>> Arne should help once we've adapted them to it. =C2=A0The easiest wa=
y to make
>> sure is to hit sysrq-w a few times while it is slow (or run a patche=
d
>> latencytop).
>
> We shut down the rsync backups for now, and are waiting for the
> snapshots to complete deletion so we can unmount the filesystem to
> reboot (missing kernel options for latencytop). =C2=A0In the mean tim=
e I hit
> SysRq-W quite a number of times, and nothing is showing up in kernel

The keystroke should be alt-sysrq-w.

> messages. =C2=A0However, btrfs-cleaner and/or btrfs-transaction (some=
times
> both, sometimes just one) have been running high in top, taking up al=
l
> the CPU time.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Slow snapshot deletion
  2011-08-01 22:41     ` cwillu
@ 2011-08-01 22:59       ` Bruce Guenter
  2011-08-02  0:29         ` Chris Mason
  0 siblings, 1 reply; 12+ messages in thread
From: Bruce Guenter @ 2011-08-01 22:59 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 409 bytes --]

On Mon, Aug 01, 2011 at 04:41:10PM -0600, cwillu wrote:
> The keystroke should be alt-sysrq-w.

Right.  What I meant is that no backtraces are showing up.  I was
actually using

	echo w > /proc/sysrq-trigger

and all I was seeing is:

SysRq : Show Blocked State
  task                        PC stack   pid father

-- 
Bruce Guenter <bruce@untroubled.org>                http://untroubled.org/

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Slow snapshot deletion
  2011-08-01 22:59       ` Bruce Guenter
@ 2011-08-02  0:29         ` Chris Mason
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Mason @ 2011-08-02  0:29 UTC (permalink / raw)
  To: Bruce Guenter; +Cc: linux-btrfs

Excerpts from Bruce Guenter's message of 2011-08-01 18:59:15 -0400:
> On Mon, Aug 01, 2011 at 04:41:10PM -0600, cwillu wrote:
> > The keystroke should be alt-sysrq-w.
> 
> Right.  What I meant is that no backtraces are showing up.  I was
> actually using
> 
>     echo w > /proc/sysrq-trigger
> 
> and all I was seeing is:
> 
> SysRq : Show Blocked State
>   task                        PC stack   pid father
> 

# once per boot, run dmesg -n8
dmesg -n8

echo w > /proc/sysrq-trigger

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Slow snapshot deletion
  2011-07-28 20:51   ` Chris Mason
@ 2011-08-04 16:35     ` Bruce Guenter
  2011-08-11 15:04       ` Bruce Guenter
  0 siblings, 1 reply; 12+ messages in thread
From: Bruce Guenter @ 2011-08-04 16:35 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 4176 bytes --]

On Thu, Jul 28, 2011 at 04:51:24PM -0400, Chris Mason wrote:
> Sorry, hit send too soon.  Here is the latencytop patch, after you
> recompile run latencytop -c for a few minutes.  Send the output here.
> 
> http://oss.oracle.com/~mason/latencytop.patch

Ok, I ran it for more than a few minutes.  Early on (shortly after boot), the output shows this for a long time:

=============== Mon Aug  1 21:09:30 2011
Globals: Cause Maximum Percentage
Process details:
Process ksoftirqd/0 (3) Total:   3.9 msec
        [run_ksoftirqd]   3.9 msec        100.0 %
                run_ksoftirqd kthread kernel_thread_helper 
Process kworker/1:1 (983) Total:  31.2 msec
        .         3.9 msec        100.0 %
                worker_thread kthread kernel_thread_helper 
Process sshd (3948) Total:   3.9 msec
        Waiting for event (select)        3.9 msec        100.0 %
                poll_schedule_timeout do_select core_sys_select sys_select 
                system_call_fastpath 
Process btrfs-cleaner (4345) Total: 2433.6 msec
        [sleep_on_page]  31.3 msec        100.0 %
                sleep_on_page wait_on_page_bit read_extent_buffer_pages 
                btree_read_extent_buffer_pages read_tree_block 
                read_block_for_search btrfs_search_slot 
                lookup_inline_extent_backref __btrfs_free_extent 
                run_clustered_refs btrfs_run_delayed_refs __btrfs_end_transaction 
Process btrfs-transacti (4346) Total: 2437.5 msec
        [sleep_on_page]  27.3 msec        100.0 %
                sleep_on_page wait_on_page_bit read_extent_buffer_pages 
                btree_read_extent_buffer_pages read_tree_block 
                read_block_for_search btrfs_search_slot 
                lookup_inline_extent_backref __btrfs_free_extent 
                run_clustered_refs btrfs_run_delayed_refs btrfs_commit_transaction 

This repeats identically (all numbers and backtrace is very stable) for
a long time.  Much later, I see this:

=============== Tue Aug  2 13:46:29 2011
Globals: Cause Maximum Percentage
Process details:
Process btrfs-submit-0 (4332) Total: 199.2 msec
        Creating block layer request     58.6 msec        100.0 %
                get_request_wait __make_request generic_make_request 
                submit_bio run_scheduled_bios pending_bios_fn worker_loop 
                kthread kernel_thread_helper 
Process btrfs-cleaner (4345) Total:   3.9 msec
        Creating block layer request      3.9 msec        100.0 %
                get_request_wait __make_request generic_make_request 
                submit_bio btrfs_map_bio btree_submit_bio_hook submit_one_bio 
                read_extent_buffer_pages readahead_tree_block reada_walk_down 
                do_walk_down walk_down_tree 
Process btrfs-endio-met (4700) Total:  35.2 msec
        [worker_loop]     3.9 msec        100.0 %
                worker_loop kthread kernel_thread_helper 
Process kworker/1:2 (26315) Total:  50.8 msec
        .         3.9 msec        100.0 %
                worker_thread kthread kernel_thread_helper 
[...snip...]
=============== Thu Aug  4 12:34:08 2011
Globals: Cause Maximum Percentage
Process details:
Process btrfs-submit-0 (4332) Total: 199.2 msec
        Creating block layer request     58.6 msec        100.0 %
                get_request_wait __make_request generic_make_request 
                submit_bio run_scheduled_bios pending_bios_fn worker_loop 
                kthread kernel_thread_helper 
Process btrfs-cleaner (4345) Total:   3.9 msec
        Creating block layer request      3.9 msec        100.0 %
                get_request_wait __make_request generic_make_request 
                submit_bio btrfs_map_bio btree_submit_bio_hook submit_one_bio 
                read_extent_buffer_pages readahead_tree_block reada_walk_down 
                do_walk_down walk_down_tree 
Process btrfs-endio-met (4700) Total:  35.2 msec
        [worker_loop]     3.9 msec        100.0 %
                worker_loop kthread kernel_thread_helper 


-- 
Bruce Guenter <bruce@untroubled.org>                http://untroubled.org/

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Slow snapshot deletion
  2011-08-04 16:35     ` Bruce Guenter
@ 2011-08-11 15:04       ` Bruce Guenter
  2011-08-12  0:40         ` Simon Kirby
  0 siblings, 1 reply; 12+ messages in thread
From: Bruce Guenter @ 2011-08-11 15:04 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 318 bytes --]

On Thu, Aug 04, 2011 at 10:35:36AM -0600, Bruce Guenter wrote:
> Ok, I ran it for more than a few minutes.  Early on (shortly after boot), the output shows this for a long time:

Does anybody have any ideas what might be going on?

-- 
Bruce Guenter <bruce@untroubled.org>                http://untroubled.org/

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Slow snapshot deletion
  2011-08-11 15:04       ` Bruce Guenter
@ 2011-08-12  0:40         ` Simon Kirby
  2011-08-12 18:21           ` Bruce Guenter
  0 siblings, 1 reply; 12+ messages in thread
From: Simon Kirby @ 2011-08-12  0:40 UTC (permalink / raw)
  To: linux-btrfs

On Thu, Aug 11, 2011 at 09:04:56AM -0600, Bruce Guenter wrote:

> On Thu, Aug 04, 2011 at 10:35:36AM -0600, Bruce Guenter wrote:
> > Ok, I ran it for more than a few minutes.  Early on (shortly after boot), the output shows this for a long time:
> 
> Does anybody have any ideas what might be going on?

By chance, does "vmstat 1" or top show a high system cpu% while this is
happening, and if so, does cat /proc/<pid>/stack of the PIDs in R state
sometimes show things under btrfs_run_delayed_refs like
setup_cluster_no_bitmap() and rb_next()?

I haven't been able to build a 2.6.39 to verify yet, but 2.6.38.5 does
not show the problem for some reason. This is on a backup server, too,
with 20 3 TB btrfs volumes.

You should also be able to see a stack trace in this case (NOT the
uninterruptible sleep case) with echo l > /rpoc/sysrq-trigger, or use
oprofile or "perf top", which is what I am using. I see output such as:

     samples  pcnt function                           DSO
     _______ _____ __________________________________ ______________

    13537.00 59.2% rb_next                            [kernel]
     3539.00 15.5% _raw_spin_lock                     [kernel]
     1668.00  7.3% setup_cluster_no_bitmap            [kernel]
      799.00  3.5% tree_search_offset                 [kernel]
      476.00  2.1% fill_window                        [kernel]

and and perf record -g; perf report --stdio, which gives me:

47.60%  btrfs-transacti  [kernel.kallsyms]          [k] rb_next
        |
        --- rb_next
           |
           |--97.36%-- setup_cluster_no_bitmap
           |          btrfs_find_space_cluster
           |          find_free_extent
           |          btrfs_reserve_extent
           |          btrfs_alloc_free_block
           |          __btrfs_cow_block
           |          btrfs_cow_block
           |          |
           |          |--99.95%-- btrfs_search_slot
           |          |          |
           |          |          |--55.01%-- lookup_inline_extent_backref
           |          |          |          |
           |          |          |          |--97.17%-- __btrfs_free_extent
           |          |          |          |          run_clustered_refs
           |          |          |          |          btrfs_run_delayed_refs
           |          |          |          |          |
           |          |          |          |          |--88.72%-- btrfs_commit_transaction
           |          |          |          |          |          transaction_kthread

At one point, I rebooted the box without unmounting, and mounting of the
60 TB of btrfs volumes never actually finished within 8 hours. It seemed
to get stuck on one of them about half way through, constantly writing
(slowly) and taking lots of CPU. This is just from mounting, not any use.
Booting back into 2.6.38.5 was fine. The kernel giving problems was 3.0,
and now 3.1-rc1.

Before, when it actually was mounted, "sync" (with nothing else dirtying
pages) also took 4 minutes to run and pegged the CPU with a profile
similar to the above.

I tried started trying to do a bisection but tons of other things have
changed and it takes quite a while to verify each pass. It also doesn't
help that I can't seem to reproduce it on any other non-production boxes.

Simon-

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Slow snapshot deletion
  2011-08-12  0:40         ` Simon Kirby
@ 2011-08-12 18:21           ` Bruce Guenter
  2011-08-12 18:26             ` Simon Kirby
  0 siblings, 1 reply; 12+ messages in thread
From: Bruce Guenter @ 2011-08-12 18:21 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 615 bytes --]

On Thu, Aug 11, 2011 at 05:40:36PM -0700, Simon Kirby wrote:
> By chance, does "vmstat 1" or top show a high system cpu% while this is
> happening,

Absolutely, usually around 80-90%

> and if so, does cat /proc/<pid>/stack of the PIDs in R state
> sometimes show things under btrfs_run_delayed_refs like
> setup_cluster_no_bitmap() and rb_next()?

No, I am not seeing what you describe.  The stack traces are usually
showing what I originally quoted.  I don't know if I've seen
setup_cluster_no_bitmap or rb_* at all.

-- 
Bruce Guenter <bruce@untroubled.org>                http://untroubled.org/

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Slow snapshot deletion
  2011-08-12 18:21           ` Bruce Guenter
@ 2011-08-12 18:26             ` Simon Kirby
  0 siblings, 0 replies; 12+ messages in thread
From: Simon Kirby @ 2011-08-12 18:26 UTC (permalink / raw)
  To: linux-btrfs

On Fri, Aug 12, 2011 at 12:21:44PM -0600, Bruce Guenter wrote:

> On Thu, Aug 11, 2011 at 05:40:36PM -0700, Simon Kirby wrote:
> > By chance, does "vmstat 1" or top show a high system cpu% while this is
> > happening,
> 
> Absolutely, usually around 80-90%

...which is perhaps why you had difficult obtaining traces with sysrq-w,
since most of the time there were no D-state processes. Instead, sysrq-l
give a backtrace of each running CPU.

> > and if so, does cat /proc/<pid>/stack of the PIDs in R state
> > sometimes show things under btrfs_run_delayed_refs like
> > setup_cluster_no_bitmap() and rb_next()?
> 
> No, I am not seeing what you describe.  The stack traces are usually
> showing what I originally quoted.  I don't know if I've seen
> setup_cluster_no_bitmap or rb_* at all.

Ok, oh well. :)

Simon-

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2011-08-12 18:26 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-07-28 20:04 Slow snapshot deletion Bruce Guenter
2011-07-28 20:28 ` Chris Mason
2011-07-28 20:51   ` Chris Mason
2011-08-04 16:35     ` Bruce Guenter
2011-08-11 15:04       ` Bruce Guenter
2011-08-12  0:40         ` Simon Kirby
2011-08-12 18:21           ` Bruce Guenter
2011-08-12 18:26             ` Simon Kirby
2011-08-01 22:26   ` Bruce Guenter
2011-08-01 22:41     ` cwillu
2011-08-01 22:59       ` Bruce Guenter
2011-08-02  0:29         ` Chris Mason

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).