Balance performance problem with 3.14.1

linux-btrfs.vger.kernel.org archive mirror

* Balance performance problem with 3.14.1
@ 2014-05-08  9:50 Russell Coker
  2014-05-08 15:07 ` Duncan
  2014-05-08 16:10 ` Josef Bacik
  0 siblings, 2 replies; 5+ messages in thread
From: Russell Coker @ 2014-05-08  9:50 UTC
  To: Btrfs BTRFS

I've got a server/workstation (KDE desktop and file server) running kernel 
3.14.1 from the Debian package 3.14-trunk-amd64.

It was running well until I decided to do a full balance of the BTRFS RAID-1 
array of 3TB SATA disks (which hadn't been balanced before due to previous 
kernels performing badly with scrub or balance).  I canceled the balance after 
about 5 days when it had been claiming to be about 65% done for a day while 
doing a lot of disk IO.

After canceling the balance the performance of the array has been poor.  It 
has a cron job that runs twice a week to rsync data from a Maildir based mail 
server that currently has 2401473 Inodes in use according to ZFS on the mail 
server (unfortunately BTRFS won't tell me how many Inodes are in use).  The 
cron job does an "rsync -va" type backup WITHOUT the -c option, so we're 
basically doing a recursive stat on all files and then transferring new files 
(Dovecot index files are excluded so files tend never to change).  After the 
rsync is complete "cp -rl" is used to make a backup of the tree.
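
For readers unfamiliar with "cp -rl": it recreates the directory tree and 
hard-links each file instead of copying its data. Almost all of its cost is 
therefore metadata work: one new directory entry and one link-count update per 
file, which comes to roughly 2 million of each for this tree. The following is 
a rough Python equivalent, offered only as a sketch. It is not how GNU cp is 
implemented, "link_tree" is a hypothetical name, and unlike cp it skips symlinks 
and special files.

```python
import os


def link_tree(src, dst):
    """Mirror src under dst: create directories, hard-link regular files.

    Returns the number of hard links created. Symlinks and special
    files are skipped here (cp -rl would handle them too).
    """
    links = 0
    for root, _dirs, files in os.walk(src):  # top-down, so parents come first
        target_dir = os.path.normpath(os.path.join(dst, os.path.relpath(root, src)))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            # No file data is copied: this only adds a directory entry
            # and bumps the inode's link count.
            os.link(path, os.path.join(target_dir, name))
            links += 1
    return links
```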

The cp -rl usually takes something less than 30 minutes (not long enough for 
me to even notice) but today cp has been running for 5.5 hours and seems to be 
about 3/5 done (9000/15000 subdirectories linked).

The backup script that runs rsync and cp is run at 2AM and usually finishes 
well before 9AM to avoid interfering with the workstation use of the system.  
Today the script is still running at 7:38PM and seems likely to run for some 
hours.

The system in question has an SSD for /, /home, and swap.  The RAID-1 array of 
SATA disks is used for file serving and online backup.  I am not aware of any 
performance problems with the SSD, but a reasonably fast Intel SSD used for 
light desktop use could probably run at 10% of normal speed and still seem fast.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

* Re: Balance performance problem with 3.14.1
  2014-05-08  9:50 Balance performance problem with 3.14.1 Russell Coker
@ 2014-05-08 15:07 ` Duncan
  2014-05-08 16:10 ` Josef Bacik
  1 sibling, 0 replies; 5+ messages in thread
From: Duncan @ 2014-05-08 15:07 UTC
  To: linux-btrfs

Russell Coker posted on Thu, 08 May 2014 19:50:23 +1000 as excerpted:

> I've got a server/workstation (KDE desktop and file server) running
> kernel 3.14.1 from the Debian package 3.14-trunk-amd64.
> 
> It was running well until I decided to do a full balance of the BTRFS
> RAID-1 array of 3TB SATA disks (which hadn't been balanced before due to
> previous kernels performing badly with scrub or balance).  I canceled
> the balance after about 5 days when it had been claiming to be about 65%
> done for a day while doing a lot of disk IO.
> 
> After canceling the balance the performance of the array has been poor.

Not much help, but two comments:

1) I've seen other reports of performance problems after balance.  Nobody 
seems to have a good reason as to why a balance might do that.

2) I've NOT seen anything like that, here.  However, my btrfs are all 
rather small, under 50 GiB.

(FWIW, it /still/ seems weird to me calling a GB "small".  It doesn't 
seem all /that/ long ago that I bought my first 1 GB drive, and it sure 
wasn't small nor inexpensive then!  I guess I've officially joined the 
computer old-timers!)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: Balance performance problem with 3.14.1
  2014-05-08  9:50 Balance performance problem with 3.14.1 Russell Coker
  2014-05-08 15:07 ` Duncan
@ 2014-05-08 16:10 ` Josef Bacik
  2014-05-12  2:17   ` Russell Coker
  1 sibling, 1 reply; 5+ messages in thread
From: Josef Bacik @ 2014-05-08 16:10 UTC
  To: russell, Btrfs BTRFS

On 05/08/2014 05:50 AM, Russell Coker wrote:
> I've got a server/workstation (KDE desktop and file server) running kernel
> 3.14.1 from the Debian package 3.14-trunk-amd64.
>
> It was running well until I decided to do a full balance of the BTRFS RAID-1
> array of 3TB SATA disks (which hadn't been balanced before due to previous
> kernels performing badly with scrub or balance).  I canceled the balance after
> about 5 days when it had been claiming to be about 65% done for a day while
> doing a lot of disk IO.
>

Can I see dmesg/messages for the time that this was running for so long?

> After canceling the balance the performance of the array has been poor.  It
> has a cron job that runs twice a week to rsync data from a Maildir based mail
> server that currently has 2401473 Inodes in use according to ZFS on the mail
> server (unfortunately BTRFS won't tell me how many Inodes are in use).  The
> cron job does an "rsync -va" type backup WITHOUT the -c option, so we're
> basically doing a recursive stat on all files and then transferring new files
> (Dovecot index files are excluded so files tend never to change).  After the
> rsync is complete "cp -rl" is used to make a backup of the tree.
>
> The cp -rl usually takes something less than 30 minutes (not long enough for
> me to even notice) but today cp has been running for 5.5 hours and seems to be
> about 3/5 done (9000/15000 subdirectories linked).
>

Please do a SysRq+w while it's running: one, then wait a while, then 
another.  Thanks,

Josef
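
The SysRq request above can be scripted. SysRq-w asks the kernel to log the 
tasks that are in uninterruptible (blocked, "D") state along with their stack 
traces, and it can be triggered without a keyboard by writing "w" to 
/proc/sysrq-trigger as root; the output lands in the kernel log (dmesg). A 
minimal sketch follows. The function name and the 60 second gap between dumps 
are assumptions for illustration, not something Josef specified.

```python
import time


def dump_blocked_tasks(count=2, interval_s=60.0, trigger="/proc/sysrq-trigger"):
    """Trigger SysRq-w `count` times, `interval_s` seconds apart.

    Each write of "w" asks the kernel to log blocked (D state) tasks;
    read the result with dmesg. Writing the real trigger needs root.
    """
    for i in range(count):
        with open(trigger, "w") as f:
            f.write("w")
        if i < count - 1:
            time.sleep(interval_s)


# Example (as root, while the slow cp is running):
# dump_blocked_tasks(count=2, interval_s=60)
```

Comparing the two dumps shows whether the same tasks stay blocked in the same 
place, which is presumably why a second dump after a pause was requested.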

* Re: Balance performance problem with 3.14.1
  2014-05-08 16:10 ` Josef Bacik
@ 2014-05-12  2:17   ` Russell Coker
  0 siblings, 0 replies; 5+ messages in thread
From: Russell Coker @ 2014-05-12  2:17 UTC
  To: Josef Bacik; +Cc: Btrfs BTRFS

On Fri, 9 May 2014, Josef Bacik <jbacik@fb.com> wrote:
> On 05/08/2014 05:50 AM, Russell Coker wrote:
> > I've got a server/workstation (KDE desktop and file server) running
> > kernel 3.14.1 from the Debian package 3.14-trunk-amd64.
> > 
> > It was running well until I decided to do a full balance of the BTRFS
> > RAID-1 array of 3TB SATA disks (which hadn't been balanced before due to
> > previous kernels performing badly with scrub or balance).  I canceled
> > the balance after about 5 days when it had been claiming to be about 65%
> > done for a day while doing a lot of disk IO.

Firstly, after writing my previous message but before receiving yours, the 
cron job finished and I unmounted the filesystem and mounted it again.  
That freed 2G of RAM from cache (according to the "free" command) and the 
filesystem performed a lot better for most tasks (e.g. running a basic ls 
command didn't give a noticeable delay).

Number of files: 2010493
Number of files transferred: 19609
Total file size: 216240165510 bytes
Total transferred file size: 2632884253 bytes
Literal data: 2632483826 bytes
Matched data: 400442 bytes
File list size: 98728269
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 452779
Total bytes received: 1743689596

sent 452779 bytes  received 1743689596 bytes  150818.66 bytes/sec
total size is 216240165510  speedup is 123.98

This morning the cron job is running again.  Above is the rsync output, to show 
the nature of the data I'm dealing with: 2M files in 3.2 hours gives 174 
files per second.
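
As a rough cross-check of those figures, the elapsed time and per-file rate 
can be recovered from the rsync summary. This sketch assumes that rsync's 
bytes/sec figure is total traffic (sent plus received) divided by wall-clock 
time, and that its "speedup" is total file size divided by that same traffic. 
The function name "rsync_rates" is hypothetical.

```python
def rsync_rates(bytes_sent, bytes_received, bytes_per_sec, total_size, num_files):
    """Derive elapsed time, speedup and files/second from an rsync summary.

    Assumes bytes/sec = (sent + received) / elapsed and
    speedup = total size / (sent + received).
    """
    traffic = bytes_sent + bytes_received
    elapsed_s = traffic / bytes_per_sec
    speedup = total_size / traffic
    files_per_s = num_files / elapsed_s
    return elapsed_s, speedup, files_per_s


# Figures copied from the rsync output above.
elapsed_s, speedup, files_per_s = rsync_rates(
    452779, 1743689596, 150818.66, 216240165510, 2010493
)
print(f"elapsed {elapsed_s / 3600:.2f} h, speedup {speedup:.2f}, "
      f"{files_per_s:.0f} files/s")
```

With these inputs it prints "elapsed 3.21 h, speedup 123.98, 174 files/s", 
which agrees with rsync's own speedup line and with the 3.2 hours and 174 
files per second quoted above.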

> Can I see dmesg/messages for the time that this was running for so long?

It's running now: cp has run for 21 minutes and copied 806 of 14938 
directories, so it's 5% done and will probably take 5 hours.  There is nothing 
in the kernel message log since before the cron job started; the last entry is 
from when I turned off the USB-attached monitor.

Now cp has been running for 2 hours 25 minutes and copied 7180 directories.
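
A simple linear extrapolation from the two progress readings shows how the 
estimate moves. This is a sketch that assumes every remaining directory takes 
about as long to link as the average so far, which is only approximate because 
Maildir folders vary a lot in size. "projected_total_minutes" is a hypothetical 
helper.

```python
def projected_total_minutes(elapsed_min, dirs_done, dirs_total):
    """Linearly extrapolate total run time from progress so far."""
    if dirs_done <= 0:
        raise ValueError("need some progress to extrapolate")
    return elapsed_min * dirs_total / dirs_done


TOTAL_DIRS = 14938
early = projected_total_minutes(21, 806, TOTAL_DIRS)            # first reading
later = projected_total_minutes(2 * 60 + 25, 7180, TOTAL_DIRS)  # second reading
print(f"from 21 min: {early / 60:.1f} h total; from 2h25m: {later / 60:.1f} h total")
```

The first reading extrapolates to roughly 6.5 hours in total and the second to 
roughly 5 hours, still about ten times the usual "less than 30 minutes".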

> > The cp -rl usually takes something less than 30 minutes (not long enough
> > for me to even notice) but today cp has been running for 5.5 hours and
> > seems to be about 3/5 done (9000/15000 subdirectories linked).
> 
> Sysrq+w while it's running, just one and then wait a while and then
> another.  Thanks,

http://www.coker.com.au/bug/btrfs-slow-cp.gz

I've done a few of those; the log is at the above URL.

* Re: Balance performance problem with 3.14.1
@ 2014-05-08 11:47 Tomasz Chmielewski
  0 siblings, 0 replies; 5+ messages in thread
From: Tomasz Chmielewski @ 2014-05-08 11:47 UTC
  To: linux-btrfs

> I canceled the balance after 
> about 5 days when it had been claiming to be about 65% done for a day
> while doing a lot of disk IO.

I can see similar behaviour with 3.14.2 - after 4 days, it's only 25% done:

root      8382  2.1  0.0  17840   628 pts/1    D+   May04 124:12      \_ btrfs balance start /mnt/lxc1 -v

# btrfs balance status /mnt/lxc1
Balance on '/mnt/lxc1' is running
611 out of about 2492 chunks balanced (612 considered),  75% left
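
The status line can be turned into a rough completion estimate. This sketch 
assumes the remaining chunks balance at the same average rate as the first 611 
did over the roughly 4 days reported, and takes the "about 2492" total at face 
value. Chunks can differ a lot in how much data they hold, so treat the result 
as an order-of-magnitude figure. "balance_eta_days" is a hypothetical name.

```python
def balance_eta_days(chunks_done, chunks_total, days_elapsed):
    """Return (percent left, estimated days remaining) at the average rate so far."""
    chunks_left = chunks_total - chunks_done
    percent_left = 100.0 * chunks_left / chunks_total
    chunks_per_day = chunks_done / days_elapsed
    return percent_left, chunks_left / chunks_per_day


pct_left, days_left = balance_eta_days(611, 2492, 4)
print(f"{pct_left:.0f}% left, about {days_left:.1f} more days")
```

This prints "75% left, about 12.3 more days", matching the 75% reported by 
btrfs balance status.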


The reason I'm running balance here is that 3.15-rc* had the filesystem hanging for me when doing rsync
(and sometimes hanging during snapshotting), so I've downgraded to 3.14.2 (no hanging so far).


-- 
Tomasz Chmielewski
http://wpkg.org

Thread overview: 5+ messages
2014-05-08  9:50 Balance performance problem with 3.14.1 Russell Coker
2014-05-08 15:07 ` Duncan
2014-05-08 16:10 ` Josef Bacik
2014-05-12  2:17   ` Russell Coker
  -- strict thread matches above, loose matches on Subject: below --
2014-05-08 11:47 Tomasz Chmielewski