linux-btrfs.vger.kernel.org archive mirror
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Duncan <1i5t5.duncan@cox.net>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds
Date: Tue, 2 Sep 2014 23:32:41 -0400
Message-ID: <20140903033239.GA10133@hungrycats.org>
In-Reply-To: <pan$efe46$daffbd12$cb4477b6$9a3e7f57@cox.net>

On Tue, Sep 02, 2014 at 05:20:29AM +0000, Duncan wrote:
> suspect your firmware is SERIOUSLY out of space and shuffling, as that'll 
> slow the balance down too, and again after), try running fstrim on the 
> device.  It may or may not work on that device, but if it does and the 
> firmware /was/ out of space and having to shuffle hard, it could improve 
> performance *DRAMATICALLY*.  The reason being that on devices where it 
> works, fstrim will tell the firmware what blocks are free, allowing it 
> more flexibility in erase-block shuffling.
> 
> If that makes a big difference, you can /try/ the discard mount option.  
> Tho doing the trim/discard as part of normal operations can slow them 
> down some too.  The alternative would be to simply run fstrim 
> periodically, perhaps every Nth rsync or some such.  Note that as the 
> fstrim manpage says, the output of fstrim run repeatedly will be the 
> same, since it only knows what areas are candidates to trim, not which 
> ones are already trimmed, but it shouldn't hurt the device any to 
> repeatedly fstrim it, and if you do it every N rsyncs, it should keep 
> things from getting too bad again.

Note that dm-crypt does not pass discards to the underlying block device
by default, for security reasons (John didn't mention which dm-crypt
options he was using).  To enable them, use cryptsetup's --allow-discards
option or the discard option in /etc/crypttab.
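
To check which volumes already have this enabled, something like the
following Python sketch may help.  It assumes the usual four-column
crypttab layout (name, source device, key file, comma-separated options);
the function name and the SAMPLE text are made up for illustration and do
not come from John's setup.

```python
def crypttab_discard_status(text):
    """Map each crypttab volume name to True if its options include 'discard'."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Assumed columns: name, source device, key file, options.
        # The key file and options columns are optional.
        options = fields[3].split(",") if len(fields) > 3 else []
        status[fields[0]] = "discard" in options
    return status


# Hypothetical crypttab contents, for illustration only.
SAMPLE = """\
# <name>    <device>    <keyfile>     <options>
cryptroot   UUID=abc    none          luks,discard
cryptswap   /dev/sda2   /dev/urandom  swap
"""

status = crypttab_discard_status(SAMPLE)
```

To inspect a real system, pass the contents of /etc/crypttab instead of
SAMPLE.  A volume that maps to False would not forward fstrim or the
discard mount option to the device underneath it.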


I've seen hung task timeouts on several filesystems under 3.14.17 and
3.15.8-9 (mostly on spinning disks with dm-crypt and lvm2 underneath,
but sometimes without either).  I raised kernel.hung_task_timeout_secs
from 120 to 960 and started running balances regularly, which mitigates
the problem but does not eliminate it (ironically, when a balance is
resumed at boot, it's usually one of the hung tasks in the kernel log).
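
For reference, the timeout can be read and changed through procfs.  This
is a minimal Python sketch, assuming the sysctl is exposed at
/proc/sys/kernel/hung_task_timeout_secs (which requires a kernel built
with hung task detection) and that writes are done as root.  The helper
names are mine, not an existing API.

```python
from pathlib import Path

# Assumed procfs location of kernel.hung_task_timeout_secs.
HUNG_TASK_SYSCTL = Path("/proc/sys/kernel/hung_task_timeout_secs")


def get_hung_task_timeout(path=HUNG_TASK_SYSCTL):
    """Return the current hung task timeout in seconds."""
    return int(Path(path).read_text().strip())


def set_hung_task_timeout(seconds, path=HUNG_TASK_SYSCTL):
    """Write a new timeout in seconds; needs root on the real sysctl file."""
    if seconds < 0:
        raise ValueError("timeout must be non-negative")
    Path(path).write_text(f"{seconds}\n")
```

Calling set_hung_task_timeout(960) as root would apply the value I
mentioned.  The change does not survive a reboot unless it is also added
to the sysctl configuration.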

A fairly reliable way to trigger this is to run 'btrfs fi defrag' on
large files, run 'btrfs balance' on a filesystem with large extents,
or write a big file quickly (1GB+ in <30 sec).  If a filesystem is more
than 90% full and free space is heavily fragmented (especially by rolling
snapshots), allocating large contiguous areas seems to take a long time,
and it seems to block some or all other allocations at the same time
(I haven't rigorously identified which ones, but they seem to include
everything that calls fsync() or performs certain metadata operations).
The writes usually do finish in a few minutes, but write latency (measured
by timing a 'mkdir' call at regular intervals) can spike as high as 9+
hours.  Most people (and watchdog robots) are reaching for the RESET
button in less than five minutes.  :-/
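
A rough sketch of that kind of mkdir latency probe, in Python.  This is
not the script I actually used; the function names, the probe directory
prefix, and the default interval and sample count are all assumptions
picked for illustration.

```python
import os
import time
import uuid


def time_mkdir(parent):
    """Return the seconds one mkdir() takes in `parent`, then remove the directory."""
    path = os.path.join(parent, f".latency-probe-{uuid.uuid4().hex}")
    start = time.monotonic()
    os.mkdir(path)
    elapsed = time.monotonic() - start
    # Cleanup is not timed, although it may block on a stalled filesystem too.
    os.rmdir(path)
    return elapsed


def probe(parent, interval=10.0, count=6):
    """Collect `count` mkdir latencies, sleeping `interval` seconds between samples."""
    samples = []
    for i in range(count):
        samples.append(time_mkdir(parent))
        if i + 1 < count:
            time.sleep(interval)
    return samples
```

Running probe() against a directory on the affected filesystem while a
balance, defrag or large write is in progress should show whether
metadata operations are stalling.  Since each call blocks until mkdir()
returns, a multi-hour stall shows up as a single very large sample
rather than as many missed ones.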

