linux-btrfs.vger.kernel.org archive mirror
From: Martin Steigerwald <Martin@lichtvoll.de>
To: Robert White <rwhite@pobox.com>
Cc: Bardur Arantsson <spam@scientician.net>, linux-btrfs@vger.kernel.org
Subject: Re: BTRFS free space handling still needs more work: Hangs again
Date: Sun, 28 Dec 2014 16:47 +0100
Message-ID: <2555795.UuG01n5GBN@merkaba>
In-Reply-To: <2330517.PVzv17pTee@merkaba>

On Sunday, 28 December 2014 at 16:42:20, Martin Steigerwald wrote:
> On Sunday, 28 December 2014 at 06:52:41, Robert White wrote:
> > On 12/28/2014 04:07 AM, Martin Steigerwald wrote:
> > > On Saturday, 27 December 2014 at 20:03:09, Robert White wrote:
> > >> Now:
> > >>
> > >> The complaining party has verified the minimum, repeatable case of
> > >> simple file allocation on a very fragmented system and the responding
> > >> party and several others have understood and supported the bug.
> > >
> > > I didn't provide such a test case yet.
> > 
> > My bad.
> > 
> > >
> > > At the moment I can only reproduce this case, a kworker thread using a CPU
> > > for minutes, with my /home filesystem.
> > >
> > > A minimal test case for me would be one that reproduces this with a fresh
> > > BTRFS filesystem. But so far, with my test case on the fresh BTRFS, I get
> > > 4800 instead of 270 IOPS.
> > >
> > 
> > A version of the test case that demonstrates absolutely system-clogging
> > loads is pretty easy to construct.
> > 
> > Make a raid1 filesystem.
> > Balance it once to make sure the seed filesystem is fully integrated.
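> > 
> > Something along these lines should do for the setup (a rough sketch; device
> > names and the mount point are placeholders, adjust to your system):
> > 
> > # create a two-device raid1 filesystem (data and metadata mirrored)
> > mkfs.btrfs -m raid1 -d raid1 /dev/sdX /dev/sdY
> > mount /dev/sdX /mnt/Work
> > # rewrite all existing chunks once so everything really is raid1
> > btrfs balance start /mnt/Work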
> > 
> > Create a bunch of small files that are at least 4K in size, but are 
> > randomly sized. Fill the entire filesystem with them.
> > 
> > BASH script:
> > 
> > typeset -i counter=0
> > while dd if=/dev/urandom of=/mnt/Work/$((++counter)) \
> >          bs=$((4096 + $RANDOM)) count=1 2>/dev/null
> > do
> >         echo $counter >/dev/null  # basically a no-op
> > done
> >
> > The while loop will exit when dd encounters a full filesystem.
> > 
> > Then delete ~10% of the files with
> > rm *0
> > 
> > Run the while loop again, then delete a different 10% with "rm *1".
> > 
> > Then again with rm *2, etc...
> > 
> > Do this a few times and with each iteration the CPU usage gets worse and 
> > worse. You'll easily get system-wide stalls on all IO tasks lasting ten 
> > or more seconds.
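> > 
> > The whole cycle could be scripted roughly like this (just a sketch; it
> > assumes the fill loop above is saved as fill.sh and the filesystem is
> > mounted at /mnt/Work):
> > 
> > for digit in 0 1 2 3 4 5; do
> >         ./fill.sh                   # fill until dd hits a full filesystem
> >         rm -f /mnt/Work/*"$digit"   # delete roughly 10% of the files
> >         sync                        # make sure the freed space is committed
> > done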
> 
> Thanks, Robert. That's wonderful.
> 
> I already wondered about such a test case and thought about reproducing it
> with fallocate calls instead, to reduce the amount of actual writes. I.e. a
> workload that just does some fallocate calls, truncates files, writes only
> some parts with dd seek, and removes things again.
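> 
> Roughly something like this, I imagine (an untested sketch, sizes and paths
> are arbitrary):
> 
> TESTDIR="./test"; mkdir -p "$TESTDIR"
> typeset -i i=0
> while true; do
>         (( ++i ))
>         # only reserve space, no data is written
>         fallocate -l $((4096 + RANDOM)) "$TESTDIR/$i"
>         # shrink or grow an older file
>         truncate -s $RANDOM "$TESTDIR/$(( i > 1 ? i - 1 : 1 ))"
>         # write a single block into the new file without truncating it
>         dd if=/dev/zero of="$TESTDIR/$i" bs=4096 seek=1 count=1 conv=notrunc 2>/dev/null
>         # and every now and then remove something again
>         (( i % 10 == 0 )) && rm -f "$TESTDIR/$(( i - 5 ))"
> done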
> 
> Feel free to add your testcase to the bug report:
> 
> [Bug 90401] New: btrfs kworker thread uses up 100% of a Sandybridge core for minutes on random write into big file
> https://bugzilla.kernel.org/show_bug.cgi?id=90401
> 
> Because anything that helps a BTRFS developer reproduce this will make it
> easier to find and fix the root cause.
> 
> I think I will try with this little critter:
> 
> merkaba:/mnt/btrfsraid1> cat freespracefragment.sh 
> #!/bin/bash
> 
> TESTDIR="./test"
> mkdir -p "$TESTDIR"
> 
> typeset -i counter=0
> while true; do
>         fallocate -l $((4096 + $RANDOM)) "$TESTDIR/$((++counter))"
>         echo $counter >/dev/null #basically a noop
> done
> 
> It takes a while. The script itself uses only a few percent of one core
> there, while keeping the SSDs busier than I thought it would. Well, I see up
> to 12000 writes per 10 seconds, which is not that much, yet it keeps one SSD
> about 80% busy:
> 
> ATOP - merkaba                                 2014/12/28  16:40:57                                 -----------                                   10s elapsed
> PRC | sys    1.50s | user   3.47s | #proc    367  | #trun      1 | #tslpi   649 | #tslpu     0 | #zombie    0 | clones   839  |              | no  procacct |
> CPU | sys      30% | user     38% | irq       1%  | idle    293% | wait     37% |              | steal     0% | guest     0%  | curf 1.63GHz | curscal  50% |
> cpu | sys       7% | user     11% | irq       1%  | idle     75% | cpu000 w  6% |              | steal     0% | guest     0%  | curf 1.25GHz | curscal  39% |
> cpu | sys       8% | user     11% | irq       0%  | idle     76% | cpu002 w  4% |              | steal     0% | guest     0%  | curf 1.55GHz | curscal  48% |
> cpu | sys       7% | user      9% | irq       0%  | idle     71% | cpu001 w 13% |              | steal     0% | guest     0%  | curf 1.75GHz | curscal  54% |
> cpu | sys       8% | user      7% | irq       0%  | idle     71% | cpu003 w 14% |              | steal     0% | guest     0%  | curf 1.96GHz | curscal  61% |
> CPL | avg1    1.69 | avg5    1.30 | avg15   0.94  |              |              | csw    68387 | intr   36928 |               |              | numcpu     4 |
> MEM | tot    15.5G | free    3.1G | cache   8.8G  | buff    4.2M | slab    1.0G | shmem 210.3M | shrss  79.1M | vmbal   0.0M  | hptot   0.0M | hpuse   0.0M |
> SWP | tot    12.0G | free   11.5G |               |              |              |              |              |               | vmcom   4.9G | vmlim  19.7G |
> LVM | a-btrfsraid1 | busy     80% | read       0  | write  11873 | KiB/r      0 | KiB/w      3 | MBr/s   0.00 | MBw/s   4.31  | avq     1.11 | avio 0.67 ms |
> LVM | a-btrfsraid1 | busy      5% | read       0  | write  11873 | KiB/r      0 | KiB/w      3 | MBr/s   0.00 | MBw/s   4.31  | avq     2.45 | avio 0.04 ms |
> LVM |   msata-home | busy      3% | read       0  | write    175 | KiB/r      0 | KiB/w      3 | MBr/s   0.00 | MBw/s   0.06  | avq     1.71 | avio 1.43 ms |
> LVM | msata-debian | busy      0% | read       0  | write     10 | KiB/r      0 | KiB/w      8 | MBr/s   0.00 | MBw/s   0.01  | avq     1.15 | avio 3.40 ms |
> LVM |    sata-home | busy      0% | read       0  | write    175 | KiB/r      0 | KiB/w      3 | MBr/s   0.00 | MBw/s   0.06  | avq     1.71 | avio 0.04 ms |
> LVM |  sata-debian | busy      0% | read       0  | write     10 | KiB/r      0 | KiB/w      8 | MBr/s   0.00 | MBw/s   0.01  | avq     1.00 | avio 0.10 ms |
> DSK |          sdb | busy     80% | read       0  | write  11880 | KiB/r      0 | KiB/w      3 | MBr/s   0.00 | MBw/s   4.38  | avq     1.11 | avio 0.67 ms |
> DSK |          sda | busy      5% | read       0  | write  12069 | KiB/r      0 | KiB/w      3 | MBr/s   0.00 | MBw/s   4.38  | avq     2.51 | avio 0.04 ms |
> NET | transport    | tcpi      26 | tcpo      26  | udpi       0 | udpo       0 | tcpao      2 | tcppo      1 | tcprs      0  | tcpie      0 | udpie      0 |
> NET | network      | ipi       26 | ipo       26  | ipfrw      0 | deliv     26 |              |              |               | icmpi      0 | icmpo      0 |
> NET | eth0      0% | pcki      10 | pcko      10  | si    5 Kbps | so    1 Kbps | coll       0 | erri       0 | erro       0  | drpi       0 | drpo       0 |
> NET | lo      ---- | pcki      16 | pcko      16  | si    2 Kbps | so    2 Kbps | coll       0 | erri       0 | erro       0  | drpi       0 | drpo       0 |
> 
>   PID     TID    RUID        EUID         THR    SYSCPU    USRCPU     VGROW     RGROW    RDDSK     WRDSK    ST    EXC    S    CPUNR     CPU    CMD        1/4
>  9169       -    martin      martin        14     0.22s     1.53s        0K        0K       0K        4K    --      -    S        1     18%    amarok
>  1488       -    root        root           1     0.34s     0.27s      220K        0K       0K        0K    --      -    S        2      6%    Xorg
>  6816       -    martin      martin         7     0.05s     0.44s        0K        0K       0K        0K    --      -    S        1      5%    kmail
> 24390       -    root        root           1     0.20s     0.25s       24K       24K       0K    40800K    --      -    S        0      5%    freespracefrag
>  3268       -    martin      martin         3     0.08s     0.34s        0K        0K       0K       24K    --      -    S        0      4%    kwin
> 
> 
> 
> But only with a low amount of writes:
> 
> merkaba:/mnt/btrfsraid1> vmstat 1
> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  2  0 538424 3326248   4304 9202576    6   11  1968  4029  273  207 15 10 72  3  0
>  1  0 538424 3325244   4304 9202836    0    0     0  6456 3498 7635 11  8 72 10  0
>  0  0 538424 3325168   4304 9202932    0    0     0  9032 3719 6764  9  9 74  9  0
>  0  0 538424 3334508   4304 9202932    0    0     0  8936 3548 6035  7  8 76  9  0
>  0  0 538424 3334144   4304 9202876    0    0     0  9008 3335 5635  7  7 76 10  0
>  0  0 538424 3332724   4304 9202728    0    0     0 11240 3555 5699  7  8 76 10  0
>  2  0 538424 3333328   4304 9202876    0    0     0  9080 3724 6542  8  8 75  9  0
>  0  0 538424 3333328   4304 9202876    0    0     0  6968 2951 5015  7  7 76 10  0
>  0  1 538424 3332832   4304 9202584    0    0     0  9160 3663 6772  8  8 76  9  0

Let me rephrase that.

On one hand it is rather low, but for this kind of workload, just *fallocating*
new files, it is actually quite a lot. I only tell it to *reserve* space for
the files, I do not tell it to write to them. And yet it is about 6-12 MiB/s.
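
One can check that fallocate really only reserves extents, for example with
filefrag (the path is just an example from the test directory above):

filefrag -v /mnt/btrfsraid1/test/1   # preallocated extents show as "unwritten"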

> Still it keeps one of the two SSDs about 80% busy:
> 
> iostat -xz 1
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            7,04    0,00    7,04    9,80    0,00   76,13
> 
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda               0,00     0,00    0,00 1220,00     0,00  4556,00     7,47     0,12    0,10    0,00    0,10   0,04   5,10
> sdb               0,00    10,00    0,00 1210,00     0,00  4556,00     7,53     0,85    0,70    0,00    0,70   0,66  79,90
> dm-2              0,00     0,00    0,00    4,00     0,00    36,00    18,00     0,02    5,00    0,00    5,00   4,25   1,70
> dm-5              0,00     0,00    0,00    4,00     0,00    36,00    18,00     0,00    0,25    0,00    0,25   0,25   0,10
> dm-6              0,00     0,00    0,00 1216,00     0,00  4520,00     7,43     0,12    0,10    0,00    0,10   0,04   5,00
> dm-7              0,00     0,00    0,00 1216,00     0,00  4520,00     7,43     0,84    0,69    0,00    0,69   0,66  79,70
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            6,55    0,00    7,81    9,32    0,00   76,32
> 
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda               0,00     0,00    0,00 1203,00     0,00  4472,00     7,43     0,09    0,07    0,00    0,07   0,03   3,80
> sdb               0,00     0,00    0,00 1203,00     0,00  4472,00     7,43     0,79    0,66    0,00    0,66   0,64  77,10
> dm-6              0,00     0,00    0,00 1203,00     0,00  4472,00     7,43     0,09    0,07    0,00    0,07   0,03   4,00
> dm-7              0,00     0,00    0,00 1203,00     0,00  4472,00     7,43     0,79    0,66    0,00    0,66   0,64  77,10
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            7,79    0,00    7,79    9,30    0,00   75,13
> 
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda               0,00     0,00    0,00 1202,00     0,00  4468,00     7,43     0,09    0,07    0,00    0,07   0,04   4,70
> sdb               0,00     0,00    4,00 1202,00  2048,00  4468,00    10,81     0,86    0,71    4,75    0,70   0,65  78,10
> dm-1              0,00     0,00    4,00    0,00  2048,00     0,00  1024,00     0,02    4,75    4,75    0,00   2,00   0,80
> dm-6              0,00     0,00    0,00 1202,00     0,00  4468,00     7,43     0,08    0,07    0,00    0,07   0,04   4,60
> dm-7              0,00     0,00    0,00 1202,00     0,00  4468,00     7,43     0,84    0,70    0,00    0,70   0,65  77,80
> 
> 
> But still, I hit neither full CPU usage nor full SSD usage (just 80%), so
> this is yet another interesting case.
[…]
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
