From: Robert White <rwhite@pobox.com>
To: Martin Steigerwald <Martin@lichtvoll.de>
Cc: Bardur Arantsson <spam@scientician.net>, linux-btrfs@vger.kernel.org
Subject: Re: BTRFS free space handling still needs more work: Hangs again
Date: Sun, 28 Dec 2014 16:27:41 -0800
Message-ID: <54A09FFD.4030107@pobox.com>
In-Reply-To: <2330517.PVzv17pTee@merkaba>
On 12/28/2014 07:42 AM, Martin Steigerwald wrote:
> Am Sonntag, 28. Dezember 2014, 06:52:41 schrieb Robert White:
>> On 12/28/2014 04:07 AM, Martin Steigerwald wrote:
>>> Am Samstag, 27. Dezember 2014, 20:03:09 schrieb Robert White:
>>>> Now:
>>>>
>>>> The complaining party has verified the minimum, repeatable case of
>>>> simple file allocation on a very fragmented system and the responding
>>>> party and several others have understood and supported the bug.
>>>
>>> I didn't provide such a test case yet.
>>
>> My bad.
>>
>>>
>>> At the moment I can only reproduce this kworker thread using a CPU for
>>> minutes case with my /home filesystem.
>>>
>>> A minimal test case for me would be to reproduce it with a fresh
>>> BTRFS filesystem. But so far, with my test case on the fresh BTRFS,
>>> I get 4800 instead of 270 IOPS.
>>>
>>
>> A version of the test case to demonstrate absolutely system-clogging
>> loads is pretty easy to construct.
>>
>> Make a raid1 filesystem.
>> Balance it once to make sure the seed filesystem is fully integrated.
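>> Something like this would do (untested sketch; the device names are
>> placeholders for whatever scratch devices you have):
>>
>> # hypothetical devices -- substitute your own scratch block devices
>> mkfs.btrfs -d raid1 -m raid1 /dev/loop0 /dev/loop1
>> mount /dev/loop0 /mnt/Work
>> btrfs balance start /mnt/Work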
>>
>> Create a bunch of small files that are at least 4K in size, but are
>> randomly sized. Fill the entire filesystem with them.
>>
>> BASH Script:
>> typeset -i counter=0
>> while dd if=/dev/urandom of=/mnt/Work/$((++counter)) \
>>          bs=$((4096 + $RANDOM)) count=1 2>/dev/null
>> do
>>     echo $counter >/dev/null  # basically a no-op
>> done
>>
>> The while loop will exit once dd encounters a full filesystem.
>>
>> Then delete ~10% of the files with "rm *0".
>>
>> Run the while loop again, then delete a different 10% with "rm *1".
>>
>> Then again with rm *2, etc...
>>
>> Do this a few times and with each iteration the CPU usage gets worse and
>> worse. You'll easily get system-wide stalls on all IO tasks lasting ten
>> or more seconds.
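>> To actually catch the stalls it helps to leave a monitor running in
>> another terminal; any load monitor will do, e.g.:
>>
>> # processes stuck in uninterruptible sleep show up in the "b" column
>> vmstat 1
>> # the kernel's hung-task watchdog also logs long stalls
>> dmesg | grep 'blocked for more than'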
>
> Thanks, Robert. That's wonderful.
>
> I already wondered about such a test case and thought about reproducing
> it with fallocate calls instead, to reduce the amount of actual writes
> done. I.e. a workload that does some silly fallocate calls, truncates,
> writes just some parts with dd seek, and removes things again.
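> Roughly like this, I imagine (untested sketch; names and sizes are
> arbitrary):
>
> fallocate -l 64M "$TESTDIR/f"                  # reserve space
> truncate -s 32M "$TESTDIR/f"                   # shrink it again
> dd if=/dev/zero of="$TESTDIR/f" bs=4k count=1 \
>    seek=$((RANDOM % 8192)) conv=notrunc        # overwrite a random 4k block
> rm "$TESTDIR/f"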
>
> Feel free to add your testcase to the bug report:
>
> [Bug 90401] New: btrfs kworker thread uses up 100% of a Sandybridge core for minutes on random write into big file
> https://bugzilla.kernel.org/show_bug.cgi?id=90401
>
> Because anything that helps a BTRFS developer reproduce it will make it
> easier to find and fix the root cause.
>
> I think I will try with this little critter:
>
> merkaba:/mnt/btrfsraid1> cat freespracefragment.sh
> #!/bin/bash
>
> TESTDIR="./test"
> mkdir -p "$TESTDIR"
>
> typeset -i counter=0
> while true; do
>     fallocate -l $((4096 + $RANDOM)) "$TESTDIR/$((++counter))"
>     echo $counter >/dev/null  # basically a no-op
> done
If you don't do the remove/delete passes you won't get as much
fragmentation...

I also noticed that fallocate would not actually create the files with
my toolset, so I had to touch them first. So the theoretical script
became, e.g.:
typeset -i counter=0
for AA in {0..9}
do
    while touch ${TESTDIR}/$((++counter)) &&
          fallocate -l $((4096 + $RANDOM)) ${TESTDIR}/$((counter))
    do
        if ((counter % 100 == 0))
        then
            echo $counter
        fi
    done
    echo "removing ${AA}"
    rm ${TESTDIR}/*${AA}
done
Meanwhile, on my test rig, using fallocate did _not_ result in final
exhaustion of resources.

I also never got a failure back from fallocate; that is, the inner loop
never terminated. This could be a problem with the system call itself,
or a problem with the application wrapper.

Nor did I reach the CPU saturation I expected.

Specifically, btrfs fi df /mnt/Work didn't show significant changes on
a near-full expanse:
Gust vm # btrfs fi df /mnt/Work/
Data, RAID1: total=1.72GiB, used=1.66GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=256.00MiB, used=57.84MiB
GlobalReserve, single: total=32.00MiB, used=0.00B
... time passes while the script is running ...
Gust vm # btrfs fi df /mnt/Work/
Data, RAID1: total=1.72GiB, used=1.66GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=256.00MiB, used=57.84MiB
GlobalReserve, single: total=32.00MiB, used=0.00B
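(If anyone wants to rule out the application wrapper, one way -- an
untested suggestion, with a made-up probe filename -- is to watch the
system call itself:

# did fallocate(2) actually return ENOSPC?
strace -e trace=fallocate fallocate -l $((4096 + $RANDOM)) ${TESTDIR}/probe

An ENOSPC would show up in the strace output even if the wrapper mangled
its exit status.)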
So there may be some limiting factor at work.

Without the actual writes to the actual file expanse I don't get the
stalls.

(I added a _touch_ of instrumentation; it makes the various catastrophe
events a little more obvious in context. 8-)
mount /dev/whatever /mnt/Work

typeset -i counter=0
for AA in {0..9}
do
    while dd if=/dev/urandom of=/mnt/Work/$((++counter)) \
             bs=$((4096 + $RANDOM)) count=1 2>/dev/null
    do
        if ((counter % 100 == 0))
        then
            echo $counter
            if ((counter % 1000 == 0))
            then
                btrfs fi df /mnt/Work
            fi
        fi
    done
    btrfs fi df /mnt/Work
    echo "removing ${AA}"
    rm /mnt/Work/*${AA}
    btrfs fi df /mnt/Work
done
So you definitely need the writes to really see the stalls.
> I may try it with my test BTRFS. I could even make it a 2x20 GiB
> RAID 1 as well.
I guess I never mentioned it... I am using 4x1GiB NOCOW files through
losetup as the basis of a RAID1. No compression (by virtue of the NOCOW
files in the underlying fs, and compression not being set in the
resulting mount). No encryption. No LVM.
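For reference, the rig setup was roughly this (untested sketch from
memory; paths and loop device numbers are illustrative, not copied from
my actual script):

mkdir -p /var/tmp/btrfs-test
for N in 0 1 2 3
do
    touch /var/tmp/btrfs-test/disk${N}
    chattr +C /var/tmp/btrfs-test/disk${N}   # NOCOW must be set while empty
    dd if=/dev/zero of=/var/tmp/btrfs-test/disk${N} bs=1M count=1024
    losetup /dev/loop${N} /var/tmp/btrfs-test/disk${N}
done
mkfs.btrfs -d raid1 -m raid1 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
mount /dev/loop0 /mnt/Work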