From: Robert White <rwhite@pobox.com>
To: Martin Steigerwald <Martin@lichtvoll.de>
Cc: Bardur Arantsson <spam@scientician.net>, linux-btrfs@vger.kernel.org
Subject: Re: BTRFS free space handling still needs more work: Hangs again
Date: Sun, 28 Dec 2014 16:27:41 -0800
Message-ID: <54A09FFD.4030107@pobox.com>
In-Reply-To: <2330517.PVzv17pTee@merkaba>
On 12/28/2014 07:42 AM, Martin Steigerwald wrote:
> Am Sonntag, 28. Dezember 2014, 06:52:41 schrieb Robert White:
>> On 12/28/2014 04:07 AM, Martin Steigerwald wrote:
>>> Am Samstag, 27. Dezember 2014, 20:03:09 schrieb Robert White:
>>>> Now:
>>>>
>>>> The complaining party has verified the minimum, repeatable case of
>>>> simple file allocation on a very fragmented system and the responding
>>>> party and several others have understood and supported the bug.
>>>
>>> I haven't provided such a test case yet.
>>
>> My bad.
>>
>>>
>>> At the moment I can only reproduce this case (a kworker thread using a
>>> CPU for minutes) with my /home filesystem.
>>>
>>> A minimal test case for me would be one that reproduces it with a
>>> fresh BTRFS filesystem. But so far, with my test case on the fresh BTRFS,
>>> I get 4800 instead of 270 IOPS.
>>>
>>
>> A version of the test case to demonstrate absolutely system-clogging
>> loads is pretty easy to construct.
>>
>> Make a raid1 filesystem.
>> Balance it once to make sure the seed filesystem is fully integrated.
>>
>> Create a bunch of small files that are at least 4K in size, but are
>> randomly sized. Fill the entire filesystem with them.
>>
>> BASH Script:
>> typeset -i counter=0
>> while
>> dd if=/dev/urandom of=/mnt/Work/$((++counter)) bs=$((4096 + $RANDOM)) count=1 2>/dev/null
>> do
>> echo $counter >/dev/null #basically a noop
>> done
>>
>> The while will exit when the dd encounters a full filesystem.
>>
>> Then delete ~10% of the files with
>> rm *0
>>
>> Run the while loop again, then delete a different 10% with "rm *1".
>>
>> Then again with rm *2, etc...
>>
>> Do this a few times and with each iteration the CPU usage gets worse and
>> worse. You'll easily get system-wide stalls on all IO tasks lasting ten
>> or more seconds.
>
> Thanks Robert. That's wonderful.
>
> I had already wondered about such a test case and thought about
> reproducing it with just fallocate calls, to reduce the amount of actual
> writes. I.e. a workload that does some silly fallocate, truncates, writes
> just some parts with dd seek, and removes things again.
>
> Feel free to add your testcase to the bug report:
>
> [Bug 90401] New: btrfs kworker thread uses up 100% of a Sandybridge core for minutes on random write into big file
> https://bugzilla.kernel.org/show_bug.cgi?id=90401
>
> Because anything that helps a BTRFS developer reproduce it will make it
> easier to find and fix the root cause.
>
> I think I will try with this little critter:
>
> merkaba:/mnt/btrfsraid1> cat freespracefragment.sh
> #!/bin/bash
>
> TESTDIR="./test"
> mkdir -p "$TESTDIR"
>
> typeset -i counter=0
> while true; do
> fallocate -l $((4096 + $RANDOM)) "$TESTDIR/$((++counter))"
> echo $counter >/dev/null #basically a noop
> done

If you don't do the remove/delete passes you won't get as much
fragmentation...

I also noticed that fallocate would not actually create the files in my
toolset, so I had to touch them first. So the theoretical script became,
e.g.:

typeset -i counter=0
for AA in {0..9}
do
    # The inner loop runs until touch or fallocate reports a failure.
    while
        touch "${TESTDIR}/$((++counter))" &&
        fallocate -l $((4096 + RANDOM)) "${TESTDIR}/${counter}"
    do
        if ((counter % 100 == 0))
        then
            echo $counter
        fi
    done
    echo "removing ${AA}"
    rm "${TESTDIR}"/*"${AA}"
done

Meanwhile, on my test rig, using fallocate did _not_ result in final
exhaustion of resources. That is, "btrfs fi df /mnt/Work" didn't show
significant changes on a nearly full expanse.

I also never got a failed response back from fallocate; that is, the
inner loop never terminated. This could be a problem with the system
call itself or it could be a problem with the application wrapper.

Nor did I reach the CPU saturation I expected.
e.g.
Gust vm # btrfs fi df /mnt/Work/
Data, RAID1: total=1.72GiB, used=1.66GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=256.00MiB, used=57.84MiB
GlobalReserve, single: total=32.00MiB, used=0.00B
time passes while script running...
Gust vm # btrfs fi df /mnt/Work/
Data, RAID1: total=1.72GiB, used=1.66GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=256.00MiB, used=57.84MiB
GlobalReserve, single: total=32.00MiB, used=0.00B
So there may be some limiting factor or something.
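
One way to narrow this down would be to check whether fallocate actually
reserved blocks for a file, rather than only setting its length. The
following is a minimal sketch, assuming GNU coreutils stat; the helper
name alloc_report is hypothetical and not part of any toolset mentioned
in this thread.

```shell
#!/bin/bash
# alloc_report FILE: print the apparent size and the space actually allocated.
# Assumes GNU stat: %s = size in bytes, %b = number of blocks allocated,
# %B = size in bytes of each block reported by %b.
alloc_report() {
    local file=$1 size blocks blksz
    # If stat fails, read sees no input and the function returns 1.
    read -r size blocks blksz < <(stat -c '%s %b %B' -- "$file") || return 1
    echo "apparent=${size} allocated=$((blocks * blksz))"
}
```

Running it as alloc_report "$TESTDIR/1" on a few freshly created files
compares the two numbers. If "allocated" stayed far below "apparent", that
would suggest the preallocation is not reaching the filesystem the way the
script assumes.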

Without the actual writes to the actual file expanse I don't get the stalls.

(I added a _touch_ of instrumentation; it makes the various catastrophe
events a little more obvious in context. 8-)

mount /dev/whatever /mnt/Work
typeset -i counter=0
for AA in {0..9}
do
    # Fill the filesystem; the loop ends when dd fails on a full filesystem.
    while
        dd if=/dev/urandom of=/mnt/Work/$((++counter)) \
           bs=$((4096 + RANDOM)) count=1 2>/dev/null
    do
        if ((counter % 100 == 0))
        then
            echo $counter
            if ((counter % 1000 == 0))
            then
                btrfs fi df /mnt/Work
            fi
        fi
    done
    btrfs fi df /mnt/Work
    echo "removing ${AA}"
    rm /mnt/Work/*${AA}
    btrfs fi df /mnt/Work
done

So you definitely need the writes to really see the stalls.
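
To make the stalls themselves visible rather than inferring them from CPU
usage, each write could be timed. This is only a sketch: the helper name
timed_run and the threshold value are assumptions, and it relies on bash's
SECONDS counter, so the resolution is whole seconds.

```shell
#!/bin/bash
# timed_run THRESHOLD_SECONDS COMMAND [ARGS...]
# Runs COMMAND and prints a "stall" line on stdout when it took at least
# THRESHOLD_SECONDS. The exit status of COMMAND is passed through unchanged,
# so the helper can sit in a while-condition in place of the bare command.
timed_run() {
    local threshold=$1 start=$SECONDS status elapsed
    shift
    "$@"
    status=$?
    elapsed=$((SECONDS - start))
    if ((elapsed >= threshold)); then
        echo "stall: ${elapsed}s running: $*"
    fi
    return "$status"
}
```

In the loop above the dd line would become, for example,
"timed_run 10 dd if=/dev/urandom of=... count=1 2>/dev/null". The report
goes to stdout so that silencing dd's stderr does not hide it.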

> I may try with my test BTRFS. I could even make it 2x20 GiB RAID 1
> as well.

I guess I never mentioned it... I am using 4x1GiB NOCOW files through
losetup as the basis of a RAID1. No compression (by virtue of the NOCOW
files in underlying fs, and not being set in the resulting mount). No
encryption. No LVM.
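
For anyone wanting to build a similar rig, the setup described above might
look roughly like the sketch below. The names IMG_DIR, MNT, DRY_RUN, run and
setup_rig are hypothetical, as is the image directory. By default the
commands are only printed; actually running them needs root, and chattr +C
is only meaningful when the underlying filesystem is btrfs.

```shell
#!/bin/bash
# Sketch of a loop-backed RAID1 test rig: 4 x 1GiB NOCOW backing files.
# With DRY_RUN=1 (the default) every command is printed instead of executed.
IMG_DIR=${IMG_DIR:-/var/tmp/btrfs-rig}
MNT=${MNT:-/mnt/Work}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [[ $DRY_RUN == 1 ]]; then echo "+ $*"; else "$@"; fi
}

setup_rig() {
    local i img dev devs=()
    run mkdir -p "$IMG_DIR" "$MNT"
    for i in 0 1 2 3; do
        img=$IMG_DIR/disk$i.img
        run touch "$img"
        run chattr +C "$img"          # NOCOW has to be set while the file is empty
        run truncate -s 1G "$img"
        if [[ $DRY_RUN == 1 ]]; then
            echo "+ losetup --find --show $img"
            dev=/dev/loopN$i          # placeholder; the real name comes from losetup
        else
            dev=$(losetup --find --show "$img") || return 1
        fi
        devs+=("$dev")
    done
    run mkfs.btrfs -d raid1 -m raid1 "${devs[@]}"
    run btrfs device scan             # make sure all members are known before mounting
    run mount "${devs[0]}" "$MNT"
}
```

Calling setup_rig with the defaults prints the command sequence for review;
setting DRY_RUN=0 would execute it.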