From: Stefan Behrens <sbehrens@giantdisaster.de>
To: miaox@cn.fujitsu.com
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs: allocate the free space by the existed max extent size when ENOSPC
Date: Mon, 09 Sep 2013 11:06:05 +0200
Message-ID: <522D8F7D.3020803@giantdisaster.de>
In-Reply-To: <522D6906.5040404@cn.fujitsu.com>
On 09/09/2013 08:21, Miao Xie wrote:
> On Fri, 06 Sep 2013 15:47:08 +0200, Stefan Behrens wrote:
>> On Fri, 30 Aug 2013 18:35:34 +0800, Miao Xie wrote:
>>> With the current code, if the requested size is very large and all the extents
>>> in the free space cache are small, we waste a lot of CPU time cutting the
>>> requested size in half and searching the cache again and again until it gets
>>> down to a size the allocator can return. In fact, we know the max extent
>>> size in the cache after the first search, so we needn't cut the size in half
>>> repeatedly and can just use the max extent size directly. This saves
>>> a lot of CPU time and improves performance when there are only fragments
>>> in the free space cache.
>>>
>>> According to my test, if there are only 4KB free space extents in the fs,
>>> and the total size of those extents is 256MB, we can reduce the execution
>>> time of the following test from 5.4s to 1.4s.
>>> dd if=/dev/zero of=<testfile> bs=1MB count=1 oflag=sync
>>>
>>> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
>>> ---
>>> Changelog v1 -> v2:
>>> - address the problem that we return a wrong start position when searching
>>> the free space in a bitmap.
>>> ---
>>> fs/btrfs/extent-tree.c | 29 ++++++++++++++-------
>>> fs/btrfs/free-space-cache.c | 62 +++++++++++++++++++++++++++++++--------------
>>> fs/btrfs/free-space-cache.h | 5 ++--
>>> 3 files changed, 66 insertions(+), 30 deletions(-)
>>
>> This patch makes the xfstest generic/256 lock up. It's quite reliably reproducible on one of my test boxes, and not at all visible on a second test box.
>>
>> And yes, I'm using the V2 patch although you haven't tagged it as V2 in the subject line of the mail :)
>
> According to my debugging, the machine was not locked up; it seems the patch makes the test run very slowly (90s -> 850s on my machine).
With v2, the xfstest generic/256 was still running after 2 1/2 days with
the 'echo w > /proc/sysrq-trigger' output as reported.
> Could you try the v3 patch?
With v3, generic/256 always finishes after 26 seconds. The issue is
fixed with v3.
>>
>> # reboot
>> ... reboot done
>> # cd ~/git/xfs/cmds/xfstests
>> # export TEST_DEV=/dev/sdc1
>> # export TEST_DIR=/mnt2
>> # export SCRATCH_DEV=/dev/sdd1
>> # export SCRATCH_MNT=/mnt3
>> # umount $TEST_DIR $SCRATCH_MNT
>> # mkfs.btrfs -f $TEST_DEV
>> # mkfs.btrfs -f $SCRATCH_DEV
>> # ./check generic/256
>> ...should be finished after 20s, but it isn't, therefore after 180s:
>> # echo w > /proc/sysrq-trigger
>> root: run xfstest generic/256
>> SysRq : Show Blocked State
>> task PC stack pid father
>> btrfs-flush_del D 000000001a6d0000 6240 31190 2 0x00000000
>> ffff880804dbfcb8 0000000000000086 ffff880804dbffd8 ffff8807ef218000
>> ffff880804dbffd8 ffff880804dbffd8 ffff88080ad44520 ffff8807ef218000
>> ffff880804dbfc98 ffff880784d3ca50 ffff880784d3ca18 ffff880804dbfce8
>> Call Trace:
>> [<ffffffff81995da4>] schedule+0x24/0x60
>> [<ffffffffa05235c5>] btrfs_start_ordered_extent+0x85/0x130 [btrfs]
>> [<ffffffff810ac170>] ? wake_up_bit+0x40/0x40
>> [<ffffffffa0523694>] btrfs_run_ordered_extent_work+0x24/0x40 [btrfs]
>> [<ffffffffa0539d5f>] worker_loop+0x13f/0x5b0 [btrfs]
>> [<ffffffff810b5ba3>] ? finish_task_switch+0x43/0x110
>> [<ffffffff81995880>] ? __schedule+0x3f0/0x860
>> [<ffffffffa0539c20>] ? btrfs_queue_worker+0x300/0x300 [btrfs]
>> [<ffffffff810abd36>] kthread+0xd6/0xe0
>> [<ffffffff810e61ed>] ? trace_hardirqs_on+0xd/0x10
>> [<ffffffff810abc60>] ? kthread_create_on_node+0x130/0x130
>> [<ffffffff8199f66c>] ret_from_fork+0x7c/0xb0
>> [<ffffffff810abc60>] ? kthread_create_on_node+0x130/0x130
>> xfs_io D ffff880784d3cbc0 5008 31241 31240 0x00000000
>> ffff8808036f3868 0000000000000082 ffff8808036f3fd8 ffff8807c9878000
>> ffff8808036f3fd8 ffff8808036f3fd8 ffffffff82010440 ffff8807c9878000
>> ffff8808036f3848 ffff880784d3cb18 ffff880784d3cb20 7fffffffffffffff
>> Call Trace:
>> [<ffffffff81995da4>] schedule+0x24/0x60
>> [<ffffffff81992dc4>] schedule_timeout+0x194/0x260
>> [<ffffffff8199513a>] ? wait_for_completion+0x3a/0x110
>> [<ffffffff8199513a>] ? wait_for_completion+0x3a/0x110
>> [<ffffffff810e61ed>] ? trace_hardirqs_on+0xd/0x10
>> [<ffffffff819951cf>] wait_for_completion+0xcf/0x110
>> [<ffffffff810bb650>] ? try_to_wake_up+0x310/0x310
>> [<ffffffffa0523b7a>] btrfs_wait_ordered_extents+0x1ea/0x260 [btrfs]
>> [<ffffffffa0523ce5>] btrfs_wait_all_ordered_extents+0xf5/0x150 [btrfs]
>> [<ffffffffa04f4b8d>] reserve_metadata_bytes+0x7bd/0xa30 [btrfs]
>> [<ffffffffa04f516d>] btrfs_delalloc_reserve_metadata+0x16d/0x460 [btrfs]
>> [<ffffffffa051dad6>] __btrfs_buffered_write+0x276/0x4f0 [btrfs]
>> [<ffffffffa051df1a>] btrfs_file_aio_write+0x1ca/0x5a0 [btrfs]
>> [<ffffffff8119a6db>] do_sync_write+0x7b/0xb0
>> [<ffffffff8119b463>] vfs_write+0xc3/0x1e0
>> [<ffffffff8119bad2>] SyS_pwrite64+0x92/0xb0
>> [<ffffffff8199f712>] system_call_fastpath+0x16/0x1b
>>
>> (gdb) list *(btrfs_start_ordered_extent+0x85)
>> 0x4a545 is in btrfs_start_ordered_extent (fs/btrfs/ordered-data.c:747).
>> 742 * for the flusher thread to find them
>> 743 */
>> 744 if (!test_bit(BTRFS_ORDERED_DIRECT, &entry->flags))
>> 745 filemap_fdatawrite_range(inode->i_mapping, start, end);
>> 746 if (wait) {
>> 747 wait_event(entry->wait, test_bit(BTRFS_ORDERED_COMPLETE,
>> 748 &entry->flags));
>> 749 }
>> 750 }
>> 751
>>
>> (gdb) list *(btrfs_wait_ordered_extents+0x1ea)
>> 0x4aafa is in btrfs_wait_ordered_extents (fs/btrfs/ordered-data.c:610).
>> 605 list_for_each_entry_safe(ordered, next, &works, work_list) {
>> 606 list_del_init(&ordered->work_list);
>> 607 wait_for_completion(&ordered->completion);
>> 608
>> 609 inode = ordered->inode;
>> 610 btrfs_put_ordered_extent(ordered);
>> 611 if (delay_iput)
>> 612 btrfs_add_delayed_iput(inode);
>> 613 else
>> 614 iput(inode);
>>
>> # cat /proc/mounts | grep /mnt
>> /dev/sdc1 /mnt2 btrfs rw,relatime,ssd,space_cache 0 0
>> /dev/sdd1 /mnt3 btrfs rw,relatime,ssd,space_cache 0 0
>>
>> # btrfs fi show
>> Label: none uuid: 3dbe59c8-f4a0-4014-85f6-a6e9f5707c3a
>> Total devices 1 FS bytes used 1.44GiB
>> devid 1 size 1.50GiB used 1.50GiB path /dev/sdd1
>>
>> Label: none uuid: 60130e96-5fb6-4355-b81e-8113c6f5c517
>> Total devices 1 FS bytes used 32.00KiB
>> devid 1 size 20.00GiB used 20.00MiB path /dev/sdc1
>>
>> All partitions have a size of 20971520 blocks according to fdisk:
>> Device Boot Start End Blocks Id System
>> /dev/sdc1 2048 41945087 20971520 83 Linux
>>
>>
>> With the currently pushed btrfs-next and the latest xfstests.
>>
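
For readers following the thread, the idea in the quoted patch description (retry with the largest extent seen during the first search instead of halving the request over and over) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the btrfs code: struct toy_cache, toy_search(), alloc_by_halving() and alloc_by_max_extent() are hypothetical names, the cache is modelled as a flat array of extent sizes, and the functions return the number of searches performed rather than an allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the free space cache: a flat array of sizes. */
struct toy_cache {
	const uint64_t *extents;
	size_t nr;
};

/*
 * One pass over the cache. Returns 1 if some extent can hold `bytes`.
 * Returns 0 otherwise, in which case *max_extent holds the largest
 * extent in the cache (the whole array has been scanned).
 */
int toy_search(const struct toy_cache *c, uint64_t bytes, uint64_t *max_extent)
{
	*max_extent = 0;
	for (size_t i = 0; i < c->nr; i++) {
		if (c->extents[i] >= bytes)
			return 1;
		if (c->extents[i] > *max_extent)
			*max_extent = c->extents[i];
	}
	return 0;
}

/* Model of the behaviour described as "current": halve after each failure. */
unsigned int alloc_by_halving(const struct toy_cache *c, uint64_t bytes,
			      uint64_t min_bytes, uint64_t *got)
{
	unsigned int searches = 0;
	uint64_t max_extent;

	assert(min_bytes > 0 && bytes >= min_bytes);
	*got = 0;
	for (;;) {
		searches++;
		if (toy_search(c, bytes, &max_extent)) {
			*got = bytes;
			break;
		}
		if (bytes == min_bytes)
			break;		/* nothing fits: ENOSPC in this model */
		bytes /= 2;
		if (bytes < min_bytes)
			bytes = min_bytes;
	}
	return searches;
}

/* Model of the proposed behaviour: retry once with the max extent size. */
unsigned int alloc_by_max_extent(const struct toy_cache *c, uint64_t bytes,
				 uint64_t min_bytes, uint64_t *got)
{
	uint64_t max_extent, retry;

	assert(min_bytes > 0 && bytes >= min_bytes);
	*got = 0;
	if (toy_search(c, bytes, &max_extent)) {
		*got = bytes;
		return 1;
	}
	if (max_extent < min_bytes)
		return 1;		/* nothing usable: ENOSPC in this model */
	retry = max_extent;
	if (toy_search(c, retry, &max_extent))
		*got = retry;
	return 2;
}
```

In this toy model, a cache of 65536 extents of 4096 bytes each (256 MiB in total) and a 1 MiB request need 9 full searches with halving and 2 with the max extent approach. Those counts say nothing about the real allocator, which walks block groups and bitmaps, so they should not be read as an explanation of the timings reported in this thread.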