Re: [PATCH] e2fsprogs: don't set stripe/stride to 1 block in mkfs

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Eric Sandeen <sandeen@redhat.com>
To: Andreas Dilger <adilger@dilger.ca>
Cc: ext4 development <linux-ext4@vger.kernel.org>,
	Zeev Tarantov <zeev.tarantov@gmail.com>,
	Alex Zhuravlev <bzzz@whamcloud.com>
Subject: Re: [PATCH] e2fsprogs: don't set stripe/stride to 1 block in mkfs
Date: Thu, 07 Apr 2011 17:24:32 -0700	[thread overview]
Message-ID: <4D9E55C0.5000607@redhat.com> (raw)
In-Reply-To: <73371424-8E17-4BF4-BBF8-BA4E9B9EA7C1@dilger.ca>

On 4/7/11 5:13 PM, Andreas Dilger wrote:
> 
> On 2011-04-05, at 10:56 AM, Eric Sandeen wrote:
> 
>> On 4/5/11 9:39 AM, Eric Sandeen wrote:
>>> Andreas Dilger wrote:
>>>> I don't think it is harmful to specify an mballoc alignment that is
>>>> an even multiple of the underlying device IO size (e.g. at least
>>>> 256kB or 512kB).
>>>>
>>>> If the underlying device (e.g. zram) is reporting 16kB or 64kB opt_io
>>>> size because that is PAGE_SIZE, but blocksize is 4kB, then we will
>>>> have the same performance problem again.> 
>>>> Cheers, Andreas
>>>
>>> I need to look into why ext4_mb_scan_aligned is so inefficient for a block-sized stripe.
>>>
>>> In practice I don't think we've seen this problem with stripe size at 4 or 8 or 16 blocks; it may just be less apparent.  I think the function steps through by stripe-sized units, and if that is 1 block, it's a lot of stepping.  
>>>
>>>        while (i < EXT4_BLOCKS_PER_GROUP(sb)) {
>>> ...
>>>                if (!mb_test_bit(i, bitmap)) {
>>
>> Offhand I think maybe mb_find_next_zero_bit would be more efficient.
>>
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -1939,16 +1939,14 @@ void ext4_mb_scan_aligned(struct ext4_allocation_context *ac,
>>        i = (a * sbi->s_stripe) - first_group_block;
>>
>>        while (i < EXT4_BLOCKS_PER_GROUP(sb)) {
>> -               if (!mb_test_bit(i, bitmap)) {
>> -                       max = mb_find_extent(e4b, 0, i, sbi->s_stripe, &ex);
>> -                       if (max >= sbi->s_stripe) {
>> -                               ac->ac_found++;
>> -                               ac->ac_b_ex = ex;
>> -                               ext4_mb_use_best_found(ac, e4b);
>> -                               break;
>> -                       }
>> +               i = mb_find_next_zero_bit(bitmap, EXT4_BLOCKS_PER_GROUP(sb), i);
>> +               max = mb_find_extent(e4b, 0, i, sbi->s_stripe, &ex);
>> +               if (max >= sbi->s_stripe) {
>> +                       ac->ac_found++;
>> +                       ac->ac_b_ex = ex;
>> +                       ext4_mb_use_best_found(ac, e4b);
>> +                       break;
>>                }
>> -               i += sbi->s_stripe;
>>        }
>> }
>>
>> totally untested, but I think we have better ways to step through the bitmap.
> 
> This changes the allocation completely, AFAICS. Instead of doing
> checks for chunks of free space aligned on sbi->s_stripe boundaries,
> it is instead finding the first free space of size s_stripe
> regardless of alignment. That is not good for RAID back-ends, and is
> the primary reason for ext4_mb_scan_aligned() to exist.

Oh, er, right.  It's what I get for coding-at-conference, sorry.

I do wonder if test-bit/advance/test-bit/advance can be made a bit more efficient with something like find_next_bit.  I just did it wrong. :(

I'll revisit it when I get back home.

> I think my original assertion holds - that regardless of what the
> "optimal IO" size reported by the underlying device, doing larger
> allocations at the mballoc level that are even multiples of this size
> isn't harmful. That avoids not only the performance impact of
> 4kB-sized "optimal IO", but also the (lesser) impact of 8kB-64kB
> "optimal IO" allocations as well.> 
> Cheers, Andreas

I'll give that some thought; really, the whole align-on-a-stripe mechanism needs work, at least outside of the Lustre workload :)

Thanks,
-Eric

next prev parent reply	other threads:[~2011-04-08  0:24 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-04 19:11 [PATCH] e2fsprogs: don't set stripe/stride to 1 block in mkfs Eric Sandeen
2011-04-05  8:10 ` Andreas Dilger
2011-04-05  8:43   ` Zeev Tarantov
2011-04-05 16:39   ` Eric Sandeen
2011-04-05 16:56     ` Eric Sandeen
2011-04-05 22:21       ` Eric Sandeen
2011-04-06  6:51         ` Zeev Tarantov
2011-04-06 17:00           ` Eric Sandeen
2011-04-08  0:13       ` Andreas Dilger
2011-04-08  0:24         ` Eric Sandeen [this message]
2011-05-18 17:20 ` Ted Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D9E55C0.5000607@redhat.com \
    --to=sandeen@redhat.com \
    --cc=adilger@dilger.ca \
    --cc=bzzz@whamcloud.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=zeev.tarantov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).