linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Harmon <gharm@google.com>
To: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Lukas Czerner <lczerner@redhat.com>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Curt Wohlgemuth <curtw@google.com>
Subject: Re: [PATCH] ext4: Do not normalize request from fallocate
Date: Fri, 22 Mar 2013 10:10:46 -0700	[thread overview]
Message-ID: <CABFC-fO3HrWFXJhyPMPKRy6Up3vGp_JUQs_VJGRyXwph1M95Cw@mail.gmail.com> (raw)
In-Reply-To: <877gl0yc2k.fsf@openvz.org>

[Resending without html.]
I haven't looked at ext4 in awhile now. CC'ing Ted and Curt in case
they want to chime in. See comment below as well.

On Thu, Mar 21, 2013 at 9:03 AM, Dmitry Monakhov <dmonakhov@openvz.org> wrote:
> On Thu, 21 Mar 2013 16:50:45 +0100, Lukas Czerner <lczerner@redhat.com> wrote:
>> Block requests from fallocate has been normalized originally. Then it was
>> changed by 556b27abf73833923d5cd4be80006292e1b31662 not to normalize it.
>> And then it was changed by 3c6fe77017bc6ce489f231c35fed3220b6691836
>> again to normalize the request.
>>
>> The fact is that we _never_ want to normalize the request from
>> fallocate. We know exactly how much space we're going to use and we do
>> not want anyone to mess with the request and there is no point in doing
>> so.
> Looks reasonable.
> Reviewed-by:Dmitry Monakhov <dmonakhov@openvz.org>
>>
>> Commit 3c6fe77017bc6ce489f231c35fed3220b6691836 mentioned that

Is this a publicly available commit? I did not find it with a quick
google search.
Anyway, I don't see a problem with this commit off-hand, but I'll
defer to Ted or Curt.

-Greg Harmon

>> large fallocate requests were not physically contiguous. However it is
>> important to see why that is the case. Because the request is so big the
>> allocator will try to find free group to allocate from skipping block
>> groups which are used, which is fine. However it will only allocate
>> extents of 2^15-1 block (limitation of uninitialized extent size)
>> which will leave one block in each block group free which will make the
>> extent tree physically non-contiguous, however _only_ by one block which
>> is perfectly fine.
>>
>> This will never happen when we normalize the request because for some
>> reason (maybe bug) it will be normalized to much smaller request (2048
>> blocks) and those extents will then be merged together not leaving any
>> free block in between - hence physically contiguous. However the fact
>> that we're splitting huge requests into ton of smaller ones and then
>> merging extents together is very _very_ bad for fallocate performance.
>>
>> The situation is even worst since with commit
>> ec22ba8edb507395c95fbc617eea26a6b2d98797 we no longer merge
>> uninitialized extents so we end up with absolutely _huge_ extent tree
>> for bigger fallocate requests which is also bad for performance but not
>> only when fallocate itself, but even when working with the file
>> later on.
>>
>> Fix this by disabling normalization for fallocate. From my simple testing
>> with this commit fallocate is much faster on non fragmented file
>> system. On my system fallocate 15T is almost 3x faster with this patch
>> and removing this file is almost 2x faster - tested on real hardware.
>>
>> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
>> ---
>>  fs/ext4/extents.c |   18 ++++++++++--------
>>  1 files changed, 10 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index e2bb929..a40a602 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -4422,16 +4422,18 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>>               trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
>>               return ret;
>>       }
>> -     flags = EXT4_GET_BLOCKS_CREATE_UNINIT_EXT;
>> -     if (mode & FALLOC_FL_KEEP_SIZE)
>> -             flags |= EXT4_GET_BLOCKS_KEEP_SIZE;
>> +
>>       /*
>> -      * Don't normalize the request if it can fit in one extent so
>> -      * that it doesn't get unnecessarily split into multiple
>> -      * extents.
>> +      * We do NOT want the requests from fallocate to be normalized
>> +      * ever!. We know exactly how much we want to allocate and
>> +      * we do not need to do any mumbo-jumbo with it. Requests bigger
>> +      * than uninit extent size, will be divided automatically into
>> +      * biggest possible extents.
>>        */
>> -     if (len <= EXT_UNINIT_MAX_LEN << blkbits)
>> -             flags |= EXT4_GET_BLOCKS_NO_NORMALIZE;
>> +     flags = EXT4_GET_BLOCKS_CREATE_UNINIT_EXT |
>> +             EXT4_GET_BLOCKS_NO_NORMALIZE;
>> +     if (mode & FALLOC_FL_KEEP_SIZE)
>> +             flags |= EXT4_GET_BLOCKS_KEEP_SIZE;
>>
>>  retry:
>>       while (ret >= 0 && ret < max_blocks) {
>> --
>> 1.7.7.6
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2013-03-22 17:11 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-21 15:50 [PATCH] ext4: Do not normalize request from fallocate Lukas Czerner
2013-03-21 16:03 ` Dmitry Monakhov
2013-03-22 17:10   ` Greg Harmon [this message]
2013-03-22 19:36     ` Theodore Ts'o
     [not found]     ` <514cb91d.8a48340a.33fd.ffff9fa3SMTPIN_ADDED_BROKEN@mx.google.com>
2013-03-22 22:19       ` Greg Harmon
2013-03-24  0:11 ` Theodore Ts'o
2013-03-24  2:42   ` Andreas Dilger
2013-03-25 10:09   ` Lukáš Czerner
2013-03-25 12:53     ` Theodore Ts'o
2013-03-25 13:26       ` Lukáš Czerner
2013-03-25 14:44         ` Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABFC-fO3HrWFXJhyPMPKRy6Up3vGp_JUQs_VJGRyXwph1M95Cw@mail.gmail.com \
    --to=gharm@google.com \
    --cc=curtw@google.com \
    --cc=dmonakhov@openvz.org \
    --cc=lczerner@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).