* Any qualms about reverting 3d0518f4, ext4: New rec_len encoding for very large blocksizes ?
@ 2010-08-03 22:30 Eric Sandeen
2010-08-03 22:44 ` Andreas Dilger
0 siblings, 1 reply; 6+ messages in thread
From: Eric Sandeen @ 2010-08-03 22:30 UTC (permalink / raw)
To: ext4 development; +Cc: Wei Yongjun
I think we should possibly revert this commit, due to a perf regression:
commit 3d0518f4758eca4339e75e5b9dbb7e06a5ce08b4
Author: Wei Yongjun <yjwei@cn.fujitsu.com>
Date: Sat Feb 14 23:01:36 2009 -0500
ext4: New rec_len encoding for very large blocksizes
The rec_len field in the directory entry is 16 bits, so to encode
blocksizes larger than 64k becomes problematic. This patch allows us
to supprot block sizes up to 256k, by using the low 2 bits to extend
the range of rec_len to 2**18-1 (since valid rec_len sizes must be a
multiple of 4). We use the convention that a rec_len of 0 or 65535
means the filesystem block size, for compatibility with older kernels.
It's unlikely we'll see VM pages of up to 256k, but at some point we
might find that the Linux VM has been enhanced to support filesystem
block sizes > than the VM page size, at which point it might be useful
for some applications to allow very large filesystem block sizes.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
It's a novel solution to the problem, but I'm not sure it's a problem
that needs to be solved today since we cannot even make filesystems with
> 64k blocks:
$ mkfs.ext4 -b 131072 fsfile
mkfs.ext4: invalid block size - 131072
Further, the patch un-inlines ext4_rec_len_from_disk() and adds
complexity to it, and this is a pretty hot function (called from
ext4_check_dir_entry) on a file-creation workload.
We could maybe try to fix it up or make it conditional on blocksize,
but it seems to me that there's no reason to have this complexity in the
codepaths today, when it's addressing an impossible situation.
Just for reference, the testing I did on RHEL6 was:
# bonnie++ -u root -s 0 -f -x 200 -d /mnt/test -n 32
(this does 200 iterations) and got this for the file creations:
ext4 stock: Average = 21206.8 files/s
ext4 patched: Average = 22822.1 files/s
This is a 7.6% improvement...
Thoughts?
Thanks,
-Eric
-Eric
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Any qualms about reverting 3d0518f4, ext4: New rec_len encoding for very large blocksizes ?
2010-08-03 22:30 Any qualms about reverting 3d0518f4, ext4: New rec_len encoding for very large blocksizes ? Eric Sandeen
@ 2010-08-03 22:44 ` Andreas Dilger
2010-08-03 22:49 ` Eric Sandeen
0 siblings, 1 reply; 6+ messages in thread
From: Andreas Dilger @ 2010-08-03 22:44 UTC (permalink / raw)
To: Eric Sandeen; +Cc: ext4 development, Wei Yongjun
On 2010-08-03, at 16:30, Eric Sandeen wrote:
> commit 3d0518f4758eca4339e75e5b9dbb7e06a5ce08b4
> Author: Wei Yongjun <yjwei@cn.fujitsu.com>
> Date: Sat Feb 14 23:01:36 2009 -0500
>
> ext4: New rec_len encoding for very large blocksizes
>
> The rec_len field in the directory entry is 16 bits, so to encode
> blocksizes larger than 64k becomes problematic. This patch allows us
> to supprot block sizes up to 256k, by using the low 2 bits to extend
> the range of rec_len to 2**18-1 (since valid rec_len sizes must be a
> multiple of 4). We use the convention that a rec_len of 0 or 65535
> means the filesystem block size, for compatibility with older kernels.
>
> It's a novel solution to the problem, but I'm not sure it's a problem
> that needs to be solved today since we cannot even make filesystems with
>>
>> 64k blocks:
I don't object to reverting the > 64kB support, but we should strive to keep 64kB directory blocks working, if that is possibly going to break with reverting this patch. I don't think any of the other extN code is expecting to handle block sizes larger than 64kB, so little value to do it just in the directory handling code.
> Just for reference, the testing I did on RHEL6 was:
>
> # bonnie++ -u root -s 0 -f -x 200 -d /mnt/test -n 32
>
> (this does 200 iterations) and got this for the file creations:
>
> ext4 stock: Average = 21206.8 files/s
> ext4 patched: Average = 22822.1 files/s
>
> This is a 7.6% improvement...
Nothing to sneeze at.
Cheers, Andreas
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Any qualms about reverting 3d0518f4, ext4: New rec_len encoding for very large blocksizes ?
2010-08-03 22:44 ` Andreas Dilger
@ 2010-08-03 22:49 ` Eric Sandeen
2010-08-03 23:12 ` Ted Ts'o
0 siblings, 1 reply; 6+ messages in thread
From: Eric Sandeen @ 2010-08-03 22:49 UTC (permalink / raw)
To: Andreas Dilger; +Cc: ext4 development, Wei Yongjun
On 08/03/2010 05:44 PM, Andreas Dilger wrote:
> On 2010-08-03, at 16:30, Eric Sandeen wrote:
>> commit 3d0518f4758eca4339e75e5b9dbb7e06a5ce08b4 Author: Wei Yongjun
>> <yjwei@cn.fujitsu.com> Date: Sat Feb 14 23:01:36 2009 -0500
>>
>> ext4: New rec_len encoding for very large blocksizes
>>
>> The rec_len field in the directory entry is 16 bits, so to encode
>> blocksizes larger than 64k becomes problematic. This patch allows
>> us to supprot block sizes up to 256k, by using the low 2 bits to
>> extend the range of rec_len to 2**18-1 (since valid rec_len sizes
>> must be a multiple of 4). We use the convention that a rec_len of
>> 0 or 65535 means the filesystem block size, for compatibility with
>> older kernels.
>>
>> It's a novel solution to the problem, but I'm not sure it's a
>> problem that needs to be solved today since we cannot even make
>> filesystems with
>>>
>>> 64k blocks:
>
> I don't object to reverting the > 64kB support, but we should strive
> to keep 64kB directory blocks working, if that is possibly going to
> break with reverting this patch. I don't think any of the other extN
> code is expecting to handle block sizes larger than 64kB, so little
> value to do it just in the directory handling code.
As far as I know, reverting it won't break 64kb dir blocks...?
A simpler solution to the perf problem might be to just re-inline the
function, but I'm not sure what the point of all this is, as you say.
Thanks,
-Eric
>> Just for reference, the testing I did on RHEL6 was:
>>
>> # bonnie++ -u root -s 0 -f -x 200 -d /mnt/test -n 32
>>
>> (this does 200 iterations) and got this for the file creations:
>>
>> ext4 stock: Average = 21206.8 files/s
>> ext4 patched: Average = 22822.1 files/s
>>
>> This is a 7.6% improvement...
>
> Nothing to sneeze at.
>
> Cheers, Andreas
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Any qualms about reverting 3d0518f4, ext4: New rec_len encoding for very large blocksizes ?
2010-08-03 22:49 ` Eric Sandeen
@ 2010-08-03 23:12 ` Ted Ts'o
2010-08-04 0:33 ` Andreas Dilger
0 siblings, 1 reply; 6+ messages in thread
From: Ted Ts'o @ 2010-08-03 23:12 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Andreas Dilger, ext4 development, Wei Yongjun
On Tue, Aug 03, 2010 at 05:49:22PM -0500, Eric Sandeen wrote:
>
> As far as I know, reverting it won't break 64kb dir blocks...?
I seem to recall there was some confusion about what was the correct
way of recording a rec_len of 64k --- 0 or 65535. So after reverting
the patch, we need to make sure we didn't end up breaking
compatibility with (a) existing file systems and (b) what older
versions of mke2fs may have generated.
> >> (this does 200 iterations) and got this for the file creations:
> >>
> >> ext4 stock: Average = 21206.8 files/s
> >> ext4 patched: Average = 22822.1 files/s
> >>
> >> This is a 7.6% improvement...
Wow. I assume that was because actually ending up burning enough CPU
time that it slowed bonnie's performance? I'm not sure how how
realistic is a benchmark that is simply creating vast numbers of small
files in a tight loop, but certainly on non-Itanium systems where the
page size is nowhere near 64k, it's arguably pointless. (Can you even
configure an Itanic to have to have a page size > 64k?)
So one way of dealing wih this is making it an inline, and then
#ifdef'ing out the more complex code if the page size is < 64k....
- Ted
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Any qualms about reverting 3d0518f4, ext4: New rec_len encoding for very large blocksizes ?
2010-08-03 23:12 ` Ted Ts'o
@ 2010-08-04 0:33 ` Andreas Dilger
2010-08-04 14:21 ` Eric Sandeen
0 siblings, 1 reply; 6+ messages in thread
From: Andreas Dilger @ 2010-08-04 0:33 UTC (permalink / raw)
To: Ted Ts'o; +Cc: Eric Sandeen, ext4 development, Wei Yongjun
On 2010-08-03, at 17:12, Ted Ts'o wrote:
> On Tue, Aug 03, 2010 at 05:49:22PM -0500, Eric Sandeen wrote:
>> As far as I know, reverting it won't break 64kb dir blocks...?
>
> I seem to recall there was some confusion about what was the correct
> way of recording a rec_len of 64k --- 0 or 65535. So after reverting
> the patch, we need to make sure we didn't end up breaking
> compatibility with (a) existing file systems and (b) what older
> versions of mke2fs may have generated.
Right - the newer code accepts both EXT4_MAX_REC_LEN and 0 to mean "blocksize". The old code took EXT4_MAX_REC_LEN to mean "1 << 16". For filesystems with blocksize < 64k there is no difference, but I think it makes sense to continue to accept both EXT4_MAX_REC_LEN and "0".
>>>> (this does 200 iterations) and got this for the file creations:
>>>>
>>>> ext4 stock: Average = 21206.8 files/s
>>>> ext4 patched: Average = 22822.1 files/s
>>>>
>>>> This is a 7.6% improvement...
>
> So one way of dealing wih this is making it an inline, and then
> #ifdef'ing out the more complex code if the page size is < 64k....
Should it also be put under "unlikely()", or for that matter, the whole thing could be #ifdef'd out for PAGE_SIZE < 65536 since we won't magically grow blocksize > PAGE_SIZE support without knowing it.
Cheers, Andreas
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Any qualms about reverting 3d0518f4, ext4: New rec_len encoding for very large blocksizes ?
2010-08-04 0:33 ` Andreas Dilger
@ 2010-08-04 14:21 ` Eric Sandeen
0 siblings, 0 replies; 6+ messages in thread
From: Eric Sandeen @ 2010-08-04 14:21 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Ted Ts'o, ext4 development, Wei Yongjun
Andreas Dilger wrote:
> On 2010-08-03, at 17:12, Ted Ts'o wrote:
>> On Tue, Aug 03, 2010 at 05:49:22PM -0500, Eric Sandeen wrote:
>>> As far as I know, reverting it won't break 64kb dir blocks...?
>> I seem to recall there was some confusion about what was the
>> correct way of recording a rec_len of 64k --- 0 or 65535. So after
>> reverting the patch, we need to make sure we didn't end up breaking
>> compatibility with (a) existing file systems and (b) what older
>> versions of mke2fs may have generated.
>
> Right - the newer code accepts both EXT4_MAX_REC_LEN and 0 to mean
> "blocksize". The old code took EXT4_MAX_REC_LEN to mean "1 << 16".
> For filesystems with blocksize < 64k there is no difference, but I
> think it makes sense to continue to accept both EXT4_MAX_REC_LEN and
> "0".
>
>>>>> (this does 200 iterations) and got this for the file
>>>>> creations:
>>>>>
>>>>> ext4 stock: Average = 21206.8 files/s ext4 patched: Average
>>>>> = 22822.1 files/s
>>>>>
>>>>> This is a 7.6% improvement...
>> So one way of dealing wih this is making it an inline, and then
>> #ifdef'ing out the more complex code if the page size is < 64k....
Ok, I'd thought about that route too (#ifdef, or whatnot) -
I'll see if I can work it out that way.
However, #ifdef on PAGE_CACHE_SIZE didn't seem to quite work, as the
original motivation was for the "whenever the VM allows block
size > page size" case, I think...
-Eric
> Should it also be put under "unlikely()", or for that matter, the
> whole thing could be #ifdef'd out for PAGE_SIZE < 65536 since we
> won't magically grow blocksize > PAGE_SIZE support without knowing
> it.
> Cheers, Andreas
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2010-08-04 14:22 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-08-03 22:30 Any qualms about reverting 3d0518f4, ext4: New rec_len encoding for very large blocksizes ? Eric Sandeen
2010-08-03 22:44 ` Andreas Dilger
2010-08-03 22:49 ` Eric Sandeen
2010-08-03 23:12 ` Ted Ts'o
2010-08-04 0:33 ` Andreas Dilger
2010-08-04 14:21 ` Eric Sandeen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).