* [RFC] on-line resize with flex_bg and exclude_bitmap
@ 2011-04-12 11:27 Amir Goldstein
2011-04-13 13:45 ` Andreas Dilger
2011-06-20 18:37 ` Amir Goldstein
0 siblings, 2 replies; 4+ messages in thread
From: Amir Goldstein @ 2011-04-12 11:27 UTC (permalink / raw)
To: Theodore Tso; +Cc: Ext4 Developers List
Hi Ted,
I realized another issue regarding exclude bitmap compatibility.
The existing EXT4_IOC_GROUP_ADD ioctl doesn't pass a field for the location
of the exclude_bitmap block, so we need to either allocate the exclude_bitmap
in the kernel or define a new ioctl, which passes the exclude_bitmap to the
kernel.
If we are going to go with the latter solution, we may want to add support
for the flex_bg layout to the new ioctl, so the following is my proposal for
EXT4_IOC_FLEX_GROUP_ADD:
1. As far as online resize and mkfs are concerned, we always allocate all
group descriptors of a flex bg at the same time.
2. If there is not enough space for all flex_bg metadata in the last group,
the last group will be dropped.
3. The new flex group input will assume all bitmaps of the same type are
consecutive, so only the address of the first bitmap is needed as input.
struct ext4_new_flex_group_input {
        __u32 group;            /* Group number for this data */
        __u64 block_bitmap;     /* Absolute block number of first
                                   block bitmap */
        __u64 exclude_bitmap;   /* Absolute block number of first
                                   exclude bitmap */
        __u64 inode_bitmap;     /* Absolute block number of first
                                   inode bitmap */
        __u64 inode_table;      /* Absolute block number of first
                                   inode table start */
        __u32 blocks_count;     /* Total number of blocks in this flex group */
        __u16 reserved_blocks;  /* Number of reserved blocks in this
                                   flex group */
        __u16 flex_size;        /* Number of groups in the flex group */
};
4. ext4_group_extend() should be the same except we need to allow extending
within the last flex bg, but not necessarily the last block group.
To look at this from a different angle, if you imagine that the flex group is
just a big group, whose bitmaps and group descriptor are flex_size times bigger
and where ext4_new_flex_group_input encodes the info of the big descriptor,
then this design is identical to the current implementation of online resize.
What I will do, if you agree to this design, is use the new ioctl with
flex_size = 1 to pass the exclude_bitmap in online resize and enforce
flex_size == 1 in kernel.
Then later we can teach resize2fs and the kernel to extend flex groups properly
using the new ioctl.
What do you think?
Can you think of another way I can support exclude_bitmap and online resize,
without the need for a new ioctl?
Amir.
* Re: [RFC] on-line resize with flex_bg and exclude_bitmap
2011-04-12 11:27 [RFC] on-line resize with flex_bg and exclude_bitmap Amir Goldstein
@ 2011-04-13 13:45 ` Andreas Dilger
2011-04-13 14:26 ` Amir Goldstein
2011-06-20 18:37 ` Amir Goldstein
1 sibling, 1 reply; 4+ messages in thread
From: Andreas Dilger @ 2011-04-13 13:45 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Theodore Tso, Ext4 Developers List
On 2011-04-12, at 7:27 AM, Amir Goldstein wrote:
> I realized another issue regarding exclude bitmap compatibility.
> The existing EXT4_IOC_GROUP_ADD ioctl doesn't pass a field for the location
> of exclude_bitmap block, so we need to either allocate exclude_bitmap in kernel
> or define a new ioctl, which passes the exclude_bitmap to kernel.
>
> If we are going to go with the latter solution, we may want to add
> support for flex_bg layout for the new ioctl, so following is my proposal
> for EXT4_IOC_FLEX_GROUP_ADD:
>
> 1. As far as online resize and mkfs are concerned, we always allocate all
> group descriptors of a flex bg at the same time.
>
> 2. If there is not enough space for all flex_bg metadata in the last group,
> the last group will be dropped.
>
> 3. The new flex group input will assume all bitmaps of the same type are
> consecutive, so only the address of the first bitmap is needed as input.
>
> struct ext4_new_flex_group_input {
> __u32 group; /* Group number for this data */
This field will cause a misalignment/padding in the structure and will cause
problems with 32-bit binaries on 64-bit kernels.
Should we add a "__u32 flags" field to this structure? I prefer flags/features
instead of versions... This will avoid the need to continually add new
ioctls in the future.
> __u64 block_bitmap; /* Absolute block number of first
> block bitmap */
> __u64 exclude_bitmap; /* Absolute block number of first
> exclude bitmap */
> __u64 inode_bitmap; /* Absolute block number of first
> inode bitmap */
> __u64 inode_table; /* Absolute block number of first
> inode table start */
> __u32 blocks_count; /* Total number of blocks in this flex group */
> __u16 reserved_blocks; /* Number of reserved blocks in this
> flex group */
> __u16 flex_size; /* Number of groups in the flex group */
> };
Since the kernel only supports a single s_log_groups_per_flex value in the superblock, we should use the superblock value and change "flex_size" to be just a count of the number of groups that should be added. That means resize2fs should align its ioctls to multiples of the superblock s_log_groups_per_flex.
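The padding concern about the group field can be checked at compile time. The
sketch below is only an illustration and not a layout agreed on in this thread:
it assumes the suggested flags member is placed right after group so that no
implicit padding remains, it uses stdint types in place of __u32/__u64, and the
struct name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical rearrangement: two 32-bit members up front keep every
 * 64-bit member on an 8-byte offset without compiler-inserted padding,
 * so 32-bit and 64-bit builds see the same layout.
 */
struct flex_group_input_explicit {
        uint32_t group;           /* Group number for this data */
        uint32_t flags;           /* Assumed position of the flags field */
        uint64_t block_bitmap;
        uint64_t exclude_bitmap;
        uint64_t inode_bitmap;
        uint64_t inode_table;
        uint32_t blocks_count;
        uint16_t reserved_blocks;
        uint16_t flex_size;       /* Or a count of groups to add */
};

/* Fail the build if an ABI lays the structure out differently. */
static_assert(offsetof(struct flex_group_input_explicit, block_bitmap) == 8,
              "64-bit members must start at offset 8");
static_assert(sizeof(struct flex_group_input_explicit) == 48,
              "structure must be 48 bytes with no tail padding");
```

Without the second 32-bit member, an ABI that aligns 64-bit integers to 8 bytes
inserts 4 bytes after group, while one that aligns them to 4 bytes does not,
which is the mismatch between 32-bit binaries and 64-bit kernels.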
> 4. ext4_group_extend() should be the same except we need to allow extending
> within the last flex bg, but not necessarily the last block group.
>
>
> To look at this from a different angle, if you imagine that the flex
> group is just a big group, whose bitmaps and group descriptor are flex_size
> times bigger and where ext4_new_flex_group_input encodes the info of the big
> descriptor, then this design is identical to the current implementation of
> online resize.
>
> What I will do, if you agree to this design, is use the new ioctl,
> with flex_size = 1 to pass the exclude_bitmap in online resize and enforce
> flex_size == 1 in kernel.
If you are going to be modifying the kernel to add support for this ioctl,
why not properly add in support for flex_size > 1? It should only involve
the kernel looping over the groups as they are added. This doesn't require
that resize2fs has added support for it yet, but forcing flex_size == 1 in
the kernel means that userspace will not know whether the kernel is doing
the right thing or not.
In order to handle future resize in the case of a partially-added flex group,
it would be desirable for the unused blocks (bitmaps, inode table) to be
reserved so that they are not allocated by the block allocator.
> Then later we can teach resize2fs and the kernel to extend flex groups properly
> using the new ioctl.
>
> What do you think?
> Can you think of another way I can support exclude_bitmap and online resize,
> without the need for a new ioctl?
>
> Amir.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Cheers, Andreas
* Re: [RFC] on-line resize with flex_bg and exclude_bitmap
2011-04-13 13:45 ` Andreas Dilger
@ 2011-04-13 14:26 ` Amir Goldstein
0 siblings, 0 replies; 4+ messages in thread
From: Amir Goldstein @ 2011-04-13 14:26 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Theodore Tso, Ext4 Developers List
On Wed, Apr 13, 2011 at 4:45 PM, Andreas Dilger <adilger@dilger.ca> wrote:
> On 2011-04-12, at 7:27 AM, Amir Goldstein wrote:
>> I realized another issue regarding exclude bitmap compatibility.
>> The existing EXT4_IOC_GROUP_ADD ioctl doesn't pass a field for the location
>> of exclude_bitmap block, so we need to either allocate exclude_bitmap in kernel
>> or define a new ioctl, which passes the exclude_bitmap to kernel.
>>
>> If we are going to go with the latter solution, we may want to add
>> support for flex_bg layout for the new ioctl, so following is my proposal
>> for EXT4_IOC_FLEX_GROUP_ADD:
>>
>> 1. As far as online resize and mkfs are concerned, we always allocate all
>> group descriptors of a flex bg at the same time.
>>
>> 2. If there is not enough space for all flex_bg metadata in the last group,
>> the last group will be dropped.
>>
>> 3. The new flex group input will assume all bitmaps of the same type are
>> consecutive, so only the address of the first bitmap is needed as input.
>>
>> struct ext4_new_flex_group_input {
>> __u32 group; /* Group number for this data */
>
> This field will cause a misalignment/padding in the structure and will cause
> problems with 32-bit binaries on 64-bit kernels.
>
> Should we add a "__u32 flags" field to this structure? I prefer flags/features
> instead of versions... This will avoid the need to continually add new
> ioctls in the future.
Agreed.
I just copy pasted the current ext4_new_group_input with its existing flaws.
>
>> __u64 block_bitmap; /* Absolute block number of first
>> block bitmap */
>> __u64 exclude_bitmap; /* Absolute block number of first
>> exclude bitmap */
>> __u64 inode_bitmap; /* Absolute block number of first
>> inode bitmap */
>> __u64 inode_table; /* Absolute block number of first
>> inode table start */
>> __u32 blocks_count; /* Total number of blocks in this flex group */
>> __u16 reserved_blocks; /* Number of reserved blocks in this
>> flex group */
>> __u16 flex_size; /* Number of groups in the flex group */
>> };
>
>
> Since the kernel only supports a single s_log_groups_per_flex value in the superblock, we should use the superblock value and change "flex_size" to be just a count of the number of groups that should be added. That means resize2fs should align its ioctls to multiples of the superblock s_log_groups_per_flex.
>
Agreed, let's call it group_count.
In most cases, resize2fs would use the new ioctl to add *exactly* 1 flex group,
with all its bitmaps and descriptors, just as the current ioctl always adds
1 group, even if not all of the group is made available at that time.
resize2fs will need to use group_count < groups_per_flex to align an
existing fs to a flex group boundary.
In case the existing fs has 1 group and online resize extends it to 16 groups,
the bitmaps for groups 1-15 will all be placed in group 1.
Does that make sense? Or would it be better to extend the fs 1 group at a
time until the flex group boundary in that case?
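The group_count needed to bring an existing fs up to the next flex group
boundary can be sketched as follows. The function name is hypothetical; the
only thing taken from the discussion is that the number of groups per flex
group is 2^s_log_groups_per_flex from the superblock.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of groups to add so that the group count becomes a multiple of
 * the flex group size. Returns 0 if the fs already ends on a boundary.
 */
static uint32_t groups_to_flex_boundary(uint32_t ngroups,
                                        uint32_t log_groups_per_flex)
{
        uint32_t per_flex, used;

        assert(log_groups_per_flex < 32);

        per_flex = UINT32_C(1) << log_groups_per_flex;
        used = ngroups & (per_flex - 1);  /* groups in the last flex group */
        return used ? per_flex - used : 0;
}
```

For the example above (1 existing group, 16 groups per flex group) this gives
15, i.e. groups 1-15 would be added with a single group_count = 15 call.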
>
>> 4. ext4_group_extend() should be the same except we need to allow extending
>> within the last flex bg, but not necessarily the last block group.
>>
>>
>> To look at this from a different angle, if you imagine that the flex
>> group is just a big group, whose bitmaps and group descriptor are flex_size
>> times bigger and where ext4_new_flex_group_input encodes the info of the big
>> descriptor, then this design is identical to the current implementation of
>> online resize.
>>
>> What I will do, if you agree to this design, is use the new ioctl,
>> with flex_size = 1 to pass the exclude_bitmap in online resize and enforce
>> flex_size == 1 in kernel.
>
> If you are going to be modifying the kernel to add support for this ioctl,
> why not properly add in support for flex_size > 1?
because I don't have time to do it (test it) now :-(
and I want to post the e2fsprogs exclude bitmap patch as soon as possible.
> It should only involve
> the kernel looping over the groups as they are added. This doesn't require
> that resize2fs has added support for it yet, but forcing flex_size == 1 in
> the kernel means that userspace will not know whether the kernel is doing
> the right thing or not.
>
If I implement group_count > 1 in the kernel, I will need to implement it in
e2fsprogs, so I can test it properly.
Hopefully, there will be no official kernel which enforces group_count == 1.
> In order to handle future resize in the case of a partially-added flex group,
> it would be desirable for the unused blocks (bitmaps, inode table) to be
> reserved so that they are not allocated by the block allocator.
>
Yes, that would be desirable, but then again, we will need to handle resize
of an fs which was formatted/resized before my patches anyway.
Anyway, I hope I will implement proper flex_bg resize for 2.6.40, I'm
just not sure I can commit to it.
>> Then later we can teach resize2fs and the kernel to extend flex groups properly
>> using the new ioctl.
>>
>> What do you think?
>> Can you think of another way I can support exclude_bitmap and online resize,
>> without the need for a new ioctl?
>>
>> Amir.
>
>
> Cheers, Andreas
* Re: [RFC] on-line resize with flex_bg and exclude_bitmap
2011-04-12 11:27 [RFC] on-line resize with flex_bg and exclude_bitmap Amir Goldstein
2011-04-13 13:45 ` Andreas Dilger
@ 2011-06-20 18:37 ` Amir Goldstein
1 sibling, 0 replies; 4+ messages in thread
From: Amir Goldstein @ 2011-06-20 18:37 UTC (permalink / raw)
To: Yongqiang Yang; +Cc: Ext4 Developers List, Theodore Tso, Andreas Dilger
On Tue, Apr 12, 2011 at 2:27 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Hi Ted,
>
> I realized another issue regarding exclude bitmap compatibility.
> The existing EXT4_IOC_GROUP_ADD ioctl doesn't pass a field for the location
> of exclude_bitmap block, so we need to either allocate exclude_bitmap in kernel
> or define a new ioctl, which passes the exclude_bitmap to kernel.
>
> If we are going to go with the latter solution, we may want to add
> support for flex_bg
> layout for the new ioctl, so following is my proposal for
> EXT4_IOC_FLEX_GROUP_ADD:
>
> 1. As far as online resize and mkfs are concerned, we always allocate all
> group descriptors of a flex bg at the same time.
>
> 2. If there is not enough space for all flex_bg metadata in the last group,
> the last group will be dropped.
>
> 3. The new flex group input will assume all bitmaps of the same type are
> consecutive, so only the address of the first bitmap is needed as input.
>
> struct ext4_new_flex_group_input {
> __u32 group; /* Group number for this data */
> __u64 block_bitmap; /* Absolute block number of first
> block bitmap */
> __u64 exclude_bitmap; /* Absolute block number of first
> exclude bitmap */
> __u64 inode_bitmap; /* Absolute block number of first
> inode bitmap */
> __u64 inode_table; /* Absolute block number of first
> inode table start */
> __u32 blocks_count; /* Total number of blocks in this flex group */
> __u16 reserved_blocks; /* Number of reserved blocks in this
> flex group */
> __u16 flex_size; /* Number of groups in the flex group */
> };
>
> 4. ext4_group_extend() should be the same except we need to allow extending
> within the last flex bg, but not necessarily the last block group.
>
>
> To look at this from a different angle, if you imagine that the flex
> group is just
> a big group, whose bitmaps and group descriptor are flex_size times bigger and
> where ext4_new_flex_group_input encodes the info of the big descriptor,
> then this design is identical to the current implementation of online resize.
>
> What I will do, if you agree to this design, is use the new ioctl,
> with flex_size = 1
> to pass the exclude_bitmap in online resize and enforce flex_size == 1
> in kernel.
> Then later we can teach resize2fs and the kernel to extend flex groups properly
> using the new ioctl.
>
> What do you think?
> Can you think of another way I can support exclude_bitmap and online resize,
> without the need for a new ioctl?
>
> Amir.
>
Hi Yongqiang,
Following today's conf call, we agreed to drop the old habit of letting
user space decide on bitmap allocations and leave the decision in the hands
of the kernel.
The new resize ioctl should just provide the new 64bit blocks count to the
kernel.
The new resize ioctl (say EXT4_IOC_ONLINE_RESIZE) capability should be
published to user space via ext4_feat_attrs, like lazy_itable_init and
batch_discard, and resize2fs should check that attribute before choosing the
ioctl to use.
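A user space check for such an attribute could look roughly like the sketch
below. It assumes the attribute shows up as a file next to lazy_itable_init
and batch_discard under /sys/fs/ext4/features; the function name and the
attribute name "online_resize" used in the note are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Return 1 if a feature file called name exists in features_dir,
 * 0 otherwise (including when the resulting path would be too long).
 */
static int ext4_feature_published(const char *features_dir, const char *name)
{
        char path[256];
        int n;

        assert(features_dir != NULL && name != NULL);

        n = snprintf(path, sizeof(path), "%s/%s", features_dir, name);
        if (n < 0 || (size_t)n >= sizeof(path))
                return 0;

        /* Only existence matters here, not the file contents. */
        return access(path, F_OK) == 0;
}
```

resize2fs would then call something like
ext4_feature_published("/sys/fs/ext4/features", "online_resize") and fall back
to the existing EXT4_IOC_GROUP_ADD path when it returns 0.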
As a first step, I think the new ioctl should be implemented by calling
ext4_group_add() and ext4_group_extend() for all groups in the range.
This is sufficient for supporting exclude bitmap, because the kernel can
decide on the location for the exclude bitmaps.
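The range covered by that first step can be worked out from the old and new
blocks count alone. The following is a simplified sketch, not kernel code: the
names are hypothetical, the first data block is assumed to be 0, and the
smaller last group is simply counted as one more group.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary of the work implied by a new blocks count. */
struct resize_plan {
        uint64_t extend_blocks;  /* blocks added to the current last group */
        uint64_t new_groups;     /* groups to add after it (last may be partial) */
};

static struct resize_plan plan_resize(uint64_t old_blocks, uint64_t new_blocks,
                                      uint32_t blocks_per_group)
{
        struct resize_plan p = {0, 0};
        uint64_t tail, room, grow;

        assert(blocks_per_group > 0);
        assert(new_blocks >= old_blocks);

        tail = old_blocks % blocks_per_group;      /* blocks in a partial last group */
        room = tail ? blocks_per_group - tail : 0; /* space left to extend it */
        grow = new_blocks - old_blocks;

        /* First fill up the existing last group (the extend step). */
        p.extend_blocks = grow < room ? grow : room;
        grow -= p.extend_blocks;

        /* The rest is covered by adding groups one at a time (the add step). */
        p.new_groups = (grow + blocks_per_group - 1) / blocks_per_group;
        return p;
}
```

Since user space passes only the target size, the kernel is free to pick the
bitmap and inode table locations for each added group itself.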
The next obvious steps would be to:
1. honor the flex_bg layout (whenever possible)
2. resize to large sizes very quickly by skipping the bitmaps and itable init
But you don't have to complete all the steps if you are overbooked with
other tasks.
Further future enhancements that will be accessible with the new ioctl are:
3. support resize for 64bit/meta_bg fs
4. support resize for big_alloc fs
Thanks,
Amir.