Re: ditching e4b->alloc

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: ditching e4b->alloc_semp
@ 2011-02-21 20:02 Amir Goldstein
  2011-02-22 17:18 ` Andreas Dilger
  2011-03-05 16:06 ` Ted Ts'o
  0 siblings, 2 replies; 11+ messages in thread
From: Amir Goldstein @ 2011-02-21 20:02 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: Aneesh Kumar K.V, Ext4 Developers List

On Mon, Feb 14, 2011 at 11:22 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Feb 14, 2011 at 9:58 PM, Ted Ts'o <tytso@mit.edu> wrote:
>
>> One thing that comes to mind about your question with the
>> e4b->alloc_semp causing problems.  If the only reason why we need it
>> is to protect against multiple attempts to initialize different block
>> groups that share the same buddy bitmap, can we solve the problem by
>> ditching e4b->alloc_semp entirely, and simply using lock_page() on the
>> buddy bitmap page to protect it?
>>
>
> Perhaps. I imagine there is more than one elegant way to deal with that,
> but using a semaphore is not one of them.
> I will take a shot at evaporating e4b->alloc_semp.
>

After looking at the code a bit, I find that the only critical
resource that several
groups may share on a single page is the Uptodate flag, which is used
to indicate
that the buddy cache for *all* these groups is loaded
and lock_page() and get_page() are used to protect it.

There are 2 ways to eliminate this dependency:

1. (AKA easy lane) use a single page (or more) per block group.
this will increase the memory usage for 1K blocks fs and for 2K block fs
on 8K page system, but are these use cases really that common?

2. (AKA hard lane) attach buffer heads to buddy page and use
buffer_uptodate() and buffer_lock() instead of PageUptodate() and lock_page()
to initialize buddy cache of groups that share the same page.

What do you say?
Shall I take easy lane?

Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ditching e4b->alloc_semp
  2011-02-21 20:02 ditching e4b->alloc_semp Amir Goldstein
@ 2011-02-22 17:18 ` Andreas Dilger
  2011-02-22 20:02   ` Amir Goldstein
  2011-03-05 16:06 ` Ted Ts'o
  1 sibling, 1 reply; 11+ messages in thread
From: Andreas Dilger @ 2011-02-22 17:18 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Ted Ts'o, Aneesh Kumar K.V, Ext4 Developers List

On 2011-02-21, at 1:02 PM, Amir Goldstein wrote:
> On Mon, Feb 14, 2011 at 11:22 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Mon, Feb 14, 2011 at 9:58 PM, Ted Ts'o <tytso@mit.edu> wrote:
>> 
>>> One thing that comes to mind about your question with the
>>> e4b->alloc_semp causing problems.  If the only reason why we need it
>>> is to protect against multiple attempts to initialize different block
>>> groups that share the same buddy bitmap, can we solve the problem by
>>> ditching e4b->alloc_semp entirely, and simply using lock_page() on the
>>> buddy bitmap page to protect it?
>> 
>> Perhaps. I imagine there is more than one elegant way to deal with that,
>> but using a semaphore is not one of them.
>> I will take a shot at evaporating e4b->alloc_semp.
> 
> After looking at the code a bit, I find that the only critical resource
> that several groups may share on a single page is the Uptodate flag,
> which is used to indicate that the buddy cache for *all* these groups
> is loaded and lock_page() and get_page() are used to protect it.
> 
> There are 2 ways to eliminate this dependency:
> 
> 1. (AKA easy lane) use a single page (or more) per block group.
> this will increase the memory usage for 1K blocks fs and for 2K block fs
> on 8K page system, but are these use cases really that common?

I think some distros may use 1kB block filesystems for root, where there are lots of small files.  I wonder if smolt would have this kind of info?

> 2. (AKA hard lane) attach buffer heads to buddy page and use
> buffer_uptodate() and buffer_lock() instead of PageUptodate() and lock_page()
> to initialize buddy cache of groups that share the same page.
> 
> What do you say?
> Shall I take easy lane?

For flex_bg filesystems, it would probably make even more sense to just load all of the bitmaps for that page, since it won't waste any more memory or cause extra disk seeks.  I wonder what the memory vs. seek performance tradeoff is for 1k filesystems to load all the bitmaps even for the non-flex_bg case (i.e. would the second bitmap have been loaded anyway in most cases)?

Cheers, Andreas






^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ditching e4b->alloc_semp
  2011-02-22 17:18 ` Andreas Dilger
@ 2011-02-22 20:02   ` Amir Goldstein
  2011-02-22 21:00     ` Andreas Dilger
  0 siblings, 1 reply; 11+ messages in thread
From: Amir Goldstein @ 2011-02-22 20:02 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Ted Ts'o, Aneesh Kumar K.V, Ext4 Developers List

On Tue, Feb 22, 2011 at 7:18 PM, Andreas Dilger <adilger@dilger.ca> wrote:
> On 2011-02-21, at 1:02 PM, Amir Goldstein wrote:
>> On Mon, Feb 14, 2011 at 11:22 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> On Mon, Feb 14, 2011 at 9:58 PM, Ted Ts'o <tytso@mit.edu> wrote:
>>>
>>>> One thing that comes to mind about your question with the
>>>> e4b->alloc_semp causing problems.  If the only reason why we need it
>>>> is to protect against multiple attempts to initialize different block
>>>> groups that share the same buddy bitmap, can we solve the problem by
>>>> ditching e4b->alloc_semp entirely, and simply using lock_page() on the
>>>> buddy bitmap page to protect it?
>>>
>>> Perhaps. I imagine there is more than one elegant way to deal with that,
>>> but using a semaphore is not one of them.
>>> I will take a shot at evaporating e4b->alloc_semp.
>>
>> After looking at the code a bit, I find that the only critical resource
>> that several groups may share on a single page is the Uptodate flag,
>> which is used to indicate that the buddy cache for *all* these groups
>> is loaded and lock_page() and get_page() are used to protect it.
>>
>> There are 2 ways to eliminate this dependency:
>>
>> 1. (AKA easy lane) use a single page (or more) per block group.
>> this will increase the memory usage for 1K blocks fs and for 2K block fs
>> on 8K page system, but are these use cases really that common?
>
> I think some distros may use 1kB block filesystems for root, where there are lots of small files.  I wonder if smolt would have this kind of info?
>
>> 2. (AKA hard lane) attach buffer heads to buddy page and use
>> buffer_uptodate() and buffer_lock() instead of PageUptodate() and lock_page()
>> to initialize buddy cache of groups that share the same page.
>>
>> What do you say?
>> Shall I take easy lane?
>
> For flex_bg filesystems, it would probably make even more sense to just load all of the bitmaps for that page, since it won't waste any more memory or cause extra disk seeks.  I wonder what the memory vs. seek performance tradeoff is for 1k filesystems to load all the bitmaps even for the non-flex_bg case (i.e. would the second bitmap have been loaded anyway in most cases)?
>

I'm sorry. I don't follow. I see how disk seeks can be avoided if we
load all bitmaps of a flex_bg,
but there can be no more than 2 groups on a page (4 on 8k system).
So what do I gain? My goal  is to remove the locking protection on
allocations from different block groups.

Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ditching e4b->alloc_semp
  2011-02-22 20:02   ` Amir Goldstein
@ 2011-02-22 21:00     ` Andreas Dilger
  2011-02-23  9:54       ` Amir Goldstein
  0 siblings, 1 reply; 11+ messages in thread
From: Andreas Dilger @ 2011-02-22 21:00 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Ted Ts'o, Aneesh Kumar K.V, Ext4 Developers List

On 2011-02-22, at 1:02 PM, Amir Goldstein wrote:
> On Tue, Feb 22, 2011 at 7:18 PM, Andreas Dilger <adilger@dilger.ca> wrote:
>> On 2011-02-21, at 1:02 PM, Amir Goldstein wrote:
>>> After looking at the code a bit, I find that the only critical resource
>>> that several groups may share on a single page is the Uptodate flag,
>>> which is used to indicate that the buddy cache for *all* these groups
>>> is loaded and lock_page() and get_page() are used to protect it.
>>> 
>>> There are 2 ways to eliminate this dependency:
>>> 
>>> 1. (AKA easy lane) use a single page (or more) per block group.
>>> this will increase the memory usage for 1K blocks fs and for 2K block fs
>>> on 8K page system, but are these use cases really that common?
>> 
>> I think some distros may use 1kB block filesystems for root, where there are lots of small files.  I wonder if smolt would have this kind of info?
>> 
>>> 2. (AKA hard lane) attach buffer heads to buddy page and use
>>> buffer_uptodate() and buffer_lock() instead of PageUptodate() and lock_page()
>>> to initialize buddy cache of groups that share the same page.
>>> 
>>> What do you say?
>>> Shall I take easy lane?
>> 
>> For flex_bg filesystems, it would probably make even more sense to just load all of the bitmaps for that page, since it won't waste any more memory or cause extra disk seeks.  I wonder what the memory vs. seek performance tradeoff is for 1k filesystems to load all the bitmaps even for the non-flex_bg case (i.e. would the second bitmap have been loaded anyway in most cases)?
> 
> I'm sorry. I don't follow. I see how disk seeks can be avoided if we
> load all bitmaps of a flex_bg,
> but there can be no more than 2 groups on a page (4 on 8k system).
> So what do I gain? My goal  is to remove the locking protection on
> allocations from different block groups.

My point was that the locking is necessary because we may be loading multiple bitmaps into a single page at different times, making it difficult to set PageUptodate at one time.  However, with flex_bg it is possible to read both bitmap blocks into the same page at one time without noticeably hurting performance, and possibly even improving performance due to reduced disk operations.  Without flex_bg the benefit of reading all the bitmaps into a page at one time is less clear, because it would seek to read each bitmap.

Cheers, Andreas






^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ditching e4b->alloc_semp
  2011-02-22 21:00     ` Andreas Dilger
@ 2011-02-23  9:54       ` Amir Goldstein
  0 siblings, 0 replies; 11+ messages in thread
From: Amir Goldstein @ 2011-02-23  9:54 UTC (permalink / raw)
  To: Ted Ts'o, Andreas Dilger; +Cc: Aneesh Kumar K.V, Ext4 Developers List

On Tue, Feb 22, 2011 at 11:00 PM, Andreas Dilger <adilger@dilger.ca> wrote:
> On 2011-02-22, at 1:02 PM, Amir Goldstein wrote:
>> On Tue, Feb 22, 2011 at 7:18 PM, Andreas Dilger <adilger@dilger.ca> wrote:
>>> On 2011-02-21, at 1:02 PM, Amir Goldstein wrote:
>>>> After looking at the code a bit, I find that the only critical resource
>>>> that several groups may share on a single page is the Uptodate flag,
>>>> which is used to indicate that the buddy cache for *all* these groups
>>>> is loaded and lock_page() and get_page() are used to protect it.
>>>>
>>>> There are 2 ways to eliminate this dependency:
>>>>
>>>> 1. (AKA easy lane) use a single page (or more) per block group.
>>>> this will increase the memory usage for 1K blocks fs and for 2K block fs
>>>> on 8K page system, but are these use cases really that common?
>>>
>>> I think some distros may use 1kB block filesystems for root, where there are lots of small files.  I wonder if smolt would have this kind of info?
>>>
>>>> 2. (AKA hard lane) attach buffer heads to buddy page and use
>>>> buffer_uptodate() and buffer_lock() instead of PageUptodate() and lock_page()
>>>> to initialize buddy cache of groups that share the same page.
>>>>
>>>> What do you say?
>>>> Shall I take easy lane?
>>>
>>> For flex_bg filesystems, it would probably make even more sense to just load all of the bitmaps for that page, since it won't waste any more memory or cause extra disk seeks.  I wonder what the memory vs. seek performance tradeoff is for 1k filesystems to load all the bitmaps even for the non-flex_bg case (i.e. would the second bitmap have been loaded anyway in most cases)?
>>
>> I'm sorry. I don't follow. I see how disk seeks can be avoided if we
>> load all bitmaps of a flex_bg,
>> but there can be no more than 2 groups on a page (4 on 8k system).
>> So what do I gain? My goal  is to remove the locking protection on
>> allocations from different block groups.
>
> My point was that the locking is necessary because we may be loading multiple bitmaps into a single page at different times, making it difficult to set PageUptodate at one time.  However, with flex_bg it is possible to read both bitmap blocks into the same page at one time without noticeably hurting performance, and possibly even improving performance due to reduced disk operations.  Without flex_bg the benefit of reading all the bitmaps into a page at one time is less clear, because it would seek to read each bitmap.
>

I'm afraid the the details of buddy cache initialization are a bit more complex
than the level of our discussion and you may be referring to other locks then
the ones I am referring to (there are several levels of lockings).

I am trying to ditch the holding of down_read(grp->alloc_sem) throughout the
time that buddy is loaded,
because I find it to be an overkill and it interferes with snapshots COW.

In both options (i.e. easy lane or hard lane above), the page_lock() will be
held during ext4_mb_init_cache() and down_write(grp->alloc_sem) of the group
ITSELF will be held during ext4_mb_init_group().
In the buddy page buffers option, it is possible to read all blocks of the same
page at the same time and mark all buddy page buffers uptodate.
Other groups on the same page will be able to skip the read from disk, go
straight to initializing phase and mark the buddy page buffers bitmap_uptodate
on completion.

ext4_mb_load_buddy() would then check for !buffer_bitmap_uptodate() instead
of !PageUptodate(), before calling ext4_mb_init_cache().

It's a bit complicated to explain without a patch and I will get to it...
I only wanted to know if I can take easy lane, since it is a lot less work
and I was not sure the little memory saved for 1K block fs is worth that work.

Easy lane will also be simplifying mballoc.c and reduce lines, whereas
hard lane is
likely to add more lines that it removes.

Ted, can you please comment on the easy vs. hard choice?

Thanks,
Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ditching e4b->alloc_semp
  2011-02-21 20:02 ditching e4b->alloc_semp Amir Goldstein
  2011-02-22 17:18 ` Andreas Dilger
@ 2011-03-05 16:06 ` Ted Ts'o
  2011-03-06 20:15   ` Amir Goldstein
  2011-03-21  4:39   ` Amir Goldstein
  1 sibling, 2 replies; 11+ messages in thread
From: Ted Ts'o @ 2011-03-05 16:06 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Aneesh Kumar K.V, Ext4 Developers List

On Mon, Feb 21, 2011 at 10:02:44PM +0200, Amir Goldstein wrote:
> 
> 1. (AKA easy lane) use a single page (or more) per block group.
> this will increase the memory usage for 1K blocks fs and for 2K block fs
> on 8K page system, but are these use cases really that common?

The most common use cases will be 4k block file system on 16k page
systems, which show up on PowerPC and Itanium systems.

> 2. (AKA hard lane) attach buffer heads to buddy page and use
> buffer_uptodate() and buffer_lock() instead of PageUptodate() and lock_page()
> to initialize buddy cache of groups that share the same page.

How about this; use lock_page() to guarantee exclusive access to the
shared buddy bitmap, and then define a new bit in
ext4_group_info->bb_state to indicate whether or not a particular
block group has been initialized.  If the page has gotten flushed from
memory, so that it is not present at all (i.e., find_get_page returns
NULL), then iterate over all of the groups to clear the
EXT4_GROUP_INFO_BUDDY_INIT bit.

If the page is returned by find_get_page(), then all you need to do is
check the EXT4_GROUP_INFO_BUDDY_INIT bit to discover whether or not or
not the buddy bitmap needs to be initialized.

							- Ted

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ditching e4b->alloc_semp
  2011-03-05 16:06 ` Ted Ts'o
@ 2011-03-06 20:15   ` Amir Goldstein
  2011-03-07  4:08     ` Ted Ts'o
  2011-03-21  4:39   ` Amir Goldstein
  1 sibling, 1 reply; 11+ messages in thread
From: Amir Goldstein @ 2011-03-06 20:15 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: Aneesh Kumar K.V, Ext4 Developers List

On Sat, Mar 5, 2011 at 6:06 PM, Ted Ts'o <tytso@mit.edu> wrote:
> On Mon, Feb 21, 2011 at 10:02:44PM +0200, Amir Goldstein wrote:
>>
>> 1. (AKA easy lane) use a single page (or more) per block group.
>> this will increase the memory usage for 1K blocks fs and for 2K block fs
>> on 8K page system, but are these use cases really that common?
>
> The most common use cases will be 4k block file system on 16k page
> systems, which show up on PowerPC and Itanium systems.

OK. no easy way out.

>
>> 2. (AKA hard lane) attach buffer heads to buddy page and use
>> buffer_uptodate() and buffer_lock() instead of PageUptodate() and lock_page()
>> to initialize buddy cache of groups that share the same page.
>
> How about this; use lock_page() to guarantee exclusive access to the
> shared buddy bitmap, and then define a new bit in
> ext4_group_info->bb_state to indicate whether or not a particular
> block group has been initialized.  If the page has gotten flushed from
> memory, so that it is not present at all (i.e., find_get_page returns
> NULL), then iterate over all of the groups to clear the
> EXT4_GROUP_INFO_BUDDY_INIT bit.
>
> If the page is returned by find_get_page(), then all you need to do is
> check the EXT4_GROUP_INFO_BUDDY_INIT bit to discover whether or not or
> not the buddy bitmap needs to be initialized.
>

That sounds about right, but why do I need a new bit?
Why can't I use EXT4_GROUP_INFO_NEED_INIT_BIT to tell me the exact same thing?
The only difference from current implementation is that I would have
to test the bit again
after taking page_lock().

I didn't have time to check the implementation into details yet.
I will try to get to it during this week and send a tryout patch.

Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ditching e4b->alloc_semp
  2011-03-06 20:15   ` Amir Goldstein
@ 2011-03-07  4:08     ` Ted Ts'o
  2011-03-07  5:52       ` Amir Goldstein
  0 siblings, 1 reply; 11+ messages in thread
From: Ted Ts'o @ 2011-03-07  4:08 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Aneesh Kumar K.V, Ext4 Developers List

On Sun, Mar 06, 2011 at 10:15:41PM +0200, Amir Goldstein wrote:
> 
> That sounds about right, but why do I need a new bit?
> Why can't I use EXT4_GROUP_INFO_NEED_INIT_BIT to tell me the exact 
> same thing?

The current meaning of NEED_INIT_BIT is that it indicates that the
group has been initialized once since the file system has been
mounted.  It is used by ext4_mb_good_group() to know whether it can
rely on ext4_group_info->bb_free, ext4_group_info->bb_fragments,
ext4_group_info->bb_largest_free_order, et. al, without needing to
reload the buddy bitmap.

We added this so that even if memory pressure has forced the buddy
bitmap and block allocation bitmaps out of memory, we have enough
information in the ext4_group_info summary array that we can quickly
decide whether or not a group is a likely good candidate to be
examined more closely to have the necessary free blocks.  Without this
(relatively recent) change, the mballoc code might potentially need to
read in tens if not hundreds of block allocation bitmaps only to find
that it didn't have enough contiguous blocks, and then the memory
pressure would push the block bitmap out of memory again.... and file
system performance would go into the toilet.

So we don't want to disturb the meaning of this particular bit.  If we
zap the NEED_INIT_BIT whenever we discover that the group's buddy
bitmap page has been pushed out of memory, then we will once again
need to read in massive numbers of block bitmaps because clearing the
bit effectively marks the summary information stored ext4_group_info
structure as invalid.

							- Ted

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ditching e4b->alloc_semp
  2011-03-07  4:08     ` Ted Ts'o
@ 2011-03-07  5:52       ` Amir Goldstein
  2011-03-17 16:13         ` Amir Goldstein
  0 siblings, 1 reply; 11+ messages in thread
From: Amir Goldstein @ 2011-03-07  5:52 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: Aneesh Kumar K.V, Ext4 Developers List

On Mon, Mar 7, 2011 at 6:08 AM, Ted Ts'o <tytso@mit.edu> wrote:
> On Sun, Mar 06, 2011 at 10:15:41PM +0200, Amir Goldstein wrote:
>>
>> That sounds about right, but why do I need a new bit?
>> Why can't I use EXT4_GROUP_INFO_NEED_INIT_BIT to tell me the exact
>> same thing?
>
> The current meaning of NEED_INIT_BIT is that it indicates that the
> group has been initialized once since the file system has been
> mounted.  It is used by ext4_mb_good_group() to know whether it can
> rely on ext4_group_info->bb_free, ext4_group_info->bb_fragments,
> ext4_group_info->bb_largest_free_order, et. al, without needing to
> reload the buddy bitmap.
>
> We added this so that even if memory pressure has forced the buddy
> bitmap and block allocation bitmaps out of memory, we have enough
> information in the ext4_group_info summary array that we can quickly
> decide whether or not a group is a likely good candidate to be
> examined more closely to have the necessary free blocks.  Without this
> (relatively recent) change, the mballoc code might potentially need to
> read in tens if not hundreds of block allocation bitmaps only to find
> that it didn't have enough contiguous blocks, and then the memory
> pressure would push the block bitmap out of memory again.... and file
> system performance would go into the toilet.
>

Right... we need it.
I also wanted to examine if clearing the NEED_INIT_BIT on add_group_blocks
is really necessary.
Couldn't the buddy bitmap of partial group be initialized with all
blocks at the end
"used", similar to the block bitmap itself?
Then add_group_blocks() could just "free" the extra added blocks.

>
> So we don't want to disturb the meaning of this particular bit.  If we
> zap the NEED_INIT_BIT whenever we discover that the group's buddy
> bitmap page has been pushed out of memory, then we will once again
> need to read in massive numbers of block bitmaps because clearing the
> bit effectively marks the summary information stored ext4_group_info
> structure as invalid.
>
>                                                        - Ted
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ditching e4b->alloc_semp
  2011-03-07  5:52       ` Amir Goldstein
@ 2011-03-17 16:13         ` Amir Goldstein
  0 siblings, 0 replies; 11+ messages in thread
From: Amir Goldstein @ 2011-03-17 16:13 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: Aneesh Kumar K.V, Ext4 Developers List

On Mon, Mar 7, 2011 at 7:52 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Mar 7, 2011 at 6:08 AM, Ted Ts'o <tytso@mit.edu> wrote:
>> On Sun, Mar 06, 2011 at 10:15:41PM +0200, Amir Goldstein wrote:
>>>
>>> That sounds about right, but why do I need a new bit?
>>> Why can't I use EXT4_GROUP_INFO_NEED_INIT_BIT to tell me the exact
>>> same thing?
>>
>> The current meaning of NEED_INIT_BIT is that it indicates that the
>> group has been initialized once since the file system has been
>> mounted.  It is used by ext4_mb_good_group() to know whether it can
>> rely on ext4_group_info->bb_free, ext4_group_info->bb_fragments,
>> ext4_group_info->bb_largest_free_order, et. al, without needing to
>> reload the buddy bitmap.
>>
>> We added this so that even if memory pressure has forced the buddy
>> bitmap and block allocation bitmaps out of memory, we have enough
>> information in the ext4_group_info summary array that we can quickly
>> decide whether or not a group is a likely good candidate to be
>> examined more closely to have the necessary free blocks.  Without this
>> (relatively recent) change, the mballoc code might potentially need to
>> read in tens if not hundreds of block allocation bitmaps only to find
>> that it didn't have enough contiguous blocks, and then the memory
>> pressure would push the block bitmap out of memory again.... and file
>> system performance would go into the toilet.
>>
>
> Right... we need it.
> I also wanted to examine if clearing the NEED_INIT_BIT on add_group_blocks
> is really necessary.
> Couldn't the buddy bitmap of partial group be initialized with all
> blocks at the end
> "used", similar to the block bitmap itself?
> Then add_group_blocks() could just "free" the extra added blocks.
>

I have started to look into removing down_write(&grp->alloc_sem) from
ext4_add_group_blocks(), because until I remove it, I won't be able to get rid
of down_read(&e4b->alloc_semp) and I don't see how I can use the page_lock()
and grp->bb_state bits semantics alone.

As long as the NEED_INIT_BIT is a clear-only flag the semantics work,
but when ext4_add_group_blocks() sets the bit, we may have a task holding
a reference to buddy page and another trying to initialize it (under page_lock).
I don't suppose you meant holding page_lock() for the entire allocation.

So unless you have a better idea, I will try to implement the "add group blocks
by freeing them" paradigm.

Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ditching e4b->alloc_semp
  2011-03-05 16:06 ` Ted Ts'o
  2011-03-06 20:15   ` Amir Goldstein
@ 2011-03-21  4:39   ` Amir Goldstein
  1 sibling, 0 replies; 11+ messages in thread
From: Amir Goldstein @ 2011-03-21  4:39 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: Aneesh Kumar K.V, Ext4 Developers List

On Sat, Mar 5, 2011 at 6:06 PM, Ted Ts'o <tytso@mit.edu> wrote:
> On Mon, Feb 21, 2011 at 10:02:44PM +0200, Amir Goldstein wrote:
>>
>> 1. (AKA easy lane) use a single page (or more) per block group.
>> this will increase the memory usage for 1K blocks fs and for 2K block fs
>> on 8K page system, but are these use cases really that common?
>
> The most common use cases will be 4k block file system on 16k page
> systems, which show up on PowerPC and Itanium systems.
>
>> 2. (AKA hard lane) attach buffer heads to buddy page and use
>> buffer_uptodate() and buffer_lock() instead of PageUptodate() and lock_page()
>> to initialize buddy cache of groups that share the same page.
>
> How about this; use lock_page() to guarantee exclusive access to the
> shared buddy bitmap, and then define a new bit in
> ext4_group_info->bb_state to indicate whether or not a particular
> block group has been initialized.  If the page has gotten flushed from
> memory, so that it is not present at all (i.e., find_get_page returns
> NULL), then iterate over all of the groups to clear the
> EXT4_GROUP_INFO_BUDDY_INIT bit.
>
> If the page is returned by find_get_page(), then all you need to do is
> check the EXT4_GROUP_INFO_BUDDY_INIT bit to discover whether or not or
> not the buddy bitmap needs to be initialized.
>

I didn't need to use EXT4_GROUP_INFO_BUDDY_INIT after all.
That state could be obtained from page uptodate bit.

See patches posted on the list.

Cheers,
Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2011-03-21  4:39 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-21 20:02 ditching e4b->alloc_semp Amir Goldstein
2011-02-22 17:18 ` Andreas Dilger
2011-02-22 20:02   ` Amir Goldstein
2011-02-22 21:00     ` Andreas Dilger
2011-02-23  9:54       ` Amir Goldstein
2011-03-05 16:06 ` Ted Ts'o
2011-03-06 20:15   ` Amir Goldstein
2011-03-07  4:08     ` Ted Ts'o
2011-03-07  5:52       ` Amir Goldstein
2011-03-17 16:13         ` Amir Goldstein
2011-03-21  4:39   ` Amir Goldstein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).