linux-ext4.vger.kernel.org archive mirror
* Checks in ext4_ext_fiemap_cb() broken
@ 2011-07-25 15:58 Jan Kara
  2011-07-26  1:20 ` Yongqiang Yang
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kara @ 2011-07-25 15:58 UTC (permalink / raw)
  To: Yongqiang Yang; +Cc: linux-ext4, Andreas Dilger, tytso

  Hello,

  I just had a look at the code checking delayed allocated buffers in
ext4_ext_fiemap_cb(). I believe the checks there could use some elimination
of common patterns but that's just a minor thing. The main problem is that
the code can easily crash the kernel when it races with page reclaim. You
just cannot access most of the page contents (and for buffers it is
especially true) without locking the page. Getting a reference via
find_get_pages_tag() guarantees you the structure cannot go away but mm is
still free to detach the page from the mapping at any moment. So you must
always lock a page and check that it still belongs to the desired mapping
before you check 'page_has_buffers()'.

								Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
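
The rule in the message above (take the page lock, re-check the mapping, and
only then look at the buffers) can be sketched as a small userspace model.
This is an illustration only, not the ext4 code: struct page_model,
struct mapping_model, model_lock_page(), model_unlock_page() and
page_is_delayed() are hypothetical stand-ins, and the lock is a plain flag
because the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for kernel objects; every name here is hypothetical. */
struct mapping_model {
	int id;
};

struct page_model {
	bool locked;                    /* models the page lock */
	struct mapping_model *mapping;  /* NULL once mm has detached the page */
	bool has_buffers;               /* models page_has_buffers() */
	bool buffer_delay;              /* models a delayed-allocation buffer */
};

static void model_lock_page(struct page_model *page)
{
	assert(!page->locked);  /* single-threaded model: never contended */
	page->locked = true;
}

static void model_unlock_page(struct page_model *page)
{
	assert(page->locked);
	page->locked = false;
}

/*
 * Return true only if the page still belongs to @mapping and carries a
 * delayed buffer. The buffer state is read only while the page is locked
 * and only after the mapping has been re-checked.
 */
bool page_is_delayed(struct page_model *page, struct mapping_model *mapping)
{
	bool delayed = false;

	model_lock_page(page);
	if (page->mapping == mapping && page->has_buffers)
		delayed = page->buffer_delay;
	model_unlock_page(page);
	return delayed;
}
```

A page that has been detached (mapping set to NULL in this model) is reported
as not delayed without its buffer fields ever being read.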


* Re: Checks in ext4_ext_fiemap_cb() broken
  2011-07-25 15:58 Checks in ext4_ext_fiemap_cb() broken Jan Kara
@ 2011-07-26  1:20 ` Yongqiang Yang
  2011-07-26 12:12   ` Jan Kara
  0 siblings, 1 reply; 8+ messages in thread
From: Yongqiang Yang @ 2011-07-26  1:20 UTC (permalink / raw)
  To: Jan Kara, Ted Ts'o; +Cc: linux-ext4, Andreas Dilger

Hi Jan,

I have been wondering for a while whether we can handle fiemap in a
much simpler way.  The current code is very ugly because of the page
cache lookup, and I have an idea for simplifying it.  The reason we have
to look up the page cache is that delayed extents are not in the extent
tree.  I think we can add an in-memory list of delayed extents to the
inode and delete entries from the list after we allocate blocks for
them.  There is no limit on the length of an extent in the list, so one
entry can cover as many blocks as are logically contiguous.

What's your opinion?

Yongqiang.
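
One way to picture the proposed list is the userspace sketch below. It is an
assumption-laden illustration, not a patch: struct delayed_extent,
struct delayed_list, delayed_add() and delayed_remove() are hypothetical
names, a sorted singly linked list is only one possible container, and
start + len is assumed not to overflow. Adding a range merges it with any
extent it overlaps or touches, so an entry grows as long as the blocks are
logically contiguous; removing a range models block allocation.

```c
#include <stdint.h>
#include <stdlib.h>

/* One run of logically contiguous delayed blocks: [start, start + len). */
struct delayed_extent {
	uint64_t start;
	uint64_t len;
	struct delayed_extent *next;
};

/* Hypothetical per-inode list, kept sorted by start and non-overlapping. */
struct delayed_list {
	struct delayed_extent *head;
};

/* Record that [start, start + len) is delayed; merge with neighbours. */
int delayed_add(struct delayed_list *list, uint64_t start, uint64_t len)
{
	struct delayed_extent **pp = &list->head, *e, *merged;
	uint64_t end = start + len;

	if (len == 0)
		return 0;
	merged = malloc(sizeof(*merged));
	if (!merged)
		return -1;
	/* Skip extents that end strictly before the new range starts. */
	while (*pp && (*pp)->start + (*pp)->len < start)
		pp = &(*pp)->next;
	/* Absorb every extent that overlaps or touches the new range. */
	while ((e = *pp) != NULL && e->start <= end) {
		if (e->start < start)
			start = e->start;
		if (e->start + e->len > end)
			end = e->start + e->len;
		*pp = e->next;
		free(e);
	}
	merged->start = start;
	merged->len = end - start;
	merged->next = *pp;
	*pp = merged;
	return 0;
}

/* Blocks in [start, start + len) were allocated; drop them from the list. */
int delayed_remove(struct delayed_list *list, uint64_t start, uint64_t len)
{
	struct delayed_extent **pp = &list->head, *e, *tail;
	uint64_t end = start + len;

	if (len == 0)
		return 0;
	while ((e = *pp) != NULL && e->start < end) {
		uint64_t e_end = e->start + e->len;

		if (e_end <= start) {            /* entirely before the range */
			pp = &e->next;
		} else if (e->start < start && e_end > end) {
			/* The range falls in the middle of e: split e in two. */
			tail = malloc(sizeof(*tail));
			if (!tail)
				return -1;
			tail->start = end;
			tail->len = e_end - end;
			tail->next = e->next;
			e->len = start - e->start;
			e->next = tail;
			return 0;
		} else if (e->start < start) {   /* trim the tail of e */
			e->len = start - e->start;
			pp = &e->next;
		} else if (e_end > end) {        /* trim the head of e */
			e->len = e_end - end;
			e->start = end;
			return 0;
		} else {                         /* e is fully covered: unlink */
			*pp = e->next;
			free(e);
		}
	}
	return 0;
}
```

A linear list keeps the sketch short; a tree keyed by logical block would
give faster lookups for files with many delayed extents.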

-- 
Best Wishes
Yongqiang Yang
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Checks in ext4_ext_fiemap_cb() broken
  2011-07-26  1:20 ` Yongqiang Yang
@ 2011-07-26 12:12   ` Jan Kara
  2011-07-26 12:48     ` Yongqiang Yang
  2011-07-26 18:48     ` Aditya Kali
  0 siblings, 2 replies; 8+ messages in thread
From: Jan Kara @ 2011-07-26 12:12 UTC (permalink / raw)
  To: Yongqiang Yang; +Cc: Jan Kara, Ted Ts'o, linux-ext4, Andreas Dilger

  Hi Yongqiang,

On Tue 26-07-11 09:20:28, Yongqiang Yang wrote:
> I have been wondering for a while whether we can handle fiemap in a
> much simpler way.  The current code is very ugly because of the page
> cache lookup, and I have an idea for simplifying it.  The reason we have
> to look up the page cache is that delayed extents are not in the extent
> tree.  I think we can add an in-memory list of delayed extents to the
> inode and delete entries from the list after we allocate blocks for
> them.  There is no limit on the length of an extent in the list, so one
> entry can cover as many blocks as are logically contiguous.
> 
> What's your opinion?
  Yes, that should be doable and shouldn't add too much overhead. It's just
stupid that we'd do all this only for the fiemap call, which is relatively
rare.

								Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: Checks in ext4_ext_fiemap_cb() broken
  2011-07-26 12:12   ` Jan Kara
@ 2011-07-26 12:48     ` Yongqiang Yang
  2011-07-26 16:30       ` Allison Henderson
  2011-07-26 17:07       ` Ted Ts'o
  2011-07-26 18:48     ` Aditya Kali
  1 sibling, 2 replies; 8+ messages in thread
From: Yongqiang Yang @ 2011-07-26 12:48 UTC (permalink / raw)
  To: Jan Kara, Allison Henderson; +Cc: Ted Ts'o, linux-ext4, Andreas Dilger

On Tue, Jul 26, 2011 at 8:12 PM, Jan Kara <jack@suse.cz> wrote:
>  Yes, that should be doable and shouldn't add too much overhead. It's just
> stupid that we'd do all this only for the fiemap call, which is relatively
> rare.

I guess there are other places where delayed extents have to be handled
by looking up the page cache.

SEEK_HOLE and SEEK_DATA also need to look up the page cache to handle
delayed extents.

Hi Allison,

If a delayed extents list were added to the inode, could the punch hole code
be simpler?


Yongqiang.

-- 
Best Wishes
Yongqiang Yang

* Re: Checks in ext4_ext_fiemap_cb() broken
  2011-07-26 12:48     ` Yongqiang Yang
@ 2011-07-26 16:30       ` Allison Henderson
  2011-07-26 16:44         ` Andreas Dilger
  2011-07-26 17:07       ` Ted Ts'o
  1 sibling, 1 reply; 8+ messages in thread
From: Allison Henderson @ 2011-07-26 16:30 UTC (permalink / raw)
  To: Yongqiang Yang; +Cc: Jan Kara, Ted Ts'o, linux-ext4, Andreas Dilger

On 07/26/2011 05:48 AM, Yongqiang Yang wrote:
> I guess there are other places where delayed extents have to be handled
> by looking up the page cache.
>
> SEEK_HOLE and SEEK_DATA also need to look up the page cache to handle
> delayed extents.
>
> Hi Allison,
>
> If a delayed extents list were added to the inode, could the punch hole code
> be simpler?
>
>
> Yongqiang.

Hi there,

Well, I think we may be able to make it more efficient if we had the 
delayed extent list.

The earlier versions of punch hole were complex because of the different 
mechanisms needed to identify when extents were mapped, delayed or a 
hole.  Later we decided that this was too complex, and the pages that 
covered the hole need to be sync'd anyway, which eliminated the need to 
detect the delayed extents, but it is a wasteful operation if the 
extents in the hole were just unwritten.  If we had the delayed extent 
list, I think we may just be able to sync extents as needed instead of 
syncing the entire hole.

Allison Henderson
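
The idea of syncing only what is needed can be sketched as clipping the
delayed extents against the hole. This is a hedged userspace illustration:
struct blk_range and delayed_ranges_in_hole() are hypothetical names, and the
delayed extents are assumed to be available as a plain array.

```c
#include <stddef.h>
#include <stdint.h>

/* A run of logical blocks: [start, start + len). Hypothetical type. */
struct blk_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Clip each delayed extent against the hole [hole_start, hole_start +
 * hole_len) and store the overlapping pieces in @out (capacity @max).
 * Returns the number of pieces stored; in this model these are the only
 * ranges that would need a sync before the hole is punched.
 */
size_t delayed_ranges_in_hole(const struct blk_range *delayed, size_t n,
			      uint64_t hole_start, uint64_t hole_len,
			      struct blk_range *out, size_t max)
{
	uint64_t hole_end = hole_start + hole_len;
	size_t found = 0;

	for (size_t i = 0; i < n && found < max; i++) {
		uint64_t s = delayed[i].start;
		uint64_t e = s + delayed[i].len;

		if (s < hole_start)
			s = hole_start;
		if (e > hole_end)
			e = hole_end;
		if (s < e) {    /* a non-empty piece lies inside the hole */
			out[found].start = s;
			out[found].len = e - s;
			found++;
		}
	}
	return found;
}
```

If no delayed extent intersects the hole, the function returns 0 and nothing
would have to be synced in this model.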


* Re: Checks in ext4_ext_fiemap_cb() broken
  2011-07-26 16:30       ` Allison Henderson
@ 2011-07-26 16:44         ` Andreas Dilger
  0 siblings, 0 replies; 8+ messages in thread
From: Andreas Dilger @ 2011-07-26 16:44 UTC (permalink / raw)
  To: Yongqiang Yang; +Cc: Ted Ts'o, Ext4 Developers List

On Tue 26-07-11 09:20:28, Yongqiang Yang wrote:
> I have been wondering for a while whether we can handle fiemap in a
> much simpler way.  The current code is very ugly because of the page
> cache lookup, and I have an idea for simplifying it.  The reason we have
> to look up the page cache is that delayed extents are not in the extent
> tree.  I think we can add an in-memory list of delayed extents to the
> inode and delete entries from the list after we allocate blocks for
> them.  There is no limit on the length of an extent in the list, so one
> entry can cover as many blocks as are logically contiguous.
> 
> What's your opinion?

It may also be useful to have an extent list for submitting large contiguous writeouts to disk, instead of having to look them up.

The main question is whether the added overhead of maintaining the list is worthwhile if we can get an equivalent functionality using a tag lookup in the page cache.

Cheers, Andreas

* Re: Checks in ext4_ext_fiemap_cb() broken
  2011-07-26 12:48     ` Yongqiang Yang
  2011-07-26 16:30       ` Allison Henderson
@ 2011-07-26 17:07       ` Ted Ts'o
  1 sibling, 0 replies; 8+ messages in thread
From: Ted Ts'o @ 2011-07-26 17:07 UTC (permalink / raw)
  To: Yongqiang Yang; +Cc: Jan Kara, Allison Henderson, linux-ext4, Andreas Dilger

On Tue, Jul 26, 2011 at 08:48:21PM +0800, Yongqiang Yang wrote:
> I guess there are other places where delayed extents have to be handled
> by looking up the page cache.
> 
> SEEK_HOLE and SEEK_DATA also need to look up the page cache to handle
> delayed extents.

Another place where we're testing the page cache for delalloc
extents is in the bigalloc patches.  See ext4_find_delalloc_range in
Aditya's patch:

	 http://permalink.gmane.org/gmane.comp.file-systems.ext4/26619

							- Ted



* Re: Checks in ext4_ext_fiemap_cb() broken
  2011-07-26 12:12   ` Jan Kara
  2011-07-26 12:48     ` Yongqiang Yang
@ 2011-07-26 18:48     ` Aditya Kali
  1 sibling, 0 replies; 8+ messages in thread
From: Aditya Kali @ 2011-07-26 18:48 UTC (permalink / raw)
  To: Jan Kara; +Cc: Yongqiang Yang, Ted Ts'o, linux-ext4, Andreas Dilger

On Tue, Jul 26, 2011 at 5:12 AM, Jan Kara <jack@suse.cz> wrote:
>  Yes, that should be doable and shouldn't add too much overhead. It's just
> stupid that we'd do all this only for the fiemap call, which is relatively
> rare.
>
Delayed extents lookup will also help resolve another race that we
currently have in bigalloc code path. Here, we need to figure out if a
cluster is already under delayed allocation or not (to determine
whether we need to reserve quota for this cluster). But, determining
this races against the writeback of delayed allocated pages.
ext4_find_delalloc_range() function has a comment about this race. If
there is a delayed extents list and the extents are removed from this
list when they are actually mapped, then ext4_find_delalloc_range()
can simply check against this list.
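
A rough userspace sketch of such a check follows. It is not the bigalloc
code: struct delalloc_range and cluster_has_delalloc() are hypothetical
names, the delayed extents are assumed to be available as a plain array, and
a cluster is assumed to be 2^cluster_bits blocks aligned on that size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A run of delayed logical blocks: [start, start + len). Hypothetical. */
struct delalloc_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Return true if any block of the cluster containing @lblk is already
 * under delayed allocation, i.e. the cluster has been reserved before.
 */
bool cluster_has_delalloc(const struct delalloc_range *delayed, size_t n,
			  uint64_t lblk, unsigned int cluster_bits)
{
	uint64_t c_size = UINT64_C(1) << cluster_bits;
	uint64_t c_start = lblk & ~(c_size - 1);  /* round down to cluster */
	uint64_t c_end = c_start + c_size;

	for (size_t i = 0; i < n; i++) {
		uint64_t s = delayed[i].start;
		uint64_t e = s + delayed[i].len;

		if (s < c_end && c_start < e)     /* half-open ranges overlap */
			return true;
	}
	return false;
}
```

Because the answer comes from the list alone, it does not depend on the
state of pages that writeback may be processing at the same time.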


Thread overview: 8+ messages
2011-07-25 15:58 Checks in ext4_ext_fiemap_cb() broken Jan Kara
2011-07-26  1:20 ` Yongqiang Yang
2011-07-26 12:12   ` Jan Kara
2011-07-26 12:48     ` Yongqiang Yang
2011-07-26 16:30       ` Allison Henderson
2011-07-26 16:44         ` Andreas Dilger
2011-07-26 17:07       ` Ted Ts'o
2011-07-26 18:48     ` Aditya Kali
