* Checks in ext4_ext_fiemap_cb() broken
From: Jan Kara @ 2011-07-25 15:58 UTC
To: Yongqiang Yang; +Cc: linux-ext4, Andreas Dilger, tytso

Hello,

I just had a look at the code checking delayed allocated buffers in ext4_ext_fiemap_cb(). I believe the checks there could use some elimination of common patterns, but that's just a minor thing. The main problem is that the code can easily crash the kernel when it races with page reclaim. You just cannot access most of the page contents (and for buffers this is especially true) without locking the page. Getting a reference via find_get_pages_tag() guarantees that the structure cannot go away, but mm is still free to detach the page from the mapping at any moment. So you must always lock a page and check that it still belongs to the desired mapping before you check 'page_has_buffers()'.

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
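The ordering rule in the message above can be illustrated with a small user-space model. This is a sketch, not kernel code: a pthread mutex stands in for the page lock, and the hypothetical model_page and model_mapping types stand in for struct page and the inode's mapping. In the kernel, the corresponding sequence would be lock_page(), comparing page->mapping with inode->i_mapping, checking page_has_buffers(), and then unlock_page().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins, NOT the kernel's struct page or
 * struct address_space: just enough state to show the ordering rule. */
struct model_mapping {
	int id;
};

struct model_page {
	pthread_mutex_t lock;          /* plays the role of lock_page()/unlock_page() */
	struct model_mapping *mapping; /* cleared by "reclaim" under the lock */
	void *buffers;                 /* plays the role of the buffer heads */
};

/* "Reclaim": detaches the page from its mapping with the page lock held.
 * A caller that merely holds a reference cannot prevent this. */
static void model_reclaim(struct model_page *page)
{
	pthread_mutex_lock(&page->lock);
	page->buffers = NULL;
	page->mapping = NULL;
	pthread_mutex_unlock(&page->lock);
}

/* fiemap-style check: take the lock first, re-check the mapping, and only
 * then look at the buffers. Returns false for pages that were detached. */
static bool model_page_has_buffers_in(struct model_page *page,
				      const struct model_mapping *expected)
{
	bool ret;

	assert(page != NULL && expected != NULL);
	pthread_mutex_lock(&page->lock);
	ret = page->mapping == expected && page->buffers != NULL;
	pthread_mutex_unlock(&page->lock);
	return ret;
}
```

The point of the model is that the mapping check and the buffer check happen inside the same critical section as reclaim's detach, so the caller never sees a half-detached page.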
* Re: Checks in ext4_ext_fiemap_cb() broken
From: Yongqiang Yang @ 2011-07-26 1:20 UTC
To: Jan Kara, Ted Ts'o; +Cc: linux-ext4, Andreas Dilger

Hi Jan,

For a while I have been thinking about whether we can handle fiemap in a much simpler way. The current code is very ugly because of the page cache lookup, and I have an idea for simplifying it. The reason we have to look up the page cache is that delayed extents are not in the extent tree. I think we can add an in-memory list of delayed extents to the inode and delete entries from the list once blocks have been allocated for them. There is no limit on the length of the extents in the list, so a single entry can contain as many blocks as are logically contiguous.

What's your opinion?

Yongqiang.

--
Best Wishes
Yongqiang Yang
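Yongqiang's proposal can be sketched as a sorted array of (logical block, length) pairs. The array merges logically contiguous blocks as they become delayed, and it shrinks or splits entries as real blocks are allocated. This is only an illustration under our own assumptions. The names dl_add_block(), dl_remove_range(), and dl_contains() and the fixed MAX_DELAYED capacity are hypothetical. An in-kernel version would more likely use a list or tree protected by a lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_DELAYED 64 /* hypothetical fixed capacity, keeps the sketch small */

struct delayed_extent {
	uint32_t lblk; /* first logical block */
	uint32_t len;  /* logically contiguous delayed blocks, no upper limit */
};

/* Per-inode list: sorted by lblk, entries never overlap. */
struct delayed_list {
	struct delayed_extent ext[MAX_DELAYED];
	int nr;
};

static void dl_delete_at(struct delayed_list *dl, int i)
{
	memmove(&dl->ext[i], &dl->ext[i + 1], (size_t)(dl->nr - i - 1) * sizeof(dl->ext[0]));
	dl->nr--;
}

/* Record that @lblk is now under delayed allocation, merging it with
 * logically adjacent entries. Returns false only if the array is full. */
static bool dl_add_block(struct delayed_list *dl, uint32_t lblk)
{
	int i = 0;

	while (i < dl->nr && dl->ext[i].lblk <= lblk)
		i++; /* ext[i - 1] starts at or before lblk, ext[i] after it */
	struct delayed_extent *prev = i > 0 ? &dl->ext[i - 1] : NULL;
	struct delayed_extent *next = i < dl->nr ? &dl->ext[i] : NULL;

	if (prev && lblk < prev->lblk + prev->len)
		return true; /* already tracked */
	if (prev && prev->lblk + prev->len == lblk) {
		prev->len++;
		if (next && next->lblk == lblk + 1) { /* lblk bridges two entries */
			prev->len += next->len;
			dl_delete_at(dl, i);
		}
		return true;
	}
	if (next && next->lblk == lblk + 1) {
		next->lblk = lblk;
		next->len++;
		return true;
	}
	if (dl->nr == MAX_DELAYED)
		return false;
	memmove(&dl->ext[i + 1], &dl->ext[i], (size_t)(dl->nr - i) * sizeof(dl->ext[0]));
	dl->ext[i] = (struct delayed_extent){ .lblk = lblk, .len = 1 };
	dl->nr++;
	return true;
}

/* Forget [lblk, lblk + len) once real blocks have been allocated for it. */
static bool dl_remove_range(struct delayed_list *dl, uint32_t lblk, uint32_t len)
{
	uint32_t end = lblk + len;

	assert(len > 0);
	for (int i = 0; i < dl->nr; i++) {
		struct delayed_extent *e = &dl->ext[i];
		uint32_t e_end = e->lblk + e->len;

		if (e_end <= lblk || e->lblk >= end)
			continue; /* no overlap */
		if (e->lblk < lblk && e_end > end) { /* range inside one entry: split */
			if (dl->nr == MAX_DELAYED)
				return false;
			memmove(&dl->ext[i + 2], &dl->ext[i + 1], (size_t)(dl->nr - i - 1) * sizeof(dl->ext[0]));
			dl->ext[i + 1] = (struct delayed_extent){ .lblk = end, .len = e_end - end };
			e->len = lblk - e->lblk;
			dl->nr++;
			return true;
		}
		if (e->lblk < lblk) {
			e->len = lblk - e->lblk; /* trim the tail */
		} else if (e_end > end) {
			e->len = e_end - end;    /* trim the head */
			e->lblk = end;
		} else {
			dl_delete_at(dl, i--);   /* fully covered: drop the entry */
		}
	}
	return true;
}

/* What fiemap would consult instead of probing the page cache. */
static bool dl_contains(const struct delayed_list *dl, uint32_t lblk)
{
	for (int i = 0; i < dl->nr; i++)
		if (lblk >= dl->ext[i].lblk && lblk - dl->ext[i].lblk < dl->ext[i].len)
			return true;
	return false;
}
```

Adding blocks 10 through 19 one at a time yields a single entry {10, 10}. Allocating blocks 12 through 14 then splits that entry into {10, 2} and {15, 5}.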
* Re: Checks in ext4_ext_fiemap_cb() broken
From: Jan Kara @ 2011-07-26 12:12 UTC
To: Yongqiang Yang; +Cc: Jan Kara, Ted Ts'o, linux-ext4, Andreas Dilger

Hi Yongqiang,

On Tue 26-07-11 09:20:28, Yongqiang Yang wrote:
> For a while I have been thinking about whether we can handle fiemap in
> a much simpler way. The current code is very ugly because of the page
> cache lookup, and I have an idea for simplifying it. The reason we have
> to look up the page cache is that delayed extents are not in the extent
> tree. I think we can add an in-memory list of delayed extents to the
> inode and delete entries from the list once blocks have been allocated
> for them. There is no limit on the length of the extents in the list,
> so a single entry can contain as many blocks as are logically
> contiguous.
>
> What's your opinion?
Yes, that should be doable and shouldn't have too big an overhead. It's just stupid that we'll do all this stuff only for the fiemap call, which is relatively rare.

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: Checks in ext4_ext_fiemap_cb() broken
From: Yongqiang Yang @ 2011-07-26 12:48 UTC
To: Jan Kara, Allison Henderson; +Cc: Ted Ts'o, linux-ext4, Andreas Dilger

On Tue, Jul 26, 2011 at 8:12 PM, Jan Kara <jack@suse.cz> wrote:
> Yes, that should be doable and shouldn't have too big an overhead. It's
> just stupid that we'll do all this stuff only for the fiemap call, which
> is relatively rare.

I guess there are other places where delayed extents have to be handled by looking up the page cache.

SEEK_HOLE and SEEK_DATA also need to look up the page cache to handle delayed extents.

Hi Allison,

If a delayed extents list were added to the inode, could the punch hole code be simpler?

Yongqiang.

--
Best Wishes
Yongqiang Yang
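The following block-granularity sketch illustrates the SEEK_HOLE/SEEK_DATA point. It shows a SEEK_DATA-style lookup that checks the delayed extent list alongside the mapped extents, so no page cache probing is needed. The names struct blk_range, seek_data_block(), and NO_DATA are hypothetical. The sketch treats every mapped extent as data and ignores unwritten extents, which a real implementation would also have to handle.

```c
#include <assert.h>
#include <stdint.h>

#define NO_DATA UINT32_MAX /* hypothetical "no more data" marker */

/* A run of logical blocks; arrays of these are sorted and non-overlapping. */
struct blk_range {
	uint32_t lblk;
	uint32_t len;
};

/* First block >= @start covered by one of the ranges, or NO_DATA. */
static uint32_t first_covered(const struct blk_range *r, int nr, uint32_t start)
{
	for (int i = 0; i < nr; i++)
		if (start < r[i].lblk + r[i].len)
			return start > r[i].lblk ? start : r[i].lblk;
	return NO_DATA;
}

/* SEEK_DATA at block granularity: data is either already mapped in the
 * extent tree or still waiting in the delayed list. */
static uint32_t seek_data_block(const struct blk_range *mapped, int nr_mapped,
				const struct blk_range *delayed, int nr_delayed,
				uint32_t start)
{
	assert(nr_mapped >= 0 && nr_delayed >= 0);
	uint32_t a = first_covered(mapped, nr_mapped, start);
	uint32_t b = first_covered(delayed, nr_delayed, start);

	return a < b ? a : b;
}
```

Consider a file with blocks 0 through 3 mapped and blocks 10 and 11 delayed. Seeking from block 5 returns 10. If the delayed list were ignored, the same seek would wrongly report that no data follows.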
* Re: Checks in ext4_ext_fiemap_cb() broken
From: Allison Henderson @ 2011-07-26 16:30 UTC
To: Yongqiang Yang; +Cc: Jan Kara, Ted Ts'o, linux-ext4, Andreas Dilger

On 07/26/2011 05:48 AM, Yongqiang Yang wrote:
> Hi Allison,
>
> If a delayed extents list were added to the inode, could the punch hole
> code be simpler?

Hi there,

Well, I think we may be able to make it more efficient if we had the delayed extent list. The earlier versions of punch hole were complex because of the different mechanisms needed to identify whether extents were mapped, delayed, or a hole. Later we decided that this was too complex; the pages covering the hole need to be synced anyway, which eliminated the need to detect the delayed extents. But syncing is a wasteful operation if the extents in the hole are just unwritten. If we had the delayed extent list, I think we might be able to sync extents only as needed instead of syncing the entire hole.

Allison Henderson
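Allison's idea of syncing only what is needed can be sketched as a clipping step. The punched range is intersected with the delayed extent list, and only the overlapping pieces are returned for writeback. Unwritten or already-mapped parts of the hole are left alone. The names delayed_parts_of_hole() and struct blk_range are hypothetical. The caller would still be responsible for starting, and waiting on, writeback for the returned ranges.

```c
#include <assert.h>
#include <stdint.h>

/* A run of logical blocks; the delayed list is sorted and non-overlapping. */
struct blk_range {
	uint32_t lblk;
	uint32_t len;
};

/* Collect the parts of the hole [start, start + len) that are still under
 * delayed allocation; only these would need writeback before punching.
 * Returns the number of ranges stored in @out (at most @max_out). */
static int delayed_parts_of_hole(const struct blk_range *delayed, int nr,
				 uint32_t start, uint32_t len,
				 struct blk_range *out, int max_out)
{
	uint32_t end = start + len;
	int n = 0;

	assert(max_out >= 0);
	for (int i = 0; i < nr && n < max_out; i++) {
		uint32_t s = delayed[i].lblk;
		uint32_t e = delayed[i].lblk + delayed[i].len;

		if (e <= start || s >= end)
			continue; /* this delayed run lies outside the hole */
		if (s < start)
			s = start; /* clip to the hole */
		if (e > end)
			e = end;
		out[n++] = (struct blk_range){ .lblk = s, .len = e - s };
	}
	return n;
}
```

Punching blocks 3 through 9 against delayed runs {2, 3}, {8, 4}, and {20, 1} would flush only blocks 3 and 4 and blocks 8 and 9, rather than the whole hole.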
* Re: Checks in ext4_ext_fiemap_cb() broken
From: Andreas Dilger @ 2011-07-26 16:44 UTC
To: Yongqiang Yang; +Cc: Ted Ts'o, Ext4 Developers List

On Tue 26-07-11 09:20:28, Yongqiang Yang wrote:
> For a while I have been thinking about whether we can handle fiemap in
> a much simpler way. The current code is very ugly because of the page
> cache lookup, and I have an idea for simplifying it. The reason we have
> to look up the page cache is that delayed extents are not in the extent
> tree. I think we can add an in-memory list of delayed extents to the
> inode and delete entries from the list once blocks have been allocated
> for them. There is no limit on the length of the extents in the list,
> so a single entry can contain as many blocks as are logically
> contiguous.
>
> What's your opinion?

It may also be useful to have an extent list for submitting large contiguous writeouts to disk, instead of having to look them up. The main question is whether the added overhead of maintaining the list is worthwhile if we can get equivalent functionality using a tag lookup in the page cache.

Cheers, Andreas
* Re: Checks in ext4_ext_fiemap_cb() broken
From: Ted Ts'o @ 2011-07-26 17:07 UTC
To: Yongqiang Yang; +Cc: Jan Kara, Allison Henderson, linux-ext4, Andreas Dilger

On Tue, Jul 26, 2011 at 08:48:21PM +0800, Yongqiang Yang wrote:
> I guess there are other places where delayed extents have to be handled
> by looking up the page cache.
>
> SEEK_HOLE and SEEK_DATA also need to look up the page cache to handle
> delayed extents.

Another place where we test the page cache for delalloc extents is in the bigalloc patches. See ext4_find_delalloc_range() in Aditya's patch:

http://permalink.gmane.org/gmane.comp.file-systems.ext4/26619

- Ted
* Re: Checks in ext4_ext_fiemap_cb() broken
From: Aditya Kali @ 2011-07-26 18:48 UTC
To: Jan Kara; +Cc: Yongqiang Yang, Ted Ts'o, linux-ext4, Andreas Dilger

On Tue, Jul 26, 2011 at 5:12 AM, Jan Kara <jack@suse.cz> wrote:
> Yes, that should be doable and shouldn't have too big an overhead. It's
> just stupid that we'll do all this stuff only for the fiemap call, which
> is relatively rare.

Delayed extent lookup will also help resolve another race that we currently have in the bigalloc code path. There, we need to figure out whether a cluster is already under delayed allocation (to determine whether we need to reserve quota for this cluster). But this determination races against the writeback of delayed-allocated pages; the ext4_find_delalloc_range() function has a comment about this race. If there is a delayed extents list, and extents are removed from this list when they are actually mapped, then ext4_find_delalloc_range() can simply check against this list.
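With such a list, Aditya's bigalloc check reduces to an interval-overlap test. The block is rounded down to the start of its cluster, and the code asks whether any delayed run intersects that cluster. This is a sketch with hypothetical names (cluster_has_delayed(), cluster_bits), not the actual ext4_find_delalloc_range(). It also does not address the race itself. For the answer to be stable, the list would have to be updated consistently with block allocation, for example by removing entries only once the blocks are mapped, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A run of delayed logical blocks (hypothetical list entry). */
struct blk_range {
	uint32_t lblk;
	uint32_t len;
};

/* Does the cluster containing @lblk already hold any delayed block?
 * @cluster_bits is log2(blocks per cluster). If it does, the cluster has
 * already been accounted for and no new quota reservation is needed. */
static bool cluster_has_delayed(const struct blk_range *delayed, int nr,
				uint32_t lblk, unsigned int cluster_bits)
{
	assert(cluster_bits < 32);
	uint32_t c_start = lblk & ~((UINT32_C(1) << cluster_bits) - 1);
	uint64_t c_end = (uint64_t)c_start + (UINT64_C(1) << cluster_bits); /* exclusive */

	for (int i = 0; i < nr; i++)
		if (delayed[i].lblk < c_end &&
		    (uint64_t)delayed[i].lblk + delayed[i].len > c_start)
			return true; /* the delayed run overlaps this cluster */
	return false;
}
```

With 16-block clusters (cluster_bits = 4) and block 18 delayed, a write to block 30 falls into the same cluster (blocks 16 through 31) and needs no new reservation. A write to block 5 or block 32 does need one.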
Thread overview: 8 messages
2011-07-25 15:58 Checks in ext4_ext_fiemap_cb() broken Jan Kara
2011-07-26  1:20 ` Yongqiang Yang
2011-07-26 12:12   ` Jan Kara
2011-07-26 12:48     ` Yongqiang Yang
2011-07-26 16:30       ` Allison Henderson
2011-07-26 16:44         ` Andreas Dilger
2011-07-26 17:07       ` Ted Ts'o
2011-07-26 18:48     ` Aditya Kali