* Re: Two questions on VFS/mm
[not found] <20080604163412.GL16572@duck.suse.cz>
@ 2008-06-04 17:10 ` Miklos Szeredi
2008-06-05 8:12 ` Jan Kara
From: Miklos Szeredi @ 2008-06-04 17:10 UTC
To: jack; +Cc: linux-kernel, linux-ext4, linux-mm, linux-fsdevel, akpm
(Added some CCs)
> could some kind soul knowledgeable in VFS/mm help me with the following
> two questions? I've spotted them when testing some patches for ext4...
> 1) In write_cache_pages() we do:
>     ...
>     lock_page(page);
>     ...
>     if (!wbc->range_cyclic && page->index > end) {
>         done = 1;
>         unlock_page(page);
>         continue;
>     }
>     ...
>     ret = (*writepage)(page, wbc, data);
>
> Now the problem is that if range_cyclic is set, it can happen that the
> page we give to the filesystem is beyond the current end of file (and can
> be already processed by invalidatepage()). Is the filesystem supposed to
> handle this (what would it be good for to give such a page to the fs?) or
> is it just a bug in write_cache_pages()?
There may be a bug somewhere, but write_cache_pages() looks correct.
It locks the page then checks for page->mapping to make sure the page
wasn't truncated. And truncation (including invalidatepage()) happens
with the page locked, so that can't race with page writeback.
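
To make the ordering concrete, here is a rough single-threaded userspace
model of the check described above. It is only a sketch: the model_* names,
the boolean standing in for the page lock, and the dirty flag are all
invented for illustration and are not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct address_space and struct page. */
struct model_mapping { int unused; };

struct model_page {
	bool locked;                    /* models the page lock */
	struct model_mapping *mapping;  /* NULL once the page is truncated */
	bool dirty;
};

static void model_lock_page(struct model_page *page)
{
	assert(!page->locked);  /* single-threaded model: never contended */
	page->locked = true;
}

static void model_unlock_page(struct model_page *page)
{
	assert(page->locked);
	page->locked = false;
}

/* Truncation detaches the page from its mapping while holding the lock. */
static void model_truncate_page(struct model_page *page)
{
	model_lock_page(page);
	page->mapping = NULL;
	page->dirty = false;
	model_unlock_page(page);
}

/*
 * Writeback takes the lock first and only then looks at ->mapping, so a
 * page that was truncated earlier is skipped instead of being written.
 * Returns true if the page would have been handed to ->writepage().
 */
static bool model_writeback_page(struct model_mapping *mapping,
				 struct model_page *page)
{
	bool written = false;

	model_lock_page(page);
	if (page->mapping == mapping && page->dirty) {
		page->dirty = false;  /* stands in for calling ->writepage() */
		written = true;
	}
	model_unlock_page(page);
	return written;
}
```

In this model a page that went through model_truncate_page() is never
reported as written, which mirrors the argument that a truncated page cannot
reach ->writepage() through this path.
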
However the do_invalidatepage() in block_write_full_page() looks
suspicious. It calls invalidatepage(), but doesn't perform all the
other things needed for truncation. Maybe there's a valid reason for
that, but I really don't have any idea what.
Miklos
>
> 2) I have the following problem with page_mkwrite() when blocksize <
> pagesize. What we want to do is to fill in a potential hole under a page
> somebody wants to write to. But consider following scenario with a
> filesystem with 1k blocksize:
>   truncate("file", 1024);
>   ptr = mmap("file");
>   *ptr = 'a';
>     -> page_mkwrite() is called,
>        but "file" is only 1k large and we cannot really allocate blocks
>        beyond end of file. So we allocate just one 1k block.
>   truncate("file", 4096);
>   *(ptr + 2048) = 'a';
>     -> nothing is called, and later at writepage() time we are surprised
>        to find a dirty page which is not backed by a filesystem block.
>
> How to solve this? One idea I have here is that when we handle truncate(),
> we mark the original last page (if it is partial) as read-only again so
> that page_mkwrite() is called on the next write to it. Is something like
> this possible? Pointers to code doing something similar are welcome, I don't
> really know these things ;).
>
> Thanks
> Honza
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
* Re: Two questions on VFS/mm
2008-06-04 17:10 ` Two questions on VFS/mm Miklos Szeredi
@ 2008-06-05 8:12 ` Jan Kara
From: Jan Kara @ 2008-06-05 8:12 UTC
To: Miklos Szeredi; +Cc: linux-kernel, linux-ext4, linux-mm, linux-fsdevel, akpm
On Wed 04-06-08 19:10:42, Miklos Szeredi wrote:
> (Added some CCs)
>
> > could some kind soul knowledgeable in VFS/mm help me with the following
> > two questions? I've spotted them when testing some patches for ext4...
> > 1) In write_cache_pages() we do:
> >     ...
> >     lock_page(page);
> >     ...
> >     if (!wbc->range_cyclic && page->index > end) {
> >         done = 1;
> >         unlock_page(page);
> >         continue;
> >     }
> >     ...
> >     ret = (*writepage)(page, wbc, data);
> >
> > Now the problem is that if range_cyclic is set, it can happen that the
> > page we give to the filesystem is beyond the current end of file (and can
> > be already processed by invalidatepage()). Is the filesystem supposed to
> > handle this (what would it be good for to give such a page to the fs?) or
> > is it just a bug in write_cache_pages()?
>
> There may be a bug somewhere, but write_cache_pages() looks correct.
> It locks the page then checks for page->mapping to make sure the page
> wasn't truncated. And truncation (including invalidatepage()) happens
> with the page locked, so that can't race with page writeback.
You are right, write_cache_pages() is correct - I had misunderstood
what 'end' means.
> However the do_invalidatepage() in block_write_full_page() looks
> suspicious. It calls invalidatepage(), but doesn't perform all the
> other things needed for truncation. Maybe there's a valid reason for
> that, but I really don't have any idea what.
Hmm, the fact is that in my tests I've seen writepage() called on a page
which had its buffers removed. Because we attach buffers to a page in
page_mkwrite() and in write_begin(), I think we should never see such a page.
I've added more debug prints to the code to verify that the page has
indeed been truncated, but so far I have not managed to reproduce the problem.
> > 2) I have the following problem with page_mkwrite() when blocksize <
> > pagesize. What we want to do is to fill in a potential hole under a page
> > somebody wants to write to. But consider following scenario with a
> > filesystem with 1k blocksize:
> >   truncate("file", 1024);
> >   ptr = mmap("file");
> >   *ptr = 'a';
> >     -> page_mkwrite() is called,
> >        but "file" is only 1k large and we cannot really allocate blocks
> >        beyond end of file. So we allocate just one 1k block.
> >   truncate("file", 4096);
> >   *(ptr + 2048) = 'a';
> >     -> nothing is called, and later at writepage() time we are surprised
> >        to find a dirty page which is not backed by a filesystem block.
> >
> > How to solve this? One idea I have here is that when we handle truncate(),
> > we mark the original last page (if it is partial) as read-only again so
> > that page_mkwrite() is called on the next write to it. Is something like
> > this possible? Pointers to code doing something similar are welcome, I don't
> > really know these things ;).
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
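
For anyone who wants to try the blocksize < pagesize scenario from question
2, a userspace sketch might look like the following. Assumptions: the
function name is made up, the file is expected to live on the filesystem
under test (1k blocks in the example above), and the mapping is MAP_SHARED
with a length of 4096 bytes, neither of which the original pseudocode spells
out. It only replays the sequence of calls; whether writepage() later trips
over a block that was never allocated depends on the kernel and filesystem.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Replays the sequence; returns 0 on success, -1 if any call fails. */
static int replay_mkwrite_scenario(const char *path)
{
	char *ptr;
	int ret = -1;
	int fd = open(path, O_RDWR | O_CREAT, 0600);

	if (fd < 0)
		return -1;

	/* File is 1k long, so only the first 1k block can be allocated. */
	if (ftruncate(fd, 1024) != 0)
		goto out_close;

	ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (ptr == MAP_FAILED)
		goto out_close;

	/* First write fault on the page: page_mkwrite() would run here. */
	ptr[0] = 'a';

	/* Extend the file so the rest of the page is now inside i_size. */
	if (ftruncate(fd, 4096) != 0)
		goto out_unmap;

	/* Page is already writable, so no new fault is expected here. */
	ptr[2048] = 'a';

	/* Force writeback of the dirty page. */
	if (msync(ptr, 4096, MS_SYNC) == 0)
		ret = 0;

out_unmap:
	munmap(ptr, 4096);
out_close:
	close(fd);
	return ret;
}
```

Calling replay_mkwrite_scenario() with a path on the filesystem being tested
and then reading the file back at offset 2048 shows whether the second store
made it to disk.
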