linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 0/5] [RFC] Fix page_mkwrite for blocksize < pagesize
@ 2009-08-10 22:20 Jan Kara
From: Jan Kara @ 2009-08-10 22:20 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-mm, npiggin


  Hi,

  below is a patch series with my new approach to solving the problems with
page_mkwrite() when blocksize < pagesize. To refresh your memory, the main issue
is as follows:

We'd like to use page_mkwrite() to allocate blocks under a page that is
becoming writeably mapped in some process's address space. This allows a
filesystem to fail the page fault when there is not enough space, the user
exceeds quota, or a similar problem occurs, rather than silently discarding
data later when writepage() is called.

On filesystems where blocksize < pagesize, however, the situation is more
complicated. Consider, for example, blocksize = 1024, pagesize = 4096 and a
process that does:
  ftruncate(fd, 0);
  pwrite(fd, buf, 1024, 0);
  map = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED, fd, 0);
  map[0] = 'a';          /* page_mkwrite() is called for index 0 */
  ftruncate(fd, 10000);  /* or even pwrite(fd, buf, 1, 10000) */
  fsync(fd);             /* writepage() is called for index 0 */
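The sequence above can be replayed from userspace as a small program. This is a sketch under a few assumptions: the scratch file lives in /tmp, the mapping uses PROT_READ | PROT_WRITE (the mail uses PROT_WRITE alone), and replay_scenario() is a hypothetical helper name. It only exercises the call sequence; whether blocks for the rest of page 0 were allocated is not observable from userspace here.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Replays ftruncate/pwrite/mmap store/extending ftruncate/fsync on a scratch
 * file. Returns the final file size, or -1 on error; byte 0 as read back
 * from the file is stored in *first_byte. */
static long long replay_scenario(char *first_byte)
{
    char path[] = "/tmp/mkwrite-demo-XXXXXX";
    char buf[1024];
    char *map = MAP_FAILED;
    struct stat st;
    long long result = -1;

    assert(first_byte != NULL);
    int fd = mkstemp(path);      /* opened O_RDWR, as a shared writable mapping needs */
    if (fd < 0)
        return -1;
    unlink(path);                /* the file lives on until fd is closed */

    memset(buf, 'x', sizeof(buf));
    if (ftruncate(fd, 0) != 0 ||
        pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
        goto out;

    map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;
    map[0] = 'a';                /* first store: page_mkwrite() for index 0 */

    if (ftruncate(fd, 10000) != 0 || fsync(fd) != 0)  /* fsync: writepage() */
        goto out;

    if (fstat(fd, &st) == 0 && pread(fd, first_byte, 1, 0) == 1)
        result = (long long)st.st_size;
out:
    if (map != MAP_FAILED)
        munmap(map, 4096);
    close(fd);
    return result;
}
```

On a working system the call should report a size of 10000 with byte 0 equal to 'a'; the interesting part happens inside the filesystem, which at fsync() time sees a page whose i_size has grown since page_mkwrite() ran.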

At the time page_mkwrite() is called, the filesystem can allocate only one
block for the page because i_size == 1024; allocating more would create blocks
beyond i_size, which is generally undesirable. But later, at writepage() time,
we would like to have blocks allocated for the whole page (and in principle we
have to allocate them, because after the second ftruncate() the user could have
filled the rest of the page with data without triggering another page_mkwrite()
call, since the page is already writeably mapped).
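The arithmetic behind the mismatch is easy to spell out. The helper below (a hypothetical name, using only the sizes from the example) counts how many filesystem blocks of a given page lie at least partly inside i_size, which is what a filesystem can reasonably allocate for that page at a given moment.

```c
#include <assert.h>

/* Number of fs blocks of page `index` that lie at least partly inside
 * i_size; blocks wholly beyond i_size are left unallocated. */
static unsigned long long blocks_inside_isize(unsigned long long index,
                                              unsigned long long isize,
                                              unsigned long long blocksize,
                                              unsigned long long pagesize)
{
    assert(blocksize > 0 && pagesize % blocksize == 0);
    unsigned long long start = index * pagesize;
    if (isize <= start)
        return 0;                          /* page entirely beyond EOF */
    unsigned long long end = isize < start + pagesize ? isize : start + pagesize;
    return (end - start + blocksize - 1) / blocksize;  /* round partial block up */
}
```

For page 0 this gives 1 block while i_size is 1024 (page_mkwrite() time) but 4 blocks once i_size is 10000 (writepage() time); the three extra blocks are exactly the ones nobody allocated.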

This series is an attempt to fix the above issue. The idea is that, after an
extending write or truncate, we update i_size not under the lock of the page
where the new i_size ends up, but under the lock of the page where i_size
originally was. This also allows us to solve a POSIX compliance issue where we
could expose data written via mmap beyond i_size.
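Which page is "the page where i_size was originally", and when does it matter? The sketch below answers that under an assumption not stated in this mail: only the page straddling the old EOF can have blocks missing after an extension, and only if the old EOF does not end on a page boundary once rounded up to a block. Later mainline kernels contain a helper, pagecache_isize_extended(), that makes the same early-return checks before write-protecting that page so the next store faults into page_mkwrite() again; the function name here is hypothetical and this is not the code from this series.

```c
#include <assert.h>

/* Index of the page straddling the old EOF that needs attention when i_size
 * grows from `from` to `to`, or -1 if no page can have new hole blocks. */
static long long old_eof_page_to_protect(long long from, long long to,
                                         long long bsize, long long psize)
{
    assert(bsize > 0 && psize % bsize == 0);
    if (from >= to || bsize == psize)
        return -1;              /* no growth, or one block per page */
    long long rounded_from = (from + bsize - 1) / bsize * bsize;
    if (to <= rounded_from || rounded_from % psize == 0)
        return -1;              /* growth stays in the last block, or old EOF page is fully allocated */
    return from / psize;
}
```

In the example from the mail (i_size 1024 to 10000, 1 KiB blocks, 4 KiB pages) the function picks page 0; an old i_size of 4096 picks nothing, because page 0 was already fully backed.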

I see two debatable points with this approach:
1) set_page_dirty_buffers() and create_empty_buffers() now check i_size.
That's a bit ugly, although not marking buffers dirty beyond i_size makes a lot
of sense to me.

2) To fix the problem of non-zero data written via mmap beyond EOF and later
being exposed by an extending truncate, I've added zeroing to the function that
does all the work when extending i_size (which is essentially the only place
where we can do it reliably and avoid races with mmap). That's a good fit, but
basically all filesystems now have to extend i_size through this function,
which I don't find very pleasing. Does anybody have an idea how to avoid
converting every filesystem, or how to make the conversion less painful?
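Both debatable points can be illustrated on a single toy page. Everything here is hypothetical (struct toy_page, toy_set_page_dirty(), toy_extend_isize(), and the fixed 4 KiB page with 1 KiB buffers taken from the example); it is not the buffer_head code from the series. The first function dirties only buffers that start below i_size; the second zeroes the post-EOF tail of the page holding the old EOF before publishing a larger i_size, so bytes stored there through a mapping are never exposed.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE  4096u    /* assumption: sizes from the example above */
#define TOY_BLOCK_SIZE 1024u
#define TOY_NR_BUFS    (TOY_PAGE_SIZE / TOY_BLOCK_SIZE)

/* Hypothetical page: its data plus one dirty bit per block-sized buffer. */
struct toy_page {
    unsigned long long start;           /* file offset of the page's first byte */
    unsigned char data[TOY_PAGE_SIZE];
    unsigned dirty;                     /* bit i set = buffer i is dirty */
};

/* Point 1: mark dirty only the buffers that start below i_size. */
static void toy_set_page_dirty(struct toy_page *pg, unsigned long long isize)
{
    for (unsigned i = 0; i < TOY_NR_BUFS; i++)
        if (pg->start + (unsigned long long)i * TOY_BLOCK_SIZE < isize)
            pg->dirty |= 1u << i;
}

/* Point 2: before a larger i_size becomes visible, zero everything past the
 * old EOF in the page that contains it. */
static void toy_extend_isize(struct toy_page *pg, unsigned long long *isize,
                             unsigned long long new_size)
{
    assert(pg != NULL && isize != NULL);
    unsigned long long old = *isize;

    if (new_size > old && old >= pg->start && old < pg->start + TOY_PAGE_SIZE) {
        unsigned long long off = old - pg->start;
        memset(pg->data + off, 0, TOY_PAGE_SIZE - off);
    }
    *isize = new_size;
}
```

Starting from a page filled with 'x' and i_size 1024, dirtying marks only buffer 0; extending to 10000 zeroes bytes 1024..4095 while keeping byte 1023, after which dirtying covers all four buffers.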

Thanks in advance for comments.
									Honza



Thread overview: 6+ messages
2009-08-10 22:20 [PATCH 0/5] [RFC] Fix page_mkwrite for blocksize < pagesize Jan Kara
2009-08-10 22:20 ` [PATCH 1/5] fs: buffer_head writepage no invalidate Jan Kara
2009-08-10 22:20 ` [PATCH 2/5] fs: Don't zero out page on writepage() Jan Kara
2009-08-10 22:20 ` [PATCH 3/5] vfs: Create dirty buffer only inside i_size Jan Kara
2009-08-10 22:20 ` [PATCH 4/5] fs: Move i_size update in write_end() from under page lock Jan Kara
2009-08-10 22:20 ` [PATCH 5/5] vfs: Add better VFS support for page_mkwrite when blocksize < pagesize Jan Kara
