linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: linux-fsdevel@vger.kernel.org
Cc: LKML <linux-kernel@vger.kernel.org>,
	linux-ext4@vger.kernel.org, linux-mm@kvack.org, npiggin@suse.de
Subject: [RFC] [PATCH 0/7] Improve VFS to handle better mmaps when blocksize < pagesize (v3)
Date: Thu, 17 Sep 2009 17:21:40 +0200	[thread overview]
Message-ID: <1253200907-31392-1-git-send-email-jack@suse.cz> (raw)


  Hi,

  here is my next attempt to solve a problems arising with mmaped writes when
blocksize < pagesize. To recall what's the problem:

We'd like to use page_mkwrite() to allocate blocks under a page which is
becoming writeably mmapped in some process address space. This allows a
filesystem to return a page fault if there is not enough space available, user
exceeds quota or similar problem happens, rather than silently discarding data
later when writepage is called.

On filesystems where blocksize < pagesize the situation is complicated though.
Think for example that blocksize = 1024, pagesize = 4096 and a process does:
  ftruncate(fd, 0);
  pwrite(fd, buf, 1024, 0);
  map = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED, fd, 0);
  map[0] = 'a';  ----> page_mkwrite() for index 0 is called
  ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
  fsync(fd); ----> writepage() for index 0 is called

At the moment page_mkwrite() is called, filesystem can allocate only one block
for the page because i_size == 1024. Otherwise it would create blocks beyond
i_size which is generally undesirable. But later at writepage() time, we would
like to have blocks allocated for the whole page (and in principle we have to
allocate them because user could have filled the page with data after the
second ftruncate()).
---

  The patches depend on Nick's truncate calling convention rewrite. The first
three patches in the patchset are just cleanups. The series converts ext4 and
ext2 filesystems just to give an idea how conversion of a filesystem will
look like.
  A few notes to the changes the main patch (patch number 4) does:
1) zeroing of tail of the last block now does not happen in writepage (which is
racy anyway as Nick pointed out) and foo_truncate_page but rather when i_size
is going to be extended.
2) writeback path does not care about i_size anymore, it uses buffer flags
instead. An exception is a nobh case where we have to use i_size. Thus
filesystems not using nobh code can update i_size in write_end without holding
page_lock.  Filesystems using nobh code still have to update i_size under the
page_lock since otherwise __mpage_writepage could come early, write just part
of the page, and clear all dirty bits, thus causing a data loss.
3) converted filesystems have to make sure that the buffers with valid data
to write are either mapped or delay before they call block_write_full_page.
The idea is that they should use page_mkwrite() to setup buffers.

  Both ext2 and ext4 have survived some beating with fsx-linux so they should
be at least moderately safe to use :). Any comments?
									Honza

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

             reply	other threads:[~2009-09-17 15:21 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-17 15:21 Jan Kara [this message]
2009-09-17 15:21 ` [PATCH 1/7] fs: buffer_head writepage no invalidate Jan Kara
2009-09-17 15:21 ` [PATCH 2/7] fs: Remove zeroing from nobh_writepage Jan Kara
2009-09-17 15:21 ` [PATCH 3/7] ext4: Deprecate nobh mount option Jan Kara
2009-09-17 15:21 ` [PATCH 4/7] vfs: Add better VFS support for page_mkwrite when blocksize < pagesize Jan Kara
2009-09-17 15:21 ` [PATCH 5/7] ext4: Convert filesystem to the new truncate calling convention Jan Kara
2009-09-22 14:36   ` Al Viro
2009-09-22 17:16     ` Jan Kara
2009-09-22 17:23       ` Al Viro
2009-09-22 17:37         ` Jan Kara
2009-09-17 15:21 ` [PATCH 6/7] ext4: Convert ext4 to new mkwrite code Jan Kara
2009-09-17 15:21 ` [PATCH 7/7] ext2: Convert ext2 " Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1253200907-31392-1-git-send-email-jack@suse.cz \
    --to=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).