From: Jan Kara <jack@suse.cz>
To: linux-fsdevel@vger.kernel.org
Cc: linux-mm@vger.kernel.org, npiggin@suse.de
Subject: [PATCH 0/5] [RFC] Fix page_mkwrite for blocksize < pagesize
Date: Tue, 11 Aug 2009 00:20:42 +0200
Message-ID: <1249942847-23851-1-git-send-email-jack@suse.cz>

  Hi,

below is a patch series that is my new approach to solving the problems with
page_mkwrite() when blocksize < pagesize. To refresh memory, the main issue is
as follows:

We'd like to use page_mkwrite() to allocate blocks under a page which is
becoming writably mmapped in some process's address space. This allows a
filesystem to fail the page fault if there is not enough space available, the
user exceeds quota, or a similar problem happens, rather than silently
discarding data later when writepage is called.

On filesystems where blocksize < pagesize the situation is complicated, though.
Suppose, for example, that blocksize = 1024, pagesize = 4096 and a process does:

	ftruncate(fd, 0);
	pwrite(fd, buf, 1024, 0);
	map = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED, fd, 0);
	map[0] = 'a';		----> page_mkwrite() for index 0 is called
	ftruncate(fd, 10000);	/* or even pwrite(fd, buf, 1, 10000) */
	fsync(fd);		----> writepage() for index 0 is called

At the moment page_mkwrite() is called, the filesystem can allocate only one
block for the page because i_size == 1024. Otherwise it would create blocks
beyond i_size, which is generally undesirable. But later, at writepage() time,
we would like to have blocks allocated for the whole page (and in principle we
have to allocate them, because the user could have filled the page with data
after the second ftruncate()).
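
To make the numbers in the example concrete, here is a small userspace sketch
of the arithmetic. The helper name blocks_in_page_within_isize() is
hypothetical and is not part of the patch series or of the kernel; it only
counts how many blocks of a given page start below i_size.

```c
#include <stdint.h>

/*
 * Hypothetical helper for illustration only (not kernel code): return how
 * many filesystem blocks of the page at page_index lie at least partly
 * below i_size.
 */
unsigned int blocks_in_page_within_isize(uint64_t i_size,
					 uint64_t page_index,
					 unsigned int page_size,
					 unsigned int block_size)
{
	uint64_t page_start = page_index * page_size;
	uint64_t page_end = page_start + page_size;
	uint64_t end;

	if (i_size <= page_start)
		return 0;	/* page lies entirely beyond i_size */

	end = i_size < page_end ? i_size : page_end;
	/* Round up: a partially covered block still needs to be allocated. */
	return (unsigned int)((end - page_start + block_size - 1) / block_size);
}
```

With blocksize = 1024 and pagesize = 4096, this gives 1 block for page 0 while
i_size == 1024 (the page_mkwrite() time in the example) and 4 blocks once
i_size == 10000 (the writepage() time), which is exactly the gap described
above.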

This series is an attempt to fix the above issue. The idea is that we do the
i_size update after an extending write or truncate not under the page lock of
the page where i_size ends up, but under the page lock of the page where i_size
was originally. This also allows us to solve a POSIX compliance issue where we
could have exposed data written via mmap beyond i_size.

I see two disputable things with this approach:

1) set_page_dirty_buffers() and create_empty_buffers() now check i_size.
That's a bit ugly, although not marking buffers dirty beyond i_size makes a lot
of sense to me.

2) To fix the problem with non-zeros written via mmap beyond EOF and then
being exposed by truncate, I've added zeroing to a function doing all the work
when extending i_size (which is essentially the only place where we can reliably
do the work and avoid races with mmap). That's a good fit, but basically all
filesystems now have to extend i_size with this function, which I don't find
that pleasing. Does anybody have an idea how to avoid that conversion of every
filesystem, or make it less painful?
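
As a rough userspace analogy for the zeroing in point 2, the sketch below
clears the part of a page buffer that lies beyond the old end of file before
the size is extended. The function name and the plain memory buffer are
assumptions made for illustration; the actual patch works on page cache pages
under the page lock and may differ in detail.

```c
#include <stddef.h>
#include <string.h>

/*
 * Userspace analogy only: `page` stands in for the page that contains the
 * old end of file, and old_isize_in_page is the offset of the old i_size
 * within that page. Returns the number of bytes zeroed.
 */
size_t zero_tail_beyond_old_isize(unsigned char *page, size_t page_size,
				  size_t old_isize_in_page)
{
	if (old_isize_in_page >= page_size)
		return 0;	/* old EOF at or past the page end: nothing to clear */

	/* Anything stored past the old EOF must not become visible as file data. */
	memset(page + old_isize_in_page, 0, page_size - old_isize_in_page);
	return page_size - old_isize_in_page;
}
```

In the example from the beginning of this mail (old i_size == 1024,
pagesize == 4096), such a step would clear bytes 1024..4095 of page 0 before
the file is extended, so stale bytes written via mmap beyond EOF are not
exposed by the truncate.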

Thanks for comments in advance.

									Honza

Thread overview: 6+ messages
2009-08-10 22:20 Jan Kara [this message]
2009-08-10 22:20 ` [PATCH 1/5] fs: buffer_head writepage no invalidate Jan Kara
2009-08-10 22:20 ` [PATCH 2/5] fs: Don't zero out page on writepage() Jan Kara
2009-08-10 22:20 ` [PATCH 3/5] vfs: Create dirty buffer only inside i_size Jan Kara
2009-08-10 22:20 ` [PATCH 4/5] fs: Move i_size update in write_end() from under page lock Jan Kara
2009-08-10 22:20 ` [PATCH 5/5] vfs: Add better VFS support for page_mkwrite when blocksize < pagesize Jan Kara