From: Andrew Morton <akpm@osdl.org>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: linux-kernel@vger.kernel.org, zach.brown@oracle.com
Subject: Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
Date: Wed, 4 Oct 2006 10:25:22 -0700 [thread overview]
Message-ID: <20061004102522.d58c00ef.akpm@osdl.org> (raw)
In-Reply-To: <x49zmcc6mhh.fsf@segfault.boston.devel.redhat.com>
On Wed, 04 Oct 2006 13:04:42 -0400
Jeff Moyer <jmoyer@redhat.com> wrote:
> Hi,
>
> It seems that Oracle creates sparse files when doing table creates, and
> then populates those files using O_DIRECT I/O. That means that every I/O to
> the sparse file falls back to buffered I/O. Currently, such a sequential
> O_DIRECT write to a sparse file will end up populating the page cache. The
> problem is that we don't invalidate the page cache pages used to perform
> the buffered fallback.
Why is this a problem? It's just like someone did a write(), and we'll
invalidate the pagecache on the next direct-io operation.
> After talking this over with Zach, we agreed that
> there should be a call to truncate_inode_pages_range after the buffered I/O
> fallback.
>
> Attached is a patch which addresses the problem in my testing. I wrote a
> simple test program that creates a sparse file and issues sequential DIO
> writes to it. Before the patch, the page cache would grow as the file was
> written. With the patch, the page cache does not grow.
>
> Comments welcome.
>
> Cheers,
>
> Jeff
>
> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
>
> --- linux-2.6.18.i686/mm/filemap.c.orig 2006-10-02 12:59:25.000000000 -0400
> +++ linux-2.6.18.i686/mm/filemap.c 2006-10-04 12:54:51.000000000 -0400
> @@ -2350,7 +2350,7 @@ __generic_file_aio_write_nolock(struct k
> unsigned long nr_segs, loff_t *ppos)
> {
> struct file *file = iocb->ki_filp;
> - const struct address_space * mapping = file->f_mapping;
> + struct address_space * mapping = file->f_mapping;
> size_t ocount; /* original count */
> size_t count; /* after file limit checks */
> struct inode *inode = mapping->host;
> @@ -2417,6 +2417,15 @@ __generic_file_aio_write_nolock(struct k
>
> written = generic_file_buffered_write(iocb, iov, nr_segs,
> pos, ppos, count, written);
> +
> + /*
> + * When falling through to buffered I/O, we need to ensure that the
> + * page cache pages are written to disk and invalidated to preserve
> + * the expected O_DIRECT semantics.
> + */
> + if (unlikely(file->f_flags & O_DIRECT))
> + truncate_inode_pages_range(mapping, pos, pos + count - 1);
> +
> out:
> current->backing_dev_info = NULL;
> return written ? written : err;
eek. truncate_inode_pages() will throw away dirty data. Very dangerous,
much chin-scratching needed.
It also has security implications: if a user can force this shootdown to
happen against dirty data at the right time (perhaps with multiple
processes) then a subsequent read of that part of the file will yield
uninitialised disk data.
next prev parent reply other threads:[~2006-10-04 17:25 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-10-04 17:04 [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path Jeff Moyer
2006-10-04 17:25 ` Andrew Morton [this message]
2006-10-04 17:51 ` Zach Brown
2006-10-04 17:53 ` Jeff Moyer
2006-10-04 18:16 ` Andrew Morton
2006-10-04 18:40 ` Zach Brown
2006-10-04 19:16 ` Andrew Morton
2006-10-04 20:53 ` Jeff Moyer
2006-10-04 21:22 ` Jeff Moyer
2006-10-04 23:55 ` Andrew Morton
2006-10-05 19:31 ` Jeff Moyer
2006-10-06 20:11 ` Andrew Morton
2006-10-11 16:48 ` jmoyer
2006-10-11 18:37 ` Andrew Morton
2006-10-12 22:01 ` Jeff Moyer
2006-10-12 22:37 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20061004102522.d58c00ef.akpm@osdl.org \
--to=akpm@osdl.org \
--cc=jmoyer@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=zach.brown@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox