From: Andrew Morton <akpm@linux-foundation.org>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: cmm@us.ibm.com, jack@suse.cz, linux-ext4@vger.kernel.org,
aneesh.kumar@linux.vnet.ibm.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ext2: Use page_mkwrite vma_operations to get mmap write notification.
Date: Thu, 5 Jun 2008 12:30:45 -0700 [thread overview]
Message-ID: <20080605123045.445e380a.akpm@linux-foundation.org> (raw)
In-Reply-To: <1212685513-32237-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On Thu, 5 Jun 2008 22:35:12 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> We would like to get notified when we are doing a write on mmap
> section. The changes are needed to handle ENOSPC when writing to an
> mmap section of files with holes.
>
Whoa. You didn't copy anything like enough mailing lists for a change
of this magnitude. I added some.
This is a large change in behaviour!
a) applications will now get a synchronous SIGBUS when modifying a
page over an ENOSPC filesystem. Whereas previously they could have
proceeded to completion and then detected the error via an fsync().
It's going to take more than one skimpy little paragraph to
justify this, and to demonstrate that it is preferable, and to
convince us that nothing will break from this user-visible behaviour
change.
b) we're now doing fs operations (and some I/O) in the pagefault
code. This has several implications:
- performance changes
- potential for deadlocks when a process takes the fault from
within a copy_to_user() in, say, mm/filemap.c
- performing additional memory allocations within that
copy_to_user(). Possibility that these will reenter the
filesystem.
And that's just ext2.
For ext3 things are even more complex, because we have the
journal_start/journal_end pair which is effectively another "lock" for
ranking/deadlock purposes. And now we're taking i_alloc_sem and
lock_page and we're doing ->writepage() and its potential
journal_start(), all potentially within the context of a
copy_to_user().
Now, things become easier because copy_to_user() only happens on the
read() side of things, where we don't hold lock_page() and things are
generally simpler.
But still, this is a high-risk change. I think we should require a lot
of convincing that issues such as the above have been suitably
considered and addressed, and that the change has had *intense*
testing.
> index 47d88da..cc2e106 100644
> --- a/fs/ext2/ext2.h
> +++ b/fs/ext2/ext2.h
> @@ -136,6 +136,7 @@ extern void ext2_get_inode_flags(struct ext2_inode_info *);
> int __ext2_write_begin(struct file *file, struct address_space *mapping,
> loff_t pos, unsigned len, unsigned flags,
> struct page **pagep, void **fsdata);
> +extern int ext2_page_mkwrite(struct vm_area_struct *vma, struct page *page);
>
> /* ioctl.c */
> extern long ext2_ioctl(struct file *, unsigned int, unsigned long);
> diff --git a/fs/ext2/file.c b/fs/ext2/file.c
> index 5f2fa9c..d539dcf 100644
> --- a/fs/ext2/file.c
> +++ b/fs/ext2/file.c
> @@ -18,6 +18,7 @@
> * (jj@sunsite.ms.mff.cuni.cz)
> */
>
> +#include <linux/mm.h>
> #include <linux/time.h>
> #include "ext2.h"
> #include "xattr.h"
> @@ -38,6 +39,24 @@ static int ext2_release_file (struct inode * inode, struct file * filp)
> return 0;
> }
>
> +static struct vm_operations_struct ext2_file_vm_ops = {
> + .fault = filemap_fault,
> + .page_mkwrite = ext2_page_mkwrite,
> +};
> +
> +static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> + struct address_space *mapping = file->f_mapping;
> +
> + if (!mapping->a_ops->readpage)
> + return -ENOEXEC;
this copied-and-pasted test can now be removed.
> + file_accessed(file);
> + vma->vm_ops = &ext2_file_vm_ops;
> + vma->vm_flags |= VM_CAN_NONLINEAR;
> + return 0;
> +}
> +
> +
> /*
> * We have mostly NULL's here: the current defaults are ok for
> * the ext2 filesystem.
> @@ -52,7 +71,7 @@ static int ext2_release_file (struct inode * inode, struct file * filp)
> #ifdef CONFIG_COMPAT
> .compat_ioctl = ext2_compat_ioctl,
> #endif
> - .mmap = generic_file_mmap,
> + .mmap = ext2_file_mmap,
> .open = generic_file_open,
> .release = ext2_release_file,
> .fsync = ext2_sync_file,
> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> index 384fc0d..d4c5c23 100644
> --- a/fs/ext2/inode.c
> +++ b/fs/ext2/inode.c
> @@ -1443,3 +1443,8 @@ int ext2_setattr(struct dentry *dentry, struct iattr *iattr)
> error = ext2_acl_chmod(inode);
> return error;
> }
> +
> +int ext2_page_mkwrite(struct vm_area_struct *vma, struct page *page)
> +{
> + return block_page_mkwrite(vma, page, ext2_get_block);
> +}
> --
> 1.5.5.1.357.g1af8b.dirty
next parent reply other threads:[~2008-06-05 19:31 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1212685513-32237-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
2008-06-05 19:30 ` Andrew Morton [this message]
2008-06-11 15:08 ` [PATCH] ext2: Use page_mkwrite vma_operations to get mmap write notification Aneesh Kumar K.V
2008-06-11 19:07 ` Andrew Morton
2008-06-12 4:06 ` Aneesh Kumar K.V
2008-06-12 12:22 ` Chris Mason
2008-06-12 16:17 ` Jan Kara
2008-06-22 22:50 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080605123045.445e380a.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=cmm@us.ibm.com \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox