* [PATCH] ext2: Use page_mkwrite vma_operations to get mmap write notification.
From: Aneesh Kumar K.V @ 2008-06-05 17:05 UTC
To: cmm, akpm, jack; +Cc: linux-ext4, Aneesh Kumar K.V

We would like to get notified when we are doing a write on mmap section.
The changes are needed to handle ENOSPC when writing to an mmap section
of files with holes.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext2/ext2.h  |    1 +
 fs/ext2/file.c  |   21 ++++++++++++++++++++-
 fs/ext2/inode.c |    5 +++++
 3 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 47d88da..cc2e106 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -136,6 +136,7 @@ extern void ext2_get_inode_flags(struct ext2_inode_info *);
 int __ext2_write_begin(struct file *file, struct address_space *mapping,
 		loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, void **fsdata);
+extern int ext2_page_mkwrite(struct vm_area_struct *vma, struct page *page);
 
 /* ioctl.c */
 extern long ext2_ioctl(struct file *, unsigned int, unsigned long);
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index 5f2fa9c..d539dcf 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -18,6 +18,7 @@
  * 	(jj@sunsite.ms.mff.cuni.cz)
  */
 
+#include <linux/mm.h>
 #include <linux/time.h>
 #include "ext2.h"
 #include "xattr.h"
@@ -38,6 +39,24 @@ static int ext2_release_file (struct inode * inode, struct file * filp)
 	return 0;
 }
 
+static struct vm_operations_struct ext2_file_vm_ops = {
+	.fault = filemap_fault,
+	.page_mkwrite = ext2_page_mkwrite,
+};
+
+static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	if (!mapping->a_ops->readpage)
+		return -ENOEXEC;
+	file_accessed(file);
+	vma->vm_ops = &ext2_file_vm_ops;
+	vma->vm_flags |= VM_CAN_NONLINEAR;
+	return 0;
+}
+
+
 /*
  * We have mostly NULL's here: the current defaults are ok for
  * the ext2 filesystem.
@@ -52,7 +71,7 @@ static int ext2_release_file (struct inode * inode, struct file * filp)
 #ifdef CONFIG_COMPAT
 	.compat_ioctl = ext2_compat_ioctl,
 #endif
-	.mmap = generic_file_mmap,
+	.mmap = ext2_file_mmap,
 	.open = generic_file_open,
 	.release = ext2_release_file,
 	.fsync = ext2_sync_file,
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 384fc0d..d4c5c23 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1443,3 +1443,8 @@ int ext2_setattr(struct dentry *dentry, struct iattr *iattr)
 		error = ext2_acl_chmod(inode);
 	return error;
 }
+
+int ext2_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+	return block_page_mkwrite(vma, page, ext2_get_block);
+}
-- 
1.5.5.1.357.g1af8b.dirty
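
Editor's sketch (not part of the patch): the hook wired up above can be pictured with a small user-space model. It assumes that a negative return from ->page_mkwrite is turned into a SIGBUS result by the write-fault path. Every `toy_` name below is hypothetical and only stands in for the kernel structures; none of this is kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct toy_page {
	int mapped_to_disk;	/* 0: the page still covers a hole */
};

struct toy_vma;

struct toy_vm_ops {
	int (*page_mkwrite)(struct toy_vma *vma, struct toy_page *page);
};

struct toy_vma {
	const struct toy_vm_ops *vm_ops;
	long free_blocks;	/* pretend free space of the backing fs */
};

enum toy_fault { TOY_FAULT_OK, TOY_FAULT_SIGBUS };

/* Filesystem hook: allocate a block for a hole, or fail with -ENOSPC. */
static int toy_fs_page_mkwrite(struct toy_vma *vma, struct toy_page *page)
{
	if (page->mapped_to_disk)
		return 0;
	if (vma->free_blocks == 0)
		return -ENOSPC;
	vma->free_blocks--;
	page->mapped_to_disk = 1;
	return 0;
}

/* Write-fault path: ask the filesystem before making the page writable. */
static enum toy_fault toy_write_fault(struct toy_vma *vma, struct toy_page *page)
{
	assert(vma != NULL && page != NULL);
	if (vma->vm_ops && vma->vm_ops->page_mkwrite &&
	    vma->vm_ops->page_mkwrite(vma, page) < 0)
		return TOY_FAULT_SIGBUS;
	return TOY_FAULT_OK;
}

static const struct toy_vm_ops toy_file_vm_ops = {
	.page_mkwrite = toy_fs_page_mkwrite,
};
```

In this model the first store into a hole consumes the last free block and succeeds, while a store into a second hole fails at fault time instead of at writeback.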
* [PATCH] ext3: Use page_mkwrite vma_operations to get mmap write notification.
From: Aneesh Kumar K.V @ 2008-06-05 17:05 UTC
To: cmm, akpm, jack; +Cc: linux-ext4, Aneesh Kumar K.V

We would like to get notified when we are doing a write on mmap section.
The changes are needed to handle ENOSPC when writing to an mmap section
of files with holes.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext3/file.c          |   19 +++++++++++-
 fs/ext3/inode.c         |   76 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/ext3_fs.h |    1 +
 3 files changed, 95 insertions(+), 1 deletions(-)

diff --git a/fs/ext3/file.c b/fs/ext3/file.c
index acc4913..09e22e4 100644
--- a/fs/ext3/file.c
+++ b/fs/ext3/file.c
@@ -106,6 +106,23 @@ ext3_file_write(struct kiocb *iocb, const struct iovec *iov,
 	return ret;
 }
 
+static struct vm_operations_struct ext3_file_vm_ops = {
+	.fault = filemap_fault,
+	.page_mkwrite = ext3_page_mkwrite,
+};
+
+static int ext3_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	if (!mapping->a_ops->readpage)
+		return -ENOEXEC;
+	file_accessed(file);
+	vma->vm_ops = &ext3_file_vm_ops;
+	vma->vm_flags |= VM_CAN_NONLINEAR;
+	return 0;
+}
+
 const struct file_operations ext3_file_operations = {
 	.llseek = generic_file_llseek,
 	.read = do_sync_read,
@@ -116,7 +133,7 @@ ext3_file_write(struct kiocb *iocb, const struct iovec *iov,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl = ext3_compat_ioctl,
 #endif
-	.mmap = generic_file_mmap,
+	.mmap = ext3_file_mmap,
 	.open = generic_file_open,
 	.release = ext3_release_file,
 	.fsync = ext3_sync_file,
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 6ae4ecf..c8261f0 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -3295,3 +3295,79 @@ int ext3_change_inode_journal_flag(struct inode *inode, int val)
 
 	return err;
 }
+
+static int ext3_bh_prepare_fill(handle_t *handle, struct buffer_head *bh)
+{
+	if (!buffer_mapped(bh)) {
+		/*
+		 * Mark buffer as dirty so that
+		 * block_write_full_page() writes it
+		 */
+		set_buffer_dirty(bh);
+	}
+	return 0;
+}
+
+static int ext3_bh_unmapped(handle_t *handle, struct buffer_head *bh)
+{
+	return !buffer_mapped(bh);
+}
+
+int ext3_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+	loff_t size;
+	unsigned long len;
+	int ret = -EINVAL;
+	struct file *file = vma->vm_file;
+	struct inode *inode = file->f_path.dentry->d_inode;
+	struct address_space *mapping = inode->i_mapping;
+	struct writeback_control wbc = { .sync_mode = WB_SYNC_NONE,
+					 .nr_to_write = 1 };
+
+	/*
+	 * Get i_alloc_sem to stop truncates messing with the inode. We cannot
+	 * get i_mutex because we are already holding mmap_sem.
+	 */
+	down_read(&inode->i_alloc_sem);
+	size = i_size_read(inode);
+	if (page->mapping != mapping || size <= page_offset(page)
+	    || !PageUptodate(page)) {
+		/* page got truncated from under us? */
+		goto out_unlock;
+	}
+	ret = 0;
+	if (PageMappedToDisk(page))
+		goto out_unlock;
+
+	if (page->index == size >> PAGE_CACHE_SHIFT)
+		len = size & ~PAGE_CACHE_MASK;
+	else
+		len = PAGE_CACHE_SIZE;
+
+	if (page_has_buffers(page)) {
+		/* return if we have all the buffers mapped */
+		if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
+				       ext3_bh_unmapped))
+			goto out_unlock;
+		/*
+		 * Now mark all the buffer head dirty so
+		 * that writepage can write it
+		 */
+		walk_page_buffers(NULL, page_buffers(page), 0, len,
+				  NULL, ext3_bh_prepare_fill);
+	}
+	/*
+	 * OK, we need to fill the hole... Lock the page and do writepage.
+	 * We can't do write_begin and write_end here because we don't
+	 * have inode_mutex and that allow parallel write_begin, write_end call.
+	 * (lock_page prevent this from happening on the same page though)
+	 */
+	lock_page(page);
+	wbc.range_start = page_offset(page);
+	wbc.range_end = page_offset(page) + len;
+	ret = mapping->a_ops->writepage(page, &wbc);
+	/* writepage unlocks the page */
+out_unlock:
+	up_read(&inode->i_alloc_sem);
+	return ret;
+}
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index 36c5403..715c35e 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -836,6 +836,7 @@ extern void ext3_truncate (struct inode *);
 extern void ext3_set_inode_flags(struct inode *);
 extern void ext3_get_inode_flags(struct ext3_inode_info *);
 extern void ext3_set_aops(struct inode *inode);
+extern int ext3_page_mkwrite(struct vm_area_struct *vma, struct page *page);
 
 /* ioctl.c */
 extern int ext3_ioctl (struct inode *, struct file *, unsigned int,
-- 
1.5.5.1.357.g1af8b.dirty
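
Editor's sketch (not part of the patch): the `len` computation in ext3_page_mkwrite() above can be checked in user space. The model below assumes 4 KiB pages and uses hypothetical `TOY_` and `toy_` names in place of the PAGE_CACHE_* macros. It folds the earlier `size <= page_offset(page)` bail-out into a return value of 0, which is what keeps the partial-page branch from ever yielding a zero length.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4 KiB page geometry, standing in for PAGE_CACHE_*. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  ((uint64_t)1 << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK  (~(TOY_PAGE_SIZE - 1))

/*
 * Bytes of page `index` that lie inside a file of `size` bytes.
 * Returns 0 when the page starts at or beyond EOF, which is the
 * "page got truncated from under us" case the patch bails out on.
 */
static unsigned long toy_valid_len(uint64_t size, uint64_t index)
{
	uint64_t page_offset = index << TOY_PAGE_SHIFT;

	if (size <= page_offset)
		return 0;
	if (index == size >> TOY_PAGE_SHIFT) {
		/* Last, partial page: EOF cannot be page aligned here. */
		assert((size & ~TOY_PAGE_MASK) != 0);
		return (unsigned long)(size & ~TOY_PAGE_MASK);
	}
	return (unsigned long)TOY_PAGE_SIZE;
}
```

For a 10000-byte file, pages 0 and 1 are fully valid, page 2 holds the remaining 10000 - 8192 bytes, and page 3 lies past EOF.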
* Re: [PATCH] ext2: Use page_mkwrite vma_operations to get mmap write notification.
From: Andrew Morton @ 2008-06-05 19:30 UTC
To: Aneesh Kumar K.V
Cc: cmm, jack, linux-ext4, aneesh.kumar, linux-mm, linux-kernel

On Thu, 5 Jun 2008 22:35:12 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> We would like to get notified when we are doing a write on mmap
> section. The changes are needed to handle ENOSPC when writing to an
> mmap section of files with holes.
>

Whoa. You didn't copy anything like enough mailing lists for a change
of this magnitude. I added some.

This is a large change in behaviour!

a) applications will now get a synchronous SIGBUS when modifying a
   page over an ENOSPC filesystem. Whereas previously they could have
   proceeded to completion and then detected the error via an fsync().

   It's going to take more than one skimpy little paragraph to
   justify this, and to demonstrate that it is preferable, and to
   convince us that nothing will break from this user-visible behaviour
   change.

b) we're now doing fs operations (and some I/O) in the pagefault
   code. This has several implications:

   - performance changes

   - potential for deadlocks when a process takes the fault from
     within a copy_to_user() in, say, mm/filemap.c

   - performing additional memory allocations within that
     copy_to_user(). Possibility that these will reenter the
     filesystem.

And that's just ext2.

For ext3 things are even more complex, because we have the
journal_start/journal_end pair which is effectively another "lock" for
ranking/deadlock purposes. And now we're taking i_alloc_sem and
lock_page and we're doing ->writepage() and its potential
journal_start(), all potentially within the context of a
copy_to_user().

Now, things become easier because copy_to_user() only happens on the
read() side of things, where we don't hold lock_page() and things are
generally simpler.

But still, this is a high-risk change. I think we should require a lot
of convincing that issues such as the above have been suitably
considered and addressed, and that the change has had *intense*
testing.

> index 47d88da..cc2e106 100644
> --- a/fs/ext2/ext2.h
> +++ b/fs/ext2/ext2.h
> @@ -136,6 +136,7 @@ extern void ext2_get_inode_flags(struct ext2_inode_info *);
>  int __ext2_write_begin(struct file *file, struct address_space *mapping,
>  		loff_t pos, unsigned len, unsigned flags,
>  		struct page **pagep, void **fsdata);
> +extern int ext2_page_mkwrite(struct vm_area_struct *vma, struct page *page);
>
>  /* ioctl.c */
>  extern long ext2_ioctl(struct file *, unsigned int, unsigned long);
> diff --git a/fs/ext2/file.c b/fs/ext2/file.c
> index 5f2fa9c..d539dcf 100644
> --- a/fs/ext2/file.c
> +++ b/fs/ext2/file.c
> @@ -18,6 +18,7 @@
>   * 	(jj@sunsite.ms.mff.cuni.cz)
>   */
>
> +#include <linux/mm.h>
>  #include <linux/time.h>
>  #include "ext2.h"
>  #include "xattr.h"
> @@ -38,6 +39,24 @@ static int ext2_release_file (struct inode * inode, struct file * filp)
>  	return 0;
>  }
>
> +static struct vm_operations_struct ext2_file_vm_ops = {
> +	.fault = filemap_fault,
> +	.page_mkwrite = ext2_page_mkwrite,
> +};
> +
> +static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct address_space *mapping = file->f_mapping;
> +
> +	if (!mapping->a_ops->readpage)
> +		return -ENOEXEC;

this copied-and-pasted test can now be removed.

> +	file_accessed(file);
> +	vma->vm_ops = &ext2_file_vm_ops;
> +	vma->vm_flags |= VM_CAN_NONLINEAR;
> +	return 0;
> +}
> +
> +
>  /*
>   * We have mostly NULL's here: the current defaults are ok for
>   * the ext2 filesystem.
> @@ -52,7 +71,7 @@ static int ext2_release_file (struct inode * inode, struct file * filp)
>  #ifdef CONFIG_COMPAT
>  	.compat_ioctl = ext2_compat_ioctl,
>  #endif
> -	.mmap = generic_file_mmap,
> +	.mmap = ext2_file_mmap,
>  	.open = generic_file_open,
>  	.release = ext2_release_file,
>  	.fsync = ext2_sync_file,
> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> index 384fc0d..d4c5c23 100644
> --- a/fs/ext2/inode.c
> +++ b/fs/ext2/inode.c
> @@ -1443,3 +1443,8 @@ int ext2_setattr(struct dentry *dentry, struct iattr *iattr)
>  		error = ext2_acl_chmod(inode);
>  	return error;
>  }
> +
> +int ext2_page_mkwrite(struct vm_area_struct *vma, struct page *page)
> +{
> +	return block_page_mkwrite(vma, page, ext2_get_block);
> +}
> --
> 1.5.5.1.357.g1af8b.dirty
* Re: [PATCH] ext2: Use page_mkwrite vma_operations to get mmap write notification.
From: Aneesh Kumar K.V @ 2008-06-11 15:08 UTC
To: Andrew Morton; +Cc: cmm, jack, linux-ext4, linux-mm, linux-kernel

On Thu, Jun 05, 2008 at 12:30:45PM -0700, Andrew Morton wrote:
> On Thu, 5 Jun 2008 22:35:12 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
> > We would like to get notified when we are doing a write on mmap
> > section. The changes are needed to handle ENOSPC when writing to an
> > mmap section of files with holes.
> >
>
> Whoa. You didn't copy anything like enough mailing lists for a change
> of this magnitude. I added some.
>
> This is a large change in behaviour!
>
> a) applications will now get a synchronous SIGBUS when modifying a
>    page over an ENOSPC filesystem. Whereas previously they could have
>    proceeded to completion and then detected the error via an fsync().

Or not detect the error at all if we don't call fsync() right ? Isn't a
synchronous SIGBUS the right behaviour ?

>
>    It's going to take more than one skimpy little paragraph to
>    justify this, and to demonstrate that it is preferable, and to
>    convince us that nothing will break from this user-visible behaviour
>    change.
>
> b) we're now doing fs operations (and some I/O) in the pagefault
>    code. This has several implications:
>
>    - performance changes
>
>    - potential for deadlocks when a process takes the fault from
>      within a copy_to_user() in, say, mm/filemap.c
>
>    - performing additional memory allocations within that
>      copy_to_user(). Possibility that these will reenter the
>      filesystem.
>
> And that's just ext2.
>
> For ext3 things are even more complex, because we have the
> journal_start/journal_end pair which is effectively another "lock" for
> ranking/deadlock purposes. And now we're taking i_alloc_sem and
> lock_page and we're doing ->writepage() and its potential
> journal_start(), all potentially within the context of a
> copy_to_user().

One of the reason why we would need this in ext3/ext4 is that we cannot
do block allocation in the writepage with the recent locking changes.
The locking changes involve changing the locking order of journal_start
and page_lock. With writepage we are already called with page_lock and
we can't start new transaction needed for block allocation.

But if we agree that we should not do block allocation in page_mkwrite
we need to add writepages and allocate blocks in writepages.

>
> Now, things become easier because copy_to_user() only happens on the
> read() side of things, where we don't hold lock_page() and things are
> generally simpler.
>
> But still, this is a high-risk change. I think we should require a lot
> of convincing that issues such as the above have been suitably
> considered and addressed, and that the change has had *intense*
> testing.
>
> > index 47d88da..cc2e106 100644
> > --- a/fs/ext2/ext2.h
> > +++ b/fs/ext2/ext2.h
> > @@ -136,6 +136,7 @@ extern void ext2_get_inode_flags(struct ext2_inode_info *);
> >  int __ext2_write_begin(struct file *file, struct address_space *mapping,
> >  		loff_t pos, unsigned len, unsigned flags,
> >  		struct page **pagep, void **fsdata);
> > +extern int ext2_page_mkwrite(struct vm_area_struct *vma, struct page *page);
> >
> >  /* ioctl.c */
> >  extern long ext2_ioctl(struct file *, unsigned int, unsigned long);
> > diff --git a/fs/ext2/file.c b/fs/ext2/file.c
> > index 5f2fa9c..d539dcf 100644
> > --- a/fs/ext2/file.c
> > +++ b/fs/ext2/file.c
> > @@ -18,6 +18,7 @@
> >   * 	(jj@sunsite.ms.mff.cuni.cz)
> >   */
> >
> > +#include <linux/mm.h>
> >  #include <linux/time.h>
> >  #include "ext2.h"
> >  #include "xattr.h"
> > @@ -38,6 +39,24 @@ static int ext2_release_file (struct inode * inode, struct file * filp)
> >  	return 0;
> >  }
> >
> > +static struct vm_operations_struct ext2_file_vm_ops = {
> > +	.fault = filemap_fault,
> > +	.page_mkwrite = ext2_page_mkwrite,
> > +};
> > +
> > +static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > +	struct address_space *mapping = file->f_mapping;
> > +
> > +	if (!mapping->a_ops->readpage)
> > +		return -ENOEXEC;
>
> this copied-and-pasted test can now be removed.
>
> > +	file_accessed(file);
> > +	vma->vm_ops = &ext2_file_vm_ops;
> > +	vma->vm_flags |= VM_CAN_NONLINEAR;
> > +	return 0;
> > +}
> > +
> > +
> >  /*
> >   * We have mostly NULL's here: the current defaults are ok for
> >   * the ext2 filesystem.
> > @@ -52,7 +71,7 @@ static int ext2_release_file (struct inode * inode, struct file * filp)
> >  #ifdef CONFIG_COMPAT
> >  	.compat_ioctl = ext2_compat_ioctl,
> >  #endif
> > -	.mmap = generic_file_mmap,
> > +	.mmap = ext2_file_mmap,
> >  	.open = generic_file_open,
> >  	.release = ext2_release_file,
> >  	.fsync = ext2_sync_file,
> > diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> > index 384fc0d..d4c5c23 100644
> > --- a/fs/ext2/inode.c
> > +++ b/fs/ext2/inode.c
> > @@ -1443,3 +1443,8 @@ int ext2_setattr(struct dentry *dentry, struct iattr *iattr)
> >  		error = ext2_acl_chmod(inode);
> >  	return error;
> >  }
> > +
> > +int ext2_page_mkwrite(struct vm_area_struct *vma, struct page *page)
> > +{
> > +	return block_page_mkwrite(vma, page, ext2_get_block);
> > +}
> > --

-aneesh
* Re: [PATCH] ext2: Use page_mkwrite vma_operations to get mmap write notification.
From: Andrew Morton @ 2008-06-11 19:07 UTC
To: Aneesh Kumar K.V; +Cc: cmm, jack, linux-ext4, linux-mm, linux-kernel

On Wed, 11 Jun 2008 20:38:45 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> On Thu, Jun 05, 2008 at 12:30:45PM -0700, Andrew Morton wrote:
> > On Thu, 5 Jun 2008 22:35:12 +0530
> > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> >
> > > We would like to get notified when we are doing a write on mmap
> > > section. The changes are needed to handle ENOSPC when writing to an
> > > mmap section of files with holes.
> > >
> >
> > Whoa. You didn't copy anything like enough mailing lists for a change
> > of this magnitude. I added some.
> >
> > This is a large change in behaviour!
> >
> > a) applications will now get a synchronous SIGBUS when modifying a
> >    page over an ENOSPC filesystem. Whereas previously they could have
> >    proceeded to completion and then detected the error via an fsync().
>
> Or not detect the error at all if we don't call fsync() right ? Isn't a
> synchronous SIGBUS the right behaviour ?
>

Not according to POSIX. Or at least posix-several-years-ago, when this
last was discussed. The spec doesn't have much useful to say about any
of this.

It's a significant change in the userspace interface.

>
> >
> >    It's going to take more than one skimpy little paragraph to
> >    justify this, and to demonstrate that it is preferable, and to
> >    convince us that nothing will break from this user-visible behaviour
> >    change.
> >
> > b) we're now doing fs operations (and some I/O) in the pagefault
> >    code. This has several implications:
> >
> >    - performance changes
> >
> >    - potential for deadlocks when a process takes the fault from
> >      within a copy_to_user() in, say, mm/filemap.c
> >
> >    - performing additional memory allocations within that
> >      copy_to_user(). Possibility that these will reenter the
> >      filesystem.
> >
> > And that's just ext2.
> >
> > For ext3 things are even more complex, because we have the
> > journal_start/journal_end pair which is effectively another "lock" for
> > ranking/deadlock purposes. And now we're taking i_alloc_sem and
> > lock_page and we're doing ->writepage() and its potential
> > journal_start(), all potentially within the context of a
> > copy_to_user().
>
> One of the reason why we would need this in ext3/ext4 is that we cannot
> do block allocation in the writepage with the recent locking changes.

Perhaps those recent locking changes were wrong.

> The locking changes involve changing the locking order of journal_start
> and page_lock. With writepage we are already called with page_lock and
> we can't start new transaction needed for block allocation.

ext3_write_begin() has journal_start() nesting inside the lock_page().

> But if we agree that we should not do block allocation in page_mkwrite
> we need to add writepages and allocate blocks in writepages.

I'm not sure what writepages has to do with pagefaults?
* Re: [PATCH] ext2: Use page_mkwrite vma_operations to get mmap write notification.
From: Aneesh Kumar K.V @ 2008-06-12 4:06 UTC
To: Andrew Morton; +Cc: cmm, jack, linux-ext4, linux-mm, linux-kernel

On Wed, Jun 11, 2008 at 12:07:49PM -0700, Andrew Morton wrote:
> On Wed, 11 Jun 2008 20:38:45 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
> > On Thu, Jun 05, 2008 at 12:30:45PM -0700, Andrew Morton wrote:
> > > On Thu, 5 Jun 2008 22:35:12 +0530
> > > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > >
> > > > We would like to get notified when we are doing a write on mmap
> > > > section. The changes are needed to handle ENOSPC when writing to an
> > > > mmap section of files with holes.
> > > >
> > >
> > > Whoa. You didn't copy anything like enough mailing lists for a change
> > > of this magnitude. I added some.
> > >
> > > This is a large change in behaviour!
> > >
> > > a) applications will now get a synchronous SIGBUS when modifying a
> > >    page over an ENOSPC filesystem. Whereas previously they could have
> > >    proceeded to completion and then detected the error via an fsync().
> >
> > Or not detect the error at all if we don't call fsync() right ? Isn't a
> > synchronous SIGBUS the right behaviour ?
> >
>
> Not according to POSIX. Or at least posix-several-years-ago, when this
> last was discussed. The spec doesn't have much useful to say about any
> of this.
>
> It's a significant change in the userspace interface.
>
> >
> > >
> > >    It's going to take more than one skimpy little paragraph to
> > >    justify this, and to demonstrate that it is preferable, and to
> > >    convince us that nothing will break from this user-visible behaviour
> > >    change.
> > >
> > > b) we're now doing fs operations (and some I/O) in the pagefault
> > >    code. This has several implications:
> > >
> > >    - performance changes
> > >
> > >    - potential for deadlocks when a process takes the fault from
> > >      within a copy_to_user() in, say, mm/filemap.c
> > >
> > >    - performing additional memory allocations within that
> > >      copy_to_user(). Possibility that these will reenter the
> > >      filesystem.
> > >
> > > And that's just ext2.
> > >
> > > For ext3 things are even more complex, because we have the
> > > journal_start/journal_end pair which is effectively another "lock" for
> > > ranking/deadlock purposes. And now we're taking i_alloc_sem and
> > > lock_page and we're doing ->writepage() and its potential
> > > journal_start(), all potentially within the context of a
> > > copy_to_user().
> >
> > One of the reason why we would need this in ext3/ext4 is that we cannot
> > do block allocation in the writepage with the recent locking changes.
>
> Perhaps those recent locking changes were wrong.
>
> > The locking changes involve changing the locking order of journal_start
> > and page_lock. With writepage we are already called with page_lock and
> > we can't start new transaction needed for block allocation.
>
> ext3_write_begin() has journal_start() nesting inside the lock_page().
>

All those are changed as a part of lock inversion changes.

> > But if we agree that we should not do block allocation in page_mkwrite
> > we need to add writepages and allocate blocks in writepages.
>
> I'm not sure what writepages has to do with pagefaults?
>

The idea is to have ext3/4_writepages. In writepages start a transaction
and iterate over the pages take the lock and do block allocation. With
that change we should be able to not do block allocation in the
page_mkwrite path. We may still want to do block reservation there.

Something like.

ext4_writepages()
{
	journal_start()
	for_each_page()
		lock_page
		if (bh_unmapped()...)
			block_alloc()
		unlock_page
	journal_stop()
}

ext4_writepage()
{
	for_each_buffer_head()
		if (bh_unmapped()) {
			redirty_page
			unlock_page
			return;
		}
}
* Re: [PATCH] ext2: Use page_mkwrite vma_operations to get mmap write notification.
From: Chris Mason @ 2008-06-12 12:22 UTC
To: Aneesh Kumar K.V
Cc: Andrew Morton, cmm, jack, linux-ext4, linux-mm, linux-kernel

On Thu, 2008-06-12 at 09:36 +0530, Aneesh Kumar K.V wrote:
> On Wed, Jun 11, 2008 at 12:07:49PM -0700, Andrew Morton wrote:
> > On Wed, 11 Jun 2008 20:38:45 +0530
> > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> The idea is to have ext3/4_writepages. In writepages start a transaction
> and iterate over the pages take the lock and do block allocation. With
> that change we should be able to not do block allocation in the
> page_mkwrite path. We may still want to do block reservation there.
>
> Something like.
>
> ext4_writepages()
> {
> 	journal_start()
> 	for_each_page()

Even with delayed allocation, the vast majority of the pages won't need
any allocations. You'll hit delalloc, do a big chunk with the journal
lock held and then do simple writepages that don't need anything
special.

I know the jbd journal_start is cheaper than the reiserfs one is, but it
might not perform well to hold it across the long writepages loop. At
least reiser saw a good boost when I stopped calling journal_begin in
writepage unless the page really needed allocations.

With the loop you have in mind, it is easy enough to back out and start
the transaction only when required.

-chris
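
Editor's sketch (not from the thread): the "start the transaction only when required" idea can be modelled in user space as below. All `wb_` names are hypothetical, the counters merely stand in for journal_start() and block allocation, and page locking is ignored entirely. A real implementation would also have to drop the page lock before starting the transaction, given the lock ordering discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page: mapped == false means it covers a hole. */
struct wb_page {
	bool mapped;
};

struct wb_stats {
	int journal_starts;	/* how often a transaction was opened */
	int allocations;	/* how many pages needed a block */
	int pages_written;
};

/* Walk the pages; open the pretend transaction only on first need. */
static void wb_writepages(struct wb_page *pages, size_t n, struct wb_stats *st)
{
	bool handle_open = false;

	assert(st != NULL && (pages != NULL || n == 0));
	for (size_t i = 0; i < n; i++) {
		if (!pages[i].mapped) {
			if (!handle_open) {
				st->journal_starts++;	/* stands in for journal_start() */
				handle_open = true;
			}
			pages[i].mapped = true;		/* stands in for block allocation */
			st->allocations++;
		}
		st->pages_written++;
	}
	/* A real version would call journal_stop() here if handle_open. */
}
```

When every page is already mapped, the loop never opens a transaction, which is the common case the message above describes.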
* Re: [PATCH] ext2: Use page_mkwrite vma_operations to get mmap write notification.
From: Jan Kara @ 2008-06-12 16:17 UTC
To: Andrew Morton; +Cc: Aneesh Kumar K.V, cmm, linux-ext4, linux-mm, linux-kernel

On Wed 11-06-08 12:07:49, Andrew Morton wrote:
> On Wed, 11 Jun 2008 20:38:45 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
> > On Thu, Jun 05, 2008 at 12:30:45PM -0700, Andrew Morton wrote:
> > > On Thu, 5 Jun 2008 22:35:12 +0530
> > > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > >
> > > > We would like to get notified when we are doing a write on mmap
> > > > section. The changes are needed to handle ENOSPC when writing to an
> > > > mmap section of files with holes.
> > > >
> > >
> > > Whoa. You didn't copy anything like enough mailing lists for a change
> > > of this magnitude. I added some.
> > >
> > > This is a large change in behaviour!
> > >
> > > a) applications will now get a synchronous SIGBUS when modifying a
> > >    page over an ENOSPC filesystem. Whereas previously they could have
> > >    proceeded to completion and then detected the error via an fsync().
> >
> > Or not detect the error at all if we don't call fsync() right ? Isn't a
> > synchronous SIGBUS the right behaviour ?
> >
>
> Not according to POSIX. Or at least posix-several-years-ago, when this
> last was discussed. The spec doesn't have much useful to say about any
> of this.
>
> It's a significant change in the userspace interface.
>
> >
> > >
> > >    It's going to take more than one skimpy little paragraph to
> > >    justify this, and to demonstrate that it is preferable, and to
> > >    convince us that nothing will break from this user-visible behaviour
> > >    change.
> > >
> > > b) we're now doing fs operations (and some I/O) in the pagefault
> > >    code. This has several implications:
> > >
> > >    - performance changes
> > >
> > >    - potential for deadlocks when a process takes the fault from
> > >      within a copy_to_user() in, say, mm/filemap.c
> > >
> > >    - performing additional memory allocations within that
> > >      copy_to_user(). Possibility that these will reenter the
> > >      filesystem.
> > >
> > > And that's just ext2.
> > >
> > > For ext3 things are even more complex, because we have the
> > > journal_start/journal_end pair which is effectively another "lock" for
> > > ranking/deadlock purposes. And now we're taking i_alloc_sem and
> > > lock_page and we're doing ->writepage() and its potential
> > > journal_start(), all potentially within the context of a
> > > copy_to_user().
> >
> > One of the reason why we would need this in ext3/ext4 is that we cannot
> > do block allocation in the writepage with the recent locking changes.
>
> Perhaps those recent locking changes were wrong.

Well, the locking changes are those reverting locking ordering of
transaction start and page lock - we have them in ext4 and Aneesh seems to
be looking into porting them to ext3 (at least ordered mode rewrite needs
them). I wouldn't say they are wrong in principle. It's easier to use
page_mkwrite() to allocate blocks so that later in writepage() we don't
have to do block allocation which needs to start a transaction (because
that means unlocking the page which gets quickly nasty to handle
properly...).

BTW: XFS, OCFS2 or GFS2 define page_mkwrite() in this manner so they do
return SIGBUS when you run out of space when writing to mmapped hole. So
it's not like this change is introducing completely new behavior... I can
understand that we might not want to change the behavior for ext2 or ext3
but ext4 is IMO definitely free to choose.

> > The locking changes involve changing the locking order of journal_start
> > and page_lock. With writepage we are already called with page_lock and
> > we can't start new transaction needed for block allocation.
>
> ext3_write_begin() has journal_start() nesting inside the lock_page().
>
> > But if we agree that we should not do block allocation in page_mkwrite
> > we need to add writepages and allocate blocks in writepages.
>
> I'm not sure what writepages has to do with pagefaults?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [PATCH] ext2: Use page_mkwrite vma_operations to get mmap write notification.
From: Dave Chinner @ 2008-06-22 22:50 UTC
To: Jan Kara
Cc: Andrew Morton, Aneesh Kumar K.V, cmm, linux-ext4, linux-mm, linux-kernel

On Thu, Jun 12, 2008 at 06:17:06PM +0200, Jan Kara wrote:
> BTW: XFS, OCFS2 or GFS2 define page_mkwrite() in this manner so they do
> return SIGBUS when you run out of space when writing to mmapped hole. So
> it's not like this change is introducing completely new behavior... I can
> understand that we might not want to change the behavior for ext2 or ext3
> but ext4 is IMO definitely free to choose.

Yup, and it's the only sane behaviour, IMO. Letting the application
continue to oversubscribe filesystem space and then throwing away
the data that can't be written well after the fact (potentially
after the application has gone away) is a horrendously bad failure
mode.

This was one of the main publicised features of ->page_mkwrite() -
that it would allow up front detection of ENOSPC conditions during
mmap writes. I'm extremely surprised to see that this is being
considered undesirable after all this time....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
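
Editor's sketch (not from the thread): the two failure modes debated above can be observed from user space with a probe like the one below. It stores one byte into a hole of a sparse file through a shared mapping and reports whether the store raised SIGBUS or whether the outcome only became visible at msync()/fsync(). The helper name `store_into_hole` and its return convention are assumptions made for illustration. Which branch a full filesystem actually takes depends on whether that filesystem allocates blocks at fault time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

static sigjmp_buf hole_jmp;

static void hole_sigbus(int sig)
{
	(void)sig;
	siglongjmp(hole_jmp, 1);
}

/*
 * Store one byte into a hole of a sparse file through a shared mapping.
 * Returns 0 if the store and the following msync()/fsync() succeed,
 * 1 if the store itself raised SIGBUS, -1 on any other error
 * (including an error reported only at msync()/fsync() time).
 */
static int store_into_hole(int fd, off_t file_size)
{
	struct sigaction sa, old;
	volatile int rc = -1;
	char *map;

	assert(file_size > 0);
	if (ftruncate(fd, file_size) != 0)	/* grows the file without allocating blocks */
		return -1;
	map = mmap(NULL, (size_t)file_size, PROT_READ | PROT_WRITE,
		   MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = hole_sigbus;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) != 0) {
		munmap(map, (size_t)file_size);
		return -1;
	}

	if (sigsetjmp(hole_jmp, 1) == 0) {
		map[0] = 'x';	/* write fault; a page_mkwrite hook would run here */
		if (msync(map, (size_t)file_size, MS_SYNC) == 0 && fsync(fd) == 0)
			rc = 0;
	} else {
		rc = 1;		/* the failure was reported synchronously */
	}

	sigaction(SIGBUS, &old, NULL);
	munmap(map, (size_t)file_size);
	return rc;
}
```

An application that never calls msync() or fsync() would see nothing in the deferred case, which is the concern raised earlier in the thread.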
* Patches for the patchqueue
@ 2008-06-06 18:24 Aneesh Kumar K.V
  2008-06-06 18:24 ` [PATCH] ext4: cleanup blockallocator Aneesh Kumar K.V
  0 siblings, 1 reply; 11+ messages in thread
From: Aneesh Kumar K.V @ 2008-06-06 18:24 UTC (permalink / raw)
  To: cmm, tytso, sandeen; +Cc: linux-ext4

I address most of the comments from the last review. The updated patches
are sent as a follow up to this mail. Also the patches and the series
file which indicates their respective ordering in the patchqueue can be
found at

http://www.radian.org/~kvaneesh/ext4/jun-6-2008/

-aneesh

^ permalink raw reply	[flat|nested] 11+ messages in thread
* [PATCH] ext4: cleanup blockallocator
  2008-06-06 18:24 Patches for the patchqueue Aneesh Kumar K.V
@ 2008-06-06 18:24 ` Aneesh Kumar K.V
  2008-06-06 18:24   ` [PATCH] ext2: Use page_mkwrite vma_operations to get mmap write notification Aneesh Kumar K.V
  0 siblings, 1 reply; 11+ messages in thread
From: Aneesh Kumar K.V @ 2008-06-06 18:24 UTC (permalink / raw)
  To: cmm, tytso, sandeen; +Cc: linux-ext4, Aneesh Kumar K.V

Move the code for block allocation to a single function and add helpers
for the allocation of data and meta data blocks

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext4/balloc.c  |   74 ++++++++++++++++++++--------------------------------
 fs/ext4/ext4.h    |    2 +-
 fs/ext4/mballoc.c |    2 +-
 3 files changed, 31 insertions(+), 47 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index b961ad1..10c2d49 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -1645,7 +1645,7 @@ int ext4_should_retry_alloc(struct super_block *sb, int *retries)
 }
 
 /**
- * ext4_new_blocks_old() -- core block(s) allocation function
+ * ext4_orlov_new_blocks() -- core block(s) allocation function
  * @handle:		handle to this transaction
  * @inode:		file inode
  * @goal:		given target block(filesystem wide)
@@ -1658,7 +1658,7 @@ int ext4_should_retry_alloc(struct super_block *sb, int *retries)
  * any specific goal block.
  *
  */
-ext4_fsblk_t ext4_new_blocks_old(handle_t *handle, struct inode *inode,
+ext4_fsblk_t ext4_orlov_new_blocks(handle_t *handle, struct inode *inode,
 			ext4_fsblk_t goal, unsigned long *count, int *errp)
 {
 	struct buffer_head *bitmap_bh = NULL;
@@ -1928,55 +1928,17 @@ ext4_fsblk_t ext4_new_blocks_old(handle_t *handle, struct inode *inode,
 	return 0;
 }
 
-ext4_fsblk_t ext4_new_meta_block(handle_t *handle, struct inode *inode,
-		ext4_fsblk_t goal, int *errp)
-{
-	struct ext4_allocation_request ar;
-	ext4_fsblk_t ret;
-
-	if (!test_opt(inode->i_sb, MBALLOC)) {
-		unsigned long count = 1;
-		ret = ext4_new_blocks_old(handle, inode, goal, &count, errp);
-		return ret;
-	}
+#define EXT4_META_BLOCK 0x1
 
-	memset(&ar, 0, sizeof(ar));
-	ar.inode = inode;
-	ar.goal = goal;
-	ar.len = 1;
-	ret = ext4_mb_new_blocks(handle, &ar, errp);
-	return ret;
-}
-ext4_fsblk_t ext4_new_meta_blocks(handle_t *handle, struct inode *inode,
-		ext4_fsblk_t goal, unsigned long *count, int *errp)
-{
-	struct ext4_allocation_request ar;
-	ext4_fsblk_t ret;
-
-	if (!test_opt(inode->i_sb, MBALLOC)) {
-		ret = ext4_new_blocks_old(handle, inode, goal, count, errp);
-		return ret;
-	}
-
-	memset(&ar, 0, sizeof(ar));
-	ar.inode = inode;
-	ar.goal = goal;
-	ar.len = *count;
-	ret = ext4_mb_new_blocks(handle, &ar, errp);
-	*count = ar.len;
-	return ret;
-}
-
-ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode,
+static ext4_fsblk_t do_blk_alloc(handle_t *handle, struct inode *inode,
 				ext4_lblk_t iblock, ext4_fsblk_t goal,
-				unsigned long *count, int *errp)
+				unsigned long *count, int *errp, int flags)
 {
 	struct ext4_allocation_request ar;
 	ext4_fsblk_t ret;
 
 	if (!test_opt(inode->i_sb, MBALLOC)) {
-		ret = ext4_new_blocks_old(handle, inode, goal, count, errp);
-		return ret;
+		return ext4_orlov_new_blocks(handle, inode, goal, count, errp);
 	}
 
 	memset(&ar, 0, sizeof(ar));
@@ -1990,7 +1952,7 @@ ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode,
 	ar.goal = goal;
 	ar.len = *count;
 	ar.logical = iblock;
-	if (S_ISREG(inode->i_mode))
+	if (S_ISREG(inode->i_mode) && !(flags & EXT4_META_BLOCK))
 		ar.flags = EXT4_MB_HINT_DATA;
 	else
 		/* disable in-core preallocation for non-regular files */
@@ -2001,6 +1963,28 @@ ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode,
 }
 
 
+ext4_fsblk_t ext4_new_meta_block(handle_t *handle, struct inode *inode,
+		ext4_fsblk_t goal, int *errp)
+{
+	unsigned long count = 1;
+	return do_blk_alloc(handle, inode, 0, goal,
+			&count, errp, EXT4_META_BLOCK);
+}
+
+ext4_fsblk_t ext4_new_meta_blocks(handle_t *handle, struct inode *inode,
+		ext4_fsblk_t goal, unsigned long *count, int *errp)
+{
+	return do_blk_alloc(handle, inode, 0, goal,
+			count, errp, EXT4_META_BLOCK);
+}
+
+ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode,
+				ext4_lblk_t iblock, ext4_fsblk_t goal,
+				unsigned long *count, int *errp)
+{
+	return do_blk_alloc(handle, inode, iblock, goal, count, errp, 0);
+}
+
 /**
  * ext4_count_free_blocks() -- count filesystem free blocks
  * @sb:		superblock
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b3e62b7..e70ab6e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -977,7 +977,7 @@ extern ext4_fsblk_t ext4_new_meta_blocks(handle_t *handle, struct inode *inode,
 extern ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode,
 				ext4_lblk_t iblock, ext4_fsblk_t goal,
 				unsigned long *count, int *errp);
-extern ext4_fsblk_t ext4_new_blocks_old(handle_t *handle, struct inode *inode,
+extern ext4_fsblk_t ext4_orlov_new_blocks(handle_t *handle, struct inode *inode,
 			ext4_fsblk_t goal, unsigned long *count, int *errp);
 extern void ext4_free_blocks (handle_t *handle, struct inode *inode,
 			ext4_fsblk_t block, unsigned long count, int metadata);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 21a9e04..0011374 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4035,7 +4035,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
 	sbi = EXT4_SB(sb);
 
 	if (!test_opt(sb, MBALLOC)) {
-		block = ext4_new_blocks_old(handle, ar->inode, ar->goal,
+		block = ext4_orlov_new_blocks(handle, ar->inode, ar->goal,
 					    &(ar->len), errp);
 		return block;
 	}
-- 
1.5.5.1.357.g1af8b.dirty

^ permalink raw reply related	[flat|nested] 11+ messages in thread
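The shape of the refactoring above (three public entry points funnelling into one static helper that takes a flags argument) can be modelled in user space. This is only an illustrative sketch, not ext4 code: every `demo_`/`DEMO_` name is hypothetical, the request struct is cut down to three fields, and no allocation is performed. It shows only how the metadata flag suppresses the data hint.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for EXT4_META_BLOCK and EXT4_MB_HINT_DATA. */
#define DEMO_META_BLOCK	0x1
#define DEMO_HINT_DATA	0x20

/* Simplified request: only the fields this sketch needs. */
struct demo_request {
	unsigned long logical;	/* logical block within the file */
	unsigned long len;	/* number of blocks requested */
	unsigned int hints;	/* allocator hints derived from the caller */
};

/* Single helper that every entry point funnels into. */
static struct demo_request demo_do_alloc(bool regular_file,
					 unsigned long logical,
					 unsigned long count, int flags)
{
	struct demo_request ar = { .logical = logical, .len = count, .hints = 0 };

	/* Data hint only for regular-file data, never for metadata. */
	if (regular_file && !(flags & DEMO_META_BLOCK))
		ar.hints = DEMO_HINT_DATA;
	return ar;
}

/* One metadata block: count fixed at 1, no logical block. */
static struct demo_request demo_new_meta_block(bool regular_file)
{
	return demo_do_alloc(regular_file, 0, 1, DEMO_META_BLOCK);
}

/* Several metadata blocks. */
static struct demo_request demo_new_meta_blocks(bool regular_file,
						unsigned long count)
{
	return demo_do_alloc(regular_file, 0, count, DEMO_META_BLOCK);
}

/* Data blocks: no flags, so regular files get the data hint. */
static struct demo_request demo_new_blocks(bool regular_file,
					   unsigned long logical,
					   unsigned long count)
{
	return demo_do_alloc(regular_file, logical, count, 0);
}
```

With this layout the wrappers differ only in the arguments they pass, so a policy change such as a new hint would need to touch just the shared helper.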
* [PATCH] ext2: Use page_mkwrite vma_operations to get mmap write notification.
  2008-06-06 18:24 ` [PATCH] ext4: cleanup blockallocator Aneesh Kumar K.V
@ 2008-06-06 18:24 ` Aneesh Kumar K.V
  0 siblings, 0 replies; 11+ messages in thread
From: Aneesh Kumar K.V @ 2008-06-06 18:24 UTC (permalink / raw)
  To: cmm, tytso, sandeen; +Cc: linux-ext4, Aneesh Kumar K.V

We would like to get notified when we are doing a write on mmap
section. The changes are needed to handle ENOSPC when writing to an
mmap section of files with holes.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext2/ext2.h  |    1 +
 fs/ext2/file.c  |   21 ++++++++++++++++++++-
 fs/ext2/inode.c |    5 +++++
 3 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 47d88da..cc2e106 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -136,6 +136,7 @@ extern void ext2_get_inode_flags(struct ext2_inode_info *);
 int __ext2_write_begin(struct file *file, struct address_space *mapping,
 		loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, void **fsdata);
+extern int ext2_page_mkwrite(struct vm_area_struct *vma, struct page *page);
 
 /* ioctl.c */
 extern long ext2_ioctl(struct file *, unsigned int, unsigned long);
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index 5f2fa9c..d539dcf 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -18,6 +18,7 @@
  * 	(jj@sunsite.ms.mff.cuni.cz)
  */
 
+#include <linux/mm.h>
 #include <linux/time.h>
 #include "ext2.h"
 #include "xattr.h"
@@ -38,6 +39,24 @@ static int ext2_release_file (struct inode * inode, struct file * filp)
 	return 0;
 }
 
+static struct vm_operations_struct ext2_file_vm_ops = {
+	.fault		= filemap_fault,
+	.page_mkwrite	= ext2_page_mkwrite,
+};
+
+static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	if (!mapping->a_ops->readpage)
+		return -ENOEXEC;
+	file_accessed(file);
+	vma->vm_ops = &ext2_file_vm_ops;
+	vma->vm_flags |= VM_CAN_NONLINEAR;
+	return 0;
+}
+
+
 /*
  * We have mostly NULL's here: the current defaults are ok for
  * the ext2 filesystem.
@@ -52,7 +71,7 @@ static int ext2_release_file (struct inode * inode, struct file * filp)
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= ext2_compat_ioctl,
 #endif
-	.mmap		= generic_file_mmap,
+	.mmap		= ext2_file_mmap,
 	.open		= generic_file_open,
 	.release	= ext2_release_file,
 	.fsync		= ext2_sync_file,
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 384fc0d..d4c5c23 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1443,3 +1443,8 @@ int ext2_setattr(struct dentry *dentry, struct iattr *iattr)
 		error = ext2_acl_chmod(inode);
 	return error;
 }
+
+int ext2_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+	return block_page_mkwrite(vma, page, ext2_get_block);
+}
-- 
1.5.5.1.357.g1af8b.dirty

^ permalink raw reply related	[flat|nested] 11+ messages in thread
* [PATCH] ext3: Return EIO if new block is allocated from system zone.
@ 2008-03-24 17:04 Aneesh Kumar K.V
  2008-03-24 17:04 ` [PATCH] ext3: Use page_mkwrite vma_operations to get mmap write notification Aneesh Kumar K.V
  0 siblings, 1 reply; 11+ messages in thread
From: Aneesh Kumar K.V @ 2008-03-24 17:04 UTC (permalink / raw)
  To: cmm, akpm; +Cc: linux-ext4, Aneesh Kumar K.V, Mingming Cao

If the block allocator gets blocks out of system zone ext3 calls
ext3_error. But if the file system is mounted with errors=continue
return with -EIO.

System zone is the block range mapping block bitmap, inode bitmap and
inode table.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.vnet.ibm.com>
---
 fs/ext3/balloc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
index da0cb2c..6ce7f7d 100644
--- a/fs/ext3/balloc.c
+++ b/fs/ext3/balloc.c
@@ -1642,7 +1642,7 @@ allocated:
 			    "Allocating block in system zone - "
 			    "blocks from "E3FSBLK", length %lu",
 			     ret_block, num);
-		goto out;
+		goto io_error;
 	}
 
 	performed_allocation = 1;
-- 
1.5.5.rc0.16.g02b00.dirty

^ permalink raw reply related	[flat|nested] 11+ messages in thread
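The check this patch changes the outcome of can be illustrated with a small user-space model. It is a simplified sketch, not ext3's code: the `demo_` names are hypothetical, the system zone is passed in as plain half-open block ranges rather than read from a group descriptor, and the only behaviour taken from the commit message is that an allocation touching the system zone is reported as -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* A half-open block range [start, start + len). */
struct demo_range {
	unsigned long start;
	unsigned long len;
};

/* Non-zero when the two ranges share at least one block. */
static int demo_ranges_overlap(struct demo_range a, struct demo_range b)
{
	return a.start < b.start + b.len && b.start < a.start + a.len;
}

/*
 * Return 0 if the allocation is clear of every system range,
 * -EIO if it touches any of them.
 */
static int demo_check_allocation(struct demo_range alloc,
				 const struct demo_range *system, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (demo_ranges_overlap(alloc, system[i]))
			return -EIO;
	return 0;
}
```

In the example layout used for testing, blocks 100 and 101 stand for the two bitmaps and blocks 102 to 133 for the inode table; those numbers are made up.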
* [PATCH] ext3: Use page_mkwrite vma_operations to get mmap write notification.
  2008-03-24 17:04 [PATCH] ext3: Return EIO if new block is allocated from system zone Aneesh Kumar K.V
@ 2008-03-24 17:04 ` Aneesh Kumar K.V
  2008-03-24 17:04   ` [PATCH] ext4: Export needed symbol for ZERO_PAGE usage in modules Aneesh Kumar K.V
  0 siblings, 1 reply; 11+ messages in thread
From: Aneesh Kumar K.V @ 2008-03-24 17:04 UTC (permalink / raw)
  To: cmm, akpm; +Cc: linux-ext4, Aneesh Kumar K.V

We would like to get notified when we are doing a write on mmap
section. The changes are needed to handle ENOSPC when writing to an
mmap section of files with holes.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext3/file.c          |   19 ++++++++++++++++++-
 fs/ext3/inode.c         |    5 +++++
 include/linux/ext3_fs.h |    1 +
 3 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/fs/ext3/file.c b/fs/ext3/file.c
index acc4913..09e22e4 100644
--- a/fs/ext3/file.c
+++ b/fs/ext3/file.c
@@ -106,6 +106,23 @@ force_commit:
 	return ret;
 }
 
+static struct vm_operations_struct ext3_file_vm_ops = {
+	.fault		= filemap_fault,
+	.page_mkwrite	= ext3_page_mkwrite,
+};
+
+static int ext3_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	if (!mapping->a_ops->readpage)
+		return -ENOEXEC;
+	file_accessed(file);
+	vma->vm_ops = &ext3_file_vm_ops;
+	vma->vm_flags |= VM_CAN_NONLINEAR;
+	return 0;
+}
+
 const struct file_operations ext3_file_operations = {
 	.llseek		= generic_file_llseek,
 	.read		= do_sync_read,
@@ -116,7 +133,7 @@ const struct file_operations ext3_file_operations = {
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= ext3_compat_ioctl,
 #endif
-	.mmap		= generic_file_mmap,
+	.mmap		= ext3_file_mmap,
 	.open		= generic_file_open,
 	.release	= ext3_release_file,
 	.fsync		= ext3_sync_file,
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index eb95670..2293506 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -3306,3 +3306,8 @@ int ext3_change_inode_journal_flag(struct inode *inode, int val)
 
 	return err;
 }
+
+int ext3_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+	return block_page_mkwrite(vma, page, ext3_get_block);
+}
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index 36c5403..715c35e 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -836,6 +836,7 @@ extern void ext3_truncate (struct inode *);
 extern void ext3_set_inode_flags(struct inode *);
 extern void ext3_get_inode_flags(struct ext3_inode_info *);
 extern void ext3_set_aops(struct inode *inode);
+extern int ext3_page_mkwrite(struct vm_area_struct *vma, struct page *page);
 
 /* ioctl.c */
 extern int ext3_ioctl (struct inode *, struct file *, unsigned int,
-- 
1.5.5.rc0.16.g02b00.dirty

^ permalink raw reply related	[flat|nested] 11+ messages in thread
* [PATCH] ext4: Export needed symbol for ZERO_PAGE usage in modules.
  2008-03-24 17:04 ` [PATCH] ext3: Use page_mkwrite vma_operations to get mmap write notification Aneesh Kumar K.V
@ 2008-03-24 17:04 ` Aneesh Kumar K.V
  2008-03-24 17:04   ` [PATCH] ext2: Use page_mkwrite vma_operations to get mmap write notification Aneesh Kumar K.V
  0 siblings, 1 reply; 11+ messages in thread
From: Aneesh Kumar K.V @ 2008-03-24 17:04 UTC (permalink / raw)
  To: cmm, akpm; +Cc: linux-ext4, Aneesh Kumar K.V

ext4 use ZERO_PAGE(0) to zero out blocks. We need to export
different symbols in different arch for the usage of ZERO_PAGE
in modules.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/arm/mm/mmu.c               |    1 +
 arch/m68k/mm/init.c             |    1 +
 arch/powerpc/kernel/ppc_ksyms.c |    2 ++
 arch/sparc/kernel/sparc_ksyms.c |    2 ++
 arch/sparc64/mm/init.c          |    1 +
 5 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index d41a75e..2d6d682 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -35,6 +35,7 @@ extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
  * zero-initialized data and COW.
  */
 struct page *empty_zero_page;
+EXPORT_SYMBOL(empty_zero_page);
 
 /*
  * The pmd table for the upper-most set of pages.
diff --git a/arch/m68k/mm/init.c b/arch/m68k/mm/init.c
index f42caa7..ee27ed2 100644
--- a/arch/m68k/mm/init.c
+++ b/arch/m68k/mm/init.c
@@ -69,6 +69,7 @@ void __init m68k_setup_node(int node)
  */
 
 void *empty_zero_page;
+EXPORT_SYMBOL(empty_zero_page);
 
 void show_mem(void)
 {
diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index 9c98424..93fb09a 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -192,3 +192,5 @@ EXPORT_SYMBOL(intercept_table);
 EXPORT_SYMBOL(__mtdcr);
 EXPORT_SYMBOL(__mfdcr);
 #endif
+
+EXPORT_SYMBOL(empty_zero_page);
diff --git a/arch/sparc/kernel/sparc_ksyms.c b/arch/sparc/kernel/sparc_ksyms.c
index c1025e5..e902846 100644
--- a/arch/sparc/kernel/sparc_ksyms.c
+++ b/arch/sparc/kernel/sparc_ksyms.c
@@ -295,3 +295,5 @@ EXPORT_SYMBOL(do_BUG);
 
 /* Sun Power Management Idle Handler */
 EXPORT_SYMBOL(pm_idle);
+
+EXPORT_SYMBOL(empty_zero_page);
diff --git a/arch/sparc64/mm/init.c b/arch/sparc64/mm/init.c
index b5c3041..2d15f92 100644
--- a/arch/sparc64/mm/init.c
+++ b/arch/sparc64/mm/init.c
@@ -159,6 +159,7 @@ extern unsigned int sparc_ramdisk_image;
 extern unsigned int sparc_ramdisk_size;
 
 struct page *mem_map_zero __read_mostly;
+EXPORT_SYMBOL(mem_map_zero);
 
 unsigned int sparc64_highest_unlocked_tlb_ent __read_mostly;
 
-- 
1.5.5.rc0.16.g02b00.dirty

^ permalink raw reply related	[flat|nested] 11+ messages in thread
* [PATCH] ext2: Use page_mkwrite vma_operations to get mmap write notification.
  2008-03-24 17:04 ` [PATCH] ext4: Export needed symbol for ZERO_PAGE usage in modules Aneesh Kumar K.V
@ 2008-03-24 17:04 ` Aneesh Kumar K.V
  0 siblings, 0 replies; 11+ messages in thread
From: Aneesh Kumar K.V @ 2008-03-24 17:04 UTC (permalink / raw)
  To: cmm, akpm; +Cc: linux-ext4, Aneesh Kumar K.V

We would like to get notified when we are doing a write on mmap
section. The changes are needed to handle ENOSPC when writing to an
mmap section of files with holes.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext2/ext2.h  |    1 +
 fs/ext2/file.c  |   21 ++++++++++++++++++++-
 fs/ext2/inode.c |    5 +++++
 3 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 47d88da..cc2e106 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -136,6 +136,7 @@ extern void ext2_get_inode_flags(struct ext2_inode_info *);
 int __ext2_write_begin(struct file *file, struct address_space *mapping,
 		loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, void **fsdata);
+extern int ext2_page_mkwrite(struct vm_area_struct *vma, struct page *page);
 
 /* ioctl.c */
 extern long ext2_ioctl(struct file *, unsigned int, unsigned long);
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index 5f2fa9c..d539dcf 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -18,6 +18,7 @@
  * 	(jj@sunsite.ms.mff.cuni.cz)
  */
 
+#include <linux/mm.h>
 #include <linux/time.h>
 #include "ext2.h"
 #include "xattr.h"
@@ -38,6 +39,24 @@ static int ext2_release_file (struct inode * inode, struct file * filp)
 	return 0;
 }
 
+static struct vm_operations_struct ext2_file_vm_ops = {
+	.fault		= filemap_fault,
+	.page_mkwrite	= ext2_page_mkwrite,
+};
+
+static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	if (!mapping->a_ops->readpage)
+		return -ENOEXEC;
+	file_accessed(file);
+	vma->vm_ops = &ext2_file_vm_ops;
+	vma->vm_flags |= VM_CAN_NONLINEAR;
+	return 0;
+}
+
+
 /*
  * We have mostly NULL's here: the current defaults are ok for
  * the ext2 filesystem.
@@ -52,7 +71,7 @@ const struct file_operations ext2_file_operations = {
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= ext2_compat_ioctl,
 #endif
-	.mmap		= generic_file_mmap,
+	.mmap		= ext2_file_mmap,
 	.open		= generic_file_open,
 	.release	= ext2_release_file,
 	.fsync		= ext2_sync_file,
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index c620068..196e063 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1444,3 +1444,8 @@ int ext2_setattr(struct dentry *dentry, struct iattr *iattr)
 		error = ext2_acl_chmod(inode);
 	return error;
 }
+
+int ext2_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+	return block_page_mkwrite(vma, page, ext2_get_block);
+}
-- 
1.5.5.rc0.16.g02b00.dirty

^ permalink raw reply related	[flat|nested] 11+ messages in thread