linux-ext4.vger.kernel.org archive mirror
* [PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification.
@ 2008-02-18 11:53 Aneesh Kumar K.V
  2008-02-18 11:53 ` [PATCH] ext4: Don't mark the filesystem with errors if we fail to fallocate Aneesh Kumar K.V
  0 siblings, 1 reply; 8+ messages in thread
From: Aneesh Kumar K.V @ 2008-02-18 11:53 UTC (permalink / raw)
  To: tytso, cmm; +Cc: linux-ext4, Aneesh Kumar K.V

We would like to be notified when a write is done to an mmap'ed section.
This is needed for preallocated areas: in the callback we split the
preallocated area into an initialized extent and an uninitialized extent.
This lets us handle ENOSPC better. Otherwise we get ENOSPC in writepage,
which would result in data loss.
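
For readers following along, the user-space side of this scenario can be sketched roughly as below. This is only an illustration, not part of the patch: the function name and the /tmp path template are hypothetical, and posix_fallocate() stands in for whatever preallocation call the application uses. The first store into the shared mapping takes a write fault, which is the point where a filesystem that implements ->page_mkwrite gets its notification.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Preallocate len bytes (len must be > 0), map the file shared and
 * store into it. Returns 0 on success, -1 on any failure.
 * The name and the path template are hypothetical.
 */
int prealloc_then_mmap_write(size_t len)
{
	char path[] = "/tmp/mkwrite-demo-XXXXXX";
	int fd = mkstemp(path);
	int ret = -1;
	char *map;

	if (fd < 0)
		return -1;
	unlink(path);	/* the file goes away once fd is closed */

	/* Reserve space up front, before any data is written. */
	if (posix_fallocate(fd, 0, (off_t)len) != 0)
		goto out;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	/* First store to each page faults it in for writing. */
	memset(map, 0xab, len);

	/* Push the dirty pages out; an allocation failure would surface here. */
	if (msync(map, len, MS_SYNC) == 0)
		ret = 0;
	munmap(map, len);
out:
	close(fd);
	return ret;
}
```

Calling prealloc_then_mmap_write(8192) from a small main() is enough to exercise the path on a mounted test filesystem.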

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext4/file.c          |   19 ++++++++++++++++++-
 fs/ext4/inode.c         |    6 ++++++
 include/linux/ext4_fs.h |    1 +
 3 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 20507a2..77341c1 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -123,6 +123,23 @@ force_commit:
 	return ret;
 }
 
+static struct vm_operations_struct ext4_file_vm_ops = {
+	.fault		= filemap_fault,
+	.page_mkwrite   = ext4_page_mkwrite,
+};
+
+static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	if (!mapping->a_ops->readpage)
+		return -ENOEXEC;
+	file_accessed(file);
+	vma->vm_ops = &ext4_file_vm_ops;
+	vma->vm_flags |= VM_CAN_NONLINEAR;
+	return 0;
+}
+
 const struct file_operations ext4_file_operations = {
 	.llseek		= generic_file_llseek,
 	.read		= do_sync_read,
@@ -133,7 +150,7 @@ const struct file_operations ext4_file_operations = {
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= ext4_compat_ioctl,
 #endif
-	.mmap		= generic_file_mmap,
+	.mmap		= ext4_file_mmap,
 	.open		= generic_file_open,
 	.release	= ext4_release_file,
 	.fsync		= ext4_sync_file,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 34f3eb6..81faa67 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3466,3 +3466,9 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 
 	return err;
 }
+
+int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+	return block_page_mkwrite(vma, page, ext4_get_block);
+}
+
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index 22810b1..8f5a563 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -1059,6 +1059,7 @@ extern void ext4_set_aops(struct inode *inode);
 extern int ext4_writepage_trans_blocks(struct inode *);
 extern int ext4_block_truncate_page(handle_t *handle, struct page *page,
 		struct address_space *mapping, loff_t from);
+extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page);
 
 /* ioctl.c */
 extern long ext4_ioctl(struct file *, unsigned int, unsigned long);
-- 
1.5.4.1.97.g40aab-dirty


* [PATCH] ext4: Don't mark the filesystem with errors if we fail to fallocate.
  2008-02-18 11:53 [PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification Aneesh Kumar K.V
@ 2008-02-18 11:53 ` Aneesh Kumar K.V
  0 siblings, 0 replies; 8+ messages in thread
From: Aneesh Kumar K.V @ 2008-02-18 11:53 UTC (permalink / raw)
  To: tytso, cmm; +Cc: linux-ext4, Aneesh Kumar K.V

If fallocate fails, don't call ext4_error. Also, don't hide errors
returned by ext4_get_blocks_wrap.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext4/extents.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 5b22f71..bb01ac6 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2653,13 +2653,14 @@ retry:
 		ret = ext4_get_blocks_wrap(handle, inode, block,
 					  max_blocks, &map_bh,
 					  EXT4_CREATE_UNINITIALIZED_EXT, 0);
-		WARN_ON(ret <= 0);
 		if (ret <= 0) {
+#ifdef EXT4FS_DEBUG
+			WARN_ON(ret <= 0);
 			ext4_error(inode->i_sb, "ext4_fallocate",
 				    "ext4_ext_get_blocks returned error: "
 				    "inode#%lu, block=%u, max_blocks=%lu",
 				    inode->i_ino, block, max_blocks);
-			ret = -EIO;
+#endif
 			ext4_mark_inode_dirty(handle, inode);
 			ret2 = ext4_journal_stop(handle);
 			break;
-- 
1.5.4.1.97.g40aab-dirty


* [PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification.
@ 2008-02-22 14:39 Aneesh Kumar K.V
  2008-02-22 18:10 ` Mingming Cao
  0 siblings, 1 reply; 8+ messages in thread
From: Aneesh Kumar K.V @ 2008-02-22 14:39 UTC (permalink / raw)
  To: cmm, tytso; +Cc: linux-ext4, Aneesh Kumar K.V

We would like to be notified when a write is done to an mmap'ed section.
This is needed for preallocated areas: in the callback we split the
preallocated area into an initialized extent and an uninitialized extent.
This lets us handle ENOSPC better. Otherwise we get ENOSPC in writepage,
which would result in data loss. The changes are also needed to handle
ENOSPC when writing to an mmap'ed section of a file with holes.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext4/file.c          |   19 ++++++++++++++-
 fs/ext4/inode.c         |   60 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/ext4_fs.h |    1 +
 3 files changed, 79 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 20507a2..77341c1 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -123,6 +123,23 @@ force_commit:
 	return ret;
 }
 
+static struct vm_operations_struct ext4_file_vm_ops = {
+	.fault		= filemap_fault,
+	.page_mkwrite   = ext4_page_mkwrite,
+};
+
+static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	if (!mapping->a_ops->readpage)
+		return -ENOEXEC;
+	file_accessed(file);
+	vma->vm_ops = &ext4_file_vm_ops;
+	vma->vm_flags |= VM_CAN_NONLINEAR;
+	return 0;
+}
+
 const struct file_operations ext4_file_operations = {
 	.llseek		= generic_file_llseek,
 	.read		= do_sync_read,
@@ -133,7 +150,7 @@ const struct file_operations ext4_file_operations = {
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= ext4_compat_ioctl,
 #endif
-	.mmap		= generic_file_mmap,
+	.mmap		= ext4_file_mmap,
 	.open		= generic_file_open,
 	.release	= ext4_release_file,
 	.fsync		= ext4_sync_file,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5b5d63d..00af97d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3490,3 +3490,63 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 
 	return err;
 }
+
+int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+	unsigned long end;
+	loff_t size;
+	handle_t *handle;
+	int ret = -EINVAL, needed_blocks;
+	struct file *file   = vma->vm_file;
+	struct inode *inode = file->f_path.dentry->d_inode;
+
+	needed_blocks = ext4_writepage_trans_blocks(inode);
+	/* We need to take inode mutex to prevent parallel write */
+	mutex_lock(&inode->i_mutex);
+	lock_page(page);
+	size = i_size_read(inode);
+	if ((page->mapping != inode->i_mapping) ||
+	    (page_offset(page) > size)) {
+		/* page got truncated out from underneath us */
+		goto out_unlock;
+	}
+
+	/* page is wholly or partially inside EOF */
+	if (((page->index + 1) << PAGE_CACHE_SHIFT) > size)
+		end = size & ~PAGE_CACHE_MASK;
+	else
+		end = PAGE_CACHE_SIZE;
+
+	handle = ext4_journal_start(inode, needed_blocks);
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		goto out_unlock;
+	}
+	/* Will zero out the pages if buffer is marked new */
+	ret = block_prepare_write(page, 0, end, ext4_get_block);
+
+	if (!ret && ext4_should_journal_data(inode)) {
+		ret = walk_page_buffers(handle, page_buffers(page),
+				0, end, NULL, do_journal_get_write_access);
+		if (!ret)
+			ret = walk_page_buffers(handle, page_buffers(page),
+						0, end, NULL, write_end_fn);
+		/*
+		 * we don't want to call block_commit_write in journalled mode
+		 */
+		ext4_journal_stop(handle);
+		goto out_unlock;
+	}
+	if (!ret && ext4_should_order_data(inode)) {
+		ret = walk_page_buffers(handle, page_buffers(page),
+					0, end, NULL, ext4_journal_dirty_data);
+	}
+	if (!ret)
+		ret = block_commit_write(page, 0, end);
+
+	ext4_journal_stop(handle);
+out_unlock:
+	unlock_page(page);
+	mutex_unlock(&inode->i_mutex);
+	return ret;
+}
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index 22810b1..8f5a563 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -1059,6 +1059,7 @@ extern void ext4_set_aops(struct inode *inode);
 extern int ext4_writepage_trans_blocks(struct inode *);
 extern int ext4_block_truncate_page(handle_t *handle, struct page *page,
 		struct address_space *mapping, loff_t from);
+extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page);
 
 /* ioctl.c */
 extern long ext4_ioctl(struct file *, unsigned int, unsigned long);
-- 
1.5.4.1.97.g40aab-dirty


* Re: [PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification.
  2008-02-22 14:39 [PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification Aneesh Kumar K.V
@ 2008-02-22 18:10 ` Mingming Cao
  2008-02-22 18:23   ` Aneesh Kumar K.V
  0 siblings, 1 reply; 8+ messages in thread
From: Mingming Cao @ 2008-02-22 18:10 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: tytso, linux-ext4

On Fri, 2008-02-22 at 20:09 +0530, Aneesh Kumar K.V wrote:
> We would like to be notified when a write is done to an mmap'ed section.
> This is needed for preallocated areas: in the callback we split the
> preallocated area into an initialized extent and an uninitialized extent.
> This lets us handle ENOSPC better. Otherwise we get ENOSPC in writepage,
> which would result in data loss. The changes are also needed to handle
> ENOSPC when writing to an mmap'ed section of a file with holes.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  fs/ext4/file.c          |   19 ++++++++++++++-
>  fs/ext4/inode.c         |   60 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/ext4_fs.h |    1 +
>  3 files changed, 79 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index 20507a2..77341c1 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -123,6 +123,23 @@ force_commit:
>  	return ret;
>  }
> 
> +static struct vm_operations_struct ext4_file_vm_ops = {
> +	.fault		= filemap_fault,
> +	.page_mkwrite   = ext4_page_mkwrite,
> +};
> +
> +static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct address_space *mapping = file->f_mapping;
> +
> +	if (!mapping->a_ops->readpage)
> +		return -ENOEXEC;
> +	file_accessed(file);
> +	vma->vm_ops = &ext4_file_vm_ops;
> +	vma->vm_flags |= VM_CAN_NONLINEAR;
> +	return 0;
> +}
> +
>  const struct file_operations ext4_file_operations = {
>  	.llseek		= generic_file_llseek,
>  	.read		= do_sync_read,
> @@ -133,7 +150,7 @@ const struct file_operations ext4_file_operations = {
>  #ifdef CONFIG_COMPAT
>  	.compat_ioctl	= ext4_compat_ioctl,
>  #endif
> -	.mmap		= generic_file_mmap,
> +	.mmap		= ext4_file_mmap,
>  	.open		= generic_file_open,
>  	.release	= ext4_release_file,
>  	.fsync		= ext4_sync_file,
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 5b5d63d..00af97d 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3490,3 +3490,63 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
> 
>  	return err;
>  }
> +
> +int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
> +{
> +	unsigned long end;
> +	loff_t size;
> +	handle_t *handle;
> +	int ret = -EINVAL, needed_blocks;
> +	struct file *file   = vma->vm_file;
> +	struct inode *inode = file->f_path.dentry->d_inode;
> +
> +	needed_blocks = ext4_writepage_trans_blocks(inode);
> +	/* We need to take inode mutex to prevent parallel write */
> +	mutex_lock(&inode->i_mutex);
> +	lock_page(page);
> +	size = i_size_read(inode);
> +	if ((page->mapping != inode->i_mapping) ||
> +	    (page_offset(page) > size)) {
> +		/* page got truncated out from underneath us */
> +		goto out_unlock;
> +	}
> +
> +	/* page is wholly or partially inside EOF */
> +	if (((page->index + 1) << PAGE_CACHE_SHIFT) > size)
> +		end = size & ~PAGE_CACHE_MASK;
> +	else
> +		end = PAGE_CACHE_SIZE;
> +
> +	handle = ext4_journal_start(inode, needed_blocks);
> +	if (IS_ERR(handle)) {
> +		ret = PTR_ERR(handle);
> +		goto out_unlock;
> +	}
> +	/* Will zero out the pages if buffer is marked new */
> +	ret = block_prepare_write(page, 0, end, ext4_get_block);
> +
> +	if (!ret && ext4_should_journal_data(inode)) {
> +		ret = walk_page_buffers(handle, page_buffers(page),
> +				0, end, NULL, do_journal_get_write_access);
> +		if (!ret)
> +			ret = walk_page_buffers(handle, page_buffers(page),
> +						0, end, NULL, write_end_fn);
> +		/*
> +		 * we don't want to call block_commit_write in journalled mode
> +		 */
> +		ext4_journal_stop(handle);
> +		goto out_unlock;
> +	}
> +	if (!ret && ext4_should_order_data(inode)) {
> +		ret = walk_page_buffers(handle, page_buffers(page),
> +					0, end, NULL, ext4_journal_dirty_data);
> +	}
> +	if (!ret)
> +		ret = block_commit_write(page, 0, end);
> +
Hmm, it seems weird to do commit_write when the page is about to become
writable, but maybe that's the way it needs to be?

Don't we need to update i_size somewhere?

> +	ext4_journal_stop(handle);
> +out_unlock:
> +	unlock_page(page);
> +	mutex_unlock(&inode->i_mutex);
> +	return ret;
> +}

It seems this combines the prepare_write() code of all three journalling
modes here :(

Since prepare_write() and commit_write() are going to be sunset, why not
simply call mapping->a_ops->write_begin() and then write_end()? That
should take care of pretty much all the journalling and page operations,
no?

Mingming
> diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
> index 22810b1..8f5a563 100644
> --- a/include/linux/ext4_fs.h
> +++ b/include/linux/ext4_fs.h
> @@ -1059,6 +1059,7 @@ extern void ext4_set_aops(struct inode *inode);
>  extern int ext4_writepage_trans_blocks(struct inode *);
>  extern int ext4_block_truncate_page(handle_t *handle, struct page *page,
>  		struct address_space *mapping, loff_t from);
> +extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page);
> 
>  /* ioctl.c */
>  extern long ext4_ioctl(struct file *, unsigned int, unsigned long);


* Re: [PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification.
  2008-02-22 18:10 ` Mingming Cao
@ 2008-02-22 18:23   ` Aneesh Kumar K.V
  2008-02-22 19:28     ` Mingming Cao
  0 siblings, 1 reply; 8+ messages in thread
From: Aneesh Kumar K.V @ 2008-02-22 18:23 UTC (permalink / raw)
  To: Mingming Cao; +Cc: tytso, linux-ext4

On Fri, Feb 22, 2008 at 10:10:48AM -0800, Mingming Cao wrote:
> On Fri, 2008-02-22 at 20:09 +0530, Aneesh Kumar K.V wrote:

.....

> > +		ext4_journal_stop(handle);
> > +		goto out_unlock;
> > +	}
> > +	if (!ret && ext4_should_order_data(inode)) {
> > +		ret = walk_page_buffers(handle, page_buffers(page),
> > +					0, end, NULL, ext4_journal_dirty_data);
> > +	}
> > +	if (!ret)
> > +		ret = block_commit_write(page, 0, end);
> > +
> Hmm, it seems weird to do commit_write when the page is about to become
> writable, but maybe that's the way it needs to be?
> 
> Don't we need to update i_size somewhere?

block_commit_write simply iterates over the buffer_heads of the page and
marks them dirty. That is why we don't want to call it in data=journalled
mode.
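
As a rough user-space model of that behaviour (a simplified sketch only, not the kernel code: struct toy_bh, TOY_PAGE_SIZE and toy_commit_write are made-up names, and the real function deals with more state than two flags), the walk over the buffers of one page looks something like this:

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u	/* assumed page size for the model */

/* Hypothetical stand-in for the per-block state kept in a buffer_head. */
struct toy_bh {
	bool uptodate;
	bool dirty;
};

/*
 * Walk the blocks backing one page and mark dirty every block that
 * overlaps the byte range [from, to). Returns the number of blocks
 * marked. bhs must hold TOY_PAGE_SIZE / blocksize entries.
 */
unsigned toy_commit_write(struct toy_bh *bhs, unsigned blocksize,
			  unsigned from, unsigned to)
{
	unsigned nblocks = TOY_PAGE_SIZE / blocksize;
	unsigned marked = 0;
	unsigned i;

	for (i = 0; i < nblocks; i++) {
		unsigned start = i * blocksize;
		unsigned end = start + blocksize;

		/* Block lies entirely outside the range: leave it alone. */
		if (end <= from || start >= to)
			continue;

		bhs[i].uptodate = true;
		bhs[i].dirty = true;
		marked++;
	}
	return marked;
}
```

With 1024-byte blocks and the range [0, 1500), only the first two of the four blocks end up dirty. Nothing here touches a journal, which matches the point that this step is unsuitable for data=journalled mode.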

> 
> > +	ext4_journal_stop(handle);
> > +out_unlock:
> > +	unlock_page(page);
> > +	mutex_unlock(&inode->i_mutex);
> > +	return ret;
> > +}
> 
> It seems this combines the prepare_write() code of all three journalling
> modes here :(
> 
> Since prepare_write() and commit_write() are going to be sunset, why not
> simply call mapping->a_ops->write_begin() and then write_end()? That
> should take care of pretty much all the journalling and page operations,
> no?

write_begin and write_end work with a user-space buffer. In this case
we don't have one. Also, what ext4_page_mkwrite does is mostly what
write_begin/write_end do, except for the copy from the user-space buffer.


-aneesh


* Re: [PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification.
  2008-02-22 18:23   ` Aneesh Kumar K.V
@ 2008-02-22 19:28     ` Mingming Cao
  0 siblings, 0 replies; 8+ messages in thread
From: Mingming Cao @ 2008-02-22 19:28 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: tytso, linux-ext4

On Fri, 2008-02-22 at 23:53 +0530, Aneesh Kumar K.V wrote:
> On Fri, Feb 22, 2008 at 10:10:48AM -0800, Mingming Cao wrote:
> > On Fri, 2008-02-22 at 20:09 +0530, Aneesh Kumar K.V wrote:
> 
> .....
> 
> > > +		ext4_journal_stop(handle);
> > > +		goto out_unlock;
> > > +	}
> > > +	if (!ret && ext4_should_order_data(inode)) {
> > > +		ret = walk_page_buffers(handle, page_buffers(page),
> > > +					0, end, NULL, ext4_journal_dirty_data);
> > > +	}
> > > +	if (!ret)
> > > +		ret = block_commit_write(page, 0, end);
> > > +
> > Hmm, it seems weird to do commit_write when the page is about to become
> > writable, but maybe that's the way it needs to be?
> > 
> > Don't we need to update i_size somewhere?
> 
Ah, i_size doesn't change with mapped IO.

> block_commit_write simply iterates over the buffer_heads of the page and
> marks them dirty. That is why we don't want to call it in data=journalled
> mode.
> 

Right, but it still seems odd to mark the buffer_heads dirty *before*
the write happens.

I am confused: if i_size is not changing, then what are we journalling?
Is it to keep journal ordering? But we haven't written anything yet...
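
The i_size point can be checked from user space with a sketch like the following. It is only an illustration under stated assumptions: the function name and the /tmp path template are hypothetical, and initial_size must be greater than zero. Stores through a shared mapping land inside the existing file range, so fstat() reports the same size before and after.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a sparse file of initial_size bytes (> 0), store to its first
 * and last byte through a shared mapping, and return the file size
 * reported afterwards, or -1 on failure. The name is hypothetical.
 */
long size_after_mmap_store(long initial_size)
{
	char path[] = "/tmp/isize-demo-XXXXXX";
	struct stat st;
	long result = -1;
	char *map;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);	/* the file goes away once fd is closed */

	/* Sets the file size without allocating data blocks (a hole). */
	if (ftruncate(fd, (off_t)initial_size) != 0)
		goto out;

	map = mmap(NULL, (size_t)initial_size, PROT_READ | PROT_WRITE,
		   MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	/* Both stores fall inside the file; neither can extend it. */
	map[0] = 'x';
	map[initial_size - 1] = 'y';

	if (msync(map, (size_t)initial_size, MS_SYNC) == 0 &&
	    fstat(fd, &st) == 0)
		result = (long)st.st_size;
	munmap(map, (size_t)initial_size);
out:
	close(fd);
	return result;
}
```

What does change in this case is block allocation for the hole, not the size, which is why the allocation has to happen at write-fault time if ENOSPC is to be reported to the writer.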

Mingming
> > 
> > > +	ext4_journal_stop(handle);
> > > +out_unlock:
> > > +	unlock_page(page);
> > > +	mutex_unlock(&inode->i_mutex);
> > > +	return ret;
> > > +}
> > 
> > It seems this combines the prepare_write() code of all three journalling
> > modes here :(
> > 
> > Since prepare_write() and commit_write() are going to be sunset, why not
> > simply call mapping->a_ops->write_begin() and then write_end()? That
> > should take care of pretty much all the journalling and page operations,
> > no?
> 
> write_begin and write_end work with a user-space buffer. In this case
> we don't have one. Also, what ext4_page_mkwrite does is mostly what
> write_begin/write_end do, except for the copy from the user-space buffer.
> 
> 
> -aneesh


* [PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification.
  2008-05-21 17:44       ` [PATCH] ext4: Inverse locking order of page_lock and transaction start Aneesh Kumar K.V
@ 2008-05-21 17:44         ` Aneesh Kumar K.V
  0 siblings, 0 replies; 8+ messages in thread
From: Aneesh Kumar K.V @ 2008-05-21 17:44 UTC (permalink / raw)
  To: cmm, tytso, sandeen; +Cc: linux-ext4, Aneesh Kumar K.V

We would like to be notified when a write is done to an mmap'ed section.
This is needed for preallocated areas: in the callback we split the
preallocated area into an initialized extent and an uninitialized extent.
This lets us handle ENOSPC better. Otherwise we get ENOSPC in writepage,
which would result in data loss. The changes are also needed to handle
ENOSPC when writing to an mmap'ed section of a file with holes.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/ext4.h  |    1 +
 fs/ext4/file.c  |   19 +++++++++++++++++-
 fs/ext4/inode.c |   58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 77 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 6605076..77cbb28 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1053,6 +1053,7 @@ extern void ext4_set_aops(struct inode *inode);
 extern int ext4_writepage_trans_blocks(struct inode *);
 extern int ext4_block_truncate_page(handle_t *handle, struct page *page,
 		struct address_space *mapping, loff_t from);
+extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page);
 
 /* ioctl.c */
 extern long ext4_ioctl(struct file *, unsigned int, unsigned long);
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 4159be6..b9510ba 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -123,6 +123,23 @@ ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
 	return ret;
 }
 
+static struct vm_operations_struct ext4_file_vm_ops = {
+	.fault		= filemap_fault,
+	.page_mkwrite   = ext4_page_mkwrite,
+};
+
+static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	if (!mapping->a_ops->readpage)
+		return -ENOEXEC;
+	file_accessed(file);
+	vma->vm_ops = &ext4_file_vm_ops;
+	vma->vm_flags |= VM_CAN_NONLINEAR;
+	return 0;
+}
+
 const struct file_operations ext4_file_operations = {
 	.llseek		= generic_file_llseek,
 	.read		= do_sync_read,
@@ -133,7 +150,7 @@ ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= ext4_compat_ioctl,
 #endif
-	.mmap		= generic_file_mmap,
+	.mmap		= ext4_file_mmap,
 	.open		= generic_file_open,
 	.release	= ext4_release_file,
 	.fsync		= ext4_sync_file,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 4a7ed29..d361693 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3555,3 +3555,61 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 
 	return err;
 }
+
+static int ext4_bh_unmapped(handle_t *handle, struct buffer_head *bh)
+{
+	return !buffer_mapped(bh);
+}
+
+int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+	loff_t size;
+	unsigned long len;
+	int ret = -EINVAL;
+	struct file *file = vma->vm_file;
+	struct inode *inode = file->f_path.dentry->d_inode;
+	struct address_space *mapping = inode->i_mapping;
+	struct writeback_control wbc = { .sync_mode = WB_SYNC_NONE,
+					 .nr_to_write = 1 };
+
+	/*
+	 * Get i_alloc_sem to stop truncates messing with the inode. We cannot
+	 * get i_mutex because we are already holding mmap_sem.
+	 */
+	down_read(&inode->i_alloc_sem);
+	size = i_size_read(inode);
+	if (page->mapping != mapping || size <= page_offset(page)
+	    || !PageUptodate(page)) {
+		/* page got truncated from under us? */
+		goto out_unlock;
+	}
+	ret = 0;
+	if (PageMappedToDisk(page))
+		goto out_unlock;
+
+	if (page->index == size >> PAGE_CACHE_SHIFT)
+		len = size & ~PAGE_CACHE_MASK;
+	else
+		len = PAGE_CACHE_SIZE;
+
+	if (page_has_buffers(page)) {
+		/* return if we have all the buffers mapped */
+		if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
+				       ext4_bh_unmapped))
+			goto out_unlock;
+	}
+	/*
+	 * OK, we need to fill the hole... Lock the page and do writepage.
+	 * We can't do write_begin and write_end here because we don't
+	 * have inode_mutex and that allow parallel write_begin, write_end call.
+	 * (lock_page prevent this from happening on the same page though)
+	 */
+	lock_page(page);
+	wbc.range_start = page_offset(page);
+	wbc.range_end = page_offset(page) + PAGE_CACHE_SIZE;
+	ret = mapping->a_ops->writepage(page, &wbc);
+	/* writepage unlocks the page */
+out_unlock:
+	up_read(&inode->i_alloc_sem);
+	return ret;
+}
-- 
1.5.5.1.211.g65ea3.dirty



* [PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification.
  2008-05-30 13:39 [PATCH -v2] delalloc and journal locking order inversion fixes Aneesh Kumar K.V
@ 2008-05-30 13:39 ` Aneesh Kumar K.V
  0 siblings, 0 replies; 8+ messages in thread
From: Aneesh Kumar K.V @ 2008-05-30 13:39 UTC (permalink / raw)
  To: cmm, jack; +Cc: linux-ext4, Aneesh Kumar K.V, Theodore Ts'o

We would like to be notified when a write is done to an mmap'ed section.
This is needed for preallocated areas: in the callback we split the
preallocated area into an initialized extent and an uninitialized extent.
This lets us handle ENOSPC better. Otherwise we get ENOSPC in writepage,
which would result in data loss. The changes are also needed to handle
ENOSPC when writing to an mmap'ed section of a file with holes.

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/ext4.h  |    1 +
 fs/ext4/file.c  |   19 +++++++++++++-
 fs/ext4/inode.c |   76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 95 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 6605076..77cbb28 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1053,6 +1053,7 @@ extern void ext4_set_aops(struct inode *inode);
 extern int ext4_writepage_trans_blocks(struct inode *);
 extern int ext4_block_truncate_page(handle_t *handle, struct page *page,
 		struct address_space *mapping, loff_t from);
+extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page);
 
 /* ioctl.c */
 extern long ext4_ioctl(struct file *, unsigned int, unsigned long);
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 4159be6..b9510ba 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -123,6 +123,23 @@ ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
 	return ret;
 }
 
+static struct vm_operations_struct ext4_file_vm_ops = {
+	.fault		= filemap_fault,
+	.page_mkwrite   = ext4_page_mkwrite,
+};
+
+static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	if (!mapping->a_ops->readpage)
+		return -ENOEXEC;
+	file_accessed(file);
+	vma->vm_ops = &ext4_file_vm_ops;
+	vma->vm_flags |= VM_CAN_NONLINEAR;
+	return 0;
+}
+
 const struct file_operations ext4_file_operations = {
 	.llseek		= generic_file_llseek,
 	.read		= do_sync_read,
@@ -133,7 +150,7 @@ ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= ext4_compat_ioctl,
 #endif
-	.mmap		= generic_file_mmap,
+	.mmap		= ext4_file_mmap,
 	.open		= generic_file_open,
 	.release	= ext4_release_file,
 	.fsync		= ext4_sync_file,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 4a7ed29..bc52ef5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3555,3 +3555,79 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 
 	return err;
 }
+
+static int ext4_bh_prepare_fill(handle_t *handle, struct buffer_head *bh)
+{
+	if (!buffer_mapped(bh)) {
+		/*
+		 * Mark buffer as dirty so that
+		 * block_write_full_page() writes it
+		 */
+		set_buffer_dirty(bh);
+	}
+	return 0;
+}
+
+static int ext4_bh_unmapped(handle_t *handle, struct buffer_head *bh)
+{
+	return !buffer_mapped(bh);
+}
+
+int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+	loff_t size;
+	unsigned long len;
+	int ret = -EINVAL;
+	struct file *file = vma->vm_file;
+	struct inode *inode = file->f_path.dentry->d_inode;
+	struct address_space *mapping = inode->i_mapping;
+	struct writeback_control wbc = { .sync_mode = WB_SYNC_NONE,
+					 .nr_to_write = 1 };
+
+	/*
+	 * Get i_alloc_sem to stop truncates messing with the inode. We cannot
+	 * get i_mutex because we are already holding mmap_sem.
+	 */
+	down_read(&inode->i_alloc_sem);
+	size = i_size_read(inode);
+	if (page->mapping != mapping || size <= page_offset(page)
+	    || !PageUptodate(page)) {
+		/* page got truncated from under us? */
+		goto out_unlock;
+	}
+	ret = 0;
+	if (PageMappedToDisk(page))
+		goto out_unlock;
+
+	if (page->index == size >> PAGE_CACHE_SHIFT)
+		len = size & ~PAGE_CACHE_MASK;
+	else
+		len = PAGE_CACHE_SIZE;
+
+	if (page_has_buffers(page)) {
+		/* return if we have all the buffers mapped */
+		if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
+				       ext4_bh_unmapped))
+			goto out_unlock;
+		/*
+		 * Now mark all the  buffer head dirty so
+		 * that writepage can write it
+		 */
+		walk_page_buffers(NULL, page_buffers(page), 0, len,
+					NULL, ext4_bh_prepare_fill);
+	}
+	/*
+	 * OK, we need to fill the hole... Lock the page and do writepage.
+	 * We can't do write_begin and write_end here because we don't
+	 * have inode_mutex and that allow parallel write_begin, write_end call.
+	 * (lock_page prevent this from happening on the same page though)
+	 */
+	lock_page(page);
+	wbc.range_start = page_offset(page);
+	wbc.range_end = page_offset(page) + len;
+	ret = mapping->a_ops->writepage(page, &wbc);
+	/* writepage unlocks the page */
+out_unlock:
+	up_read(&inode->i_alloc_sem);
+	return ret;
+}
-- 
1.5.5.1.357.g1af8b.dirty



end of thread, other threads:[~2008-05-30 13:39 UTC | newest]

Thread overview: 8+ messages
2008-02-18 11:53 [PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification Aneesh Kumar K.V
2008-02-18 11:53 ` [PATCH] ext4: Don't mark the filesystem with errors if we fail to fallocate Aneesh Kumar K.V
  -- strict thread matches above, loose matches on Subject: below --
2008-02-22 14:39 [PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification Aneesh Kumar K.V
2008-02-22 18:10 ` Mingming Cao
2008-02-22 18:23   ` Aneesh Kumar K.V
2008-02-22 19:28     ` Mingming Cao
2008-05-21 17:44 delalloc and journal locking order inversion fixes Aneesh Kumar K.V
2008-05-21 17:44 ` [PATCH] ext4: Add validation to jbd lock inversion patch and split and writepage Aneesh Kumar K.V
2008-05-21 17:44   ` [PATCH] ext4: inverse locking ordering of page_lock and transaction start in delalloc Aneesh Kumar K.V
2008-05-21 17:44     ` [PATCH] ext4: Fix delalloc sync hang with journal lock inversion Aneesh Kumar K.V
2008-05-21 17:44       ` [PATCH] ext4: Inverse locking order of page_lock and transaction start Aneesh Kumar K.V
2008-05-21 17:44         ` [PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification Aneesh Kumar K.V
2008-05-30 13:39 [PATCH -v2] delalloc and journal locking order inversion fixes Aneesh Kumar K.V
2008-05-30 13:39 ` [PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification Aneesh Kumar K.V
