public inbox for linux-ext4@vger.kernel.org
* [RFC][PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification.
@ 2008-02-28 18:05 Aneesh Kumar K.V
  2008-02-28 18:05 ` [RFC][PATCH] ext4: Fix fallocate error path Aneesh Kumar K.V
  0 siblings, 1 reply; 15+ messages in thread
From: Aneesh Kumar K.V @ 2008-02-28 18:05 UTC
  To: cmm; +Cc: linux-ext4, Aneesh Kumar K.V

We would like to be notified when a write happens to an mmapped section of
a file. This is needed for preallocated areas: in the callback we split the
preallocated area into an initialized extent and an uninitialized extent.
This lets us handle ENOSPC better. Otherwise we get ENOSPC in writepage, and
that results in data loss. The change is also needed to handle ENOSPC when
writing to an mmapped section of a file with holes.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext4/file.c          |   19 +++++++++++++++-
 fs/ext4/inode.c         |   54 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/ext4_fs.h |    1 +
 3 files changed, 73 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 20507a2..77341c1 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -123,6 +123,23 @@ force_commit:
 	return ret;
 }
 
+static struct vm_operations_struct ext4_file_vm_ops = {
+	.fault		= filemap_fault,
+	.page_mkwrite   = ext4_page_mkwrite,
+};
+
+static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	if (!mapping->a_ops->readpage)
+		return -ENOEXEC;
+	file_accessed(file);
+	vma->vm_ops = &ext4_file_vm_ops;
+	vma->vm_flags |= VM_CAN_NONLINEAR;
+	return 0;
+}
+
 const struct file_operations ext4_file_operations = {
 	.llseek		= generic_file_llseek,
 	.read		= do_sync_read,
@@ -133,7 +150,7 @@ const struct file_operations ext4_file_operations = {
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= ext4_compat_ioctl,
 #endif
-	.mmap		= generic_file_mmap,
+	.mmap		= ext4_file_mmap,
 	.open		= generic_file_open,
 	.release	= ext4_release_file,
 	.fsync		= ext4_sync_file,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5b5d63d..62aafc3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3490,3 +3490,57 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 
 	return err;
 }
+
+int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+	unsigned long end;
+	loff_t size;
+	handle_t *handle;
+	int ret = -EINVAL, needed_blocks;
+	struct file *file   = vma->vm_file;
+	struct inode *inode = file->f_path.dentry->d_inode;
+
+	needed_blocks = ext4_writepage_trans_blocks(inode);
+	/* We need to take inode mutex to prevent parallel write */
+	mutex_lock(&inode->i_mutex);
+	lock_page(page);
+	size = i_size_read(inode);
+	if ((page->mapping != inode->i_mapping) ||
+	    (page_offset(page) > size)) {
+		/* page got truncated out from underneath us */
+		goto out_unlock;
+	}
+	/* page is wholly or partially inside EOF */
+	if (((page->index + 1) << PAGE_CACHE_SHIFT) > size)
+		end = size & ~PAGE_CACHE_MASK;
+	else
+		end = PAGE_CACHE_SIZE;
+
+	/*
+	 * if ext4_get_block resulted in a split of an uninitialized extent,
+	 * in file system full case, we will have to take the journal write
+	 * access and zero out the page.
+	 */
+	handle = ext4_journal_start(inode, needed_blocks);
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		goto out_unlock;
+	}
+	/* Will zero out the pages if buffer is marked new */
+	ret = block_prepare_write(page, 0, end, ext4_get_block);
+
+	/*
+	 * Now call commit_write to mark the buffer dirty and page
+	 * uptodate. page_mkwrite makes the page dirty towards the
+	 * end. We don't want to mark the buffer dirty for
+	 * journalled mode.
+	 */
+	if (!ret && !ext4_should_journal_data(inode))
+		ret = block_commit_write(page, 0, end);
+
+	ext4_journal_stop(handle);
+out_unlock:
+	unlock_page(page);
+	mutex_unlock(&inode->i_mutex);
+	return ret;
+}
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index 22810b1..8f5a563 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -1059,6 +1059,7 @@ extern void ext4_set_aops(struct inode *inode);
 extern int ext4_writepage_trans_blocks(struct inode *);
 extern int ext4_block_truncate_page(handle_t *handle, struct page *page,
 		struct address_space *mapping, loff_t from);
+extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page);
 
 /* ioctl.c */
 extern long ext4_ioctl(struct file *, unsigned int, unsigned long);
-- 
1.5.4.3.325.g6d216.dirty


* [RFC][PATCH] ext4: Convert uninitialized extent to initialized extent in case of file system full
@ 2008-02-21 19:17 Aneesh Kumar K.V
  2008-02-21 21:07 ` Mingming Cao
  0 siblings, 1 reply; 15+ messages in thread
From: Aneesh Kumar K.V @ 2008-02-21 19:17 UTC
  To: linux-ext4, Mingming Cao

This patch has had very minimal testing. I am sending it to get
feedback on the approach. The skip_index section in the patch below
is ugly. Any suggestions for improving it?

NOTE: The ext4_ext_convert_to_initialized error path has some bugs. It
doesn't reset the extent information in case of error. But that is
for another patch.


>From 6a73edd4dbb32344e6a83ebdc07edd0e96d376bd Mon Sep 17 00:00:00 2001
From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date: Thu, 21 Feb 2008 23:57:38 +0530
Subject: [PATCH] ext4: Convert uninitialized extent to initialized extent in case of file system full

A write to a preallocated area causes the uninitialized extent to be split into an
initialized and an uninitialized extent. If we don't have space to add the new extent
information, then instead of returning an error, convert the existing uninitialized
extent to an initialized one. We need to zero out the blocks corresponding to the
extent to prevent stale data from reaching userspace.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext4/extents.c |  135 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 133 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index b179b03..d37c14e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2137,6 +2137,103 @@ void ext4_ext_release(struct super_block *sb)
 #endif
 }
 
+static int ext4_ext_zero_out(handle_t *handle, struct inode *inode,
+				ext4_lblk_t iblock, struct ext4_extent *ex)
+{
+	ext4_lblk_t ee_block;
+	unsigned int ee_len, blkcount, blocksize;
+	loff_t pos;
+	pgoff_t index, skip_index;
+	unsigned long offset;
+	struct page *page;
+	struct address_space *mapping = inode->i_mapping;
+	struct buffer_head *head, *bh;
+	int err = 0;
+
+	ee_block = le32_to_cpu(ex->ee_block);
+	ee_len = blkcount = ext4_ext_get_actual_len(ex);
+	blocksize = inode->i_sb->s_blocksize;
+
+	/*
+	 * find the skip index. We can't call __grab_cache_page for this
+	 * because we are in the writeout of this page and we already have
+	 * taken the lock on this page
+	 */
+	pos = (loff_t)iblock << inode->i_blkbits;
+	skip_index = pos >> PAGE_CACHE_SHIFT;
+
+	while (blkcount) {
+		pos = (loff_t)(ee_block + ee_len - blkcount) << inode->i_blkbits;
+		index = pos >> PAGE_CACHE_SHIFT;
+		offset = (pos & (PAGE_CACHE_SIZE - 1));
+		if (index == skip_index) {
+			/* Page will already be locked in the writepage */
+			read_lock_irq(&mapping->tree_lock);
+			page = radix_tree_lookup(&mapping->page_tree, index);
+			read_unlock_irq(&mapping->tree_lock);
+			if (page)
+				page_cache_get(page);
+			else
+				return -ENOMEM;
+		} else {
+			page = __grab_cache_page(mapping, index);
+			if (!page)
+				return -ENOMEM;
+		}
+
+		if (!page_has_buffers(page))
+			create_empty_buffers(page, blocksize, 0);
+
+		head = page_buffers(page);
+		/* Look for the buffer_head which map the block */
+		bh = head;
+		while (offset > 0) {
+			bh = bh->b_this_page;
+			offset -= blocksize;
+		}
+		offset = (pos & (PAGE_CACHE_SIZE - 1));
+
+		/* Now write all the buffer_heads in the page */
+		do {
+			set_buffer_uptodate(bh);
+			if (ext4_should_journal_data(inode)) {
+				err = ext4_journal_get_write_access(handle, bh);
+				/* do we have that many credits ??*/
+				if (err)
+					goto err_out;
+			}
+			zero_user(page, offset, blocksize);
+			offset += blocksize;
+			if (ext4_should_journal_data(inode)) {
+				err = ext4_journal_dirty_metadata(handle, bh);
+				if (err)
+					goto err_out;
+			} else {
+				if (ext4_should_order_data(inode)) {
+					err = ext4_journal_dirty_data(handle,
+									bh);
+					if (err)
+						goto err_out;
+				}
+				mark_buffer_dirty(bh);
+			}
+
+			bh = bh->b_this_page;
+			blkcount--;
+		} while ((bh != head) && (blkcount > 0));
+		/* only unlock if we have locked */
+		if (index != skip_index)
+			unlock_page(page);
+		page_cache_release(page);
+	}
+
+	return 0;
+err_out:
+	unlock_page(page);
+	page_cache_release(page);
+	return err;
+}
+
 /*
  * This function is called by ext4_ext_get_blocks() if someone tries to write
  * to an uninitialized extent. It may result in splitting the uninitialized
@@ -2153,7 +2250,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
 						ext4_lblk_t iblock,
 						unsigned long max_blocks)
 {
-	struct ext4_extent *ex, newex;
+	struct ext4_extent *ex, newex, zeroout_ex;
 	struct ext4_extent *ex1 = NULL;
 	struct ext4_extent *ex2 = NULL;
 	struct ext4_extent *ex3 = NULL;
@@ -2172,6 +2269,9 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
 	allocated = ee_len - (iblock - ee_block);
 	newblock = iblock - ee_block + ext_pblock(ex);
 	ex2 = ex;
+	zeroout_ex.ee_block = ex->ee_block;
+	zeroout_ex.ee_len   = cpu_to_le16(ee_len);
+	ext4_ext_store_pblock(&zeroout_ex, ext_pblock(ex));
 
 	err = ext4_ext_get_access(handle, inode, path + depth);
 	if (err)
@@ -2200,13 +2300,32 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
 		ex3->ee_len = cpu_to_le16(allocated - max_blocks);
 		ext4_ext_mark_uninitialized(ex3);
 		err = ext4_ext_insert_extent(handle, inode, path, ex3);
-		if (err)
+		if (err == -ENOSPC) {
+			err =  ext4_ext_zero_out(handle, inode,
+							iblock, &zeroout_ex);
+			if (err)
+				goto out;
+			/* update the extent length and mark as initialized */
+			ex->ee_block = zeroout_ex.ee_block;
+			ex->ee_len   = zeroout_ex.ee_len;
+			ext4_ext_store_pblock(ex, ext_pblock(&zeroout_ex));
+			ext4_ext_dirty(handle, inode, path + depth);
+			return le16_to_cpu(ex->ee_len);
+
+		} else if (err)
 			goto out;
+
 		/*
 		 * The depth, and hence eh & ex might change
 		 * as part of the insert above.
 		 */
 		newdepth = ext_depth(inode);
+		/*
+		 * update the extent length after successful insert of the
+		 * split extent
+		 */
+		zeroout_ex.ee_len = cpu_to_le16(ee_len -
+						ext4_ext_get_actual_len(ex3));
 		if (newdepth != depth) {
 			depth = newdepth;
 			ext4_ext_drop_refs(path);
@@ -2281,6 +2400,18 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
 	goto out;
 insert:
 	err = ext4_ext_insert_extent(handle, inode, path, &newex);
+	if (err == -ENOSPC) {
+		err =  ext4_ext_zero_out(handle, inode, iblock, &zeroout_ex);
+		if (err)
+			goto out;
+		/* update the extent length and mark as initialized */
+		ex->ee_block = zeroout_ex.ee_block;
+		ex->ee_len   = zeroout_ex.ee_len;
+		ext4_ext_store_pblock(ex, ext_pblock(&zeroout_ex));
+		ext4_ext_dirty(handle, inode, path + depth);
+		return le16_to_cpu(ex->ee_len);
+	}
+
 out:
 	return err ? err : allocated;
 }
-- 
1.5.4.1.97.g40aab-dirty


end of thread, other threads:[~2008-03-02 18:51 UTC | newest]

Thread overview: 15+ messages
2008-02-28 18:05 [RFC][PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification Aneesh Kumar K.V
2008-02-28 18:05 ` [RFC][PATCH] ext4: Fix fallocate error path Aneesh Kumar K.V
2008-02-28 18:05   ` [RFC][PATCH] ext4: Convert uninitialized extent to initialized extent in case of file system full Aneesh Kumar K.V
2008-02-28 18:05     ` [RFC][PATCH] ext4: Enable extent format for symlink Aneesh Kumar K.V
2008-02-28 23:14     ` [RFC][PATCH] ext4: Convert uninitialized extent to initialized extent in case of file system full Mingming Cao
2008-02-29 11:09       ` Aneesh Kumar K.V
2008-02-29 19:21         ` Andreas Dilger
2008-03-01 17:30           ` Aneesh Kumar K.V
2008-03-02 18:51             ` Andreas Dilger
2008-02-29 18:05       ` Andreas Dilger
  -- strict thread matches above, loose matches on Subject: below --
2008-02-21 19:17 Aneesh Kumar K.V
2008-02-21 21:07 ` Mingming Cao
2008-02-22 14:31   ` Aneesh Kumar K.V
2008-02-22 15:42     ` Aneesh Kumar K.V
2008-02-22 17:28       ` Mingming Cao
