linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/5] ext4: Remove restrictive checks for EOFBLOCKS_FL
@ 2012-03-21  7:23 Lukas Czerner
  2012-03-21  7:23 ` [PATCH 2/5] ext4: Correctly handle EOFBLOCKS flag in ext4_ext_punch_hole Lukas Czerner
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Lukas Czerner @ 2012-03-21  7:23 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, achender, Lukas Czerner

We are going to remove the EOFBLOCKS_FL flag in the future, so this is
the first part of the removal. We can not remove it entirely just now,
since the e2fsck is still checking for it and it might cause headache to
some people. Instead, remove the restrictive checks now and the rest
later, when the new e2fsck code is out and common enough.

This is also needed because punch hole already breaks the EOFBLOCKS_FL
semantics, so it might cause the some troubles. So simply remove it.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 fs/ext4/extents.c |   13 ++++++++-----
 fs/ext4/inode.c   |    6 ++----
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 9e10b82..06b0792 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3291,11 +3291,13 @@ static int check_eofblocks_fl(handle_t *handle, struct inode *inode,
 	depth = ext_depth(inode);
 	eh = path[depth].p_hdr;
 
-	if (unlikely(!eh->eh_entries)) {
-		EXT4_ERROR_INODE(inode, "eh->eh_entries == 0 and "
-				 "EOFBLOCKS_FL set");
-		return -EIO;
-	}
+	/*
+	 * We're going to remove EOFBLOCKS_FL entirely in future so we
+	 * do not care for this case anymore. Simply remove the flag
+	 * if there are no extents.
+	 */
+	if (unlikely(!eh->eh_entries))
+		goto out;
 	last_ex = EXT_LAST_EXTENT(eh);
 	/*
 	 * We should clear the EOFBLOCKS_FL flag if we are writing the
@@ -3319,6 +3321,7 @@ static int check_eofblocks_fl(handle_t *handle, struct inode *inode,
 	for (i = depth-1; i >= 0; i--)
 		if (path[i].p_idx != EXT_LAST_INDEX(path[i].p_hdr))
 			return 0;
+out:
 	ext4_clear_inode_flag(inode, EXT4_INODE_EOFBLOCKS);
 	return ext4_mark_inode_dirty(handle, inode);
 }
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index feaa82f..55f5b91 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4152,11 +4152,9 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
 	}
 
 	if (attr->ia_valid & ATTR_SIZE) {
-		if (attr->ia_size != i_size_read(inode)) {
+		if (attr->ia_size != i_size_read(inode))
 			truncate_setsize(inode, attr->ia_size);
-			ext4_truncate(inode);
-		} else if (ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))
-			ext4_truncate(inode);
+		ext4_truncate(inode);
 	}
 
 	if (!rc) {
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/5] ext4: Correctly handle EOFBLOCKS flag in ext4_ext_punch_hole
  2012-03-21  7:23 [PATCH 1/5] ext4: Remove restrictive checks for EOFBLOCKS_FL Lukas Czerner
@ 2012-03-21  7:23 ` Lukas Czerner
  2012-03-22  2:13   ` Ted Ts'o
  2012-03-21  7:23 ` [PATCH 3/5] ext4: Allow punch hole beyond i_size Lukas Czerner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Lukas Czerner @ 2012-03-21  7:23 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, achender, Lukas Czerner

We have to clear the EOFBLOCK flag in the case that we just punched
out all blocks allocated past the i_size, so that e2fsck does not
compain about it. Even though we're going to remove EOFBLOCKS flag
in the future we still have to handle it correctly for now as there
are e2fsck versions out there which would complain about it. We can
remove this after the new e2fsck code ignoring the EOFBLOCKS flag
is common enough.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 fs/ext4/extents.c |   35 +++++++++++++++++++++++++++++++++++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 06b0792..7b822c0 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4822,6 +4822,41 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
 
 	up_write(&EXT4_I(inode)->i_data_sem);
 
+	/*
+	 * This is fugly, but even though we're going to get rid of the
+	 * EOFBLOCKS_LF in the future, we have to handle it correctly now
+	 * because there are still versions of e2fsck out there which
+	 * would scream otherwise. Once the new e2fsck code ignoring
+	 * this flag is common enough this can be removed entirely.
+	 */
+	if (ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)) {
+		struct ext4_ext_path *path;
+		ext4_lblk_t last_block;
+
+		mutex_lock(&inode->i_mutex);
+		down_read(&EXT4_I(inode)->i_data_sem);
+
+		last_block = inode->i_size >> EXT4_BLOCK_SIZE_BITS(sb);
+
+		/*
+		 * We have to check whether there is any extent past the
+		 * i_size. If not, we probably punched that out, so we need
+		 * to clear the EOFBLOCKS flag
+		 */
+		path = ext4_ext_find_extent(inode, last_block, NULL);
+		if (IS_ERR(path))
+			err = PTR_ERR(path);
+		else {
+			err = check_eofblocks_fl(handle, inode, last_block,
+						 path, 1);
+			ext4_ext_drop_refs(path);
+			kfree(path);
+		}
+
+		up_read(&EXT4_I(inode)->i_data_sem);
+		mutex_unlock(&inode->i_mutex);
+	}
+
 out:
 	ext4_orphan_del(handle, inode);
 	inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/5] ext4: Allow punch hole beyond i_size
  2012-03-21  7:23 [PATCH 1/5] ext4: Remove restrictive checks for EOFBLOCKS_FL Lukas Czerner
  2012-03-21  7:23 ` [PATCH 2/5] ext4: Correctly handle EOFBLOCKS flag in ext4_ext_punch_hole Lukas Czerner
@ 2012-03-21  7:23 ` Lukas Czerner
  2012-03-21  7:23 ` [PATCH 4/5] ext4: Remove uneeded i_size handling Lukas Czerner
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Lukas Czerner @ 2012-03-21  7:23 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, achender, Lukas Czerner

From: Allison Henderson <achender@linux.vnet.ibm.com>

From: Allison Henderson <achender@linux.vnet.ibm.com>

This patch allows blocks beyond i_size to be punched out.
This early return to catch this condition is simply removed
allowing punch hole to proceed beyond i_size.

Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 fs/ext4/extents.c |   14 --------------
 1 files changed, 0 insertions(+), 14 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 7b822c0..7d4f2cb 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4689,20 +4689,6 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
 	loff_t first_page_offset, last_page_offset;
 	int credits, err = 0;
 
-	/* No need to punch hole beyond i_size */
-	if (offset >= inode->i_size)
-		return 0;
-
-	/*
-	 * If the hole extends beyond i_size, set the hole
-	 * to end after the page that contains i_size
-	 */
-	if (offset + length > inode->i_size) {
-		length = inode->i_size +
-		   PAGE_CACHE_SIZE - (inode->i_size & (PAGE_CACHE_SIZE - 1)) -
-		   offset;
-	}

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/5] ext4: Remove uneeded i_size handling
  2012-03-21  7:23 [PATCH 1/5] ext4: Remove restrictive checks for EOFBLOCKS_FL Lukas Czerner
  2012-03-21  7:23 ` [PATCH 2/5] ext4: Correctly handle EOFBLOCKS flag in ext4_ext_punch_hole Lukas Czerner
  2012-03-21  7:23 ` [PATCH 3/5] ext4: Allow punch hole beyond i_size Lukas Czerner
@ 2012-03-21  7:23 ` Lukas Czerner
  2012-03-21  7:23 ` [PATCH 5/5] ext4: Correct ext4_punch_hole return codes Lukas Czerner
  2012-03-22  0:10 ` [PATCH 1/5] ext4: Remove restrictive checks for EOFBLOCKS_FL Allison Henderson
  4 siblings, 0 replies; 10+ messages in thread
From: Lukas Czerner @ 2012-03-21  7:23 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, achender, Lukas Czerner

From: Allison Henderson <achender@linux.vnet.ibm.com>

From: Allison Henderson <achender@linux.vnet.ibm.com>

This patch removes a fix that is now being addressed in another
patch.  The code being removed also made the assumption that a
hole cannot exceed or start after i_size, but since this is no
longer the case and the source of the bug has been corrected in
a different patch, this code is no longer needed.

The removed code initally corrected a bug found in fsx,
where garbage data would appear in the last page after i_size, when
ever a hole ended in the same page as i_size.  The cause of the cause
of the garbage data has been fixed in patch
"[PATCH 2/2] ext4: let ext4_bio_write_page handle EOF correctly"

This patch set has been tested with fsx on a 1k block size, and
successfully passed 24 hours.

Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 fs/ext4/extents.c |   19 -------------------
 1 files changed, 0 insertions(+), 19 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 7d4f2cb..846584d 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4767,25 +4767,6 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
 		}
 	}
 
-	/*
-	 * If i_size is contained in the last page, we need to
-	 * unmap and zero the partial page after i_size
-	 */
-	if (inode->i_size >> PAGE_CACHE_SHIFT == last_page &&
-	   inode->i_size % PAGE_CACHE_SIZE != 0) {
-
-		page_len = PAGE_CACHE_SIZE -
-			(inode->i_size & (PAGE_CACHE_SIZE - 1));
-
-		if (page_len > 0) {
-			err = ext4_discard_partial_page_buffers(handle,
-			  mapping, inode->i_size, page_len, 0);
-
-			if (err)
-				goto out;
-		}
-	}

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 5/5] ext4: Correct ext4_punch_hole return codes
  2012-03-21  7:23 [PATCH 1/5] ext4: Remove restrictive checks for EOFBLOCKS_FL Lukas Czerner
                   ` (2 preceding siblings ...)
  2012-03-21  7:23 ` [PATCH 4/5] ext4: Remove uneeded i_size handling Lukas Czerner
@ 2012-03-21  7:23 ` Lukas Czerner
  2012-03-22  0:10 ` [PATCH 1/5] ext4: Remove restrictive checks for EOFBLOCKS_FL Allison Henderson
  4 siblings, 0 replies; 10+ messages in thread
From: Lukas Czerner @ 2012-03-21  7:23 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, achender, Lukas Czerner

From: Allison Henderson <achender@linux.vnet.ibm.com>

From: Allison Henderson <achender@linux.vnet.ibm.com>

ext4_punch_hole returns -ENOTSUPP but it should be using
-EOPNOTSUPP

Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 fs/ext4/inode.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 55f5b91..d54143a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3329,16 +3329,16 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 {
 	struct inode *inode = file->f_path.dentry->d_inode;
 	if (!S_ISREG(inode->i_mode))
-		return -ENOTSUPP;
+		return -EOPNOTSUPP;
 
 	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
 		/* TODO: Add support for non extent hole punching */
-		return -ENOTSUPP;
+		return -EOPNOTSUPP;
 	}
 
 	if (EXT4_SB(inode->i_sb)->s_cluster_ratio > 1) {
 		/* TODO: Add support for bigalloc file systems */
-		return -ENOTSUPP;
+		return -EOPNOTSUPP;
 	}
 
 	return ext4_ext_punch_hole(file, offset, length);
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/5] ext4: Remove restrictive checks for EOFBLOCKS_FL
  2012-03-21  7:23 [PATCH 1/5] ext4: Remove restrictive checks for EOFBLOCKS_FL Lukas Czerner
                   ` (3 preceding siblings ...)
  2012-03-21  7:23 ` [PATCH 5/5] ext4: Correct ext4_punch_hole return codes Lukas Czerner
@ 2012-03-22  0:10 ` Allison Henderson
  4 siblings, 0 replies; 10+ messages in thread
From: Allison Henderson @ 2012-03-22  0:10 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, tytso

On 03/21/2012 12:23 AM, Lukas Czerner wrote:
> We are going to remove the EOFBLOCKS_FL flag in the future, so this is
> the first part of the removal. We can not remove it entirely just now,
> since the e2fsck is still checking for it and it might cause headache to
> some people. Instead, remove the restrictive checks now and the rest
> later, when the new e2fsck code is out and common enough.
>
> This is also needed because punch hole already breaks the EOFBLOCKS_FL
> semantics, so it might cause the some troubles. So simply remove it.
>
> Signed-off-by: Lukas Czerner<lczerner@redhat.com>

Alrighty, I went over your set and it all looks good to me.  Thx Lukas!

Allison Henderson

> ---
>   fs/ext4/extents.c |   13 ++++++++-----
>   fs/ext4/inode.c   |    6 ++----
>   2 files changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 9e10b82..06b0792 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3291,11 +3291,13 @@ static int check_eofblocks_fl(handle_t *handle, struct inode *inode,
>   	depth = ext_depth(inode);
>   	eh = path[depth].p_hdr;
>
> -	if (unlikely(!eh->eh_entries)) {
> -		EXT4_ERROR_INODE(inode, "eh->eh_entries == 0 and "
> -				 "EOFBLOCKS_FL set");
> -		return -EIO;
> -	}
> +	/*
> +	 * We're going to remove EOFBLOCKS_FL entirely in future so we
> +	 * do not care for this case anymore. Simply remove the flag
> +	 * if there are no extents.
> +	 */
> +	if (unlikely(!eh->eh_entries))
> +		goto out;
>   	last_ex = EXT_LAST_EXTENT(eh);
>   	/*
>   	 * We should clear the EOFBLOCKS_FL flag if we are writing the
> @@ -3319,6 +3321,7 @@ static int check_eofblocks_fl(handle_t *handle, struct inode *inode,
>   	for (i = depth-1; i>= 0; i--)
>   		if (path[i].p_idx != EXT_LAST_INDEX(path[i].p_hdr))
>   			return 0;
> +out:
>   	ext4_clear_inode_flag(inode, EXT4_INODE_EOFBLOCKS);
>   	return ext4_mark_inode_dirty(handle, inode);
>   }
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index feaa82f..55f5b91 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4152,11 +4152,9 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
>   	}
>
>   	if (attr->ia_valid&  ATTR_SIZE) {
> -		if (attr->ia_size != i_size_read(inode)) {
> +		if (attr->ia_size != i_size_read(inode))
>   			truncate_setsize(inode, attr->ia_size);
> -			ext4_truncate(inode);
> -		} else if (ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))
> -			ext4_truncate(inode);
> +		ext4_truncate(inode);
>   	}
>
>   	if (!rc) {


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/5] ext4: Correctly handle EOFBLOCKS flag in ext4_ext_punch_hole
  2012-03-21  7:23 ` [PATCH 2/5] ext4: Correctly handle EOFBLOCKS flag in ext4_ext_punch_hole Lukas Czerner
@ 2012-03-22  2:13   ` Ted Ts'o
  2012-03-22  8:25     ` Lukas Czerner
  0 siblings, 1 reply; 10+ messages in thread
From: Ted Ts'o @ 2012-03-22  2:13 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, achender

On Wed, Mar 21, 2012 at 08:23:55AM +0100, Lukas Czerner wrote:
> +	/*
> +	 * This is fugly, but even though we're going to get rid of the
> +	 * EOFBLOCKS_LF in the future, we have to handle it correctly now
> +	 * because there are still versions of e2fsck out there which
> +	 * would scream otherwise. Once the new e2fsck code ignoring
> +	 * this flag is common enough this can be removed entirely.
> +	 */
> +	if (ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)) {
> +		struct ext4_ext_path *path;
> +		ext4_lblk_t last_block;
> +
> +		mutex_lock(&inode->i_mutex);
> +		down_read(&EXT4_I(inode)->i_data_sem);

I was looking at this patch, and I was wondering why we weren't taking
i_mutex earlier in ext4_ext_punch_hole().  The primary use of i_mutex
is to protect writes racing with each other and with truncate.  Given
that punch essentially works like truncate, and all of ext4_truncate()
is run with i_mutex down, and currently ext4_ext_punch_hole() (before
applying this patch) doesn't isn't taking i_mutex at all, I'm
wondering if we can run into problems where punch is racing against a
write --- if the pages are already in mapped, then the write might not
even need to take i_data_sem.

Lukas, Allison --- am I missing something here?

						- Ted

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/5] ext4: Correctly handle EOFBLOCKS flag in ext4_ext_punch_hole
  2012-03-22  2:13   ` Ted Ts'o
@ 2012-03-22  8:25     ` Lukas Czerner
  2012-03-22 13:47       ` Ted Ts'o
  0 siblings, 1 reply; 10+ messages in thread
From: Lukas Czerner @ 2012-03-22  8:25 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: Lukas Czerner, linux-ext4, achender

On Wed, 21 Mar 2012, Ted Ts'o wrote:

> On Wed, Mar 21, 2012 at 08:23:55AM +0100, Lukas Czerner wrote:
> > +	/*
> > +	 * This is fugly, but even though we're going to get rid of the
> > +	 * EOFBLOCKS_LF in the future, we have to handle it correctly now
> > +	 * because there are still versions of e2fsck out there which
> > +	 * would scream otherwise. Once the new e2fsck code ignoring
> > +	 * this flag is common enough this can be removed entirely.
> > +	 */
> > +	if (ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)) {
> > +		struct ext4_ext_path *path;
> > +		ext4_lblk_t last_block;
> > +
> > +		mutex_lock(&inode->i_mutex);
> > +		down_read(&EXT4_I(inode)->i_data_sem);
> 
> I was looking at this patch, and I was wondering why we weren't taking
> i_mutex earlier in ext4_ext_punch_hole().  The primary use of i_mutex
> is to protect writes racing with each other and with truncate.  Given
> that punch essentially works like truncate, and all of ext4_truncate()
> is run with i_mutex down, and currently ext4_ext_punch_hole() (before
> applying this patch) doesn't isn't taking i_mutex at all, I'm
> wondering if we can run into problems where punch is racing against a
> write --- if the pages are already in mapped, then the write might not
> even need to take i_data_sem.
> 
> Lukas, Allison --- am I missing something here?
> 
> 						- Ted

Hmm, so we can not race with truncate since we do not touch i_size in
punch_hole and the extent tree modification is done with i_data_sem
locked.

As far as write is concerned, I do not think that race is possible. If
for example we're trying to write into the same range we're punching out
the buffer we're trying to write into is either mapped, or not right ?

If it's mapped then punch_hole did not get to this block yet, but it
will eventually in the current transaction. If the buffer is unmapped,
then we're going to get a new block, which means we're going to have to
lock the i_data_sem, but since punch_hole is already holding it we have
to wait before it finishes.

The worse what can happen is that after a write spanning several block
we'll have first part of the write punched out, but second part written
correctly since in this case it might hit already punched block
and need to wait for punch_hole to finish, after that the rest of the
range is written. However the write should remain consistent on block
granularity which is all we guarantee anyway, right ?

-Lukas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/5] ext4: Correctly handle EOFBLOCKS flag in ext4_ext_punch_hole
  2012-03-22  8:25     ` Lukas Czerner
@ 2012-03-22 13:47       ` Ted Ts'o
  2012-03-22 14:05         ` Lukas Czerner
  0 siblings, 1 reply; 10+ messages in thread
From: Ted Ts'o @ 2012-03-22 13:47 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, achender

On Thu, Mar 22, 2012 at 09:25:15AM +0100, Lukas Czerner wrote:
> 
> The worse what can happen is that after a write spanning several block
> we'll have first part of the write punched out, but second part written
> correctly since in this case it might hit already punched block
> and need to wait for punch_hole to finish, after that the rest of the
> range is written. However the write should remain consistent on block
> granularity which is all we guarantee anyway, right ?

I need to look more closely at this, but thing that was worrying me
was the part of truncate/punch where we have to invalidate the parts
of the page cache where we've unmapped the blocks.  i.e., the call to
truncate_inode_pages_range() racing with the write.  I think we're ok,
since truncate_inode_pages_range() grabs the page spinlock and then
checks for PageWriteback, which ought to be sufficient, but truncate
does take that codepath with i_mutex down, and so my spidey sense is
tingling.  I may just being too paranoid, though.

Still, that's not a criticism of your patch.

More serious is the following lockdep warning that I got.  Grabbing
i_mutex after the transaction handle is started can lead to a circular
locking deadlock...

						- Ted

BEGIN TEST: Ext4 4k block Wed Mar 21 22:47:17 EDT 2012
Device: /dev/vdb
mke2fs options: -q
mount options: -o block_validity
000 - unknown test, ignored
FSTYP         -- ext4
PLATFORM      -- Linux/i686 candygram 3.3.0-rc2-00592-gc56a0b2
MKFS_OPTIONS  -- -q /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc
075	[  808.872903] 
[  808.873567] ======================================================
[  808.875933] [ INFO: possible circular locking dependency detected ]
[  808.875933] 3.3.0-rc2-00592-gc56a0b2 #32 Not tainted
[  808.875933] -------------------------------------------------------
[  808.875933] fsx/13769 is trying to acquire lock:
[  808.875933]  (&sb->s_type->i_mutex_key#3){+.+.+.}, at: [<c028d900>] ext4_ext_punch_hole+0x2b8/0x382
[  808.875933] 
[  808.875933] but task is already holding lock:
[  808.875933]  (jbd2_handle){+.+...}, at: [<c02a5995>] start_this_handle+0x4e4/0x51a
[  808.875933] 
[  808.875933] which lock already depends on the new lock.
[  808.875933] 
[  808.875933] 
[  808.875933] the existing dependency chain (in reverse order) is:
[  808.875933] 
[  808.875933] -> #1 (jbd2_handle){+.+...}:
[  808.875933]        [<c019789d>] lock_acquire+0x99/0xbd
[  808.875933]        [<c02a59b7>] start_this_handle+0x506/0x51a
[  808.875933]        [<c02a5ba6>] jbd2__journal_start+0xae/0xda
[  808.875933]        [<c02a5be4>] jbd2_journal_start+0x12/0x14
[  808.875933]        [<c0284fb8>] ext4_journal_start_sb+0x11e/0x126
[  808.875933]        [<c0277661>] ext4_unlink+0x82/0x1e5
[  808.875933]        [<c02127e1>] vfs_unlink+0x61/0xaf
[  808.875933]        [<c02147b5>] do_unlinkat+0xa0/0x112
[  808.875933]        [<c0214946>] sys_unlinkat+0x30/0x37
[  808.875933]        [<c06d8c5d>] syscall_call+0x7/0xb
[  808.875933] 
[  808.875933] -> #0 (&sb->s_type->i_mutex_key#3){+.+.+.}:
[  808.875933]        [<c0197598>] __lock_acquire+0x989/0xbf5
[  808.875933]        [<c019789d>] lock_acquire+0x99/0xbd
[  808.875933]        [<c06d65f4>] __mutex_lock_common+0x30/0x316
[  808.875933]        [<c06d6988>] mutex_lock_nested+0x26/0x2f
[  808.875933]        [<c028d900>] ext4_ext_punch_hole+0x2b8/0x382
[  808.875933]        [<c026e316>] ext4_punch_hole+0x5f/0x70
[  808.875933]        [<c028fbce>] ext4_fallocate+0x63/0x469
[  808.875933]        [<c0208974>] do_fallocate+0xe7/0x105
[  808.875933]        [<c02089c3>] sys_fallocate+0x31/0x46
[  808.875933]        [<c06d8c5d>] syscall_call+0x7/0xb
[  808.875933] 
[  808.875933] other info that might help us debug this:
[  808.875933] 
[  808.875933]  Possible unsafe locking scenario:
[  808.875933] 
[  808.875933]        CPU0                    CPU1
[  808.875933]        ----                    ----
[  808.875933]   lock(jbd2_handle);
[  808.875933]                                lock(&sb->s_type->i_mutex_key#3);
[  808.875933]                                lock(jbd2_handle);
[  808.875933]   lock(&sb->s_type->i_mutex_key#3);
[  808.875933] 
[  808.875933]  *** DEADLOCK ***
[  808.875933] 
[  808.875933] 1 lock held by fsx/13769:
[  808.875933]  #0:  (jbd2_handle){+.+...}, at: [<c02a5995>] start_this_handle+0x4e4/0x51a
[  808.875933] 
[  808.875933] stack backtrace:
[  808.875933] Pid: 13769, comm: fsx Not tainted 3.3.0-rc2-00592-gc56a0b2 #32
[  808.875933] Call Trace:
[  808.875933]  [<c01954fb>] print_circular_bug+0x194/0x1a1
[  808.875933]  [<c0197598>] __lock_acquire+0x989/0xbf5
[  808.875933]  [<c019789d>] lock_acquire+0x99/0xbd
[  808.875933]  [<c028d900>] ? ext4_ext_punch_hole+0x2b8/0x382
[  808.875933]  [<c06d65f4>] __mutex_lock_common+0x30/0x316
[  808.875933]  [<c028d900>] ? ext4_ext_punch_hole+0x2b8/0x382
[  808.875933]  [<c017d53a>] ? local_clock+0x3d/0x55
[  808.875933]  [<c01942de>] ? lock_release_holdtime+0x2b/0xcd
[  808.875933]  [<c028d8d9>] ? ext4_ext_punch_hole+0x291/0x382
[  808.875933]  [<c06d6988>] mutex_lock_nested+0x26/0x2f
[  808.875933]  [<c028d900>] ? ext4_ext_punch_hole+0x2b8/0x382
[  808.875933]  [<c028d900>] ext4_ext_punch_hole+0x2b8/0x382
[  808.875933]  [<c026e316>] ext4_punch_hole+0x5f/0x70
[  808.875933]  [<c028fbce>] ext4_fallocate+0x63/0x469
[  808.875933]  [<c017d4ed>] ? sched_clock_cpu+0x134/0x144
[  808.875933]  [<c023473e>] ? fsnotify+0x1e8/0x202
[  808.875933]  [<c01940d5>] ? trace_hardirqs_off+0xb/0xd
[  808.875933]  [<c017d53a>] ? local_clock+0x3d/0x55
[  808.875933]  [<c020a873>] ? fget+0x57/0x71
[  808.875933]  [<c0208974>] do_fallocate+0xe7/0x105
[  808.875933]  [<c02089c3>] sys_fallocate+0x31/0x46
[  808.875933]  [<c06d8c5d>] syscall_call+0x7/0xb
[  808.875933]  [<c06d0000>] ? init_intel+0x1aa/0x370

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/5] ext4: Correctly handle EOFBLOCKS flag in ext4_ext_punch_hole
  2012-03-22 13:47       ` Ted Ts'o
@ 2012-03-22 14:05         ` Lukas Czerner
  0 siblings, 0 replies; 10+ messages in thread
From: Lukas Czerner @ 2012-03-22 14:05 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: Lukas Czerner, linux-ext4, achender

On Thu, 22 Mar 2012, Ted Ts'o wrote:

> On Thu, Mar 22, 2012 at 09:25:15AM +0100, Lukas Czerner wrote:
> > 
> > The worse what can happen is that after a write spanning several block
> > we'll have first part of the write punched out, but second part written
> > correctly since in this case it might hit already punched block
> > and need to wait for punch_hole to finish, after that the rest of the
> > range is written. However the write should remain consistent on block
> > granularity which is all we guarantee anyway, right ?
> 
> I need to look more closely at this, but thing that was worrying me
> was the part of truncate/punch where we have to invalidate the parts
> of the page cache where we've unmapped the blocks.  i.e., the call to
> truncate_inode_pages_range() racing with the write.  I think we're ok,
> since truncate_inode_pages_range() grabs the page spinlock and then
> checks for PageWriteback, which ought to be sufficient, but truncate
> does take that codepath with i_mutex down, and so my spidey sense is
> tingling.  I may just being too paranoid, though.
> 
> Still, that's not a criticism of your patch.
> 
> More serious is the following lockdep warning that I got.  Grabbing
> i_mutex after the transaction handle is started can lead to a circular
> locking deadlock...
> 
> 						- Ted

Hrm, that's not very good. So we probably need to take the i_mutex for
the whole transaction. It's not pretty solution, but I do not see other
way around. Maybe we could clear the flag after the punch_hole in
different transaction, but then the fallocate keep size and punch_hole
race window would be much bigger.

-Lukas

> 
> BEGIN TEST: Ext4 4k block Wed Mar 21 22:47:17 EDT 2012
> Device: /dev/vdb
> mke2fs options: -q
> mount options: -o block_validity
> 000 - unknown test, ignored
> FSTYP         -- ext4
> PLATFORM      -- Linux/i686 candygram 3.3.0-rc2-00592-gc56a0b2
> MKFS_OPTIONS  -- -q /dev/vdc
> MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc
> 075	[  808.872903] 
> [  808.873567] ======================================================
> [  808.875933] [ INFO: possible circular locking dependency detected ]
> [  808.875933] 3.3.0-rc2-00592-gc56a0b2 #32 Not tainted
> [  808.875933] -------------------------------------------------------
> [  808.875933] fsx/13769 is trying to acquire lock:
> [  808.875933]  (&sb->s_type->i_mutex_key#3){+.+.+.}, at: [<c028d900>] ext4_ext_punch_hole+0x2b8/0x382
> [  808.875933] 
> [  808.875933] but task is already holding lock:
> [  808.875933]  (jbd2_handle){+.+...}, at: [<c02a5995>] start_this_handle+0x4e4/0x51a
> [  808.875933] 
> [  808.875933] which lock already depends on the new lock.
> [  808.875933] 
> [  808.875933] 
> [  808.875933] the existing dependency chain (in reverse order) is:
> [  808.875933] 
> [  808.875933] -> #1 (jbd2_handle){+.+...}:
> [  808.875933]        [<c019789d>] lock_acquire+0x99/0xbd
> [  808.875933]        [<c02a59b7>] start_this_handle+0x506/0x51a
> [  808.875933]        [<c02a5ba6>] jbd2__journal_start+0xae/0xda
> [  808.875933]        [<c02a5be4>] jbd2_journal_start+0x12/0x14
> [  808.875933]        [<c0284fb8>] ext4_journal_start_sb+0x11e/0x126
> [  808.875933]        [<c0277661>] ext4_unlink+0x82/0x1e5
> [  808.875933]        [<c02127e1>] vfs_unlink+0x61/0xaf
> [  808.875933]        [<c02147b5>] do_unlinkat+0xa0/0x112
> [  808.875933]        [<c0214946>] sys_unlinkat+0x30/0x37
> [  808.875933]        [<c06d8c5d>] syscall_call+0x7/0xb
> [  808.875933] 
> [  808.875933] -> #0 (&sb->s_type->i_mutex_key#3){+.+.+.}:
> [  808.875933]        [<c0197598>] __lock_acquire+0x989/0xbf5
> [  808.875933]        [<c019789d>] lock_acquire+0x99/0xbd
> [  808.875933]        [<c06d65f4>] __mutex_lock_common+0x30/0x316
> [  808.875933]        [<c06d6988>] mutex_lock_nested+0x26/0x2f
> [  808.875933]        [<c028d900>] ext4_ext_punch_hole+0x2b8/0x382
> [  808.875933]        [<c026e316>] ext4_punch_hole+0x5f/0x70
> [  808.875933]        [<c028fbce>] ext4_fallocate+0x63/0x469
> [  808.875933]        [<c0208974>] do_fallocate+0xe7/0x105
> [  808.875933]        [<c02089c3>] sys_fallocate+0x31/0x46
> [  808.875933]        [<c06d8c5d>] syscall_call+0x7/0xb
> [  808.875933] 
> [  808.875933] other info that might help us debug this:
> [  808.875933] 
> [  808.875933]  Possible unsafe locking scenario:
> [  808.875933] 
> [  808.875933]        CPU0                    CPU1
> [  808.875933]        ----                    ----
> [  808.875933]   lock(jbd2_handle);
> [  808.875933]                                lock(&sb->s_type->i_mutex_key#3);
> [  808.875933]                                lock(jbd2_handle);
> [  808.875933]   lock(&sb->s_type->i_mutex_key#3);
> [  808.875933] 
> [  808.875933]  *** DEADLOCK ***
> [  808.875933] 
> [  808.875933] 1 lock held by fsx/13769:
> [  808.875933]  #0:  (jbd2_handle){+.+...}, at: [<c02a5995>] start_this_handle+0x4e4/0x51a
> [  808.875933] 
> [  808.875933] stack backtrace:
> [  808.875933] Pid: 13769, comm: fsx Not tainted 3.3.0-rc2-00592-gc56a0b2 #32
> [  808.875933] Call Trace:
> [  808.875933]  [<c01954fb>] print_circular_bug+0x194/0x1a1
> [  808.875933]  [<c0197598>] __lock_acquire+0x989/0xbf5
> [  808.875933]  [<c019789d>] lock_acquire+0x99/0xbd
> [  808.875933]  [<c028d900>] ? ext4_ext_punch_hole+0x2b8/0x382
> [  808.875933]  [<c06d65f4>] __mutex_lock_common+0x30/0x316
> [  808.875933]  [<c028d900>] ? ext4_ext_punch_hole+0x2b8/0x382
> [  808.875933]  [<c017d53a>] ? local_clock+0x3d/0x55
> [  808.875933]  [<c01942de>] ? lock_release_holdtime+0x2b/0xcd
> [  808.875933]  [<c028d8d9>] ? ext4_ext_punch_hole+0x291/0x382
> [  808.875933]  [<c06d6988>] mutex_lock_nested+0x26/0x2f
> [  808.875933]  [<c028d900>] ? ext4_ext_punch_hole+0x2b8/0x382
> [  808.875933]  [<c028d900>] ext4_ext_punch_hole+0x2b8/0x382
> [  808.875933]  [<c026e316>] ext4_punch_hole+0x5f/0x70
> [  808.875933]  [<c028fbce>] ext4_fallocate+0x63/0x469
> [  808.875933]  [<c017d4ed>] ? sched_clock_cpu+0x134/0x144
> [  808.875933]  [<c023473e>] ? fsnotify+0x1e8/0x202
> [  808.875933]  [<c01940d5>] ? trace_hardirqs_off+0xb/0xd
> [  808.875933]  [<c017d53a>] ? local_clock+0x3d/0x55
> [  808.875933]  [<c020a873>] ? fget+0x57/0x71
> [  808.875933]  [<c0208974>] do_fallocate+0xe7/0x105
> [  808.875933]  [<c02089c3>] sys_fallocate+0x31/0x46
> [  808.875933]  [<c06d8c5d>] syscall_call+0x7/0xb
> [  808.875933]  [<c06d0000>] ? init_intel+0x1aa/0x370
> 

-- 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-03-22 14:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-21  7:23 [PATCH 1/5] ext4: Remove restrictive checks for EOFBLOCKS_FL Lukas Czerner
2012-03-21  7:23 ` [PATCH 2/5] ext4: Correctly handle EOFBLOCKS flag in ext4_ext_punch_hole Lukas Czerner
2012-03-22  2:13   ` Ted Ts'o
2012-03-22  8:25     ` Lukas Czerner
2012-03-22 13:47       ` Ted Ts'o
2012-03-22 14:05         ` Lukas Czerner
2012-03-21  7:23 ` [PATCH 3/5] ext4: Allow punch hole beyond i_size Lukas Czerner
2012-03-21  7:23 ` [PATCH 4/5] ext4: Remove uneeded i_size handling Lukas Czerner
2012-03-21  7:23 ` [PATCH 5/5] ext4: Correct ext4_punch_hole return codes Lukas Czerner
2012-03-22  0:10 ` [PATCH 1/5] ext4: Remove restrictive checks for EOFBLOCKS_FL Allison Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).