Re: 2.6.35 causes deadlock when snapshotting root lv

linux-fsdevel.vger.kernel.org archive mirror

* Re: 2.6.35 causes deadlock when snapshotting root lv
From: Mike Snitzer @ 2010-06-17 14:27 UTC
  To: Phillip Susi; +Cc: device-mapper development, linux-fsdevel

On Thu, Jun 17 2010 at  9:50am -0400,
Phillip Susi <psusi@cfl.rr.com> wrote:

> What:
> 
> lvcreate -s -n snap -L 1g vg/root hangs in an unkillable state while
> trying to flush the fs and suspend the device to switch tables over to
> the snapshot, and no further IO is possible so the system must be hard
> rebooted with magic-sysrq.
> 
> Where: 2.6.35, all 3 release candidates.  2.6.34 and earlier kernels are
> fine.

Which filesystem are you using?  Which 2.6.35-rc?

There haven't been _any_ changes to DM for 2.6.35 (some DM fixes may get
pushed to Linus once Alasdair gets back from traveling).

So what this means is that the VFS (or ext4) freeze changes introduced
in 2.6.35 are the likely culprit:

http://git.kernel.org/linus/18e9e5104fcd9a9
http://git.kernel.org/linus/6b0310fbf087ad6

Though it could also be an issue with the various 2.6.35 writeback
changes (depending on which -rcX you're using).

Point is: this is not a DM regression.

Mike

* Re: 2.6.35 causes deadlock when snapshotting root lv
From: Phillip Susi @ 2010-06-17 16:16 UTC
  To: Mike Snitzer; +Cc: linux-fsdevel, device-mapper development

On 6/17/2010 10:27 AM, Mike Snitzer wrote:
> Which filesystem are you using?  Which 2.6.35-rc?

ext4, rc1, 2, and 3.

> There haven't been _any_ changes to DM for 2.6.35 (some DM fixes may get
> pushed to Linus once Alasdair gets back from traveling).

I just finished combing through the git log myself and was getting
concerned that I couldn't find any dm changes.  At least now I know that
I couldn't find them because they aren't there.

> So what this means is that the VFS (or ext4) freeze changes introduced
> in 2.6.35 are the likely culprit:
> 
> http://git.kernel.org/linus/18e9e5104fcd9a9
> http://git.kernel.org/linus/6b0310fbf087ad6

Yes, these do look suspicious.  I'll test reverting them tonight.

* Re: 2.6.35 causes deadlock when snapshotting root lv
From: Mike Snitzer @ 2010-06-17 16:27 UTC
  To: Phillip Susi; +Cc: device-mapper development, linux-fsdevel

On Thu, Jun 17 2010 at 12:16pm -0400,
Phillip Susi <psusi@cfl.rr.com> wrote:

> On 6/17/2010 10:27 AM, Mike Snitzer wrote:
> > Which filesystem are you using?  Which 2.6.35-rc?
> 
> ext4, rc1, 2, and 3.
> 
> > There haven't been _any_ changes to DM for 2.6.35 (some DM fixes may get
> > pushed to Linus once Alasdair gets back from traveling).
> 
> I just finished combing through the git log myself and was getting
> concerned that I couldn't find any dm changes.  At least now I know that
> I couldn't find them because they aren't there.
> 
> > So what this means is that the VFS (or ext4) freeze changes introduced
> > in 2.6.35 are the likely culprit:
> > 
> > http://git.kernel.org/linus/18e9e5104fcd9a9
> > http://git.kernel.org/linus/6b0310fbf087ad6
> 
> Yes, these do look suspicious.  I'll test reverting them tonight.

OK, I'd suggest reverting the VFS change (18e9e5104fcd9a9) first.

* Re: 2.6.35 causes deadlock when snapshotting root lv
From: Phillip Susi @ 2010-06-18  1:29 UTC
  To: Mike Snitzer; +Cc: device-mapper development, linux-fsdevel, Eric Sandeen

On 06/17/2010 12:27 PM, Mike Snitzer wrote:
>>> http://git.kernel.org/linus/18e9e5104fcd9a9
>>> http://git.kernel.org/linus/6b0310fbf087ad6
>>
>> Yes, these do look suspicious.  I'll test reverting them tonight.
>
> OK, I'd suggest reverting the VFS change (18e9e5104fcd9a9) first.

Turns out it was 6b0310fbf087ad6 that did it.  CCing Eric Sandeen since 
he wrote it.  Eric, this patch seems to cause a deadlock when taking an 
lvm snapshot of the root lv.  Any idea why?

* Re: 2.6.35 causes deadlock when snapshotting root lv
From: Eric Sandeen @ 2010-06-18 18:55 UTC
  To: Phillip Susi
  Cc: Mike Snitzer, device-mapper development, linux-fsdevel,
	Eric Sandeen

Phillip Susi wrote:
> On 06/17/2010 12:27 PM, Mike Snitzer wrote:
>>>> http://git.kernel.org/linus/18e9e5104fcd9a9
>>>> http://git.kernel.org/linus/6b0310fbf087ad6
>>>
>>> Yes, these do look suspicious.  I'll test reverting them tonight.
>>
>> OK, I'd suggest reverting the VFS change (18e9e5104fcd9a9) first.
> 
> Turns out it was 6b0310fbf087ad6 that did it.  CCing Eric Sandeen since
> he wrote it.  Eric, this patch seems to cause a deadlock when taking an
> lvm snapshot of the root lv.  Any idea why?

I'll look.  A sysrq-w when it hangs up would probably be enlightening...

-Eric

* Re: 2.6.35 causes deadlock when snapshotting root lv
From: Eric Sandeen @ 2010-06-18 19:19 UTC
  To: Phillip Susi
  Cc: Mike Snitzer, device-mapper development, linux-fsdevel,
	Eric Sandeen

Eric Sandeen wrote:
> Phillip Susi wrote:
>> On 06/17/2010 12:27 PM, Mike Snitzer wrote:
>>>>> http://git.kernel.org/linus/18e9e5104fcd9a9
>>>>> http://git.kernel.org/linus/6b0310fbf087ad6
>>>> Yes, these do look suspicious.  I'll test reverting them tonight.
>>> OK, I'd suggest reverting the VFS change (18e9e5104fcd9a9) first.
>> Turns out it was 6b0310fbf087ad6 that did it.  CCing Eric Sandeen since
>> he wrote it.  Eric, this patch seems to cause a deadlock when taking an
>> lvm snapshot of the root lv.  Any idea why?
> 
> I'll look.  A sysrq-w when it hangs up would probably be enlightening...
> 
> -Eric

FWIW simple freeze/unfreeze works for me, and lvm snap on a non-root ext4
partition seems to as well.  I'm out for a few days and probably won't
get a root lv set up to test anytime soon.  If you can provide sysrq-w
output, that will probably be a big help if the cause doesn't turn out
to be obvious by inspection.  :)

Thanks
-Eric
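For reference, a "simple freeze/unfreeze" of this kind can be driven from user space with the FIFREEZE and FITHAW ioctls from <linux/fs.h>, the same ones util-linux fsfreeze uses. The sketch below is a minimal illustration under that assumption: the helper name freeze_thaw() is hypothetical, it needs CAP_SYS_ADMIN, and it freezes and thaws a mount point with no concurrent writes, which (per the follow-up in this thread) was not enough on its own to trigger the hang.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* FIFREEZE, FITHAW */

/*
 * Hypothetical helper: freeze, then immediately thaw, the filesystem
 * mounted at `mountpoint`. Returns 0 on success, -1 on failure with
 * errno left set. Requires CAP_SYS_ADMIN. Do not run it against the
 * filesystem that is holding this program's own output.
 */
static int freeze_thaw(const char *mountpoint)
{
	assert(mountpoint != NULL);

	/* Any file or directory on the target fs works as the ioctl target. */
	int fd = open(mountpoint, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Enter the VFS freeze path: block new writes, sync, then freeze. */
	if (ioctl(fd, FIFREEZE, 0) < 0) {
		close(fd);
		return -1;
	}

	/* While frozen, writers from other tasks sleep until the thaw. */
	int ret = (ioctl(fd, FITHAW, 0) < 0) ? -1 : 0;
	close(fd);
	return ret;
}
```

Calling it on a path that does not exist simply returns -1 from the open(); on a real mount point it exercises the idle-filesystem case only, with no writer holding a journal handle across the freeze.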

* Re: 2.6.35 causes deadlock when snapshotting root lv
From: Eric Sandeen @ 2010-06-18 19:23 UTC
  To: Phillip Susi
  Cc: Mike Snitzer, device-mapper development, linux-fsdevel,
	Eric Sandeen

Eric Sandeen wrote:
> Eric Sandeen wrote:
>> Phillip Susi wrote:
>>> On 06/17/2010 12:27 PM, Mike Snitzer wrote:
>>>>>> http://git.kernel.org/linus/18e9e5104fcd9a9
>>>>>> http://git.kernel.org/linus/6b0310fbf087ad6
>>>>> Yes, these do look suspicious.  I'll test reverting them tonight.
>>>> OK, I'd suggest reverting the VFS change (18e9e5104fcd9a9) first.
>>> Turns out it was 6b0310fbf087ad6 that did it.  CCing Eric Sandeen since
>>> he wrote it.  Eric, this patch seems to cause a deadlock when taking an
>>> lvm snapshot of the root lv.  Any idea why?
>> I'll look.  A sysrq-w when it hangs up would probably be enlightening...
>>
>> -Eric
> 
> FWIW simple freeze/unfreeze works for me, and lvm snap on a non-root ext4
> partition seems to as well.  I'm out for a few days and probably won't
> get a root lv set up to test anytime soon.  If you can provide sysrq-w
> output that probably will be a big help if it doesn't end up being obvious
> by inspection.  :)

Spoke too soon.  A little active IO at the time of the snapshot did the trick.

SysRq : Show Blocked State
  task                        PC stack   pid father
jbd2/dm-2-8   D 0000000000000003     0  1297      2 0x00000080
 ffff88013bae8800 0000000000000046 0000000000000000 ffffffff81128046
 ffff8800ad876800 ffff8800ad876bb0 00000001007af71d 0000000000000000
 0000000000000000 ffff8800ad871024 ffff8800baea1d70 ffff8800ae01ec70
Call Trace:
 [<ffffffff81128046>] ? sync_dirty_buffer+0x7b/0x96
 [<ffffffffa01924af>] ? jbd2_journal_commit_transaction+0x249/0x13e2 [jbd2]
 [<ffffffff8106f420>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffff8100fd47>] ? __switch_to+0xc3/0x1ec
 [<ffffffff81049489>] ? finish_task_switch+0x4c/0x72
 [<ffffffff81042135>] ? need_resched+0x1a/0x23
 [<ffffffff810605d4>] ? lock_timer_base+0x26/0x4b
 [<ffffffff81060661>] ? try_to_del_timer_sync+0x68/0x74
 [<ffffffffa0199194>] ? kjournald2+0x110/0x33b [jbd2]
 [<ffffffff8106f420>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffffa0199084>] ? kjournald2+0x0/0x33b [jbd2]
 [<ffffffff8106f0d4>] ? kthread+0x65/0x6d
 [<ffffffff81011b0a>] ? child_rip+0xa/0x20
 [<ffffffff8106f06f>] ? kthread+0x0/0x6d
 [<ffffffff81011b00>] ? child_rip+0x0/0x20
dd            D 0000000000000001     0  2548  17647 0x00000080
 ffff88013ba5c800 0000000000000086 0000000000000000 ffff8800bd889af8
 ffff8800be8df000 ffff8800be8df3b0 00000001007af709 0000000000000000
 0000000000000000 ffff88010fbf3400 ffff88010fbf3658 ffff88003e9f4c28
Call Trace:
 [<ffffffffa034ec97>] ? ext4_journal_start_sb+0x84/0x105 [ext4]
 [<ffffffff8106f420>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffff81125ccb>] ? __block_commit_write+0xa4/0xb3
 [<ffffffffa033ddc1>] ? ext4_dirty_inode+0x13/0x3f [ext4]
 [<ffffffff811208dd>] ? __mark_inode_dirty+0x25/0x11e
 [<ffffffff81125e57>] ? generic_write_end+0x4d/0x6a
 [<ffffffffa033eee3>] ? ext4_da_write_end+0x224/0x28c [ext4]
 [<ffffffffa033d82a>] ? ext4_da_get_block_prep+0x0/0x2ee [ext4]
 [<ffffffff810bfe86>] ? generic_file_buffered_write+0x164/0x226
 [<ffffffff8111ceda>] ? generic_getxattr+0x4c/0x59
 [<ffffffff810c02cf>] ? __generic_file_aio_write+0x25d/0x2b1
 [<ffffffff81042143>] ? should_resched+0x5/0x25
 [<ffffffff810c037c>] ? generic_file_aio_write+0x59/0xa1
 [<ffffffff8110391f>] ? do_sync_write+0xc9/0x10c
 [<ffffffff8106f420>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffff81104026>] ? vfs_write+0xa9/0x101
 [<ffffffff81104b4d>] ? sys_write+0x45/0x6b
 [<ffffffff81010a82>] ? system_call_fastpath+0x16/0x1b
lvcreate      D 0000000000000002     0  2550  17647 0x00000080
 ffff88013ba5e800 0000000000000086 0000000000000000 ffff8800ad8710c8
 ffff8800be8df800 ffff8800be8dfbb0 00000001007af718 0000000000000000
 0000000000000000 ffff8800ad871024 ffff8800ad871000 ffff8800ad871098
Call Trace:
 [<ffffffffa00a409f>] ? dev_suspend+0x0/0x1c5 [dm_mod]
 [<ffffffffa01981da>] ? jbd2_log_wait_commit+0x112/0x163 [jbd2]
 [<ffffffff8106f420>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffffa0198fb0>] ? jbd2_journal_start_commit+0x34/0x6b [jbd2]
 [<ffffffffa034e349>] ? ext4_sync_fs+0x74/0x81 [ext4]
 [<ffffffff8114a78c>] ? sync_quota_sb+0x45/0xe1
 [<ffffffff8112392d>] ? __sync_filesystem+0x3d/0x70
 [<ffffffff8112ae48>] ? freeze_bdev+0x8a/0x111
 [<ffffffffa00a0898>] ? dm_suspend+0x8a/0x175 [dm_mod]
 [<ffffffffa00a40fa>] ? dev_suspend+0x5b/0x1c5 [dm_mod]
 [<ffffffffa00a4b71>] ? dm_ctl_ioctl+0x22f/0x293 [dm_mod]
 [<ffffffff8103ed00>] ? place_entity+0x6e/0x95
 [<ffffffff810435ce>] ? __dequeue_entity+0x1b/0x2f
 [<ffffffff811115f9>] ? vfs_ioctl+0x21/0x6b
 [<ffffffff81111b3d>] ? do_vfs_ioctl+0x487/0x4da
 [<ffffffff8103e55a>] ? pick_next_task+0x1b/0x3c
 [<ffffffff81042135>] ? need_resched+0x1a/0x23
 [<ffffffff813b24c8>] ? thread_return+0x9b/0xb2
 [<ffffffff81111be1>] ? sys_ioctl+0x51/0x70
 [<ffffffff81010a82>] ? system_call_fastpath+0x16/0x1b
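One way to read the dump above (an interpretation, not something stated in the thread): lvcreate, inside freeze_bdev() and ext4_sync_fs(), waits in jbd2_log_wait_commit() for the journal commit; the jbd2 thread sits in jbd2_journal_commit_transaction(), presumably waiting for dd's already-open handle to be stopped; and dd is stuck in ext4_journal_start_sb() under ext4_da_write_end(), waiting for a thaw that lvcreate would only issue after the suspend completes. The sketch below models that as a wait-for graph in which each task is blocked on at most one other, and checks whether a given task can ever make progress. The task labels, RUNNABLE, and blocked_forever() are hypothetical names for illustration only.

```c
#include <assert.h>

/* Labels for the three blocked tasks in the sysrq-w output. */
enum task { LVCREATE, JBD2, DD, NTASKS };

#define RUNNABLE (-1)

/*
 * waits_on[t] is the task t is blocked on, or RUNNABLE. Each task waits
 * on at most one other, so from `start` there is a single chain to follow.
 * If the chain survives NTASKS hops without reaching a runnable task, it
 * has visited NTASKS + 1 nodes and so must repeat one: it ends in a cycle
 * and `start` can never wake up.
 */
static int blocked_forever(const int waits_on[NTASKS], int start)
{
	assert(start >= 0 && start < NTASKS);
	int cur = start;
	for (int hop = 0; hop < NTASKS; hop++) {
		cur = waits_on[cur];
		if (cur == RUNNABLE)
			return 0;
	}
	return 1;
}

/* Assumed reading of the unpatched trace (see the lead-in). */
static const int unpatched[NTASKS] = {
	[LVCREATE] = JBD2,	/* jbd2_log_wait_commit(): waits for the commit */
	[JBD2]     = DD,	/* commit waits for dd's handle to be stopped */
	[DD]       = LVCREATE,	/* vfs_check_frozen(): waits for the thaw */
};

/* If dd is let through during the sync phase, the chain ends at dd. */
static const int patched[NTASKS] = {
	[LVCREATE] = JBD2,
	[JBD2]     = DD,
	[DD]       = RUNNABLE,
};
```

With the unpatched table every task is blocked forever; breaking any single edge, such as letting dd finish its write, lets the whole chain drain.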

* [PATCH] ext4: fix freeze deadlock under IO
From: Eric Sandeen @ 2010-06-23 19:37 UTC
  To: Phillip Susi
  Cc: Mike Snitzer, device-mapper development, linux-fsdevel,
	Eric Sandeen, ext4 development, Dave Chinner

Commit 6b0310fbf087ad6 caused a regression resulting in deadlocks
when freezing a filesystem which had active IO; the vfs_check_frozen
level (SB_FREEZE_WRITE) did not let the freeze-related IO syncing
through.  Duh.

Changing the test to SB_FREEZE_TRANS should let the normal freeze
syncing get through the fs, but still block any transactions from
starting once the fs is completely frozen.

I tested this by running fsstress in the background while periodically
snapshotting the fs and running fsck on the result.  I still ran into
occasional deadlocks, but they were different ones.  I think this is a
fine fix for the problem at hand; the other deadlocky things will need
more investigation.

Reported-by: Phillip Susi <psusi@cfl.rr.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4e8983a..a45ced9 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -241,7 +241,7 @@ handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
 	if (sb->s_flags & MS_RDONLY)
 		return ERR_PTR(-EROFS);

-	vfs_check_frozen(sb, SB_FREEZE_WRITE);
+	vfs_check_frozen(sb, SB_FREEZE_TRANS);
 	/* Special case here: if the journal has aborted behind our
 	 * backs (eg. EIO in the commit thread), then we still need to
 	 * take the FS itself readonly cleanly. */
@@ -3491,7 +3491,7 @@ int ext4_force_commit(struct super_block *sb)

 	journal = EXT4_SB(sb)->s_journal;
 	if (journal) {
-		vfs_check_frozen(sb, SB_FREEZE_WRITE);
+		vfs_check_frozen(sb, SB_FREEZE_TRANS);
 		ret = ext4_journal_force_commit(journal);
 	}

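The fix hinges on the ordering of the freeze levels. As I understand the 2.6.35-era VFS, a freeze moves s_frozen from SB_UNFROZEN to SB_FREEZE_WRITE (new writes blocked, then the fs is synced) and then to SB_FREEZE_TRANS (fully frozen), and vfs_check_frozen(sb, level) sleeps until s_frozen < level. The sketch below models only that comparison, with the enum values assumed to mirror the kernel's; check_frozen_blocks() and writer_stalls_during_sync() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>

/* Assumed to mirror the 2.6.35-era values of super_block->s_frozen. */
enum {
	SB_UNFROZEN     = 0,	/* normal operation */
	SB_FREEZE_WRITE = 1,	/* new writes blocked, fs being synced */
	SB_FREEZE_TRANS = 2,	/* fully frozen, no new transactions */
};

/*
 * Models the condition in vfs_check_frozen(sb, level), which sleeps
 * until sb->s_frozen < level. Returns 1 if the caller would sleep.
 */
static int check_frozen_blocks(int s_frozen, int level)
{
	assert(s_frozen >= SB_UNFROZEN && s_frozen <= SB_FREEZE_TRANS);
	return !(s_frozen < level);
}

/*
 * Hypothetical scenario from this thread: a writer that already holds a
 * journal handle re-enters ext4_journal_start_sb() (via ext4_dirty_inode)
 * while the freeze is syncing, i.e. s_frozen == SB_FREEZE_WRITE. The sync
 * waits for a journal commit and the commit waits for that handle, so a
 * stalled writer means the freeze never completes.
 * Returns 1 if the writer stalls when ext4 checks at `check_level`.
 */
static int writer_stalls_during_sync(int check_level)
{
	return check_frozen_blocks(SB_FREEZE_WRITE, check_level);
}
```

With the check at SB_FREEZE_WRITE the writer stalls during the sync phase while still holding its handle; at SB_FREEZE_TRANS it gets through, and new transactions are still refused once s_frozen reaches SB_FREEZE_TRANS, which matches the intent described in the patch.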
* Re: [PATCH] ext4: fix freeze deadlock under IO
From: Ted Ts'o @ 2010-08-01 21:41 UTC
  To: Eric Sandeen
  Cc: Phillip Susi, Mike Snitzer, device-mapper development,
	linux-fsdevel, Eric Sandeen, ext4 development, Dave Chinner

On Wed, Jun 23, 2010 at 02:37:27PM -0500, Eric Sandeen wrote:
> Commit 6b0310fbf087ad6 caused a regression resulting in deadlocks
> when freezing a filesystem which had active IO; the vfs_check_frozen
> level (SB_FREEZE_WRITE) did not let the freeze-related IO syncing
> through.  Duh.
> 
> Changing the test to FREEZE_TRANS should let the normal freeze
> syncing get through the fs, but still block any transactions from
> starting once the fs is completely frozen.
> 
> I tested this by running fsstress in the background while periodically
> snapshotting the fs and running fsck on the result.  I ran into
> occasional deadlocks, but different ones.  I think this is a
> fine fix for the problem at hand, and the other deadlocky things
> will need more investigation.
> 
> Reported-by: Phillip Susi <psusi@cfl.rr.com>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>

Applied to the ext4 patch queue.  Sorry for missing this earlier.

- Ted
