Recipe for creating unlink deadlocks

linux-btrfs.vger.kernel.org archive mirror

* Recipe for creating unlink deadlocks
@ 2015-05-07 22:20 Zygo Blaxell
  2015-05-08 10:32 ` Filipe David Manana
  0 siblings, 1 reply; 6+ messages in thread
From: Zygo Blaxell @ 2015-05-07 22:20 UTC (permalink / raw)
  To: linux-btrfs


This is the simplest repro recipe for this that I have found so far.
It takes only a few minutes for the rm processes to get stuck here:

# cat /proc/28396/stack    # Thu May  7 18:13:05 2015

[<ffffffff813c8a2d>] lock_extent_bits+0x1ad/0x200
[<ffffffff813b5dfa>] btrfs_evict_inode+0x17a/0x5e0
[<ffffffff8123fc68>] evict+0xb8/0x1b0
[<ffffffff81240813>] iput+0x1f3/0x260
[<ffffffff81233c68>] do_unlinkat+0x1d8/0x360
[<ffffffff812346db>] SyS_unlinkat+0x1b/0x40
[<ffffffff8190024d>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
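
A minimal sketch for locating such stuck processes on another machine, assuming a procps-style ps and root access to /proc/<pid>/stack. The function names are made up for illustration:

```shell
#!/bin/bash
# Sketch: print the kernel stack of every process in uninterruptible
# sleep (state "D").  Reading /proc/<pid>/stack normally requires root.

# Read "PID STATE COMMAND" lines on stdin and print the PIDs whose
# state field starts with D.
blocked_pids() {
	awk '$2 ~ /^D/ { print $1 }'
}

# Hypothetical helper: dump the kernel stack of each blocked process.
dump_blocked_stacks() {
	local pid
	for pid in $(ps -eo pid=,stat=,comm= | blocked_pids); do
		echo "== pid $pid ($(cat "/proc/$pid/comm" 2>/dev/null)) =="
		cat "/proc/$pid/stack" 2>/dev/null
	done
}
```

Running dump_blocked_stacks as root prints one stack per process in D state; a hung rm should show a trace ending in lock_extent_bits like the one above.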


Run these three scripts in a directory that is the top of a subvol:

	# Script #1:  randomly create or delete snapshots
	while sleep 1; do
		if [ $[RANDOM%2] = 0 ]; then
			btrfs sub snap . snaps-$RANDOM
		else
			for x in snaps-*; do
				btrfs sub del $x
				break
			done
			btrfs sub sync .
		fi
	done 

	# Script #2:  create a bunch of files of random sizes
	while true; do
		d=$[RANDOM%9]/$[RANDOM%9]/$[RANDOM%9]/$[RANDOM%9]
		mkdir -p ${d%/*}
		head -c $[RANDOM%1024]k /usr/share/doc/chromium/copyright > $d
	done 

	# Script #3:  read and immediately delete all the files
	while date; do
		sleep 1
		find -type f -exec cat {} \; -exec rm -fv {} \; > /dev/null
	done 
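
The "top of a subvol" precondition can be checked before starting the scripts. A rough sketch, relying on btrfs giving the root directory of every subvolume inode number 256 and on GNU stat; the helper names are hypothetical:

```shell
#!/bin/bash
# Inode number btrfs assigns to the root directory of each subvolume
# (BTRFS_FIRST_FREE_OBJECTID in the kernel sources).
SUBVOL_ROOT_INO=256

# True if inode number $1 is the one used for a subvolume root.
is_subvol_root_ino() {
	[ "$1" = "$SUBVOL_ROOT_INO" ]
}

# True if directory $1 (default: current directory) is on btrfs and is
# the top of a subvolume.
is_subvol_top() {
	local dir=${1:-.}
	[ "$(stat -f -c %T -- "$dir")" = btrfs ] &&
		is_subvol_root_ino "$(stat -c %i -- "$dir")"
}
```

For example, `is_subvol_top . || echo "not at the top of a btrfs subvolume"` before launching the three loops.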



* Re: Recipe for creating unlink deadlocks
  2015-05-07 22:20 Recipe for creating unlink deadlocks Zygo Blaxell
@ 2015-05-08 10:32 ` Filipe David Manana
  2015-05-08 13:21   ` Zygo Blaxell
  0 siblings, 1 reply; 6+ messages in thread
From: Filipe David Manana @ 2015-05-08 10:32 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs@vger.kernel.org


Tried that for over 3 hours on a 4.1-rc2 kernel with a few patches
from the list, with several combinations of mount options (compress,
autodefrag, nodatacow, etc.), and didn't hit any issues.

What kernel version are you testing? Any specific combination of mount options?




-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


* Re: Recipe for creating unlink deadlocks
  2015-05-08 10:32 ` Filipe David Manana
@ 2015-05-08 13:21   ` Zygo Blaxell
       [not found]     ` <20150513053340.GF18025@hungrycats.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Zygo Blaxell @ 2015-05-08 13:21 UTC (permalink / raw)
  To: Filipe David Manana; +Cc: linux-btrfs@vger.kernel.org


On Fri, May 08, 2015 at 11:32:07AM +0100, Filipe David Manana wrote:
> Tried that for over 3 hours on a 4.1-rc2 kernel with a few patches
> from the list, with several combinations of mount options (compress,
> autodefrag, nodatacow, etc.), and didn't hit any issues.
> 
> What kernel version are you testing? Any specific combination of mount options?

I've seen it in the field on versions from v3.15 to v4.0.1.  The test I did
yesterday was on v4.0.1.

Mount options are rw,relatime,compress-force=zlib,space_cache.
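
For anyone trying to match this setup, here is a small sketch for checking whether a mount carries a given option. has_mount_opt is a hypothetical helper, and /mnt/test in the usage note is a placeholder mount point:

```shell
#!/bin/bash
# True if the comma-separated mount option list $1 contains option $2
# as a whole entry (so "compress" does not match "compress-force=zlib").
has_mount_opt() {
	case ",$1," in
		*",$2,"*) return 0 ;;
		*) return 1 ;;
	esac
}
```

With util-linux findmnt available, `has_mount_opt "$(findmnt -no OPTIONS /mnt/test)" compress-force=zlib` succeeds only if forced zlib compression is active on that mount.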



[parent not found: <20150513053340.GF18025@hungrycats.org>]

* Re: Recipe for creating unlink deadlocks (v2, verified on 4.1-rc3) (resend)
       [not found]     ` <20150513053340.GF18025@hungrycats.org>
@ 2015-05-13 23:07       ` Zygo Blaxell
  2015-05-14 12:25         ` Filipe David Manana
  0 siblings, 1 reply; 6+ messages in thread
From: Zygo Blaxell @ 2015-05-13 23:07 UTC (permalink / raw)
  To: Filipe David Manana; +Cc: linux-btrfs@vger.kernel.org


[Apologies for the duplication, if any.  I gave the original, longer
version 18 hours to appear, and it doesn't seem to have shown up yet.]

On Fri, May 08, 2015 at 09:21:02AM -0400, Zygo Blaxell wrote:
> On Fri, May 08, 2015 at 11:32:07AM +0100, Filipe David Manana wrote:
> > On Thu, May 7, 2015 at 11:20 PM, Zygo Blaxell
> > <ce3g8jdj@umail.furryterror.org> wrote:
> > > This is the simplest repro recipe for this that I have found so far.
> > > It takes only a few minutes for the rm processes to get stuck here:
> > >
> > > # cat /proc/28396/stack                                                                                                                                       Thu May  7 18:13:05 2015
> > >
> > > [<ffffffff813c8a2d>] lock_extent_bits+0x1ad/0x200
> > > [<ffffffff813b5dfa>] btrfs_evict_inode+0x17a/0x5e0
> > > [<ffffffff8123fc68>] evict+0xb8/0x1b0
> > > [<ffffffff81240813>] iput+0x1f3/0x260
> > > [<ffffffff81233c68>] do_unlinkat+0x1d8/0x360
> > > [<ffffffff812346db>] SyS_unlinkat+0x1b/0x40
> > > [<ffffffff8190024d>] system_call_fastpath+0x16/0x1b
> > > [<ffffffffffffffff>] 0xffffffffffffffff
> > >
> > >
> > > Run these three scripts in a directory that is the top of a subvol:

New versions of these scripts make the results a little more reproducible:


	#!/bin/bash
	set -x
	# Script #1:  randomly create or delete snapshots
	# v2:  no significant changes
	while date; do
		if [ $((RANDOM%2)) = 0 ]; then
			btrfs sub snap . "snaps-$RANDOM"
		else
			# Delete only the first snapshot matched by the glob
			for x in snaps-*; do
				btrfs sub del "$x"
				break
			done
			btrfs sub sync .
		fi
		sleep 1
	done


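	#!/bin/bash
	# Script #2:  create a bunch of files of random sizes
	# v2:  create our own test file instead of using Chromium's 1.6M copyright file
	while echo -ne "\r$(date)"; do
		# (Re)create the 1 MiB source file if it is missing or empty
		[ -s tester ] || head -c 1024k /dev/urandom > tester
		# Random path four levels deep, e.g. 3/0/7/5
		d=$((RANDOM%9))/$((RANDOM%9))/$((RANDOM%9))/$((RANDOM%9))
		mkdir -p "${d%/*}"
		# Copy a random-length prefix (0-1023 KiB) of the source file
		head -c "$((RANDOM%1024))k" tester > "$d"
	done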



	#!/bin/bash
	set -x
	# Script #3:  read and immediately delete all the files
	# v3:  let script #2 create some more files
	while date; do
		find *[0-9] -type f -exec sh -c 'cat >/dev/null "$@"' -- {} \; -exec rm -fv {} \;

		# Allow some files to build up between runs
		sleep 1m

		# Make sure we are not reading from cache.
		# These are not strictly necessary but they reduce
		# the repro time by a minute or so.
		sync
		sysctl vm.drop_caches=1
	done


> > Tried that for over 3 hours on a 4.1-rc2 kernel with a few patches
> > from the list, with several combinations of mount options (compress,
> > autodefrag, nodatacow, etc.), and didn't hit any issues.
> >
> > What kernel version are you testing? Any specific combination of mount options?
>
> I've seen it in the field on versions from v3.15 to v4.0.1.  The test I did
> yesterday was on v4.0.1.

I just verified that the issue is still present in v4.1-rc3.  Tested on
bare hardware and under KVM, on a mix of AMD and Intel CPUs.

The issue appears immediately after the test file collection becomes too
large to fit in the host RAM.  In my test environment I used RAM sizes
from 3GB to 16GB with a 16GB btrfs filesystem.  The test ran without
incident until the filesystem used space (reported by df) exceeded the
RAM size, then rm hung a few seconds later.

If I reboot after an rm hang and run scripts #1 and #3 (snapshots and
rm), it hangs almost immediately as soon as subvol delete and rm are
running at the same time.

If script #3 (remove files) runs too quickly (i.e. your disks are too
fast ;), try delaying script #3 until after #1 and #2 have accumulated
enough data to exceed RAM size.
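
One way to implement that delay is to poll until df reports more used space than the machine has RAM. This is only a sketch under the assumptions stated in the comments; the function names and the 10-second polling interval are arbitrary choices, not part of the original recipe:

```shell
#!/bin/bash
# MemTotal in KiB, read from a meminfo-format file (default /proc/meminfo).
ram_kib() {
	awk '$1 == "MemTotal:" { print $2 }' "${1:-/proc/meminfo}"
}

# Used space in KiB of the filesystem containing path $1, as reported by df.
used_kib() {
	df -Pk -- "$1" | awk 'NR == 2 { print $3 }'
}

# True once used space ($1, KiB) is strictly greater than RAM size ($2, KiB).
exceeds_ram() {
	[ "$1" -gt "$2" ]
}

# Block until the filesystem holding directory $1 (default: current
# directory) uses more space than the host has RAM.
wait_until_data_exceeds_ram() {
	local dir=${1:-.}
	until exceeds_ram "$(used_kib "$dir")" "$(ram_kib)"; do
		sleep 10
	done
}
```

Putting `wait_until_data_exceeds_ram .` at the top of script #3 would hold back the first find/rm pass until scripts #1 and #2 have written enough data.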

I used default mkfs and mount options this time.  For kvm tests I used
a freshly debootstrapped Debian Jessie, and for the bare hardware tests
I used some random Debian Wheezy systems.

My kernel config file, logs, and repro scripts are available at:

	http://furryterror.org/~zblaxell/tmp/.ma12/

This is part of the kernel log after a typical failure (the whole thing
is available at the URL above):

	May 13 04:59:34 testhost kernel: [  720.290141] rm              D ffff8800ab8ebc78     0  4994  23006 0x00000000
	May 13 04:59:34 testhost kernel: [  720.329903]  ffff8800ab8ebc78 ffffffff814291b8 00000000ffffffff ffff8800aa831000
	May 13 04:59:34 testhost kernel: [  720.330512]  ffff8800ac4ed000 00000000000b0000 ffff8800ab8ec000 ffff8800acd670f0
	May 13 04:59:34 testhost kernel: [  720.331090]  ffff8800acd670d0 00000000000b0000 ffff8800aa62dae0 ffff8800ab8ebc98
	May 13 04:59:34 testhost kernel: [  720.400161] Call Trace:
	May 13 04:59:34 testhost kernel: [  720.401020]  [<ffffffff814291b8>] ? lock_extent_bits+0x1a8/0x200
	May 13 04:59:34 testhost kernel: [  720.462318]  [<ffffffff819a8297>] schedule+0x37/0x90
	May 13 04:59:34 testhost kernel: [  720.467741]  [<ffffffff814291bd>] lock_extent_bits+0x1ad/0x200
	May 13 04:59:34 testhost kernel: [  720.468489]  [<ffffffff810dfa30>] ? wait_woken+0xc0/0xc0
	May 13 04:59:34 testhost kernel: [  720.492572]  [<ffffffff814156ea>] btrfs_evict_inode+0x19a/0x760
	May 13 04:59:34 testhost kernel: [  720.493048]  [<ffffffff8127fc88>] evict+0xb8/0x1b0
	May 13 04:59:34 testhost kernel: [  720.494419]  [<ffffffff812808fe>] iput+0x2be/0x3e0
	May 13 04:59:34 testhost kernel: [  720.494598]  [<ffffffff81272cb8>] do_unlinkat+0x208/0x330
	May 13 04:59:34 testhost kernel: [  720.495086]  [<ffffffff81265cda>] ? SyS_newfstatat+0x2a/0x40
	May 13 04:59:34 testhost kernel: [  720.495511]  [<ffffffff81566925>] ? lockdep_sys_exit_thunk+0x12/0x14
	May 13 04:59:34 testhost kernel: [  720.495750]  [<ffffffff8127359b>] SyS_unlinkat+0x1b/0x40
	May 13 04:59:34 testhost kernel: [  720.496101]  [<ffffffff819af5b2>] system_call_fastpath+0x16/0x7a
	May 13 04:59:34 testhost kernel: [  720.628063] 1 lock held by rm/4994:
	May 13 04:59:34 testhost kernel: [  720.628239]  #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff812861f4>] mnt_want_write+0x24/0x50
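
When comparing traces across kernel versions it can help to reduce them to bare function names. A small sketch, assuming the `[<address>] symbol+0xoff/0xsize` frame format shown above; trace_functions is a hypothetical name, and frames marked "?" (unreliable) and lockdep "at:" lines are skipped on purpose:

```shell
#!/bin/bash
# Read kernel log lines on stdin and print the function name of each
# reliable stack frame, one per line, in the order they appear.
trace_functions() {
	sed -n \
		-e '/ at: \[</d' \
		-e 's/.*\[<[0-9a-f]*>\] \([A-Za-z_][A-Za-z0-9_.]*\)+0x[0-9a-f]*\/0x[0-9a-f]*.*/\1/p'
}
```

Fed the log excerpt above, this would list schedule, lock_extent_bits, btrfs_evict_inode, evict, iput, do_unlinkat, SyS_unlinkat and system_call_fastpath, which matches the shorter /proc/<pid>/stack output from the first message apart from the extra schedule frame.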


* Re: Recipe for creating unlink deadlocks (v2, verified on 4.1-rc3) (resend)
  2015-05-13 23:07       ` Recipe for creating unlink deadlocks (v2, verified on 4.1-rc3) (resend) Zygo Blaxell
@ 2015-05-14 12:25         ` Filipe David Manana
  2015-05-14 19:45           ` Filipe David Manana
  0 siblings, 1 reply; 6+ messages in thread
From: Filipe David Manana @ 2015-05-14 12:25 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs@vger.kernel.org


Thanks. After over 1 hour running these scripts I was able to
reproduce. I'll see if I can figure out why it happens and fix it.





* Re: Recipe for creating unlink deadlocks (v2, verified on 4.1-rc3) (resend)
  2015-05-14 12:25         ` Filipe David Manana
@ 2015-05-14 19:45           ` Filipe David Manana
  0 siblings, 0 replies; 6+ messages in thread
From: Filipe David Manana @ 2015-05-14 19:45 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs@vger.kernel.org


Zygo, can you try the following patch?

https://patchwork.kernel.org/patch/6409591/

With it applied I could no longer reproduce the hang in a ~4 hour test
(though it's possible that other problems lead to hangs too).
Thanks.




