From: Laurent Riffard <laurent.riffard@free.fr>
To: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: reiserfs-list@namesys.com, Alexander Zarochentsev <zam@namesys.com>
Subject: Re: reiser4-2.6.18-rc2-mm1: possible circular locking dependency detected in txn_end
Date: Mon, 14 Aug 2006 15:02:02 +0200 [thread overview]
Message-ID: <44E0744A.2000304@free.fr> (raw)
In-Reply-To: <200608141550.41663.vs@namesys.com>
Le 14.08.2006 13:50, Vladimir V. Saveliev a écrit :
> Hello
>
> On Saturday 12 August 2006 17:26, Laurent Riffard wrote:
>> Le 03.08.2006 17:07, Laurent Riffard a écrit :
>>> Le 03.08.2006 08:09, Alexander Zarochentsev a écrit :
>>>> On Tuesday 01 August 2006 01:29, Laurent Riffard wrote:
>>>>> Le 31.07.2006 21:55, Vladimir V. Saveliev a écrit :
>>>>>> Hello
>>>>>>
>>>>>> What kind of load did you run on reiser4 at that time?
>>>>> I just formatted a new 2GB Reiser4 FS, then I moved a whole ccache
>>>>> cache tree to this new FS (cache size was about 20~30 Mbytes).
>>>>> Something like:
>>>>>
>>>>> # mkfs.reiser4 /dev/vglinux1/ccache
>>>>> # mount -tauto -onoatime /dev/vglinux1/ccache /mnt/disk
>>>>> # mv ~laurent/.ccache/* /mnt/disk/
>>>> I was not able to reproduce it. Can you please try the following patch?
>>>>
>>>>
>>>> lock validator friendly locking of new atom in
>>>> atom_begin_and_assign_to_txnh and locking of two atoms.
>>>>
>>>> Signed-off-by: Alexander Zarochentsev <zam@namesys.com>
>>>> ---
>>>>
>>>> fs/reiser4/txnmgr.c | 14 ++++++++------
>>>> fs/reiser4/txnmgr.h | 15 +++++++++++++++
>>>> 2 files changed, 23 insertions(+), 6 deletions(-)
>> [patch snipped]
>>
>>> I tried this patch: it's slow as hell (CPU is ~100% system) and it
>>> panics when syncing...
>>>
>>> reiser4 panicked cowardly: reiser4[shutdown(1904)]: spin_lock_atom
>>> (fs/reiser4/txmgr.h:509)[]:
>> Hello,
>>
>> I tried again with linux 2.6.18-rc3-mm2+hotfixes.
>>
>> # booted to runlevel 1
>> ~$ mount
>> ...
>> /dev/mapper/vglinux1-lvhome on /home type reiserfs (rw)
>> /dev/mapper/vglinux1-lvccache on /home/laurent/.ccache type reiser4
>> (rw,nosuid,nodev,noatime) ...
>>
>> ~$ df ~/.ccache
>> Filesystem Size Used Avail Use% Mounted on
>> /dev/mapper/vglinux1-lvccache
>> 2.0G 53M 1.9G 3% /home/laurent/.ccache
>>
>> ~$ time mv ~/.ccache/* ~/tmp/ccache
>> 0.10user 6.01system 0:07.92elapsed 77%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (1major+296minor)pagefaults 0swaps
>>
>> dmesg output:
>> =======================================================
>> [ INFO: possible circular locking dependency detected ]
>> -------------------------------------------------------
>> mv/1255 is trying to acquire lock:
>> (&txnh->hlock){--..}, at: [<e101f0cf>] txn_end+0x191/0x368 [reiser4]
>>
>> but task is already holding lock:
>> (&atom->alock){--..}, at: [<e101b674>] txnh_get_atom+0xf6/0x39e
>> [reiser4]
>>
>> which lock already depends on the new lock.
>>
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #1 (&atom->alock){--..}:
>> [<c012ce82>] lock_acquire+0x60/0x80
>> [<c0291c08>] _spin_lock+0x19/0x28
>> [<e101cc0b>] try_capture+0x7cf/0x1cd7 [reiser4]
>> [<e10096e5>] longterm_lock_znode+0x427/0x84f [reiser4]
>> [<e1038fe7>] seal_validate+0x221/0x5ee [reiser4]
>> [<e10652a1>] find_entry+0x126/0x307 [reiser4]
>> [<e1065974>] rem_entry_common+0xe9/0x4ba [reiser4]
>> [<e104c9bc>] unlink_common+0x258/0x364 [reiser4]
>> [<c015f7bc>] vfs_unlink+0x47/0x87
>> [<c01611b4>] do_unlinkat+0x8c/0x122
>> [<c016125a>] sys_unlink+0x10/0x12
>> [<c0102c39>] sysenter_past_esp+0x56/0x8d
>>
>> -> #0 (&txnh->hlock){--..}:
>> [<c012ce82>] lock_acquire+0x60/0x80
>> [<c0291c08>] _spin_lock+0x19/0x28
>> [<e101f0cf>] txn_end+0x191/0x368 [reiser4]
>> [<e10109b5>] reiser4_exit_context+0x1c2/0x571 [reiser4]
>> [<e104cabd>] unlink_common+0x359/0x364 [reiser4]
>> [<c015f7bc>] vfs_unlink+0x47/0x87
>> [<c01611b4>] do_unlinkat+0x8c/0x122
>> [<c016125a>] sys_unlink+0x10/0x12
>> [<c0102c39>] sysenter_past_esp+0x56/0x8d
>>
>> other info that might help us debug this:
>>
>> 3 locks held by mv/1255:
>> #0: (&inode->i_mutex/1){--..}, at: [<c0161181>]
>> do_unlinkat+0x59/0x122 #1: (&inode->i_mutex){--..}, at: [<c0290a94>]
>> mutex_lock+0x21/0x24 #2: (&atom->alock){--..}, at: [<e101b674>]
>> txnh_get_atom+0xf6/0x39e [reiser4]
>>
>> stack backtrace:
>> [<c0103a97>] show_trace+0xd/0x10
>> [<c0103ff6>] dump_stack+0x19/0x1b
>> [<c012c20d>] print_circular_bug_tail+0x59/0x64
>> [<c012ca2c>] __lock_acquire+0x814/0x9a5
>> [<c012ce82>] lock_acquire+0x60/0x80
>> [<c0291c08>] _spin_lock+0x19/0x28
>> [<e101f0cf>] txn_end+0x191/0x368 [reiser4]
>> [<e10109b5>] reiser4_exit_context+0x1c2/0x571 [reiser4]
>> [<e104cabd>] unlink_common+0x359/0x364 [reiser4]
>> [<c015f7bc>] vfs_unlink+0x47/0x87
>> [<c01611b4>] do_unlinkat+0x8c/0x122
>> [<c016125a>] sys_unlink+0x10/0x12
>> [<c0102c39>] sysenter_past_esp+0x56/0x8d
>>
>>
>> ~$ time sync
>> 0.00user 0.02system 0:00.49elapsed 4%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (1major+202minor)pagefaults 0swaps
>>
>> Move the files backward...
>> ~$ time mv ~/tmp/ccache/* ~/.ccache/
>> 0.11user 3.98system 0:04.09elapsed 100%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (0major+286minor)pagefaults 0swaps
>>
>> ~$ time sync
>> 0.00user 0.00system 0:01.86elapsed 0%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (0major+204minor)pagefaults 0swaps
>>
>> So this problem still appears in 2.6.18-rc3-mm2+hotfixes.
>>
>> I applied the patch zam sent 01 August 2006. Compile, boot to
>> runlevel 1 and test again. The warning went away but it's now really
>> really slow:
>>
>> ~$ time mv ~/.ccache/* ~/tmp/ccache
>> 0.09user 15.82system 0:16.08elapsed 98%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (0major+299minor)pagefaults 0swaps
>>
>> ouch! 16s versus 8s.
>>
>> ~$ time sync
>> 0.00user 0.10system 0:00.58elapsed 17%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (0major+205minor)pagefaults 0swaps
>>
>> Move the files backward...
>> ~$ time mv ~/tmp/ccache/* ~/.ccache/
>> 0.11user 27.20system 0:27.33elapsed 99%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (0major+286minor)pagefaults 0swaps
>>
>> 27s versus 4s previously.
>>
>> ~$ time sync
>> 0.00user 0.02system 0:04.35elapsed 0%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (0major+204minor)pagefaults 0swaps
>>
>> once again...
>> ~$ time ~/.ccache/* ~/tmp/ccache
>> 0.09user 37.43system 0:37.71elapsed 99%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (0major+298minor)pagefaults 0swaps
>>
>> ~$ time sync
>> 0.00user 0.08system 0:00.44elapsed 19%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (0major+204minor)pagefaults 0swaps
>>
>> What's going on ? Let's have a look at vmstat output
>> ~$ vmstat 3 &
>> [1] 1394
>> ~$ procs -----------memory---------- ---swap-- -----io---- --system--
>> ----cpu---- r b swpd free buff cache si so bi bo in
>> cs us sy id wa 1 0 0 438688 9684 45612 0 0 70 90 269
>> 50 1 15 81 3 0 0 0 438688 9684 45612 0 0 0 0
>> 350 58 0 1 99 0 0 0 0 438688 9688 45612 0 0 0
>> 3 251 10 0 0 100 0 0 0 0 438704 9688 45612 0 0 0
>> 0 257 10 0 0 100 0 time mv ~/tmp/ccache/* ~/.ccache/
>>
>> 1 0 0 437944 9728 45684 0 0 0 13 270 25 0
>> 76 23 0 1 0 0 437448 9732 46384 0 0 0 4 251 8
>> 0 100 0 0 1 0 0 437448 9784 46096 0 0 0 17 252
>> 16 0 100 0 0 1 0 0 436828 9908 45984 0 0 0 43 254
>> 12 0 100 0 0 1 0 0 436580 9908 45844 0 0 0 0
>> 250 8 0 100 0 0 1 0 0 436208 10004 46252 0 0 0
>> 42 260 12 0 100 0 0 1 0 0 435836 10004 46176 0 0 0
>> 0 251 7 1 99 0 0 1 0 0 434968 10096 46868 0 0
>> 0 31 251 11 0 100 0 0 1 0 0 435092 10144 46112 0 0
>> 0 16 251 12 0 100 0 0 0.07user 29.02system 0:29.10elapsed
>> 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs
>> (0major+287minor)pagefaults 0swaps
>> ~$
>> 0 0 0 434968 10144 45960 0 0 0 0 251 11 0
>> 92 8 0 0 0 0 434968 10192 45960 0 0 0 29 255 11
>> 0 0 99 1 0 0 0 434984 10192 45960 0 0 0 0 255
>> 27 0 0 100 0 kill %1
>> ~$
>> [1]+ Complete vmstat 3
>>
>> See? 100% system CPU. Without the patch, I have about 60% user and
>> 30% system.
>>
>> This patch seems to solve the "possible circular locking dependency
>> detected in txn_end". Moreover, it seems to solve the "possible
>> circular locking dependency detected wile unmounting reiser4 FS" I
>> reported in another thread.
>>
>> But it still panics while syncing:
>> reiser4 panicked cowardly: reiser4[sync(23302)]: spin_lock_atom
>> (fs/reiser4/txnmgr.h:509)[]: Kernel panic - not syncing:
>> reiser4[sync(23302)]: spin_lock_atom (fs/reiser4/txnmgr.h:509)[]:
>>
>
> Sorry, I am confused. In the above I see that sync completed because you were
> able to time it. Please describe how to reproduce the panic.
Oups! I forgot some explanations... Sorry for that.
During the timed test, I was timing the "mv", saw the completion and
then I timed the "sync" (ie a few seconds after the end of the "mv").
I've got the panic later: I was compiling a kernel (on a reiser4 FS)
and I issued a "sync" during the compilation. Actually, the compilation
process involves 2 distinct reiser4 FS: /home/laurent/kernel/ and
/home/laurent/.ccache/.
$ mount
/dev/mapper/vglinux1-lvhome on /home type reiserfs (rw)
/dev/mapper/vglinux1-lvccache on /home/laurent/.ccache type reiser4 (rw,nosuid,nodev,noatime)
/dev/mapper/vglinux1-lvkernel on /home/laurent/kernel type reiser4 (rw,nosuid,nodev,noatime)
In the timed tests, I didn't try to sync _during_ the "mv".
--
laurent
next prev parent reply other threads:[~2006-08-14 13:02 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-07-30 18:57 reiser4-2.6.18-rc2-mm1: possible circular locking dependency detected in txn_end Laurent Riffard
2006-07-31 19:55 ` Vladimir V. Saveliev
2006-07-31 21:29 ` Laurent Riffard
2006-08-03 6:09 ` Alexander Zarochentsev
2006-08-03 15:07 ` Laurent Riffard
2006-08-12 13:26 ` Laurent Riffard
2006-08-14 11:50 ` Vladimir V. Saveliev
2006-08-14 13:02 ` Laurent Riffard [this message]
2006-08-22 13:08 ` Alexander Zarochentsev
2006-08-23 23:46 ` Laurent Riffard
2006-08-01 19:18 ` Alexander Zarochentsev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=44E0744A.2000304@free.fr \
--to=laurent.riffard@free.fr \
--cc=reiserfs-list@namesys.com \
--cc=vs@namesys.com \
--cc=zam@namesys.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.