All of lore.kernel.org
 help / color / mirror / Atom feed
From: Laurent Riffard <laurent.riffard@free.fr>
To: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: reiserfs-list@namesys.com, Alexander Zarochentsev <zam@namesys.com>
Subject: Re: reiser4-2.6.18-rc2-mm1: possible circular locking dependency detected in txn_end
Date: Mon, 14 Aug 2006 15:02:02 +0200	[thread overview]
Message-ID: <44E0744A.2000304@free.fr> (raw)
In-Reply-To: <200608141550.41663.vs@namesys.com>

Le 14.08.2006 13:50, Vladimir V. Saveliev a écrit :
> Hello
> 
> On Saturday 12 August 2006 17:26, Laurent Riffard wrote:
>> Le 03.08.2006 17:07, Laurent Riffard a écrit :
>>> Le 03.08.2006 08:09, Alexander Zarochentsev a écrit :
>>>> On Tuesday 01 August 2006 01:29, Laurent Riffard wrote:
>>>>> Le 31.07.2006 21:55, Vladimir V. Saveliev a écrit :
>>>>>> Hello
>>>>>>
>>>>>> What kind of load did you run on reiser4 at that time?
>>>>> I just formatted a new 2GB Reiser4 FS, then I moved a whole ccache
>>>>> cache tree to this new FS (cache size was about 20~30 Mbytes).
>>>>> Something like:
>>>>>
>>>>> # mkfs.reiser4 /dev/vglinux1/ccache
>>>>> # mount -tauto -onoatime /dev/vglinux1/ccache /mnt/disk
>>>>> # mv ~laurent/.ccache/* /mnt/disk/
>>>> I was not able to reproduce it.  Can you please try the following patch?
>>>>
>>>>
>>>> lock validator friendly locking of new atom in
>>>> atom_begin_and_assign_to_txnh and locking of two atoms.
>>>>
>>>> Signed-off-by: Alexander Zarochentsev <zam@namesys.com>
>>>> ---
>>>>
>>>>  fs/reiser4/txnmgr.c |   14 ++++++++------
>>>>  fs/reiser4/txnmgr.h |   15 +++++++++++++++
>>>>  2 files changed, 23 insertions(+), 6 deletions(-)
>> [patch snipped]
>>
>>> I tried this patch: it's slow as hell (CPU is ~100% system) and it
>>> panics when syncing...
>>>
>>> reiser4 panicked cowardly: reiser4[shutdown(1904)]: spin_lock_atom
>>> (fs/reiser4/txmgr.h:509)[]:
>> Hello,
>>
>> I tried again with linux 2.6.18-rc3-mm2+hotfixes.
>>
>>     # booted to runlevel 1
>>     ~$ mount
>>     ...
>>     /dev/mapper/vglinux1-lvhome on /home type reiserfs (rw)
>>     /dev/mapper/vglinux1-lvccache on /home/laurent/.ccache type reiser4
>> (rw,nosuid,nodev,noatime) ...
>>
>>     ~$ df ~/.ccache
>>     Filesystem            Size  Used Avail Use% Mounted on
>>     /dev/mapper/vglinux1-lvccache
>>                           2.0G   53M  1.9G   3% /home/laurent/.ccache
>>
>>     ~$ time mv ~/.ccache/* ~/tmp/ccache
>>     0.10user 6.01system 0:07.92elapsed 77%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (1major+296minor)pagefaults 0swaps
>>
>> dmesg output:
>>     =======================================================
>>     [ INFO: possible circular locking dependency detected ]
>>     -------------------------------------------------------
>>     mv/1255 is trying to acquire lock:
>>      (&txnh->hlock){--..}, at: [<e101f0cf>] txn_end+0x191/0x368 [reiser4]
>>
>>     but task is already holding lock:
>>      (&atom->alock){--..}, at: [<e101b674>] txnh_get_atom+0xf6/0x39e
>> [reiser4]
>>
>>     which lock already depends on the new lock.
>>
>>
>>     the existing dependency chain (in reverse order) is:
>>
>>     -> #1 (&atom->alock){--..}:
>>            [<c012ce82>] lock_acquire+0x60/0x80
>>            [<c0291c08>] _spin_lock+0x19/0x28
>>            [<e101cc0b>] try_capture+0x7cf/0x1cd7 [reiser4]
>>            [<e10096e5>] longterm_lock_znode+0x427/0x84f [reiser4]
>>            [<e1038fe7>] seal_validate+0x221/0x5ee [reiser4]
>>            [<e10652a1>] find_entry+0x126/0x307 [reiser4]
>>            [<e1065974>] rem_entry_common+0xe9/0x4ba [reiser4]
>>            [<e104c9bc>] unlink_common+0x258/0x364 [reiser4]
>>            [<c015f7bc>] vfs_unlink+0x47/0x87
>>            [<c01611b4>] do_unlinkat+0x8c/0x122
>>            [<c016125a>] sys_unlink+0x10/0x12
>>            [<c0102c39>] sysenter_past_esp+0x56/0x8d
>>
>>     -> #0 (&txnh->hlock){--..}:
>>            [<c012ce82>] lock_acquire+0x60/0x80
>>            [<c0291c08>] _spin_lock+0x19/0x28
>>            [<e101f0cf>] txn_end+0x191/0x368 [reiser4]
>>            [<e10109b5>] reiser4_exit_context+0x1c2/0x571 [reiser4]
>>            [<e104cabd>] unlink_common+0x359/0x364 [reiser4]
>>            [<c015f7bc>] vfs_unlink+0x47/0x87
>>            [<c01611b4>] do_unlinkat+0x8c/0x122
>>            [<c016125a>] sys_unlink+0x10/0x12
>>            [<c0102c39>] sysenter_past_esp+0x56/0x8d
>>
>>     other info that might help us debug this:
>>
>>     3 locks held by mv/1255:
>>      #0:  (&inode->i_mutex/1){--..}, at: [<c0161181>]
>> do_unlinkat+0x59/0x122 #1:  (&inode->i_mutex){--..}, at: [<c0290a94>]
>> mutex_lock+0x21/0x24 #2:  (&atom->alock){--..}, at: [<e101b674>]
>> txnh_get_atom+0xf6/0x39e [reiser4]
>>
>>     stack backtrace:
>>      [<c0103a97>] show_trace+0xd/0x10
>>      [<c0103ff6>] dump_stack+0x19/0x1b
>>      [<c012c20d>] print_circular_bug_tail+0x59/0x64
>>      [<c012ca2c>] __lock_acquire+0x814/0x9a5
>>      [<c012ce82>] lock_acquire+0x60/0x80
>>      [<c0291c08>] _spin_lock+0x19/0x28
>>      [<e101f0cf>] txn_end+0x191/0x368 [reiser4]
>>      [<e10109b5>] reiser4_exit_context+0x1c2/0x571 [reiser4]
>>      [<e104cabd>] unlink_common+0x359/0x364 [reiser4]
>>      [<c015f7bc>] vfs_unlink+0x47/0x87
>>      [<c01611b4>] do_unlinkat+0x8c/0x122
>>      [<c016125a>] sys_unlink+0x10/0x12
>>      [<c0102c39>] sysenter_past_esp+0x56/0x8d
>>
>>
>>     ~$ time sync
>>     0.00user 0.02system 0:00.49elapsed 4%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (1major+202minor)pagefaults 0swaps
>>
>> Move the files backward...
>>     ~$ time mv ~/tmp/ccache/* ~/.ccache/
>>     0.11user 3.98system 0:04.09elapsed 100%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (0major+286minor)pagefaults 0swaps
>>
>>     ~$ time sync
>>     0.00user 0.00system 0:01.86elapsed 0%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (0major+204minor)pagefaults 0swaps
>>
>> So this problem still appears in 2.6.18-rc3-mm2+hotfixes.
>>
>> I applied the patch zam sent 01 August 2006. Compile, boot to
>> runlevel 1 and test again. The warning went away but it's now really
>> really slow:
>>
>>     ~$ time mv ~/.ccache/* ~/tmp/ccache
>>     0.09user 15.82system 0:16.08elapsed 98%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (0major+299minor)pagefaults 0swaps
>>
>> ouch! 16s versus 8s.
>>
>>     ~$ time sync
>>     0.00user 0.10system 0:00.58elapsed 17%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (0major+205minor)pagefaults 0swaps
>>
>> Move the files backward...
>>     ~$ time mv ~/tmp/ccache/* ~/.ccache/
>>     0.11user 27.20system 0:27.33elapsed 99%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (0major+286minor)pagefaults 0swaps
>>
>> 27s versus 4s previously.
>>
>>     ~$ time sync
>>     0.00user 0.02system 0:04.35elapsed 0%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (0major+204minor)pagefaults 0swaps
>>
>> once again...
>>     ~$ time ~/.ccache/* ~/tmp/ccache
>>     0.09user 37.43system 0:37.71elapsed 99%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (0major+298minor)pagefaults 0swaps
>>
>>     ~$ time sync
>>     0.00user 0.08system 0:00.44elapsed 19%CPU (0avgtext+0avgdata
>> 0maxresident)k 0inputs+0outputs (0major+204minor)pagefaults 0swaps
>>
>> What's going on ? Let's have a look at vmstat output
>>     ~$ vmstat 3 &
>>     [1] 1394
>>     ~$ procs -----------memory---------- ---swap-- -----io---- --system--
>> ----cpu---- r  b   swpd   free   buff  cache   si   so    bi    bo   in   
>> cs us sy id wa 1  0      0 438688   9684  45612    0    0    70    90  269 
>>   50  1 15 81  3 0  0      0 438688   9684  45612    0    0     0     0 
>> 350    58  0  1 99  0 0  0      0 438688   9688  45612    0    0     0    
>> 3  251    10  0  0 100  0 0  0      0 438704   9688  45612    0    0     0 
>>    0  257    10  0  0 100  0 time mv ~/tmp/ccache/* ~/.ccache/
>>
>>      1  0      0 437944   9728  45684    0    0     0    13  270    25  0
>> 76 23  0 1  0      0 437448   9732  46384    0    0     0     4  251     8 
>> 0 100  0  0 1  0      0 437448   9784  46096    0    0     0    17  252   
>> 16  0 100  0  0 1  0      0 436828   9908  45984    0    0     0    43  254
>>    12  0 100  0  0 1  0      0 436580   9908  45844    0    0     0     0 
>> 250     8  0 100  0  0 1  0      0 436208  10004  46252    0    0     0   
>> 42  260    12  0 100  0  0 1  0      0 435836  10004  46176    0    0     0
>>     0  251     7  1 99  0  0 1  0      0 434968  10096  46868    0    0    
>> 0    31  251    11  0 100  0  0 1  0      0 435092  10144  46112    0    0 
>>    0    16  251    12  0 100  0  0 0.07user 29.02system 0:29.10elapsed
>> 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs
>> (0major+287minor)pagefaults 0swaps
>>     ~$
>>      0  0      0 434968  10144  45960    0    0     0     0  251    11  0
>> 92  8  0 0  0      0 434968  10192  45960    0    0     0    29  255    11 
>> 0  0 99  1 0  0      0 434984  10192  45960    0    0     0     0  255   
>> 27  0  0 100  0 kill %1
>>     ~$
>>     [1]+  Complete              vmstat 3
>>
>> See? 100% system CPU. Without the patch, I have about 60% user and
>> 30% system.
>>
>> This patch seems to solve the "possible circular locking dependency
>> detected in txn_end". Moreover, it seems to solve the "possible
>> circular locking dependency detected wile unmounting reiser4 FS" I
>> reported in another thread.
>>
>> But it still panics while syncing:
>>     reiser4 panicked cowardly: reiser4[sync(23302)]: spin_lock_atom
>> (fs/reiser4/txnmgr.h:509)[]: Kernel panic - not syncing:
>> reiser4[sync(23302)]: spin_lock_atom (fs/reiser4/txnmgr.h:509)[]:
>>
> 
> Sorry, I am confused. In the above I see that sync completed because you were 
> able to time it. Please describe how to reproduce the panic.


Oups! I forgot some explanations... Sorry for that.

During the timed test, I was timing the "mv", saw the completion and 
then I timed the "sync" (ie a few seconds after the end of the "mv"). 

I've got the panic later: I was compiling a kernel (on a reiser4 FS) 
and I issued a "sync" during the compilation. Actually, the compilation 
process involves 2 distinct reiser4 FS: /home/laurent/kernel/ and 
/home/laurent/.ccache/. 

  $ mount
  /dev/mapper/vglinux1-lvhome on /home type reiserfs (rw)
  /dev/mapper/vglinux1-lvccache on /home/laurent/.ccache type reiser4 (rw,nosuid,nodev,noatime)
  /dev/mapper/vglinux1-lvkernel on /home/laurent/kernel type reiser4 (rw,nosuid,nodev,noatime)
 
In the timed tests, I didn't try to sync _during_ the "mv".

-- 
laurent

  reply	other threads:[~2006-08-14 13:02 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-07-30 18:57 reiser4-2.6.18-rc2-mm1: possible circular locking dependency detected in txn_end Laurent Riffard
2006-07-31 19:55 ` Vladimir V. Saveliev
2006-07-31 21:29   ` Laurent Riffard
2006-08-03  6:09     ` Alexander Zarochentsev
2006-08-03 15:07       ` Laurent Riffard
2006-08-12 13:26         ` Laurent Riffard
2006-08-14 11:50           ` Vladimir V. Saveliev
2006-08-14 13:02             ` Laurent Riffard [this message]
2006-08-22 13:08           ` Alexander Zarochentsev
2006-08-23 23:46             ` Laurent Riffard
2006-08-01 19:18 ` Alexander Zarochentsev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=44E0744A.2000304@free.fr \
    --to=laurent.riffard@free.fr \
    --cc=reiserfs-list@namesys.com \
    --cc=vs@namesys.com \
    --cc=zam@namesys.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.