Re: reiser4-2.6.18-rc2-mm1: possible circular locking dependency detected in txn_end

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Laurent Riffard <laurent.riffard@free.fr>
To: Laurent Riffard <laurent.riffard@free.fr>
Cc: Alexander Zarochentsev <zam@namesys.com>,
	reiserfs-list@namesys.com,
	"Vladimir V. Saveliev" <vs@namesys.com>
Subject: Re: reiser4-2.6.18-rc2-mm1: possible circular locking dependency detected in txn_end
Date: Sat, 12 Aug 2006 15:26:31 +0200	[thread overview]
Message-ID: <44DDD707.1000506@free.fr> (raw)
In-Reply-To: <44D2112A.7080705@free.fr>


Le 03.08.2006 17:07, Laurent Riffard a écrit :
> 
> Le 03.08.2006 08:09, Alexander Zarochentsev a écrit :
>> On Tuesday 01 August 2006 01:29, Laurent Riffard wrote:
>>> Le 31.07.2006 21:55, Vladimir V. Saveliev a écrit :
>>>> Hello
>>>>
>>>> What kind of load did you run on reiser4 at that time?
>>> I just formatted a new 2GB Reiser4 FS, then I moved a whole ccache
>>> cache tree to this new FS (cache size was about 20~30 Mbytes).
>>> Something like:
>>>
>>> # mkfs.reiser4 /dev/vglinux1/ccache
>>> # mount -tauto -onoatime /dev/vglinux1/ccache /mnt/disk
>>> # mv ~laurent/.ccache/* /mnt/disk/
>> I was not able to reproduce it.  Can you please try the following patch?
>>
>>
>> lock validator friendly locking of new atom in 
>> atom_begin_and_assign_to_txnh and locking of two atoms.
>>
>> Signed-off-by: Alexander Zarochentsev <zam@namesys.com>
>> ---
>>
>>  fs/reiser4/txnmgr.c |   14 ++++++++------
>>  fs/reiser4/txnmgr.h |   15 +++++++++++++++
>>  2 files changed, 23 insertions(+), 6 deletions(-)
>>
[patch snipped]
> 
> I tried this patch: it's slow as hell (CPU is ~100% system) and it
> panics when syncing...
> 
> reiser4 panicked cowardly: reiser4[shutdown(1904)]: spin_lock_atom
> (fs/reiser4/txmgr.h:509)[]:
> 

Hello, 

I tried again with linux 2.6.18-rc3-mm2+hotfixes.

    # booted to runlevel 1
    ~$ mount
    ...
    /dev/mapper/vglinux1-lvhome on /home type reiserfs (rw)
    /dev/mapper/vglinux1-lvccache on /home/laurent/.ccache type reiser4 (rw,nosuid,nodev,noatime)
    ...

    ~$ df ~/.ccache
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/mapper/vglinux1-lvccache
                          2.0G   53M  1.9G   3% /home/laurent/.ccache

    ~$ time mv ~/.ccache/* ~/tmp/ccache
    0.10user 6.01system 0:07.92elapsed 77%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (1major+296minor)pagefaults 0swaps

dmesg output:
    =======================================================
    [ INFO: possible circular locking dependency detected ]
    -------------------------------------------------------
    mv/1255 is trying to acquire lock:
     (&txnh->hlock){--..}, at: [<e101f0cf>] txn_end+0x191/0x368 [reiser4]

    but task is already holding lock:
     (&atom->alock){--..}, at: [<e101b674>] txnh_get_atom+0xf6/0x39e [reiser4]

    which lock already depends on the new lock.


    the existing dependency chain (in reverse order) is:

    -> #1 (&atom->alock){--..}:
           [<c012ce82>] lock_acquire+0x60/0x80
           [<c0291c08>] _spin_lock+0x19/0x28
           [<e101cc0b>] try_capture+0x7cf/0x1cd7 [reiser4]
           [<e10096e5>] longterm_lock_znode+0x427/0x84f [reiser4]
           [<e1038fe7>] seal_validate+0x221/0x5ee [reiser4]
           [<e10652a1>] find_entry+0x126/0x307 [reiser4]
           [<e1065974>] rem_entry_common+0xe9/0x4ba [reiser4]
           [<e104c9bc>] unlink_common+0x258/0x364 [reiser4]
           [<c015f7bc>] vfs_unlink+0x47/0x87
           [<c01611b4>] do_unlinkat+0x8c/0x122
           [<c016125a>] sys_unlink+0x10/0x12
           [<c0102c39>] sysenter_past_esp+0x56/0x8d

    -> #0 (&txnh->hlock){--..}:
           [<c012ce82>] lock_acquire+0x60/0x80
           [<c0291c08>] _spin_lock+0x19/0x28
           [<e101f0cf>] txn_end+0x191/0x368 [reiser4]
           [<e10109b5>] reiser4_exit_context+0x1c2/0x571 [reiser4]
           [<e104cabd>] unlink_common+0x359/0x364 [reiser4]
           [<c015f7bc>] vfs_unlink+0x47/0x87
           [<c01611b4>] do_unlinkat+0x8c/0x122
           [<c016125a>] sys_unlink+0x10/0x12
           [<c0102c39>] sysenter_past_esp+0x56/0x8d

    other info that might help us debug this:

    3 locks held by mv/1255:
     #0:  (&inode->i_mutex/1){--..}, at: [<c0161181>] do_unlinkat+0x59/0x122
     #1:  (&inode->i_mutex){--..}, at: [<c0290a94>] mutex_lock+0x21/0x24
     #2:  (&atom->alock){--..}, at: [<e101b674>] txnh_get_atom+0xf6/0x39e [reiser4]

    stack backtrace:
     [<c0103a97>] show_trace+0xd/0x10
     [<c0103ff6>] dump_stack+0x19/0x1b
     [<c012c20d>] print_circular_bug_tail+0x59/0x64
     [<c012ca2c>] __lock_acquire+0x814/0x9a5
     [<c012ce82>] lock_acquire+0x60/0x80
     [<c0291c08>] _spin_lock+0x19/0x28
     [<e101f0cf>] txn_end+0x191/0x368 [reiser4]
     [<e10109b5>] reiser4_exit_context+0x1c2/0x571 [reiser4]
     [<e104cabd>] unlink_common+0x359/0x364 [reiser4]
     [<c015f7bc>] vfs_unlink+0x47/0x87
     [<c01611b4>] do_unlinkat+0x8c/0x122
     [<c016125a>] sys_unlink+0x10/0x12
     [<c0102c39>] sysenter_past_esp+0x56/0x8d


    ~$ time sync
    0.00user 0.02system 0:00.49elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (1major+202minor)pagefaults 0swaps

Move the files backward...
    ~$ time mv ~/tmp/ccache/* ~/.ccache/
    0.11user 3.98system 0:04.09elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+286minor)pagefaults 0swaps

    ~$ time sync
    0.00user 0.00system 0:01.86elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+204minor)pagefaults 0swaps

So this problem still appears in 2.6.18-rc3-mm2+hotfixes. 

I applied the patch zam sent 01 August 2006. Compile, boot to
runlevel 1 and test again. The warning went away but it's now really
really slow:

    ~$ time mv ~/.ccache/* ~/tmp/ccache
    0.09user 15.82system 0:16.08elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+299minor)pagefaults 0swaps

ouch! 16s versus 8s.

    ~$ time sync
    0.00user 0.10system 0:00.58elapsed 17%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+205minor)pagefaults 0swaps

Move the files backward...
    ~$ time mv ~/tmp/ccache/* ~/.ccache/
    0.11user 27.20system 0:27.33elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+286minor)pagefaults 0swaps

27s versus 4s previously.

    ~$ time sync
    0.00user 0.02system 0:04.35elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+204minor)pagefaults 0swaps

once again...
    ~$ time ~/.ccache/* ~/tmp/ccache
    0.09user 37.43system 0:37.71elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+298minor)pagefaults 0swaps

    ~$ time sync
    0.00user 0.08system 0:00.44elapsed 19%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+204minor)pagefaults 0swaps

What's going on ? Let's have a look at vmstat output
    ~$ vmstat 3 &
    [1] 1394
    ~$ procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
     1  0      0 438688   9684  45612    0    0    70    90  269    50  1 15 81  3
     0  0      0 438688   9684  45612    0    0     0     0  350    58  0  1 99  0
     0  0      0 438688   9688  45612    0    0     0     3  251    10  0  0 100  0
     0  0      0 438704   9688  45612    0    0     0     0  257    10  0  0 100  0
    time mv ~/tmp/ccache/* ~/.ccache/

     1  0      0 437944   9728  45684    0    0     0    13  270    25  0 76 23  0
     1  0      0 437448   9732  46384    0    0     0     4  251     8  0 100  0  0
     1  0      0 437448   9784  46096    0    0     0    17  252    16  0 100  0  0
     1  0      0 436828   9908  45984    0    0     0    43  254    12  0 100  0  0
     1  0      0 436580   9908  45844    0    0     0     0  250     8  0 100  0  0
     1  0      0 436208  10004  46252    0    0     0    42  260    12  0 100  0  0
     1  0      0 435836  10004  46176    0    0     0     0  251     7  1 99  0  0
     1  0      0 434968  10096  46868    0    0     0    31  251    11  0 100  0  0
     1  0      0 435092  10144  46112    0    0     0    16  251    12  0 100  0  0
    0.07user 29.02system 0:29.10elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+287minor)pagefaults 0swaps
    ~$ 
     0  0      0 434968  10144  45960    0    0     0     0  251    11  0 92  8  0
     0  0      0 434968  10192  45960    0    0     0    29  255    11  0  0 99  1
     0  0      0 434984  10192  45960    0    0     0     0  255    27  0  0 100  0
    kill %1
    ~$ 
    [1]+  Complete              vmstat 3

See? 100% system CPU. Without the patch, I have about 60% user and
30% system. 

This patch seems to solve the "possible circular locking dependency
detected in txn_end". Moreover, it seems to solve the "possible
circular locking dependency detected wile unmounting reiser4 FS" I
reported in another thread. 

But it still panics while syncing:
    reiser4 panicked cowardly: reiser4[sync(23302)]: spin_lock_atom (fs/reiser4/txnmgr.h:509)[]:
    Kernel panic - not syncing: reiser4[sync(23302)]: spin_lock_atom (fs/reiser4/txnmgr.h:509)[]:

~~
laurent

next prev parent reply	other threads:[~2006-08-12 13:26 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-07-30 18:57 reiser4-2.6.18-rc2-mm1: possible circular locking dependency detected in txn_end Laurent Riffard
2006-07-31 19:55 ` Vladimir V. Saveliev
2006-07-31 21:29   ` Laurent Riffard
2006-08-03  6:09     ` Alexander Zarochentsev
2006-08-03 15:07       ` Laurent Riffard
2006-08-12 13:26         ` Laurent Riffard [this message]
2006-08-14 11:50           ` Vladimir V. Saveliev
2006-08-14 13:02             ` Laurent Riffard
2006-08-22 13:08           ` Alexander Zarochentsev
2006-08-23 23:46             ` Laurent Riffard
2006-08-01 19:18 ` Alexander Zarochentsev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=44DDD707.1000506@free.fr \
    --to=laurent.riffard@free.fr \
    --cc=reiserfs-list@namesys.com \
    --cc=vs@namesys.com \
    --cc=zam@namesys.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.