From: "Rudá Moura" <ruda.moura@terra.com.br>
To: linux-kernel <linux-kernel@vger.kernel.org>
Cc: Alexander Viro <viro@parcelfarce.linux.theplanet.co.uk>
Subject: Processes stuck in D trying to acquire lock at vfs_rename_dir
Date: Tue, 19 Sep 2006 15:09:59 -0300 [thread overview]
Message-ID: <1158689399.32184.17.camel@localhost> (raw)
We are huge mail provider with a pool of many mx, pop and imap machines
and we are facing a rather strange situation when we recently
upgraded one of our applications, and after a week running, several
processes hang in "D" (disk wait) state forever. That way, we cannot
strace, gdb, pstack them to know what they were doing or where they
were.
On the top of those machines are running RHEL version 4 with
kernel 2.6.9-42.ELsmp. Some are running (stock) kernel 2.6.17.11
with KDB patch applied. The hardware is described as follow:
- Dell PowerEdge 6850, QUAD 3,2GHZ HT , 8GB RAM,
- 5 SCSI disks 146 GB 15k RPM.
We use NFS to keep mailboxes in maildir format. They are shared to many
machines and are provided by a NFS server storage.
In order to gain more understanding of the problem
we decide to enable mutex/futex debug (CONFIG_DEBUG_MUTEXES=y),
because the bug is hard to reproduce and happens in an occasional manner
for a week or more.
The process locked was "ttrlmtp_tcp", they stuck in D while are trying
to rename() as the log message states:
lmtplog.6.gz:Sep 18 19:00:02 mangoro trrlmtpd_tcp[12499]:
1158616802.840213: TrrMailMdirMoveMBToIdPermMB: trying to rename
[/nfs/mail3d03/m/a/r/i/a/4/8/2/6/maria4826/Maildir] to
[/nfs/mail3d03/m/a/r/i/a/4/8/2/6/18538100#perm!terra.maria4826] ...
and this is what Mutex/Futex Debug gave to us:
Sep 18 19:00:02 mangoro kernel:
==========================================
Sep 18 19:00:02 mangoro kernel: [ BUG: lock recursion deadlock detected!
|
Sep 18 19:00:02 mangoro kernel:
------------------------------------------
Sep 18 19:00:15 mangoro kernel:
Sep 18 19:00:15 mangoro kernel: trrlmtpd_tcp/12499 is trying to acquire
this lock:
Sep 18 19:00:15 mangoro kernel: [ed260830] {inode_init_once}
Sep 18 19:00:15 mangoro kernel: .. held by: trrlmtpd_tcp:12499
[d94cc560, 115]
Sep 18 19:00:15 mangoro kernel: ... acquired at:
lock_rename+0x36/0x94
Sep 18 19:00:15 mangoro kernel: ... trying at:
vfs_rename_dir+0x84/0x100
Sep 18 19:00:15 mangoro kernel: ------------------------------
Sep 18 19:00:15 mangoro kernel: | showing all locks held by: |
(trrlmtpd_tcp/12499 [d94cc560, 115]):
Sep 18 19:00:15 mangoro kernel: ------------------------------
Sep 18 19:00:15 mangoro kernel:
Sep 18 19:00:15 mangoro kernel: #001: [f6e839f4]
{alloc_super}
Sep 18 19:00:15 mangoro kernel: ... acquired at:
lock_rename+0x4f/0x94
Sep 18 19:00:15 mangoro kernel:
Sep 18 19:00:15 mangoro kernel: #002: [ed260b50]
{inode_init_once}
Sep 18 19:00:15 mangoro kernel: ... acquired at:
lock_rename+0x2b/0x94
Sep 18 19:00:15 mangoro kernel:
Sep 18 19:00:15 mangoro kernel: #003: [ed260830]
{inode_init_once}
Sep 18 19:00:15 mangoro kernel: ... acquired at:
lock_rename+0x36/0x94
Sep 18 19:00:15 mangoro kernel:
Sep 18 19:00:15 mangoro kernel:
Sep 18 19:00:15 mangoro kernel: trrlmtpd_tcp/12499's [current]
stackdump:
Sep 18 19:00:15 mangoro kernel:
Sep 18 19:00:15 mangoro kernel: <c0104205> show_trace+0xd/0xf
<c01042ce> dump_stack+0x15/0x17
Sep 18 19:00:15 mangoro kernel: <c0131205> report_deadlock+0xea/0x102
<c0131364> check_deadlock+0x147/0x151
Sep 18 19:00:15 mangoro kernel: <c0131908> debug_mutex_add_waiter
+0x80/0x93 <c02fcf02> __mutex_lock_slowpath+0x117/0x37a
Sep 18 19:00:15 mangoro kernel: <c02fcddb> mutex_lock+0x24/0x27
<c0168888> vfs_rename_dir+0x84/0x100
Sep 18 19:00:15 mangoro kernel: <c0168b2a> vfs_rename+0x135/0x273
<c0168d9f> do_rename+0x137/0x176
Sep 18 19:00:15 mangoro kernel: <c0168e17> sys_renameat+0x39/0x54
<c0168e44> sys_rename+0x12/0x14
Sep 18 19:00:15 mangoro kernel: <c01032ef> sysenter_past_esp+0x54/0x75
Unfortunately we couldn't use KDB to debug this process because we had
to reboot the machine.
We are waiting for another process lock.
Any help in the subject will be very appreciate and we are ready to
provide more information about this problem if required.
Thanks in Advance.
--
Rudá Moura <ruda.moura@terra.com.br>
next reply other threads:[~2006-09-19 18:10 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-09-19 18:09 Rudá Moura [this message]
[not found] <001301c6dcc3$a3c94d50$2201a8c0@soto>
2006-09-20 15:11 ` Processes stuck in D trying to acquire lock at vfs_rename_dir Fernando Soto
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1158689399.32184.17.camel@localhost \
--to=ruda.moura@terra.com.br \
--cc=linux-kernel@vger.kernel.org \
--cc=viro@parcelfarce.linux.theplanet.co.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox