From: Andreas Dilger <adilger@clusterfs.com>
To: John Marconi <jamarconi@sbcglobal.net>
Cc: ext3-users@redhat.com, linux-ext4@vger.kernel.org
Subject: Re: kjournald hang on ext3 to ext3 copy
Date: Mon, 18 Jun 2007 23:14:02 -0600
Message-ID: <20070619051402.GO5181@schatzie.adilger.int>
In-Reply-To: <4677531E.1030108@sbcglobal.net>

On Jun 18, 2007 22:53 -0500, John Marconi wrote:
> Andreas Dilger wrote:
> >A few tips for debugging this kind of issue:
> >- you need to have detailed stack traces (e.g. sysrq-t) of all the
> > interesting processes
> >
> >- if a process is stuck inside a large function (e.g. 8379 in example)
> > you need to provide the exact line number.  This can be found by
> > compiling the kernel with CONFIG_DEBUG_INFO (-g flag to gcc) and then
> > running "gdb vmlinux" and "list *(journal_commit_transaction+{offset})",
> > where the byte offset is printed in the sysrq-t output, and then
> > including the code surrounding that line from the source file
> >
> >- a process stuck in "start_this_handle()" is often just an innocent
> > bystander.  It is waiting for the currently committing transaction to
> > complete before it can start a new filesystem-modifying operation
> > (handle).  That said, the journal handle acts like a lock and has been
> > the cause of many deadlock problems (e.g. process 1 holds a lock and
> > waits for a handle; process 2 holds the transaction open while waiting
> > for that lock).  pdflush might be one of the "process 1" kind of tasks,
> > and some other process is holding the transaction open, preventing it
> > from completing.
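The circular wait in the last tip can be sketched as a wait-for graph in which every blocked task points at the one task it is stuck behind (the holder of the lock, the open handle, or the commit). This is a minimal illustrative model, not jbd code: the `struct task` type, the `cycle_length()` helper, and the task names used below ("writer", "cp") are hypothetical, and encoding the `t_updates` wait as "kjournald waits on the handle holder" is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model only (not jbd code): each task is blocked behind
 * at most one other task, i.e. the holder of the lock, handle, or
 * commit it is waiting for. */
struct task {
    const char *name;
    int waits_on;   /* index of the task it is blocked behind, or -1 if runnable */
};

/*
 * Follow the waits_on chain from 'start'.  Returns the length of the
 * circular wait reached from 'start', or 0 if the chain ends at a task
 * that is not blocked (no deadlock along this chain).
 */
static int cycle_length(const struct task *tasks, size_t n, int start)
{
    assert(start >= 0 && (size_t)start < n);
    int cur = start;

    /* n steps without reaching -1 means the walk must be on a cycle */
    for (size_t i = 0; i < n; i++) {
        if (cur < 0)
            return 0;
        cur = tasks[cur].waits_on;
    }
    if (cur < 0)
        return 0;

    /* cur is on the cycle: count its members */
    int len = 1;
    for (int t = tasks[cur].waits_on; t != cur; t = tasks[t].waits_on)
        len++;
    return len;
}
```

For the scenario in the tip, a table { {"kjournald", 2}, {"pdflush", 0}, {"writer", 1}, {"cp", 0} } gives `cycle_length(tasks, 4, 3) == 3`: "cp" is the innocent bystander in start_this_handle(), and its chain leads into the kjournald -> writer -> pdflush -> kjournald loop. If "writer" were not blocked (`waits_on = -1`), the same call would return 0.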
>
> I am not able to update the entire kernel to a new version for a variety
> of reasons, however I can update certain parts in my system (such as the
> filesystem). I did a diff of the 2.6.16 kernel against my kernel, and
> the changes to jbd were minimal. I plan on looking at the latest
> versions of the kernel to determine if anything has changed since 2.6.16.

The problem may also be in the ext3 layer and not jbd.

> I took a look at the place that kjournald was stuck - it is in the
> journal_commit_transaction "while (commit_transaction->t_updates)" loop
> and it is trying to "spin_lock(&journal->j_state_lock)". When I look at
> pdflush, it is also trying to take the journal->j_state_lock. Do you
> have any tips on finding out which process might own journal->j_state_lock?
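The loop described here can be modeled in user space to show its locking shape. The sketch below is a simplification under stated assumptions: all names are hypothetical, a mutex stands in for journal->j_state_lock, a condition variable for journal->j_wait_updates, and a `committing` flag for the transaction being locked for commit. In jbd of this era the kernel loop roughly drops j_state_lock around schedule() and retakes it on wakeup (worth checking against the jbd/commit.c in the kernel actually being debugged), so a kjournald spinning on that lock inside the loop has been woken and is contending with whichever task currently holds it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Simplified user-space model (hypothetical names, not jbd code):
 *   state_lock   stands in for journal->j_state_lock
 *   wait_updates stands in for journal->j_wait_updates
 *   t_updates    stands in for commit_transaction->t_updates
 *   committing   stands in for the transaction being locked for commit
 */
struct journal_model {
    pthread_mutex_t state_lock;
    pthread_cond_t wait_updates;
    int t_updates;
    int committing;
};

#define JOURNAL_MODEL_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

/* Like start_this_handle(): new handles wait while a commit is in progress. */
static void model_start_handle(struct journal_model *j)
{
    pthread_mutex_lock(&j->state_lock);
    while (j->committing)
        pthread_cond_wait(&j->wait_updates, &j->state_lock);
    j->t_updates++;
    pthread_mutex_unlock(&j->state_lock);
}

/* Closing the last open handle wakes the committing thread. */
static void model_stop_handle(struct journal_model *j)
{
    pthread_mutex_lock(&j->state_lock);
    assert(j->t_updates > 0);
    if (--j->t_updates == 0)
        pthread_cond_broadcast(&j->wait_updates);
    pthread_mutex_unlock(&j->state_lock);
}

/* kjournald's side: wait until every open handle has been closed. */
static void *model_commit(void *arg)
{
    struct journal_model *j = arg;

    pthread_mutex_lock(&j->state_lock);
    j->committing = 1;
    /* pthread_cond_wait() releases state_lock while asleep and retakes it
     * on wakeup, the same shape as the kernel's unlock/schedule()/lock. */
    while (j->t_updates)
        pthread_cond_wait(&j->wait_updates, &j->state_lock);
    j->committing = 0;
    pthread_cond_broadcast(&j->wait_updates);   /* let blocked starters in */
    pthread_mutex_unlock(&j->state_lock);
    return NULL;
}
```

A handle that is never stopped, because its owner is blocked on some other lock, leaves model_commit() asleep forever and makes every later model_start_handle() caller wait too; that is the "transaction held open" case from the earlier tip.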

You can enable CONFIG_DEBUG_SPINLOCK in newer kernels, and it appears the
spinlock code will then set the "owner" field to the holding task's struct.
You still need to get access to this via e.g. "crash" or lkcd or something.
Hmm, it seems this is only set for ppc and s390???  That is how I would
debug this in any case.  The other way (I've done this too many times
in the past) is to look through all of the stack traces and figure out
which ones are in a filesystem context, then check whether any of them are
blocked on locks while holding transactions open.  This needs a detailed
understanding of kernel call paths.
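The owner-tracking idea behind CONFIG_DEBUG_SPINLOCK can be illustrated with a small user-space analogue built on C11 atomics. This is an assumption-laden sketch, not the kernel's debug spinlock code: `struct dbg_spinlock` and the `dbg_spin_*` functions are hypothetical names, and the owner is a plain name string rather than a task_struct. The point it shows is that the lock records its holder on acquire and clears it on release, so whoever inspects a stuck lock (in the kernel case, via crash or lkcd) can read off which task holds it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * User-space analogue (an assumption, not the kernel implementation) of
 * a spinlock that remembers its holder, in the spirit of the "owner"
 * field set under CONFIG_DEBUG_SPINLOCK.
 */
struct dbg_spinlock {
    atomic_flag locked;
    _Atomic(const char *) owner;   /* holder's name, NULL when free */
};

#define DBG_SPINLOCK_INIT { ATOMIC_FLAG_INIT, NULL }

static void dbg_spin_lock(struct dbg_spinlock *l, const char *who)
{
    /* Spin until the flag was previously clear.  A task stuck here can
     * be diagnosed by reading l->owner from a debugger. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;
    /* Recorded just after acquisition, so a concurrent reader may
     * briefly see NULL on a lock that is already held. */
    atomic_store_explicit(&l->owner, who, memory_order_relaxed);
}

static void dbg_spin_unlock(struct dbg_spinlock *l)
{
    /* Debug check: releasing a lock nobody holds is a bug. */
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) != NULL);
    atomic_store_explicit(&l->owner, NULL, memory_order_relaxed);
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Who holds the lock right now (NULL if nobody). */
static const char *dbg_spin_owner(struct dbg_spinlock *l)
{
    return atomic_load_explicit(&l->owner, memory_order_relaxed);
}
```

Storing a pointer to the holder keeps the lock and unlock paths cheap; the cost is one extra store per acquisition, which is why this kind of tracking is usually reserved for debug builds.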
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.