From: Frederic Weisbecker <fweisbec@gmail.com>
To: Bastien ROUCARIES <roucaries.bastien@gmail.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Reiserfs deadlock in 2.6.36
Date: Thu, 23 Dec 2010 04:42:33 +0100
Message-ID: <20101223034229.GF1739@nowhere>
In-Reply-To: <AANLkTik=+smaiGNwVw0pDBXexVd=trsz9aSawuqLZP6k@mail.gmail.com>

On Wed, Dec 22, 2010 at 07:11:43PM +0100, Bastien ROUCARIES wrote:
> On Wed, Dec 22, 2010 at 7:04 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > On Wed, Dec 22, 2010 at 06:50:48PM +0100, Bastien ROUCARIES wrote:
> >> On Thursday 16 December 2010 14:49:48, Bastien ROUCARIES wrote:
> >> > On Thursday 2 December 2010 18:43:32, you wrote:
> >> > > On Fri, Nov 26, 2010 at 05:57:05PM +0100, Bastien ROUCARIES wrote:
> >> > > > Dear Frederic,
> >> I managed to reproduce it. BTW, it is my home partition, with ACLs enabled.
> >
> > How do you know you reproduced it? You had a crash before using SysRq?
> > Or you felt a deadlock or so?
> >
> > What's interesting in that report is that there is no blocked task
> > that holds the reiserfs lock.
> >
> > So I really feel the problem is that someone opened the journal but did not
> > release it.
> 
> Could you add a virtual lock to test this hypothesis? The lock
> would be taken when the journal is opened and released when the
> journal is closed, so that lockdep can check the hypothesis.


That's a good idea. But we can get the same result more easily with traces.
Plus I would like one more level of detail about the origin of the issue.
We shouldn't skimp on dumping information, given how hard it is to
reproduce ;)

So here is a patch that inserts debug trace points where the journal is
opened and where it is closed, so that we can see whether there is any
imbalance, namely whether some path forgets to close the journal
(by calling do_journal_end()).

But the reason could be something else. For example, writers might, for
some reason, queue themselves and wait when they shouldn't.
So I've inserted two more points that will tell us why the hung tasks
have put themselves in the queue.

This all should narrow down the possible origins of the issue.
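
Once a trace dump is available, the begin/end imbalance could be checked mechanically rather than by eye. The following userspace helper is only a sketch under stated assumptions: it assumes each trace line contains the "begin %p ret = %d" or "end %p ret = %d" text emitted by the trace_printk() calls in the patch in this mail, preceded by a space, and that only a begin with ret = 0 needs a matching end. The names balance_table, balance_slot, account_line and outstanding are made up for this illustration and are not part of reiserfs or ftrace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SB 16	/* arbitrary cap on distinct superblocks tracked */

struct sb_balance {
	unsigned long long sb;	/* superblock address as printed by %p */
	long begins;		/* "begin ... ret = 0" lines seen */
	long ends;		/* "end ..." lines seen */
};

struct balance_table {
	struct sb_balance entry[MAX_SB];
	int count;
};

void balance_init(struct balance_table *t)
{
	memset(t, 0, sizeof(*t));
}

/* Find the slot for sb, optionally creating it. NULL if absent or full. */
struct sb_balance *balance_slot(struct balance_table *t,
				unsigned long long sb, int create)
{
	int i;

	for (i = 0; i < t->count; i++)
		if (t->entry[i].sb == sb)
			return &t->entry[i];
	if (!create || t->count == MAX_SB)
		return NULL;
	t->entry[t->count].sb = sb;
	t->entry[t->count].begins = 0;
	t->entry[t->count].ends = 0;
	return &t->entry[t->count++];
}

/*
 * Account one line of trace output.
 * Returns 1 if the line is a begin/end message, 0 for anything else
 * (stack dump lines, "queue log" lines, unrelated events).
 */
int account_line(struct balance_table *t, const char *line)
{
	unsigned long long sb;
	struct sb_balance *e;
	const char *p;
	int ret;

	assert(t && line);

	p = strstr(line, " begin ");
	if (p && sscanf(p, " begin %llx ret = %d", &sb, &ret) == 2) {
		e = balance_slot(t, sb, 1);
		/* Assumption: only a successful begin needs a matching end. */
		if (e && ret == 0)
			e->begins++;
		return 1;
	}

	p = strstr(line, " end ");
	if (p && sscanf(p, " end %llx ret = %d", &sb, &ret) == 2) {
		e = balance_slot(t, sb, 1);
		if (e)
			e->ends++;
		return 1;
	}
	return 0;
}

/* Successful begins not yet matched by an end for this superblock. */
long outstanding(struct balance_table *t, unsigned long long sb)
{
	struct sb_balance *e = balance_slot(t, sb, 0);

	return e ? e->begins - e->ends : 0;
}
```

To use it, call balance_init() once, feed every line of the saved dump to account_line() (for instance from an fgets() loop), and then look at outstanding() for the superblock of the hung filesystem. A positive value would hint at a path that opened the journal without closing it. Since the trace buffer can wrap and lose early events, a negative or slightly off value does not prove anything on its own.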

You will need CONFIG_TRACING. It gets enabled when you select

Kernel Hacking
	Tracers
		[*] Trace process context switches and events

or any other option inside the Tracers menu.


And when your problem triggers, type the SysRq combination that
dumps the ftrace buffers: SysRq-z

Ah, and also boot with the ftrace=nop parameter. This will give you
a large enough buffer. I guess the default size should be enough,
but we never know.

Thanks.

The patch:

diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index d31bce1..e1737c8 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -3073,6 +3073,7 @@ static int do_journal_begin_r(struct reiserfs_transaction_handle *th,
 		    (journal->j_len_alloc * 75)) {
 			if (atomic_read(&journal->j_wcount) > 10) {
 				sched_count++;
+				trace_printk("queue log 1\n");
 				queue_log_writer(sb);
 				goto relock;
 			}
@@ -3083,6 +3084,7 @@ static int do_journal_begin_r(struct reiserfs_transaction_handle *th,
 		if (atomic_read(&journal->j_jlock)) {
 			while (journal->j_trans_id == old_trans_id &&
 			       atomic_read(&journal->j_jlock)) {
+				trace_printk("queue log 2\n");
 				queue_log_writer(sb);
 			}
 			goto relock;
@@ -3116,6 +3118,8 @@ static int do_journal_begin_r(struct reiserfs_transaction_handle *th,
 	unlock_journal(sb);
 	INIT_LIST_HEAD(&th->t_list);
 	get_fs_excl();
+	trace_printk("begin %p ret = 0\n", sb);
+	trace_dump_stack();
 	return 0;
 
       out_fail:
@@ -3124,6 +3128,8 @@ static int do_journal_begin_r(struct reiserfs_transaction_handle *th,
 	 * persistent transactions there are. We need to do this so if this
 	 * call is part of a failed restart_transaction, we can free it later */
 	th->t_super = sb;
+	trace_printk("begin %p ret = %d\n", sb, retval);
+	trace_dump_stack();
 	return retval;
 }
 
@@ -4295,6 +4301,8 @@ static int do_journal_end(struct reiserfs_transaction_handle *th,
 		flush_commit_list(sb, jl, 1);
 	}
       out:
+	trace_printk("end %p ret = %d\n", sb, journal->j_errno);
+	trace_dump_stack();
 	reiserfs_check_lock_depth(sb, "journal end2");
 
 	memset(th, 0, sizeof(*th));

