linux-kernel.vger.kernel.org archive mirror
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Bastien ROUCARIES <roucaries.bastien@gmail.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Reiserfs deadlock in 2.6.36
Date: Thu, 2 Dec 2010 18:43:32 +0100
Message-ID: <20101202174328.GA1750@nowhere>
In-Reply-To: <201011261757.08303.roucaries.bastien@gmail.com>

On Fri, Nov 26, 2010 at 05:57:05PM +0100, Bastien ROUCARIES wrote:
> Dear Frederic,
> > Hi Bastien,
> > 
> > This really looks like a hung task detector report.
> > Several tasks are stuck in queue_log_writer(), waiting
> > to be woken up on the "journal->j_join_wait" event and
> > that never happens because the waker is also stuck.
> > The problem is that your report doesn't show where the waker
> > is stuck. The hung task detector does report it, just
> > before or after the chunk you've posted.
> > 
> > If you could provide me the entire report, I could fix this
> > easily.
> 
> I managed to reproduce it after six hours of stress. Unfortunately, lockdep was
> disabled due to a known non-bug in the init sequence. I used sysrq-t in
> order to get more information for you.
>
> Do I need to try to reproduce it with a newer kernel, or is this sufficient?

> Nov 26 16:27:56 portablebastien kernel: [27960.775903] kded4         D 00000001006907a6     0  2852      1 0x00000000
> Nov 26 16:27:56 portablebastien kernel: [27960.777842]  ffff8800d8a97b28 0000000000000046 ffff880000000000 ffff880100000000
> Nov 26 16:27:56 portablebastien kernel: [27960.779768]  ffff8800d8a96010 ffff8800d8a97fd8 ffff8800379f4f60 ffff8800379f5230
> Nov 26 16:27:56 portablebastien kernel: [27960.781694]  ffff8800379f5228 0000000000014d80 0000000000014d80 ffff8800d8a97fd8
> Nov 26 16:27:56 portablebastien kernel: [27960.783594] Call Trace:
> Nov 26 16:27:56 portablebastien kernel: [27960.785483]  [<ffffffffa01b8454>] queue_log_writer+0x7e/0xaf [reiserfs]
> Nov 26 16:27:56 portablebastien kernel: [27960.787344]  [<ffffffff81044423>] ? default_wake_function+0x0/0xf
> Nov 26 16:27:56 portablebastien kernel: [27960.789253]  [<ffffffffa01bc402>] do_journal_begin_r+0x1ee/0x2d8 [reiserfs]
> Nov 26 16:27:56 portablebastien kernel: [27960.791142]  [<ffffffffa01bc5ae>] journal_begin+0xc2/0x103 [reiserfs]
> Nov 26 16:27:56 portablebastien kernel: [27960.793070]  [<ffffffffa019ebb6>] reiserfs_create+0x105/0x233 [reiserfs]
> Nov 26 16:27:56 portablebastien kernel: [27960.794960]  [<ffffffff8110b57d>] ? generic_permission+0x17/0x9a
> Nov 26 16:27:56 portablebastien kernel: [27960.796854]  [<ffffffff81171e65>] ? security_inode_permission+0x1c/0x1e
> Nov 26 16:27:56 portablebastien kernel: [27960.798714]  [<ffffffff8110c423>] vfs_create+0x6b/0x8d
> Nov 26 16:27:56 portablebastien kernel: [27960.800570]  [<ffffffff8110cdee>] do_last+0x26c/0x532
> Nov 26 16:27:56 portablebastien kernel: [27960.802377]  [<ffffffff8110eb96>] do_filp_open+0x203/0x599
> Nov 26 16:27:56 portablebastien kernel: [27960.804232]  [<ffffffff8134bd2b>] ? _raw_spin_unlock+0x26/0x2a
> Nov 26 16:27:56 portablebastien kernel: [27960.806058]  [<ffffffff811184a0>] ? alloc_fd+0x170/0x182
> Nov 26 16:27:56 portablebastien kernel: [27960.807911]  [<ffffffff81101366>] do_sys_open+0x5b/0xf7
> Nov 26 16:27:56 portablebastien kernel: [27960.809790]  [<ffffffff8134b48e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> Nov 26 16:27:56 portablebastien kernel: [27960.811646]  [<ffffffff8110142b>] sys_open+0x1b/0x1d
> Nov 26 16:27:56 portablebastien kernel: [27960.813506]  [<ffffffff81009ac2>] system_call_fastpath+0x16/0x1b

OK, this time I don't think a deadlock between the reiserfs lock and
another lock is involved.

We entered queue_log_writer() and then waited for someone to call do_journal_end()
to signal that it had finished its work with the journal.

But somehow that didn't happen. Or maybe we called queue_log_writer() when we shouldn't
have, thinking there was already a writer when there wasn't. Or there is a crazy race somewhere.
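
To make the suspected handshake concrete, here is a toy user-space model of it.
This is only a sketch in Python, not reiserfs code: JournalModel, writer_active
and run() are hypothetical names, the condition variable stands in for the
"journal->j_join_wait" wait queue, and the timeout exists only so the demo
terminates (a real task would sit in D state, as in the trace above).

```python
import threading


class JournalModel:
    """Toy model of the wait/wake handshake; not the kernel implementation."""

    def __init__(self):
        self.cond = threading.Condition()  # stand-in for j_join_wait
        self.writer_active = False         # "someone is working with the journal"

    def journal_begin(self, timeout):
        """Return True if we joined the journal, False if we gave up waiting."""
        with self.cond:
            # Stand-in for queue_log_writer(): sleep until the writer is done.
            got_it = self.cond.wait_for(lambda: not self.writer_active, timeout)
            if got_it:
                self.writer_active = True
            return got_it

    def journal_end(self):
        """Stand-in for do_journal_end(): finish and wake every waiter."""
        with self.cond:
            self.writer_active = False
            self.cond.notify_all()


def run(writer_finishes):
    """Start one writer and one waiter; report whether the waiter got through."""
    journal = JournalModel()
    assert journal.journal_begin(timeout=0.1)  # first task becomes the writer
    result = []
    waiter = threading.Thread(
        target=lambda: result.append(journal.journal_begin(timeout=0.5))
    )
    waiter.start()
    if writer_finishes:
        journal.journal_end()
    waiter.join()
    return result[0]
```

With run(True) the waiter is woken and proceeds. With run(False) the wakeup
never comes and the waiter only returns because of the artificial timeout,
which is the situation the trace suggests. Setting writer_active without any
real writer behind it would model the "we thought there was a writer" case.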

On which kernel do you see this? Do you know of a kernel on which you've never seen it?
Were you running something specific to trigger this deadlock?

Thanks!
