From: Frederic Weisbecker <fweisbec@gmail.com>
To: Bastien ROUCARIES <roucaries.bastien@gmail.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Reiserfs deadlock in  2.6.36
Date: Thu, 2 Dec 2010 18:43:32 +0100
Message-ID: <20101202174328.GA1750@nowhere>
In-Reply-To: <201011261757.08303.roucaries.bastien@gmail.com>

On Fri, Nov 26, 2010 at 05:57:05PM +0100, Bastien ROUCARIES wrote:
> Dear Frederic,
> > Hi Bastien,
> > 
> > This really looks like a hung task detector report.
> > Several tasks are stuck in queue_log_writer(), waiting
> > to be woken up on the "journal->j_join_wait" event and
> > that never happens because the waker is also stuck.
> > The problem is that your report doesn't show where the waker
> > is stuck. The hung task detector does report it, just
> > before or after the chunk you've posted.
> > 
> > If you could provide me the entire report, I could fix this
> > easily.
> 
> I have managed to reproduce it after six hours of stress. Unfortunately lockdep was 
> disabled due to a known non-bug in the init sequence. I used sysrq-t in 
> order to get more information to you.
> 
> Do I need to try to reproduce it with a newer kernel? Or is this sufficient?

> Nov 26 16:27:56 portablebastien kernel: [27960.775903] kded4         D 00000001006907a6     0  2852      1 0x00000000
> Nov 26 16:27:56 portablebastien kernel: [27960.777842]  ffff8800d8a97b28 0000000000000046 ffff880000000000 ffff880100000000
> Nov 26 16:27:56 portablebastien kernel: [27960.779768]  ffff8800d8a96010 ffff8800d8a97fd8 ffff8800379f4f60 ffff8800379f5230
> Nov 26 16:27:56 portablebastien kernel: [27960.781694]  ffff8800379f5228 0000000000014d80 0000000000014d80 ffff8800d8a97fd8
> Nov 26 16:27:56 portablebastien kernel: [27960.783594] Call Trace:
> Nov 26 16:27:56 portablebastien kernel: [27960.785483]  [<ffffffffa01b8454>] queue_log_writer+0x7e/0xaf [reiserfs]
> Nov 26 16:27:56 portablebastien kernel: [27960.787344]  [<ffffffff81044423>] ? default_wake_function+0x0/0xf
> Nov 26 16:27:56 portablebastien kernel: [27960.789253]  [<ffffffffa01bc402>] do_journal_begin_r+0x1ee/0x2d8 [reiserfs]
> Nov 26 16:27:56 portablebastien kernel: [27960.791142]  [<ffffffffa01bc5ae>] journal_begin+0xc2/0x103 [reiserfs]
> Nov 26 16:27:56 portablebastien kernel: [27960.793070]  [<ffffffffa019ebb6>] reiserfs_create+0x105/0x233 [reiserfs]
> Nov 26 16:27:56 portablebastien kernel: [27960.794960]  [<ffffffff8110b57d>] ? generic_permission+0x17/0x9a
> Nov 26 16:27:56 portablebastien kernel: [27960.796854]  [<ffffffff81171e65>] ? security_inode_permission+0x1c/0x1e
> Nov 26 16:27:56 portablebastien kernel: [27960.798714]  [<ffffffff8110c423>] vfs_create+0x6b/0x8d
> Nov 26 16:27:56 portablebastien kernel: [27960.800570]  [<ffffffff8110cdee>] do_last+0x26c/0x532
> Nov 26 16:27:56 portablebastien kernel: [27960.802377]  [<ffffffff8110eb96>] do_filp_open+0x203/0x599
> Nov 26 16:27:56 portablebastien kernel: [27960.804232]  [<ffffffff8134bd2b>] ? _raw_spin_unlock+0x26/0x2a
> Nov 26 16:27:56 portablebastien kernel: [27960.806058]  [<ffffffff811184a0>] ? alloc_fd+0x170/0x182
> Nov 26 16:27:56 portablebastien kernel: [27960.807911]  [<ffffffff81101366>] do_sys_open+0x5b/0xf7
> Nov 26 16:27:56 portablebastien kernel: [27960.809790]  [<ffffffff8134b48e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> Nov 26 16:27:56 portablebastien kernel: [27960.811646]  [<ffffffff8110142b>] sys_open+0x1b/0x1d
> Nov 26 16:27:56 portablebastien kernel: [27960.813506]  [<ffffffff81009ac2>] system_call_fastpath+0x16/0x1b

OK, this time I don't have the feeling that a deadlock between the reiserfs lock and
another lock is involved.

We entered queue_log_writer() and then waited for someone to call do_journal_end()
to signal that it had finished its work with the journal.

But somehow that didn't happen. Or maybe we called queue_log_writer() when we shouldn't have,
thinking there was already a writer when there wasn't. Or there is a crazy race somewhere.
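
To make the second possibility concrete, here is a rough userspace sketch in
Python. It is only an analogy, not reiserfs code: ToyJournal, trans_id,
waiter_parked and simulate() are made-up names, and a threading.Condition stands
in for the "journal->j_join_wait" wait queue. A task that parks itself while a
writer is active gets woken when that writer ends its transaction. A task that
parks itself when there is no writer waits until its timeout expires, which in
the kernel (where there is no timeout) would look like a task stuck in D state.

```python
import threading


class ToyJournal:
    """Toy model of the join-wait handshake (hypothetical, not kernel code)."""

    def __init__(self):
        self.join_wait = threading.Condition()   # stands in for j_join_wait
        self.trans_id = 0                        # bumped when a transaction ends
        self.waiter_parked = threading.Event()   # test aid: waiter is about to sleep

    def queue_log_writer(self, timeout):
        """Sleep until a transaction ends; return False if nobody woke us."""
        with self.join_wait:
            seen = self.trans_id
            # Set while holding the lock, so journal_end() cannot run
            # until wait_for() has released the lock and gone to sleep.
            self.waiter_parked.set()
            return self.join_wait.wait_for(
                lambda: self.trans_id != seen, timeout
            )

    def journal_end(self):
        """The active writer finishes and wakes everybody who queued up."""
        with self.join_wait:
            self.trans_id += 1
            self.join_wait.notify_all()


def simulate(writer_present, timeout=0.5):
    """Return True if the queued task was woken, False if it stayed stuck."""
    journal = ToyJournal()
    result = []
    waiter = threading.Thread(
        target=lambda: result.append(journal.queue_log_writer(timeout))
    )
    waiter.start()
    journal.waiter_parked.wait()
    if writer_present:
        journal.journal_end()
    waiter.join()
    return result[0]
```

With these assumptions, simulate(True) returns True because the writer's
journal_end() wakes the queued task, while simulate(False) returns False once
the timeout expires because nobody ever issues the wakeup.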

On which kernel do you see this? Do you know of a kernel on which you've never seen it?
Were you running something specific to trigger this deadlock?

Thanks!
