linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Nick Piggin <npiggin@kernel.dk>
Cc: Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ted Ts'o <tytso@mit.edu>, Eric Sandeen <sandeen@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-btrfs@vger.kernel.org
Subject: Re: [patch] fix up lock order reversal in writeback
Date: Tue, 23 Nov 2010 14:32:08 +0100	[thread overview]
Message-ID: <20101123133208.GH6113@quack.suse.cz> (raw)
In-Reply-To: <20101123080758.GA3364@amd>

On Tue 23-11-10 19:07:58, Nick Piggin wrote:
> On Mon, Nov 22, 2010 at 07:16:55PM +0100, Jan Kara wrote:
> > On Fri 19-11-10 16:16:19, Nick Piggin wrote:
> > > On Fri, Nov 19, 2010 at 01:45:52AM +0100, Jan Kara wrote:
> > > > On Wed 17-11-10 22:28:34, Andrew Morton wrote:
> > > > > The fact that a call to ->write_begin can randomly return with s_umount
> > > > > held, to be randomly released at some random time in the future is a
> > > > > bit ugly, isn't it?  write_begin is a pretty low-level, per-inode
> > > > > thing.
> > > >   I guess you missed that writeback_inodes_sb_nr() (called from _if_idle
> > > > variants) does:
> > > >         bdi_queue_work(sb->s_bdi, &work);
> > > >         wait_for_completion(&done);
> > > >   So we return only after all the IO has been submitted and unlock s_umount
> > > > in writeback_inodes_sb_if_idle(). And we cannot really submit the IO ourselves
> > > > because we are holding i_mutex and we need to get and put references
> > > > to other inodes while doing writeback (those would be really horrible lock
> > > > dependencies - writeback thread can put the last reference to an unlinked
> > > > inode...).
> > > 
> > > But if we're waiting for it, with the lock held, then surely it can
> > > deadlock just the same as if we submit it ourself?
> >   Yes, that's what I realized as well and what needs fixing. It was an
> > unintentional consequence of locking changes Christoph did to the writeback
> > code to fix other races.
> > 
> > > BTW. are you taking i_mutex inside writeback? I mutex can be held
> > > while entering page reclaim, and thus writepage... so it could be a
> > > bug too.
> >   No, writeback does not need i_mutex.
> 
> I did in fact see i_mutex from writeback, which is how the lock order
> reversal was noticed in the first place. I haven't had much luck in
> reproducing it yet. It did come from end_io_work, I believe.
> 
> There is another deadlock in here, by the looks (although this is not
> the one which triggers lockdep -- the workqueue coupling means it
> defeats lockdep detection).
> 
> Buffered write holds i_lock, and waits for inode writeback submission,
> but the work queue seems to be blocked on i_mutex (ext4_end_io_work),
> and so it deadlocks.
  Ah, I see. That's a new thing introduced by Ted's rewrite of
ext4_writepages() - bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc. And indeed it
introduced a deadlock as you describe...

								Honza

> Note that this is an AA deadlock, different from ABBA one relating to
> s_umount lock (but very similar call chains IIRC).
> 
> 
> [  748.406457] SysRq : Show Blocked State
> [  748.406685]   task                        PC stack   pid father
> [  748.406868] kworker/9:1   D 0000000000000000  6296   118      2
> 0x00000000
> [  748.407173]  ffff88012acb3c90 0000000000000046 0000000000000000
> ffff88012b1cec80
> [  748.407686]  0000000000000002 ffff88012acb2000 ffff88012acb2000
> 000000000000d6c8
> [  748.408200]  ffff88012acb2010 ffff88012acb3fd8 ffff880129c7dd00
> ffff88012b1cec80
> [  748.408711] Call Trace:
> [  748.408866]  [<ffffffff81039431>] ? get_parent_ip+0x11/0x50
> [  748.409033]  [<ffffffff8160145c>] __mutex_lock_common+0x18c/0x480
> [  748.409205]  [<ffffffffa0042147>] ? ext4_end_io_work+0x37/0xb0 [ext4]
> [  748.409379]  [<ffffffffa0042147>] ? ext4_end_io_work+0x37/0xb0 [ext4]
> [  748.409546]  [<ffffffff8160182e>] mutex_lock_nested+0x3e/0x50
> [  748.409717]  [<ffffffffa0042147>] ext4_end_io_work+0x37/0xb0 [ext4]
> [  748.409883]  [<ffffffff81068378>] process_one_work+0x1b8/0x5a0
> [  748.410046]  [<ffffffff8106830e>] ? process_one_work+0x14e/0x5a0
> [  748.410219]  [<ffffffffa0042110>] ? ext4_end_io_work+0x0/0xb0 [ext4]
> [  748.410386]  [<ffffffff81069675>] worker_thread+0x175/0x3a0
> [  748.410549]  [<ffffffff81069500>] ? worker_thread+0x0/0x3a0
> [  748.410712]  [<ffffffff8106e246>] kthread+0x96/0xa0
> [  748.410874]  [<ffffffff81003ed4>] kernel_thread_helper+0x4/0x10
> [  748.411038]  [<ffffffff81039878>] ? finish_task_switch+0x78/0x110
> [  748.411202]  [<ffffffff816036c0>] ? restore_args+0x0/0x30
> [  748.411364]  [<ffffffff8106e1b0>] ? kthread+0x0/0xa0
> [  748.411524]  [<ffffffff81003ed0>] ? kernel_thread_helper+0x0/0x10
> [  748.606853] dbench        D 0000000000000000  2872  2916      1
> 0x00000004
> [  748.607154]  ffff880129ec58b8 0000000000000046 ffff880129ec5838
> ffffffff810806fd
> [  748.607665]  ffff880129ec5898 ffff880129ec4000 ffff880129ec4000
> 000000000000d6c8
> [  748.608176]  ffff880129ec4010 ffff880129ec5fd8 ffff88010a2d0000
> ffff88011ff69f00
> [  748.608686] Call Trace:
> [  748.608837]  [<ffffffff810806fd>] ? trace_hardirqs_off+0xd/0x10
> [  748.609001]  [<ffffffff81600bb5>] schedule_timeout+0x1f5/0x350
> [  748.609164]  [<ffffffff8108380b>] ? mark_held_locks+0x6b/0xa0
> [  748.609328]  [<ffffffff8160314b>] ? _raw_spin_unlock_irq+0x2b/0x60
> [  748.609493]  [<ffffffff81039431>] ? get_parent_ip+0x11/0x50
> [  748.609656]  [<ffffffff81606c3d>] ? sub_preempt_count+0x9d/0xd0
> [  748.609820]  [<ffffffff815ffacd>] wait_for_common+0x10d/0x190
> [  748.609984]  [<ffffffff810426e0>] ? default_wake_function+0x0/0x10
> [  748.610149]  [<ffffffff81602ec9>] ? _raw_spin_unlock_bh+0x39/0x40
> [  748.610314]  [<ffffffff815ffbf8>] wait_for_completion+0x18/0x20
> [  748.610479]  [<ffffffff8113d1e7>] writeback_inodes_sb_nr+0xf7/0x120
> [  748.610646]  [<ffffffff8113d68d>] writeback_inodes_sb+0x4d/0x60
> [  748.610811]  [<ffffffff8113d6d2>] ?
> writeback_inodes_sb_if_idle+0x32/0x60
> [  748.610978]  [<ffffffff8113d6da>]
> writeback_inodes_sb_if_idle+0x3a/0x60
> [  748.611153]  [<ffffffffa003fefd>] ext4_da_write_begin+0x20d/0x310
> [ext4]
> [  748.611321]  [<ffffffff810cbbf4>]
> generic_file_buffered_write+0x114/0x2a0
> 
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

  reply	other threads:[~2010-11-23 13:32 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-16 11:00 [patch] fix up lock order reversal in writeback Nick Piggin
2010-11-16 13:01 ` Jan Kara
2010-11-17  4:30   ` Eric Sandeen
2010-11-17  4:38     ` Nick Piggin
2010-11-17  5:05       ` Eric Sandeen
2010-11-17  6:10         ` Nick Piggin
2010-11-18  3:06           ` Ted Ts'o
2010-11-18  3:29             ` Andrew Morton
2010-11-18  6:00               ` Nick Piggin
2010-11-18  6:28                 ` Andrew Morton
2010-11-18  8:18                   ` Nick Piggin
2010-11-18 10:51                     ` Theodore Tso
2010-11-18 17:58                     ` Andrew Morton
2010-11-19  5:10                       ` Nick Piggin
2010-11-19 12:07                         ` Theodore Tso
2010-11-18 14:55                   ` Eric Sandeen
2010-11-18 17:10                     ` Andrew Morton
2010-11-18 18:04                       ` Eric Sandeen
2010-11-18 18:24                         ` Eric Sandeen
2010-11-18 18:39                           ` Chris Mason
2010-11-18 18:36                         ` Andrew Morton
2010-11-18 18:51                           ` Chris Mason
2010-11-18 20:22                             ` Andrew Morton
2010-11-18 20:36                               ` Chris Mason
2010-11-18 19:02                           ` Eric Sandeen
2010-11-18 20:17                             ` Andrew Morton
2010-11-18 18:33                   ` Chris Mason
2010-11-18 23:58                     ` Jan Kara
2010-11-19  0:45                   ` Jan Kara
2010-11-19  5:16                     ` Nick Piggin
2010-11-22 18:16                       ` Jan Kara
2010-11-23  8:07                         ` Nick Piggin
2010-11-23 13:32                           ` Jan Kara [this message]
2010-11-23  8:15                         ` Nick Piggin
2010-11-18 18:53             ` Al Viro
2010-11-18  3:18           ` Eric Sandeen
2010-11-22 23:43             ` Andrew Morton
2010-11-16 20:32 ` Andrew Morton
2010-11-17  3:56   ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101123133208.GH6113@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=npiggin@kernel.dk \
    --cc=sandeen@redhat.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).