linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Zach Brown <zach.brown@oracle.com>
Cc: Erez Zadok <ezk@cs.sunysb.edu>,
	linux-kernel@vger.kernel.org, ext3-users@redhat.com,
	Chris Mason <chris.mason@oracle.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: lockdep warning with LTP dio test (v2.6.24-rc6-125-g5356f66)
Date: Mon, 14 Jan 2008 18:06:09 +0100	[thread overview]
Message-ID: <20080114170609.GH4214@duck.suse.cz> (raw)
In-Reply-To: <477BF72B.4000608@oracle.com>

On Wed 02-01-08 12:42:19, Zach Brown wrote:
> Erez Zadok wrote:
> > Setting: ltp-full-20071031, dio01 test on ext3 with Linus's latest tree.
> > Kernel w/ SMP, preemption, and lockdep configured.
> 
> This is a real lock ordering problem.  Thanks for reporting it.
> 
> The updating of atime inside sys_mmap() orders the mmap_sem in the vfs
> outside of the journal handle in ext3's inode dirtying:
> 
> > -> #1 (jbd_handle){--..}:
> >        [<c023068f>] __lock_acquire+0x9cc/0xb95
> >        [<c0230c2f>] lock_acquire+0x5f/0x78
> >        [<c029c7e9>] journal_start+0xee/0xf8
> >        [<c02959d6>] ext3_journal_start_sb+0x48/0x4a
> >        [<c029152b>] ext3_dirty_inode+0x27/0x6c
> >        [<c026f701>] __mark_inode_dirty+0x29/0x144
> >        [<c026817b>] touch_atime+0xb7/0xbc
> >        [<c023af6d>] generic_file_mmap+0x2d/0x42
> >        [<c024a5cc>] mmap_region+0x1e6/0x3b4
> >        [<c024aa6b>] do_mmap_pgoff+0x1fb/0x253
> >        [<c02067af>] sys_mmap2+0x9b/0xb5
> >        [<c020275e>] syscall_call+0x7/0xb
> >        [<ffffffff>] 0xffffffff
> 
> ext3_direct_IO() orders the journal handle outside of the mmap_sem that
> dio_get_page() acquires to pin pages with get_user_pages():
> 
> > -> #0 (&mm->mmap_sem){----}:
> >        [<c023057f>] __lock_acquire+0x8bc/0xb95
> >        [<c0230c2f>] lock_acquire+0x5f/0x78
> >        [<c0397d4f>] down_read+0x3a/0x4c
> >        [<c02778a2>] dio_get_page+0x4e/0x15d
> >        [<c0278352>] __blockdev_direct_IO+0x431/0xa81
> >        [<c0291318>] ext3_direct_IO+0x10c/0x1a1
> >        [<c023c091>] generic_file_direct_IO+0x124/0x139
> >        [<c023c0fc>] generic_file_direct_write+0x56/0x11c
> >        [<c023ca15>] __generic_file_aio_write_nolock+0x33d/0x489
> >        [<c023cbb9>] generic_file_aio_write+0x58/0xb6
> >        [<c028d4e3>] ext3_file_write+0x27/0x99
> >        [<c0256d0f>] do_sync_write+0xc5/0x102
> >        [<c0257463>] vfs_write+0x90/0x119
> >        [<c0257a25>] sys_write+0x3d/0x61
> >        [<c02026d6>] sysenter_past_esp+0x5f/0xa5
> >        [<ffffffff>] 0xffffffff
> 
> Two fixes come to mind:
> 
> 1) use something like Peter's ->mmap_prepare() to update atime before
> acquiring the mmap_sem.  ( http://lkml.org/lkml/2007/11/11/97 ).  I
> don't know if this would leave more paths which do a journal_start()
> while holding the mmap_sem.
> 
> 2) rework ext3's dio to only hold the jbd handle in ext3_get_block().
> Chris has a patch for this kicking around somewhere but I'm told it has
> problems exposing old blocks in ordered data mode.
> 
> Does anyone have preferences?  I could go either way.  I certainly don't
> like the idea of journal handles being held across the entirety of
> fs/direct-io.c.  It's yet another case of O_DIRECT differing wildly from
> the buffered path :(.
  I've looked more into it and I think that 2) is the only way to go since
transaction start ranks below page lock (standard buffered write path) and
page lock ranks below mmap_sem. So we have at least one more dependency
mmap_sem must go before transaction start...

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

  reply	other threads:[~2008-01-14 17:06 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <200712242302.lBON2O8s011190@agora.fsl.cs.sunysb.edu>
2008-01-02 20:42 ` lockdep warning with LTP dio test (v2.6.24-rc6-125-g5356f66) Zach Brown
2008-01-14 17:06   ` Jan Kara [this message]
2008-01-14 18:14     ` Chris Mason
2008-01-25 16:09       ` Jan Kara
2008-01-25 16:16         ` Chris Mason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080114170609.GH4214@duck.suse.cz \
    --to=jack@suse.cz \
    --cc=chris.mason@oracle.com \
    --cc=ext3-users@redhat.com \
    --cc=ezk@cs.sunysb.edu \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=zach.brown@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).