Re: dio_get_page() lockdep complaints

From: Andrew Morton <akpm@linux-foundation.org>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-aio@kvack.org,
	reiserfs-dev@namesys.com, "Vladimir V. Saveliev" <vs@namesys.com>,
	linux-mm@kvack.org
Subject: Re: dio_get_page() lockdep complaints
Date: Thu, 19 Apr 2007 01:25:40 -0700
Message-ID: <20070419012540.bed394e2.akpm@linux-foundation.org>
In-Reply-To: <20070419080157.GC20928@kernel.dk>

On Thu, 19 Apr 2007 10:01:57 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:

> On Thu, Apr 19 2007, Andrew Morton wrote:
> > On Thu, 19 Apr 2007 09:38:30 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > 
> > > Hi,
> > > 
> > > Doing some testing on CFQ, I ran into this 100% reproducible report:
> > > 
> > > =======================================================
> > > [ INFO: possible circular locking dependency detected ]
> > > 2.6.21-rc7 #5
> > > -------------------------------------------------------
> > > fio/9741 is trying to acquire lock:
> > >  (&mm->mmap_sem){----}, at: [<b018cb34>] dio_get_page+0x54/0x161
> > > 
> > > but task is already holding lock:
> > >  (&inode->i_mutex){--..}, at: [<b038c6e5>] mutex_lock+0x1c/0x1f
> > > 
> > > which lock already depends on the new lock.
> > > 
> > 
> > This is the correct ranking: i_mutex outside mmap_sem.
> > 
> > > 
> > > the existing dependency chain (in reverse order) is:
> > > 
> > > -> #1 (&inode->i_mutex){--..}:
> > >        [<b013e3fb>] __lock_acquire+0xdee/0xf9c
> > >        [<b013e600>] lock_acquire+0x57/0x70
> > >        [<b038c4a5>] __mutex_lock_slowpath+0x73/0x297
> > >        [<b038c6e5>] mutex_lock+0x1c/0x1f
> > >        [<b01b17e9>] reiserfs_file_release+0x54/0x447
> > >        [<b016afe7>] __fput+0x53/0x101
> > >        [<b016b0ee>] fput+0x19/0x1c
> > >        [<b015bcd5>] remove_vma+0x3b/0x4d
> > >        [<b015c659>] do_munmap+0x17f/0x1cf
> > >        [<b015c6db>] sys_munmap+0x32/0x42
> > >        [<b0103f04>] sysenter_past_esp+0x5d/0x99
> > >        [<ffffffff>] 0xffffffff
> > > 
> > > -> #0 (&mm->mmap_sem){----}:
> > >        [<b013e259>] __lock_acquire+0xc4c/0xf9c
> > >        [<b013e600>] lock_acquire+0x57/0x70
> > >        [<b0137b92>] down_read+0x3a/0x4c
> > >        [<b018cb34>] dio_get_page+0x54/0x161
> > >        [<b018d7a9>] __blockdev_direct_IO+0x514/0xe2a
> > >        [<b01cf449>] ext3_direct_IO+0x98/0x1e5
> > >        [<b014e8df>] generic_file_direct_IO+0x63/0x133
> > >        [<b01500e9>] generic_file_aio_read+0x16b/0x222
> > >        [<b017f8b6>] aio_rw_vect_retry+0x5a/0x116
> > >        [<b0180147>] aio_run_iocb+0x69/0x129
> > >        [<b0180a78>] io_submit_one+0x194/0x2eb
> > >        [<b0181331>] sys_io_submit+0x92/0xe7
> > >        [<b0103f90>] syscall_call+0x7/0xb
> > >        [<ffffffff>] 0xffffffff
> > 
> > But here reiserfs is taking i_mutex in its file_operations.release(),
> > which can be called under mmap_sem.
> > 
> > Vladimir's recent de14569f94513279e3d44d9571a421e9da1759ae.
> > "reiserfs: avoid tail packing if an inode was ever mmapped" comes real
> > close to this code, but afaict it did not cause this bug.
> > 
> > I can't think of anything which we've done in the 2.6.21 cycle which
> > would have caused this to start happening.  Odd.
> 
> The bug may be older, let me know if you want me to check 2.6.20 or
> earlier.

Would be great if you could test 2.6.20.  I have a feeling that I missed
something, but what?  We didn't change the refcounting or lifetime of
vma.vm_file...


> > > The test run was fio, the job file used is:
> > > 
> > > # fio job file snip below
> > > [global]
> > > bs=4k
> > > buffered=0
> > > ioengine=libaio
> > > iodepth=4
> > > thread
> > > 
> > > [readers]
> > > numjobs=8
> > > size=128m
> > > rw=read
> > > # fio job file snip above
> > > 
> > > Filesystem was ext3, default mkfs and mount options. Kernel was
> > > 2.6.21-rc7 as of this morning, with some CFQ patches applied.
> > > 
> > 
> > It's interesting that lockdep learned the (wrong) ranking from a reiserfs
> > operation then later detected it being violated by ext3.
> 
> It's a scratch test box, which for some reason has reiserfs as the
> rootfs. So reiser gets to run first :-)

Direct-io reads against reiserfs will also take i_mutex outside mmap_sem,
as will pagefaults inside generic_file_write() (which is where this ranking
is primarily defined).

So an all-reiserfs system should be getting the same reports.  Obviously,
that isn't happening.
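
To make the ranking argument concrete, here is a rough user-space sketch of
the kind of check lockdep performs.  It is not lockdep's implementation: the
enum, record_order() and reset_orders() are names made up for illustration,
and only the two lock classes from the report are modelled.  Each "took B
while holding A" event adds an edge A -> B, and an edge that would close a
cycle is refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes for this sketch; not kernel identifiers. */
enum lock_class { I_MUTEX, MMAP_SEM, NR_CLASSES };

/* before[a][b] is true once some task took b while already holding a. */
static bool before[NR_CLASSES][NR_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (before[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "inner was acquired while holding outer".
 * Returns false, and records nothing, if the new edge would close a cycle,
 * i.e. if outer is already known to be taken somewhere inside inner.
 */
static bool record_order(enum lock_class outer, enum lock_class inner)
{
	bool seen[NR_CLASSES] = { false };

	assert(outer < NR_CLASSES && inner < NR_CLASSES);
	if (reachable(inner, outer, seen))
		return false;
	before[outer][inner] = true;
	return true;
}

/* Forget everything learned so far. */
static void reset_orders(void)
{
	memset(before, 0, sizeof(before));
}
```

Replaying the report in this model: record_order(MMAP_SEM, I_MUTEX) stands
for the munmap -> reiserfs_file_release path and is accepted because it is
seen first; record_order(I_MUTEX, MMAP_SEM) stands for the direct-io read
and is then rejected.  Which of the two orders gets blamed depends only on
which one was learned first.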

It's a bit odd that reiserfs is playing with file contents within
file_operations.release(): there could be other files open against that
inode.  One would expect this sort of thing to be happening in an
inode_operation.  But it's been like that for a long time.

Is it possible that fio was changed?  That it was changed to close() the fd
before doing the munmapping whereas it used to hold the file open?
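
For reference, the close()-before-munmap() ordering being asked about looks
roughly like this from user space.  This is only a sketch of the syscall
ordering, not what fio actually does: close_then_munmap() and the scratch
path are made up for illustration.  The mapping stays valid after close(),
so the mapping's reference is the last one on the open file, and it is
dropped on the munmap() path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>	/* for callers that assert on the result */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map a scratch file, close() the descriptor, then munmap().
 * Returns 0 if the mapping stayed readable after close(), -1 on any error.
 */
static int close_then_munmap(void)
{
	char path[] = "/tmp/vmfile-XXXXXX";	/* hypothetical scratch file */
	const char msg[] = "still mapped";
	int ret = -1;

	int fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* the open file keeps the inode alive */

	if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg)) {
		close(fd);
		return -1;
	}

	char *map = mmap(NULL, sizeof(msg), PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* Drop the descriptor first; the mapping still references the file. */
	if (close(fd) != 0) {
		munmap(map, sizeof(msg));
		return -1;
	}

	/* The data is still reachable through the mapping. */
	if (memcmp(map, msg, sizeof(msg)) == 0)
		ret = 0;

	/* With the fd gone, the last reference is dropped on this path. */
	if (munmap(map, sizeof(msg)) != 0)
		ret = -1;
	return ret;
}
```

If the descriptor were instead kept open until after munmap(), the final
reference would go away in close(), outside mmap_sem.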
