From: Al Viro <viro@ZenIV.linux.org.uk>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	linux-mm@kvack.org, davej@redhat.com, jboyer@redhat.com,
	tyhicks@canonical.com, linux-kernel@vger.kernel.org,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Mimi Zohar <zohar@linux.vnet.ibm.com>
Subject: Re: [PATCH] hugetlbfs: lockdep annotate root inode properly
Date: Thu, 8 Mar 2012 21:44:25 +0000
Message-ID: <20120308214425.GA23916@ZenIV.linux.org.uk>
In-Reply-To: <20120308130256.c7855cbd.akpm@linux-foundation.org>

On Thu, Mar 08, 2012 at 01:02:56PM -0800, Andrew Morton wrote:
> > This fix the below lockdep warning
> 
> OK, what's going on here.

Deadlock in hugetlbfs mmap getting misreported.

One last time: ->mmap_sem nests inside ->i_mutex.  Both for regular
files and for directories.  It always has.

For directories there's copy_to_user() from ->readdir() done under ->i_mutex.
For regular files there's copy_from_user() from ->write(), usually done under
->i_mutex.  On hugetlbfs there's copy_to_user() from ->read() done under
->i_mutex.
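The ordering described above can be modelled in a few lines of userspace C.
This is only a sketch, not kernel code: the enum, the seen[][] table and the
function names are made up for illustration.  It also assumes, going by
"Deadlock in hugetlbfs mmap" above, that the conflicting path is an mmap()
that takes ->i_mutex while ->mmap_sem is already held.

```c
#include <stdbool.h>

/* Hypothetical lock classes standing in for the two kernel locks. */
enum lock_class { I_MUTEX, MMAP_SEM, NR_CLASSES };

/* seen[a][b] is true once some path has taken b while holding a. */
static bool seen[NR_CLASSES][NR_CLASSES];

/*
 * Record "inner taken while outer is held".
 * Returns false if the opposite order has already been recorded,
 * i.e. the two paths together form an ABBA inversion.
 */
static bool record_nesting(enum lock_class outer, enum lock_class inner)
{
	if (seen[inner][outer])
		return false;
	seen[outer][inner] = true;
	return true;
}

/*
 * ->readdir(), ->write() or hugetlbfs ->read(): the user copy runs
 * under ->i_mutex and may fault, and the fault takes ->mmap_sem.
 */
static bool model_user_copy_under_i_mutex(void)
{
	return record_nesting(I_MUTEX, MMAP_SEM);
}

/* Assumed mmap() path: ->mmap_sem is held, then ->i_mutex is taken. */
static bool model_mmap_taking_i_mutex(void)
{
	return record_nesting(MMAP_SEM, I_MUTEX);
}
```

Whichever model function runs first is accepted; the second one is the one
that gets flagged.  That is the same "which chain is seen first" effect,
just in miniature.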

That has not changed at all.  Lockdep sees both call chains; the only question
is which chain is seen first.  And usually reading a directory happens earlier
in the boot than writing into a file.  That's all there is to it.

Unfortunately, the fact that the call chain being reported is obviously about
directories leads to false hopes that the deadlock doesn't exist - mmap()
obviously can't happen to a directory inode, so people hope that it's a
false positive.  It isn't.

A patch separating directory and non-directory ->i_mutex into different classes
went in at some point, precisely due to those hopes.  It had a braino that
made it useless.  A fix for that braino has been posted and sits in my queue;
I'll push it to Linus along with other pending fixes tonight.

It will *not* eliminate the (very real) deadlock.  It might make the warning
go away, but only if read() on hugetlbfs files doesn't happen during boot.

I suspect that the right thing would be to have a way to set explicit
nesting rules, not tied to a specific call trace.  I haven't looked into
lockdep guts, so no idea how much that would hurt to implement.  As in
lockdep_lock_nests(class_outer, class_inner, message), acting as if
there had been a call chain where class_outer had been taken before
class_inner, with message going in place of the call trace for that chain
when we run into a conflict...
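To make the proposal concrete, here is a toy userspace model of how such a
rule might behave.  It is a sketch only: lockdep_lock_nests() does not exist
in lockdep, and check_acquire(), the why[][] table, the class numbers and
I_MUTEX_RULE are hypothetical names invented here.  The model only catches
direct inversions between two classes; real lockdep tracks whole dependency
chains.

```c
#include <stddef.h>

#define MAX_CLASSES 8

/* Hypothetical class ids for the two locks discussed above. */
enum { I_MUTEX = 0, MMAP_SEM = 1 };

/* Example message for the explicit rule (illustrative wording). */
static const char I_MUTEX_RULE[] =
	"->mmap_sem nests inside ->i_mutex (user copy under ->i_mutex)";

/*
 * why[a][b] != NULL means "b nests inside a"; the string says where
 * that knowledge came from: an explicit rule or an observed chain.
 */
static const char *why[MAX_CLASSES][MAX_CLASSES];

/* Declare the ordering up front, as if such a call chain had been seen. */
static void lockdep_lock_nests(int class_outer, int class_inner,
			       const char *message)
{
	why[class_outer][class_inner] = message;
}

/*
 * Called when 'acquiring' is taken while 'held' is held.
 * Returns NULL if the order is fine; otherwise returns the message
 * (or trace) recorded for the rule that this acquisition violates.
 */
static const char *check_acquire(int held, int acquiring, const char *trace)
{
	const char *conflict = why[acquiring][held];

	if (conflict)
		return conflict;
	if (!why[held][acquiring])
		why[held][acquiring] = trace;
	return NULL;
}
```

With the rule registered at init time, a report about an inverted
acquisition would carry the explicit message no matter which call chain
happened to run first during boot.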


