Re: [Resend] 3.2-rc6+: Reported regressions from 3.0 and 3.1

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* Re: [Resend] 3.2-rc6+: Reported regressions from 3.0 and 3.1
       [not found]   ` <4EF15F42.4070104@oracle.com>
@ 2011-12-21  4:44     ` Linus Torvalds
  2011-12-21  6:15       ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2011-12-21  4:44 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Rafael J. Wysocki, Dave Kleikamp, jfs-discussion,
	Kernel Testers List, LKML, Maciej Rutecki, Andrew Morton,
	Florian Mickler, davem, Hugh Dickins, Al Viro, linux-mm

On Tue, Dec 20, 2011 at 8:23 PM, Dave Kleikamp <dave.kleikamp@oracle.com> wrote:
>
> I don't think this is a regression.  It's been seen before, but the
> patch never got submitted, or was lost somewhere. I believe this
> will fix it.

Hmm. This patch looks obviously correct. But it looks *so* obviously
correct that it just makes me suspicious - this is not new or seldom
used code, it's been this way for ages and used all the time. That
line literally goes back to 2007, commit eb2be189317d0. And it looks
like even before that we had a GFP_KERNEL for the add_to_page_cache()
case and that goes back to before the git history. So this is
*ancient*.

Maybe almost nobody uses __read_cache_page() with a non-GFP_KERNEL gfp
and as a result we've not noticed.

Or maybe there is some crazy reason why it calls "add_to_page_cache()"
with GFP_KERNEL.

Adding the usual suspects for mm/filemap.c to the cc line (Andrew is
already cc'd, but Al and Hugh should comment).

Ack's, people? Is it really as obvious as it looks, and we've just had
this bug forever?

            Linus

--- snip snip ---
> vfs: __read_cache_page should use gfp argument rather than GFP_KERNEL
>
> lockdep reports a deadlock in jfs because a special inode's rw semaphore
> is taken recursively. The mapping's gfp mask is GFP_NOFS, but is not used
> when __read_cache_page() calls add_to_page_cache_lru().
>
> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index c106d3b..c9ea3df 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1828,7 +1828,7 @@ repeat:
>                page = __page_cache_alloc(gfp | __GFP_COLD);
>                if (!page)
>                        return ERR_PTR(-ENOMEM);
> -               err = add_to_page_cache_lru(page, mapping, index, GFP_KERNEL);
> +               err = add_to_page_cache_lru(page, mapping, index, gfp);
>                if (unlikely(err)) {
>                        page_cache_release(page);
>                        if (err == -EEXIST)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Resend] 3.2-rc6+: Reported regressions from 3.0 and 3.1
  2011-12-21  4:44     ` [Resend] 3.2-rc6+: Reported regressions from 3.0 and 3.1 Linus Torvalds
@ 2011-12-21  6:15       ` Hugh Dickins
  2011-12-21  7:10         ` Al Viro
  2011-12-21 17:05         ` [PATCH v2] vfs: __read_cache_page should use gfp argument rather than GFP_KERNEL Dave Kleikamp
  0 siblings, 2 replies; 6+ messages in thread
From: Hugh Dickins @ 2011-12-21  6:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Kleikamp, Rafael J. Wysocki, Dave Kleikamp, jfs-discussion,
	Kernel Testers List, LKML, Maciej Rutecki, Andrew Morton,
	Florian Mickler, davem, Al Viro, linux-mm

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2927 bytes --]

On Tue, 20 Dec 2011, Linus Torvalds wrote:
> On Tue, Dec 20, 2011 at 8:23 PM, Dave Kleikamp <dave.kleikamp@oracle.com> wrote:
> >
> > I don't think this is a regression.  It's been seen before, but the
> > patch never got submitted, or was lost somewhere. I believe this
> > will fix it.
> 
> Hmm. This patch looks obviously correct. But it looks *so* obviously
> correct that it just makes me suspicious - this is not new or seldom
> used code, it's been this way for ages and used all the time. That
> line literally goes back to 2007, commit eb2be189317d0. And it looks
> like even before that we had a GFP_KERNEL for the add_to_page_cache()
> case and that goes back to before the git history. So this is
> *ancient*.
> 
> Maybe almost nobody uses __read_cache_page() with a non-GFP_KERNEL gfp
> and as a result we've not noticed.
> 
> Or maybe there is some crazy reason why it calls "add_to_page_cache()"
> with GFP_KERNEL.
> 
> Adding the usual suspects for mm/filemap.c to the cc line (Andrew is
> already cc'd, but Al and Hugh should comment).
> 
> Ack's, people? Is it really as obvious as it looks, and we've just had
> this bug forever?

Certainly

Acked-by: Hugh Dickins <hughd@google.com>

from me (and add_to_page_cache_locked does the masking of inappropriate
bits when passing on down, so no need to worry about that aspect).

I agree that it's odd that we've never noticed it before, but I don't
think the GFP_KERNEL there has any more significance than oversight.
Nick cleaned up some similar instances in filemap.c a few years back,
I guess ones he hit in testing, but this just got left over.

page_cache_read()'s GFP_KERNEL looks similarly worrying, but as it's
only called by filemap_fault(), I suppose it's actually okay.

Ooh, maybe you should also update that comment on GFP_KERNEL above
read_cache_page_gfp()...

Hugh

> 
>             Linus
> 
> --- snip snip ---
> > vfs: __read_cache_page should use gfp argument rather than GFP_KERNEL
> >
> > lockdep reports a deadlock in jfs because a special inode's rw semaphore
> > is taken recursively. The mapping's gfp mask is GFP_NOFS, but is not used
> > when __read_cache_page() calls add_to_page_cache_lru().
> >
> > Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index c106d3b..c9ea3df 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -1828,7 +1828,7 @@ repeat:
> >                page = __page_cache_alloc(gfp | __GFP_COLD);
> >                if (!page)
> >                        return ERR_PTR(-ENOMEM);
> > -               err = add_to_page_cache_lru(page, mapping, index, GFP_KERNEL);
> > +               err = add_to_page_cache_lru(page, mapping, index, gfp);
> >                if (unlikely(err)) {
> >                        page_cache_release(page);
> >                        if (err == -EEXIST)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Resend] 3.2-rc6+: Reported regressions from 3.0 and 3.1
  2011-12-21  6:15       ` Hugh Dickins
@ 2011-12-21  7:10         ` Al Viro
  2011-12-21 17:05         ` [PATCH v2] vfs: __read_cache_page should use gfp argument rather than GFP_KERNEL Dave Kleikamp
  1 sibling, 0 replies; 6+ messages in thread
From: Al Viro @ 2011-12-21  7:10 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Dave Kleikamp, Rafael J. Wysocki, Dave Kleikamp,
	jfs-discussion, Kernel Testers List, LKML, Maciej Rutecki,
	Andrew Morton, Florian Mickler, davem, linux-mm

On Tue, Dec 20, 2011 at 10:15:00PM -0800, Hugh Dickins wrote:

> Acked-by: Hugh Dickins <hughd@google.com>
> 
> from me (and add_to_page_cache_locked does the masking of inappropriate
> bits when passing on down, so no need to worry about that aspect).

I was grepping for possibilities of that hitting us right now...  OK,
rigth you are.

Acked-by: Al Viro <viro@zeniv.linux.org.uk>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2] vfs: __read_cache_page should use gfp argument rather than GFP_KERNEL
  2011-12-21  6:15       ` Hugh Dickins
  2011-12-21  7:10         ` Al Viro
@ 2011-12-21 17:05         ` Dave Kleikamp
  2011-12-21 20:28           ` Andrew Morton
  1 sibling, 1 reply; 6+ messages in thread
From: Dave Kleikamp @ 2011-12-21 17:05 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Rafael J. Wysocki, jfs-discussion,
	Kernel Testers List, LKML, Maciej Rutecki, Andrew Morton,
	Florian Mickler, davem, Al Viro, linux-mm

[ updated to remove now-obsolete comment in read_cache_page_gfp()]

lockdep reports a deadlock in jfs because a special inode's rw semaphore
is taken recursively. The mapping's gfp mask is GFP_NOFS, but is not used
when __read_cache_page() calls add_to_page_cache_lru().

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/mm/filemap.c b/mm/filemap.c
index c106d3b..5f0a3c9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1828,7 +1828,7 @@ repeat:
 		page = __page_cache_alloc(gfp | __GFP_COLD);
 		if (!page)
 			return ERR_PTR(-ENOMEM);
-		err = add_to_page_cache_lru(page, mapping, index, GFP_KERNEL);
+		err = add_to_page_cache_lru(page, mapping, index, gfp);
 		if (unlikely(err)) {
 			page_cache_release(page);
 			if (err == -EEXIST)
@@ -1925,10 +1925,7 @@ static struct page *wait_on_page_read(struct page *page)
  * @gfp:	the page allocator flags to use if allocating
  *
  * This is the same as "read_mapping_page(mapping, index, NULL)", but with
- * any new page allocations done using the specified allocation flags. Note
- * that the Radix tree operations will still use GFP_KERNEL, so you can't
- * expect to do this atomically or anything like that - but you can pass in
- * other page requirements.
+ * any new page allocations done using the specified allocation flags.
  *
  * If the page does not get brought uptodate, return -EIO.
  */

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] vfs: __read_cache_page should use gfp argument rather than GFP_KERNEL
  2011-12-21 17:05         ` [PATCH v2] vfs: __read_cache_page should use gfp argument rather than GFP_KERNEL Dave Kleikamp
@ 2011-12-21 20:28           ` Andrew Morton
  2011-12-21 20:53             ` Dave Kleikamp
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2011-12-21 20:28 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Hugh Dickins, Linus Torvalds, Rafael J. Wysocki, jfs-discussion,
	Kernel Testers List, LKML, Maciej Rutecki, Florian Mickler, davem,
	Al Viro, linux-mm

On Wed, 21 Dec 2011 11:05:48 -0600
Dave Kleikamp <dave.kleikamp@oracle.com> wrote:

> [ updated to remove now-obsolete comment in read_cache_page_gfp()]
> 
> lockdep reports a deadlock in jfs because a special inode's rw semaphore
> is taken recursively. The mapping's gfp mask is GFP_NOFS, but is not used
> when __read_cache_page() calls add_to_page_cache_lru().

Well hang on, it's not just a lockdep splat.  The kernel actually will
deadlock if we reenter JFS via this GFP_KERNEL allocation attempt, yes?

Was that GFP_NOFS allocation recently added to JFS?  If not then we
should backport this deadlock fix into -stable, no?


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] vfs: __read_cache_page should use gfp argument rather than GFP_KERNEL
  2011-12-21 20:28           ` Andrew Morton
@ 2011-12-21 20:53             ` Dave Kleikamp
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Kleikamp @ 2011-12-21 20:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Linus Torvalds, Rafael J. Wysocki, jfs-discussion,
	Kernel Testers List, LKML, Maciej Rutecki, Florian Mickler, davem,
	Al Viro, linux-mm

On 12/21/2011 02:28 PM, Andrew Morton wrote:
> On Wed, 21 Dec 2011 11:05:48 -0600
> Dave Kleikamp <dave.kleikamp@oracle.com> wrote:
> 
>> [ updated to remove now-obsolete comment in read_cache_page_gfp()]
>>
>> lockdep reports a deadlock in jfs because a special inode's rw semaphore
>> is taken recursively. The mapping's gfp mask is GFP_NOFS, but is not used
>> when __read_cache_page() calls add_to_page_cache_lru().
> 
> Well hang on, it's not just a lockdep splat.  The kernel actually will
> deadlock if we reenter JFS via this GFP_KERNEL allocation attempt, yes?

Yes, it could result in a real deadlock.

> Was that GFP_NOFS allocation recently added to JFS?  If not then we
> should backport this deadlock fix into -stable, no?

Yes, that would make sense.

Shaggy

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-12-21 20:53 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <201112210054.46995.rjw@sisk.pl>
     [not found] ` <CA+55aFzee7ORKzjZ-_PrVy796k2ASyTe_Odz=ji7f1VzToOkKw@mail.gmail.com>
     [not found]   ` <4EF15F42.4070104@oracle.com>
2011-12-21  4:44     ` [Resend] 3.2-rc6+: Reported regressions from 3.0 and 3.1 Linus Torvalds
2011-12-21  6:15       ` Hugh Dickins
2011-12-21  7:10         ` Al Viro
2011-12-21 17:05         ` [PATCH v2] vfs: __read_cache_page should use gfp argument rather than GFP_KERNEL Dave Kleikamp
2011-12-21 20:28           ` Andrew Morton
2011-12-21 20:53             ` Dave Kleikamp

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).