From: Mel Gorman <mel@csn.ul.ie>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	Christoph Lameter <cl@linux.com>,
	Lee Schermerhorn <lee.schermerhorn@hp.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>, Peter Zijlstra <peterz@infradead.org>,
	Brian Gerst <brgerst@gmail.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org, mingo@elte.hu
Subject: Re: [PATCH] numa: fix slab_node(MPOL_BIND)
Date: Thu, 28 Oct 2010 17:45:14 +0100
Message-ID: <20101028164514.GE4896@csn.ul.ie>
In-Reply-To: <AANLkTikmNc7qqDzoff_3i_FRbG=pmOC7TG3eeZnmvaTD@mail.gmail.com>

On Thu, Oct 28, 2010 at 08:59:42AM -0700, Linus Torvalds wrote:
> Hmm. More people added to the discussion..
> 
> This code seems to go back all the way to commit 19770b32609b: "mm:
> filter based on a nodemask as well as a gfp_mask", which was back in
> April 2008 and got merged into 2.6.26.
> 

I am about to run out the door, so I have not read the thread, but
first_zones_zonelist() can indeed return NULL. It happens when the
zonelist is empty (unlikely) or when the applied nodemask restricts the
allowable nodes so that no valid zones remain (more likely).
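
To make the second case concrete, here is a minimal userspace sketch, not
kernel code. The names zone_model, zonelist_model, first_zone_model() and
model_lookup_node() are invented for illustration, and the layout (node 0
with Normal and HighMem zones, node 1 with HighMem only) is an assumption
modelled on the setup reported further down this thread. It only shows how
filtering by zone index and by nodemask can leave nothing to return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; these are NOT the kernel's definitions. */
enum { MODEL_NORMAL = 0, MODEL_HIGHMEM = 1 };

struct zone_model {
	int node;      /* NUMA node the zone belongs to */
	int zone_idx;  /* MODEL_NORMAL or MODEL_HIGHMEM */
};

struct zonelist_model {
	struct zone_model *zones[4];  /* NULL-terminated, in preference order */
};

/*
 * Return the first zone whose index is <= highest_idx and whose node bit
 * is set in nodemask, or NULL when no zone passes both filters.
 */
static struct zone_model *first_zone_model(const struct zonelist_model *zl,
					   int highest_idx,
					   unsigned long nodemask)
{
	assert(zl != NULL);
	for (size_t i = 0; zl->zones[i] != NULL; i++) {
		struct zone_model *z = zl->zones[i];

		if (z->zone_idx > highest_idx)
			continue;  /* zone type not usable for this request */
		if (!(nodemask & (1UL << z->node)))
			continue;  /* node excluded by the nodemask */
		return z;
	}
	return NULL;
}

/* Assumed layout: node 0 has Normal + HighMem, node 1 has HighMem only. */
static int model_lookup_node(int highest_idx, unsigned long nodemask)
{
	static struct zone_model n0_high = { 0, MODEL_HIGHMEM };
	static struct zone_model n0_norm = { 0, MODEL_NORMAL };
	static struct zone_model n1_high = { 1, MODEL_HIGHMEM };
	static const struct zonelist_model zl = {
		{ &n0_high, &n0_norm, &n1_high, NULL }
	};
	struct zone_model *z = first_zone_model(&zl, highest_idx, nodemask);

	return z ? z->node : -1;  /* -1 stands in for "no zone found" */
}
```

In this model, a lookup limited to MODEL_NORMAL with a nodemask containing
only node 1 finds nothing, which is the situation a caller has to handle.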

> And I'd be happy to commit it (in fact, I was going to), but when
> looking for other uses of first_zones_zonelist(), I found
> local_memory_node() which does the exact same thing: ignore the return
> value, and unconditionally dereference the resulting 'zone' variable.
> 

That does look unsafe.

> And so does - although less obviously - mm/vmscan.c for the
> wait_iff_congested() thing.
> 

It should be implicitly safe, although it is non-obvious. wait_iff_congested()
in mm/vmscan.c is called from do_try_to_free_pages(), which is in the direct
reclaim path. To get there, it must have passed this check in page_alloc.c:

        first_zones_zonelist(zonelist, high_zoneidx, nodemask, &preferred_zone);
        if (!preferred_zone) {
                put_mems_allowed();
                return NULL;
        }

Did I miss anything?

The memory controller can also end up there, but for it to get into
trouble, it would have to be trying to shrink a cgroup with an invalid
zonelist. Is that possible?

> So are those buggy too, since first_zones_zonelist() can apparently return NULL?
> 

Yes, it can.

> Please advise...
> 

Callers need to check for NULL or be sure they are not dealing with an
empty zonelist.
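
As a rough illustration of the first option, the two usual shapes of such a
check are sketched below in plain userspace C. The type zone_sketch and the
helpers node_or_fallback() and node_or_error() are hypothetical names, not
kernel interfaces. The first helper mirrors the style of the proposed
slab_node() fix (substitute a fallback node), the second mirrors the
page_alloc.c check quoted above (give up and let the caller decide).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a zone; not the kernel's struct zone. */
struct zone_sketch {
	int node;
};

/* Shape 1: fall back to a caller-supplied node when no zone was found. */
static int node_or_fallback(const struct zone_sketch *zone, int fallback_node)
{
	assert(fallback_node >= 0);
	return zone ? zone->node : fallback_node;
}

/*
 * Shape 2: report failure instead of guessing. Returns 0 and stores the
 * node in *node_out on success, or -1 (leaving *node_out untouched) when
 * zone is NULL.
 */
static int node_or_error(const struct zone_sketch *zone, int *node_out)
{
	if (zone == NULL)
		return -1;
	*node_out = zone->node;
	return 0;
}
```

Which shape fits depends on whether the caller has a sensible default to
fall back on or has to fail the request.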

>                   Linus
> 
> On Wed, Oct 27, 2010 at 10:33 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Wednesday 27 October 2010 at 18:07 +0200, Eric Dumazet wrote:
> >
> >> So I tried the following experiment:
> >>
> >> # swapoff
> >> # numactl --membind=0 swapon -a
> >> # grep swap /proc/vmallocinfo
> >> 0xf9bf3000-0xf9cf4000 1052672 sys_swapon+0x4aa/0xb24 pages=256 vmalloc N0=256
> >> # swapoff -a
> >> # numactl --membind=1 swapon -a
> >>
> >> <<FREEZE>>
> >>
> >
> > It is in fact a crash, not a freeze, in slab_node().
> >
> > The problem is that we dereference a NULL zone pointer.
> >
> > (node 1 has HighMem only)
> >
> > The following patch seems to solve the problem for me:
> >
> > # swapoff -a
> > # numactl --membind=1 swapon -a
> > # grep swap /proc/vmallocinfo
> > 0xf9da5000-0xf9ea6000 1052672 sys_swapon+0x3f9/0xa34 pages=256 vmalloc N1=256
> >
> >
> > Thanks
> >
> >
> > [PATCH] numa: fix slab_node(MPOL_BIND)
> >
> > When a node contains only HighMem memory, slab_node(MPOL_BIND)
> > dereferences a NULL pointer.
> >
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > ---
> >  mm/mempolicy.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 81a1276..4a57f13 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -1597,7 +1597,7 @@ unsigned slab_node(struct mempolicy *policy)
> >                (void)first_zones_zonelist(zonelist, highest_zoneidx,
> >                                                        &policy->v.nodes,
> >                                                        &zone);
> > -               return zone->node;
> > +               return zone ? zone->node : numa_node_id();
> >        }
> >
> >        default:
> >
> >
> >
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
