All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	srikar@linux.vnet.ibm.com, aarcange@redhat.com, mingo@kernel.org,
	riel@redhat.com
Subject: Re: NUMA Autobalancing Kernel 3.8
Date: Wed, 3 Apr 2013 15:03:45 +0100	[thread overview]
Message-ID: <20130403140344.GA5811@suse.de> (raw)
In-Reply-To: <515AEC71.9020704@profihost.ag>

On Tue, Apr 02, 2013 at 04:34:25PM +0200, Stefan Priebe - Profihost AG wrote:
> > 
> > When you see the 100% CPU usage can you cat /proc/PID/stack a couple of
> > times and post it here? That might give a hint as to where it's going wrong.
> 
> Sadly i'm not able to reproduce a 100% load process tried now for some
> hours. Mostly they segfault.
> 

I see.

I checked the v3.8 and v3.9-rc results for my own NUMA machine but I'm
seeing no evidence of test failures or segfaults. 

> >>> Anything in the kernel log?
> >> Three examples:
> >> pigz[10194]: segfault at 0 ip           (null) sp 00007f6197ffed50 error
> >> 14 in pigz[400000+e000]
> >>
> >> rbd[2811]: segfault at b8 ip 00007f73c2d51b9e sp 00007f73bcae3b40 error
> >> 4 in librados.so.2.0.0[7f73c2afe000+3b9000]
> >>
> >> rbd[1805]: segfault at 0 ip 00007f60c28dceb4 sp 00007f60b7ffd1f8 error 4
> >> in ld-2.11.3.so[7f60c28cc000+1e000]
> >>
> >>> Any particular pattern to the crashes? Any means of reliably
> >>> reproducing it?
> >> No i just need to run some task and after some time they die or hang
> >> forever. I have this on 10 different E5-2640 and also on E56XX. I can
> >> "fix" this by:
> >>   1.) putting all memory to just ONE CPU
> >>   2.) Disable NUMA Balancing
> >>
> >
> > That does point the finger at the automatic balancing.
> > 
> >>> 3.8 vanilla, 3.8-stable or 3.8 with any other patches
> >>> applied?
> >> 3.8.4 without any patches.
> >>
> > Did it happen in 3.8?
> 
> I've now tested 3.9-rc5 this gaves me a slightly different kernel log:
> [  197.236518] pigz[2908]: segfault at 0 ip           (null) sp
> 00007f347bffed00 error 14
> [  197.237632] traps: pigz[2915] general protection ip:7f3482dbce2d
> sp:7f3473ffec10 error:0 in libz.so.1.2.3.4[7f3482db7000+17000]
> [  197.330615]  in pigz[400000+10000]
> 
> With 3.8 it is the same as with 3.8.4 or 3.8.5.
> 

Ok. Are there NUMA machines were you do *not* see this problem? If so,
can you spot what the common configuration, software or hardware, that
affects the broken machines versus the working machines? I'm wondering
if there is a bug in a migration handler.

Do you know if a NUMA nodes are low on memory when the segfaults occur?
I'm also considering the possibility that one of the migration failure
paths are failing to clear a NUMA hinting entry properly.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-04-03 14:03 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-02  7:24 NUMA Autobalancing Kernel 3.8 Stefan Priebe - Profihost AG
2013-04-02 10:48 ` Mel Gorman
2013-04-02 11:41   ` Stefan Priebe - Profihost AG
2013-04-02 12:54     ` Mel Gorman
2013-04-02 14:34       ` Stefan Priebe - Profihost AG
2013-04-03 14:03         ` Mel Gorman [this message]
2013-04-03 14:11           ` Stefan Priebe - Profihost AG
2013-04-05 12:00             ` Mel Gorman
2013-04-05 12:10               ` Stefan Priebe - Profihost AG
2013-04-08  8:13                 ` Mel Gorman
2013-04-08  9:14                   ` Stefan Priebe - Profihost AG

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130403140344.GA5811@suse.de \
    --to=mgorman@suse.de \
    --cc=aarcange@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=mingo@kernel.org \
    --cc=riel@redhat.com \
    --cc=s.priebe@profihost.ag \
    --cc=srikar@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.