linux-mm.kvack.org archive mirror
From: Mel Gorman <mgorman@suse.de>
To: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	srikar@linux.vnet.ibm.com, aarcange@redhat.com, mingo@kernel.org,
	riel@redhat.com
Subject: Re: NUMA Autobalancing Kernel 3.8
Date: Wed, 3 Apr 2013 15:03:45 +0100
Message-ID: <20130403140344.GA5811@suse.de>
In-Reply-To: <515AEC71.9020704@profihost.ag>

On Tue, Apr 02, 2013 at 04:34:25PM +0200, Stefan Priebe - Profihost AG wrote:
> > 
> > When you see the 100% CPU usage can you cat /proc/PID/stack a couple of
> > times and post it here? That might give a hint as to where it's going wrong.
> 
> Sadly I'm not able to reproduce a process stuck at 100% load; I've been
> trying for some hours now. Mostly they segfault.
> 

I see.

I checked the v3.8 and v3.9-rc results for my own NUMA machine but I'm
seeing no evidence of test failures or segfaults. 

> >>> Anything in the kernel log?
> >> Three examples:
> >> pigz[10194]: segfault at 0 ip           (null) sp 00007f6197ffed50 error 14 in pigz[400000+e000]
> >>
> >> rbd[2811]: segfault at b8 ip 00007f73c2d51b9e sp 00007f73bcae3b40 error 4 in librados.so.2.0.0[7f73c2afe000+3b9000]
> >>
> >> rbd[1805]: segfault at 0 ip 00007f60c28dceb4 sp 00007f60b7ffd1f8 error 4 in ld-2.11.3.so[7f60c28cc000+1e000]
> >>
> >>> Any particular pattern to the crashes? Any means of reliably
> >>> reproducing it?
> >> No, I just need to run some tasks and after some time they die or hang
> >> forever. I see this on 10 different E5-2640 machines and also on E56XX.
> >> I can "fix" this by:
> >>   1.) putting all memory to just ONE CPU
> >>   2.) Disable NUMA Balancing
> >>
> >
> > That does point the finger at the automatic balancing.
> > 
> >>> 3.8 vanilla, 3.8-stable or 3.8 with any other patches
> >>> applied?
> >> 3.8.4 without any patches.
> >>
> > Did it happen in 3.8?
> 
> I've now tested 3.9-rc5; this gave me a slightly different kernel log:
> [  197.236518] pigz[2908]: segfault at 0 ip           (null) sp 00007f347bffed00 error 14
> [  197.237632] traps: pigz[2915] general protection ip:7f3482dbce2d sp:7f3473ffec10 error:0 in libz.so.1.2.3.4[7f3482db7000+17000]
> [  197.330615]  in pigz[400000+10000]
> 
> With 3.8 it is the same as with 3.8.4 or 3.8.5.
> 

Ok. Are there NUMA machines where you do *not* see this problem? If so,
can you spot any common configuration, software or hardware, that
distinguishes the broken machines from the working ones? I'm wondering
if there is a bug in a migration handler.

Do you know if any NUMA nodes are low on memory when the segfaults occur?
I'm also considering the possibility that one of the migration failure
paths is failing to clear a NUMA hinting entry properly.
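One way to answer the low-memory question is to sample per-node free memory while the failing workload runs. The sketch below assumes the standard sysfs layout /sys/devices/system/node/nodeN/meminfo, whose lines look like "Node 0 MemFree:   123456 kB"; the helper names and the 1 GiB "low" threshold are hypothetical choices for illustration, not anything from the kernel or from this thread.

```python
import glob
import re

# Matches per-node meminfo lines such as "Node 0 MemFree:        1234567 kB".
_MEMFREE_RE = re.compile(r"^Node\s+(\d+)\s+MemFree:\s+(\d+)\s+kB", re.MULTILINE)


def parse_node_memfree(text):
    """Return {node_id: free_kB} parsed from per-node meminfo text."""
    return {int(node): int(kb) for node, kb in _MEMFREE_RE.findall(text)}


def read_all_nodes(pattern="/sys/devices/system/node/node*/meminfo"):
    """Collect MemFree for every NUMA node exposed in sysfs (assumed layout)."""
    free = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            free.update(parse_node_memfree(f.read()))
    return free


def low_nodes(free, threshold_kb=1024 * 1024):
    """Return node ids whose free memory is below threshold_kb.

    The 1 GiB default is an arbitrary, hypothetical cut-off.
    """
    return sorted(n for n, kb in free.items() if kb < threshold_kb)


if __name__ == "__main__":
    free = read_all_nodes()
    for node, kb in sorted(free.items()):
        print(f"node{node}: {kb} kB free")
    print("low nodes:", low_nodes(free))
```

Running it a few times while pigz or rbd is under load would show whether a node is close to exhaustion at the moment the segfaults appear.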

-- 
Mel Gorman
SUSE Labs


Thread overview: 11+ messages
2013-04-02  7:24 NUMA Autobalancing Kernel 3.8 Stefan Priebe - Profihost AG
2013-04-02 10:48 ` Mel Gorman
2013-04-02 11:41   ` Stefan Priebe - Profihost AG
2013-04-02 12:54     ` Mel Gorman
2013-04-02 14:34       ` Stefan Priebe - Profihost AG
2013-04-03 14:03         ` Mel Gorman [this message]
2013-04-03 14:11           ` Stefan Priebe - Profihost AG
2013-04-05 12:00             ` Mel Gorman
2013-04-05 12:10               ` Stefan Priebe - Profihost AG
2013-04-08  8:13                 ` Mel Gorman
2013-04-08  9:14                   ` Stefan Priebe - Profihost AG
