From: Mel Gorman <mel@csn.ul.ie>
To: Sachin Sant <sachinp@in.ibm.com>
Cc: Tejun Heo <tj@kernel.org>, Pekka Enberg <penberg@cs.helsinki.fi>,
	Nick Piggin <npiggin@suse.de>,
	Christoph Lameter <cl@linux-foundation.org>,
	heiko.carstens@de.ibm.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Re: [PATCH 1/3] slqb: Do not use DEFINE_PER_CPU for per-node data
Date: Mon, 21 Sep 2009 09:42:48 +0100
Message-ID: <20090921084248.GC12726@csn.ul.ie>
In-Reply-To: <4AB739A6.5060807@in.ibm.com>

On Mon, Sep 21, 2009 at 02:00:30PM +0530, Sachin Sant wrote:
> Tejun Heo wrote:
>> Pekka Enberg wrote:
>>   
>>> Tejun Heo wrote:
>>>     
>>>> Pekka Enberg wrote:
>>>>       
>>>>> On Fri, Sep 18, 2009 at 10:34 PM, Mel Gorman <mel@csn.ul.ie> wrote:
>>>>>         
>>>>>> SLQB used a seemingly nice hack to allocate per-node data for the
>>>>>> statically
>>>>>> initialised caches. Unfortunately, due to some unknown per-cpu
>>>>>> optimisation, these regions are being reused by something else as the
>>>>>> per-node data is getting randomly scrambled. This patch fixes the
>>>>>> problem but it's not fully understood *why* it fixes the problem at the
>>>>>> moment.
>>>>>>           
>>>>> Ouch, that sounds bad. I guess it's architecture specific bug as x86
>>>>> works ok? Lets CC Tejun.
>>>>>         
>>>> Is the corruption being seen on ppc or s390?
>>>>       
>>> On ppc.
>>>     
>>
>> Can you please post full dmesg showing the corruption? 

There isn't a useful dmesg available and my evidence that it's within the
pcpu allocator is a bit weak. Symptoms are crashes within SLQB when a
second CPU is brought up, due to a bad data access within a declared per-cpu
area. Sometimes it looks like the value was NULL and other times it's a
random value.

The "per-cpu" area in this case is actually a per-node area. This implied that
it was either a race (but the locking looked sound), a buffer overflow (but
I couldn't find one) or the per-cpu areas being written to by something
else unrelated. I considered it possible that, because the CPU and node numbers
did not match up, the areas for the unused numbers were being freed up for use
elsewhere. I haven't dug into the per-cpu implementation to see if this is a
possibility.
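
To make the CPU/node mismatch hypothesis concrete, here is a minimal userspace
sketch. It is not kernel code and does not reflect how the per-cpu allocator
actually behaves. The topology (two possible CPUs, memory on nodes 0 and 4),
NR_IDS and all helper names are assumptions made up for illustration. It only
shows which node ids would index a "per-cpu" slot that no possible CPU owns.

```c
#include <stdbool.h>

#define NR_IDS 8 /* hypothetical upper bound on CPU and node ids */

/* Hypothetical topology: CPUs 0-1 are possible, data lives on nodes 0 and 4. */
static const bool cpu_possible[NR_IDS] = { [0] = true, [1] = true };
static const bool node_has_data[NR_IDS] = { [0] = true, [4] = true };

/* True if a slot indexed by this id belongs to a possible CPU. */
static bool percpu_slot_valid(int id)
{
	return id >= 0 && id < NR_IDS && cpu_possible[id];
}

/* Count node ids whose "per-cpu" slot is not owned by any possible CPU. */
static int count_unbacked_nodes(void)
{
	int node, n = 0;

	for (node = 0; node < NR_IDS; node++)
		if (node_has_data[node] && !percpu_slot_valid(node))
			n++;
	return n;
}
```

With this made-up topology, node 4 is the only id that indexes a slot without
an owning CPU. If such slots can be reused, that would fit the symptoms, but
that part is still unverified.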

>> Also, if you
>> apply the attached patch, does the added BUG_ON() trigger?
>>   
> I applied the three patches from Mel and one from Tejun.

Thanks, Sachin.

Was there any useful result from Tejun's patch applied on its own?

> With these patches applied the machine boots past
> the original reported SLQB problem, but then hangs
> just after printing these messages.
>
> <6>ehea: eth0: Physical port up
> <7>irq: irq 33539 on host null mapped to virtual irq 259
> <6>ehea: External switch port is backup port
> <7>irq: irq 33540 on host null mapped to virtual irq 260
> <6>NET: Registered protocol family 10
> ^^^^^^ Hangs at this point.
>
> Tejun, the above hang looks exactly the same as the one
> i have reported here :
>
> http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-September/075791.html
>
> This particular hang was bisected to the following patch
>
> powerpc64: convert to dynamic percpu allocator
>
> This hang can be recreated without SLQB. So i think this is a different
> problem. 
>

Was that bug ever resolved?

> I have attached the complete dmesg log here.
>

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

