From: Petr Vandrovec <vandrove@vc.cvut.cz>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, Christoph Lameter <christoph@lameter.com>
Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
Date: Mon, 19 Sep 2005 20:56:26 +0200 [thread overview]
Message-ID: <432F09DA.7050408@vc.cvut.cz> (raw)
In-Reply-To: <20050919112912.18daf2eb.akpm@osdl.org>
Andrew Morton wrote:
> Petr Vandrovec <vandrove@vc.cvut.cz> wrote:
>
>>Andrew Morton wrote:
>>
>>>Petr Vandrovec <vandrove@vc.cvut.cz> wrote:
>>>
>>>
>>>>Andrew Morton wrote:
>>>>
>>>>>Petr Vandrovec <vandrove@vc.cvut.cz> wrote:
>>>>>
>>>>>
>>>>>> so now once crashes on UP system were sorted out, I tried to
>>>>>>put new kernel on my SMP host - and sorry to say, but it does not
>>>>>>seem to work as advertised :-(
>>>>>
>>>>>.config (again), please.
>>>>
>>>>Any SMP with NUMA. One which I'm trying to debug now is attached.
>>>>It is available at http://vana.vc.cvut.cz/config as well.
>>>
>>>I can get 2.6.14-rc1 to crash with your .config, but current -linus is OK.
>>
>>It still dies for me, with current git (tree 7513cdadc661cfe0bd1625145a4876e54df191ca,
>>commit 6c0741fbdee5bd0f8ed13ac287c4ab18e8ba7d83). Config is available at
>>http://platan.vc.cvut.cz/config-vana.txt. Box is dual opteron Tyan K8W, S2885.
>>
>>...
>>
>> ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
>> ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
>>----------- [cut here ] --------- [please bite here ] ---------
>>Kernel BUG at mm/slab.c:1849
>>invalid operand: 0000 [1] SMP
>>CPU 0
>>Modules linked in:
>>Pid: 8, comm: events/0 Not tainted 2.6.14-rc1-6c07 #1
>>RIP: 0010:[<ffffffff8016d316>] <ffffffff8016d316>{free_block+294}
>>RSP: 0000:ffff81007ff21d88 EFLAGS: 00010002
>>RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000310
>>RDX: 0000000000000000 RSI: ffff81007ffddd10 RDI: ffff81007ffda080
>>RBP: ffff81007ffde000 R08: ffff81003ffa0d50 R09: 0000000000000000
>>R10: 00000000ffffffff R11: 0000000000000000 R12: ffff81007ffc9b50
>>R13: ffff81007ffde048 R14: ffff81007ffda080 R15: ffff81007ffda080
>>FS: 0000000000000000(0000) GS:ffffffff805f2800(0000) knlGS:0000000000000000
>>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>>CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
>>Process events/0 (pid: 8, threadinfo ffff81007ff20000, task ffff81003ff8c790)
>>Stack: 0000000000000000 0000000000000000 0000000000000292 hda: _NEC DVD_RW ND-3500AG, ATAPI CD/DVD-ROM drive
>>0000000200000000
>> ffff81007ffddd10 ffff81007ffddd10 ffff81007ffddce8 0000000000000002
>> 0000000000000000 ffff81007ffda080
>>Call Trace:<ffffffff8016e8b7>{drain_array_locked+167} <ffffffff8016e9f7>{cache_reap+231}
>> <ffffffff80131e23>{__wake_up+67} <ffffffff8016e910>{cache_reap+0}
>> <ffffffff8014930c>{worker_thread+476} <ffffffff80131d60>{default_wake_function+0}
>> <ffffffff80131d60>{default_wake_function+0} <ffffffff80149130>{worker_thread+0}
>> <ffffffff8014db82>{kthread+146} <ffffffff8010ec22>{child_rip+8}
>> <ffffffff80149130>{worker_thread+0} <ffffffff8014daf0>{kthread+0}
>> <ffffffff8010ec1a>{child_rip+0}
>>
>>Code: 0f 0b 68 9d 26 3d 80 c2 39 07 48 89 ee 4c 89 ff 4c 8d 75 30
>>RIP <ffffffff8016d316>{free_block+294} RSP <ffff81007ff21d88>
>> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
>
>
> Well. The CPU_UP_CANCELED locking in cpuup_callback() looks borked to me -
> it takes cachep->nodelists[node]->list_lock and then calls
> drain_alien_cache() which appears to take the same lock. But that's not
> the problem here.
>
> The code in cache_reap() recalculates numa_node_id() multiple times, so if
> the caller changes CPUs then this assertion will trigger. However it's
> running under keventd here, which is pinned to a single CPU. Still, it
> would be useful if you could try putting preempt_disable()s in
> cache_reap(), or change cache_reap() to evaluate numa_node_id() just the
> once, and cache that in a local variable.
>
> I wonder why numa_node_id() uses raw_smp_processor_id()? That's just
> asking for preempt non-atomicity bugs.
I've thought that this is problem, but as far as I can tell while this is
problem it does not happen here. Just free_block() finds that pointer it
got from caller belongs to the slab that belongs to the CPU#1/node#1
while caller obtained lock on CPU#0/node#0 structures. Which suggests
that drain_array_locked() was issued with node #0 while array_cache->entry
it got contains blocks which belong to node #1. Which I cannot explain.
Petr
next prev parent reply other threads:[~2005-09-19 18:56 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-09-15 16:51 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849 Petr Vandrovec
2005-09-15 17:33 ` Petr Vandrovec
[not found] ` <20050916023005.4146e499.akpm@osdl.org>
[not found] ` <432AA00D.4030706@vc.cvut.cz>
[not found] ` <20050916230809.789d6b0b.akpm@osdl.org>
2005-09-19 16:02 ` Petr Vandrovec
2005-09-19 18:29 ` Andrew Morton
2005-09-19 18:51 ` Christoph Lameter
2005-09-19 19:28 ` Andrew Morton
2005-09-19 21:20 ` Christoph Lameter
2005-09-20 5:16 ` Andrew Morton
2005-09-20 8:34 ` Alok Kataria
2005-09-20 13:58 ` Petr Vandrovec
2005-09-21 1:03 ` Christoph Lameter
2005-09-21 1:22 ` Petr Vandrovec
2005-09-21 15:59 ` Christoph Lameter
2005-09-22 19:52 ` Christoph Lameter
2005-09-22 20:01 ` Andrew Morton
2005-09-22 21:25 ` Petr Vandrovec
2005-09-22 21:32 ` Christoph Lameter
2005-09-22 21:46 ` Andrew Morton
2005-09-22 21:54 ` Christoph Lameter
2005-09-23 0:25 ` Petr Vandrovec
2005-09-28 21:02 ` Ravikiran G Thirumalai
2005-09-28 22:50 ` Christoph Lameter
2005-09-29 16:43 ` Petr Vandrovec
2005-09-29 18:11 ` Ravikiran G Thirumalai
2005-09-29 18:38 ` Christoph Lameter
2005-09-30 5:45 ` Ravikiran G Thirumalai
2005-09-30 6:05 ` Andrew Morton
2005-09-30 6:28 ` Ravikiran G Thirumalai
2005-09-30 15:16 ` Bryan O'Sullivan
2005-09-30 15:57 ` Christoph Lameter
2005-09-30 16:45 ` Bryan O'Sullivan
2005-09-30 20:11 ` Andi Kleen
2005-09-30 20:23 ` Ravikiran G Thirumalai
2005-09-30 16:55 ` Christoph Lameter
2005-09-19 18:56 ` Petr Vandrovec [this message]
2005-09-19 19:08 ` Christoph Lameter
-- strict thread matches above, loose matches on Subject: below --
2005-09-23 19:34 Alok Kataria
2005-09-23 23:57 ` Christoph Lameter
2005-09-24 0:05 ` Christoph Lameter
2005-09-24 12:52 ` Manfred Spraul
2005-09-25 14:16 Alok Kataria
2005-09-26 18:00 ` Christoph Lameter
2005-09-26 19:34 ` Alok Kataria
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=432F09DA.7050408@vc.cvut.cz \
--to=vandrove@vc.cvut.cz \
--cc=akpm@osdl.org \
--cc=christoph@lameter.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox