From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751674AbWCBTMU (ORCPT ); Thu, 2 Mar 2006 14:12:20 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752040AbWCBTMU (ORCPT ); Thu, 2 Mar 2006 14:12:20 -0500 Received: from omx1-ext.sgi.com ([192.48.179.11]:19650 "EHLO omx1.americas.sgi.com") by vger.kernel.org with ESMTP id S1751674AbWCBTMT (ORCPT ); Thu, 2 Mar 2006 14:12:19 -0500 Date: Thu, 2 Mar 2006 11:12:01 -0800 From: Paul Jackson To: Andrew Morton Cc: greg@kroah.com, ebiederm@xmission.com, linux-kernel@vger.kernel.org, yanmin.zhang@intel.com, neilb@cse.unsw.edu.au, steiner@sgi.com, hawkes@sgi.com Subject: Re: + proc-dont-lock-task_structs-indefinitely-cpuset-fix-2.patch added to -mm tree Message-Id: <20060302111201.cf61552f.pj@sgi.com> In-Reply-To: <20060301234215.62010fec.akpm@osdl.org> References: <20060228183610.5253feb9.akpm@osdl.org> <20060228194525.0faebaaa.pj@sgi.com> <20060228201040.34a1e8f5.pj@sgi.com> <20060228212501.25464659.pj@sgi.com> <20060228234807.55f1b25f.pj@sgi.com> <20060301002631.48e3800e.akpm@osdl.org> <20060301015338.b296b7ad.pj@sgi.com> <20060301192103.GA14320@kroah.com> <20060301125802.cce9ef51.pj@sgi.com> <20060301213048.GA17251@kroah.com> <20060301142631.22738f2d.akpm@osdl.org> <20060301151000.5fff8ec5.pj@sgi.com> <20060301154040.a7cb2afd.pj@sgi.com> <20060301202058.42975408.akpm@osdl.org> <20060301221429.c61b4ae6.pj@sgi.com> <20060301234215.62010fec.akpm@osdl.org> Organization: SGI X-Mailer: Sylpheed version 2.1.7 (GTK+ 2.4.9; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org John, Jack, Adding you to this one. We've seen this before, and I wasted Andrew's and others time chasing it again. I speculate at the bottom of this message that I should add a panic on the kzalloc that fails if one has 1024 CPUS, SPINLOCK debug, and just slightly larger data structures with a new patch than we had before. Feel free to comment on whether such a panic, or other remedy would be desirable or not. === Andrew wrote: > OK. This is awful. I cannot see it. Crap - I should have recognized this problem a day ago. I wasted your time. Sorry. The extra data pushed the size of our (big SN2) sysfs_cpus[] array past the point that it could be kzalloc'd. The initial failure is in the file: arch/ia64/kernel/topology.c function: topology_init line: sysfs_cpus = kzalloc(sizeof(struct ia64_cpu) * NR_CPUS, GFP_KERNEL); With our large NR_CPUS of 1024, and the additional cost of the CONFIG_DEBUG_SPINLOCK* debug stuff, and the little bit of additional data added by this patch, that kzalloc() fails. The final collapse occurs in the file: fs/sysfs/group.c function: sysfs_create_group line: BUG_ON(!kobj || !kobj->dentry); where kobj->dentry points to 0x58. The offset of dentry in struct kobject is 0x50, and the offset of that kobj in the containing struct sys_dev is another 0x8 bytes, resulting in the failed reference: Unable to handle kernel NULL pointer dereference (address 0000000000000058) The drivers/base/cpu.c array: static struct sys_device *cpu_sys_devices[NR_CPUS]; is never filled in, as a result of the above kzalloc() failure, causing the routine get_cpu_sysdev() in drivers/base/cpu.c to return a NULL pointer (unknowingly). I should stare at the code between this point of initial failure and the point that the house of cards finally collapsed and see if something should have squeaked sooner. Though I'm not a guru in this code, so I'm saying a little prayer that someone else will have a more useful suggestion. I've added a couple of SGI wizards in this area to the cc list. I suspect that the short term solution is to proceed without prejudice to the patch that triggered this: gregkh-driver-allow-sysfs-attribute-files-to-be-pollable.patch while I look at some way, if just a stop gap measure, to complain earlier in the boot, closer to the scene of the original crime, so that others hitting this won't waste more time. Perhaps failing that first kzalloc should cause a complaint, if not a panic. It would seem that the system is beyond repair if that kzalloc fails. And since the system hasn't even finished booting yet, and is for sure trying to boot some larger than tried before configuration, might just as well announce ones death boldly. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson 1.925.600.0401