From mboxrd@z Thu Jan 1 00:00:00 1970
From: Scott Garron
Subject: Re: BUG: unable to handle kernel NULL pointer dereference at IP: [] process_one_work+
Date: Tue, 14 Jun 2011 17:55:47 -0400
Message-ID: <4DF7D8E3.7080705@sce.pridelands.org>
References: <20110606191725.GZ32595@reaktio.net>
 <4DED478E.5070607@sce.pridelands.org>
 <20110607191949.GB2075@dumpdata.com>
 <4DEFBE7F.5060909@sce.pridelands.org>
 <20110608192916.GA4909@dumpdata.com>
 <4DF12747.2090900@sce.pridelands.org>
 <20110610125906.GA10831@dumpdata.com>
 <4DF24B79.8050909@sce.pridelands.org>
 <20110613220352.GA23755@dumpdata.com>
 <4DF69B42.4080908@sce.pridelands.org>
 <20110614135543.GA27849@dumpdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <20110614135543.GA27849@dumpdata.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Konrad Rzeszutek Wilk
Cc: Jeremy Fitzhardinge, xen-devel@lists.xensource.com, Dan Magenheimer
List-Id: xen-devel@lists.xenproject.org

On 06/14/2011 09:55 AM, Konrad Rzeszutek Wilk wrote:
> But the curious thing is that you have two CPUs assigned to Dom0 and
> while CPU0 looks to be bouncing back and forth, CPU1 is doing
> something. The RIP is 0xffffffff8108820c. Can you try to run this
> through System.map? Or the whole bunch of these:
>
> ffffffff8108820c ffffffff81088100 ffffffff810881a7 ffffffff8108811a
> ffffffff816101a8 ffffffff81006c32 ffffffff816114a4 ffffffff8108803a
> ffffffff8105f5bd ffffffff81618564 ffffffff81617973 ffffffff816117a1
> ffffffff81618560

I grabbed code snippets for each of these locations and put them here:

http://pridelands.org/~simba/xen/hailstorm-debugnotes.txt

> The other idea is to limit Dom0 to only run on one CPU. You can do
> this by having 'dom0_max_vcpus=1 dom0_vcpus_pin' and see if it fails
> somewhere else?
> It probably will die in the 0xffffffff810013aa :-(

After setting dom0_max_vcpus=1 and dom0_vcpus_pin, the boot got to
"Trying to unpack rootfs image as initramfs..." and hung there. The
serial console as well as the CTRL_A(x3) * outputs are here:

http://pridelands.org/~simba/xen/hailstorm-fullserial20110614.txt

> But irregardless of what I mentioned above we need to find out why
> process_one_worker got a toxic parameter. Can you disassemble
> 0xffffffff8105ae4c and see what it does and how it corresponds to
> 'process_one_work' in kernel/workqueue.c?

I put the disassembly of it in the hailstorm-debugnotes.txt file that I
mentioned above. Let me know if you need more than that.

> You can also instrument the code to find out what:
>
> 1804 work_func_t f = work->func;
>
> is

I think this request is starting to go a little beyond what I know how
to do.

-- 
Scott Garron
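
For reference, the System.map lookup requested in the quoted text comes down to finding, for each address, the nearest symbol at or below it. The following is a minimal sketch of that step, not the procedure actually used in this thread. It assumes the usual "address type name" line layout of System.map; the helper names and the sample symbols are hypothetical placeholders, not entries from the real kernel.

```python
import bisect


def load_system_map(lines):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return 'name+0xoffset' for the nearest symbol at or below address, or None."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None  # address lies below the first symbol
    base, name = symbols[index]
    return f"{name}+{address - base:#x}"


# Hypothetical System.map excerpt: these names are placeholders for illustration only.
SAMPLE_MAP = """\
ffffffff81088000 T example_func_a
ffffffff81088100 T example_func_b
ffffffff81088200 t example_func_c
""".splitlines()

if __name__ == "__main__":
    symbols = load_system_map(SAMPLE_MAP)
    for text in ("ffffffff8108820c", "ffffffff810881a7", "ffffffff8108803a"):
        print(text, resolve(symbols, int(text, 16)))
```

To use it against a real kernel, pass an open System.map file object to load_system_map() in place of SAMPLE_MAP; the file's location depends on the installed kernel.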