From mboxrd@z Thu Jan  1 00:00:00 1970
From: Scott Garron <xen-devel@sce.pridelands.org>
Subject: Re: BUG: unable to handle kernel NULL pointer
 dereference at IP: [<ffffffff8105ae4c>] process_one_work+
Date: Tue, 14 Jun 2011 17:55:47 -0400
Message-ID: <4DF7D8E3.7080705@sce.pridelands.org>
References: <20110606191725.GZ32595@reaktio.net>
	<4DED478E.5070607@sce.pridelands.org>
	<20110607191949.GB2075@dumpdata.com>
	<4DEFBE7F.5060909@sce.pridelands.org>
	<20110608192916.GA4909@dumpdata.com>
	<4DF12747.2090900@sce.pridelands.org>
	<20110610125906.GA10831@dumpdata.com>
	<4DF24B79.8050909@sce.pridelands.org>
	<20110613220352.GA23755@dumpdata.com>
	<4DF69B42.4080908@sce.pridelands.org>
	<20110614135543.GA27849@dumpdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <20110614135543.GA27849@dumpdata.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>, xen-devel@lists.xensource.com, Dan Magenheimer <dan.magenheimer@oracle.com>
List-Id: xen-devel@lists.xenproject.org

On 06/14/2011 09:55 AM, Konrad Rzeszutek Wilk wrote:
> But the curious thing is that you have two CPUs assigned to Dom0 and
> while CPU0 looks to be bouncing back and forth, CPU1 is doing
> something. The RIP is 0xffffffff8108820c. Can you try to run this
> through System.map? Or the whole bunch of these:
>
> ffffffff8108820c ffffffff81088100 ffffffff810881a7 ffffffff8108811a
> ffffffff816101a8 ffffffff81006c32 ffffffff816114a4 ffffffff8108803a
> ffffffff8105f5bd ffffffff81618564 ffffffff81617973 ffffffff816117a1
> ffffffff81618560

      I grabbed code snippets for each of these locations and put them here:

http://pridelands.org/~simba/xen/hailstorm-debugnotes.txt

> The other idea is to limit Dom0 to only run on one CPU. You can do
> this by having 'dom0_max_vcpus=1 dom0_vcpus_pin' and see if it fails
> somewhere else? It probably will die in the 0xffffffff810013aa :-(

      After setting dom0_max_vcpus=1 and dom0_vcpus_pin, the boot got as far
as "Trying to unpack rootfs image as initramfs..." and hung there.  The
serial console log, including the Xen debug output from pressing CTRL-A
three times and then '*', is here:

http://pridelands.org/~simba/xen/hailstorm-fullserial20110614.txt

> But regardless of what I mentioned above, we need to find out why
> process_one_work got a toxic parameter. Can you disassemble
> 0xffffffff8105ae4c and see what it does and how it corresponds to
> 'process_one_work' in kernel/workqueue.c?

      I put the disassembly of it in the hailstorm-debugnotes.txt file
that I mentioned above.  Let me know if you need more than that.

> You can also instrument the code to find out what:
>
> 1804         work_func_t f = work->func;
>
> is

      I think this request is starting to go a little beyond what I know
how to do.

-- 
Scott Garron