Return-Path: <jason.wessel@windriver.com>
Received: from mail.windriver.com (mail.windriver.com [147.11.1.11])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.saout.de (Postfix) with ESMTPS
	for <dm-crypt@saout.de>; Sat, 31 Jul 2010 00:53:27 +0200 (CEST)
Message-ID: <4C5357D7.6030604@windriver.com>
Date: Fri, 30 Jul 2010 17:53:11 -0500
From: Jason Wessel <jason.wessel@windriver.com>
MIME-Version: 1.0
References: <AANLkTikjSiVXW+iGX-ofBs00crqiyT_d+DLpCBzFk4Ne@mail.gmail.com><4C533FA8.2010500@windriver.com>
	<AANLkTi=qWGMBFYJjRNOv2Ry3vBEvVaf2R=CJNpTfUsSo@mail.gmail.com>
In-Reply-To: <AANLkTi=qWGMBFYJjRNOv2Ry3vBEvVaf2R=CJNpTfUsSo@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [dm-crypt] [Kgdb-bugreport] kcryptd oops when resuming with
 TuxOnIce with KDBoops afterwards
To: Pedro Ribeiro <pedrib@gmail.com>
Cc: dm-crypt@saout.de, kgdb-bugreport@lists.sourceforge.net, Nigel Cunningham <nigel@tuxonice.net>, tuxonice-devel@tuxonice.net, Kernel development list <linux-kernel@vger.kernel.org>

On 07/30/2010 04:33 PM, Pedro Ribeiro wrote:
> On 30 July 2010 22:10, Jason Wessel <jason.wessel@windriver.com> wrote:
>   
>> On 07/28/2010 08:30 PM, Pedro Ribeiro wrote:
>>     
>>> Hi all,
>>>
>>> I hit a bug when resuming with TuxOnIce. In the middle of a resume, it
>>> says "Compress Read -22" and locks up. I caught the stack trace with kdb
>>> and took photos of it.
>>> I'm running 2.6.35-rc6 on a Lenovo T400. I have an encrypted LUKS
>>> partition (aes-cbc-essiv-128) which contains an LVM2 with my root,
>>> swap and home partitions inside.
>>>
>>> It seems that kcryptd caused the trouble. I've had other lockups with
>>> TuxOnIce that relate to kcryptd too, but I never caught them with kdb.
>>>
>>> After printing the stack trace, I decided to look at the output of the
>>> ps command. As I was scrolling through the processes shown, kdb oopsed
>>> and called itself. I also took photos of kdb's own stack trace. I
>>> then tried the ps command again, but this time the stack trace was
>>> looping every few seconds (I took another photo of that). After a
>>> while it just panicked and kept calling itself in a loop. I rebooted
>>> and was able to successfully resume the TuxOnIce image.
>>>
>>> The stack trace means little to me, but it might be helpful to you.
>>>
>>> The photos are:
>>> kcryptd_oops [1,2,3] - TuxOnIce compress read -22 error
>>> kdb_oops [1,2,3,4] - KDB oopses when scrolling output of kdb ps command
>>>
>>>       
>> You don't happen to have the vmlinux file around which corresponded to
>> that crashed kernel do you?
>>
>> If so, can you run:
>>
>> addr2line -f -e vmlinux 0xffffffff81030512
>> addr2line -f -e vmlinux 0xffffffff810ad1d0
>> addr2line -f -e vmlinux 0xffffffff810add3c
>>
>> And send me the output?
>>
>> I have a pretty good idea about what the problem is, but it would be
>> interesting to know the exact failure point if the vmlinux file can
>> tell us. In a nutshell, the "ps" command in kdb does not use
>> probe_kernel_address() to safely read memory in all instances.
>> Presently the ps function assumes that if the task struct was OK, the
>> rest of the memory accesses in this region would be OK as well.
>>
>>     
>
> Not sure if this is what you want...
>
> addr2line -f -e vmlinux 0xffffffff81030512
> task_curr
> ??:0
>
> addr2line -f -e vmlinux 0xffffffff810ad1d0
> kdb_ps1
> ??:0
>
> addr2line -f -e vmlinux 0xffffffff810add3c
> kdb_task_state_char
> ??:0
>
>   

I guess there was no debug info in your vmlinux file then, because
normally addr2line would also return the source file and line number
instead of "??:0". At least the function names in the backtrace tell me
where to look to fix the problem.
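For next time: addr2line can only report source files and line numbers when the kernel was built with debugging information. A minimal sketch of the relevant .config settings, assuming a 2.6.35-era tree where the option appears under "Kernel hacking" as "Compile the kernel with debug info" (DEBUG_INFO depends on DEBUG_KERNEL, which a kdb-enabled build most likely has already):

```
# Needed so that addr2line can map addresses to file:line instead of "??:0"
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_INFO=y
```

The addresses from this crash only map reliably to a vmlinux built from exactly the same source and configuration as the crashed kernel, so after rebuilding with debug info it is safest to capture a fresh backtrace and run addr2line on that.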

Thanks,
Jason.