kcryptd oops when resuming with TuxOnIce with KDB oops afterwards

public inbox for linux-kernel@vger.kernel.org

* kcryptd oops when resuming with TuxOnIce with KDB oops afterwards
@ 2010-07-29  1:30 Pedro Ribeiro
  2010-07-29  2:49 ` Henrique de Moraes Holschuh
  2010-07-30 21:10 ` [Kgdb-bugreport] kcryptd oops when resuming with TuxOnIce with KDBoops afterwards Jason Wessel
  0 siblings, 2 replies; 9+ messages in thread
From: Pedro Ribeiro @ 2010-07-29  1:30 UTC (permalink / raw)
  To: Nigel Cunningham, tuxonice-devel, kgdb-bugreport,
	Kernel development list, dm-crypt

Hi all,

I hit a bug when resuming with TuxOnIce. In the middle of a resume, it
says Compress Read -22 and locks up. I caught the stack trace with kdb
and took photos of that.
I'm running 2.6.35-rc6 on a Lenovo T400. I have an encrypted LUKS
partition (aes-cbc-essiv-128) which contains an LVM2 with my root,
swap and home partitions inside.

It seems that kcryptd caused the trouble. I've had other lockups with
TuxOnIce that relate to kcryptd too, but I never caught them with kdb.

After printing the stack trace I decided to look at the output of the ps
command. As I was scrolling through the process list, kdb itself oopsed
and called itself. I also took photos of kdb's own stack trace. I
then tried the ps command again, but this time the stack trace kept
looping every few seconds (I took another photo of that). After a
while it just panicked and kept calling itself in a loop. I rebooted
and was able to resume the TuxOnIce image successfully.

The stack trace means little to me, but might be helpful to you.

The photos are:
kcryptd_oops [1,2,3] - TuxOnIce compress read -22 error
kdb_oops [1,2,3,4] - KDB oopses when scrolling output of kdb ps command
kdb_blows_up - final stack trace being shown in a cycle before PANIC:
recursive entry into debugger and locking up completely

The files are in kcryptd_kdb_oopses.tar.gz (about 4.7 MB), located here:
http://www.mediafire.com/file/uum6y1hwfk90124/kcryptd_kdb_oopses.tar.gz
They should stay there for at least 30 days.

Sorry for the file size, but they are good-quality pictures.

Regards,
Pedro


* Re: kcryptd oops when resuming with TuxOnIce with KDB oops afterwards
  2010-07-29  1:30 kcryptd oops when resuming with TuxOnIce with KDB oops afterwards Pedro Ribeiro
@ 2010-07-29  2:49 ` Henrique de Moraes Holschuh
  2010-07-29  3:08   ` Nigel Cunningham
  2010-07-30 21:10 ` [Kgdb-bugreport] kcryptd oops when resuming with TuxOnIce with KDBoops afterwards Jason Wessel
  1 sibling, 1 reply; 9+ messages in thread
From: Henrique de Moraes Holschuh @ 2010-07-29  2:49 UTC (permalink / raw)
  To: Pedro Ribeiro
  Cc: Nigel Cunningham, tuxonice-devel, kgdb-bugreport,
	Kernel development list, dm-crypt

On Thu, 29 Jul 2010, Pedro Ribeiro wrote:
> I hit a bug when resuming with TuxOnIce. At the middle of a resume, it
> says Compress Read -22 and locks up. I caught the stack trace with kdb
> and took photos of that.

Maybe this?
http://lkml.org/lkml/2010/7/28/398

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


* Re: kcryptd oops when resuming with TuxOnIce with KDB oops afterwards
  2010-07-29  2:49 ` Henrique de Moraes Holschuh
@ 2010-07-29  3:08   ` Nigel Cunningham
  2010-07-29 10:31     ` [dm-crypt] " Milan Broz
  2010-07-29 11:49     ` [TuxOnIce-devel] " Martin Steigerwald
  0 siblings, 2 replies; 9+ messages in thread
From: Nigel Cunningham @ 2010-07-29  3:08 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh
  Cc: Pedro Ribeiro, tuxonice-devel, kgdb-bugreport,
	Kernel development list, dm-crypt

Hi Henrique.

On 29/07/10 12:49, Henrique de Moraes Holschuh wrote:
> On Thu, 29 Jul 2010, Pedro Ribeiro wrote:
>> I hit a bug when resuming with TuxOnIce. At the middle of a resume, it
>> says Compress Read -22 and locks up. I caught the stack trace with kdb
>> and took photos of that.
>
> Maybe this?
> http://lkml.org/lkml/2010/7/28/398

I don't think so. This issue has been around for a fair while. It's just 
impossible to reliably reproduce, and I haven't yet found the time to 
put some serious effort into tracking down the cause and fixing it.

Regards,

Nigel


* Re: [dm-crypt] kcryptd oops when resuming with TuxOnIce with KDB oops afterwards
  2010-07-29  3:08   ` Nigel Cunningham
@ 2010-07-29 10:31     ` Milan Broz
  2010-07-29 11:38       ` Nigel Cunningham
  2010-07-29 11:49     ` [TuxOnIce-devel] " Martin Steigerwald
  1 sibling, 1 reply; 9+ messages in thread
From: Milan Broz @ 2010-07-29 10:31 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Henrique de Moraes Holschuh, dm-crypt, Pedro Ribeiro,
	tuxonice-devel, Kernel development list, kgdb-bugreport

On 07/29/2010 05:08 AM, Nigel Cunningham wrote:
> On 29/07/10 12:49, Henrique de Moraes Holschuh wrote:
>> On Thu, 29 Jul 2010, Pedro Ribeiro wrote:
>>> I hit a bug when resuming with TuxOnIce. At the middle of a resume, it
>>> says Compress Read -22 and locks up. I caught the stack trace with kdb
>>> and took photos of that.
>>
>> Maybe this?
>> http://lkml.org/lkml/2010/7/28/398
> 
> I don't think so. This issue has been around for a fair while. It's just 
> impossible to reliably reproduce, and I haven't yet found the time to 
> put some serious effort into tracking down the cause and fixing it.

Is it a TuxOnIce-only problem?
Or is there a similar report with an unpatched kernel?

Milan


* Re: [dm-crypt] kcryptd oops when resuming with TuxOnIce with KDB oops afterwards
  2010-07-29 10:31     ` [dm-crypt] " Milan Broz
@ 2010-07-29 11:38       ` Nigel Cunningham
  0 siblings, 0 replies; 9+ messages in thread
From: Nigel Cunningham @ 2010-07-29 11:38 UTC (permalink / raw)
  To: Milan Broz
  Cc: Henrique de Moraes Holschuh, dm-crypt, Pedro Ribeiro,
	tuxonice-devel, Kernel development list, kgdb-bugreport

Hi.

On 29/07/10 20:31, Milan Broz wrote:
> On 07/29/2010 05:08 AM, Nigel Cunningham wrote:
>> On 29/07/10 12:49, Henrique de Moraes Holschuh wrote:
>>> On Thu, 29 Jul 2010, Pedro Ribeiro wrote:
>>>> I hit a bug when resuming with TuxOnIce. At the middle of a resume, it
>>>> says Compress Read -22 and locks up. I caught the stack trace with kdb
>>>> and took photos of that.
>>>
>>> Maybe this?
>>> http://lkml.org/lkml/2010/7/28/398
>>
>> I don't think so. This issue has been around for a fair while. It's just
>> impossible to reliably reproduce, and I haven't yet found the time to
>> put some serious effort into tracking down the cause and fixing it.
>
> Is it TuxOnIce only problem?
> Or there is similar report with unpatched kernel?

It's TuxOnIce specific.

Regards,

Nigel


* Re: [TuxOnIce-devel] kcryptd oops when resuming with TuxOnIce with KDB oops afterwards
  2010-07-29  3:08   ` Nigel Cunningham
  2010-07-29 10:31     ` [dm-crypt] " Milan Broz
@ 2010-07-29 11:49     ` Martin Steigerwald
  1 sibling, 0 replies; 9+ messages in thread
From: Martin Steigerwald @ 2010-07-29 11:49 UTC (permalink / raw)
  To: tuxonice-devel
  Cc: Nigel Cunningham, Henrique de Moraes Holschuh, dm-crypt,
	Kernel development list, kgdb-bugreport

On Thursday 29 July 2010, Nigel Cunningham wrote:
> Hi Henrique.

Hi Nigel,

> On 29/07/10 12:49, Henrique de Moraes Holschuh wrote:
> > On Thu, 29 Jul 2010, Pedro Ribeiro wrote:
> >> I hit a bug when resuming with TuxOnIce. At the middle of a resume,
> >> it says Compress Read -22 and locks up. I caught the stack trace
> >> with kdb and took photos of that.
> > 
> > Maybe this?
> > http://lkml.org/lkml/2010/7/28/398
> 
> I don't think so. This issue has been around for a fair while. It's
> just impossible to reliably reproduce, and I haven't yet found the
> time to put some serious effort into tracking down the cause and
> fixing it.

I reported this one, although, as you said, it's a TuxOnIce bug:
https://bugzilla.kernel.org/show_bug.cgi?id=15873

I switched compression from LZO to LZF on my ThinkPad T23, where the error
happened every 2-4 days or so with 2.6.34. Since then I haven't seen it
again, but with only 5 attempts so far I am not sure whether switching to
LZF "fixed" it:

deepdance:~> cat /sys/power/tuxonice/debug_info 
TuxOnIce debugging info:
- TuxOnIce core  : 3.1.1.1
- Kernel Version : 2.6.34.1-tp23-toi-3.1.1.1-04990-g3a7d1f4
- Compiler vers. : 4.4
- Attempt number : 5
- Parameters     : 0 667656 0 1 0 0
- Overall expected compression percentage: 0.
- Checksum method is 'md4'.
  0 pages resaved in atomic copy.
- Compressor is 'lzf'.
  Compressed 776593408 bytes into 359897499 (53 percent compression).
- Block I/O active.
- Max outstanding reads 714. Max writes 5.
  Memory_needed: 1024 x (4096 + 200 + 76) = 4476928 bytes.
  Free mem throttle point reached 983.
- Swap Allocator enabled.
  Swap available for image: 229016 pages.
- File Allocator active.
  Storage available for image: 0 pages.
- I/O speed: Write 28 MB/s, Read 33 MB/s.
- Extra pages    : 26 used/500.
- Result         : Succeeded.
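
As a sanity check on two of the figures in this listing, here is a small sketch in shell arithmetic. It assumes (this is an assumption, not something the listing states) that the reported percentage is the space saved, truncated to a whole number; the variable names are made up for illustration.

```shell
#!/bin/sh
# Figures copied from the debug_info listing above.
in_bytes=776593408
out_bytes=359897499

# Assumption: "percent compression" means space saved, truncated to an integer.
saved_pct=$(( (in_bytes - out_bytes) * 100 / in_bytes ))
echo "compression: ${saved_pct} percent"

# Memory_needed line: 1024 x (4096 + 200 + 76)
mem_needed=$(( 1024 * (4096 + 200 + 76) ))
echo "memory needed: ${mem_needed} bytes"
```

Under that assumption both results agree with the listing: 53 percent and 4476928 bytes.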

Maybe it's a good idea to collect information in that bug report, even if
it really is a TuxOnIce one.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: [Kgdb-bugreport] kcryptd oops when resuming with TuxOnIce with KDBoops afterwards
  2010-07-29  1:30 kcryptd oops when resuming with TuxOnIce with KDB oops afterwards Pedro Ribeiro
  2010-07-29  2:49 ` Henrique de Moraes Holschuh
@ 2010-07-30 21:10 ` Jason Wessel
  2010-07-30 21:33   ` Pedro Ribeiro
  1 sibling, 1 reply; 9+ messages in thread
From: Jason Wessel @ 2010-07-30 21:10 UTC (permalink / raw)
  To: Pedro Ribeiro
  Cc: Nigel Cunningham, tuxonice-devel, kgdb-bugreport,
	Kernel development list, dm-crypt

On 07/28/2010 08:30 PM, Pedro Ribeiro wrote:
> Hi all,
>
> I hit a bug when resuming with TuxOnIce. At the middle of a resume, it
> says Compress Read -22 and locks up. I caught the stack trace with kdb
> and took photos of that.
> I'm running 2.6.35-rc6 on a Lenovo T400. I have an encrypted LUKS
> partition (aes-cbc-essiv-128) which contains an LVM2 with my root,
> swap and home partitions inside.
>
> It seems that kcryptd caused the trouble. I've had other lockups with
> TuxOnIce that relate to kcryptd too, but I never caught them with kdb,
>
> After printing the stack trace I decided to see the output of the ps
> command. As I was scrolling the processes shown, kdb oops'ed and
> called itself. I also took photos of that kdb's own stack trace. I
> then tried the ps command again, but this time the stack trace was
> looping every few seconds (I took another photo of that). After a
> while it just panicked and kept calling itself on a loop. I rebooted
> and was able to successfully resume the TuxOnIce image.
>
> The stack trace means little to me, but might be helpful to you.
>
> The photos are:
> kcryptd_oops [1,2,3] - TuxOnIce compress read -22 error
> kdb_oops [1,2,3,4] - KDB oopses when scrolling output of kdb ps command
>   

You don't happen to still have the vmlinux file that corresponds to
that crashed kernel, do you?

If so, can you run:

addr2line -f -e vmlinux 0xffffffff81030512
addr2line -f -e vmlinux 0xffffffff810ad1d0
addr2line -f -e vmlinux 0xffffffff810add3c

And send me the output?

I have a pretty good idea about what the problem is, but it would be
interesting to know the exact failure point if the vmlinux file will
tell us. In a nutshell, the "ps" command in kdb does not use
probe_kernel_address() to safely read memory in all instances.
Presently the ps function assumes that if the task struct was ok, the
rest of the memory accesses in this region would be ok as well.




> kdb_blows_up - final stack trace being shown in a cycle before PANIC:
>   
Once kdb oopses, the system is pretty much toast. There are some limited
things you can do at that point, like at least getting a stack trace so
the original problem can be found.

Jason.


* Re: [Kgdb-bugreport] kcryptd oops when resuming with TuxOnIce with  KDBoops afterwards
  2010-07-30 21:10 ` [Kgdb-bugreport] kcryptd oops when resuming with TuxOnIce with KDBoops afterwards Jason Wessel
@ 2010-07-30 21:33   ` Pedro Ribeiro
  2010-07-30 22:53     ` Jason Wessel
  0 siblings, 1 reply; 9+ messages in thread
From: Pedro Ribeiro @ 2010-07-30 21:33 UTC (permalink / raw)
  To: Jason Wessel
  Cc: Nigel Cunningham, tuxonice-devel, kgdb-bugreport,
	Kernel development list, dm-crypt

On 30 July 2010 22:10, Jason Wessel <jason.wessel@windriver.com> wrote:
> On 07/28/2010 08:30 PM, Pedro Ribeiro wrote:
>> Hi all,
>>
>> I hit a bug when resuming with TuxOnIce. At the middle of a resume, it
>> says Compress Read -22 and locks up. I caught the stack trace with kdb
>> and took photos of that.
>> I'm running 2.6.35-rc6 on a Lenovo T400. I have an encrypted LUKS
>> partition (aes-cbc-essiv-128) which contains an LVM2 with my root,
>> swap and home partitions inside.
>>
>> It seems that kcryptd caused the trouble. I've had other lockups with
>> TuxOnIce that relate to kcryptd too, but I never caught them with kdb,
>>
>> After printing the stack trace I decided to see the output of the ps
>> command. As I was scrolling the processes shown, kdb oops'ed and
>> called itself. I also took photos of that kdb's own stack trace. I
>> then tried the ps command again, but this time the stack trace was
>> looping every few seconds (I took another photo of that). After a
>> while it just panicked and kept calling itself on a loop. I rebooted
>> and was able to successfully resume the TuxOnIce image.
>>
>> The stack trace means little to me, but might be helpful to you.
>>
>> The photos are:
>> kcryptd_oops [1,2,3] - TuxOnIce compress read -22 error
>> kdb_oops [1,2,3,4] - KDB oopses when scrolling output of kdb ps command
>>
>
> You don't happen to have the vmlinux file around which corresponded to
> that crashed kernel do you?
>
> If so, can you run:
>
> addr2line -f -e vmlinux 0xffffffff81030512
> addr2line -f -e vmlinux 0xffffffff810ad1d0
> addr2line -f -e vmlinux 0xffffffff810add3c
>
> And send me the output?
>
> I have a pretty good idea about what the problem is but it would be
> interesting to know the exact failure point if the vmlinux file will
> tell us.    In a nut shell, the "ps" command in kdb does not use
> probe_kernel_address() to safely read memory in all instances.
> Presently the ps function assumes that if the task struct was ok the
> rest of memory accesses in this region would be ok as well.
>

Not sure if this is what you want...

addr2line -f -e vmlinux 0xffffffff81030512:
task_curr
??:0

addr2line -f -e vmlinux 0xffffffff810ad1d0
kdb_ps1
??:0

addr2line -f -e vmlinux 0xffffffff810add3c
kdb_task_state_char
??:0


>
>
>> kdb_blows_up - final stack trace being shown in a cycle before PANIC:
>>
> Once kdb oopses the system is pretty much toast.  There are some limited
> things you can do at that point like at least get a stack trace so the
> original problem can be found.
>
> Jason.
>

Can you tell me how to do that? So that when it happens next time I
have the chance to take a photo...

Regards,
Pedro


* Re: [Kgdb-bugreport] kcryptd oops when resuming with TuxOnIce with KDBoops afterwards
  2010-07-30 21:33   ` Pedro Ribeiro
@ 2010-07-30 22:53     ` Jason Wessel
  0 siblings, 0 replies; 9+ messages in thread
From: Jason Wessel @ 2010-07-30 22:53 UTC (permalink / raw)
  To: Pedro Ribeiro
  Cc: Nigel Cunningham, tuxonice-devel, kgdb-bugreport,
	Kernel development list, dm-crypt

On 07/30/2010 04:33 PM, Pedro Ribeiro wrote:
> On 30 July 2010 22:10, Jason Wessel <jason.wessel@windriver.com> wrote:
>   
>> On 07/28/2010 08:30 PM, Pedro Ribeiro wrote:
>>     
>>> Hi all,
>>>
>>> I hit a bug when resuming with TuxOnIce. At the middle of a resume, it
>>> says Compress Read -22 and locks up. I caught the stack trace with kdb
>>> and took photos of that.
>>> I'm running 2.6.35-rc6 on a Lenovo T400. I have an encrypted LUKS
>>> partition (aes-cbc-essiv-128) which contains an LVM2 with my root,
>>> swap and home partitions inside.
>>>
>>> It seems that kcryptd caused the trouble. I've had other lockups with
>>> TuxOnIce that relate to kcryptd too, but I never caught them with kdb,
>>>
>>> After printing the stack trace I decided to see the output of the ps
>>> command. As I was scrolling the processes shown, kdb oops'ed and
>>> called itself. I also took photos of that kdb's own stack trace. I
>>> then tried the ps command again, but this time the stack trace was
>>> looping every few seconds (I took another photo of that). After a
>>> while it just panicked and kept calling itself on a loop. I rebooted
>>> and was able to successfully resume the TuxOnIce image.
>>>
>>> The stack trace means little to me, but might be helpful to you.
>>>
>>> The photos are:
>>> kcryptd_oops [1,2,3] - TuxOnIce compress read -22 error
>>> kdb_oops [1,2,3,4] - KDB oopses when scrolling output of kdb ps command
>>>
>>>       
>> You don't happen to have the vmlinux file around which corresponded to
>> that crashed kernel do you?
>>
>> If so, can you run:
>>
>> addr2line -f -e vmlinux 0xffffffff81030512
>> addr2line -f -e vmlinux 0xffffffff810ad1d0
>> addr2line -f -e vmlinux 0xffffffff810add3c
>>
>> And send me the output?
>>
>> I have a pretty good idea about what the problem is but it would be
>> interesting to know the exact failure point if the vmlinux file will
>> tell us.    In a nut shell, the "ps" command in kdb does not use
>> probe_kernel_address() to safely read memory in all instances.
>> Presently the ps function assumes that if the task struct was ok the
>> rest of memory accesses in this region would be ok as well.
>>
>>     
>
> Not sure if this is what you want...
>
> addr2line -f -e vmlinux 0xffffffff81030512:
> task_curr
> ??:0
>
> addr2line -f -e vmlinux 0xffffffff810ad1d0
> kdb_ps1
> ??:0
>
> addr2line -f -e vmlinux 0xffffffff810add3c
> kdb_task_state_char
> ??:0
>
>   

I guess there was no debuginfo in your vmlinux file then, because
normally that would return the source line information. At least I
know where to look to fix the problem from the backtrace.

Thanks,
Jason.
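
For anyone repeating the address lookup from this thread, a minimal sketch follows. It assumes binutils (readelf and addr2line) is installed and that vmlinux is the uncompressed image matching the crashed kernel; the helper name resolve_addrs is hypothetical. It first looks for a .debug_info section, whose absence would be consistent with the ??:0 output seen above, and then resolves each address in turn.

```shell
#!/bin/sh
# Hypothetical helper: resolve kernel text addresses against a vmlinux image.
# Usage: resolve_addrs vmlinux 0xffffffff81030512 0xffffffff810ad1d0 ...
resolve_addrs() {
    image=$1
    shift
    if [ ! -r "$image" ]; then
        echo "cannot read $image" >&2
        return 1
    fi
    # Without a .debug_info section, addr2line can still print function
    # names from the symbol table but reports the source location as ??:0.
    if ! readelf -S "$image" | grep -q '\.debug_info'; then
        echo "warning: $image has no .debug_info; expect ??:0" >&2
    fi
    for addr in "$@"; do
        echo "== $addr"
        addr2line -f -e "$image" "$addr"
    done
}
```

A call would look like "resolve_addrs vmlinux 0xffffffff810ad1d0". Getting file and line numbers generally requires a kernel built with debug information (CONFIG_DEBUG_INFO).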
