patch-2.6.33.9-rt31 problem

* patch-2.6.33.9-rt31 problem
@ 2012-07-10 20:32 Dong Liu
  2012-07-11 12:43 ` Steven Rostedt
  0 siblings, 1 reply; 3+ messages in thread
From: Dong Liu @ 2012-07-10 20:32 UTC
  To: linux-rt-users@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 719 bytes --]

Hi All,

I could not find a solution for the CPU stall problem on kernel
3.2.18-rt29, so I thought I might try an older kernel. I downloaded
linux-2.6.33.9 and patch-2.6.33.9-rt31. Because 2.6.33 doesn't have
vhost_net, I ported vhost_net from 2.6.34 back to 2.6.33.9.

The kernel patched and built successfully, but at boot I got a kernel
NULL pointer dereference. After the error the system seems stable, and
I can start a KVM client without CPU stalls. Very frequently, though,
processes lock up for a long time; the wchan displayed by ps is either
sync_page or synchronize_rcu. It looks like RCU still causes problems
in the rt kernel.

The dmesg output for the NULL pointer dereference is attached.

Thanks!

Dong

[-- Attachment #2: kernel-null-pointer.txt --]
[-- Type: text/plain, Size: 2131 bytes --]

BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: [<ffffffff810645f1>] release_resource+0x21/0x90
PGD 123efa067 PUD 120639067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/kernel/kexec_crash_size
CPU 2
Pid: 1826, comm: sh Not tainted 2.6.33.9-1.el6.preempt_rt.x86_64 #1 2A9Ch/HP Elite 7100 Microtower PC
RIP: 0010:[<ffffffff810645f1>]  [<ffffffff810645f1>] release_resource+0x21/0x90
RSP: 0018:ffff880124fffde8  EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffffffff81ac6e40 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000284 RDI: ffffffff8176a620
RBP: ffff880124fffdf8 R08: 0000000000000000 R09: 0000000000000008
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: ffffffff81aee8a0 R15: 0000000000000000
FS:  00007f877dd96700(0000) GS:ffff880028280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 000000011f4da000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sh (pid: 1826, threadinfo ffff880124ffe000, task ffff88011f426240)
Stack:
 ffff880124fffdf8 0000000000000000 ffff880124fffe48 ffffffff810a366b
<0> ffff880124fffe48 0000000000000000 0000000000000000 0000000000000001
<0> ffff880124ffff48 ffff8801271cf870 ffffffff81aee8a0 ffff880121c348c0
Call Trace:
 [<ffffffff810a366b>] crash_shrink_memory+0x14b/0x170
 [<ffffffff810845b1>] kexec_crash_size_store+0x41/0x60
 [<ffffffff81221e27>] kobj_attr_store+0x17/0x20
 [<ffffffff811b1a8c>] sysfs_write_file+0xfc/0x180
 [<ffffffff81147a78>] vfs_write+0xb8/0x1a0
 [<ffffffff810b96ea>] ? audit_syscall_entry+0x29a/0x2c0
 [<ffffffff81148451>] sys_write+0x51/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Code: 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 89 fb 48 c7 c7 c0 34 a8 81 e8 23 1b 43 00 48 8b 53 20 <48> 8b 42 30 48 85 c0 74 20 48 39 c3 75 0e eb 33 0f 1f 80 00 00
RIP  [<ffffffff810645f1>] release_resource+0x21/0x90
 RSP <ffff880124fffde8>
CR2: 0000000000000030
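
A note on reading this oops: the fault address in CR2 (0x30) is very small, which usually points to a field being read through a NULL structure pointer. The sketch below models the x86_64 layout of struct resource as I recall it from include/linux/ioport.h in kernels of that era (an assumption, not something stated in this thread) and looks up which field sits at offset 0x30. The Resource class and the field_at helper are made up for illustration.

```python
import ctypes


# Assumed x86_64 layout of struct resource in 2.6.33-era kernels
# (recalled from include/linux/ioport.h; not confirmed by this thread).
# Every member is modelled as 8 bytes wide, as on x86_64.
class Resource(ctypes.Structure):
    _fields_ = [
        ("start", ctypes.c_uint64),    # resource_size_t
        ("end", ctypes.c_uint64),      # resource_size_t
        ("name", ctypes.c_uint64),     # const char *
        ("flags", ctypes.c_uint64),    # unsigned long
        ("parent", ctypes.c_uint64),   # struct resource *
        ("sibling", ctypes.c_uint64),  # struct resource *
        ("child", ctypes.c_uint64),    # struct resource *
    ]


def field_at(struct_type, base, fault_addr):
    """Return the name of the field located at fault_addr - base, or None."""
    wanted = fault_addr - base
    for name, _ctype in struct_type._fields_:
        if getattr(struct_type, name).offset == wanted:
            return name
    return None


# Values copied from the oops: CR2 is the faulting address, RDX is 0.
CR2 = 0x30
RDX = 0x0

if __name__ == "__main__":
    print(field_at(Resource, RDX, CR2))
```

Under that assumed layout, parent sits at 0x20 and child at 0x30. The opcode bytes around the <48> marker (48 8b 53 20, then 48 8b 42 30) appear to decode to a load from 0x20(%rbx) followed by a load from 0x30(%rdx), which would be consistent with release_resource reading parent->child through a NULL parent pointer.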

* Re: patch-2.6.33.9-rt31 problem
  2012-07-10 20:32 patch-2.6.33.9-rt31 problem Dong Liu
@ 2012-07-11 12:43 ` Steven Rostedt
  2012-07-12 17:17   ` Dong Liu
  0 siblings, 1 reply; 3+ messages in thread
From: Steven Rostedt @ 2012-07-11 12:43 UTC
  To: Dong Liu; +Cc: linux-rt-users@vger.kernel.org

On Tue, 2012-07-10 at 16:32 -0400, Dong Liu wrote:
> Hi All,
> 
> I could not find a solution for the CPU stall problem on kernel
> 3.2.18-rt29, so I thought I might try an older kernel. I downloaded
> linux-2.6.33.9 and patch-2.6.33.9-rt31. Because 2.6.33 doesn't have
> vhost_net, I ported vhost_net from 2.6.34 back to 2.6.33.9.
> 
> The kernel patched and built successfully, but at boot I got a kernel
> NULL pointer dereference. After the error the system seems stable, and
> I can start a KVM client without CPU stalls. Very frequently, though,
> processes lock up for a long time; the wchan displayed by ps is either
> sync_page or synchronize_rcu. It looks like RCU still causes problems
> in the rt kernel.
> 
> The dmesg output for the NULL pointer dereference is attached.

Um, when you get one of those 'kernel NULL pointer' crashes, the system
is not in a good state. If the crash happened to a task that holds a
mutex or worse a spinlock, it will never release it. That means, any new
task that tries to take that same mutex or spinlock, will just block and
sit there.

Thus, those processes that are stuck at either sync_page or
synchronize_rcu are probably waiting for that process to release a
mutex, or to finish something else that it never will.

Basically, once you see a NULL pointer dereference, it's time to save
the dmesg and reboot the box.

-- Steve
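
The blocking behaviour described above can be imitated in user space. This is only an analogy, assuming a plain Python threading.Lock stands in for a kernel mutex or spinlock; the function names are invented for the illustration. One thread takes the lock and ends without releasing it, the way a task killed by an oops never reaches its unlock path, and a later thread then fails to acquire it.

```python
import threading


def crash_while_holding(lock):
    """Stand-in for the task that oopsed: takes the lock, never releases it."""
    lock.acquire()
    # The thread ends here without calling lock.release().


def try_to_take(lock, timeout=0.2):
    """Stand-in for a later task; returns True only if it got the lock."""
    got_it = lock.acquire(timeout=timeout)
    if got_it:
        lock.release()
    return got_it


def simulate():
    lock = threading.Lock()
    victim = threading.Thread(target=crash_while_holding, args=(lock,))
    victim.start()
    victim.join()  # the "crashed" task is gone, but the lock is still held
    return try_to_take(lock)


if __name__ == "__main__":
    print("later task acquired the lock:", simulate())
```

The timeout exists only so the example terminates. A kernel task waiting on a lock held by a dead task has no such escape and simply stays blocked, which is why rebooting is the advice.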

* Re: patch-2.6.33.9-rt31 problem
  2012-07-11 12:43 ` Steven Rostedt
@ 2012-07-12 17:17   ` Dong Liu
  0 siblings, 0 replies; 3+ messages in thread
From: Dong Liu @ 2012-07-12 17:17 UTC
  To: Steven Rostedt; +Cc: linux-rt-users@vger.kernel.org

Hi Steve,

On 7/11/12 8:43 AM, Steven Rostedt wrote:
> On Tue, 2012-07-10 at 16:32 -0400, Dong Liu wrote:
>> Hi All,
>>
>> I could not find a solution for the CPU stall problem on kernel
>> 3.2.18-rt29, so I thought I might try an older kernel. I downloaded
>> linux-2.6.33.9 and patch-2.6.33.9-rt31. Because 2.6.33 doesn't have
>> vhost_net, I ported vhost_net from 2.6.34 back to 2.6.33.9.
>>
>> The kernel patched and built successfully, but at boot I got a kernel
>> NULL pointer dereference. After the error the system seems stable, and
>> I can start a KVM client without CPU stalls. Very frequently, though,
>> processes lock up for a long time; the wchan displayed by ps is either
>> sync_page or synchronize_rcu. It looks like RCU still causes problems
>> in the rt kernel.
>>
>> The dmesg output for the NULL pointer dereference is attached.
>
> Um, when you get one of those 'kernel NULL pointer' crashes, the system
> is not in a good state. If the crash happened to a task that holds a
> mutex or worse a spinlock, it will never release it. That means, any new
> task that tries to take that same mutex or spinlock, will just block and
> sit there.
>
> Thus, those processes that are stuck at either sync_page or
> synchronize_rcu are probably waiting for that process to release a
> mutex, or to finish something else that it never will.
>
> Basically, once you see a NULL pointer dereference, it's time to save
> the dmesg and reboot the box.
>

I finally tracked the NULL pointer dereference down to

echo -n "0" > /sys/kernel/kexec_crash_size

in /etc/init/kexec-disable.conf.

After I disabled it, there were no more kernel NULL pointer
dereferences. But I still get the CPU stall :(

Thanks,

Dong
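
One possible alternative to removing the write entirely is to guard it. The sketch below assumes, without confirmation from this thread, that the oops is triggered when no crash kernel memory is reserved, so that /sys/kernel/kexec_crash_size already reads 0. The helper release_crash_memory is hypothetical; only the sysfs path comes from the message above.

```python
# Real sysfs path from the thread; the helper itself is hypothetical.
KEXEC_CRASH_SIZE = "/sys/kernel/kexec_crash_size"


def release_crash_memory(path=KEXEC_CRASH_SIZE):
    """Write "0" to path only if it currently reports a nonzero size.

    Returns True if a write was issued, False if it was skipped.
    """
    with open(path) as f:
        current = int(f.read().strip() or "0")
    if current == 0:
        # Nothing reserved (assumed to be the case that oopsed): skip.
        return False
    with open(path, "w") as f:
        f.write("0")
    return True
```

Passing a different path makes the helper easy to try against an ordinary file before pointing it at sysfs. Whether the guard actually avoids the oops on this kernel is untested.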
