* reproductible kernel oops with kernel 3.2 inside kvm
From: Yann Dupont @ 2012-05-03 14:05 UTC
  To: ceph-devel

Hello. I have been stress testing ceph for some time now, with quite good
results. I really like ceph and will probably use it in some
pre-production services.

Anyway I've seen some bugs.

One of them is instability if the kernel is running inside KVM, leading
to a very fast (and reproducible) kernel oops. On bare metal this
particular oops doesn't happen.

The kernel oops itself involves ceph, but it could be a real bug in kvm too.

The host machine is running 3.2.2
kvm is quite old (0.14)
guest OS is ubuntu 12.04 with its standard kernel. Retried with a custom
3.2 kernel with the same problem.


I'm mounting ceph with: mount -t ceph mon_address:/ /mnt/temp

A simple recursive copy of /home leads to this kernel oops:

May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675559] BUG: unable to handle kernel NULL pointer dereference at   (null)
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675569] IP: [<f8379d8d>] ceph_d_prune+0x1d/0x30 [ceph]
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675579] *pde = 00000000
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675583] Oops: 0002 [#1] SMP
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675587] Modules linked in: ceph libceph libcrc32c zram(C) parport_pc rfcomm ppdev bnep lp bluetooth parport dm_crypt binfmt_misc psmouse mac_hid virtio_balloon serio_raw i2c_piix4 nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 floppy
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675605]
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675609] Pid: 27, comm: kswapd0 Tainted: G S      WC   3.2.0-24-generic #37-Ubuntu Bochs Bochs
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675614] EIP: 0060:[<f8379d8d>] EFLAGS: 00010282 CPU: 0
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675618] EIP is at ceph_d_prune+0x1d/0x30 [ceph]
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675621] EAX: 00000000 EBX: ed311480 ECX: cdf35a4c EDX: cdf35a00
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675623] ESI: ed3114e0 EDI: c8de4ccc EBP: f3e4bdec ESP: f3e4bdec
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675625]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675628] Process kswapd0 (pid: 27, ti=f3e4a000 task=f3d45860 task.ti=f3e4a000)
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675630] Stack:
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675632] f3e4bdfc c114503e ed311480 cdf35a00 f3e4be28 c1146bdf c8de5764 ed31164c
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675638] cdf35a4c f3e4be44 ed3114cc ed17dbe0 f1b5ac00 f1b5ac80 eafb38e0 f3e4be58
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675644] c114763e eafb38cc 00000000 f3e4be3c f3e4be3c f3e4be3c c91a5e60 ed3114e0
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675685] Call Trace:
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675769] [<c114503e>] dentry_lru_prune+0x6e/0x70
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675774] [<c1146bdf>] shrink_dentry_list+0x14f/0x270
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675777] [<c114763e>] prune_dcache_sb+0x10e/0x130
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675786] [<c113584a>] prune_super+0xfa/0x160
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675790] [<c10f6056>] shrink_slab+0x166/0x2e0
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675793] [<c10f7c47>] ? shrink_zone+0x137/0x190
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675796] [<c10f8074>] balance_pgdat+0x3d4/0x540
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675800] [<c10f82d1>] kswapd+0xf1/0x1b0
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675803] [<c10f81e0>] ? balance_pgdat+0x540/0x540
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675812] [<c1069b8d>] kthread+0x6d/0x80
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675815] [<c1069b20>] ? flush_kthread_worker+0x80/0x80
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675828] [<c157e37e>] kernel_thread_helper+0x6/0x10
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675830] Code: e5 3e 8d 74 26 00 b8 01 00 00 00 5d c3 90 55 89 e5 3e 8d 74 26 00 8b 50 10 85 d2 74 12 39 d0 74 0e 8b 40 0c 85 c0 74 07 8b 42 5c <f0> 80 20 fd 5d c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675861] EIP: [<f8379d8d>] ceph_d_prune+0x1d/0x30 [ceph] SS:ESP 0068:f3e4bdec
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675867] CR2: 0000000000000000
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675872] ---[ end trace a7919e7f17c0a727 ]---
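
For readers less used to oops output: the "Oops: 0002" field is the x86 page-fault error code. The small Python sketch below decodes its low bits using the standard x86 bit layout; the function name and dictionary keys are made up for illustration and are not part of any kernel tool.

```python
def decode_page_fault_error(code: int) -> dict:
    """Decode the low bits of an x86 page-fault error code (illustrative helper)."""
    return {
        "page_present": bool(code & 0x01),       # bit 0: 0 = page not present
        "write_access": bool(code & 0x02),       # bit 1: 1 = fault on a write
        "user_mode": bool(code & 0x04),          # bit 2: 0 = fault in kernel mode
        "reserved_bit": bool(code & 0x08),       # bit 3: reserved bit set in a paging entry
        "instruction_fetch": bool(code & 0x10),  # bit 4: fault on an instruction fetch
    }


# The field as it appears in the trace above.
oops_line = "Oops: 0002 [#1] SMP"
error_code = int(oops_line.split()[1], 16)
info = decode_page_fault_error(error_code)

for name, value in info.items():
    print(f"{name}: {value}")
```

Read this way, 0002 is a kernel-mode write to a page that is not present, which fits the "NULL pointer dereference at (null)" line and CR2 being 0.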



Retried on another machine with kvm 1.0:

May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.962997] BUG: unable to handle kernel NULL pointer dereference at   (null)
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963015] IP: [<f882fd8d>] ceph_d_prune+0x1d/0x30 [ceph]
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963038] *pde = 7f686067
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963045] Oops: 0002 [#1] SMP
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963057] Modules linked in: ceph libceph libcrc32c nfs lockd fscache auth_rpcgss nfs_acl sunrpc zram(C) dm_crypt rfcomm bnep parport_pc bluetooth ppdev lp parport mac_hid binfmt_misc psmouse serio_raw virtio_balloon i2c_piix4 nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 floppy
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963084]
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963091] Pid: 27, comm: kswapd0 Tainted: G S      WC   3.2.0-24-generic #37-Ubuntu Bochs Bochs
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963099] EIP: 0060:[<f882fd8d>] EFLAGS: 00010282 CPU: 0
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963104] EIP is at ceph_d_prune+0x1d/0x30 [ceph]
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963106] EAX: 00000000 EBX: eb00cd00 ECX: eb26f4cc EDX: eb26f480
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963108] ESI: eb00cd60 EDI: eb00cd60 EBP: f3e6ddec ESP: f3e6ddec
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963110]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963113] Process kswapd0 (pid: 27, ti=f3e6c000 task=f3d45860 task.ti=f3e6c000)
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963114] Stack:
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963116] f3e6ddfc c114503e eb00cd00 eb00cd4c f3e6de28 c1146c77 eb29a1fc eb26f4cc
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963124] eb26f480 f3e6de44 eb00cd00 eb00cd60 ebdc3400 ebdc3480 e9908060 f3e6de58
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963129] c114763e e990804c 00000000 f3e6de3c f3e6de3c f3e6de3c eb26f1e0 eb00cd60
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963134] Call Trace:
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963164] [<c114503e>] dentry_lru_prune+0x6e/0x70
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963169] [<c1146c77>] shrink_dentry_list+0x1e7/0x270
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963172] [<c114763e>] prune_dcache_sb+0x10e/0x130
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963179] [<c113584a>] prune_super+0xfa/0x160
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963183] [<c10f6056>] shrink_slab+0x166/0x2e0
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963190] [<c10f7c47>] ? shrink_zone+0x137/0x190
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963193] [<c10f8074>] balance_pgdat+0x3d4/0x540
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963196] [<c10f82d1>] kswapd+0xf1/0x1b0
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963199] [<c10f81e0>] ? balance_pgdat+0x540/0x540
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963207] [<c1069b8d>] kthread+0x6d/0x80
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963210] [<c1069b20>] ? flush_kthread_worker+0x80/0x80
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963223] [<c157e37e>] kernel_thread_helper+0x6/0x10
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963225] Code: e5 3e 8d 74 26 00 b8 01 00 00 00 5d c3 90 55 89 e5 3e 8d 74 26 00 8b 50 10 85 d2 74 12 39 d0 74 0e 8b 40 0c 85 c0 74 07 8b 42 5c <f0> 80 20 fd 5d c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89


Is this a known issue?

Cheers,

-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: reproductible kernel oops with kernel 3.2 inside kvm
From: Josh Durgin @ 2012-05-17 22:21 UTC
  To: Yann Dupont; +Cc: ceph-devel

Hi Yann,

Sorry for the late response.

On 05/03/2012 07:05 AM, Yann Dupont wrote:
> Hello. I have been stress testing ceph for some time now, with quite good
> results. I really like ceph and will probably use it in some
> pre-production services.
>
> Anyway I've seen some bugs.
>
> One of them is instability if the kernel is running inside KVM, leading
> to a very fast (and reproducible) kernel oops. On bare metal this
> particular oops doesn't happen.
>
> The kernel oops itself involves ceph, but it could be a real bug in kvm too.
>
> The host machine is running 3.2.2
> kvm is quite old (0.14)
> guest OS is ubuntu 12.04 with its standard kernel. Retried with a custom
> 3.2 kernel with the same problem.

I'm not sure how many people are using the kernel client within kvm,
but I haven't seen this problem before. Since it's in d_prune, it's
probably Ceph related, but perhaps kvm makes a race condition trigger
more often in your environment.

I filed http://tracker.newdream.net/issues/2444 to track this.
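
As a cross-check that the fault really sits inside ceph_d_prune, the Code: line of the oops can be lined up with the reported offset ceph_d_prune+0x1d. The Python sketch below does only that: it finds the byte the kernel marked with <..>, steps back 0x1d bytes, and prints what is there. The helper name split_code_dump is hypothetical, and nothing here disassembles the bytes.

```python
# Code: line and symbol copied from the oops in the original report.
CODE_LINE = (
    "e5 3e 8d 74 26 00 b8 01 00 00 00 5d c3 90 55 89 e5 3e 8d 74 26 00 8b 50 "
    "10 85 d2 74 12 39 d0 74 0e 8b 40 0c 85 c0 74 07 8b 42 5c <f0> 80 20 fd "
    "5d c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89"
)
SYMBOL = "ceph_d_prune+0x1d/0x30"


def split_code_dump(code_line: str, symbol: str):
    """Return (dump bytes, index of function start, index of trapping byte)."""
    tokens = code_line.split()
    # The kernel wraps the byte at the faulting EIP in angle brackets.
    trap_index = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    dump = bytes(int(tok.strip("<>"), 16) for tok in tokens)
    # "name+0xOFF/0xSIZE": OFF is the distance from the function start to EIP.
    offset = int(symbol.split("+")[1].split("/")[0], 16)
    return dump, trap_index - offset, trap_index


dump, start, trap = split_code_dump(CODE_LINE, SYMBOL)
print("function start byte:", hex(dump[start]))
print("bytes before the trap:", dump[start:trap].hex(" "))
print("trapping instruction starts with:", hex(dump[trap]))
```

The byte at the computed function start is 0x55 (push %ebp, a usual prologue), and the marked byte is 0xf0, the x86 LOCK prefix. That would fit an atomic read-modify-write through a pointer that is NULL (EAX is 00000000 in both traces), though this is only a reading of the dump, not a confirmed diagnosis.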


* Re: reproductible kernel oops with kernel 3.2 inside kvm
From: Yann Dupont @ 2012-05-21 12:24 UTC
  To: Josh Durgin; +Cc: ceph-devel

On 18/05/2012 00:21, Josh Durgin wrote:
> Hi Yann,
>
> Sorry for the late response.
>
> On 05/03/2012 07:05 AM, Yann Dupont wrote:
>> Hello. I have been stress testing ceph for some time now, with quite good
>> results. I really like ceph and will probably use it in some
>> pre-production services.
>>
>> Anyway I've seen some bugs.
>>
>> One of them is instability if the kernel is running inside KVM, leading
>> to a very fast (and reproducible) kernel oops. On bare metal this
>> particular oops doesn't happen.
>>
>> The kernel oops itself involves ceph, but it could be a real bug in kvm
>> too.
>>
>> The host machine is running 3.2.2
>> kvm is quite old (0.14)
>> guest OS is ubuntu 12.04 with its standard kernel. Retried with a custom
>> 3.2 kernel with the same problem.
>
> I'm not sure how many people are using the kernel client within kvm,
> but I haven't seen this problem before. Since it's in d_prune, it's
> probably Ceph related, but perhaps kvm makes a race condition trigger
> more often in your environment.
>
> I filed http://tracker.newdream.net/issues/2444 to track this.

Ok, thanks.

I'll follow the issue in the tracker. As this KVM setup isn't a
production machine, I can easily run tests on it.


Cheers
-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr


