From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Fennosys <reader@fennosys.fi>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Migration - Random guest kernel panics on target
Date: Tue, 24 Oct 2017 10:38:29 +0100
Message-ID: <20171024093829.GA3780@work-vm>
In-Reply-To: <20171024110543.e532d1536d31dd3005efdd3f@fennosys.fi>

* Fennosys (reader@fennosys.fi) wrote:
> Hi,
> 
> I'm encountering random guest kernel crashes while doing live migration with qemu (using qemu cli and monitor commands). 

That shouldn't happen!

> QEMU emulator version 2.10.0

Can you try backing up to 2.9 and see if the problem still happens?
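
(For reference, the sequence I'd assume you're using - the hostname and
port here are just placeholders, adjust to your setup:

  destination host: the same command line you quote below, plus
      -incoming tcp:0:4444

  source host, in the QEMU monitor:
      (qemu) migrate -d tcp:desthost:4444
      (qemu) info migrate

Shout if you're doing something materially different.)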

> Host kernel: 4.13.9-gentoo
> Guest kernel: 4.13.9-gentoo
> 
> Host cpu: 
> model name	: AMD Opteron(tm) Processor 6128
> stepping	: 1
> microcode	: 0x10000d9

Are both hosts identical 6128 ?

> 
> example of vm startup cli: 
> qemu-system-x86_64 -daemonize -name VM50 -vnc :50 -enable-kvm -cpu host -serial file:/var/log/kvm/50-serial.log -k fi \
> -kernel /somepath/bzImage \
> -append root=/dev/vda -m 4096 -smp 4 -runas kvm-user \
> -netdev type=tap,ifname=vm50,id=VM50,script=/etc/openvswitch/scripts/ifup-br0-50,downscript=/etc/openvswitch/scripts/ifdown-br0,vhost=on \
> -device virtio-net-pci,mac=xx:xx:xx:xx:xx:xx,netdev=VM50 \
> -drive file=/dev/drbd1,format=raw,if=virtio

You should add a   ,cache=none   to that -drive - but that won't cause
that kernel panic.
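
Something like this for the whole line (keeping the rest of your options
as they are):

  -drive file=/dev/drbd1,format=raw,if=virtio,cache=none

cache=none matters when live-migrating with shared storage (your drbd
dual-primary setup) so the source's page cache can't hold stale data,
but it shouldn't be the cause of a panic in the network receive path.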

> 
> backtrace:
> [  370.984297] BUG: unable to handle kernel paging request at ffffcc40fe000020
> [  370.985542] IP: receive_buf+0x7db/0xd20
> [  370.986131] PGD 0 
> [  370.986132] P4D 0 
> [  370.986450] 
> [  370.987463] Oops: 0000 [#1] SMP
> [  370.987972] Modules linked in: kvm_amd kvm irqbypass
> [  370.988787] CPU: 1 PID: 14 Comm: ksoftirqd/1 Not tainted 4.13.9-gentoo #3
> [  370.989816] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
> [  370.991131] task: ffff8ec1baae6c00 task.stack: ffff9cd9406b0000
> [  370.992018] RIP: 0010:receive_buf+0x7db/0xd20
> [  370.992673] RSP: 0018:ffff9cd9406b3d10 EFLAGS: 00010286
> [  370.993454] RAX: 0000713f00000000 RBX: 00000000000007dd RCX: 0000000000002b9d
> [  370.994508] RDX: ffffca7c00000000 RSI: ffff9cd9406b3d4c RDI: ffff8ec1ba11c000
> [  370.995571] RBP: ffff9cd9406b3d98 R08: 0000000000000000 R09: 0000000000000600
> [  370.996618] R10: ffffcc40fe000000 R11: ffff8ec1ba44d740 R12: ffff8ec1ba10f800
> [  370.997676] R13: ffff8ec1b9bf2400 R14: 0000000080000000 R15: ffff8ec1b9bf2d00
> [  370.998728] FS:  0000000000000000(0000) GS:ffff8ec1bfc80000(0000) knlGS:0000000000000000
> [  370.999924] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  371.000770] CR2: ffffcc40fe000020 CR3: 000000013a551000 CR4: 00000000000006a0
> [  371.001828] Call Trace:
> [  371.002231]  ? load_balance+0x144/0x970
> [  371.002802]  virtnet_poll+0x14e/0x260
> [  371.003433]  net_rx_action+0x1ab/0x2b0
> [  371.003996]  __do_softirq+0xdb/0x1e0
> [  371.004558]  run_ksoftirqd+0x24/0x50
> [  371.005107]  smpboot_thread_fn+0x107/0x160
> [  371.005718]  kthread+0xff/0x140
> [  371.006195]  ? sort_range+0x20/0x20
> [  371.006725]  ? kthread_create_on_node+0x40/0x40
> [  371.007415]  ret_from_fork+0x25/0x30
> [  371.007965] Code: 0a 8c 00 4d 01 f2 72 0e 48 c7 c0 00 00 00 80 48 2b 05 ba 6e 8e 00 49 01 c2 48 8b 15 a0 6e 8e 00 49 c1 ea 0c 49 c1 e2 06 49 01 d2 <49> 8b 42 20 a8 01 48 8d 48 ff 8b 45 b4 4c 0f 45 d1 49 39 c1 0f 
> [  371.010846] RIP: receive_buf+0x7db/0xd20 RSP: ffff9cd9406b3d10
> [  371.011701] CR2: ffffcc40fe000020
> [  371.012241] ---[ end trace b32e281709829620 ]---
> [  371.012929] Kernel panic - not syncing: Fatal exception in interrupt
> [  371.013999] Kernel Offset: 0x31000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [  371.015543] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
> 
> conditions:
> With a low workload the migration seems to perform as expected. 
> 
> If the load average is between 3 and 4 the issue can be reproduced relatively easily (2-5 live migrations until it crashes).
> 
> The drbd block device is in dual primary mode during the migration.

Since the failure is a non-filesystem-related kernel panic, I don't
think it's block-device related.

If you use anything other than virtio-net-pci does it work?
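
For example, swapping the NIC model (keeping your MAC and netdev id):

  -device e1000,mac=xx:xx:xx:xx:xx:xx,netdev=VM50

and dropping vhost=on from the -netdev; or, as a first step, just
vhost=off on the -netdev to separate vhost from virtio-net itself.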

Dave

> RAM (ECC) on both hosts has been tested before these tests.
> 
> Cheers,
> Antti
> 
> 
> -- 
> Fennosys <reader@fennosys.fi>
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
