From: "Pasi Kärkkäinen" <pasik@iki.fi>
To: Greg Zapp <greg.zapp@gmail.com>
Cc: xen-devel@lists.xen.org
Subject: Re: Enabling NLB is crashing VM's/DRBD
Date: Fri, 30 Nov 2012 13:36:55 +0200
Message-ID: <20121130113655.GP8912@reaktio.net>
In-Reply-To: <CAEHxbC1gG5RR9h9odxx59BZm2+SUdoGuwZc8Sco_yNtxVwod=Q@mail.gmail.com>
On Fri, Nov 30, 2012 at 11:11:11AM +1300, Greg Zapp wrote:
> Hi,
>
> We are running Debian's provided xen-hypervisor-4.0-amd64 (4.0.1-4).
> The kernel is 2.6.32-5-xen-amd64 (2.6.32-46) from Debian.
>
> The previously posted log lines were from the dom0's /var/log/messages.
> The only thing I'm seeing from xm dmesg is the following:
> (XEN) grant_table.c:1717:d0 Bad grant reference
>
> I've also picked up on some more entries from syslog that were not present
> in messages. Here is what's present in syslog. Time seems to be sync'd
> to the second on both machines:
> Nov 28 10:55:03 nodeA kernel: [1239467.400293] eth0: port 11(nlb2.e0)
> entering disabled state
> Nov 28 10:55:03 nodeA kernel: [1239467.400516] eth0: port 11(nlb2.e0)
> entering disabled state
How's your networking set up?
I hope the Windows NLB VMs aren't using the same bridge/VLAN as DRBD is using?
-- Pasi
> Nov 28 10:55:04 nodeA kernel: [1239467.731442] frontend_changed:
> backend/vif/73/0: prepare for reconnect
> Nov 28 10:55:04 nodeA logger: /etc/xen/scripts/vif-bridge: offline
> XENBUS_PATH=backend/vif/73/0
> Nov 28 10:55:05 nodeA logger: /etc/xen/scripts/vif-bridge: brctl delif
> eth0 nlb2.e0 failed
> Nov 28 10:55:05 nodeA logger: /etc/xen/scripts/vif-bridge: ifconfig
> nlb2.e0 down failed
> Nov 28 10:55:06 nodeA logger: /etc/xen/scripts/vif-bridge: Successful
> vif-bridge offline for nlb2.e0, bridge eth0.
> Nov 28 10:55:06 nodeA logger: /etc/xen/scripts/vif-bridge: online
> XENBUS_PATH=backend/vif/73/0
> Nov 28 10:55:08 nodeA kernel: [1239471.758583] device nlb2.e0 entered
> promiscuous mode
> Nov 28 10:55:10 nodeA kernel: [1239473.795967] block drbd23: sock was shut
> down by peer
> Nov 28 10:55:27 nodeA kernel: [1239473.795973] block drbd23: peer( Primary
> -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
> Nov 28 10:55:27 nodeA kernel: [1239473.795980] block drbd23: short read
> expecting header on sock: r=0
> Nov 28 10:55:27 nodeA kernel: [1239474.009951] block drbd31: sock was shut
> down by peer
>
> Nov 28 10:55:09 nodeB kernel: [1239622.217505] block drbd23: PingAck did
> not arrive in time.
> Nov 28 10:55:09 nodeB kernel: [1239622.217542] block drbd23: peer(
> Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate
> -> DUnknown )
> Nov 28 10:55:09 nodeB kernel: [1239622.217551] block drbd23: asender
> terminated
> Nov 28 10:55:09 nodeB kernel: [1239622.217554] block drbd23: Terminating
> drbd23_asender
> Nov 28 10:55:09 nodeB kernel: [1239622.217795] block drbd23: short read
> expecting header on sock: r=-512
> Nov 28 10:55:09 nodeB kernel: [1239622.217887] block drbd23: Creating new
> current UUID
> Nov 28 10:55:09 nodeB kernel: [1239622.218118] block drbd23: Connection
> closed
> Nov 28 10:55:09 nodeB kernel: [1239622.218125] block drbd23: conn(
> NetworkFailure -> Unconnected )
> Nov 28 10:55:09 nodeB kernel: [1239622.218135] block drbd23: receiver
> terminated
> Nov 28 10:55:09 nodeB kernel: [1239622.218137] block drbd23: Restarting
> drbd23_receiver
> Nov 28 10:55:09 nodeB kernel: [1239622.218140] block drbd23: receiver
> (re)started
> Nov 28 10:55:09 nodeB kernel: [1239622.218143] block drbd23: conn(
> Unconnected -> WFConnection )
> Nov 28 10:55:09 nodeB kernel: [1239622.353589] block drbd30: PingAck did
> not arrive in time.
> Nov 28 10:55:09 nodeB kernel: [1239622.353627] block drbd30: peer(
> Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate
> -> DUnknown )
> Nov 28 10:55:09 nodeB kernel: [1239622.353637] block drbd30: asender
> terminated
> Nov 28 10:55:09 nodeB kernel: [1239622.353639] block drbd30: Terminating
> drbd30_asender
> Nov 28 10:55:09 nodeB kernel: [1239622.353668] block drbd30: short read
> expecting header on sock: r=-512
> Nov 28 10:55:09 nodeB kernel: [1239622.353754] block drbd30: Creating new
> current UUID
> Nov 28 10:55:09 nodeB kernel: [1239622.388101] block drbd30: Connection
> closed
> Nov 28 10:55:09 nodeB kernel: [1239622.388107] block drbd30: conn(
> NetworkFailure -> Unconnected )
> Nov 28 10:55:09 nodeB kernel: [1239622.388111] block drbd30: receiver
> terminated
> Nov 28 10:55:09 nodeB kernel: [1239622.388113] block drbd30: Restarting
> drbd30_receiver
> Nov 28 10:55:09 nodeB kernel: [1239622.388116] block drbd30: receiver
> (re)started
> Nov 28 10:55:09 nodeB kernel: [1239622.388119] block drbd30: conn(
> Unconnected -> WFConnection )
>
> I've also looked at the qemu, xend-hotplug, and xend logs and do not see
> any telling errors. In xend.log I just see lines pertaining to VM's being
> rebooted.
>
> As for GPLPV: I haven't been able to reproduce the "network crashing" and
> rebooting in the lab, and probably won't be able to until I can get a more
> robust, production-like environment set up. Unfortunately, I can't risk more
> customer downtime by attempting to set up NLB without the GPLPV drivers in
> production. If I can manage to reproduce this in staging, I will of course
> attempt it without the GPLPV drivers.
>
> -Greg
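
To line up the DRBD state changes on both nodes against the vif-bridge events, something like the following might help. It is only a rough sketch: the function name parse_drbd_line and both regular expressions are my own assumptions based on the log lines quoted above (with the mail wrapping undone), not anything shipped with DRBD or Xen.

```python
import re

# Assumed pattern for DRBD state-change fields such as
# "conn( Connected -> BrokenPipe )", based on the quoted log lines.
TRANSITION = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")
# Assumed syslog prefix: timestamp, host name, then the DRBD device name.
PREFIX = re.compile(r"^(\w{3}\s+\d+\s+[\d:]+)\s+(\S+)\s+kernel:.*?block (drbd\d+):")


def parse_drbd_line(line):
    """Return (timestamp, host, device, {field: (old, new)}) or None.

    None is returned for lines that are not DRBD kernel messages or
    that carry no "old -> new" state transition.
    """
    m = PREFIX.search(line)
    if m is None:
        return None
    changes = {field: (old, new) for field, old, new in TRANSITION.findall(line)}
    if not changes:
        return None
    return m.group(1), m.group(2), m.group(3), changes


# Sample lines copied from the quoted logs, rejoined onto single lines.
SAMPLE = [
    "Nov 28 10:55:27 nodeA kernel: [1239473.795973] block drbd23: "
    "peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) "
    "pdsk( UpToDate -> DUnknown )",
    "Nov 28 10:55:09 nodeB kernel: [1239622.217542] block drbd23: "
    "peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) "
    "pdsk( UpToDate -> DUnknown )",
    "Nov 28 10:55:09 nodeB kernel: [1239622.217551] block drbd23: "
    "asender terminated",
]

events = [e for e in map(parse_drbd_line, SAMPLE) if e is not None]
for timestamp, host, device, changes in events:
    # Show only the connection-state change, if the line has one.
    print(timestamp, host, device, changes.get("conn"))
```

Feeding it the full syslog from both nodes and sorting by device and timestamp should show which side lost the connection first for each resource, provided the clocks really are in sync.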