* Enabling NLB is crashing VM's/DRBD
From: Greg Zapp @ 2012-11-29  0:12 UTC
  To: xen-devel


Hi All,

We have a somewhat serious issue around NLB on Windows 2012 and Xen.
First, let me describe our environment and then I'll let you know what's
wrong.

Two Debian squeeze boxes running the latest provided AMD64 Xen kernel, with
about 100GB of RAM.
These boxes are connected via InfiniBand, and DRBD runs over this link
(IPoIB).
Each VPS runs on a mirrored DRBD device.
Each DRBD device sits on two logical volumes: one for data and one for
metadata.
The hypervisors exclusively run Windows VMs (Server 2008 R2 and 2012).
The VMs use the GPLPV drivers (PCI, VBD, Net, etc.).
We are using network-bridge.

So here is the trouble.  We had somebody trying to set up Windows NLB.
Adding a host would cause the VM to freeze and would also disconnect the
DRBD devices.  Everything recovers, but the DRBD devices resync and a bunch
of VMs on one side (the side with the VM that hangs) get rebooted by Xen.
Here is what we are seeing in messages:

eth0: port 3(nlb2.e0) entering disabled state
eth0: port 3(nlb2.e0) entering disabled state
frontend_changed: backend/vif/65/0: prepare for reconnect
device nlb.e0 entered promiscuous mode
block drbd29: sock was shut down by peer
block drbd29: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
block drbd24: sock was shut down by peer
block drbd24: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
block drbd29: Creating new current UUID
block drbd30: sock was shut down by peer
block drbd30: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
.... and on and on and on with the DRBD disconnecting
block drbd29: md_sync_timer expired! Worker calls drbd_md_sync().
block drbd21: md_sync_timer expired! Worker calls drbd_md_sync().
.... lots of that
block drbd24: Terminating drbd24_asender
block drbd21: asender terminated
block drbd21: Terminating drbd21_asender
....
eth0: port 3(nlb2.e0) entering forwarding state
....
block drbd1: Handshake successful: Agreed network protocol version 91
block drbd1: conn( WFConnection -> WFReportParams )
block drbd38: Handshake successful: Agreed network protocol version 91
block drbd38: conn( WFConnection -> WFReportParams )
block drbd38: Starting asender thread (from drbd38_receiver [16250])
block drbd1: Starting asender thread (from drbd1_receiver [18278])
... Then lots of stuff for the DRBD devices reconnecting and syncing.
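
For anyone wanting to line up the bridge port flaps with the DRBD
disconnects in a longer log, here is a minimal Python sketch that pulls both
kinds of events out of lines shaped like the excerpt above.  The function
name summarize() and both regular expressions are my own assumptions based
only on the lines quoted here; they are not part of DRBD or Xen and may need
adjusting for other kernel or DRBD versions.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the excerpt above:
#   block drbd29: peer( ... ) conn( Connected -> BrokenPipe ) ...
CONN_RE = re.compile(r"block (drbd\d+): .*conn\( (\w+) -> (\w+) \)")

# Assumed line shape, taken from the excerpt above:
#   eth0: port 3(nlb2.e0) entering disabled state
PORT_RE = re.compile(r"(\S+): port (\d+)\((\S+)\) entering (\w+) state")


def summarize(lines):
    """Collect DRBD connection-state changes and bridge port-state changes.

    Returns (conn, ports):
      conn  maps a device name to its list of (old_state, new_state) pairs
      ports is a list of (bridge, interface, new_state) in log order
    """
    conn = defaultdict(list)
    ports = []
    for line in lines:
        m = CONN_RE.search(line)
        if m:
            dev, old, new = m.groups()
            conn[dev].append((old, new))
            continue
        m = PORT_RE.search(line)
        if m:
            bridge, _port, iface, state = m.groups()
            ports.append((bridge, iface, state))
    return dict(conn), ports
```

Feeding it the messages file (for example summarize(open(path)), with path
being wherever your syslog writes kernel messages) gives a per-device list
of connection transitions and the ordered port events, which makes it easier
to see whether every BrokenPipe falls between a "disabled" and a
"forwarding" event.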


This happened three times, each time while the user was attempting to add
the second node into NLB.  In the lab on Server 2012 I can reproduce the
network adapter dying (it becomes disabled and is unusable until reboot)
unless I follow specific steps, but I cannot reproduce the DRBD dying.  I
can get NLB working, but I'm mostly concerned about one person's ability to
effectively crash 8 other VMs.  It looks like whatever is going on is
somehow affecting my DRBD connection.  Has anyone seen anything like this
before?


Thanks,
   Greg
