From: Nathan March <nathan@gt.net>
To: xen-devel@lists.xen.org
Subject: VM spontaneously losing network on 10gig interface
Date: Mon, 17 Sep 2012 16:00:29 -0700
Message-ID: <5057AB8D.3040200@gt.net>

Hi All,

I'm seeing a very strange problem where a VM's bridge will spontaneously
stop bridging traffic. This only seems to occur on our 10gig interfaces
(Intel X540 on the ixgbe driver, MTU 9000), which are 2x links bonded into
bond0, then broken down into pvlan462/pvlan463/etc. before being bridged
with the DomUs. Everything works great at first, but several hours after
starting a large rsync, traffic stops crossing the bridge. Once it has
stopped working, it only affects that single VM on that single interface.
Other VMs on the same dom0 still have access to the same affected VLAN.

Layout is Nexenta NFS ---> 2x arista 10gig switches --> intel x540-t2 
(ixgbe) on dom0 --802.3ad--> bond0 --vconfig--> vlan 462 --bridged--> 
pvlan 462 / vif4.1 / vif6.1.
Dom0 is running kernel 3.2.28 w/ Xen 4.1.3; domU is kernel 2.6.32.27.

xen3 ~ # brctl show
bridge name     bridge id               STP enabled     interfaces
vlan462         8000.a0369f0eac2c       no              pvlan462
                                                         vif4.1
                                                         vif6.1
vlan463         8000.a0369f0eac2c       no              pvlan463
                                                         vif5.1

Once it breaks, running tcpdump inside the VM or on the dom0 against the
vif shows the same ARP traffic from the VM (looking for the NFS server),
but nothing incoming to the VM at all. Running tcpdump on the parent bridge
shows the traffic as normal, and other VMs on this bridge still have
regular access; only the single vif is affected.
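
The comparison described above can be sketched as two captures, one on
the vif and one on its parent bridge. This is only an illustration: the
message does not say which vif is affected, so vif4.1 is an assumed
example, and the -n/-e flags and the "arp" filter are choices made here
rather than the commands actually used.

```shell
# Assumed names: the affected vif is not identified in the report.
VIF="${VIF:-vif4.1}"
BRIDGE="${BRIDGE:-vlan462}"

# Print the two capture commands; run each as root on the dom0,
# in separate terminals, and compare what each one sees.
capture_cmds() {
  for dev in "$VIF" "$BRIDGE"; do
    # -n: no name resolution, -e: print link-level (MAC) headers
    echo "tcpdump -n -e -i ${dev} arp"
  done
}

capture_cmds
```

If ARP replies show up on the bridge capture but never on the vif
capture, the frames are being lost between the bridge and the vif.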

I've tried toggling net.bridge.bridge-nf-call-(arp|ip|ip6)tables off and 
it didn't seem to make a difference (also flushed all ip/eb/arptables 
rules just in case).
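
For reference, the shorthand above expands to three sysctl keys. A
minimal sketch of turning them off follows; it prints the commands by
default, and the APPLY switch and function names are made up for this
example, not something from the original setup.

```shell
# Expand net.bridge.bridge-nf-call-(arp|ip|ip6)tables into full key names.
bridge_nf_keys() {
  for t in arp ip ip6; do
    echo "net.bridge.bridge-nf-call-${t}tables"
  done
}

# Dry run unless APPLY=1 (hypothetical switch); applying needs root.
disable_bridge_nf() {
  for key in $(bridge_nf_keys); do
    if [ "${APPLY:-0}" = "1" ]; then
      sysctl -w "${key}=0"
    else
      echo "sysctl -w ${key}=0"
    fi
  done
}

disable_bridge_nf
```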

It takes me several hours to reproduce just by copying data, and I
haven't yet managed to figure out a nice small test case or what
triggers the break. Considering I've already found one bug in ixgbe
(reported + fixed!), I'm inclined to suspect the 10gig driver, but it
seems like this problem would come from either Xen or bridging. This
feels like a Xen netback/netfront issue?

Any ideas? Or suggestions on where to start looking?

Thanks!

- Nathan
