From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Joanna Rutkowska <joanna@invisiblethingslab.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
Marek Marczykowski <marmarek@invisiblethingslab.com>
Subject: Re: The strange case of xen_netback not returning ARP replies
Date: Tue, 22 May 2012 15:53:31 -0400
Message-ID: <20120522195331.GA6129@phenom.dumpdata.com>
In-Reply-To: <4FB39B13.70707@invisiblethingslab.com>
On Wed, May 16, 2012 at 02:18:27PM +0200, Joanna Rutkowska wrote:
> Hello,
>
> I'm facing a rather strange problem with the netback interface. My setup
> involves a netvm, which has some physical network interfaces assigned,
> and a client VM where a net front is running (exposed as eth0) and which
> is connected to that netvm (via vif42.0 interface, as seen in the netvm
> on the dumps below).
>
> Now, the netvm has two physical network interfaces assigned:
> 1) A standard Intel AGN (iwlwifi module, interface wlan0) -- this is
> just an assigned PCI device
>
> 2) A USB 3G modem (cdc_ncm module, usb0 interface) -- this has been made
> available to the netvm by assigning the whole USB controller to which
> the 3G modem is connected. This works fine.
There are some patches posted about netback and SKB slots that might
apply to the problem you guys are seeing.
>
> We do NAT in netvm for the traffic coming on vif* and send it out
> through the default outgoing interface, e.g. wlan0. Now, as long as I
> use the wlan0 for networking all works great. I've been using this setup
> for years, no problem here.
>
> However, when I switch to usb0 as a default outgoing interface in the
> netvm, something strange happens. The networking works fine via usb0 for
> some time (a few minutes typically), yet suddenly, after enough packets
> got exchanged, the networking stops working.
>
> When I run tcpdump on the vif* interface I can see that suddenly there
> is nobody (in the netvm) to reply to the ARP requests from the client
> VM (the client vm has Xen ID = 42 in this dump, and IP = .5, and gateway
> = .1):
>
> [root@netvm user]# tcpdump -ni vif42.0 arp
> tcpdump: WARNING: vif42.0: no IPv4 address assigned
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on vif42.0, link-type EN10MB (Ethernet), capture size 65535 bytes
> 13:41:55.031819 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
> 13:41:56.031860 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
> 13:41:57.031794 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
> 13:41:59.287308 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
> 13:42:00.283853 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
> 13:42:01.283816 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
> 13:42:03.231324 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length
>
> ... and this continues without end.
>
> For comparison, this is how it looks when I use networking via wlan0:
>
> [root@netvm user]# tcpdump -ni vif42.0 arp
> tcpdump: WARNING: vif42.0: no IPv4 address assigned
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on vif42.0, link-type EN10MB (Ethernet), capture size 65535 bytes
> 13:39:00.215883 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
> 13:39:00.215911 ARP, Reply 10.137.1.1 is-at fe:ff:ff:ff:ff:ff, length 28
> 13:39:21.799844 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
> 13:39:21.799869 ARP, Reply 10.137.1.1 is-at fe:ff:ff:ff:ff:ff, length 28
>
> We can see that every once in a while an ARP request for 10.137.1.1
> appears (the gateway for the clientvm, i.e. the netvm), and each one is
> answered immediately (by the netvm, as I understand).
>
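To compare the two captures above without reading them by eye, something like the following could be used. It is only a sketch: it assumes the one-line ARP summary format shown in the dumps (tcpdump -n ... arp), and the function name unanswered_requests is made up for illustration. It treats a reply for an IP as answering every earlier outstanding request for that IP.

```python
import re

# Patterns for the one-line ARP summaries shown in the dumps above.
# Assumption: other tcpdump ARP output variants are not handled.
REQUEST = re.compile(r"ARP, Request who-has (\S+) tell (\S+?),")
REPLY = re.compile(r"ARP, Reply (\S+) is-at (\S+?),")


def unanswered_requests(lines):
    """Return the ARP request lines that were never followed by a reply."""
    pending = {}  # target IP -> request lines still waiting for a reply
    for line in lines:
        match = REQUEST.search(line)
        if match:
            pending.setdefault(match.group(1), []).append(line.strip())
            continue
        match = REPLY.search(line)
        if match:
            # A reply clears all outstanding requests for that IP.
            pending.pop(match.group(1), None)
    return [req for reqs in pending.values() for req in reqs]
```

Fed the usb0 capture, this would return every request line; fed the wlan0 capture, it would return an empty list.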
> Now, this behavior seems really strange, because:
>
> 1) AFAIU, the ARP replies are/should be generated by the netback
> interface in the netvm (vif*).
>
> 2) It shouldn't matter, for the netback code, how the packets are later
> routed (via wlan0 vs. usb0) to provide this (dummy) arp response?
>
> 3) ...yet, for some reason, in the case when packets are later routed
> through usb0, the netback is not willing to generate arp response???
>
> Or am I misunderstanding this, and it is somebody else who is generating
> the arp responses? The final NIC?
>
> Some additional notes:
> 1) We make sure to set /proc/sys/net/ipv4/conf/vif*/proxy_arp to 1
>
> 2) When this "arp hang" happens, the networking (via usb0) is still
> working fine in the netvm (i.e. I can do ping google.com from the netvm)
>
> 3) This has been tested on various VM kernels (in the netvm): 3.0.4,
> 3.2.7, and 3.3.5 -- all exhibit the same behavior.
>
> 4) Nothing spectacular in the logs of the netvm, however, I can often
> see this crash in the *client* VM:
>
> [ 1257.228761] ------------[ cut here ]------------
> [ 1257.228767] WARNING: at
> /home/user/qubes-src/kernel/kernel-3.3.5/linux-3.3.5/fs/sysfs/file.c:498
> sysfs_attr_ns+0x93/0xa0()
> [ 1257.228776] sysfs: kobject eth0 without dirent
> [ 1257.228780] Modules linked in: iptable_raw bnep bluetooth rfkill
> ipt_MASQUERADE ipt_REJECT xt_state xt_tcpudp xen_netback iptable_filter
> iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
> ip_tables x_tables xen_netfront microcode pcspkr u2mfn(O) xen_blkback
> xen_evtchn autofs4 ext4 jbd2 crc16 dm_snapshot xen_blkfront [last
> unloaded: scsi_wait_scan]
> [ 1257.228819] Pid: 11, comm: xenwatch Tainted: G W O
> 3.3.5-1.pvops.qubes.x86_64 #1
> [ 1257.228825] Call Trace:
> [ 1257.228830] [<ffffffff810495aa>] warn_slowpath_common+0x7a/0xb0
> [ 1257.228836] [<ffffffff81049681>] warn_slowpath_fmt+0x41/0x50
> [ 1257.228842] [<ffffffff81057ba7>] ? lock_timer_base+0x37/0x70
> [ 1257.228850] [<ffffffff811a7433>] sysfs_attr_ns+0x93/0xa0
> [ 1257.228856] [<ffffffff811a7aef>] sysfs_remove_file+0x1f/0x40
> [ 1257.228862] [<ffffffff812e5622>] device_remove_file+0x12/0x20
> [ 1257.228870] [<ffffffffa00faf5a>] xennet_remove+0x84/0xac [xen_netfront]
> [ 1257.228875] [<ffffffff812b5c82>] xenbus_dev_remove+0x42/0xa0
> [ 1257.228881] [<ffffffff812e85a7>] __device_release_driver+0x77/0xd0
> [ 1257.228887] [<ffffffff812e86e8>] device_release_driver+0x28/0x40
> [ 1257.228895] [<ffffffff812e790f>] bus_remove_device+0x10f/0x180
> [ 1257.228901] [<ffffffff812e5808>] device_del+0x118/0x1c0
> [ 1257.228906] [<ffffffff812e58cd>] device_unregister+0x1d/0x60
> [ 1257.228914] [<ffffffff812b5a46>] xenbus_dev_changed+0x96/0x1b0
> [ 1257.228920] [<ffffffff812b74b4>] frontend_changed+0x24/0x50
> [ 1257.228926] [<ffffffff812b4221>] xenwatch_thread+0xb1/0x170
> [ 1257.228933] [<ffffffff8106aea0>] ? wake_up_bit+0x40/0x40
> [ 1257.228939] [<ffffffff812b4170>] ? xenbus_thread+0x40/0x40
> [ 1257.228944] [<ffffffff8106a9a6>] kthread+0x96/0xa0
> [ 1257.228951] [<ffffffff81465724>] kernel_thread_helper+0x4/0x10
> [ 1257.228959] [<ffffffff8145c7fc>] ? retint_restore_args+0x5/0x6
> [ 1257.228964] [<ffffffff81465720>] ? gs_change+0x13/0x13
> [ 1257.228968] ---[ end trace 75286ef58ce0391f ]---
>
> But this seems rather irrelevant, as it seems like it is the netvm that
> is failing here, i.e. it doesn't generate ARP responses?
>
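Since note 1) depends on proxy_arp really being 1 on every vif* interface at the moment the hang occurs, it may be worth re-checking it then. A minimal sketch follows, assuming the standard /proc/sys/net/ipv4/conf layout mentioned above; the helper name proxy_arp_settings and its parameters are hypothetical.

```python
import glob
import os


def proxy_arp_settings(conf_dir="/proc/sys/net/ipv4/conf", pattern="vif*"):
    """Map each matching interface name to its proxy_arp value (a string)."""
    settings = {}
    for path in sorted(glob.glob(os.path.join(conf_dir, pattern, "proxy_arp"))):
        # The interface name is the directory that holds the proxy_arp file.
        iface = os.path.basename(os.path.dirname(path))
        with open(path) as handle:
            settings[iface] = handle.read().strip()
    return settings


if __name__ == "__main__":
    for iface, value in proxy_arp_settings().items():
        status = "ok" if value == "1" else "proxy_arp is off"
        print(f"{iface}: {value} ({status})")
```

Run in the netvm while the client VM is stuck sending ARP requests; an interface reported with a value other than 1 would point at configuration rather than at netback.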
> I would appreciate any help with this issue!
>
> Thanks,
> joanna.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
2012-05-16 12:18 The strange case of xen_netback not returning ARP replies Joanna Rutkowska
2012-05-22 19:53 ` Konrad Rzeszutek Wilk [this message]
2012-05-26 11:04 ` Joanna Rutkowska