From mboxrd@z Thu Jan  1 00:00:00 1970
From: Joanna Rutkowska <joanna@invisiblethingslab.com>
Subject: The strange case of xen_netback not returning ARP
	replies
Date: Wed, 16 May 2012 14:18:27 +0200
Message-ID: <4FB39B13.70707@invisiblethingslab.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============4412883461917978488=="
Return-path: <xen-devel-bounces@lists.xen.org>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Cc: Marek Marczykowski <marmarek@invisiblethingslab.com>
List-Id: xen-devel@lists.xenproject.org

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--===============4412883461917978488==
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature";
 boundary="------------enig25EA00E6119971304D270B7A"

--------------enig25EA00E6119971304D270B7A
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Hello,

I'm facing a rather strange problem with the netback interface. My setup
involves a netvm, which has some physical network interfaces assigned,
and a client VM running netfront (exposed as eth0), which is connected
to that netvm via the vif42.0 interface (as seen from the netvm in the
dumps below).

Now, the netvm has two physical network interfaces assigned:
1) A standard Intel AGN card (iwlwifi module, interface wlan0) -- this is
just an assigned PCI device.

2) A USB 3G modem (cdc_ncm module, usb0 interface) -- this has been made
available to the netvm by assigning it the whole USB controller to which
the 3G modem is connected. This works fine.

We do NAT in the netvm for the traffic coming in on vif* and send it out
through the default outgoing interface, e.g. wlan0. As long as I use
wlan0 for networking, everything works great. I've been using this setup
for years with no problems.

However, when I switch the netvm's default outgoing interface to usb0,
something strange happens. Networking via usb0 works fine for some time
(typically a few minutes), but then, after enough packets have been
exchanged, it suddenly stops working.

When I run tcpdump on the vif* interface, I can see that suddenly nobody
(in the netvm) replies to the ARP requests from the client VM (in this
dump the client VM has Xen ID = 42, IP = .5, and gateway = .1):

[root@netvm user]# tcpdump -ni vif42.0 arp
tcpdump: WARNING: vif42.0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vif42.0, link-type EN10MB (Ethernet), capture size 65535 bytes
13:41:55.031819 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
13:41:56.031860 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
13:41:57.031794 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
13:41:59.287308 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
13:42:00.283853 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
13:42:01.283816 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
13:42:03.231324 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length

... and this continues indefinitely.

For comparison, this is how it looks when I use networking via wlan0:

[root@netvm user]# tcpdump -ni vif42.0 arp
tcpdump: WARNING: vif42.0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vif42.0, link-type EN10MB (Ethernet), capture size 65535 bytes
13:39:00.215883 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
13:39:00.215911 ARP, Reply 10.137.1.1 is-at fe:ff:ff:ff:ff:ff, length 28
13:39:21.799844 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28
13:39:21.799869 ARP, Reply 10.137.1.1 is-at fe:ff:ff:ff:ff:ff, length 28

We can see that every once in a while an ARP request for 10.137.1.1
(the client VM's gateway, i.e. the netvm) appears, and it is immediately
answered (by the netvm, as I understand it).
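The difference between the two captures can also be checked mechanically,
e.g. over a longer capture. Below is a minimal sketch (a hypothetical
helper, not part of tcpdump or Xen) that parses `tcpdump -n` ARP output
and lists requests that get no reply within a timeout. The one-second
timeout is an arbitrary assumption, and the regular expressions assume
tcpdump's default one-line ARP format exactly as shown in the dumps.

```python
import re
from datetime import datetime

# Patterns for `tcpdump -n ... arp` lines like the ones captured on vif42.0.
# Only the timestamp and the addresses are captured; trailing text (including
# a truncated "length" field) is ignored.
REQUEST_RE = re.compile(r"^(\S+) ARP, Request who-has (\S+) tell (\S+),")
REPLY_RE = re.compile(r"^(\S+) ARP, Reply (\S+) is-at (\S+),")


def _parse_time(stamp):
    # tcpdump's default HH:MM:SS.ffffff timestamp; captures spanning
    # midnight are not handled by this sketch.
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def unanswered_requests(lines, timeout=1.0):
    """Return (timestamp, target IP) for each ARP request that is not
    followed by a reply for the same IP within `timeout` seconds."""
    events = []
    for line in lines:
        if m := REQUEST_RE.match(line):
            events.append(("request", m.group(1), m.group(2)))
        elif m := REPLY_RE.match(line):
            events.append(("reply", m.group(1), m.group(2)))

    unanswered = []
    for i, (kind, stamp, ip) in enumerate(events):
        if kind != "request":
            continue
        asked = _parse_time(stamp)
        # Look only at later events for a reply about the same IP.
        answered = any(
            k == "reply"
            and addr == ip
            and 0 <= (_parse_time(s) - asked).total_seconds() <= timeout
            for k, s, addr in events[i + 1:]
        )
        if not answered:
            unanswered.append((stamp, ip))
    return unanswered


# Lines copied from the two captures above.
wlan0_capture = [
    "13:39:00.215883 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28",
    "13:39:00.215911 ARP, Reply 10.137.1.1 is-at fe:ff:ff:ff:ff:ff, length 28",
]
usb0_capture = [
    "13:41:55.031819 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28",
    "13:41:56.031860 ARP, Request who-has 10.137.1.1 tell 10.137.1.5, length 28",
]
print(unanswered_requests(wlan0_capture))  # []
print(unanswered_requests(usb0_capture))   # both requests are listed
```

Feeding it a full capture taken across the switch from wlan0 to usb0
would show roughly when the replies stop.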

Now, this behavior seems really strange, because:

1) AFAIU, the ARP replies are/should be generated by the netback
interface in the netvm (vif*).

2) To provide this (dummy) ARP response, it shouldn't matter to the
netback code how the packets are later routed (via wlan0 vs. usb0), right?

3) ...yet, for some reason, when packets are later routed through usb0,
the netback is not willing to generate the ARP response???

Or am I misunderstanding this, and something else generates the ARP
responses? The final NIC?
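For reference, the 28-byte ARP reply seen in the wlan0 capture
("10.137.1.1 is-at fe:ff:ff:ff:ff:ff, length 28") uses the standard
Ethernet/IPv4 ARP layout from RFC 826. The sketch below packs such a
payload by hand to show which fields it carries; it says nothing about
which component in the netvm actually emits it. The client VM's MAC
address does not appear in the dumps, so the one used here is a
placeholder from the documentation range, and `build_arp_reply` is a
hypothetical helper.

```python
import ipaddress
import struct

# Ethernet/IPv4 ARP payload (RFC 826), network byte order, no padding:
# htype(2) ptype(2) hlen(1) plen(1) oper(2) sha(6) spa(4) tha(6) tpa(4)
ARP_FORMAT = "!HHBBH6s4s6s4s"
ARP_OP_REPLY = 2


def mac_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_arp_reply(sender_mac, sender_ip, target_mac, target_ip):
    """Pack the reply to 'who-has sender_ip tell target_ip'."""
    return struct.pack(
        ARP_FORMAT,
        1,       # hardware type: Ethernet
        0x0800,  # protocol type: IPv4
        6,       # hardware address length
        4,       # protocol address length
        ARP_OP_REPLY,
        mac_bytes(sender_mac),                   # "is-at" MAC
        ipaddress.IPv4Address(sender_ip).packed,  # address being resolved
        mac_bytes(target_mac),                   # requester's MAC
        ipaddress.IPv4Address(target_ip).packed,  # requester's IP
    )


# "10.137.1.1 is-at fe:ff:ff:ff:ff:ff", sent back to the client at 10.137.1.5.
# The client MAC below is a placeholder, not taken from the dumps.
reply = build_arp_reply("fe:ff:ff:ff:ff:ff", "10.137.1.1",
                        "00:00:5e:00:53:05", "10.137.1.5")
print(len(reply))  # 28, matching "length 28" in the capture
```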

Some additional notes:
1) We make sure to set /proc/sys/net/ipv4/conf/vif*/proxy_arp to 1.

2) When this "ARP hang" happens, networking (via usb0) still works fine
in the netvm itself (i.e. I can ping google.com from the netvm).

3) This has been tested with various kernels in the netvm (3.0.4, 3.2.7,
and 3.3.5), and all exhibit the same behavior.

4) There is nothing spectacular in the netvm's logs; however, I often see
this warning in the *client* VM:

[ 1257.228761] ------------[ cut here ]------------
[ 1257.228767] WARNING: at /home/user/qubes-src/kernel/kernel-3.3.5/linux-3.3.5/fs/sysfs/file.c:498 sysfs_attr_ns+0x93/0xa0()
[ 1257.228776] sysfs: kobject eth0 without dirent
[ 1257.228780] Modules linked in: iptable_raw bnep bluetooth rfkill ipt_MASQUERADE ipt_REJECT xt_state xt_tcpudp xen_netback iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables xen_netfront microcode pcspkr u2mfn(O) xen_blkback xen_evtchn autofs4 ext4 jbd2 crc16 dm_snapshot xen_blkfront [last unloaded: scsi_wait_scan]
[ 1257.228819] Pid: 11, comm: xenwatch Tainted: G        W  O 3.3.5-1.pvops.qubes.x86_64 #1
[ 1257.228825] Call Trace:
[ 1257.228830]  [<ffffffff810495aa>] warn_slowpath_common+0x7a/0xb0
[ 1257.228836]  [<ffffffff81049681>] warn_slowpath_fmt+0x41/0x50
[ 1257.228842]  [<ffffffff81057ba7>] ? lock_timer_base+0x37/0x70
[ 1257.228850]  [<ffffffff811a7433>] sysfs_attr_ns+0x93/0xa0
[ 1257.228856]  [<ffffffff811a7aef>] sysfs_remove_file+0x1f/0x40
[ 1257.228862]  [<ffffffff812e5622>] device_remove_file+0x12/0x20
[ 1257.228870]  [<ffffffffa00faf5a>] xennet_remove+0x84/0xac [xen_netfront]
[ 1257.228875]  [<ffffffff812b5c82>] xenbus_dev_remove+0x42/0xa0
[ 1257.228881]  [<ffffffff812e85a7>] __device_release_driver+0x77/0xd0
[ 1257.228887]  [<ffffffff812e86e8>] device_release_driver+0x28/0x40
[ 1257.228895]  [<ffffffff812e790f>] bus_remove_device+0x10f/0x180
[ 1257.228901]  [<ffffffff812e5808>] device_del+0x118/0x1c0
[ 1257.228906]  [<ffffffff812e58cd>] device_unregister+0x1d/0x60
[ 1257.228914]  [<ffffffff812b5a46>] xenbus_dev_changed+0x96/0x1b0
[ 1257.228920]  [<ffffffff812b74b4>] frontend_changed+0x24/0x50
[ 1257.228926]  [<ffffffff812b4221>] xenwatch_thread+0xb1/0x170
[ 1257.228933]  [<ffffffff8106aea0>] ? wake_up_bit+0x40/0x40
[ 1257.228939]  [<ffffffff812b4170>] ? xenbus_thread+0x40/0x40
[ 1257.228944]  [<ffffffff8106a9a6>] kthread+0x96/0xa0
[ 1257.228951]  [<ffffffff81465724>] kernel_thread_helper+0x4/0x10
[ 1257.228959]  [<ffffffff8145c7fc>] ? retint_restore_args+0x5/0x6
[ 1257.228964]  [<ffffffff81465720>] ? gs_change+0x13/0x13
[ 1257.228968] ---[ end trace 75286ef58ce0391f ]---

But this seems rather irrelevant, as it looks like the netvm is the one
failing here, i.e. it doesn't generate ARP responses?
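Note 1 above (proxy_arp on every vif*) can be applied and, when the
"ARP hang" occurs, re-checked with a small script. This is a minimal
sketch with hypothetical helper names; it assumes the standard Linux
per-interface sysctl tree under /proc/sys/net/ipv4/conf and must run as
root in the netvm. The `conf_root` parameter exists only so the
functions can be exercised against a scratch directory.

```python
from pathlib import Path

CONF_ROOT = "/proc/sys/net/ipv4/conf"  # per-interface IPv4 sysctls on Linux


def enable_proxy_arp(conf_root=CONF_ROOT, pattern="vif*"):
    """Write 1 to proxy_arp for every interface directory matching `pattern`.

    Returns the names of the interfaces that were updated.
    """
    updated = []
    for iface_dir in sorted(Path(conf_root).glob(pattern)):
        knob = iface_dir / "proxy_arp"
        if knob.exists():
            knob.write_text("1\n")
            updated.append(iface_dir.name)
    return updated


def proxy_arp_state(conf_root=CONF_ROOT, pattern="vif*"):
    """Map interface name -> current proxy_arp value (as a string)."""
    return {
        d.name: (d / "proxy_arp").read_text().strip()
        for d in sorted(Path(conf_root).glob(pattern))
        if (d / "proxy_arp").exists()
    }
```

Calling `proxy_arp_state()` right after the hang would show whether the
setting is still in place for vif42.0 at that moment.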

I would appreciate any help with this issue!

Thanks,
joanna.


--------------enig25EA00E6119971304D270B7A
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQEcBAEBAgAGBQJPs5sTAAoJEDaIqHeRBUM0v1YIAOGfmXQNk/8dDdwBpAe/tMf5
7BdpFdF3bZwyN9AvNFnN0gsdsY2aPMQV2WHna4mr25k1o3DJyZCXrjltZdIw7RJS
D8V6t4cW4J6qTddaSWQQrK/5ftVbIeN5MsNYsmJfWEb3eayuuGFQAD1Rfi70LRCP
LtB+K5fzkROBomkOglaSNtG+LtH3OMWEW5P0+FkN1aQqXsWwmYO7UX/Rzo0G/uOo
/7WkR3SysEpAaTHF0UEmZdGkuPxPrUfATGJT7T/yeBr1iw/1NYjMKMucwxWTVrJ/
YT+OtUrXZzxlOQ+13OA72vXYTCHXNW6UuTI/NYU1xhGyhIjGgbQHSuCpCRwLiOU=
=+ufZ
-----END PGP SIGNATURE-----

--------------enig25EA00E6119971304D270B7A--


--===============4412883461917978488==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--===============4412883461917978488==--