* dhclient, checksum and tap @ 2010-06-25 15:10 Michael S. Tsirkin 2010-06-25 18:21 ` David Miller 0 siblings, 1 reply; 7+ messages in thread From: Michael S. Tsirkin @ 2010-06-25 15:10 UTC (permalink / raw) To: Herbert Xu; +Cc: netdev Hi! I've been looking at the issue of checksum on dhcp packets: to recap, 8dc4194474159660d7f37c495e3fc3f10d0db8cc added a way for af_packet sockets to get the packet status. Unfortunately not all dhcp clients caught up with this development, so they are still broken when both server and client run on the same host, e.g. with bridge+tap. And of course virtualization adds another way to run old dhcp clients, so userspace virtio net in qemu has a hack to detect DHCP and fill in the checksum. I guess we could add this in vhost, as well. However, one wonders whether the tap driver is a better place for this work-around, that would help all users. Any objections against putting such code in tap? -- MST ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: dhclient, checksum and tap 2010-06-25 15:10 dhclient, checksum and tap Michael S. Tsirkin @ 2010-06-25 18:21 ` David Miller 2010-06-26 21:14 ` Michael S. Tsirkin 0 siblings, 1 reply; 7+ messages in thread From: David Miller @ 2010-06-25 18:21 UTC (permalink / raw) To: mst; +Cc: herbert.xu, netdev From: "Michael S. Tsirkin" <mst@redhat.com> Date: Fri, 25 Jun 2010 18:10:08 +0300 > I've been looking at the issue of checksum on > dhcp packets: to recap, 8dc4194474159660d7f37c495e3fc3f10d0db8cc > added a way for af_packet sockets to get the packet status. > Unfortunately not all dhcp clients caught up with > this development, so they are still broken > when both server and client run on the same host, > e.g. with bridge+tap. > > And of course virtualization adds another way to run > old dhcp clients, so userspace virtio net in qemu has > a hack to detect DHCP and fill in the checksum. > I guess we could add this in vhost, as well. > > However, one wonders whether the tap driver is a better place > for this work-around, that would help all users. > Any objections against putting such code in tap? We added the af_packet status as the migration path to deal with this issue in the cleanest manner possible. Putting a new hack into the TAP driver works contrary to that goal. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: dhclient, checksum and tap 2010-06-25 18:21 ` David Miller @ 2010-06-26 21:14 ` Michael S. Tsirkin 2010-06-27 3:03 ` David Miller 0 siblings, 1 reply; 7+ messages in thread From: Michael S. Tsirkin @ 2010-06-26 21:14 UTC (permalink / raw) To: David Miller; +Cc: herbert.xu, netdev On Fri, Jun 25, 2010 at 11:21:52AM -0700, David Miller wrote: > From: "Michael S. Tsirkin" <mst@redhat.com> > Date: Fri, 25 Jun 2010 18:10:08 +0300 > > > I've been looking at the issue of checksum on > > dhcp packets: to recap, 8dc4194474159660d7f37c495e3fc3f10d0db8cc > > added a way for af_packet sockets to get the packet status. > > Unfortunately not all dhcp clients caught up with > > this development, so they are still broken > > when both server and client run on the same host, > > e.g. with bridge+tap. > > > > And of course virtualization adds another way to run > > old dhcp clients, so userspace virtio net in qemu has > > a hack to detect DHCP and fill in the checksum. > > I guess we could add this in vhost, as well. > > > > However, one wonders whether the tap driver is a better place > > for this work-around, that would help all users. > > Any objections against putting such code in tap? > > We added the af_packet status as the migration path to deal with > this issue in the cleanest manner possible. Putting a new hack > into the TAP driver works contrary to that goal. Hmm, problem is, using the af_packets status requires userspace changes, and so does not help old clients. And for virt, clients might be running old kernels without this support. qemu has a hack to make old guests running within qemu work. I guess I can copy that hack into vhost - a bit ugly as I don't have access to the original skb there, so I will have to duplcate some logic, but doable. Is this what you suggest? OTOH if we had the workaround in tap, this could replace hacks in both vhost and qemu. -- MST ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: dhclient, checksum and tap 2010-06-26 21:14 ` Michael S. Tsirkin @ 2010-06-27 3:03 ` David Miller 2010-06-27 8:24 ` Michael S. Tsirkin 2010-06-27 15:46 ` [PATCH RFC] vhost-net: add dhclient work-around from userspace Michael S. Tsirkin 0 siblings, 2 replies; 7+ messages in thread From: David Miller @ 2010-06-27 3:03 UTC (permalink / raw) To: mst; +Cc: herbert.xu, netdev From: "Michael S. Tsirkin" <mst@redhat.com> Date: Sun, 27 Jun 2010 00:14:19 +0300 > On Fri, Jun 25, 2010 at 11:21:52AM -0700, David Miller wrote: >> We added the af_packet status as the migration path to deal with >> this issue in the cleanest manner possible. Putting a new hack >> into the TAP driver works contrary to that goal. > > Hmm, problem is, using the af_packets status requires > userspace changes, and so does not help old clients. > And for virt, clients might be running old kernels without this support. > qemu has a hack to make old guests running within qemu work. > I guess I can copy that hack into vhost - a bit ugly as I don't have > access to the original skb there, so I will have to duplcate some logic, > but doable. Is this what you suggest? OTOH if we had the workaround in > tap, this could replace hacks in both vhost and qemu. If you add the TAP thing you can _never_ remove it. Exactly for the same reason that the qemu thing can never be removed. It'll always be needed for the sake of old guests running old stuff. This is why I truly believe that keeping the af_packet status thing as the only kernel side assist is likely best in the long run. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: dhclient, checksum and tap 2010-06-27 3:03 ` David Miller @ 2010-06-27 8:24 ` Michael S. Tsirkin 2010-06-28 4:59 ` David Miller 2010-06-27 15:46 ` [PATCH RFC] vhost-net: add dhclient work-around from userspace Michael S. Tsirkin 1 sibling, 1 reply; 7+ messages in thread From: Michael S. Tsirkin @ 2010-06-27 8:24 UTC (permalink / raw) To: David Miller; +Cc: herbert.xu, netdev On Sat, Jun 26, 2010 at 08:03:20PM -0700, David Miller wrote: > From: "Michael S. Tsirkin" <mst@redhat.com> > Date: Sun, 27 Jun 2010 00:14:19 +0300 > > > On Fri, Jun 25, 2010 at 11:21:52AM -0700, David Miller wrote: > >> We added the af_packet status as the migration path to deal with > >> this issue in the cleanest manner possible. Putting a new hack > >> into the TAP driver works contrary to that goal. > > > > Hmm, problem is, using the af_packets status requires > > userspace changes, and so does not help old clients. > > And for virt, clients might be running old kernels without this support. > > qemu has a hack to make old guests running within qemu work. > > I guess I can copy that hack into vhost - a bit ugly as I don't have > > access to the original skb there, so I will have to duplcate some logic, > > but doable. Is this what you suggest? OTOH if we had the workaround in > > tap, this could replace hacks in both vhost and qemu. > > If you add the TAP thing you can _never_ remove it. Exactly for the > same reason that the qemu thing can never be removed. It'll always be > needed for the sake of old guests running old stuff. > > This is why I truly believe that keeping the af_packet status thing as > the only kernel side assist is likely best in the long run. Just to spell it out for me, you think the hack should be done in vhost-net? -- MST ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: dhclient, checksum and tap 2010-06-27 8:24 ` Michael S. Tsirkin @ 2010-06-28 4:59 ` David Miller 0 siblings, 0 replies; 7+ messages in thread From: David Miller @ 2010-06-28 4:59 UTC (permalink / raw) To: mst; +Cc: herbert.xu, netdev From: "Michael S. Tsirkin" <mst@redhat.com> Date: Sun, 27 Jun 2010 11:24:40 +0300 > Just to spell it out for me, you think the hack should be done > in vhost-net? Pretty much, yes. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH RFC] vhost-net: add dhclient work-around from userspace 2010-06-27 3:03 ` David Miller 2010-06-27 8:24 ` Michael S. Tsirkin @ 2010-06-27 15:46 ` Michael S. Tsirkin 1 sibling, 0 replies; 7+ messages in thread From: Michael S. Tsirkin @ 2010-06-27 15:46 UTC (permalink / raw) To: Sridhar Samudrala, David S. Miller, Arnd Bergmann, Paul E. McKenney, kvm Userspace virtio server has the following hack so guests rely on it, and we have to replicate it, too: use source port to detect incoming IPv4 DHCP response packets, and fill in the checksum for these. The issue we are solving is that on linux guests, some apps that use recvmsg with AF_PACKET sockets, don't know how to handle CHECKSUM_PARTIAL; The interface to return the relevant information was added in 8dc4194474159660d7f37c495e3fc3f10d0db8cc, and older userspace does not use it. One important user of recvmsg with AF_PACKET is dhclient, so we add a work-around just for DHCP. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> --- So here's what I came up with: I basically copied the work-around from userspace virtio. As suggested by Dave (assuming I understood the suggestion correctly) this implements the workaround in vhost-net, so other tun users don't start relying on it. Untested. drivers/vhost/net.c | 42 +++++++++++++++++++++++++++++++++++++++++- 1 files changed, 41 insertions(+), 1 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 54096ee..9ed4051 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -25,6 +25,10 @@ #include <linux/if_tun.h> #include <linux/if_macvlan.h> +#include <linux/ip.h> +#include <linux/udp.h> +#include <linux/netdevice.h> + #include <net/sock.h> #include "vhost.h" @@ -191,6 +195,42 @@ static void handle_tx(struct vhost_net *net) unuse_mm(net->dev.mm); } +static int peek_head(struct sock *sk) +{ + struct sk_buff *head; + int ret; + + lock_sock(sk); + head = skb_peek(&sk->sk_receive_queue); + if (likely(head)) { + ret = 1; + /* Userspace virtio server has the following hack so + * guests rely on it, and we have to replicate it, too: */ + /* On linux guests, some apps that use recvmsg with AF_PACKET + * sockets, don't know how to handle CHECKSUM_PARTIAL; + * The interface to return the relevant information was added in + * 8dc4194474159660d7f37c495e3fc3f10d0db8cc, + * and older userspace does not use it. + * One important user of recvmsg with AF_PACKET is dhclient, + * so we add a work-around just for DHCP. */ + /* We use source port to detect DHCP packets. */ + if (skb->ip_summed == CHECKSUM_PARTIAL && + skb->protocol == htons(ETH_P_IP) && + skb_network_header_len(skb) >= sizeof(struct iphdr) && + ip_hdr(skb)->protocol == IPPRODO_UDP && + skb_headlen(skb) >= skb_transport_offset(skb) + sizeof(struct udphdr) && + udp_hdr(skb)->source == htons(0x67)) { + skb_checksum_help(skb); + /* Restore ip_summed value: tun passes it to user. */ + skb->ip_summed = CHECKSUM_PARTIAL; + } + } else { + ret = 0; + } + release_sock(sk); + return len; +} + /* Expects to be always run from workqueue - which acts as * read-size critical section for our kind of RCU. */ static void handle_rx(struct vhost_net *net) @@ -228,7 +268,7 @@ static void handle_rx(struct vhost_net *net) vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ? vq->log : NULL; - for (;;) { + while (peek_head(sock)) { head = vhost_get_vq_desc(&net->dev, vq, vq->iov, ARRAY_SIZE(vq->iov), &out, &in, -- 1.7.1.12.g42b7f ^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2010-06-28 4:59 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-06-25 15:10 dhclient, checksum and tap Michael S. Tsirkin 2010-06-25 18:21 ` David Miller 2010-06-26 21:14 ` Michael S. Tsirkin 2010-06-27 3:03 ` David Miller 2010-06-27 8:24 ` Michael S. Tsirkin 2010-06-28 4:59 ` David Miller 2010-06-27 15:46 ` [PATCH RFC] vhost-net: add dhclient work-around from userspace Michael S. Tsirkin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).