From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57246) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WPro5-0007V1-8O for qemu-devel@nongnu.org; Tue, 18 Mar 2014 07:03:47 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WPrnz-0001RW-3d for qemu-devel@nongnu.org; Tue, 18 Mar 2014 07:03:41 -0400 Received: from rcdn-iport-8.cisco.com ([173.37.86.79]:60579) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WPrny-0001R7-PE for qemu-devel@nongnu.org; Tue, 18 Mar 2014 07:03:35 -0400 From: "Anton Ivanov (antivano)" Date: Tue, 18 Mar 2014 11:03:30 +0000 Message-ID: <53282801.9090509@cisco.com> References: <1394541930-410778-1-git-send-email-anton.ivanov@kot-begemot.co.uk> <20140318104404.GA31452@stefanha-thinkpad.redhat.com> In-Reply-To: <20140318104404.GA31452@stefanha-thinkpad.redhat.com> Content-Language: en-US Content-Type: text/plain; charset="iso-8859-1" Content-ID: Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [Qemu-devel] [PATCH v3] net: L2TPv3 transport List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi Cc: "pbonzini@redhat.com" , "anton.ivanov@kot-begemot.co.uk" , "afaerber@suse.de" , "qemu-devel@nongnu.org" On 18/03/14 10:44, Stefan Hajnoczi wrote: > On Tue, Mar 11, 2014 at 12:45:30PM +0000, anton.ivanov@kot-begemot.co.uk = wrote: >> +/* This protocol number is passed getaddrinfo(), and not >> + * used directly. If you give gettaddrinfo() what one is >> + * supposed to give - INET, RAW, 0, the result is not >> + * set correctly. >> + * Setting the args to INET, RAW, L2TPv3 is the "lowest pain" >> + * workaround in this case as it allows common raw and udp >> + * setup. >> + */ >> + >> +#ifndef IPPROTO_L2TP >> +#define IPPROTO_L2TP 0x73 >> +#endif > The comment is incorrect. Using IPPROTO_L2TP is not a workaround, it's > the expected way to implement an IP-level protocol in userspace. > > getaddrinfo() returns protocol IPPROTO_L2TP unchanged. Then we call > socket(AF_INET, SOCK_RAW, IPPROTO_L2TP). > > INET, RAW, L2TPv3 tells the kernel that we wish to receive IP packets > when the protocol field is equal to IPPROTO_L2TP. If we didn't tell the > kernel which IP protocol to listen for then the socket would not receive > packets (IPPROTO_RAW is send-only). > > I suggest changing the comment simply to: > /* IANA-assigned IP protocol ID for L2TPv3 */ OK :) > >> +static ssize_t net_l2tpv3_receive_dgram_iov(NetClientState *nc, >> + const struct iovec *iov, >> + int iovcnt) >> +{ >> + NetL2TPV3State *s =3D DO_UPCAST(NetL2TPV3State, nc, nc); >> + >> + struct msghdr message; >> + int ret; >> + >> + if (iovcnt > MAX_L2TPV3_IOVCNT - 1) { >> + error_report( >> + "iovec too long %d > %d, change l2tpv3.h", >> + iovcnt, MAX_L2TPV3_IOVCNT >> + ); >> + return -1; >> + } >> + l2tpv3_form_header(s); >> + memcpy(s->vec + 1, iov, iovcnt * sizeof(struct iovec)); >> + s->vec->iov_base =3D s->header_buf; >> + s->vec->iov_len =3D s->offset; >> + message.msg_name =3D s->dgram_dst; >> + message.msg_namelen =3D s->dst_size; >> + message.msg_iov =3D s->vec; >> + message.msg_iovlen =3D iovcnt + 1; >> + message.msg_control =3D NULL; >> + message.msg_controllen =3D 0; >> + message.msg_flags =3D 0; >> + do { >> + ret =3D sendmsg(s->fd, &message, 0); >> + } while ((ret =3D=3D -1) && (errno =3D=3D EINTR)); >> + if (ret > 0) { >> + ret -=3D s->offset; >> + } else if (ret =3D=3D 0) { >> + /* belt and braces - should not occur on DGRAM >> + * we should get an error and never a 0 send >> + */ >> + ret =3D iov_size(iov, iovcnt); >> + } else { >> + /* signal upper layer that socket buffer is full */ >> + ret =3D -errno; >> + if (ret =3D=3D -EAGAIN || ret =3D=3D -ENOBUFS) { >> + ret =3D 0; >> + } >> + } >> + return ret; >> +} > Returning 0 means the peer (the emulated NIC) will stop sending packets > until further notice. Since qemu_flush_queued_packets() is not invoked > anywhere in this source file this will result in deadlock! > > What needs to happen is: > 1. We begin monitoring s->fd to see when it becomes writable again. > 2. When it becomes writable we call qemu_flush_queued_packets() so the > peer can resuming transmission. > > See net/tap.c or other backends for examples of how this is done. OK. Understood. Will add it. I will also add it raw/packet mmap which is almost ready for submission. > >> +static int l2tpv3_verify_header(NetL2TPV3State *s, uint8_t *buf) >> +{ >> + >> + uint32_t *session; >> + uint64_t cookie; >> + >> + if ((!s->udp) && (!s->ipv6)) { >> + buf +=3D sizeof(struct iphdr) /* fix for ipv4 raw */; >> + } >> + if (s->cookie) { >> + if (s->cookie_is_64) { >> + cookie =3D ldq_be_p(buf + s->cookie_offset); >> + } else { >> + cookie =3D ldl_be_p(buf + s->cookie_offset); >> + } >> + if (cookie !=3D s->rx_cookie) { >> + error_report("unknown cookie id\n"); >> + return -1; >> + } >> + } >> + session =3D (uint32_t *) (buf + s->session_offset); >> + if (ldl_be_p(session) !=3D s->rx_session) { >> + error_report("session mismatch"); >> + return -1; >> + } >> + return 0; >> +} >> + >> +static void net_l2tpv3_send(void *opaque) >> +{ >> + NetL2TPV3State *s =3D opaque; >> + >> + int i, count, offset; >> + struct mmsghdr *msgvec; >> + struct iovec *vec; >> + >> + msgvec =3D s->msgvec; >> + offset =3D s->offset; >> + if ((!s->udp) && (!s->ipv6)) { >> + offset +=3D sizeof(struct iphdr); >> + } >> + count =3D recvmmsg(s->fd, msgvec, MAX_L2TPV3_MSGCNT, MSG_DONTWAIT, = NULL); >> + for (i =3D 0; i < count; i++) { >> + if (msgvec->msg_len > 0) { >> + vec =3D msgvec->msg_hdr.msg_iov; >> + if ((msgvec->msg_len > offset) && >> + (l2tpv3_verify_header(s, vec->iov_base) =3D=3D 0)) { >> + vec++; >> + qemu_send_packet(&s->nc, vec->iov_base, >> + msgvec->msg_len - offset); >> + } else { >> + error_report("l2tpv3 header verification failed"); > This and the error_report() ins l2tpv3_verify_header() could be used as > a Denial of Service: a malicious host may be able to send invalid > packets to fill up the host's disk with error messages. > > These are good candidates for rate-limiting or even once-only error > messages. The error_report() API currently does not have a way to do > that. > > I don't insist on this but it's something worth changing in the future. Agreed - we just increment the dropped count it in the UML counterpart (I removed this for that exact reason there). > >> diff --git a/qemu-options.hx b/qemu-options.hx >> index 8b94264..ce4d99d 100644 >> --- a/qemu-options.hx >> +++ b/qemu-options.hx >> @@ -1395,19 +1395,28 @@ DEF("net", HAS_ARG, QEMU_OPTION_net, >> " (default=3D" DEFAULT_BRIDGE_INTERFACE ") using the= program 'helper'\n" >> " (default=3D" DEFAULT_BRIDGE_HELPER ")\n" >> #endif >> - "-net socket[,vlan=3Dn][,name=3Dstr][,fd=3Dh][,listen=3D[host]:port= ][,connect=3Dhost:port]\n" >> - " connect the vlan 'n' to another VLAN using a socke= t connection\n" >> - "-net socket[,vlan=3Dn][,name=3Dstr][,fd=3Dh][,mcast=3Dmaddr:port[,= localaddr=3Daddr]]\n" >> - " connect the vlan 'n' to multicast maddr and port\n= " >> - " use 'localaddr=3Daddr' to specify the host address= to send packets from\n" >> - "-net socket[,vlan=3Dn][,name=3Dstr][,fd=3Dh][,udp=3Dhost:port][,lo= caladdr=3Dhost:port]\n" >> - " connect the vlan 'n' to another VLAN using an UDP = tunnel\n" >> -#ifdef CONFIG_VDE >> - "-net vde[,vlan=3Dn][,name=3Dstr][,sock=3Dsocketpath][,port=3Dn][,g= roup=3Dgroupname][,mode=3Doctalmode]\n" >> - " connect the vlan 'n' to port 'n' of a vde switch r= unning\n" >> - " on host and listening for incoming connections on = 'socketpath'.\n" >> - " Use group 'groupname' and mode 'octalmode' to chan= ge default\n" >> - " ownership and permissions for communication port.\= n" > Please leave socket and vde. They are not being deprecated yet. Apologies, did not have the intention of killing them :) A.