From: Benoît Canet
Date: Thu, 3 Apr 2014 14:20:40 +0200
To: anton.ivanov@kot-begemot.co.uk
Cc: Anton Ivanov, pbonzini@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com, afaerber@suse.de
Subject: Re: [Qemu-devel] [PATCH v7] net: L2TPv3 transport
Message-ID: <20140403122039.GA4672@irqsave.net>
In-Reply-To: <1396276759-189034-1-git-send-email-anton.ivanov@kot-begemot.co.uk>
References: <1396276759-189034-1-git-send-email-anton.ivanov@kot-begemot.co.uk>

On Monday 31 Mar 2014 at 15:39:19 (+0100), anton.ivanov@kot-begemot.co.uk wrote:
> From: Anton Ivanov
>
> This transport allows to connect a QEMU nic to a static Ethernet
> over L2TPv3 tunnel. The transport supports all options present
> in the Linux kernel implementation. It allows QEMU to connect
> to any Linux host running kernel 3.3+, most routers and network
> devices as well as other QEMU instances.
>
> Signed-off-by: Anton Ivanov
> ---

Hi,

Which branch does this patch apply to?
I failed to apply it on master.

Best regards

Benoît

>
> Addressed in this release:
>
> 1. 
Back to qemu send_packet instead of direct delivery from the
> recvmmsg ring. The zero copy off driver rx ring will be reintroduced
> in a later patch
>
> 2. Fixed mismerge of header size handling from our tree
>
> 3. Fixed formatting
>
>
>  net/Makefile.objs |    1 +
>  net/clients.h     |    2 +
>  net/l2tpv3.c      |  745 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  net/net.c         |    3 +
>  qapi-schema.json  |   60 +++++
>  qemu-options.hx   |   82 ++++++
>  6 files changed, 893 insertions(+)
>  create mode 100644 net/l2tpv3.c
>
> diff --git a/net/Makefile.objs b/net/Makefile.objs
> index 4854a14..160214e 100644
> --- a/net/Makefile.objs
> +++ b/net/Makefile.objs
> @@ -2,6 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o
>  common-obj-y += socket.o
>  common-obj-y += dump.o
>  common-obj-y += eth.o
> +common-obj-$(CONFIG_LINUX) += l2tpv3.o
>  common-obj-$(CONFIG_POSIX) += tap.o
>  common-obj-$(CONFIG_LINUX) += tap-linux.o
>  common-obj-$(CONFIG_WIN32) += tap-win32.o
> diff --git a/net/clients.h b/net/clients.h
> index 7793294..bbf177c 100644
> --- a/net/clients.h
> +++ b/net/clients.h
> @@ -47,6 +47,8 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
>  int net_init_bridge(const NetClientOptions *opts, const char *name,
>                      NetClientState *peer);
>  
> +int net_init_l2tpv3(const NetClientOptions *opts, const char *name,
> +                    NetClientState *peer);
>  #ifdef CONFIG_VDE
>  int net_init_vde(const NetClientOptions *opts, const char *name,
>                   NetClientState *peer);
> diff --git a/net/l2tpv3.c b/net/l2tpv3.c
> new file mode 100644
> index 0000000..4439ab7
> --- /dev/null
> +++ b/net/l2tpv3.c
> @@ -0,0 +1,745 @@
> +/*
> + * QEMU System Emulator
> + *
> + * Copyright (c) 2003-2008 Fabrice Bellard
> + * Copyright (c) 2012-2014 Cisco Systems
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software 
without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include
> +#include
> +#include "config-host.h"
> +#include "net/net.h"
> +#include "clients.h"
> +#include "monitor/monitor.h"
> +#include "qemu-common.h"
> +#include "qemu/error-report.h"
> +#include "qemu/option.h"
> +#include "qemu/sockets.h"
> +#include "qemu/iov.h"
> +#include "qemu/main-loop.h"
> +
> +
> +/* The buffer size needs to be investigated for optimum numbers and
> + * optimum means of paging in on different systems. 
This size is
> + * chosen to be sufficient to accommodate one packet with some headers
> + */
> +
> +#define BUFFER_ALIGN sysconf(_SC_PAGESIZE)
> +#define BUFFER_SIZE 2048
> +#define IOVSIZE 2
> +#define MAX_L2TPV3_MSGCNT 64
> +#define MAX_L2TPV3_IOVCNT (MAX_L2TPV3_MSGCNT * IOVSIZE)
> +
> +/* Header set to 0x30000 signifies a data packet */
> +
> +#define L2TPV3_DATA_PACKET 0x30000
> +
> +/* IANA-assigned IP protocol ID for L2TPv3 */
> +
> +#ifndef IPPROTO_L2TP
> +#define IPPROTO_L2TP 0x73
> +#endif
> +
> +typedef struct NetL2TPV3State {
> +    NetClientState nc;
> +    int fd;
> +
> +    /*
> +     * these are used for xmit - that happens packet a time
> +     * and for first sign of life packet (easier to parse that once)
> +     */
> +
> +    uint8_t *header_buf;
> +    struct iovec *vec;
> +
> +    /*
> +     * these are used for receive - try to "eat" up to 32 packets at a time
> +     */
> +
> +    struct mmsghdr *msgvec;
> +
> +    /*
> +     * peer address
> +     */
> +
> +    struct sockaddr_storage *dgram_dst;
> +    uint32_t dst_size;
> +
> +    /*
> +     * L2TPv3 parameters
> +     */
> +
> +    uint64_t rx_cookie;
> +    uint64_t tx_cookie;
> +    uint32_t rx_session;
> +    uint32_t tx_session;
> +    uint32_t header_size;
> +    uint32_t counter;
> +
> +    /*
> +     * DOS avoidance in error handling
> +     */
> +
> +    bool header_mismatch;
> +
> +    /*
> +     * Ring buffer handling
> +     */
> +
> +    int queue_head;
> +    int queue_tail;
> +    int queue_depth;
> +
> +    /*
> +     * Precomputed offsets
> +     */
> +
> +    uint32_t offset;
> +    uint32_t cookie_offset;
> +    uint32_t counter_offset;
> +    uint32_t session_offset;
> +
> +    /* Poll Control */
> +
> +    bool read_poll;
> +    bool write_poll;
> +
> +    /* Flags */
> +
> +    bool ipv6;
> +    bool udp;
> +    bool has_counter;
> +    bool pin_counter;
> +    bool cookie;
> +    bool cookie_is_64;
> +
> +} NetL2TPV3State;
> +
> +static int l2tpv3_can_send(void *opaque);
> +static void net_l2tpv3_send(void *opaque);
> +static void l2tpv3_writable(void *opaque);
> +
> +static void l2tpv3_update_fd_handler(NetL2TPV3State *s)
> +{
> +
qemu_set_fd_handler2(s->fd,
> +                         s->read_poll ? l2tpv3_can_send : NULL,
> +                         s->read_poll ? net_l2tpv3_send : NULL,
> +                         s->write_poll ? l2tpv3_writable : NULL,
> +                         s);
> +}
> +
> +static void l2tpv3_read_poll(NetL2TPV3State *s, bool enable)
> +{
> +    if (s->read_poll != enable) {
> +        s->read_poll = enable;
> +        l2tpv3_update_fd_handler(s);
> +    }
> +}
> +
> +static void l2tpv3_write_poll(NetL2TPV3State *s, bool enable)
> +{
> +    if (s->write_poll != enable) {
> +        s->write_poll = enable;
> +        l2tpv3_update_fd_handler(s);
> +    }
> +}
> +
> +static void l2tpv3_writable(void *opaque)
> +{
> +    NetL2TPV3State *s = opaque;
> +    l2tpv3_write_poll(s, false);
> +    qemu_flush_queued_packets(&s->nc);
> +}
> +
> +static int l2tpv3_can_send(void *opaque)
> +{
> +    NetL2TPV3State *s = opaque;
> +
> +    return qemu_can_send_packet(&s->nc);
> +}
> +
> +static void l2tpv3_send_completed(NetClientState *nc, ssize_t len)
> +{
> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
> +    l2tpv3_read_poll(s, true);
> +}
> +
> +static void l2tpv3_poll(NetClientState *nc, bool enable)
> +{
> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
> +    l2tpv3_write_poll(s, enable);
> +    l2tpv3_read_poll(s, enable);
> +}
> +
> +static void l2tpv3_form_header(NetL2TPV3State *s)
> +{
> +    uint32_t *counter;
> +
> +    if (s->udp) {
> +        stl_be_p((uint32_t *) s->header_buf, L2TPV3_DATA_PACKET);
> +    }
> +    stl_be_p(
> +        (uint32_t *) (s->header_buf + s->session_offset),
> +        s->tx_session
> +    );
> +    if (s->cookie) {
> +        if (s->cookie_is_64) {
> +            stq_be_p(
> +                (uint64_t *)(s->header_buf + s->cookie_offset),
> +                s->tx_cookie
> +            );
> +        } else {
> +            stl_be_p(
> +                (uint32_t *) (s->header_buf + s->cookie_offset),
> +                s->tx_cookie
> +            );
> +        }
> +    }
> +    if (s->has_counter) {
> +        counter = (uint32_t *)(s->header_buf + s->counter_offset);
> +        if (s->pin_counter) {
> +            *counter = 0;
> +        } else {
> +            stl_be_p(counter, ++s->counter);
> +        }
> +    }
> +}
> +
> +static ssize_t 
net_l2tpv3_receive_dgram_iov(NetClientState *nc,
> +                    const struct iovec *iov,
> +                    int iovcnt)
> +{
> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
> +
> +    struct msghdr message;
> +    int ret;
> +
> +    if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
> +        error_report(
> +            "iovec too long %d > %d, change l2tpv3.h",
> +            iovcnt, MAX_L2TPV3_IOVCNT
> +        );
> +        return -1;
> +    }
> +    l2tpv3_form_header(s);
> +    memcpy(s->vec + 1, iov, iovcnt * sizeof(struct iovec));
> +    s->vec->iov_base = s->header_buf;
> +    s->vec->iov_len = s->offset;
> +    message.msg_name = s->dgram_dst;
> +    message.msg_namelen = s->dst_size;
> +    message.msg_iov = s->vec;
> +    message.msg_iovlen = iovcnt + 1;
> +    message.msg_control = NULL;
> +    message.msg_controllen = 0;
> +    message.msg_flags = 0;
> +    do {
> +        ret = sendmsg(s->fd, &message, 0);
> +    } while ((ret == -1) && (errno == EINTR));
> +    if (ret > 0) {
> +        ret -= s->offset;
> +    } else if (ret == 0) {
> +        /* belt and braces - should not occur on DGRAM
> +        * we should get an error and never a 0 send
> +        */
> +        ret = iov_size(iov, iovcnt);
> +    } else {
> +        /* signal upper layer that socket buffer is full */
> +        ret = -errno;
> +        if (ret == -EAGAIN || ret == -ENOBUFS) {
> +            l2tpv3_write_poll(s, true);
> +            ret = 0;
> +        }
> +    }
> +    return ret;
> +}
> +
> +static ssize_t net_l2tpv3_receive_dgram(NetClientState *nc,
> +                    const uint8_t *buf,
> +                    size_t size)
> +{
> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
> +
> +    struct iovec *vec;
> +    struct msghdr message;
> +    ssize_t ret = 0;
> +
> +    l2tpv3_form_header(s);
> +    vec = s->vec;
> +    vec->iov_base = s->header_buf;
> +    vec->iov_len = s->offset;
> +    vec++;
> +    vec->iov_base = (void *) buf;
> +    vec->iov_len = size;
> +    message.msg_name = s->dgram_dst;
> +    message.msg_namelen = s->dst_size;
> +    message.msg_iov = s->vec;
> +    message.msg_iovlen = 2;
> +    message.msg_control = NULL;
> +    message.msg_controllen = 0;
> +
message.msg_flags = 0;
> +    do {
> +        ret = sendmsg(s->fd, &message, 0);
> +    } while ((ret == -1) && (errno == EINTR));
> +    if (ret > 0) {
> +        ret -= s->offset;
> +    } else if (ret == 0) {
> +        /* belt and braces - should not occur on DGRAM
> +        * we should get an error and never a 0 send
> +        */
> +        ret = size;
> +    } else {
> +        ret = -errno;
> +        if (ret == -EAGAIN || ret == -ENOBUFS) {
> +            /* signal upper layer that socket buffer is full */
> +            l2tpv3_write_poll(s, true);
> +            ret = 0;
> +        }
> +    }
> +    return ret;
> +}
> +
> +static int l2tpv3_verify_header(NetL2TPV3State *s, uint8_t *buf)
> +{
> +
> +    uint32_t *session;
> +    uint64_t cookie;
> +
> +    if ((!s->udp) && (!s->ipv6)) {
> +        buf += sizeof(struct iphdr) /* fix for ipv4 raw */;
> +    }
> +
> +    /* we do not do a strict check for "data" packets as per
> +    * the RFC spec because the pure IP spec does not have
> +    * that anyway.
> +    */
> +
> +    if (s->cookie) {
> +        if (s->cookie_is_64) {
> +            cookie = ldq_be_p(buf + s->cookie_offset);
> +        } else {
> +            cookie = ldl_be_p(buf + s->cookie_offset);
> +        }
> +        if (cookie != s->rx_cookie) {
> +            if (!s->header_mismatch) {
> +                error_report("unknown cookie id");
> +            }
> +            return -1;
> +        }
> +    }
> +    session = (uint32_t *) (buf + s->session_offset);
> +    if (ldl_be_p(session) != s->rx_session) {
> +        if (!s->header_mismatch) {
> +            error_report("session mismatch");
> +        }
> +        return -1;
> +    }
> +    return 0;
> +}
> +
> +static void net_l2tpv3_process_queue(NetL2TPV3State *s)
> +{
> +    int size = 0;
> +    struct iovec *vec;
> +    bool bad_read;
> +    int data_size;
> +    struct mmsghdr *msgvec;
> +
> +    /* go into ring mode only if there is a "pending" tail */
> +    if (s->queue_depth > 0) {
> +        do {
> +            msgvec = s->msgvec + s->queue_tail;
> +            if (msgvec->msg_len > 0) {
> +                data_size = msgvec->msg_len - s->header_size;
> +                vec = msgvec->msg_hdr.msg_iov;
> +                if ((data_size > 0) &&
> +                    (l2tpv3_verify_header(s, vec->iov_base) == 0)) {
> +                    vec++;
> +
/* Use the legacy delivery for now, we will
> +                     * switch to using our own ring as a queueing mechanism
> +                     * at a later date
> +                     */
> +                    size = qemu_send_packet_async(
> +                            &s->nc,
> +                            vec->iov_base,
> +                            data_size,
> +                            l2tpv3_send_completed
> +                        );
> +                    bad_read = false;
> +                } else {
> +                    bad_read = true;
> +                    if (!s->header_mismatch) {
> +                        /* report error only once */
> +                        error_report("l2tpv3 header verification failed");
> +                        s->header_mismatch = true;
> +                    }
> +                }
> +            } else {
> +                bad_read = true;
> +            }
> +            if ((bad_read) || (size > 0)) {
> +                s->queue_tail = (s->queue_tail + 1) % MAX_L2TPV3_MSGCNT;
> +                s->queue_depth--;
> +            }
> +        } while (
> +                (s->queue_depth > 0) &&
> +                 qemu_can_send_packet(&s->nc) &&
> +                ((size > 0) || bad_read)
> +            );
> +    }
> +}
> +
> +static void net_l2tpv3_send(void *opaque)
> +{
> +    NetL2TPV3State *s = opaque;
> +    int target_count, count;
> +    struct mmsghdr *msgvec;
> +
> +    /* go into ring mode only if there is a "pending" tail */
> +
> +    if (s->queue_depth) {
> +
> +        /* The ring buffer we use has variable intake
> +         * count of how much we can read varies - adjust accordingly
> +         */
> +
> +        target_count = MAX_L2TPV3_MSGCNT - s->queue_depth;
> +
> +        /* Ensure we do not overrun the ring when we have
> +         * a lot of enqueued packets
> +         */
> +
> +        if (s->queue_head + target_count > MAX_L2TPV3_MSGCNT) {
> +            target_count = MAX_L2TPV3_MSGCNT - s->queue_head;
> +        }
> +    } else {
> +
> +        /* we do not have any pending packets - we can use
> +        * the whole message vector linearly instead of using
> +        * it as a ring
> +        */
> +
> +        s->queue_head = 0;
> +        s->queue_tail = 0;
> +        target_count = MAX_L2TPV3_MSGCNT;
> +    }
> +
> +    msgvec = s->msgvec + s->queue_head;
> +    if (target_count > 0) {
> +        do {
> +            count = recvmmsg(
> +                s->fd,
> +                msgvec,
> +                target_count, MSG_DONTWAIT, NULL);
> +        } while ((count == -1) && (errno == EINTR));
> +        if (count < 0) {
> +            /* Recv error - we still need to flush packets here,
> +             * (re)set 
queue head to current position
> +             */
> +            count = 0;
> +        }
> +        s->queue_head = (s->queue_head + count) % MAX_L2TPV3_MSGCNT;
> +        s->queue_depth += count;
> +    }
> +    net_l2tpv3_process_queue(s);
> +}
> +
> +static void destroy_vector(struct mmsghdr *msgvec, int count, int iovcount)
> +{
> +    int i, j;
> +    struct iovec *iov;
> +    struct mmsghdr *cleanup = msgvec;
> +    if (cleanup) {
> +        for (i = 0; i < count; i++) {
> +            if (cleanup->msg_hdr.msg_iov) {
> +                iov = cleanup->msg_hdr.msg_iov;
> +                for (j = 0; j < iovcount; j++) {
> +                    g_free(iov->iov_base);
> +                    iov++;
> +                }
> +                g_free(cleanup->msg_hdr.msg_iov);
> +            }
> +            cleanup++;
> +        }
> +        g_free(msgvec);
> +    }
> +}
> +
> +static struct mmsghdr *build_l2tpv3_vector(NetL2TPV3State *s, int count)
> +{
> +    int i;
> +    struct iovec *iov;
> +    struct mmsghdr *msgvec, *result;
> +
> +    msgvec = g_malloc(sizeof(struct mmsghdr) * count);
> +    result = msgvec;
> +    for (i = 0; i < count ; i++) {
> +        msgvec->msg_hdr.msg_name = NULL;
> +        msgvec->msg_hdr.msg_namelen = 0;
> +        iov =  g_malloc(sizeof(struct iovec) * IOVSIZE);
> +        msgvec->msg_hdr.msg_iov = iov;
> +        iov->iov_base = g_malloc(s->header_size);
> +        iov->iov_len = s->header_size;
> +        iov++ ;
> +        iov->iov_base = qemu_memalign(BUFFER_ALIGN, BUFFER_SIZE);
> +        iov->iov_len = BUFFER_SIZE;
> +        msgvec->msg_hdr.msg_iovlen = 2;
> +        msgvec->msg_hdr.msg_control = NULL;
> +        msgvec->msg_hdr.msg_controllen = 0;
> +        msgvec->msg_hdr.msg_flags = 0;
> +        msgvec++;
> +    }
> +    return result;
> +}
> +
> +static void net_l2tpv3_cleanup(NetClientState *nc)
> +{
> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
> +    qemu_purge_queued_packets(nc);
> +    l2tpv3_read_poll(s, false);
> +    l2tpv3_write_poll(s, false);
> +    close(s->fd);
> +    destroy_vector(s->msgvec, MAX_L2TPV3_MSGCNT, IOVSIZE);
> +    g_free(s->header_buf);
> +    g_free(s->dgram_dst);
> +}
> +
> +static NetClientInfo net_l2tpv3_info = {
> +    .type = NET_CLIENT_OPTIONS_KIND_L2TPV3,
> +    .size = 
sizeof(NetL2TPV3State),
> +    .receive = net_l2tpv3_receive_dgram,
> +    .receive_iov = net_l2tpv3_receive_dgram_iov,
> +    .poll = l2tpv3_poll,
> +    .cleanup = net_l2tpv3_cleanup,
> +};
> +
> +int net_init_l2tpv3(const NetClientOptions *opts,
> +                    const char *name,
> +                    NetClientState *peer)
> +{
> +
> +
> +    const NetdevL2TPv3Options *l2tpv3;
> +    NetL2TPV3State *s;
> +    NetClientState *nc;
> +    int fd = -1, gairet;
> +    struct addrinfo hints;
> +    struct addrinfo *result = NULL;
> +    char *srcport, *dstport;
> +
> +    nc = qemu_new_net_client(&net_l2tpv3_info, peer, "l2tpv3", name);
> +
> +    s = DO_UPCAST(NetL2TPV3State, nc, nc);
> +
> +    s->queue_head = 0;
> +    s->queue_tail = 0;
> +    s->header_mismatch = false;
> +
> +    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_L2TPV3);
> +    l2tpv3 = opts->l2tpv3;
> +
> +    if (l2tpv3->has_ipv6 && l2tpv3->ipv6) {
> +        s->ipv6 = l2tpv3->ipv6;
> +    } else {
> +        s->ipv6 = false;
> +    }
> +
> +    if (l2tpv3->has_rxcookie || l2tpv3->has_txcookie) {
> +        if (l2tpv3->has_rxcookie && l2tpv3->has_txcookie) {
> +            s->cookie = true;
> +        } else {
> +            goto outerr;
> +        }
> +    } else {
> +        s->cookie = false;
> +    }
> +
> +    if (l2tpv3->has_cookie64 || l2tpv3->cookie64) {
> +        s->cookie_is_64  = true;
> +    } else {
> +        s->cookie_is_64  = false;
> +    }
> +
> +    if (l2tpv3->has_udp && l2tpv3->udp) {
> +        s->udp = true;
> +        if (!(l2tpv3->has_srcport && l2tpv3->has_dstport)) {
> +            error_report("l2tpv3_open : need both src and dst port for udp");
> +            goto outerr;
> +        } else {
> +            srcport = l2tpv3->srcport;
> +            dstport = l2tpv3->dstport;
> +        }
> +    } else {
> +        s->udp = false;
> +        srcport = NULL;
> +        dstport = NULL;
> +    }
> +
> +
> +    s->offset = 4;
> +    s->session_offset = 0;
> +    s->cookie_offset = 4;
> +    s->counter_offset = 4;
> +
> +    s->tx_session = l2tpv3->txsession;
> +    if (l2tpv3->has_rxsession) {
> +        s->rx_session = l2tpv3->rxsession;
> +    } else {
> +        s->rx_session = s->tx_session;
> +    }
> +
> +    if (s->cookie) {
> +
s->rx_cookie = l2tpv3->rxcookie;
> +        s->tx_cookie = l2tpv3->txcookie;
> +        if (s->cookie_is_64 == true) {
> +            /* 64 bit cookie */
> +            s->offset += 8;
> +            s->counter_offset += 8;
> +        } else {
> +            /* 32 bit cookie */
> +            s->offset += 4;
> +            s->counter_offset += 4;
> +        }
> +    }
> +
> +    memset(&hints, 0, sizeof(hints));
> +
> +    if (s->ipv6) {
> +        hints.ai_family = AF_INET6;
> +    } else {
> +        hints.ai_family = AF_INET;
> +    }
> +    if (s->udp) {
> +        hints.ai_socktype = SOCK_DGRAM;
> +        hints.ai_protocol = 0;
> +        s->offset += 4;
> +        s->counter_offset += 4;
> +        s->session_offset += 4;
> +        s->cookie_offset += 4;
> +    } else {
> +        hints.ai_socktype = SOCK_RAW;
> +        hints.ai_protocol = IPPROTO_L2TP;
> +    }
> +
> +    gairet = getaddrinfo(l2tpv3->src, srcport, &hints, &result);
> +
> +    if ((gairet != 0) || (result == NULL)) {
> +        error_report(
> +            "l2tpv3_open : could not resolve src, errno = %s",
> +            gai_strerror(gairet)
> +        );
> +        goto outerr;
> +    }
> +    fd = socket(result->ai_family, result->ai_socktype, result->ai_protocol);
> +    if (fd == -1) {
> +        fd = -errno;
> +        error_report("l2tpv3_open : socket creation failed, errno = %d", -fd);
> +        freeaddrinfo(result);
> +        goto outerr;
> +    }
> +    if (bind(fd, (struct sockaddr *) result->ai_addr, result->ai_addrlen)) {
> +        error_report("l2tpv3_open :  could not bind socket err=%i", errno);
> +        goto outerr;
> +    }
> +
> +    freeaddrinfo(result);
> +
> +    memset(&hints, 0, sizeof(hints));
> +
> +    if (s->ipv6) {
> +        hints.ai_family = AF_INET6;
> +    } else {
> +        hints.ai_family = AF_INET;
> +    }
> +    if (s->udp) {
> +        hints.ai_socktype = SOCK_DGRAM;
> +        hints.ai_protocol = 0;
> +    } else {
> +        hints.ai_socktype = SOCK_RAW;
> +        hints.ai_protocol = IPPROTO_L2TP;
> +    }
> +
> +    gairet = getaddrinfo(l2tpv3->dst, dstport, &hints, &result);
> +    if ((gairet != 0) || (result == NULL)) {
> +        error_report(
> +            "l2tpv3_open : could not resolve dst, error = %s",
> +
gai_strerror(gairet)
> +        );
> +        goto outerr;
> +    }
> +
> +    s->dgram_dst = g_malloc(sizeof(struct sockaddr_storage));
> +    memset(s->dgram_dst, '\0' , sizeof(struct sockaddr_storage));
> +    memcpy(s->dgram_dst, result->ai_addr, result->ai_addrlen);
> +    s->dst_size = result->ai_addrlen;
> +
> +    freeaddrinfo(result);
> +
> +    if (l2tpv3->has_counter && l2tpv3->counter) {
> +        s->has_counter = true;
> +        s->offset += 4;
> +    } else {
> +        s->has_counter = false;
> +    }
> +
> +    if (l2tpv3->has_pincounter && l2tpv3->pincounter) {
> +        s->has_counter = true;  /* pin counter implies that there is counter */
> +        s->pin_counter = true;
> +    } else {
> +        s->pin_counter = false;
> +    }
> +
> +    if (l2tpv3->has_offset) {
> +        /* extra offset */
> +        s->offset += l2tpv3->offset;
> +    }
> +
> +    if ((s->ipv6) || (s->udp)) {
> +        s->header_size = s->offset;
> +    } else {
> +        s->header_size = s->offset + sizeof(struct iphdr);
> +    }
> +
> +    s->msgvec = build_l2tpv3_vector(s, MAX_L2TPV3_MSGCNT);
> +    s->vec = g_malloc(sizeof(struct iovec) * MAX_L2TPV3_IOVCNT);
> +    s->header_buf = g_malloc(s->header_size);
> +
> +    qemu_set_nonblock(fd);
> +
> +    s->fd = fd;
> +    s->counter = 0;
> +
> +    l2tpv3_read_poll(s, true);
> +
> +    if (!s) {
> +        error_report("l2tpv3_open : failed to set fd handler");
> +        goto outerr;
> +    }
> +    snprintf(s->nc.info_str, sizeof(s->nc.info_str),
> +             "l2tpv3: connected");
> +    return 0;
> +outerr:
> +    qemu_del_net_client(nc);
> +    if (fd > 0) {
> +        close(fd);
> +    }
> +    return -1;
> +}
> +
> diff --git a/net/net.c b/net/net.c
> index 0a88e68..749d34c 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -731,6 +731,9 @@ static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])(
>          [NET_CLIENT_OPTIONS_KIND_BRIDGE]    = net_init_bridge,
>  #endif
>          [NET_CLIENT_OPTIONS_KIND_HUBPORT]   = net_init_hubport,
> +#ifdef CONFIG_LINUX
> +        [NET_CLIENT_OPTIONS_KIND_L2TPV3]    = net_init_l2tpv3,
> +#endif
>  };
>  
>  
> diff --git a/qapi-schema.json 
b/qapi-schema.json
> index 83fa485..aefc478 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -2941,6 +2941,62 @@
>      '*udp':       'str' } }
>  
>  ##
> +# @NetdevL2TPv3Options
> +#
> +# Connect the VLAN to Ethernet over L2TPv3 Static tunnel
> +#
> +# @src: source address
> +#
> +# @dst: destination address
> +#
> +# @srcport: #optional source port - mandatory for udp, optional for ip
> +#
> +# @dstport: #optional destination port - mandatory for udp, optional for ip
> +#
> +# @ipv6: #optional - force the use of ipv6
> +#
> +# @udp: #optional - use the udp version of l2tpv3 encapsulation
> +#
> +# @cookie64: #optional - use 64 bit cookies
> +#
> +# @counter: #optional have sequence counter
> +#
> +# @pincounter: #optional pin sequence counter to zero -
> +#              workaround for buggy implementations or
> +#              networks with packet reorder
> +#
> +# @txcookie: #optional 32 or 64 bit transmit cookie
> +#
> +# @rxcookie: #optional 32 or 64 bit receive cookie
> +#
> +# @txsession: 32 bit transmit session
> +#
> +# @rxsession: #optional 32 bit receive session - if not specified
> +#             set to the same value as transmit
> +#
> +# @offset: #optional additional offset - allows the insertion of
> +#          additional application-specific data before the packet payload
> +#
> +# Since 2.1
> +##
> +{ 'type': 'NetdevL2TPv3Options',
> +  'data': {
> +    'src':          'str',
> +    'dst':          'str',
> +    '*srcport':     'str',
> +    '*dstport':     'str',
> +    '*ipv6':        'bool',
> +    '*udp':         'bool',
> +    '*cookie64':    'bool',
> +    '*counter':     'bool',
> +    '*pincounter':  'bool',
> +    '*txcookie':    'uint64',
> +    '*rxcookie':    'uint64',
> +    'txsession':    'uint32',
> +    '*rxsession':   'uint32',
> +    '*offset':      'uint32' } }
> +
> +##
>  # @NetdevVdeOptions
>  #
>  # Connect the VLAN to a vde switch running on the host.
> @@ -3014,6 +3070,9 @@
>  # A discriminated record of network device traits.
>  #
>  # Since 1.2
> +#
> +# 'l2tpv3' - since 2.1
> +#
>  ##
>  { 'union': 'NetClientOptions',
>    'data': {
> @@ -3021,6 +3080,7 @@
>      'nic':      'NetLegacyNicOptions',
>      'user':     'NetdevUserOptions',
>      'tap':      'NetdevTapOptions',
> +    'l2tpv3':   'NetdevL2TPv3Options',
>      'socket':   'NetdevSocketOptions',
>      'vde':      'NetdevVdeOptions',
>      'dump':     'NetdevDumpOptions',
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 8b94264..e1caf6f 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -1395,6 +1395,29 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
>      "                (default=" DEFAULT_BRIDGE_INTERFACE ") using the program 'helper'\n"
>      "                (default=" DEFAULT_BRIDGE_HELPER ")\n"
>  #endif
> +#ifdef __linux__
> +    "-net l2tpv3[,vlan=n][,name=str],src=srcaddr,dst=dstaddr[,srcport=srcport][,dstport=dstport],txsession=txsession[,rxsession=rxsession][,ipv6=on/off][,udp=on/off][,cookie64=on/off][,counter][,pincounter][,txcookie=txcookie][,rxcookie=rxcookie][,offset=offset]\n"
> +    "                connect the VLAN to an Ethernet over L2TPv3 pseudowire\n"
> +    "                Linux kernel 3.3+ as well as most routers can talk\n"
> +    "                L2TPv3. This transport allows to connect a VM to a VM,\n"
> +    "                VM to a router and even VM to Host. It is a nearly-universal\n"
> +    "                standard (RFC3931). 
Note - this implementation uses static\n"
> +    "                pre-configured tunnels (same as the Linux kernel).\n"
> +    "                use 'src=' to specify source address\n"
> +    "                use 'dst=' to specify destination address\n"
> +    "                use 'udp=on' to specify udp encapsulation\n"
> +    "                use 'srcport=' to specify source udp port\n"
> +    "                use 'dstport=' to specify destination udp port\n"
> +    "                use 'ipv6=on' to force v6\n"
> +    "                L2TPv3 uses cookies to prevent misconfiguration as\n"
> +    "                well as a weak security measure\n"
> +    "                use 'rxcookie=0x012345678' to specify a rxcookie\n"
> +    "                use 'txcookie=0x012345678' to specify a txcookie\n"
> +    "                use 'cookie64=on' to set cookie size to 64 bit, otherwise 32\n"
> +    "                use 'counter=off' to force a 'cut-down' L2TPv3 with no counter\n"
> +    "                use 'pincounter=on' to work around broken counter handling in peer\n"
> +    "                use 'offset=X' to add an extra offset between header and data\n"
> +#endif
>      "-net socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n"
>      "                connect the vlan 'n' to another VLAN using a socket connection\n"
>      "-net socket[,vlan=n][,name=str][,fd=h][,mcast=maddr:port[,localaddr=addr]]\n"
> @@ -1730,6 +1753,65 @@ qemu-system-i386 linux.img \
>                   -net socket,mcast=239.192.168.1:1102,localaddr=1.2.3.4
>  @end example
>  
> +@item -netdev l2tpv3,id=@var{id},src=@var{srcaddr},dst=@var{dstaddr}[,srcport=@var{srcport}][,dstport=@var{dstport}],txsession=@var{txsession}[,rxsession=@var{rxsession}][,ipv6][,udp][,cookie64][,counter][,pincounter][,txcookie=@var{txcookie}][,rxcookie=@var{rxcookie}][,offset=@var{offset}]
> +@item -net l2tpv3[,vlan=@var{n}][,name=@var{name}],src=@var{srcaddr},dst=@var{dstaddr}[,srcport=@var{srcport}][,dstport=@var{dstport}],txsession=@var{txsession}[,rxsession=@var{rxsession}][,ipv6][,udp][,cookie64][,counter][,pincounter][,txcookie=@var{txcookie}][,rxcookie
=@var{rxcookie}][,offset=@var{offset}]
> +Connect VLAN @var{n} to L2TPv3 pseudowire. L2TPv3 (RFC3931) is a popular
> +protocol to transport Ethernet (and other Layer 2) data frames between
> +two systems. It is present in routers, firewalls and the Linux kernel
> +(from version 3.3 onwards).
> +
> +This transport allows a VM to communicate to another VM, router or firewall directly.
> +
> +@item src=@var{srcaddr}
> +    source address (mandatory)
> +@item dst=@var{dstaddr}
> +    destination address (mandatory)
> +@item udp
> +    select udp encapsulation (default is ip).
> +@item srcport=@var{srcport}
> +    source udp port.
> +@item dstport=@var{dstport}
> +    destination udp port.
> +@item ipv6
> +    force v6, otherwise defaults to v4.
> +@item rxcookie=@var{rxcookie}
> +@item txcookie=@var{txcookie}
> +    Cookies are a weak form of security in the l2tpv3 specification.
> +Their function is mostly to prevent misconfiguration. By default they are 32
> +bit.
> +@item cookie64
> +    Set cookie size to 64 bit instead of the default 32
> +@item counter=off
> +    Force a 'cut-down' L2TPv3 with no counter as in
> +draft-mkonstan-l2tpext-keyed-ipv6-tunnel-00
> +@item pincounter=on
> +    Work around broken counter handling in peer. This may also help on
> +networks which have packet reorder.
> +@item offset=@var{offset}
> +    Add an extra offset between header and data
> +
> +For example, to attach a VM running on host 4.3.2.1 via L2TPv3 to the bridge br-lan
> +on the remote Linux host 1.2.3.4:
> +@example
> +# Setup tunnel on linux host using udp as encapsulation
> +# on 1.2.3.4
> +ip l2tp add tunnel remote 4.3.2.1 local 1.2.3.4 tunnel_id 1 peer_tunnel_id 1 \
> +    encap udp udp_sport 16384 udp_dport 16384
> +ip l2tp add session tunnel_id 1 name vmtunnel0 session_id \
> +    0xFFFFFFFF peer_session_id 0xFFFFFFFF
> +ifconfig vmtunnel0 mtu 1500
> +ifconfig vmtunnel0 up
> +brctl addif br-lan vmtunnel0
> +
> +
> +# on 4.3.2.1
> +# launch QEMU instance - if your network has reorder or is very lossy add ,pincounter
> +
> +qemu-system-i386 linux.img -net nic -net l2tpv3,src=4.3.2.1,dst=1.2.3.4,udp,srcport=16384,dstport=16384,rxsession=0xffffffff,txsession=0xffffffff,counter
> +
> +
> +@end example
> +
>  @item -netdev vde,id=@var{id}[,sock=@var{socketpath}][,port=@var{n}][,group=@var{groupname}][,mode=@var{octalmode}]
>  @item -net vde[,vlan=@var{n}][,name=@var{name}][,sock=@var{socketpath}] [,port=@var{n}][,group=@var{groupname}][,mode=@var{octalmode}]
>  Connect VLAN @var{n} to PORT @var{n} of a vde switch running on host and
> --
> 1.7.10.4
>
>
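
As a reading aid for the quoted net_init_l2tpv3(), the header layout it precomputes can be sketched as a standalone function. This is only a sketch under assumptions: the names l2tpv3_layout and l2tpv3_compute_layout are hypothetical and do not appear in the patch; the arithmetic mirrors the offset updates in the quoted code (4-byte session id, optional 32 or 64 bit cookie, an extra 4-byte word in UDP mode, optional 4-byte counter, optional user offset).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: mirrors the "precomputed offsets" fields
 * of NetL2TPV3State in the quoted patch. */
struct l2tpv3_layout {
    uint32_t session_offset;  /* where the session id starts */
    uint32_t cookie_offset;   /* where the cookie starts (if any) */
    uint32_t counter_offset;  /* where the counter starts (if any) */
    uint32_t offset;          /* total L2TPv3 header length */
};

/* Hypothetical function: same arithmetic as net_init_l2tpv3(). */
struct l2tpv3_layout l2tpv3_compute_layout(bool udp, bool cookie,
                                           bool cookie64, bool counter,
                                           uint32_t extra_offset)
{
    /* Base case: raw IP encapsulation, session id only. */
    struct l2tpv3_layout l = {
        .session_offset = 0,
        .cookie_offset = 4,
        .counter_offset = 4,
        .offset = 4,
    };

    if (cookie) {
        /* The cookie sits between the session id and the counter. */
        uint32_t cookie_len = cookie64 ? 8 : 4;
        l.offset += cookie_len;
        l.counter_offset += cookie_len;
    }
    if (udp) {
        /* UDP mode prepends a 4-byte word (the 0x30000 data marker),
         * which shifts every field. */
        l.offset += 4;
        l.counter_offset += 4;
        l.session_offset += 4;
        l.cookie_offset += 4;
    }
    if (counter) {
        l.offset += 4;
    }
    l.offset += extra_offset;
    return l;
}
```

With udp, a 64 bit cookie and the counter enabled this gives a 20 byte header, with the session id at byte 4, the cookie at byte 8 and the counter at byte 16. For raw IPv4 the patch additionally adds sizeof(struct iphdr) to header_size on the receive side, which this sketch leaves out.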