xen-devel.lists.xenproject.org archive mirror
* [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers
@ 2014-08-08 11:32 Oleksandr Dmytryshyn
From: Oleksandr Dmytryshyn @ 2014-08-08 11:32 UTC (permalink / raw)
  To: Ian Campbell, Stefano Stabellini, Tim Deegan, xen-devel

This series adds support for a new socket family: xensock.
These sockets are used by the xen-sock frontend/backend
drivers, which let a frontend and a backend domain talk to
each other over xensock sockets (dom0/domD sockets serve the
server application, domU sockets serve the client
application). Xensock sockets behave much like TCP sockets,
with some limitations: they ignore the socket address and
provide stream semantics only. Only one xensock socket can be
open on the frontend side, and it is connected to a single
xensock socket on the backend side (this link is established
automatically by the frontend and backend drivers).

Oleksandr Dmytryshyn (3):
  net: introduce new socket support: xensock
  xen-sockback: add support of the xen-sockback driver
  xen-sockfront: add support of the xen-sockfront driver

 drivers/net/Kconfig                  |   38 +
 drivers/net/Makefile                 |    4 +
 drivers/net/xen-sockback/Makefile    |    3 +
 drivers/net/xen-sockback/common.h    |  150 ++++
 drivers/net/xen-sockback/interface.c |  245 ++++++
 drivers/net/xen-sockback/sockback.c  | 1527 ++++++++++++++++++++++++++++++++++
 drivers/net/xen-sockback/xenbus.c    |  348 ++++++++
 drivers/net/xen-sockfront.c          | 1364 ++++++++++++++++++++++++++++++
 drivers/net/xensock/Makefile         |    5 +
 drivers/net/xensock/xensock-dev.c    |  269 ++++++
 drivers/net/xensock/xensock-proto.c  |  767 +++++++++++++++++
 include/linux/socket.h               |    4 +-
 include/net/af_xensock.h             |   46 +
 include/net/xensock.h                |  130 +++
 include/xen/interface/io/sockif.h    |   74 ++
 net/core/sock.c                      |    9 +-
 16 files changed, 4979 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/xen-sockback/Makefile
 create mode 100644 drivers/net/xen-sockback/common.h
 create mode 100644 drivers/net/xen-sockback/interface.c
 create mode 100644 drivers/net/xen-sockback/sockback.c
 create mode 100644 drivers/net/xen-sockback/xenbus.c
 create mode 100644 drivers/net/xen-sockfront.c
 create mode 100644 drivers/net/xensock/Makefile
 create mode 100644 drivers/net/xensock/xensock-dev.c
 create mode 100644 drivers/net/xensock/xensock-proto.c
 create mode 100644 include/net/af_xensock.h
 create mode 100644 include/net/xensock.h
 create mode 100644 include/xen/interface/io/sockif.h

-- 
1.8.2.rc2


* [PATCH 1/3] net: introduce new socket support: xensock
From: Oleksandr Dmytryshyn @ 2014-08-08 11:32 UTC (permalink / raw)
  To: Ian Campbell, Stefano Stabellini, Tim Deegan, xen-devel

These sockets are used by the xen-sock frontend/backend
drivers, which let domains communicate over xensock sockets
(dom0/domD sockets serve the server application, while domU
sockets serve the client application).
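
For reference, a minimal sketch of how a transport driver (such as
the sockback/sockfront drivers added later in this series) is
expected to use the API exported by include/net/xensock.h; my_priv,
my_start_xmit and my_create are hypothetical names used only for
illustration:

  struct my_priv {
  	unsigned int evtchn;		/* whatever the transport needs */
  };

  static int my_start_xmit(struct sk_buff *skb, struct xen_sock_dev *dev)
  {
  	/* struct my_priv *priv = xensock_dev_priv(dev); */
  	/* hand the frame to the transport, e.g. a shared ring */
  	kfree_skb(skb);			/* stub: just consume the frame */
  	return 0;			/* 0 on success */
  }

  static int my_create(void)
  {
  	struct xen_sock_dev *dev;
  	int err;

  	dev = alloc_xen_sock_dev(sizeof(struct my_priv), "vsock0.0");
  	if (!dev)
  		return -ENOMEM;
  	dev->start_xmit = my_start_xmit;

  	/* make the device visible to accept()/connect() waiters */
  	err = xensock_register_dev(dev);
  	if (err) {
  		free_xen_sock_dev(dev);
  		return err;
  	}

  	/* once the transport is actually up: */
  	sockif_carrier_on(dev);
  	sockif_wake_queue(dev);
  	return 0;
  }

Frames received from the transport are handed to the socket bound to
the device with xensock_dev_queue_rx_skb().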

Signed-off-by: Oleksandr Dmytryshyn <oleksandr.dmytryshyn@globallogic.com>
---
 drivers/net/Kconfig                 |   4 +
 drivers/net/Makefile                |   2 +
 drivers/net/xensock/Makefile        |   5 +
 drivers/net/xensock/xensock-dev.c   | 269 +++++++++++++
 drivers/net/xensock/xensock-proto.c | 767 ++++++++++++++++++++++++++++++++++++
 include/linux/socket.h              |   4 +-
 include/net/af_xensock.h            |  46 +++
 include/net/xensock.h               | 130 ++++++
 net/core/sock.c                     |   9 +-
 9 files changed, 1232 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/xensock/Makefile
 create mode 100644 drivers/net/xensock/xensock-dev.c
 create mode 100644 drivers/net/xensock/xensock-proto.c
 create mode 100644 include/net/af_xensock.h
 create mode 100644 include/net/xensock.h

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 89402c3..420981a 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -348,6 +348,10 @@ config XEN_NETDEV_BACKEND
 	  compile this driver as a module, chose M here: the module
 	  will be called xen-netback.
 
+config XEN_SOCKDEV_PROTO
+	bool
+	default n
+
 config VMXNET3
 	tristate "VMware VMXNET3 ethernet driver"
 	depends on PCI && INET
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3fef8a8..43bf910 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -61,6 +61,8 @@ obj-$(CONFIG_VMXNET3) += vmxnet3/
 obj-$(CONFIG_XEN_NETDEV_FRONTEND) += xen-netfront.o
 obj-$(CONFIG_XEN_NETDEV_BACKEND) += xen-netback/
 
+obj-$(CONFIG_XEN_SOCKDEV_PROTO) += xensock/
+
 obj-$(CONFIG_USB_CATC)          += usb/
 obj-$(CONFIG_USB_KAWETH)        += usb/
 obj-$(CONFIG_USB_PEGASUS)       += usb/
diff --git a/drivers/net/xensock/Makefile b/drivers/net/xensock/Makefile
new file mode 100644
index 0000000..d70db09
--- /dev/null
+++ b/drivers/net/xensock/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the xensock driver
+#
+
+obj-$(CONFIG_XEN_SOCKDEV_PROTO) := xensock-proto.o xensock-dev.o
diff --git a/drivers/net/xensock/xensock-dev.c b/drivers/net/xensock/xensock-dev.c
new file mode 100644
index 0000000..6da8f34
--- /dev/null
+++ b/drivers/net/xensock/xensock-dev.c
@@ -0,0 +1,269 @@
+/*
+ * Xen socket dev driver.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/socket.h>
+#include <linux/skbuff.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+
+#include <net/af_xensock.h>
+
+LIST_HEAD(xensock_list);
+DEFINE_RWLOCK(xensock_list_lock);
+DECLARE_WAIT_QUEUE_HEAD(xensock_accept_wait);
+
+static inline void xensock_lock_dev(struct xen_sock_dev *dev)
+{
+	spin_lock(&dev->lock);
+}
+
+static inline void xensock_unlock_dev(struct xen_sock_dev *dev)
+{
+	spin_unlock(&dev->lock);
+}
+
+struct xen_sock_dev *alloc_xen_sock_dev(int sizeof_priv, const char *name)
+{
+	struct xen_sock_dev *dev;
+	int alloc_size;
+	struct xen_sock_dev *p;
+
+	BUG_ON(strlen(name) >= sizeof(dev->name));
+
+	alloc_size = sizeof(struct xen_sock_dev);
+
+	if (sizeof_priv) {
+		alloc_size = ALIGN(sizeof(struct xen_sock_dev), SOCKDEV_ALIGN);
+		alloc_size += sizeof_priv;
+	}
+
+	/* ensure 32-byte alignment of whole construct */
+	alloc_size += SOCKDEV_ALIGN - 1;
+
+	p = kzalloc(alloc_size, GFP_KERNEL);
+	if (!p) {
+		pr_err("alloc_xen_sock_dev: Unable to allocate device\n");
+		return NULL;
+	}
+
+	dev = PTR_ALIGN(p, SOCKDEV_ALIGN);
+	dev->padded = (char *)dev - (char *)p;
+
+	strcpy(dev->name, name);
+	spin_lock_init(&dev->lock);
+	sockif_carrier_off(dev);
+	sockif_stop_queue(dev);
+
+	return dev;
+}
+EXPORT_SYMBOL(alloc_xen_sock_dev);
+
+void free_xen_sock_dev(struct xen_sock_dev *dev)
+{
+	if (!dev)
+		return;
+
+	kfree((char *)dev - dev->padded);
+}
+EXPORT_SYMBOL(free_xen_sock_dev);
+
+
+int xensock_register_dev(struct xen_sock_dev *dev)
+{
+	write_lock_bh(&xensock_list_lock);
+	list_add_tail(&dev->list, &xensock_list);
+	write_unlock_bh(&xensock_list_lock);
+	wake_up_interruptible(&xensock_accept_wait);
+
+	return 0;
+}
+EXPORT_SYMBOL(xensock_register_dev);
+
+static void __xensock_dev_unlink_sk(struct xen_sock_dev *dev);
+
+void  xensock_unregister_dev(struct xen_sock_dev *dev)
+{
+	write_lock_bh(&xensock_list_lock);
+	xensock_lock_dev(dev);
+	__xensock_dev_unlink_sk(dev);
+	list_del(&dev->list);
+	xensock_unlock_dev(dev);
+	write_unlock_bh(&xensock_list_lock);
+}
+EXPORT_SYMBOL(xensock_unregister_dev);
+
+static int __xensock_dev_link_sk(struct xen_sock_dev *dev, struct sock *sk)
+{
+	sock_hold(sk);
+	dev->sk = sk;
+	xen_sk(sk)->dev = dev;
+
+	return 0;
+}
+
+static void __xensock_dev_unlink_sk(struct xen_sock_dev *dev)
+{
+	struct sock *sk = dev->sk;
+	struct socket *sock;
+
+	if (sk) {
+		sock = sk->sk_socket;
+		sock->state = SS_UNCONNECTED;
+
+		xen_sk(sk)->dev = NULL;
+		dev->sk = NULL;
+		sk->sk_err = ENOTCONN;
+		sk->sk_state_change(sk);
+		sock_put(sk);
+	}
+}
+
+void xensock_dev_unlink_sk(struct xen_sock_dev *dev)
+{
+	write_lock_bh(&xensock_list_lock);
+	xensock_lock_dev(dev);
+	__xensock_dev_unlink_sk(dev);
+	xensock_unlock_dev(dev);
+	write_unlock_bh(&xensock_list_lock);
+	wake_up_interruptible(&xensock_accept_wait);
+}
+
+void xensock_unlink_all_dev_sk(void)
+{
+	struct xen_sock_dev *ldev;
+
+	write_lock_bh(&xensock_list_lock);
+	list_for_each_entry(ldev, &xensock_list, list) {
+		xensock_lock_dev(ldev);
+		__xensock_dev_unlink_sk(ldev);
+		xensock_unlock_dev(ldev);
+	}
+	write_unlock_bh(&xensock_list_lock);
+}
+
+int xensock_dev_wait(struct sock *sk, struct sock *nsk)
+{
+	int rc = 0;
+	long timeout = sk->sk_rcvtimeo;
+	struct xen_sock_dev *ldev;
+	bool dev_found;
+
+	DECLARE_WAITQUEUE(wait, current);
+
+	add_wait_queue_exclusive(&xensock_accept_wait, &wait);
+	for (;;) {
+		__set_current_state(TASK_INTERRUPTIBLE);
+		if (sk->sk_shutdown & RCV_SHUTDOWN)
+			break;
+		rc = -ERESTARTSYS;
+		if (signal_pending(current))
+			break;
+		rc = -EAGAIN;
+		if (!timeout)
+			break;
+		rc = 0;
+		dev_found = false;
+		read_lock_bh(&xensock_list_lock);
+		list_for_each_entry(ldev, &xensock_list, list) {
+			xensock_lock_dev(ldev);
+			if (ldev->sk == NULL) {
+				__xensock_dev_link_sk(ldev, nsk);
+				xensock_unlock_dev(ldev);
+				dev_found = true;
+				break;
+			}
+			xensock_unlock_dev(ldev);
+		}
+		read_unlock_bh(&xensock_list_lock);
+		if (dev_found)
+			break;
+
+		release_sock(sk);
+		timeout = schedule_timeout(timeout);
+		lock_sock(sk);
+	}
+	__set_current_state(TASK_RUNNING);
+	remove_wait_queue(&xensock_accept_wait, &wait);
+
+	return rc;
+}
+
+/* xensock_dev_send - transmit a xensock frame
+ * @skb: pointer to socket buffer with xensock frame in data section
+ */
+int xensock_dev_send(struct sk_buff *skb, struct xen_sock_dev *dev)
+{
+	int err = -EINVAL;
+
+	/* Make sure the xensock frame can pass the selected xensock device */
+	if (unlikely(skb->len > XENSOCK_MTU)) {
+		err = -EMSGSIZE;
+		goto inval_skb;
+	}
+
+	if (unlikely(!dev->start_xmit)) {
+		err = -EIO;
+		goto inval_skb;
+	}
+
+	if (sockif_queue_stopped(dev)) {
+		err = -ENOBUFS;
+		goto inval_skb;
+	}
+
+	err = dev->start_xmit(skb, dev);
+	if (err)
+		goto inval_skb;
+
+	return 0;
+
+inval_skb:
+	kfree_skb(skb);
+	return err;
+}
+
+int xensock_dev_queue_rx_skb(struct sk_buff *skb, struct xen_sock_dev *dev)
+{
+	int rc = -ENETDOWN;
+	struct sock *sk = dev->sk;
+
+	xensock_lock_dev(dev);
+	if (sk == NULL)
+		goto out;
+
+	rc = 0;
+	skb_queue_tail(&sk->sk_receive_queue, skb);
+	if (!sock_flag(sk, SOCK_DEAD))
+		sk->sk_data_ready(sk, skb->len);
+out:
+	xensock_unlock_dev(dev);
+	return rc;
+}
diff --git a/drivers/net/xensock/xensock-proto.c b/drivers/net/xensock/xensock-proto.c
new file mode 100644
index 0000000..d05e5d5
--- /dev/null
+++ b/drivers/net/xensock/xensock-proto.c
@@ -0,0 +1,767 @@
+/*
+ * Xen socket protocol driver.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/socket.h>
+#include <linux/skbuff.h>
+#include <linux/if_arp.h>
+#include <linux/highmem.h>
+
+#include <net/tcp_states.h>
+#include <net/af_xensock.h>
+
+#define XENSOCK_DISCONNECT		BIT(0)
+
+struct xensock_skb_cb {
+	unsigned int flags_checked;
+};
+
+/* Return a pointer to the extra flags stored for xensock_proto_recvmsg().
+ * We use the space of one unsigned int beyond the first 'int' in
+ * skb->cb, since the xensock frontend and backend drivers use that
+ * first 'int' part of skb->cb themselves.
+ */
+static inline struct xensock_skb_cb *xensock_proto_cb(struct sk_buff *skb)
+{
+	BUILD_BUG_ON(sizeof(skb->cb) <= (sizeof(int) +
+					 sizeof(struct xensock_skb_cb)));
+
+	/* return pointer after 'int' size */
+	return (struct xensock_skb_cb *)(&((int *)skb->cb)[1]);
+}
+
+static int xensock_prot_init(struct sock *sk)
+{
+	return 0;
+}
+
+static struct proto xensock_proto __read_mostly = {
+	.name		= "XENSOCK",
+	.owner		= THIS_MODULE,
+	.obj_size	= sizeof(struct xen_sock),
+	.init		= xensock_prot_init,
+};
+
+struct sock *xen_allocate_sock(struct net *net)
+{
+	struct xen_sock *xsk;
+	struct sock *sk;
+
+	sk = sk_alloc(net, PF_XENSOCK, GFP_KERNEL, &xensock_proto);
+
+	if (!sk)
+		goto out;
+
+	sock_init_data(NULL, sk);
+
+	xsk = xen_sk(sk);
+	xsk->dev = NULL;
+out:
+	return sk;
+}
+
+static struct sock *xensock_make_new(struct sock *osk)
+{
+	struct sock *sk = NULL;
+
+	if (osk->sk_type != SOCK_RAW)
+		goto out;
+
+	sk = xen_allocate_sock(sock_net(osk));
+	if (sk == NULL)
+		goto out;
+
+	sk->sk_type        = osk->sk_type;
+	sk->sk_priority    = osk->sk_priority;
+	sk->sk_protocol    = osk->sk_protocol;
+	sk->sk_rcvbuf      = osk->sk_rcvbuf;
+	sk->sk_sndbuf      = osk->sk_sndbuf;
+	sk->sk_state       = TCP_ESTABLISHED;
+	sk->sk_backlog_rcv = osk->sk_backlog_rcv;
+	sock_copy_flags(sk, osk);
+
+out:
+	return sk;
+}
+
+static int xensock_prot_connect(struct socket *sock, struct sockaddr *uaddr,
+				int addr_len, int flags)
+{
+	struct sock *sk;
+	int rc;
+
+	if (sock->state == SS_CONNECTED)
+		return 0;
+
+	sk = sock->sk;
+
+	lock_sock(sk);
+	rc = xensock_dev_wait(sk, sk);
+
+	if (!rc)
+		sock->state = SS_CONNECTED;
+
+	release_sock(sk);
+
+	return rc;
+}
+
+static int xensock_prot_accept(struct socket *sock, struct socket *newsock,
+			       int flags)
+{
+	struct sock *nsk, *sk = sock->sk;
+	int rc = -EINVAL;
+
+	if (!sk)
+		goto out;
+
+	rc = -EOPNOTSUPP;
+	if (sk->sk_type != SOCK_RAW)
+		goto out;
+
+	lock_sock(sk);
+	rc = -EINVAL;
+	if (sk->sk_state != TCP_LISTEN)
+		goto err_release_sk;
+
+	rc = -ENOMEM;
+	nsk = xensock_make_new(sk);
+	if (!nsk)
+		goto err_release_sk;
+
+	rc = xensock_dev_wait(sk, nsk);
+	if (rc)
+		goto err_remove_nsk;
+
+	sock_graft(nsk, newsock);
+
+	/* Now attach up the new socket */
+	sk->sk_ack_backlog--;
+	newsock->state = SS_CONNECTED;
+	rc = 0;
+err_release_sk:
+	release_sock(sk);
+out:
+	return rc;
+
+err_remove_nsk:
+	sock_orphan(nsk);
+	sock_put(nsk);
+	goto err_release_sk;
+}
+
+static int xensock_prot_listen(struct socket *sock, int backlog)
+{
+	struct sock *sk = sock->sk;
+	int rc = -EOPNOTSUPP;
+
+	lock_sock(sk);
+	/* Sockets created in the .accept callback have a non-NULL
+	 * device pointer; the listen operation is prohibited for them.
+	 */
+	if (xen_sk(sk)->dev)
+		goto out;
+
+	if (sk->sk_state != TCP_LISTEN) {
+		sk->sk_max_ack_backlog = backlog;
+		sk->sk_state           = TCP_LISTEN;
+		rc = 0;
+	}
+out:
+	release_sock(sk);
+
+	return rc;
+}
+
+static int xensock_prot_release(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+	struct xen_sock_dev *dev;
+	struct sk_buff *skb;
+	unsigned char xflag;
+	int err;
+
+	if (!sk)
+		return 0;
+
+	lock_sock(sk);
+	dev = xen_sk(sk)->dev;
+
+	if (dev) {
+		release_sock(sk);
+		skb = sock_alloc_send_skb(sk, 1, 0, &err);
+		lock_sock(sk);
+		if (!skb)
+			goto skip_send_no_con;
+
+		xflag = XENSOCK_DISCONNECT;
+		memcpy(skb_put(skb, 1), &xflag, 1);
+		skb->dev = NULL;
+		skb->sk  = sk;
+
+		err = xensock_dev_send(skb, dev);
+
+		if (err)
+			kfree_skb(skb);
+
+skip_send_no_con:
+		xensock_dev_unlink_sk(dev);
+	}
+
+	/* Flush the recv buffs */
+	while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL)
+		__kfree_skb(skb);
+
+	sock_orphan(sk);
+	sock->sk = NULL;
+
+	release_sock(sk);
+	sock_put(sk);
+
+	return 0;
+}
+
+int xensock_prot_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
+{
+	struct sock *sk = sock->sk;
+
+	switch (cmd) {
+	case SIOCGSTAMP:
+		return sock_get_timestamp(sk, (struct timeval __user *)arg);
+
+	default:
+		return -ENOIOCTLCMD;
+	}
+}
+
+static int xensock_prot_sendmsg(struct kiocb *iocb, struct socket *sock,
+				struct msghdr *msg, size_t size)
+{
+	struct sock *sk = sock->sk;
+	struct xen_sock_dev *dev;
+	struct sk_buff *skb;
+	int err;
+	unsigned char xflag;
+	size_t sent = 0;
+	unsigned int header_len, data_len;
+	unsigned int chunk;
+
+	if (msg->msg_flags & MSG_OOB)
+		return -EOPNOTSUPP;
+
+	lock_sock(sk);
+
+	err = sock_error(sk);
+	if (err)
+		goto out;
+
+	err = -ENOTCONN;
+
+	dev = xen_sk(sk)->dev;
+	if (!dev)
+		goto out;
+
+	/* The other side has closed the connection */
+	if (sock->state == SS_DISCONNECTING) {
+		err = size;
+		goto out;
+	}
+
+	if (sock->state != SS_CONNECTED)
+		goto out;
+
+	if (size == 0) {
+		err = 0;
+		goto out;
+	}
+
+	do {
+		chunk = size + 1;
+
+		if (chunk > XENSOCK_MTU)
+			chunk = XENSOCK_MTU;
+
+		if (chunk <= PAGE_SIZE) {
+			header_len = chunk;
+			data_len = 0;
+		} else {
+			header_len = PAGE_SIZE;
+			data_len = chunk - PAGE_SIZE;
+		}
+
+		release_sock(sk);
+		skb = sock_alloc_send_pskb(sk, header_len, data_len,
+					   msg->msg_flags & MSG_DONTWAIT,
+					   &err);
+		lock_sock(sk);
+		if (!skb)
+			goto out;
+
+		xflag = 0;
+		memcpy(skb_tail_pointer(skb), &xflag, 1);
+
+		skb->data_len = data_len;
+		skb->len = chunk;
+
+		err = skb_copy_datagram_from_iovec(skb, 1, msg->msg_iov,
+						   sent, chunk - 1);
+		if (err < 0)
+			goto free_skb;
+
+		/* move pointers in the skb */
+		skb->tail += header_len;
+
+		err = sock_tx_timestamp(sk, &skb_shinfo(skb)->tx_flags);
+		if (err < 0)
+			goto free_skb;
+
+		skb->dev = NULL;
+		skb->sk  = sk;
+
+		err = xensock_dev_send(skb, dev);
+
+		if (err)
+			goto free_skb;
+
+		/* First byte of the SKB data is the xensock flags */
+		chunk--;
+
+		sent += chunk;
+		size -= chunk;
+	} while (size);
+
+	err = sent;
+out:
+	release_sock(sk);
+	return err;
+
+free_skb:
+	kfree_skb(skb);
+	goto out;
+}
+
+static long xen_sock_data_wait(struct sock *sk, long timeo)
+{
+	DECLARE_WAITQUEUE(wait, current);
+
+	add_wait_queue(sk_sleep(sk), &wait);
+	for (;;) {
+		set_current_state(TASK_INTERRUPTIBLE);
+
+		if (!skb_queue_empty(&sk->sk_receive_queue))
+			break;
+
+		if (sk->sk_err || (sk->sk_shutdown & RCV_SHUTDOWN))
+			break;
+
+		if (signal_pending(current) || !timeo)
+			break;
+
+		set_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
+		release_sock(sk);
+		timeo = schedule_timeout(timeo);
+		lock_sock(sk);
+		clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
+	}
+
+	__set_current_state(TASK_RUNNING);
+	remove_wait_queue(sk_sleep(sk), &wait);
+	return timeo;
+}
+
+static unsigned char get_skb_xflags(struct sk_buff *skb)
+{
+	unsigned char ret;
+	int i;
+	int nr_frags;
+	int skb_len = skb_headlen(skb);
+
+	if (1 <= skb_len) {
+		ret = skb->data[0];
+		__skb_pull(skb, 1);
+		return ret;
+	} else {
+		struct sk_buff *frag1;
+
+		skb_walk_frags(skb, frag1) {
+			if (frag1->len) {
+				skb->len -= 1;
+				skb->data_len -= 1;
+				ret = frag1->data[0];
+				__skb_pull(frag1, 1);
+				return ret;
+			}
+		}
+
+		nr_frags = skb_shinfo(skb)->nr_frags;
+		for (i = 0; i < nr_frags; i++) {
+			skb_frag_t *frag2;
+
+			frag2 = &skb_shinfo(skb)->frags[i];
+			if (skb_frag_size(frag2)) {
+				struct page *page = skb_frag_page(frag2);
+				u8  *vaddr;
+
+				vaddr = kmap(page);
+				ret = vaddr[frag2->page_offset];
+				kunmap(page);
+				skb->len -= 1;
+				skb->data_len -= 1;
+				skb_frag_size_sub(frag2, 1);
+				frag2->page_offset += 1;
+				return ret;
+			}
+		}
+	}
+	return 0;
+}
+
+static int xensock_prot_recvmsg(struct kiocb *iocb, struct socket *sock,
+				struct msghdr *msg, size_t size, int flags)
+{
+	struct sock *sk = sock->sk;
+	int err = 0;
+	size_t target, copied = 0;
+	long timeo;
+	int i;
+	struct sk_buff *skb;
+	int nr_frags;
+	unsigned char xflags;
+	struct xensock_skb_cb *scb;
+	struct xen_sock_dev *dev = xen_sk(sk)->dev;
+
+	if (flags & MSG_OOB)
+		return -EOPNOTSUPP;
+
+	msg->msg_namelen = 0;
+
+	lock_sock(sk);
+
+	if (!dev) {
+		copied = -ENOTCONN;
+		goto out;
+	}
+
+	if (sock->state == SS_DISCONNECTING)
+		goto out_disconnecting_state;
+
+	target = sock_rcvlowat(sk, flags & MSG_WAITALL, size);
+	timeo  = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+
+	do {
+		int chunk;
+
+		skb = skb_dequeue(&sk->sk_receive_queue);
+		if (!skb) {
+			if (copied >= target)
+				break;
+
+			err = sock_error(sk);
+			if (err)
+				break;
+			if (sk->sk_shutdown & RCV_SHUTDOWN)
+				break;
+
+			err = -EAGAIN;
+			if (!timeo)
+				break;
+
+			timeo = xen_sock_data_wait(sk, timeo);
+
+			if (signal_pending(current)) {
+				err = sock_intr_errno(timeo);
+				goto out;
+			}
+			continue;
+		}
+
+		scb = xensock_proto_cb(skb);
+
+		if (!scb->flags_checked) {
+			/* First byte of the SKB data is the xensock flags */
+			if (skb->len < 1) {
+				copied = -EFAULT;
+				break;
+			}
+
+			scb->flags_checked = 1;
+
+			xflags = get_skb_xflags(skb);
+			if (xflags & XENSOCK_DISCONNECT) {
+				sock->state = SS_DISCONNECTING;
+				xensock_dev_unlink_sk(dev);
+				goto out_disconnecting_state;
+			}
+		}
+
+		chunk = min_t(unsigned int, skb->len, size);
+		if (skb_copy_datagram_iovec(skb, 0, msg->msg_iov, chunk)) {
+			skb_queue_head(&sk->sk_receive_queue, skb);
+			if (!copied)
+				copied = -EFAULT;
+			break;
+		}
+		copied += chunk;
+		size   -= chunk;
+
+		sock_recv_ts_and_drops(msg, sk, skb);
+
+		if (!(flags & MSG_PEEK)) {
+			int skb_len = skb_headlen(skb);
+
+			if (chunk <= skb_len) {
+				__skb_pull(skb, chunk);
+			} else {
+				struct sk_buff *frag1;
+
+				__skb_pull(skb, skb_len);
+				chunk -= skb_len;
+
+				skb_walk_frags(skb, frag1) {
+					if (chunk <= frag1->len) {
+						/* Pulling partial data */
+						skb->len -= chunk;
+						skb->data_len -= chunk;
+						__skb_pull(frag1, chunk);
+						break;
+					} else if (frag1->len) {
+						/* Pulling all frag data */
+						chunk -= frag1->len;
+						skb->len -= frag1->len;
+						skb->data_len -= frag1->len;
+						__skb_pull(frag1, frag1->len);
+					}
+				}
+
+				nr_frags = skb_shinfo(skb)->nr_frags;
+				for (i = 0; i < nr_frags; i++) {
+					skb_frag_t *frag2;
+					unsigned int frag_size;
+
+					frag2 = &skb_shinfo(skb)->frags[i];
+					frag_size = skb_frag_size(frag2);
+
+					if (chunk <= frag_size) {
+						/* Pulling partial data */
+						skb->len -= chunk;
+						skb->data_len -= chunk;
+						skb_frag_size_sub(frag2, chunk);
+						frag2->page_offset += chunk;
+						break;
+					} else if (frag_size) {
+						/* Pulling all frag data */
+						chunk -= frag_size;
+						skb->len -= frag_size;
+						skb->data_len -= frag_size;
+						skb_frag_size_set(frag2, 0);
+					}
+				}
+			}
+
+			if (skb->len) {
+				skb_queue_head(&sk->sk_receive_queue, skb);
+				break;
+			}
+			kfree_skb(skb);
+
+		} else {
+			/* put message back and return */
+			skb_queue_head(&sk->sk_receive_queue, skb);
+			break;
+		}
+	} while (size);
+out:
+	release_sock(sk);
+	return copied ? : err;
+
+out_disconnecting_state:
+	/* Flush the recv buffs */
+	while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL)
+		__kfree_skb(skb);
+
+	err = 0;
+	copied = 0;
+	goto out;
+}
+
+static const struct proto_ops xen_sock_server_ops = {
+	.family        = PF_XENSOCK,
+	.release       = xensock_prot_release,
+	.bind          = sock_no_bind,
+	.connect       = sock_no_connect,
+	.socketpair    = sock_no_socketpair,
+	.accept        = xensock_prot_accept,
+	.getname       = sock_no_getname,
+	.poll          = sock_no_poll,
+	.ioctl         = xensock_prot_ioctl,
+	.listen        = xensock_prot_listen,
+	.shutdown      = sock_no_shutdown,
+	.setsockopt    = sock_no_setsockopt,
+	.getsockopt    = sock_no_getsockopt,
+	.sendmsg       = xensock_prot_sendmsg,
+	.recvmsg       = xensock_prot_recvmsg,
+	.mmap          = sock_no_mmap,
+	.sendpage      = sock_no_sendpage,
+};
+
+static const struct proto_ops xen_sock_client_ops = {
+	.family        = PF_XENSOCK,
+	.release       = xensock_prot_release,
+	.bind          = sock_no_bind,
+	.connect       = xensock_prot_connect,
+	.socketpair    = sock_no_socketpair,
+	.accept        = sock_no_accept,
+	.getname       = sock_no_getname,
+	.poll          = sock_no_poll,
+	.ioctl         = xensock_prot_ioctl,
+	.listen        = sock_no_listen,
+	.shutdown      = sock_no_shutdown,
+	.setsockopt    = sock_no_setsockopt,
+	.getsockopt    = sock_no_getsockopt,
+	.sendmsg       = xensock_prot_sendmsg,
+	.recvmsg       = xensock_prot_recvmsg,
+	.mmap          = sock_no_mmap,
+	.sendpage      = sock_no_sendpage,
+};
+
+static void xensock_destruct(struct sock *sk)
+{
+}
+
+static int xensock_create(struct net *net, struct socket *sock, int proto,
+			  int kern, const struct proto_ops *xen_proto_ops)
+{
+	struct sock *sk;
+	int err = 0;
+
+	if (sock->type != SOCK_RAW)
+		return -ESOCKTNOSUPPORT;
+	if (proto != 0)
+		return -EPROTONOSUPPORT;
+
+	sk = xen_allocate_sock(net);
+	if (!sk)
+		return -ENOMEM;
+
+	sock->state = SS_UNCONNECTED;
+	sock->ops = xen_proto_ops;
+	sock_init_data(sock, sk);
+
+	sk->sk_destruct = xensock_destruct;
+	sk->sk_protocol = proto;
+
+	xen_sk(sk)->dev = NULL;
+
+	if (sk->sk_prot->init)
+		err = sk->sk_prot->init(sk);
+
+	if (err) {
+		/* release sk on errors */
+		sock_orphan(sk);
+		sock_put(sk);
+	}
+
+	return err;
+}
+
+static int xensock_server_create(struct net *net, struct socket *sock,
+				 int proto, int kern)
+{
+	return xensock_create(net, sock, proto, kern, &xen_sock_server_ops);
+}
+
+static int xensock_client_create(struct net *net, struct socket *sock,
+				 int proto, int kern)
+{
+	return xensock_create(net, sock, proto, kern, &xen_sock_client_ops);
+}
+
+static const struct net_proto_family xensock_server_family_ops = {
+	.family = PF_XENSOCK,
+	.create = xensock_server_create,
+	.owner  = THIS_MODULE,
+};
+
+static const struct net_proto_family xensock_client_family_ops = {
+	.family = PF_XENSOCK,
+	.create = xensock_client_create,
+	.owner  = THIS_MODULE,
+};
+
+static int xensock_proto_init(bool is_server_ops)
+{
+	int ret;
+
+	ret = proto_register(&xensock_proto, 0);
+	if (ret) {
+		pr_err("proto_register failed: %d\n", ret);
+		return ret;
+	}
+
+	if (is_server_ops)
+		ret = sock_register(&xensock_server_family_ops);
+	else
+		ret = sock_register(&xensock_client_family_ops);
+
+	if (ret) {
+		pr_err("sock_register failed: %d\n", ret);
+		goto proto_unreg;
+	}
+
+	return 0;
+
+proto_unreg:
+	proto_unregister(&xensock_proto);
+	return ret;
+}
+
+int xensock_proto_server_init(void)
+{
+	return xensock_proto_init(true);
+}
+EXPORT_SYMBOL(xensock_proto_server_init);
+
+int xensock_proto_client_init(void)
+{
+	return xensock_proto_init(false);
+}
+EXPORT_SYMBOL(xensock_proto_client_init);
+
+void xensock_proto_cleanup(void)
+{
+	xensock_unlink_all_dev_sk();
+	sock_unregister(PF_XENSOCK);
+	proto_unregister(&xensock_proto);
+}
+EXPORT_SYMBOL(xensock_proto_cleanup);
+
+MODULE_DESCRIPTION("xensock protocol");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_NETPROTO(AF_XENSOCK);
diff --git a/include/linux/socket.h b/include/linux/socket.h
index ec538fc..79ffa55 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -180,7 +180,8 @@ struct ucred {
 #define AF_ALG		38	/* Algorithm sockets		*/
 #define AF_NFC		39	/* NFC sockets			*/
 #define AF_VSOCK	40	/* vSockets			*/
-#define AF_MAX		41	/* For now.. */
+#define AF_XENSOCK	41	/* xensock sockets		*/
+#define AF_MAX		42	/* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC	AF_UNSPEC
@@ -225,6 +226,7 @@ struct ucred {
 #define PF_ALG		AF_ALG
 #define PF_NFC		AF_NFC
 #define PF_VSOCK	AF_VSOCK
+#define PF_XENSOCK	AF_XENSOCK
 #define PF_MAX		AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
diff --git a/include/net/af_xensock.h b/include/net/af_xensock.h
new file mode 100644
index 0000000..48df5ce
--- /dev/null
+++ b/include/net/af_xensock.h
@@ -0,0 +1,46 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __AF_XENSOCK_H__
+#define __AF_XENSOCK_H__
+
+#include <net/xensock.h>
+
+struct xen_sock {
+	struct sock sk;
+	struct xen_sock_dev *dev;
+};
+
+static inline struct xen_sock *xen_sk(const struct sock *sk)
+{
+	return (struct xen_sock *)sk;
+}
+
+int xensock_proto_server_init(void);
+int xensock_proto_client_init(void);
+void xensock_proto_cleanup(void);
+
+#endif /* __AF_XENSOCK_H__ */
diff --git a/include/net/xensock.h b/include/net/xensock.h
new file mode 100644
index 0000000..2e5949b
--- /dev/null
+++ b/include/net/xensock.h
@@ -0,0 +1,130 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __XENSOCK_H__
+#define __XENSOCK_H__
+
+#include <linux/bitops.h>
+#include <linux/if.h>
+#include <net/sock.h>
+
+#define		XENSOCK_MTU		65535
+
+#define		SOCKDEV_ALIGN		32
+
+#define		__SOCK_STATE_NOCARRIER		0
+#define		__SOCK_STATE_QUEUESTOPPED	1
+
+struct xen_sock_dev {
+	char name[IFNAMSIZ];
+	struct sock *sk;
+	spinlock_t lock;       /* sock_dev operations lock */
+	unsigned long state;
+	unsigned short padded; /* Padding added by alloc_xen_sock_dev() */
+
+	int (*start_xmit)(struct sk_buff *skb, struct xen_sock_dev *dev);
+	unsigned long tx_queue_len;
+	struct list_head list;
+};
+
+/*
+ *	xensock_dev_priv - access sock device private data
+ *	@dev: xen_sock device
+ *
+ * Get xen_sock device private data
+ */
+static inline void *xensock_dev_priv(const struct xen_sock_dev *dev)
+{
+	return (char *)dev + ALIGN(sizeof(struct xen_sock_dev), SOCKDEV_ALIGN);
+}
+
+/*
+ *	sockif_carrier_ok - test if carrier present
+ *	@dev: xensock device
+ *
+ * Check if carrier is present on device
+ */
+static inline bool sockif_carrier_ok(const struct xen_sock_dev *dev)
+{
+	return !test_bit(__SOCK_STATE_NOCARRIER, &dev->state);
+}
+
+/*
+ *	sockif_carrier_on - set carrier
+ *	@dev: xensock device
+ */
+static inline void sockif_carrier_on(struct xen_sock_dev *dev)
+{
+	clear_bit(__SOCK_STATE_NOCARRIER, &dev->state);
+}
+
+/*
+ *	sockif_carrier_off - clear carrier
+ *	@dev: xensock device
+ */
+static inline void sockif_carrier_off(struct xen_sock_dev *dev)
+{
+	set_bit(__SOCK_STATE_NOCARRIER, &dev->state);
+}
+
+/*
+ *	sockif_queue_stopped - test if tx queue is stopped
+ *	@dev: xensock device
+ */
+static inline bool sockif_queue_stopped(const struct xen_sock_dev *dev)
+{
+	return test_bit(__SOCK_STATE_QUEUESTOPPED, &dev->state);
+}
+
+/*
+ *	sockif_wake_queue - wake tx queue
+ *	@dev: xensock device
+ */
+static inline void sockif_wake_queue(struct xen_sock_dev *dev)
+{
+	clear_bit(__SOCK_STATE_QUEUESTOPPED, &dev->state);
+}
+
+/*
+ *	sockif_stop_queue - stop tx queue
+ *	@dev: xensock device
+ */
+static inline void sockif_stop_queue(struct xen_sock_dev *dev)
+{
+	set_bit(__SOCK_STATE_QUEUESTOPPED, &dev->state);
+}
+
+struct xen_sock_dev *alloc_xen_sock_dev(int sizeof_priv, const char *name);
+void free_xen_sock_dev(struct xen_sock_dev *dev);
+int xensock_register_dev(struct xen_sock_dev *dev);
+void xensock_unregister_dev(struct xen_sock_dev *dev);
+void xensock_dev_unlink_sk(struct xen_sock_dev *dev);
+void xensock_unlink_all_dev_sk(void);
+int xensock_dev_wait(struct sock *sk, struct sock *nsk);
+int xensock_dev_send(struct sk_buff *skb, struct xen_sock_dev *dev);
+int xensock_dev_queue_rx_skb(struct sk_buff *skb, struct xen_sock_dev *dev);
+
+#endif /* __XENSOCK_H__ */
diff --git a/net/core/sock.c b/net/core/sock.c
index 026e01f..a57f264 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -262,7 +262,8 @@ static const char *const af_family_key_strings[AF_MAX+1] = {
   "sk_lock-AF_TIPC"  , "sk_lock-AF_BLUETOOTH", "sk_lock-IUCV"        ,
   "sk_lock-AF_RXRPC" , "sk_lock-AF_ISDN"     , "sk_lock-AF_PHONET"   ,
   "sk_lock-AF_IEEE802154", "sk_lock-AF_CAIF" , "sk_lock-AF_ALG"      ,
-  "sk_lock-AF_NFC"   , "sk_lock-AF_VSOCK"    , "sk_lock-AF_MAX"
+  "sk_lock-AF_NFC"   , "sk_lock-AF_VSOCK"    , "sk_lock-AF_XENSOCK"  ,
+  "sk_lock-AF_MAX"
 };
 static const char *const af_family_slock_key_strings[AF_MAX+1] = {
   "slock-AF_UNSPEC", "slock-AF_UNIX"     , "slock-AF_INET"     ,
@@ -278,7 +279,8 @@ static const char *const af_family_slock_key_strings[AF_MAX+1] = {
   "slock-AF_TIPC"  , "slock-AF_BLUETOOTH", "slock-AF_IUCV"     ,
   "slock-AF_RXRPC" , "slock-AF_ISDN"     , "slock-AF_PHONET"   ,
   "slock-AF_IEEE802154", "slock-AF_CAIF" , "slock-AF_ALG"      ,
-  "slock-AF_NFC"   , "slock-AF_VSOCK"    ,"slock-AF_MAX"
+  "slock-AF_NFC"   , "slock-AF_VSOCK"    , "slock-AF_XENSOCK"  ,
+  "slock-AF_MAX"
 };
 static const char *const af_family_clock_key_strings[AF_MAX+1] = {
   "clock-AF_UNSPEC", "clock-AF_UNIX"     , "clock-AF_INET"     ,
@@ -294,7 +296,8 @@ static const char *const af_family_clock_key_strings[AF_MAX+1] = {
   "clock-AF_TIPC"  , "clock-AF_BLUETOOTH", "clock-AF_IUCV"     ,
   "clock-AF_RXRPC" , "clock-AF_ISDN"     , "clock-AF_PHONET"   ,
   "clock-AF_IEEE802154", "clock-AF_CAIF" , "clock-AF_ALG"      ,
-  "clock-AF_NFC"   , "clock-AF_VSOCK"    , "clock-AF_MAX"
+  "clock-AF_NFC"   , "clock-AF_VSOCK"    , "clock-AF_XENSOCK"  ,
+  "clock-AF_MAX"
 };
 
 /*
-- 
1.8.2.rc2


* [PATCH 2/3] xen-sockback: add support of the xen-sockback driver
From: Oleksandr Dmytryshyn @ 2014-08-08 11:32 UTC (permalink / raw)
  To: Ian Campbell, Stefano Stabellini, Tim Deegan, xen-devel

This driver lets a server application in the dom0/domD domain
use xensock sockets.
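
With the backend loaded, a dom0/domD server picks up one connection
per frontend device created by this driver (a sketch; error handling
and threading omitted):

  int lsk = socket(AF_XENSOCK, SOCK_RAW, 0);

  listen(lsk, 8);
  for (;;) {
  	/* completes when an unbound vsockX.Y backend device exists */
  	int csk = accept(lsk, NULL, NULL);

  	if (csk < 0)
  		break;
  	/* hand csk to a worker; recv()/send() as with a TCP socket */
  }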

Signed-off-by: Oleksandr Dmytryshyn <oleksandr.dmytryshyn@globallogic.com>
---
 drivers/net/Kconfig                  |   19 +
 drivers/net/Makefile                 |    1 +
 drivers/net/xen-sockback/Makefile    |    3 +
 drivers/net/xen-sockback/common.h    |  150 ++++
 drivers/net/xen-sockback/interface.c |  245 ++++++
 drivers/net/xen-sockback/sockback.c  | 1527 ++++++++++++++++++++++++++++++++++
 drivers/net/xen-sockback/xenbus.c    |  348 ++++++++
 include/xen/interface/io/sockif.h    |   74 ++
 8 files changed, 2367 insertions(+)
 create mode 100644 drivers/net/xen-sockback/Makefile
 create mode 100644 drivers/net/xen-sockback/common.h
 create mode 100644 drivers/net/xen-sockback/interface.c
 create mode 100644 drivers/net/xen-sockback/sockback.c
 create mode 100644 drivers/net/xen-sockback/xenbus.c
 create mode 100644 include/xen/interface/io/sockif.h

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 420981a..e643f42 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -348,9 +348,28 @@ config XEN_NETDEV_BACKEND
 	  compile this driver as a module, chose M here: the module
 	  will be called xen-netback.
 
+config XEN_SOCKDEV_BACKEND
+	tristate "Xen backend socket device"
+	depends on XEN_BACKEND
+	select XEN_SOCKDEV_PROTO
+	help
+	  This driver allows the kernel to act as a Xen socket driver
+	  domain which exports paravirtual socket devices to other
+	  Xen domains. These devices can be accessed by any operating
+	  system that implements a compatible front end.
+
+	  The corresponding Linux frontend driver is enabled by the
+	  CONFIG_XEN_SOCKDEV_FRONTEND configuration option.
+
+	  If you are compiling a kernel to run in a Xen socket driver
+	  domain (often this is domain 0) you should say Y here. To
+	  compile this driver as a module, choose M here: the module
+	  will be called xen-sockback.
+
 config XEN_SOCKDEV_PROTO
 	bool
 	default n
+	depends on XEN_SOCKDEV_BACKEND
 
 config VMXNET3
 	tristate "VMware VMXNET3 ethernet driver"
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 43bf910..96c6c97 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -62,6 +62,7 @@ obj-$(CONFIG_XEN_NETDEV_FRONTEND) += xen-netfront.o
 obj-$(CONFIG_XEN_NETDEV_BACKEND) += xen-netback/
 
 obj-$(CONFIG_XEN_SOCKDEV_PROTO) += xensock/
+obj-$(CONFIG_XEN_SOCKDEV_BACKEND) += xen-sockback/
 
 obj-$(CONFIG_USB_CATC)          += usb/
 obj-$(CONFIG_USB_KAWETH)        += usb/
diff --git a/drivers/net/xen-sockback/Makefile b/drivers/net/xen-sockback/Makefile
new file mode 100644
index 0000000..c255867
--- /dev/null
+++ b/drivers/net/xen-sockback/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_XEN_SOCKDEV_BACKEND) := xen-sockback.o
+
+xen-sockback-y := sockback.o xenbus.o interface.o
diff --git a/drivers/net/xen-sockback/common.h b/drivers/net/xen-sockback/common.h
new file mode 100644
index 0000000..b5b09bb
--- /dev/null
+++ b/drivers/net/xen-sockback/common.h
@@ -0,0 +1,150 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __XEN_SOCKBACK__COMMON_H__
+#define __XEN_SOCKBACK__COMMON_H__
+
+#define pr_fmt(fmt) KBUILD_MODNAME ":%s: " fmt, __func__
+
+#include <linux/module.h>
+#include <linux/interrupt.h>
+#include <linux/slab.h>
+#include <linux/ip.h>
+#include <linux/in.h>
+#include <linux/io.h>
+#include <linux/wait.h>
+#include <linux/sched.h>
+
+#include <xen/interface/io/sockif.h>
+#include <xen/interface/grant_table.h>
+#include <xen/grant_table.h>
+#include <xen/xenbus.h>
+
+#include <net/xensock.h>
+#include <net/af_xensock.h>
+
+struct xen_sockbk;
+
+struct xenvsock {
+	/* Unique identifier for this interface. */
+	domid_t          domid;
+
+	/* Reference to xensock processing backend. */
+	struct xen_sockbk *sockbk;
+
+	/* Physical parameters of the comms window. */
+	unsigned int     irq;
+
+	/* Back pointer to the xenbus_device. */
+	struct xenbus_device		*xbdev;
+
+	/* List of frontends to notify after a batch of frames sent. */
+	struct list_head notify_list;
+
+	/* The shared rings and indexes. */
+	struct xen_sockif_tx_back_ring tx;
+	struct xen_sockif_rx_back_ring rx;
+
+	/* Internal feature information. */
+	u8 can_queue:1;	    /* can queue packets for receiver? */
+
+	/* Allow xenvsock_start_xmit() to peek ahead in the rx request
+	 * ring.  This is a prediction of what rx_req_cons will be
+	 * once all queued skbs are put on the ring.
+	 */
+	RING_IDX rx_req_cons_peek;
+
+	/* Transmit shaping: allow 'credit_bytes' every 'credit_usec'. */
+	unsigned long   credit_bytes;
+	unsigned long   credit_usec;
+	unsigned long   remaining_credit;
+	struct timer_list credit_timeout;
+
+	/* Statistics */
+	unsigned long rx_gso_checksum_fixup;
+
+	/* Miscellaneous private stuff. */
+	struct list_head schedule_list;
+	atomic_t         refcnt;
+	struct xen_sock_dev *dev;
+
+	wait_queue_head_t waiting_to_free;
+};
+
+#define XEN_SOCKIF_TX_RING_SIZE __CONST_RING_SIZE(xen_sockif_tx, PAGE_SIZE)
+#define XEN_SOCKIF_RX_RING_SIZE __CONST_RING_SIZE(xen_sockif_rx, PAGE_SIZE)
+
+struct xenvsock *xenvsock_alloc(struct device *parent,
+				domid_t domid,
+				unsigned int handle);
+
+int xenvsock_connect(struct xenvsock *vsock, unsigned long tx_ring_ref,
+		     unsigned long rx_ring_ref, unsigned int evtchn);
+void xenvsock_disconnect(struct xenvsock *vsock);
+
+void xenvsock_get(struct xenvsock *vsock);
+void xenvsock_put(struct xenvsock *vsock);
+
+int xenvsock_xenbus_init(void);
+
+int xenvsock_schedulable(struct xenvsock *vsock);
+
+int xen_sockbk_rx_ring_full(struct xenvsock *vsock);
+
+int xen_sockbk_must_stop_queue(struct xenvsock *vsock);
+
+/* (Un)Map communication rings. */
+void xen_sockbk_unmap_frontend_rings(struct xenvsock *vsock);
+int xen_sockbk_map_frontend_rings(struct xenvsock *vsock,
+				  grant_ref_t tx_ring_ref,
+				  grant_ref_t rx_ring_ref);
+
+/* (De)Register a xenvsock with the xensock backend. */
+void xen_sockbk_add_xenvsock(struct xenvsock *vsock);
+void xen_sockbk_remove_xenvsock(struct xenvsock *vsock);
+
+/* (De)Schedule backend processing for a xenvsock */
+void xen_sockbk_schedule_xenvsock(struct xenvsock *vsock);
+void xen_sockbk_deschedule_xenvsock(struct xenvsock *vsock);
+
+/* Check for SKBs from frontend and schedule backend processing */
+void xen_sockbk_check_rx_xenvsock(struct xenvsock *vsock);
+/* Receive an SKB from the frontend */
+void xenvsock_receive_skb(struct xenvsock *vsock, struct sk_buff *skb);
+
+/* Queue an SKB for transmission to the frontend */
+void xen_sockbk_queue_tx_skb(struct xenvsock *vsock, struct sk_buff *skb);
+/* Notify xenvsock that ring now has space to send an skb to the frontend */
+void xenvsock_notify_tx_completion(struct xenvsock *vsock);
+
+/* Prevent the device from generating any further traffic. */
+void xenvsock_carrier_off(struct xenvsock *vsock);
+
+/* Returns number of ring slots required to send an skb to the frontend */
+unsigned int xen_sockbk_count_skb_slots(struct xenvsock *vsock,
+					struct sk_buff *skb);
+
+#endif /* __XEN_SOCKBACK__COMMON_H__ */
diff --git a/drivers/net/xen-sockback/interface.c b/drivers/net/xen-sockback/interface.c
new file mode 100644
index 0000000..995fb22
--- /dev/null
+++ b/drivers/net/xen-sockback/interface.c
@@ -0,0 +1,245 @@
+/*
+ * Xenvsock-device interface management.
+ *
+ * Copyright (c) 2004-2005, Keir Fraser
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "common.h"
+
+#include <xen/events.h>
+#include <asm/xen/hypercall.h>
+
+#define XENVSOCK_QUEUE_LENGTH 32
+
+void xenvsock_get(struct xenvsock *vsock)
+{
+	atomic_inc(&vsock->refcnt);
+}
+
+void xenvsock_put(struct xenvsock *vsock)
+{
+	if (atomic_dec_and_test(&vsock->refcnt))
+		wake_up(&vsock->waiting_to_free);
+}
+
+int xenvsock_schedulable(struct xenvsock *vsock)
+{
+	return sockif_carrier_ok(vsock->dev);
+}
+
+static int xenvsock_rx_schedulable(struct xenvsock *vsock)
+{
+	return xenvsock_schedulable(vsock) && !xen_sockbk_rx_ring_full(vsock);
+}
+
+static irqreturn_t xenvsock_interrupt(int irq, void *dev_id)
+{
+	struct xenvsock *vsock = dev_id;
+
+	if (vsock->sockbk == NULL)
+		return IRQ_NONE;
+
+	xen_sockbk_schedule_xenvsock(vsock);
+
+	if (xenvsock_rx_schedulable(vsock))
+		sockif_wake_queue(vsock->dev);
+
+	return IRQ_HANDLED;
+}
+
+static int xenvsock_start_xmit(struct sk_buff *skb, struct xen_sock_dev *dev)
+{
+	struct xenvsock *vsock = xensock_dev_priv(dev);
+
+	if (vsock->sockbk == NULL)
+		goto drop;
+
+	/* Drop the packet if the target domain has no receive buffers. */
+	if (!xenvsock_rx_schedulable(vsock))
+		goto drop;
+
+	skb->dev = (struct net_device *)dev;
+
+	/* Reserve ring slots for the worst-case number of fragments. */
+	vsock->rx_req_cons_peek += xen_sockbk_count_skb_slots(vsock, skb);
+	xenvsock_get(vsock);
+
+	if (vsock->can_queue && xen_sockbk_must_stop_queue(vsock))
+		sockif_stop_queue(dev);
+
+	xen_sockbk_queue_tx_skb(vsock, skb);
+
+	return 0;
+
+ drop:
+	dev_kfree_skb(skb);
+	return 0;
+}
+
+void xenvsock_receive_skb(struct xenvsock *vsock, struct sk_buff *skb)
+{
+	xensock_dev_queue_rx_skb(skb, vsock->dev);
+}
+
+void xenvsock_notify_tx_completion(struct xenvsock *vsock)
+{
+	if (sockif_queue_stopped(vsock->dev) && xenvsock_rx_schedulable(vsock))
+		sockif_wake_queue(vsock->dev);
+}
+
+static void xenvsock_up(struct xenvsock *vsock)
+{
+	xen_sockbk_add_xenvsock(vsock);
+	enable_irq(vsock->irq);
+	xen_sockbk_check_rx_xenvsock(vsock);
+}
+
+static void xenvsock_down(struct xenvsock *vsock)
+{
+	disable_irq(vsock->irq);
+	del_timer_sync(&vsock->credit_timeout);
+	xen_sockbk_deschedule_xenvsock(vsock);
+	xen_sockbk_remove_xenvsock(vsock);
+}
+
+static const struct xenvsock_stat {
+	char name[ETH_GSTRING_LEN];
+	u16 offset;
+} xenvsock_stats[] = {
+	{
+		"rx_gso_checksum_fixup",
+		offsetof(struct xenvsock, rx_gso_checksum_fixup)
+	},
+};
+
+struct xenvsock *xenvsock_alloc(struct device *parent, domid_t domid,
+				unsigned int handle)
+{
+	int err;
+	struct xen_sock_dev *dev;
+	struct xenvsock *vsock;
+	char name[IFNAMSIZ] = {};
+
+	snprintf(name, IFNAMSIZ - 1, "vsock%u.%u", domid, handle);
+	dev = alloc_xen_sock_dev(sizeof(struct xenvsock), name);
+	if (dev == NULL) {
+		pr_warn("Could not allocate sockdev\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	vsock = xensock_dev_priv(dev);
+	vsock->domid  = domid;
+	vsock->sockbk  = NULL;
+	atomic_set(&vsock->refcnt, 1);
+	init_waitqueue_head(&vsock->waiting_to_free);
+	vsock->dev = dev;
+	INIT_LIST_HEAD(&vsock->schedule_list);
+	INIT_LIST_HEAD(&vsock->notify_list);
+
+	vsock->credit_bytes = ~0UL;
+	vsock->remaining_credit = ~0UL;
+	vsock->credit_usec  = 0UL;
+	init_timer(&vsock->credit_timeout);
+	/* Initialize 'expires' now: it's used to track the credit window. */
+	vsock->credit_timeout.expires = jiffies;
+
+	dev->start_xmit = xenvsock_start_xmit;
+
+	dev->tx_queue_len = XENVSOCK_QUEUE_LENGTH;
+
+	sockif_carrier_off(dev);
+
+	err = xensock_register_dev(dev);
+	if (err) {
+		pr_warn("Could not register device: err=%d\n", err);
+		free_xen_sock_dev(dev);
+		return ERR_PTR(err);
+	}
+
+	pr_warn("Successfully created xenvsock\n");
+	return vsock;
+}
+
+int xenvsock_connect(struct xenvsock *vsock, unsigned long tx_ring_ref,
+		     unsigned long rx_ring_ref, unsigned int evtchn)
+{
+	int err = -ENOMEM;
+
+	/* Already connected through? */
+	if (vsock->irq)
+		return 0;
+
+	err = xen_sockbk_map_frontend_rings(vsock, tx_ring_ref, rx_ring_ref);
+	if (err < 0)
+		goto err;
+
+	err = bind_interdomain_evtchn_to_irqhandler(
+		vsock->domid, evtchn, xenvsock_interrupt, 0,
+		vsock->dev->name, vsock);
+	if (err < 0)
+		goto err_unmap;
+	vsock->irq = err;
+	disable_irq(vsock->irq);
+
+	xenvsock_get(vsock);
+
+	sockif_carrier_on(vsock->dev);
+	xenvsock_up(vsock);
+
+	return 0;
+err_unmap:
+	xen_sockbk_unmap_frontend_rings(vsock);
+err:
+	return err;
+}
+
+void xenvsock_carrier_off(struct xenvsock *vsock)
+{
+	struct xen_sock_dev *dev = vsock->dev;
+
+	sockif_carrier_off(dev); /* discard queued packets */
+	xenvsock_down(vsock);
+	xenvsock_put(vsock);
+}
+
+void xenvsock_disconnect(struct xenvsock *vsock)
+{
+	if (sockif_carrier_ok(vsock->dev))
+		xenvsock_carrier_off(vsock);
+
+	atomic_dec(&vsock->refcnt);
+	wait_event(vsock->waiting_to_free, atomic_read(&vsock->refcnt) == 0);
+
+	if (vsock->irq)
+		unbind_from_irqhandler(vsock->irq, vsock);
+
+	xensock_unregister_dev(vsock->dev);
+
+	xen_sockbk_unmap_frontend_rings(vsock);
+
+	free_xen_sock_dev(vsock->dev);
+}
diff --git a/drivers/net/xen-sockback/sockback.c b/drivers/net/xen-sockback/sockback.c
new file mode 100644
index 0000000..84f5848
--- /dev/null
+++ b/drivers/net/xen-sockback/sockback.c
@@ -0,0 +1,1527 @@
+/*
+ * Back-end of the driver for xensock devices. A
+ * reference front-end implementation can be found in:
+ *  drivers/net/xen-sockfront.c
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "common.h"
+
+#include <linux/kthread.h>
+#include <linux/if_vlan.h>
+
+#include <net/tcp.h>
+
+#include <xen/xen.h>
+#include <xen/events.h>
+#include <xen/interface/memory.h>
+
+#include <asm/xen/hypercall.h>
+#include <asm/xen/page.h>
+
+struct pending_tx_info {
+	struct xen_sockif_tx_request req;
+	struct xenvsock *vsock;
+};
+typedef unsigned int pending_ring_idx_t;
+
+struct sockbk_rx_meta {
+	int id;
+	int size;
+};
+
+#define SOCK_DEV_SKB(skb)	((struct xen_sock_dev *)((skb)->dev))
+
+#define MAX_PENDING_REQS 256
+
+/* Discriminate from any valid pending_idx value. */
+#define INVALID_PENDING_IDX 0xFFFF
+
+#define MAX_BUFFER_OFFSET PAGE_SIZE
+
+/* extra field used in struct page */
+union page_ext {
+	struct {
+#if BITS_PER_LONG < 64
+#define IDX_WIDTH   8
+#define GROUP_WIDTH (BITS_PER_LONG - IDX_WIDTH)
+		unsigned int group:GROUP_WIDTH;
+		unsigned int idx:IDX_WIDTH;
+#else
+		unsigned int group, idx;
+#endif
+	} e;
+	void *mapping;
+};
+
+struct xen_sockbk {
+	wait_queue_head_t wq;
+	struct task_struct *task;
+
+	struct sk_buff_head rx_queue;
+	struct sk_buff_head tx_queue;
+
+	struct timer_list sock_timer;
+
+	struct page *mmap_pages[MAX_PENDING_REQS];
+
+	pending_ring_idx_t pending_prod;
+	pending_ring_idx_t pending_cons;
+	struct list_head sock_schedule_list;
+
+	/* Protect the sock_schedule_list in sockbk. */
+	spinlock_t sock_schedule_list_lock;
+
+	atomic_t sockfront_count;
+
+	struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
+	struct gnttab_copy tx_copy_ops[MAX_PENDING_REQS];
+
+	u16 pending_ring[MAX_PENDING_REQS];
+
+	/* Given MAX_BUFFER_OFFSET of 4096 the worst case is that each
+	 * head/fragment page uses 2 copy operations because it
+	 * straddles two buffers in the frontend.
+	 */
+	struct gnttab_copy grant_copy_op[2*XEN_SOCKIF_RX_RING_SIZE];
+	struct sockbk_rx_meta meta[2*XEN_SOCKIF_RX_RING_SIZE];
+};
+
+static struct xen_sockbk *xen_sockbk;
+static int xen_sockbk_group_nr;
+
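+/* Attach a new frontend to the sockbk group that currently serves the
+ * fewest frontends, so the load stays roughly balanced across the
+ * per-CPU worker threads created in sockback_init().
+ */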
+void xen_sockbk_add_xenvsock(struct xenvsock *vsock)
+{
+	int i;
+	int min_sockfront_count;
+	int min_group = 0;
+	struct xen_sockbk *sockbk;
+
+	min_sockfront_count = atomic_read(&xen_sockbk[0].sockfront_count);
+	for (i = 0; i < xen_sockbk_group_nr; i++) {
+		int sockfront_count;
+
+		sockfront_count = atomic_read(&xen_sockbk[i].sockfront_count);
+		if (sockfront_count < min_sockfront_count) {
+			min_group = i;
+			min_sockfront_count = sockfront_count;
+		}
+	}
+
+	sockbk = &xen_sockbk[min_group];
+
+	vsock->sockbk = sockbk;
+	atomic_inc(&sockbk->sockfront_count);
+}
+
+void xen_sockbk_remove_xenvsock(struct xenvsock *vsock)
+{
+	struct xen_sockbk *sockbk = vsock->sockbk;
+
+	vsock->sockbk = NULL;
+	atomic_dec(&sockbk->sockfront_count);
+}
+
+static void xen_sockbk_idx_release(struct xen_sockbk *sockbk, u16 pending_idx,
+				   u8 status);
+static void make_tx_response(struct xenvsock *vsock,
+			     struct xen_sockif_tx_request *txp,
+			     s8       st);
+static struct xen_sockif_rx_response *make_rx_response(struct xenvsock *vsock,
+						       u16 id, s8 st,
+						       u16 offset,
+						       u16 size,
+						       u16 flags);
+
+static inline unsigned long idx_to_pfn(struct xen_sockbk *sockbk,
+				       u16 idx)
+{
+	return page_to_pfn(sockbk->mmap_pages[idx]);
+}
+
+static inline unsigned long idx_to_kaddr(struct xen_sockbk *sockbk,
+					 u16 idx)
+{
+	return (unsigned long)pfn_to_kaddr(idx_to_pfn(sockbk, idx));
+}
+
+/* extra field used in struct page */
+static inline void set_page_ext(struct page *pg, struct xen_sockbk *sockbk,
+				unsigned int idx)
+{
+	unsigned int group = sockbk - xen_sockbk;
+	union page_ext ext = { .e = { .group = group + 1, .idx = idx } };
+
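+	/* The group is stored biased by one so that a page whose mapping
+	 * field was never set by us cannot be mistaken for group 0;
+	 * get_page_ext() undoes the bias with its "group - 1".
+	 */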
+	BUILD_BUG_ON(sizeof(ext) > sizeof(ext.mapping));
+	pg->mapping = ext.mapping;
+}
+
+static int get_page_ext(struct page *pg,
+			unsigned int *pgroup, unsigned int *pidx)
+{
+	union page_ext ext = { .mapping = pg->mapping };
+	struct xen_sockbk *sockbk;
+	unsigned int group, idx;
+
+	group = ext.e.group - 1;
+
+	if (group < 0 || group >= xen_sockbk_group_nr)
+		return 0;
+
+	sockbk = &xen_sockbk[group];
+
+	idx = ext.e.idx;
+
+	if ((idx < 0) || (idx >= MAX_PENDING_REQS))
+		return 0;
+
+	if (sockbk->mmap_pages[idx] != pg)
+		return 0;
+
+	*pgroup = group;
+	*pidx = idx;
+
+	return 1;
+}
+
+/* This is the amount of packet we copy rather than map, so that the
+ * guest can't fiddle with the contents of the headers while we do
+ * packet processing on them.
+ */
+#define PKT_PROT_LEN    (ETH_HLEN + \
+			 VLAN_HLEN + \
+			 sizeof(struct iphdr) + MAX_IPOPTLEN + \
+			 sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE)
+
+static u16 frag_get_pending_idx(skb_frag_t *frag)
+{
+	return (u16)frag->page_offset;
+}
+
+static void frag_set_pending_idx(skb_frag_t *frag, u16 pending_idx)
+{
+	frag->page_offset = pending_idx;
+}
+
+static inline pending_ring_idx_t pending_index(unsigned i)
+{
+	return i & (MAX_PENDING_REQS-1);
+}
+
+static inline pending_ring_idx_t nr_pending_reqs(struct xen_sockbk *sockbk)
+{
+	return MAX_PENDING_REQS -
+		sockbk->pending_prod + sockbk->pending_cons;
+}
+
+static void xen_sockbk_kick_thread(struct xen_sockbk *sockbk)
+{
+	wake_up(&sockbk->wq);
+}
+
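+/* Worst-case number of RX ring slots needed to receive one maximally
+ * sized (XENSOCK_MTU) packet.
+ */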
+static int max_required_rx_slots(struct xenvsock *vsock)
+{
+	int max = DIV_ROUND_UP(XENSOCK_MTU, PAGE_SIZE);
+
+	return max;
+}
+
+int xen_sockbk_rx_ring_full(struct xenvsock *vsock)
+{
+	RING_IDX peek   = vsock->rx_req_cons_peek;
+	RING_IDX needed = max_required_rx_slots(vsock);
+	RING_IDX rsp_prod = vsock->rx.rsp_prod_pvt + XEN_SOCKIF_RX_RING_SIZE;
+
+	return ((vsock->rx.sring->req_prod - peek) < needed) ||
+	       ((rsp_prod - peek) < needed);
+}
+
+int xen_sockbk_must_stop_queue(struct xenvsock *vsock)
+{
+	if (!xen_sockbk_rx_ring_full(vsock))
+		return 0;
+
+	vsock->rx.sring->req_event = vsock->rx_req_cons_peek +
+		max_required_rx_slots(vsock);
+	mb(); /* request notification /then/ check the queue */
+
+	return xen_sockbk_rx_ring_full(vsock);
+}
+
+/* Returns true if we should start a new receive buffer instead of
+ * adding 'size' bytes to a buffer which currently contains 'offset'
+ * bytes.
+ */
+static bool start_new_rx_buffer(int offset, unsigned long size, int head)
+{
+	/* simple case: we have completely filled the current buffer. */
+	if (offset == MAX_BUFFER_OFFSET)
+		return true;
+
+	/* complex case: start a fresh buffer if the current frag
+	 * would overflow the current buffer but only if:
+	 *     (i)   this frag would fit completely in the next buffer
+	 * and (ii)  there is already some data in the current buffer
+	 * and (iii) this is not the head buffer.
+	 *
+	 * Where:
+	 * - (i) stops us splitting a frag into two copies
+	 *   unless the frag is too large for a single buffer.
+	 * - (ii) stops us from leaving a buffer pointlessly empty.
+	 * - (iii) stops us leaving the first buffer
+	 *   empty. Strictly speaking this is already covered
+	 *   by (ii) but is explicitly checked because
+	 *   sock front relies on the first buffer being
+	 *   non-empty and can crash otherwise.
+	 *
+	 * This means we will effectively linearise small
+	 * frags but do not needlessly split large buffers
+	 * into multiple copies, and tend to give large frags
+	 * their own buffers as before.
+	 */
+	if ((offset + size > MAX_BUFFER_OFFSET) &&
+	    (size <= MAX_BUFFER_OFFSET) && offset && !head)
+		return true;
+
+	return false;
+}
+
+/* Figure out how many ring slots we're going to need to send @skb to
+ * the guest. This function is essentially a dry run of
+ * sockbk_gop_frag_copy.
+ */
+unsigned int xen_sockbk_count_skb_slots(struct xenvsock *vsock,
+					struct sk_buff *skb)
+{
+	unsigned int count;
+	int i, copy_off;
+
+	count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
+
+	copy_off = skb_headlen(skb) % PAGE_SIZE;
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
+		unsigned long offset = skb_shinfo(skb)->frags[i].page_offset;
+		unsigned long bytes;
+
+		offset &= ~PAGE_MASK;
+
+		while (size > 0) {
+			BUG_ON(offset >= PAGE_SIZE);
+			BUG_ON(copy_off > MAX_BUFFER_OFFSET);
+
+			bytes = PAGE_SIZE - offset;
+
+			if (bytes > size)
+				bytes = size;
+
+			if (start_new_rx_buffer(copy_off, bytes, 0)) {
+				count++;
+				copy_off = 0;
+			}
+
+			if (copy_off + bytes > MAX_BUFFER_OFFSET)
+				bytes = MAX_BUFFER_OFFSET - copy_off;
+
+			copy_off += bytes;
+
+			offset += bytes;
+			size -= bytes;
+
+			if (offset == PAGE_SIZE)
+				offset = 0;
+		}
+	}
+	return count;
+}
+
+struct sockrx_pending_operations {
+	unsigned copy_prod, copy_cons;
+	unsigned meta_prod, meta_cons;
+	struct gnttab_copy *copy;
+	struct sockbk_rx_meta *meta;
+	int copy_off;
+	grant_ref_t copy_gref;
+};
+
+static
+struct sockbk_rx_meta *get_next_rx_buf(struct xenvsock *vsock,
+				       struct sockrx_pending_operations *spo)
+{
+	struct sockbk_rx_meta *meta;
+	struct xen_sockif_rx_request *req;
+
+	req = RING_GET_REQUEST(&vsock->rx, vsock->rx.req_cons++);
+
+	meta = spo->meta + spo->meta_prod++;
+	meta->size = 0;
+	meta->id = req->id;
+
+	spo->copy_off = 0;
+	spo->copy_gref = req->gref;
+
+	return meta;
+}
+
+/* Set up the grant operations for this fragment. If it's a flipping
+ * interface, we also set up the unmap request from here.
+ */
+static void sockbk_gop_frag_copy(struct xenvsock *vsock, struct sk_buff *skb,
+				 struct sockrx_pending_operations *spo,
+				 struct page *page, unsigned long size,
+				 unsigned long offset, int *head)
+{
+	struct gnttab_copy *copy_gop;
+	struct sockbk_rx_meta *meta;
+	/* These variables are used iff get_page_ext returns true,
+	 * in which case they are guaranteed to be initialized.
+	 */
+	unsigned int uninitialized_var(group), uninitialized_var(idx);
+	int foreign = get_page_ext(page, &group, &idx);
+	unsigned long bytes;
+
+	/* Data must not cross a page boundary. */
+	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
+
+	meta = spo->meta + spo->meta_prod - 1;
+
+	/* Skip unused frames from start of page */
+	page += offset >> PAGE_SHIFT;
+	offset &= ~PAGE_MASK;
+
+	while (size > 0) {
+		BUG_ON(offset >= PAGE_SIZE);
+		BUG_ON(spo->copy_off > MAX_BUFFER_OFFSET);
+
+		bytes = PAGE_SIZE - offset;
+
+		if (bytes > size)
+			bytes = size;
+
+		if (start_new_rx_buffer(spo->copy_off, bytes, *head)) {
+			/* Sockfront requires there to be some data in the head
+			 * buffer.
+			 */
+			BUG_ON(*head);
+
+			meta = get_next_rx_buf(vsock, spo);
+		}
+
+		if (spo->copy_off + bytes > MAX_BUFFER_OFFSET)
+			bytes = MAX_BUFFER_OFFSET - spo->copy_off;
+
+		copy_gop = spo->copy + spo->copy_prod++;
+		copy_gop->flags = GNTCOPY_dest_gref;
+		if (foreign) {
+			struct xen_sockbk *sockbk = &xen_sockbk[group];
+			struct pending_tx_info *src_pend;
+
+			src_pend = &sockbk->pending_tx_info[idx];
+
+			copy_gop->source.domid = src_pend->vsock->domid;
+			copy_gop->source.u.ref = src_pend->req.gref;
+			copy_gop->flags |= GNTCOPY_source_gref;
+		} else {
+			void *vaddr = page_address(page);
+
+			copy_gop->source.domid = DOMID_SELF;
+			copy_gop->source.u.gmfn = virt_to_mfn(vaddr);
+		}
+		copy_gop->source.offset = offset;
+		copy_gop->dest.domid = vsock->domid;
+
+		copy_gop->dest.offset = spo->copy_off;
+		copy_gop->dest.u.ref = spo->copy_gref;
+		copy_gop->len = bytes;
+
+		spo->copy_off += bytes;
+		meta->size += bytes;
+
+		offset += bytes;
+		size -= bytes;
+
+		/* Next frame */
+		if (offset == PAGE_SIZE && size) {
+			BUG_ON(!PageCompound(page));
+			page++;
+			offset = 0;
+		}
+
+		*head = 0; /* There must be something in this buffer now. */
+	}
+}
+
+/* Prepare an SKB to be transmitted to the frontend.
+ *
+ * This function is responsible for allocating grant operations, meta
+ * structures, etc.
+ *
+ * It returns the number of meta structures consumed. The number of
+ * ring slots used is always equal to the number of meta slots used,
+ * since xensock does not add any extra (e.g. GSO) descriptors.
+ */
+static int sockbk_gop_skb(struct sk_buff *skb,
+			  struct sockrx_pending_operations *spo)
+{
+	struct xenvsock *vsock = xensock_dev_priv(SOCK_DEV_SKB(skb));
+	int nr_frags = skb_shinfo(skb)->nr_frags;
+	int i;
+	struct xen_sockif_rx_request *req;
+	struct sockbk_rx_meta *meta;
+	unsigned char *data;
+	int head = 1;
+	int old_meta_prod;
+
+	old_meta_prod = spo->meta_prod;
+
+	req = RING_GET_REQUEST(&vsock->rx, vsock->rx.req_cons++);
+	meta = spo->meta + spo->meta_prod++;
+
+	meta->size = 0;
+	meta->id = req->id;
+	spo->copy_off = 0;
+	spo->copy_gref = req->gref;
+
+	data = skb->data;
+	while (data < skb_tail_pointer(skb)) {
+		unsigned int offset = offset_in_page(data);
+		unsigned int len = PAGE_SIZE - offset;
+
+		if (data + len > skb_tail_pointer(skb))
+			len = skb_tail_pointer(skb) - data;
+
+		sockbk_gop_frag_copy(vsock, skb, spo,
+				     virt_to_page(data), len, offset, &head);
+		data += len;
+	}
+
+	for (i = 0; i < nr_frags; i++) {
+		sockbk_gop_frag_copy(vsock, skb, spo,
+				     skb_frag_page(&skb_shinfo(skb)->frags[i]),
+				     skb_frag_size(&skb_shinfo(skb)->frags[i]),
+				     skb_shinfo(skb)->frags[i].page_offset,
+				     &head);
+	}
+
+	return spo->meta_prod - old_meta_prod;
+}
+
+/* This is a twin to sockbk_gop_skb.  Assume that sockbk_gop_skb was
+ * used to set up the operations on the top of
+ * sockrx_pending_operations, which have since been done.  Check that
+ * they didn't give any errors and advance over them.
+ */
+static int sockbk_check_gop(struct xenvsock *vsock, int nr_meta_slots,
+			    struct sockrx_pending_operations *spo)
+{
+	struct gnttab_copy     *copy_op;
+	int status = XEN_SOCKIF_RSP_OKAY;
+	int i;
+
+	for (i = 0; i < nr_meta_slots; i++) {
+		copy_op = spo->copy + spo->copy_cons++;
+		if (copy_op->status != GNTST_okay) {
+			dev_dbg(&vsock->xbdev->dev,
+				"Bad status %d from copy to DOM%d.\n",
+				copy_op->status, vsock->domid);
+			status = XEN_SOCKIF_RSP_ERROR;
+		}
+	}
+
+	return status;
+}
+
+static void sockbk_add_frag_responses(struct xenvsock *vsock, int status,
+				      struct sockbk_rx_meta *meta,
+				      int nr_meta_slots)
+{
+	int i;
+	unsigned long offset;
+
+	/* No fragments used */
+	if (nr_meta_slots <= 1)
+		return;
+
+	nr_meta_slots--;
+
+	for (i = 0; i < nr_meta_slots; i++) {
+		int flags;
+
+		if (i == nr_meta_slots - 1)
+			flags = 0;
+		else
+			flags = XEN_SOCKRXF_more_data;
+
+		offset = 0;
+		make_rx_response(vsock, meta[i].id, status, offset,
+				 meta[i].size, flags);
+	}
+}
+
+struct skb_cb_overlay {
+	int meta_slots_used;
+};
+
+static void xen_sockbk_rx_action(struct xen_sockbk *sockbk)
+{
+	struct xenvsock *vsock = NULL, *tmp;
+	s8 status;
+	u16 irq, flags;
+	struct xen_sockif_rx_response *resp;
+	struct sk_buff_head rxq;
+	struct sk_buff *skb;
+	LIST_HEAD(notify);
+	int ret;
+	int nr_frags;
+	int count;
+	unsigned long offset;
+	struct skb_cb_overlay *sco;
+
+	struct sockrx_pending_operations spo = {
+		.copy  = sockbk->grant_copy_op,
+		.meta  = sockbk->meta,
+	};
+
+	skb_queue_head_init(&rxq);
+
+	count = 0;
+
+	while ((skb = skb_dequeue(&sockbk->rx_queue)) != NULL) {
+		vsock = xensock_dev_priv(SOCK_DEV_SKB(skb));
+
+		nr_frags = skb_shinfo(skb)->nr_frags;
+
+		sco = (struct skb_cb_overlay *)skb->cb;
+		sco->meta_slots_used = sockbk_gop_skb(skb, &spo);
+
+		count += nr_frags + 1;
+
+		__skb_queue_tail(&rxq, skb);
+
+		/* Filled the batch queue? */
+		if (count + MAX_SKB_FRAGS >= XEN_SOCKIF_RX_RING_SIZE)
+			break;
+	}
+
+	BUG_ON(spo.meta_prod > ARRAY_SIZE(sockbk->meta));
+
+	if (!spo.copy_prod)
+		return;
+
+	BUG_ON(spo.copy_prod > ARRAY_SIZE(sockbk->grant_copy_op));
+	gnttab_batch_copy(sockbk->grant_copy_op, spo.copy_prod);
+
+	while ((skb = __skb_dequeue(&rxq)) != NULL) {
+		sco = (struct skb_cb_overlay *)skb->cb;
+
+		vsock = xensock_dev_priv(SOCK_DEV_SKB(skb));
+
+		status = sockbk_check_gop(vsock, sco->meta_slots_used, &spo);
+
+		if (sco->meta_slots_used == 1)
+			flags = 0;
+		else
+			flags = XEN_SOCKRXF_more_data;
+
+		offset = 0;
+		resp = make_rx_response(vsock, sockbk->meta[spo.meta_cons].id,
+					status, offset,
+					sockbk->meta[spo.meta_cons].size,
+					flags);
+
+		sockbk_add_frag_responses(vsock, status,
+					  sockbk->meta + spo.meta_cons + 1,
+					  sco->meta_slots_used);
+
+		RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&vsock->rx, ret);
+		irq = vsock->irq;
+		if (ret && list_empty(&vsock->notify_list))
+			list_add_tail(&vsock->notify_list, &notify);
+
+		xenvsock_notify_tx_completion(vsock);
+
+		xenvsock_put(vsock);
+		spo.meta_cons += sco->meta_slots_used;
+		dev_kfree_skb(skb);
+	}
+
+	list_for_each_entry_safe(vsock, tmp, &notify, notify_list) {
+		notify_remote_via_irq(vsock->irq);
+		list_del_init(&vsock->notify_list);
+	}
+
+	/* More work to do? */
+	if (!skb_queue_empty(&sockbk->rx_queue) &&
+	    !timer_pending(&sockbk->sock_timer))
+		xen_sockbk_kick_thread(sockbk);
+}
+
+void xen_sockbk_queue_tx_skb(struct xenvsock *vsock, struct sk_buff *skb)
+{
+	struct xen_sockbk *sockbk = vsock->sockbk;
+
+	skb_queue_tail(&sockbk->rx_queue, skb);
+	xen_sockbk_kick_thread(sockbk);
+}
+
+static void xen_sockbk_alarm(unsigned long data)
+{
+	struct xen_sockbk *sockbk = (struct xen_sockbk *)data;
+
+	xen_sockbk_kick_thread(sockbk);
+}
+
+static int __on_sock_schedule_list(struct xenvsock *vsock)
+{
+	return !list_empty(&vsock->schedule_list);
+}
+
+/* Must be called with sock_schedule_list_lock held */
+static void remove_from_sock_schedule_list(struct xenvsock *vsock)
+{
+	if (likely(__on_sock_schedule_list(vsock))) {
+		list_del_init(&vsock->schedule_list);
+		xenvsock_put(vsock);
+	}
+}
+
+static struct xenvsock *poll_sock_schedule_list(struct xen_sockbk *sockbk)
+{
+	struct xenvsock *vsock = NULL;
+
+	spin_lock_irq(&sockbk->sock_schedule_list_lock);
+	if (list_empty(&sockbk->sock_schedule_list))
+		goto out;
+
+	vsock = list_first_entry(&sockbk->sock_schedule_list,
+				 struct xenvsock, schedule_list);
+	if (!vsock)
+		goto out;
+
+	xenvsock_get(vsock);
+
+	remove_from_sock_schedule_list(vsock);
+out:
+	spin_unlock_irq(&sockbk->sock_schedule_list_lock);
+	return vsock;
+}
+
+void xen_sockbk_schedule_xenvsock(struct xenvsock *vsock)
+{
+	unsigned long flags;
+	struct xen_sockbk *sockbk = vsock->sockbk;
+	struct list_head *sockbk_schedule_list = &sockbk->sock_schedule_list;
+
+	if (__on_sock_schedule_list(vsock))
+		goto kick;
+
+	spin_lock_irqsave(&sockbk->sock_schedule_list_lock, flags);
+	if (!__on_sock_schedule_list(vsock) &&
+	    likely(xenvsock_schedulable(vsock))) {
+		list_add_tail(&vsock->schedule_list, sockbk_schedule_list);
+		xenvsock_get(vsock);
+	}
+	spin_unlock_irqrestore(&sockbk->sock_schedule_list_lock, flags);
+
+kick:
+	smp_mb(); /* Ensure that an item is added to the schedule list */
+	if ((nr_pending_reqs(sockbk) < (MAX_PENDING_REQS/2)) &&
+	    !list_empty(sockbk_schedule_list))
+		xen_sockbk_kick_thread(sockbk);
+}
+
+void xen_sockbk_deschedule_xenvsock(struct xenvsock *vsock)
+{
+	struct xen_sockbk *sockbk = vsock->sockbk;
+
+	spin_lock_irq(&sockbk->sock_schedule_list_lock);
+	remove_from_sock_schedule_list(vsock);
+	spin_unlock_irq(&sockbk->sock_schedule_list_lock);
+}
+
+void xen_sockbk_check_rx_xenvsock(struct xenvsock *vsock)
+{
+	int more_to_do;
+
+	RING_FINAL_CHECK_FOR_REQUESTS(&vsock->tx, more_to_do);
+
+	if (more_to_do)
+		xen_sockbk_schedule_xenvsock(vsock);
+}
+
+static void tx_add_credit(struct xenvsock *vsock)
+{
+	unsigned long max_burst, max_credit;
+
+	/* Allow a burst big enough to transmit a jumbo packet of up to 128kB.
+	 * Otherwise the interface can seize up due to insufficient credit.
+	 */
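+	/* Worked example (numbers assumed): with credit_bytes = 1000000 and
+	 * a pending 64kB request, max_burst works out to 1000000, so
+	 * remaining_credit is topped up by credit_bytes but can never
+	 * accumulate more than a single burst worth of bytes.
+	 */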
+	max_burst = RING_GET_REQUEST(&vsock->tx, vsock->tx.req_cons)->size;
+	max_burst = min(max_burst, 131072UL);
+	max_burst = max(max_burst, vsock->credit_bytes);
+
+	/* Take care that adding a new chunk of credit doesn't wrap to zero. */
+	max_credit = vsock->remaining_credit + vsock->credit_bytes;
+	if (max_credit < vsock->remaining_credit)
+		max_credit = ULONG_MAX; /* wrapped: clamp to ULONG_MAX */
+
+	vsock->remaining_credit = min(max_credit, max_burst);
+}
+
+static void tx_credit_callback(unsigned long data)
+{
+	struct xenvsock *vsock = (struct xenvsock *)data;
+
+	tx_add_credit(vsock);
+	xen_sockbk_check_rx_xenvsock(vsock);
+}
+
+static void sockbk_tx_err(struct xenvsock *vsock,
+			  struct xen_sockif_tx_request *txp, RING_IDX end)
+{
+	RING_IDX cons = vsock->tx.req_cons;
+
+	do {
+		make_tx_response(vsock, txp, XEN_SOCKIF_RSP_ERROR);
+		if (cons == end)
+			break;
+		txp = RING_GET_REQUEST(&vsock->tx, cons++);
+	} while (1);
+	vsock->tx.req_cons = cons;
+	xen_sockbk_check_rx_xenvsock(vsock);
+	xenvsock_put(vsock);
+}
+
+static void sockbk_fatal_tx_err(struct xenvsock *vsock)
+{
+	dev_err(&vsock->xbdev->dev, "fatal error; disabling device\n");
+	xenvsock_carrier_off(vsock);
+	xenvsock_put(vsock);
+}
+
+static int sockbk_count_requests(struct xenvsock *vsock,
+				 struct xen_sockif_tx_request *first,
+				 struct xen_sockif_tx_request *txp,
+				 int work_to_do)
+{
+	RING_IDX cons = vsock->tx.req_cons;
+	int frags = 0;
+
+	if (!(first->flags & XEN_SOCKTXF_more_data))
+		return 0;
+
+	do {
+		if (frags >= work_to_do) {
+			dev_err(&vsock->xbdev->dev, "Need more frags\n");
+			sockbk_fatal_tx_err(vsock);
+			return -ENODATA;
+		}
+
+		if (unlikely(frags >= MAX_SKB_FRAGS)) {
+			dev_err(&vsock->xbdev->dev, "Too many frags\n");
+			sockbk_fatal_tx_err(vsock);
+			return -E2BIG;
+		}
+
+		memcpy(txp, RING_GET_REQUEST(&vsock->tx, cons + frags),
+		       sizeof(*txp));
+		if (txp->size > first->size) {
+			dev_err(&vsock->xbdev->dev, "Frag is bigger than frame.\n");
+			sockbk_fatal_tx_err(vsock);
+			return -EIO;
+		}
+
+		first->size -= txp->size;
+		frags++;
+
+		if (unlikely((txp->offset + txp->size) > PAGE_SIZE)) {
+			dev_err(&vsock->xbdev->dev, "txp->offset: %x, size: %u\n",
+				txp->offset, txp->size);
+			sockbk_fatal_tx_err(vsock);
+			return -EINVAL;
+		}
+	} while ((txp++)->flags & XEN_SOCKTXF_more_data);
+	return frags;
+}
+
+static struct page *xen_sockbk_alloc_page(struct xen_sockbk *sockbk,
+					  struct sk_buff *skb,
+					  u16 pending_idx)
+{
+	struct page *page;
+
+	page = alloc_page(GFP_KERNEL|__GFP_COLD);
+	if (!page)
+		return NULL;
+	set_page_ext(page, sockbk, pending_idx);
+	sockbk->mmap_pages[pending_idx] = page;
+	return page;
+}
+
+static
+struct gnttab_copy *xen_sockbk_get_requests(struct xen_sockbk *sockbk,
+					    struct xenvsock *vsock,
+					    struct sk_buff *skb,
+					    struct xen_sockif_tx_request *txp,
+					    struct gnttab_copy *gop)
+{
+	struct skb_shared_info *shinfo = skb_shinfo(skb);
+	skb_frag_t *frags = shinfo->frags;
+	u16 pending_idx = *((u16 *)skb->data);
+	int i, start;
+
+	/* Skip first skb fragment if it is on same page as header fragment. */
+	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
+
+	for (i = start; i < shinfo->nr_frags; i++, txp++) {
+		struct page *page;
+		pending_ring_idx_t index;
+		struct pending_tx_info *pending_tx_info =
+			sockbk->pending_tx_info;
+
+		index = pending_index(sockbk->pending_cons++);
+		pending_idx = sockbk->pending_ring[index];
+		page = xen_sockbk_alloc_page(sockbk, skb, pending_idx);
+		if (!page)
+			goto err;
+
+		gop->source.u.ref = txp->gref;
+		gop->source.domid = vsock->domid;
+		gop->source.offset = txp->offset;
+
+		gop->dest.u.gmfn = virt_to_mfn(page_address(page));
+		gop->dest.domid = DOMID_SELF;
+		gop->dest.offset = txp->offset;
+
+		gop->len = txp->size;
+		gop->flags = GNTCOPY_source_gref;
+
+		gop++;
+
+		memcpy(&pending_tx_info[pending_idx].req, txp, sizeof(*txp));
+		xenvsock_get(vsock);
+		pending_tx_info[pending_idx].vsock = vsock;
+		frag_set_pending_idx(&frags[i], pending_idx);
+	}
+
+	return gop;
+err:
+	/* Unwind, freeing all pages and sending error responses. */
+	while (i-- > start) {
+		xen_sockbk_idx_release(sockbk, frag_get_pending_idx(&frags[i]),
+				       XEN_SOCKIF_RSP_ERROR);
+	}
+	/* The head too, if necessary. */
+	if (start)
+		xen_sockbk_idx_release(sockbk, pending_idx,
+				       XEN_SOCKIF_RSP_ERROR);
+
+	return NULL;
+}
+
+static int xen_sockbk_tx_check_gop(struct xen_sockbk *sockbk,
+				   struct sk_buff *skb,
+				   struct gnttab_copy **gopp)
+{
+	struct gnttab_copy *gop = *gopp;
+	u16 pending_idx = *((u16 *)skb->data);
+	struct skb_shared_info *shinfo = skb_shinfo(skb);
+	int nr_frags = shinfo->nr_frags;
+	int i, err, start;
+
+	/* Check status of header. */
+	err = gop->status;
+	if (unlikely(err))
+		xen_sockbk_idx_release(sockbk, pending_idx,
+				       XEN_SOCKIF_RSP_ERROR);
+
+	/* Skip first skb fragment if it is on same page as header fragment. */
+	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
+
+	for (i = start; i < nr_frags; i++) {
+		int j, newerr;
+
+		pending_idx = frag_get_pending_idx(&shinfo->frags[i]);
+
+		/* Check error status: if okay then remember grant handle. */
+		newerr = (++gop)->status;
+		if (likely(!newerr)) {
+			/* Had a previous error? Invalidate this fragment. */
+			if (unlikely(err))
+				xen_sockbk_idx_release(sockbk, pending_idx,
+						       XEN_SOCKIF_RSP_OKAY);
+			continue;
+		}
+
+		/* Error on this fragment: respond to client with an error. */
+		xen_sockbk_idx_release(sockbk, pending_idx,
+				       XEN_SOCKIF_RSP_ERROR);
+
+		/* Not the first error? Preceding frags already invalidated. */
+		if (err)
+			continue;
+
+		/* First error: invalidate header and preceding fragments. */
+		pending_idx = *((u16 *)skb->data);
+		xen_sockbk_idx_release(sockbk, pending_idx,
+				       XEN_SOCKIF_RSP_OKAY);
+		for (j = start; j < i; j++) {
+			pending_idx = frag_get_pending_idx(&shinfo->frags[j]);
+			xen_sockbk_idx_release(sockbk, pending_idx,
+					       XEN_SOCKIF_RSP_OKAY);
+		}
+
+		/* Remember the error: invalidate all subsequent fragments. */
+		err = newerr;
+	}
+
+	*gopp = gop + 1;
+	return err;
+}
+
+static void xen_sockbk_fill_frags(struct xen_sockbk *sockbk,
+				  struct sk_buff *skb)
+{
+	struct skb_shared_info *shinfo = skb_shinfo(skb);
+	int nr_frags = shinfo->nr_frags;
+	int i;
+
+	for (i = 0; i < nr_frags; i++) {
+		skb_frag_t *frag = shinfo->frags + i;
+		struct xen_sockif_tx_request *txp;
+		struct page *page;
+		u16 pending_idx;
+
+		pending_idx = frag_get_pending_idx(frag);
+
+		txp = &sockbk->pending_tx_info[pending_idx].req;
+		page = virt_to_page(idx_to_kaddr(sockbk, pending_idx));
+		__skb_fill_page_desc(skb, i, page, txp->offset, txp->size);
+		skb->len += txp->size;
+		skb->data_len += txp->size;
+		skb->truesize += txp->size;
+
+		/* Take an extra reference to offset xen_sockbk_idx_release */
+		get_page(sockbk->mmap_pages[pending_idx]);
+		xen_sockbk_idx_release(sockbk, pending_idx,
+				       XEN_SOCKIF_RSP_OKAY);
+	}
+}
+
+static bool tx_credit_exceeded(struct xenvsock *vsock, unsigned size)
+{
+	unsigned long now = jiffies;
+	unsigned long next_credit =
+		vsock->credit_timeout.expires +
+		msecs_to_jiffies(vsock->credit_usec / 1000);
+
+	/* Timer could already be pending in rare cases. */
+	if (timer_pending(&vsock->credit_timeout))
+		return true;
+
+	/* Passed the point where we can replenish credit? */
+	if (time_after_eq(now, next_credit)) {
+		vsock->credit_timeout.expires = now;
+		tx_add_credit(vsock);
+	}
+
+	/* Still too big to send right now? Set a callback. */
+	if (size > vsock->remaining_credit) {
+		vsock->credit_timeout.data     =
+			(unsigned long)vsock;
+		vsock->credit_timeout.function =
+			tx_credit_callback;
+		mod_timer(&vsock->credit_timeout,
+			  next_credit);
+
+		return true;
+	}
+
+	return false;
+}
+
+static unsigned xen_sockbk_tx_build_gops(struct xen_sockbk *sockbk)
+{
+	struct gnttab_copy *gop = sockbk->tx_copy_ops, *request_gop;
+	struct sk_buff *skb;
+	int ret;
+
+	while (((nr_pending_reqs(sockbk) + MAX_SKB_FRAGS) < MAX_PENDING_REQS) &&
+	       !list_empty(&sockbk->sock_schedule_list)) {
+		struct xenvsock *vsock;
+		struct xen_sockif_tx_request txreq;
+		struct xen_sockif_tx_request txfrags[MAX_SKB_FRAGS];
+		struct page *page;
+		u16 pending_idx;
+		RING_IDX idx;
+		int work_to_do;
+		unsigned int data_len;
+		pending_ring_idx_t index;
+
+		/* Get a vsock from the list with work to do. */
+		vsock = poll_sock_schedule_list(sockbk);
+		/* This can sometimes happen because the test of
+		 * list_empty(sock_schedule_list) at the top of the
+		 * loop is unlocked.  Just go back and have another
+		 * look.
+		 */
+		if (!vsock)
+			continue;
+
+		if (vsock->tx.sring->req_prod - vsock->tx.req_cons >
+		    XEN_SOCKIF_TX_RING_SIZE) {
+			dev_err(&vsock->xbdev->dev,
+				"Impossible number of requests. "
+				"req_prod %d, req_cons %d, size %ld\n",
+				vsock->tx.sring->req_prod,
+				vsock->tx.req_cons,
+				XEN_SOCKIF_TX_RING_SIZE);
+			sockbk_fatal_tx_err(vsock);
+			continue;
+		}
+
+		RING_FINAL_CHECK_FOR_REQUESTS(&vsock->tx, work_to_do);
+		if (!work_to_do) {
+			xenvsock_put(vsock);
+			continue;
+		}
+
+		idx = vsock->tx.req_cons;
+		rmb(); /* Ensure that we see the request before we copy it. */
+		memcpy(&txreq, RING_GET_REQUEST(&vsock->tx, idx),
+		       sizeof(txreq));
+
+		/* Credit-based scheduling. */
+		if (txreq.size > vsock->remaining_credit &&
+		    tx_credit_exceeded(vsock, txreq.size)) {
+			xenvsock_put(vsock);
+			continue;
+		}
+
+		vsock->remaining_credit -= txreq.size;
+
+		work_to_do--;
+		vsock->tx.req_cons = ++idx;
+
+		ret = sockbk_count_requests(vsock, &txreq, txfrags, work_to_do);
+		if (unlikely(ret < 0))
+			continue;
+
+		idx += ret;
+
+		if (unlikely(txreq.size == 0)) {
+			dev_dbg(&vsock->xbdev->dev,
+				"Bad packet size: %d\n", txreq.size);
+			sockbk_tx_err(vsock, &txreq, idx);
+			continue;
+		}
+
+		/* No crossing a page as the payload mustn't fragment. */
+		if (unlikely((txreq.offset + txreq.size) > PAGE_SIZE)) {
+			dev_err(&vsock->xbdev->dev,
+				"txreq.offset: %x, size: %u, end: %lu\n",
+				txreq.offset, txreq.size,
+				(txreq.offset&~PAGE_MASK) + txreq.size);
+			sockbk_fatal_tx_err(vsock);
+			continue;
+		}
+
+		index = pending_index(sockbk->pending_cons);
+		pending_idx = sockbk->pending_ring[index];
+
+		data_len = (txreq.size > PKT_PROT_LEN &&
+			    ret < MAX_SKB_FRAGS) ?
+			PKT_PROT_LEN : txreq.size;
+
+		skb = alloc_skb(data_len + NET_SKB_PAD + NET_IP_ALIGN,
+				GFP_ATOMIC | __GFP_NOWARN);
+		if (unlikely(skb == NULL)) {
+			dev_dbg(&vsock->xbdev->dev,
+				"Can't allocate a skb in start_xmit.\n");
+			sockbk_tx_err(vsock, &txreq, idx);
+			break;
+		}
+
+		/* Packets passed to sockif_rx() must have some headroom. */
+		skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+
+		/* XXX could copy straight to head */
+		page = xen_sockbk_alloc_page(sockbk, skb, pending_idx);
+		if (!page) {
+			kfree_skb(skb);
+			sockbk_tx_err(vsock, &txreq, idx);
+			continue;
+		}
+
+		gop->source.u.ref = txreq.gref;
+		gop->source.domid = vsock->domid;
+		gop->source.offset = txreq.offset;
+
+		gop->dest.u.gmfn = virt_to_mfn(page_address(page));
+		gop->dest.domid = DOMID_SELF;
+		gop->dest.offset = txreq.offset;
+
+		gop->len = txreq.size;
+		gop->flags = GNTCOPY_source_gref;
+
+		gop++;
+
+		memcpy(&sockbk->pending_tx_info[pending_idx].req,
+		       &txreq, sizeof(txreq));
+		sockbk->pending_tx_info[pending_idx].vsock = vsock;
+		*((u16 *)skb->data) = pending_idx;
+
+		__skb_put(skb, data_len);
+
+		skb_shinfo(skb)->nr_frags = ret;
+		if (data_len < txreq.size) {
+			skb_shinfo(skb)->nr_frags++;
+			frag_set_pending_idx(&skb_shinfo(skb)->frags[0],
+					     pending_idx);
+		} else {
+			frag_set_pending_idx(&skb_shinfo(skb)->frags[0],
+					     INVALID_PENDING_IDX);
+		}
+
+		sockbk->pending_cons++;
+
+		request_gop = xen_sockbk_get_requests(sockbk, vsock,
+						      skb, txfrags, gop);
+		if (request_gop == NULL) {
+			kfree_skb(skb);
+			sockbk_tx_err(vsock, &txreq, idx);
+			continue;
+		}
+		gop = request_gop;
+
+		__skb_queue_tail(&sockbk->tx_queue, skb);
+
+		vsock->tx.req_cons = idx;
+		xen_sockbk_check_rx_xenvsock(vsock);
+
+		if (gop-sockbk->tx_copy_ops >= ARRAY_SIZE(sockbk->tx_copy_ops))
+			break;
+	}
+
+	return gop - sockbk->tx_copy_ops;
+}
+
+static void xen_sockbk_tx_submit(struct xen_sockbk *sockbk)
+{
+	struct gnttab_copy *gop = sockbk->tx_copy_ops;
+	struct sk_buff *skb;
+
+	while ((skb = __skb_dequeue(&sockbk->tx_queue)) != NULL) {
+		struct xen_sockif_tx_request *txp;
+		struct xenvsock *vsock;
+		u16 pending_idx;
+		unsigned data_len;
+
+		pending_idx = *((u16 *)skb->data);
+		vsock = sockbk->pending_tx_info[pending_idx].vsock;
+		txp = &sockbk->pending_tx_info[pending_idx].req;
+
+		/* Check the remap error code. */
+		if (unlikely(xen_sockbk_tx_check_gop(sockbk, skb, &gop))) {
+			dev_dbg(&vsock->xbdev->dev, "sockback grant failed.\n");
+			skb_shinfo(skb)->nr_frags = 0;
+			kfree_skb(skb);
+			continue;
+		}
+
+		data_len = skb->len;
+		memcpy(skb->data,
+		       (void *)(idx_to_kaddr(sockbk, pending_idx)|txp->offset),
+		       data_len);
+		if (data_len < txp->size) {
+			/* Append the packet payload as a fragment. */
+			txp->offset += data_len;
+			txp->size -= data_len;
+		} else {
+			/* Schedule a response immediately. */
+			xen_sockbk_idx_release(sockbk, pending_idx,
+					       XEN_SOCKIF_RSP_OKAY);
+		}
+
+		xen_sockbk_fill_frags(sockbk, skb);
+
+		/* If the initial fragment was < PKT_PROT_LEN then
+		 * pull through some bytes from the other fragments to
+		 * increase the linear region to PKT_PROT_LEN bytes.
+		 */
+		if (skb_headlen(skb) < PKT_PROT_LEN && skb_is_nonlinear(skb)) {
+			int target = min_t(int, skb->len, PKT_PROT_LEN);
+
+			__pskb_pull_tail(skb, target - skb_headlen(skb));
+		}
+
+		xenvsock_receive_skb(vsock, skb);
+	}
+}
+
+/* Called after sockfront has transmitted */
+static void xen_sockbk_tx_action(struct xen_sockbk *sockbk)
+{
+	unsigned nr_gops;
+
+	nr_gops = xen_sockbk_tx_build_gops(sockbk);
+
+	if (nr_gops == 0)
+		return;
+
+	gnttab_batch_copy(sockbk->tx_copy_ops, nr_gops);
+
+	xen_sockbk_tx_submit(sockbk);
+}
+
+static void xen_sockbk_idx_release(struct xen_sockbk *sockbk, u16 pending_idx,
+				   u8 status)
+{
+	struct xenvsock *vsock;
+	struct pending_tx_info *pending_tx_info;
+	pending_ring_idx_t index;
+
+	/* Already complete? */
+	if (sockbk->mmap_pages[pending_idx] == NULL)
+		return;
+
+	pending_tx_info = &sockbk->pending_tx_info[pending_idx];
+
+	vsock = pending_tx_info->vsock;
+
+	make_tx_response(vsock, &pending_tx_info->req, status);
+
+	index = pending_index(sockbk->pending_prod++);
+	sockbk->pending_ring[index] = pending_idx;
+
+	xenvsock_put(vsock);
+
+	sockbk->mmap_pages[pending_idx]->mapping = 0;
+	put_page(sockbk->mmap_pages[pending_idx]);
+	sockbk->mmap_pages[pending_idx] = NULL;
+}
+
+static void make_tx_response(struct xenvsock *vsock,
+			     struct xen_sockif_tx_request *txp,
+			     s8 st)
+{
+	RING_IDX i = vsock->tx.rsp_prod_pvt;
+	struct xen_sockif_tx_response *resp;
+	int notify;
+
+	resp = RING_GET_RESPONSE(&vsock->tx, i);
+	resp->id     = txp->id;
+	resp->status = st;
+
+	vsock->tx.rsp_prod_pvt = ++i;
+	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&vsock->tx, notify);
+	if (notify)
+		notify_remote_via_irq(vsock->irq);
+}
+
+static struct xen_sockif_rx_response *make_rx_response(struct xenvsock *vsock,
+						       u16 id, s8 st,
+						       u16 offset, u16 size,
+						       u16 flags)
+{
+	RING_IDX i = vsock->rx.rsp_prod_pvt;
+	struct xen_sockif_rx_response *resp;
+
+	resp = RING_GET_RESPONSE(&vsock->rx, i);
+	resp->offset     = offset;
+	resp->flags      = flags;
+	resp->id         = id;
+	resp->status     = (s16)size;
+	if (st < 0)
+		resp->status = (s16)st;
+
+	vsock->rx.rsp_prod_pvt = ++i;
+
+	return resp;
+}
+
+static inline int rx_work_todo(struct xen_sockbk *sockbk)
+{
+	return !skb_queue_empty(&sockbk->rx_queue);
+}
+
+static inline int tx_work_todo(struct xen_sockbk *sockbk)
+{
+	if (((nr_pending_reqs(sockbk) + MAX_SKB_FRAGS) < MAX_PENDING_REQS) &&
+	    !list_empty(&sockbk->sock_schedule_list))
+		return 1;
+
+	return 0;
+}
+
+static int xen_sockbk_kthread(void *data)
+{
+	struct xen_sockbk *sockbk = data;
+
+	while (!kthread_should_stop()) {
+		wait_event_interruptible(sockbk->wq,
+					 rx_work_todo(sockbk) ||
+					 tx_work_todo(sockbk) ||
+					 kthread_should_stop());
+		cond_resched();
+
+		if (kthread_should_stop())
+			break;
+
+		if (rx_work_todo(sockbk))
+			xen_sockbk_rx_action(sockbk);
+
+		if (tx_work_todo(sockbk))
+			xen_sockbk_tx_action(sockbk);
+	}
+
+	return 0;
+}
+
+void xen_sockbk_unmap_frontend_rings(struct xenvsock *vsock)
+{
+	if (vsock->tx.sring)
+		xenbus_unmap_ring_vfree(vsock->xbdev, vsock->tx.sring);
+	if (vsock->rx.sring)
+		xenbus_unmap_ring_vfree(vsock->xbdev, vsock->rx.sring);
+}
+
+int xen_sockbk_map_frontend_rings(struct xenvsock *vsock,
+				  grant_ref_t tx_ring_ref,
+				  grant_ref_t rx_ring_ref)
+{
+	void *addr;
+	struct xen_sockif_tx_sring *txs;
+	struct xen_sockif_rx_sring *rxs;
+
+	int err = -ENOMEM;
+
+	err = xenbus_map_ring_valloc(vsock->xbdev, tx_ring_ref, &addr);
+	if (err)
+		goto err;
+
+	txs = (struct xen_sockif_tx_sring *)addr;
+	BACK_RING_INIT(&vsock->tx, txs, PAGE_SIZE);
+
+	err = xenbus_map_ring_valloc(vsock->xbdev, rx_ring_ref, &addr);
+	if (err)
+		goto err;
+
+	rxs = (struct xen_sockif_rx_sring *)addr;
+	BACK_RING_INIT(&vsock->rx, rxs, PAGE_SIZE);
+
+	vsock->rx_req_cons_peek = 0;
+
+	return 0;
+
+err:
+	xen_sockbk_unmap_frontend_rings(vsock);
+	return err;
+}
+
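+/* One sockbk group (a worker kthread plus its pending-request state) is
+ * created per online CPU and bound to that CPU; frontends are then
+ * spread across the groups by xen_sockbk_add_xenvsock().
+ */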
+static int __init sockback_init(void)
+{
+	int i;
+	int rc = 0;
+	int group;
+
+	if (!xen_domain())
+		return -ENODEV;
+
+	xen_sockbk_group_nr = num_online_cpus();
+	xen_sockbk = vzalloc(sizeof(*xen_sockbk) * xen_sockbk_group_nr);
+	if (!xen_sockbk)
+		return -ENOMEM;
+
+	for (group = 0; group < xen_sockbk_group_nr; group++) {
+		struct xen_sockbk *sockbk = &xen_sockbk[group];
+
+		skb_queue_head_init(&sockbk->rx_queue);
+		skb_queue_head_init(&sockbk->tx_queue);
+
+		init_timer(&sockbk->sock_timer);
+		sockbk->sock_timer.data = (unsigned long)sockbk;
+		sockbk->sock_timer.function = xen_sockbk_alarm;
+
+		sockbk->pending_cons = 0;
+		sockbk->pending_prod = MAX_PENDING_REQS;
+		for (i = 0; i < MAX_PENDING_REQS; i++)
+			sockbk->pending_ring[i] = i;
+
+		init_waitqueue_head(&sockbk->wq);
+		sockbk->task = kthread_create(xen_sockbk_kthread,
+					     (void *)sockbk,
+					     "sockback/%u", group);
+
+		if (IS_ERR(sockbk->task)) {
+			pr_alert("kthread_create() failed at sockback\n");
+			del_timer(&sockbk->sock_timer);
+			rc = PTR_ERR(sockbk->task);
+			goto failed_init;
+		}
+
+		kthread_bind(sockbk->task, group);
+
+		INIT_LIST_HEAD(&sockbk->sock_schedule_list);
+
+		spin_lock_init(&sockbk->sock_schedule_list_lock);
+
+		atomic_set(&sockbk->sockfront_count, 0);
+
+		wake_up_process(sockbk->task);
+	}
+
+	rc = xenvsock_xenbus_init();
+	if (rc)
+		goto failed_init;
+
+	rc = xensock_proto_server_init();
+	if (rc)
+		goto failed_init;
+
+	return 0;
+
+failed_init:
+	while (--group >= 0) {
+		struct xen_sockbk *sockbk = &xen_sockbk[group];
+
+		for (i = 0; i < MAX_PENDING_REQS; i++) {
+			if (sockbk->mmap_pages[i])
+				__free_page(sockbk->mmap_pages[i]);
+		}
+		del_timer(&sockbk->sock_timer);
+		kthread_stop(sockbk->task);
+	}
+	vfree(xen_sockbk);
+	return rc;
+}
+
+module_init(sockback_init);
+
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_ALIAS("xen-backend:vsock");
diff --git a/drivers/net/xen-sockback/xenbus.c b/drivers/net/xen-sockback/xenbus.c
new file mode 100644
index 0000000..7bd9a06
--- /dev/null
+++ b/drivers/net/xen-sockback/xenbus.c
@@ -0,0 +1,348 @@
+/*
+ * Xenbus code for sockif backend
+ *
+ * Copyright (C) 2005 Rusty Russell <rusty@rustcorp.com.au>
+ * Copyright (C) 2005 XenSource Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+*/
+#include "common.h"
+
+struct backend_info {
+	struct xenbus_device *dev;
+	struct xenvsock *vsock;
+	enum xenbus_state frontend_state;
+};
+
+static int connect_rings(struct backend_info *);
+static void connect(struct backend_info *);
+static int backend_create_xenvsock(struct backend_info *be);
+
+static int sockback_remove(struct xenbus_device *dev)
+{
+	struct backend_info *be = dev_get_drvdata(&dev->dev);
+
+	if (be->vsock) {
+		xenvsock_disconnect(be->vsock);
+		be->vsock = NULL;
+	}
+	kfree(be);
+	dev_set_drvdata(&dev->dev, NULL);
+	return 0;
+}
+
+
+/**
+ * Entry point to this code when a new device is created.  Allocate the basic
+ * structures and switch to InitWait.
+ */
+static int sockback_probe(struct xenbus_device *dev,
+			  const struct xenbus_device_id *id)
+{
+	const char *message;
+	struct xenbus_transaction xbt;
+	int err;
+	int sg;
+	struct backend_info *be = kzalloc(sizeof(*be), GFP_KERNEL);
+
+	if (!be) {
+		xenbus_dev_fatal(dev, -ENOMEM,
+				 "allocating backend structure");
+		return -ENOMEM;
+	}
+
+	be->dev = dev;
+	dev_set_drvdata(&dev->dev, be);
+
+	sg = 1;
+
+	do {
+		err = xenbus_transaction_start(&xbt);
+		if (err) {
+			xenbus_dev_fatal(dev, err, "starting transaction");
+			goto fail;
+		}
+
+		/* We support rx-copy path. */
+		err = xenbus_printf(xbt, dev->nodename,
+				    "feature-rx-copy", "%d", 1);
+		if (err) {
+			message = "writing feature-rx-copy";
+			goto abort_transaction;
+		}
+
+		/* We don't support rx-flip path (except old guests who don't
+		 * grok this feature flag).
+		 */
+		err = xenbus_printf(xbt, dev->nodename,
+				    "feature-rx-flip", "%d", 0);
+		if (err) {
+			message = "writing feature-rx-flip";
+			goto abort_transaction;
+		}
+
+		err = xenbus_transaction_end(xbt, 0);
+	} while (err == -EAGAIN);
+
+	if (err) {
+		xenbus_dev_fatal(dev, err, "completing transaction");
+		goto fail;
+	}
+
+	err = xenbus_switch_state(dev, XenbusStateInitWait);
+	if (err)
+		goto fail;
+
+	return 0;
+
+abort_transaction:
+	xenbus_transaction_end(xbt, 1);
+	xenbus_dev_fatal(dev, err, "%s", message);
+fail:
+	pr_debug("failed");
+	sockback_remove(dev);
+	return err;
+}
+
+static int backend_create_xenvsock(struct backend_info *be)
+{
+	int err;
+	long handle;
+	struct xenbus_device *dev = be->dev;
+
+	if (be->vsock != NULL)
+		return 0;
+
+	err = xenbus_scanf(XBT_NIL, dev->nodename, "handle", "%li", &handle);
+	if (err != 1) {
+		err = -EINVAL;
+		xenbus_dev_fatal(dev, err, "reading handle");
+		goto fail;
+	}
+
+	be->vsock = xenvsock_alloc(&dev->dev, dev->otherend_id, handle);
+	if (IS_ERR(be->vsock)) {
+		err = PTR_ERR(be->vsock);
+		be->vsock = NULL;
+		xenbus_dev_fatal(dev, err, "creating interface");
+		goto fail;
+	}
+
+	/* setup back pointer */
+	be->vsock->xbdev = dev;
+
+	err = 0;
+fail:
+	return err;
+}
+
+
+static void disconnect_backend(struct xenbus_device *dev)
+{
+	struct backend_info *be = dev_get_drvdata(&dev->dev);
+
+	if (be->vsock) {
+		xenvsock_disconnect(be->vsock);
+		be->vsock = NULL;
+	}
+}
+
+/**
+ * Callback received when the frontend's state changes.
+ */
+static void frontend_changed(struct xenbus_device *dev,
+			     enum xenbus_state frontend_state)
+{
+	struct backend_info *be = dev_get_drvdata(&dev->dev);
+
+	pr_debug("frontend state %s", xenbus_strstate(frontend_state));
+
+	be->frontend_state = frontend_state;
+
+	switch (frontend_state) {
+	case XenbusStateInitialising:
+		if (dev->state == XenbusStateClosed) {
+			pr_alert("%s: %s: prepare for reconnect\n",
+				 __func__, dev->nodename);
+			xenbus_switch_state(dev, XenbusStateInitWait);
+		}
+		break;
+
+	case XenbusStateInitialised:
+	case XenbusStateConnected:
+		/* Ensure we connect even when two watches fire in
+		 * close succession and we miss the intermediate value
+		 * of frontend_state.
+		 */
+		if (dev->state == XenbusStateConnected)
+			break;
+
+		/* Enforce precondition before potential leak point.
+		 * disconnect_backend() is idempotent.
+		 */
+		disconnect_backend(dev);
+
+		if (backend_create_xenvsock(be))
+			break;
+
+		connect(be);
+		break;
+
+	case XenbusStateClosing:
+		xenbus_switch_state(dev, XenbusStateClosing);
+		break;
+
+	case XenbusStateClosed:
+		disconnect_backend(dev);
+		xenbus_switch_state(dev, XenbusStateClosed);
+		if (xenbus_dev_is_online(dev))
+			break;
+		/* fall through if not online */
+	case XenbusStateUnknown:
+		device_unregister(&dev->dev);
+		break;
+
+	default:
+		xenbus_dev_fatal(dev, -EINVAL, "saw state %d at frontend",
+				 frontend_state);
+		break;
+	}
+}
+
+static void xen_sock_read_rate(struct xenbus_device *dev,
+			       unsigned long *bytes, unsigned long *usec)
+{
+	char *s, *e;
+	unsigned long b, u;
+	char *ratestr;
+
+	/* Default to unlimited bandwidth. */
+	*bytes = ~0UL;
+	*usec = 0;
+
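+	/* The "rate" node is assumed to hold "<bytes>,<usec>", i.e. a byte
+	 * budget replenished every <usec> microseconds; for example
+	 * "10000000,50000" would allow roughly 10 MB per 50 ms window.
+	 */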
+	ratestr = xenbus_read(XBT_NIL, dev->nodename, "rate", NULL);
+	if (IS_ERR(ratestr))
+		return;
+
+	s = ratestr;
+	e = strchr(s, ',');
+	if (e == NULL)
+		goto fail;
+
+	e[0] = '\0';
+
+	if (kstrtoul(s, 10, &b))
+		goto fail;
+
+	s = e + 1;
+	if (kstrtoul(s, 10, &u))
+		goto fail;
+
+	*bytes = b;
+	*usec = u;
+
+	kfree(ratestr);
+	return;
+
+ fail:
+	pr_warn("Failed to parse socket data rate limit. Traffic unlimited.\n");
+	kfree(ratestr);
+}
+
+static void connect(struct backend_info *be)
+{
+	int err;
+	struct xenbus_device *dev = be->dev;
+
+	err = connect_rings(be);
+	if (err)
+		return;
+
+	xen_sock_read_rate(dev, &be->vsock->credit_bytes,
+			   &be->vsock->credit_usec);
+	be->vsock->remaining_credit = be->vsock->credit_bytes;
+
+	sockif_wake_queue(be->vsock->dev);
+	xenbus_switch_state(dev, XenbusStateConnected);
+}
+
+static int connect_rings(struct backend_info *be)
+{
+	struct xenvsock *vsock = be->vsock;
+	struct xenbus_device *dev = be->dev;
+	unsigned long tx_ring_ref, rx_ring_ref;
+	unsigned int evtchn, rx_copy;
+	int err;
+	int val;
+
+	err = xenbus_gather(XBT_NIL, dev->otherend,
+			    "tx-ring-ref", "%lu", &tx_ring_ref,
+			    "rx-ring-ref", "%lu", &rx_ring_ref,
+			    "event-channel", "%u", &evtchn, NULL);
+	if (err) {
+		xenbus_dev_fatal(dev, err,
+				 "reading %s/ring-ref and event-channel",
+				 dev->otherend);
+		return err;
+	}
+
+	err = xenbus_scanf(XBT_NIL, dev->otherend, "request-rx-copy", "%u",
+			   &rx_copy);
+	if (err == -ENOENT) {
+		err = 0;
+		rx_copy = 0;
+	}
+	if (err < 0) {
+		xenbus_dev_fatal(dev, err, "reading %s/request-rx-copy",
+				 dev->otherend);
+		return err;
+	}
+	if (!rx_copy)
+		return -EOPNOTSUPP;
+
+	if (vsock->dev->tx_queue_len != 0) {
+		if (xenbus_scanf(XBT_NIL, dev->otherend,
+				 "feature-rx-notify", "%d", &val) < 0)
+			val = 0;
+		if (val)
+			vsock->can_queue = 1;
+		else
+			/* Must be non-zero for pfifo_fast to work. */
+			vsock->dev->tx_queue_len = 1;
+	}
+
+	/* Map the shared frame, irq etc. */
+	err = xenvsock_connect(vsock, tx_ring_ref, rx_ring_ref, evtchn);
+	if (err) {
+		xenbus_dev_fatal(dev, err,
+				 "mapping shared-frames %lu/%lu port %u",
+				 tx_ring_ref, rx_ring_ref, evtchn);
+		return err;
+	}
+	return 0;
+}
+
+/* ** Driver Registration ** */
+static const struct xenbus_device_id sockback_ids[] = {
+	{ "vsock" },
+	{ "" }
+};
+
+static DEFINE_XENBUS_DRIVER(sockback, ,
+	.probe = sockback_probe,
+	.remove = sockback_remove,
+	.otherend_changed = frontend_changed,
+);
+
+int xenvsock_xenbus_init(void)
+{
+	return xenbus_register_backend(&sockback_driver);
+}
diff --git a/include/xen/interface/io/sockif.h b/include/xen/interface/io/sockif.h
new file mode 100644
index 0000000..c12ce11
--- /dev/null
+++ b/include/xen/interface/io/sockif.h
@@ -0,0 +1,74 @@
+/******************************************************************************
+ * sockif.h
+ *
+ * Unified xensock I/O interface for Xen guest OSes.
+ *
+ */
+#ifndef __XEN_PUBLIC_IO_SOCKIF_H__
+#define __XEN_PUBLIC_IO_SOCKIF_H__
+
+#include <linux/types.h>
+#include <linux/spinlock.h>
+
+#include <net/sock.h>
+
+#include <xen/interface/io/ring.h>
+#include <xen/interface/grant_table.h>
+
+/*
+ * REQUEST CODES.
+ */
+#define SOCKIF_OP_SENDMSG		0
+#define SOCKIF_OP_RECVMSG		1
+
+#define SOCKIF_MAX_PAGES_PER_REQUEST	10
+
+#define SOCKIF_DEV_ID_CNT		5
+
+
+/* Packet continues in the next request descriptor. */
+#define _XEN_SOCKTXF_more_data		(2)
+#define  XEN_SOCKTXF_more_data		(1U<<_XEN_SOCKTXF_more_data)
+
+struct xen_sockif_tx_request {
+	grant_ref_t gref;      /* Reference to buffer page */
+	uint16_t offset;       /* Offset within buffer page */
+	uint16_t flags;        /* XEN_SOCKTXF_* */
+	uint16_t id;           /* Echoed in response message. */
+	uint16_t size;         /* Packet size in bytes.       */
+};
+
+struct xen_sockif_tx_response {
+	uint16_t id;
+	int16_t  status;       /* XEN_SOCKIF_RSP_* */
+};
+
+struct xen_sockif_rx_request {
+	uint16_t    id;        /* Echoed in response message.        */
+	grant_ref_t gref;      /* Reference to incoming granted frame */
+};
+
+/* Packet continues in the next request descriptor. */
+#define _XEN_SOCKRXF_more_data		(0)
+#define  XEN_SOCKRXF_more_data		(1U<<_XEN_SOCKRXF_more_data)
+
+struct xen_sockif_rx_response {
+	uint16_t id;
+	uint16_t offset;       /* Offset in page of start of received packet  */
+	uint16_t flags;        /* XEN_SOCKRXF_* */
+	int16_t  status;       /* -ve: XEN_SOCKIF_RSP_* ; +ve: Rx'ed pkt size. */
+};
+
+DEFINE_RING_TYPES(xen_sockif_tx,
+		  struct xen_sockif_tx_request,
+		  struct xen_sockif_tx_response);
+
+DEFINE_RING_TYPES(xen_sockif_rx,
+		  struct xen_sockif_rx_request,
+		  struct xen_sockif_rx_response);
+
+#define XEN_SOCKIF_RSP_DROPPED	-2
+#define XEN_SOCKIF_RSP_ERROR	-1
+#define XEN_SOCKIF_RSP_OKAY	 0
+
+#endif /* __XEN_PUBLIC_IO_SOCKIF_H__ */
-- 
1.8.2.rc2

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] xen-sockfront: add support of the xen-sockfront driver
  2014-08-08 11:32 [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers Oleksandr Dmytryshyn
  2014-08-08 11:32 ` [PATCH 1/3] net: introduce new socket support: xensock Oleksandr Dmytryshyn
  2014-08-08 11:32 ` [PATCH 2/3] xen-sockback: add support of the xen-sockback driver Oleksandr Dmytryshyn
@ 2014-08-08 11:32 ` Oleksandr Dmytryshyn
  2014-08-08 12:36 ` [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers David Vrabel
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Oleksandr Dmytryshyn @ 2014-08-08 11:32 UTC (permalink / raw)
  To: Ian Campbell, Stefano Stabellini, Tim Deegan, xen-devel

This driver allows a client application in a domU domain to use
xensock sockets.
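
For illustration, a minimal domU client could look roughly like the
sketch below. It assumes patch 1/3 exposes the new address family to
userspace as AF_XENSOCK (the exact constant name comes from that
patch); the address passed to connect() is ignored by xensock, only
the family matters:

    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        struct sockaddr addr;
        int fd = socket(AF_XENSOCK, SOCK_STREAM, 0);

        if (fd < 0)
            return 1;

        /* xensock ignores the address, only sa_family is looked at */
        memset(&addr, 0, sizeof(addr));
        addr.sa_family = AF_XENSOCK;
        if (connect(fd, &addr, sizeof(addr)) == 0)
            write(fd, "hello", 5);

        close(fd);
        return 0;
    }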

Signed-off-by: Oleksandr Dmytryshyn <oleksandr.dmytryshyn@globallogic.com>
---
 drivers/net/Kconfig         |   17 +-
 drivers/net/Makefile        |    1 +
 drivers/net/xen-sockfront.c | 1364 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1381 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/xen-sockfront.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index e643f42..887fe4b 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -348,6 +348,21 @@ config XEN_NETDEV_BACKEND
 	  compile this driver as a module, chose M here: the module
 	  will be called xen-netback.
 
+config XEN_SOCKDEV_FRONTEND
+	tristate "Xen frontend socket device"
+	depends on XEN_DOMU
+	select XEN_SOCKDEV_PROTO
+	help
+	  This driver provides support for Xen socket devices exported
+	  by a Xen socket driver domain (often domain 0).
+
+	  The corresponding Linux backend driver is enabled by the
+	  XEN_SOCKDEV_BACKEND option.
+
+	  If you are compiling a kernel for use as a Xen guest, you
+	  should say Y here. To compile this driver as a module, choose
+	  M here: the module will be called xen-sockfront.
+
 config XEN_SOCKDEV_BACKEND
 	tristate "Xen backend socket device"
 	depends on XEN_BACKEND
@@ -369,7 +384,7 @@ config XEN_SOCKDEV_BACKEND
 config XEN_SOCKDEV_PROTO
 	bool
 	default n
-	depends on XEN_SOCKDEV_BACKEND
+	depends on XEN_SOCKDEV_BACKEND || XEN_SOCKDEV_FRONTEND
 
 config VMXNET3
 	tristate "VMware VMXNET3 ethernet driver"
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 96c6c97..8c3eee8 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -62,6 +62,7 @@ obj-$(CONFIG_XEN_NETDEV_FRONTEND) += xen-netfront.o
 obj-$(CONFIG_XEN_NETDEV_BACKEND) += xen-netback/
 
 obj-$(CONFIG_XEN_SOCKDEV_PROTO) += xensock/
+obj-$(CONFIG_XEN_SOCKDEV_FRONTEND) += xen-sockfront.o
 obj-$(CONFIG_XEN_SOCKDEV_BACKEND) += xen-sockback/
 
 obj-$(CONFIG_USB_CATC)          += usb/
diff --git a/drivers/net/xen-sockfront.c b/drivers/net/xen-sockfront.c
new file mode 100644
index 0000000..2aa65e4
--- /dev/null
+++ b/drivers/net/xen-sockfront.c
@@ -0,0 +1,1364 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/moduleparam.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/interrupt.h>
+
+#include <net/xensock.h>
+#include <net/af_xensock.h>
+
+#include <asm/xen/page.h>
+#include <xen/xen.h>
+#include <xen/xenbus.h>
+#include <xen/events.h>
+#include <xen/page.h>
+#include <xen/grant_table.h>
+
+#include <xen/interface/io/sockif.h>
+#include <xen/interface/memory.h>
+#include <xen/interface/grant_table.h>
+
+struct sockfront_cb {
+	int pull_to;
+};
+
+#define SOCKFRONT_SKB_CB(skb)	((struct sockfront_cb *)((skb)->cb))
+
+#define RX_COPY_THRESHOLD 256
+
+#define GRANT_INVALID_REF	0
+
+#define SOCK_TX_RING_SIZE __CONST_RING_SIZE(xen_sockif_tx, PAGE_SIZE)
+#define SOCK_RX_RING_SIZE __CONST_RING_SIZE(xen_sockif_rx, PAGE_SIZE)
+#define TX_MAX_TARGET min_t(int, SOCK_TX_RING_SIZE, 256)
+
+struct sockfront_info {
+	struct list_head list;
+	struct xen_sock_dev *sockdev;
+	unsigned int irq;
+
+	struct tasklet_struct rx_poll;
+
+	unsigned int evtchn;
+	struct xenbus_device *xbdev;
+
+	spinlock_t   tx_lock; /* transmitter lock */
+	struct xen_sockif_tx_front_ring tx;
+	int tx_ring_ref;
+
+	/* {tx,rx}_skbs store outstanding skbuffs. Free tx_skb entries
+	 * are linked from tx_skb_freelist through skb_entry.link.
+	 *
+	 *  NB. Freelist index entries are always going to be less than
+	 *  PAGE_OFFSET, whereas pointers to skbs will always be equal or
+	 *  greater than PAGE_OFFSET: we use this property to distinguish
+	 *  them.
+	 */
+	union skb_entry {
+		struct sk_buff *skb;
+		unsigned long link;
+	} tx_skbs[SOCK_TX_RING_SIZE];
+	grant_ref_t gref_tx_head;
+	grant_ref_t grant_tx_ref[SOCK_TX_RING_SIZE];
+	unsigned tx_skb_freelist;
+
+	spinlock_t   rx_lock ____cacheline_aligned_in_smp;
+	struct xen_sockif_rx_front_ring rx;
+	int rx_ring_ref;
+
+	/* Receive-ring batched refills. */
+#define RX_MIN_TARGET 8
+#define RX_DFL_MIN_TARGET 64
+#define RX_MAX_TARGET min_t(int, SOCK_RX_RING_SIZE, 256)
+	unsigned rx_min_target, rx_max_target, rx_target;
+	struct sk_buff_head rx_batch;
+
+	struct timer_list rx_refill_timer;
+
+	struct sk_buff *rx_skbs[SOCK_RX_RING_SIZE];
+	grant_ref_t gref_rx_head;
+	grant_ref_t grant_rx_ref[SOCK_RX_RING_SIZE];
+
+	unsigned long rx_pfn_array[SOCK_RX_RING_SIZE];
+	struct multicall_entry rx_mcl[SOCK_RX_RING_SIZE+1];
+	struct mmu_update rx_mmu[SOCK_RX_RING_SIZE];
+};
+
+struct sockfront_rx_info {
+	struct xen_sockif_rx_response rx;
+};
+
+static void skb_entry_set_link(union skb_entry *list, unsigned short id)
+{
+	list->link = id;
+}
+
+static int skb_entry_is_link(const union skb_entry *list)
+{
+	BUILD_BUG_ON(sizeof(list->skb) != sizeof(list->link));
+	return (unsigned long)list->skb < PAGE_OFFSET;
+}
+
+/* Access macros for acquiring and freeing slots in tx_skbs[] */
+
+static void add_id_to_freelist(unsigned *head, union skb_entry *list,
+			       unsigned short id)
+{
+	skb_entry_set_link(&list[id], *head);
+	*head = id;
+}
+
+static unsigned short get_id_from_freelist(unsigned *head,
+					   union skb_entry *list)
+{
+	unsigned int id = *head;
+	*head = list[id].link;
+	return id;
+}
+
+static int xensock_rxidx(RING_IDX idx)
+{
+	return idx & (SOCK_RX_RING_SIZE - 1);
+}
+
+static struct sk_buff *xensock_get_rx_skb(struct sockfront_info *np,
+					  RING_IDX ri)
+{
+	int i = xensock_rxidx(ri);
+	struct sk_buff *skb = np->rx_skbs[i];
+
+	np->rx_skbs[i] = NULL;
+	return skb;
+}
+
+static grant_ref_t xensock_get_rx_ref(struct sockfront_info *np,
+				      RING_IDX ri)
+{
+	int i = xensock_rxidx(ri);
+	grant_ref_t ref = np->grant_rx_ref[i];
+
+	np->grant_rx_ref[i] = GRANT_INVALID_REF;
+	return ref;
+}
+
+static void rx_refill_timeout(unsigned long data)
+{
+	struct xen_sock_dev *dev = (struct xen_sock_dev *)data;
+	struct sockfront_info *np = xensock_dev_priv(dev);
+
+	tasklet_schedule(&np->rx_poll);
+}
+
+static int sockfront_tx_slot_available(struct sockfront_info *np)
+{
+	return (np->tx.req_prod_pvt - np->tx.rsp_cons) <
+		(TX_MAX_TARGET - MAX_SKB_FRAGS - 2);
+}
+
+static void xensock_maybe_wake_tx(struct xen_sock_dev *dev)
+{
+	struct sockfront_info *np = xensock_dev_priv(dev);
+
+	if (unlikely(sockif_queue_stopped(dev)) &&
+	    sockfront_tx_slot_available(np))
+		sockif_wake_queue(dev);
+}
+
+static void xensock_alloc_rx_buffers(struct xen_sock_dev *dev)
+{
+	unsigned short id;
+	struct sockfront_info *np = xensock_dev_priv(dev);
+	struct sk_buff *skb;
+	struct page *page;
+	int i, batch_target, notify;
+	RING_IDX req_prod = np->rx.req_prod_pvt;
+	grant_ref_t ref;
+	unsigned long pfn;
+	void *vaddr;
+	struct xen_sockif_rx_request *req;
+
+	if (unlikely(!sockif_carrier_ok(dev)))
+		return;
+
+	/* Allocate skbuffs greedily, even though we batch updates to the
+	 * receive ring. This creates a less bursty demand on the memory
+	 * allocator, so should reduce the chance of failed allocation requests
+	 * both for ourself and for other kernel subsystems.
+	 */
+	batch_target = np->rx_target - (req_prod - np->rx.rsp_cons);
+	for (i = skb_queue_len(&np->rx_batch); i < batch_target; i++) {
+		skb = alloc_skb(RX_COPY_THRESHOLD + NET_IP_ALIGN + NET_SKB_PAD,
+				GFP_ATOMIC | __GFP_NOWARN);
+		if (unlikely(!skb))
+			goto no_skb;
+
+		/* Align the IP header to a 16-byte boundary */
+		skb_reserve(skb, NET_IP_ALIGN);
+
+		page = alloc_page(GFP_ATOMIC | __GFP_NOWARN);
+		if (!page) {
+			kfree_skb(skb);
+no_skb:
+			/* Any skbuffs queued for refill? Force them out. */
+			if (i != 0)
+				goto refill;
+			/* Could not allocate any skbuffs. Try again later. */
+			mod_timer(&np->rx_refill_timer,
+				  jiffies + (HZ/10));
+			break;
+		}
+
+		__skb_fill_page_desc(skb, 0, page, 0, 0);
+		skb_shinfo(skb)->nr_frags = 1;
+		__skb_queue_tail(&np->rx_batch, skb);
+	}
+
+	/* Is the batch large enough to be worthwhile? */
+	if (i < (np->rx_target/2)) {
+		if (req_prod > np->rx.sring->req_prod)
+			goto push;
+		return;
+	}
+
+	/* Adjust our fill target if we risked running out of buffers. */
+	if ((req_prod - np->rx.sring->rsp_prod) < (np->rx_target / 4)) {
+		np->rx_target *= 2;
+		if (np->rx_target > np->rx_max_target)
+			np->rx_target = np->rx_max_target;
+	}
+
+ refill:
+	for (i = 0; ; i++) {
+		skb = __skb_dequeue(&np->rx_batch);
+		if (skb == NULL)
+			break;
+
+		id = xensock_rxidx(req_prod + i);
+
+		BUG_ON(np->rx_skbs[id]);
+		np->rx_skbs[id] = skb;
+
+		ref = gnttab_claim_grant_reference(&np->gref_rx_head);
+		BUG_ON((signed short)ref < 0);
+		np->grant_rx_ref[id] = ref;
+
+		pfn = page_to_pfn(skb_frag_page(&skb_shinfo(skb)->frags[0]));
+		vaddr = page_address(skb_frag_page(&skb_shinfo(skb)->frags[0]));
+
+		req = RING_GET_REQUEST(&np->rx, req_prod + i);
+		gnttab_grant_foreign_access_ref(ref,
+						np->xbdev->otherend_id,
+						pfn_to_mfn(pfn),
+						0);
+
+		req->id = id;
+		req->gref = ref;
+	}
+
+	wmb();		/* barrier so backend sees requests */
+
+	/* Above is a suitable barrier to ensure backend will see requests. */
+	np->rx.req_prod_pvt = req_prod + i;
+ push:
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&np->rx, notify);
+	if (notify)
+		notify_remote_via_irq(np->irq);
+}
+
+static void xensock_tx_buf_gc(struct xen_sock_dev *dev)
+{
+	RING_IDX cons, prod;
+	unsigned short id;
+	struct sockfront_info *np = xensock_dev_priv(dev);
+	struct sk_buff *skb;
+
+	BUG_ON(!sockif_carrier_ok(dev));
+
+	do {
+		prod = np->tx.sring->rsp_prod;
+		rmb(); /* Ensure we see responses up to 'rp'. */
+
+		for (cons = np->tx.rsp_cons; cons != prod; cons++) {
+			struct xen_sockif_tx_response *txrsp;
+
+			txrsp = RING_GET_RESPONSE(&np->tx, cons);
+
+			id  = txrsp->id;
+			skb = np->tx_skbs[id].skb;
+			if (unlikely(gnttab_query_foreign_access(
+				np->grant_tx_ref[id]) != 0)) {
+				pr_alert("%s: warning -- grant still in use "
+					 "by backend domain.\n", __func__);
+				BUG();
+			}
+			gnttab_end_foreign_access_ref(
+				np->grant_tx_ref[id], GNTMAP_readonly);
+			gnttab_release_grant_reference(
+				&np->gref_tx_head, np->grant_tx_ref[id]);
+			np->grant_tx_ref[id] = GRANT_INVALID_REF;
+			add_id_to_freelist(&np->tx_skb_freelist, np->tx_skbs,
+					   id);
+			dev_kfree_skb_irq(skb);
+		}
+
+		np->tx.rsp_cons = prod;
+
+		/* Set a new event, then check for race with update of tx_cons.
+		 * Note that it is essential to schedule a callback, no matter
+		 * how few buffers are pending. Even if there is space in the
+		 * transmit ring, higher layers may be blocked because too much
+		 * data is outstanding: in such cases notification from Xen is
+		 * likely to be the only kick that we'll get.
+		 */
+		np->tx.sring->rsp_event =
+			prod + ((np->tx.sring->req_prod - prod) >> 1) + 1;
+		mb();		/* update shared area */
+	} while ((cons == prod) && (prod != np->tx.sring->rsp_prod));
+
+	xensock_maybe_wake_tx(dev);
+}
+
+static void xensock_make_frags(struct sk_buff *skb, struct xen_sock_dev *dev,
+			       struct xen_sockif_tx_request *tx)
+{
+	struct sockfront_info *np = xensock_dev_priv(dev);
+	char *data = skb->data;
+	unsigned long mfn;
+	RING_IDX prod = np->tx.req_prod_pvt;
+	int frags = skb_shinfo(skb)->nr_frags;
+	unsigned int offset = offset_in_page(data);
+	unsigned int len = skb_headlen(skb);
+	unsigned int id;
+	grant_ref_t ref;
+	int i;
+
+	/* While the header overlaps a page boundary (including being
+	 * larger than a page), split it into page-sized chunks.
+	 */
+	while (len > PAGE_SIZE - offset) {
+		tx->size = PAGE_SIZE - offset;
+		tx->flags |= XEN_SOCKTXF_more_data;
+		len -= tx->size;
+		data += tx->size;
+		offset = 0;
+
+		id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
+		np->tx_skbs[id].skb = skb_get(skb);
+		tx = RING_GET_REQUEST(&np->tx, prod++);
+		tx->id = id;
+		ref = gnttab_claim_grant_reference(&np->gref_tx_head);
+		BUG_ON((signed short)ref < 0);
+
+		mfn = virt_to_mfn(data);
+		gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
+						mfn, GNTMAP_readonly);
+
+		tx->gref = ref;
+		np->grant_tx_ref[id] = ref;
+		tx->offset = offset;
+		tx->size = len;
+		tx->flags = 0;
+	}
+
+	/* Grant backend access to each skb fragment page. */
+	for (i = 0; i < frags; i++) {
+		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
+		struct page *page = skb_frag_page(frag);
+
+		len = skb_frag_size(frag);
+		offset = frag->page_offset;
+
+		/* Data must not cross a page boundary. */
+		BUG_ON(len + offset > PAGE_SIZE<<compound_order(page));
+
+		/* Skip unused frames from start of page */
+		page += offset >> PAGE_SHIFT;
+		offset &= ~PAGE_MASK;
+
+		while (len > 0) {
+			unsigned long bytes;
+
+			BUG_ON(offset >= PAGE_SIZE);
+
+			bytes = PAGE_SIZE - offset;
+			if (bytes > len)
+				bytes = len;
+
+			tx->flags |= XEN_SOCKTXF_more_data;
+
+			id = get_id_from_freelist(&np->tx_skb_freelist,
+						  np->tx_skbs);
+			np->tx_skbs[id].skb = skb_get(skb);
+			tx = RING_GET_REQUEST(&np->tx, prod++);
+			tx->id = id;
+			ref = gnttab_claim_grant_reference(&np->gref_tx_head);
+			BUG_ON((signed short)ref < 0);
+
+			mfn = pfn_to_mfn(page_to_pfn(page));
+			gnttab_grant_foreign_access_ref(ref,
+							np->xbdev->otherend_id,
+							mfn, GNTMAP_readonly);
+
+			tx->gref = ref;
+			np->grant_tx_ref[id] = ref;
+			tx->offset = offset;
+			tx->size = bytes;
+			tx->flags = 0;
+
+			offset += bytes;
+			len -= bytes;
+
+			/* Next frame */
+			if (offset == PAGE_SIZE && len) {
+				BUG_ON(!PageCompound(page));
+				page++;
+				offset = 0;
+			}
+		}
+	}
+
+	np->tx.req_prod_pvt = prod;
+}
+
+/* Count how many ring slots are required to send the frags of this
+ * skb. Each frag might be a compound page.
+ */
+static int xensock_count_skb_frag_slots(struct sk_buff *skb)
+{
+	int i, frags = skb_shinfo(skb)->nr_frags;
+	int pages = 0;
+
+	for (i = 0; i < frags; i++) {
+		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
+		unsigned long size = skb_frag_size(frag);
+		unsigned long offset = frag->page_offset;
+
+		/* Skip unused frames from start of page */
+		offset &= ~PAGE_MASK;
+
+		pages += PFN_UP(offset + size);
+	}
+
+	return pages;
+}
+
+static int xensock_start_xmit(struct sk_buff *skb, struct xen_sock_dev *dev)
+{
+	unsigned short id;
+	struct sockfront_info *np = xensock_dev_priv(dev);
+	struct xen_sockif_tx_request *tx;
+	char *data = skb->data;
+	RING_IDX i;
+	grant_ref_t ref;
+	unsigned long mfn;
+	int notify;
+	int slots;
+	unsigned int offset = offset_in_page(data);
+	unsigned int len = skb_headlen(skb);
+	unsigned long flags;
+
+	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
+		xensock_count_skb_frag_slots(skb);
+	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
+		pr_err_ratelimited("xensock: skb rides the rocket: %d slots\n",
+				   slots);
+		goto drop;
+	}
+
+	spin_lock_irqsave(&np->tx_lock, flags);
+
+	if (unlikely(!sockif_carrier_ok(dev))) {
+		spin_unlock_irqrestore(&np->tx_lock, flags);
+		goto drop;
+	}
+
+	i = np->tx.req_prod_pvt;
+
+	id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
+	np->tx_skbs[id].skb = skb;
+
+	tx = RING_GET_REQUEST(&np->tx, i);
+
+	tx->id   = id;
+	ref = gnttab_claim_grant_reference(&np->gref_tx_head);
+	BUG_ON((signed short)ref < 0);
+	mfn = virt_to_mfn(data);
+	gnttab_grant_foreign_access_ref(
+		ref, np->xbdev->otherend_id, mfn, GNTMAP_readonly);
+	tx->gref = ref;
+	np->grant_tx_ref[id] = ref;
+	tx->offset = offset;
+	tx->size = len;
+
+	tx->flags = 0;
+
+	np->tx.req_prod_pvt = i + 1;
+
+	xensock_make_frags(skb, dev, tx);
+	tx->size = skb->len;
+
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&np->tx, notify);
+	if (notify)
+		notify_remote_via_irq(np->irq);
+
+	/* Note: It is not safe to access skb after xensock_tx_buf_gc()! */
+	xensock_tx_buf_gc(dev);
+
+	if (!sockfront_tx_slot_available(np))
+		sockif_stop_queue(dev);
+
+	spin_unlock_irqrestore(&np->tx_lock, flags);
+
+	return 0;
+
+ drop:
+	dev_kfree_skb(skb);
+	return 0;
+}
+
+static void xensock_move_rx_slot(struct sockfront_info *np, struct sk_buff *skb,
+				 grant_ref_t ref)
+{
+	int new = xensock_rxidx(np->rx.req_prod_pvt);
+
+	BUG_ON(np->rx_skbs[new]);
+	np->rx_skbs[new] = skb;
+	np->grant_rx_ref[new] = ref;
+	RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id = new;
+	RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref = ref;
+	np->rx.req_prod_pvt++;
+}
+
+static int xensock_get_responses(struct sockfront_info *np,
+				 struct sockfront_rx_info *rinfo, RING_IDX rp,
+				 struct sk_buff_head *list)
+{
+	struct xen_sockif_rx_response *rx = &rinfo->rx;
+	struct device *dev = &np->xbdev->dev;
+	RING_IDX cons = np->rx.rsp_cons;
+	struct sk_buff *skb = xensock_get_rx_skb(np, cons);
+	grant_ref_t ref = xensock_get_rx_ref(np, cons);
+	int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD);
+	int frags = 1;
+	int err = 0;
+	unsigned long ret;
+
+	for (;;) {
+		if (unlikely(rx->status < 0 ||
+			     rx->offset + rx->status > PAGE_SIZE)) {
+			dev_warn_ratelimited(dev, "rx->offset: %x, size: %u\n",
+					     rx->offset, rx->status);
+			xensock_move_rx_slot(np, skb, ref);
+			err = -EINVAL;
+			goto next;
+		}
+
+		/* This definitely indicates a bug, either in this driver or in
+		 * the backend driver. In the future this should flag the bad
+		 * situation to the system controller so that it can reboot the
+		 * backend.
+		 */
+		if (ref == GRANT_INVALID_REF) {
+			dev_warn_ratelimited(dev, "Bad rx response id %d.\n",
+					     rx->id);
+			err = -EINVAL;
+			goto next;
+		}
+
+		ret = gnttab_end_foreign_access_ref(ref, 0);
+		BUG_ON(!ret);
+
+		gnttab_release_grant_reference(&np->gref_rx_head, ref);
+
+		__skb_queue_tail(list, skb);
+
+next:
+		if (!(rx->flags & XEN_SOCKRXF_more_data))
+			break;
+
+		if (cons + frags == rp) {
+			dev_warn_ratelimited(dev, "Need more frags\n");
+			err = -ENOENT;
+			break;
+		}
+
+		rx = RING_GET_RESPONSE(&np->rx, cons + frags);
+		skb = xensock_get_rx_skb(np, cons + frags);
+		ref = xensock_get_rx_ref(np, cons + frags);
+		frags++;
+	}
+
+	if (unlikely(frags > max)) {
+		dev_warn_ratelimited(dev, "Too many frags\n");
+		err = -E2BIG;
+	}
+
+	if (unlikely(err))
+		np->rx.rsp_cons = cons + frags;
+
+	return err;
+}
+
+static RING_IDX xensock_fill_frags(struct sockfront_info *np,
+				   struct sk_buff *skb,
+				   struct sk_buff_head *list)
+{
+	struct skb_shared_info *shinfo = skb_shinfo(skb);
+	int nr_frags = shinfo->nr_frags;
+	RING_IDX cons = np->rx.rsp_cons;
+	struct sk_buff *nskb;
+
+	while ((nskb = __skb_dequeue(list))) {
+		struct xen_sockif_rx_response *rx =
+			RING_GET_RESPONSE(&np->rx, ++cons);
+		skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];
+
+		__skb_fill_page_desc(skb, nr_frags,
+				     skb_frag_page(nfrag),
+				     rx->offset, rx->status);
+
+		skb->data_len += rx->status;
+
+		skb_shinfo(nskb)->nr_frags = 0;
+		kfree_skb(nskb);
+
+		nr_frags++;
+	}
+
+	shinfo->nr_frags = nr_frags;
+	return cons;
+}
+
+static int handle_incoming_queue(struct xen_sock_dev *dev,
+				 struct sk_buff_head *rxq)
+{
+	int packets_dropped = 0;
+	struct sk_buff *skb;
+
+	while ((skb = __skb_dequeue(rxq)) != NULL) {
+		int pull_to = SOCKFRONT_SKB_CB(skb)->pull_to;
+
+		__pskb_pull_tail(skb, pull_to - skb_headlen(skb));
+
+		/* Pass it up. */
+		xensock_dev_queue_rx_skb(skb, dev);
+	}
+
+	return packets_dropped;
+}
+
+static void xensock_poll(unsigned long data)
+{
+	struct sockfront_info *np = (struct sockfront_info *)data;
+	struct xen_sock_dev *dev = np->sockdev;
+	struct sk_buff *skb;
+	struct sockfront_rx_info rinfo;
+	struct xen_sockif_rx_response *rx = &rinfo.rx;
+	RING_IDX i, rp;
+	int work_done;
+	struct sk_buff_head rxq;
+	struct sk_buff_head errq;
+	struct sk_buff_head tmpq;
+	unsigned long flags;
+	int err;
+	int budget = 16;
+	int more_to_do = 1;
+
+	spin_lock(&np->rx_lock);
+
+	skb_queue_head_init(&rxq);
+	skb_queue_head_init(&errq);
+	skb_queue_head_init(&tmpq);
+
+	rp = np->rx.sring->rsp_prod;
+	rmb(); /* Ensure we see queued responses up to 'rp'. */
+
+	i = np->rx.rsp_cons;
+	work_done = 0;
+	while ((i != rp) && (work_done < budget)) {
+		memcpy(rx, RING_GET_RESPONSE(&np->rx, i), sizeof(*rx));
+
+		err = xensock_get_responses(np, &rinfo, rp, &tmpq);
+
+		if (unlikely(err)) {
+			while ((skb = __skb_dequeue(&tmpq)))
+				__skb_queue_tail(&errq, skb);
+			i = np->rx.rsp_cons;
+			continue;
+		}
+
+		skb = __skb_dequeue(&tmpq);
+
+		SOCKFRONT_SKB_CB(skb)->pull_to = rx->status;
+		if (SOCKFRONT_SKB_CB(skb)->pull_to > RX_COPY_THRESHOLD)
+			SOCKFRONT_SKB_CB(skb)->pull_to = RX_COPY_THRESHOLD;
+
+		skb_shinfo(skb)->frags[0].page_offset = rx->offset;
+		skb_frag_size_set(&skb_shinfo(skb)->frags[0], rx->status);
+		skb->data_len = rx->status;
+
+		i = xensock_fill_frags(np, skb, &tmpq);
+
+		/* Truesize is the actual allocation size, even if the
+		 * allocation is only partially used.
+		 */
+		skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags;
+		skb->len += skb->data_len;
+
+		__skb_queue_tail(&rxq, skb);
+
+		np->rx.rsp_cons = ++i;
+		work_done++;
+	}
+
+	__skb_queue_purge(&errq);
+
+	work_done -= handle_incoming_queue(dev, &rxq);
+
+	/* If we get a callback with very few responses, reduce fill target. */
+	/* NB. Note exponential increase, linear decrease. */
+	if (((np->rx.req_prod_pvt - np->rx.sring->rsp_prod) >
+	     ((3*np->rx_target) / 4)) &&
+	    (--np->rx_target < np->rx_min_target))
+		np->rx_target = np->rx_min_target;
+
+	xensock_alloc_rx_buffers(dev);
+
+	if (work_done < budget) {
+		local_irq_save(flags);
+
+		RING_FINAL_CHECK_FOR_RESPONSES(&np->rx, more_to_do);
+
+		local_irq_restore(flags);
+	}
+
+	spin_unlock(&np->rx_lock);
+	if (more_to_do)
+		tasklet_schedule(&np->rx_poll);
+}
+
+static void xensock_release_tx_bufs(struct sockfront_info *np)
+{
+	struct sk_buff *skb;
+	int i;
+
+	for (i = 0; i < SOCK_TX_RING_SIZE; i++) {
+		/* Skip over entries which are actually freelist references */
+		if (skb_entry_is_link(&np->tx_skbs[i]))
+			continue;
+
+		skb = np->tx_skbs[i].skb;
+		gnttab_end_foreign_access_ref(np->grant_tx_ref[i],
+					      GNTMAP_readonly);
+		gnttab_release_grant_reference(&np->gref_tx_head,
+					       np->grant_tx_ref[i]);
+		np->grant_tx_ref[i] = GRANT_INVALID_REF;
+		add_id_to_freelist(&np->tx_skb_freelist, np->tx_skbs, i);
+		dev_kfree_skb_irq(skb);
+	}
+}
+
+static void xensock_release_rx_bufs(struct sockfront_info *np)
+{
+	struct mmu_update      *mmu = np->rx_mmu;
+	struct multicall_entry *mcl = np->rx_mcl;
+	struct sk_buff_head free_list;
+	struct sk_buff *skb;
+	unsigned long mfn;
+	int xfer = 0, noxfer = 0, unused = 0;
+	int id, ref;
+
+	dev_warn(&np->xbdev->dev, "%s: fix me for copying receiver.\n",
+		 __func__);
+	return;
+
+	skb_queue_head_init(&free_list);
+
+	spin_lock_bh(&np->rx_lock);
+
+	for (id = 0; id < SOCK_RX_RING_SIZE; id++) {
+		ref = np->grant_rx_ref[id];
+		if (ref == GRANT_INVALID_REF) {
+			unused++;
+			continue;
+		}
+
+		skb = np->rx_skbs[id];
+		mfn = gnttab_end_foreign_transfer_ref(ref);
+		gnttab_release_grant_reference(&np->gref_rx_head, ref);
+		np->grant_rx_ref[id] = GRANT_INVALID_REF;
+
+		if (0 == mfn) {
+			skb_shinfo(skb)->nr_frags = 0;
+			dev_kfree_skb(skb);
+			noxfer++;
+			continue;
+		}
+
+		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+			/* Remap the page. */
+			const struct page *page =
+				skb_frag_page(&skb_shinfo(skb)->frags[0]);
+			unsigned long pfn = page_to_pfn(page);
+			void *vaddr = page_address(page);
+
+			MULTI_update_va_mapping(mcl, (unsigned long)vaddr,
+						mfn_pte(mfn, PAGE_KERNEL),
+						0);
+			mcl++;
+			mmu->ptr = ((u64)mfn << PAGE_SHIFT)
+				| MMU_MACHPHYS_UPDATE;
+			mmu->val = pfn;
+			mmu++;
+
+			set_phys_to_machine(pfn, mfn);
+		}
+		__skb_queue_tail(&free_list, skb);
+		xfer++;
+	}
+
+	dev_info(&np->xbdev->dev, "%s: %d xfer, %d noxfer, %d unused\n",
+		 __func__, xfer, noxfer, unused);
+
+	if (xfer) {
+		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+			/* Do all the remapping work and M2P updates. */
+			MULTI_mmu_update(mcl, np->rx_mmu, mmu - np->rx_mmu,
+					 NULL, DOMID_SELF);
+			mcl++;
+			HYPERVISOR_multicall(np->rx_mcl, mcl - np->rx_mcl);
+		}
+	}
+
+	__skb_queue_purge(&free_list);
+
+	spin_unlock_bh(&np->rx_lock);
+}
+
+static void xensock_uninit(struct xen_sock_dev *dev)
+{
+	struct sockfront_info *np = xensock_dev_priv(dev);
+
+	xensock_release_tx_bufs(np);
+	xensock_release_rx_bufs(np);
+	gnttab_free_grant_references(np->gref_tx_head);
+	gnttab_free_grant_references(np->gref_rx_head);
+}
+
+static irqreturn_t xensock_interrupt(int irq, void *dev_id)
+{
+	struct xen_sock_dev *dev = dev_id;
+	struct sockfront_info *np = xensock_dev_priv(dev);
+	unsigned long flags;
+
+	spin_lock_irqsave(&np->tx_lock, flags);
+
+	if (likely(sockif_carrier_ok(dev))) {
+		xensock_tx_buf_gc(dev);
+		/* Under tx_lock: protects access to rx shared-ring indexes. */
+		if (RING_HAS_UNCONSUMED_RESPONSES(&np->rx))
+			tasklet_schedule(&np->rx_poll);
+	}
+
+	spin_unlock_irqrestore(&np->tx_lock, flags);
+
+	return IRQ_HANDLED;
+}
+
+static struct xen_sock_dev *xensock_create_dev(struct xenbus_device *dev)
+{
+	int i, err;
+	struct xen_sock_dev *sockdev;
+	struct sockfront_info *np;
+
+	sockdev = alloc_xen_sock_dev(sizeof(struct sockfront_info), "vsock");
+	if (!sockdev)
+		return ERR_PTR(-ENOMEM);
+
+	np                   = xensock_dev_priv(sockdev);
+	np->xbdev            = dev;
+
+	spin_lock_init(&np->tx_lock);
+	spin_lock_init(&np->rx_lock);
+
+	skb_queue_head_init(&np->rx_batch);
+	np->rx_target     = RX_DFL_MIN_TARGET;
+	np->rx_min_target = RX_DFL_MIN_TARGET;
+	np->rx_max_target = RX_MAX_TARGET;
+
+	init_timer(&np->rx_refill_timer);
+	np->rx_refill_timer.data = (unsigned long)sockdev;
+	np->rx_refill_timer.function = rx_refill_timeout;
+
+	/* Initialise tx_skbs as a free chain containing every entry. */
+	np->tx_skb_freelist = 0;
+	for (i = 0; i < SOCK_TX_RING_SIZE; i++) {
+		skb_entry_set_link(&np->tx_skbs[i], i+1);
+		np->grant_tx_ref[i] = GRANT_INVALID_REF;
+	}
+
+	/* Clear out rx_skbs */
+	for (i = 0; i < SOCK_RX_RING_SIZE; i++) {
+		np->rx_skbs[i] = NULL;
+		np->grant_rx_ref[i] = GRANT_INVALID_REF;
+	}
+
+	/* A grant for every tx ring slot */
+	if (gnttab_alloc_grant_references(TX_MAX_TARGET,
+					  &np->gref_tx_head) < 0) {
+		pr_alert("#### sockfront can't alloc tx grant refs\n");
+		err = -ENOMEM;
+		goto exit;
+	}
+	/* A grant for every rx ring slot */
+	if (gnttab_alloc_grant_references(RX_MAX_TARGET,
+					  &np->gref_rx_head) < 0) {
+		pr_alert("#### sockfront can't alloc rx grant refs\n");
+		err = -ENOMEM;
+		goto exit_free_tx;
+	}
+
+	tasklet_init(&np->rx_poll, xensock_poll, (unsigned long)np);
+
+	sockdev->start_xmit = xensock_start_xmit;
+
+	np->sockdev = sockdev;
+
+	sockif_carrier_off(sockdev);
+
+	return sockdev;
+
+ exit_free_tx:
+	gnttab_free_grant_references(np->gref_tx_head);
+ exit:
+	free_xen_sock_dev(sockdev);
+	return ERR_PTR(err);
+}
+
+static void xensock_end_access(int ref, void *page)
+{
+	/* This frees the page as a side-effect */
+	if (ref != GRANT_INVALID_REF)
+		gnttab_end_foreign_access(ref, 0, (unsigned long)page);
+}
+
+static void xensock_disconnect_backend(struct sockfront_info *info)
+{
+	/* Stop old i/f to prevent errors whilst we rebuild the state. */
+	spin_lock_bh(&info->rx_lock);
+	spin_lock_irq(&info->tx_lock);
+	sockif_carrier_off(info->sockdev);
+	spin_unlock_irq(&info->tx_lock);
+	spin_unlock_bh(&info->rx_lock);
+
+	if (info->irq)
+		unbind_from_irqhandler(info->irq, info);
+	info->evtchn = 0;
+	info->irq = 0;
+
+	/* End access and free the pages */
+	xensock_end_access(info->tx_ring_ref, info->tx.sring);
+	xensock_end_access(info->rx_ring_ref, info->rx.sring);
+
+	info->tx_ring_ref = GRANT_INVALID_REF;
+	info->rx_ring_ref = GRANT_INVALID_REF;
+	info->tx.sring = NULL;
+	info->rx.sring = NULL;
+}
+
+static int setup_sockfront(struct xenbus_device *dev,
+			   struct sockfront_info *info)
+{
+	struct xen_sockif_tx_sring *txs;
+	struct xen_sockif_rx_sring *rxs;
+	int err;
+
+	info->tx_ring_ref = GRANT_INVALID_REF;
+	info->rx_ring_ref = GRANT_INVALID_REF;
+	info->rx.sring = NULL;
+	info->tx.sring = NULL;
+	info->irq = 0;
+
+	txs = (struct xen_sockif_tx_sring *)get_zeroed_page(GFP_NOIO |
+							     __GFP_HIGH);
+	if (!txs) {
+		err = -ENOMEM;
+		xenbus_dev_fatal(dev, err, "allocating tx ring page");
+		goto fail;
+	}
+	SHARED_RING_INIT(txs);
+	FRONT_RING_INIT(&info->tx, txs, PAGE_SIZE);
+
+	err = xenbus_grant_ring(dev, virt_to_mfn(txs));
+	if (err < 0) {
+		free_page((unsigned long)txs);
+		goto fail;
+	}
+
+	info->tx_ring_ref = err;
+	rxs = (struct xen_sockif_rx_sring *)get_zeroed_page(GFP_NOIO |
+							     __GFP_HIGH);
+	if (!rxs) {
+		err = -ENOMEM;
+		xenbus_dev_fatal(dev, err, "allocating rx ring page");
+		goto fail;
+	}
+	SHARED_RING_INIT(rxs);
+	FRONT_RING_INIT(&info->rx, rxs, PAGE_SIZE);
+
+	err = xenbus_grant_ring(dev, virt_to_mfn(rxs));
+	if (err < 0) {
+		free_page((unsigned long)rxs);
+		goto fail;
+	}
+	info->rx_ring_ref = err;
+
+	err = xenbus_alloc_evtchn(dev, &info->evtchn);
+	if (err)
+		goto fail;
+
+	err = bind_evtchn_to_irqhandler(info->evtchn, xensock_interrupt,
+					0, "sockif", info->sockdev);
+	if (err < 0)
+		goto fail;
+	info->irq = err;
+	return 0;
+
+ fail:
+	return err;
+}
+
+/* Common code used when first setting up, and when resuming. */
+static int talk_to_sockback(struct xenbus_device *dev,
+			    struct sockfront_info *info)
+{
+	const char *message;
+	struct xenbus_transaction xbt;
+	int err;
+
+	/* Create shared ring, alloc event channel. */
+	err = setup_sockfront(dev, info);
+	if (err)
+		goto out;
+
+again:
+	err = xenbus_transaction_start(&xbt);
+	if (err) {
+		xenbus_dev_fatal(dev, err, "starting transaction");
+		goto destroy_ring;
+	}
+
+	err = xenbus_printf(xbt, dev->nodename, "tx-ring-ref", "%u",
+			    info->tx_ring_ref);
+	if (err) {
+		message = "writing tx ring-ref";
+		goto abort_transaction;
+	}
+	err = xenbus_printf(xbt, dev->nodename, "rx-ring-ref", "%u",
+			    info->rx_ring_ref);
+	if (err) {
+		message = "writing rx ring-ref";
+		goto abort_transaction;
+	}
+	err = xenbus_printf(xbt, dev->nodename,
+			    "event-channel", "%u", info->evtchn);
+	if (err) {
+		message = "writing event-channel";
+		goto abort_transaction;
+	}
+
+	err = xenbus_printf(xbt, dev->nodename, "request-rx-copy", "%u",
+			    1);
+	if (err) {
+		message = "writing request-rx-copy";
+		goto abort_transaction;
+	}
+
+	err = xenbus_printf(xbt, dev->nodename, "feature-rx-notify", "%d", 1);
+	if (err) {
+		message = "writing feature-rx-notify";
+		goto abort_transaction;
+	}
+
+	err = xenbus_transaction_end(xbt, 0);
+	if (err) {
+		if (err == -EAGAIN)
+			goto again;
+		xenbus_dev_fatal(dev, err, "completing transaction");
+		goto destroy_ring;
+	}
+
+	xenbus_switch_state(dev, XenbusStateInitialised);
+	return 0;
+
+ abort_transaction:
+	xenbus_transaction_end(xbt, 1);
+	xenbus_dev_fatal(dev, err, "%s", message);
+ destroy_ring:
+	xensock_disconnect_backend(info);
+ out:
+	return err;
+}
+
+/**
+ * We are reconnecting to the backend, due to a suspend/resume, or a backend
+ * driver restart.  We tear down our sockif structure and recreate it, but
+ * leave the device-layer structures intact so that this is transparent to the
+ * rest of the kernel.
+ */
+static int sockfront_resume(struct xenbus_device *dev)
+{
+	struct sockfront_info *info = dev_get_drvdata(&dev->dev);
+
+	dev_dbg(&dev->dev, "%s\n", dev->nodename);
+
+	xensock_disconnect_backend(info);
+	return talk_to_sockback(info->xbdev, info);
+}
+
+/**
+ * Entry point to this code when a new device is created.  Allocate the basic
+ * structures and the ring buffers for communication with the backend, and
+ * inform the backend of the appropriate details for those.
+ */
+static int sockfront_probe(struct xenbus_device *dev,
+			   const struct xenbus_device_id *id)
+{
+	int err;
+	struct xen_sock_dev *sockdev;
+	struct sockfront_info *info;
+
+	sockdev = xensock_create_dev(dev);
+	if (IS_ERR(sockdev)) {
+		err = PTR_ERR(sockdev);
+		xenbus_dev_fatal(dev, err, "creating sockdev");
+		return err;
+	}
+
+	info = xensock_dev_priv(sockdev);
+	dev_set_drvdata(&dev->dev, info);
+
+	err = xensock_register_dev(info->sockdev);
+	if (err) {
+		pr_warn("%s: xensock_register_dev err=%d\n",
+			__func__, err);
+		goto fail;
+	}
+
+	err = talk_to_sockback(info->xbdev, info);
+	if (err)
+		goto fail;
+
+	return 0;
+
+ fail:
+	free_xen_sock_dev(sockdev);
+	dev_set_drvdata(&dev->dev, NULL);
+	return err;
+}
+
+static int xensock_connect(struct xen_sock_dev *dev)
+{
+	struct sockfront_info *np = xensock_dev_priv(dev);
+	int i, requeue_idx, err;
+	struct sk_buff *skb;
+	grant_ref_t ref;
+	struct xen_sockif_rx_request *req;
+	unsigned int feature_rx_copy;
+
+	err = xenbus_scanf(XBT_NIL, np->xbdev->otherend,
+			   "feature-rx-copy", "%u", &feature_rx_copy);
+	if (err != 1)
+		feature_rx_copy = 0;
+
+	if (!feature_rx_copy) {
+		dev_info(&np->xbdev->dev,
+			 "backend does not support copying receive path\n");
+		return -ENODEV;
+	}
+
+	spin_lock_bh(&np->rx_lock);
+	spin_lock_irq(&np->tx_lock);
+
+	/* Step 1: Discard all pending TX packet fragments. */
+	xensock_release_tx_bufs(np);
+
+	/* Step 2: Rebuild the RX buffer freelist and the RX ring itself. */
+	for (requeue_idx = 0, i = 0; i < SOCK_RX_RING_SIZE; i++) {
+		skb_frag_t *frag;
+		const struct page *page;
+
+		if (!np->rx_skbs[i])
+			continue;
+
+		skb = xensock_get_rx_skb(np, i);
+		np->rx_skbs[requeue_idx] = skb;
+		ref = xensock_get_rx_ref(np, i);
+		np->grant_rx_ref[requeue_idx] = ref;
+		req = RING_GET_REQUEST(&np->rx, requeue_idx);
+
+		frag = &skb_shinfo(skb)->frags[0];
+		page = skb_frag_page(frag);
+		gnttab_grant_foreign_access_ref(
+			ref, np->xbdev->otherend_id,
+			pfn_to_mfn(page_to_pfn(page)),
+			0);
+		req->gref = ref;
+		req->id   = requeue_idx;
+
+		requeue_idx++;
+	}
+
+	np->rx.req_prod_pvt = requeue_idx;
+
+	/* Step 3: All public and private state should now be sane.  Get
+	 * ready to start sending and receiving packets and give the driver
+	 * domain a kick because we've probably just requeued some
+	 * packets.
+	 */
+	sockif_carrier_on(np->sockdev);
+	notify_remote_via_irq(np->irq);
+	xensock_tx_buf_gc(dev);
+	xensock_alloc_rx_buffers(dev);
+
+	spin_unlock_irq(&np->tx_lock);
+	spin_unlock_bh(&np->rx_lock);
+
+	return 0;
+}
+
+/**
+ * Callback received when the backend's state changes.
+ */
+static void sockback_changed(struct xenbus_device *dev,
+			     enum xenbus_state backend_state)
+{
+	struct sockfront_info *np = dev_get_drvdata(&dev->dev);
+	struct xen_sock_dev *sockdev = np->sockdev;
+
+	dev_dbg(&dev->dev, "sockfront: sockback_changed to state %s\n",
+		xenbus_strstate(backend_state));
+
+	switch (backend_state) {
+	case XenbusStateInitialising:
+	case XenbusStateInitWait:
+	case XenbusStateInitialised:
+	case XenbusStateReconfiguring:
+	case XenbusStateReconfigured:
+	case XenbusStateUnknown:
+	case XenbusStateClosed:
+		break;
+
+	case XenbusStateConnected:
+		if (xensock_connect(sockdev) != 0)
+			break;
+		xenbus_switch_state(dev, XenbusStateConnected);
+		break;
+
+	case XenbusStateClosing:
+		xenbus_frontend_closed(dev);
+		break;
+	}
+}
+
+static const struct xenbus_device_id sockfront_ids[] = {
+	{ "vsock" },
+	{ "" }
+};
+
+
+static int sockfront_remove(struct xenbus_device *dev)
+{
+	struct sockfront_info *info = dev_get_drvdata(&dev->dev);
+
+	dev_dbg(&dev->dev, "%s\n", dev->nodename);
+
+	xensock_disconnect_backend(info);
+
+	xensock_unregister_dev(info->sockdev);
+
+	tasklet_disable(&info->rx_poll);
+	tasklet_kill(&info->rx_poll);
+	xensock_uninit(info->sockdev);
+
+	del_timer_sync(&info->rx_refill_timer);
+
+	free_xen_sock_dev(info->sockdev);
+
+	return 0;
+}
+
+static DEFINE_XENBUS_DRIVER(sockfront, ,
+	.probe = sockfront_probe,
+	.remove = sockfront_remove,
+	.resume = sockfront_resume,
+	.otherend_changed = sockback_changed,
+);
+
+static int __init sockif_init(void)
+{
+	int ret;
+
+	/* FIXME: xen_pv_domain() should be here, but ARM hardcodes HVM */
+	if (!xen_domain())
+		return -ENODEV;
+
+	/* Nothing to do if running in dom0. */
+	if (xen_initial_domain())
+		return -ENODEV;
+
+	pr_info("Initialising Xen socket driver.\n");
+
+	ret = xenbus_register_frontend(&sockfront_driver);
+	if (ret < 0)
+		return ret;
+
+	ret = xensock_proto_client_init();
+	if (ret)
+		goto err_unregister_frontend;
+
+	return ret;
+
+err_unregister_frontend:
+	xenbus_unregister_driver(&sockfront_driver);
+	return ret;
+}
+module_init(sockif_init);
+
+
+static void __exit sockif_exit(void)
+{
+	xensock_proto_cleanup();
+	xenbus_unregister_driver(&sockfront_driver);
+}
+module_exit(sockif_exit);
+
+MODULE_DESCRIPTION("Xensock device frontend");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("xen:xensock");
-- 
1.8.2.rc2


* Re: [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers
  2014-08-08 11:32 [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers Oleksandr Dmytryshyn
                   ` (2 preceding siblings ...)
  2014-08-08 11:32 ` [PATCH 3/3] xen-sockfront: add support of the xen-sockfront driver Oleksandr Dmytryshyn
@ 2014-08-08 12:36 ` David Vrabel
  2014-08-08 14:41   ` Stefano Stabellini
  2014-08-11  9:40   ` Vitaly V. Ch
  2014-08-08 15:27 ` Wei Liu
  2014-08-11 12:58 ` David Vrabel
  5 siblings, 2 replies; 12+ messages in thread
From: David Vrabel @ 2014-08-08 12:36 UTC (permalink / raw)
  To: Oleksandr Dmytryshyn, Ian Campbell, Stefano Stabellini,
	Tim Deegan, xen-devel

On 08/08/14 12:32, Oleksandr Dmytryshyn wrote:
> Next series of the patches adds new socket support: xensock.
> Those sockets will be used for the xen-sock frontend/backend
> drivers. Those drivers will allow to connect via xensock
> sockets (in this case dom0/domD sockets can be used for the
> server application and domU sockets can be used for the
> client application). Those sockets are similar to the TCP sockets.
> But there are some limitations. Xensock sockets
> ignore an address and can act only as the stream
> sockets. Only one xensock socket can be opened in the frontend
> side and it will be connected with the single xensock socket
> in the backend side (this link is created automatically by
> frontend and backend driver).

You should look at using AF_VSOCK sockets instead.
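
For illustration, a minimal AF_VSOCK client sketch (the CID and port
values below are placeholder assumptions, not anything defined by this
series or by an existing Xen vsock transport) might look like:

  #include <string.h>
  #include <unistd.h>
  #include <sys/socket.h>
  #include <linux/vm_sockets.h>

  int main(void)
  {
          struct sockaddr_vm addr;
          int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

          if (fd < 0)
                  return 1;

          memset(&addr, 0, sizeof(addr));
          addr.svm_family = AF_VSOCK;
          addr.svm_cid = 3;        /* assumed peer context id */
          addr.svm_port = 5000;    /* assumed service port */

          if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
                  write(fd, "ping", 4);

          close(fd);
          return 0;
  }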

David


* Re: [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers
  2014-08-08 12:36 ` [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers David Vrabel
@ 2014-08-08 14:41   ` Stefano Stabellini
  2014-08-20 17:58     ` Ian Campbell
  2014-08-11  9:40   ` Vitaly V. Ch
  1 sibling, 1 reply; 12+ messages in thread
From: Stefano Stabellini @ 2014-08-08 14:41 UTC (permalink / raw)
  To: David Vrabel
  Cc: Tim Deegan, Oleksandr Dmytryshyn, Stefano Stabellini,
	Ian Campbell, xen-devel

On Fri, 8 Aug 2014, David Vrabel wrote:
> On 08/08/14 12:32, Oleksandr Dmytryshyn wrote:
> > Next series of the patches adds new socket support: xensock.
> > Those sockets will be used for the xen-sock frontend/backend
> > drivers. Those drivers will allow to connect via xensock
> > sockets (in this case dom0/domD sockets can be used for the
> > server application and domU sockets can be used for the
> > client application). Those sockets are similar to the TCP sockets.
> > But there are some limitations. Xensock sockets
> > ignore an address and can act only as the stream
> > sockets. Only one xensock socket can be opened in the frontend
> > side and it will be connected with the single xensock socket
> > in the backend side (this link is created automatically by
> > frontend and backend driver).
> 
> You should look at using AF_VSOCK sockets instead.

Even if he uses AF_VSOCK instead of introducing AF_XENSOCK, he would
still need to add a new Xen specific frontend and backend pair, right?

I am in favor of this work, but you would need to spend a few words in
the patch description to explain why libvchan doesn't fit your needs.
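
For comparison, a minimal libvchan (libxenvchan) userspace client
sketch could look like the following -- the peer domid and the
xenstore path are illustrative assumptions only and would have to
match whatever the server side actually publishes:

  #include <stdio.h>
  #include <libxenvchan.h>

  int main(void)
  {
          char buf[256];
          int len;
          struct libxenvchan *ctrl;

          /* Assumed: server runs in dom0 and publishes under this path. */
          ctrl = libxenvchan_client_init(NULL, 0,
                                         "/local/domain/0/data/vchan");
          if (!ctrl)
                  return 1;

          len = libxenvchan_read(ctrl, buf, sizeof(buf));
          if (len > 0)
                  fwrite(buf, 1, len, stdout);

          libxenvchan_close(ctrl);
          return 0;
  }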


* Re: [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers
  2014-08-08 11:32 [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers Oleksandr Dmytryshyn
                   ` (3 preceding siblings ...)
  2014-08-08 12:36 ` [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers David Vrabel
@ 2014-08-08 15:27 ` Wei Liu
  2014-08-11  9:28   ` Oleksandr Dmytryshyn
  2014-08-11 12:58 ` David Vrabel
  5 siblings, 1 reply; 12+ messages in thread
From: Wei Liu @ 2014-08-08 15:27 UTC (permalink / raw)
  To: Oleksandr Dmytryshyn
  Cc: Tim Deegan, wei.liu2, Stefano Stabellini, Ian Campbell, xen-devel

Yes, you've hooked it up to the socket layer, but for the most part it
is a duplication of old (and possibly buggy) code -- 3.10 netfront/back.

I think you should first state why this new driver is needed. You will
also need to refactor your code so that it depends on xen-netback and
xen-netfront. Code duplication like this is not acceptable upstream.

Wei.


* Re: [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers
  2014-08-08 15:27 ` Wei Liu
@ 2014-08-11  9:28   ` Oleksandr Dmytryshyn
  0 siblings, 0 replies; 12+ messages in thread
From: Oleksandr Dmytryshyn @ 2014-08-11  9:28 UTC (permalink / raw)
  To: Wei Liu; +Cc: Tim Deegan, Stefano Stabellini, Ian Campbell, xen-devel

My drivers provide high bandwidth and minimal latency with small
packets. In the case of PV net there is TCP-specific protocol
overhead.

Oleksandr Dmytryshyn | Product Engineering and Development
GlobalLogic
M +38.067.382.2525
www.globallogic.com

http://www.globallogic.com/email_disclaimer.txt


On Fri, Aug 8, 2014 at 6:27 PM, Wei Liu <wei.liu2@citrix.com> wrote:
> Yes you've hooked it up to socket layer but for large part it's
> duplication of old (and possible buggy) code -- 3.10 netfront/back.
>
> I think you should first state why this new driver is needed. You will
> also need to refactor your code, so that it depends on xen-netback /
> front. Code duplication like this is not acceptable upstream.
>
> Wei.


* Re: [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers
  2014-08-08 12:36 ` [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers David Vrabel
  2014-08-08 14:41   ` Stefano Stabellini
@ 2014-08-11  9:40   ` Vitaly V. Ch
  2014-08-11  9:50     ` Vitaly V. Ch
  1 sibling, 1 reply; 12+ messages in thread
From: Vitaly V. Ch @ 2014-08-11  9:40 UTC (permalink / raw)
  To: David Vrabel
  Cc: Tim Deegan, Oleksandr Dmytryshyn, Stefano Stabellini,
	Ian Campbell, xen-devel



On Fri, Aug 8, 2014 at 3:36 PM, David Vrabel <david.vrabel@citrix.com>
wrote:

> On 08/08/14 12:32, Oleksandr Dmytryshyn wrote:
> > Next series of the patches adds new socket support: xensock.
> > Those sockets will be used for the xen-sock frontend/backend
> > drivers. Those drivers will allow to connect via xensock
> > sockets (in this case dom0/domD sockets can be used for the
> > server application and domU sockets can be used for the
> > client application). Those sockets are similar to the TCP sockets.
> > But there are some limitations. Xensock sockets
> > ignore an address and can act only as the stream
> > sockets. Only one xensock socket can be opened in the frontend
> > side and it will be connected with the single xensock socket
> > in the backend side (this link is created automatically by
> > frontend and backend driver).
>
> You should look at using AF_VSOCK sockets instead.
>

But in this situation he would need to introduce a new protocol in that
domain. Are you sure that a new protocol is better than a new domain? Also,
does anybody use AF_VSOCK outside VMware?

With best regards,

Vitaly Chernooky


>
> David
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>




* Re: [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers
  2014-08-11  9:40   ` Vitaly V. Ch
@ 2014-08-11  9:50     ` Vitaly V. Ch
  0 siblings, 0 replies; 12+ messages in thread
From: Vitaly V. Ch @ 2014-08-11  9:50 UTC (permalink / raw)
  To: David Vrabel
  Cc: Tim Deegan, Oleksandr Dmytryshyn, Stefano Stabellini,
	Ian Campbell, xen-devel



On Mon, Aug 11, 2014 at 12:40 PM, Vitaly V. Ch <vitaly.v.ch@chernooky.com>
wrote:

>
>
>
> On Fri, Aug 8, 2014 at 3:36 PM, David Vrabel <david.vrabel@citrix.com>
> wrote:
>
>> On 08/08/14 12:32, Oleksandr Dmytryshyn wrote:
>> > Next series of the patches adds new socket support: xensock.
>> > Those sockets will be used for the xen-sock frontend/backend
>> > drivers. Those drivers will allow to connect via xensock
>> > sockets (in this case dom0/domD sockets can be used for the
>> > server application and domU sockets can be used for the
>> > client application). Those sockets are similar to the TCP sockets.
>> > But there are some limitations. Xensock sockets
>> > ignore an address and can act only as the stream
>> > sockets. Only one xensock socket can be opened in the frontend
>> > side and it will be connected with the single xensock socket
>> > in the backend side (this link is created automatically by
>> > frontend and backend driver).
>>
>> You should look at using AF_VSOCK sockets instead.
>>
>
> But in this situation he will need to introduce new protocol in that
> domain. Do you sure than new protocol is better than new domain? Also does
> anybody use AF_VSOCK outside vmware?
>


Also, in the kernel version where Alexander introduces AF_XENSOCK, nothing
from AF_VSOCK is present.

With best regards,

Vitaly Chernooky


>
> With best regards,
>
> Vitaly Chernooky
>
>
>>
>> David
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
>>
>
>




* Re: [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers
  2014-08-08 11:32 [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers Oleksandr Dmytryshyn
                   ` (4 preceding siblings ...)
  2014-08-08 15:27 ` Wei Liu
@ 2014-08-11 12:58 ` David Vrabel
  5 siblings, 0 replies; 12+ messages in thread
From: David Vrabel @ 2014-08-11 12:58 UTC (permalink / raw)
  To: Oleksandr Dmytryshyn, Ian Campbell, Stefano Stabellini,
	Tim Deegan, xen-devel

On 08/08/14 12:32, Oleksandr Dmytryshyn wrote:
> Next series of the patches adds new socket support: xensock.
> Those sockets will be used for the xen-sock frontend/backend
> drivers. Those drivers will allow to connect via xensock
> sockets (in this case dom0/domD sockets can be used for the
> server application and domU sockets can be used for the
> client application). Those sockets are similar to the TCP sockets.
> But there are some limitations. Xensock sockets
> ignore an address and can act only as the stream
> sockets. Only one xensock socket can be opened in the frontend
> side and it will be connected with the single xensock socket
> in the backend side (this link is created automatically by
> frontend and backend driver).

We only want one generic socket-based interdomain communication
mechanism in the kernel and this proposal is lacking in several areas.

* It does not use the existing support for interdomain sockets (AF_VSOCK).

* It is not suitable for a channel between two mutually untrusting peers.

* The single channel to one other domain is insufficient.  Each domain
must be able to support connections to many other domains.
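
(As a rough sketch of that last point: an AF_VSOCK server bound to
VMADDR_CID_ANY can accept connections from any number of peer
domains.  The port number below is an illustrative assumption.)

  #include <string.h>
  #include <unistd.h>
  #include <sys/socket.h>
  #include <linux/vm_sockets.h>

  int main(void)
  {
          struct sockaddr_vm addr;
          int s = socket(AF_VSOCK, SOCK_STREAM, 0);

          if (s < 0)
                  return 1;

          memset(&addr, 0, sizeof(addr));
          addr.svm_family = AF_VSOCK;
          addr.svm_cid = VMADDR_CID_ANY;  /* accept from any peer */
          addr.svm_port = 5000;           /* assumed service port */

          if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) || listen(s, 8))
                  return 1;

          for (;;) {
                  char buf[64];
                  ssize_t n;
                  int c = accept(s, NULL, NULL);

                  if (c < 0)
                          continue;
                  while ((n = read(c, buf, sizeof(buf))) > 0)
                          write(c, buf, n);     /* simple echo back */
                  close(c);
          }
  }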

See also a previous discussion.

http://lists.xen.org/archives/html/xen-devel/2013-06/msg01123.html

But note the document has not been updated following the feedback from
that thread.

There is a userspace prototype available:

http://xenbits.xen.org/gitweb/?p=people/dvrabel/idc-prototype.git;a=summary

David


* Re: [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers
  2014-08-08 14:41   ` Stefano Stabellini
@ 2014-08-20 17:58     ` Ian Campbell
  0 siblings, 0 replies; 12+ messages in thread
From: Ian Campbell @ 2014-08-20 17:58 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Tim Deegan, Oleksandr Dmytryshyn, Stefano Stabellini,
	David Vrabel, xen-devel

On Fri, 2014-08-08 at 15:41 +0100, Stefano Stabellini wrote:
> On Fri, 8 Aug 2014, David Vrabel wrote:
> > On 08/08/14 12:32, Oleksandr Dmytryshyn wrote:
> > > Next series of the patches adds new socket support: xensock.
> > > Those sockets will be used for the xen-sock frontend/backend
> > > drivers. Those drivers will allow to connect via xensock
> > > sockets (in this case dom0/domD sockets can be used for the
> > > server application and domU sockets can be used for the
> > > client application). Those sockets are similar to the TCP sockets.
> > > But there are some limitations. Xensock sockets
> > > ignore an address and can act only as the stream
> > > sockets. Only one xensock socket can be opened in the frontend
> > > side and it will be connected with the single xensock socket
> > > in the backend side (this link is created automatically by
> > > frontend and backend driver).
> > 
> > You should look at using AF_VSOCK sockets instead.
> 
> Even if he uses AF_VSOCK instead of introducing AF_XENSOCK, he would
> still need to add a new Xen specific frontend and backend pair, right?
> 
> I am in favor of this work, but you would need to spend a few words in
> the patch description to explain why libvchan doesn't fit your needs.

Agreed.

It would be nice (tm) to have an in-kernel implementation of the vchan
code so that a user process in one domain and kernel code in another
could interact over vchan (I can't remember who, but someone needed
something like that), but I guess this is somewhat orthogonal to this
series.

Ian.



Thread overview: 12+ messages
2014-08-08 11:32 [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers Oleksandr Dmytryshyn
2014-08-08 11:32 ` [PATCH 1/3] net: introduce new socket support: xensock Oleksandr Dmytryshyn
2014-08-08 11:32 ` [PATCH 2/3] xen-sockback: add support of the xen-sockback driver Oleksandr Dmytryshyn
2014-08-08 11:32 ` [PATCH 3/3] xen-sockfront: add support of the xen-sockfront driver Oleksandr Dmytryshyn
2014-08-08 12:36 ` [PATCH 0/3] Introduce xensock socket and implement sockback and sockfront drivers David Vrabel
2014-08-08 14:41   ` Stefano Stabellini
2014-08-20 17:58     ` Ian Campbell
2014-08-11  9:40   ` Vitaly V. Ch
2014-08-11  9:50     ` Vitaly V. Ch
2014-08-08 15:27 ` Wei Liu
2014-08-11  9:28   ` Oleksandr Dmytryshyn
2014-08-11 12:58 ` David Vrabel
