* [PATCH v3 00/19] Vhost-user: Implement device IOTLB support
@ 2017-10-05  8:36 Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup" Maxime Coquelin
                   ` (19 more replies)
  0 siblings, 20 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
This v3 lists the feature in the release note, and fixes the bug in
is_vring_iotlb_update() reported by Yuanhan.
The purpose of this series is to add support for
VIRTIO_F_IOMMU_PLATFORM feature, by implementing device IOTLB in the
vhost-user backend. It improves the guest safety by enabling the
possibility to isolate the Virtio device.
It makes possible to use Virtio PMD in guest with using VFIO driver
without enable_unsafe_noiommu_mode parameter set, so that the DPDK
application on guest can only access memory its has been allowed to,
and preventing malicious/buggy DPDK application in guest to make
vhost-user backend write random guest memory. Note that Virtio-net
Kernel driver also support IOMMU.
The series depends on Qemu's "vhost-user: Specify and implement
device IOTLB support" [0], available upstream and which will be part
of Qemu v2.10 release.
Performance-wise, even if this RFC has still room for optimizations,
no performance degradation is noticed with static mappings (i.e. DPDK
on guest) with PVP benchmark:
	Traffic Generator: Moongen (lua-trafficgen)
	Acceptable Loss: 0.005%
	Validation run time: 1 min
	Guest DPDK version/commit: v17.05
	QEMU version/commit: master (6db174aed1fd)
	Virtio features: default
	CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
	NIC: 2 x X710
	Page size: 1G host/1G guest
	Results (bidirectional, total of the two flows):
	 - base: 18.8Mpps
	 - base + IOTLB series, IOMMU OFF: 18.8Mpps
	 - base + IOTLB series, IOMMU ON: 18.8Mpps (14.5Mpps w/o PATCH 21/21)
This is explained because IOTLB misses, which are very costly, only
happen at startup time. Indeed, once used, the buffers are not
invalidated, so if the IOTLB cache is large enough, there will be only
cache hits. Also, the use of 1G huge pages improves the IOTLB cache
searching time by reducing the number of entries.
With 2M hugepages, a performance degradation is seen with IOMMU on:
	Traffic Generator: Moongen (lua-trafficgen)
	Acceptable Loss: 0.005%
	Validation run time: 1 min
	Guest DPDK version/commit: v17.05
	QEMU version/commit: master (6db174aed1fd)
	Virtio features: default
	CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
	NIC: 2 x X710
	Page size: 2M host/2M guest
	Results (bidirectional, total of the two flows):
	 - base: 18.8Mpps
	 - base + IOTLB series, IOMMU OFF: 18.8Mpps
	 - base + IOTLB series, IOMMU ON: 13.5Mpps (12.4Mpps wo PATCH 21/21)
A possible improvement would be to merge contiguous IOTLB entries sharing
the same permissions. A very rough patch implementing this idea fixes
the performance degradation (18.8Mpps), but the required work to clean
it would delay this series after v17.11.
With dynamic mappings (i.e. Virtio-net kernel driver), this is another
story. The performance is so poor it makes it almost unusable. Indeed,
since the Kernel driver unmaps the buffers as soon as they are handled,
almost all descriptors buffers addresses translations result in an IOTLB
miss. There is not much that can be done on DPDK side, except maybe
batching IOTLB miss requests no to break bursts, but it would require
a big rework. In Qemu, we may consider enabling IOMMU MAP notifications,
so that DPDK receives the IOTLB updates without having to send IOTLB miss
request.
Regarding the design choices:
 - I initially intended to use userspace RCU library[1] for the cache
implementation, but it would have added an external dependency, and the
lib is not available in all distros. Qemu for example got rid of this
dependency by copying some of the userspace RCU lib parts into Qemu tree,
but this is not possible with DPDK due to licensing issues (RCU lib is
LGPL v2). Thanks to Jason advice, I implemented the cache using rd/wr
locks.
 - I initially implemented a per-device IOTLB cache, but the concurrent
acccesses on the IOTLB lock had huge impact on performance (~-40% in
bidirectionnal, expect even worse with multiqueue). I move to a per-
virtqueue IOTLB design, which prevents this concurrency.
 - The slave IOTLB miss request supports reply-ack feature in spec, but
this version doesn't block or busy-wait for the corresponding update so
that other queues sharing the same lcore can be processed in the meantime.
For those who would like to test the series, I made it available on
gitlab[2] (vhost_user_iotlb_v1 tag). The guest kernel command line requires
the intel_iommu=on parameter, and the guest should be started with and
iommu device attached to the virtio-net device. For example:
./qemu-system-x86_64 \
  -enable-kvm -m 4096 -smp 2 \
  -M q35,kernel-irqchip=split \
  -cpu host \
  -device intel-iommu,device-iotlb=on,intremap \
  -device ioh3420,id=root.1,chassis=1 \
  -chardev socket,id=char0,path=/tmp/vhost-user1 \
  -netdev type=vhost-user,id=hn2,chardev=char0 \
  -device virtio-net-pci,netdev=hn2,id=v0,mq=off,mac=$MAC,bus=root.1,disable-modern=off,disable-legacy=on,iommu_platform=on,ats=on \
...
[0]: https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg00520.html
[1]: http://liburcu.org/
[2]: https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_user_iotlb_v2
Changes since v1:
=================
- Add feature support in release note (Yuanhan)
- Fix return value in is_vring_iotlb_update() (Yuanhan)
Changes since v1:
=================
- Implement random cache entry eviction instead of removing all entries
  when cache is full (Yuanhan)
- Adds bounds check on vring_idx (Remy)
- Initialize slave_req_fd to -1 (Tiwei)
- Remove virtio-net device lock (Tiwei)
- Simplify vhost_user_iotlb_cache_remove() (Tiwei)
- Squash iotlb lock usage optimization patch
- Inline noiommu part of vhost_iova_to_vva to remove performance
  regressions with IOMMU=off
- Reworked iotlb files copyrights
Changes since RFC:
==================
- Fix memory leak in error patch reported by Jens
- Rework wait for IOTLB update by stopping the burst to let other
  queues to be processed, if any. It implies the introduction an
  iotlb_pending_list, so that iotlb miss requests aren't sent multiple
  times for a same address.
- Optimize iotlb lock usage to recover to same as IOMMU off performance
- Fix device locking issue in rte_vhost_dequeue_burst() error path
- Change virtio_dev_rx error handling for consistency with mergeable rx,
  and to ease returning in case of IOTLB misses.
- Fix checkpatch warnings reported by checkpatch@dpdk.org
Maxime Coquelin (19):
  Revert "vhost: workaround MQ fails to startup"
  vhost: make error handling consistent in rx path
  vhost: prepare send_vhost_message() to slave requests
  vhost: add support to slave requests channel
  vhost: declare missing IOMMU-related definitions for old kernels
  vhost: add iotlb helper functions
  vhost: iotlb: add pending miss request list and helpers
  vhost-user: add support to IOTLB miss slave requests
  vhost: initialize vrings IOTLB caches
  vhost-user: handle IOTLB update and invalidate requests
  vhost: introduce guest IOVA to backend VA helper
  vhost: use the guest IOVA to host VA helper
  vhost: enable rings at the right time
  vhost: don't dereference invalid dev pointer after its reallocation
  vhost: postpone rings addresses translation
  vhost-user: translate ring addresses when IOMMU enabled
  vhost-user: iommu: postpone device creation until ring are mapped
  vhost: iommu: Invalidate vring in case of matching IOTLB invalidate
  vhost: enable IOMMU support
 doc/guides/rel_notes/release_17_11.rst |   4 +
 lib/librte_vhost/Makefile              |   4 +-
 lib/librte_vhost/iotlb.c               | 352 +++++++++++++++++++++++++++++++++
 lib/librte_vhost/iotlb.h               |  76 +++++++
 lib/librte_vhost/vhost.c               | 115 +++++++++--
 lib/librte_vhost/vhost.h               |  61 +++++-
 lib/librte_vhost/vhost_user.c          | 312 +++++++++++++++++++++++++----
 lib/librte_vhost/vhost_user.h          |  20 +-
 lib/librte_vhost/virtio_net.c          |  96 +++++++--
 9 files changed, 962 insertions(+), 78 deletions(-)
 create mode 100644 lib/librte_vhost/iotlb.c
 create mode 100644 lib/librte_vhost/iotlb.h
-- 
2.13.6
^ permalink raw reply	[flat|nested] 48+ messages in thread
* [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup"
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-11-01 17:11   ` Kavanagh, Mark B
  2017-10-05  8:36 ` [PATCH v3 02/19] vhost: make error handling consistent in rx path Maxime Coquelin
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
This reverts commit 04d81227960b5c1cf2f11f492100979ead20c526.
As agreed when this workaround was introduced, it can be reverted
as Qemu v2.10 that fixes the issue is now out.
The reply-ack feature is required for vhost-user IOMMU support.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.h | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 35ebd7190..2ba22dbb0 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -49,14 +49,10 @@
 #define VHOST_USER_PROTOCOL_F_REPLY_ACK	3
 #define VHOST_USER_PROTOCOL_F_NET_MTU 4
 
-/*
- * disable REPLY_ACK feature to workaround the buggy QEMU implementation.
- * Proved buggy QEMU includes v2.7 - v2.9.
- */
 #define VHOST_USER_PROTOCOL_FEATURES	((1ULL << VHOST_USER_PROTOCOL_F_MQ) | \
 					 (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) |\
 					 (1ULL << VHOST_USER_PROTOCOL_F_RARP) | \
-					 (0ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \
+					 (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \
 					 (1ULL << VHOST_USER_PROTOCOL_F_NET_MTU))
 
 typedef enum VhostUserRequest {
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 02/19] vhost: make error handling consistent in rx path
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup" Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 03/19] vhost: prepare send_vhost_message() to slave requests Maxime Coquelin
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
In the non-mergeable receive case, when copy_mbuf_to_desc()
call fails the packet is skipped, the corresponding used element
len field is set to vnet header size, and it continues with next
packet/desc. It could be a problem because it does not know why it
failed, and assume the desc buffer is large enough.
In mergeable receive case, when copy_mbuf_to_desc_mergeable()
fails, packets burst is simply stopped.
This patch makes the non-mergeable error path to behave as the
mergeable one, as it seems the safest way. Also, doing this way
will simplify pending IOTLB miss requests handling.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/virtio_net.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index f8732dfec..59ff6c875 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -374,11 +374,8 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 
 		err = copy_mbuf_to_desc(dev, vq, descs, pkts[i], desc_idx, sz);
 		if (unlikely(err)) {
-			used_idx = (start_idx + i) & (vq->size - 1);
-			vq->used->ring[used_idx].len = dev->vhost_hlen;
-			vhost_log_used_vring(dev, vq,
-				offsetof(struct vring_used, ring[used_idx]),
-				sizeof(vq->used->ring[used_idx]));
+			count = i;
+			break;
 		}
 
 		if (i + 1 < count)
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 03/19] vhost: prepare send_vhost_message() to slave requests
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup" Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 02/19] vhost: make error handling consistent in rx path Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 04/19] vhost: add support to slave requests channel Maxime Coquelin
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
send_vhost_message() is currently only used to send
replies, so it modifies message flags to perpare the
reply.
With upcoming channel for backend initiated request,
this function can be used to send requests.
This patch introduces a new send_vhost_reply() that
does the message flags modifications, and makes
send_vhost_message() generic.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index b62e3828b..a068d8651 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -919,8 +919,16 @@ read_vhost_message(int sockfd, struct VhostUserMsg *msg)
 static int
 send_vhost_message(int sockfd, struct VhostUserMsg *msg)
 {
-	int ret;
+	if (!msg)
+		return 0;
+
+	return send_fd_message(sockfd, (char *)msg,
+		VHOST_USER_HDR_SIZE + msg->size, NULL, 0);
+}
 
+static int
+send_vhost_reply(int sockfd, struct VhostUserMsg *msg)
+{
 	if (!msg)
 		return 0;
 
@@ -929,10 +937,7 @@ send_vhost_message(int sockfd, struct VhostUserMsg *msg)
 	msg->flags |= VHOST_USER_VERSION;
 	msg->flags |= VHOST_USER_REPLY_MASK;
 
-	ret = send_fd_message(sockfd, (char *)msg,
-		VHOST_USER_HDR_SIZE + msg->size, NULL, 0);
-
-	return ret;
+	return send_vhost_message(sockfd, msg);
 }
 
 /*
@@ -1024,7 +1029,7 @@ vhost_user_msg_handler(int vid, int fd)
 	case VHOST_USER_GET_FEATURES:
 		msg.payload.u64 = vhost_user_get_features(dev);
 		msg.size = sizeof(msg.payload.u64);
-		send_vhost_message(fd, &msg);
+		send_vhost_reply(fd, &msg);
 		break;
 	case VHOST_USER_SET_FEATURES:
 		vhost_user_set_features(dev, msg.payload.u64);
@@ -1033,7 +1038,7 @@ vhost_user_msg_handler(int vid, int fd)
 	case VHOST_USER_GET_PROTOCOL_FEATURES:
 		msg.payload.u64 = VHOST_USER_PROTOCOL_FEATURES;
 		msg.size = sizeof(msg.payload.u64);
-		send_vhost_message(fd, &msg);
+		send_vhost_reply(fd, &msg);
 		break;
 	case VHOST_USER_SET_PROTOCOL_FEATURES:
 		vhost_user_set_protocol_features(dev, msg.payload.u64);
@@ -1055,7 +1060,7 @@ vhost_user_msg_handler(int vid, int fd)
 
 		/* it needs a reply */
 		msg.size = sizeof(msg.payload.u64);
-		send_vhost_message(fd, &msg);
+		send_vhost_reply(fd, &msg);
 		break;
 	case VHOST_USER_SET_LOG_FD:
 		close(msg.fds[0]);
@@ -1075,7 +1080,7 @@ vhost_user_msg_handler(int vid, int fd)
 	case VHOST_USER_GET_VRING_BASE:
 		vhost_user_get_vring_base(dev, &msg);
 		msg.size = sizeof(msg.payload.state);
-		send_vhost_message(fd, &msg);
+		send_vhost_reply(fd, &msg);
 		break;
 
 	case VHOST_USER_SET_VRING_KICK:
@@ -1094,7 +1099,7 @@ vhost_user_msg_handler(int vid, int fd)
 	case VHOST_USER_GET_QUEUE_NUM:
 		msg.payload.u64 = VHOST_MAX_QUEUE_PAIRS;
 		msg.size = sizeof(msg.payload.u64);
-		send_vhost_message(fd, &msg);
+		send_vhost_reply(fd, &msg);
 		break;
 
 	case VHOST_USER_SET_VRING_ENABLE:
@@ -1117,7 +1122,7 @@ vhost_user_msg_handler(int vid, int fd)
 	if (msg.flags & VHOST_USER_NEED_REPLY) {
 		msg.payload.u64 = !!ret;
 		msg.size = sizeof(msg.payload.u64);
-		send_vhost_message(fd, &msg);
+		send_vhost_reply(fd, &msg);
 	}
 
 	if (!(dev->flags & VIRTIO_DEV_RUNNING) && virtio_is_ready(dev)) {
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 04/19] vhost: add support to slave requests channel
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (2 preceding siblings ...)
  2017-10-05  8:36 ` [PATCH v3 03/19] vhost: prepare send_vhost_message() to slave requests Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 05/19] vhost: declare missing IOMMU-related definitions for old kernels Maxime Coquelin
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
Currently, only QEMU sends requests, the backend sends
replies. In some cases, the backend may need to send
requests to QEMU, like IOTLB miss events when IOMMU is
supported.
This patch introduces a new channel for such requests.
QEMU sends a file descriptor of a new socket using
VHOST_USER_SET_SLAVE_REQ_FD.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.c      |  1 +
 lib/librte_vhost/vhost.h      |  2 ++
 lib/librte_vhost/vhost_user.c | 27 +++++++++++++++++++++++++++
 lib/librte_vhost/vhost_user.h | 10 +++++++++-
 4 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 474b6e493..2d30f14c4 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -207,6 +207,7 @@ vhost_new_device(void)
 
 	vhost_devices[i] = dev;
 	dev->vid = i;
+	dev->slave_req_fd = -1;
 
 	return i;
 }
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 74df74717..8405f879b 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -209,6 +209,8 @@ struct virtio_net {
 	uint32_t		nr_guest_pages;
 	uint32_t		max_guest_pages;
 	struct guest_page       *guest_pages;
+
+	int			slave_req_fd;
 } __rte_cache_aligned;
 
 
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index a068d8651..0ba66e193 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -76,6 +76,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_SET_VRING_ENABLE]  = "VHOST_USER_SET_VRING_ENABLE",
 	[VHOST_USER_SEND_RARP]  = "VHOST_USER_SEND_RARP",
 	[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
+	[VHOST_USER_SET_SLAVE_REQ_FD]  = "VHOST_USER_SET_SLAVE_REQ_FD",
 };
 
 static uint64_t
@@ -122,6 +123,11 @@ vhost_backend_cleanup(struct virtio_net *dev)
 		munmap((void *)(uintptr_t)dev->log_addr, dev->log_size);
 		dev->log_addr = 0;
 	}
+
+	if (dev->slave_req_fd >= 0) {
+		close(dev->slave_req_fd);
+		dev->slave_req_fd = -1;
+	}
 }
 
 /*
@@ -886,6 +892,23 @@ vhost_user_net_set_mtu(struct virtio_net *dev, struct VhostUserMsg *msg)
 	return 0;
 }
 
+static int
+vhost_user_set_req_fd(struct virtio_net *dev, struct VhostUserMsg *msg)
+{
+	int fd = msg->fds[0];
+
+	if (fd < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+				"Invalid file descriptor for slave channel (%d)\n",
+				fd);
+		return -1;
+	}
+
+	dev->slave_req_fd = fd;
+
+	return 0;
+}
+
 /* return bytes# of read on success or negative val on failure. */
 static int
 read_vhost_message(int sockfd, struct VhostUserMsg *msg)
@@ -1113,6 +1136,10 @@ vhost_user_msg_handler(int vid, int fd)
 		ret = vhost_user_net_set_mtu(dev, &msg);
 		break;
 
+	case VHOST_USER_SET_SLAVE_REQ_FD:
+		ret = vhost_user_set_req_fd(dev, &msg);
+		break;
+
 	default:
 		ret = -1;
 		break;
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 2ba22dbb0..98f6e6f37 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -48,12 +48,14 @@
 #define VHOST_USER_PROTOCOL_F_RARP	2
 #define VHOST_USER_PROTOCOL_F_REPLY_ACK	3
 #define VHOST_USER_PROTOCOL_F_NET_MTU 4
+#define VHOST_USER_PROTOCOL_F_SLAVE_REQ 5
 
 #define VHOST_USER_PROTOCOL_FEATURES	((1ULL << VHOST_USER_PROTOCOL_F_MQ) | \
 					 (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) |\
 					 (1ULL << VHOST_USER_PROTOCOL_F_RARP) | \
 					 (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \
-					 (1ULL << VHOST_USER_PROTOCOL_F_NET_MTU))
+					 (1ULL << VHOST_USER_PROTOCOL_F_NET_MTU) | \
+					 (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ))
 
 typedef enum VhostUserRequest {
 	VHOST_USER_NONE = 0,
@@ -77,9 +79,15 @@ typedef enum VhostUserRequest {
 	VHOST_USER_SET_VRING_ENABLE = 18,
 	VHOST_USER_SEND_RARP = 19,
 	VHOST_USER_NET_SET_MTU = 20,
+	VHOST_USER_SET_SLAVE_REQ_FD = 21,
 	VHOST_USER_MAX
 } VhostUserRequest;
 
+typedef enum VhostUserSlaveRequest {
+	VHOST_USER_SLAVE_NONE = 0,
+	VHOST_USER_SLAVE_MAX
+} VhostUserSlaveRequest;
+
 typedef struct VhostUserMemoryRegion {
 	uint64_t guest_phys_addr;
 	uint64_t memory_size;
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 05/19] vhost: declare missing IOMMU-related definitions for old kernels
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (3 preceding siblings ...)
  2017-10-05  8:36 ` [PATCH v3 04/19] vhost: add support to slave requests channel Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 06/19] vhost: add iotlb helper functions Maxime Coquelin
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
These defines and enums have been introduced in upstream kernel v4.8,
and backported to RHEL 7.4.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.h | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 8405f879b..94bee4c8d 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -145,6 +145,37 @@ struct vhost_virtqueue {
  #define VIRTIO_NET_F_MTU 3
 #endif
 
+/* Declare IOMMU related bits for older kernels */
+#ifndef VIRTIO_F_IOMMU_PLATFORM
+
+#define VIRTIO_F_IOMMU_PLATFORM 33
+
+struct vhost_iotlb_msg {
+	__u64 iova;
+	__u64 size;
+	__u64 uaddr;
+#define VHOST_ACCESS_RO      0x1
+#define VHOST_ACCESS_WO      0x2
+#define VHOST_ACCESS_RW      0x3
+	__u8 perm;
+#define VHOST_IOTLB_MISS           1
+#define VHOST_IOTLB_UPDATE         2
+#define VHOST_IOTLB_INVALIDATE     3
+#define VHOST_IOTLB_ACCESS_FAIL    4
+	__u8 type;
+};
+
+#define VHOST_IOTLB_MSG 0x1
+
+struct vhost_msg {
+	int type;
+	union {
+		struct vhost_iotlb_msg iotlb;
+		__u8 padding[64];
+	};
+};
+#endif
+
 /*
  * Define virtio 1.0 for older kernels
  */
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 06/19] vhost: add iotlb helper functions
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (4 preceding siblings ...)
  2017-10-05  8:36 ` [PATCH v3 05/19] vhost: declare missing IOMMU-related definitions for old kernels Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 07/19] vhost: iotlb: add pending miss request list and helpers Maxime Coquelin
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/Makefile |   4 +-
 lib/librte_vhost/iotlb.c  | 259 ++++++++++++++++++++++++++++++++++++++++++++++
 lib/librte_vhost/iotlb.h  |  70 +++++++++++++
 lib/librte_vhost/vhost.c  |   1 +
 lib/librte_vhost/vhost.h  |   6 ++
 5 files changed, 338 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_vhost/iotlb.c
 create mode 100644 lib/librte_vhost/iotlb.h
diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 4a116fe31..e1084aba5 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -47,8 +47,8 @@ LDLIBS += -lnuma
 endif
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c socket.c vhost.c vhost_user.c \
-				   virtio_net.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \
+					vhost_user.c virtio_net.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
diff --git a/lib/librte_vhost/iotlb.c b/lib/librte_vhost/iotlb.c
new file mode 100644
index 000000000..fcfdd25d7
--- /dev/null
+++ b/lib/librte_vhost/iotlb.c
@@ -0,0 +1,259 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2017 Red Hat, Inc.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifdef RTE_LIBRTE_VHOST_NUMA
+#include <numaif.h>
+#endif
+
+#include <rte_tailq.h>
+
+#include "iotlb.h"
+#include "vhost.h"
+
+struct vhost_iotlb_entry {
+	TAILQ_ENTRY(vhost_iotlb_entry) next;
+
+	uint64_t iova;
+	uint64_t uaddr;
+	uint64_t size;
+	uint8_t perm;
+};
+
+#define IOTLB_CACHE_SIZE 1024
+
+static void
+vhost_user_iotlb_cache_remove_all(struct vhost_virtqueue *vq)
+{
+	struct vhost_iotlb_entry *node, *temp_node;
+
+	rte_rwlock_write_lock(&vq->iotlb_lock);
+
+	TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+		TAILQ_REMOVE(&vq->iotlb_list, node, next);
+		rte_mempool_put(vq->iotlb_pool, node);
+	}
+
+	vq->iotlb_cache_nr = 0;
+
+	rte_rwlock_write_unlock(&vq->iotlb_lock);
+}
+
+static void
+vhost_user_iotlb_cache_random_evict(struct vhost_virtqueue *vq)
+{
+	struct vhost_iotlb_entry *node, *temp_node;
+	int entry_idx;
+
+	rte_rwlock_write_lock(&vq->iotlb_lock);
+
+	entry_idx = rte_rand() % vq->iotlb_cache_nr;
+
+	TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+		if (!entry_idx) {
+			TAILQ_REMOVE(&vq->iotlb_list, node, next);
+			rte_mempool_put(vq->iotlb_pool, node);
+			vq->iotlb_cache_nr--;
+			break;
+		}
+		entry_idx--;
+	}
+
+	rte_rwlock_write_unlock(&vq->iotlb_lock);
+}
+
+void
+vhost_user_iotlb_cache_insert(struct vhost_virtqueue *vq, uint64_t iova,
+				uint64_t uaddr, uint64_t size, uint8_t perm)
+{
+	struct vhost_iotlb_entry *node, *new_node;
+	int ret;
+
+	ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
+	if (ret) {
+		RTE_LOG(DEBUG, VHOST_CONFIG, "IOTLB pool empty, evict one entry\n");
+		vhost_user_iotlb_cache_random_evict(vq);
+		ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
+		if (ret) {
+			RTE_LOG(ERR, VHOST_CONFIG, "IOTLB pool still empty, failure\n");
+			return;
+		}
+	}
+
+	new_node->iova = iova;
+	new_node->uaddr = uaddr;
+	new_node->size = size;
+	new_node->perm = perm;
+
+	rte_rwlock_write_lock(&vq->iotlb_lock);
+
+	TAILQ_FOREACH(node, &vq->iotlb_list, next) {
+		/*
+		 * Entries must be invalidated before being updated.
+		 * So if iova already in list, assume identical.
+		 */
+		if (node->iova == new_node->iova) {
+			rte_mempool_put(vq->iotlb_pool, new_node);
+			goto unlock;
+		} else if (node->iova > new_node->iova) {
+			TAILQ_INSERT_BEFORE(node, new_node, next);
+			vq->iotlb_cache_nr++;
+			goto unlock;
+		}
+	}
+
+	TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
+	vq->iotlb_cache_nr++;
+
+unlock:
+	rte_rwlock_write_unlock(&vq->iotlb_lock);
+}
+
+void
+vhost_user_iotlb_cache_remove(struct vhost_virtqueue *vq,
+					uint64_t iova, uint64_t size)
+{
+	struct vhost_iotlb_entry *node, *temp_node;
+
+	if (unlikely(!size))
+		return;
+
+	rte_rwlock_write_lock(&vq->iotlb_lock);
+
+	TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+		/* Sorted list */
+		if (unlikely(iova + size < node->iova))
+			break;
+
+		if (iova < node->iova + node->size) {
+			TAILQ_REMOVE(&vq->iotlb_list, node, next);
+			rte_mempool_put(vq->iotlb_pool, node);
+			vq->iotlb_cache_nr--;
+		}
+	}
+
+	rte_rwlock_write_unlock(&vq->iotlb_lock);
+}
+
+uint64_t
+vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
+						uint64_t *size, uint8_t perm)
+{
+	struct vhost_iotlb_entry *node;
+	uint64_t offset, vva = 0, mapped = 0;
+
+	if (unlikely(!*size))
+		goto out;
+
+	TAILQ_FOREACH(node, &vq->iotlb_list, next) {
+		/* List sorted by iova */
+		if (unlikely(iova < node->iova))
+			break;
+
+		if (iova >= node->iova + node->size)
+			continue;
+
+		if (unlikely((perm & node->perm) != perm)) {
+			vva = 0;
+			break;
+		}
+
+		offset = iova - node->iova;
+		if (!vva)
+			vva = node->uaddr + offset;
+
+		mapped += node->size - offset;
+		iova = node->iova + node->size;
+
+		if (mapped >= *size)
+			break;
+	}
+
+out:
+	/* Only part of the requested chunk is mapped */
+	if (unlikely(mapped < *size))
+		*size = mapped;
+
+	return vva;
+}
+
+int
+vhost_user_iotlb_init(struct virtio_net *dev, int vq_index)
+{
+	char pool_name[RTE_MEMPOOL_NAMESIZE];
+	struct vhost_virtqueue *vq = dev->virtqueue[vq_index];
+	int ret = -1, socket;
+
+	if (vq->iotlb_pool) {
+		/*
+		 * The cache has already been initialized,
+		 * just drop all entries
+		 */
+		vhost_user_iotlb_cache_remove_all(vq);
+		return 0;
+	}
+
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	ret = get_mempolicy(&socket, NULL, 0, vq, MPOL_F_NODE | MPOL_F_ADDR);
+#endif
+	if (ret)
+		socket = 0;
+
+	rte_rwlock_init(&vq->iotlb_lock);
+
+	TAILQ_INIT(&vq->iotlb_list);
+
+	snprintf(pool_name, sizeof(pool_name), "iotlb_cache_%d_%d",
+			dev->vid, vq_index);
+
+	/* If already created, free it and recreate */
+	vq->iotlb_pool = rte_mempool_lookup(pool_name);
+	if (vq->iotlb_pool)
+		rte_mempool_free(vq->iotlb_pool);
+
+	vq->iotlb_pool = rte_mempool_create(pool_name,
+			IOTLB_CACHE_SIZE, sizeof(struct vhost_iotlb_entry), 0,
+			0, 0, NULL, NULL, NULL, socket,
+			MEMPOOL_F_NO_CACHE_ALIGN |
+			MEMPOOL_F_SP_PUT |
+			MEMPOOL_F_SC_GET);
+	if (!vq->iotlb_pool) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+				"Failed to create IOTLB cache pool (%s)\n",
+				pool_name);
+		return -1;
+	}
+
+	vq->iotlb_cache_nr = 0;
+
+	return 0;
+}
+
diff --git a/lib/librte_vhost/iotlb.h b/lib/librte_vhost/iotlb.h
new file mode 100644
index 000000000..27b2d6b30
--- /dev/null
+++ b/lib/librte_vhost/iotlb.h
@@ -0,0 +1,70 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2017 Red Hat, Inc.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#ifndef _VHOST_IOTLB_H_
+#define _VHOST_IOTLB_H_
+
+#include "vhost.h"
+
+static __rte_always_inline void
+vhost_user_iotlb_rd_lock(struct vhost_virtqueue *vq)
+{
+	rte_rwlock_read_lock(&vq->iotlb_lock);
+}
+
+static __rte_always_inline void
+vhost_user_iotlb_rd_unlock(struct vhost_virtqueue *vq)
+{
+	rte_rwlock_read_unlock(&vq->iotlb_lock);
+}
+
+static __rte_always_inline void
+vhost_user_iotlb_wr_lock(struct vhost_virtqueue *vq)
+{
+	rte_rwlock_write_lock(&vq->iotlb_lock);
+}
+
+static __rte_always_inline void
+vhost_user_iotlb_wr_unlock(struct vhost_virtqueue *vq)
+{
+	rte_rwlock_write_unlock(&vq->iotlb_lock);
+}
+
+void vhost_user_iotlb_cache_insert(struct vhost_virtqueue *vq, uint64_t iova,
+					uint64_t uaddr, uint64_t size,
+					uint8_t perm);
+void vhost_user_iotlb_cache_remove(struct vhost_virtqueue *vq,
+					uint64_t iova, uint64_t size);
+uint64_t vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
+					uint64_t *size, uint8_t perm);
+int vhost_user_iotlb_init(struct virtio_net *dev, int vq_index);
+
+#endif /* _VHOST_IOTLB_H_ */
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 2d30f14c4..edcf1e0c5 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -103,6 +103,7 @@ free_device(struct virtio_net *dev)
 
 		rte_free(vq->shadow_used_ring);
 		rte_free(vq->batch_copy_elems);
+		rte_mempool_free(vq->iotlb_pool);
 		rte_free(vq);
 	}
 
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 94bee4c8d..09a00186f 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -45,6 +45,7 @@
 
 #include <rte_log.h>
 #include <rte_ether.h>
+#include <rte_rwlock.h>
 
 #include "rte_vhost.h"
 
@@ -127,6 +128,11 @@ struct vhost_virtqueue {
 
 	struct batch_copy_elem	*batch_copy_elems;
 	uint16_t		batch_copy_nb_elems;
+
+	rte_rwlock_t	iotlb_lock;
+	struct rte_mempool *iotlb_pool;
+	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
+	int				iotlb_cache_nr;
 } __rte_cache_aligned;
 
 /* Old kernels have no such macros defined */
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 07/19] vhost: iotlb: add pending miss request list and helpers
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (5 preceding siblings ...)
  2017-10-05  8:36 ` [PATCH v3 06/19] vhost: add iotlb helper functions Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 08/19] vhost-user: add support to IOTLB miss slave requests Maxime Coquelin
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
In order to be able to handle other ports or queues while waiting
for an IOTLB miss reply, a pending list is created so that waiter
can return and restart later on with sending again a miss request.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/iotlb.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++-
 lib/librte_vhost/iotlb.h |  6 +++
 lib/librte_vhost/vhost.h |  4 +-
 3 files changed, 104 insertions(+), 3 deletions(-)
diff --git a/lib/librte_vhost/iotlb.c b/lib/librte_vhost/iotlb.c
index fcfdd25d7..066c37a73 100644
--- a/lib/librte_vhost/iotlb.c
+++ b/lib/librte_vhost/iotlb.c
@@ -48,7 +48,94 @@ struct vhost_iotlb_entry {
 	uint8_t perm;
 };
 
-#define IOTLB_CACHE_SIZE 1024
+#define IOTLB_CACHE_SIZE 2048
+
+static void
+vhost_user_iotlb_pending_remove_all(struct vhost_virtqueue *vq)
+{
+	struct vhost_iotlb_entry *node, *temp_node;
+
+	rte_rwlock_write_lock(&vq->iotlb_pending_lock);
+
+	TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
+		TAILQ_REMOVE(&vq->iotlb_pending_list, node, next);
+		rte_mempool_put(vq->iotlb_pool, node);
+	}
+
+	rte_rwlock_write_unlock(&vq->iotlb_pending_lock);
+}
+
+bool
+vhost_user_iotlb_pending_miss(struct vhost_virtqueue *vq, uint64_t iova,
+				uint8_t perm)
+{
+	struct vhost_iotlb_entry *node;
+	bool found = false;
+
+	rte_rwlock_read_lock(&vq->iotlb_pending_lock);
+
+	TAILQ_FOREACH(node, &vq->iotlb_pending_list, next) {
+		if ((node->iova == iova) && (node->perm == perm)) {
+			found = true;
+			break;
+		}
+	}
+
+	rte_rwlock_read_unlock(&vq->iotlb_pending_lock);
+
+	return found;
+}
+
+void
+vhost_user_iotlb_pending_insert(struct vhost_virtqueue *vq,
+				uint64_t iova, uint8_t perm)
+{
+	struct vhost_iotlb_entry *node;
+	int ret;
+
+	ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
+	if (ret) {
+		RTE_LOG(INFO, VHOST_CONFIG,
+				"IOTLB pool empty, clear pending misses\n");
+		vhost_user_iotlb_pending_remove_all(vq);
+		ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
+		if (ret) {
+			RTE_LOG(ERR, VHOST_CONFIG, "IOTLB pool still empty, failure\n");
+			return;
+		}
+	}
+
+	node->iova = iova;
+	node->perm = perm;
+
+	rte_rwlock_write_lock(&vq->iotlb_pending_lock);
+
+	TAILQ_INSERT_TAIL(&vq->iotlb_pending_list, node, next);
+
+	rte_rwlock_write_unlock(&vq->iotlb_pending_lock);
+}
+
+static void
+vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq,
+				uint64_t iova, uint64_t size, uint8_t perm)
+{
+	struct vhost_iotlb_entry *node, *temp_node;
+
+	rte_rwlock_write_lock(&vq->iotlb_pending_lock);
+
+	TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
+		if (node->iova < iova)
+			continue;
+		if (node->iova >= iova + size)
+			continue;
+		if ((node->perm & perm) != node->perm)
+			continue;
+		TAILQ_REMOVE(&vq->iotlb_pending_list, node, next);
+		rte_mempool_put(vq->iotlb_pool, node);
+	}
+
+	rte_rwlock_write_unlock(&vq->iotlb_pending_lock);
+}
 
 static void
 vhost_user_iotlb_cache_remove_all(struct vhost_virtqueue *vq)
@@ -134,7 +221,10 @@ vhost_user_iotlb_cache_insert(struct vhost_virtqueue *vq, uint64_t iova,
 	vq->iotlb_cache_nr++;
 
 unlock:
+	vhost_user_iotlb_pending_remove(vq, iova, size, perm);
+
 	rte_rwlock_write_unlock(&vq->iotlb_lock);
+
 }
 
 void
@@ -215,9 +305,10 @@ vhost_user_iotlb_init(struct virtio_net *dev, int vq_index)
 	if (vq->iotlb_pool) {
 		/*
 		 * The cache has already been initialized,
-		 * just drop all entries
+		 * just drop all cached and pending entries.
 		 */
 		vhost_user_iotlb_cache_remove_all(vq);
+		vhost_user_iotlb_pending_remove_all(vq);
 		return 0;
 	}
 
@@ -228,8 +319,10 @@ vhost_user_iotlb_init(struct virtio_net *dev, int vq_index)
 		socket = 0;
 
 	rte_rwlock_init(&vq->iotlb_lock);
+	rte_rwlock_init(&vq->iotlb_pending_lock);
 
 	TAILQ_INIT(&vq->iotlb_list);
+	TAILQ_INIT(&vq->iotlb_pending_list);
 
 	snprintf(pool_name, sizeof(pool_name), "iotlb_cache_%d_%d",
 			dev->vid, vq_index);
diff --git a/lib/librte_vhost/iotlb.h b/lib/librte_vhost/iotlb.h
index 27b2d6b30..f1a050e44 100644
--- a/lib/librte_vhost/iotlb.h
+++ b/lib/librte_vhost/iotlb.h
@@ -32,6 +32,8 @@
 #ifndef _VHOST_IOTLB_H_
 #define _VHOST_IOTLB_H_
 
+#include <stdbool.h>
+
 #include "vhost.h"
 
 static __rte_always_inline void
@@ -65,6 +67,10 @@ void vhost_user_iotlb_cache_remove(struct vhost_virtqueue *vq,
 					uint64_t iova, uint64_t size);
 uint64_t vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
 					uint64_t *size, uint8_t perm);
+bool vhost_user_iotlb_pending_miss(struct vhost_virtqueue *vq, uint64_t iova,
+						uint8_t perm);
+void vhost_user_iotlb_pending_insert(struct vhost_virtqueue *vq, uint64_t iova,
+						uint8_t perm);
 int vhost_user_iotlb_init(struct virtio_net *dev, int vq_index);
 
 #endif /* _VHOST_IOTLB_H_ */
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 09a00186f..8131bef9c 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -130,9 +130,11 @@ struct vhost_virtqueue {
 	uint16_t		batch_copy_nb_elems;
 
 	rte_rwlock_t	iotlb_lock;
+	rte_rwlock_t	iotlb_pending_lock;
 	struct rte_mempool *iotlb_pool;
 	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
-	int				iotlb_cache_nr;
+	int				iotlb_cache_nr;	
+	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list;
 } __rte_cache_aligned;
 
 /* Old kernels have no such macros defined */
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 08/19] vhost-user: add support to IOTLB miss slave requests
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (6 preceding siblings ...)
  2017-10-05  8:36 ` [PATCH v3 07/19] vhost: iotlb: add pending miss request list and helpers Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 09/19] vhost: initialize vrings IOTLB caches Maxime Coquelin
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 25 +++++++++++++++++++++++++
 lib/librte_vhost/vhost_user.h |  3 +++
 2 files changed, 28 insertions(+)
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 0ba66e193..3df5c5755 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1168,3 +1168,28 @@ vhost_user_msg_handler(int vid, int fd)
 
 	return 0;
 }
+
+int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
+{
+	int ret;
+	struct VhostUserMsg msg = {
+		.request = VHOST_USER_SLAVE_IOTLB_MSG,
+		.flags = VHOST_USER_VERSION,
+		.size = sizeof(msg.payload.iotlb),
+		.payload.iotlb = {
+			.iova = iova,
+			.perm = perm,
+			.type = VHOST_IOTLB_MISS,
+		},
+	};
+
+	ret = send_vhost_message(dev->slave_req_fd, &msg);
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+				"Failed to send IOTLB miss message (%d)\n",
+				ret);
+		return ret;
+	}
+
+	return 0;
+}
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 98f6e6f37..0b2aff14e 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -85,6 +85,7 @@ typedef enum VhostUserRequest {
 
 typedef enum VhostUserSlaveRequest {
 	VHOST_USER_SLAVE_NONE = 0,
+	VHOST_USER_SLAVE_IOTLB_MSG = 1,
 	VHOST_USER_SLAVE_MAX
 } VhostUserSlaveRequest;
 
@@ -122,6 +123,7 @@ typedef struct VhostUserMsg {
 		struct vhost_vring_addr addr;
 		VhostUserMemory memory;
 		VhostUserLog    log;
+		struct vhost_iotlb_msg iotlb;
 	} payload;
 	int fds[VHOST_MEMORY_MAX_NREGIONS];
 } __attribute((packed)) VhostUserMsg;
@@ -134,6 +136,7 @@ typedef struct VhostUserMsg {
 
 /* vhost_user.c */
 int vhost_user_msg_handler(int vid, int fd);
+int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
 
 /* socket.c */
 int read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num);
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 09/19] vhost: initialize vrings IOTLB caches
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (7 preceding siblings ...)
  2017-10-05  8:36 ` [PATCH v3 08/19] vhost-user: add support to IOTLB miss slave requests Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 10/19] vhost-user: handle IOTLB update and invalidate requests Maxime Coquelin
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
The per-virtqueue IOTLB cache init is done at virtqueue
init time. init_vring_queue() now takes vring id as parameter,
so that the IOTLB cache mempool name can be generated.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.c | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index edcf1e0c5..2493a7992 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -48,6 +48,7 @@
 #include <rte_malloc.h>
 #include <rte_vhost.h>
 
+#include "iotlb.h"
 #include "vhost.h"
 
 struct virtio_net *vhost_devices[MAX_VHOST_DEVICE];
@@ -111,13 +112,25 @@ free_device(struct virtio_net *dev)
 }
 
 static void
-init_vring_queue(struct vhost_virtqueue *vq)
+init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 {
+	struct vhost_virtqueue *vq;
+
+	if (vring_idx >= VHOST_MAX_VRING) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+				"Failed not init vring, out of bound (%d)\n",
+				vring_idx);
+		return;
+	}
+
+	vq = dev->virtqueue[vring_idx];
+
 	memset(vq, 0, sizeof(struct vhost_virtqueue));
 
 	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
 	vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD;
 
+	vhost_user_iotlb_init(dev, vring_idx);
 	/* Backends are set to -1 indicating an inactive device. */
 	vq->backend = -1;
 
@@ -131,12 +144,21 @@ init_vring_queue(struct vhost_virtqueue *vq)
 }
 
 static void
-reset_vring_queue(struct vhost_virtqueue *vq)
+reset_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 {
+	struct vhost_virtqueue *vq;
 	int callfd;
 
+	if (vring_idx >= VHOST_MAX_VRING) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+				"Failed not init vring, out of bound (%d)\n",
+				vring_idx);
+		return;
+	}
+
+	vq = dev->virtqueue[vring_idx];
 	callfd = vq->callfd;
-	init_vring_queue(vq);
+	init_vring_queue(dev, vring_idx);
 	vq->callfd = callfd;
 }
 
@@ -153,7 +175,7 @@ alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 	}
 
 	dev->virtqueue[vring_idx] = vq;
-	init_vring_queue(vq);
+	init_vring_queue(dev, vring_idx);
 
 	dev->nr_vring += 1;
 
@@ -175,7 +197,7 @@ reset_device(struct virtio_net *dev)
 	dev->flags = 0;
 
 	for (i = 0; i < dev->nr_vring; i++)
-		reset_vring_queue(dev->virtqueue[i]);
+		reset_vring_queue(dev, i);
 }
 
 /*
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 10/19] vhost-user: handle IOTLB update and invalidate requests
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (8 preceding siblings ...)
  2017-10-05  8:36 ` [PATCH v3 09/19] vhost: initialize vrings IOTLB caches Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 11/19] vhost: introduce guest IOVA to backend VA helper Maxime Coquelin
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
Vhost-user device IOTLB protocol extension introduces
VHOST_USER_IOTLB message type. The associated payload is the
vhost_iotlb_msg struct defined in Kernel, which in this was can
be either an IOTLB update or invalidate message.
On IOTLB update, the virtqueues get notified of a new entry.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost_user.h |  1 +
 2 files changed, 44 insertions(+)
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 3df5c5755..0f23ea388 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -48,6 +48,7 @@
 #include <rte_malloc.h>
 #include <rte_log.h>
 
+#include "iotlb.h"
 #include "vhost.h"
 #include "vhost_user.h"
 
@@ -77,6 +78,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_SEND_RARP]  = "VHOST_USER_SEND_RARP",
 	[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
 	[VHOST_USER_SET_SLAVE_REQ_FD]  = "VHOST_USER_SET_SLAVE_REQ_FD",
+	[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
 };
 
 static uint64_t
@@ -909,6 +911,43 @@ vhost_user_set_req_fd(struct virtio_net *dev, struct VhostUserMsg *msg)
 	return 0;
 }
 
+static int
+vhost_user_iotlb_msg(struct virtio_net *dev, struct VhostUserMsg *msg)
+{
+	struct vhost_iotlb_msg *imsg = &msg->payload.iotlb;
+	uint16_t i;
+	uint64_t vva;
+
+	switch (imsg->type) {
+	case VHOST_IOTLB_UPDATE:
+		vva = qva_to_vva(dev, imsg->uaddr);
+		if (!vva)
+			return -1;
+
+		for (i = 0; i < dev->nr_vring; i++) {
+			struct vhost_virtqueue *vq = dev->virtqueue[i];
+
+			vhost_user_iotlb_cache_insert(vq, imsg->iova, vva,
+					imsg->size, imsg->perm);
+		}
+		break;
+	case VHOST_IOTLB_INVALIDATE:
+		for (i = 0; i < dev->nr_vring; i++) {
+			struct vhost_virtqueue *vq = dev->virtqueue[i];
+
+			vhost_user_iotlb_cache_remove(vq, imsg->iova,
+					imsg->size);
+		}
+		break;
+	default:
+		RTE_LOG(ERR, VHOST_CONFIG, "Invalid IOTLB message type (%d)\n",
+				imsg->type);
+		return -1;
+	}
+
+	return 0;
+}
+
 /* return bytes# of read on success or negative val on failure. */
 static int
 read_vhost_message(int sockfd, struct VhostUserMsg *msg)
@@ -1140,6 +1179,10 @@ vhost_user_msg_handler(int vid, int fd)
 		ret = vhost_user_set_req_fd(dev, &msg);
 		break;
 
+	case VHOST_USER_IOTLB_MSG:
+		ret = vhost_user_iotlb_msg(dev, &msg);
+		break;
+
 	default:
 		ret = -1;
 		break;
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 0b2aff14e..46c6ff956 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -80,6 +80,7 @@ typedef enum VhostUserRequest {
 	VHOST_USER_SEND_RARP = 19,
 	VHOST_USER_NET_SET_MTU = 20,
 	VHOST_USER_SET_SLAVE_REQ_FD = 21,
+	VHOST_USER_IOTLB_MSG = 22,
 	VHOST_USER_MAX
 } VhostUserRequest;
 
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 11/19] vhost: introduce guest IOVA to backend VA helper
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (9 preceding siblings ...)
  2017-10-05  8:36 ` [PATCH v3 10/19] vhost-user: handle IOTLB update and invalidate requests Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 12/19] vhost: use the guest IOVA to host " Maxime Coquelin
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
This patch introduces vhost_iova_to_vva() function to translate
guest's IO virtual addresses to backend's virtual addresses.
When IOMMU is enabled, the IOTLB cache is queried to get the
translation. If missing from the IOTLB cache, an IOTLB_MISS request
is sent to Qemu, and IOTLB cache is queried again on IOTLB event
notification.
When IOMMU is disabled, the passed address is a guest's physical
address, so the legacy rte_vhost_gpa_to_vva() API is used.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.c | 24 ++++++++++++++++++++++++
 lib/librte_vhost/vhost.h | 13 +++++++++++++
 2 files changed, 37 insertions(+)
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 2493a7992..6f243534e 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -47,12 +47,36 @@
 #include <rte_memory.h>
 #include <rte_malloc.h>
 #include <rte_vhost.h>
+#include <rte_rwlock.h>
 
 #include "iotlb.h"
 #include "vhost.h"
+#include "vhost_user.h"
 
 struct virtio_net *vhost_devices[MAX_VHOST_DEVICE];
 
+uint64_t __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
+			uint64_t iova, uint64_t size, uint8_t perm)
+{
+	uint64_t vva, tmp_size;
+
+	if (unlikely(!size))
+		return 0;
+
+	tmp_size = size;
+
+	vva = vhost_user_iotlb_cache_find(vq, iova, &tmp_size, perm);
+	if (tmp_size == size)
+		return vva;
+
+	if (!vhost_user_iotlb_pending_miss(vq, iova + tmp_size, perm)) {
+		vhost_user_iotlb_pending_insert(vq, iova + tmp_size, perm);
+		vhost_user_iotlb_miss(dev, iova + tmp_size, perm);
+	}
+
+	return 0;
+}
+
 struct virtio_net *
 get_device(int vid)
 {
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 8131bef9c..79351c66f 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -375,4 +375,17 @@ struct vhost_device_ops const *vhost_driver_callback_get(const char *path);
  */
 void vhost_backend_cleanup(struct virtio_net *dev);
 
+uint64_t __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
+			uint64_t iova, uint64_t size, uint8_t perm);
+
+static __rte_always_inline uint64_t
+vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
+			uint64_t iova, uint64_t size, uint8_t perm)
+{
+	if (!(dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
+		return rte_vhost_gpa_to_vva(dev->mem, iova);
+
+	return __vhost_iova_to_vva(dev, vq, iova, size, perm);
+}
+
 #endif /* _VHOST_NET_CDEV_H_ */
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 12/19] vhost: use the guest IOVA to host VA helper
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (10 preceding siblings ...)
  2017-10-05  8:36 ` [PATCH v3 11/19] vhost: introduce guest IOVA to backend VA helper Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 13/19] vhost: enable rings at the right time Maxime Coquelin
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
Replace rte_vhost_gpa_to_vva() calls with vhost_iova_to_vva(), which
requires to also pass the mapped len and the access permissions needed.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/virtio_net.c | 71 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 58 insertions(+), 13 deletions(-)
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 59ff6c875..cdfb6f957 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -45,6 +45,7 @@
 #include <rte_sctp.h>
 #include <rte_arp.h>
 
+#include "iotlb.h"
 #include "vhost.h"
 
 #define MAX_PKT_BURST 32
@@ -211,7 +212,8 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	int error = 0;
 
 	desc = &descs[desc_idx];
-	desc_addr = rte_vhost_gpa_to_vva(dev->mem, desc->addr);
+	desc_addr = vhost_iova_to_vva(dev, vq, desc->addr,
+					desc->len, VHOST_ACCESS_RW);
 	/*
 	 * Checking of 'desc_addr' placed outside of 'unlikely' macro to avoid
 	 * performance issue with some versions of gcc (4.8.4 and 5.3.0) which
@@ -255,7 +257,9 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			}
 
 			desc = &descs[desc->next];
-			desc_addr = rte_vhost_gpa_to_vva(dev->mem, desc->addr);
+			desc_addr = vhost_iova_to_vva(dev, vq, desc->addr,
+							desc->len,
+							VHOST_ACCESS_RW);
 			if (unlikely(!desc_addr)) {
 				error = -1;
 				goto out;
@@ -352,14 +356,20 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	}
 
 	rte_prefetch0(&vq->desc[desc_indexes[0]]);
+
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_lock(vq);
+
 	for (i = 0; i < count; i++) {
 		uint16_t desc_idx = desc_indexes[i];
 		int err;
 
 		if (vq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT) {
 			descs = (struct vring_desc *)(uintptr_t)
-				rte_vhost_gpa_to_vva(dev->mem,
-					vq->desc[desc_idx].addr);
+				vhost_iova_to_vva(dev,
+						vq, vq->desc[desc_idx].addr,
+						vq->desc[desc_idx].len,
+						VHOST_ACCESS_RO);
 			if (unlikely(!descs)) {
 				count = i;
 				break;
@@ -384,6 +394,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 
 	do_data_copy_enqueue(dev, vq);
 
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_unlock(vq);
+
 	rte_smp_wmb();
 
 	*(volatile uint16_t *)&vq->used->idx += count;
@@ -417,7 +430,9 @@ fill_vec_buf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	if (vq->desc[idx].flags & VRING_DESC_F_INDIRECT) {
 		descs = (struct vring_desc *)(uintptr_t)
-			rte_vhost_gpa_to_vva(dev->mem, vq->desc[idx].addr);
+			vhost_iova_to_vva(dev, vq, vq->desc[idx].addr,
+						vq->desc[idx].len,
+						VHOST_ACCESS_RO);
 		if (unlikely(!descs))
 			return -1;
 
@@ -512,7 +527,9 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		goto out;
 	}
 
-	desc_addr = rte_vhost_gpa_to_vva(dev->mem, buf_vec[vec_idx].buf_addr);
+	desc_addr = vhost_iova_to_vva(dev, vq, buf_vec[vec_idx].buf_addr,
+						buf_vec[vec_idx].buf_len,
+						VHOST_ACCESS_RW);
 	if (buf_vec[vec_idx].buf_len < dev->vhost_hlen || !desc_addr) {
 		error = -1;
 		goto out;
@@ -535,8 +552,11 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		/* done with current desc buf, get the next one */
 		if (desc_avail == 0) {
 			vec_idx++;
-			desc_addr = rte_vhost_gpa_to_vva(dev->mem,
-					buf_vec[vec_idx].buf_addr);
+			desc_addr =
+				vhost_iova_to_vva(dev, vq,
+					buf_vec[vec_idx].buf_addr,
+					buf_vec[vec_idx].buf_len,
+					VHOST_ACCESS_RW);
 			if (unlikely(!desc_addr)) {
 				error = -1;
 				goto out;
@@ -637,6 +657,10 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 
 	vq->shadow_used_idx = 0;
 	avail_head = *((volatile uint16_t *)&vq->avail->idx);
+
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_lock(vq);
+
 	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
 		uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;
 
@@ -665,6 +689,9 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 
 	do_data_copy_enqueue(dev, vq);
 
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_unlock(vq);
+
 	if (likely(vq->shadow_used_idx)) {
 		flush_shadow_used_ring(dev, vq);
 
@@ -875,7 +902,10 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		goto out;
 	}
 
-	desc_addr = rte_vhost_gpa_to_vva(dev->mem, desc->addr);
+	desc_addr = vhost_iova_to_vva(dev,
+					vq, desc->addr,
+					desc->len,
+					VHOST_ACCESS_RO);
 	if (unlikely(!desc_addr)) {
 		error = -1;
 		goto out;
@@ -899,7 +929,10 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			goto out;
 		}
 
-		desc_addr = rte_vhost_gpa_to_vva(dev->mem, desc->addr);
+		desc_addr = vhost_iova_to_vva(dev,
+							vq, desc->addr,
+							desc->len,
+							VHOST_ACCESS_RO);
 		if (unlikely(!desc_addr)) {
 			error = -1;
 			goto out;
@@ -982,7 +1015,10 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 				goto out;
 			}
 
-			desc_addr = rte_vhost_gpa_to_vva(dev->mem, desc->addr);
+			desc_addr = vhost_iova_to_vva(dev,
+							vq, desc->addr,
+							desc->len,
+							VHOST_ACCESS_RO);
 			if (unlikely(!desc_addr)) {
 				error = -1;
 				goto out;
@@ -1226,6 +1262,10 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 
 	/* Prefetch descriptor index. */
 	rte_prefetch0(&vq->desc[desc_indexes[0]]);
+
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_lock(vq);
+
 	for (i = 0; i < count; i++) {
 		struct vring_desc *desc;
 		uint16_t sz, idx;
@@ -1236,8 +1276,10 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 
 		if (vq->desc[desc_indexes[i]].flags & VRING_DESC_F_INDIRECT) {
 			desc = (struct vring_desc *)(uintptr_t)
-				rte_vhost_gpa_to_vva(dev->mem,
-					vq->desc[desc_indexes[i]].addr);
+				vhost_iova_to_vva(dev, vq,
+						vq->desc[desc_indexes[i]].addr,
+						sizeof(*desc),
+						VHOST_ACCESS_RO);
 			if (unlikely(!desc))
 				break;
 
@@ -1287,6 +1329,9 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 			TAILQ_INSERT_TAIL(&vq->zmbuf_list, zmbuf, next);
 		}
 	}
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_unlock(vq);
+
 	vq->last_avail_idx += i;
 
 	if (likely(dev->dequeue_zero_copy == 0)) {
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 13/19] vhost: enable rings at the right time
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (11 preceding siblings ...)
  2017-10-05  8:36 ` [PATCH v3 12/19] vhost: use the guest IOVA to host " Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 14/19] vhost: don't dereference invalid dev pointer after its reallocation Maxime Coquelin
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
When VHOST_USER_F_PROTOCOL_FEATURES is negotiated, the ring is not
enabled when started, but enabled through dedicated
VHOST_USER_SET_VRING_ENABLE request.
When not negotiated, the ring is started in enabled state, at
VHOST_USER_SET_VRING_KICK request time.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.c      | 6 ------
 lib/librte_vhost/vhost_user.c | 9 +++++++++
 2 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 6f243534e..0e2ad3322 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -158,12 +158,6 @@ init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 	/* Backends are set to -1 indicating an inactive device. */
 	vq->backend = -1;
 
-	/*
-	 * always set the vq to enabled; this is to keep compatibility
-	 * with the old QEMU, whereas there is no SET_VRING_ENABLE message.
-	 */
-	vq->enabled = 1;
-
 	TAILQ_INIT(&vq->zmbuf_list);
 }
 
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 0f23ea388..8aca7ef7e 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -699,6 +699,15 @@ vhost_user_set_vring_kick(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 		"vring kick idx:%d file:%d\n", file.index, file.fd);
 
 	vq = dev->virtqueue[file.index];
+
+	/*
+	 * When VHOST_USER_F_PROTOCOL_FEATURES is not negotiated,
+	 * the ring starts already enabled. Otherwise, it is enabled via
+	 * the SET_VRING_ENABLE message.
+	 */
+	if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES)))
+		vq->enabled = 1;
+
 	if (vq->kickfd >= 0)
 		close(vq->kickfd);
 	vq->kickfd = file.fd;
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 14/19] vhost: don't dereference invalid dev pointer after its reallocation
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (12 preceding siblings ...)
  2017-10-05  8:36 ` [PATCH v3 13/19] vhost: enable rings at the right time Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 15/19] vhost: postpone rings addresses translation Maxime Coquelin
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin, stable
numa_realloc() reallocates the virtio_net device structure and
updates the vhost_devices[] table with the new pointer if the rings
are allocated different NUMA node.
Problem is that vhost_user_msg_handler() still dereferences old
pointer afterward.
This patch prevents this by fetching again the dev pointer in
vhost_devices[] after messages have been handled.
Cc: stable@dpdk.org
Fixes: af295ad4698c ("vhost: realloc device and queues to same numa node as vring desc")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 6 ++++++
 1 file changed, 6 insertions(+)
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 8aca7ef7e..f495dd36e 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1198,6 +1198,12 @@ vhost_user_msg_handler(int vid, int fd)
 
 	}
 
+	/*
+	 * The virtio_net struct might have been reallocated on a different
+	 * NUMA node, so dev pointer might no more be valid.
+	 */
+	dev = get_device(vid);
+
 	if (msg.flags & VHOST_USER_NEED_REPLY) {
 		msg.payload.u64 = !!ret;
 		msg.size = sizeof(msg.payload.u64);
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 15/19] vhost: postpone rings addresses translation
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (13 preceding siblings ...)
  2017-10-05  8:36 ` [PATCH v3 14/19] vhost: don't dereference invalid dev pointer after its reallocation Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-10-13  1:47   ` Yao, Lei A
  2017-10-05  8:36 ` [PATCH v3 16/19] vhost-user: translate ring addresses when IOMMU enabled Maxime Coquelin
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
This patch postpones rings addresses translations and checks, as
addresses sent by the master shuld not be interpreted as long as
ring is not started and enabled[0].
When protocol features aren't negotiated, the ring is started in
enabled state, so the addresses translations are postponed to
vhost_user_set_vring_kick().
Otherwise, it is postponed to when ring is enabled, in
vhost_user_set_vring_enable().
[0]: http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg04355.html
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.h      |  1 +
 lib/librte_vhost/vhost_user.c | 69 ++++++++++++++++++++++++++++++++++---------
 2 files changed, 56 insertions(+), 14 deletions(-)
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 79351c66f..903da5db5 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -125,6 +125,7 @@ struct vhost_virtqueue {
 
 	struct vring_used_elem  *shadow_used_ring;
 	uint16_t                shadow_used_idx;
+	struct vhost_vring_addr ring_addrs;
 
 	struct batch_copy_elem	*batch_copy_elems;
 	uint16_t		batch_copy_nb_elems;
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index f495dd36e..319867c65 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -356,6 +356,7 @@ static int
 vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
 {
 	struct vhost_virtqueue *vq;
+	struct vhost_vring_addr *addr = &msg->payload.addr;
 
 	if (dev->mem == NULL)
 		return -1;
@@ -363,35 +364,50 @@ vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
 	/* addr->index refers to the queue index. The txq 1, rxq is 0. */
 	vq = dev->virtqueue[msg->payload.addr.index];
 
+	/*
+	 * Rings addresses should not be interpreted as long as the ring is not
+	 * started and enabled
+	 */
+	memcpy(&vq->ring_addrs, addr, sizeof(*addr));
+
+	return 0;
+}
+
+static struct virtio_net *translate_ring_addresses(struct virtio_net *dev,
+		int vq_index)
+{
+	struct vhost_virtqueue *vq = dev->virtqueue[vq_index];
+	struct vhost_vring_addr *addr = &vq->ring_addrs;
+
 	/* The addresses are converted from QEMU virtual to Vhost virtual. */
 	vq->desc = (struct vring_desc *)(uintptr_t)qva_to_vva(dev,
-			msg->payload.addr.desc_user_addr);
+			addr->desc_user_addr);
 	if (vq->desc == 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to find desc ring address.\n",
 			dev->vid);
-		return -1;
+		return NULL;
 	}
 
-	dev = numa_realloc(dev, msg->payload.addr.index);
-	vq = dev->virtqueue[msg->payload.addr.index];
+	dev = numa_realloc(dev, vq_index);
+	vq = dev->virtqueue[vq_index];
 
 	vq->avail = (struct vring_avail *)(uintptr_t)qva_to_vva(dev,
-			msg->payload.addr.avail_user_addr);
+			addr->avail_user_addr);
 	if (vq->avail == 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to find avail ring address.\n",
 			dev->vid);
-		return -1;
+		return NULL;
 	}
 
 	vq->used = (struct vring_used *)(uintptr_t)qva_to_vva(dev,
-			msg->payload.addr.used_user_addr);
+			addr->used_user_addr);
 	if (vq->used == 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to find used ring address.\n",
 			dev->vid);
-		return -1;
+		return NULL;
 	}
 
 	if (vq->last_used_idx != vq->used->idx) {
@@ -403,7 +419,7 @@ vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
 		vq->last_avail_idx = vq->used->idx;
 	}
 
-	vq->log_guest_addr = msg->payload.addr.log_guest_addr;
+	vq->log_guest_addr = addr->log_guest_addr;
 
 	LOG_DEBUG(VHOST_CONFIG, "(%d) mapped address desc: %p\n",
 			dev->vid, vq->desc);
@@ -414,7 +430,7 @@ vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
 	LOG_DEBUG(VHOST_CONFIG, "(%d) log_guest_addr: %" PRIx64 "\n",
 			dev->vid, vq->log_guest_addr);
 
-	return 0;
+	return dev;
 }
 
 /*
@@ -685,10 +701,11 @@ vhost_user_set_vring_call(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 }
 
 static void
-vhost_user_set_vring_kick(struct virtio_net *dev, struct VhostUserMsg *pmsg)
+vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *pmsg)
 {
 	struct vhost_vring_file file;
 	struct vhost_virtqueue *vq;
+	struct virtio_net *dev = *pdev;
 
 	file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
 	if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
@@ -698,6 +715,16 @@ vhost_user_set_vring_kick(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 	RTE_LOG(INFO, VHOST_CONFIG,
 		"vring kick idx:%d file:%d\n", file.index, file.fd);
 
+	/*
+	 * Interpret ring addresses only when ring is started and enabled.
+	 * This is now if protocol features aren't supported.
+	 */
+	if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
+		*pdev = dev = translate_ring_addresses(dev, file.index);
+		if (!dev)
+			return;
+	}
+
 	vq = dev->virtqueue[file.index];
 
 	/*
@@ -778,15 +805,29 @@ vhost_user_get_vring_base(struct virtio_net *dev,
  * enable the virtio queue pair.
  */
 static int
-vhost_user_set_vring_enable(struct virtio_net *dev,
+vhost_user_set_vring_enable(struct virtio_net **pdev,
 			    VhostUserMsg *msg)
 {
+	struct virtio_net *dev = *pdev;
 	int enable = (int)msg->payload.state.num;
 
 	RTE_LOG(INFO, VHOST_CONFIG,
 		"set queue enable: %d to qp idx: %d\n",
 		enable, msg->payload.state.index);
 
+	/*
+	 * Interpret ring addresses only when ring is started and enabled.
+	 * This is now if protocol features are supported.
+	 */
+	if (enable && (dev->features &
+				(1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
+		dev = translate_ring_addresses(dev, msg->payload.state.index);
+		if (!dev)
+			return -1;
+
+		*pdev = dev;
+	}
+
 	if (dev->notify_ops->vring_state_changed)
 		dev->notify_ops->vring_state_changed(dev->vid,
 				msg->payload.state.index, enable);
@@ -1155,7 +1196,7 @@ vhost_user_msg_handler(int vid, int fd)
 		break;
 
 	case VHOST_USER_SET_VRING_KICK:
-		vhost_user_set_vring_kick(dev, &msg);
+		vhost_user_set_vring_kick(&dev, &msg);
 		break;
 	case VHOST_USER_SET_VRING_CALL:
 		vhost_user_set_vring_call(dev, &msg);
@@ -1174,7 +1215,7 @@ vhost_user_msg_handler(int vid, int fd)
 		break;
 
 	case VHOST_USER_SET_VRING_ENABLE:
-		vhost_user_set_vring_enable(dev, &msg);
+		vhost_user_set_vring_enable(&dev, &msg);
 		break;
 	case VHOST_USER_SEND_RARP:
 		vhost_user_send_rarp(dev, &msg);
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 16/19] vhost-user: translate ring addresses when IOMMU enabled
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (14 preceding siblings ...)
  2017-10-05  8:36 ` [PATCH v3 15/19] vhost: postpone rings addresses translation Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 17/19] vhost-user: iommu: postpone device creation until ring are mapped Maxime Coquelin
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
When IOMMU is enabled, the ring addresses set by the
VHOST_USER_SET_VRING_ADDR requests are guest's IO virtual addresses,
whereas Qemu virtual addresses when IOMMU is disabled.
When enabled and the required translation is not in the IOTLB cache,
an IOTLB miss request is sent, but being called by the vhost-user
socket handling thread, the function does not wait for the requested
IOTLB update.
The function will be called again on the next IOTLB update message
reception if matching the vring addresses.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 44 +++++++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 10 deletions(-)
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 319867c65..90b209764 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -324,10 +324,7 @@ numa_realloc(struct virtio_net *dev, int index __rte_unused)
 }
 #endif
 
-/*
- * Converts QEMU virtual address to Vhost virtual address. This function is
- * used to convert the ring addresses to our address space.
- */
+/* Converts QEMU virtual address to Vhost virtual address. */
 static uint64_t
 qva_to_vva(struct virtio_net *dev, uint64_t qva)
 {
@@ -348,6 +345,30 @@ qva_to_vva(struct virtio_net *dev, uint64_t qva)
 	return 0;
 }
 
+
+/*
+ * Converts ring address to Vhost virtual address.
+ * If IOMMU is enabled, the ring address is a guest IO virtual address,
+ * else it is a QEMU virtual address.
+ */
+static uint64_t
+ring_addr_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		uint64_t ra, uint64_t size)
+{
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) {
+		uint64_t vva;
+
+		vva = vhost_user_iotlb_cache_find(vq, ra,
+					&size, VHOST_ACCESS_RW);
+		if (!vva)
+			vhost_user_iotlb_miss(dev, ra, VHOST_ACCESS_RW);
+
+		return vva;
+	}
+
+	return qva_to_vva(dev, ra);
+}
+
 /*
  * The virtio device sends us the desc, used and avail ring addresses.
  * This function then converts these to our address space.
@@ -380,8 +401,11 @@ static struct virtio_net *translate_ring_addresses(struct virtio_net *dev,
 	struct vhost_vring_addr *addr = &vq->ring_addrs;
 
 	/* The addresses are converted from QEMU virtual to Vhost virtual. */
-	vq->desc = (struct vring_desc *)(uintptr_t)qva_to_vva(dev,
-			addr->desc_user_addr);
+	if (vq->desc && vq->avail && vq->used)
+		return dev;
+
+	vq->desc = (struct vring_desc *)(uintptr_t)ring_addr_to_vva(dev,
+			vq, addr->desc_user_addr, sizeof(struct vring_desc));
 	if (vq->desc == 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to find desc ring address.\n",
@@ -392,8 +416,8 @@ static struct virtio_net *translate_ring_addresses(struct virtio_net *dev,
 	dev = numa_realloc(dev, vq_index);
 	vq = dev->virtqueue[vq_index];
 
-	vq->avail = (struct vring_avail *)(uintptr_t)qva_to_vva(dev,
-			addr->avail_user_addr);
+	vq->avail = (struct vring_avail *)(uintptr_t)ring_addr_to_vva(dev,
+			vq, addr->avail_user_addr, sizeof(struct vring_avail));
 	if (vq->avail == 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to find avail ring address.\n",
@@ -401,8 +425,8 @@ static struct virtio_net *translate_ring_addresses(struct virtio_net *dev,
 		return NULL;
 	}
 
-	vq->used = (struct vring_used *)(uintptr_t)qva_to_vva(dev,
-			addr->used_user_addr);
+	vq->used = (struct vring_used *)(uintptr_t)ring_addr_to_vva(dev,
+			vq, addr->used_user_addr, sizeof(struct vring_used));
 	if (vq->used == 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to find used ring address.\n",
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 17/19] vhost-user: iommu: postpone device creation until ring are mapped
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (15 preceding siblings ...)
  2017-10-05  8:36 ` [PATCH v3 16/19] vhost-user: translate ring addresses when IOMMU enabled Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-11-02  7:21   ` Yao, Lei A
  2017-10-05  8:36 ` [PATCH v3 18/19] vhost: iommu: Invalidate vring in case of matching IOTLB invalidate Maxime Coquelin
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
Translating the start addresses of the rings is not enough, we need to
be sure all the ring is made available by the guest.
It depends on the size of the rings, which is not known on SET_VRING_ADDR
reception. Furthermore, we need to be be safe against vring pages
invalidates.
This patch introduces a new access_ok flag per virtqueue, which is set
when all the rings are mapped, and cleared as soon as a page used by a
ring is invalidated. The invalidation part is implemented in a following
patch.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.c      | 37 ++++++++++++++++++++++++++
 lib/librte_vhost/vhost.h      |  2 ++
 lib/librte_vhost/vhost_user.c | 62 +++++++++++++++++++++++++++++++------------
 lib/librte_vhost/virtio_net.c | 60 +++++++++++++++++++++++++----------------
 4 files changed, 121 insertions(+), 40 deletions(-)
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 0e2ad3322..ef54835a6 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -135,6 +135,43 @@ free_device(struct virtio_net *dev)
 	rte_free(dev);
 }
 
+int
+vring_translate(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+	uint64_t size;
+
+	if (!(dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
+		goto out;
+
+	size = sizeof(struct vring_desc) * vq->size;
+	vq->desc = (struct vring_desc *)vhost_iova_to_vva(dev, vq,
+						vq->ring_addrs.desc_user_addr,
+						size, VHOST_ACCESS_RW);
+	if (!vq->desc)
+		return -1;
+
+	size = sizeof(struct vring_avail);
+	size += sizeof(uint16_t) * vq->size;
+	vq->avail = (struct vring_avail *)vhost_iova_to_vva(dev, vq,
+						vq->ring_addrs.avail_user_addr,
+						size, VHOST_ACCESS_RW);
+	if (!vq->avail)
+		return -1;
+
+	size = sizeof(struct vring_used);
+	size += sizeof(struct vring_used_elem) * vq->size;
+	vq->used = (struct vring_used *)vhost_iova_to_vva(dev, vq,
+						vq->ring_addrs.used_user_addr,
+						size, VHOST_ACCESS_RW);
+	if (!vq->used)
+		return -1;
+
+out:
+	vq->access_ok = 1;
+
+	return 0;
+}
+
 static void
 init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 {
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 903da5db5..b3fe6bb8e 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -113,6 +113,7 @@ struct vhost_virtqueue {
 	/* Currently unused as polling mode is enabled */
 	int			kickfd;
 	int			enabled;
+	int			access_ok;
 
 	/* Physical address of used ring, for logging */
 	uint64_t		log_guest_addr;
@@ -378,6 +379,7 @@ void vhost_backend_cleanup(struct virtio_net *dev);
 
 uint64_t __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			uint64_t iova, uint64_t size, uint8_t perm);
+int vring_translate(struct virtio_net *dev, struct vhost_virtqueue *vq);
 
 static __rte_always_inline uint64_t
 vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 90b209764..dd6562fd8 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -391,6 +391,12 @@ vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
 	 */
 	memcpy(&vq->ring_addrs, addr, sizeof(*addr));
 
+	vq->desc = NULL;
+	vq->avail = NULL;
+	vq->used = NULL;
+
+	vq->access_ok = 0;
+
 	return 0;
 }
 
@@ -407,10 +413,10 @@ static struct virtio_net *translate_ring_addresses(struct virtio_net *dev,
 	vq->desc = (struct vring_desc *)(uintptr_t)ring_addr_to_vva(dev,
 			vq, addr->desc_user_addr, sizeof(struct vring_desc));
 	if (vq->desc == 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
+		RTE_LOG(DEBUG, VHOST_CONFIG,
 			"(%d) failed to find desc ring address.\n",
 			dev->vid);
-		return NULL;
+		return dev;
 	}
 
 	dev = numa_realloc(dev, vq_index);
@@ -419,19 +425,19 @@ static struct virtio_net *translate_ring_addresses(struct virtio_net *dev,
 	vq->avail = (struct vring_avail *)(uintptr_t)ring_addr_to_vva(dev,
 			vq, addr->avail_user_addr, sizeof(struct vring_avail));
 	if (vq->avail == 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
+		RTE_LOG(DEBUG, VHOST_CONFIG,
 			"(%d) failed to find avail ring address.\n",
 			dev->vid);
-		return NULL;
+		return dev;
 	}
 
 	vq->used = (struct vring_used *)(uintptr_t)ring_addr_to_vva(dev,
 			vq, addr->used_user_addr, sizeof(struct vring_used));
 	if (vq->used == 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
+		RTE_LOG(DEBUG, VHOST_CONFIG,
 			"(%d) failed to find used ring address.\n",
 			dev->vid);
-		return NULL;
+		return dev;
 	}
 
 	if (vq->last_used_idx != vq->used->idx) {
@@ -677,7 +683,7 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 static int
 vq_is_ready(struct vhost_virtqueue *vq)
 {
-	return vq && vq->desc   &&
+	return vq && vq->desc && vq->avail && vq->used &&
 	       vq->kickfd != VIRTIO_UNINITIALIZED_EVENTFD &&
 	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;
 }
@@ -986,8 +992,29 @@ vhost_user_set_req_fd(struct virtio_net *dev, struct VhostUserMsg *msg)
 }
 
 static int
-vhost_user_iotlb_msg(struct virtio_net *dev, struct VhostUserMsg *msg)
+is_vring_iotlb_update(struct vhost_virtqueue *vq, struct vhost_iotlb_msg *imsg)
 {
+	struct vhost_vring_addr *ra;
+	uint64_t start, end;
+
+	start = imsg->iova;
+	end = start + imsg->size;
+
+	ra = &vq->ring_addrs;
+	if (ra->desc_user_addr >= start && ra->desc_user_addr < end)
+		return 1;
+	if (ra->avail_user_addr >= start && ra->avail_user_addr < end)
+		return 1;
+	if (ra->used_user_addr >= start && ra->used_user_addr < end)
+		return 1;
+
+	return 0;
+}
+
+static int
+vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg)
+{
+	struct virtio_net *dev = *pdev;
 	struct vhost_iotlb_msg *imsg = &msg->payload.iotlb;
 	uint16_t i;
 	uint64_t vva;
@@ -1003,6 +1030,9 @@ vhost_user_iotlb_msg(struct virtio_net *dev, struct VhostUserMsg *msg)
 
 			vhost_user_iotlb_cache_insert(vq, imsg->iova, vva,
 					imsg->size, imsg->perm);
+
+			if (is_vring_iotlb_update(vq, imsg))
+				*pdev = dev = translate_ring_addresses(dev, i);
 		}
 		break;
 	case VHOST_IOTLB_INVALIDATE:
@@ -1151,8 +1181,12 @@ vhost_user_msg_handler(int vid, int fd)
 	}
 
 	ret = 0;
-	RTE_LOG(INFO, VHOST_CONFIG, "read message %s\n",
-		vhost_message_str[msg.request]);
+	if (msg.request != VHOST_USER_IOTLB_MSG)
+		RTE_LOG(INFO, VHOST_CONFIG, "read message %s\n",
+			vhost_message_str[msg.request]);
+	else
+		RTE_LOG(DEBUG, VHOST_CONFIG, "read message %s\n",
+			vhost_message_str[msg.request]);
 
 	ret = vhost_user_check_and_alloc_queue_pair(dev, &msg);
 	if (ret < 0) {
@@ -1254,7 +1288,7 @@ vhost_user_msg_handler(int vid, int fd)
 		break;
 
 	case VHOST_USER_IOTLB_MSG:
-		ret = vhost_user_iotlb_msg(dev, &msg);
+		ret = vhost_user_iotlb_msg(&dev, &msg);
 		break;
 
 	default:
@@ -1263,12 +1297,6 @@ vhost_user_msg_handler(int vid, int fd)
 
 	}
 
-	/*
-	 * The virtio_net struct might have been reallocated on a different
-	 * NUMA node, so dev pointer might no more be valid.
-	 */
-	dev = get_device(vid);
-
 	if (msg.flags & VHOST_USER_NEED_REPLY) {
 		msg.payload.u64 = !!ret;
 		msg.size = sizeof(msg.payload.u64);
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index cdfb6f957..b75c93cf1 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -329,13 +329,23 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	if (unlikely(vq->enabled == 0))
 		return 0;
 
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_lock(vq);
+
+	if (unlikely(vq->access_ok == 0)) {
+		if (unlikely(vring_translate(dev, vq) < 0)) {
+			count = 0;
+			goto out;
+		}
+	}
+
 	avail_idx = *((volatile uint16_t *)&vq->avail->idx);
 	start_idx = vq->last_used_idx;
 	free_entries = avail_idx - start_idx;
 	count = RTE_MIN(count, free_entries);
 	count = RTE_MIN(count, (uint32_t)MAX_PKT_BURST);
 	if (count == 0)
-		return 0;
+		goto out;
 
 	LOG_DEBUG(VHOST_DATA, "(%d) start_idx %d | end_idx %d\n",
 		dev->vid, start_idx, start_idx + count);
@@ -356,10 +366,6 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	}
 
 	rte_prefetch0(&vq->desc[desc_indexes[0]]);
-
-	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
-		vhost_user_iotlb_rd_lock(vq);
-
 	for (i = 0; i < count; i++) {
 		uint16_t desc_idx = desc_indexes[i];
 		int err;
@@ -394,9 +400,6 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 
 	do_data_copy_enqueue(dev, vq);
 
-	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
-		vhost_user_iotlb_rd_unlock(vq);
-
 	rte_smp_wmb();
 
 	*(volatile uint16_t *)&vq->used->idx += count;
@@ -412,6 +415,10 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
 			&& (vq->callfd >= 0))
 		eventfd_write(vq->callfd, (eventfd_t)1);
+out:
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_unlock(vq);
+
 	return count;
 }
 
@@ -647,9 +654,16 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 	if (unlikely(vq->enabled == 0))
 		return 0;
 
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_lock(vq);
+
+	if (unlikely(vq->access_ok == 0))
+		if (unlikely(vring_translate(dev, vq) < 0))
+			goto out;
+
 	count = RTE_MIN((uint32_t)MAX_PKT_BURST, count);
 	if (count == 0)
-		return 0;
+		goto out;
 
 	vq->batch_copy_nb_elems = 0;
 
@@ -657,10 +671,6 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 
 	vq->shadow_used_idx = 0;
 	avail_head = *((volatile uint16_t *)&vq->avail->idx);
-
-	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
-		vhost_user_iotlb_rd_lock(vq);
-
 	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
 		uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;
 
@@ -689,9 +699,6 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 
 	do_data_copy_enqueue(dev, vq);
 
-	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
-		vhost_user_iotlb_rd_unlock(vq);
-
 	if (likely(vq->shadow_used_idx)) {
 		flush_shadow_used_ring(dev, vq);
 
@@ -704,6 +711,10 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 			eventfd_write(vq->callfd, (eventfd_t)1);
 	}
 
+out:
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_unlock(vq);
+
 	return pkt_idx;
 }
 
@@ -1173,6 +1184,13 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 
 	vq->batch_copy_nb_elems = 0;
 
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_lock(vq);
+
+	if (unlikely(vq->access_ok == 0))
+		if (unlikely(vring_translate(dev, vq) < 0))
+			goto out;
+
 	if (unlikely(dev->dequeue_zero_copy)) {
 		struct zcopy_mbuf *zmbuf, *next;
 		int nr_updated = 0;
@@ -1262,10 +1280,6 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 
 	/* Prefetch descriptor index. */
 	rte_prefetch0(&vq->desc[desc_indexes[0]]);
-
-	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
-		vhost_user_iotlb_rd_lock(vq);
-
 	for (i = 0; i < count; i++) {
 		struct vring_desc *desc;
 		uint16_t sz, idx;
@@ -1329,9 +1343,6 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 			TAILQ_INSERT_TAIL(&vq->zmbuf_list, zmbuf, next);
 		}
 	}
-	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
-		vhost_user_iotlb_rd_unlock(vq);
-
 	vq->last_avail_idx += i;
 
 	if (likely(dev->dequeue_zero_copy == 0)) {
@@ -1341,6 +1352,9 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 	}
 
 out:
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_unlock(vq);
+
 	if (unlikely(rarp_mbuf != NULL)) {
 		/*
 		 * Inject it to the head of "pkts" array, so that switch's mac
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 18/19] vhost: iommu: Invalidate vring in case of matching IOTLB invalidate
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (16 preceding siblings ...)
  2017-10-05  8:36 ` [PATCH v3 17/19] vhost-user: iommu: postpone device creation until ring are mapped Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-10-05  8:36 ` [PATCH v3 19/19] vhost: enable IOMMU support Maxime Coquelin
  2017-10-06  6:24 ` [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Yuanhan Liu
  19 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
As soon as a page used by a ring is invalidated, the access_ok flag
is cleared, so that processing threads try to map them again.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.c      | 14 ++++++++++++++
 lib/librte_vhost/vhost.h      |  1 +
 lib/librte_vhost/vhost_user.c | 38 +++++++++++++++++++++++++++++++++-----
 3 files changed, 48 insertions(+), 5 deletions(-)
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index ef54835a6..061f08a69 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -172,6 +172,20 @@ vring_translate(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	return 0;
 }
 
+void vring_invalidate(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_wr_lock(vq);
+
+	vq->access_ok = 0;
+	vq->desc = NULL;
+	vq->avail = NULL;
+	vq->used = NULL;
+
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_wr_unlock(vq);
+}
+
 static void
 init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 {
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index b3fe6bb8e..fb48f3012 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -380,6 +380,7 @@ void vhost_backend_cleanup(struct virtio_net *dev);
 uint64_t __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			uint64_t iova, uint64_t size, uint8_t perm);
 int vring_translate(struct virtio_net *dev, struct vhost_virtqueue *vq);
+void vring_invalidate(struct virtio_net *dev, struct vhost_virtqueue *vq);
 
 static __rte_always_inline uint64_t
 vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index dd6562fd8..f891f5741 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -391,11 +391,7 @@ vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
 	 */
 	memcpy(&vq->ring_addrs, addr, sizeof(*addr));
 
-	vq->desc = NULL;
-	vq->avail = NULL;
-	vq->used = NULL;
-
-	vq->access_ok = 0;
+	vring_invalidate(dev, vq);
 
 	return 0;
 }
@@ -1012,6 +1008,35 @@ is_vring_iotlb_update(struct vhost_virtqueue *vq, struct vhost_iotlb_msg *imsg)
 }
 
 static int
+is_vring_iotlb_invalidate(struct vhost_virtqueue *vq,
+				struct vhost_iotlb_msg *imsg)
+{
+	uint64_t istart, iend, vstart, vend;
+
+	istart = imsg->iova;
+	iend = istart + imsg->size - 1;
+
+	vstart = (uint64_t)vq->desc;
+	vend = vstart + sizeof(struct vring_desc) * vq->size - 1;
+	if (vstart <= iend && istart <= vend)
+		return 1;
+
+	vstart = (uint64_t)vq->avail;
+	vend = vstart + sizeof(struct vring_avail);
+	vend += sizeof(uint16_t) * vq->size - 1;
+	if (vstart <= iend && istart <= vend)
+		return 1;
+
+	vstart = (uint64_t)vq->used;
+	vend = vstart + sizeof(struct vring_used);
+	vend += sizeof(struct vring_used_elem) * vq->size - 1;
+	if (vstart <= iend && istart <= vend)
+		return 1;
+
+	return 0;
+}
+
+static int
 vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
@@ -1041,6 +1066,9 @@ vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg)
 
 			vhost_user_iotlb_cache_remove(vq, imsg->iova,
 					imsg->size);
+
+			if (is_vring_iotlb_invalidate(vq, imsg))
+				vring_invalidate(dev, vq);
 		}
 		break;
 	default:
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* [PATCH v3 19/19] vhost: enable IOMMU support
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (17 preceding siblings ...)
  2017-10-05  8:36 ` [PATCH v3 18/19] vhost: iommu: Invalidate vring in case of matching IOTLB invalidate Maxime Coquelin
@ 2017-10-05  8:36 ` Maxime Coquelin
  2017-10-06  6:24 ` [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Yuanhan Liu
  19 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-05  8:36 UTC (permalink / raw)
  To: dev, remy.horton, tiwei.bie, yliu
  Cc: mst, jfreiman, vkaplans, jasowang, Maxime Coquelin
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/rel_notes/release_17_11.rst | 4 ++++
 lib/librte_vhost/vhost.h               | 3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/doc/guides/rel_notes/release_17_11.rst b/doc/guides/rel_notes/release_17_11.rst
index 170f4f916..c0fc4ac7f 100644
--- a/doc/guides/rel_notes/release_17_11.rst
+++ b/doc/guides/rel_notes/release_17_11.rst
@@ -41,6 +41,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Added IOMMU support to libvhost-user**
+
+  Implemented device IOTLB in Vhost-user backend, and enabled Virtio's IOMMU
+  feature.
 
 Resolved Issues
 ---------------
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index fb48f3012..598c65b56 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -211,7 +211,8 @@ struct vhost_msg {
 				(1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
 				(1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
 				(1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \
-				(1ULL << VIRTIO_NET_F_MTU))
+				(1ULL << VIRTIO_NET_F_MTU) | \
+				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
 
 
 struct guest_page {
-- 
2.13.6
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 00/19] Vhost-user: Implement device IOTLB support
  2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (18 preceding siblings ...)
  2017-10-05  8:36 ` [PATCH v3 19/19] vhost: enable IOMMU support Maxime Coquelin
@ 2017-10-06  6:24 ` Yuanhan Liu
  19 siblings, 0 replies; 48+ messages in thread
From: Yuanhan Liu @ 2017-10-06  6:24 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: dev, remy.horton, tiwei.bie, mst, jfreiman, vkaplans, jasowang
Series applied to dpdk-next-virtio.
Thanks.
	--yliu
On Thu, Oct 05, 2017 at 10:36:08AM +0200, Maxime Coquelin wrote:
> This v3 lists the feature in the release note, and fixes the bug in
> is_vring_iotlb_update() reported by Yuanhan.
> 
> The purpose of this series is to add support for
> VIRTIO_F_IOMMU_PLATFORM feature, by implementing device IOTLB in the
> vhost-user backend. It improves the guest safety by enabling the
> possibility to isolate the Virtio device.
> 
> It makes possible to use Virtio PMD in guest with using VFIO driver
> without enable_unsafe_noiommu_mode parameter set, so that the DPDK
> application on guest can only access memory its has been allowed to,
> and preventing malicious/buggy DPDK application in guest to make
> vhost-user backend write random guest memory. Note that Virtio-net
> Kernel driver also support IOMMU.
> 
> The series depends on Qemu's "vhost-user: Specify and implement
> device IOTLB support" [0], available upstream and which will be part
> of Qemu v2.10 release.
> 
> Performance-wise, even if this RFC has still room for optimizations,
> no performance degradation is noticed with static mappings (i.e. DPDK
> on guest) with PVP benchmark:
> 	Traffic Generator: Moongen (lua-trafficgen)
> 	Acceptable Loss: 0.005%
> 	Validation run time: 1 min
> 	Guest DPDK version/commit: v17.05
> 	QEMU version/commit: master (6db174aed1fd)
> 	Virtio features: default
> 	CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
> 	NIC: 2 x X710
> 	Page size: 1G host/1G guest
> 	Results (bidirectional, total of the two flows):
> 	 - base: 18.8Mpps
> 	 - base + IOTLB series, IOMMU OFF: 18.8Mpps
> 	 - base + IOTLB series, IOMMU ON: 18.8Mpps (14.5Mpps w/o PATCH 21/21)
> 
> This is explained because IOTLB misses, which are very costly, only
> happen at startup time. Indeed, once used, the buffers are not
> invalidated, so if the IOTLB cache is large enough, there will be only
> cache hits. Also, the use of 1G huge pages improves the IOTLB cache
> searching time by reducing the number of entries.
> 
> With 2M hugepages, a performance degradation is seen with IOMMU on:
> 	Traffic Generator: Moongen (lua-trafficgen)
> 	Acceptable Loss: 0.005%
> 	Validation run time: 1 min
> 	Guest DPDK version/commit: v17.05
> 	QEMU version/commit: master (6db174aed1fd)
> 	Virtio features: default
> 	CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
> 	NIC: 2 x X710
> 	Page size: 2M host/2M guest
> 	Results (bidirectional, total of the two flows):
> 	 - base: 18.8Mpps
> 	 - base + IOTLB series, IOMMU OFF: 18.8Mpps
> 	 - base + IOTLB series, IOMMU ON: 13.5Mpps (12.4Mpps wo PATCH 21/21)
> 
> A possible improvement would be to merge contiguous IOTLB entries sharing
> the same permissions. A very rough patch implementing this idea fixes
> the performance degradation (18.8Mpps), but the required work to clean
> it would delay this series after v17.11.
> 
> With dynamic mappings (i.e. Virtio-net kernel driver), this is another
> story. The performance is so poor it makes it almost unusable. Indeed,
> since the Kernel driver unmaps the buffers as soon as they are handled,
> almost all descriptors buffers addresses translations result in an IOTLB
> miss. There is not much that can be done on DPDK side, except maybe
> batching IOTLB miss requests no to break bursts, but it would require
> a big rework. In Qemu, we may consider enabling IOMMU MAP notifications,
> so that DPDK receives the IOTLB updates without having to send IOTLB miss
> request.
> 
> Regarding the design choices:
>  - I initially intended to use userspace RCU library[1] for the cache
> implementation, but it would have added an external dependency, and the
> lib is not available in all distros. Qemu for example got rid of this
> dependency by copying some of the userspace RCU lib parts into Qemu tree,
> but this is not possible with DPDK due to licensing issues (RCU lib is
> LGPL v2). Thanks to Jason advice, I implemented the cache using rd/wr
> locks.
>  - I initially implemented a per-device IOTLB cache, but the concurrent
> acccesses on the IOTLB lock had huge impact on performance (~-40% in
> bidirectionnal, expect even worse with multiqueue). I move to a per-
> virtqueue IOTLB design, which prevents this concurrency.
>  - The slave IOTLB miss request supports reply-ack feature in spec, but
> this version doesn't block or busy-wait for the corresponding update so
> that other queues sharing the same lcore can be processed in the meantime.
> 
> For those who would like to test the series, I made it available on
> gitlab[2] (vhost_user_iotlb_v1 tag). The guest kernel command line requires
> the intel_iommu=on parameter, and the guest should be started with and
> iommu device attached to the virtio-net device. For example:
> 
> ./qemu-system-x86_64 \
>   -enable-kvm -m 4096 -smp 2 \
>   -M q35,kernel-irqchip=split \
>   -cpu host \
>   -device intel-iommu,device-iotlb=on,intremap \
>   -device ioh3420,id=root.1,chassis=1 \
>   -chardev socket,id=char0,path=/tmp/vhost-user1 \
>   -netdev type=vhost-user,id=hn2,chardev=char0 \
>   -device virtio-net-pci,netdev=hn2,id=v0,mq=off,mac=$MAC,bus=root.1,disable-modern=off,disable-legacy=on,iommu_platform=on,ats=on \
> ...
> 
> [0]: https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg00520.html
> [1]: http://liburcu.org/
> [2]: https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_user_iotlb_v2
> 
> Changes since v1:
> =================
> - Add feature support in release note (Yuanhan)
> - Fix return value in is_vring_iotlb_update() (Yuanhan)
> 
> Changes since v1:
> =================
> - Implement random cache entry eviction instead of removing all entries
>   when cache is full (Yuanhan)
> - Adds bounds check on vring_idx (Remy)
> - Initialize slave_req_fd to -1 (Tiwei)
> - Remove virtio-net device lock (Tiwei)
> - Simplify vhost_user_iotlb_cache_remove() (Tiwei)
> - Squash iotlb lock usage optimization patch
> - Inline noiommu part of vhost_iova_to_vva to remove performance
>   regressions with IOMMU=off
> - Reworked iotlb files copyrights
> 
> Changes since RFC:
> ==================
> - Fix memory leak in error patch reported by Jens
> - Rework wait for IOTLB update by stopping the burst to let other
>   queues to be processed, if any. It implies the introduction an
>   iotlb_pending_list, so that iotlb miss requests aren't sent multiple
>   times for a same address.
> - Optimize iotlb lock usage to recover to same as IOMMU off performance
> - Fix device locking issue in rte_vhost_dequeue_burst() error path
> - Change virtio_dev_rx error handling for consistency with mergeable rx,
>   and to ease returning in case of IOTLB misses.
> - Fix checkpatch warnings reported by checkpatch@dpdk.org
> 
> Maxime Coquelin (19):
>   Revert "vhost: workaround MQ fails to startup"
>   vhost: make error handling consistent in rx path
>   vhost: prepare send_vhost_message() to slave requests
>   vhost: add support to slave requests channel
>   vhost: declare missing IOMMU-related definitions for old kernels
>   vhost: add iotlb helper functions
>   vhost: iotlb: add pending miss request list and helpers
>   vhost-user: add support to IOTLB miss slave requests
>   vhost: initialize vrings IOTLB caches
>   vhost-user: handle IOTLB update and invalidate requests
>   vhost: introduce guest IOVA to backend VA helper
>   vhost: use the guest IOVA to host VA helper
>   vhost: enable rings at the right time
>   vhost: don't dereference invalid dev pointer after its reallocation
>   vhost: postpone rings addresses translation
>   vhost-user: translate ring addresses when IOMMU enabled
>   vhost-user: iommu: postpone device creation until ring are mapped
>   vhost: iommu: Invalidate vring in case of matching IOTLB invalidate
>   vhost: enable IOMMU support
> 
>  doc/guides/rel_notes/release_17_11.rst |   4 +
>  lib/librte_vhost/Makefile              |   4 +-
>  lib/librte_vhost/iotlb.c               | 352 +++++++++++++++++++++++++++++++++
>  lib/librte_vhost/iotlb.h               |  76 +++++++
>  lib/librte_vhost/vhost.c               | 115 +++++++++--
>  lib/librte_vhost/vhost.h               |  61 +++++-
>  lib/librte_vhost/vhost_user.c          | 312 +++++++++++++++++++++++++----
>  lib/librte_vhost/vhost_user.h          |  20 +-
>  lib/librte_vhost/virtio_net.c          |  96 +++++++--
>  9 files changed, 962 insertions(+), 78 deletions(-)
>  create mode 100644 lib/librte_vhost/iotlb.c
>  create mode 100644 lib/librte_vhost/iotlb.h
> 
> -- 
> 2.13.6
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 15/19] vhost: postpone rings addresses translation
  2017-10-05  8:36 ` [PATCH v3 15/19] vhost: postpone rings addresses translation Maxime Coquelin
@ 2017-10-13  1:47   ` Yao, Lei A
  2017-10-13  7:32     ` Maxime Coquelin
  0 siblings, 1 reply; 48+ messages in thread
From: Yao, Lei A @ 2017-10-13  1:47 UTC (permalink / raw)
  To: Maxime Coquelin, dev@dpdk.org, Horton, Remy, Bie, Tiwei,
	yliu@fridaylinux.org
  Cc: mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com
Hi, Maxime
After this commit, vhost/virtio loopback test will fail when 
use the CPU on socket 1. 
Error message like following during initialize:
VHOST_CONFIG: vring kick idx:0 file:20
VHOST_CONFIG: reallocate vq from 0 to 1 node
VHOST_CONFIG: reallocate dev from 0 to 1 node
VHOST_CONFIG: (0) failed to find avail ring address.
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
VHOST_CONFIG: vring kick idx:1 file:21
VHOST_CONFIG: reallocate vq from 0 to 1 node
VHOST_CONFIG: (0) failed to find avail ring address.
But if use CPU on socket 0. It still can work. Could you have a check on 
this? Thanks a lot!
Following is my cmd:
Vhost: testpmd -n 4 -c 0x3000000 --socket-mem 1024,1024 --file-prefix=vhost \
                --vdev 'net_vhost0,iface=vhost-net,queues=1,client=0' -- -i
Virtio-user: testpmd -n 4 -c 0xc00000 --socket-mem 1024,1024 --no-pci --file-prefix=virtio \
                       --vdev=net_virtio_user0,mac=00:01:02:03:04:05,path=./vhost-net \
                        -- -i --txqflags=0xf01 --disable-hw-vlan-filter
BRs
Lei 
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Maxime Coquelin
> Sent: Thursday, October 5, 2017 4:36 PM
> To: dev@dpdk.org; Horton, Remy <remy.horton@intel.com>; Bie, Tiwei
> <tiwei.bie@intel.com>; yliu@fridaylinux.org
> Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
> jasowang@redhat.com; Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses
> translation
> 
> This patch postpones rings addresses translations and checks, as
> addresses sent by the master shuld not be interpreted as long as
> ring is not started and enabled[0].
> 
> When protocol features aren't negotiated, the ring is started in
> enabled state, so the addresses translations are postponed to
> vhost_user_set_vring_kick().
> Otherwise, it is postponed to when ring is enabled, in
> vhost_user_set_vring_enable().
> 
> [0]: http://lists.nongnu.org/archive/html/qemu-devel/2017-
> 05/msg04355.html
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/librte_vhost/vhost.h      |  1 +
>  lib/librte_vhost/vhost_user.c | 69
> ++++++++++++++++++++++++++++++++++---------
>  2 files changed, 56 insertions(+), 14 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index 79351c66f..903da5db5 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -125,6 +125,7 @@ struct vhost_virtqueue {
> 
>  	struct vring_used_elem  *shadow_used_ring;
>  	uint16_t                shadow_used_idx;
> +	struct vhost_vring_addr ring_addrs;
> 
>  	struct batch_copy_elem	*batch_copy_elems;
>  	uint16_t		batch_copy_nb_elems;
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index f495dd36e..319867c65 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -356,6 +356,7 @@ static int
>  vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
>  {
>  	struct vhost_virtqueue *vq;
> +	struct vhost_vring_addr *addr = &msg->payload.addr;
> 
>  	if (dev->mem == NULL)
>  		return -1;
> @@ -363,35 +364,50 @@ vhost_user_set_vring_addr(struct virtio_net *dev,
> VhostUserMsg *msg)
>  	/* addr->index refers to the queue index. The txq 1, rxq is 0. */
>  	vq = dev->virtqueue[msg->payload.addr.index];
> 
> +	/*
> +	 * Rings addresses should not be interpreted as long as the ring is not
> +	 * started and enabled
> +	 */
> +	memcpy(&vq->ring_addrs, addr, sizeof(*addr));
> +
> +	return 0;
> +}
> +
> +static struct virtio_net *translate_ring_addresses(struct virtio_net *dev,
> +		int vq_index)
> +{
> +	struct vhost_virtqueue *vq = dev->virtqueue[vq_index];
> +	struct vhost_vring_addr *addr = &vq->ring_addrs;
> +
>  	/* The addresses are converted from QEMU virtual to Vhost virtual.
> */
>  	vq->desc = (struct vring_desc *)(uintptr_t)qva_to_vva(dev,
> -			msg->payload.addr.desc_user_addr);
> +			addr->desc_user_addr);
>  	if (vq->desc == 0) {
>  		RTE_LOG(ERR, VHOST_CONFIG,
>  			"(%d) failed to find desc ring address.\n",
>  			dev->vid);
> -		return -1;
> +		return NULL;
>  	}
> 
> -	dev = numa_realloc(dev, msg->payload.addr.index);
> -	vq = dev->virtqueue[msg->payload.addr.index];
> +	dev = numa_realloc(dev, vq_index);
> +	vq = dev->virtqueue[vq_index];
> 
>  	vq->avail = (struct vring_avail *)(uintptr_t)qva_to_vva(dev,
> -			msg->payload.addr.avail_user_addr);
> +			addr->avail_user_addr);
>  	if (vq->avail == 0) {
>  		RTE_LOG(ERR, VHOST_CONFIG,
>  			"(%d) failed to find avail ring address.\n",
>  			dev->vid);
> -		return -1;
> +		return NULL;
>  	}
> 
>  	vq->used = (struct vring_used *)(uintptr_t)qva_to_vva(dev,
> -			msg->payload.addr.used_user_addr);
> +			addr->used_user_addr);
>  	if (vq->used == 0) {
>  		RTE_LOG(ERR, VHOST_CONFIG,
>  			"(%d) failed to find used ring address.\n",
>  			dev->vid);
> -		return -1;
> +		return NULL;
>  	}
> 
>  	if (vq->last_used_idx != vq->used->idx) {
> @@ -403,7 +419,7 @@ vhost_user_set_vring_addr(struct virtio_net *dev,
> VhostUserMsg *msg)
>  		vq->last_avail_idx = vq->used->idx;
>  	}
> 
> -	vq->log_guest_addr = msg->payload.addr.log_guest_addr;
> +	vq->log_guest_addr = addr->log_guest_addr;
> 
>  	LOG_DEBUG(VHOST_CONFIG, "(%d) mapped address desc: %p\n",
>  			dev->vid, vq->desc);
> @@ -414,7 +430,7 @@ vhost_user_set_vring_addr(struct virtio_net *dev,
> VhostUserMsg *msg)
>  	LOG_DEBUG(VHOST_CONFIG, "(%d) log_guest_addr: %" PRIx64 "\n",
>  			dev->vid, vq->log_guest_addr);
> 
> -	return 0;
> +	return dev;
>  }
> 
>  /*
> @@ -685,10 +701,11 @@ vhost_user_set_vring_call(struct virtio_net *dev,
> struct VhostUserMsg *pmsg)
>  }
> 
>  static void
> -vhost_user_set_vring_kick(struct virtio_net *dev, struct VhostUserMsg
> *pmsg)
> +vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg
> *pmsg)
>  {
>  	struct vhost_vring_file file;
>  	struct vhost_virtqueue *vq;
> +	struct virtio_net *dev = *pdev;
> 
>  	file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
>  	if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
> @@ -698,6 +715,16 @@ vhost_user_set_vring_kick(struct virtio_net *dev,
> struct VhostUserMsg *pmsg)
>  	RTE_LOG(INFO, VHOST_CONFIG,
>  		"vring kick idx:%d file:%d\n", file.index, file.fd);
> 
> +	/*
> +	 * Interpret ring addresses only when ring is started and enabled.
> +	 * This is now if protocol features aren't supported.
> +	 */
> +	if (!(dev->features & (1ULL <<
> VHOST_USER_F_PROTOCOL_FEATURES))) {
> +		*pdev = dev = translate_ring_addresses(dev, file.index);
> +		if (!dev)
> +			return;
> +	}
> +
>  	vq = dev->virtqueue[file.index];
> 
>  	/*
> @@ -778,15 +805,29 @@ vhost_user_get_vring_base(struct virtio_net *dev,
>   * enable the virtio queue pair.
>   */
>  static int
> -vhost_user_set_vring_enable(struct virtio_net *dev,
> +vhost_user_set_vring_enable(struct virtio_net **pdev,
>  			    VhostUserMsg *msg)
>  {
> +	struct virtio_net *dev = *pdev;
>  	int enable = (int)msg->payload.state.num;
> 
>  	RTE_LOG(INFO, VHOST_CONFIG,
>  		"set queue enable: %d to qp idx: %d\n",
>  		enable, msg->payload.state.index);
> 
> +	/*
> +	 * Interpret ring addresses only when ring is started and enabled.
> +	 * This is now if protocol features are supported.
> +	 */
> +	if (enable && (dev->features &
> +				(1ULL <<
> VHOST_USER_F_PROTOCOL_FEATURES))) {
> +		dev = translate_ring_addresses(dev, msg-
> >payload.state.index);
> +		if (!dev)
> +			return -1;
> +
> +		*pdev = dev;
> +	}
> +
>  	if (dev->notify_ops->vring_state_changed)
>  		dev->notify_ops->vring_state_changed(dev->vid,
>  				msg->payload.state.index, enable);
> @@ -1155,7 +1196,7 @@ vhost_user_msg_handler(int vid, int fd)
>  		break;
> 
>  	case VHOST_USER_SET_VRING_KICK:
> -		vhost_user_set_vring_kick(dev, &msg);
> +		vhost_user_set_vring_kick(&dev, &msg);
>  		break;
>  	case VHOST_USER_SET_VRING_CALL:
>  		vhost_user_set_vring_call(dev, &msg);
> @@ -1174,7 +1215,7 @@ vhost_user_msg_handler(int vid, int fd)
>  		break;
> 
>  	case VHOST_USER_SET_VRING_ENABLE:
> -		vhost_user_set_vring_enable(dev, &msg);
> +		vhost_user_set_vring_enable(&dev, &msg);
>  		break;
>  	case VHOST_USER_SEND_RARP:
>  		vhost_user_send_rarp(dev, &msg);
> --
> 2.13.6
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 15/19] vhost: postpone rings addresses translation
  2017-10-13  1:47   ` Yao, Lei A
@ 2017-10-13  7:32     ` Maxime Coquelin
  2017-10-13  7:55       ` Yao, Lei A
  0 siblings, 1 reply; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-13  7:32 UTC (permalink / raw)
  To: Yao, Lei A, dev@dpdk.org, Horton, Remy, Bie, Tiwei,
	yliu@fridaylinux.org
  Cc: mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com
Hi Lei,
On 10/13/2017 03:47 AM, Yao, Lei A wrote:
> Hi, Maxime
> 
> After this commit, vhost/virtio loopback test will fail when
> use the CPU on socket 1.
> Error message like following during initialize:
> VHOST_CONFIG: vring kick idx:0 file:20
> VHOST_CONFIG: reallocate vq from 0 to 1 node
> VHOST_CONFIG: reallocate dev from 0 to 1 node
> VHOST_CONFIG: (0) failed to find avail ring address.
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
> VHOST_CONFIG: vring kick idx:1 file:21
> VHOST_CONFIG: reallocate vq from 0 to 1 node
> VHOST_CONFIG: (0) failed to find avail ring address.
> 
> But if use CPU on socket 0. It still can work. Could you have a check on
> this? Thanks a lot!
Thanks for reporting the issue. It seems addr pointer still points to
the old structure after the reallocation. This patch should fix the
issue:
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 9acac6125..2416a0061 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -417,6 +417,7 @@ translate_ring_addresses(struct virtio_net *dev, int 
vq_index)
         dev = numa_realloc(dev, vq_index);
         vq = dev->virtqueue[vq_index];
+       addr = &vq->ring_addrs;
         vq->avail = (struct vring_avail *)(uintptr_t)ring_addr_to_vva(dev,
                         vq, addr->avail_user_addr, sizeof(struct 
vring_avail));
I only have access to single-socket machines at the moment, so I cannot
reproduce the issue.
Can you have a try?
Thanks,
Maxime
> Following is my cmd:
> Vhost: testpmd -n 4 -c 0x3000000 --socket-mem 1024,1024 --file-prefix=vhost \
>                  --vdev 'net_vhost0,iface=vhost-net,queues=1,client=0' -- -i
> Virtio-user: testpmd -n 4 -c 0xc00000 --socket-mem 1024,1024 --no-pci --file-prefix=virtio \
>                         --vdev=net_virtio_user0,mac=00:01:02:03:04:05,path=./vhost-net \
>                          -- -i --txqflags=0xf01 --disable-hw-vlan-filter
> 
> BRs
> Lei
> 
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Maxime Coquelin
>> Sent: Thursday, October 5, 2017 4:36 PM
>> To: dev@dpdk.org; Horton, Remy <remy.horton@intel.com>; Bie, Tiwei
>> <tiwei.bie@intel.com>; yliu@fridaylinux.org
>> Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
>> jasowang@redhat.com; Maxime Coquelin <maxime.coquelin@redhat.com>
>> Subject: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses
>> translation
>>
>> This patch postpones rings addresses translations and checks, as
>> addresses sent by the master shuld not be interpreted as long as
>> ring is not started and enabled[0].
>>
>> When protocol features aren't negotiated, the ring is started in
>> enabled state, so the addresses translations are postponed to
>> vhost_user_set_vring_kick().
>> Otherwise, it is postponed to when ring is enabled, in
>> vhost_user_set_vring_enable().
>>
>> [0]: http://lists.nongnu.org/archive/html/qemu-devel/2017-
>> 05/msg04355.html
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>   lib/librte_vhost/vhost.h      |  1 +
>>   lib/librte_vhost/vhost_user.c | 69
>> ++++++++++++++++++++++++++++++++++---------
>>   2 files changed, 56 insertions(+), 14 deletions(-)
>>
>> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
>> index 79351c66f..903da5db5 100644
>> --- a/lib/librte_vhost/vhost.h
>> +++ b/lib/librte_vhost/vhost.h
>> @@ -125,6 +125,7 @@ struct vhost_virtqueue {
>>
>>   	struct vring_used_elem  *shadow_used_ring;
>>   	uint16_t                shadow_used_idx;
>> +	struct vhost_vring_addr ring_addrs;
>>
>>   	struct batch_copy_elem	*batch_copy_elems;
>>   	uint16_t		batch_copy_nb_elems;
>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>> index f495dd36e..319867c65 100644
>> --- a/lib/librte_vhost/vhost_user.c
>> +++ b/lib/librte_vhost/vhost_user.c
>> @@ -356,6 +356,7 @@ static int
>>   vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
>>   {
>>   	struct vhost_virtqueue *vq;
>> +	struct vhost_vring_addr *addr = &msg->payload.addr;
>>
>>   	if (dev->mem == NULL)
>>   		return -1;
>> @@ -363,35 +364,50 @@ vhost_user_set_vring_addr(struct virtio_net *dev,
>> VhostUserMsg *msg)
>>   	/* addr->index refers to the queue index. The txq 1, rxq is 0. */
>>   	vq = dev->virtqueue[msg->payload.addr.index];
>>
>> +	/*
>> +	 * Rings addresses should not be interpreted as long as the ring is not
>> +	 * started and enabled
>> +	 */
>> +	memcpy(&vq->ring_addrs, addr, sizeof(*addr));
>> +
>> +	return 0;
>> +}
>> +
>> +static struct virtio_net *translate_ring_addresses(struct virtio_net *dev,
>> +		int vq_index)
>> +{
>> +	struct vhost_virtqueue *vq = dev->virtqueue[vq_index];
>> +	struct vhost_vring_addr *addr = &vq->ring_addrs;
>> +
>>   	/* The addresses are converted from QEMU virtual to Vhost virtual.
>> */
>>   	vq->desc = (struct vring_desc *)(uintptr_t)qva_to_vva(dev,
>> -			msg->payload.addr.desc_user_addr);
>> +			addr->desc_user_addr);
>>   	if (vq->desc == 0) {
>>   		RTE_LOG(ERR, VHOST_CONFIG,
>>   			"(%d) failed to find desc ring address.\n",
>>   			dev->vid);
>> -		return -1;
>> +		return NULL;
>>   	}
>>
>> -	dev = numa_realloc(dev, msg->payload.addr.index);
>> -	vq = dev->virtqueue[msg->payload.addr.index];
>> +	dev = numa_realloc(dev, vq_index);
>> +	vq = dev->virtqueue[vq_index];
>>
>>   	vq->avail = (struct vring_avail *)(uintptr_t)qva_to_vva(dev,
>> -			msg->payload.addr.avail_user_addr);
>> +			addr->avail_user_addr);
>>   	if (vq->avail == 0) {
>>   		RTE_LOG(ERR, VHOST_CONFIG,
>>   			"(%d) failed to find avail ring address.\n",
>>   			dev->vid);
>> -		return -1;
>> +		return NULL;
>>   	}
>>
>>   	vq->used = (struct vring_used *)(uintptr_t)qva_to_vva(dev,
>> -			msg->payload.addr.used_user_addr);
>> +			addr->used_user_addr);
>>   	if (vq->used == 0) {
>>   		RTE_LOG(ERR, VHOST_CONFIG,
>>   			"(%d) failed to find used ring address.\n",
>>   			dev->vid);
>> -		return -1;
>> +		return NULL;
>>   	}
>>
>>   	if (vq->last_used_idx != vq->used->idx) {
>> @@ -403,7 +419,7 @@ vhost_user_set_vring_addr(struct virtio_net *dev,
>> VhostUserMsg *msg)
>>   		vq->last_avail_idx = vq->used->idx;
>>   	}
>>
>> -	vq->log_guest_addr = msg->payload.addr.log_guest_addr;
>> +	vq->log_guest_addr = addr->log_guest_addr;
>>
>>   	LOG_DEBUG(VHOST_CONFIG, "(%d) mapped address desc: %p\n",
>>   			dev->vid, vq->desc);
>> @@ -414,7 +430,7 @@ vhost_user_set_vring_addr(struct virtio_net *dev,
>> VhostUserMsg *msg)
>>   	LOG_DEBUG(VHOST_CONFIG, "(%d) log_guest_addr: %" PRIx64 "\n",
>>   			dev->vid, vq->log_guest_addr);
>>
>> -	return 0;
>> +	return dev;
>>   }
>>
>>   /*
>> @@ -685,10 +701,11 @@ vhost_user_set_vring_call(struct virtio_net *dev,
>> struct VhostUserMsg *pmsg)
>>   }
>>
>>   static void
>> -vhost_user_set_vring_kick(struct virtio_net *dev, struct VhostUserMsg
>> *pmsg)
>> +vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg
>> *pmsg)
>>   {
>>   	struct vhost_vring_file file;
>>   	struct vhost_virtqueue *vq;
>> +	struct virtio_net *dev = *pdev;
>>
>>   	file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
>>   	if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
>> @@ -698,6 +715,16 @@ vhost_user_set_vring_kick(struct virtio_net *dev,
>> struct VhostUserMsg *pmsg)
>>   	RTE_LOG(INFO, VHOST_CONFIG,
>>   		"vring kick idx:%d file:%d\n", file.index, file.fd);
>>
>> +	/*
>> +	 * Interpret ring addresses only when ring is started and enabled.
>> +	 * This is now if protocol features aren't supported.
>> +	 */
>> +	if (!(dev->features & (1ULL <<
>> VHOST_USER_F_PROTOCOL_FEATURES))) {
>> +		*pdev = dev = translate_ring_addresses(dev, file.index);
>> +		if (!dev)
>> +			return;
>> +	}
>> +
>>   	vq = dev->virtqueue[file.index];
>>
>>   	/*
>> @@ -778,15 +805,29 @@ vhost_user_get_vring_base(struct virtio_net *dev,
>>    * enable the virtio queue pair.
>>    */
>>   static int
>> -vhost_user_set_vring_enable(struct virtio_net *dev,
>> +vhost_user_set_vring_enable(struct virtio_net **pdev,
>>   			    VhostUserMsg *msg)
>>   {
>> +	struct virtio_net *dev = *pdev;
>>   	int enable = (int)msg->payload.state.num;
>>
>>   	RTE_LOG(INFO, VHOST_CONFIG,
>>   		"set queue enable: %d to qp idx: %d\n",
>>   		enable, msg->payload.state.index);
>>
>> +	/*
>> +	 * Interpret ring addresses only when ring is started and enabled.
>> +	 * This is now if protocol features are supported.
>> +	 */
>> +	if (enable && (dev->features &
>> +				(1ULL <<
>> VHOST_USER_F_PROTOCOL_FEATURES))) {
>> +		dev = translate_ring_addresses(dev, msg-
>>> payload.state.index);
>> +		if (!dev)
>> +			return -1;
>> +
>> +		*pdev = dev;
>> +	}
>> +
>>   	if (dev->notify_ops->vring_state_changed)
>>   		dev->notify_ops->vring_state_changed(dev->vid,
>>   				msg->payload.state.index, enable);
>> @@ -1155,7 +1196,7 @@ vhost_user_msg_handler(int vid, int fd)
>>   		break;
>>
>>   	case VHOST_USER_SET_VRING_KICK:
>> -		vhost_user_set_vring_kick(dev, &msg);
>> +		vhost_user_set_vring_kick(&dev, &msg);
>>   		break;
>>   	case VHOST_USER_SET_VRING_CALL:
>>   		vhost_user_set_vring_call(dev, &msg);
>> @@ -1174,7 +1215,7 @@ vhost_user_msg_handler(int vid, int fd)
>>   		break;
>>
>>   	case VHOST_USER_SET_VRING_ENABLE:
>> -		vhost_user_set_vring_enable(dev, &msg);
>> +		vhost_user_set_vring_enable(&dev, &msg);
>>   		break;
>>   	case VHOST_USER_SEND_RARP:
>>   		vhost_user_send_rarp(dev, &msg);
>> --
>> 2.13.6
> 
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 15/19] vhost: postpone rings addresses translation
  2017-10-13  7:32     ` Maxime Coquelin
@ 2017-10-13  7:55       ` Yao, Lei A
  2017-10-13  7:56         ` Maxime Coquelin
  2017-10-16  5:59         ` Yao, Lei A
  0 siblings, 2 replies; 48+ messages in thread
From: Yao, Lei A @ 2017-10-13  7:55 UTC (permalink / raw)
  To: Maxime Coquelin, dev@dpdk.org, Horton, Remy, Bie, Tiwei,
	yliu@fridaylinux.org
  Cc: mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com
Hi, Maxime
> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Friday, October 13, 2017 3:32 PM
> To: Yao, Lei A <lei.a.yao@intel.com>; dev@dpdk.org; Horton, Remy
> <remy.horton@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> yliu@fridaylinux.org
> Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
> jasowang@redhat.com
> Subject: Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses
> translation
> 
> Hi Lei,
> 
> On 10/13/2017 03:47 AM, Yao, Lei A wrote:
> > Hi, Maxime
> >
> > After this commit, vhost/virtio loopback test will fail when
> > use the CPU on socket 1.
> > Error message like following during initialize:
> > VHOST_CONFIG: vring kick idx:0 file:20
> > VHOST_CONFIG: reallocate vq from 0 to 1 node
> > VHOST_CONFIG: reallocate dev from 0 to 1 node
> > VHOST_CONFIG: (0) failed to find avail ring address.
> > VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
> > VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
> > VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
> > VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
> > VHOST_CONFIG: vring kick idx:1 file:21
> > VHOST_CONFIG: reallocate vq from 0 to 1 node
> > VHOST_CONFIG: (0) failed to find avail ring address.
> >
> > But if use CPU on socket 0. It still can work. Could you have a check on
> > this? Thanks a lot!
> 
> Thanks for reporting the issue. It seems addr pointer still points to
> the old structure after the reallocation. This patch should fix the
> issue:
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index 9acac6125..2416a0061 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -417,6 +417,7 @@ translate_ring_addresses(struct virtio_net *dev, int
> vq_index)
> 
>          dev = numa_realloc(dev, vq_index);
>          vq = dev->virtqueue[vq_index];
> +       addr = &vq->ring_addrs;
> 
>          vq->avail = (struct vring_avail *)(uintptr_t)ring_addr_to_vva(dev,
>                          vq, addr->avail_user_addr, sizeof(struct
> vring_avail));
> 
> I only have access to single-socket machines at the moment, so I cannot
> reproduce the issue.
> Can you have a try?
With your new patch, it can work on socket 1 now. Thanks.
> 
> Thanks,
> Maxime
> 
> > Following is my cmd:
> > Vhost: testpmd -n 4 -c 0x3000000 --socket-mem 1024,1024 --file-
> prefix=vhost \
> >                  --vdev 'net_vhost0,iface=vhost-net,queues=1,client=0' -- -i
> > Virtio-user: testpmd -n 4 -c 0xc00000 --socket-mem 1024,1024 --no-pci --
> file-prefix=virtio \
> >                         --vdev=net_virtio_user0,mac=00:01:02:03:04:05,path=./vhost-
> net \
> >                          -- -i --txqflags=0xf01 --disable-hw-vlan-filter
> >
> > BRs
> > Lei
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Maxime
> Coquelin
> >> Sent: Thursday, October 5, 2017 4:36 PM
> >> To: dev@dpdk.org; Horton, Remy <remy.horton@intel.com>; Bie, Tiwei
> >> <tiwei.bie@intel.com>; yliu@fridaylinux.org
> >> Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
> >> jasowang@redhat.com; Maxime Coquelin
> <maxime.coquelin@redhat.com>
> >> Subject: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses
> >> translation
> >>
> >> This patch postpones rings addresses translations and checks, as
> >> addresses sent by the master shuld not be interpreted as long as
> >> ring is not started and enabled[0].
> >>
> >> When protocol features aren't negotiated, the ring is started in
> >> enabled state, so the addresses translations are postponed to
> >> vhost_user_set_vring_kick().
> >> Otherwise, it is postponed to when ring is enabled, in
> >> vhost_user_set_vring_enable().
> >>
> >> [0]: http://lists.nongnu.org/archive/html/qemu-devel/2017-
> >> 05/msg04355.html
> >>
> >> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> ---
> >>   lib/librte_vhost/vhost.h      |  1 +
> >>   lib/librte_vhost/vhost_user.c | 69
> >> ++++++++++++++++++++++++++++++++++---------
> >>   2 files changed, 56 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> >> index 79351c66f..903da5db5 100644
> >> --- a/lib/librte_vhost/vhost.h
> >> +++ b/lib/librte_vhost/vhost.h
> >> @@ -125,6 +125,7 @@ struct vhost_virtqueue {
> >>
> >>   	struct vring_used_elem  *shadow_used_ring;
> >>   	uint16_t                shadow_used_idx;
> >> +	struct vhost_vring_addr ring_addrs;
> >>
> >>   	struct batch_copy_elem	*batch_copy_elems;
> >>   	uint16_t		batch_copy_nb_elems;
> >> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> >> index f495dd36e..319867c65 100644
> >> --- a/lib/librte_vhost/vhost_user.c
> >> +++ b/lib/librte_vhost/vhost_user.c
> >> @@ -356,6 +356,7 @@ static int
> >>   vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
> >>   {
> >>   	struct vhost_virtqueue *vq;
> >> +	struct vhost_vring_addr *addr = &msg->payload.addr;
> >>
> >>   	if (dev->mem == NULL)
> >>   		return -1;
> >> @@ -363,35 +364,50 @@ vhost_user_set_vring_addr(struct virtio_net
> *dev,
> >> VhostUserMsg *msg)
> >>   	/* addr->index refers to the queue index. The txq 1, rxq is 0. */
> >>   	vq = dev->virtqueue[msg->payload.addr.index];
> >>
> >> +	/*
> >> +	 * Rings addresses should not be interpreted as long as the ring is not
> >> +	 * started and enabled
> >> +	 */
> >> +	memcpy(&vq->ring_addrs, addr, sizeof(*addr));
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +static struct virtio_net *translate_ring_addresses(struct virtio_net *dev,
> >> +		int vq_index)
> >> +{
> >> +	struct vhost_virtqueue *vq = dev->virtqueue[vq_index];
> >> +	struct vhost_vring_addr *addr = &vq->ring_addrs;
> >> +
> >>   	/* The addresses are converted from QEMU virtual to Vhost virtual.
> >> */
> >>   	vq->desc = (struct vring_desc *)(uintptr_t)qva_to_vva(dev,
> >> -			msg->payload.addr.desc_user_addr);
> >> +			addr->desc_user_addr);
> >>   	if (vq->desc == 0) {
> >>   		RTE_LOG(ERR, VHOST_CONFIG,
> >>   			"(%d) failed to find desc ring address.\n",
> >>   			dev->vid);
> >> -		return -1;
> >> +		return NULL;
> >>   	}
> >>
> >> -	dev = numa_realloc(dev, msg->payload.addr.index);
> >> -	vq = dev->virtqueue[msg->payload.addr.index];
> >> +	dev = numa_realloc(dev, vq_index);
> >> +	vq = dev->virtqueue[vq_index];
> >>
> >>   	vq->avail = (struct vring_avail *)(uintptr_t)qva_to_vva(dev,
> >> -			msg->payload.addr.avail_user_addr);
> >> +			addr->avail_user_addr);
> >>   	if (vq->avail == 0) {
> >>   		RTE_LOG(ERR, VHOST_CONFIG,
> >>   			"(%d) failed to find avail ring address.\n",
> >>   			dev->vid);
> >> -		return -1;
> >> +		return NULL;
> >>   	}
> >>
> >>   	vq->used = (struct vring_used *)(uintptr_t)qva_to_vva(dev,
> >> -			msg->payload.addr.used_user_addr);
> >> +			addr->used_user_addr);
> >>   	if (vq->used == 0) {
> >>   		RTE_LOG(ERR, VHOST_CONFIG,
> >>   			"(%d) failed to find used ring address.\n",
> >>   			dev->vid);
> >> -		return -1;
> >> +		return NULL;
> >>   	}
> >>
> >>   	if (vq->last_used_idx != vq->used->idx) {
> >> @@ -403,7 +419,7 @@ vhost_user_set_vring_addr(struct virtio_net *dev,
> >> VhostUserMsg *msg)
> >>   		vq->last_avail_idx = vq->used->idx;
> >>   	}
> >>
> >> -	vq->log_guest_addr = msg->payload.addr.log_guest_addr;
> >> +	vq->log_guest_addr = addr->log_guest_addr;
> >>
> >>   	LOG_DEBUG(VHOST_CONFIG, "(%d) mapped address desc: %p\n",
> >>   			dev->vid, vq->desc);
> >> @@ -414,7 +430,7 @@ vhost_user_set_vring_addr(struct virtio_net *dev,
> >> VhostUserMsg *msg)
> >>   	LOG_DEBUG(VHOST_CONFIG, "(%d) log_guest_addr: %" PRIx64 "\n",
> >>   			dev->vid, vq->log_guest_addr);
> >>
> >> -	return 0;
> >> +	return dev;
> >>   }
> >>
> >>   /*
> >> @@ -685,10 +701,11 @@ vhost_user_set_vring_call(struct virtio_net
> *dev,
> >> struct VhostUserMsg *pmsg)
> >>   }
> >>
> >>   static void
> >> -vhost_user_set_vring_kick(struct virtio_net *dev, struct VhostUserMsg
> >> *pmsg)
> >> +vhost_user_set_vring_kick(struct virtio_net **pdev, struct
> VhostUserMsg
> >> *pmsg)
> >>   {
> >>   	struct vhost_vring_file file;
> >>   	struct vhost_virtqueue *vq;
> >> +	struct virtio_net *dev = *pdev;
> >>
> >>   	file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
> >>   	if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
> >> @@ -698,6 +715,16 @@ vhost_user_set_vring_kick(struct virtio_net *dev,
> >> struct VhostUserMsg *pmsg)
> >>   	RTE_LOG(INFO, VHOST_CONFIG,
> >>   		"vring kick idx:%d file:%d\n", file.index, file.fd);
> >>
> >> +	/*
> >> +	 * Interpret ring addresses only when ring is started and enabled.
> >> +	 * This is now if protocol features aren't supported.
> >> +	 */
> >> +	if (!(dev->features & (1ULL <<
> >> VHOST_USER_F_PROTOCOL_FEATURES))) {
> >> +		*pdev = dev = translate_ring_addresses(dev, file.index);
> >> +		if (!dev)
> >> +			return;
> >> +	}
> >> +
> >>   	vq = dev->virtqueue[file.index];
> >>
> >>   	/*
> >> @@ -778,15 +805,29 @@ vhost_user_get_vring_base(struct virtio_net
> *dev,
> >>    * enable the virtio queue pair.
> >>    */
> >>   static int
> >> -vhost_user_set_vring_enable(struct virtio_net *dev,
> >> +vhost_user_set_vring_enable(struct virtio_net **pdev,
> >>   			    VhostUserMsg *msg)
> >>   {
> >> +	struct virtio_net *dev = *pdev;
> >>   	int enable = (int)msg->payload.state.num;
> >>
> >>   	RTE_LOG(INFO, VHOST_CONFIG,
> >>   		"set queue enable: %d to qp idx: %d\n",
> >>   		enable, msg->payload.state.index);
> >>
> >> +	/*
> >> +	 * Interpret ring addresses only when ring is started and enabled.
> >> +	 * This is now if protocol features are supported.
> >> +	 */
> >> +	if (enable && (dev->features &
> >> +				(1ULL <<
> >> VHOST_USER_F_PROTOCOL_FEATURES))) {
> >> +		dev = translate_ring_addresses(dev, msg-
> >>> payload.state.index);
> >> +		if (!dev)
> >> +			return -1;
> >> +
> >> +		*pdev = dev;
> >> +	}
> >> +
> >>   	if (dev->notify_ops->vring_state_changed)
> >>   		dev->notify_ops->vring_state_changed(dev->vid,
> >>   				msg->payload.state.index, enable);
> >> @@ -1155,7 +1196,7 @@ vhost_user_msg_handler(int vid, int fd)
> >>   		break;
> >>
> >>   	case VHOST_USER_SET_VRING_KICK:
> >> -		vhost_user_set_vring_kick(dev, &msg);
> >> +		vhost_user_set_vring_kick(&dev, &msg);
> >>   		break;
> >>   	case VHOST_USER_SET_VRING_CALL:
> >>   		vhost_user_set_vring_call(dev, &msg);
> >> @@ -1174,7 +1215,7 @@ vhost_user_msg_handler(int vid, int fd)
> >>   		break;
> >>
> >>   	case VHOST_USER_SET_VRING_ENABLE:
> >> -		vhost_user_set_vring_enable(dev, &msg);
> >> +		vhost_user_set_vring_enable(&dev, &msg);
> >>   		break;
> >>   	case VHOST_USER_SEND_RARP:
> >>   		vhost_user_send_rarp(dev, &msg);
> >> --
> >> 2.13.6
> >
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 15/19] vhost: postpone rings addresses translation
  2017-10-13  7:55       ` Yao, Lei A
@ 2017-10-13  7:56         ` Maxime Coquelin
  2017-10-16  5:59         ` Yao, Lei A
  1 sibling, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-13  7:56 UTC (permalink / raw)
  To: Yao, Lei A, dev@dpdk.org, Horton, Remy, Bie, Tiwei,
	yliu@fridaylinux.org
  Cc: mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com
On 10/13/2017 09:55 AM, Yao, Lei A wrote:
> Hi, Maxime
> 
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Friday, October 13, 2017 3:32 PM
>> To: Yao, Lei A <lei.a.yao@intel.com>; dev@dpdk.org; Horton, Remy
>> <remy.horton@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
>> yliu@fridaylinux.org
>> Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
>> jasowang@redhat.com
>> Subject: Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses
>> translation
>>
>> Hi Lei,
>>
>> On 10/13/2017 03:47 AM, Yao, Lei A wrote:
>>> Hi, Maxime
>>>
>>> After this commit, vhost/virtio loopback test will fail when
>>> use the CPU on socket 1.
>>> Error message like following during initialize:
>>> VHOST_CONFIG: vring kick idx:0 file:20
>>> VHOST_CONFIG: reallocate vq from 0 to 1 node
>>> VHOST_CONFIG: reallocate dev from 0 to 1 node
>>> VHOST_CONFIG: (0) failed to find avail ring address.
>>> VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
>>> VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
>>> VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
>>> VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
>>> VHOST_CONFIG: vring kick idx:1 file:21
>>> VHOST_CONFIG: reallocate vq from 0 to 1 node
>>> VHOST_CONFIG: (0) failed to find avail ring address.
>>>
>>> But if use CPU on socket 0. It still can work. Could you have a check on
>>> this? Thanks a lot!
>>
>> Thanks for reporting the issue. It seems addr pointer still points to
>> the old structure after the reallocation. This patch should fix the
>> issue:
>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>> index 9acac6125..2416a0061 100644
>> --- a/lib/librte_vhost/vhost_user.c
>> +++ b/lib/librte_vhost/vhost_user.c
>> @@ -417,6 +417,7 @@ translate_ring_addresses(struct virtio_net *dev, int
>> vq_index)
>>
>>           dev = numa_realloc(dev, vq_index);
>>           vq = dev->virtqueue[vq_index];
>> +       addr = &vq->ring_addrs;
>>
>>           vq->avail = (struct vring_avail *)(uintptr_t)ring_addr_to_vva(dev,
>>                           vq, addr->avail_user_addr, sizeof(struct
>> vring_avail));
>>
>> I only have access to single-socket machines at the moment, so I cannot
>> reproduce the issue.
>> Can you have a try?
> With your new patch, it can work on socket 1 now. Thanks.
Great, thanks for testing. I'll post the patch today.
Thanks,
Maxime
>>
>> Thanks,
>> Maxime
>>
>>> Following is my cmd:
>>> Vhost: testpmd -n 4 -c 0x3000000 --socket-mem 1024,1024 --file-
>> prefix=vhost \
>>>                   --vdev 'net_vhost0,iface=vhost-net,queues=1,client=0' -- -i
>>> Virtio-user: testpmd -n 4 -c 0xc00000 --socket-mem 1024,1024 --no-pci --
>> file-prefix=virtio \
>>>                          --vdev=net_virtio_user0,mac=00:01:02:03:04:05,path=./vhost-
>> net \
>>>                           -- -i --txqflags=0xf01 --disable-hw-vlan-filter
>>>
>>> BRs
>>> Lei
>>>
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Maxime
>> Coquelin
>>>> Sent: Thursday, October 5, 2017 4:36 PM
>>>> To: dev@dpdk.org; Horton, Remy <remy.horton@intel.com>; Bie, Tiwei
>>>> <tiwei.bie@intel.com>; yliu@fridaylinux.org
>>>> Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
>>>> jasowang@redhat.com; Maxime Coquelin
>> <maxime.coquelin@redhat.com>
>>>> Subject: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses
>>>> translation
>>>>
>>>> This patch postpones rings addresses translations and checks, as
>>>> addresses sent by the master shuld not be interpreted as long as
>>>> ring is not started and enabled[0].
>>>>
>>>> When protocol features aren't negotiated, the ring is started in
>>>> enabled state, so the addresses translations are postponed to
>>>> vhost_user_set_vring_kick().
>>>> Otherwise, it is postponed to when ring is enabled, in
>>>> vhost_user_set_vring_enable().
>>>>
>>>> [0]: http://lists.nongnu.org/archive/html/qemu-devel/2017-
>>>> 05/msg04355.html
>>>>
>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> ---
>>>>    lib/librte_vhost/vhost.h      |  1 +
>>>>    lib/librte_vhost/vhost_user.c | 69
>>>> ++++++++++++++++++++++++++++++++++---------
>>>>    2 files changed, 56 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
>>>> index 79351c66f..903da5db5 100644
>>>> --- a/lib/librte_vhost/vhost.h
>>>> +++ b/lib/librte_vhost/vhost.h
>>>> @@ -125,6 +125,7 @@ struct vhost_virtqueue {
>>>>
>>>>    	struct vring_used_elem  *shadow_used_ring;
>>>>    	uint16_t                shadow_used_idx;
>>>> +	struct vhost_vring_addr ring_addrs;
>>>>
>>>>    	struct batch_copy_elem	*batch_copy_elems;
>>>>    	uint16_t		batch_copy_nb_elems;
>>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>>>> index f495dd36e..319867c65 100644
>>>> --- a/lib/librte_vhost/vhost_user.c
>>>> +++ b/lib/librte_vhost/vhost_user.c
>>>> @@ -356,6 +356,7 @@ static int
>>>>    vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
>>>>    {
>>>>    	struct vhost_virtqueue *vq;
>>>> +	struct vhost_vring_addr *addr = &msg->payload.addr;
>>>>
>>>>    	if (dev->mem == NULL)
>>>>    		return -1;
>>>> @@ -363,35 +364,50 @@ vhost_user_set_vring_addr(struct virtio_net
>> *dev,
>>>> VhostUserMsg *msg)
>>>>    	/* addr->index refers to the queue index. The txq 1, rxq is 0. */
>>>>    	vq = dev->virtqueue[msg->payload.addr.index];
>>>>
>>>> +	/*
>>>> +	 * Rings addresses should not be interpreted as long as the ring is not
>>>> +	 * started and enabled
>>>> +	 */
>>>> +	memcpy(&vq->ring_addrs, addr, sizeof(*addr));
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static struct virtio_net *translate_ring_addresses(struct virtio_net *dev,
>>>> +		int vq_index)
>>>> +{
>>>> +	struct vhost_virtqueue *vq = dev->virtqueue[vq_index];
>>>> +	struct vhost_vring_addr *addr = &vq->ring_addrs;
>>>> +
>>>>    	/* The addresses are converted from QEMU virtual to Vhost virtual.
>>>> */
>>>>    	vq->desc = (struct vring_desc *)(uintptr_t)qva_to_vva(dev,
>>>> -			msg->payload.addr.desc_user_addr);
>>>> +			addr->desc_user_addr);
>>>>    	if (vq->desc == 0) {
>>>>    		RTE_LOG(ERR, VHOST_CONFIG,
>>>>    			"(%d) failed to find desc ring address.\n",
>>>>    			dev->vid);
>>>> -		return -1;
>>>> +		return NULL;
>>>>    	}
>>>>
>>>> -	dev = numa_realloc(dev, msg->payload.addr.index);
>>>> -	vq = dev->virtqueue[msg->payload.addr.index];
>>>> +	dev = numa_realloc(dev, vq_index);
>>>> +	vq = dev->virtqueue[vq_index];
>>>>
>>>>    	vq->avail = (struct vring_avail *)(uintptr_t)qva_to_vva(dev,
>>>> -			msg->payload.addr.avail_user_addr);
>>>> +			addr->avail_user_addr);
>>>>    	if (vq->avail == 0) {
>>>>    		RTE_LOG(ERR, VHOST_CONFIG,
>>>>    			"(%d) failed to find avail ring address.\n",
>>>>    			dev->vid);
>>>> -		return -1;
>>>> +		return NULL;
>>>>    	}
>>>>
>>>>    	vq->used = (struct vring_used *)(uintptr_t)qva_to_vva(dev,
>>>> -			msg->payload.addr.used_user_addr);
>>>> +			addr->used_user_addr);
>>>>    	if (vq->used == 0) {
>>>>    		RTE_LOG(ERR, VHOST_CONFIG,
>>>>    			"(%d) failed to find used ring address.\n",
>>>>    			dev->vid);
>>>> -		return -1;
>>>> +		return NULL;
>>>>    	}
>>>>
>>>>    	if (vq->last_used_idx != vq->used->idx) {
>>>> @@ -403,7 +419,7 @@ vhost_user_set_vring_addr(struct virtio_net *dev,
>>>> VhostUserMsg *msg)
>>>>    		vq->last_avail_idx = vq->used->idx;
>>>>    	}
>>>>
>>>> -	vq->log_guest_addr = msg->payload.addr.log_guest_addr;
>>>> +	vq->log_guest_addr = addr->log_guest_addr;
>>>>
>>>>    	LOG_DEBUG(VHOST_CONFIG, "(%d) mapped address desc: %p\n",
>>>>    			dev->vid, vq->desc);
>>>> @@ -414,7 +430,7 @@ vhost_user_set_vring_addr(struct virtio_net *dev,
>>>> VhostUserMsg *msg)
>>>>    	LOG_DEBUG(VHOST_CONFIG, "(%d) log_guest_addr: %" PRIx64 "\n",
>>>>    			dev->vid, vq->log_guest_addr);
>>>>
>>>> -	return 0;
>>>> +	return dev;
>>>>    }
>>>>
>>>>    /*
>>>> @@ -685,10 +701,11 @@ vhost_user_set_vring_call(struct virtio_net
>> *dev,
>>>> struct VhostUserMsg *pmsg)
>>>>    }
>>>>
>>>>    static void
>>>> -vhost_user_set_vring_kick(struct virtio_net *dev, struct VhostUserMsg
>>>> *pmsg)
>>>> +vhost_user_set_vring_kick(struct virtio_net **pdev, struct
>> VhostUserMsg
>>>> *pmsg)
>>>>    {
>>>>    	struct vhost_vring_file file;
>>>>    	struct vhost_virtqueue *vq;
>>>> +	struct virtio_net *dev = *pdev;
>>>>
>>>>    	file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
>>>>    	if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
>>>> @@ -698,6 +715,16 @@ vhost_user_set_vring_kick(struct virtio_net *dev,
>>>> struct VhostUserMsg *pmsg)
>>>>    	RTE_LOG(INFO, VHOST_CONFIG,
>>>>    		"vring kick idx:%d file:%d\n", file.index, file.fd);
>>>>
>>>> +	/*
>>>> +	 * Interpret ring addresses only when ring is started and enabled.
>>>> +	 * This is now if protocol features aren't supported.
>>>> +	 */
>>>> +	if (!(dev->features & (1ULL <<
>>>> VHOST_USER_F_PROTOCOL_FEATURES))) {
>>>> +		*pdev = dev = translate_ring_addresses(dev, file.index);
>>>> +		if (!dev)
>>>> +			return;
>>>> +	}
>>>> +
>>>>    	vq = dev->virtqueue[file.index];
>>>>
>>>>    	/*
>>>> @@ -778,15 +805,29 @@ vhost_user_get_vring_base(struct virtio_net
>> *dev,
>>>>     * enable the virtio queue pair.
>>>>     */
>>>>    static int
>>>> -vhost_user_set_vring_enable(struct virtio_net *dev,
>>>> +vhost_user_set_vring_enable(struct virtio_net **pdev,
>>>>    			    VhostUserMsg *msg)
>>>>    {
>>>> +	struct virtio_net *dev = *pdev;
>>>>    	int enable = (int)msg->payload.state.num;
>>>>
>>>>    	RTE_LOG(INFO, VHOST_CONFIG,
>>>>    		"set queue enable: %d to qp idx: %d\n",
>>>>    		enable, msg->payload.state.index);
>>>>
>>>> +	/*
>>>> +	 * Interpret ring addresses only when ring is started and enabled.
>>>> +	 * This is now if protocol features are supported.
>>>> +	 */
>>>> +	if (enable && (dev->features &
>>>> +				(1ULL <<
>>>> VHOST_USER_F_PROTOCOL_FEATURES))) {
>>>> +		dev = translate_ring_addresses(dev, msg-
>>>>> payload.state.index);
>>>> +		if (!dev)
>>>> +			return -1;
>>>> +
>>>> +		*pdev = dev;
>>>> +	}
>>>> +
>>>>    	if (dev->notify_ops->vring_state_changed)
>>>>    		dev->notify_ops->vring_state_changed(dev->vid,
>>>>    				msg->payload.state.index, enable);
>>>> @@ -1155,7 +1196,7 @@ vhost_user_msg_handler(int vid, int fd)
>>>>    		break;
>>>>
>>>>    	case VHOST_USER_SET_VRING_KICK:
>>>> -		vhost_user_set_vring_kick(dev, &msg);
>>>> +		vhost_user_set_vring_kick(&dev, &msg);
>>>>    		break;
>>>>    	case VHOST_USER_SET_VRING_CALL:
>>>>    		vhost_user_set_vring_call(dev, &msg);
>>>> @@ -1174,7 +1215,7 @@ vhost_user_msg_handler(int vid, int fd)
>>>>    		break;
>>>>
>>>>    	case VHOST_USER_SET_VRING_ENABLE:
>>>> -		vhost_user_set_vring_enable(dev, &msg);
>>>> +		vhost_user_set_vring_enable(&dev, &msg);
>>>>    		break;
>>>>    	case VHOST_USER_SEND_RARP:
>>>>    		vhost_user_send_rarp(dev, &msg);
>>>> --
>>>> 2.13.6
>>>
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 15/19] vhost: postpone rings addresses translation
  2017-10-13  7:55       ` Yao, Lei A
  2017-10-13  7:56         ` Maxime Coquelin
@ 2017-10-16  5:59         ` Yao, Lei A
  2017-10-16  6:23           ` Yao, Lei A
  1 sibling, 1 reply; 48+ messages in thread
From: Yao, Lei A @ 2017-10-16  5:59 UTC (permalink / raw)
  To: 'Maxime Coquelin', 'dev@dpdk.org', Horton, Remy,
	Bie, Tiwei, 'yliu@fridaylinux.org'
  Cc: 'mst@redhat.com', 'jfreiman@redhat.com',
	'vkaplans@redhat.com', 'jasowang@redhat.com'
Hi, Maxime
> -----Original Message-----
> From: Yao, Lei A
> Sent: Friday, October 13, 2017 3:55 PM
> To: Maxime Coquelin <maxime.coquelin@redhat.com>; dev@dpdk.org;
> Horton, Remy <remy.horton@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> yliu@fridaylinux.org
> Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
> jasowang@redhat.com
> Subject: RE: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses
> translation
> 
> Hi, Maxime
> 
> > -----Original Message-----
> > From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> > Sent: Friday, October 13, 2017 3:32 PM
> > To: Yao, Lei A <lei.a.yao@intel.com>; dev@dpdk.org; Horton, Remy
> > <remy.horton@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> > yliu@fridaylinux.org
> > Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
> > jasowang@redhat.com
> > Subject: Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses
> > translation
> >
> > Hi Lei,
> >
> > On 10/13/2017 03:47 AM, Yao, Lei A wrote:
> > > Hi, Maxime
> > >
> > > After this commit, vhost/virtio loopback test will fail when
> > > use the CPU on socket 1.
> > > Error message like following during initialize:
> > > VHOST_CONFIG: vring kick idx:0 file:20
> > > VHOST_CONFIG: reallocate vq from 0 to 1 node
> > > VHOST_CONFIG: reallocate dev from 0 to 1 node
> > > VHOST_CONFIG: (0) failed to find avail ring address.
> > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
> > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
> > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
> > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
> > > VHOST_CONFIG: vring kick idx:1 file:21
> > > VHOST_CONFIG: reallocate vq from 0 to 1 node
> > > VHOST_CONFIG: (0) failed to find avail ring address.
> > >
> > > But if use CPU on socket 0. It still can work. Could you have a check on
> > > this? Thanks a lot!
> >
> > Thanks for reporting the issue. It seems addr pointer still points to
> > the old structure after the reallocation. This patch should fix the
> > issue:
> > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> > index 9acac6125..2416a0061 100644
> > --- a/lib/librte_vhost/vhost_user.c
> > +++ b/lib/librte_vhost/vhost_user.c
> > @@ -417,6 +417,7 @@ translate_ring_addresses(struct virtio_net *dev, int
> > vq_index)
> >
> >          dev = numa_realloc(dev, vq_index);
> >          vq = dev->virtqueue[vq_index];
> > +       addr = &vq->ring_addrs;
> >
> >          vq->avail = (struct vring_avail *)(uintptr_t)ring_addr_to_vva(dev,
> >                          vq, addr->avail_user_addr, sizeof(struct
> > vring_avail));
> >
> > I only have access to single-socket machines at the moment, so I cannot
> > reproduce the issue.
> > Can you have a try?
> With your new patch, it can work on socket 1 now. Thanks.
> 
I find another issue with this patch when I use V17.11-rc1.
It will break the connection between vhost and virtio-net in VM.
The link status of vhost-user port is always down.
But virtio-pmd driver is still ok. 
My qemu version: 2.5
Guest OS: Ubuntu 16.04
Guest kernel: 4.4.0
Could you have a check on this issue? Thanks a lot!
BRs
Lei
> >
> > Thanks,
> > Maxime
> >
> > > Following is my cmd:
> > > Vhost: testpmd -n 4 -c 0x3000000 --socket-mem 1024,1024 --file-
> > prefix=vhost \
> > >                  --vdev 'net_vhost0,iface=vhost-net,queues=1,client=0' -- -i
> > > Virtio-user: testpmd -n 4 -c 0xc00000 --socket-mem 1024,1024 --no-pci --
> > file-prefix=virtio \
> > >                         --
> vdev=net_virtio_user0,mac=00:01:02:03:04:05,path=./vhost-
> > net \
> > >                          -- -i --txqflags=0xf01 --disable-hw-vlan-filter
> > >
> > > BRs
> > > Lei
> > >
> > >> -----Original Message-----
> > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Maxime
> > Coquelin
> > >> Sent: Thursday, October 5, 2017 4:36 PM
> > >> To: dev@dpdk.org; Horton, Remy <remy.horton@intel.com>; Bie, Tiwei
> > >> <tiwei.bie@intel.com>; yliu@fridaylinux.org
> > >> Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
> > >> jasowang@redhat.com; Maxime Coquelin
> > <maxime.coquelin@redhat.com>
> > >> Subject: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses
> > >> translation
> > >>
> > >> This patch postpones rings addresses translations and checks, as
> > >> addresses sent by the master shuld not be interpreted as long as
> > >> ring is not started and enabled[0].
> > >>
> > >> When protocol features aren't negotiated, the ring is started in
> > >> enabled state, so the addresses translations are postponed to
> > >> vhost_user_set_vring_kick().
> > >> Otherwise, it is postponed to when ring is enabled, in
> > >> vhost_user_set_vring_enable().
> > >>
> > >> [0]: http://lists.nongnu.org/archive/html/qemu-devel/2017-
> > >> 05/msg04355.html
> > >>
> > >> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> > >> ---
> > >>   lib/librte_vhost/vhost.h      |  1 +
> > >>   lib/librte_vhost/vhost_user.c | 69
> > >> ++++++++++++++++++++++++++++++++++---------
> > >>   2 files changed, 56 insertions(+), 14 deletions(-)
> > >>
> > >> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > >> index 79351c66f..903da5db5 100644
> > >> --- a/lib/librte_vhost/vhost.h
> > >> +++ b/lib/librte_vhost/vhost.h
> > >> @@ -125,6 +125,7 @@ struct vhost_virtqueue {
> > >>
> > >>   	struct vring_used_elem  *shadow_used_ring;
> > >>   	uint16_t                shadow_used_idx;
> > >> +	struct vhost_vring_addr ring_addrs;
> > >>
> > >>   	struct batch_copy_elem	*batch_copy_elems;
> > >>   	uint16_t		batch_copy_nb_elems;
> > >> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> > >> index f495dd36e..319867c65 100644
> > >> --- a/lib/librte_vhost/vhost_user.c
> > >> +++ b/lib/librte_vhost/vhost_user.c
> > >> @@ -356,6 +356,7 @@ static int
> > >>   vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg
> *msg)
> > >>   {
> > >>   	struct vhost_virtqueue *vq;
> > >> +	struct vhost_vring_addr *addr = &msg->payload.addr;
> > >>
> > >>   	if (dev->mem == NULL)
> > >>   		return -1;
> > >> @@ -363,35 +364,50 @@ vhost_user_set_vring_addr(struct virtio_net
> > *dev,
> > >> VhostUserMsg *msg)
> > >>   	/* addr->index refers to the queue index. The txq 1, rxq is 0. */
> > >>   	vq = dev->virtqueue[msg->payload.addr.index];
> > >>
> > >> +	/*
> > >> +	 * Rings addresses should not be interpreted as long as the ring is not
> > >> +	 * started and enabled
> > >> +	 */
> > >> +	memcpy(&vq->ring_addrs, addr, sizeof(*addr));
> > >> +
> > >> +	return 0;
> > >> +}
> > >> +
> > >> +static struct virtio_net *translate_ring_addresses(struct virtio_net
> *dev,
> > >> +		int vq_index)
> > >> +{
> > >> +	struct vhost_virtqueue *vq = dev->virtqueue[vq_index];
> > >> +	struct vhost_vring_addr *addr = &vq->ring_addrs;
> > >> +
> > >>   	/* The addresses are converted from QEMU virtual to Vhost virtual.
> > >> */
> > >>   	vq->desc = (struct vring_desc *)(uintptr_t)qva_to_vva(dev,
> > >> -			msg->payload.addr.desc_user_addr);
> > >> +			addr->desc_user_addr);
> > >>   	if (vq->desc == 0) {
> > >>   		RTE_LOG(ERR, VHOST_CONFIG,
> > >>   			"(%d) failed to find desc ring address.\n",
> > >>   			dev->vid);
> > >> -		return -1;
> > >> +		return NULL;
> > >>   	}
> > >>
> > >> -	dev = numa_realloc(dev, msg->payload.addr.index);
> > >> -	vq = dev->virtqueue[msg->payload.addr.index];
> > >> +	dev = numa_realloc(dev, vq_index);
> > >> +	vq = dev->virtqueue[vq_index];
> > >>
> > >>   	vq->avail = (struct vring_avail *)(uintptr_t)qva_to_vva(dev,
> > >> -			msg->payload.addr.avail_user_addr);
> > >> +			addr->avail_user_addr);
> > >>   	if (vq->avail == 0) {
> > >>   		RTE_LOG(ERR, VHOST_CONFIG,
> > >>   			"(%d) failed to find avail ring address.\n",
> > >>   			dev->vid);
> > >> -		return -1;
> > >> +		return NULL;
> > >>   	}
> > >>
> > >>   	vq->used = (struct vring_used *)(uintptr_t)qva_to_vva(dev,
> > >> -			msg->payload.addr.used_user_addr);
> > >> +			addr->used_user_addr);
> > >>   	if (vq->used == 0) {
> > >>   		RTE_LOG(ERR, VHOST_CONFIG,
> > >>   			"(%d) failed to find used ring address.\n",
> > >>   			dev->vid);
> > >> -		return -1;
> > >> +		return NULL;
> > >>   	}
> > >>
> > >>   	if (vq->last_used_idx != vq->used->idx) {
> > >> @@ -403,7 +419,7 @@ vhost_user_set_vring_addr(struct virtio_net
> *dev,
> > >> VhostUserMsg *msg)
> > >>   		vq->last_avail_idx = vq->used->idx;
> > >>   	}
> > >>
> > >> -	vq->log_guest_addr = msg->payload.addr.log_guest_addr;
> > >> +	vq->log_guest_addr = addr->log_guest_addr;
> > >>
> > >>   	LOG_DEBUG(VHOST_CONFIG, "(%d) mapped address desc: %p\n",
> > >>   			dev->vid, vq->desc);
> > >> @@ -414,7 +430,7 @@ vhost_user_set_vring_addr(struct virtio_net
> *dev,
> > >> VhostUserMsg *msg)
> > >>   	LOG_DEBUG(VHOST_CONFIG, "(%d) log_guest_addr: %" PRIx64 "\n",
> > >>   			dev->vid, vq->log_guest_addr);
> > >>
> > >> -	return 0;
> > >> +	return dev;
> > >>   }
> > >>
> > >>   /*
> > >> @@ -685,10 +701,11 @@ vhost_user_set_vring_call(struct virtio_net
> > *dev,
> > >> struct VhostUserMsg *pmsg)
> > >>   }
> > >>
> > >>   static void
> > >> -vhost_user_set_vring_kick(struct virtio_net *dev, struct
> VhostUserMsg
> > >> *pmsg)
> > >> +vhost_user_set_vring_kick(struct virtio_net **pdev, struct
> > VhostUserMsg
> > >> *pmsg)
> > >>   {
> > >>   	struct vhost_vring_file file;
> > >>   	struct vhost_virtqueue *vq;
> > >> +	struct virtio_net *dev = *pdev;
> > >>
> > >>   	file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
> > >>   	if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
> > >> @@ -698,6 +715,16 @@ vhost_user_set_vring_kick(struct virtio_net
> *dev,
> > >> struct VhostUserMsg *pmsg)
> > >>   	RTE_LOG(INFO, VHOST_CONFIG,
> > >>   		"vring kick idx:%d file:%d\n", file.index, file.fd);
> > >>
> > >> +	/*
> > >> +	 * Interpret ring addresses only when ring is started and enabled.
> > >> +	 * This is now if protocol features aren't supported.
> > >> +	 */
> > >> +	if (!(dev->features & (1ULL <<
> > >> VHOST_USER_F_PROTOCOL_FEATURES))) {
> > >> +		*pdev = dev = translate_ring_addresses(dev, file.index);
> > >> +		if (!dev)
> > >> +			return;
> > >> +	}
> > >> +
> > >>   	vq = dev->virtqueue[file.index];
> > >>
> > >>   	/*
> > >> @@ -778,15 +805,29 @@ vhost_user_get_vring_base(struct virtio_net
> > *dev,
> > >>    * enable the virtio queue pair.
> > >>    */
> > >>   static int
> > >> -vhost_user_set_vring_enable(struct virtio_net *dev,
> > >> +vhost_user_set_vring_enable(struct virtio_net **pdev,
> > >>   			    VhostUserMsg *msg)
> > >>   {
> > >> +	struct virtio_net *dev = *pdev;
> > >>   	int enable = (int)msg->payload.state.num;
> > >>
> > >>   	RTE_LOG(INFO, VHOST_CONFIG,
> > >>   		"set queue enable: %d to qp idx: %d\n",
> > >>   		enable, msg->payload.state.index);
> > >>
> > >> +	/*
> > >> +	 * Interpret ring addresses only when ring is started and enabled.
> > >> +	 * This is now if protocol features are supported.
> > >> +	 */
> > >> +	if (enable && (dev->features &
> > >> +				(1ULL <<
> > >> VHOST_USER_F_PROTOCOL_FEATURES))) {
> > >> +		dev = translate_ring_addresses(dev, msg-
> > >>> payload.state.index);
> > >> +		if (!dev)
> > >> +			return -1;
> > >> +
> > >> +		*pdev = dev;
> > >> +	}
> > >> +
> > >>   	if (dev->notify_ops->vring_state_changed)
> > >>   		dev->notify_ops->vring_state_changed(dev->vid,
> > >>   				msg->payload.state.index, enable);
> > >> @@ -1155,7 +1196,7 @@ vhost_user_msg_handler(int vid, int fd)
> > >>   		break;
> > >>
> > >>   	case VHOST_USER_SET_VRING_KICK:
> > >> -		vhost_user_set_vring_kick(dev, &msg);
> > >> +		vhost_user_set_vring_kick(&dev, &msg);
> > >>   		break;
> > >>   	case VHOST_USER_SET_VRING_CALL:
> > >>   		vhost_user_set_vring_call(dev, &msg);
> > >> @@ -1174,7 +1215,7 @@ vhost_user_msg_handler(int vid, int fd)
> > >>   		break;
> > >>
> > >>   	case VHOST_USER_SET_VRING_ENABLE:
> > >> -		vhost_user_set_vring_enable(dev, &msg);
> > >> +		vhost_user_set_vring_enable(&dev, &msg);
> > >>   		break;
> > >>   	case VHOST_USER_SEND_RARP:
> > >>   		vhost_user_send_rarp(dev, &msg);
> > >> --
> > >> 2.13.6
> > >
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 15/19] vhost: postpone rings addresses translation
  2017-10-16  5:59         ` Yao, Lei A
@ 2017-10-16  6:23           ` Yao, Lei A
  2017-10-16  9:47             ` Maxime Coquelin
  0 siblings, 1 reply; 48+ messages in thread
From: Yao, Lei A @ 2017-10-16  6:23 UTC (permalink / raw)
  To: Yao, Lei A, 'Maxime Coquelin', 'dev@dpdk.org',
	Horton, Remy, Bie, Tiwei, 'yliu@fridaylinux.org'
  Cc: 'mst@redhat.com', 'jfreiman@redhat.com',
	'vkaplans@redhat.com', 'jasowang@redhat.com'
Hi, Maxime
Add one comment:
This issue with virtio-net only occur when I use CPU on socket 1.
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yao, Lei A
> Sent: Monday, October 16, 2017 2:00 PM
> To: 'Maxime Coquelin' <maxime.coquelin@redhat.com>; 'dev@dpdk.org'
> <dev@dpdk.org>; Horton, Remy <remy.horton@intel.com>; Bie, Tiwei
> <tiwei.bie@intel.com>; 'yliu@fridaylinux.org' <yliu@fridaylinux.org>
> Cc: 'mst@redhat.com' <mst@redhat.com>; 'jfreiman@redhat.com'
> <jfreiman@redhat.com>; 'vkaplans@redhat.com' <vkaplans@redhat.com>;
> 'jasowang@redhat.com' <jasowang@redhat.com>
> Subject: Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses
> translation
> 
> Hi, Maxime
> 
> > -----Original Message-----
> > From: Yao, Lei A
> > Sent: Friday, October 13, 2017 3:55 PM
> > To: Maxime Coquelin <maxime.coquelin@redhat.com>; dev@dpdk.org;
> > Horton, Remy <remy.horton@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> > yliu@fridaylinux.org
> > Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
> > jasowang@redhat.com
> > Subject: RE: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses
> > translation
> >
> > Hi, Maxime
> >
> > > -----Original Message-----
> > > From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> > > Sent: Friday, October 13, 2017 3:32 PM
> > > To: Yao, Lei A <lei.a.yao@intel.com>; dev@dpdk.org; Horton, Remy
> > > <remy.horton@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> > > yliu@fridaylinux.org
> > > Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
> > > jasowang@redhat.com
> > > Subject: Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings
> addresses
> > > translation
> > >
> > > Hi Lei,
> > >
> > > On 10/13/2017 03:47 AM, Yao, Lei A wrote:
> > > > Hi, Maxime
> > > >
> > > > After this commit, vhost/virtio loopback test will fail when
> > > > use the CPU on socket 1.
> > > > Error message like following during initialize:
> > > > VHOST_CONFIG: vring kick idx:0 file:20
> > > > VHOST_CONFIG: reallocate vq from 0 to 1 node
> > > > VHOST_CONFIG: reallocate dev from 0 to 1 node
> > > > VHOST_CONFIG: (0) failed to find avail ring address.
> > > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
> > > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
> > > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
> > > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
> > > > VHOST_CONFIG: vring kick idx:1 file:21
> > > > VHOST_CONFIG: reallocate vq from 0 to 1 node
> > > > VHOST_CONFIG: (0) failed to find avail ring address.
> > > >
> > > > But if use CPU on socket 0. It still can work. Could you have a check on
> > > > this? Thanks a lot!
> > >
> > > Thanks for reporting the issue. It seems addr pointer still points to
> > > the old structure after the reallocation. This patch should fix the
> > > issue:
> > > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> > > index 9acac6125..2416a0061 100644
> > > --- a/lib/librte_vhost/vhost_user.c
> > > +++ b/lib/librte_vhost/vhost_user.c
> > > @@ -417,6 +417,7 @@ translate_ring_addresses(struct virtio_net *dev,
> int
> > > vq_index)
> > >
> > >          dev = numa_realloc(dev, vq_index);
> > >          vq = dev->virtqueue[vq_index];
> > > +       addr = &vq->ring_addrs;
> > >
> > >          vq->avail = (struct vring_avail *)(uintptr_t)ring_addr_to_vva(dev,
> > >                          vq, addr->avail_user_addr, sizeof(struct
> > > vring_avail));
> > >
> > > I only have access to single-socket machines at the moment, so I cannot
> > > reproduce the issue.
> > > Can you have a try?
> > With your new patch, it can work on socket 1 now. Thanks.
> >
> I find another issue with this patch when I use V17.11-rc1.
> It will break the connection between vhost and virtio-net in VM.
> The link status of vhost-user port is always down.
> But virtio-pmd driver is still ok.
> 
> My qemu version: 2.5
> Guest OS: Ubuntu 16.04
> Guest kernel: 4.4.0
> 
> Could you have a check on this issue? Thanks a lot!
> 
> BRs
> Lei
> 
> > >
> > > Thanks,
> > > Maxime
> > >
> > > > Following is my cmd:
> > > > Vhost: testpmd -n 4 -c 0x3000000 --socket-mem 1024,1024 --file-
> > > prefix=vhost \
> > > >                  --vdev 'net_vhost0,iface=vhost-net,queues=1,client=0' -- -i
> > > > Virtio-user: testpmd -n 4 -c 0xc00000 --socket-mem 1024,1024 --no-pci -
> -
> > > file-prefix=virtio \
> > > >                         --
> > vdev=net_virtio_user0,mac=00:01:02:03:04:05,path=./vhost-
> > > net \
> > > >                          -- -i --txqflags=0xf01 --disable-hw-vlan-filter
> > > >
> > > > BRs
> > > > Lei
> > > >
> > > >> -----Original Message-----
> > > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Maxime
> > > Coquelin
> > > >> Sent: Thursday, October 5, 2017 4:36 PM
> > > >> To: dev@dpdk.org; Horton, Remy <remy.horton@intel.com>; Bie,
> Tiwei
> > > >> <tiwei.bie@intel.com>; yliu@fridaylinux.org
> > > >> Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
> > > >> jasowang@redhat.com; Maxime Coquelin
> > > <maxime.coquelin@redhat.com>
> > > >> Subject: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings
> addresses
> > > >> translation
> > > >>
> > > >> This patch postpones rings addresses translations and checks, as
> > > >> addresses sent by the master shuld not be interpreted as long as
> > > >> ring is not started and enabled[0].
> > > >>
> > > >> When protocol features aren't negotiated, the ring is started in
> > > >> enabled state, so the addresses translations are postponed to
> > > >> vhost_user_set_vring_kick().
> > > >> Otherwise, it is postponed to when ring is enabled, in
> > > >> vhost_user_set_vring_enable().
> > > >>
> > > >> [0]: http://lists.nongnu.org/archive/html/qemu-devel/2017-
> > > >> 05/msg04355.html
> > > >>
> > > >> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> > > >> ---
> > > >>   lib/librte_vhost/vhost.h      |  1 +
> > > >>   lib/librte_vhost/vhost_user.c | 69
> > > >> ++++++++++++++++++++++++++++++++++---------
> > > >>   2 files changed, 56 insertions(+), 14 deletions(-)
> > > >>
> > > >> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > > >> index 79351c66f..903da5db5 100644
> > > >> --- a/lib/librte_vhost/vhost.h
> > > >> +++ b/lib/librte_vhost/vhost.h
> > > >> @@ -125,6 +125,7 @@ struct vhost_virtqueue {
> > > >>
> > > >>   	struct vring_used_elem  *shadow_used_ring;
> > > >>   	uint16_t                shadow_used_idx;
> > > >> +	struct vhost_vring_addr ring_addrs;
> > > >>
> > > >>   	struct batch_copy_elem	*batch_copy_elems;
> > > >>   	uint16_t		batch_copy_nb_elems;
> > > >> diff --git a/lib/librte_vhost/vhost_user.c
> b/lib/librte_vhost/vhost_user.c
> > > >> index f495dd36e..319867c65 100644
> > > >> --- a/lib/librte_vhost/vhost_user.c
> > > >> +++ b/lib/librte_vhost/vhost_user.c
> > > >> @@ -356,6 +356,7 @@ static int
> > > >>   vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg
> > *msg)
> > > >>   {
> > > >>   	struct vhost_virtqueue *vq;
> > > >> +	struct vhost_vring_addr *addr = &msg->payload.addr;
> > > >>
> > > >>   	if (dev->mem == NULL)
> > > >>   		return -1;
> > > >> @@ -363,35 +364,50 @@ vhost_user_set_vring_addr(struct virtio_net
> > > *dev,
> > > >> VhostUserMsg *msg)
> > > >>   	/* addr->index refers to the queue index. The txq 1, rxq is 0.
> */
> > > >>   	vq = dev->virtqueue[msg->payload.addr.index];
> > > >>
> > > >> +	/*
> > > >> +	 * Rings addresses should not be interpreted as long as the
> ring is not
> > > >> +	 * started and enabled
> > > >> +	 */
> > > >> +	memcpy(&vq->ring_addrs, addr, sizeof(*addr));
> > > >> +
> > > >> +	return 0;
> > > >> +}
> > > >> +
> > > >> +static struct virtio_net *translate_ring_addresses(struct virtio_net
> > *dev,
> > > >> +		int vq_index)
> > > >> +{
> > > >> +	struct vhost_virtqueue *vq = dev->virtqueue[vq_index];
> > > >> +	struct vhost_vring_addr *addr = &vq->ring_addrs;
> > > >> +
> > > >>   	/* The addresses are converted from QEMU virtual to Vhost
> virtual.
> > > >> */
> > > >>   	vq->desc = (struct vring_desc *)(uintptr_t)qva_to_vva(dev,
> > > >> -			msg->payload.addr.desc_user_addr);
> > > >> +			addr->desc_user_addr);
> > > >>   	if (vq->desc == 0) {
> > > >>   		RTE_LOG(ERR, VHOST_CONFIG,
> > > >>   			"(%d) failed to find desc ring address.\n",
> > > >>   			dev->vid);
> > > >> -		return -1;
> > > >> +		return NULL;
> > > >>   	}
> > > >>
> > > >> -	dev = numa_realloc(dev, msg->payload.addr.index);
> > > >> -	vq = dev->virtqueue[msg->payload.addr.index];
> > > >> +	dev = numa_realloc(dev, vq_index);
> > > >> +	vq = dev->virtqueue[vq_index];
> > > >>
> > > >>   	vq->avail = (struct vring_avail *)(uintptr_t)qva_to_vva(dev,
> > > >> -			msg->payload.addr.avail_user_addr);
> > > >> +			addr->avail_user_addr);
> > > >>   	if (vq->avail == 0) {
> > > >>   		RTE_LOG(ERR, VHOST_CONFIG,
> > > >>   			"(%d) failed to find avail ring address.\n",
> > > >>   			dev->vid);
> > > >> -		return -1;
> > > >> +		return NULL;
> > > >>   	}
> > > >>
> > > >>   	vq->used = (struct vring_used *)(uintptr_t)qva_to_vva(dev,
> > > >> -			msg->payload.addr.used_user_addr);
> > > >> +			addr->used_user_addr);
> > > >>   	if (vq->used == 0) {
> > > >>   		RTE_LOG(ERR, VHOST_CONFIG,
> > > >>   			"(%d) failed to find used ring address.\n",
> > > >>   			dev->vid);
> > > >> -		return -1;
> > > >> +		return NULL;
> > > >>   	}
> > > >>
> > > >>   	if (vq->last_used_idx != vq->used->idx) {
> > > >> @@ -403,7 +419,7 @@ vhost_user_set_vring_addr(struct virtio_net
> > *dev,
> > > >> VhostUserMsg *msg)
> > > >>   		vq->last_avail_idx = vq->used->idx;
> > > >>   	}
> > > >>
> > > >> -	vq->log_guest_addr = msg->payload.addr.log_guest_addr;
> > > >> +	vq->log_guest_addr = addr->log_guest_addr;
> > > >>
> > > >>   	LOG_DEBUG(VHOST_CONFIG, "(%d) mapped address
> desc: %p\n",
> > > >>   			dev->vid, vq->desc);
> > > >> @@ -414,7 +430,7 @@ vhost_user_set_vring_addr(struct virtio_net
> > *dev,
> > > >> VhostUserMsg *msg)
> > > >>   	LOG_DEBUG(VHOST_CONFIG, "(%d) log_guest_addr: %"
> PRIx64 "\n",
> > > >>   			dev->vid, vq->log_guest_addr);
> > > >>
> > > >> -	return 0;
> > > >> +	return dev;
> > > >>   }
> > > >>
> > > >>   /*
> > > >> @@ -685,10 +701,11 @@ vhost_user_set_vring_call(struct virtio_net
> > > *dev,
> > > >> struct VhostUserMsg *pmsg)
> > > >>   }
> > > >>
> > > >>   static void
> > > >> -vhost_user_set_vring_kick(struct virtio_net *dev, struct
> > VhostUserMsg
> > > >> *pmsg)
> > > >> +vhost_user_set_vring_kick(struct virtio_net **pdev, struct
> > > VhostUserMsg
> > > >> *pmsg)
> > > >>   {
> > > >>   	struct vhost_vring_file file;
> > > >>   	struct vhost_virtqueue *vq;
> > > >> +	struct virtio_net *dev = *pdev;
> > > >>
> > > >>   	file.index = pmsg->payload.u64 &
> VHOST_USER_VRING_IDX_MASK;
> > > >>   	if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
> > > >> @@ -698,6 +715,16 @@ vhost_user_set_vring_kick(struct virtio_net
> > *dev,
> > > >> struct VhostUserMsg *pmsg)
> > > >>   	RTE_LOG(INFO, VHOST_CONFIG,
> > > >>   		"vring kick idx:%d file:%d\n", file.index, file.fd);
> > > >>
> > > >> +	/*
> > > >> +	 * Interpret ring addresses only when ring is started and
> enabled.
> > > >> +	 * This is now if protocol features aren't supported.
> > > >> +	 */
> > > >> +	if (!(dev->features & (1ULL <<
> > > >> VHOST_USER_F_PROTOCOL_FEATURES))) {
> > > >> +		*pdev = dev = translate_ring_addresses(dev,
> file.index);
> > > >> +		if (!dev)
> > > >> +			return;
> > > >> +	}
> > > >> +
> > > >>   	vq = dev->virtqueue[file.index];
> > > >>
> > > >>   	/*
> > > >> @@ -778,15 +805,29 @@ vhost_user_get_vring_base(struct
> virtio_net
> > > *dev,
> > > >>    * enable the virtio queue pair.
> > > >>    */
> > > >>   static int
> > > >> -vhost_user_set_vring_enable(struct virtio_net *dev,
> > > >> +vhost_user_set_vring_enable(struct virtio_net **pdev,
> > > >>   			    VhostUserMsg *msg)
> > > >>   {
> > > >> +	struct virtio_net *dev = *pdev;
> > > >>   	int enable = (int)msg->payload.state.num;
> > > >>
> > > >>   	RTE_LOG(INFO, VHOST_CONFIG,
> > > >>   		"set queue enable: %d to qp idx: %d\n",
> > > >>   		enable, msg->payload.state.index);
> > > >>
> > > >> +	/*
> > > >> +	 * Interpret ring addresses only when ring is started and
> enabled.
> > > >> +	 * This is now if protocol features are supported.
> > > >> +	 */
> > > >> +	if (enable && (dev->features &
> > > >> +				(1ULL <<
> > > >> VHOST_USER_F_PROTOCOL_FEATURES))) {
> > > >> +		dev = translate_ring_addresses(dev, msg-
> > > >>> payload.state.index);
> > > >> +		if (!dev)
> > > >> +			return -1;
> > > >> +
> > > >> +		*pdev = dev;
> > > >> +	}
> > > >> +
> > > >>   	if (dev->notify_ops->vring_state_changed)
> > > >>   		dev->notify_ops->vring_state_changed(dev->vid,
> > > >>   				msg->payload.state.index, enable);
> > > >> @@ -1155,7 +1196,7 @@ vhost_user_msg_handler(int vid, int fd)
> > > >>   		break;
> > > >>
> > > >>   	case VHOST_USER_SET_VRING_KICK:
> > > >> -		vhost_user_set_vring_kick(dev, &msg);
> > > >> +		vhost_user_set_vring_kick(&dev, &msg);
> > > >>   		break;
> > > >>   	case VHOST_USER_SET_VRING_CALL:
> > > >>   		vhost_user_set_vring_call(dev, &msg);
> > > >> @@ -1174,7 +1215,7 @@ vhost_user_msg_handler(int vid, int fd)
> > > >>   		break;
> > > >>
> > > >>   	case VHOST_USER_SET_VRING_ENABLE:
> > > >> -		vhost_user_set_vring_enable(dev, &msg);
> > > >> +		vhost_user_set_vring_enable(&dev, &msg);
> > > >>   		break;
> > > >>   	case VHOST_USER_SEND_RARP:
> > > >>   		vhost_user_send_rarp(dev, &msg);
> > > >> --
> > > >> 2.13.6
> > > >
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 15/19] vhost: postpone rings addresses translation
  2017-10-16  6:23           ` Yao, Lei A
@ 2017-10-16  9:47             ` Maxime Coquelin
  2017-10-16 10:47               ` Maxime Coquelin
  0 siblings, 1 reply; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-16  9:47 UTC (permalink / raw)
  To: Yao, Lei A, 'dev@dpdk.org', Horton, Remy, Bie, Tiwei,
	'yliu@fridaylinux.org'
  Cc: 'mst@redhat.com', 'jfreiman@redhat.com',
	'vkaplans@redhat.com', 'jasowang@redhat.com'
Hi Yao,
On 10/16/2017 08:23 AM, Yao, Lei A wrote:
> Hi, Maxime
> 
> Add one comment:
> This issue with virtio-net only occur when I use CPU on socket 1.
Thanks for the report.
I fail to reproduce for now.
What is your qemu command line?
Is it reproducible systematically when there is a NUMA reallocation
(DPDK on socket 0, QEMU on socket 1)?
Maxime
> 
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yao, Lei A
>> Sent: Monday, October 16, 2017 2:00 PM
>> To: 'Maxime Coquelin' <maxime.coquelin@redhat.com>; 'dev@dpdk.org'
>> <dev@dpdk.org>; Horton, Remy <remy.horton@intel.com>; Bie, Tiwei
>> <tiwei.bie@intel.com>; 'yliu@fridaylinux.org' <yliu@fridaylinux.org>
>> Cc: 'mst@redhat.com' <mst@redhat.com>; 'jfreiman@redhat.com'
>> <jfreiman@redhat.com>; 'vkaplans@redhat.com' <vkaplans@redhat.com>;
>> 'jasowang@redhat.com' <jasowang@redhat.com>
>> Subject: Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses
>> translation
>>
>> Hi, Maxime
>>
>>> -----Original Message-----
>>> From: Yao, Lei A
>>> Sent: Friday, October 13, 2017 3:55 PM
>>> To: Maxime Coquelin <maxime.coquelin@redhat.com>; dev@dpdk.org;
>>> Horton, Remy <remy.horton@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
>>> yliu@fridaylinux.org
>>> Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
>>> jasowang@redhat.com
>>> Subject: RE: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses
>>> translation
>>>
>>> Hi, Maxime
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>>> Sent: Friday, October 13, 2017 3:32 PM
>>>> To: Yao, Lei A <lei.a.yao@intel.com>; dev@dpdk.org; Horton, Remy
>>>> <remy.horton@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
>>>> yliu@fridaylinux.org
>>>> Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
>>>> jasowang@redhat.com
>>>> Subject: Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings
>> addresses
>>>> translation
>>>>
>>>> Hi Lei,
>>>>
>>>> On 10/13/2017 03:47 AM, Yao, Lei A wrote:
>>>>> Hi, Maxime
>>>>>
>>>>> After this commit, vhost/virtio loopback test will fail when
>>>>> use the CPU on socket 1.
>>>>> Error message like following during initialize:
>>>>> VHOST_CONFIG: vring kick idx:0 file:20
>>>>> VHOST_CONFIG: reallocate vq from 0 to 1 node
>>>>> VHOST_CONFIG: reallocate dev from 0 to 1 node
>>>>> VHOST_CONFIG: (0) failed to find avail ring address.
>>>>> VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
>>>>> VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
>>>>> VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
>>>>> VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
>>>>> VHOST_CONFIG: vring kick idx:1 file:21
>>>>> VHOST_CONFIG: reallocate vq from 0 to 1 node
>>>>> VHOST_CONFIG: (0) failed to find avail ring address.
>>>>>
>>>>> But if use CPU on socket 0. It still can work. Could you have a check on
>>>>> this? Thanks a lot!
>>>>
>>>> Thanks for reporting the issue. It seems addr pointer still points to
>>>> the old structure after the reallocation. This patch should fix the
>>>> issue:
>>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>>>> index 9acac6125..2416a0061 100644
>>>> --- a/lib/librte_vhost/vhost_user.c
>>>> +++ b/lib/librte_vhost/vhost_user.c
>>>> @@ -417,6 +417,7 @@ translate_ring_addresses(struct virtio_net *dev,
>> int
>>>> vq_index)
>>>>
>>>>           dev = numa_realloc(dev, vq_index);
>>>>           vq = dev->virtqueue[vq_index];
>>>> +       addr = &vq->ring_addrs;
>>>>
>>>>           vq->avail = (struct vring_avail *)(uintptr_t)ring_addr_to_vva(dev,
>>>>                           vq, addr->avail_user_addr, sizeof(struct
>>>> vring_avail));
>>>>
>>>> I only have access to single-socket machines at the moment, so I cannot
>>>> reproduce the issue.
>>>> Can you have a try?
>>> With your new patch, it can work on socket 1 now. Thanks.
>>>
>> I find another issue with this patch when I use V17.11-rc1.
>> It will break the connection between vhost and virtio-net in VM.
>> The link status of vhost-user port is always down.
>> But virtio-pmd driver is still ok.
>>
>> My qemu version: 2.5
>> Guest OS: Ubuntu 16.04
>> Guest kernel: 4.4.0
>>
>> Could you have a check on this issue? Thanks a lot!
>>
>> BRs
>> Lei
>>
>>>>
>>>> Thanks,
>>>> Maxime
>>>>
>>>>> Following is my cmd:
>>>>> Vhost: testpmd -n 4 -c 0x3000000 --socket-mem 1024,1024 --file-
>>>> prefix=vhost \
>>>>>                   --vdev 'net_vhost0,iface=vhost-net,queues=1,client=0' -- -i
>>>>> Virtio-user: testpmd -n 4 -c 0xc00000 --socket-mem 1024,1024 --no-pci -
>> -
>>>> file-prefix=virtio \
>>>>>                          --
>>> vdev=net_virtio_user0,mac=00:01:02:03:04:05,path=./vhost-
>>>> net \
>>>>>                           -- -i --txqflags=0xf01 --disable-hw-vlan-filter
>>>>>
>>>>> BRs
>>>>> Lei
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Maxime
>>>> Coquelin
>>>>>> Sent: Thursday, October 5, 2017 4:36 PM
>>>>>> To: dev@dpdk.org; Horton, Remy <remy.horton@intel.com>; Bie,
>> Tiwei
>>>>>> <tiwei.bie@intel.com>; yliu@fridaylinux.org
>>>>>> Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
>>>>>> jasowang@redhat.com; Maxime Coquelin
>>>> <maxime.coquelin@redhat.com>
>>>>>> Subject: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings
>> addresses
>>>>>> translation
>>>>>>
>>>>>> This patch postpones rings addresses translations and checks, as
>>>>>> addresses sent by the master shuld not be interpreted as long as
>>>>>> ring is not started and enabled[0].
>>>>>>
>>>>>> When protocol features aren't negotiated, the ring is started in
>>>>>> enabled state, so the addresses translations are postponed to
>>>>>> vhost_user_set_vring_kick().
>>>>>> Otherwise, it is postponed to when ring is enabled, in
>>>>>> vhost_user_set_vring_enable().
>>>>>>
>>>>>> [0]: http://lists.nongnu.org/archive/html/qemu-devel/2017-
>>>>>> 05/msg04355.html
>>>>>>
>>>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>> ---
>>>>>>    lib/librte_vhost/vhost.h      |  1 +
>>>>>>    lib/librte_vhost/vhost_user.c | 69
>>>>>> ++++++++++++++++++++++++++++++++++---------
>>>>>>    2 files changed, 56 insertions(+), 14 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
>>>>>> index 79351c66f..903da5db5 100644
>>>>>> --- a/lib/librte_vhost/vhost.h
>>>>>> +++ b/lib/librte_vhost/vhost.h
>>>>>> @@ -125,6 +125,7 @@ struct vhost_virtqueue {
>>>>>>
>>>>>>    	struct vring_used_elem  *shadow_used_ring;
>>>>>>    	uint16_t                shadow_used_idx;
>>>>>> +	struct vhost_vring_addr ring_addrs;
>>>>>>
>>>>>>    	struct batch_copy_elem	*batch_copy_elems;
>>>>>>    	uint16_t		batch_copy_nb_elems;
>>>>>> diff --git a/lib/librte_vhost/vhost_user.c
>> b/lib/librte_vhost/vhost_user.c
>>>>>> index f495dd36e..319867c65 100644
>>>>>> --- a/lib/librte_vhost/vhost_user.c
>>>>>> +++ b/lib/librte_vhost/vhost_user.c
>>>>>> @@ -356,6 +356,7 @@ static int
>>>>>>    vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg
>>> *msg)
>>>>>>    {
>>>>>>    	struct vhost_virtqueue *vq;
>>>>>> +	struct vhost_vring_addr *addr = &msg->payload.addr;
>>>>>>
>>>>>>    	if (dev->mem == NULL)
>>>>>>    		return -1;
>>>>>> @@ -363,35 +364,50 @@ vhost_user_set_vring_addr(struct virtio_net
>>>> *dev,
>>>>>> VhostUserMsg *msg)
>>>>>>    	/* addr->index refers to the queue index. The txq 1, rxq is 0.
>> */
>>>>>>    	vq = dev->virtqueue[msg->payload.addr.index];
>>>>>>
>>>>>> +	/*
>>>>>> +	 * Rings addresses should not be interpreted as long as the
>> ring is not
>>>>>> +	 * started and enabled
>>>>>> +	 */
>>>>>> +	memcpy(&vq->ring_addrs, addr, sizeof(*addr));
>>>>>> +
>>>>>> +	return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static struct virtio_net *translate_ring_addresses(struct virtio_net
>>> *dev,
>>>>>> +		int vq_index)
>>>>>> +{
>>>>>> +	struct vhost_virtqueue *vq = dev->virtqueue[vq_index];
>>>>>> +	struct vhost_vring_addr *addr = &vq->ring_addrs;
>>>>>> +
>>>>>>    	/* The addresses are converted from QEMU virtual to Vhost
>> virtual.
>>>>>> */
>>>>>>    	vq->desc = (struct vring_desc *)(uintptr_t)qva_to_vva(dev,
>>>>>> -			msg->payload.addr.desc_user_addr);
>>>>>> +			addr->desc_user_addr);
>>>>>>    	if (vq->desc == 0) {
>>>>>>    		RTE_LOG(ERR, VHOST_CONFIG,
>>>>>>    			"(%d) failed to find desc ring address.\n",
>>>>>>    			dev->vid);
>>>>>> -		return -1;
>>>>>> +		return NULL;
>>>>>>    	}
>>>>>>
>>>>>> -	dev = numa_realloc(dev, msg->payload.addr.index);
>>>>>> -	vq = dev->virtqueue[msg->payload.addr.index];
>>>>>> +	dev = numa_realloc(dev, vq_index);
>>>>>> +	vq = dev->virtqueue[vq_index];
>>>>>>
>>>>>>    	vq->avail = (struct vring_avail *)(uintptr_t)qva_to_vva(dev,
>>>>>> -			msg->payload.addr.avail_user_addr);
>>>>>> +			addr->avail_user_addr);
>>>>>>    	if (vq->avail == 0) {
>>>>>>    		RTE_LOG(ERR, VHOST_CONFIG,
>>>>>>    			"(%d) failed to find avail ring address.\n",
>>>>>>    			dev->vid);
>>>>>> -		return -1;
>>>>>> +		return NULL;
>>>>>>    	}
>>>>>>
>>>>>>    	vq->used = (struct vring_used *)(uintptr_t)qva_to_vva(dev,
>>>>>> -			msg->payload.addr.used_user_addr);
>>>>>> +			addr->used_user_addr);
>>>>>>    	if (vq->used == 0) {
>>>>>>    		RTE_LOG(ERR, VHOST_CONFIG,
>>>>>>    			"(%d) failed to find used ring address.\n",
>>>>>>    			dev->vid);
>>>>>> -		return -1;
>>>>>> +		return NULL;
>>>>>>    	}
>>>>>>
>>>>>>    	if (vq->last_used_idx != vq->used->idx) {
>>>>>> @@ -403,7 +419,7 @@ vhost_user_set_vring_addr(struct virtio_net
>>> *dev,
>>>>>> VhostUserMsg *msg)
>>>>>>    		vq->last_avail_idx = vq->used->idx;
>>>>>>    	}
>>>>>>
>>>>>> -	vq->log_guest_addr = msg->payload.addr.log_guest_addr;
>>>>>> +	vq->log_guest_addr = addr->log_guest_addr;
>>>>>>
>>>>>>    	LOG_DEBUG(VHOST_CONFIG, "(%d) mapped address
>> desc: %p\n",
>>>>>>    			dev->vid, vq->desc);
>>>>>> @@ -414,7 +430,7 @@ vhost_user_set_vring_addr(struct virtio_net
>>> *dev,
>>>>>> VhostUserMsg *msg)
>>>>>>    	LOG_DEBUG(VHOST_CONFIG, "(%d) log_guest_addr: %"
>> PRIx64 "\n",
>>>>>>    			dev->vid, vq->log_guest_addr);
>>>>>>
>>>>>> -	return 0;
>>>>>> +	return dev;
>>>>>>    }
>>>>>>
>>>>>>    /*
>>>>>> @@ -685,10 +701,11 @@ vhost_user_set_vring_call(struct virtio_net
>>>> *dev,
>>>>>> struct VhostUserMsg *pmsg)
>>>>>>    }
>>>>>>
>>>>>>    static void
>>>>>> -vhost_user_set_vring_kick(struct virtio_net *dev, struct
>>> VhostUserMsg
>>>>>> *pmsg)
>>>>>> +vhost_user_set_vring_kick(struct virtio_net **pdev, struct
>>>> VhostUserMsg
>>>>>> *pmsg)
>>>>>>    {
>>>>>>    	struct vhost_vring_file file;
>>>>>>    	struct vhost_virtqueue *vq;
>>>>>> +	struct virtio_net *dev = *pdev;
>>>>>>
>>>>>>    	file.index = pmsg->payload.u64 &
>> VHOST_USER_VRING_IDX_MASK;
>>>>>>    	if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
>>>>>> @@ -698,6 +715,16 @@ vhost_user_set_vring_kick(struct virtio_net
>>> *dev,
>>>>>> struct VhostUserMsg *pmsg)
>>>>>>    	RTE_LOG(INFO, VHOST_CONFIG,
>>>>>>    		"vring kick idx:%d file:%d\n", file.index, file.fd);
>>>>>>
>>>>>> +	/*
>>>>>> +	 * Interpret ring addresses only when ring is started and
>> enabled.
>>>>>> +	 * This is now if protocol features aren't supported.
>>>>>> +	 */
>>>>>> +	if (!(dev->features & (1ULL <<
>>>>>> VHOST_USER_F_PROTOCOL_FEATURES))) {
>>>>>> +		*pdev = dev = translate_ring_addresses(dev,
>> file.index);
>>>>>> +		if (!dev)
>>>>>> +			return;
>>>>>> +	}
>>>>>> +
>>>>>>    	vq = dev->virtqueue[file.index];
>>>>>>
>>>>>>    	/*
>>>>>> @@ -778,15 +805,29 @@ vhost_user_get_vring_base(struct
>> virtio_net
>>>> *dev,
>>>>>>     * enable the virtio queue pair.
>>>>>>     */
>>>>>>    static int
>>>>>> -vhost_user_set_vring_enable(struct virtio_net *dev,
>>>>>> +vhost_user_set_vring_enable(struct virtio_net **pdev,
>>>>>>    			    VhostUserMsg *msg)
>>>>>>    {
>>>>>> +	struct virtio_net *dev = *pdev;
>>>>>>    	int enable = (int)msg->payload.state.num;
>>>>>>
>>>>>>    	RTE_LOG(INFO, VHOST_CONFIG,
>>>>>>    		"set queue enable: %d to qp idx: %d\n",
>>>>>>    		enable, msg->payload.state.index);
>>>>>>
>>>>>> +	/*
>>>>>> +	 * Interpret ring addresses only when ring is started and
>> enabled.
>>>>>> +	 * This is now if protocol features are supported.
>>>>>> +	 */
>>>>>> +	if (enable && (dev->features &
>>>>>> +				(1ULL <<
>>>>>> VHOST_USER_F_PROTOCOL_FEATURES))) {
>>>>>> +		dev = translate_ring_addresses(dev, msg-
>>>>>>> payload.state.index);
>>>>>> +		if (!dev)
>>>>>> +			return -1;
>>>>>> +
>>>>>> +		*pdev = dev;
>>>>>> +	}
>>>>>> +
>>>>>>    	if (dev->notify_ops->vring_state_changed)
>>>>>>    		dev->notify_ops->vring_state_changed(dev->vid,
>>>>>>    				msg->payload.state.index, enable);
>>>>>> @@ -1155,7 +1196,7 @@ vhost_user_msg_handler(int vid, int fd)
>>>>>>    		break;
>>>>>>
>>>>>>    	case VHOST_USER_SET_VRING_KICK:
>>>>>> -		vhost_user_set_vring_kick(dev, &msg);
>>>>>> +		vhost_user_set_vring_kick(&dev, &msg);
>>>>>>    		break;
>>>>>>    	case VHOST_USER_SET_VRING_CALL:
>>>>>>    		vhost_user_set_vring_call(dev, &msg);
>>>>>> @@ -1174,7 +1215,7 @@ vhost_user_msg_handler(int vid, int fd)
>>>>>>    		break;
>>>>>>
>>>>>>    	case VHOST_USER_SET_VRING_ENABLE:
>>>>>> -		vhost_user_set_vring_enable(dev, &msg);
>>>>>> +		vhost_user_set_vring_enable(&dev, &msg);
>>>>>>    		break;
>>>>>>    	case VHOST_USER_SEND_RARP:
>>>>>>    		vhost_user_send_rarp(dev, &msg);
>>>>>> --
>>>>>> 2.13.6
>>>>>
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 15/19] vhost: postpone rings addresses translation
  2017-10-16  9:47             ` Maxime Coquelin
@ 2017-10-16 10:47               ` Maxime Coquelin
  2017-10-17  1:24                 ` Yao, Lei A
  0 siblings, 1 reply; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-16 10:47 UTC (permalink / raw)
  To: Yao, Lei A, 'dev@dpdk.org', Horton, Remy, Bie, Tiwei,
	'yliu@fridaylinux.org'
  Cc: 'mst@redhat.com', 'jfreiman@redhat.com',
	'vkaplans@redhat.com', 'jasowang@redhat.com'
On 10/16/2017 11:47 AM, Maxime Coquelin wrote:
> Hi Yao,
> 
> On 10/16/2017 08:23 AM, Yao, Lei A wrote:
>> Hi, Maxime
>>
>> Add one comment:
>> This issue with virtio-net only occur when I use CPU on socket 1.
> 
> Thanks for the report.
> I fail to reproduce for now.
> 
> What is your qemu command line?
> Is it reproducible systematically when there is a NUMA reallocation
> (DPDK on socket 0, QEMU on socket 1)?
Nevermind, I just reproduced the (an?) issue.
The issue I reproduce is not linked to NUMA reallocation, but to
messages sequencing differences between QEMU versions.
So, I'm not 100% this is the same issue, as you mention it works fine
when using CPU socket 0.
The patch "vhost: postpone rings addresses translation" moves rings
addresses translation at either vring kick or enable time, depending
on whether protocol features are enabled or not. This has been done
because we must not interpret ring information as long as the vring is
not fully initialized.
While my patch works fine with recent QEMU version, it breaks with older
ones, like QEMU v2.5. The reason is that on these older versions,
VHOST_USER_SET_VRING_ENABLE is called once and before
VHOST_USER_SET_VRING_ADDR. At that time, the ring adresses aren't
available so the translation is not done. On recent QEMU versions,
we receive VHOST_USER_SET_VRING_ENABLE also after having received
the rings addresses, so it works fine.
The below fix consists in performing the rings addresses translation
also when handling VHOST_USER_SET_VRING_ADDR if ring has already been
enabled.
I'll post a formal patch later today or tomorrow morning after having
tested it more conscientiously. Let me know if it fixes your issue.
Thanks,
Maxime
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 76c4eeca5..1f6cba4b9 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -372,33 +372,6 @@ ring_addr_to_vva(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
         return qva_to_vva(dev, ra);
  }
-/*
- * The virtio device sends us the desc, used and avail ring addresses.
- * This function then converts these to our address space.
- */
-static int
-vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
-{
-       struct vhost_virtqueue *vq;
-       struct vhost_vring_addr *addr = &msg->payload.addr;
-
-       if (dev->mem == NULL)
-               return -1;
-
-       /* addr->index refers to the queue index. The txq 1, rxq is 0. */
-       vq = dev->virtqueue[msg->payload.addr.index];
-
-       /*
-        * Rings addresses should not be interpreted as long as the ring 
is not
-        * started and enabled
-        */
-       memcpy(&vq->ring_addrs, addr, sizeof(*addr));
-
-       vring_invalidate(dev, vq);
-
-       return 0;
-}
-
  static struct virtio_net *
  translate_ring_addresses(struct virtio_net *dev, int vq_index)
  {
@@ -464,6 +437,43 @@ translate_ring_addresses(struct virtio_net *dev, 
int vq_index)
  }
  /*
+ * The virtio device sends us the desc, used and avail ring addresses.
+ * This function then converts these to our address space.
+ */
+static int
+vhost_user_set_vring_addr(struct virtio_net **pdev, VhostUserMsg *msg)
+{
+       struct vhost_virtqueue *vq;
+       struct vhost_vring_addr *addr = &msg->payload.addr;
+       struct virtio_net *dev = *pdev;
+
+       if (dev->mem == NULL)
+               return -1;
+
+       /* addr->index refers to the queue index. The txq 1, rxq is 0. */
+       vq = dev->virtqueue[msg->payload.addr.index];
+
+       /*
+        * Rings addresses should not be interpreted as long as the ring 
is not
+        * started and enabled
+        */
+       memcpy(&vq->ring_addrs, addr, sizeof(*addr));
+
+       vring_invalidate(dev, vq);
+
+       if (vq->enabled && (dev->features &
+                               (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
+               dev = translate_ring_addresses(dev, 
msg->payload.state.index);
+               if (!dev)
+                       return -1;
+
+               *pdev = dev;
+       }
+
+       return 0;
+}
+
+/*
   * The virtio device sends us the available ring last used index.
   */
  static int
@@ -1273,7 +1283,7 @@ vhost_user_msg_handler(int vid, int fd)
                 vhost_user_set_vring_num(dev, &msg);
                 break;
         case VHOST_USER_SET_VRING_ADDR:
-               vhost_user_set_vring_addr(dev, &msg);
+               vhost_user_set_vring_addr(&dev, &msg);
                 break;
         case VHOST_USER_SET_VRING_BASE:
                 vhost_user_set_vring_base(dev, &msg);
^ permalink raw reply related	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 15/19] vhost: postpone rings addresses translation
  2017-10-16 10:47               ` Maxime Coquelin
@ 2017-10-17  1:24                 ` Yao, Lei A
  2017-10-17  8:06                   ` Maxime Coquelin
  0 siblings, 1 reply; 48+ messages in thread
From: Yao, Lei A @ 2017-10-17  1:24 UTC (permalink / raw)
  To: Maxime Coquelin, 'dev@dpdk.org', Horton, Remy, Bie, Tiwei,
	'yliu@fridaylinux.org'
  Cc: 'mst@redhat.com', 'jfreiman@redhat.com',
	'vkaplans@redhat.com', 'jasowang@redhat.com'
Hi, Maxime
> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Monday, October 16, 2017 6:48 PM
> To: Yao, Lei A <lei.a.yao@intel.com>; 'dev@dpdk.org' <dev@dpdk.org>;
> Horton, Remy <remy.horton@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> 'yliu@fridaylinux.org' <yliu@fridaylinux.org>
> Cc: 'mst@redhat.com' <mst@redhat.com>; 'jfreiman@redhat.com'
> <jfreiman@redhat.com>; 'vkaplans@redhat.com' <vkaplans@redhat.com>;
> 'jasowang@redhat.com' <jasowang@redhat.com>
> Subject: Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses
> translation
> 
> 
> 
> On 10/16/2017 11:47 AM, Maxime Coquelin wrote:
> > Hi Yao,
> >
> > On 10/16/2017 08:23 AM, Yao, Lei A wrote:
> >> Hi, Maxime
> >>
> >> Add one comment:
> >> This issue with virtio-net only occur when I use CPU on socket 1.
> >
> > Thanks for the report.
> > I fail to reproduce for now.
> >
> > What is your qemu command line?
> > Is it reproducible systematically when there is a NUMA reallocation
> > (DPDK on socket 0, QEMU on socket 1)?
> 
> Nevermind, I just reproduced the (an?) issue.
> The issue I reproduce is not linked to NUMA reallocation, but to
> messages sequencing differences between QEMU versions.
> 
> So, I'm not 100% this is the same issue, as you mention it works fine
> when using CPU socket 0.
> 
> The patch "vhost: postpone rings addresses translation" moves rings
> addresses translation at either vring kick or enable time, depending
> on whether protocol features are enabled or not. This has been done
> because we must not interpret ring information as long as the vring is
> not fully initialized.
> 
> While my patch works fine with recent QEMU version, it breaks with older
> ones, like QEMU v2.5. The reason is that on these older versions,
> VHOST_USER_SET_VRING_ENABLE is called once and before
> VHOST_USER_SET_VRING_ADDR. At that time, the ring adresses aren't
> available so the translation is not done. On recent QEMU versions,
> we receive VHOST_USER_SET_VRING_ENABLE also after having received
> the rings addresses, so it works fine.
> 
> The below fix consists in performing the rings addresses translation
> also when handling VHOST_USER_SET_VRING_ADDR if ring has already been
> enabled.
> 
> I'll post a formal patch later today or tomorrow morning after having
> tested it more conscientiously. Let me know if it fixes your issue.
> 
> Thanks,
> Maxime
> 
Thanks for your quick fix. I try your patch and it can totally work at my side for virtio-net.
I tested it with Qemu 2.5~2.7, 2.10. 
The previous info about it can work on numa 0 is a misleading info. Because
I use qemu 2.10 for some special test at that time. So it can work. 
BRs
Lei
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index 76c4eeca5..1f6cba4b9 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -372,33 +372,6 @@ ring_addr_to_vva(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
>          return qva_to_vva(dev, ra);
>   }
> 
> -/*
> - * The virtio device sends us the desc, used and avail ring addresses.
> - * This function then converts these to our address space.
> - */
> -static int
> -vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
> -{
> -       struct vhost_virtqueue *vq;
> -       struct vhost_vring_addr *addr = &msg->payload.addr;
> -
> -       if (dev->mem == NULL)
> -               return -1;
> -
> -       /* addr->index refers to the queue index. The txq 1, rxq is 0. */
> -       vq = dev->virtqueue[msg->payload.addr.index];
> -
> -       /*
> -        * Rings addresses should not be interpreted as long as the ring
> is not
> -        * started and enabled
> -        */
> -       memcpy(&vq->ring_addrs, addr, sizeof(*addr));
> -
> -       vring_invalidate(dev, vq);
> -
> -       return 0;
> -}
> -
>   static struct virtio_net *
>   translate_ring_addresses(struct virtio_net *dev, int vq_index)
>   {
> @@ -464,6 +437,43 @@ translate_ring_addresses(struct virtio_net *dev,
> int vq_index)
>   }
> 
>   /*
> + * The virtio device sends us the desc, used and avail ring addresses.
> + * This function then converts these to our address space.
> + */
> +static int
> +vhost_user_set_vring_addr(struct virtio_net **pdev, VhostUserMsg *msg)
> +{
> +       struct vhost_virtqueue *vq;
> +       struct vhost_vring_addr *addr = &msg->payload.addr;
> +       struct virtio_net *dev = *pdev;
> +
> +       if (dev->mem == NULL)
> +               return -1;
> +
> +       /* addr->index refers to the queue index. The txq 1, rxq is 0. */
> +       vq = dev->virtqueue[msg->payload.addr.index];
> +
> +       /*
> +        * Rings addresses should not be interpreted as long as the ring
> is not
> +        * started and enabled
> +        */
> +       memcpy(&vq->ring_addrs, addr, sizeof(*addr));
> +
> +       vring_invalidate(dev, vq);
> +
> +       if (vq->enabled && (dev->features &
> +                               (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
> +               dev = translate_ring_addresses(dev,
> msg->payload.state.index);
> +               if (!dev)
> +                       return -1;
> +
> +               *pdev = dev;
> +       }
> +
> +       return 0;
> +}
> +
> +/*
>    * The virtio device sends us the available ring last used index.
>    */
>   static int
> @@ -1273,7 +1283,7 @@ vhost_user_msg_handler(int vid, int fd)
>                  vhost_user_set_vring_num(dev, &msg);
>                  break;
>          case VHOST_USER_SET_VRING_ADDR:
> -               vhost_user_set_vring_addr(dev, &msg);
> +               vhost_user_set_vring_addr(&dev, &msg);
>                  break;
>          case VHOST_USER_SET_VRING_BASE:
>                  vhost_user_set_vring_base(dev, &msg);
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 15/19] vhost: postpone rings addresses translation
  2017-10-17  1:24                 ` Yao, Lei A
@ 2017-10-17  8:06                   ` Maxime Coquelin
  0 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-10-17  8:06 UTC (permalink / raw)
  To: Yao, Lei A, 'dev@dpdk.org', Horton, Remy, Bie, Tiwei,
	'yliu@fridaylinux.org'
  Cc: 'mst@redhat.com', 'jfreiman@redhat.com',
	'vkaplans@redhat.com', 'jasowang@redhat.com'
On 10/17/2017 03:24 AM, Yao, Lei A wrote:
> Hi, Maxime
> 
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Monday, October 16, 2017 6:48 PM
>> To: Yao, Lei A <lei.a.yao@intel.com>; 'dev@dpdk.org' <dev@dpdk.org>;
>> Horton, Remy <remy.horton@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
>> 'yliu@fridaylinux.org' <yliu@fridaylinux.org>
>> Cc: 'mst@redhat.com' <mst@redhat.com>; 'jfreiman@redhat.com'
>> <jfreiman@redhat.com>; 'vkaplans@redhat.com' <vkaplans@redhat.com>;
>> 'jasowang@redhat.com' <jasowang@redhat.com>
>> Subject: Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses
>> translation
>>
>>
>>
>> On 10/16/2017 11:47 AM, Maxime Coquelin wrote:
>>> Hi Yao,
>>>
>>> On 10/16/2017 08:23 AM, Yao, Lei A wrote:
>>>> Hi, Maxime
>>>>
>>>> Add one comment:
>>>> This issue with virtio-net only occur when I use CPU on socket 1.
>>>
>>> Thanks for the report.
>>> I fail to reproduce for now.
>>>
>>> What is your qemu command line?
>>> Is it reproducible systematically when there is a NUMA reallocation
>>> (DPDK on socket 0, QEMU on socket 1)?
>>
>> Nevermind, I just reproduced the (an?) issue.
>> The issue I reproduce is not linked to NUMA reallocation, but to
>> messages sequencing differences between QEMU versions.
>>
>> So, I'm not 100% this is the same issue, as you mention it works fine
>> when using CPU socket 0.
>>
>> The patch "vhost: postpone rings addresses translation" moves rings
>> addresses translation at either vring kick or enable time, depending
>> on whether protocol features are enabled or not. This has been done
>> because we must not interpret ring information as long as the vring is
>> not fully initialized.
>>
>> While my patch works fine with recent QEMU version, it breaks with older
>> ones, like QEMU v2.5. The reason is that on these older versions,
>> VHOST_USER_SET_VRING_ENABLE is called once and before
>> VHOST_USER_SET_VRING_ADDR. At that time, the ring adresses aren't
>> available so the translation is not done. On recent QEMU versions,
>> we receive VHOST_USER_SET_VRING_ENABLE also after having received
>> the rings addresses, so it works fine.
>>
>> The below fix consists in performing the rings addresses translation
>> also when handling VHOST_USER_SET_VRING_ADDR if ring has already been
>> enabled.
>>
>> I'll post a formal patch later today or tomorrow morning after having
>> tested it more conscientiously. Let me know if it fixes your issue.
>>
>> Thanks,
>> Maxime
>>
> Thanks for your quick fix. I try your patch and it can totally work at my side for virtio-net.
> I tested it with Qemu 2.5~2.7, 2.10.
> The previous info about it can work on numa 0 is a misleading info. Because
> I use qemu 2.10 for some special test at that time. So it can work.
Great. Thanks for having tried it.
Maxime
> BRs
> Lei
>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>> index 76c4eeca5..1f6cba4b9 100644
>> --- a/lib/librte_vhost/vhost_user.c
>> +++ b/lib/librte_vhost/vhost_user.c
>> @@ -372,33 +372,6 @@ ring_addr_to_vva(struct virtio_net *dev, struct
>> vhost_virtqueue *vq,
>>           return qva_to_vva(dev, ra);
>>    }
>>
>> -/*
>> - * The virtio device sends us the desc, used and avail ring addresses.
>> - * This function then converts these to our address space.
>> - */
>> -static int
>> -vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
>> -{
>> -       struct vhost_virtqueue *vq;
>> -       struct vhost_vring_addr *addr = &msg->payload.addr;
>> -
>> -       if (dev->mem == NULL)
>> -               return -1;
>> -
>> -       /* addr->index refers to the queue index. The txq 1, rxq is 0. */
>> -       vq = dev->virtqueue[msg->payload.addr.index];
>> -
>> -       /*
>> -        * Rings addresses should not be interpreted as long as the ring
>> is not
>> -        * started and enabled
>> -        */
>> -       memcpy(&vq->ring_addrs, addr, sizeof(*addr));
>> -
>> -       vring_invalidate(dev, vq);
>> -
>> -       return 0;
>> -}
>> -
>>    static struct virtio_net *
>>    translate_ring_addresses(struct virtio_net *dev, int vq_index)
>>    {
>> @@ -464,6 +437,43 @@ translate_ring_addresses(struct virtio_net *dev,
>> int vq_index)
>>    }
>>
>>    /*
>> + * The virtio device sends us the desc, used and avail ring addresses.
>> + * This function then converts these to our address space.
>> + */
>> +static int
>> +vhost_user_set_vring_addr(struct virtio_net **pdev, VhostUserMsg *msg)
>> +{
>> +       struct vhost_virtqueue *vq;
>> +       struct vhost_vring_addr *addr = &msg->payload.addr;
>> +       struct virtio_net *dev = *pdev;
>> +
>> +       if (dev->mem == NULL)
>> +               return -1;
>> +
>> +       /* addr->index refers to the queue index. The txq 1, rxq is 0. */
>> +       vq = dev->virtqueue[msg->payload.addr.index];
>> +
>> +       /*
>> +        * Rings addresses should not be interpreted as long as the ring
>> is not
>> +        * started and enabled
>> +        */
>> +       memcpy(&vq->ring_addrs, addr, sizeof(*addr));
>> +
>> +       vring_invalidate(dev, vq);
>> +
>> +       if (vq->enabled && (dev->features &
>> +                               (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
>> +               dev = translate_ring_addresses(dev,
>> msg->payload.state.index);
>> +               if (!dev)
>> +                       return -1;
>> +
>> +               *pdev = dev;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +/*
>>     * The virtio device sends us the available ring last used index.
>>     */
>>    static int
>> @@ -1273,7 +1283,7 @@ vhost_user_msg_handler(int vid, int fd)
>>                   vhost_user_set_vring_num(dev, &msg);
>>                   break;
>>           case VHOST_USER_SET_VRING_ADDR:
>> -               vhost_user_set_vring_addr(dev, &msg);
>> +               vhost_user_set_vring_addr(&dev, &msg);
>>                   break;
>>           case VHOST_USER_SET_VRING_BASE:
>>                   vhost_user_set_vring_base(dev, &msg);
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup"
  2017-10-05  8:36 ` [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup" Maxime Coquelin
@ 2017-11-01 17:11   ` Kavanagh, Mark B
  2017-11-02  9:40     ` Maxime Coquelin
  0 siblings, 1 reply; 48+ messages in thread
From: Kavanagh, Mark B @ 2017-11-01 17:11 UTC (permalink / raw)
  To: Maxime Coquelin, dev@dpdk.org, Horton, Remy, Bie, Tiwei,
	yliu@fridaylinux.org
  Cc: mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com, Mcnamara, John, Loftus, Ciara, Stokes, Ian
Hi Maxime,
First off, apologies for the lateness of this reply - I realize that this patch has already been upstreamed.
Unfortunately, during OvS-DPDK regression testing for DPDK v17.11-rc2 just today, a regression involving vHost multiq was detected, and pinpointed to this patch.
Version info for the components involved during the aforementioned testing is as follows:
DPDK:	v17.11-rc2
OvS:	af2e40c ("sparse: eliminate "duplicate initialization") + DPDK v17.11 upgrade patch
QEMU:	v2.7.0
The regression may be reproduced as follows:
- Set up OvS-DPDK as normal, and add one vHostUser client port:
	$OVS_DIR/utilities/ovs-vsctl add-port br0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuserclient
 
- Start QEMU, specifying 2 ports for the guest interface"
	$QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \
    -cpu host -enable-kvm -m 4096M \
    -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on \
    -numa node,memdev=mem -mem-prealloc \
    -drive file=$VM_IMAGE \
    -chardev socket,id=char0,path=/tmp/sock0,server \
    -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce,queues=2 \
    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off,mq=on,vectors=6 \
    -nographic"
 
- The guest subsequently starts as normal, but then hangs completely, rendering it completely unusable.
  The last lines of ovs-vswitchd.log read as follows:
	|00051|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
	|00052|dpdk|INFO|VHOST_CONFIG: set queue enable: 1 to qp idx: 1
	|00053|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
Moving from QEMU v2.7.0 to v2.10.0 resolves the issue. However, herein lies the issue: QEMU v2.10.0 was only released in August of this year; anecdotally, we know that many OvS-DPDK customers use older versions of QEMU (typically, v2.7.0), and are likely un[able|willing] to move. With this patch, a hard dependency on QEMU v2.10 is created for users who want to use the vHU multiq feature in DPDK v17.11 (and subsequently, the upcoming OvS v2.9.0), which IMO will likely be unacceptable for many.
One potential solution to this problem is to introduce a compile-time option that would allow the user to [dis|en]able the VHOST_USER_PROTOCOL_F_REPLY_ACK feature - is that something that would be acceptable to you Maxime?
Thanks in advance,
Mark 
>From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Maxime Coquelin
>Sent: Thursday, October 5, 2017 9:36 AM
>To: dev@dpdk.org; Horton, Remy <remy.horton@intel.com>; Bie, Tiwei
><tiwei.bie@intel.com>; yliu@fridaylinux.org
>Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
>jasowang@redhat.com; Maxime Coquelin <maxime.coquelin@redhat.com>
>Subject: [dpdk-dev] [PATCH v3 01/19] Revert "vhost: workaround MQ fails to
>startup"
>
>This reverts commit 04d81227960b5c1cf2f11f492100979ead20c526.
>
>As agreed when this workaround was introduced, it can be reverted
>as Qemu v2.10 that fixes the issue is now out.
>
>The reply-ack feature is required for vhost-user IOMMU support.
>
>Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>---
> lib/librte_vhost/vhost_user.h | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
>diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
>index 35ebd7190..2ba22dbb0 100644
>--- a/lib/librte_vhost/vhost_user.h
>+++ b/lib/librte_vhost/vhost_user.h
>@@ -49,14 +49,10 @@
> #define VHOST_USER_PROTOCOL_F_REPLY_ACK	3
> #define VHOST_USER_PROTOCOL_F_NET_MTU 4
>
>-/*
>- * disable REPLY_ACK feature to workaround the buggy QEMU implementation.
>- * Proved buggy QEMU includes v2.7 - v2.9.
>- */
> #define VHOST_USER_PROTOCOL_FEATURES	((1ULL << VHOST_USER_PROTOCOL_F_MQ) | \
> 					 (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) |\
> 					 (1ULL << VHOST_USER_PROTOCOL_F_RARP) | \
>-					 (0ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \
>+					 (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \
> 					 (1ULL << VHOST_USER_PROTOCOL_F_NET_MTU))
>
> typedef enum VhostUserRequest {
>--
>2.13.6
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 17/19] vhost-user: iommu: postpone device creation until ring are mapped
  2017-10-05  8:36 ` [PATCH v3 17/19] vhost-user: iommu: postpone device creation until ring are mapped Maxime Coquelin
@ 2017-11-02  7:21   ` Yao, Lei A
  2017-11-02  8:21     ` Maxime Coquelin
  0 siblings, 1 reply; 48+ messages in thread
From: Yao, Lei A @ 2017-11-02  7:21 UTC (permalink / raw)
  To: Maxime Coquelin, dev@dpdk.org, Horton, Remy, Bie, Tiwei,
	yliu@fridaylinux.org
  Cc: mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Maxime Coquelin
> Sent: Thursday, October 5, 2017 4:36 PM
> To: dev@dpdk.org; Horton, Remy <remy.horton@intel.com>; Bie, Tiwei
> <tiwei.bie@intel.com>; yliu@fridaylinux.org
> Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
> jasowang@redhat.com; Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [dpdk-dev] [PATCH v3 17/19] vhost-user: iommu: postpone device
> creation until ring are mapped
> 
> Translating the start addresses of the rings is not enough, we need to
> be sure all the ring is made available by the guest.
> 
> It depends on the size of the rings, which is not known on SET_VRING_ADDR
> reception. Furthermore, we need to be be safe against vring pages
> invalidates.
> 
> This patch introduces a new access_ok flag per virtqueue, which is set
> when all the rings are mapped, and cleared as soon as a page used by a
> ring is invalidated. The invalidation part is implemented in a following
> patch.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/librte_vhost/vhost.c      | 37 ++++++++++++++++++++++++++
>  lib/librte_vhost/vhost.h      |  2 ++
>  lib/librte_vhost/vhost_user.c | 62 +++++++++++++++++++++++++++++++--
> ----------
>  lib/librte_vhost/virtio_net.c | 60 +++++++++++++++++++++++++-------------
> ---
>  4 files changed, 121 insertions(+), 40 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
> index 0e2ad3322..ef54835a6 100644
> --- a/lib/librte_vhost/vhost.c
> +++ b/lib/librte_vhost/vhost.c
> @@ -135,6 +135,43 @@ free_device(struct virtio_net *dev)
>  	rte_free(dev);
>  }
> 
> +int
> +vring_translate(struct virtio_net *dev, struct vhost_virtqueue *vq)
> +{
> +	uint64_t size;
> +
> +	if (!(dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
> +		goto out;
> +
> +	size = sizeof(struct vring_desc) * vq->size;
> +	vq->desc = (struct vring_desc *)vhost_iova_to_vva(dev, vq,
> +						vq-
> >ring_addrs.desc_user_addr,
> +						size, VHOST_ACCESS_RW);
> +	if (!vq->desc)
> +		return -1;
> +
> +	size = sizeof(struct vring_avail);
> +	size += sizeof(uint16_t) * vq->size;
> +	vq->avail = (struct vring_avail *)vhost_iova_to_vva(dev, vq,
> +						vq-
> >ring_addrs.avail_user_addr,
> +						size, VHOST_ACCESS_RW);
> +	if (!vq->avail)
> +		return -1;
> +
> +	size = sizeof(struct vring_used);
> +	size += sizeof(struct vring_used_elem) * vq->size;
> +	vq->used = (struct vring_used *)vhost_iova_to_vva(dev, vq,
> +						vq-
> >ring_addrs.used_user_addr,
> +						size, VHOST_ACCESS_RW);
> +	if (!vq->used)
> +		return -1;
> +
> +out:
> +	vq->access_ok = 1;
> +
> +	return 0;
> +}
> +
>  static void
>  init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
>  {
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index 903da5db5..b3fe6bb8e 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -113,6 +113,7 @@ struct vhost_virtqueue {
>  	/* Currently unused as polling mode is enabled */
>  	int			kickfd;
>  	int			enabled;
> +	int			access_ok;
> 
>  	/* Physical address of used ring, for logging */
>  	uint64_t		log_guest_addr;
> @@ -378,6 +379,7 @@ void vhost_backend_cleanup(struct virtio_net *dev);
> 
>  uint64_t __vhost_iova_to_vva(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
>  			uint64_t iova, uint64_t size, uint8_t perm);
> +int vring_translate(struct virtio_net *dev, struct vhost_virtqueue *vq);
> 
>  static __rte_always_inline uint64_t
>  vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index 90b209764..dd6562fd8 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -391,6 +391,12 @@ vhost_user_set_vring_addr(struct virtio_net *dev,
> VhostUserMsg *msg)
>  	 */
>  	memcpy(&vq->ring_addrs, addr, sizeof(*addr));
> 
> +	vq->desc = NULL;
> +	vq->avail = NULL;
> +	vq->used = NULL;
> +
> +	vq->access_ok = 0;
> +
>  	return 0;
>  }
> 
> @@ -407,10 +413,10 @@ static struct virtio_net
> *translate_ring_addresses(struct virtio_net *dev,
>  	vq->desc = (struct vring_desc *)(uintptr_t)ring_addr_to_vva(dev,
>  			vq, addr->desc_user_addr, sizeof(struct vring_desc));
>  	if (vq->desc == 0) {
> -		RTE_LOG(ERR, VHOST_CONFIG,
> +		RTE_LOG(DEBUG, VHOST_CONFIG,
>  			"(%d) failed to find desc ring address.\n",
>  			dev->vid);
> -		return NULL;
> +		return dev;
>  	}
> 
>  	dev = numa_realloc(dev, vq_index);
> @@ -419,19 +425,19 @@ static struct virtio_net
> *translate_ring_addresses(struct virtio_net *dev,
>  	vq->avail = (struct vring_avail *)(uintptr_t)ring_addr_to_vva(dev,
>  			vq, addr->avail_user_addr, sizeof(struct vring_avail));
>  	if (vq->avail == 0) {
> -		RTE_LOG(ERR, VHOST_CONFIG,
> +		RTE_LOG(DEBUG, VHOST_CONFIG,
>  			"(%d) failed to find avail ring address.\n",
>  			dev->vid);
> -		return NULL;
> +		return dev;
>  	}
> 
>  	vq->used = (struct vring_used *)(uintptr_t)ring_addr_to_vva(dev,
>  			vq, addr->used_user_addr, sizeof(struct
> vring_used));
>  	if (vq->used == 0) {
> -		RTE_LOG(ERR, VHOST_CONFIG,
> +		RTE_LOG(DEBUG, VHOST_CONFIG,
>  			"(%d) failed to find used ring address.\n",
>  			dev->vid);
> -		return NULL;
> +		return dev;
>  	}
> 
>  	if (vq->last_used_idx != vq->used->idx) {
> @@ -677,7 +683,7 @@ vhost_user_set_mem_table(struct virtio_net *dev,
> struct VhostUserMsg *pmsg)
>  static int
>  vq_is_ready(struct vhost_virtqueue *vq)
>  {
> -	return vq && vq->desc   &&
> +	return vq && vq->desc && vq->avail && vq->used &&
>  	       vq->kickfd != VIRTIO_UNINITIALIZED_EVENTFD &&
>  	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;
>  }
> @@ -986,8 +992,29 @@ vhost_user_set_req_fd(struct virtio_net *dev,
> struct VhostUserMsg *msg)
>  }
> 
>  static int
> -vhost_user_iotlb_msg(struct virtio_net *dev, struct VhostUserMsg *msg)
> +is_vring_iotlb_update(struct vhost_virtqueue *vq, struct vhost_iotlb_msg
> *imsg)
>  {
> +	struct vhost_vring_addr *ra;
> +	uint64_t start, end;
> +
> +	start = imsg->iova;
> +	end = start + imsg->size;
> +
> +	ra = &vq->ring_addrs;
> +	if (ra->desc_user_addr >= start && ra->desc_user_addr < end)
> +		return 1;
> +	if (ra->avail_user_addr >= start && ra->avail_user_addr < end)
> +		return 1;
> +	if (ra->used_user_addr >= start && ra->used_user_addr < end)
> +		return 1;
> +
> +	return 0;
> +}
> +
> +static int
> +vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg)
> +{
> +	struct virtio_net *dev = *pdev;
>  	struct vhost_iotlb_msg *imsg = &msg->payload.iotlb;
>  	uint16_t i;
>  	uint64_t vva;
> @@ -1003,6 +1030,9 @@ vhost_user_iotlb_msg(struct virtio_net *dev,
> struct VhostUserMsg *msg)
> 
>  			vhost_user_iotlb_cache_insert(vq, imsg->iova, vva,
>  					imsg->size, imsg->perm);
> +
> +			if (is_vring_iotlb_update(vq, imsg))
> +				*pdev = dev = translate_ring_addresses(dev,
> i);
>  		}
>  		break;
>  	case VHOST_IOTLB_INVALIDATE:
> @@ -1151,8 +1181,12 @@ vhost_user_msg_handler(int vid, int fd)
>  	}
> 
>  	ret = 0;
> -	RTE_LOG(INFO, VHOST_CONFIG, "read message %s\n",
> -		vhost_message_str[msg.request]);
> +	if (msg.request != VHOST_USER_IOTLB_MSG)
> +		RTE_LOG(INFO, VHOST_CONFIG, "read message %s\n",
> +			vhost_message_str[msg.request]);
> +	else
> +		RTE_LOG(DEBUG, VHOST_CONFIG, "read message %s\n",
> +			vhost_message_str[msg.request]);
> 
>  	ret = vhost_user_check_and_alloc_queue_pair(dev, &msg);
>  	if (ret < 0) {
> @@ -1254,7 +1288,7 @@ vhost_user_msg_handler(int vid, int fd)
>  		break;
> 
>  	case VHOST_USER_IOTLB_MSG:
> -		ret = vhost_user_iotlb_msg(dev, &msg);
> +		ret = vhost_user_iotlb_msg(&dev, &msg);
>  		break;
> 
>  	default:
> @@ -1263,12 +1297,6 @@ vhost_user_msg_handler(int vid, int fd)
> 
>  	}
> 
> -	/*
> -	 * The virtio_net struct might have been reallocated on a different
> -	 * NUMA node, so dev pointer might no more be valid.
> -	 */
> -	dev = get_device(vid);
> -
>  	if (msg.flags & VHOST_USER_NEED_REPLY) {
>  		msg.payload.u64 = !!ret;
>  		msg.size = sizeof(msg.payload.u64);
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index cdfb6f957..b75c93cf1 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -329,13 +329,23 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t
> queue_id,
>  	if (unlikely(vq->enabled == 0))
>  		return 0;
> 
> +	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> +		vhost_user_iotlb_rd_lock(vq);
> +
> +	if (unlikely(vq->access_ok == 0)) {
> +		if (unlikely(vring_translate(dev, vq) < 0)) {
> +			count = 0;
> +			goto out;
> +		}
> +	}
> +
>  	avail_idx = *((volatile uint16_t *)&vq->avail->idx);
>  	start_idx = vq->last_used_idx;
>  	free_entries = avail_idx - start_idx;
>  	count = RTE_MIN(count, free_entries);
>  	count = RTE_MIN(count, (uint32_t)MAX_PKT_BURST);
>  	if (count == 0)
> -		return 0;
> +		goto out;
> 
>  	LOG_DEBUG(VHOST_DATA, "(%d) start_idx %d | end_idx %d\n",
>  		dev->vid, start_idx, start_idx + count);
> @@ -356,10 +366,6 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t
> queue_id,
>  	}
> 
>  	rte_prefetch0(&vq->desc[desc_indexes[0]]);
> -
> -	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> -		vhost_user_iotlb_rd_lock(vq);
> -
>  	for (i = 0; i < count; i++) {
>  		uint16_t desc_idx = desc_indexes[i];
>  		int err;
> @@ -394,9 +400,6 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t
> queue_id,
> 
>  	do_data_copy_enqueue(dev, vq);
> 
> -	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> -		vhost_user_iotlb_rd_unlock(vq);
> -
>  	rte_smp_wmb();
> 
>  	*(volatile uint16_t *)&vq->used->idx += count;
> @@ -412,6 +415,10 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t
> queue_id,
>  	if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
>  			&& (vq->callfd >= 0))
>  		eventfd_write(vq->callfd, (eventfd_t)1);
> +out:
> +	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> +		vhost_user_iotlb_rd_unlock(vq);
> +
>  	return count;
>  }
> 
> @@ -647,9 +654,16 @@ virtio_dev_merge_rx(struct virtio_net *dev,
> uint16_t queue_id,
>  	if (unlikely(vq->enabled == 0))
>  		return 0;
> 
> +	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> +		vhost_user_iotlb_rd_lock(vq);
> +
> +	if (unlikely(vq->access_ok == 0))
> +		if (unlikely(vring_translate(dev, vq) < 0))
> +			goto out;
> +
>  	count = RTE_MIN((uint32_t)MAX_PKT_BURST, count);
>  	if (count == 0)
> -		return 0;
> +		goto out;
> 
>  	vq->batch_copy_nb_elems = 0;
> 
> @@ -657,10 +671,6 @@ virtio_dev_merge_rx(struct virtio_net *dev,
> uint16_t queue_id,
> 
>  	vq->shadow_used_idx = 0;
>  	avail_head = *((volatile uint16_t *)&vq->avail->idx);
> -
> -	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> -		vhost_user_iotlb_rd_lock(vq);
> -
>  	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
>  		uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev-
> >vhost_hlen;
> 
> @@ -689,9 +699,6 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t
> queue_id,
> 
>  	do_data_copy_enqueue(dev, vq);
> 
> -	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> -		vhost_user_iotlb_rd_unlock(vq);
> -
>  	if (likely(vq->shadow_used_idx)) {
>  		flush_shadow_used_ring(dev, vq);
> 
> @@ -704,6 +711,10 @@ virtio_dev_merge_rx(struct virtio_net *dev,
> uint16_t queue_id,
>  			eventfd_write(vq->callfd, (eventfd_t)1);
>  	}
> 
> +out:
> +	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> +		vhost_user_iotlb_rd_unlock(vq);
> +
>  	return pkt_idx;
>  }
> 
> @@ -1173,6 +1184,13 @@ rte_vhost_dequeue_burst(int vid, uint16_t
> queue_id,
> 
>  	vq->batch_copy_nb_elems = 0;
> 
> +	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> +		vhost_user_iotlb_rd_lock(vq);
> +
> +	if (unlikely(vq->access_ok == 0))
> +		if (unlikely(vring_translate(dev, vq) < 0))
> +			goto out;
> +
>  	if (unlikely(dev->dequeue_zero_copy)) {
>  		struct zcopy_mbuf *zmbuf, *next;
>  		int nr_updated = 0;
> @@ -1262,10 +1280,6 @@ rte_vhost_dequeue_burst(int vid, uint16_t
> queue_id,
> 
>  	/* Prefetch descriptor index. */
>  	rte_prefetch0(&vq->desc[desc_indexes[0]]);
> -
> -	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> -		vhost_user_iotlb_rd_lock(vq);
> -
>  	for (i = 0; i < count; i++) {
>  		struct vring_desc *desc;
>  		uint16_t sz, idx;
> @@ -1329,9 +1343,6 @@ rte_vhost_dequeue_burst(int vid, uint16_t
> queue_id,
>  			TAILQ_INSERT_TAIL(&vq->zmbuf_list, zmbuf, next);
>  		}
>  	}
> -	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> -		vhost_user_iotlb_rd_unlock(vq);
> -
>  	vq->last_avail_idx += i;
> 
>  	if (likely(dev->dequeue_zero_copy == 0)) {
> @@ -1341,6 +1352,9 @@ rte_vhost_dequeue_burst(int vid, uint16_t
> queue_id,
>  	}
> 
>  out:
> +	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> +		vhost_user_iotlb_rd_unlock(vq);
> +
>  	if (unlikely(rarp_mbuf != NULL)) {
>  		/*
>  		 * Inject it to the head of "pkts" array, so that switch's mac
> --
> 2.13.6
Hi, Maxime
I met one issue with your patch set during the v17.11 test.
The test scenario is following, 
1.	Bind one NIC, use test-pmd set vhost-user with 2 queue
usertools/dpdk-devbind.py --bind=igb_uio 0000:05:00.0
./x86_64-native-linuxapp-gcc/app/testpmd -c 0xe -n 4 --socket-mem 1024,1024 \
--vdev 'net_vhost0,iface=vhost-net,queues=2' - -i --rxq=2 --txq=2 --nb-cores=2 --rss-ip
2.	Launch qemu with  virtio device which has 2 queue 
3.	In VM, launch testpmd with virtio-pmd using only 1 queue.
x86_64-native-linuxapp-gcc/app/testpmd -c 0x07 -n 3 - -i --txqflags=0xf01 \
--rxq=1 --txq=1 --rss-ip --nb-cores=1
First, 
commit 09927b5249694bad1c094d3068124673722e6b8f
vhost: translate ring addresses when IOMMU enabled
The patch causes no traffic in PVP test. but link status is still up in vhost-user.
Second, 
eefac9536a901a1f0bb52aa3b6fec8f375f09190 
vhost: postpone device creation until rings are mapped
The patch causes link status "down" in vhost-user.
Could you have a check at your side? Thanks.
BRs
Lei
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 17/19] vhost-user: iommu: postpone device creation until ring are mapped
  2017-11-02  7:21   ` Yao, Lei A
@ 2017-11-02  8:21     ` Maxime Coquelin
  2017-11-02 16:02       ` Maxime Coquelin
  0 siblings, 1 reply; 48+ messages in thread
From: Maxime Coquelin @ 2017-11-02  8:21 UTC (permalink / raw)
  To: Yao, Lei A, dev@dpdk.org, Horton, Remy, Bie, Tiwei,
	yliu@fridaylinux.org
  Cc: mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com
Hi Lei,
On 11/02/2017 08:21 AM, Yao, Lei A wrote:
> 
...
> Hi, Maxime > I met one issue with your patch set during the v17.11 test.
Is it with v17.11-rc2 or -rc1?
> The test scenario is following,
> 1.	Bind one NIC, use test-pmd set vhost-user with 2 queue
> usertools/dpdk-devbind.py --bind=igb_uio 0000:05:00.0
> ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xe -n 4 --socket-mem 1024,1024 \
> --vdev 'net_vhost0,iface=vhost-net,queues=2' - -i --rxq=2 --txq=2 --nb-cores=2 --rss-ip
> 2.	Launch qemu with  virtio device which has 2 queue
> 3.	In VM, launch testpmd with virtio-pmd using only 1 queue.
> x86_64-native-linuxapp-gcc/app/testpmd -c 0x07 -n 3 - -i --txqflags=0xf01 \
> --rxq=1 --txq=1 --rss-ip --nb-cores=1
> 
> First,
> commit 09927b5249694bad1c094d3068124673722e6b8f
> vhost: translate ring addresses when IOMMU enabled
> The patch causes no traffic in PVP test. but link status is still up in vhost-user.
> 
> Second,
> eefac9536a901a1f0bb52aa3b6fec8f375f09190
> vhost: postpone device creation until rings are mapped
> The patch causes link status "down" in vhost-user.
> 
> Could you have a check at your side? Thanks.
Sure, could you please on your side provide more info?
1. Host testpmd logs
2. Qemu version and cmdline
Thanks,
Maxime
> BRs
> Lei
> 
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup"
  2017-11-01 17:11   ` Kavanagh, Mark B
@ 2017-11-02  9:40     ` Maxime Coquelin
  2017-11-03 13:05       ` Yuanhan Liu
  2017-11-03 15:34       ` Thomas Monjalon
  0 siblings, 2 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-11-02  9:40 UTC (permalink / raw)
  To: Kavanagh, Mark B, yliu@fridaylinux.org
  Cc: dev@dpdk.org, Horton, Remy, Bie, Tiwei, mst@redhat.com,
	jfreiman@redhat.com, vkaplans@redhat.com, jasowang@redhat.com,
	Mcnamara, John, Loftus, Ciara, Stokes, Ian
Hi Mark,
On 11/01/2017 06:11 PM, Kavanagh, Mark B wrote:
> Hi Maxime,
> 
> First off, apologies for the lateness of this reply - I realize that this patch has already been upstreamed.
No worries, great to see DPDK integration being tested before v17.11 is
released. Is the v17.11 upgrade patch available somewhere?
> Unfortunately, during OvS-DPDK regression testing for DPDK v17.11-rc2 just today, a regression involving vHost multiq was detected, and pinpointed to this patch.
> 
> Version info for the components involved during the aforementioned testing is as follows:
> DPDK:	v17.11-rc2
> OvS:	af2e40c ("sparse: eliminate "duplicate initialization") + DPDK v17.11 upgrade patch
> QEMU:	v2.7.0
> 
> The regression may be reproduced as follows:
> - Set up OvS-DPDK as normal, and add one vHostUser client port:
> 	$OVS_DIR/utilities/ovs-vsctl add-port br0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuserclient
>   
> - Start QEMU, specifying 2 ports for the guest interface"
> 	$QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \
>      -cpu host -enable-kvm -m 4096M \
>      -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on \
>      -numa node,memdev=mem -mem-prealloc \
>      -drive file=$VM_IMAGE \
>      -chardev socket,id=char0,path=/tmp/sock0,server \
>      -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce,queues=2 \
>      -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off,mq=on,vectors=6 \
>      -nographic"
>   
> - The guest subsequently starts as normal, but then hangs completely, rendering it completely unusable.
>    The last lines of ovs-vswitchd.log read as follows:
> 	|00051|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
> 	|00052|dpdk|INFO|VHOST_CONFIG: set queue enable: 1 to qp idx: 1
> 	|00053|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
> 
> Moving from QEMU v2.7.0 to v2.10.0 resolves the issue. However, herein lies the issue: QEMU v2.10.0 was only released in August of this year; anecdotally, we know that many OvS-DPDK customers use older versions of QEMU (typically, v2.7.0), and are likely un[able|willing] to move. With this patch, a hard dependency on QEMU v2.10 is created for users who want to use the vHU multiq feature in DPDK v17.11 (and subsequently, the upcoming OvS v2.9.0), which IMO will likely be unacceptable for many.
Do you mean that upstream Qemu v2.7.0 is used in production?
I would expect the customers to use a distro Qemu which should contain
relevant fixes, or follow upstream's stable branches.
FYI, Qemu v2.9.1 contains a backport of the fix.
> One potential solution to this problem is to introduce a compile-time option that would allow the user to [dis|en]able the VHOST_USER_PROTOCOL_F_REPLY_ACK feature - is that something that would be acceptable to you Maxime?
Yes, that's one option, but:
1. VHOST_USER_PROTOCOL_F_REPLY_ACK enabled should be the default
2. VHOST_USER_PROTOCOL_F_REPLY_ACK disabled will be less extensively
tested.
Yuanhan, what do you think?
Regards,
Maxime
> Thanks in advance,
> Mark
> 
> 
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Maxime Coquelin
>> Sent: Thursday, October 5, 2017 9:36 AM
>> To: dev@dpdk.org; Horton, Remy <remy.horton@intel.com>; Bie, Tiwei
>> <tiwei.bie@intel.com>; yliu@fridaylinux.org
>> Cc: mst@redhat.com; jfreiman@redhat.com; vkaplans@redhat.com;
>> jasowang@redhat.com; Maxime Coquelin <maxime.coquelin@redhat.com>
>> Subject: [dpdk-dev] [PATCH v3 01/19] Revert "vhost: workaround MQ fails to
>> startup"
>>
>> This reverts commit 04d81227960b5c1cf2f11f492100979ead20c526.
>>
>> As agreed when this workaround was introduced, it can be reverted
>> as Qemu v2.10 that fixes the issue is now out.
>>
>> The reply-ack feature is required for vhost-user IOMMU support.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>> lib/librte_vhost/vhost_user.h | 6 +-----
>> 1 file changed, 1 insertion(+), 5 deletions(-)
>>
>> diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
>> index 35ebd7190..2ba22dbb0 100644
>> --- a/lib/librte_vhost/vhost_user.h
>> +++ b/lib/librte_vhost/vhost_user.h
>> @@ -49,14 +49,10 @@
>> #define VHOST_USER_PROTOCOL_F_REPLY_ACK	3
>> #define VHOST_USER_PROTOCOL_F_NET_MTU 4
>>
>> -/*
>> - * disable REPLY_ACK feature to workaround the buggy QEMU implementation.
>> - * Proved buggy QEMU includes v2.7 - v2.9.
>> - */
>> #define VHOST_USER_PROTOCOL_FEATURES	((1ULL << VHOST_USER_PROTOCOL_F_MQ) | \
>> 					 (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) |\
>> 					 (1ULL << VHOST_USER_PROTOCOL_F_RARP) | \
>> -					 (0ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \
>> +					 (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \
>> 					 (1ULL << VHOST_USER_PROTOCOL_F_NET_MTU))
>>
>> typedef enum VhostUserRequest {
>> --
>> 2.13.6
> 
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 17/19] vhost-user: iommu: postpone device creation until ring are mapped
  2017-11-02  8:21     ` Maxime Coquelin
@ 2017-11-02 16:02       ` Maxime Coquelin
  2017-11-03  8:25         ` Maxime Coquelin
  0 siblings, 1 reply; 48+ messages in thread
From: Maxime Coquelin @ 2017-11-02 16:02 UTC (permalink / raw)
  To: Yao, Lei A, dev@dpdk.org, yliu@fridaylinux.org
  Cc: Horton, Remy, Bie, Tiwei, mst@redhat.com, jfreiman@redhat.com,
	vkaplans@redhat.com, jasowang@redhat.com
On 11/02/2017 09:21 AM, Maxime Coquelin wrote:
> Hi Lei,
> 
> On 11/02/2017 08:21 AM, Yao, Lei A wrote:
>>
> ...
>> Hi, Maxime > I met one issue with your patch set during the v17.11 test.
> 
> Is it with v17.11-rc2 or -rc1?
> 
>> The test scenario is following,
>> 1.    Bind one NIC, use test-pmd set vhost-user with 2 queue
>> usertools/dpdk-devbind.py --bind=igb_uio 0000:05:00.0
>> ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xe -n 4 --socket-mem 
>> 1024,1024 \
>> --vdev 'net_vhost0,iface=vhost-net,queues=2' - -i --rxq=2 --txq=2 
>> --nb-cores=2 --rss-ip
>> 2.    Launch qemu with  virtio device which has 2 queue
>> 3.    In VM, launch testpmd with virtio-pmd using only 1 queue.
>> x86_64-native-linuxapp-gcc/app/testpmd -c 0x07 -n 3 - -i 
>> --txqflags=0xf01 \
>> --rxq=1 --txq=1 --rss-ip --nb-cores=1
>>
>> First,
>> commit 09927b5249694bad1c094d3068124673722e6b8f
>> vhost: translate ring addresses when IOMMU enabled
>> The patch causes no traffic in PVP test. but link status is still up 
>> in vhost-user.
>>
>> Second,
>> eefac9536a901a1f0bb52aa3b6fec8f375f09190
>> vhost: postpone device creation until rings are mapped
>> The patch causes link status "down" in vhost-user.
I reproduced this one, and understand why link status remains down.
My series did fixed a potential issue Michael raised, that the vring
addresses should only interpreted once the ring is enabled.
When VHOST_USER_F_PROTOCOL_FEATURES is negotiated, the rings addrs are
translated when ring is enabled via VHOST_USER_SET_VRING_ENABLE.
When not negotiated, the ring is considered started enabled, so
translation is done at VHOST_USER_SET_VRING_KICK time.
In your case, protocol features are negotiated, so the ring addresses
are translated at enable time. The problem is that the code considers
the device is ready once addresses for all the rings are translated.
But since only the first pair of rings is used, it never happens, and
the link remains down.
One of the reason this check is done is to avoid starting the PMD
threads before the addresses are translated in case of NUMA
reallocation, as virtqueues and virtio-net device structs can be
reallocated on a different node.
I think the right fix would be to only perform NUMA reallocation for
vring 0, as today we would end-up reallocating virtio-net struct
mulitple time if VQs are on different NUMA nodes.
Doing that, we could then consider the device is ready if vring 0 is
enabled and its ring addresses are translated, and if other vrings have
been kicked.
I'll post a patch shortly implementing this idea.
Thanks,
Maxime
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 17/19] vhost-user: iommu: postpone device creation until ring are mapped
  2017-11-02 16:02       ` Maxime Coquelin
@ 2017-11-03  8:25         ` Maxime Coquelin
  2017-11-03 15:15           ` Michael S. Tsirkin
  0 siblings, 1 reply; 48+ messages in thread
From: Maxime Coquelin @ 2017-11-03  8:25 UTC (permalink / raw)
  To: Yao, Lei A, dev@dpdk.org, yliu@fridaylinux.org
  Cc: Horton, Remy, Bie, Tiwei, mst@redhat.com, jfreiman@redhat.com,
	vkaplans@redhat.com, jasowang@redhat.com
On 11/02/2017 05:02 PM, Maxime Coquelin wrote:
> 
> 
> On 11/02/2017 09:21 AM, Maxime Coquelin wrote:
>> Hi Lei,
>>
>> On 11/02/2017 08:21 AM, Yao, Lei A wrote:
>>>
>> ...
>>> Hi, Maxime > I met one issue with your patch set during the v17.11 test.
>>
>> Is it with v17.11-rc2 or -rc1?
>>
>>> The test scenario is following,
>>> 1.    Bind one NIC, use test-pmd set vhost-user with 2 queue
>>> usertools/dpdk-devbind.py --bind=igb_uio 0000:05:00.0
>>> ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xe -n 4 --socket-mem 
>>> 1024,1024 \
>>> --vdev 'net_vhost0,iface=vhost-net,queues=2' - -i --rxq=2 --txq=2 
>>> --nb-cores=2 --rss-ip
>>> 2.    Launch qemu with  virtio device which has 2 queue
>>> 3.    In VM, launch testpmd with virtio-pmd using only 1 queue.
>>> x86_64-native-linuxapp-gcc/app/testpmd -c 0x07 -n 3 - -i 
>>> --txqflags=0xf01 \
>>> --rxq=1 --txq=1 --rss-ip --nb-cores=1
>>>
>>> First,
>>> commit 09927b5249694bad1c094d3068124673722e6b8f
>>> vhost: translate ring addresses when IOMMU enabled
>>> The patch causes no traffic in PVP test. but link status is still up 
>>> in vhost-user.
>>>
>>> Second,
>>> eefac9536a901a1f0bb52aa3b6fec8f375f09190
>>> vhost: postpone device creation until rings are mapped
>>> The patch causes link status "down" in vhost-user.
> 
> I reproduced this one, and understand why link status remains down.
> My series did fixed a potential issue Michael raised, that the vring
> addresses should only interpreted once the ring is enabled.
> When VHOST_USER_F_PROTOCOL_FEATURES is negotiated, the rings addrs are
> translated when ring is enabled via VHOST_USER_SET_VRING_ENABLE.
> When not negotiated, the ring is considered started enabled, so
> translation is done at VHOST_USER_SET_VRING_KICK time.
> 
> In your case, protocol features are negotiated, so the ring addresses
> are translated at enable time. The problem is that the code considers
> the device is ready once addresses for all the rings are translated.
> But since only the first pair of rings is used, it never happens, and
> the link remains down.
> 
> One of the reason this check is done is to avoid starting the PMD
> threads before the addresses are translated in case of NUMA
> reallocation, as virtqueues and virtio-net device structs can be
> reallocated on a different node.
> 
> I think the right fix would be to only perform NUMA reallocation for
> vring 0, as today we would end-up reallocating virtio-net struct
> mulitple time if VQs are on different NUMA nodes.
> 
> Doing that, we could then consider the device is ready if vring 0 is
> enabled and its ring addresses are translated, and if other vrings have
> been kicked.
> 
> I'll post a patch shortly implementing this idea.
The proposed solution doesn't work, because disabled queues get accessed 
at device start time:
int
rte_vhost_enable_guest_notification(int vid, uint16_t queue_id, int enable)
{
..
	dev->virtqueue[queue_id]->used->flags = VRING_USED_F_NO_NOTIFY;
	return 0;
}
The above function being called in Vhost PMD for every queues, enabled
or not. While we could fix the PMD, it could break other applications
using the Vhost lib API directly, so we cannot translate at enable
time reliably.
I think we may be a bit less conservative, and postpone addresses
translation at kick time, whatever VHOST_USER_F_PROTOCOL_FEATURES is
negotiated or not.
Regards,
Maxime
> Thanks,
> Maxime
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup"
  2017-11-02  9:40     ` Maxime Coquelin
@ 2017-11-03 13:05       ` Yuanhan Liu
  2017-11-03 14:28         ` Maxime Coquelin
  2017-11-03 15:34       ` Thomas Monjalon
  1 sibling, 1 reply; 48+ messages in thread
From: Yuanhan Liu @ 2017-11-03 13:05 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Kavanagh, Mark B, dev@dpdk.org, Horton, Remy, Bie, Tiwei,
	mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com, Mcnamara, John, Loftus, Ciara, Stokes, Ian,
	Thomas Monjalon
On Thu, Nov 02, 2017 at 10:40:26AM +0100, Maxime Coquelin wrote:
> >Moving from QEMU v2.7.0 to v2.10.0 resolves the issue. However, herein lies the issue: QEMU v2.10.0 was only released in August of this year; anecdotally, we know that many OvS-DPDK customers use older versions of QEMU (typically, v2.7.0), and are likely un[able|willing] to move. With this patch, a hard dependency on QEMU v2.10 is created for users who want to use the vHU multiq feature in DPDK v17.11 (and subsequently, the upcoming OvS v2.9.0), which IMO will likely be unacceptable for many.
> 
> Do you mean that upstream Qemu v2.7.0 is used in production?
> I would expect the customers to use a distro Qemu which should contain
> relevant fixes, or follow upstream's stable branches.
> 
> FYI, Qemu v2.9.1 contains a backport of the fix.
> 
> >One potential solution to this problem is to introduce a compile-time option that would allow the user to [dis|en]able the VHOST_USER_PROTOCOL_F_REPLY_ACK feature - is that something that would be acceptable to you Maxime?
> 
> Yes, that's one option, but:
> 1. VHOST_USER_PROTOCOL_F_REPLY_ACK enabled should be the default
> 2. VHOST_USER_PROTOCOL_F_REPLY_ACK disabled will be less extensively
> tested.
> 
> Yuanhan, what do you think?
My suggestion is to still disable it by default. Qemu 2.7 - 2.9 (inclusive)
is a pretty big range, that I think quite many people would hit this issue.
	--yliu
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup"
  2017-11-03 13:05       ` Yuanhan Liu
@ 2017-11-03 14:28         ` Maxime Coquelin
  2017-11-06 12:00           ` Yuanhan Liu
  0 siblings, 1 reply; 48+ messages in thread
From: Maxime Coquelin @ 2017-11-03 14:28 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: Kavanagh, Mark B, dev@dpdk.org, Horton, Remy, Bie, Tiwei,
	mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com, Mcnamara, John, Loftus, Ciara, Stokes, Ian,
	Thomas Monjalon
On 11/03/2017 02:05 PM, Yuanhan Liu wrote:
> On Thu, Nov 02, 2017 at 10:40:26AM +0100, Maxime Coquelin wrote:
>>> Moving from QEMU v2.7.0 to v2.10.0 resolves the issue. However, herein lies the issue: QEMU v2.10.0 was only released in August of this year; anecdotally, we know that many OvS-DPDK customers use older versions of QEMU (typically, v2.7.0), and are likely un[able|willing] to move. With this patch, a hard dependency on QEMU v2.10 is created for users who want to use the vHU multiq feature in DPDK v17.11 (and subsequently, the upcoming OvS v2.9.0), which IMO will likely be unacceptable for many.
>>
>> Do you mean that upstream Qemu v2.7.0 is used in production?
>> I would expect the customers to use a distro Qemu which should contain
>> relevant fixes, or follow upstream's stable branches.
>>
>> FYI, Qemu v2.9.1 contains a backport of the fix.
>>
>>> One potential solution to this problem is to introduce a compile-time option that would allow the user to [dis|en]able the VHOST_USER_PROTOCOL_F_REPLY_ACK feature - is that something that would be acceptable to you Maxime?
>>
>> Yes, that's one option, but:
>> 1. VHOST_USER_PROTOCOL_F_REPLY_ACK enabled should be the default
>> 2. VHOST_USER_PROTOCOL_F_REPLY_ACK disabled will be less extensively
>> tested.
>>
>> Yuanhan, what do you think?
> 
> My suggestion is to still disable it by default. Qemu 2.7 - 2.9 (inclusive)
> is a pretty big range, that I think quite many people would hit this issue
Ok, then what about adding a new flag to rte_vhost_driver_register(), as
done for tx zero copy to enable IOMMU feature?
If flag is unset, then we mask out both IOMMU virtio feature flag and
REPLY_ACK protocol feature flag.
For a while this flag will be unset by default, not to break these
deprecated and unmaintained Qemu versions. But I think at some point
we should make it enabled by default, as it would be sad not to benefit
from this security feature.
This change will have an impact on OVS, as it will need a new vhost-user
port option to enable IOMMU feature. Thing that is transparent to OVS
currently.
Mark, Yuanhan, does that sound good to you?
Maxime
> 	--yliu
> 
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 17/19] vhost-user: iommu: postpone device creation until ring are mapped
  2017-11-03  8:25         ` Maxime Coquelin
@ 2017-11-03 15:15           ` Michael S. Tsirkin
  2017-11-03 15:54             ` Maxime Coquelin
  0 siblings, 1 reply; 48+ messages in thread
From: Michael S. Tsirkin @ 2017-11-03 15:15 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Yao, Lei A, dev@dpdk.org, yliu@fridaylinux.org, Horton, Remy,
	Bie, Tiwei, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com
On Fri, Nov 03, 2017 at 09:25:58AM +0100, Maxime Coquelin wrote:
> 
> 
> On 11/02/2017 05:02 PM, Maxime Coquelin wrote:
> > 
> > 
> > On 11/02/2017 09:21 AM, Maxime Coquelin wrote:
> > > Hi Lei,
> > > 
> > > On 11/02/2017 08:21 AM, Yao, Lei A wrote:
> > > > 
> > > ...
> > > > Hi, Maxime > I met one issue with your patch set during the v17.11 test.
> > > 
> > > Is it with v17.11-rc2 or -rc1?
> > > 
> > > > The test scenario is following,
> > > > 1.    Bind one NIC, use test-pmd set vhost-user with 2 queue
> > > > usertools/dpdk-devbind.py --bind=igb_uio 0000:05:00.0
> > > > ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xe -n 4
> > > > --socket-mem 1024,1024 \
> > > > --vdev 'net_vhost0,iface=vhost-net,queues=2' - -i --rxq=2
> > > > --txq=2 --nb-cores=2 --rss-ip
> > > > 2.    Launch qemu with  virtio device which has 2 queue
> > > > 3.    In VM, launch testpmd with virtio-pmd using only 1 queue.
> > > > x86_64-native-linuxapp-gcc/app/testpmd -c 0x07 -n 3 - -i
> > > > --txqflags=0xf01 \
> > > > --rxq=1 --txq=1 --rss-ip --nb-cores=1
> > > > 
> > > > First,
> > > > commit 09927b5249694bad1c094d3068124673722e6b8f
> > > > vhost: translate ring addresses when IOMMU enabled
> > > > The patch causes no traffic in PVP test. but link status is
> > > > still up in vhost-user.
> > > > 
> > > > Second,
> > > > eefac9536a901a1f0bb52aa3b6fec8f375f09190
> > > > vhost: postpone device creation until rings are mapped
> > > > The patch causes link status "down" in vhost-user.
> > 
> > I reproduced this one, and understand why link status remains down.
> > My series did fixed a potential issue Michael raised, that the vring
> > addresses should only interpreted once the ring is enabled.
> > When VHOST_USER_F_PROTOCOL_FEATURES is negotiated, the rings addrs are
> > translated when ring is enabled via VHOST_USER_SET_VRING_ENABLE.
> > When not negotiated, the ring is considered started enabled, so
> > translation is done at VHOST_USER_SET_VRING_KICK time.
> > 
> > In your case, protocol features are negotiated, so the ring addresses
> > are translated at enable time. The problem is that the code considers
> > the device is ready once addresses for all the rings are translated.
> > But since only the first pair of rings is used, it never happens, and
> > the link remains down.
> > 
> > One of the reason this check is done is to avoid starting the PMD
> > threads before the addresses are translated in case of NUMA
> > reallocation, as virtqueues and virtio-net device structs can be
> > reallocated on a different node.
> > 
> > I think the right fix would be to only perform NUMA reallocation for
> > vring 0, as today we would end-up reallocating virtio-net struct
> > mulitple time if VQs are on different NUMA nodes.
> > 
> > Doing that, we could then consider the device is ready if vring 0 is
> > enabled and its ring addresses are translated, and if other vrings have
> > been kicked.
> > 
> > I'll post a patch shortly implementing this idea.
> 
> The proposed solution doesn't work, because disabled queues get accessed at
> device start time:
> 
> int
> rte_vhost_enable_guest_notification(int vid, uint16_t queue_id, int enable)
> {
> ..
> 	dev->virtqueue[queue_id]->used->flags = VRING_USED_F_NO_NOTIFY;
> 	return 0;
> }
> 
> The above function being called in Vhost PMD for every queues, enabled
> or not. While we could fix the PMD, it could break other applications
> using the Vhost lib API directly, so we cannot translate at enable
> time reliably.
> 
> I think we may be a bit less conservative, and postpone addresses
> translation at kick time, whatever VHOST_USER_F_PROTOCOL_FEATURES is
> negotiated or not.
> 
> Regards,
> Maxime
> 
> > Thanks,
> > Maxime
I agree, enabling has nothing to do with it.
The spec is quite explicit:
Client must only process each ring when it is started.
and
Client must start ring upon receiving a kick (that is, detecting that file
descriptor is readable) on the descriptor specified by
VHOST_USER_SET_VRING_KICK, and stop ring upon receiving
VHOST_USER_GET_VRING_BASE.
-- 
MST
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup"
  2017-11-02  9:40     ` Maxime Coquelin
  2017-11-03 13:05       ` Yuanhan Liu
@ 2017-11-03 15:34       ` Thomas Monjalon
  2017-11-03 16:31         ` Kavanagh, Mark B
  1 sibling, 1 reply; 48+ messages in thread
From: Thomas Monjalon @ 2017-11-03 15:34 UTC (permalink / raw)
  To: Kavanagh, Mark B, yliu@fridaylinux.org
  Cc: dev, Maxime Coquelin, Horton, Remy, Bie, Tiwei, mst@redhat.com,
	jfreiman@redhat.com, vkaplans@redhat.com, jasowang@redhat.com,
	Mcnamara, John, Loftus, Ciara, Stokes, Ian
02/11/2017 10:40, Maxime Coquelin:
> Hi Mark,
> 
> On 11/01/2017 06:11 PM, Kavanagh, Mark B wrote:
> > Hi Maxime,
> > 
> > First off, apologies for the lateness of this reply - I realize that this patch has already been upstreamed.
> 
> No worries, great to see DPDK integration being tested before v17.11 is
> released. Is the v17.11 upgrade patch available somewhere?
> 
> > Unfortunately, during OvS-DPDK regression testing for DPDK v17.11-rc2 just today, a regression involving vHost multiq was detected, and pinpointed to this patch.
> > 
> > Version info for the components involved during the aforementioned testing is as follows:
> > DPDK:	v17.11-rc2
> > OvS:	af2e40c ("sparse: eliminate "duplicate initialization") + DPDK v17.11 upgrade patch
> > QEMU:	v2.7.0
[...]
> > Moving from QEMU v2.7.0 to v2.10.0 resolves the issue. However, herein lies the issue: QEMU v2.10.0 was only released in August of this year; anecdotally, we know that many OvS-DPDK customers use older versions of QEMU (typically, v2.7.0), and are likely un[able|willing] to move. With this patch, a hard dependency on QEMU v2.10 is created for users who want to use the vHU multiq feature in DPDK v17.11 (and subsequently, the upcoming OvS v2.9.0), which IMO will likely be unacceptable for many.
> 
> Do you mean that upstream Qemu v2.7.0 is used in production?
> I would expect the customers to use a distro Qemu which should contain
> relevant fixes, or follow upstream's stable branches.
Me too, I would expect they integrate the fixes.
> FYI, Qemu v2.9.1 contains a backport of the fix.
But you know, some users do not want to upgrade anything in production,
as in the old time of hardware networking equipments.
Curiously, the case considered here seems to be users sticked to old Qemu
while willing to consider the switch as an upgradable software.
It is really really strange, but let's consider such constraint.
If I remember well, we have integrated the vhost multiqueue feature
as soon as a Qemu release was almost ready.
For the record, it was Qemu 2.5.0 (released in Dec 2015):
	https://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg02731.html
	https://wiki.qemu.org/ChangeLog/2.5#virtio
And it was supported in DPDK 2.2.0 (released one day before):
	http://dpdk.org/ml/archives/announce/2015-December/000073.html
	http://dpdk.org/doc/guides/rel_notes/release_2_2.html
Nowadays, Qemu 2.10 is ready to let us enable IOMMU support.
But you ask to wait more. How much time should we wait?
Is there a policy to agree or are we just waiting for someone to approve?
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 17/19] vhost-user: iommu: postpone device creation until ring are mapped
  2017-11-03 15:15           ` Michael S. Tsirkin
@ 2017-11-03 15:54             ` Maxime Coquelin
  0 siblings, 0 replies; 48+ messages in thread
From: Maxime Coquelin @ 2017-11-03 15:54 UTC (permalink / raw)
  To: Michael S. Tsirkin, Yao, Lei A
  Cc: dev@dpdk.org, yliu@fridaylinux.org, Horton, Remy, Bie, Tiwei,
	jfreiman@redhat.com, vkaplans@redhat.com, jasowang@redhat.com
On 11/03/2017 04:15 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 03, 2017 at 09:25:58AM +0100, Maxime Coquelin wrote:
>>
>>
>> On 11/02/2017 05:02 PM, Maxime Coquelin wrote:
>>>
>>>
>>> On 11/02/2017 09:21 AM, Maxime Coquelin wrote:
>>>> Hi Lei,
>>>>
>>>> On 11/02/2017 08:21 AM, Yao, Lei A wrote:
>>>>>
>>>> ...
>>>>> Hi, Maxime > I met one issue with your patch set during the v17.11 test.
>>>>
>>>> Is it with v17.11-rc2 or -rc1?
>>>>
>>>>> The test scenario is following,
>>>>> 1.    Bind one NIC, use test-pmd set vhost-user with 2 queue
>>>>> usertools/dpdk-devbind.py --bind=igb_uio 0000:05:00.0
>>>>> ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xe -n 4
>>>>> --socket-mem 1024,1024 \
>>>>> --vdev 'net_vhost0,iface=vhost-net,queues=2' - -i --rxq=2
>>>>> --txq=2 --nb-cores=2 --rss-ip
>>>>> 2.    Launch qemu with  virtio device which has 2 queue
>>>>> 3.    In VM, launch testpmd with virtio-pmd using only 1 queue.
>>>>> x86_64-native-linuxapp-gcc/app/testpmd -c 0x07 -n 3 - -i
>>>>> --txqflags=0xf01 \
>>>>> --rxq=1 --txq=1 --rss-ip --nb-cores=1
>>>>>
>>>>> First,
>>>>> commit 09927b5249694bad1c094d3068124673722e6b8f
>>>>> vhost: translate ring addresses when IOMMU enabled
>>>>> The patch causes no traffic in PVP test. but link status is
>>>>> still up in vhost-user.
>>>>>
>>>>> Second,
>>>>> eefac9536a901a1f0bb52aa3b6fec8f375f09190
>>>>> vhost: postpone device creation until rings are mapped
>>>>> The patch causes link status "down" in vhost-user.
>>>
>>> I reproduced this one, and understand why link status remains down.
>>> My series did fixed a potential issue Michael raised, that the vring
>>> addresses should only interpreted once the ring is enabled.
>>> When VHOST_USER_F_PROTOCOL_FEATURES is negotiated, the rings addrs are
>>> translated when ring is enabled via VHOST_USER_SET_VRING_ENABLE.
>>> When not negotiated, the ring is considered started enabled, so
>>> translation is done at VHOST_USER_SET_VRING_KICK time.
>>>
>>> In your case, protocol features are negotiated, so the ring addresses
>>> are translated at enable time. The problem is that the code considers
>>> the device is ready once addresses for all the rings are translated.
>>> But since only the first pair of rings is used, it never happens, and
>>> the link remains down.
>>>
>>> One of the reason this check is done is to avoid starting the PMD
>>> threads before the addresses are translated in case of NUMA
>>> reallocation, as virtqueues and virtio-net device structs can be
>>> reallocated on a different node.
>>>
>>> I think the right fix would be to only perform NUMA reallocation for
>>> vring 0, as today we would end-up reallocating virtio-net struct
>>> mulitple time if VQs are on different NUMA nodes.
>>>
>>> Doing that, we could then consider the device is ready if vring 0 is
>>> enabled and its ring addresses are translated, and if other vrings have
>>> been kicked.
>>>
>>> I'll post a patch shortly implementing this idea.
>>
>> The proposed solution doesn't work, because disabled queues get accessed at
>> device start time:
>>
>> int
>> rte_vhost_enable_guest_notification(int vid, uint16_t queue_id, int enable)
>> {
>> ..
>> 	dev->virtqueue[queue_id]->used->flags = VRING_USED_F_NO_NOTIFY;
>> 	return 0;
>> }
>>
>> The above function being called in Vhost PMD for every queues, enabled
>> or not. While we could fix the PMD, it could break other applications
>> using the Vhost lib API directly, so we cannot translate at enable
>> time reliably.
>>
>> I think we may be a bit less conservative, and postpone addresses
>> translation at kick time, whatever VHOST_USER_F_PROTOCOL_FEATURES is
>> negotiated or not.
>>
>> Regards,
>> Maxime
>>
>>> Thanks,
>>> Maxime
> 
> I agree, enabling has nothing to do with it.
> 
> The spec is quite explicit:
> 
> Client must only process each ring when it is started.
> 
> and
> 
> Client must start ring upon receiving a kick (that is, detecting that file
> descriptor is readable) on the descriptor specified by
> VHOST_USER_SET_VRING_KICK, and stop ring upon receiving
> VHOST_USER_GET_VRING_BASE.
> 
Thanks for the confirmation Michael, fix posted.
Maxime
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup"
  2017-11-03 15:34       ` Thomas Monjalon
@ 2017-11-03 16:31         ` Kavanagh, Mark B
  0 siblings, 0 replies; 48+ messages in thread
From: Kavanagh, Mark B @ 2017-11-03 16:31 UTC (permalink / raw)
  To: Thomas Monjalon, yliu@fridaylinux.org
  Cc: dev@dpdk.org, Maxime Coquelin, Horton, Remy, Bie, Tiwei,
	mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com, Mcnamara, John, Loftus, Ciara, Stokes, Ian
>From: Thomas Monjalon [mailto:thomas@monjalon.net]
>Sent: Friday, November 3, 2017 3:35 PM
>To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; yliu@fridaylinux.org
>Cc: dev@dpdk.org; Maxime Coquelin <maxime.coquelin@redhat.com>; Horton, Remy
><remy.horton@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>; mst@redhat.com;
>jfreiman@redhat.com; vkaplans@redhat.com; jasowang@redhat.com; Mcnamara, John
><john.mcnamara@intel.com>; Loftus, Ciara <ciara.loftus@intel.com>; Stokes, Ian
><ian.stokes@intel.com>
>Subject: Re: [dpdk-dev] [PATCH v3 01/19] Revert "vhost: workaround MQ fails to
>startup"
>
>02/11/2017 10:40, Maxime Coquelin:
>> Hi Mark,
>>
>> On 11/01/2017 06:11 PM, Kavanagh, Mark B wrote:
>> > Hi Maxime,
>> >
>> > First off, apologies for the lateness of this reply - I realize that this
>patch has already been upstreamed.
>>
>> No worries, great to see DPDK integration being tested before v17.11 is
>> released. Is the v17.11 upgrade patch available somewhere?
>>
>> > Unfortunately, during OvS-DPDK regression testing for DPDK v17.11-rc2 just
>today, a regression involving vHost multiq was detected, and pinpointed to
>this patch.
>> >
>> > Version info for the components involved during the aforementioned testing
>is as follows:
>> > DPDK:	v17.11-rc2
>> > OvS:	af2e40c ("sparse: eliminate "duplicate initialization") + DPDK
>v17.11 upgrade patch
>> > QEMU:	v2.7.0
>[...]
>> > Moving from QEMU v2.7.0 to v2.10.0 resolves the issue. However, herein
>lies the issue: QEMU v2.10.0 was only released in August of this year;
>anecdotally, we know that many OvS-DPDK customers use older versions of QEMU
>(typically, v2.7.0), and are likely un[able|willing] to move. With this patch,
>a hard dependency on QEMU v2.10 is created for users who want to use the vHU
>multiq feature in DPDK v17.11 (and subsequently, the upcoming OvS v2.9.0),
>which IMO will likely be unacceptable for many.
>>
>> Do you mean that upstream Qemu v2.7.0 is used in production?
>> I would expect the customers to use a distro Qemu which should contain
>> relevant fixes, or follow upstream's stable branches.
>
To be honest, I don't have hard data to back this up, apart from anecdotal reports that "some customers use 'older' versions of QEMU".
I understand that this is not the most solid foundation upon which to build an argument.
>Me too, I would expect they integrate the fixes.
>
>> FYI, Qemu v2.9.1 contains a backport of the fix.
>
>But you know, some users do not want to upgrade anything in production,
>as in the old time of hardware networking equipments.
>Curiously, the case considered here seems to be users sticked to old Qemu
>while willing to consider the switch as an upgradable software.
>It is really really strange, but let's consider such constraint.
>
>If I remember well, we have integrated the vhost multiqueue feature
>as soon as a Qemu release was almost ready.
>For the record, it was Qemu 2.5.0 (released in Dec 2015):
>	https://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg02731.html
>	https://wiki.qemu.org/ChangeLog/2.5#virtio
>And it was supported in DPDK 2.2.0 (released one day before):
>	http://dpdk.org/ml/archives/announce/2015-December/000073.html
>	http://dpdk.org/doc/guides/rel_notes/release_2_2.html
Hmm, that's an interesting data point.
>
>Nowadays, Qemu 2.10 is ready to let us enable IOMMU support.
>But you ask to wait more. How much time should we wait?
>Is there a policy to agree or are we just waiting for someone to approve?
I'm afraid that I don't have an answer for this. 
My concern here was purely proactive - however, without concrete data to back it up, and in the light of Thomas' point regarding 2.5.0/DPDK 2.2.0, perhaps my concerns are unfounded.
It may be sufficient therefore, to simply document compatible versions of QEMU as part of the OvS documentation.
Thanks to all involved for your input,
Mark
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup"
  2017-11-03 14:28         ` Maxime Coquelin
@ 2017-11-06 12:00           ` Yuanhan Liu
  2017-11-06 12:07             ` Maxime Coquelin
  0 siblings, 1 reply; 48+ messages in thread
From: Yuanhan Liu @ 2017-11-06 12:00 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Kavanagh, Mark B, dev@dpdk.org, Horton, Remy, Bie, Tiwei,
	mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com, Mcnamara, John, Loftus, Ciara, Stokes, Ian,
	Thomas Monjalon
On Fri, Nov 03, 2017 at 03:28:36PM +0100, Maxime Coquelin wrote:
> 
> 
> On 11/03/2017 02:05 PM, Yuanhan Liu wrote:
> >On Thu, Nov 02, 2017 at 10:40:26AM +0100, Maxime Coquelin wrote:
> >>>Moving from QEMU v2.7.0 to v2.10.0 resolves the issue. However, herein lies the issue: QEMU v2.10.0 was only released in August of this year; anecdotally, we know that many OvS-DPDK customers use older versions of QEMU (typically, v2.7.0), and are likely un[able|willing] to move. With this patch, a hard dependency on QEMU v2.10 is created for users who want to use the vHU multiq feature in DPDK v17.11 (and subsequently, the upcoming OvS v2.9.0), which IMO will likely be unacceptable for many.
> >>
> >>Do you mean that upstream Qemu v2.7.0 is used in production?
> >>I would expect the customers to use a distro Qemu which should contain
> >>relevant fixes, or follow upstream's stable branches.
> >>
> >>FYI, Qemu v2.9.1 contains a backport of the fix.
> >>
> >>>One potential solution to this problem is to introduce a compile-time option that would allow the user to [dis|en]able the VHOST_USER_PROTOCOL_F_REPLY_ACK feature - is that something that would be acceptable to you Maxime?
> >>
> >>Yes, that's one option, but:
> >>1. VHOST_USER_PROTOCOL_F_REPLY_ACK enabled should be the default
> >>2. VHOST_USER_PROTOCOL_F_REPLY_ACK disabled will be less extensively
> >>tested.
> >>
> >>Yuanhan, what do you think?
> >
> >My suggestion is to still disable it by default. Qemu 2.7 - 2.9 (inclusive)
> >is a pretty big range, that I think quite many people would hit this issue
> Ok, then what about adding a new flag to rte_vhost_driver_register(), as
> done for tx zero copy to enable IOMMU feature?
> If flag is unset, then we mask out both IOMMU virtio feature flag and
> REPLY_ACK protocol feature flag.
> 
> For a while this flag will be unset by default, not to break these
> deprecated and unmaintained Qemu versions. But I think at some point
> we should make it enabled by default, as it would be sad not to benefit
> from this security feature.
This sounds good to me.
	--yliu
> 
> This change will have an impact on OVS, as it will need a new vhost-user
> port option to enable IOMMU feature. Thing that is transparent to OVS
> currently.
> 
> Mark, Yuanhan, does that sound good to you?
> 
> Maxime
> >	--yliu
> >
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup"
  2017-11-06 12:00           ` Yuanhan Liu
@ 2017-11-06 12:07             ` Maxime Coquelin
  2017-11-06 12:24               ` Yuanhan Liu
  0 siblings, 1 reply; 48+ messages in thread
From: Maxime Coquelin @ 2017-11-06 12:07 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: Kavanagh, Mark B, dev@dpdk.org, Horton, Remy, Bie, Tiwei,
	mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com, Mcnamara, John, Loftus, Ciara, Stokes, Ian,
	Thomas Monjalon
On 11/06/2017 01:00 PM, Yuanhan Liu wrote:
> On Fri, Nov 03, 2017 at 03:28:36PM +0100, Maxime Coquelin wrote:
>>
>>
>> On 11/03/2017 02:05 PM, Yuanhan Liu wrote:
>>> On Thu, Nov 02, 2017 at 10:40:26AM +0100, Maxime Coquelin wrote:
>>>>> Moving from QEMU v2.7.0 to v2.10.0 resolves the issue. However, herein lies the issue: QEMU v2.10.0 was only released in August of this year; anecdotally, we know that many OvS-DPDK customers use older versions of QEMU (typically, v2.7.0), and are likely un[able|willing] to move. With this patch, a hard dependency on QEMU v2.10 is created for users who want to use the vHU multiq feature in DPDK v17.11 (and subsequently, the upcoming OvS v2.9.0), which IMO will likely be unacceptable for many.
>>>>
>>>> Do you mean that upstream Qemu v2.7.0 is used in production?
>>>> I would expect the customers to use a distro Qemu which should contain
>>>> relevant fixes, or follow upstream's stable branches.
>>>>
>>>> FYI, Qemu v2.9.1 contains a backport of the fix.
>>>>
>>>>> One potential solution to this problem is to introduce a compile-time option that would allow the user to [dis|en]able the VHOST_USER_PROTOCOL_F_REPLY_ACK feature - is that something that would be acceptable to you Maxime?
>>>>
>>>> Yes, that's one option, but:
>>>> 1. VHOST_USER_PROTOCOL_F_REPLY_ACK enabled should be the default
>>>> 2. VHOST_USER_PROTOCOL_F_REPLY_ACK disabled will be less extensively
>>>> tested.
>>>>
>>>> Yuanhan, what do you think?
>>>
>>> My suggestion is to still disable it by default. Qemu 2.7 - 2.9 (inclusive)
>>> is a pretty big range, that I think quite many people would hit this issue
>> Ok, then what about adding a new flag to rte_vhost_driver_register(), as
>> done for tx zero copy to enable IOMMU feature?
>> If flag is unset, then we mask out both IOMMU virtio feature flag and
>> REPLY_ACK protocol feature flag.
>>
>> For a while this flag will be unset by default, not to break these
>> deprecated and unmaintained Qemu versions. But I think at some point
>> we should make it enabled by default, as it would be sad not to benefit
>> from this security feature.
> 
> This sounds good to me.
Actually, I have posted a different patch, so that we don't have API
change for this. Upstream OVS can disable IOMMU feature, which will in
turn disable REPLY-ACK protocol feature if they want to.
Thanks,
Maxime
> 	--yliu
>>
>> This change will have an impact on OVS, as it will need a new vhost-user
>> port option to enable IOMMU feature. Thing that is transparent to OVS
>> currently.
>>
>> Mark, Yuanhan, does that sound good to you?
>>
>> Maxime
>>> 	--yliu
>>>
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup"
  2017-11-06 12:07             ` Maxime Coquelin
@ 2017-11-06 12:24               ` Yuanhan Liu
  2017-11-06 12:50                 ` Maxime Coquelin
  0 siblings, 1 reply; 48+ messages in thread
From: Yuanhan Liu @ 2017-11-06 12:24 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Kavanagh, Mark B, dev@dpdk.org, Horton, Remy, Bie, Tiwei,
	mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com, Mcnamara, John, Loftus, Ciara, Stokes, Ian,
	Thomas Monjalon
On Mon, Nov 06, 2017 at 01:07:15PM +0100, Maxime Coquelin wrote:
> 
> 
> On 11/06/2017 01:00 PM, Yuanhan Liu wrote:
> >On Fri, Nov 03, 2017 at 03:28:36PM +0100, Maxime Coquelin wrote:
> >>
> >>
> >>On 11/03/2017 02:05 PM, Yuanhan Liu wrote:
> >>>On Thu, Nov 02, 2017 at 10:40:26AM +0100, Maxime Coquelin wrote:
> >>>>>Moving from QEMU v2.7.0 to v2.10.0 resolves the issue. However, herein lies the issue: QEMU v2.10.0 was only released in August of this year; anecdotally, we know that many OvS-DPDK customers use older versions of QEMU (typically, v2.7.0), and are likely un[able|willing] to move. With this patch, a hard dependency on QEMU v2.10 is created for users who want to use the vHU multiq feature in DPDK v17.11 (and subsequently, the upcoming OvS v2.9.0), which IMO will likely be unacceptable for many.
> >>>>
> >>>>Do you mean that upstream Qemu v2.7.0 is used in production?
> >>>>I would expect the customers to use a distro Qemu which should contain
> >>>>relevant fixes, or follow upstream's stable branches.
> >>>>
> >>>>FYI, Qemu v2.9.1 contains a backport of the fix.
> >>>>
> >>>>>One potential solution to this problem is to introduce a compile-time option that would allow the user to [dis|en]able the VHOST_USER_PROTOCOL_F_REPLY_ACK feature - is that something that would be acceptable to you Maxime?
> >>>>
> >>>>Yes, that's one option, but:
> >>>>1. VHOST_USER_PROTOCOL_F_REPLY_ACK enabled should be the default
> >>>>2. VHOST_USER_PROTOCOL_F_REPLY_ACK disabled will be less extensively
> >>>>tested.
> >>>>
> >>>>Yuanhan, what do you think?
> >>>
> >>>My suggestion is to still disable it by default. Qemu 2.7 - 2.9 (inclusive)
> >>>is a pretty big range, that I think quite many people would hit this issue
> >>Ok, then what about adding a new flag to rte_vhost_driver_register(), as
> >>done for tx zero copy to enable IOMMU feature?
> >>If flag is unset, then we mask out both IOMMU virtio feature flag and
> >>REPLY_ACK protocol feature flag.
> >>
> >>For a while this flag will be unset by default, not to break these
> >>deprecated and unmaintained Qemu versions. But I think at some point
> >>we should make it enabled by default, as it would be sad not to benefit
> >>from this security feature.
> >
> >This sounds good to me.
> 
> Actually, I have posted a different patch, so that we don't have API
> change for this. Upstream OVS can disable IOMMU feature, which will in
> turn disable REPLY-ACK protocol feature if they want to.
Sorry I missed that. So the REPLY-ACK will still be enabled by default and
you leave the choice to the users to disable it, explicitly? This doesn't
sound the best to me. We now know that it breaks OVS, but other users may
hit the same issue again without any awareness.
Also, I know this feature brings good benefits on security. But IIRC, you
mentioned that it became barely un-usable with Linux kernel virtio-net
driver.
>From the two points, I think let's make it be disable by default now?
	--yliu
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup"
  2017-11-06 12:24               ` Yuanhan Liu
@ 2017-11-06 12:50                 ` Maxime Coquelin
  2017-11-06 13:36                   ` Yuanhan Liu
  0 siblings, 1 reply; 48+ messages in thread
From: Maxime Coquelin @ 2017-11-06 12:50 UTC (permalink / raw)
  To: Yuanhan Liu, Thomas Monjalon
  Cc: Kavanagh, Mark B, dev@dpdk.org, Horton, Remy, Bie, Tiwei,
	mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com, Mcnamara, John, Loftus, Ciara, Stokes, Ian
On 11/06/2017 01:24 PM, Yuanhan Liu wrote:
> On Mon, Nov 06, 2017 at 01:07:15PM +0100, Maxime Coquelin wrote:
>>
>>
>> On 11/06/2017 01:00 PM, Yuanhan Liu wrote:
>>> On Fri, Nov 03, 2017 at 03:28:36PM +0100, Maxime Coquelin wrote:
>>>>
>>>>
>>>> On 11/03/2017 02:05 PM, Yuanhan Liu wrote:
>>>>> On Thu, Nov 02, 2017 at 10:40:26AM +0100, Maxime Coquelin wrote:
>>>>>>> Moving from QEMU v2.7.0 to v2.10.0 resolves the issue. However, herein lies the issue: QEMU v2.10.0 was only released in August of this year; anecdotally, we know that many OvS-DPDK customers use older versions of QEMU (typically, v2.7.0), and are likely un[able|willing] to move. With this patch, a hard dependency on QEMU v2.10 is created for users who want to use the vHU multiq feature in DPDK v17.11 (and subsequently, the upcoming OvS v2.9.0), which IMO will likely be unacceptable for many.
>>>>>>
>>>>>> Do you mean that upstream Qemu v2.7.0 is used in production?
>>>>>> I would expect the customers to use a distro Qemu which should contain
>>>>>> relevant fixes, or follow upstream's stable branches.
>>>>>>
>>>>>> FYI, Qemu v2.9.1 contains a backport of the fix.
>>>>>>
>>>>>>> One potential solution to this problem is to introduce a compile-time option that would allow the user to [dis|en]able the VHOST_USER_PROTOCOL_F_REPLY_ACK feature - is that something that would be acceptable to you Maxime?
>>>>>>
>>>>>> Yes, that's one option, but:
>>>>>> 1. VHOST_USER_PROTOCOL_F_REPLY_ACK enabled should be the default
>>>>>> 2. VHOST_USER_PROTOCOL_F_REPLY_ACK disabled will be less extensively
>>>>>> tested.
>>>>>>
>>>>>> Yuanhan, what do you think?
>>>>>
>>>>> My suggestion is to still disable it by default. Qemu 2.7 - 2.9 (inclusive)
>>>>> is a pretty big range, that I think quite many people would hit this issue
>>>> Ok, then what about adding a new flag to rte_vhost_driver_register(), as
>>>> done for tx zero copy to enable IOMMU feature?
>>>> If flag is unset, then we mask out both IOMMU virtio feature flag and
>>>> REPLY_ACK protocol feature flag.
>>>>
>>>> For a while this flag will be unset by default, not to break these
>>>> deprecated and unmaintained Qemu versions. But I think at some point
>>>> we should make it enabled by default, as it would be sad not to benefit
>>> >from this security feature.
>>>
>>> This sounds good to me.
>>
>> Actually, I have posted a different patch, so that we don't have API
>> change for this. Upstream OVS can disable IOMMU feature, which will in
>> turn disable REPLY-ACK protocol feature if they want to.
> 
> Sorry I missed that. So the REPLY-ACK will still be enabled by default and
> you leave the choice to the users to disable it, explicitly? This doesn't
> sound the best to me. We now know that it breaks OVS, but other users may
> hit the same issue again without any awareness.
 >
> Also, I know this feature brings good benefits on security. But IIRC, you
> mentioned that it became barely un-usable with Linux kernel virtio-net
> driver.
> 
>  From the two points, I think let's make it be disable by default now?
What concerns me is that hasn't been replied yet is when will we 
consider Qemu 2.7.0-Qemu v2.9.0 (Qemu v2.9.1 being fixed) old enough
to enable it by default? Knowing that Qemu 2.7.x/2.8.x are already
end of life uptream.
Maxime
> 	--yliu
> 
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup"
  2017-11-06 12:50                 ` Maxime Coquelin
@ 2017-11-06 13:36                   ` Yuanhan Liu
  0 siblings, 0 replies; 48+ messages in thread
From: Yuanhan Liu @ 2017-11-06 13:36 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Thomas Monjalon, Kavanagh, Mark B, dev@dpdk.org, Horton, Remy,
	Bie, Tiwei, mst@redhat.com, jfreiman@redhat.com,
	vkaplans@redhat.com, jasowang@redhat.com, Mcnamara, John,
	Loftus, Ciara, Stokes, Ian
On Mon, Nov 06, 2017 at 01:50:35PM +0100, Maxime Coquelin wrote:
> 
> 
> On 11/06/2017 01:24 PM, Yuanhan Liu wrote:
> >On Mon, Nov 06, 2017 at 01:07:15PM +0100, Maxime Coquelin wrote:
> >>
> >>
> >>On 11/06/2017 01:00 PM, Yuanhan Liu wrote:
> >>>On Fri, Nov 03, 2017 at 03:28:36PM +0100, Maxime Coquelin wrote:
> >>>>
> >>>>
> >>>>On 11/03/2017 02:05 PM, Yuanhan Liu wrote:
> >>>>>On Thu, Nov 02, 2017 at 10:40:26AM +0100, Maxime Coquelin wrote:
> >>>>>>>Moving from QEMU v2.7.0 to v2.10.0 resolves the issue. However, herein lies the issue: QEMU v2.10.0 was only released in August of this year; anecdotally, we know that many OvS-DPDK customers use older versions of QEMU (typically, v2.7.0), and are likely un[able|willing] to move. With this patch, a hard dependency on QEMU v2.10 is created for users who want to use the vHU multiq feature in DPDK v17.11 (and subsequently, the upcoming OvS v2.9.0), which IMO will likely be unacceptable for many.
> >>>>>>
> >>>>>>Do you mean that upstream Qemu v2.7.0 is used in production?
> >>>>>>I would expect the customers to use a distro Qemu which should contain
> >>>>>>relevant fixes, or follow upstream's stable branches.
> >>>>>>
> >>>>>>FYI, Qemu v2.9.1 contains a backport of the fix.
> >>>>>>
> >>>>>>>One potential solution to this problem is to introduce a compile-time option that would allow the user to [dis|en]able the VHOST_USER_PROTOCOL_F_REPLY_ACK feature - is that something that would be acceptable to you Maxime?
> >>>>>>
> >>>>>>Yes, that's one option, but:
> >>>>>>1. VHOST_USER_PROTOCOL_F_REPLY_ACK enabled should be the default
> >>>>>>2. VHOST_USER_PROTOCOL_F_REPLY_ACK disabled will be less extensively
> >>>>>>tested.
> >>>>>>
> >>>>>>Yuanhan, what do you think?
> >>>>>
> >>>>>My suggestion is to still disable it by default. Qemu 2.7 - 2.9 (inclusive)
> >>>>>is a pretty big range, that I think quite many people would hit this issue
> >>>>Ok, then what about adding a new flag to rte_vhost_driver_register(), as
> >>>>done for tx zero copy to enable IOMMU feature?
> >>>>If flag is unset, then we mask out both IOMMU virtio feature flag and
> >>>>REPLY_ACK protocol feature flag.
> >>>>
> >>>>For a while this flag will be unset by default, not to break these
> >>>>deprecated and unmaintained Qemu versions. But I think at some point
> >>>>we should make it enabled by default, as it would be sad not to benefit
> >>>>from this security feature.
> >>>
> >>>This sounds good to me.
> >>
> >>Actually, I have posted a different patch, so that we don't have API
> >>change for this. Upstream OVS can disable IOMMU feature, which will in
> >>turn disable REPLY-ACK protocol feature if they want to.
> >
> >Sorry I missed that. So the REPLY-ACK will still be enabled by default and
> >you leave the choice to the users to disable it, explicitly? This doesn't
> >sound the best to me. We now know that it breaks OVS, but other users may
> >hit the same issue again without any awareness.
> >
> >Also, I know this feature brings good benefits on security. But IIRC, you
> >mentioned that it became barely un-usable with Linux kernel virtio-net
> >driver.
> >
> > From the two points, I think let's make it be disable by default now?
> 
> What concerns me is that hasn't been replied yet is when will we consider
> Qemu 2.7.0-Qemu v2.9.0 (Qemu v2.9.1 being fixed) old enough
> to enable it by default? Knowing that Qemu 2.7.x/2.8.x are already
> end of life uptream.
I can't tell. But there are probably something we could do. For example, we
could introduce a vhost pmd option, to enable the IOMMU feature. If the user
concerns about the security, he could use such option. By default, let's still
disable it. Meanwhile, OVS may could also add such an option.
	--yliu
^ permalink raw reply	[flat|nested] 48+ messages in thread
end of thread, other threads:[~2017-11-06 13:36 UTC | newest]
Thread overview: 48+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup" Maxime Coquelin
2017-11-01 17:11   ` Kavanagh, Mark B
2017-11-02  9:40     ` Maxime Coquelin
2017-11-03 13:05       ` Yuanhan Liu
2017-11-03 14:28         ` Maxime Coquelin
2017-11-06 12:00           ` Yuanhan Liu
2017-11-06 12:07             ` Maxime Coquelin
2017-11-06 12:24               ` Yuanhan Liu
2017-11-06 12:50                 ` Maxime Coquelin
2017-11-06 13:36                   ` Yuanhan Liu
2017-11-03 15:34       ` Thomas Monjalon
2017-11-03 16:31         ` Kavanagh, Mark B
2017-10-05  8:36 ` [PATCH v3 02/19] vhost: make error handling consistent in rx path Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 03/19] vhost: prepare send_vhost_message() to slave requests Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 04/19] vhost: add support to slave requests channel Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 05/19] vhost: declare missing IOMMU-related definitions for old kernels Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 06/19] vhost: add iotlb helper functions Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 07/19] vhost: iotlb: add pending miss request list and helpers Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 08/19] vhost-user: add support to IOTLB miss slave requests Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 09/19] vhost: initialize vrings IOTLB caches Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 10/19] vhost-user: handle IOTLB update and invalidate requests Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 11/19] vhost: introduce guest IOVA to backend VA helper Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 12/19] vhost: use the guest IOVA to host " Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 13/19] vhost: enable rings at the right time Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 14/19] vhost: don't dereference invalid dev pointer after its reallocation Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 15/19] vhost: postpone rings addresses translation Maxime Coquelin
2017-10-13  1:47   ` Yao, Lei A
2017-10-13  7:32     ` Maxime Coquelin
2017-10-13  7:55       ` Yao, Lei A
2017-10-13  7:56         ` Maxime Coquelin
2017-10-16  5:59         ` Yao, Lei A
2017-10-16  6:23           ` Yao, Lei A
2017-10-16  9:47             ` Maxime Coquelin
2017-10-16 10:47               ` Maxime Coquelin
2017-10-17  1:24                 ` Yao, Lei A
2017-10-17  8:06                   ` Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 16/19] vhost-user: translate ring addresses when IOMMU enabled Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 17/19] vhost-user: iommu: postpone device creation until ring are mapped Maxime Coquelin
2017-11-02  7:21   ` Yao, Lei A
2017-11-02  8:21     ` Maxime Coquelin
2017-11-02 16:02       ` Maxime Coquelin
2017-11-03  8:25         ` Maxime Coquelin
2017-11-03 15:15           ` Michael S. Tsirkin
2017-11-03 15:54             ` Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 18/19] vhost: iommu: Invalidate vring in case of matching IOTLB invalidate Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 19/19] vhost: enable IOMMU support Maxime Coquelin
2017-10-06  6:24 ` [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Yuanhan Liu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).