Linux virtualization list
 help / color / mirror / Atom feed
* Re: [RFC] virtio_net: add local_bh_disable() around u64_stats_update_begin
From: Sebastian Andrzej Siewior @ 2018-10-18  9:08 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: Michael S. Tsirkin, netdev, virtualization, tglx, David S. Miller
In-Reply-To: <4849237d-38f5-9840-4ab9-4419de31db85@lab.ntt.co.jp>

On 2018-10-18 18:00:05 [+0900], Toshiaki Makita wrote:
> On 2018/10/18 17:47, Sebastian Andrzej Siewior wrote:
> > On 2018-10-17 14:48:02 [+0800], Jason Wang wrote:
> >>
> >> On 2018/10/17 上午9:13, Toshiaki Makita wrote:
> >>> I'm not sure what condition triggered this warning.
> > 
> > If the seqlock is acquired once in softirq and then in process context
> > again it is enough evidence for lockdep to trigger this warning.
> 
> No. As I said that should not happen because of NAPI guard.
Again: lockdep saw the lock in softirq context once and in process
context once and this is what triggers the warning. It does not matter
if NAPI is enabled or not during the access in process context. If you
want to allow this you need further lockdep annotation…

… but: refill_work() disables NAPI for &vi->rq[1] and refills + updates
stats while NAPI is enabled for &vi->rq[0].

Sebastian
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH] virtio_net: add local_bh_disable() around u64_stats_update_begin
From: Toshiaki Makita @ 2018-10-18  9:06 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Michael S. Tsirkin, netdev, virtualization, tglx, David S. Miller
In-Reply-To: <20181018084313.oopu34iwfwgkcwwc@linutronix.de>

On 2018/10/18 17:43, Sebastian Andrzej Siewior wrote:
> on 32bit, lockdep notices that virtnet_open() and refill_work() invoke
> try_fill_recv() from process context while virtnet_receive() invokes the
> same function from BH context. The problem that the seqcounter within
> u64_stats_update_begin() may deadlock if it is interrupted by BH and
> then acquired again.
> 
> Introduce u64_stats_update_begin_bh() which disables BH on 32bit
> architectures. Since the BH might interrupt the worker, this new
> function should not limited to SMP like the others which are expected
> to be used in softirq.
> 
> With this change we might lose increments but this is okay. The
> important part that the two 32bit parts of the 64bit counter are not
> corrupted.
> 
> Fixes: 461f03dc99cf6 ("virtio_net: Add kick stats").
> Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

NACK. Again, this race should not happen because of NAPI guard.
We need to investigate why this warning happened.

-- 
Toshiaki Makita

^ permalink raw reply

* Re: [RFC] virtio_net: add local_bh_disable() around u64_stats_update_begin
From: Toshiaki Makita @ 2018-10-18  9:00 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Jason Wang
  Cc: netdev, tglx, Michael S. Tsirkin, David S. Miller, virtualization
In-Reply-To: <20181018084753.wefvsypdevbzoadg@linutronix.de>

On 2018/10/18 17:47, Sebastian Andrzej Siewior wrote:
> On 2018-10-17 14:48:02 [+0800], Jason Wang wrote:
>>
>> On 2018/10/17 上午9:13, Toshiaki Makita wrote:
>>> I'm not sure what condition triggered this warning.
> 
> If the seqlock is acquired once in softirq and then in process context
> again it is enough evidence for lockdep to trigger this warning.

No. As I said that should not happen because of NAPI guard.

-- 
Toshiaki Makita

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [RFC] virtio_net: add local_bh_disable() around u64_stats_update_begin
From: Sebastian Andrzej Siewior @ 2018-10-18  8:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, netdev, virtualization, tglx, David S. Miller
In-Reply-To: <a281371f-dd20-2036-d0a8-1081c2f6a452@redhat.com>

On 2018-10-17 14:48:02 [+0800], Jason Wang wrote:
> 
> On 2018/10/17 上午9:13, Toshiaki Makita wrote:
> > I'm not sure what condition triggered this warning.

If the seqlock is acquired once in softirq and then in process context
again it is enough evidence for lockdep to trigger this warning.

> > Toshiaki Makita
> 
> 
> Or maybe NAPI is enabled unexpectedly somewhere?
> 
> Btw, the schedule_delayed_work() in virtnet_open() is also suspicious, if
> the work is executed before virtnet_napi_enable(), there will be a deadloop
> for napi_disable().

something like this? It is also likely if it runs OOM on queue 2, it
will run OOM again on queue 3.

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index fbcfb4d272336..87d6ec4765270 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1263,22 +1263,22 @@ static void refill_work(struct work_struct *work)
 {
 	struct virtnet_info *vi =
 		container_of(work, struct virtnet_info, refill.work);
-	bool still_empty;
+	int still_empty = 0;
 	int i;
 
 	for (i = 0; i < vi->curr_queue_pairs; i++) {
 		struct receive_queue *rq = &vi->rq[i];
 
 		napi_disable(&rq->napi);
-		still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
+		if (!try_fill_recv(vi, rq, GFP_KERNEL))
+		    still_empty++;
 		virtnet_napi_enable(rq->vq, &rq->napi);
-
-		/* In theory, this can happen: if we don't get any buffers in
-		 * we will *never* try to fill again.
-		 */
-		if (still_empty)
-			schedule_delayed_work(&vi->refill, HZ/2);
 	}
+	/* In theory, this can happen: if we don't get any buffers in
+	 * we will *never* try to fill again.
+	 */
+	if (still_empty)
+		schedule_delayed_work(&vi->refill, HZ/2);
 }
 
 static int virtnet_receive(struct receive_queue *rq, int budget,
@@ -1407,12 +1407,13 @@ static int virtnet_open(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 	int i, err;
+	int need_refill = 0;
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		if (i < vi->curr_queue_pairs)
 			/* Make sure we have some buffers: if oom use wq. */
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
-				schedule_delayed_work(&vi->refill, 0);
+				need_refill++;
 
 		err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i);
 		if (err < 0)
@@ -1428,6 +1429,8 @@ static int virtnet_open(struct net_device *dev)
 		virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
 		virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
 	}
+	if (need_refill)
+		schedule_delayed_work(&vi->refill, 0);
 
 	return 0;
 }
@@ -2236,6 +2239,7 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 {
 	struct virtnet_info *vi = vdev->priv;
 	int err, i;
+	int need_refill = 0;
 
 	err = init_vqs(vi);
 	if (err)
@@ -2246,13 +2250,15 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 	if (netif_running(vi->dev)) {
 		for (i = 0; i < vi->curr_queue_pairs; i++)
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
-				schedule_delayed_work(&vi->refill, 0);
+				need_refill++;
 
 		for (i = 0; i < vi->max_queue_pairs; i++) {
 			virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
 			virtnet_napi_tx_enable(vi, vi->sq[i].vq,
 					       &vi->sq[i].napi);
 		}
+		if (need_refill)
+			schedule_delayed_work(&vi->refill, 0);
 	}
 
 	netif_device_attach(vi->dev);

> Thanks

Sebastian
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related

* [PATCH] virtio_net: add local_bh_disable() around u64_stats_update_begin
From: Sebastian Andrzej Siewior @ 2018-10-18  8:43 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Michael S. Tsirkin, netdev, virtualization, tglx, David S. Miller
In-Reply-To: <20181016114414.23ea73c3@xeon-e3>

on 32bit, lockdep notices that virtnet_open() and refill_work() invoke
try_fill_recv() from process context while virtnet_receive() invokes the
same function from BH context. The problem that the seqcounter within
u64_stats_update_begin() may deadlock if it is interrupted by BH and
then acquired again.

Introduce u64_stats_update_begin_bh() which disables BH on 32bit
architectures. Since the BH might interrupt the worker, this new
function should not limited to SMP like the others which are expected
to be used in softirq.

With this change we might lose increments but this is okay. The
important part that the two 32bit parts of the 64bit counter are not
corrupted.

Fixes: 461f03dc99cf6 ("virtio_net: Add kick stats").
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/virtio_net.c       |  4 ++--
 include/linux/u64_stats_sync.h | 16 ++++++++++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index dab504ec5e502..fbcfb4d272336 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1206,9 +1206,9 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
 			break;
 	} while (rq->vq->num_free);
 	if (virtqueue_kick_prepare(rq->vq) && virtqueue_notify(rq->vq)) {
-		u64_stats_update_begin(&rq->stats.syncp);
+		u64_stats_update_begin_bh(&rq->stats.syncp);
 		rq->stats.kicks++;
-		u64_stats_update_end(&rq->stats.syncp);
+		u64_stats_update_end_bh(&rq->stats.syncp);
 	}
 
 	return !oom;
diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index a27604f99ed04..46b6ad6175628 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -90,6 +90,22 @@ static inline void u64_stats_update_end(struct u64_stats_sync *syncp)
 #endif
 }
 
+static inline void u64_stats_update_begin_bh(struct u64_stats_sync *syncp)
+{
+#if BITS_PER_LONG==32
+	local_bh_disable();
+	write_seqcount_begin(&syncp->seq);
+#endif
+}
+
+static inline void u64_stats_update_end_bh(struct u64_stats_sync *syncp)
+{
+#if BITS_PER_LONG==32
+	write_seqcount_end(&syncp->seq);
+	local_bh_enable();
+#endif
+}
+
 static inline unsigned long
 u64_stats_update_begin_irqsave(struct u64_stats_sync *syncp)
 {
-- 
2.19.1

^ permalink raw reply related

* Re: [PATCH 7/8] drm/virtio: move virtio_gpu_object_{attach, detach} calls.
From: Gerd Hoffmann @ 2018-10-18  7:00 UTC (permalink / raw)
  To: Dave Airlie; +Cc: Dave Airlie, LKML, dri-devel, open list:VIRTIO CORE, NET...
In-Reply-To: <CAPM=9tzwcPtOfNuoKdbSY0=EBsbsKgM04HZpzAazPEmDbQOXSw@mail.gmail.com>

> > > This to me feels more like a bind/unbind operation rather than a
> > > populate/unpopulate operation,
> > >
> > > bind is " Bind the backend pages into the aperture in the location"
> > >
> > > whereas populate is
> > >
> > > allocate pages for a ttm.
> >
> > I ran into that trap too ;)
> >
> > My first attempt was to map this to bind/unbind.  But this is not
> > correct and therefore didn't work very well.
> >
> > virtio_gpu_object_attach() will send a scatter list of the pages
> > allocated for the object to the host (so the host knows where to
> > copy from/to when processing the transfer_from/to calls).  So IMO
> > it should be done on population not when binding.
> 
> Well bind on AGP is the same thing, we'd fill the AGP GART table on
> bind, so that the AGP GPU could access the pages.

> So I'm interested in why using bind/unbind failed if you have some more info?

Need to try again to be sure, but IIRC I saw multiple bind/unbind calls
for the same object.  ttm probably does it to not waste AGB GART address
space for objects not in use.  But for virtio it is pointless overhead.
But maybe it is just a matter of taking a reference and keeping it for
the whole lifetime of the object to make the binding permanent ...

cheers,
  Gerd

^ permalink raw reply

* [PATCH v3] virtio-gpu: add VIRTIO_GPU_F_EDID feature
From: Gerd Hoffmann @ 2018-10-18  6:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, David Airlie, open list, dri-devel,
	open list:VIRTIO GPU DRIVER
In-Reply-To: <20181018063946.22919-1-kraxel@redhat.com>

The feature allows the guest request an EDID blob (describing monitor
capabilities) for a given scanout (aka virtual monitor connector).

It brings a new command message, which has just a scanout field (beside
the standard virtio-gpu header) and a response message which carries the
EDID data.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
---
 include/uapi/linux/virtio_gpu.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/uapi/linux/virtio_gpu.h b/include/uapi/linux/virtio_gpu.h
index f43c3c6171..68198691a6 100644
--- a/include/uapi/linux/virtio_gpu.h
+++ b/include/uapi/linux/virtio_gpu.h
@@ -41,6 +41,7 @@
 #include <linux/types.h>
 
 #define VIRTIO_GPU_F_VIRGL 0
+#define VIRTIO_GPU_F_EDID  1
 
 enum virtio_gpu_ctrl_type {
 	VIRTIO_GPU_UNDEFINED = 0,
@@ -56,6 +57,7 @@ enum virtio_gpu_ctrl_type {
 	VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING,
 	VIRTIO_GPU_CMD_GET_CAPSET_INFO,
 	VIRTIO_GPU_CMD_GET_CAPSET,
+	VIRTIO_GPU_CMD_GET_EDID,
 
 	/* 3d commands */
 	VIRTIO_GPU_CMD_CTX_CREATE = 0x0200,
@@ -76,6 +78,7 @@ enum virtio_gpu_ctrl_type {
 	VIRTIO_GPU_RESP_OK_DISPLAY_INFO,
 	VIRTIO_GPU_RESP_OK_CAPSET_INFO,
 	VIRTIO_GPU_RESP_OK_CAPSET,
+	VIRTIO_GPU_RESP_OK_EDID,
 
 	/* error responses */
 	VIRTIO_GPU_RESP_ERR_UNSPEC = 0x1200,
@@ -291,6 +294,19 @@ struct virtio_gpu_resp_capset {
 	__u8 capset_data[];
 };
 
+/* VIRTIO_GPU_CMD_GET_EDID */
+struct virtio_gpu_cmd_get_edid {
+	struct virtio_gpu_ctrl_hdr hdr;
+	__le32 scanout;
+};
+
+/* VIRTIO_GPU_RESP_OK_EDID */
+struct virtio_gpu_resp_edid {
+	struct virtio_gpu_ctrl_hdr hdr;
+	__le32 size;
+	__u8 edid[1024];
+};
+
 #define VIRTIO_GPU_EVENT_DISPLAY (1 << 0)
 
 struct virtio_gpu_config {
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH 7/8] drm/virtio: move virtio_gpu_object_{attach, detach} calls.
From: Dave Airlie @ 2018-10-18  6:36 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Dave Airlie, LKML, dri-devel, open list:VIRTIO CORE, NET...
In-Reply-To: <20181018061059.d32xhg6ijlvpqyvi@sirius.home.kraxel.org>

On Thu, 18 Oct 2018 at 16:11, Gerd Hoffmann <kraxel@redhat.com> wrote:
>
> On Thu, Oct 18, 2018 at 11:41:52AM +1000, Dave Airlie wrote:
> > On Mon, 1 Oct 2018 at 20:33, Gerd Hoffmann <kraxel@redhat.com> wrote:
> > >
> > > Remove the virtio_gpu_object_{attach,detach} calls from move_notify()
> > > callback.  Add them to the ttm_tt_{populate,unpopulate} callbacks, which
> > > is the correct place to handle this.
> > >
> > > The new ttm_tt_{populate,unpopulate} callbacks call the
> > > ttm_pool_populate()/unpopulate() functions (which are the default
> > > implementation in case the callbacks not present) for the actual ttm
> > > work.  Additionally virtio_gpu_object_{attach,detach} is called to
> > > update the state on the host.
> >
> > This to me feels more like a bind/unbind operation rather than a
> > populate/unpopulate operation,
> >
> > bind is " Bind the backend pages into the aperture in the location"
> >
> > whereas populate is
> >
> > allocate pages for a ttm.
>
> I ran into that trap too ;)
>
> My first attempt was to map this to bind/unbind.  But this is not
> correct and therefore didn't work very well.
>
> virtio_gpu_object_attach() will send a scatter list of the pages
> allocated for the object to the host (so the host knows where to
> copy from/to when processing the transfer_from/to calls).  So IMO
> it should be done on population not when binding.

Well bind on AGP is the same thing, we'd fill the AGP GART table on
bind, so that the AGP GPU could access the pages.

So I'm interested in why using bind/unbind failed if you have some more info?

Dave.

^ permalink raw reply

* Re: [PATCH 7/8] drm/virtio: move virtio_gpu_object_{attach, detach} calls.
From: Gerd Hoffmann @ 2018-10-18  6:10 UTC (permalink / raw)
  To: Dave Airlie; +Cc: Dave Airlie, LKML, dri-devel, open list:VIRTIO CORE, NET...
In-Reply-To: <CAPM=9tx3vX_HLyGqvSitCTb=v5QN4tk4TjbwhreLhnT8BytHEg@mail.gmail.com>

On Thu, Oct 18, 2018 at 11:41:52AM +1000, Dave Airlie wrote:
> On Mon, 1 Oct 2018 at 20:33, Gerd Hoffmann <kraxel@redhat.com> wrote:
> >
> > Remove the virtio_gpu_object_{attach,detach} calls from move_notify()
> > callback.  Add them to the ttm_tt_{populate,unpopulate} callbacks, which
> > is the correct place to handle this.
> >
> > The new ttm_tt_{populate,unpopulate} callbacks call the
> > ttm_pool_populate()/unpopulate() functions (which are the default
> > implementation in case the callbacks not present) for the actual ttm
> > work.  Additionally virtio_gpu_object_{attach,detach} is called to
> > update the state on the host.
> 
> This to me feels more like a bind/unbind operation rather than a
> populate/unpopulate operation,
> 
> bind is " Bind the backend pages into the aperture in the location"
> 
> whereas populate is
> 
> allocate pages for a ttm.

I ran into that trap too ;)

My first attempt was to map this to bind/unbind.  But this is not
correct and therefore didn't work very well.

virtio_gpu_object_attach() will send a scatter list of the pages
allocated for the object to the host (so the host knows where to
copy from/to when processing the transfer_from/to calls).  So IMO
it should be done on population not when binding.

cheers,
  Gerd

^ permalink raw reply

* Re: [PATCH 5/8] drm/virtio: track created object state
From: Gerd Hoffmann @ 2018-10-18  5:57 UTC (permalink / raw)
  To: Dave Airlie; +Cc: Dave Airlie, LKML, dri-devel, open list:VIRTIO CORE, NET...
In-Reply-To: <CAPM=9tzpzxv9TorRDWX3=sj_=EEqu3gyU2+AyWAFVHLPzZVcew@mail.gmail.com>

On Thu, Oct 18, 2018 at 11:25:22AM +1000, Dave Airlie wrote:
> On Mon, 1 Oct 2018 at 20:33, Gerd Hoffmann <kraxel@redhat.com> wrote:
> >
> > Track whenever the virtio_gpu_object is already created (i.e. host knows
> > about it) in a new variable.  Add checks to virtio_gpu_object_attach()
> > to do nothing on objects not created yet.
> >
> > Make virtio_gpu_ttm_bo_destroy() use the new variable too, instead of
> > expecting hw_res_handle indicating the object state.
> 
> Is there a potential patch ordering issue here? If this patch changes
> the code to avoid using hw_res_handle,
> is after patches that start filling in hw_res_handle in places we
> haven't filled it in before.
> Maybe this patch should happen earlier.

Didn't run into trouble in testing, but yes, it might be an issue in
error paths.  I'll reorder the patches.

cheers,
  Gerd

^ permalink raw reply

* Re: [PATCH v3] virtio_net: avoid using netif_tx_disable() for serializing tx routine
From: David Miller @ 2018-10-18  5:30 UTC (permalink / raw)
  To: ake; +Cc: netdev, virtualization, linux-kernel, mst
In-Reply-To: <20181017104419.13003-1-ake@igel.co.jp>

From: Ake Koomsin <ake@igel.co.jp>
Date: Wed, 17 Oct 2018 19:44:12 +0900

> Commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
> introduces netif_tx_disable() after netif_device_detach() in order to
> avoid use-after-free of tx queues. However, there are two issues.
> 
> 1) Its operation is redundant with netif_device_detach() in case the
>    interface is running.
> 2) In case of the interface is not running before suspending and
>    resuming, the tx does not get resumed by netif_device_attach().
>    This results in losing network connectivity.
> 
> It is better to use netif_tx_lock_bh()/netif_tx_unlock_bh() instead for
> serializing tx routine during reset. This also preserves the symmetry
> of netif_device_detach() and netif_device_attach().
> 
> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
> Signed-off-by: Ake Koomsin <ake@igel.co.jp>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH v2] drm/bochs: add edid support.
From: Gerd Hoffmann @ 2018-10-18  5:11 UTC (permalink / raw)
  To: Dave Airlie; +Cc: Dave Airlie, LKML, dri-devel, open list:VIRTIO CORE, NET...
In-Reply-To: <CAPM=9twEjDr5dj+9B==QSYtAB7uqy0Szn2d7cw9swSk+THMA2g@mail.gmail.com>

  Hi,

> > Recent qemu (latest master branch, upcoming 3.1 release) got support
> > for EDID data.  This patch adds guest driver support.
> >
> > EDID support in qemu is not (yet) enabled by default, so please use
> > 'qemu -device VGA,edid=on' for testing.
> 
> The EDID never changes after boot?

Right now it doesn't.

> Is there plans to let it?

Needs more virtual hardware changes, the stdvga has no support for
interrupts, so qemu can't signal an edid update to the guest.

So not not sure yet what to do about it.  Maybe add it later (but it's
not high priority).  Maybe simply recommend to use virtio-gpu instead.

> > +       kfree(bochs->edid);
> > +       bochs->edid = kmalloc(len, GFP_KERNEL);
> 
> Don't you need to free it somewhere?

Oh, yes, we leak that on driver unload.
I'll fix it.

cheers,
  Gerd

^ permalink raw reply

* Re: [RFC] VSOCK: The performance problem of vhost_vsock.
From: Jason Wang @ 2018-10-18  2:45 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <5BC7E04E.9000002@huawei.com>


On 2018/10/18 上午9:22, jiangyiwen wrote:
> On 2018/10/17 20:31, Jason Wang wrote:
>> On 2018/10/17 下午7:41, jiangyiwen wrote:
>>> On 2018/10/17 17:51, Jason Wang wrote:
>>>> On 2018/10/17 下午5:39, Jason Wang wrote:
>>>>>> Hi Jason and Stefan,
>>>>>>
>>>>>> Maybe I find the reason of bad performance.
>>>>>>
>>>>>> I found pkt_len is limited to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE(4K),
>>>>>> it will cause the bandwidth is limited to 500~600MB/s. And once I
>>>>>> increase to 64k, it can improve about 3 times(~1500MB/s).
>>>>> Looks like the value was chosen for a balance between rx buffer size and performance. Allocating 64K always even for small packet is kind of waste and stress for guest memory. Virito-net try to avoid this by inventing the merge able rx buffer which allows big packet to be scattered in into different buffers. We can reuse this idea or revisit the idea of using virtio-net/vhost-net as a transport of vsock.
>>>>>
>>>>> What interesting is the performance is still behind vhost-net.
>>>>>
>>>>> Thanks
>>>>>
>>>>>> By the way, I send to 64K in application once, and I don't use
>>>>>> sg_init_one and rewrite function to packet sg list because pkt_len
>>>>>> include multiple pages.
>>>>>>
>>>>>> Thanks,
>>>>>> Yiwen.
>>>> Btw, if you're using vsock for transferring large files, maybe it's more efficient to implement sendpage() for vsock to allow sendfile()/splice() work.
>>>>
>>>> Thanks
>>>>
>>> I can't agree more.
>>>
>>> why vhost_vsock is still behind vhost_net?
>>> Because I use sendfile() to test performance at first, and then
>>> I found vsock don't implement sendpage() and cause the bandwidth
>>> can't be increased. So I use read() and send() to replace sendfile(),
>>> it will increase some switch between kernel and user mode, and sendfile()
>>> can support zero copy. I think this is main reason.
>>>
>>> Thanks.
>>
>> Want to post patches for this then :) ?
>>
>> Thanks
>>
> I may not post patches at the moment because there are other tasks.
>
> After a period of time, I will consider implement the feature.
>
> Thanks.


That's fine.

Thanks

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH net-next V2 6/8] vhost: packed ring support
From: Jason Wang @ 2018-10-18  2:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, netdev, linux-kernel, virtualization, maxime.coquelin, wexu
In-Reply-To: <20181015062241-mutt-send-email-mst@kernel.org>


On 2018/10/15 下午6:25, Michael S. Tsirkin wrote:
> On Mon, Oct 15, 2018 at 10:51:06AM +0800, Jason Wang wrote:
>> On 2018年10月15日 10:43, Michael S. Tsirkin wrote:
>>> On Mon, Oct 15, 2018 at 10:22:33AM +0800, Jason Wang wrote:
>>>> On 2018年10月13日 01:23, Michael S. Tsirkin wrote:
>>>>> On Fri, Oct 12, 2018 at 10:32:44PM +0800, Tiwei Bie wrote:
>>>>>> On Mon, Jul 16, 2018 at 11:28:09AM +0800, Jason Wang wrote:
>>>>>> [...]
>>>>>>> @@ -1367,10 +1397,48 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
>>>>>>>     		vq->last_avail_idx = s.num;
>>>>>>>     		/* Forget the cached index value. */
>>>>>>>     		vq->avail_idx = vq->last_avail_idx;
>>>>>>> +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
>>>>>>> +			vq->last_avail_wrap_counter = wrap_counter;
>>>>>>> +			vq->avail_wrap_counter = vq->last_avail_wrap_counter;
>>>>>>> +		}
>>>>>>>     		break;
>>>>>>>     	case VHOST_GET_VRING_BASE:
>>>>>>>     		s.index = idx;
>>>>>>>     		s.num = vq->last_avail_idx;
>>>>>>> +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
>>>>>>> +			s.num |= vq->last_avail_wrap_counter << 31;
>>>>>>> +		if (copy_to_user(argp, &s, sizeof(s)))
>>>>>>> +			r = -EFAULT;
>>>>>>> +		break;
>>>>>>> +	case VHOST_SET_VRING_USED_BASE:
>>>>>>> +		/* Moving base with an active backend?
>>>>>>> +		 * You don't want to do that.
>>>>>>> +		 */
>>>>>>> +		if (vq->private_data) {
>>>>>>> +			r = -EBUSY;
>>>>>>> +			break;
>>>>>>> +		}
>>>>>>> +		if (copy_from_user(&s, argp, sizeof(s))) {
>>>>>>> +			r = -EFAULT;
>>>>>>> +			break;
>>>>>>> +		}
>>>>>>> +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
>>>>>>> +			wrap_counter = s.num >> 31;
>>>>>>> +			s.num &= ~(1 << 31);
>>>>>>> +		}
>>>>>>> +		if (s.num > 0xffff) {
>>>>>>> +			r = -EINVAL;
>>>>>>> +			break;
>>>>>>> +		}
>>>>>> Do we want to put wrap_counter at bit 15?
>>>>> I think I second that - seems to be consistent with
>>>>> e.g. event suppression structure and the proposed
>>>>> extension to driver notifications.
>>>> Ok, I assumes packed virtqueue support 64K but looks not. I can change it to
>>>> bit 15 and GET_VRING_BASE need to be changed as well.
>>>>
>>>>>> If put wrap_counter at bit 31, the check (s.num > 0xffff)
>>>>>> won't be able to catch the illegal index 0x8000~0xffff for
>>>>>> packed ring.
>>>>>>
>>>> Do we need to clarify this in the spec?
>>> Isn't this all internal vhost stuff?
>> I meant the illegal index 0x8000-0xffff.
> It does say packed virtqueues support up to 2 15 entries each.
>
> But yes we can add a requirement that devices do not expose
> larger rings. Split does not support 2**16 either, right?
> With 2**16 enties avail index becomes 0 and ring looks empty.
>

Yes, so it's better to clarify this in the spec.

Thanks


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 7/8] drm/virtio: move virtio_gpu_object_{attach, detach} calls.
From: Dave Airlie @ 2018-10-18  1:41 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Dave Airlie, LKML, dri-devel, open list:VIRTIO CORE, NET...
In-Reply-To: <20181001103222.11924-8-kraxel@redhat.com>

On Mon, 1 Oct 2018 at 20:33, Gerd Hoffmann <kraxel@redhat.com> wrote:
>
> Remove the virtio_gpu_object_{attach,detach} calls from move_notify()
> callback.  Add them to the ttm_tt_{populate,unpopulate} callbacks, which
> is the correct place to handle this.
>
> The new ttm_tt_{populate,unpopulate} callbacks call the
> ttm_pool_populate()/unpopulate() functions (which are the default
> implementation in case the callbacks not present) for the actual ttm
> work.  Additionally virtio_gpu_object_{attach,detach} is called to
> update the state on the host.

This to me feels more like a bind/unbind operation rather than a
populate/unpopulate operation,

bind is " Bind the backend pages into the aperture in the location"

whereas populate is

allocate pages for a ttm.

Dave.

>
> With that in place the move and move_notify callbacks are not needed
> any more, so drop them.
>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  drivers/gpu/drm/virtio/virtgpu_ttm.c | 70 +++++++++++-------------------------
>  1 file changed, 21 insertions(+), 49 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ttm.c b/drivers/gpu/drm/virtio/virtgpu_ttm.c
> index cd63dffa6d..96fb17e0fc 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ttm.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ttm.c
> @@ -250,33 +250,24 @@ static void virtio_gpu_ttm_io_mem_free(struct ttm_bo_device *bdev,
>   */
>  struct virtio_gpu_ttm_tt {
>         struct ttm_dma_tt               ttm;
> -       struct virtio_gpu_device        *vgdev;
> -       u64                             offset;
> +       struct virtio_gpu_object        *obj;
>  };
>
>  static int virtio_gpu_ttm_backend_bind(struct ttm_tt *ttm,
>                                        struct ttm_mem_reg *bo_mem)
>  {
> -       struct virtio_gpu_ttm_tt *gtt = (void *)ttm;
> -
> -       gtt->offset = (unsigned long)(bo_mem->start << PAGE_SHIFT);
> -       if (!ttm->num_pages)
> -               WARN(1, "nothing to bind %lu pages for mreg %p back %p!\n",
> -                    ttm->num_pages, bo_mem, ttm);
> -
> -       /* Not implemented */
>         return 0;
>  }
>
>  static int virtio_gpu_ttm_backend_unbind(struct ttm_tt *ttm)
>  {
> -       /* Not implemented */
>         return 0;
>  }
>
>  static void virtio_gpu_ttm_backend_destroy(struct ttm_tt *ttm)
>  {
> -       struct virtio_gpu_ttm_tt *gtt = (void *)ttm;
> +       struct virtio_gpu_ttm_tt *gtt =
> +               container_of(ttm, struct virtio_gpu_ttm_tt, ttm.ttm);
>
>         ttm_dma_tt_fini(&gtt->ttm);
>         kfree(gtt);
> @@ -299,7 +290,7 @@ static struct ttm_tt *virtio_gpu_ttm_tt_create(struct ttm_buffer_object *bo,
>         if (gtt == NULL)
>                 return NULL;
>         gtt->ttm.ttm.func = &virtio_gpu_backend_func;
> -       gtt->vgdev = vgdev;
> +       gtt->obj = container_of(bo, struct virtio_gpu_object, tbo);
>         if (ttm_dma_tt_init(&gtt->ttm, bo, page_flags)) {
>                 kfree(gtt);
>                 return NULL;
> @@ -307,49 +298,30 @@ static struct ttm_tt *virtio_gpu_ttm_tt_create(struct ttm_buffer_object *bo,
>         return &gtt->ttm.ttm;
>  }
>
> -static void virtio_gpu_move_null(struct ttm_buffer_object *bo,
> -                                struct ttm_mem_reg *new_mem)
> -{
> -       struct ttm_mem_reg *old_mem = &bo->mem;
> -
> -       BUG_ON(old_mem->mm_node != NULL);
> -       *old_mem = *new_mem;
> -       new_mem->mm_node = NULL;
> -}
> -
> -static int virtio_gpu_bo_move(struct ttm_buffer_object *bo, bool evict,
> -                             struct ttm_operation_ctx *ctx,
> -                             struct ttm_mem_reg *new_mem)
> +static int virtio_gpu_ttm_tt_populate(struct ttm_tt *ttm, struct ttm_operation_ctx *ctx)
>  {
> +       struct virtio_gpu_ttm_tt *gtt =
> +               container_of(ttm, struct virtio_gpu_ttm_tt, ttm.ttm);
> +       struct virtio_gpu_device *vgdev =
> +               (struct virtio_gpu_device *)gtt->obj->gem_base.dev->dev_private;
>         int ret;
>
> -       ret = ttm_bo_wait(bo, ctx->interruptible, ctx->no_wait_gpu);
> +       ret = ttm_pool_populate(ttm, ctx);
>         if (ret)
>                 return ret;
> -
> -       virtio_gpu_move_null(bo, new_mem);
> -       return 0;
> +       virtio_gpu_object_attach(vgdev, gtt->obj, NULL);
> +       return ret;
>  }
>
> -static void virtio_gpu_bo_move_notify(struct ttm_buffer_object *tbo,
> -                                     bool evict,
> -                                     struct ttm_mem_reg *new_mem)
> +static void virtio_gpu_ttm_tt_unpopulate(struct ttm_tt *ttm)
>  {
> -       struct virtio_gpu_object *bo;
> -       struct virtio_gpu_device *vgdev;
> +       struct virtio_gpu_ttm_tt *gtt =
> +               container_of(ttm, struct virtio_gpu_ttm_tt, ttm.ttm);
> +       struct virtio_gpu_device *vgdev =
> +               (struct virtio_gpu_device *)gtt->obj->gem_base.dev->dev_private;
>
> -       bo = container_of(tbo, struct virtio_gpu_object, tbo);
> -       vgdev = (struct virtio_gpu_device *)bo->gem_base.dev->dev_private;
> -
> -       if (!new_mem || (new_mem->placement & TTM_PL_FLAG_SYSTEM)) {
> -               if (bo->hw_res_handle)
> -                       virtio_gpu_object_detach(vgdev, bo);
> -
> -       } else if (new_mem->placement & TTM_PL_FLAG_TT) {
> -               if (bo->hw_res_handle) {
> -                       virtio_gpu_object_attach(vgdev, bo, NULL);
> -               }
> -       }
> +       virtio_gpu_object_detach(vgdev, gtt->obj);
> +       ttm_pool_unpopulate(ttm);
>  }
>
>  static void virtio_gpu_bo_swap_notify(struct ttm_buffer_object *tbo)
> @@ -366,15 +338,15 @@ static void virtio_gpu_bo_swap_notify(struct ttm_buffer_object *tbo)
>
>  static struct ttm_bo_driver virtio_gpu_bo_driver = {
>         .ttm_tt_create = &virtio_gpu_ttm_tt_create,
> +       .ttm_tt_populate = &virtio_gpu_ttm_tt_populate,
> +       .ttm_tt_unpopulate = &virtio_gpu_ttm_tt_unpopulate,
>         .invalidate_caches = &virtio_gpu_invalidate_caches,
>         .init_mem_type = &virtio_gpu_init_mem_type,
>         .eviction_valuable = ttm_bo_eviction_valuable,
>         .evict_flags = &virtio_gpu_evict_flags,
> -       .move = &virtio_gpu_bo_move,
>         .verify_access = &virtio_gpu_verify_access,
>         .io_mem_reserve = &virtio_gpu_ttm_io_mem_reserve,
>         .io_mem_free = &virtio_gpu_ttm_io_mem_free,
> -       .move_notify = &virtio_gpu_bo_move_notify,
>         .swap_notify = &virtio_gpu_bo_swap_notify,
>  };
>
> --
> 2.9.3
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply

* Re: [PATCH 5/8] drm/virtio: track created object state
From: Dave Airlie @ 2018-10-18  1:25 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Dave Airlie, LKML, dri-devel, open list:VIRTIO CORE, NET...
In-Reply-To: <20181001103222.11924-6-kraxel@redhat.com>

On Mon, 1 Oct 2018 at 20:33, Gerd Hoffmann <kraxel@redhat.com> wrote:
>
> Track whenever the virtio_gpu_object is already created (i.e. host knows
> about it) in a new variable.  Add checks to virtio_gpu_object_attach()
> to do nothing on objects not created yet.
>
> Make virtio_gpu_ttm_bo_destroy() use the new variable too, instead of
> expecting hw_res_handle indicating the object state.

Is there a potential patch ordering issue here? If this patch changes
the code to avoid using hw_res_handle,
is after patches that start filling in hw_res_handle in places we
haven't filled it in before.

Maybe this patch should happen earlier.

Dave.

>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.h    | 3 ++-
>  drivers/gpu/drm/virtio/virtgpu_fb.c     | 2 +-
>  drivers/gpu/drm/virtio/virtgpu_gem.c    | 2 +-
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c  | 3 ++-
>  drivers/gpu/drm/virtio/virtgpu_object.c | 2 +-
>  drivers/gpu/drm/virtio/virtgpu_vq.c     | 8 ++++++--
>  6 files changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index 9b26e8ee84..4a39877ce6 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -65,6 +65,7 @@ struct virtio_gpu_object {
>         struct ttm_placement            placement;
>         struct ttm_buffer_object        tbo;
>         struct ttm_bo_kmap_obj          kmap;
> +       bool created;
>  };
>  #define gem_to_virtio_gpu_obj(gobj) \
>         container_of((gobj), struct virtio_gpu_object, gem_base)
> @@ -263,7 +264,7 @@ void virtio_gpu_resource_id_get(struct virtio_gpu_device *vgdev,
>                                uint32_t *resid);
>  void virtio_gpu_resource_id_put(struct virtio_gpu_device *vgdev, uint32_t id);
>  void virtio_gpu_cmd_create_resource(struct virtio_gpu_device *vgdev,
> -                                   uint32_t resource_id,
> +                                   struct virtio_gpu_object *bo,
>                                     uint32_t format,
>                                     uint32_t width,
>                                     uint32_t height);
> diff --git a/drivers/gpu/drm/virtio/virtgpu_fb.c b/drivers/gpu/drm/virtio/virtgpu_fb.c
> index b16c62c4d8..c22a8246b6 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_fb.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_fb.c
> @@ -232,7 +232,7 @@ static int virtio_gpufb_create(struct drm_fb_helper *helper,
>                 return PTR_ERR(obj);
>
>         virtio_gpu_resource_id_get(vgdev, &obj->hw_res_handle);
> -       virtio_gpu_cmd_create_resource(vgdev, obj->hw_res_handle, format,
> +       virtio_gpu_cmd_create_resource(vgdev, obj, format,
>                                        mode_cmd.width, mode_cmd.height);
>
>         ret = virtio_gpu_object_kmap(obj);
> diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c b/drivers/gpu/drm/virtio/virtgpu_gem.c
> index 5450f7ab5b..665d18a49d 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_gem.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_gem.c
> @@ -104,7 +104,7 @@ int virtio_gpu_mode_dumb_create(struct drm_file *file_priv,
>         format = virtio_gpu_translate_format(DRM_FORMAT_HOST_XRGB8888);
>         obj = gem_to_virtio_gpu_obj(gobj);
>         virtio_gpu_resource_id_get(vgdev, &obj->hw_res_handle);
> -       virtio_gpu_cmd_create_resource(vgdev, obj->hw_res_handle, format,
> +       virtio_gpu_cmd_create_resource(vgdev, obj, format,
>                                        args->width, args->height);
>
>         /* attach the object to the resource */
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index 44c9160c14..f9c55ecfca 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -256,7 +256,7 @@ static int virtio_gpu_resource_create_ioctl(struct drm_device *dev, void *data,
>         virtio_gpu_resource_id_get(vgdev, &qobj->hw_res_handle);
>
>         if (!vgdev->has_virgl_3d) {
> -               virtio_gpu_cmd_create_resource(vgdev, qobj->hw_res_handle, rc->format,
> +               virtio_gpu_cmd_create_resource(vgdev, qobj, rc->format,
>                                                rc->width, rc->height);
>
>                 ret = virtio_gpu_object_attach(vgdev, qobj, NULL);
> @@ -285,6 +285,7 @@ static int virtio_gpu_resource_create_ioctl(struct drm_device *dev, void *data,
>                 rc_3d.flags = cpu_to_le32(rc->flags);
>
>                 virtio_gpu_cmd_resource_create_3d(vgdev, &rc_3d, NULL);
> +               qobj->created = true;
>                 ret = virtio_gpu_object_attach(vgdev, qobj, &fence);
>                 if (ret) {
>                         ttm_eu_backoff_reservation(&ticket, &validate_list);
> diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c b/drivers/gpu/drm/virtio/virtgpu_object.c
> index eca7655374..6611c487d7 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_object.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_object.c
> @@ -33,7 +33,7 @@ static void virtio_gpu_ttm_bo_destroy(struct ttm_buffer_object *tbo)
>         bo = container_of(tbo, struct virtio_gpu_object, tbo);
>         vgdev = (struct virtio_gpu_device *)bo->gem_base.dev->dev_private;
>
> -       if (bo->hw_res_handle)
> +       if (bo->created)
>                 virtio_gpu_cmd_unref_resource(vgdev, bo->hw_res_handle);
>         if (bo->pages)
>                 virtio_gpu_object_free_sg_table(bo);
> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
> index dd4464ccb1..45da3c87b6 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
> @@ -388,7 +388,7 @@ static int virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
>
>  /* create a basic resource */
>  void virtio_gpu_cmd_create_resource(struct virtio_gpu_device *vgdev,
> -                                   uint32_t resource_id,
> +                                   struct virtio_gpu_object *bo,
>                                     uint32_t format,
>                                     uint32_t width,
>                                     uint32_t height)
> @@ -400,12 +400,13 @@ void virtio_gpu_cmd_create_resource(struct virtio_gpu_device *vgdev,
>         memset(cmd_p, 0, sizeof(*cmd_p));
>
>         cmd_p->hdr.type = cpu_to_le32(VIRTIO_GPU_CMD_RESOURCE_CREATE_2D);
> -       cmd_p->resource_id = cpu_to_le32(resource_id);
> +       cmd_p->resource_id = cpu_to_le32(bo->hw_res_handle);
>         cmd_p->format = cpu_to_le32(format);
>         cmd_p->width = cpu_to_le32(width);
>         cmd_p->height = cpu_to_le32(height);
>
>         virtio_gpu_queue_ctrl_buffer(vgdev, vbuf);
> +       bo->created = true;
>  }
>
>  void virtio_gpu_cmd_unref_resource(struct virtio_gpu_device *vgdev,
> @@ -868,6 +869,9 @@ int virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,
>         struct scatterlist *sg;
>         int si, nents;
>
> +       if (!obj->created)
> +               return 0;
> +
>         if (!obj->pages) {
>                 int ret;
>
> --
> 2.9.3
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply

* Re: [RFC] VSOCK: The performance problem of vhost_vsock.
From: jiangyiwen @ 2018-10-18  1:22 UTC (permalink / raw)
  To: Jason Wang, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <70e7a894-ad12-9909-a0a8-c8772c7c232f@redhat.com>

On 2018/10/17 20:31, Jason Wang wrote:
> 
> On 2018/10/17 下午7:41, jiangyiwen wrote:
>> On 2018/10/17 17:51, Jason Wang wrote:
>>> On 2018/10/17 下午5:39, Jason Wang wrote:
>>>>> Hi Jason and Stefan,
>>>>>
>>>>> Maybe I find the reason of bad performance.
>>>>>
>>>>> I found pkt_len is limited to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE(4K),
>>>>> it will cause the bandwidth is limited to 500~600MB/s. And once I
>>>>> increase to 64k, it can improve about 3 times(~1500MB/s).
>>>>
>>>> Looks like the value was chosen for a balance between rx buffer size and performance. Allocating 64K always even for small packet is kind of waste and stress for guest memory. Virito-net try to avoid this by inventing the merge able rx buffer which allows big packet to be scattered in into different buffers. We can reuse this idea or revisit the idea of using virtio-net/vhost-net as a transport of vsock.
>>>>
>>>> What interesting is the performance is still behind vhost-net.
>>>>
>>>> Thanks
>>>>
>>>>> By the way, I send to 64K in application once, and I don't use
>>>>> sg_init_one and rewrite function to packet sg list because pkt_len
>>>>> include multiple pages.
>>>>>
>>>>> Thanks,
>>>>> Yiwen.
>>>
>>> Btw, if you're using vsock for transferring large files, maybe it's more efficient to implement sendpage() for vsock to allow sendfile()/splice() work.
>>>
>>> Thanks
>>>
>> I can't agree more.
>>
>> why vhost_vsock is still behind vhost_net?
>> Because I use sendfile() to test performance at first, and then
>> I found vsock don't implement sendpage() and cause the bandwidth
>> can't be increased. So I use read() and send() to replace sendfile(),
>> it will increase some switch between kernel and user mode, and sendfile()
>> can support zero copy. I think this is main reason.
>>
>> Thanks.
> 
> 
> Want to post patches for this then :) ?
> 
> Thanks
> 

I may not post patches at the moment because there are other tasks.

After a period of time, I will consider implement the feature.

Thanks.

> 
>>
>>> .
>>>
>>
> 
> .
> 


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v2] virtio-gpu: add VIRTIO_GPU_F_EDID feature
From: Dave Airlie @ 2018-10-18  1:17 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: KVM list, Michael S. Tsirkin, Dave Airlie, LKML, dri-devel,
	virtio-dev, open list:VIRTIO CORE, NET...
In-Reply-To: <20181005125154.19824-1-kraxel@redhat.com>

Reviewed-by: Dave Airlie <airlied@redhat.com>
On Fri, 5 Oct 2018 at 22:52, Gerd Hoffmann <kraxel@redhat.com> wrote:
>
> The feature allows the guest request an EDID blob (describing monitor
> capabilities) for a given scanout (aka virtual monitor connector).
>
> It brings a new command message, which has just a scanout field (beside
> the standard virtio-gpu header) and a response message which carries the
> EDID data.
>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  include/uapi/linux/virtio_gpu.h | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/include/uapi/linux/virtio_gpu.h b/include/uapi/linux/virtio_gpu.h
> index f43c3c6171..1cef1ff339 100644
> --- a/include/uapi/linux/virtio_gpu.h
> +++ b/include/uapi/linux/virtio_gpu.h
> @@ -41,6 +41,7 @@
>  #include <linux/types.h>
>
>  #define VIRTIO_GPU_F_VIRGL 0
> +#define VIRTIO_GPU_F_EDID  1
>
>  enum virtio_gpu_ctrl_type {
>         VIRTIO_GPU_UNDEFINED = 0,
> @@ -56,6 +57,7 @@ enum virtio_gpu_ctrl_type {
>         VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING,
>         VIRTIO_GPU_CMD_GET_CAPSET_INFO,
>         VIRTIO_GPU_CMD_GET_CAPSET,
> +       VIRTIO_GPU_CMD_GET_EDID,
>
>         /* 3d commands */
>         VIRTIO_GPU_CMD_CTX_CREATE = 0x0200,
> @@ -76,6 +78,7 @@ enum virtio_gpu_ctrl_type {
>         VIRTIO_GPU_RESP_OK_DISPLAY_INFO,
>         VIRTIO_GPU_RESP_OK_CAPSET_INFO,
>         VIRTIO_GPU_RESP_OK_CAPSET,
> +       VIRTIO_GPU_RESP_OK_EDID,
>
>         /* error responses */
>         VIRTIO_GPU_RESP_ERR_UNSPEC = 0x1200,
> @@ -291,6 +294,20 @@ struct virtio_gpu_resp_capset {
>         __u8 capset_data[];
>  };
>
> +/* VIRTIO_GPU_CMD_GET_EDID */
> +struct virtio_gpu_get_edid {
> +       struct virtio_gpu_ctrl_hdr hdr;
> +       __le32 scanout;
> +};
> +
> +/* VIRTIO_GPU_RESP_OK_EDID */
> +struct virtio_gpu_resp_edid {
> +       struct virtio_gpu_ctrl_hdr hdr;
> +       __le32 scanout;
> +       __le32 size;
> +       __u8 edid[1024];
> +};
> +
>  #define VIRTIO_GPU_EVENT_DISPLAY (1 << 0)
>
>  struct virtio_gpu_config {
> --
> 2.9.3
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply

* Re: [PATCH v2] drm/bochs: add edid support.
From: Dave Airlie @ 2018-10-18  1:15 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Dave Airlie, LKML, dri-devel, open list:VIRTIO CORE, NET...
In-Reply-To: <20181005120934.21959-1-kraxel@redhat.com>

On Fri, 5 Oct 2018 at 22:10, Gerd Hoffmann <kraxel@redhat.com> wrote:
>
> Recent qemu (latest master branch, upcoming 3.1 release) got support
> for EDID data.  This patch adds guest driver support.
>
> EDID support in qemu is not (yet) enabled by default, so please use
> 'qemu -device VGA,edid=on' for testing.

The EDID never changes after boot? Is there plans to let it?
>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  drivers/gpu/drm/bochs/bochs.h     |  1 +
>  drivers/gpu/drm/bochs/bochs_hw.c  | 38 ++++++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/bochs/bochs_kms.c | 18 +++++++++++++++---
>  3 files changed, 54 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/bochs/bochs.h b/drivers/gpu/drm/bochs/bochs.h
> index e7a69077e4..06b8166efa 100644
> --- a/drivers/gpu/drm/bochs/bochs.h
> +++ b/drivers/gpu/drm/bochs/bochs.h
> @@ -66,6 +66,7 @@ struct bochs_device {
>         u16 yres_virtual;
>         u32 stride;
>         u32 bpp;
> +       struct edid *edid;
>
>         /* drm */
>         struct drm_device  *dev;
> diff --git a/drivers/gpu/drm/bochs/bochs_hw.c b/drivers/gpu/drm/bochs/bochs_hw.c
> index cacff73a64..6ce4cdac38 100644
> --- a/drivers/gpu/drm/bochs/bochs_hw.c
> +++ b/drivers/gpu/drm/bochs/bochs_hw.c
> @@ -69,6 +69,41 @@ static void bochs_hw_set_little_endian(struct bochs_device *bochs)
>  #define bochs_hw_set_native_endian(_b) bochs_hw_set_little_endian(_b)
>  #endif
>
> +static int bochs_load_edid(struct bochs_device *bochs)
> +{
> +       uint8_t *blob;
> +       size_t i, len;
> +       uint8_t num_exts;
> +
> +       if (!bochs->mmio)
> +               return -1;
> +
> +       if ((readb(bochs->mmio+0) != 0x00 ||
> +            readb(bochs->mmio+1) != 0xff))
> +               return -1;
> +
> +       num_exts = readb(bochs->mmio + 126);
> +       len = EDID_LENGTH * (1 + num_exts);
> +       if (len > 0x400 /* vga register offset */)
> +               return -1;
> +
> +       kfree(bochs->edid);
> +       bochs->edid = kmalloc(len, GFP_KERNEL);

Don't you need to free it somewhere?

Dave.

^ permalink raw reply

* Re: [PATCH v3 2/7] dt-bindings: virtio: Add virtio-pci-iommu node
From: Rob Herring @ 2018-10-18  0:35 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: mark.rutland, peter.maydell, lorenzo.pieralisi, tnowicki,
	devicetree, linux-pci, joro, mst, will.deacon, virtualization,
	eric.auger, iommu, robh+dt, marc.zyngier, robin.murphy, kvmarm
In-Reply-To: <20181012145917.6840-3-jean-philippe.brucker@arm.com>

On Fri, 12 Oct 2018 15:59:12 +0100, Jean-Philippe Brucker wrote:
> Some systems implement virtio-iommu as a PCI endpoint. The operating
> systems needs to discover the relationship between IOMMU and masters long
> before the PCI endpoint gets probed. Add a PCI child node to describe the
> virtio-iommu device.
> 
> The virtio-pci-iommu is conceptually split between a PCI programming
> interface and a translation component on the parent bus. The latter
> doesn't have a node in the device tree. The virtio-pci-iommu node
> describes both, by linking the PCI endpoint to "iommus" property of DMA
> master nodes and to "iommu-map" properties of bus nodes.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  .../devicetree/bindings/virtio/iommu.txt      | 66 +++++++++++++++++++
>  1 file changed, 66 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
> 

Reviewed-by: Rob Herring <robh@kernel.org>

^ permalink raw reply

* Re: [PATCH v3 1/7] dt-bindings: virtio-mmio: Add IOMMU description
From: Rob Herring @ 2018-10-18  0:30 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: mark.rutland, peter.maydell, lorenzo.pieralisi, tnowicki,
	devicetree, linux-pci, joro, mst, will.deacon, virtualization,
	eric.auger, iommu, marc.zyngier, robin.murphy, kvmarm
In-Reply-To: <20181012145917.6840-2-jean-philippe.brucker@arm.com>

On Fri, Oct 12, 2018 at 03:59:11PM +0100, Jean-Philippe Brucker wrote:
> The nature of a virtio-mmio node is discovered by the virtio driver at
> probe time. However the DMA relation between devices must be described
> statically. When a virtio-mmio node is a virtio-iommu device, it needs an
> "#iommu-cells" property as specified by bindings/iommu/iommu.txt.
> 
> Otherwise, the virtio-mmio device may perform DMA through an IOMMU, which
> requires an "iommus" property. Describe these requirements in the
> device-tree bindings documentation.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  .../devicetree/bindings/virtio/mmio.txt       | 30 +++++++++++++++++++
>  1 file changed, 30 insertions(+)

One nit, otherwise,

Reviewed-by: Rob Herring <robh@kernel.org>

> 
> diff --git a/Documentation/devicetree/bindings/virtio/mmio.txt b/Documentation/devicetree/bindings/virtio/mmio.txt
> index 5069c1b8e193..748595473b36 100644
> --- a/Documentation/devicetree/bindings/virtio/mmio.txt
> +++ b/Documentation/devicetree/bindings/virtio/mmio.txt
> @@ -8,10 +8,40 @@ Required properties:
>  - reg:		control registers base address and size including configuration space
>  - interrupts:	interrupt generated by the device
>  
> +Required properties for virtio-iommu:
> +
> +- #iommu-cells:	When the node corresponds to a virtio-iommu device, it is
> +		linked to DMA masters using the "iommus" or "iommu-map"
> +		properties [1][2]. #iommu-cells specifies the size of the
> +		"iommus" property. For virtio-iommu #iommu-cells must be
> +		1, each cell describing a single endpoint ID.
> +
> +Optional properties:
> +
> +- iommus:	If the device accesses memory through an IOMMU, it should
> +		have an "iommus" property [1]. Since virtio-iommu itself
> +		does not access memory through an IOMMU, the "virtio,mmio"
> +		node cannot have both an "#iommu-cells" and an "iommus"
> +		property.
> +
>  Example:
>  
>  	virtio_block@3000 {
>  		compatible = "virtio,mmio";
>  		reg = <0x3000 0x100>;
>  		interrupts = <41>;
> +
> +		/* Device has endpoint ID 23 */
> +		iommus = <&viommu 23>
>  	}
> +
> +	viommu: virtio_iommu@3100 {

iommu@3100

> +		compatible = "virtio,mmio";
> +		reg = <0x3100 0x100>;
> +		interrupts = <42>;
> +
> +		#iommu-cells = <1>
> +	}
> +
> +[1] Documentation/devicetree/bindings/iommu/iommu.txt
> +[2] Documentation/devicetree/bindings/pci/pci-iommu.txt
> -- 
> 2.19.1
> 

^ permalink raw reply

* Re: [PATCH v3 0/7] Add virtio-iommu driver
From: Michael S. Tsirkin @ 2018-10-17 15:23 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Mark Rutland, devicetree@vger.kernel.org, Lorenzo Pieralisi,
	tnowicki@caviumnetworks.com, peter.maydell@linaro.org,
	linux-pci@vger.kernel.org, joro@8bytes.org, Will Deacon,
	virtualization@lists.linux-foundation.org, Auger Eric,
	iommu@lists.linux-foundation.org, robh+dt@kernel.org,
	Marc Zyngier, Robin Murphy, kvmarm@lists.cs.columbia.edu
In-Reply-To: <40dddb20-6248-5bb0-e0ed-48bacd1867a1@arm.com>

On Wed, Oct 17, 2018 at 12:54:28PM +0100, Jean-Philippe Brucker wrote:
> On 16/10/2018 21:31, Auger Eric wrote:
> > Hi Jean,
> > 
> > On 10/16/18 8:44 PM, Jean-Philippe Brucker wrote:
> >> On 16/10/2018 10:25, Auger Eric wrote:
> >>> Hi Jean,
> >>>
> >>> On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
> >>>> Implement the virtio-iommu driver, following specification v0.8 [1].
> >>>> Changes since v2 [2]:
> >>>>
> >>>> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
> >>>>    would like to phase out the MMIO transport. This produces a complex
> >>>>    topology where the programming interface of the IOMMU could appear
> >>>>    lower than the endpoints that it translates. It's not unheard of (e.g.
> >>>>    AMD IOMMU), and the guest easily copes with this.
> >>>>    
> >>>>    The "Firmware description" section of the specification has been
> >>>>    updated with all combinations of PCI, MMIO and DT, ACPI.
> >>>
> >>> I have a question wrt the FW specification. The IOMMU consumes 1 slot in
> >>> the PCI domain and one needs to leave a RID hole in the iommu-map.  It
> >>> is not obvious to me that this RID always is predictable given the pcie
> >>> enumeration mechanism. Generally we have a coarse grain mapping of RID
> >>> onto iommu phandles/STREAMIDs. Here, if I understand correctly we need
> >>> to precisely identify the RID granted to the iommu. On QEMU this may
> >>> depend on the instantiation order of the virtio-pci device right?
> >> 
> >> Yes, although it should all happen before you boot the guest, since
> >> there is no hotplugging an IOMMU. Could you reserve a PCI slot upfront
> >> and use it for virtio-iommu later? Or generate the iommu-map at the same
> >> time as generating the child node of the PCI RC?
> > 
> > Even when cold-plugging the PCIe devices through qemu CLI, this depends
> > on the order of the pcie devices in the list I guess. I need to further
> > experiment.
> 
> Please let me know how it goes. I guess the problem will be the same for
> building IORT tables? You're also going to need a hole in the ID
> mappings of the PCI root complex node.
> 
> >>> So
> >>> this does not look trivial to build this info. Isn't it possible to do
> >>> this exclusion at kernel level instead?
> >> 
> >> So in theory VIRTIO_F_IOMMU_PLATFORM already does that:
> >> 
> >> VIRTIO_F_IOMMU_PLATFORM(33)
> >>     This feature indicates that the device is behind an IOMMU that
> >>     translates bus addresses from the device into physical addresses in
> >>     memory. If this feature bit is set to 0, then the device emits
> >>     physical addresses which are not translated further, even though an
> >>     IOMMU may be present.
> > 
> > This tells the driver to use the dma api, right? 
> 
> That's how Linux implements the bit, install custom DMA ops when the bit
> is absent. But it doesn't work for everyone and has caused a lot of
> debate (https://patchwork.ozlabs.org/cover/946708/)
> 
> > Effectively this
> > explicitly says whether the device is supposed to be upfront an IOMMU.
> 
> Yes. It's quite strange if you consider hotpluggable hardware, since
> those devices shouldn't get to choose whether they are managed by an
> IOMMU. For the IOMMU itself, it should be fine
> 
> >> For better or for worse, the guest has to implement it. If this feature
> >> bit is unset for virtio-iommu, it does DMA on the physical address
> >> space, regardless of what the static topology description says.
> >> 
> >> In practice it doesn't quite work. If your iommu-map describes the IOMMU
> >> as translating itself, Linux' OF code will wait for the IOMMU to be
> >> probed before probing the IOMMU. Working around this with hacks is
> >> possible, but I don't want to introduce more questionable code to OF and
> >> device tree bindings if there is any other way.
> > Hum ok. I cannot really comment on this.
> > 
> > I just wanted to raise this concern about RID identfication.
> 
> We can always try. Relaxing iommu-map further would be one additional
> patch to Documentation/devicetree/bindings/pci/pci-iommu.txt, and one to
> drivers/iommu/of-iommu.c. I'd rather make it a separate RFC.
> 
> Since we need acks from an OF maintainer and I'd also like Joerg's
> approval for adding a new driver to the IOMMU tree, I think it's too
> late for this iteration. I wasn't intending for this to go into 4.20,
> just have something to discuss at KVM forum next week.
> 
> Thanks,
> Jean

OK then. I'd appreciate it if you mark patches that aren't
intended to be merged as RFC in subject line.
Thanks!

-- 
MST

^ permalink raw reply

* Re: [PATCH v3 3/7] PCI: OF: Allow endpoints to bypass the iommu
From: Michael S. Tsirkin @ 2018-10-17 15:14 UTC (permalink / raw)
  To: Jean-philippe Brucker
  Cc: mark.rutland, devicetree, tnowicki, peter.maydell, marc.zyngier,
	linux-pci, will.deacon, virtualization, jean-philippe.brucker,
	iommu, robh+dt, Bjorn Helgaas, robin.murphy, kvmarm
In-Reply-To: <482d0eb9-8c4c-9d64-7b32-25d5d11a8b8f@gmail.com>

On Mon, Oct 15, 2018 at 08:46:41PM +0100, Jean-philippe Brucker wrote:
> [Replying with my personal address because we're having SMTP issues]
> 
> On 15/10/2018 11:52, Michael S. Tsirkin wrote:
> > On Fri, Oct 12, 2018 at 02:41:59PM -0500, Bjorn Helgaas wrote:
> >> s/iommu/IOMMU/ in subject
> >>
> >> On Fri, Oct 12, 2018 at 03:59:13PM +0100, Jean-Philippe Brucker wrote:
> >>> Using the iommu-map binding, endpoints in a given PCI domain can be
> >>> managed by different IOMMUs. Some virtual machines may allow a subset of
> >>> endpoints to bypass the IOMMU. In some case the IOMMU itself is presented
> >>
> >> s/case/cases/
> >>
> >>> as a PCI endpoint (e.g. AMD IOMMU and virtio-iommu). Currently, when a
> >>> PCI root complex has an iommu-map property, the driver requires all
> >>> endpoints to be described by the property. Allow the iommu-map property to
> >>> have gaps.
> >>
> >> I'm not an IOMMU or virtio expert, so it's not obvious to me why it is
> >> safe to allow devices to bypass the IOMMU.  Does this mean a typo in
> >> iommu-map could inadvertently allow devices to bypass it?
> > 
> > 
> > Thinking about this comment, I would like to ask: can't the
> > virtio device indicate the ranges in a portable way?
> > This would minimize the dependency on dt bindings and ACPI,
> > enabling support for systems that have neither but do
> > have virtio e.g. through pci.
> 
> I thought about adding a PROBE request for this in virtio-iommu, but it
> wouldn't be usable by a Linux guest because of a bootstrapping problem.

Hmm. At some level it seems wrong to design hardware interfaces
around how Linux happens to probe things. That can change at any time
...

> Early on, Linux needs a description of device dependencies, to determine
> in which order to probe them. If the device dependency was described by
> virtio-iommu itself, the guest could for example initialize a NIC,
> allocate buffers and start DMA on the physical address space (which aborts
> if the IOMMU implementation disallows DMA by default), only to find out
> once the virtio-iommu module is loaded that it needs to cancel all DMA and
> reconfigure the NIC. With a static description such as iommu-map in DT or
> ACPI remapping tables, the guest can defer probing of the NIC until the
> IOMMU is initialized.
> 
> Thanks,
> Jean

Could you point me at the code you refer to here?

-- 
MST

^ permalink raw reply

* Re: [PATCH v3] virtio_net: avoid using netif_tx_disable() for serializing tx routine
From: Michael S. Tsirkin @ 2018-10-17 15:09 UTC (permalink / raw)
  To: Ake Koomsin; +Cc: netdev, David S. Miller, linux-kernel, virtualization
In-Reply-To: <20181017104419.13003-1-ake@igel.co.jp>

On Wed, Oct 17, 2018 at 07:44:12PM +0900, Ake Koomsin wrote:
> Commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
> introduces netif_tx_disable() after netif_device_detach() in order to
> avoid use-after-free of tx queues. However, there are two issues.
> 
> 1) Its operation is redundant with netif_device_detach() in case the
>    interface is running.
> 2) In case of the interface is not running before suspending and
>    resuming, the tx does not get resumed by netif_device_attach().
>    This results in losing network connectivity.
> 
> It is better to use netif_tx_lock_bh()/netif_tx_unlock_bh() instead for
> serializing tx routine during reset. This also preserves the symmetry
> of netif_device_detach() and netif_device_attach().
> 
> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
> Signed-off-by: Ake Koomsin <ake@igel.co.jp>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

Thanks a lot for debugging!
Seems like stable material to me, right?

> ---
>  drivers/net/virtio_net.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 3f5aa59c37b7..3e2c041d76ac 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2267,8 +2267,9 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>  	/* Make sure no work handler is accessing the device */
>  	flush_work(&vi->config_work);
>  
> +	netif_tx_lock_bh(vi->dev);
>  	netif_device_detach(vi->dev);
> -	netif_tx_disable(vi->dev);
> +	netif_tx_unlock_bh(vi->dev);
>  	cancel_delayed_work_sync(&vi->refill);
>  
>  	if (netif_running(vi->dev)) {
> @@ -2304,7 +2305,9 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>  		}
>  	}
>  
> +	netif_tx_lock_bh(vi->dev);
>  	netif_device_attach(vi->dev);
> +	netif_tx_unlock_bh(vi->dev);
>  	return err;
>  }
>  
> -- 
> 2.19.1

^ permalink raw reply

* Re: [RFC] VSOCK: The performance problem of vhost_vsock.
From: Jason Wang @ 2018-10-17 12:31 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <5BC72006.9010000@huawei.com>


On 2018/10/17 下午7:41, jiangyiwen wrote:
> On 2018/10/17 17:51, Jason Wang wrote:
>> On 2018/10/17 下午5:39, Jason Wang wrote:
>>>> Hi Jason and Stefan,
>>>>
>>>> Maybe I find the reason of bad performance.
>>>>
>>>> I found pkt_len is limited to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE(4K),
>>>> it will cause the bandwidth is limited to 500~600MB/s. And once I
>>>> increase to 64k, it can improve about 3 times(~1500MB/s).
>>>
>>> Looks like the value was chosen for a balance between rx buffer size and performance. Allocating 64K always even for small packet is kind of waste and stress for guest memory. Virito-net try to avoid this by inventing the merge able rx buffer which allows big packet to be scattered in into different buffers. We can reuse this idea or revisit the idea of using virtio-net/vhost-net as a transport of vsock.
>>>
>>> What interesting is the performance is still behind vhost-net.
>>>
>>> Thanks
>>>
>>>> By the way, I send to 64K in application once, and I don't use
>>>> sg_init_one and rewrite function to packet sg list because pkt_len
>>>> include multiple pages.
>>>>
>>>> Thanks,
>>>> Yiwen.
>>
>> Btw, if you're using vsock for transferring large files, maybe it's more efficient to implement sendpage() for vsock to allow sendfile()/splice() work.
>>
>> Thanks
>>
> I can't agree more.
>
> why vhost_vsock is still behind vhost_net?
> Because I use sendfile() to test performance at first, and then
> I found vsock don't implement sendpage() and cause the bandwidth
> can't be increased. So I use read() and send() to replace sendfile(),
> it will increase some switch between kernel and user mode, and sendfile()
> can support zero copy. I think this is main reason.
>
> Thanks.


Want to post patches for this then :) ?

Thanks


>
>> .
>>
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox