Linux virtualization list
 help / color / mirror / Atom feed
* Re: [PATCH] vhost/scsi: truncate T10 PI iov_iter to prot_bytes
From: Greg Edwards @ 2018-09-21 17:49 UTC (permalink / raw)
  To: virtualization; +Cc: Paolo Bonzini, Mike Christie, Michael S. Tsirkin
In-Reply-To: <20180822192153.24217-1-gedwards@ddn.com>

On Wed, Aug 22, 2018 at 01:21:53PM -0600, Greg Edwards wrote:
> Commands with protection information included were not truncating the
> protection iov_iter to the number of protection bytes in the command.
> This resulted in vhost_scsi mis-calculating the size of the protection
> SGL in vhost_scsi_calc_sgls(), and including both the protection and
> data SG entries in the protection SGL.
>
> Fixes: 09b13fa8c1a1 ("vhost/scsi: Add ANY_LAYOUT support in vhost_scsi_handle_vq")
> Signed-off-by: Greg Edwards <gedwards@ddn.com>


Any thoughts on this patch?


> ---
>  drivers/vhost/scsi.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index 76f8d649147b..cbe0ea26c1ff 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -964,7 +964,8 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
>  				prot_bytes = vhost32_to_cpu(vq, v_req_pi.pi_bytesin);
>  			}
>  			/*
> -			 * Set prot_iter to data_iter, and advance past any
> +			 * Set prot_iter to data_iter and truncate it to
> +			 * prot_bytes, and advance data_iter past any
>  			 * preceeding prot_bytes that may be present.
>  			 *
>  			 * Also fix up the exp_data_len to reflect only the
> @@ -973,6 +974,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
>  			if (prot_bytes) {
>  				exp_data_len -= prot_bytes;
>  				prot_iter = data_iter;
> +				iov_iter_truncate(&prot_iter, prot_bytes);
>  				iov_iter_advance(&data_iter, prot_bytes);
>  			}
>  			tag = vhost64_to_cpu(vq, v_req_pi.tag);
> --
> 2.17.1

^ permalink raw reply

* Re: [PATCH net-next] vhost_net: add a missing error return
From: Michael S. Tsirkin @ 2018-09-20 12:51 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: netdev, kernel-janitors, kvm, virtualization
In-Reply-To: <20180920100158.GA9551@mwanda>

On Thu, Sep 20, 2018 at 01:01:59PM +0300, Dan Carpenter wrote:
> We accidentally left out this error return so it leads to some use after
> free bugs later on.
> 
> Fixes: 0a0be13b8fe2 ("vhost_net: batch submitting XDP buffers to underlayer sockets")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Ouch.

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index dd4e0a301635..1bff6bc8161a 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -1244,6 +1244,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
>  		kfree(vqs);
>  		kvfree(n);
>  		kfree(queue);
> +		return -ENOMEM;
>  	}
>  	n->vqs[VHOST_NET_VQ_TX].xdp = xdp;
>  

^ permalink raw reply

* Re: [RFC PATCH 2/2] virtio/s390: fix race in ccw_io_helper()
From: Cornelia Huck @ 2018-09-20 10:15 UTC (permalink / raw)
  To: Halil Pasic; +Cc: linux-s390, Colin Ian King, kvm, virtualization
In-Reply-To: <592b9fd1-0ab0-aa9f-31d7-a717610bd95c@linux.ibm.com>

On Wed, 19 Sep 2018 18:56:45 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> On 09/19/2018 04:07 PM, Cornelia Huck wrote:

> > Do you spot any other places where we may need to care about concurrent
> > processing (like for the ->config area in the previous patch)?
> >   
> 
> It is hard to tell, because:
> * Synchronization external to the transport could make things work
> out just fine.
> * virtio_config_ops does not document these requirements if any.
> * So it's up to the devices to use the stuff without shooting
>   themselves in the foot.
> * virtio-pci does not seem to do more to avoid such problems that
>   we do.
> 
> Back then when learning vritio-ccw I did ask myself such questions
> and based on vrito-pci and I was like looks similar, should be
> good enough.

Yep, I agree. If there's nothing obvious, I think we should just leave
it as it is now.

^ permalink raw reply

* [PATCH net-next] vhost_net: add a missing error return
From: Dan Carpenter @ 2018-09-20 10:01 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: netdev, kernel-janitors, kvm, virtualization

We accidentally left out this error return so it leads to some use after
free bugs later on.

Fixes: 0a0be13b8fe2 ("vhost_net: batch submitting XDP buffers to underlayer sockets")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index dd4e0a301635..1bff6bc8161a 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1244,6 +1244,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 		kfree(vqs);
 		kvfree(n);
 		kfree(queue);
+		return -ENOMEM;
 	}
 	n->vqs[VHOST_NET_VQ_TX].xdp = xdp;

^ permalink raw reply related

* Re: [virtio-dev] [PATCH 2/2] drm/virtio: add iommu support.
From: Gerd Hoffmann @ 2018-09-20  6:32 UTC (permalink / raw)
  To: Jiandi An
  Cc: virtio-dev@lists.oasis-open.org, Singh, Brijesh, Lendacky, Thomas,
	Dave Airlie, LKML, dri-devel, open list:VIRTIO CORE, NET...,
	Dave Airlie
In-Reply-To: <bb7b8676-cf39-3144-2de1-389f4b0a70b3@amd.com>

  Hi,

> void virtio_gpu_cmd_transfer_to_host_2d(struct virtio_gpu_device *vgdev,
>                                         uint32_t resource_id, uint64_t offset,
> ...
>      struct virtio_gpu_fbdev *vgfbdev = vgdev->vgfbdev;
>      struct virtio_gpu_framebuffer *fb = &vgfbdev->vgfb;
>      struct virtio_gpu_object *obj = gem_to_virtio_gpu_obj(fb->base.obj[0]);

Ah, right.  Should have noticed this on review.  You sync the fbcon
framebuffer unconfitionally ...

> Is there better way to get to the virtio_gpu_object created in the
> virtio_gpu_mode_dumb_create() path from virtio_gpu_device or somehow from drm_file
> via gem_handle down at the layer of virtio_gpu_cmd_transfer_to_host()?

Just pass it down, the call sites all know it (see patch just sent).

cheers,
  Gerd

^ permalink raw reply

* [PATCH] drm/virtio: pass virtio_gpu_object to virtio_gpu_cmd_transfer_to_host_{2d, 3d}
From: Gerd Hoffmann @ 2018-09-20  6:29 UTC (permalink / raw)
  To: dri-devel; +Cc: David Airlie, open list, jiandi, open list:VIRTIO GPU DRIVER

Pass virtio_gpu_object down to virtio_gpu_cmd_transfer_to_host_2d and
virtio_gpu_cmd_transfer_to_host_3d functions, instead of passing just
the virtio resource handle.

This is needed to lookup the scatter list of the object, for dma sync.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  6 ++++--
 drivers/gpu/drm/virtio/virtgpu_fb.c    |  2 +-
 drivers/gpu/drm/virtio/virtgpu_ioctl.c |  4 ++--
 drivers/gpu/drm/virtio/virtgpu_plane.c |  4 ++--
 drivers/gpu/drm/virtio/virtgpu_vq.c    | 20 ++++++++------------
 5 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index a2d79e18bd..253fcf018d 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -270,7 +270,8 @@ void virtio_gpu_cmd_create_resource(struct virtio_gpu_device *vgdev,
 void virtio_gpu_cmd_unref_resource(struct virtio_gpu_device *vgdev,
 				   uint32_t resource_id);
 void virtio_gpu_cmd_transfer_to_host_2d(struct virtio_gpu_device *vgdev,
-					uint32_t resource_id, uint64_t offset,
+					struct virtio_gpu_object *bo,
+					uint64_t offset,
 					__le32 width, __le32 height,
 					__le32 x, __le32 y,
 					struct virtio_gpu_fence **fence);
@@ -316,7 +317,8 @@ void virtio_gpu_cmd_transfer_from_host_3d(struct virtio_gpu_device *vgdev,
 					  struct virtio_gpu_box *box,
 					  struct virtio_gpu_fence **fence);
 void virtio_gpu_cmd_transfer_to_host_3d(struct virtio_gpu_device *vgdev,
-					uint32_t resource_id, uint32_t ctx_id,
+					struct virtio_gpu_object *bo,
+					uint32_t ctx_id,
 					uint64_t offset, uint32_t level,
 					struct virtio_gpu_box *box,
 					struct virtio_gpu_fence **fence);
diff --git a/drivers/gpu/drm/virtio/virtgpu_fb.c b/drivers/gpu/drm/virtio/virtgpu_fb.c
index b9678c4082..3364b0970d 100644
--- a/drivers/gpu/drm/virtio/virtgpu_fb.c
+++ b/drivers/gpu/drm/virtio/virtgpu_fb.c
@@ -95,7 +95,7 @@ static int virtio_gpu_dirty_update(struct virtio_gpu_framebuffer *fb,
 
 		offset = (y * fb->base.pitches[0]) + x * bpp;
 
-		virtio_gpu_cmd_transfer_to_host_2d(vgdev, obj->hw_res_handle,
+		virtio_gpu_cmd_transfer_to_host_2d(vgdev, obj,
 						   offset,
 						   cpu_to_le32(w),
 						   cpu_to_le32(h),
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index 7bdf6f0e58..f16b875d6a 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -429,11 +429,11 @@ static int virtio_gpu_transfer_to_host_ioctl(struct drm_device *dev, void *data,
 	convert_to_hw_box(&box, &args->box);
 	if (!vgdev->has_virgl_3d) {
 		virtio_gpu_cmd_transfer_to_host_2d
-			(vgdev, qobj->hw_res_handle, offset,
+			(vgdev, qobj, offset,
 			 box.w, box.h, box.x, box.y, NULL);
 	} else {
 		virtio_gpu_cmd_transfer_to_host_3d
-			(vgdev, qobj->hw_res_handle,
+			(vgdev, qobj,
 			 vfpriv ? vfpriv->ctx_id : 0, offset,
 			 args->level, &box, &fence);
 		reservation_object_add_excl_fence(qobj->tbo.resv,
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
index 88f2fb8c61..682a977d68 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -158,7 +158,7 @@ static void virtio_gpu_primary_plane_update(struct drm_plane *plane,
 		handle = bo->hw_res_handle;
 		if (bo->dumb) {
 			virtio_gpu_cmd_transfer_to_host_2d
-				(vgdev, handle, 0,
+				(vgdev, bo, 0,
 				 cpu_to_le32(plane->state->src_w >> 16),
 				 cpu_to_le32(plane->state->src_h >> 16),
 				 cpu_to_le32(plane->state->src_x >> 16),
@@ -217,7 +217,7 @@ static void virtio_gpu_cursor_plane_update(struct drm_plane *plane,
 	if (bo && bo->dumb && (plane->state->fb != old_state->fb)) {
 		/* new cursor -- update & wait */
 		virtio_gpu_cmd_transfer_to_host_2d
-			(vgdev, handle, 0,
+			(vgdev, bo, 0,
 			 cpu_to_le32(plane->state->crtc_w),
 			 cpu_to_le32(plane->state->crtc_h),
 			 0, 0, &fence);
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index df32811f2c..4e2e037aed 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -483,28 +483,26 @@ void virtio_gpu_cmd_resource_flush(struct virtio_gpu_device *vgdev,
 }
 
 void virtio_gpu_cmd_transfer_to_host_2d(struct virtio_gpu_device *vgdev,
-					uint32_t resource_id, uint64_t offset,
+					struct virtio_gpu_object *bo,
+					uint64_t offset,
 					__le32 width, __le32 height,
 					__le32 x, __le32 y,
 					struct virtio_gpu_fence **fence)
 {
 	struct virtio_gpu_transfer_to_host_2d *cmd_p;
 	struct virtio_gpu_vbuffer *vbuf;
-	struct virtio_gpu_fbdev *vgfbdev = vgdev->vgfbdev;
-	struct virtio_gpu_framebuffer *fb = &vgfbdev->vgfb;
-	struct virtio_gpu_object *obj = gem_to_virtio_gpu_obj(fb->base.obj[0]);
 	bool use_dma_api = !virtio_has_iommu_quirk(vgdev->vdev);
 
 	if (use_dma_api)
 		dma_sync_sg_for_device(vgdev->vdev->dev.parent,
-				       obj->pages->sgl, obj->pages->nents,
+				       bo->pages->sgl, bo->pages->nents,
 				       DMA_TO_DEVICE);
 
 	cmd_p = virtio_gpu_alloc_cmd(vgdev, &vbuf, sizeof(*cmd_p));
 	memset(cmd_p, 0, sizeof(*cmd_p));
 
 	cmd_p->hdr.type = cpu_to_le32(VIRTIO_GPU_CMD_TRANSFER_TO_HOST_2D);
-	cmd_p->resource_id = cpu_to_le32(resource_id);
+	cmd_p->resource_id = cpu_to_le32(bo->hw_res_handle);
 	cmd_p->offset = cpu_to_le64(offset);
 	cmd_p->r.width = width;
 	cmd_p->r.height = height;
@@ -791,21 +789,19 @@ virtio_gpu_cmd_resource_create_3d(struct virtio_gpu_device *vgdev,
 }
 
 void virtio_gpu_cmd_transfer_to_host_3d(struct virtio_gpu_device *vgdev,
-					uint32_t resource_id, uint32_t ctx_id,
+					struct virtio_gpu_object *bo,
+					uint32_t ctx_id,
 					uint64_t offset, uint32_t level,
 					struct virtio_gpu_box *box,
 					struct virtio_gpu_fence **fence)
 {
 	struct virtio_gpu_transfer_host_3d *cmd_p;
 	struct virtio_gpu_vbuffer *vbuf;
-	struct virtio_gpu_fbdev *vgfbdev = vgdev->vgfbdev;
-	struct virtio_gpu_framebuffer *fb = &vgfbdev->vgfb;
-	struct virtio_gpu_object *obj = gem_to_virtio_gpu_obj(fb->base.obj[0]);
 	bool use_dma_api = !virtio_has_iommu_quirk(vgdev->vdev);
 
 	if (use_dma_api)
 		dma_sync_sg_for_device(vgdev->vdev->dev.parent,
-				       obj->pages->sgl, obj->pages->nents,
+				       bo->pages->sgl, bo->pages->nents,
 				       DMA_TO_DEVICE);
 
 	cmd_p = virtio_gpu_alloc_cmd(vgdev, &vbuf, sizeof(*cmd_p));
@@ -813,7 +809,7 @@ void virtio_gpu_cmd_transfer_to_host_3d(struct virtio_gpu_device *vgdev,
 
 	cmd_p->hdr.type = cpu_to_le32(VIRTIO_GPU_CMD_TRANSFER_TO_HOST_3D);
 	cmd_p->hdr.ctx_id = cpu_to_le32(ctx_id);
-	cmd_p->resource_id = cpu_to_le32(resource_id);
+	cmd_p->resource_id = cpu_to_le32(bo->hw_res_handle);
 	cmd_p->box = *box;
 	cmd_p->offset = cpu_to_le64(offset);
 	cmd_p->level = cpu_to_le32(level);
-- 
2.9.3

^ permalink raw reply related

* Re: [RFC PATCH 2/2] virtio/s390: fix race in ccw_io_helper()
From: Cornelia Huck @ 2018-09-19 14:07 UTC (permalink / raw)
  To: Halil Pasic; +Cc: linux-s390, Colin Ian King, kvm, virtualization
In-Reply-To: <c0148f86-b7a5-0c66-146d-f2dbcd678436@linux.ibm.com>

On Wed, 19 Sep 2018 15:17:28 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> On 09/18/2018 08:45 PM, Cornelia Huck wrote:

> > We basically have two options:
> > - Have a way to queue I/O operations and then handle them in sequence.
> >   Creates complexity, and is likely overkill. (We already have a kind
> >   of serialization because we re-submit the channel program until the
> >   hypervisor accepts it; the problem comes from the wait queue usage.)  
> 
> I secretly hoped we already have something like this somewhere. Getting
> some kind of requests processed and wanting to know if each of these worked
> or not seemed like fairly common. I agree, implementing this just for
> virtio-ccw would be an overkill, I agree.

I've encountered that pattern so far mostly for driver-internal I/O
(setting some stuff up via channel commands etc.) Other usages (like
e.g. the dasd driver processing block layer requests) are asynchronous,
and the common I/O layer uses a full-fledged fsm. Much of the trouble
comes from implementing a synchronous interface via asynchronous
commands, and I'd really like to keep that as simple as possible
(especially as this is not the hot path).

> 
> > - Add serialization around the submit/wait procedure (as you did), but
> >   with a per-device mutex. That looks like the easiest solution.
> >   
> 
> Yep, I'm for doing something like this first. We can think about doing
> something more elaborate later. I will send a non-RFC with an extra
> per-device mutex. Unless you object.

No, that sounds good to me.

> 
> Another thought that crossed my head was making the transport ops
> mutex on each virtio-ccw device -- see our conversation on get/set
> config. I don't think it would make a big difference, since the
> ccw stuff is mutex already, so we only have parallelism for the
> preparation and for post-processing the results of the ccw io.

Do you spot any other places where we may need to care about concurrent
processing (like for the ->config area in the previous patch)?

^ permalink raw reply

* Re: [patch 09/11] x86/vdso: Simplify the invalid vclock case
From: Thomas Gleixner @ 2018-09-19 13:29 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Florian Weimer, Juergen Gross, Arnd Bergmann, Peter Zijlstra,
	X86 ML, LKML, Andy Lutomirski, Stephen Boyd, John Stultz,
	Andy Lutomirski, Paolo Bonzini, devel, Linux Virtualization,
	Matt Rickard
In-Reply-To: <06e91c26-756f-f236-0af2-327e520c3065@rasmusvillemoes.dk>

[-- Attachment #1: Type: text/plain, Size: 1781 bytes --]

On Wed, 19 Sep 2018, Rasmus Villemoes wrote:
> On 2018-09-19 00:46, Thomas Gleixner wrote:
> > On Tue, 18 Sep 2018, Andy Lutomirski wrote:
> >>>
> >>
> >> Do we do better if we use signed arithmetic for the whole calculation?
> >> Then a small backwards movement would result in a small backwards result.
> >> Or we could offset everything so that we’d have to go back several
> >> hundred ms before we cross zero.
> > 
> > That would be probably the better solution as signed math would be
> > problematic when the resulting ns value becomes negative. As the delta is
> > really small, otherwise the TSC sync check would have caught it, the caller
> > should never be able to observe time going backwards.
> > 
> > I'll have a look into that. It needs some thought vs. the fractional part
> > of the base time, but it should be not rocket science to get that
> > correct. Famous last words...
> 
> Does the sentinel need to be U64_MAX? What if vgetcyc and its minions
> returned gtod->cycle_last-1 (for some value of 1), and the caller just
> does "if ((s64)cycles - (s64)last < 0) return fallback; ns +=
> (cycles-last)* ...". That should just be a "sub ; js ; ". It's an extra
> load of ->cycle_last, but only on the path where we're heading for the
> fallback anyway. The value of 1 can be adjusted so that in the "js"
> path, we could detect and accept an rdtsc_ordered() call that's just a
> few 10s of cycles behind last and treat that as 0 and continue back on
> the normal path. But maybe it's hard to get gcc to generate the expected
> code.

I played around with a lot of variants and GCC generates all kinds of
interesting ASM. And at some point optimizing that math code is not buying
anything because the LFENCE before RDTSC is dominating all of it.

Thanks,

	tglx

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [virtio-dev] [PATCH 2/2] drm/virtio: add iommu support.
From: Gerd Hoffmann @ 2018-09-19 11:38 UTC (permalink / raw)
  To: Jiandi An
  Cc: virtio-dev@lists.oasis-open.org, Singh, Brijesh, Lendacky, Thomas,
	Dave Airlie, LKML, dri-devel, open list:VIRTIO CORE, NET...,
	Dave Airlie
In-Reply-To: <d4c25272-9df8-33d6-d310-964b17a8c80d@amd.com>

> >> Once display manger is kicked off for example (sudo systemctl start lightdm.service) and
> >> resource id 3 gets created from user space down, it still gives a blank black screen.
> > 
> > Hmm.  I'd suspect there is simply a code path missing.  Can you send the
> > patch you have?
> > 
> > cheers,
> >   Gerd
> > 
> 
> I sent the patch.  For now it does dma_sync_sg on the pages in 
> TRANSFER_TO_HOST_2D/3D when use_dma_api is true.
> 
> https://lore.kernel.org/lkml/20180919070931.91168-1-jiandi.an@amd.com/

Hmm, the way it is hooked up it should not miss any resource update.
So not sure why it isn't working.
Pushed the patch nevertheless as it is clearly a step into the right
direction.

cheers,
  Gerd

^ permalink raw reply

* Re: [PATCH] drm/virtio: add dma sync for dma mapped virtio gpu framebuffer pages
From: kraxel @ 2018-09-19 11:34 UTC (permalink / raw)
  To: An, Jiandi
  Cc: Lendacky, Thomas, Singh, Brijesh, airlied@linux.ie,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	virtualization@lists.linux-foundation.org
In-Reply-To: <20180919070931.91168-1-jiandi.an@amd.com>

On Wed, Sep 19, 2018 at 07:09:53AM +0000, An, Jiandi wrote:
> With virtio gpu ttm-pages being dma mapped, dma sync is needed when
> swiotlb is used as bounce buffers, before TRANSFER_TO_HOST_2D/3D
> commands are sent.

Pushed to drm-misc-next.

thanks,
  Gerd

^ permalink raw reply

* Re: [RFC PATCH 1/2] virtio/s390: avoid race on vcdev->config
From: Cornelia Huck @ 2018-09-19 11:28 UTC (permalink / raw)
  To: Halil Pasic; +Cc: linux-s390, Colin Ian King, kvm, virtualization
In-Reply-To: <2f27c41d-4465-0fce-bbbb-b7b22a179eae@linux.ibm.com>

On Wed, 19 Sep 2018 00:52:14 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> On 09/18/2018 08:29 PM, Cornelia Huck wrote:
> > On Wed, 12 Sep 2018 16:02:01 +0200
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> >> Currently we have a race on vcdev->config in virtio_ccw_get_config() and
> >> in virtio_ccw_set_config().
> >>
> >> This normally does not cause problems, as these are usually infrequent
> >> operations. But occasionally sysfs attributes are directly dependent on
> >> pieces of virio config and trigger a get on each read. This gives us at
> >> least one trigger.  
> > 
> > So, the problem is that you might get unexpected/inconsistent config
> > information?
> >   
> >>
> >> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> >> ---
> >>  drivers/s390/virtio/virtio_ccw.c | 10 ++++++++--
> >>  1 file changed, 8 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> >> index 8f5c1d7f751a..a5e8530a3391 100644
> >> --- a/drivers/s390/virtio/virtio_ccw.c
> >> +++ b/drivers/s390/virtio/virtio_ccw.c
> >> @@ -828,6 +828,7 @@ static void virtio_ccw_get_config(struct virtio_device *vdev,
> >>  	int ret;
> >>  	struct ccw1 *ccw;
> >>  	void *config_area;
> >> +	unsigned long flags;
> >>  
> >>  	ccw = kzalloc(sizeof(*ccw), GFP_DMA | GFP_KERNEL);
> >>  	if (!ccw)
> >> @@ -846,11 +847,13 @@ static void virtio_ccw_get_config(struct virtio_device *vdev,
> >>  	if (ret)
> >>  		goto out_free;
> >>  
> >> +	spin_lock_irqsave(&vcdev->lock, flags);  
> > 
> > I'm not sure that vcdev->lock is the right lock to use for this, but
> > I'm a bit unsure about what you're guarding against here.>  
> 
> I'm guarding against multiple threads using the shared state that is
> the config member of struct virtio_ccw_device so that at least one
> writes. I will continue with an example below.

Ok.

>  
> >>  	memcpy(vcdev->config, config_area, offset + len);
> >> -	if (buf)
> >> -		memcpy(buf, &vcdev->config[offset], len);
> >>  	if (vcdev->config_ready < offset + len)
> >>  		vcdev->config_ready = offset + len;
> >> +	spin_unlock_irqrestore(&vcdev->lock, flags);
> >> +	if (buf)
> >> +		memcpy(buf, config_area + offset, len);
> >>  
> >>  out_free:
> >>  	kfree(config_area);
> >> @@ -864,6 +867,7 @@ static void virtio_ccw_set_config(struct virtio_device *vdev,
> >>  	struct virtio_ccw_device *vcdev = to_vc_device(vdev);
> >>  	struct ccw1 *ccw;
> >>  	void *config_area;
> >> +	unsigned long flags;
> >>  
> >>  	ccw = kzalloc(sizeof(*ccw), GFP_DMA | GFP_KERNEL);
> >>  	if (!ccw)
> >> @@ -876,9 +880,11 @@ static void virtio_ccw_set_config(struct virtio_device *vdev,
> >>  	/* Make sure we don't overwrite fields. */
> >>  	if (vcdev->config_ready < offset)
> >>  		virtio_ccw_get_config(vdev, 0, NULL, offset);
> >> +	spin_lock_irqsave(&vcdev->lock, flags);
> >>  	memcpy(&vcdev->config[offset], buf, len);
> >>  	/* Write the config area to the host. */
> >>  	memcpy(config_area, vcdev->config, sizeof(vcdev->config));  
> 
> While in this section which is critical now, we could have raced
> with another thread that is writing the vcdev->config (critical section in get)
> that we are reading. That could result in something that could never happen if the
> operations are serialized.

So, the race is basically on the memcpy?

> 
> >> +	spin_unlock_irqrestore(&vcdev->lock, flags);
> >>  	ccw->cmd_code = CCW_CMD_WRITE_CONF;
> >>  	ccw->flags = 0;
> >>  	ccw->count = offset + len;  
> > 
> > One thing that might be a problem here is how reading/writing the
> > config stuff works for virtio-ccw: As channel devices don't have a
> > config space like e.g. PCI devices, I had to come up with a way to
> > implement something like that via dedicated channel commands. I did not
> > want to go via a payload that would provide offset/length, but went
> > with reading/writing everything before the value to be read/written as
> > well. That means we need to call read before write to make sure we
> > don't overwrite things (as the comment states), and how this is done
> > might be problematic.
> >   
> 
> Nod.
> 
> > I'm thinking what we may need is a way to make "read and then write" a
> > single operation and make sure that it does not run concurrently with
> > the simple read operation. Factor out the proper invocation of the read
> > command and the status update, have get_config call this with a reader
> > lock and have set_config call the read-then-write combo with a write
> > lock, maybe?
> >   
> 
> I'm inclined to say. The other tread doing the get may only get us more
> recent results, and that should at least as good. Our get is guaranteed
> to finish, so we won't write complete garbage.
> 
> AFAIR the config can change form the other side too. In that sense making
> the read and the write one operation on the other side would help make us
> completely sane. 

Yes, config fields may be writeable both by the host and by the guest.

> But at the moment I don't think that what you propose
> here is giving us an edge over this patch.

It is probably a bit cleaner concept wise, but I'm not wedded to it.

Another thing that comes to mind is that 'config generation' thing,
which is used on e.g. PCI to make sure that values longer than 32 bits
are consistent. I wonder whether we should explore that direction as
well.

^ permalink raw reply

* [PATCH v3 5/5] drm/virtio: fix DRM_FORMAT_* handling
From: Gerd Hoffmann @ 2018-09-19 11:12 UTC (permalink / raw)
  To: dri-devel
  Cc: David Airlie, michel, open list, open list:VIRTIO GPU DRIVER,
	imirkin
In-Reply-To: <20180919111252.7932-1-kraxel@redhat.com>

Use DRM_FORMAT_HOST_XRGB8888, so we are using the correct format code
on bigendian machines.  Also set the quirk_addfb_prefer_host_byte_order
mode_config bit so drm_mode_addfb() asks for the correct format code.

Both DRM_FORMAT_* and VIRTIO_GPU_FORMAT_* are defined to be little
endian, so using a different mapping on bigendian machines is wrong.
It's there because of broken drm_mode_addfb() behavior.  So with
drm_mode_addfb() being fixed we can fix this too.

While wading through the code I've noticed we have a little issue in
virtio:  We attach a format to the bo when it is created
(DRM_IOCTL_MODE_CREATE_DUMB), not when we map it as framebuffer
(DRM_IOCTL_MODE_ADDFB).  Easy way out:  Support a single format only.
Pick DRM_FORMAT_HOST_XRGB8888, it is the only one actually used in
practice.  Drop unused mappings in virtio_gpu_translate_format().

With this patch applied both ADDFB and ADDFB2 ioctls work correctly in
the virtio-gpu.ko driver on big endian machines.  Without the patch only
ADDFB (which still seems to be used by the majority of userspace) works
correctly.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 drivers/gpu/drm/virtio/virtgpu_display.c |  5 +++
 drivers/gpu/drm/virtio/virtgpu_fb.c      |  2 +-
 drivers/gpu/drm/virtio/virtgpu_gem.c     |  7 +++--
 drivers/gpu/drm/virtio/virtgpu_plane.c   | 54 ++------------------------------
 4 files changed, 13 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_display.c b/drivers/gpu/drm/virtio/virtgpu_display.c
index 0379d68976..8f8fed471e 100644
--- a/drivers/gpu/drm/virtio/virtgpu_display.c
+++ b/drivers/gpu/drm/virtio/virtgpu_display.c
@@ -307,6 +307,10 @@ virtio_gpu_user_framebuffer_create(struct drm_device *dev,
 	struct virtio_gpu_framebuffer *virtio_gpu_fb;
 	int ret;
 
+	if (mode_cmd->pixel_format != DRM_FORMAT_HOST_XRGB8888 &&
+	    mode_cmd->pixel_format != DRM_FORMAT_HOST_ARGB8888)
+		return ERR_PTR(-ENOENT);
+
 	/* lookup object associated with res handle */
 	obj = drm_gem_object_lookup(file_priv, mode_cmd->handles[0]);
 	if (!obj)
@@ -355,6 +359,7 @@ int virtio_gpu_modeset_init(struct virtio_gpu_device *vgdev)
 	int i;
 
 	drm_mode_config_init(vgdev->ddev);
+	vgdev->ddev->mode_config.quirk_addfb_prefer_host_byte_order = true;
 	vgdev->ddev->mode_config.funcs = &virtio_gpu_mode_funcs;
 	vgdev->ddev->mode_config.helper_private = &virtio_mode_config_helpers;
 
diff --git a/drivers/gpu/drm/virtio/virtgpu_fb.c b/drivers/gpu/drm/virtio/virtgpu_fb.c
index b5cebc9a17..6c89974c7f 100644
--- a/drivers/gpu/drm/virtio/virtgpu_fb.c
+++ b/drivers/gpu/drm/virtio/virtgpu_fb.c
@@ -233,7 +233,7 @@ static int virtio_gpufb_create(struct drm_fb_helper *helper,
 	mode_cmd.width = sizes->surface_width;
 	mode_cmd.height = sizes->surface_height;
 	mode_cmd.pitches[0] = mode_cmd.width * 4;
-	mode_cmd.pixel_format = drm_mode_legacy_fb_format(32, 24);
+	mode_cmd.pixel_format = DRM_FORMAT_HOST_XRGB8888;
 
 	format = virtio_gpu_translate_format(mode_cmd.pixel_format);
 	if (format == 0)
diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c b/drivers/gpu/drm/virtio/virtgpu_gem.c
index 0f2768eaca..82c817f37c 100644
--- a/drivers/gpu/drm/virtio/virtgpu_gem.c
+++ b/drivers/gpu/drm/virtio/virtgpu_gem.c
@@ -90,7 +90,10 @@ int virtio_gpu_mode_dumb_create(struct drm_file *file_priv,
 	uint32_t resid;
 	uint32_t format;
 
-	pitch = args->width * ((args->bpp + 1) / 8);
+	if (args->bpp != 32)
+		return -EINVAL;
+
+	pitch = args->width * 4;
 	args->size = pitch * args->height;
 	args->size = ALIGN(args->size, PAGE_SIZE);
 
@@ -99,7 +102,7 @@ int virtio_gpu_mode_dumb_create(struct drm_file *file_priv,
 	if (ret)
 		goto fail;
 
-	format = virtio_gpu_translate_format(DRM_FORMAT_XRGB8888);
+	format = virtio_gpu_translate_format(DRM_FORMAT_HOST_XRGB8888);
 	virtio_gpu_resource_id_get(vgdev, &resid);
 	virtio_gpu_cmd_create_resource(vgdev, resid, format,
 				       args->width, args->height);
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
index 88f2fb8c61..a84ff56507 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -28,22 +28,11 @@
 #include <drm/drm_atomic_helper.h>
 
 static const uint32_t virtio_gpu_formats[] = {
-	DRM_FORMAT_XRGB8888,
-	DRM_FORMAT_ARGB8888,
-	DRM_FORMAT_BGRX8888,
-	DRM_FORMAT_BGRA8888,
-	DRM_FORMAT_RGBX8888,
-	DRM_FORMAT_RGBA8888,
-	DRM_FORMAT_XBGR8888,
-	DRM_FORMAT_ABGR8888,
+	DRM_FORMAT_HOST_XRGB8888,
 };
 
 static const uint32_t virtio_gpu_cursor_formats[] = {
-#ifdef __BIG_ENDIAN
-	DRM_FORMAT_BGRA8888,
-#else
-	DRM_FORMAT_ARGB8888,
-#endif
+	DRM_FORMAT_HOST_ARGB8888,
 };
 
 uint32_t virtio_gpu_translate_format(uint32_t drm_fourcc)
@@ -51,32 +40,6 @@ uint32_t virtio_gpu_translate_format(uint32_t drm_fourcc)
 	uint32_t format;
 
 	switch (drm_fourcc) {
-#ifdef __BIG_ENDIAN
-	case DRM_FORMAT_XRGB8888:
-		format = VIRTIO_GPU_FORMAT_X8R8G8B8_UNORM;
-		break;
-	case DRM_FORMAT_ARGB8888:
-		format = VIRTIO_GPU_FORMAT_A8R8G8B8_UNORM;
-		break;
-	case DRM_FORMAT_BGRX8888:
-		format = VIRTIO_GPU_FORMAT_B8G8R8X8_UNORM;
-		break;
-	case DRM_FORMAT_BGRA8888:
-		format = VIRTIO_GPU_FORMAT_B8G8R8A8_UNORM;
-		break;
-	case DRM_FORMAT_RGBX8888:
-		format = VIRTIO_GPU_FORMAT_R8G8B8X8_UNORM;
-		break;
-	case DRM_FORMAT_RGBA8888:
-		format = VIRTIO_GPU_FORMAT_R8G8B8A8_UNORM;
-		break;
-	case DRM_FORMAT_XBGR8888:
-		format = VIRTIO_GPU_FORMAT_X8B8G8R8_UNORM;
-		break;
-	case DRM_FORMAT_ABGR8888:
-		format = VIRTIO_GPU_FORMAT_A8B8G8R8_UNORM;
-		break;
-#else
 	case DRM_FORMAT_XRGB8888:
 		format = VIRTIO_GPU_FORMAT_B8G8R8X8_UNORM;
 		break;
@@ -89,19 +52,6 @@ uint32_t virtio_gpu_translate_format(uint32_t drm_fourcc)
 	case DRM_FORMAT_BGRA8888:
 		format = VIRTIO_GPU_FORMAT_A8R8G8B8_UNORM;
 		break;
-	case DRM_FORMAT_RGBX8888:
-		format = VIRTIO_GPU_FORMAT_X8B8G8R8_UNORM;
-		break;
-	case DRM_FORMAT_RGBA8888:
-		format = VIRTIO_GPU_FORMAT_A8B8G8R8_UNORM;
-		break;
-	case DRM_FORMAT_XBGR8888:
-		format = VIRTIO_GPU_FORMAT_R8G8B8X8_UNORM;
-		break;
-	case DRM_FORMAT_ABGR8888:
-		format = VIRTIO_GPU_FORMAT_R8G8B8A8_UNORM;
-		break;
-#endif
 	default:
 		/*
 		 * This should not happen, we handle everything listed
-- 
2.9.3

^ permalink raw reply related

* [PATCH v3 4/5] drm/bochs: support changing byteorder at mode set time
From: Gerd Hoffmann @ 2018-09-19 11:12 UTC (permalink / raw)
  To: dri-devel
  Cc: David Airlie, michel, open list,
	open list:DRM DRIVER FOR BOCHS VIRTUAL GPU, imirkin
In-Reply-To: <20180919111252.7932-1-kraxel@redhat.com>

Add bochs_hw_set_*_endian() helper functions, to set the framebuffer
byteorder at mode set time.  Support both DRM_FORMAT_XRGB8888 and
DRM_FORMAT_BGRX8888 framebuffer formats, no matter what the native
machine byte order is.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 drivers/gpu/drm/bochs/bochs.h       |  4 ++-
 drivers/gpu/drm/bochs/bochs_fbdev.c |  3 +-
 drivers/gpu/drm/bochs/bochs_hw.c    | 64 +++++++++++++++++++++++++++++--------
 drivers/gpu/drm/bochs/bochs_kms.c   |  8 +++--
 4 files changed, 61 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/bochs/bochs.h b/drivers/gpu/drm/bochs/bochs.h
index b4f6bb5219..e7a69077e4 100644
--- a/drivers/gpu/drm/bochs/bochs.h
+++ b/drivers/gpu/drm/bochs/bochs.h
@@ -58,6 +58,7 @@ struct bochs_device {
 	void __iomem   *fb_map;
 	unsigned long  fb_base;
 	unsigned long  fb_size;
+	unsigned long  qext_size;
 
 	/* mode */
 	u16 xres;
@@ -121,7 +122,8 @@ int bochs_hw_init(struct drm_device *dev);
 void bochs_hw_fini(struct drm_device *dev);
 
 void bochs_hw_setmode(struct bochs_device *bochs,
-		      struct drm_display_mode *mode);
+		      struct drm_display_mode *mode,
+		      const struct drm_format_info *format);
 void bochs_hw_setbase(struct bochs_device *bochs,
 		      int x, int y, u64 addr);
 
diff --git a/drivers/gpu/drm/bochs/bochs_fbdev.c b/drivers/gpu/drm/bochs/bochs_fbdev.c
index c46fdae44e..dd3c7df267 100644
--- a/drivers/gpu/drm/bochs/bochs_fbdev.c
+++ b/drivers/gpu/drm/bochs/bochs_fbdev.c
@@ -140,7 +140,8 @@ static struct drm_framebuffer *
 bochs_gem_fb_create(struct drm_device *dev, struct drm_file *file,
 		    const struct drm_mode_fb_cmd2 *mode_cmd)
 {
-	if (mode_cmd->pixel_format != DRM_FORMAT_HOST_XRGB8888)
+	if (mode_cmd->pixel_format != DRM_FORMAT_XRGB8888 &&
+	    mode_cmd->pixel_format != DRM_FORMAT_BGRX8888)
 		return ERR_PTR(-EINVAL);
 
 	return drm_gem_fb_create(dev, file, mode_cmd);
diff --git a/drivers/gpu/drm/bochs/bochs_hw.c b/drivers/gpu/drm/bochs/bochs_hw.c
index 16e4f1cacc..cacff73a64 100644
--- a/drivers/gpu/drm/bochs/bochs_hw.c
+++ b/drivers/gpu/drm/bochs/bochs_hw.c
@@ -47,11 +47,33 @@ static void bochs_dispi_write(struct bochs_device *bochs, u16 reg, u16 val)
 	}
 }
 
+static void bochs_hw_set_big_endian(struct bochs_device *bochs)
+{
+	if (bochs->qext_size < 8)
+		return;
+
+	writel(0xbebebebe, bochs->mmio + 0x604);
+}
+
+static void bochs_hw_set_little_endian(struct bochs_device *bochs)
+{
+	if (bochs->qext_size < 8)
+		return;
+
+	writel(0x1e1e1e1e, bochs->mmio + 0x604);
+}
+
+#ifdef __BIG_ENDIAN
+#define bochs_hw_set_native_endian(_b) bochs_hw_set_big_endian(_b)
+#else
+#define bochs_hw_set_native_endian(_b) bochs_hw_set_little_endian(_b)
+#endif
+
 int bochs_hw_init(struct drm_device *dev)
 {
 	struct bochs_device *bochs = dev->dev_private;
 	struct pci_dev *pdev = dev->pdev;
-	unsigned long addr, size, mem, ioaddr, iosize, qext_size;
+	unsigned long addr, size, mem, ioaddr, iosize;
 	u16 id;
 
 	if (pdev->resource[2].flags & IORESOURCE_MEM) {
@@ -117,19 +139,14 @@ int bochs_hw_init(struct drm_device *dev)
 		 ioaddr);
 
 	if (bochs->mmio && pdev->revision >= 2) {
-		qext_size = readl(bochs->mmio + 0x600);
-		if (qext_size < 4 || qext_size > iosize)
+		bochs->qext_size = readl(bochs->mmio + 0x600);
+		if (bochs->qext_size < 4 || bochs->qext_size > iosize) {
+			bochs->qext_size = 0;
 			goto noext;
-		DRM_DEBUG("Found qemu ext regs, size %ld\n", qext_size);
-		if (qext_size >= 8) {
-#ifdef __BIG_ENDIAN
-			writel(0xbebebebe, bochs->mmio + 0x604);
-#else
-			writel(0x1e1e1e1e, bochs->mmio + 0x604);
-#endif
-			DRM_DEBUG("  qext endian: 0x%x\n",
-				  readl(bochs->mmio + 0x604));
 		}
+		DRM_DEBUG("Found qemu ext regs, size %ld\n",
+			  bochs->qext_size);
+		bochs_hw_set_native_endian(bochs);
 	}
 
 noext:
@@ -150,7 +167,8 @@ void bochs_hw_fini(struct drm_device *dev)
 }
 
 void bochs_hw_setmode(struct bochs_device *bochs,
-		      struct drm_display_mode *mode)
+		      struct drm_display_mode *mode,
+		      const struct drm_format_info *format)
 {
 	bochs->xres = mode->hdisplay;
 	bochs->yres = mode->vdisplay;
@@ -158,8 +176,12 @@ void bochs_hw_setmode(struct bochs_device *bochs,
 	bochs->stride = mode->hdisplay * (bochs->bpp / 8);
 	bochs->yres_virtual = bochs->fb_size / bochs->stride;
 
-	DRM_DEBUG_DRIVER("%dx%d @ %d bpp, vy %d\n",
+	DRM_DEBUG_DRIVER("%dx%d @ %d bpp, format %c%c%c%c, vy %d\n",
 			 bochs->xres, bochs->yres, bochs->bpp,
+			 (format->format >>  0) & 0xff,
+			 (format->format >>  8) & 0xff,
+			 (format->format >> 16) & 0xff,
+			 (format->format >> 24) & 0xff,
 			 bochs->yres_virtual);
 
 	bochs_vga_writeb(bochs, 0x3c0, 0x20); /* unblank */
@@ -177,6 +199,20 @@ void bochs_hw_setmode(struct bochs_device *bochs,
 
 	bochs_dispi_write(bochs, VBE_DISPI_INDEX_ENABLE,
 			  VBE_DISPI_ENABLED | VBE_DISPI_LFB_ENABLED);
+
+	switch (format->format) {
+	case DRM_FORMAT_XRGB8888:
+		bochs_hw_set_little_endian(bochs);
+		break;
+	case DRM_FORMAT_BGRX8888:
+		bochs_hw_set_big_endian(bochs);
+		break;
+	default:
+		/* should not happen */
+		DRM_ERROR("%s: Huh? Got framebuffer format 0x%x",
+			  __func__, format->format);
+		break;
+	};
 }
 
 void bochs_hw_setbase(struct bochs_device *bochs,
diff --git a/drivers/gpu/drm/bochs/bochs_kms.c b/drivers/gpu/drm/bochs/bochs_kms.c
index f3fdaf9456..9bc5b438ae 100644
--- a/drivers/gpu/drm/bochs/bochs_kms.c
+++ b/drivers/gpu/drm/bochs/bochs_kms.c
@@ -77,7 +77,10 @@ static int bochs_crtc_mode_set(struct drm_crtc *crtc,
 	struct bochs_device *bochs =
 		container_of(crtc, struct bochs_device, crtc);
 
-	bochs_hw_setmode(bochs, mode);
+	if (WARN_ON(crtc->primary->fb == NULL))
+		return -EINVAL;
+
+	bochs_hw_setmode(bochs, mode, crtc->primary->fb->format);
 	bochs_crtc_mode_set_base(crtc, x, y, old_fb);
 	return 0;
 }
@@ -127,7 +130,8 @@ static const struct drm_crtc_helper_funcs bochs_helper_funcs = {
 };
 
 static const uint32_t bochs_formats[] = {
-	DRM_FORMAT_HOST_XRGB8888,
+	DRM_FORMAT_XRGB8888,
+	DRM_FORMAT_BGRX8888,
 };
 
 static struct drm_plane *bochs_primary_plane(struct drm_device *dev)
-- 
2.9.3

^ permalink raw reply related

* [PATCH v3 3/5] drm/bochs: fix DRM_FORMAT_* handling for big endian machines.
From: Gerd Hoffmann @ 2018-09-19 11:12 UTC (permalink / raw)
  To: dri-devel
  Cc: David Airlie, michel, open list,
	open list:DRM DRIVER FOR BOCHS VIRTUAL GPU, imirkin
In-Reply-To: <20180919111252.7932-1-kraxel@redhat.com>

Use DRM_FORMAT_HOST_XRGB8888, so we are using the correct format code
on bigendian machines.  Also set the quirk_addfb_prefer_host_byte_order
mode_config bit so drm_mode_addfb() asks for the correct format code.

Create our own plane and use drm_crtc_init_with_planes() instead of
depending on the default created by drm_crtc_init().  That way the plane
format list is correct on bigendian machines.

Also re-add the framebuffer format check dropped by "df2052cc92 bochs:
convert to drm_fb_helper_fbdev_setup/teardown".

With this patch applied both ADDFB and ADDFB2 ioctls work correctly in
the bochs-drm.ko driver on big endian machines.  Without the patch only
ADDFB (which still seems to be used by the majority of userspace) works
correctly.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 drivers/gpu/drm/bochs/bochs_fbdev.c | 17 +++++++++++++----
 drivers/gpu/drm/bochs/bochs_kms.c   | 34 +++++++++++++++++++++++++++++++++-
 2 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/bochs/bochs_fbdev.c b/drivers/gpu/drm/bochs/bochs_fbdev.c
index 8f4d6c052f..c46fdae44e 100644
--- a/drivers/gpu/drm/bochs/bochs_fbdev.c
+++ b/drivers/gpu/drm/bochs/bochs_fbdev.c
@@ -63,9 +63,8 @@ static int bochsfb_create(struct drm_fb_helper *helper,
 
 	mode_cmd.width = sizes->surface_width;
 	mode_cmd.height = sizes->surface_height;
-	mode_cmd.pitches[0] = mode_cmd.width * ((sizes->surface_bpp + 7) / 8);
-	mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
-							  sizes->surface_depth);
+	mode_cmd.pitches[0] = sizes->surface_width * 4;
+	mode_cmd.pixel_format = DRM_FORMAT_HOST_XRGB8888;
 	size = mode_cmd.pitches[0] * mode_cmd.height;
 
 	/* alloc, pin & map bo */
@@ -137,8 +136,18 @@ static const struct drm_fb_helper_funcs bochs_fb_helper_funcs = {
 	.fb_probe = bochsfb_create,
 };
 
+static struct drm_framebuffer *
+bochs_gem_fb_create(struct drm_device *dev, struct drm_file *file,
+		    const struct drm_mode_fb_cmd2 *mode_cmd)
+{
+	if (mode_cmd->pixel_format != DRM_FORMAT_HOST_XRGB8888)
+		return ERR_PTR(-EINVAL);
+
+	return drm_gem_fb_create(dev, file, mode_cmd);
+}
+
 const struct drm_mode_config_funcs bochs_mode_funcs = {
-	.fb_create = drm_gem_fb_create,
+	.fb_create = bochs_gem_fb_create,
 };
 
 int bochs_fbdev_init(struct bochs_device *bochs)
diff --git a/drivers/gpu/drm/bochs/bochs_kms.c b/drivers/gpu/drm/bochs/bochs_kms.c
index ea9a43d31b..f3fdaf9456 100644
--- a/drivers/gpu/drm/bochs/bochs_kms.c
+++ b/drivers/gpu/drm/bochs/bochs_kms.c
@@ -126,12 +126,43 @@ static const struct drm_crtc_helper_funcs bochs_helper_funcs = {
 	.commit = bochs_crtc_commit,
 };
 
+static const uint32_t bochs_formats[] = {
+	DRM_FORMAT_HOST_XRGB8888,
+};
+
+static struct drm_plane *bochs_primary_plane(struct drm_device *dev)
+{
+	struct drm_plane *primary;
+	int ret;
+
+	primary = kzalloc(sizeof(*primary), GFP_KERNEL);
+	if (primary == NULL) {
+		DRM_DEBUG_KMS("Failed to allocate primary plane\n");
+		return NULL;
+	}
+
+	ret = drm_universal_plane_init(dev, primary, 0,
+				       &drm_primary_helper_funcs,
+				       bochs_formats,
+				       ARRAY_SIZE(bochs_formats),
+				       NULL,
+				       DRM_PLANE_TYPE_PRIMARY, NULL);
+	if (ret) {
+		kfree(primary);
+		primary = NULL;
+	}
+
+	return primary;
+}
+
 static void bochs_crtc_init(struct drm_device *dev)
 {
 	struct bochs_device *bochs = dev->dev_private;
 	struct drm_crtc *crtc = &bochs->crtc;
+	struct drm_plane *primary = bochs_primary_plane(dev);
 
-	drm_crtc_init(dev, crtc, &bochs_crtc_funcs);
+	drm_crtc_init_with_planes(dev, crtc, primary, NULL,
+				  &bochs_crtc_funcs, NULL);
 	drm_crtc_helper_add(crtc, &bochs_helper_funcs);
 }
 
@@ -250,6 +281,7 @@ int bochs_kms_init(struct bochs_device *bochs)
 	bochs->dev->mode_config.fb_base = bochs->fb_base;
 	bochs->dev->mode_config.preferred_depth = 24;
 	bochs->dev->mode_config.prefer_shadow = 0;
+	bochs->dev->mode_config.quirk_addfb_prefer_host_byte_order = true;
 
 	bochs->dev->mode_config.funcs = &bochs_mode_funcs;
 
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH] x86/paravirt: fix some warning messages
From: Juergen Gross @ 2018-09-19 10:47 UTC (permalink / raw)
  To: Dan Carpenter, Peter Zijlstra
  Cc: x86, kernel-janitors, virtualization, Ingo Molnar, H. Peter Anvin,
	Alok Kataria, Thomas Gleixner
In-Reply-To: <20180919103553.GD9238@mwanda>

On 19/09/18 12:35, Dan Carpenter wrote:
> The first argument to WARN_ONCE() is a condition.
> 
> Fixes: 5800dc5c19f3 ("x86/paravirt: Fix spectre-v2 mitigations for paravirt guests")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

^ permalink raw reply

* [PATCH] x86/paravirt: fix some warning messages
From: Dan Carpenter @ 2018-09-19 10:35 UTC (permalink / raw)
  To: Juergen Gross, Peter Zijlstra
  Cc: x86, kernel-janitors, virtualization, Ingo Molnar, H. Peter Anvin,
	Alok Kataria, Thomas Gleixner

The first argument to WARN_ONCE() is a condition.

Fixes: 5800dc5c19f3 ("x86/paravirt: Fix spectre-v2 mitigations for paravirt guests")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index bbf006fe78d7..e4d4df37922a 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -89,7 +89,7 @@ static unsigned paravirt_patch_call(void *insnbuf, const void *target,
 
 	if (len < 5) {
 #ifdef CONFIG_RETPOLINE
-		WARN_ONCE("Failing to patch indirect CALL in %ps\n", (void *)addr);
+		WARN_ONCE(1, "Failing to patch indirect CALL in %ps\n", (void *)addr);
 #endif
 		return len;	/* call too long for patch site */
 	}
@@ -110,7 +110,7 @@ static unsigned paravirt_patch_jmp(void *insnbuf, const void *target,
 
 	if (len < 5) {
 #ifdef CONFIG_RETPOLINE
-		WARN_ONCE("Failing to patch indirect JMP in %ps\n", (void *)addr);
+		WARN_ONCE(1, "Failing to patch indirect JMP in %ps\n", (void *)addr);
 #endif
 		return len;	/* call too long for patch site */
 	}

^ permalink raw reply related

* Re: [virtio-dev] [PATCH 2/2] drm/virtio: add iommu support.
From: Gerd Hoffmann @ 2018-09-19  4:46 UTC (permalink / raw)
  To: Jiandi An
  Cc: virtio-dev@lists.oasis-open.org, Singh, Brijesh, Lendacky, Thomas,
	Dave Airlie, LKML, dri-devel, open list:VIRTIO CORE, NET...,
	Dave Airlie
In-Reply-To: <5684a3fc-3605-7bb0-b0be-418c2ae702b5@amd.com>

  Hi,

> buffer.  I tried to put a dma_sync_sg_for_device() on virtio_gpu_object obj->pages-sgl
> before VIRTIO_GPU_CMD_TRANSFER_TO_HOST_2D is sent.  This fixes the kernel console path.

That should be the right place.

> Once display manger is kicked off for example (sudo systemctl start lightdm.service) and
> resource id 3 gets created from user space down, it still gives a blank black screen.

Hmm.  I'd suspect there is simply a code path missing.  Can you send the
patch you have?

cheers,
  Gerd

^ permalink raw reply

* Re: [patch 09/11] x86/vdso: Simplify the invalid vclock case
From: Thomas Gleixner @ 2018-09-18 23:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Florian Weimer, Juergen Gross, Arnd Bergmann, Peter Zijlstra,
	X86 ML, LKML, Linux Virtualization, Stephen Boyd, John Stultz,
	Andy Lutomirski, Paolo Bonzini, devel, Matt Rickard
In-Reply-To: <439A3E73-E4FF-4D66-800E-5BEE58EDE8F6@amacapital.net>

[-- Attachment #1: Type: text/plain, Size: 1254 bytes --]

On Tue, 18 Sep 2018, Andy Lutomirski wrote:
> > On Sep 18, 2018, at 3:46 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Tue, 18 Sep 2018, Andy Lutomirski wrote:
> >> Do we do better if we use signed arithmetic for the whole calculation?
> >> Then a small backwards movement would result in a small backwards result.
> >> Or we could offset everything so that we’d have to go back several
> >> hundred ms before we cross zero.
> > 
> > That would be probably the better solution as signed math would be
> > problematic when the resulting ns value becomes negative. As the delta is
> > really small, otherwise the TSC sync check would have caught it, the caller
> > should never be able to observe time going backwards.
> > 
> > I'll have a look into that. It needs some thought vs. the fractional part
> > of the base time, but it should be not rocket science to get that
> > correct. Famous last words...
> > 
> 
> It’s also fiddly to tune. If you offset it too much, then the fancy
> divide-by-repeated-subtraction loop will hurt more than the comparison to
> last.

Not really. It's sufficient to offset it by at max. 1000 cycles or so. That
won't hurt the magic loop, but it will definitely cover that slight offset
case.

Thanks,

	tglx


[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [patch 09/11] x86/vdso: Simplify the invalid vclock case
From: Andy Lutomirski @ 2018-09-18 23:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Florian Weimer, Juergen Gross, Arnd Bergmann, Peter Zijlstra,
	X86 ML, LKML, Linux Virtualization, Stephen Boyd, John Stultz,
	Andy Lutomirski, Paolo Bonzini, devel, Matt Rickard
In-Reply-To: <alpine.DEB.2.21.1809190037570.1468@nanos.tec.linutronix.de>



> On Sep 18, 2018, at 3:46 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Tue, 18 Sep 2018, Andy Lutomirski wrote:
>>> On Sep 18, 2018, at 12:52 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> 
>>>>> On Mon, 17 Sep 2018, John Stultz wrote:
>>>>> On Mon, Sep 17, 2018 at 12:25 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>>>> Also, I'm not entirely convinced that this "last" thing is needed at
>>>>> all.  John, what's the scenario under which we need it?
>>>> 
>>>> So my memory is probably a bit foggy, but I recall that as we
>>>> accelerated gettimeofday, we found that even on systems that claimed
>>>> to have synced TSCs, they were actually just slightly out of sync.
>>>> Enough that right after cycles_last had been updated, a read on
>>>> another cpu could come in just behind cycles_last, resulting in a
>>>> negative interval causing lots of havoc.
>>>> 
>>>> So the sanity check is needed to avoid that case.
>>> 
>>> Your memory serves you right. That's indeed observable on CPUs which
>>> lack TSC_ADJUST.
>>> 
>>> @Andy: Welcome to the wonderful world of TSC.
>>> 
>> 
>> Do we do better if we use signed arithmetic for the whole calculation?
>> Then a small backwards movement would result in a small backwards result.
>> Or we could offset everything so that we’d have to go back several
>> hundred ms before we cross zero.
> 
> That would be probably the better solution as signed math would be
> problematic when the resulting ns value becomes negative. As the delta is
> really small, otherwise the TSC sync check would have caught it, the caller
> should never be able to observe time going backwards.
> 
> I'll have a look into that. It needs some thought vs. the fractional part
> of the base time, but it should be not rocket science to get that
> correct. Famous last words...
> 

It’s also fiddly to tune. If you offset it too much, then the fancy divide-by-repeated-subtraction loop will hurt more than the comparison to last.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [patch 09/11] x86/vdso: Simplify the invalid vclock case
From: Thomas Gleixner @ 2018-09-18 22:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Florian Weimer, Juergen Gross, Arnd Bergmann, Peter Zijlstra,
	X86 ML, LKML, Linux Virtualization, Stephen Boyd, John Stultz,
	Andy Lutomirski, Paolo Bonzini, devel, Matt Rickard
In-Reply-To: <863331ED-B04A-4B94-91A2-D34002C9CCDC@amacapital.net>

[-- Attachment #1: Type: text/plain, Size: 1734 bytes --]

On Tue, 18 Sep 2018, Andy Lutomirski wrote:
> > On Sep 18, 2018, at 12:52 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > 
> >> On Mon, 17 Sep 2018, John Stultz wrote:
> >>> On Mon, Sep 17, 2018 at 12:25 PM, Andy Lutomirski <luto@kernel.org> wrote:
> >>> Also, I'm not entirely convinced that this "last" thing is needed at
> >>> all.  John, what's the scenario under which we need it?
> >> 
> >> So my memory is probably a bit foggy, but I recall that as we
> >> accelerated gettimeofday, we found that even on systems that claimed
> >> to have synced TSCs, they were actually just slightly out of sync.
> >> Enough that right after cycles_last had been updated, a read on
> >> another cpu could come in just behind cycles_last, resulting in a
> >> negative interval causing lots of havoc.
> >> 
> >> So the sanity check is needed to avoid that case.
> > 
> > Your memory serves you right. That's indeed observable on CPUs which
> > lack TSC_ADJUST.
> > 
> > @Andy: Welcome to the wonderful world of TSC.
> > 
> 
> Do we do better if we use signed arithmetic for the whole calculation?
> Then a small backwards movement would result in a small backwards result.
> Or we could offset everything so that we’d have to go back several
> hundred ms before we cross zero.

That would be probably the better solution as signed math would be
problematic when the resulting ns value becomes negative. As the delta is
really small, otherwise the TSC sync check would have caught it, the caller
should never be able to observe time going backwards.

I'll have a look into that. It needs some thought vs. the fractional part
of the base time, but it should be not rocket science to get that
correct. Famous last words...

Thanks,

	tglx



[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [RFC PATCH 2/2] virtio/s390: fix race in ccw_io_helper()
From: Cornelia Huck @ 2018-09-18 18:45 UTC (permalink / raw)
  To: Halil Pasic; +Cc: linux-s390, Colin Ian King, kvm, virtualization
In-Reply-To: <20180912140202.12292-3-pasic@linux.ibm.com>

On Wed, 12 Sep 2018 16:02:02 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> While ccw_io_helper() seems like intended to be exclusive in a sense that
> it is supposed to facilitate I/O for at most one thread at any given
> time, there is actually nothing ensuring that threads won't pile up at
> vcdev->wait_q. If they all threads get woken up and see the status that
> belongs to some other request as their own. This can lead to bugs. For an
> example see :
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1788432
> 
> This normally does not cause problems, as these are usually infrequent
> operations that happen in a well defined sequence and normally do not
> fail. But occasionally sysfs attributes are directly dependent
> on pieces of virio config and trigger a get on each read.  This gives us
> at least one method to trigger races.

Yes, the idea behind ccw_io_helper() was to provide a simple way to use
the inherently asynchronous channel I/O operations in a synchronous
way, as that's what the virtio callbacks expect. I did not consider
multiple callbacks for a device running at the same time; but if the
interface allows that, we obviously need to be able to handle it.

Has this only been observed for the config get/set commands? (The
read-before-write thing?)

> 
> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> Reported-by: Colin Ian King <colin.king@canonical.com>
> ---
> 
> This is a big hammer -- mutex on virtio_ccw device level would more than
> suffice. But I don't think it hurts, and maybe there is a better way e.g.
> one using some common ccw/cio mechanisms to address this. That's why this
> is an RFC.

I'm for using more delicate tools, if possible :)

We basically have two options:
- Have a way to queue I/O operations and then handle them in sequence.
  Creates complexity, and is likely overkill. (We already have a kind
  of serialization because we re-submit the channel program until the
  hypervisor accepts it; the problem comes from the wait queue usage.)
- Add serialization around the submit/wait procedure (as you did), but
  with a per-device mutex. That looks like the easiest solution.

> ---
>  drivers/s390/virtio/virtio_ccw.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> index a5e8530a3391..36252f344660 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -289,6 +289,8 @@ static int doing_io(struct virtio_ccw_device *vcdev, __u32 flag)
>  	return ret;
>  }
>  
> +DEFINE_MUTEX(vcio_mtx);
> +
>  static int ccw_io_helper(struct virtio_ccw_device *vcdev,
>  			 struct ccw1 *ccw, __u32 intparm)
>  {
> @@ -296,6 +298,7 @@ static int ccw_io_helper(struct virtio_ccw_device *vcdev,
>  	unsigned long flags;
>  	int flag = intparm & VIRTIO_CCW_INTPARM_MASK;
>  
> +	mutex_lock(&vcio_mtx);
>  	do {
>  		spin_lock_irqsave(get_ccwdev_lock(vcdev->cdev), flags);
>  		ret = ccw_device_start(vcdev->cdev, ccw, intparm, 0, 0);
> @@ -308,7 +311,9 @@ static int ccw_io_helper(struct virtio_ccw_device *vcdev,
>  		cpu_relax();
>  	} while (ret == -EBUSY);

We probably still want to keep this while loop to be on the safe side
(unsolicited status from the hypervisor, for example.)

>  	wait_event(vcdev->wait_q, doing_io(vcdev, flag) == 0);
> -	return ret ? ret : vcdev->err;
> +	ret = ret ? ret : vcdev->err;
> +	mutex_unlock(&vcio_mtx);
> +	return ret;
>  }
>  
>  static void virtio_ccw_drop_indicator(struct virtio_ccw_device *vcdev,

^ permalink raw reply

* Re: [RFC PATCH 1/2] virtio/s390: avoid race on vcdev->config
From: Cornelia Huck @ 2018-09-18 18:29 UTC (permalink / raw)
  To: Halil Pasic; +Cc: linux-s390, Colin Ian King, kvm, virtualization
In-Reply-To: <20180912140202.12292-2-pasic@linux.ibm.com>

On Wed, 12 Sep 2018 16:02:01 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> Currently we have a race on vcdev->config in virtio_ccw_get_config() and
> in virtio_ccw_set_config().
> 
> This normally does not cause problems, as these are usually infrequent
> operations. But occasionally sysfs attributes are directly dependent on
> pieces of virio config and trigger a get on each read. This gives us at
> least one trigger.

So, the problem is that you might get unexpected/inconsistent config
information?

> 
> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> ---
>  drivers/s390/virtio/virtio_ccw.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> index 8f5c1d7f751a..a5e8530a3391 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -828,6 +828,7 @@ static void virtio_ccw_get_config(struct virtio_device *vdev,
>  	int ret;
>  	struct ccw1 *ccw;
>  	void *config_area;
> +	unsigned long flags;
>  
>  	ccw = kzalloc(sizeof(*ccw), GFP_DMA | GFP_KERNEL);
>  	if (!ccw)
> @@ -846,11 +847,13 @@ static void virtio_ccw_get_config(struct virtio_device *vdev,
>  	if (ret)
>  		goto out_free;
>  
> +	spin_lock_irqsave(&vcdev->lock, flags);

I'm not sure that vcdev->lock is the right lock to use for this, but
I'm a bit unsure about what you're guarding against here.

>  	memcpy(vcdev->config, config_area, offset + len);
> -	if (buf)
> -		memcpy(buf, &vcdev->config[offset], len);
>  	if (vcdev->config_ready < offset + len)
>  		vcdev->config_ready = offset + len;
> +	spin_unlock_irqrestore(&vcdev->lock, flags);
> +	if (buf)
> +		memcpy(buf, config_area + offset, len);
>  
>  out_free:
>  	kfree(config_area);
> @@ -864,6 +867,7 @@ static void virtio_ccw_set_config(struct virtio_device *vdev,
>  	struct virtio_ccw_device *vcdev = to_vc_device(vdev);
>  	struct ccw1 *ccw;
>  	void *config_area;
> +	unsigned long flags;
>  
>  	ccw = kzalloc(sizeof(*ccw), GFP_DMA | GFP_KERNEL);
>  	if (!ccw)
> @@ -876,9 +880,11 @@ static void virtio_ccw_set_config(struct virtio_device *vdev,
>  	/* Make sure we don't overwrite fields. */
>  	if (vcdev->config_ready < offset)
>  		virtio_ccw_get_config(vdev, 0, NULL, offset);
> +	spin_lock_irqsave(&vcdev->lock, flags);
>  	memcpy(&vcdev->config[offset], buf, len);
>  	/* Write the config area to the host. */
>  	memcpy(config_area, vcdev->config, sizeof(vcdev->config));
> +	spin_unlock_irqrestore(&vcdev->lock, flags);
>  	ccw->cmd_code = CCW_CMD_WRITE_CONF;
>  	ccw->flags = 0;
>  	ccw->count = offset + len;

One thing that might be a problem here is how reading/writing the
config stuff works for virtio-ccw: As channel devices don't have a
config space like e.g. PCI devices, I had to come up with a way to
implement something like that via dedicated channel commands. I did not
want to go via a payload that would provide offset/length, but went
with reading/writing everything before the value to be read/written as
well. That means we need to call read before write to make sure we
don't overwrite things (as the comment states), and how this is done
might be problematic.

I'm thinking what we may need is a way to make "read and then write" a
single operation and make sure that it does not run concurrently with
the simple read operation. Factor out the proper invocation of the read
command and the status update, have get_config call this with a reader
lock and have set_config call the read-then-write combo with a write
lock, maybe?

I might not understand the problem correctly, though (or I might have
spent too much time reading email today already :)

^ permalink raw reply

* Re: [patch 09/11] x86/vdso: Simplify the invalid vclock case
From: Thomas Gleixner @ 2018-09-18 15:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Florian Weimer, Juergen Gross, Arnd Bergmann, Stephen Boyd,
	X86 ML, LKML, Linux Virtualization, John Stultz, Andy Lutomirski,
	Paolo Bonzini, devel, Matt Rickard
In-Reply-To: <alpine.DEB.2.21.1809181515170.6950@nanos.tec.linutronix.de>

On Tue, 18 Sep 2018, Thomas Gleixner wrote:
> So if the TSC on CPU1 is slightly behind the TSC on CPU0 then now1 can be
> smaller than cycle_last. The TSC sync stuff does not catch the small delta
> for unknown raisins. I'll go and find that machine and test that again.

Of course it does not trigger anymore. We accumulated code between the
point in timekeeping_advance() where the TSC is read and the update of the
VDSO data.

I'll might have to get an 2.6ish kernel booted on that machine and try with
that again. /me shudders

Thanks,

	tglx

^ permalink raw reply

* Re: [patch 09/11] x86/vdso: Simplify the invalid vclock case
From: Andy Lutomirski @ 2018-09-18 14:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Florian Weimer, Juergen Gross, Arnd Bergmann, Peter Zijlstra,
	X86 ML, LKML, Linux Virtualization, Stephen Boyd, John Stultz,
	Andy Lutomirski, Paolo Bonzini, devel, Matt Rickard
In-Reply-To: <alpine.DEB.2.21.1809180948570.3558@nanos.tec.linutronix.de>


> On Sep 18, 2018, at 12:52 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
>> On Mon, 17 Sep 2018, John Stultz wrote:
>>> On Mon, Sep 17, 2018 at 12:25 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>> Also, I'm not entirely convinced that this "last" thing is needed at
>>> all.  John, what's the scenario under which we need it?
>> 
>> So my memory is probably a bit foggy, but I recall that as we
>> accelerated gettimeofday, we found that even on systems that claimed
>> to have synced TSCs, they were actually just slightly out of sync.
>> Enough that right after cycles_last had been updated, a read on
>> another cpu could come in just behind cycles_last, resulting in a
>> negative interval causing lots of havoc.
>> 
>> So the sanity check is needed to avoid that case.
> 
> Your memory serves you right. That's indeed observable on CPUs which
> lack TSC_ADJUST.
> 
> @Andy: Welcome to the wonderful world of TSC.
> 

Do we do better if we use signed arithmetic for the whole calculation? Then a small backwards movement would result in a small backwards result.  Or we could offset everything so that we’d have to go back several hundred ms before we cross zero.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [patch 09/11] x86/vdso: Simplify the invalid vclock case
From: Peter Zijlstra @ 2018-09-18 13:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Florian Weimer, Juergen Gross, Arnd Bergmann, Stephen Boyd,
	X86 ML, LKML, Linux Virtualization, John Stultz, Andy Lutomirski,
	Paolo Bonzini, devel, Matt Rickard
In-Reply-To: <alpine.DEB.2.21.1809181515170.6950@nanos.tec.linutronix.de>

On Tue, Sep 18, 2018 at 03:23:08PM +0200, Thomas Gleixner wrote:
> On Tue, 18 Sep 2018, Peter Zijlstra wrote:
> > On Tue, Sep 18, 2018 at 12:41:57PM +0200, Thomas Gleixner wrote:
> > > I still have one of the machines which is affected by this.
> > 
> > Are we sure this isn't a load vs rdtsc reorder? Because if I look at the
> > current code:
> 
> The load order of last vs. rdtsc does not matter at all.
> 
> CPU0					CPU1
> 
> ....
> now0 = rdtsc_ordered();
> ...
> tk->cycle_last = now0;
> 
> gtod->seq++;
> gtod->cycle_last = tk->cycle_last;
> ...
> gtod->seq++;
> 					seq_begin(gtod->seq);
> 					now1 = rdtsc_ordered();
> 
> So if the TSC on CPU1 is slightly behind the TSC on CPU0 then now1 can be
> smaller than cycle_last. The TSC sync stuff does not catch the small delta
> for unknown raisins. I'll go and find that machine and test that again.

Yeah, somehow I forgot about rseq.. maybe I should go sleep or
something.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox