Linux virtualization list
 help / color / mirror / Atom feed
* Re: [PATCH v3 5/6] pv-qspinlock: use cmpxchg_release in __pv_queued_spin_unlock
From: Peter Zijlstra @ 2016-05-26 16:47 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: jeremy, mpe, linux-kernel, virtualization, chrisw, mingo, paulus,
	benh, akataria, paulmck, linuxppc-dev
In-Reply-To: <1464164289-6124-6-git-send-email-xinhui.pan@linux.vnet.ibm.com>

On Wed, May 25, 2016 at 04:18:08PM +0800, Pan Xinhui wrote:
> cmpxchg_release is light-wight than cmpxchg, we can gain a better
> performace then. On some arch like ppc, barrier impact the performace
> too much.
> 
> Suggested-by:  Boqun Feng <boqun.feng@gmail.com>
> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
> ---
>  kernel/locking/qspinlock_paravirt.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
> index a5b1248..2bbffe4 100644
> --- a/kernel/locking/qspinlock_paravirt.h
> +++ b/kernel/locking/qspinlock_paravirt.h
> @@ -614,7 +614,7 @@ __visible void __pv_queued_spin_unlock(struct qspinlock *lock)
>  	 * unhash. Otherwise it would be possible to have multiple @lock
>  	 * entries, which would be BAD.
>  	 */
> -	locked = cmpxchg(&l->locked, _Q_LOCKED_VAL, 0);
> +	locked = cmpxchg_release(&l->locked, _Q_LOCKED_VAL, 0);
>  	if (likely(locked == _Q_LOCKED_VAL))
>  		return;

This patch fails to explain _why_ it can be relaxed.

And seeing how this cmpxchg() can actually unlock the lock, I don't see
how this can possibly be correct. Maybe cmpxchg_release(), but relaxed
seems very wrong.

^ permalink raw reply

* Re: [PATCH v3 0/6] powerpc use pv-qpsinlock as the default spinlock implemention
From: Peter Zijlstra @ 2016-05-26 16:50 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: jeremy, mpe, linux-kernel, virtualization, chrisw, mingo, paulus,
	benh, akataria, paulmck, linuxppc-dev
In-Reply-To: <1464164289-6124-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>

On Wed, May 25, 2016 at 04:18:03PM +0800, Pan Xinhui wrote:

> _____test________________spinlcok______________pv-qspinlcok_____
> |futex hash	|	556370 ops	|	629634 ops	|
> |futex lock-pi	|	362 ops		|	367 ops		|
> 
> scheduler test:
> Test how many loops of schedule() can finish within 10 seconds on all cpus.
> 
> _____test________________spinlcok______________pv-qspinlcok_____
> |schedule() loops|	322811921 	|	311449290	|
> 
> kernel compiling test:
> build a linux kernel image to see how long it took
> 
> _____test________________spinlcok______________pv-qspinlcok_____
> | compiling takes|	22m 		|	22m		|


s/spinlcok/spinlock/

Is 'spinlcok' the current test-and-set lock?

And what about regular qspinlock, in case of !SHARED_PROCESSOR?

^ permalink raw reply

* Re: [PATCH v3 0/6] powerpc use pv-qpsinlock as the default spinlock implemention
From: Peter Zijlstra @ 2016-05-26 16:54 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: jeremy, mpe, linux-kernel, virtualization, chrisw, mingo, paulus,
	benh, akataria, paulmck, linuxppc-dev
In-Reply-To: <1464164289-6124-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>

On Wed, May 25, 2016 at 04:18:03PM +0800, Pan Xinhui wrote:
> I do several tests on pseries IBM,8408-E8E with 32cpus, 64GB memory.
> benchmark test results are below.

> _____test________________spinlcok______________pv-qspinlcok_____
> | compiling takes|	22m 		|	22m		|

How can a kernel build take 22 minutes with 32 CPUs? That's far too
long. My 40 CPU IVB-EP takes all of 50 seconds to build an
x86_64-defconfig from scratch.

^ permalink raw reply

* Re: [PATCH v3 5/6] pv-qspinlock: use cmpxchg_release in __pv_queued_spin_unlock
From: Peter Zijlstra @ 2016-05-26 16:57 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: jeremy, mpe, linux-kernel, virtualization, chrisw, mingo, paulus,
	benh, akataria, paulmck, linuxppc-dev
In-Reply-To: <20160526164729.GL3206@twins.programming.kicks-ass.net>

On Thu, May 26, 2016 at 06:47:29PM +0200, Peter Zijlstra wrote:
> On Wed, May 25, 2016 at 04:18:08PM +0800, Pan Xinhui wrote:
> > cmpxchg_release is light-wight than cmpxchg, we can gain a better
> > performace then. On some arch like ppc, barrier impact the performace
> > too much.
> > 
> > Suggested-by:  Boqun Feng <boqun.feng@gmail.com>
> > Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
> > ---
> >  kernel/locking/qspinlock_paravirt.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
> > index a5b1248..2bbffe4 100644
> > --- a/kernel/locking/qspinlock_paravirt.h
> > +++ b/kernel/locking/qspinlock_paravirt.h
> > @@ -614,7 +614,7 @@ __visible void __pv_queued_spin_unlock(struct qspinlock *lock)
> >  	 * unhash. Otherwise it would be possible to have multiple @lock
> >  	 * entries, which would be BAD.
> >  	 */
> > -	locked = cmpxchg(&l->locked, _Q_LOCKED_VAL, 0);
> > +	locked = cmpxchg_release(&l->locked, _Q_LOCKED_VAL, 0);
> >  	if (likely(locked == _Q_LOCKED_VAL))
> >  		return;
> 
> This patch fails to explain _why_ it can be relaxed.
> 
> And seeing how this cmpxchg() can actually unlock the lock, I don't see
> how this can possibly be correct. Maybe cmpxchg_release(), but relaxed
> seems very wrong.

Clearly I need to stop working for the day, I cannea read. You're doing
release, not relaxed.

Still Changelog needs improvement.

^ permalink raw reply

* Re: [PATCH v3 7/7] [wip] virtio-gpu: add page flip support
From: Gerd Hoffmann @ 2016-05-27  7:46 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: open list, dri-devel, open list:VIRTIO GPU DRIVER
In-Reply-To: <CAKMK7uFAjZ9DQSXwfJQWocjc1SXYORTCYdJ6i4d1i3p95Yai2w@mail.gmail.com>

On Mi, 2016-05-25 at 18:37 +0200, Daniel Vetter wrote:
> On Fri, Oct 2, 2015 at 1:58 PM, Gerd Hoffmann <kraxel@redhat.com> wrote:
> > Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> 
> So I entirely missed this, but this isn't really how to implement
> page_flip for an atomic driver. Working on some stuff and will hack up
> a likely totally broken patch, but should be enough as guideline.
> -Daniel

Hmm, no patch in my inbox yet.  Care to send it over or give some hints
what is wrong with this?

thanks,
  Gerd

^ permalink raw reply

* Re: [PATCH] Add virtio gpu driver.
From: Gerd Hoffmann @ 2016-05-27  7:48 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: virtio-dev, Michael S. Tsirkin, open list:ABI/API, open list,
	open list:DRM DRIVERS, open list:VIRTIO CORE, NET..., Dave Airlie
In-Reply-To: <CAKMK7uFUNT193f5djZD6T8cgRxxnZL73yrqbupPHOd2P-+Y=pQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 505 bytes --]

  Hi,

> I guess I didn't do a good job at looking at your v2: Cursor is still
> using legacy interfaces and not a proper plane. Would be awesome if
> you could fix that up. Atomic drivers really shouldn't use the legacy
> cursor interfaces any more at all.
> -Daniel

Figured that one for the most part, see attached draft.

The only thing I'm wondering is how the hotspot is handled.
drm_mode_cursor_universal doesn't even look at req->hot_{x,y}.

/me looks confused.

cheers,
  Gerd


[-- Attachment #2: 0001-wip-virtio-gpu-switch-to-atomic-cursor-interfaces.patch --]
[-- Type: text/x-patch, Size: 10880 bytes --]

From fb1d0700a46d850ec9f931304a9e99854a3ce5e9 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@redhat.com>
Date: Thu, 26 May 2016 11:42:52 +0200
Subject: [PATCH] [wip] virtio-gpu: switch to atomic cursor interfaces

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 drivers/gpu/drm/virtio/virtgpu_display.c | 102 +++-----------------------
 drivers/gpu/drm/virtio/virtgpu_drv.h     |   1 +
 drivers/gpu/drm/virtio/virtgpu_plane.c   | 122 ++++++++++++++++++++++++-------
 3 files changed, 109 insertions(+), 116 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_display.c b/drivers/gpu/drm/virtio/virtgpu_display.c
index 5990cab..d6b16d1 100644
--- a/drivers/gpu/drm/virtio/virtgpu_display.c
+++ b/drivers/gpu/drm/virtio/virtgpu_display.c
@@ -29,8 +29,8 @@
 #include <drm/drm_crtc_helper.h>
 #include <drm/drm_atomic_helper.h>
 
-#define XRES_MIN   320
-#define YRES_MIN   200
+#define XRES_MIN    32
+#define YRES_MIN    32
 
 #define XRES_DEF  1024
 #define YRES_DEF   768
@@ -38,86 +38,6 @@
 #define XRES_MAX  8192
 #define YRES_MAX  8192
 
-static void
-virtio_gpu_hide_cursor(struct virtio_gpu_device *vgdev,
-		       struct virtio_gpu_output *output)
-{
-	output->cursor.hdr.type = cpu_to_le32(VIRTIO_GPU_CMD_UPDATE_CURSOR);
-	output->cursor.resource_id = 0;
-	virtio_gpu_cursor_ping(vgdev, output);
-}
-
-static int virtio_gpu_crtc_cursor_set(struct drm_crtc *crtc,
-				      struct drm_file *file_priv,
-				      uint32_t handle,
-				      uint32_t width,
-				      uint32_t height,
-				      int32_t hot_x, int32_t hot_y)
-{
-	struct virtio_gpu_device *vgdev = crtc->dev->dev_private;
-	struct virtio_gpu_output *output =
-		container_of(crtc, struct virtio_gpu_output, crtc);
-	struct drm_gem_object *gobj = NULL;
-	struct virtio_gpu_object *qobj = NULL;
-	struct virtio_gpu_fence *fence = NULL;
-	int ret = 0;
-
-	if (handle == 0) {
-		virtio_gpu_hide_cursor(vgdev, output);
-		return 0;
-	}
-
-	/* lookup the cursor */
-	gobj = drm_gem_object_lookup(crtc->dev, file_priv, handle);
-	if (gobj == NULL)
-		return -ENOENT;
-
-	qobj = gem_to_virtio_gpu_obj(gobj);
-
-	if (!qobj->hw_res_handle) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	virtio_gpu_cmd_transfer_to_host_2d(vgdev, qobj->hw_res_handle, 0,
-					   cpu_to_le32(64),
-					   cpu_to_le32(64),
-					   0, 0, &fence);
-	ret = virtio_gpu_object_reserve(qobj, false);
-	if (!ret) {
-		reservation_object_add_excl_fence(qobj->tbo.resv,
-						  &fence->f);
-		fence_put(&fence->f);
-		virtio_gpu_object_unreserve(qobj);
-		virtio_gpu_object_wait(qobj, false);
-	}
-
-	output->cursor.hdr.type = cpu_to_le32(VIRTIO_GPU_CMD_UPDATE_CURSOR);
-	output->cursor.resource_id = cpu_to_le32(qobj->hw_res_handle);
-	output->cursor.hot_x = cpu_to_le32(hot_x);
-	output->cursor.hot_y = cpu_to_le32(hot_y);
-	virtio_gpu_cursor_ping(vgdev, output);
-	ret = 0;
-
-out:
-	drm_gem_object_unreference_unlocked(gobj);
-	return ret;
-}
-
-static int virtio_gpu_crtc_cursor_move(struct drm_crtc *crtc,
-				    int x, int y)
-{
-	struct virtio_gpu_device *vgdev = crtc->dev->dev_private;
-	struct virtio_gpu_output *output =
-		container_of(crtc, struct virtio_gpu_output, crtc);
-
-	output->cursor.hdr.type = cpu_to_le32(VIRTIO_GPU_CMD_MOVE_CURSOR);
-	output->cursor.pos.x = cpu_to_le32(x);
-	output->cursor.pos.y = cpu_to_le32(y);
-	virtio_gpu_cursor_ping(vgdev, output);
-	return 0;
-}
-
 static int virtio_gpu_page_flip(struct drm_crtc *crtc,
 				struct drm_framebuffer *fb,
 				struct drm_pending_vblank_event *event,
@@ -164,8 +84,6 @@ static int virtio_gpu_page_flip(struct drm_crtc *crtc,
 }
 
 static const struct drm_crtc_funcs virtio_gpu_crtc_funcs = {
-	.cursor_set2            = virtio_gpu_crtc_cursor_set,
-	.cursor_move            = virtio_gpu_crtc_cursor_move,
 	.set_config             = drm_atomic_helper_set_config,
 	.destroy                = drm_crtc_cleanup,
 
@@ -406,7 +324,7 @@ static int vgdev_output_init(struct virtio_gpu_device *vgdev, int index)
 	struct drm_connector *connector = &output->conn;
 	struct drm_encoder *encoder = &output->enc;
 	struct drm_crtc *crtc = &output->crtc;
-	struct drm_plane *plane;
+	struct drm_plane *primary, *cursor;
 
 	output->index = index;
 	if (index == 0) {
@@ -415,14 +333,18 @@ static int vgdev_output_init(struct virtio_gpu_device *vgdev, int index)
 		output->info.r.height = cpu_to_le32(YRES_DEF);
 	}
 
-	plane = virtio_gpu_plane_init(vgdev, index);
-	if (IS_ERR(plane))
-		return PTR_ERR(plane);
-	drm_crtc_init_with_planes(dev, crtc, plane, NULL,
+	primary = virtio_gpu_plane_init(vgdev, DRM_PLANE_TYPE_PRIMARY, index);
+	if (IS_ERR(primary))
+		return PTR_ERR(primary);
+	cursor = virtio_gpu_plane_init(vgdev, DRM_PLANE_TYPE_CURSOR, index);
+	if (IS_ERR(cursor))
+		return PTR_ERR(cursor);
+	drm_crtc_init_with_planes(dev, crtc, primary, cursor,
 				  &virtio_gpu_crtc_funcs, NULL);
 	drm_mode_crtc_set_gamma_size(crtc, 256);
 	drm_crtc_helper_add(crtc, &virtio_gpu_crtc_helper_funcs);
-	plane->crtc = crtc;
+	primary->crtc = crtc;
+	cursor->crtc = crtc;
 
 	drm_connector_init(dev, connector, &virtio_gpu_connector_funcs,
 			   DRM_MODE_CONNECTOR_VIRTUAL);
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 8f486f4..a1f7e9d 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -335,6 +335,7 @@ void virtio_gpu_modeset_fini(struct virtio_gpu_device *vgdev);
 
 /* virtio_gpu_plane.c */
 struct drm_plane *virtio_gpu_plane_init(struct virtio_gpu_device *vgdev,
+					enum drm_plane_type type,
 					int index);
 
 /* virtio_gpu_ttm.c */
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
index 70b44a2..d68270f 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -38,6 +38,10 @@ static const uint32_t virtio_gpu_formats[] = {
 	DRM_FORMAT_ABGR8888,
 };
 
+static const uint32_t virtio_gpu_cursor_formats[] = {
+	DRM_FORMAT_ARGB8888,
+};
+
 static void virtio_gpu_plane_destroy(struct drm_plane *plane)
 {
 	kfree(plane);
@@ -63,39 +67,97 @@ static void virtio_gpu_plane_atomic_update(struct drm_plane *plane,
 {
 	struct drm_device *dev = plane->dev;
 	struct virtio_gpu_device *vgdev = dev->dev_private;
-	struct virtio_gpu_output *output = drm_crtc_to_virtio_gpu_output(plane->crtc);
+	struct virtio_gpu_output *output = NULL;
 	struct virtio_gpu_framebuffer *vgfb;
-	struct virtio_gpu_object *bo;
+	struct virtio_gpu_fence *fence = NULL;
+	struct virtio_gpu_object *bo = NULL;
 	uint32_t handle;
+	int ret = 0;
+
+	if (plane->state->crtc)
+		output = drm_crtc_to_virtio_gpu_output(plane->state->crtc);
+	if (old_state->crtc)
+		output = drm_crtc_to_virtio_gpu_output(old_state->crtc);
+	WARN_ON(!output);
 
 	if (plane->state->fb) {
 		vgfb = to_virtio_gpu_framebuffer(plane->state->fb);
 		bo = gem_to_virtio_gpu_obj(vgfb->obj);
 		handle = bo->hw_res_handle;
-		if (bo->dumb) {
-			virtio_gpu_cmd_transfer_to_host_2d
-				(vgdev, handle, 0,
-				 cpu_to_le32(plane->state->crtc_w),
-				 cpu_to_le32(plane->state->crtc_h),
-				 plane->state->crtc_x, plane->state->crtc_y, NULL);
-		}
 	} else {
 		handle = 0;
 	}
 
-	DRM_DEBUG("handle 0x%x, crtc %dx%d+%d+%d\n", handle,
-		  plane->state->crtc_w, plane->state->crtc_h,
-		  plane->state->crtc_x, plane->state->crtc_y);
-	virtio_gpu_cmd_set_scanout(vgdev, output->index, handle,
-				   plane->state->crtc_w,
-				   plane->state->crtc_h,
-				   plane->state->crtc_x,
-				   plane->state->crtc_y);
-	virtio_gpu_cmd_resource_flush(vgdev, handle,
-				      plane->state->crtc_x,
-				      plane->state->crtc_y,
-				      plane->state->crtc_w,
-				      plane->state->crtc_h);
+	switch (plane->type) {
+	case DRM_PLANE_TYPE_CURSOR:
+		if (bo && bo->dumb && (plane->state->fb != old_state->fb)) {
+			/* new cursor -- update & wait */
+			virtio_gpu_cmd_transfer_to_host_2d
+				(vgdev, handle, 0,
+				 cpu_to_le32(plane->state->crtc_w),
+				 cpu_to_le32(plane->state->crtc_h),
+				 0, 0, &fence);
+			ret = virtio_gpu_object_reserve(bo, false);
+			if (!ret) {
+				reservation_object_add_excl_fence(bo->tbo.resv,
+								  &fence->f);
+				fence_put(&fence->f);
+				fence = NULL;
+				virtio_gpu_object_unreserve(bo);
+				virtio_gpu_object_wait(bo, false);
+			}
+		}
+
+		if (plane->state->fb != old_state->fb) {
+			DRM_DEBUG("cursor update, handle %d, +%d+%d\n", handle,
+				  plane->state->crtc_x,
+				  plane->state->crtc_y);
+			output->cursor.hdr.type =
+				cpu_to_le32(VIRTIO_GPU_CMD_UPDATE_CURSOR);
+			output->cursor.resource_id = cpu_to_le32(handle);
+#if 0
+			output->cursor.hot_x = cpu_to_le32(hot_x);
+			output->cursor.hot_y = cpu_to_le32(hot_y);
+#endif
+		} else {
+			DRM_DEBUG("cursor move +%d+%d\n",
+				  plane->state->crtc_x,
+				  plane->state->crtc_y);
+			output->cursor.hdr.type =
+				cpu_to_le32(VIRTIO_GPU_CMD_MOVE_CURSOR);
+		}
+		output->cursor.pos.x = cpu_to_le32(plane->state->crtc_x);
+		output->cursor.pos.y = cpu_to_le32(plane->state->crtc_y);
+		virtio_gpu_cursor_ping(vgdev, output);
+		break;
+
+	case DRM_PLANE_TYPE_PRIMARY:
+		DRM_DEBUG("primary, handle 0x%x, crtc %dx%d+%d+%d\n", handle,
+			  plane->state->crtc_w, plane->state->crtc_h,
+			  plane->state->crtc_x, plane->state->crtc_y);
+		if (bo && bo->dumb) {
+			virtio_gpu_cmd_transfer_to_host_2d
+				(vgdev, handle, 0,
+				 cpu_to_le32(plane->state->crtc_w),
+				 cpu_to_le32(plane->state->crtc_h),
+				 plane->state->crtc_x, plane->state->crtc_y,
+				 &fence);
+		}
+		virtio_gpu_cmd_set_scanout(vgdev, output->index, handle,
+					   plane->state->crtc_w,
+					   plane->state->crtc_h,
+					   plane->state->crtc_x,
+					   plane->state->crtc_y);
+		virtio_gpu_cmd_resource_flush(vgdev, handle,
+					      plane->state->crtc_x,
+					      plane->state->crtc_y,
+					      plane->state->crtc_w,
+					      plane->state->crtc_h);
+		break;
+
+	default:
+		WARN_ON(true);
+	}
 }
 
 
@@ -105,21 +167,29 @@ static const struct drm_plane_helper_funcs virtio_gpu_plane_helper_funcs = {
 };
 
 struct drm_plane *virtio_gpu_plane_init(struct virtio_gpu_device *vgdev,
+					enum drm_plane_type type,
 					int index)
 {
 	struct drm_device *dev = vgdev->ddev;
 	struct drm_plane *plane;
-	int ret;
+	const uint32_t *formats;
+	int ret, nformats;
 
 	plane = kzalloc(sizeof(*plane), GFP_KERNEL);
 	if (!plane)
 		return ERR_PTR(-ENOMEM);
 
+	if (type == DRM_PLANE_TYPE_CURSOR) {
+		formats = virtio_gpu_cursor_formats;
+		nformats = ARRAY_SIZE(virtio_gpu_cursor_formats);
+	} else {
+		formats = virtio_gpu_formats;
+		nformats = ARRAY_SIZE(virtio_gpu_formats);
+	}
 	ret = drm_universal_plane_init(dev, plane, 1 << index,
 				       &virtio_gpu_plane_funcs,
-				       virtio_gpu_formats,
-				       ARRAY_SIZE(virtio_gpu_formats),
-				       DRM_PLANE_TYPE_PRIMARY, NULL);
+				       formats, nformats,
+				       type, NULL);
 	if (ret)
 		goto err_plane_init;
 
-- 
1.8.3.1


[-- Attachment #3: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related

* Re: [PATCH v3 7/7] [wip] virtio-gpu: add page flip support
From: Daniel Vetter @ 2016-05-27  7:50 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: open list:VIRTIO GPU DRIVER, dri-devel, Daniel Vetter, open list
In-Reply-To: <1464335163.10663.7.camel@redhat.com>

On Fri, May 27, 2016 at 09:46:03AM +0200, Gerd Hoffmann wrote:
> On Mi, 2016-05-25 at 18:37 +0200, Daniel Vetter wrote:
> > On Fri, Oct 2, 2015 at 1:58 PM, Gerd Hoffmann <kraxel@redhat.com> wrote:
> > > Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> > 
> > So I entirely missed this, but this isn't really how to implement
> > page_flip for an atomic driver. Working on some stuff and will hack up
> > a likely totally broken patch, but should be enough as guideline.
> > -Daniel
> 
> Hmm, no patch in my inbox yet.  Care to send it over or give some hints
> what is wrong with this?

You should use drm_atomic_helper_page_flip as .page_flip hook instead of
rolling your own, when you have a brand-new atomic driver. If that helper
doesn't work that means you have a bug in your driver somewhere.

And indeed virtio just plugs in drm_atomic_helper_commit, which thus far
doesnt' do nonblocking commits, which means the page flip stuff won't
work. I'm working on generic nonblocking atomic to fix this up. Currently
still wip and buggy though, hence no patches.

But I'll take you up on the implied offer to help out and test ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply

* Re: [PATCH] Add virtio gpu driver.
From: Daniel Vetter @ 2016-05-27  9:03 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: virtio-dev, Michael S. Tsirkin, open list:ABI/API, open list,
	open list:DRM DRIVERS, open list:VIRTIO CORE, NET...,
	Daniel Vetter, Dave Airlie
In-Reply-To: <1464335302.10663.10.camel@redhat.com>

On Fri, May 27, 2016 at 09:48:22AM +0200, Gerd Hoffmann wrote:
> > I guess I didn't do a good job at looking at your v2: Cursor is still
> > using legacy interfaces and not a proper plane. Would be awesome if
> > you could fix that up. Atomic drivers really shouldn't use the legacy
> > cursor interfaces any more at all.
> > -Daniel
> 
> Figured that one for the most part, see attached draft.
> 
> The only thing I'm wondering is how the hotspot is handled.
> drm_mode_cursor_universal doesn't even look at req->hot_{x,y}.
> 
> /me looks confused.

No need to, you're simply the first virtual driver to wire up atomic
cursors. Hence some gaps to be filled out.

First we need to wire up the state tracking scaffolding for atomic:
- add hot_x/y to drm_plane_state
- add property pointers for hot_x/y to dev->mode_config (like we have for
  all the other atomic props like "SRC_X").
- add encode/decode support for these properties to
  drm_atomic_plane_get_property and drm_atomic_plane_set_property, similar
  again to "SRC_X" and friends
- add a small core function to registerr HOT_X/HOT_Y for a (cursor) plane,
  e.g. drm_plane_register_hotspot(). That should allocate the properties
  (if they don't exist yet) and then attach those props to the cursor. We
  don't want those props everywhere, but only on drivers that support/need
  them, aka virtual hw.

With that a real atomic driver will be able to move the cursor and it's
hotspot around, all nicely done in an atomic commit. But it won't work yet
for userspace for legacy applications. For that we need a notch more:
- one option would be to add hot_x/hot_y to the ->update_plane hook, but
  that has massive trickling effects throughout the subsystem. Probably
  not what we want to do.
- 2nd option would be to add a DRIVER_ATOMIC check to
  drm_mode_cursor_common and call a new drm_mode_cursor_atomic in that
  case:

diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
index 3e52a6ecf6c0..2f15ce2c6bf4 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -3046,7 +3046,10 @@ static int drm_mode_cursor_common(struct drm_device *dev,
 	 */
 	drm_modeset_lock_crtc(crtc, crtc->cursor);
 	if (crtc->cursor) {
-		ret = drm_mode_cursor_universal(crtc, req, file_priv);
+		if (drm_core_check_feature(DRIVER_ATOMIC))
+			ret = drm_mode_cursor_atomic(crtc, req, file_priv);
+		else
+			ret = drm_mode_cursor_universal(crtc, req, file_priv);
 		goto out;
 	}
 
  drm_mode_cursor_atomic would simply be a fusing of
  drm_mode_cursor_universal + drm_atomic_helper_update_plane (dump all the
  intermediate variables and store directly in the plane state), with the
  addition of also storing hot_x/y into the plane state.

Sorry that this turned into a bit of a project, I've forgotten that we
haven't wired up hot_x/y at all for atomic ...

If you don't want to bother with the atomic properties (only needed for
atomic userspace), then just adding hot_x/y to drm_plane_state is all you
need from the first group of tasks.

Your patch below to implement the atomic cursor looks reasonable. Although
personally I'd go with a separate vfunc table for the cursor so that you
can avoid that ugly switch in the atomic_update hook.

Cheers, Daniel

> 
> cheers,
>   Gerd
> 

> From fb1d0700a46d850ec9f931304a9e99854a3ce5e9 Mon Sep 17 00:00:00 2001
> From: Gerd Hoffmann <kraxel@redhat.com>
> Date: Thu, 26 May 2016 11:42:52 +0200
> Subject: [PATCH] [wip] virtio-gpu: switch to atomic cursor interfaces
> 
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  drivers/gpu/drm/virtio/virtgpu_display.c | 102 +++-----------------------
>  drivers/gpu/drm/virtio/virtgpu_drv.h     |   1 +
>  drivers/gpu/drm/virtio/virtgpu_plane.c   | 122 ++++++++++++++++++++++++-------
>  3 files changed, 109 insertions(+), 116 deletions(-)
> 
> diff --git a/drivers/gpu/drm/virtio/virtgpu_display.c b/drivers/gpu/drm/virtio/virtgpu_display.c
> index 5990cab..d6b16d1 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_display.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_display.c
> @@ -29,8 +29,8 @@
>  #include <drm/drm_crtc_helper.h>
>  #include <drm/drm_atomic_helper.h>
>  
> -#define XRES_MIN   320
> -#define YRES_MIN   200
> +#define XRES_MIN    32
> +#define YRES_MIN    32
>  
>  #define XRES_DEF  1024
>  #define YRES_DEF   768
> @@ -38,86 +38,6 @@
>  #define XRES_MAX  8192
>  #define YRES_MAX  8192
>  
> -static void
> -virtio_gpu_hide_cursor(struct virtio_gpu_device *vgdev,
> -		       struct virtio_gpu_output *output)
> -{
> -	output->cursor.hdr.type = cpu_to_le32(VIRTIO_GPU_CMD_UPDATE_CURSOR);
> -	output->cursor.resource_id = 0;
> -	virtio_gpu_cursor_ping(vgdev, output);
> -}
> -
> -static int virtio_gpu_crtc_cursor_set(struct drm_crtc *crtc,
> -				      struct drm_file *file_priv,
> -				      uint32_t handle,
> -				      uint32_t width,
> -				      uint32_t height,
> -				      int32_t hot_x, int32_t hot_y)
> -{
> -	struct virtio_gpu_device *vgdev = crtc->dev->dev_private;
> -	struct virtio_gpu_output *output =
> -		container_of(crtc, struct virtio_gpu_output, crtc);
> -	struct drm_gem_object *gobj = NULL;
> -	struct virtio_gpu_object *qobj = NULL;
> -	struct virtio_gpu_fence *fence = NULL;
> -	int ret = 0;
> -
> -	if (handle == 0) {
> -		virtio_gpu_hide_cursor(vgdev, output);
> -		return 0;
> -	}
> -
> -	/* lookup the cursor */
> -	gobj = drm_gem_object_lookup(crtc->dev, file_priv, handle);
> -	if (gobj == NULL)
> -		return -ENOENT;
> -
> -	qobj = gem_to_virtio_gpu_obj(gobj);
> -
> -	if (!qobj->hw_res_handle) {
> -		ret = -EINVAL;
> -		goto out;
> -	}
> -
> -	virtio_gpu_cmd_transfer_to_host_2d(vgdev, qobj->hw_res_handle, 0,
> -					   cpu_to_le32(64),
> -					   cpu_to_le32(64),
> -					   0, 0, &fence);
> -	ret = virtio_gpu_object_reserve(qobj, false);
> -	if (!ret) {
> -		reservation_object_add_excl_fence(qobj->tbo.resv,
> -						  &fence->f);
> -		fence_put(&fence->f);
> -		virtio_gpu_object_unreserve(qobj);
> -		virtio_gpu_object_wait(qobj, false);
> -	}
> -
> -	output->cursor.hdr.type = cpu_to_le32(VIRTIO_GPU_CMD_UPDATE_CURSOR);
> -	output->cursor.resource_id = cpu_to_le32(qobj->hw_res_handle);
> -	output->cursor.hot_x = cpu_to_le32(hot_x);
> -	output->cursor.hot_y = cpu_to_le32(hot_y);
> -	virtio_gpu_cursor_ping(vgdev, output);
> -	ret = 0;
> -
> -out:
> -	drm_gem_object_unreference_unlocked(gobj);
> -	return ret;
> -}
> -
> -static int virtio_gpu_crtc_cursor_move(struct drm_crtc *crtc,
> -				    int x, int y)
> -{
> -	struct virtio_gpu_device *vgdev = crtc->dev->dev_private;
> -	struct virtio_gpu_output *output =
> -		container_of(crtc, struct virtio_gpu_output, crtc);
> -
> -	output->cursor.hdr.type = cpu_to_le32(VIRTIO_GPU_CMD_MOVE_CURSOR);
> -	output->cursor.pos.x = cpu_to_le32(x);
> -	output->cursor.pos.y = cpu_to_le32(y);
> -	virtio_gpu_cursor_ping(vgdev, output);
> -	return 0;
> -}
> -
>  static int virtio_gpu_page_flip(struct drm_crtc *crtc,
>  				struct drm_framebuffer *fb,
>  				struct drm_pending_vblank_event *event,
> @@ -164,8 +84,6 @@ static int virtio_gpu_page_flip(struct drm_crtc *crtc,
>  }
>  
>  static const struct drm_crtc_funcs virtio_gpu_crtc_funcs = {
> -	.cursor_set2            = virtio_gpu_crtc_cursor_set,
> -	.cursor_move            = virtio_gpu_crtc_cursor_move,
>  	.set_config             = drm_atomic_helper_set_config,
>  	.destroy                = drm_crtc_cleanup,
>  
> @@ -406,7 +324,7 @@ static int vgdev_output_init(struct virtio_gpu_device *vgdev, int index)
>  	struct drm_connector *connector = &output->conn;
>  	struct drm_encoder *encoder = &output->enc;
>  	struct drm_crtc *crtc = &output->crtc;
> -	struct drm_plane *plane;
> +	struct drm_plane *primary, *cursor;
>  
>  	output->index = index;
>  	if (index == 0) {
> @@ -415,14 +333,18 @@ static int vgdev_output_init(struct virtio_gpu_device *vgdev, int index)
>  		output->info.r.height = cpu_to_le32(YRES_DEF);
>  	}
>  
> -	plane = virtio_gpu_plane_init(vgdev, index);
> -	if (IS_ERR(plane))
> -		return PTR_ERR(plane);
> -	drm_crtc_init_with_planes(dev, crtc, plane, NULL,
> +	primary = virtio_gpu_plane_init(vgdev, DRM_PLANE_TYPE_PRIMARY, index);
> +	if (IS_ERR(primary))
> +		return PTR_ERR(primary);
> +	cursor = virtio_gpu_plane_init(vgdev, DRM_PLANE_TYPE_CURSOR, index);
> +	if (IS_ERR(cursor))
> +		return PTR_ERR(cursor);
> +	drm_crtc_init_with_planes(dev, crtc, primary, cursor,
>  				  &virtio_gpu_crtc_funcs, NULL);
>  	drm_mode_crtc_set_gamma_size(crtc, 256);
>  	drm_crtc_helper_add(crtc, &virtio_gpu_crtc_helper_funcs);
> -	plane->crtc = crtc;
> +	primary->crtc = crtc;
> +	cursor->crtc = crtc;
>  
>  	drm_connector_init(dev, connector, &virtio_gpu_connector_funcs,
>  			   DRM_MODE_CONNECTOR_VIRTUAL);
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index 8f486f4..a1f7e9d 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -335,6 +335,7 @@ void virtio_gpu_modeset_fini(struct virtio_gpu_device *vgdev);
>  
>  /* virtio_gpu_plane.c */
>  struct drm_plane *virtio_gpu_plane_init(struct virtio_gpu_device *vgdev,
> +					enum drm_plane_type type,
>  					int index);
>  
>  /* virtio_gpu_ttm.c */
> diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
> index 70b44a2..d68270f 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_plane.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
> @@ -38,6 +38,10 @@ static const uint32_t virtio_gpu_formats[] = {
>  	DRM_FORMAT_ABGR8888,
>  };
>  
> +static const uint32_t virtio_gpu_cursor_formats[] = {
> +	DRM_FORMAT_ARGB8888,
> +};
> +
>  static void virtio_gpu_plane_destroy(struct drm_plane *plane)
>  {
>  	kfree(plane);
> @@ -63,39 +67,97 @@ static void virtio_gpu_plane_atomic_update(struct drm_plane *plane,
>  {
>  	struct drm_device *dev = plane->dev;
>  	struct virtio_gpu_device *vgdev = dev->dev_private;
> -	struct virtio_gpu_output *output = drm_crtc_to_virtio_gpu_output(plane->crtc);
> +	struct virtio_gpu_output *output = NULL;
>  	struct virtio_gpu_framebuffer *vgfb;
> -	struct virtio_gpu_object *bo;
> +	struct virtio_gpu_fence *fence = NULL;
> +	struct virtio_gpu_object *bo = NULL;
>  	uint32_t handle;
> +	int ret = 0;
> +
> +	if (plane->state->crtc)
> +		output = drm_crtc_to_virtio_gpu_output(plane->state->crtc);
> +	if (old_state->crtc)
> +		output = drm_crtc_to_virtio_gpu_output(old_state->crtc);
> +	WARN_ON(!output);
>  
>  	if (plane->state->fb) {
>  		vgfb = to_virtio_gpu_framebuffer(plane->state->fb);
>  		bo = gem_to_virtio_gpu_obj(vgfb->obj);
>  		handle = bo->hw_res_handle;
> -		if (bo->dumb) {
> -			virtio_gpu_cmd_transfer_to_host_2d
> -				(vgdev, handle, 0,
> -				 cpu_to_le32(plane->state->crtc_w),
> -				 cpu_to_le32(plane->state->crtc_h),
> -				 plane->state->crtc_x, plane->state->crtc_y, NULL);
> -		}
>  	} else {
>  		handle = 0;
>  	}
>  
> -	DRM_DEBUG("handle 0x%x, crtc %dx%d+%d+%d\n", handle,
> -		  plane->state->crtc_w, plane->state->crtc_h,
> -		  plane->state->crtc_x, plane->state->crtc_y);
> -	virtio_gpu_cmd_set_scanout(vgdev, output->index, handle,
> -				   plane->state->crtc_w,
> -				   plane->state->crtc_h,
> -				   plane->state->crtc_x,
> -				   plane->state->crtc_y);
> -	virtio_gpu_cmd_resource_flush(vgdev, handle,
> -				      plane->state->crtc_x,
> -				      plane->state->crtc_y,
> -				      plane->state->crtc_w,
> -				      plane->state->crtc_h);
> +	switch (plane->type) {
> +	case DRM_PLANE_TYPE_CURSOR:
> +		if (bo && bo->dumb && (plane->state->fb != old_state->fb)) {
> +			/* new cursor -- update & wait */
> +			virtio_gpu_cmd_transfer_to_host_2d
> +				(vgdev, handle, 0,
> +				 cpu_to_le32(plane->state->crtc_w),
> +				 cpu_to_le32(plane->state->crtc_h),
> +				 0, 0, &fence);
> +			ret = virtio_gpu_object_reserve(bo, false);
> +			if (!ret) {
> +				reservation_object_add_excl_fence(bo->tbo.resv,
> +								  &fence->f);
> +				fence_put(&fence->f);
> +				fence = NULL;
> +				virtio_gpu_object_unreserve(bo);
> +				virtio_gpu_object_wait(bo, false);
> +			}
> +		}
> +
> +		if (plane->state->fb != old_state->fb) {
> +			DRM_DEBUG("cursor update, handle %d, +%d+%d\n", handle,
> +				  plane->state->crtc_x,
> +				  plane->state->crtc_y);
> +			output->cursor.hdr.type =
> +				cpu_to_le32(VIRTIO_GPU_CMD_UPDATE_CURSOR);
> +			output->cursor.resource_id = cpu_to_le32(handle);
> +#if 0
> +			output->cursor.hot_x = cpu_to_le32(hot_x);
> +			output->cursor.hot_y = cpu_to_le32(hot_y);
> +#endif
> +		} else {
> +			DRM_DEBUG("cursor move +%d+%d\n",
> +				  plane->state->crtc_x,
> +				  plane->state->crtc_y);
> +			output->cursor.hdr.type =
> +				cpu_to_le32(VIRTIO_GPU_CMD_MOVE_CURSOR);
> +		}
> +		output->cursor.pos.x = cpu_to_le32(plane->state->crtc_x);
> +		output->cursor.pos.y = cpu_to_le32(plane->state->crtc_y);
> +		virtio_gpu_cursor_ping(vgdev, output);
> +		break;
> +
> +	case DRM_PLANE_TYPE_PRIMARY:
> +		DRM_DEBUG("primary, handle 0x%x, crtc %dx%d+%d+%d\n", handle,
> +			  plane->state->crtc_w, plane->state->crtc_h,
> +			  plane->state->crtc_x, plane->state->crtc_y);
> +		if (bo && bo->dumb) {
> +			virtio_gpu_cmd_transfer_to_host_2d
> +				(vgdev, handle, 0,
> +				 cpu_to_le32(plane->state->crtc_w),
> +				 cpu_to_le32(plane->state->crtc_h),
> +				 plane->state->crtc_x, plane->state->crtc_y,
> +				 &fence);
> +		}
> +		virtio_gpu_cmd_set_scanout(vgdev, output->index, handle,
> +					   plane->state->crtc_w,
> +					   plane->state->crtc_h,
> +					   plane->state->crtc_x,
> +					   plane->state->crtc_y);
> +		virtio_gpu_cmd_resource_flush(vgdev, handle,
> +					      plane->state->crtc_x,
> +					      plane->state->crtc_y,
> +					      plane->state->crtc_w,
> +					      plane->state->crtc_h);
> +		break;
> +
> +	default:
> +		WARN_ON(true);
> +	}
>  }
>  
>  
> @@ -105,21 +167,29 @@ static const struct drm_plane_helper_funcs virtio_gpu_plane_helper_funcs = {
>  };
>  
>  struct drm_plane *virtio_gpu_plane_init(struct virtio_gpu_device *vgdev,
> +					enum drm_plane_type type,
>  					int index)
>  {
>  	struct drm_device *dev = vgdev->ddev;
>  	struct drm_plane *plane;
> -	int ret;
> +	const uint32_t *formats;
> +	int ret, nformats;
>  
>  	plane = kzalloc(sizeof(*plane), GFP_KERNEL);
>  	if (!plane)
>  		return ERR_PTR(-ENOMEM);
>  
> +	if (type == DRM_PLANE_TYPE_CURSOR) {
> +		formats = virtio_gpu_cursor_formats;
> +		nformats = ARRAY_SIZE(virtio_gpu_cursor_formats);
> +	} else {
> +		formats = virtio_gpu_formats;
> +		nformats = ARRAY_SIZE(virtio_gpu_formats);
> +	}
>  	ret = drm_universal_plane_init(dev, plane, 1 << index,
>  				       &virtio_gpu_plane_funcs,
> -				       virtio_gpu_formats,
> -				       ARRAY_SIZE(virtio_gpu_formats),
> -				       DRM_PLANE_TYPE_PRIMARY, NULL);
> +				       formats, nformats,
> +				       type, NULL);
>  	if (ret)
>  		goto err_plane_init;
>  
> -- 
> 1.8.3.1
> 


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply related

* Re: [PATCH v3 5/6] pv-qspinlock: use cmpxchg_release in __pv_queued_spin_unlock
From: xinhui @ 2016-05-27 10:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jeremy, mpe, linux-kernel, virtualization, chrisw, mingo, paulus,
	benh, akataria, paulmck, linuxppc-dev
In-Reply-To: <20160526165712.GT3205@twins.programming.kicks-ass.net>



On 2016年05月27日 00:57, Peter Zijlstra wrote:
> On Thu, May 26, 2016 at 06:47:29PM +0200, Peter Zijlstra wrote:
>> On Wed, May 25, 2016 at 04:18:08PM +0800, Pan Xinhui wrote:
>>> cmpxchg_release is light-wight than cmpxchg, we can gain a better
>>> performace then. On some arch like ppc, barrier impact the performace
>>> too much.
>>>
>>> Suggested-by:  Boqun Feng <boqun.feng@gmail.com>
>>> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
>>> ---
>>>   kernel/locking/qspinlock_paravirt.h | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
>>> index a5b1248..2bbffe4 100644
>>> --- a/kernel/locking/qspinlock_paravirt.h
>>> +++ b/kernel/locking/qspinlock_paravirt.h
>>> @@ -614,7 +614,7 @@ __visible void __pv_queued_spin_unlock(struct qspinlock *lock)
>>>   	 * unhash. Otherwise it would be possible to have multiple @lock
>>>   	 * entries, which would be BAD.
>>>   	 */
>>> -	locked = cmpxchg(&l->locked, _Q_LOCKED_VAL, 0);
>>> +	locked = cmpxchg_release(&l->locked, _Q_LOCKED_VAL, 0);
>>>   	if (likely(locked == _Q_LOCKED_VAL))
>>>   		return;
>>
>> This patch fails to explain _why_ it can be relaxed.
>>
>> And seeing how this cmpxchg() can actually unlock the lock, I don't see
>> how this can possibly be correct. Maybe cmpxchg_release(), but relaxed
>> seems very wrong.
>
> Clearly I need to stop working for the day, I cannea read. You're doing
> release, not relaxed.
>
Never mind.  thanks for review :)

> Still Changelog needs improvement.
>
Will do that.

thanks
xinhui

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v3 0/6] powerpc use pv-qpsinlock as the default spinlock implemention
From: xinhui @ 2016-05-27 10:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jeremy, mpe, linux-kernel, virtualization, chrisw, mingo, paulus,
	benh, akataria, paulmck, linuxppc-dev
In-Reply-To: <20160526165047.GM3206@twins.programming.kicks-ass.net>


On 2016年05月27日 00:50, Peter Zijlstra wrote:
> On Wed, May 25, 2016 at 04:18:03PM +0800, Pan Xinhui wrote:
>
>> _____test________________spinlcok______________pv-qspinlcok_____
>> |futex hash	|	556370 ops	|	629634 ops	|
>> |futex lock-pi	|	362 ops		|	367 ops		|
>>
>> scheduler test:
>> Test how many loops of schedule() can finish within 10 seconds on all cpus.
>>
>> _____test________________spinlcok______________pv-qspinlcok_____
>> |schedule() loops|	322811921 	|	311449290	|
>>
>> kernel compiling test:
>> build a linux kernel image to see how long it took
>>
>> _____test________________spinlcok______________pv-qspinlcok_____
>> | compiling takes|	22m 		|	22m		|
>
>
> s/spinlcok/spinlock/
>
Oh, foolish mistake...sorry

> Is 'spinlcok' the current test-and-set lock?
>
Yes. I will describe it in a clear way in the next patchset.
  
> And what about regular qspinlock, in case of !SHARED_PROCESSOR?
>

You mean the test results on powerNV?

yes, I make a kernel build with !SHARED_PROCESSOR.
and do perf tests and scheduler tests on same machine(32 cpus). performance is better than current spinlock

  _____test________________spinlock________________qspinlock_____
  |futex hash	|	533060 ops	|	541513 ops	|
  |futex lock-pi	|	357 ops		|	356 ops		|


  _____test________________spinlock________________qspinlock_____
  |schedule() loops|	337691713 	|	361935207	|


NOTE: I have updated the scheduler test tools, and the new performance test results show that both pv-spinlock and qspinlock is better than current spinlock.
I will also update the test result in my next patchset.

thanks
xinhui

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v6 02/12] mm: migrate: support non-lru movable page migration
From: Vlastimil Babka @ 2016-05-27 14:26 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: Rik van Riel, Sergey Senozhatsky, Rafael Aquini, Jonathan Corbet,
	Hugh Dickins, linux-kernel, dri-devel, virtualization,
	John Einar Reitan, linux-mm, Mel Gorman, Joonsoo Kim, Gioh Kim
In-Reply-To: <1463754225-31311-3-git-send-email-minchan@kernel.org>

On 05/20/2016 04:23 PM, Minchan Kim wrote:
> We have allowed migration for only LRU pages until now and it was
> enough to make high-order pages. But recently, embedded system(e.g.,
> webOS, android) uses lots of non-movable pages(e.g., zram, GPU memory)
> so we have seen several reports about troubles of small high-order
> allocation. For fixing the problem, there were several efforts
> (e,g,. enhance compaction algorithm, SLUB fallback to 0-order page,
> reserved memory, vmalloc and so on) but if there are lots of
> non-movable pages in system, their solutions are void in the long run.
>
> So, this patch is to support facility to change non-movable pages
> with movable. For the feature, this patch introduces functions related
> to migration to address_space_operations as well as some page flags.
>
> If a driver want to make own pages movable, it should define three functions
> which are function pointers of struct address_space_operations.
>
> 1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);
>
> What VM expects on isolate_page function of driver is to return *true*
> if driver isolates page successfully. On returing true, VM marks the page
> as PG_isolated so concurrent isolation in several CPUs skip the page
> for isolation. If a driver cannot isolate the page, it should return *false*.
>
> Once page is successfully isolated, VM uses page.lru fields so driver
> shouldn't expect to preserve values in that fields.
>
> 2. int (*migratepage) (struct address_space *mapping,
> 		struct page *newpage, struct page *oldpage, enum migrate_mode);
>
> After isolation, VM calls migratepage of driver with isolated page.
> The function of migratepage is to move content of the old page to new page
> and set up fields of struct page newpage. Keep in mind that you should
> clear PG_movable of oldpage via __ClearPageMovable under page_lock if you
> migrated the oldpage successfully and returns 0.
> If driver cannot migrate the page at the moment, driver can return -EAGAIN.
> On -EAGAIN, VM will retry page migration in a short time because VM interprets
> -EAGAIN as "temporal migration failure". On returning any error except -EAGAIN,
> VM will give up the page migration without retrying in this time.
>
> Driver shouldn't touch page.lru field VM using in the functions.
>
> 3. void (*putback_page)(struct page *);
>
> If migration fails on isolated page, VM should return the isolated page
> to the driver so VM calls driver's putback_page with migration failed page.
> In this function, driver should put the isolated page back to the own data
> structure.
>
> 4. non-lru movable page flags
>
> There are two page flags for supporting non-lru movable page.
>
> * PG_movable
>
> Driver should use the below function to make page movable under page_lock.
>
> 	void __SetPageMovable(struct page *page, struct address_space *mapping)
>
> It needs argument of address_space for registering migration family functions
> which will be called by VM. Exactly speaking, PG_movable is not a real flag of
> struct page. Rather than, VM reuses page->mapping's lower bits to represent it.
>
> 	#define PAGE_MAPPING_MOVABLE 0x2
> 	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;

Interesting, let's see how that works out...

Overal this looks much better than the last version I checked!

[...]

> @@ -357,29 +360,37 @@ PAGEFLAG(Idle, idle, PF_ANY)
>   * with the PAGE_MAPPING_ANON bit set to distinguish it.  See rmap.h.
>   *
>   * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled,
> - * the PAGE_MAPPING_KSM bit may be set along with the PAGE_MAPPING_ANON bit;
> - * and then page->mapping points, not to an anon_vma, but to a private
> + * the PAGE_MAPPING_MOVABLE bit may be set along with the PAGE_MAPPING_ANON
> + * bit; and then page->mapping points, not to an anon_vma, but to a private
>   * structure which KSM associates with that merged page.  See ksm.h.
>   *
> - * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is currently never used.
> + * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is used for non-lru movable
> + * page and then page->mapping points a struct address_space.
>   *
>   * Please note that, confusingly, "page_mapping" refers to the inode
>   * address_space which maps the page from disk; whereas "page_mapped"
>   * refers to user virtual address space into which the page is mapped.
>   */
> -#define PAGE_MAPPING_ANON	1
> -#define PAGE_MAPPING_KSM	2
> -#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
> +#define PAGE_MAPPING_ANON	0x1
> +#define PAGE_MAPPING_MOVABLE	0x2
> +#define PAGE_MAPPING_KSM	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
> +#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
>
> -static __always_inline int PageAnonHead(struct page *page)
> +static __always_inline int PageMappingFlag(struct page *page)

PageMappingFlags()?

[...]

> diff --git a/mm/compaction.c b/mm/compaction.c
> index 1427366ad673..2d6862d0df60 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -81,6 +81,41 @@ static inline bool migrate_async_suitable(int migratetype)
>
>  #ifdef CONFIG_COMPACTION
>
> +int PageMovable(struct page *page)
> +{
> +	struct address_space *mapping;
> +
> +	WARN_ON(!PageLocked(page));

Why not VM_BUG_ON_PAGE as elsewhere?

> +	if (!__PageMovable(page))
> +		goto out;

Just return 0.

> +
> +	mapping = page_mapping(page);
> +	if (mapping && mapping->a_ops && mapping->a_ops->isolate_page)
> +		return 1;
> +out:
> +	return 0;
> +}
> +EXPORT_SYMBOL(PageMovable);
> +
> +void __SetPageMovable(struct page *page, struct address_space *mapping)
> +{
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +	VM_BUG_ON_PAGE((unsigned long)mapping & PAGE_MAPPING_MOVABLE, page);
> +	page->mapping = (void *)((unsigned long)mapping | PAGE_MAPPING_MOVABLE);
> +}
> +EXPORT_SYMBOL(__SetPageMovable);
> +
> +void __ClearPageMovable(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +	VM_BUG_ON_PAGE(!PageMovable(page), page);
> +	VM_BUG_ON_PAGE(!((unsigned long)page->mapping & PAGE_MAPPING_MOVABLE),
> +				page);

The last line sounds redundant, PageMovable() already checked this via 
__PageMovable()


> +	page->mapping = (void *)((unsigned long)page->mapping &
> +				PAGE_MAPPING_MOVABLE);

This should be negated to clear... use ~PAGE_MAPPING_MOVABLE ?

> +}
> +EXPORT_SYMBOL(__ClearPageMovable);
> +
>  /* Do not skip compaction more than 64 times */
>  #define COMPACT_MAX_DEFER_SHIFT 6
>
> @@ -735,21 +770,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  		}
>
>  		/*
> -		 * Check may be lockless but that's ok as we recheck later.
> -		 * It's possible to migrate LRU pages and balloon pages
> -		 * Skip any other type of page
> -		 */
> -		is_lru = PageLRU(page);
> -		if (!is_lru) {
> -			if (unlikely(balloon_page_movable(page))) {
> -				if (balloon_page_isolate(page)) {
> -					/* Successfully isolated */
> -					goto isolate_success;
> -				}
> -			}
> -		}

So this effectively prevents movable compound pages from being migrated. Are you 
sure no users of this functionality are going to have compound pages? I assumed 
that they could, and so made the code like this, with the is_lru variable (which 
is redundant after your change).

> -		/*
>  		 * Regardless of being on LRU, compound pages such as THP and
>  		 * hugetlbfs are not to be compacted. We can potentially save
>  		 * a lot of iterations if we skip them at once. The check is
> @@ -765,8 +785,38 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  			goto isolate_fail;
>  		}
>
> -		if (!is_lru)
> +		/*
> +		 * Check may be lockless but that's ok as we recheck later.
> +		 * It's possible to migrate LRU and non-lru movable pages.
> +		 * Skip any other type of page
> +		 */
> +		is_lru = PageLRU(page);
> +		if (!is_lru) {
> +			if (unlikely(balloon_page_movable(page))) {
> +				if (balloon_page_isolate(page)) {
> +					/* Successfully isolated */
> +					goto isolate_success;
> +				}
> +			}

[...]

> +bool isolate_movable_page(struct page *page, isolate_mode_t mode)
> +{
> +	struct address_space *mapping;
> +
> +	/*
> +	 * Avoid burning cycles with pages that are yet under __free_pages(),
> +	 * or just got freed under us.
> +	 *
> +	 * In case we 'win' a race for a movable page being freed under us and
> +	 * raise its refcount preventing __free_pages() from doing its job
> +	 * the put_page() at the end of this block will take care of
> +	 * release this page, thus avoiding a nasty leakage.
> +	 */
> +	if (unlikely(!get_page_unless_zero(page)))
> +		goto out;
> +
> +	/*
> +	 * Check PageMovable before holding a PG_lock because page's owner
> +	 * assumes anybody doesn't touch PG_lock of newly allocated page
> +	 * so unconditionally grapping the lock ruins page's owner side.
> +	 */
> +	if (unlikely(!__PageMovable(page)))
> +		goto out_putpage;
> +	/*
> +	 * As movable pages are not isolated from LRU lists, concurrent
> +	 * compaction threads can race against page migration functions
> +	 * as well as race against the releasing a page.
> +	 *
> +	 * In order to avoid having an already isolated movable page
> +	 * being (wrongly) re-isolated while it is under migration,
> +	 * or to avoid attempting to isolate pages being released,
> +	 * lets be sure we have the page lock
> +	 * before proceeding with the movable page isolation steps.
> +	 */
> +	if (unlikely(!trylock_page(page)))
> +		goto out_putpage;
> +
> +	if (!PageMovable(page) || PageIsolated(page))
> +		goto out_no_isolated;
> +
> +	mapping = page_mapping(page);

Hmm so on first tail page of a THP compound page, page->mapping will alias with 
compound_mapcount. That can easily have a value matching PageMovable flags and 
we'll proceed and start inspecting the compound head in page_mapping()... maybe 
it's not a big deal, or we better check and skip PageTail first, must think 
about it more...

[...]

> @@ -755,33 +844,69 @@ static int move_to_new_page(struct page *newpage, struct page *page,
>  				enum migrate_mode mode)
>  {
>  	struct address_space *mapping;
> -	int rc;
> +	int rc = -EAGAIN;
> +	bool is_lru = !__PageMovable(page);
>
>  	VM_BUG_ON_PAGE(!PageLocked(page), page);
>  	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
>
>  	mapping = page_mapping(page);
> -	if (!mapping)
> -		rc = migrate_page(mapping, newpage, page, mode);
> -	else if (mapping->a_ops->migratepage)
> -		/*
> -		 * Most pages have a mapping and most filesystems provide a
> -		 * migratepage callback. Anonymous pages are part of swap
> -		 * space which also has its own migratepage callback. This
> -		 * is the most common path for page migration.
> -		 */
> -		rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
> -	else
> -		rc = fallback_migrate_page(mapping, newpage, page, mode);
> +	/*
> +	 * In case of non-lru page, it could be released after
> +	 * isolation step. In that case, we shouldn't try
> +	 * fallback migration which is designed for LRU pages.
> +	 */

Hmm but is_lru was determined from !__PageMovable() above, also well after the 
isolation step. So if the driver already released it, we wouldn't detect it? And 
this function is all under same page lock, so if __PageMovable was true above, 
so will be PageMovable below?

> +	if (unlikely(!is_lru)) {
> +		VM_BUG_ON_PAGE(!PageIsolated(page), page);
> +		if (!PageMovable(page)) {
> +			rc = MIGRATEPAGE_SUCCESS;
> +			__ClearPageIsolated(page);
> +			goto out;
> +		}
> +	}
> +
> +	if (likely(is_lru)) {
> +		if (!mapping)
> +			rc = migrate_page(mapping, newpage, page, mode);
> +		else if (mapping->a_ops->migratepage)
> +			/*
> +			 * Most pages have a mapping and most filesystems
> +			 * provide a migratepage callback. Anonymous pages
> +			 * are part of swap space which also has its own
> +			 * migratepage callback. This is the most common path
> +			 * for page migration.
> +			 */
> +			rc = mapping->a_ops->migratepage(mapping, newpage,
> +							page, mode);
> +		else
> +			rc = fallback_migrate_page(mapping, newpage,
> +							page, mode);
> +	} else {
> +		rc = mapping->a_ops->migratepage(mapping, newpage,
> +						page, mode);
> +		WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
> +			!PageIsolated(page));
> +	}

Why split the !is_lru handling in two places?

>
>  	/*
>  	 * When successful, old pagecache page->mapping must be cleared before
>  	 * page is freed; but stats require that PageAnon be left as PageAnon.
>  	 */
>  	if (rc == MIGRATEPAGE_SUCCESS) {
> -		if (!PageAnon(page))
> +		if (__PageMovable(page)) {
> +			VM_BUG_ON_PAGE(!PageIsolated(page), page);
> +
> +			/*
> +			 * We clear PG_movable under page_lock so any compactor
> +			 * cannot try to migrate this page.
> +			 */
> +			__ClearPageIsolated(page);
> +		}
> +
> +		if (!((unsigned long)page->mapping & PAGE_MAPPING_FLAGS))
>  			page->mapping = NULL;

The two lines above make little sense to me without a comment. Should the 
condition be negated, even?

^ permalink raw reply

* couple virtio header files refer to non-existent "__KERNEL__" test
From: Robert P. J. Day @ 2016-05-29 16:04 UTC (permalink / raw)
  To: virtualization


  just noticed that these header files contain comments referring to
"the #ifdef __KERNEL__ part":

include/uapi/linux/virtio_console.h: * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
include/uapi/linux/virtio_config.h:/* This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
include/linux/virtio_console.h: * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so

but none of those files contain any "#ifdef __KERNEL__" test, i'm
assuming since those were removed when the header files were
refactored to use UAPI(?)

  in any event, someone is welcome to remove those useless references
if they are so inclined.

rday

-- 

========================================================================
Robert P. J. Day                                 Ottawa, Ontario, CANADA
                        http://crashcourse.ca

Twitter:                                       http://twitter.com/rpjday
LinkedIn:                               http://ca.linkedin.com/in/rpjday
========================================================================

^ permalink raw reply

* Re: [PATCH v6 02/12] mm: migrate: support non-lru movable page migration
From: Minchan Kim @ 2016-05-30  1:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Rik van Riel, Sergey Senozhatsky, Rafael Aquini, Jonathan Corbet,
	Hugh Dickins, linux-kernel, dri-devel, virtualization,
	John Einar Reitan, linux-mm, Mel Gorman, Andrew Morton,
	Joonsoo Kim, Gioh Kim
In-Reply-To: <ebe3244c-4821-aad2-ed32-8e730a882438@suse.cz>

On Fri, May 27, 2016 at 04:26:21PM +0200, Vlastimil Babka wrote:
> On 05/20/2016 04:23 PM, Minchan Kim wrote:
> >We have allowed migration for only LRU pages until now and it was
> >enough to make high-order pages. But recently, embedded system(e.g.,
> >webOS, android) uses lots of non-movable pages(e.g., zram, GPU memory)
> >so we have seen several reports about troubles of small high-order
> >allocation. For fixing the problem, there were several efforts
> >(e,g,. enhance compaction algorithm, SLUB fallback to 0-order page,
> >reserved memory, vmalloc and so on) but if there are lots of
> >non-movable pages in system, their solutions are void in the long run.
> >
> >So, this patch is to support facility to change non-movable pages
> >with movable. For the feature, this patch introduces functions related
> >to migration to address_space_operations as well as some page flags.
> >
> >If a driver want to make own pages movable, it should define three functions
> >which are function pointers of struct address_space_operations.
> >
> >1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);
> >
> >What VM expects on isolate_page function of driver is to return *true*
> >if driver isolates page successfully. On returing true, VM marks the page
> >as PG_isolated so concurrent isolation in several CPUs skip the page
> >for isolation. If a driver cannot isolate the page, it should return *false*.
> >
> >Once page is successfully isolated, VM uses page.lru fields so driver
> >shouldn't expect to preserve values in that fields.
> >
> >2. int (*migratepage) (struct address_space *mapping,
> >		struct page *newpage, struct page *oldpage, enum migrate_mode);
> >
> >After isolation, VM calls migratepage of driver with isolated page.
> >The function of migratepage is to move content of the old page to new page
> >and set up fields of struct page newpage. Keep in mind that you should
> >clear PG_movable of oldpage via __ClearPageMovable under page_lock if you
> >migrated the oldpage successfully and returns 0.
> >If driver cannot migrate the page at the moment, driver can return -EAGAIN.
> >On -EAGAIN, VM will retry page migration in a short time because VM interprets
> >-EAGAIN as "temporal migration failure". On returning any error except -EAGAIN,
> >VM will give up the page migration without retrying in this time.
> >
> >Driver shouldn't touch page.lru field VM using in the functions.
> >
> >3. void (*putback_page)(struct page *);
> >
> >If migration fails on isolated page, VM should return the isolated page
> >to the driver so VM calls driver's putback_page with migration failed page.
> >In this function, driver should put the isolated page back to the own data
> >structure.
> >
> >4. non-lru movable page flags
> >
> >There are two page flags for supporting non-lru movable page.
> >
> >* PG_movable
> >
> >Driver should use the below function to make page movable under page_lock.
> >
> >	void __SetPageMovable(struct page *page, struct address_space *mapping)
> >
> >It needs argument of address_space for registering migration family functions
> >which will be called by VM. Exactly speaking, PG_movable is not a real flag of
> >struct page. Rather than, VM reuses page->mapping's lower bits to represent it.
> >
> >	#define PAGE_MAPPING_MOVABLE 0x2
> >	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;
> 
> Interesting, let's see how that works out...
> 
> Overal this looks much better than the last version I checked!

Thanks.

> 
> [...]
> 
> >@@ -357,29 +360,37 @@ PAGEFLAG(Idle, idle, PF_ANY)
> >  * with the PAGE_MAPPING_ANON bit set to distinguish it.  See rmap.h.
> >  *
> >  * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled,
> >- * the PAGE_MAPPING_KSM bit may be set along with the PAGE_MAPPING_ANON bit;
> >- * and then page->mapping points, not to an anon_vma, but to a private
> >+ * the PAGE_MAPPING_MOVABLE bit may be set along with the PAGE_MAPPING_ANON
> >+ * bit; and then page->mapping points, not to an anon_vma, but to a private
> >  * structure which KSM associates with that merged page.  See ksm.h.
> >  *
> >- * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is currently never used.
> >+ * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is used for non-lru movable
> >+ * page and then page->mapping points a struct address_space.
> >  *
> >  * Please note that, confusingly, "page_mapping" refers to the inode
> >  * address_space which maps the page from disk; whereas "page_mapped"
> >  * refers to user virtual address space into which the page is mapped.
> >  */
> >-#define PAGE_MAPPING_ANON	1
> >-#define PAGE_MAPPING_KSM	2
> >-#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
> >+#define PAGE_MAPPING_ANON	0x1
> >+#define PAGE_MAPPING_MOVABLE	0x2
> >+#define PAGE_MAPPING_KSM	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
> >+#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
> >
> >-static __always_inline int PageAnonHead(struct page *page)
> >+static __always_inline int PageMappingFlag(struct page *page)
> 
> PageMappingFlags()?

Yeb.

> 
> [...]
> 
> >diff --git a/mm/compaction.c b/mm/compaction.c
> >index 1427366ad673..2d6862d0df60 100644
> >--- a/mm/compaction.c
> >+++ b/mm/compaction.c
> >@@ -81,6 +81,41 @@ static inline bool migrate_async_suitable(int migratetype)
> >
> > #ifdef CONFIG_COMPACTION
> >
> >+int PageMovable(struct page *page)
> >+{
> >+	struct address_space *mapping;
> >+
> >+	WARN_ON(!PageLocked(page));
> 
> Why not VM_BUG_ON_PAGE as elsewhere?

I have backported this patchset to huge old kernel which doesn't have
VM_BUG_ON_PAGE and forgot to correct it beforre sending mainline.

Thanks.

> 
> >+	if (!__PageMovable(page))
> >+		goto out;
> 
> Just return 0.

Maybe I love goto. You realized me. I will split up will her.

> 
> >+
> >+	mapping = page_mapping(page);
> >+	if (mapping && mapping->a_ops && mapping->a_ops->isolate_page)
> >+		return 1;
> >+out:
> >+	return 0;
> >+}
> >+EXPORT_SYMBOL(PageMovable);
> >+
> >+void __SetPageMovable(struct page *page, struct address_space *mapping)
> >+{
> >+	VM_BUG_ON_PAGE(!PageLocked(page), page);
> >+	VM_BUG_ON_PAGE((unsigned long)mapping & PAGE_MAPPING_MOVABLE, page);
> >+	page->mapping = (void *)((unsigned long)mapping | PAGE_MAPPING_MOVABLE);
> >+}
> >+EXPORT_SYMBOL(__SetPageMovable);
> >+
> >+void __ClearPageMovable(struct page *page)
> >+{
> >+	VM_BUG_ON_PAGE(!PageLocked(page), page);
> >+	VM_BUG_ON_PAGE(!PageMovable(page), page);
> >+	VM_BUG_ON_PAGE(!((unsigned long)page->mapping & PAGE_MAPPING_MOVABLE),
> >+				page);
> 
> The last line sounds redundant, PageMovable() already checked this
> via __PageMovable()

Yeb.

> 
> 
> >+	page->mapping = (void *)((unsigned long)page->mapping &
> >+				PAGE_MAPPING_MOVABLE);
> 
> This should be negated to clear... use ~PAGE_MAPPING_MOVABLE ?

No.

The intention is to clear only mapping value but PAGE_MAPPING_MOVABLE
flag. So, any new migration trial will be failed because PageMovable
checks page's mapping value but ongoing migraion handling can catch
whether it's movable page or not with the type bit.

For example, we need to keep the type bit to handle putback the page.
With parallel freeing(e.g., __ClearPageMovable) from the owner after
isolating the page, migration will be failed and put it back. Then,
we need to identify movable to clear PG_isolate and shouldn't call
mapping->a_ops->putback_page. For that, we need to keep the type bit
until the page is return to buddy.

> 
> >+}
> >+EXPORT_SYMBOL(__ClearPageMovable);
> >+
> > /* Do not skip compaction more than 64 times */
> > #define COMPACT_MAX_DEFER_SHIFT 6
> >
> >@@ -735,21 +770,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> > 		}
> >
> > 		/*
> >-		 * Check may be lockless but that's ok as we recheck later.
> >-		 * It's possible to migrate LRU pages and balloon pages
> >-		 * Skip any other type of page
> >-		 */
> >-		is_lru = PageLRU(page);
> >-		if (!is_lru) {
> >-			if (unlikely(balloon_page_movable(page))) {
> >-				if (balloon_page_isolate(page)) {
> >-					/* Successfully isolated */
> >-					goto isolate_success;
> >-				}
> >-			}
> >-		}
> 
> So this effectively prevents movable compound pages from being
> migrated. Are you sure no users of this functionality are going to
> have compound pages? I assumed that they could, and so made the code
> like this, with the is_lru variable (which is redundant after your
> change).

This implementation at the moment disables effectively non-lru compound
page migration but I'm not a god so I can't make sure no one doesn't want
it in future. If someone want it, we can support it then because this work
doesn't prevent it by design.

> 
> >-		/*
> > 		 * Regardless of being on LRU, compound pages such as THP and
> > 		 * hugetlbfs are not to be compacted. We can potentially save
> > 		 * a lot of iterations if we skip them at once. The check is
> >@@ -765,8 +785,38 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> > 			goto isolate_fail;
> > 		}
> >
> >-		if (!is_lru)
> >+		/*
> >+		 * Check may be lockless but that's ok as we recheck later.
> >+		 * It's possible to migrate LRU and non-lru movable pages.
> >+		 * Skip any other type of page
> >+		 */
> >+		is_lru = PageLRU(page);
> >+		if (!is_lru) {
> >+			if (unlikely(balloon_page_movable(page))) {
> >+				if (balloon_page_isolate(page)) {
> >+					/* Successfully isolated */
> >+					goto isolate_success;
> >+				}
> >+			}
> 
> [...]
> 
> >+bool isolate_movable_page(struct page *page, isolate_mode_t mode)
> >+{
> >+	struct address_space *mapping;
> >+
> >+	/*
> >+	 * Avoid burning cycles with pages that are yet under __free_pages(),
> >+	 * or just got freed under us.
> >+	 *
> >+	 * In case we 'win' a race for a movable page being freed under us and
> >+	 * raise its refcount preventing __free_pages() from doing its job
> >+	 * the put_page() at the end of this block will take care of
> >+	 * release this page, thus avoiding a nasty leakage.
> >+	 */
> >+	if (unlikely(!get_page_unless_zero(page)))
> >+		goto out;
> >+
> >+	/*
> >+	 * Check PageMovable before holding a PG_lock because page's owner
> >+	 * assumes anybody doesn't touch PG_lock of newly allocated page
> >+	 * so unconditionally grapping the lock ruins page's owner side.
> >+	 */
> >+	if (unlikely(!__PageMovable(page)))
> >+		goto out_putpage;
> >+	/*
> >+	 * As movable pages are not isolated from LRU lists, concurrent
> >+	 * compaction threads can race against page migration functions
> >+	 * as well as race against the releasing a page.
> >+	 *
> >+	 * In order to avoid having an already isolated movable page
> >+	 * being (wrongly) re-isolated while it is under migration,
> >+	 * or to avoid attempting to isolate pages being released,
> >+	 * lets be sure we have the page lock
> >+	 * before proceeding with the movable page isolation steps.
> >+	 */
> >+	if (unlikely(!trylock_page(page)))
> >+		goto out_putpage;
> >+
> >+	if (!PageMovable(page) || PageIsolated(page))
> >+		goto out_no_isolated;
> >+
> >+	mapping = page_mapping(page);
> 
> Hmm so on first tail page of a THP compound page, page->mapping will
> alias with compound_mapcount. That can easily have a value matching
> PageMovable flags and we'll proceed and start inspecting the
> compound head in page_mapping()... maybe it's not a big deal, or we
> better check and skip PageTail first, must think about it more...

I thouht PageCompound check right before isolate_movable_page in
isolate_migratepages_block will filter it out mostly but yeah
it is racy without zone->lru_lock so it could reach to isolate_movable_page.
However, PageMovable check in there investigates mapping, mapping->a_ops,
and a_ops->isolate_page to verify whether it's movable page or not.

I thought it's sufficient to filter THP page.

> 
> [...]
> 
> >@@ -755,33 +844,69 @@ static int move_to_new_page(struct page *newpage, struct page *page,
> > 				enum migrate_mode mode)
> > {
> > 	struct address_space *mapping;
> >-	int rc;
> >+	int rc = -EAGAIN;
> >+	bool is_lru = !__PageMovable(page);
> >
> > 	VM_BUG_ON_PAGE(!PageLocked(page), page);
> > 	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
> >
> > 	mapping = page_mapping(page);
> >-	if (!mapping)
> >-		rc = migrate_page(mapping, newpage, page, mode);
> >-	else if (mapping->a_ops->migratepage)
> >-		/*
> >-		 * Most pages have a mapping and most filesystems provide a
> >-		 * migratepage callback. Anonymous pages are part of swap
> >-		 * space which also has its own migratepage callback. This
> >-		 * is the most common path for page migration.
> >-		 */
> >-		rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
> >-	else
> >-		rc = fallback_migrate_page(mapping, newpage, page, mode);
> >+	/*
> >+	 * In case of non-lru page, it could be released after
> >+	 * isolation step. In that case, we shouldn't try
> >+	 * fallback migration which is designed for LRU pages.
> >+	 */
> 
> Hmm but is_lru was determined from !__PageMovable() above, also well
> after the isolation step. So if the driver already released it, we
> wouldn't detect it? And this function is all under same page lock,
> so if __PageMovable was true above, so will be PageMovable below?

You are missing what I mentioned above.
We should keep the type bit to catch what you are saying(i.e., driver
already released).

__PageMovable just checks PAGE_MAPPING_MOVABLE flag and PageMovable
checks page->mapping valid while __ClearPageMovable reset only
valid vaule of mapping, not PAGE_MAPPING_MOVABLE flag.

I wrote it down in Documentation/vm/page_migration.

"For testing of non-lru movable page, VM supports __PageMovable function.
However, it doesn't guarantee to identify non-lru movable page because
page->mapping field is unified with other variables in struct page.
As well, if driver releases the page after isolation by VM, page->mapping
doesn't have stable value although it has PAGE_MAPPING_MOVABLE
(Look at __ClearPageMovable). But __PageMovable is cheap to catch whether
page is LRU or non-lru movable once the page has been isolated. Because
LRU pages never can have PAGE_MAPPING_MOVABLE in page->mapping. It is also
good for just peeking to test non-lru movable pages before more expensive
checking with lock_page in pfn scanning to select victim.

For guaranteeing non-lru movable page, VM provides PageMovable function.
Unlike __PageMovable, PageMovable functions validates page->mapping and 
mapping->a_ops->isolate_page under lock_page. The lock_page prevents sudden
destroying of page->mapping.

Driver using __SetPageMovable should clear the flag via __ClearMovablePage
under page_lock before the releasing the page."

> 
> >+	if (unlikely(!is_lru)) {
> >+		VM_BUG_ON_PAGE(!PageIsolated(page), page);
> >+		if (!PageMovable(page)) {
> >+			rc = MIGRATEPAGE_SUCCESS;
> >+			__ClearPageIsolated(page);
> >+			goto out;
> >+		}
> >+	}
> >+
> >+	if (likely(is_lru)) {
> >+		if (!mapping)
> >+			rc = migrate_page(mapping, newpage, page, mode);
> >+		else if (mapping->a_ops->migratepage)
> >+			/*
> >+			 * Most pages have a mapping and most filesystems
> >+			 * provide a migratepage callback. Anonymous pages
> >+			 * are part of swap space which also has its own
> >+			 * migratepage callback. This is the most common path
> >+			 * for page migration.
> >+			 */
> >+			rc = mapping->a_ops->migratepage(mapping, newpage,
> >+							page, mode);
> >+		else
> >+			rc = fallback_migrate_page(mapping, newpage,
> >+							page, mode);
> >+	} else {
> >+		rc = mapping->a_ops->migratepage(mapping, newpage,
> >+						page, mode);
> >+		WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
> >+			!PageIsolated(page));
> >+	}
> 
> Why split the !is_lru handling in two places?

Fixed.

> 
> >
> > 	/*
> > 	 * When successful, old pagecache page->mapping must be cleared before
> > 	 * page is freed; but stats require that PageAnon be left as PageAnon.
> > 	 */
> > 	if (rc == MIGRATEPAGE_SUCCESS) {
> >-		if (!PageAnon(page))
> >+		if (__PageMovable(page)) {
> >+			VM_BUG_ON_PAGE(!PageIsolated(page), page);
> >+
> >+			/*
> >+			 * We clear PG_movable under page_lock so any compactor
> >+			 * cannot try to migrate this page.
> >+			 */
> >+			__ClearPageIsolated(page);
> >+		}
> >+
> >+		if (!((unsigned long)page->mapping & PAGE_MAPPING_FLAGS))
> > 			page->mapping = NULL;
> 
> The two lines above make little sense to me without a comment.

I folded this.

@@ -901,7 +901,12 @@ static int move_to_new_page(struct page *newpage, struct page *page,
                        __ClearPageIsolated(page);
                }
 
-               if (!((unsigned long)page->mapping & PAGE_MAPPING_FLAGS))
+               /*
+                * Anonymous and movable page->mapping will be cleard by
+                * free_pages_prepare so don't reset it here for keeping
+                * the type to work PageAnon, for example.
+                */
+               if (!PageMappingFlags(page))
                        page->mapping = NULL;
        }
 out:


> Should the condition be negated, even?

No.

> 
> 

^ permalink raw reply

* PATCH v6v2 02/12] mm: migrate: support non-lru movable page migration
From: Minchan Kim @ 2016-05-30  1:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Sergey Senozhatsky, Rafael Aquini, Jonathan Corbet,
	Hugh Dickins, linux-kernel, dri-devel, virtualization,
	John Einar Reitan, linux-mm, Gioh Kim, Mel Gorman, Joonsoo Kim,
	Vlastimil Babka
In-Reply-To: <1463754225-31311-3-git-send-email-minchan@kernel.org>

Per Vlastimil's review comment,

Vlastimil, I updated based on your comment. Please review this.
If everything is done, I will send v7 rebased on recent mmotm.

Thanks for the review!

From ad4157e98651a2d18fd0a4ae90d1d9f609aab314 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Fri, 8 Apr 2016 10:34:49 +0900
Subject: [PATCH v6r2] mm: migrate: support non-lru movable page migration

We have allowed migration for only LRU pages until now and it was
enough to make high-order pages. But recently, embedded system(e.g.,
webOS, android) uses lots of non-movable pages(e.g., zram, GPU memory)
so we have seen several reports about troubles of small high-order
allocation. For fixing the problem, there were several efforts
(e,g,. enhance compaction algorithm, SLUB fallback to 0-order page,
reserved memory, vmalloc and so on) but if there are lots of
non-movable pages in system, their solutions are void in the long run.

So, this patch is to support facility to change non-movable pages
with movable. For the feature, this patch introduces functions related
to migration to address_space_operations as well as some page flags.

If a driver want to make own pages movable, it should define three functions
which are function pointers of struct address_space_operations.

1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);

What VM expects on isolate_page function of driver is to return *true*
if driver isolates page successfully. On returing true, VM marks the page
as PG_isolated so concurrent isolation in several CPUs skip the page
for isolation. If a driver cannot isolate the page, it should return *false*.

Once page is successfully isolated, VM uses page.lru fields so driver
shouldn't expect to preserve values in that fields.

2. int (*migratepage) (struct address_space *mapping,
		struct page *newpage, struct page *oldpage, enum migrate_mode);

After isolation, VM calls migratepage of driver with isolated page.
The function of migratepage is to move content of the old page to new page
and set up fields of struct page newpage. Keep in mind that you should
clear PG_movable of oldpage via __ClearPageMovable under page_lock if you
migrated the oldpage successfully and returns 0.
If driver cannot migrate the page at the moment, driver can return -EAGAIN.
On -EAGAIN, VM will retry page migration in a short time because VM interprets
-EAGAIN as "temporal migration failure". On returning any error except -EAGAIN,
VM will give up the page migration without retrying in this time.

Driver shouldn't touch page.lru field VM using in the functions.

3. void (*putback_page)(struct page *);

If migration fails on isolated page, VM should return the isolated page
to the driver so VM calls driver's putback_page with migration failed page.
In this function, driver should put the isolated page back to the own data
structure.

4. non-lru movable page flags

There are two page flags for supporting non-lru movable page.

* PG_movable

Driver should use the below function to make page movable under page_lock.

	void __SetPageMovable(struct page *page, struct address_space *mapping)

It needs argument of address_space for registering migration family functions
which will be called by VM. Exactly speaking, PG_movable is not a real flag of
struct page. Rather than, VM reuses page->mapping's lower bits to represent it.

	#define PAGE_MAPPING_MOVABLE 0x2
	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;

so driver shouldn't access page->mapping directly. Instead, driver should
use page_mapping which mask off the low two bits of page->mapping so it can get
right struct address_space.

For testing of non-lru movable page, VM supports __PageMovable function.
However, it doesn't guarantee to identify non-lru movable page because
page->mapping field is unified with other variables in struct page.
As well, if driver releases the page after isolation by VM, page->mapping
doesn't have stable value although it has PAGE_MAPPING_MOVABLE
(Look at __ClearPageMovable). But __PageMovable is cheap to catch whether
page is LRU or non-lru movable once the page has been isolated. Because
LRU pages never can have PAGE_MAPPING_MOVABLE in page->mapping. It is also
good for just peeking to test non-lru movable pages before more expensive
checking with lock_page in pfn scanning to select victim.

For guaranteeing non-lru movable page, VM provides PageMovable function.
Unlike __PageMovable, PageMovable functions validates page->mapping and
mapping->a_ops->isolate_page under lock_page. The lock_page prevents sudden
destroying of page->mapping.

Driver using __SetPageMovable should clear the flag via __ClearMovablePage
under page_lock before the releasing the page.

* PG_isolated

To prevent concurrent isolation among several CPUs, VM marks isolated page
as PG_isolated under lock_page. So if a CPU encounters PG_isolated non-lru
movable page, it can skip it. Driver doesn't need to manipulate the flag
because VM will set/clear it automatically. Keep in mind that if driver
sees PG_isolated page, it means the page have been isolated by VM so it
shouldn't touch page.lru field.
PG_isolated is alias with PG_reclaim flag so driver shouldn't use the flag
for own purpose.

Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: John Einar Reitan <john.reitan@foss.arm.com>
Cc: dri-devel@lists.freedesktop.org
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/filesystems/Locking |   4 +
 Documentation/filesystems/vfs.txt |  11 +++
 Documentation/vm/page_migration   | 107 ++++++++++++++++++++-
 include/linux/compaction.h        |  17 ++++
 include/linux/fs.h                |   2 +
 include/linux/ksm.h               |   3 +-
 include/linux/migrate.h           |   2 +
 include/linux/mm.h                |   1 +
 include/linux/page-flags.h        |  33 +++++--
 mm/compaction.c                   |  80 ++++++++++++----
 mm/ksm.c                          |   4 +-
 mm/migrate.c                      | 192 ++++++++++++++++++++++++++++++++++----
 mm/page_alloc.c                   |   2 +-
 mm/util.c                         |   6 +-
 14 files changed, 412 insertions(+), 52 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index af7c030a0368..3991a976cf43 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -195,7 +195,9 @@ unlocks and drops the reference.
 	int (*releasepage) (struct page *, int);
 	void (*freepage)(struct page *);
 	int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
+	bool (*isolate_page) (struct page *, isolate_mode_t);
 	int (*migratepage)(struct address_space *, struct page *, struct page *);
+	void (*putback_page) (struct page *);
 	int (*launder_page)(struct page *);
 	int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
 	int (*error_remove_page)(struct address_space *, struct page *);
@@ -219,7 +221,9 @@ invalidatepage:		yes
 releasepage:		yes
 freepage:		yes
 direct_IO:
+isolate_page:		yes
 migratepage:		yes (both)
+putback_page:		yes
 launder_page:		yes
 is_partially_uptodate:	yes
 error_remove_page:	yes
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 19366fef2652..9d4ae317fdcb 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -591,9 +591,14 @@ struct address_space_operations {
 	int (*releasepage) (struct page *, int);
 	void (*freepage)(struct page *);
 	ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
+	/* isolate a page for migration */
+	bool (*isolate_page) (struct page *, isolate_mode_t);
 	/* migrate the contents of a page to the specified target */
 	int (*migratepage) (struct page *, struct page *);
+	/* put migration-failed page back to right list */
+	void (*putback_page) (struct page *);
 	int (*launder_page) (struct page *);
+
 	int (*is_partially_uptodate) (struct page *, unsigned long,
 					unsigned long);
 	void (*is_dirty_writeback) (struct page *, bool *, bool *);
@@ -739,6 +744,10 @@ struct address_space_operations {
         and transfer data directly between the storage and the
         application's address space.
 
+  isolate_page: Called by the VM when isolating a movable non-lru page.
+	If page is successfully isolated, VM marks the page as PG_isolated
+	via __SetPageIsolated.
+
   migrate_page:  This is used to compact the physical memory usage.
         If the VM wants to relocate a page (maybe off a memory card
         that is signalling imminent failure) it will pass a new page
@@ -746,6 +755,8 @@ struct address_space_operations {
 	transfer any private data across and update any references
         that it has to the page.
 
+  putback_page: Called by the VM when isolated page's migration fails.
+
   launder_page: Called before freeing a page - it writes back the dirty page. To
   	prevent redirtying the page, it is kept locked during the whole
 	operation.
diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration
index fea5c0864170..80e98af46e95 100644
--- a/Documentation/vm/page_migration
+++ b/Documentation/vm/page_migration
@@ -142,5 +142,110 @@ is increased so that the page cannot be freed while page migration occurs.
 20. The new page is moved to the LRU and can be scanned by the swapper
     etc again.
 
-Christoph Lameter, May 8, 2006.
+C. Non-LRU page migration
+-------------------------
+
+Although original migration aimed for reducing the latency of memory access
+for NUMA, compaction who want to create high-order page is also main customer.
+
+Current problem of the implementation is that it is designed to migrate only
+*LRU* pages. However, there are potential non-lru pages which can be migrated
+in drivers, for example, zsmalloc, virtio-balloon pages.
+
+For virtio-balloon pages, some parts of migration code path have been hooked
+up and added virtio-balloon specific functions to intercept migration logics.
+It's too specific to a driver so other drivers who want to make their pages
+movable would have to add own specific hooks in migration path.
+
+To overclome the problem, VM supports non-LRU page migration which provides
+generic functions for non-LRU movable pages without driver specific hooks
+migration path.
+
+If a driver want to make own pages movable, it should define three functions
+which are function pointers of struct address_space_operations.
+
+1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);
+
+What VM expects on isolate_page function of driver is to return *true*
+if driver isolates page successfully. On returing true, VM marks the page
+as PG_isolated so concurrent isolation in several CPUs skip the page
+for isolation. If a driver cannot isolate the page, it should return *false*.
+
+Once page is successfully isolated, VM uses page.lru fields so driver
+shouldn't expect to preserve values in that fields.
+
+2. int (*migratepage) (struct address_space *mapping,
+		struct page *newpage, struct page *oldpage, enum migrate_mode);
+
+After isolation, VM calls migratepage of driver with isolated page.
+The function of migratepage is to move content of the old page to new page
+and set up fields of struct page newpage. Keep in mind that you should
+clear PG_movable of oldpage via __ClearPageMovable under page_lock if you
+migrated the oldpage successfully and returns 0.
+If driver cannot migrate the page at the moment, driver can return -EAGAIN.
+On -EAGAIN, VM will retry page migration in a short time because VM interprets
+-EAGAIN as "temporal migration failure". On returning any error except -EAGAIN,
+VM will give up the page migration without retrying in this time.
+
+Driver shouldn't touch page.lru field VM using in the functions.
+
+3. void (*putback_page)(struct page *);
+
+If migration fails on isolated page, VM should return the isolated page
+to the driver so VM calls driver's putback_page with migration failed page.
+In this function, driver should put the isolated page back to the own data
+structure.
 
+4. non-lru movable page flags
+
+There are two page flags for supporting non-lru movable page.
+
+* PG_movable
+
+Driver should use the below function to make page movable under page_lock.
+
+	void __SetPageMovable(struct page *page, struct address_space *mapping)
+
+It needs argument of address_space for registering migration family functions
+which will be called by VM. Exactly speaking, PG_movable is not a real flag of
+struct page. Rather than, VM reuses page->mapping's lower bits to represent it.
+
+	#define PAGE_MAPPING_MOVABLE 0x2
+	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;
+
+so driver shouldn't access page->mapping directly. Instead, driver should
+use page_mapping which mask off the low two bits of page->mapping so it can get
+right struct address_space.
+
+For testing of non-lru movable page, VM supports __PageMovable function.
+However, it doesn't guarantee to identify non-lru movable page because
+page->mapping field is unified with other variables in struct page.
+As well, if driver releases the page after isolation by VM, page->mapping
+doesn't have stable value although it has PAGE_MAPPING_MOVABLE
+(Look at __ClearPageMovable). But __PageMovable is cheap to catch whether
+page is LRU or non-lru movable once the page has been isolated. Because
+LRU pages never can have PAGE_MAPPING_MOVABLE in page->mapping. It is also
+good for just peeking to test non-lru movable pages before more expensive
+checking with lock_page in pfn scanning to select victim.
+
+For guaranteeing non-lru movable page, VM provides PageMovable function.
+Unlike __PageMovable, PageMovable functions validates page->mapping and
+mapping->a_ops->isolate_page under lock_page. The lock_page prevents sudden
+destroying of page->mapping.
+
+Driver using __SetPageMovable should clear the flag via __ClearMovablePage
+under page_lock before the releasing the page.
+
+* PG_isolated
+
+To prevent concurrent isolation among several CPUs, VM marks isolated page
+as PG_isolated under lock_page. So if a CPU encounters PG_isolated non-lru
+movable page, it can skip it. Driver doesn't need to manipulate the flag
+because VM will set/clear it automatically. Keep in mind that if driver
+sees PG_isolated page, it means the page have been isolated by VM so it
+shouldn't touch page.lru field.
+PG_isolated is alias with PG_reclaim flag so driver shouldn't use the flag
+for own purpose.
+
+Christoph Lameter, May 8, 2006.
+Minchan Kim, Mar 28, 2016.
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index a58c852a268f..c6b47c861cea 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -54,6 +54,9 @@ enum compact_result {
 struct alloc_context; /* in mm/internal.h */
 
 #ifdef CONFIG_COMPACTION
+extern int PageMovable(struct page *page);
+extern void __SetPageMovable(struct page *page, struct address_space *mapping);
+extern void __ClearPageMovable(struct page *page);
 extern int sysctl_compact_memory;
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
 			void __user *buffer, size_t *length, loff_t *ppos);
@@ -151,6 +154,19 @@ extern void kcompactd_stop(int nid);
 extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx);
 
 #else
+static inline int PageMovable(struct page *page)
+{
+	return 0;
+}
+static inline void __SetPageMovable(struct page *page,
+			struct address_space *mapping)
+{
+}
+
+static inline void __ClearPageMovable(struct page *page)
+{
+}
+
 static inline enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 			unsigned int order, int alloc_flags,
 			const struct alloc_context *ac,
@@ -212,6 +228,7 @@ static inline void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_i
 #endif /* CONFIG_COMPACTION */
 
 #if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
+struct node;
 extern int compaction_register_node(struct node *node);
 extern void compaction_unregister_node(struct node *node);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0cfdf2aec8f7..39ef97414033 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -402,6 +402,8 @@ struct address_space_operations {
 	 */
 	int (*migratepage) (struct address_space *,
 			struct page *, struct page *, enum migrate_mode);
+	bool (*isolate_page)(struct page *, isolate_mode_t);
+	void (*putback_page)(struct page *);
 	int (*launder_page) (struct page *);
 	int (*is_partially_uptodate) (struct page *, unsigned long,
 					unsigned long);
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 7ae216a39c9e..481c8c4627ca 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -43,8 +43,7 @@ static inline struct stable_node *page_stable_node(struct page *page)
 static inline void set_page_stable_node(struct page *page,
 					struct stable_node *stable_node)
 {
-	page->mapping = (void *)stable_node +
-				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+	page->mapping = (void *)((unsigned long)stable_node | PAGE_MAPPING_KSM);
 }
 
 /*
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 9b50325e4ddf..404fbfefeb33 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -37,6 +37,8 @@ extern int migrate_page(struct address_space *,
 			struct page *, struct page *, enum migrate_mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
 		unsigned long private, enum migrate_mode mode, int reason);
+extern bool isolate_movable_page(struct page *page, isolate_mode_t mode);
+extern void putback_movable_page(struct page *page);
 
 extern int migrate_prep(void);
 extern int migrate_prep_local(void);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a00ec816233a..33eaec57e997 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1035,6 +1035,7 @@ static inline pgoff_t page_file_index(struct page *page)
 }
 
 bool page_mapped(struct page *page);
+struct address_space *page_mapping(struct page *page);
 
 /*
  * Return true only if the page has been allocated with
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e5a32445f930..f36dbb3a3060 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -129,6 +129,9 @@ enum pageflags {
 
 	/* Compound pages. Stored in first tail page's flags */
 	PG_double_map = PG_private_2,
+
+	/* non-lru isolated movable page */
+	PG_isolated = PG_reclaim,
 };
 
 #ifndef __GENERATING_BOUNDS_H
@@ -357,29 +360,37 @@ PAGEFLAG(Idle, idle, PF_ANY)
  * with the PAGE_MAPPING_ANON bit set to distinguish it.  See rmap.h.
  *
  * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled,
- * the PAGE_MAPPING_KSM bit may be set along with the PAGE_MAPPING_ANON bit;
- * and then page->mapping points, not to an anon_vma, but to a private
+ * the PAGE_MAPPING_MOVABLE bit may be set along with the PAGE_MAPPING_ANON
+ * bit; and then page->mapping points, not to an anon_vma, but to a private
  * structure which KSM associates with that merged page.  See ksm.h.
  *
- * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is currently never used.
+ * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is used for non-lru movable
+ * page and then page->mapping points a struct address_space.
  *
  * Please note that, confusingly, "page_mapping" refers to the inode
  * address_space which maps the page from disk; whereas "page_mapped"
  * refers to user virtual address space into which the page is mapped.
  */
-#define PAGE_MAPPING_ANON	1
-#define PAGE_MAPPING_KSM	2
-#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
+#define PAGE_MAPPING_ANON	0x1
+#define PAGE_MAPPING_MOVABLE	0x2
+#define PAGE_MAPPING_KSM	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
+#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
 
-static __always_inline int PageAnonHead(struct page *page)
+static __always_inline int PageMappingFlags(struct page *page)
 {
-	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) != 0;
 }
 
 static __always_inline int PageAnon(struct page *page)
 {
 	page = compound_head(page);
-	return PageAnonHead(page);
+	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+}
+
+static __always_inline int __PageMovable(struct page *page)
+{
+	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
+				PAGE_MAPPING_MOVABLE;
 }
 
 #ifdef CONFIG_KSM
@@ -393,7 +404,7 @@ static __always_inline int PageKsm(struct page *page)
 {
 	page = compound_head(page);
 	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
-				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+				PAGE_MAPPING_KSM;
 }
 #else
 TESTPAGEFLAG_FALSE(Ksm)
@@ -641,6 +652,8 @@ static inline void __ClearPageBalloon(struct page *page)
 	atomic_set(&page->_mapcount, -1);
 }
 
+__PAGEFLAG(Isolated, isolated, PF_ANY);
+
 /*
  * If network-based swap is enabled, sl*b must keep track of whether pages
  * were allocated from pfmemalloc reserves.
diff --git a/mm/compaction.c b/mm/compaction.c
index 1427366ad673..c7e0cd4dda9d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -81,6 +81,39 @@ static inline bool migrate_async_suitable(int migratetype)
 
 #ifdef CONFIG_COMPACTION
 
+int PageMovable(struct page *page)
+{
+	struct address_space *mapping;
+
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	if (!__PageMovable(page))
+		return 0;
+
+	mapping = page_mapping(page);
+	if (mapping && mapping->a_ops && mapping->a_ops->isolate_page)
+		return 1;
+
+	return 0;
+}
+EXPORT_SYMBOL(PageMovable);
+
+void __SetPageMovable(struct page *page, struct address_space *mapping)
+{
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE((unsigned long)mapping & PAGE_MAPPING_MOVABLE, page);
+	page->mapping = (void *)((unsigned long)mapping | PAGE_MAPPING_MOVABLE);
+}
+EXPORT_SYMBOL(__SetPageMovable);
+
+void __ClearPageMovable(struct page *page)
+{
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE(!PageMovable(page), page);
+	page->mapping = (void *)((unsigned long)page->mapping &
+				PAGE_MAPPING_MOVABLE);
+}
+EXPORT_SYMBOL(__ClearPageMovable);
+
 /* Do not skip compaction more than 64 times */
 #define COMPACT_MAX_DEFER_SHIFT 6
 
@@ -735,21 +768,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		}
 
 		/*
-		 * Check may be lockless but that's ok as we recheck later.
-		 * It's possible to migrate LRU pages and balloon pages
-		 * Skip any other type of page
-		 */
-		is_lru = PageLRU(page);
-		if (!is_lru) {
-			if (unlikely(balloon_page_movable(page))) {
-				if (balloon_page_isolate(page)) {
-					/* Successfully isolated */
-					goto isolate_success;
-				}
-			}
-		}
-
-		/*
 		 * Regardless of being on LRU, compound pages such as THP and
 		 * hugetlbfs are not to be compacted. We can potentially save
 		 * a lot of iterations if we skip them at once. The check is
@@ -765,8 +783,38 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			goto isolate_fail;
 		}
 
-		if (!is_lru)
+		/*
+		 * Check may be lockless but that's ok as we recheck later.
+		 * It's possible to migrate LRU and non-lru movable pages.
+		 * Skip any other type of page
+		 */
+		is_lru = PageLRU(page);
+		if (!is_lru) {
+			if (unlikely(balloon_page_movable(page))) {
+				if (balloon_page_isolate(page)) {
+					/* Successfully isolated */
+					goto isolate_success;
+				}
+			}
+
+			/*
+			 * __PageMovable can return false positive so we need
+			 * to verify it under page_lock.
+			 */
+			if (unlikely(__PageMovable(page)) &&
+					!PageIsolated(page)) {
+				if (locked) {
+					spin_unlock_irqrestore(&zone->lru_lock,
+									flags);
+					locked = false;
+				}
+
+				if (isolate_movable_page(page, isolate_mode))
+					goto isolate_success;
+			}
+
 			goto isolate_fail;
+		}
 
 		/*
 		 * Migration will fail if an anonymous page is pinned in memory,
diff --git a/mm/ksm.c b/mm/ksm.c
index 4786b4150f62..35b8aef867a9 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -532,8 +532,8 @@ static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
 	void *expected_mapping;
 	unsigned long kpfn;
 
-	expected_mapping = (void *)stable_node +
-				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
+	expected_mapping = (void *)((unsigned long)stable_node |
+					PAGE_MAPPING_KSM);
 again:
 	kpfn = READ_ONCE(stable_node->kpfn);
 	page = pfn_to_page(kpfn);
diff --git a/mm/migrate.c b/mm/migrate.c
index 2666f28b5236..60abcf379b51 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -31,6 +31,7 @@
 #include <linux/vmalloc.h>
 #include <linux/security.h>
 #include <linux/backing-dev.h>
+#include <linux/compaction.h>
 #include <linux/syscalls.h>
 #include <linux/hugetlb.h>
 #include <linux/hugetlb_cgroup.h>
@@ -73,6 +74,81 @@ int migrate_prep_local(void)
 	return 0;
 }
 
+bool isolate_movable_page(struct page *page, isolate_mode_t mode)
+{
+	struct address_space *mapping;
+
+	/*
+	 * Avoid burning cycles with pages that are yet under __free_pages(),
+	 * or just got freed under us.
+	 *
+	 * In case we 'win' a race for a movable page being freed under us and
+	 * raise its refcount preventing __free_pages() from doing its job
+	 * the put_page() at the end of this block will take care of
+	 * release this page, thus avoiding a nasty leakage.
+	 */
+	if (unlikely(!get_page_unless_zero(page)))
+		goto out;
+
+	/*
+	 * Check PageMovable before holding a PG_lock because page's owner
+	 * assumes anybody doesn't touch PG_lock of newly allocated page
+	 * so unconditionally grapping the lock ruins page's owner side.
+	 */
+	if (unlikely(!__PageMovable(page)))
+		goto out_putpage;
+	/*
+	 * As movable pages are not isolated from LRU lists, concurrent
+	 * compaction threads can race against page migration functions
+	 * as well as race against the releasing a page.
+	 *
+	 * In order to avoid having an already isolated movable page
+	 * being (wrongly) re-isolated while it is under migration,
+	 * or to avoid attempting to isolate pages being released,
+	 * lets be sure we have the page lock
+	 * before proceeding with the movable page isolation steps.
+	 */
+	if (unlikely(!trylock_page(page)))
+		goto out_putpage;
+
+	if (!PageMovable(page) || PageIsolated(page))
+		goto out_no_isolated;
+
+	mapping = page_mapping(page);
+	VM_BUG_ON_PAGE(!mapping, page);
+
+	if (!mapping->a_ops->isolate_page(page, mode))
+		goto out_no_isolated;
+
+	/* Driver shouldn't use PG_isolated bit of page->flags */
+	WARN_ON_ONCE(PageIsolated(page));
+	__SetPageIsolated(page);
+	unlock_page(page);
+
+	return true;
+
+out_no_isolated:
+	unlock_page(page);
+out_putpage:
+	put_page(page);
+out:
+	return false;
+}
+
+/* It should be called on page which is PG_movable */
+void putback_movable_page(struct page *page)
+{
+	struct address_space *mapping;
+
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE(!PageMovable(page), page);
+	VM_BUG_ON_PAGE(!PageIsolated(page), page);
+
+	mapping = page_mapping(page);
+	mapping->a_ops->putback_page(page);
+	__ClearPageIsolated(page);
+}
+
 /*
  * Put previously isolated pages back onto the appropriate lists
  * from where they were once taken off for compaction/migration.
@@ -94,10 +170,25 @@ void putback_movable_pages(struct list_head *l)
 		list_del(&page->lru);
 		dec_zone_page_state(page, NR_ISOLATED_ANON +
 				page_is_file_cache(page));
-		if (unlikely(isolated_balloon_page(page)))
+		if (unlikely(isolated_balloon_page(page))) {
 			balloon_page_putback(page);
-		else
+		/*
+		 * We isolated non-lru movable page so here we can use
+		 * __PageMovable because LRU page's mapping cannot have
+		 * PAGE_MAPPING_MOVABLE.
+		 */
+		} else if (unlikely(__PageMovable(page))) {
+			VM_BUG_ON_PAGE(!PageIsolated(page), page);
+			lock_page(page);
+			if (PageMovable(page))
+				putback_movable_page(page);
+			else
+				__ClearPageIsolated(page);
+			unlock_page(page);
+			put_page(page);
+		} else {
 			putback_lru_page(page);
+		}
 	}
 }
 
@@ -592,7 +683,7 @@ void migrate_page_copy(struct page *newpage, struct page *page)
  ***********************************************************/
 
 /*
- * Common logic to directly migrate a single page suitable for
+ * Common logic to directly migrate a single LRU page suitable for
  * pages that do not use PagePrivate/PagePrivate2.
  *
  * Pages are locked upon entry and exit.
@@ -755,33 +846,72 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 				enum migrate_mode mode)
 {
 	struct address_space *mapping;
-	int rc;
+	int rc = -EAGAIN;
+	bool is_lru = !__PageMovable(page);
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
 
 	mapping = page_mapping(page);
-	if (!mapping)
-		rc = migrate_page(mapping, newpage, page, mode);
-	else if (mapping->a_ops->migratepage)
+
+	if (likely(is_lru)) {
+		if (!mapping)
+			rc = migrate_page(mapping, newpage, page, mode);
+		else if (mapping->a_ops->migratepage)
+			/*
+			 * Most pages have a mapping and most filesystems
+			 * provide a migratepage callback. Anonymous pages
+			 * are part of swap space which also has its own
+			 * migratepage callback. This is the most common path
+			 * for page migration.
+			 */
+			rc = mapping->a_ops->migratepage(mapping, newpage,
+							page, mode);
+		else
+			rc = fallback_migrate_page(mapping, newpage,
+							page, mode);
+	} else {
 		/*
-		 * Most pages have a mapping and most filesystems provide a
-		 * migratepage callback. Anonymous pages are part of swap
-		 * space which also has its own migratepage callback. This
-		 * is the most common path for page migration.
+		 * In case of non-lru page, it could be released after
+		 * isolation step. In that case, we shouldn't try migration.
 		 */
-		rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
-	else
-		rc = fallback_migrate_page(mapping, newpage, page, mode);
+		VM_BUG_ON_PAGE(!PageIsolated(page), page);
+		if (!PageMovable(page)) {
+			rc = MIGRATEPAGE_SUCCESS;
+			__ClearPageIsolated(page);
+			goto out;
+		}
+
+		rc = mapping->a_ops->migratepage(mapping, newpage,
+						page, mode);
+		WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
+			!PageIsolated(page));
+	}
 
 	/*
 	 * When successful, old pagecache page->mapping must be cleared before
 	 * page is freed; but stats require that PageAnon be left as PageAnon.
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
-		if (!PageAnon(page))
+		if (__PageMovable(page)) {
+			VM_BUG_ON_PAGE(!PageIsolated(page), page);
+
+			/*
+			 * We clear PG_movable under page_lock so any compactor
+			 * cannot try to migrate this page.
+			 */
+			__ClearPageIsolated(page);
+		}
+
+		/*
+		 * Anonymous and movable page->mapping will be cleard by
+		 * free_pages_prepare so don't reset it here for keeping
+		 * the type to work PageAnon, for example.
+		 */
+		if (!PageMappingFlags(page))
 			page->mapping = NULL;
 	}
+out:
 	return rc;
 }
 
@@ -791,6 +921,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	int rc = -EAGAIN;
 	int page_was_mapped = 0;
 	struct anon_vma *anon_vma = NULL;
+	bool is_lru = !__PageMovable(page);
 
 	if (!trylock_page(page)) {
 		if (!force || mode == MIGRATE_ASYNC)
@@ -871,6 +1002,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		goto out_unlock_both;
 	}
 
+	if (unlikely(!is_lru)) {
+		rc = move_to_new_page(newpage, page, mode);
+		goto out_unlock_both;
+	}
+
 	/*
 	 * Corner case handling:
 	 * 1. When a new swap-cache page is read into, it is added to the LRU
@@ -920,7 +1056,8 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	 * list in here.
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
-		if (unlikely(__is_movable_balloon_page(newpage)))
+		if (unlikely(__is_movable_balloon_page(newpage) ||
+				__PageMovable(newpage)))
 			put_page(newpage);
 		else
 			putback_lru_page(newpage);
@@ -961,6 +1098,12 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 		/* page was freed from under us. So we are done. */
 		ClearPageActive(page);
 		ClearPageUnevictable(page);
+		if (unlikely(__PageMovable(page))) {
+			lock_page(page);
+			if (!PageMovable(page))
+				__ClearPageIsolated(page);
+			unlock_page(page);
+		}
 		if (put_new_page)
 			put_new_page(newpage, private);
 		else
@@ -1010,8 +1153,21 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 				num_poisoned_pages_inc();
 		}
 	} else {
-		if (rc != -EAGAIN)
-			putback_lru_page(page);
+		if (rc != -EAGAIN) {
+			if (likely(!__PageMovable(page))) {
+				putback_lru_page(page);
+				goto put_new;
+			}
+
+			lock_page(page);
+			if (PageMovable(page))
+				putback_movable_page(page);
+			else
+				__ClearPageIsolated(page);
+			unlock_page(page);
+			put_page(page);
+		}
+put_new:
 		if (put_new_page)
 			put_new_page(newpage, private);
 		else
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d27e8b968ac3..4d0ba8f6cfee 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1014,7 +1014,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 			(page + i)->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 		}
 	}
-	if (PageAnonHead(page))
+	if (PageMappingFlags(page))
 		page->mapping = NULL;
 	if (check_free)
 		bad += free_pages_check(page);
diff --git a/mm/util.c b/mm/util.c
index 917e0e3d0f8e..b756ee36f7f0 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -399,10 +399,12 @@ struct address_space *page_mapping(struct page *page)
 	}
 
 	mapping = page->mapping;
-	if ((unsigned long)mapping & PAGE_MAPPING_FLAGS)
+	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
 		return NULL;
-	return mapping;
+
+	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
 }
+EXPORT_SYMBOL(page_mapping);
 
 /* Slow path of page_mapcount() for compound pages */
 int __page_mapcount(struct page *page)
-- 
1.9.1

^ permalink raw reply related

* [PATCH V2 0/2] vhost_net polling optimization
From: Jason Wang @ 2016-05-30  6:47 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel

Hi:

This series tries to optimize vhost_net polling at two points:

- Stop rx polling for reduicng the unnecessary wakeups during
  handle_rx().
- Conditonally enable tx polling for reducing the unnecessary
  traversing and spinlock touching.

Test shows about 17% improvement on rx pps.

Please review

Changes from V1:
- use vhost_net_disable_vq()/vhost_net_enable_vq() instead of open
  coding.
- Add a new patch for conditionally enable tx polling.

Jason Wang (2):
  vhost_net: stop polling socket during rx processing
  vhost_net: conditionally enable tx polling

 drivers/vhost/net.c | 59 +++++++++++++++++++++++++++++------------------------
 1 file changed, 32 insertions(+), 27 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* [PATCH V2 1/2] vhost_net: stop polling socket during rx processing
From: Jason Wang @ 2016-05-30  6:47 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel
In-Reply-To: <1464590874-39539-1-git-send-email-jasowang@redhat.com>

We don't stop rx polling socket during rx processing, this will lead
unnecessary wakeups from under layer net devices (E.g
sock_def_readable() form tun). Rx will be slowed down in this
way. This patch avoids this by stop polling socket during rx
processing. A small drawback is that this introduces some overheads in
light load case because of the extra start/stop polling, but single
netperf TCP_RR does not notice any change. In a super heavy load case,
e.g using pktgen to inject packet to guest, we get about ~8.8%
improvement on pps:

before: ~1240000 pkt/s
after:  ~1350000 pkt/s

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 56 +++++++++++++++++++++++++++--------------------------
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 10ff494..e91603b 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -301,6 +301,32 @@ static bool vhost_can_busy_poll(struct vhost_dev *dev,
 	       !vhost_has_work(dev);
 }
 
+static void vhost_net_disable_vq(struct vhost_net *n,
+				 struct vhost_virtqueue *vq)
+{
+	struct vhost_net_virtqueue *nvq =
+		container_of(vq, struct vhost_net_virtqueue, vq);
+	struct vhost_poll *poll = n->poll + (nvq - n->vqs);
+	if (!vq->private_data)
+		return;
+	vhost_poll_stop(poll);
+}
+
+static int vhost_net_enable_vq(struct vhost_net *n,
+				struct vhost_virtqueue *vq)
+{
+	struct vhost_net_virtqueue *nvq =
+		container_of(vq, struct vhost_net_virtqueue, vq);
+	struct vhost_poll *poll = n->poll + (nvq - n->vqs);
+	struct socket *sock;
+
+	sock = vq->private_data;
+	if (!sock)
+		return 0;
+
+	return vhost_poll_start(poll, sock->file);
+}
+
 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 				    struct vhost_virtqueue *vq,
 				    struct iovec iov[], unsigned int iov_size,
@@ -627,6 +653,7 @@ static void handle_rx(struct vhost_net *net)
 	if (!sock)
 		goto out;
 	vhost_disable_notify(&net->dev, vq);
+	vhost_net_disable_vq(net, vq);
 
 	vhost_hlen = nvq->vhost_hlen;
 	sock_hlen = nvq->sock_hlen;
@@ -715,9 +742,10 @@ static void handle_rx(struct vhost_net *net)
 		total_len += vhost_len;
 		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
 			vhost_poll_queue(&vq->poll);
-			break;
+			goto out;
 		}
 	}
+	vhost_net_enable_vq(net, vq);
 out:
 	mutex_unlock(&vq->mutex);
 }
@@ -796,32 +824,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	return 0;
 }
 
-static void vhost_net_disable_vq(struct vhost_net *n,
-				 struct vhost_virtqueue *vq)
-{
-	struct vhost_net_virtqueue *nvq =
-		container_of(vq, struct vhost_net_virtqueue, vq);
-	struct vhost_poll *poll = n->poll + (nvq - n->vqs);
-	if (!vq->private_data)
-		return;
-	vhost_poll_stop(poll);
-}
-
-static int vhost_net_enable_vq(struct vhost_net *n,
-				struct vhost_virtqueue *vq)
-{
-	struct vhost_net_virtqueue *nvq =
-		container_of(vq, struct vhost_net_virtqueue, vq);
-	struct vhost_poll *poll = n->poll + (nvq - n->vqs);
-	struct socket *sock;
-
-	sock = vq->private_data;
-	if (!sock)
-		return 0;
-
-	return vhost_poll_start(poll, sock->file);
-}
-
 static struct socket *vhost_net_stop_vq(struct vhost_net *n,
 					struct vhost_virtqueue *vq)
 {
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH V2 2/2] vhost_net: conditionally enable tx polling
From: Jason Wang @ 2016-05-30  6:47 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel
In-Reply-To: <1464590874-39539-1-git-send-email-jasowang@redhat.com>

We always poll tx for socket, this is sub optimal since:

- it will be only used when we exceed the sndbuf of the socket.
- since we use two independent polls for tx and vq, this will slightly
  increase the waitqueue traversing time and more important, vhost
  could not benefit from commit
  9e641bdcfa4ef4d6e2fbaa59c1be0ad5d1551fd5 ("net-tun: restructure
  tun_do_read for better sleep/wakeup efficiency") even if we've
  stopped rx polling during handle_rx since tx poll were still left in
  the waitqueue.

Fix this by conditionally enable tx polling only when -EAGAIN were
met.

Test shows about 8% improvement on guest rx pps.

Before: ~1350000
After:  ~1460000

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index e91603b..5a05fa0 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -378,6 +378,7 @@ static void handle_tx(struct vhost_net *net)
 		goto out;
 
 	vhost_disable_notify(&net->dev, vq);
+	vhost_net_disable_vq(net, vq);
 
 	hdr_size = nvq->vhost_hlen;
 	zcopy = nvq->ubufs;
@@ -459,6 +460,8 @@ static void handle_tx(struct vhost_net *net)
 					% UIO_MAXIOV;
 			}
 			vhost_discard_vq_desc(vq, 1);
+			if (err == -EAGAIN)
+				vhost_net_enable_vq(net, vq);
 			break;
 		}
 		if (err != len)
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH v3 7/7] [wip] virtio-gpu: add page flip support
From: Daniel Vetter @ 2016-05-30  8:42 UTC (permalink / raw)
  To: Gerd Hoffmann, dri-devel, open list, open list:VIRTIO GPU DRIVER
In-Reply-To: <20160527075027.GW27098@phenom.ffwll.local>

On Fri, May 27, 2016 at 09:50:27AM +0200, Daniel Vetter wrote:
> On Fri, May 27, 2016 at 09:46:03AM +0200, Gerd Hoffmann wrote:
> > On Mi, 2016-05-25 at 18:37 +0200, Daniel Vetter wrote:
> > > On Fri, Oct 2, 2015 at 1:58 PM, Gerd Hoffmann <kraxel@redhat.com> wrote:
> > > > Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> > > 
> > > So I entirely missed this, but this isn't really how to implement
> > > page_flip for an atomic driver. Working on some stuff and will hack up
> > > a likely totally broken patch, but should be enough as guideline.
> > > -Daniel
> > 
> > Hmm, no patch in my inbox yet.  Care to send it over or give some hints
> > what is wrong with this?
> 
> You should use drm_atomic_helper_page_flip as .page_flip hook instead of
> rolling your own, when you have a brand-new atomic driver. If that helper
> doesn't work that means you have a bug in your driver somewhere.
> 
> And indeed virtio just plugs in drm_atomic_helper_commit, which thus far
> doesnt' do nonblocking commits, which means the page flip stuff won't
> work. I'm working on generic nonblocking atomic to fix this up. Currently
> still wip and buggy though, hence no patches.
> 
> But I'll take you up on the implied offer to help out and test ;-)

Ok, patches are now on the list and also in my fdo repo at:

https://cgit.freedesktop.org/~danvet/drm/log/

git://people.freedesktop.org/~danvet/drm stuff

Would be really awesome if you could test this on virtio. Note that the
new nonblocking helpers require that your atomic backend gets the drm
event handling right. So if there's a bug in that logic then you'll see
lots of dmesg noise about waits timing out (after 10s of waiting). From a
quick inspection it should work though.

Upshot is that once you get normal modeset to work with those new helpers,
then nonblocking should Just Work, and there's no need anymore for
additional testing. Or at least shouldn't.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply

* Re: [PATCH v6 02/12] mm: migrate: support non-lru movable page migration
From: Vlastimil Babka @ 2016-05-30  9:01 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Rik van Riel, Sergey Senozhatsky, Rafael Aquini, Jonathan Corbet,
	Hugh Dickins, linux-kernel, dri-devel, virtualization,
	John Einar Reitan, linux-mm, Mel Gorman, Andrew Morton,
	Joonsoo Kim, Gioh Kim
In-Reply-To: <20160530013327.GA8683@bbox>

On 05/30/2016 03:33 AM, Minchan Kim wrote:
>>
>>
>>> +	page->mapping = (void *)((unsigned long)page->mapping &
>>> +				PAGE_MAPPING_MOVABLE);
>>
>> This should be negated to clear... use ~PAGE_MAPPING_MOVABLE ?
>
> No.
>
> The intention is to clear only mapping value but PAGE_MAPPING_MOVABLE
> flag. So, any new migration trial will be failed because PageMovable
> checks page's mapping value but ongoing migraion handling can catch
> whether it's movable page or not with the type bit.

Oh, OK, I got that wrong. I'll point out in the reply to the v6v2 what 
misled me :)

>>
>> So this effectively prevents movable compound pages from being
>> migrated. Are you sure no users of this functionality are going to
>> have compound pages? I assumed that they could, and so made the code
>> like this, with the is_lru variable (which is redundant after your
>> change).
>
> This implementation at the moment disables effectively non-lru compound
> page migration but I'm not a god so I can't make sure no one doesn't want
> it in future. If someone want it, we can support it then because this work
> doesn't prevent it by design.

Oh well. As long as the balloon pages or zsmalloc don't already use 
compound pages...

>
> I thouht PageCompound check right before isolate_movable_page in
> isolate_migratepages_block will filter it out mostly but yeah
> it is racy without zone->lru_lock so it could reach to isolate_movable_page.
> However, PageMovable check in there investigates mapping, mapping->a_ops,
> and a_ops->isolate_page to verify whether it's movable page or not.
>
> I thought it's sufficient to filter THP page.

I guess, yeah.

>>
>> [...]
>>
>>> @@ -755,33 +844,69 @@ static int move_to_new_page(struct page *newpage, struct page *page,
>>> 				enum migrate_mode mode)
>>> {
>>> 	struct address_space *mapping;
>>> -	int rc;
>>> +	int rc = -EAGAIN;
>>> +	bool is_lru = !__PageMovable(page);
>>>
>>> 	VM_BUG_ON_PAGE(!PageLocked(page), page);
>>> 	VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
>>>
>>> 	mapping = page_mapping(page);
>>> -	if (!mapping)
>>> -		rc = migrate_page(mapping, newpage, page, mode);
>>> -	else if (mapping->a_ops->migratepage)
>>> -		/*
>>> -		 * Most pages have a mapping and most filesystems provide a
>>> -		 * migratepage callback. Anonymous pages are part of swap
>>> -		 * space which also has its own migratepage callback. This
>>> -		 * is the most common path for page migration.
>>> -		 */
>>> -		rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
>>> -	else
>>> -		rc = fallback_migrate_page(mapping, newpage, page, mode);
>>> +	/*
>>> +	 * In case of non-lru page, it could be released after
>>> +	 * isolation step. In that case, we shouldn't try
>>> +	 * fallback migration which is designed for LRU pages.
>>> +	 */
>>
>> Hmm but is_lru was determined from !__PageMovable() above, also well
>> after the isolation step. So if the driver already released it, we
>> wouldn't detect it? And this function is all under same page lock,
>> so if __PageMovable was true above, so will be PageMovable below?
>
> You are missing what I mentioned above.
> We should keep the type bit to catch what you are saying(i.e., driver
> already released).
>
> __PageMovable just checks PAGE_MAPPING_MOVABLE flag and PageMovable
> checks page->mapping valid while __ClearPageMovable reset only
> valid vaule of mapping, not PAGE_MAPPING_MOVABLE flag.
>
> I wrote it down in Documentation/vm/page_migration.
>
> "For testing of non-lru movable page, VM supports __PageMovable function.
> However, it doesn't guarantee to identify non-lru movable page because
> page->mapping field is unified with other variables in struct page.
> As well, if driver releases the page after isolation by VM, page->mapping
> doesn't have stable value although it has PAGE_MAPPING_MOVABLE
> (Look at __ClearPageMovable). But __PageMovable is cheap to catch whether
> page is LRU or non-lru movable once the page has been isolated. Because
> LRU pages never can have PAGE_MAPPING_MOVABLE in page->mapping. It is also
> good for just peeking to test non-lru movable pages before more expensive
> checking with lock_page in pfn scanning to select victim.
>
> For guaranteeing non-lru movable page, VM provides PageMovable function.
> Unlike __PageMovable, PageMovable functions validates page->mapping and
> mapping->a_ops->isolate_page under lock_page. The lock_page prevents sudden
> destroying of page->mapping.
>
> Driver using __SetPageMovable should clear the flag via __ClearMovablePage
> under page_lock before the releasing the page."

Right, I get it now.


>>> +		if (!((unsigned long)page->mapping & PAGE_MAPPING_FLAGS))
>>> 			page->mapping = NULL;
>>
>> The two lines above make little sense to me without a comment.
>
> I folded this.
>
> @@ -901,7 +901,12 @@ static int move_to_new_page(struct page *newpage, struct page *page,
>                         __ClearPageIsolated(page);
>                 }
>
> -               if (!((unsigned long)page->mapping & PAGE_MAPPING_FLAGS))
> +               /*
> +                * Anonymous and movable page->mapping will be cleard by
> +                * free_pages_prepare so don't reset it here for keeping
> +                * the type to work PageAnon, for example.
> +                */
> +               if (!PageMappingFlags(page))
>                         page->mapping = NULL;
>         }

Thanks.

^ permalink raw reply

* Re: PATCH v6v2 02/12] mm: migrate: support non-lru movable page migration
From: Vlastimil Babka @ 2016-05-30  9:36 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: Rik van Riel, Sergey Senozhatsky, Rafael Aquini, Jonathan Corbet,
	Hugh Dickins, linux-kernel, dri-devel, virtualization,
	John Einar Reitan, linux-mm, Mel Gorman, Joonsoo Kim, Gioh Kim
In-Reply-To: <20160530013926.GB8683@bbox>

On 05/30/2016 03:39 AM, Minchan Kim wrote:
> After isolation, VM calls migratepage of driver with isolated page.
> The function of migratepage is to move content of the old page to new page
> and set up fields of struct page newpage. Keep in mind that you should
> clear PG_movable of oldpage via __ClearPageMovable under page_lock if you
> migrated the oldpage successfully and returns 0.

This "clear PG_movable" is one of the reasons I was confused about what 
__ClearPageMovable() really does. There's no actual "PG_movable" page 
flag and the function doesn't clear even the actual mapping flag :) Also 
same thing in the Documentation/ part.

Something like "... you should indicate to the VM that the oldpage is no 
longer movable via __ClearPageMovable() ..."?

> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -81,6 +81,39 @@ static inline bool migrate_async_suitable(int migratetype)
>
>  #ifdef CONFIG_COMPACTION
>
> +int PageMovable(struct page *page)
> +{
> +	struct address_space *mapping;
> +
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +	if (!__PageMovable(page))
> +		return 0;
> +
> +	mapping = page_mapping(page);
> +	if (mapping && mapping->a_ops && mapping->a_ops->isolate_page)
> +		return 1;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(PageMovable);
> +
> +void __SetPageMovable(struct page *page, struct address_space *mapping)
> +{
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +	VM_BUG_ON_PAGE((unsigned long)mapping & PAGE_MAPPING_MOVABLE, page);
> +	page->mapping = (void *)((unsigned long)mapping | PAGE_MAPPING_MOVABLE);
> +}
> +EXPORT_SYMBOL(__SetPageMovable);
> +
> +void __ClearPageMovable(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +	VM_BUG_ON_PAGE(!PageMovable(page), page);
> +	page->mapping = (void *)((unsigned long)page->mapping &
> +				PAGE_MAPPING_MOVABLE);
> +}
> +EXPORT_SYMBOL(__ClearPageMovable);

The second confusing thing is that the function is named 
__ClearPageMovable(), but what it really clears is the mapping pointer,
which is not at all the opposite of what __SetPageMovable() does.

I know it's explained in the documentation, but it also deserves a 
comment here so it doesn't confuse everyone who looks at it.
Even better would be a less confusing name for the function, but I can't 
offer one right now.

> diff --git a/mm/util.c b/mm/util.c
> index 917e0e3d0f8e..b756ee36f7f0 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -399,10 +399,12 @@ struct address_space *page_mapping(struct page *page)
>  	}
>
>  	mapping = page->mapping;

I'd probably use READ_ONCE() here to be safe. Not all callers are under 
page lock?

> -	if ((unsigned long)mapping & PAGE_MAPPING_FLAGS)
> +	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
>  		return NULL;
> -	return mapping;
> +
> +	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
>  }
> +EXPORT_SYMBOL(page_mapping);
>
>  /* Slow path of page_mapcount() for compound pages */
>  int __page_mapcount(struct page *page)
>

^ permalink raw reply

* [PATCH] virtio-gpu: fix output lookup
From: Gerd Hoffmann @ 2016-05-30 12:03 UTC (permalink / raw)
  To: dri-devel
  Cc: David Airlie, open list, open list:VIRTIO GPU DRIVER,
	open list:VIRTIO GPU DRIVER

Needed for multihead setups where we can have disabled
outputs and therefore plane->crtc can be NULL.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 drivers/gpu/drm/virtio/virtgpu_plane.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
index 70b44a2..e50674b 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -63,11 +63,17 @@ static void virtio_gpu_plane_atomic_update(struct drm_plane *plane,
 {
 	struct drm_device *dev = plane->dev;
 	struct virtio_gpu_device *vgdev = dev->dev_private;
-	struct virtio_gpu_output *output = drm_crtc_to_virtio_gpu_output(plane->crtc);
+	struct virtio_gpu_output *output = NULL;
 	struct virtio_gpu_framebuffer *vgfb;
 	struct virtio_gpu_object *bo;
 	uint32_t handle;
 
+	if (plane->state->crtc)
+		output = drm_crtc_to_virtio_gpu_output(plane->state->crtc);
+	if (old_state->crtc)
+		output = drm_crtc_to_virtio_gpu_output(old_state->crtc);
+	WARN_ON(!output);
+
 	if (plane->state->fb) {
 		vgfb = to_virtio_gpu_framebuffer(plane->state->fb);
 		bo = gem_to_virtio_gpu_obj(vgfb->obj);
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH v3 7/7] [wip] virtio-gpu: add page flip support
From: Gerd Hoffmann @ 2016-05-30 12:06 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: open list, dri-devel, open list:VIRTIO GPU DRIVER
In-Reply-To: <20160530084233.GA27098@phenom.ffwll.local>

  Hi,

> > But I'll take you up on the implied offer to help out and test ;-)
> 
> git://people.freedesktop.org/~danvet/drm stuff

Tried that branch.

> Would be really awesome if you could test this on virtio. Note that the
> new nonblocking helpers require that your atomic backend gets the drm
> event handling right. So if there's a bug in that logic then you'll see
> lots of dmesg noise about waits timing out (after 10s of waiting). From a
> quick inspection it should work though.

No timeouts.  Yay!

But it seems crtcs can be (temporarely) disabled now, so we might have
to pick up the crtc from old_state in virtio_gpu_plane_atomic_update to
figure which virtual output needs to be turned off.  Ran into this last
week already.  Happened with multihead setups only, but the same patch
fixes this one too ;)

https://lists.freedesktop.org/archives/dri-devel/2016-May/108772.html

cheers,
  Gerd

^ permalink raw reply

* Re: [PATCH v6 03/12] mm: balloon: use general non-lru movable page feature
From: Vlastimil Babka @ 2016-05-30 12:16 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: Rafael Aquini, linux-kernel, virtualization, linux-mm,
	Konstantin Khlebnikov, Gioh Kim
In-Reply-To: <1463754225-31311-4-git-send-email-minchan@kernel.org>

On 05/20/2016 04:23 PM, Minchan Kim wrote:
> Now, VM has a feature to migrate non-lru movable pages so
> balloon doesn't need custom migration hooks in migrate.c
> and compaction.c. Instead, this patch implements
> page->mapping->a_ops->{isolate|migrate|putback} functions.
>
> With that, we could remove hooks for ballooning in general
> migration functions and make balloon compaction simple.
>
> Cc: virtualization@lists.linux-foundation.org
> Cc: Rafael Aquini <aquini@redhat.com>
> Cc: Konstantin Khlebnikov <koct9i@gmail.com>
> Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>

Except for the inode/pseudofs stuff which I'm not familiar with,

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply

* Re: [PATCH] Add virtio gpu driver.
From: Gerd Hoffmann @ 2016-05-30 13:50 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: virtio-dev, Michael S. Tsirkin, open list:ABI/API, open list,
	open list:DRM DRIVERS, open list:VIRTIO CORE, NET..., Dave Airlie
In-Reply-To: <20160527090313.GX27098@phenom.ffwll.local>

  Hi,

> - add a small core function to registerr HOT_X/HOT_Y for a (cursor) plane,
>   e.g. drm_plane_register_hotspot(). That should allocate the properties
>   (if they don't exist yet) and then attach those props to the cursor. We
>   don't want those props everywhere, but only on drivers that support/need
>   them, aka virtual hw.

Hmm, why is this special to virtual hw?

>  	if (crtc->cursor) {
> -		ret = drm_mode_cursor_universal(crtc, req, file_priv);
> +		if (drm_core_check_feature(DRIVER_ATOMIC))
> +			ret = drm_mode_cursor_atomic(crtc, req, file_priv);
> +		else
> +			ret = drm_mode_cursor_universal(crtc, req, file_priv);
>  		goto out;

>   drm_mode_cursor_atomic would simply be a fusing of
>   drm_mode_cursor_universal + drm_atomic_helper_update_plane (dump all the
>   intermediate variables and store directly in the plane state), with the
>   addition of also storing hot_x/y into the plane state.

Hmm, that'll either make drm_mode_cursor_atomic a big cut+pasted
function, or need quite some refactoring to move common code into
functions callable from both drm_mode_cursor_atomic
+drm_mode_cursor_universal ...

Why attach the hotspot to the plane?  Wouldn't it make more sense to
make it a framebuffer property?

cheers,
  Gerd

^ permalink raw reply

* Re: [PATCH v3 7/7] [wip] virtio-gpu: add page flip support
From: Daniel Vetter @ 2016-05-30 14:43 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: open list:VIRTIO GPU DRIVER, dri-devel, Daniel Vetter, open list
In-Reply-To: <1464610010.5179.34.camel@redhat.com>

On Mon, May 30, 2016 at 02:06:50PM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> > > But I'll take you up on the implied offer to help out and test ;-)
> > 
> > git://people.freedesktop.org/~danvet/drm stuff
> 
> Tried that branch.
> 
> > Would be really awesome if you could test this on virtio. Note that the
> > new nonblocking helpers require that your atomic backend gets the drm
> > event handling right. So if there's a bug in that logic then you'll see
> > lots of dmesg noise about waits timing out (after 10s of waiting). From a
> > quick inspection it should work though.
> 
> No timeouts.  Yay!
> 
> But it seems crtcs can be (temporarely) disabled now, so we might have
> to pick up the crtc from old_state in virtio_gpu_plane_atomic_update to
> figure which virtual output needs to be turned off.  Ran into this last
> week already.  Happened with multihead setups only, but the same patch
> fixes this one too ;)
> 
> https://lists.freedesktop.org/archives/dri-devel/2016-May/108772.html

Hm, smells more like virtio isn't too happy with the default ordering of
the commit operation. The default is:

- Disable any crtc/encoders that need to be disabled/change.
- Bash new plane setup into hw.
- Enable all crtcs/encoders that need to be enabled/have changed.

There's two problems:
- some hw gets real grumpy if you bash in plane state without the crtc
  state yet matching.
- if you do runtime pm nothing is enabled and the writes get lost at best,
  or hang your interconnect at worst.

That's why you can overwrite atomic_commit_tail, and use something more
sensible. See for example what it looks like for rockchip. I have a gut
feeling that should also take care of your troubles. Using the old crtc is
definitely not what you want.

Another option is that virtio isn't happy about bashing in plane state for
disabled crtc. Again helpers have you covered, look at the active_only
parameter for drm_atomic_helper_commit_planes().

Aside, if you wonder why these defaults: They match what the crtc helpers
are doing, but they are a bit surprising indeed.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox