* [PATCH v5 0/3] Add option to mmap GEM buffers cached
@ 2021-05-23 17:04 Paul Cercueil
  2021-05-23 17:04 ` [PATCH v5 1/3] drm: Add support for GEM buffers backed by non-coherent memory Paul Cercueil
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Paul Cercueil @ 2021-05-23 17:04 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Daniel Vetter
  Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips,
	Paul Cercueil
V5 of my patchset which adds the option for having GEM buffers backed by
non-coherent memory.
Changes from V4:
- [2/3]: 
    - Rename to drm_fb_cma_sync_non_coherent
    - Invert loops for better cache locality
    - Only sync BOs that have the non-coherent flag
    - Properly sort includes
    - Move to drm_fb_cma_helper.c to avoid circular dependency
- [3/3]:
    - Fix drm_atomic_get_new_plane_state() used to retrieve the old
      state
    - Use custom drm_gem_fb_create()
    - Only check damage clips and sync DMA buffers if non-coherent
      buffers are used
Cheers,
-Paul
Paul Cercueil (3):
  drm: Add support for GEM buffers backed by non-coherent memory
  drm: Add and export function drm_fb_cma_sync_non_coherent
  drm/ingenic: Add option to alloc cached GEM buffers
 drivers/gpu/drm/drm_fb_cma_helper.c       | 46 ++++++++++++++++++
 drivers/gpu/drm/drm_gem_cma_helper.c      | 38 +++++++++++----
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 59 +++++++++++++++++++++--
 drivers/gpu/drm/ingenic/ingenic-drm.h     |  1 +
 drivers/gpu/drm/ingenic/ingenic-ipu.c     | 21 ++++++--
 include/drm/drm_fb_cma_helper.h           |  4 ++
 include/drm/drm_gem_cma_helper.h          |  3 ++
 7 files changed, 156 insertions(+), 16 deletions(-)
-- 
2.30.2
^ permalink raw reply	[flat|nested] 10+ messages in thread
* [PATCH v5 1/3] drm: Add support for GEM buffers backed by non-coherent memory
  2021-05-23 17:04 [PATCH v5 0/3] Add option to mmap GEM buffers cached Paul Cercueil
@ 2021-05-23 17:04 ` Paul Cercueil
  2021-05-27 10:40   ` Tomi Valkeinen
  2021-05-23 17:04 ` [PATCH v5 2/3] drm: Add and export function drm_fb_cma_sync_non_coherent Paul Cercueil
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Paul Cercueil @ 2021-05-23 17:04 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Daniel Vetter
  Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips,
	Paul Cercueil
Having GEM buffers backed by non-coherent memory is interesting in the
particular case where it is faster to render to a non-coherent buffer
then sync the data cache, than to render to a write-combine buffer, and
(by extension) much faster than using a shadow buffer. This is true for
instance on some Ingenic SoCs, where even simple blits (e.g. memcpy)
are about three times faster using this method.
Add a 'map_noncoherent' flag to the drm_gem_cma_object structure, which
can be set by the drivers when they create the dumb buffer.
Since this really only applies to software rendering, disable this flag
as soon as the CMA objects are exported via PRIME.
v3: New patch. Now uses a simple 'map_noncoherent' flag to control how
    the objects are mapped, and use the new dma_mmap_pages function.
v4: Make sure map_noncoherent is always disabled when creating GEM
    objects meant to be used with dma-buf.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/gpu/drm/drm_gem_cma_helper.c | 38 +++++++++++++++++++++-------
 include/drm/drm_gem_cma_helper.h     |  3 +++
 2 files changed, 32 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index 7942cf05cd93..235c7a63da2b 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -46,6 +46,7 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
  * __drm_gem_cma_create - Create a GEM CMA object without allocating memory
  * @drm: DRM device
  * @size: size of the object to allocate
+ * @private: true if used for internal purposes
  *
  * This function creates and initializes a GEM CMA object of the given size,
  * but doesn't allocate any memory to back the object.
@@ -55,11 +56,11 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
  * error code on failure.
  */
 static struct drm_gem_cma_object *
-__drm_gem_cma_create(struct drm_device *drm, size_t size)
+__drm_gem_cma_create(struct drm_device *drm, size_t size, bool private)
 {
 	struct drm_gem_cma_object *cma_obj;
 	struct drm_gem_object *gem_obj;
-	int ret;
+	int ret = 0;
 
 	if (drm->driver->gem_create_object)
 		gem_obj = drm->driver->gem_create_object(drm, size);
@@ -73,7 +74,14 @@ __drm_gem_cma_create(struct drm_device *drm, size_t size)
 
 	cma_obj = container_of(gem_obj, struct drm_gem_cma_object, base);
 
-	ret = drm_gem_object_init(drm, gem_obj, size);
+	if (private) {
+		drm_gem_private_object_init(drm, gem_obj, size);
+
+		/* Always use writecombine for dma-buf mappings */
+		cma_obj->map_noncoherent = false;
+	} else {
+		ret = drm_gem_object_init(drm, gem_obj, size);
+	}
 	if (ret)
 		goto error;
 
@@ -111,12 +119,19 @@ struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
 
 	size = round_up(size, PAGE_SIZE);
 
-	cma_obj = __drm_gem_cma_create(drm, size);
+	cma_obj = __drm_gem_cma_create(drm, size, false);
 	if (IS_ERR(cma_obj))
 		return cma_obj;
 
-	cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
-				      GFP_KERNEL | __GFP_NOWARN);
+	if (cma_obj->map_noncoherent) {
+		cma_obj->vaddr = dma_alloc_noncoherent(drm->dev, size,
+						       &cma_obj->paddr,
+						       DMA_TO_DEVICE,
+						       GFP_KERNEL | __GFP_NOWARN);
+	} else {
+		cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
+					      GFP_KERNEL | __GFP_NOWARN);
+	}
 	if (!cma_obj->vaddr) {
 		drm_dbg(drm, "failed to allocate buffer with size %zu\n",
 			 size);
@@ -432,7 +447,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
 		return ERR_PTR(-EINVAL);
 
 	/* Create a CMA GEM buffer. */
-	cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
+	cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size, true);
 	if (IS_ERR(cma_obj))
 		return ERR_CAST(cma_obj);
 
@@ -499,8 +514,13 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
 
 	cma_obj = to_drm_gem_cma_obj(obj);
 
-	ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr,
-			  cma_obj->paddr, vma->vm_end - vma->vm_start);
+	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+	if (!cma_obj->map_noncoherent)
+		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+
+	ret = dma_mmap_pages(cma_obj->base.dev->dev,
+			     vma, vma->vm_end - vma->vm_start,
+			     virt_to_page(cma_obj->vaddr));
 	if (ret)
 		drm_gem_vm_close(vma);
 
diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
index 0a9711caa3e8..cd13508acbc1 100644
--- a/include/drm/drm_gem_cma_helper.h
+++ b/include/drm/drm_gem_cma_helper.h
@@ -16,6 +16,7 @@ struct drm_mode_create_dumb;
  *       more than one entry but they are guaranteed to have contiguous
  *       DMA addresses.
  * @vaddr: kernel virtual address of the backing memory
+ * @map_noncoherent: if true, the GEM object is backed by non-coherent memory
  */
 struct drm_gem_cma_object {
 	struct drm_gem_object base;
@@ -24,6 +25,8 @@ struct drm_gem_cma_object {
 
 	/* For objects with DMA memory allocated by GEM CMA */
 	void *vaddr;
+
+	bool map_noncoherent;
 };
 
 #define to_drm_gem_cma_obj(gem_obj) \
-- 
2.30.2
^ permalink raw reply related	[flat|nested] 10+ messages in thread
* [PATCH v5 2/3] drm: Add and export function drm_fb_cma_sync_non_coherent
  2021-05-23 17:04 [PATCH v5 0/3] Add option to mmap GEM buffers cached Paul Cercueil
  2021-05-23 17:04 ` [PATCH v5 1/3] drm: Add support for GEM buffers backed by non-coherent memory Paul Cercueil
@ 2021-05-23 17:04 ` Paul Cercueil
  2021-05-23 18:40   ` kernel test robot
  2021-05-23 17:04 ` [PATCH v5 3/3] drm/ingenic: Add option to alloc cached GEM buffers Paul Cercueil
  2021-05-23 19:05 ` [PATCH v5 0/3] Add option to mmap GEM buffers cached Thomas Zimmermann
  3 siblings, 1 reply; 10+ messages in thread
From: Paul Cercueil @ 2021-05-23 17:04 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Daniel Vetter
  Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips,
	Paul Cercueil
This function can be used by drivers that use damage clips and have
CMA GEM objects backed by non-coherent memory. Calling this function
in a plane's .atomic_update ensures that all the data in the backing
memory have been written to RAM.
v3: - Only sync data if using GEM objects backed by non-coherent memory.
    - Use a drm_device pointer instead of device pointer in prototype
v5: - Rename to drm_fb_cma_sync_non_coherent
    - Invert loops for better cache locality
    - Only sync BOs that have the non-coherent flag
    - Move to drm_fb_cma_helper.c to avoid circular dependency
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/gpu/drm/drm_fb_cma_helper.c | 46 +++++++++++++++++++++++++++++
 include/drm/drm_fb_cma_helper.h     |  4 +++
 2 files changed, 50 insertions(+)
diff --git a/drivers/gpu/drm/drm_fb_cma_helper.c b/drivers/gpu/drm/drm_fb_cma_helper.c
index cb2349ad338d..69c57273b184 100644
--- a/drivers/gpu/drm/drm_fb_cma_helper.c
+++ b/drivers/gpu/drm/drm_fb_cma_helper.c
@@ -9,12 +9,14 @@
  *  Copyright (C) 2012 Red Hat
  */
 
+#include <drm/drm_damage_helper.h>
 #include <drm/drm_fb_cma_helper.h>
 #include <drm/drm_fourcc.h>
 #include <drm/drm_framebuffer.h>
 #include <drm/drm_gem_cma_helper.h>
 #include <drm/drm_gem_framebuffer_helper.h>
 #include <drm/drm_plane.h>
+#include <linux/dma-mapping.h>
 #include <linux/module.h>
 
 /**
@@ -97,3 +99,47 @@ dma_addr_t drm_fb_cma_get_gem_addr(struct drm_framebuffer *fb,
 	return paddr;
 }
 EXPORT_SYMBOL_GPL(drm_fb_cma_get_gem_addr);
+
+/**
+ * drm_fb_cma_sync_non_coherent - Sync GEM object to non-coherent backing
+ *	memory
+ * @drm: DRM device
+ * @old_state: Old plane state
+ * @state: New plane state
+ *
+ * This function can be used by drivers that use damage clips and have
+ * CMA GEM objects backed by non-coherent memory. Calling this function
+ * in a plane's .atomic_update ensures that all the data in the backing
+ * memory have been written to RAM.
+ */
+void drm_fb_cma_sync_non_coherent(struct drm_device *drm,
+				  struct drm_plane_state *old_state,
+				  struct drm_plane_state *state)
+{
+	const struct drm_format_info *finfo = state->fb->format;
+	struct drm_atomic_helper_damage_iter iter;
+	const struct drm_gem_cma_object *cma_obj;
+	unsigned int offset, i;
+	struct drm_rect clip;
+	dma_addr_t daddr;
+	size_t nb_bytes;
+
+	for (i = 0; i < finfo->num_planes; i++) {
+		cma_obj = drm_fb_cma_get_gem_obj(state->fb, i);
+		if (!cma_obj->map_noncoherent)
+			continue;
+
+		daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
+		drm_atomic_helper_damage_iter_init(&iter, old_state, state);
+
+		drm_atomic_for_each_plane_damage(&iter, &clip) {
+			/* Ignore x1/x2 values, invalidate complete lines */
+			offset = clip.y1 * state->fb->pitches[i];
+
+			nb_bytes = (clip.y2 - clip.y1) * state->fb->pitches[i];
+			dma_sync_single_for_device(drm->dev, daddr + offset,
+						   nb_bytes, DMA_TO_DEVICE);
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(drm_fb_cma_sync_non_coherent);
diff --git a/include/drm/drm_fb_cma_helper.h b/include/drm/drm_fb_cma_helper.h
index 795aea1d0a25..3a177632b17e 100644
--- a/include/drm/drm_fb_cma_helper.h
+++ b/include/drm/drm_fb_cma_helper.h
@@ -14,5 +14,9 @@ dma_addr_t drm_fb_cma_get_gem_addr(struct drm_framebuffer *fb,
 				   struct drm_plane_state *state,
 				   unsigned int plane);
 
+void drm_fb_cma_sync_non_coherent(struct drm_device *drm,
+				  struct drm_plane_state *old_state,
+				  struct drm_plane_state *state);
+
 #endif
 
-- 
2.30.2
^ permalink raw reply related	[flat|nested] 10+ messages in thread
* [PATCH v5 3/3] drm/ingenic: Add option to alloc cached GEM buffers
  2021-05-23 17:04 [PATCH v5 0/3] Add option to mmap GEM buffers cached Paul Cercueil
  2021-05-23 17:04 ` [PATCH v5 1/3] drm: Add support for GEM buffers backed by non-coherent memory Paul Cercueil
  2021-05-23 17:04 ` [PATCH v5 2/3] drm: Add and export function drm_fb_cma_sync_non_coherent Paul Cercueil
@ 2021-05-23 17:04 ` Paul Cercueil
  2021-05-23 19:05 ` [PATCH v5 0/3] Add option to mmap GEM buffers cached Thomas Zimmermann
  3 siblings, 0 replies; 10+ messages in thread
From: Paul Cercueil @ 2021-05-23 17:04 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Daniel Vetter
  Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips,
	Paul Cercueil
Alloc GEM buffers backed by noncoherent memory on SoCs where it is
actually faster than write-combine.
This dramatically speeds up software rendering on these SoCs, even for
tasks where write-combine memory should in theory be faster (e.g. simple
blits).
v3: The option is now selected per-SoC instead of being a module
    parameter.
v5: - Fix drm_atomic_get_new_plane_state() used to retrieve the old
      state
    - Use custom drm_gem_fb_create()
    - Only check damage clips and sync DMA buffers if non-coherent
      buffers are used
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 59 +++++++++++++++++++++--
 drivers/gpu/drm/ingenic/ingenic-drm.h     |  1 +
 drivers/gpu/drm/ingenic/ingenic-ipu.c     | 21 ++++++--
 3 files changed, 74 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index 389cad59e090..5244f4763477 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -9,6 +9,7 @@
 #include <linux/component.h>
 #include <linux/clk.h>
 #include <linux/dma-mapping.h>
+#include <linux/io.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/of_device.h>
@@ -23,6 +24,7 @@
 #include <drm/drm_color_mgmt.h>
 #include <drm/drm_crtc.h>
 #include <drm/drm_crtc_helper.h>
+#include <drm/drm_damage_helper.h>
 #include <drm/drm_drv.h>
 #include <drm/drm_encoder.h>
 #include <drm/drm_gem_cma_helper.h>
@@ -57,6 +59,7 @@ struct ingenic_dma_hwdescs {
 struct jz_soc_info {
 	bool needs_dev_clk;
 	bool has_osd;
+	bool map_noncoherent;
 	unsigned int max_width, max_height;
 	const u32 *formats_f0, *formats_f1;
 	unsigned int num_formats_f0, num_formats_f1;
@@ -410,6 +413,9 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane,
 	     old_plane_state->fb->format->format != new_plane_state->fb->format->format))
 		crtc_state->mode_changed = true;
 
+	if (priv->soc_info->map_noncoherent)
+		drm_atomic_helper_check_plane_damage(state, new_plane_state);
+
 	return 0;
 }
 
@@ -526,6 +532,13 @@ void ingenic_drm_plane_config(struct device *dev,
 	}
 }
 
+bool ingenic_drm_map_noncoherent(const struct device *dev)
+{
+	const struct ingenic_drm *priv = dev_get_drvdata(dev);
+
+	return priv->soc_info->map_noncoherent;
+}
+
 static void ingenic_drm_update_palette(struct ingenic_drm *priv,
 				       const struct drm_color_lut *lut)
 {
@@ -544,8 +557,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
 					    struct drm_atomic_state *state)
 {
 	struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
-	struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
-									  plane);
+	struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane);
+	struct drm_plane_state *oldstate = drm_atomic_get_old_plane_state(state, plane);
 	struct drm_crtc_state *crtc_state;
 	struct ingenic_dma_hwdesc *hwdesc;
 	unsigned int width, height, cpp, offset;
@@ -553,6 +566,9 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
 	u32 fourcc;
 
 	if (newstate && newstate->fb) {
+		if (priv->soc_info->map_noncoherent)
+			drm_fb_cma_sync_non_coherent(&priv->drm, oldstate, newstate);
+
 		crtc_state = newstate->crtc->state;
 
 		addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
@@ -742,6 +758,33 @@ static void ingenic_drm_disable_vblank(struct drm_crtc *crtc)
 	regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, JZ_LCD_CTRL_EOF_IRQ, 0);
 }
 
+static struct drm_framebuffer *
+ingenic_drm_gem_fb_create(struct drm_device *drm, struct drm_file *file,
+			  const struct drm_mode_fb_cmd2 *mode_cmd)
+{
+	struct ingenic_drm *priv = drm_device_get_priv(drm);
+
+	if (priv->soc_info->map_noncoherent)
+		return drm_gem_fb_create_with_dirty(drm, file, mode_cmd);
+
+	return drm_gem_fb_create(drm, file, mode_cmd);
+}
+
+static struct drm_gem_object *
+ingenic_drm_gem_create_object(struct drm_device *drm, size_t size)
+{
+	struct ingenic_drm *priv = drm_device_get_priv(drm);
+	struct drm_gem_cma_object *obj;
+
+	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
+	if (!obj)
+		return ERR_PTR(-ENOMEM);
+
+	obj->map_noncoherent = priv->soc_info->map_noncoherent;
+
+	return &obj->base;
+}
+
 DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
 
 static const struct drm_driver ingenic_drm_driver_data = {
@@ -754,6 +797,7 @@ static const struct drm_driver ingenic_drm_driver_data = {
 	.patchlevel		= 0,
 
 	.fops			= &ingenic_drm_fops,
+	.gem_create_object	= ingenic_drm_gem_create_object,
 	DRM_GEM_CMA_DRIVER_OPS,
 
 	.irq_handler		= ingenic_drm_irq_handler,
@@ -804,7 +848,7 @@ static const struct drm_encoder_helper_funcs ingenic_drm_encoder_helper_funcs =
 };
 
 static const struct drm_mode_config_funcs ingenic_drm_mode_config_funcs = {
-	.fb_create		= drm_gem_fb_create,
+	.fb_create		= ingenic_drm_gem_fb_create,
 	.output_poll_changed	= drm_fb_helper_output_poll_changed,
 	.atomic_check		= drm_atomic_helper_check,
 	.atomic_commit		= drm_atomic_helper_commit,
@@ -961,6 +1005,9 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
 		return ret;
 	}
 
+	if (soc_info->map_noncoherent)
+		drm_plane_enable_fb_damage_clips(&priv->f1);
+
 	drm_crtc_helper_add(&priv->crtc, &ingenic_drm_crtc_helper_funcs);
 
 	ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
@@ -989,6 +1036,9 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
 			return ret;
 		}
 
+		if (soc_info->map_noncoherent)
+			drm_plane_enable_fb_damage_clips(&priv->f0);
+
 		if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) {
 			ret = component_bind_all(dev, drm);
 			if (ret) {
@@ -1245,6 +1295,7 @@ static const u32 jz4770_formats_f0[] = {
 static const struct jz_soc_info jz4740_soc_info = {
 	.needs_dev_clk = true,
 	.has_osd = false,
+	.map_noncoherent = false,
 	.max_width = 800,
 	.max_height = 600,
 	.formats_f1 = jz4740_formats,
@@ -1255,6 +1306,7 @@ static const struct jz_soc_info jz4740_soc_info = {
 static const struct jz_soc_info jz4725b_soc_info = {
 	.needs_dev_clk = false,
 	.has_osd = true,
+	.map_noncoherent = false,
 	.max_width = 800,
 	.max_height = 600,
 	.formats_f1 = jz4725b_formats_f1,
@@ -1266,6 +1318,7 @@ static const struct jz_soc_info jz4725b_soc_info = {
 static const struct jz_soc_info jz4770_soc_info = {
 	.needs_dev_clk = false,
 	.has_osd = true,
+	.map_noncoherent = true,
 	.max_width = 1280,
 	.max_height = 720,
 	.formats_f1 = jz4770_formats_f1,
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm.h b/drivers/gpu/drm/ingenic/ingenic-drm.h
index 1b4347f7f084..22654ac1dde1 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm.h
+++ b/drivers/gpu/drm/ingenic/ingenic-drm.h
@@ -184,6 +184,7 @@ struct platform_driver;
 void ingenic_drm_plane_config(struct device *dev,
 			      struct drm_plane *plane, u32 fourcc);
 void ingenic_drm_plane_disable(struct device *dev, struct drm_plane *plane);
+bool ingenic_drm_map_noncoherent(const struct device *dev);
 
 extern struct platform_driver *ingenic_ipu_driver_ptr;
 
diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c b/drivers/gpu/drm/ingenic/ingenic-ipu.c
index 3b1091e7c0cd..61b6d9fdbba1 100644
--- a/drivers/gpu/drm/ingenic/ingenic-ipu.c
+++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c
@@ -20,10 +20,13 @@
 
 #include <drm/drm_atomic.h>
 #include <drm/drm_atomic_helper.h>
+#include <drm/drm_damage_helper.h>
 #include <drm/drm_drv.h>
 #include <drm/drm_fb_cma_helper.h>
 #include <drm/drm_fourcc.h>
 #include <drm/drm_gem_atomic_helper.h>
+#include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_gem_framebuffer_helper.h>
 #include <drm/drm_plane.h>
 #include <drm/drm_plane_helper.h>
 #include <drm/drm_property.h>
@@ -285,8 +288,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
 					    struct drm_atomic_state *state)
 {
 	struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
-	struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
-									  plane);
+	struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane);
+	struct drm_plane_state *oldstate = drm_atomic_get_old_plane_state(state, plane);
 	const struct drm_format_info *finfo;
 	u32 ctrl, stride = 0, coef_index = 0, format = 0;
 	bool needs_modeset, upscaling_w, upscaling_h;
@@ -317,6 +320,9 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
 				JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL);
 	}
 
+	if (ingenic_drm_map_noncoherent(ipu->master))
+		drm_fb_cma_sync_non_coherent(ipu->drm, oldstate, newstate);
+
 	/* New addresses will be committed in vblank handler... */
 	ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
 	if (finfo->num_planes > 1)
@@ -541,7 +547,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
 
 	if (!new_plane_state->crtc ||
 	    !crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
-		return 0;
+		goto out_check_damage;
 
 	/* Plane must be fully visible */
 	if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 ||
@@ -558,7 +564,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
 		return -EINVAL;
 
 	if (!osd_changed(new_plane_state, old_plane_state))
-		return 0;
+		goto out_check_damage;
 
 	crtc_state->mode_changed = true;
 
@@ -592,6 +598,10 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
 	ipu->denom_w = denom_w;
 	ipu->denom_h = denom_h;
 
+out_check_damage:
+	if (ingenic_drm_map_noncoherent(ipu->master))
+		drm_atomic_helper_check_plane_damage(state, new_plane_state);
+
 	return 0;
 }
 
@@ -773,6 +783,9 @@ static int ingenic_ipu_bind(struct device *dev, struct device *master, void *d)
 		return err;
 	}
 
+	if (ingenic_drm_map_noncoherent(master))
+		drm_plane_enable_fb_damage_clips(plane);
+
 	/*
 	 * Sharpness settings range is [0,32]
 	 * 0       : nearest-neighbor
-- 
2.30.2
^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: [PATCH v5 2/3] drm: Add and export function drm_fb_cma_sync_non_coherent
  2021-05-23 17:04 ` [PATCH v5 2/3] drm: Add and export function drm_fb_cma_sync_non_coherent Paul Cercueil
@ 2021-05-23 18:40   ` kernel test robot
  0 siblings, 0 replies; 10+ messages in thread
From: kernel test robot @ 2021-05-23 18:40 UTC (permalink / raw)
  To: Paul Cercueil, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: kbuild-all, clang-built-linux, Christoph Hellwig, list, dri-devel,
	linux-kernel, linux-mips
[-- Attachment #1: Type: text/plain, Size: 8828 bytes --]
Hi Paul,
I love your patch! Perhaps something to improve:
[auto build test WARNING on linus/master]
[also build test WARNING on v5.13-rc2 next-20210521]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url:    https://github.com/0day-ci/linux/commits/Paul-Cercueil/Add-option-to-mmap-GEM-buffers-cached/20210524-010735
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 4d7620341eda38573a73ab63c33423534fa38eb9
config: powerpc-randconfig-r013-20210524 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project b4fd512c36ca344a3ff69350219e8b0a67e9472a)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc cross compiling tool for clang build
        # apt-get install binutils-powerpc-linux-gnu
        # https://github.com/0day-ci/linux/commit/2dea3091811eb397337113498cf675a383412c59
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Paul-Cercueil/Add-option-to-mmap-GEM-buffers-cached/20210524-010735
        git checkout 2dea3091811eb397337113498cf675a383412c59
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc 
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:619:
   arch/powerpc/include/asm/io-defs.h:43:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(insb, (unsigned long p, void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:84:1: note: expanded from here
   __do_insb
   ^
   arch/powerpc/include/asm/io.h:556:56: note: expanded from macro '__do_insb'
   #define __do_insb(p, b, n)      readsb((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
                                          ~~~~~~~~~~~~~~~~~~~~~^
   In file included from drivers/gpu/drm/pl111/pl111_display.c:15:
   In file included from include/linux/dma-buf.h:16:
   In file included from include/linux/dma-buf-map.h:9:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:619:
   arch/powerpc/include/asm/io-defs.h:45:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(insw, (unsigned long p, void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:86:1: note: expanded from here
   __do_insw
   ^
   arch/powerpc/include/asm/io.h:557:56: note: expanded from macro '__do_insw'
   #define __do_insw(p, b, n)      readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
                                          ~~~~~~~~~~~~~~~~~~~~~^
   In file included from drivers/gpu/drm/pl111/pl111_display.c:15:
   In file included from include/linux/dma-buf.h:16:
   In file included from include/linux/dma-buf-map.h:9:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:619:
   arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:88:1: note: expanded from here
   __do_insl
   ^
   arch/powerpc/include/asm/io.h:558:56: note: expanded from macro '__do_insl'
   #define __do_insl(p, b, n)      readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
                                          ~~~~~~~~~~~~~~~~~~~~~^
   In file included from drivers/gpu/drm/pl111/pl111_display.c:15:
   In file included from include/linux/dma-buf.h:16:
   In file included from include/linux/dma-buf-map.h:9:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:619:
   arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:90:1: note: expanded from here
   __do_outsb
   ^
   arch/powerpc/include/asm/io.h:559:58: note: expanded from macro '__do_outsb'
   #define __do_outsb(p, b, n)     writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                           ~~~~~~~~~~~~~~~~~~~~~^
   In file included from drivers/gpu/drm/pl111/pl111_display.c:15:
   In file included from include/linux/dma-buf.h:16:
   In file included from include/linux/dma-buf-map.h:9:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:619:
   arch/powerpc/include/asm/io-defs.h:51:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:92:1: note: expanded from here
   __do_outsw
   ^
   arch/powerpc/include/asm/io.h:560:58: note: expanded from macro '__do_outsw'
   #define __do_outsw(p, b, n)     writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                           ~~~~~~~~~~~~~~~~~~~~~^
   In file included from drivers/gpu/drm/pl111/pl111_display.c:15:
   In file included from include/linux/dma-buf.h:16:
   In file included from include/linux/dma-buf-map.h:9:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:619:
   arch/powerpc/include/asm/io-defs.h:53:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:94:1: note: expanded from here
   __do_outsl
   ^
   arch/powerpc/include/asm/io.h:561:58: note: expanded from macro '__do_outsl'
   #define __do_outsl(p, b, n)     writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                           ~~~~~~~~~~~~~~~~~~~~~^
   In file included from drivers/gpu/drm/pl111/pl111_display.c:18:
>> include/drm/drm_fb_cma_helper.h:17:42: warning: declaration of 'struct drm_device' will not be visible outside of this function [-Wvisibility]
   void drm_fb_cma_sync_non_coherent(struct drm_device *drm,
                                            ^
   7 warnings generated.
vim +17 include/drm/drm_fb_cma_helper.h
     9	
    10	struct drm_gem_cma_object *drm_fb_cma_get_gem_obj(struct drm_framebuffer *fb,
    11		unsigned int plane);
    12	
    13	dma_addr_t drm_fb_cma_get_gem_addr(struct drm_framebuffer *fb,
    14					   struct drm_plane_state *state,
    15					   unsigned int plane);
    16	
  > 17	void drm_fb_cma_sync_non_coherent(struct drm_device *drm,
    18					  struct drm_plane_state *old_state,
    19					  struct drm_plane_state *state);
    20	
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 42694 bytes --]
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH v5 0/3] Add option to mmap GEM buffers cached
  2021-05-23 17:04 [PATCH v5 0/3] Add option to mmap GEM buffers cached Paul Cercueil
                   ` (2 preceding siblings ...)
  2021-05-23 17:04 ` [PATCH v5 3/3] drm/ingenic: Add option to alloc cached GEM buffers Paul Cercueil
@ 2021-05-23 19:05 ` Thomas Zimmermann
  2021-05-23 19:19   ` Paul Cercueil
  3 siblings, 1 reply; 10+ messages in thread
From: Thomas Zimmermann @ 2021-05-23 19:05 UTC (permalink / raw)
  To: Paul Cercueil, Maarten Lankhorst, Maxime Ripard, David Airlie,
	Daniel Vetter
  Cc: linux-kernel, dri-devel, linux-mips, Christoph Hellwig, list
[-- Attachment #1.1: Type: text/plain, Size: 2137 bytes --]
Hi
Am 23.05.21 um 19:04 schrieb Paul Cercueil:
> V5 of my patchset which adds the option for having GEM buffers backed by
> non-coherent memory.
> 
> Changes from V4:
> 
> - [2/3]:
>      - Rename to drm_fb_cma_sync_non_coherent
>      - Invert loops for better cache locality
>      - Only sync BOs that have the non-coherent flag
>      - Properly sort includes
>      - Move to drm_fb_cma_helper.c to avoid circular dependency
I'm pretty sure it's still not the right place. That would be something 
like drm_gem_cma_atomic_helper.c, but creating a new file just for a 
single function doesn't make sense.
> 
> - [3/3]:
>      - Fix drm_atomic_get_new_plane_state() used to retrieve the old
>        state
>      - Use custom drm_gem_fb_create()
It's often a better choice to express such differences via different 
data structures (i.e., different instances of drm_mode_config_funcs) but 
it's not a big deal either.
Please go ahaed and merge if no one objects. All patches:
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Best regards
Thomas
>      - Only check damage clips and sync DMA buffers if non-coherent
>        buffers are used
> 
> Cheers,
> -Paul
> 
> Paul Cercueil (3):
>    drm: Add support for GEM buffers backed by non-coherent memory
>    drm: Add and export function drm_fb_cma_sync_non_coherent
>    drm/ingenic: Add option to alloc cached GEM buffers
> 
>   drivers/gpu/drm/drm_fb_cma_helper.c       | 46 ++++++++++++++++++
>   drivers/gpu/drm/drm_gem_cma_helper.c      | 38 +++++++++++----
>   drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 59 +++++++++++++++++++++--
>   drivers/gpu/drm/ingenic/ingenic-drm.h     |  1 +
>   drivers/gpu/drm/ingenic/ingenic-ipu.c     | 21 ++++++--
>   include/drm/drm_fb_cma_helper.h           |  4 ++
>   include/drm/drm_gem_cma_helper.h          |  3 ++
>   7 files changed, 156 insertions(+), 16 deletions(-)
> 
-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH v5 0/3] Add option to mmap GEM buffers cached
  2021-05-23 19:05 ` [PATCH v5 0/3] Add option to mmap GEM buffers cached Thomas Zimmermann
@ 2021-05-23 19:19   ` Paul Cercueil
  2021-05-25 10:03     ` Thomas Zimmermann
  0 siblings, 1 reply; 10+ messages in thread
From: Paul Cercueil @ 2021-05-23 19:19 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: Maarten Lankhorst, Maxime Ripard, David Airlie, Daniel Vetter,
	linux-kernel, dri-devel, linux-mips, Christoph Hellwig, list
Hi Thomas,
Le dim., mai 23 2021 at 21:05:30 +0200, Thomas Zimmermann 
<tzimmermann@suse.de> a écrit :
> Hi
> 
> Am 23.05.21 um 19:04 schrieb Paul Cercueil:
>> V5 of my patchset which adds the option for having GEM buffers 
>> backed by
>> non-coherent memory.
>> 
>> Changes from V4:
>> 
>> - [2/3]:
>>      - Rename to drm_fb_cma_sync_non_coherent
>>      - Invert loops for better cache locality
>>      - Only sync BOs that have the non-coherent flag
>>      - Properly sort includes
>>      - Move to drm_fb_cma_helper.c to avoid circular dependency
> 
> I'm pretty sure it's still not the right place. That would be 
> something like drm_gem_cma_atomic_helper.c, but creating a new file 
> just for a single function doesn't make sense.
drm_fb_cma_sync_non_coherent calls drm_fb_cma_* functions, so it's a 
better match than its former location (which wasn't good as it created 
a circular dependency between drm.ko and drm-kms-helper.ko).
Do you have a better idea?
>> 
>> - [3/3]:
>>      - Fix drm_atomic_get_new_plane_state() used to retrieve the old
>>        state
>>      - Use custom drm_gem_fb_create()
> 
> It's often a better choice to express such differences via different 
> data structures (i.e., different instances of drm_mode_config_funcs) 
> but it's not a big deal either.
The different drm_mode_config_funcs instances already exist in 
drm_gem_framebuffer_helper.c but are static, and drm_gem_fb_create() 
and drm_gem_fb_create_with_dirty() are just tiny wrappers around 
drm_gem_fb_create_with_funcs() with the corresponding 
drm_mode_config_funcs instance. I didn't want to copy them to 
ingenic-drm-drv.c, but maybe I can export the symbols and use 
drm_gem_fb_create_with_funcs() directly?
> Please go ahaed and merge if no one objects. All patches:
> 
> Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Thanks!
Cheers,
-Paul
> Best regards
> Thomas
> 
>>      - Only check damage clips and sync DMA buffers if non-coherent
>>        buffers are used
>> 
>> Cheers,
>> -Paul
>> 
>> Paul Cercueil (3):
>>    drm: Add support for GEM buffers backed by non-coherent memory
>>    drm: Add and export function drm_fb_cma_sync_non_coherent
>>    drm/ingenic: Add option to alloc cached GEM buffers
>> 
>>   drivers/gpu/drm/drm_fb_cma_helper.c       | 46 ++++++++++++++++++
>>   drivers/gpu/drm/drm_gem_cma_helper.c      | 38 +++++++++++----
>>   drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 59 
>> +++++++++++++++++++++--
>>   drivers/gpu/drm/ingenic/ingenic-drm.h     |  1 +
>>   drivers/gpu/drm/ingenic/ingenic-ipu.c     | 21 ++++++--
>>   include/drm/drm_fb_cma_helper.h           |  4 ++
>>   include/drm/drm_gem_cma_helper.h          |  3 ++
>>   7 files changed, 156 insertions(+), 16 deletions(-)
>> 
> 
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
> 
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH v5 0/3] Add option to mmap GEM buffers cached
  2021-05-23 19:19   ` Paul Cercueil
@ 2021-05-25 10:03     ` Thomas Zimmermann
  0 siblings, 0 replies; 10+ messages in thread
From: Thomas Zimmermann @ 2021-05-25 10:03 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: David Airlie, linux-kernel, dri-devel, linux-mips,
	Christoph Hellwig, list
[-- Attachment #1.1: Type: text/plain, Size: 1427 bytes --]
Hi
Am 23.05.21 um 21:19 schrieb Paul Cercueil:
> Hi Thomas,
> 
> Le dim., mai 23 2021 at 21:05:30 +0200, Thomas Zimmermann 
> <tzimmermann@suse.de> a écrit :
>> Hi
>>
>> Am 23.05.21 um 19:04 schrieb Paul Cercueil:
>>> V5 of my patchset which adds the option for having GEM buffers backed 
by
>>> non-coherent memory.
>>>
>>> Changes from V4:
>>>
>>> - [2/3]:
>>>      - Rename to drm_fb_cma_sync_non_coherent
>>>      - Invert loops for better cache locality
>>>      - Only sync BOs that have the non-coherent flag
>>>      - Properly sort includes
>>>      - Move to drm_fb_cma_helper.c to avoid circular dependency
>>
>> I'm pretty sure it's still not the right place. That would be 
>> something like drm_gem_cma_atomic_helper.c, but creating a new file 
>> just for a single function doesn't make sense.
> 
> drm_fb_cma_sync_non_coherent calls drm_fb_cma_* functions, so it's a 
> better match than its former location (which wasn't good as it created a 
> circular dependency between drm.ko and drm-kms-helper.ko).
> 
> Do you have a better idea?
No, it was more of a rant than a review comment. Please go ahead and 
merge the patchset.
Best regards
Thomas
-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH v5 1/3] drm: Add support for GEM buffers backed by non-coherent memory
  2021-05-23 17:04 ` [PATCH v5 1/3] drm: Add support for GEM buffers backed by non-coherent memory Paul Cercueil
@ 2021-05-27 10:40   ` Tomi Valkeinen
  2021-05-27 12:43     ` Paul Cercueil
  0 siblings, 1 reply; 10+ messages in thread
From: Tomi Valkeinen @ 2021-05-27 10:40 UTC (permalink / raw)
  To: Paul Cercueil, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: linux-kernel, dri-devel, linux-mips, Christoph Hellwig, list,
	Lokesh Vutla, Yadav, Pratyush
On 23/05/2021 20:04, Paul Cercueil wrote:
> Having GEM buffers backed by non-coherent memory is interesting in the
> particular case where it is faster to render to a non-coherent buffer
> then sync the data cache, than to render to a write-combine buffer, and
> (by extension) much faster than using a shadow buffer. This is true for
> instance on some Ingenic SoCs, where even simple blits (e.g. memcpy)
> are about three times faster using this method.
> 
> Add a 'map_noncoherent' flag to the drm_gem_cma_object structure, which
> can be set by the drivers when they create the dumb buffer.
> 
> Since this really only applies to software rendering, disable this flag
> as soon as the CMA objects are exported via PRIME.
> 
> v3: New patch. Now uses a simple 'map_noncoherent' flag to control how
>      the objects are mapped, and use the new dma_mmap_pages function.
> 
> v4: Make sure map_noncoherent is always disabled when creating GEM
>      objects meant to be used with dma-buf.
> 
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
>   drivers/gpu/drm/drm_gem_cma_helper.c | 38 +++++++++++++++++++++-------
>   include/drm/drm_gem_cma_helper.h     |  3 +++
>   2 files changed, 32 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
> index 7942cf05cd93..235c7a63da2b 100644
> --- a/drivers/gpu/drm/drm_gem_cma_helper.c
> +++ b/drivers/gpu/drm/drm_gem_cma_helper.c
> @@ -46,6 +46,7 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
>    * __drm_gem_cma_create - Create a GEM CMA object without allocating memory
>    * @drm: DRM device
>    * @size: size of the object to allocate
> + * @private: true if used for internal purposes
>    *
>    * This function creates and initializes a GEM CMA object of the given size,
>    * but doesn't allocate any memory to back the object.
> @@ -55,11 +56,11 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
>    * error code on failure.
>    */
>   static struct drm_gem_cma_object *
> -__drm_gem_cma_create(struct drm_device *drm, size_t size)
> +__drm_gem_cma_create(struct drm_device *drm, size_t size, bool private)
>   {
>   	struct drm_gem_cma_object *cma_obj;
>   	struct drm_gem_object *gem_obj;
> -	int ret;
> +	int ret = 0;
>   
>   	if (drm->driver->gem_create_object)
>   		gem_obj = drm->driver->gem_create_object(drm, size);
> @@ -73,7 +74,14 @@ __drm_gem_cma_create(struct drm_device *drm, size_t size)
>   
>   	cma_obj = container_of(gem_obj, struct drm_gem_cma_object, base);
>   
> -	ret = drm_gem_object_init(drm, gem_obj, size);
> +	if (private) {
> +		drm_gem_private_object_init(drm, gem_obj, size);
> +
> +		/* Always use writecombine for dma-buf mappings */
> +		cma_obj->map_noncoherent = false;
> +	} else {
> +		ret = drm_gem_object_init(drm, gem_obj, size);
> +	}
>   	if (ret)
>   		goto error;
>   
> @@ -111,12 +119,19 @@ struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
>   
>   	size = round_up(size, PAGE_SIZE);
>   
> -	cma_obj = __drm_gem_cma_create(drm, size);
> +	cma_obj = __drm_gem_cma_create(drm, size, false);
>   	if (IS_ERR(cma_obj))
>   		return cma_obj;
>   
> -	cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
> -				      GFP_KERNEL | __GFP_NOWARN);
> +	if (cma_obj->map_noncoherent) {
> +		cma_obj->vaddr = dma_alloc_noncoherent(drm->dev, size,
> +						       &cma_obj->paddr,
> +						       DMA_TO_DEVICE,
> +						       GFP_KERNEL | __GFP_NOWARN);
> +	} else {
> +		cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
> +					      GFP_KERNEL | __GFP_NOWARN);
> +	}
>   	if (!cma_obj->vaddr) {
>   		drm_dbg(drm, "failed to allocate buffer with size %zu\n",
>   			 size);
> @@ -432,7 +447,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
>   		return ERR_PTR(-EINVAL);
>   
>   	/* Create a CMA GEM buffer. */
> -	cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
> +	cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size, true);
>   	if (IS_ERR(cma_obj))
>   		return ERR_CAST(cma_obj);
>   
> @@ -499,8 +514,13 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
>   
>   	cma_obj = to_drm_gem_cma_obj(obj);
>   
> -	ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr,
> -			  cma_obj->paddr, vma->vm_end - vma->vm_start);
> +	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> +	if (!cma_obj->map_noncoherent)
> +		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> +
> +	ret = dma_mmap_pages(cma_obj->base.dev->dev,
> +			     vma, vma->vm_end - vma->vm_start,
> +			     virt_to_page(cma_obj->vaddr));
This breaks mmap on TI's J7 EVM (tidss driver). All DRM apps just die 
when doing mmap. Changing these lines back to dma_mmap_wc() makes it work.
Is dma_alloc_wc() even compatible with dma_mmap_pages()?
  Tomi
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH v5 1/3] drm: Add support for GEM buffers backed by non-coherent memory
  2021-05-27 10:40   ` Tomi Valkeinen
@ 2021-05-27 12:43     ` Paul Cercueil
  0 siblings, 0 replies; 10+ messages in thread
From: Paul Cercueil @ 2021-05-27 12:43 UTC (permalink / raw)
  To: Tomi Valkeinen
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Daniel Vetter, linux-kernel, dri-devel, linux-mips,
	Christoph Hellwig, list, Lokesh Vutla, Yadav, Pratyush
Hi Tomi,
Le jeu., mai 27 2021 at 13:40:07 +0300, Tomi Valkeinen 
<tomi.valkeinen@ideasonboard.com> a écrit :
> On 23/05/2021 20:04, Paul Cercueil wrote:
>> Having GEM buffers backed by non-coherent memory is interesting in 
>> the
>> particular case where it is faster to render to a non-coherent buffer
>> then sync the data cache, than to render to a write-combine buffer, 
>> and
>> (by extension) much faster than using a shadow buffer. This is true 
>> for
>> instance on some Ingenic SoCs, where even simple blits (e.g. memcpy)
>> are about three times faster using this method.
>> 
>> Add a 'map_noncoherent' flag to the drm_gem_cma_object structure, 
>> which
>> can be set by the drivers when they create the dumb buffer.
>> 
>> Since this really only applies to software rendering, disable this 
>> flag
>> as soon as the CMA objects are exported via PRIME.
>> 
>> v3: New patch. Now uses a simple 'map_noncoherent' flag to control 
>> how
>>      the objects are mapped, and use the new dma_mmap_pages function.
>> 
>> v4: Make sure map_noncoherent is always disabled when creating GEM
>>      objects meant to be used with dma-buf.
>> 
>> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
>> Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
>> ---
>>   drivers/gpu/drm/drm_gem_cma_helper.c | 38 
>> +++++++++++++++++++++-------
>>   include/drm/drm_gem_cma_helper.h     |  3 +++
>>   2 files changed, 32 insertions(+), 9 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c 
>> b/drivers/gpu/drm/drm_gem_cma_helper.c
>> index 7942cf05cd93..235c7a63da2b 100644
>> --- a/drivers/gpu/drm/drm_gem_cma_helper.c
>> +++ b/drivers/gpu/drm/drm_gem_cma_helper.c
>> @@ -46,6 +46,7 @@ static const struct drm_gem_object_funcs 
>> drm_gem_cma_default_funcs = {
>>    * __drm_gem_cma_create - Create a GEM CMA object without 
>> allocating memory
>>    * @drm: DRM device
>>    * @size: size of the object to allocate
>> + * @private: true if used for internal purposes
>>    *
>>    * This function creates and initializes a GEM CMA object of the 
>> given size,
>>    * but doesn't allocate any memory to back the object.
>> @@ -55,11 +56,11 @@ static const struct drm_gem_object_funcs 
>> drm_gem_cma_default_funcs = {
>>    * error code on failure.
>>    */
>>   static struct drm_gem_cma_object *
>> -__drm_gem_cma_create(struct drm_device *drm, size_t size)
>> +__drm_gem_cma_create(struct drm_device *drm, size_t size, bool 
>> private)
>>   {
>>   	struct drm_gem_cma_object *cma_obj;
>>   	struct drm_gem_object *gem_obj;
>> -	int ret;
>> +	int ret = 0;
>>   \x7f  	if (drm->driver->gem_create_object)
>>   		gem_obj = drm->driver->gem_create_object(drm, size);
>> @@ -73,7 +74,14 @@ __drm_gem_cma_create(struct drm_device *drm, 
>> size_t size)
>>   \x7f  	cma_obj = container_of(gem_obj, struct drm_gem_cma_object, 
>> base);
>>   \x7f-	ret = drm_gem_object_init(drm, gem_obj, size);
>> +	if (private) {
>> +		drm_gem_private_object_init(drm, gem_obj, size);
>> +
>> +		/* Always use writecombine for dma-buf mappings */
>> +		cma_obj->map_noncoherent = false;
>> +	} else {
>> +		ret = drm_gem_object_init(drm, gem_obj, size);
>> +	}
>>   	if (ret)
>>   		goto error;
>>   \x7f@@ -111,12 +119,19 @@ struct drm_gem_cma_object 
>> *drm_gem_cma_create(struct drm_device *drm,
>>   \x7f  	size = round_up(size, PAGE_SIZE);
>>   \x7f-	cma_obj = __drm_gem_cma_create(drm, size);
>> +	cma_obj = __drm_gem_cma_create(drm, size, false);
>>   	if (IS_ERR(cma_obj))
>>   		return cma_obj;
>>   \x7f-	cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
>> -				      GFP_KERNEL | __GFP_NOWARN);
>> +	if (cma_obj->map_noncoherent) {
>> +		cma_obj->vaddr = dma_alloc_noncoherent(drm->dev, size,
>> +						       &cma_obj->paddr,
>> +						       DMA_TO_DEVICE,
>> +						       GFP_KERNEL | __GFP_NOWARN);
>> +	} else {
>> +		cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
>> +					      GFP_KERNEL | __GFP_NOWARN);
>> +	}
>>   	if (!cma_obj->vaddr) {
>>   		drm_dbg(drm, "failed to allocate buffer with size %zu\n",
>>   			 size);
>> @@ -432,7 +447,7 @@ drm_gem_cma_prime_import_sg_table(struct 
>> drm_device *dev,
>>   		return ERR_PTR(-EINVAL);
>>   \x7f  	/* Create a CMA GEM buffer. */
>> -	cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
>> +	cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size, true);
>>   	if (IS_ERR(cma_obj))
>>   		return ERR_CAST(cma_obj);
>>   \x7f@@ -499,8 +514,13 @@ int drm_gem_cma_mmap(struct drm_gem_object 
>> *obj, struct vm_area_struct *vma)
>>   \x7f  	cma_obj = to_drm_gem_cma_obj(obj);
>>   \x7f-	ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr,
>> -			  cma_obj->paddr, vma->vm_end - vma->vm_start);
>> +	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>> +	if (!cma_obj->map_noncoherent)
>> +		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
>> +
>> +	ret = dma_mmap_pages(cma_obj->base.dev->dev,
>> +			     vma, vma->vm_end - vma->vm_start,
>> +			     virt_to_page(cma_obj->vaddr));
> 
> This breaks mmap on TI's J7 EVM (tidss driver). All DRM apps just die 
> when doing mmap. Changing these lines back to dma_mmap_wc() makes it 
> work.
> 
> Is dma_alloc_wc() even compatible with dma_mmap_pages()?
My bad, dma_mmap_wc() is not just a regular mmap with writecombine page 
protection.
The solution, I guess, would be to call dma_mmap_wc() if 
(!cma_obj->map_noncoherent). I can send a patch later this week, unless 
you already have one?
Cheers,
-Paul
^ permalink raw reply	[flat|nested] 10+ messages in thread
end of thread, other threads:[~2021-05-27 15:31 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-05-23 17:04 [PATCH v5 0/3] Add option to mmap GEM buffers cached Paul Cercueil
2021-05-23 17:04 ` [PATCH v5 1/3] drm: Add support for GEM buffers backed by non-coherent memory Paul Cercueil
2021-05-27 10:40   ` Tomi Valkeinen
2021-05-27 12:43     ` Paul Cercueil
2021-05-23 17:04 ` [PATCH v5 2/3] drm: Add and export function drm_fb_cma_sync_non_coherent Paul Cercueil
2021-05-23 18:40   ` kernel test robot
2021-05-23 17:04 ` [PATCH v5 3/3] drm/ingenic: Add option to alloc cached GEM buffers Paul Cercueil
2021-05-23 19:05 ` [PATCH v5 0/3] Add option to mmap GEM buffers cached Thomas Zimmermann
2021-05-23 19:19   ` Paul Cercueil
2021-05-25 10:03     ` Thomas Zimmermann
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).