linux-omap.vger.kernel.org archive mirror
* [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
@ 2010-05-01 20:44 Ohad Ben-Cohen
  2010-05-01 20:44 ` [RFC/PATCH 1/6] DSPBRIDGE: add memory_map_info to PROC Ohad Ben-Cohen
                   ` (6 more replies)
  0 siblings, 7 replies; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-01 20:44 UTC (permalink / raw)
  To: linux-omap
  Cc: Kanigeri Hari, Omar Ramirez Luna, Guzman Lugo Fernando,
	Menon Nishanth, Hiroshi Doyu, Ohad Ben-Cohen

This patchset introduces an approach that eliminates the direct calls
to follow_page and to the low-level cache APIs.

The patchset works by caching the page information while memory
is mapped, and then using that information later when needed,
instead of calling follow_page. The low-level cache API is then
replaced by the standard DMA API.

A few key points in the current approach that I'd be happy to hear
your feedback about:
1. The new mapping + page information is currently cached in the
   proc object, but it might fit better inside dmm map objects
   (by enhancing the DMM layer to support the required data caching,
   storing and retrieving).
2. The information is stored in a linked list. That's fine as long
   as the number of memory mappings per application is not extremely
   high. If that assumption turns out to be wrong, a different
   underlying data structure might be better (a hash table, priority
   tree, etc.).
3. Moving to the standard DMA API completely changes the user's point
   of view: users no longer need to think in terms of which cache
   manipulation is required; instead, they just tell dspbridge
   before a DMA transfer begins and after it ends. Between the begin
   and end calls, the buffer "belongs" to the DSP and must not
   be accessed by the user.

   The patchset renames the flush ioctl to begin_dma_to_dsp and
   the invalidate ioctl to begin_dma_from_dsp. Both functions
   eventually call dma_map_sg, with the former requesting a
   DMA_BIDIRECTIONAL direction, and the latter requesting a
   DMA_FROM_DEVICE direction.
   In addition, the patchset adds two new APIs which call dma_unmap_sg:
   end_dma_to_dsp and end_dma_from_dsp.

   Ideally, there would be only a single begin_dma command and a single
   end_dma one, each accepting an additional parameter that determines
   the direction of the transfer. Such an approach would be more
   versatile and cleaner, but it would also break all user space apps
   that use dspbridge today.

Notes:
1. During my tests, a few testsuite scenarios failed because the
   test cases called flush/invalidate on memory buffers they didn't
   map beforehand.
   I consider that a misuse of the API, and thus a testsuite error.
2. The global bridge device struct is accessed by adding an 'extern'
   declaration to proc. This should be handled in a different patch
   series: the struct should not be global; instead, it should be
   accessible to the dspbridge code via one of the context objects.
   That way we will also be able to transform pr_* prints into dev_*
   prints.
3. The patchset is not yet rebased on the latest dspbridge commits.
   It's currently based on 13e2573f2162b76d45313e790fc67a0d7672930b.
4. The patchset was tested with the bridge testsuite and the dmm sample
   application running on ZOOM3 / 2.6.33.

I'd like to thank Hari and Fernando for initial review of this patchset.

Please review and let me know your comments.

Thanks,
Ohad.

---
If you want, you can also reach me at <  ohadb at ti dot com  >.

Ohad Ben-Cohen (6):
  DSPBRIDGE: add memory_map_info to PROC
  DSPBRIDGE: remember mapping and page info in proc_map
  DSPBRIDGE: remove mapping information in proc_unmap
  DSPBRIDGE: do not call follow_page
  DSPBRIDGE: do not use low level cache manipulation API
  DSPBRIDGE: add dspbridge API to mark end of DMA

 arch/arm/plat-omap/include/dspbridge/_dcd.h     |    6 +-
 arch/arm/plat-omap/include/dspbridge/proc.h     |    8 +-
 arch/arm/plat-omap/include/dspbridge/wcdioctl.h |    6 +-
 arch/arm/plat-omap/include/dspbridge/wmd.h      |    3 +-
 drivers/dsp/bridge/pmgr/wcd.c                   |   39 ++-
 drivers/dsp/bridge/rmgr/proc.c                  |  440 ++++++++++++++++++++---
 drivers/dsp/bridge/wmd/io_sm.c                  |   10 +-
 drivers/dsp/bridge/wmd/tiomap3430.c             |   11 +-
 8 files changed, 448 insertions(+), 75 deletions(-)


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [RFC/PATCH 1/6] DSPBRIDGE: add memory_map_info to PROC
  2010-05-01 20:44 [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues Ohad Ben-Cohen
@ 2010-05-01 20:44 ` Ohad Ben-Cohen
  2010-05-01 20:44 ` [RFC/PATCH 2/6] DSPBRIDGE: remember mapping and page info in proc_map Ohad Ben-Cohen
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-01 20:44 UTC (permalink / raw)
  To: linux-omap
  Cc: Kanigeri Hari, Omar Ramirez Luna, Guzman Lugo Fernando,
	Menon Nishanth, Hiroshi Doyu, Ohad Ben-Cohen

Add the memory_map_info structure which will be used to maintain
mapping and page information. Every memory mapping requested by the
MM application will be remembered and kept in a linked list
inside the proc object.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
---
If you want, you can also reach me at <  ohadb at ti dot com  >.

 drivers/dsp/bridge/rmgr/proc.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/drivers/dsp/bridge/rmgr/proc.c b/drivers/dsp/bridge/rmgr/proc.c
index 1f7dd09..95a194a 100644
--- a/drivers/dsp/bridge/rmgr/proc.c
+++ b/drivers/dsp/bridge/rmgr/proc.c
@@ -17,6 +17,8 @@
  */
 
 /* ------------------------------------ Host OS */
+#include <linux/list.h>
+#include <linux/spinlock.h>
 #include <dspbridge/host_os.h>
 
 /*  ----------------------------------- DSP/BIOS Bridge */
@@ -102,6 +104,20 @@ struct proc_object {
 	struct bridge_drv_interface *intf_fxns;	/* Function interface to WMD */
 	char *psz_last_coff;
 	struct list_head proc_list;
+
+	/* memory mapping information */
+	struct list_head maps;
+	spinlock_t maps_lock;
+};
+
+/* used to cache memory mapping information */
+struct memory_map_info {
+	struct list_head node;
+	struct page **pages;
+	u32 mpu_addr;
+	u32 dsp_addr;
+	u32 size;
+	u32 num_usr_pgs;
 };
 
 static u32 refs;
-- 
1.6.3.3



* [RFC/PATCH 2/6] DSPBRIDGE: remember mapping and page info in proc_map
  2010-05-01 20:44 [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues Ohad Ben-Cohen
  2010-05-01 20:44 ` [RFC/PATCH 1/6] DSPBRIDGE: add memory_map_info to PROC Ohad Ben-Cohen
@ 2010-05-01 20:44 ` Ohad Ben-Cohen
  2010-05-15  8:34   ` Felipe Contreras
  2010-05-01 20:44 ` [RFC/PATCH 3/6] DSPBRIDGE: remove mapping information in proc_unmap Ohad Ben-Cohen
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-01 20:44 UTC (permalink / raw)
  To: linux-omap
  Cc: Kanigeri Hari, Omar Ramirez Luna, Guzman Lugo Fernando,
	Menon Nishanth, Hiroshi Doyu, Ohad Ben-Cohen

Every time the MM application calls proc_map to map
a memory area, remember the details of that mapping,
together with the related page structures.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
---
If you want, you can also reach me at <  ohadb at ti dot com  >.

 arch/arm/plat-omap/include/dspbridge/wmd.h |    3 +-
 drivers/dsp/bridge/rmgr/proc.c             |   55 ++++++++++++++++++++++++++--
 drivers/dsp/bridge/wmd/io_sm.c             |   10 +++--
 drivers/dsp/bridge/wmd/tiomap3430.c        |   11 +++++-
 4 files changed, 68 insertions(+), 11 deletions(-)

diff --git a/arch/arm/plat-omap/include/dspbridge/wmd.h b/arch/arm/plat-omap/include/dspbridge/wmd.h
index f9883db..37ee0f3 100644
--- a/arch/arm/plat-omap/include/dspbridge/wmd.h
+++ b/arch/arm/plat-omap/include/dspbridge/wmd.h
@@ -182,7 +182,8 @@ typedef dsp_status(*fxn_brd_memwrite) (struct wmd_dev_context
 typedef dsp_status(*fxn_brd_memmap) (struct wmd_dev_context
 				     * hDevContext, u32 ul_mpu_addr,
 				     u32 ulVirtAddr, u32 ul_num_bytes,
-				     u32 ulMapAttrs);
+				     u32 ulMapAttrs,
+				     struct page **mapped_pages);
 
 /*
  *  ======== bridge_brd_mem_un_map ========
diff --git a/drivers/dsp/bridge/rmgr/proc.c b/drivers/dsp/bridge/rmgr/proc.c
index 95a194a..b03232f 100644
--- a/drivers/dsp/bridge/rmgr/proc.c
+++ b/drivers/dsp/bridge/rmgr/proc.c
@@ -130,6 +130,45 @@ static s32 get_envp_count(char **envp);
 static char **prepend_envp(char **new_envp, char **envp, s32 envp_elems,
 			   s32 cnew_envp, char *szVar);
 
+/* remember mapping information */
+static struct memory_map_info *add_mapping_info(struct proc_object *pr_obj,
+				u32 mpu_addr, u32 dsp_addr, u32 size)
+{
+	struct memory_map_info *map_info;
+
+	u32 num_usr_pgs = size / PG_SIZE4K;
+
+	pr_debug("%s: adding map info: mpu_addr 0x%x virt 0x%x size 0x%x\n",
+						__func__, mpu_addr,
+						dsp_addr, size);
+
+	map_info = kzalloc(sizeof(struct memory_map_info), GFP_KERNEL);
+	if (!map_info) {
+		pr_err("%s: kzalloc failed\n", __func__);
+		return NULL;
+	}
+	INIT_LIST_HEAD(&map_info->node);
+
+	map_info->pages = kcalloc(num_usr_pgs, sizeof(struct page *),
+							GFP_KERNEL);
+	if (!map_info->pages) {
+		pr_err("%s: kcalloc failed\n", __func__);
+		kfree(map_info);
+		return NULL;
+	}
+
+	map_info->mpu_addr = mpu_addr;
+	map_info->dsp_addr = dsp_addr;
+	map_info->size = size;
+	map_info->num_usr_pgs = num_usr_pgs;
+
+	spin_lock(&pr_obj->maps_lock);
+	list_add(&map_info->node, &pr_obj->maps);
+	spin_unlock(&pr_obj->maps_lock);
+
+	return map_info;
+}
+
 /*
  *  ======== proc_attach ========
  *  Purpose:
@@ -185,6 +224,8 @@ proc_attach(u32 processor_id,
 	p_proc_object->process = current->tgid;
 
 	INIT_LIST_HEAD(&p_proc_object->proc_list);
+	INIT_LIST_HEAD(&p_proc_object->maps);
+	spin_lock_init(&p_proc_object->maps_lock);
 
 	if (attr_in)
 		p_proc_object->utimeout = attr_in->utimeout;
@@ -1091,6 +1132,7 @@ dsp_status proc_map(void *hprocessor, void *pmpu_addr, u32 ul_size,
 	dsp_status status = DSP_SOK;
 	struct proc_object *p_proc_object = (struct proc_object *)hprocessor;
 	struct dmm_map_object *map_obj;
+	struct memory_map_info *map_info;
 
 #ifdef CONFIG_BRIDGE_CACHE_LINE_CHECK
 	if ((ul_map_attr & BUFMODE_MASK) != RBUF) {
@@ -1121,10 +1163,15 @@ dsp_status proc_map(void *hprocessor, void *pmpu_addr, u32 ul_size,
 
 	/* Add mapping to the page tables. */
 	if (DSP_SUCCEEDED(status)) {
-
-		status = (*p_proc_object->intf_fxns->pfn_brd_mem_map)
-		    (p_proc_object->hwmd_context, pa_align, va_align,
-		     size_align, ul_map_attr);
+		/* cache mapping information */
+		map_info = add_mapping_info(p_proc_object, pa_align, va_align,
+						size_align);
+		if (!map_info)
+			status = DSP_EMEMORY;
+		else
+			status = (*p_proc_object->intf_fxns->pfn_brd_mem_map)
+			    (p_proc_object->hwmd_context, pa_align, va_align,
+			     size_align, ul_map_attr, map_info->pages);
 	}
 	if (DSP_SUCCEEDED(status)) {
 		/* Mapped address = MSB of VA | LSB of PA */
diff --git a/drivers/dsp/bridge/wmd/io_sm.c b/drivers/dsp/bridge/wmd/io_sm.c
index 1b5d977..d8cd9e3 100644
--- a/drivers/dsp/bridge/wmd/io_sm.c
+++ b/drivers/dsp/bridge/wmd/io_sm.c
@@ -505,7 +505,8 @@ dsp_status bridge_io_on_loaded(struct io_mgr *hio_mgr)
 				    hio_mgr->intf_fxns->
 				    pfn_brd_mem_map(hio_mgr->hwmd_context,
 						    pa_curr, va_curr,
-						    page_size[i], map_attrs);
+						    page_size[i], map_attrs,
+						    NULL);
 				if (DSP_FAILED(status))
 					goto func_end;
 				pa_curr += page_size[i];
@@ -570,7 +571,8 @@ dsp_status bridge_io_on_loaded(struct io_mgr *hio_mgr)
 				    hio_mgr->intf_fxns->
 				    pfn_brd_mem_map(hio_mgr->hwmd_context,
 						    pa_curr, va_curr,
-						    page_size[i], map_attrs);
+						    page_size[i], map_attrs,
+						    NULL);
 				dev_dbg(bridge,
 					"shm MMU PTE entry PA %x"
 					" VA %x DSP_VA %x Size %x\n",
@@ -639,7 +641,7 @@ dsp_status bridge_io_on_loaded(struct io_mgr *hio_mgr)
 				     hio_mgr->ext_proc_info.ty_tlb[i].
 				     ul_gpp_phys,
 				     hio_mgr->ext_proc_info.ty_tlb[i].
-				     ul_dsp_virt, 0x100000, map_attrs);
+				     ul_dsp_virt, 0x100000, map_attrs, NULL);
 			}
 		}
 		if (DSP_FAILED(status))
@@ -658,7 +660,7 @@ dsp_status bridge_io_on_loaded(struct io_mgr *hio_mgr)
 		status = hio_mgr->intf_fxns->pfn_brd_mem_map
 		    (hio_mgr->hwmd_context, l4_peripheral_table[i].phys_addr,
 		     l4_peripheral_table[i].dsp_virt_addr, HW_PAGE_SIZE4KB,
-		     map_attrs);
+		     map_attrs, NULL);
 		if (DSP_FAILED(status))
 			goto func_end;
 		i++;
diff --git a/drivers/dsp/bridge/wmd/tiomap3430.c b/drivers/dsp/bridge/wmd/tiomap3430.c
index 0c6cf2f..720c0c0 100644
--- a/drivers/dsp/bridge/wmd/tiomap3430.c
+++ b/drivers/dsp/bridge/wmd/tiomap3430.c
@@ -107,7 +107,8 @@ static dsp_status bridge_brd_mem_write(struct wmd_dev_context *dev_context,
 				    u32 ul_num_bytes, u32 ulMemType);
 static dsp_status bridge_brd_mem_map(struct wmd_dev_context *hDevContext,
 				  u32 ul_mpu_addr, u32 ulVirtAddr,
-				  u32 ul_num_bytes, u32 ul_map_attr);
+				  u32 ul_num_bytes, u32 ul_map_attr,
+				  struct page **mapped_pages);
 static dsp_status bridge_brd_mem_un_map(struct wmd_dev_context *hDevContext,
 				     u32 ulVirtAddr, u32 ul_num_bytes);
 static dsp_status bridge_dev_create(OUT struct wmd_dev_context **ppDevContext,
@@ -948,6 +949,7 @@ static dsp_status bridge_dev_create(OUT struct wmd_dev_context **ppDevContext,
 		status = DSP_EMEMORY;
 		goto func_end;
 	}
+
 	status = cfg_get_host_resources((struct cfg_devnode *)
 					drv_get_first_dev_extension(),
 					&resources);
@@ -1276,7 +1278,8 @@ static dsp_status bridge_brd_mem_write(struct wmd_dev_context *hDevContext,
  */
 static dsp_status bridge_brd_mem_map(struct wmd_dev_context *hDevContext,
 				  u32 ul_mpu_addr, u32 ulVirtAddr,
-				  u32 ul_num_bytes, u32 ul_map_attr)
+				  u32 ul_num_bytes, u32 ul_map_attr,
+				  struct page **mapped_pages)
 {
 	u32 attrs;
 	dsp_status status = DSP_SOK;
@@ -1438,12 +1441,16 @@ static dsp_status bridge_brd_mem_map(struct wmd_dev_context *hDevContext,
 					bad_page_dump(page_to_phys(mapped_page),
 						      mapped_page);
 				}
+
 				status = pte_set(dev_context->pt_attrs,
 						 page_to_phys(mapped_page), va,
 						 HW_PAGE_SIZE4KB, &hw_attrs);
 				if (DSP_FAILED(status))
 					break;
 
+				if (mapped_pages)
+					mapped_pages[pg_i] = mapped_page;
+
 				va += HW_PAGE_SIZE4KB;
 				ul_mpu_addr += HW_PAGE_SIZE4KB;
 			} else {
-- 
1.6.3.3



* [RFC/PATCH 3/6] DSPBRIDGE: remove mapping information in proc_unmap
  2010-05-01 20:44 [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues Ohad Ben-Cohen
  2010-05-01 20:44 ` [RFC/PATCH 1/6] DSPBRIDGE: add memory_map_info to PROC Ohad Ben-Cohen
  2010-05-01 20:44 ` [RFC/PATCH 2/6] DSPBRIDGE: remember mapping and page info in proc_map Ohad Ben-Cohen
@ 2010-05-01 20:44 ` Ohad Ben-Cohen
  2010-05-15  8:38   ` Felipe Contreras
  2010-05-01 20:44 ` [RFC/PATCH 4/6] DSPBRIDGE: do not call follow_page Ohad Ben-Cohen
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-01 20:44 UTC (permalink / raw)
  To: linux-omap
  Cc: Kanigeri Hari, Omar Ramirez Luna, Guzman Lugo Fernando,
	Menon Nishanth, Hiroshi Doyu, Ohad Ben-Cohen

Clean up all mapping information resources whenever
a buffer is unmapped.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
---
If you want, you can also reach me at <  ohadb at ti dot com  >.

 drivers/dsp/bridge/rmgr/proc.c |   43 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/drivers/dsp/bridge/rmgr/proc.c b/drivers/dsp/bridge/rmgr/proc.c
index b03232f..ebb11b1 100644
--- a/drivers/dsp/bridge/rmgr/proc.c
+++ b/drivers/dsp/bridge/rmgr/proc.c
@@ -169,6 +169,46 @@ static struct memory_map_info *add_mapping_info(struct proc_object *pr_obj,
 	return map_info;
 }
 
+static int match_exact_map_info(struct memory_map_info *map_info,
+					u32 dsp_addr, u32 size)
+{
+	return map_info->dsp_addr == dsp_addr &&
+		map_info->size == size;
+}
+
+static void remove_mapping_information(struct proc_object *pr_obj,
+					u32 dsp_addr, u32 size)
+{
+	struct memory_map_info *map_info;
+	struct list_head *iter;
+
+	pr_debug("%s: looking for virt 0x%x size 0x%x\n", __func__,
+							dsp_addr, size);
+
+	spin_lock(&pr_obj->maps_lock);
+	list_for_each(iter, &pr_obj->maps) {
+		map_info = list_entry(iter, struct memory_map_info, node);
+		pr_debug("%s: candidate: mpu_addr 0x%x virt 0x%x size 0x%x\n",
+						__func__,
+						map_info->mpu_addr,
+						map_info->dsp_addr,
+						map_info->size);
+
+		if (match_exact_map_info(map_info, dsp_addr, size)) {
+			pr_debug("%s: match, deleting map info\n", __func__);
+			list_del(&map_info->node);
+			kfree(map_info->pages);
+			kfree(map_info);
+			goto out;
+		}
+		pr_debug("%s: candidate didn't match\n", __func__);
+	}
+
+	pr_err("%s: failed to find given map info\n", __func__);
+out:
+	spin_unlock(&pr_obj->maps_lock);
+}
+
 /*
  *  ======== proc_attach ========
  *  Purpose:
@@ -1508,6 +1548,9 @@ dsp_status proc_un_map(void *hprocessor, void *map_addr,
 		status = (*p_proc_object->intf_fxns->pfn_brd_mem_un_map)
 		    (p_proc_object->hwmd_context, va_align, size_align);
 	}
+
+	remove_mapping_information(p_proc_object, va_align, size_align);
+
 	mutex_unlock(&proc_lock);
 	if (DSP_FAILED(status))
 		goto func_end;
-- 
1.6.3.3



* [RFC/PATCH 4/6] DSPBRIDGE: do not call follow_page
  2010-05-01 20:44 [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues Ohad Ben-Cohen
                   ` (2 preceding siblings ...)
  2010-05-01 20:44 ` [RFC/PATCH 3/6] DSPBRIDGE: remove mapping information in proc_unmap Ohad Ben-Cohen
@ 2010-05-01 20:44 ` Ohad Ben-Cohen
  2010-05-01 20:44 ` [RFC/PATCH 5/6] DSPBRIDGE: do not use low level cache manipulation API Ohad Ben-Cohen
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-01 20:44 UTC (permalink / raw)
  To: linux-omap
  Cc: Kanigeri Hari, Omar Ramirez Luna, Guzman Lugo Fernando,
	Menon Nishanth, Hiroshi Doyu, Ohad Ben-Cohen

Eliminate the call to follow_page. Instead, use the page
information that was kept during the proc_map call.
This also has the advantage that users can now only
specify memory areas that were previously mapped.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
---
If you want, you can also reach me at <  ohadb at ti dot com  >.

 drivers/dsp/bridge/rmgr/proc.c |  143 ++++++++++++++++++++++++++--------------
 1 files changed, 93 insertions(+), 50 deletions(-)

diff --git a/drivers/dsp/bridge/rmgr/proc.c b/drivers/dsp/bridge/rmgr/proc.c
index ebb11b1..bbc7e0f 100644
--- a/drivers/dsp/bridge/rmgr/proc.c
+++ b/drivers/dsp/bridge/rmgr/proc.c
@@ -209,6 +209,71 @@ out:
 	spin_unlock(&pr_obj->maps_lock);
 }
 
+static int match_containing_map_info(struct memory_map_info *map_info,
+					u32 mpu_addr, u32 size)
+{
+	u32 map_info_end = map_info->mpu_addr + map_info->size;
+
+	return mpu_addr >= map_info->mpu_addr &&
+		mpu_addr + size <= map_info_end;
+}
+
+static struct memory_map_info *find_containing_mapping(
+				struct proc_object *pr_obj,
+				u32 mpu_addr, u32 size)
+{
+	struct memory_map_info *map_info;
+	struct list_head *iter;
+	pr_debug("%s: looking for mpu_addr 0x%x size 0x%x\n", __func__,
+						mpu_addr, size);
+
+	spin_lock(&pr_obj->maps_lock);
+	list_for_each(iter, &pr_obj->maps) {
+		map_info = list_entry(iter, struct memory_map_info, node);
+		pr_debug("%s: candidate: mpu_addr 0x%x virt 0x%x size 0x%x\n",
+						__func__,
+						map_info->mpu_addr,
+						map_info->dsp_addr,
+						map_info->size);
+		if (match_containing_map_info(map_info, mpu_addr, size)) {
+			pr_debug("%s: match!\n", __func__);
+			goto out;
+		}
+
+		pr_debug("%s: no match!\n", __func__);
+	}
+
+	map_info = NULL;
+out:
+	spin_unlock(&pr_obj->maps_lock);
+	return map_info;
+}
+
+static int find_first_page_in_cache(struct memory_map_info *map_info,
+					unsigned long mpu_addr)
+{
+	u32 mapped_base_page = map_info->mpu_addr >> PAGE_SHIFT;
+	u32 requested_base_page = mpu_addr >> PAGE_SHIFT;
+	int pg_index = requested_base_page - mapped_base_page;
+
+	if (pg_index < 0 || pg_index >= map_info->num_usr_pgs) {
+		pr_err("%s: failed (got %d)\n", __func__, pg_index);
+		return -1;
+	}
+
+	pr_debug("%s: first page is %d\n", __func__, pg_index);
+	return pg_index;
+}
+
+static inline struct page *get_mapping_page(struct memory_map_info *map_info,
+								int pg_i)
+{
+	if (pg_i < 0 || pg_i >= map_info->num_usr_pgs)
+		return NULL;
+
+	return map_info->pages[pg_i];
+}
+
 /*
  *  ======== proc_attach ========
  *  Purpose:
@@ -561,23 +626,30 @@ dsp_status proc_enum_nodes(void *hprocessor, void **node_tab,
 }
 
 /* Cache operation against kernel address instead of users */
-static int memory_sync_page(struct vm_area_struct *vma, unsigned long start,
-			    ssize_t len, enum dsp_flushtype ftype)
+static int memory_sync_page(struct memory_map_info *map_info,
+		unsigned long start, ssize_t len, enum dsp_flushtype ftype)
 {
 	struct page *page;
 	void *kaddr;
 	unsigned long offset;
 	ssize_t rest;
+	int pg_i;
+
+	pg_i = find_first_page_in_cache(map_info, start);
+	if (pg_i < 0) {
+		pr_err("%s: failed to find first page in cache\n", __func__);
+		return -EINVAL;
+	}
 
 	while (len) {
-		page = follow_page(vma, start, FOLL_GET);
+		page = get_mapping_page(map_info, pg_i);
 		if (!page) {
 			pr_err("%s: no page for %08lx\n", __func__, start);
 			return -EINVAL;
 		} else if (IS_ERR(page)) {
 			pr_err("%s: err page for %08lx(%lu)\n", __func__, start,
-			       IS_ERR(page));
-			return IS_ERR(page);
+			       PTR_ERR(page));
+			return PTR_ERR(page);
 		}
 
 		offset = start & ~PAGE_MASK;
@@ -586,59 +658,22 @@ static int memory_sync_page(struct vm_area_struct *vma, unsigned long start,
 		mem_flush_cache(kaddr, rest, ftype);
 
 		kunmap(page);
-		put_page(page);
 		len -= rest;
 		start += rest;
+		pg_i++;
 	}
 
 	return 0;
 }
 
-/* Check if the given area blongs to process virtul memory address space */
-static int memory_sync_vma(unsigned long start, u32 len,
-			   enum dsp_flushtype ftype)
-{
-	int err = 0;
-	unsigned long end;
-	struct vm_area_struct *vma;
-
-	end = start + len;
-	if (end <= start)
-		return -EINVAL;
-
-	while ((vma = find_vma(current->mm, start)) != NULL) {
-		ssize_t size;
-
-		if (vma->vm_flags & (VM_IO | VM_PFNMAP))
-			return -EINVAL;
-
-		if (vma->vm_start > start)
-			return -EINVAL;
-
-		size = min_t(ssize_t, vma->vm_end - start, len);
-		err = memory_sync_page(vma, start, size, ftype);
-		if (err)
-			break;
-
-		if (end <= vma->vm_end)
-			break;
-
-		start = vma->vm_end;
-	}
-
-	if (!vma)
-		err = -EINVAL;
-
-	return err;
-}
-
 static dsp_status proc_memory_sync(void *hprocessor, void *pmpu_addr,
 				   u32 ul_size, u32 ul_flags,
-				   enum dsp_flushtype FlushMemType)
+				   enum dsp_flushtype ftype)
 {
 	/* Keep STATUS here for future additions to this function */
 	dsp_status status = DSP_SOK;
 	struct proc_object *p_proc_object = (struct proc_object *)hprocessor;
+	struct memory_map_info *map_info;
 
 	DBC_REQUIRE(refs > 0);
 
@@ -647,18 +682,26 @@ static dsp_status proc_memory_sync(void *hprocessor, void *pmpu_addr,
 		goto err_out;
 	}
 
-	down_read(&current->mm->mmap_sem);
+	pr_debug("%s: addr 0x%x, size 0x%x, type %d\n", __func__,
+							(u32)pmpu_addr,
+							ul_size, ftype);
+
+	/* find requested memory area in cached mapping information */
+	map_info = find_containing_mapping(p_proc_object, (u32) pmpu_addr,
+								ul_size);
+	if (!map_info) {
+		pr_err("%s: find_containing_mapping failed\n", __func__);
+		status = DSP_EHANDLE;
+		goto err_out;
+	}
 
-	if (memory_sync_vma((u32) pmpu_addr, ul_size, FlushMemType)) {
+	if (memory_sync_page(map_info, (u32) pmpu_addr, ul_size, ftype)) {
 		pr_err("%s: InValid address parameters %p %x\n",
 		       __func__, pmpu_addr, ul_size);
 		status = DSP_EHANDLE;
 	}
 
-	up_read(&current->mm->mmap_sem);
-
 err_out:
-
 	return status;
 }
 
-- 
1.6.3.3



* [RFC/PATCH 5/6] DSPBRIDGE: do not use low level cache manipulation API
  2010-05-01 20:44 [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues Ohad Ben-Cohen
                   ` (3 preceding siblings ...)
  2010-05-01 20:44 ` [RFC/PATCH 4/6] DSPBRIDGE: do not call follow_page Ohad Ben-Cohen
@ 2010-05-01 20:44 ` Ohad Ben-Cohen
  2010-05-02 11:56   ` Ohad Ben-Cohen
  2010-05-01 20:44 ` [RFC/PATCH 6/6] DSPBRIDGE: add dspbridge API to mark end of DMA Ohad Ben-Cohen
  2010-05-02 13:17 ` [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues Felipe Contreras
  6 siblings, 1 reply; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-01 20:44 UTC (permalink / raw)
  To: linux-omap
  Cc: Kanigeri Hari, Omar Ramirez Luna, Guzman Lugo Fernando,
	Menon Nishanth, Hiroshi Doyu, Ohad Ben-Cohen

Instead of using low level cache manipulation API,
use the standard DMA API.  This changes the concept
of the dspbridge cache API a little, hence the naming changes:
* Flush marks the beginning of a DMA transfer from the MPU
to the DSP.
* Invalidate marks the beginning of a DMA transfer from the DSP
to the MPU.

Both of these actions eventually build a scatter gather list
using the page information that was kept during proc_map,
and feed it to the standard dma_map_sg API.
Note that now users cannot manipulate the cache state of any random
address; if the buffer is not part of a previous memory mapping of that
application, the request is denied.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
---
If you want, you can also reach me at <  ohadb at ti dot com  >.

 arch/arm/plat-omap/include/dspbridge/_dcd.h     |    4 +-
 arch/arm/plat-omap/include/dspbridge/proc.h     |    4 +-
 arch/arm/plat-omap/include/dspbridge/wcdioctl.h |    4 +-
 drivers/dsp/bridge/pmgr/wcd.c                   |   12 +-
 drivers/dsp/bridge/rmgr/proc.c                  |  152 ++++++++++++++++++-----
 5 files changed, 134 insertions(+), 42 deletions(-)

diff --git a/arch/arm/plat-omap/include/dspbridge/_dcd.h b/arch/arm/plat-omap/include/dspbridge/_dcd.h
index 1350feb..0af4a31 100644
--- a/arch/arm/plat-omap/include/dspbridge/_dcd.h
+++ b/arch/arm/plat-omap/include/dspbridge/_dcd.h
@@ -110,9 +110,9 @@ extern u32 procwrap_reserve_memory(union Trapped_Args *args, void *pr_ctxt);
 extern u32 procwrap_un_reserve_memory(union Trapped_Args *args, void *pr_ctxt);
 extern u32 procwrap_map(union Trapped_Args *args, void *pr_ctxt);
 extern u32 procwrap_un_map(union Trapped_Args *args, void *pr_ctxt);
-extern u32 procwrap_flush_memory(union Trapped_Args *args, void *pr_ctxt);
+extern u32 procwrap_begin_dma_to_dsp(union Trapped_Args *args, void *pr_ctxt);
 extern u32 procwrap_stop(union Trapped_Args *args, void *pr_ctxt);
-extern u32 procwrap_invalidate_memory(union Trapped_Args *args, void *pr_ctxt);
+extern u32 procwrap_begin_dma_from_dsp(union Trapped_Args *args, void *pr_ctxt);
 
 /* NODE wrapper functions */
 extern u32 nodewrap_allocate(union Trapped_Args *args, void *pr_ctxt);
diff --git a/arch/arm/plat-omap/include/dspbridge/proc.h b/arch/arm/plat-omap/include/dspbridge/proc.h
index 0707739..f8450a6 100644
--- a/arch/arm/plat-omap/include/dspbridge/proc.h
+++ b/arch/arm/plat-omap/include/dspbridge/proc.h
@@ -472,7 +472,7 @@ extern dsp_status proc_stop(void *hprocessor);
  *  Details:
  *      All the arguments are currently ignored.
  */
-extern dsp_status proc_flush_memory(void *hprocessor,
+extern dsp_status proc_begin_dma_to_dsp(void *hprocessor,
 				    void *pmpu_addr, u32 ul_size, u32 ul_flags);
 
 /*
@@ -493,7 +493,7 @@ extern dsp_status proc_flush_memory(void *hprocessor,
  *  Details:
  *      All the arguments are currently ignored.
  */
-extern dsp_status proc_invalidate_memory(void *hprocessor,
+extern dsp_status proc_begin_dma_from_dsp(void *hprocessor,
 					 void *pmpu_addr, u32 ul_size);
 
 /*
diff --git a/arch/arm/plat-omap/include/dspbridge/wcdioctl.h b/arch/arm/plat-omap/include/dspbridge/wcdioctl.h
index b6a4dda..aba2078 100644
--- a/arch/arm/plat-omap/include/dspbridge/wcdioctl.h
+++ b/arch/arm/plat-omap/include/dspbridge/wcdioctl.h
@@ -452,9 +452,9 @@ union Trapped_Args {
 #define PROC_UNRSVMEM		_IOW(DB, DB_IOC(DB_PROC, 11), unsigned long)
 #define PROC_MAPMEM		_IOWR(DB, DB_IOC(DB_PROC, 12), unsigned long)
 #define PROC_UNMAPMEM		_IOR(DB, DB_IOC(DB_PROC, 13), unsigned long)
-#define PROC_FLUSHMEMORY	_IOW(DB, DB_IOC(DB_PROC, 14), unsigned long)
+#define PROC_BEGINDMATODSP	_IOW(DB, DB_IOC(DB_PROC, 14), unsigned long)
 #define PROC_STOP		_IOWR(DB, DB_IOC(DB_PROC, 15), unsigned long)
-#define PROC_INVALIDATEMEMORY	_IOW(DB, DB_IOC(DB_PROC, 16), unsigned long)
+#define PROC_BEGINDMAFROMDSP	_IOW(DB, DB_IOC(DB_PROC, 16), unsigned long)
 
 /* NODE Module */
 #define NODE_ALLOCATE		_IOWR(DB, DB_IOC(DB_NODE, 0), unsigned long)
diff --git a/drivers/dsp/bridge/pmgr/wcd.c b/drivers/dsp/bridge/pmgr/wcd.c
index 15a05a6..89243f1 100644
--- a/drivers/dsp/bridge/pmgr/wcd.c
+++ b/drivers/dsp/bridge/pmgr/wcd.c
@@ -111,9 +111,9 @@ static struct wcd_cmd proc_cmd[] = {
 	{procwrap_un_reserve_memory},	/* PROC_UNRSVMEM */
 	{procwrap_map},		/* PROC_MAPMEM */
 	{procwrap_un_map},	/* PROC_UNMAPMEM */
-	{procwrap_flush_memory},	/* PROC_FLUSHMEMORY */
+	{procwrap_begin_dma_to_dsp},	/* PROC_BEGINDMATODSP */
 	{procwrap_stop},	/* PROC_STOP */
-	{procwrap_invalidate_memory},	/* PROC_INVALIDATEMEMORY */
+	{procwrap_begin_dma_from_dsp},	/* PROC_BEGINDMAFROMDSP */
 };
 
 /* NODE wrapper functions */
@@ -680,7 +680,7 @@ u32 procwrap_enum_node_info(union Trapped_Args *args, void *pr_ctxt)
 /*
  * ======== procwrap_flush_memory ========
  */
-u32 procwrap_flush_memory(union Trapped_Args *args, void *pr_ctxt)
+u32 procwrap_begin_dma_to_dsp(union Trapped_Args *args, void *pr_ctxt)
 {
 	dsp_status status;
 
@@ -688,7 +688,7 @@ u32 procwrap_flush_memory(union Trapped_Args *args, void *pr_ctxt)
 	    PROC_WRITEBACK_INVALIDATE_MEM)
 		return DSP_EINVALIDARG;
 
-	status = proc_flush_memory(args->args_proc_flushmemory.hprocessor,
+	status = proc_begin_dma_to_dsp(args->args_proc_flushmemory.hprocessor,
 				   args->args_proc_flushmemory.pmpu_addr,
 				   args->args_proc_flushmemory.ul_size,
 				   args->args_proc_flushmemory.ul_flags);
@@ -698,12 +698,12 @@ u32 procwrap_flush_memory(union Trapped_Args *args, void *pr_ctxt)
 /*
  * ======== procwrap_invalidate_memory ========
  */
-u32 procwrap_invalidate_memory(union Trapped_Args *args, void *pr_ctxt)
+u32 procwrap_begin_dma_from_dsp(union Trapped_Args *args, void *pr_ctxt)
 {
 	dsp_status status;
 
 	status =
-	    proc_invalidate_memory(args->args_proc_invalidatememory.hprocessor,
+	    proc_begin_dma_from_dsp(args->args_proc_invalidatememory.hprocessor,
 				   args->args_proc_invalidatememory.pmpu_addr,
 				   args->args_proc_invalidatememory.ul_size);
 	return status;
diff --git a/drivers/dsp/bridge/rmgr/proc.c b/drivers/dsp/bridge/rmgr/proc.c
index bbc7e0f..8a76681 100644
--- a/drivers/dsp/bridge/rmgr/proc.c
+++ b/drivers/dsp/bridge/rmgr/proc.c
@@ -18,6 +18,8 @@
 
 /* ------------------------------------ Host OS */
 #include <linux/list.h>
+#include <linux/dma-mapping.h>
+#include <linux/scatterlist.h>
 #include <linux/spinlock.h>
 #include <dspbridge/host_os.h>
 
@@ -80,6 +82,7 @@
 #define WBUF		0x8000		/* Output Buffer */
 
 extern char *iva_img;
+extern struct device *bridge;
 
 /*  ----------------------------------- Globals */
 
@@ -110,6 +113,18 @@ struct proc_object {
 	spinlock_t maps_lock;
 };
 
+/* used to cache dma mapping information */
+struct bridge_dma_map_info {
+	/* direction of DMA in action, or DMA_NONE */
+	enum dma_data_direction dir;
+	/* number of elements requested by us */
+	int num_pages;
+	/* number of elements returned from dma_map_sg */
+	int sg_num;
+	/* list of buffers used in this DMA action */
+	struct scatterlist *sg;
+};
+
 /* used to cache memory mapping information */
 struct memory_map_info {
 	struct list_head node;
@@ -118,6 +133,7 @@ struct memory_map_info {
 	u32 dsp_addr;
 	u32 size;
 	u32 num_usr_pgs;
+	struct bridge_dma_map_info dma_info;
 };
 
 static u32 refs;
@@ -130,7 +146,23 @@ static s32 get_envp_count(char **envp);
 static char **prepend_envp(char **new_envp, char **envp, s32 envp_elems,
 			   s32 cnew_envp, char *szVar);
 
-/* remember mapping information */
+/* Mapping and Page info caching
+ *
+ * The map_info mechanism is built to remember the (struct page *)
+ * pointers of all pages that belong to a specific memory mapping of
+ * a specific process.
+ * Whenever a memory area is mapped, get_user_pages is used to pin the
+ * relevant pages in memory. As a result of running get_user_pages,
+ * we get the pointers to the page structures, which we now keep in
+ * the memory_map_info struct.
+ * Then, any time the user (the MM application) intends to begin
+ * a DMA (Direct Memory Access) operation to/from the remote processor,
+ * we use this cached page information to build a scatter gather list
+ * which is given to the standard DMA API (which takes care of low
+ * level cache manipulation).
+ * Currently a simple linked list is used to cache the memory mapping
+ * info per process. This can be further optimized if needed.
+ */
 static struct memory_map_info *add_mapping_info(struct proc_object *pr_obj,
 				u32 mpu_addr, u32 dsp_addr, u32 size)
 {
@@ -197,6 +229,7 @@ static void remove_mapping_information(struct proc_object *pr_obj,
 		if (match_exact_map_info(map_info, dsp_addr, size)) {
 			pr_debug("%s: match, deleting map info\n", __func__);
 			list_del(&map_info->node);
+			kfree(map_info->dma_info.sg);
 			kfree(map_info->pages);
 			kfree(map_info);
 			goto out;
@@ -625,50 +658,109 @@ dsp_status proc_enum_nodes(void *hprocessor, void **node_tab,
 	return status;
 }
 
-/* Cache operation against kernel address instead of users */
-static int memory_sync_page(struct memory_map_info *map_info,
-		unsigned long start, ssize_t len, enum dsp_flushtype ftype)
+static int build_dma_sg(struct memory_map_info *map_info, unsigned long start,
+						ssize_t len, int pg_i)
 {
 	struct page *page;
-	void *kaddr;
 	unsigned long offset;
 	ssize_t rest;
-	int pg_i;
-
-	pg_i = find_first_page_in_cache(map_info, start);
-	if (pg_i < 0) {
-		pr_err("%s: failed to find first page in cache\n", __func__);
-		return -EINVAL;
-	}
+	int ret = 0, i = 0;
+	struct scatterlist *sg = map_info->dma_info.sg;
 
 	while (len) {
 		page = get_mapping_page(map_info, pg_i);
 		if (!page) {
 			pr_err("%s: no page for %08lx\n", __func__, start);
-			return -EINVAL;
+			ret = -EINVAL;
+			goto out;
 		} else if (IS_ERR(page)) {
 			pr_err("%s: err page for %08lx(%lu)\n", __func__, start,
 			       PTR_ERR(page));
-			return PTR_ERR(page);
+			ret = PTR_ERR(page);
+			goto out;
 		}
 
 		offset = start & ~PAGE_MASK;
-		kaddr = kmap(page) + offset;
 		rest = min_t(ssize_t, PAGE_SIZE - offset, len);
-		mem_flush_cache(kaddr, rest, ftype);
 
-		kunmap(page);
+		sg_set_page(&sg[i], page, rest, offset);
+
 		len -= rest;
 		start += rest;
-		pg_i++;
+		pg_i++, i++;
+	}
+
+	if (i != map_info->dma_info.num_pages) {
+		pr_err("%s: bad number of sg iterations\n", __func__);
+		ret = -EFAULT;
+		goto out;
+	}
+
+out:
+	return ret;
+}
+
+/* Release CPU ownership of the buffer: build an sg list and map it for DMA */
+static int memory_release_ownership(struct memory_map_info *map_info,
+		unsigned long start, ssize_t len, enum dma_data_direction dir)
+{
+	int pg_i, ret, sg_num;
+	struct scatterlist *sg;
+	unsigned long first_data_page = start >> PAGE_SHIFT;
+	unsigned long last_data_page = ((u32)(start + len - 1) >> PAGE_SHIFT);
+	/* calculating the number of pages this area spans */
+	unsigned long num_pages = last_data_page - first_data_page + 1;
+
+	pg_i = find_first_page_in_cache(map_info, start);
+	if (pg_i < 0) {
+		pr_err("%s: failed to find first page in cache\n", __func__);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	sg = kcalloc(num_pages, sizeof(*sg), GFP_KERNEL);
+	if (!sg) {
+		pr_err("%s: kcalloc failed\n", __func__);
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	sg_init_table(sg, num_pages);
+
+	/* clean up a previous sg allocation; this may happen if the
+	 * application didn't signal the end of the previous DMA */
+	kfree(map_info->dma_info.sg);
+
+	map_info->dma_info.sg = sg;
+	map_info->dma_info.dir = dir;
+	map_info->dma_info.num_pages = num_pages;
+
+	ret = build_dma_sg(map_info, start, len, pg_i);
+	if (ret)
+		goto kfree_sg;
+
+	sg_num = dma_map_sg(bridge, sg, num_pages, dir);
+	if (sg_num < 1) {
+		pr_err("%s: dma_map_sg failed: %d\n", __func__, sg_num);
+		ret = -EFAULT;
+		goto kfree_sg;
 	}
 
+	pr_debug("%s: dma_map_sg mapped %d elements\n", __func__, sg_num);
+	map_info->dma_info.sg_num = sg_num;
+
 	return 0;
+
+kfree_sg:
+	kfree(sg);
+	map_info->dma_info.sg = NULL;
+out:
+	return ret;
 }
 
-static dsp_status proc_memory_sync(void *hprocessor, void *pmpu_addr,
-				   u32 ul_size, u32 ul_flags,
-				   enum dsp_flushtype ftype)
+static dsp_status proc_begin_dma(void *hprocessor, void *pmpu_addr,
+					u32 ul_size, u32 ul_flags,
+					enum dma_data_direction dir)
 {
 	/* Keep STATUS here for future additions to this function */
 	dsp_status status = DSP_SOK;
@@ -684,7 +776,7 @@ static dsp_status proc_memory_sync(void *hprocessor, void *pmpu_addr,
 
 	pr_debug("%s: addr 0x%x, size 0x%x, type %d\n", __func__,
 							(u32)pmpu_addr,
-							ul_size, ftype);
+							ul_size, dir);
 
 	/* find requested memory are in cached mapping information */
 	map_info = find_containing_mapping(p_proc_object, (u32) pmpu_addr,
@@ -695,7 +787,8 @@ static dsp_status proc_memory_sync(void *hprocessor, void *pmpu_addr,
 		goto err_out;
 	}
 
-	if (memory_sync_page(map_info, (u32) pmpu_addr, ul_size, ftype)) {
+	if (memory_release_ownership(map_info, (u32) pmpu_addr, ul_size,
+								dir)) {
 		pr_err("%s: InValid address parameters %p %x\n",
 		       __func__, pmpu_addr, ul_size);
 		status = DSP_EHANDLE;
@@ -710,13 +803,12 @@ err_out:
  *  Purpose:
  *     Flush cache
  */
-dsp_status proc_flush_memory(void *hprocessor, void *pmpu_addr,
+dsp_status proc_begin_dma_to_dsp(void *hprocessor, void *pmpu_addr,
 			     u32 ul_size, u32 ul_flags)
 {
-	enum dsp_flushtype mtype = PROC_WRITEBACK_INVALIDATE_MEM;
+	enum dma_data_direction dir = DMA_BIDIRECTIONAL;
 
-	return proc_memory_sync(hprocessor, pmpu_addr, ul_size, ul_flags,
-				mtype);
+	return proc_begin_dma(hprocessor, pmpu_addr, ul_size, ul_flags, dir);
 }
 
 /*
@@ -724,12 +816,12 @@ dsp_status proc_flush_memory(void *hprocessor, void *pmpu_addr,
  *  Purpose:
  *     Invalidates the memory specified
  */
-dsp_status proc_invalidate_memory(void *hprocessor, void *pmpu_addr,
+dsp_status proc_begin_dma_from_dsp(void *hprocessor, void *pmpu_addr,
 				  u32 ul_size)
 {
-	enum dsp_flushtype mtype = PROC_INVALIDATE_MEM;
+	enum dma_data_direction dir = DMA_FROM_DEVICE;
 
-	return proc_memory_sync(hprocessor, pmpu_addr, ul_size, 0, mtype);
+	return proc_begin_dma(hprocessor, pmpu_addr, ul_size, 0, dir);
 }
 
 /*
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC/PATCH 6/6] DSPBRIDGE: add dspbridge API to mark end of DMA
  2010-05-01 20:44 [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues Ohad Ben-Cohen
                   ` (4 preceding siblings ...)
  2010-05-01 20:44 ` [RFC/PATCH 5/6] DSPBRIDGE: do not use low level cache manipulation API Ohad Ben-Cohen
@ 2010-05-01 20:44 ` Ohad Ben-Cohen
  2010-05-02 13:17 ` [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues Felipe Contreras
  6 siblings, 0 replies; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-01 20:44 UTC (permalink / raw)
  To: linux-omap
  Cc: Kanigeri Hari, Omar Ramirez Luna, Guzman Lugo Fernando,
	Menon Nishanth, Hiroshi Doyu, Ohad Ben-Cohen

Standard DMA API is built upon the notion of "buffer ownership".
The buffer is either exclusively owned by the MPU (and therefore
may be accessed by it) or exclusively owned by the DMA device
(in our case, the dsp remote processor).
This patch adds the missing dspbridge API with which the MM
application can mark the end of a DMA transfer (and thus regain
ownership of the buffer).

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
---
If you want, you can also reach me at <  ohadb at ti dot com  >.

 arch/arm/plat-omap/include/dspbridge/_dcd.h     |    2 +
 arch/arm/plat-omap/include/dspbridge/proc.h     |    4 +
 arch/arm/plat-omap/include/dspbridge/wcdioctl.h |    2 +
 drivers/dsp/bridge/pmgr/wcd.c                   |   27 +++++++
 drivers/dsp/bridge/rmgr/proc.c                  |   87 +++++++++++++++++++++++
 5 files changed, 122 insertions(+), 0 deletions(-)

diff --git a/arch/arm/plat-omap/include/dspbridge/_dcd.h b/arch/arm/plat-omap/include/dspbridge/_dcd.h
index 0af4a31..adfcf67 100644
--- a/arch/arm/plat-omap/include/dspbridge/_dcd.h
+++ b/arch/arm/plat-omap/include/dspbridge/_dcd.h
@@ -113,6 +113,8 @@ extern u32 procwrap_un_map(union Trapped_Args *args, void *pr_ctxt);
 extern u32 procwrap_begin_dma_to_dsp(union Trapped_Args *args, void *pr_ctxt);
 extern u32 procwrap_stop(union Trapped_Args *args, void *pr_ctxt);
 extern u32 procwrap_begin_dma_from_dsp(union Trapped_Args *args, void *pr_ctxt);
+extern u32 procwrap_end_dma_to_dsp(union Trapped_Args *args, void *pr_ctxt);
+extern u32 procwrap_end_dma_from_dsp(union Trapped_Args *args, void *pr_ctxt);
 
 /* NODE wrapper functions */
 extern u32 nodewrap_allocate(union Trapped_Args *args, void *pr_ctxt);
diff --git a/arch/arm/plat-omap/include/dspbridge/proc.h b/arch/arm/plat-omap/include/dspbridge/proc.h
index f8450a6..7a7b8e8 100644
--- a/arch/arm/plat-omap/include/dspbridge/proc.h
+++ b/arch/arm/plat-omap/include/dspbridge/proc.h
@@ -474,6 +474,8 @@ extern dsp_status proc_stop(void *hprocessor);
  */
 extern dsp_status proc_begin_dma_to_dsp(void *hprocessor,
 				    void *pmpu_addr, u32 ul_size, u32 ul_flags);
+extern dsp_status proc_end_dma_to_dsp(void *hprocessor,
+				    void *pmpu_addr, u32 ul_size, u32 ul_flags);
 
 /*
  *  ======== proc_invalidate_memory ========
@@ -495,6 +497,8 @@ extern dsp_status proc_begin_dma_to_dsp(void *hprocessor,
  */
 extern dsp_status proc_begin_dma_from_dsp(void *hprocessor,
 					 void *pmpu_addr, u32 ul_size);
+extern dsp_status proc_end_dma_from_dsp(void *hprocessor,
+					 void *pmpu_addr, u32 ul_size);
 
 /*
  *  ======== proc_map ========
diff --git a/arch/arm/plat-omap/include/dspbridge/wcdioctl.h b/arch/arm/plat-omap/include/dspbridge/wcdioctl.h
index aba2078..a6debf2 100644
--- a/arch/arm/plat-omap/include/dspbridge/wcdioctl.h
+++ b/arch/arm/plat-omap/include/dspbridge/wcdioctl.h
@@ -455,6 +455,8 @@ union Trapped_Args {
 #define PROC_BEGINDMATODSP	_IOW(DB, DB_IOC(DB_PROC, 14), unsigned long)
 #define PROC_STOP		_IOWR(DB, DB_IOC(DB_PROC, 15), unsigned long)
 #define PROC_BEGINDMAFROMDSP	_IOW(DB, DB_IOC(DB_PROC, 16), unsigned long)
+#define PROC_ENDDMATODSP	_IOW(DB, DB_IOC(DB_PROC, 17), unsigned long)
+#define PROC_ENDDMAFROMDSP	_IOW(DB, DB_IOC(DB_PROC, 18), unsigned long)
 
 /* NODE Module */
 #define NODE_ALLOCATE		_IOWR(DB, DB_IOC(DB_NODE, 0), unsigned long)
diff --git a/drivers/dsp/bridge/pmgr/wcd.c b/drivers/dsp/bridge/pmgr/wcd.c
index 89243f1..ae6e8ab 100644
--- a/drivers/dsp/bridge/pmgr/wcd.c
+++ b/drivers/dsp/bridge/pmgr/wcd.c
@@ -114,6 +114,8 @@ static struct wcd_cmd proc_cmd[] = {
 	{procwrap_begin_dma_to_dsp},	/* PROC_BEGINDMATODSP */
 	{procwrap_stop},	/* PROC_STOP */
 	{procwrap_begin_dma_from_dsp},	/* PROC_BEGINDMAFROMDSP */
+	{procwrap_end_dma_to_dsp},	/* PROC_ENDDMATODSP */
+	{procwrap_end_dma_from_dsp},	/* PROC_ENDDMAFROMDSP */
 };
 
 /* NODE wrapper functions */
@@ -709,6 +711,31 @@ u32 procwrap_begin_dma_from_dsp(union Trapped_Args *args, void *pr_ctxt)
 	return status;
 }
 
+u32 procwrap_end_dma_to_dsp(union Trapped_Args *args, void *pr_ctxt)
+{
+	dsp_status status;
+
+	if (args->args_proc_flushmemory.ul_flags >
+	    PROC_WRITEBACK_INVALIDATE_MEM)
+		return DSP_EINVALIDARG;
+
+	status = proc_end_dma_to_dsp(args->args_proc_flushmemory.hprocessor,
+				   args->args_proc_flushmemory.pmpu_addr,
+				   args->args_proc_flushmemory.ul_size,
+				   args->args_proc_flushmemory.ul_flags);
+	return status;
+}
+
+u32 procwrap_end_dma_from_dsp(union Trapped_Args *args, void *pr_ctxt)
+{
+	dsp_status status;
+
+	status =
+	    proc_end_dma_from_dsp(args->args_proc_invalidatememory.hprocessor,
+				   args->args_proc_invalidatememory.pmpu_addr,
+				   args->args_proc_invalidatememory.ul_size);
+	return status;
+}
 /*
  * ======== procwrap_enum_resources ========
  */
diff --git a/drivers/dsp/bridge/rmgr/proc.c b/drivers/dsp/bridge/rmgr/proc.c
index 8a76681..c185cd1 100644
--- a/drivers/dsp/bridge/rmgr/proc.c
+++ b/drivers/dsp/bridge/rmgr/proc.c
@@ -700,6 +700,36 @@ out:
 	return ret;
 }
 
+static int memory_regain_ownership(struct memory_map_info *map_info,
+		unsigned long start, ssize_t len, enum dma_data_direction dir)
+{
+	int ret = 0;
+	unsigned long first_data_page = start >> PAGE_SHIFT;
+	unsigned long last_data_page = ((u32)(start + len - 1) >> PAGE_SHIFT);
+	/* calculating the number of pages this area spans */
+	unsigned long num_pages = last_data_page - first_data_page + 1;
+	struct bridge_dma_map_info *dma_info = &map_info->dma_info;
+
+	if (!dma_info->sg)
+		goto out;
+
+	if (dma_info->dir != dir || dma_info->num_pages != num_pages) {
+		pr_err("%s: dma info doesn't match given params\n", __func__);
+		return -EINVAL;
+	}
+
+	dma_unmap_sg(bridge, dma_info->sg, num_pages, dma_info->dir);
+
+	pr_debug("%s: dma_map_sg unmapped\n", __func__);
+
+	kfree(dma_info->sg);
+
+	map_info->dma_info.sg = NULL;
+
+out:
+	return ret;
+}
+
 /* Release CPU ownership of the buffer: build an sg list and map it for DMA */
 static int memory_release_ownership(struct memory_map_info *map_info,
 		unsigned long start, ssize_t len, enum dma_data_direction dir)
@@ -798,6 +828,47 @@ err_out:
 	return status;
 }
 
+static dsp_status proc_end_dma(void *hprocessor, void *pmpu_addr,
+				   u32 ul_size, u32 ul_flags,
+				   enum dma_data_direction dir)
+{
+	/* Keep STATUS here for future additions to this function */
+	dsp_status status = DSP_SOK;
+	struct proc_object *p_proc_object = (struct proc_object *)hprocessor;
+	struct memory_map_info *map_info;
+
+	DBC_REQUIRE(refs > 0);
+
+	if (!MEM_IS_VALID_HANDLE(p_proc_object, PROC_SIGNATURE)) {
+		status = DSP_EHANDLE;
+		goto err_out;
+	}
+
+	pr_debug("%s: addr 0x%x, size 0x%x, type %d\n", __func__,
+							(u32)pmpu_addr,
+							ul_size, dir);
+
+	/* find requested memory area in cached mapping information */
+	map_info = find_containing_mapping(p_proc_object, (u32) pmpu_addr,
+								ul_size);
+	if (!map_info) {
+		pr_err("%s: find_containing_mapping failed\n", __func__);
+		status = DSP_EHANDLE;
+		goto err_out;
+	}
+
+	if (memory_regain_ownership(map_info, (u32) pmpu_addr, ul_size,
+								dir)) {
+		pr_err("%s: Invalid address parameters %p %x\n",
+		       __func__, pmpu_addr, ul_size);
+		status = DSP_EHANDLE;
+		goto err_out;
+	}
+
+err_out:
+	return status;
+}
+
 /*
  *  ======== proc_flush_memory ========
  *  Purpose:
@@ -824,6 +895,22 @@ dsp_status proc_begin_dma_from_dsp(void *hprocessor, void *pmpu_addr,
 	return proc_begin_dma(hprocessor, pmpu_addr, ul_size, 0, dir);
 }
 
+dsp_status proc_end_dma_to_dsp(void *hprocessor, void *pmpu_addr,
+			     u32 ul_size, u32 ul_flags)
+{
+	enum dma_data_direction dir = DMA_BIDIRECTIONAL;
+
+	return proc_end_dma(hprocessor, pmpu_addr, ul_size, ul_flags, dir);
+}
+
+dsp_status proc_end_dma_from_dsp(void *hprocessor, void *pmpu_addr,
+							u32 ul_size)
+{
+	enum dma_data_direction dir = DMA_FROM_DEVICE;
+
+	return proc_end_dma(hprocessor, pmpu_addr, ul_size, 0, dir);
+}
+
 /*
  *  ======== proc_get_resource_info ========
  *  Purpose:
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [RFC/PATCH 5/6] DSPBRIDGE: do not use low level cache manipulation API
  2010-05-01 20:44 ` [RFC/PATCH 5/6] DSPBRIDGE: do not use low level cache manipulation API Ohad Ben-Cohen
@ 2010-05-02 11:56   ` Ohad Ben-Cohen
  0 siblings, 0 replies; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-02 11:56 UTC (permalink / raw)
  To: linux-omap
  Cc: Kanigeri Hari, Omar Ramirez Luna, Guzman Lugo Fernando,
	Menon Nishanth, Hiroshi Doyu, Ohad Ben-Cohen

On Sat, May 1, 2010 at 11:44 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> Instead of using low level cache manipulation API,
> use the standard DMA API.  This changes the concept
> of the dspbridge cache API a little, hence the naming changes:
> * Flush marks the beginning of a DMA transfer from the MPU
> to the DSP.
> * Invalidate marks the beginning of a DMA transfer from the DSP
> to the MPU.
>
> Both of these actions eventually build a scatter gather list
> using the page information that was kept during proc_map,
> and feed it to the standard dma_map_sg API.
> Note that now users cannot manipulate the cache state of any random
> address; if the buffer is not part of a previous memory mapping of that
> application, the request is denied.
>
> Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
> ---
> If you want, you can also reach me at <  ohadb at ti dot com  >.
>
>  arch/arm/plat-omap/include/dspbridge/_dcd.h     |    4 +-
>  arch/arm/plat-omap/include/dspbridge/proc.h     |    4 +-
>  arch/arm/plat-omap/include/dspbridge/wcdioctl.h |    4 +-
>  drivers/dsp/bridge/pmgr/wcd.c                   |   12 +-
>  drivers/dsp/bridge/rmgr/proc.c                  |  152 ++++++++++++++++++-----
>  5 files changed, 134 insertions(+), 42 deletions(-)
>
> diff --git a/arch/arm/plat-omap/include/dspbridge/_dcd.h b/arch/arm/plat-omap/include/dspbridge/_dcd.h
> index 1350feb..0af4a31 100644
> --- a/arch/arm/plat-omap/include/dspbridge/_dcd.h
> +++ b/arch/arm/plat-omap/include/dspbridge/_dcd.h
> @@ -110,9 +110,9 @@ extern u32 procwrap_reserve_memory(union Trapped_Args *args, void *pr_ctxt);
>  extern u32 procwrap_un_reserve_memory(union Trapped_Args *args, void *pr_ctxt);
>  extern u32 procwrap_map(union Trapped_Args *args, void *pr_ctxt);
>  extern u32 procwrap_un_map(union Trapped_Args *args, void *pr_ctxt);
> -extern u32 procwrap_flush_memory(union Trapped_Args *args, void *pr_ctxt);
> +extern u32 procwrap_begin_dma_to_dsp(union Trapped_Args *args, void *pr_ctxt);
>  extern u32 procwrap_stop(union Trapped_Args *args, void *pr_ctxt);
> -extern u32 procwrap_invalidate_memory(union Trapped_Args *args, void *pr_ctxt);
> +extern u32 procwrap_begin_dma_from_dsp(union Trapped_Args *args, void *pr_ctxt);
>
>  /* NODE wrapper functions */
>  extern u32 nodewrap_allocate(union Trapped_Args *args, void *pr_ctxt);
> diff --git a/arch/arm/plat-omap/include/dspbridge/proc.h b/arch/arm/plat-omap/include/dspbridge/proc.h
> index 0707739..f8450a6 100644
> --- a/arch/arm/plat-omap/include/dspbridge/proc.h
> +++ b/arch/arm/plat-omap/include/dspbridge/proc.h
> @@ -472,7 +472,7 @@ extern dsp_status proc_stop(void *hprocessor);
>  *  Details:
>  *      All the arguments are currently ignored.
>  */
> -extern dsp_status proc_flush_memory(void *hprocessor,
> +extern dsp_status proc_begin_dma_to_dsp(void *hprocessor,
>                                    void *pmpu_addr, u32 ul_size, u32 ul_flags);
>
>  /*
> @@ -493,7 +493,7 @@ extern dsp_status proc_flush_memory(void *hprocessor,
>  *  Details:
>  *      All the arguments are currently ignored.
>  */
> -extern dsp_status proc_invalidate_memory(void *hprocessor,
> +extern dsp_status proc_begin_dma_from_dsp(void *hprocessor,
>                                         void *pmpu_addr, u32 ul_size);
>
>  /*
> diff --git a/arch/arm/plat-omap/include/dspbridge/wcdioctl.h b/arch/arm/plat-omap/include/dspbridge/wcdioctl.h
> index b6a4dda..aba2078 100644
> --- a/arch/arm/plat-omap/include/dspbridge/wcdioctl.h
> +++ b/arch/arm/plat-omap/include/dspbridge/wcdioctl.h
> @@ -452,9 +452,9 @@ union Trapped_Args {
>  #define PROC_UNRSVMEM          _IOW(DB, DB_IOC(DB_PROC, 11), unsigned long)
>  #define PROC_MAPMEM            _IOWR(DB, DB_IOC(DB_PROC, 12), unsigned long)
>  #define PROC_UNMAPMEM          _IOR(DB, DB_IOC(DB_PROC, 13), unsigned long)
> -#define PROC_FLUSHMEMORY       _IOW(DB, DB_IOC(DB_PROC, 14), unsigned long)
> +#define PROC_BEGINDMATODSP     _IOW(DB, DB_IOC(DB_PROC, 14), unsigned long)
>  #define PROC_STOP              _IOWR(DB, DB_IOC(DB_PROC, 15), unsigned long)
> -#define PROC_INVALIDATEMEMORY  _IOW(DB, DB_IOC(DB_PROC, 16), unsigned long)
> +#define PROC_BEGINDMAFROMDSP   _IOW(DB, DB_IOC(DB_PROC, 16), unsigned long)
>
>  /* NODE Module */
>  #define NODE_ALLOCATE          _IOWR(DB, DB_IOC(DB_NODE, 0), unsigned long)
> diff --git a/drivers/dsp/bridge/pmgr/wcd.c b/drivers/dsp/bridge/pmgr/wcd.c
> index 15a05a6..89243f1 100644
> --- a/drivers/dsp/bridge/pmgr/wcd.c
> +++ b/drivers/dsp/bridge/pmgr/wcd.c
> @@ -111,9 +111,9 @@ static struct wcd_cmd proc_cmd[] = {
>        {procwrap_un_reserve_memory},   /* PROC_UNRSVMEM */
>        {procwrap_map},         /* PROC_MAPMEM */
>        {procwrap_un_map},      /* PROC_UNMAPMEM */
> -       {procwrap_flush_memory},        /* PROC_FLUSHMEMORY */
> +       {procwrap_begin_dma_to_dsp},    /* PROC_BEGINDMATODSP */
>        {procwrap_stop},        /* PROC_STOP */
> -       {procwrap_invalidate_memory},   /* PROC_INVALIDATEMEMORY */
> +       {procwrap_begin_dma_from_dsp},  /* PROC_BEGINDMAFROMDSP */
>  };
>
>  /* NODE wrapper functions */
> @@ -680,7 +680,7 @@ u32 procwrap_enum_node_info(union Trapped_Args *args, void *pr_ctxt)
>  /*
>  * ======== procwrap_flush_memory ========
>  */
> -u32 procwrap_flush_memory(union Trapped_Args *args, void *pr_ctxt)
> +u32 procwrap_begin_dma_to_dsp(union Trapped_Args *args, void *pr_ctxt)
>  {
>        dsp_status status;
>
> @@ -688,7 +688,7 @@ u32 procwrap_flush_memory(union Trapped_Args *args, void *pr_ctxt)
>            PROC_WRITEBACK_INVALIDATE_MEM)
>                return DSP_EINVALIDARG;
>
> -       status = proc_flush_memory(args->args_proc_flushmemory.hprocessor,
> +       status = proc_begin_dma_to_dsp(args->args_proc_flushmemory.hprocessor,
>                                   args->args_proc_flushmemory.pmpu_addr,
>                                   args->args_proc_flushmemory.ul_size,
>                                   args->args_proc_flushmemory.ul_flags);
> @@ -698,12 +698,12 @@ u32 procwrap_flush_memory(union Trapped_Args *args, void *pr_ctxt)
>  /*
>  * ======== procwrap_invalidate_memory ========
>  */
> -u32 procwrap_invalidate_memory(union Trapped_Args *args, void *pr_ctxt)
> +u32 procwrap_begin_dma_from_dsp(union Trapped_Args *args, void *pr_ctxt)
>  {
>        dsp_status status;
>
>        status =
> -           proc_invalidate_memory(args->args_proc_invalidatememory.hprocessor,
> +           proc_begin_dma_from_dsp(args->args_proc_invalidatememory.hprocessor,
>                                   args->args_proc_invalidatememory.pmpu_addr,
>                                   args->args_proc_invalidatememory.ul_size);
>        return status;
> diff --git a/drivers/dsp/bridge/rmgr/proc.c b/drivers/dsp/bridge/rmgr/proc.c
> index bbc7e0f..8a76681 100644
> --- a/drivers/dsp/bridge/rmgr/proc.c
> +++ b/drivers/dsp/bridge/rmgr/proc.c
> @@ -18,6 +18,8 @@
>
>  /* ------------------------------------ Host OS */
>  #include <linux/list.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/scatterlist.h>
>  #include <linux/spinlock.h>
>  #include <dspbridge/host_os.h>
>
> @@ -80,6 +82,7 @@
>  #define WBUF           0x8000          /* Output Buffer */
>
>  extern char *iva_img;
> +extern struct device *bridge;
>
>  /*  ----------------------------------- Globals */
>
> @@ -110,6 +113,18 @@ struct proc_object {
>        spinlock_t maps_lock;
>  };
>
> +/* used to cache dma mapping information */
> +struct bridge_dma_map_info {
> +       /* direction of DMA in action, or DMA_NONE */
> +       enum dma_data_direction dir;
> +       /* number of elements requested by us */
> +       int num_pages;
> +       /* number of elements returned from dma_map_sg */
> +       int sg_num;
> +       /* list of buffers used in this DMA action */
> +       struct scatterlist *sg;
> +};
> +
>  /* used to cache memory mapping information */
>  struct memory_map_info {
>        struct list_head node;
> @@ -118,6 +133,7 @@ struct memory_map_info {
>        u32 dsp_addr;
>        u32 size;
>        u32 num_usr_pgs;
> +       struct bridge_dma_map_info dma_info;
>  };
>
>  static u32 refs;
> @@ -130,7 +146,23 @@ static s32 get_envp_count(char **envp);
>  static char **prepend_envp(char **new_envp, char **envp, s32 envp_elems,
>                           s32 cnew_envp, char *szVar);
>
> -/* remember mapping information */
> +/* Mapping and Page info caching
> + *
> + * The map_info mechanism is built to remember the (struct page *)
> + * pointers of all pages per a specific memory mapping of a specific
> + * process.
> + * Whenever a memory area is mapped, get_user_pages is used to pin the
> + * relevant pages in memory. As a result of running get_user_pages,
> + * we get the pointers to the page structures, which we now keep in
> + * the memory_map_info struct.
> + * Then, any time the user (the MM application) intends to begin
> + * a DMA (Direct Memory Access) operation to/from the remote processor,
> + * we use this cached page information to build a scatter gather list
> + * which is given to the standard DMA API (which takes care of low
> + * level cache manipulation).
> + * Currently a simple linked list is used to cache the memory mapping
> + * info per process. This can be further optimized if needed.
> + */
>  static struct memory_map_info *add_mapping_info(struct proc_object *pr_obj,
>                                u32 mpu_addr, u32 dsp_addr, u32 size)
>  {
> @@ -197,6 +229,7 @@ static void remove_mapping_information(struct proc_object *pr_obj,
>                if (match_exact_map_info(map_info, dsp_addr, size)) {
>                        pr_debug("%s: match, deleting map info\n", __func__);
>                        list_del(&map_info->node);
> +                       kfree(map_info->dma_info.sg);
>                        kfree(map_info->pages);
>                        kfree(map_info);
>                        goto out;
> @@ -625,50 +658,109 @@ dsp_status proc_enum_nodes(void *hprocessor, void **node_tab,
>        return status;
>  }
>
> -/* Cache operation against kernel address instead of users */
> -static int memory_sync_page(struct memory_map_info *map_info,
> -               unsigned long start, ssize_t len, enum dsp_flushtype ftype)
> +static int build_dma_sg(struct memory_map_info *map_info, unsigned long start,
> +                                               ssize_t len, int pg_i)
>  {
>        struct page *page;
> -       void *kaddr;
>        unsigned long offset;
>        ssize_t rest;
> -       int pg_i;
> -
> -       pg_i = find_first_page_in_cache(map_info, start);
> -       if (pg_i < 0) {
> -               pr_err("%s: failed to find first page in cache\n", __func__);
> -               return -EINVAL;
> -       }
> +       int ret = 0, i = 0;
> +       struct scatterlist *sg = map_info->dma_info.sg;
>
>        while (len) {
>                page = get_mapping_page(map_info, pg_i);
>                if (!page) {
>                        pr_err("%s: no page for %08lx\n", __func__, start);
> -                       return -EINVAL;
> +                       ret = -EINVAL;
> +                       goto out;
>                } else if (IS_ERR(page)) {
>                        pr_err("%s: err page for %08lx(%lu)\n", __func__, start,
>                               PTR_ERR(page));
> -                       return PTR_ERR(page);
> +                       ret = PTR_ERR(page);
> +                       goto out;
>                }
>
>                offset = start & ~PAGE_MASK;
> -               kaddr = kmap(page) + offset;
>                rest = min_t(ssize_t, PAGE_SIZE - offset, len);
> -               mem_flush_cache(kaddr, rest, ftype);


As a result, we can now remove the low-level mem_flush_cache() API as well:


diff --git a/arch/arm/plat-omap/include/dspbridge/mem.h
b/arch/arm/plat-omap/include/
index 087f69f..4d9b33b 100644
--- a/arch/arm/plat-omap/include/dspbridge/mem.h
+++ b/arch/arm/plat-omap/include/dspbridge/mem.h
@@ -124,21 +124,6 @@ extern void *mem_calloc(IN u32 byte_size, IN enum
mem_poolattrs
 extern void mem_exit(void);

 /*
- *  ======== mem_flush_cache ========
- *  Purpose:
- *      Performs system cache sync with discard
- *  Parameters:
- *      pMemBuf:    Pointer to memory region to be flushed.
- *      pMemBuf:    Size of the memory region to be flushed.
- *  Returns:
- *  Requires:
- *      MEM is initialized.
- *  Ensures:
- *      Cache is synchronized
- */
-extern void mem_flush_cache(void *pMemBuf, u32 byte_size, s32 FlushType);
-
-/*
  *  ======== mem_free_phys_mem ========
  *  Purpose:
  *      Free the given block of physically contiguous memory.
diff --git a/drivers/dsp/bridge/services/mem.c
b/drivers/dsp/bridge/services/mem.c
index 916a49f..dfb17cf 100644
--- a/drivers/dsp/bridge/services/mem.c
+++ b/drivers/dsp/bridge/services/mem.c
@@ -212,39 +212,6 @@ void mem_exit(void)
 }

 /*
- *  ======== mem_flush_cache ========
- *  Purpose:
- *      Flush cache
- */
-void mem_flush_cache(void *pMemBuf, u32 byte_size, s32 FlushType)
-{
-       if (!pMemBuf)
-               return;
-
-       switch (FlushType) {
-               /* invalidate only */
-       case PROC_INVALIDATE_MEM:
-               dmac_inv_range(pMemBuf, pMemBuf + byte_size);
-               outer_inv_range(__pa((u32) pMemBuf), __pa((u32) pMemBuf +
-                                                         byte_size));
-               break;
-               /* writeback only */
-       case PROC_WRITEBACK_MEM:
-               dmac_clean_range(pMemBuf, pMemBuf + byte_size);
-               outer_clean_range(__pa((u32) pMemBuf), __pa((u32) pMemBuf +
-                                                           byte_size));
-               break;
-               /* writeback and invalidate */
-       case PROC_WRITEBACK_INVALIDATE_MEM:
-               dmac_flush_range(pMemBuf, pMemBuf + byte_size);
-               outer_flush_range(__pa((u32) pMemBuf), __pa((u32) pMemBuf +
-                                                           byte_size));
-               break;
-       }
-
-}
-
-/*
  *  ======== mem_free_phys_mem ========
  *  Purpose:
  *      Free the given block of physically contiguous memory.



>
> -               kunmap(page);
> +               sg_set_page(&sg[i], page, rest, offset);
> +
>                len -= rest;
>                start += rest;
> -               pg_i++;
> +               pg_i++, i++;
> +       }
> +
> +       if (i != map_info->dma_info.num_pages) {
> +               pr_err("%s: bad number of sg iterations\n", __func__);
> +               ret = -EFAULT;
> +               goto out;
> +       }
> +
> +out:
> +       return ret;
> +}
> +
> +/* Release CPU ownership of the buffer: build an sg list and map it for DMA */
> +static int memory_release_ownership(struct memory_map_info *map_info,
> +               unsigned long start, ssize_t len, enum dma_data_direction dir)
> +{
> +       int pg_i, ret, sg_num;
> +       struct scatterlist *sg;
> +       unsigned long first_data_page = start >> PAGE_SHIFT;
> +       unsigned long last_data_page = ((u32)(start + len - 1) >> PAGE_SHIFT);
> +       /* calculating the number of pages this area spans */
> +       unsigned long num_pages = last_data_page - first_data_page + 1;
> +
> +       pg_i = find_first_page_in_cache(map_info, start);
> +       if (pg_i < 0) {
> +               pr_err("%s: failed to find first page in cache\n", __func__);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       sg = kcalloc(num_pages, sizeof(*sg), GFP_KERNEL);
> +       if (!sg) {
> +               pr_err("%s: kcalloc failed\n", __func__);
> +               ret = -ENOMEM;
> +               goto out;
> +       }
> +
> +       sg_init_table(sg, num_pages);
> +
> +       /* cleanup a previous sg allocation */
> +       /* this may happen if application doesn't signal for e/o DMA */
> +       kfree(map_info->dma_info.sg);
> +
> +       map_info->dma_info.sg = sg;
> +       map_info->dma_info.dir = dir;
> +       map_info->dma_info.num_pages = num_pages;
> +
> +       ret = build_dma_sg(map_info, start, len, pg_i);
> +       if (ret)
> +               goto kfree_sg;
> +
> +       sg_num = dma_map_sg(bridge, sg, num_pages, dir);
> +       if (sg_num < 1) {
> +               pr_err("%s: dma_map_sg failed: %d\n", __func__, sg_num);
> +               ret = -EFAULT;
> +               goto kfree_sg;
>        }
>
> +       pr_debug("%s: dma_map_sg mapped %d elements\n", __func__, sg_num);
> +       map_info->dma_info.sg_num = sg_num;
> +
>        return 0;
> +
> +kfree_sg:
> +       kfree(sg);
> +       map_info->dma_info.sg = NULL;
> +out:
> +       return ret;
>  }
>
> -static dsp_status proc_memory_sync(void *hprocessor, void *pmpu_addr,
> -                                  u32 ul_size, u32 ul_flags,
> -                                  enum dsp_flushtype ftype)
> +static dsp_status proc_begin_dma(void *hprocessor, void *pmpu_addr,
> +                                       u32 ul_size, u32 ul_flags,
> +                                       enum dma_data_direction dir)
>  {
>        /* Keep STATUS here for future additions to this function */
>        dsp_status status = DSP_SOK;
> @@ -684,7 +776,7 @@ static dsp_status proc_memory_sync(void *hprocessor, void *pmpu_addr,
>
>        pr_debug("%s: addr 0x%x, size 0x%x, type %d\n", __func__,
>                                                        (u32)pmpu_addr,
> -                                                       ul_size, ftype);
> +                                                       ul_size, dir);
>
>        /* find requested memory are in cached mapping information */
>        map_info = find_containing_mapping(p_proc_object, (u32) pmpu_addr,
> @@ -695,7 +787,8 @@ static dsp_status proc_memory_sync(void *hprocessor, void *pmpu_addr,
>                goto err_out;
>        }
>
> -       if (memory_sync_page(map_info, (u32) pmpu_addr, ul_size, ftype)) {
> +       if (memory_release_ownership(map_info, (u32) pmpu_addr, ul_size,
> +                                                               dir)) {
>                pr_err("%s: InValid address parameters %p %x\n",
>                       __func__, pmpu_addr, ul_size);
>                status = DSP_EHANDLE;
> @@ -710,13 +803,12 @@ err_out:
>  *  Purpose:
>  *     Flush cache
>  */
> -dsp_status proc_flush_memory(void *hprocessor, void *pmpu_addr,
> +dsp_status proc_begin_dma_to_dsp(void *hprocessor, void *pmpu_addr,
>                             u32 ul_size, u32 ul_flags)
>  {
> -       enum dsp_flushtype mtype = PROC_WRITEBACK_INVALIDATE_MEM;
> +       enum dma_data_direction dir = DMA_BIDIRECTIONAL;
>
> -       return proc_memory_sync(hprocessor, pmpu_addr, ul_size, ul_flags,
> -                               mtype);
> +       return proc_begin_dma(hprocessor, pmpu_addr, ul_size, ul_flags, dir);
>  }
>
>  /*
> @@ -724,12 +816,12 @@ dsp_status proc_flush_memory(void *hprocessor, void *pmpu_addr,
>  *  Purpose:
>  *     Invalidates the memory specified
>  */
> -dsp_status proc_invalidate_memory(void *hprocessor, void *pmpu_addr,
> +dsp_status proc_begin_dma_from_dsp(void *hprocessor, void *pmpu_addr,
>                                  u32 ul_size)
>  {
> -       enum dsp_flushtype mtype = PROC_INVALIDATE_MEM;
> +       enum dma_data_direction dir = DMA_FROM_DEVICE;
>
> -       return proc_memory_sync(hprocessor, pmpu_addr, ul_size, 0, mtype);
> +       return proc_begin_dma(hprocessor, pmpu_addr, ul_size, 0, dir);
>  }
>
>  /*
> --
> 1.6.3.3
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-01 20:44 [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues Ohad Ben-Cohen
                   ` (5 preceding siblings ...)
  2010-05-01 20:44 ` [RFC/PATCH 6/6] DSPBRIDGE: add dspbridge API to mark end of DMA Ohad Ben-Cohen
@ 2010-05-02 13:17 ` Felipe Contreras
  2010-05-02 17:47   ` Ohad Ben-Cohen
  6 siblings, 1 reply; 45+ messages in thread
From: Felipe Contreras @ 2010-05-02 13:17 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Sat, May 1, 2010 at 11:44 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> This patchset introduces an approach to eliminate the direct calls
> to follow_page and to the low level cache APIs.
>
> The patchset works by caching the page information while memory
> is mapped, and then using that information later when needed
> instead of calling follow_page. The low level cache API is then replaced
> by the standard DMA API.

Finally! Very interesting patch indeed.

> A few key points in the current approach that I'd be happy to hear
> your feedback about:
> 1. The new mapping + page information is currently cached in the
>   proc object, but it might fit better inside dmm map objects
>   (by enhancing the DMM layer to support the required data caching,
>   storing and retrieving).

Sounds like a good idea.

> 2. The information is stored in a linked list. That's pretty fine
>   as long as the number of memory mappings per application is not
>   extremely high. If that assumption is wrong, a different underlying
>   data structure might be better (hash table, priority tree, etc..).

I think a linked list is fine for now. AFAIK only a limited number of
mmaps happen at the same time, usually 4.

> 3. Moving to standard DMA API completely changes the user's point
>   of view; users should no longer think in terms of which cache
>   manipulation is required, but instead, they should just tell dspbridge
>   before a DMA transfer begins, and after it ends. Between the begin
>   and end calls, the buffer "belongs" to the DSP and should not
>   be accessed by the user.

This is really nice. That API should have been that way since the beginning.

>   The patchset renames the flush ioctl to begin_dma_to_dsp and
>   the invalidate ioctl to begin_dma_from_dsp. Both functions
>   eventually call dma_map_sg, with the former requesting a
>   DMA_BIDIRECTIONAL direction, and the latter requesting a
>   DMA_FROM_DEVICE direction.
>   In addition, the patchset adds two new APIs which calls dma_unmap_sg:
>   end_dma_to_dsp and end_dma_from_dsp.
>
>   Ideally, there would be only a single begin_dma command and a single
>   end_dma one, which would accept an additional parameter that will
>   determine the direction of the transfer. Such an approach would be more
>   versatile and cleaner, but it would also break all user space apps that
>   use dspbridge today.

If I understand correctly all user-space apps would be broken anyway
because they are not issuing the end_dma calls. At the very least they
need to be updated to use them.

Also, in Nokia we patched the bridgedriver to be able to have the 3
operations available from user-space (clean, inv, and flush), so we
would be very interested in having the direction of the transfer
available.

Cheers.

-- 
Felipe Contreras

* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-02 13:17 ` [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues Felipe Contreras
@ 2010-05-02 17:47   ` Ohad Ben-Cohen
  2010-05-14 19:27     ` Felipe Contreras
  0 siblings, 1 reply; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-02 17:47 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Sun, May 2, 2010 at 4:17 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Sat, May 1, 2010 at 11:44 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>> This patchset introduces an approach to eliminate the direct calls
>> to follow_page and to the low level cache APIs.
>>
>> The patchset works by caching the page information while memory
>> is mapped, and then using that information later when needed
>> instead of calling follow_page. The low level cache API is then replaced
>> by the standard DMA API.
>
> Finally! Very interesting patch indeed.

Thanks!

>
>> A few key points in the current approach that I'd be happy to hear
>> your feedback about:
>> 1. The new mapping + page information is currently cached in the
>>   proc object, but it might fit better inside dmm map objects
>>   (by enhancing the DMM layer to support the required data caching,
>>   storing and retrieving).
>
> Sounds like a good idea.

Great, I'll look into this.

>
>> 2. The information is stored in a linked list. That's pretty fine
>>   as long as the number of memory mappings per application is not
>>   extremely high. If that assumption is wrong, a different underlying
>>   data structure might be better (hash table, priority tree, etc..).
>
> I think a linked list is fine for now. AFAIK only a limited number of
> mmaps happen at the same time, usually 4.

Thanks for confirming.

>
>> 3. Moving to standard DMA API completely changes the user's point
>>   of view; users should no longer think in terms of which cache
>>   manipulation is required, but instead, they should just tell dspbridge
>>   before a DMA transfer begins, and after it ends. Between the begin
>>   and end calls, the buffer "belongs" to the DSP and should not
>>   be accessed by the user.
>
> This is really nice. That API should have been that way since the beginning.
>
>>   The patchset renames the flush ioctl to begin_dma_to_dsp and
>>   the invalidate ioctl to begin_dma_from_dsp. Both functions
>>   eventually call dma_map_sg, with the former requesting a
>>   DMA_BIDIRECTIONAL direction, and the latter requesting a
>>   DMA_FROM_DEVICE direction.
>>   In addition, the patchset adds two new APIs which calls dma_unmap_sg:
>>   end_dma_to_dsp and end_dma_from_dsp.
>>
>>   Ideally, there would be only a single begin_dma command and a single
>>   end_dma one, which would accept an additional parameter that will
>>   determine the direction of the transfer. Such an approach would be more
>>   versatile and cleaner, but it would also break all user space apps that
>>   use dspbridge today.
>
> If I understand correctly all user-space apps would be broken anyway
> because they are not issuing the end_dma calls. At the very least they
> need to be updated to use them.

Basically you're right, but the patches currently silently allow today's
user space to somehow work (because if you don't use dma bounce buffers
and you feel lucky about speculative prefetching, you might get away with
not calling dma unmap).
I did that on purpose, to ease testing & migration, but didn't explicitly
say so because frankly it's just wrong.

>
> Also, in Nokia we patched the bridgedriver to be able to have the 3
> operations available from user-space (clean, inv, and flush), so we
> would be very interested in having the direction of the transfer
> available.

Thanks, that's important to know.

What do you say about the following plan then:
- I'll add a single pair of begin_dma and end_dma commands that will take
the additional direction parameter as described above. I'll then convert
the existing flush & invalidate commands to use this begin_dma command,
supplying it the appropriate direction argument.
- In a separate patch, I'll deprecate flush & invalidate entirely (usage
of this deprecated API will fail, resulting in a guiding error message).

We get a sane and versatile (and platform-independent) implementation
that always works, but breaks user space. If anyone needs to work with
current user space, the deprecating patch can always be reverted locally
to get back flush & invalidate implementations that (somehow) work.

Anyway, I'm sure that breaking user space can be somewhat traumatic, so
let's make sure we get across-the-board support for this.

Thanks a lot for the comments!
Ohad.

>
> Cheers.
>
> --
> Felipe Contreras
>

* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-02 17:47   ` Ohad Ben-Cohen
@ 2010-05-14 19:27     ` Felipe Contreras
  2010-05-14 19:49       ` Omar Ramirez Luna
                         ` (3 more replies)
  0 siblings, 4 replies; 45+ messages in thread
From: Felipe Contreras @ 2010-05-14 19:27 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

Hi,

I spent some time looking deeper into this patch series, and I have some doubts.

On Sun, May 2, 2010 at 8:47 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Sun, May 2, 2010 at 4:17 PM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>> On Sat, May 1, 2010 at 11:44 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>>>   The patchset renames the flush ioctl to begin_dma_to_dsp and
>>>   the invalidate ioctl to begin_dma_from_dsp. Both functions
>>>   eventually call dma_map_sg, with the former requesting a
>>>   DMA_BIDIRECTIONAL direction, and the latter requesting a
>>>   DMA_FROM_DEVICE direction.
>>>   In addition, the patchset adds two new APIs which calls dma_unmap_sg:
>>>   end_dma_to_dsp and end_dma_from_dsp.
>>>
>>>   Ideally, there would be only a single begin_dma command and a single
>>>   end_dma one, which would accept an additional parameter that will
>>>   determine the direction of the transfer. Such an approach would be more
>>>   versatile and cleaner, but it would also break all user space apps that
>>>   use dspbridge today.
>>
>> If I understand correctly all user-space apps would be broken anyway
>> because they are not issuing the end_dma calls. At the very least they
>> need to be updated to use them.
>
> Basically you're right, but the patches currently silently allow
> today's user space
> to somehow work (because if you don't use dma bounce buffers and you feel lucky
> about speculative prefetching then you might get away with not calling
> dma unmap).
> I did that on purpose, to ease testing & migration, but didn't
> explicitely said it because
>  frankly it's just wrong.

I looked into the dma code (I guess it's in arch/arm/mm/dma-mapping.c)
and I don't understand what dma_unmap_sg is supposed to do. I see that
first some "safe buffer" is searched, and if there isn't any... then
it depends on the direction whether something is actually done or not.

I guess it depends whether our arch has dmabounce or not, which I have
no idea, but if we do, wouldn't skipping the dma_unmap calls leak a
massive amount of "safe buffers"?

>> Also, in Nokia we patched the bridgedriver to be able to have the 3
>> operations available from user-space (clean, inv, and flush), so we
>> would be very interested in having the direction of the transfer
>> available.
>
> Thanks, that's important to know.

It's not that critical, but I guess it can't hurt to do the right thing.

> What do you say about the following plan then:
> - I'll add a single pair of begin_dma and end_dma commands that will
> take the additional
> direction parameter as described above. I'll then covert the existing
> flush & invalidate commands to use this begin_dma command supplying it
> the appropriate direction argument.

Sounds perfect, but I wonder if the deprecated flush & invalidate
would really work. See previous comments.

> - In a separate patch, I'll deprecate flush & invalidate entirely
> (usage of this deprecated
> API will fail, resulting in a guiding error message).
>
> We get a sane and versatile (and platform-independent) implementation
> that always work,
> but breaks user space. If anyone would need to work with current user space,
> the deprecating patch can always be reverted locally to get back a
> flush & invalidate
> implementations that (somehow) work.

User-space API is being broken all the time in dspbridge. The
difference is that this one might require nontrivial changes. But it
must be done.

So, I tried your patches, and a simple test app worked fine without
modification, but real video decoding hung the device completely...
some spinlock was stuck. I don't know if it's because of your patches,
or because of the state of the bridge at that point.
I'll try first to rebase to the latest to have a better idea of what's
happening.

Also, I noticed an important problem. Your code assumes that we would
always have a bunch of scattered pages; however, you are not taking
into account bridge_brd_mem_map(), where vm_flags have VM_IO... in that
case get_user_pages() is skipped completely. This code path is
important in order to mmap framebuffer memory, which is contiguous.
So, in this case there are no pages to keep track of at all.

This use-case is very important since the dspbridge mmap operation is
very expensive, and due to GStreamer design, we must do it
constantly... if the memory is contiguous (VM_IO), the mmap operation
is really fast (or at least should be... in theory).

Reading your patches I see the ioctls to start the dma operations
would simply error out, but from the looks of it, they would be
failing already, which is weird, because we are already using
framebuffer memory for video decoding + rendering. In gst-dsp we are
not checking the success of the clean/invalidate ioctls, so it might be
that this has been failing all this time and seems to work by pure luck.

Anyway, I'll keep investigating and let you know if I find anything.

Cheers.

-- 
Felipe Contreras

* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-14 19:27     ` Felipe Contreras
@ 2010-05-14 19:49       ` Omar Ramirez Luna
  2010-05-15  8:26         ` Felipe Contreras
  2010-05-16 17:35       ` Felipe Contreras
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 45+ messages in thread
From: Omar Ramirez Luna @ 2010-05-14 19:49 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: Ohad Ben-Cohen, linux-omap@vger.kernel.org, Kanigeri, Hari,
	Guzman Lugo, Fernando, Menon, Nishanth, Hiroshi Doyu

On 5/14/2010 2:27 PM, Felipe Contreras wrote:
[...]
>
> So, I tried your patches, and a simple test app worked fine without
> modification, but a real video decoding hanged the device
> completely... some spinlock was stuck. I don't know if it's because of
> your patches, or because of the state of the bridge at that point.
> I'll try first to rebase to the latest to have a better idea of what's
> happening.
>

You may want to check if you have this patch "DSPBRIDGE: Fix declaration 
and initialization of sync objects"[1]

This is the one I know fixes a spinlock issue, though its description is
not clear enough to state that.

File fixed was: drivers/dsp/bridge/wmd/msg_sm.c

[1]http://git.kernel.org/?p=linux/kernel/git/tmlind/linux-omap-2.6.git;a=commit;h=b3900e6df1f4e16b59d506a299cd5084c67a6ede

Regards,

- omar


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-14 19:49       ` Omar Ramirez Luna
@ 2010-05-15  8:26         ` Felipe Contreras
  2010-05-15  9:08           ` Felipe Contreras
  0 siblings, 1 reply; 45+ messages in thread
From: Felipe Contreras @ 2010-05-15  8:26 UTC (permalink / raw)
  To: Omar Ramirez Luna
  Cc: Ohad Ben-Cohen, linux-omap@vger.kernel.org, Kanigeri, Hari,
	Guzman Lugo, Fernando, Menon, Nishanth, Hiroshi Doyu

On Fri, May 14, 2010 at 10:49 PM, Omar Ramirez Luna <omar.ramirez@ti.com> wrote:
> On 5/14/2010 2:27 PM, Felipe Contreras wrote:
> [...]
>>
>> So, I tried your patches, and a simple test app worked fine without
>> modification, but a real video decoding hanged the device
>> completely... some spinlock was stuck. I don't know if it's because of
>> your patches, or because of the state of the bridge at that point.
>> I'll try first to rebase to the latest to have a better idea of what's
>> happening.
>
> You may want to check if you have this patch "DSPBRIDGE: Fix declaration and
> initialization of sync objects"[1]

I used what Ohad suggested: 13e2573. So no, that patch is not there.

> This is the one I know fixes a spinlock issue and which description is not
> clear enough to state that.
>
> File fixed was: drivers/dsp/bridge/wmd/msg_sm.c
>
> [1]http://git.kernel.org/?p=linux/kernel/git/tmlind/linux-omap-2.6.git;a=commit;h=b3900e6df1f4e16b59d506a299cd5084c67a6ede

I tried to cherry-pick that commit... didn't help:

BUG: spinlock lockup on CPU#0, gst-launch-0.10/534, c6404304
Backtrace:
[<c0040984>] (dump_backtrace+0x0/0xf8) from [<c036d204>] (dump_stack+0x18/0x1c)
 r6:00000000 r5:c644a000 r4:c6404304 r3:00000000
[<c036d1ec>] (dump_stack+0x0/0x1c) from [<c0213a08>]
(__spin_lock_debug+0xbc/0xd0)
[<c021394c>] (__spin_lock_debug+0x0/0xd0) from [<c0213aa0>]
(do_raw_spin_lock+0x84/0xb4)
 r8:c644be54 r7:00000064 r6:c005f178 r5:80000013 r4:c6404304
[<c0213a1c>] (do_raw_spin_lock+0x0/0xb4) from [<c036ff6c>]
(_raw_spin_lock_irqsave+0x5c/0x68)
 r4:c6404304 r3:c7865a00
[<c036ff10>] (_raw_spin_lock_irqsave+0x0/0x68) from [<c005f178>]
(completion_done+0x1c/0x38)
 r6:00000001 r5:c6404304 r4:c6404300
[<c005f15c>] (completion_done+0x0/0x38) from [<bf00eb10>]
(sync_wait_on_multiple_events+0x58/0x14c [bridgedriver])
 r5:00000002 r4:c644be4c
[<bf00eab8>] (sync_wait_on_multiple_events+0x0/0x14c [bridgedriver])
from [<bf01158c>] (bridge_msg_get+0x154/0x240 [bridgedriver])
[<bf011438>] (bridge_msg_get+0x0/0x240 [bridgedriver]) from
[<bf02220c>] (node_get_message+0x94/0x128 [bridgedriver])
[<bf022178>] (node_get_message+0x0/0x128 [bridgedriver]) from
[<bf019968>] (nodewrap_get_message+0x28/0x8c [bridgedriver])
 r7:c004db48 r6:bf02e7d6 r5:c644bebc r4:c644bf04
[<bf019940>] (nodewrap_get_message+0x0/0x8c [bridgedriver]) from
[<bf018d94>] (wcd_call_dev_io_ctl+0xf8/0x120 [bridgedriver])
 r5:00000040 r4:c644bf1c
[<bf018c9c>] (wcd_call_dev_io_ctl+0x0/0x120 [bridgedriver]) from
[<bf02a058>] (bridge_ioctl+0xac/0xcc [bridgedriver])
 r6:421e7d64 r5:c004db48 r4:c6419f00 r3:c65bb500
[<bf029fac>] (bridge_ioctl+0x0/0xcc [bridgedriver]) from [<c0113eb4>]
(vfs_ioctl+0x34/0xb4)
 r6:421e7d64 r5:bf029fac r4:c6419f00
[<c0113e80>] (vfs_ioctl+0x0/0xb4) from [<c01142b8>] (do_vfs_ioctl+0x1c4/0x1e0)
 r7:00000005 r6:c004db48 r5:421e7d64 r4:421e7d64
[<c01140f4>] (do_vfs_ioctl+0x0/0x1e0) from [<c0114314>] (sys_ioctl+0x40/0x64)
 r4:c6419f00
[<c01142d4>] (sys_ioctl+0x0/0x64) from [<c003cfc0>] (ret_fast_syscall+0x0/0x38)
 r7:00000036 r6:0006ef10 r5:4055d860 r4:0006a000

I'll try to rebase the patches to the latest head.

Cheers.

-- 
Felipe Contreras


* Re: [RFC/PATCH 2/6] DSPBRIDGE: remember mapping and page info in proc_map
  2010-05-01 20:44 ` [RFC/PATCH 2/6] DSPBRIDGE: remember mapping and page info in proc_map Ohad Ben-Cohen
@ 2010-05-15  8:34   ` Felipe Contreras
  2010-05-16 23:00     ` Ohad Ben-Cohen
  0 siblings, 1 reply; 45+ messages in thread
From: Felipe Contreras @ 2010-05-15  8:34 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Sat, May 1, 2010 at 11:44 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> Every time the MM application calls proc_map to map
> a memory area, remember the details of that mapping,
> together with the related page structures.
>
> Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
> ---
> If you want, you can also reach me at <  ohadb at ti dot com  >.

> @@ -948,6 +949,7 @@ static dsp_status bridge_dev_create(OUT struct wmd_dev_context **ppDevContext,
>                status = DSP_EMEMORY;
>                goto func_end;
>        }
> +
>        status = cfg_get_host_resources((struct cfg_devnode *)
>                                        drv_get_first_dev_extension(),
>                                        &resources);

Completely unrelated change. Moreover, this creates conflicts with
future commits, since that code is gone.

-- 
Felipe Contreras

* Re: [RFC/PATCH 3/6] DSPBRIDGE: remove mapping information in proc_unmap
  2010-05-01 20:44 ` [RFC/PATCH 3/6] DSPBRIDGE: remove mapping information in proc_unmap Ohad Ben-Cohen
@ 2010-05-15  8:38   ` Felipe Contreras
  2010-05-16 23:02     ` Ohad Ben-Cohen
  0 siblings, 1 reply; 45+ messages in thread
From: Felipe Contreras @ 2010-05-15  8:38 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Sat, May 1, 2010 at 11:44 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> Clean up all mapping information resources whenever
> a buffer is unmapped.

If I understand correctly, the previous patch doesn't make sense on
its own because it will leak memory. Therefore it should be squashed
with this one. Otherwise a git bisect might land on a very bad place.
It's good for reviewing though.

-- 
Felipe Contreras


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-15  8:26         ` Felipe Contreras
@ 2010-05-15  9:08           ` Felipe Contreras
  2010-05-16 16:08             ` Felipe Contreras
  0 siblings, 1 reply; 45+ messages in thread
From: Felipe Contreras @ 2010-05-15  9:08 UTC (permalink / raw)
  To: Omar Ramirez Luna
  Cc: Ohad Ben-Cohen, linux-omap@vger.kernel.org, Kanigeri, Hari,
	Guzman Lugo, Fernando, Menon, Nishanth, Hiroshi Doyu

On Sat, May 15, 2010 at 11:26 AM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Fri, May 14, 2010 at 10:49 PM, Omar Ramirez Luna <omar.ramirez@ti.com> wrote:
>> On 5/14/2010 2:27 PM, Felipe Contreras wrote:
>> [...]
>>>
>>> So, I tried your patches, and a simple test app worked fine without
>>> modification, but a real video decoding hanged the device
>>> completely... some spinlock was stuck. I don't know if it's because of
>>> your patches, or because of the state of the bridge at that point.
>>> I'll try first to rebase to the latest to have a better idea of what's
>>> happening.
>>
>> You may want to check if you have this patch "DSPBRIDGE: Fix declaration and
>> initialization of sync objects"[1]
>
> I used what Ohad suggested: 13e2573. So no, that patch is not there.
>
>> This is the one I know fixes a spinlock issue and which description is not
>> clear enough to state that.
>>
>> File fixed was: drivers/dsp/bridge/wmd/msg_sm.c
>>
>> [1]http://git.kernel.org/?p=linux/kernel/git/tmlind/linux-omap-2.6.git;a=commit;h=b3900e6df1f4e16b59d506a299cd5084c67a6ede
>
> I tried to cherry-pick that commit... didn't help:
>
> BUG: spinlock lockup on CPU#0, gst-launch-0.10/534, c6404304
> Backtrace:
> [<c0040984>] (dump_backtrace+0x0/0xf8) from [<c036d204>] (dump_stack+0x18/0x1c)
>  r6:00000000 r5:c644a000 r4:c6404304 r3:00000000
> [<c036d1ec>] (dump_stack+0x0/0x1c) from [<c0213a08>]
> (__spin_lock_debug+0xbc/0xd0)
> [<c021394c>] (__spin_lock_debug+0x0/0xd0) from [<c0213aa0>]
> (do_raw_spin_lock+0x84/0xb4)
>  r8:c644be54 r7:00000064 r6:c005f178 r5:80000013 r4:c6404304
> [<c0213a1c>] (do_raw_spin_lock+0x0/0xb4) from [<c036ff6c>]
> (_raw_spin_lock_irqsave+0x5c/0x68)
>  r4:c6404304 r3:c7865a00
> [<c036ff10>] (_raw_spin_lock_irqsave+0x0/0x68) from [<c005f178>]
> (completion_done+0x1c/0x38)
>  r6:00000001 r5:c6404304 r4:c6404300
> [<c005f15c>] (completion_done+0x0/0x38) from [<bf00eb10>]
> (sync_wait_on_multiple_events+0x58/0x14c [bridgedriver])
>  r5:00000002 r4:c644be4c
> [<bf00eab8>] (sync_wait_on_multiple_events+0x0/0x14c [bridgedriver])
> from [<bf01158c>] (bridge_msg_get+0x154/0x240 [bridgedriver])
> [<bf011438>] (bridge_msg_get+0x0/0x240 [bridgedriver]) from
> [<bf02220c>] (node_get_message+0x94/0x128 [bridgedriver])
> [<bf022178>] (node_get_message+0x0/0x128 [bridgedriver]) from
> [<bf019968>] (nodewrap_get_message+0x28/0x8c [bridgedriver])
>  r7:c004db48 r6:bf02e7d6 r5:c644bebc r4:c644bf04
> [<bf019940>] (nodewrap_get_message+0x0/0x8c [bridgedriver]) from
> [<bf018d94>] (wcd_call_dev_io_ctl+0xf8/0x120 [bridgedriver])
>  r5:00000040 r4:c644bf1c
> [<bf018c9c>] (wcd_call_dev_io_ctl+0x0/0x120 [bridgedriver]) from
> [<bf02a058>] (bridge_ioctl+0xac/0xcc [bridgedriver])
>  r6:421e7d64 r5:c004db48 r4:c6419f00 r3:c65bb500
> [<bf029fac>] (bridge_ioctl+0x0/0xcc [bridgedriver]) from [<c0113eb4>]
> (vfs_ioctl+0x34/0xb4)
>  r6:421e7d64 r5:bf029fac r4:c6419f00
> [<c0113e80>] (vfs_ioctl+0x0/0xb4) from [<c01142b8>] (do_vfs_ioctl+0x1c4/0x1e0)
>  r7:00000005 r6:c004db48 r5:421e7d64 r4:421e7d64
> [<c01140f4>] (do_vfs_ioctl+0x0/0x1e0) from [<c0114314>] (sys_ioctl+0x40/0x64)
>  r4:c6419f00
> [<c01142d4>] (sys_ioctl+0x0/0x64) from [<c003cfc0>] (ret_fast_syscall+0x0/0x38)
>  r7:00000036 r6:0006ef10 r5:4055d860 r4:0006a000
>
> I'll try to rebase the patches to the latest head.

Nope, didn't help. I've put the updated patches here:
http://people.freedesktop.org/~felipec/dspbridge/

-- 
Felipe Contreras

* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-15  9:08           ` Felipe Contreras
@ 2010-05-16 16:08             ` Felipe Contreras
  0 siblings, 0 replies; 45+ messages in thread
From: Felipe Contreras @ 2010-05-16 16:08 UTC (permalink / raw)
  To: Omar Ramirez Luna
  Cc: Ohad Ben-Cohen, linux-omap@vger.kernel.org, Kanigeri, Hari,
	Guzman Lugo, Fernando, Menon, Nishanth, Hiroshi Doyu

On Sat, May 15, 2010 at 12:08 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Sat, May 15, 2010 at 11:26 AM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>> On Fri, May 14, 2010 at 10:49 PM, Omar Ramirez Luna <omar.ramirez@ti.com> wrote:
>>> On 5/14/2010 2:27 PM, Felipe Contreras wrote:
>>> [...]
>>>>
>>>> So, I tried your patches, and a simple test app worked fine without
>>>> modification, but a real video decoding hanged the device
>>>> completely... some spinlock was stuck. I don't know if it's because of
>>>> your patches, or because of the state of the bridge at that point.
>>>> I'll try first to rebase to the latest to have a better idea of what's
>>>> happening.
>>>
>>> You may want to check if you have this patch "DSPBRIDGE: Fix declaration and
>>> initialization of sync objects"[1]
>>
>> I used what Ohad suggested: 13e2573. So no, that patch is not there.
>>
>>> This is the one I know fixes a spinlock issue and which description is not
>>> clear enough to state that.
>>>
>>> File fixed was: drivers/dsp/bridge/wmd/msg_sm.c
>>>
>>> [1]http://git.kernel.org/?p=linux/kernel/git/tmlind/linux-omap-2.6.git;a=commit;h=b3900e6df1f4e16b59d506a299cd5084c67a6ede
>>
>> I tried to cherry-pick that commit... didn't help:
>>
>> BUG: spinlock lockup on CPU#0, gst-launch-0.10/534, c6404304
>> Backtrace:
>> [<c0040984>] (dump_backtrace+0x0/0xf8) from [<c036d204>] (dump_stack+0x18/0x1c)
>>  r6:00000000 r5:c644a000 r4:c6404304 r3:00000000
>> [<c036d1ec>] (dump_stack+0x0/0x1c) from [<c0213a08>]
>> (__spin_lock_debug+0xbc/0xd0)
>> [<c021394c>] (__spin_lock_debug+0x0/0xd0) from [<c0213aa0>]
>> (do_raw_spin_lock+0x84/0xb4)
>>  r8:c644be54 r7:00000064 r6:c005f178 r5:80000013 r4:c6404304
>> [<c0213a1c>] (do_raw_spin_lock+0x0/0xb4) from [<c036ff6c>]
>> (_raw_spin_lock_irqsave+0x5c/0x68)
>>  r4:c6404304 r3:c7865a00
>> [<c036ff10>] (_raw_spin_lock_irqsave+0x0/0x68) from [<c005f178>]
>> (completion_done+0x1c/0x38)
>>  r6:00000001 r5:c6404304 r4:c6404300
>> [<c005f15c>] (completion_done+0x0/0x38) from [<bf00eb10>]
>> (sync_wait_on_multiple_events+0x58/0x14c [bridgedriver])
>>  r5:00000002 r4:c644be4c
>> [<bf00eab8>] (sync_wait_on_multiple_events+0x0/0x14c [bridgedriver])
>> from [<bf01158c>] (bridge_msg_get+0x154/0x240 [bridgedriver])
>> [<bf011438>] (bridge_msg_get+0x0/0x240 [bridgedriver]) from
>> [<bf02220c>] (node_get_message+0x94/0x128 [bridgedriver])
>> [<bf022178>] (node_get_message+0x0/0x128 [bridgedriver]) from
>> [<bf019968>] (nodewrap_get_message+0x28/0x8c [bridgedriver])
>>  r7:c004db48 r6:bf02e7d6 r5:c644bebc r4:c644bf04
>> [<bf019940>] (nodewrap_get_message+0x0/0x8c [bridgedriver]) from
>> [<bf018d94>] (wcd_call_dev_io_ctl+0xf8/0x120 [bridgedriver])
>>  r5:00000040 r4:c644bf1c
>> [<bf018c9c>] (wcd_call_dev_io_ctl+0x0/0x120 [bridgedriver]) from
>> [<bf02a058>] (bridge_ioctl+0xac/0xcc [bridgedriver])
>>  r6:421e7d64 r5:c004db48 r4:c6419f00 r3:c65bb500
>> [<bf029fac>] (bridge_ioctl+0x0/0xcc [bridgedriver]) from [<c0113eb4>]
>> (vfs_ioctl+0x34/0xb4)
>>  r6:421e7d64 r5:bf029fac r4:c6419f00
>> [<c0113e80>] (vfs_ioctl+0x0/0xb4) from [<c01142b8>] (do_vfs_ioctl+0x1c4/0x1e0)
>>  r7:00000005 r6:c004db48 r5:421e7d64 r4:421e7d64
>> [<c01140f4>] (do_vfs_ioctl+0x0/0x1e0) from [<c0114314>] (sys_ioctl+0x40/0x64)
>>  r4:c6419f00
>> [<c01142d4>] (sys_ioctl+0x0/0x64) from [<c003cfc0>] (ret_fast_syscall+0x0/0x38)
>>  r7:00000036 r6:0006ef10 r5:4055d860 r4:0006a000
>>
>> I'll try to rebase the patches to the latest head.
>
> Nope, didn't help. I've put the updated patches here:
> http://people.freedesktop.org/~felipec/dspbridge/

Oops! Disregard that. I wasn't updating the uImage =/

The patches seem to work fine.

-- 
Felipe Contreras


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-14 19:27     ` Felipe Contreras
  2010-05-14 19:49       ` Omar Ramirez Luna
@ 2010-05-16 17:35       ` Felipe Contreras
  2010-05-16 22:57         ` Ohad Ben-Cohen
  2010-05-16 22:35       ` Ohad Ben-Cohen
  2010-05-20 21:18       ` Ohad Ben-Cohen
  3 siblings, 1 reply; 45+ messages in thread
From: Felipe Contreras @ 2010-05-16 17:35 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Fri, May 14, 2010 at 10:27 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> I spent some time looking deeper into this patch series, and I have some doubts.
>
> On Sun, May 2, 2010 at 8:47 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>> Basically you're right, but the patches currently silently allow
>> today's user space
>> to somehow work (because if you don't use dma bounce buffers and you feel lucky
>> about speculative prefetching then you might get away with not calling
>> dma unmap).
>> I did that on purpose, to ease testing & migration, but didn't
>> explicitly say it because
>> frankly it's just wrong.
>
> I looked into the dma code (I guess it's in arch/arm/mm/dma-mapping.c)
> and I don't understand what dma_unmap_sg is supposed to do. I see that
> first some "safe buffer" is searched, and if there isn't any... then
> it depends on the direction whether something is actually done or not.
>
> I guess it depends whether our arch has dmabounce or not, which I have
>> no idea, but if we do, wouldn't skipping dma_unmap calls leak some
> massive amount of "safe buffers"?

Now I understand better; arch/arm/mm/dma-mapping.c will not be used
unless CONFIG_DMABOUNCE=y, which is not the case for OMAP3.

So, as you can see in arch/arm/include/asm/dma-mapping.h, dma_unmap_sg
is a noop.

static inline void dma_unmap_single(struct device *dev, dma_addr_t handle,
		size_t size, enum dma_data_direction dir)
{
	/* nothing to do */
}

So, in the end, PROC_BEGINDMATODSP and PROC_BEGINDMAFROMDSP would do
exactly the same as PROC_FLUSHMEMORY and PROC_INVALIDATEMEMORY
(dmac_op_range/outer_io_range). And
PROC_ENDDMATODSP/PROC_ENDDMAFROMDSP don't do anything. Therefore even
if user-space updates to the new API, there's no change.

I don't think it makes sense to push for this new API since there will
be absolutely no gain.

>> What do you say about the following plan then:
>> - I'll add a single pair of begin_dma and end_dma commands that will
>> take the additional
>> direction parameter as described above. I'll then convert the existing
>> flush & invalidate commands to use this begin_dma command supplying it
>> the appropriate direction argument.
>
> Sounds perfect, but I wonder if the deprecated flush & invalidate
> would really work. See previous comments.

Actually it would work. I like this approach because it doesn't break
ABI, and doesn't change the semantics unless the new ioctls are used.

>> - In a separate patch, I'll deprecate flush & invalidate entirely
>> (usage of this deprecated
>> API will fail, resulting in a guiding error message).

I don't think there's any need for deprecation.

>> We get a sane and versatile (and platform-independent) implementation
>> that always works,
>> but breaks user space. If anyone would need to work with current user space,
>> the deprecating patch can always be reverted locally to get back a
>> flush & invalidate
>> implementations that (somehow) work.

I still would like the new API for the reason I mentioned before: so
that user-space can clean/invalidate/flush.

Cheers.

-- 
Felipe Contreras


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-14 19:27     ` Felipe Contreras
  2010-05-14 19:49       ` Omar Ramirez Luna
  2010-05-16 17:35       ` Felipe Contreras
@ 2010-05-16 22:35       ` Ohad Ben-Cohen
  2010-05-16 23:15         ` Felipe Contreras
  2010-05-20 21:18       ` Ohad Ben-Cohen
  3 siblings, 1 reply; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-16 22:35 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

Hi Felipe,

On Fri, May 14, 2010 at 10:27 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> Hi,
>
> I spent some time looking deeper into this patch series, and I have some doubts.
>
> On Sun, May 2, 2010 at 8:47 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>> On Sun, May 2, 2010 at 4:17 PM, Felipe Contreras
>> <felipe.contreras@gmail.com> wrote:
>>> On Sat, May 1, 2010 at 11:44 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>>>>   The patchset renames the flush ioctl to begin_dma_to_dsp and
>>>>   the invalidate ioctl to begin_dma_from_dsp. Both functions
>>>>   eventually call dma_map_sg, with the former requesting a
>>>>   DMA_BIDIRECTIONAL direction, and the latter requesting a
>>>>   DMA_FROM_DEVICE direction.
>>>>   In addition, the patchset adds two new APIs which calls dma_unmap_sg:
>>>>   end_dma_to_dsp and end_dma_from_dsp.
>>>>
>>>>   Ideally, there would be only a single begin_dma command and a single
>>>>   end_dma one, which would accept an additional parameter that will
>>>>   determine the direction of the transfer. Such an approach would be more
>>>>   versatile and cleaner, but it would also break all user space apps that
>>>>   use dspbridge today.
>>>
>>> If I understand correctly all user-space apps would be broken anyway
>>> because they are not issuing the end_dma calls. At the very least they
>>> need to be updated to use them.
>>
>> Basically you're right, but the patches currently silently allow
>> today's user space
>> to somehow work (because if you don't use dma bounce buffers and you feel lucky
>> about speculative prefetching then you might get away with not calling
>> dma unmap).
>> I did that on purpose, to ease testing & migration, but didn't
>> explicitely said it because
>>  frankly it's just wrong.
>
> I looked into the dma code (I guess it's in arch/arm/mm/dma-mapping.c)
> and I don't understand what dma_unmap_sg is supposed to do.

Portable code must use the dma_unmap_* APIs.

As you mentioned, one of the things it's crucial for
(but not used in our case) is bounce buffers:
e.g. when doing a DMA from a device using a buffer
that is out of the platform's DMA reach,
the DMA API uses a bounce buffer: data is DMA'ed to that
bounce buffer, and then dma unmap copies it back
to the original buffer.

Another thing unmap is taking care of is speculative prefetch:
modern ARM processors can access the buffer during DMA
in order to speculatively prefetch data into the cache. Naturally
this can seriously break a DMA from a device, so unmap
invalidates the caches to prevent this.

The current dspbridge cache API mostly relies on the
application to do the right thing: the application programmer must
decide when to clean, flush or invalidate the caches in order
to have a successful DMA operation.

This is prone to mistakes, it's not portable, and of course
not upstreamable.

Using the standard DMA API we take that burden off the
application programmer - we just ask him/her to tell us before
a DMA operation begins, and after it ends. The DMA API
is then responsible for doing the right thing.
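The contract described here maps directly onto the kernel's scatterlist DMA API. A rough kernel-side sketch (the function names proc_begin_dma/proc_end_dma are hypothetical, not the actual dspbridge entry points):

```c
/* Sketch: hand the buffer to the DSP before the transfer... */
static int proc_begin_dma(struct device *dev, struct scatterlist *sgl,
			  int nents, enum dma_data_direction dir)
{
	/* dma_map_sg performs whatever cache maintenance 'dir' needs */
	return dma_map_sg(dev, sgl, nents, dir) ? 0 : -EFAULT;
}

/* ...and hand it back to the CPU afterwards. */
static void proc_end_dma(struct device *dev, struct scatterlist *sgl,
			 int nents, enum dma_data_direction dir)
{
	/* On ARM >= 2.6.34 this also invalidates against speculative
	 * prefetch for DMA_FROM_DEVICE / DMA_BIDIRECTIONAL. */
	dma_unmap_sg(dev, sgl, nents, dir);
}
```

Between the two calls the buffer belongs to the device and the CPU must not touch it.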

> first some "safe buffer" is searched, and if there isn't any... then
> it depends on the direction whether something is actually done or not.
>
> I guess it depends whether our arch has dmabounce or not, which I have
>> no idea, but if we do, wouldn't skipping dma_unmap calls leak some
> massive amount of "safe buffers"?
>
>>> Also, in Nokia we patched the bridgedriver to be able to have the 3
>>> operations available from user-space (clean, inv, and flush), so we
>>> would be very interested in having the direction of the transfer
>>> available.
>>
>> Thanks, that's important to know.
>
> It's not that critical, but I guess it can't hurt to do the right thing.
>
>> What do you say about the following plan then:
>> - I'll add a single pair of begin_dma and end_dma commands that will
>> take the additional
>> direction parameter as described above. I'll then convert the existing
>> flush & invalidate commands to use this begin_dma command supplying it
>> the appropriate direction argument.
>
> Sounds perfect, but I wonder if the deprecated flush & invalidate
> would really work. See previous comments.
>
>> - In a separate patch, I'll deprecate flush & invalidate entirely
>> (usage of this deprecated
>> API will fail, resulting in a guiding error message).
>>
>> We get a sane and versatile (and platform-independent) implementation
>> that always works,
>> but breaks user space. If anyone would need to work with current user space,
>> the deprecating patch can always be reverted locally to get back a
>> flush & invalidate
>> implementations that (somehow) work.
>
> User-space API is being broken all the time in dspbridge. The
> difference is that this one might require nontrivial changes. But it
> must be done.
>
> So, I tried your patches, and a simple test app worked fine without
> modification, but a real video decoding hung the device
> completely... some spinlock was stuck. I don't know if it's because of
> your patches, or because of the state of the bridge at that point.
> I'll try first to rebase to the latest to have a better idea of what's
> happening.

I read in your latest posts that after rebasing to the newest dspbridge
code the issue doesn't reproduce anymore? Please tell me if it comes back.

>
> Also, I noticed an important problem. Your code assumes that we would
> always have a bunch of scattered pages, however you are not taking
> into account bridge_brd_mem_map() where vm_flags have VM_IO... in that
> case get_user_pages() is skipped completely. This code-path is
> important in order to mmap framebuffer memory, which is contiguous.
> So, in this case there are no pages to keep track of at all.
>
> This use-case is very important since the dspbridge mmap operation is
> very expensive, and due to GStreamer design, we must do it
> constantly... if the memory is contiguous (VM_IO), the mmap operation
> is really fast (or at least should be... in theory).

Thanks for catching this VM_IO code path. I'll take care of this
and resubmit the patches.

>
> Reading your patches I see the ioctl to start the dmap operations
> would simply error out, but from the looks of it, they would be
> failing already, which is weird, because we are already using
> framebuffer memory for video decoding + rendering. In gst-dsp we are
> not checking the success of clean/invalidate ioctls so it might be
> that it has been failing all this time and seems to work by pure luck.
>
> Anyway, I'll keep investigating and let you know if I find anything.


Again thanks for your review,
your comments are very helpful.

Ohad.


>
> Cheers.
>
> --
> Felipe Contreras
>


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-16 17:35       ` Felipe Contreras
@ 2010-05-16 22:57         ` Ohad Ben-Cohen
  2010-05-16 23:51           ` Felipe Contreras
  0 siblings, 1 reply; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-16 22:57 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

Hi Felipe,

On Sun, May 16, 2010 at 8:35 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Fri, May 14, 2010 at 10:27 PM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>> I spent some time looking deeper into this patch series, and I have some doubts.
>>
>> On Sun, May 2, 2010 at 8:47 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>>> Basically you're right, but the patches currently silently allow
>>> today's user space
>>> to somehow work (because if you don't use dma bounce buffers and you feel lucky
>>> about speculative prefetching then you might get away with not calling
>>> dma unmap).
>>> I did that on purpose, to ease testing & migration, but didn't
>>> explicitly say it because
>>> frankly it's just wrong.
>>
>> I looked into the dma code (I guess it's in arch/arm/mm/dma-mapping.c)
>> and I don't understand what dma_unmap_sg is supposed to do. I see that
>> first some "safe buffer" is searched, and if there isn't any... then
>> it depends on the direction whether something is actually done or not.
>>
>> I guess it depends whether our arch has dmabounce or not, which I have
>> no idea, but if we do, wouldn't skipping dma_unmap calls leak some
>> massive amount of "safe buffers"?
>
> Now I understand better; arch/arm/mm/dma-mapping.c will not be used
> unless CONFIG_DMABOUNCE=y, which is not the case for OMAP3.
>
> So, as you can see in arch/arm/include/asm/dma-mapping.h, dma_unmap_sg
> is a noop.

This has changed since 2.6.34, when support was added
for ARM speculative prefetching. The dma_unmap_* API is now
responsible for invalidating caches when data was moved in from the device
(regardless of CONFIG_DMABOUNCE).

Dspbridge really must support this, and applications should start
using it.

Should we deprecate the old API? Eventually I think we should,
but probably not anytime soon. Let's take the kernel approach of minimizing
user-space pain: keep the old API around, mark it as a candidate for
deprecation, and rethink it in the future.

Thanks,
Ohad.

>
> static inline void dma_unmap_single(struct device *dev, dma_addr_t handle,
>                size_t size, enum dma_data_direction dir)
> {
>        /* nothing to do */
> }
>
> So, in the end, PROC_BEGINDMATODSP and PROC_BEGINDMAFROMDSP would do
> exactly the same as PROC_FLUSHMEMORY and PROC_INVALIDATEMEMORY
> (dmac_op_range/outer_io_range). And
> PROC_ENDDMATODSP/PROC_ENDDMAFROMDSP don't do anything. Therefore even
> if user-space updates to the new API, there's no change.
>
> I don't think it makes sense to push for this new API since there will
> be absolutely no gain.
>
>>> What do you say about the following plan then:
>>> - I'll add a single pair of begin_dma and end_dma commands that will
>>> take the additional
>>> direction parameter as described above. I'll then convert the existing
>>> flush & invalidate commands to use this begin_dma command supplying it
>>> the appropriate direction argument.
>>
>> Sounds perfect, but I wonder if the deprecated flush & invalidate
>> would really work. See previous comments.
>
> Actually it would work. I like this approach because it doesn't break
> ABI, and doesn't change the semantics unless the new ioctls are used.
>
>>> - In a separate patch, I'll deprecate flush & invalidate entirely
>>> (usage of this deprecated
>>> API will fail, resulting in a guiding error message).
>
> I don't think there's any need for deprecation.
>
>>> We get a sane and versatile (and platform-independent) implementation
>>> that always works,
>>> but breaks user space. If anyone would need to work with current user space,
>>> the deprecating patch can always be reverted locally to get back a
>>> flush & invalidate
>>> implementations that (somehow) work.
>
> I still would like the new API for the reason I mentioned before: so
> that user-space can clean/invalidate/flush.
>
> Cheers.
>
> --
> Felipe Contreras
>


* Re: [RFC/PATCH 2/6] DSPBRIDGE: remember mapping and page info in proc_map
  2010-05-15  8:34   ` Felipe Contreras
@ 2010-05-16 23:00     ` Ohad Ben-Cohen
  0 siblings, 0 replies; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-16 23:00 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

Hi Felipe,

On Sat, May 15, 2010 at 11:34 AM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Sat, May 1, 2010 at 11:44 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>> Every time the MM application calls proc_map to map
>> a memory area, remember the details of that mapping,
>> together with the related page structures.
>>
>> Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
>> ---
>> If you want, you can also reach me at <  ohadb at ti dot com  >.
>
>> @@ -948,6 +949,7 @@ static dsp_status bridge_dev_create(OUT struct wmd_dev_context **ppDevContext,
>>                status = DSP_EMEMORY;
>>                goto func_end;
>>        }
>> +
>>        status = cfg_get_host_resources((struct cfg_devnode *)
>>                                        drv_get_first_dev_extension(),
>>                                        &resources);
>
> Completely unrelated change. Moreover, this creates conflicts on
> future commits since that code is gone.

Which change are you referring to? This is merely a newline :)
Or are you referring to something else ?

Thanks,
Ohad.

>
> --
> Felipe Contreras
>


* Re: [RFC/PATCH 3/6] DSPBRIDGE: remove mapping information in proc_unmap
  2010-05-15  8:38   ` Felipe Contreras
@ 2010-05-16 23:02     ` Ohad Ben-Cohen
  0 siblings, 0 replies; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-16 23:02 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

Hi Felipe,

On Sat, May 15, 2010 at 11:38 AM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Sat, May 1, 2010 at 11:44 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>> Clean up all mapping information resources whenever
>> a buffer is unmapped.
>
> If I understand correctly the previous patch doesn't make sense on
> its own because it will leak memory. Therefore it should be squashed
> with this one. Otherwise a git bisect might land on a very bad place.
> It's good for reviewing though.

Squashed it will be.

Thanks for your review,
Ohad.

>
> --
> Felipe Contreras
>


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-16 22:35       ` Ohad Ben-Cohen
@ 2010-05-16 23:15         ` Felipe Contreras
  2010-05-16 23:21           ` Ohad Ben-Cohen
  2010-05-16 23:25           ` Ohad Ben-Cohen
  0 siblings, 2 replies; 45+ messages in thread
From: Felipe Contreras @ 2010-05-16 23:15 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

Hi Ohad,

On Mon, May 17, 2010 at 1:35 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Fri, May 14, 2010 at 10:27 PM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>> I looked into the dma code (I guess it's in arch/arm/mm/dma-mapping.c)
>> and I don't understand what dma_unmap_sg is supposed to do.
>
> Portable code must use the dma_unmap_* APIs.
>
> As you mentioned, one of the things it's crucial for
> (but not used in our case) is bounce buffers:
> e.g. when doing a DMA from a device using a buffer
> that is out of the platform's DMA reach,
> the DMA API uses bounce buffers: data DMA'ed to that
> bounce buffer, and then the dma unmap copies it back
> to the original buffer).

Yes, but in dspbridge the most important use-case is video
encoding/decoding, so we definitely don't want bounce buffers.
Besides, from what I can see very few platforms need them.

> Another thing unmap is taking care of is speculative prefetch:
> modern ARM processors can access the buffer during DMA
> in order to speculatively prefetch data into the cache. Naturally
> this can seriously break a DMA from a device, so unmap is
> invalidating the caches to prevent this.
>
> The current dspbridge cache API is mostly relying on the
> application to do the right thing: application programmer should
> decide when to clean, flush or invalidate the caches in order
> to have a successful DMA operation.
>
> This is prone to mistakes, it's not portable, and of course
> not upstreamable.
>
> Using the standard DMA API we take the freedom from the
> application programmer - we just ask him/her to tell us before
> a DMA operation begins, and after it ends. The DMA API
> is then responsible to do the right thing.

It is a higher level of abstraction, but the application still needs
to do the right thing, and errors can still happen. Anyway, I see now
the changes in 2.6.34.
> I read at your latest posts that after rebasing to newest dspbridge
> code the issue doesn't reproduce anymore ? Please tell me if it's back.

Yes... I haven't tried in the old commit you started from + the patch
Omar suggested, but I guess there's no need for that any more. I can
play with this code now :)

-- 
Felipe Contreras


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-16 23:15         ` Felipe Contreras
@ 2010-05-16 23:21           ` Ohad Ben-Cohen
  2010-05-16 23:25           ` Ohad Ben-Cohen
  1 sibling, 0 replies; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-16 23:21 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Mon, May 17, 2010 at 2:15 AM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> Hi Ohad,
>
> On Mon, May 17, 2010 at 1:35 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>> On Fri, May 14, 2010 at 10:27 PM, Felipe Contreras
>> <felipe.contreras@gmail.com> wrote:
>>> I looked into the dma code (I guess it's in arch/arm/mm/dma-mapping.c)
>>> and I don't understand what dma_unmap_sg is supposed to do.
>>
>> Portable code must use the dma_unmap_* APIs.
>>
>> As you mentioned, one of the things it's crucial for
>> (but not used in our case) is bounce buffers:
>> e.g. when doing a DMA from a device using a buffer
>> that is out of the platform's DMA reach,
>> the DMA API uses bounce buffers: data DMA'ed to that
>> bounce buffer, and then the dma unmap copies it back
>> to the original buffer).
>
> Yes, but in dspbridge the most important use-case is video
> encoding/decoding, so we definitely don't want bouncing buffers.
> Besides, from what I can see very few platforms need them.

Sure.
I just mentioned that as one reason why portable code
that does DMA must use the unmap API.

>
>> Another thing unmap is taking care of is speculative prefetch:
>> modern ARM processors can access the buffer during DMA
>> in order to speculatively prefetch data into the cache. Naturally
>> this can seriously break a DMA from a device, so unmap is
>> invalidating the caches to prevent this.
>>
>> The current dspbridge cache API is mostly relying on the
>> application to do the right thing: application programmer should
>> decide when to clean, flush or invalidate the caches in order
>> to have a successful DMA operation.
>>
>> This is prone to mistakes, it's not portable, and of course
>> not upstreamable.
>>
>> Using the standard DMA API we take the freedom from the
>> application programmer - we just ask him/her to tell us before
>> a DMA operation begins, and after it ends. The DMA API
>> is then responsible to do the right thing.
>
> It is a higher level of abstraction, but the application still needs
> to do the right thing, and errors can still happen. Anyway, I see now
> the changes in 2.6.34.
>> I read at your latest posts that after rebasing to newest dspbridge
>> code the issue doesn't reproduce anymore ? Please tell me if it's back.
>
> Yes... I haven't tried in the old commit you started from + the patch
> Omar suggested, but I guess there's no need for that any more. I can
> play with this code now :)
>
> --
> Felipe Contreras
>


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-16 23:15         ` Felipe Contreras
  2010-05-16 23:21           ` Ohad Ben-Cohen
@ 2010-05-16 23:25           ` Ohad Ben-Cohen
  2010-05-17  0:05             ` Felipe Contreras
  1 sibling, 1 reply; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-16 23:25 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Mon, May 17, 2010 at 2:15 AM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> Hi Ohad,
>
> On Mon, May 17, 2010 at 1:35 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>> On Fri, May 14, 2010 at 10:27 PM, Felipe Contreras
>> <felipe.contreras@gmail.com> wrote:
>>> I looked into the dma code (I guess it's in arch/arm/mm/dma-mapping.c)
>>> and I don't understand what dma_unmap_sg is supposed to do.
>>
>> Portable code must use the dma_unmap_* APIs.
>>
>> As you mentioned, one of the things it's crucial for
>> (but not used in our case) is bounce buffers:
>> e.g. when doing a DMA from a device using a buffer
>> that is out of the platform's DMA reach,
>> the DMA API uses bounce buffers: data DMA'ed to that
>> bounce buffer, and then the dma unmap copies it back
>> to the original buffer).
>
> Yes, but in dspbridge the most important use-case is video
> encoding/decoding, so we definitely don't want bouncing buffers.
> Besides, from what I can see very few platforms need them.
>
>> Another thing unmap is taking care of is speculative prefetch:
>> modern ARM processors can access the buffer during DMA
>> in order to speculatively prefetch data into the cache. Naturally
>> this can seriously break a DMA from a device, so unmap is
>> invalidating the caches to prevent this.
>>
>> The current dspbridge cache API is mostly relying on the
>> application to do the right thing: application programmer should
>> decide when to clean, flush or invalidate the caches in order
>> to have a successful DMA operation.
>>
>> This is prone to mistakes, it's not portable, and of course
>> not upstreamable.
>>
>> Using the standard DMA API we take the freedom from the
>> application programmer - we just ask him/her to tell us before
>> a DMA operation begins, and after it ends. The DMA API
>> is then responsible to do the right thing.
>
> It is a higher level of abstraction, but the application still needs
> to do the right thing, and errors can still happen. Anyway, I see now
> the changes in 2.6.34.
>> I read at your latest posts that after rebasing to newest dspbridge
>> code the issue doesn't reproduce anymore ? Please tell me if it's back.
>
> Yes... I haven't tried in the old commit you started from + the patch
> Omar suggested, but I guess there's no need for that any more. I can
> play with this code now :)

Out of curiosity, what board/environment do you use to play with
the code ? I'd like to run the same use cases you do, so I can
reproduce any issue you may bump into.

Thanks,
Ohad.

>
> --
> Felipe Contreras
>


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-16 22:57         ` Ohad Ben-Cohen
@ 2010-05-16 23:51           ` Felipe Contreras
  2010-05-18  8:05             ` Ohad Ben-Cohen
  0 siblings, 1 reply; 45+ messages in thread
From: Felipe Contreras @ 2010-05-16 23:51 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

Hello,

On Mon, May 17, 2010 at 1:57 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Sun, May 16, 2010 at 8:35 PM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>> Now I understand better; arch/arm/mm/dma-mapping.c will not be used
>> unless CONFIG_DMABOUNCE=y, which is not the case for OMAP3.
>>
>> So, as you can see in arch/arm/include/asm/dma-mapping.h, dma_unmap_sg
>> is a noop.
>
> This has changed since 2.6.34, when support was added
> to ARM speculative prefetching. dma_unmap_* API is now
> responsible to invalidate caches when data was moved in from the device
> (regardless of CONFIG_DMABOUNCE).

Yes, I just realized that the implementation of dma_unmap changed. In
2.6.34 issues with speculative prefetching should be handled better
with dma_map/dma_unmap.

However, the situations the speculative prefetching fixes are supposed
to avoid are present in any kernel <= 2.6.33, right? So, sure, it
would be nice to avoid those issues, but before that, the priority is
to get dspbridge working on 2.6.34.

So, in theory the easiest thing to do is to reach the deprecated API
dmac_flush_range/dmac_inv_range somehow.

Now, to simplify the analysis I'll ignore the portability arguments
and concentrate on OMAP3, which is the only platform really supported
by dspbridge. Here OUTER_CACHE=n, so outer_*_range are no-ops.

So currently (v2.6.33), before sending a read-only buffer (TO_DEVICE),
we do dmac_flush_range (should be clean, but whatever), and before
sending a write-only (FROM_DEVICE), we do dmac_inv_range.

On v2.6.33 the same could be achieved with only dma_map_* functions,
but on v2.6.34 that's not the case.

This is what a properly implemented application using the dma_map_* and
dma_unmap_* functions would end up doing to achieve similar behavior:

in: begin BIDI:
	dma_clean_range

out: begin FROM_DEV:
	dma_clean_range

out: end FROM_DEV:
	dma_inv_range

in: end BIDI:
	dma_inv_range

From my limited experience with caches I already see a problem there,
but that's irrelevant right now. Wouldn't the old
PROC_FLUSHMEMORY/PROC_INVALIDATEMEMORY behavior be achieved by calling
the begin and end operations at the same time? Or even easier, by just
calling dmac_flush_range (which is not obsolete yet)?

I'm not saying this should be the final solution... we should
definitely go for the right API, but in the meantime, I think the old
applications that still use the old API can work just fine if we call
dmac_flush_range(). This is only for compatibility reasons.

Here I'm assuming that applications would call PROC_INVALIDATEMEMORY
*before* sending the buffer to the DSP, which is the case for both
openmax and gst-dsp.

> Dspbridge really must support this, and applications should start
> using it.
>
> Whether to deprecate the old API ? Eventually I think we should,
> but probably not anytime soon. Let's take the kernel approach of minimizing
> user space pain: keep the old API around, mark it as candidate for
> deprecation, and rethink in the future.

Now that I see the changes for speculative prefetching, yeah, I agree.
But I think we can minimize the pain even more by providing an old
compatible API that actually works well (almost as good as before).

Cheers.

-- 
Felipe Contreras


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-16 23:25           ` Ohad Ben-Cohen
@ 2010-05-17  0:05             ` Felipe Contreras
  2010-05-18  6:20               ` Ohad Ben-Cohen
  0 siblings, 1 reply; 45+ messages in thread
From: Felipe Contreras @ 2010-05-17  0:05 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Mon, May 17, 2010 at 2:25 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> Out of curiosity, what board/environment do you use to play with
> the code ? I'd like to run the same use cases you do, so I can
> reproduce any issue you may bump into.

I use a beagleboard, use the DSP firmware in L23.i3.3[1], do simple
tests with dsp-test[2], and real tests with gst-dsp[3].

I just pushed my latest changes of dsp-tools, which include an app
called dsp-test that sends buffers back and forth (like TI's
dmmcopy.out), but has options to trigger MMU faults, change buffer
sizes, and number of iterations, along with a trivial DSP socket-node.

The whole thing (app + socket-node) is 15K and you don't even need
dynreg.out to load. Also, the code is really really simple. There are
no dependencies, so you can just type 'make'.

I'm not sure if you are familiar with GStreamer, but that's pretty
much all you need to compile gst-dsp. If you have problems we have a
relatively active mailing list.

Cheers.

[1] http://www.omapedia.org/wiki/L23.i3.3_Release_Notes
[2] http://github.com/felipec/dsp-tools
[3] http://github.com/felipec/gst-dsp

-- 
Felipe Contreras


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-17  0:05             ` Felipe Contreras
@ 2010-05-18  6:20               ` Ohad Ben-Cohen
  0 siblings, 0 replies; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-18  6:20 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

Hi Felipe,

On Mon, May 17, 2010 at 3:05 AM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Mon, May 17, 2010 at 2:25 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>> Out of curiosity, what board/environment do you use to play with
>> the code ? I'd like to run the same use cases you do, so I can
>> reproduce any issue you may bump into.
>
> I use a beagleboard, use the DSP firmware in L23.i3.3[1], do simple
> tests with dsp-test[2], and real tests with gst-dsp[3].
>
> I just pushed my latest changes of dsp-tools, which include an app
> called dsp-test that sends buffers back and forth (like TI's
> dmmcopy.out), but has options to trigger MMU faults, change buffer
> sizes, and number of iterations, along with a trivial DSP socket-node.
>
> The whole thing (app + socket-node) is 15K and you don't even need
> dynreg.out to load. Also, the code is really really simple. There are
> no dependencies, so you can just type 'make'.
>
> I'm not sure if you are familiar with GStreamer, but that's pretty
> much all you need to compile gst-dsp. If you have problems we have a
> relatively active mailing list.

Thanks for the detailed answer!

I'll surely look into this.
Ohad.

>
> Cheers.
>
> [1] http://www.omapedia.org/wiki/L23.i3.3_Release_Notes
> [2] http://github.com/felipec/dsp-tools
> [3] http://github.com/felipec/gst-dsp
>
> --
> Felipe Contreras
>


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-16 23:51           ` Felipe Contreras
@ 2010-05-18  8:05             ` Ohad Ben-Cohen
  2010-05-18 11:02               ` Felipe Contreras
  0 siblings, 1 reply; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-18  8:05 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Mon, May 17, 2010 at 2:51 AM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> So currently (v2.6.33), before sending a read-only buffer (TO_DEVICE),
> we do dmac_flush_range (should be clean, but whatever) and before
> sending a write-only (FROM_DEVICE), we do dmac_inv_range.
>
> On v2.6.33 the same could be achieved with only dma_map_* functions,
> but on v2.6.34 that's not the case.

Sorry, I didn't completely follow you here, let's take a look at the
mapping code:

ENTRY(v7_dma_map_area)
	add	r1, r1, r0
	teq	r2, #DMA_FROM_DEVICE
	beq	v7_dma_inv_range
	b	v7_dma_clean_range
ENDPROC(v7_dma_map_area)

DMA_FROM_DEVICE will get the cache invalidated and then cleaned,
and DMA_BIDIRECTIONAL/DMA_TO_DEVICE will get the cache only
cleaned.

Unfortunately I don't have a setup right now to test this, but the code
seems to be ok for our needs, don't you think ?

Thanks,
Ohad.


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-18  8:05             ` Ohad Ben-Cohen
@ 2010-05-18 11:02               ` Felipe Contreras
  2010-05-18 11:14                 ` Ohad Ben-Cohen
  0 siblings, 1 reply; 45+ messages in thread
From: Felipe Contreras @ 2010-05-18 11:02 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Tue, May 18, 2010 at 11:05 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Mon, May 17, 2010 at 2:51 AM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>> So currently (v2.6.33), before sending a read-only buffer (TO_DEVICE),
>> we do dmac_flush_range (should be clean, but whatever) and before
>> sending a write-only (FROM_DEVICE), we do dmac_inv_range.
>>
>> On v2.6.33 the same could be achieved with only dma_map_* functions,
>> but on v2.6.34 that's not the case.
>
> Sorry, I didn't completely follow you here, let's take a look at the
> mapping code:
>
> ENTRY(v7_dma_map_area)
>        add     r1, r1, r0
>        teq     r2, #DMA_FROM_DEVICE
>        beq     v7_dma_inv_range
>        b       v7_dma_clean_range
> ENDPROC(v7_dma_map_area)
>
> DMA_FROM_DEVICE will get the cache invalidated and then cleaned,
> and DMA_BIDIRECTIONAL/DMA_TO_DEVICE will get the cache only
> cleaned.

No, when DMA_FROM_DEVICE, only v7_dma_inv_range is executed (see the
mov pc, lr at the end).

> Unfortunately I don't have a setup right now to test this, but the code
> seems to be ok for our needs, don't you think ?

But yeah, actually that fits our needs; calling the dma_map only,
while still wrong, will give us the same behavior we have right now
(v2.6.33).

So my table was wrong, it's actually:

in: begin BIDI:
       dma_clean_range

out: begin FROM_DEV:
       dma_inv_range

out: end FROM_DEV:
       dma_inv_range

in: end BIDI:
       dma_inv_range

Cheers.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-18 11:02               ` Felipe Contreras
@ 2010-05-18 11:14                 ` Ohad Ben-Cohen
  2010-05-18 11:43                   ` Felipe Contreras
  0 siblings, 1 reply; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-18 11:14 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Tue, May 18, 2010 at 2:02 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Tue, May 18, 2010 at 11:05 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>> On Mon, May 17, 2010 at 2:51 AM, Felipe Contreras
>> <felipe.contreras@gmail.com> wrote:
>>> So currently (v2.6.33), before sending a read-only buffer (TO_DEVICE),
>>> we do dmac_flush_range (should be clean, but whatever) and before
>>> sending a write-only (FROM_DEVICE), we do dmac_inv_range.
>>>
>>> On v2.6.33 the same could be achieved with only dma_map_* functions,
>>> but on v2.6.34 that's not the case.
>>
>> Sorry, I didn't completely follow you here, let's take a look at the
>> mapping code:
>>
>> ENTRY(v7_dma_map_area)
>>        add     r1, r1, r0
>>        teq     r2, #DMA_FROM_DEVICE
>>        beq     v7_dma_inv_range
>>        b       v7_dma_clean_range
>> ENDPROC(v7_dma_map_area)
>>
>> DMA_FROM_DEVICE will get the cache invalidated and then cleaned,
>> and DMA_BIDIRECTIONAL/DMA_TO_DEVICE will get the cache only
>> cleaned.
>
> No, when DMA_FROM_DEVICE, only v7_dma_inv_range is executed (see the
> mov pc, lr at the end).

Oh, sure. Thanks for pointing that out.

>> Unfortunately I don't have a setup right now to test this, but the code
>> seems to be ok for our needs, don't you think ?
>
> But yeah, actually that fits our needs; calling the dma_map only,
> while still wrong, will give us the same behavior we have right now
> (v2.6.33).
>

Great, so are you ok with this patchset proposal ?

I'll just add support for the VM_IO path you mentioned.

Thanks,
Ohad.

> So my table was wrong, it's actually:
>
> in: begin BIDI:
>       dma_clean_range
>
> out: begin FROM_DEV:
>       dma_inv_range
>
> out: end FROM_DEV:
>       dma_inv_range
>
> in: end BIDI:
>       dma_inv_range
>
> Cheers.
>
> --
> Felipe Contreras
>


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-18 11:14                 ` Ohad Ben-Cohen
@ 2010-05-18 11:43                   ` Felipe Contreras
  2010-05-18 11:57                     ` Ohad Ben-Cohen
  0 siblings, 1 reply; 45+ messages in thread
From: Felipe Contreras @ 2010-05-18 11:43 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Tue, May 18, 2010 at 2:14 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>>> Unfortunately I don't have a setup right now to test this, but the code
>>> seems to be ok for our needs, don't you think ?
>>
>> But yeah, actually that fits our needs; calling the dma_map only,
>> while still wrong, will give us the same behavior we have right now
>> (v2.6.33).
>
> Great, so are you ok with this patchset proposal ?

I thought you were going to add separate ioctls, one for dma_map, and
another for dma_unmap that receive direction as an argument. Then, map
the current PROC_FLUSH/INVALIDATE to those without changing their
semantics, but marking them as deprecated.

> I'll just add support for the VM_IO path you mentioned.

Cool. I actually tried your patches to render to the framebuffer, and
everything seemed to work fine. I didn't check for error codes or
anything, so I'm not sure what's going on.

-- 
Felipe Contreras


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-18 11:43                   ` Felipe Contreras
@ 2010-05-18 11:57                     ` Ohad Ben-Cohen
  2010-05-18 12:24                       ` Felipe Contreras
  0 siblings, 1 reply; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-18 11:57 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Tue, May 18, 2010 at 2:43 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Tue, May 18, 2010 at 2:14 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>>>> Unfortunately I don't have a setup right now to test this, but the code
>>>> seems to be ok for our needs, don't you think ?
>>>
>>> But yeah, actually that fits our needs; calling the dma_map only,
>>> while still wrong, will give us the same behavior we have right now
>>> (v2.6.33).
>>
>> Great, so are you ok with this patchset proposal ?
>
> I thought you were going to add separate ioctls, one for dma_map, and
> another for dma_unmap that receive direction as an argument. Then, map
> the current PROC_FLUSH/INVALIDATE to those without changing their
> semantics, but marking them as deprecated.

Yes, I am (this and a few other small things we mentioned in the
threads) - I was just asking generally about the move to the
dma_(un)map_* API.

Anyway, I'll prepare a v2 and resubmit.

>
>> I'll just add support for the VM_IO path you mentioned.
>
> Cool. I actually tried your patches to render to the framebuffer, and
> everything seemed to work fine. I didn't check for error codes or
> anything, so I'm not sure what's going on.

How is the framebuffer mmap'ed ?

Can you please tell me more about this scenario ?
(applications + drivers involved).

How do I test this scenario ? using the tools you sent me ?
Do I have to have a beagle board or will ZOOM do ?

Thanks,
Ohad.


>
> --
> Felipe Contreras
>


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-18 11:57                     ` Ohad Ben-Cohen
@ 2010-05-18 12:24                       ` Felipe Contreras
  2010-05-18 12:53                         ` Ohad Ben-Cohen
  0 siblings, 1 reply; 45+ messages in thread
From: Felipe Contreras @ 2010-05-18 12:24 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Tue, May 18, 2010 at 2:57 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Tue, May 18, 2010 at 2:43 PM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>>> I'll just add support for the VM_IO path you mentioned.
>>
>> Cool. I actually tried your patches to render to the framebuffer, and
>> everything seemed to work fine. I didn't check for error codes or
>> anything, so I'm not sure what's going on.
>
> How is the framebuffer mmap'ed ?

mmap(NULL, self->mem_info.size, PROT_WRITE, MAP_SHARED, self->overlay_fd, 0);

> Can you please tell me more about this scenario ?
> (applications + drivers involved).

I use gst-omapfb[1], and then it's very easy with a gst-launch command
(part of GStreamer):

 gst-launch filesrc location=video.avi ! avidemux ! dspvdec ! omapfbsink

> How do I test this scenario ? using the tools you sent me ?
> Do I have to have a beagle board or will ZOOM do ?

I guess you would need to modify dsp-tools to use framebuffer memory.
Or you can do what I do and use gst-dsp + gst-omapfb.

I don't have a zoom, so I haven't tried, but gst-dsp should work on
any platform, and I have tried to make gst-omapfb the same, although
YMMV.

You could also try these binaries:
http://people.freedesktop.org/~felipec/dsp/gst-omap-test.tar.bz

Put it on any system under /opt/gst, and then set PATH=/opt/gst/bin.

Cheers.

[1] http://github.com/felipec/gst-omapfb
[2] http://people.freedesktop.org/~felipec/beagle-2.6.32-rc3/

-- 
Felipe Contreras


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-18 12:24                       ` Felipe Contreras
@ 2010-05-18 12:53                         ` Ohad Ben-Cohen
  2010-05-19 16:50                           ` Felipe Contreras
  0 siblings, 1 reply; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-18 12:53 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Tue, May 18, 2010 at 3:24 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
>>> Cool. I actually tried your patches to render to the framebuffer, and
>>> everything seemed to work fine. I didn't check for error codes or
>>> anything, so I'm not sure what's going on.
>>
>> How is the framebuffer mmap'ed ?
>
> mmap(NULL, self->mem_info.size, PROT_WRITE, MAP_SHARED, self->overlay_fd, 0);
>
>> Can you please tell me more about this scenario ?
>> (applications + drivers involved).
>
> I use gst-omapfb[1], and then it's very easy with a gst-launch command

Ok, thanks. I think I know why it worked for you -
The framebuffer is mmap'ed as uncacheable on ARM (check out
fb_pgprotect in fbmem.c),
which makes the whole dspbridge cache manipulations on it redundant.
In our case the cache operations silently failed, but it didn't matter.

We can probably just return whenever a dspbridge DMA operation will be
requested on VM_IO buffers (unless there are other VM_IO dspbridge use
cases which are different).


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-18 12:53                         ` Ohad Ben-Cohen
@ 2010-05-19 16:50                           ` Felipe Contreras
  2010-05-20 21:22                             ` Ohad Ben-Cohen
  0 siblings, 1 reply; 45+ messages in thread
From: Felipe Contreras @ 2010-05-19 16:50 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Tue, May 18, 2010 at 3:53 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Tue, May 18, 2010 at 3:24 PM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>>>> Cool. I actually tried your patches to render to the framebuffer, and
>>>> everything seemed to work fine. I didn't check for error codes or
>>>> anything, so I'm not sure what's going on.
>>>
>>> How is the framebuffer mmap'ed ?
>>
>> mmap(NULL, self->mem_info.size, PROT_WRITE, MAP_SHARED, self->overlay_fd, 0);
>>
>>> Can you please tell me more about this scenario ?
>>> (applications + drivers involved).
>>
>> I use gst-omapfb[1], and then it's very easy with a gst-launch command
>
> Ok, thanks. I think I know why it worked for you -
> The framebuffer is mmap'ed as uncacheable on ARM (check out
> fb_pgprotect in fbmem.c),
> which makes the whole dspbridge cache manipulations on it redundant.
> In our case the cache operations silently failed, but it didn't matter.

Indeed, although it's not uncacheable, it's writecombine. That explains a lot.

> We can probably just return whenever a dspbridge DMA operation will be
> requested on VM_IO buffers (unless there are other VM_IO dspbridge use
> cases which are different).

I don't know what VM_IO means, but essentially we don't want to go
find each and every page when we know the memory is contiguous. Although the
fact that the memory is writecombine would help us avoid unnecessary
flushes as well :)

A few updates:
1) libomxil-ti doesn't use the flush/inv ioctls at all... all buffers
are mapped/unmapped
2) I said only 4 buffers are mapped at a time. Actually, it's more
like 8 * 3 + 1. Most of these are 4K, although only a few bytes are
used.
4) I'm already doing the changes needed in gst-dsp.

Cheers.

-- 
Felipe Contreras


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-14 19:27     ` Felipe Contreras
                         ` (2 preceding siblings ...)
  2010-05-16 22:35       ` Ohad Ben-Cohen
@ 2010-05-20 21:18       ` Ohad Ben-Cohen
  3 siblings, 0 replies; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-20 21:18 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

Hi Felipe,

On Fri, May 14, 2010 at 10:27 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> Also, I noticed an important problem. Your code assumes that we would
> always have a bunch of scattered pages,

Just a small note: the pages can be contiguous even in normal memory
allocation (not with framebuffer). That's perfectly ok and there's no
problem with that in the code.

> however you are not taking
> into account bridge_brd_mem_map() where vm_flags have VM_IO... in that
> case get_user_pages() is skipped completely. This code-path is
> important in order to mmap framebuffer memory, which is contiguous.
> So, in this case there are no pages to keep track of at all.

I gave some thought to this, and I concluded that nothing needs to be
changed in the patches for this scenario.
You see, in this case, as you said, no pages were pinned in memory
using get_user_pages.
This is perfectly fine since the MM application writes to physical
writecombine framebuffer memory which is not cached or swapped out.
If the MM application tries to begin a DMA operation on this memory,
it will immediately fail because no related mapping will be found.
This should actually be the expected behavior in such a case: there
really isn't a mapping for this buffer since there's no point in having
one for VM_IO memory, and applications should not even try to flush/inv
such memory: it's just redundant.

IOW, the support for this scenario is actually to have no support:
just return an error, which is what already happens.

Thanks,
Ohad.


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-19 16:50                           ` Felipe Contreras
@ 2010-05-20 21:22                             ` Ohad Ben-Cohen
  2010-05-20 21:23                               ` Ohad Ben-Cohen
  2010-05-21  6:14                               ` Felipe Contreras
  0 siblings, 2 replies; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-20 21:22 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

Hi Felipe,

On Wed, May 19, 2010 at 7:50 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> I don't know what VM_IO means, but essentially we don't want to go
> find each and every page when we know the memory is contiguous. Although the
> fact that the memory is writecombine would help us avoid unnecessary
> flushes as well :)

Yes, I guess you can remove those calls from the application :)

>
> A few updates:
> 1) libomxil-ti doesn't use the flush/inv ioctls at all... all buffers
> are mapped/unmapped

I'm not following libomxil-ti but I guess they count on the flushing
that happens when the buffers are mapped (as part of the
get_user_pages call).

> 2) I said only 4 buffers are mapped at a time. Actually, it's more
> like 8 * 3 + 1. Most of these are 4K, although only a few bytes are
> used.

Ok, thanks for the update. This still sounds like a small number that
doesn't require overkill data structures.

> 4) I'm already doing the changes needed in gst-dsp.

Great! I briefly looked at the patches and they seem very nice.

Thanks a lot,
Ohad.


>
> Cheers.
>
> --
> Felipe Contreras
>


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-20 21:22                             ` Ohad Ben-Cohen
@ 2010-05-20 21:23                               ` Ohad Ben-Cohen
  2010-05-21  6:14                               ` Felipe Contreras
  1 sibling, 0 replies; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-20 21:23 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Fri, May 21, 2010 at 12:22 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>> 4) I'm already doing the changes needed in gst-dsp.
>
> Great! I briefly looked at the patches and they seem very nice.

Btw I'll soon post version 2 of the DMA patches.

Thanks,
Ohad.

>
> Thanks a lot,
> Ohad.
>
>
>>
>> Cheers.
>>
>> --
>> Felipe Contreras
>>
>


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-20 21:22                             ` Ohad Ben-Cohen
  2010-05-20 21:23                               ` Ohad Ben-Cohen
@ 2010-05-21  6:14                               ` Felipe Contreras
  2010-05-21  8:22                                 ` Ohad Ben-Cohen
  1 sibling, 1 reply; 45+ messages in thread
From: Felipe Contreras @ 2010-05-21  6:14 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Fri, May 21, 2010 at 12:22 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Wed, May 19, 2010 at 7:50 PM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>> I don't know what VM_IO means, but essentially we don't want to go
>> find each and every page when we know the memory is contiguous. Although the
>> fact that the memory is writecombine would help us avoid unnecessary
>> flushes as well :)
>
> Yes, I guess you can remove those calls from the application :)

Can't. From GStreamer's point of view the decoder is independent of the
renderer, so we have no way to know that. Perhaps if the memory map
operation returned some information we could avoid those calls.

>> A few updates:
>> 1) libomxil-ti doesn't use the flush/inv ioctls at all... all buffers
>> are mapped/unmapped
>
> I'm not following libomxil-ti but I guess they count on the flushing
> that happens when the buffers are mapped (as part of the
> get_user_pages call)

Yes, initially they were doing flushes, and so was gst-dsp, but I
found that little trick (not sure if they copied the trick or found it
themselves, but I did suggest it). Unfortunately gst-dsp is a bit
smarter and doesn't do constant memory mapping for the control buffers
(very small), which means we do clean/inv. For the data buffers (big
ones) we have to rely on constant memory mapping due to GStreamer
design, so we rely on get_user_pages(), although writebacks are not
needed for output buffers, and neither inv for the input ones, but I'm
not sure how much would be the gain.

Cheers.

-- 
Felipe Contreras


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-21  6:14                               ` Felipe Contreras
@ 2010-05-21  8:22                                 ` Ohad Ben-Cohen
  2010-05-21  9:42                                   ` Felipe Contreras
  0 siblings, 1 reply; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-21  8:22 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Fri, May 21, 2010 at 9:14 AM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Fri, May 21, 2010 at 12:22 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>> On Wed, May 19, 2010 at 7:50 PM, Felipe Contreras
>> <felipe.contreras@gmail.com> wrote:
>>> I don't know what VM_IO means, but essentially we don't want to go
>>> find each and every page when we know the memory is contiguous. Although the
>>> fact that the memory is writecombine would help us avoid unnecessary
>>> flushes as well :)
>>
>> Yes, I guess you can remove those calls from the application :)
>
> Can't. From GStreamer point of view the decoder is independent of the
> renderer, so we have no way to know that.

Can you please elaborate on this a little ?

> Perhaps if the memory map
> operation returned some information we can avoid those calls.

I don't think it should; relying on the map call to tell us what type
of buffer we just gave it smells like a bad interface. After all, it
is the application who originally called mmap() to get access to that
buffer, so it already knows if it's a framebuffer or not.

Anyway it's not that important an optimization to do: the ioctl will
just fail. The negative side is that the application will have no way
to know if this is really a failure (did I just give a wrong buffer
address to dspbridge?) or not (oh, it's the framebuffer, there's no
mapping for that).

Performance-wise, it's better than the previous implementation which
would have executed redundant cache operation on the buffer.

Thanks,
Ohad.


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-21  8:22                                 ` Ohad Ben-Cohen
@ 2010-05-21  9:42                                   ` Felipe Contreras
  2010-05-24 16:19                                     ` Ohad Ben-Cohen
  0 siblings, 1 reply; 45+ messages in thread
From: Felipe Contreras @ 2010-05-21  9:42 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Fri, May 21, 2010 at 11:22 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Fri, May 21, 2010 at 9:14 AM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>> On Fri, May 21, 2010 at 12:22 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>>> On Wed, May 19, 2010 at 7:50 PM, Felipe Contreras
>>> <felipe.contreras@gmail.com> wrote:
>>>> I don't know what VM_IO means, but essentially we don't want to go
>>>> find each and page when we know the memory is contiguous. Although the
>>>> fact that the memory is writecombine would help us avoid unnecessary
>>>> flushes as well :)
>>>
>>> Yes, I guess you can remove those calls from the application :)
>>
>> Can't. From GStreamer point of view the decoder is independent of the
>> renderer, so we have no way to know that.
>
> Can you please elaborate on this a little ?

GStreamer has a plug-in architecture. So one plug-in (libgstdsp.so)
issues a request for a zero-copy buffer (gst_pad_alloc_buffer()) to
the next element in the chain. We don't know who that is, nor do we care,
and there might be elements in the middle that pass the buffer along.

The memory could be part of the framebuffer, or randomly allocated by
user-space. One element (omapxvimagesink) might always return memory
from omapfb; another one (omapfbsink) might allocate a separate
buffer if all the omapfb memory is being used currently.

In the end, the decoder element has no idea about that. Just the
GstBuffer which has a data pointer, and a size, and that's it.

There are even more complicated use-cases like video recording
directly on the framebuffer. Other pluggable MM frameworks like
OpenMAX IL would have similar problems.
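The zero-copy request described above can be sketched as follows (a hypothetical decoder-element snippet against the GStreamer 0.10 API of the time; `srcpad`, `caps` and `size` are assumed to come from the element's caps negotiation and are not from the original mails):

```c
/* Hypothetical sketch: a decoder element asks downstream for an output
 * buffer (GStreamer 0.10 API).  Whether the returned memory is omapfb
 * framebuffer memory or plain malloc'd memory is invisible here. */
#include <gst/gst.h>

static GstFlowReturn
request_output_buffer(GstPad *srcpad, GstCaps *caps, gint size,
		      GstBuffer **buf)
{
	GstFlowReturn ret;

	/* Downstream (omapxvimagesink, omapfbsink, ...) decides where
	 * the memory comes from; elements in between may just pass the
	 * request along. */
	ret = gst_pad_alloc_buffer(srcpad, GST_BUFFER_OFFSET_NONE,
				   size, caps, buf);
	if (ret != GST_FLOW_OK)
		return ret;

	/* All the decoder ever sees: GST_BUFFER_DATA(*buf) and
	 * GST_BUFFER_SIZE(*buf). */
	return GST_FLOW_OK;
}
```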

>> Perhaps if the memory map
>> operation returned some information we can avoid those calls.
>
> I don't think it should; relying on the map call to tell us what type
> of buffer we just gave it smells like a bad interface. After all, it
> is the application that originally called mmap() to get access to that
> buffer, so it already knows whether it's a framebuffer or not.

Well, requesting DMA operations from user-space is already a bad
interface. It's the driver's job to do that.

Due to the driver's bad design we'll have to do them, but if there's
a way to know whether such operations are redundant, it would have
to be returned through the map call; otherwise we'll be wasting ioctl
calls.

> Anyway, it's not that important an optimization: the ioctl will
> just fail. The downside is that the application will have no way
> to know whether this is a real failure (did I just give a wrong buffer
> address to dspbridge?) or not (oh, it's the framebuffer, there's no
> mapping for that).

Exactly; we will not know when operations fail. But also, ioctl()
calls would be wasted.

> Performance-wise, it's better than the previous implementation which
> would have executed redundant cache operation on the buffer.

Indeed.

Although I'm still worried that VM_IO != write-combine, I guess
it's a safe assumption to make for now.

Cheers.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-21  9:42                                   ` Felipe Contreras
@ 2010-05-24 16:19                                     ` Ohad Ben-Cohen
  2010-05-24 19:21                                       ` Felipe Contreras
  0 siblings, 1 reply; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-24 16:19 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Fri, May 21, 2010 at 12:42 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> Although I'm still worried that VM_IO != write-combine, I guess
> it's a safe assumption to make for now.

A small update we missed:

The original code is ignoring VM_IO buffers as well:

static int memory_sync_vma(unsigned long start, u32 len,
			   enum dsp_flushtype ftype)
{
...
		if (vma->vm_flags & (VM_IO | VM_PFNMAP))
			return -EINVAL;
...
}

So the new code is actually behaving in exactly the same manner
(besides the error code, which is now -EFAULT and not -EINVAL).


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-24 16:19                                     ` Ohad Ben-Cohen
@ 2010-05-24 19:21                                       ` Felipe Contreras
  2010-05-24 19:49                                         ` Ohad Ben-Cohen
  0 siblings, 1 reply; 45+ messages in thread
From: Felipe Contreras @ 2010-05-24 19:21 UTC (permalink / raw)
  To: Ohad Ben-Cohen
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Mon, May 24, 2010 at 7:19 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Fri, May 21, 2010 at 12:42 PM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>> Although I'm still worried that VM_IO != write-combine, I guess
>> it's a safe assumption to make for now.
>
> small update we missed:
>
> The original code is ignoring VM_IO buffers as well:
>
> static int memory_sync_vma(unsigned long start, u32 len,
>                           enum dsp_flushtype ftype)
> {
> ...
>                if (vma->vm_flags & (VM_IO | VM_PFNMAP))
>                        return -EINVAL;
> ...
> }
>
> So the new code is actually behaving in exactly the same manner
> (besides the error code, which is now -EFAULT and not -EINVAL).

That's what I said in my original comment:
> Reading your patches I see the ioctl to start the dma operations would
> simply error out, but from the looks of it, they would be failing already,
> which is weird, because we are already using framebuffer memory for video
> decoding + rendering.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues
  2010-05-24 19:21                                       ` Felipe Contreras
@ 2010-05-24 19:49                                         ` Ohad Ben-Cohen
  0 siblings, 0 replies; 45+ messages in thread
From: Ohad Ben-Cohen @ 2010-05-24 19:49 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: linux-omap, Kanigeri Hari, Omar Ramirez Luna,
	Guzman Lugo Fernando, Menon Nishanth, Hiroshi Doyu

On Mon, May 24, 2010 at 10:21 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Mon, May 24, 2010 at 7:19 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>> On Fri, May 21, 2010 at 12:42 PM, Felipe Contreras
>> <felipe.contreras@gmail.com> wrote:
>>> Although I'm still worried that VM_IO != write-combine, I guess
>>> it's a safe assumption to make for now.
>>
>> small update we missed:
>>
>> The original code is ignoring VM_IO buffers as well:
>>
>> static int memory_sync_vma(unsigned long start, u32 len,
>>                           enum dsp_flushtype ftype)
>> {
>> ...
>>                if (vma->vm_flags & (VM_IO | VM_PFNMAP))
>>                        return -EINVAL;
>> ...
>> }
>>
>> So the new code is actually behaving in exactly the same manner
>> (besides the error code, which is now -EFAULT and not -EINVAL).
>
> That's what I said in my original comment:

Oh, I missed that one, and thought you meant this was new behavior.

One thing I can think of to make this dspbridge VM_IO assumption
stronger is this:

diff --git a/drivers/dsp/bridge/core/tiomap3430.c
b/drivers/dsp/bridge/core/tiomap3430.c
index e122f04..7030012 100644
--- a/drivers/dsp/bridge/core/tiomap3430.c
+++ b/drivers/dsp/bridge/core/tiomap3430.c
@@ -1326,7 +1326,7 @@ static dsp_status bridge_brd_mem_map(struct
bridge_dev_context *hDevContext,
                goto func_cont;
        }

-       if (vma->vm_flags & VM_IO) {
+       if (vma->vm_flags & (VM_IO | VM_PFNMAP)) {
                num_usr_pgs = ul_num_bytes / PG_SIZE4K;
                mpu_addr = ul_mpu_addr;

This is exactly what __get_user_pages is checking for as well, and it
applies to our framebuffer use case.

Anyway, I wonder whether we have dspbridge scenarios in which vm_flags
& VM_IO is true, but vm_flags & VM_PFNMAP is false.

Obviously this whole issue has nothing to do with the current mem/cache
API patches.

Thanks,
Ohad.

>> Reading your patches I see the ioctl to start the dma operations would
>> simply error out, but from the looks of it, they would be failing already,
>> which is weird, because we are already using framebuffer memory for video
>> decoding + rendering.
>
> --
> Felipe Contreras


end of thread, other threads:[~2010-05-24 19:49 UTC | newest]

Thread overview: 45+ messages
2010-05-01 20:44 [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues Ohad Ben-Cohen
2010-05-01 20:44 ` [RFC/PATCH 1/6] DSPBRIDGE: add memory_map_info to PROC Ohad Ben-Cohen
2010-05-01 20:44 ` [RFC/PATCH 2/6] DSPBRIDGE: remember mapping and page info in proc_map Ohad Ben-Cohen
2010-05-15  8:34   ` Felipe Contreras
2010-05-16 23:00     ` Ohad Ben-Cohen
2010-05-01 20:44 ` [RFC/PATCH 3/6] DSPBRIDGE: remove mapping information in proc_unmap Ohad Ben-Cohen
2010-05-15  8:38   ` Felipe Contreras
2010-05-16 23:02     ` Ohad Ben-Cohen
2010-05-01 20:44 ` [RFC/PATCH 4/6] DSPBRIDGE: do not call follow_page Ohad Ben-Cohen
2010-05-01 20:44 ` [RFC/PATCH 5/6] DSPBRIDGE: do not use low level cache manipulation API Ohad Ben-Cohen
2010-05-02 11:56   ` Ohad Ben-Cohen
2010-05-01 20:44 ` [RFC/PATCH 6/6] DSPBRIDGE: add dspbridge API to mark end of DMA Ohad Ben-Cohen
2010-05-02 13:17 ` [RFC/PATCH 0/6] DSPBRIDGE: fix mem+cache API issues Felipe Contreras
2010-05-02 17:47   ` Ohad Ben-Cohen
2010-05-14 19:27     ` Felipe Contreras
2010-05-14 19:49       ` Omar Ramirez Luna
2010-05-15  8:26         ` Felipe Contreras
2010-05-15  9:08           ` Felipe Contreras
2010-05-16 16:08             ` Felipe Contreras
2010-05-16 17:35       ` Felipe Contreras
2010-05-16 22:57         ` Ohad Ben-Cohen
2010-05-16 23:51           ` Felipe Contreras
2010-05-18  8:05             ` Ohad Ben-Cohen
2010-05-18 11:02               ` Felipe Contreras
2010-05-18 11:14                 ` Ohad Ben-Cohen
2010-05-18 11:43                   ` Felipe Contreras
2010-05-18 11:57                     ` Ohad Ben-Cohen
2010-05-18 12:24                       ` Felipe Contreras
2010-05-18 12:53                         ` Ohad Ben-Cohen
2010-05-19 16:50                           ` Felipe Contreras
2010-05-20 21:22                             ` Ohad Ben-Cohen
2010-05-20 21:23                               ` Ohad Ben-Cohen
2010-05-21  6:14                               ` Felipe Contreras
2010-05-21  8:22                                 ` Ohad Ben-Cohen
2010-05-21  9:42                                   ` Felipe Contreras
2010-05-24 16:19                                     ` Ohad Ben-Cohen
2010-05-24 19:21                                       ` Felipe Contreras
2010-05-24 19:49                                         ` Ohad Ben-Cohen
2010-05-16 22:35       ` Ohad Ben-Cohen
2010-05-16 23:15         ` Felipe Contreras
2010-05-16 23:21           ` Ohad Ben-Cohen
2010-05-16 23:25           ` Ohad Ben-Cohen
2010-05-17  0:05             ` Felipe Contreras
2010-05-18  6:20               ` Ohad Ben-Cohen
2010-05-20 21:18       ` Ohad Ben-Cohen
