[PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after DSP

public inbox for linux-omap@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT
@ 2010-04-13 16:46 Deepak Chitriki
  2010-04-13 16:54 ` Deepak Chitriki
  0 siblings, 1 reply; 12+ messages in thread
From: Deepak Chitriki @ 2010-04-13 16:46 UTC (permalink / raw)
  To: linux-omap; +Cc: Deepak Chitriki

kmalloc() does not guarantee page aligned memory always,hence
resulting in virtual addresses not getting aligned to page boundary.
This patch replaces kmalloc() with __get_free_pages() which
allocates kernel memory in terms of PAGES fixing the Kernel
memory corruption after DSP_MMUFAULT.

Signed-off-by: Deepak Chitriki <deepak.chitriki@ti.com>
---
 drivers/dsp/bridge/wmd/ue_deh.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/dsp/bridge/wmd/ue_deh.c b/drivers/dsp/bridge/wmd/ue_deh.c
index 14dd8ae..7ed5f60 100644
--- a/drivers/dsp/bridge/wmd/ue_deh.c
+++ b/drivers/dsp/bridge/wmd/ue_deh.c
@@ -239,7 +239,8 @@ void bridge_deh_notify(struct deh_mgr *hdeh_mgr, u32 ulEventMask, u32 dwErrInfo)
 			       "bridge_deh_notify: DSP_MMUFAULT, fault "
 			       "address = 0x%x\n", (unsigned int)fault_addr);
 			dummy_va_addr =
-			    (u32) mem_calloc(sizeof(char) * 0x1000, MEM_PAGED);
+			    (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO,
+						     0);
 			mem_physical =
 			    VIRT_TO_PHYS(PG_ALIGN_LOW
 					 ((u32) dummy_va_addr, PG_SIZE4K));
@@ -338,6 +339,6 @@ dsp_status bridge_deh_get_info(struct deh_mgr *hdeh_mgr,
  */
 void bridge_deh_release_dummy_mem(void)
 {
-	kfree((void *)dummy_va_addr);
+	free_pages((void *)dummy_va_addr, 0);
 	dummy_va_addr = 0;
 }
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT
  2010-04-13 16:46 [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT Deepak Chitriki
@ 2010-04-13 16:54 ` Deepak Chitriki
  0 siblings, 0 replies; 12+ messages in thread
From: Deepak Chitriki @ 2010-04-13 16:54 UTC (permalink / raw)
  To: linux-omap

Please ignore this patch.

Thanks,
Deepak

Chitriki Rudramuni, Deepak wrote:
> kmalloc() does not guarantee page aligned memory always,hence
> resulting in virtual addresses not getting aligned to page boundary.
> This patch replaces kmalloc() with __get_free_pages() which
> allocates kernel memory in terms of PAGES fixing the Kernel
> memory corruption after DSP_MMUFAULT.
>
> Signed-off-by: Deepak Chitriki <deepak.chitriki@ti.com>
> ---
>  drivers/dsp/bridge/wmd/ue_deh.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dsp/bridge/wmd/ue_deh.c b/drivers/dsp/bridge/wmd/ue_deh.c
> index 14dd8ae..7ed5f60 100644
> --- a/drivers/dsp/bridge/wmd/ue_deh.c
> +++ b/drivers/dsp/bridge/wmd/ue_deh.c
> @@ -239,7 +239,8 @@ void bridge_deh_notify(struct deh_mgr *hdeh_mgr, u32 ulEventMask, u32 dwErrInfo)
>  			       "bridge_deh_notify: DSP_MMUFAULT, fault "
>  			       "address = 0x%x\n", (unsigned int)fault_addr);
>  			dummy_va_addr =
> -			    (u32) mem_calloc(sizeof(char) * 0x1000, MEM_PAGED);
> +			    (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO,
> +						     0);
>  			mem_physical =
>  			    VIRT_TO_PHYS(PG_ALIGN_LOW
>  					 ((u32) dummy_va_addr, PG_SIZE4K));
> @@ -338,6 +339,6 @@ dsp_status bridge_deh_get_info(struct deh_mgr *hdeh_mgr,
>   */
>  void bridge_deh_release_dummy_mem(void)
>  {
> -	kfree((void *)dummy_va_addr);
> +	free_pages((void *)dummy_va_addr, 0);
>  	dummy_va_addr = 0;
>  }
>   


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT
@ 2010-04-13 16:55 Deepak Chitriki
  2010-04-19 18:25 ` Guzman Lugo, Fernando
  0 siblings, 1 reply; 12+ messages in thread
From: Deepak Chitriki @ 2010-04-13 16:55 UTC (permalink / raw)
  To: linux-omap
  Cc: Deepak Chitriki, Ameya Palande, Felipe Contreras, Hiroshi Doyu,
	Omar Ramirez Luna, Nishanth Menon

kmalloc() does not guarantee page aligned memory always,hence
resulting in virtual addresses not getting aligned to page boundary.
This patch replaces kmalloc() with __get_free_pages() which
allocates kernel memory in terms of PAGES fixing the Kernel
memory corruption after DSP_MMUFAULT.

Cc: Ameya Palande <ameya.palande@nokia.com>
Cc: Felipe Contreras <felipe.contreras@nokia.com>
Cc: Hiroshi Doyu <hiroshi.doyu@nokia.com>
Cc: Omar Ramirez Luna <omar.ramirez@ti.com>
Cc: Nishanth Menon <nm@ti.com>

Signed-off-by: Deepak Chitriki <deepak.chitriki@ti.com>
---
 drivers/dsp/bridge/wmd/ue_deh.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/dsp/bridge/wmd/ue_deh.c b/drivers/dsp/bridge/wmd/ue_deh.c
index 14dd8ae..7ed5f60 100644
--- a/drivers/dsp/bridge/wmd/ue_deh.c
+++ b/drivers/dsp/bridge/wmd/ue_deh.c
@@ -239,7 +239,8 @@ void bridge_deh_notify(struct deh_mgr *hdeh_mgr, u32 ulEventMask, u32 dwErrInfo)
 			       "bridge_deh_notify: DSP_MMUFAULT, fault "
 			       "address = 0x%x\n", (unsigned int)fault_addr);
 			dummy_va_addr =
-			    (u32) mem_calloc(sizeof(char) * 0x1000, MEM_PAGED);
+			    (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO,
+						     0);
 			mem_physical =
 			    VIRT_TO_PHYS(PG_ALIGN_LOW
 					 ((u32) dummy_va_addr, PG_SIZE4K));
@@ -338,6 +339,6 @@ dsp_status bridge_deh_get_info(struct deh_mgr *hdeh_mgr,
  */
 void bridge_deh_release_dummy_mem(void)
 {
-	kfree((void *)dummy_va_addr);
+	free_pages((void *)dummy_va_addr, 0);
 	dummy_va_addr = 0;
 }
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* RE: [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT
  2010-04-13 16:55 Deepak Chitriki
@ 2010-04-19 18:25 ` Guzman Lugo, Fernando
  2010-05-12 19:39   ` Felipe Contreras
  0 siblings, 1 reply; 12+ messages in thread
From: Guzman Lugo, Fernando @ 2010-04-19 18:25 UTC (permalink / raw)
  To: Chitriki Rudramuni, Deepak, linux-omap
  Cc: Ameya Palande, Felipe Contreras, Hiroshi Doyu, Ramirez Luna, Omar,
	Menon, Nishanth



Hi all,

	I have found the really issue here:


The problem here is that after MMUFault the DSP is allowed to continue executing until here revices the message informing about the MMUFault and this problem since the patches for mailbox migration.


Previous code:

			if (DSP_SUCCEEDED(status)) {
				hwStatus = HW_MMU_TLBAdd(resources.dwDmmuBase,
					memPhysical, faultAddr,
					HW_PAGE_SIZE_4KB, 1, &mapAttrs,
					HW_SET, HW_SET);

<<<we add the dummy entry in the TBL, so that MMU module can translate the address, we always map pages so it does not matter if we pass the complete addrees (page + offset) or only the page aligned addres (page) we will write only the page.>>>

			}
			/* send an interrupt to DSP */
			HW_MBOX_MsgWrite(resources.dwMboxBase, MBOX_ARM2DSP,
					 MBX_DEH_CLASS | MBX_DEH_EMMU);

<<<we send a mailbox message to the DSP to inform it about MMUFault, this function write the message into mailbox and trigger mailbox interrupt in the DSP side.>>>
			/* Clear MMU interrupt */
			HW_MMU_EventAck(resources.dwDmmuBase,
					 HW_MMU_TRANSLATION_FAULT);

<<<we acked the MMU faul interrupt (transition fault interrupt). After MMUFault MMU module stops DSP execution until the MMUfault flag is acked and it can find the physical address of the virtual address requested by the DSP. So in this moment the DSP continue executing again but before it can use the address translated it had to attend mailbox interrupt (hardware interrupt) so it change context to mailbox ISR and the DSP is stuck in infinite while loop.>>>



However after mailbox migration patches the code looks like:

			if (DSP_SUCCEEDED(status)) {
				hw_status_obj =
				    hw_mmu_tlb_add(resources.dw_dmmu_base,
						   mem_physical, fault_addr,
						   HW_PAGE_SIZE4KB, 1,
						   &map_attrs, HW_SET, HW_SET);
			}
			/* send an interrupt to DSP */
			omap_mbox_msg_send(dev_context->mbox,
						MBX_DEH_CLASS | MBX_DEH_EMMU);

<<<the code looks pretty similar, however there is a difference inside  omap_mbox_msg_send function, this function does not write directly the mailbox register to put the new messages, instead of schedule a workqueue that will the in charge of doing that job>>>

			/* Clear MMU interrupt */
			hw_mmu_event_ack(resources.dw_dmmu_base,
					 HW_MMU_TRANSLATION_FAULT);

<<<So after we ack the MMU fault event the MMU lets DSP to continue executing, like the mailbox interrupt was not trigger in this moment (because of the latency of the workque) and if the fault address was being used in the DSP to write, it can corrupt memory.>>>


The patch send to linux-omap list (DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT) is just hidden the problem. Because in case the MPU had a lot of work the workqueue execution will be delay even more and the DSP side could reach the limit of the dummy page allocated and corrupt memory, or write memory in a downward way and corrupt preview memory maybe already map but not allowed to DSP write the entry page.


Also the way we are using the dummymemory to allow DSP write/read from that is not correct. Because the offset of the dummymemory and the offset of the DSP fault address should be match.

These values are taken from nokia logs:

Fault address: 0x21fa0040
dmm_va_addr: 0xdf16d140
mem_physical: 0x9f16d000

The address returned by kmalloc is 0xccbd2080, so we can write I this buffer from 0xdf16d140 until the end of the page and in physical memory from 0x9f16d140 until the end of the page. And in the DSP we map 0x9f16d000 <=> 0x21fa0000 and when it tries to write into 0x21fa0040 it is actually writing to 0x9f16d040 corrupting the memory. But in the previous code we did not allowed to the DSP do anything more after the MMU fault, that why we did not see that problem before.

The patch "DSPBRIDGE: MMU-Fault debugging enhancements" already sent to linux-omap list fix this problem indirectly. Now the way to inform about the MMUFault is not using a mailbox message, instead of we the GTP8 overflow interrupt.


				omap_dm_timer_set_load_start(timer, 0,
								0xfffffffe);
			<<<we set timer counter almost to overflue >>>


				/* Wait 80us for timer to overflow */
				udelay(80);

				/*
				 * Check interrupt status and
				 * wait for interrupt
				 */
				cnt = 0;
				while (!(omap_dm_timer_read_status(timer) &
					GPTIMER_IRQ_OVERFLOW)) {
					if (cnt++ >=
						GPTIMER_IRQ_WAIT_MAX_CNT) {
						pr_err("%s: GPTimer interrupt"
							" failed\n", __func__);
						break;
					}
				}
		<<<we wait until interrupt is trigger>>>>

			}
			hw_mmu_event_ack(resources->dw_dmmu_base,
					 HW_MMU_TRANSLATION_FAULT);
<<<DSP can continue in this point, but how the GTP8 interrupt was already trigger it change the context to the GTP8 ISR and it dumps DSP stack and then stuck in the infinite while loop>>>

			dump_dsp_stack(deh_mgr_obj->hwmd_context);
			omap_dm_timer_disable(timer);



I could reproduce the issue doing some change in the top of "DSPBRIDGE: MMU-Fault debugging enhancements":

			temp1 = kmalloc(0x100000, GFP_ATOMIC);
			temp2 = kmalloc(0x1000, GFP_ATOMIC);
			kfree(temp1);
			kfree(temp2);
	<<<doing some allocations and frees to fill slap with poison and redzone pattern>>>
			dummy_va_addr = (u32) kmalloc(0x1000, GFP_ATOMIC);

...
			/* Clear MMU interrupt */
			hw_mmu_event_ack(resources->dw_dmmu_base,
					 HW_MMU_TRANSLATION_FAULT);
	<<<Acked MMU fault flag, so that DSP can continue executing, before generate GTP8 interrupt>>>
			/*
			 * Send a GP Timer interrupt to DSP
			 * The DSP expects a GP timer interrupt after an
			 * MMU-Fault Request GPTimer
			 */
			if (timer) {


And this is what I get:

BUG kmalloc-64: Redzone overwritten
-----------------------------------------------------------------------------

INFO: 0xccbeea40-0xccbeea43. First byte 0x0 instead of 0xbb
INFO: Allocated in 0xe3510001 age=3858725493 cpu=2583691266 pid=-481230846
INFO: Freed in 0xea000007 age=473713215 cpu=3785367565 pid=-473809537
INFO: Slab 0xc0706d78 objects=32 used=31 fp=0xccbeea00 flags=0x00c2
INFO: Object 0xccbeea00 @offset=2560 fp=0x0a00000d

Bytes b4 0xccbee9f0:  0b 00 00 ea 00 00 a0 e3 0d 00 00 ea 0d 20 a0 e1 ...ê...ã..
.ê...á
  Object 0xccbeea00:  7f 3d c2 e3 3f 30 c3 e3 0c 10 a0 e1 04 20 8d e2 .=Âã?0Ãã..
.á...â
  Object 0xccbeea10:  04 30 93 e5 01 c1 d3 e3 20 30 a0 13 d0 30 a0 03 .0.å.ÁÓã.0 ..Ð0..
  Object 0xccbeea20:  fe ff ff eb 00 00 50 e3 04 30 9d 15 00 30 86 15 þÿÿë..Pã.0 ...0..
  Object 0xccbeea30:  00 00 00 1a 00 00 86 e5 7c 80 bd e8 08 00 00 00 .......å|.
½è....
 Redzone 0xccbeea40:  00 00 50 e3                                     ..Pã

 Padding 0xccbeea68:  04 30 93 e5 01 21 d3 e3 20 10 a0 13 d0 10 a0 03 .0.å.!Óã..
..Ð...
 Padding 0xccbeea78:  fe ff ff ea fe ff ff ea

I am getting the Redzone overwritten instead of Poison overwritten because was the start of the slab which was corrupted.


Keeping the code as before just changing the MMU fault ack after generating GTP8 interrupt is trigger the issue is not seen.

				while (!(omap_dm_timer_read_status(timer) &
					GPTIMER_IRQ_OVERFLOW)) {
					if (cnt++ >=
						GPTIMER_IRQ_WAIT_MAX_CNT) {
						pr_err("%s: GPTimer interrupt"
							" failed\n", __func__);
						break;
					}
				}
		<<< wait until GTP8 interrupt is generated>>>
			}
			hw_mmu_event_ack(resources->dw_dmmu_base,
					 HW_MMU_TRANSLATION_FAULT);


Even if I pass an address already freed to the tlb just to make sure that the DSP is not able to write to that address after MMUFault the issue is not seen:

			temp1 = kmalloc(0x100000, GFP_ATOMIC);
			temp2 = kmalloc(0x1000, GFP_ATOMIC);
			kfree(temp1);
			kfree(temp2);
...
				    hw_mmu_tlb_add(resources->dw_dmmu_base,
						   temp2, fault_addr,
			<<<using temp2 address, which is already free>>>

						   HW_PAGE_SIZE4KB, 1,
						   &map_attrs, HW_SET, HW_SET);

The issue is not even seen, the conclusion of this test is: we can pass a really "dummy" address (any address) to fill up the TLB, DSP is actually not using that, therefore we don't need even allocate memory for dummy_va_addr, I can even used NULL and there is not problem.



To sum up:

- "DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT" is only hidden the problem, we don't need aligned memory in this point, that patch should be removed if it is already apply.

- There is no need to create a patch for the issue because it is already indirectly fix with "DSPBRIDGE: MMU-Fault debugging enhancements".

- we don't need allocate memory for dummy_va_addr, if some patch should be created should be the patch to remove dummy_va_addr allocation and deletion.

Regards,
Fernando.


>-----Original Message-----
>From: linux-omap-owner@vger.kernel.org [mailto:linux-omap-
>owner@vger.kernel.org] On Behalf Of Chitriki Rudramuni, Deepak
>Sent: Tuesday, April 13, 2010 11:55 AM
>To: linux-omap
>Cc: Chitriki Rudramuni, Deepak; Ameya Palande; Felipe Contreras; Hiroshi
>Doyu; Ramirez Luna, Omar; Menon, Nishanth
>Subject: [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after
>DSP_MMUFAULT
>
>kmalloc() does not guarantee page aligned memory always,hence
>resulting in virtual addresses not getting aligned to page boundary.
>This patch replaces kmalloc() with __get_free_pages() which
>allocates kernel memory in terms of PAGES fixing the Kernel
>memory corruption after DSP_MMUFAULT.
>
>Cc: Ameya Palande <ameya.palande@nokia.com>
>Cc: Felipe Contreras <felipe.contreras@nokia.com>
>Cc: Hiroshi Doyu <hiroshi.doyu@nokia.com>
>Cc: Omar Ramirez Luna <omar.ramirez@ti.com>
>Cc: Nishanth Menon <nm@ti.com>
>
>Signed-off-by: Deepak Chitriki <deepak.chitriki@ti.com>
>---
> drivers/dsp/bridge/wmd/ue_deh.c |    5 +++--
> 1 files changed, 3 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/dsp/bridge/wmd/ue_deh.c
>b/drivers/dsp/bridge/wmd/ue_deh.c
>index 14dd8ae..7ed5f60 100644
>--- a/drivers/dsp/bridge/wmd/ue_deh.c
>+++ b/drivers/dsp/bridge/wmd/ue_deh.c
>@@ -239,7 +239,8 @@ void bridge_deh_notify(struct deh_mgr *hdeh_mgr, u32
>ulEventMask, u32 dwErrInfo)
> 			       "bridge_deh_notify: DSP_MMUFAULT, fault "
> 			       "address = 0x%x\n", (unsigned int)fault_addr);
> 			dummy_va_addr =
>-			    (u32) mem_calloc(sizeof(char) * 0x1000, MEM_PAGED);
>+			    (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO,
>+						     0);
> 			mem_physical =
> 			    VIRT_TO_PHYS(PG_ALIGN_LOW
> 					 ((u32) dummy_va_addr, PG_SIZE4K));
>@@ -338,6 +339,6 @@ dsp_status bridge_deh_get_info(struct deh_mgr
>*hdeh_mgr,
>  */
> void bridge_deh_release_dummy_mem(void)
> {
>-	kfree((void *)dummy_va_addr);
>+	free_pages((void *)dummy_va_addr, 0);
> 	dummy_va_addr = 0;
> }
>--
>1.6.3.3
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-omap" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT
  2010-04-19 18:25 ` Guzman Lugo, Fernando
@ 2010-05-12 19:39   ` Felipe Contreras
  2010-05-12 21:09     ` Guzman Lugo, Fernando
  0 siblings, 1 reply; 12+ messages in thread
From: Felipe Contreras @ 2010-05-12 19:39 UTC (permalink / raw)
  To: Guzman Lugo, Fernando
  Cc: Chitriki Rudramuni, Deepak, linux-omap, Ameya Palande,
	Felipe Contreras, Hiroshi Doyu, Ramirez Luna, Omar,
	Menon, Nishanth

Hi,

I didn't touch this issue in the hopes that it would be fixed, but
seems it hasn't.

On Mon, Apr 19, 2010 at 9:25 PM, Guzman Lugo, Fernando <x0095840@ti.com> wrote:
> To sum up:
>
> - "DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT" is only hidden the problem, we don't need aligned memory in this point, that patch should be removed if it is already apply.
>
> - There is no need to create a patch for the issue because it is already indirectly fix with "DSPBRIDGE: MMU-Fault debugging enhancements".

If you are referring to this patch:
http://git.kernel.org/?p=linux/kernel/git/tmlind/linux-omap-2.6.git;a=commit;h=26ad62f03578a12e942d8bb86d0e52ef1afdee22

I tried to backport it to minimize the changes to a reproducible
test-case. I guess in the l-o branch the commit would be dd1fd0b.
Unfortunately that didn't fix the corruption. So I don't by that GPT8
theory.

> - we don't need allocate memory for dummy_va_addr, if some patch should be created should be the patch to remove dummy_va_addr allocation and deletion.

I tried that, and that actually fixes the corruption for me (passing 0
to hw_mmu_tlb_add).

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT
  2010-05-12 19:39   ` Felipe Contreras
@ 2010-05-12 21:09     ` Guzman Lugo, Fernando
  2010-05-13 11:39       ` Felipe Contreras
  0 siblings, 1 reply; 12+ messages in thread
From: Guzman Lugo, Fernando @ 2010-05-12 21:09 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: Chitriki Rudramuni, Deepak, linux-omap, Ameya Palande,
	Felipe Contreras, Hiroshi Doyu, Ramirez Luna, Omar,
	Menon, Nishanth



Hi,

> -----Original Message-----
> From: Felipe Contreras [mailto:felipe.contreras@gmail.com]
> Sent: Wednesday, May 12, 2010 2:39 PM
> To: Guzman Lugo, Fernando
> Cc: Chitriki Rudramuni, Deepak; linux-omap; Ameya Palande; Felipe
> Contreras; Hiroshi Doyu; Ramirez Luna, Omar; Menon, Nishanth
> Subject: Re: [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after
> DSP_MMUFAULT
> 
> Hi,
> 
> I didn't touch this issue in the hopes that it would be fixed, but
> seems it hasn't.
> 
> On Mon, Apr 19, 2010 at 9:25 PM, Guzman Lugo, Fernando <x0095840@ti.com>
> wrote:
> > To sum up:
> >
> > - "DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT" is
> only hidden the problem, we don't need aligned memory in this point, that
> patch should be removed if it is already apply.
> >
> > - There is no need to create a patch for the issue because it is already
> indirectly fix with "DSPBRIDGE: MMU-Fault debugging enhancements".
> 
> If you are referring to this patch:
> http://git.kernel.org/?p=linux/kernel/git/tmlind/linux-omap-
> 2.6.git;a=commit;h=26ad62f03578a12e942d8bb86d0e52ef1afdee22

Yes, that's the patch. Could you make sure that the GPT8 interrupt is generated before acking MMU fault interrupt?

> 
> I tried to backport it to minimize the changes to a reproducible
> test-case. I guess in the l-o branch the commit would be dd1fd0b.
> Unfortunately that didn't fix the corruption. So I don't by that GPT8
> theory.
> 
> > - we don't need allocate memory for dummy_va_addr, if some patch should
> be created should be the patch to remove dummy_va_addr allocation and
> deletion.
> 
> I tried that, and that actually fixes the corruption for me (passing 0
> to hw_mmu_tlb_add).

I think first page DSP side memory is never mapped to MPU side, so even if the DSP corrupts that page it does not affect MPU side. However the right solution is the one explained before: avoid DSP continues executing after MMUfault.

Regards,
Fernando.

> 
> --
> Felipe Contreras

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT
  2010-05-12 21:09     ` Guzman Lugo, Fernando
@ 2010-05-13 11:39       ` Felipe Contreras
  2010-05-13 17:29         ` Guzman Lugo, Fernando
  2010-05-14 12:08         ` Felipe Contreras
  0 siblings, 2 replies; 12+ messages in thread
From: Felipe Contreras @ 2010-05-13 11:39 UTC (permalink / raw)
  To: Guzman Lugo, Fernando
  Cc: Chitriki Rudramuni, Deepak, linux-omap, Ameya Palande,
	Felipe Contreras, Hiroshi Doyu, Ramirez Luna, Omar,
	Menon, Nishanth

On Thu, May 13, 2010 at 12:09 AM, Guzman Lugo, Fernando
<fernando.lugo@ti.com> wrote:
>> If you are referring to this patch:
>> http://git.kernel.org/?p=linux/kernel/git/tmlind/linux-omap-
>> 2.6.git;a=commit;h=26ad62f03578a12e942d8bb86d0e52ef1afdee22
>
> Yes, that's the patch. Could you make sure that the GPT8 interrupt is generated before acking MMU fault interrupt?

I'll try tomorrow when I have access to the hw.

>> I tried to backport it to minimize the changes to a reproducible
>> test-case. I guess in the l-o branch the commit would be dd1fd0b.
>> Unfortunately that didn't fix the corruption. So I don't by that GPT8
>> theory.
>>
>> > - we don't need allocate memory for dummy_va_addr, if some patch should
>> be created should be the patch to remove dummy_va_addr allocation and
>> deletion.
>>
>> I tried that, and that actually fixes the corruption for me (passing 0
>> to hw_mmu_tlb_add).
>
> I think first page DSP side memory is never mapped to MPU side, so even if the DSP corrupts that page it does not affect MPU side. However the right solution is the one explained before: avoid DSP continues executing after MMUfault.

First of all, what is the DSP supposed to do with that memory? Do we
really need to call hw_mmu_tlb_add at all?

We really, absolutely want the DSP to don't corrupt memory on ARM
side, so if we pass something, it should be full pages.

Sure, it would be nice to wait for the DSP to stop, but if for some
reason it doesn't, we need to know that the DSP doesn't have the power
to corrupt memory.

Now, I went back to commit 72110f1 and tried the patch you mentioned.
There's no GPT8 involved, and I cannot reproduce any corruption on a
beagleboard.

--- a/drivers/dsp/bridge/wmd/ue_deh.c
+++ b/drivers/dsp/bridge/wmd/ue_deh.c
@@ -193,6 +193,7 @@ void bridge_deh_notify(struct deh_mgr *hdeh_mgr,
u32 ulEventMask, u32 dwErrInfo)
                                        &resources);

        if (MEM_IS_VALID_HANDLE(deh_mgr_obj, SIGNATURE)) {
+               void *temp1, *temp2;
                printk(KERN_INFO
                       "bridge_deh_notify: ********** DEVICE EXCEPTION "
                       "**********\n");
@@ -227,8 +228,11 @@ void bridge_deh_notify(struct deh_mgr *hdeh_mgr,
u32 ulEventMask, u32 dwErrInfo)
                        printk(KERN_INFO
                               "bridge_deh_notify: DSP_MMUFAULT, fault "
                               "address = 0x%x\n", (unsigned int)fault_addr);
-                       dummy_va_addr =
-                           (u32) mem_calloc(sizeof(char) * 0x1000, MEM_PAGED);
+                       temp1 = kmalloc(0x100000, GFP_ATOMIC);
+                       temp2 = kmalloc(0x1000, GFP_ATOMIC);
+                       kfree(temp1);
+                       kfree(temp2);
+                       dummy_va_addr = (u32) kmalloc(0x1000, GFP_ATOMIC);
                        mem_physical =
                            VIRT_TO_PHYS(PG_ALIGN_LOW
                                         ((u32) dummy_va_addr, PG_SIZE4K));

Is there anything special I should do?

Also, wouldn't it be easier to trigger this by doing:

                        printk(KERN_INFO
                               "bridge_deh_notify: DSP_MMUFAULT, fault "
                               "address = 0x%x\n", (unsigned int)fault_addr);
-                       dummy_va_addr =
-                           (u32) mem_calloc(sizeof(char) * 0x1000, MEM_PAGED);
+                       temp1 = kmalloc(0x100000, GFP_ATOMIC);
+                       temp2 = kmalloc(0x1000, GFP_ATOMIC);
+                       kfree(temp1);
                        mem_physical =
                            VIRT_TO_PHYS(PG_ALIGN_LOW
-                                        ((u32) dummy_va_addr, PG_SIZE4K));
+                                        ((u32) temp2, PG_SIZE4K));
+                       kfree(temp2);
                        dev_context = (struct wmd_dev_context *)
                            deh_mgr_obj->hwmd_context;
                        /* Reset the dynamic mmu index to fixed count if it

Cheers.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT
  2010-05-13 11:39       ` Felipe Contreras
@ 2010-05-13 17:29         ` Guzman Lugo, Fernando
  2010-05-13 18:29           ` Felipe Contreras
  2010-05-14 12:08         ` Felipe Contreras
  1 sibling, 1 reply; 12+ messages in thread
From: Guzman Lugo, Fernando @ 2010-05-13 17:29 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: Chitriki Rudramuni, Deepak, linux-omap, Ameya Palande,
	Felipe Contreras, Hiroshi Doyu, Ramirez Luna, Omar,
	Menon, Nishanth



> -----Original Message-----
> From: Felipe Contreras [mailto:felipe.contreras@gmail.com]
> Sent: Thursday, May 13, 2010 6:39 AM
> To: Guzman Lugo, Fernando
> Cc: Chitriki Rudramuni, Deepak; linux-omap; Ameya Palande; Felipe
> Contreras; Hiroshi Doyu; Ramirez Luna, Omar; Menon, Nishanth
> Subject: Re: [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after
> DSP_MMUFAULT
> 
> On Thu, May 13, 2010 at 12:09 AM, Guzman Lugo, Fernando
> <fernando.lugo@ti.com> wrote:
> >> If you are referring to this patch:
> >> http://git.kernel.org/?p=linux/kernel/git/tmlind/linux-omap-
> >> 2.6.git;a=commit;h=26ad62f03578a12e942d8bb86d0e52ef1afdee22
> >
> > Yes, that's the patch. Could you make sure that the GPT8 interrupt is
> generated before acking MMU fault interrupt?
> 
> I'll try tomorrow when I have access to the hw.
> 
> >> I tried to backport it to minimize the changes to a reproducible
> >> test-case. I guess in the l-o branch the commit would be dd1fd0b.
> >> Unfortunately that didn't fix the corruption. So I don't by that GPT8
> >> theory.
> >>
> >> > - we don't need allocate memory for dummy_va_addr, if some patch
> should
> >> be created should be the patch to remove dummy_va_addr allocation and
> >> deletion.
> >>
> >> I tried that, and that actually fixes the corruption for me (passing 0
> >> to hw_mmu_tlb_add).
> >
> > I think first page DSP side memory is never mapped to MPU side, so even
> if the DSP corrupts that page it does not affect MPU side. However the
> right solution is the one explained before: avoid DSP continues executing
> after MMUfault.
> 
> First of all, what is the DSP supposed to do with that memory? Do we
> really need to call hw_mmu_tlb_add at all?

Once DSP MMUfault happens iva mmu module prevents DSP continue executing until mmu module is able get some physical address for the virtual address that the dsp wanted to access. Once mmu fault interrupt is acked the mmu module tries to translate the virtual address again and if it gets the physical address DSP continue executing. So in order to DSP can dumps its stack we need to map some physical address to that virtual address, so that mmu release DSP and it can dumps the stack. Therefore we allocate some dummy buffer of one 4K page and get the physical address of that buffer and use that physical address to fill the tbl on the mmu module using hw_mmu_tlb_add function.

However the address returned by kmalloc is not page aling, that's means this mpu virtual address has some offset, for examples in the log that were send the dummy address had an offset of 0x080 and the DSP side virtual memory had an offset of 0x040. base on the offset of the MPU side and as we allocate one page that means we can access from 0x080 - 0xfff of the first page and from 0x000 - 0x080 if the second page, but we always allocate the first page to the DSP side, then DSP access to the address it wanted to access and now there is no mmufault but it is accessing (actually writing because reading not cause corruption) to that page but with a offset of 0x040 causing the corruption.

Using get_user_pages fixes that case because as it returns address page aligned the DSP side can access from 0x000 - 0xfff of that page.

However this is not the right solution because lets suppose if DSP side virtual address offset is 0xfff. So we map a page and DSP can access that page from 0x000 - 0xfff, however is the DSP is able to continue executing it will reach the following page and maybe that page is already mapped but it only can access from an specific offset like for example 0x100, in this ca DSP will still corrupt from 0x000 to 0x0ff of the next page.

Let me recheck the changes I and will let you my findings.

Regards,
Fernando.


> 
> We really, absolutely want the DSP to don't corrupt memory on ARM
> side, so if we pass something, it should be full pages.
> 
> Sure, it would be nice to wait for the DSP to stop, but if for some
> reason it doesn't, we need to know that the DSP doesn't have the power
> to corrupt memory.
> 
> Now, I went back to commit 72110f1 and tried the patch you mentioned.
> There's no GPT8 involved, and I cannot reproduce any corruption on a
> beagleboard.
> 
> --- a/drivers/dsp/bridge/wmd/ue_deh.c
> +++ b/drivers/dsp/bridge/wmd/ue_deh.c
> @@ -193,6 +193,7 @@ void bridge_deh_notify(struct deh_mgr *hdeh_mgr,
> u32 ulEventMask, u32 dwErrInfo)
>                                         &resources);
> 
>         if (MEM_IS_VALID_HANDLE(deh_mgr_obj, SIGNATURE)) {
> +               void *temp1, *temp2;
>                 printk(KERN_INFO
>                        "bridge_deh_notify: ********** DEVICE EXCEPTION "
>                        "**********\n");
> @@ -227,8 +228,11 @@ void bridge_deh_notify(struct deh_mgr *hdeh_mgr,
> u32 ulEventMask, u32 dwErrInfo)
>                         printk(KERN_INFO
>                                "bridge_deh_notify: DSP_MMUFAULT, fault "
>                                "address = 0x%x\n", (unsigned
> int)fault_addr);
> -                       dummy_va_addr =
> -                           (u32) mem_calloc(sizeof(char) * 0x1000,
> MEM_PAGED);
> +                       temp1 = kmalloc(0x100000, GFP_ATOMIC);
> +                       temp2 = kmalloc(0x1000, GFP_ATOMIC);
> +                       kfree(temp1);
> +                       kfree(temp2);
> +                       dummy_va_addr = (u32) kmalloc(0x1000, GFP_ATOMIC);
>                         mem_physical =
>                             VIRT_TO_PHYS(PG_ALIGN_LOW
>                                          ((u32) dummy_va_addr,
> PG_SIZE4K));
> 
> Is there anything special I should do?
> 
> Also, wouldn't it be easier to trigger this by doing:
> 
>                         printk(KERN_INFO
>                                "bridge_deh_notify: DSP_MMUFAULT, fault "
>                                "address = 0x%x\n", (unsigned
> int)fault_addr);
> -                       dummy_va_addr =
> -                           (u32) mem_calloc(sizeof(char) * 0x1000,
> MEM_PAGED);
> +                       temp1 = kmalloc(0x100000, GFP_ATOMIC);
> +                       temp2 = kmalloc(0x1000, GFP_ATOMIC);
> +                       kfree(temp1);
>                         mem_physical =
>                             VIRT_TO_PHYS(PG_ALIGN_LOW
> -                                        ((u32) dummy_va_addr,
> PG_SIZE4K));
> +                                        ((u32) temp2, PG_SIZE4K));
> +                       kfree(temp2);
>                         dev_context = (struct wmd_dev_context *)
>                             deh_mgr_obj->hwmd_context;
>                         /* Reset the dynamic mmu index to fixed count if
> it
> 
> Cheers.
> 
> --
> Felipe Contreras

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT
  2010-05-13 17:29         ` Guzman Lugo, Fernando
@ 2010-05-13 18:29           ` Felipe Contreras
  2010-05-13 21:15             ` Guzman Lugo, Fernando
  0 siblings, 1 reply; 12+ messages in thread
From: Felipe Contreras @ 2010-05-13 18:29 UTC (permalink / raw)
  To: Guzman Lugo, Fernando
  Cc: Chitriki Rudramuni, Deepak, linux-omap, Ameya Palande,
	Felipe Contreras, Hiroshi Doyu, Ramirez Luna, Omar,
	Menon, Nishanth

On Thu, May 13, 2010 at 8:29 PM, Guzman Lugo, Fernando
<fernando.lugo@ti.com> wrote:
>> First of all, what is the DSP supposed to do with that memory? Do we
>> really need to call hw_mmu_tlb_add at all?
>
> Once DSP MMUfault happens iva mmu module prevents DSP continue executing until mmu module is able get some physical address for the virtual address that the dsp wanted to access. Once mmu fault interrupt is acked the mmu module tries to translate the virtual address again and if it gets the physical address DSP continue executing.

This is if we want the DSP to continue executing, which all the code
assumes we don't. If we wanted to do that, then we would need to know
how to get the data that the DSP code was trying to access, but we
don't. We always provide the data beforehand, and if the DSP code
tries to access something else, there's nothing else to do.

> So in order to DSP can dumps its stack we need to map some physical address to that virtual address, so that mmu release DSP and it can dumps the stack.

But the DSP is not dumping the stack there, from what I can see
bridge_brd_read() is used to read DSP internal memory.

You said yourself that you could pass a totally dummy address like 0,
and the stack will still be printed.

> Therefore we allocate some dummy buffer of one 4K page and get the physical address of that buffer and use that physical address to fill the tbl on the mmu module using hw_mmu_tlb_add function.

I think that's wrong. We should not give the DSP hopes that it will be
able to read data from that fault address... it's over.

> However the address returned by kmalloc is not page aling, that's means this mpu virtual address has some offset, for examples in the log that were send the dummy address had an offset of 0x080 and the DSP side virtual memory had an offset of 0x040. base on the offset of the MPU side and as we allocate one page that means we can access from 0x080 - 0xfff of the first page and from 0x000 - 0x080 if the second page, but we always allocate the first page to the DSP side, then DSP access to the address it wanted to access and now there is no mmufault but it is accessing (actually writing because reading not cause corruption) to that page but with a offset of 0x040 causing the corruption.
>
> Using get_user_pages fixes that case because as it returns address page aligned the DSP side can access from 0x000 - 0xfff of that page.

You mean __get_free_pages?

> However this is not the right solution because lets suppose if DSP side virtual address offset is 0xfff. So we map a page and DSP can access that page from 0x000 - 0xfff, however is the DSP is able to continue executing it will reach the following page and maybe that page is already mapped but it only can access from an specific offset like for example 0x100, in this ca DSP will still corrupt from 0x000 to 0x0ff of the next page.

>From what I understand it's impossible for the DSP to access memory
that wasn't mapped. So if we map only that page, when the DSP tries to
write to 0x100, another MMU fault will happen.


If I'm understanding things correctly, then we shouldn't map the
faulty address again (through hw_mmu_tlb_add), and we shouldn't clear
the interrupt either (HW_MMU_TRANSLATION_FAULT). (I haven't tested
this yet).

Cheers.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT
  2010-05-13 18:29           ` Felipe Contreras
@ 2010-05-13 21:15             ` Guzman Lugo, Fernando
  2010-05-13 22:33               ` Felipe Contreras
  0 siblings, 1 reply; 12+ messages in thread
From: Guzman Lugo, Fernando @ 2010-05-13 21:15 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: Chitriki Rudramuni, Deepak, linux-omap, Ameya Palande,
	Felipe Contreras, Hiroshi Doyu, Ramirez Luna, Omar,
	Menon, Nishanth



Hi,


> -----Original Message-----
> From: Felipe Contreras [mailto:felipe.contreras@gmail.com]
> Sent: Thursday, May 13, 2010 1:30 PM
> To: Guzman Lugo, Fernando
> Cc: Chitriki Rudramuni, Deepak; linux-omap; Ameya Palande; Felipe
> Contreras; Hiroshi Doyu; Ramirez Luna, Omar; Menon, Nishanth
> Subject: Re: [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after
> DSP_MMUFAULT
> 
> On Thu, May 13, 2010 at 8:29 PM, Guzman Lugo, Fernando
> <fernando.lugo@ti.com> wrote:
> >> First of all, what is the DSP supposed to do with that memory? Do we
> >> really need to call hw_mmu_tlb_add at all?
> >
> > Once DSP MMUfault happens iva mmu module prevents DSP continue executing
> until mmu module is able get some physical address for the virtual address
> that the dsp wanted to access. Once mmu fault interrupt is acked the mmu
> module tries to translate the virtual address again and if it gets the
> physical address DSP continue executing.
> 
> This is if we want the DSP to continue executing, which all the code
> assumes we don't. If we wanted to do that, then we would need to know
> how to get the data that the DSP code was trying to access, but we
> don't. We always provide the data beforehand, and if the DSP code
> tries to access something else, there's nothing else to do.
> 
> > So in order to DSP can dumps its stack we need to map some physical
> address to that virtual address, so that mmu release DSP and it can dumps
> the stack.
> 
> But the DSP is not dumping the stack there, from what I can see
> bridge_brd_read() is used to read DSP internal memory.

DSP is dumping the stack after the MMUFault and mmu let DSP to continue.

Let's see what happens in successful case, so that the mmu fault
Mechanics can be understood better:

1.- DSP wants to write some virtual address which is not found by the 
	Mmu.

2.- MMU module does not allow to the DSP continue executing and
	Generates MMUfault interrupt which is attached to MPU side.

3.- MPU side allocates a dummy address, so that it can be mapped to 
	The DSP fault address.

dummy_va_addr = kzalloc(sizeof(char) * 0x1000, GFP_ATOMIC);


3.- MPU dumps the DLL loaded
	At the moment of the crash, at this point we don't need anything from
	DSP because MPU has the information of DLL's loaded.


		print_dsp_trace_buffer(dev_context);
		dump_dl_modules(dev_context);


4.- MPU maps the physical address of the dummy address to the fault address
	So that, if the DSP want to write into the fault address it will
	Be writing into the dummy buffer revered previously.

				hw_mmu_tlb_add(resources->dw_dmmu_base,
						mem_physical, fault_addr,
						HW_PAGE_SIZE4KB, 1,
						&map_attrs, HW_SET, HW_SET);

5.- MPU generates a GPT8 overflow interrupt.

			while (!(omap_dm_timer_read_status(timer) &
				GPTIMER_IRQ_OVERFLOW)) {
				if (cnt++ >= GPTIMER_IRQ_WAIT_MAX_CNT) {
					pr_err("%s: GPTimer interrupt failed\n",
								__func__);
					break;
				}
			}


6.- MPU acked mmufault interrupt.


hw_mmu_event_ack(resources->dw_dmmu_base,
				HW_MMU_TRANSLATION_FAULT);


7.- MMU module tries to get the physical address of the DSP fault address
	A now it can, the address is the page of the dummy address + the
	Offset of the fault address.

8.- MMU module lets DSP to continue. But at that moment DSP has to attend
	The GPT8 hw interrupt so that it change the context to the GTP8
	overflow ISR and then dumps all the stack information in the same
	shared memory area which is use for SYS_printf traces.

9.- After doing the acked of the MMUfault interrupt MPU call 
	dump_dsp_stack function

		/* Clear MMU interrupt */
		hw_mmu_event_ack(resources->dw_dmmu_base,
				HW_MMU_TRANSLATION_FAULT);
		dump_dsp_stack(deh_mgr->hwmd_context);

10. Inside dump_dsp_stack we wait until DSP writes the special value
	MMU_FAULT_HEAD1 and MMU_FAULT_HEAD2 into tracing area, which
	States the DSP completed the stack dump.

		while ((mmu_fault_dbg_info.head[0] != MMU_FAULT_HEAD1 ||
			mmu_fault_dbg_info.head[1] != MMU_FAULT_HEAD2) &&
			poll_cnt < POLL_MAX) {

			/* Read DSP dump size from the DSP trace buffer... */
			status = (*intf_fxns->pfn_brd_read)(wmd_context,
				(u8 *)&mmu_fault_dbg_info, (u32)trace_begin,
				sizeof(mmu_fault_dbg_info), 0);

			if (DSP_FAILED(status))
				break;

			poll_cnt++;
		}


11 .- After writing the heads values, DSP just does an infinite while

12.- MPU then prints the information sent by DSP.


Please let me know if you have any doubt.


> 
> You said yourself that you could pass a totally dummy address like 0,
> and the stack will still be printed.
> 
> > Therefore we allocate some dummy buffer of one 4K page and get the
> physical address of that buffer and use that physical address to fill the
> tbl on the mmu module using hw_mmu_tlb_add function.
> 
> I think that's wrong. We should not give the DSP hopes that it will be
> able to read data from that fault address... it's over.
> 
> > However the address returned by kmalloc is not page aling, that's means
> this mpu virtual address has some offset, for examples in the log that
> were send the dummy address had an offset of 0x080 and the DSP side
> virtual memory had an offset of 0x040. base on the offset of the MPU side
> and as we allocate one page that means we can access from 0x080 - 0xfff of
> the first page and from 0x000 - 0x080 if the second page, but we always
> allocate the first page to the DSP side, then DSP access to the address it
> wanted to access and now there is no mmufault but it is accessing
> (actually writing because reading not cause corruption) to that page but
> with a offset of 0x040 causing the corruption.
> >
> > Using get_user_pages fixes that case because as it returns address page
> aligned the DSP side can access from 0x000 - 0xfff of that page.
> 
> You mean __get_free_pages?

Yes I do, sorry for the confusion.

> 
> > However this is not the right solution because lets suppose if DSP side
> virtual address offset is 0xfff. So we map a page and DSP can access that
> page from 0x000 - 0xfff, however is the DSP is able to continue executing
> it will reach the following page and maybe that page is already mapped but
> it only can access from an specific offset like for example 0x100, in this
> ca DSP will still corrupt from 0x000 to 0x0ff of the next page.
> 
> From what I understand it's impossible for the DSP to access memory
> that wasn't mapped. So if we map only that page, when the DSP tries to
> write to 0x100, another MMU fault will happen.

Yes, Only one page is mapped, if for example DSP wants to access 0x21230fff, only page 0x21230000 will be mapped, if the DSP wants access 0x21231000 it will cause another MMUfault.

> 
> 
> If I'm understanding things correctly, then we shouldn't map the
> faulty address again (through hw_mmu_tlb_add), and we shouldn't clear
> the interrupt either (HW_MMU_TRANSLATION_FAULT). (I haven't tested
> this yet).

If we do that, DSP would be able to dump the DSP stack. Also I am not sure
if after reloading the base image and resetting DSP MMU module, the 
HW_MMU_TRANSLATION_FAULT flag is reset too, maybe that whould have to take
care about that.


Regards,
Fanando.

> 
> Cheers.
> 
> --
> Felipe Contreras

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT
  2010-05-13 21:15             ` Guzman Lugo, Fernando
@ 2010-05-13 22:33               ` Felipe Contreras
  0 siblings, 0 replies; 12+ messages in thread
From: Felipe Contreras @ 2010-05-13 22:33 UTC (permalink / raw)
  To: Guzman Lugo, Fernando
  Cc: Chitriki Rudramuni, Deepak, linux-omap, Ameya Palande,
	Felipe Contreras, Hiroshi Doyu, Ramirez Luna, Omar,
	Menon, Nishanth

On Fri, May 14, 2010 at 12:15 AM, Guzman Lugo, Fernando
<fernando.lugo@ti.com> wrote:
>> But the DSP is not dumping the stack there, from what I can see
>> bridge_brd_read() is used to read DSP internal memory.
>
> DSP is dumping the stack after the MMUFault and mmu let DSP to continue.
>
> Let's see what happens in successful case, so that the mmu fault
> Mechanics can be understood better:
>
> 1.- DSP wants to write some virtual address which is not found by the
>        Mmu.
>
> 2.- MMU module does not allow to the DSP continue executing and
>        Generates MMUfault interrupt which is attached to MPU side.
>
> 3.- MPU side allocates a dummy address, so that it can be mapped to
>        The DSP fault address.
>
> dummy_va_addr = kzalloc(sizeof(char) * 0x1000, GFP_ATOMIC);
>
>
> 3.- MPU dumps the DLL loaded
>        At the moment of the crash, at this point we don't need anything from
>        DSP because MPU has the information of DLL's loaded.
>
>
>                print_dsp_trace_buffer(dev_context);
>                dump_dl_modules(dev_context);
>
>
> 4.- MPU maps the physical address of the dummy address to the fault address
>        So that, if the DSP want to write into the fault address it will
>        Be writing into the dummy buffer revered previously.
>
>                                hw_mmu_tlb_add(resources->dw_dmmu_base,
>                                                mem_physical, fault_addr,
>                                                HW_PAGE_SIZE4KB, 1,
>                                                &map_attrs, HW_SET, HW_SET);
>
> 5.- MPU generates a GPT8 overflow interrupt.
>
>                        while (!(omap_dm_timer_read_status(timer) &
>                                GPTIMER_IRQ_OVERFLOW)) {
>                                if (cnt++ >= GPTIMER_IRQ_WAIT_MAX_CNT) {
>                                        pr_err("%s: GPTimer interrupt failed\n",
>                                                                __func__);
>                                        break;
>                                }
>                        }
>
>
> 6.- MPU acked mmufault interrupt.
>
>
> hw_mmu_event_ack(resources->dw_dmmu_base,
>                                HW_MMU_TRANSLATION_FAULT);
>
>
> 7.- MMU module tries to get the physical address of the DSP fault address
>        A now it can, the address is the page of the dummy address + the
>        Offset of the fault address.
>
> 8.- MMU module lets DSP to continue. But at that moment DSP has to attend
>        The GPT8 hw interrupt so that it change the context to the GTP8
>        overflow ISR and then dumps all the stack information in the same
>        shared memory area which is use for SYS_printf traces.
>
> 9.- After doing the acked of the MMUfault interrupt MPU call
>        dump_dsp_stack function
>
>                /* Clear MMU interrupt */
>                hw_mmu_event_ack(resources->dw_dmmu_base,
>                                HW_MMU_TRANSLATION_FAULT);
>                dump_dsp_stack(deh_mgr->hwmd_context);
>
> 10. Inside dump_dsp_stack we wait until DSP writes the special value
>        MMU_FAULT_HEAD1 and MMU_FAULT_HEAD2 into tracing area, which
>        States the DSP completed the stack dump.
>
>                while ((mmu_fault_dbg_info.head[0] != MMU_FAULT_HEAD1 ||
>                        mmu_fault_dbg_info.head[1] != MMU_FAULT_HEAD2) &&
>                        poll_cnt < POLL_MAX) {
>
>                        /* Read DSP dump size from the DSP trace buffer... */
>                        status = (*intf_fxns->pfn_brd_read)(wmd_context,
>                                (u8 *)&mmu_fault_dbg_info, (u32)trace_begin,
>                                sizeof(mmu_fault_dbg_info), 0);
>
>                        if (DSP_FAILED(status))
>                                break;
>
>                        poll_cnt++;
>                }
>
>
> 11 .- After writing the heads values, DSP just does an infinite while
>
> 12.- MPU then prints the information sent by DSP.
>
>
> Please let me know if you have any doubt.

You repeated step 3 twice. So let's assume the first one is 3.1.

1) What happens if you skip step 3.1 and 4?

You are assuming that the MMU unit would not let the DSP continue
running, but I fail to see why. Then the stack information would not
be available.

First of all, I don't see any stack information anyway:
dump_dsp_stack:No DSP MMU-Fault information available. Now Deepak has
used 0 in hw_mmu_tlb_add() and he is able to see the stack just fine.

>> If I'm understanding things correctly, then we shouldn't map the
>> faulty address again (through hw_mmu_tlb_add), and we shouldn't clear
>> the interrupt either (HW_MMU_TRANSLATION_FAULT). (I haven't tested
>> this yet).
>
> If we do that, DSP would be able to dump the DSP stack.

You mean we _woudn't_? First, I'm not really worried about loosing a
feature that doesn't seem to be working anyway. And second, we assume
we actually want that feature. For development purposes, sure, but in
a production device, no... we actually don't want all that debugging
code which seems to be quite big.

> Also I am not sure
> if after reloading the base image and resetting DSP MMU module, the
> HW_MMU_TRANSLATION_FAULT flag is reset too, maybe that whould have to take
> care about that.

Seems to be working fine here. Probably all interrupts are cleared
when reloading.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT
  2010-05-13 11:39       ` Felipe Contreras
  2010-05-13 17:29         ` Guzman Lugo, Fernando
@ 2010-05-14 12:08         ` Felipe Contreras
  1 sibling, 0 replies; 12+ messages in thread
From: Felipe Contreras @ 2010-05-14 12:08 UTC (permalink / raw)
  To: Guzman Lugo, Fernando
  Cc: Chitriki Rudramuni, Deepak, linux-omap, Ameya Palande,
	Felipe Contreras, Hiroshi Doyu, Ramirez Luna, Omar,
	Menon, Nishanth

On Thu, May 13, 2010 at 2:39 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Thu, May 13, 2010 at 12:09 AM, Guzman Lugo, Fernando
> <fernando.lugo@ti.com> wrote:
>>> If you are referring to this patch:
>>> http://git.kernel.org/?p=linux/kernel/git/tmlind/linux-omap-
>>> 2.6.git;a=commit;h=26ad62f03578a12e942d8bb86d0e52ef1afdee22
>>
>> Yes, that's the patch. Could you make sure that the GPT8 interrupt is generated before acking MMU fault interrupt?
>
> I'll try tomorrow when I have access to the hw.

I should see "GPTimer interrupt failed" if it doesn't... right? Then
yes, the GPT8 interrupt is generated.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2010-05-14 12:09 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-13 16:46 [PATCH] DSPBRIDGE:Fix Kernel memory poison overwritten after DSP_MMUFAULT Deepak Chitriki
2010-04-13 16:54 ` Deepak Chitriki
  -- strict thread matches above, loose matches on Subject: below --
2010-04-13 16:55 Deepak Chitriki
2010-04-19 18:25 ` Guzman Lugo, Fernando
2010-05-12 19:39   ` Felipe Contreras
2010-05-12 21:09     ` Guzman Lugo, Fernando
2010-05-13 11:39       ` Felipe Contreras
2010-05-13 17:29         ` Guzman Lugo, Fernando
2010-05-13 18:29           ` Felipe Contreras
2010-05-13 21:15             ` Guzman Lugo, Fernando
2010-05-13 22:33               ` Felipe Contreras
2010-05-14 12:08         ` Felipe Contreras

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox