* Bug in v7_coherent_kern_range() ?
@ 2012-04-01 3:21 Huang Shijie
  2012-04-01 6:10 ` Dirk Behme
  2012-04-02 11:12 ` Will Deacon
  0 siblings, 2 replies; 20+ messages in thread

From: Huang Shijie @ 2012-04-01 3:21 UTC
To: linux-arm-kernel

[1] Platform:
Freescale i.MX6Q (4 cores), ARM Cortex-A9

[2] Kernel:
3.0.15 (I have cherry-picked many patches, and arch/arm/mm/cache-v7.S
is the same code as in the latest kernel, v3.4-rc1),
SMP enabled, VIPT.

[3] Application:
Our application clones many threads, and two of them (call them A and B)
may run the same code at the same time.

Most of the time it works. But in some unknown situation, cacheflush()
fails and one thread (say A) hangs in the following code:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
open("/usr/lib/lib_mp3_dec_arm12_elinux.so.2.10.0", O_RDONLY) = 8
read(8, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\20\35\0\0004\0\0\0"..., 512) = 512
fstat64(8, {st_mode=S_IFREG|0644, st_size=56232, ...}) = 0
mmap2(NULL, 88032, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 8, 0) = 0x2ff0a000
mprotect(0x2ff18000, 28672, PROT_NONE) = 0
mmap2(0x2ff1f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 8, 0xd) = 0x2ff1f000
close(8) = 0
mprotect(0x2ff0a000, 57344, PROT_READ|PROT_WRITE) = 0
mprotect(0x2ff0a000, 57344, PROT_READ|PROT_EXEC) = 0
cacheflush(0x2ff0a000, 0x2ff18000, 0, 0x6, 0x2cd03420) = 0 // System hung up here!!!
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

[4] Kernel log:
I used "echo t > /proc/sysrq-trigger" to dump the task information:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
multiqueue0:src D 804cd678 0 7328 5963 0x00000001
[<804cd678>] (__schedule+0x228/0x760) from [<804d0564>] (__down_read+0xa8/0xe0)
[<804d0564>] (__down_read+0xa8/0xe0) from [<800478c4>] (do_page_fault+0xbc/0x480)
[<800478c4>] (do_page_fault+0xbc/0x480) from [<8003841c>] (do_DataAbort+0x34/0x98)
[<8003841c>] (do_DataAbort+0x34/0x98) from [<8003df10>] (__dabt_svc+0x70/0xa0)
Exception stack(0xbae37ea8 to 0xbae37ef0)
7ea0:                   31e05000 31e1d000 00000020 0000001f 31e05000 31e1d000
7ec0: bfac86b8 31e05000 31e1d000 bae36000 08100075 31e056fc 31e08000 bae37ef0
7ee0: 800424a8 8004a1fc 800f0013 ffffffff
[<8003df10>] (__dabt_svc+0x70/0xa0) from [<8004a1fc>] (v7_coherent_kern_range+0x20/0x80)
[<8004a1fc>] (v7_coherent_kern_range+0x20/0x80) from [<800424a8>] (arm_syscall+0x2a0/0x2c4)
[<800424a8>] (arm_syscall+0x2a0/0x2c4) from [<8003e500>] (ret_fast_syscall+0x0/0x3c)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

do_cache_op() already holds mm->mmap_sem, but v7_coherent_kern_range()
takes a page fault while it is flushing the cache, and do_page_fault()
then tries to take mmap_sem again: deadlock! That is why the task hangs
in do_page_fault().

[5] Questions:
Why can v7_coherent_kern_range() cause a data abort?
Is there something wrong with v7_coherent_kern_range()?

thanks
Huang Shijie
* Bug in v7_coherent_kern_range() ?
  2012-04-01 3:21 Bug in v7_coherent_kern_range() ? Huang Shijie
@ 2012-04-01 6:10 ` Dirk Behme
  2012-04-01 7:09 ` Huang Shijie
  2012-04-02 11:12 ` Will Deacon
  1 sibling, 1 reply; 20+ messages in thread

From: Dirk Behme @ 2012-04-01 6:10 UTC
To: linux-arm-kernel

Hi Huang Shijie,

On 01.04.2012 05:21, Huang Shijie wrote:
> [1] Platform:
> Freescale i.MX6Q (4 cores), ARM Cortex-A9
>
> [2] Kernel:
> 3.0.15 (I have cherry-picked many patches, and arch/arm/mm/cache-v7.S
> is the same code as in the latest kernel, v3.4-rc1),
> SMP enabled, VIPT.

Could you try an unpatched, clean v3.4-rc1 instead?

What about your 2.6.38?

What about 3.0.26? 3.0.15 seems to be missing some possibly relevant patches.

> [3] Application:

Could you share a (simple) test case?

Best regards

Dirk
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Bug in v7_coherent_kern_range() ?
  2012-04-01 6:10 ` Dirk Behme
@ 2012-04-01 7:09 ` Huang Shijie
  2012-04-01 8:01 ` Dirk Behme
  2012-04-01 8:57 ` Dirk Behme
  0 siblings, 2 replies; 20+ messages in thread

From: Huang Shijie @ 2012-04-01 7:09 UTC
To: linux-arm-kernel

Hi Dirk:
> Hi Huang Shijie,
>
> On 01.04.2012 05:21, Huang Shijie wrote:
>> [1] Platform:
>> Freescale i.MX6Q (4 cores), ARM Cortex-A9
>>
>> [2] Kernel:
>> 3.0.15 (I have cherry-picked many patches, and arch/arm/mm/cache-v7.S
>> is the same code as in the latest kernel, v3.4-rc1),
>> SMP enabled, VIPT.
>
> Could you try an unpatched, clean v3.4-rc1 instead?
Sorry, I cannot try v3.4-rc1. Some of our BSP drivers do not support DT yet.
>
> What about your 2.6.38?
2.6.38 is not a good version for running the i.MX6Q; it lacks many of our
driver patches.
>
> What about 3.0.26? 3.0.15 seems to be missing some possibly relevant patches.
>
Our BSP release is based on 3.0.15, so we cannot test on 3.0.26 either.
>> [3] Application:
>
> Could you share a (simple) test case?
The test case is like this:
#gplay xx.avi

gplay is our own player, similar to mplayer. I just created a script that
plays the video files one by one.

BR
Huang Shijie
* Bug in v7_coherent_kern_range() ?
  2012-04-01 7:09 ` Huang Shijie
@ 2012-04-01 8:01 ` Dirk Behme
  2012-04-01 8:16 ` Huang Shijie
  2012-04-01 8:57 ` Dirk Behme
  1 sibling, 1 reply; 20+ messages in thread

From: Dirk Behme @ 2012-04-01 8:01 UTC
To: linux-arm-kernel

Hi Huang Shijie,

On 01.04.2012 09:09, Huang Shijie wrote:
> Hi Dirk:
>> Hi Huang Shijie,
>>
>> On 01.04.2012 05:21, Huang Shijie wrote:
>>> [1] Platform:
>>> Freescale i.MX6Q (4 cores), ARM Cortex-A9
>>>
>>> [2] Kernel:
>>> 3.0.15 (I have cherry-picked many patches, and arch/arm/mm/cache-v7.S
>>> is the same code as in the latest kernel, v3.4-rc1),
>>> SMP enabled, VIPT.
>>
>> Could you try an unpatched, clean v3.4-rc1 instead?
> Sorry, I cannot try v3.4-rc1. Some of our BSP drivers do not support DT yet.

I think we are not talking about drivers here, but about core kernel code
such as cache handling? To test v7_coherent_kern_range() you might not need
many BSP drivers?

>> What about your 2.6.38?
> 2.6.38 is not a good version for running the i.MX6Q; it lacks many of our
> driver patches.
>>
>> What about 3.0.26? 3.0.15 seems to be missing some possibly relevant patches.
>>
> Our BSP release is based on 3.0.15, so we cannot test on 3.0.26 either.

You can. Just give git rebase a try.

>>> [3] Application:
>>
>> Could you share a (simple) test case?
> The test case is like this:
> #gplay xx.avi
>
> gplay is our own player, similar to mplayer.

Could you share a (simple) test case? E.g. share 'gplay'? Or try to
reproduce your issue with another test case, e.g. mplayer? Or, better,
anything simpler that the community can use to try to reproduce your issue?

Best regards

Dirk
* Bug in v7_coherent_kern_range() ?
  2012-04-01 8:01 ` Dirk Behme
@ 2012-04-01 8:16 ` Huang Shijie
  2012-04-01 8:50 ` Dirk Behme
  0 siblings, 1 reply; 20+ messages in thread

From: Huang Shijie @ 2012-04-01 8:16 UTC
To: linux-arm-kernel

Hi Dirk:
> Hi Huang Shijie,
>
> On 01.04.2012 09:09, Huang Shijie wrote:
>> Hi Dirk:
>>> Hi Huang Shijie,
>>>
>>> On 01.04.2012 05:21, Huang Shijie wrote:
>>>> [1] Platform:
>>>> Freescale i.MX6Q (4 cores), ARM Cortex-A9
>>>>
>>>> [2] Kernel:
>>>> 3.0.15 (I have cherry-picked many patches, and arch/arm/mm/cache-v7.S
>>>> is the same code as in the latest kernel, v3.4-rc1),
>>>> SMP enabled, VIPT.
>>>
>>> Could you try an unpatched, clean v3.4-rc1 instead?
>> Sorry, I cannot try v3.4-rc1. Some of our BSP drivers do not support DT yet.
>
> I think we are not talking about drivers here, but about core kernel code
> such as cache handling? To test v7_coherent_kern_range() you might not need
> many BSP drivers?
Yes, but gplay uses the VPU driver, and the VPU driver is not in the kernel.
Without the VPU driver, gplay cannot work.
>
>>> What about your 2.6.38?
>> 2.6.38 is not a good version for running the i.MX6Q; it lacks many of our
>> driver patches.
>>>
>>> What about 3.0.26? 3.0.15 seems to be missing some possibly relevant patches.
>>>
>> Our BSP release is based on 3.0.15, so we cannot test on 3.0.26 either.
>
> You can. Just give git rebase a try.
It would be a nightmare for me. We have nearly 1000 patches, and it would
cost me a lot of time to resolve the conflicts.
>
>>>> [3] Application:
>>>
>>> Could you share a (simple) test case?
>> The test case is like this:
>> #gplay xx.avi
>>
>> gplay is our own player, similar to mplayer.
>
> Could you share a (simple) test case? E.g. share 'gplay'? Or try to
> reproduce your issue with another test case, e.g. mplayer? Or, better,
> anything simpler that the community can use to try to reproduce your issue?
I can email gplay to you; if you have an i.MX6Q board, you can test it.
I just hope someone can give me some advice about this issue.

I found that arch/arm/include/asm/assembler.h is out of date, so I will
update it and test again.

Thanks a lot, Dirk.

Huang Shijie
* Bug in v7_coherent_kern_range() ?
  2012-04-01 8:16 ` Huang Shijie
@ 2012-04-01 8:50 ` Dirk Behme
  2012-04-01 9:14 ` Huang Shijie
  0 siblings, 1 reply; 20+ messages in thread

From: Dirk Behme @ 2012-04-01 8:50 UTC
To: linux-arm-kernel

On 01.04.2012 10:16, Huang Shijie wrote:
> Hi Dirk:
>> Hi Huang Shijie,
>>
>> On 01.04.2012 09:09, Huang Shijie wrote:
>>> Hi Dirk:
>>>> Hi Huang Shijie,
>>>>
>>>> On 01.04.2012 05:21, Huang Shijie wrote:
>>>>> [1] Platform:
>>>>> Freescale i.MX6Q (4 cores), ARM Cortex-A9
>>>>>
>>>>> [2] Kernel:
>>>>> 3.0.15 (I have cherry-picked many patches, and arch/arm/mm/cache-v7.S
>>>>> is the same code as in the latest kernel, v3.4-rc1),
>>>>> SMP enabled, VIPT.
>>>>
>>>> Could you try an unpatched, clean v3.4-rc1 instead?
>>> Sorry, I cannot try v3.4-rc1. Some of our BSP drivers do not support DT yet.
>>
>> I think we are not talking about drivers here, but about core kernel code
>> such as cache handling? To test v7_coherent_kern_range() you might not need
>> many BSP drivers?
> Yes, but gplay uses the VPU driver, and the VPU driver is not in the kernel.
> Without the VPU driver, gplay cannot work.

You could try to disable the VPU driver and check whether the issue is
still there, then.

>>>> What about your 2.6.38?
>>> 2.6.38 is not a good version for running the i.MX6Q; it lacks many of our
>>> driver patches.
>>>>
>>>> What about 3.0.26? 3.0.15 seems to be missing some possibly relevant patches.
>>>>
>>> Our BSP release is based on 3.0.15, so we cannot test on 3.0.26 either.
>>
>> You can. Just give git rebase a try.
> It would be a nightmare for me. We have nearly 1000 patches, and it would
> cost me a lot of time to resolve the conflicts.

IMHO you will get one easy-to-solve merge conflict, so it should take you
less than 10 minutes to rebase to 3.0.26. Just try it ;)

>>
>>>>> [3] Application:
>>>>
>>>> Could you share a (simple) test case?
>>> The test case is like this:
>>> #gplay xx.avi
>>>
>>> gplay is our own player, similar to mplayer.
>>
>> Could you share a (simple) test case? E.g. share 'gplay'? Or try to
>> reproduce your issue with another test case, e.g. mplayer? Or, better,
>> anything simpler that the community can use to try to reproduce your issue?
> I can email gplay to you; if you have an i.MX6Q board, you can test it.
> I just hope someone can give me some advice about this issue.

It would help to use a kernel version and a test case the community
can use to reproduce.

Best regards

Dirk
* Bug in v7_coherent_kern_range() ?
  2012-04-01 8:50 ` Dirk Behme
@ 2012-04-01 9:14 ` Huang Shijie
  0 siblings, 0 replies; 20+ messages in thread

From: Huang Shijie @ 2012-04-01 9:14 UTC
To: linux-arm-kernel

On 2012-04-01 16:50, Dirk Behme wrote:
> On 01.04.2012 10:16, Huang Shijie wrote:
>> Hi Dirk:
>>> Hi Huang Shijie,
>>>
>>> On 01.04.2012 09:09, Huang Shijie wrote:
>>>> Hi Dirk:
>>>>> Hi Huang Shijie,
>>>>>
>>>>> On 01.04.2012 05:21, Huang Shijie wrote:
>>>>>> [1] Platform:
>>>>>> Freescale i.MX6Q (4 cores), ARM Cortex-A9
>>>>>>
>>>>>> [2] Kernel:
>>>>>> 3.0.15 (I have cherry-picked many patches, and arch/arm/mm/cache-v7.S
>>>>>> is the same code as in the latest kernel, v3.4-rc1),
>>>>>> SMP enabled, VIPT.
>>>>>
>>>>> Could you try an unpatched, clean v3.4-rc1 instead?
>>>> Sorry, I cannot try v3.4-rc1. Some of our BSP drivers do not support DT yet.
>>>
>>> I think we are not talking about drivers here, but about core kernel code
>>> such as cache handling? To test v7_coherent_kern_range() you might not need
>>> many BSP drivers?
>> Yes, but gplay uses the VPU driver, and the VPU driver is not in the kernel.
>> Without the VPU driver, gplay cannot work.
>
> You could try to disable the VPU driver and check whether the issue is
> still there, then.
>
:( I have no idea how to reproduce this issue if I disable the VPU driver.
>>>>> What about your 2.6.38?
>>>> 2.6.38 is not a good version for running the i.MX6Q; it lacks many of our
>>>> driver patches.
>>>>>
>>>>> What about 3.0.26? 3.0.15 seems to be missing some possibly relevant patches.
>>>>>
>>>> Our BSP release is based on 3.0.15, so we cannot test on 3.0.26 either.
>>>
>>> You can. Just give git rebase a try.
>> It would be a nightmare for me. We have nearly 1000 patches, and it would
>> cost me a lot of time to resolve the conflicts.
>
> IMHO you will get one easy-to-solve merge conflict, so it should take you
> less than 10 minutes to rebase to 3.0.26. Just try it ;)
>
>>>
>>>>>> [3] Application:
>>>>>
>>>>> Could you share a (simple) test case?
>>>> The test case is like this:
>>>> #gplay xx.avi
>>>>
>>>> gplay is our own player, similar to mplayer.
>>>
>>> Could you share a (simple) test case? E.g. share 'gplay'? Or try to
>>> reproduce your issue with another test case, e.g. mplayer? Or, better,
>>> anything simpler that the community can use to try to reproduce your issue?
>> I can email gplay to you; if you have an i.MX6Q board, you can test it.
>> I just hope someone can give me some advice about this issue.
>
> It would help to use a kernel version and a test case the community
> can use to reproduce.
>
I know.

thanks
Huang Shijie
>>>>>> But in some unknown situation, cacheflush() failed and one threads >>>>>> (assume A) may hung up in the following code: >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> open("/usr/lib/lib_mp3_dec_arm12_elinux.so.2.10.0", O_RDONLY) = 8 >>>>>> read(8, >>>>>> "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\20\35\0\0004\0\0\0"..., >>>>>> >>>>>> >>>>>> >>>>>> 512) = 512 >>>>>> fstat64(8, {st_mode=S_IFREG|0644, st_size=56232, ...}) = 0 >>>>>> mmap2(NULL, 88032, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, >>>>>> 8, 0) >>>>>> = 0x2ff0a000 >>>>>> mprotect(0x2ff18000, 28672, PROT_NONE) = 0 >>>>>> mmap2(0x2ff1f000, 4096, PROT_READ|PROT_WRITE, >>>>>> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 8, 0xd) = 0x2ff1f000 >>>>>> close(8) = 0 >>>>>> mprotect(0x2ff0a000, 57344, PROT_READ|PROT_WRITE) = 0 >>>>>> mprotect(0x2ff0a000, 57344, PROT_READ|PROT_EXEC) = 0 >>>>>> cacheflush(0x2ff0a000, 0x2ff18000, 0, 0x6, 0x2cd03420) = 0 // System >>>>>> hung up here!!! 
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> [4] kernel log >>>>>> I use "echo t> /proc/sysrq-trigger" to show the tasks's information: >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> multiqueue0:src D 804cd678 0 7328 5963 0x00000001 >>>>>> [<804cd678>] (__schedule+0x228/0x760) from [<804d0564>] >>>>>> (__down_read+0xa8/0xe0) >>>>>> [<804d0564>] (__down_read+0xa8/0xe0) from [<800478c4>] >>>>>> (do_page_fault+0xbc/0x480) >>>>>> [<800478c4>] (do_page_fault+0xbc/0x480) from [<8003841c>] >>>>>> (do_DataAbort+0x34/0x98) >>>>>> [<8003841c>] (do_DataAbort+0x34/0x98) from [<8003df10>] >>>>>> (__dabt_svc+0x70/0xa0) >>>>>> Exception stack(0xbae37ea8 to 0xbae37ef0) >>>>>> 7ea0: 31e05000 31e1d000 00000020 0000001f 31e05000 31e1d000 >>>>>> 7ec0: bfac86b8 31e05000 31e1d000 bae36000 08100075 31e056fc 31e08000 >>>>>> bae37ef0 >>>>>> 7ee0: 800424a8 8004a1fc 800f0013 ffffffff >>>>>> [<8003df10>] (__dabt_svc+0x70/0xa0) from [<8004a1fc>] >>>>>> (v7_coherent_kern_range+0x20/0x80) >>>>>> [<8004a1fc>] (v7_coherent_kern_range+0x20/0x80) from [<800424a8>] >>>>>> (arm_syscall+0x2a0/0x2c4) >>>>>> [<800424a8>] (arm_syscall+0x2a0/0x2c4) from [<8003e500>] >>>>>> (ret_fast_syscall+0x0/0x3c) >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> The do_cache_op() has already held the mm->mmap_sem, but >>>>>> v7_coherent_kern_range() >>>>>> cause one page fault during it flush the cache. deadlock! So it >>>>>> hung up >>>>>> in the do_page_fault(). >>>>>> >>>>>> [5] questions: >>>>>> Why the v7_coherent_kern_range() can caused the data abort? >>>>>> Is there something wrong about the v7_coherent_kern_range()? 
>>>>>> >>>>>> >>>>>> thanks >>>>>> Huang Shijie >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> linux-arm-kernel mailing list >>>>>> linux-arm-kernel at lists.infradead.org >>>>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >>>>>> >>>>> >>>>> >>>> >>>> >>>> >>> >>> >> >> >> > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Bug in v7_coherent_kern_range() ? 2012-04-01 7:09 ` Huang Shijie 2012-04-01 8:01 ` Dirk Behme @ 2012-04-01 8:57 ` Dirk Behme 2012-04-01 9:19 ` Huang Shijie 2012-04-01 9:19 ` Huang Shijie 1 sibling, 2 replies; 20+ messages in thread From: Dirk Behme @ 2012-04-01 8:57 UTC (permalink / raw) To: linux-arm-kernel On 01.04.2012 09:09, Huang Shijie wrote: > Hi Dirk: >> Hi Huang Shijie, >> >> On 01.04.2012 05:21, Huang Shijie wrote: >>> [1] Platform: >>> freescale's IMX6Q(4 cores) , ARM CORTEX-A9 >>> >>> [2] kernel: >>> 3.0.15(I have cherry-picked many patches, and the >>> arch/arm/mm/cache-v7.S >>> is same code with the latest kernel v3.4-rc1) >>> enable SMP, VIPT, >> >> Could you try an unpatched, clean v3.4-rc1 instead? > Sorry, I could not try the v3.4-rc1. Some our bsp drivers are not DT > supported. Have you tried the 3.2 based Linaro kernel? It's DT based. Best regards Dirk >> What's about your 2.6.38? > 2.6.38 is not a good version to run the imx6q. It losts many our > drivers's patches. >> >> What's about 3.0.26? 3.0.15 seems to miss some maybe relevant patches. >> > Our bsp release are based on 3.0.15. so we could not test it on 3.0.26 > too. > >>> [3] application: >> >> Could you share a (simple) test case? > The test case is like this: > #gplay xx.avi > > gplay is our own player, such as mplayer. > I just created a script which will play the video files one by one. > > BR > Huang Shijie > >> >> Best regards >> >> Dirk >> >>> I use our our application which will clone many threads, >>> two threads (assume as A and B) may do the same thing at the same time >>> as the following code: >>> >>> In most of the time, it's ok. 
>>> But in some unknown situation, cacheflush() failed and one threads >>> (assume A) may hung up in the following code: >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>> >>> >>> open("/usr/lib/lib_mp3_dec_arm12_elinux.so.2.10.0", O_RDONLY) = 8 >>> read(8, >>> "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\20\35\0\0004\0\0\0"..., >>> >>> 512) = 512 >>> fstat64(8, {st_mode=S_IFREG|0644, st_size=56232, ...}) = 0 >>> mmap2(NULL, 88032, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, >>> 8, 0) >>> = 0x2ff0a000 >>> mprotect(0x2ff18000, 28672, PROT_NONE) = 0 >>> mmap2(0x2ff1f000, 4096, PROT_READ|PROT_WRITE, >>> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 8, 0xd) = 0x2ff1f000 >>> close(8) = 0 >>> mprotect(0x2ff0a000, 57344, PROT_READ|PROT_WRITE) = 0 >>> mprotect(0x2ff0a000, 57344, PROT_READ|PROT_EXEC) = 0 >>> cacheflush(0x2ff0a000, 0x2ff18000, 0, 0x6, 0x2cd03420) = 0 // System >>> hung up here!!! >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>> >>> >>> >>> [4] kernel log >>> I use "echo t> /proc/sysrq-trigger" to show the tasks's information: >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>> >>> >>> multiqueue0:src D 804cd678 0 7328 5963 0x00000001 >>> [<804cd678>] (__schedule+0x228/0x760) from [<804d0564>] >>> (__down_read+0xa8/0xe0) >>> [<804d0564>] (__down_read+0xa8/0xe0) from [<800478c4>] >>> (do_page_fault+0xbc/0x480) >>> [<800478c4>] (do_page_fault+0xbc/0x480) from [<8003841c>] >>> (do_DataAbort+0x34/0x98) >>> [<8003841c>] (do_DataAbort+0x34/0x98) from [<8003df10>] >>> (__dabt_svc+0x70/0xa0) >>> Exception stack(0xbae37ea8 to 0xbae37ef0) >>> 7ea0: 31e05000 31e1d000 00000020 0000001f 31e05000 31e1d000 >>> 7ec0: bfac86b8 31e05000 31e1d000 bae36000 08100075 31e056fc 31e08000 >>> bae37ef0 >>> 7ee0: 800424a8 8004a1fc 800f0013 
ffffffff >>> [<8003df10>] (__dabt_svc+0x70/0xa0) from [<8004a1fc>] >>> (v7_coherent_kern_range+0x20/0x80) >>> [<8004a1fc>] (v7_coherent_kern_range+0x20/0x80) from [<800424a8>] >>> (arm_syscall+0x2a0/0x2c4) >>> [<800424a8>] (arm_syscall+0x2a0/0x2c4) from [<8003e500>] >>> (ret_fast_syscall+0x0/0x3c) >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>> >>> >>> >>> The do_cache_op() has already held the mm->mmap_sem, but >>> v7_coherent_kern_range() >>> cause one page fault during it flush the cache. deadlock! So it >>> hung up >>> in the do_page_fault(). >>> >>> [5] questions: >>> Why the v7_coherent_kern_range() can caused the data abort? >>> Is there something wrong about the v7_coherent_kern_range()? >>> >>> >>> thanks >>> Huang Shijie >>> >>> >>> >>> >>> >>> _______________________________________________ >>> linux-arm-kernel mailing list >>> linux-arm-kernel at lists.infradead.org >>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >>> >> >> > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
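The strace quoted above shows the usual sequence for installing code: make the text mapping writable, modify it, switch it to PROT_READ|PROT_EXEC, then call cacheflush() so the instruction cache sees the new bytes. The sketch below reproduces that sequence from C, under stated assumptions: it uses the GCC/Clang builtin __builtin___clear_cache(), which on ARM Linux is typically implemented (through libgcc) with the ARM-private cacheflush syscall, and it works on a fresh anonymous mapping rather than a mapped library. The helper name install_code() is hypothetical and is not what the dynamic loader or gplay actually does.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes into a new anonymous mapping,
 * make it read+exec, then synchronise the caches for that range.
 * Returns the mapping, or NULL on failure. */
static void *install_code(const unsigned char *code, size_t len)
{
    assert(code != NULL && len > 0);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096; /* assumption: fall back to 4 KiB pages */
    size_t size = (len + (size_t)page - 1) / (size_t)page * (size_t)page;

    /* Like the mprotect(..., PROT_READ|PROT_WRITE) step in the strace. */
    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;
    memcpy(buf, code, len);

    /* Like the mprotect(..., PROT_READ|PROT_EXEC) step. */
    if (mprotect(buf, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, size);
        return NULL;
    }

    /* Like the cacheflush() step: make [buf, buf + len) visible to
     * instruction fetch. On x86 this has no effect because the
     * instruction cache is kept coherent by hardware. */
    __builtin___clear_cache((char *)buf, (char *)buf + len);
    return buf;
}
```

The returned mapping is deliberately not executed, since the bytes are placeholders; a real caller would also keep the mapping size so it can munmap() it later.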
* Bug in v7_coherent_kern_range() ? 2012-04-01 8:57 ` Dirk Behme @ 2012-04-01 9:19 ` Huang Shijie 2012-04-01 9:19 ` Huang Shijie 1 sibling, 0 replies; 20+ messages in thread From: Huang Shijie @ 2012-04-01 9:19 UTC (permalink / raw) To: linux-arm-kernel Hi Dirk: > > Have you tried the 3.2 based Linaro kernel? It's DT based. > Not yet. I will test it. BR Huang Shijie ^ permalink raw reply [flat|nested] 20+ messages in thread
* Bug in v7_coherent_kern_range() ? 2012-04-01 8:57 ` Dirk Behme 2012-04-01 9:19 ` Huang Shijie @ 2012-04-01 9:19 ` Huang Shijie 1 sibling, 0 replies; 20+ messages in thread From: Huang Shijie @ 2012-04-01 9:19 UTC (permalink / raw) To: linux-arm-kernel Hi Dirk: > > Have you tried the 3.2 based Linaro kernel? It's DT based. > not yet. I will test it. BR Huang Shijie ^ permalink raw reply [flat|nested] 20+ messages in thread
* Bug in v7_coherent_kern_range() ? 2012-04-01 3:21 Bug in v7_coherent_kern_range() ? Huang Shijie 2012-04-01 6:10 ` Dirk Behme @ 2012-04-02 11:12 ` Will Deacon 2012-04-06 3:35 ` Huang Shijie 1 sibling, 1 reply; 20+ messages in thread From: Will Deacon @ 2012-04-02 11:12 UTC (permalink / raw) To: linux-arm-kernel On Sun, Apr 01, 2012 at 04:21:10AM +0100, Huang Shijie wrote: > But in some unknown situation, cacheflush() failed and one threads > (assume A) may hung up in the following code: [...] > multiqueue0:src D 804cd678 0 7328 5963 0x00000001 > [<804cd678>] (__schedule+0x228/0x760) from [<804d0564>] > (__down_read+0xa8/0xe0) > [<804d0564>] (__down_read+0xa8/0xe0) from [<800478c4>] > (do_page_fault+0xbc/0x480) > [<800478c4>] (do_page_fault+0xbc/0x480) from [<8003841c>] > (do_DataAbort+0x34/0x98) > [<8003841c>] (do_DataAbort+0x34/0x98) from [<8003df10>] > (__dabt_svc+0x70/0xa0) > Exception stack(0xbae37ea8 to 0xbae37ef0) > 7ea0: 31e05000 31e1d000 00000020 0000001f 31e05000 31e1d000 > 7ec0: bfac86b8 31e05000 31e1d000 bae36000 08100075 31e056fc 31e08000 > bae37ef0 > 7ee0: 800424a8 8004a1fc 800f0013 ffffffff > [<8003df10>] (__dabt_svc+0x70/0xa0) from [<8004a1fc>] > (v7_coherent_kern_range+0x20/0x80) > [<8004a1fc>] (v7_coherent_kern_range+0x20/0x80) from [<800424a8>] > (arm_syscall+0x2a0/0x2c4) > [<800424a8>] (arm_syscall+0x2a0/0x2c4) from [<8003e500>] > (ret_fast_syscall+0x0/0x3c) Please can you try the patch posted here: http://lists.arm.linux.org.uk/lurker/message/20111107.173344.f738392e.en.html If it fixes your problem, please consider giving a tested-by. Will ^ permalink raw reply [flat|nested] 20+ messages in thread
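The patch is linked rather than described, so the following is only a model of why the quoted trace is a hang rather than a crash. Per the original report, do_cache_op() already holds mm->mmap_sem for reading when v7_coherent_kern_range() faults, and do_page_fault() then calls down_read() on the same semaphore. Assuming the fair rwsem policy of that era (a new reader queues if any task is already waiting), a writer from another thread arriving between the two read acquisitions, for example an mmap2() or mprotect() like those in the strace, makes the nested down_read() sleep forever. The struct and function names below are hypothetical; this is a single-threaded simulation of the policy, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a fair reader/writer semaphore (not kernel code).
 * Assumed policy: a new reader is admitted only if no writer holds the
 * lock and no task is already queued; otherwise it joins the queue. */
struct toy_rwsem {
    int active_readers;
    bool writer_active;
    int queued_waiters; /* tasks that would be sleeping */
};

/* Returns true if granted, false if the caller would sleep. */
static bool toy_down_read(struct toy_rwsem *s)
{
    if (!s->writer_active && s->queued_waiters == 0) {
        s->active_readers++;
        return true;
    }
    s->queued_waiters++;
    return false;
}

/* Returns true if granted, false if the caller would sleep. */
static bool toy_down_write(struct toy_rwsem *s)
{
    if (!s->writer_active && s->active_readers == 0 &&
        s->queued_waiters == 0) {
        s->writer_active = true;
        return true;
    }
    s->queued_waiters++;
    return false;
}

/* Replays the suspected interleaving on one semaphore and reports
 * whether thread A's nested read acquisition would block. */
static bool nested_read_blocks(bool writer_interleaves)
{
    struct toy_rwsem mmap_sem = { 0, false, 0 };

    /* Thread A: outer read lock (do_cache_op() in the report). */
    bool outer = toy_down_read(&mmap_sem);
    assert(outer);
    (void)outer;

    if (writer_interleaves) {
        /* Thread B: e.g. mprotect()/mmap2() wants the lock for writing
         * and must queue behind A's read lock. */
        bool b = toy_down_write(&mmap_sem);
        assert(!b);
        (void)b;
    }

    /* Thread A: nested read lock (do_page_fault() in the trace). If it
     * queues behind B, and B waits for A, neither can make progress. */
    return !toy_down_read(&mmap_sem);
}
```

nested_read_blocks(false) returns false: without a competing writer the recursive read lock happens to succeed, which fits the report that the hang only appears "in some unknown situation".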
* Bug in v7_coherent_kern_range() ? 2012-04-02 11:12 ` Will Deacon @ 2012-04-06 3:35 ` Huang Shijie 2012-04-10 9:22 ` Will Deacon 0 siblings, 1 reply; 20+ messages in thread From: Huang Shijie @ 2012-04-06 3:35 UTC (permalink / raw) To: linux-arm-kernel Hi Will: > On Sun, Apr 01, 2012 at 04:21:10AM +0100, Huang Shijie wrote: >> But in some unknown situation, cacheflush() failed and one threads >> (assume A) may hung up in the following code: > [...] > >> multiqueue0:src D 804cd678 0 7328 5963 0x00000001 >> [<804cd678>] (__schedule+0x228/0x760) from [<804d0564>] >> (__down_read+0xa8/0xe0) >> [<804d0564>] (__down_read+0xa8/0xe0) from [<800478c4>] >> (do_page_fault+0xbc/0x480) >> [<800478c4>] (do_page_fault+0xbc/0x480) from [<8003841c>] >> (do_DataAbort+0x34/0x98) >> [<8003841c>] (do_DataAbort+0x34/0x98) from [<8003df10>] >> (__dabt_svc+0x70/0xa0) >> Exception stack(0xbae37ea8 to 0xbae37ef0) >> 7ea0: 31e05000 31e1d000 00000020 0000001f 31e05000 31e1d000 >> 7ec0: bfac86b8 31e05000 31e1d000 bae36000 08100075 31e056fc 31e08000 >> bae37ef0 >> 7ee0: 800424a8 8004a1fc 800f0013 ffffffff >> [<8003df10>] (__dabt_svc+0x70/0xa0) from [<8004a1fc>] >> (v7_coherent_kern_range+0x20/0x80) >> [<8004a1fc>] (v7_coherent_kern_range+0x20/0x80) from [<800424a8>] >> (arm_syscall+0x2a0/0x2c4) >> [<800424a8>] (arm_syscall+0x2a0/0x2c4) from [<8003e500>] >> (ret_fast_syscall+0x0/0x3c) > Please can you try the patch posted here: > > http://lists.arm.linux.org.uk/lurker/message/20111107.173344.f738392e.en.html I tested this patch. It fixed this bug. This bug did not occur any more. But my system still hung at futex. I think the futex issue is another bug.(will this patch affect the futex?) So : Tested-by: Huang Shijie <b32955@freescale.com> BR Huang Shijie > If it fixes your problem, please consider giving a tested-by. > > Will > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Bug in v7_coherent_kern_range() ? 2012-04-06 3:35 ` Huang Shijie @ 2012-04-10 9:22 ` Will Deacon 2012-04-10 10:30 ` Huang Shijie 0 siblings, 1 reply; 20+ messages in thread From: Will Deacon @ 2012-04-10 9:22 UTC (permalink / raw) To: linux-arm-kernel On Fri, Apr 06, 2012 at 04:35:09AM +0100, Huang Shijie wrote: > > > > http://lists.arm.linux.org.uk/lurker/message/20111107.173344.f738392e.en.html > I tested this patch. It fixed this bug. This bug did not occur any more. > But my system still hung at futex. I think the futex issue is another > bug.(will this patch affect the futex?) If you're on an SMP system, can you check that you have df77abca ("ARM: 7099/1: futex: preserve oldval in SMP __futex_atomic_op") applied? > So : > Tested-by: Huang Shijie <b32955@freescale.com> Ok, thanks. It looks that has briefly revived the discussion over there at least. Will ^ permalink raw reply [flat|nested] 20+ messages in thread
* Bug in v7_coherent_kern_range() ? 2012-04-10 9:22 ` Will Deacon @ 2012-04-10 10:30 ` Huang Shijie 2012-04-10 10:35 ` Will Deacon [not found] ` <4F854992.9080601@freescale.com> 0 siblings, 2 replies; 20+ messages in thread From: Huang Shijie @ 2012-04-10 10:30 UTC (permalink / raw) To: linux-arm-kernel Hi will: > On Fri, Apr 06, 2012 at 04:35:09AM +0100, Huang Shijie wrote: >>> http://lists.arm.linux.org.uk/lurker/message/20111107.173344.f738392e.en.html >> I tested this patch. It fixed this bug. This bug did not occur any more. >> But my system still hung at futex. I think the futex issue is another >> bug.(will this patch affect the futex?) > If you're on an SMP system, can you check that you have df77abca ("ARM: > 7099/1: futex: preserve oldval in SMP __futex_atomic_op") applied? > already applied. thanks. The futex codes (/kernel/futex.c and arch/arm/include/asm/futex.h) are the latest. I guess there is a bug in the futex code in SMP system. Best Regards Huang Shijie >> So : >> Tested-by: Huang Shijie<b32955@freescale.com> > Ok, thanks. It looks that has briefly revived the discussion over there at > least. > > Will > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Bug in v7_coherent_kern_range() ? 2012-04-10 10:30 ` Huang Shijie @ 2012-04-10 10:35 ` Will Deacon [not found] ` <4F854992.9080601@freescale.com> 1 sibling, 0 replies; 20+ messages in thread From: Will Deacon @ 2012-04-10 10:35 UTC (permalink / raw) To: linux-arm-kernel On Tue, Apr 10, 2012 at 11:30:52AM +0100, Huang Shijie wrote: > > On Fri, Apr 06, 2012 at 04:35:09AM +0100, Huang Shijie wrote: > >>> http://lists.arm.linux.org.uk/lurker/message/20111107.173344.f738392e.en.html > >> I tested this patch. It fixed this bug. This bug did not occur any more. > >> But my system still hung at futex. I think the futex issue is another > >> bug.(will this patch affect the futex?) > > If you're on an SMP system, can you check that you have df77abca ("ARM: > > 7099/1: futex: preserve oldval in SMP __futex_atomic_op") applied? > > > > already applied. > thanks. > > The futex codes (/kernel/futex.c and arch/arm/include/asm/futex.h) are > the latest. > I guess there is a bug in the futex code in SMP system. Ok. Can you please: (a) Make your test case available somewhere? (b) Try a more recent mainline kernel (3.3)? Also - which libc are you using? Some older library implementations incorrectly use swp for atomicity. If you have swp emulation enabled, this could cause a lock-up. Do you have CONFIG_SWP_EMULATE=y? Will ^ permalink raw reply [flat|nested] 20+ messages in thread
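The swp concern is about how user space implements atomics. As a minimal sketch, assuming a C11 compiler, a lock written with <stdatomic.h> leaves the instruction choice to the compiler; when targeting ARMv7 (such as the Cortex-A9 here), GCC typically emits ldrex/strex loops for these operations rather than the deprecated swp. The toy_spinlock type is hypothetical and is not glibc's implementation; glibc's locks additionally sleep in the kernel via futex() instead of spinning indefinitely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set spinlock built only on C11 atomics. */
typedef struct {
    atomic_flag locked;
} toy_spinlock;

#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns true if the lock was free and is now held. */
static bool toy_trylock(toy_spinlock *l)
{
    /* Acquire ordering: later accesses cannot be hoisted above the lock. */
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Spin until the lock is obtained. */
static void toy_lock(toy_spinlock *l)
{
    while (!toy_trylock(l))
        ; /* busy-wait; a real lock would back off or sleep */
}

static void toy_unlock(toy_spinlock *l)
{
    /* Release ordering: earlier memory accesses are ordered before the unlock. */
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Usage: while the lock is held, a second acquisition attempt must fail. */
static void toy_spinlock_demo(void)
{
    toy_spinlock l = TOY_SPINLOCK_INIT;
    bool first = toy_trylock(&l);
    bool second = toy_trylock(&l);
    assert(first && !second);
    toy_unlock(&l);
}
```

The demo only checks mutual exclusion in a single thread; whether a given libc build actually avoids swp still has to be confirmed by disassembling it.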
[parent not found: <4F854992.9080601@freescale.com>]
* Bug in v7_coherent_kern_range() ? [not found] ` <4F854992.9080601@freescale.com> @ 2012-04-11 10:24 ` Will Deacon 2012-04-11 11:02 ` Fabio Estevam 2012-05-10 2:51 ` Huang Shijie 0 siblings, 2 replies; 20+ messages in thread From: Will Deacon @ 2012-04-11 10:24 UTC (permalink / raw) To: linux-arm-kernel On Wed, Apr 11, 2012 at 10:06:26AM +0100, Huang Shijie wrote: > Ok. Can you please: > > (a) Make your test case available somewhere? > > > I wish i could find a more common test case to reproduce this bug. But, i can't. > The only test case now is to run the gplay on our IMX6Q platform. Ok, that makes it tricky since I don't have gplay or an IMX6Q platform. > (b) Try a more recent mainline kernel (3.3)? > > > yes, I will try to test the linaro kernel. Can you not try vanilla mainline instead? Either way, let me know how you get on. > The info of the libc: > GNU libc version: 2.13 > GNU libc release: stable That looks new enough for swp not to be an issue. If you still have this problem with a newer kernel, we can try using the fallback SMP futex implementation and see if that works. Will ^ permalink raw reply [flat|nested] 20+ messages in thread
* Bug in v7_coherent_kern_range() ? 2012-04-11 10:24 ` Will Deacon @ 2012-04-11 11:02 ` Fabio Estevam 2012-04-16 5:48 ` Huang Shijie 2012-05-10 2:51 ` Huang Shijie 1 sibling, 1 reply; 20+ messages in thread From: Fabio Estevam @ 2012-04-11 11:02 UTC (permalink / raw) To: linux-arm-kernel On Wed, Apr 11, 2012 at 7:24 AM, Will Deacon <will.deacon@arm.com> wrote: >> I wish i could find a more common test case to reproduce this bug. But, i can't. >> The only test case now is to run the gplay on our IMX6Q platform. > > Ok, that makes it tricky since I don't have gplay or an IMX6Q platform. gplay is a C application that does the same thing as launching a simple Gstreamer pipeline like: gst-launch playbin2 uri=file:///home/file.mp4 Huang, Does the problem also occur if you don't use the VPU driver? I mean, does it also happen if you decode the file using software codecs. I would like to know if the issue you see is related to the VPU driver or not. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Bug in v7_coherent_kern_range() ? 2012-04-11 11:02 ` Fabio Estevam @ 2012-04-16 5:48 ` Huang Shijie 0 siblings, 0 replies; 20+ messages in thread From: Huang Shijie @ 2012-04-16 5:48 UTC (permalink / raw) To: linux-arm-kernel On 2012-04-11 19:02, Fabio Estevam wrote: > On Wed, Apr 11, 2012 at 7:24 AM, Will Deacon<will.deacon@arm.com> wrote: > >>> I wish i could find a more common test case to reproduce this bug. But, i can't. >>> The only test case now is to run the gplay on our IMX6Q platform. >> Ok, that makes it tricky since I don't have gplay or an IMX6Q platform. > gplay is a C application that does the same thing as launching a > simple Gstreamer pipeline like: > > gst-launch playbin2 uri=file:///home/file.mp4 > > Huang, > > Does the problem also occur if you don't use the VPU driver? I mean, I have not tested the case with the VPU disabled. > does it also happen if you decode the file using software codecs. I > would like to know if the issue you see is related to the VPU driver Can the VPU affect the futex? I am debugging a UART bug now. I will continue debugging this one when I finish the UART bug. Best Regards Huang Shijie > or not. > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Bug in v7_coherent_kern_range() ? 2012-04-11 10:24 ` Will Deacon 2012-04-11 11:02 ` Fabio Estevam @ 2012-05-10 2:51 ` Huang Shijie 2012-05-10 8:38 ` Will Deacon 1 sibling, 1 reply; 20+ messages in thread From: Huang Shijie @ 2012-05-10 2:51 UTC (permalink / raw) To: linux-arm-kernel Hi Will: > If you still have this problem with a newer kernel, we can try using the > fallback SMP futex implementation and see if that works. > After we update our application(gstreamer), the futex issue gone. So this is not a kernel bug, but an application bug. thanks for your help. Huang Shijie ^ permalink raw reply [flat|nested] 20+ messages in thread
* Bug in v7_coherent_kern_range() ? 2012-05-10 2:51 ` Huang Shijie @ 2012-05-10 8:38 ` Will Deacon 0 siblings, 0 replies; 20+ messages in thread From: Will Deacon @ 2012-05-10 8:38 UTC (permalink / raw) To: linux-arm-kernel On Thu, May 10, 2012 at 03:51:20AM +0100, Huang Shijie wrote: > Hi Will: > > If you still have this problem with a newer kernel, we can try using the > > fallback SMP futex implementation and see if that works. > > > After we update our application(gstreamer), the futex issue gone. > So this is not a kernel bug, but an application bug. That's good to hear, thanks for reporting back. Will ^ permalink raw reply [flat|nested] 20+ messages in thread
Thread overview: 20+ messages 2012-04-01 3:21 Bug in v7_coherent_kern_range() ? Huang Shijie 2012-04-01 6:10 ` Dirk Behme 2012-04-01 7:09 ` Huang Shijie 2012-04-01 8:01 ` Dirk Behme 2012-04-01 8:16 ` Huang Shijie 2012-04-01 8:50 ` Dirk Behme 2012-04-01 9:14 ` Huang Shijie 2012-04-01 8:57 ` Dirk Behme 2012-04-01 9:19 ` Huang Shijie 2012-04-01 9:19 ` Huang Shijie 2012-04-02 11:12 ` Will Deacon 2012-04-06 3:35 ` Huang Shijie 2012-04-10 9:22 ` Will Deacon 2012-04-10 10:30 ` Huang Shijie 2012-04-10 10:35 ` Will Deacon [not found] ` <4F854992.9080601@freescale.com> 2012-04-11 10:24 ` Will Deacon 2012-04-11 11:02 ` Fabio Estevam 2012-04-16 5:48 ` Huang Shijie 2012-05-10 2:51 ` Huang Shijie 2012-05-10 8:38 ` Will Deacon