* [U-Boot] armv7 DMA and cache mangement functions
From: Markus Niebel @ 2015-08-24 7:54 UTC
To: u-boot

Hello,

I'm not an expert in the low-level details of this area, so I apologize if
this post contains wrong assumptions.

Hardware: i.MX6 Solo (TQMa6 on custom mainboard)
U-Boot: 2014.10
gcc: 4.8.3

We see an error using TFTP on i.MX6 that seems to be triggered when the
code / data size goes over a certain limit. The code changes involved have
nothing to do with the network stack, network drivers, or memory management.
TFTP becomes completely unusable: the device frequently sees erroneous packets
with a variety of weird errors. If the code stays below this size, everything
works fine.

So far we have checked a lot of things. The following observations led us to
assume that this could be cache related:

  dynamically disabling the data cache before doing TFTP: TFTP works well again
  running with the L2 cache disabled (data cache enabled): TFTP works well again

Looking at the code in drivers/net/fec_mxc.c, function fec_recv, we see a call
to invalidate_dcache_range before the received Ethernet data is accessed. When
looking at the code for invalidate_dcache_range in
arch/arm/cpu/armv7/cache_v7.c and comparing it with how this is done in Linux
and barebox, we noticed that the order of L2 cache / data cache invalidation is
exactly the opposite there. When we apply the Linux / barebox order to the
receive code for fec_mxc, TFTP works again.

Question: is the order of cache invalidation important?

Thanks
Markus

* [U-Boot] armv7 DMA and cache mangement functions
From: Markus Niebel @ 2015-08-27 17:19 UTC
To: u-boot

Just a friendly ping:

On 24.08.2015 at 09:54, Markus Niebel wrote:
> [...]
> Question: is the order of cache invalidation important?

* [U-Boot] armv7 DMA and cache mangement functions
From: Mark Rutland @ 2015-08-27 17:41 UTC
To: u-boot

On Mon, Aug 24, 2015 at 08:54:17AM +0100, Markus Niebel wrote:
> [...]
> Question: is the order of cache invalidation important?

The order is important.

Consider the case where both the external and architected caches contain
stale (but clean) cache lines for the region you care about.

If you invalidate the architected caches before the external L2, the
architected caches may speculatively fetch (stale) data from the L2
before the L2 is invalidated, and so in the end you may still see stale
data in the architected caches.

If you invalidate the L2 first, the architected caches could
speculatively fetch from the L2 (stale) or memory (new) while this is in
progress, but they will then be invalidated, and from then on can only
fetch the new data.

That assumes that both levels were clean to begin with. If they are not,
then additional maintenance is required. It's also conceivable that
caches could be implemented such that the above is insufficient, YMMV.

Thanks,
Mark.
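
The ordering argument in the reply above can be illustrated with a small toy model in C. This is only a sketch under stated assumptions: a single cache line, one architected (L1) level and one external L2, both holding stale but clean data, and a speculative fetch that happens at the worst possible moment between the two invalidation steps. The names `struct system_model`, `speculative_fetch`, `read_after_inner_then_outer` and `read_after_outer_then_inner` are hypothetical and do not exist in U-Boot; no real cache maintenance operations are issued.

```c
#include <stdbool.h>

enum { STALE_DATA = 0, NEW_DATA = 1 };

/* One cache line at one level of the toy hierarchy. */
struct level {
	bool valid;
	int data;
};

/* Hypothetical model: DRAM plus an external L2 and an architected L1. */
struct system_model {
	int memory;
	struct level l2;
	struct level l1;
};

/* DMA has written new data to memory; both caches hold stale, clean lines. */
static struct system_model stale_system(void)
{
	struct system_model s = {
		.memory = NEW_DATA,
		.l2 = { .valid = true, .data = STALE_DATA },
		.l1 = { .valid = true, .data = STALE_DATA },
	};
	return s;
}

/* Refill L1 if it is invalid: from L2 on an L2 hit, otherwise via L2 from memory. */
static void speculative_fetch(struct system_model *s)
{
	if (s->l1.valid)
		return;
	if (!s->l2.valid) {
		s->l2.data = s->memory;
		s->l2.valid = true;
	}
	s->l1 = s->l2;
}

/* A CPU read hits in L1 if the line is valid, otherwise refills it first. */
static int cpu_read(struct system_model *s)
{
	speculative_fetch(s);
	return s->l1.data;
}

/* Architected cache first, external L2 second (the order questioned in the thread). */
int read_after_inner_then_outer(void)
{
	struct system_model s = stale_system();

	s.l1.valid = false;	/* invalidate the architected cache */
	speculative_fetch(&s);	/* L1 refills from the still-stale L2 */
	s.l2.valid = false;	/* invalidate the L2, too late for L1 */
	return cpu_read(&s);
}

/* External L2 first, architected cache second. */
int read_after_outer_then_inner(void)
{
	struct system_model s = stale_system();

	s.l2.valid = false;	/* invalidate the L2 */
	speculative_fetch(&s);	/* L1 is still valid, nothing is refilled */
	s.l1.valid = false;	/* invalidate the architected cache */
	return cpu_read(&s);
}
```

In this model `read_after_inner_then_outer()` returns `STALE_DATA` and `read_after_outer_then_inner()` returns `NEW_DATA`. Real hardware has many lines, prefetchers and possibly dirty data, so the model only mirrors the reasoning and makes no claim about a specific cache implementation.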

* [U-Boot] armv7 DMA and cache mangement functions
From: Markus Niebel @ 2015-08-28 8:43 UTC
To: u-boot

On 27.08.2015 at 19:41, Mark Rutland wrote:
> On Mon, Aug 24, 2015 at 08:54:17AM +0100, Markus Niebel wrote:
>> [...]
>> Question: is the order of cache invalidation important?
>
> The order is important.
> [...]

Thank you for the clarification. Given your statement, the code in question in
arch/arm/cpu/armv7/cache_v7.c

/*
 * Invalidates range in all levels of D-cache/unified cache used:
 * Affects the range [start, stop - 1]
 */
void invalidate_dcache_range(unsigned long start, unsigned long stop)
{
	v7_dcache_maint_range(start, stop, ARMV7_DCACHE_INVAL_RANGE);

	v7_outer_cache_inval_range(start, stop);
}

is wrong, and so is the flush_dcache_all logic. Given

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka14802.html

the cache invalidate handler needs to be fixed in a way that also takes into
account the platforms providing atomic maintenance.

Correct me if I'm wrong.

Regards
Markus
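
A minimal sketch of the reordering discussed in this message, external cache first and architected caches second, might look like the following. It is not the upstream fix. The two helpers are stand-ins with simplified signatures that only record the call order; `invalidate_dcache_range_outer_first`, `call_log` and `call_count` are hypothetical names; the value of `ARMV7_DCACHE_INVAL_RANGE` is a placeholder; and barriers as well as platforms with atomic maintenance operations are not handled.

```c
#include <stddef.h>

/* Placeholder value; the real constant lives in the U-Boot armv7 headers. */
#define ARMV7_DCACHE_INVAL_RANGE 1

/* Hypothetical bookkeeping so the call order can be inspected. */
enum maint_step { STEP_NONE = 0, STEP_OUTER_INVAL, STEP_INNER_INVAL };

#define CALL_LOG_SIZE 4
static enum maint_step call_log[CALL_LOG_SIZE];
static size_t call_count;

static void record_step(enum maint_step step)
{
	if (call_count < CALL_LOG_SIZE)
		call_log[call_count] = step;
	call_count++;
}

/* Stand-in: the real helper programs the external L2 controller. */
static void v7_outer_cache_inval_range(unsigned long start, unsigned long stop)
{
	(void)start;
	(void)stop;
	record_step(STEP_OUTER_INVAL);
}

/* Stand-in: the real helper issues maintenance on the architected caches. */
static void v7_dcache_maint_range(unsigned long start, unsigned long stop,
				  unsigned int range_op)
{
	(void)start;
	(void)stop;
	if (range_op == ARMV7_DCACHE_INVAL_RANGE)
		record_step(STEP_INNER_INVAL);
}

/*
 * Invalidate the range [start, stop - 1]: external L2 first, so that a
 * speculative refill of the architected caches can no longer hit a stale
 * L2 line once those caches have been invalidated as well.
 */
void invalidate_dcache_range_outer_first(unsigned long start, unsigned long stop)
{
	v7_outer_cache_inval_range(start, stop);

	v7_dcache_maint_range(start, stop, ARMV7_DCACHE_INVAL_RANGE);
}
```

The sketch covers invalidation only. As noted earlier in the thread, this ordering assumes both levels are clean to begin with; dirty lines need additional maintenance.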