public inbox for u-boot@lists.denx.de
* [U-Boot] Unaligned flush_dcache_range in axs101.c
@ 2016-04-04  7:38 Alexander Graf
  2016-04-11 17:48 ` Alexey Brodkin
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Graf @ 2016-04-04  7:38 UTC (permalink / raw)
  To: u-boot

Hi Alexey,

Marek just pointed out to me the fact that flush_dcache_range on arm
expects cache line aligned arguments. However, it seems like in axs101.c
we have an unaligned cache flush:

  flush_dcache_range(RESET_VECTOR_ADDR, RESET_VECTOR_ADDR + sizeof(int));

Could you please verify whether this is correct and if not just send a
quick patch to fix it?


Thanks!

Alex

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [U-Boot] Unaligned flush_dcache_range in axs101.c
  2016-04-04  7:38 [U-Boot] Unaligned flush_dcache_range in axs101.c Alexander Graf
@ 2016-04-11 17:48 ` Alexey Brodkin
  2016-04-11 17:54   ` Marek Vasut
  0 siblings, 1 reply; 10+ messages in thread
From: Alexey Brodkin @ 2016-04-11 17:48 UTC (permalink / raw)
  To: u-boot

Hi Alex,

On Mon, 2016-04-04 at 09:38 +0200, Alexander Graf wrote:
> Hi Alexey,
> 
> Marek just pointed out to me the fact that flush_dcache_range on arm
> expects cache line aligned arguments. However, it seems like in axs101.c
> we have an unaligned cache flush:
> 
>   flush_dcache_range(RESET_VECTOR_ADDR, RESET_VECTOR_ADDR + sizeof(int));
> 
> Could you please verify whether this is correct and if not just send a
> quick patch to fix it?

First, this code is for support of the Synopsys DesignWare AXS10x boards.
And AFAIK there's no such board that sports an ARM CPU instead of an ARC.
So I'm wondering how you bumped into that issue?

Then I'm not really sure there's a common requirement for the arguments of
flush_dcache_range(). At least in "include/common.h" there's no comment about
that.

Personally I'd say it's up to each arch or SoC to implement flush_dcache_range()
so that it works properly on that particular hardware. I wouldn't like to
overcomplicate high-level code with low-level details such as cache lines
if that is not really necessary.

Please correct me if I'm missing something here.

-Alexey


* [U-Boot] Unaligned flush_dcache_range in axs101.c
  2016-04-11 17:48 ` Alexey Brodkin
@ 2016-04-11 17:54   ` Marek Vasut
  2016-04-11 18:13     ` Alexey Brodkin
  0 siblings, 1 reply; 10+ messages in thread
From: Marek Vasut @ 2016-04-11 17:54 UTC (permalink / raw)
  To: u-boot

On 04/11/2016 07:48 PM, Alexey Brodkin wrote:
> Hi Alex,
> 
> On Mon, 2016-04-04 at 09:38 +0200, Alexander Graf wrote:
>> Hi Alexey,
>>
>> Marek just pointed out to me the fact that flush_dcache_range on arm
>> expects cache line aligned arguments. However, it seems like in axs101.c
>> we have an unaligned cache flush:
>>
>>   flush_dcache_range(RESET_VECTOR_ADDR, RESET_VECTOR_ADDR + sizeof(int));
>>
>> Could you please verify whether this is correct and if not just send a
>> quick patch to fix it?
> 
> First this code is for support of Synopsys DesignWare AXS10x boards.
> And AFAIK there's no such board that may sport ARM CPU instead or ARC.
> So I'm wondering how did you bumped into that [issue?]?
> 
> Then I'm not really sure if there's a common requirement for arguments of
> flush_dcache_range(). At least in "include/common.h" there's no comment about
> that.

Such a comment should then be added. Sub-cacheline flush/invalidate calls
can corrupt surrounding data.

> Personally I'd say this is up to each arch or SoC to implement flush_dcache_range()
> so it works properly on that particular hardware. I wouldn't like to
> overcomplicate high-level stuff with low-level details such as cache lines etc
> if that is not really necessary.
> 
> Please correct me if I'm missing something here.
> 
> -Alexey
> 


-- 
Best regards,
Marek Vasut


* [U-Boot] Unaligned flush_dcache_range in axs101.c
  2016-04-11 17:54   ` Marek Vasut
@ 2016-04-11 18:13     ` Alexey Brodkin
  2016-04-11 18:48       ` Marek Vasut
  0 siblings, 1 reply; 10+ messages in thread
From: Alexey Brodkin @ 2016-04-11 18:13 UTC (permalink / raw)
  To: u-boot

Hi Marek,

On Mon, 2016-04-11 at 19:54 +0200, Marek Vasut wrote:
> On 04/11/2016 07:48 PM, Alexey Brodkin wrote:
> > 
> > Hi Alex,
> > 
> > On Mon, 2016-04-04 at 09:38 +0200, Alexander Graf wrote:
> > > 
> > > Hi Alexey,
> > > 
> > > Marek just pointed out to me the fact that flush_dcache_range on arm
> > > expects cache line aligned arguments. However, it seems like in axs101.c
> > > we have an unaligned cache flush:
> > > 
> > >   flush_dcache_range(RESET_VECTOR_ADDR, RESET_VECTOR_ADDR + sizeof(int));
> > > 
> > > Could you please verify whether this is correct and if not just send a
> > > quick patch to fix it?
> > First this code is for support of Synopsys DesignWare AXS10x boards.
> > And AFAIK there's no such board that may sport ARM CPU instead or ARC.
> > So I'm wondering how did you bumped into that [issue?]?
> > 
> > Then I'm not really sure if there's a common requirement for arguments of
> > flush_dcache_range(). At least in "include/common.h" there's no comment about
> > that.
> Such comment should then be added. Sub-cacheline flush/invalidate calls
> can corrupt surrounding data.

Well, it's not that simple really.

For example, here's what we have on ARC:

[1] We may deal with each cache line separately. And BTW that's what we have
    now in U-Boot, see http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arc/lib/cache.c#l328
    In that case we only specify the address of the cache line start, and
    the whole line will be processed by HW regardless of its length.

[2] We may also deal with ranges (though this is not implemented in U-Boot yet).
    In that case we set the addresses of the range's beginning and end.
    But if the start address actually falls in the middle of a cache line,
    the whole line will still be processed. And the same is valid for the
    end of the region.

So from the above I conclude that it's more important to place data properly
in memory. I.e. if you put 2 completely independent entities in 1 cache line
you won't be able to deal with the cache entries for them separately (at least
on ARC).

I'm not really sure whether ARM or any other arch can invalidate/write back
only part of a cache line in hardware - that might very well be the case.

But in the original case my implementation makes perfect sense, because what
it does is write back instructions modified by the active CPU so other CPUs
may see these changes. Ideally I'd like an option to write back only 1 CPU
word (because that's all I really modified), but this is not possible due to
the HW limitations described above.

-Alexey


* [U-Boot] Unaligned flush_dcache_range in axs101.c
  2016-04-11 18:13     ` Alexey Brodkin
@ 2016-04-11 18:48       ` Marek Vasut
  2016-04-15 13:00         ` Alexey Brodkin
  0 siblings, 1 reply; 10+ messages in thread
From: Marek Vasut @ 2016-04-11 18:48 UTC (permalink / raw)
  To: u-boot

On 04/11/2016 08:13 PM, Alexey Brodkin wrote:
> Hi Marek,
> 
> On Mon, 2016-04-11 at 19:54 +0200, Marek Vasut wrote:
>> On 04/11/2016 07:48 PM, Alexey Brodkin wrote:
>>>
>>> Hi Alex,
>>>
>>> On Mon, 2016-04-04 at 09:38 +0200, Alexander Graf wrote:
>>>>
>>>> Hi Alexey,
>>>>
>>>> Marek just pointed out to me the fact that flush_dcache_range on arm
>>>> expects cache line aligned arguments. However, it seems like in axs101.c
>>>> we have an unaligned cache flush:
>>>>
>>>>   flush_dcache_range(RESET_VECTOR_ADDR, RESET_VECTOR_ADDR + sizeof(int));
>>>>
>>>> Could you please verify whether this is correct and if not just send a
>>>> quick patch to fix it?
>>> First this code is for support of Synopsys DesignWare AXS10x boards.
>>> And AFAIK there's no such board that may sport ARM CPU instead or ARC.
>>> So I'm wondering how did you bumped into that [issue?]?
>>>
>>> Then I'm not really sure if there's a common requirement for arguments of
>>> flush_dcache_range(). At least in "include/common.h" there's no comment about
>>> that.
>> Such comment should then be added. Sub-cacheline flush/invalidate calls
>> can corrupt surrounding data.
> 
> Well this is not that simple really.
> 
> For example that's what we have on ARC:
> 
> [1] We may deal with each cache line separately. And BTW that's what we have
>     now in U-boot, see http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arc/lib/cache.c#l328
>     In that case we only mention address of cache line start and regardless of its length
>     line will be processed by HW.

Can you flush/invalidate on sub-cacheline level? If not, you should not
paper over the issue and avoid re-aligning the end address.

> [2] We may although deal with ranges as well (still this is not implemented in u-boot yet).
>     In that case we need to set addresses of range beginning and end.
>     But if start address falls actually in the middle of cache line it will be processed.
>     And the same is valid for end of the region.

Same complaint as above, if you cannot flush with sub-cacheline
granularity, do not paper over the issue by re-aligning the start address.

> So from above I may conclude that it's more important to place data properly in memory.
> I.e. if you put 2 completely independent substances in 1 cache line you won't be able to
> deal with cache entries for them separately (at least on ARC).
> 
> I'm not really sure if ARM or any other arch in hardware may invalidate/writeback only part of
> one cache line - that might very well be the case.

It can not, which is the reason we have a warning on sub-cacheline
invalidate/flush on ARM.

> But in the original case my implementation makes perfect sense because what it does
> it writes back instructions modified by the active CPU so others may see these changes.
> and here I'd like ideally to have an option to write back only 1 CPU word (because that's
> what I really modified) but this is not possible due to described above limitations of our HW.

Consider a layout where you have first half of the cacheline used as
DMAble memory while the second half of the cacheline is used for some
variable, say some counter.

Consider this sequence of operations:

1. start DMA
2. counter++
3. invalidate_dcache_line
4. read DMAed buffer
5. print the counter

What will happen in step 5 ? Will you get the incremented counter or the
stale data which were in the DRAM ?

I see it this way -- In step 3, the increment of counter performed in
step 2 will be discarded. In step 4, the cacheline will be re-populated
by stale data from DRAM and in step 5, you will print the old value of
the counter.

> -Alexey
> 


-- 
Best regards,
Marek Vasut


* [U-Boot] Unaligned flush_dcache_range in axs101.c
  2016-04-11 18:48       ` Marek Vasut
@ 2016-04-15 13:00         ` Alexey Brodkin
  2016-04-15 13:49           ` Marek Vasut
  0 siblings, 1 reply; 10+ messages in thread
From: Alexey Brodkin @ 2016-04-15 13:00 UTC (permalink / raw)
  To: u-boot

Hi Marek,

On Mon, 2016-04-11 at 20:48 +0200, Marek Vasut wrote:
> On 04/11/2016 08:13 PM, Alexey Brodkin wrote:
> > 
> > Hi Marek,
> > 
> > On Mon, 2016-04-11 at 19:54 +0200, Marek Vasut wrote:
> > > 
> > > On 04/11/2016 07:48 PM, Alexey Brodkin wrote:
> > > > 
> > > > 
> > > > Hi Alex,
> > > > 
> > > > On Mon, 2016-04-04 at 09:38 +0200, Alexander Graf wrote:
> > > > > 
> > > > > 
> > > > > Hi Alexey,
> > > > > 
> > > > > Marek just pointed out to me the fact that flush_dcache_range on arm
> > > > > expects cache line aligned arguments. However, it seems like in axs101.c
> > > > > we have an unaligned cache flush:
> > > > > 
> > > > >   flush_dcache_range(RESET_VECTOR_ADDR, RESET_VECTOR_ADDR + sizeof(int));
> > > > > 
> > > > > Could you please verify whether this is correct and if not just send a
> > > > > quick patch to fix it?
> > > > First this code is for support of Synopsys DesignWare AXS10x boards.
> > > > And AFAIK there's no such board that may sport ARM CPU instead or ARC.
> > > > So I'm wondering how did you bumped into that [issue?]?
> > > > 
> > > > Then I'm not really sure if there's a common requirement for arguments of
> > > > flush_dcache_range(). At least in "include/common.h" there's no comment about
> > > > that.
> > > Such comment should then be added. Sub-cacheline flush/invalidate calls
> > > can corrupt surrounding data.
> > Well this is not that simple really.
> > 
> > For example that's what we have on ARC:
> > 
> > [1] We may deal with each cache line separately. And BTW that's what we have
> >     now in U-boot, see http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arc/lib/cache.c#l328
> >     In that case we only mention address of cache line start and regardless of its length
> >     line will be processed by HW.
> Can you flush/invalidate on sub-cacheline level? If not, you should not
> paper over the issue and avoid re-aligning the end address.
> 
> > 
> > [2] We may although deal with ranges as well (still this is not implemented in u-boot yet).
> >     In that case we need to set addresses of range beginning and end.
> >     But if start address falls actually in the middle of cache line it will be processed.
> >     And the same is valid for end of the region.
> Same complaint as above, if you cannot flush with sub-cacheline
> granularity, do not paper over the issue by re-aligning the start address.
> 
> > 
> > So from above I may conclude that it's more important to place data properly in memory.
> > I.e. if you put 2 completely independent substances in 1 cache line you won't be able to
> > deal with cache entries for them separately (at least on ARC).
> > 
> > I'm not really sure if ARM or any other arch in hardware may invalidate/writeback only part of
> > one cache line - that might very well be the case.
> It can not, which is the reason we have a warning on sub-cacheline
> invalidate/flush on ARM.
> 
> > 
> > But in the original case my implementation makes perfect sense because what it does
> > it writes back instructions modified by the active CPU so others may see these changes.
> > and here I'd like ideally to have an option to write back only 1 CPU word (because that's
> > what I really modified) but this is not possible due to described above limitations of our HW.
> Consider a layout where you have first half of the cacheline used as
> DMAble memory while the second half of the cacheline is used for some
> variable, say some counter.
> 
> Consider this sequence of operations:
> 
> 1. start DMA
> 2. counter++
> 3. invalidate_dcache_line
> 4. read DMAed buffer
> 5. print the counter
> 
> What will happen in step 5 ? Will you get the incremented counter or the
> stale data which were in the DRAM ?
> 
> I see it this way -- In step 3, the increment of counter performed in
> step 2 will be discarded. In step 4, the cacheline will be re-populated
> by stale data from DRAM and in step 5, you will print the old value of
> the counter.

Let's not mix data and the tools that operate on that data.

Cache management functions should be implemented per arch or platform such
that they match the requirements of the underlying hardware. If the hardware
may only work on whole cache lines then the cache routines must do exactly that.

As an extra option there could be a check of the input arguments in those
functions that warns a user if he or she passes unaligned start and/or end
addresses.

Now, speaking about data layout, your example is very correct and we saw a lot
of such real use-cases - a driver developer should design the data layout such
that cache management doesn't lead to data corruption.

But again, the code in question has nothing to do with cache lines.
I only need to write back 1 word and don't care what really happens to
all the remaining words in the same cache line. And my implementation of
flush_dcache_range() will do everything perfectly correctly - it will
flush exactly the 1 line that contains the word I modified.

-Alexey


* [U-Boot] Unaligned flush_dcache_range in axs101.c
  2016-04-15 13:00         ` Alexey Brodkin
@ 2016-04-15 13:49           ` Marek Vasut
  2016-05-26 11:39             ` Alexey Brodkin
  0 siblings, 1 reply; 10+ messages in thread
From: Marek Vasut @ 2016-04-15 13:49 UTC (permalink / raw)
  To: u-boot

On 04/15/2016 03:00 PM, Alexey Brodkin wrote:
> Hi Marek,
> 
> On Mon, 2016-04-11 at 20:48 +0200, Marek Vasut wrote:
>> On 04/11/2016 08:13 PM, Alexey Brodkin wrote:
>>>
>>> Hi Marek,
>>>
>>> On Mon, 2016-04-11 at 19:54 +0200, Marek Vasut wrote:
>>>>
>>>> On 04/11/2016 07:48 PM, Alexey Brodkin wrote:
>>>>>
>>>>>
>>>>> Hi Alex,
>>>>>
>>>>> On Mon, 2016-04-04 at 09:38 +0200, Alexander Graf wrote:
>>>>>>
>>>>>>
>>>>>> Hi Alexey,
>>>>>>
>>>>>> Marek just pointed out to me the fact that flush_dcache_range on arm
>>>>>> expects cache line aligned arguments. However, it seems like in axs101.c
>>>>>> we have an unaligned cache flush:
>>>>>>
>>>>>>   flush_dcache_range(RESET_VECTOR_ADDR, RESET_VECTOR_ADDR + sizeof(int));
>>>>>>
>>>>>> Could you please verify whether this is correct and if not just send a
>>>>>> quick patch to fix it?
>>>>> First this code is for support of Synopsys DesignWare AXS10x boards.
>>>>> And AFAIK there's no such board that may sport ARM CPU instead or ARC.
>>>>> So I'm wondering how did you bumped into that [issue?]?
>>>>>
>>>>> Then I'm not really sure if there's a common requirement for arguments of
>>>>> flush_dcache_range(). At least in "include/common.h" there's no comment about
>>>>> that.
>>>> Such comment should then be added. Sub-cacheline flush/invalidate calls
>>>> can corrupt surrounding data.
>>> Well this is not that simple really.
>>>
>>> For example that's what we have on ARC:
>>>
>>> [1] We may deal with each cache line separately. And BTW that's what we have
>>>     now in U-boot, see http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arc/lib/cache.c#l328
>>>     In that case we only mention address of cache line start and regardless of its length
>>>     line will be processed by HW.
>> Can you flush/invalidate on sub-cacheline level? If not, you should not
>> paper over the issue and avoid re-aligning the end address.
>>
>>>
>>> [2] We may although deal with ranges as well (still this is not implemented in u-boot yet).
>>>     In that case we need to set addresses of range beginning and end.
>>>     But if start address falls actually in the middle of cache line it will be processed.
>>>     And the same is valid for end of the region.
>> Same complaint as above, if you cannot flush with sub-cacheline
>> granularity, do not paper over the issue by re-aligning the start address.
>>
>>>
>>> So from above I may conclude that it's more important to place data properly in memory.
>>> I.e. if you put 2 completely independent substances in 1 cache line you won't be able to
>>> deal with cache entries for them separately (at least on ARC).
>>>
>>> I'm not really sure if ARM or any other arch in hardware may invalidate/writeback only part of
>>> one cache line - that might very well be the case.
>> It can not, which is the reason we have a warning on sub-cacheline
>> invalidate/flush on ARM.
>>
>>>
>>> But in the original case my implementation makes perfect sense because what it does
>>> it writes back instructions modified by the active CPU so others may see these changes.
>>> and here I'd like ideally to have an option to write back only 1 CPU word (because that's
>>> what I really modified) but this is not possible due to described above limitations of our HW.
>> Consider a layout where you have first half of the cacheline used as
>> DMAble memory while the second half of the cacheline is used for some
>> variable, say some counter.
>>
>> Consider this sequence of operations:
>>
>> 1. start DMA
>> 2. counter++
>> 3. invalidate_dcache_line
>> 4. read DMAed buffer
>> 5. print the counter
>>
>> What will happen in step 5 ? Will you get the incremented counter or the
>> stale data which were in the DRAM ?
>>
>> I see it this way -- In step 3, the increment of counter performed in
>> step 2 will be discarded. In step 4, the cacheline will be re-populated
>> by stale data from DRAM and in step 5, you will print the old value of
>> the counter.
> 
> Let's not mix data and tools that operate with that data.
> 
> Cache management functions should be implemented per arch or platform and so
> that they match requirements of underlying hardware. If hardware may only work on
> whole cache line then cache routines must do exactly this.

Is ARC designed this way ?

> As an extra options there could be a check for input arguments in those functions
> that will warn a user if he or she passes unaligned start and/or end addresses.

Right, ARMv7 does it already, ARM926 too, others need fixing.

> Now speaking about data layout your example is very correct and we saw a lot of
> such real use-cases - developer of drivers should design data layout such that
> cache management doesn't lead to data corruption.

Right.

> But again the code in question has nothing to do with cache line.
> I only need to writeback 1 word and don't care what really happens with
> all remaining words in the same cache line. And my implementation of
> flush_dcache_range() will do everything perfectly correct - it will
> flush exactly 1 line that contains my word I modified.

And this will lead to silent corruption which is hard to track down and
debug. I would rather have a check-and-refuse behavior than
silently-fix-and-permit.

> -Alexey
> 


-- 
Best regards,
Marek Vasut


* [U-Boot] Unaligned flush_dcache_range in axs101.c
  2016-04-15 13:49           ` Marek Vasut
@ 2016-05-26 11:39             ` Alexey Brodkin
  2016-05-26 12:07               ` Vineet Gupta
  2016-05-27 13:21               ` Marek Vasut
  0 siblings, 2 replies; 10+ messages in thread
From: Alexey Brodkin @ 2016-05-26 11:39 UTC (permalink / raw)
  To: u-boot

Hi Marek,

On Fri, 2016-04-15 at 15:49 +0200, Marek Vasut wrote:
> On 04/15/2016 03:00 PM, Alexey Brodkin wrote:
> > Cache management functions should be implemented per arch or platform and so
> > that they match requirements of underlying hardware. If hardware may only work on
> > whole cache line then cache routines must do exactly this.
> Is ARC designed this way ?
> 
> > 
> > As an extra options there could be a check for input arguments in those functions
> > that will warn a user if he or she passes unaligned start and/or end addresses.
> Right, ARMv7 does it already, ARM926 too, others need fixing.
> 
> > 
> > Now speaking about data layout your example is very correct and we saw a lot of
> > such real use-cases - developer of drivers should design data layout such that
> > cache management doesn't lead to data corruption.
> Right.
> 
> > 
> > But again the code in question has nothing to do with cache line.
> > I only need to writeback 1 word and don't care what really happens with
> > all remaining words in the same cache line. And my implementation of
> > flush_dcache_range() will do everything perfectly correct - it will
> > flush exactly 1 line that contains my word I modified.
> And this will lead to silent corruption which is hard to track down and
> debug. I would rather have a check-and-refuse behavior than
> silently-fix-and-permit.

Even though that discussion is still not finished, a very related issue
actually hit me recently. So it looks like there's more to discuss here, and
probably with a wider audience.

So here's what happens: our newer ARC HS38 cores may have a so-called IO
Coherency (IOC) block. This hardware block basically makes sure the data
exchanged between CPU cores (we may have a single-core or multi-core CPU) and
DMA masters is coherent. That means if the IOC exists and is enabled in the
core we:
 1) Don't need to do manual cache management when sending data to peripherals
    and getting data back. The IOC does all the work for us.
 2) Must not do manual cache management when sending data to peripherals and
    getting data back, simply because our manual actions will:
    a) Interfere with what the IOC already does
    b) Introduce extra performance overhead, which is what we wanted to get
       rid of with the introduction of the IOC

So with peripherals it is all cool now:
I was just disabling all cache management routines if the IOC gets detected on boot.

But returning back to your initial complaint.

In the code you were referring to, I wanted to modify the reset vector of the
slave core. And while we were living without the IOC it was all OK. My code
above wrote back (or as we used to call it within ARC, "flushed") the L1 data
cache with the modified reset vector contents to L2 (which is a combined data
and instruction cache in the case of ARC), and once the slave core was
unhalted it read the reset vector contents via the instruction fetch hardware
and executed exactly what I wanted.

Now with the IOC enabled and, as I wrote above, with cache operations
disabled, the new reset vector value gets stuck in the L1 data cache of the
master core. And the slave cores never get the proper value in the reset
vector, effectively going into the wild on unhalt.

The same actually applies to U-Boot relocation, and happens even on
single-core setups. We copy U-Boot's image to a new location in memory, and
until we explicitly flush the L1 data cache to the underlying L2 or DDR (if
no L2 exists), some of it may not make its way back to the CPU when it starts
fetching instructions from that new location.

So as a summary I would say that we may want to introduce different cache handling routines:
?[1] DMA related
?[2] Non-DMA related

Any thoughts? I'm especially interested in whether something similar happens
on other arches and how people deal with it.

-Alexey


* [U-Boot] Unaligned flush_dcache_range in axs101.c
  2016-05-26 11:39             ` Alexey Brodkin
@ 2016-05-26 12:07               ` Vineet Gupta
  2016-05-27 13:21               ` Marek Vasut
  1 sibling, 0 replies; 10+ messages in thread
From: Vineet Gupta @ 2016-05-26 12:07 UTC (permalink / raw)
  To: u-boot

On Thursday 26 May 2016 05:09 PM, Alexey Brodkin wrote:

> In the code you were referring what I wanted to modify reset vector of the slave core.
> And while we were living without IOC it was all OK. My code above wrote-back
> (or as we used to call it within ARC "flushed") L1 data cache with modified
> reset vector contents to L2 (which is combined data and instruction cache in case of ARC)
> and once slave core was unhalted it read reset vector contents via instruction
> fetch hardware and executed right what I wanted.

I don't have the full context here, but I think for this specific case (writing
the reset vector of the other core) you don't need to invent a new interface. You
can just do an uncached store (ST.di), which will ensure the data goes straight to
memory. You need to make sure this line is not in the other cores' caches, L1
and/or L2 - which can probably be assumed if they're just coming out of reset.

-Vineet


* [U-Boot] Unaligned flush_dcache_range in axs101.c
  2016-05-26 11:39             ` Alexey Brodkin
  2016-05-26 12:07               ` Vineet Gupta
@ 2016-05-27 13:21               ` Marek Vasut
  1 sibling, 0 replies; 10+ messages in thread
From: Marek Vasut @ 2016-05-27 13:21 UTC (permalink / raw)
  To: u-boot

On 05/26/2016 01:39 PM, Alexey Brodkin wrote:
> Hi Marek,

Hi!

> On Fri, 2016-04-15 at 15:49 +0200, Marek Vasut wrote:
>> On 04/15/2016 03:00 PM, Alexey Brodkin wrote:
>>> Cache management functions should be implemented per arch or platform and so
>>> that they match requirements of underlying hardware. If hardware may only work on
>>> whole cache line then cache routines must do exactly this.
>> Is ARC designed this way ?
>>
>>>
>>> As an extra options there could be a check for input arguments in those functions
>>> that will warn a user if he or she passes unaligned start and/or end addresses.
>> Right, ARMv7 does it already, ARM926 too, others need fixing.
>>
>>>
>>> Now speaking about data layout your example is very correct and we saw a lot of
>>> such real use-cases - developer of drivers should design data layout such that
>>> cache management doesn't lead to data corruption.
>> Right.
>>
>>>
>>> But again the code in question has nothing to do with cache line.
>>> I only need to writeback 1 word and don't care what really happens with
>>> all remaining words in the same cache line. And my implementation of
>>> flush_dcache_range() will do everything perfectly correct - it will
>>> flush exactly 1 line that contains my word I modified.
>> And this will lead to silent corruption which is hard to track down and
>> debug. I would rather have a check-and-refuse behavior than
>> silently-fix-and-permit.
> 
> Even though that discussion is still not finished very related issue
> actually hit me recently. So it looks like there's more here to discuss and
> probably with wider audience.
> 
> So what happens: in our newer ARC HS38 cores we may have so-called IO Coherency
> block. This hardware block basically makes sure data exchanged between CPU cores
> (we may have single-core CPU or multi-core CPU) and DMA masters are coherent.
> That means if IOC exists and enabled in the core we:
>  1) Don't need to do manual cache management when sending data to peripherals and
>     getting data back. IOC makes all work for us.
>  2) We must not do manual cache management when sending data to peripherals and
>     getting data back. Simply because our manual actions will:
>     a) Interfere with what IOC already does
>     b) Introduce extra performance overhead which we wanted to get rid of with
>        invention of IOC.
> 
> So with peripherals it is all cool now.
> I was just disabling all cache management routines if IOC gets detected on boot.
> 
> But returning back to your initial complaint.
> 
> In the code you were referring what I wanted to modify reset vector of the slave core.
> And while we were living without IOC it was all OK. My code above wrote-back
> (or as we used to call it within ARC "flushed") L1 data cache with modified
> reset vector contents to L2 (which is combined data and instruction cache in case of ARC)
> and once slave core was unhalted it read reset vector contents via instruction
> fetch hardware and executed right what I wanted.

Do you plan to run multiple U-Boot instances in parallel?
Btw. can you read the core ID to discern the master core (core 0) from the
slave cores (anything you need to unhalt)?

> Now with IOC enabled and as I wrote above with disabled cache operations new reset
> vector value gets stuck in L1 data cache of the master core. And slave cores never get
> proper value in reset vector effectively going into the wild on unhalt.
> 
> The same actually applies to U-Boot relocation and happens even on single-core setups.
> We copy U-Boot's image to the new location in the memory and until we explicitly flush
> L1 data cache to the underlying L2 or DDR (if L2 doesn't exist) something may not make its
> way back to the CPU when it starts fetching instructions from that new location.
> 
> So as a summary I would say that we may want to introduce different cache handling routines:
>  [1] DMA related
>  [2] Non-DMA related
> 
> Any thoughts? I'm especially interested if something similar happens on other arches and
> how people deal with it.

I would suggest you implement lowlevel_init() such that you check the
core ID and, if it's a slave core, just make it loop while invalidating
its caches. That way, the core will always fetch fresh data. Also,
you can make the secondary core check some variable which core 0
sets to make the slave exit the loop and continue execution. Maybe use
another variable to specify the execution address too.

> -Alexey
> 


-- 
Best regards,
Marek Vasut


end of thread, other threads:[~2016-05-27 13:21 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-04  7:38 [U-Boot] Unaligned flush_dcache_range in axs101.c Alexander Graf
2016-04-11 17:48 ` Alexey Brodkin
2016-04-11 17:54   ` Marek Vasut
2016-04-11 18:13     ` Alexey Brodkin
2016-04-11 18:48       ` Marek Vasut
2016-04-15 13:00         ` Alexey Brodkin
2016-04-15 13:49           ` Marek Vasut
2016-05-26 11:39             ` Alexey Brodkin
2016-05-26 12:07               ` Vineet Gupta
2016-05-27 13:21               ` Marek Vasut
