public inbox for u-boot@lists.denx.de
* [U-Boot-Users] MPC83xx data cache lock?
@ 2006-05-26  5:33 Liu Dave-r63238
  2006-05-28 22:24 ` Wolfgang Denk
  0 siblings, 1 reply; 8+ messages in thread
From: Liu Dave-r63238 @ 2006-05-26  5:33 UTC
  To: u-boot

Hi Wolfgang,

Here is a patch that improves DMA performance by about 6x.
It makes the DMA controller use cache-line burst reads and burst
writes when accessing DDR memory.

DMA, ECC on
ddr init duration: 1335 ms
 
DMA, ECC off
ddr init duration: 966 ms
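
For reference, the gain can be checked against the DMA init times
reported earlier in this thread (6945 ms with ECC on, 6558 ms with ECC
off). The sketch below is only illustrative arithmetic; the function
name speedup_x100 is made up for this example and is not part of the
patch.

```c
#include <stdio.h>

/* Speedup of new_ms relative to old_ms, in hundredths
 * (520 means 5.20x). Integer math is enough for ms timings.
 * Returns 0 if new_ms is 0 to avoid dividing by zero. */
unsigned speedup_x100(unsigned old_ms, unsigned new_ms)
{
	if (new_ms == 0)
		return 0;
	return (old_ms * 100u) / new_ms;
}

/* Print the two comparisons using the figures from this thread. */
void print_speedups(void)
{
	unsigned on  = speedup_x100(6945, 1335);  /* DMA, ECC on  */
	unsigned off = speedup_x100(6558, 966);   /* DMA, ECC off */

	printf("ECC on:  %u.%02ux\n", on / 100, on % 100);
	printf("ECC off: %u.%02ux\n", off / 100, off % 100);
}
```

With these figures the result is 5.20x with ECC on and 6.78x with ECC
off, which is consistent with the "about 6x" estimate.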

====================================
diff -r u-boot/cpu/mpc83xx/cpu.c u-boot-b/cpu/mpc83xx/cpu.c
260c260,262
<       dmamr0 = (DMA_CHANNEL_TRANSFER_MODE_DIRECT);
---
>       dmamr0 = (DMA_CHANNEL_TRANSFER_MODE_DIRECT |
>                       DMA_CHANNEL_SOURCE_ADDRESS_HOLD_8B |
>                       DMA_CHANNEL_SOURCE_ADRESSS_HOLD_EN);
diff -r u-boot/cpu/mpc83xx/cpu_init.c u-boot-b/cpu/mpc83xx/cpu_init.c
72,73d71
<       /* Set CSB bus pipeline depth */
<       im->arbiter.acr = 0x00030000;
diff -r u-boot/cpu/mpc83xx/spd_sdram.c u-boot-b/cpu/mpc83xx/spd_sdram.c
429c429
< #define CONFIG_DDR_ECC_INIT_VIA_DMA
---
> /* #define CONFIG_DDR_ECC_INIT_VIA_DMA */
======================================

Regards,
Dave


* [U-Boot-Users] MPC83xx data cache lock?
@ 2006-05-24  9:56 Liu Dave-r63238
  0 siblings, 0 replies; 8+ messages in thread
From: Liu Dave-r63238 @ 2006-05-24  9:56 UTC
  To: u-boot


> -----Original Message-----
> Just measure the time it takes to initialize ECC memory
> either using the cache or DMA methods; here is a short
> summary (don't complain - you asked for it!):
> 
> ----- quote begin -----
> 
> 1. Read vs. write performance
> 
> Writing to DDR memory is *much* slower than reading it.
> 
> ECC off
> read  duration: 509 ms
> write duration: 1546 ms
> 
> ECC on
> read  duration: 509 ms
> write duration: 5703 ms
> 
I ran the same test. My read vs. write results are:

ECC off
read duration: 4124 ms
write duration: 1516 ms

ECC on
read duration: 4634 ms
write duration: 5703 ms

Because all ways of the data cache are locked, the data cache behaves
as if it were cache-inhibited. We access memory with two instructions:
stw for 32-bit writes and lwz for 32-bit reads.

My write performance matches yours, but our read performance is very
different.

How did you perform the read accesses?

If you only load from memory into a variable and never reference that
variable afterwards, the compiler will optimize the load instruction
away, unless you declare the variable volatile.

I suggest you check the assembler output to make sure that the load
instruction is inside the loop and that the loop contains no other
memory access instructions.
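
A minimal sketch of a read loop that the compiler cannot optimize
away, assuming a plain C test routine. The function name read_sweep32
and its parameters are hypothetical; they are not taken from the
U-Boot sources or from the attachment.

```c
#include <stddef.h>
#include <stdint.h>

/* Read `words` consecutive 32-bit words starting at `base`.
 * The volatile qualifier obliges the compiler to perform every
 * load, so the loop is not removed even at high optimization
 * levels. On PowerPC each access would typically become one lwz. */
uint32_t read_sweep32(const volatile uint32_t *base, size_t words)
{
	uint32_t sink = 0;
	size_t i;

	for (i = 0; i < words; i++)
		sink ^= base[i];	/* one 32-bit load per iteration */

	return sink;	/* returning the value keeps it referenced */
}
```

The XOR into sink adds a register operation but no extra memory
access, so the disassembly of the loop should still show a single load
per iteration.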

With ECC enabled, the write duration is about 4x what it is with ECC
off. I think each sub-double-word write causes a read-modify-write
bus operation, which makes the write access take longer.
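
The read-modify-write argument can be sketched as a simple access
count. This is a deliberately simplified model, assuming a 64-bit data
bus and one extra read for every store narrower than a double word
when ECC is on; the names DDR_BUS_BYTES and dram_accesses are invented
for the illustration.

```c
#include <stddef.h>

#define DDR_BUS_BYTES 8u	/* assumed 64-bit bus: one double word */

/* Count the DRAM accesses needed to write `bytes` bytes with stores
 * of `store_bytes` each. With ECC on, a store narrower than the bus
 * is modelled as one read plus one write; otherwise as one write.
 * Returns 0 for a zero store size. */
size_t dram_accesses(size_t bytes, size_t store_bytes, int ecc_on)
{
	size_t stores;
	int needs_rmw;

	if (store_bytes == 0)
		return 0;

	stores = bytes / store_bytes;
	needs_rmw = ecc_on && store_bytes < DDR_BUS_BYTES;

	return needs_rmw ? stores * 2 : stores;
}
```

For a 64-byte region the model gives 16 accesses for 32-bit stores
with ECC off, 32 with ECC on, and 8 for 64-bit stores in either case.
It only counts accesses, so by itself it does not predict the measured
4x ratio.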

Why is the read time about triple the write time in my test?
I will look into this.

> There's no clear indication in both DDR (8349) docs and 
> Micron specification of our module on if and how read vs. 
> write operations differ in timing. There is one pointer for 
> the ECC case, which suggests writes can take three stages 
> (full read-modify-write cycle) instead of just one:
> 
> "9.5.4 SDRAM Interface Timing - If ECC is disabled, writes smaller
> than double words are performed by appropriately activating the data
> mask. If ECC is enabled, the controller performs a read-modify write."
> 
> The problem is we see 3x difference when the ECC is off, and 10x when 
> on. We also did a series of tests with various chunk sizes of data 
> written, so as to be sure we do not do the indicated sub-double word 
> writes, but the results were the same.
> 
Are you sure you are not doing sub-double-word writes?

I also ran a 64-bit read/write access test over the full memory space.

It accesses memory with double-precision floating-point load/store
instructions: lfd for 64-bit reads and stfd for 64-bit writes.

The code is in the attachment. The results are:

ECC off
read duration: 2317 ms
write duration: 774 ms

ECC on
read duration: 2317 ms
write duration: 774 ms

With ECC on, we perform double-word writes, so RMW cycles do not
occur.
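
One way to get 64-bit accesses from C is to go through a volatile
double pointer. This is only a sketch and is not the code from the
attachment: it assumes a hard-float build with the FPU enabled, in
which case a PowerPC compiler would normally emit stfd and lfd for
these accesses. In a soft-float build the compiler would split them
into 32-bit accesses, and inline assembly would be needed instead. The
names fill64 and sum64 are hypothetical.

```c
#include <stddef.h>

/* Write `pattern` to `dwords` consecutive 64-bit locations.
 * volatile forces one 64-bit store per iteration. */
void fill64(volatile double *base, size_t dwords, double pattern)
{
	size_t i;

	for (i = 0; i < dwords; i++)
		base[i] = pattern;
}

/* Read `dwords` consecutive 64-bit locations and return their sum,
 * so that every load is performed and its result is used. */
double sum64(const volatile double *base, size_t dwords)
{
	double acc = 0.0;
	size_t i;

	for (i = 0; i < dwords; i++)
		acc += base[i];

	return acc;
}
```

Because every store covers a full double word, the ECC controller has
no partial data to merge, which matches the identical write times with
ECC on and off.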


> This is really strange, although at least read operations are not
> affected by enabling ECC (which is according to the book - there
> should be minimal overhead put on read operations while ECC on, see
> 3. below).
> 
> 2. DMA (low) performance
> 
> Using DMA for transfers proves very inefficient. As mentioned
> earlier, the DMA module in 8349 is different than seen in other
> families, and it occurred to us a bit "alien" when compared with the
> rest of the chip (DMA documentation part is rather limited, and
> different in style etc.), as if taken from elsewhere. It is also
> peculiar in technical aspects: endianness used is different, so we
> need to convert the order explicitly in s/w.
> 
> We tried increasing the local bus clocking but to no avail.
> 
The local bus clock does not affect CSB or DDR performance.

> Given that low performance it doesn't make much difference whether
> ECC is enabled or not:
> 
> DMA, ECC on
> ddr init duration: 6947 ms
> 
> DMA, ECC off
> ddr init duration: 6721 ms
>
 
My test data is:

DMA, ECC on
ddr init duration: 6945 ms

DMA, ECC off
ddr init duration: 6558 ms

These differ only slightly from yours.

> There seems something broken with the DMA operations in general as
> they are way slower than just plain read/write to memory, which is
> somehow confirmed by your recent communication from the customer.
>
When all of memory is initialized with the DMA method, as in the
U-Boot code, the DMA controller reads from memory and then writes to
memory, in a loop.

This generates a lot of read accesses to memory, which takes more
time.
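
The cost of the read half can be illustrated with a host-side model.
The sketch assumes one plausible scheme, not necessarily the exact one
in U-Boot: the CPU seeds a small block, and each later pass copies the
already initialized prefix onto the area after it. memcpy() stands in
for a DMA transfer, and init_by_copy is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Initialize `len` bytes at `mem`: fill the first `seed` bytes with
 * `pattern`, then repeatedly copy the initialized prefix forward.
 * Returns the number of bytes the copy passes had to read back. */
size_t init_by_copy(uint8_t *mem, size_t len, size_t seed,
		    uint8_t pattern)
{
	size_t done, bytes_read = 0;

	if (seed == 0 || seed > len)
		seed = len;

	memset(mem, pattern, seed);	/* CPU writes the seed block */

	for (done = seed; done < len; ) {
		size_t left = len - done;
		size_t chunk = left < done ? left : done;

		memcpy(mem + done, mem, chunk);	/* stand-in for DMA */
		bytes_read += chunk;
		done += chunk;
	}

	return bytes_read;
}
```

In this model the copies read len - seed bytes in total, so nearly
every byte written by DMA is paired with a byte read from DDR. A
write-only fill from the CPU has no such read traffic.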
 
> 
> 3. ECC penalty
> 
> As can be seen in results given in 1. enabling ECC puts a huge
> burden on write access, which is contrary to 8349 UM:
> 
> p. 9-27 (above figure 9-24) "When ECC is enabled, one clock cycle is
> added to the read path to check ECC and correct single-bit errors.
> ECC generation does not add a cycle to the write path."
> 
> ----- quote end -----
> 
> 
> Can you explain why writing to ECC memory is  10  times  
> slower  than reading?
> 
I hope you can tell me how you measured the read time. Thanks.


Regards,
Dave 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: u-boot.diff
Type: application/octet-stream
Size: 2535 bytes
Desc: not available
Url : http://lists.denx.de/pipermail/u-boot/attachments/20060524/c2ed8f38/attachment.obj 

* [U-Boot-Users] MPC83xx data cache lock?
@ 2006-05-23  9:13 Liu Dave-r63238
  2006-05-23  9:26 ` Wolfgang Denk
  0 siblings, 1 reply; 8+ messages in thread
From: Liu Dave-r63238 @ 2006-05-23  9:13 UTC
  To: u-boot

What do you mean by the "[non-existent] write performance"? Can you
tell me?


In message <9FCDBA58F226D911B202000BDBAD4673026FD90F@zch01exm40.ap.freescale.net> you wrote:
> 
> On MPC83xx, U-Boot locks all ways of the data cache, which makes
> the data cache behave as if it were cache-inhibited. When the kernel
> is booted from this U-Boot, it does not unlock the data cache, so
> the data cache effectively stays in the inhibited state. I suggest
> that we unlock the data cache after relocate_code in the start.S
> file.

Can you please restrict your line length to the usual 70 characters
or so? Thanks.


> If we unlock the data cache in start.S, we need to change
> everything that depends on it, such as the BAT settings and the DDR
> ECC test code. I have found some bugs in the ECC test code.

Ummm.... Can you please  be  a  bit  more  specific?  I'm  definitely
interested   in  details.  Did  you  find  any  way  to  improve  the
[non-existent] write performance?

> ------_=_NextPart_001_01C67E43.D555AAE4
> Content-Type: text/html
> Content-Transfer-Encoding: base64
> 
> PCFET0NUWVBFIEhUTUwgUFVCTElDICItLy9XM0MvL0RURCBIVE1MIDQuMCBUcmFuc2l0aW
> 9uYWwv
> L0VOIj4NCjxIVE1MPjxIRUFEPg0KPE1FVEEgSFRUUC1FUVVJVj0iQ29udGVudC1UeXBlIiBDT05U
> RU5UPSJ0ZXh0L2h0bWw7IGNoYXJzZXQ9dXMtYXNjaWkiPg0KPFRJVExFPk1lc3NhZ2U8L1RJVExF
...

And please, don't post HTML, especially not encoded as base64, which
is a double PITA.

Best regards,

Wolfgang Denk

-- 
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
It is dangerous to be sincere unless you are also stupid.
                                                - George Bernard Shaw

* [U-Boot-Users] MPC83xx data cache lock?
@ 2006-05-23  8:35 Liu Dave-r63238
  2006-05-23  8:52 ` Wolfgang Denk
  0 siblings, 1 reply; 8+ messages in thread
From: Liu Dave-r63238 @ 2006-05-23  8:35 UTC
  To: u-boot

All,
 
On MPC83xx, U-Boot locks all ways of the data cache, which makes the
data cache behave as if it were cache-inhibited. When the kernel is
booted from this U-Boot, it does not unlock the data cache, so the
data cache effectively stays in the inhibited state. I suggest that
we unlock the data cache after relocate_code in the start.S file.
 
If we unlock the data cache in start.S, we need to change everything
that depends on it, such as the BAT settings and the DDR ECC test
code. I have found some bugs in the ECC test code.
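
As a rough illustration of what unlocking means at the register level,
the sketch below models the HID0 update on the host. The bit values
for HID0_DCE and HID0_DLOCK are assumptions based on the usual 6xx
style HID0 layout and should be checked against the e300 core manual.
On the target the value would be read with mfspr and written back with
mtspr, with the required synchronization around it; the helper names
are hypothetical.

```c
#include <stdint.h>

#define HID0_DCE	0x00004000u	/* data cache enable (assumed) */
#define HID0_DLOCK	0x00001000u	/* data cache lock (assumed) */

/* Return nonzero if the given HID0 value has the data cache locked. */
int dcache_is_locked(uint32_t hid0)
{
	return (hid0 & HID0_DLOCK) != 0;
}

/* Return the HID0 value with the data cache lock bit cleared and
 * all other bits, including the enable bit, left untouched. */
uint32_t hid0_unlock_dcache(uint32_t hid0)
{
	return hid0 & ~HID0_DLOCK;
}
```

Only the lock bit changes, so a cache that was enabled and locked ends
up enabled and unlocked. Code that relies on the locked cache, such as
an early stack or data area held in it, would have to be finished with
it before this point.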
 
Regards,
Dave
 

Thread overview: 8+ messages
2006-05-26  5:33 [U-Boot-Users] MPC83xx data cache lock? Liu Dave-r63238
2006-05-28 22:24 ` Wolfgang Denk
  -- strict thread matches above, loose matches on Subject: below --
2006-05-24  9:56 Liu Dave-r63238
2006-05-24  9:48 Liu Dave-r63238
2006-05-23  9:13 Liu Dave-r63238
2006-05-23  9:26 ` Wolfgang Denk
2006-05-23  8:35 Liu Dave-r63238
2006-05-23  8:52 ` Wolfgang Denk
