Performance improvements of Marocchino implementation

linux-openrisc.vger.kernel.org archive mirror

* Performance improvements of Marocchino implementation
       [not found] <810295975.3514440.1740967866807.ref@mail.yahoo.com>
@ 2025-03-03  2:11 ` Idzwan Nizam Jamal Abdul Nasir
  2025-03-06 15:38   ` Stafford Horne
  2025-03-09 17:02   ` el 01
  0 siblings, 2 replies; 7+ messages in thread
From: Idzwan Nizam Jamal Abdul Nasir @ 2025-03-03  2:11 UTC (permalink / raw)
  To: linux-openrisc

Hi,

I am interested in the OpenRISC Benchmarking and Performance improvements task listed as one of the project ideas in Google Summer of Code. I am unable to participate in GSoC, but I would like to contribute to the task gradually as I acquire skills in digital logic and computer architecture.

Is the task still open? I would be glad if you could point me in the right direction, such as documentation I should read or tools I should be familiar with. Any guidance is welcome and greatly appreciated. Thank you.


* Re: Performance improvements of Marocchino implementation
  2025-03-03  2:11 ` Performance improvements of Marocchino implementation Idzwan Nizam Jamal Abdul Nasir
@ 2025-03-06 15:38   ` Stafford Horne
       [not found]     ` <16047285.689010.1741349294786@mail.yahoo.com>
  2025-03-09 17:02   ` el 01
  1 sibling, 1 reply; 7+ messages in thread
From: Stafford Horne @ 2025-03-06 15:38 UTC (permalink / raw)
  To: Idzwan Nizam Jamal Abdul Nasir; +Cc: linux-openrisc

On Mon, Mar 03, 2025 at 02:11:06AM +0000, Idzwan Nizam Jamal Abdul Nasir wrote:
> Hi,
> 
> I am interested in the OpenRISC Benchmarking and Performance improvements
> task listed as one of the project ideas in Google Summer of Code. I am unable
> to participate in GSoC, but I would like to contribute to the task gradually
> as I acquire skills in digital logic and computer architecture.
> 
> Is the task still open? I would be glad if you could point me in the right
> direction, such as documentation I should read or tools I should be familiar
> with. Any guidance is welcome and greatly appreciated. Thank you.

The task is still open.  Other students are interested, so I will end up
having to choose the student with the best proposal and skills.

Please try to get started by reading up on what was done last year in this
space and see if you can follow some of the steps to get the development
environment set up.

Our previous GSoC participant, Leo, did a lot of ground work but never submitted
any formal GSoC progress report or documentation.

What he did produce was:

    Here's the embench-iot with fusesoc compatibility changes applied:
    https://github.com/hhe07/embench-or-patched

    And here's some tools I made for working with instruction printouts on
    posedge:
    https://codeberg.org/hhe07/or-analysis-utils

He started this blog with details of how to get OpenRISC simulations working
with Litex or FuseSoC.

    https://hhe07.codeberg.page/openrisc-work/

What we want to do is:

 *Getting started*
  1 Get openrisc mor1kx and marocchino working in embench-iot
    a. With SoC, either fusesoc or litex (fusesoc should be easier)
    b. With backend, either icarus or verilator (verilator seems to run faster)
    c. Serial output needs to work to be able to capture timing information or
      when tests start and stop.  In verilator this is a bit tricky but should
      work with the proper flags.
 *Recording Results*
  2 We next want to record results in a format similar to embench-iot-results[0]
    a. Collect results for default marocchino, mor1kx
    b. Collect results with permutations of cpu and compiler config
      i.  Caches of different sizes, enabled/disabled, different branch
          prediction algorithms, different ALU implementations, etc.
      ii. Certain instructions enabled/disabled by the compiler, -mror, -msfimm
          etc.; see `or1k-elf-gcc --target-help` (Note: the project aims to
          benchmark CPU pipeline, compiler and instruction set efficiency)
 *Improving results*
  3 Next we can look at where the CPUs lack performance and how to
    improve things in the pipeline, LSU, caches etc.
  4 We can then go back to 2 (Record results again) and compare.

The 2024 GSoC project only completed the steps in 1 and had just started on
2.a.  I think there is a lot of work left to be done; completing step 2 in a
GSoC project would be a great accomplishment.

[0] https://github.com/embench/embench-iot-results
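
As an illustration of step 2.b.ii, here is a minimal sketch of how the
compiler-flag permutations could be enumerated before building the benchmarks.
It lists only the two flags named above; the helper name `flag_permutations`
is hypothetical, and the full set of options should be taken from
`or1k-elf-gcc --target-help`.

```python
from itertools import combinations

# Only the two flags named in this mail; extend the list from the output of
# `or1k-elf-gcc --target-help`.
OPTIONAL_FLAGS = ["-mror", "-msfimm"]


def flag_permutations(flags):
    """Return every subset of flags (each one on or off), smallest first."""
    subsets = []
    for size in range(len(flags) + 1):
        for combo in combinations(flags, size):
            subsets.append(list(combo))
    return subsets


if __name__ == "__main__":
    for subset in flag_permutations(OPTIONAL_FLAGS):
        print(" ".join(subset) or "(baseline, no optional flags)")
```

Each subset would then be one compiler configuration to build and run, so the
number of builds doubles with every flag added to the list.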


* Re: Performance improvements of Marocchino implementation
       [not found]     ` <16047285.689010.1741349294786@mail.yahoo.com>
@ 2025-03-07 16:07       ` Stafford Horne
  0 siblings, 0 replies; 7+ messages in thread
From: Stafford Horne @ 2025-03-07 16:07 UTC (permalink / raw)
  To: Idzwan Nizam Jamal Abdul Nasir; +Cc: linux-openrisc

Hello,

Replying again, as the last mail had HTML and didn't make it to the list.

On Fri, Mar 07, 2025 at 12:08:14PM +0000, Idzwan Nizam Jamal Abdul Nasir wrote:
>  
> Does it mean I need to send a proposal like a student who's joining GSOC too? 

Sorry, no, there is no need if you are not doing GSoC.  If you can make progress
before GSoC starts I can steer students away from this project.

But please just post your interest and progress on the mailing list.

-Stafford


* Re: Performance improvements of Marocchino implementation
  2025-03-03  2:11 ` Performance improvements of Marocchino implementation Idzwan Nizam Jamal Abdul Nasir
  2025-03-06 15:38   ` Stafford Horne
@ 2025-03-09 17:02   ` el 01
  2025-03-10 10:09     ` Idzwan Nizam
  1 sibling, 1 reply; 7+ messages in thread
From: el 01 @ 2025-03-09 17:02 UTC (permalink / raw)
  To: Idzwan Nizam Jamal Abdul Nasir, linux-openrisc

[-- Attachment #1: Type: text/plain, Size: 3633 bytes --]

Hello,

Hopefully this makes its way onto the mailing list, my previous email 
didn't.

Stafford's previous email basically covered what I did last summer. I've 
been dealing with some health issues and haven't been able to 
consistently document my progress; really sorry about the lack of 
documentation.

I left off at trying to figure out some (perhaps superficial) 
differences between measured cycle counts when running a benchmark on 
LiteX and FuseSoC, two different 'build systems' for the HDL design. 
This doesn't directly address the discrepancy between marocchino and 
mor1kx, but was a step along the way.

The build systems bundle the OpenRISC core with some other necessary 
hardware (e.g. simulated memory, peripherals, etc.) and build either a 
binary for simulation on your computer, or something which can be put on 
an FPGA.

When running binaries from the Embench benchmarking suite on the same 
processor core / simulation engine (and only changing the build system), 
there are some tests which have substantially different cycle counts.

Some initial data I gathered is attached; it seems like there are 
some substantial differences in the cycles required to execute some 
instructions on LiteX.

I'm also not 100% sure whether the measured cycle counts are completely 
accurate, as the debug / trace parts of LiteX and FuseSoC are 
somewhat different.

Another minor thing that I wanted to address was some inefficiency in 
running LiteX simulations. Because of the way that the Embench testing 
script for LiteX works (see 
https://github.com/hhe07/litex-esp/blob/main/sim.py -- from what I 
remember this is stuff that you can copy into your Embench install 
folder to enable compatibility), I think the CPU and some of the 
supporting software are rebuilt every time a different benchmark is run, 
which wastes a lot of time.

As for where this fits into the larger issue of the performance 
discrepancy between mor1kx and marocchino: in my experience, I spent a 
lot of time trying to figure out the tools and determining whether what 
I wanted to do was already a feature of a tool or something I needed to 
figure out myself. So, I'd recommend trying to understand the tooling 
and perhaps doing some practice tasks around it. YMMV, though.

I know I haven't really made this problem better due to poor 
documentation on my part, so please email if you're unsure about 
something that I did. I'll try to reply ASAP.

As for the attached files:
- profile.ods includes analysis on cycle counts per instruction for one 
test, I think nettle_sha256. This is for the mor1kx CPU.
- results.ods includes cycle counts for all Embench tests run on both 
FuseSoC and LiteX, and calculated percent differences between the two.
- nettle-mor1kx-{fusesoc, litex}-trace-prof include the outputs of cycle 
counts from the analysis scripts I wrote (basically same as 
profile.ods), as well as some additional information on the PCs of the 
start/end of critical sections in the code, and how many cycles they 
took to execute.

~ Leo


[-- Attachment #2: profile.ods --]
[-- Type: application/vnd.oasis.opendocument.spreadsheet, Size: 30653 bytes --]

[-- Attachment #3: results.ods --]
[-- Type: application/vnd.oasis.opendocument.spreadsheet, Size: 44467 bytes --]

[-- Attachment #4: nettle-mor1kx-fusesoc-trace-prof --]
[-- Type: text/plain, Size: 473 bytes --]

jal: 1.000000
jump: 1.000000
l.add: 1.066557
l.addi: 1.381047
l.and: 1.015590
l.andi: 1.117647
l.bf: 1.589264
l.bnf: 1.000000
l.jr: 1.750131
l.lbz: 1.000000
l.lhz: 1.000000
l.lwz: 1.148739
l.movhi: 1.004103
l.nop: 1.000000
l.or: 1.003138
l.ori: 1.018006
l.sb: 2.550000
l.sfgtu: 1.593730
l.sll: 1.006010
l.srl: 1.067661
l.sub: 1.000000
l.sw: 1.384018
l.xor: 1.012799
l.xori: 1.000000

nettle update pc: 45f0 -> 45f8
56 -> 189

nettle write digest: 4600 -> 4608

191 -> 10521

[-- Attachment #5: nettle-mor1kx-litex-trace-prof --]
[-- Type: text/plain, Size: 520 bytes --]

jal: 3.624573
jump: 1.040984
l.add: 2.005511
l.addi: 1.331264
l.and: 1.948052
l.andi: 2.000000
l.bf: 1.137044
l.bnf: 2.000000
l.jr: 3.000789
l.lbz: 1.041667
l.lhz: 1.625000
l.lwz: 2.438729
l.movhi: 5.815331
l.nop: 5.398569
l.or: 2.064235
l.ori: 2.064082
l.sb: 2.542857
l.sfgtu: 1.212507
l.sll: 2.140017
l.srl: 2.075332
l.sub: 2.000000
l.sw: 3.226664
l.xor: 2.095170
l.xori: 3.500000

nettle_sha256.update pc: 4000352c: -> 40003530:
59 -> 554

nettle_sha256.write digest: 4000353c: -> 40003540:

558 -> 12034
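
The two trace-prof attachments above share a simple `mnemonic: value` layout,
so they can be compared mechanically. Below is a rough sketch, assuming the
values are average cycles per executed instruction; the function names
`parse_profile` and `slowdown` are hypothetical and are not part of
or-analysis-utils. Only a few rows from the attachments are embedded as
sample data.

```python
def parse_profile(text):
    """Parse 'mnemonic: value' rows into a dict; other lines are skipped."""
    cycles = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and rows such as '56 -> 189'
        try:
            cycles[name.strip()] = float(value)
        except ValueError:
            continue  # e.g. 'nettle update pc: 45f0 -> 45f8'
    return cycles


def slowdown(fusesoc, litex):
    """LiteX / FuseSoC ratio for each instruction present in both profiles."""
    return {op: litex[op] / fusesoc[op] for op in fusesoc if op in litex}


# A few rows copied from the two attachments above.
FUSESOC_SAMPLE = """\
l.add: 1.066557
l.movhi: 1.004103
l.nop: 1.000000
l.sub: 1.000000
nettle update pc: 45f0 -> 45f8
56 -> 189
"""

LITEX_SAMPLE = """\
l.add: 2.005511
l.movhi: 5.815331
l.nop: 5.398569
l.sub: 2.000000
"""

if __name__ == "__main__":
    ratios = slowdown(parse_profile(FUSESOC_SAMPLE), parse_profile(LITEX_SAMPLE))
    for op, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{op:8s} {ratio:.2f}x")
```

Sorting by ratio puts the instructions with the largest LiteX/FuseSoC gap at
the top, which may be a reasonable place to start looking at the trace.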


* Re: Performance improvements of Marocchino implementation
  2025-03-09 17:02   ` el 01
@ 2025-03-10 10:09     ` Idzwan Nizam
  2025-03-11 20:00       ` el 01
  0 siblings, 1 reply; 7+ messages in thread
From: Idzwan Nizam @ 2025-03-10 10:09 UTC (permalink / raw)
  To: linux-openrisc

Would it be OK, and considered progress, if only FuseSoC were used?


* Re: Performance improvements of Marocchino implementation
  2025-03-10 10:09     ` Idzwan Nizam
@ 2025-03-11 20:00       ` el 01
  2025-03-11 22:17         ` Stafford Horne
  0 siblings, 1 reply; 7+ messages in thread
From: el 01 @ 2025-03-11 20:00 UTC (permalink / raw)
  To: Idzwan Nizam, linux-openrisc

I defer to Stafford's judgement, but please don't feel an obligation to 
continue exactly where I left off. If you want to work with FuseSoC 
only, that should be fine.

A bit of advice in advance -- I didn't write a script to batch run tests 
on FuseSoC. I instead ran each of the binaries individually by invoking 
``fusesoc run``.
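
For anyone who wants to batch the runs, here is a rough sketch of what such a 
script could look like. It is not something I wrote or tested last summer. 
The command template is a placeholder: the target, core name and ELF option 
are assumptions to be replaced with whatever ``fusesoc run`` invocation works 
by hand, and the names `build_commands` and `run_all` are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path

# Placeholder template: <target>, <core> and <elf-option> are NOT real values.
# Replace them with the `fusesoc run` arguments that work when run manually.
COMMAND_TEMPLATE = "fusesoc run --target=<target> <core> <elf-option> {elf}"


def build_commands(elf_paths, template=COMMAND_TEMPLATE):
    """Return one argument list per benchmark binary."""
    return [
        shlex.split(template.format(elf=shlex.quote(str(path))))
        for path in elf_paths
    ]


def run_all(elf_paths, log_dir, template=COMMAND_TEMPLATE, runner=subprocess.run):
    """Run each binary in turn, save its stdout, and return the exit codes."""
    elf_paths = list(elf_paths)
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for elf, cmd in zip(elf_paths, build_commands(elf_paths, template)):
        proc = runner(cmd, capture_output=True, text=True)
        name = Path(elf).name
        (log_dir / (name + ".log")).write_text(proc.stdout)
        results[name] = proc.returncode
    return results
```

Calling `run_all` with a list of benchmark binaries and a log directory would 
leave one `.log` file per binary, which can then be searched for the serial 
output that marks where a test starts and stops.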


* Re: Performance improvements of Marocchino implementation
  2025-03-11 20:00       ` el 01
@ 2025-03-11 22:17         ` Stafford Horne
  0 siblings, 0 replies; 7+ messages in thread
From: Stafford Horne @ 2025-03-11 22:17 UTC (permalink / raw)
  To: el 01; +Cc: Idzwan Nizam, linux-openrisc

Hi Idzwan,

I noticed you reached out on IRC.  If you want to join OFTC/#openrisc we could
chat there too.  I am not around much there as it has been really quiet
recently, but it's also available.

On Tue, Mar 11, 2025 at 04:00:23PM -0400, el 01 wrote:
> I defer to Stafford's judgement, but please don't feel an obligation to
> continue exactly where I left off. If you want to work with FuseSoC only,
> that should be fine.

I think using fusesoc is a good option, but ideally both fusesoc and litex
would work.  I think whatever you are more familiar with is best.

-Stafford

