From: David Laight <David.Laight@ACULAB.COM>
To: 'Evan Green' <evan@rivosinc.com>
Cc: "Geert Uytterhoeven" <geert@linux-m68k.org>,
"Palmer Dabbelt" <palmer@rivosinc.com>,
"Heiko Stuebner" <heiko@sntech.de>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"Björn Töpel" <bjorn@rivosinc.com>,
"Conor Dooley" <conor.dooley@microchip.com>,
"Guo Ren" <guoren@kernel.org>,
"Jisheng Zhang" <jszhang@kernel.org>,
"linux-riscv@lists.infradead.org"
<linux-riscv@lists.infradead.org>,
"Jonathan Corbet" <corbet@lwn.net>,
"Sia Jee Heng" <jeeheng.sia@starfivetech.com>,
"Marc Zyngier" <maz@kernel.org>,
"Masahiro Yamada" <masahiroy@kernel.org>,
"Greentime Hu" <greentime.hu@sifive.com>,
"Simon Hosie" <shosie@rivosinc.com>,
"Andrew Jones" <ajones@ventanamicro.com>,
"Albert Ou" <aou@eecs.berkeley.edu>,
"Alexandre Ghiti" <alexghiti@rivosinc.com>,
"Ley Foon Tan" <leyfoon.tan@starfivetech.com>,
"Paul Walmsley" <paul.walmsley@sifive.com>,
"Anup Patel" <apatel@ventanamicro.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Xianting Tian" <xianting.tian@linux.alibaba.com>,
"Palmer Dabbelt" <palmer@dabbelt.com>,
"Andy Chiu" <andy.chiu@sifive.com>
Subject: RE: [PATCH v4 1/2] RISC-V: Probe for unaligned access speed
Date: Thu, 14 Sep 2023 15:55:41 +0000
Message-ID: <172bc43cc2ac45239ec40477d53d263a@AcuMS.aculab.com>
In-Reply-To: <CALs-Hstcz3OAxUi80nm+U0R56VBUUPQT=+XMOLpVJsn2ZOcM1A@mail.gmail.com>
From: Evan Green
> Sent: 14 September 2023 16:01
>
> On Thu, Sep 14, 2023 at 1:47 AM David Laight <David.Laight@aculab.com> wrote:
> >
> > From: Geert Uytterhoeven
> > > Sent: 14 September 2023 08:33
> > ...
> > > > > rzfive:
> > > > > cpu0: Ratio of byte access time to unaligned word access is
> > > > > 1.05, unaligned accesses are fast
> > > >
> > > > Hrm, I'm a little surprised to be seeing this number come out so close
> > > > to 1. If you reboot a few times, what kind of variance do you get on
> > > > this?
> > >
> > > Rock-solid at 1.05 (even with increased resolution: 1.05853 on 3 tries)
> >
> > Would that match zero overhead unless the access crosses a
> > cache line boundary?
> > (I can't remember whether the test is using increasing addresses.)
>
> Yes, the test does use increasing addresses, it copies across 4 pages.
> We start with a warmup, so caching effects beyond L1 are largely not
> taken into account.
That seems entirely excessive.
If you want to avoid data cache issues (which you probably do)
then just repeating a single access would almost certainly
suffice.
Repeatedly using a short buffer (say 256 bytes) won't add
much loop overhead.
Although you may want to do a test that avoids transfers
that cross cache line and especially page boundaries.
Either of those could easily be much slower than a read
that is entirely within a cache line.
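A rough sketch of the short-buffer idea, not the code from the patch.
It assumes a 64-byte cache line, a 256-byte buffer that itself starts
on a cache line boundary, and 8-byte accesses; the names
crosses_boundary() and sum_misaligned_within_lines() are made up for
illustration. memcpy() stands in for the misaligned load, so the
compiler decides which instructions are actually emitted; a real
timing test would want inline asm and a timer around the loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes; the real values depend on the CPU. */
#define CACHE_LINE 64u
#define BUF_SIZE   256u

/* Nonzero if a 'size' byte access at 'addr' spans a 'boundary'-aligned edge. */
static int crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
	return (addr / boundary) != ((addr + size - 1) / boundary);
}

/*
 * Walk a short buffer doing 8-byte loads at 'misalign' bytes past each
 * 8-byte slot, skipping any load that would straddle a cache line.
 * 'buf' is assumed to start on a cache line boundary, so the offset
 * alone decides whether a load crosses a line.
 */
static uint64_t sum_misaligned_within_lines(const unsigned char *buf,
					    size_t misalign)
{
	uint64_t sum = 0, v;
	size_t off;

	for (off = misalign; off + sizeof(v) <= BUF_SIZE; off += sizeof(v)) {
		if (crosses_boundary(off, sizeof(v), CACHE_LINE))
			continue;
		memcpy(&v, buf + off, sizeof(v));
		sum += v;
	}
	return sum;
}
```

Timing the same loop without the 'continue' gives the cost including
the cache line crossings, so the two cases can be reported separately.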
...
> > > > > vexriscv/orangecrab:
> > > > >
> > > > > cpu0: Ratio of byte access time to unaligned word access is
> > > > > 0.00, unaligned accesses are slow
> > >
> > > cpu0: Ratio of byte access time to unaligned word access is 0.00417,
> > > unaligned accesses are slow
> > >
> > > > > I am a bit surprised by the near-zero values. Are these expected?
> > > >
> > > > This could be expected, if firmware is trapping the unaligned accesses
> > > > and coming out >100x slower than a native access. If you're interested
> > > > in getting a little more resolution, you could try to print a few more
> > > > decimal places with something like (sorry gmail mangles the whitespace
> > > > on this):
> >
> > I'd expect one of three possible values:
> > - 1.0x: Basically zero cost except for cache line/page boundaries.
> > - ~2: Hardware does two reads and merges the values.
> > - >100: Trap fixed up in software.
> >
> > I'd think the '2' case could be considered fast.
> > You only need to time one access to see if it was a fault.
>
> We're comparing misaligned word accesses with byte accesses of the
> same total size. So 1.0 means a misaligned load is basically no
> different from 8 byte loads. The goal was to help people that are
> forced to do odd loads and stores decide whether they are better off
> moving by bytes or by misaligned words. (In contrast, the answer to
> "should I do a misaligned word load or an aligned word load" is
> generally always "do the aligned one if you can", so comparing those
> two things didn't seem as useful).
Ah, I'd have compared the cost of aligned accesses with misaligned ones.
That would tell you whether you really need to avoid them.
The cost of byte and aligned word accesses should be much the same
(for each access that is) - if not you've got a real bottleneck.
If a misaligned access is 8 times slower than an aligned one
it is still 'quite slow'.
I'd definitely call that 8 not 1 - even if you treat it as 'fast'.
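To make the numbers concrete, here is a small sketch that reports the
misaligned/aligned ratio in hundredths and sorts it into the three
cases mentioned earlier in the thread (about 1, a small multiple,
over 100). The cut-off values and all the names are assumptions for
illustration only, nothing here comes from the patch.

```c
#include <stdint.h>

enum misaligned_cost {
	COST_NEAR_FREE,	/* ratio close to 1.0 */
	COST_HW_SPLIT,	/* small multiple: hardware splits the access */
	COST_TRAPPED,	/* over 100x: probably fixed up in software */
};

/* Ratio of misaligned to aligned time, scaled by 100 (8x gives 800). */
static uint64_t cost_ratio_x100(uint64_t misaligned_ns, uint64_t aligned_ns)
{
	/* Caller must pass a nonzero aligned_ns. */
	return misaligned_ns * 100 / aligned_ns;
}

/* Hypothetical thresholds: 1.5x and 100x. */
static enum misaligned_cost classify_cost(uint64_t ratio_x100)
{
	if (ratio_x100 > 10000)
		return COST_TRAPPED;
	if (ratio_x100 >= 150)
		return COST_HW_SPLIT;
	return COST_NEAR_FREE;
}
```

With this scaling an access that is 8 times slower is reported as 800,
not 100, whichever bucket it ends up in.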
For comparison you (well I) can write x86-64 asm for the ip-checksum
loop that will execute 1 memory read every clock (8 bytes/clock).
It is very slightly slower for misaligned buffers, but by less
than 1 clock per cache line.
That's what I'd call 1.0 :-)
I'd expect even simple hardware to do misaligned reads as two
reads and then merge the data - so it should really be no slower
than two separate aligned reads.
Since you'd expect a cpu to do an L1 data cache read every clock
(probably pipelined) the misaligned read should just add 1 clock.
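The 'two reads and a merge' can be written out in C as a software
model of what such hardware might do. This is only a sketch: it
assumes 64-bit words with little-endian byte numbering inside each
word, and load_misaligned_le() is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fetch the 64-bit value starting 'byte_off' bytes into an array of
 * aligned 64-bit words, using only aligned word reads.
 * Byte 0 of each word is its least significant byte.
 * For a misaligned offset the caller must make sure words[idx + 1]
 * exists.
 */
static uint64_t load_misaligned_le(const uint64_t *words, size_t byte_off)
{
	size_t idx = byte_off / 8;
	unsigned int shift = (byte_off % 8) * 8;
	uint64_t lo = words[idx];
	uint64_t hi;

	if (shift == 0)
		return lo;	/* aligned: a single read is enough */

	hi = words[idx + 1];	/* the second aligned read */
	return (lo >> shift) | (hi << (64 - shift));
}
```

The aligned case does one read and the misaligned case does two plus a
shift and an OR, which is where the 'about one extra clock' estimate
comes from.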
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)