From: Charlie Jenkins <charlie@rivosinc.com>
To: David Laight <David.Laight@aculab.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>,
Conor Dooley <conor@kernel.org>,
Samuel Holland <samuel.holland@sifive.com>,
"linux-riscv@lists.infradead.org"
<linux-riscv@lists.infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
Paul Walmsley <paul.walmsley@sifive.com>,
Albert Ou <aou@eecs.berkeley.edu>, Arnd Bergmann <arnd@arndb.de>
Subject: Re: [PATCH v6 3/4] riscv: Add checksum library
Date: Tue, 19 Sep 2023 14:04:48 -0400
Message-ID: <ZQniwNEoYLo52HI7@ghost>
In-Reply-To: <0fe9694900c7492c96dce6b67710173f@AcuMS.aculab.com>

On Tue, Sep 19, 2023 at 08:00:12AM +0000, David Laight wrote:
> ...
> > > So ending up with (something like):
> > > > 	end = buff + length;
> > > > 	...
> > > > 	while (++ptr < end) {
> > > > 		csum += data;
> > > > 		carry += csum < data;
> > > > 		data = ptr[-1];
> > > > 	}
> > > (Although a do-while loop tends to generate better code
> > > and gcc will pretty much always make that transformation.)
> > >
> > > I think that is 4 instructions per word (load, add, cmp+set, add).
> > > In principle they could be completely pipelined and all
> > > execute (for different loop iterations) in the same clock.
> > > (But that is pretty unlikely to happen - even x86 isn't that good.)
> > > But taking two clocks is quite plausible.
> > > Plus 2 instructions per loop (inc, cmp+jmp).
> > > They might execute in parallel, but unrolling once
> > > may be required.
> > >
> > It looks like GCC actually ends up generating 7 total instructions:
> > > ffffffff808d2acc:   97b6        add   a5,a5,a3
> > > ffffffff808d2ace:   00d7b533    sltu  a0,a5,a3
> > > ffffffff808d2ad2:   0721        add   a4,a4,8
> > > ffffffff808d2ad4:   86be        mv    a3,a5
> > > ffffffff808d2ad6:   962a        add   a2,a2,a0
> > > ffffffff808d2ad8:   ff873783    ld    a5,-8(a4)
> > > ffffffff808d2adc:   feb768e3    bltu  a4,a1,ffffffff808d2acc <do_csum+0x34>
> >
> > This mv instruction could be avoided if the registers were shuffled
> > around, but perhaps this way reduces some dependency chains.
>
> > gcc managed to do 'data += csum' so had to add 'csum = data'.
> If you unroll once that might go away.
> It might then be 10 instructions for 16 bytes.
> Although you then need slightly larger alignment code.
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>
I messed with it a bit and couldn't get the mv to go away. I would expect
mv to be very cheap, so it should be fine. I would also like to avoid adding
too much to the alignment code since it is already large, and I assume
that buff will be aligned more often than not.

Interestingly, the mv does not appear before GCC 12, and does not appear
with clang.

- Charlie
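
For readers following the thread, here is a minimal C sketch of the loop
shape being discussed: the sum and the carry count are kept in separate
variables, and the next word is loaded one step ahead of its use. It is
not the code from the patch. The names csum_words, csum_words_unrolled
and fold_carry are hypothetical, and the sketch assumes an 8-byte aligned
buffer made up of whole 64-bit words, so it has no alignment or tail
handling. The second function shows the "unroll once" idea (two words
per iteration) without claiming anything about the code gcc or clang
would actually generate for it.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold the separately counted carries back into the 64-bit sum
 * (end-around carry, as in ones' complement addition). */
static uint64_t fold_carry(uint64_t csum, uint64_t carry)
{
	csum += carry;
	csum += csum < carry;	/* the add above may itself wrap once */
	return csum;
}

/* One word per iteration; the load for the next word is issued
 * before the current word has been fully accumulated. */
static uint64_t csum_words(const uint64_t *buff, size_t nwords)
{
	const uint64_t *ptr = buff;
	const uint64_t *end = buff + nwords;
	uint64_t csum = 0, carry = 0, data;

	if (nwords == 0)
		return 0;

	data = *ptr++;			/* first word, loaded ahead of the loop */
	while (ptr < end) {
		csum += data;
		carry += csum < data;	/* 1 if the add wrapped */
		data = *ptr++;		/* load the next word */
	}
	csum += data;			/* last word loaded but not yet added */
	carry += csum < data;

	return fold_carry(csum, carry);
}

/* Same sum, unrolled once: 16 bytes per iteration. */
static uint64_t csum_words_unrolled(const uint64_t *buff, size_t nwords)
{
	uint64_t csum = 0, carry = 0;
	size_t i = 0;

	for (; i + 2 <= nwords; i += 2) {
		uint64_t d0 = buff[i];
		uint64_t d1 = buff[i + 1];

		csum += d0;
		carry += csum < d0;
		csum += d1;
		carry += csum < d1;
	}
	if (i < nwords) {		/* odd word left over */
		csum += buff[i];
		carry += csum < buff[i];
	}

	return fold_carry(csum, carry);
}
```

Both functions return the same 64-bit value for the same input, so the
unrolled form can be checked against the simple one before looking at
the generated instructions.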