From: "'Naveen N. Rao'" <naveen.n.rao@linux.vnet.ibm.com>
To: Benjamin Herrenschmidt <benh@au1.ibm.com>
Cc: David Laight <David.Laight@ACULAB.COM>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
"davem@davemloft.net" <davem@davemloft.net>,
"daniel@iogearbox.net" <daniel@iogearbox.net>,
"ast@fb.com" <ast@fb.com>,
Madhavan Srinivasan <maddy@linux.vnet.ibm.com>,
Michael Ellerman <mpe@ellerman.id.au>
Subject: Re: [PATCH 3/3] powerpc: bpf: implement in-register swap for 64-bit endian operations
Date: Tue, 24 Jan 2017 00:52:27 +0530
Message-ID: <20170123192227.GE3820@naverao1-tp.localdomain>
In-Reply-To: <1484492458.11927.17.camel@au1.ibm.com>
On 2017/01/15 09:00AM, Benjamin Herrenschmidt wrote:
> On Fri, 2017-01-13 at 23:22 +0530, 'Naveen N. Rao' wrote:
> > > That rather depends on whether the processor has a store to load forwarder
> > > that will satisfy the read from the store buffer.
> > > I don't know about ppc, but at least some x86 will do that.
> >
> > Interesting - good to know that.
> >
> > However, I don't think powerpc does that and in-register swap is likely
> > faster regardless. Note also that gcc prefers this form at higher
> > optimization levels.
>
> Of course powerpc has a load-store forwarder these days, however, I
> wouldn't be surprised if the in-register form was still faster on some
> implementations, but this needs to be tested.
Thanks for clarifying! To test this, I wrote a simple (perhaps naive)
test that just issues a whole lot of endian swaps and in _that_ test, it
does look like the load-store forwarder is doing pretty well.
The tests:
bpf-bswap.S:
-----------
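	.file	"bpf-bswap.S"
	.abiversion 2
	.section	".text"
	.align 2
	.globl main
	.type	main, @function
main:
	mflr	0
	std	0,16(1)
	stdu	1,-32760(1)	/* allocate a 32760-byte stack frame */
	addi	3,1,32		/* r3 = base of the buffer on the stack */
	li	4,0		/* r4 = byte offset into the buffer */
	li	5,32720		/* r5 = buffer size in bytes (4090 doublewords) */
	li	11,32720
	mulli	11,11,8		/* r11 = number of passes over the buffer (261760) */
	li	10,0		/* r10 = pass counter */
	li	7,16		/* r7 = offset of the scratch slot, 16(r1) */
1:	ldx	6,3,4		/* load a doubleword from the buffer */
	stdx	6,1,7		/* store it to the scratch slot */
	ldbrx	6,1,7		/* reload it byte-reversed */
	stdx	6,3,4		/* write the swapped value back */
	addi	4,4,8
	cmpd	4,5
	beq	2f
	b	1b
2:	addi	10,10,1		/* one pass done: bump the counter, rewind the offset */
	li	4,0
	cmpd	10,11
	beq	3f
	b	1b
3:	li	3,0
	addi	1,1,32760
	ld	0,16(1)
	mtlr	0
	blr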
bpf-bswap-reg.S:
---------------
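	.file	"bpf-bswap-reg.S"
	.abiversion 2
	.section	".text"
	.align 2
	.globl main
	.type	main, @function
main:
	mflr	0
	std	0,16(1)
	stdu	1,-32760(1)	/* allocate a 32760-byte stack frame */
	addi	3,1,32		/* r3 = base of the buffer on the stack */
	li	4,0		/* r4 = byte offset into the buffer */
	li	5,32720		/* r5 = buffer size in bytes (4090 doublewords) */
	li	11,32720
	mulli	11,11,8		/* r11 = number of passes over the buffer (261760) */
	li	10,0		/* r10 = pass counter */
1:	ldx	6,3,4		/* load a doubleword from the buffer */
	rldicl	7,6,32,32	/* r7 = upper 32 bits of r6 */
	rlwinm	8,6,24,0,31	/* start byte-swapping the low word into r8 */
	rlwimi	8,6,8,8,15
	rlwinm	9,7,24,0,31	/* start byte-swapping the high word into r9 */
	rlwimi	8,6,8,24,31
	rlwimi	9,7,8,8,15
	rlwimi	9,7,8,24,31
	rldicr	8,8,32,31	/* move the swapped low word to the upper half */
	or	6,8,9		/* combine the two halves */
	stdx	6,3,4		/* write the swapped value back */
	addi	4,4,8
	cmpd	4,5
	beq	2f
	b	1b
2:	addi	10,10,1		/* one pass done: bump the counter, rewind the offset */
	li	4,0
	cmpd	10,11
	beq	3f
	b	1b
3:	li	3,0
	addi	1,1,32760
	ld	0,16(1)
	mtlr	0
	blr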
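
In case the rotate-and-insert sequence is hard to follow, here is a rough C sketch of what I understand it to compute: each 32-bit half is byte-swapped with one rotate plus two masked inserts, and the two halves are then exchanged. The helper names (rotl32, bswap32_rot, bswap64_inreg, bswap64_selfcheck) are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t x, unsigned int n)
{
	return (x << n) | (x >> (32 - n));
}

/*
 * Byte-swap one 32-bit word the way the rlwinm/rlwimi triple does:
 * start from rotl(x, 24), then overwrite two bytes with the
 * corresponding bytes of rotl(x, 8).
 */
static uint32_t bswap32_rot(uint32_t x)
{
	uint32_t r = rotl32(x, 24);			/* rlwinm rD,rS,24,0,31 */
	uint32_t t = rotl32(x, 8);

	r = (r & ~0x00ff0000u) | (t & 0x00ff0000u);	/* rlwimi rD,rS,8,8,15  */
	r = (r & ~0x000000ffu) | (t & 0x000000ffu);	/* rlwimi rD,rS,8,24,31 */
	return r;
}

/* 64-bit swap: swap each half, then exchange the halves. */
static uint64_t bswap64_inreg(uint64_t v)
{
	uint32_t hi = (uint32_t)(v >> 32);		/* rldicl 7,6,32,32 */
	uint32_t lo = (uint32_t)v;

	/* rldicr 8,8,32,31 shifts the swapped low word up; 'or' merges. */
	return ((uint64_t)bswap32_rot(lo) << 32) | bswap32_rot(hi);
}

/* Hypothetical self-check, only here to show the intended result. */
static void bswap64_selfcheck(void)
{
	assert(bswap32_rot(0xaabbccddu) == 0xddccbbaau);
	assert(bswap64_inreg(0x0102030405060708ULL) == 0x0807060504030201ULL);
}
```

The three operations per 32-bit word are independent between the two halves, which is presumably why the assembly interleaves them.
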
Profiling the two variants:
# perf stat ./bpf-bswap
 Performance counter stats for './bpf-bswap':
       1395.979224      task-clock (msec)         #    0.999 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                45      page-faults               #    0.032 K/sec
     4,651,874,673      cycles                    #    3.332 GHz                      (66.87%)
         3,141,186      stalled-cycles-frontend   #    0.07% frontend cycles idle     (50.57%)
     1,117,289,485      stalled-cycles-backend    #   24.02% backend cycles idle      (50.57%)
     8,565,963,861      instructions              #    1.84  insn per cycle
                                                  #    0.13  stalled cycles per insn  (67.05%)
     2,174,029,771      branches                  # 1557.351 M/sec                    (49.69%)
           262,656      branch-misses             #    0.01% of all branches          (50.05%)
       1.396893189 seconds time elapsed
# perf stat ./bpf-bswap-reg
 Performance counter stats for './bpf-bswap-reg':
       1819.758102      task-clock (msec)         #    0.999 CPUs utilized
                 3      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                44      page-faults               #    0.024 K/sec
     6,034,777,602      cycles                    #    3.316 GHz                      (66.83%)
         2,010,983      stalled-cycles-frontend   #    0.03% frontend cycles idle     (50.47%)
     1,024,975,759      stalled-cycles-backend    #   16.98% backend cycles idle      (50.52%)
    16,043,732,849      instructions              #    2.66  insn per cycle
                                                  #    0.06  stalled cycles per insn  (67.01%)
     2,148,710,750      branches                  # 1180.767 M/sec                    (49.57%)
           268,046      branch-misses             #    0.01% of all branches          (49.52%)
       1.821501345 seconds time elapsed
This is all in a POWER8 VM. On POWER7, the in-register variant is around
4 times faster than the ldbrx variant.
So, yes, unless I've missed something, on POWER8 the ldbrx variant seems
to perform at least on par with the in-register swap variant, if not
better (about 1.40s vs. 1.82s elapsed in the runs above).
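
To put the POWER8 numbers in per-swap terms, here is a small C sketch of the arithmetic. It assumes the loop bounds from the two test programs (4090 doublewords per pass, 261760 passes) and uses the cycle and instruction counts reported by perf, which are themselves scaled estimates since the counters were multiplexed. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Loop bounds taken from the test programs. */
#define INNER_ITERS	(32720u / 8u)	/* doublewords swapped per pass */
#define OUTER_ITERS	(32720u * 8u)	/* passes (li 11,32720; mulli 11,11,8) */

/* Total number of 64-bit swaps executed by either test. */
static uint64_t total_swaps(void)
{
	return (uint64_t)INNER_ITERS * OUTER_ITERS;
}

/* Divide a perf counter value by the number of swaps. */
static double per_swap(uint64_t counter)
{
	return (double)counter / (double)total_swaps();
}

/* Hypothetical sanity check against the figures quoted in this mail. */
static void check_perf_numbers(void)
{
	assert(total_swaps() == 1070598400ULL);

	/* ldbrx variant: between 4.3 and 4.4 cycles per swap */
	assert(per_swap(4651874673ULL) > 4.3 && per_swap(4651874673ULL) < 4.4);

	/* in-register variant: between 5.6 and 5.7 cycles per swap */
	assert(per_swap(6034777602ULL) > 5.6 && per_swap(6034777602ULL) < 5.7);
}
```

Applying per_swap() to the instruction counts gives roughly 8 and 15 instructions per swap, which matches the length of the two loop bodies. So the in-register variant retires more instructions per cycle, but not enough to make up for executing nearly twice as many of them.
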
>
> Ideally, you'd want to try to "optimize" load+swap or swap+store
> though.
Agreed. This is already the case with BPF for packet access - those use
skb helpers which issue the appropriate lhbrx/lwbrx/ldbrx. The newer
BPF_FROM_LE/BPF_FROM_BE are for endian operations with other BPF
programs.
We can probably implement an extra pass to detect use of endian swap and
try to match it up with a previous load or a subsequent store though...
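
Just to illustrate the idea, here is a toy C sketch of such a pass for the load+swap case. Everything in it is hypothetical: the toy_insn structure, the opcode names and fuse_load_swap() are invented for this example and do not correspond to the real BPF instruction encoding or to anything in the powerpc JIT. It also only looks at directly adjacent instructions and assumes the swap operates in place on the register that was just loaded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes, not the real BPF encoding. */
enum toy_op {
	TOY_LOAD64,		/* plain 64-bit load into dst */
	TOY_BSWAP64,		/* in-place 64-bit byte swap of dst */
	TOY_LOAD64_BREV,	/* byte-reversed load (think ldbrx) */
	TOY_NOP,
	TOY_OTHER,
};

struct toy_insn {
	enum toy_op op;
	int dst;		/* destination register number */
	int is_jump_target;	/* non-zero if some branch lands here */
};

/*
 * Fuse "load rX; bswap rX" pairs into a single byte-reversed load.
 * The swap is left alone if it is a jump target, because some other
 * path could then reach it without going through the load.
 * Returns the number of pairs fused.
 */
static size_t fuse_load_swap(struct toy_insn *prog, size_t len)
{
	size_t fused = 0;

	for (size_t i = 0; i + 1 < len; i++) {
		struct toy_insn *ld = &prog[i];
		struct toy_insn *sw = &prog[i + 1];

		if (ld->op != TOY_LOAD64 || sw->op != TOY_BSWAP64)
			continue;
		if (ld->dst != sw->dst || sw->is_jump_target)
			continue;

		ld->op = TOY_LOAD64_BREV;
		sw->op = TOY_NOP;
		fused++;
	}
	return fused;
}
```

A real pass would also have to handle the swap+store direction, the 16-bit and 32-bit sizes, and the case where the endian operation is a no-op on the host.
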
Thanks!
- Naveen