From: Matt Evans <matt@ozlabs.org>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org, linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [RFC PATCH 1/1] BPF JIT for PPC64
Date: Mon, 11 Jul 2011 17:04:34 +1000	[thread overview]
Message-ID: <4E1AA082.2040703@ozlabs.org> (raw)
In-Reply-To: <1308988180.2532.27.camel@edumazet-laptop>

Hi Eric,

On 25/06/11 17:49, Eric Dumazet wrote:
> On Saturday, 25 June 2011 at 09:33 +0200, Andreas Schwab wrote:
>> Matt Evans <matt@ozlabs.org> writes:
>>
>>> +	stdu	r1, -128(r1);					\
>>
>>> +	addi	r5, r1, 128+BPF_PPC_STACK_BASIC+(2*8);		\
>>
>>> +	addi	r1, r1, 128;					\
>>
>>> +					PPC_STD(r_M + i, 1, -128 + (8*i));
>>
>>> +					PPC_LD(r_M + i, 1, -128 + (8*i));
>>
>> s/128/BPF_PPC_STACK_SAVE/?
>>
> 
> I am not sure using registers to hold MEM[] is a win if MEM[idx] is used
> only once in the filter.
> 
> # tcpdump "tcp[20]+tcp[21]=0" -d
> (000) ldh      [12]
> (001) jeq      #0x800           jt 2	jf 15
> (002) ldb      [23]
> (003) jeq      #0x6             jt 4	jf 15
> (004) ldh      [20]
> (005) jset     #0x1fff          jt 15	jf 6
> (006) ldxb     4*([14]&0xf)
> (007) ldb      [x + 34]
> (008) st       M[1]
> (009) ldb      [x + 35]
> (010) tax      
> (011) ld       M[1]
> (012) add      x
> (013) jeq      #0x0             jt 14	jf 15
> (014) ret      #65535
> (015) ret      #0
> 
> In this sample, we use M[1] only once (one store, one load).
> 
> So saving the previous register contents on the stack in the prologue,
> and restoring them in the epilogue, actually slows down the code and
> adds two instructions to the filter asm code.

The x86 version generates all accesses of M[x] as a load or store to the
stackframe, so your example would result in one store + one load to/from the
stack.  More than one access would result in N stores/loads.  By having M[] live
in r16-31 on PPC, there is a one-off cost of saving/restoring the (non-volatile)
registers to the stack plus a per-use cost of a register-register move (MR, which
is pretty cheap compared to loads/stores!).

You are correct that, for the single store/load of M[1] above, I'll generate
a STD, MR, MR, LD; the extra cost here is the two MRs, and that's pretty small.  With the
current implementation the gains seen when accessing M[x] /more/ than once will
IMHO more than justify this.  (For several M[x] accesses, x86 would have several
mem ops, PPC would have several reg-reg moves and one load+store.)
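
To make that concrete, the single-use M[1] case comes out roughly as below.
(Register numbers and stack offsets are purely illustrative, not necessarily
what the JIT really picks; r17 stands for whichever non-volatile reg holds
M[1], r4 for wherever A lives.)

	# prologue, once per filter run:
	std	r17, -120(r1)		# save caller's r17
	...
	mr	r17, r4			# st  M[1]  (reg-reg move)
	...
	mr	r4, r17			# ld  M[1]  (reg-reg move)
	...
	# epilogue, once per filter run:
	ld	r17, -120(r1)		# restore caller's r17

So for a single use both schemes do two memory ops; PPC pays two extra MRs,
but every further M[1] access is another memory op on x86 versus another MR
here.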

An obvious alternative would be to use some of the three free volatile registers
first (in the hope that "most" filters won't use >3 M[] locations), with a
simple register allocator.  This would save the (non-volatile reg) spill/fill to
stack, too.  In the interest of simplicity I didn't want to do that on a first
pass.
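
For illustration (made-up register number again), a filter whose M[1] landed
in a spare volatile reg, say r8, would then need no save/restore at all:

	mr	r8, r4			# st  M[1]
	...
	mr	r4, r8			# ld  M[1]
	# nothing for this slot in the prologue or epilogue

Only filters using more than three M[] slots would fall back to the
non-volatile regs (and pay the spill/fill) as the code does today.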

> This also makes the epilogue code not easy (not possible, as a matter of
> fact) to unwind in a helper function.
> 
> In the x86_64 implementation, I chose to make bpf_error force an
> exception, not returning to the JIT code but directly to the bpf_func()
> caller:
> 
> bpf_error:
> # force a return 0 from jit handler
>         xor             %eax,%eax
>         mov             -8(%rbp),%rbx
>         leaveq
>         ret

Yep, if I use non-volatile regs a return isn't just a simple stack "pop".
Currently I have an extra branch in the return path to hit the common epilogue.
This could be optimised so that the out-of-line error path jumps directly to
the common epilogue to return (rather than back to the callsite, checking a
flag and /then/ branching to the epilogue), which would speed up the non-error
case; there's a rough sketch of that below.  However, it's just
a question of getting to the (existing) epilogue code to clean up; it doesn't
need to be unwound in the helper function.  I don't think this issue is a
strong argument against having M[] exist in registers, though.
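
For reference, the shape I have in mind is something like this (labels,
register numbers and offsets are illustrative; the 128-byte frame is the
RFC's):

bpf_error:
	li	r3, 0			# force a return value of 0
	b	common_epilogue		# straight to the shared epilogue,
					# no flag check back at the callsite
	...
common_epilogue:
	ld	r17, -120(r1)		# restore non-volatiles used for M[]
	addi	r1, r1, 128		# tear down the stack frame
	blr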

From the current position, I think going in the direction of using volatile regs
(without backup/restore cost) is better than going in the direction of making
all M[] references stack accesses.  Do you think it's bearable to continue as it
is and then perform that optimisation later?

Also, thanks for reading/commenting on the patch!



Cheers,


Matt
