From: Lorenzo Bianconi <lorenzo@kernel.org>
To: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Shay Agroskin <shayagr@amazon.com>,
Lorenzo Bianconi <lorenzo.bianconi@redhat.com>,
bpf@vger.kernel.org, netdev@vger.kernel.org, davem@davemloft.net,
kuba@kernel.org, ast@kernel.org, daniel@iogearbox.net,
toke@redhat.com, freysteinn.alfredsson@kau.se,
john.fastabend@gmail.com, jasowang@redhat.com, mst@redhat.com,
thomas.petazzoni@bootlin.com, mw@semihalf.com,
linux@armlinux.org.uk, ilias.apalodimas@linaro.org,
netanel@amazon.com, akiyano@amazon.com,
michael.chan@broadcom.com, madalin.bucur@nxp.com,
ioana.ciornei@nxp.com, jesse.brandeburg@intel.com,
anthony.l.nguyen@intel.com, saeedm@nvidia.com,
grygorii.strashko@ti.com, ecree.xilinx@gmail.com
Subject: Re: [PATCH v2 bpf-next] bpf: devmap: move drop error path to devmap for XDP_REDIRECT
Date: Tue, 2 Mar 2021 16:28:43 +0100
Message-ID: <YD5ZqzIa5TymNdB4@lore-desk>
In-Reply-To: <20210301211837.4a755c44@carbon>

> On Mon, 1 Mar 2021 13:23:06 +0200
> Shay Agroskin <shayagr@amazon.com> wrote:
>
> > Jesper Dangaard Brouer <brouer@redhat.com> writes:
> >
> > > On Sun, 28 Feb 2021 23:27:25 +0100
> > > Lorenzo Bianconi <lorenzo.bianconi@redhat.com> wrote:
> > >
> > >> > > drops = bq->count - sent;
> > >> > > -out:
> > >> > > - bq->count = 0;
> > >> > > + if (unlikely(drops > 0)) {
> > >> > > + /* If not all frames have been transmitted, it is our
> > >> > > + * responsibility to free them
> > >> > > + */
> > >> > > + for (i = sent; i < bq->count; i++)
> > >> > > + xdp_return_frame_rx_napi(bq->q[i]);
> > >> > > + }
> > >> >
> > >> > Wouldn't the logic above be the same even w/o the 'if'
> > >> > condition ?
> > >>
> > >> it is just an optimization to avoid the for loop instruction if
> > >> sent = bq->count
> > >
> > > True, and I like this optimization.
> > > It will affect how the code layout is (and thereby I-cache
> > > usage).
> >
> > I'm not sure what I-cache optimization you mean here. Compiling
> > the following C code:
> >
> > # define unlikely(x) __builtin_expect(!!(x), 0)
> >
> > extern void xdp_return_frame_rx_napi(int q);
> >
> > struct bq_stuff {
> > int q[4];
> > int count;
> > };
> >
> > int test(int sent, struct bq_stuff *bq) {
> > int i;
> > int drops;
> >
> > drops = bq->count - sent;
> > if(unlikely(drops > 0))
> > for (i = sent; i < bq->count; i++)
> > xdp_return_frame_rx_napi(bq->q[i]);
> >
> > return 2;
> > }
> >
> > with x86_64 gcc 10.2 with -O3 flag in https://godbolt.org/ (which
> > provides the assembly code for different compilers) yields the
> > following assembly:
> >
> > test:
> > mov eax, DWORD PTR [rsi+16]
> > mov edx, eax
> > sub edx, edi
> > test edx, edx
> > jg .L10
> > .L6:
> > mov eax, 2
> > ret
>
> This shows exactly my point. Notice how 'ret' happens earlier in this
> function. This is the common case, so the CPU doesn't have to load the
> asm instructions below.
>
> > .L10:
> > cmp eax, edi
> > jle .L6
> > push rbp
> > mov rbp, rsi
> > push rbx
> > movsx rbx, edi
> > sub rsp, 8
> > .L3:
> > mov edi, DWORD PTR [rbp+0+rbx*4]
> > add rbx, 1
> > call xdp_return_frame_rx_napi
> > cmp DWORD PTR [rbp+16], ebx
> > jg .L3
> > add rsp, 8
> > mov eax, 2
> > pop rbx
> > pop rbp
> > ret
> >
> >
> > When dropping the 'if' completely I get the following assembly
> > output
> > test:
> > cmp edi, DWORD PTR [rsi+16]
> > jge .L6
>
> Jump to .L6, which is the common case. The code in between is not used
> in the common case, but the CPU will likely load it into the I-cache
> and then jump over it.
>
> > push rbp
> > mov rbp, rsi
> > push rbx
> > movsx rbx, edi
> > sub rsp, 8
> > .L3:
> > mov edi, DWORD PTR [rbp+0+rbx*4]
> > add rbx, 1
> > call xdp_return_frame_rx_napi
> > cmp DWORD PTR [rbp+16], ebx
> > jg .L3
> > add rsp, 8
> > mov eax, 2
> > pop rbx
> > pop rbp
> > ret
> > .L6:
> > mov eax, 2
> > ret
> >
> > which exits earlier from the function if 'drops > 0' compared to
> > the original code (the 'for' loop looks a little different, but
> > this shouldn't affect icache).
> >
> > When removing the 'if' and surrounding the 'for' condition with
> > 'unlikely' statement:
> >
> > for (i = sent; unlikely(i < bq->count); i++)
> >
> > I get the following assembly code:
> >
> > test:
> > cmp edi, DWORD PTR [rsi+16]
> > jl .L10
> > mov eax, 2
> > ret
> > .L10:
> > push rbx
> > movsx rbx, edi
> > sub rsp, 16
> > .L3:
> > mov edi, DWORD PTR [rsi+rbx*4]
> > mov QWORD PTR [rsp+8], rsi
> > add rbx, 1
> > call xdp_return_frame_rx_napi
> > mov rsi, QWORD PTR [rsp+8]
> > cmp DWORD PTR [rsi+16], ebx
> > jg .L3
> > add rsp, 16
> > mov eax, 2
> > pop rbx
> > ret
> >
> > which is shorter than the other two (one line compared to the
> > second and 7 lines compared to the original code) and seems as
> > optimized as the second.
>
> You are also using unlikely() and get the earlier return, with fewer
> instructions, which is great. Perhaps we can use this type of
> unlikely() in the for-statement? WDYT Lorenzo?
Sure, we can do it. I will address it in v3. Thanks.

Regards,
Lorenzo
>
>
> > I'm far from being an assembly expert, and I tested a code snippet
> > I wrote myself rather than the kernel's code (for the sake of
> > simplicity only).
> > Can you please elaborate on what makes the original 'if' essential
> > (I took the time to do the assembly tests, please take the time on
> > your side to prove your point, I'm not trying to be grumpy here).
> >
> > Shay
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer
>
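
For reference, a minimal user-space sketch of the loop shape suggested above, with unlikely() placed in the for-statement condition instead of a separate 'if'. This is only an illustration and not the v3 patch: free_unsent(), BQ_SIZE, freed_frames and last_freed are hypothetical names, and xdp_return_frame_rx_napi() is replaced by a stub taking an int, as in Shay's test snippet. It assumes GCC or Clang for __builtin_expect().

```c
#include <assert.h>

/* Same helper macro as in the test snippet quoted above (GCC/Clang builtin). */
#define unlikely(x) __builtin_expect(!!(x), 0)

#define BQ_SIZE 4 /* hypothetical queue depth, matches the quoted snippet */

struct bq_stuff {
	int q[BQ_SIZE];
	int count;
};

/* Stub standing in for the kernel's xdp_return_frame_rx_napi(): it only
 * counts calls and remembers the last value, so the loop can be checked
 * outside the kernel.
 */
static int freed_frames;
static int last_freed;

static void xdp_return_frame_rx_napi(int q)
{
	freed_frames++;
	last_freed = q;
}

/* Free every frame that was queued but not sent. The unlikely() hint sits
 * in the loop condition, so no separate 'if (drops > 0)' is needed.
 * Returns the number of dropped frames.
 */
static int free_unsent(int sent, struct bq_stuff *bq)
{
	int i;

	assert(sent >= 0 && bq->count <= BQ_SIZE);

	for (i = sent; unlikely(i < bq->count); i++)
		xdp_return_frame_rx_napi(bq->q[i]);

	return bq->count > sent ? bq->count - sent : 0;
}
```

When sent == bq->count the loop body never runs and the function returns 0. Whether the compiler actually moves the loop out of the hot path depends on the compiler and flags, so the generated assembly should still be checked.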