From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Puranjay Mohan <puranjay@kernel.org>,
Albert Ou <aou@eecs.berkeley.edu>,
Alexei Starovoitov <ast@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Andrii Nakryiko <andrii@kernel.org>,
bpf@vger.kernel.org, Daniel Borkmann <daniel@iogearbox.net>,
"David S. Miller" <davem@davemloft.net>,
Eduard Zingerman <eddyz87@gmail.com>,
Eric Dumazet <edumazet@google.com>, Hao Luo <haoluo@google.com>,
Helge Deller <deller@gmx.de>, Jakub Kicinski <kuba@kernel.org>,
"James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>,
Jiri Olsa <jolsa@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
linux-kernel@vger.kernel.org, linux-parisc@vger.kernel.org,
linux-riscv@lists.infradead.org,
Martin KaFai Lau <martin.lau@linux.dev>,
Mykola Lysenko <mykolal@fb.com>,
netdev@vger.kernel.org, Palmer Dabbelt <palmer@dabbelt.com>,
Paolo Abeni <pabeni@redhat.com>,
Paul Walmsley <paul.walmsley@sifive.com>,
Puranjay Mohan <puranjay12@gmail.com>,
Puranjay Mohan <puranjay@kernel.org>,
Shuah Khan <shuah@kernel.org>, Song Liu <song@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>,
Yonghong Song <yonghong.song@linux.dev>
Subject: Re: [PATCH bpf-next 2/5] bpf: bpf_csum_diff: optimize and homogenize for all archs
Date: Tue, 22 Oct 2024 11:54:32 +0200
Message-ID: <874j54iht3.fsf@toke.dk>
In-Reply-To: <20241021122112.101513-3-puranjay@kernel.org>
Puranjay Mohan <puranjay@kernel.org> writes:
> 1. Optimization
> ------------
>
> The current implementation copies the 'from' and 'to' buffers to a
> scratchpad and it takes the bitwise NOT of 'from' buffer while copying.
> In the next step csum_partial() is called with this scratchpad.
>
> So, mathematically, the current implementation is doing:
>
> result = csum(to - from)
>
> Here, 'to' and '~from' are copied into the scratchpad buffer. They need
> to be in the scratchpad because csum_partial() takes a single
> contiguous buffer and not two disjoint buffers like 'to' and 'from'.
>
> We can rewrite this equation as:
>
> result = csum(to) - csum(from)
>
> using the distributive property of csum().
>
> This allows 'to' and 'from' to be at different locations, and therefore
> the scratchpad and the copying are not needed.
>
> This in C code will look like:
>
> 	result = csum_sub(csum_partial(to, to_size, seed),
> 			  csum_partial(from, from_size, 0));
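
The equivalence described above can be checked in userspace with a small sketch. This is not the kernel code: fold16(), sum16(), diff_scratch() and diff_direct() are hypothetical stand-ins written for illustration, sum16() only approximates what csum_partial() does (it sums big-endian 16-bit words and assumes an even length), and the 128-byte scratch size is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a wide one's-complement accumulator down to 16 bits. */
static uint16_t fold16(uint64_t acc)
{
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)acc;
}

/*
 * Hypothetical stand-in for csum_partial(): sums big-endian 16-bit
 * words on top of a seed. Assumes len is even.
 */
static uint16_t sum16(const uint8_t *buf, size_t len, uint16_t seed)
{
	uint64_t acc = seed;

	for (size_t i = 0; i + 1 < len; i += 2)
		acc += ((uint32_t)buf[i] << 8) | buf[i + 1];
	return fold16(acc);
}

/* Old approach: copy ~from and to into one scratch buffer, checksum once. */
static uint16_t diff_scratch(const uint8_t *from, size_t from_len,
			     const uint8_t *to, size_t to_len, uint16_t seed)
{
	uint8_t scratch[128];	/* arbitrary size for this sketch */

	if (from_len + to_len > sizeof(scratch))
		return 0;	/* sketch only: no real error handling */
	for (size_t i = 0; i < from_len; i++)
		scratch[i] = (uint8_t)~from[i];
	memcpy(scratch + from_len, to, to_len);
	return sum16(scratch, from_len + to_len, seed);
}

/* New approach: checksum each buffer in place, then subtract. */
static uint16_t diff_direct(const uint8_t *from, size_t from_len,
			    const uint8_t *to, size_t to_len, uint16_t seed)
{
	uint16_t a = sum16(to, to_len, seed);
	uint16_t b = sum16(from, from_len, 0);

	/* One's-complement subtraction: a - b is computed as a + ~b. */
	return fold16((uint64_t)a + (uint16_t)~b);
}
```

With from = 00 00 00 1c (28) and to = 00 00 00 2a (42) and a zero seed, both functions return 0x000e, i.e. 42 - 28, without the second one ever needing a contiguous copy.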
>
> 2. Homogenization
> --------------
>
> The bpf_csum_diff() helper calls csum_partial() which is implemented by
> some architectures like arm and x86 but other architectures rely on the
> generic implementation in lib/checksum.c
>
> The generic implementation in lib/checksum.c returns a 16-bit value, but
> the arch-specific implementations can return more than 16 bits. This
> works out in most places because the result is passed through csum_fold(),
> which turns it into a 16-bit value, before it is used.
>
> bpf_csum_diff() directly returns the value from csum_partial() and
> therefore the returned values could be different on different
> architectures. See the discussion in [1]:
>
> For the int value 28, the calculated checksums are:
>
>   x86                    :    -29 : 0xffffffe3
>   generic (arm64, riscv) :  65507 : 0x0000ffe3
>   arm                    : 131042 : 0x0001ffe2
>
> Pass the result of bpf_csum_diff() through from32to16() before returning
> to homogenize this result for all architectures.
>
> NOTE: from32to16() is used instead of csum_fold() because csum_fold()
> does from32to16() + bitwise NOT of the result, which is not what we want
> to do here.
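
To see why folding homogenizes the three values quoted above, here is a minimal userspace sketch. fold32to16() mirrors the two-step end-around-carry fold that the patch description attributes to from32to16(); fold_and_invert() is a hypothetical stand-in for the "fold + bitwise NOT" behaviour attributed to csum_fold(). Neither is the kernel's actual code.

```c
#include <stdint.h>

/* Fold a 32-bit one's-complement sum to 16 bits (end-around carry). */
static uint16_t fold32to16(uint32_t x)
{
	/* Add the high and low halves; the result may carry into bit 16. */
	x = (x & 0xffff) + (x >> 16);
	/* Add that carry back in. */
	x = (x & 0xffff) + (x >> 16);
	return (uint16_t)x;
}

/* Hypothetical stand-in for the fold + NOT step described in the NOTE. */
static uint16_t fold_and_invert(uint32_t x)
{
	return (uint16_t)~fold32to16(x);
}
```

All three per-architecture results (0xffffffe3, 0x0000ffe3, 0x0001ffe2) fold to 0xffe3, whereas fold_and_invert() would return 0x001c for each of them, which is a different quantity from the one bpf_csum_diff() is meant to return.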
>
> [1] https://lore.kernel.org/bpf/CAJ+HfNiQbOcqCLxFUP2FMm5QrLXUUaj852Fxe3hn_2JNiucn6g@mail.gmail.com/
>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Pretty neat simplification :)
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>