From: Yonghong Song <yonghong.song@linux.dev>
To: Marco Elver <elver@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Song Liu <song@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Mykola Lysenko <mykolal@fb.com>,
Shuah Khan <shuah@kernel.org>,
bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org
Subject: Re: [PATCH] bpf: Separate bpf_local_storage_lookup() fast and slow paths
Date: Wed, 31 Jan 2024 12:04:00 -0800 [thread overview]
Message-ID: <bd3d594e-55e7-47c3-99f4-efc1a4688fb2@linux.dev> (raw)
In-Reply-To: <20240131141858.1149719-1-elver@google.com>
On 1/31/24 6:18 AM, Marco Elver wrote:
> To allow the compiler to inline the bpf_local_storage_lookup() fast-
> path, factor it out by making bpf_local_storage_lookup() a static inline
> function and move the slow-path to bpf_local_storage_lookup_slowpath().
>
> Base on results from './benchs/run_bench_local_storage.sh' this produces
> improvements in throughput and latency in the majority of cases:
>
> | Hashmap Control
> | ===============
> | num keys: 10
> | hashmap (control) sequential get:
> | <before> | <after>
> | hits throughput: 13.895 ± 0.024 M ops/s | 14.022 ± 0.095 M ops/s (+0.9%)
> | hits latency: 71.968 ns/op | 71.318 ns/op (-0.9%)
> | important_hits throughput: 13.895 ± 0.024 M ops/s | 14.022 ± 0.095 M ops/s (+0.9%)
> |
> | num keys: 1000
> | hashmap (control) sequential get:
> | <before> | <after>
> | hits throughput: 11.793 ± 0.018 M ops/s | 11.645 ± 0.370 M ops/s (-1.3%)
> | hits latency: 84.794 ns/op | 85.874 ns/op (+1.3%)
> | important_hits throughput: 11.793 ± 0.018 M ops/s | 11.645 ± 0.370 M ops/s (-1.3%)
> |
> | num keys: 10000
> | hashmap (control) sequential get:
> | <before> | <after>
> | hits throughput: 7.113 ± 0.012 M ops/s | 7.037 ± 0.051 M ops/s (-1.1%)
> | hits latency: 140.581 ns/op | 142.113 ns/op (+1.1%)
> | important_hits throughput: 7.113 ± 0.012 M ops/s | 7.037 ± 0.051 M ops/s (-1.1%)
> |
> | num keys: 100000
> | hashmap (control) sequential get:
> | <before> | <after>
> | hits throughput: 4.793 ± 0.034 M ops/s | 4.990 ± 0.025 M ops/s (+4.1%)
> | hits latency: 208.623 ns/op | 200.401 ns/op (-3.9%)
> | important_hits throughput: 4.793 ± 0.034 M ops/s | 4.990 ± 0.025 M ops/s (+4.1%)
> |
> | num keys: 4194304
> | hashmap (control) sequential get:
> | <before> | <after>
> | hits throughput: 2.088 ± 0.008 M ops/s | 2.962 ± 0.004 M ops/s (+41.9%)
> | hits latency: 478.851 ns/op | 337.648 ns/op (-29.5%)
> | important_hits throughput: 2.088 ± 0.008 M ops/s | 2.962 ± 0.004 M ops/s (+41.9%)
> |
> | Local Storage
> | =============
> | num_maps: 1
> | local_storage cache sequential get:
> | <before> | <after>
> | hits throughput: 32.598 ± 0.008 M ops/s | 38.480 ± 0.054 M ops/s (+18.0%)
> | hits latency: 30.676 ns/op | 25.988 ns/op (-15.3%)
> | important_hits throughput: 32.598 ± 0.008 M ops/s | 38.480 ± 0.054 M ops/s (+18.0%)
> | local_storage cache interleaved get:
> | <before> | <after>
> | hits throughput: 36.963 ± 0.045 M ops/s | 43.847 ± 0.037 M ops/s (+18.6%)
> | hits latency: 27.054 ns/op | 22.807 ns/op (-15.7%)
> | important_hits throughput: 36.963 ± 0.045 M ops/s | 43.847 ± 0.037 M ops/s (+18.6%)
> |
> | num_maps: 10
> | local_storage cache sequential get:
> | <before> | <after>
> | hits throughput: 32.078 ± 0.004 M ops/s | 37.813 ± 0.020 M ops/s (+17.9%)
> | hits latency: 31.174 ns/op | 26.446 ns/op (-15.2%)
> | important_hits throughput: 3.208 ± 0.000 M ops/s | 3.781 ± 0.002 M ops/s (+17.9%)
> | local_storage cache interleaved get:
> | <before> | <after>
> | hits throughput: 34.564 ± 0.011 M ops/s | 40.082 ± 0.037 M ops/s (+16.0%)
> | hits latency: 28.932 ns/op | 24.949 ns/op (-13.8%)
> | important_hits throughput: 12.344 ± 0.004 M ops/s | 14.315 ± 0.013 M ops/s (+16.0%)
> |
> | num_maps: 16
> | local_storage cache sequential get:
> | <before> | <after>
> | hits throughput: 32.493 ± 0.023 M ops/s | 38.147 ± 0.029 M ops/s (+17.4%)
> | hits latency: 30.776 ns/op | 26.215 ns/op (-14.8%)
> | important_hits throughput: 2.031 ± 0.001 M ops/s | 2.384 ± 0.002 M ops/s (+17.4%)
> | local_storage cache interleaved get:
> | <before> | <after>
> | hits throughput: 34.380 ± 0.521 M ops/s | 41.605 ± 0.095 M ops/s (+21.0%)
> | hits latency: 29.087 ns/op | 24.035 ns/op (-17.4%)
> | important_hits throughput: 10.939 ± 0.166 M ops/s | 13.238 ± 0.030 M ops/s (+21.0%)
> |
> | num_maps: 17
> | local_storage cache sequential get:
> | <before> | <after>
> | hits throughput: 28.748 ± 0.028 M ops/s | 32.248 ± 0.080 M ops/s (+12.2%)
> | hits latency: 34.785 ns/op | 31.009 ns/op (-10.9%)
> | important_hits throughput: 1.693 ± 0.002 M ops/s | 1.899 ± 0.005 M ops/s (+12.2%)
> | local_storage cache interleaved get:
> | <before> | <after>
> | hits throughput: 31.313 ± 0.030 M ops/s | 35.911 ± 0.020 M ops/s (+14.7%)
> | hits latency: 31.936 ns/op | 27.847 ns/op (-12.8%)
> | important_hits throughput: 9.533 ± 0.009 M ops/s | 10.933 ± 0.006 M ops/s (+14.7%)
> |
> | num_maps: 24
> | local_storage cache sequential get:
> | <before> | <after>
> | hits throughput: 18.475 ± 0.027 M ops/s | 19.000 ± 0.006 M ops/s (+2.8%)
> | hits latency: 54.127 ns/op | 52.632 ns/op (-2.8%)
> | important_hits throughput: 0.770 ± 0.001 M ops/s | 0.792 ± 0.000 M ops/s (+2.9%)
> | local_storage cache interleaved get:
> | <before> | <after>
> | hits throughput: 21.361 ± 0.028 M ops/s | 22.388 ± 0.099 M ops/s (+4.8%)
> | hits latency: 46.814 ns/op | 44.667 ns/op (-4.6%)
> | important_hits throughput: 6.009 ± 0.008 M ops/s | 6.298 ± 0.028 M ops/s (+4.8%)
> |
> | num_maps: 32
> | local_storage cache sequential get:
> | <before> | <after>
> | hits throughput: 14.220 ± 0.006 M ops/s | 14.168 ± 0.020 M ops/s (-0.4%)
> | hits latency: 70.323 ns/op | 70.580 ns/op (+0.4%)
> | important_hits throughput: 0.445 ± 0.000 M ops/s | 0.443 ± 0.001 M ops/s (-0.4%)
> | local_storage cache interleaved get:
> | <before> | <after>
> | hits throughput: 17.250 ± 0.011 M ops/s | 16.650 ± 0.021 M ops/s (-3.5%)
> | hits latency: 57.971 ns/op | 60.061 ns/op (+3.6%)
> | important_hits throughput: 4.815 ± 0.003 M ops/s | 4.647 ± 0.006 M ops/s (-3.5%)
> |
> | num_maps: 100
> | local_storage cache sequential get:
> | <before> | <after>
> | hits throughput: 5.212 ± 0.012 M ops/s | 5.878 ± 0.004 M ops/s (+12.8%)
> | hits latency: 191.877 ns/op | 170.116 ns/op (-11.3%)
> | important_hits throughput: 0.052 ± 0.000 M ops/s | 0.059 ± 0.000 M ops/s (+13.5%)
> | local_storage cache interleaved get:
> | <before> | <after>
> | hits throughput: 6.521 ± 0.053 M ops/s | 7.086 ± 0.010 M ops/s (+8.7%)
> | hits latency: 153.343 ns/op | 141.116 ns/op (-8.0%)
> | important_hits throughput: 1.703 ± 0.014 M ops/s | 1.851 ± 0.003 M ops/s (+8.7%)
> |
> | num_maps: 1000
> | local_storage cache sequential get:
> | <before> | <after>
> | hits throughput: 0.357 ± 0.005 M ops/s | 0.325 ± 0.005 M ops/s (-9.0%)
> | hits latency: 2803.738 ns/op | 3076.923 ns/op (+9.7%)
> | important_hits throughput: 0.000 ± 0.000 M ops/s | 0.000 ± 0.000 M ops/s
> | local_storage cache interleaved get:
> | <before> | <after>
> | hits throughput: 0.434 ± 0.007 M ops/s | 0.447 ± 0.007 M ops/s (+3.0%)
> | hits latency: 2306.539 ns/op | 2237.687 ns/op (-3.0%)
> | important_hits throughput: 0.109 ± 0.002 M ops/s | 0.112 ± 0.002 M ops/s (+2.8%)
>
> Signed-off-by: Marco Elver <elver@google.com>
Thanks for the patch. This indeed could improve performance with inlining.
In Meta, we started to use LTO kernel which already have
bpf_local_storage_lookup() cross-file inlined into
bpf_sk_storage_lookup(), so we won't be able to reap
this benefit.
In the subject, please use tag [PATCH bpf-next] so CI can
pick it up for testing properly.
[...]
next prev parent reply other threads:[~2024-01-31 20:04 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-31 14:18 [PATCH] bpf: Separate bpf_local_storage_lookup() fast and slow paths Marco Elver
2024-01-31 19:52 ` Martin KaFai Lau
2024-01-31 20:09 ` Marco Elver
2024-02-05 15:00 ` Marco Elver
2024-02-05 23:24 ` Martin KaFai Lau
2024-02-06 17:04 ` Marco Elver
2024-02-07 1:22 ` Martin KaFai Lau
2024-02-07 9:56 ` Marco Elver
2024-01-31 20:04 ` Yonghong Song [this message]
2024-02-05 23:02 ` Song Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bd3d594e-55e7-47c3-99f4-efc1a4688fb2@linux.dev \
--to=yonghong.song@linux.dev \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=elver@google.com \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=mykolal@fb.com \
--cc=sdf@google.com \
--cc=shuah@kernel.org \
--cc=song@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.