public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Yonghong Song <yonghong.song@linux.dev>
To: Marco Elver <elver@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>, Mykola Lysenko <mykolal@fb.com>,
	Shuah Khan <shuah@kernel.org>,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH] bpf: Separate bpf_local_storage_lookup() fast and slow paths
Date: Wed, 31 Jan 2024 12:04:00 -0800	[thread overview]
Message-ID: <bd3d594e-55e7-47c3-99f4-efc1a4688fb2@linux.dev> (raw)
In-Reply-To: <20240131141858.1149719-1-elver@google.com>


On 1/31/24 6:18 AM, Marco Elver wrote:
> To allow the compiler to inline the bpf_local_storage_lookup() fast-
> path, factor it out by making bpf_local_storage_lookup() a static inline
> function and move the slow-path to bpf_local_storage_lookup_slowpath().
>
> Base on results from './benchs/run_bench_local_storage.sh' this produces
> improvements in throughput and latency in the majority of cases:
>
> | Hashmap Control
> | ===============
> | num keys: 10
> |  hashmap (control) sequential get:
> |                              <before>                | <after>
> |   hits throughput:           13.895 ± 0.024 M ops/s  | 14.022 ± 0.095 M ops/s	(+0.9%)
> |   hits latency:              71.968 ns/op            | 71.318 ns/op		(-0.9%)
> |   important_hits throughput: 13.895 ± 0.024 M ops/s  | 14.022 ± 0.095 M ops/s	(+0.9%)
> |
> | num keys: 1000
> |  hashmap (control) sequential get:
> |                              <before>                | <after>
> |   hits throughput:           11.793 ± 0.018 M ops/s  | 11.645 ± 0.370 M ops/s	(-1.3%)
> |   hits latency:              84.794 ns/op            | 85.874 ns/op		(+1.3%)
> |   important_hits throughput: 11.793 ± 0.018 M ops/s  | 11.645 ± 0.370 M ops/s	(-1.3%)
> |
> | num keys: 10000
> |  hashmap (control) sequential get:
> |                              <before>                | <after>
> |   hits throughput:           7.113 ± 0.012 M ops/s   | 7.037 ± 0.051 M ops/s	(-1.1%)
> |   hits latency:              140.581 ns/op           | 142.113 ns/op		(+1.1%)
> |   important_hits throughput: 7.113 ± 0.012 M ops/s   | 7.037 ± 0.051 M ops/s	(-1.1%)
> |
> | num keys: 100000
> |  hashmap (control) sequential get:
> |                              <before>                | <after>
> |   hits throughput:           4.793 ± 0.034 M ops/s   | 4.990 ± 0.025 M ops/s	(+4.1%)
> |   hits latency:              208.623 ns/op           | 200.401 ns/op		(-3.9%)
> |   important_hits throughput: 4.793 ± 0.034 M ops/s   | 4.990 ± 0.025 M ops/s	(+4.1%)
> |
> | num keys: 4194304
> |  hashmap (control) sequential get:
> |                              <before>                | <after>
> |   hits throughput:           2.088 ± 0.008 M ops/s   | 2.962 ± 0.004 M ops/s	(+41.9%)
> |   hits latency:              478.851 ns/op           | 337.648 ns/op		(-29.5%)
> |   important_hits throughput: 2.088 ± 0.008 M ops/s   | 2.962 ± 0.004 M ops/s	(+41.9%)
> |
> | Local Storage
> | =============
> | num_maps: 1
> |  local_storage cache sequential  get:
> |                              <before>                | <after>
> |   hits throughput:           32.598 ± 0.008 M ops/s  | 38.480 ± 0.054 M ops/s	(+18.0%)
> |   hits latency:              30.676 ns/op            | 25.988 ns/op		(-15.3%)
> |   important_hits throughput: 32.598 ± 0.008 M ops/s  | 38.480 ± 0.054 M ops/s	(+18.0%)
> |  local_storage cache interleaved get:
> |                              <before>                | <after>
> |   hits throughput:           36.963 ± 0.045 M ops/s  | 43.847 ± 0.037 M ops/s	(+18.6%)
> |   hits latency:              27.054 ns/op            | 22.807 ns/op		(-15.7%)
> |   important_hits throughput: 36.963 ± 0.045 M ops/s  | 43.847 ± 0.037 M ops/s	(+18.6%)
> |
> | num_maps: 10
> |  local_storage cache sequential  get:
> |                              <before>                | <after>
> |   hits throughput:           32.078 ± 0.004 M ops/s  | 37.813 ± 0.020 M ops/s	(+17.9%)
> |   hits latency:              31.174 ns/op            | 26.446 ns/op		(-15.2%)
> |   important_hits throughput: 3.208 ± 0.000 M ops/s   | 3.781 ± 0.002 M ops/s	(+17.9%)
> |  local_storage cache interleaved get:
> |                              <before>                | <after>
> |   hits throughput:           34.564 ± 0.011 M ops/s  | 40.082 ± 0.037 M ops/s	(+16.0%)
> |   hits latency:              28.932 ns/op            | 24.949 ns/op		(-13.8%)
> |   important_hits throughput: 12.344 ± 0.004 M ops/s  | 14.315 ± 0.013 M ops/s	(+16.0%)
> |
> | num_maps: 16
> |  local_storage cache sequential  get:
> |                              <before>                | <after>
> |   hits throughput:           32.493 ± 0.023 M ops/s  | 38.147 ± 0.029 M ops/s	(+17.4%)
> |   hits latency:              30.776 ns/op            | 26.215 ns/op		(-14.8%)
> |   important_hits throughput: 2.031 ± 0.001 M ops/s   | 2.384 ± 0.002 M ops/s	(+17.4%)
> |  local_storage cache interleaved get:
> |                              <before>                | <after>
> |   hits throughput:           34.380 ± 0.521 M ops/s  | 41.605 ± 0.095 M ops/s	(+21.0%)
> |   hits latency:              29.087 ns/op            | 24.035 ns/op		(-17.4%)
> |   important_hits throughput: 10.939 ± 0.166 M ops/s  | 13.238 ± 0.030 M ops/s	(+21.0%)
> |
> | num_maps: 17
> |  local_storage cache sequential  get:
> |                              <before>                | <after>
> |   hits throughput:           28.748 ± 0.028 M ops/s  | 32.248 ± 0.080 M ops/s	(+12.2%)
> |   hits latency:              34.785 ns/op            | 31.009 ns/op		(-10.9%)
> |   important_hits throughput: 1.693 ± 0.002 M ops/s   | 1.899 ± 0.005 M ops/s	(+12.2%)
> |  local_storage cache interleaved get:
> |                              <before>                | <after>
> |   hits throughput:           31.313 ± 0.030 M ops/s  | 35.911 ± 0.020 M ops/s	(+14.7%)
> |   hits latency:              31.936 ns/op            | 27.847 ns/op		(-12.8%)
> |   important_hits throughput: 9.533 ± 0.009 M ops/s   | 10.933 ± 0.006 M ops/s	(+14.7%)
> |
> | num_maps: 24
> |  local_storage cache sequential  get:
> |                              <before>                | <after>
> |   hits throughput:           18.475 ± 0.027 M ops/s  | 19.000 ± 0.006 M ops/s	(+2.8%)
> |   hits latency:              54.127 ns/op            | 52.632 ns/op		(-2.8%)
> |   important_hits throughput: 0.770 ± 0.001 M ops/s   | 0.792 ± 0.000 M ops/s	(+2.9%)
> |  local_storage cache interleaved get:
> |                              <before>                | <after>
> |   hits throughput:           21.361 ± 0.028 M ops/s  | 22.388 ± 0.099 M ops/s	(+4.8%)
> |   hits latency:              46.814 ns/op            | 44.667 ns/op		(-4.6%)
> |   important_hits throughput: 6.009 ± 0.008 M ops/s   | 6.298 ± 0.028 M ops/s	(+4.8%)
> |
> | num_maps: 32
> |  local_storage cache sequential  get:
> |                              <before>                | <after>
> |   hits throughput:           14.220 ± 0.006 M ops/s  | 14.168 ± 0.020 M ops/s	(-0.4%)
> |   hits latency:              70.323 ns/op            | 70.580 ns/op		(+0.4%)
> |   important_hits throughput: 0.445 ± 0.000 M ops/s   | 0.443 ± 0.001 M ops/s	(-0.4%)
> |  local_storage cache interleaved get:
> |                              <before>                | <after>
> |   hits throughput:           17.250 ± 0.011 M ops/s  | 16.650 ± 0.021 M ops/s	(-3.5%)
> |   hits latency:              57.971 ns/op            | 60.061 ns/op		(+3.6%)
> |   important_hits throughput: 4.815 ± 0.003 M ops/s   | 4.647 ± 0.006 M ops/s	(-3.5%)
> |
> | num_maps: 100
> |  local_storage cache sequential  get:
> |                              <before>                | <after>
> |   hits throughput:           5.212 ± 0.012 M ops/s   | 5.878 ± 0.004 M ops/s	(+12.8%)
> |   hits latency:              191.877 ns/op           | 170.116 ns/op		(-11.3%)
> |   important_hits throughput: 0.052 ± 0.000 M ops/s   | 0.059 ± 0.000 M ops/s	(+13.5%)
> |  local_storage cache interleaved get:
> |                              <before>                | <after>
> |   hits throughput:           6.521 ± 0.053 M ops/s   | 7.086 ± 0.010 M ops/s	(+8.7%)
> |   hits latency:              153.343 ns/op           | 141.116 ns/op		(-8.0%)
> |   important_hits throughput: 1.703 ± 0.014 M ops/s   | 1.851 ± 0.003 M ops/s	(+8.7%)
> |
> | num_maps: 1000
> |  local_storage cache sequential  get:
> |                              <before>                | <after>
> |   hits throughput:           0.357 ± 0.005 M ops/s   | 0.325 ± 0.005 M ops/s	(-9.0%)
> |   hits latency:              2803.738 ns/op          | 3076.923 ns/op		(+9.7%)
> |   important_hits throughput: 0.000 ± 0.000 M ops/s   | 0.000 ± 0.000 M ops/s
> |  local_storage cache interleaved get:
> |                              <before>                | <after>
> |   hits throughput:           0.434 ± 0.007 M ops/s   | 0.447 ± 0.007 M ops/s	(+3.0%)
> |   hits latency:              2306.539 ns/op          | 2237.687 ns/op		(-3.0%)
> |   important_hits throughput: 0.109 ± 0.002 M ops/s   | 0.112 ± 0.002 M ops/s	(+2.8%)
>
> Signed-off-by: Marco Elver <elver@google.com>

Thanks for the patch. This indeed could improve performance with inlining.
In Meta, we started to use LTO kernel which already have
bpf_local_storage_lookup() cross-file inlined into
bpf_sk_storage_lookup(), so we won't be able to reap
this benefit.

In the subject, please use tag [PATCH bpf-next] so CI can
pick it up for testing properly.

[...]


  parent reply	other threads:[~2024-01-31 20:04 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-31 14:18 [PATCH] bpf: Separate bpf_local_storage_lookup() fast and slow paths Marco Elver
2024-01-31 19:52 ` Martin KaFai Lau
2024-01-31 20:09   ` Marco Elver
2024-02-05 15:00   ` Marco Elver
2024-02-05 23:24     ` Martin KaFai Lau
2024-02-06 17:04       ` Marco Elver
2024-02-07  1:22         ` Martin KaFai Lau
2024-02-07  9:56           ` Marco Elver
2024-01-31 20:04 ` Yonghong Song [this message]
2024-02-05 23:02 ` Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bd3d594e-55e7-47c3-99f4-efc1a4688fb2@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=elver@google.com \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=mykolal@fb.com \
    --cc=sdf@google.com \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox