All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Joe Stringer <joe@isovalent.com>
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	ast@kernel.org, corbet@lwn.net, bagasdotme@gmail.com,
	maxtram95@gmail.com, bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next v3] docs/bpf: Add LRU internals description and graph
Date: Tue, 14 Mar 2023 12:31:24 -0700	[thread overview]
Message-ID: <c6172fe2-7d88-f9f8-e19a-47c232f9cb75@linux.dev> (raw)
In-Reply-To: <20230312190600.324573-1-joe@isovalent.com>

On 3/12/23 12:05 PM, Joe Stringer wrote:
> Extend the bpf hashmap docs to include a brief description of the
> internals of the LRU map type (setting appropriate API expectations),
> including the original commit message from Martin and a variant on the
> graph that I had presented during my Linux Plumbers Conference 2022 talk
> on "Pressure feedback for LRU map types"[0].
> 
> The node names in the dot file correspond roughly to the functions where
> the logic for those decisions or steps is defined, to help curious
> developers to cross-reference and update this logic if the details of
> the LRU implementation ever differ from this description.
> 
> [0]: https://lpc.events/event/16/contributions/1368/
> 
> Signed-off-by: Joe Stringer <joe@isovalent.com>
> ---
> v3: Use standard table syntax
>      Replace inline commit message with reference to commit
>      Fix incorrect Y/N label for common LRU check
>      Rename some dotfile variables to reduce confusion between cases
>      Minor wording touchups
> v2: Fix issue that caused initial email submission to fail
> ---
>   Documentation/bpf/map_hash.rst            |  62 ++++++++
>   Documentation/bpf/map_lru_hash_update.dot | 166 ++++++++++++++++++++++
>   2 files changed, 228 insertions(+)
>   create mode 100644 Documentation/bpf/map_lru_hash_update.dot
> 
> diff --git a/Documentation/bpf/map_hash.rst b/Documentation/bpf/map_hash.rst
> index 8669426264c6..61602ce26561 100644
> --- a/Documentation/bpf/map_hash.rst
> +++ b/Documentation/bpf/map_hash.rst
> @@ -1,5 +1,6 @@
>   .. SPDX-License-Identifier: GPL-2.0-only
>   .. Copyright (C) 2022 Red Hat, Inc.
> +.. Copyright (C) 2022-2023 Isovalent, Inc.
>   
>   ===============================================
>   BPF_MAP_TYPE_HASH, with PERCPU and LRU Variants
> @@ -206,3 +207,64 @@ Userspace walking the map elements from the map declared above:
>                       cur_key = &next_key;
>               }
>       }
> +
> +Internals
> +=========
> +
> +This section of the document is targeted at Linux developers and describes
> +aspects of the map implementations that are not considered stable ABI. The
> +following details are subject to change in future versions of the kernel.
> +
> +``BPF_MAP_TYPE_LRU_HASH`` and variants
> +--------------------------------------
> +
> +An LRU hashmap type consists of two properties: Firstly, it is a hash map and
> +hence is indexable by key for constant time lookups. Secondly, when at map
> +capacity, map updates will trigger eviction of old entries based on the age of
> +the elements in a set of lists. Each of these properties may be either global
> +or per-CPU, depending on the map type and flags used to create the map:
> +
> ++------------------------+---------------------------+----------------------------------+
> +|                        | ``BPF_MAP_TYPE_LRU_HASH`` | ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` |
> ++========================+===========================+==================================+
> +| ``BPF_NO_COMMON_LRU``  | Per-CPU LRU, global map   | Per-CPU LRU, per-cpu map         |
> ++------------------------+---------------------------+----------------------------------+
> +| ``!BPF_NO_COMMON_LRU`` | Global LRU, global map    | Global LRU, per-cpu map          |
> ++------------------------+---------------------------+----------------------------------+
> +
> +Notably, there are various steps that the update algorithm attempts in order to
> +enforce the LRU property which have increasing impacts on other CPUs involved
> +in the following operation attempts:
> +
> +- Attempt to use CPU-local state to batch operations
> +- Attempt to fetch free nodes from global lists
> +- Attempt to pull any node from a global list and remove it from the hashmap
> +- Attempt to pull any node from any CPU's list and remove it from the hashmap
> +
> +Even if an LRU node may be acquired, maps of type ``BPF_MAP_TYPE_LRU_HASH``
> +may fail to insert the entry into the map if other CPUs are heavily contending
> +on the global hashmap lock.

The global hashmap lock described here is the action taken in htab_lock_bucket()?

It is a percpu counter added in commit 20b6cc34ea74 ("bpf: Avoid hashtab 
deadlock with map_locked") to avoid deadlock/recursion.

I would suggest to simplify the diagram by removing the "Can lock this hashtab 
bucket?" details. May be a note somewhere to mention why it will still fail to 
shrink the list because the htab_lock_bucket() have detected potential 
deadlock/recursion which is a very unlikely case.


Thanks for the write-up!

  parent reply	other threads:[~2023-03-14 19:31 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-12 19:05 [PATCH bpf-next v3] docs/bpf: Add LRU internals description and graph Joe Stringer
2023-03-13  3:54 ` Bagas Sanjaya
2023-03-13 19:18   ` Joe Stringer
2023-03-14  5:21 ` John Fastabend
2023-03-16  1:19   ` Joe Stringer
2023-03-14 13:55 ` Maxim Mikityanskiy
2023-03-16  1:22   ` Joe Stringer
2023-03-14 19:31 ` Martin KaFai Lau [this message]
2023-03-16  1:54   ` Joe Stringer
2023-03-17  6:04     ` Martin KaFai Lau
2023-04-01 18:57       ` Joe Stringer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c6172fe2-7d88-f9f8-e19a-47c232f9cb75@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=ast@kernel.org \
    --cc=bagasdotme@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=corbet@lwn.net \
    --cc=joe@isovalent.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maxtram95@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.