public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
From: Jiri Olsa <olsajiri@gmail.com>
To: Yan Zhai <yan@cloudflare.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
	Yonghong Song <yonghong.song@linux.dev>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
	Mykola Lysenko <mykolal@fb.com>, Shuah Khan <shuah@kernel.org>,
	Brian Vazquez <brianvv@google.com>,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	kernel-team@cloudflare.com, Hou Tao <houtao@huaweicloud.com>
Subject: Re: [PATCH v3 bpf 1/2] bpf: skip non exist keys in generic_map_lookup_batch
Date: Mon, 10 Feb 2025 10:19:44 +0100	[thread overview]
Message-ID: <Z6nEsGSbWqCSaVp3@krava> (raw)
In-Reply-To: <85618439eea75930630685c467ccefeac0942e2b.1739171594.git.yan@cloudflare.com>

On Sun, Feb 09, 2025 at 11:22:35PM -0800, Yan Zhai wrote:
> The generic_map_lookup_batch currently returns EINTR if it fails with
> ENOENT and retries several times on bpf_map_copy_value. The next batch
> would start from the same location, presuming it's a transient issue.
> This is incorrect if a map can actually have "holes", i.e.
> "get_next_key" can return a key that does not point to a valid value. At
> least the array of maps type may contain such holes legitly. Right now
> these holes show up, generic batch lookup cannot proceed any more. It
> will always fail with EINTR errors.
> 
> Rather, do not retry in generic_map_lookup_batch. If it finds a non
> existing element, skip to the next key. This simple solution comes with
> a price that transient errors may not be recovered, and the iteration
> might cycle back to the first key under parallel deletion. For example,

probably stupid question, but why not keep the retry logic and when
it fails then instead of returning EINTR just jump to the next key

jirka


> Hou Tao <houtao@huaweicloud.com> pointed out a following scenario:
> 
> For LPM trie map:
> (1) ->map_get_next_key(map, prev_key, key) returns a valid key
> 
> (2) bpf_map_copy_value() return -ENOMENT
> It means the key must be deleted concurrently.
> 
> (3) goto next_key
> It swaps the prev_key and key
> 
> (4) ->map_get_next_key(map, prev_key, key) again
> prev_key points to a non-existing key, for LPM trie it will treat just
> like prev_key=NULL case, the returned key will be duplicated.
> 
> With the retry logic, the iteration can continue to the key next to the
> deleted one. But if we directly skip to the next key, the iteration loop
> would restart from the first key for the lpm_trie type.
> 
> However, not all races may be recovered. For example, if current key is
> deleted after instead of before bpf_map_copy_value, or if the prev_key
> also gets deleted, then the loop will still restart from the first key
> for lpm_tire anyway. For generic lookup it might be better to stay
> simple, i.e. just skip to the next key. To guarantee that the output
> keys are not duplicated, it is better to implement map type specific
> batch operations, which can properly lock the trie and synchronize with
> concurrent mutators.
> 
> Fixes: cb4d03ab499d ("bpf: Add generic support for lookup batch op")
> Closes: https://lore.kernel.org/bpf/Z6JXtA1M5jAZx8xD@debian.debian/
> Signed-off-by: Yan Zhai <yan@cloudflare.com>
> Acked-by: Hou Tao <houtao1@huawei.com>
> ---
> v2->v3: deleted a used macro
> v1->v2: incorporate more useful information inside commit message.
> ---
>  kernel/bpf/syscall.c | 18 +++++-------------
>  1 file changed, 5 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index c420edbfb7c8..e5f1c7fd0ba7 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1968,8 +1968,6 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
>  	return err;
>  }
>  
> -#define MAP_LOOKUP_RETRIES 3
> -
>  int generic_map_lookup_batch(struct bpf_map *map,
>  				    const union bpf_attr *attr,
>  				    union bpf_attr __user *uattr)
> @@ -1979,8 +1977,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
>  	void __user *values = u64_to_user_ptr(attr->batch.values);
>  	void __user *keys = u64_to_user_ptr(attr->batch.keys);
>  	void *buf, *buf_prevkey, *prev_key, *key, *value;
> -	int err, retry = MAP_LOOKUP_RETRIES;
>  	u32 value_size, cp, max_count;
> +	int err;
>  
>  	if (attr->batch.elem_flags & ~BPF_F_LOCK)
>  		return -EINVAL;
> @@ -2026,14 +2024,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
>  		err = bpf_map_copy_value(map, key, value,
>  					 attr->batch.elem_flags);
>  
> -		if (err == -ENOENT) {
> -			if (retry) {
> -				retry--;
> -				continue;
> -			}
> -			err = -EINTR;
> -			break;
> -		}
> +		if (err == -ENOENT)
> +			goto next_key;
>  
>  		if (err)
>  			goto free_buf;
> @@ -2048,12 +2040,12 @@ int generic_map_lookup_batch(struct bpf_map *map,
>  			goto free_buf;
>  		}
>  
> +		cp++;
> +next_key:
>  		if (!prev_key)
>  			prev_key = buf_prevkey;
>  
>  		swap(prev_key, key);
> -		retry = MAP_LOOKUP_RETRIES;
> -		cp++;
>  		cond_resched();
>  	}
>  
> -- 
> 2.39.5
> 
> 

  reply	other threads:[~2025-02-10  9:19 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-10  7:22 [PATCH v3 bpf 0/2] bpf: skip non exist keys in generic_map_lookup_batch Yan Zhai
2025-02-10  7:22 ` [PATCH v3 bpf 1/2] " Yan Zhai
2025-02-10  9:19   ` Jiri Olsa [this message]
2025-02-10 14:47     ` Brian Vazquez
2025-02-10 16:21       ` Yan Zhai
2025-02-12 17:04         ` Jiri Olsa
2025-02-10  7:22 ` [PATCH v3 bpf 2/2] selftests: bpf: test batch lookup on array of maps with holes Yan Zhai
2025-02-19  1:40 ` [PATCH v3 bpf 0/2] bpf: skip non exist keys in generic_map_lookup_batch patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z6nEsGSbWqCSaVp3@krava \
    --to=olsajiri@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brianvv@google.com \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=haoluo@google.com \
    --cc=houtao@huaweicloud.com \
    --cc=john.fastabend@gmail.com \
    --cc=kernel-team@cloudflare.com \
    --cc=kpsingh@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=mykolal@fb.com \
    --cc=sdf@fomichev.me \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=yan@cloudflare.com \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox