From: Martin KaFai Lau <martin.lau@linux.dev>
To: Kui-Feng Lee <thinker.li@gmail.com>
Cc: bpf@vger.kernel.org, ast@kernel.org, song@kernel.org,
	kernel-team@meta.com, andrii@kernel.org, sinquersw@gmail.com,
	kuifeng@meta.com
Subject: Re: [PATCH bpf-next 3/6] bpf: provide a function to unregister struct_ops objects from consumers.
Date: Wed, 1 May 2024 11:48:37 -0700
Message-ID: <f287c62f-628f-4201-ba34-03a7193212d8@linux.dev>
In-Reply-To: <20240429213609.487820-4-thinker.li@gmail.com>

On 4/29/24 2:36 PM, Kui-Feng Lee wrote:
> +/* Called from the subsystem that consume the struct_ops.
> + *
> + * The caller should protected this function by holding rcu_read_lock() to
> + * ensure "data" is valid. However, this function may unlock rcu
> + * temporarily. The caller should not rely on the preceding rcu_read_lock()
> + * after returning from this function.

Temporarily dropping rcu_read_lock() protection like this is error-prone. The
caller should take the reference with inc_not_zero() itself if it needs one.

I feel the approach in patches 1 and 3 is a little boxed in by the earlier
tcp-cc usage, which tried to fit into the kernel module reg/unreg paradigm and
hide as many bpf details as possible from tcp-cc. That constraint does not
necessarily apply to other subsystems that have had bpf struct_ops from day one.

The epoll detach notification applies only to links. Can this kernel-side unreg
be limited to struct_ops links as well? During reg, an RCU-protected link could
be passed to the subsystem. The subsystem then becomes a kernel user of the bpf
link and can call link_detach(link) to detach. Pseudo code:

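struct bpf_link __rcu *link;	/* passed to the subsystem during reg */
struct bpf_link *ref_link;

/* Pin the link while still under RCU protection. */
rcu_read_lock();
ref_link = rcu_dereference(link);
if (ref_link)
	ref_link = bpf_link_inc_not_zero(ref_link);
rcu_read_unlock();

/* NULL: no link attached. ERR_PTR: the refcount had already hit zero. */
if (!IS_ERR_OR_NULL(ref_link)) {
	bpf_struct_ops_map_link_detach(ref_link);
	bpf_link_put(ref_link);
}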
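
For readers less familiar with the inc_not_zero() idiom used above, here is a
minimal userspace sketch using C11 atomics. It is an illustration only, not
kernel code: struct obj, obj_inc_not_zero(), obj_put() and obj_try_detach() are
hypothetical names, and the RCU read-side section that keeps the memory valid
in the kernel is not modeled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted object such as a link. */
struct obj {
	atomic_long refcnt;
	bool detached;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool obj_inc_not_zero(struct obj *o)
{
	long old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
	long old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);
	return old == 1;
}

/* Caller-side pattern: pin the object first, then act on it, then unpin. */
static bool obj_try_detach(struct obj *o)
{
	if (!o || !obj_inc_not_zero(o))
		return false;	/* nothing attached, or already being freed */

	o->detached = true;	/* stands in for the real detach work */
	obj_put(o);
	return true;
}
```

The point of the pattern is that the caller holds its own reference before
doing anything that may sleep, so it never depends on a lock that a callee
might drop behind its back.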

> + *
> + * Return true if unreg() success. If a call fails, it means some other
> + * task has unrgistered or is unregistering the same object.
> + */
> +bool bpf_struct_ops_kvalue_unreg(void *data)
> +{
> +	struct bpf_struct_ops_map *st_map =
> +		container_of(data, struct bpf_struct_ops_map, kvalue.data);
> +	enum bpf_struct_ops_state prev_state;
> +	struct bpf_struct_ops_link *st_link;
> +	bool ret = false;
> +
> +	/* The st_map and st_link should be protected by rcu_read_lock(),
> +	 * or they may have been free when we try to increase their
> +	 * refcount.
> +	 */
> +	if (IS_ERR(bpf_map_inc_not_zero(&st_map->map)))
> +		/* The map is already gone */
> +		return false;
> +
> +	prev_state = cmpxchg(&st_map->kvalue.common.state,
> +			     BPF_STRUCT_OPS_STATE_INUSE,
> +			     BPF_STRUCT_OPS_STATE_TOBEFREE);
> +	if (prev_state == BPF_STRUCT_OPS_STATE_INUSE) {
> +		st_map->st_ops_desc->st_ops->unreg(data);
> +		/* Pair with bpf_map_inc() for reg() */
> +		bpf_map_put(&st_map->map);
> +		/* Pair with bpf_map_inc_not_zero() above */
> +		bpf_map_put(&st_map->map);
> +		return true;
> +	}
> +	if (prev_state != BPF_STRUCT_OPS_STATE_READY)
> +		goto fail;
> +
> +	/* With BPF_F_LINK */
> +
> +	st_link = rcu_dereference(st_map->attached);
> +	if (!st_link || !bpf_link_inc_not_zero(&st_link->link))
> +		/* The map is on the way to unregister */
> +		goto fail;
> +
> +	rcu_read_unlock();
> +	mutex_lock(&update_mutex);
> +
> +	if (rcu_dereference_protected(st_link->map, true) != &st_map->map)
> +		/* The map should be unregistered already or on the way to
> +		 * be unregistered.
> +		 */
> +		goto fail_unlock;
> +
> +	st_map->st_ops_desc->st_ops->unreg(data);
> +
> +	map_attached_null(st_map);
> +	rcu_assign_pointer(st_link->map, NULL);
> +	/* Pair with bpf_map_get() in bpf_struct_ops_link_create() or
> +	 * bpf_map_inc() in bpf_struct_ops_map_link_update().
> +	 */
> +	bpf_map_put(&st_map->map);
> +
> +	ret = true;
> +
> +fail_unlock:
> +	mutex_unlock(&update_mutex);
> +	rcu_read_lock();
> +	bpf_link_put(&st_link->link);
> +fail:
> +	bpf_map_put(&st_map->map);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(bpf_struct_ops_kvalue_unreg);
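
The cmpxchg() on the state in the quoted function is what lets only one caller
win the INUSE -> TOBEFREE transition. A minimal userspace sketch of that
one-shot transition with C11 atomics follows. It is an illustration under
assumptions, not the patch's code: struct ops_obj, the STATE_* values and
ops_unreg_once() are hypothetical names, and the unreg() callback is modeled
as a simple counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical states, loosely mirroring INUSE and TOBEFREE. */
enum ops_state { STATE_INIT, STATE_INUSE, STATE_TOBEFREE };

struct ops_obj {
	_Atomic int state;
	int unreg_calls;	/* stands in for calling unreg() */
};

/* Returns true only for the single caller that wins the transition. */
static bool ops_unreg_once(struct ops_obj *o)
{
	int expected = STATE_INUSE;

	/* Succeeds only if the state is still STATE_INUSE. */
	if (!atomic_compare_exchange_strong(&o->state, &expected,
					    STATE_TOBEFREE))
		return false;	/* someone else already did, or is doing, it */

	/* Only the winner gets here, so unreg runs at most once. */
	assert(o->unreg_calls == 0);
	o->unreg_calls++;
	return true;
}
```

A second call on the same object returns false without touching the counter,
which matches the "some other task has unregistered or is unregistering" case
described in the quoted comment.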

