netdev.vger.kernel.org archive mirror
From: Stanislav Fomichev <stfomichev@gmail.com>
To: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
Cc: davem@davemloft.net, Liam.Howlett@oracle.com,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, jiri@resnulli.us,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	akpm@linux-foundation.org, shuah@kernel.org,
	linux-kselftest@vger.kernel.org, peili.io@oracle.com
Subject: Re: [PATCH net-next v3 1/3] connector/cn_proc: Add hash table for threads
Date: Thu, 17 Oct 2024 08:23:01 -0700
Message-ID: <ZxEr1TOg4ddoAm7y@mini-arch>
In-Reply-To: <20241016220634.1469153-2-anjali.k.kulkarni@oracle.com>

On 10/16, Anjali Kulkarni wrote:
> Add a new type PROC_CN_MCAST_NOTIFY to the proc connector API, which
> allows a thread to notify the kernel that it is going to exit with a
> non-zero exit code and to specify that exit code. When the thread exits,
> the kernel sends this exit code as a proc filter notification to any
> listening process.
> An exiting thread can make this call either when it wants to call
> pthread_exit() with a non-zero value or from a signal handler.
> 
> Add a new file cn_hash.c which implements a hash table storing the exit
> codes of abnormally exiting threads, as received through the call above.
> The key used for the hash table is the pid of the thread, so when the
> thread actually exits, we look up its pid in the hash table and retrieve
> the exit code sent by the user. If the exit code in struct task_struct
> is 0, we then replace it with the user-supplied non-zero exit code.
> 
> cn_hash.c implements the hash table add, delete and lookup operations.
> mutex_lock() and mutex_unlock() are used to protect the integrity of
> the hash table while adding or deleting elements.
> connector.c contains the API calls, which are called from cn_proc.c, as
> well as the calls to allocate, initialize and free the hash table.
> 
> Add a new flag, PF_EXIT_NOTIFY, to the PF_* flags of task_struct. This
> flag is set when the user sends the exit code via PROC_CN_MCAST_NOTIFY.
> While exiting, this flag is checked, and the hash table add or delete
> calls are only made if this flag is set.
> 
> A refcount field, hrefcnt, is added to struct cn_hash_dev to keep track
> of the number of threads that have added an entry to the hash table.
> This value must be 0 before struct cn_hash_dev is freed.
> The refcount check is needed in case CONFIG_CONNECTOR is built as a
> module: when unloading the module, we need to make sure no hash entries
> are still present in the hdev table.
> 
> Copy the task's name (task->comm) into the exit event notification.
> This allows applications to further filter on the name using
> userspace filtering such as eBPF.
> 
> Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
> ---
>  drivers/connector/Makefile    |   2 +-
>  drivers/connector/cn_hash.c   | 181 ++++++++++++++++++++++++++++++++++
>  drivers/connector/cn_proc.c   |  62 +++++++++++-
>  drivers/connector/connector.c |  63 +++++++++++-
>  include/linux/connector.h     |  31 ++++++
>  include/linux/sched.h         |   2 +-
>  include/uapi/linux/cn_proc.h  |   5 +-
>  7 files changed, 338 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/connector/cn_hash.c
> 
> diff --git a/drivers/connector/Makefile b/drivers/connector/Makefile
> index 1bf67d3df97d..cb1dcdf067ad 100644
> --- a/drivers/connector/Makefile
> +++ b/drivers/connector/Makefile
> @@ -2,4 +2,4 @@
>  obj-$(CONFIG_CONNECTOR)		+= cn.o
>  obj-$(CONFIG_PROC_EVENTS)	+= cn_proc.o
>  
> -cn-y				+= cn_queue.o connector.o
> +cn-y				+= cn_hash.o cn_queue.o connector.o
> diff --git a/drivers/connector/cn_hash.c b/drivers/connector/cn_hash.c
> new file mode 100644
> index 000000000000..a079e9bcea6d
> --- /dev/null
> +++ b/drivers/connector/cn_hash.c
> @@ -0,0 +1,181 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Author: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
> + *
> + * Copyright (c) 2024 Oracle and/or its affiliates.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/init.h>
> +#include <linux/connector.h>
> +#include <linux/mutex.h>
> +#include <linux/pid_namespace.h>
> +
> +#include <linux/cn_proc.h>
> +
> +struct cn_hash_dev *cn_hash_alloc_dev(const char *name)
> +{
> +	struct cn_hash_dev *hdev;
> +
> +	hdev = kzalloc(sizeof(*hdev), GFP_KERNEL);
> +	if (!hdev)
> +		return NULL;
> +
> +	snprintf(hdev->name, sizeof(hdev->name), "%s", name);
> +	atomic_set(&hdev->hrefcnt, 0);
> +	mutex_init(&hdev->uexit_hash_lock);
> +	hash_init(hdev->uexit_pid_htable);
> +	return hdev;
> +}
> +
> +void cn_hash_free_dev(struct cn_hash_dev *hdev)
> +{
> +	struct uexit_pid_hnode *hnode;
> +	struct hlist_node *tmp;
> +	int bucket;
> +
> +	pr_debug("%s: Freeing entire hdev %p\n", __func__, hdev);
> +
> +	mutex_lock(&hdev->uexit_hash_lock);
> +	hash_for_each_safe(hdev->uexit_pid_htable, bucket, tmp,
> +			hnode, uexit_pid_hlist) {
> +		hash_del(&hnode->uexit_pid_hlist);
> +		pr_debug("%s: Freeing node for pid %d\n",
> +				__func__, hnode->pid);
> +		kfree(hnode);
> +	}
> +
> +	mutex_unlock(&hdev->uexit_hash_lock);
> +	mutex_destroy(&hdev->uexit_hash_lock);
> +
> +	/*
> +	 * This refcnt check is added in case CONFIG_CONNECTOR is
> +	 * compiled with =m as a module. In that case, when unloading
> +	 * the module, we need to make sure no hash entries are still
> +	 * present in the hdev table.
> +	 */
> +	while (atomic_read(&hdev->hrefcnt)) {
> +		pr_info("Waiting for %s to become free: refcnt=%d\n",
> +				hdev->name, atomic_read(&hdev->hrefcnt));
> +		msleep(1000);
> +	}
> +
> +	kfree(hdev);
> +	hdev = NULL;
> +}
> +
> +static struct uexit_pid_hnode *cn_hash_alloc_elem(__u32 uexit_code, pid_t pid)
> +{
> +	struct uexit_pid_hnode *elem;
> +
> +	elem = kzalloc(sizeof(*elem), GFP_KERNEL);
> +	if (!elem)
> +		return NULL;
> +
> +	INIT_HLIST_NODE(&elem->uexit_pid_hlist);
> +	elem->uexit_code = uexit_code;
> +	elem->pid = pid;
> +	return elem;
> +}
> +
> +static inline void cn_hash_free_elem(struct uexit_pid_hnode *elem)
> +{
> +	kfree(elem);
> +}
> +
> +int cn_hash_add_elem(struct cn_hash_dev *hdev, __u32 uexit_code, pid_t pid)
> +{
> +	struct uexit_pid_hnode *elem, *hnode;
> +
> +	elem = cn_hash_alloc_elem(uexit_code, pid);
> +	if (!elem) {
> +		pr_err("%s: cn_hash_alloc_elem() returned NULL pid %d\n",
> +				__func__, pid);
> +		return -ENOMEM;
> +	}
> +
> +	mutex_lock(&hdev->uexit_hash_lock);
> +	/*
> +	 * Check if an entry for the same pid already exists
> +	 */
> +	hash_for_each_possible(hdev->uexit_pid_htable,
> +				hnode, uexit_pid_hlist, pid) {
> +		if (hnode->pid == pid) {
> +			mutex_unlock(&hdev->uexit_hash_lock);
> +			cn_hash_free_elem(elem);
> +			pr_debug("%s: pid %d already exists in hash table\n",
> +				__func__, pid);
> +			return -EEXIST;
> +		}
> +	}
> +
> +	hash_add(hdev->uexit_pid_htable, &elem->uexit_pid_hlist, pid);
> +	mutex_unlock(&hdev->uexit_hash_lock);
> +
> +	atomic_inc(&hdev->hrefcnt);
> +
> +	pr_debug("%s: After hash_add of pid %d elem %p hrefcnt %d\n",
> +			__func__, pid, elem, atomic_read(&hdev->hrefcnt));
> +	return 0;
> +}
> +
> +int cn_hash_del_get_exval(struct cn_hash_dev *hdev, pid_t pid)
> +{
> +	struct uexit_pid_hnode *hnode;
> +	struct hlist_node *tmp;
> +	int excde;
> +
> +	mutex_lock(&hdev->uexit_hash_lock);
> +	hash_for_each_possible_safe(hdev->uexit_pid_htable,
> +				hnode, tmp, uexit_pid_hlist, pid) {
> +		if (hnode->pid == pid) {
> +			excde = hnode->uexit_code;
> +			hash_del(&hnode->uexit_pid_hlist);
> +			mutex_unlock(&hdev->uexit_hash_lock);
> +			kfree(hnode);
> +			atomic_dec(&hdev->hrefcnt);
> +			pr_debug("%s: After hash_del of pid %d, found exit code %u hrefcnt %d\n",
> +					__func__, pid, excde,
> +					atomic_read(&hdev->hrefcnt));
> +			return excde;
> +		}
> +	}
> +
> +	mutex_unlock(&hdev->uexit_hash_lock);
> +	pr_err("%s: pid %d not found in hash table\n",
> +			__func__, pid);
> +	return -EINVAL;
> +}
> +
> +int cn_hash_get_exval(struct cn_hash_dev *hdev, pid_t pid)
> +{
> +	struct uexit_pid_hnode *hnode;
> +	__u32 excde;
> +
> +	mutex_lock(&hdev->uexit_hash_lock);
> +	hash_for_each_possible(hdev->uexit_pid_htable,
> +				hnode, uexit_pid_hlist, pid) {
> +		if (hnode->pid == pid) {
> +			excde = hnode->uexit_code;
> +			mutex_unlock(&hdev->uexit_hash_lock);
> +			pr_debug("%s: Found exit code %u for pid %d\n",
> +					__func__, excde, pid);
> +			return excde;
> +		}
> +	}
> +
> +	mutex_unlock(&hdev->uexit_hash_lock);
> +	pr_debug("%s: pid %d not found in hash table\n",
> +			__func__, pid);
> +	return -EINVAL;
> +}
> +
> +bool cn_hash_table_empty(struct cn_hash_dev *hdev)
> +{
> +	bool is_empty;
> +
> +	is_empty = hash_empty(hdev->uexit_pid_htable);
> +	pr_debug("Hash table is %s\n", (is_empty ? "empty" : "not empty"));
> +
> +	return is_empty;
> +}
> diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
> index 44b19e696176..0632a70a89a0 100644
> --- a/drivers/connector/cn_proc.c
> +++ b/drivers/connector/cn_proc.c
> @@ -69,6 +69,8 @@ static int cn_filter(struct sock *dsk, struct sk_buff *skb, void *data)
>  	if ((__u32)val == PROC_EVENT_ALL)
>  		return 0;
>  
> +	pr_debug("%s: val %lx, what %x\n", __func__, val, what);
> +
>  	/*
>  	 * Drop packet if we have to report only non-zero exit status
>  	 * (PROC_EVENT_NONZERO_EXIT) and exit status is 0
> @@ -326,9 +328,15 @@ void proc_exit_connector(struct task_struct *task)
>  	struct proc_event *ev;
>  	struct task_struct *parent;
>  	__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
> +	int uexit_code;
>  
> -	if (atomic_read(&proc_event_num_listeners) < 1)
> +	if (atomic_read(&proc_event_num_listeners) < 1) {
> +		if (likely(!(task->flags & PF_EXIT_NOTIFY)))
> +			return;
> +
> +		cn_del_get_exval(task->pid);
>  		return;
> +	}
>  
>  	msg = buffer_to_cn_msg(buffer);
>  	ev = (struct proc_event *)msg->data;
> @@ -337,7 +345,26 @@ void proc_exit_connector(struct task_struct *task)
>  	ev->what = PROC_EVENT_EXIT;
>  	ev->event_data.exit.process_pid = task->pid;
>  	ev->event_data.exit.process_tgid = task->tgid;
> -	ev->event_data.exit.exit_code = task->exit_code;
> +	if (unlikely(task->flags & PF_EXIT_NOTIFY)) {
> +		task->flags &= ~PF_EXIT_NOTIFY;
> +
> +		uexit_code = cn_del_get_exval(task->pid);
> +		if (uexit_code <= 0) {
> +			pr_debug("%s: err %d returning task's exit code %u\n",
> +					uexit_code, __func__,
> +					task->exit_code);

The compiler complains about the format string; __func__ and uexit_code
are passed in the wrong order, so %s receives an int:

In file included from ./include/linux/kernel.h:31,
                 from drivers/connector/cn_proc.c:11:
drivers/connector/cn_proc.c: In function ‘proc_exit_connector’:
drivers/connector/cn_proc.c:353:34: error: format ‘%s’ expects argument of type ‘char *’, but argument 3 has type ‘int’ [-Werror=format=]
  353 |                         pr_debug("%s: err %d returning task's exit code %u\n",
      |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
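
Swapping the first two arguments should be enough, i.e.:

	pr_debug("%s: err %d returning task's exit code %u\n",
		 __func__, uexit_code, task->exit_code);

For illustration only, a userspace sketch of the same check follows.
debug_fmt() and format_exit_msg() are hypothetical stand-ins for
pr_debug() and the call site, not kernel APIs. The sketch relies on GCC's
format attribute, which is what the kernel's __printf() annotation on the
printk() helpers expands to, so the compiler type-checks each argument
against its conversion specifier:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for pr_debug(). The format attribute
 * tells GCC/Clang that argument 3 is a printf-style format and that the
 * variadic arguments start at argument 4, so mismatched types are
 * reported by -Wformat (an error with -Werror).
 */
static int debug_fmt(char *buf, size_t len, const char *fmt, ...)
	__attribute__((format(printf, 3, 4)));

static int debug_fmt(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}

/*
 * Hypothetical call site mirroring proc_exit_connector(): __func__ comes
 * first to match %s, then the int error for %d, then the exit code for
 * %u. Passing (uexit_code, __func__, ...) instead triggers the same
 * "format '%s' expects argument of type 'char *'" diagnostic.
 */
static int format_exit_msg(char *buf, size_t len, int uexit_code,
			   unsigned int task_exit_code)
{
	return debug_fmt(buf, len,
			 "%s: err %d returning task's exit code %u\n",
			 __func__, uexit_code, task_exit_code);
}
```

With the corrected order, format_exit_msg(buf, sizeof(buf), -22, 256)
writes "format_exit_msg: err -22 returning task's exit code 256\n" into buf.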

---
pw-bot: cr

Thread overview: 5+ messages
2024-10-16 22:06 [PATCH net-next v3 0/3] Threads support in proc connector Anjali Kulkarni
2024-10-16 22:06 ` [PATCH net-next v3 1/3] connector/cn_proc: Add hash table for threads Anjali Kulkarni
2024-10-17 15:23   ` Stanislav Fomichev [this message]
2024-10-16 22:06 ` [PATCH net-next v3 2/3] connector/cn_proc: Kunit tests for threads hash table Anjali Kulkarni
2024-10-16 22:06 ` [PATCH net-next v3 3/3] connector/cn_proc: Selftest for threads Anjali Kulkarni
