From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: rostedt <rostedt@goodmis.org>, paulmck <paulmck@kernel.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
Matt Mullins <mmullins@mmlx.us>, paulmck <paulmck@kernel.org>,
Ingo Molnar <mingo@redhat.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Dmitry Vyukov <dvyukov@google.com>,
Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
Yonghong Song <yhs@fb.com>, Andrii Nakryiko <andriin@fb.com>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@chromium.org>,
Kees Cook <keescook@chromium.org>,
Peter Zijlstra <peterz@infradead.org>,
Josh Poimboeuf <jpoimboe@redhat.com>,
Alexey Kardashevskiy <aik@ozlabs.ru>
Subject: Re: [PATCH v4] tracepoint: Do not fail unregistering a probe due to memory failure
Date: Wed, 27 Jan 2021 13:00:46 -0500 (EST) [thread overview]
Message-ID: <2075610164.123.1611770446483.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20210127123951.14f8d321@gandalf.local.home>
----- On Jan 27, 2021, at 12:39 PM, rostedt rostedt@goodmis.org wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
>
> The list of tracepoint callbacks is managed by an array that is protected
> by RCU. To update this array, a new array is allocated, the updates are
> copied over to the new array, and then the list of functions for the
> tracepoint is switched over to the new array. After a completion of an RCU
> grace period, the old array is freed.
>
> This process happens for both adding a callback as well as removing one.
> But on removing a callback, if the new array fails to be allocated, the
> callback is not removed, and may be used after it is freed by the clients
> of the tracepoint.
>
> The handling of a failed allocation for removing an event can break use
> cases as the error report is not propagated up to the original callers. To
> make matters worse, there's some paths that can not handle error cases.
>
> Instead of allocating a new array for removing a tracepoint, allocate twice
> the needed size when adding tracepoints to the array. On removing, use the
> second half of the allocated array. This removes the need to allocate memory
> for removing a tracepoint, as the allocation for removals will already have
> been done.
I don't see how this can work reliably. AFAIU, with RCU, approaches
requiring a pre-allocation of twice the size and swapping to the alternate
memory area on removal falls apart whenever you remove 2 or more elements
back-to-back without waiting for a grace period.
How is this handled by your scheme ?
Thanks,
Mathieu
>
> Link: https://lkml.kernel.org/r/20201115055256.65625-1-mmullins@mmlx.us
> Link: https://lkml.kernel.org/r/20201116175107.02db396d@gandalf.local.home
> Link: https://lkml.kennel.org/r/20201118093405.7a6d2290@gandalf.local.home
>
> Reported-by: Matt Mullins <mmullins@mmlx.us>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>
> Changes since v3:
>
> Scrapped the entire idea of having a stub function replace the removed
> tracepoint callback. Instead, simply allocate twice the needed array at
> addition of the tracepoint, and on removal, use the second half of the
> array. This removes the need to allocate anything on removal, which
> removes the possible failure of that allocation.
>
> kernel/tracepoint.c | 54 +++++++++++++++++++++++++++++++++++----------
> 1 file changed, 42 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 7261fa0f5e3c..23088f6276a4 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -129,7 +129,7 @@ static struct tracepoint_func *
> func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func,
> int prio)
> {
> - struct tracepoint_func *old, *new;
> + struct tracepoint_func *old, *new, *tp_funcs;
> int nr_probes = 0;
> int pos = -1;
>
> @@ -149,10 +149,28 @@ func_add(struct tracepoint_func **funcs, struct
> tracepoint_func *tp_func,
> return ERR_PTR(-EEXIST);
> }
> }
> - /* + 2 : one for new probe, one for NULL func */
> - new = allocate_probes(nr_probes + 2);
> - if (new == NULL)
> + /*
> + * The size of the tp_funcs will be the current size, plus
> + * one for the new probe, one for the NULL func, and one for
> + * the pointer to the "removal" array.
> + * And then double the size to create the "removal" array.
> + */
> + tp_funcs = allocate_probes((nr_probes + 3) * 2);
> + if (tp_funcs == NULL)
> return ERR_PTR(-ENOMEM);
> + /*
> + * When removing a probe and there are more probes left,
> + * we cannot rely on allocation to succeed to create the new
> + * RCU array. Thus, the array is doubled here, and on removal of
> + * a probe with other probes still in the array, the second half
> + * of the array is used.
> + *
> + * The first element of the array has its "func" field point to
> + * the start of the other half of the array.
> + */
> + tp_funcs->func = tp_funcs + nr_probes + 3;
> + tp_funcs[nr_probes + 3].func = tp_funcs;
> + new = tp_funcs + 1;
> if (old) {
> if (pos < 0) {
> pos = nr_probes;
> @@ -164,6 +182,14 @@ func_add(struct tracepoint_func **funcs, struct
> tracepoint_func *tp_func,
> memcpy(new + pos + 1, old + pos,
> (nr_probes - pos) * sizeof(struct tracepoint_func));
> }
> + /* Make old point back to the allocated array */
> + old--;
> + /*
> + * If this is the second half of the array,
> + * make it point back to the first half.
> + */
> + if (old->func < old)
> + old = old->func;
> } else
> pos = 0;
> new[pos] = *tp_func;
> @@ -202,14 +228,18 @@ static void *func_remove(struct tracepoint_func **funcs,
> /* N -> 0, (N > 1) */
> *funcs = NULL;
> debug_print_probes(*funcs);
> + /* Set old back to what it was allocated to */
> + old--;
> + if (old->func < old)
> + old = old->func;
> return old;
> } else {
> int j = 0;
> - /* N -> M, (N > 1, M > 0) */
> - /* + 1 for NULL */
> - new = allocate_probes(nr_probes - nr_del + 1);
> - if (new == NULL)
> - return ERR_PTR(-ENOMEM);
> +
> + /* Use the other half of the array for the new probes */
> + new = old - 1;
> + new = new->func;
> + new++;
> for (i = 0; old[i].func; i++)
> if (old[i].func != tp_func->func
> || old[i].data != tp_func->data)
> @@ -218,7 +248,7 @@ static void *func_remove(struct tracepoint_func **funcs,
> *funcs = new;
> }
> debug_print_probes(*funcs);
> - return old;
> + return NULL;
> }
>
> static void tracepoint_update_call(struct tracepoint *tp, struct tracepoint_func
> *tp_funcs, bool sync)
> @@ -309,8 +339,8 @@ static int tracepoint_remove_func(struct tracepoint *tp,
> rcu_assign_pointer(tp->funcs, tp_funcs);
> } else {
> rcu_assign_pointer(tp->funcs, tp_funcs);
> - tracepoint_update_call(tp, tp_funcs,
> - tp_funcs[0].func != old[0].func);
> + /* Need to sync, if going back to a single caller */
> + tracepoint_update_call(tp, tp_funcs, tp_funcs[1].func == NULL);
> }
> release_probes(old);
> return 0;
> --
> 2.25.4
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
next prev parent reply other threads:[~2021-01-27 18:02 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-27 17:39 [PATCH v4] tracepoint: Do not fail unregistering a probe due to memory failure Steven Rostedt
2021-01-27 17:54 ` Steven Rostedt
2021-01-27 18:00 ` Mathieu Desnoyers [this message]
2021-01-27 18:07 ` Steven Rostedt
2021-01-27 18:13 ` Mathieu Desnoyers
2021-01-27 19:16 ` Steven Rostedt
2021-01-27 20:17 ` Mathieu Desnoyers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2075610164.123.1611770446483.JavaMail.zimbra@efficios.com \
--to=mathieu.desnoyers@efficios.com \
--cc=aik@ozlabs.ru \
--cc=andriin@fb.com \
--cc=ast@kernel.org \
--cc=daniel@iogearbox.net \
--cc=dvyukov@google.com \
--cc=john.fastabend@gmail.com \
--cc=jpoimboe@redhat.com \
--cc=kafai@fb.com \
--cc=keescook@chromium.org \
--cc=kpsingh@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=mmullins@mmlx.us \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=songliubraving@fb.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox