Re: [PATCH RFC 0/4] static key support for error injection functions

linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Roman Gushchin <roman.gushchin@linux.dev>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Akinobu Mita <akinobu.mita@gmail.com>,
	Christoph Lameter <cl@linux.com>,
	David Rientjes <rientjes@google.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	"Naveen N. Rao" <naveen.n.rao@linux.ibm.com>,
	Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>,
	"David S. Miller" <davem@davemloft.net>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mark Rutland <mark.rutland@arm.com>, Jiri Olsa <jolsa@kernel.org>,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	bpf@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 0/4] static key support for error injection functions
Date: Fri, 31 May 2024 16:39:38 -0700	[thread overview]
Message-ID: <ZlpfuiLRKa7wGD9y@P9FQF9L96D.corp.robot.car> (raw)
In-Reply-To: <20240531-fault-injection-statickeys-v1-0-a513fd0a9614@suse.cz>

On Fri, May 31, 2024 at 11:33:31AM +0200, Vlastimil Babka wrote:
> Incomplete, help needed from ftrace/kprobe and bpf folks.
> 
> As previously mentioned by myself [1] and others [2] the functions
> designed for error injection can bring visible overhead in fastpaths
> such as slab or page allocation, because even if nothing hooks into them
> at a given moment, they are noninline function calls regardless of
> CONFIG_ options since commits 4f6923fbb352 ("mm: make should_failslab
> always available for fault injection") and af3b854492f3
> ("mm/page_alloc.c: allow error injection").
> 
> Live patching their callsites has been also suggested in both [1] and
> [2] threads, and this is an attempt to do that with static keys that
> guard the call sites. When disabled, the error injection functions still
> exist and are noinline, but are not being called. Any of the existing
> mechanisms that can inject errors should make sure to enable the
> respective static key. I have added that support to some of them but
> need help with the others.

I think it's a clever idea and makes total sense!

> 
> - the legacy fault injection, i.e. CONFIG_FAILSLAB and
>   CONFIG_FAIL_PAGE_ALLOC is handled in Patch 1, and can be passed the
>   address of the static key if it exists. The key will be activated if the
>   fault injection probability becomes non-zero, and deactivated in the
>   opposite transition. This also removes the overhead of the evaluation
>   (on top of the noninline function call) when these mechanisms are
>   configured in the kernel but unused at the moment.
> 
> - the generic error injection using kretprobes with
>   override_function_with_return is handled in patch 2. The
>   ALLOW_ERROR_INJECTION() annotation is extended so that static key
>   address can be passed, and the framework controls it when error
>   injection is enabled or disabled in debugfs for the function.
> 
> There are two more users I know of but am not familiar enough to fix up
> myself. I hope people that are more familiar can help me here.
> 
> - ftrace seems to be using override_function_with_return from
>   #define ftrace_override_function_with_return but I found no place
>   where the latter is used. I assume it might be hidden behind more
>   macro magic? But the point is if ftrace can be instructed to act like
>   an error injection, it would also have to use some form of metadata
>   (from patch 2 presumably?) to get to the static key and control it.
> 
>   If ftrace can only observe the function being called, maybe it
>   wouldn't be wrong to just observe nothing if the static key isn't
>   enabled because nobody is doing the fault injection?
> 
> - bpftrace, as can be seen from the example in commit 4f6923fbb352
>   description. I suppose bpf is already aware what functions the
>   currently loaded bpf programs hook into, so that it could look up the
>   static key and control it. Maybe using again the metadata from patch 2,
>   or extending its own, as I've noticed there's e.g. BTF_ID(func,
>   should_failslab)
> 
> Now I realize maybe handling this at the k(ret)probe level would be
> sufficient for all cases except the legacy fault injection from Patch 1?
> Also wanted to note that by AFAIU by using the static_key_slow_dec/inc
> API (as done in patches 1/2) should allow all mechanisms to coexist
> naturally without fighting each other on the static key state, and also
> handle the reference count for e.g. active probes or bpf programs if
> there's no similar internal mechanism.
> 
> Patches 3 and 4 implement the static keys for the two mm fault injection
> sites in slab and page allocators. For a quick demonstration I've run a
> VM and the simple test from [1] that stresses the slab allocator and got
> this time before the series:
> 
> real    0m8.349s
> user    0m0.694s
> sys     0m7.648s
> 
> with perf showing
> 
>    0.61%  nonexistent  [kernel.kallsyms]  [k] should_failslab.constprop.0
>    0.00%  nonexistent  [kernel.kallsyms]  [k] should_fail_alloc_page                                                                                                                                                                                        ▒
> 
> And after the series
> 
> real    0m7.924s
> user    0m0.727s
> sys     0m7.191s

Is "user" increase a measurement error or it's real?

Otherwise, nice savings!

next prev parent reply	other threads:[~2024-05-31 23:39 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-31  9:33 [PATCH RFC 0/4] static key support for error injection functions Vlastimil Babka
2024-05-31  9:33 ` [PATCH RFC 1/4] fault-inject: add support for static keys around fault injection sites Vlastimil Babka
2024-05-31  9:33 ` [PATCH RFC 2/4] error-injection: support static keys around injectable functions Vlastimil Babka
2024-06-02 14:19   ` Masami Hiramatsu
2024-05-31  9:33 ` [PATCH RFC 3/4] mm, slab: add static key for should_failslab() Vlastimil Babka
2024-05-31 16:43   ` Alexei Starovoitov
2024-05-31 17:17     ` Yosry Ahmed
2024-06-01 20:57     ` Vlastimil Babka
2024-06-02 19:12       ` Alexei Starovoitov
2024-05-31 23:44   ` Roman Gushchin
2024-05-31  9:33 ` [PATCH RFC 4/4] mm, page_alloc: add static key for should_fail_alloc_page() Vlastimil Babka
2024-05-31 23:50   ` Roman Gushchin
2024-05-31 15:31 ` [PATCH RFC 0/4] static key support for error injection functions Mark Rutland
2024-06-01 20:48   ` Vlastimil Babka
2024-05-31 23:39 ` Roman Gushchin [this message]
2024-06-01 20:53   ` Vlastimil Babka
2024-06-02 11:36 ` Wei Yang
2024-06-02 20:47 ` David Rientjes
2024-06-02 21:08   ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZlpfuiLRKa7wGD9y@P9FQF9L96D.corp.robot.car \
    --to=roman.gushchin@linux.dev \
    --cc=42.hyeyoo@gmail.com \
    --cc=akinobu.mita@gmail.com \
    --cc=andrii@kernel.org \
    --cc=anil.s.keshavamurthy@intel.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cl@linux.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mhiramat@kernel.org \
    --cc=naveen.n.rao@linux.ibm.com \
    --cc=rientjes@google.com \
    --cc=rostedt@goodmis.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).