Re: [PATCH RFC 0/4] static key support for error injection functions

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Wei Yang <richard.weiyang@gmail.com>
To: Vlastimil Babka <vbabka@suse.cz>, g@master
Cc: Akinobu Mita <akinobu.mita@gmail.com>,
	Christoph Lameter <cl@linux.com>,
	David Rientjes <rientjes@google.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	"Naveen N. Rao" <naveen.n.rao@linux.ibm.com>,
	Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>,
	"David S. Miller" <davem@davemloft.net>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mark Rutland <mark.rutland@arm.com>, Jiri Olsa <jolsa@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	bpf@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 0/4] static key support for error injection functions
Date: Sun, 2 Jun 2024 11:36:04 +0000	[thread overview]
Message-ID: <20240602113604.pn74o7g2lazr7ugk@master> (raw)
In-Reply-To: <20240531-fault-injection-statickeys-v1-0-a513fd0a9614@suse.cz>

On Fri, May 31, 2024 at 11:33:31AM +0200, Vlastimil Babka wrote:
>Incomplete, help needed from ftrace/kprobe and bpf folks.
>
>As previously mentioned by myself [1] and others [2] the functions
>designed for error injection can bring visible overhead in fastpaths
>such as slab or page allocation, because even if nothing hooks into them
>at a given moment, they are noninline function calls regardless of
>CONFIG_ options since commits 4f6923fbb352 ("mm: make should_failslab
>always available for fault injection") and af3b854492f3
>("mm/page_alloc.c: allow error injection").
>
>Live patching their callsites has been also suggested in both [1] and
>[2] threads, and this is an attempt to do that with static keys that
>guard the call sites. When disabled, the error injection functions still
>exist and are noinline, but are not being called. Any of the existing
>mechanisms that can inject errors should make sure to enable the
>respective static key. I have added that support to some of them but
>need help with the others.
>
>- the legacy fault injection, i.e. CONFIG_FAILSLAB and
>  CONFIG_FAIL_PAGE_ALLOC is handled in Patch 1, and can be passed the
>  address of the static key if it exists. The key will be activated if the
>  fault injection probability becomes non-zero, and deactivated in the
>  opposite transition. This also removes the overhead of the evaluation
>  (on top of the noninline function call) when these mechanisms are
>  configured in the kernel but unused at the moment.
>
>- the generic error injection using kretprobes with
>  override_function_with_return is handled in patch 2. The
>  ALLOW_ERROR_INJECTION() annotation is extended so that static key
>  address can be passed, and the framework controls it when error
>  injection is enabled or disabled in debugfs for the function.
>
>There are two more users I know of but am not familiar enough to fix up
>myself. I hope people that are more familiar can help me here.
>
>- ftrace seems to be using override_function_with_return from
>  #define ftrace_override_function_with_return but I found no place
>  where the latter is used. I assume it might be hidden behind more
>  macro magic? But the point is if ftrace can be instructed to act like
>  an error injection, it would also have to use some form of metadata
>  (from patch 2 presumably?) to get to the static key and control it.
>
>  If ftrace can only observe the function being called, maybe it
>  wouldn't be wrong to just observe nothing if the static key isn't
>  enabled because nobody is doing the fault injection?
>
>- bpftrace, as can be seen from the example in commit 4f6923fbb352
>  description. I suppose bpf is already aware what functions the
>  currently loaded bpf programs hook into, so that it could look up the
>  static key and control it. Maybe using again the metadata from patch 2,
>  or extending its own, as I've noticed there's e.g. BTF_ID(func,
>  should_failslab)
>
>Now I realize maybe handling this at the k(ret)probe level would be
>sufficient for all cases except the legacy fault injection from Patch 1?
>Also wanted to note that by AFAIU by using the static_key_slow_dec/inc
>API (as done in patches 1/2) should allow all mechanisms to coexist
>naturally without fighting each other on the static key state, and also
>handle the reference count for e.g. active probes or bpf programs if
>there's no similar internal mechanism.
>
>Patches 3 and 4 implement the static keys for the two mm fault injection
>sites in slab and page allocators. For a quick demonstration I've run a
>VM and the simple test from [1] that stresses the slab allocator and got

I took a look into [1] and I see some data like "1.43% plus the overhead in
its caller", but not clearly find which test cases are.

Sorry for my unfamiliarity, would you mind giving more words on the cases?

>this time before the series:
>
>real    0m8.349s
>user    0m0.694s
>sys     0m7.648s
>
>with perf showing
>
>   0.61%  nonexistent  [kernel.kallsyms]  [k] should_failslab.constprop.0
>   0.00%  nonexistent  [kernel.kallsyms]  [k] should_fail_alloc_page                                                                                                                                                                                        ▒
>
>And after the series
>
>real    0m7.924s
>user    0m0.727s
>sys     0m7.191s
>

Maybe add the percentage here would be more helpful.

>and the functions gone from perf report.
>
>There might be other such fault injection callsites in hotpaths of other
>subsystems but I didn't search for them at this point.
>
>[1] https://lore.kernel.org/all/6d5bb852-8703-4abf-a52b-90816bccbd7f@suse.cz/
>[2] https://lore.kernel.org/all/3j5d3p22ssv7xoaghzraa7crcfih3h2qqjlhmjppbp6f42pg2t@kg7qoicog5ye/
>
>Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>---
>Vlastimil Babka (4):
>      fault-inject: add support for static keys around fault injection sites
>      error-injection: support static keys around injectable functions
>      mm, slab: add static key for should_failslab()
>      mm, page_alloc: add static key for should_fail_alloc_page()
>
> include/asm-generic/error-injection.h | 13 ++++++++++-
> include/asm-generic/vmlinux.lds.h     |  2 +-
> include/linux/error-injection.h       |  9 +++++---
> include/linux/fault-inject.h          |  7 +++++-
> kernel/fail_function.c                | 22 +++++++++++++++---
> lib/error-inject.c                    |  6 ++++-
> lib/fault-inject.c                    | 43 ++++++++++++++++++++++++++++++++++-
> mm/fail_page_alloc.c                  |  3 ++-
> mm/failslab.c                         |  2 +-
> mm/internal.h                         |  2 ++
> mm/page_alloc.c                       | 11 ++++++---
> mm/slab.h                             |  3 +++
> mm/slub.c                             | 10 +++++---
> 13 files changed, 114 insertions(+), 19 deletions(-)
>---
>base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
>change-id: 20240530-fault-injection-statickeys-66b7222e91b7
>
>Best regards,
>-- 
>Vlastimil Babka <vbabka@suse.cz>
>

-- 
Wei Yang
Help you, Help me

WARNING: multiple messages have this Message-ID (diff)

From: Wei Yang <richard.weiyang@gmail.com>
To: Vlastimil Babka <vbabka@suse.cz>, g@master.kvack.org
Cc: Akinobu Mita <akinobu.mita@gmail.com>,
	Christoph Lameter <cl@linux.com>,
	David Rientjes <rientjes@google.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	"Naveen N. Rao" <naveen.n.rao@linux.ibm.com>,
	Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>,
	"David S. Miller" <davem@davemloft.net>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mark Rutland <mark.rutland@arm.com>, Jiri Olsa <jolsa@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	bpf@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 0/4] static key support for error injection functions
Date: Sun, 2 Jun 2024 11:36:04 +0000	[thread overview]
Message-ID: <20240602113604.pn74o7g2lazr7ugk@master> (raw)
In-Reply-To: <20240531-fault-injection-statickeys-v1-0-a513fd0a9614@suse.cz>

On Fri, May 31, 2024 at 11:33:31AM +0200, Vlastimil Babka wrote:
>Incomplete, help needed from ftrace/kprobe and bpf folks.
>
>As previously mentioned by myself [1] and others [2] the functions
>designed for error injection can bring visible overhead in fastpaths
>such as slab or page allocation, because even if nothing hooks into them
>at a given moment, they are noninline function calls regardless of
>CONFIG_ options since commits 4f6923fbb352 ("mm: make should_failslab
>always available for fault injection") and af3b854492f3
>("mm/page_alloc.c: allow error injection").
>
>Live patching their callsites has been also suggested in both [1] and
>[2] threads, and this is an attempt to do that with static keys that
>guard the call sites. When disabled, the error injection functions still
>exist and are noinline, but are not being called. Any of the existing
>mechanisms that can inject errors should make sure to enable the
>respective static key. I have added that support to some of them but
>need help with the others.
>
>- the legacy fault injection, i.e. CONFIG_FAILSLAB and
>  CONFIG_FAIL_PAGE_ALLOC is handled in Patch 1, and can be passed the
>  address of the static key if it exists. The key will be activated if the
>  fault injection probability becomes non-zero, and deactivated in the
>  opposite transition. This also removes the overhead of the evaluation
>  (on top of the noninline function call) when these mechanisms are
>  configured in the kernel but unused at the moment.
>
>- the generic error injection using kretprobes with
>  override_function_with_return is handled in patch 2. The
>  ALLOW_ERROR_INJECTION() annotation is extended so that static key
>  address can be passed, and the framework controls it when error
>  injection is enabled or disabled in debugfs for the function.
>
>There are two more users I know of but am not familiar enough to fix up
>myself. I hope people that are more familiar can help me here.
>
>- ftrace seems to be using override_function_with_return from
>  #define ftrace_override_function_with_return but I found no place
>  where the latter is used. I assume it might be hidden behind more
>  macro magic? But the point is if ftrace can be instructed to act like
>  an error injection, it would also have to use some form of metadata
>  (from patch 2 presumably?) to get to the static key and control it.
>
>  If ftrace can only observe the function being called, maybe it
>  wouldn't be wrong to just observe nothing if the static key isn't
>  enabled because nobody is doing the fault injection?
>
>- bpftrace, as can be seen from the example in commit 4f6923fbb352
>  description. I suppose bpf is already aware what functions the
>  currently loaded bpf programs hook into, so that it could look up the
>  static key and control it. Maybe using again the metadata from patch 2,
>  or extending its own, as I've noticed there's e.g. BTF_ID(func,
>  should_failslab)
>
>Now I realize maybe handling this at the k(ret)probe level would be
>sufficient for all cases except the legacy fault injection from Patch 1?
>Also wanted to note that by AFAIU by using the static_key_slow_dec/inc
>API (as done in patches 1/2) should allow all mechanisms to coexist
>naturally without fighting each other on the static key state, and also
>handle the reference count for e.g. active probes or bpf programs if
>there's no similar internal mechanism.
>
>Patches 3 and 4 implement the static keys for the two mm fault injection
>sites in slab and page allocators. For a quick demonstration I've run a
>VM and the simple test from [1] that stresses the slab allocator and got

I took a look into [1] and I see some data like "1.43% plus the overhead in
its caller", but not clearly find which test cases are.

Sorry for my unfamiliarity, would you mind giving more words on the cases?

>this time before the series:
>
>real    0m8.349s
>user    0m0.694s
>sys     0m7.648s
>
>with perf showing
>
>   0.61%  nonexistent  [kernel.kallsyms]  [k] should_failslab.constprop.0
>   0.00%  nonexistent  [kernel.kallsyms]  [k] should_fail_alloc_page                                                                                                                                                                                        ▒
>
>And after the series
>
>real    0m7.924s
>user    0m0.727s
>sys     0m7.191s
>

Maybe add the percentage here would be more helpful.

>and the functions gone from perf report.
>
>There might be other such fault injection callsites in hotpaths of other
>subsystems but I didn't search for them at this point.
>
>[1] https://lore.kernel.org/all/6d5bb852-8703-4abf-a52b-90816bccbd7f@suse.cz/
>[2] https://lore.kernel.org/all/3j5d3p22ssv7xoaghzraa7crcfih3h2qqjlhmjppbp6f42pg2t@kg7qoicog5ye/
>
>Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>---
>Vlastimil Babka (4):
>      fault-inject: add support for static keys around fault injection sites
>      error-injection: support static keys around injectable functions
>      mm, slab: add static key for should_failslab()
>      mm, page_alloc: add static key for should_fail_alloc_page()
>
> include/asm-generic/error-injection.h | 13 ++++++++++-
> include/asm-generic/vmlinux.lds.h     |  2 +-
> include/linux/error-injection.h       |  9 +++++---
> include/linux/fault-inject.h          |  7 +++++-
> kernel/fail_function.c                | 22 +++++++++++++++---
> lib/error-inject.c                    |  6 ++++-
> lib/fault-inject.c                    | 43 ++++++++++++++++++++++++++++++++++-
> mm/fail_page_alloc.c                  |  3 ++-
> mm/failslab.c                         |  2 +-
> mm/internal.h                         |  2 ++
> mm/page_alloc.c                       | 11 ++++++---
> mm/slab.h                             |  3 +++
> mm/slub.c                             | 10 +++++---
> 13 files changed, 114 insertions(+), 19 deletions(-)
>---
>base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
>change-id: 20240530-fault-injection-statickeys-66b7222e91b7
>
>Best regards,
>-- 
>Vlastimil Babka <vbabka@suse.cz>
>

-- 
Wei Yang
Help you, Help me

next prev parent reply	other threads:[~2024-06-02 11:36 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-31  9:33 [PATCH RFC 0/4] static key support for error injection functions Vlastimil Babka
2024-05-31  9:33 ` [PATCH RFC 1/4] fault-inject: add support for static keys around fault injection sites Vlastimil Babka
2024-05-31  9:33 ` [PATCH RFC 2/4] error-injection: support static keys around injectable functions Vlastimil Babka
2024-06-02 14:19   ` Masami Hiramatsu
2024-05-31  9:33 ` [PATCH RFC 3/4] mm, slab: add static key for should_failslab() Vlastimil Babka
2024-05-31 16:43   ` Alexei Starovoitov
2024-05-31 17:17     ` Yosry Ahmed
2024-06-01 20:57     ` Vlastimil Babka
2024-06-02 19:12       ` Alexei Starovoitov
2024-05-31 23:44   ` Roman Gushchin
2024-05-31  9:33 ` [PATCH RFC 4/4] mm, page_alloc: add static key for should_fail_alloc_page() Vlastimil Babka
2024-05-31 23:50   ` Roman Gushchin
2024-05-31 15:31 ` [PATCH RFC 0/4] static key support for error injection functions Mark Rutland
2024-06-01 20:48   ` Vlastimil Babka
2024-05-31 23:39 ` Roman Gushchin
2024-06-01 20:53   ` Vlastimil Babka
2024-06-02 11:36 ` Wei Yang [this message]
2024-06-02 11:36   ` Wei Yang
2024-06-02 20:47 ` David Rientjes
2024-06-02 21:08   ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240602113604.pn74o7g2lazr7ugk@master \
    --to=richard.weiyang@gmail.com \
    --cc=42.hyeyoo@gmail.com \
    --cc=akinobu.mita@gmail.com \
    --cc=andrii@kernel.org \
    --cc=anil.s.keshavamurthy@intel.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cl@linux.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=g@master \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mhiramat@kernel.org \
    --cc=naveen.n.rao@linux.ibm.com \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rostedt@goodmis.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.