public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org,
	Daniel Bristot de Oliveira <bristot@kernel.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Leonardo Bras <leobras@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [patch 04/12] clockevent unbind: use smp_call_func_single_fail
Date: Wed, 14 Feb 2024 15:58:49 -0300	[thread overview]
Message-ID: <Zc0Naa3pwTyndUvK@tpad> (raw)
In-Reply-To: <87plx3l6wc.ffs@tglx>

On Sun, Feb 11, 2024 at 09:52:35AM +0100, Thomas Gleixner wrote:
> On Wed, Feb 07 2024 at 09:51, Marcelo Tosatti wrote:
> > On Wed, Feb 07, 2024 at 12:55:59PM +0100, Thomas Gleixner wrote:
> >
> > OK, so the problem is the following: due to software complexity, one is
> > often not aware of all operations that might take place.
> 
> The problem is that people throw random crap on their systems and avoid
> proper system engineering and then complain that their realtime
> constraints are violated. So you are proliferating bad engineering
> practices and encourage people not to care.

Its more of a practicality and cost concern: one usually does not have
resources to fully review software before using that software.

> > Now think of all possible paths, from userspace, that lead to kernel
> > code that ends up in smp_call_function_* variants (or other functions
> > that cause IPIs to isolated CPUs).
> 
> So you need to analyze every possible code path and interface and add
> your magic functions there after figuring out whether that's valid or
> not.

"A magic function", yes.

> > The alternative, from blocking this in the kernel, would be to validate all 
> > userspace software involved in your application, to ensure it won't end
> > up in the kernel sending IPIs. Which is impractical, isnt it ?
> 
> It's absolutely not impractical. It's part of proper system
> engineering. The wet dream that you can run random docker containers and
> everything works magically is just a wet dream.

Unfortunately that is what people do.

I understand that "full software review" would be the ideal, but in most
situations it does not seem to happen.

> > (or rather, with such option in the kernel, it would be possible to run 
> > applications which have not been validated, since the kernel would fail
> > the operation that results in IPI to isolated CPU).
> 
> That's a fallacy because you _cannot_ define with a single CPU mask
> which interface is valid in a particular configuration to end up with an
> IPI and which one is not. There are legitimate reasons in realtime or
> latency constraint systems to invoke selective functionality which
> interferes with the overall system constraints.
> 
> How do you cover that with your magic CPU mask? You can't.
> 
> Aside of that there is a decent chance that you are subtly breaking user
> space that way. Just look at that hwmon/coretemp commit you pointed to:
> 
>   "Temperature information from the housekeeping cores should be
>    sufficient to infer die temperature."
> 
> That's just wishful thinking for various reasons:
> 
>   - The die temperature on larger packages is not evenly distributed and
>     you can run into situations where the housekeeping cores are sitting
>     "far" enough away from the worker core which creates the heat spot

I know.

>   - Some monitoring applications just stop to work when they can't read
>     the full data set, which means that they break subtly and you can
>     infer exactly nothing.
> 
> > So the idea would be an additional "isolation mode", which when enabled, 
> > would disallow the IPIs. Its still possible for root user to disable
> > this mode, and retry the operation.
> >
> > So lets say i want to read MSRs on a given CPU, as root.
> >
> > You'd have to: 
> >
> > 1) readmsr on given CPU (returns -EPERM or whatever), since the
> > "block interference" mode is enabled for that CPU.
> >
> > 2) Disable that CPU in the block interference cpumask.
> >
> > 3) readmsr on the given CPU (success).
> >
> > 4) Re-enable CPU in block interference cpumask, if desired.
> 
> That's just wrong. Why?
> 
> Once you enable it just to read the MSR you enable the operation for
> _ALL_ other non-validated crap too. So while the single MSR read might
> be OK under certain circumstances the fact that you open up a window for
> all other interfaces to do far more interfering operations is a red
> flag.
> 
> This whole thing is a really badly defined policy mechanism of very
> dubious value.
> 
> Thanks,

OK, fair enough. From your comments, it seems that per-callsite 
toggling would be ideal, for example:

/sys/kernel/interference_blocking/ directory containing one sub-directory
per call site. 

Inside each sub-directory, a "enabled" file, controlling a boolean 
to enable or disable interference blocking for that particular
callsite.

Also a "cpumask" file on each directory, by default containing the same
cpumask as the nohz_full CPUs, to control to which CPUs to block the
interference for.

How does that sound?


  reply	other threads:[~2024-02-14 19:06 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-06 18:49 [patch 00/12] cpu isolation: infra to block interference to select CPUs Marcelo Tosatti
2024-02-06 18:49 ` [patch 01/12] cpu isolation: basic block interference infrastructure Marcelo Tosatti
2024-02-06 18:49 ` [patch 02/12] introduce smp_call_func_single_fail Marcelo Tosatti
2024-02-06 18:49 ` [patch 03/12] Introduce _fail variants of stop_machine functions Marcelo Tosatti
2024-02-06 18:49 ` [patch 04/12] clockevent unbind: use smp_call_func_single_fail Marcelo Tosatti
2024-02-07 11:55   ` Thomas Gleixner
2024-02-07 12:51     ` Marcelo Tosatti
2024-02-11  8:52       ` Thomas Gleixner
2024-02-14 18:58         ` Marcelo Tosatti [this message]
2024-02-06 18:49 ` [patch 05/12] timekeeping_notify: use stop_machine_fail when appropriate Marcelo Tosatti
2024-02-07 11:57   ` Thomas Gleixner
2024-02-07 12:58     ` Marcelo Tosatti
2024-02-08 15:23       ` Thomas Gleixner
2024-02-09 15:30         ` Marcelo Tosatti
2024-02-12 15:29           ` Thomas Gleixner
2024-02-06 18:49 ` [patch 06/12] perf_event_open: check for block interference CPUs Marcelo Tosatti
2024-02-06 18:49 ` [patch 07/12] mtrr_add_page/mtrr_del_page: " Marcelo Tosatti
2024-02-06 18:49 ` [patch 08/12] arm64 kernel/topology: use smp_call_function_single_fail Marcelo Tosatti
2024-02-06 18:49 ` [patch 09/12] AMD MCE: use smp_call_func_single_fail Marcelo Tosatti
2024-02-06 18:49 ` [patch 10/12] x86/mce/inject.c: fail if target cpu is block interference Marcelo Tosatti
2024-02-06 18:49 ` [patch 11/12] x86/resctrl: use smp_call_function_single_fail Marcelo Tosatti
2024-02-12 15:19   ` Thomas Gleixner
2024-02-14 18:59     ` Marcelo Tosatti
2024-02-06 18:49 ` [patch 12/12] x86/cacheinfo.c: check for block interference CPUs Marcelo Tosatti
2024-02-07 12:41   ` Thomas Gleixner
2024-02-07 13:10     ` Marcelo Tosatti
2024-02-07 13:16       ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zc0Naa3pwTyndUvK@tpad \
    --to=mtosatti@redhat.com \
    --cc=bristot@kernel.org \
    --cc=frederic@kernel.org \
    --cc=juri.lelli@redhat.com \
    --cc=leobras@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox