From: "Eric W. Biederman" <ebiederm@xmission.com>
To: Sasha Levin <sashal@kernel.org>
Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	Oleg Nesterov <oleg@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, luto@kernel.org, frederic@kernel.org,
	mark.rutland@arm.com, valentin.schneider@arm.com,
	keescook@chromium.org, elver@google.com, legion@kernel.org
Subject: Re: [PATCH AUTOSEL 5.17 29/43] signal, x86: Delay calling signals in atomic on RT enabled kernels
Date: Mon, 28 Mar 2022 09:31:51 -0500
Message-ID: <87r16mw3l4.fsf@email.froward.int.ebiederm.org>
In-Reply-To: <20220328111828.1554086-29-sashal@kernel.org> (Sasha Levin's message of "Mon, 28 Mar 2022 07:18:13 -0400")


Thank you for cc'ing me.  You probably want to hold off on back-porting
this patch.  The appropriate fix requires some more conversation.

At a minimum this patch should not be using TIF_NOTIFY_RESUME.

Eric



Sasha Levin <sashal@kernel.org> writes:

> From: Oleg Nesterov <oleg@redhat.com>
>
> [ Upstream commit bf9ad37dc8a30cce22ae95d6c2ca6abf8731d305 ]
>
> On x86_64 we must disable preemption before we enable interrupts
> for stack faults, int3 and debugging, because the current task is using
> a per CPU debug stack defined by the IST. If we schedule out, another task
> can come in and use the same stack and cause the stack to be corrupted
> and crash the kernel on return.
>
> When CONFIG_PREEMPT_RT is enabled, spinlock_t locks become sleeping, and
> one of these is the spin lock used in signal handling.
>
> Some of the debug code (int3) causes do_trap() to send a signal.
> This function calls a spinlock_t lock that has been converted to a
> sleeping lock. If this happens, the above issues with the corrupted
> stack is possible.
>
> Instead of calling the signal right away, for PREEMPT_RT and x86,
> the signal information is stored on the stacks task_struct and
> TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
> code will send the signal when preemption is enabled.
>
> [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
>   ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
> [bigeasy: Add on 32bit as per Yang Shi, minor rewording. ]
> [ tglx: Use a config option ]
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Link: https://lore.kernel.org/r/Ygq5aBB/qMQw6aP5@linutronix.de
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  arch/x86/Kconfig       |  1 +
>  include/linux/sched.h  |  3 +++
>  kernel/Kconfig.preempt | 12 +++++++++++-
>  kernel/entry/common.c  | 14 ++++++++++++++
>  kernel/signal.c        | 40 ++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 69 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 9f5bd41bf660..d557ac29b6cd 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -120,6 +120,7 @@ config X86
>  	select ARCH_WANTS_NO_INSTR
>  	select ARCH_WANT_HUGE_PMD_SHARE
>  	select ARCH_WANT_LD_ORPHAN_WARN
> +	select ARCH_WANTS_RT_DELAYED_SIGNALS
>  	select ARCH_WANTS_THP_SWAP		if X86_64
>  	select ARCH_HAS_PARANOID_L1D_FLUSH
>  	select BUILDTIME_TABLE_SORT
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 75ba8aa60248..098e37fd770a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1087,6 +1087,9 @@ struct task_struct {
>  	/* Restored if set_restore_sigmask() was used: */
>  	sigset_t			saved_sigmask;
>  	struct sigpending		pending;
> +#ifdef CONFIG_RT_DELAYED_SIGNALS
> +	struct kernel_siginfo		forced_info;
> +#endif
>  	unsigned long			sas_ss_sp;
>  	size_t				sas_ss_size;
>  	unsigned int			sas_ss_flags;
> diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
> index ce77f0265660..5644abd5f8a8 100644
> --- a/kernel/Kconfig.preempt
> +++ b/kernel/Kconfig.preempt
> @@ -132,4 +132,14 @@ config SCHED_CORE
>  	  which is the likely usage by Linux distributions, there should
>  	  be no measurable impact on performance.
>  
> -
> +config ARCH_WANTS_RT_DELAYED_SIGNALS
> +	bool
> +	help
> +	  This option is selected by architectures where raising signals
> +	  can happen in atomic contexts on PREEMPT_RT enabled kernels. This
> +	  option delays raising the signal until the return to user space
> +	  loop where it is also delivered. X86 requires this to deliver
> +	  signals from trap handlers which run on IST stacks.
> +
> +config RT_DELAYED_SIGNALS
> +	def_bool PREEMPT_RT && ARCH_WANTS_RT_DELAYED_SIGNALS
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index bad713684c2e..0543a2c92f20 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -148,6 +148,18 @@ static void handle_signal_work(struct pt_regs *regs, unsigned long ti_work)
>  	arch_do_signal_or_restart(regs, ti_work & _TIF_SIGPENDING);
>  }
>  
> +#ifdef CONFIG_RT_DELAYED_SIGNALS
> +static inline void raise_delayed_signal(void)
> +{
> +	if (unlikely(current->forced_info.si_signo)) {
> +		force_sig_info(&current->forced_info);
> +		current->forced_info.si_signo = 0;
> +	}
> +}
> +#else
> +static inline void raise_delayed_signal(void) { }
> +#endif
> +
>  static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
>  					    unsigned long ti_work)
>  {
> @@ -162,6 +174,8 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
>  		if (ti_work & _TIF_NEED_RESCHED)
>  			schedule();
>  
> +		raise_delayed_signal();
> +
>  		if (ti_work & _TIF_UPROBE)
>  			uprobe_notify_resume(regs);
>  
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 9b04631acde8..e93de6daa188 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1307,6 +1307,43 @@ enum sig_handler {
>  	HANDLER_EXIT,	 /* Only visible as the process exit code */
>  };
>  
> +/*
> + * On some archictectures, PREEMPT_RT has to delay sending a signal from a
> + * trap since it cannot enable preemption, and the signal code's
> + * spin_locks turn into mutexes. Instead, it must set TIF_NOTIFY_RESUME
> + * which will send the signal on exit of the trap.
> + */
> +#ifdef CONFIG_RT_DELAYED_SIGNALS
> +static inline bool force_sig_delayed(struct kernel_siginfo *info,
> +				     struct task_struct *t)
> +{
> +	if (!in_atomic())
> +		return false;
> +
> +	if (WARN_ON_ONCE(t->forced_info.si_signo))
> +		return true;
> +
> +	if (is_si_special(info)) {
> +		WARN_ON_ONCE(info != SEND_SIG_PRIV);
> +		t->forced_info.si_signo = info->si_signo;
> +		t->forced_info.si_errno = 0;
> +		t->forced_info.si_code = SI_KERNEL;
> +		t->forced_info.si_pid = 0;
> +		t->forced_info.si_uid = 0;
> +	} else {
> +		t->forced_info = *info;
> +	}
> +	set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
> +	return true;
> +}
> +#else
> +static inline bool force_sig_delayed(struct kernel_siginfo *info,
> +				     struct task_struct *t)
> +{
> +	return false;
> +}
> +#endif
> +
>  /*
>   * Force a signal that the process can't ignore: if necessary
>   * we unblock the signal and change any SIG_IGN to SIG_DFL.
> @@ -1327,6 +1364,9 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
>  	struct k_sigaction *action;
>  	int sig = info->si_signo;
>  
> +	if (force_sig_delayed(info, t))
> +		return 0;
> +
>  	spin_lock_irqsave(&t->sighand->siglock, flags);
>  	action = &t->sighand->action[sig-1];
>  	ignored = action->sa.sa_handler == SIG_IGN;
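
For readers following the quoted patch, the deferral it describes can be
sketched in user space roughly as below.  This is only an illustrative
model, not kernel code: the model_* names, the struct layouts, and the
"atomic" and "notify_resume" booleans (standing in for in_atomic() and
TIF_NOTIFY_RESUME) are all assumptions made up for the sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct kernel_siginfo. */
struct model_siginfo {
	int si_signo;
	int si_code;
};

/* Hypothetical stand-in for the task_struct fields the patch touches. */
struct model_task {
	struct model_siginfo forced_info; /* si_signo == 0 means slot empty */
	bool notify_resume;               /* models TIF_NOTIFY_RESUME */
	bool atomic;                      /* models in_atomic() */
	int delivered_signo;              /* last signal actually delivered */
	int delivered_count;              /* number of deliveries so far */
};

/* Models the normal delivery path, which takes siglock in the kernel. */
static void model_deliver(struct model_task *t,
			  const struct model_siginfo *info)
{
	t->delivered_signo = info->si_signo;
	t->delivered_count++;
}

/* Models force_sig_delayed(): returns true if the caller must not deliver. */
static bool model_force_sig_delayed(struct model_task *t,
				    const struct model_siginfo *info)
{
	if (!t->atomic)
		return false;
	if (t->forced_info.si_signo)
		return true;	/* slot busy: the patch warns and drops */
	t->forced_info = *info;
	t->notify_resume = true;
	return true;
}

/* Models the hook added to force_sig_info_to_task(). */
static int model_force_sig(struct model_task *t,
			   const struct model_siginfo *info)
{
	if (model_force_sig_delayed(t, info))
		return 0;
	model_deliver(t, info);
	return 0;
}

/* Models raise_delayed_signal() in the exit-to-user-mode loop. */
static void model_raise_delayed_signal(struct model_task *t)
{
	if (t->forced_info.si_signo) {
		model_force_sig(t, &t->forced_info);
		t->forced_info.si_signo = 0;
	}
}
```

In this model a signal forced while "atomic" is set is only parked in
forced_info; it is delivered once the caller clears "atomic" and runs
model_raise_delayed_signal(), mirroring the exit path in the patch.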
