All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Wander Lairson Costa <wander@redhat.com>,
	Hu Chunyu <chuhu@redhat.com>, Oleg Nesterov <oleg@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Sasha Levin <sashal@kernel.org>,
	brauner@kernel.org, mst@redhat.com, michael.christie@oracle.com,
	wangkefeng.wang@huawei.com, akpm@linux-foundation.org,
	surenb@google.com, Liam.Howlett@oracle.com,
	mathieu.desnoyers@efficios.com, npiggin@gmail.com,
	mjguzik@gmail.com, avagin@gmail.com
Subject: [PATCH AUTOSEL 6.4 02/13] kernel/fork: beware of __put_task_struct() calling context
Date: Fri,  8 Sep 2023 14:00:48 -0400	[thread overview]
Message-ID: <20230908180100.3458151-2-sashal@kernel.org> (raw)
In-Reply-To: <20230908180100.3458151-1-sashal@kernel.org>

From: Wander Lairson Costa <wander@redhat.com>

[ Upstream commit d243b34459cea30cfe5f3a9b2feb44e7daff9938 ]

Under PREEMPT_RT, __put_task_struct() indirectly acquires sleeping
locks. Therefore, it can't be called from an non-preemptible context.

One practical example is splat inside inactive_task_timer(), which is
called in a interrupt context:

  CPU: 1 PID: 2848 Comm: life Kdump: loaded Tainted: G W ---------
   Hardware name: HP ProLiant DL388p Gen8, BIOS P70 07/15/2012
   Call Trace:
   dump_stack_lvl+0x57/0x7d
   mark_lock_irq.cold+0x33/0xba
   mark_lock+0x1e7/0x400
   mark_usage+0x11d/0x140
   __lock_acquire+0x30d/0x930
   lock_acquire.part.0+0x9c/0x210
   rt_spin_lock+0x27/0xe0
   refill_obj_stock+0x3d/0x3a0
   kmem_cache_free+0x357/0x560
   inactive_task_timer+0x1ad/0x340
   __run_hrtimer+0x8a/0x1a0
   __hrtimer_run_queues+0x91/0x130
   hrtimer_interrupt+0x10f/0x220
   __sysvec_apic_timer_interrupt+0x7b/0xd0
   sysvec_apic_timer_interrupt+0x4f/0xd0
   asm_sysvec_apic_timer_interrupt+0x12/0x20
   RIP: 0033:0x7fff196bf6f5

Instead of calling __put_task_struct() directly, we defer it using
call_rcu(). A more natural approach would use a workqueue, but since
in PREEMPT_RT, we can't allocate dynamic memory from atomic context,
the code would become more complex because we would need to put the
work_struct instance in the task_struct and initialize it when we
allocate a new task_struct.

The issue is reproducible with stress-ng:

  while true; do
      stress-ng --sched deadline --sched-period 1000000000 \
	      --sched-runtime 800000000 --sched-deadline \
	      1000000000 --mmapfork 23 -t 20
  done

Reported-by: Hu Chunyu <chuhu@redhat.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Valentin Schneider <vschneid@redhat.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230614122323.37957-2-wander@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/linux/sched/task.h | 28 +++++++++++++++++++++++++++-
 kernel/fork.c              |  8 ++++++++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index e0f5ac90a228b..d20de91e3b957 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -118,10 +118,36 @@ static inline struct task_struct *get_task_struct(struct task_struct *t)
 }
 
 extern void __put_task_struct(struct task_struct *t);
+extern void __put_task_struct_rcu_cb(struct rcu_head *rhp);
 
 static inline void put_task_struct(struct task_struct *t)
 {
-	if (refcount_dec_and_test(&t->usage))
+	if (!refcount_dec_and_test(&t->usage))
+		return;
+
+	/*
+	 * under PREEMPT_RT, we can't call put_task_struct
+	 * in atomic context because it will indirectly
+	 * acquire sleeping locks.
+	 *
+	 * call_rcu() will schedule delayed_put_task_struct_rcu()
+	 * to be called in process context.
+	 *
+	 * __put_task_struct() is called when
+	 * refcount_dec_and_test(&t->usage) succeeds.
+	 *
+	 * This means that it can't "conflict" with
+	 * put_task_struct_rcu_user() which abuses ->rcu the same
+	 * way; rcu_users has a reference so task->usage can't be
+	 * zero after rcu_users 1 -> 0 transition.
+	 *
+	 * delayed_free_task() also uses ->rcu, but it is only called
+	 * when it fails to fork a process. Therefore, there is no
+	 * way it can conflict with put_task_struct().
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible())
+		call_rcu(&t->rcu, __put_task_struct_rcu_cb);
+	else
 		__put_task_struct(t);
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 8103ffd217e97..38cfcfbd8b489 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -989,6 +989,14 @@ void __put_task_struct(struct task_struct *tsk)
 }
 EXPORT_SYMBOL_GPL(__put_task_struct);
 
+void __put_task_struct_rcu_cb(struct rcu_head *rhp)
+{
+	struct task_struct *task = container_of(rhp, struct task_struct, rcu);
+
+	__put_task_struct(task);
+}
+EXPORT_SYMBOL_GPL(__put_task_struct_rcu_cb);
+
 void __init __weak arch_task_cache_init(void) { }
 
 /*
-- 
2.40.1


  reply	other threads:[~2023-09-08 18:02 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-08 18:00 [PATCH AUTOSEL 6.4 01/13] ACPICA: Add AML_NO_OPERAND_RESOLVE flag to Timer Sasha Levin
2023-09-08 18:01 ` [Acpica-devel] " Sasha Levin
2023-09-08 18:00 ` Sasha Levin [this message]
2023-09-08 18:00 ` [PATCH AUTOSEL 6.4 03/13] rcuscale: Move rcu_scale_writer() schedule_timeout_uninterruptible() to _idle() Sasha Levin
2023-09-08 18:00 ` [PATCH AUTOSEL 6.4 04/13] scftorture: Forgive memory-allocation failure if KASAN Sasha Levin
2023-09-08 18:00 ` [PATCH AUTOSEL 6.4 05/13] ACPI: video: Add backlight=native DMI quirk for Lenovo Ideapad Z470 Sasha Levin
2023-09-08 18:00 ` [PATCH AUTOSEL 6.4 06/13] perf/smmuv3: Enable HiSilicon Erratum 162001900 quirk for HIP08/09 Sasha Levin
2023-09-08 18:00   ` Sasha Levin
2023-09-08 18:00 ` [PATCH AUTOSEL 6.4 07/13] s390/boot: cleanup number of page table levels setup Sasha Levin
2023-09-08 18:00 ` [PATCH AUTOSEL 6.4 08/13] kselftest/arm64: fix a memleak in zt_regs_run() Sasha Levin
2023-09-08 18:00   ` Sasha Levin
2023-09-08 18:00 ` [PATCH AUTOSEL 6.4 09/13] perf/imx_ddr: speed up overflow frequency of cycle Sasha Levin
2023-09-08 18:00   ` Sasha Levin
2023-09-08 18:00 ` [PATCH AUTOSEL 6.4 10/13] ACPI: video: Add backlight=native DMI quirk for Apple iMac12,1 and iMac12,2 Sasha Levin
2023-09-08 18:00 ` [PATCH AUTOSEL 6.4 11/13] hw_breakpoint: fix single-stepping when using bpf_overflow_handler Sasha Levin
2023-09-08 18:00   ` Sasha Levin
2023-09-08 18:00 ` [PATCH AUTOSEL 6.4 12/13] ACPI: x86: s2idle: Catch multiple ACPI_TYPE_PACKAGE objects Sasha Levin
2023-09-08 18:00 ` [PATCH AUTOSEL 6.4 13/13] selftests/nolibc: fix up kernel parameters support Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230908180100.3458151-2-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=avagin@gmail.com \
    --cc=brauner@kernel.org \
    --cc=chuhu@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=michael.christie@oracle.com \
    --cc=mjguzik@gmail.com \
    --cc=mst@redhat.com \
    --cc=npiggin@gmail.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=stable@vger.kernel.org \
    --cc=surenb@google.com \
    --cc=vschneid@redhat.com \
    --cc=wander@redhat.com \
    --cc=wangkefeng.wang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.