All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Frederic Weisbecker <frederic@kernel.org>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Neeraj Upadhyay <neeraju@codeaurora.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Sasha Levin <sashal@kernel.org>,
	rcu@vger.kernel.org
Subject: [PATCH AUTOSEL 5.4 02/63] srcu: Fix broken node geometry after early ssp init
Date: Fri,  9 Jul 2021 22:26:08 -0400	[thread overview]
Message-ID: <20210710022709.3170675-2-sashal@kernel.org> (raw)
In-Reply-To: <20210710022709.3170675-1-sashal@kernel.org>

From: Frederic Weisbecker <frederic@kernel.org>

[ Upstream commit b5befe842e6612cf894cf4a199924ee872d8b7d8 ]

An srcu_struct structure that is initialized before rcu_init_geometry()
will have its srcu_node hierarchy based on CONFIG_NR_CPUS.  Once
rcu_init_geometry() is called, this hierarchy is compressed as needed
for the actual maximum number of CPUs for this system.

Later on, that srcu_struct structure is confused, sometimes referring
to its initial CONFIG_NR_CPUS-based hierarchy, and sometimes instead
to the new num_possible_cpus() hierarchy.  For example, each of its
->mynode fields continues to reference the original leaf rcu_node
structures, some of which might no longer exist.  On the other hand,
srcu_for_each_node_breadth_first() traverses to the new node hierarchy.

There are at least two bad possible outcomes to this:

1) a) A callback enqueued early on an srcu_data structure (call it
      *sdp) is recorded pending on sdp->mynode->srcu_data_have_cbs in
      srcu_funnel_gp_start() with sdp->mynode pointing to a deep leaf
      (say 3 levels).

   b) The grace period ends after rcu_init_geometry() shrinks the
      nodes level to a single one.  srcu_gp_end() walks through the new
      srcu_node hierarchy without ever reaching the old leaves so the
      callback is never executed.

   This is easily reproduced on an 8 CPUs machine with CONFIG_NR_CPUS >= 32
   and "rcupdate.rcu_self_test=1". The srcu_barrier() after early tests
   verification never completes and the boot hangs:

	[ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 seconds.
	[ 5413.147564]       Not tainted 5.12.0-rc4+ #28
	[ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	[ 5413.159753] task:swapper/0       state:D stack:    0 pid:    1 ppid:     0 flags:0x00004000
	[ 5413.168099] Call Trace:
	[ 5413.170555]  __schedule+0x36c/0x930
	[ 5413.174057]  ? wait_for_completion+0x88/0x110
	[ 5413.178423]  schedule+0x46/0xf0
	[ 5413.181575]  schedule_timeout+0x284/0x380
	[ 5413.185591]  ? wait_for_completion+0x88/0x110
	[ 5413.189957]  ? mark_held_locks+0x61/0x80
	[ 5413.193882]  ? mark_held_locks+0x61/0x80
	[ 5413.197809]  ? _raw_spin_unlock_irq+0x24/0x50
	[ 5413.202173]  ? wait_for_completion+0x88/0x110
	[ 5413.206535]  wait_for_completion+0xb4/0x110
	[ 5413.210724]  ? srcu_torture_stats_print+0x110/0x110
	[ 5413.215610]  srcu_barrier+0x187/0x200
	[ 5413.219277]  ? rcu_tasks_verify_self_tests+0x50/0x50
	[ 5413.224244]  ? rdinit_setup+0x2b/0x2b
	[ 5413.227907]  rcu_verify_early_boot_tests+0x2d/0x40
	[ 5413.232700]  do_one_initcall+0x63/0x310
	[ 5413.236541]  ? rdinit_setup+0x2b/0x2b
	[ 5413.240207]  ? rcu_read_lock_sched_held+0x52/0x80
	[ 5413.244912]  kernel_init_freeable+0x253/0x28f
	[ 5413.249273]  ? rest_init+0x250/0x250
	[ 5413.252846]  kernel_init+0xa/0x110
	[ 5413.256257]  ret_from_fork+0x22/0x30

2) An srcu_struct structure that is initialized before rcu_init_geometry()
   and used afterward will always have stale rdp->mynode references,
   resulting in callbacks to be missed in srcu_gp_end(), just like in
   the previous scenario.

This commit therefore causes init_srcu_struct_nodes to initialize the
geometry, if needed.  This ensures that the srcu_node hierarchy is
properly built and distributed from the get-go.

Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/rcu/rcu.h      |  2 ++
 kernel/rcu/srcutree.c |  3 +++
 kernel/rcu/tree.c     | 16 +++++++++++++++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 8fd4f82c9b3d..7fd1c18b7cf1 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -316,6 +316,8 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
 	}
 }
 
+extern void rcu_init_geometry(void);
+
 /* Returns a pointer to the first leaf rcu_node structure. */
 #define rcu_first_leaf_node() (rcu_state.level[rcu_num_lvls - 1])
 
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 21acdff3bd27..21115ffb6c44 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -90,6 +90,9 @@ static void init_srcu_struct_nodes(struct srcu_struct *ssp, bool is_static)
 	struct srcu_node *snp;
 	struct srcu_node *snp_first;
 
+	/* Initialize geometry if it has not already been initialized. */
+	rcu_init_geometry();
+
 	/* Work out the overall tree geometry. */
 	ssp->level[0] = &ssp->node[0];
 	for (i = 1; i < rcu_num_lvls; i++)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4dfa9dd47223..f90f2c4b2608 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3425,11 +3425,25 @@ static void __init rcu_init_one(void)
  * replace the definitions in tree.h because those are needed to size
  * the ->node array in the rcu_state structure.
  */
-static void __init rcu_init_geometry(void)
+void rcu_init_geometry(void)
 {
 	ulong d;
 	int i;
+	static unsigned long old_nr_cpu_ids;
 	int rcu_capacity[RCU_NUM_LVLS];
+	static bool initialized;
+
+	if (initialized) {
+		/*
+		 * Warn if setup_nr_cpu_ids() had not yet been invoked,
+		 * unless nr_cpus_ids == NR_CPUS, in which case who cares?
+		 */
+		WARN_ON_ONCE(old_nr_cpu_ids != nr_cpu_ids);
+		return;
+	}
+
+	old_nr_cpu_ids = nr_cpu_ids;
+	initialized = true;
 
 	/*
 	 * Initialize any unspecified boot parameters.
-- 
2.30.2


  reply	other threads:[~2021-07-10  2:28 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-10  2:26 [PATCH AUTOSEL 5.4 01/63] dmaengine: fsl-qdma: check dma_set_mask return value Sasha Levin
2021-07-10  2:26 ` Sasha Levin [this message]
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 03/63] tty: serial: fsl_lpuart: fix the potential risk of division or modulo by zero Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 04/63] serial: 8250: of: Check for CONFIG_SERIAL_8250_BCM7271 Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 05/63] misc/libmasm/module: Fix two use after free in ibmasm_init_one Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 06/63] misc: alcor_pci: fix null-ptr-deref when there is no PCI bridge Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 07/63] iio: gyro: fxa21002c: Balance runtime pm + use pm_runtime_resume_and_get() Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 08/63] iio: magn: bmc150: " Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 09/63] ALSA: usx2y: Don't call free_pages_exact() with NULL address Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 10/63] Revert "ALSA: bebob/oxfw: fix Kconfig entry for Mackie d.2 Pro" Sasha Levin
2021-07-10  2:26   ` Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 11/63] w1: ds2438: fixing bug that would always get page0 Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 12/63] scsi: hisi_sas: Propagate errors in interrupt_init_v1_hw() Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 13/63] scsi: lpfc: Fix "Unexpected timeout" error in direct attach topology Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 14/63] scsi: lpfc: Fix crash when lpfc_sli4_hba_setup() fails to initialize the SGLs Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 15/63] scsi: core: Cap scsi_host cmd_per_lun at can_queue Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 16/63] ALSA: ac97: fix PM reference leak in ac97_bus_remove() Sasha Levin
2021-07-10  2:26   ` Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 17/63] tty: serial: 8250: serial_cs: Fix a memory leak in error handling path Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 18/63] scsi: scsi_dh_alua: Check for negative result value Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 19/63] fs/jfs: Fix missing error code in lmLogInit() Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 20/63] scsi: megaraid_sas: Fix resource leak in case of probe failure Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 21/63] scsi: megaraid_sas: Early detection of VD deletion through RaidMap update Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 22/63] scsi: megaraid_sas: Handle missing interrupts while re-enabling IRQs Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 23/63] scsi: iscsi: Add iscsi_cls_conn refcount helpers Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 24/63] scsi: iscsi: Fix conn use after free during resets Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 25/63] scsi: iscsi: Fix shost->max_id use Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 26/63] scsi: qedi: Fix null ref during abort handling Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 27/63] mfd: da9052/stmpe: Add and modify MODULE_DEVICE_TABLE Sasha Levin
2021-07-10  2:26   ` Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 28/63] mfd: cpcap: Fix cpcap dmamask not set warnings Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 29/63] ASoC: img: Fix PM reference leak in img_i2s_in_probe() Sasha Levin
2021-07-10  2:26   ` Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 30/63] serial: tty: uartlite: fix console setup Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 31/63] s390/sclp_vt220: fix console name to match device Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 32/63] selftests: timers: rtcpie: skip test if default RTC device does not exist Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 33/63] USB: core: Avoid WARNings for 0-length descriptor requests Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 34/63] ALSA: sb: Fix potential double-free of CSP mixer elements Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 35/63] powerpc/ps3: Add dma_mask to ps3_dma_region Sasha Levin
2021-07-10  2:26   ` Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 36/63] iommu/arm-smmu: Fix arm_smmu_device refcount leak when arm_smmu_rpm_get fails Sasha Levin
2021-07-10  2:26   ` Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 37/63] iommu/arm-smmu: Fix arm_smmu_device refcount leak in address translation Sasha Levin
2021-07-10  2:26   ` Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 38/63] gpio: zynq: Check return value of pm_runtime_get_sync Sasha Levin
2021-07-10  2:26   ` Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 39/63] ALSA: ppc: fix error return code in snd_pmac_probe() Sasha Levin
2021-07-10  2:26   ` Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 40/63] selftests/powerpc: Fix "no_handler" EBB selftest Sasha Levin
2021-07-10  2:26   ` Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 41/63] gpio: pca953x: Add support for the On Semi pca9655 Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 42/63] ASoC: soc-core: Fix the error return code in snd_soc_of_parse_audio_routing() Sasha Levin
2021-07-10  2:26   ` Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 43/63] s390/processor: always inline stap() and __load_psw_mask() Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 44/63] s390/ipl_parm: fix program check new psw handling Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 45/63] s390/mem_detect: fix diag260() " Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 46/63] s390/mem_detect: fix tprot() " Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 47/63] Input: hideep - fix the uninitialized use in hideep_nvm_unlock() Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 48/63] ALSA: bebob: add support for ToneWeal FW66 Sasha Levin
2021-07-10  2:26   ` Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 49/63] ALSA: usb-audio: scarlett2: Fix 18i8 Gen 2 PCM Input count Sasha Levin
2021-07-10  2:26   ` Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 50/63] ALSA: usb-audio: scarlett2: Fix data_mutex lock Sasha Levin
2021-07-10  2:26   ` Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 51/63] ALSA: usb-audio: scarlett2: Fix scarlett2_*_ctl_put() return values Sasha Levin
2021-07-10  2:26   ` Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 52/63] usb: gadget: f_hid: fix endianness issue with descriptors Sasha Levin
2021-07-10  2:26 ` [PATCH AUTOSEL 5.4 53/63] usb: gadget: hid: fix error return code in hid_bind() Sasha Levin
2021-07-10  2:27 ` [PATCH AUTOSEL 5.4 54/63] powerpc/boot: Fixup device-tree on little endian Sasha Levin
2021-07-10  2:27   ` Sasha Levin
2021-07-10  2:27 ` [PATCH AUTOSEL 5.4 55/63] ASoC: Intel: kbl_da7219_max98357a: shrink platform_id below 20 characters Sasha Levin
2021-07-10  2:27   ` Sasha Levin
2021-07-10  2:27 ` [PATCH AUTOSEL 5.4 56/63] backlight: lm3630a: Fix return code of .update_status() callback Sasha Levin
2021-07-10  2:27   ` Sasha Levin
2021-07-10  2:27 ` [PATCH AUTOSEL 5.4 57/63] ALSA: hda: Add IRQ check for platform_get_irq() Sasha Levin
2021-07-10  2:27   ` Sasha Levin
2021-07-10  2:27 ` [PATCH AUTOSEL 5.4 58/63] ALSA: usb-audio: scarlett2: Fix 6i6 Gen 2 line out descriptions Sasha Levin
2021-07-10  2:27   ` Sasha Levin
2021-07-10  2:27 ` [PATCH AUTOSEL 5.4 59/63] jfs: fix GPF in diFree Sasha Levin
2021-07-10  2:27 ` [PATCH AUTOSEL 5.4 60/63] staging: rtl8723bs: fix macro value for 2.4Ghz only device Sasha Levin
2021-07-10  2:27 ` [PATCH AUTOSEL 5.4 61/63] intel_th: Wait until port is in reset before programming it Sasha Levin
2021-07-10  2:27 ` [PATCH AUTOSEL 5.4 62/63] i2c: core: Disable client irq on reboot/shutdown Sasha Levin
2021-07-10  2:27 ` [PATCH AUTOSEL 5.4 63/63] lib/decompress_unlz4.c: correctly handle zero-padding around initrds Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210710022709.3170675-2-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=boqun.feng@gmail.com \
    --cc=frederic@kernel.org \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neeraju@codeaurora.org \
    --cc=paulmck@kernel.org \
    --cc=rcu@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=urezki@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.