From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 934B5192594; Tue, 8 Oct 2024 12:36:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728390992; cv=none; b=mc9sfMDJRLBCLYafXjuhQi11lkpXWv66SNjmF/tRkMghsk0R67nsZu9K9LGwds4DMglI2/CMQmyE4nWwVjBh9JvErMJWRUbSp2GPc5EQe07Mvf5nNfppYfYvKeDXK2XRsQNJYs8U62SAKSbQaxOqNTTa3/f8Rw7qBipw7zmPJPM= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728390992; c=relaxed/simple; bh=xe7zYbz8+07HlRbojzYZgphMQkkW8jw53T6kb/ja7Go=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=qC/uMm1lT+dETojGE9Ibgs97OXJynPqjkBXPmSpyeHO4d0WOBYfPigDfPBe73hVwr8WNcuVV/3Zbd3dUVMZ5xG8o8JTc6KzNsWIxCvLw4QrmV1tsY8E/a379/3lph+o/GMXtIlxZc9yK+ovVrWhVmEYliL6rVzVb0w5UyH2tBbI= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b=fzwmL6C2; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b="fzwmL6C2" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 18B48C4AF0B; Tue, 8 Oct 2024 12:36:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1728390992; bh=xe7zYbz8+07HlRbojzYZgphMQkkW8jw53T6kb/ja7Go=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=fzwmL6C2WKMskA8Vc2hwwWn/5zb2HB0yRQenfX/XfiuYpJmKF+oOkdaIkDvjX0l9U 6W+rkfOPVtcyzq2Z2F3htZhFflJGzhDZoKYpcrtIiWnJMn90QD2cmdfk34jjRgWRPa 7VZrPpe/ZLg7fKVOOmyUUoJDYbGsWmArv7zvN8EQ= From: Greg Kroah-Hartman To: stable@vger.kernel.org Cc: Greg Kroah-Hartman , patches@lists.linux.dev, Masami Hiramatsu , Mathieu Desnoyers , Wei Li , "Steven Rostedt (Google)" Subject: [PATCH 6.10 430/482] tracing/timerlat: Fix a race during cpuhp processing Date: Tue, 8 Oct 2024 14:08:13 +0200 Message-ID: <20241008115705.444803927@linuxfoundation.org> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241008115648.280954295@linuxfoundation.org> References: <20241008115648.280954295@linuxfoundation.org> User-Agent: quilt/0.67 X-stable: review X-Patchwork-Hint: ignore Precedence: bulk X-Mailing-List: patches@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit 6.10-stable review patch. If anyone has any objections, please let me know. ------------------ From: Wei Li commit 829e0c9f0855f26b3ae830d17b24aec103f7e915 upstream. There is another found exception that the "timerlat/1" thread was scheduled on CPU0, and lead to timer corruption finally: ``` ODEBUG: init active (active state 0) object: ffff888237c2e108 object type: hrtimer hint: timerlat_irq+0x0/0x220 WARNING: CPU: 0 PID: 426 at lib/debugobjects.c:518 debug_print_object+0x7d/0xb0 Modules linked in: CPU: 0 UID: 0 PID: 426 Comm: timerlat/1 Not tainted 6.11.0-rc7+ #45 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:debug_print_object+0x7d/0xb0 ... Call Trace: ? __warn+0x7c/0x110 ? debug_print_object+0x7d/0xb0 ? report_bug+0xf1/0x1d0 ? prb_read_valid+0x17/0x20 ? handle_bug+0x3f/0x70 ? exc_invalid_op+0x13/0x60 ? asm_exc_invalid_op+0x16/0x20 ? debug_print_object+0x7d/0xb0 ? debug_print_object+0x7d/0xb0 ? __pfx_timerlat_irq+0x10/0x10 __debug_object_init+0x110/0x150 hrtimer_init+0x1d/0x60 timerlat_main+0xab/0x2d0 ? __pfx_timerlat_main+0x10/0x10 kthread+0xb7/0xe0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2d/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 ``` After tracing the scheduling event, it was discovered that the migration of the "timerlat/1" thread was performed during thread creation. Further analysis confirmed that it is because the CPU online processing for osnoise is implemented through workers, which is asynchronous with the offline processing. When the worker was scheduled to create a thread, the CPU may has already been removed from the cpu_online_mask during the offline process, resulting in the inability to select the right CPU: T1 | T2 [CPUHP_ONLINE] | cpu_device_down() osnoise_hotplug_workfn() | | cpus_write_lock() | takedown_cpu(1) | cpus_write_unlock() [CPUHP_OFFLINE] | cpus_read_lock() | start_kthread(1) | cpus_read_unlock() | To fix this, skip online processing if the CPU is already offline. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu Cc: Mathieu Desnoyers Link: https://lore.kernel.org/20240924094515.3561410-4-liwei391@huawei.com Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations") Signed-off-by: Wei Li Signed-off-by: Steven Rostedt (Google) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/trace_osnoise.c | 2 ++ 1 file changed, 2 insertions(+) --- a/kernel/trace/trace_osnoise.c +++ b/kernel/trace/trace_osnoise.c @@ -2094,6 +2094,8 @@ static void osnoise_hotplug_workfn(struc mutex_lock(&interface_lock); cpus_read_lock(); + if (!cpu_online(cpu)) + goto out_unlock; if (!cpumask_test_cpu(cpu, &osnoise_cpumask)) goto out_unlock;