From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 1 Feb 2024 09:57:29 -0500
From: Steven Rostedt
To: Andy Chiu
Cc: linux-trace-kernel@vger.kernel.org, bristot@kernel.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com
Subject: Re: [v1] trace/hwlat: stop worker if !is_percpu_thread due to hotplug event
Message-ID: <20240201095729.4fe5e56b@gandalf.local.home>
In-Reply-To: <20240201062455.3154803-1-andy.chiu@sifive.com>
References: <20240201062455.3154803-1-andy.chiu@sifive.com>
X-Mailer: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-pc-linux-gnu)
X-Mailing-List: linux-trace-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
On Thu, 1 Feb 2024 14:24:55 +0800
Andy Chiu wrote:

> If the task happens to run after cpu hot-plug offline, then it would not
> be running in a percpu_thread. Instead, it would be re-queued into an
> UNBOUND workqueue. This would trigger a warning if we enable kernel
> preemption.
>
> Signed-off-by: Andy Chiu
> ---
>  kernel/trace/trace_hwlat.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
> index b791524a6536..87258ddc2141 100644
> --- a/kernel/trace/trace_hwlat.c
> +++ b/kernel/trace/trace_hwlat.c
> @@ -511,7 +511,16 @@ static int start_cpu_kthread(unsigned int cpu)
>  static void hwlat_hotplug_workfn(struct work_struct *dummy)
>  {
>  	struct trace_array *tr = hwlat_trace;
> -	unsigned int cpu = smp_processor_id();
> +	unsigned int cpu;
> +
> +	/*
> +	 * If the work is scheduled after CPU hotplug offline being invoked,
> +	 * then it would be queued into UNBOUNDED workqueue

Is it due to a CPU going online and then right back offline? This
function is called by the online path.

From the queue_work_on() comment:

/**
 * queue_work_on - queue work on specific cpu
 * @cpu: CPU number to execute work on
 * @wq: workqueue to use
 * @work: work to queue
 *
 * We queue the work to a specific CPU, the caller must ensure it
 * can't go away. Callers that fail to ensure that the specified
 * CPU cannot go away will execute on a randomly chosen CPU.
 * But note well that callers specifying a CPU that never has been
 * online will get a splat.
 *
 * Return: %false if @work was already on a queue, %true otherwise.
 */

So the work gets queued on the CPU but then the CPU immediately goes
offline. I wonder what happens if it comes back online? The above states
that if the work is already queued on a CPU it won't be queued again.

Hmm, reading the comments, I'm not sure you can run the same work item on
multiple CPUs.
It looks to do only one, and all the others will fail.

I think we need to change this to a global work item and use CPU masks.
In the *cpu_init() function, it would just set the current CPU's bit in a
bit mask and queue the work, and that one work function would be
responsible for all CPUs.

The work function would then call cpus_read_lock() and iterate the CPU
mask, starting a thread for each bit that is set and clearing the bit
(possibly with __test_and_clear_bit()).

I don't think the schedule_on_cpu() is doing what we think it should be
doing.

-- Steve

> +	 */
> +	if (!is_percpu_thread())
> +		return;
> +
> +	cpu = smp_processor_id();
>
>  	mutex_lock(&trace_types_lock);
>  	mutex_lock(&hwlat_data.lock);