From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B7831187843 for ; Thu, 29 Aug 2024 23:40:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724974831; cv=none; b=jExRstFlIbXx2rbPqcLadbFRIV4RYFFHvPZhXbXPvgoQjV3pI4VVwZLYennUpGnCUx7KIIik9zJg/X/IDQfQxtphygaQQaa4NTK62YujVXMWMoWzjBPTjLadCMx+djYbiBg9yCuAcv+RA+D3Y6Kc6+1ls7TcBrfWKFTNq9CFpm0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724974831; c=relaxed/simple; bh=CpHenLQGeHWJ3I89Oj0vMd6dOYNQ5tgX7FxaCCd0QLs=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=I/3jAAH/1IwV+ugSb4iu5/C28foiMWFClr2afAyiq28UsKXSKewXkCXijZ1DZ14ORM0+GShmglGLXcqBgiQniElGrfi+jdI2L3+gpbqEq41PjYAZfTLpfuh6hTFsnPIvJO1YFt9hY2jxtQe0TuDxXNXIkw1JWUbb/f1G5naCt9I= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=X1HPUx39; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="X1HPUx39" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1724974827; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=BY7SH++AtttjQGWLu1s22uVFr5kdBWS1qBh1vIRVlGw=; b=X1HPUx394xd7H9f9ShdQjdqLsWngGDCT29kkVy5ueLx371NyrZgnOj+xuLRscM1q/hKxoA JhwHxxDplD7rCDN2bAnTZQX89bKBzIv03dSHhHvWjniHG9hZ5xlrrq+eJWpLKTKrAsmCot K0KTR2yiCT/sNWPGXVfpAf9CifnSwmY= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-248-jB1zItyFPeidQh0z-TUJqg-1; Thu, 29 Aug 2024 19:40:26 -0400 X-MC-Unique: jB1zItyFPeidQh0z-TUJqg-1 Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id B7FF01955D48; Thu, 29 Aug 2024 23:40:24 +0000 (UTC) Received: from localhost (unknown [10.22.32.86]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 051B11955BE3; Thu, 29 Aug 2024 23:40:23 +0000 (UTC) Date: Thu, 29 Aug 2024 20:40:22 -0300 From: "Luis Claudio R. Goncalves" To: Tomas Glozar Cc: Steven Rostedt , linux-trace-kernel@vger.kernel.org, linux-kernel@vger.kernel.org, jkacur@redhat.com Subject: Re: [PATCH] tracing/timerlat: Check tlat_var for NULL in timerlat_fd_release Message-ID: References: <20240820130001.124768-1-tglozar@redhat.com> <20240823125426.404f2705@gandalf.local.home> <20240823145211.34ccda61@gandalf.local.home> <20240826132620.1618d201@gandalf.local.home> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 On Tue, Aug 27, 2024 at 04:34:39PM +0200, Tomas Glozar wrote: > po 26. 8. 2024 v 19:27 odesílatel Steven Rostedt napsal: > > > > Yeah, I think I finally found the real issue. I don't think we need the ref > > counting. The problem is the creating and killing of the threads via the > > start and stop callbacks. That's not their purpose. The purpose of stop > > and start callbacks is when tracing_on is set to off and back on again. I > > think this is what is racing with the close. > > > > I believe I discovered the exact mechanism of the bug. Since rtla is > killed with SIGTERM, it leaves the timerlat tracer active and its > timerlatu processes running. The rtla from the next iteration of the > while loop in the reproducer deactivates the timerlat tracer, > triggering the killing of the timerlatu processes. However, this > killing is asynchronous, and while the processes are gradually dying > one by one and closing their fds, the rtla from the next iteration is > already activating the timerlat tracer again. One of the timerlatu > processes does not die fast enough to release the fd before the > tlat_var is cleared in osnoise_workload_start, triggering the panic. With this in mind I tried a simpler approach. I used a kernel without any extra patches from Steven or ourselves, just used the code below. Not saying it is a definitive solution, but as a proof of concept I have two boxes running the loop used to reproduce the problem and so far not a single problem. The patch: diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c index 66a871553d4a1..331c69bc7b0d4 100644 --- a/kernel/trace/trace_osnoise.c +++ b/kernel/trace/trace_osnoise.c @@ -2239,8 +2239,11 @@ static ssize_t osnoise_options_write(struct file *filp, const char __user *ubuf, */ mutex_lock(&trace_types_lock); running = osnoise_has_registered_instances(); - if (running) + if (running) { stop_per_cpu_kthreads(); + mutex_unlock(&trace_types_lock); + return -EBUSY; + } mutex_lock(&interface_lock); /* @@ -2579,7 +2582,8 @@ static int timerlat_fd_release(struct inode *inode, struct file *file) osn_var = per_cpu_ptr(&per_cpu_osnoise_var, cpu); tlat_var = per_cpu_ptr(&per_cpu_timerlat_var, cpu); - hrtimer_cancel(&tlat_var->timer); + if (tlat_var->kthread) + hrtimer_cancel(&tlat_var->timer); memset(tlat_var, 0, sizeof(*tlat_var)); osn_var->sampling = 0; --- In short, do not start new instances if the old one still registered. And only cancel the timer if the timerlat per cpu var has not been zeroed-out. Again, not a final patch but a simpler approach to protect start/stop races. Luis > > You can see the bug in action in this detailed trace I got today, with > the use of in-kernel trace_printk together with bpftrace and > bpf_trace_printk and captured using ftrace_dump_on_oops (running the > same reproducer on a 8 CPU KVM system): > [ 103.227679] rtla-2325 2..... 98242607us : > osnoise_workload_start: tlat_var_reset: cpu=0, > tlat_var=0000000034996efa > [ 103.227844] rtla-2325 2..... 98242607us : > osnoise_workload_start: tlat_var_reset: cpu=1, > tlat_var=0000000062d1f520 > [ 103.228009] rtla-2325 2..... 98242607us : > osnoise_workload_start: tlat_var_reset: cpu=2, > tlat_var=0000000002e262d8 > [ 103.228173] rtla-2325 2..... 98242607us : > osnoise_workload_start: tlat_var_reset: cpu=3, > tlat_var=00000000b9d3a280 > [ 103.228349] rtla-2325 2..... 98242607us : > osnoise_workload_start: tlat_var_reset: cpu=4, > tlat_var=000000000fba7a6e > [ 103.228510] rtla-2325 2..... 98242608us : > osnoise_workload_start: tlat_var_reset: cpu=5, > tlat_var=000000008ed0b87f > [ 103.228670] rtla-2325 2..... 98242608us : > osnoise_workload_start: tlat_var_reset: cpu=6, > tlat_var=00000000a02fe267 > [ 103.228831] rtla-2325 2..... 98242608us : > osnoise_workload_start: tlat_var_reset: cpu=7, > tlat_var=000000008e1d5095 > [ 103.228992] rtla-2327 3...11 98256551us : > bpf_trace_printk: rtla pid 2325 forks into 2328 > [ 103.229156] rtla-2327 3...11 98256873us : > bpf_trace_printk: rtla pid 2325 forks into 2329 > [ 103.229325] rtla-2327 3...11 98257124us : > bpf_trace_printk: rtla pid 2325 forks into 2330 > [ 103.229489] rtla-2327 3...11 98257353us : > bpf_trace_printk: rtla pid 2325 forks into 2331 > [ 103.229651] rtla-2327 3...11 98257574us : > bpf_trace_printk: rtla pid 2325 forks into 2332 > [ 103.229810] rtla-2327 3...11 98257801us : > bpf_trace_printk: rtla pid 2325 forks into 2333 > [ 103.229968] rtla-2327 3...11 98258008us : > bpf_trace_printk: rtla pid 2325 forks into 2334 > [ 103.230124] rtla-2327 3...11 98258216us : > bpf_trace_printk: rtla pid 2325 forks into 2335 > [ 103.230293] bash-2094 0d..31 102987914us : > bpf_trace_printk: rtla pid 2325 received signal 2 > [ 103.230452] bash-2094 0d..31 102987915us : > bpf_trace_printk: ...from bash pid 2094 > [ 103.230609] bash-2094 0d..31 102990000us : > bpf_trace_printk: rtla pid 2325 received signal 15 > [ 103.230766] bash-2094 0d..31 102990001us : > bpf_trace_printk: ...from bash pid 2094 > [ 103.230922] rtla-2325 2...21 102990010us : > bpf_trace_printk: rtla pid 2325 exiting > [ 103.231079] rtla-2327 3...21 102990503us : > bpf_trace_printk: rtla pid 2327 exiting > [ 103.231242] rtla-2337 3d..31 103232021us : > bpf_trace_printk: timerlatu/0 pid 2328 received signal 9 > [ 103.231399] rtla-2337 3d..31 103232022us : > bpf_trace_printk: ...from rtla pid 2337 > [ 103.231554] rtla-2337 3d..31 103232024us : > bpf_trace_printk: timerlatu/1 pid 2329 received signal 9 > [ 103.231711] rtla-2337 3d..31 103232025us : > bpf_trace_printk: ...from rtla pid 2337 > [ 103.231866] rtla-2337 3d..31 103232029us : > bpf_trace_printk: timerlatu/2 pid 2330 received signal 9 > [ 103.232022] rtla-2337 3d..31 103232030us : > bpf_trace_printk: ...from rtla pid 2337 > [ 103.232178] rtla-2337 3dN.31 103232031us : > bpf_trace_printk: timerlatu/3 pid 2331 received signal 9 > [ 103.232347] rtla-2337 3dN.31 103232031us : > bpf_trace_printk: ...from rtla pid 2337 > [ 103.232506] rtla-2337 3dN.31 103232034us : > bpf_trace_printk: timerlatu/4 pid 2332 received signal 9 > [ 103.232685] rtla-2337 3dN.31 103232034us : > bpf_trace_printk: ...from rtla pid 2337 > [ 103.232846] rtla-2337 3dN.31 103232039us : > bpf_trace_printk: timerlatu/5 pid 2333 received signal 9 > [ 103.233010] rtla-2337 3dN.31 103232039us : > bpf_trace_printk: ...from rtla pid 2337 > [ 103.233171] rtla-2337 3dN.31 103232042us : > bpf_trace_printk: timerlatu/6 pid 2334 received signal 9 > [ 103.233357] rtla-2337 3dN.31 103232042us : > bpf_trace_printk: ...from rtla pid 2337 > [ 103.233517] rtla-2337 3dN.31 103232046us : > bpf_trace_printk: timerlatu/7 pid 2335 received signal 9 > [ 103.233678] rtla-2337 3dN.31 103232046us : > bpf_trace_printk: ...from rtla pid 2337 > [ 103.233835] timerlat-2331 3...21 103232701us : > bpf_trace_printk: timerlatu/3 pid 2331 exiting > [ 103.233995] timerlat-2329 1...21 103232704us : > bpf_trace_printk: timerlatu/1 pid 2329 exiting > [ 103.234155] timerlat-2331 3....1 103232705us : > timerlat_fd_release: timerlat_fd_release: cpu=3, > tlat_var=000000000fba7a6e, osn_var->pid=2331 > [ 103.234357] timerlat-2329 1....1 103232706us : > timerlat_fd_release: timerlat_fd_release: cpu=1, > tlat_var=0000000002e262d8, osn_var->pid=2329 > [ 103.234508] timerlat-2334 6...21 103232821us : > bpf_trace_printk: timerlatu/6 pid 2334 exiting > [ 103.234657] timerlat-2334 6....1 103232826us : > timerlat_fd_release: timerlat_fd_release: cpu=6, > tlat_var=000000008e1d5095, osn_var->pid=2334 > [ 103.234813] timerlat-2330 2...21 103232850us : > bpf_trace_printk: timerlatu/2 pid 2330 exiting > [ 103.234966] timerlat-2330 2....1 103232853us : > timerlat_fd_release: timerlat_fd_release: cpu=2, > tlat_var=00000000b9d3a280, osn_var->pid=2330 > [ 103.235122] timerlat-2333 5...21 103232856us : > bpf_trace_printk: timerlatu/5 pid 2333 exiting > [ 103.235291] timerlat-2333 5....1 103232858us : > timerlat_fd_release: timerlat_fd_release: cpu=5, > tlat_var=00000000a02fe267, osn_var->pid=2333 > [ 103.235452] timerlat-2332 4...21 103232861us : > bpf_trace_printk: timerlatu/4 pid 2332 exiting > [ 103.235610] timerlat-2332 4....1 103232864us : > timerlat_fd_release: timerlat_fd_release: cpu=4, > tlat_var=000000008ed0b87f, osn_var->pid=2332 > [ 103.235769] timerlat-2335 7...21 103232867us : > bpf_trace_printk: timerlatu/7 pid 2335 exiting > [ 103.235932] timerlat-2335 7....1 103232869us : > timerlat_fd_release: timerlat_fd_release: cpu=7, > tlat_var=000000003113e297, osn_var->pid=2335 > [ 103.236103] rtla-2337 3..... 103234109us : > osnoise_workload_start: tlat_var_reset: cpu=0, > tlat_var=0000000034996efa > [ 103.236316] rtla-2337 3..... 103234109us : > osnoise_workload_start: tlat_var_reset: cpu=1, > tlat_var=0000000062d1f520 > [ 103.236473] rtla-2337 3..... 103234109us : > osnoise_workload_start: tlat_var_reset: cpu=2, > tlat_var=0000000002e262d8 > [ 103.236630] rtla-2337 3..... 103234110us : > osnoise_workload_start: tlat_var_reset: cpu=3, > tlat_var=00000000b9d3a280 > [ 103.236797] rtla-2337 3..... 103234110us : > osnoise_workload_start: tlat_var_reset: cpu=4, > tlat_var=000000000fba7a6e > [ 103.236972] rtla-2337 3..... 103234110us : > osnoise_workload_start: tlat_var_reset: cpu=5, > tlat_var=000000008ed0b87f > [ 103.237144] rtla-2337 3..... 103234110us : > osnoise_workload_start: tlat_var_reset: cpu=6, > tlat_var=00000000a02fe267 > [ 103.237345] rtla-2337 3..... 103234110us : > osnoise_workload_start: tlat_var_reset: cpu=7, > tlat_var=000000008e1d5095 > [ 103.237516] timerlat-2328 0...21 103234149us : > bpf_trace_printk: timerlatu/0 pid 2328 exiting > [ 103.237690] timerlat-2328 0....1 103234152us : > timerlat_fd_release: timerlat_fd_release: cpu=0, > tlat_var=0000000062d1f520, osn_var->pid=0 > [ 103.237871] --------------------------------- > [ 103.238053] CR2: 0000000000000010 > > On the kernel side, I'm using similar trace_printk debug prints as you > did; the bpftrace program used is this: > uretprobe:libc:fork > /strncmp(comm, "rtla", 5) == 0 && retval != 0/ > { > debugf("rtla pid %d forks into %d", pid, retval) > } > tracepoint:signal:signal_generate > /strncmp(args->comm, "rtla", 5) == 0 || strncmp(args->comm, > "timerlatu", 9) == 0/ { > debugf("%s pid %d received signal %d", args->comm, args->pid, > args->sig); > debugf("...from %s pid %d", comm, pid); > } > tracepoint:sched:sched_process_exit > /strncmp(args->comm, "rtla", 5) == 0 || strncmp(args->comm, > "timerlatu", 9) == 0/ { > debugf("%s pid %d exiting", args->comm, args->pid) > } > > > Anyway, the start and stop should probably just pause the threads and not > > kill them an start them again. That is, the osnoise_workload_start() should > > be called by the init callbacks and the osnoise_workload_stop should be > > called by reset callback. > > > > The start and stop callbacks should just pause and restart the the threads. > > > > I'm not sure if that fits into the design of timerlat. At least for > timerlat with user workload, each activation of the timerlat tracer > implies the creation of a new user workload, not the re-activation of > a lingering one (that might be an entirely different program). A > timerlat tracer is tightly associated with its threads: if the threads > exit, the timer itself ceases to have meaning. Of course also rtla > should make sure there are no lingering timerlatu processes. > > I think waiting on the threads to actually exit in stop_kthread() is > the proper solution: > > /* > * stop_kthread - stop a workload thread > */ > static void stop_kthread(unsigned int cpu){ > struct task_struct *kthread; > > kthread = per_cpu(per_cpu_osnoise_var, cpu).kthread; > if (kthread) { > if (test_bit(OSN_WORKLOAD, &osnoise_options)) { > kthread_stop(kthread); > } else { > /* > * This is a user thread waiting on the timerlat_fd. We need > * to close all users, and the best way to guarantee this is > * by killing the thread. NOTE: this is a purpose specific file. > */ > kill_pid(kthread->thread_pid, SIGKILL, 1); > /* ^^ here wait until kthread (actually the user workload) exits */ > put_task_struct(kthread); > } > per_cpu(per_cpu_osnoise_var, cpu).kthread = NULL; > } > > There might even be another bug, not sure if calling put_task_struct > on a task that is still exiting is sane. > > Tomas > ---end quoted text---