From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Robert Richter <robert.richter@amd.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Carl E. Love" <cel@us.ibm.com>,
Michael Ellerman <michaele@au1.ibm.com>,
oprofile-list <oprofile-list@lists.sourceforge.net>
Subject: Re: [PATCH] oprofile: fix crash when accessing freed task structs
Date: Mon, 16 Aug 2010 08:22:04 +1000 [thread overview]
Message-ID: <1281910924.2811.0.camel@pasglop> (raw)
In-Reply-To: <20100813153910.GD26154@erda.amd.com>
On Fri, 2010-08-13 at 17:39 +0200, Robert Richter wrote:
> On 02.08.10 21:39:33, Benjamin Herrenschmidt wrote:
>
> > I can't tell that much about the workload, I don't have access to it
> > either, let's say that from my point of view it's a "customer" binary
> > blob.
> >
> > I can re-trigger it though.
>
> Benjamin,
>
> can you try the patch below?
Thanks. I'll see if the folks who have a repro-case can give it a spin
for me.
Cheers,
Ben.
> Thanks,
>
> -Robert
>
> >From 4435322debc38097e9e863e14597ab3f78814d14 Mon Sep 17 00:00:00 2001
> From: Robert Richter <robert.richter@amd.com>
> Date: Fri, 13 Aug 2010 16:29:04 +0200
> Subject: [PATCH] oprofile: fix crash when accessing freed task structs
>
> This patch fixes a crash during shutdown reported below. The crash is
> caused be accessing already freed task structs. The fix changes the
> order for registering and unregistering notifier callbacks.
>
> All notifiers must be initialized before buffers start working. To
> stop buffer synchronization we cancel all workqueues, unregister the
> notifier callback and then flush all buffers. After all of this we
> finally can free all tasks listed.
>
> This should avoid accessing freed tasks.
>
> On 22.07.10 01:14:40, Benjamin Herrenschmidt wrote:
>
> > So the initial observation is a spinlock bad magic followed by a crash
> > in the spinlock debug code:
> >
> > [ 1541.586531] BUG: spinlock bad magic on CPU#5, events/5/136
> > [ 1541.597564] Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6d03
> >
> > Backtrace looks like:
> >
> > spin_bug+0x74/0xd4
> > ._raw_spin_lock+0x48/0x184
> > ._spin_lock+0x10/0x24
> > .get_task_mm+0x28/0x8c
> > .sync_buffer+0x1b4/0x598
> > .wq_sync_buffer+0xa0/0xdc
> > .worker_thread+0x1d8/0x2a8
> > .kthread+0xa8/0xb4
> > .kernel_thread+0x54/0x70
> >
> > So we are accessing a freed task struct in the work queue when
> > processing the samples.
>
> Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: Robert Richter <robert.richter@amd.com>
> ---
> drivers/oprofile/buffer_sync.c | 27 ++++++++++++++-------------
> drivers/oprofile/cpu_buffer.c | 2 --
> 2 files changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c
> index a9352b2..b7e755f 100644
> --- a/drivers/oprofile/buffer_sync.c
> +++ b/drivers/oprofile/buffer_sync.c
> @@ -141,16 +141,6 @@ static struct notifier_block module_load_nb = {
> .notifier_call = module_load_notify,
> };
>
> -
> -static void end_sync(void)
> -{
> - end_cpu_work();
> - /* make sure we don't leak task structs */
> - process_task_mortuary();
> - process_task_mortuary();
> -}
> -
> -
> int sync_start(void)
> {
> int err;
> @@ -158,7 +148,7 @@ int sync_start(void)
> if (!zalloc_cpumask_var(&marked_cpus, GFP_KERNEL))
> return -ENOMEM;
>
> - start_cpu_work();
> + mutex_lock(&buffer_mutex);
>
> err = task_handoff_register(&task_free_nb);
> if (err)
> @@ -173,7 +163,10 @@ int sync_start(void)
> if (err)
> goto out4;
>
> + start_cpu_work();
> +
> out:
> + mutex_unlock(&buffer_mutex);
> return err;
> out4:
> profile_event_unregister(PROFILE_MUNMAP, &munmap_nb);
> @@ -182,7 +175,6 @@ out3:
> out2:
> task_handoff_unregister(&task_free_nb);
> out1:
> - end_sync();
> free_cpumask_var(marked_cpus);
> goto out;
> }
> @@ -190,11 +182,20 @@ out1:
>
> void sync_stop(void)
> {
> + /* flush buffers */
> + mutex_lock(&buffer_mutex);
> + end_cpu_work();
> unregister_module_notifier(&module_load_nb);
> profile_event_unregister(PROFILE_MUNMAP, &munmap_nb);
> profile_event_unregister(PROFILE_TASK_EXIT, &task_exit_nb);
> task_handoff_unregister(&task_free_nb);
> - end_sync();
> + mutex_unlock(&buffer_mutex);
> + flush_scheduled_work();
> +
> + /* make sure we don't leak task structs */
> + process_task_mortuary();
> + process_task_mortuary();
> +
> free_cpumask_var(marked_cpus);
> }
>
> diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
> index 219f79e..f179ac2 100644
> --- a/drivers/oprofile/cpu_buffer.c
> +++ b/drivers/oprofile/cpu_buffer.c
> @@ -120,8 +120,6 @@ void end_cpu_work(void)
>
> cancel_delayed_work(&b->work);
> }
> -
> - flush_scheduled_work();
> }
>
> /*
> --
> 1.7.1.1
>
>
>
next prev parent reply other threads:[~2010-08-15 22:22 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-07-22 5:14 Possible Oprofile crash/race when stopping Benjamin Herrenschmidt
2010-07-28 12:21 ` Robert Richter
2010-08-03 1:39 ` Benjamin Herrenschmidt
2010-08-13 15:39 ` [PATCH] oprofile: fix crash when accessing freed task structs Robert Richter
2010-08-15 22:22 ` Benjamin Herrenschmidt [this message]
2010-08-31 10:28 ` Robert Richter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1281910924.2811.0.camel@pasglop \
--to=benh@kernel.crashing.org \
--cc=cel@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=michaele@au1.ibm.com \
--cc=oprofile-list@lists.sourceforge.net \
--cc=robert.richter@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox