From: Balbir Singh <balbir@in.ibm.com>
To: akpm@osdl.org
Cc: Balbir Singh <balbir@in.ibm.com>,
Shailabh Nagar <nagar@watson.ibm.com>,
Jay Lan <jlan@engr.sgi.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2.6.17-rc6-mm2] Fix exit race in per-task-delay-accounting
Date: Wed, 21 Jun 2006 12:15:47 +0530 [thread overview]
Message-ID: <4498EB1B.1070002@in.ibm.com> (raw)
In-Reply-To: <20060621055952.6658.49704.sendpatchset@localhost.localdomain>
Balbir Singh wrote:
> Hi, Andrew,
>
> This patch fixes a race in per-task-delay-accounting. This race
> was reported by Jay Lan. I tested the patch using
> cerebrus test control system for eight hours with getdelays running on
> the side (for both push and pull of delay statistics).
>
> It fixed the problem that Jay Lan saw.
>
> Here's an explanation of the race condition
>
> Consider tasks of the same thread group exiting, lets call them T1 and T2
>
>
> T1 T2
>
> CPU0 CPU1
> ===== =====
>
> do_exit()
> ... do_exit()
> taskstats_exit_send() ...
> taskstats_exit_send()
> fill_tgid()
> delayacct_add_tsk()
> delayacct_tsk_exit() ....
> __delayacct_add_tsk()
>
>
> While T1 is yet to be removed from the thread group. T1->delays is set to NULL
> between delayacct_add_tsk() and __delayacct_add_tsk() call.
>
> When T2 looks for threads in the thread group, it finds T1 and tries to
> collect stats for it. When we get to the spin_lock() in __delayacct_add_tsk(),
> we find T1->delays is a NULL pointer.
>
> Signed-off-by: Balbir Singh <balbir@in.ibm.com>
> ---
>
> include/linux/taskstats_kern.h | 1 +
> kernel/delayacct.c | 5 ++++-
> kernel/taskstats.c | 5 ++++-
> 3 files changed, 9 insertions(+), 2 deletions(-)
>
> diff -puN kernel/taskstats.c~per-task-delay-accounting-fix-exit-race kernel/taskstats.c
> --- linux-2.6.17-rc6/kernel/taskstats.c~per-task-delay-accounting-fix-exit-race 2006-06-20 13:49:29.000000000 +0530
> +++ linux-2.6.17-rc6-balbir/kernel/taskstats.c 2006-06-20 13:49:29.000000000 +0530
> @@ -25,7 +25,7 @@
> static DEFINE_PER_CPU(__u32, taskstats_seqnum) = { 0 };
> static int family_registered = 0;
> kmem_cache_t *taskstats_cache;
> -static DEFINE_MUTEX(taskstats_exit_mutex);
> +DEFINE_MUTEX(taskstats_exit_mutex);
>
> static struct genl_family family = {
> .id = GENL_ID_GENERATE,
> @@ -193,6 +193,7 @@ static int taskstats_send_stats(struct s
> if (rc < 0)
> return rc;
>
> + mutex_lock(&taskstats_exit_mutex);
> if (info->attrs[TASKSTATS_CMD_ATTR_PID]) {
> u32 pid = nla_get_u32(info->attrs[TASKSTATS_CMD_ATTR_PID]);
> rc = fill_pid(pid, NULL, &stats);
> @@ -218,6 +219,7 @@ static int taskstats_send_stats(struct s
> goto err;
> }
>
> + mutex_unlock(&taskstats_exit_mutex);
> nla_nest_end(rep_skb, na);
>
> return send_reply(rep_skb, info->snd_pid, TASKSTATS_MSG_UNICAST);
> @@ -226,6 +228,7 @@ nla_put_failure:
> return genlmsg_cancel(rep_skb, reply);
> err:
> nlmsg_free(rep_skb);
> + mutex_unlock(&taskstats_exit_mutex);
> return rc;
> }
>
> diff -puN kernel/delayacct.c~per-task-delay-accounting-fix-exit-race kernel/delayacct.c
> --- linux-2.6.17-rc6/kernel/delayacct.c~per-task-delay-accounting-fix-exit-race 2006-06-20 13:49:29.000000000 +0530
> +++ linux-2.6.17-rc6-balbir/kernel/delayacct.c 2006-06-20 13:49:29.000000000 +0530
> @@ -48,8 +48,11 @@ void __delayacct_tsk_init(struct task_st
>
> void __delayacct_tsk_exit(struct task_struct *tsk)
> {
> - kmem_cache_free(delayacct_cache, tsk->delays);
> + struct task_delay_info *delays = tsk->delays;
> + mutex_lock(&taskstats_exit_mutex);
> tsk->delays = NULL;
> + mutex_unlock(&taskstats_exit_mutex);
> + kmem_cache_free(delayacct_cache, delays);
> }
>
> /*
> diff -puN include/linux/taskstats_kern.h~per-task-delay-accounting-fix-exit-race include/linux/taskstats_kern.h
> --- linux-2.6.17-rc6/include/linux/taskstats_kern.h~per-task-delay-accounting-fix-exit-race 2006-06-20 13:49:29.000000000 +0530
> +++ linux-2.6.17-rc6-balbir/include/linux/taskstats_kern.h 2006-06-20 13:49:29.000000000 +0530
> @@ -17,6 +17,7 @@ enum {
>
> #ifdef CONFIG_TASKSTATS
> extern kmem_cache_t *taskstats_cache;
> +extern struct mutex taskstats_exit_mutex;
>
> static inline void taskstats_exit_alloc(struct taskstats **ptidstats,
> struct taskstats **ptgidstats)
> _
>
The Cc's got messed up (a really odd name lookup happened at the mail
server).
Sorry
Balbir Singh,
Linux Technology Center,
IBM Software Labs
next prev parent reply other threads:[~2006-06-21 6:52 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-06-21 5:59 [PATCH 2.6.17-rc6-mm2] Fix exit race in per-task-delay-accounting Balbir Singh
2006-06-21 6:45 ` Balbir Singh [this message]
2006-06-21 14:14 ` Shailabh Nagar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4498EB1B.1070002@in.ibm.com \
--to=balbir@in.ibm.com \
--cc=akpm@osdl.org \
--cc=jlan@engr.sgi.com \
--cc=linux-kernel@vger.kernel.org \
--cc=nagar@watson.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox