From: Balbir Singh <balbir@in.ibm.com>
To: akpm@osdl.org
Cc: Balbir Singh <balbir@in.ibm.com>,
Shailabh Nagar <nagar@watson.ibm.com>,
Jay Lan <jlan@engr.sgi.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2.6.17-rc6-mm2] Fix exit race in per-task-delay-accounting
Date: Wed, 21 Jun 2006 12:15:47 +0530 [thread overview]
Message-ID: <4498EB1B.1070002@in.ibm.com> (raw)
In-Reply-To: <20060621055952.6658.49704.sendpatchset@localhost.localdomain>
Balbir Singh wrote:
> Hi, Andrew,
>
> This patch fixes a race in per-task-delay-accounting. This race
> was reported by Jay Lan. I tested the patch using
> cerebrus test control system for eight hours with getdelays running on
> the side (for both push and pull of delay statistics).
>
> It fixed the problem that Jay Lan saw.
>
> Here's an explanation of the race condition
>
> Consider tasks of the same thread group exiting, lets call them T1 and T2
>
>
> T1 T2
>
> CPU0 CPU1
> ===== =====
>
> do_exit()
> ... do_exit()
> taskstats_exit_send() ...
> taskstats_exit_send()
> fill_tgid()
> delayacct_add_tsk()
> delayacct_tsk_exit() ....
> __delayacct_add_tsk()
>
>
> While T1 is yet to be removed from the thread group. T1->delays is set to NULL
> between delayacct_add_tsk() and __delayacct_add_tsk() call.
>
> When T2 looks for threads in the thread group, it finds T1 and tries to
> collect stats for it. When we get to the spin_lock() in __delayacct_add_tsk(),
> we find T1->delays is a NULL pointer.
>
> Signed-off-by: Balbir Singh <balbir@in.ibm.com>
> ---
>
> include/linux/taskstats_kern.h | 1 +
> kernel/delayacct.c | 5 ++++-
> kernel/taskstats.c | 5 ++++-
> 3 files changed, 9 insertions(+), 2 deletions(-)
>
> diff -puN kernel/taskstats.c~per-task-delay-accounting-fix-exit-race kernel/taskstats.c
> --- linux-2.6.17-rc6/kernel/taskstats.c~per-task-delay-accounting-fix-exit-race 2006-06-20 13:49:29.000000000 +0530
> +++ linux-2.6.17-rc6-balbir/kernel/taskstats.c 2006-06-20 13:49:29.000000000 +0530
> @@ -25,7 +25,7 @@
> static DEFINE_PER_CPU(__u32, taskstats_seqnum) = { 0 };
> static int family_registered = 0;
> kmem_cache_t *taskstats_cache;
> -static DEFINE_MUTEX(taskstats_exit_mutex);
> +DEFINE_MUTEX(taskstats_exit_mutex);
>
> static struct genl_family family = {
> .id = GENL_ID_GENERATE,
> @@ -193,6 +193,7 @@ static int taskstats_send_stats(struct s
> if (rc < 0)
> return rc;
>
> + mutex_lock(&taskstats_exit_mutex);
> if (info->attrs[TASKSTATS_CMD_ATTR_PID]) {
> u32 pid = nla_get_u32(info->attrs[TASKSTATS_CMD_ATTR_PID]);
> rc = fill_pid(pid, NULL, &stats);
> @@ -218,6 +219,7 @@ static int taskstats_send_stats(struct s
> goto err;
> }
>
> + mutex_unlock(&taskstats_exit_mutex);
> nla_nest_end(rep_skb, na);
>
> return send_reply(rep_skb, info->snd_pid, TASKSTATS_MSG_UNICAST);
> @@ -226,6 +228,7 @@ nla_put_failure:
> return genlmsg_cancel(rep_skb, reply);
> err:
> nlmsg_free(rep_skb);
> + mutex_unlock(&taskstats_exit_mutex);
> return rc;
> }
>
> diff -puN kernel/delayacct.c~per-task-delay-accounting-fix-exit-race kernel/delayacct.c
> --- linux-2.6.17-rc6/kernel/delayacct.c~per-task-delay-accounting-fix-exit-race 2006-06-20 13:49:29.000000000 +0530
> +++ linux-2.6.17-rc6-balbir/kernel/delayacct.c 2006-06-20 13:49:29.000000000 +0530
> @@ -48,8 +48,11 @@ void __delayacct_tsk_init(struct task_st
>
> void __delayacct_tsk_exit(struct task_struct *tsk)
> {
> - kmem_cache_free(delayacct_cache, tsk->delays);
> + struct task_delay_info *delays = tsk->delays;
> + mutex_lock(&taskstats_exit_mutex);
> tsk->delays = NULL;
> + mutex_unlock(&taskstats_exit_mutex);
> + kmem_cache_free(delayacct_cache, delays);
> }
>
> /*
> diff -puN include/linux/taskstats_kern.h~per-task-delay-accounting-fix-exit-race include/linux/taskstats_kern.h
> --- linux-2.6.17-rc6/include/linux/taskstats_kern.h~per-task-delay-accounting-fix-exit-race 2006-06-20 13:49:29.000000000 +0530
> +++ linux-2.6.17-rc6-balbir/include/linux/taskstats_kern.h 2006-06-20 13:49:29.000000000 +0530
> @@ -17,6 +17,7 @@ enum {
>
> #ifdef CONFIG_TASKSTATS
> extern kmem_cache_t *taskstats_cache;
> +extern struct mutex taskstats_exit_mutex;
>
> static inline void taskstats_exit_alloc(struct taskstats **ptidstats,
> struct taskstats **ptgidstats)
> _
>
The Cc's got messed up (a really odd name lookup happened at the mail
server).
Sorry
Balbir Singh,
Linux Technology Center,
IBM Software Labs
next prev parent reply other threads:[~2006-06-21 6:52 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-06-21 5:59 [PATCH 2.6.17-rc6-mm2] Fix exit race in per-task-delay-accounting Balbir Singh
2006-06-21 6:45 ` Balbir Singh [this message]
2006-06-21 14:14 ` Shailabh Nagar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4498EB1B.1070002@in.ibm.com \
--to=balbir@in.ibm.com \
--cc=akpm@osdl.org \
--cc=jlan@engr.sgi.com \
--cc=linux-kernel@vger.kernel.org \
--cc=nagar@watson.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.