* [PATCH] cred - synchronize rcu before releasing cred
@ 2010-06-16 12:24 Jiri Olsa
2010-06-16 12:45 ` Eric Dumazet
0 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2010-06-16 12:24 UTC (permalink / raw)
To: dhowells; +Cc: linux-kernel, Jiri Olsa
hi,
BZ 591015 - kernel BUG at kernel/cred.c:168
https://bugzilla.redhat.com/show_bug.cgi?id=591015
Above bugzilla reported bug during the releasing of
old cred structure.
There is reproducer attached to the bugzilla.
The issue is caused by releasing old cred struct while other
kernel path might be still using it. This leads to cred->usage
inconsistency inside the __put_cred and triggering the bug.
Following kernel paths are affected:
The CPU1 path is setting the new groups creds.
The CPU2 path is cat /proc/PID/status
CPU 1 CPU 2
sys_setgroups proc_pid_status
set_current_groups task_state
commit_creds rcu_read_lock
put_cred ...
__put_cred get_cred
BUG_ON(usage != 0) ...
rcu_read_unlock
If __put_cred got executed during the CPU2 holding the reference
the BUG_ON inside __put_cred is trigered.
I put synchronize_rcu before put_cred call, so we are sure
all the rcu_read_lock blocks are finished, and if needed, using
new creds.
Also I moved rcu_read_unlock below the put_cred for consistency.
wbr,
jirka
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
fs/proc/array.c | 3 ++-
kernel/cred.c | 6 ++++++
2 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 9b58d38..c0e60d1 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -199,7 +199,6 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
"FDSize:\t%d\n"
"Groups:\t",
fdt ? fdt->max_fds : 0);
- rcu_read_unlock();
group_info = cred->group_info;
task_unlock(p);
@@ -208,6 +207,8 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
seq_printf(m, "%d ", GROUP_AT(group_info, g));
put_cred(cred);
+ rcu_read_unlock();
+
seq_printf(m, "\n");
}
diff --git a/kernel/cred.c b/kernel/cred.c
index a2d5504..4872f12 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -510,6 +510,12 @@ int commit_creds(struct cred *new)
new->fsgid != old->fsgid)
proc_id_connector(task, PROC_EVENT_GID);
+ /*
+ * New cred is set already, now let the old one
+ * chance to disappear on other CPUs.
+ */
+ synchronize_rcu();
+
/* release the old obj and subj refs both */
put_cred(old);
put_cred(old);
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-06-16 12:24 Jiri Olsa
@ 2010-06-16 12:45 ` Eric Dumazet
2010-06-16 12:57 ` Jiri Olsa
0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2010-06-16 12:45 UTC (permalink / raw)
To: Jiri Olsa; +Cc: dhowells, linux-kernel, Paul E. McKenney
Le mercredi 16 juin 2010 à 14:24 +0200, Jiri Olsa a écrit :
> hi,
>
> BZ 591015 - kernel BUG at kernel/cred.c:168
> https://bugzilla.redhat.com/show_bug.cgi?id=591015
>
> Above bugzilla reported bug during the releasing of
> old cred structure.
>
> There is reproducer attached to the bugzilla.
>
> The issue is caused by releasing old cred struct while other
> kernel path might be still using it. This leads to cred->usage
> inconsistency inside the __put_cred and triggering the bug.
>
> Following kernel paths are affected:
>
> The CPU1 path is setting the new groups creds.
> The CPU2 path is cat /proc/PID/status
>
>
> CPU 1 CPU 2
>
> sys_setgroups proc_pid_status
> set_current_groups task_state
> commit_creds rcu_read_lock
> put_cred ...
> __put_cred get_cred
> BUG_ON(usage != 0) ...
> rcu_read_unlock
>
>
>
> If __put_cred got executed during the CPU2 holding the reference
> the BUG_ON inside __put_cred is trigered.
>
> I put synchronize_rcu before put_cred call, so we are sure
> all the rcu_read_lock blocks are finished, and if needed, using
> new creds.
>
> Also I moved rcu_read_unlock below the put_cred for consistency.
>
>
> wbr,
> jirka
>
>
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
> fs/proc/array.c | 3 ++-
> kernel/cred.c | 6 ++++++
> 2 files changed, 8 insertions(+), 1 deletions(-)
>
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index 9b58d38..c0e60d1 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -199,7 +199,6 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
> "FDSize:\t%d\n"
> "Groups:\t",
> fdt ? fdt->max_fds : 0);
> - rcu_read_unlock();
>
> group_info = cred->group_info;
> task_unlock(p);
> @@ -208,6 +207,8 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
> seq_printf(m, "%d ", GROUP_AT(group_info, g));
> put_cred(cred);
>
> + rcu_read_unlock();
> +
> seq_printf(m, "\n");
> }
>
> diff --git a/kernel/cred.c b/kernel/cred.c
> index a2d5504..4872f12 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -510,6 +510,12 @@ int commit_creds(struct cred *new)
> new->fsgid != old->fsgid)
> proc_id_connector(task, PROC_EVENT_GID);
>
> + /*
> + * New cred is set already, now let the old one
> + * chance to disappear on other CPUs.
> + */
> + synchronize_rcu();
> +
> /* release the old obj and subj refs both */
> put_cred(old);
> put_cred(old);
> --
I respectfully suggest that every patch adding a synchronize_rcu() call
be reviewed by Paul himself. This is really the last option that should
be considered, since it adds a big delay.
Can you explain why other solutions cannot be used ?
Thanks
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-06-16 12:45 ` Eric Dumazet
@ 2010-06-16 12:57 ` Jiri Olsa
2010-06-16 13:10 ` Eric Dumazet
0 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2010-06-16 12:57 UTC (permalink / raw)
To: Eric Dumazet; +Cc: dhowells, linux-kernel, Paul E. McKenney
On Wed, Jun 16, 2010 at 02:45:14PM +0200, Eric Dumazet wrote:
> Le mercredi 16 juin 2010 à 14:24 +0200, Jiri Olsa a écrit :
> > hi,
> >
> > BZ 591015 - kernel BUG at kernel/cred.c:168
> > https://bugzilla.redhat.com/show_bug.cgi?id=591015
> >
> > Above bugzilla reported bug during the releasing of
> > old cred structure.
> >
> > There is reproducer attached to the bugzilla.
> >
> > The issue is caused by releasing old cred struct while other
> > kernel path might be still using it. This leads to cred->usage
> > inconsistency inside the __put_cred and triggering the bug.
> >
> > Following kernel paths are affected:
> >
> > The CPU1 path is setting the new groups creds.
> > The CPU2 path is cat /proc/PID/status
> >
> >
> > CPU 1 CPU 2
> >
> > sys_setgroups proc_pid_status
> > set_current_groups task_state
> > commit_creds rcu_read_lock
> > put_cred ...
> > __put_cred get_cred
> > BUG_ON(usage != 0) ...
> > rcu_read_unlock
> >
> >
> >
> > If __put_cred got executed during the CPU2 holding the reference
> > the BUG_ON inside __put_cred is trigered.
> >
> > I put synchronize_rcu before put_cred call, so we are sure
> > all the rcu_read_lock blocks are finished, and if needed, using
> > new creds.
> >
> > Also I moved rcu_read_unlock below the put_cred for consistency.
> >
> >
> > wbr,
> > jirka
> >
> >
> > Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> > ---
> > fs/proc/array.c | 3 ++-
> > kernel/cred.c | 6 ++++++
> > 2 files changed, 8 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/proc/array.c b/fs/proc/array.c
> > index 9b58d38..c0e60d1 100644
> > --- a/fs/proc/array.c
> > +++ b/fs/proc/array.c
> > @@ -199,7 +199,6 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
> > "FDSize:\t%d\n"
> > "Groups:\t",
> > fdt ? fdt->max_fds : 0);
> > - rcu_read_unlock();
> >
> > group_info = cred->group_info;
> > task_unlock(p);
> > @@ -208,6 +207,8 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
> > seq_printf(m, "%d ", GROUP_AT(group_info, g));
> > put_cred(cred);
> >
> > + rcu_read_unlock();
> > +
> > seq_printf(m, "\n");
> > }
> >
> > diff --git a/kernel/cred.c b/kernel/cred.c
> > index a2d5504..4872f12 100644
> > --- a/kernel/cred.c
> > +++ b/kernel/cred.c
> > @@ -510,6 +510,12 @@ int commit_creds(struct cred *new)
> > new->fsgid != old->fsgid)
> > proc_id_connector(task, PROC_EVENT_GID);
> >
> > + /*
> > + * New cred is set already, now let the old one
> > + * chance to disappear on other CPUs.
> > + */
> > + synchronize_rcu();
> > +
> > /* release the old obj and subj refs both */
> > put_cred(old);
> > put_cred(old);
> > --
>
> I respectfully suggest that every patch adding a synchronize_rcu() call
> be reviewed by Paul himself. This is really the last option that should
> be considered, since it adds a big delay.
no problem, I wasn't aware of this
>
> Can you explain why other solutions cannot be used ?
I'm not sure which ones do you mean.. I'm not RCU expert, and
this one occured to me as probable fix, but if there're other
ways, no problem ;)
thanks,
jirka
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-06-16 12:57 ` Jiri Olsa
@ 2010-06-16 13:10 ` Eric Dumazet
2010-06-16 16:08 ` Jiri Olsa
0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2010-06-16 13:10 UTC (permalink / raw)
To: Jiri Olsa; +Cc: dhowells, linux-kernel, Paul E. McKenney
Le mercredi 16 juin 2010 à 14:57 +0200, Jiri Olsa a écrit :
> On Wed, Jun 16, 2010 at 02:45:14PM +0200, Eric Dumazet wrote:
> > I respectfully suggest that every patch adding a synchronize_rcu() call
> > be reviewed by Paul himself. This is really the last option that should
> > be considered, since it adds a big delay.
>
> no problem, I wasn't aware of this
>
> >
> > Can you explain why other solutions cannot be used ?
>
> I'm not sure which ones do you mean.. I'm not RCU expert, and
> this one occured to me as probable fix, but if there're other
> ways, no problem ;)
To be fair, I am not RCU expert either, since I make mistakes on each
RCU related patch I post. Welcome to the club :)
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-06-16 13:10 ` Eric Dumazet
@ 2010-06-16 16:08 ` Jiri Olsa
2010-06-17 23:50 ` David Howells
0 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2010-06-16 16:08 UTC (permalink / raw)
To: Eric Dumazet; +Cc: dhowells, linux-kernel, Paul E. McKenney
On Wed, Jun 16, 2010 at 03:10:37PM +0200, Eric Dumazet wrote:
> Le mercredi 16 juin 2010 à 14:57 +0200, Jiri Olsa a écrit :
> > On Wed, Jun 16, 2010 at 02:45:14PM +0200, Eric Dumazet wrote:
>
> > > I respectfully suggest that every patch adding a synchronize_rcu() call
> > > be reviewed by Paul himself. This is really the last option that should
> > > be considered, since it adds a big delay.
> >
> > no problem, I wasn't aware of this
> >
> > >
> > > Can you explain why other solutions cannot be used ?
> >
> > I'm not sure which ones do you mean.. I'm not RCU expert, and
> > this one occured to me as probable fix, but if there're other
> > ways, no problem ;)
>
> To be fair, I am not RCU expert either, since I make mistakes on each
> RCU related patch I post. Welcome to the club :)
>
>
hm,
maybe I have better solution...
I think there's no need to get the cred refference as long as
the 'cred' handling stays inside the rcu_read_lock block.
And the condition 'make sure task doesn't go away', is done
by proc_single_show as this is the proc file.
I tested this succesfully with the reproducer from
https://bugzilla.redhat.com/show_bug.cgi?id=591015#c19
wbr,
jirka
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 9b58d38..ac3b3a4 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -176,7 +176,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
if (tracer)
tpid = task_pid_nr_ns(tracer, ns);
}
- cred = get_cred((struct cred *) __task_cred(p));
+ cred = __task_cred(p);
seq_printf(m,
"State:\t%s\n"
"Tgid:\t%d\n"
@@ -199,15 +199,14 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
"FDSize:\t%d\n"
"Groups:\t",
fdt ? fdt->max_fds : 0);
- rcu_read_unlock();
group_info = cred->group_info;
task_unlock(p);
for (g = 0; g < min(group_info->ngroups, NGROUPS_SMALL); g++)
seq_printf(m, "%d ", GROUP_AT(group_info, g));
- put_cred(cred);
+ rcu_read_unlock();
seq_printf(m, "\n");
}
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-06-16 16:08 ` Jiri Olsa
@ 2010-06-17 23:50 ` David Howells
2010-06-19 12:01 ` Jiri Olsa
0 siblings, 1 reply; 25+ messages in thread
From: David Howells @ 2010-06-17 23:50 UTC (permalink / raw)
To: Jiri Olsa; +Cc: dhowells, Eric Dumazet, linux-kernel, Paul E. McKenney
Jiri Olsa <jolsa@redhat.com> wrote:
> maybe I have better solution...
>
> I think there's no need to get the cred refference as long as
> the 'cred' handling stays inside the rcu_read_lock block.
I think this is right. There should be no need for a synchronize_rcu() call
to be added in commit_creds() with this as commit_creds() calls put_cred()
which will defer the destruction until the RCU grace period is up anyway.
Whilst I'd prefer to call get_cred() in task_state(), as you point out (and I
hadn't considered), this may see an cred struct that has been detached from
its pointer on another CPU and had its usage count reduced to 0.
In such a case, we can't simply increment the count and then decrement it
again later as it's already on the RCU destruction queue and can't necessarily
be removed so that it can be added back in.
What could be done, though I'm not sure it's worth it, is to use
atomic_inc_not_zero() and loop around if the cred struct has gone out of
service when we try and access it and reread the pointer.
The advantage of this would be that we could manage to hold the RCU read lock
for as little time as possible.
Acked-by: David Howells <dhowells@redhat.com>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-06-17 23:50 ` David Howells
@ 2010-06-19 12:01 ` Jiri Olsa
2010-06-25 12:55 ` Jiri Olsa
0 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2010-06-19 12:01 UTC (permalink / raw)
To: David Howells; +Cc: Eric Dumazet, linux-kernel, Paul E. McKenney
On Fri, Jun 18, 2010 at 12:50:40AM +0100, David Howells wrote:
> Jiri Olsa <jolsa@redhat.com> wrote:
>
> > maybe I have better solution...
> >
> > I think there's no need to get the cred refference as long as
> > the 'cred' handling stays inside the rcu_read_lock block.
>
> I think this is right. There should be no need for a synchronize_rcu() call
> to be added in commit_creds() with this as commit_creds() calls put_cred()
> which will defer the destruction until the RCU grace period is up anyway.
>
> Whilst I'd prefer to call get_cred() in task_state(), as you point out (and I
> hadn't considered), this may see an cred struct that has been detached from
> its pointer on another CPU and had its usage count reduced to 0.
>
> In such a case, we can't simply increment the count and then decrement it
> again later as it's already on the RCU destruction queue and can't necessarily
> be removed so that it can be added back in.
>
> What could be done, though I'm not sure it's worth it, is to use
> atomic_inc_not_zero() and loop around if the cred struct has gone out of
> service when we try and access it and reread the pointer.
>
> The advantage of this would be that we could manage to hold the RCU read lock
> for as little time as possible.
>
> Acked-by: David Howells <dhowells@redhat.com>
thanks,
in case the there's need for some consistent comment :)
patch stays
wbr,
jirka
---
BZ 591015 - kernel BUG at kernel/cred.c:168
https://bugzilla.redhat.com/show_bug.cgi?id=591015
Above bugzilla reported bug during the releasing of
old cred structure.
There is reproducer attached to the bugzilla.
The issue is caused by releasing old cred struct while other
kernel path might be still using it. This leads to cred->usage
inconsistency inside the __put_cred and triggering the bug.
Following kernel paths are affected:
The CPU1 path is setting the new groups creds.
The CPU2 path is cat /proc/PID/status
CPU 1 CPU 2
sys_setgroups proc_pid_status
set_current_groups task_state
commit_creds rcu_read_lock
put_cred ...
__put_cred get_cred
BUG_ON(usage != 0) ...
rcu_read_unlock
If __put_cred got executed during the CPU2 holding the reference
the BUG_ON inside __put_cred is trigered.
I think there's no need to get the cred refference as long as
the 'cred' handling stays inside the rcu_read_lock block.
And the condition of __task_cred 'make sure task doesn't go away',
is done by proc_single_show as this is the proc file.
wbr,
jirka
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 9b58d38..ac3b3a4 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -176,7 +176,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
if (tracer)
tpid = task_pid_nr_ns(tracer, ns);
}
- cred = get_cred((struct cred *) __task_cred(p));
+ cred = __task_cred(p);
seq_printf(m,
"State:\t%s\n"
"Tgid:\t%d\n"
@@ -199,15 +199,14 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
"FDSize:\t%d\n"
"Groups:\t",
fdt ? fdt->max_fds : 0);
- rcu_read_unlock();
group_info = cred->group_info;
task_unlock(p);
for (g = 0; g < min(group_info->ngroups, NGROUPS_SMALL); g++)
seq_printf(m, "%d ", GROUP_AT(group_info, g));
- put_cred(cred);
+ rcu_read_unlock();
seq_printf(m, "\n");
}
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-06-19 12:01 ` Jiri Olsa
@ 2010-06-25 12:55 ` Jiri Olsa
2010-06-25 13:28 ` David Howells
0 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2010-06-25 12:55 UTC (permalink / raw)
To: David Howells; +Cc: Eric Dumazet, linux-kernel, Paul E. McKenney
hi,
do I need to send this some one/place else?
not sure it's going to be picked up..
thanks,
jirka
On Sat, Jun 19, 2010 at 02:01:02PM +0200, Jiri Olsa wrote:
> On Fri, Jun 18, 2010 at 12:50:40AM +0100, David Howells wrote:
> > Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > > maybe I have better solution...
> > >
> > > I think there's no need to get the cred refference as long as
> > > the 'cred' handling stays inside the rcu_read_lock block.
> >
> > I think this is right. There should be no need for a synchronize_rcu() call
> > to be added in commit_creds() with this as commit_creds() calls put_cred()
> > which will defer the destruction until the RCU grace period is up anyway.
> >
> > Whilst I'd prefer to call get_cred() in task_state(), as you point out (and I
> > hadn't considered), this may see an cred struct that has been detached from
> > its pointer on another CPU and had its usage count reduced to 0.
> >
> > In such a case, we can't simply increment the count and then decrement it
> > again later as it's already on the RCU destruction queue and can't necessarily
> > be removed so that it can be added back in.
> >
> > What could be done, though I'm not sure it's worth it, is to use
> > atomic_inc_not_zero() and loop around if the cred struct has gone out of
> > service when we try and access it and reread the pointer.
> >
> > The advantage of this would be that we could manage to hold the RCU read lock
> > for as little time as possible.
> >
> > Acked-by: David Howells <dhowells@redhat.com>
>
> thanks,
>
> in case the there's need for some consistent comment :)
> patch stays
>
> wbr,
> jirka
>
>
>
> ---
> BZ 591015 - kernel BUG at kernel/cred.c:168
> https://bugzilla.redhat.com/show_bug.cgi?id=591015
>
> Above bugzilla reported bug during the releasing of
> old cred structure.
>
> There is reproducer attached to the bugzilla.
>
> The issue is caused by releasing old cred struct while other
> kernel path might be still using it. This leads to cred->usage
> inconsistency inside the __put_cred and triggering the bug.
>
> Following kernel paths are affected:
>
> The CPU1 path is setting the new groups creds.
> The CPU2 path is cat /proc/PID/status
>
>
> CPU 1 CPU 2
>
> sys_setgroups proc_pid_status
> set_current_groups task_state
> commit_creds rcu_read_lock
> put_cred ...
> __put_cred get_cred
> BUG_ON(usage != 0) ...
> rcu_read_unlock
>
>
>
> If __put_cred got executed during the CPU2 holding the reference
> the BUG_ON inside __put_cred is trigered.
>
> I think there's no need to get the cred refference as long as
> the 'cred' handling stays inside the rcu_read_lock block.
>
> And the condition of __task_cred 'make sure task doesn't go away',
> is done by proc_single_show as this is the proc file.
>
> wbr,
> jirka
>
>
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index 9b58d38..ac3b3a4 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -176,7 +176,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
> if (tracer)
> tpid = task_pid_nr_ns(tracer, ns);
> }
> - cred = get_cred((struct cred *) __task_cred(p));
> + cred = __task_cred(p);
> seq_printf(m,
> "State:\t%s\n"
> "Tgid:\t%d\n"
> @@ -199,15 +199,14 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
> "FDSize:\t%d\n"
> "Groups:\t",
> fdt ? fdt->max_fds : 0);
> - rcu_read_unlock();
>
> group_info = cred->group_info;
> task_unlock(p);
>
> for (g = 0; g < min(group_info->ngroups, NGROUPS_SMALL); g++)
> seq_printf(m, "%d ", GROUP_AT(group_info, g));
> - put_cred(cred);
>
> + rcu_read_unlock();
> seq_printf(m, "\n");
> }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-06-25 12:55 ` Jiri Olsa
@ 2010-06-25 13:28 ` David Howells
0 siblings, 0 replies; 25+ messages in thread
From: David Howells @ 2010-06-25 13:28 UTC (permalink / raw)
To: Jiri Olsa; +Cc: dhowells, Eric Dumazet, linux-kernel, Paul E. McKenney
Jiri Olsa <jolsa@redhat.com> wrote:
> do I need to send this some one/place else?
> not sure it's going to be picked up..
Post it to linux-security-module@vger.kernel.org with the ACKs you've
collected added in.
David
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] cred - synchronize rcu before releasing cred
@ 2010-06-25 13:33 Jiri Olsa
2010-07-02 12:14 ` Jiri Olsa
0 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2010-06-25 13:33 UTC (permalink / raw)
To: linux-security-module
Cc: David Howells, Eric Dumazet, linux-kernel, Paul E. McKenney
BZ 591015 - kernel BUG at kernel/cred.c:168
https://bugzilla.redhat.com/show_bug.cgi?id=591015
Above bugzilla reported bug during the releasing of
old cred structure.
There is reproducer attached to the bugzilla.
The issue is caused by releasing old cred struct while other
kernel path might be still using it. This leads to cred->usage
inconsistency inside the __put_cred and triggering the bug.
Following kernel paths are affected:
The CPU1 path is setting the new groups creds.
The CPU2 path is cat /proc/PID/status
CPU 1 CPU 2
sys_setgroups proc_pid_status
set_current_groups task_state
commit_creds rcu_read_lock
put_cred ...
__put_cred get_cred
BUG_ON(usage != 0) ...
rcu_read_unlock
If __put_cred got executed during the CPU2 holding the reference
the BUG_ON inside __put_cred is trigered.
I think there's no need to get the cred refference as long as
the 'cred' handling stays inside the rcu_read_lock block.
And the condition of __task_cred 'make sure task doesn't go away',
is done by proc_single_show as this is the proc file.
wbr,
jirka
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
---
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 9b58d38..ac3b3a4 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -176,7 +176,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
if (tracer)
tpid = task_pid_nr_ns(tracer, ns);
}
- cred = get_cred((struct cred *) __task_cred(p));
+ cred = __task_cred(p);
seq_printf(m,
"State:\t%s\n"
"Tgid:\t%d\n"
@@ -199,15 +199,14 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
"FDSize:\t%d\n"
"Groups:\t",
fdt ? fdt->max_fds : 0);
- rcu_read_unlock();
group_info = cred->group_info;
task_unlock(p);
for (g = 0; g < min(group_info->ngroups, NGROUPS_SMALL); g++)
seq_printf(m, "%d ", GROUP_AT(group_info, g));
- put_cred(cred);
+ rcu_read_unlock();
seq_printf(m, "\n");
}
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-06-25 13:33 Jiri Olsa
@ 2010-07-02 12:14 ` Jiri Olsa
0 siblings, 0 replies; 25+ messages in thread
From: Jiri Olsa @ 2010-07-02 12:14 UTC (permalink / raw)
To: linux-security-module
Cc: David Howells, Eric Dumazet, linux-kernel, Paul E. McKenney
hi,
any feedback?
jirka
On Fri, Jun 25, 2010 at 09:33:33AM -0400, Jiri Olsa wrote:
> BZ 591015 - kernel BUG at kernel/cred.c:168
> https://bugzilla.redhat.com/show_bug.cgi?id=591015
>
> Above bugzilla reported bug during the releasing of
> old cred structure.
>
> There is reproducer attached to the bugzilla.
>
> The issue is caused by releasing old cred struct while other
> kernel path might be still using it. This leads to cred->usage
> inconsistency inside the __put_cred and triggering the bug.
>
> Following kernel paths are affected:
>
> The CPU1 path is setting the new groups creds.
> The CPU2 path is cat /proc/PID/status
>
>
> CPU 1 CPU 2
>
> sys_setgroups proc_pid_status
> set_current_groups task_state
> commit_creds rcu_read_lock
> put_cred ...
> __put_cred get_cred
> BUG_ON(usage != 0) ...
> rcu_read_unlock
>
>
>
> If __put_cred got executed during the CPU2 holding the reference
> the BUG_ON inside __put_cred is trigered.
>
> I think there's no need to get the cred refference as long as
> the 'cred' handling stays inside the rcu_read_lock block.
>
> And the condition of __task_cred 'make sure task doesn't go away',
> is done by proc_single_show as this is the proc file.
>
> wbr,
> jirka
>
>
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> Acked-by: David Howells <dhowells@redhat.com>
> ---
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index 9b58d38..ac3b3a4 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -176,7 +176,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
> if (tracer)
> tpid = task_pid_nr_ns(tracer, ns);
> }
> - cred = get_cred((struct cred *) __task_cred(p));
> + cred = __task_cred(p);
> seq_printf(m,
> "State:\t%s\n"
> "Tgid:\t%d\n"
> @@ -199,15 +199,14 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
> "FDSize:\t%d\n"
> "Groups:\t",
> fdt ? fdt->max_fds : 0);
> - rcu_read_unlock();
>
> group_info = cred->group_info;
> task_unlock(p);
>
> for (g = 0; g < min(group_info->ngroups, NGROUPS_SMALL); g++)
> seq_printf(m, "%d ", GROUP_AT(group_info, g));
> - put_cred(cred);
>
> + rcu_read_unlock();
> seq_printf(m, "\n");
> }
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] cred - synchronize rcu before releasing cred
@ 2010-07-27 15:50 Jiri Olsa
2010-07-27 16:16 ` Linus Torvalds
0 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2010-07-27 15:50 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, David Howells, Eric Dumazet, linux-kernel,
Paul E. McKenney
hi,
got no objections on linux-security-module and acked by David.
Noone pick it, so got advice to send this directly to you.
wbr,
jirka
---
BZ 591015 - kernel BUG at kernel/cred.c:168
https://bugzilla.redhat.com/show_bug.cgi?id=591015
Above bugzilla reported bug during the releasing of
old cred structure.
There is reproducer attached to the bugzilla.
The issue is caused by releasing old cred struct while other
kernel path might be still using it. This leads to cred->usage
inconsistency inside the __put_cred and triggering the bug.
Following kernel paths are affected:
The CPU1 path is setting the new groups creds.
The CPU2 path is cat /proc/PID/status
CPU 1 CPU 2
sys_setgroups proc_pid_status
set_current_groups task_state
commit_creds rcu_read_lock
put_cred ...
__put_cred get_cred
BUG_ON(usage != 0) ...
rcu_read_unlock
If __put_cred got executed during the CPU2 holding the reference
the BUG_ON inside __put_cred is trigered.
I think there's no need to get the cred refference as long as
the 'cred' handling stays inside the rcu_read_lock block.
And the condition of __task_cred 'make sure task doesn't go away',
is done by proc_single_show as this is the proc file.
wbr,
jirka
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
---
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 9b58d38..ac3b3a4 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -176,7 +176,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
if (tracer)
tpid = task_pid_nr_ns(tracer, ns);
}
- cred = get_cred((struct cred *) __task_cred(p));
+ cred = __task_cred(p);
seq_printf(m,
"State:\t%s\n"
"Tgid:\t%d\n"
@@ -199,15 +199,14 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
"FDSize:\t%d\n"
"Groups:\t",
fdt ? fdt->max_fds : 0);
- rcu_read_unlock();
group_info = cred->group_info;
task_unlock(p);
for (g = 0; g < min(group_info->ngroups, NGROUPS_SMALL); g++)
seq_printf(m, "%d ", GROUP_AT(group_info, g));
- put_cred(cred);
+ rcu_read_unlock();
seq_printf(m, "\n");
}
----- End forwarded message -----
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-07-27 15:50 [PATCH] cred - synchronize rcu before releasing cred Jiri Olsa
@ 2010-07-27 16:16 ` Linus Torvalds
2010-07-27 16:46 ` David Howells
0 siblings, 1 reply; 25+ messages in thread
From: Linus Torvalds @ 2010-07-27 16:16 UTC (permalink / raw)
To: Jiri Olsa
Cc: Andrew Morton, David Howells, Eric Dumazet, linux-kernel,
Paul E. McKenney
On Tue, Jul 27, 2010 at 8:50 AM, Jiri Olsa <jolsa@redhat.com> wrote:
>
> got no objections on linux-security-module and acked by David.
> Noone pick it, so got advice to send this directly to you.
The patch seems fundamentally buggy.
The whole patch seems to be based on "nobody can ever use
get_cred/put_cred, because concurrent use will then trigger the
BUG_ON() in __put_cred()".
But that's a bug in general, not in this particular usage that isn't
all that different from other uses. So rather than just remove the
code that uses the refcounting, we should either:
- FIX the damn ref-counting so that it works without bugging out
or
- remove the broken functions entirely.
In other words - why are we working around what looks like a bug,
rather than fixing the bug itself?
In particular, the code you remove seems to be basically _identical_
to get_task_cred(). So if the code you remove is buggy, then so is any
use of get_task_cred() - no?
So please explain why get_task_cred() is ok, but the particular use of
get_cred/put_cred that you removed is not.
Hmm? What am I missing?
Linus
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-07-27 16:16 ` Linus Torvalds
@ 2010-07-27 16:46 ` David Howells
2010-07-27 17:56 ` Linus Torvalds
0 siblings, 1 reply; 25+ messages in thread
From: David Howells @ 2010-07-27 16:46 UTC (permalink / raw)
To: Linus Torvalds
Cc: dhowells, Jiri Olsa, Andrew Morton, Eric Dumazet, linux-kernel,
Paul E. McKenney
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> The whole patch seems to be based on "nobody can ever use
> get_cred/put_cred, because concurrent use will then trigger the
> BUG_ON() in __put_cred()".
That's not the problem.
The problem is that task_state() accesses the target task's credentials whilst
only holding the RCU read lock. That means that the existence of the cred
struct so accessed can only be guaranteed up to the point that the RCU read
lock is released.
What we shouldn't do is increment the usage count on the credentials because
we're not holding a lock that will prevent the target task reducing the
refcount on those credentials to zero between us reading the cred pointer and
incrementing the count.
If we want to increment the usage count on the credentials, we need to prevent
the target task from modifying its own credentials whilst we do it. Currently,
we can't do that as, taking an example, sys_setuid() doesn't hold hold any sort
of lock. We would have to add a spinlock or something like that for
commit_creds() to take.
What we have to do instead is grab any values we want from the cred struct
before releasing the RCU read lock. The moment we drop the lock, the cred
struct may cease to exist, even if we did increment its count.
David
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-07-27 16:46 ` David Howells
@ 2010-07-27 17:56 ` Linus Torvalds
2010-07-28 8:25 ` Jiri Olsa
2010-07-28 12:07 ` David Howells
0 siblings, 2 replies; 25+ messages in thread
From: Linus Torvalds @ 2010-07-27 17:56 UTC (permalink / raw)
To: David Howells
Cc: Jiri Olsa, Andrew Morton, Eric Dumazet, linux-kernel,
Paul E. McKenney
On Tue, Jul 27, 2010 at 9:46 AM, David Howells <dhowells@redhat.com> wrote:
>
> That's not the problem.
>
> The problem is that task_state() accesses the target task's credentials whilst
> only holding the RCU read lock. That means that the existence of the cred
> struct so accessed can only be guaranteed up to the point that the RCU read
> lock is released.
Umm. In that case, get_task_cred() is actively misleading.
What you are saying is that you cannot do
rcu_read_lock()
__cred = (struct cred *) __task_cred((task));
get_cred(__cred);
rcu_read_unlock();
but that is _exactly_ what get_task_cred() does. And that
__task_cred() check checks that we have
rcu_read_lock_held() || lockdep_tasklist_lock_is_held()
and what you are describing would require us to have a '&&' rather
than a '||' in that test. Because it is _not_ sufficient to have just
the rcu_read_lock held.
So it looks like the validation is simply wrong. The __task_cred()
helper is buggy. It's used for two different cases, and they have
totally different locking requirements.
Case #1:
- you can do __task_cred() with just read-lock held, but then you
cannot add refs to it
Case #2:
- you can do __task_cred() with read-lock held _and_ guaranteeing
that the task doesn't go away, and then you can hold a ref to it as
long as you still guarantee the task is around.
And the comments are actively wrong. The comments talk about the "case
#2" thing only. Ignoring case #1, except for the fact that the _check_
allows case #1, so you never get a warning from the RCU "proving" code
even for incorrect code.
So presumably Jiri's patch is correct, but the reason the bug happened
in the first place is that all those accessor functions are totally
confused about how they supposed to be used, with incorrect comments
and incorrect access checks.
That should get fixed. Who knows how many other buggy users there are
due to the confusion?
Linus
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-07-27 17:56 ` Linus Torvalds
@ 2010-07-28 8:25 ` Jiri Olsa
2010-07-28 12:07 ` David Howells
1 sibling, 0 replies; 25+ messages in thread
From: Jiri Olsa @ 2010-07-28 8:25 UTC (permalink / raw)
To: Linus Torvalds
Cc: David Howells, Andrew Morton, Eric Dumazet, linux-kernel,
Paul E. McKenney
On Tue, Jul 27, 2010 at 10:56:20AM -0700, Linus Torvalds wrote:
> On Tue, Jul 27, 2010 at 9:46 AM, David Howells <dhowells@redhat.com> wrote:
> >
> > That's not the problem.
> >
> > The problem is that task_state() accesses the target task's credentials whilst
> > only holding the RCU read lock. That means that the existence of the cred
> > struct so accessed can only be guaranteed up to the point that the RCU read
> > lock is released.
>
> Umm. In that case, get_task_cred() is actively misleading.
>
> What you are saying is that you cannot do
>
> rcu_read_lock()
> __cred = (struct cred *) __task_cred((task));
> get_cred(__cred);
> rcu_read_unlock();
>
> but that is _exactly_ what get_task_cred() does. And that
right, get_task_cred looks like source for similar bugs.. will check
> __task_cred() check checks that we have
>
> rcu_read_lock_held() || lockdep_tasklist_lock_is_held()
>
> and what you are describing would require us to have a '&&' rather
> than a '||' in that test. Because it is _not_ sufficient to have just
> the rcu_read_lock held.
>
> So it looks like the validation is simply wrong. The __task_cred()
> helper is buggy. It's used for two different cases, and they have
> totally different locking requirements.
>
> Case #1:
> - you can do __task_cred() with just read-lock held, but then you
> cannot add refs to it
>
> Case #2:
> - you can do __task_cred() with read-lock held _and_ guaranteeing
> that the task doesn't go away, and then you can hold a ref to it as
> long as you still guarantee the task is around.
>
> And the comments are actively wrong. The comments talk about the "case
> #2" thing only. Ignoring case #1, except for the fact that the _check_
> allows case #1, so you never get a warning from the RCU "proving" code
> even for incorrect code.
>
> So presumably Jiri's patch is correct, but the reason the bug happened
> in the first place is that all those accessor functions are totally
> confused about how they supposed to be used, with incorrect comments
> and incorrect access checks.
>
> That should get fixed. Who knows how many other buggy users there are
> due to the confusion?
I'll see if I can find some other places
thanks,
jirka
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-07-27 17:56 ` Linus Torvalds
2010-07-28 8:25 ` Jiri Olsa
@ 2010-07-28 12:07 ` David Howells
2010-07-28 12:47 ` David Howells
2010-07-28 13:17 ` David Howells
1 sibling, 2 replies; 25+ messages in thread
From: David Howells @ 2010-07-28 12:07 UTC (permalink / raw)
To: Linus Torvalds, Jiri Olsa
Cc: dhowells, Andrew Morton, Eric Dumazet, linux-kernel,
Paul E. McKenney, linux-security-module
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> So it looks like the validation is simply wrong. The __task_cred()
> helper is buggy. It's used for two different cases, and they have
> totally different locking requirements.
I think part of my comment on __task_cred():
* The caller must make sure task doesn't go away, either by holding a
* ref on task or by holding tasklist_lock to prevent it from being
* unlinked.
may be obvious and perhaps unnecessary - anyone attempting to access a
task_struct should know that they need to prevent it from going away whilst
they do it - and I think this has led to Paul McKenny misunderstanding the
intent. What I was trying to convey is the destruction of the task_struct
involves discarding the credentials it points to.
Either I should insert the word 'also' into that comment after 'must' or just
get rid of it entirely.
I think I should remove lock_dep_tasklist_lock_is_held() from the stated
checks. It doesn't add anything, and someone calling __task_cred() on their
own process (perhaps indirectly) doesn't need the tasklist lock.
> Umm. In that case, get_task_cred() is actively misleading.
>
> What you are saying is that you cannot do
>
> rcu_read_lock()
> __cred = (struct cred *) __task_cred((task));
> get_cred(__cred);
> rcu_read_unlock();
Yeah. I think there are three alternatives:
(1) get_task_cred() could be done by doing { __task_cred(),
atomic_inc_not_zero() } in a loop until we manage to come up with the
goods. It probably wouldn't be all that inefficient as creds don't change
very often as a rule.
(2) Get a spinlock in commit_creds() around the point where we alter the cred
pointers.
(3) Try and get rid of get_task_cred() and use other means. I've gone through
all the users of get_task_cred() (see below) and this may be viable,
though restrictive, in places.
Any thoughts as to which is the best?
> So presumably Jiri's patch is correct, but the reason the bug happened
> in the first place is that all those accessor functions are totally
> confused about how they supposed to be used, with incorrect comments
> and incorrect access checks.
At some point in the past I think I discarded a lock from the code:-/
> That should get fixed. Who knows how many other buggy users there are
> due to the confusion?
Some.
warthog>grep get_task_cred `find . -name "*.[ch]"`
./kernel/auditsc.c: const struct cred *cred = get_task_cred(tsk);
This is audit_filter_rules() which is used by:
- audit_filter_task()
- audit_alloc() as called from copy_process() with the new process
- audit_filter_syscall()
- audit_get_context()
- audit_free() as called from copy_process() with the new, now dead process
- audit_free() as called from do_exit() with the dead process
- audit_syscall_exit() which passes current.
- audit_syscall_entry() which passes current.
- audit_filter_inodes()
- audit_update_watch() which passes current
- audit_get_context()
- see above.
which I think are all safe because when it's a new/dead process being accessed,
that process can't be modifying itself at that point, otherwise it's current
being accessed.
./kernel/cred.c: old = get_task_cred(daemon);
This is prepare_kernel_cred() which is used by:
- cachefiles_get_security_ID() which passes current.
so this is safe.
./include/linux/cred.h: * get_task_cred - Get another task's objective credentials
./include/linux/cred.h:#define get_task_cred(task) \
The header file stuff under discussion.
./security/security.c: cred = get_task_cred(tsk);
./security/security.c: cred = get_task_cred(tsk);
These are security_real_capable{,_noaudit}() if CONFIG_SECURITY=y, which could
be a problem since they're used by has_capability{,_noaudit}() and called by
things like the OOM killer with tasks other than current.
However, I'm not sure it's necessary for get_task_cred() to be used here (in
security/security.c) as it doesn't appear that the capable() LSM method sleeps
in the one LSM that implements it (SELinux) or in the commoncap code.
David
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-07-28 12:07 ` David Howells
@ 2010-07-28 12:47 ` David Howells
2010-07-29 6:00 ` Paul E. McKenney
2010-07-28 13:17 ` David Howells
1 sibling, 1 reply; 25+ messages in thread
From: David Howells @ 2010-07-28 12:47 UTC (permalink / raw)
To: Linus Torvalds, Jiri Olsa, Paul E. McKenney
Cc: dhowells, Andrew Morton, Eric Dumazet, linux-kernel,
linux-security-module
David Howells <dhowells@redhat.com> wrote:
> Yeah. I think there are three alternatives:
There's a fourth alternative too:
(4) I could try and make it so that if the RCU cleanup routine sees it with a
non-zero usage count, then it just ignores it. This, however, would
require call_rcu() to be able to cope with requeueing.
David
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-07-28 12:07 ` David Howells
2010-07-28 12:47 ` David Howells
@ 2010-07-28 13:17 ` David Howells
2010-07-28 14:46 ` Jiri Olsa
2010-07-28 15:51 ` Linus Torvalds
1 sibling, 2 replies; 25+ messages in thread
From: David Howells @ 2010-07-28 13:17 UTC (permalink / raw)
To: Linus Torvalds, Jiri Olsa
Cc: dhowells, Andrew Morton, Eric Dumazet, linux-kernel,
Paul E. McKenney, linux-security-module
The attached patch should suffice to fix get_task_cred(), and should render
Jiri's patch unnecessary.
David
---
From: David Howells <dhowells@redhat.com>
Subject: [PATCH] CRED: Move get_task_cred() out of line and make it use atomic_inc_not_zero()
It's possible for get_task_cred() as it currently stands to 'corrupt' a set of
credentials by incrementing their usage count after their replacement by the
task being accessed.
What happens is that get_task_cred() engages the RCU read lock, accesses the cred
TASK_1 TASK_2 RCU_CLEANER
-->get_task_cred(TASK_2)
rcu_read_lock()
__cred = __task_cred(TASK_2)
-->commit_creds()
old_cred = TASK_2->real_cred
TASK_2->real_cred = ...
put_cred(old_cred)
call_rcu(old_cred)
[__cred->usage == 0]
get_cred(__cred)
[__cred->usage == 1]
rcu_read_unlock()
-->put_cred_rcu()
[__cred->usage == 1]
panic()
However, since a tasks credentials are generally not changed very often, we can
reasonably make use of a loop involving reading the creds pointer and using
atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero.
If successful, we can safely return the credentials in the knowledge that, even
if the task we're accessing has released them, they haven't gone to the RCU
cleanup code.
Signed-off-by: David Howells <dhowells@redhat.com>
---
include/linux/cred.h | 21 +--------------------
kernel/cred.c | 25 +++++++++++++++++++++++++
2 files changed, 26 insertions(+), 20 deletions(-)
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 75c0fa8..ce40cbc 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -153,6 +153,7 @@ struct cred {
extern void __put_cred(struct cred *);
extern void exit_creds(struct task_struct *);
extern int copy_creds(struct task_struct *, unsigned long);
+extern const struct cred *get_task_cred(struct task_struct *);
extern struct cred *cred_alloc_blank(void);
extern struct cred *prepare_creds(void);
extern struct cred *prepare_exec_creds(void);
@@ -282,26 +283,6 @@ static inline void put_cred(const struct cred *_cred)
((const struct cred *)(rcu_dereference_check((task)->real_cred, rcu_read_lock_held() || lockdep_tasklist_lock_is_held())))
/**
- * get_task_cred - Get another task's objective credentials
- * @task: The task to query
- *
- * Get the objective credentials of a task, pinning them so that they can't go
- * away. Accessing a task's credentials directly is not permitted.
- *
- * The caller must make sure task doesn't go away, either by holding a ref on
- * task or by holding tasklist_lock to prevent it from being unlinked.
- */
-#define get_task_cred(task) \
-({ \
- struct cred *__cred; \
- rcu_read_lock(); \
- __cred = (struct cred *) __task_cred((task)); \
- get_cred(__cred); \
- rcu_read_unlock(); \
- __cred; \
-})
-
-/**
* get_current_cred - Get the current task's subjective credentials
*
* Get the subjective credentials of the current task, pinning them so that
diff --git a/kernel/cred.c b/kernel/cred.c
index a2d5504..60bc8b1 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -209,6 +209,31 @@ void exit_creds(struct task_struct *tsk)
}
}
+/**
+ * get_task_cred - Get another task's objective credentials
+ * @task: The task to query
+ *
+ * Get the objective credentials of a task, pinning them so that they can't go
+ * away. Accessing a task's credentials directly is not permitted.
+ *
+ * The caller must also make sure task doesn't get deleted, either by holding a
+ * ref on task or by holding tasklist_lock to prevent it from being unlinked.
+ */
+const struct cred *get_task_cred(struct task_struct *task)
+{
+ const struct cred *cred;
+
+ rcu_read_lock();
+
+ do {
+ cred = __task_cred((task));
+ BUG_ON(!cred);
+ } while (!atomic_inc_not_zero(&((struct cred *)cred)->usage));
+
+ rcu_read_unlock();
+ return cred;
+}
+
/*
* Allocate blank credentials, such that the credentials can be filled in at a
* later date without risk of ENOMEM.
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-07-28 13:17 ` David Howells
@ 2010-07-28 14:46 ` Jiri Olsa
2010-07-29 9:38 ` Jiri Olsa
2010-07-28 15:51 ` Linus Torvalds
1 sibling, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2010-07-28 14:46 UTC (permalink / raw)
To: David Howells
Cc: Linus Torvalds, Andrew Morton, Eric Dumazet, linux-kernel,
Paul E. McKenney, linux-security-module
On Wed, Jul 28, 2010 at 02:17:27PM +0100, David Howells wrote:
>
> The attached patch should suffice to fix get_task_cred(), and should render
> Jiri's patch unnecessary.
>
> David
> ---
> From: David Howells <dhowells@redhat.com>
> Subject: [PATCH] CRED: Move get_task_cred() out of line and make it use atomic_inc_not_zero()
>
> It's possible for get_task_cred() as it currently stands to 'corrupt' a set of
> credentials by incrementing their usage count after their replacement by the
> task being accessed.
>
> What happens is that get_task_cred() engages the RCU read lock, accesses the cred
>
>
> TASK_1 TASK_2 RCU_CLEANER
> -->get_task_cred(TASK_2)
> rcu_read_lock()
> __cred = __task_cred(TASK_2)
> -->commit_creds()
> old_cred = TASK_2->real_cred
> TASK_2->real_cred = ...
> put_cred(old_cred)
> call_rcu(old_cred)
> [__cred->usage == 0]
> get_cred(__cred)
> [__cred->usage == 1]
> rcu_read_unlock()
> -->put_cred_rcu()
> [__cred->usage == 1]
> panic()
>
> However, since a tasks credentials are generally not changed very often, we can
> reasonably make use of a loop involving reading the creds pointer and using
> atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero.
>
> If successful, we can safely return the credentials in the knowledge that, even
> if the task we're accessing has released them, they haven't gone to the RCU
> cleanup code.
looks ok, I changed the task_state to use this and I'm running the
bug reproducer... so far so good ;)
wbr,
jirka
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-07-28 13:17 ` David Howells
2010-07-28 14:46 ` Jiri Olsa
@ 2010-07-28 15:51 ` Linus Torvalds
1 sibling, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2010-07-28 15:51 UTC (permalink / raw)
To: David Howells
Cc: Jiri Olsa, Andrew Morton, Eric Dumazet, linux-kernel,
Paul E. McKenney, linux-security-module
On Wed, Jul 28, 2010 at 6:17 AM, David Howells <dhowells@redhat.com> wrote:
>
> The attached patch should suffice to fix get_task_cred(), and should render
> Jiri's patch unnecessary.
Ok, I like this one. It seems to make things much simpler, and makes
sense to me.
Let's just hope it works ;) but modulo that, ACK.
Linus
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-07-28 12:47 ` David Howells
@ 2010-07-29 6:00 ` Paul E. McKenney
2010-07-29 8:34 ` David Howells
0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2010-07-29 6:00 UTC (permalink / raw)
To: David Howells
Cc: Linus Torvalds, Jiri Olsa, Andrew Morton, Eric Dumazet,
linux-kernel, linux-security-module
On Wed, Jul 28, 2010 at 01:47:06PM +0100, David Howells wrote:
> David Howells <dhowells@redhat.com> wrote:
>
> > Yeah. I think there are three alternatives:
>
> There's a fourth alternative too:
>
> (4) I could try and make it so that if the RCU cleanup routine sees it with a
> non-zero usage count, then it just ignores it. This, however, would
> require call_rcu() to be able to cope with requeueing.
It is perfectly legal for an RCU callback to invoke call_rcu(). However,
this should be used -only- to wait for RCU readers. If there are no
RCU readers, the callback might be re-invoked in very short order,
expecially on UP systems.
Or am I misunderstanding what you mean by "require call_rcu() to be
able to cope iwth requeueing"?
Thanx, Paul
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-07-29 6:00 ` Paul E. McKenney
@ 2010-07-29 8:34 ` David Howells
2010-07-30 21:32 ` Paul E. McKenney
0 siblings, 1 reply; 25+ messages in thread
From: David Howells @ 2010-07-29 8:34 UTC (permalink / raw)
To: paulmck
Cc: dhowells, Linus Torvalds, Jiri Olsa, Andrew Morton, Eric Dumazet,
linux-kernel, linux-security-module
Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> It is perfectly legal for an RCU callback to invoke call_rcu(). However,
> this should be used -only- to wait for RCU readers. If there are no
> RCU readers, the callback might be re-invoked in very short order,
> expecially on UP systems.
>
> Or am I misunderstanding what you mean by "require call_rcu() to be
> able to cope iwth requeueing"?
I mean for call_rcu() to be called on an object that's already been
call_rcu()'d but not yet processed.
For example if struct cred gets its usage count reduced to 0, __put_cred()
will call_rcu() it, but what happens if someone comes along and resurrects it
by increasing its usage count again? And what happens if the usage count is
reduced back to zero and __put_cred() calls call_rcu() again before
put_cred_rcu() has a chance to run?
David
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-07-28 14:46 ` Jiri Olsa
@ 2010-07-29 9:38 ` Jiri Olsa
0 siblings, 0 replies; 25+ messages in thread
From: Jiri Olsa @ 2010-07-29 9:38 UTC (permalink / raw)
To: David Howells
Cc: Linus Torvalds, Andrew Morton, Eric Dumazet, linux-kernel,
Paul E. McKenney, linux-security-module
On Wed, Jul 28, 2010 at 04:46:27PM +0200, Jiri Olsa wrote:
> On Wed, Jul 28, 2010 at 02:17:27PM +0100, David Howells wrote:
> >
> > The attached patch should suffice to fix get_task_cred(), and should render
> > Jiri's patch unnecessary.
> >
> > David
> > ---
> > From: David Howells <dhowells@redhat.com>
> > Subject: [PATCH] CRED: Move get_task_cred() out of line and make it use atomic_inc_not_zero()
> >
> > It's possible for get_task_cred() as it currently stands to 'corrupt' a set of
> > credentials by incrementing their usage count after their replacement by the
> > task being accessed.
> >
> > What happens is that get_task_cred() engages the RCU read lock, accesses the cred
> >
> >
> > TASK_1 TASK_2 RCU_CLEANER
> > -->get_task_cred(TASK_2)
> > rcu_read_lock()
> > __cred = __task_cred(TASK_2)
> > -->commit_creds()
> > old_cred = TASK_2->real_cred
> > TASK_2->real_cred = ...
> > put_cred(old_cred)
> > call_rcu(old_cred)
> > [__cred->usage == 0]
> > get_cred(__cred)
> > [__cred->usage == 1]
> > rcu_read_unlock()
> > -->put_cred_rcu()
> > [__cred->usage == 1]
> > panic()
> >
> > However, since a tasks credentials are generally not changed very often, we can
> > reasonably make use of a loop involving reading the creds pointer and using
> > atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero.
> >
> > If successful, we can safely return the credentials in the knowledge that, even
> > if the task we're accessing has released them, they haven't gone to the RCU
> > cleanup code.
>
> looks ok, I changed the task_state to use this and I'm running the
> bug reproducer... so far so good ;)
the test did not hit the issue, so we are good probably
thanks,
jirka
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] cred - synchronize rcu before releasing cred
2010-07-29 8:34 ` David Howells
@ 2010-07-30 21:32 ` Paul E. McKenney
0 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2010-07-30 21:32 UTC (permalink / raw)
To: David Howells
Cc: Linus Torvalds, Jiri Olsa, Andrew Morton, Eric Dumazet,
linux-kernel, linux-security-module
On Thu, Jul 29, 2010 at 09:34:20AM +0100, David Howells wrote:
> Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
>
> > It is perfectly legal for an RCU callback to invoke call_rcu(). However,
> > this should be used -only- to wait for RCU readers. If there are no
> > RCU readers, the callback might be re-invoked in very short order,
> > expecially on UP systems.
> >
> > Or am I misunderstanding what you mean by "require call_rcu() to be
> > able to cope iwth requeueing"?
>
> I mean for call_rcu() to be called on an object that's already been
> call_rcu()'d but not yet processed.
That would indeed be very bad!!!
> For example if struct cred gets its usage count reduced to 0, __put_cred()
> will call_rcu() it, but what happens if someone comes along and resurrects it
> by increasing its usage count again? And what happens if the usage count is
> reduced back to zero and __put_cred() calls call_rcu() again before
> put_cred_rcu() has a chance to run?
Doing this would mess up RCU's internal data structures. Mathieu
Desnoyers's recent debug changes (DEBUG_OBJECTS_RCU_HEAD) would catch
this sort of error.
Thanx, Paul
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2010-07-30 21:33 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-07-27 15:50 [PATCH] cred - synchronize rcu before releasing cred Jiri Olsa
2010-07-27 16:16 ` Linus Torvalds
2010-07-27 16:46 ` David Howells
2010-07-27 17:56 ` Linus Torvalds
2010-07-28 8:25 ` Jiri Olsa
2010-07-28 12:07 ` David Howells
2010-07-28 12:47 ` David Howells
2010-07-29 6:00 ` Paul E. McKenney
2010-07-29 8:34 ` David Howells
2010-07-30 21:32 ` Paul E. McKenney
2010-07-28 13:17 ` David Howells
2010-07-28 14:46 ` Jiri Olsa
2010-07-29 9:38 ` Jiri Olsa
2010-07-28 15:51 ` Linus Torvalds
-- strict thread matches above, loose matches on Subject: below --
2010-06-25 13:33 Jiri Olsa
2010-07-02 12:14 ` Jiri Olsa
2010-06-16 12:24 Jiri Olsa
2010-06-16 12:45 ` Eric Dumazet
2010-06-16 12:57 ` Jiri Olsa
2010-06-16 13:10 ` Eric Dumazet
2010-06-16 16:08 ` Jiri Olsa
2010-06-17 23:50 ` David Howells
2010-06-19 12:01 ` Jiri Olsa
2010-06-25 12:55 ` Jiri Olsa
2010-06-25 13:28 ` David Howells
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox