public inbox for linux-kernel@vger.kernel.org
* PATCH? rcu_do_batch: fix a pure theoretical memory ordering race
@ 2006-12-02 21:25 Oleg Nesterov
  2006-12-03 17:34 ` Eric Dumazet
  2006-12-04 16:43 ` Paul E. McKenney
  0 siblings, 2 replies; 8+ messages in thread
From: Oleg Nesterov @ 2006-12-02 21:25 UTC (permalink / raw)
  To: Paul E. McKenney, Dipankar Sarma
  Cc: Andrew Morton, Alan Stern, Eric Dumazet, linux-kernel

On top of rcu-add-a-prefetch-in-rcu_do_batch.patch

rcu_do_batch:

	struct rcu_head *next, *list;

	while (list) {
		next = list->next;	<------ [1]
		list->func(list);
		list = next;
	}

We can't trust *list after the list->func() call; that is why we load list->next
beforehand. However, I suspect that in theory this is not enough. Suppose that

	- [1] is stalled

	- list->func() marks *list as unused in some way

	- another CPU re-uses this rcu_head and dirties it

	- [1] completes and gets a wrong result

This means we need a barrier in between. mb() looks more suitable, but I think
rmb() should suffice.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- 19-rc6/kernel/rcupdate.c~rdp	2006-12-02 20:46:03.000000000 +0300
+++ 19-rc6/kernel/rcupdate.c	2006-12-02 21:04:12.000000000 +0300
@@ -236,6 +236,8 @@ static void rcu_do_batch(struct rcu_data
 	list = rdp->donelist;
 	while (list) {
 		next = list->next;
+		/* complete the load above before we call ->func() */
+		smp_rmb();
 		prefetch(next);
 		list->func(list);
 		list = next;



* Re: PATCH? rcu_do_batch: fix a pure theoretical memory ordering race
  2006-12-02 21:25 PATCH? rcu_do_batch: fix a pure theoretical memory ordering race Oleg Nesterov
@ 2006-12-03 17:34 ` Eric Dumazet
  2006-12-03 20:01   ` Oleg Nesterov
  2006-12-04 16:43 ` Paul E. McKenney
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2006-12-03 17:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Paul E. McKenney, Dipankar Sarma, Andrew Morton, Alan Stern,
	linux-kernel

Oleg Nesterov wrote:
> On top of rcu-add-a-prefetch-in-rcu_do_batch.patch
> 
> rcu_do_batch:
> 
> 	struct rcu_head *next, *list;
> 
> 	while (list) {
> 		next = list->next;	<------ [1]
> 		list->func(list);
> 		list = next;
> 	}
> 
> We can't trust *list after list->func() call, that is why we load list->next
> beforehand. However I suspect in theory this is not enough, suppose that
> 
> 	- [1] is stalled
> 
> 	- list->func() marks *list as unused in some way
> 
> 	- another CPU re-uses this rcu_head and dirties it
> 
> 	- [1] completes and gets a wrong result
> 
> This means we need a barrier in between. mb() looks more suitable, but I think
> rmb() should suffice.
> 

Well, hopefully the "list->func()" MUST do the right thing [*], so your patch 
is not necessary.

For example, most structures are freed with kfree()/kmem_cache_free(), and
these functions MUST imply an smp_mb() [if/when exchanging data with other
CPUs], or else many uses in the kernel would need to be corrected as well.


[*] : In particular, the slab management code does something special when
transferring local objects from local CPU A's store to 'other CPUs B'.
Other mechanisms should also use some kind of memory barrier in order to
transfer an object to another CPU, or you could imagine in-flight stores
from CPU A overwriting an object that was 'given' to CPU B.



* Re: PATCH? rcu_do_batch: fix a pure theoretical memory ordering race
  2006-12-03 17:34 ` Eric Dumazet
@ 2006-12-03 20:01   ` Oleg Nesterov
  2006-12-03 20:34     ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2006-12-03 20:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paul E. McKenney, Dipankar Sarma, Andrew Morton, Alan Stern,
	linux-kernel

On 12/03, Eric Dumazet wrote:
>
> Oleg Nesterov wrote:
> >On top of rcu-add-a-prefetch-in-rcu_do_batch.patch
> >
> >rcu_do_batch:
> >
> >	struct rcu_head *next, *list;
> >
> >	while (list) {
> >		next = list->next;	<------ [1]
> >		list->func(list);
> >		list = next;
> >	}
> >
> >We can't trust *list after list->func() call, that is why we load 
> >list->next
> >beforehand. However I suspect in theory this is not enough, suppose that
> >
> >	- [1] is stalled
> >
> >	- list->func() marks *list as unused in some way
> >
> >	- another CPU re-uses this rcu_head and dirties it
> >
> >	- [1] completes and gets a wrong result
> >
> >This means we need a barrier in between. mb() looks more suitable, but I 
> >think
> >rmb() should suffice.
> >
> 
> Well, hopefully the "list->func()" MUST do the right thing [*], so your 
> patch is not necessary.

Yes, I don't claim it is necessary, note the "pure theoretical".

> For example, most structures are freed with kfree()/kmem_cache_free() and 
> these functions MUST imply an smp_mb() [if/when exchanging data with other 
> cpus], or else many uses in the kernel should be corrected as well.

Yes, mb() is enough (wmb() isn't) and kfree()/kmem_cache_free() are ok.
And I don't know of any example of "unsafe" code in that sense.

However, I believe it is easy to write code which is correct from the
RCU API's pov, but unsafe.

Oleg.



* Re: PATCH? rcu_do_batch: fix a pure theoretical memory ordering race
  2006-12-03 20:01   ` Oleg Nesterov
@ 2006-12-03 20:34     ` Eric Dumazet
  2006-12-03 22:12       ` Oleg Nesterov
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2006-12-03 20:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Paul E. McKenney, Dipankar Sarma, Andrew Morton, Alan Stern,
	linux-kernel

Oleg Nesterov wrote:
> On 12/03, Eric Dumazet wrote:
>> Oleg Nesterov wrote:
>>> On top of rcu-add-a-prefetch-in-rcu_do_batch.patch
>>>
>>> rcu_do_batch:
>>>
>>> 	struct rcu_head *next, *list;
>>>
>>> 	while (list) {
>>> 		next = list->next;	<------ [1]
>>> 		list->func(list);
>>> 		list = next;
>>> 	}
>>>
>>> We can't trust *list after list->func() call, that is why we load 
>>> list->next
>>> beforehand. However I suspect in theory this is not enough, suppose that
>>>
>>> 	- [1] is stalled
>>>
>>> 	- list->func() marks *list as unused in some way
>>>
>>> 	- another CPU re-uses this rcu_head and dirties it
>>>
>>> 	- [1] completes and gets a wrong result
>>>
>>> This means we need a barrier in between. mb() looks more suitable, but I 
>>> think
>>> rmb() should suffice.
>>>
>> Well, hopefully the "list->func()" MUST do the right thing [*], so your 
>> patch is not necessary.
> 
> Yes, I don't claim it is necessary, note the "pure theoretical".
> 
>> For example, most structures are freed with kfree()/kmem_cache_free() and 
>> these functions MUST imply an smp_mb() [if/when exchanging data with other 
>> cpus], or else many uses in the kernel should be corrected as well.
> 
> Yes, mb() is enough (wmb() isn't) and kfree()/kmem_cache_free() are ok.
> And I don't know any example of "unsafe" code in that sense.
> 
> However I believe it is easy to make the code which is correct from the
> RCU's API pov, but unsafe.

Yes, but how is it related to RCU?
I mean, rcu_do_batch() is just a loop like others in the kernel.
The loop itself is not buggy, but it can call a buggy function, you are right.
An smp_rmb() won't avoid all possible bugs...



* Re: PATCH? rcu_do_batch: fix a pure theoretical memory ordering race
  2006-12-03 20:34     ` Eric Dumazet
@ 2006-12-03 22:12       ` Oleg Nesterov
  2006-12-03 23:08         ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2006-12-03 22:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paul E. McKenney, Dipankar Sarma, Andrew Morton, Alan Stern,
	linux-kernel

On 12/03, Eric Dumazet wrote:
>
> Oleg Nesterov wrote:
> 
> Yes, but how is it related to RCU ?
> I mean, rcu_do_batch() is just a loop like others in kernel.
> The loop itself is not buggy, but can call a buggy function, you are right.

	int start_me_again;

	struct rcu_head rcu_head;

	void rcu_func(struct rcu_head *rcu)
	{
		start_me_again = 1;
	}

	// could be called on arbitrary CPU
	void check_start_me_again(void)
	{
		static DEFINE_SPINLOCK(lock);

		spin_lock(&lock);
		if (start_me_again) {
			start_me_again = 0;
			call_rcu(&rcu_head, rcu_func);
		}
		spin_unlock(&lock);
	}

I'd say this code is not buggy.

In case it was not clear: I do not claim we need this patch (I don't know).
And yes, I very much doubt we can hit this problem in practice (even if I am
right).

What I don't agree with is that it is the callback which should take care of
this problem.

> A smp_rmb() wont avoid all possible bugs...

For example?

Oleg.



* Re: PATCH? rcu_do_batch: fix a pure theoretical memory ordering race
  2006-12-03 22:12       ` Oleg Nesterov
@ 2006-12-03 23:08         ` Eric Dumazet
  2006-12-03 23:46           ` Oleg Nesterov
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2006-12-03 23:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Paul E. McKenney, Dipankar Sarma, Andrew Morton, Alan Stern,
	linux-kernel

Oleg Nesterov wrote:
> On 12/03, Eric Dumazet wrote:
>> Oleg Nesterov wrote:
>>
>> Yes, but how is it related to RCU ?
>> I mean, rcu_do_batch() is just a loop like others in kernel.
>> The loop itself is not buggy, but can call a buggy function, you are right.
> 
> 	int start_me_again;
> 
> 	struct rcu_head rcu_head;
> 
> 	void rcu_func(struct rcu_head *rcu)
> 	{
> 		start_me_again = 1;
> 	}
> 
> 	// could be called on arbitrary CPU
> 	void check_start_me_again(void)
> 	{
> 		static DEFINE_SPINLOCK(lock);
> 
> 		spin_lock(&lock);
> 		if (start_me_again) {
> 			start_me_again = 0;
> 			call_rcu(&rcu_head, rcu_func);
> 		}
> 		spin_unlock(&lock);
> 	}
> 
> I'd say this code is not buggy.

Are you sure? Can you prove it? :)

I do think your rcu_func() misses some sync primitive *after* start_me_again = 1;
You seem to rely on some undocumented side effect.
Adding smp_rmb() before calling rcu_func() won't help.

> 
> In case it was not clear. I do not claim we need this patch (I don't know).
> And yes, I very much doubt we can hit this problem in practice (even if I am
> right).
> 
> What I don't agree with is that it is callback which should take care of this
> problem.
> 
>> A smp_rmb() wont avoid all possible bugs...
> 
> For example?

An smp_rmb() won't prevent stores pending on this CPU from being committed to
memory after another CPU takes the object for itself. Those stores could
overwrite stores done by the other CPU as well.

So in theory you could design a buggy callback function even with your patch
applied.

Any function that can transfer an object from CPU A's scope to CPU B's scope
must take care of the memory barrier by itself. The caller *cannot* possibly
do the job, especially if it uses an indirect call. However, in some cases it
is possible some clever algorithms do the reverse, i.e. issue the memory
barrier in the callers.

The kernel is full of such constructs:

for (ptr = head; ptr != NULL; ptr = next) {
	next = ptr->next;
	some_subsys_delete(ptr);
}

And we don't need to add smp_rmb() before the call to some_subsys_delete();
it would be a nightmare, and would slow down modern CPUs.

When the callback function doesn't need a memory barrier, why should we force
a generic one in the caller?

AFAIK most kfree() calls don't need a barrier, since the slab just queues the
pointer into the CPU's array_cache if there is an available slot.




* Re: PATCH? rcu_do_batch: fix a pure theoretical memory ordering race
  2006-12-03 23:08         ` Eric Dumazet
@ 2006-12-03 23:46           ` Oleg Nesterov
  0 siblings, 0 replies; 8+ messages in thread
From: Oleg Nesterov @ 2006-12-03 23:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paul E. McKenney, Dipankar Sarma, Andrew Morton, Alan Stern,
	linux-kernel

On 12/04, Eric Dumazet wrote:
>
> Oleg Nesterov wrote:
> >
> >	int start_me_again;
> >
> >	struct rcu_head rcu_head;
> >
> >	void rcu_func(struct rcu_head *rcu)
> >	{
> >		start_me_again = 1;
> >	}
> >
> >	// could be called on arbitrary CPU
> >	void check_start_me_again(void)
> >	{
> >		static DEFINE_SPINLOCK(lock);
> >
> >		spin_lock(&lock);
> >		if (start_me_again) {
> >			start_me_again = 0;
> >			call_rcu(&rcu_head, rcu_func);
> >		}
> >		spin_unlock(&lock);
> >	}
> >
> >I'd say this code is not buggy.
> 
> Are you sure ? Can you prove it ? :)

Looks like you think differently :)

> I do think your rcu_func() misses some sync primitive, *after* 
> start_me_again=1;
> You seem to rely on some undocumented side effect.
> Adding smp_rmb() before calling rcu_func() wont help.

I guess you mean that check_start_me_again() can miss start_me_again != 0?
Yes, of course, it should check the condition from time to time. We can even
do
	start_me_again = 1;
	wake_up(&start_me_again_wq);

, this is still unsafe.

> >>A smp_rmb() wont avoid all possible bugs...
> >
> >For example?
> 
> A smp_rmb() wont avoid stores pending on this CPU to be committed to memory 
> after another cpu takes the object for itself. Those stores could overwrite 
> stores done by the other cpu as well.

Yes. But the RCU core doesn't write to the rcu_head (except in call_rcu). The
callback _owns_ the rcu_head; it should be ok to use it in any way without
fear of breaking RCU. Of course, the callback should take care of its own
locking/ordering.

> So in theory you could design a buggy callback function even after your 
> patch applied.

So. Do you claim that rcu_func() above is buggy?

> Any function that can transfer an object from CPU A's scope to CPU B's scope
> must take care of the memory barrier by itself. The caller *cannot* possibly
> do the job, especially if it uses an indirect call. However, in some cases it
> is possible some clever algorithms do the reverse, i.e. issue the memory
> barrier in the callers.
> 
> The kernel is full of such constructs:
> 
> for (ptr = head; ptr != NULL ; ptr = next) {
> 	next = ptr->next;
> 	some_subsys_delete(ptr);
> }
> 
> And we don't need to add smp_rmb() before the call to some_subsys_delete();
> it would be a nightmare, and would slow down modern CPUs.

Agreed.

Oleg.



* Re: PATCH? rcu_do_batch: fix a pure theoretical memory ordering race
  2006-12-02 21:25 PATCH? rcu_do_batch: fix a pure theoretical memory ordering race Oleg Nesterov
  2006-12-03 17:34 ` Eric Dumazet
@ 2006-12-04 16:43 ` Paul E. McKenney
  1 sibling, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2006-12-04 16:43 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dipankar Sarma, Andrew Morton, Alan Stern, Eric Dumazet,
	linux-kernel

On Sun, Dec 03, 2006 at 12:25:17AM +0300, Oleg Nesterov wrote:
> On top of rcu-add-a-prefetch-in-rcu_do_batch.patch
> 
> rcu_do_batch:
> 
> 	struct rcu_head *next, *list;
> 
> 	while (list) {
> 		next = list->next;	<------ [1]
> 		list->func(list);
> 		list = next;
> 	}
> 
> We can't trust *list after list->func() call, that is why we load list->next
> beforehand. However I suspect in theory this is not enough, suppose that
> 
> 	- [1] is stalled
> 
> 	- list->func() marks *list as unused in some way
> 
> 	- another CPU re-uses this rcu_head and dirties it

The memory allocators are required to do whatever is needed to
ensure that the first CPU is no longer accessing the memory before
allocating it to the second CPU.

So we should not need to do this -- and if we did, we would have
serious problems throughout the kernel.

							Thanx, Paul

> 	- [1] completes and gets a wrong result
> 
> This means we need a barrier in between. mb() looks more suitable, but I think
> rmb() should suffice.
> 
> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
> 
> --- 19-rc6/kernel/rcupdate.c~rdp	2006-12-02 20:46:03.000000000 +0300
> +++ 19-rc6/kernel/rcupdate.c	2006-12-02 21:04:12.000000000 +0300
> @@ -236,6 +236,8 @@ static void rcu_do_batch(struct rcu_data
>  	list = rdp->donelist;
>  	while (list) {
>  		next = list->next;
> +		/* complete the load above before we call ->func() */
> +		smp_rmb();
>  		prefetch(next);
>  		list->func(list);
>  		list = next;
> 

