* 2.6.23-rc6-mm1: IPC: sleeping function called ...
@ 2007-09-18 9:17 Alexey Dobriyan
2007-09-18 9:42 ` Andrew Morton
` (2 more replies)
0 siblings, 3 replies; 35+ messages in thread
From: Alexey Dobriyan @ 2007-09-18 9:17 UTC (permalink / raw)
To: akpm; +Cc: nadia.derbey, linux-kernel
I'm getting tons of this, and X fails to start
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
CONFIG_DEBUG_PREEMPT=y
BUG: sleeping function called from invalid context at kernel/rwsem.c:47
in_atomic():1, irqs_disabled():0
no locks held by X/5879.
[<c012bcb1>] down_write+0x15/0x50
[<c01b53af>] do_shmat+0x235/0x3a0
[<c0106be2>] sys_ipc+0x146/0x263
[<c0102892>] sysenter_past_esp+0xa7/0xb5
[<c0102856>] sysenter_past_esp+0x6b/0xb5
=======================
BUG: scheduling while atomic: X/5879/0x00000005
no locks held by X/5879.
irq event stamp: 401318
hardirqs last enabled at (401317): [<c02d96c8>] __mutex_unlock_slowpath+0xb4/0x170
hardirqs last disabled at (401318): [<c010286f>] sysenter_past_esp+0x84/0xb5
softirqs last enabled at (397628): [<c01051ce>] do_softirq+0xa6/0xf9
softirqs last disabled at (397585): [<c01051ce>] do_softirq+0xa6/0xf9
[<c02d86fe>] __sched_text_start+0x2de/0x384
[<c0106bf7>] sys_ipc+0x15b/0x263
[<c01029ae>] work_resched+0x5/0x31
=======================
note: X[5879] exited with preempt_count 7
BUG: scheduling while atomic: X/5879/0x10000008
no locks held by X/5879.
[<c02d86fe>] __sched_text_start+0x2de/0x384
[<c0114ad5>] __cond_resched+0x21/0x3b
[<c02d8c21>] cond_resched+0x2a/0x31
[<c014c877>] unmap_vmas+0x518/0x560
[<c014f5d6>] exit_mmap+0x6f/0x104
[<c0115d41>] mmput+0x3b/0xbe
[<c011ae16>] do_exit+0x142/0x870
[<c01114cb>] do_page_fault+0x0/0x675
[<c011b569>] do_group_exit+0x25/0x6b
[<c01114cb>] do_page_fault+0x0/0x675
[<c01221c5>] get_signal_to_deliver+0x286/0x439
[<c0120e00>] force_sig_info+0x80/0x8a
[<c01114cb>] do_page_fault+0x0/0x675
[<c0101e0e>] do_notify_resume+0x79/0x6bc
[<c015025a>] vma_link+0x97/0xd5
[<c015025a>] vma_link+0x97/0xd5
[<c0111754>] do_page_fault+0x289/0x675
[<c0102892>] sysenter_past_esp+0xa7/0xb5
[<c01114cb>] do_page_fault+0x0/0x675
[<c01029ed>] work_notifysig+0x13/0x1a
...
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-18 9:17 2.6.23-rc6-mm1: IPC: sleeping function called Alexey Dobriyan
@ 2007-09-18 9:42 ` Andrew Morton
2007-09-18 10:17 ` Andrew Morton
2007-09-18 10:27 ` 2.6.23-rc6-mm1: IPC: sleeping function called Andrew Morton
2 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2007-09-18 9:42 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: nadia.derbey, linux-kernel
On Tue, 18 Sep 2007 13:17:28 +0400 Alexey Dobriyan <adobriyan@sw.ru> wrote:
> I'm getting tons of this, and X fails to start
>
> CONFIG_SYSVIPC=y
> CONFIG_SYSVIPC_SYSCTL=y
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> CONFIG_PREEMPT_BKL=y
> CONFIG_DEBUG_PREEMPT=y
>
> BUG: sleeping function called from invalid context at kernel/rwsem.c:47
> in_atomic():1, irqs_disabled():0
> no locks held by X/5879.
> [<c012bcb1>] down_write+0x15/0x50
> [<c01b53af>] do_shmat+0x235/0x3a0
> [<c0106be2>] sys_ipc+0x146/0x263
> [<c0102892>] sysenter_past_esp+0xa7/0xb5
> [<c0102856>] sysenter_past_esp+0x6b/0xb5
Someone got their locking imbalanced. Seems that I lost the suitable config
settings to catch that.
Hang about while I do bisection search #800.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-18 9:17 2.6.23-rc6-mm1: IPC: sleeping function called Alexey Dobriyan
2007-09-18 9:42 ` Andrew Morton
@ 2007-09-18 10:17 ` Andrew Morton
2007-09-18 10:30 ` Nadia Derbey
2007-09-18 10:27 ` 2.6.23-rc6-mm1: IPC: sleeping function called Andrew Morton
2 siblings, 1 reply; 35+ messages in thread
From: Andrew Morton @ 2007-09-18 10:17 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: nadia.derbey, linux-kernel
On Tue, 18 Sep 2007 13:17:28 +0400 Alexey Dobriyan <adobriyan@sw.ru> wrote:
> I'm getting tons of this, and X fails to start
>
> CONFIG_SYSVIPC=y
> CONFIG_SYSVIPC_SYSCTL=y
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> CONFIG_PREEMPT_BKL=y
> CONFIG_DEBUG_PREEMPT=y
>
> BUG: sleeping function called from invalid context at kernel/rwsem.c:47
> in_atomic():1, irqs_disabled():0
> no locks held by X/5879.
> [<c012bcb1>] down_write+0x15/0x50
> [<c01b53af>] do_shmat+0x235/0x3a0
> [<c0106be2>] sys_ipc+0x146/0x263
> [<c0102892>] sysenter_past_esp+0xa7/0xb5
> [<c0102856>] sysenter_past_esp+0x6b/0xb5
> =======================
Here's a bug:
--- a/ipc/util.c~ipc-integrate-ipc_checkid-into-ipc_lock-fix-2
+++ a/ipc/util.c
@@ -691,7 +691,7 @@ struct kern_ipc_perm *ipc_lock(struct ip
rcu_read_unlock();
return ERR_PTR(-EINVAL);
}
-
+ rcu_read_unlock();
return out;
}
_
but it still doesn't fix it.
Nadia, please review Documentation/SubmitChecklist. It contains stuff
which would have prevented this.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-18 9:17 2.6.23-rc6-mm1: IPC: sleeping function called Alexey Dobriyan
2007-09-18 9:42 ` Andrew Morton
2007-09-18 10:17 ` Andrew Morton
@ 2007-09-18 10:27 ` Andrew Morton
2007-09-18 10:32 ` Alexey Dobriyan
2007-09-18 14:55 ` Nadia Derbey
2 siblings, 2 replies; 35+ messages in thread
From: Andrew Morton @ 2007-09-18 10:27 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: nadia.derbey, linux-kernel
On Tue, 18 Sep 2007 13:17:28 +0400 Alexey Dobriyan <adobriyan@sw.ru> wrote:
> I'm getting tons of this, and X fails to start
>
> CONFIG_SYSVIPC=y
> CONFIG_SYSVIPC_SYSCTL=y
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> CONFIG_PREEMPT_BKL=y
> CONFIG_DEBUG_PREEMPT=y
>
> BUG: sleeping function called from invalid context at kernel/rwsem.c:47
> in_atomic():1, irqs_disabled():0
OK, this fixes the locking here:
--- a/ipc/util.c~ipc-integrate-ipc_checkid-into-ipc_lock-fix-2
+++ a/ipc/util.c
@@ -295,7 +295,6 @@ int ipc_addid(struct ipc_ids* ids, struc
spin_lock_init(&new->lock);
new->deleted = 0;
- rcu_read_lock();
spin_lock(&new->lock);
return id;
}
@@ -691,7 +690,7 @@ struct kern_ipc_perm *ipc_lock(struct ip
rcu_read_unlock();
return ERR_PTR(-EINVAL);
}
-
+ rcu_read_unlock();
return out;
}
_
But I'm not very confident in what's happening in there. The patches take
away a _lot_ of the RCU locking, and it's unlear what's protecting the idr
tree. Is it rcu, or is it the mutex? The code seems confused and the
comments are all wrong.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-18 10:17 ` Andrew Morton
@ 2007-09-18 10:30 ` Nadia Derbey
2007-09-18 10:34 ` Andrew Morton
0 siblings, 1 reply; 35+ messages in thread
From: Nadia Derbey @ 2007-09-18 10:30 UTC (permalink / raw)
To: Andrew Morton; +Cc: Alexey Dobriyan, linux-kernel
Andrew Morton wrote:
> On Tue, 18 Sep 2007 13:17:28 +0400 Alexey Dobriyan <adobriyan@sw.ru> wrote:
>
>
>>I'm getting tons of this, and X fails to start
>>
>>CONFIG_SYSVIPC=y
>>CONFIG_SYSVIPC_SYSCTL=y
>># CONFIG_PREEMPT_NONE is not set
>># CONFIG_PREEMPT_VOLUNTARY is not set
>>CONFIG_PREEMPT=y
>>CONFIG_PREEMPT_BKL=y
>>CONFIG_DEBUG_PREEMPT=y
>>
>>BUG: sleeping function called from invalid context at kernel/rwsem.c:47
>>in_atomic():1, irqs_disabled():0
>>no locks held by X/5879.
>> [<c012bcb1>] down_write+0x15/0x50
>> [<c01b53af>] do_shmat+0x235/0x3a0
>> [<c0106be2>] sys_ipc+0x146/0x263
>> [<c0102892>] sysenter_past_esp+0xa7/0xb5
>> [<c0102856>] sysenter_past_esp+0x6b/0xb5
>> =======================
>
>
> Here's a bug:
>
> --- a/ipc/util.c~ipc-integrate-ipc_checkid-into-ipc_lock-fix-2
> +++ a/ipc/util.c
> @@ -691,7 +691,7 @@ struct kern_ipc_perm *ipc_lock(struct ip
> rcu_read_unlock();
> return ERR_PTR(-EINVAL);
> }
> -
> + rcu_read_unlock();
> return out;
> }
>
Andrew,
Actually the rcu_read_lock is released in ipc_unlock(). So I think we
shouldn't add an rcu_read_unlock() before leaving ipc_lock().
This is a part that has not changed since the ref code.
I'm looking also on my side.
Regards,
Nadia
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-18 10:27 ` 2.6.23-rc6-mm1: IPC: sleeping function called Andrew Morton
@ 2007-09-18 10:32 ` Alexey Dobriyan
2007-09-18 14:55 ` Nadia Derbey
1 sibling, 0 replies; 35+ messages in thread
From: Alexey Dobriyan @ 2007-09-18 10:32 UTC (permalink / raw)
To: Andrew Morton; +Cc: nadia.derbey, linux-kernel
On Tue, Sep 18, 2007 at 03:27:27AM -0700, Andrew Morton wrote:
> On Tue, 18 Sep 2007 13:17:28 +0400 Alexey Dobriyan <adobriyan@sw.ru> wrote:
>
> > I'm getting tons of this, and X fails to start
> >
> > CONFIG_SYSVIPC=y
> > CONFIG_SYSVIPC_SYSCTL=y
> > # CONFIG_PREEMPT_NONE is not set
> > # CONFIG_PREEMPT_VOLUNTARY is not set
> > CONFIG_PREEMPT=y
> > CONFIG_PREEMPT_BKL=y
> > CONFIG_DEBUG_PREEMPT=y
> >
> > BUG: sleeping function called from invalid context at kernel/rwsem.c:47
> > in_atomic():1, irqs_disabled():0
>
> OK, this fixes the locking here:
Ditto. Thanks!
[puts mental note to read IPC patches]
> --- a/ipc/util.c~ipc-integrate-ipc_checkid-into-ipc_lock-fix-2
> +++ a/ipc/util.c
> @@ -295,7 +295,6 @@ int ipc_addid(struct ipc_ids* ids, struc
>
> spin_lock_init(&new->lock);
> new->deleted = 0;
> - rcu_read_lock();
> spin_lock(&new->lock);
> return id;
> }
> @@ -691,7 +690,7 @@ struct kern_ipc_perm *ipc_lock(struct ip
> rcu_read_unlock();
> return ERR_PTR(-EINVAL);
> }
> -
> + rcu_read_unlock();
> return out;
> }
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-18 10:30 ` Nadia Derbey
@ 2007-09-18 10:34 ` Andrew Morton
[not found] ` <20070918142451.418b3b51@twins>
0 siblings, 1 reply; 35+ messages in thread
From: Andrew Morton @ 2007-09-18 10:34 UTC (permalink / raw)
To: Nadia Derbey; +Cc: Alexey Dobriyan, linux-kernel
On Tue, 18 Sep 2007 12:30:52 +0200 Nadia Derbey <Nadia.Derbey@bull.net> wrote:
> Andrew Morton wrote:
> > On Tue, 18 Sep 2007 13:17:28 +0400 Alexey Dobriyan <adobriyan@sw.ru> wrote:
> >
> >
> >>I'm getting tons of this, and X fails to start
> >>
> >>CONFIG_SYSVIPC=y
> >>CONFIG_SYSVIPC_SYSCTL=y
> >># CONFIG_PREEMPT_NONE is not set
> >># CONFIG_PREEMPT_VOLUNTARY is not set
> >>CONFIG_PREEMPT=y
> >>CONFIG_PREEMPT_BKL=y
> >>CONFIG_DEBUG_PREEMPT=y
> >>
> >>BUG: sleeping function called from invalid context at kernel/rwsem.c:47
> >>in_atomic():1, irqs_disabled():0
> >>no locks held by X/5879.
> >> [<c012bcb1>] down_write+0x15/0x50
> >> [<c01b53af>] do_shmat+0x235/0x3a0
> >> [<c0106be2>] sys_ipc+0x146/0x263
> >> [<c0102892>] sysenter_past_esp+0xa7/0xb5
> >> [<c0102856>] sysenter_past_esp+0x6b/0xb5
> >> =======================
> >
> >
> > Here's a bug:
> >
> > --- a/ipc/util.c~ipc-integrate-ipc_checkid-into-ipc_lock-fix-2
> > +++ a/ipc/util.c
> > @@ -691,7 +691,7 @@ struct kern_ipc_perm *ipc_lock(struct ip
> > rcu_read_unlock();
> > return ERR_PTR(-EINVAL);
> > }
> > -
> > + rcu_read_unlock();
> > return out;
> > }
> >
>
> Andrew,
>
> Actually the rcu_read_lock is released in ipc_unlock().
I agree it should be, but it isn't.
> So I think we
> shouldn't add an rcu_read_unlock() before leaving ipc_lock().
> This is a part that has not changed since the ref code.
>
Well, it was an optimisation. spin_lock() implies rcu_read_lock(). That's
a bit dirty and we might choose to not do that.
Would be interested in knowing the locking rules in there. For example,
this:
/**
* ipc_findkey - find a key in an ipc identifier set
* @ids: Identifier set
* @key: The key to find
*
* Requires ipc_ids.mutex locked.
* Returns the LOCKED pointer to the ipc structure if found or NULL
* if not.
* If key is found ipc contains its ipc structure
*/
appears to be hopelessly out of date?
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-18 10:27 ` 2.6.23-rc6-mm1: IPC: sleeping function called Andrew Morton
2007-09-18 10:32 ` Alexey Dobriyan
@ 2007-09-18 14:55 ` Nadia Derbey
2007-09-18 17:01 ` Andrew Morton
2007-09-19 14:07 ` Jarek Poplawski
1 sibling, 2 replies; 35+ messages in thread
From: Nadia Derbey @ 2007-09-18 14:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: Alexey Dobriyan, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1466 bytes --]
Andrew Morton wrote:
> On Tue, 18 Sep 2007 13:17:28 +0400 Alexey Dobriyan <adobriyan@sw.ru> wrote:
>
>
>>I'm getting tons of this, and X fails to start
>>
>>CONFIG_SYSVIPC=y
>>CONFIG_SYSVIPC_SYSCTL=y
>># CONFIG_PREEMPT_NONE is not set
>># CONFIG_PREEMPT_VOLUNTARY is not set
>>CONFIG_PREEMPT=y
>>CONFIG_PREEMPT_BKL=y
>>CONFIG_DEBUG_PREEMPT=y
>>
>>BUG: sleeping function called from invalid context at kernel/rwsem.c:47
>>in_atomic():1, irqs_disabled():0
>
>
> OK, this fixes the locking here:
>
> --- a/ipc/util.c~ipc-integrate-ipc_checkid-into-ipc_lock-fix-2
> +++ a/ipc/util.c
> @@ -295,7 +295,6 @@ int ipc_addid(struct ipc_ids* ids, struc
>
> spin_lock_init(&new->lock);
> new->deleted = 0;
> - rcu_read_lock();
> spin_lock(&new->lock);
> return id;
> }
> @@ -691,7 +690,7 @@ struct kern_ipc_perm *ipc_lock(struct ip
> rcu_read_unlock();
> return ERR_PTR(-EINVAL);
> }
> -
> + rcu_read_unlock();
> return out;
> }
>
Well, reviewing the code I found another place where the
rcu_read_unlock() was missing.
I'm so sorry for the inconvenience. It's true that I should have tested
with CONFIG_PREEMPT=y :-(
Now, the ltp tests pass even with this option set...
In attachment you'll find a patch thhat
1) adds the missing rcu_read_unlock()
2) replaces Andrew's fix with a new one: the rcu_read_lock() is now
taken in ipc_lock() / ipc_lock_by_ptr() and released in ipc_unlock(),
exactly as it was done in the ref code.
Regards,
Nadia
[-- Attachment #2: ipc_missing_rcu_locks.patch --]
[-- Type: text/x-patch, Size: 1626 bytes --]
This patch fixes the missing rcu_read_(un)lock in the ipc code
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
---
ipc/util.c | 3 ++-
ipc/util.h | 4 +++-
2 files changed, 5 insertions(+), 2 deletions(-)
Index: linux-2.6.23-rc6-mm1/ipc/util.c
===================================================================
--- linux-2.6.23-rc6-mm1.orig/ipc/util.c 2007-09-18 16:39:43.000000000 +0200
+++ linux-2.6.23-rc6-mm1/ipc/util.c 2007-09-18 16:40:53.000000000 +0200
@@ -295,6 +295,7 @@ int ipc_addid(struct ipc_ids* ids, struc
spin_lock_init(&new->lock);
new->deleted = 0;
+ rcu_read_lock();
spin_lock(&new->lock);
return id;
}
@@ -690,7 +691,7 @@ struct kern_ipc_perm *ipc_lock(struct ip
rcu_read_unlock();
return ERR_PTR(-EINVAL);
}
- rcu_read_unlock();
+
return out;
}
Index: linux-2.6.23-rc6-mm1/ipc/util.h
===================================================================
--- linux-2.6.23-rc6-mm1.orig/ipc/util.h 2007-09-18 14:55:31.000000000 +0200
+++ linux-2.6.23-rc6-mm1/ipc/util.h 2007-09-18 16:41:37.000000000 +0200
@@ -141,12 +141,14 @@ static inline int ipc_checkid(struct ipc
static inline void ipc_lock_by_ptr(struct kern_ipc_perm *perm)
{
+ rcu_read_lock();
spin_lock(&perm->lock);
}
static inline void ipc_unlock(struct kern_ipc_perm *perm)
{
spin_unlock(&perm->lock);
+ rcu_read_unlock();
}
static inline struct kern_ipc_perm *ipc_lock_check(struct ipc_ids *ids,
@@ -159,7 +161,7 @@ static inline struct kern_ipc_perm *ipc_
return out;
if (ipc_checkid(ids, out, id)) {
- spin_unlock(&out->lock);
+ ipc_unlock(out);
return ERR_PTR(-EIDRM);
}
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
[not found] ` <20070918142451.418b3b51@twins>
@ 2007-09-18 16:13 ` Paul E. McKenney
2007-09-18 16:57 ` Andrew Morton
0 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2007-09-18 16:13 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrew Morton, Nadia Derbey, Alexey Dobriyan, linux-kernel,
Ingo Molnar
On Tue, Sep 18, 2007 at 02:24:51PM +0200, Peter Zijlstra wrote:
> On Tue, 18 Sep 2007 03:34:00 -0700 Andrew Morton
> <akpm@linux-foundation.org> wrote:
>
> > Well, it was an optimisation. spin_lock() implies rcu_read_lock(). That's
> > a bit dirty and we might choose to not do that.
>
> Not true for the preemptible-rcu work. All such sites should be fixed,
> or at the very least heavily annotated.
What he said!!!
Thanx, Paul
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-18 16:13 ` Paul E. McKenney
@ 2007-09-18 16:57 ` Andrew Morton
2007-09-18 18:29 ` Paul E. McKenney
0 siblings, 1 reply; 35+ messages in thread
From: Andrew Morton @ 2007-09-18 16:57 UTC (permalink / raw)
To: paulmck
Cc: Peter Zijlstra, Nadia Derbey, Alexey Dobriyan, linux-kernel,
Ingo Molnar
On Tue, 18 Sep 2007 09:13:37 -0700 "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> On Tue, Sep 18, 2007 at 02:24:51PM +0200, Peter Zijlstra wrote:
> > On Tue, 18 Sep 2007 03:34:00 -0700 Andrew Morton
> > <akpm@linux-foundation.org> wrote:
> >
> > > Well, it was an optimisation. spin_lock() implies rcu_read_lock(). That's
> > > a bit dirty and we might choose to not do that.
> >
> > Not true for the preemptible-rcu work. All such sites should be fixed,
> > or at the very least heavily annotated.
>
> What he said!!!
>
What he said!
How are you going to find all such sites?
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-18 14:55 ` Nadia Derbey
@ 2007-09-18 17:01 ` Andrew Morton
2007-09-21 9:18 ` Nadia Derbey
2007-09-19 14:07 ` Jarek Poplawski
1 sibling, 1 reply; 35+ messages in thread
From: Andrew Morton @ 2007-09-18 17:01 UTC (permalink / raw)
To: Nadia Derbey; +Cc: Alexey Dobriyan, linux-kernel
On Tue, 18 Sep 2007 16:55:19 +0200 Nadia Derbey <Nadia.Derbey@bull.net> wrote:
> This patch fixes the missing rcu_read_(un)lock in the ipc code
Thanks. Could you please check the code comments in ipc/util.c for
accuracy and completeness sometime?
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-18 16:57 ` Andrew Morton
@ 2007-09-18 18:29 ` Paul E. McKenney
2007-09-18 19:41 ` Peter Zijlstra
` (2 more replies)
0 siblings, 3 replies; 35+ messages in thread
From: Paul E. McKenney @ 2007-09-18 18:29 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Zijlstra, Nadia Derbey, Alexey Dobriyan, linux-kernel,
Ingo Molnar
On Tue, Sep 18, 2007 at 09:57:15AM -0700, Andrew Morton wrote:
> On Tue, 18 Sep 2007 09:13:37 -0700 "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>
> > On Tue, Sep 18, 2007 at 02:24:51PM +0200, Peter Zijlstra wrote:
> > > On Tue, 18 Sep 2007 03:34:00 -0700 Andrew Morton
> > > <akpm@linux-foundation.org> wrote:
> > >
> > > > Well, it was an optimisation. spin_lock() implies rcu_read_lock(). That's
> > > > a bit dirty and we might choose to not do that.
> > >
> > > Not true for the preemptible-rcu work. All such sites should be fixed,
> > > or at the very least heavily annotated.
> >
> > What he said!!!
> >
>
> What he said!
>
> How are you going to find all such sites?
A first step would be to look for patches in -rt that add rcu_read_lock()
or friends. A second step would be to look for rcu_dereference() without
an enclosing rcu_read_lock(), rcu_read_lock_bh(), or preempt_disable().
Beyond a certain point, this reduces to the problem of finding places
where spin_lock() was omitted, right?
Thanx, Paul
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-18 18:29 ` Paul E. McKenney
@ 2007-09-18 19:41 ` Peter Zijlstra
2007-09-18 20:26 ` [PATCH 1/2] lockdep: annotate rcu_read_lock() Peter Zijlstra
2007-09-18 20:27 ` [RFC][PATCH 2/2] lockdep: rcu_dereference() vs rcu_read_lock() Peter Zijlstra
2 siblings, 0 replies; 35+ messages in thread
From: Peter Zijlstra @ 2007-09-18 19:41 UTC (permalink / raw)
To: paulmck
Cc: Andrew Morton, Nadia Derbey, Alexey Dobriyan, linux-kernel,
Ingo Molnar
On Tue, 18 Sep 2007 11:29:24 -0700 "Paul E. McKenney"
<paulmck@linux.vnet.ibm.com> wrote:
> On Tue, Sep 18, 2007 at 09:57:15AM -0700, Andrew Morton wrote:
> > On Tue, 18 Sep 2007 09:13:37 -0700 "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> >
> > > On Tue, Sep 18, 2007 at 02:24:51PM +0200, Peter Zijlstra wrote:
> > > > On Tue, 18 Sep 2007 03:34:00 -0700 Andrew Morton
> > > > <akpm@linux-foundation.org> wrote:
> > > >
> > > > > Well, it was an optimisation. spin_lock() implies rcu_read_lock(). That's
> > > > > a bit dirty and we might choose to not do that.
> > > >
> > > > Not true for the preemptible-rcu work. All such sites should be fixed,
> > > > or at the very least heavily annotated.
> > >
> > > What he said!!!
> > >
> >
> > What he said!
> >
> > How are you going to find all such sites?
>
> A first step would be to look for patches in -rt that add rcu_read_lock()
> or friends. A second step would be to look for rcu_dereference() without
> an enclosing rcu_read_lock(), rcu_read_lock_bh(), or preempt_disable().
I could perhaps do that with that rcu_read_lock lockdep annotation. If
I add a check for holding rcu_read_lock in rcu_dereference().
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 1/2] lockdep: annotate rcu_read_lock()
2007-09-18 18:29 ` Paul E. McKenney
2007-09-18 19:41 ` Peter Zijlstra
@ 2007-09-18 20:26 ` Peter Zijlstra
2007-09-18 20:27 ` [RFC][PATCH 2/2] lockdep: rcu_dereference() vs rcu_read_lock() Peter Zijlstra
2 siblings, 0 replies; 35+ messages in thread
From: Peter Zijlstra @ 2007-09-18 20:26 UTC (permalink / raw)
To: paulmck
Cc: Andrew Morton, Nadia Derbey, Alexey Dobriyan, linux-kernel,
Ingo Molnar
Lockdep annotate rcu_read_lock() so that it will find unbalanced
locking.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/lockdep.h | 7 +++++++
include/linux/rcupdate.h | 12 ++++++++++++
kernel/rcupdate.c | 8 ++++++++
3 files changed, 27 insertions(+)
Index: linux-2.6/include/linux/lockdep.h
===================================================================
--- linux-2.6.orig/include/linux/lockdep.h
+++ linux-2.6/include/linux/lockdep.h
@@ -252,6 +252,13 @@ extern void lockdep_init_map(struct lock
struct lock_class_key *key, int subclass);
/*
+ * To initialize a lockdep_map statically use this macro.
+ * Note that _name must not be NULL.
+ */
+#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
+ { .name = (_name), .key = (void *)(_key), }
+
+/*
* Reinitialize a lock key - for cases where there is special locking or
* special initialization of locks so that the validator gets the scope
* of dependencies wrong: they are either too broad (they need a class-split)
Index: linux-2.6/include/linux/rcupdate.h
===================================================================
--- linux-2.6.orig/include/linux/rcupdate.h
+++ linux-2.6/include/linux/rcupdate.h
@@ -41,6 +41,7 @@
#include <linux/percpu.h>
#include <linux/cpumask.h>
#include <linux/seqlock.h>
+#include <linux/lockdep.h>
/**
* struct rcu_head - callback structure for use with RCU
@@ -133,6 +134,15 @@ static inline void rcu_bh_qsctr_inc(int
extern int rcu_pending(int cpu);
extern int rcu_needs_cpu(int cpu);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+extern struct lockdep_map rcu_lock_map;
+# define rcu_read_acquire() lock_acquire(&rcu_lock_map, 0, 0, 2, 1, _THIS_IP_)
+# define rcu_read_release() lock_release(&rcu_lock_map, 1, _THIS_IP_)
+#else
+# define rcu_read_acquire() do { } while (0)
+# define rcu_read_release() do { } while (0)
+#endif
+
/**
* rcu_read_lock - mark the beginning of an RCU read-side critical section.
*
@@ -166,6 +176,7 @@ extern int rcu_needs_cpu(int cpu);
do { \
preempt_disable(); \
__acquire(RCU); \
+ rcu_read_acquire(); \
} while(0)
/**
@@ -175,6 +186,7 @@ extern int rcu_needs_cpu(int cpu);
*/
#define rcu_read_unlock() \
do { \
+ rcu_read_release(); \
__release(RCU); \
preempt_enable(); \
} while(0)
Index: linux-2.6/kernel/rcupdate.c
===================================================================
--- linux-2.6.orig/kernel/rcupdate.c
+++ linux-2.6/kernel/rcupdate.c
@@ -48,6 +48,14 @@
#include <linux/cpu.h>
#include <linux/mutex.h>
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+static struct lock_class_key rcu_lock_key;
+struct lockdep_map rcu_lock_map =
+ STATIC_LOCKDEP_MAP_INIT("rcu_read_lock", &rcu_lock_key);
+
+EXPORT_SYMBOL_GPL(rcu_lock_map);
+#endif
+
/* Definition for rcupdate control block. */
static struct rcu_ctrlblk rcu_ctrlblk = {
.cur = -300,
^ permalink raw reply [flat|nested] 35+ messages in thread
* [RFC][PATCH 2/2] lockdep: rcu_dereference() vs rcu_read_lock()
2007-09-18 18:29 ` Paul E. McKenney
2007-09-18 19:41 ` Peter Zijlstra
2007-09-18 20:26 ` [PATCH 1/2] lockdep: annotate rcu_read_lock() Peter Zijlstra
@ 2007-09-18 20:27 ` Peter Zijlstra
2007-09-18 21:21 ` Paul E. McKenney
2 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2007-09-18 20:27 UTC (permalink / raw)
To: paulmck
Cc: Andrew Morton, Nadia Derbey, Alexey Dobriyan, linux-kernel,
Ingo Molnar
warn when rcu_dereference() is used outside of rcu_read_lock()
[ generates a _lot_ of output when booted ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/lockdep.h | 3 ++
include/linux/rcupdate.h | 5 ++++
kernel/lockdep.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 61 insertions(+)
Index: linux-2.6/include/linux/lockdep.h
===================================================================
--- linux-2.6.orig/include/linux/lockdep.h
+++ linux-2.6/include/linux/lockdep.h
@@ -303,6 +303,8 @@ extern void lock_acquire(struct lockdep_
extern void lock_release(struct lockdep_map *lock, int nested,
unsigned long ip);
+extern int lock_is_held(struct lockdep_map *lock);
+
# define INIT_LOCKDEP .lockdep_recursion = 0,
#define lockdep_depth(tsk) (debug_locks ? (tsk)->lockdep_depth : 0)
@@ -319,6 +321,7 @@ static inline void lockdep_on(void)
# define lock_acquire(l, s, t, r, c, i) do { } while (0)
# define lock_release(l, n, i) do { } while (0)
+# define lock_is_held(l) (0)
# define lockdep_init() do { } while (0)
# define lockdep_info() do { } while (0)
# define lockdep_init_map(lock, name, key, sub) do { (void)(key); } while (0)
Index: linux-2.6/include/linux/rcupdate.h
===================================================================
--- linux-2.6.orig/include/linux/rcupdate.h
+++ linux-2.6/include/linux/rcupdate.h
@@ -138,9 +138,11 @@ extern int rcu_needs_cpu(int cpu);
extern struct lockdep_map rcu_lock_map;
# define rcu_read_acquire() lock_acquire(&rcu_lock_map, 0, 0, 2, 1, _THIS_IP_)
# define rcu_read_release() lock_release(&rcu_lock_map, 1, _THIS_IP_)
+# define rcu_read_held() WARN_ON_ONCE(!lock_is_held(&rcu_lock_map))
#else
# define rcu_read_acquire() do { } while (0)
# define rcu_read_release() do { } while (0)
+# define rcu_read_held() do { } while (0)
#endif
/**
@@ -216,6 +218,7 @@ extern struct lockdep_map rcu_lock_map;
do { \
local_bh_disable(); \
__acquire(RCU_BH); \
+ rcu_read_acquire(); \
} while(0)
/*
@@ -225,6 +228,7 @@ extern struct lockdep_map rcu_lock_map;
*/
#define rcu_read_unlock_bh() \
do { \
+ rcu_read_release(); \
__release(RCU_BH); \
local_bh_enable(); \
} while(0)
@@ -254,6 +258,7 @@ extern struct lockdep_map rcu_lock_map;
#define rcu_dereference(p) ({ \
typeof(p) _________p1 = ACCESS_ONCE(p); \
smp_read_barrier_depends(); \
+ rcu_read_held(); \
(_________p1); \
})
Index: linux-2.6/kernel/lockdep.c
===================================================================
--- linux-2.6.orig/kernel/lockdep.c
+++ linux-2.6/kernel/lockdep.c
@@ -2624,6 +2624,36 @@ static int lock_release_nested(struct ta
return 1;
}
+static int __lock_is_held(struct lockdep_map *lock)
+{
+ struct task_struct *curr = current;
+ struct held_lock *hlock, *prev_hlock;
+ unsigned int depth;
+ int i;
+
+ /*
+ * Check whether the lock exists in the current stack
+ * of held locks:
+ */
+ depth = curr->lockdep_depth;
+ if (DEBUG_LOCKS_WARN_ON(!depth))
+ return 0;
+
+ prev_hlock = NULL;
+ for (i = depth-1; i >= 0; i--) {
+ hlock = curr->held_locks + i;
+ /*
+ * We must not cross into another context:
+ */
+ if (prev_hlock && prev_hlock->irq_context != hlock->irq_context)
+ break;
+ if (hlock->instance == lock)
+ return 1;
+ prev_hlock = hlock;
+ }
+ return 0;
+}
+
/*
* Remove the lock to the list of currently held locks - this gets
* called on mutex_unlock()/spin_unlock*() (or on a failed
@@ -2727,6 +2757,29 @@ void lock_release(struct lockdep_map *lo
EXPORT_SYMBOL_GPL(lock_release);
+int lock_is_held(struct lockdep_map *lock)
+{
+ int ret = 0;
+ unsigned long flags;
+
+ if (unlikely(!lock_stat && !prove_locking))
+ return 0;
+
+ if (unlikely(current->lockdep_recursion))
+ return -EBUSY;
+
+ raw_local_irq_save(flags);
+ check_flags(flags);
+ current->lockdep_recursion = 1;
+ ret = __lock_is_held(lock);
+ current->lockdep_recursion = 0;
+ raw_local_irq_restore(flags);
+
+ return ret;
+}
+
+EXPORT_SYMBOL_GPL(lock_is_held);
+
#ifdef CONFIG_LOCK_STAT
static int
print_lock_contention_bug(struct task_struct *curr, struct lockdep_map *lock,
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [RFC][PATCH 2/2] lockdep: rcu_dereference() vs rcu_read_lock()
2007-09-18 20:27 ` [RFC][PATCH 2/2] lockdep: rcu_dereference() vs rcu_read_lock() Peter Zijlstra
@ 2007-09-18 21:21 ` Paul E. McKenney
0 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2007-09-18 21:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrew Morton, Nadia Derbey, Alexey Dobriyan, linux-kernel,
Ingo Molnar
On Tue, Sep 18, 2007 at 10:27:01PM +0200, Peter Zijlstra wrote:
>
>
> warn when rcu_dereference() is used outside of rcu_read_lock()
Cool!!!
> [ generates a _lot_ of output when booted ]
I bet! If you create an annotation for rcu_read_lock_bh()
and rcu_read_unlock_bh() like you did for rcu_read_lock() and
rcu_read_unlock(), I suspect that much of the noise will go away.
Of course, preempt_disable() / preempt_enable() is a bit of a two-edged
sword here. It is OK to do rcu_dereference() under explicit
preempt_disable(), but not OK under the implicit preempt_disable()
implied by spin_lock() in CONFIG_PREEMPT. One way to handle this
would be to have _rcu() wrappers for preempt_disable() and
preempt_enable() that are annotated.
There is probably a better way... Can't think of one right off, though...
Thanx, Paul
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> include/linux/lockdep.h | 3 ++
> include/linux/rcupdate.h | 5 ++++
> kernel/lockdep.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 61 insertions(+)
>
> Index: linux-2.6/include/linux/lockdep.h
> ===================================================================
> --- linux-2.6.orig/include/linux/lockdep.h
> +++ linux-2.6/include/linux/lockdep.h
> @@ -303,6 +303,8 @@ extern void lock_acquire(struct lockdep_
> extern void lock_release(struct lockdep_map *lock, int nested,
> unsigned long ip);
>
> +extern int lock_is_held(struct lockdep_map *lock);
> +
> # define INIT_LOCKDEP .lockdep_recursion = 0,
>
> #define lockdep_depth(tsk) (debug_locks ? (tsk)->lockdep_depth : 0)
> @@ -319,6 +321,7 @@ static inline void lockdep_on(void)
>
> # define lock_acquire(l, s, t, r, c, i) do { } while (0)
> # define lock_release(l, n, i) do { } while (0)
> +# define lock_is_held(l) (0)
> # define lockdep_init() do { } while (0)
> # define lockdep_info() do { } while (0)
> # define lockdep_init_map(lock, name, key, sub) do { (void)(key); } while (0)
> Index: linux-2.6/include/linux/rcupdate.h
> ===================================================================
> --- linux-2.6.orig/include/linux/rcupdate.h
> +++ linux-2.6/include/linux/rcupdate.h
> @@ -138,9 +138,11 @@ extern int rcu_needs_cpu(int cpu);
> extern struct lockdep_map rcu_lock_map;
> # define rcu_read_acquire() lock_acquire(&rcu_lock_map, 0, 0, 2, 1, _THIS_IP_)
> # define rcu_read_release() lock_release(&rcu_lock_map, 1, _THIS_IP_)
> +# define rcu_read_held() WARN_ON_ONCE(!lock_is_held(&rcu_lock_map))
> #else
> # define rcu_read_acquire() do { } while (0)
> # define rcu_read_release() do { } while (0)
> +# define rcu_read_held() do { } while (0)
> #endif
>
> /**
> @@ -216,6 +218,7 @@ extern struct lockdep_map rcu_lock_map;
> do { \
> local_bh_disable(); \
> __acquire(RCU_BH); \
> + rcu_read_acquire(); \
> } while(0)
>
> /*
> @@ -225,6 +228,7 @@ extern struct lockdep_map rcu_lock_map;
> */
> #define rcu_read_unlock_bh() \
> do { \
> + rcu_read_release(); \
> __release(RCU_BH); \
> local_bh_enable(); \
> } while(0)
> @@ -254,6 +258,7 @@ extern struct lockdep_map rcu_lock_map;
> #define rcu_dereference(p) ({ \
> typeof(p) _________p1 = ACCESS_ONCE(p); \
> smp_read_barrier_depends(); \
> + rcu_read_held(); \
> (_________p1); \
> })
>
> Index: linux-2.6/kernel/lockdep.c
> ===================================================================
> --- linux-2.6.orig/kernel/lockdep.c
> +++ linux-2.6/kernel/lockdep.c
> @@ -2624,6 +2624,36 @@ static int lock_release_nested(struct ta
> return 1;
> }
>
> +static int __lock_is_held(struct lockdep_map *lock)
> +{
> + struct task_struct *curr = current;
> + struct held_lock *hlock, *prev_hlock;
> + unsigned int depth;
> + int i;
> +
> + /*
> + * Check whether the lock exists in the current stack
> + * of held locks:
> + */
> + depth = curr->lockdep_depth;
> + if (DEBUG_LOCKS_WARN_ON(!depth))
> + return 0;
> +
> + prev_hlock = NULL;
> + for (i = depth-1; i >= 0; i--) {
> + hlock = curr->held_locks + i;
> + /*
> + * We must not cross into another context:
> + */
> + if (prev_hlock && prev_hlock->irq_context != hlock->irq_context)
> + break;
> + if (hlock->instance == lock)
> + return 1;
> + prev_hlock = hlock;
> + }
> + return 0;
> +}
> +
> /*
> * Remove the lock to the list of currently held locks - this gets
> * called on mutex_unlock()/spin_unlock*() (or on a failed
> @@ -2727,6 +2757,29 @@ void lock_release(struct lockdep_map *lo
>
> EXPORT_SYMBOL_GPL(lock_release);
>
> +int lock_is_held(struct lockdep_map *lock)
> +{
> + int ret = 0;
> + unsigned long flags;
> +
> + if (unlikely(!lock_stat && !prove_locking))
> + return 0;
> +
> + if (unlikely(current->lockdep_recursion))
> + return -EBUSY;
> +
> + raw_local_irq_save(flags);
> + check_flags(flags);
> + current->lockdep_recursion = 1;
> + ret = __lock_is_held(lock);
> + current->lockdep_recursion = 0;
> + raw_local_irq_restore(flags);
> +
> + return ret;
> +}
> +
> +EXPORT_SYMBOL_GPL(lock_is_held);
> +
> #ifdef CONFIG_LOCK_STAT
> static int
> print_lock_contention_bug(struct task_struct *curr, struct lockdep_map *lock,
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-18 14:55 ` Nadia Derbey
2007-09-18 17:01 ` Andrew Morton
@ 2007-09-19 14:07 ` Jarek Poplawski
2007-09-20 6:24 ` Nadia Derbey
1 sibling, 1 reply; 35+ messages in thread
From: Jarek Poplawski @ 2007-09-19 14:07 UTC (permalink / raw)
To: Nadia Derbey; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel
On 18-09-2007 16:55, Nadia Derbey wrote:
...
>
> Well, reviewing the code I found another place where the
> rcu_read_unlock() was missing.
> I'm so sorry for the inconvenience. It's true that I should have tested
> with CONFIG_PREEMPT=y :-(
> Now, the ltp tests pass even with this option set...
>
> In attachment you'll find a patch thhat
> 1) adds the missing rcu_read_unlock()
> 2) replaces Andrew's fix with a new one: the rcu_read_lock() is now
> taken in ipc_lock() / ipc_lock_by_ptr() and released in ipc_unlock(),
> exactly as it was done in the ref code.
BTW, probably I miss something, but I wonder, how this RCU is working
here. E.g. in msg.c do_msgsnd() there is:
msq = msg_lock_check(ns, msqid);
...
msg_unlock(msq);
schedule();
ipc_lock_by_ptr(&msq->q_perm);
Since msq_lock_check() gets msq with ipc_lock_check() under
rcu_read_lock(), and then goes msg_unlock(msq) (i.e. ipc_unlock())
with rcu_read_unlock(), is it valid to use this with
ipc_lock_by_ptr() yet?
Regards,
Jarek P.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-19 14:07 ` Jarek Poplawski
@ 2007-09-20 6:24 ` Nadia Derbey
2007-09-20 7:28 ` Jarek Poplawski
0 siblings, 1 reply; 35+ messages in thread
From: Nadia Derbey @ 2007-09-20 6:24 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel
Jarek Poplawski wrote:
> On 18-09-2007 16:55, Nadia Derbey wrote:
> ...
>
>>Well, reviewing the code I found another place where the
>>rcu_read_unlock() was missing.
>>I'm so sorry for the inconvenience. It's true that I should have tested
>>with CONFIG_PREEMPT=y :-(
>>Now, the ltp tests pass even with this option set...
>>
>>In attachment you'll find a patch thhat
>>1) adds the missing rcu_read_unlock()
>>2) replaces Andrew's fix with a new one: the rcu_read_lock() is now
>>taken in ipc_lock() / ipc_lock_by_ptr() and released in ipc_unlock(),
>>exactly as it was done in the ref code.
>
>
> BTW, probably I miss something, but I wonder, how this RCU is working
> here. E.g. in msg.c do_msgsnd() there is:
>
> msq = msg_lock_check(ns, msqid);
> ...
>
> msg_unlock(msq);
> schedule();
>
> ipc_lock_by_ptr(&msq->q_perm);
>
> Since msq_lock_check() gets msq with ipc_lock_check() under
> rcu_read_lock(), and then goes msg_unlock(msq) (i.e. ipc_unlock())
> with rcu_read_unlock(), is it valid to use this with
> ipc_lock_by_ptr() yet?
Before Calling msg_unlock() they call ipc_rcu_getref() that increments a
refcount in the rcu header for the msg structure. This guarantees that
the the structure won't be freed before they relock it. Once the
structure is relocked by ipc_lock_by_ptr(), they do the symmetric
operation i.e. ipc_rcu_putref().
Regards,
Nadia
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-20 6:24 ` Nadia Derbey
@ 2007-09-20 7:28 ` Jarek Poplawski
2007-09-20 8:21 ` Jarek Poplawski
2007-09-20 8:52 ` Nadia Derbey
0 siblings, 2 replies; 35+ messages in thread
From: Jarek Poplawski @ 2007-09-20 7:28 UTC (permalink / raw)
To: Nadia Derbey; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel
On Thu, Sep 20, 2007 at 08:24:58AM +0200, Nadia Derbey wrote:
> Jarek Poplawski wrote:
> >On 18-09-2007 16:55, Nadia Derbey wrote:
> >...
> >
> >>Well, reviewing the code I found another place where the
> >>rcu_read_unlock() was missing.
> >>I'm so sorry for the inconvenience. It's true that I should have tested
> >>with CONFIG_PREEMPT=y :-(
> >>Now, the ltp tests pass even with this option set...
> >>
> >>In attachment you'll find a patch thhat
> >>1) adds the missing rcu_read_unlock()
> >>2) replaces Andrew's fix with a new one: the rcu_read_lock() is now
> >>taken in ipc_lock() / ipc_lock_by_ptr() and released in ipc_unlock(),
> >>exactly as it was done in the ref code.
> >
> >
> >BTW, probably I miss something, but I wonder, how this RCU is working
> >here. E.g. in msg.c do_msgsnd() there is:
> >
> >msq = msg_lock_check(ns, msqid);
> >...
> >
> >msg_unlock(msq);
> >schedule();
> >
> >ipc_lock_by_ptr(&msq->q_perm);
> >
> >Since msq_lock_check() gets msq with ipc_lock_check() under
> >rcu_read_lock(), and then goes msg_unlock(msq) (i.e. ipc_unlock())
> >with rcu_read_unlock(), is it valid to use this with
> >ipc_lock_by_ptr() yet?
>
> Before Calling msg_unlock() they call ipc_rcu_getref() that increments a
> refcount in the rcu header for the msg structure. This guarantees that
> the the structure won't be freed before they relock it. Once the
> structure is relocked by ipc_lock_by_ptr(), they do the symmetric
> operation i.e. ipc_rcu_putref().
Yes, I've found this later too - sorry for bothering. I was mislead
by the code like this:
struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
{
struct kern_ipc_perm *out;
int lid = ipcid_to_idx(id);
rcu_read_lock();
out = idr_find(&ids->ipcs_idr, lid);
if (out == NULL) {
rcu_read_unlock();
return ERR_PTR(-EINVAL);
}
which seems to suggest "out" is an RCU protected pointer, so, I
thought these refcounts were for something else. But, after looking
at how it's used it turns out to be ~90% wrong: probably 9 out of 10
places use refcouning around this, so, these rcu_read_locks() don't
work here at all. So, probably I miss something again, but IMHO,
these rcu_read_locks/unlocks could be removed here or in
ipc_lock_by_ptr() and it should be enough to use them directly, where
really needed, e.g., in msg.c do_msgrcv().
BTW, I've found this comment, which, at least for me, explains very
good, what's going on here:
/* Lockless receive, part 3:
* Acquire the queue spinlock.
*/
ipc_lock_by_ptr(&msq->q_perm);
Thanks,
Jarek P.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-20 7:28 ` Jarek Poplawski
@ 2007-09-20 8:21 ` Jarek Poplawski
2007-09-20 8:52 ` Nadia Derbey
1 sibling, 0 replies; 35+ messages in thread
From: Jarek Poplawski @ 2007-09-20 8:21 UTC (permalink / raw)
To: Nadia Derbey; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel
On Thu, Sep 20, 2007 at 09:28:21AM +0200, Jarek Poplawski wrote:
> On Thu, Sep 20, 2007 at 08:24:58AM +0200, Nadia Derbey wrote:
...
> > Before Calling msg_unlock() they call ipc_rcu_getref() that increments a
> > refcount in the rcu header for the msg structure. This guarantees that
> > the the structure won't be freed before they relock it. Once the
> > structure is relocked by ipc_lock_by_ptr(), they do the symmetric
> > operation i.e. ipc_rcu_putref().
...
> which seems to suggest "out" is an RCU protected pointer, so, I
> thought these refcounts were for something else. But, after looking
> at how it's used it turns out to be ~90% wrong: probably 9 out of 10
> places use refcouning around this, so, these rcu_read_locks() don't
> work here at all. So, probably I miss something again, but IMHO,
> these rcu_read_locks/unlocks could be removed here or in
...
...So I missed it again: after all this RCU protection works before
and after refcounting.
Sorry,
Jarek P.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-20 7:28 ` Jarek Poplawski
2007-09-20 8:21 ` Jarek Poplawski
@ 2007-09-20 8:52 ` Nadia Derbey
2007-09-20 13:08 ` Nadia Derbey
2007-09-20 13:19 ` Jarek Poplawski
1 sibling, 2 replies; 35+ messages in thread
From: Nadia Derbey @ 2007-09-20 8:52 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel
Jarek Poplawski wrote:
> On Thu, Sep 20, 2007 at 08:24:58AM +0200, Nadia Derbey wrote:
>
>>Jarek Poplawski wrote:
>>
>>>On 18-09-2007 16:55, Nadia Derbey wrote:
>>>...
>>>
>>>
>>>>Well, reviewing the code I found another place where the
>>>>rcu_read_unlock() was missing.
>>>>I'm so sorry for the inconvenience. It's true that I should have tested
>>>>with CONFIG_PREEMPT=y :-(
>>>>Now, the ltp tests pass even with this option set...
>>>>
>>>>In attachment you'll find a patch thhat
>>>>1) adds the missing rcu_read_unlock()
>>>>2) replaces Andrew's fix with a new one: the rcu_read_lock() is now
>>>>taken in ipc_lock() / ipc_lock_by_ptr() and released in ipc_unlock(),
>>>>exactly as it was done in the ref code.
>>>
>>>
>>>BTW, probably I miss something, but I wonder, how this RCU is working
>>>here. E.g. in msg.c do_msgsnd() there is:
>>>
>>>msq = msg_lock_check(ns, msqid);
>>>...
>>>
>>>msg_unlock(msq);
>>>schedule();
>>>
>>>ipc_lock_by_ptr(&msq->q_perm);
>>>
>>>Since msq_lock_check() gets msq with ipc_lock_check() under
>>>rcu_read_lock(), and then goes msg_unlock(msq) (i.e. ipc_unlock())
>>>with rcu_read_unlock(), is it valid to use this with
>>>ipc_lock_by_ptr() yet?
>>
>>Before Calling msg_unlock() they call ipc_rcu_getref() that increments a
>>refcount in the rcu header for the msg structure. This guarantees that
>>the the structure won't be freed before they relock it. Once the
>>structure is relocked by ipc_lock_by_ptr(), they do the symmetric
>>operation i.e. ipc_rcu_putref().
>
>
> Yes, I've found this later too - sorry for bothering. I was mislead
> by the code like this:
>
> struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
> {
> struct kern_ipc_perm *out;
> int lid = ipcid_to_idx(id);
>
> rcu_read_lock();
> out = idr_find(&ids->ipcs_idr, lid);
> if (out == NULL) {
> rcu_read_unlock();
> return ERR_PTR(-EINVAL);
> }
>
> which seems to suggest "out" is an RCU protected pointer, so, I
> thought these refcounts were for something else. But, after looking
> at how it's used it turns out to be ~90% wrong: probably 9 out of 10
> places use refcouning around this,
Actually, ipc_lock() is called most of the time without the
ipc_ids.mutex held and without refcounting (maybe you didn't look for
the msg_lock() sem_lock() and shm_lock() too).
So I think disabling preemption is needed, isn't it?
> so, these rcu_read_locks() don't
> work here at all. So, probably I miss something again, but IMHO,
> these rcu_read_locks/unlocks could be removed here or in
> ipc_lock_by_ptr() and it should be enough to use them directly, where
> really needed, e.g., in msg.c do_msgrcv().
>
I have to check for the ipc_lock_by_ptr(): may be you're right!
Regards,
Nadia
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-20 8:52 ` Nadia Derbey
@ 2007-09-20 13:08 ` Nadia Derbey
2007-09-20 13:26 ` Jarek Poplawski
2007-09-21 8:44 ` Jarek Poplawski
2007-09-20 13:19 ` Jarek Poplawski
1 sibling, 2 replies; 35+ messages in thread
From: Nadia Derbey @ 2007-09-20 13:08 UTC (permalink / raw)
To: Nadia Derbey
Cc: Jarek Poplawski, Andrew Morton, Alexey Dobriyan, linux-kernel
Nadia Derbey wrote:
> Jarek Poplawski wrote:
>
>> On Thu, Sep 20, 2007 at 08:24:58AM +0200, Nadia Derbey wrote:
>>
>>> Jarek Poplawski wrote:
>>>
>>>> On 18-09-2007 16:55, Nadia Derbey wrote:
>>>> ...
>>>>
>>>>
>>>>> Well, reviewing the code I found another place where the
>>>>> rcu_read_unlock() was missing.
>>>>> I'm so sorry for the inconvenience. It's true that I should have
>>>>> tested with CONFIG_PREEMPT=y :-(
>>>>> Now, the ltp tests pass even with this option set...
>>>>>
>>>>> In attachment you'll find a patch thhat
>>>>> 1) adds the missing rcu_read_unlock()
>>>>> 2) replaces Andrew's fix with a new one: the rcu_read_lock() is now
>>>>> taken in ipc_lock() / ipc_lock_by_ptr() and released in
>>>>> ipc_unlock(), exactly as it was done in the ref code.
>>>>
>>>>
>>>>
>>>> BTW, probably I miss something, but I wonder, how this RCU is working
>>>> here. E.g. in msg.c do_msgsnd() there is:
>>>>
>>>> msq = msg_lock_check(ns, msqid);
>>>> ...
>>>>
>>>> msg_unlock(msq);
>>>> schedule();
>>>>
>>>> ipc_lock_by_ptr(&msq->q_perm);
>>>>
>>>> Since msq_lock_check() gets msq with ipc_lock_check() under
>>>> rcu_read_lock(), and then goes msg_unlock(msq) (i.e. ipc_unlock())
>>>> with rcu_read_unlock(), is it valid to use this with
>>>> ipc_lock_by_ptr() yet?
>>>
>>>
>>> Before Calling msg_unlock() they call ipc_rcu_getref() that
>>> increments a refcount in the rcu header for the msg structure. This
>>> guarantees that the the structure won't be freed before they relock
>>> it. Once the structure is relocked by ipc_lock_by_ptr(), they do the
>>> symmetric operation i.e. ipc_rcu_putref().
>>
>>
>>
>> Yes, I've found this later too - sorry for bothering. I was mislead
>> by the code like this:
>>
>> struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
>> {
>> struct kern_ipc_perm *out;
>> int lid = ipcid_to_idx(id);
>>
>> rcu_read_lock();
>> out = idr_find(&ids->ipcs_idr, lid);
>> if (out == NULL) {
>> rcu_read_unlock();
>> return ERR_PTR(-EINVAL);
>> }
>>
>> which seems to suggest "out" is an RCU protected pointer, so, I
>> thought these refcounts were for something else. But, after looking
>> at how it's used it turns out to be ~90% wrong: probably 9 out of 10
>> places use refcouning around this,
>
>
> Actually, ipc_lock() is called most of the time without the
> ipc_ids.mutex held and without refcounting (maybe you didn't look for
> the msg_lock() sem_lock() and shm_lock() too).
> So I think disabling preemption is needed, isn't it?
>
>> so, these rcu_read_locks() don't
>> work here at all. So, probably I miss something again, but IMHO,
>> these rcu_read_locks/unlocks could be removed here or in
>> ipc_lock_by_ptr() and it should be enough to use them directly, where
>> really needed, e.g., in msg.c do_msgrcv().
>>
>
> I have to check for the ipc_lock_by_ptr(): may be you're right!
>
So, here is the ipc_lock_by_ptr() status:
1) do_msgsnd(), semctl_main(GETALL), semctl_main(SETALL) and find_undo()
call it inside a refcounting.
==> no rcu read section needed.
2) *_exit_ns(), ipc_findkey() and sysvipc_find_ipc() call it under the
ipc_ids mutex lock.
==> no rcu read section needed.
3) do_msgrcv() is the only path where ipc_lock_by_ptr() is not called
under refcounting
==> rcu read section + some more checks needed once the spnlock is
taken.
So I completely agree with you: we might remove the rcu_read_lock() from
the ipc_lock_by_ptr() and explicitley call it when needed (actually, it
is already explicitly called in do_msgrcv()).
Regards,
Nadia
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-20 8:52 ` Nadia Derbey
2007-09-20 13:08 ` Nadia Derbey
@ 2007-09-20 13:19 ` Jarek Poplawski
1 sibling, 0 replies; 35+ messages in thread
From: Jarek Poplawski @ 2007-09-20 13:19 UTC (permalink / raw)
To: Nadia Derbey; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel
On Thu, Sep 20, 2007 at 10:52:43AM +0200, Nadia Derbey wrote:
> Jarek Poplawski wrote:
...
> >which seems to suggest "out" is an RCU protected pointer, so, I
> >thought these refcounts were for something else. But, after looking
> >at how it's used it turns out to be ~90% wrong: probably 9 out of 10
> >places use refcouning around this,
>
> Actually, ipc_lock() is called most of the time without the
> ipc_ids.mutex held and without refcounting (maybe you didn't look for
> the msg_lock() sem_lock() and shm_lock() too).
Yes, you are 100% right and I'm 90% lier, 10% blind (maybe backward
too).
> So I think disabling preemption is needed, isn't it?
Do I've to tell the truth...?
>
> >so, these rcu_read_locks() don't
> >work here at all. So, probably I miss something again, but IMHO,
> >these rcu_read_locks/unlocks could be removed here or in
> >ipc_lock_by_ptr() and it should be enough to use them directly, where
> >really needed, e.g., in msg.c do_msgrcv().
> >
>
> I have to check for the ipc_lock_by_ptr(): may be you're right!
Since this whole locking scheme is really quite a puzzle, and needs
more than one or two looks to figure it out, I'd better try stop to
discredit myself more.
Anyway it looks to me like the most sophisticated way of achieving
locklessness I've seen so far. I hope, there is still some gain after
this RCU + refcounting vs. simple locking. (It seems somebody had
similar doubts writing the "Lockless receive, part 1:" comment in
do_msgrcv(); and probably again I'm very wrong, but this checking of
validity of RCU protected structure with an r_msg value, which is
done to avoid refcounting, looks like not very different and has
some cost too).
Regards,
Jarek P.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-20 13:08 ` Nadia Derbey
@ 2007-09-20 13:26 ` Jarek Poplawski
2007-09-21 8:44 ` Jarek Poplawski
1 sibling, 0 replies; 35+ messages in thread
From: Jarek Poplawski @ 2007-09-20 13:26 UTC (permalink / raw)
To: Nadia Derbey; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel
On Thu, Sep 20, 2007 at 03:08:42PM +0200, Nadia Derbey wrote:
...
> So, here is the ipc_lock_by_ptr() status:
> 1) do_msgsnd(), semctl_main(GETALL), semctl_main(SETALL) and find_undo()
> call it inside a refcounting.
> ==> no rcu read section needed.
>
> 2) *_exit_ns(), ipc_findkey() and sysvipc_find_ipc() call it under the
> ipc_ids mutex lock.
> ==> no rcu read section needed.
>
> 3) do_msgrcv() is the only path where ipc_lock_by_ptr() is not called
> under refcounting
> ==> rcu read section + some more checks needed once the spnlock is
> taken.
>
> So I completely agree with you: we might remove the rcu_read_lock() from
> the ipc_lock_by_ptr() and explicitley call it when needed (actually, it
> is already explicitly called in do_msgrcv()).
Impossible!
I've to look at this once more in the evening (and try to recall
what I've seen yesterday and forgotten today...).
Cheers,
Jarek P.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-20 13:08 ` Nadia Derbey
2007-09-20 13:26 ` Jarek Poplawski
@ 2007-09-21 8:44 ` Jarek Poplawski
2007-09-21 10:11 ` Nadia Derbey
2007-09-24 9:50 ` Nadia Derbey
1 sibling, 2 replies; 35+ messages in thread
From: Jarek Poplawski @ 2007-09-21 8:44 UTC (permalink / raw)
To: Nadia Derbey; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel
On Thu, Sep 20, 2007 at 03:08:42PM +0200, Nadia Derbey wrote:
> Nadia Derbey wrote:
> >Jarek Poplawski wrote:
> >
> >>On Thu, Sep 20, 2007 at 08:24:58AM +0200, Nadia Derbey wrote:
...
> >Actually, ipc_lock() is called most of the time without the
> >ipc_ids.mutex held and without refcounting (maybe you didn't look for
> >the msg_lock() sem_lock() and shm_lock() too).
> >So I think disabling preemption is needed, isn't it?
> >
> >>so, these rcu_read_locks() don't
> >>work here at all. So, probably I miss something again, but IMHO,
> >>these rcu_read_locks/unlocks could be removed here or in
> >>ipc_lock_by_ptr() and it should be enough to use them directly, where
> >>really needed, e.g., in msg.c do_msgrcv().
> >>
> >
> >I have to check for the ipc_lock_by_ptr(): may be you're right!
> >
>
> So, here is the ipc_lock_by_ptr() status:
> 1) do_msgsnd(), semctl_main(GETALL), semctl_main(SETALL) and find_undo()
> call it inside a refcounting.
> ==> no rcu read section needed.
>
> 2) *_exit_ns(), ipc_findkey() and sysvipc_find_ipc() call it under the
> ipc_ids mutex lock.
> ==> no rcu read section needed.
>
> 3) do_msgrcv() is the only path where ipc_lock_by_ptr() is not called
> under refcounting
> ==> rcu read section + some more checks needed once the spnlock is
> taken.
>
> So I completely agree with you: we might remove the rcu_read_lock() from
> the ipc_lock_by_ptr() and explicitley call it when needed (actually, it
> is already explicitly called in do_msgrcv()).
Yes, IMHO, it should be at least more readable when we can see where
this RCU is really needed.
But, after 3-rd look, I have a few more doubts (btw., 3 looks are
still not enough for me with this code, so I cerainly can miss many
things here, and, alas, I manged to see util and msg code only):
1. ipc_lock() and ipc_lock_check() are used without ipc_ids.mutex,
but it's probably wrong: they call idr_find() with ipc_ids pointer
which needs this mutex, just like in similar code in: ipc_findkey(),
ipc_get_maxid() or sysvipc_find_ipc().
2. I'm not sure this refcounting with ipc_rcu_getref/putref is SMP
safe (memory barriers): it's not atomic, so locking is needed, but
e.g. in do_msgsnd() kern_ipc_perm lock is used for this, while
freeque() calls ipc_rcu_putref() with ipc_ids mutex only.
3. Probably similar problem is possible with msr_d.r_msg which is
read in do_msgrcv() under rcu_read_lock() only.
Regards,
Jarek P.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-18 17:01 ` Andrew Morton
@ 2007-09-21 9:18 ` Nadia Derbey
0 siblings, 0 replies; 35+ messages in thread
From: Nadia Derbey @ 2007-09-21 9:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: Alexey Dobriyan, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 438 bytes --]
Andrew Morton wrote:
> On Tue, 18 Sep 2007 16:55:19 +0200 Nadia Derbey <Nadia.Derbey@bull.net> wrote:
>
>
>>This patch fixes the missing rcu_read_(un)lock in the ipc code
>
>
> Thanks. Could you please check the code comments in ipc/util.c for
> accuracy and completeness sometime?
>
Done - see attachment.
Hope I didn't forget / add some wrong stuff ;-)
BTW this patch also fixes a missing lock in shm_get_stat().
Regards,
Nadia
[-- Attachment #2: ipc_fix_wrong_comments.patch --]
[-- Type: text/x-patch, Size: 14401 bytes --]
This patch fixes the wrong / obsolete comments in the ipc code.
Also adds a missing lock around ipc_get_maxid() in shm_get_stat().
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
---
This patch applies on top of 2.6.23-rc6-mm1 + the previous IPC patches.
ipc/msg.c | 14 +++++++++-
ipc/sem.c | 14 ++++++++++
ipc/shm.c | 39 ++++++++++++++++++++----------
ipc/util.c | 78 +++++++++++++++++++++++++++++++++++++++----------------------
ipc/util.h | 18 ++++++++++++--
5 files changed, 118 insertions(+), 45 deletions(-)
Index: linux-2.6.23-rc6-mm1/ipc/util.c
===================================================================
--- linux-2.6.23-rc6-mm1.orig/ipc/util.c 2007-09-21 08:00:41.000000000 +0200
+++ linux-2.6.23-rc6-mm1/ipc/util.c 2007-09-21 10:32:03.000000000 +0200
@@ -194,7 +194,7 @@ void __init ipc_init_proc_interface(cons
* Requires ipc_ids.mutex locked.
* Returns the LOCKED pointer to the ipc structure if found or NULL
* if not.
- * If key is found ipc contains its ipc structure
+ * If key is found ipc points to the owning ipc structure
*/
static struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
@@ -258,10 +258,10 @@ int ipc_get_maxid(struct ipc_ids *ids)
* @new: new IPC permission set
* @size: limit for the number of used ids
*
- * Add an entry 'new' to the IPC idr. The permissions object is
+ * Add an entry 'new' to the IPC ids idr. The permissions object is
* initialised and the first free entry is set up and the id assigned
- * is returned. The list is returned in a locked state on success.
- * On failure the list is not locked and -1 is returned.
+ * is returned. The 'new' entry is returned in a locked state on success.
+ * On failure the entry is not locked and -1 is returned.
*
* Called with ipc_ids.mutex held.
*/
@@ -270,10 +270,6 @@ int ipc_addid(struct ipc_ids* ids, struc
{
int id, err;
- /*
- * rcu_dereference()() is not needed here since
- * ipc_ids.mutex is held
- */
if (size > IPCMNI)
size = IPCMNI;
@@ -303,12 +299,12 @@ int ipc_addid(struct ipc_ids* ids, struc
/**
* ipcget_new - create a new ipc object
* @ns: namespace
- * @ids: identifer set
+ * @ids: IPC identifer set
* @ops: the actual creation routine to call
* @params: its parameters
*
- * This routine is called sys_msgget, sys_semget() and sys_shmget() when
- * the key is IPC_PRIVATE
+ * This routine is called by sys_msgget, sys_semget() and sys_shmget()
+ * when the key is IPC_PRIVATE.
*/
int ipcget_new(struct ipc_namespace *ns, struct ipc_ids *ids,
struct ipc_ops *ops, struct ipc_params *params)
@@ -330,9 +326,16 @@ int ipcget_new(struct ipc_namespace *ns,
/**
* ipc_check_perms - check security and permissions for an IPC
* @ipcp: ipc permission set
- * @ids: identifer set
* @ops: the actual security routine to call
* @params: its parameters
+ *
+ * This routine is called by sys_msgget(), sys_semget() and sys_shmget()
+ * when the key is not IPC_PRIVATE and that key already exists in the
+ * ids IDR.
+ *
+ * On success, the IPC id is returned.
+ *
+ * It is called with ipc_ids.mutex and ipcp->lock held.
*/
static int ipc_check_perms(struct kern_ipc_perm *ipcp, struct ipc_ops *ops,
struct ipc_params *params)
@@ -353,12 +356,16 @@ static int ipc_check_perms(struct kern_i
/**
* ipcget_public - get an ipc object or create a new one
* @ns: namespace
- * @ids: identifer set
+ * @ids: IPC identifer set
* @ops: the actual creation routine to call
* @params: its parameters
*
- * This routine is called sys_msgget, sys_semget() and sys_shmget() when
- * the key is not IPC_PRIVATE
+ * This routine is called by sys_msgget, sys_semget() and sys_shmget()
+ * when the key is not IPC_PRIVATE.
+ * It adds a new entry if the key is not found and does some permission
+ * / security checkings if the key is found.
+ *
+ * On success, the ipc id is returned.
*/
int ipcget_public(struct ipc_namespace *ns, struct ipc_ids *ids,
struct ipc_ops *ops, struct ipc_params *params)
@@ -389,6 +396,10 @@ int ipcget_public(struct ipc_namespace *
if (ops->more_checks)
err = ops->more_checks(ipcp, params);
if (!err)
+ /*
+ * ipc_check_perms returns the IPC id on
+ * success
+ */
err = ipc_check_perms(ipcp, ops, params);
}
ipc_unlock(ipcp);
@@ -401,12 +412,9 @@ int ipcget_public(struct ipc_namespace *
/**
* ipc_rmid - remove an IPC identifier
- * @ids: identifier set
- * @id: ipc perm structure containing the identifier to remove
+ * @ids: IPC identifier set
+ * @ipcp: ipc perm structure containing the identifier to remove
*
- * The identifier must be valid, and in use. The kernel will panic if
- * fed an invalid identifier. The entry is removed and internal
- * variables recomputed.
* ipc_ids.mutex and the spinlock for this ID are held before this
* function is called, and remain locked on the exit.
*/
@@ -558,10 +566,12 @@ static void ipc_do_vfree(struct work_str
*/
static void ipc_schedule_free(struct rcu_head *head)
{
- struct ipc_rcu_grace *grace =
- container_of(head, struct ipc_rcu_grace, rcu);
- struct ipc_rcu_sched *sched =
- container_of(&(grace->data[0]), struct ipc_rcu_sched, data[0]);
+ struct ipc_rcu_grace *grace;
+ struct ipc_rcu_sched *sched;
+
+ grace = container_of(head, struct ipc_rcu_grace, rcu);
+ sched = container_of(&(grace->data[0]), struct ipc_rcu_sched,
+ data[0]);
INIT_WORK(&sched->work, ipc_do_vfree);
schedule_work(&sched->work);
@@ -650,7 +660,7 @@ void kernel_to_ipc64_perm (struct kern_i
}
/**
- * ipc64_perm_to_ipc_perm - convert old ipc permissions to new
+ * ipc64_perm_to_ipc_perm - convert new ipc permissions to old
* @in: new style IPC permissions
* @out: old style IPC permissions
*
@@ -669,6 +679,18 @@ void ipc64_perm_to_ipc_perm (struct ipc6
out->seq = in->seq;
}
+/**
+ * ipc_lock - Lock an ipc structure
+ * @ids: IPC identifier set
+ * @id: ipc id to look for
+ *
+ * Look for an id in the ipc ids idr and lock the associated ipc object.
+ *
+ * ipc_ids.mutex is not necessarily held before this function is called,
+ * that's why we enter a RCU read section.
+ * The ipc object is locked on exit.
+ */
+
struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
{
struct kern_ipc_perm *out;
@@ -771,8 +793,8 @@ static void *sysvipc_proc_next(struct se
}
/*
- * File positions: pos 0 -> header, pos n -> ipc id + 1.
- * SeqFile iterator: iterator value locked shp or SEQ_TOKEN_START.
+ * File positions: pos 0 -> header, pos n -> ipc id = n - 1.
+ * SeqFile iterator: iterator value locked ipc pointer or SEQ_TOKEN_START.
*/
static void *sysvipc_proc_start(struct seq_file *s, loff_t *pos)
{
@@ -807,7 +829,7 @@ static void sysvipc_proc_stop(struct seq
struct ipc_proc_iface *iface = iter->iface;
struct ipc_ids *ids;
- /* If we had a locked segment, release it */
+ /* If we had a locked structure, release it */
if (ipc && ipc != SEQ_START_TOKEN)
ipc_unlock(ipc);
Index: linux-2.6.23-rc6-mm1/ipc/util.h
===================================================================
--- linux-2.6.23-rc6-mm1.orig/ipc/util.h 2007-09-21 08:00:41.000000000 +0200
+++ linux-2.6.23-rc6-mm1/ipc/util.h 2007-09-21 10:30:27.000000000 +0200
@@ -54,7 +54,7 @@ struct ipc_params {
* the calls to sys_msgget(), sys_semget(), sys_shmget()
* . routine to call to create a new ipc object. Can be one of newque,
* newary, newseg
- * . routine to call to call to check permissions for a new ipc object.
+ * . routine to call to check permissions for a new ipc object.
* Can be one of security_msg_associate, security_sem_associate,
* security_shm_associate
* . routine to call for an extra check if needed
@@ -88,7 +88,8 @@ int ipc_get_maxid(struct ipc_ids *);
/* must be called with both locks acquired. */
void ipc_rmid(struct ipc_ids *, struct kern_ipc_perm *);
-int ipcperms (struct kern_ipc_perm *ipcp, short flg);
+/* must be called with ipcp locked */
+int ipcperms(struct kern_ipc_perm *ipcp, short flg);
/* for rare, potentially huge allocations.
* both function can sleep
@@ -131,6 +132,9 @@ static inline int ipc_buildid(struct ipc
return SEQ_MULTIPLIER * seq + id;
}
+/*
+ * Must be called with ipcp locked
+ */
static inline int ipc_checkid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp,
int uid)
{
@@ -168,6 +172,16 @@ static inline struct kern_ipc_perm *ipc_
return out;
}
+/**
+ * ipcget - Common sys_*get() code
+ * @ns : namsepace
+ * @ids : IPC identifier set
+ * @ops : operations to be called on ipc object creation, permission checks
+ * and further checks
+ * @params : the parameters needed by the previous operations.
+ *
+ * Common routine called by sys_msgget(), sys_semget() and sys_shmget().
+ */
static inline int ipcget(struct ipc_namespace *ns, struct ipc_ids *ids,
struct ipc_ops *ops, struct ipc_params *params)
{
Index: linux-2.6.23-rc6-mm1/ipc/msg.c
===================================================================
--- linux-2.6.23-rc6-mm1.orig/ipc/msg.c 2007-09-21 08:00:41.000000000 +0200
+++ linux-2.6.23-rc6-mm1/ipc/msg.c 2007-09-21 10:33:08.000000000 +0200
@@ -156,6 +156,13 @@ static inline void msg_rmid(struct ipc_n
ipc_rmid(&msg_ids(ns), &s->q_perm);
}
+/**
+ * newque - Create a new msg queue
+ * @ns: namespace
+ * @params: ptr to the structure that contains the key and msgflg
+ *
+ * Called with msg_ids.mutex held
+ */
static int newque(struct ipc_namespace *ns, struct ipc_params *params)
{
struct msg_queue *msq;
@@ -250,8 +257,8 @@ static void expunge_all(struct msg_queue
/*
* freeque() wakes up waiters on the sender and receiver waiting queue,
- * removes the message queue from message queue ID
- * IDR, and cleans up all the messages associated with this queue.
+ * removes the message queue from message queue ID IDR, and cleans up all the
+ * messages associated with this queue.
*
* msg_ids.mutex and the spinlock for this message queue are held
* before freeque() is called. msg_ids.mutex remains locked on exit.
@@ -278,6 +285,9 @@ static void freeque(struct ipc_namespace
ipc_rcu_putref(msq);
}
+/*
+ * Called with msg_ids.mutex and ipcp locked.
+ */
static inline int msg_security(struct kern_ipc_perm *ipcp, int msgflg)
{
struct msg_queue *msq = container_of(ipcp, struct msg_queue, q_perm);
Index: linux-2.6.23-rc6-mm1/ipc/sem.c
===================================================================
--- linux-2.6.23-rc6-mm1.orig/ipc/sem.c 2007-09-21 08:00:41.000000000 +0200
+++ linux-2.6.23-rc6-mm1/ipc/sem.c 2007-09-21 10:33:50.000000000 +0200
@@ -228,6 +228,14 @@ static inline void sem_rmid(struct ipc_n
*/
#define IN_WAKEUP 1
+/**
+ * newary - Create a new semaphore set
+ * @ns: namespace
+ * @params: ptr to the structure that contains key, semflg and nsems
+ *
+ * Called with sem_ids.mutex held
+ */
+
static int newary(struct ipc_namespace *ns, struct ipc_params *params)
{
int id;
@@ -281,6 +289,9 @@ static int newary(struct ipc_namespace *
}
+/*
+ * Called with sem_ids.mutex and ipcp locked.
+ */
static inline int sem_security(struct kern_ipc_perm *ipcp, int semflg)
{
struct sem_array *sma;
@@ -289,6 +300,9 @@ static inline int sem_security(struct ke
return security_sem_associate(sma, semflg);
}
+/*
+ * Called with sem_ids.mutex and ipcp locked.
+ */
static inline int sem_more_checks(struct kern_ipc_perm *ipcp,
struct ipc_params *params)
{
Index: linux-2.6.23-rc6-mm1/ipc/shm.c
===================================================================
--- linux-2.6.23-rc6-mm1.orig/ipc/shm.c 2007-09-21 08:00:41.000000000 +0200
+++ linux-2.6.23-rc6-mm1/ipc/shm.c 2007-09-21 10:34:25.000000000 +0200
@@ -82,6 +82,10 @@ static void __shm_init_ns(struct ipc_nam
ipc_init_ids(ids);
}
+/*
+ * Called with shm_ids.mutex and the shp structure locked.
+ * Only shm_ids.mutex remains locked on exit.
+ */
static void do_shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *shp)
{
if (shp->shm_nattch){
@@ -182,6 +186,7 @@ static void shm_open(struct vm_area_stru
/*
* shm_destroy - free the struct shmid_kernel
*
+ * @ns: namespace
* @shp: struct to free
*
* It has to be called with shp and shm_ids.mutex locked,
@@ -343,6 +348,14 @@ static struct vm_operations_struct shm_v
#endif
};
+/**
+ * newseg - Create a new shared memory segment
+ * @ns: namespace
+ * @params: ptr to the structure that contains key, size and shmflg
+ *
+ * Called with shm_ids.mutex held
+ */
+
static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
{
key_t key = params->key;
@@ -428,6 +441,9 @@ no_file:
return error;
}
+/*
+ * Called with shm_ids.mutex and ipcp locked.
+ */
static inline int shm_security(struct kern_ipc_perm *ipcp, int shmflg)
{
struct shmid_kernel *shp;
@@ -436,6 +452,9 @@ static inline int shm_security(struct ke
return security_shm_associate(shp, shmflg);
}
+/*
+ * Called with shm_ids.mutex and ipcp locked.
+ */
static inline int shm_more_checks(struct kern_ipc_perm *ipcp,
struct ipc_params *params)
{
@@ -558,6 +577,9 @@ static inline unsigned long copy_shminfo
}
}
+/*
+ * Called with shm_ids.mutex held
+ */
static void shm_get_stat(struct ipc_namespace *ns, unsigned long *rss,
unsigned long *swp)
{
@@ -573,18 +595,6 @@ static void shm_get_stat(struct ipc_name
struct shmid_kernel *shp;
struct inode *inode;
- /*
- * idr_find() is called via shm_get(), so with shm_ids.mutex
- * locked. Since ipc_addid() is also called with
- * shm_ids.mutex down, there is no need to add read barriers
- * here to gurantee the writes in ipc_addid() are seen in
- * order here (for Alpha).
- * However idr_find() itself does not necessary require
- * ipc_ids.mutex down. So if idr_find() is used by other
- * places without ipc_ids.mutex down, then it needs read
- * read memory barriers as ipc_lock() does.
- */
-
shp = idr_find(&shm_ids(ns).ipcs_idr, next_id);
if (shp == NULL)
continue;
@@ -638,8 +648,11 @@ asmlinkage long sys_shmctl (int shmid, i
shminfo.shmmin = SHMMIN;
if(copy_shminfo_to_user (buf, &shminfo, version))
return -EFAULT;
- /* reading a integer is always atomic */
+
+ mutex_lock(&shm_ids(ns).mutex);
err = ipc_get_maxid(&shm_ids(ns));
+ mutex_unlock(&shm_ids(ns).mutex);
+
if(err<0)
err = 0;
goto out;
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-21 8:44 ` Jarek Poplawski
@ 2007-09-21 10:11 ` Nadia Derbey
2007-09-21 11:03 ` Jarek Poplawski
2007-09-24 9:50 ` Nadia Derbey
1 sibling, 1 reply; 35+ messages in thread
From: Nadia Derbey @ 2007-09-21 10:11 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel
Jarek Poplawski wrote:
> On Thu, Sep 20, 2007 at 03:08:42PM +0200, Nadia Derbey wrote:
>
>>Nadia Derbey wrote:
>>
>>>Jarek Poplawski wrote:
>>>
>>>
>>>>On Thu, Sep 20, 2007 at 08:24:58AM +0200, Nadia Derbey wrote:
>
> ...
>
>>>Actually, ipc_lock() is called most of the time without the
>>>ipc_ids.mutex held and without refcounting (maybe you didn't look for
>>>the msg_lock() sem_lock() and shm_lock() too).
>>>So I think disabling preemption is needed, isn't it?
>>>
>>>
>>>>so, these rcu_read_locks() don't
>>>>work here at all. So, probably I miss something again, but IMHO,
>>>>these rcu_read_locks/unlocks could be removed here or in
>>>>ipc_lock_by_ptr() and it should be enough to use them directly, where
>>>>really needed, e.g., in msg.c do_msgrcv().
>>>>
>>>
>>>I have to check for the ipc_lock_by_ptr(): may be you're right!
>>>
>>
>>So, here is the ipc_lock_by_ptr() status:
>>1) do_msgsnd(), semctl_main(GETALL), semctl_main(SETALL) and find_undo()
>>call it inside a refcounting.
>> ==> no rcu read section needed.
>>
>>2) *_exit_ns(), ipc_findkey() and sysvipc_find_ipc() call it under the
>>ipc_ids mutex lock.
>> ==> no rcu read section needed.
>>
>>3) do_msgrcv() is the only path where ipc_lock_by_ptr() is not called
>>under refcounting
>> ==> rcu read section + some more checks needed once the spnlock is
>> taken.
>>
>>So I completely agree with you: we might remove the rcu_read_lock() from
>>the ipc_lock_by_ptr() and explicitley call it when needed (actually, it
>>is already explicitly called in do_msgrcv()).
>
>
> Yes, IMHO, it should be at least more readable when we can see where
> this RCU is really needed.
>
> But, after 3-rd look, I have a few more doubts (btw., 3 looks are
> still not enough for me with this code, so I cerainly can miss many
> things here, and, alas, I manged to see util and msg code only):
>
> 1. ipc_lock() and ipc_lock_check() are used without ipc_ids.mutex,
> but it's probably wrong: they call idr_find() with ipc_ids pointer
> which needs this mutex, just like in similar code in: ipc_findkey(),
> ipc_get_maxid() or sysvipc_find_ipc().
>
> 2. I'm not sure this refcounting with ipc_rcu_getref/putref is SMP
> safe (memory barriers): it's not atomic, so locking is needed, but
> e.g. in do_msgsnd() kern_ipc_perm lock is used for this, while
> freeque() calls ipc_rcu_putref() with ipc_ids mutex only.
OK, but freeque() freeary() and shm_destroy() are special cases:
we have the following path:
mutex_lock(ipc_ids.mutex)
...
ipc_lock(ipcp)
... do whatever cleaning is needed ...
ipc_rmid(ipcp)
ipc_unlock(ipcp)
....
ipc_rcu_putref(ipcp)
Once the rmid has been done the ipc structure is considered as not
visible anymore from the user side ==> any syscall called with the
corresponding id will return invalid.
The only thing that could happen is that this structure be reused for a
newly allocated ipc structure. But this too cannot happen since we are
under the ipc_ids mutex lock.
Am I wrong?
Answers to the other questions in separate e-mails
Regards,
Nadia
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-21 10:11 ` Nadia Derbey
@ 2007-09-21 11:03 ` Jarek Poplawski
2007-09-21 11:15 ` Jarek Poplawski
2007-09-24 6:54 ` Jarek Poplawski
0 siblings, 2 replies; 35+ messages in thread
From: Jarek Poplawski @ 2007-09-21 11:03 UTC (permalink / raw)
To: Nadia Derbey; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel
On Fri, Sep 21, 2007 at 12:11:15PM +0200, Nadia Derbey wrote:
> Jarek Poplawski wrote:
...
> >2. I'm not sure this refcounting with ipc_rcu_getref/putref is SMP
> >safe (memory barriers): it's not atomic, so locking is needed, but
> >e.g. in do_msgsnd() kern_ipc_perm lock is used for this, while
> >freeque() calls ipc_rcu_putref() with ipc_ids mutex only.
>
> OK, but freeque() freeary() and shm_destroy() are special cases:
> we have the following path:
>
> mutex_lock(ipc_ids.mutex)
> ...
> ipc_lock(ipcp)
> ... do whatever cleaning is needed ...
> ipc_rmid(ipcp)
> ipc_unlock(ipcp)
> ....
> ipc_rcu_putref(ipcp)
>
> Once the rmid has been done the ipc structure is considered as not
> visible anymore from the user side ==> any syscall called with the
> corresponding id will return invalid.
> The only thing that could happen is that this structure be reused for a
> newly allocated ipc structure. But this too cannot happen since we are
> under the ipc_ids mutex lock.
>
> Am I wrong?
I hope not! But, then it would be probably another logical trick:
ipc_rcu_getref/putref() seems to prevent kfreeing of a structure, so
if it's used in do_msgsnd() there should be a risk something can do
this kfree at this moment, and it seems freeque() is the only one,
which both: can do this and cares for this refcount. Then, e.g., if
any of them does ipc_rcu_getref() a bit later and sees old (cached)
value - kfree can be skipped forever. On the other hand, if there is
no such possibility because of other reasons, this all refcounting
looks like a code beautifier only...
Thanks,
Jarek P.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-21 11:03 ` Jarek Poplawski
@ 2007-09-21 11:15 ` Jarek Poplawski
2007-09-24 6:54 ` Jarek Poplawski
1 sibling, 0 replies; 35+ messages in thread
From: Jarek Poplawski @ 2007-09-21 11:15 UTC (permalink / raw)
To: Nadia Derbey; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel
On Fri, Sep 21, 2007 at 01:03:47PM +0200, Jarek Poplawski wrote:
...
> any of them does ipc_rcu_getref() a bit later and sees old (cached)
Should be:
"any of them does ipc_rcu_putref() a bit later and sees old (cached)"
Sorry,
Jarek P.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-21 11:03 ` Jarek Poplawski
2007-09-21 11:15 ` Jarek Poplawski
@ 2007-09-24 6:54 ` Jarek Poplawski
2007-09-24 7:43 ` Jarek Poplawski
2007-09-24 8:18 ` Nadia Derbey
1 sibling, 2 replies; 35+ messages in thread
From: Jarek Poplawski @ 2007-09-24 6:54 UTC (permalink / raw)
To: Nadia Derbey; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel
On Fri, Sep 21, 2007 at 01:03:47PM +0200, Jarek Poplawski wrote:
...
> I hope not! But, then it would be probably another logical trick:
> ipc_rcu_getref/putref() seems to prevent kfreeing of a structure, so
> if it's used in do_msgsnd() there should be a risk something can do
> this kfree at this moment, and it seems freeque() is the only one,
> which both: can do this and cares for this refcount. Then, e.g., if
> any of them does ipc_rcu_getref() a bit later and sees old (cached)
> value - kfree can be skipped forever. [...]
After rethinking, this scenario seems to be wrong or very unprobable
(I'm not sure of all ways "if (--container...)" could be compiled),
so there should be no such risk - double kfree/vfree is more probable,
so no danger. More likely is such refcount abuse: ipc_rcu_getref() in
do_msgsnd() done a bit after ipc_rcu_putref() in freeque() (msq
pointer acquired by do_msgsend() before freeque() started); then,
after schedule(), do_msgsnd() can work with kfreed msq_queue structure
(at least considering classic RCU).
Regards,
Jarek P.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-24 6:54 ` Jarek Poplawski
@ 2007-09-24 7:43 ` Jarek Poplawski
2007-09-24 8:18 ` Nadia Derbey
1 sibling, 0 replies; 35+ messages in thread
From: Jarek Poplawski @ 2007-09-24 7:43 UTC (permalink / raw)
To: Nadia Derbey; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel
On Mon, Sep 24, 2007 at 08:54:07AM +0200, Jarek Poplawski wrote:
> After rethinking, this scenario seems to be wrong or very unprobable
> (I'm not sure of all ways "if (--container...)" could be compiled),
> so there should be no such risk - double kfree/vfree is more probable,
> so no danger. More likely is such refcount abuse: ipc_rcu_getref() in
> do_msgsnd() done a bit after ipc_rcu_putref() in freeque() (msq
> pointer acquired by do_msgsend() before freeque() started); then,
> after schedule(), do_msgsnd() can work with kfreed msq_queue structure
> (at least considering classic RCU).
I see this scenario is even more impossible, so you were right,
it's all right at this point.
Jarek P.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-24 6:54 ` Jarek Poplawski
2007-09-24 7:43 ` Jarek Poplawski
@ 2007-09-24 8:18 ` Nadia Derbey
1 sibling, 0 replies; 35+ messages in thread
From: Nadia Derbey @ 2007-09-24 8:18 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel
Jarek Poplawski wrote:
> On Fri, Sep 21, 2007 at 01:03:47PM +0200, Jarek Poplawski wrote:
> ...
>
>>I hope not! But, then it would be probably another logical trick:
>>ipc_rcu_getref/putref() seems to prevent kfreeing of a structure, so
>>if it's used in do_msgsnd() there should be a risk something can do
>>this kfree at this moment, and it seems freeque() is the only one,
>>which both: can do this and cares for this refcount. Then, e.g., if
>>any of them does ipc_rcu_getref() a bit later and sees old (cached)
>>value - kfree can be skipped forever. [...]
>
>
> After rethinking, this scenario seems to be wrong or very unprobable
> (I'm not sure of all ways "if (--container...)" could be compiled),
> so there should be no such risk - double kfree/vfree is more probable,
> so no danger. More likely is such refcount abuse: ipc_rcu_getref() in
> do_msgsnd() done a bit after ipc_rcu_putref() in freeque() (msq
> pointer acquired by do_msgsend() before freeque() started); then,
> after schedule(), do_msgsnd() can work with kfreed msq_queue structure
> (at least considering classic RCU).
>
If msgsnd() acquires the pointer first, it does it under lock +
rcu_getref(). ==> refcount = 2
When schedule() is called if freeque() takes the pointer it will call
msg_rmid() that sets the deleted field in the msg queue. When the lock
is released by freeque(), we have either 1) or 2):
1) freeque()'s putref called 1st ==> refocunt = 1
Then msgsnd()'s lock_by_ptr() is called ==> rcu lock
Then msgsnd()'s putref is called ==> refcount = 0
But this is done under RCU lock, so should be no problem
Then the deleted field is checked ==> return
2) msgsnd()'s lock_by_ptr() is called ==> rcu lock
Then we don't mind in which order are done the other operations
since we under rcu_lock: the structure won't disappear till we test
the deleted field.
Regards,
Nadia
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-21 8:44 ` Jarek Poplawski
2007-09-21 10:11 ` Nadia Derbey
@ 2007-09-24 9:50 ` Nadia Derbey
2007-09-25 11:47 ` Jarek Poplawski
1 sibling, 1 reply; 35+ messages in thread
From: Nadia Derbey @ 2007-09-24 9:50 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel
Jarek Poplawski wrote:
> On Thu, Sep 20, 2007 at 03:08:42PM +0200, Nadia Derbey wrote:
>
>>Nadia Derbey wrote:
>>
>>>Jarek Poplawski wrote:
>>>
>>>
>>>>On Thu, Sep 20, 2007 at 08:24:58AM +0200, Nadia Derbey wrote:
>
> ...
>
>>>Actually, ipc_lock() is called most of the time without the
>>>ipc_ids.mutex held and without refcounting (maybe you didn't look for
>>>the msg_lock() sem_lock() and shm_lock() too).
>>>So I think disabling preemption is needed, isn't it?
>>>
>>>
>>>>so, these rcu_read_locks() don't
>>>>work here at all. So, probably I miss something again, but IMHO,
>>>>these rcu_read_locks/unlocks could be removed here or in
>>>>ipc_lock_by_ptr() and it should be enough to use them directly, where
>>>>really needed, e.g., in msg.c do_msgrcv().
>>>>
>>>
>>>I have to check for the ipc_lock_by_ptr(): may be you're right!
>>>
>>
>>So, here is the ipc_lock_by_ptr() status:
>>1) do_msgsnd(), semctl_main(GETALL), semctl_main(SETALL) and find_undo()
>>call it inside a refcounting.
>> ==> no rcu read section needed.
>>
>>2) *_exit_ns(), ipc_findkey() and sysvipc_find_ipc() call it under the
>>ipc_ids mutex lock.
>> ==> no rcu read section needed.
>>
>>3) do_msgrcv() is the only path where ipc_lock_by_ptr() is not called
>>under refcounting
>> ==> rcu read section + some more checks needed once the spnlock is
>> taken.
>>
>>So I completely agree with you: we might remove the rcu_read_lock() from
>>the ipc_lock_by_ptr() and explicitley call it when needed (actually, it
>>is already explicitly called in do_msgrcv()).
>
>
> Yes, IMHO, it should be at least more readable when we can see where
> this RCU is really needed.
>
> But, after 3-rd look, I have a few more doubts (btw., 3 looks are
> still not enough for me with this code, so I cerainly can miss many
> things here, and, alas, I manged to see util and msg code only):
>
Jarek,
I'm realizing I did'nt give you an answer to issues # 1 and 3. Sorry for
that!
> 1. ipc_lock() and ipc_lock_check() are used without ipc_ids.mutex,
> but it's probably wrong: they call idr_find() with ipc_ids pointer
> which needs this mutex, just like in similar code in: ipc_findkey(),
> ipc_get_maxid() or sysvipc_find_ipc().
I think you're completely right: the rcu_read_lock() is not enough in
this case.
I have to solve this issue, but keeping the original way the ipc
developers have done it: I think they didn't want to take the mutex lock
for every single operation. E.g. sending a message to a given message
queue shouldn't avoid creating new message queues.
I'll come up with a solution.
>
> 2. I'm not sure this refcounting with ipc_rcu_getref/putref is SMP
> safe (memory barriers): it's not atomic, so locking is needed, but
> e.g. in do_msgsnd() kern_ipc_perm lock is used for this, while
> freeque() calls ipc_rcu_putref() with ipc_ids mutex only.
>
> 3. Probably similar problem is possible with msr_d.r_msg which is
> read in do_msgrcv() under rcu_read_lock() only.
>
In think here they have avoided refcoutning by using r_msg:
r_msg is initialzed to -EAGAIN before releasing the msq lock. if
freequeue() is called it sets r_msg to EIDRM (see expunge_all(-EIDRM)).
Setting r_msg is always done under the msq lock (expunge_all() /
pipelined_Sned()).
Since rcu_read_lock is called right after schedule, they are sure the
msq pointer is still valid when they re-lock it once a msg is present in
the receive queue.
Please tell me if I'm not clear ;-)
Regards,
Nadia
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-24 9:50 ` Nadia Derbey
@ 2007-09-25 11:47 ` Jarek Poplawski
2007-09-26 6:13 ` Jarek Poplawski
0 siblings, 1 reply; 35+ messages in thread
From: Jarek Poplawski @ 2007-09-25 11:47 UTC (permalink / raw)
To: Nadia Derbey; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel
On Mon, Sep 24, 2007 at 11:50:23AM +0200, Nadia Derbey wrote:
> Jarek Poplawski wrote:
...
> >2. I'm not sure this refcounting with ipc_rcu_getref/putref is SMP
> >safe (memory barriers): it's not atomic, so locking is needed, but
> >e.g. in do_msgsnd() kern_ipc_perm lock is used for this, while
> >freeque() calls ipc_rcu_putref() with ipc_ids mutex only.
> >
> >3. Probably similar problem is possible with msr_d.r_msg which is
> >read in do_msgrcv() under rcu_read_lock() only.
> >
>
> In think here they have avoided refcoutning by using r_msg:
> r_msg is initialzed to -EAGAIN before releasing the msq lock. if
> freequeue() is called it sets r_msg to EIDRM (see expunge_all(-EIDRM)).
> Setting r_msg is always done under the msq lock (expunge_all() /
> pipelined_Sned()).
> Since rcu_read_lock is called right after schedule, they are sure the
> msq pointer is still valid when they re-lock it once a msg is present in
> the receive queue.
>
> Please tell me if I'm not clear ;-)
All clear!
Except... this r_msg is still unclear to me. Since it's read without
msq lock I doubt this first check after schedule() is of any value. A
comment: "Lockless receive, part 2" tells about some safety against a
race with pipeline_send() and expunge_all() when in wake_up_process(),
but how can we be sure this r_msg is not just to be changed and this
wake_up_process() is running while "while" check still sees
ERR_PTR(-EAGAIN) instead of NULL?
Thanks & sorry for delay,
Jarek P.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: 2.6.23-rc6-mm1: IPC: sleeping function called ...
2007-09-25 11:47 ` Jarek Poplawski
@ 2007-09-26 6:13 ` Jarek Poplawski
0 siblings, 0 replies; 35+ messages in thread
From: Jarek Poplawski @ 2007-09-26 6:13 UTC (permalink / raw)
To: Nadia Derbey; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel
On Tue, Sep 25, 2007 at 01:47:05PM +0200, Jarek Poplawski wrote:
> On Mon, Sep 24, 2007 at 11:50:23AM +0200, Nadia Derbey wrote:
> > Jarek Poplawski wrote:
> ...
> > >2. I'm not sure this refcounting with ipc_rcu_getref/putref is SMP
> > >safe (memory barriers): it's not atomic, so locking is needed, but
> > >e.g. in do_msgsnd() kern_ipc_perm lock is used for this, while
> > >freeque() calls ipc_rcu_putref() with ipc_ids mutex only.
> > >
> > >3. Probably similar problem is possible with msr_d.r_msg which is
> > >read in do_msgrcv() under rcu_read_lock() only.
> > >
> >
> > In think here they have avoided refcoutning by using r_msg:
> > r_msg is initialzed to -EAGAIN before releasing the msq lock. if
> > freequeue() is called it sets r_msg to EIDRM (see expunge_all(-EIDRM)).
> > Setting r_msg is always done under the msq lock (expunge_all() /
> > pipelined_Sned()).
> > Since rcu_read_lock is called right after schedule, they are sure the
> > msq pointer is still valid when they re-lock it once a msg is present in
> > the receive queue.
> >
> > Please tell me if I'm not clear ;-)
>
> All clear!
>
> Except... this r_msg is still unclear to me. Since it's read without
> msq lock I doubt this first check after schedule() is of any value. A
> comment: "Lockless receive, part 2" tells about some safety against a
> race with pipeline_send() and expunge_all() when in wake_up_process(),
> but how can we be sure this r_msg is not just to be changed and this
> wake_up_process() is running while "while" check still sees
> ERR_PTR(-EAGAIN) instead of NULL?
OOPS!
At last I've found enough time to look at this more exactly plus the
right comment in sem.c, and it looks like it's all right and really
thought up, with all variants foreseen. So, I withdraw this problem
nr 3 too.
Best regards,
Jarek P.
^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2007-09-26 6:11 UTC | newest]
Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-09-18 9:17 2.6.23-rc6-mm1: IPC: sleeping function called Alexey Dobriyan
2007-09-18 9:42 ` Andrew Morton
2007-09-18 10:17 ` Andrew Morton
2007-09-18 10:30 ` Nadia Derbey
2007-09-18 10:34 ` Andrew Morton
[not found] ` <20070918142451.418b3b51@twins>
2007-09-18 16:13 ` Paul E. McKenney
2007-09-18 16:57 ` Andrew Morton
2007-09-18 18:29 ` Paul E. McKenney
2007-09-18 19:41 ` Peter Zijlstra
2007-09-18 20:26 ` [PATCH 1/2] lockdep: annotate rcu_read_lock() Peter Zijlstra
2007-09-18 20:27 ` [RFC][PATCH 2/2] lockdep: rcu_dereference() vs rcu_read_lock() Peter Zijlstra
2007-09-18 21:21 ` Paul E. McKenney
2007-09-18 10:27 ` 2.6.23-rc6-mm1: IPC: sleeping function called Andrew Morton
2007-09-18 10:32 ` Alexey Dobriyan
2007-09-18 14:55 ` Nadia Derbey
2007-09-18 17:01 ` Andrew Morton
2007-09-21 9:18 ` Nadia Derbey
2007-09-19 14:07 ` Jarek Poplawski
2007-09-20 6:24 ` Nadia Derbey
2007-09-20 7:28 ` Jarek Poplawski
2007-09-20 8:21 ` Jarek Poplawski
2007-09-20 8:52 ` Nadia Derbey
2007-09-20 13:08 ` Nadia Derbey
2007-09-20 13:26 ` Jarek Poplawski
2007-09-21 8:44 ` Jarek Poplawski
2007-09-21 10:11 ` Nadia Derbey
2007-09-21 11:03 ` Jarek Poplawski
2007-09-21 11:15 ` Jarek Poplawski
2007-09-24 6:54 ` Jarek Poplawski
2007-09-24 7:43 ` Jarek Poplawski
2007-09-24 8:18 ` Nadia Derbey
2007-09-24 9:50 ` Nadia Derbey
2007-09-25 11:47 ` Jarek Poplawski
2007-09-26 6:13 ` Jarek Poplawski
2007-09-20 13:19 ` Jarek Poplawski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox