* linux-kernel@vger.kernel.org
@ 2002-10-30 17:57 Manfred Spraul
2002-10-30 21:22 ` linux-kernel@vger.kernel.org Bill Davidsen
0 siblings, 1 reply; 5+ messages in thread
From: Manfred Spraul @ 2002-10-30 17:57 UTC (permalink / raw)
To: Bernhard Kaindl; +Cc: linux-kernel
You are right, there is a race in pipelined_send, but slightly different
than in your description:
pipelined_send is carefull not to read the msr pointer after
wake_up_process, but it does rely on the contents of the msr structure
after setting msr->r_msg.
I.e. the description is
CPU 1 CPU 2
sys_msgrcv()
(sleeps for messsage)
sys_msgsnd()
pipelined_send()
(woken up by a signal)
Notices that a message is there,
accepts the message and exists.
stack trashed, perhaps even task structure gone.
wake_up_process(msr->r_tsk)
*oops - msr is not valid anymore.
Is that possible? Do you apps use signals?
Your fix solves the problem, but I'd prefer to keep the current, lockless receive path - it avoids 50% of the spinlock operations.
I'll write a patch that adds the missing memory barriers and copies the fields before setting msr->r_msg.
--
Manfred
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: linux-kernel@vger.kernel.org
2002-10-30 17:57 linux-kernel@vger.kernel.org Manfred Spraul
@ 2002-10-30 21:22 ` Bill Davidsen
2002-10-30 21:37 ` [PATCH] IPC SMP race: msgrcv may not return before msgsnd is done Manfred Spraul
0 siblings, 1 reply; 5+ messages in thread
From: Bill Davidsen @ 2002-10-30 21:22 UTC (permalink / raw)
To: Manfred Spraul; +Cc: Bernhard Kaindl, linux-kernel
On Wed, 30 Oct 2002, Manfred Spraul wrote:
> You are right, there is a race in pipelined_send, but slightly different
> than in your description:
> pipelined_send is carefull not to read the msr pointer after
> wake_up_process, but it does rely on the contents of the msr structure
> after setting msr->r_msg.
> Is that possible? Do you apps use signals?
>
> Your fix solves the problem, but I'd prefer to keep the current,
> lockless receive path - it avoids 50% of the spinlock operations. I'll
> write a patch that adds the missing memory barriers and copies the
> fields before setting msr->r_msg.
Don't know about his, my app does, it's only a benchmark, but I do see bad
behaviour from time to time (total lockups) on SMP machines.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] IPC SMP race: msgrcv may not return before msgsnd is done
2002-10-30 21:22 ` linux-kernel@vger.kernel.org Bill Davidsen
@ 2002-10-30 21:37 ` Manfred Spraul
0 siblings, 0 replies; 5+ messages in thread
From: Manfred Spraul @ 2002-10-30 21:37 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Bernhard Kaindl, linux-kernel
[subject line switched back, sorry for the wrong line]
Bill Davidsen wrote:
>Don't know about his, my app does, it's only a benchmark, but I do see bad
>behaviour from time to time (total lockups) on SMP machines.
>
>
>
Could you try Bernhard's patch? Lockless receive is nice, but fixing the
found race would be really tricky, and is probably not worth the effort.
--
Manfred
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] IPC SMP race: msgrcv may not return before msgsnd is done
@ 2002-10-30 17:28 Bernhard Kaindl
2002-10-30 21:20 ` Bill Davidsen
0 siblings, 1 reply; 5+ messages in thread
From: Bernhard Kaindl @ 2002-10-30 17:28 UTC (permalink / raw)
To: linux-kernel; +Cc: Ulrich Weigand, Neale Ferguson
Hello,
this is targeted to System V message queue Experts and people who
use System V message queues quite extensively in SMP environments.
It contains descriptions of a SMP race in sys_msgrcv, a fix for
it and some questions about the code in question(maybe there is
a better fix, but I think the proposed one is ok)
I've already a short feedback from Andi Kleen, he told me that
the Description and patch seem to be correct as far as he can see.
Neale Ferguson from Software AG debugged a SMP race in the message
queue implementation of the 2.4 Kernel.
The race was triggered in a time frame of 20 minutes to 14 hours after
starting a complex application stress test which uses message queues
quite extensively.
The race could have been triggered as follows:
CPU 1 CPU 2
sys_msgrcv()
(sleeps for messsage)
sys_msgsnd()
pipelined_send()
wake_up_process()
(wakes up the process)
sys_msgrcv gets the
message and exits
pipelined_send() is still
using the linked list with
the receivers, but the
receiver element from the
process that just received
the message is freed now
because it is a stack variable
of the sys_msgrcv which just
exited.
schedules some random
new process which calls
the kernel and uses the
stack freed by sys_msgrcv
again
pipelined_send now tries to
dereference a pointer which
has been initalized on CPU1
by the random process
--------------------------------------------------------------------
The resulting oops in wake_up_process which is called within
pipelined_send has been reproduced on the i386 and s390 architectures
with 2.4.7, 2.4.17 and 2.4.19 kernels.
Now the patch part:
-------------------------------------------------------------------------
Symptom: jump to zero in kernel/wake_up_process causes Kernel oops
Description: sys_msgscv may return without waiting for the lock of
sys_msgsnd which causes that a stack variable may be
overwritten
Solution: Don't try to return in sys_msgscv before the lock is
released by sys_msgsnd.
Short description of the problem:
After schedule returns, sys_msgrcv needs to reacquire the lock to make
sure noone still uses a temporary pointer to the msg_receiver which is
on the stack of sys_msgrcv before it exits.
Detailed description of the problem:
In sys_msgrcv, msr_d is defined on the stack:
> 728 asmlinkage long sys_msgrcv (int msqid, struct msgbuf *msgp, size_t msgsz,
and then added to the list:
> 803 list_add_tail(&msr_d.r_list,&msq->q_receivers);
now we wait until wakeup because someone sent the message we are waiting for:
> 815 schedule();
When we return, we check for success and go out before checking for the lock
which may be still hold and needed by pipelined_send within sys_msgsnd:
> 818 msg = (struct msg_msg*) msr_d.r_msg;
> 819 if(!IS_ERR(msg))
> 820 goto out_success;
<patch description>
While attempting an initial patch I've noticed that the first
msg = (struct msg_msg*) msr_d.r_msg;
if(!IS_ERR(msg))
goto out_success;
can be just omitted and the rest should be just what we need,
because the next is reaquiriing the lock, the exact code we
need after the schedule in which we were waiting for the wake
up called by msgsnd because it just send us a message we can
process now:
t = msg_lock(msqid);
if(t==NULL)
msqid=-1;
msg = (struct msg_msg*)msr_d.r_msg;
if(!IS_ERR(msg)) {
/* our message arived while we waited for
* the spinlock. Process it.
*/
if(msqid!=-1)
msg_unlock(msqid);
goto out_success;
}
</patch description>
So this is the proposed patch:
--- linux-2.4.19/ipc/msg.c
+++ linux-2.4.19/ipc/msg.c
@@ -806,28 +806,24 @@
msr_d.r_mode = mode;
if(msgflg & MSG_NOERROR)
msr_d.r_maxsize = INT_MAX;
else
msr_d.r_maxsize = msgsz;
msr_d.r_msg = ERR_PTR(-EAGAIN);
current->state = TASK_INTERRUPTIBLE;
msg_unlock(msqid);
schedule();
current->state = TASK_RUNNING;
- msg = (struct msg_msg*) msr_d.r_msg;
- if(!IS_ERR(msg))
- goto out_success;
-
t = msg_lock(msqid);
if(t==NULL)
msqid=-1;
msg = (struct msg_msg*)msr_d.r_msg;
if(!IS_ERR(msg)) {
/* our message arived while we waited for
* the spinlock. Process it.
*/
if(msqid!=-1)
msg_unlock(msqid);
goto out_success;
}
Question part:
The question we have is why this first check was there in the first place,
the only explanation we could understand is that this was some attempt to
improve performance (if the contention at this lock is high).
Maybe someone can think of a better fix, but as long there is none,
correctness is more important than performance the patch should be
fine as bug-fix.
Acknowledgements:
-----------------
Analysis was done using lcrash and VM tracing on s390 by Neale Ferguson(SAG)
Very helpful directions and explanations provided by Ulrich Weigand(IBM)
--
Best Regards,
Bernhard Kaindl
SuSE Linux and United Linux Development
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] IPC SMP race: msgrcv may not return before msgsnd is done
2002-10-30 17:28 Bernhard Kaindl
@ 2002-10-30 21:20 ` Bill Davidsen
0 siblings, 0 replies; 5+ messages in thread
From: Bill Davidsen @ 2002-10-30 21:20 UTC (permalink / raw)
To: Bernhard Kaindl; +Cc: linux-kernel, Ulrich Weigand, Neale Ferguson
On Wed, 30 Oct 2002, Bernhard Kaindl wrote:
> this is targeted to System V message queue Experts and people who
> use System V message queues quite extensively in SMP environments.
>
> It contains descriptions of a SMP race in sys_msgrcv, a fix for
> it and some questions about the code in question(maybe there is
> a better fix, but I think the proposed one is ok)
>
> I've already a short feedback from Andi Kleen, he told me that
> the Description and patch seem to be correct as far as he can see.
I'll try it, I have a program which gets a lockup rather than an oops, but
it may be related.
> Neale Ferguson from Software AG debugged a SMP race in the message
> queue implementation of the 2.4 Kernel.
>
> The race was triggered in a time frame of 20 minutes to 14 hours after
> starting a complex application stress test which uses message queues
> quite extensively.
One of the test loads in the responsiveness test creates a ring of Ncpu+1
process which pass tokens in a ring using message queues, passing Ncpu
tokens at a time. I guess that might resemble "quite extensively" ;-) I
also note the tokens/sec going through a single process, that would show
any performance impact of the patch.
However, if this prevents the occasional lockup it will probably go in my
working kernel, unfortunately this is "occasional" so I won't know right
away.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2002-10-30 21:31 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-10-30 17:57 linux-kernel@vger.kernel.org Manfred Spraul
2002-10-30 21:22 ` linux-kernel@vger.kernel.org Bill Davidsen
2002-10-30 21:37 ` [PATCH] IPC SMP race: msgrcv may not return before msgsnd is done Manfred Spraul
-- strict thread matches above, loose matches on Subject: below --
2002-10-30 17:28 Bernhard Kaindl
2002-10-30 21:20 ` Bill Davidsen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox