* 2.6.31.4-rt14 posix message queues problem
@ 2009-10-15 17:51 Joerg Abraham
[not found] ` <520f0cf10910151202k2ba01453r10e96eeb064232e8@mail.gmail.com>
0 siblings, 1 reply; 4+ messages in thread
From: Joerg Abraham @ 2009-10-15 17:51 UTC (permalink / raw)
To: LKML
[-- Attachment #1: Type: text/plain, Size: 1030 bytes --]
Hi,
During some performance measurement I stumbled over unexpected bad results for
a posix message queue test case. Which is not surprising since the kernel log
shows a "BUG: scheduling while atomic:" on every measurement trigger.
Attached is the kernel log for an embedded 8572ds powerpc system and a
condensed user space test appli. The test appli uses 2 processes (mq_server
and mq_client) and pingpongs some messages via posix message queues. Please
see the c-file header for details (the appli itself makes not much sense but
shows the problem).
./mq_server
./mq_client # just crashes during run
The problem does _not_ occur on kernels 2.6.29.6-rt24, 2.6.29.6 vanilla and
2.6.31.4 vanilla. The problem does _also_ show up on an embedded x86_64 system
and a full featured fc11 x86_64 system running a 2.6.31.4-rt14 based kernel,
so I dont think it's arch dependent.
Can anybody reproduce the crash ???
If you need more info's please let me know.
[And if I'm doing stupid things just forgive me]
Thanx
Joerg
[-- Attachment #2: 2.6.31.4-rt14_posix_mq.crash --]
[-- Type: text/plain, Size: 3509 bytes --]
[ 1.106260] NET: Registered protocol family 17
[ 1.110896] RPC: Registered udp transport module.
[ 1.115600] RPC: Registered tcp transport module.
[ 1.120324] 802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
[ 1.127113] All bugs added by David S. Miller <davem@redhat.com>
[ 1.133717] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[ 1.140006] Freeing unused kernel memory: 180k init
[ 6.931612] PHY: mdio@ffe24520:00 - Link is Up - 1000/Full
[ 394.113565] BUG: scheduling while atomic: mq_client/0x00000001/1039, CPU#1
[ 394.113572] Modules linked in:
[ 394.113575] Call Trace:
[ 394.113586] [ee8bddb0] [c00074d8] show_stack+0x4c/0x16c (unreliable)
[ 394.113596] [ee8bdde0] [c0037918] __schedule_bug+0x84/0x88
[ 394.113606] [ee8bddf0] [c02f4828] __schedule+0x338/0x778
[ 394.113611] [ee8bde60] [c02f4f14] schedule+0x20/0x44
[ 394.113617] [ee8bde70] [c02f520c] schedule_timeout+0x1c4/0x204
[ 394.113629] [ee8bdec0] [c0170b84] wq_sleep+0xb8/0x1a0
[ 394.113635] [ee8bdee0] [c0170d34] sys_mq_timedreceive+0xc8/0x388
[ 394.113644] [ee8bdf40] [c0010428] ret_from_syscall+0x0/0x3c
[ 394.113692] BUG: scheduling while atomic: mq_server/0x00000001/1038, CPU#1
[ 394.113696] Modules linked in:
[ 394.113699] Call Trace:
[ 394.113704] [ee8b9db0] [c00074d8] show_stack+0x4c/0x16c (unreliable)
[ 394.113711] [ee8b9de0] [c0037918] __schedule_bug+0x84/0x88
[ 394.113717] [ee8b9df0] [c02f4828] __schedule+0x338/0x778
[ 394.113723] [ee8b9e60] [c02f4f14] schedule+0x20/0x44
[ 394.113728] [ee8b9e70] [c02f520c] schedule_timeout+0x1c4/0x204
[ 394.113735] [ee8b9ec0] [c0170b84] wq_sleep+0xb8/0x1a0
[ 394.113741] [ee8b9ee0] [c0170d34] sys_mq_timedreceive+0xc8/0x388
[ 394.113747] [ee8b9f40] [c0010428] ret_from_syscall+0x0/0x3c
[ 394.113758] Page fault in user mode with in_atomic() = 1 mm = eee98c00
[ 394.113762] NIP = fec2d98 MSR = 202d000
[ 394.113766] Oops: Weird page fault, sig: 11 [#1]
[ 394.118382] PREEMPT SMP NR_CPUS=2 MPC8572 DS
[ 394.122657] Modules linked in:
[ 394.125710] NIP: 0fec2d98 LR: 0fec5278 CTR: 0fec51d8
[ 394.130673] REGS: ee8bdf50 TRAP: 0301 Not tainted (2.6.31.4-rt14-chja)
[ 394.137461] MSR: 0202d000 <EE,PR,VEC,ME,CE> CR: 44000422 XER: 20000000
[ 394.144197] DEAR: 0ff96df8, ESR: 00000000
[ 394.148203] TASK = eee48050[1039] 'mq_client' THREAD: ee8bc000 CPU: 1
[ 394.154473] GPR00: 00000000 bff7fd50 48027cb0 bff8025c 0000000a bff8025c 0ff97010 00020cd9
[ 394.162866] GPR08: 00000001 00000001 00000001 00000064 0ff96df4
[ 394.169076] NIP [0fec2d98] 0xfec2d98
[ 394.172647] LR [0fec5278] 0xfec5278
[ 394.176130] Call Trace:
[ 394.178583] ---[ end trace 5ca40653fb3fed3b ]---
[ 394.183199] note: mq_client[1039] exited with preempt_count 1
[ 394.189022] BUG: scheduling while atomic: mq_client/0x10000001/1039, CPU#1
[ 394.195897] Modules linked in:
[ 394.198950] Call Trace:
[ 394.201394] [ee8bdd30] [c00074d8] show_stack+0x4c/0x16c (unreliable)
[ 394.207758] [ee8bdd60] [c0037918] __schedule_bug+0x84/0x88
[ 394.213250] [ee8bdd70] [c02f4828] __schedule+0x338/0x778
[ 394.218567] [ee8bdde0] [c0039f84] __cond_resched+0x2c/0x50
[ 394.224057] [ee8bddf0] [c02f4dbc] _cond_resched+0x40/0x54
[ 394.229463] [ee8bde00] [c003fd00] put_files_struct+0xf8/0x110
[ 394.235214] [ee8bde20] [c00419f8] do_exit+0x540/0x654
[ 394.240269] [ee8bde70] [c000d98c] kernel_bad_stack+0x0/0x50
[ 394.245850] [ee8bde90] [c001422c] do_page_fault+0x30c/0x4e8
[ 394.251427] [ee8bdf40] [c0010868] handle_page_fault+0xc/0x80
[-- Attachment #3: mq.c --]
[-- Type: text/plain, Size: 5935 bytes --]
/*
* =====================================================================================
*
* Filename: mq.c
*
* Description: posix message test case
*
* compile server:
* gcc -DMQ_SERVER -g -O2 -o mq_server mq.c -lrt
*
* compile client:
* gcc -DMQ_CLIENT -g -O2 -o mq_client mq.c -lrt
*
* =====================================================================================
*/
#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <limits.h>
#include <linux/unistd.h>
#include <time.h>
#include <malloc.h>
#include <mqueue.h>
#define _GNU_SOURCE
#include <sched.h>
#define MQ_SERVER_SIZE (64)
#define MQ_SERVER_PATH "/server_mq.1234"
#define MQ_SERVER_RES_PATH "/server_mq_res.1234"
typedef struct mq_server_s
{
unsigned long long int send_stamp;
unsigned long long int rcvd_stamp;
unsigned int cmd;
unsigned int unused0;
unsigned long long int unused1;
} mq_server_t;
typedef enum mq_cmd_s
{
MQ_CMD_NOTHING = 0,
MQ_CMD_GET_TS,
MQ_CMD_EXIT
} mq_cmd_t;
static unsigned long long int get_current_timestamp(void)
{
struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
return(((unsigned long long)ts.tv_sec) * 1000000000ULL + ((unsigned long long)ts.tv_nsec));
}
static int priority = 10; /* for FIFO scheduling class */
#if defined(MQ_SERVER)
/*
* === FUNCTION ======================================================================
* Name: mq_server
* Description: creates 2 posix message queue, waits for msgs on one of them
* executes what is received, sends back the result via the other mq
* =====================================================================================
*/
int main(int argc, char **argv)
{
mqd_t mq_fd, mq_res_fd;
struct mq_attr mq_attributes;
mq_server_t mq_msg;
int len;
struct sched_param schedp;
unsigned int mq_msg_prio = 1;
/* int policy = SCHED_FIFO; */
if (geteuid())
{
fprintf(stderr, "you should be root\n");
exit(-1);
}
/* the server has higher prio */
priority++;
/* create 2 posix message queues */
mq_attributes.mq_flags = 0;
mq_attributes.mq_maxmsg = MQ_SERVER_SIZE;
mq_attributes.mq_msgsize = sizeof(mq_server_t);
mq_attributes.mq_curmsgs = 0;
mq_fd = mq_open(MQ_SERVER_PATH, O_RDWR | O_CREAT, S_IWUSR | S_IRUSR, &mq_attributes);
mq_res_fd = mq_open(MQ_SERVER_RES_PATH, O_RDWR | O_CREAT, S_IWUSR | S_IRUSR, &mq_attributes);
memset(&schedp, 0, sizeof(schedp));
/*
* schedp.sched_priority = priority;
* sched_setscheduler(0, policy, &schedp);
*/
fprintf(stderr, "mq_server started (pid: %d prio: %d)\n", getpid(), schedp.sched_priority);
while(1)
{
len = mq_receive(mq_fd, (char *) &mq_msg, sizeof(mq_msg), &mq_msg_prio);
if(len == sizeof(mq_msg))
{
switch(mq_msg.cmd)
{
case MQ_CMD_NOTHING:
break;
case MQ_CMD_GET_TS:
mq_msg.rcvd_stamp = get_current_timestamp();
/* send with received prio */
mq_send(mq_res_fd, (char *) &mq_msg, sizeof(mq_msg), mq_msg_prio);
/* printf("got ts req send: 0x%016llx now: 0x%016llx \n", mq_msg.send_stamp, now ); */
break;
case MQ_CMD_EXIT:
goto OUT;
default:
break;
}
}
}
OUT:
/* let the client finish */
sleep(1);
fprintf(stderr, "mq_server exiting\n");
mq_close(mq_fd);
mq_unlink(MQ_SERVER_PATH);
mq_close(mq_res_fd);
mq_unlink(MQ_SERVER_RES_PATH);
return 0;
}
#elif defined(MQ_CLIENT)
static int timeout = 1000000; /* usec; 1 sec as default */
/*
* === FUNCTION ======================================================================
* Name: mq_client
* Description: opens 2 posix message queue, send out jobs on one mq
* collects the result via the other mq and prints the result
* =====================================================================================
*/
int main(int argc, char **argv)
{
int i, len;
mqd_t mq_fd, mq_res_fd;
struct mq_attr mq_attributes;
unsigned int mq_msg_prio = 1;
mq_server_t mq_msg;
struct sched_param schedp;
/* int policy = SCHED_FIFO; */
struct timespec ts_wait;
if (geteuid())
{
fprintf(stderr, "you should be root\n");
exit(-1);
}
ts_wait.tv_sec = timeout / 1000000;
ts_wait.tv_nsec = (timeout % 1000000) * 1000;
fprintf(stderr, "wait time: %ld seconds %ld nsec\n", ts_wait.tv_sec, ts_wait.tv_nsec);
mq_fd = mq_open(MQ_SERVER_PATH, O_RDWR, S_IWUSR | S_IRUSR, &mq_attributes);
mq_res_fd = mq_open(MQ_SERVER_RES_PATH, O_RDWR, S_IWUSR | S_IRUSR, &mq_attributes);
memset(&schedp, 0, sizeof(schedp));
/*
* schedp.sched_priority = priority;
* sched_setscheduler(0, policy, &schedp);
*/
fprintf(stderr, "mq_test started (pid: %d prio: %d)\n", getpid(), schedp.sched_priority);
for(i = 0; i < 3; i++)
{
nanosleep(&ts_wait, NULL);
/* sleep(1); */
/* query ts service from server */
mq_msg.cmd = MQ_CMD_GET_TS;
mq_msg.send_stamp = get_current_timestamp();
mq_send(mq_fd, (char *) &mq_msg, sizeof(mq_msg), mq_msg_prio);
/* got response */
len = mq_receive(mq_res_fd, (char *) &mq_msg, sizeof(mq_msg), &mq_msg_prio);
if (len == sizeof(mq_msg))
{
if(mq_msg.cmd == MQ_CMD_GET_TS)
{
fprintf(stderr, "measured mq latency %lld nsec (i: %d)\n",
mq_msg.rcvd_stamp - mq_msg.send_stamp, i);
}
}
} /* while (true) */
fprintf(stderr, "measure loop exited\n");
/* shut down server */
mq_msg.cmd = MQ_CMD_EXIT;
mq_send(mq_fd, (char *) &mq_msg, sizeof(mq_msg), mq_msg_prio);
mq_close(mq_res_fd);
mq_close(mq_fd);
return 0;
}
#else
#error "MQ_SERVER or MQ_CLIENT should be defined"
#endif /* MQ_SERVER */
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: 2.6.31.4-rt14 posix message queues problem
[not found] ` <520f0cf10910151202k2ba01453r10e96eeb064232e8@mail.gmail.com>
@ 2009-10-15 19:09 ` John Kacur
2009-10-15 20:19 ` John Kacur
2009-10-16 8:09 ` 2.6.31.4-rt14 posix message queues problem (solved) Joerg Abraham
0 siblings, 2 replies; 4+ messages in thread
From: John Kacur @ 2009-10-15 19:09 UTC (permalink / raw)
To: Joerg.Abraham, linux-kernel; +Cc: Thomas Gleixner, Ingo Molnar
> From: Joerg Abraham <Joerg.Abraham@alcatel-lucent.de>
> Date: Thu, Oct 15, 2009 at 7:51 PM
> Subject: 2.6.31.4-rt14 posix message queues problem
> To: LKML <linux-kernel@vger.kernel.org>
>
>
> Hi,
>
> During some performance measurement I stumbled over unexpected bad
> results for a posix message queue test case. Which is not surprising
> since the kernel log shows a "BUG: scheduling while atomic:" on every
> measurement trigger.
>
> Attached is the kernel log for an embedded 8572ds powerpc system and a
> condensed user space test appli. The test appli uses 2 processes
> (mq_server and mq_client) and pingpongs some messages via posix
> message queues. Please see the c-file header for details (the appli
> itself makes not much sense but shows the problem).
>
> ./mq_server
> ./mq_client # just crashes during run
>
> The problem does _not_ occur on kernels 2.6.29.6-rt24, 2.6.29.6
> vanilla and 2.6.31.4 vanilla. The problem does _also_ show up on an
> embedded x86_64 system and a full featured fc11 x86_64 system running
> a 2.6.31.4-rt14 based kernel, so I dont think it's arch dependent.
>
> Can anybody reproduce the crash ???
>
> If you need more info's please let me know.
>
> [And if I'm doing stupid things just forgive me]
>
> Thanx
>
> Joerg
Hello Joerg
Does the following patch solve your problem?
>From 69fde4357db4a054733cbca4158606160e821a3a Mon Sep 17 00:00:00 2001
From: John Kacur <jkacur@redhat.com>
Date: Thu, 15 Oct 2009 20:52:37 +0200
Subject: [PATCH] preempt_disable_rt(); should be paired with preempt_enable_rt()
Signed-off-by: John Kacur <jkacur@redhat.com>
---
ipc/mqueue.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 63a47f7..ab4df36 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -830,7 +830,7 @@ static inline void pipelined_send(struct mqueue_inode_info *info,
wake_up_process(receiver->task);
smp_wmb();
receiver->state = STATE_READY;
- preempt_enable_nort();
+ preempt_enable_rt();
}
/* pipelined_receive() - if there is task waiting in sys_mq_timedsend()
--
1.6.0.6
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: 2.6.31.4-rt14 posix message queues problem
2009-10-15 19:09 ` John Kacur
@ 2009-10-15 20:19 ` John Kacur
2009-10-16 8:09 ` 2.6.31.4-rt14 posix message queues problem (solved) Joerg Abraham
1 sibling, 0 replies; 4+ messages in thread
From: John Kacur @ 2009-10-15 20:19 UTC (permalink / raw)
To: Joerg.Abraham, linux-kernel; +Cc: Thomas Gleixner, Ingo Molnar, linux-rt-users
On Thu, 15 Oct 2009, John Kacur wrote:
>
> > From: Joerg Abraham <Joerg.Abraham@alcatel-lucent.de>
> > Date: Thu, Oct 15, 2009 at 7:51 PM
> > Subject: 2.6.31.4-rt14 posix message queues problem
> > To: LKML <linux-kernel@vger.kernel.org>
> >
> >
> > Hi,
> >
> > During some performance measurement I stumbled over unexpected bad
> > results for a posix message queue test case. Which is not surprising
> > since the kernel log shows a "BUG: scheduling while atomic:" on every
> > measurement trigger.
> >
> > Attached is the kernel log for an embedded 8572ds powerpc system and a
> > condensed user space test appli. The test appli uses 2 processes
> > (mq_server and mq_client) and pingpongs some messages via posix
> > message queues. Please see the c-file header for details (the appli
> > itself makes not much sense but shows the problem).
> >
> > ./mq_server
> > ./mq_client # just crashes during run
> >
> > The problem does _not_ occur on kernels 2.6.29.6-rt24, 2.6.29.6
> > vanilla and 2.6.31.4 vanilla. The problem does _also_ show up on an
> > embedded x86_64 system and a full featured fc11 x86_64 system running
> > a 2.6.31.4-rt14 based kernel, so I dont think it's arch dependent.
> >
> > Can anybody reproduce the crash ???
> >
> > If you need more info's please let me know.
> >
> > [And if I'm doing stupid things just forgive me]
> >
> > Thanx
> >
> > Joerg
>
> Hello Joerg
>
> Does the following patch solve your problem?
>
> From 69fde4357db4a054733cbca4158606160e821a3a Mon Sep 17 00:00:00 2001
> From: John Kacur <jkacur@redhat.com>
> Date: Thu, 15 Oct 2009 20:52:37 +0200
> Subject: [PATCH] preempt_disable_rt(); should be paired with preempt_enable_rt()
>
> Signed-off-by: John Kacur <jkacur@redhat.com>
> ---
> ipc/mqueue.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index 63a47f7..ab4df36 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -830,7 +830,7 @@ static inline void pipelined_send(struct mqueue_inode_info *info,
> wake_up_process(receiver->task);
> smp_wmb();
> receiver->state = STATE_READY;
> - preempt_enable_nort();
> + preempt_enable_rt();
> }
>
> /* pipelined_receive() - if there is task waiting in sys_mq_timedsend()
> --
> 1.6.0.6
>
>
fyi: I completed my testing, and I was able to reproduce your problem on
v2.6.31.4-rt14, and it goes away with the above patch.
Cheers
John
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: 2.6.31.4-rt14 posix message queues problem (solved)
2009-10-15 19:09 ` John Kacur
2009-10-15 20:19 ` John Kacur
@ 2009-10-16 8:09 ` Joerg Abraham
1 sibling, 0 replies; 4+ messages in thread
From: Joerg Abraham @ 2009-10-16 8:09 UTC (permalink / raw)
To: John Kacur; +Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar
Hi John,
> Hello Joerg
>
> Does the following patch solve your problem?
yes, (verified on powerpc and x86_64)
so it may be save to include your patch into the next rt release.
Thanx,
Joerg
>
> From 69fde4357db4a054733cbca4158606160e821a3a Mon Sep 17 00:00:00 2001
> From: John Kacur <jkacur@redhat.com>
> Date: Thu, 15 Oct 2009 20:52:37 +0200
> Subject: [PATCH] preempt_disable_rt(); should be paired with preempt_enable_rt()
>
> Signed-off-by: John Kacur <jkacur@redhat.com>
> ---
> ipc/mqueue.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index 63a47f7..ab4df36 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -830,7 +830,7 @@ static inline void pipelined_send(struct mqueue_inode_info *info,
> wake_up_process(receiver->task);
> smp_wmb();
> receiver->state = STATE_READY;
> - preempt_enable_nort();
> + preempt_enable_rt();
> }
>
> /* pipelined_receive() - if there is task waiting in sys_mq_timedsend()
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2009-10-16 8:11 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-15 17:51 2.6.31.4-rt14 posix message queues problem Joerg Abraham
[not found] ` <520f0cf10910151202k2ba01453r10e96eeb064232e8@mail.gmail.com>
2009-10-15 19:09 ` John Kacur
2009-10-15 20:19 ` John Kacur
2009-10-16 8:09 ` 2.6.31.4-rt14 posix message queues problem (solved) Joerg Abraham
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.