* [Xenomai-help] More blackfin kernel oops under heavy load
@ 2011-02-28 11:44 Kolja Waschk
  2011-02-28 12:08 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 14+ messages in thread
From: Kolja Waschk @ 2011-02-28 11:44 UTC (permalink / raw)
  To: xenomai

Hi,

I can currently take another look at the problems I initially described in

http://mail.gna.org/public/xenomai-help/2011-01/msg00091.html

In short, I get various exceptions (Illegal use of supervisor access, NULL pointer
etc.) in a setup involving a RTDM driver using interrupts, some RT tasks, and network
communication over normal (non-RT) sockets. The exceptions appear to occur in either
gatekeeper/0 or in the thread that serves the network requests, not in the thread
talking to the driver. It occurs only with the driver open however. It even
appears in exactly the same way if the application accessing the RTDM driver
and the webserver are started as separate processes. All other "normal"
(non-RT) processes run stably. I guess there is some problem with interrupts
during system calls.

Initially I titled the posting with "...under heavy load". But now we know the
problem always occurs after some time, just more often the higher the load. I
probably have to go back to the previous bfin dist release (with older kernel
and Xenomai 2.4.x) if there's no fix.

I'm unable to investigate further within the ipipe/xenomai kernel code myself,
but I would like to create at least an example that is as small as possible yet
still triggers the problem, so others can reproduce it more easily. It seems I
need a RTDM (dummy?) driver that triggers some hardware interrupts, a thread
talking to it, and another thread making system calls such as select() etc...
If you have any thoughts on
existing example code that I could use as a base, or generally some guess about
what kind of simple test code could trigger such a problem, I'd appreciate your
input. For a RTDM test driver, I could use timers or UART as a commonly available
source for interrupts? My system is based on a BF537, somewhat similar to the STAMP.

Thanks,
Kolja




* Re: [Xenomai-help] More blackfin kernel oops under heavy load
  2011-02-28 11:44 [Xenomai-help] More blackfin kernel oops under heavy load Kolja Waschk
@ 2011-02-28 12:08 ` Gilles Chanteperdrix
  2011-02-28 12:25   ` Kolja Waschk
  0 siblings, 1 reply; 14+ messages in thread
From: Gilles Chanteperdrix @ 2011-02-28 12:08 UTC (permalink / raw)
  To: xenoka09; +Cc: xenomai

Kolja Waschk wrote:
> Hi,
> 
> I currently can have a look again at the problems I initially described in
> 
> http://mail.gna.org/public/xenomai-help/2011-01/msg00091.html
> 
> In short, I get various exceptions (Illegal use of supervisor access, NULL pointer
> etc.) in a setup involving a RTDM driver using interrupts, some RT tasks, and network
> communication over normal (non-RT) sockets. The exceptions appear to occur in either
> gatekeeper/0 or in the thread that serves the network requests, not in the thread
> talking to the driver. It occurs only with the driver open however. It even
> appears in exactly the same way if the application accessing the RTDM driver
> and the webserver are started as separate processes. All other "normal"
> (non-RT) processes run stable.  I guess there is some problem with interrupts
> during system calls.
> 
> Initially I titled the posting with "...under heavy load". But now we know the
> problem always occurs after some time, just more often the higher the load. I
> probably have to go back to the previous bfin dist release (with older kernel
> and Xenomai 2.4.x) if there's no fix.
> 
> Now I'm quite unable to further investigate within the ipipe/xenomai kernel code but
> would like to create at least an example as small as possible but able to trigger the
> problem, so others may reproduce it more easily. It seems I need a RTDM (dummy?)
> driver that triggers some hardware interrupts, a thread talking to it, and another
> thread making system calls such as select() etc...

Does it happen if you use other system calls than select?

-- 
                                                                Gilles.



* Re: [Xenomai-help] More blackfin kernel oops under heavy load
  2011-02-28 12:25   ` Kolja Waschk
@ 2011-02-28 12:19     ` Gilles Chanteperdrix
  2011-02-28 12:48       ` Kolja Waschk
  0 siblings, 1 reply; 14+ messages in thread
From: Gilles Chanteperdrix @ 2011-02-28 12:19 UTC (permalink / raw)
  To: xenoka09; +Cc: xenomai

Kolja Waschk wrote:
>>> problem, so others may reproduce it more easily. It seems I need a RTDM (dummy?)
>>> driver that triggers some hardware interrupts, a thread talking to it, and another
>>> thread making system calls such as select() etc...
>> Does it happen if you use other system calls than select?
> 
> According to the "CURRENT PROCESS" info from the kernel message when those faults
> occur, it happens also in threads that do not use select() at all. There is one
> master server thread that select()s on sockets listening for incoming
> connections and several workers that just recv() and send().

Sorry, what I mean, is, does the bug happen if no thread ever calls select?

-- 
                                                                Gilles.



* Re: [Xenomai-help] More blackfin kernel oops under heavy load
  2011-02-28 12:08 ` Gilles Chanteperdrix
@ 2011-02-28 12:25   ` Kolja Waschk
  2011-02-28 12:19     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 14+ messages in thread
From: Kolja Waschk @ 2011-02-28 12:25 UTC (permalink / raw)
  To: xenomai

>> problem, so others may reproduce it more easily. It seems I need a RTDM (dummy?)
>> driver that triggers some hardware interrupts, a thread talking to it, and another
>> thread making system calls such as select() etc...
>
> Does it happen if you use other system calls than select?

According to the "CURRENT PROCESS" info from the kernel message when those faults
occur, it happens also in threads that do not use select() at all. There is one
master server thread that select()s on sockets listening for incoming
connections and several workers that just recv() and send().

Kolja





* Re: [Xenomai-help] More blackfin kernel oops under heavy load
  2011-02-28 12:48       ` Kolja Waschk
@ 2011-02-28 12:41         ` Gilles Chanteperdrix
  2011-02-28 12:56           ` Kolja Waschk
  0 siblings, 1 reply; 14+ messages in thread
From: Gilles Chanteperdrix @ 2011-02-28 12:41 UTC (permalink / raw)
  To: xenoka09; +Cc: xenomai

Kolja Waschk wrote:
>> Sorry, what I mean, is, does the bug happen if no thread ever calls select?
> 
> Yes. I just tried and replaced the select with a loop accept()ing on the now
> nonblocking socket plus some usleep() and still the fault occurs as with
> select(). No other thread in the current setup uses select().

Mmmm busy waiting? But is Linux able to run from time to time?

-- 
                                                                Gilles.



* Re: [Xenomai-help] More blackfin kernel oops under heavy load
  2011-02-28 12:19     ` Gilles Chanteperdrix
@ 2011-02-28 12:48       ` Kolja Waschk
  2011-02-28 12:41         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 14+ messages in thread
From: Kolja Waschk @ 2011-02-28 12:48 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

> Sorry, what I mean, is, does the bug happen if no thread ever calls select?

Yes. I just tried and replaced the select with a loop accept()ing on the now
nonblocking socket plus some usleep() and still the fault occurs as with
select(). No other thread in the current setup uses select().
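
For readers following along, the polling variant described above might look roughly like the sketch below. It is only an illustration and not the code from the actual server: the function names and the `handle_client` callback are hypothetical, and the 100 ms sleep is the value mentioned later in this thread.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP listening socket on the given port and make it non-blocking.
 * Returns the descriptor, or -1 on error. (Hypothetical helper.) */
int make_nonblocking_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int one = 1;
    int flags;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 8) < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* One polling step: returns a connected descriptor, -1 if no connection
 * is pending, or -2 on any other error. */
int try_accept(int listen_fd)
{
    int client = accept(listen_fd, NULL, NULL);

    if (client >= 0)
        return client;
    return (errno == EAGAIN || errno == EWOULDBLOCK) ? -1 : -2;
}

/* Polling loop: check for a connection, then sleep 100 ms so that the
 * rest of Linux gets a chance to run. handle_client is hypothetical. */
void poll_forever(int listen_fd, void (*handle_client)(int))
{
    for (;;) {
        int client = try_accept(listen_fd);

        if (client >= 0) {
            handle_client(client);
            close(client);
        }
        usleep(100000);
    }
}
```

The point of the experiment is only that select() is removed from the picture; the polling loop is otherwise less efficient than a blocking accept().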

Kolja





* Re: [Xenomai-help] More blackfin kernel oops under heavy load
  2011-02-28 12:41         ` Gilles Chanteperdrix
@ 2011-02-28 12:56           ` Kolja Waschk
  2011-02-28 13:52             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 14+ messages in thread
From: Kolja Waschk @ 2011-02-28 12:56 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

>> Yes. I just tried and replaced the select with a loop accept()ing on the now
>> nonblocking socket plus some usleep() and still the fault occurs as with
>> select(). No other thread in the current setup uses select().
>
> Mmmm busy waiting? But is Linux able to run from time to time?

Yes, a usleep(100000) after each "check" of all sockets allowed me to telnet
into the system while the server runs, for example.







* Re: [Xenomai-help] More blackfin kernel oops under heavy load
  2011-02-28 12:56           ` Kolja Waschk
@ 2011-02-28 13:52             ` Gilles Chanteperdrix
  2011-02-28 18:46               ` Kolja Waschk
  0 siblings, 1 reply; 14+ messages in thread
From: Gilles Chanteperdrix @ 2011-02-28 13:52 UTC (permalink / raw)
  To: xenoka09; +Cc: xenomai

Kolja Waschk wrote:
>>> Yes. I just tried and replaced the select with a loop accept()ing on the now
>>> nonblocking socket plus some usleep() and still the fault occurs as with
>>> select(). No other thread in the current setup uses select().
>> Mmmm busy waiting? But is Linux able to run from time to time?
> 
> Yes, an usleep(100000) after each "check" of all sockets allowed me to telnet
> into the system while the server runs, for example

I do not really understand what we are talking about. Are we talking
about Linux select/accept or Xenomai select/accept? Why not use the
blocking accept?

-- 
					    Gilles.



* Re: [Xenomai-help] More blackfin kernel oops under heavy load
  2011-02-28 13:52             ` Gilles Chanteperdrix
@ 2011-02-28 18:46               ` Kolja Waschk
  2011-03-01 10:50                 ` Kolja Waschk
  2011-03-01 12:13                 ` Gilles Chanteperdrix
  0 siblings, 2 replies; 14+ messages in thread
From: Kolja Waschk @ 2011-02-28 18:46 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

> I do not really understand what we are talking about. Are we talking
> about Linux select/accept or Xenomai select/accept? Why not using the

Linux select/accept. Using the blocking accept() would have changed the
behaviour somewhat compared to the original design. Anyway, I think that doesn't
actually matter much.

I have meanwhile derived a much smaller RTDM driver kernel module, a test
application with blocking accept() ;) and a Makefile that do not depend on any
particular external hardware anymore: a SPORT interface (SPORT1 receiver) is
configured with internal clock and frame sync generation, so it generates a lot
of interrupts by itself, and this altogether reproduces the problem quite
quickly on my system. The files together are less than 1000 lines, 20 kB.

I'd really appreciate it if you or someone else could take a look at it and
maybe try the code on their own BF537 system to see whether the same faults
occur, and why. May I post the files here (as a zipped attachment? inline)?
I've already uploaded a copy at

> http://www.ixo.de/tmp/till20110228.tgz

If I can do anything to make it more comfortable to try this, please let me know.

Short docs: modt.c is derived from a larger "real" driver. I tried to strip
much of the processing code from it. It no longer configures the SPORT interface
to do anything sensible, just to generate a lot of interrupts, including
overflow error interrupts. After putting the path to the kernel into the
Makefile, it should compile the module out-of-tree if you simply run "make".

The test application "till" starts one thread that continuously contacts the
modt driver via ioctl. This ioctl does nothing but wait for an event that
is raised by the interrupts. The original application uses this ioctl()
to receive a pointer to a single received packet in mapped memory. I modified
the original driver so that it just receives data into one and the same buffer,
where the original used a ring of several buffers.

Another thread accept()s incoming connections on port 80 and emits a small
chunk of text, enough to satisfy a simple wget.
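
As an illustration only (this is not the code from the archive), a blocking accept() thread of the kind described above could be sketched as follows. The function names and the reply text are made up for the example, and a real server would check errors more carefully.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read (and discard) whatever request has arrived, then send a tiny
 * HTTP/1.0 reply. Returns the number of bytes written, or -1 on error. */
ssize_t answer_client(int client_fd)
{
    static const char reply[] =
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        "\r\n"
        "hello\n";
    char request[512];

    if (read(client_fd, request, sizeof(request)) < 0)
        return -1;
    return write(client_fd, reply, strlen(reply));
}

/* Blocking accept loop: sleeps inside accept() until a client connects,
 * answers it, closes the connection and waits for the next one. */
void serve_forever(int listen_fd)
{
    for (;;) {
        int client = accept(listen_fd, NULL, NULL);

        if (client < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal, try again */
            break;          /* give up on any other error */
        }
        answer_client(client);
        close(client);
    }
}
```

A single request/response per connection like this is enough for a plain wget to complete.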

In the current configuration, the fault occurs immediately after loading the
module and starting "till" on my system. The SPORT configuration probably causes
far too many interrupts, but the fault still looks exactly like what I experience
in my real system. Previous setups required me to actually contact the server
and get some data, e.g. with

> while true; do wget http://host -O /dev/null ; done

1000x thanks in advance for taking a look...
Kolja










* Re: [Xenomai-help] More blackfin kernel oops under heavy load
  2011-02-28 18:46               ` Kolja Waschk
@ 2011-03-01 10:50                 ` Kolja Waschk
  2011-03-06  4:17                   ` Philippe Gerum
  2011-03-01 12:13                 ` Gilles Chanteperdrix
  1 sibling, 1 reply; 14+ messages in thread
From: Kolja Waschk @ 2011-03-01 10:50 UTC (permalink / raw)
  To: xenomai

Hi,

>> http://www.ixo.de/tmp/till20110228.tgz

After verifying that no one had checked it yet, I updated the code in the archive to reflect the "real" situation even better. The main thread prints a number on stdout every second, endlessly, waiting for SIGINT. To trigger the fault, it is necessary to contact the net server (with wget as I described) in a fast loop; after some time (about 2 seconds in my setup) the kernel fault occurs.

Parameters to tune the load on the system are the frame size (MODT_BUFLEN in modt.h) and a nop loop in till.c rtdm_loop() (line 84). The BF537 here is running with 133 MHz bus clock, 533 MHz system clock.

Kolja




* Re: [Xenomai-help] More blackfin kernel oops under heavy load
  2011-02-28 18:46               ` Kolja Waschk
  2011-03-01 10:50                 ` Kolja Waschk
@ 2011-03-01 12:13                 ` Gilles Chanteperdrix
  2011-03-01 17:47                   ` Philippe Gerum
  1 sibling, 1 reply; 14+ messages in thread
From: Gilles Chanteperdrix @ 2011-03-01 12:13 UTC (permalink / raw)
  To: xenoka09; +Cc: xenomai

Kolja Waschk wrote:
>> I do not really understand what we are talking about. Are we talking
>> about Linux select/accept or Xenomai select/accept? Why not using the
> 
> Linux select/accept. Using the blocking accept() would have changed the
> behaviour somewhat compared to the original design. Anyway, I think that doesn't
> actually matter much.
> 
> I have meanwhile derived a much smaller RTDM driver kernel module, test
> application with blocking accept() ;) plus Makefile that do not depend on any
> particular external hardware anymore: A SPORT interface (SPORT1 receiver) is
> configured with internal clock and frame sync generation and so on itself
> generates a lot of interrupts, and this alltogether quite quickly reproduces
> the problem on my system. The files together are less than 1000 lines, 20kb.
> 
> I'd really appreciate if you or someone could take a look at it and maybe try
> the code on his own bf537 system whether the same faults occur, and why. May I
> post the files here (as a zipped attachment? inline)? I've already uploaded
> a copy at
> 
>> http://www.ixo.de/tmp/till20110228.tgz

Thanks a lot. I do not know much about blackfin, Philippe is the
specialist. But having this code will certainly help find the issue.

Regards.

-- 
                                                                Gilles.



* Re: [Xenomai-help] More blackfin kernel oops under heavy load
  2011-03-01 12:13                 ` Gilles Chanteperdrix
@ 2011-03-01 17:47                   ` Philippe Gerum
  0 siblings, 0 replies; 14+ messages in thread
From: Philippe Gerum @ 2011-03-01 17:47 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On Tue, 2011-03-01 at 13:13 +0100, Gilles Chanteperdrix wrote:
> Kolja Waschk wrote:
> >> I do not really understand what we are talking about. Are we talking
> >> about Linux select/accept or Xenomai select/accept? Why not using the
> > 
> > Linux select/accept. Using the blocking accept() would have changed the
> > behaviour somewhat compared to the original design. Anyway, I think that doesn't
> > actually matter much.
> > 
> > I have meanwhile derived a much smaller RTDM driver kernel module, test
> > application with blocking accept() ;) plus Makefile that do not depend on any
> > particular external hardware anymore: A SPORT interface (SPORT1 receiver) is
> > configured with internal clock and frame sync generation and so on itself
> > generates a lot of interrupts, and this alltogether quite quickly reproduces
> > the problem on my system. The files together are less than 1000 lines, 20kb.
> > 
> > I'd really appreciate if you or someone could take a look at it and maybe try
> > the code on his own bf537 system whether the same faults occur, and why. May I
> > post the files here (as a zipped attachment? inline)? I've already uploaded
> > a copy at
> > 
> >> http://www.ixo.de/tmp/till20110228.tgz
> 
> Thanks a lot. I do not know much about blackfin, Philippe is the
> specialist. But having this code will certainly help find the issue.
> 

Clearly, yes. Thanks. I'll do my best to find cycles to have a look at
this asap.

> Regards.
> 

-- 
Philippe.





* Re: [Xenomai-help] More blackfin kernel oops under heavy load
  2011-03-01 10:50                 ` Kolja Waschk
@ 2011-03-06  4:17                   ` Philippe Gerum
  2011-03-06 11:58                     ` Kolja Waschk
  0 siblings, 1 reply; 14+ messages in thread
From: Philippe Gerum @ 2011-03-06  4:17 UTC (permalink / raw)
  To: Kolja Waschk; +Cc: xenomai

On Tue, 2011-03-01 at 11:50 +0100, Kolja Waschk wrote:
> Hi,
> 
> >> http://www.ixo.de/tmp/till20110228.tgz
> 
> After verifying that no one yet checked, I updated the code in the archive to even better reflect the "real" situation. The main thread emits a number on stdout every second endlessly, waiting for SIGINT. To trigger the fault, it is necessary to contact the net server (with wget as I described) in a fast loop and after some time (about 2 seconds in my setup) the kernel fault occurs.
> 
> Parameters to tune the load on the system are the frame size (MODT_BUFLEN in modt.h) and a nop loop in till.c rtdm_loop() (line 84). The BF537 here is running with 133 MHz bus clock, 533 MHz system clock.
> 

100% reproducible here as well. Does this fix help?

diff --git a/arch/blackfin/mach-common/entry.S b/arch/blackfin/mach-common/entry.S
index a5847f5..a7650ce 100644
--- a/arch/blackfin/mach-common/entry.S
+++ b/arch/blackfin/mach-common/entry.S
@@ -892,8 +892,17 @@ ENDPROC(_ret_from_exception)
 #ifdef CONFIG_IPIPE
 
 _resume_kernel_from_int:
+	r1 = LO(~0x8000) (Z);
+	r1 = r0 & r1;
+	r0 = 1;
+	r0 = r1 - r0;
+	r2 = r1 & r0;
+	cc = r2 == 0;
+	/* Sync the root stage only from the outer interrupt level. */
+	if !cc jump .Lnosync;
 	r0.l = ___ipipe_sync_root;
 	r0.h = ___ipipe_sync_root;
+	[--sp] = reti;
 	[--sp] = rets;
 	[--sp] = ( r7:4, p5:3 );
 	SP += -12;
@@ -901,6 +910,8 @@ _resume_kernel_from_int:
 	SP += 12;
 	( r7:4, p5:3 ) = [sp++];
 	rets = [sp++];
+	reti = [sp++];
+.Lnosync:
 	rts
 #else
 #define _resume_kernel_from_int	 2f
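
For readers less familiar with Blackfin assembly, the test added at the top of _resume_kernel_from_int can be read as the usual "at most one bit set" check, x & (x - 1) == 0, applied after bit 15 is masked off. The C sketch below mirrors the instructions one by one. It assumes that r0 holds a bitmask of active interrupt levels on entry, which is an interpretation and is not stated in the thread; the function name is hypothetical.

```c
#include <stdint.h>

/* Returns 1 when the root stage would be synced (at most one bit left
 * after masking out bit 15), 0 when the patch jumps to .Lnosync. */
int may_sync_root(uint32_t levels)
{
    uint32_t r1 = levels & 0x7fffu;  /* r1 = LO(~0x8000) (Z); r1 = r0 & r1 */
    uint32_t r0 = r1 - 1u;           /* r0 = 1; r0 = r1 - r0 */
    uint32_t r2 = r1 & r0;           /* clears the lowest set bit of r1 */

    return r2 == 0;                  /* cc = r2 == 0 */
}
```

Under that assumption, a mask such as 0x8010 (one level besides bit 15) passes the check, while 0x8050 (two nested levels) does not, which matches the comment "Sync the root stage only from the outer interrupt level".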

-- 
Philippe.





* Re: [Xenomai-help] More blackfin kernel oops under heavy load
  2011-03-06  4:17                   ` Philippe Gerum
@ 2011-03-06 11:58                     ` Kolja Waschk
  0 siblings, 0 replies; 14+ messages in thread
From: Kolja Waschk @ 2011-03-06 11:58 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

> 100% reproducible here as well. Does this fix help?
> +++ b/arch/blackfin/mach-common/entry.S

Great, yes!

The real system is already up with this fix and has been running at highest load
for about an hour now, stable, without any OOPses.

Really good news. I was already preparing to backport some of our code to the older kernel... thank you so much!

Kolja



