From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
To: GP Orcullo <kinsamanka@gmail.com>
Cc: "xenomai@xenomai.org" <xenomai@xenomai.org>
Subject: Re: [Xenomai] Switchtest failures on ODROIDU3
Date: Thu, 02 Oct 2014 15:36:00 +0200
Message-ID: <542D54C0.3070001@xenomai.org>
In-Reply-To: <CACreCVJvfuMeRCVdE6NTPK=frJ3_-PL-TDrZy9U5fLAW7gWjcg@mail.gmail.com>

On 10/02/2014 03:27 PM, GP Orcullo wrote:
> On Wed, Oct 1, 2014 at 5:20 PM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
>> On 10/01/2014 11:12 AM, GP Orcullo wrote:
>>> On Oct 1, 2014 3:54 PM, "Gilles Chanteperdrix" <
>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>
>>>> On 10/01/2014 01:32 AM, GP Orcullo wrote:
>>>>> On Sep 30, 2014 8:16 PM, "Gilles Chanteperdrix" <
>>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>
>>>>>> On 09/30/2014 02:04 PM, GP Orcullo wrote:
>>>>>>> On Sep 30, 2014 7:30 PM, "Gilles Chanteperdrix" <
>>>>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>>
>>>>>>>> On 09/30/2014 07:31 AM, GP Orcullo wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Running the switchtest for extended periods (>10 mins) causes the
>>>>>>>>> machine to lockup.
>>>>>>>>>
>>>>>>>>> I'm running a modified xeno-regression-test which contains only the
>>>>>>>>> following tests:
>>>>>>>>>
>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest
>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest -s 1000
>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/latency ${1+"$@"}
>>>>>>>>>
>>>>>>>>> The script is invoked with the following arguments:
>>>>>>>>>
>>>>>>>>> nohup sudo ./xeno-regression-test -l
>>>>>>>>> "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2 >
>>>>>>>>> /dev/null & top -d0.5
>>>>>>>>>
>>>>>>>>> The kernel dumps the OOPS information intermittently so it's
>>>>>>>>> difficult to diagnose the issue.
>>>>>>>>>
>>>>>>>>> Attached is the kernel config and the logfile.
>>>>>>>>
>>>>>>>> Ok, this is an exynos. Sorry, but I have never seen the patch for
>>>>>>>> exynos, so I do not know what is inside. You should direct your
>>>>>>>> questions to whoever provided you with this support.
>>>>>>>
>>>>>>> I'm in the process of porting xenomai to run on exynos.
>>>>>>>
>>>>>>> The ipipe-core-3.8.13-arm-3.patch applies cleanly to the 3.8.13.11
>>>>>>> kernel used by the odroid U3 board.
>>>>>>>
>>>>>>> Attached is the ipipe patch that I've made.
>>>>>>>
>>>>>>> I was just wondering what would cause switchtest to fail. The error
>>>>>>> that I can see is that the system is running out of memory and I
>>>>>>> don't know exactly what is causing this.
>>>>>>
>>>>>> Certainly not switchtest as it does not do any memory allocation.
>>>>>> However, the dohell script has a loop creating a large file and
>>>>>> removing it. So, could you try and run the dohell script with an
>>>>>> unpatched kernel and see if you have the error?
>>>>>>
>>>>>
>>>>> Running dohell on a patched and unpatched kernel doesn't trigger the
>>>>> lockup.
>>>>>
>>>>> Running switchtest without dohell works OK.
>>>>
>>>> Is the problem a lockup, or an OOM?
>>>>
>>>
>>> It's a lockup.
>>>
>>> The OOM message is the only one that I've captured so far.  Most of the
>>> time the kernel doesn't spew any messages before the lockup.
>>>
>>> The lockups are repeatable but generating any error messages isn't.
>>
>> Are you running the tests on the serial console, or with ssh? Do you
>> have unlocked context switch enabled? Have you tried enabling some debug
>> options?
>>
> 
> I'm using the serial console to log the kernel messages and ssh to run
> the command. Using purely the serial console has the same results.

The main point was to avoid redirecting standard error to /dev/null, so
that any application error message stays visible. Doing this on the
serial console may be a better idea than over ssh, because you are less
likely to miss a message sent just before the system dies.
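
As a rough sketch of what I mean: the helper below runs a command with
standard error merged into standard output, keeps it visible on the
terminal, and saves a copy to a file. The name run_logged and the log
file name are made up for illustration; they are not part of the Xenomai
testsuite.

```shell
# run_logged: hypothetical helper, not part of the Xenomai testsuite.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
# Merges stderr into stdout, shows everything on the terminal and keeps
# a copy in LOGFILE.
run_logged() {
    log=$1
    shift
    "$@" 2>&1 | tee "$log"
}

# Example, reusing the arguments of the original invocation but without
# the redirection to /dev/null (run as root instead of through sudo):
# run_logged switchtest.log ./xeno-regression-test -l \
#     "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2
```

Note that if the machine locks up hard, the last lines written to the
log file may never reach the disk, which is one more reason to watch the
output on the serial console.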

> 
> Is this the context switch?: "CONFIG_XENO_HW_UNLOCKED_SWITCH=y"

Yes, please try to disable it if you have it enabled.
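
If you are unsure which setting a given kernel was built with, you can
look the option up in the .config used for the build. A minimal sketch,
assuming the .config file is at hand; option_state is a made-up helper
name:

```shell
# option_state: hypothetical helper for inspecting a kernel .config.
# Usage: option_state CONFIG_FILE OPTION
# Prints the option's value (for instance "y"), "n" if the file says
# "# OPTION is not set", or "absent" if the option is not mentioned.
option_state() {
    cfg=$1
    opt=$2
    if grep -q "^${opt}=" "$cfg"; then
        sed -n "s/^${opt}=//p" "$cfg"
    elif grep -q "^# ${opt} is not set" "$cfg"; then
        echo n
    else
        echo absent
    fi
}

# Example:
# option_state .config CONFIG_XENO_HW_UNLOCKED_SWITCH
```

After disabling the option, rebuild and boot the new kernel before
running the test again.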

> 
> I will try playing again with the debug options and see if I can get
> something useful.
> 
>> Also note that xeno-regression-test puts the system under a lot of
>> stress, so it may happen that there is no output for some time (several
>> minutes), normally the test should stop by itself if there is no output
>> for something like 30 minutes. So, I would recommend not redirecting
>> xeno-test output to see if there is any error before the lockup, and
>> when you see the lockup, leave the system for 30 minutes to see if it
>> does not restart or if xeno-regression-test can exit gracefully.
>>
> 
> This is a total lockup. There's a heartbeat led that dies when it occurs.

Well, the heartbeat LED does not prove anything: some Linux kernel
activity can very well prevent it from being toggled. For instance, if
the LED is toggled by a thread and the activity hogging the kernel is a
softirq that never ends, the LED stops blinking even though the system
is not dead.

> 
> Attached is one error log that I had captured previously and this one
> had the CONFIG_CPU_IDLE enabled. I've lost track on which kernel this
> trace came from but maybe the error looks familiar.

This trace is missing an important piece of information: the reason for
the error. So, please capture the serial console to a file, and post the
complete file, from boot up to the error.

Anyway, you did not answer my question: did you try leaving the system
on for, say, 30 minutes or 1 hour after the lockup to see whether it
recovers?


-- 
                                                                Gilles.