From: Vikesh Rambaran <vikesh.rambaran@domain.hid>
To: Jan Kiszka <jan.kiszka@domain.hid>
Cc: xenomai-help <xenomai@xenomai.org>
Subject: Re: [Xenomai-help] rtserial interface stalls
Date: Fri, 20 Mar 2009 17:38:57 +0200
Message-ID: <1237563537.9844.150.camel@domain.hid>
In-Reply-To: <49C381F8.9070107@domain.hid>

On Fri, 2009-03-20 at 12:46 +0100, Jan Kiszka wrote:
> Vikesh Rambaran wrote:
> > Hi
> > 
> > We have the following setup : 
> > 
> > Hardware : Intel Core Duo, Quatech PCI 200/300 4 channel serial card
> 
> Do you see the same issue when using only one core?
> 

I set the CPU affinity to 1 using echo 1 > /proc/xenomai/affinity
(also tried echo 2 > /proc/xenomai/affinity)

I have confirmed that the tasks do get assigned to the specified cpu.

I also reverted to serialConfig.tx_timeout = RTSER_DEF_TIMEOUT;
as per your suggestion below.

Task 2 still ends up waiting on the write to rtser0.

> > Software : Ubuntu 8.04, Linux 2.6.24, Xenomai 2.4.5.
> > 
> > Application : 3 tasks running at (Task1) 100 us (Task2) 20 ms (Task3) 1 s
> > 
> > Tasks 1 and 3 are 'empty' at the moment. Task 2 transmits and receives
> > multiple messages on rtser0 and rtser1
> > 
> > Serial ports are configured as follows
> > 
> >   serialConfig.config_mask      = 0xFFFF;
> >   serialConfig.baud_rate        = configPtr->BitRate;    //= 115200
> >   serialConfig.parity           = configPtr->Parity;     //= None
> >   serialConfig.data_bits        = configPtr->DataBits;   //= 8
> >   serialConfig.stop_bits        = configPtr->StopBits;   //= 1
> >   serialConfig.handshake        = configPtr->FlowControl;//= None 
> >   serialConfig.fifo_depth       = RTSER_DEF_FIFO_DEPTH;
> >   serialConfig.rx_timeout       = RTDM_TIMEOUT_NONE;//RTSER_DEF_TIMEOUT;
> >   serialConfig.tx_timeout       = RTSER_DEF_TIMEOUT;
> >   serialConfig.event_timeout    = RTSER_DEF_TIMEOUT;
> >   serialConfig.timestamp_history= RTSER_RX_TIMESTAMP_HISTORY;
> >   serialConfig.event_mask       = RTSER_DEF_EVENT_MASK;
> > 
> > 
> > Application runs for hours/days with loop back on PCI card connector
> > 
> > 
> > Problem
> > -------
> > 
> > However, between two PCs or external test unit, task 2 stops after a
> > few minutes. The other tasks are still active.
> > 
> > 
> > /proc/xenomai/ while running
> > 
> > 
> > seeker@domain.hid$ cat stat 
> > CPU  PID    MSW        CSW        PF    STAT       %CPU  NAME
> >   0  0      0          43778348   0     00500080   82.9  ROOT/0
> >   1  0      0          0          0     00500080  100.0  ROOT/1
> >   0  8284   1          280862     0     00300184    5.0  ASU_SYNC
> >   0  8285   2          3946       0     00300184    0.2  HILS
> >   0  8286   1          33         0     00300184    0.0  DEBUG
> >   0  0      0          8429       0     00000000    4.1  IRQ16: rtser0
> >   1  0      0          0          0     00000000    0.0  IRQ16: rtser0
> >   0  0      0          96987      0     00000000    6.1  IRQ16: rtser1
> >   1  0      0          0          0     00000000    0.0  IRQ16: rtser1
> >   0  0      0          45948236   0     00000000    1.2  IRQ233: [timer]
> >   1  0      0          2102218    0     00000000    0.0  IRQ233: [timer]
> > seeker@domain.hid$ cat sched 
> > CPU  PID    PRI      PERIOD     TIMEOUT    TIMEBASE  STAT       NAME
> >   0  0       -1      0          0          master    R          ROOT/0
> >   1  0       -1      0          0          master    R          ROOT/1
> >   0  8284     3      100000     51111      master    D          ASU_SYNC
> >   0  8285     2      20000000   13078512   master    D          HILS
> >   0  8286     1      1000000000 553400299  master    D          DEBUG
> > 
> > 
> > /proc/xenomai/ when task stalls
> > 
> > seeker@domain.hid$ cat sched 
> > CPU  PID    PRI      PERIOD     TIMEOUT    TIMEBASE  STAT       NAME
> >   0  0       -1      0          0          master    R          ROOT/0
> >   1  0       -1      0          0          master    R          ROOT/1
> >   0  6223     3      100000     67714      master    D          ASU_SYNC
> >   0  6224     2      20000000   0          master    W          HILS
> >   0  6225     1      1000000000 669570391  master    D          DEBUG
> > 
> > seeker@domain.hid$ cat stat 
> > CPU  PID    MSW        CSW        PF    STAT       %CPU  NAME
> >   0  0      0          37141696   0     00500080   85.1  ROOT/0
> >   1  0      0          0          0     00500080  100.0  ROOT/1
> >   0  6223   1          16318556   0     00300184    4.9  ASU_SYNC
> >   0  6224   1          9506       0     00300182    0.0  HILS
> >   0  6225   1          1756       0     00300184    0.0  DEBUG
> >   0  0      0          20694      0     00000000    2.8  IRQ16: rtser0
> >   1  0      0          0          0     00000000    0.0  IRQ16: rtser0
> >   0  0      0          5471547    0     00000000    5.5  IRQ16: rtser1
> >   1  0      0          0          0     00000000    0.0  IRQ16: rtser1
> >   0  0      0          39087824   0     00000000    1.2  IRQ233: [timer]
> >   1  0      0          1891732    0     00000000    0.0  IRQ233: [timer]
> > 
> > 
> > Alternative tried (after a bit of debugging)
> > -----------------
> > 
> > Changed  serialConfig.tx_timeout       = RTSER_DEF_TIMEOUT;
> > to       serialConfig.tx_timeout       = RTDM_TIMEOUT_NONE;
> > 
> > Task 2 then runs continuously. Writing to rtser0 returns valid number of
> > bytes written but no data appears on serial port pin. At the same time
> > rtser1 functions normally.
> 
> Without feedback from the device about its tx queue state you may
> quickly overload it this way (definitely if written bytes > fifo length).
> 
> > 

The idea is to write data into the device's circular buffer and return
immediately. If there is not enough space in the buffer, I expected
the write call to return an error code or fewer bytes than were
requested. That would indicate a buffer overrun condition which can
be flagged at application level. This way the task will not be delayed
and other important functionality can be executed in a deterministic
way.
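
A minimal sketch of the application-level handling described above. It
assumes the write call returns either the number of bytes accepted or a
negative errno value; whether the rtser write really behaves this way
with RTDM_TIMEOUT_NONE is the open question here. All names
(tx_write_fn, tx_try_send, fake_write) are hypothetical and not part of
Xenomai; fake_write only stands in for the real call so the logic can be
exercised without hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical signature for the non-blocking write call under test. */
typedef ssize_t (*tx_write_fn)(int fd, const void *buf, size_t len);

enum tx_result { TX_COMPLETE, TX_PARTIAL, TX_WOULD_BLOCK, TX_FAILED };

/* Issue one write and classify the outcome without ever waiting.
 * *accepted receives the number of bytes the driver took (0 on error). */
static enum tx_result tx_try_send(tx_write_fn do_write, int fd,
                                  const void *buf, size_t len,
                                  size_t *accepted)
{
    assert(do_write != NULL && accepted != NULL);

    ssize_t ret = do_write(fd, buf, len);
    if (ret < 0) {
        *accepted = 0;
        /* Assumption: a full buffer is reported as -EAGAIN
         * (EWOULDBLOCK has the same value on Linux). */
        return (ret == -EAGAIN) ? TX_WOULD_BLOCK : TX_FAILED;
    }

    *accepted = (size_t)ret;
    return ((size_t)ret == len) ? TX_COMPLETE : TX_PARTIAL;
}

/* Demonstration stand-in for the driver: a buffer with fake_free bytes
 * of space left that never blocks. */
static size_t fake_free = 0;

static ssize_t fake_write(int fd, const void *buf, size_t len)
{
    (void)fd;
    (void)buf;
    if (fake_free == 0)
        return -EAGAIN;
    size_t n = (len < fake_free) ? len : fake_free;
    fake_free -= n;
    return (ssize_t)n;
}
```

With this shape, TX_PARTIAL and TX_WOULD_BLOCK are the two outcomes the
task would flag as an overrun before carrying on with the rest of its
period.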

The data transmitted on each serial channel is less than 150 bytes per
period at 115200 bit/s, with the task having a fixed period of 20 ms.
This should not overflow the default 4k buffers of the 16550A driver.
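
The arithmetic behind that estimate can be sketched as follows. It
assumes 10 bits on the wire per byte (1 start + 8 data + 1 stop, no
parity, matching the port configuration quoted above) and ignores any
gaps between characters. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Microseconds needed to shift `bytes` characters out of the UART,
 * rounded down. */
static unsigned long tx_time_us(unsigned long baud,
                                unsigned bits_per_char, size_t bytes)
{
    assert(baud > 0);
    return (unsigned long)((bytes * bits_per_char * 1000000ULL) / baud);
}

/* Largest number of bytes the line can drain within one task period. */
static unsigned long drain_capacity_bytes(unsigned long baud,
                                          unsigned bits_per_char,
                                          unsigned long period_us)
{
    assert(baud > 0 && bits_per_char > 0);
    return (unsigned long)(((unsigned long long)baud * period_us) /
                           (bits_per_char * 1000000ULL));
}
```

For the figures above, tx_time_us(115200, 10, 150) gives 13020 us, and
drain_capacity_bytes(115200, 10, 20000) gives 230 bytes. Under these
assumptions a 150-byte burst is fully sent within each 20 ms period, so
the output buffer should not accumulate a backlog.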

Well, that's the plan :) Did I perhaps misunderstand the implementation
of the tx_timeout?


> > /proc/xenomai/stat shows rtser0 CSW incrementing. Disconnecting rtser0
> > stops CSW from incrementing.
> > 
> > Looks like the rtser0 tx interrupt gets 'lost' somehow and never
> > recovers. Restarting the application restores communication again, for a
> > while ...
> > 
> > 
> > Has any one else experienced a similar situation ?
> 
> Not with the current versions. But there are many factors that may
> influence the situation.
> 
> > 
> > Any suggestions on how to trace this further, would be greatly
> > appreciated.
> 

Will implement the rest of your suggestions and provide feedback.

> First of all, it would in fact be good to rule out issues of the old
> kernel/ipipe combination wrt IRQ handling by giving the latest versions a
> try (2.6.28 + Xenomai 2.4.7). Then you may want to consider setting up a
> tracer:
> 
> The ipipe function tracer (see Xenomai wiki) would make sense when you
> can identify a failure very quickly (within a few 100 us or so) and
> trigger a stop. That is required as the ipipe tracer works on the lowest
> level (kernel functions) and quickly fills up its circular buffer with
> new events.
> 
> The LTTng tracer provides a higher level view on the problem and could
> perfectly run over a longer period. See related postings on this list
> for details (I'm currently maintaining a 2.6.28 port for ipipe, see also
> git.kiszka.org).
> 
> Once you have a picture of what goes on in the kernel generally, you may
> add ad-hoc instrumentations to driver or kernel (or we can discuss where
> to add them) to find out what actually happens.
> 
> Jan
> 
