From: Vikesh Rambaran <vikesh.rambaran@domain.hid>
To: Jan Kiszka <jan.kiszka@domain.hid>
Cc: xenomai-help <xenomai@xenomai.org>
Subject: Re: [Xenomai-help] rtserial interface stalls
Date: Fri, 20 Mar 2009 17:38:57 +0200
Message-ID: <1237563537.9844.150.camel@domain.hid>
In-Reply-To: <49C381F8.9070107@domain.hid>
On Fri, 2009-03-20 at 12:46 +0100, Jan Kiszka wrote:
> Vikesh Rambaran wrote:
> > Hi
> >
> > We have the following setup :
> >
> > Hardware : Intel Core Duo, Quatech PCI 200/300 4 channel serial card
>
> Do you see the same issue when using only one core?
>
I set the CPU affinity to 1 using
    echo 1 > /proc/xenomai/affinity
(I also tried echo 2 > /proc/xenomai/affinity) and confirmed that the
tasks do get assigned to the specified CPU. I also reverted to
    serialConfig.tx_timeout = RTSER_DEF_TIMEOUT;
as per your suggestion below. Task 2 still ends up waiting on the write
to rtser0.
> > Software : Ubuntu 8.04, Linux 2.6.24, Xenomai 2.4.5.
> >
> > Application : 3 tasks running at (Task1) 100 us (Task2) 20 ms (Task3) 1 s
> >
> > Tasks 1 and 3 are 'empty' at the moment. Task 2 transmits and receives
> > multiple messages on rtser0 and rtser1
> >
> > Serial ports are configured as follows
> >
> > serialConfig.config_mask = 0xFFFF;
> > serialConfig.baud_rate = configPtr->BitRate; //= 115200
> > serialConfig.parity = configPtr->Parity; //= None
> > serialConfig.data_bits = configPtr->DataBits; //= 8
> > serialConfig.stop_bits = configPtr->StopBits; //= 1
> > serialConfig.handshake = configPtr->FlowControl;//= None
> > serialConfig.fifo_depth = RTSER_DEF_FIFO_DEPTH;
> > serialConfig.rx_timeout = RTDM_TIMEOUT_NONE;//RTSER_DEF_TIMEOUT;
> > serialConfig.tx_timeout = RTSER_DEF_TIMEOUT;
> > serialConfig.event_timeout = RTSER_DEF_TIMEOUT;
> > serialConfig.timestamp_history= RTSER_RX_TIMESTAMP_HISTORY;
> > serialConfig.event_mask = RTSER_DEF_EVENT_MASK;
> >
> >
> > Application runs for hours/days with loop back on PCI card connector
> >
> >
> > Problem
> > -------
> >
> > However, when connected to a second PC or an external test unit, task 2
> > stops after a few minutes. The other tasks remain active.
> >
> >
> > /proc/xenomai/ while running
> >
> >
> > seeker@domain.hid$ cat stat
> > CPU PID MSW CSW PF STAT %CPU NAME
> > 0 0 0 43778348 0 00500080 82.9 ROOT/0
> > 1 0 0 0 0 00500080 100.0 ROOT/1
> > 0 8284 1 280862 0 00300184 5.0 ASU_SYNC
> > 0 8285 2 3946 0 00300184 0.2 HILS
> > 0 8286 1 33 0 00300184 0.0 DEBUG
> > 0 0 0 8429 0 00000000 4.1 IRQ16: rtser0
> > 1 0 0 0 0 00000000 0.0 IRQ16: rtser0
> > 0 0 0 96987 0 00000000 6.1 IRQ16: rtser1
> > 1 0 0 0 0 00000000 0.0 IRQ16: rtser1
> > 0 0 0 45948236 0 00000000 1.2 IRQ233: [timer]
> > 1 0 0 2102218 0 00000000 0.0 IRQ233: [timer]
> > seeker@domain.hid$ cat sched
> > CPU PID PRI PERIOD TIMEOUT TIMEBASE STAT NAME
> > 0 0 -1 0 0 master R ROOT/0
> > 1 0 -1 0 0 master R ROOT/1
> > 0 8284 3 100000 51111 master D ASU_SYNC
> > 0 8285 2 20000000 13078512 master D HILS
> > 0 8286 1 1000000000 553400299 master D DEBUG
> >
> >
> > /proc/xenomai/ when task stalls
> >
> > seeker@domain.hid$ cat sched
> > CPU PID PRI PERIOD TIMEOUT TIMEBASE STAT NAME
> > 0 0 -1 0 0 master R ROOT/0
> > 1 0 -1 0 0 master R ROOT/1
> > 0 6223 3 100000 67714 master D ASU_SYNC
> > 0 6224 2 20000000 0 master W HILS
> > 0 6225 1 1000000000 669570391 master D DEBUG
> >
> > seeker@domain.hid$ cat stat
> > CPU PID MSW CSW PF STAT %CPU NAME
> > 0 0 0 37141696 0 00500080 85.1 ROOT/0
> > 1 0 0 0 0 00500080 100.0 ROOT/1
> > 0 6223 1 16318556 0 00300184 4.9 ASU_SYNC
> > 0 6224 1 9506 0 00300182 0.0 HILS
> > 0 6225 1 1756 0 00300184 0.0 DEBUG
> > 0 0 0 20694 0 00000000 2.8 IRQ16: rtser0
> > 1 0 0 0 0 00000000 0.0 IRQ16: rtser0
> > 0 0 0 5471547 0 00000000 5.5 IRQ16: rtser1
> > 1 0 0 0 0 00000000 0.0 IRQ16: rtser1
> > 0 0 0 39087824 0 00000000 1.2 IRQ233: [timer]
> > 1 0 0 1891732 0 00000000 0.0 IRQ233: [timer]
> >
> >
> > Alternative tried (after a bit of debugging)
> > -----------------
> >
> > Changed serialConfig.tx_timeout = RTSER_DEF_TIMEOUT;
> > to serialConfig.tx_timeout = RTDM_TIMEOUT_NONE;
> >
> > Task 2 then runs continuously. Writing to rtser0 returns a valid number
> > of bytes written, but no data appears on the serial port pin. At the
> > same time rtser1 functions normally.
>
> Without feedback from the device about its tx queue state you may
> quickly overload it this way (definitely if written bytes > fifo length).
>
> >
The idea is to write data into the device's circular buffer and return
immediately. If there is not enough space in the buffer, I expected
the write call to return an error code or fewer bytes than were
requested. That would indicate a buffer overrun condition, which can
be flagged at application level. This way the task is not delayed
and other important functionality can execute deterministically.

The data transmitted on each serial channel is less than 150 bytes at
115200 bit/s, with the task having a fixed period of 20 ms. This should
not overflow the default 4k buffers of the 16550A driver.

Well, that's the plan :) Did I perhaps misunderstand the implementation
of tx_timeout?
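
To make the expectation above concrete, here is a minimal sketch in plain C.
It is only a model under stated assumptions, not the 16550A driver's code:
the names tx_ring, tx_ring_write_nonblock, tx_ring_drain and tx_time_us are
hypothetical, the ring size is the 4k default mentioned above, and the line
time assumes 8N1 framing (10 bits on the wire per byte), matching the
configuration quoted earlier.

```c
#include <stddef.h>

/* Hypothetical model of a driver-side TX ring buffer. This is NOT the
 * 16550A driver's code; it only illustrates the expected semantics. */
#define TX_RING_SIZE 4096u /* the "default 4k buffer" mentioned above */

struct tx_ring {
    unsigned char data[TX_RING_SIZE];
    size_t head;  /* index of the next free slot */
    size_t count; /* bytes queued and not yet transmitted */
};

/* Non-blocking write: queue as many bytes as fit and return that number.
 * A return value smaller than len tells the caller the buffer was full. */
static size_t tx_ring_write_nonblock(struct tx_ring *r,
                                     const unsigned char *buf, size_t len)
{
    size_t free_space = TX_RING_SIZE - r->count;
    size_t n = (len < free_space) ? len : free_space;
    size_t i;

    for (i = 0; i < n; i++) {
        r->data[r->head] = buf[i];
        r->head = (r->head + 1) % TX_RING_SIZE;
    }
    r->count += n;
    return n;
}

/* Stand-in for the TX interrupt path: the UART removes up to n bytes. */
static size_t tx_ring_drain(struct tx_ring *r, size_t n)
{
    size_t drained = (n < r->count) ? n : r->count;

    r->count -= drained;
    return drained;
}

/* Microseconds needed to shift out len bytes, assuming 10 bits on the
 * wire per byte (start + 8 data + 1 stop, no parity). */
static unsigned long long tx_time_us(unsigned long long len,
                                     unsigned long long baud)
{
    return (len * 10ULL * 1000000ULL) / baud;
}
```

Under these assumptions tx_time_us(150, 115200) comes to roughly 13 ms, which
fits inside the 20 ms period, so the queue should empty between cycles as long
as TX interrupts keep arriving. If they stop, a 4096-byte ring fed 150 bytes
per cycle would first report a short write on the 28th write, which is the
condition the application could flag.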
> > /proc/xenomai/stat shows rtser0 CSW incrementing. Disconnecting rtser0
> > stops CSW from incrementing.
> >
> > Looks like the rtser0 tx interrupt gets 'lost' somehow and never
> > recovers. Restarting the application restores communication again, for a
> > while ...
> >
> >
> > Has anyone else experienced a similar situation?
>
> Not with the current versions. But there are many factors that may
> influence the situation.
>
> >
> > Any suggestions on how to trace this further, would be greatly
> > appreciated.
>
Will implement the rest of your suggestions and provide feedback.
> First of all, it would in fact be good to rule out issues of the old
> kernel/ipipe combination wrt IRQ handling by giving the latest versions
> a try (2.6.28 + Xenomai 2.4.7). Then you may want to consider setting up
> a tracer:
>
> The ipipe function tracer (see Xenomai wiki) would make sense when you
> can identify a failure very quickly (within a few 100 us or so) and
> trigger a stop. That is required as the ipipe tracer works on the lowest
> level (kernel functions) and quickly fills up its circular buffer with
> new events.
>
> The LTTng tracer provides a higher level view on the problem and could
> perfectly run over a longer period. See related postings on this list
> for details (I'm currently maintaining a 2.6.28 port for ipipe, see also
> git.kiszka.org).
>
> Once you have a picture of what goes on in the kernel generally, you may
> add ad-hoc instrumentations to driver or kernel (or we can discuss where
> to add them) to find out what actually happens.
>
> Jan
>