From: Pavel Pisa <pisa@fel.cvut.cz>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users@vger.kernel.org, Carsten Emde <c.emde@osadl.org>,
linux-can@vger.kernel.org,
Oliver Hartkopp <socketcan@hartkopp.net>,
Jan Altenberg <Jan.Altenberg@osadl.org>,
Pavel Hronek <hronepa1@fel.cvut.cz>
Subject: Re: Question for AMD/Xilinx Zynq PREEMP_RT configuration check, CAN latency measuremet and FOSDEM 2025
Date: Wed, 29 Jan 2025 13:04:15 +0100
Message-ID: <202501291304.15901.pisa@fel.cvut.cz>
In-Reply-To: <20250129101709.XQuo8Jle@linutronix.de>
Hello Sebastian,
On Wednesday 29 of January 2025 11:17:09 Sebastian Andrzej Siewior wrote:
> On 2025-01-28 16:29:27 [+0100], Pavel Pisa wrote:
> > Please check if you find some problematic choices.
>
> I didn't find anything obviously wrong. Assuming your CPU is busy in
> general you could remove NO_HZ in favour of PERIODIC. This is however
> not to cause spikes you describe below.
Great, thanks a lot for the expert review.
> > The cyclic test worked well, and we have even delivered two systems
> > to OSADL QA real-time farm
> >
> > https://www.osadl.org/?id=4109
>
> It shows "IRQ work interrupts". Not sure what causes them.
I am not sure either. That list is from an old kernel
in the long-term testing setup at OSADL.
The current one shows no IRQ work interrupts
after the last reboot and an overnight test
Linux mzapo 6.13.0-rc6-rt3-dut #1 SMP PREEMPT_RT
Wed Jan 29 04:46:40 CET 2025 armv7l GNU/Linux
           CPU0       CPU1
 24:          0          0     GIC-0  27 Edge      gt
 25:     700822     327164     GIC-0  29 Edge      twd
 26:        300          0     GIC-0  59 Level     xuartps
 29:          0          0     GIC-0  45 Level     f8003000.dmac
 30:          0          0     GIC-0  46 Level     f8003000.dmac
 31:          0          0     GIC-0  47 Level     f8003000.dmac
 32:          0          0     GIC-0  48 Level     f8003000.dmac
 33:          0          0     GIC-0  49 Level     f8003000.dmac
 34:          0          0     GIC-0  72 Level     f8003000.dmac
 35:          0          0     GIC-0  73 Level     f8003000.dmac
 36:          0          0     GIC-0  74 Level     f8003000.dmac
 37:          0          0     GIC-0  75 Level     f8003000.dmac
 40:     460330          0     GIC-0  54 Level     end0
 41:          0          0     GIC-0  53 Level     e0002000.usb
 42:        356          0     GIC-0  56 Level     mmc0
 43:          0          0     GIC-0  43 Level     ttc_clockevent
 44:         25          0     GIC-0  39 Level     f8007100.adc
 45:          0          0     GIC-0  37 Level     arm-pmu
 46:          0          0     GIC-0  38 Level     arm-pmu
 47:        128          0     GIC-0  40 Level     f8007000.devcfg
 48:     314697          0     GIC-0  61 Level     can2
 49:     314597          0     GIC-0  62 Level     can3
 50:     314759          0     GIC-0  63 Level     can4
 51:     311516          0     GIC-0  64 Level     can5
IPI0:          0          0  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:      17849     292126  Rescheduling interrupts
IPI3:       5923      11315  Function call interrupts
IPI4:          0          0  CPU stop interrupts
IPI5:     271078      74040  IRQ work interrupts
IPI6:          0          0  completion interrupts
Err:          0
So this does not seem to be the cause.
> > However, the CAN/CAN FD communication latency measured on the CTU CAN FD
> > IP core is far from optimal. Some runs under load with
> > 10 msec latency. Our own CAN FD stack for RTEMS keeps with no exception
> > under 60 usec on the same hardware.
> >
> > I understand that the Linux socket layer and networking
> > stack are complex, and many optimizations are ahead.
> > We will be happy to contribute where we can and find time
> > and even some resources to engage more students etc...
> >
> > But I would like to be sure that the bad results are not
> > caused by our mistakes in configuration.
>
> You have CAN and "regular networking". My guess would be that regular
> networking blocks BH and so your CAN. You could try to have all
> interrupts serviced on CPU0 and move CAN to CPU1. If so this should
> improve then. Other than that, I would suggest to get some tracing to
> see what delays your CAN interrupts and/ or handling in general.
Yes, I think that the design mixing regular network packet
processing with CAN is the problem. We even test with a setup where
the CAN interrupt priority is boosted to 90
echo "-> Raise CAN irq priorities"
# Quote the patterns so the shell cannot glob-expand them
PIDS=$(ps -e | grep -E 'irq/[0-9]+-can[3-4]' | tr -s ' ' | cut -d ' ' -f2)
TXPID=$(ps -e | grep -E 'irq/[0-9]+-can2' | tr -s ' ' | cut -d ' ' -f2)
chrt -f --pid 80 $TXPID
for pid in $PIDS ; do
    chrt -f --pid 85 $pid
done
ps Hxa --sort rtprio -o pid,policy,rtprio,state,tname,time,command
...
   70 FF      50 S ?        00:00:00 [irq/37-f8003000.dmac]
   71 FF      50 S ?        00:00:38 [irq/40-eth%d]
...
  405 FF      50 S ?        00:00:00 [irq/26-xuartps]
  355 FF      90 S ?        00:00:06 [irq/48-can2]
  361 FF      90 S ?        00:00:13 [irq/49-can3]
  366 FF      90 S ?        00:00:07 [irq/50-can4]
  371 FF      90 S ?        00:00:06 [irq/51-can5]
   22 FF      99 S ?        00:00:00 [migration/0]
   27 FF      99 S ?        00:00:00 [migration/1]
Even this setup is problematic under load.
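For reference, the CPU split suggested in the quoted reply (CAN on
CPU1, everything else on CPU0) could be tried along these lines.
This is only a sketch: the helper names cpu_mask, irqs_matching and
set_irq_affinity and the DRY_RUN switch are made up for illustration,
it assumes the two-CPU layout and the canN IRQ names from the listing
above, and per-CPU interrupts such as twd cannot be moved this way.

```shell
#!/bin/sh
# Sketch: helpers for steering IRQs via /proc/irq/<n>/smp_affinity.
# All function names and DRY_RUN are illustrative assumptions.

# Convert a CPU index into the hex bitmask expected by smp_affinity.
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# Print IRQ numbers whose last column matches the regex in $1.
# Reads /proc/interrupts unless another file is given as $2.
irqs_matching() {
    awk -v pat="$1" \
        '$1 ~ /^[0-9]+:$/ && $NF ~ pat { sub(":", "", $1); print $1 }' \
        "${2:-/proc/interrupts}"
}

# Apply mask $2 to IRQ $1. With DRY_RUN=1 (the default here) the
# command is only printed, so nothing changes by accident.
set_irq_affinity() {
    irq=$1 mask=$2
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "echo $mask > /proc/irq/$irq/smp_affinity"
    else
        echo "$mask" > "/proc/irq/$irq/smp_affinity"
    fi
}

# Example (as root, with DRY_RUN=0 to really apply it):
#   for irq in $(irqs_matching '^can[0-9]+$'); do
#       set_irq_affinity "$irq" "$(cpu_mask 1)"
#   done
```

The Ethernet and other IRQs would get the CPU0 mask in the same way;
whether this actually removes the spikes is of course untested here.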
The results with CAN IRQ priority 50 and 90 can be compared
by toggling the "RT priority set" option at
https://canbus.pages.fel.cvut.cz/can-latester/inspect.html?kernel=rt&prio=1&load=1&flood=1&fd=1
The switch between the in-kernel CAN gateway and the userspace one
is controlled by "Kernel GW".
The user CAN gateway is run with priority 80
chrt -r 80 ugw -f can3 can2
I spotted an interesting trend after
run-250103-045322-hist+6.13.0-rc1-rt1-g5374fecd2695+flood-prio-fd-load.json:
the user gateway case, a simple copy of frames from can3 to can2,
has not exceeded 1.4 ms for almost one month.
It could be interesting to correlate that with kernel changes.
We use the branch
for-kbuild-bot/current-stable
from
git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git
to run daily testing. We could consider something different,
but this choice was driven by our interest in something
functional every day and ahead of mainline merges, to
catch some problems in advance.
It is interesting that the in-kernel gateway is significantly worse
now. It does not have the overhead of switching to userspace. But I am
not sure whether it is invoked in some kernel worker which
has a lower or the same real-time priority as Ethernet networking.
In general, I think that the problem is that incoming
packets (CAN and Ethernet) load the same per-CPU
worker. There are even backlog_napi threads per CPU
46 TS - S ? 00:00:00 [backlog_napi/0]
47 TS - S ? 00:00:00 [backlog_napi/1]
They even have TS priority. If I remember well, an option has
been added to allocate a separate RX packet processing
thread (instead of the default per-CPU one) for a given interface.
But I have no experience with such a configuration.
Do you or somebody else have an idea how to achieve
that, and whether it is legal to boost the priority of such
a kernel thread? It could help, because my general experience
with PREEMPT_RT, even on this target, is very positive
for tasks mapping HW directly and doing RT control.
The same holds for the latency tester: no spikes under load
over 250 usec, or even less.
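To make the question concrete, this is roughly what I imagine such a
setup could look like. It is a sketch under assumptions, not something
I have verified on this target: it assumes the kernel offers the
per-interface sysfs attribute /sys/class/net/<iface>/threaded, that
the resulting kernel threads are named napi/<iface>-<id>, and that ps
accepts "-o pid=,comm=". The helper names and the DRY_RUN switch are
invented for illustration.

```shell
#!/bin/sh
# Sketch: switch one interface to threaded NAPI and raise the
# priority of its NAPI kernel thread(s). Names are illustrative.

# sysfs attribute assumed to control threaded NAPI for interface $1.
napi_threaded_path() {
    echo "/sys/class/net/$1/threaded"
}

# Read "PID COMM" lines on stdin and print the PIDs of threads
# whose name starts with napi/<iface>- for the interface in $1.
napi_pids() {
    awk -v pfx="napi/$1-" 'index($2, pfx) == 1 { print $1 }'
}

# Enable threaded NAPI on $1 and give its threads SCHED_FIFO
# priority $2. With DRY_RUN=1 (the default) only print the plan.
boost_napi() {
    iface=$1 prio=$2
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "echo 1 > $(napi_threaded_path "$iface")"
        echo "chrt -f -p $prio <pid of each napi/$iface-* thread>"
        return 0
    fi
    echo 1 > "$(napi_threaded_path "$iface")" || return 1
    for pid in $(ps -e -o pid=,comm= | napi_pids "$iface"); do
        chrt -f -p "$prio" "$pid"
    done
}

# Example (as root): DRY_RUN=0 boost_napi can3 80
```

Which priority would be sensible relative to the CAN IRQ threads and
the Ethernet IRQ thread is exactly the part I am unsure about.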
> > I will be happy to meet you and discuss Linux and other
> > control and real-time areas at FOSDEM 2025.
>
> I should be able to make it.
Great, I would be happy to meet at FOSDEM or to discuss
these topics later at some other event.
> > Slides in English which I want to update/correct for FOSDEM
> >
> >
> > https://talks.openalt.cz/media/openalt-2024/submissions/3XTMDF/resources/openalt24_linux_for_rt-reduced_FbZPuS0.pdf
>
> looks good. If you want additional history points, I have some at
> https://files.speakerdeck.com/presentations/0620b5b3a00b42fc91fba6cc4092d278/KR_2024_PREEMPT_RT_over_the_years.pdf
> Slide 11 - 21.
Thanks a lot for the input.
> However you have most of the pieces so.
>
Best wishes,
Pavel
--
Pavel Pisa
phone: +420 603531357
e-mail: pisa@cmp.felk.cvut.cz
Department of Control Engineering FEE CVUT
Karlovo namesti 13, 121 35, Prague 2
university: http://control.fel.cvut.cz/
personal: http://cmp.felk.cvut.cz/~pisa
social: https://social.kernel.org/ppisa
projects: https://www.openhub.net/accounts/ppisa
CAN related:http://canbus.pages.fel.cvut.cz/
RISC-V education: https://comparch.edu.cvut.cz/
Open Technologies Research Education and Exchange Services
https://gitlab.fel.cvut.cz/otrees/org/-/wikis/home