* Real-time kernel thread performance and optimization
From: Simon Falsig @ 2012-11-30 15:46 UTC (permalink / raw)
  To: linux-rt-users

Hi,

Inspired by Thomas Gleixner's LinuxCon '12 appeal for more
communication/feedback/interaction from people using the preempt-RT patch,
here comes a rather long (and hopefully at least slightly interesting) set
of questions.

First of all, a bit of background.  We have been using Linux and
preempt-RT on a custom ARM board for some years, and are currently in the
process of transitioning to a new AMD Fusion-based platform (also
custom-made, x86, 1.67 GHz dual-core). As we want to keep both systems in
production simultaneously for at least some time, we want to keep them as
similar as possible. For the new board, we have currently settled on a
3.2.9 kernel with the rt16 patch (though I can see that an rt17 patch has
been released since we started).

Our own system consists of a user-space application communicating
with/over:
 - Ethernet (for our GUI, which runs on a separate machine)
 - Serial ports (various hardware)
 - A set of custom kernel modules (implementing device drivers for some
custom I/O hardware)

For the kernel modules, we have a utility timer module that allows other
modules to register a "poll" function, which is then run at a 10 ms cycle
rate. We want this to happen in real time, so the timer module is
implemented as an RT thread using hrtimers (the implementation is new, as
the existing code on our old board used the ARM hardware timer). The
following code is used:

// Module-global state used below
static struct hrtimer timer;
static ktime_t kt;
static int cancelCallback;
static struct task_struct *thread_10ms;

// Timer callback for 10 ms polling of rackbus devices
static enum hrtimer_restart bus_10ms_callback(struct hrtimer *val) {
	struct custombus_device_driver *cbdrv, *next_cbdrv;
	ktime_t now = ktime_get();

	rt_mutex_lock(&list_10ms_mutex);
	list_for_each_entry_safe(cbdrv, next_cbdrv, &polling_10ms_list, poll_list) {
		driver_for_each_device(&cbdrv->driver, NULL, NULL,
				       cbdrv->poll_function);
	}
	rt_mutex_unlock(&list_10ms_mutex);

	hrtimer_forward(&timer, now, kt);
	if (cancelCallback == 0) {
		return HRTIMER_RESTART;
	} else {
		return HRTIMER_NORESTART;
	}
}

// Thread to start the 10 ms timer
static int bus_rt_timer_init(void *arg) {
	kt = ktime_set(0, 10 * 1000 * 1000);	// 10 ms = 10 * 1000 * 1000 ns
	cancelCallback = 0;
	hrtimer_init(&timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	timer.function = bus_10ms_callback;
	hrtimer_start(&timer, kt, HRTIMER_MODE_REL);

	return 0;
}

// Module initialization
int __init bus_timer_interrupt_init(void) {
	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };

	thread_10ms = kthread_create(bus_rt_timer_init, NULL, "bus_10ms");
	if (IS_ERR(thread_10ms)) {
		printk(KERN_ERR "Failed to create RT thread\n");
		return -ESRCH;
	}

	sched_setscheduler(thread_10ms, SCHED_FIFO, &param);

	wake_up_process(thread_10ms);

	printk(KERN_INFO "RT timer thread installed with priority %d.\n",
	       param.sched_priority);
	return 0;
}


I currently have a single module registered for polling. The poll function
is:

static inline void read_input(struct Io1000 *b)
{
	u16 *input = &b->ibuf[b->in];

	*input = le16_to_cpu(inb(REG_INPUT_1) << 8);

	process();
}


The "inb" function reads a register on an FPGA, attached over the LPC bus.
The pseudocode "process" function is a placeholder for some filtering of
the read inputs, performing mostly memory access (some of this protected
by a spin lock, although the lock should never be locked during the tests,
as there isn't anything else accessing it), and calling the kernel
"wake_up" function on the wait_queue containing our data.
To measure performance of the system, I've implemented a simple ChipScope
core in the FPGA, allowing me to count the number of cycles where the
period deviates above or below the desired 10 ms, and to store the maximum
period seen.

All this works just fine on an unloaded system. I'm consistently getting
cycle times very close to 10 ms, with a range of 9.7 ms - 10.3 ms.

Once I start loading the system with various stress tests, I get ranges
of about 9.0 ms - 18.0 ms. I have, however, also seen rare 50-70 ms
spikes, typically when starting the stress loads, but they don't seem to
be repeatable.
My stress loads (inspired by Ingo Molnar's dohell script,
https://lkml.org/lkml/2005/6/22/347) are:

while true; do killall hackbench; sleep 5; done &
while true; do ./hackbench 20; done &
du / &
./dortc &
./serialspammer &

In addition to this, I'm also doing an external ping flood. The
serialspammer application basically just spams both our serial ports with
data (I've hardwired a physical loop-back to them) - not because it's a
lot of data (at 115200 bps), but mostly because the serial chip is on the
same LPC bus as the FPGA. As our userspace application runs just fine on
a 180 MHz ARM, it presents only a very light load on our new platform.
The stress loads used should thus represent a very heavy load compared to
what we expect to see during normal operation.


Question 1:
- I'm rather content with the current performance, but I'd still like to
know: is there anything obvious (or obviously missing) in the posted code
that could be improved for better performance? I can see that it is
recommended to prefault and lock the used memory, but I haven't been able
to find anything about how to do this in a kernel thread.
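
(For reference, the prefault/lock recommendation seems aimed at userspace
RT applications - a minimal sketch of that usual pattern is below. As far
as I understand, kernel memory is not pageable, so a kernel thread
shouldn't need an equivalent. The buffer size here is just illustrative.)

/* Userspace prefault/lock pattern, as recommended for RT applications. */
#include <string.h>
#include <sys/mman.h>

#define STACK_PREFAULT_SIZE (8 * 1024)	/* size chosen for illustration */

static void prefault_and_lock(void)
{
	unsigned char dummy[STACK_PREFAULT_SIZE];

	/* Lock all current and future pages into RAM. */
	mlockall(MCL_CURRENT | MCL_FUTURE);

	/* Touch the stack so its pages are faulted in up front. */
	memset(dummy, 0, sizeof(dummy));
}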

Question 2:
 - Are latency spikes to be expected when starting the above stress loads?

Question 3:
 - As far as I can see, spinlocks use priority inheritance - so I presume
that our spinlock calls from within our RT thread should not pose a
major problem? According to
https://rt.wiki.kernel.org/index.php/HOWTO:_Build_an_RT-application,
though, it seems that both spinlocks and "wake_up" are no-gos when
called in interrupt context - does the same apply to our timer context?
(I've had the "process" call commented out, without any seemingly
noticeable change in performance.)
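
(One commonly suggested alternative - sketched below, not our current
code - is to keep the hrtimer callback minimal and defer all locking and
wake-ups to the RT kthread. wake_up_process() is safe to call from a
timer callback; the sketch reuses the thread_10ms and kt globals from
above, with made-up function names.)

/* Sketch: the hrtimer callback only wakes the RT kthread, and the
 * mutex-protected polling and wake_up() calls happen in the thread,
 * i.e. in process context where sleeping locks are legal. */
static enum hrtimer_restart bus_10ms_wakeup(struct hrtimer *t)
{
	wake_up_process(thread_10ms);	/* safe from interrupt context */
	hrtimer_forward_now(t, kt);
	return HRTIMER_RESTART;
}

static int bus_10ms_thread(void *arg)
{
	set_current_state(TASK_UNINTERRUPTIBLE);
	while (!kthread_should_stop()) {
		schedule();		/* sleep until the timer fires */
		/* rt_mutex-protected list walk, poll functions and
		 * wake_up() go here, safely in process context */
		set_current_state(TASK_UNINTERRUPTIBLE);
	}
	__set_current_state(TASK_RUNNING);
	return 0;
}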

Bonus-question:
 - Additionally, I've tried running cyclictest alongside all of the
above, and it actually performs rather well, without any substantial
spikes. Strangely, though, the results are actually better with load
than without (running with -t1 -p 80 -n -i 10000 -l 10000):
 - Loaded: Min: 16, Avg: 41, Max: 177
 - No load: Min: 16, Avg: 97, Max: 263


Once I get this finished up, I'll be happy to do a complete write-up of
the timer-thread code, if anyone is interested. I remember looking for
something similar (but without success), when I wrote the code earlier
this year.

In any case, all kinds of answers or comments are welcome.
Thanks in advance!

Best regards,
Simon Falsig

* Re: Real-time kernel thread performance and optimization
From: Simon Falsig @ 2013-07-11  6:32 UTC (permalink / raw)
  To: linux-rt-users; +Cc: dvhart, frank.rowand, jkacur

>On 12/20/2012 12:21 AM, Simon Falsig wrote:
>> On 12/20/2012 01:12 AM, Darren Hart wrote:
>>> On 12/17/2012 02:18 PM, Frank Rowand wrote:
>>>> On 12/11/12 06:30, Simon Falsig wrote:
>>>>
>>>> < snip >
>>>>
>>>>>>> Once I get this finished up, I'll be happy to do a complete
>>>>>>> write-up of the timer-thread code, if anyone is interested. I
>>>>>>> remember looking for something similar (but without success), when
>>>>>>> I wrote the code earlier this year.
>>>>>>
>>>>>> It would be very useful to add your results to the wiki.
>>>>>>
>>>>>> -Frank
>>>>>
>>>>> Cool - is there any particular place it should go? A how-to, FAQ
>>>>> entry, etc? Just so I know how to do the write-up...
>>>>
>>>> https://rt.wiki.kernel.org/index.php/Main_Page would be my default
>>>> suggestion.  I'm not quite sure where on the wiki would be good
>>>> though.  Maybe under "Tips and Techniques"?
>>>>
>>>> I added the rtwiki maintainers to the cc: list.
>>>
>>> I don't have all the context, but this sounds a bit more like
>>> something for linux/Documentation (possibly for the preempt-rt patch
>>> set). If not, the Documentation section on the wiki is a possibility.
>>> --
>>> Darren Hart
>>> Intel Open Source Technology Center
>>> Yocto Project - Technical Lead - Linux Kernel
>>
>> As I see it, the write-up could be done in two ways - 1) as a simple
>> code example of a real-time loop in a kernel module, 2) as a blog-like
>> post of the process I went through, investigating the performance, and
>> optimizing my code.
>>
>> In the case of 1), I guess it could be added to
>> https://rt.wiki.kernel.org/index.php/RT_PREEMPT_HOWTO, as a kernel
>> version of the realtime example, or possibly to
>> https://rt.wiki.kernel.org/index.php/HOWTO:_Build_an_RT-application
>> under "Building Device Drivers"? In the case of 2) though, it could
>> maybe be on its own page under "Tips and techniques"?
>
> I'd leave the exploration type write-up to your blog and we can link to
> it. An explicit example in one of the locations above also sounds
> appropriate.
>
> Thanks,
>
> Darren

So, this has been overdue for quite some time now, but I finally got
around to finishing the write-up on my own blog. If anyone is still
interested, it can be found as a three-part story here:

http://www.falsig.org/simon/blog/2013/03/30/real-time-linux-kernel-drivers-part-1-the-setup/
http://www.falsig.org/simon/blog/2013/07/10/real-time-linux-kernel-drivers-part-2-testing-and-first-implementation/
http://www.falsig.org/simon/blog/2013/07/10/real-time-linux-kernel-drivers-part-3-the-better-implementation/

Any comments are more than welcome - and feel free to adapt any of the
code examples and put them on the wiki, if that could be of any help.

Best regards,
Simon

