* ARMv7 futex latency spike
@ 2015-09-04 8:05 Ioan-Adrian Ratiu
2015-09-04 12:34 ` Thomas Gleixner
0 siblings, 1 reply; 5+ messages in thread
From: Ioan-Adrian Ratiu @ 2015-09-04 8:05 UTC (permalink / raw)
To: linux-rt-users
[-- Attachment #1: Type: text/plain, Size: 7040 bytes --]
Hello
I wrote a C program to check jitter levels in userspace (source
attached), ran it on various kernels (3.14.19-rt9, 3.18.9-rt5 and
4.1.5-rt5) and on multiple machines.
The problem I'm describing here does not reproduce on x86; it shows up
only on ARMv7 with all of those kernels, specifically on the Xilinx Zynq
Z-7020 and the Raspberry Pi 2.
The C program simulates some workloads using timed busy waits inside and
outside a mutex lock.
I have two threads which acquire that mutex in a loop, one with high
priority called "High-prio******" (CPU1) and a normal priority one
called "Low-prio******" (no CPU affinity).
There is also a third thread called "Busy-loop******" (CPU1) whose
priority is one level below "High-prio******"'s; it should be running
on CPU1 whenever the high-priority thread is not.
The jitter is measured between two acquisitions of the mutex in the
"High-prio******" thread, and the numbers I get range from 20-30us up
to 120us, for example.
Since these results show high jitter, I've recorded traces to
understand what is happening. trace-cmd adds jitter of its own, so the
values range from 40-60us to almost 200us, but the spikes still reproduce.
When the jitter is high, the "High-prio******" thread triggers priority
inheritance and waits on the "Low-prio******" thread twice. This second
priority inheritance is suspicious, since "Low-prio******" is already in
the mutex/futex unlock syscall.
Normally there is only one priority inheritance before "High-prio******"
acquires the lock and continues to run.
This is the most intriguing part of one of the traces from the Zynq
machine (full trace at [1], one from the rpi2 at [2]):
Low-prio******-1311 [000] 12165.412116: sys_enter: NR 240 (11524, 86, 220, b6e46920, 51f, 0)
...
Low-prio******-1311 [000] 12165.412198: sys_exit: NR 240 = 0
...
High-prio******-1312 [001] 12165.412223: sys_enter: NR 263 (1, b6e05e30, 1290, 1, 0, 0)
High-prio******-1312 [001] 12165.412226: sys_exit: NR 263 = 0
High-prio******-1312 [001] 12165.412229: sys_enter: NR 240 (11524, 86, 1, 0, 11524, b6e06920)
Low-prio******-1311 [000] 12165.412238: sched_stat_runtime: comm=Low-prio****** pid=1311 runtime=15384 [ns] vruntime=945770545539 [ns]
Low-prio******-1311 [000] 12165.412243: sched_switch: Low-prio******:1311 [120] R ==> trace-cmd:1308 [120]
High-prio******-1312 [001] 12165.412248: sched_pi_setprio: comm=Low-prio****** pid=1311 oldprio=120 newprio=40
trace-cmd-1308 [000] 12165.412251: sys_exit: NR 162 = 0
High-prio******-1312 [001] 12165.412251: sched_stat_runtime: comm=trace-cmd pid=1308 runtime=12957 [ns] vruntime=945764543112 [ns]
trace-cmd-1308 [000] 12165.412255: sys_enter: NR 340 (5, 0, 8, 0, 1000, 1)
trace-cmd-1308 [000] 12165.412263: sched_stat_runtime: comm=trace-cmd pid=1308 runtime=11568 [ns] vruntime=945764554680 [ns]
High-prio******-1312 [001] 12165.412266: sched_switch: High-prio******:1312 [40] S ==> Busy-loop******:1313 [41]
trace-cmd-1308 [000] 12165.412268: sched_switch: trace-cmd:1308 [120] R ==> Low-prio******:1311 [40]
Low-prio******-1311 [000] 12165.412273: sys_enter: NR 263 (1, b6e45e20, 220, b6e45e20, 0, 0)
Low-prio******-1311 [000] 12165.412276: sys_exit: NR 263 = 0
Low-prio******-1311 [000] 12165.412280: sys_enter: NR 240 (11524, 87, 220, b6e46920, 51f, 0)
Low-prio******-1311 [000] 12165.412295: sched_wakeup: High-prio******:1312 [40] success=1 CPU:001
Low-prio******-1311 [000] 12165.412300: sched_pi_setprio: comm=Low-prio****** pid=1311 oldprio=40 newprio=120
Busy-loop******-1313 [001] 12165.412301: sched_switch: Busy-loop******:1313 [41] R ==> High-prio******:1312 [40]
Low-prio******-1311 [000] 12165.412306: sched_stat_runtime: comm=Low-prio****** pid=1311 runtime=2136 [ns] vruntime=945770547675 [ns]
Low-prio******-1311 [000] 12165.412312: sched_stat_runtime: comm=Low-prio****** pid=1311 runtime=6441 [ns] vruntime=945770554116 [ns]
Low-prio******-1311 [000] 12165.412316: sched_switch: Low-prio******:1311 [120] R ==> trace-cmd:1308 [120]
High-prio******-1312 [001] 12165.412321: sched_pi_setprio: comm=Low-prio****** pid=1311 oldprio=120 newprio=40
High-prio******-1312 [001] 12165.412324: sched_stat_runtime: comm=trace-cmd pid=1308 runtime=11910 [ns] vruntime=945764566590 [ns]
trace-cmd-1308 [000] 12165.412335: sched_stat_runtime: comm=trace-cmd pid=1308 runtime=9912 [ns] vruntime=945764576502 [ns]
High-prio******-1312 [001] 12165.412337: sched_switch: High-prio******:1312 [40] D ==> Busy-loop******:1313 [41]
trace-cmd-1308 [000] 12165.412340: sched_switch: trace-cmd:1308 [120] R ==> Low-prio******:1311 [40]
Low-prio******-1311 [000] 12165.412352: sched_wakeup: High-prio******:1312 [40] success=1 CPU:001
Low-prio******-1311 [000] 12165.412356: sched_pi_setprio: comm=Low-prio****** pid=1311 oldprio=40 newprio=120
Busy-loop******-1313 [001] 12165.412356: sched_switch: Busy-loop******:1313 [41] R ==> High-prio******:1312 [40]
Low-prio******-1311 [000] 12165.412361: sched_stat_runtime: comm=Low-prio****** pid=1311 runtime=1998 [ns] vruntime=945770556114 [ns]
Low-prio******-1311 [000] 12165.412368: sched_stat_runtime: comm=Low-prio****** pid=1311 runtime=6108 [ns] vruntime=945770562222 [ns]
High-prio******-1312 [001] 12165.412370: sys_exit: NR 240 = 0
Low-prio******-1311 [000] 12165.412372: sched_switch: Low-prio******:1311 [120] R ==> trace-cmd:1308 [120]
High-prio******-1312 [001] 12165.412374: sys_enter: NR 263 (1, b6e05e10, 26c, b6e05e10, 0, 0)
High-prio******-1312 [001] 12165.412377: sys_exit: NR 263 = 0
...
Low-prio******-1311 [000] 12165.412519: sys_exit: NR 240 = 0
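To help decode the raw numbers: on ARM EABI, syscall NR 240 is futex and
NR 263 is clock_gettime, and trace-cmd prints the arguments in hex, so the
"86"/"87" op values are 0x86/0x87. A minimal sketch (my annotation, not part
of the trace; it only assumes the standard uapi constants) showing what the
op values mean:

#include <linux/futex.h> /* FUTEX_LOCK_PI, FUTEX_UNLOCK_PI, FUTEX_PRIVATE_FLAG */
#include <stdio.h>

int main(void)
{
	/* 0x86 = FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG (6 | 128): PI lock path */
	printf("lock op:   0x%x\n", FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG);
	/* 0x87 = FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG (7 | 128): PI unlock path */
	printf("unlock op: 0x%x\n", FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG);
	return 0;
}

So the "86" calls are FUTEX_LOCK_PI and the "87" call is FUTEX_UNLOCK_PI on
the same futex word (uaddr 0x11524), consistent with the PRIO_INHERIT mutex
in the attached program.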
Analysis:
At .412229, "High-prio******" does a futex lock (syscall 240,
parameters: 11524, 86, ...).
At .412248, "Low-prio******" holds that lock, so its priority is raised
because "High-prio******" is waiting on it.
At .412266, CPU1 switches from "High-prio******" to "Busy-loop******"
because the former is blocked waiting on the lock.
At .412280, "Low-prio******" wants to release the lock, as it has
finished its workload (syscall 240, parameters: 11524, 87, ...).
At .412295, "Low-prio******" sends the wakeup event for "High-prio******"
to run (probably because it no longer holds the lock).
At .412300, "Low-prio******" lowers its priority back.
At .412301, "High-prio******" gets to run, but interestingly, in the
next steps it does not exit syscall 240 (parameters: 11524, 86, ...)
until .412370.
At .412321, "Low-prio******"'s priority is raised by "High-prio******"
again, as if "High-prio******" is waiting for something, but then why
was it woken up before?
* did "Low-prio******" acquire the lock again? How could it manage this
feat since it is already in the unlock system call?
The next steps repeat the pattern above: "Low-prio******"'s priority is
raised while "High-prio******" waits, then "Low-prio******" wakes up
"High-prio******" again.
At .412370, "High-prio******" finally exits the lock syscall, 141us
after entering it.
Much later, at .412519, "Low-prio******" finishes the unlock call.
[1] http://documents.adirat.com/zynq-trace
[2] http://documents.adirat.com/rpi2-trace
[-- Attachment #2: mutex_overhead.c --]
[-- Type: text/x-csrc, Size: 5498 bytes --]
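/* Build/run note (editor's assumption, not part of the original mail):
 *   gcc -O2 -o mutex_overhead mutex_overhead.c -lpthread
 * older glibc may also need -lrt for clock_gettime; run with privileges,
 * since the SCHED_FIFO setup requires CAP_SYS_NICE. */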
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/prctl.h>
#include <stdio.h>
#include <string.h>
#include <sched.h>
#include <time.h> /* clock_gettime, struct timespec */
#define TRAINING (50)
#define ITERATIONS (1000)	/* was (10^3): ^ is XOR in C, which equals 9 */
#define HIGH_THD_ITERATIONS (TRAINING + ITERATIONS)
#define NS_PER_SEC (1000000000L)	/* was (10^9): XOR again, which equals 3 */
#define HIGH_PRIO (59)
#define CORE0_ID (0)
#define CORE1_ID (1)
pthread_mutex_t lock;
/* the busyloop & lowpriority threads will loop on this while the high priority
   thread will set it after it finishes all its iterations; volatile so the
   busy-wait loops actually re-read it */
volatile int threads_stop_flag = 0;
/* the high-priority thread uses this struct to find the maximum latency between
payloads from all its iterations */
struct timespec max_latency = {0, 0};
struct loads
{
	struct timespec hold_lock_load;
	struct timespec no_lock_load;
};
/* result = x - y; y is renormalized in place (value-preserving).
   Returns 1 if the difference is negative. */
int subtract_timespec(struct timespec *x,
		      struct timespec *y,
		      struct timespec *result)
{
	if (x->tv_nsec < y->tv_nsec) {
		int sec = (y->tv_nsec - x->tv_nsec) / NS_PER_SEC + 1;
		y->tv_nsec -= NS_PER_SEC * sec;
		y->tv_sec += sec;
	}
	if (x->tv_nsec - y->tv_nsec > NS_PER_SEC) {
		int sec = (x->tv_nsec - y->tv_nsec) / NS_PER_SEC;
		y->tv_nsec += NS_PER_SEC * sec;
		y->tv_sec -= sec;
	}
	result->tv_sec = x->tv_sec - y->tv_sec;
	result->tv_nsec = x->tv_nsec - y->tv_nsec;
	return x->tv_sec < y->tv_sec;
}
int compare_timespec(struct timespec *x,
		     struct timespec *y)
{
	if (x->tv_sec > y->tv_sec)
		return 1;
	if (x->tv_sec == y->tv_sec)
	{
		if (x->tv_nsec == y->tv_nsec)
			return 0;
		if (x->tv_nsec > y->tv_nsec)
			return 1;
	}
	return -1;
}
int setup_rt_pthread(int prio, int cpu, const char* name)
{
	struct sched_param schedp;
	cpu_set_t cpuset;
	pthread_t thread = pthread_self();
	int error = 0;

	memset(&schedp, 0, sizeof(schedp));
	CPU_ZERO(&cpuset);

	schedp.sched_priority = prio;
	error = pthread_setschedparam(thread, SCHED_FIFO, &schedp);
	if (error)
		fprintf(stderr, "%s: Couldn't setschedparam, returned %d\n", name, error);

	CPU_SET(cpu, &cpuset);
	error = pthread_setaffinity_np(thread, sizeof(cpu_set_t), &cpuset);
	if (error)
		fprintf(stderr, "%s: Couldn't setaffinity, returned %d\n", name, error);

	error = prctl(PR_SET_NAME, name, 0, 0, 0);
	if (error)
		fprintf(stderr, "%s: Couldn't set proc name, returned %d\n", name, error);

	return -error;
}
/* busy-wait until "load" time has elapsed */
static inline void do_load(struct timespec load)
{
	struct timespec loop_start, current, result;

	clock_gettime(CLOCK_MONOTONIC, &loop_start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &current);
		subtract_timespec(&current, &loop_start, &result);
	} while (compare_timespec(&result, &load) <= 0);
}
void *busy_loop_threadfun(void *busy_loads)
{
	if (setup_rt_pthread(HIGH_PRIO - 1, CORE1_ID, "Busy-loop******")) {
		fprintf(stderr, "Busy-loop****** setup error");
		return (void*) -1;
	}
	while (!threads_stop_flag); /* loop until high-prio thread signals */
	return 0;
}
void *low_prio_threadfun(void *low_loads)
{
	struct loads *load = (struct loads*) low_loads;

	if (prctl(PR_SET_NAME, "Low-prio******", 0, 0, 0) < 0) {
		fprintf(stderr, "Low-prio****** setup error");
		return (void*) -1;
	}
	while (!threads_stop_flag) {
		pthread_mutex_lock(&lock);
		do_load(load->hold_lock_load);
		pthread_mutex_unlock(&lock);
		do_load(load->no_lock_load);
	}
	return 0;
}
void *high_prio_threadfun(void* high_loads)
{
	struct loads *load = (struct loads*)high_loads;
	struct timespec prev, cur, latency;
	int i;

	if (setup_rt_pthread(HIGH_PRIO, CORE1_ID, "High-prio******"))
	{
		fprintf(stderr, "High-prio****** setup error");
		threads_stop_flag = 1;
		return (void*) -1;
	}
	clock_gettime(CLOCK_MONOTONIC, &prev);
	/* runs the payloads and computes latencies between them, stores max;
	   the first iterations are for training, so no latency is stored */
	for (i = HIGH_THD_ITERATIONS; i > 0; --i) {
		pthread_mutex_lock(&lock);
		do_load(load->hold_lock_load);
		pthread_mutex_unlock(&lock);
		do_load(load->no_lock_load);

		clock_gettime(CLOCK_MONOTONIC, &cur);
		subtract_timespec(&cur, &prev, &latency);
		subtract_timespec(&latency, &load->hold_lock_load, &latency);
		subtract_timespec(&latency, &load->no_lock_load, &latency);
		if (i <= ITERATIONS &&
		    compare_timespec(&latency, &max_latency) > 0) {
			max_latency.tv_sec = latency.tv_sec;
			max_latency.tv_nsec = latency.tv_nsec;
		}
		prev = cur;
	}
	threads_stop_flag = 1;
	return 0;
}
unsigned int initialize_mutex()
{
	pthread_mutexattr_t lock_attr;
	unsigned int error = 0;

	memset(&lock, 0, sizeof(pthread_mutex_t));
	pthread_mutexattr_init(&lock_attr);
	error = pthread_mutexattr_settype(&lock_attr, PTHREAD_MUTEX_NORMAL);
	error |= pthread_mutexattr_setprotocol(&lock_attr, PTHREAD_PRIO_INHERIT);
	error |= pthread_mutex_init(&lock, &lock_attr);
	pthread_mutexattr_destroy(&lock_attr);
	return error;
}
int main (int argc, char* argv[])
{
	pthread_t low_prio_thd, high_prio_thd, busy_loop_thd;
	struct loads high_prio_load = {{0,1001},{0,1001}};
	struct loads low_prio_load = {{0,1001},{0,1001}};

	if (initialize_mutex())
		return -1;

	pthread_create(&low_prio_thd, NULL, &low_prio_threadfun, (void *) &low_prio_load);
	pthread_create(&high_prio_thd, NULL, &high_prio_threadfun, (void *) &high_prio_load);
	pthread_create(&busy_loop_thd, NULL, &busy_loop_threadfun, NULL);

	pthread_join(low_prio_thd, NULL);
	pthread_join(high_prio_thd, NULL);
	pthread_join(busy_loop_thd, NULL);
	pthread_mutex_destroy(&lock);

	printf("Highest latency: %li\n", max_latency.tv_sec * NS_PER_SEC + max_latency.tv_nsec);
	return 0;
}
* Re: ARMv7 futex latency spike
2015-09-04 8:05 ARMv7 futex latency spike Ioan-Adrian Ratiu
@ 2015-09-04 12:34 ` Thomas Gleixner
2015-09-04 14:45 ` Ioan-Adrian Ratiu
0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2015-09-04 12:34 UTC (permalink / raw)
To: Ioan-Adrian Ratiu; +Cc: linux-rt-users
On Fri, 4 Sep 2015, Ioan-Adrian Ratiu wrote:
> Hello
>
> I wrote a C program to check jitter levels in userspace (source attached), ran
> it on various kernels (3.14.19-rt9, 3.18.9-rt5 and 4.1.5-rt5) and on multiple
> machines.
<snip>
> At .412321, "Low-prio******"'s priority is raised by "High-prio******" again,
> as if "High-prio******" is waiting for something, but then why was it woken up
> before?
> * did "Low-prio******" acquire the lock again? How could it manage this feat
> since it is already in the unlock system call?
That looks like highprio is trying to acquire a kernel internal lock
held by lowprio. The only lock which can cause this is the futex
hashbucket lock.
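[Editor's sketch, not from the original thread: the double boost described
here can be modeled entirely in userspace with two PTHREAD_PRIO_INHERIT
mutexes. The "inner" mutex stands in for the kernel's futex hashbucket lock
(an rtmutex on PREEMPT_RT); the sleeps only force the interleaving. With
SCHED_FIFO priorities assigned to the two threads (requires root), a trace
of this model shows the low-priority thread boosted twice, as in the report.]

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t outer; /* the user-visible PI lock */
static pthread_mutex_t inner; /* stand-in for the kernel's hb->lock */

static void *lowprio(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&inner);   /* "unlock path" already holds hb->lock */
	pthread_mutex_lock(&outer);
	sleep(1);                     /* highprio blocks on outer: boost #1 */
	pthread_mutex_unlock(&outer); /* wakes highprio ... */
	usleep(1000);                 /* ... which now blocks on inner: boost #2 */
	pthread_mutex_unlock(&inner);
	return NULL;
}

static void *highprio(void *arg)
{
	(void)arg;
	usleep(100000);               /* let lowprio take both locks first */
	pthread_mutex_lock(&outer);   /* first boost of lowprio */
	pthread_mutex_lock(&inner);   /* second boost of lowprio */
	pthread_mutex_unlock(&inner);
	pthread_mutex_unlock(&outer);
	return NULL;
}

int main(void)
{
	pthread_mutexattr_t attr;
	pthread_t lo, hi;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
	pthread_mutex_init(&outer, &attr);
	pthread_mutex_init(&inner, &attr);

	pthread_create(&lo, NULL, lowprio, NULL);
	pthread_create(&hi, NULL, highprio, NULL);
	pthread_join(lo, NULL);
	pthread_join(hi, NULL);
	puts("done");
	return 0;
}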
Now, I can see why that happens in 3.14.19-rt9 and 3.18.9-rt5, but
4.1.5-rt5 has a fix for that. Are you sure that this happens on
4.1.5-rt5 as well?
Thanks,
tglx
* Re: ARMv7 futex latency spike
2015-09-04 12:34 ` Thomas Gleixner
@ 2015-09-04 14:45 ` Ioan-Adrian Ratiu
2015-12-11 16:34 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 5+ messages in thread
From: Ioan-Adrian Ratiu @ 2015-09-04 14:45 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: linux-rt-users
On 04.09.2015 15:34, Thomas Gleixner wrote:
> On Fri, 4 Sep 2015, Ioan-Adrian Ratiu wrote:
>
>> Hello
>>
>> I wrote a C program to check jitter levels in userspace (source attached), ran
>> it on various kernels (3.14.19-rt9, 3.18.9-rt5 and 4.1.5-rt5) and on multiple
>> machines.
>
> <snip>
>
>> At .412321, "Low-prio******"'s priority is raised by "High-prio******" again,
>> as if "High-prio******" is waiting for something, but then why was it woken up
>> before?
>> * did "Low-prio******" acquire the lock again? How could it manage this feat
>> since it is already in the unlock system call?
>
> That looks like highprio is trying to acquire a kernel internal lock
> held by lowprio. The only lock which can cause this is the futex
> hashbucket lock.
>
> Now, I can see why that happens in 3.14.19-rt9 and 3.18.9-rt5, but
> 4.1.5-rt5 has a fix for that. Are you sure that this happens on
> 4.1.5-rt5 as well?
It was a 4.1.3-rt3 build, sorry.
I will try the latest 4.1.5-rt5 and come back if this is still an issue.
Thank you,
Adrian
* Re: ARMv7 futex latency spike
2015-09-04 14:45 ` Ioan-Adrian Ratiu
@ 2015-12-11 16:34 ` Sebastian Andrzej Siewior
2015-12-14 8:54 ` Ioan-Adrian Ratiu
0 siblings, 1 reply; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-12-11 16:34 UTC (permalink / raw)
To: Ioan-Adrian Ratiu; +Cc: Thomas Gleixner, linux-rt-users
* Ioan-Adrian Ratiu | 2015-09-04 17:45:57 [+0300]:
>I will try the latest 4.1.5-rt5 and come back if this is still an issue.
I assume that problem went away…
>Thank you,
>Adrian
Sebastian
* Re: ARMv7 futex latency spike
2015-12-11 16:34 ` Sebastian Andrzej Siewior
@ 2015-12-14 8:54 ` Ioan-Adrian Ratiu
0 siblings, 0 replies; 5+ messages in thread
From: Ioan-Adrian Ratiu @ 2015-12-14 8:54 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: linux-rt-users
On Fri, 11 Dec 2015 17:34:08 +0100
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> * Ioan-Adrian Ratiu | 2015-09-04 17:45:57 [+0300]:
>
> >I will try the latest 4.1.5-rt5 and come back if this is still an
> >issue.
> I assume that problem went away…
Yes, it no longer happens on 4.1.5-rt5. I also backported this [1]
series to an older kernel, and it fixed the issue there as well.
[1] https://lkml.org/lkml/2015/4/7/604