* Question about sched_yield()
@ 2002-06-15 22:15 mgix
2002-06-16 14:43 ` [patch] " Ingo Molnar
2002-06-18 0:46 ` David Schwartz
0 siblings, 2 replies; 52+ messages in thread
From: mgix @ 2002-06-15 22:15 UTC (permalink / raw)
To: linux-kernel
Hello,
I am seeing some strange linux scheduler behaviours,
and I thought this'd be the best place to ask.
I have two processes, one that loops forever and
does nothing but call sched_yield(), and the other
is basically benchmarking how fast it can compute
some long math calculation.
When I run the math simulator with nothing else running,
I get the following:
lenslark # ./loop
1.86543 MLoops per sec, sum=0.291969
1.84847 MLoops per sec, sum=1.65587
1.84064 MLoops per sec, sum=1.74601
When I run one instance of the math simulator and
one instance of the sched_yielder, I see the behaviour
I was expecting (and want): all CPU is allocated to the
process that does some actual work:
lenslark # ./yield &
[1] 26316
lenslark #
lenslark # ./loop
1.86502 MLoops per sec, sum=0.291969
1.84868 MLoops per sec, sum=1.65587
1.84035 MLoops per sec, sum=1.74601
and top confirms what I'm seeing: loop gets 99% of
the CPU, and yield gets close to none.
However, when I have two instances of the
sched_yielder running, things start to behave strangely:
lenslark # ./yield &
[1] 26350
lenslark # ./yield &
[2] 26351
lenslark # ./loop
0.686901 MLoops per sec, sum=0.291969
0.674352 MLoops per sec, sum=1.65587
0.665542 MLoops per sec, sum=1.74601
0.674117 MLoops per sec, sum=0.407357
and sure enough, top shows that for
some reason, each sched_yielder gets
about a third of the CPU:
26364 root 18 0 364 364 304 R 36.0 0.3 0:04 loop
26351 root 12 0 252 252 204 R 32.2 0.2 0:31 yield
26350 root 14 0 252 252 204 R 31.2 0.2 0:33 yield
26365 root 11 0 1024 1024 812 R 0.3 1.0 0:00 top
Does anyone have a clue as to why this happens?
Is this the expected behaviour given the current
scheduler algorithm (of which I know nothing :))?
For comparison purposes, I ran the same test on
Win2k+MingWin32, and things behave the way I'd expect
there: no matter how many sched_yielders are running,
loop gets almost all of the CPU, and barely slows down.
I am running on an AMD K6 @ 400 MHz, on the following kernel:
lenslark # cat /proc/version
Linux version 2.4.18 (root@lenslark.mgix.com) (gcc version 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)) #1 Sun Apr 28 15:25:19 PDT 2002
This kernel was compiled with SMP turned off.
If anyone can explain to me what's going on, or even better
how to replicate the behaviour I'm seeing on Win2k, I'd
be very happy. Also, I'm not a subscriber to linux-kernel,
so I'd appreciate a CC.
I've included the source to the two small programs below:
================================= YIELD.CPP
#ifdef linux
#include <sched.h>
int main()
{
    while(1)
    {
        sched_yield();
    }
}
#endif
#ifdef WIN32
#include <windows.h>
int main()
{
    while(1)
    {
        Sleep(0);
    }
}
#endif
================================= LOOP.CPP
#include <math.h>
#include <stdio.h>
#include <sys/time.h>
#ifdef WIN32
#include <windows.h>
#endif
typedef unsigned long long uint64_t;

uint64_t usecs()
{
#ifdef linux
    struct timeval t;
    gettimeofday(&t,0);
    uint64_t result;
    result= t.tv_sec;
    result*= 1000000;
    result+= t.tv_usec;
    return result;
#endif
#ifdef WIN32
    return 1000*(uint64_t)GetTickCount();
#endif
}

int main()
{
    uint64_t t= 0;
    double sum= 0.0;
    uint64_t lastTime= 0;
    uint64_t lastLoops= 0;
    while(t<100000000)
    {
        sum+= sin((double)t);
        if((t%4000000)==0)
        {
            uint64_t nowTime= usecs();
            uint64_t nowLoops= t;
            if(lastTime!=0)
            {
                double loops= (nowLoops-lastLoops);
                double elapsed= (nowTime-lastTime)/1000000.0;
                fprintf(
                    stderr,
                    "%g MLoops per sec, sum=%g\n",
                    (loops/elapsed)/1000000.0,
                    sum
                );
            }
            lastTime= nowTime;
            lastLoops= nowLoops;
        }
        ++t;
    }
    fprintf(stderr,"Sum=%g\n",sum);
    return 0;
}
^ permalink raw reply	[flat|nested] 52+ messages in thread

* [patch] Re: Question about sched_yield()
  2002-06-15 22:15 Question about sched_yield() mgix
@ 2002-06-16 14:43 ` Ingo Molnar
  2002-06-18  0:46 ` David Schwartz
  1 sibling, 0 replies; 52+ messages in thread
From: Ingo Molnar @ 2002-06-16 14:43 UTC (permalink / raw)
To: mgix; +Cc: linux-kernel, Linus Torvalds

On Sat, 15 Jun 2002 mgix@mgix.com wrote:

> I am seeing some strange linux scheduler behaviours, and I thought
> this'd be the best place to ask.
>
> I have two processes, one that loops forever and does nothing but
> calling sched_yield(), and the other is basically benchmarking how fast
> it can compute some long math calculation.

sched_yield() is indeed misbehaving both in 2.4 and 2.5. (I think you are
using some variant of 2.4.18, does that kernel have the O(1) scheduler
patches applied?) In fact in 2.5 it's behaving slightly worse than in
vanilla 2.4.18.

the O(1)-scheduler's sched_yield() implementation does the following to
'give up' the CPU:

 - it decreases its priority by 1 until it reaches the lowest level
 - it queues the task to the end of the priority queue

this scheme works fine in most cases, but if sched_yield()-active tasks
are mixed with CPU-using processes then it's quite likely that the
CPU-using process is in the expired array. In that case the yield()-ing
process only requeues itself in the active array - a true context-switch
to the expired process will only occur once the timeslice of the
yield()-ing process has expired: in ~150 msecs. This leads the
yield()-ing and CPU-using processes to use up roughly the same amount of
CPU-time, which is arguably deficient.

i've fixed this problem by extending sched_yield() the following way:

+ * There are three levels of how a yielding task will give up
+ * the current CPU:
+ *
+ * #1 - it decreases its priority by one. This priority loss is
+ *      temporary, it's recovered once the current timeslice
+ *      expires.
+ *
+ * #2 - once it has reached the lowest priority level,
+ *      it will give up timeslices one by one. (We do not
+ *      want to give them up all at once, it's gradual,
+ *      to protect the casual yield()er.)
+ *
+ * #3 - once all timeslices are gone we put the process into
+ *      the expired array.
+ *
+ * (special rule: RT tasks do not lose any priority, they just
+ * roundrobin on their current priority level.)
+ */

the attached patch against vanilla 2.5.21 also includes this
sched_yield() improvement, and in my tests it now behaves well. Could you
give it a go, does it now behave for your workload as well?

	Ingo

diff -rNu linux-2.5.21-final/arch/i386/kernel/entry.S linux/arch/i386/kernel/entry.S
--- linux-2.5.21-final/arch/i386/kernel/entry.S	Sun Jun  9 07:28:19 2002
+++ linux/arch/i386/kernel/entry.S	Sun Jun 16 16:35:52 2002
@@ -193,6 +193,7 @@
 ENTRY(ret_from_fork)
 #if CONFIG_SMP || CONFIG_PREEMPT
+	# NOTE: this function takes a parameter but it's unused on x86.
 	call schedule_tail
 #endif
 	GET_THREAD_INFO(%ebx)
diff -rNu linux-2.5.21-final/fs/pipe.c linux/fs/pipe.c
--- linux-2.5.21-final/fs/pipe.c	Sun Jun  9 07:26:29 2002
+++ linux/fs/pipe.c	Sun Jun 16 16:35:36 2002
@@ -119,7 +119,7 @@
 			 * writers synchronously that there is more
 			 * room.
 			 */
-			wake_up_interruptible(PIPE_WAIT(*inode));
+			wake_up_interruptible_sync(PIPE_WAIT(*inode));
 			kill_fasync(PIPE_FASYNC_WRITERS(*inode), SIGIO, POLL_OUT);
 			if (!PIPE_EMPTY(*inode))
 				BUG();
@@ -219,7 +219,7 @@
 		 * is going to give up this CPU, so it doesnt have
 		 * to do idle reschedules.
 		 */
-		wake_up_interruptible(PIPE_WAIT(*inode));
+		wake_up_interruptible_sync(PIPE_WAIT(*inode));
 		kill_fasync(PIPE_FASYNC_READERS(*inode), SIGIO, POLL_IN);
 		PIPE_WAITING_WRITERS(*inode)++;
 		pipe_wait(inode);
diff -rNu linux-2.5.21-final/include/asm-i386/system.h linux/include/asm-i386/system.h
--- linux-2.5.21-final/include/asm-i386/system.h	Sun Jun  9 07:26:29 2002
+++ linux/include/asm-i386/system.h	Sun Jun 16 16:35:52 2002
@@ -11,9 +11,12 @@
 struct task_struct;	/* one of the stranger aspects of C forward declarations.. */
 extern void FASTCALL(__switch_to(struct task_struct *prev, struct task_struct *next));

-#define prepare_to_switch()	do { } while(0)
+#define prepare_arch_schedule(prev)	do { } while(0)
+#define finish_arch_schedule(prev)	do { } while(0)
+#define prepare_arch_switch(rq)		do { } while(0)
+#define finish_arch_switch(rq)		spin_unlock_irq(&(rq)->lock)

-#define switch_to(prev,next) do {					\
+#define switch_to(prev,next,last) do {					\
 	asm volatile("pushl %%esi\n\t"					\
 		     "pushl %%edi\n\t"					\
 		     "pushl %%ebp\n\t"					\
diff -rNu linux-2.5.21-final/include/linux/sched.h linux/include/linux/sched.h
--- linux-2.5.21-final/include/linux/sched.h	Sun Jun  9 07:27:21 2002
+++ linux/include/linux/sched.h	Sun Jun 16 16:35:36 2002
@@ -491,6 +491,7 @@
 extern unsigned long prof_shift;

 extern void FASTCALL(__wake_up(wait_queue_head_t *q, unsigned int mode, int nr));
+extern void FASTCALL(__wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr));
 extern void FASTCALL(sleep_on(wait_queue_head_t *q));
 extern long FASTCALL(sleep_on_timeout(wait_queue_head_t *q, signed long timeout));
@@ -507,6 +508,11 @@
 #define wake_up_interruptible(x)	__wake_up((x),TASK_INTERRUPTIBLE, 1)
 #define wake_up_interruptible_nr(x, nr)	__wake_up((x),TASK_INTERRUPTIBLE, nr)
 #define wake_up_interruptible_all(x)	__wake_up((x),TASK_INTERRUPTIBLE, 0)
+#ifdef CONFIG_SMP
+#define wake_up_interruptible_sync(x)	__wake_up_sync((x),TASK_INTERRUPTIBLE, 1)
+#else
+#define wake_up_interruptible_sync(x)	__wake_up((x),TASK_INTERRUPTIBLE, 1)
+#endif
 asmlinkage long sys_wait4(pid_t pid,unsigned int * stat_addr, int options, struct rusage * ru);

 extern int in_group_p(gid_t);
diff -rNu linux-2.5.21-final/include/linux/spinlock.h linux/include/linux/spinlock.h
--- linux-2.5.21-final/include/linux/spinlock.h	Sun Jun  9 07:30:01 2002
+++ linux/include/linux/spinlock.h	Sun Jun 16 16:35:46 2002
@@ -26,6 +26,7 @@
 #define write_lock_bh(lock)	do { local_bh_disable(); write_lock(lock); } while (0)

 #define spin_unlock_irqrestore(lock, flags)	do { spin_unlock(lock); local_irq_restore(flags); } while (0)
+#define _raw_spin_unlock_irqrestore(lock, flags) do { _raw_spin_unlock(lock); local_irq_restore(flags); } while (0)
 #define spin_unlock_irq(lock)	do { spin_unlock(lock); local_irq_enable(); } while (0)
 #define spin_unlock_bh(lock)	do { spin_unlock(lock); local_bh_enable(); } while (0)
@@ -183,6 +184,12 @@
 		preempt_schedule(); \
 } while (0)

+#define preempt_check_resched() \
+do { \
+	if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+		preempt_schedule(); \
+} while (0)
+
 #define spin_lock(lock) \
 do { \
 	preempt_disable(); \
@@ -197,6 +204,12 @@
 	preempt_enable(); \
 } while (0)

+#define spin_unlock_no_resched(lock) \
+do { \
+	_raw_spin_unlock(lock); \
+	preempt_enable_no_resched(); \
+} while (0)
+
 #define read_lock(lock)		({preempt_disable(); _raw_read_lock(lock);})
 #define read_unlock(lock)	({_raw_read_unlock(lock); preempt_enable();})
 #define write_lock(lock)	({preempt_disable(); _raw_write_lock(lock);})
@@ -206,20 +219,22 @@

 #else

-#define preempt_get_count()	(0)
-#define preempt_disable()	do { } while (0)
+#define preempt_get_count()		(0)
+#define preempt_disable()		do { } while (0)
 #define preempt_enable_no_resched()	do {} while(0)
-#define preempt_enable()	do { } while (0)
+#define preempt_enable()		do { } while (0)
+#define preempt_check_resched()		do { } while (0)

-#define spin_lock(lock)		_raw_spin_lock(lock)
-#define spin_trylock(lock)	_raw_spin_trylock(lock)
-#define spin_unlock(lock)	_raw_spin_unlock(lock)
-
-#define read_lock(lock)		_raw_read_lock(lock)
-#define read_unlock(lock)	_raw_read_unlock(lock)
-#define write_lock(lock)	_raw_write_lock(lock)
-#define write_unlock(lock)	_raw_write_unlock(lock)
-#define write_trylock(lock)	_raw_write_trylock(lock)
+#define spin_lock(lock)			_raw_spin_lock(lock)
+#define spin_trylock(lock)		_raw_spin_trylock(lock)
+#define spin_unlock(lock)		_raw_spin_unlock(lock)
+#define spin_unlock_no_resched(lock)	_raw_spin_unlock(lock)
+
+#define read_lock(lock)			_raw_read_lock(lock)
+#define read_unlock(lock)		_raw_read_unlock(lock)
+#define write_lock(lock)		_raw_write_lock(lock)
+#define write_unlock(lock)		_raw_write_unlock(lock)
+#define write_trylock(lock)		_raw_write_trylock(lock)

 #endif

 /* "lock on reference count zero" */
diff -rNu linux-2.5.21-final/kernel/ksyms.c linux/kernel/ksyms.c
--- linux-2.5.21-final/kernel/ksyms.c	Sun Jun  9 07:26:33 2002
+++ linux/kernel/ksyms.c	Sun Jun 16 16:35:36 2002
@@ -457,6 +457,9 @@
 /* process management */
 EXPORT_SYMBOL(complete_and_exit);
 EXPORT_SYMBOL(__wake_up);
+#if CONFIG_SMP
+EXPORT_SYMBOL_GPL(__wake_up_sync); /* internal use only */
+#endif
 EXPORT_SYMBOL(wake_up_process);
 EXPORT_SYMBOL(sleep_on);
 EXPORT_SYMBOL(sleep_on_timeout);
diff -rNu linux-2.5.21-final/kernel/sched.c linux/kernel/sched.c
--- linux-2.5.21-final/kernel/sched.c	Sun Jun  9 07:28:13 2002
+++ linux/kernel/sched.c	Sun Jun 16 16:36:00 2002
@@ -135,7 +135,6 @@
  */
 struct runqueue {
 	spinlock_t lock;
-	spinlock_t frozen;
 	unsigned long nr_running, nr_switches, expired_timestamp;
 	signed long nr_uninterruptible;
 	task_t *curr, *idle;
@@ -153,17 +152,21 @@
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
 #define rt_task(p)		((p)->prio < MAX_RT_PRIO)

+/*
+ * task_rq_lock - lock the runqueue a given task resides on and disable
+ * interrupts. Note the ordering: we can safely lookup the task_rq without
+ * explicitly disabling preemption.
+ */
 static inline runqueue_t *task_rq_lock(task_t *p, unsigned long *flags)
 {
 	struct runqueue *rq;

repeat_lock_task:
-	preempt_disable();
+	local_irq_save(*flags);
 	rq = task_rq(p);
-	spin_lock_irqsave(&rq->lock, *flags);
+	spin_lock(&rq->lock);
 	if (unlikely(rq != task_rq(p))) {
 		spin_unlock_irqrestore(&rq->lock, *flags);
-		preempt_enable();
 		goto repeat_lock_task;
 	}
 	return rq;
@@ -172,7 +175,23 @@
 static inline void task_rq_unlock(runqueue_t *rq, unsigned long *flags)
 {
 	spin_unlock_irqrestore(&rq->lock, *flags);
-	preempt_enable();
+}
+
+/*
+ * rq_lock - lock a given runqueue and disable interrupts.
+ */
+static inline runqueue_t *rq_lock(runqueue_t *rq)
+{
+	local_irq_disable();
+	rq = this_rq();
+	spin_lock(&rq->lock);
+	return rq;
+}
+
+static inline void rq_unlock(runqueue_t *rq)
+{
+	spin_unlock(&rq->lock);
+	local_irq_enable();
 }

 /*
@@ -284,9 +303,15 @@
 repeat:
 	preempt_disable();
 	rq = task_rq(p);
-	while (unlikely(rq->curr == p)) {
+	if (unlikely(rq->curr == p)) {
 		cpu_relax();
-		barrier();
+		/*
+		 * enable/disable preemption just to make this
+		 * a preemption point - we are busy-waiting
+		 * anyway.
+		 */
+		preempt_enable();
+		goto repeat;
 	}
 	rq = task_rq_lock(p, &flags);
 	if (unlikely(rq->curr == p)) {
@@ -322,40 +347,50 @@
  * "current->state = TASK_RUNNING" to mark yourself runnable
  * without the overhead of this.
  */
-static int try_to_wake_up(task_t * p)
+static int try_to_wake_up(task_t * p, int sync)
 {
 	unsigned long flags;
 	int success = 0;
 	long old_state;
 	runqueue_t *rq;

+repeat_lock_task:
 	rq = task_rq_lock(p, &flags);
 	old_state = p->state;
-	p->state = TASK_RUNNING;
 	if (!p->array) {
+		if (unlikely(sync && (rq->curr != p))) {
+			if (p->thread_info->cpu != smp_processor_id()) {
+				p->thread_info->cpu = smp_processor_id();
+				task_rq_unlock(rq, &flags);
+				goto repeat_lock_task;
+			}
+		}
 		if (old_state == TASK_UNINTERRUPTIBLE)
 			rq->nr_uninterruptible--;
 		activate_task(p, rq);
+		/*
+		 * If sync is set, a resched_task() is a NOOP
+		 */
 		if (p->prio < rq->curr->prio)
 			resched_task(rq->curr);
 		success = 1;
 	}
+	p->state = TASK_RUNNING;
 	task_rq_unlock(rq, &flags);
+
 	return success;
 }

 int wake_up_process(task_t * p)
 {
-	return try_to_wake_up(p);
+	return try_to_wake_up(p, 0);
 }

 void wake_up_forked_process(task_t * p)
 {
 	runqueue_t *rq;

-	preempt_disable();
-	rq = this_rq();
-	spin_lock_irq(&rq->lock);
+	rq = rq_lock(rq);

 	p->state = TASK_RUNNING;
 	if (!rt_task(p)) {
@@ -371,8 +406,7 @@
 	p->thread_info->cpu = smp_processor_id();
 	activate_task(p, rq);

-	spin_unlock_irq(&rq->lock);
-	preempt_enable();
+	rq_unlock(rq);
 }

 /*
@@ -401,19 +435,18 @@
 }

 #if CONFIG_SMP || CONFIG_PREEMPT
-asmlinkage void schedule_tail(void)
+asmlinkage void schedule_tail(task_t *prev)
 {
-	spin_unlock_irq(&this_rq()->frozen);
+	finish_arch_switch(this_rq());
+	finish_arch_schedule(prev);
 }
 #endif

-static inline void context_switch(task_t *prev, task_t *next)
+static inline task_t * context_switch(task_t *prev, task_t *next)
 {
 	struct mm_struct *mm = next->mm;
 	struct mm_struct *oldmm = prev->active_mm;

-	prepare_to_switch();
-
 	if (unlikely(!mm)) {
 		next->active_mm = oldmm;
 		atomic_inc(&oldmm->mm_count);
@@ -427,7 +460,9 @@
 	}

 	/* Here we just switch the register state and the stack. */
-	switch_to(prev, next);
+	switch_to(prev, next, prev);
+
+	return prev;
 }

 unsigned long nr_running(void)
@@ -773,6 +808,7 @@
 	rq = this_rq();

 	release_kernel_lock(prev, smp_processor_id());
+	prepare_arch_schedule(prev);
 	prev->sleep_timestamp = jiffies;
 	spin_lock_irq(&rq->lock);
@@ -828,28 +864,20 @@
 	if (likely(prev != next)) {
 		rq->nr_switches++;
 		rq->curr = next;
-		spin_lock(&rq->frozen);
-		spin_unlock(&rq->lock);
-
-		context_switch(prev, next);
-
-		/*
-		 * The runqueue pointer might be from another CPU
-		 * if the new task was last running on a different
-		 * CPU - thus re-load it.
-		 */
-		mb();
+
+		prepare_arch_switch(rq);
+		prev = context_switch(prev, next);
+		barrier();
 		rq = this_rq();
-		spin_unlock_irq(&rq->frozen);
-	} else {
+		finish_arch_switch(rq);
+	} else
 		spin_unlock_irq(&rq->lock);
-	}
+	finish_arch_schedule(prev);

 	reacquire_kernel_lock(current);
 	preempt_enable_no_resched();
 	if (test_thread_flag(TIF_NEED_RESCHED))
 		goto need_resched;
-	return;
 }

 #ifdef CONFIG_PREEMPT
@@ -880,7 +908,7 @@
 * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
 * zero in this (rare) case, and we handle it by continuing to scan the queue.
 */
-static inline void __wake_up_common(wait_queue_head_t *q, unsigned int mode, int nr_exclusive)
+static inline void __wake_up_common(wait_queue_head_t *q, unsigned int mode, int nr_exclusive, int sync)
 {
 	struct list_head *tmp;
 	unsigned int state;
@@ -891,7 +919,7 @@
 		curr = list_entry(tmp, wait_queue_t, task_list);
 		p = curr->task;
 		state = p->state;
-		if ((state & mode) && try_to_wake_up(p) &&
+		if ((state & mode) && try_to_wake_up(p, sync) &&
 			((curr->flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive))
 				break;
 	}
@@ -905,17 +933,36 @@
 		return;

 	spin_lock_irqsave(&q->lock, flags);
-	__wake_up_common(q, mode, nr_exclusive);
+	__wake_up_common(q, mode, nr_exclusive, 0);
+	spin_unlock_irqrestore(&q->lock, flags);
+}
+
+#if CONFIG_SMP
+
+void __wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr_exclusive)
+{
+	unsigned long flags;
+
+	if (unlikely(!q))
+		return;
+
+	spin_lock_irqsave(&q->lock, flags);
+	if (likely(nr_exclusive))
+		__wake_up_common(q, mode, nr_exclusive, 1);
+	else
+		__wake_up_common(q, mode, nr_exclusive, 0);
 	spin_unlock_irqrestore(&q->lock, flags);
 }

+#endif
+
 void complete(struct completion *x)
 {
 	unsigned long flags;

 	spin_lock_irqsave(&x->wait.lock, flags);
 	x->done++;
-	__wake_up_common(&x->wait, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 1);
+	__wake_up_common(&x->wait, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 1, 0);
 	spin_unlock_irqrestore(&x->wait.lock, flags);
 }

@@ -1339,27 +1386,34 @@

 asmlinkage long sys_sched_yield(void)
 {
-	runqueue_t *rq;
-	prio_array_t *array;
-
-	preempt_disable();
-	rq = this_rq();
+	runqueue_t *rq = rq_lock(rq);
+	prio_array_t *array = current->array;

 	/*
-	 * Decrease the yielding task's priority by one, to avoid
-	 * livelocks. This priority loss is temporary, it's recovered
-	 * once the current timeslice expires.
+	 * There are three levels of how a yielding task will give up
+	 * the current CPU:
 	 *
-	 * If priority is already MAX_PRIO-1 then we still
-	 * roundrobin the task within the runlist.
-	 */
-	spin_lock_irq(&rq->lock);
-	array = current->array;
-	/*
-	 * If the task has reached maximum priority (or is a RT task)
-	 * then just requeue the task to the end of the runqueue:
+	 * #1 - it decreases its priority by one. This priority loss is
+	 *      temporary, it's recovered once the current timeslice
+	 *      expires.
+	 *
+	 * #2 - once it has reached the lowest priority level,
+	 *      it will give up timeslices one by one. (We do not
+	 *      want to give them up all at once, it's gradual,
+	 *      to protect the casual yield()er.)
+	 *
+	 * #3 - once all timeslices are gone we put the process into
+	 *      the expired array.
+	 *
+	 * (special rule: RT tasks do not lose any priority, they just
+	 * roundrobin on their current priority level.)
 	 */
-	if (likely(current->prio == MAX_PRIO-1 || rt_task(current))) {
+	if (likely(current->prio == MAX_PRIO-1)) {
+		if (current->time_slice <= 1) {
+			dequeue_task(current, rq->active);
+			enqueue_task(current, rq->expired);
+		} else
+			current->time_slice--;
+	} else if (unlikely(rt_task(current))) {
 		list_del(&current->run_list);
 		list_add_tail(&current->run_list, array->queue + current->prio);
 	} else {
@@ -1370,8 +1424,7 @@
 		list_add_tail(&current->run_list, array->queue + current->prio);
 		__set_bit(current->prio, array->bitmap);
 	}
-	spin_unlock(&rq->lock);
-	preempt_enable_no_resched();
+	spin_unlock_no_resched(&rq->lock);

 	schedule();

@@ -1599,7 +1652,6 @@
 		rq->active = rq->arrays;
 		rq->expired = rq->arrays + 1;
 		spin_lock_init(&rq->lock);
-		spin_lock_init(&rq->frozen);
 		INIT_LIST_HEAD(&rq->migration_queue);

 		for (j = 0; j < 2; j++) {
@@ -1687,7 +1739,15 @@
 		task_rq_unlock(rq, &flags);
 		goto out;
 	}
-
+	/*
+	 * If the task is not on a runqueue (and not running), then
+	 * it is sufficient to simply update the task's cpu field.
+	 */
+	if (!p->array && (p != rq->curr)) {
+		p->thread_info->cpu = __ffs(p->cpus_allowed);
+		task_rq_unlock(rq, &flags);
+		goto out;
+	}
 	init_MUTEX_LOCKED(&req.sem);
 	req.task = p;
 	list_add(&req.list, &rq->migration_queue);

^ permalink raw reply	[flat|nested] 52+ messages in thread
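[Editor's note: Ingo's three-level yield policy can be sketched as a tiny user-space model. MAX_PRIO, the field names, and the timeslice counts below are illustrative stand-ins, not the kernel's actual values or data structures.]

```c
#include <stdbool.h>

/* Toy model of the three-level sched_yield() policy quoted above.
 * MAX_PRIO and the timeslice numbers are made up for illustration. */
#define MAX_PRIO 40

struct toy_task {
    int  prio;        /* higher number = lower priority */
    int  time_slice;  /* remaining timeslices */
    bool expired;     /* true once moved to the expired array */
    bool rt;          /* real-time task? */
};

static void toy_yield(struct toy_task *t)
{
    if (t->rt)
        return;                  /* RT tasks round-robin at their level */
    if (t->prio < MAX_PRIO - 1)
        t->prio++;               /* level #1: drop one priority level */
    else if (t->time_slice > 1)
        t->time_slice--;         /* level #2: give up timeslices one by one */
    else
        t->expired = true;       /* level #3: move to the expired array */
}
```

Each yield costs the caller a little more, so a tight sched_yield() loop gradually sinks into the expired array instead of starving an already-expired CPU hog, which is the behaviour the patch aims for.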
* Re: Question about sched_yield()
  2002-06-15 22:15 Question about sched_yield() mgix
  2002-06-16 14:43 ` [patch] " Ingo Molnar
@ 2002-06-18  0:46 ` David Schwartz
  2002-06-18  0:55   ` Robert Love
  ` (4 more replies)
  1 sibling, 5 replies; 52+ messages in thread
From: David Schwartz @ 2002-06-18 0:46 UTC (permalink / raw)
To: mgix, linux-kernel

On Sat, 15 Jun 2002 15:15:32 -0700, mgix@mgix.com wrote:

>Hello,
>
>I am seeing some strange linux scheduler behaviours,
>and I thought this'd be the best place to ask.
>
>I have two processes, one that loops forever and
>does nothing but calling sched_yield(), and the other
>is basically benchmarking how fast it can compute
>some long math calculation.

	[snip]

	You seem to have a misconception about what sched_yield is for. It is not
a replacement for blocking or a scheduling priority adjustment. It simply
lets other ready-to-run tasks be scheduled before returning to the current
task.

	Here's a quote from SuS3:

	"The sched_yield() function shall force the running thread to relinquish
the processor until it again becomes the head of its thread list. It takes
no arguments."

	This neither says nor implies anything about CPU usage. It simply says
that the current thread will yield and be put at the end of the list.

	DS

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: Question about sched_yield()
  2002-06-18  0:46 ` David Schwartz
@ 2002-06-18  0:55 ` Robert Love
  2002-06-18  1:51   ` mgix
  ` (2 more replies)
  2002-06-18  1:41 ` mgix
  ` (3 subsequent siblings)
  4 siblings, 3 replies; 52+ messages in thread
From: Robert Love @ 2002-06-18 0:55 UTC (permalink / raw)
To: David Schwartz; +Cc: mgix, linux-kernel

On Mon, 2002-06-17 at 17:46, David Schwartz wrote:

> You seem to have a misconception about what sched_yield is for.
> It is not a replacement for blocking or a scheduling priority
> adjustment. It simply lets other ready-to-run tasks be scheduled
> before returning to the current task.
>
> Here's a quote from SuS3:
>
> "The sched_yield() function shall force the running thread to relinquish
> the processor until it again becomes the head of its thread list. It
> takes no arguments."
>
> This neither says nor implies anything about CPU usage. It simply says
> that the current thread will yield and be put at the end of the list.

And you seem to have a misconception about sched_yield, too. If a
machine has n tasks, half of which are doing CPU-intense work and the
other half of which are just yielding... why on Earth would the yielding
tasks get any noticeable amount of CPU use?

Seems to me the behavior of sched_yield is a bit broken. If the tasks
are correctly returned to the end of their runqueue, this should not
happen. Note, for example, you will not see this behavior in 2.5.

Quite frankly, even if the supposed standard says nothing of this... I
do not care: calling sched_yield in a loop should not show up as a CPU
hog.

	Robert Love

^ permalink raw reply	[flat|nested] 52+ messages in thread
* RE: Question about sched_yield()
  2002-06-18  0:55 ` Robert Love
@ 2002-06-18  1:51 ` mgix
  2002-06-18  3:18 ` David Schwartz
  2002-06-18  9:36 ` David Schwartz
  2 siblings, 0 replies; 52+ messages in thread
From: mgix @ 2002-06-18 1:51 UTC (permalink / raw)
To: Robert Love, David Schwartz; +Cc: linux-kernel

> Seems to me the behavior of sched_yield is a bit broken. If the tasks
> are correctly returned to the end of their runqueue, this should not
> happen. Note, for example, you will not see this behavior in 2.5.

Actually, it seems to happen in 2.5 too. However, Ingo sent me a patch
for 2.5.21 that fixes the issue. See this message:

http://marc.theaimsgroup.com/?l=linux-kernel&m=102423901727214&w=2

- Mgix

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: Question about sched_yield()
  2002-06-18  0:55 ` Robert Love
  2002-06-18  1:51 ` mgix
@ 2002-06-18  3:18 ` David Schwartz
  2002-06-18  9:36 ` David Schwartz
  2 siblings, 0 replies; 52+ messages in thread
From: David Schwartz @ 2002-06-18 3:18 UTC (permalink / raw)
To: rml; +Cc: mgix, linux-kernel

>> This neither says nor implies anything about CPU usage. It simply says
>>that the current thread will yield and be put at the end of the list.

>And you seem to have a misconception about sched_yield, too. If a
>machine has n tasks, half of which are doing CPU-intense work and the
>other half of which are just yielding... why on Earth would the yielding
>tasks get any noticeable amount of CPU use?

	Because they're running infinite loops!

>Quite frankly, even if the supposed standard says nothing of this... I
>do not care: calling sched_yield in a loop should not show up as a CPU
>hog.

	Calling any function that does not block in an endless loop *should* show
up as a CPU hog. Yielding is not blocking or even lowering your priority.

	DS

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: Question about sched_yield()
  2002-06-18  0:55 ` Robert Love
  2002-06-18  1:51 ` mgix
  2002-06-18  3:18 ` David Schwartz
@ 2002-06-18  9:36 ` David Schwartz
  2002-06-18 16:58   ` Chris Friesen
  2 siblings, 1 reply; 52+ messages in thread
From: David Schwartz @ 2002-06-18 9:36 UTC (permalink / raw)
To: rml; +Cc: mgix, linux-kernel

>And you seem to have a misconception about sched_yield, too. If a
>machine has n tasks, half of which are doing CPU-intense work and the
>other half of which are just yielding... why on Earth would the yielding
>tasks get any noticeable amount of CPU use?

	Because they are not blocking. They are in an endless CPU burning loop.
They should get CPU use for the same reason they should get CPU use if
they're the only threads running. They are always ready-to-run.

>Quite frankly, even if the supposed standard says nothing of this... I
>do not care: calling sched_yield in a loop should not show up as a CPU
>hog.

	It has to. What if the only task running is:

	while(1) sched_yield();

	What would you expect?

	You cannot use sched_yield as a replacement for blocking or scheduling
priority. You cannot use it to be 'nice'. You can only use it to try to
give other threads a chance to run, usually to give them a chance to
release a resource.

	Imagine if a spinlock uses sched_yield in its wait loop, what would you
expect on a dual-CPU system with a process on CPU A holding the lock? You
*want* the sched_yield process to burn the CPU fully, so that it notices
the spinlock acquisition as soon as possible.

	While linux's sched_yield behavior is definitely sub-optimal, the
particular criticism being leveled (that sched_yield processes get too
much CPU) is wrong-headed.

	DS

^ permalink raw reply	[flat|nested] 52+ messages in thread
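[Editor's note: the spinlock scenario mentioned above is the classic legitimate use of sched_yield(): keep spinning, but let the lock holder run between attempts. A minimal user-space sketch using C11 atomics follows; the type and function names are invented for illustration and are not any real kernel or libc API.]

```c
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical yield-based spinlock: spin until the flag is free,
 * yielding the CPU after each failed attempt so the current holder
 * (perhaps on another CPU) gets a chance to run and release it. */
typedef struct {
    atomic_flag locked;
} yield_lock_t;

static void yield_lock(yield_lock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire))
        sched_yield();   /* busy, but polite: let others run */
}

static void yield_unlock(yield_lock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

A waiter built this way is always ready-to-run, which is exactly the point being argued: it must not be demoted too aggressively, or it will be slow to notice the release.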
* Re: Question about sched_yield()
  2002-06-18  9:36 ` David Schwartz
@ 2002-06-18 16:58 ` Chris Friesen
  2002-06-18 17:12   ` Richard B. Johnson
  ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: Chris Friesen @ 2002-06-18 16:58 UTC (permalink / raw)
To: David Schwartz; +Cc: rml, mgix, linux-kernel

David Schwartz wrote:
>
> >And you seem to have a misconception about sched_yield, too. If a
> >machine has n tasks, half of which are doing CPU-intense work and the
> >other half of which are just yielding... why on Earth would the yielding
> >tasks get any noticeable amount of CPU use?
>
> Because they are not blocking. They are in an endless CPU burning loop. They
> should get CPU use for the same reason they should get CPU use if they're the
> only threads running. They are always ready-to-run.
>
> >Quite frankly, even if the supposed standard says nothing of this... I
> >do not care: calling sched_yield in a loop should not show up as a CPU
> >hog.
>
> It has to. What if the only task running is:
>
> while(1) sched_yield();
>
> What would you expect?

If there is only the one task, then sure it's going to be 100% cpu on that task.

However, if there is anything else other than the idle task that wants to run,
then it should run until it exhausts its timeslice.

One process looping on sched_yield() and another one doing calculations should
result in almost the entire system being devoted to calculations.

Chris

--
Chris Friesen                | MailStop: 043/33/F10
Nortel Networks              | work: (613) 765-0557
3500 Carling Avenue          | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada    | email: cfriesen@nortelnetworks.com

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: Question about sched_yield()
  2002-06-18 16:58 ` Chris Friesen
@ 2002-06-18 17:12 ` Richard B. Johnson
  2002-06-18 17:19   ` mgix
  2002-06-18 17:13 ` Robert Love
  2002-06-18 17:23 ` Rik van Riel
  2 siblings, 1 reply; 52+ messages in thread
From: Richard B. Johnson @ 2002-06-18 17:12 UTC (permalink / raw)
To: Chris Friesen; +Cc: David Schwartz, rml, mgix, linux-kernel

On Tue, 18 Jun 2002, Chris Friesen wrote:

> David Schwartz wrote:
> >
> > >And you seem to have a misconception about sched_yield, too. If a
> > >machine has n tasks, half of which are doing CPU-intense work and the
> > >other half of which are just yielding... why on Earth would the yielding
> > >tasks get any noticeable amount of CPU use?
> >
> > Because they are not blocking. They are in an endless CPU burning loop. They
> > should get CPU use for the same reason they should get CPU use if they're the
> > only threads running. They are always ready-to-run.
> >
> > >Quite frankly, even if the supposed standard says nothing of this... I
> > >do not care: calling sched_yield in a loop should not show up as a CPU
> > >hog.
> >
> > It has to. What if the only task running is:
> >
> > while(1) sched_yield();
> >
> > What would you expect?
>
> If there is only the one task, then sure it's going to be 100% cpu on that task.
>
> However, if there is anything else other than the idle task that wants to run,
> then it should run until it exhausts its timeslice.
>
> One process looping on sched_yield() and another one doing calculations should
> result in almost the entire system being devoted to calculations.
>
> Chris

It's all in the accounting. Use usleep(0) if you want it to "look good".

Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Windows-2000/Professional isn't.

^ permalink raw reply	[flat|nested] 52+ messages in thread
* RE: Question about sched_yield()
  2002-06-18 17:12 ` Richard B. Johnson
@ 2002-06-18 17:19 ` mgix
  2002-06-18 18:01   ` David Schwartz
  2002-06-18 18:21   ` Richard B. Johnson
  0 siblings, 2 replies; 52+ messages in thread
From: mgix @ 2002-06-18 17:19 UTC (permalink / raw)
To: root, Chris Friesen; +Cc: David Schwartz, rml, linux-kernel

> It's all in the accounting. Use usleep(0) if you want it to "look good".

Two things:

1. First, I think there's a misunderstanding on what my original issue
was: I am not interested in any way in CPU cycle accounting, or in
whether the yielding loop should log any of it. All I want is: when I
run a bunch of yielders and an actual working process, I want the
working process to not be slowed down (wall clock) in any way. That's
all. What top shows is of little interest (to me). What matters is how
many real-world seconds it takes for the actually working process to
complete its task. And that should not be affected by the presence of
running yielders. And, David, no one is arguing the fact that a yielder
running all by itself should log 100% of the CPU.

2. I have a question about usleep(0). You seem to make the point that
usleep(0) is equivalent to sched_yield(). I can see how intuitively this
should be the case, but I am not sure if it will always be true. It's
certainly not documented anywhere.

- Mgix

^ permalink raw reply	[flat|nested] 52+ messages in thread
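[Editor's note: whether usleep(0) really behaves like sched_yield() on a given kernel and libc is easiest to settle by measuring. A rough timing sketch follows; the helper names and iteration counts are arbitrary, and the numbers will vary wildly between systems, so no expected output is claimed.]

```c
#include <sched.h>
#include <unistd.h>
#include <sys/time.h>

/* Microseconds since the epoch, same approach as loop.cpp above. */
static long long usecs_now(void)
{
    struct timeval t;
    gettimeofday(&t, 0);
    return (long long)t.tv_sec * 1000000 + t.tv_usec;
}

/* Time n calls of fn, returning elapsed microseconds. */
static long long time_calls(void (*fn)(void), int n)
{
    long long t0 = usecs_now();
    for (int i = 0; i < n; ++i)
        fn();
    return usecs_now() - t0;
}

static void call_yield(void)   { sched_yield(); }
static void call_usleep0(void) { usleep(0); }
```

Comparing time_calls(call_yield, 100000) against time_calls(call_usleep0, 100000) shows immediately whether the two calls cost about the same per iteration on the system under test, or whether usleep(0) takes a different, slower path.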
* RE: Question about sched_yield() 2002-06-18 17:19 ` mgix @ 2002-06-18 18:01 ` David Schwartz 2002-06-18 18:05 ` mgix 2002-06-18 22:43 ` Olivier Galibert 2002-06-18 18:21 ` Richard B. Johnson 1 sibling, 2 replies; 52+ messages in thread From: David Schwartz @ 2002-06-18 18:01 UTC (permalink / raw) To: mgix, root, Chris Friesen; +Cc: rml, linux-kernel >All I want is: when I run a bunch of >yielders and an actual working process, I want the >working process to not be slowed down (wall clock) in >any way. That's all. What top shows is of little interest >(to me). What matters is how many real-world seconds it takes >for the actually working process to complete its task. >And that should not be affected by the presence of running >yielders. And, David, no one is arguing the fact that a yielder >running all by itself should log 100% of the CPU. Your assumptions are just plain wrong. The yielder is being nice, so it should get preferential treatment, not worse treatment. All threads are ready-to-run all the time. Yielding is not the same as blocking or lowering your priority. DS ^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: Question about sched_yield() 2002-06-18 18:01 ` David Schwartz @ 2002-06-18 18:05 ` mgix 2002-06-18 19:11 ` David Schwartz 2002-06-18 22:43 ` Olivier Galibert 1 sibling, 1 reply; 52+ messages in thread From: mgix @ 2002-06-18 18:05 UTC (permalink / raw) To: David Schwartz, root, Chris Friesen; +Cc: rml, linux-kernel > Your assumptions are just plain wrong. The yielder is being nice, so it > should get preferential treatment, not worse treatment. All threads are > ready-to-run all the time. Yielding is not the same as blocking or lowering > your priority. In other words, the more you yield, the nicer you are and the more CPU you get, and those nasty processes that are trying to actually use the CPU to do something with it and wear it down should get it as little as possible. I get it. - Mgix ^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: Question about sched_yield() 2002-06-18 18:05 ` mgix @ 2002-06-18 19:11 ` David Schwartz 2002-06-18 16:58 ` Rob Landley 2002-06-18 19:25 ` Robert Love 0 siblings, 2 replies; 52+ messages in thread From: David Schwartz @ 2002-06-18 19:11 UTC (permalink / raw) To: mgix, root, Chris Friesen; +Cc: rml, linux-kernel On Tue, 18 Jun 2002 11:05:36 -0700, mgix@mgix.com wrote: >> Your assumptions are just plain wrong. The yielder is being nice, so it >>should get preferential treatment, not worse treatment. All threads are >>ready-to-run all the time. Yielding is not the same as blocking or lowering >>your priority. >In other words, the more you yield, the nicer you >are and the more CPU you get, and those nasty processes >that are trying to actually use the CPU to do something >with it and wear it down should get it as little as possible. >I get it. > > - Mgix I'm sorry, but you are being entirely unreasonable. The kernel has no way to know which processes are doing something useful and which ones are just wasting CPU. All it knows is that they are all ready-to-run and they all have the same priority and none of them are blocking. The yielding threads are playing nice and shouldn't be penalized for it. The following code: while(1) sched_yield(); is the problem, not the kernel. You can't expect the kernel's ESP to figure out that you really meant something else or that your process isn't really doing anything useful. If you didn't mean to burn the CPU in an endless loop, WHY DID YOU? You should never call sched_yield in a loop like this unless your intent is to burn the CPU until some other thread/process does something. Since you rarely want to do this, you should seldom if ever call sched_yield in a loop. What sched_yield is good for is if you encounter a situation where you need/want some resource and another thread/process has it. You call sched_yield once, and maybe when you run again, the other thread/process will have released it. 
You can also use it as the spin function in spinlocks. But your expectation that it will reduce CPU usage is just plain wrong. If you have one thread spinning on sched_yield, on a single CPU machine it will definitely get 100% of the CPU. If you have two, they will each definitely get 50% of the CPU. There are blocking functions and scheduler priority functions for this purpose. DS ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Question about sched_yield() 2002-06-18 19:11 ` David Schwartz @ 2002-06-18 16:58 ` Rob Landley 2002-06-18 19:25 ` Robert Love 1 sibling, 0 replies; 52+ messages in thread From: Rob Landley @ 2002-06-18 16:58 UTC (permalink / raw) To: David Schwartz, mgix, root, Chris Friesen; +Cc: rml, linux-kernel On Tuesday 18 June 2002 03:11 pm, David Schwartz wrote: > I'm sorry, but you are being entirely unreasonable. The kernel has no way > to know which processes are doing something useful and which ones are just > wasting CPU. So the fact one of them is calling sched_yield (repeatedly) and the other one isn't doesn't count as just a little BIT of a hint? > What sched_yield is good for is if you encounter a situation where you > need/want some resource and another thread/process has it. You call > sched_yield once, and maybe when you run again, the other thread/process > will have released it. Not if it was a higher-priority task that has already exhausted its time slice... > You can also use it as the spin function in > spinlocks. In user space? Rob ^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: Question about sched_yield() 2002-06-18 19:11 ` David Schwartz 2002-06-18 16:58 ` Rob Landley @ 2002-06-18 19:25 ` Robert Love 2002-06-18 19:53 ` David Schwartz ` (2 more replies) 1 sibling, 3 replies; 52+ messages in thread From: Robert Love @ 2002-06-18 19:25 UTC (permalink / raw) To: David Schwartz; +Cc: mgix, root, Chris Friesen, linux-kernel On Tue, 2002-06-18 at 12:11, David Schwartz wrote: > I'm sorry, but you are being entirely unreasonable. No, sorry, you are. Listen to everyone else here. > If you didn't mean to burn the CPU in an endless loop, WHY DID YOU? It is not an endless loop. Here is the problem. You have n tasks. One type executes: while(1) ; the others execute: while(1) sched_yield(); the second bunch should _not_ receive as much CPU time as the first. This has nothing to do with intelligent work or blocking or picking your nose. It has everything to do with the fact that yielding means "relinquish my timeslice" and "put me at the end of the runqueue". If we are doing this, then why does the sched_yield'ing task monopolize the CPU? BECAUSE IT IS BROKEN. > You should never call sched_yield in a loop like this unless your intent is > to burn the CPU until some other thread/process does something. Since you > rarely want to do this, you should seldom if ever call sched_yield in a loop. But there are other tasks that wish to do something in these examples... > But your expectation that it will reduce CPU usage is just plain wrong. If > you have one thread spinning on sched_yield, on a single CPU machine it will > definitely get 100% of the CPU. If you have two, they will each definitely > get 50% of the CPU. There are blocking functions and scheduler priority > functions for this purpose. If they are all by themselves, of course they will get 100% of the CPU. No one is saying sched_yield is equivalent to blocking. 
But all the processes being equal, one that keeps yielding its timeslice should not get as much CPU time as one that does not. Why is that not logical to you? The original report was that given one task of the second case (above) and two of the first, everything was fine - the yielding task received little CPU as it continually relinquishes its timeslice. In the alternative case, there are two of each types of tasks. Now, the CPU is split and the yielding tasks are receiving a chunk of it. Why? Because the yielding behavior is broken and the tasks are continually yielding and rescheduling back and forth. So there is an example of how it should work and how it does. It is broken. There isn't even really an argument. Ingo and I have both identified this is a problem in 2.4 and 2.5 and Ingo fixed it in 2.5. If 2.5 no longer has this behavior, then are you saying it is NOW broken? Robert Love ^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: Question about sched_yield() 2002-06-18 19:25 ` Robert Love @ 2002-06-18 19:53 ` David Schwartz 2002-06-18 20:12 ` mgix 2002-06-18 20:08 ` Richard B. Johnson 2002-06-19 11:10 ` Bill Davidsen 2 siblings, 1 reply; 52+ messages in thread From: David Schwartz @ 2002-06-18 19:53 UTC (permalink / raw) To: rml; +Cc: mgix, root, Chris Friesen, cfriesen, linux-kernel On 18 Jun 2002 12:25:13 -0700, Robert Love wrote: >> If you didn't mean to burn the CPU in an endless loop, WHY DID YOU? >It is not an endless loop. Here is the problem. You have n tasks. One >type executes: > > while(1) ; > >the others execute: > > while(1) > sched_yield(); >the second bunch should _not_ receive has much CPU time as the first. Why not? They're both at the same priority. They're both always ready-to-run. Neither blocks. Why should the kernel assume that one will do useful work while the other won't? >This has nothing to do with intelligent work or blocking or picking your >nose. It has everything to do with the fact that yielding means >"relinquish my timeslice" and "put me at the end of the runqueue". And consider me immediately ready to run again. >> You should never call sched_yield in a loop like this unless your >>intent >>is >>to burn the CPU until some other thread/process does something. Since you >>rarely want to do this, you should seldom if ever call sched_yield in a >>loop. >But there are other tasks that wish to do something in these examples... All the tasks are equally situated. Same priority, all ready-to-run. >> But your expectation that it will reduce CPU usage is just plain wrong. >>If >> >>you have one thread spinning on sched_yield, on a single CPU machine it >>will >>definitely get 100% of the CPU. If you have two, they will each definitely >>get 50% of the CPU. There are blocking functions and scheduler priority >>functions for this purpose. >If they are all by themselves, of course they will get 100% of the CPU. 
>No one is saying sched_yield is equivalent to blocking. I am not even >saying it should get no CPU! It should get a bunch. But all the >processes being equal, one that keeps yielding its timeslice should not >get as much CPU time as one that does not. Why is that not logical to >you? Because a thread/process that voluntarily yields its timeslice should be rewarded over one that burns it up, not penalized. >The original report was that given one task of the second case (above) >and two of the first, everything was fine - the yielding task received >little CPU as it continually relinquishes its timeslice. In the >alternative case, there are two of each types of tasks. Now, the CPU is >split and the yielding tasks are receiving a chunk of it. Why? Because >the yielding behavior is broken and the tasks are continually yielding >and rescheduling back and forth. So there is an example of how it >should work and how it does. It is broken. So you're saying that the yielding tasks should be penalized for yielding rather than just being descheduled? >There isn't even really an argument. Ingo and I have both identified >this is a problem in 2.4 and 2.5 and Ingo fixed it in 2.5. If 2.5 no >longer has this behavior, then are you saying it is NOW broken? I'm saying that the fixed behavior is better than the previous behavior and that the previous behavior wasn't particularly good. But I'm also saying that people are misusing sched_yield and have incorrect expectations about it. If you spin on sched_yield, expect CPU wastage. It's really that simple. Threads/processes that use sched_yield intelligently should be rewarded for doing so, not penalized. Otherwise sched_yield becomes much less useful. DS ^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: Question about sched_yield() 2002-06-18 19:53 ` David Schwartz @ 2002-06-18 20:12 ` mgix 2002-06-18 20:42 ` David Schwartz 0 siblings, 1 reply; 52+ messages in thread From: mgix @ 2002-06-18 20:12 UTC (permalink / raw) To: David Schwartz, rml; +Cc: root, Chris.Friesen, cfriesen, linux-kernel Let's do a little thought experiment with 2 naive scheduling algorithms and see what happens: Algorithm 1: the FIFO (favours those that haven't had it yet over those who already had it)
- All ready-to-run processes are in a FIFO queue.
- The top of the queue is the yielder.
- It's given a slice of CPU time to run on.
- It gives it back right away.
- It's sent to the back of the queue.
- The next in line is a task that does real work.
- It gets a time slice and fully uses it.
- etc ...
Algorithm 1 will have the expected behaviour: yielders won't consume anything, workers get it all. Algorithm 2: the Monte-Carlo scheduler (favours no one, schedules blindly)
- All ready-to-run processes are kept in a black bag.
- The scheduler randomly grabs one out of the bag.
- It's given a slice of CPU time to run on.
- If it's a yielder, it gives the CPU right back.
- If it's an actual worker, it makes full use of the slice.
- etc ...
Lo and behold, Algorithm 2 exhibits the same behaviour as well: yielders get nothing since they give it all away, and workers get it all. Now, if I understand you well enough David, you'd like an algorithm where the less you want the CPU, the more you get it. I'd love it if you could actually give us an outline of your ideal scheduler so I can try my thought experiment on it, because from what I've understood so far, your hypothetical scheduler would allocate all of the CPU to the yielders. Also, since it seems to worry you: no, I'm not using sched_yield to implement pseudo-blocking behaviour. - Mgix ^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: Question about sched_yield() 2002-06-18 20:12 ` mgix @ 2002-06-18 20:42 ` David Schwartz 2002-06-18 20:47 ` mgix 2002-06-18 22:28 ` Ingo Molnar 0 siblings, 2 replies; 52+ messages in thread From: David Schwartz @ 2002-06-18 20:42 UTC (permalink / raw) To: mgix, rml; +Cc: root, Chris Friesen, cfriesen, linux-kernel >Now, if I understand you well enough David, you'd like an >algorithm where the less you want the CPU, the more you get >it. Exactly. This is the UNIX tradition of static and dynamic priorities. The more polite you are about yielding the CPU when you don't need it, the more claim you have to getting it when you do need it. >I'd love if you could actually give us an outlook of >your ideal scheduler so I can try my thought experiment on it, >because from what I've understood so far, your hypothetical >scheduler would allocate all of the CPU to the yielders. Not all, just the same share any other process gets. They're all ready-to-run, they're all at the same priority. >Also, since it seems to worry you: no I'm not using sched_yield >to implement pseudo-blocking behaviour. Then tell us what you are doing so we can tell you the *right* way to do it. Unless this is just an abstract theoretical exercise, you shouldn't complain when ready-to-run threads get the CPU. DS ^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: Question about sched_yield() 2002-06-18 20:42 ` David Schwartz @ 2002-06-18 20:47 ` mgix 2002-06-18 22:00 ` David Schwartz 2002-06-18 22:28 ` Ingo Molnar 1 sibling, 1 reply; 52+ messages in thread From: mgix @ 2002-06-18 20:47 UTC (permalink / raw) To: David Schwartz, rml; +Cc: root, Chris.Friesen, cfriesen, linux-kernel > >Now, if I understand you well enough David, you'd like an > >algorithm where the less you want the CPU, the more you get > >it. > > Exactly. This is the UNIX tradition of static and dynamic priorities. The > more polite you are about yielding the CPU when you don't need it, the more > claim you have to getting it when you do need it. > > >I'd love if you could actually give us an outlook of > >your ideal scheduler so I can try my thought experiment on it, > >because from what I've understood so far, your hypothetical > >scheduler would allocate all of the CPU to the yielders. > > Not all, just the same share any other process gets. They're all > ready-to-run, they're all at the same priority. Correct my logic, please: 1. Rule: The less you want the CPU, the more you get it. 2. A yielder process never wants the CPU 3. As a result of Rule 1, it always gets it. ^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: Question about sched_yield() 2002-06-18 20:47 ` mgix @ 2002-06-18 22:00 ` David Schwartz 0 siblings, 0 replies; 52+ messages in thread From: David Schwartz @ 2002-06-18 22:00 UTC (permalink / raw) To: mgix, rml; +Cc: root, Chris Friesen, cfriesen, linux-kernel >Correct my logic, please: > > 1. Rule: The less you want the CPU, the more you get it. The more you relinquish the CPU, the more you get it when you do want it. (Dynamic priority.) > 2. A yielder process never wants the CPU A yielder process *always* wants the CPU, but always relinquishes it when it gets it. (It's always ready-to-run.) > 3. As a result of Rule 1, it always gets it. The correct rules 1 and 2 don't lead to the conclusion you think. DS ^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: Question about sched_yield() 2002-06-18 20:42 ` David Schwartz 2002-06-18 20:47 ` mgix @ 2002-06-18 22:28 ` Ingo Molnar 1 sibling, 0 replies; 52+ messages in thread From: Ingo Molnar @ 2002-06-18 22:28 UTC (permalink / raw) To: David Schwartz Cc: mgix, Robert Love, root, Chris Friesen, cfriesen, linux-kernel On Tue, 18 Jun 2002, David Schwartz wrote: > Exactly. This is the UNIX tradition of static and dynamic > priorities. The more polite you are about yielding the CPU when you > don't need it, the more claim you have to getting it when you do need > it. firstly, the thing that defines the scheduler's implementation details is not tradition but actual, hard, verifiable use. If you've ever seen sched_yield() code under Linux then you'll quickly realize that it's mainly used as an "oh damn, i cannot continue now, i have no proper kernel object to sleep on, lets busy-wait" kind of mechanism. (Which, by the way, is as far from polite as it gets.) secondly, i'd like to ask everyone arguing one way or the other, and this means you and Robert and everyone else as well, to actually read and think about the fix i did. The new implementation of sched_yield() is *not* throwing away your timeslices immediately. In fact it first tries it the 'polite' way to get some progress by gradually decreasing its priority - *IF* that fails (and it has to fail and keep looping for at least ~10 times if there are default priorities) then it will start dropping away timeslices - one by one. That gives another 20 opportunities for the task to actually get some work done. Even after this we still let the task run one more jiffy. *THEN* only is the task pushed back into the expired array. so the tests quoted here were the extreme example: a task did nothing but sched_yield(). And the modified scheduler did a good job of suppressing this process, if there was other, non-sched_yield() work around. A database process with a casual sched_yield() call should not see much effect. 
Ingo ^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: Question about sched_yield() 2002-06-18 19:25 ` Robert Love 2002-06-18 19:53 ` David Schwartz @ 2002-06-18 20:08 ` Richard B. Johnson 2002-06-19 11:10 ` Bill Davidsen 2 siblings, 0 replies; 52+ messages in thread From: Richard B. Johnson @ 2002-06-18 20:08 UTC (permalink / raw) To: Robert Love; +Cc: David Schwartz, mgix, Chris Friesen, linux-kernel On 18 Jun 2002, Robert Love wrote: > On Tue, 2002-06-18 at 12:11, David Schwartz wrote: > > > I'm sorry, but you are being entirely unreasonable. > > No, sorry, you are. Listen to everyone else here. > > > If you didn't mean to burn the CPU in an endless loop, WHY DID YOU? > > It is not an endless loop. Here is the problem. You have n tasks. One > type executes: > > while(1) ; > > the others execute: > > while(1) > sched_yield(); > > the second bunch should _not_ receive has much CPU time as the first. > This has nothing to do with intelligent work or blocking or picking your > nose. It has everything to do with the fact that yielding means > "relinquish my timeslice" and "put me at the end of the runqueue". > > If we are doing this, then why does the sched_yield'ing task monopolize > the CPU? BECAUSE IT IS BROKEN. > > > You should never call sched_yield in a loop like this unless your intent is > > to burn the CPU until some other thread/process does something. Since you > > rarely want to do this, you should seldom if ever call sched_yield in a loop. > > But there are other tasks that wish to do something in these examples... > > > But your expectation that it will reduce CPU usage is just plain wrong. If > > you have one thread spinning on sched_yield, on a single CPU machine it will > > definitely get 100% of the CPU. If you have two, they will each definitely > > get 50% of the CPU. There are blocking functions and scheduler priority > > functions for this purpose. > > If they are all by themselves, of course they will get 100% of the CPU. > No one is saying sched_yield is equivalent to blocking. 
I am not even > saying it should get no CPU! It should get a bunch. But all the > processes being equal, one that keeps yielding its timeslice should not > get as much CPU time as one that does not. Why is that not logical to > you? > > The original report was that given one task of the second case (above) > and two of the first, everything was fine - the yielding task received > little CPU as it continually relinquishes its timeslice. In the > alternative case, there are two of each types of tasks. Now, the CPU is > split and the yielding tasks are receiving a chunk of it. Why? Because > the yielding behavior is broken and the tasks are continually yielding > and rescheduling back and forth. So there is an example of how it > should work and how it does. It is broken. > > There isn't even really an argument. Ingo and I have both identified > this is a problem in 2.4 and 2.5 and Ingo fixed it in 2.5. If 2.5 no > longer has this behavior, then are you saying it is NOW broken? > > Robert Love Let's put in some numbers.

Task 0: for(;;) ;
Task 1: for(;;) sched_yield();

I will assume that scheduling takes 0 seconds and system calls also take 0 seconds. This should be fair if I modify Task 0 so it does: for(;;) getuid(); which means both tasks make continuous system calls, which should take, roughly, the same time. Initial conditions have Task 0 spinning for HZ time until preempted, so Task 0 starts up with HZ more time than Task 1. Task 1 now gets the CPU. When Task 1 gets the CPU, it immediately gives it up; it does not wait until it's preempted like Task 0. Task 0 now gets the CPU and keeps it for HZ time. Clearly, Task 0 should be getting at least HZ more CPU time than Task 1. And, yawn, it probably does! I just launched 400 of the following programs:

#include <sched.h>

int main(int c, char *argv[])
{
    for(;;)
        sched_yield();
    return c;
}

This is a dual CPU system. Top shows 184% system CPU usage and 14% user CPU usage. 
3:54pm up 23:30, 2 users, load average: 14.74, 8.08, 3.30
463 processes: 62 sleeping, 401 running, 0 zombie, 0 stopped
CPU states: 13.7% user, 186.3% system, 0.0% nice, 0.0% idle
Mem: 320720K av, 215816K used, 104904K free, 0K shrd, 11752K buff
Swap: 1044208K av, 1044208K used, 0K free, 131732K cached

PID   USER  PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
8627  root  18  0  208  208 172   R    0   15.6 0.0  0:31 xxx
8637  root  18  0  208  208 172   R    0   15.6 0.0  0:30 xxx
8638  root  18  0  208  208 172   R    0   15.6 0.0  0:30 xxx
8624  root  15  0  208  208 172   R    0   13.7 0.0  0:32 xxx
8629  root  15  0  208  208 172   R    0   13.7 0.0  0:31 xxx
8632  root  15  0  208  208 172   R    0   13.7 0.0  0:30 xxx
8626  root  15  0  208  208 172   R    0   12.7 0.0  0:31 xxx
8628  root  15  0  208  208 172   R    0   12.7 0.0  0:31 xxx
8633  root  15  0  208  208 172   R    0   12.7 0.0  0:30 xxx
8639  root  15  0  208  208 172   R    0   12.7 0.0  0:30 xxx
8625  root  14  0  208  208 172   R    0   11.7 0.0  0:32 xxx
8630  root  15  0  208  208 172   R    0   11.7 0.0  0:30 xxx
8634  root  14  0  208  208 172   R    0   11.7 0.0  0:30 xxx
8635  root  15  0  208  208 172   R    0   11.7 0.0  0:30 xxx
8636  root  15  0  208  208 172   R    0   11.7 0.0  0:30 xxx
[SNIPPED]

The system is as responsive as ever even though there are all those 'CPU burners', each showing they take roughly 12 percent of the CPU. Let's see 400 * 0.12 = 48 == accounting error, nothing more. Cheers, Dick Johnson Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips). Windows-2000/Professional isn't. ^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: Question about sched_yield() 2002-06-18 19:25 ` Robert Love 2002-06-18 19:53 ` David Schwartz 2002-06-18 20:08 ` Richard B. Johnson @ 2002-06-19 11:10 ` Bill Davidsen 2002-06-19 12:04 ` Ingo Molnar 2 siblings, 1 reply; 52+ messages in thread From: Bill Davidsen @ 2002-06-19 11:10 UTC (permalink / raw) To: Robert Love; +Cc: Linux Kernel Mailing List On 18 Jun 2002, Robert Love wrote: > the second bunch should _not_ receive has much CPU time as the first. > This has nothing to do with intelligent work or blocking or picking your > nose. It has everything to do with the fact that yielding means > "relinquish my timeslice" and "put me at the end of the runqueue". I have put some ideas below, I agree with "relinquish" but not end of queue. > If we are doing this, then why does the sched_yield'ing task monopolize > the CPU? BECAUSE IT IS BROKEN. Consider the case where a threaded application and a CPU hog are running. In sum the threaded process is also a hog, although any one thread is not. I find I can figure out problems like this if I start at the desired result and work back, and what I believe is the desired result is that the pure hog gets half the CPU and the threaded application, in aggregate, gets the other. So to provide this, I would think that the desired action at sched_yield() would look somewhat like this:

if there is less than {sched overhead} on the timeslice
    add to end of queue
    do accounting to reflect the time used
    run the next thread
else if there is another thread of this process
    queue this thread following the last thread of this process
    give the remaining time slice to the top thread of this process
else
    put the yield thread at the end of the queue

One question which arises is cpu affinity, and I confess that I can't predict the better of process affinity or contested resource affinity (run the next thread now). But for uni machines, I think what I outline above does result in the behaviour I find most intuitive and desirable. 
On your comment about leaving the yielded thread at the top of the queue, putting the yielded thread at the end of the queue effectively forces the context switch time to be the sum of all the timeslices given in one pass through the queue. In a system with a contested resource and at least one hog, I think that would make the threaded task starve for CPU. As noted by several people, queueing back at the head also can behave badly for some cases, although the demo program provided is probably not typical. I think what I suggest will avoid both problems, although it might have other drawbacks, like sched overhead if the queue was very long and didn't have other threads. Atypical, but a max "scan for another thread" count could be provided, or even a per-process count of runnable threads in the queue, which is probably more overhead than just scanning for light load. At least under light load you can afford it. Your comments? -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: Question about sched_yield() 2002-06-19 11:10 ` Bill Davidsen @ 2002-06-19 12:04 ` Ingo Molnar 0 siblings, 0 replies; 52+ messages in thread From: Ingo Molnar @ 2002-06-19 12:04 UTC (permalink / raw) To: Bill Davidsen; +Cc: Robert Love, Linux Kernel Mailing List On Wed, 19 Jun 2002, Bill Davidsen wrote: > Consider the case where a threaded application and a CPU hog are > running. In sum the threaded process is also a hog, although any one > thread is not. it is a mistake to assume that sched_yield() == 'threaded applications'. sched_yield() is a horrible interface to base userspace spinlocks on, it's not in any way cheaper than real blocking, even in the best-case situation when there is no 'useless' yielding back and forth. In the worst-case situation it can be a few orders more expensive (!) than blocking-based solutions. LinuxThread's use of sched_yield() was and remains a hack. I'm actually surprised that it works for real applications. We are trying to improve things as much as possible on the kernel scheduler side, but since there is absolutely no object/contention information available for kernel space (why it wants to schedule away, which other processes are using the blocked object, etc.), it's impossible to do a perfect job - in fact it's not even possible to do a 'good' job. IMO people interested in high-performance threading should concentrate on the lightweight, kernel-backed userspace semaphores patch(es) that have been presented a few weeks ago, and merge those concepts into LinuxThreads (or into their own threading/spinlock solutions). Those patches do it properly, and the scheduler will be able to give all the help it can. Ingo ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Question about sched_yield() 2002-06-18 18:01 ` David Schwartz 2002-06-18 18:05 ` mgix @ 2002-06-18 22:43 ` Olivier Galibert 1 sibling, 0 replies; 52+ messages in thread From: Olivier Galibert @ 2002-06-18 22:43 UTC (permalink / raw) To: linux-kernel On Tue, Jun 18, 2002 at 11:01:53AM -0700, David Schwartz wrote: > Your assumptions are just plain wrong. The yielder is being nice, so it > should get preferential treatment, not worse treatment. Heh? Yielding is not about being nice, if you want to be nice you lower your priority. Yielding is telling the scheduler "I am temporarily out of things to do, maybe for a while, but at least until all the other running threads got a chance to run too". Any scheduler that runs you immediately again without running the other threads is _broken_. OG. ^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: Question about sched_yield() 2002-06-18 17:19 ` mgix 2002-06-18 18:01 ` David Schwartz @ 2002-06-18 18:21 ` Richard B. Johnson 1 sibling, 0 replies; 52+ messages in thread From: Richard B. Johnson @ 2002-06-18 18:21 UTC (permalink / raw) To: mgix; +Cc: Chris Friesen, David Schwartz, rml, linux-kernel On Tue, 18 Jun 2002 mgix@mgix.com wrote: > > > It's all in the accounting. Use usleep(0) if you want it to "look good". > > > Two things: > > 1. First, I think there's a misunderstanding about what my > original issue was: I am not interested in any way in > CPU cycle accounting, or in whether the yielding loop should > log any of it. All I want is: when I run a bunch of > yielders and an actual working process, I want the > working process to not be slowed down (wall clock) in > any way. That's all. What top shows is of little interest > (to me). What matters is how many real-world seconds it takes > for the actually working process to complete its task. > And that should not be affected by the presence of running > yielders. And, David, no one is arguing the fact that a yielder > running all by itself should log 100% of the CPU. > Well the problem seems to be an inconsistency I reported to 'the list' some time ago. As I recall, Ingo replied that I should use usleep(0) and the problem I reported would "go away". It did go away. However, if you run this on a single-processor machine, 'top' will show that each task gets 99+ percent of the CPU, which of course can't be correct.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sched.h>

int main(int c, char *argv[])
{
    if(!fork()) {
        strcpy(argv[0], "Child");
        for(;;)
            ;
    }
    strcpy(argv[0], "Parent");
    for(;;)
        sched_yield();
    return c;
}

So, it seems that the guy that's yielding the CPU gets 'charged' for the CPU time he gave away. Fair enough, I guess. As I see it, sched_yield() will give the CPU to any single computable task once. After this, the caller gets the CPU back whether or not he wants it. > 2. 
I have a question about usleep(0). You seem to make the point > that usleep(0) is equivalent to sched_yield(). I can see how > intuitively this should be the case, but I am not sure if it > will always be true. It's certainly not documented anywhere. > No. I did not mention or imply "equivalent", only that you can use it instead, in many, perhaps most, instances. Cheers, Dick Johnson Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips). Windows-2000/Professional isn't. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Question about sched_yield() 2002-06-18 16:58 ` Chris Friesen 2002-06-18 17:12 ` Richard B. Johnson @ 2002-06-18 17:13 ` Robert Love 2002-06-18 18:00 ` David Schwartz 2002-06-18 17:23 ` Rik van Riel 2 siblings, 1 reply; 52+ messages in thread From: Robert Love @ 2002-06-18 17:13 UTC (permalink / raw) To: Chris Friesen; +Cc: David Schwartz, mgix, linux-kernel On Tue, 2002-06-18 at 09:58, Chris Friesen wrote: > David Schwartz wrote: > > > What would you expect? > > If there is only the one task, then sure it's going to be 100% cpu on that > task. > > However, if there is anything else other than the idle task that wants to > run, then it should run until it exhausts its timeslice. > > One process looping on sched_yield() and another one doing calculations > should result in almost the entire system being devoted to calculations. Exactly. The reason the behavior is odd is not because the sched_yield task is getting any CPU, David. I realize sched_yield is not equivalent to blocking. The reason this behavior is suspect is because the task is receiving a similar amount of CPU to tasks that are _not_ yielding but in fact doing useful work for the entire duration of their timeslice. A task that continually uses its timeslice vs one that yields should easily receive a greater amount of CPU, but this is not the case. As someone who works in the scheduler, this points out that sched_yield is, well, broken. First guess would be it is queuing to the front of the runqueue (it once did this but I thought it was fixed) or otherwise exhausting the timeslice wrong. Someone pointed out this bug existed similarly in 2.5, although it was a bit different. 2.5 has a different (and better, imo) sched_yield implementation that tries to overcome certain shortcomings and also perform optimally and fairly. Robert Love ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Question about sched_yield() 2002-06-18 17:13 ` Robert Love @ 2002-06-18 18:00 ` David Schwartz 2002-06-18 22:45 ` Stevie O 0 siblings, 1 reply; 52+ messages in thread From: David Schwartz @ 2002-06-18 18:00 UTC (permalink / raw) To: rml, Chris Friesen; +Cc: mgix, linux-kernel >Exactly. The reason the behavior is odd is not because the sched_yield >task is getting any CPU, David. I realize sched_yield is not equivalent >to blocking. Good. >The reason this behavior is suspect is because the task is receiving a >similar amount of CPU to tasks that are _not_ yielding but in fact doing >useful work for the entire duration of their timeslice. This is the same error repeated again. Since you realize that an endless loop on sched_yield is *not* equivalent to blocking, why do you then say "in fact doing useful work"? By what form of ESP is the kernel supposed to determine that the sched_yield task is not 'doing useful work' and the other task is? The kernel doesn't know the loop is endless. For all it knows, as soon as another process gets a drop of CPU, the yielding process may be able to succeed. And because the yielding process is being nice (by yielding), it should get better and better treatment over processes that are burning CPU rather than yielding. >A task that continually uses its timeslice vs one that yields should >easily receive a greater amount of CPU, but this is not the case. Why should the mean task get preferential treatment over the nice task when both are always ready-to-run? DS ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Question about sched_yield() 2002-06-18 18:00 ` David Schwartz @ 2002-06-18 22:45 ` Stevie O 2002-06-19 2:11 ` David Schwartz 2002-06-20 20:31 ` David Schwartz 0 siblings, 2 replies; 52+ messages in thread From: Stevie O @ 2002-06-18 22:45 UTC (permalink / raw) To: David Schwartz, rml, Chris Friesen; +Cc: mgix, linux-kernel At 11:00 AM 6/18/2002 -0700, David Schwartz wrote: >This is the same error repeated again. Since you realize that an endless >loop on sched_yield is *not* equivalent to blocking, why do you then say "in >fact doing useful work"? By what form of ESP is the kernel supposed to >determine that the sched_yield task is not 'doing useful work' and the other >task is? By this form of ESP: sched_yield() means "I have nothing better to do right now, give my time to someone who does". If a thread is doing useful work, why would it call sched_yield() ?!? -- Stevie-O Real programmers use COPY CON PROGRAM.EXE ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Question about sched_yield() 2002-06-18 22:45 ` Stevie O @ 2002-06-19 2:11 ` David Schwartz 2002-06-19 2:52 ` Stevie O 2002-06-20 20:31 ` David Schwartz 1 sibling, 1 reply; 52+ messages in thread From: David Schwartz @ 2002-06-19 2:11 UTC (permalink / raw) To: stevie, rml, Chris Friesen; +Cc: mgix, linux-kernel On Tue, 18 Jun 2002 18:45:55 -0400, Stevie O wrote: >At 11:00 AM 6/18/2002 -0700, David Schwartz wrote: >>This is the same error repeated again. Since you realize that an endless >>loop on sched_yield is *not* equivalent to blocking, why do you then say >>"in >>fact doing useful work"? By what form of ESP is the kernel supposed to >>determine that the sched_yield task is not 'doing useful work' and the >>other >>task is? >By this form of ESP: sched_yield() means "I have nothing better to do right >now, give my time to someone who does". No, this is not what sched_yield means. What 'sched_yield' means is that you're at a point where it's convenient for another process to run. For example, perhaps you just released a mutex that you held for a long period of time, or perhaps you could function more efficiently if you could acquire a resource another thread holds. >If a thread is doing useful work, >why would it call sched_yield() ?!? Perhaps to allow other threads to make forward progress. Perhaps to give other threads a chance to use a resource it just released. Perhaps in hopes that another thread will release a resource it could benefit from being able to acquire. DS ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Question about sched_yield() 2002-06-19 2:11 ` David Schwartz @ 2002-06-19 2:52 ` Stevie O 0 siblings, 0 replies; 52+ messages in thread From: Stevie O @ 2002-06-19 2:52 UTC (permalink / raw) To: David Schwartz, rml, Chris Friesen; +Cc: mgix, linux-kernel At 07:11 PM 6/18/2002 -0700, David Schwartz wrote: >>By this form of ESP: sched_yield() means "I have nothing better to do right >>now, give my time to someone who does". > > No, this is not what sched_yield means. What 'sched_yield' means is that >you're at a point where it's convenient for another process to run. For >example, perhaps you just released a mutex that you held for a long period of >time, or perhaps you could function more efficiently if you could acquire a >resource another thread holds. To-may-to, to-mah-to. Okay. I'll try again. sched_yield() means "It would be convenient (or I would function more efficiently) if another process or thread were to run right now." i.e., "Would you be so kind as to run somebody else? As in, not me.". From what I've gathered in this thread, something akin to this happens: You are trying to get information out of any of three people. Two of the three people will have better/more reliable information, so you give them a higher priority. You choose a person to ask for information -> schedule() You ask person #1 for information. Person #1 says: "Ask someone else" -> sched_yield() You choose a person to ask for information -> schedule() You ask person #2 for information. Person #2 says: "Ask someone else" -> sched_yield() You choose a person to ask for information -> schedule() Now, any rational human being (who actually wants this information) will certainly proceed as such: You ask person #3 for information. Person #3 says: "Here is the information you need." You go on your merry way. 
However, the Linux scheduler will look at its options, see that Person #1 has a higher priority than Person #3 and that Person #1 is marked ready-to-run, and proceed: You ask person #1 for information. Person #1 says: "Ask someone else" -> sched_yield() Proceed to thrash between #1 and #2, ignoring #3. Now, don't get me wrong; I understand your argument. Like I said, Person #1 is not blocking (sched_yield() does not block), and is at a higher priority than #3. All other things being equal, Person #1 should be scheduled after Person #2 yields. But things aren't equal. There's a crucial difference here: Person #1 has relinquished his timeslice, and Person #3 hasn't. And I'm arguing that, if a thread/process/whatever calls sched_yield, then that thread should only be run in that timeslice if: * There are no other threads that are ready-to-run on that processor, or * Every other thread has called sched_yield(). Yes, the current behavior is technically correct; sched_yield() properly gives up control to another process. But it will sometimes give up the processor to another process that previously did a sched_yield() in the current timeslice. And that's just stupid. >>If a thread is doing useful work, >>why would it call sched_yield() ?!? > Perhaps to allow other threads to make forward progress. Perhaps to give >other threads a chance to use a resource it just released. Perhaps in hopes >that another thread will release a resource it could benefit from being able >to acquire. Yeah. And if two threads decide to be 'nice' -- to allow a third thread to make forward progress -- neither will. The scheduler will thrash between the two threads, in preference to scheduling the third thread. This is in accordance with the strictest interpretation of sched_yield()'s definition. -- Stevie-O Real programmers use COPY CON PROGRAM.EXE ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Question about sched_yield() 2002-06-18 22:45 ` Stevie O 2002-06-19 2:11 ` David Schwartz @ 2002-06-20 20:31 ` David Schwartz 1 sibling, 0 replies; 52+ messages in thread From: David Schwartz @ 2002-06-20 20:31 UTC (permalink / raw) To: stevie, rml, Chris Friesen; +Cc: mgix, linux-kernel On Tue, 18 Jun 2002 18:45:55 -0400, Stevie O wrote: >At 11:00 AM 6/18/2002 -0700, David Schwartz wrote: >>This is the same error repeated again. Since you realize that an endless >>loop on sched_yield is *not* equivalent to blocking, why do you then say >>"in >>fact doing useful work"? By what form of ESP is the kernel supposed to >>determine that the sched_yield task is not 'doing useful work' and the >>other >>task is? >By this form of ESP: sched_yield() means "I have nothing better to do right >now, give my time to someone who does". If a thread is doing useful work, >why would it call sched_yield() ?!? To give other threads a chance to do useful work too, perhaps because it just released a mutex that other threads might need that it held for an unusually long time. DS ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Question about sched_yield() 2002-06-18 16:58 ` Chris Friesen 2002-06-18 17:12 ` Richard B. Johnson 2002-06-18 17:13 ` Robert Love @ 2002-06-18 17:23 ` Rik van Riel 2002-06-18 17:50 ` Chris Friesen 2 siblings, 1 reply; 52+ messages in thread From: Rik van Riel @ 2002-06-18 17:23 UTC (permalink / raw) To: Chris Friesen; +Cc: David Schwartz, rml, mgix, linux-kernel On Tue, 18 Jun 2002, Chris Friesen wrote: > > It has to. What if the only task running is: > > > > while(1) sched_yield(); > > > > What would you expect? > > If there is only the one task, then sure it's going to be 100% cpu on > that task. > > However, if there is anything else other than the idle task that wants > to run, then it should run until it exhausts its timeslice. > > One process looping on sched_yield() and another one doing calculations > should result in almost the entire system being devoted to calculations. So if you have a database with 20 threads yielding to each other each time a lock is contended and one CPU hog the database should be reduced to a very small percentage of the CPU ? regards, Rik -- http://www.linuxsymposium.org/2002/ "You're one of those condescending OLS attendants" "Here's a nickle kid. Go buy yourself a real t-shirt" http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Question about sched_yield() 2002-06-18 17:23 ` Rik van Riel @ 2002-06-18 17:50 ` Chris Friesen 0 siblings, 0 replies; 52+ messages in thread From: Chris Friesen @ 2002-06-18 17:50 UTC (permalink / raw) To: Rik van Riel; +Cc: David Schwartz, rml, mgix, linux-kernel Rik van Riel wrote: > > On Tue, 18 Jun 2002, Chris Friesen wrote: > > However, if there is anything else other than the idle task that wants > > to run, then it should run until it exhausts its timeslice. > > > > One process looping on sched_yield() and another one doing calculations > > should result in almost the entire system being devoted to calculations. > > So if you have a database with 20 threads yielding to each other > each time a lock is contended and one CPU hog the database should > be reduced to a very small percentage of the CPU ? If the database threads are at a higher priority than the cpu hog, then no they shouldn't. The one that owns the lock should progress as normal. Assuming that only the thread owning the lock progresses, the cpu usage ratio between the threads as a group and the hog should be roughly equivalent to a single database thread and a single hog. We certainly shouldn't thrash the yielding threads and starve the hog. Chris -- Chris Friesen | MailStop: 043/33/F10 Nortel Networks | work: (613) 765-0557 3500 Carling Avenue | fax: (613) 765-2986 Nepean, ON K2H 8E9 Canada | email: cfriesen@nortelnetworks.com ^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: Question about sched_yield() 2002-06-18 0:46 ` David Schwartz 2002-06-18 0:55 ` Robert Love @ 2002-06-18 1:41 ` mgix 2002-06-18 3:21 ` David Schwartz 2002-06-18 4:55 ` Ingo Molnar ` (2 subsequent siblings) 4 siblings, 1 reply; 52+ messages in thread From: mgix @ 2002-06-18 1:41 UTC (permalink / raw) To: David Schwartz, linux-kernel > -----Original Message----- > From: David Schwartz [mailto:davids@webmaster.com] > Sent: Monday, June 17, 2002 5:46 PM > To: mgix@mgix.com; linux-kernel@vger.kernel.org > Subject: Re: Question about sched_yield() > > > > On Sat, 15 Jun 2002 15:15:32 -0700, mgix@mgix.com wrote: > > > >Hello, > > > >I am seeing some strange linux scheduler behaviours, > >and I thought this'd be the best place to ask. > > > >I have two processes, one that loops forever and > >does nothing but calling sched_yield(), and the other > >is basically benchmarking how fast it can compute > >some long math calculation. > [snip] > > You seem to have a misconception about what sched_yield is for. It is not a > replacement for blocking or a scheduling priority adjustment. It simply lets > other ready-to-run tasks be scheduled before returning to the current task. > > Here's a quote from SuS3: > > "The sched_yield() function shall force the running thread to relinquish the > processor until it again becomes the head of its thread list. It takes no > arguments." > > This neither says nor implies anything about CPU usage. It simply says that > the current thread will yield and be put at the end of the list. If so, please enlighten me as to when, why, and what for you would use sched_yield. If willingly and knowingly relinquishing a CPU does not make it possible for other processes to use what would otherwise have been your very own slice of processing time, then what could it be used for, I really wonder. 
Second, I have tried to run my misconception on various other OSes I have access to: Win2k, Mac OSX and OpenBSD, and surprisingly enough, all of them seem to share my twisted views of How Things Should Be (tm). - Mgix ^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: Question about sched_yield() 2002-06-18 1:41 ` mgix @ 2002-06-18 3:21 ` David Schwartz 2002-06-18 3:52 ` mgix 0 siblings, 1 reply; 52+ messages in thread From: David Schwartz @ 2002-06-18 3:21 UTC (permalink / raw) To: mgix, linux-kernel >> This neither says nor implies anything about CPU usage. It simply says >>that >>the current thread will yield and be put at the end of the list. > >If so, please enlighten me as to when, why, and what for you would use >sched_yield. Generally because you can't do something until some other thread/process does something, so you give it a chance to finish immediately before trying something in a more expensive way. >If willingly and knowingly relinquishing a CPU does not make it possible >for other processes to use what would otherwise have been your very own >slice >of processing time then what could it be used for, I really wonder. It does! That's what it's for. >Second, I have tried to run my misconception on various other OSes I have >access to: Win2k, Mac OSX and OpenBSD, and surprisingly enough, all of them >seem to be sharing my twisted views of How Things Should Be (tm). Just because they do what you naively expect doesn't validate your expectations. DS ^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: Question about sched_yield() 2002-06-18 3:21 ` David Schwartz @ 2002-06-18 3:52 ` mgix 0 siblings, 0 replies; 52+ messages in thread From: mgix @ 2002-06-18 3:52 UTC (permalink / raw) To: David Schwartz, linux-kernel > >If willingly and knowingly relinquinshing a CPU does not make it possible > >for other processes to use what would otherwise have been your very own > >slice > >of processing time then what could it be used for, I really wonder. > > It does! That's what it's for. Good, and now that we agree: Now back to the original point: sched_yield does not do the above on Linux as of today, which was the point of my original posting, and which is the reason Ingo posted a scheduler patch. - Mgix ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Question about sched_yield() 2002-06-18 0:46 ` David Schwartz 2002-06-18 0:55 ` Robert Love 2002-06-18 1:41 ` mgix @ 2002-06-18 4:55 ` Ingo Molnar 2002-06-19 11:24 ` Bill Davidsen 2002-06-18 18:56 ` Question about sched_yield() Rusty Russell 2002-06-19 2:10 ` jw schultz 4 siblings, 1 reply; 52+ messages in thread From: Ingo Molnar @ 2002-06-18 4:55 UTC (permalink / raw) To: David Schwartz; +Cc: mgix, linux-kernel On Mon, 17 Jun 2002, David Schwartz wrote: > >I am seeing some strange linux scheduler behaviours, > >and I thought this'd be the best place to ask. > > > >I have two processes, one that loops forever and > >does nothing but calling sched_yield(), and the other > >is basically benchmarking how fast it can compute > >some long math calculation. > [...] It is not a replacement for blocking or a scheduling priority > adjustment. It simply lets other ready-to-run tasks be scheduled before > returning to the current task. and this is what the scheduler didn't do properly: it actually didn't schedule valid ready-to-run processes for long milliseconds, it switched between two sched_yield processes, starving the CPU-intensive process. I posted a patch for 2.5 that fixes this, and the 2.4.19-pre10-ac2 backport i did includes this fix as well: http://redhat.com/~mingo/O(1)-scheduler/sched-2.4.19-pre10-ac2-A4 a good sched_yield() implementation should give *all* other tasks a chance to run, instead of switching between multiple sched_yield()-ing tasks. I don't think this is specified anywhere, but it's a quality-of-implementation issue. Ingo ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Question about sched_yield() 2002-06-18 4:55 ` Ingo Molnar @ 2002-06-19 11:24 ` Bill Davidsen 2002-06-19 11:47 ` scheduler timeslice distribution, threads, processes. [was: Re: Question about sched_yield()] Ingo Molnar 0 siblings, 1 reply; 52+ messages in thread From: Bill Davidsen @ 2002-06-19 11:24 UTC (permalink / raw) To: Ingo Molnar; +Cc: Linux Kernel Mailing List On Tue, 18 Jun 2002, Ingo Molnar wrote: > > On Mon, 17 Jun 2002, David Schwartz wrote: > > > >I am seeing some strange linux scheduler behaviours, > > >and I thought this'd be the best place to ask. > > > > > >I have two processes, one that loops forever and > > >does nothing but calling sched_yield(), and the other > > >is basically benchmarking how fast it can compute > > >some long math calculation. > > > [...] It is not a replacement for blocking or a scheduling priority > > adjustment. It simply lets other ready-to-run tasks be scheduled before > > returning to the current task. > > and this is what the scheduler didnt do properly, it actually didnt > schedule valid ready-to-run processes for long milliseconds, it switched > between two sched_yield processes, starving the CPU-intensive process. I > posted a patch for 2.5 that fixes this, and the 2.4.19-pre10-ac2 backport > i did includes this fix as well: > > http://redhat.com/~mingo/O(1)-scheduler/sched-2.4.19-pre10-ac2-A4 > > a good sched_yield() implementation should give *all* other tasks a chance > to run, instead of switching between multiple sched_yield()-ing tasks. I > don think this is specified anywhere, but it's a quality of implementation > issue. Clearly your fix is subtle enough that a quick reading, at least by me, can't follow all the nuances, but it looks on first reading as if your patch fixes this too well, and will now starve the threaded process by only giving one turn at the CPU per full pass through the queue, rather than sharing the timeslice between threads of a process. 
I posted some thoughts on this to Robert; if you would comment, I would appreciate it. My thought is that if you have N processes and one is threaded, the aggregate of all threads should be 1/N of the CPU(s), not vastly more or less. I think with your change it will be less if they are sharing a resource. Feel free to tell me I misread what it does, or that my desired behaviour is not correct. I do run 15 machines with a long-running threaded server application and periodic CPU-hog things like log roll and compress, stats generation, certain database cleanup, etc, and I have more than intellectual curiosity about this behaviour. -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 52+ messages in thread
* scheduler timeslice distribution, threads, processes. [was: Re: Question about sched_yield()] 2002-06-19 11:24 ` Bill Davidsen @ 2002-06-19 11:47 ` Ingo Molnar 0 siblings, 0 replies; 52+ messages in thread From: Ingo Molnar @ 2002-06-19 11:47 UTC (permalink / raw) To: Bill Davidsen; +Cc: Linux Kernel Mailing List On Wed, 19 Jun 2002, Bill Davidsen wrote: > Clearly your fix is subtle enough that a quick reading, at least by me, > can't follow all the nuances, but it looks on first reading as if your > patch fixes this too well, and will now starve the threaded process by > only giving one turn at the CPU per full pass through the queue, rather > than sharing the timeslice between threads of a process. the important point is that to the Linux scheduler there is only one context of execution that matters: one Linux thread. The kernel itself is deeply threaded - in kernel mode all threads access all data structures of each other. Whether threads share certain resources (the VM or files) at the userspace level is irrelevant to the scheduler. My strong point is that this is not an accidental property of Linux, it's very intentional. CPU time partitioning and VM-sharing are two distinct concepts and they should not be artificially connected. > I posted some thoughts on this to Robert, if you would comment I would > appreciate. My thought is that if you have N processes and one is > threaded, the aggregate of all threads should be 1/N of the CPU(s), not > vastly more or less. I think with your change it will be less if they > are sharing a resource. Feel free to tell me I misread what it does, or > my desired behaviour is not correct. 'processes' are threads that have an isolated set of resources. The possible sharing relationships between threads are much more complex than what can be expressed via a simplistic 'lightweight process' vs. 'heavyweight process' picture. 
[or the 'container process/shell' and 'member threads' concept used in a particular OS, designed in the early 90s ;-) ] It's a fundamental property of the scheduler that it shares timeslices between threads of execution - regardless of the resource-sharing done by those threads. [in fact it's almost impossible to get an accurate picture of resource sharing that can be used by the scheduler, the resource sharing capabilities of Linux are very finegrained. It's possible to share just a single file descriptor between two threads, and it's possible to share a given segment of virtual memory as well.] so in your example, if you have 10 isolated threads ('normal processes'), and a 'process' that has 5 shared-all threads, the scheduler only sees that there are 15 threads in the system - and each thread will get 1/15th of the CPU resources [if all threads have the same priority]. but i'm not fundamentally against enabling users to partition their CPU resource usage better - ie. to be able to tell which particular set of threads should get a given percentage of CPU time. What i'm against is to tie, hardcode this to the 'all VM sharing' property of threads - like other OSs do. Why shouldn't it be possible to give the 10 isolated threads 30% of all CPU time, and the remaining 5 shared-all threads 70% CPU time? By decoupling the ability to partition CPU time in a finegrained way we not only can do what you describe, but we also win lots of flexibility, and the whole model becomes much cleaner. 
the CPU time used up increases with the number of threads), it's pretty close. But being able to partition timeslices more accurately is something that will happen eventually - and the scheduler is ready for it. Ingo ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Question about sched_yield() 2002-06-18 0:46 ` David Schwartz ` (2 preceding siblings ...) 2002-06-18 4:55 ` Ingo Molnar @ 2002-06-18 18:56 ` Rusty Russell 2002-06-18 19:12 ` David Schwartz 2002-06-19 11:29 ` Bill Davidsen 2002-06-19 2:10 ` jw schultz 4 siblings, 2 replies; 52+ messages in thread From: Rusty Russell @ 2002-06-18 18:56 UTC (permalink / raw) To: David Schwartz; +Cc: mgix, linux-kernel On Mon, 17 Jun 2002 17:46:29 -0700 David Schwartz <davids@webmaster.com> wrote: > "The sched_yield() function shall force the running thread to relinquish the > processor until it again becomes the head of its thread list. It takes no > arguments." Notice how incredibly useless this definition is. It's even defined in terms of UP. Rusty. -- there are those who do and those who hang on and you don't see too many doers quoting their contemporaries. -- Larry McVoy ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Question about sched_yield() 2002-06-18 18:56 ` Question about sched_yield() Rusty Russell @ 2002-06-18 19:12 ` David Schwartz 2002-06-18 20:19 ` Rusty Russell 2002-06-19 11:29 ` Bill Davidsen 1 sibling, 1 reply; 52+ messages in thread From: David Schwartz @ 2002-06-18 19:12 UTC (permalink / raw) To: rusty; +Cc: mgix, linux-kernel On Wed, 19 Jun 2002 04:56:06 +1000, Rusty Russell wrote: >On Mon, 17 Jun 2002 17:46:29 -0700 >David Schwartz <davids@webmaster.com> wrote: >>"The sched_yield() function shall force the running thread to relinquish >>the >>processor until it again becomes the head of its thread list. It takes no >>arguments." >Notice how incredibly useless this definition is. It's even defined in >terms >of UP. Huh?! This definition is beautiful in that it makes no such assumptions. How would you say this is invalid on an SMP machine? By "the processor", they mean "the processor on which the thread is running" (the only one it could relinquish, after all). DS ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Question about sched_yield() 2002-06-18 19:12 ` David Schwartz @ 2002-06-18 20:19 ` Rusty Russell 2002-06-18 20:40 ` David Schwartz 2002-06-18 20:42 ` mgix 0 siblings, 2 replies; 52+ messages in thread From: Rusty Russell @ 2002-06-18 20:19 UTC (permalink / raw) To: David Schwartz; +Cc: mgix, linux-kernel, mingo In message <20020618191233.AAA27954@shell.webmaster.com@whenever> you write: > > On Wed, 19 Jun 2002 04:56:06 +1000, Rusty Russell wrote: > > >On Mon, 17 Jun 2002 17:46:29 -0700 > >David Schwartz <davids@webmaster.com> wrote: > > >>"The sched_yield() function shall force the running thread to relinquish > >>the processor until it again becomes the head of its thread list. > >> It takes no arguments." > > >Notice how incredibly useless this definition is. It's even defined in > >terms of UP. > > Huh?! This definition is beautiful in that it makes no such > assumptions. How would you say this is invalid on an SMP machine? By > "the processor", they mean "the processor on which the thread is > running" (the only one it could relinquish, after all). Read again: they use "relinquish ... until", not "relinquish". Subtle difference. I have 32 processors and 32 threads. One does a yield(). What happens? What should happen? Given that yield is "sleep for some time but I won't tell you what I'm doing", I have no sympathy for yield users 8) Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Question about sched_yield() 2002-06-18 20:19 ` Rusty Russell @ 2002-06-18 20:40 ` David Schwartz 2002-06-18 20:42 ` mgix 1 sibling, 0 replies; 52+ messages in thread From: David Schwartz @ 2002-06-18 20:40 UTC (permalink / raw) To: rusty; +Cc: mgix, linux-kernel, mingo >>>>"The sched_yield() function shall force the running thread to relinquish >>>>the processor until it again becomes the head of its thread list. >>>>It takes no arguments." >>>Notice how incredibly useless this definition is. It's even defined in >>>terms of UP. >>Huh?! This definition is beautiful in that it makes no such= >>assumptions. How would you say this is invalid on an SMP machine? By >>"the= processor", they mean "the process on which the thread is >>running" (the only one= it could relinquish, after all). >Read again: they use "relinquish ... until", not "relinquish". Subtle >difference. So? >I have 32 processors and 32 threads. One does a yield(). What >happens? What should happen? It should relinquish the processor it is running on until it again becomes the head of its thread list. (IE, for as long as it takes the scheduler to decide that it's the thread to run.) >Given that yield is "sleep for some time but I won't tell you what I'm >doing", I have no sympathy for yield users 8) I have sympathy for those who use it properly, I have no sympathy for those who loop on sched_yield burning the CPU and then complaining that it burns CPU. DS ^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: Question about sched_yield()
  2002-06-18 20:19 ` Rusty Russell
  2002-06-18 20:40 ` David Schwartz
@ 2002-06-18 20:42 ` mgix
  2002-06-18 22:03 ` David Schwartz
  2002-06-18 22:36 ` Ingo Molnar
  1 sibling, 2 replies; 52+ messages in thread
From: mgix @ 2002-06-18 20:42 UTC (permalink / raw)
To: rusty, David Schwartz; +Cc: linux-kernel, mingo

> Given that yield is "sleep for some time but I won't tell you what I'm
> doing", I have no sympathy for yield users 8)

Here's something interesting at last. True, when userland calls
sched_yield, the kernel is left in the dark. However, I am not aware of
any alternative for communicating what I really want to the scheduler,
and here's why. If anyone has ideas on how to do this better, please,
I'm all ears.

It's basically about spinlocks and the cost of task switching.
I'm trying to implement "smart" spinlocks.

Regular blocking locks based on some kind of kernel object are:

1. Slow, because every unsuccessful use of the lock will trigger a task
switch. There is no way to have two threads exchange messages at a very
fast rate with a blocking lock.

2. Expensive in terms of kernel resource consumption: you end up
allocating a kernel object each time you need a lock. If your app's
locking is sufficiently fine-grained, you end up allocating a huge
number of kernel objects.

3. To avoid problem 2, you can try to allocate kernel objects on the
fly, but then performance gets even worse: every lock
acquisition/release requires the creation/destruction of a kernel
object.

The alternative, userland spinlocks:

1. Cost zero in terms of kernel resource consumption.

2. Are a major catastrophe when running on a UP box (you can spin as
long as you want, nothing's going to change since there's only one CPU
and you're hogging it).

3. Are a CPU hog at best when running on an SMP box: the spinning
thread gobbles up a whole 100% of a CPU.

"Smart" spinlocks basically try to do it this way:

    int spinLoops = GetNumberOfProcsICanRunOn() > 1 ? someBigNumber : 1;
    while(1)
    {
        int n = spinLoops;
        while(n--) tryAndGetTheSpinLock();
        if(gotIt) break;
        sched_yield();
    }

These seem to have all the qualities I want:

1. On a UP box, the thread will poke at the spinlock only once, then
yield and *hopefully* not get scheduled again till later, giving the
lock owner a chance to finish its stuff and release. Net result: the
spinning thread causes no CPU consumption, no kernel object is
allocated, and locking behaves properly.

2. On an SMP box, the thread will bang on the spinlock a large number
of times, hoping to get it before it gets task-switched away. If it
does, great: no time lost. If it doesn't, we're out of luck; yield the
CPU and try again next time. Net result: a tunable balance between CPU
consumption and task-switching costs, no kernel object allocated, and
proper locking behaviour.

I'm sure this issue has been beaten to death before, but I just wanted
to give my $.02

- Mgix
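[Editor's note: mgix's pseudocode above can be fleshed out into a small
compilable sketch. The names (spin_t, spin_try, spin_acquire) and the
spin budget of 1000 are illustrative choices made here, not part of the
original post.]

```c
#include <sched.h>
#include <stdatomic.h>
#include <unistd.h>

/* A "smart" spinlock sketch in the spirit of the post above. */
typedef struct { atomic_int locked; } spin_t;

/* How many CPUs this process can run on; fall back to 1 on error. */
static int ncpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
}

/* One attempt: atomically flip 0 -> 1; returns nonzero on success. */
static int spin_try(spin_t *s)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&s->locked, &expected, 1);
}

static void spin_acquire(spin_t *s)
{
    /* Spin hard only if another CPU could release the lock meanwhile;
     * on a UP box, poke once and yield immediately. */
    int budget = ncpus() > 1 ? 1000 : 1;
    for (;;) {
        for (int n = budget; n--; )
            if (spin_try(s))
                return;
        sched_yield();  /* let the lock holder make progress */
    }
}

static void spin_release(spin_t *s)
{
    atomic_store(&s->locked, 0);
}
```

As David Schwartz points out downthread, a loop like this still burns
CPU under sustained contention; the spin budget only bounds how long
each pass spins before yielding.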
* RE: Question about sched_yield()
  2002-06-18 20:42 ` mgix
@ 2002-06-18 22:03 ` David Schwartz
  2002-06-18 22:36 ` Ingo Molnar
  1 sibling, 0 replies; 52+ messages in thread
From: David Schwartz @ 2002-06-18 22:03 UTC (permalink / raw)
To: mgix, rusty; +Cc: linux-kernel, mingo

> 3. Are a CPU hog at best when running on an SMP box: the spinning
> thread gobbles up a whole 100% of a CPU.

For the few hundred cycles some other thread holds the lock.

>"Smart" spinlocks basically try to do it this way:
>
>    int spinLoops = GetNumberOfProcsICanRunOn() > 1 ? someBigNumber : 1;
>    while(1)
>    {
>        int n = spinLoops;
>        while(n--) tryAndGetTheSpinLock();
>        if(gotIt) break;
>        sched_yield();
>    }
>
>These seem to have all the qualities I want:

Almost.

>2. On an SMP box, the thread will bang on the spinlock a large
>number of times, hoping to get it before it gets task-switched away.
>If it does, great: no time lost.
>If it doesn't, we're out of luck; yield the CPU and try again next time.

You should limit how many times you spin in this loop. If it gets to be
too many, you should block. One way to block is simply to sleep for a few
milliseconds. If you don't like the idea that one thread will release the
lock while the other wastes time sleeping, then associate a kernel lock
with the spinlock when a thread gives up waiting, have your unlock
function check for an associated kernel lock, and if there is one,
unlock it.

DS
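[Editor's note: one way to read this advice is "spin a bounded number of
times on the userland fast path, then fall back to a kernel-backed lock
instead of sched_yield()". A rough sketch under those assumptions
follows; the names (sb_lock_t, sb_try) and the spin limit of 100 are
invented here, and a condition variable stands in for the "associated
kernel lock".]

```c
#include <pthread.h>
#include <stdatomic.h>

typedef struct {
    atomic_int locked;    /* 0 = free, 1 = held: the userland fast path */
    int waiters;          /* threads that gave up spinning and blocked */
    pthread_mutex_t m;    /* the "associated kernel lock" */
    pthread_cond_t cv;
} sb_lock_t;

#define SB_INIT { 0, 0, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }
#define SB_SPIN_LIMIT 100 /* arbitrary bound for the example */

static int sb_try(sb_lock_t *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->locked, &expected, 1);
}

static void sb_acquire(sb_lock_t *l)
{
    for (int n = 0; n < SB_SPIN_LIMIT; n++)
        if (sb_try(l))
            return;       /* got it on the cheap path */
    /* Contended for too long: block in the kernel instead of burning CPU. */
    pthread_mutex_lock(&l->m);
    l->waiters++;
    while (!sb_try(l))
        pthread_cond_wait(&l->cv, &l->m);
    l->waiters--;
    pthread_mutex_unlock(&l->m);
}

static void sb_release(sb_lock_t *l)
{
    atomic_store(&l->locked, 0);
    /* Enter the kernel only if someone actually gave up and blocked. */
    pthread_mutex_lock(&l->m);
    if (l->waiters > 0)
        pthread_cond_signal(&l->cv);
    pthread_mutex_unlock(&l->m);
}
```

The releaser takes the mutex before signalling, so a waiter that failed
its sb_try while holding the mutex cannot miss the wakeup.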
* RE: Question about sched_yield()
  2002-06-18 20:42 ` mgix
  2002-06-18 22:03 ` David Schwartz
@ 2002-06-18 22:36 ` Ingo Molnar
  1 sibling, 0 replies; 52+ messages in thread
From: Ingo Molnar @ 2002-06-18 22:36 UTC (permalink / raw)
To: mgix; +Cc: Rusty Russell, David Schwartz, linux-kernel, mingo

On Tue, 18 Jun 2002 mgix@mgix.com wrote:

> However, I am not aware of any alternative to communicate what I really
> want to the scheduler, and here's why. If anyone has ideas on how to do
> this better, please, I'm all ears.
>
> It's basically about spinlocks and the cost of task switching.
>
> I'm trying to implement "smart" spinlocks.

the right solution for you is the new ultra-fast kernel-helped
user-space semaphores. Those are kernel objects where the semaphore
data structure is accessible to user-space as well. Acquiring the
semaphore in the no-contention case is very cheap, basically a single
assembly instruction. In the contention case you'll enter the kernel
and schedule away. (in the sched_yield() case you schedule away just as
much in the contention case, so there isn't any difference.)

for short-lived but contended locks this mechanism can be improved a
bit by looping 'some more' on the lock before entering the kernel and
blocking - but it could also be a loss if the amount of additional
looping is too long or too short. Since the O(1) scheduler
context-switches in under 1 microsecond on a 1 GHz box, and has no SMP
interlocking and cache-thrashing problems, you'll still get 1 million
context switches per second, per CPU. And this is the 'slow' case.

	Ingo
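[Editor's note: the mechanism Ingo describes is what became futexes. A
minimal sketch in terms of the modern Linux futex(2) interface, which
post-dates this thread, is below. The 0/1 lock protocol shown is the
simplest correct one, not Ingo's actual implementation, and the code is
Linux-specific (glibc has no futex() wrapper, so raw syscall() is used).]

```c
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrappers over the raw futex syscall. */
static long futex_wait(atomic_int *uaddr, int expected)
{
    /* Sleeps only if *uaddr still equals expected (no lost wakeups). */
    return syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

static long futex_wake(atomic_int *uaddr, int nwake)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, nwake, NULL, NULL, 0);
}

/* Lock word: 0 = free, 1 = taken. */
static void ulock_acquire(atomic_int *f)
{
    int expected = 0;
    /* No-contention case: one compare-and-swap, no syscall at all. */
    while (!atomic_compare_exchange_strong(f, &expected, 1)) {
        /* Contended: sleep in the kernel while the word is still 1. */
        futex_wait(f, 1);
        expected = 0;
    }
}

static void ulock_release(atomic_int *f)
{
    atomic_store(f, 0);
    /* Always waking is wasteful; real implementations encode "waiters
     * present" in the lock word to skip this syscall when uncontended. */
    futex_wake(f, 1);
}
```

This gets exactly the cost profile Ingo describes: an uncontended
acquire is one atomic instruction, and only the contended path ever
enters the kernel.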
* Re: Question about sched_yield()
  2002-06-18 18:56 ` Question about sched_yield() Rusty Russell
  2002-06-18 19:12 ` David Schwartz
@ 2002-06-19 11:29 ` Bill Davidsen
  2002-06-19 14:03 ` Rusty Russell
  1 sibling, 1 reply; 52+ messages in thread
From: Bill Davidsen @ 2002-06-19 11:29 UTC (permalink / raw)
To: Rusty Russell; +Cc: Linux Kernel Mailing List

On Wed, 19 Jun 2002, Rusty Russell wrote:

> On Mon, 17 Jun 2002 17:46:29 -0700
> David Schwartz <davids@webmaster.com> wrote:
> > "The sched_yield() function shall force the running thread to relinquish the
> > processor until it again becomes the head of its thread list. It takes no
> > arguments."
>
> Notice how incredibly useless this definition is. It's even defined in terms
> of UP.

I think you parse this differently than I do; I see no reference to UP.
The term "the processor" clearly (to me at least) means the processor
running that thread at the time of the yield.

The number of processors running a single thread at any one time is an
integer in the range zero to one.

--
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: Question about sched_yield()
  2002-06-19 11:29 ` Bill Davidsen
@ 2002-06-19 14:03 ` Rusty Russell
  2002-06-19 22:25 ` Bill Davidsen
  0 siblings, 1 reply; 52+ messages in thread
From: Rusty Russell @ 2002-06-19 14:03 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Linux Kernel Mailing List

In message <Pine.LNX.3.96.1020619072548.1119D-100000@gatekeeper.tmr.com> you write:
> On Wed, 19 Jun 2002, Rusty Russell wrote:
>
> > On Mon, 17 Jun 2002 17:46:29 -0700
> > David Schwartz <davids@webmaster.com> wrote:
> > > "The sched_yield() function shall force the running thread to relinquish the
> > > processor until it again becomes the head of its thread list. It takes no
> > > arguments."
> >
> > Notice how incredibly useless this definition is. It's even defined in terms
> > of UP.
>
> I think you parse this differently than I do; I see no reference to UP.
> The term "the processor" clearly (to me at least) means the processor
> running that thread at the time of the yield.
>
> The number of processors running a single thread at any one time is an
> integer in the range zero to one.

It's the word "until": "relinquish the processor until".

It's pretty clearly implied that it's going to "unrelinquish" *the
processor* at the end of this process.

So, by your definition, it can be scheduled on another CPU before it
becomes head of the thread list?

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
* Re: Question about sched_yield()
  2002-06-19 14:03 ` Rusty Russell
@ 2002-06-19 22:25 ` Bill Davidsen
  2002-06-19 22:37 ` Ingo Molnar
  0 siblings, 1 reply; 52+ messages in thread
From: Bill Davidsen @ 2002-06-19 22:25 UTC (permalink / raw)
To: Rusty Russell; +Cc: Linux Kernel Mailing List

On Thu, 20 Jun 2002, Rusty Russell wrote:

> In message <Pine.LNX.3.96.1020619072548.1119D-100000@gatekeeper.tmr.com> you write:
> > On Wed, 19 Jun 2002, Rusty Russell wrote:
> >
> > > On Mon, 17 Jun 2002 17:46:29 -0700
> > > David Schwartz <davids@webmaster.com> wrote:
> > > > "The sched_yield() function shall force the running thread to relinquish the
> > > > processor until it again becomes the head of its thread list. It takes no
> > > > arguments."
> > >
> > > Notice how incredibly useless this definition is. It's even defined in terms
> > > of UP.
> >
> > I think you parse this differently than I do; I see no reference to UP.
> > The term "the processor" clearly (to me at least) means the processor
> > running that thread at the time of the yield.
> >
> > The number of processors running a single thread at any one time is an
> > integer in the range zero to one.
>
> It's the word "until": "relinquish the processor until".
>
> It's pretty clearly implied that it's going to "unrelinquish" *the
> processor* at the end of this process.
>
> So, by your definition, it can be scheduled on another CPU before it
> becomes head of the thread list?

I have to read "head of the thread list" broadly, and assume it means the
thread will be run when it is the most appropriate thread to run. I don't
read the wording as requiring or forbidding SMP, uni, or a strict
round-robin scheduler. The term "head of the thread list" doesn't state
that all other processes must get a time slice. I believe I clarified my
concerns in another post; I don't want to repeat them.

I don't want a process with threads contending for a resource to get all
or none of the CPU, each of which is possible with various schedulers and
patches recently available. I'd like to see threads of a single process be
able to get, use, and share a timeslice before some CPU hog comes in and
gets his timeslice.

I don't read the text you quoted as requiring much more than "run someone
else," because it's comfortably vague. To me anyway. I'm not knocking
anyone who reads it more strictly; I just don't see it the way you did.

--
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: Question about sched_yield()
  2002-06-19 22:25 ` Bill Davidsen
@ 2002-06-19 22:37 ` Ingo Molnar
  0 siblings, 0 replies; 52+ messages in thread
From: Ingo Molnar @ 2002-06-19 22:37 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Rusty Russell, Linux Kernel Mailing List

On Wed, 19 Jun 2002, Bill Davidsen wrote:

> [...] I'd like to see threads of a single process be able to get, use,
> and share a timeslice before some CPU hog comes in and gets his
> timeslice.

there is no such concept as 'threads of a single process' in Linux, and
this is not just a naming difference. In Linux threads are threads, and
whether they share the same set of pagetables or not is secondary to
the kernel. (there are lots of other resources they might or might not
share with each other.)

the OS where processes 'own' threads, where the process is a container,
where this concept is pretty much the only meaningful multiprogramming
concept, and where the kernel API is separated into per-thread and
per-process parts is not called Linux.

	Ingo
* Re: Question about sched_yield()
  2002-06-18 0:46 ` David Schwartz
  ` (3 preceding siblings ...)
  2002-06-18 18:56 ` Question about sched_yield() Rusty Russell
@ 2002-06-19 2:10 ` jw schultz
  4 siblings, 0 replies; 52+ messages in thread
From: jw schultz @ 2002-06-19 2:10 UTC (permalink / raw)
To: linux-kernel

Let's back up a moment please.

On Mon, Jun 17, 2002 at 05:46:29PM -0700, David Schwartz wrote:
>
> On Sat, 15 Jun 2002 15:15:32 -0700, mgix@mgix.com wrote:
> >
> >Hello,
> >
> >I am seeing some strange linux scheduler behaviours,
> >and I thought this'd be the best place to ask.
> >
> >I have two processes, one that loops forever and
> >does nothing but calling sched_yield(), and the other
> >is basically benchmarking how fast it can compute
> >some long math calculation.
> [snip]
>
> You seem to have a misconception about what sched_yield is for. It is not a
> replacement for blocking or a scheduling priority adjustment. It simply lets
> other ready-to-run tasks be scheduled before returning to the current task.
>
> Here's a quote from SuS3:
>
> "The sched_yield() function shall force the running thread to relinquish the
> processor until it again becomes the head of its thread list. It takes no
> arguments."

From this description it sounds like sched_yield was intended to deal
with scheduling of threads within a process, not between processes.
Namely, that there is a "thread list" within a single process context.
This would be used for yielding to another thread within the same list
(context) when the current thread needs a contended-for resource (an
inter-thread/intra-context lock) or the result of another thread's
calculation. Or simply to allow switching, round-robin style, between a
set of compute-intensive threads by salting the compute loops with
sched_yield. Am I right?

Linux, as we know, treats threads and independent processes as equals.
Therefore, the process scheduler becomes responsible for dealing with
sched_yield and somehow translating sched_yield's intent to a situation
where threads are contending with other processes instead of just with
their fellows, and their fellows can wind up with differing priorities.
Handling sched_yield wrongly could result in a live/deadlock (I'm fuzzy
at the moment on the distinction, sorry) if one thread is
sched_yield/spinning on a resource pending action of a lower priority
thread.

It would seem that sched_yield should make sure that the calling thread
has a priority lower than the lowest priority thread within the thread
group. "until it again becomes the head of its thread list" sounds to
me like all other threads in the group should have an opportunity to
run before the calling thread does. I would suggest that in the case
where there are no other runnable threads in the group, the process
should sleep(0) so another (lower priority) process can run.

--
________________________________________________________________
  J.W. Schultz            Pegasystems Technologies
  email address:          jw@pegasys.ws

  Remember Cernan and Schmitt
Thread overview: 52+ messages
2002-06-15 22:15 Question about sched_yield() mgix
2002-06-16 14:43 ` [patch] " Ingo Molnar
2002-06-18  0:46 ` David Schwartz
2002-06-18  0:55   ` Robert Love
2002-06-18  1:51   ` mgix
2002-06-18  3:18     ` David Schwartz
2002-06-18  9:36       ` David Schwartz
2002-06-18 16:58         ` Chris Friesen
2002-06-18 17:12           ` Richard B. Johnson
2002-06-18 17:19             ` mgix
2002-06-18 18:01               ` David Schwartz
2002-06-18 18:05                 ` mgix
2002-06-18 19:11                   ` David Schwartz
2002-06-18 16:58         ` Rob Landley
2002-06-18 19:25           ` Robert Love
2002-06-18 19:53             ` David Schwartz
2002-06-18 20:12               ` mgix
2002-06-18 20:42                 ` David Schwartz
2002-06-18 20:47                   ` mgix
2002-06-18 22:00                     ` David Schwartz
2002-06-18 22:28                       ` Ingo Molnar
2002-06-18 20:08             ` Richard B. Johnson
2002-06-19 11:10               ` Bill Davidsen
2002-06-19 12:04                 ` Ingo Molnar
2002-06-18 22:43           ` Olivier Galibert
2002-06-18 18:21         ` Richard B. Johnson
2002-06-18 17:13       ` Robert Love
2002-06-18 18:00         ` David Schwartz
2002-06-18 22:45           ` Stevie O
2002-06-19  2:11             ` David Schwartz
2002-06-19  2:52               ` Stevie O
2002-06-20 20:31                 ` David Schwartz
2002-06-18 17:23       ` Rik van Riel
2002-06-18 17:50         ` Chris Friesen
2002-06-18  1:41   ` mgix
2002-06-18  3:21     ` David Schwartz
2002-06-18  3:52       ` mgix
2002-06-18  4:55         ` Ingo Molnar
2002-06-19 11:24           ` Bill Davidsen
2002-06-19 11:47             ` scheduler timeslice distribution, threads, processes. [was: Re: Question about sched_yield()] Ingo Molnar
2002-06-18 18:56   ` Question about sched_yield() Rusty Russell
2002-06-18 19:12     ` David Schwartz
2002-06-18 20:19       ` Rusty Russell
2002-06-18 20:40         ` David Schwartz
2002-06-18 20:42         ` mgix
2002-06-18 22:03           ` David Schwartz
2002-06-18 22:36           ` Ingo Molnar
2002-06-19 11:29     ` Bill Davidsen
2002-06-19 14:03       ` Rusty Russell
2002-06-19 22:25         ` Bill Davidsen
2002-06-19 22:37           ` Ingo Molnar
2002-06-19  2:10 ` jw schultz