public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Use of yield() in the kernel
@ 2002-10-15 13:36 Duncan Sands
  2002-10-15 15:15 ` Ingo Molnar
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Duncan Sands @ 2002-10-15 13:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton

The semantics of sched_yield() changed in the 2.5 kernel.
In the 2.4 series it meant "sleep a little".
The new 2.5 semantics are correct (move to the end of the
run queue) but can mean "sleep a lot" under load.

This already bit ext3 transaction batching, c.f. Andrew Morton's

>[PATCH] remove the sched_yield from the ext3 fsync path
>
>The changed sched_yield() semantics have made ext3's transaction
>batching terribly slow.
>
>Apparently a schedule() fixes that, although it probably breaks
>transaction batching.
>
>This patch largely fixes my complaints about the new scheduler being
>extremely sluggish to interactive applications.  Evidently those
>applications were calling fsync() and were spending extremely long
>periods in sched_yield().

Maybe it is worth auditing the kernel source files using yield()?
[There are only 33 of them, so not too bad - see below].
A number of them have comments like /* sleep a little */, so the
authors presumably weren't expecting to get "sleep a lot"...

Here is the list of files using yield(), excluding non-i386 arch specific files:

net/ipv4/tcp_output.c
net/sched/sch_generic.c
net/sunrpc/sched.c
net/unix/af_unix.c
net/socket.c
mm/oom_kill.c
mm/page_alloc.c
kernel/sched.c (in migration_call)
kernel/softirq.c
kernel/suspend.c
init/do_mounts.c
fs/jbd/journal.c
fs/jbd/revoke.c
fs/nfs/pagelist.c
fs/reiserfs/journal.c
fs/ufs/truncate.c
fs/buffer.c
fs/exec.c
fs/locks.c
fs/super.c
drivers/cdrom/cdu31a.c
drivers/cdrom/sonycd535.c
drivers/ide/ide-disk.c
drivers/message/i2o/i2o_core.c
drivers/net/e100/e100_eeprom.c
drivers/net/e100/e100_main.c
drivers/net/e100/e100_phy.c
drivers/net/e100/e100_test.c
drivers/net/depca.c
drivers/net/sb1000.c
drivers/net/sis900.c
drivers/net/slip.c
arch/i386/mm/fault.c

Thoughts?

Duncan.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Use of yield() in the kernel
  2002-10-15 13:36 Use of yield() in the kernel Duncan Sands
@ 2002-10-15 15:15 ` Ingo Molnar
  2002-10-15 17:12 ` John Levon
  2002-10-18 18:26 ` Pavel Machek
  2 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2002-10-15 15:15 UTC (permalink / raw)
  To: Duncan Sands; +Cc: linux-kernel, Andrew Morton


On Tue, 15 Oct 2002, Duncan Sands wrote:

> Maybe it is worth auditing the kernel source files using yield()?

most definitely.

> Here is the list of files using yield(), excluding non-i386 arch
> specific files:
> 
[...]
> mm/oom_kill.c

this one i think is OK.

> kernel/sched.c (in migration_call)

this is okay as well.

> kernel/softirq.c

these are okay too - both are nonperformance bits.

> arch/i386/mm/fault.c

okay as well, it's a last-ditch effort to not kill init, so yielding is
the right thing to do here.

the others i think should be fixed. (but there might be exceptions.)

	Ingo


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Use of yield() in the kernel
@ 2002-10-15 16:20 Marc-Christian Petersen
  2002-10-15 16:27 ` Duncan Sands
  0 siblings, 1 reply; 15+ messages in thread
From: Marc-Christian Petersen @ 2002-10-15 16:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: Duncan Sands

Hi Duncan,

> The semantics of sched_yield() changed in the 2.5 kernel.
> In the 2.4 series it meant "sleep a little".
> The new 2.5 semantics are correct (move to the end of the
> run queue) but can mean "sleep a lot" under load.
>
> This already bit ext3 transaction batching, c.f. Andrew Morton's
>
>> [PATCH] remove the sched_yield from the ext3 fsync path
where did you read this ^^? :)

ciao, Marc

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Use of yield() in the kernel
  2002-10-15 16:20 Marc-Christian Petersen
@ 2002-10-15 16:27 ` Duncan Sands
  2002-10-25  8:43   ` Duncan Sands
  0 siblings, 1 reply; 15+ messages in thread
From: Duncan Sands @ 2002-10-15 16:27 UTC (permalink / raw)
  To: Marc-Christian Petersen, linux-kernel

> > This already bit ext3 transaction batching, c.f. Andrew Morton's
> >
> >> [PATCH] remove the sched_yield from the ext3 fsync path
>
> where did you read this ^^? :)
>
> ciao, Marc

http://linux.bkbits.net:8080/linux-2.5/cset@1.733.7.6?nav=index.html|ChangeSet@-7d

Duncan.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Use of yield() in the kernel
  2002-10-15 13:36 Use of yield() in the kernel Duncan Sands
  2002-10-15 15:15 ` Ingo Molnar
@ 2002-10-15 17:12 ` John Levon
  2002-10-17  6:36   ` Duncan Sands
  2002-10-18 18:26 ` Pavel Machek
  2 siblings, 1 reply; 15+ messages in thread
From: John Levon @ 2002-10-15 17:12 UTC (permalink / raw)
  To: Duncan Sands; +Cc: linux-kernel

On Tue, Oct 15, 2002 at 03:36:38PM +0200, Duncan Sands wrote:

> fs/locks.c

http://marc.theaimsgroup.com/?l=linux-kernel&m=103352923323135&w=2

I don't think the maintainer has submitted a patch for it yet.

regards
john

-- 
"CUT IT OUT FACEHEAD"
	- jeffk

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Use of yield() in the kernel
  2002-10-15 17:12 ` John Levon
@ 2002-10-17  6:36   ` Duncan Sands
  0 siblings, 0 replies; 15+ messages in thread
From: Duncan Sands @ 2002-10-17  6:36 UTC (permalink / raw)
  To: John Levon; +Cc: linux-kernel

On Tuesday 15 October 2002 19:12, John Levon wrote:
> On Tue, Oct 15, 2002 at 03:36:38PM +0200, Duncan Sands wrote:
> > fs/locks.c
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=103352923323135&w=2
>
> I don't think the maintainer has submitted a patch for it yet.

A fix has been applied in Linus's tree.  See

http://linux.bkbits.net:8080/linux-2.5/cset@1.781.27.3?nav=index.html|ChangeSet@-3d

Duncan.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Use of yield() in the kernel
  2002-10-15 13:36 Use of yield() in the kernel Duncan Sands
  2002-10-15 15:15 ` Ingo Molnar
  2002-10-15 17:12 ` John Levon
@ 2002-10-18 18:26 ` Pavel Machek
  2002-10-19 12:25   ` Duncan Sands
  2 siblings, 1 reply; 15+ messages in thread
From: Pavel Machek @ 2002-10-18 18:26 UTC (permalink / raw)
  To: Duncan Sands; +Cc: linux-kernel, Andrew Morton

Hi!

> Here is the list of files using yield(), excluding non-i386 arch specific files:
> 
...
> kernel/suspend.c

This is okay.
									Pavel
-- 
I'm pavel@ucw.cz. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at discuss@linmodems.org

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Use of yield() in the kernel
  2002-10-18 18:26 ` Pavel Machek
@ 2002-10-19 12:25   ` Duncan Sands
  2002-10-19 22:00     ` Pavel Machek
  0 siblings, 1 reply; 15+ messages in thread
From: Duncan Sands @ 2002-10-19 12:25 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, Andrew Morton

> > Here is the list of files using yield(), excluding non-i386 arch specific
> > files:
>
> ...
>
> > kernel/suspend.c
>
> This is okay.

Hi Pavel, I agree.  I have some questions about the code though:
when you come across a thread with (p->flags & PF_FROZEN), why
break out of the loop?  Why not just skip this thread and go on to the
next one?  Also, does it matter if the code doing the freezing is itself
frozen?

Ciao, Duncan.

PS: Here is the code, for reference:

        do {
                todo = 0;
                read_lock(&tasklist_lock);
                do_each_thread(g, p) {
                        unsigned long flags;
                        INTERESTING(p);
                        if (p->flags & PF_FROZEN)
                                continue;

                        /* FIXME: smp problem here: we may not access other proc
ess' flags
                           without locking */
                        p->flags |= PF_FREEZE;
                        spin_lock_irqsave(&p->sig->siglock, flags);
                        signal_wake_up(p);
                        spin_unlock_irqrestore(&p->sig->siglock, flags);
                        todo++;
                } while_each_thread(g, p);
                read_unlock(&tasklist_lock);
                yield();
                if (time_after(jiffies, start_time + TIMEOUT)) {
                        printk( "\n" );
                        return todo;
                }
        } while(todo);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Use of yield() in the kernel
  2002-10-19 12:25   ` Duncan Sands
@ 2002-10-19 22:00     ` Pavel Machek
  2002-10-20  9:10       ` Duncan Sands
  2002-10-20 11:22       ` Duncan Sands
  0 siblings, 2 replies; 15+ messages in thread
From: Pavel Machek @ 2002-10-19 22:00 UTC (permalink / raw)
  To: Duncan Sands; +Cc: linux-kernel, Andrew Morton

Hi!

> > > Here is the list of files using yield(), excluding non-i386 arch specific
> > > files:
> >
> > ...
> >
> > > kernel/suspend.c
> >
> > This is okay.
> 
> Hi Pavel, I agree.  I have some questions about the code though:
> when you come across a thread with (p->flags & PF_FROZEN), why
> break out of the loop?  Why not just skip this thread and go on to
> the

There's "continue;" in there, and it should "just skip this thread".
									Pavel


> next one?  Also, does it matter if the code doing the freezing is itself
> frozen?
> 
> Ciao, Duncan.
> 
> PS: Here is the code, for reference:
> 
>         do {
>                 todo = 0;
>                 read_lock(&tasklist_lock);
>                 do_each_thread(g, p) {
>                         unsigned long flags;
>                         INTERESTING(p);
>                         if (p->flags & PF_FROZEN)
>                                 continue;
> 
>                         /* FIXME: smp problem here: we may not access other proc
> ess' flags
>                            without locking */
>                         p->flags |= PF_FREEZE;
>                         spin_lock_irqsave(&p->sig->siglock, flags);
>                         signal_wake_up(p);
>                         spin_unlock_irqrestore(&p->sig->siglock, flags);
>                         todo++;
>                 } while_each_thread(g, p);
>                 read_unlock(&tasklist_lock);
>                 yield();
>                 if (time_after(jiffies, start_time + TIMEOUT)) {
>                         printk( "\n" );
>                         return todo;
>                 }
>         } while(todo);

-- 
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Use of yield() in the kernel
  2002-10-19 22:00     ` Pavel Machek
@ 2002-10-20  9:10       ` Duncan Sands
  2002-10-22 17:24         ` Mark Mielke
  2002-10-20 11:22       ` Duncan Sands
  1 sibling, 1 reply; 15+ messages in thread
From: Duncan Sands @ 2002-10-20  9:10 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, Andrew Morton

> > Hi Pavel, I agree.  I have some questions about the code though:
> > when you come across a thread with (p->flags & PF_FROZEN), why
> > break out of the loop?  Why not just skip this thread and go on to
> > the
>
> There's "continue;" in there, and it should "just skip this thread".
> 									Pavel

I meant, why not just do as in the following code (I've changed
INTERESTING also, for same reason, as explained below):

	do {
		todo = 0;
		read_lock(&tasklist_lock);
		do_each_thread(g, p) {
			unsigned long flags;

                        if (
				!(p->flags & PF_IOTHREAD) &&
				(p != current) &&
				(p->state != TASK_ZOMBIE) &&
				!(p->flags & PF_FROZEN)
			) {

				/* FIXME: smp problem here: we may not access other process' flags
				   without locking */
				p->flags |= PF_FREEZE;
				spin_lock_irqsave(&p->sig->siglock, flags);
				signal_wake_up(p);
				spin_unlock_irqrestore(&p->sig->siglock, flags);
				todo++;
			}
		} while_each_thread(g, p);
		read_unlock(&tasklist_lock);
		yield();
		if (time_after(jiffies, start_time + TIMEOUT)) {
			printk( "\n" );
			printk(KERN_ERR " stopping tasks failed (%d tasks remaining)\n", todo );
			return todo;
		}
	} while(todo);

The reason is that yield(), which sends the current task to the expired list,
can take a long time before it runs again.  With the current code, every time
you meet, for example, a kernel thread you break out of the loop, perform
a yield (= wait a long time), before going on to the next thread.  This could
take forever.  With code like that above, you mark as many tasks frozen as
possible, with as few yields as possible.  Isn't that better?

Ciao,

Duncan.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Use of yield() in the kernel
  2002-10-19 22:00     ` Pavel Machek
  2002-10-20  9:10       ` Duncan Sands
@ 2002-10-20 11:22       ` Duncan Sands
  1 sibling, 0 replies; 15+ messages in thread
From: Duncan Sands @ 2002-10-20 11:22 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, Andrew Morton

Well goodness me!  That just goes to show that spending
all your time programming in other languages can be harmful
to your C (or: to your brain)!  Let me just crawl away and find
a hole to die in before someone pours salt on me...

Ciao,

Duncan.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Use of yield() in the kernel
  2002-10-20  9:10       ` Duncan Sands
@ 2002-10-22 17:24         ` Mark Mielke
  2002-10-22 18:58           ` Duncan Sands
  2002-10-25 14:15           ` Duncan Sands
  0 siblings, 2 replies; 15+ messages in thread
From: Mark Mielke @ 2002-10-22 17:24 UTC (permalink / raw)
  To: Duncan Sands; +Cc: Pavel Machek, linux-kernel, Andrew Morton

Would it be sensible to add a "yield_short()" function to the kernel?

mark


On Sun, Oct 20, 2002 at 11:10:33AM +0200, Duncan Sands wrote:
> > > Hi Pavel, I agree.  I have some questions about the code though:
> > > when you come across a thread with (p->flags & PF_FROZEN), why
> > > break out of the loop?  Why not just skip this thread and go on to
> > > the
> >
> > There's "continue;" in there, and it should "just skip this thread".
> > 									Pavel
> 
> I meant, why not just do as in the following code (I've changed
> INTERESTING also, for same reason, as explained below):
> 
> 	do {
> 		todo = 0;
> 		read_lock(&tasklist_lock);
> 		do_each_thread(g, p) {
> 			unsigned long flags;
> 
>                         if (
> 				!(p->flags & PF_IOTHREAD) &&
> 				(p != current) &&
> 				(p->state != TASK_ZOMBIE) &&
> 				!(p->flags & PF_FROZEN)
> 			) {
> 
> 				/* FIXME: smp problem here: we may not access other process' flags
> 				   without locking */
> 				p->flags |= PF_FREEZE;
> 				spin_lock_irqsave(&p->sig->siglock, flags);
> 				signal_wake_up(p);
> 				spin_unlock_irqrestore(&p->sig->siglock, flags);
> 				todo++;
> 			}
> 		} while_each_thread(g, p);
> 		read_unlock(&tasklist_lock);
> 		yield();
> 		if (time_after(jiffies, start_time + TIMEOUT)) {
> 			printk( "\n" );
> 			printk(KERN_ERR " stopping tasks failed (%d tasks remaining)\n", todo );
> 			return todo;
> 		}
> 	} while(todo);
> 
> The reason is that yield(), which sends the current task to the expired list,
> can take a long time before it runs again.  With the current code, every time
> you meet, for example, a kernel thread you break out of the loop, perform
> a yield (= wait a long time), before going on to the next thread.  This could
> take forever.  With code like that above, you mark as many tasks frozen as
> possible, with as few yields as possible.  Isn't that better?
> 
> Ciao,
> 
> Duncan.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Use of yield() in the kernel
  2002-10-22 17:24         ` Mark Mielke
@ 2002-10-22 18:58           ` Duncan Sands
  2002-10-25 14:15           ` Duncan Sands
  1 sibling, 0 replies; 15+ messages in thread
From: Duncan Sands @ 2002-10-22 18:58 UTC (permalink / raw)
  To: Mark Mielke; +Cc: Pavel Machek, linux-kernel, Andrew Morton

On Tuesday 22 October 2002 19:24, Mark Mielke wrote:
> Would it be sensible to add a "yield_short()" function to the kernel?

You can do:

set_current_state(TASK_RUNNING);
schedule();

I'm not clear on what you are guaranteed to get - for example,
it looks as if it can sometimes return without sleeping at all.
There is also cond_resched(), which is

static inline void cond_resched(void)
{
        if (need_resched())
                __cond_resched();
}

void __cond_resched(void)
{
        set_current_state(TASK_RUNNING);
        schedule();
}

I don't know when need_resched evaluates to true, I haven't
had time to look into this yet.

Ciao, Duncan.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Use of yield() in the kernel
  2002-10-15 16:27 ` Duncan Sands
@ 2002-10-25  8:43   ` Duncan Sands
  0 siblings, 0 replies; 15+ messages in thread
From: Duncan Sands @ 2002-10-25  8:43 UTC (permalink / raw)
  To: Duncan Sands, Marc-Christian Petersen, linux-kernel

On Tuesday 15 October 2002 18:27, Duncan Sands wrote:
> > > This already bit ext3 transaction batching, c.f. Andrew Morton's
> > >
> > >> [PATCH] remove the sched_yield from the ext3 fsync path
> >
> > where did you read this ^^? :)
> >
> > ciao, Marc
>
> http://linux.bkbits.net:8080/linux-2.5/cset@1.733.7.6?nav=index.html|Change
>Set@-7d

Actually its

http://linux.bkbits.net:8080/linux-2.5/cset@1.781.32.3?nav=index.html|ChangeSet@-3w

Sorry about that.

Duncan.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Use of yield() in the kernel
  2002-10-22 17:24         ` Mark Mielke
  2002-10-22 18:58           ` Duncan Sands
@ 2002-10-25 14:15           ` Duncan Sands
  1 sibling, 0 replies; 15+ messages in thread
From: Duncan Sands @ 2002-10-25 14:15 UTC (permalink / raw)
  To: Mark Mielke; +Cc: Pavel Machek, linux-kernel, Andrew Morton

This is my current understanding of some different ways to schedule:

(1) Yield to a higher priority task (if there is one):
	cond_resched();

(2) Yield to the next task (if there is another one):
	set_current_state(TASK_RUNNING);
	schedule();

(3) Yield to all tasks:
	yield();

Notes:
(1) Also yields if the current task's time quantum has expired.
(3) Only guaranteed to yield to all tasks on the active list.

Corrections?

Duncan.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2002-10-25 14:09 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-10-15 13:36 Use of yield() in the kernel Duncan Sands
2002-10-15 15:15 ` Ingo Molnar
2002-10-15 17:12 ` John Levon
2002-10-17  6:36   ` Duncan Sands
2002-10-18 18:26 ` Pavel Machek
2002-10-19 12:25   ` Duncan Sands
2002-10-19 22:00     ` Pavel Machek
2002-10-20  9:10       ` Duncan Sands
2002-10-22 17:24         ` Mark Mielke
2002-10-22 18:58           ` Duncan Sands
2002-10-25 14:15           ` Duncan Sands
2002-10-20 11:22       ` Duncan Sands
  -- strict thread matches above, loose matches on Subject: below --
2002-10-15 16:20 Marc-Christian Petersen
2002-10-15 16:27 ` Duncan Sands
2002-10-25  8:43   ` Duncan Sands

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox