* softirq in pre3 and all linux ports
@ 2001-06-19 19:03 Andrea Arcangeli
2001-06-20 3:33 ` Paul Mackerras
0 siblings, 1 reply; 12+ messages in thread
From: Andrea Arcangeli @ 2001-06-19 19:03 UTC (permalink / raw)
To: Linus Torvalds, Alan Cox, Ingo Molnar, kuznet; +Cc: linux-kernel
With pre3, bugs have been introduced into mainline that are now being
propagated to all architectures.
First of all, nuking the handle_softirq check from entry.S is wrong. ppc
copied it without thinking and we'll need to resurrect it there too, for
example, so please, arch maintainers, don't kill that check (alpha in pre3
by luck didn't kill it, I think).
Without such a check before returning to userspace, any tasklet or softirq
posted by kernel code will get a latency of up to 1/HZ.
Secondly, the pre3 softirq code re-enables irqs before returning from
do_softirq, which is wrong as well: an irq flood can then cause a stack
overflow, with do_softirq running not at the first, non-nested irq layer.
Third, if a softirq or a tasklet re-posts itself while running, do_softirq
can starve userspace on one or more cpus.
Fourth, if the tasklet or softirq or bottom half handler has been marked
running again because of another event (like a nested irq), the kernel can
starve userspace too. (Softirqs are much heavier than the irq handler, so
the kernel can also live lock much more easily this way.)
This patch, which I have had in my tree for some days, fixes all those
issues. The assembler changes needed in the entry.S files while returning
to userspace can be described in C code this way; this is the 2.4.5 way:
if (softirq_active(cpu) & softirq_mask(cpu))
        do_softirq();
This is the 2.4.6pre3+belowfix way:
if (softirq_pending(cpu))
        do_softirq();
pending doesn't need to be a 64bit integer (it can though) but it needs
to be at least a 32bit integer. An `int' is fine for most archs, on
alpha we use a long though and that's fine too.
So I recommend that Linus merge this patch, which fixes all the above
mentioned bugs (the anti-starvation/live-lock logic is called
ksoftirqd):
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.6pre3aa1/00_ksoftirqd-6
Plus those SMP race fixes for archs where atomic operations aren't
implicit memory barriers:
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.6pre3aa1/00_softirq-fixes-4
Plus this scheduler generic cpu binding fix to avoid ksoftirqd
deadlocking at boot:
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.6pre3aa1/00_cpus_allowed-1
I verified that the patches apply just fine to 2.4.6pre3, and they're not
controversial.
If you've any question on how to update a certain kernel port I will do
my best to help in the update process!
Andrea
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: softirq in pre3 and all linux ports
2001-06-19 19:03 softirq in pre3 and all linux ports Andrea Arcangeli
@ 2001-06-20 3:33 ` Paul Mackerras
2001-06-20 3:54 ` Andrea Arcangeli
2001-06-20 18:16 ` kuznet
From: Paul Mackerras @ 2001-06-20 3:33 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Linus Torvalds, Alan Cox, Ingo Molnar, kuznet, linux-kernel
Andrea Arcangeli writes:
> With pre3, bugs have been introduced into mainline that are now being
> propagated to all architectures.
>
> First of all, nuking the handle_softirq check from entry.S is wrong. ppc
> copied it without thinking and we'll need to resurrect it there too, for
Well, I object to the "without thinking" bit. It seems to me that
code that raises a softirq without having either hard interrupts or
BHs disabled is buggy - why would you want to do that? And if we do
want to allow that, shouldn't we put the check in raise_softirq or the
equivalent, to get the minimum latency?
> Fourth, if the tasklet or softirq or bottom half handler has been marked
> running again because of another event (like a nested irq), the kernel can
> starve userspace too. (Softirqs are much heavier than the irq handler, so
> the kernel can also live lock much more easily this way.)
Soft irqs should definitely not be much heavier than an irq handler,
if they are then we have implemented them wrongly somehow.
> So I recommend that Linus merge this patch, which fixes all the above
> mentioned bugs (the anti-starvation/live-lock logic is called
> ksoftirqd):
ksoftirqd seems like the wrong solution to the problem to me. If we are
really getting starved by softirqs, then we need to look at whether
whatever is doing it should be a kernel thread itself rather than
doing its work in softirqs. Do you have a concrete example of the
starvation/live lockup that you can describe to us?
Regards,
Paul.
* Re: softirq in pre3 and all linux ports
2001-06-20 3:33 ` Paul Mackerras
@ 2001-06-20 3:54 ` Andrea Arcangeli
2001-06-20 4:00 ` David S. Miller
2001-06-20 12:18 ` Paul Mackerras
2001-06-20 18:16 ` kuznet
From: Andrea Arcangeli @ 2001-06-20 3:54 UTC (permalink / raw)
To: Paul Mackerras
Cc: Linus Torvalds, Alan Cox, Ingo Molnar, kuznet, linux-kernel
On Wed, Jun 20, 2001 at 01:33:19PM +1000, Paul Mackerras wrote:
> Well, I object to the "without thinking" bit. [..]
agreed, apologies.
> BHs disabled is buggy - why would you want to do that? And if we do
tasklet_schedule
> want to allow that, shouldn't we put the check in raise_softirq or the
> equivalent, to get the minimum latency?
We should release the stack before running the softirq (some places use
softirqs to release the stack and avoid overflows).
> Soft irqs should definitely not be much heavier than an irq handler,
> if they are then we have implemented them wrongly somehow.
ip + tcp are more intensive than just queueing a packet in a backlog.
That's why they're not done in irq context in the first place.
> ksoftirqd seems like the wrong solution to the problem to me. If we are
> really getting starved by softirqs, then we need to look at whether
> whatever is doing it should be a kernel thread itself rather than
> doing its work in softirqs. Do you have a concrete example of the
> starvation/live lockup that you can describe to us?
I don't have gigabit ethernet so I cannot flood my boxes to death.
But I think it's real, and a softirq marking itself runnable again is
another case to handle without live lockups or starvation.
Andrea
* Re: softirq in pre3 and all linux ports
2001-06-20 3:54 ` Andrea Arcangeli
@ 2001-06-20 4:00 ` David S. Miller
2001-06-20 4:07 ` Andrea Arcangeli
2001-06-20 12:18 ` Paul Mackerras
From: David S. Miller @ 2001-06-20 4:00 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Paul Mackerras, Linus Torvalds, Alan Cox, Ingo Molnar, kuznet,
linux-kernel
Andrea Arcangeli writes:
> I don't have gigabit ethernet so I cannot flood my boxes to death.
> But I think it's real, and a softirq marking itself runnable again is
> another case to handle without live lockups or starvation.
I think (still) that you're just moving the problem around and
not actually changing anything.
Later,
David S. Miller
davem@redhat.com
* Re: softirq in pre3 and all linux ports
2001-06-20 4:00 ` David S. Miller
@ 2001-06-20 4:07 ` Andrea Arcangeli
2001-06-20 18:06 ` kuznet
From: Andrea Arcangeli @ 2001-06-20 4:07 UTC (permalink / raw)
To: David S. Miller
Cc: Paul Mackerras, Linus Torvalds, Alan Cox, Ingo Molnar, kuznet,
linux-kernel
On Tue, Jun 19, 2001 at 09:00:24PM -0700, David S. Miller wrote:
>
> Andrea Arcangeli writes:
> > I don't have gigabit ethernet so I cannot flood my boxes to death.
> > But I think it's real, and a softirq marking itself runnable again is
> > another case to handle without live lockups or starvation.
>
> I think (still) that you're just moving the problem around and
> not actually changing anything.
Something will definitely change radically if the softirq marks itself
runnable again. But this to me sounds similar to the other case (an irq
flood that basically leaves the softirq pending every time you check it).
Andrea
* Re: softirq in pre3 and all linux ports
2001-06-20 3:54 ` Andrea Arcangeli
2001-06-20 4:00 ` David S. Miller
@ 2001-06-20 12:18 ` Paul Mackerras
2001-06-20 12:52 ` Andrea Arcangeli
From: Paul Mackerras @ 2001-06-20 12:18 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Linus Torvalds, Alan Cox, Ingo Molnar, kuznet, linux-kernel
Andrea Arcangeli writes:
> We should release the stack before running the softirq (some places use
> softirqs to release the stack and avoid overflows).
Well if they are relying on having a lot of stack available then those
places are buggy. Once the softirq is made pending it can run at any
time that interrupts are enabled. You can't rely on a softirq handler
having any more stack available than a hard interrupt handler has.
> ip + tcp are more intensive than just queueing a packet in a backlog.
> That's why they're not done in irq context in the first place.
Ah, ok, I misunderstood; I thought you were saying that the softirq
framework itself had a lot of overhead.
> I don't have gigabit ethernet so I cannot flood my boxes to death.
> But I think it's real, and a softirq marking itself runnable again is
> another case to handle without live lockups or starvation.
As for the gigabit ethernet case, if we are having packets coming in
and generating hard interrupts at that sort of a rate then what we
really need is the sort of interrupt throttling that Jamal talked
about at the 2.5 kernel kickoff.
It seems to me that possibly softirqs are being used in some places
where a kernel thread would be more appropriate. Instead of making
softirqs use a kernel thread, I think it would be better to find the
places that should use a thread and make them do so. Softirqs are
still after all interrupt handlers (ones that run at a lower priority
than any hardware interrupt) and should be treated as such.
Paul.
* Re: softirq in pre3 and all linux ports
2001-06-20 12:18 ` Paul Mackerras
@ 2001-06-20 12:52 ` Andrea Arcangeli
From: Andrea Arcangeli @ 2001-06-20 12:52 UTC (permalink / raw)
To: Paul Mackerras
Cc: Linus Torvalds, Alan Cox, Ingo Molnar, kuznet, linux-kernel
On Wed, Jun 20, 2001 at 10:18:10PM +1000, Paul Mackerras wrote:
> Well if they are relying on having a lot of stack available then those
> places are buggy. Once the softirq is made pending it can run at any
it's not about having lots of stack available, it's about avoiding
recursion.
Andrea
* Re: softirq in pre3 and all linux ports
2001-06-20 4:07 ` Andrea Arcangeli
@ 2001-06-20 18:06 ` kuznet
2001-06-20 22:10 ` David S. Miller
From: kuznet @ 2001-06-20 18:06 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: davem, paulus, torvalds, alan, mingo, linux-kernel
Hello!
> > Andrea Arcangeli writes:
> > > I don't have gigabit ethernet so I cannot flood my boxes to death.
> > > But I think it's real, and a softirq marking itself runnable again is
> > > another case to handle without live lockups or starvation.
Andrea, you do not need gigabit interfaces to check this. 100Mbit ones
are enough and even better, because as a rule they do not do interrupt
mitigation and so consume more resources. 8) Actually, you may laugh, but
one 10Mbit(!) interface is enough in some circumstances, namely when the
stack does more work than usual: sniffing, connection tracking in the
presence of fragments, syn flooding etc.
Actually, now I do not understand why TUX still works with Ingo's patch.
As soon as the bulk of the work is done in thread context, it should die
pretty fast, making no progress. :-)
> > I think (still) that you're just moving the problem around and
> > not actually changing anything.
Well, at least ksoftirqd is not a sort of placebo. :-)
OK. Let's forget about the infinite thread latency and live lock problems
introduced by Ingo's patch. After all, BSD has done exactly the same
thing for ages and nobody but security paranoiacs has cried about this
too much. We are just fully bsd compliant now. 8)
Let's look at it from a different angle: e.g. with Ingo's patch, as soon as
one cpu processes some global BH, all the rest of the cpus will spin
waiting for the global bh lock to be released. Is this good? I am afraid
it is not quite good.
Alexey
* Re: softirq in pre3 and all linux ports
2001-06-20 3:33 ` Paul Mackerras
2001-06-20 3:54 ` Andrea Arcangeli
@ 2001-06-20 18:16 ` kuznet
From: kuznet @ 2001-06-20 18:16 UTC (permalink / raw)
To: paulus; +Cc: andrea, torvalds, alan, mingo, linux-kernel
Hello!
> Soft irqs should definitely not be much heavier than an irq handler,
> if they are then we have implemented them wrongly somehow.
For example, all the networking nicely fits to this class. :-)
Alexey
* Re: softirq in pre3 and all linux ports
2001-06-20 18:06 ` kuznet
@ 2001-06-20 22:10 ` David S. Miller
2001-06-20 23:16 ` Andrea Arcangeli
2001-06-21 16:58 ` kuznet
From: David S. Miller @ 2001-06-20 22:10 UTC (permalink / raw)
To: kuznet; +Cc: Andrea Arcangeli, paulus, torvalds, alan, mingo, linux-kernel
kuznet@ms2.inr.ac.ru writes:
> Actually, now I do not understand why TUX still works with Ingo's patch.
> As soon as the bulk of the work is done in thread context, it should die
> pretty fast, making no progress. :-)
TUX also has per-cpu timers patch of Ingo as well.
Did you forget this? :-)
> Let's look at it from a different angle: e.g. with Ingo's patch, as soon as
> one cpu processes some global BH, all the rest of the cpus will spin
> waiting for the global bh lock to be released. Is this good? I am afraid
> it is not quite good.
It is equivalent to some old dumb code doing cli() right?
The only interesting global BHs left right now are:
1) Timers
2) SCSI BH
SCSI may be transformed right now, in 15 minutes of boring editing, into a
softirq; it has all the appropriate locking already.
Timers have no hard technical reason for not being a softirq
either. However, this would be work requiring real thought,
not just mindless edits.
Later,
David S. Miller
davem@redhat.com
* Re: softirq in pre3 and all linux ports
2001-06-20 22:10 ` David S. Miller
@ 2001-06-20 23:16 ` Andrea Arcangeli
2001-06-21 16:58 ` kuznet
From: Andrea Arcangeli @ 2001-06-20 23:16 UTC (permalink / raw)
To: David S. Miller; +Cc: kuznet, paulus, torvalds, alan, mingo, linux-kernel
On Wed, Jun 20, 2001 at 03:10:13PM -0700, David S. Miller wrote:
> TUX also has per-cpu timers patch of Ingo as well.
Not in my tree; tux doesn't depend on it at all. That's a further
optimization that tcp will take advantage of regardless of tux, and the
same applies to the pagecache scalability hashlock patch.
Andrea
* Re: softirq in pre3 and all linux ports
2001-06-20 22:10 ` David S. Miller
2001-06-20 23:16 ` Andrea Arcangeli
@ 2001-06-21 16:58 ` kuznet
From: kuznet @ 2001-06-21 16:58 UTC (permalink / raw)
To: David S. Miller; +Cc: andrea, paulus, torvalds, alan, mingo, linux-kernel
Hello!
> TUX also has per-cpu timers patch of Ingo as well.
> Did you forget this? :-)
If I remember correctly, it has a threaded timer pool, but timers still
acquire the global bh lock, so things only become worse. Apparently, it is
invisible at first sight because the bulk work typical for tux and
triggered by timers is moved to cpu-local tasklets (garbage collection:
time wait etc.).
> It is equivalent to some old dumb code doing cli() right?
Sort of.
> The only interesting global BHs left right now are:
>
> 1) Timers
> 2) SCSI BH
In generic server case, yes.
But also add BH_IMMEDIATE and the BHs used by hordes of devices.
> Timers have no hard technical reason for not being a softirq
> either. However, this would be work requiring real thought,
> not just mindless edits.
Yes.
But, in any case, global BHs are not a pathology: they were a handy tool,
allowing us to hide lots of spinlocks. And not plain spinlocks, but
asynchronous ones. It was pretty light, but had latency up to 1/HZ
in the worst case. Now they have unreasonably strict latency
(useless, as a rule) but eat cpu instead.
Alexey