public inbox for linux-kernel@vger.kernel.org
* 2.4.20 kernel/timer.c may incorrectly reenable interrupts
@ 2003-04-11  6:47 Keith Owens
  2003-04-11  7:15 ` george anzinger
  0 siblings, 1 reply; 7+ messages in thread
From: Keith Owens @ 2003-04-11  6:47 UTC (permalink / raw)
  To: linux-kernel

2.4.20 kernel/timer.c

static inline void update_times(void)
{
	unsigned long ticks;

	/*
	 * update_times() is run from the raw timer_bh handler so we
	 * just know that the irqs are locally enabled and so we don't
	 * need to save/restore the flags of the local CPU here. -arca
	 */
	write_lock_irq(&xtime_lock);
	vxtime_lock();

	ticks = jiffies - wall_jiffies;
	if (ticks) {
		wall_jiffies += ticks;
		update_wall_time(ticks);
	}
	vxtime_unlock();
	write_unlock_irq(&xtime_lock);
	calc_load(ticks);
}

I hit one case where the routine was called with interrupts disabled and
it unconditionally enabled them, with nasty side effects.  Code fragment:

  local_irq_save(flags);
  local_bh_disable();
  ....
  local_bh_enable();
  local_irq_restore(flags);

local_bh_enable() checks for pending softirqs, finds that there is an
outstanding timer bh and runs it.  do_softirq() -> tasklet_hi_action()
-> bh_action() -> timer_bh() -> update_times(), which unconditionally
reenables interrupts.  The timer code then issued cli(); because
interrupts had been incorrectly reenabled, it tried to take the global
cli lock and hung.

There is no documentation that defines the required nesting order of
local_irq and local_bh.  Even if the above code fragment is deemed to
be illegal, there are uses of local_bh_enable() all through the kernel,
and it will be difficult to prove that none of them are called with
interrupts disabled.  If there is any chance that local_bh_enable() is
called with interrupts off, update_times() is wrong.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.4.20 kernel/timer.c may incorrectly reenable interrupts
  2003-04-11  6:47 2.4.20 kernel/timer.c may incorrectly reenable interrupts Keith Owens
@ 2003-04-11  7:15 ` george anzinger
  2003-04-11  9:27   ` Ingo Oeser
  0 siblings, 1 reply; 7+ messages in thread
From: george anzinger @ 2003-04-11  7:15 UTC (permalink / raw)
  To: Keith Owens; +Cc: linux-kernel

Keith Owens wrote:
> 2.4.20 kernel/timer.c
> 
> static inline void update_times(void)
> {
> 	unsigned long ticks;
> 
> 	/*
> 	 * update_times() is run from the raw timer_bh handler so we
> 	 * just know that the irqs are locally enabled and so we don't
> 	 * need to save/restore the flags of the local CPU here. -arca
> 	 */
> 	write_lock_irq(&xtime_lock);
> 	vxtime_lock();
> 
> 	ticks = jiffies - wall_jiffies;
> 	if (ticks) {
> 		wall_jiffies += ticks;
> 		update_wall_time(ticks);
> 	}
> 	vxtime_unlock();
> 	write_unlock_irq(&xtime_lock);
> 	calc_load(ticks);
> }
> 
> I hit one case where the routine was called with interrupts disabled and
> it unconditionally enabled them, with nasty side effects.  Code fragment:
> 
>   local_irq_save(flags);
>   local_bh_disable();
>   ....
>   local_bh_enable();
>   local_irq_restore(flags);
> 
> local_bh_enable() checks for pending softirqs, finds that there is an
> outstanding timer bh and runs it.  do_softirq() -> tasklet_hi_action()
> -> bh_action() -> timer_bh() -> update_times(), which unconditionally
> reenables interrupts.  The timer code then issued cli(); because
> interrupts had been incorrectly reenabled, it tried to take the global
> cli lock and hung.

If you look at do_softirq() you will see that it enables irqs 
unconditionally while calling the pending functions.  It does, however, 
save the irq state on entry and restore it on exit (seems strange, eh?).
> 
> There is no documentation that defines the required nesting order of
> local_irq and local_bh.  Even if the above code fragment is deemed to
> be illegal, there are uses of local_bh_enable() all through the kernel,
> and it will be difficult to prove that none of them are called with
> interrupts disabled.  If there is any chance that local_bh_enable() is
> called with interrupts off, update_times() is wrong.

IMHO, update_times() is right!  The code fragment you found is wrong.
If there is a real need we could code up a check to see if
local_bh_enable() is called with interrupts off.

As machines get faster and faster, it will become more and more of a 
burden to "stop the world" and sync with the interrupt system, which 
is running at a much slower speed.  This is what cli / sti / 
restore_flags causes.  I saw one test where the time to do the cli was 
as long as the run_timer_list code, for example.


-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.4.20 kernel/timer.c may incorrectly reenable interrupts
  2003-04-11  7:15 ` george anzinger
@ 2003-04-11  9:27   ` Ingo Oeser
  2003-04-11 21:21     ` george anzinger
  0 siblings, 1 reply; 7+ messages in thread
From: Ingo Oeser @ 2003-04-11  9:27 UTC (permalink / raw)
  To: george anzinger; +Cc: Keith Owens, linux-kernel

On Fri, Apr 11, 2003 at 12:15:54AM -0700, george anzinger wrote:
> As machines get faster and faster, it will become more and more of a 
> burden to "stop the world" and sync with the interrupt system, which 
> is running at a much slower speed.  This is what cli / sti / 
> restore_flags causes.  I saw one test where the time to do the cli was 
> as long as the run_timer_list code, for example.

Maybe we could replace some cli/sti pairs with spinlocks? If it
takes more time to cli/sti than to run the whole code section
that will be protected by the spinlock, then it might be better
to use that instead and block in the IRQ dispatch code.

But I have no measurements of how fast the spinlocks are in the
contended and uncontended cases.

Problems: 
   - The nesting depth of cli/sti doesn't matter; for spinlocks it
     does (they are not recursive)

   - spinlocks are usually not compiled in

   - Older CPUs may still benefit from cli/sti.

What do you think?

Regards

Ingo Oeser
-- 
Marketing is the art of selling people things they don't need, with
money they don't have, to impress people they don't like.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.4.20 kernel/timer.c may incorrectly reenable interrupts
  2003-04-11  9:27   ` Ingo Oeser
@ 2003-04-11 21:21     ` george anzinger
  2003-04-12  8:55       ` Ingo Oeser
  0 siblings, 1 reply; 7+ messages in thread
From: george anzinger @ 2003-04-11 21:21 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: Keith Owens, linux-kernel

Ingo Oeser wrote:
> On Fri, Apr 11, 2003 at 12:15:54AM -0700, george anzinger wrote:
> 
>>As machines get faster and faster, it will become more and more of a 
>>burden to "stop the world" and sync with the interrupt system, which 
>>is running at a much slower speed.  This is what cli / sti / 
>>restore_flags causes.  I saw one test where the time to do the cli was 
>>as long as the run_timer_list code, for example.
> 
> 
> Maybe we could replace some cli/sti pairs with spinlocks? If it
> takes more time to cli/sti than to run the whole code section
> that will be protected by the spinlock, then it might be better
> to use that instead and block in the IRQ dispatch code.

Not so fast there :)  The cli/sti is there to protect from "same cpu" 
contention, i.e. this CPU can come here on interrupt, so don't 
allow interrupts.  The spin lock is only good for the "other" CPUs.

On the other hand, a cli/sti will NOT protect from other CPUs' 
interrupts (barring the global cli, which is not even in 2.5).

The better thing to do, IMHO, is to move the work off the interrupt 
path where only spin locks (and preemption locks) are required.

Another possibility is to make more use of the new read/write stuff 
that is now being used for the xtime lock.  The idea here is that we 
don't care if an interrupt (or another CPU) visits this data 
while we are here, as long as we know about it and can then redo what 
we are doing.  This works very well for fetching a few words that need 
to be fetched atomically WRT the lock.  If the fetch was not atomic 
(i.e. was interrupted), just try again.  I haven't measured or heard of 
measurements on this change, but I suspect that there is a significant 
reduction in the time to do gettimeofday() because the cli/sti is not 
on the read path any more.
> 
> But I have no measurements of how fast the spinlocks are in the
> contended and uncontended cases.
> 
> Problems: 
>    - The nesting depth of cli/sti doesn't matter; for spinlocks it
>      does (they are not recursive)
> 
>    - spinlocks are usually not compiled in
> 
>    - Older CPUs may still benefit from cli/sti.
> 
> What do you think?
> 
> Regards
> 
> Ingo Oeser

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.4.20 kernel/timer.c may incorrectly reenable interrupts
  2003-04-11 21:21     ` george anzinger
@ 2003-04-12  8:55       ` Ingo Oeser
  2003-04-14 21:49         ` george anzinger
  0 siblings, 1 reply; 7+ messages in thread
From: Ingo Oeser @ 2003-04-12  8:55 UTC (permalink / raw)
  To: george anzinger; +Cc: Keith Owens, linux-kernel

On Fri, Apr 11, 2003 at 02:21:38PM -0700, george anzinger wrote:
> Ingo Oeser wrote:
> > Maybe we could replace some cli/sti pairs with spinlocks? If it
> > takes more time to cli/sti than to run the whole code section
> > that will be protected by the spinlock, then it might be better
> > to use that instead and block in the IRQ dispatch code.
> 
> Not so fast there :)  The cli/sti is there to protect from "same cpu" 
> contention, i.e. this machine can come here on interrupt so don't 
> allow interrupts.  The spin lock is only good for the "other" cpus.
>
> On the other hand, a cli/sti will NOT protect from other machine 
> interrupts (baring the global cli which is not even in 2.5).

Because it isn't implemented this way? ;-)

If you have a per-CPU lock which blocks in the arch-specific
IRQ-dispatch section, you can simulate cli/sti quite well.

I have more problems with the semantic difference between cli/sti
and spinlocks on nesting: a cli/sti will never block anything but
interrupts, while a spinlock will block the caller on contention.

This should be measured. Unfortunately I don't have fast enough
hardware to notice a difference.

> The better thing to do, IMHO, is to move the work off the interrupt 
> path where only spin locks (and preemption locks) are required.
 
This is being done incrementally already, so it's only a matter of time.

> Another possibility is to make more use of the new read/write stuff 
> that is now being used for the xtime lock.  The idea here is that we 
> don't care if an interrupt (or another CPU) visits this data 
> while we are here, as long as we know about it and can then redo what 
> we are doing.  This works very well for fetching a few words that need 
> to be fetched atomically WRT the lock.  If the fetch was not atomic 
> (i.e. was interrupted), just try again.

Not suitable for drivers. They must read the registers, set some
other registers and ACK the IRQ in the ISR. There is no way
around that. 

If the maximum latency between the ISR and the process-context work
were small and definable by the drivers, people would offload more.

So if there were a schedule_work_deadline(), that would be nice.  The
routine later called in process context would be told whether the
deadline was missed or not and could act accordingly.

This would simplify IDE (which needs those small timeouts and could
act as if the request had timed out when the deadline is missed), it
would simplify the deadline scheduler for IO, and it would allow an
additional scheduling method for user space.

The scheduler changes would be one additional queue, and it could
also be depth limited, since work with later deadlines can be added
later.

We currently have timers, but they are suitable only for triggering
the work, not for doing it, and that's a source of complexity in
driver design.

> I haven't measured or heard of 
> measurements on this change, but I suspect that there is a significant 
> reduction in the time to do gettimeofday() because the cli/sti is not 
> on the read path any more.

I agree that this kind of historical code should be changed,
but modern drivers, which need to protect against IRQs from their own
device while changing some registers, cannot do this.

They really need spin_lock_irqsave() (which does the cli
implicitly).

Regards

Ingo Oeser

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.4.20 kernel/timer.c may incorrectly reenable interrupts
  2003-04-12  8:55       ` Ingo Oeser
@ 2003-04-14 21:49         ` george anzinger
  2003-04-15 20:32           ` Ingo Oeser
  0 siblings, 1 reply; 7+ messages in thread
From: george anzinger @ 2003-04-14 21:49 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: Keith Owens, linux-kernel

Ingo Oeser wrote:
> On Fri, Apr 11, 2003 at 02:21:38PM -0700, george anzinger wrote:
> 
>>Ingo Oeser wrote:
>>
>>>Maybe we could replace some cli/sti pairs with spinlocks? If it
>>>takes more time to cli/sti than to run the whole code section
>>>that will be protected by the spinlock, then it might be better
>>>to use that instead and block in the IRQ dispatch code.
>>
>>Not so fast there :)  The cli/sti is there to protect from "same cpu" 
>>contention, i.e. this CPU can come here on interrupt, so don't 
>>allow interrupts.  The spin lock is only good for the "other" CPUs.
>>
>>On the other hand, a cli/sti will NOT protect from other CPUs' 
>>interrupts (barring the global cli, which is not even in 2.5).
> 
> 
> Because it isn't implemented this way? ;-)
> 
> If you have a per-CPU lock, which block in the arch-specific
> IRQ-dispatch section, you can simulate cli/sti quite well.

Yes, I believe this is what RTLinux and RTAI do.  The sti macro then 
needs to check for pending interrupt work, much as local_bh_enable() does.
> 
> I have more problems with the semantical difference of cli/sti
> vs. spinlocks on nesting. A cli/sti will never block anything but
> interrupts, a spinlock will block the caller on contention.
> 
> This should be measured. Unfortunately I don't have fast enough
> hardware to notice a difference.
> 
> 
>>The better thing to do, IMHO, is to move the work off the interrupt 
>>path where only spin locks (and preemption locks) are required.
> 
>  
> This is done incrementally already, so its only a matter of time.
> 
> 
>>Another possibility is to make more use of the new read/write stuff 
>>that is now being used for the xtime lock.  The idea here is that we 
>>don't care if an interrupt (or another CPU) visits this data 
>>while we are here, as long as we know about it and can then redo what 
>>we are doing.  This works very well for fetching a few words that need 
>>to be fetched atomically WRT the lock.  If the fetch was not atomic 
>>(i.e. was interrupted), just try again.
> 
> 
> Not suitable for drivers. They must read the registers, set some
> other registers and ACK the IRQ in the ISR. There is no way
> around that. 

The lock has two sides, the reader and the writer.  The writer still 
takes the irq spinlock; only the reader gets the speedup.
> 
> If the maximum latency between the ISR and the process-context work
> were small and definable by the drivers, people would offload more.
> 
> So if there were a schedule_work_deadline(), that would be nice.  The
> routine later called in process context would be told whether the
> deadline was missed or not and could act accordingly.
> 
> This would simplify IDE (which needs those small timeouts and could
> act as if the request had timed out when the deadline is missed), it
> would simplify the deadline scheduler for IO, and it would allow an
> additional scheduling method for user space.
> 
> The scheduler changes would be one additional queue, and it could
> also be depth limited, since work with later deadlines can be added
> later.
> 
> We currently have timers, but they are suitable only for triggering
> the work, not for doing it, and that's a source of complexity in
> driver design.

Something of this sort is present in the workqueue design.  There is 
a way to schedule work for a later call.
> 
> 
>>I haven't measured or heard of 
>>measurements on this change, but I suspect that there is a significant 
>>reduction in the time to do gettimeofday() because the cli/sti is not 
>>on the read path any more.
> 
> 
> I agree that these historical kind of code should be changed,
> but modern drivers, which need to protect from IRQs of this
> device happening while changing some registers could not do this.
> 
> They really need the spin_lock_irqsave() (which does the cli
> implicitly).
> 
> Regards
> 
> Ingo Oeser

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.4.20 kernel/timer.c may incorrectly reenable interrupts
  2003-04-14 21:49         ` george anzinger
@ 2003-04-15 20:32           ` Ingo Oeser
  0 siblings, 0 replies; 7+ messages in thread
From: Ingo Oeser @ 2003-04-15 20:32 UTC (permalink / raw)
  To: george anzinger; +Cc: Keith Owens, linux-kernel

Hi George,
hi Keith,
hi lkml,

On Mon, Apr 14, 2003 at 02:49:39PM -0700, george anzinger wrote:
> Ingo Oeser wrote:
> > On Fri, Apr 11, 2003 at 02:21:38PM -0700, george anzinger wrote:
> > 
> >>Ingo Oeser wrote:
> Yes, I believe this is what RTLinux and RTIA do.  The sti macro then 
> needs to check for pending interrupt work, much as the bh_unlock does.
 
Ok, NOW we are talking ;-)

> > Not suitable for drivers. They must read the registers, set some
> > other registers and ACK the IRQ in the ISR. There is no way
> > around that. 
> 
> The lock has two sides, the reader and the writer.  The writer still 
> takes the irq spinlock; only the reader gets the speedup.

But the read can read garbage and may need a retry.  This is not
acceptable for me, since one of my cards does destructive
reads.  They implement a hardware FIFO (of one-element depth),
where reading the register removes the element.  This is a common
design you'll find in PCI-bridge chips (I used the S5933 from AMCC
for this).

So this scheme still doesn't help me and is not the proper
drop-in replacement we are looking for.

BTW: Please note that you can assume I know the semantics of all
   the locking functions currently in Linus' tree.  So you can
   stop explaining their details to me and continue with
   defining a proper cli/sti replacement.  Thanks!

> > So if there were a schedule_work_deadline(), that would be nice.  The
> > routine later called in process context would be told whether the
> > deadline was missed or not and could act accordingly.
[...]
> > We currently have timers, but they are suitable only for triggering
> > the work, not for doing it, and that's a source of complexity in
> > driver design.
> 
> Something of this sort is present in the workqueue design.  There is 
> a way to schedule work for a later call.

Yes, but it doesn't solve the problem that I've illustrated.  You won't
know WHEN this scheduled work will trigger (the kernel tells us: ASAP),
and you need yet another kernel facility.  And you must check yourself
whether the deadline has passed.

This isn't a nice design.  I just don't feel experienced enough
to change the scheduler.

But maybe we can talk about that now that we both know the current
kernel API well enough ;-)

Regards

Ingo Oeser

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2003-04-16  8:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-04-11  6:47 2.4.20 kernel/timer.c may incorrectly reenable interrupts Keith Owens
2003-04-11  7:15 ` george anzinger
2003-04-11  9:27   ` Ingo Oeser
2003-04-11 21:21     ` george anzinger
2003-04-12  8:55       ` Ingo Oeser
2003-04-14 21:49         ` george anzinger
2003-04-15 20:32           ` Ingo Oeser
