netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: BUG or not? GFP_KERNEL with interrupts disabled.
       [not found] <E791C176A6139242A988ABA8B3D9B38A01085638@hasmsx403.iil.intel.com>
@ 2003-03-27 13:32 ` shmulik.hen
  2003-03-27 13:43   ` David S. Miller
  0 siblings, 1 reply; 17+ messages in thread
From: shmulik.hen @ 2003-03-27 13:32 UTC (permalink / raw)
  To: Dan Eble, bond-devel, bond-announce, linux-netdev, linux-kernel,
	linux-net

Further more, holding a lock_irq doesn't mean bottom halves are disabled
too, it just means interrupts are disabled and no *new* softirq can be
queued. Consider the following situation:

In bond_release() we hold write_lock_irqsave(&bond->lock, flags) and then
do all the releasing stuff. If, for example, we need to call
dev_mc_upload() for the released slave, the following will happen

        spin_lock_bh(&dev->xmit_lock);
        __dev_mc_upload(dev);
        spin_unlock_bh(&dev->xmit_lock);

spin_unlock_bh() calls local_bh_enable() which checks local_bh_count. If
local_bh_count reaches zero (and it does), it directly executes
do_softirq(). The check for in_interrupt() in do_softirq() is false and
the softirqs that were queued begin to run and process the Tx and Rx
backlogs. dev_queue_xmit() is called on the bond device which calls, lets
say, bond_xmit_xor(). The first thing bond_xmit_xor() does is try to grab
read_lock_irqsave(&bond->lock, flags). Since this lock is already held by
bond_release(), and we're on the same cpu without any context switch,
we've got ourselves a deadlock. This actually happened to us and it took us
a while to figure the system halt, but we've got the kdb trace to prove
it.

Specifically for bonding, as stated by Dan below, it is indeed not
necessary to hold a lock_irq in every entry point in the driver. From our
experience in previous projects, we discovered that it is sufficient to
just grab a read_lock when accessing the slaves list in any softirq level
function (receive, transmit and timer), and hold a write_lock_bh() only
when changing the slaves list in ioctl calls like bond_enslave(),
bond_release(), bond_release_all() which all run at user context.

We have created a version that uses the above scheme that is being tested
by our QA group these days. Such a major change in the locking scheme
requires allot of testing to try and detect potential hidden bugs and
corner cases. We expect this will also increase the total throughput,
since interrupts won't be blocked each time a packet is being transmitted
or the miimon timer pops. We believe we will be able to post the patch
(+results) next week.


On Tue, 25 Mar 2003, Dan Eble wrote:
> 
> (kernel is ppc 2.4.21-pre4)
> 
> In bond_enslave() [drivers/net/bonding.c]:
> 
>         write_lock_irqsave(&bond->lock, flags);
>         ...
>         err = netdev_set_master(slave_dev, master_dev);
>         ...
>         write_unlock_irqrestore(&bond->lock, flags);
> 
> In netdev_set_master() [net/core/dev.c]:
> 
>         rtmsg_ifinfo(RTM_NEWLINK, slave, IFF_SLAVE);
> 
> In rtmsg_ifinfo() [net/core/rtnetlink.c]:
> 
>         skb = alloc_skb(size, GFP_KERNEL);
>         ...
>         netlink_broadcast(rtnl, skb, 0, RTMGRP_LINK, GFP_KERNEL);
> 
> Doesn't this admit the possibility of sleeping with interrupts disabled? 
> I found it because I'm working on a driver that uses a master-slave
> relationship like the bonding driver, and decided I didn't really need to
> disable interrupts, so I tried using write_lock_bh()  instead.  The
> result
> was an "alloc_skb called nonatomically from interrupt" message because
> write_lock_bh() increments the local BH count (which seems reasonable).
> 
> A bigger question: Why are the IRQ check and the BH check inconsistent?
> That is, local_bh_count() says "yes" if you are currently running in BH
> context OR have disabled BHs; however, local_irq_count() says "yes" if
> you
> are currently running in interrupt context, but it says nothing (as far
> as
> I have seen) about whether IRQs are enabled or disabled.  Is this (a) the
> Right Way, (b) something that's more trouble to fix than to be burned-by
> once and then avoid for the rest of your life, or (c) totally horked?
> 
> --
> Dan Eble <dane@aiinet.com>  _____  .
>                            |  _  |/|
> Applied Innovation Inc.    | |_| | |
> http://www.aiinet.com/     |__/|_|_|
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-net" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 

-- 
| Shmulik Hen                                    |
| Israel Design Center (Jerusalem)               |
| LAN Access Division                            |
| Intel Communications Group, Intel corp.        |

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG or not? GFP_KERNEL with interrupts disabled.
  2003-03-27 13:32 ` shmulik.hen
@ 2003-03-27 13:43   ` David S. Miller
  2003-03-27 14:11     ` Trond Myklebust
  2003-03-27 17:22     ` Linus Torvalds
  0 siblings, 2 replies; 17+ messages in thread
From: David S. Miller @ 2003-03-27 13:43 UTC (permalink / raw)
  To: shmulik.hen
  Cc: dane, bonding-devel, bonding-announce, netdev, linux-kernel,
	linux-net, torvalds, mingo, kuznet

   From: shmulik.hen@intel.com
   Date: Thu, 27 Mar 2003 15:32:02 +0200 (IST)

   Further more, holding a lock_irq doesn't mean bottom halves are disabled
   too, it just means interrupts are disabled and no *new* softirq can be
   queued. Consider the following situation:
   
I think local_bh_enable() should check irqs_disabled() and honour that.
What you are showing here, that BH's can run via local_bh_enable()
even when IRQs are disabled, is a BUG().

IRQ disabling is meant to be stronger than softint disabling.

Ingo/Linus?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG or not? GFP_KERNEL with interrupts disabled.
  2003-03-27 13:43   ` David S. Miller
@ 2003-03-27 14:11     ` Trond Myklebust
  2003-03-27 14:12       ` David S. Miller
  2003-03-27 17:22     ` Linus Torvalds
  1 sibling, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2003-03-27 14:11 UTC (permalink / raw)
  To: David S. Miller
  Cc: shmulik.hen, dane, bonding-devel, bonding-announce, netdev,
	linux-kernel, linux-net, torvalds, mingo, kuznet

>>>>> " " == David S Miller <davem@redhat.com> writes:

     >    From: shmulik.hen@intel.com Date: Thu, 27 Mar 2003 15:32:02
     >    +0200 (IST)

     >    Further more, holding a lock_irq doesn't mean bottom halves
     >    are disabled too, it just means interrupts are disabled and
     >    no *new* softirq can be queued. Consider the following
     >    situation:
   
     > I think local_bh_enable() should check irqs_disabled() and
     > honour that.  What you are showing here, that BH's can run via
     > local_bh_enable() even when IRQs are disabled, is a BUG().

     > IRQ disabling is meant to be stronger than softint disabling.

In that case, you'll need to have things like spin_lock_irqrestore()
call local_bh_enable() in order to run the pending softirqs. Is that
worth the trouble?

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG or not? GFP_KERNEL with interrupts disabled.
  2003-03-27 14:11     ` Trond Myklebust
@ 2003-03-27 14:12       ` David S. Miller
  0 siblings, 0 replies; 17+ messages in thread
From: David S. Miller @ 2003-03-27 14:12 UTC (permalink / raw)
  To: trond.myklebust
  Cc: shmulik.hen, dane, bonding-devel, bonding-announce, netdev,
	linux-kernel, linux-net, torvalds, mingo, kuznet

   From: Trond Myklebust <trond.myklebust@fys.uio.no>
   Date: 27 Mar 2003 15:11:56 +0100

        > IRQ disabling is meant to be stronger than softint disabling.
   
   In that case, you'll need to have things like spin_lock_irqrestore()
   call local_bh_enable() in order to run the pending softirqs. Is that
   worth the trouble?

"trouble" is a weird word to use when the current behavior is
just wrong. :-)

My point is that it doesn't matter what the fix is, running
softints while hw IRQs are disabled must be fixed.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG or not? GFP_KERNEL with interrupts disabled.
  2003-03-27 13:43   ` David S. Miller
  2003-03-27 14:11     ` Trond Myklebust
@ 2003-03-27 17:22     ` Linus Torvalds
  2003-03-27 17:55       ` David S. Miller
  1 sibling, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2003-03-27 17:22 UTC (permalink / raw)
  To: David S. Miller
  Cc: shmulik.hen, dane, bonding-devel, bonding-announce, netdev,
	linux-kernel, linux-net, mingo, kuznet


On Thu, 27 Mar 2003, David S. Miller wrote:
> 
>    Further more, holding a lock_irq doesn't mean bottom halves are disabled
>    too, it just means interrupts are disabled and no *new* softirq can be
>    queued. Consider the following situation:
>    
> I think local_bh_enable() should check irqs_disabled() and honour that.
> What you are showing here, that BH's can run via local_bh_enable()
> even when IRQs are disabled, is a BUG().

I'd disagree.

I do agree that we should obviously not run bottom halves with interrupts 
disabled, but I think the _real_ bug is doing "local_bh_enable()" in the 
first place. It's a nesting bug: you must nest the "stronger" lock inside 
the weaker one, which means that the following is right:

	local_bh_disable()
		..
		local_irq_disable()
		...
		local_irq_enable()
		..
	local_bh_enable()

and this is WRONG:

	local_irq_disable() (or spinlock)
		..
		local_bh_disable()
		..
		local_bh_enable()	!BUG BUG BUG!
		..
	local_irq_enable()

So the bug is, in my opinion, not in BK handling, but in the caller.

I missed the start of this thread, so I don't know how hard this is to 
fix. But if you have a buggy sequence, the _simple_ fix may be to do 
somehting like this:

+++	local_bh_disable()
	local_irq_disable() (or spinlock)
		..
		local_bh_disable()
		..
		local_bh_enable()	! now it's a no-op and no longer a bug
		..
	local_irq_enable()
+++	local_bh_enable()

What's the code sequence?

		Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG or not? GFP_KERNEL with interrupts disabled.
  2003-03-27 17:22     ` Linus Torvalds
@ 2003-03-27 17:55       ` David S. Miller
  2003-03-27 18:04         ` Linus Torvalds
  0 siblings, 1 reply; 17+ messages in thread
From: David S. Miller @ 2003-03-27 17:55 UTC (permalink / raw)
  To: torvalds
  Cc: shmulik.hen, dane, bonding-devel, bonding-announce, netdev,
	linux-kernel, linux-net, mingo, kuznet

   From: Linus Torvalds <torvalds@transmeta.com>
   Date: Thu, 27 Mar 2003 09:22:29 -0800 (PST)

   I do agree that we should obviously not run bottom halves with
   interrupts disabled
   
Ok, so can we add a:

	if (irqs_disabled())
		BUG();

check to do_softirq()?

I'll address the rest of your email in a bit.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG or not? GFP_KERNEL with interrupts disabled.
  2003-03-27 17:55       ` David S. Miller
@ 2003-03-27 18:04         ` Linus Torvalds
  2003-03-27 18:07           ` David S. Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2003-03-27 18:04 UTC (permalink / raw)
  To: David S. Miller
  Cc: shmulik.hen, dane, bonding-devel, bonding-announce, netdev,
	linux-kernel, linux-net, mingo, kuznet


On Thu, 27 Mar 2003, David S. Miller wrote:
> 
>    I do agree that we should obviously not run bottom halves with
>    interrupts disabled
>    
> Ok, so can we add a:
> 
> 	if (irqs_disabled())
> 		BUG();
> 
> check to do_softirq()?

I'd suggest making it a counting warning (with a static counter per
local-bh-enable macro expansion) and adding it to local_bh_enable() -
otherwise it will only BUG()  when the (potentially rare) condition
happens - instead of always giving a nice backtrace of exact problem 
spots.

		Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG or not? GFP_KERNEL with interrupts disabled.
  2003-03-27 18:04         ` Linus Torvalds
@ 2003-03-27 18:07           ` David S. Miller
  0 siblings, 0 replies; 17+ messages in thread
From: David S. Miller @ 2003-03-27 18:07 UTC (permalink / raw)
  To: torvalds
  Cc: shmulik.hen, dane, bonding-devel, bonding-announce, netdev,
	linux-kernel, linux-net, mingo, kuznet

   From: Linus Torvalds <torvalds@transmeta.com>
   Date: Thu, 27 Mar 2003 10:04:52 -0800 (PST)

   I'd suggest making it a counting warning (with a static counter per
   local-bh-enable macro expansion) and adding it to local_bh_enable() -
   otherwise it will only BUG()  when the (potentially rare) condition
   happens - instead of always giving a nice backtrace of exact problem 
   spots.

Ok, maybe it's time to move local_bh_enable() out of line, it's
getting large and it's expanded in hundreds of places.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG or not? GFP_KERNEL with interrupts disabled.
       [not found] <3B785392832ED71192AE00D0B7B0D75B539668@aimail.aiinet.com>
@ 2003-03-27 19:02 ` Dan Eble
  2003-03-27 19:08   ` Linus Torvalds
  0 siblings, 1 reply; 17+ messages in thread
From: Dan Eble @ 2003-03-27 19:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David S. Miller, shmulik.hen, bonding-devel, bonding-announce,
	netdev, linux-kernel, linux-net, mingo, kuznet

On Thu, 27 Mar 2003, Linus Torvalds wrote:
> 
> On Thu, 27 Mar 2003, David S. Miller wrote:
> > 
> > Ok, so can we add a:
> > 
> > 	if (irqs_disabled())
> > 		BUG();
> > 
> > check to do_softirq()?
> 
> I'd suggest making it a counting warning (with a static counter per
> local-bh-enable macro expansion) and adding it to local_bh_enable() -
> otherwise it will only BUG()  when the (potentially rare) condition
> happens - instead of always giving a nice backtrace of exact problem 
> spots.

So, to return to my original question...  local_bh_count() > 0 when 
a BH is running or after local_bh_disable().  local_irq_count() > 0 in 
interrupt context, but not necessarily when interrupts are disabled.

This makes checks like the following (in alloc_skb) asymmetric:

    if (in_interrupt() && (gfp_mask & __GFP_WAIT)) {
        static int count = 0;
        if (++count < 5) {
            printk(KERN_ERR "alloc_skb called nonatomically "
                   "from interrupt %p\n", NET_CALLER(size));
            BUG();

In a driver I'm writing, this bug was hidden until I switched from using
write_lock_irqsave() to write_lock_bh().  Shouldn't this bug also be
announced if interrupts are disabled?  (I understand that disabling bh/irq
in the correct order will ensure that this bug is properly detected, but
it seems like a strange policy to rely on correct coding to catch a bug.)

-- 
Dan Eble <dane@aiinet.com>  _____  .
                           |  _  |/|
Applied Innovation Inc.    | |_| | |
http://www.aiinet.com/     |__/|_|_|

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG or not? GFP_KERNEL with interrupts disabled.
  2003-03-27 19:02 ` BUG or not? GFP_KERNEL with interrupts disabled Dan Eble
@ 2003-03-27 19:08   ` Linus Torvalds
  2003-03-27 19:10     ` David S. Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2003-03-27 19:08 UTC (permalink / raw)
  To: Dan Eble
  Cc: David S. Miller, shmulik.hen, bonding-devel, bonding-announce,
	netdev, linux-kernel, linux-net, mingo, kuznet


On Thu, 27 Mar 2003, Dan Eble wrote:
> 
> This makes checks like the following (in alloc_skb) asymmetric:
> 
>     if (in_interrupt() && (gfp_mask & __GFP_WAIT)) {
>         static int count = 0;
>         if (++count < 5) {
>             printk(KERN_ERR "alloc_skb called nonatomically "
>                    "from interrupt %p\n", NET_CALLER(size));
>             BUG();
> 
> In a driver I'm writing, this bug was hidden until I switched from using
> write_lock_irqsave() to write_lock_bh().  Shouldn't this bug also be
> announced if interrupts are disabled?

Yeah. It should also probably use "in_atomic()" instead of 
"in_interrupt()", since that also finds people who have marked themselves 
non-preemptible.

So what the test SHOULD look like is this:

	if (gfp_mask & __GFP_WAIT) {
		if (in_atomic() || irqs_disabled()) {
			static int count = 0;
			...
		}
	}

which should catch all the cases we really care about.

		Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG or not? GFP_KERNEL with interrupts disabled.
  2003-03-27 19:08   ` Linus Torvalds
@ 2003-03-27 19:10     ` David S. Miller
  2003-03-27 19:22       ` Linus Torvalds
  0 siblings, 1 reply; 17+ messages in thread
From: David S. Miller @ 2003-03-27 19:10 UTC (permalink / raw)
  To: torvalds
  Cc: dane, shmulik.hen, bonding-devel, bonding-announce, netdev,
	linux-kernel, linux-net, mingo, kuznet

   From: Linus Torvalds <torvalds@transmeta.com>
   Date: Thu, 27 Mar 2003 11:08:26 -0800 (PST)

   So what the test SHOULD look like is this:
   
   	if (gfp_mask & __GFP_WAIT) {
   		if (in_atomic() || irqs_disabled()) {
   			static int count = 0;
   			...
   		}
   	}
   
   which should catch all the cases we really care about.

Let's codify this "in_atomic() || irqs_disabled()" test into a macro
that everyone can use to test sleepability, ok?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG or not? GFP_KERNEL with interrupts disabled.
  2003-03-27 19:10     ` David S. Miller
@ 2003-03-27 19:22       ` Linus Torvalds
  2003-03-27 19:39         ` David S. Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2003-03-27 19:22 UTC (permalink / raw)
  To: David S. Miller
  Cc: dane, shmulik.hen, bonding-devel, bonding-announce, netdev,
	linux-kernel, linux-net, mingo, kuznet


On Thu, 27 Mar 2003, David S. Miller wrote:
> 
> Let's codify this "in_atomic() || irqs_disabled()" test into a macro
> that everyone can use to test sleepability, ok?

Well, I really don't want people to act dynamically differently depending 
on whether they can sleep or not. That makes static sanity-testing 
impossible. So I really think that the only really valid use of the above 
is on one single place: might_sleep().

Which right now doesn't do the "irqs_disabled()" test, but otherwise looks 
good. So the code should really just say

	if (gfp_mask & __GFP_WAIT)
		might_sleep();

and might_sleep() should be updated.

Anybody want to try that and see whether things break horribly?

		Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG or not? GFP_KERNEL with interrupts disabled.
  2003-03-27 19:22       ` Linus Torvalds
@ 2003-03-27 19:39         ` David S. Miller
  2003-03-27 19:52           ` Robert Love
  2003-03-27 20:55           ` David S. Miller
  0 siblings, 2 replies; 17+ messages in thread
From: David S. Miller @ 2003-03-27 19:39 UTC (permalink / raw)
  To: torvalds
  Cc: dane, shmulik.hen, bonding-devel, bonding-announce, netdev,
	linux-kernel, linux-net, mingo, kuznet

   From: Linus Torvalds <torvalds@transmeta.com>
   Date: Thu, 27 Mar 2003 11:22:55 -0800 (PST)

   	if (gfp_mask & __GFP_WAIT)
   		might_sleep();
   
   and might_sleep() should be updated.
   
   Anybody want to try that and see whether things break horribly?

I hadn't considered this, good idea.  I'm trying this out right now.

Someone should backport the might_sleep() stuff to 2.4.x, it's very
useful.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG or not? GFP_KERNEL with interrupts disabled.
  2003-03-27 19:39         ` David S. Miller
@ 2003-03-27 19:52           ` Robert Love
  2003-03-27 19:53             ` David S. Miller
  2003-03-27 20:55           ` David S. Miller
  1 sibling, 1 reply; 17+ messages in thread
From: Robert Love @ 2003-03-27 19:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: torvalds, dane, shmulik.hen, bonding-devel, bonding-announce,
	netdev, linux-kernel, linux-net, mingo, kuznet

On Thu, 2003-03-27 at 14:39, David S. Miller wrote:

> I hadn't considered this, good idea.  I'm trying this out right now.

I hope it works.  I have a sinking feeling we call it some places that
may have interrupts disabled...

> Someone should backport the might_sleep() stuff to 2.4.x, it's very
> useful.

Would be nice, but for the maximum effect we need kernel preemption
(which keeps track of atomicity via preempt_count).  I doubt we want to
go there for 2.4.

	Robert Love


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG or not? GFP_KERNEL with interrupts disabled.
  2003-03-27 19:52           ` Robert Love
@ 2003-03-27 19:53             ` David S. Miller
  0 siblings, 0 replies; 17+ messages in thread
From: David S. Miller @ 2003-03-27 19:53 UTC (permalink / raw)
  To: rml
  Cc: torvalds, dane, shmulik.hen, bonding-devel, bonding-announce,
	netdev, linux-kernel, linux-net, mingo, kuznet

   From: Robert Love <rml@tech9.net>
   Date: 27 Mar 2003 14:52:11 -0500

   On Thu, 2003-03-27 at 14:39, David S. Miller wrote:
   
   > I hadn't considered this, good idea.  I'm trying this out right now.
   
   I hope it works.  I have a sinking feeling we call it some places that
   may have interrupts disabled...

Your sinking feeling was warranted.

Nearly every hw IRQ implementation invokes irq_exit() with
CPU interrupts off :-(  That has to be screwing with performance
as well.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG or not? GFP_KERNEL with interrupts disabled.
  2003-03-27 19:39         ` David S. Miller
  2003-03-27 19:52           ` Robert Love
@ 2003-03-27 20:55           ` David S. Miller
  2003-03-27 21:29             ` David S. Miller
  1 sibling, 1 reply; 17+ messages in thread
From: David S. Miller @ 2003-03-27 20:55 UTC (permalink / raw)
  To: torvalds
  Cc: dane, shmulik.hen, bonding-devel, bonding-announce, netdev,
	linux-kernel, linux-net, mingo, kuznet

   From: "David S. Miller" <davem@redhat.com>
   Date: Thu, 27 Mar 2003 11:39:33 -0800 (PST)

      From: Linus Torvalds <torvalds@transmeta.com>
      Date: Thu, 27 Mar 2003 11:22:55 -0800 (PST)
   
      	if (gfp_mask & __GFP_WAIT)
      		might_sleep();
      
      and might_sleep() should be updated.
      
      Anybody want to try that and see whether things break horribly?
   
   I hadn't considered this, good idea.  I'm trying this out right now.
   
Ok, I'm running this now and it appears to work.

i386 will need similar changes to it's irq_exit() call sites.

One might_sleep still triggers, the cpufreq_register_notifier() call
during boot.  It takes a rwsem.  This will trigger on i386 too with
TSC and CPUFREQ both enabled.

Oh yeah, the fbcon cursor thing triggers too, but that's been
discussed to death in another thread and hopefully a fix will
be pushed upstream soon by the fbcon guys.

--- ./arch/sparc64/kernel/smp.c.~1~	Thu Mar 27 11:57:41 2003
+++ ./arch/sparc64/kernel/smp.c	Thu Mar 27 12:05:46 2003
@@ -1055,11 +1055,10 @@ void smp_percpu_timer_interrupt(struct p
 		clear_softint(tick_mask);
 	}
 
+	irq_enter();
 	do {
 		sparc64_do_profile(regs);
 		if (!--prof_counter(cpu)) {
-			irq_enter();
-
 			if (cpu == boot_cpu_id) {
 				kstat_cpu(cpu).irqs[0]++;
 				timer_tick_interrupt(regs);
@@ -1067,7 +1066,6 @@ void smp_percpu_timer_interrupt(struct p
 
 			update_process_times(user);
 
-			irq_exit();
 
 			prof_counter(cpu) = prof_multiplier(cpu);
 		}
@@ -1088,6 +1086,9 @@ void smp_percpu_timer_interrupt(struct p
 				     : /* no outputs */
 				     : "r" (pstate));
 	} while (time_after_eq(tick, compare));
+
+	local_irq_enable();
+	irq_exit();
 }
 
 static void __init smp_setup_percpu_timer(void)
--- ./arch/sparc64/kernel/irq.c.~1~	Thu Mar 27 11:57:41 2003
+++ ./arch/sparc64/kernel/irq.c	Thu Mar 27 12:42:13 2003
@@ -356,7 +356,7 @@ int request_irq(unsigned int irq, void (
 	}	
 	if (action == NULL)
 	    action = (struct irqaction *)kmalloc(sizeof(struct irqaction),
-						 GFP_KERNEL);
+						 GFP_ATOMIC);
 	
 	if (!action) { 
 		spin_unlock_irqrestore(&irq_action_lock, flags);
@@ -376,7 +376,7 @@ int request_irq(unsigned int irq, void (
 				goto free_and_ebusy;
 			}
 			if ((bucket->flags & IBF_MULTI) == 0) {
-				vector = kmalloc(sizeof(void *) * 4, GFP_KERNEL);
+				vector = kmalloc(sizeof(void *) * 4, GFP_ATOMIC);
 				if (vector == NULL)
 					goto free_and_enomem;
 
@@ -793,6 +793,7 @@ void handler_irq(int irq, struct pt_regs
 
 		bp->flags &= ~IBF_INPROGRESS;
 	}
+	local_irq_enable();
 	irq_exit();
 }
 
@@ -900,7 +901,7 @@ int request_fast_irq(unsigned int irq,
 	}
 	if (action == NULL)
 		action = (struct irqaction *)kmalloc(sizeof(struct irqaction),
-						     GFP_KERNEL);
+						     GFP_ATOMIC);
 	if (!action) {
 		spin_unlock_irqrestore(&irq_action_lock, flags);
 		return -ENOMEM;
--- ./arch/sparc64/kernel/traps.c.~1~	Thu Mar 27 12:13:23 2003
+++ ./arch/sparc64/kernel/traps.c	Thu Mar 27 12:15:53 2003
@@ -1575,6 +1575,9 @@ void show_trace_raw(struct thread_info *
 	struct reg_window *rw;
 	int count = 0;
 
+	if (tp == current_thread_info())
+		flushw_all();
+
 	fp = ksp + STACK_BIAS;
 	thread_base = (unsigned long) tp;
 	do {
@@ -1595,6 +1598,15 @@ void show_trace_task(struct task_struct 
 	if (tsk)
 		show_trace_raw(tsk->thread_info,
 			       tsk->thread_info->ksp);
+}
+
+void dump_stack(void)
+{
+	unsigned long ksp;
+
+	__asm__ __volatile__("mov	%%fp, %0"
+			     : "=r" (ksp));
+	show_trace_raw(current_thread_info(), ksp);
 }
 
 void die_if_kernel(char *str, struct pt_regs *regs)
--- ./kernel/sched.c.~1~	Thu Mar 27 11:27:01 2003
+++ ./kernel/sched.c	Thu Mar 27 11:27:41 2003
@@ -2554,7 +2554,7 @@ void __might_sleep(char *file, int line)
 #if defined(in_atomic)
 	static unsigned long prev_jiffy;	/* ratelimiting */
 
-	if (in_atomic()) {
+	if (in_atomic() || irqs_disabled()) {
 		if (time_before(jiffies, prev_jiffy + HZ))
 			return;
 		prev_jiffy = jiffies;
--- ./kernel/softirq.c.~1~	Thu Mar 27 11:28:20 2003
+++ ./kernel/softirq.c	Thu Mar 27 11:52:35 2003
@@ -60,6 +60,9 @@ asmlinkage void do_softirq()
 	if (in_interrupt())
 		return;
 
+	if (irqs_disabled())
+		BUG();
+
 	local_irq_save(flags);
 	cpu = smp_processor_id();
 
--- ./net/core/skbuff.c.~1~	Thu Mar 27 11:28:53 2003
+++ ./net/core/skbuff.c	Thu Mar 27 11:29:12 2003
@@ -170,15 +170,8 @@ struct sk_buff *alloc_skb(unsigned int s
 	struct sk_buff *skb;
 	u8 *data;
 
-	if (in_interrupt() && (gfp_mask & __GFP_WAIT)) {
-		static int count;
-		if (++count < 5) {
-			printk(KERN_ERR "alloc_skb called nonatomically "
-			       "from interrupt %p\n", NET_CALLER(size));
- 			BUG();
-		}
-		gfp_mask &= ~__GFP_WAIT;
-	}
+	if (gfp_mask & __GFP_WAIT)
+		might_sleep();
 
 	/* Get the HEAD */
 	skb = skb_head_from_pool();

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG or not? GFP_KERNEL with interrupts disabled.
  2003-03-27 20:55           ` David S. Miller
@ 2003-03-27 21:29             ` David S. Miller
  0 siblings, 0 replies; 17+ messages in thread
From: David S. Miller @ 2003-03-27 21:29 UTC (permalink / raw)
  To: torvalds
  Cc: dane, shmulik.hen, bonding-devel, bonding-announce, netdev,
	linux-kernel, linux-net, mingo, kuznet

   From: "David S. Miller" <davem@redhat.com>
   Date: Thu, 27 Mar 2003 12:55:07 -0800 (PST)

Alexey has pointed out a bug in my changes.

   @@ -1088,6 +1086,9 @@ void smp_percpu_timer_interrupt(struct p
    				     : /* no outputs */
    				     : "r" (pstate));
    	} while (time_after_eq(tick, compare));
   +
   +	local_irq_enable();
   +	irq_exit();
    }
    
    static void __init smp_setup_percpu_timer(void)

Of course this is bogus.

The IRQ enable needs to occur in the irq_exit() branch right
before do_softirq() is invoked.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2003-03-27 21:29 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <3B785392832ED71192AE00D0B7B0D75B539668@aimail.aiinet.com>
2003-03-27 19:02 ` BUG or not? GFP_KERNEL with interrupts disabled Dan Eble
2003-03-27 19:08   ` Linus Torvalds
2003-03-27 19:10     ` David S. Miller
2003-03-27 19:22       ` Linus Torvalds
2003-03-27 19:39         ` David S. Miller
2003-03-27 19:52           ` Robert Love
2003-03-27 19:53             ` David S. Miller
2003-03-27 20:55           ` David S. Miller
2003-03-27 21:29             ` David S. Miller
     [not found] <E791C176A6139242A988ABA8B3D9B38A01085638@hasmsx403.iil.intel.com>
2003-03-27 13:32 ` shmulik.hen
2003-03-27 13:43   ` David S. Miller
2003-03-27 14:11     ` Trond Myklebust
2003-03-27 14:12       ` David S. Miller
2003-03-27 17:22     ` Linus Torvalds
2003-03-27 17:55       ` David S. Miller
2003-03-27 18:04         ` Linus Torvalds
2003-03-27 18:07           ` David S. Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).