* Re: BUG or not? GFP_KERNEL with interrupts disabled. [not found] <E791C176A6139242A988ABA8B3D9B38A01085638@hasmsx403.iil.intel.com> @ 2003-03-27 13:32 ` shmulik.hen 2003-03-27 13:43 ` David S. Miller 0 siblings, 1 reply; 15+ messages in thread From: shmulik.hen @ 2003-03-27 13:32 UTC (permalink / raw) To: Dan Eble, bond-devel, bond-announce, linux-netdev, linux-kernel, linux-net Further more, holding a lock_irq doesn't mean bottom halves are disabled too, it just means interrupts are disabled and no *new* softirq can be queued. Consider the following situation: In bond_release() we hold write_lock_irqsave(&bond->lock, flags) and then do all the releasing stuff. If, for example, we need to call dev_mc_upload() for the released slave, the following will happen spin_lock_bh(&dev->xmit_lock); __dev_mc_upload(dev); spin_unlock_bh(&dev->xmit_lock); spin_unlock_bh() calls local_bh_enable() which checks local_bh_count. If local_bh_count reaches zero (and it does), it directly executes do_softirq(). The check for in_interrupt() in do_softirq() is false and the softirqs that were queued begin to run and process the Tx and Rx backlogs. dev_queue_xmit() is called on the bond device which calls, lets say, bond_xmit_xor(). The first thing bond_xmit_xor() does is try to grab read_lock_irqsave(&bond->lock, flags). Since this lock is already held by bond_release(), and we're on the same cpu without any context switch, we've got ourselves a deadlock. This actually happened to us and it took us a while to figure the system halt, but we've got the kdb trace to prove it. Specifically for bonding, as stated by Dan below, it is indeed not necessary to hold a lock_irq in every entry point in the driver. From our experience in previous projects, we discovered that it is sufficient to just grab a read_lock when accessing the slaves list in any softirq level function (receive, transmit and timer), and hold a write_lock_bh() only when changing the slaves list in ioctl calls like bond_enslave(), bond_release(), bond_release_all() which all run at user context. We have created a version that uses the above scheme that is being tested by our QA group these days. Such a major change in the locking scheme requires allot of testing to try and detect potential hidden bugs and corner cases. We expect this will also increase the total throughput, since interrupts won't be blocked each time a packet is being transmitted or the miimon timer pops. We believe we will be able to post the patch (+results) next week. On Tue, 25 Mar 2003, Dan Eble wrote: > > (kernel is ppc 2.4.21-pre4) > > In bond_enslave() [drivers/net/bonding.c]: > > write_lock_irqsave(&bond->lock, flags); > ... > err = netdev_set_master(slave_dev, master_dev); > ... > write_unlock_irqrestore(&bond->lock, flags); > > In netdev_set_master() [net/core/dev.c]: > > rtmsg_ifinfo(RTM_NEWLINK, slave, IFF_SLAVE); > > In rtmsg_ifinfo() [net/core/rtnetlink.c]: > > skb = alloc_skb(size, GFP_KERNEL); > ... > netlink_broadcast(rtnl, skb, 0, RTMGRP_LINK, GFP_KERNEL); > > Doesn't this admit the possibility of sleeping with interrupts disabled? > I found it because I'm working on a driver that uses a master-slave > relationship like the bonding driver, and decided I didn't really need to > disable interrupts, so I tried using write_lock_bh() instead. The > result > was an "alloc_skb called nonatomically from interrupt" message because > write_lock_bh() increments the local BH count (which seems reasonable). > > A bigger question: Why are the IRQ check and the BH check inconsistent? > That is, local_bh_count() says "yes" if you are currently running in BH > context OR have disabled BHs; however, local_irq_count() says "yes" if > you > are currently running in interrupt context, but it says nothing (as far > as > I have seen) about whether IRQs are enabled or disabled. Is this (a) the > Right Way, (b) something that's more trouble to fix than to be burned-by > once and then avoid for the rest of your life, or (c) totally horked? > > -- > Dan Eble <dane@aiinet.com> _____ . > | _ |/| > Applied Innovation Inc. | |_| | | > http://www.aiinet.com/ |__/|_|_| > > - > To unsubscribe from this list: send the line "unsubscribe linux-net" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > -- | Shmulik Hen | | Israel Design Center (Jerusalem) | | LAN Access Division | | Intel Communications Group, Intel corp. | ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: BUG or not? GFP_KERNEL with interrupts disabled. 2003-03-27 13:32 ` BUG or not? GFP_KERNEL with interrupts disabled shmulik.hen @ 2003-03-27 13:43 ` David S. Miller 2003-03-27 14:11 ` Trond Myklebust 2003-03-27 17:22 ` Linus Torvalds 0 siblings, 2 replies; 15+ messages in thread From: David S. Miller @ 2003-03-27 13:43 UTC (permalink / raw) To: shmulik.hen Cc: dane, bonding-devel, bonding-announce, netdev, linux-kernel, linux-net, torvalds, mingo, kuznet From: shmulik.hen@intel.com Date: Thu, 27 Mar 2003 15:32:02 +0200 (IST) Further more, holding a lock_irq doesn't mean bottom halves are disabled too, it just means interrupts are disabled and no *new* softirq can be queued. Consider the following situation: I think local_bh_enable() should check irqs_disabled() and honour that. What you are showing here, that BH's can run via local_bh_enable() even when IRQs are disabled, is a BUG(). IRQ disabling is meant to be stronger than softint disabling. Ingo/Linus? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: BUG or not? GFP_KERNEL with interrupts disabled. 2003-03-27 13:43 ` David S. Miller @ 2003-03-27 14:11 ` Trond Myklebust 2003-03-27 14:12 ` David S. Miller 2003-03-27 17:22 ` Linus Torvalds 1 sibling, 1 reply; 15+ messages in thread From: Trond Myklebust @ 2003-03-27 14:11 UTC (permalink / raw) To: David S. Miller Cc: shmulik.hen, dane, bonding-devel, bonding-announce, netdev, linux-kernel, linux-net, torvalds, mingo, kuznet >>>>> " " == David S Miller <davem@redhat.com> writes: > From: shmulik.hen@intel.com Date: Thu, 27 Mar 2003 15:32:02 > +0200 (IST) > Further more, holding a lock_irq doesn't mean bottom halves > are disabled too, it just means interrupts are disabled and > no *new* softirq can be queued. Consider the following > situation: > I think local_bh_enable() should check irqs_disabled() and > honour that. What you are showing here, that BH's can run via > local_bh_enable() even when IRQs are disabled, is a BUG(). > IRQ disabling is meant to be stronger than softint disabling. In that case, you'll need to have things like spin_lock_irqrestore() call local_bh_enable() in order to run the pending softirqs. Is that worth the trouble? Cheers, Trond ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: BUG or not? GFP_KERNEL with interrupts disabled. 2003-03-27 14:11 ` Trond Myklebust @ 2003-03-27 14:12 ` David S. Miller 0 siblings, 0 replies; 15+ messages in thread From: David S. Miller @ 2003-03-27 14:12 UTC (permalink / raw) To: trond.myklebust Cc: shmulik.hen, dane, bonding-devel, bonding-announce, netdev, linux-kernel, linux-net, torvalds, mingo, kuznet From: Trond Myklebust <trond.myklebust@fys.uio.no> Date: 27 Mar 2003 15:11:56 +0100 > IRQ disabling is meant to be stronger than softint disabling. In that case, you'll need to have things like spin_lock_irqrestore() call local_bh_enable() in order to run the pending softirqs. Is that worth the trouble? "trouble" is a weird word to use when the current behavior is just wrong. :-) My point is that it doesn't matter what the fix is, running softints while hw IRQs are disabled must be fixed. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: BUG or not? GFP_KERNEL with interrupts disabled. 2003-03-27 13:43 ` David S. Miller 2003-03-27 14:11 ` Trond Myklebust @ 2003-03-27 17:22 ` Linus Torvalds 2003-03-27 17:55 ` David S. Miller 2004-10-10 1:38 ` udp_recvmsg: possible bug causing infinite hang? Chad N. Tindel 1 sibling, 2 replies; 15+ messages in thread From: Linus Torvalds @ 2003-03-27 17:22 UTC (permalink / raw) To: David S. Miller Cc: shmulik.hen, dane, bonding-devel, bonding-announce, netdev, linux-kernel, linux-net, mingo, kuznet On Thu, 27 Mar 2003, David S. Miller wrote: > > Further more, holding a lock_irq doesn't mean bottom halves are disabled > too, it just means interrupts are disabled and no *new* softirq can be > queued. Consider the following situation: > > I think local_bh_enable() should check irqs_disabled() and honour that. > What you are showing here, that BH's can run via local_bh_enable() > even when IRQs are disabled, is a BUG(). I'd disagree. I do agree that we should obviously not run bottom halves with interrupts disabled, but I think the _real_ bug is doing "local_bh_enable()" in the first place. It's a nesting bug: you must nest the "stronger" lock inside the weaker one, which means that the following is right: local_bh_disable() .. local_irq_disable() ... local_irq_enable() .. local_bh_enable() and this is WRONG: local_irq_disable() (or spinlock) .. local_bh_disable() .. local_bh_enable() !BUG BUG BUG! .. local_irq_enable() So the bug is, in my opinion, not in BK handling, but in the caller. I missed the start of this thread, so I don't know how hard this is to fix. But if you have a buggy sequence, the _simple_ fix may be to do somehting like this: +++ local_bh_disable() local_irq_disable() (or spinlock) .. local_bh_disable() .. local_bh_enable() ! now it's a no-op and no longer a bug .. local_irq_enable() +++ local_bh_enable() What's the code sequence? Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: BUG or not? GFP_KERNEL with interrupts disabled. 2003-03-27 17:22 ` Linus Torvalds @ 2003-03-27 17:55 ` David S. Miller 2003-03-27 18:04 ` Linus Torvalds 2004-10-10 1:38 ` udp_recvmsg: possible bug causing infinite hang? Chad N. Tindel 1 sibling, 1 reply; 15+ messages in thread From: David S. Miller @ 2003-03-27 17:55 UTC (permalink / raw) To: torvalds Cc: shmulik.hen, dane, bonding-devel, bonding-announce, netdev, linux-kernel, linux-net, mingo, kuznet From: Linus Torvalds <torvalds@transmeta.com> Date: Thu, 27 Mar 2003 09:22:29 -0800 (PST) I do agree that we should obviously not run bottom halves with interrupts disabled Ok, so can we add a: if (irqs_disabled()) BUG(); check to do_softirq()? I'll address the rest of your email in a bit. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: BUG or not? GFP_KERNEL with interrupts disabled. 2003-03-27 17:55 ` David S. Miller @ 2003-03-27 18:04 ` Linus Torvalds 2003-03-27 18:07 ` David S. Miller 0 siblings, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2003-03-27 18:04 UTC (permalink / raw) To: David S. Miller Cc: shmulik.hen, dane, bonding-devel, bonding-announce, netdev, linux-kernel, linux-net, mingo, kuznet On Thu, 27 Mar 2003, David S. Miller wrote: > > I do agree that we should obviously not run bottom halves with > interrupts disabled > > Ok, so can we add a: > > if (irqs_disabled()) > BUG(); > > check to do_softirq()? I'd suggest making it a counting warning (with a static counter per local-bh-enable macro expansion) and adding it to local_bh_enable() - otherwise it will only BUG() when the (potentially rare) condition happens - instead of always giving a nice backtrace of exact problem spots. Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: BUG or not? GFP_KERNEL with interrupts disabled. 2003-03-27 18:04 ` Linus Torvalds @ 2003-03-27 18:07 ` David S. Miller 0 siblings, 0 replies; 15+ messages in thread From: David S. Miller @ 2003-03-27 18:07 UTC (permalink / raw) To: torvalds Cc: shmulik.hen, dane, bonding-devel, bonding-announce, netdev, linux-kernel, linux-net, mingo, kuznet From: Linus Torvalds <torvalds@transmeta.com> Date: Thu, 27 Mar 2003 10:04:52 -0800 (PST) I'd suggest making it a counting warning (with a static counter per local-bh-enable macro expansion) and adding it to local_bh_enable() - otherwise it will only BUG() when the (potentially rare) condition happens - instead of always giving a nice backtrace of exact problem spots. Ok, maybe it's time to move local_bh_enable() out of line, it's getting large and it's expanded in hundreds of places. ^ permalink raw reply [flat|nested] 15+ messages in thread
* udp_recvmsg: possible bug causing infinite hang? 2003-03-27 17:22 ` Linus Torvalds 2003-03-27 17:55 ` David S. Miller @ 2004-10-10 1:38 ` Chad N. Tindel 2004-10-10 1:45 ` David S. Miller 1 sibling, 1 reply; 15+ messages in thread From: Chad N. Tindel @ 2004-10-10 1:38 UTC (permalink / raw) To: netdev; +Cc: linux-net I've encountered a hang condition during testing that only appeared when we upgraded from Redhat EL 3 Update 2 to Redhat EL 3 Update 3. After looking at the differences, it appears to be caused by a change to udp_recvmsg that also appears to have filtered back into the main kernel tree, so it is possible that more people would be affected by this than just redhat users. Anyway, here is the scenario: User space code sends a datagram on a blocking socket, and then calls select() or poll() to wait for the reply. When that pops with a non-error condition (so we _know_ there is data to be read), recvfrom() is called. Now, assume that somewhere along the way (it doesn't really matter where) the UDP packet is corrupted. Also, assume that no further inbound datagrams are destined for this socket. The new udp_recvmsg() will get down to the bottom, and then will go to the try_again label, where it will block forever in skb_recv_datagram() waiting for a datagram that will never come. The old code used to not have this try_again case, and so would always just return immediately. While this is a general problem for any program that uses UDP and relies on the fact that select popped to insure that recvfrom won't hang, the place where we always see it is in the DNS lookup portion of glibc. The send_dg() function is what actually hangs. My questions are: 1. Why was the code changed in this way? 2. Is this a bug? It seems so to me, because select (or poll) specifically says there is data to read, and then it hangs when we try to read it. But, without the context of #1, it is hard to make this determination. Thanks, Chad ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: udp_recvmsg: possible bug causing infinite hang? 2004-10-10 1:38 ` udp_recvmsg: possible bug causing infinite hang? Chad N. Tindel @ 2004-10-10 1:45 ` David S. Miller 2004-10-10 1:54 ` Chad N. Tindel 2004-10-12 5:32 ` Nagendra Singh Tomar 0 siblings, 2 replies; 15+ messages in thread From: David S. Miller @ 2004-10-10 1:45 UTC (permalink / raw) To: Chad N. Tindel; +Cc: netdev, linux-net On Sat, 9 Oct 2004 18:38:58 -0700 (PDT) "Chad N. Tindel" <ctindel@falcon.csc.calpoly.edu> wrote: > User space code sends a datagram on a blocking socket, and then calls > select() or poll() to wait for the reply. If you wish the socket not to block, mark the file descriptor as non-blocking or pass in the appropriate MSG_* flag into recvmsg(). Select() returning that a filedescriptor is readable does not guarentee that a blocking socket will not block in the ead call. This behavior has been in the Linux kernel for an enourmous amount of them, the change you are noticing going from EL 3 update 2 to update 3 is that previously we were returning -EAGAIN to blocking sockets, instead we are properly blocking to wait for another packet. There is an enormous and long thread about this topic on the linux-kernel list, please read there before we duplicate such a long and tiring thread here. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: udp_recvmsg: possible bug causing infinite hang? 2004-10-10 1:45 ` David S. Miller @ 2004-10-10 1:54 ` Chad N. Tindel 2004-10-12 5:32 ` Nagendra Singh Tomar 1 sibling, 0 replies; 15+ messages in thread From: Chad N. Tindel @ 2004-10-10 1:54 UTC (permalink / raw) To: David S. Miller; +Cc: netdev, linux-net On Sat, Oct 09, 2004 at 06:45:03PM -0700, David S. Miller wrote: > On Sat, 9 Oct 2004 18:38:58 -0700 (PDT) > "Chad N. Tindel" <ctindel@falcon.csc.calpoly.edu> wrote: > > > User space code sends a datagram on a blocking socket, and then calls > > select() or poll() to wait for the reply. > > If you wish the socket not to block, mark the file descriptor > as non-blocking or pass in the appropriate MSG_* flag into > recvmsg(). > > Select() returning that a filedescriptor is readable does not > guarentee that a blocking socket will not block in the ead > call. > > This behavior has been in the Linux kernel for an enourmous > amount of them, the change you are noticing going from EL 3 update > 2 to update 3 is that previously we were returning -EAGAIN to > blocking sockets, instead we are properly blocking to wait for > another packet. > > There is an enormous and long thread about this topic on the > linux-kernel list, please read there before we duplicate such > a long and tiring thread here. OK, so what you're saying is that the C-library has a bug because it isn't using a non-blocking socket when doing DNS lookups? Thanks, Chad ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: udp_recvmsg: possible bug causing infinite hang? 2004-10-10 1:45 ` David S. Miller 2004-10-10 1:54 ` Chad N. Tindel @ 2004-10-12 5:32 ` Nagendra Singh Tomar 2004-10-12 5:32 ` Chad N. Tindel 1 sibling, 1 reply; 15+ messages in thread From: Nagendra Singh Tomar @ 2004-10-12 5:32 UTC (permalink / raw) To: David S. Miller; +Cc: Chad N. Tindel, netdev, linux-net On Sat, 9 Oct 2004, David S. Miller wrote: > Select() returning that a filedescriptor is readable does not > guarentee that a blocking socket will not block in the ead > call. > Does the POSIX standard _not_ give this implicit guarantee ? Whats the point in having a select call that says that the fd is readable, and when I go to read it the blocking read call blocks. Then either select returned a wrong positive or the read is buggy. The kernel should mark the socket readable (and hence select returns "readable") only after it has added the incoming data to the socket receive queue or it has changed the state of the socket (in case of TCP connections getting closed), in case of peer closing the connection. After that has been done, is the guarantee not there ? Thanx, tomar -- You have moved the mouse. Windows must be restarted for the changes to take effect. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: udp_recvmsg: possible bug causing infinite hang? 2004-10-12 5:32 ` Nagendra Singh Tomar @ 2004-10-12 5:32 ` Chad N. Tindel 2004-10-12 6:08 ` Nagendra Singh Tomar 0 siblings, 1 reply; 15+ messages in thread From: Chad N. Tindel @ 2004-10-12 5:32 UTC (permalink / raw) To: Nagendra Singh Tomar; +Cc: netdev, linux-net > Does the POSIX standard _not_ give this implicit guarantee ? Whats the > point in having a select call that says that the fd is readable, and when > I go to read it the blocking read call blocks. Then either select returned > a wrong positive or the read is buggy. > The kernel should mark the socket readable (and hence select returns > "readable") only after it has added the incoming data to the socket receive > queue or it has changed the state of the socket (in case of TCP connections > getting closed), in case of peer closing the connection. > After that has been done, is the guarantee not there ? What you're saying is correct. However, if the UDP handler pulls the data off of the receive queue and then finds that the datagram is corrupted, it will go back into the read function to wait for another datagram. Personally I think that it should return some other kind of error (not EAGAIN) to indicate somehow that we discarded a datagram due to a bad checksum. Chad ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: udp_recvmsg: possible bug causing infinite hang? 2004-10-12 5:32 ` Chad N. Tindel @ 2004-10-12 6:08 ` Nagendra Singh Tomar 2004-10-12 8:56 ` Henrik Nordstrom 0 siblings, 1 reply; 15+ messages in thread From: Nagendra Singh Tomar @ 2004-10-12 6:08 UTC (permalink / raw) To: Chad N. Tindel; +Cc: Nagendra Singh Tomar, netdev, linux-net On Tue, 12 Oct 2004, Chad N. Tindel wrote: > > Does the POSIX standard _not_ give this implicit guarantee ? Whats the > > point in having a select call that says that the fd is readable, and when > > I go to read it the blocking read call blocks. Then either select returned > > a wrong positive or the read is buggy. > > The kernel should mark the socket readable (and hence select returns > > "readable") only after it has added the incoming data to the socket receive > > queue or it has changed the state of the socket (in case of TCP connections > > getting closed), in case of peer closing the connection. > > After that has been done, is the guarantee not there ? > > What you're saying is correct. However, if the UDP handler pulls the data > off of the receive queue and then finds that the datagram is corrupted, it > will go back into the read function to wait for another datagram. Personally > I think that it should return some other kind of error (not EAGAIN) to indicate > somehow that we discarded a datagram due to a bad checksum. Now I wonder whether the skb should be added to the UDP socket receive queue, only after checksum verification (as is done for TCP). I understand that, doing it that way we lose the advantage of folding copy-to-user and checksumming. If the standard does not explicitly write anything about the guarantee of read/write passing after successful return from select, then its fine; but somehow that does not sound very rational. If anyone can point to the specific section in the POSIX standard that dictates this, it will be of great help. Thanx, tomar > > Chad > -- -- You have moved the mouse. Windows must be restarted for the changes to take effect. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: udp_recvmsg: possible bug causing infinite hang? 2004-10-12 6:08 ` Nagendra Singh Tomar @ 2004-10-12 8:56 ` Henrik Nordstrom 0 siblings, 0 replies; 15+ messages in thread From: Henrik Nordstrom @ 2004-10-12 8:56 UTC (permalink / raw) To: Nagendra Singh Tomar; +Cc: Chad N. Tindel, netdev, linux-net On Tue, 12 Oct 2004, Nagendra Singh Tomar wrote: > checksumming. If the standard does not explicitly write anything about the > guarantee of read/write passing after successful return from select, then > its fine; but somehow that does not sound very rational. > If anyone can point to the specific section in the POSIX standard that > dictates this, it will be of great help. >From SUSv3 on select(): A descriptor shall be considered ready for reading when a call to an input function with O_NONBLOCK clear would not block, whether or not the function would transfer data successfully. (The function might return data, an end-of-file indication, or an error other than one indicating that it is blocked, and in each of these cases the descriptor shall be considered ready for reading.) A descriptor shall be considered ready for writing when a call to an output function with O_NONBLOCK clear would not block, whether or not the function would transfer data successfully. so it seems to me the current Linux implementation is wrong in this regard. Regards Henrik ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2004-10-12 8:56 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <E791C176A6139242A988ABA8B3D9B38A01085638@hasmsx403.iil.intel.com>
2003-03-27 13:32 ` BUG or not? GFP_KERNEL with interrupts disabled shmulik.hen
2003-03-27 13:43 ` David S. Miller
2003-03-27 14:11 ` Trond Myklebust
2003-03-27 14:12 ` David S. Miller
2003-03-27 17:22 ` Linus Torvalds
2003-03-27 17:55 ` David S. Miller
2003-03-27 18:04 ` Linus Torvalds
2003-03-27 18:07 ` David S. Miller
2004-10-10 1:38 ` udp_recvmsg: possible bug causing infinite hang? Chad N. Tindel
2004-10-10 1:45 ` David S. Miller
2004-10-10 1:54 ` Chad N. Tindel
2004-10-12 5:32 ` Nagendra Singh Tomar
2004-10-12 5:32 ` Chad N. Tindel
2004-10-12 6:08 ` Nagendra Singh Tomar
2004-10-12 8:56 ` Henrik Nordstrom
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).