* raw sockets and blocking
From: Paul Jakma @ 2004-02-18 1:33 UTC
To: Linux Kernel; +Cc: Hasso Tepper

Hi,

I'm curious: is it good for raw sockets to block for writes because
the cable of one interface has been pulled?

We're seeing a problem with ospfd (www.zebra.org/www.quagga.net),
which uses a single raw AF_INET/OSPF socket and manages its own IP
headers to send/receive OSPF packets to/from a number of interfaces.

The problem we see is that:

- a cable is pulled from an interface
- the application tests the file descriptor to see if it is ready for
  writing, and finds it is.
- the application constructs a packet to send out that interface
  and sends it with sendmsg(); no error is posted.
- the file descriptor never becomes available for writing again
- hence, all OSPF adjacencies are lost, because we can no longer
  write out packets to the file descriptor.

We haven't yet tested whether it becomes writable again if we put the
cable back in. However, if we detect the absence of IFF_RUNNING and
hence manually avoid constructing packets to be sent via link-down
interfaces, we avoid this problem. That still leaves us with a race.

Is this proper behaviour? I'm guessing the driver or network layer is
blocking the socket because it is waiting for the link to come back.
However, would it not be better to discard the packet, especially a
raw packet?

(If it is "proper" behaviour that's fine, we can work with that; we
were just surprised sendmsg() is trying to be /that/ reliable :) .)

regards,
-- 
Paul Jakma  paul@clubi.ie  paul@jakma.org  Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
How much net work could a network work, if a network could net work?
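
The IFF_RUNNING workaround mentioned above could look roughly like the
following. This is only a sketch, assuming Linux and Python 3; it is not
ospfd code. The function names (flags_say_link_up, get_interface_flags,
should_send_on) are made up for illustration, and the numeric constants
are the Linux values of SIOCGIFFLAGS, IFF_UP and IFF_RUNNING.

```python
import fcntl
import socket
import struct

# Linux values from <linux/sockios.h> and <linux/if.h> (assumed platform).
SIOCGIFFLAGS = 0x8913
IFF_UP = 0x1
IFF_RUNNING = 0x40


def flags_say_link_up(flags: int) -> bool:
    """True only when the interface is administratively up and running."""
    return bool(flags & IFF_UP) and bool(flags & IFF_RUNNING)


def get_interface_flags(ifname: str) -> int:
    """Read the interface flags with the SIOCGIFFLAGS ioctl (Linux only)."""
    # struct ifreq: 16-byte name, then a union whose first member is the
    # short flags field; 40 bytes covers the 64-bit layout.
    request = struct.pack("16sH22x", ifname.encode(), 0)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        reply = fcntl.ioctl(probe.fileno(), SIOCGIFFLAGS, request)
    return struct.unpack_from("H", reply, 16)[0]


def should_send_on(ifname: str) -> bool:
    """Hypothetical guard: skip building packets for link-down interfaces."""
    try:
        return flags_say_link_up(get_interface_flags(ifname))
    except OSError:
        return False  # unknown or vanished interface: do not send
```

As the message notes, a check like this is racy: the link can drop
between the call to should_send_on() and the later sendmsg(), so it
narrows the window without closing it.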
* RE: raw sockets and blocking
From: David Schwartz @ 2004-02-18 5:37 UTC
To: Linux Kernel; +Cc: Hasso Tepper

> - a cable is pulled from an interface
> - the application tests the file descriptor to see if it is ready for
>   writing, and finds it is.
> - the application constructs a packet to send out that interface
>   and sends it with sendmsg(); no error is posted.
> - the file descriptor never becomes available for writing again
> - hence, all OSPF adjacencies are lost, because we can no longer
>   write out packets to the file descriptor.

This is rational behavior.

> We haven't yet tested whether it becomes writable again if we put the
> cable back in. However, if we detect the absence of IFF_RUNNING and
> hence manually avoid constructing packets to be sent via link-down
> interfaces, we avoid this problem. That still leaves us with a race.

I'm not sure I understand what the problem is. If the network cable is
disconnected, you couldn't usefully send anything if the socket was
ready anyway.

> Is this proper behaviour?

Certainly.

> I'm guessing the driver or network layer is
> blocking the socket because it is waiting for the link to come back.
> However, would it not be better to discard the packet, especially a
> raw packet?

If you want to discard the packet, you do it. Why should the kernel
accept a packet just to discard it if it's smart enough to not accept
it?

> (If it is "proper" behaviour that's fine, we can work with that; we
> were just surprised sendmsg() is trying to be /that/ reliable :) .)

It is proper. Being always ready and dropping the packet is proper as
well but inferior. If you want the behavior you say you expect,
consider the packet always ready and if it's really not ready, drop
the packet on the floor yourself. This will get you the (inferior)
behavior you want.

How would it help you to find the packet ready and send data the
system will just drop on the floor? (Won't you lose your adjacencies
anyway? They'll time out either way.)

DS
* Re: raw sockets and blocking
From: Hasso Tepper @ 2004-02-18 6:42 UTC
To: davids; +Cc: Linux Kernel

David Schwartz wrote:
> > We haven't yet tested whether it becomes writable again if we put
> > the cable back in. However, if we detect the absence of
> > IFF_RUNNING and hence manually avoid constructing packets to be
> > sent via link-down interfaces, we avoid this problem. That still
> > leaves us with a race.
>
> I'm not sure I understand what the problem is. If the network
> cable is disconnected, you couldn't usefully send anything if the
> socket was ready anyway.

One raw socket is used to send packets to several interfaces. If only
one of them is down, the socket will be blocked as well.

A related problem is that we have no way to detect whether a vlan
interface goes down. Wouldn't the correct behavior be to remove
IFF_RUNNING from all vlan interfaces bound to an ethernet interface if
that ethernet interface goes down? There might be similar problems
with other network interfaces.

-- 
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
* RE: raw sockets and blocking
From: David Schwartz @ 2004-02-18 11:42 UTC
To: hasso; +Cc: Linux Kernel

> David Schwartz wrote:

> > > We haven't yet tested whether it becomes writable again if we put
> > > the cable back in. However, if we detect the absence of
> > > IFF_RUNNING and hence manually avoid constructing packets to be
> > > sent via link-down interfaces, we avoid this problem. That still
> > > leaves us with a race.

> > I'm not sure I understand what the problem is. If the network
> > cable is disconnected, you couldn't usefully send anything if the
> > socket was ready anyway.

> One raw socket is used to send packets to several interfaces. If only
> one of them is down, the socket will be blocked as well.

Then the kernel is broken. It must not block an operation indefinitely
when that operation can complete without blocking.

It is, however, perfectly legal to say an operation can complete
without blocking (say, through 'select' or 'poll') and later return
EWOULDBLOCK. (So long as some operation could have completed, not
necessarily the one you tried.) Just as a 'poll' may return that write
is okay for a TCP connection but a 64KB write will definitely block.

DS
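
The point above, that readiness from select() or poll() does not
guarantee the next write succeeds, can be sketched as follows. This is a
rough illustration in Python 3, not taken from ospfd; the function name
send_when_writable and its three return strings are invented for the
example.

```python
import select
import socket


def send_when_writable(sock: socket.socket, payload: bytes,
                       timeout: float = 0.0) -> str:
    """Try one non-blocking send and report the outcome as a string."""
    sock.setblocking(False)
    # Ask whether *some* write could complete right now.
    _, writable, _ = select.select([], [sock], [], timeout)
    if not writable:
        return "not-ready"
    try:
        sock.send(payload)
    except BlockingIOError:
        # select() reported writable, yet this particular send could not
        # proceed (EAGAIN/EWOULDBLOCK). The caller must cope with this.
        return "would-block"
    return "sent"
```

A caller driving an event loop would treat "would-block" the same as
"not-ready": keep the packet queued (or drop it) and wait for the next
readiness notification.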
* RE: raw sockets and blocking
From: Paul Jakma @ 2004-02-19 6:28 UTC
To: David Schwartz; +Cc: hasso, Linux Kernel

On Wed, 18 Feb 2004, David Schwartz wrote:

> Then the kernel is broken. It must not block an operation
> indefinitely when that operation can complete without blocking.

Aha.

> It is, however, perfectly legal to say an operation can
> complete without blocking (say, through 'select' or 'poll') and
> later return EWOULDBLOCK. (So long as some operation could have
> completed, not necessarily the one you tried.)

Right. But that's fine, we can deal with that, if the error is
posted.

The problem is that no error is posted when we sendmsg[1], yet the
socket thereafter stays write-blocked, with no (sane) way for us to
recover (until presumably the link comes back, for whatever reason;
unfortunately the OSPF RFCs do not mandate that hosts have robots
attached to do media maintenance :) ).

In short, for raw sockets at least, the kernel needs to either:

- post an error for writes to raw sockets if they will block

or

- if the network driver concerned is not ready to take the packet,
  drop the packet right there.

(Upper layers (i.e. userspace, e.g. ospfd) will follow their own
procedures for dealing with packet loss/down interfaces.)

> DS

1. At least, Hasso has not reported the relevant error message
occurring.

regards,
-- 
Paul Jakma
* Re: raw sockets and blocking
From: Jamie Lokier @ 2004-02-19 7:53 UTC
To: Paul Jakma; +Cc: David Schwartz, hasso, Linux Kernel

Paul Jakma wrote:
> > It is, however, perfectly legal to say an operation can
> > complete without blocking (say, through 'select' or 'poll') and
> > later return EWOULDBLOCK. (So long as some operation could have
> > completed, not necessarily the one you tried.)
>
> Right. But that's fine, we can deal with that, if the error is
> posted.
>
> The problem is that no error is posted when we sendmsg[1], yet the
> socket thereafter stays write-blocked, with no (sane) way for us to
> recover.

I hate to check the obvious, but did you try setting the O_NONBLOCK
flag for the socket? Did you try setting the MSG_DONTWAIT flag for
the sendmsg operation?

-- Jamie
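
For reference, the two options Jamie asks about differ in scope:
O_NONBLOCK is a property of the open file description and affects every
later call, while MSG_DONTWAIT applies to a single send. A small sketch
in Python 3, assuming a Unix-like system; the helper names
set_nonblocking and sendmsg_dontwait are made up for this example.

```python
import fcntl
import os
import socket


def set_nonblocking(sock: socket.socket) -> None:
    """Set O_NONBLOCK on the descriptor; applies to all later calls."""
    flags = fcntl.fcntl(sock.fileno(), fcntl.F_GETFL)
    fcntl.fcntl(sock.fileno(), fcntl.F_SETFL, flags | os.O_NONBLOCK)


def sendmsg_dontwait(sock: socket.socket, payload: bytes):
    """Send one message without blocking, for this call only.

    Returns the number of bytes sent, or None if the call would block.
    """
    try:
        return sock.sendmsg([payload], [], socket.MSG_DONTWAIT)
    except BlockingIOError:
        return None
```

Neither option changes whether select() reports the socket writable,
which is the symptom discussed in this thread; they only decide whether
a send sleeps or fails with EAGAIN.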
* Re: raw sockets and blocking
From: Paul Jakma @ 2004-02-19 8:34 UTC
To: Jamie Lokier; +Cc: David Schwartz, hasso, Linux Kernel

On Thu, 19 Feb 2004, Jamie Lokier wrote:

> I hate to check the obvious, but did you try setting the O_NONBLOCK
> flag for the socket? Did you try setting the MSG_DONTWAIT flag for
> the sendmsg operation?

We're select() driven, so the problem is not that the process
literally blocks and sleeps; it's that the socket never becomes ready
to write again.

> -- Jamie

regards,
-- 
Paul Jakma
* Re: raw sockets and blocking
From: Hasso Tepper @ 2004-02-19 12:40 UTC
To: Paul Jakma; +Cc: Jamie Lokier, David Schwartz, Linux Kernel

Paul Jakma wrote:
> On Thu, 19 Feb 2004, Jamie Lokier wrote:
> > I hate to check the obvious, but did you try setting the
> > O_NONBLOCK flag for the socket? Did you try setting the
> > MSG_DONTWAIT flag for the sendmsg operation?
>
> We're select() driven, so the problem is not that the process
> literally blocks and sleeps; it's that the socket never becomes
> ready to write again.

And maybe it makes sense to mention that all packets the ospf daemon
sends to the ethernet interface that is actually down are multicast
packets.

-- 
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
* Re: raw sockets and blocking
From: Paul Jakma @ 2004-02-19 12:54 UTC
To: Hasso Tepper; +Cc: Jamie Lokier, David Schwartz, Linux Kernel, Quagga Dev

On Thu, 19 Feb 2004, Hasso Tepper wrote:

> And maybe it makes sense to mention that all packets the ospf daemon
> sends to the ethernet interface that is actually down are multicast
> packets.

Nearly all. Unicast packets are sent too.

regards,
-- 
Paul Jakma
* Re: raw sockets and blocking
From: Hasso Tepper @ 2004-03-22 7:14 UTC
To: Paul Jakma; +Cc: Jamie Lokier, David Schwartz, Linux Kernel, Quagga Dev

Paul Jakma wrote:
> On Thu, 19 Feb 2004, Hasso Tepper wrote:
> > And maybe it makes sense to mention that all packets the ospf
> > daemon sends to the ethernet interface that is actually down are
> > multicast packets.
>
> Nearly all. Unicast packets are sent too.

Hellos in a broadcast network are multicast.

The problem is solved now for me, btw. It appears to be a bug in the
e100 driver in 2.4.x. I can't reproduce it any more with the e100
development driver (from http://sf.net/projects/e1000/). And I can't
reproduce it by forcing the network to non-broadcast either (in this
case unicast hellos are sent). So it's a multicast problem with the
e100 2.x driver.

-- 
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
* Re: raw sockets and blocking
From: Hasso Tepper @ 2004-02-18 8:43 UTC
To: davids; +Cc: Linux Kernel

David Schwartz wrote:
> > I'm guessing the driver or network layer is
> > blocking the socket because it is waiting for the link to come
> > back. However, would it not be better to discard the packet,
> > especially a raw packet?
>
> If you want to discard the packet, you do it. Why should the
> kernel accept a packet just to discard it if it's smart enough to
> not accept it?

From "man sendmsg" in Debian unstable (manpage is dated 2003-10-25):

  ENOBUFS
      The output queue for a network interface was full. This
      generally indicates that the interface has stopped sending,
      but may be caused by transient congestion. (Normally, this
      does not occur in Linux. Packets are just silently dropped
      when a device queue overflows.)

-- 
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
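
To make the distinction in that manpage excerpt concrete, here is one
way a sender might sort the errors it can get back from a non-blocking
sendmsg(). This is a sketch in Python 3 only; the function name
classify_send_error and the labels "retry-later", "dropped" and "fatal"
are a hypothetical policy for illustration, not something the manpage
or ospfd prescribes.

```python
import errno


def classify_send_error(err: OSError) -> str:
    """Map a send error to a coarse, illustrative handling policy."""
    if err.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
        # The socket could not take the packet right now.
        return "retry-later"
    if err.errno == errno.ENOBUFS:
        # Interface output queue full: treat the packet as lost and let
        # the protocol's own retransmission logic deal with it.
        return "dropped"
    return "fatal"
```

Under this policy a raw-socket sender never waits on ENOBUFS; it counts
the packet as lost, which matches what the manpage says Linux normally
does silently.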
* RE: raw sockets and blocking
From: Paul Jakma @ 2004-02-19 6:20 UTC
To: David Schwartz; +Cc: Linux Kernel, Hasso Tepper

Hi David,

On Tue, 17 Feb 2004, David Schwartz wrote:

> > - a cable is pulled from an interface
> > - the application tests the file descriptor to see if it is ready
> >   for writing, and finds it is.
> > - the application constructs a packet to send out that interface
> >   and sends it with sendmsg(); no error is posted.
> > - the file descriptor never becomes available for writing again
> > - hence, all OSPF adjacencies are lost, because we can no longer
> >   write out packets to the file descriptor.
>
> This is rational behavior.

It might be, yes. We're trying to determine this.

> I'm not sure I understand what the problem is. If the network cable is
> disconnected, you couldn't usefully send anything if the socket was ready
> anyway.

We could; the down interface is but one of many. Yet the raw socket
becomes write-blocked forever because of a packet destined to be sent
out a down interface.

While I appreciate the kernel's best efforts, I feel it's possibly
counter-productive to be so persistent for raw sockets :)

To work around this behaviour, we'll have to move from one single
global file descriptor to one file descriptor per interface, which is
potentially a scaling overhead for the case of thousands of
interfaces.

> If you want to discard the packet, you do it. Why should the kernel
> accept a packet just to discard it if it's smart enough to not
> accept it?

How can we discard it? It's sitting queued somewhere in the socket
layer, and we're blocked from sending on /any/ interface simply
because of a cable pull on one interface.

We could set a 'write blocked' timer I guess, and close() and reopen
our raw socket if we find it write-blocked for too long, but that
would be a gross hack.

If the socket buffer were full, fine, write-block for that. But surely
otherwise, for a _raw socket_, which specifically makes no reliability
guarantees, the socket should not get held up because a driver is
throttling the socket due to no-link.

This isn't a TCP socket, it's a raw socket. It's up to the process
using the raw socket to implement its own reliability and/or flow
control; that's the precise point. Hence, the kernel should _not_.

> It is proper. Being always ready and dropping the packet is proper
> as well but inferior.

For a raw socket? Surely the correct behaviour is to either return an
error from sendmsg() or else drop the packet if the driver is
link-down?

> If you want the behavior you say you expect, consider the packet
> always ready and if it's really not ready, drop the packet on the
> floor yourself. This will get you the (inferior) behavior you want.

We can't drop it, unfortunately. How do we do this? SO_SNDTIMEO is not
settable on Linux.

> How would it help you to find the packet ready and send data the
> system will just drop on the floor? (Won't you lose your adjacencies
> anyway? They'll time out either way.)

We multiplex adjacencies on many interfaces via one file descriptor.
We're dropping adjacencies on all interfaces because we sent a packet
destined to go out a link-down interface, which the kernel accepted
_without_ returning an error[1].

1. Hasso will correct me if I'm wrong here, I hope. Hasso, no error is
reported from sendmsg() in ospf_write(), is there?

> DS

regards,
-- 
Paul Jakma
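
The 'write blocked' timer that the message calls a gross hack could be
tracked with something as small as the class below. This is a sketch in
Python 3 under the assumption of a select()-driven loop; the class name
WriteBlockedWatchdog, its methods and the injectable clock are all
invented for illustration and are not part of ospfd.

```python
import time


class WriteBlockedWatchdog:
    """Track how long a socket has been continuously unwritable."""

    def __init__(self, limit_seconds: float, clock=time.monotonic):
        self.limit = limit_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.blocked_since = None   # None means "currently writable"

    def note_writable(self) -> None:
        """Call when select() reports the socket writable."""
        self.blocked_since = None

    def note_blocked(self) -> None:
        """Call when there is data to send but the socket is not writable."""
        if self.blocked_since is None:
            self.blocked_since = self.clock()

    def should_reopen(self) -> bool:
        """True once the socket has been blocked for at least the limit."""
        if self.blocked_since is None:
            return False
        return self.clock() - self.blocked_since >= self.limit
```

When should_reopen() turns true, the event loop would close() the raw
socket and open a fresh one. Whatever was queued on the old socket is
lost, which is acceptable only because the protocol above a raw socket
already handles loss itself.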