* raw sockets and blocking
@ 2004-02-18 1:33 Paul Jakma
2004-02-18 5:37 ` David Schwartz
0 siblings, 1 reply; 12+ messages in thread
From: Paul Jakma @ 2004-02-18 1:33 UTC (permalink / raw)
To: Linux Kernel; +Cc: Hasso Tepper
Hi,
I'm curious, is it good for raw sockets to block for writes because a
cable of one interface has been pulled?
We're seeing a problem with ospfd (www.zebra.org/www.quagga.net)
which uses a single raw, AF_INET/OSPF socket and manages its own IP
headers, to send/receive OSPF packets to/from a number of interfaces.
The problem we see is that:
- a cable is pulled from an interface
- the application tests the file descriptor to see if it is ready for
writing, and finds it is.
- the application constructs a packet to send out that interface
and sends it with sendmsg(), no error is posted.
- the file descriptor never becomes available for writing again
- hence, all OSPF adjacencies are lost, because we can no longer
write out packets to the file descriptor.
We haven't yet tested whether it becomes writable again if we put the
cable back in. However, if we detect the absence of IFF_RUNNING and
avoid constructing packets to be sent via link-down interfaces, we
avoid this problem. This still leaves us with a race, though: the
link can go down between the check and the send.
Is this proper behaviour? I'm guessing the driver or network layer is
blocking the socket because it is waiting for the link to come back,
however would it not be better to discard the packet, especially a
raw packet?
(if it is "proper" behaviour that's fine, we can work with that, we
were just surprised sendmsg() is trying to be /that/ reliable :) .)
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
How much net work could a network work, if a network could net work?
* RE: raw sockets and blocking
2004-02-18 1:33 raw sockets and blocking Paul Jakma
@ 2004-02-18 5:37 ` David Schwartz
2004-02-18 6:42 ` Hasso Tepper
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: David Schwartz @ 2004-02-18 5:37 UTC (permalink / raw)
To: Linux Kernel; +Cc: Hasso Tepper
> - a cable is pulled from an interface
> - the application tests the file descriptor to see if it is ready for
> writing, and finds it is.
> - the application constructs a packet to send out that interface
> and sends it with sendmsg(), no error is posted.
> - the file descriptor never becomes available for writing again
> - hence, all OSPF adjacencies are lost, because we can no longer
> write out packets to the file descriptor.
This is rational behavior.
> We haven't yet tested if it becomes writable again if we put the cable
> back in, however if we detect absence of IFF_RUNNING and hence
> manually avoid constructing packets to be sent via link-down
> interfaces, we avoid this problem. However, this leaves us with a
> race.
I'm not sure I understand what the problem is. If the network cable is
disconnected, you couldn't usefully send anything if the socket was ready
anyway.
> Is this proper behaviour?
Certainly.
> I'm guessing the driver or network layer is
> blocking the socket because it is waiting for the link to come back,
> however would it not be better to discard the packet, especially a
> raw packet?
If you want to discard the packet, you do it. Why should the kernel accept
a packet just to discard it if it's smart enough to not accept it?
> (if it is "proper" behaviour that's fine, we can work with that, we
> were just surprised sendmsg() is trying to be /that/ reliable :) .)
It is proper. Being always ready and dropping the packet is proper as well
but inferior.
If you want the behavior you say you expect, consider the packet always
ready and if it's really not ready, drop the packet on the floor yourself.
This will get you the (inferior) behavior you want. How would it help you to
find the packet ready and send data the system will just drop on the floor?
(Won't you lose your adjacencies anyway? They'll time out either way.)
DS
* Re: raw sockets and blocking
2004-02-18 5:37 ` David Schwartz
@ 2004-02-18 6:42 ` Hasso Tepper
2004-02-18 11:42 ` David Schwartz
2004-02-18 8:43 ` Hasso Tepper
2004-02-19 6:20 ` Paul Jakma
2 siblings, 1 reply; 12+ messages in thread
From: Hasso Tepper @ 2004-02-18 6:42 UTC (permalink / raw)
To: davids; +Cc: Linux Kernel
David Schwartz wrote:
> > We haven't yet tested if it becomes writable again if we put the cable
> > back in, however if we detect absence of IFF_RUNNING and hence
> > manually avoid constructing packets to be sent via link-down
> > interfaces, we avoid this problem. However, this leaves us with a
> > race.
>
> I'm not sure I understand what the problem is. If the network
> cable is disconnected, you couldn't usefully send anything if the
> socket was ready anyway.
One raw socket is used to send packets to several interfaces. If only
one of them is down, the socket will be blocked as well.
A related problem is that we have no way to detect whether a VLAN
interface goes down. Wouldn't it be correct behavior to remove
IFF_RUNNING from all VLAN interfaces bound to an Ethernet interface if
that Ethernet interface goes down? There might be similar problems with
other network interfaces.
--
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
* Re: raw sockets and blocking
2004-02-18 5:37 ` David Schwartz
2004-02-18 6:42 ` Hasso Tepper
@ 2004-02-18 8:43 ` Hasso Tepper
2004-02-19 6:20 ` Paul Jakma
2 siblings, 0 replies; 12+ messages in thread
From: Hasso Tepper @ 2004-02-18 8:43 UTC (permalink / raw)
To: davids; +Cc: Linux Kernel
David Schwartz wrote:
> > I'm guessing the driver or network layer is
> > blocking the socket because it is waiting for the link to come
> > back, however would it not be better to discard the packet,
> > especially a raw packet?
>
> If you want to discard the packet, you do it. Why should the
> kernel accept a packet just to discard it if it's smart enough to
> not accept it?
From "man sendmsg" in Debian unstable (manpage is dated 2003-10-25):
ENOBUFS
The output queue for a network interface was full. This generally
indicates that the interface has stopped sending, but may be
caused by transient congestion. (Normally, this does not occur in
Linux. Packets are just silently dropped when a device queue
overflows.)
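As a rough sketch of how a sender might act on the errno values discussed here, the function below sorts them into three buckets. The enum and function names are hypothetical, and the policy (treat ENOBUFS as a lost packet, retry on EAGAIN/EWOULDBLOCK/EINTR) is an assumption for illustration, not something the man page or ospfd prescribes.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical verdicts for a failed send on a datagram-style socket. */
enum send_verdict {
    SEND_RETRY_LATER,  /* nothing was queued; try again when writable */
    SEND_DROPPED,      /* treat like packet loss and move on */
    SEND_OTHER_ERROR   /* anything else; the caller should log and decide */
};

/* Classify an errno value captured right after a failed send call. */
enum send_verdict classify_send_errno(int err)
{
    assert(err > 0); /* errno values are positive */

    /* EAGAIN and EWOULDBLOCK may be the same value, so avoid a switch. */
    if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR)
        return SEND_RETRY_LATER;
    if (err == ENOBUFS)
        return SEND_DROPPED;
    return SEND_OTHER_ERROR;
}
```

Since the man page says Linux normally drops silently on device queue overflow, the ENOBUFS branch may rarely fire there; it is kept for portability.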
--
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
* RE: raw sockets and blocking
2004-02-18 6:42 ` Hasso Tepper
@ 2004-02-18 11:42 ` David Schwartz
2004-02-19 6:28 ` Paul Jakma
0 siblings, 1 reply; 12+ messages in thread
From: David Schwartz @ 2004-02-18 11:42 UTC (permalink / raw)
To: hasso; +Cc: Linux Kernel
> David Schwartz wrote:
> > > We haven't yet tested if it becomes writable again if we put the cable
> > > back in, however if we detect absence of IFF_RUNNING and hence
> > > manually avoid constructing packets to be sent via link-down
> > > interfaces, we avoid this problem. However, this leaves us with a
> > > race.
> > I'm not sure I understand what the problem is. If the network
> > cable is disconnected, you couldn't usefully send anything if the
> > socket was ready anyway.
> One raw socket is used to send packets to several interfaces. If only
> one of them is down, socket will be blocked as well.
Then the kernel is broken. It must not block an operation indefinitely when
that operation can complete without blocking.
It is, however, perfectly legal to say an operation can complete without
blocking (say, through 'select' or 'poll') and later return EWOULDBLOCK (so
long as some operation could have completed, not necessarily the one you
tried). Similarly, 'poll' may report that a TCP connection is writable
even though a 64 KB write would definitely block.
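A minimal sketch of what this means for the caller: even after select() or poll() reports the descriptor writable, the send itself should be non-blocking and prepared for EWOULDBLOCK. The helper name try_send() is hypothetical; MSG_DONTWAIT is the per-call non-blocking flag on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: attempt one non-blocking send.
 * Returns 1 if the kernel accepted data, 0 if the call would have
 * blocked, -1 on any other error (errno is left set). */
int try_send(int fd, const void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);

    /* Retry only when interrupted by a signal. */
    do {
        n = send(fd, buf, len, MSG_DONTWAIT);
    } while (n < 0 && errno == EINTR);

    if (n >= 0)
        return 1;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;
    return -1;
}
```

A return of 0 right after a positive readiness report is legal under the reasoning above; the caller simply waits for the next readiness event.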
DS
* RE: raw sockets and blocking
2004-02-18 5:37 ` David Schwartz
2004-02-18 6:42 ` Hasso Tepper
2004-02-18 8:43 ` Hasso Tepper
@ 2004-02-19 6:20 ` Paul Jakma
2 siblings, 0 replies; 12+ messages in thread
From: Paul Jakma @ 2004-02-19 6:20 UTC (permalink / raw)
To: David Schwartz; +Cc: Linux Kernel, Hasso Tepper
Hi David,
On Tue, 17 Feb 2004, David Schwartz wrote:
>
> > - a cable is pulled from an interface
> > - the application tests the file descriptor to see if it is ready for
> > writing, and finds it is.
> > - the application constructs a packet to send out that interface
> > and sends it with sendmsg(), no error is posted.
> > - the file descriptor never becomes available for writing again
> > - hence, all OSPF adjacencies are lost, because we can no longer
> > write out packets to the file descriptor.
>
> This is rational behavior.
It might be, yes. We're trying to determine this.
> I'm not sure I understand what the problem is. If the network cable is
> disconnected, you couldn't usefully send anything if the socket was ready
> anyway.
We could: the down interface is but one of many. Yet the raw socket
becomes write-blocked forever because of a single packet destined to
be sent out a down interface.
While I appreciate the kernel's best efforts, I feel it's possibly
counter-productive to be so persistent for raw sockets :)
To work around this behaviour, we'll have to move from one single
global file descriptor to one file descriptor per interface. Which is
potentially a scaling overhead for the case of thousands of
interfaces.
> If you want to discard the packet, you do it. Why should the kernel
> accept a packet just to discard it if it's smart enough to not
> accept it?
How can we discard it? It's sitting queued somewhere in the socket
layer, and we're blocked from sending from /any/ interface simply
because of a cable pull on one interface.
We could set a 'write blocked' timer I guess, and close() and reopen
our raw socket if we find our raw socket write-blocked for too long,
but that would be a gross hack.
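For what it's worth, the timer bookkeeping for such a workaround could be as small as the sketch below. All names are hypothetical, and it covers only the "blocked for too long" decision; closing and reopening the socket is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical watchdog: remembers when the socket first stopped
 * being writable. */
struct write_watchdog {
    bool   blocked; /* true while the socket is not writable */
    time_t since;   /* time of the first "not writable" observation */
};

/* Record one readiness observation taken at time `now`. */
void watchdog_note(struct write_watchdog *w, bool writable, time_t now)
{
    assert(w != NULL);
    if (writable) {
        w->blocked = false;
    } else if (!w->blocked) {
        w->blocked = true;
        w->since = now;
    }
}

/* True if the socket has been write-blocked for at least limit_secs. */
bool watchdog_expired(const struct write_watchdog *w, time_t now,
                      time_t limit_secs)
{
    assert(w != NULL);
    return w->blocked && (now - w->since) >= limit_secs;
}
```

The event loop would call watchdog_note() after every select() and recreate the socket once watchdog_expired() returns true.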
If the socket buffer were full, fine, write-block for that. But
surely otherwise, for a _raw socket_, which specifically makes no
reliability guarantees, the socket should not get held up because a
driver is throttling the socket due to no-link.
This isn't a TCP socket, it's a raw socket. It's up to the process
using the raw socket to implement its own reliability and/or flow
control; that's precisely the point. Hence, the kernel should _not_.
> It is proper. Being always ready and dropping the packet is proper
> as well but inferior.
For a raw socket?
Surely the correct behaviour is to either return an error from
sendmsg() or else drop the packet if the driver is link-down?
> If you want the behavior you say you expect, consider the packet
> always ready and if it's really not ready, drop the packet on the
> floor yourself. This will get you the (inferior) behavior you want.
We can't drop it, unfortunately. How would we do this? SO_SNDTIMEO is
not settable on Linux.
> How would it help you to find the packet ready and send data the
> system will just drop on the floor? (Won't you lose your adjacencies
> anyway? They'll time out either way.)
We multiplex adjacencies on many interfaces via one file descriptor.
We're dropping adjacencies on all interfaces because we sent a packet
destined to go out a link-down interface, which the kernel accepted
_without_ returning an error[1].
1. Hasso will correct me if I'm wrong here, I hope. Hasso, no error
is reported from sendmsg() in ospf_write(), is there?
> DS
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
Man invented language to satisfy his deep need to complain.
-- Lily Tomlin
* RE: raw sockets and blocking
2004-02-18 11:42 ` David Schwartz
@ 2004-02-19 6:28 ` Paul Jakma
2004-02-19 7:53 ` Jamie Lokier
0 siblings, 1 reply; 12+ messages in thread
From: Paul Jakma @ 2004-02-19 6:28 UTC (permalink / raw)
To: David Schwartz; +Cc: hasso, Linux Kernel
On Wed, 18 Feb 2004, David Schwartz wrote:
> Then the kernel is broken. It must not block an operation
> indefinitely when that operation can complete without blocking.
Aha.
> It is, however, perfectly legal to say an operation can
> complete without blocking (say, through 'select' or 'poll') and
> later return EWOULDBLOCK. (So long as some operation could have
> completed, not necessarily the one you tried.)
Right. But that's fine, we can deal with that, if the error is
posted.
The problem is that no error is posted when we call sendmsg()[1], yet
the socket thereafter stays write-blocked, with no (sane) way for us to
recover (until, presumably, the link comes back for whatever reason;
unfortunately the OSPF RFCs do not mandate that hosts have robots
attached to do media maintenance :) ).
In short, for raw sockets at least, the kernel needs to either:
- post an error for writes to raw sockets if they will block
or
- if the network driver concerned is not ready to take the packet,
drop the packet right there. (upper layers (ie userspace, eg ospfd)
will follow their own procedures for dealing with packet loss/down
interfaces.)
> DS
1. At least, Hasso has not reported the relevant error message occurring.
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
Your own qualities will help prevent your advancement in the world.
* Re: raw sockets and blocking
2004-02-19 6:28 ` Paul Jakma
@ 2004-02-19 7:53 ` Jamie Lokier
2004-02-19 8:34 ` Paul Jakma
0 siblings, 1 reply; 12+ messages in thread
From: Jamie Lokier @ 2004-02-19 7:53 UTC (permalink / raw)
To: Paul Jakma; +Cc: David Schwartz, hasso, Linux Kernel
Paul Jakma wrote:
> > It is, however, perfectly legal to say an operation can
> > complete without blocking (say, through 'select' or 'poll') and
> > later return EWOULDBLOCK. (So long as some operation could have
> > completed, not necessarily the one you tried.)
>
> Right. But that's fine, we can deal with that, if the error is
> posted.
>
> Problem is no error is posted when we sendmsg[1], yet the socket
> thereafter stays write-blocked, with no (sane) way for us to recover.
I hate to check the obvious, but did you try setting the O_NONBLOCK
flag for the socket? Did you try setting the MSG_DONTWAIT flag for
the sendmsg operation?
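For reference, a minimal sketch of the first option: setting O_NONBLOCK on a descriptor with fcntl(). The helper name set_nonblocking() is hypothetical. MSG_DONTWAIT achieves the same effect for a single sendmsg() call without changing the descriptor's flags.

```c
#include <fcntl.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode.
 * Returns 0 on success, -1 on failure (errno is left set). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    if (flags & O_NONBLOCK)
        return 0; /* already non-blocking */

    /* Preserve the existing status flags and add O_NONBLOCK. */
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}
```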
-- Jamie
* Re: raw sockets and blocking
2004-02-19 7:53 ` Jamie Lokier
@ 2004-02-19 8:34 ` Paul Jakma
2004-02-19 12:40 ` Hasso Tepper
0 siblings, 1 reply; 12+ messages in thread
From: Paul Jakma @ 2004-02-19 8:34 UTC (permalink / raw)
To: Jamie Lokier; +Cc: David Schwartz, hasso, Linux Kernel
On Thu, 19 Feb 2004, Jamie Lokier wrote:
> I hate to check the obvious, but did you try setting the O_NONBLOCK
> flag for the socket? Did you try setting the MSG_DONTWAIT flag for
> the sendmsg operation?
We're select() driven, so the problem is not that the process
literally blocks and sleeps; it's that the socket never becomes ready
to write again.
> -- Jamie
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
Assembly language experience is [important] for the maturity
and understanding of how computers work that it provides.
-- D. Gries
* Re: raw sockets and blocking
2004-02-19 8:34 ` Paul Jakma
@ 2004-02-19 12:40 ` Hasso Tepper
2004-02-19 12:54 ` Paul Jakma
0 siblings, 1 reply; 12+ messages in thread
From: Hasso Tepper @ 2004-02-19 12:40 UTC (permalink / raw)
To: Paul Jakma; +Cc: Jamie Lokier, David Schwartz, Linux Kernel
Paul Jakma wrote:
> On Thu, 19 Feb 2004, Jamie Lokier wrote:
> > I hate to check the obvious, but did you try setting the
> > O_NONBLOCK flag for the socket? Did you try setting the
> > MSG_DONTWAIT flag for the sendmsg operation?
>
> We're select() driven, so the problem is not that the process
> literally blocks and sleeps, it's that the socket never becomes
> ready to write again.
And maybe it makes sense to mention that all packets the OSPF daemon
sends to the Ethernet interface that is actually down are multicast
packets.
--
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
* Re: raw sockets and blocking
2004-02-19 12:40 ` Hasso Tepper
@ 2004-02-19 12:54 ` Paul Jakma
2004-03-22 7:14 ` Hasso Tepper
0 siblings, 1 reply; 12+ messages in thread
From: Paul Jakma @ 2004-02-19 12:54 UTC (permalink / raw)
To: Hasso Tepper; +Cc: Jamie Lokier, David Schwartz, Linux Kernel, Quagga Dev
On Thu, 19 Feb 2004, Hasso Tepper wrote:
> And maybe it makes sense to mention that all packets ospf daemon
> sends to actually down ethernet interface are multicast packets.
Nearly all. Unicast packets are sent too.
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
Death is only a state of mind.
Only it doesn't leave you much time to think about anything else.
* Re: raw sockets and blocking
2004-02-19 12:54 ` Paul Jakma
@ 2004-03-22 7:14 ` Hasso Tepper
0 siblings, 0 replies; 12+ messages in thread
From: Hasso Tepper @ 2004-03-22 7:14 UTC (permalink / raw)
To: Paul Jakma; +Cc: Jamie Lokier, David Schwartz, Linux Kernel, Quagga Dev
Paul Jakma wrote:
> On Thu, 19 Feb 2004, Hasso Tepper wrote:
> > And maybe it makes sense to mention that all packets ospf daemon
> > sends to actually down ethernet interface are multicast packets.
>
> nearly all. unicast packets are sent too.
Hellos on a broadcast network are multicast. The problem is solved now
for me, btw. It appears to be a bug in the e100 driver in 2.4.x.
I can't reproduce it any more with the e100 development driver (from
http://sf.net/projects/e1000/). And I can't reproduce it by
forcing the network to non-broadcast either (in this case unicast
hellos are sent).
So it's a multicast problem with the e100 2.x driver.
--
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator