raw sockets and blocking

public inbox for linux-kernel@vger.kernel.org

* raw sockets and blocking
@ 2004-02-18  1:33 Paul Jakma
  2004-02-18  5:37 ` David Schwartz
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Jakma @ 2004-02-18  1:33 UTC
  To: Linux Kernel; +Cc: Hasso Tepper

Hi,

I'm curious, is it good for raw sockets to block for writes because a 
cable of one interface has been pulled? 

We're seeing a problem with ospfd (www.zebra.org/www.quagga.net) 
which uses a single raw, AF_INET/OSPF socket and manages its own IP 
headers, to send/receive OSPF packets to/from a number of interfaces.

The problem we see is that:

- a cable is pulled from an interface
- the application tests the file descriptor to see if it is ready for 
  writing, and finds it is.
- the application constructs a packet to send out that interface
  and sends it with sendmsg(), no error is posted.
- the file descriptor never becomes available for writing again
- hence, all OSPF adjacencies are lost, because we can no longer 
write out packets to the file descriptor.

We haven't yet tested whether it becomes writable again if we put the
cable back in. However, if we detect the absence of IFF_RUNNING and
hence manually avoid constructing packets to be sent via link-down
interfaces, we avoid this problem. That still leaves us with a
race.
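
The IFF_RUNNING check described above could be sketched in Python
roughly as follows. This is only an illustration, assuming Linux and
the SIOCGIFFLAGS ioctl; the helper names get_if_flags and
link_is_running are hypothetical and are not taken from ospfd. The
race mentioned above remains: the link can drop between the check and
the send.

```python
import fcntl
import socket
import struct

SIOCGIFFLAGS = 0x8913  # Linux ioctl number: read interface flags
IFF_UP = 0x1           # interface is administratively up
IFF_RUNNING = 0x40     # driver reports that the link is operational


def link_is_running(flags):
    """True only if both IFF_UP and IFF_RUNNING are set in the flag word."""
    wanted = IFF_UP | IFF_RUNNING
    return (flags & wanted) == wanted


def get_if_flags(ifname):
    """Read the flag word of one interface via SIOCGIFFLAGS (Linux only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # struct ifreq: 16-byte name, then a union whose first member
        # here is the 16-bit flag word; padded to 40 bytes in total.
        request = struct.pack("16sH22x", ifname.encode(), 0)
        reply = fcntl.ioctl(s.fileno(), SIOCGIFFLAGS, request)
    return struct.unpack("16sH22x", reply)[1]
```

A caller would skip building packets for any interface where
link_is_running(get_if_flags(name)) is false.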

Is this proper behaviour? I'm guessing the driver or network layer is 
blocking the socket because it is waiting for the link to come back, 
however would it not be better to discard the packet, especially a 
raw packet?

(if it is "proper" behaviour that's fine, we can work with that, we 
were just surprised sendmsg() is trying to be /that/ reliable :) .)

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
	warning: do not ever send email to spam@dishone.st
Fortune:
How much net work could a network work, if a network could net work?


* RE: raw sockets and blocking
  2004-02-18  1:33 raw sockets and blocking Paul Jakma
@ 2004-02-18  5:37 ` David Schwartz
  2004-02-18  6:42   ` Hasso Tepper
                     ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: David Schwartz @ 2004-02-18  5:37 UTC
  To: Linux Kernel; +Cc: Hasso Tepper


> - a cable is pulled from an interface
> - the application tests the file descriptor to see if it ready for
>   writing, and finds it is.
> - the application constructs a packet to send out that interface
>   and sends it with sendmsg(), no error is posted.
> - the file descriptor never becomes available for writing again
> - hence, all OSPF adjacencies are lost, because we can no longer
> write out packets to the file descriptor.

	This is rational behavior.

> we havnt yet tested if it becomes writeable again if we put cable
> back in, however if we detect absence of IFF_RUNNING and hence
> manually avoid constructing packets to be sent via link-down
> interfaces, we avoid this problem. However, this leaves us with a
> race.

	I'm not sure I understand what the problem is. If the network cable is
disconnected, you couldn't usefully send anything if the socket was ready
anyway.

> Is this proper behaviour?

	Certainly.

> I'm guessing the driver or network layer is
> blocking the socket because it is waiting for the link to come back,
> however would it not be better to discard the packet, especially a
> raw packet?

	If you want to discard the packet, you do it. Why should the kernel accept
a packet just to discard it if it's smart enough to not accept it?

> (if it is "proper" behaviour that's fine, we can work with that, we
> were just surprised sendmsg() is trying to be /that/ reliable :) .)

	It is proper. Being always ready and dropping the packet is proper as well
but inferior.

	If you want the behavior you say you expect, consider the packet always
ready and if it's really not ready, drop the packet on the floor yourself.
This will get you the (inferior) behavior you want. How would it help you to
find the packet ready and send data the system will just drop on the floor?
(Won't you lose your adjacencies anyway? They'll time out either way.)

	DS




* Re: raw sockets and blocking
  2004-02-18  5:37 ` David Schwartz
@ 2004-02-18  6:42   ` Hasso Tepper
  2004-02-18 11:42     ` David Schwartz
  2004-02-18  8:43   ` Hasso Tepper
  2004-02-19  6:20   ` Paul Jakma
  2 siblings, 1 reply; 12+ messages in thread
From: Hasso Tepper @ 2004-02-18  6:42 UTC
  To: davids; +Cc: Linux Kernel

David Schwartz wrote:
> > we havnt yet tested if it becomes writeable again if we put cable
> > back in, however if we detect absence of IFF_RUNNING and hence
> > manually avoid constructing packets to be sent via link-down
> > interfaces, we avoid this problem. However, this leaves us with a
> > race.
>
> 	I'm not sure I understand what the problem is. If the network
> cable is disconnected, you couldn't usefully send anything if the
> socket was ready anyway.

One raw socket is used to send packets to several interfaces. If only 
one of them is down, the socket will be blocked as well.

A related problem is that we have no way to detect whether a vlan 
interface goes down. Wouldn't it be correct behavior to remove 
IFF_RUNNING from all vlan interfaces bound to an ethernet interface 
if that ethernet interface goes down? There might be similar problems 
with other network interfaces.

-- 
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator


* RE: raw sockets and blocking
  2004-02-18  6:42   ` Hasso Tepper
@ 2004-02-18 11:42     ` David Schwartz
  2004-02-19  6:28       ` Paul Jakma
  0 siblings, 1 reply; 12+ messages in thread
From: David Schwartz @ 2004-02-18 11:42 UTC
  To: hasso; +Cc: Linux Kernel



> David Schwartz wrote:

> > > we havnt yet tested if it becomes writeable again if we put cable
> > > back in, however if we detect absence of IFF_RUNNING and hence
> > > manually avoid constructing packets to be sent via link-down
> > > interfaces, we avoid this problem. However, this leaves us with a
> > > race.

> > 	I'm not sure I understand what the problem is. If the network
> > cable is disconnected, you couldn't usefully send anything if the
> > socket was ready anyway.

> One raw socket is used to send packets to several interfaces. If only
> one of them is down, socket will be blocked as well.

	Then the kernel is broken. It must not block an operation indefinitely when
that operation can complete without blocking.

	It is, however, perfectly legal to say an operation can complete without
blocking (say, through 'select' or 'poll') and later return EWOULDBLOCK. (So
long as some operation could have completed, not necessarily the one you
tried.) Just as 'poll' may report that a write is okay for a TCP connection
but a 64KB write will definitely block.
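
The pattern described here (readiness reported by select, followed by
a send that still fails with EWOULDBLOCK) can be handled along these
lines. This is a minimal Python sketch, not ospfd code; the function
name send_if_possible is made up for illustration, and it assumes the
socket has already been put into non-blocking mode.

```python
import select
import socket


def send_if_possible(sock, data, timeout=0.0):
    """Return the number of bytes sent, or None if nothing could be sent now.

    The socket is expected to be non-blocking. select() reporting it as
    writable is treated as a hint only: the send may still fail with
    EWOULDBLOCK/EAGAIN, which Python raises as BlockingIOError.
    """
    _, writable, _ = select.select([], [sock], [], timeout)
    if not writable:
        return None
    try:
        return sock.send(data)
    except BlockingIOError:
        # Readiness was reported, but this particular write cannot proceed.
        return None
```

The caller treats None as "try again on a later pass of the event
loop" instead of assuming that a writable socket always accepts the
next write.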

	DS




* RE: raw sockets and blocking
  2004-02-18 11:42     ` David Schwartz
@ 2004-02-19  6:28       ` Paul Jakma
  2004-02-19  7:53         ` Jamie Lokier
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Jakma @ 2004-02-19  6:28 UTC
  To: David Schwartz; +Cc: hasso, Linux Kernel

On Wed, 18 Feb 2004, David Schwartz wrote:

> 	Then the kernel is broken. It must not block an operation
> indefinitely when that operation can complete without blocking.

Aha.

> 	It is, however, perfectly legal to say an operation can
> complete without blocking (say, through 'select' or 'poll') and
> later return EWOULDBLOCK. (So long as some operation could have
> completed, not necessarily the one you tried.)

Right. But that's fine, we can deal with that, if the error is
posted.

Problem is no error is posted when we sendmsg[1], yet the socket
thereafter stays write-blocked, with no (sane) way for us to recover
(until presumably the link comes back, for whatever reason;
unfortunately the OSPF RFCs do not mandate that hosts have robots
attached to do media maintenance :) ).

In short, for raw sockets at least, the kernel needs to either:

- post an error for writes to raw sockets if they will block

or 

- if the network driver concerned is not ready to take the packet,
drop the packet right there. (upper layers (ie userspace, eg ospfd)  
will follow their own procedures for dealing with packet loss/down
interfaces.)

> 	DS

1. At least, Hasso has not reported any relevant error message occurring.

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
	warning: do not ever send email to spam@dishone.st
Fortune:
Your own qualities will help prevent your advancement in the world.


* Re: raw sockets and blocking
  2004-02-19  6:28       ` Paul Jakma
@ 2004-02-19  7:53         ` Jamie Lokier
  2004-02-19  8:34           ` Paul Jakma
  0 siblings, 1 reply; 12+ messages in thread
From: Jamie Lokier @ 2004-02-19  7:53 UTC
  To: Paul Jakma; +Cc: David Schwartz, hasso, Linux Kernel

Paul Jakma wrote:
> > 	It is, however, perfectly legal to say an operation can
> > complete without blocking (say, through 'select' or 'poll') and
> > later return EWOULDBLOCK. (So long as some operation could have
> > completed, not necessarily the one you tried.)
> 
> Right. But that's fine, we can deal with that, if the error is
> posted.
> 
> Problem is no error is posted when we sendmsg[1], yet the socket
> thereafter stays write-blocked, with no (sane) way for us to recover.

I hate to check the obvious, but did you try setting the O_NONBLOCK
flag for the socket?  Did you try setting the MSG_DONTWAIT flag for
the sendmsg operation?
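
For reference, the two options asked about here look roughly like
this in Python. It is a sketch only; set_nonblocking and
send_dontwait are illustrative names. The first makes every operation
on the descriptor non-blocking, the second affects a single send
call.

```python
import socket


def set_nonblocking(sock):
    """Put the whole socket into non-blocking mode.

    On Unix this sets O_NONBLOCK on the underlying descriptor.
    """
    sock.setblocking(False)


def send_dontwait(sock, data):
    """Send without sleeping, for this one call only (MSG_DONTWAIT).

    Returns True if the kernel accepted some or all of the data, and
    False if the call would have blocked.
    """
    try:
        sock.send(data, socket.MSG_DONTWAIT)
        return True
    except BlockingIOError:
        return False
```

Neither option changes what select() reports, so a select-driven
program that never sees the descriptor become writable would not
reach the send call in the first place.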

-- Jamie


* Re: raw sockets and blocking
  2004-02-19  7:53         ` Jamie Lokier
@ 2004-02-19  8:34           ` Paul Jakma
  2004-02-19 12:40             ` Hasso Tepper
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Jakma @ 2004-02-19  8:34 UTC
  To: Jamie Lokier; +Cc: David Schwartz, hasso, Linux Kernel

On Thu, 19 Feb 2004, Jamie Lokier wrote:

> I hate to check the obvious, but did you try setting the O_NONBLOCK
> flag for the socket?  Did you try setting the MSG_DONTWAIT flag for
> the sendmsg operation?

We're select() driven, so the problem is not that the process
literally blocks and sleeps; it's that the socket never becomes ready
to write again.

> -- Jamie

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
	warning: do not ever send email to spam@dishone.st
Fortune:
Assembly language experience is [important] for the maturity
and understanding of how computers work that it provides.
		-- D. Gries


* Re: raw sockets and blocking
  2004-02-19  8:34           ` Paul Jakma
@ 2004-02-19 12:40             ` Hasso Tepper
  2004-02-19 12:54               ` Paul Jakma
  0 siblings, 1 reply; 12+ messages in thread
From: Hasso Tepper @ 2004-02-19 12:40 UTC
  To: Paul Jakma; +Cc: Jamie Lokier, David Schwartz, Linux Kernel

Paul Jakma wrote:
> On Thu, 19 Feb 2004, Jamie Lokier wrote:
> > I hate to check the obvious, but did you try setting the
> > O_NONBLOCK flag for the socket?  Did you try setting the
> > MSG_DONTWAIT flag for the sendmsg operation?
>
> We're select() driven, so the problem is not that the process
> literally blocks and sleeps, its that the socket never becomes
> ready to write again.

And maybe it makes sense to mention that all packets the ospf daemon 
sends to the ethernet interface that is actually down are multicast 
packets.

-- 
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator


* Re: raw sockets and blocking
  2004-02-19 12:40             ` Hasso Tepper
@ 2004-02-19 12:54               ` Paul Jakma
  2004-03-22  7:14                 ` Hasso Tepper
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Jakma @ 2004-02-19 12:54 UTC
  To: Hasso Tepper; +Cc: Jamie Lokier, David Schwartz, Linux Kernel, Quagga Dev

On Thu, 19 Feb 2004, Hasso Tepper wrote:

> And maybe it makes sense to mention that all packets ospf daemon
> sends to actually down ethernet interface are multicast packets.

nearly all. unicast packets are sent too.

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
	warning: do not ever send email to spam@dishone.st
Fortune:
Death is only a state of mind.

Only it doesn't leave you much time to think about anything else.


* Re: raw sockets and blocking
  2004-02-19 12:54               ` Paul Jakma
@ 2004-03-22  7:14                 ` Hasso Tepper
  0 siblings, 0 replies; 12+ messages in thread
From: Hasso Tepper @ 2004-03-22  7:14 UTC
  To: Paul Jakma; +Cc: Jamie Lokier, David Schwartz, Linux Kernel, Quagga Dev

Paul Jakma wrote:
> On Thu, 19 Feb 2004, Hasso Tepper wrote:
> > And maybe it makes sense to mention that all packets ospf daemon
> > sends to actually down ethernet interface are multicast packets.
>
> nearly all. unicast packets are sent too.

Hellos in a broadcast network are multicast. The problem is solved 
now for me, btw. It appears to be a bug in the e100 driver in 2.4.x.

I can't reproduce it any more with the e100 development driver (from 
http://sf.net/projects/e1000/). And I can't reproduce it by forcing 
the network to non-broadcast either (in this case unicast hellos are 
sent).

So it's a multicast problem with the e100 2.x driver.

-- 
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator


* Re: raw sockets and blocking
  2004-02-18  5:37 ` David Schwartz
  2004-02-18  6:42   ` Hasso Tepper
@ 2004-02-18  8:43   ` Hasso Tepper
  2004-02-19  6:20   ` Paul Jakma
  2 siblings, 0 replies; 12+ messages in thread
From: Hasso Tepper @ 2004-02-18  8:43 UTC
  To: davids; +Cc: Linux Kernel

David Schwartz wrote:
> > I'm guessing the driver or network layer is
> > blocking the socket because it is waiting for the link to come
> > back, however would it not be better to discard the packet,
> > especially a raw packet?
>
> 	If you want to discard the packet, you do it. Why should the
> kernel accept a packet just to discard it if it's smart enough to
> not accept it?

From "man sendmsg" in Debian unstable (manpage is dated 2003-10-25):

ENOBUFS
The output queue for a network interface was full. This generally 
indicates that the interface has stopped sending, but may be 
caused by transient congestion. (Normally, this does not occur in 
Linux. Packets are just silently dropped when a device queue 
overflows.)
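
If an application wants to treat ENOBUFS as ordinary packet loss, as
the man page text suggests is the usual outcome on Linux anyway, the
handling could look like the following Python sketch. The function
name send_or_drop and the three return values are assumptions made
for illustration; they are not part of any existing API.

```python
import errno


def send_or_drop(sock, packet, dest):
    """Try to send one datagram and classify the outcome.

    Returns "sent", "dropped" (ENOBUFS: the output queue was full, so
    the packet is treated as lost) or "retry" (the call would block).
    Any other error is re-raised to the caller.
    """
    try:
        sock.sendto(packet, dest)
        return "sent"
    except OSError as exc:
        if exc.errno == errno.ENOBUFS:
            return "dropped"
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return "retry"
        raise
```

A routing daemon already copes with lost packets at the protocol
level, so "dropped" needs no special recovery beyond its normal
retransmission timers.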

-- 
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator


* RE: raw sockets and blocking
  2004-02-18  5:37 ` David Schwartz
  2004-02-18  6:42   ` Hasso Tepper
  2004-02-18  8:43   ` Hasso Tepper
@ 2004-02-19  6:20   ` Paul Jakma
  2 siblings, 0 replies; 12+ messages in thread
From: Paul Jakma @ 2004-02-19  6:20 UTC
  To: David Schwartz; +Cc: Linux Kernel, Hasso Tepper

Hi David,

On Tue, 17 Feb 2004, David Schwartz wrote:

> 
> > - a cable is pulled from an interface
> > - the application tests the file descriptor to see if it ready for
> >   writing, and finds it is.
> > - the application constructs a packet to send out that interface
> >   and sends it with sendmsg(), no error is posted.
> > - the file descriptor never becomes available for writing again
> > - hence, all OSPF adjacencies are lost, because we can no longer
> > write out packets to the file descriptor.
> 
> 	This is rational behavior.

It might be, yes. We're trying to determine this.

> 	I'm not sure I understand what the problem is. If the network cable is
> disconnected, you couldn't usefully send anything if the socket was ready
> anyway.

We could; the down interface is but one of many. Yet the raw socket 
becomes write-blocked for ever because of one packet destined to be 
sent out a down interface.

While I appreciate the kernel's best efforts, I feel it's possibly 
counter-productive to be so persistent for raw sockets :)

To work around this behaviour, we'll have to move from one single 
global file descriptor to one file descriptor per interface. Which is 
potentially a scaling overhead for the case of thousands of 
interfaces.
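
A rough Python sketch of the one-descriptor-per-interface layout
mentioned above, assuming Linux and SO_BINDTODEVICE. Opening the raw
socket needs CAP_NET_RAW. The class PerInterfaceSockets and the
function open_ospf_socket are hypothetical names, and the opener is
injectable so the bookkeeping can be exercised without privileges.

```python
import socket

IPPROTO_OSPF = 89  # IANA-assigned IP protocol number for OSPF
# Fall back to the Linux value if this Python build lacks the constant.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def open_ospf_socket(ifname):
    """Open a raw OSPF socket bound to one interface (needs CAP_NET_RAW)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_RAW, IPPROTO_OSPF)
    s.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE, ifname.encode() + b"\0")
    s.setblocking(False)
    return s


class PerInterfaceSockets:
    """Keeps one lazily opened socket per interface name."""

    def __init__(self, opener=open_ospf_socket):
        self._opener = opener
        self._socks = {}

    def get(self, ifname):
        """Return the socket for ifname, opening it on first use."""
        if ifname not in self._socks:
            self._socks[ifname] = self._opener(ifname)
        return self._socks[ifname]

    def drop(self, ifname):
        """Close and forget the socket for ifname, if there is one."""
        s = self._socks.pop(ifname, None)
        if s is not None:
            s.close()
```

With this layout a stuck descriptor affects only its own interface,
at the cost of one descriptor per interface in the select() set.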

> If you want to discard the packet, you do it. Why should the kernel
> accept a packet just to discard it if it's smart enough to not
> accept it?

How can we discard it? It's sitting queued somewhere in the socket 
layer, and we're blocked from sending from /any/ interface simply 
because of a cable pull on one interface. 

We could set a 'write blocked' timer I guess, and close() and reopen 
our raw socket if we find our raw socket write-blocked for too long, 
but that would be a gross hack.
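
For illustration only, the 'write blocked' timer described above
could be kept as a small piece of state fed from the select() loop.
This Python sketch uses a hypothetical class name, WriteBlockWatchdog,
and an injectable clock; closing and reopening the raw socket is left
to the caller.

```python
import time


class WriteBlockWatchdog:
    """Tracks how long a descriptor has been continuously unwritable."""

    def __init__(self, limit_seconds, clock=time.monotonic):
        self._limit = limit_seconds
        self._clock = clock
        self._blocked_since = None

    def observe(self, writable):
        """Record one select() result.

        Returns True once the socket has been unwritable for at least
        limit_seconds, i.e. when the caller should close and reopen it.
        """
        if writable:
            self._blocked_since = None
            return False
        now = self._clock()
        if self._blocked_since is None:
            self._blocked_since = now
        return now - self._blocked_since >= self._limit
```

The event loop would call observe() after every select() pass in
which it had data queued for the socket, and reopen the socket when
observe() returns True.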

If the socket buffer were full, fine, write-block for that. But
surely otherwise, for a _raw socket_ which specifically makes no
reliability guarantees, the socket should not get held up because a
driver is throttling the socket due to no-link.

This isn't a TCP socket, it's a raw socket - it's up to the process 
using the raw socket to implement its own reliability and/or flow 
control, that's the precise point. Hence, the kernel should _not_.

> It is proper. Being always ready and dropping the packet is proper
> as well but inferior.

For a raw socket?

Surely the correct behaviour is to either return an error from
sendmsg() or else drop the packet if the driver is link-down?

> If you want the behavior you say you expect, consider the packet
> always ready and if it's really not ready, drop the packet on the
> floor yourself. This will get you the (inferior) behavior you want.

We can't drop it unfortunately. How do we do this? SO_SNDTIMEO is not 
settable on linux.

> How would it help you to find the packet ready and send data the
> system will just drop on the floor? Won't you lose your adjacencies
> anyway -- they'll time out either way).

We multiplex adjacencies on many interfaces via one file-descriptor.  
We're dropping adjacencies on all interfaces because we sent a packet
destined to go out a link-down interface, which the kernel accepted 
_without_ returning an error[1].

1. Hasso will correct me if I'm wrong here I hope - Hasso, no error 
is reported from ospf_write() from sendmsg() is there?

> 	DS

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
	warning: do not ever send email to spam@dishone.st
Fortune:
Man invented language to satisfy his deep need to complain.
		-- Lily Tomlin


Thread overview: 12+ messages
2004-02-18  1:33 raw sockets and blocking Paul Jakma
2004-02-18  5:37 ` David Schwartz
2004-02-18  6:42   ` Hasso Tepper
2004-02-18 11:42     ` David Schwartz
2004-02-19  6:28       ` Paul Jakma
2004-02-19  7:53         ` Jamie Lokier
2004-02-19  8:34           ` Paul Jakma
2004-02-19 12:40             ` Hasso Tepper
2004-02-19 12:54               ` Paul Jakma
2004-03-22  7:14                 ` Hasso Tepper
2004-02-18  8:43   ` Hasso Tepper
2004-02-19  6:20   ` Paul Jakma
