* Re: [patch 4/10] s390: network driver.
[not found] <OF88EC0E9F.DE8FC278-ONC1256F4A.0038D5C0-C1256F4A.00398E11@de.ibm.com>
@ 2004-11-14 1:29 ` Jeff Garzik
2004-11-15 7:52 ` Paul Jakma
0 siblings, 1 reply; 67+ messages in thread
From: Jeff Garzik @ 2004-11-14 1:29 UTC (permalink / raw)
To: Thomas Spatzier; +Cc: linux-kernel, Netdev
Thomas Spatzier wrote:
>
>
>
>>You should be using netif_carrier_{on,off} properly, and not drop the
>>packets. When (if) link comes back, you requeue the packets to hardware
>>(or hypervisor or whatever). Your dev->stop() should stop operation and
>>clean up anything left in your send/receive {rings | buffers}.
>>
>
>
> When we do not drop packets, but call netif_stop_queue the write queues
> of all sockets associated to the net device are blocked as soon as they
> get full. This causes problems with programs such as the zebra routing
> daemon. So we have to keep the netif queue running in order to not block
> any programs.
This is very, very wrong. You are essentially creating an in-driver
/dev/null simulator. This does nothing but chew CPU cycles by having
the driver drop packets, rather than allowing the system to work as it
should.
Queues are DESIGNED to fill up under various conditions.
Would not the zebra routing software have the same problems with cable
pull under an e1000 or tg3 gigabit NIC?
> We also had a look at some other drivers and the common behaviour seems to
> be that packets are lost if the network cable is pulled out.
Which drivers, specifically, are these?
The most popular drivers -- e1000, tg3, etc. -- do not do this, for very
good reasons.
Jeff
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver.
2004-11-14 1:29 ` [patch 4/10] s390: network driver Jeff Garzik
@ 2004-11-15 7:52 ` Paul Jakma
2004-11-21 8:16 ` Paul Jakma
2004-11-29 15:57 ` Thomas Spatzier
0 siblings, 2 replies; 67+ messages in thread
From: Paul Jakma @ 2004-11-15 7:52 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Thomas Spatzier, linux-kernel, Netdev
On Sat, 13 Nov 2004, Jeff Garzik wrote:
> Queues are DESIGNED to fill up under various conditions.
What happens when they are full? Blocking isn't good, at least not
from GNU Zebra / Quagga's POV.
> Would not the zebra routing software have the same problems with
> cable pull under an e1000 or tg3 gigabit NIC?
If they block further writes, yes.
> The most popular drivers -- e1000, tg3, etc. -- do not do this, for
> very good reasons.
The problem is that GNU Zebra / Quagga uses a single raw socket for
certain protocols, e.g. OSPF. Blocking the socket because one NIC has a
cable pulled is undesirable, and there's no point queueing the
packets: by the time the link comes back, if ever, it's highly
unlikely there is any use in sending them (sending OSPF
hellos from X minutes ago on a link that just came back is useless).
The kernel really shouldn't get too much in the way of an application
that is already fully aware of reliability issues[1] - at least that
is the application's expectation in this case.
We could change it to use a socket per interface on Linux, but it seems
a bit unnecessary to me, at least for raw socket/included-header
sockets.
> Jeff
1. an application must be, if it uses raw sockets, surely? Even for
non-raw/header-included sockets, e.g. BGP TCP sockets, a user like GNU
Zebra / Quagga would much prefer packets to be dropped.
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
My interest is in the future because I am going to spend the rest of my
life there.
* Re: [patch 4/10] s390: network driver.
2004-11-15 7:52 ` Paul Jakma
@ 2004-11-21 8:16 ` Paul Jakma
2004-11-29 15:57 ` Thomas Spatzier
1 sibling, 0 replies; 67+ messages in thread
From: Paul Jakma @ 2004-11-21 8:16 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Thomas Spatzier, linux-kernel, Netdev
On Mon, 15 Nov 2004, Paul Jakma wrote:
> non-raw/header-included sockets, eg BGP tcp sockets, a user like
> GNU Zebra / Quagga would much prefer packets to be dropped.
Ur... not for TCP.. obviously.
Anyway, is there any advice on how applications that use a single
socket for raw/UDP should deal with this new behaviour? All of the
link-orientated routing protocol daemons in Quagga/GNU Zebra are
going to break on Linux with this new behaviour.
Should such applications be changed to open a separate socket per
interface? Or could we have a SO_DROP_DONT_QUEUE sockopt to allow a
privileged application to retain the previous behaviour, or some way
to flush the queue for a socket?
Using a socket per interface won't address the problem of sending quite
stale packets when a link comes back after a long time down, AIUI.
(Not a huge problem - but not nice.)
Jeff???
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
Be incomprehensible. If they can't understand, they can't disagree.
* Re: [patch 4/10] s390: network driver.
2004-11-15 7:52 ` Paul Jakma
2004-11-21 8:16 ` Paul Jakma
@ 2004-11-29 15:57 ` Thomas Spatzier
2004-11-29 16:30 ` Paul Jakma
1 sibling, 1 reply; 67+ messages in thread
From: Thomas Spatzier @ 2004-11-29 15:57 UTC (permalink / raw)
To: paul; +Cc: jgarzik, linux-kernel, netdev
> Using a socket per interface won't address the problem of sending
> quite stale packets when a link comes back after a long time
> down, AIUI. (Not a huge problem - but not nice.)
>
> Jeff???
Has there been any outcome on the discussion about whether or not
a device driver should drop packets when the cable is disconnected?
It seems that from the zebra point of view, as Paul wrote,
it would be better to not block sockets by queueing up packets
when there is no cable connection.
I do also think that it does not make sense to keep packets in the
queue and then send those packets when the cable is plugged in
again after a possibly long time.
There are protocols like TCP that handle packet loss anyway.
Regards,
Thomas.
* Re: [patch 4/10] s390: network driver.
2004-11-29 15:57 ` Thomas Spatzier
@ 2004-11-29 16:30 ` Paul Jakma
2004-11-29 16:41 ` Thomas Spatzier
0 siblings, 1 reply; 67+ messages in thread
From: Paul Jakma @ 2004-11-29 16:30 UTC (permalink / raw)
To: Thomas Spatzier; +Cc: jgarzik, linux-kernel, netdev
On Mon, 29 Nov 2004, Thomas Spatzier wrote:
> Has there been any outcome on the discussion about whether or not a
> device driver should drop packets when the cable is disconnected?
There hasn't.
> It seems that from the zebra point of view, as Paul wrote, it would
> be better to not block sockets by queueing up packets when there is
> no cable connection.
I'd prefer either to get ENOBUFS or to have the kernel drop the packet -
we're privileged apps writing to raw sockets and/or with IP_HDRINCL;
the kernel should assume we know what we're doing (because if we don't,
there's far /worse/ we could do than send packets to an
effective /dev/null).
> I do also think that it does not make sense to keep packets in the
> queue and then send those packets when the cable is plugged in
> again after a possibly long time.
Well, if the kernel is going to queue these packets without notifying
us, we absolutely *must* have some way to flush those queues. Sending
stale packets many minutes after the application generated them could
have serious consequences for routing (e.g. think of sending RIP, IPv4
IRDP or v6 RAs which are no longer valid - a client receives them and
installs routes which are long invalid and loses connectivity to some
part of the network).
So even if we moved to a socket per interface, we still need some way
to tell the kernel to flush a socket when we receive link-down over
netlink later.
Not trying to queue on sockets where the app has no expectation of
reliability in the first place would be even better ;)
> There are protocols like TCP that handle packet loss anyway.
Yes.
I'd be very interested to hear advice from the kernel gurus (e.g. "god,
don't be so stupid, do xyz in your application instead"). We can
accommodate whatever the kernel wants as long as it's workable.
> Regards,
> Thomas.
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
You will pay for your sins. If you have already paid, please disregard
this message.
* Re: [patch 4/10] s390: network driver.
2004-11-29 16:30 ` Paul Jakma
@ 2004-11-29 16:41 ` Thomas Spatzier
2004-11-29 20:27 ` Paul Jakma
0 siblings, 1 reply; 67+ messages in thread
From: Thomas Spatzier @ 2004-11-29 16:41 UTC (permalink / raw)
To: Paul Jakma; +Cc: jgarzik, linux-kernel, netdev
Paul Jakma <paul@clubi.ie> wrote on 29.11.2004 17:30:23:
> Well, if the kernel is going to queue these packets without notifying
> us, we absolutely *must* have some way to flush those queues. Sending
> stale packets many minutes after the application generated them could
> have serious consequences for routing (eg, think sending RIP, IPv4
> IRDP or v6 RAs which are no longer valid - client receives them and
> installs routes which are long invalid and loses connectivity to some
> part of the network).
>
Yes, for the examples you mentioned the app had better be notified.
However, AFAICS, there is no such per-packet notification mechanism
implemented in the kernel.
And I doubt that one is going to be implemented.
> I'd be very interested to hear advice from the kernel gurus (eg "god,
> dont be so stupid, do xyz in your application instead"). We can
> accomodate whatever kernel wants as long as its workable.
Good suggestion; if anyone has an interesting and feasible solution
I will be happy to integrate it. So far, however, I don't see one and I
would point people worried about lost packets to TCP.
Regards,
Thomas.
* Re: [patch 4/10] s390: network driver.
2004-11-29 16:41 ` Thomas Spatzier
@ 2004-11-29 20:27 ` Paul Jakma
2004-11-30 7:22 ` Thomas Spatzier
0 siblings, 1 reply; 67+ messages in thread
From: Paul Jakma @ 2004-11-29 20:27 UTC (permalink / raw)
To: Thomas Spatzier; +Cc: jgarzik, linux-kernel, netdev
On Mon, 29 Nov 2004, Thomas Spatzier wrote:
> Yes, for the examples you mentioned the app should better be
> notified. However, AFAICS, there are no such notification
> mechanisms on a per-packet basis implemented in the kernel. And I
> doubt that they are going to be implemented.
We do get notified. We get a netlink interface update with
IFF_RUNNING unset in the interface flags.
However, it can take time before zebra gets to read it, further
time before zebra sends its own interface update to the protocol daemons,
and further time before they get to read it and update their own
interface state. By that time those daemons may have sent packets
down those interfaces (packets which may become invalid before the link
comes back, but which will still be sent and hence which
could potentially disrupt routing).
Ideally, we should be notified synchronously (EBUFS?) or, if that's not
possible, the packet should be dropped (see my presumption below).
The other solution is some means for a daemon to be able to flush the
queues for a specific interface (but that's a bit ugly).
> Good suggestion, if anyone has an interesting and feasible solution
> I will be happy to integrate it. So far, however, it don't see one
> and I would point people being worried about lost packets to TCP.
Surely TCP was already able to take care of retransmits? I'm not
familiar with Linux internals, but how did TCP cope with the previous
driver behaviour? (I get the vague impression that we're talking
about a driver-specific buffer here, rather than the application's
socket buffer - is that correct?)
Jeff, David, enlighten us please! :)
> Regards,
> Thomas.
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
I can't understand why people are frightened of new ideas. I'm frightened
of the old ones.
-- John Cage
* Re: [patch 4/10] s390: network driver.
2004-11-29 20:27 ` Paul Jakma
@ 2004-11-30 7:22 ` Thomas Spatzier
2004-12-05 6:25 ` Paul Jakma
0 siblings, 1 reply; 67+ messages in thread
From: Thomas Spatzier @ 2004-11-30 7:22 UTC (permalink / raw)
To: Paul Jakma; +Cc: jgarzik, linux-kernel, netdev
Paul Jakma <paul@clubi.ie> wrote on 29.11.2004 21:27:57:
> We do get notified. We get a netlink interface update with
> unset IFF_RUNNING from the interface flags.
>
> However, it can take time before zebra gets to read it, and further
> time before zebra sends its own interface update to protocol daemons,
> and further time before they might get to read and update their own
> interface state. By which time those daemons may have sent packets
> down those interfaces (which packets may become invalid before link
> becomes back, but which still will be sent and hence which
> potentially could disrupt routing).
Ok, then some logic could be implemented in userland to take
appropriate actions. It must be ensured that zebra handles the
netlink notification fast enough.
>
> Ideally, we should be notified synchronously (EBUFS?) or if thats not
> possible, packet should be dropped (see my below presumption though).
In the manpages for send/sendto/sendmsg it says that there is a -ENOBUFS
return value if a socket's write queue is full. It also says:
"Normally, this does not occur in Linux. Packets are just silently dropped
when a device queue overflows."
So, if packets are 'silently dropped' anyway, the fact that we drop them in
our driver (and increment the error count in the net_device_stats
accordingly) should not be a problem.
> Surely TCP already was able to take care of retransmits? I'm not
> familiar with Linux internals, but how did TCP cope with the previous
> driver behaviour?
I think that both behaviours are similar for TCP. TCP waits for an ACK
for each packet. If it does not arrive, a retransmit is done. Sooner or
later the connection will be reset if no responses from the other side
arrive. So the result for both driver behaviours should be the same.
Regards,
Thomas
* Re: [patch 4/10] s390: network driver.
2004-11-30 7:22 ` Thomas Spatzier
@ 2004-12-05 6:25 ` Paul Jakma
2004-12-06 11:01 ` Post Network dev questions to netdev Please WAS(Re: " jamal
2004-12-06 11:27 ` jamal
0 siblings, 2 replies; 67+ messages in thread
From: Paul Jakma @ 2004-12-05 6:25 UTC (permalink / raw)
To: Thomas Spatzier; +Cc: jgarzik, linux-kernel, netdev
On Tue, 30 Nov 2004, Thomas Spatzier wrote:
> Ok, then some logic could be implemented in userland to take
> appropriate actions. It must be ensured that zebra handles the
> netlink notification fast enough.
AIUI, netlink is not synchronous, and it most definitely makes no
reliability guarantees (and at the moment, zebra isn't terribly
efficient at reading netlink; large numbers of interfaces will cause
overruns in zebra - fixing this is on the TODO list). So we can never
get rid of the window where a daemon could send a packet out a
link-down interface - we can make that window smaller but not
eliminate it.
Hence we need either a way to flush packets associated with an
(interface, socket) pair (or just the socket), or we need the kernel
to not accept such packets (and to drop packets it has accepted).
> In the manpages for send/sendto/sendmsg it says that there is a -ENOBUFS
> return value, if a sockets write queue is full.
Yes, ENOBUFS, sorry.
> It also says:
> "Normally, this does not occur in Linux. Packets are just silently dropped
> when a device queue overflows."
This has always been (AFAIK) the behaviour, yes. We started getting
reports of the new queueing behaviour with, IIRC, a version of Intel's
e100 driver for 2.4.2x, which was later changed back to the old
behaviour. However, now that the queueing behaviour is apparently the
mandated behaviour, we really need to work out what to do about the
problem of sending long-stale packets.
> So, if packets are 'silently dropped' anyway, the fact that we drop
> them in our driver (and increment the error count in the
> net_device_stats accordingly) should not be a problem.
It shouldn't, no.
The likes of OSPF already specify their own reliability mechanisms.
> I think that both behaviours are similar for TCP. TCP waits for
> ACKs for each packet. If they do not arrive, a retransmit is done.
> Sooner or later the connection will be reset, if no responses from
> the other side arrive. So the result for both driver behaviours
> should be the same.
But if TCP worked even when drivers dropped packets, that
implies TCP has its own queue? That we're talking about a separate
driver packet queue rather than the socket buffer (which is,
presumably, where TCP retains packets until they are ACKed - I have
no idea)?
Anyway, we do, I think, need some way to deal with the
sending-stale-packets-on-link-back problem: either a way to flush this
driver queue, or else a guarantee that writes to sockets whose
protocol makes no reliability guarantee will either return ENOBUFS or
drop the packet.
Otherwise we will start getting reports of "Quagga on Linux sent an
ancient {RIP,IRDP,RA} packet when we fixed a switch problem, and it
caused an outage for a section of our network due to bad routes", I
think.
Some comment or advice would be useful. (Am I kill-filed by all of
netdev? It feels like it.)
> Regards,
> Thomas
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
No directory.
* Post Network dev questions to netdev Please WAS(Re: [patch 4/10] s390: network driver.
2004-12-05 6:25 ` Paul Jakma
@ 2004-12-06 11:01 ` jamal
2004-12-06 11:27 ` jamal
1 sibling, 0 replies; 67+ messages in thread
From: jamal @ 2004-12-06 11:01 UTC (permalink / raw)
To: Paul Jakma; +Cc: Thomas Spatzier, jgarzik, linux-kernel, netdev
On Sun, 2004-12-05 at 01:25, Paul Jakma wrote:
>
> This has always been (AFAIK) the behaviour yes. We started getting
> reports of the new queuing behaviour with, iirc, a version of Intel's
> e100 driver for 2.4.2x, which was later changed back to the old
> behaviour. However now that the queue behaviour is apparently the
> mandated behaviour we really need to work out what to do about the
> sending-long-stale packets problem.
>
I missed the beginnings of this thread. It seems some patch was posted
on lkml which started this discussion. I am pretty sure what the lkml
FAQ says is to post on netdev. OK, if you insist on posting on lkml
(because that's the way to glory, good fortune and fame), then please
have the courtesy to also post to netdev.
Now let's see if we can help. Followups only on netdev.
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
2004-12-05 6:25 ` Paul Jakma
2004-12-06 11:01 ` Post Network dev questions to netdev Please WAS(Re: " jamal
@ 2004-12-06 11:27 ` jamal
2004-12-06 14:42 ` Hasso Tepper
2004-12-06 18:44 ` Paul Jakma
1 sibling, 2 replies; 67+ messages in thread
From: jamal @ 2004-12-06 11:27 UTC (permalink / raw)
To: Paul Jakma; +Cc: Thomas Spatzier, jgarzik, linux-kernel, netdev
On Sun, 2004-12-05 at 01:25, Paul Jakma wrote:
[..]
> Anyway, we do, I think, need some way to deal with the
> sending-stale-packet-on-link-back problem. Either a way to flush this
> driver queue or else a guarantee that writes to sockets whose
> protocol makes no reliability guarantee will either return ENOBUFS or
> drop the packet.
>
> Otherwise we will start getting reports of "Quagga on Linux sent an
> ancient {RIP,IRDP,RA} packet when we fixed a switch problem, and it
> caused an outage for a section of our network due to bad routes", I
> think.
>
> Some comment or advice would be useful. (Am I kill-filed by all of
> netdev? feels like it).
Don't post networking-related patches on other lists. I haven't seen
said patch, but it seems someone is complaining about some behaviour
changing?
In regards to link down and packets being queued:
agreed, this is a little problematic for some apps/transports. TCP is
definitely not one of them. TCP in Linux is actually told if the drop is
local. This way it can make better judgements (and not unnecessarily
adjust windows, for example).
SCTP, AFAIK, is the only transport that provides its apps the
opportunity to obsolete messages already sent. I am not sure how well
implemented, or whether it is implemented at all. Someone working on
SCTP could comment.
In the case where the netdevice is administratively downed, both the
qdisc and DMA ring packets are flushed. Newer packets will never be
queued, and you should quickly be able to find out from your app that
the device is down.
In the case of the netdevice being operationally down - I am hoping
this is what the discussion is about, having jumped into it - both
queues stay intact. What you can certainly do from user space is admin
down/up the device when you receive a netlink carrier-off notification.
I am struggling to see whether dropping the packet inside the tx path
once it is operationally down is so blasphemous ... need to get caffeine
first.
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
2004-12-06 11:27 ` jamal
@ 2004-12-06 14:42 ` Hasso Tepper
2004-12-07 1:13 ` Herbert Xu
2004-12-07 2:39 ` jamal
2004-12-06 18:44 ` Paul Jakma
1 sibling, 2 replies; 67+ messages in thread
From: Hasso Tepper @ 2004-12-06 14:42 UTC (permalink / raw)
To: hadi; +Cc: Paul Jakma, Thomas Spatzier, jgarzik, netdev
jamal wrote:
> Dont post networking related patches on other lists. I havent seen said
> patch, but it seems someone is complaining about some behavior changing?
For some strange reason I didn't find the beginning of the thread in
the archive either; the first post seems to be
http://oss.sgi.com/projects/netdev/archive/2004-11/msg01015.html.
I'm trying to summarize. The approach - one raw socket to send/receive
messages no matter how many interfaces are used - worked for the
Quagga/Zebra routing software daemons till now. If some of these
interfaces were operationally down, the socket wasn't blocked even if
the queue was full. In fact "man sendmsg" still has this text:
ENOBUFS
    The output queue for a network interface was full. This generally
    indicates that the interface has stopped sending, but may be
    caused by transient congestion. (Normally, this does not occur in
    Linux. Packets are just silently dropped when a device queue
    overflows.)
It seems that's no longer true. It seems the kernel now tries as hard
as possible not to lose any data - data is queued, and if the queue
becomes full, all related sockets are blocked to notify the
application. So the one-socket approach doesn't work any more for
Quagga/Zebra. No problem, we can take the "one socket per interface"
approach. And we already have link detection implemented to notify the
daemons.
But there will still be a potential problem - sending the "interface
down" message from the kernel to the zebra daemon and then to the
routing protocol daemon takes time. And during this time a daemon can
send packets which will sit in the queue and may cause many problems if
sent to the network later (if the link comes up again). Think about
stateless routing protocols like rip(ng), IPv6 router advertisements
etc. They may carry info that's no longer true, causing routing loops
etc.
And to clarify a little bit: no, Quagga/Zebra didn't work perfectly
with the previous approach. In fact, with link detection and a socket
per interface it will be better than ever, no matter what the kernel
behaviour is. We just want to make sure that the solution will be
bulletproof.
So, the problem - how can we make sure that no potentially dangerous
aged (routing) info will be in the network?
--
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
* Re: [patch 4/10] s390: network driver.
2004-12-06 11:27 ` jamal
2004-12-06 14:42 ` Hasso Tepper
@ 2004-12-06 18:44 ` Paul Jakma
1 sibling, 0 replies; 67+ messages in thread
From: Paul Jakma @ 2004-12-06 18:44 UTC (permalink / raw)
To: jamal; +Cc: Thomas Spatzier, jgarzik, linux-kernel, netdev
On Mon, 6 Dec 2004, jamal wrote:
> Dont post networking related patches on other lists. I havent seen said
> patch, but it seems someone is complaining about some behavior changing?
I missed the beginning of the thread too, but saw Jeff's reply to
Thomas on netdev. It appears the original patch was to make the s390
network driver discard packets on link-down.
Jeff had replied to say this was bad, that queues are meant to fill
and that this was what other drivers (e1000, tg3) did.
> In regards to link down and packets being queued. Agreed this is a
> little problematic for some apps/transports.
'Tis, yes. Particularly for apps using raw and UDP+IP_HDRINCL sockets.
This problem came to light when we got reports of ospfd blocking
because a link was down, late in 2.4, with a certain version of the
(IIRC) e100 driver. ospfd uses one single socket for all interfaces
and relies on IP_HDRINCL to have the packet routed out the right
interface. However, this approach doesn't play well if the socket can
be blocked completely because of /one/ interface having its link
down. The behaviour we expected (and got up until now) is to receive
either ENOBUFS or else, if the kernel accepts the packet write, for
it to drop the packet if it cannot be sent.
We can work around that by moving to a socket per interface. However,
it still leaves the problem of packets being queued indefinitely while
the link is down and being sent when the link comes back. This is *not*
good for RIP, IPv4 IRDP and IPv6 RA.
> In the case the netdevice is administratively downed both the qdisc
> and DMA ring packets are flushed.
What about any packets remaining in the socket buffer? (If that makes
sense - I don't know enough about the internals, sadly.) Are those queued?
> Newer packets will never be queued
That no longer appears to be the case though. The socket blocks, and
/some/ packets are queued (presumably those which were still in the
socket buffer? I don't know exactly...).
> and you should quickly be able to find from your app that
> the device is down.
We can, yes, via rtnetlink - but it is impossible to guarantee we'll
know the link is down before we try to write a packet.
> In the case of netdevice being operationally down
?
As in 'ip link set dev ... down'?
> - I am hoping this is what the discussion is, having jumped on it -
No, it's for link-down, AIUI.
> both queues stay intact. What you can do is certainly from user
> space admin down/up the device when you receive a netlink carrier
> off notification.
That seems possible, but quite a hack. Something working at the socket
level would possibly be nicer. (The socket being the primary handle our
application has.)
> I am struggling to see whether dropping the packet inside the tx
> path once it is operationaly down is so blasphemous ... need to get
> caffeine first.
As long as reliable transports have some other transport-specific
queue, it shouldn't be a problem. For UDP and raw sockets, no
reliability guarantees are expected by applications (at least they
shouldn't be), and queueing some packets on link-down interferes with
application-layer expectations.
> cheers,
> jamal
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
The UPS doesn't have a battery backup.
* Re: [patch 4/10] s390: network driver.
2004-12-06 14:42 ` Hasso Tepper
@ 2004-12-07 1:13 ` Herbert Xu
2004-12-07 2:22 ` jamal
2004-12-10 15:37 ` Paul Jakma
2004-12-07 2:39 ` jamal
1 sibling, 2 replies; 67+ messages in thread
From: Herbert Xu @ 2004-12-07 1:13 UTC (permalink / raw)
To: Hasso Tepper; +Cc: hadi, paul, thomas.spatzier, jgarzik, netdev
Hasso Tepper <hasso@estpak.ee> wrote:
>
> It seems that's no longer true. It seems the kernel now tries as hard as
> possible not to lose any data - data is queued, and if the queue becomes
> full, all related sockets are blocked to notify the application. So the
> one-socket approach doesn't work any more for Quagga/Zebra. No problem,
> we can take the "one socket per interface" approach. And we already have
> link detection implemented to notify the daemons.
I don't see why this should be happening. Can you please provide a
minimal program that reproduces this blocking problem? For example,
something that sends a packet to a downed interface and then sends
one to an interface that's up?
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [patch 4/10] s390: network driver.
2004-12-07 1:13 ` Herbert Xu
@ 2004-12-07 2:22 ` jamal
2004-12-10 15:37 ` Paul Jakma
1 sibling, 0 replies; 67+ messages in thread
From: jamal @ 2004-12-07 2:22 UTC (permalink / raw)
To: Herbert Xu; +Cc: Hasso Tepper, paul, thomas.spatzier, jgarzik, netdev
On Mon, 2004-12-06 at 20:13, Herbert Xu wrote:
> Hasso Tepper <hasso@estpak.ee> wrote:
> >
> > It seems that's no longer true. It seems the kernel now tries as hard as
> > possible not to lose any data - data is queued, and if the queue becomes
> > full, all related sockets are blocked to notify the application. So the
> > one-socket approach doesn't work any more for Quagga/Zebra. No problem,
> > we can take the "one socket per interface" approach. And we already have
> > link detection implemented to notify the daemons.
>
> I don't see why this should be happening. Can you please provide a
> minimal program that reproduces this blocking problem? For example,
> something that sends a packet to a downed interface and then sends
> one to an interface that's up?
This may be something to do with socket-level changes, maybe?
Does this happen with sockets that are not raw?
Having said that: I think getting rid of obsolete messages is important.
One of the suggested schemes of operation sounds to be the least brutal.
Jgarzik, I thought about it a little, and it seems to me that allowing
the device driver to drop packets on transmit when the carrier is off is
not that bad. This is almost like pretending the packets were dropped on
the wire. Thoughts?
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
2004-12-06 14:42 ` Hasso Tepper
2004-12-07 1:13 ` Herbert Xu
@ 2004-12-07 2:39 ` jamal
1 sibling, 0 replies; 67+ messages in thread
From: jamal @ 2004-12-07 2:39 UTC (permalink / raw)
To: Hasso Tepper; +Cc: Paul Jakma, Thomas Spatzier, jgarzik, netdev
On Mon, 2004-12-06 at 09:42, Hasso Tepper wrote:
> I'm trying to summarize. The approach - one raw socket to send/receive
> messages no matter how many interfaces are used - worked for Quagga/Zebra
> routing software daemons till now. If some of these interfaces was
> operationally down, socket wasn't blocked even if the queue was full.
>
> In
> fact "man sendmsg" has still text:
>
> ENOBUFS
>     The output queue for a network interface was full. This generally
>     indicates that the interface has stopped sending, but may be
>     caused by transient congestion. (Normally, this does not occur in
>     Linux. Packets are just silently dropped when a device queue
>     overflows.)
>
> Seems that it's no longer true. Seems that kernel is now trying as hard as
> possible not to loose any data - data is queued and if the queue will be
> full, all related sockets will be blocked to notify application.
We need to investigate why this happens. It doesn't sound like good
behaviour, nor legit.
Does this happen only to raw sockets? Herbert asked for a small sample
program that reproduces it.
> So, one
> socket approach don't work any more for Quagga/Zebra. No problem, we can
> take the "one socket per interface" approach. And we already have link
> detection implemented to notify daemons.
>
I don't think that would be necessary with the latest changes to
notifications in 2.6.x.
> But there will still be a potential problem - sending the "interface
> down" message from the kernel to the zebra daemon and then to the
> routing protocol daemon takes time. And during this time the daemon
> can send packets which will sit in the queue and may cause many
> problems if sent to the network later (if the link comes up again).
> Think about stateless routing protocols like rip(ng), ipv6 router
> advertisements etc. They may carry info that's no longer true, causing
> routing loops etc.
>
> And to clarify a little bit: no - Quagga/Zebra didn't work perfectly
> with the previous approach. In fact, with link detection and a socket
> per interface it will be better than ever no matter what the kernel
> behaviour is. We just want to make sure that the solution will be
> bulletproof.
>
> So, problem - how can we make sure that no potentially dangerous aged
> (routing) info will be in the network?
I think the idea of having the driver junk packets when the link is
down does sound plausible as a solution.
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
2004-12-07 1:13 ` Herbert Xu
2004-12-07 2:22 ` jamal
@ 2004-12-10 15:37 ` Paul Jakma
2004-12-14 7:40 ` Thomas Spatzier
1 sibling, 1 reply; 67+ messages in thread
From: Paul Jakma @ 2004-12-10 15:37 UTC (permalink / raw)
To: Herbert Xu; +Cc: Hasso Tepper, hadi, thomas.spatzier, jgarzik, netdev
On Tue, 7 Dec 2004, Herbert Xu wrote:
> I don't see why this should be happening.
Thomas' original patch was to address this problem. I wonder if he
could recap the kernel side of it?
> Can you please provide a minimal program that reproduces this
> blocking problem? For example, something that sends a packet to a
> downed interface and then sends one to an interface that's up?
I will try to get this for you soon.
regards
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
Who was that masked man?
* Re: [patch 4/10] s390: network driver.
2004-12-10 15:37 ` Paul Jakma
@ 2004-12-14 7:40 ` Thomas Spatzier
2004-12-15 13:50 ` jamal
0 siblings, 1 reply; 67+ messages in thread
From: Thomas Spatzier @ 2004-12-14 7:40 UTC (permalink / raw)
To: Paul Jakma; +Cc: hadi, Hasso Tepper, Herbert Xu, jgarzik, netdev
Paul Jakma <paul@clubi.ie> wrote on 10.12.2004 16:37:15:
> Thomas' original patch was to address this problem. I wonder could he
> recap the kernel side of this problem?
Here is why we submitted the original patch: we got reports from
several customers that their dynamic routing daemons hung when one
network interface lost its physical connection. Some debugging showed
that the write queues of the sockets filled up and got blocked. This
was because we issued a netif_stop_queue when we detected a cable pull
or something similar.
As a solution, we removed the netif_stop_queue calls and just drop the
packets; in addition we increment the respective error counts in the
net_device_stats and call netif_carrier_off. This solved the customer
problems and seems to be the right thing for zebra etc.
Regards,
Thomas.
* Re: [patch 4/10] s390: network driver.
2004-12-14 7:40 ` Thomas Spatzier
@ 2004-12-15 13:50 ` jamal
2004-12-15 15:03 ` Thomas Spatzier
0 siblings, 1 reply; 67+ messages in thread
From: jamal @ 2004-12-15 13:50 UTC (permalink / raw)
To: Thomas Spatzier
Cc: Paul Jakma, Hasso Tepper, Herbert Xu, jgarzik, netdev,
David S. Miller
On Tue, 2004-12-14 at 02:40, Thomas Spatzier wrote:
>
>
> Paul Jakma <paul@clubi.ie> wrote on 10.12.2004 16:37:15:
> > Thomas' original patch was to address this problem. I wonder could he
> > recap the kernel side of this problem?
>
> Here is why we submitted the original patch: We got reports from
> several customers that their dynamic routing daemons got hung when
> one network interface lost its physical connection. Some debugging
> showed that the write queues of sockets went full and got blocked.
> This was because we issued a netif_stop_queue when we detect a
> cable pull or something.
I did some more thinking in the background and I wish to change my
opinion. What you see is very odd. I think there may be a bug upstream
at the socket layer or even before that - but it doesn't sound like a
device level bug. Wasn't someone supposed to send a small program to
Herbert?
When you netif_stop_queue you should never receive packets any more
at the device level. If you receive any, it's a bug and you should drop
them and bitch violently. In other words, I think what you have at the
moment is a band-aid, not the solution.
> As a solution, we removed the netif_stop_queue calls and just dropped
> the packets + we increment the respective error counts in the
> net_device_stats and call netif_carrier_off.
> This solved the customer problems and seems to be right thing for
> zebra etc.
We need to fix this issue. Either your driver is doing something wrong
or something is broken upstream in the stack.
Can you describe how your driver uses the netif_start/stop/wake
interfaces?
Whoever promised to send that program to Herbert - please do.
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
2004-12-15 13:50 ` jamal
@ 2004-12-15 15:03 ` Thomas Spatzier
2004-12-19 19:29 ` jamal
0 siblings, 1 reply; 67+ messages in thread
From: Thomas Spatzier @ 2004-12-15 15:03 UTC (permalink / raw)
To: hadi; +Cc: David S. Miller, Hasso Tepper, Herbert Xu, jgarzik, netdev,
Paul Jakma
jamal <hadi@cyberus.ca> wrote on 15.12.2004 14:50:27:
> When you netif_stop_queue you should never receive packets anymore
> at the device level. If you receive any its a bug and you should drop
> them and bitch violently. In other words i think what you have at the
> moment is bandaid not the solution.
When we do a netif_stop_queue, we do not get any more packets, so this
behaviour is OK. The problem is that the socket write queues then fill
up and get blocked as soon as they are full.
> Can you describe how your driver uses the netif_start/stop/wake
> interfaces?
Before the patch we are talking about: when we detect a cable pull (or
something like it) we call netif_stop_queue and switch off the
IFF_RUNNING flag. Then when we detect that the cable is plugged in
again, we call netif_wake_queue and switch the IFF_RUNNING flag on.
And with the patch:
On cable pull we switch off IFF_RUNNING and call
netif_carrier_off. We still get packets but drop them.
When the cable is plugged in we switch on IFF_RUNNING and
call netif_carrier_on.
Regards,
Thomas.
* Re: [patch 4/10] s390: network driver.
2004-12-15 15:03 ` Thomas Spatzier
@ 2004-12-19 19:29 ` jamal
2004-12-19 22:29 ` Tommy Christensen
` (2 more replies)
0 siblings, 3 replies; 67+ messages in thread
From: jamal @ 2004-12-19 19:29 UTC (permalink / raw)
To: Thomas Spatzier
Cc: David S. Miller, Hasso Tepper, Herbert Xu, jgarzik, netdev,
Paul Jakma
On Wed, 2004-12-15 at 10:03, Thomas Spatzier wrote:
> jamal <hadi@cyberus.ca> wrote on 15.12.2004 14:50:27:
>
> > When you netif_stop_queue you should never receive packets anymore
> > at the device level. If you receive any its a bug and you should drop
> > them and bitch violently. In other words i think what you have at the
> > moment is bandaid not the solution.
>
> When we do a netif_stop_queue, we do not get any more packets.
> So this behaviour is ok. The problem is that the socket write
> queues fill up then and get blocked as soon as they are full.
>
This is the strange part. Is anyone still willing to provide a sample
program that hangs?
> > Can you describe how your driver uses the netif_start/stop/wake
> > interfaces?
>
> Before the patch we are talking about:
> When we detect a cable pull (or something like this) we call
> netif_stop_queue and switch off the IFF_RUNNING flag.
> Then when we detect that the cable is plugged in again, we
> call netif_wake_queue and switch the IFF_RUNNING flag on.
>
Not too bad, except that user space doesn't get an async notification.
> And with the patch:
> On cable pull we switch off IFF_RUNNING and call
> netif_carrier_off. We still get packets but drop them.
> When the cable is plugged in we switch on IFF_RUNNING and
> call netif_carrier_on.
OK, that's something you need to change.
Why are you setting that IFF_RUNNING flag?
Look at e1000; they do it right there.
On link up:
netif_carrier_on(netdev);
netif_wake_queue(netdev);
On Link Down:
netif_carrier_off(netdev); // wonder if these need swapping
netif_stop_queue(netdev);
Still, I think we need to resolve the original problem.
And I have a feeling we won't be seeing any program which
reproduces it ;->
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
2004-12-19 19:29 ` jamal
@ 2004-12-19 22:29 ` Tommy Christensen
2004-12-19 23:05 ` jamal
2004-12-19 22:43 ` Jeff Garzik
2004-12-19 23:54 ` Paul Jakma
2 siblings, 1 reply; 67+ messages in thread
From: Tommy Christensen @ 2004-12-19 22:29 UTC (permalink / raw)
To: hadi
Cc: Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu,
jgarzik, netdev, Paul Jakma
jamal wrote:
> On Wed, 2004-12-15 at 10:03, Thomas Spatzier wrote:
>
>>jamal <hadi@cyberus.ca> wrote on 15.12.2004 14:50:27:
>>
>>
>>>When you netif_stop_queue you should never receive packets anymore
>>>at the device level. If you receive any its a bug and you should drop
>>>them and bitch violently. In other words i think what you have at the
>>>moment is bandaid not the solution.
>>
>>When we do a netif_stop_queue, we do not get any more packets.
>>So this behaviour is ok. The problem is that the socket write
>>queues fill up then and get blocked as soon as they are full.
>>
>
> This is the strange part. Anyone still willing to provide a sample
> program that hangs?
I haven't tried this myself, but surely it can happen when the
socket send-buffer is smaller than what can be queued up for
transmission: i.e. in the qdisc queue and device DMA ring.
And even more so, when sending to multiple devices from one socket.
> Look at e1000 they do it right there.
>
> On link up:
> netif_carrier_on(netdev);
> netif_wake_queue(netdev);
> On Link Down:
> netif_carrier_off(netdev); // wonder if these need swapping
> netif_stop_queue(netdev);
>
Well, this is the same as what started this whole thread.
I believe that stopping the queue on link-down events is simply
bad behavior of the driver.
-Tommy
* Re: [patch 4/10] s390: network driver.
2004-12-19 19:29 ` jamal
2004-12-19 22:29 ` Tommy Christensen
@ 2004-12-19 22:43 ` Jeff Garzik
2004-12-19 23:54 ` Paul Jakma
2 siblings, 0 replies; 67+ messages in thread
From: Jeff Garzik @ 2004-12-19 22:43 UTC (permalink / raw)
To: hadi, David S. Miller, Paul Jakma
Cc: Thomas Spatzier, Hasso Tepper, Herbert Xu, netdev
jamal wrote:
> On Wed, 2004-12-15 at 10:03, Thomas Spatzier wrote:
>
>
>>jamal <hadi@cyberus.ca> wrote on 15.12.2004 14:50:27:
>>
>>
>>>When you netif_stop_queue you should never receive packets anymore
>>>at the device level. If you receive any its a bug and you should drop
>>>them and bitch violently. In other words i think what you have at the
>>>moment is bandaid not the solution.
>>
>>When we do a netif_stop_queue, we do not get any more packets.
>>So this behaviour is ok. The problem is that the socket write
>>queues fill up then and get blocked as soon as they are full.
>>
>
>
> This is the strange part. Anyone still willing to provide a sample
> program that hangs?
>
>
>>>Can you describe how your driver uses the netif_start/stop/wake
>>>interfaces?
>>
>>Before the patch we are talking about:
>>When we detect a cable pull (or something like this) we call
>>netif_stop_queue and switch off the IFF_RUNNING flag.
>>Then when we detect that the cable is plugged in again, we
>>call netif_wake_queue and switch the IFF_RUNNING flag on.
>>
>
>
> Not too bad except user space doesnt get async notification.
>
>
>>And with the patch:
>>On cable pull we switch off IFF_RUNNING and call
>>netif_carrier_off. We still get packets but drop them.
>>When the cable is plugged in we switch on IFF_RUNNING and
>>call netif_carrier_on.
>
>
> Ok, thats something you need to change.
> Why you are setting that IFF_RUNNING flag?
> Look at e1000 they do it right there.
>
> On link up:
> netif_carrier_on(netdev);
> netif_wake_queue(netdev);
> On Link Down:
> netif_carrier_off(netdev); // wonder if these need swapping
> netif_stop_queue(netdev);
>
> Still, I think we need to resolve the original problem.
> And I have a feeling we wont be seeing any program which
> reproduces it ;->
I've been watching this thread, and am still waiting to see a good,
isolated test case.
My initial reaction was based on two observations:
(a) the proposed s390 change creates a CPU-cycle soaker, a /dev/null
for skbs;
(b) it really sounds like the userland program is doing something
broken.
Even if (b) is not true, the change is unacceptable due to (a).
Jeff
* Re: [patch 4/10] s390: network driver.
2004-12-19 22:29 ` Tommy Christensen
@ 2004-12-19 23:05 ` jamal
2004-12-19 23:46 ` Tommy Christensen
0 siblings, 1 reply; 67+ messages in thread
From: jamal @ 2004-12-19 23:05 UTC (permalink / raw)
To: Tommy Christensen
Cc: Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu,
jgarzik, netdev, Paul Jakma
On Sun, 2004-12-19 at 17:29, Tommy Christensen wrote:
> I haven't tried this myself, but surely it can happen when the
> socket send-buffer is smaller than what can be queued up for
> transmission: i.e. in the qdisc queue and device DMA ring.
Shouldn't this be governed by socket options? If you wish to block
while waiting to send, you should be allowed to.
A routing protocol that is actually notified that the link went down
should probably flush those socket buffers at that point.
> And even more so, when sending to multiple devices from one socket.
Yes, this looks more within reason. I don't know what the answer to
this is, but it would be helpful for someone to create a setup that
reproduces it.
> > Look at e1000 they do it right there.
> >
> > On link up:
> > netif_carrier_on(netdev);
> > netif_wake_queue(netdev);
> > On Link Down:
> > netif_carrier_off(netdev); // wonder if these need swapping
> > netif_stop_queue(netdev);
> >
>
> Well, this is the same as what started this whole thread.
> I believe that stopping the queue on link-down events is simply
> bad behavior of the driver.
From what Thomas was saying, this is not what he was doing. Read his
email.
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
2004-12-19 23:05 ` jamal
@ 2004-12-19 23:46 ` Tommy Christensen
2004-12-20 0:15 ` Jeff Garzik
` (3 more replies)
0 siblings, 4 replies; 67+ messages in thread
From: Tommy Christensen @ 2004-12-19 23:46 UTC (permalink / raw)
To: hadi
Cc: Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu,
jgarzik, netdev, Paul Jakma
jamal wrote:
> On Sun, 2004-12-19 at 17:29, Tommy Christensen wrote:
>
>>I haven't tried this myself, but surely it can happen when the
>>socket send-buffer is smaller than what can be queued up for
>>transmission: i.e. in the qdisc queue and device DMA ring.
>
> Shouldnt this have to do with socket options? If you wish to block while
> waiting for send, then you should be allowed.
> For a routing protocol that actually is notified that the link went
> down, it should probably flush those socket buffer at that point.
OK. So is this the recommendation for these poor souls?
- Use a socket for each device.
- Set the socket buffer (SO_SNDBUF) large enough, e.g. 1 MB?
Or use non-blocking sockets - just in case.
- If you care about not sending stale packets, it is the
responsibility of the application to flush the socket on
link-down events (by down'ing the interface?).
>>Well, this is the same as what started this whole thread.
>>I believe that stopping the queue on link-down events is simply
>>bad behavior of the driver.
>
> From what Thomas was saying, this is not what he was doing. Read his
> email.
It was at least to the same effect. The key issue is whether the
packets are kept in the queue (qdisc) until the link is back up,
or they are drained (and dropped) by the driver.
-Tommy
* Re: [patch 4/10] s390: network driver.
2004-12-19 19:29 ` jamal
2004-12-19 22:29 ` Tommy Christensen
2004-12-19 22:43 ` Jeff Garzik
@ 2004-12-19 23:54 ` Paul Jakma
2004-12-20 14:11 ` jamal
2 siblings, 1 reply; 67+ messages in thread
From: Paul Jakma @ 2004-12-19 23:54 UTC (permalink / raw)
To: jamal
Cc: Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu,
jgarzik, netdev
On Sun, 19 Dec 2004, jamal wrote:
> This is the strange part. Anyone still willing to provide a sample
> program that hangs?
I can provide instructions, or you can wait a wee bit - I didn't have
any e1000 hardware to test with (e1000 being one of the drivers which
has this behaviour, AFAIK/TTBOMK) - but a computer with an e1000
arrived Friday. So give me a bit and I'll try to come up with a test
case. A simple UDP send should be enough.
> Still, I think we need to resolve the original problem.
> And I have a feeling we wont be seeing any program which
> reproduces it ;->
Just send a UDP packet - AIUI, when the link goes down a window exists
where the application can send a packet down the file descriptor
without error, but it will be queued rather than sent, and the fd
eventually blocks/returns EAGAIN.
That window is the problem.
I'll test tomorrow.
> cheers,
> jamal
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
You want to know why I kept getting promoted? Because my mouth knows more
than my brain.
-- W.G.
* Re: [patch 4/10] s390: network driver.
2004-12-19 23:46 ` Tommy Christensen
@ 2004-12-20 0:15 ` Jeff Garzik
2004-12-20 14:10 ` jamal
` (2 subsequent siblings)
3 siblings, 0 replies; 67+ messages in thread
From: Jeff Garzik @ 2004-12-20 0:15 UTC (permalink / raw)
To: Tommy Christensen
Cc: hadi, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu,
netdev, Paul Jakma
Tommy Christensen wrote:
> It was at least to the same effect. The key issue is whether the
> packets are kept in the queue (qdisc) until the link is back up,
> or they are drained (and dropped) by the driver.
They should absolutely not be dropped by the driver. No ifs, ands, or buts.
This sort of problem is NOT solved by modifying hundreds of drivers.
Jeff
* Re: [patch 4/10] s390: network driver.
2004-12-19 23:46 ` Tommy Christensen
2004-12-20 0:15 ` Jeff Garzik
@ 2004-12-20 14:10 ` jamal
2004-12-20 18:54 ` Jeff Garzik
2004-12-20 14:16 ` Paul Jakma
2004-12-26 5:36 ` Herbert Xu
3 siblings, 1 reply; 67+ messages in thread
From: jamal @ 2004-12-20 14:10 UTC (permalink / raw)
To: Tommy Christensen
Cc: Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu,
jgarzik, netdev, Paul Jakma
On Sun, 2004-12-19 at 18:46, Tommy Christensen wrote:
> OK. So is this the recommendation for these poor souls?
>
> - Use a socket for each device.
Sounds sensible (each device with IP enabled is more like it)
> - Set the socket buffer (SO_SNDBUF) large enough. E.g. 1 MB ?
> Or use non-blocking sockets - just in case.
I think we may need a per-socket "flush socket buffer" signal
> - If you care about not sending stale packets, it is the
> responsibility of the application to flush the socket on
> link-down events (by down'ing the interface?).
Sigh. I am beginning to think this is too complex an approach.
It requires that there be a way to automagically clean up the buffers
when things like this happen.
I am beginning to think the simplest way to achieve this is not to
stop the queue, but rather to let the packets continue showing up and
drop them at the driver when the link is down. The netlink async
carrier signal to the app is then used to reroute, instead of being a
signal to flush buffers. In other words, the other Thomas got it right
(with the exception of setting the IFF_RUNNING flags).
Jeff?
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
2004-12-19 23:54 ` Paul Jakma
@ 2004-12-20 14:11 ` jamal
0 siblings, 0 replies; 67+ messages in thread
From: jamal @ 2004-12-20 14:11 UTC (permalink / raw)
To: Paul Jakma
Cc: Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu,
jgarzik, netdev
On Sun, 2004-12-19 at 18:54, Paul Jakma wrote:
> On Sun, 19 Dec 2004, jamal wrote:
>
> > This is the strange part. Anyone still willing to provide a sample
> > program that hangs?
>
> I can provide instructions, or you can wait a wee bit - I didn't have
> any e1000 hardware to test with (e1000 being one of the drivers which
> has this behaviour, AFAIK/TTBOMK) - but a computer with an e1000
> arrived Friday. So give me a bit and I'll try to come up with a test
> case. A simple UDP send should be enough.
When you are ready let me know; we could repeat what I said in my last
email.
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
2004-12-19 23:46 ` Tommy Christensen
2004-12-20 0:15 ` Jeff Garzik
2004-12-20 14:10 ` jamal
@ 2004-12-20 14:16 ` Paul Jakma
2004-12-20 18:56 ` Jeff Garzik
2004-12-26 5:36 ` Herbert Xu
3 siblings, 1 reply; 67+ messages in thread
From: Paul Jakma @ 2004-12-20 14:16 UTC (permalink / raw)
To: Tommy Christensen
Cc: hadi, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu,
jgarzik, netdev
On Mon, 20 Dec 2004, Tommy Christensen wrote:
>> For a routing protocol that actually is notified that the link
>> went down, it should probably flush those socket buffer at that
>> point.
Or why not return an error on the socket as soon as possible, e.g.
ENOBUFS, and discard anything in the queue before that. Make it
configurable via a sockopt if you think it'd harm ordinary apps
(though anything that can't deal with ENOBUFS is broken already,
really..), or make it apply only to non-blocking sockets.
> responsibility of the application to flush the socket on
> link-down events (by down'ing the interface?).
That seems more complex than it needs to be, for userspace at least.
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
"Paul Lynde to block..."
-- a contestant on "Hollywood Squares"
* Re: [patch 4/10] s390: network driver.
2004-12-20 14:10 ` jamal
@ 2004-12-20 18:54 ` Jeff Garzik
2004-12-21 0:13 ` Tommy Christensen
2004-12-22 10:56 ` Thomas Spatzier
0 siblings, 2 replies; 67+ messages in thread
From: Jeff Garzik @ 2004-12-20 18:54 UTC (permalink / raw)
To: hadi
Cc: Tommy Christensen, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Paul Jakma
jamal wrote:
> I am beginning to think that's the simplest way to achieve this: i.e. not
> to stop the queue but rather to let the packets continue showing up and
> drop them at the driver when the link is down. The netlink async
> carrier signal to the app is to be used to reroute instead of being a
> signal to flush buffers. In other words the other Thomas got it right
> (with the exception of setting the IFF_RUNNING flags)
>
> Jeff?
I haven't heard anything to convince me that the same change should be
deployed across NNN drivers. The drivers already signal the net core
that the link is down; to me, that implies there should be code in _one_
place that handles this condition, not NNN places.
Further:
* if this policy ("drop them in the driver") ever changes, we must again
touch NNN drivers.
* dropping them in the driver but not stopping the queue means that the
system is allowed to continue to stream data into the driver, only for
the driver to free it. That will scale -- right up to (worst case) 100%
CPU, with userland sending packets as fast as it can and the driver
dropping packets as fast as it can. The only places the net stack
currently checks carrier are dev_get_flags() and dev_watchdog().
* if you need a hook to flush the in-hardware buffers, add such a hook;
don't hack it in like this. Yeah, adding a hook touches NNN drivers,
but at least the hook is far more self-contained, its semantics will be
clearer, and it will increase the likelihood that the driver changes
affect neither the hot path nor the current netif_{start,stop}_queue()
logic.
Jeff
* Re: [patch 4/10] s390: network driver.
2004-12-20 14:16 ` Paul Jakma
@ 2004-12-20 18:56 ` Jeff Garzik
0 siblings, 0 replies; 67+ messages in thread
From: Jeff Garzik @ 2004-12-20 18:56 UTC (permalink / raw)
To: Paul Jakma, hadi
Cc: Tommy Christensen, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev
Paul Jakma wrote:
>> responsibility of the application to flush the socket on
>> link-down events (by down'ing the interface?).
>
>
> That seems more complex than needs be, for userspace at least.
It is the responsibility of the kernel to push complexity to userland.
Some applications may NOT desire that the socket be flushed. That's an
app policy decision.
If this is the core issue, then I am even more inclined to think that
the kernel is not what needs to be modified here.
Jeff
* Re: [patch 4/10] s390: network driver.
2004-12-20 18:54 ` Jeff Garzik
@ 2004-12-21 0:13 ` Tommy Christensen
2004-12-21 1:19 ` Jeff Garzik
2004-12-22 10:56 ` Thomas Spatzier
1 sibling, 1 reply; 67+ messages in thread
From: Tommy Christensen @ 2004-12-21 0:13 UTC (permalink / raw)
To: Jeff Garzik
Cc: hadi, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu,
netdev, Paul Jakma
Jeff Garzik wrote:
> I haven't heard anything to convince me that the same change should be
> deployed across NNN drivers. The drivers already signal the net core
> that the link is down; to me, that implies there should be code in _one_
> place that handles this condition, not NNN places.
AFAICS only a handful of (newer) drivers call netif_stop_queue() directly.
Others may do this indirectly if the MAC stops taking packets from the
DMA ringbuffer. At least some MAC's/drivers certainly don't.
I've always thought of netif_stop_queue() as the replacement of the old
tbusy flag, signaling a transient condition where the HW is unable to
keep up with the flow of packets. And it seems to be used for just this
in most cases.
Perhaps somebody confused netif_stop_queue with dev_deactivate() ??
OK, another view on this: isn't it problematic to have skb's stuck in
the network stack "indefinitely"?
They hold references to a dst_entry and a sock (and probably more).
So how about this for the FAQ:
Q: Why can't I unload the af_packet module?
A: Ohh, you'll have to plug in the darn cable to eth0 first!
*Please* tell me I've got this all wrong.
-Tommy
* Re: [patch 4/10] s390: network driver.
2004-12-21 0:13 ` Tommy Christensen
@ 2004-12-21 1:19 ` Jeff Garzik
0 siblings, 0 replies; 67+ messages in thread
From: Jeff Garzik @ 2004-12-21 1:19 UTC (permalink / raw)
To: Tommy Christensen
Cc: hadi, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu,
netdev, Paul Jakma
Tommy Christensen wrote:
> Jeff Garzik wrote:
>
>> I haven't heard anything to convince me that the same change should be
>> deployed across NNN drivers. The drivers already signal the net core
>> that the link is down; to me, that implies there should be code in
>> _one_ place that handles this condition, not NNN places.
>
>
> AFAICS only a handful of (newer) drivers call netif_stop_queue() directly.
> Others may do this indirectly if the MAC stops taking packets from the
> DMA ringbuffer. At least some MAC's/drivers certainly don't.
Incorrect. Use of netif_stop_queue() is -required- to signal that the
hardware cannot accept any more skbs from the system.
Far more than a "handful" of drivers use it, and it is required for
all but a few very strange ones.
> OK, another view on this: isn't it problematic to have skb's stuck in
> the network stack "indefinitely" ?
> They hold references to a dst_entry and a sock (and probably more).
> So how about this for the FAQ:
> Q: Why can't I unload the af_packet module?
> A: Ohh, you'll have to plug in the darn cable to eth0 first!
> *Please* tell me, I've got this all wrong.
You've got this all wrong.
Jeff
* Re: [patch 4/10] s390: network driver.
2004-12-20 18:54 ` Jeff Garzik
2004-12-21 0:13 ` Tommy Christensen
@ 2004-12-22 10:56 ` Thomas Spatzier
2004-12-22 11:07 ` Jeff Garzik
2004-12-22 13:48 ` jamal
1 sibling, 2 replies; 67+ messages in thread
From: Thomas Spatzier @ 2004-12-22 10:56 UTC (permalink / raw)
To: Jeff Garzik
Cc: David S. Miller, hadi, Hasso Tepper, Herbert Xu, netdev,
Paul Jakma, Tommy Christensen
Jeff Garzik <jgarzik@pobox.com> wrote on 20.12.2004 19:54:53:
> I haven't heard anything to convince me that the same change should be
> deployed across NNN drivers. The drivers already signal the net core
> that the link is down; to me, that implies there should be code in _one_
> place that handles this condition, not NNN places.
>
This sounds plausible and I'm with Jeff here.
For me as the driver author it's the smallest change.
I will put it like this:
on cable gone:
netif_stop_queue();
netif_carrier_off();
on cable reconnected:
netif_carrier_on();
netif_wake_queue();
Is that OK, i.e. what all drivers do or should do?
For the problems that applications might have (i.e. sockets being
blocked etc.) another solution should be found. And as Jeff pointed
out, this should be a central solution, not implemented in each
driver.
Regards,
Thomas.
* Re: [patch 4/10] s390: network driver.
2004-12-22 10:56 ` Thomas Spatzier
@ 2004-12-22 11:07 ` Jeff Garzik
2004-12-22 13:48 ` jamal
1 sibling, 0 replies; 67+ messages in thread
From: Jeff Garzik @ 2004-12-22 11:07 UTC (permalink / raw)
To: Thomas Spatzier
Cc: David S. Miller, hadi, Hasso Tepper, Herbert Xu, netdev,
Paul Jakma, Tommy Christensen
On Wed, Dec 22, 2004 at 11:56:06AM +0100, Thomas Spatzier wrote:
> on cable reconnected:
> netif_carrier_on();
> netif_wake_queue();
Side note to all driver authors: make sure you only ever start or
wake the queue if there is truly space available for another skb.
Jeff
* Re: [patch 4/10] s390: network driver.
2004-12-22 10:56 ` Thomas Spatzier
2004-12-22 11:07 ` Jeff Garzik
@ 2004-12-22 13:48 ` jamal
2005-01-03 9:10 ` Thomas Spatzier
1 sibling, 1 reply; 67+ messages in thread
From: jamal @ 2004-12-22 13:48 UTC (permalink / raw)
To: Thomas Spatzier
Cc: Jeff Garzik, David S. Miller, Hasso Tepper, Herbert Xu, netdev,
Paul Jakma, Tommy Christensen
On Wed, 2004-12-22 at 05:56, Thomas Spatzier wrote:
>
> Is that ok, i.e. what all drivers do or should do?
>
> For the problems that applications might have (i.e. sockets being
> blocked etc.) another solution should be found.
> And as Jeff pointed out, this should be a central solution and
> not be implemented in drivers.
>
I think this needs to be resolved too.
It is possible to have a centralized action, instead of requiring
drivers to make changes, if we know the driver is in the
netif_carrier_off state. What that would require is: on cable gone,
you just say
netif_carrier_off();
and the top layer code will junk the packets before they hit the
driver. This way the socket code can continue sending whatever it
wants, but if there's no link, then it's fair to drop those packets?
If this is acceptable I can generate a quick patch.
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
2004-12-19 23:46 ` Tommy Christensen
` (2 preceding siblings ...)
2004-12-20 14:16 ` Paul Jakma
@ 2004-12-26 5:36 ` Herbert Xu
3 siblings, 0 replies; 67+ messages in thread
From: Herbert Xu @ 2004-12-26 5:36 UTC (permalink / raw)
To: Tommy Christensen
Cc: hadi, thomas.spatzier, davem, hasso, herbert, jgarzik, netdev,
paul
Tommy Christensen <tommy.christensen@tpack.net> wrote:
>
> OK. So is this the recommendation for these poor souls?
>
> - Use a socket for each device.
Please show us your program first. Then we can figure out
whether this is necessary or not.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [patch 4/10] s390: network driver.
2004-12-22 13:48 ` jamal
@ 2005-01-03 9:10 ` Thomas Spatzier
2005-01-03 15:05 ` jamal
0 siblings, 1 reply; 67+ messages in thread
From: Thomas Spatzier @ 2005-01-03 9:10 UTC (permalink / raw)
To: hadi
Cc: David S. Miller, Hasso Tepper, Herbert Xu, Jeff Garzik, netdev,
Paul Jakma, Tommy Christensen
jamal <hadi@cyberus.ca> wrote on 22.12.2004 14:48:28:
> I think this needs to be resolved too.
> It is possible to have a centralized action instead of requiring drivers
> to make changes if we know the state of the driver is in netcarrier_off.
> What that would require is
> on cable gone, you just say:
> netif_carrier_off();
> and the top layer code will junk the packets before they hit the driver.
> This way the socket code can continue sending whatever it wants but if
> theres no link, then its fair to drop those packets?
>
> If this acceptable i can generate a quick patch.
Does Jamal's solution sound good for all? Then I would change my driver
to do the following:
just call netif_carrier_off() (not netif_stop_queue).
Then the upper layers will do the proper thing, so I should not
get any more packets. If I still get some, I will drop them.
Did I understand this correctly?
BTW: A happy new year to all ;-)
Regards,
Thomas.
* Re: [patch 4/10] s390: network driver.
2005-01-03 9:10 ` Thomas Spatzier
@ 2005-01-03 15:05 ` jamal
2005-01-04 23:28 ` Jeff Garzik
0 siblings, 1 reply; 67+ messages in thread
From: jamal @ 2005-01-03 15:05 UTC (permalink / raw)
To: Thomas Spatzier
Cc: David S. Miller, Hasso Tepper, Herbert Xu, Jeff Garzik, netdev,
Paul Jakma, Tommy Christensen
The change is simple if there's consensus to go down this path.
cheers,
jamal
On Mon, 2005-01-03 at 04:10, Thomas Spatzier wrote:
>
>
> jamal <hadi@cyberus.ca> wrote on 22.12.2004 14:48:28:
> > I think this needs to be resolved too.
> > It is possible to have a centralized action instead of requiring drivers
> > to make changes if we know the state of the driver is in netcarrier_off.
> > What that would require is
> > on cable gone, you just say:
> > netif_carrier_off();
> > and the top layer code will junk the packets before they hit the driver.
> > This way the socket code can continue sending whatever it wants but if
> > theres no link, then its fair to drop those packets?
> >
> > If this acceptable i can generate a quick patch.
>
> Does Jamal's solution sound good for all? Then I would change my driver
> to do the following:
> just call netif_carrier_off() (not netif_stop_queue)
>
> Then the upper layers will do the propper thing, so I should not
> get any more packets. If I still get some, I will drop them.
> Did I get this correctly?
>
> BTW: A happy new year to all ;-)
>
> Regards,
> Thomas.
>
>
* Re: [patch 4/10] s390: network driver.
2005-01-03 15:05 ` jamal
@ 2005-01-04 23:28 ` Jeff Garzik
2005-01-05 3:19 ` jamal
2005-01-05 6:26 ` Paul Jakma
0 siblings, 2 replies; 67+ messages in thread
From: Jeff Garzik @ 2005-01-04 23:28 UTC (permalink / raw)
To: hadi
Cc: Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu,
netdev, Paul Jakma, Tommy Christensen
jamal wrote:
> The change is simple if theres consensus to go this path.
Can you resend your patch?
My main objection was that any change should be made in the core, not in
individual net drivers.
Another objection was that it seemed that some of the proposed solutions
for clearing the queue on link down were imposing app-specific policy on
the overall kernel.
Aren't there cases where people would -not- want the queue to be cleared?
Jeff
* Re: [patch 4/10] s390: network driver.
2005-01-04 23:28 ` Jeff Garzik
@ 2005-01-05 3:19 ` jamal
2005-01-05 6:30 ` Paul Jakma
2005-01-05 15:35 ` Tommy Christensen
2005-01-05 6:26 ` Paul Jakma
1 sibling, 2 replies; 67+ messages in thread
From: jamal @ 2005-01-05 3:19 UTC (permalink / raw)
To: Jeff Garzik
Cc: Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu,
netdev, Paul Jakma, Tommy Christensen
[-- Attachment #1: Type: text/plain, Size: 1005 bytes --]
On Tue, 2005-01-04 at 18:28, Jeff Garzik wrote:
> jamal wrote:
> > The change is simple if theres consensus to go this path.
>
> Can you resend your patch?
I didn't send any patch - but here's one that looks right - haven't tried
compiling it.
> My main objection was that any change should be made in the core, not in
> individual net drivers.
The attached patch resolves that concern.
> Another objection was that it seemed that some of the proposed solutions
> for clearing the queue on link down were imposing app-specific policy on
> the overall kernel.
>
> Aren't there cases where people would -not- want the queue to be cleared?
That's a good question, and your point about imposing policy in the kernel
is valid. I know that I would probably want packets to appear to be going
out when the wire is pulled from underneath me.
There are possibly people who would want it differently - so far we haven't
heard from them. Maybe poll far and wide - or push it in and wait for
them to whine.
cheers,
jamal
[-- Attachment #2: nc.p --]
[-- Type: text/plain, Size: 512 bytes --]
--- 2610-bk1/net/sched/sch_generic.c 2005/01/05 01:21:04 1.1
+++ 2610-bk1/net/sched/sch_generic.c 2005/01/05 01:36:26
@@ -152,7 +152,7 @@
spin_lock(&dev->queue_lock);
goto collision;
}
- }
+ }
/* NETDEV_TX_BUSY - we need to requeue */
/* Release the driver */
@@ -162,6 +162,11 @@
}
spin_lock(&dev->queue_lock);
q = dev->qdisc;
+ if (!netif_carrier_ok(dev)) {
+ kfree_skb(skb);
+ q->qstats.drops++;
+ return -1;
+ }
}
/* Device kicked us out :(
* Re: [patch 4/10] s390: network driver.
2005-01-04 23:28 ` Jeff Garzik
2005-01-05 3:19 ` jamal
@ 2005-01-05 6:26 ` Paul Jakma
1 sibling, 0 replies; 67+ messages in thread
From: Paul Jakma @ 2005-01-05 6:26 UTC (permalink / raw)
To: Jeff Garzik
Cc: hadi, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu,
netdev, Tommy Christensen
On Tue, 4 Jan 2005, Jeff Garzik wrote:
> Aren't there cases where people would -not- want the queue to be
> cleared?
Why *do* you want the queue to be cleared? (Avoiding an in-NIC /dev/null
is the only reason, right?)
TCP (AIUI) has its own means to resend.
Most (all?) other transports have *never* had an expectation of such
reliability. Applications *know* they have to provide their own
reliability. Queueing and later sending (by then) stale packets is
*bad*.
Think:
- heartbeat applications
- Router or route advertisements (IPv6 RA, IPv4 ICMP IRDP, RIP)
- Streaming media (ok, not much damage here, but still there's simply
no point queueing while link is down..)
Why queue packets on behalf of applications which have no expectation
of the kernel doing this (any application expecting such is certainly
broken), and which very likely implement their own reliability or
packet-loss recovery strategies? Particularly when queueing is quite
possibly *detrimental* to what the application is trying to achieve
(timely sending of packets).
If an application wants reliable sending of packets, it will be using
TCP..
> Jeff
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
When the speaker and he to whom he is speaks do not understand, that is
metaphysics.
-- Voltaire
* Re: [patch 4/10] s390: network driver.
2005-01-05 3:19 ` jamal
@ 2005-01-05 6:30 ` Paul Jakma
2005-01-05 13:16 ` jamal
2005-01-05 15:35 ` Tommy Christensen
1 sibling, 1 reply; 67+ messages in thread
From: Paul Jakma @ 2005-01-05 6:30 UTC (permalink / raw)
To: jamal
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Tommy Christensen
On Wed, 4 Jan 2005, jamal wrote:
> Theres possibly people who would want it differently - so for we
> havent heard from them. Maybe poll far and wide - or push it in and
> wait for them to whine.
We're whining.
This policy in the Linux kernel makes using Linux dangerous for
certain routing applications (RIP, routing advertisements).
Name a non-TCP application that *does* want this newish kernel policy
and tell me how it breaks without this *new* policy. ;)
> cheers,
> jamal
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
My only love sprung from my only hate!
Too early seen unknown, and known too late!
-- William Shakespeare, "Romeo and Juliet"
* Re: [patch 4/10] s390: network driver.
2005-01-05 6:30 ` Paul Jakma
@ 2005-01-05 13:16 ` jamal
2005-01-05 14:29 ` Paul Jakma
0 siblings, 1 reply; 67+ messages in thread
From: jamal @ 2005-01-05 13:16 UTC (permalink / raw)
To: Paul Jakma
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Tommy Christensen
On Wed, 2005-01-05 at 01:30, Paul Jakma wrote:
> On Wed, 4 Jan 2005, jamal wrote:
>
> > Theres possibly people who would want it differently - so for we
> > havent heard from them. Maybe poll far and wide - or push it in and
> > wait for them to whine.
>
> We're whining.
>
> This policy in the Linux kernel makes using Linux dangerous for
> certain routing applications (RIP, routing advertisements).
>
> Name a non-TCP application that *does* want this newish kernel policy
> and tell me how it breaks without this *new* policy. ;)
OK, I am confused - I thought you guys _wanted this_ ;->
The issue is about message obsolescence more than it is about reliability.
Without this, the scenario you played out for us before was:
1)=>send a few LSAs from user space, pull cable before they go out
(this will happen if you are sending sufficiently large amounts of
packets, i.e. the network is busy),
2)=>user space gets notified via netlink, device shuts down access to
DMA, packets are queued anyway and you have no ability to say "oops,
sorry, take that packet back"
a) You could move to another device at this point.
or
b) a dumb app will continue sending
3)=>plug cable back in 2 minutes later, obsolete LSAs are sent followed
by any new ones that may follow
....
With the patch, packets in #2 will be dropped. As a matter of fact,
within those two minutes, if stopped, it is probable the device watchdog
timer will kick in and flush the DMA but not the scheduler queues above
it (which is where up to 1000 stale packets could be sitting).
What is it that you don't like now?
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
2005-01-05 13:16 ` jamal
@ 2005-01-05 14:29 ` Paul Jakma
2005-01-06 13:55 ` jamal
0 siblings, 1 reply; 67+ messages in thread
From: Paul Jakma @ 2005-01-05 14:29 UTC (permalink / raw)
To: jamal
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Tommy Christensen
On Wed, 5 Jan 2005, jamal wrote:
> Ok, Iam confused - I thought you guys _wanted this_ ;->
I'm confused too now. We don't want "queue packets" - that's the
'newish' policy I was referring to. (Newish in quotes, as I'm not
sure when this behaviour dates from.)
> The issue is about message obsolency more than it is about reliability.
Right, exactly.
So we do want what you think we want ;) (i think).
> Without this the scenario you played you played for us before was:
>
> 1)=>send a few LSAs from user space, pull cable before they go out
> (this will happen if you are sending sufficiently large amounts of
> packets i.e network is busy),
> 2)=>user space gets notified via netlink, device shuts down acces to
> DMA, packets queued anyways and you have no ability to say "ooops,sorry
> take that packet back"
Right.
Two things:
- we noticed this behaviour because of OSPF
Users reported ospfd would cease to send packets on all interfaces
(with certain drivers) because /one/ interface was link-down.
We can work around this easily by opening a socket per interface - at
present we simply punt OSPF packets down a single raw socket and rely
on IP_HDRINCL to have the kernel route the packet out the correct
interface (IP_MULTICAST_IF for multicast-destined packets).
- the queueing does not affect OSPF terribly; it would affect other
protocols though
OSPF implements its own 'synchronisation' facilities between
neighbours and can easily 'detect' obsolescent packets. So the
obsolescence issue does not affect it: routing information in stale
packets will not propagate, so they can't do much damage really (it is
just unnecessary to queue and send such packets).
However, other commonly used protocols are not as robust. Mostly
those where a protocol is used to distribute routing information to
passive listeners, eg:
- RIP
- IPv4 ICMP based router-discovery (IRDP)
- IPv6 Router-advertisements
In these cases, the queueing behaviour is potentially dangerous and
could disrupt connectivity by propagating no-longer-valid routing
information.
> a)You could do a move to another device at this point.
> or
> b) dumb app will continue sending
>
> 3)=>plug cable back in 2 minutes later, obsolete LSAs sent followed by
> any new ones that may follow
Right. Except OSPF is robust enough against stale packets. Other
protocols are not.
> With the patch, packets in #2 will be dropped.
Perfect.
> As a matter of fact within those two minutes, if stopped, it is
> probable the device watchdog timer will kick in and flush the DMA
> but not the scheduler queues above it (which is where upto a 1000
> stale packets could be sitting).
Right, and our argument is that it doesn't make sense to send those
packets. I've never heard of any UDP and/or raw application that expected
the kernel to queue packets if they could not be sent for lack of link or
some other problem, and any which did are surely broken by definition? ;)
> What is it that you dont like now?
Sorry, wires crossed re "new behaviour". The "new new" behaviour in
the patch as you describe would be perfect.
PS: Another issue: could we have kernel-space IP fragmentation for
IP_HDRINCL sockets, please? We currently have to implement
fragmentation ourselves, which seems silly given that the kernel already
has this functionality.
> cheers,
> jamal
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
The early bird who catches the worm works for someone who comes in late
and owns the worm farm.
-- Travis McGee
* Re: [patch 4/10] s390: network driver.
2005-01-05 3:19 ` jamal
2005-01-05 6:30 ` Paul Jakma
@ 2005-01-05 15:35 ` Tommy Christensen
2005-01-06 13:58 ` jamal
1 sibling, 1 reply; 67+ messages in thread
From: Tommy Christensen @ 2005-01-05 15:35 UTC (permalink / raw)
To: hadi
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Paul Jakma
[-- Attachment #1: Type: text/plain, Size: 748 bytes --]
jamal wrote:
> On Tue, 2005-01-04 at 18:28, Jeff Garzik wrote:
>
>>jamal wrote:
>>
>>>The change is simple if theres consensus to go this path.
>>
>>Can you resend your patch?
>
>
> I didnt send any patch - but heres one that looks right - havent tried
> compiling it.
Thank you for diving into this, Jamal.
For the patch to have much effect, we need to check the carrier before
calling hard_start_xmit(). Like in the modified patch below.
>>My main objection was that any change should be made in the core, not in
>>individual net drivers.
>
>
> Attached patch resolves that concern
Except for the drivers that call netif_stop_queue() on link-down. These
calls (and the corresponding netif_wake_queue) would have to be removed.
-Tommy
[-- Attachment #2: sch_generic.c.patch --]
[-- Type: text/plain, Size: 615 bytes --]
--- linux-2.6.10-bk7/net/sched/sch_generic.c Wed Jan 5 16:27:13 2005
+++ linux-2.6.10-work/net/sched/sch_generic.c Wed Jan 5 16:28:53 2005
@@ -134,7 +134,7 @@
/* And release queue */
spin_unlock(&dev->queue_lock);
- if (!netif_queue_stopped(dev)) {
+ if (!netif_queue_stopped(dev) && netif_carrier_ok(dev)) {
int ret;
if (netdev_nit)
dev_queue_xmit_nit(skb, dev);
@@ -162,6 +162,11 @@
}
spin_lock(&dev->queue_lock);
q = dev->qdisc;
+ if (!netif_carrier_ok(dev)) {
+ kfree_skb(skb);
+ q->qstats.drops++;
+ return -1;
+ }
}
/* Device kicked us out :(
* Re: [patch 4/10] s390: network driver.
2005-01-05 14:29 ` Paul Jakma
@ 2005-01-06 13:55 ` jamal
0 siblings, 0 replies; 67+ messages in thread
From: jamal @ 2005-01-06 13:55 UTC (permalink / raw)
To: Paul Jakma
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Tommy Christensen
On Wed, 2005-01-05 at 09:29, Paul Jakma wrote:
> On Wed, 5 Jan 2005, jamal wrote:
[..]
> Sorry, wires crossed re "new behaviour". The "new new" behaviour in
> the patch as you describe would be perfect.
>
> PS: Another issue, could we have kernel space IP fragmentation for
> IP_HDRINCL sockets please? We currently have to implement
> fragmentation ourselves, which seems silly given that kernel already
> has this functionality.
I will let some other fireman grab this bait ;->
If not, I will revisit it after an unknown/random timeout.
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
2005-01-05 15:35 ` Tommy Christensen
@ 2005-01-06 13:58 ` jamal
2005-01-06 15:06 ` Tommy Christensen
0 siblings, 1 reply; 67+ messages in thread
From: jamal @ 2005-01-06 13:58 UTC (permalink / raw)
To: Tommy Christensen
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Paul Jakma
On Wed, 2005-01-05 at 10:35, Tommy Christensen wrote:
> jamal wrote:
>
> Except for the drivers that call netif_stop_queue() on link-down. These
> calls (and the corresponding netif_wake_queue) would have to be removed.
If we assume all drivers do:
netif_stop then carrier_off, then you don't need that extra check.
That's the working assumption I had - maybe a comment is deserved, or
we could say we don't think that all drivers are going to follow that
sequence.
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
2005-01-06 13:58 ` jamal
@ 2005-01-06 15:06 ` Tommy Christensen
2005-01-07 13:32 ` jamal
0 siblings, 1 reply; 67+ messages in thread
From: Tommy Christensen @ 2005-01-06 15:06 UTC (permalink / raw)
To: hadi
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Paul Jakma
On Thu, 2005-01-06 at 14:58, jamal wrote:
> On Wed, 2005-01-05 at 10:35, Tommy Christensen wrote:
> > jamal wrote:
>
> >
> > Except for the drivers that call netif_stop_queue() on link-down. These
> > calls (and the corresponding netif_wake_queue) would have to be removed.
>
> If we assume all drivers do:
> netif_stop then carrier_off then you dont need that extra check.
> Thats the working assumption i had - maybe a comment is deserving or
> we could say we dont think that all drivers are going to follow that
> sequence.
But qdisc_restart() isn't called any more after the queue is
stopped. So how do we get to drain the packets?
Another approach could be to reset the qdisc (kind of what
dev_deactivate does) if the driver stays in queue_stopped
and carrier_off for some period of time.
-Tommy
* Re: [patch 4/10] s390: network driver.
2005-01-06 15:06 ` Tommy Christensen
@ 2005-01-07 13:32 ` jamal
2005-01-07 15:26 ` Tommy Christensen
0 siblings, 1 reply; 67+ messages in thread
From: jamal @ 2005-01-07 13:32 UTC (permalink / raw)
To: Tommy Christensen
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Paul Jakma
On Thu, 2005-01-06 at 10:06, Tommy Christensen wrote:
> But qdisc_restart() isn't called any more after the queue is
> stopped. So how do we get to drain the packets?
We return -1, so qdisc_restart() will be called until there are no packets
left on the queue. Did I miss something in the state machine there?
> Another approach could be to reset the qdisc (kind of what
> dev_deactivate does) if the driver stays in queue_stopped
> and carrier_off for some period of time.
>
Resetting the qdisc could certainly be part of the device watchdog.
Note that once the packets are drained, the next thing that will happen,
if the device is stopped for a dev->timeout period, is the watchdog
kicking in and flushing the DMA. So you could reset your qdisc there,
though I am not sure it will be needed with that patch.
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
2005-01-07 13:32 ` jamal
@ 2005-01-07 15:26 ` Tommy Christensen
2005-01-10 13:18 ` jamal
0 siblings, 1 reply; 67+ messages in thread
From: Tommy Christensen @ 2005-01-07 15:26 UTC (permalink / raw)
To: hadi
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Paul Jakma
On Fri, 2005-01-07 at 14:32, jamal wrote:
> On Thu, 2005-01-06 at 10:06, Tommy Christensen wrote:
>
> > But qdisc_restart() isn't called any more after the queue is
> > stopped. So how do we get to drain the packets?
>
> We return -1; so qdisc restart will be called until theres no packets
> left on the queue. Did i miss something in the state machine there?
Yes, qdisc_run does this
while (!netif_queue_stopped(dev) && qdisc_restart(dev) < 0)
/* NOTHING */;
> > Another approach could be to reset the qdisc (kind of what
> > dev_deactivate does) if the driver stays in queue_stopped
> > and carrier_off for some period of time.
> >
>
> reseting qdisc could certainly be part of the device watchdog.
> Note that once the packets are drained, the next thing that will happen
> if device is stopped for a dev->timeout period is watchdog kicking in
> and flushing the DMA. So you could reset your qdisc there though i am
> not sure it will be needed with that patch.
Unfortunately the watchdog won't do this when carrier is off, which
is fair enough since the chip isn't hung as such.
So, we would need another method to flush out stale packets from
the DMA ring. Some drivers already do this themselves, and some won't
need it because they keep on running even when link is down.
-Tommy
* Re: [patch 4/10] s390: network driver.
2005-01-07 15:26 ` Tommy Christensen
@ 2005-01-10 13:18 ` jamal
2005-01-16 23:10 ` jamal
0 siblings, 1 reply; 67+ messages in thread
From: jamal @ 2005-01-10 13:18 UTC (permalink / raw)
To: Tommy Christensen
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Paul Jakma
On Fri, 2005-01-07 at 10:26, Tommy Christensen wrote:
> On Fri, 2005-01-07 at 14:32, jamal wrote:
> > On Thu, 2005-01-06 at 10:06, Tommy Christensen wrote:
[..]
> > reseting qdisc could certainly be part of the device watchdog.
> > Note that once the packets are drained, the next thing that will happen
> > if device is stopped for a dev->timeout period is watchdog kicking in
> > and flushing the DMA. So you could reset your qdisc there though i am
> > not sure it will be needed with that patch.
>
> Unfortunately the watchdog won't do this when carrier is off, which
> is fair enough since the chip isn't hung as such.
We may have to kick the dog in that path as well then.
> So, we would need another method to flush out stale packets from
> the DMA ring. Some drivers already do this themselves, and some won't
> need it because they keep on running even when link is down.
Kicking tx_timeout seemed like the right thing to do until I looked at
the drivers. Sigh. I just stared at a few drivers and it is not pleasant
- and they mostly have the same theme (so you can't say they are
inconsistent).
Looking at e1000, for example, is quite interesting - it actually brings
the device down and then up ;-> This will both flush the DMA and reset
the qdisc, but will also lose any static routes. It will also reschedule
the device ;-<
So I am not so sure that the dog kicking in a minute later is a good
thing at all. We may need a lot of janitorial work to cleanse the
drivers - something jgarzik is against.
A qdisc_reset, as you suggest, will probably not cure the problem but
provide some cosmetics. Thoughts?
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
2005-01-10 13:18 ` jamal
@ 2005-01-16 23:10 ` jamal
2005-01-17 12:04 ` Hasso Tepper
2005-01-17 21:38 ` Tommy Christensen
0 siblings, 2 replies; 67+ messages in thread
From: jamal @ 2005-01-16 23:10 UTC (permalink / raw)
To: Tommy Christensen
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Paul Jakma
Didn't see any more discussion on this - have we given up on it?
cheers,
jamal
On Mon, 2005-01-10 at 08:18, jamal wrote:
> On Fri, 2005-01-07 at 10:26, Tommy Christensen wrote:
> > On Fri, 2005-01-07 at 14:32, jamal wrote:
> > > On Thu, 2005-01-06 at 10:06, Tommy Christensen wrote:
>
> [..]
>
> > > reseting qdisc could certainly be part of the device watchdog.
> > > Note that once the packets are drained, the next thing that will happen
> > > if device is stopped for a dev->timeout period is watchdog kicking in
> > > and flushing the DMA. So you could reset your qdisc there though i am
> > > not sure it will be needed with that patch.
> >
> > Unfortunately the watchdog won't do this when carrier is off, which
> > is fair enough since the chip isn't hung as such.
>
> We may have to kick the dog in that path as well then.
>
> > So, we would need another method to flush out stale packets from
> > the DMA ring. Some drivers already do this themselves, and some won't
> > need it because they keep on running even when link is down.
>
> kicking tx_timeout seems like the right thing to do until i looked at
> the drivers. Sigh. I just stared at a few drivers and it is not pleasant
> - and they mostly have the same theme (so you cant say they are
> inconsistent).
> Looking at e1000 for example is quiet interesting - it actually brings
> down then up the device ;-> This will both flash the dma and reset qdisc
> but also loose any static routes. It will also reschedule the device;-<
> So i am not so sure that the dog kicking in a minute later is a good
> thing at all. We may need a lot of janitorial work to cleanse the
> drivers - something jgarzik is against.
>
> A qdisc_reset, as you suggest, will probably not cure the problem but
> provide some cosmetics. Thoughts?
>
> cheers,
> jamal
>
>
>
* Re: [patch 4/10] s390: network driver.
2005-01-16 23:10 ` jamal
@ 2005-01-17 12:04 ` Hasso Tepper
2005-01-17 22:04 ` Tommy Christensen
2005-01-17 21:38 ` Tommy Christensen
1 sibling, 1 reply; 67+ messages in thread
From: Hasso Tepper @ 2005-01-17 12:04 UTC (permalink / raw)
To: hadi
Cc: Tommy Christensen, Jeff Garzik, Thomas Spatzier, David S. Miller,
Herbert Xu, netdev, Paul Jakma
jamal wrote:
> Didnt see anymore discussion on this - have we given up on it?
I'm not competent enough about kernel internals to decide how this should
be done in the kernel. But I would like to clarify the situation for me as
a Quagga developer.
All this work is done to avoid sending stale packets to the network? I.e.
there will be no change regarding the socket-blocking issue? Is detecting
carrier on/off and/or a socket per interface a must for us in user space
to avoid socket blocking? I'm just trying to get the full picture so I can
put my priorities in place for the next release.
Thanks for your support,
--
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
* Re: [patch 4/10] s390: network driver.
2005-01-16 23:10 ` jamal
2005-01-17 12:04 ` Hasso Tepper
@ 2005-01-17 21:38 ` Tommy Christensen
2005-01-30 23:39 ` Tommy Christensen
1 sibling, 1 reply; 67+ messages in thread
From: Tommy Christensen @ 2005-01-17 21:38 UTC (permalink / raw)
To: hadi
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Paul Jakma
jamal wrote:
> Didnt see anymore discussion on this - have we given up on it?
It is not forgotten - just stalled ...
I'll try to look at it some more.
-Tommy
* Re: [patch 4/10] s390: network driver.
2005-01-17 12:04 ` Hasso Tepper
@ 2005-01-17 22:04 ` Tommy Christensen
2005-01-17 22:13 ` Peter Buckingham
0 siblings, 1 reply; 67+ messages in thread
From: Tommy Christensen @ 2005-01-17 22:04 UTC (permalink / raw)
To: Hasso Tepper
Cc: hadi, Jeff Garzik, Thomas Spatzier, David S. Miller, Herbert Xu,
netdev, Paul Jakma
Hasso Tepper wrote:
> All this work is done to avoid sending stale packets to the network? Ie.
> there will be no change regarding socket blocking issue? Detecting carrier
> on/off and/or socket per interface is must be for us in user space to avoid
> socket blocking? I'm just trying to have full picture so I can put my
> priorities in place for next release.
A proper solution would take care of both of these issues: "stale packets"
and "socket blocking". They are pretty much related.
Using a socket for each interface *ought* not to be needed.
Carrier detection sounds like a good thing in its own right.
Hope this helps,
Tommy
* Re: [patch 4/10] s390: network driver.
2005-01-17 22:04 ` Tommy Christensen
@ 2005-01-17 22:13 ` Peter Buckingham
2005-01-17 22:36 ` jamal
2005-01-17 22:53 ` Tommy Christensen
0 siblings, 2 replies; 67+ messages in thread
From: Peter Buckingham @ 2005-01-17 22:13 UTC (permalink / raw)
To: Tommy Christensen
Cc: Hasso Tepper, hadi, Jeff Garzik, Thomas Spatzier, David S. Miller,
Herbert Xu, netdev, Paul Jakma
Hi,
Tommy Christensen wrote:
> A proper solution would take care of both these issues: "stale packets"
> and "socket blocking". They are pretty much related.
>
> Using a socket for each interface *ought* not be needed.
> Carrier detection sounds like a good thing in its own right.
I came across this same problem with multicast/unicast with an e1000.
For a quick hack we just check to see whether the carrier is OK; if it's
not, we just drop the packet. This might do some nasty things with TCP,
and it may be better to check whether the socket buffer is full,
but this works for me here... (this is against a 2.6.9-ish kernel)
peter
Index: net/core/dev.c
===================================================================
--- net/core/dev.c
+++ net/core/dev.c
@@ -1379,6 +1379,11 @@
#ifdef CONFIG_NET_CLS_ACT
skb->tc_verd = SET_TC_AT(skb->tc_verd,AT_EGRESS);
#endif
+ if (!netif_carrier_ok(dev)) {
+ rc = NET_XMIT_DROP;
+ goto out_kfree_skb;
+ }
+
if (q->enqueue) {
/* Grab device queue */
spin_lock(&dev->queue_lock);
* Re: [patch 4/10] s390: network driver.
2005-01-17 22:13 ` Peter Buckingham
@ 2005-01-17 22:36 ` jamal
2005-01-17 22:53 ` Tommy Christensen
1 sibling, 0 replies; 67+ messages in thread
From: jamal @ 2005-01-17 22:36 UTC (permalink / raw)
To: Peter Buckingham
Cc: Tommy Christensen, Hasso Tepper, Jeff Garzik, Thomas Spatzier,
David S. Miller, Herbert Xu, netdev, Paul Jakma
Hi there,
The solution that Tommy and I have been batting around is a lot
cleaner than this (however, it still doesn't solve the issue 100%).
cheers,
jamal
On Mon, 2005-01-17 at 17:13, Peter Buckingham wrote:
> Hi,
>
> Tommy Christensen wrote:
> > A proper solution would take care of both these issues: "stale packets"
> > and "socket blocking". They are pretty much related.
> >
> > Using a socket for each interface *ought* not be needed.
> > Carrier detection sounds like a good thing in its own right.
>
> I came across this same problem with multicast/unicast with an e1000.
> For a quick hack we just check to see whether the carrier is ok, if it's
> not we just drop the packet. This might do some nasty things with TCP,
> and it may be better to check whether the socket buffer is full,
> but this works for me here... (this is against a 2.6.9-ish kernel)
>
> peter
>
> Index: net/core/dev.c
> ===================================================================
> --- net/core/dev.c
> +++ net/core/dev.c
> @@ -1379,6 +1379,11 @@
> #ifdef CONFIG_NET_CLS_ACT
> skb->tc_verd = SET_TC_AT(skb->tc_verd,AT_EGRESS);
> #endif
> + if (!netif_carrier_ok(dev)) {
> + rc = NET_XMIT_DROP;
> + goto out_kfree_skb;
> + }
> +
> if (q->enqueue) {
> /* Grab device queue */
> spin_lock(&dev->queue_lock);
>
>
* Re: [patch 4/10] s390: network driver.
2005-01-17 22:13 ` Peter Buckingham
2005-01-17 22:36 ` jamal
@ 2005-01-17 22:53 ` Tommy Christensen
1 sibling, 0 replies; 67+ messages in thread
From: Tommy Christensen @ 2005-01-17 22:53 UTC (permalink / raw)
To: Peter Buckingham
Cc: Hasso Tepper, hadi, Jeff Garzik, Thomas Spatzier, David S. Miller,
Herbert Xu, netdev, Paul Jakma
Peter Buckingham wrote:
> Hi,
>
> Tommy Christensen wrote:
>
>> A proper solution would take care of both these issues: "stale packets"
>> and "socket blocking". They are pretty much related.
>>
>> Using a socket for each interface *ought* not be needed.
>> Carrier detection sounds like a good thing in its own right.
>
>
> I came across this same problem with multicast/unicast with an e1000.
> For a quick hack we just check to see whether the carrier is ok, if it's
> not we just drop the packet. This might do some nasty things with TCP,
> and it may be better to check whether the socket buffer is full,
> but this works for me here... (this is against a 2.6.9-ish kernel)
Thanks for the input!
The downside of this solution is that it doesn't do anything with the
packets already queued for tx (in the qdisc queue and in the DMA ring).
OTOH it probably works just fine in real life (!).
Also, we may not want to handle all devices alike, as this patch does.
-Tommy
* Re: [patch 4/10] s390: network driver.
2005-01-17 21:38 ` Tommy Christensen
@ 2005-01-30 23:39 ` Tommy Christensen
2005-01-31 0:09 ` jamal
0 siblings, 1 reply; 67+ messages in thread
From: Tommy Christensen @ 2005-01-30 23:39 UTC (permalink / raw)
To: hadi
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Paul Jakma
Tommy Christensen wrote:
> jamal wrote:
>
>> Didn't see any more discussion on this - have we given up on it?
>
>
> It is not forgotten - just stalled ...
> I'll try to look at it some more.
Sorry for the long delay.
Jamal, do you think something like this could work?
Flushing of the DMA ring could be added if the need arises.
e1000 does this already in the driver on link-down.
-Tommy
[-- Attachment #2: carrier.patch --]
[-- Type: text/plain, Size: 1671 bytes --]
diff -ur linux-2.6.11-rc2-bk8/net/core/link_watch.c linux-2.6.11-work/net/core/link_watch.c
--- linux-2.6.11-rc2-bk8/net/core/link_watch.c 2005-01-30 22:08:44.000000000 +0100
+++ linux-2.6.11-work/net/core/link_watch.c 2005-01-31 00:17:45.716039247 +0100
@@ -16,6 +16,7 @@
#include <linux/netdevice.h>
#include <linux/if.h>
#include <net/sock.h>
+#include <net/pkt_sched.h>
#include <linux/rtnetlink.h>
#include <linux/jiffies.h>
#include <linux/spinlock.h>
@@ -74,6 +75,9 @@
clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
if (dev->flags & IFF_UP) {
+ if (netif_carrier_ok(dev) &&
+ dev->qdisc_sleeping != &noop_qdisc)
+ dev_activate(dev);
netdev_state_change(dev);
}
diff -ur linux-2.6.11-rc2-bk8/net/sched/sch_generic.c linux-2.6.11-work/net/sched/sch_generic.c
--- linux-2.6.11-rc2-bk8/net/sched/sch_generic.c 2005-01-30 22:11:50.458216314 +0100
+++ linux-2.6.11-work/net/sched/sch_generic.c 2005-01-31 00:06:47.609769835 +0100
@@ -185,6 +185,7 @@
static void dev_watchdog(unsigned long arg)
{
struct net_device *dev = (struct net_device *)arg;
+ int check_carrier = 0;
spin_lock(&dev->xmit_lock);
if (dev->qdisc != &noop_qdisc) {
@@ -198,10 +199,23 @@
}
if (!mod_timer(&dev->watchdog_timer, jiffies + dev->watchdog_timeo))
dev_hold(dev);
- }
+ } else
+ check_carrier = 1;
}
spin_unlock(&dev->xmit_lock);
+ if (check_carrier) {
+ spin_lock(&dev->queue_lock);
+ if (!netif_carrier_ok(dev) && netif_queue_stopped(dev)) {
+ struct Qdisc *qdisc;
+
+ qdisc = dev->qdisc;
+ dev->qdisc = &noop_qdisc;
+ qdisc_reset(qdisc);
+ }
+ spin_unlock(&dev->queue_lock);
+ }
+
dev_put(dev);
}
* Re: [patch 4/10] s390: network driver.
2005-01-30 23:39 ` Tommy Christensen
@ 2005-01-31 0:09 ` jamal
2005-01-31 0:12 ` jamal
2005-01-31 0:31 ` Tommy Christensen
0 siblings, 2 replies; 67+ messages in thread
From: jamal @ 2005-01-31 0:09 UTC (permalink / raw)
To: Tommy Christensen
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Paul Jakma
Tommy,
I wasn't sure why you did:
dev->qdisc = &noop_qdisc;
You should probably save the old qdisc in qdisc_sleeping instead
and restore it on wakeup - otherwise you always end up with the default qdisc.
check_carrier should probably just call dev_activate, which does all you
wanted, no?
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
2005-01-31 0:09 ` jamal
@ 2005-01-31 0:12 ` jamal
2005-01-31 0:31 ` Tommy Christensen
1 sibling, 0 replies; 67+ messages in thread
From: jamal @ 2005-01-31 0:12 UTC (permalink / raw)
To: Tommy Christensen
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Paul Jakma
On Sun, 2005-01-30 at 19:09, jamal wrote:
> Tommy,
>
> I wasn't sure why you did:
>
> dev->qdisc = &noop_qdisc;
>
> You should probably save the old qdisc in qdisc_sleeping instead
> and restore it on wakeup - otherwise you always end up with the default qdisc.
> check_carrier should probably just call dev_activate
I meant dev_deactivate - watch those locks though ..
cheers,
jamal
> which does all you
> wanted, no?
>
> cheers,
> jamal
* Re: [patch 4/10] s390: network driver.
2005-01-31 0:09 ` jamal
2005-01-31 0:12 ` jamal
@ 2005-01-31 0:31 ` Tommy Christensen
2005-01-31 3:26 ` jamal
1 sibling, 1 reply; 67+ messages in thread
From: Tommy Christensen @ 2005-01-31 0:31 UTC (permalink / raw)
To: hadi
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Paul Jakma
jamal wrote:
> Tommy,
>
> I wasn't sure why you did:
>
> dev->qdisc = &noop_qdisc;
>
> You should probably save the old qdisc in qdisc_sleeping instead
> and restore it on wakeup - otherwise you always end up with the default qdisc.
I think normally we will have qdisc == qdisc_sleeping. At least this
is how I read the code in dev_graft_qdisc(). When can they differ?
> check_carrier should probably just call dev_activate which does all you
> wanted, no?
I copied the logic from dev_deactivate(), but I didn't want to include
the waiting parts, since this is not in process context. OK?
-Tommy
* Re: [patch 4/10] s390: network driver.
2005-01-31 0:31 ` Tommy Christensen
@ 2005-01-31 3:26 ` jamal
2005-01-31 12:16 ` Tommy Christensen
0 siblings, 1 reply; 67+ messages in thread
From: jamal @ 2005-01-31 3:26 UTC (permalink / raw)
To: Tommy Christensen
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Paul Jakma
On Sun, 2005-01-30 at 19:31, Tommy Christensen wrote:
> jamal wrote:
> > Tommy,
> >
> > I wasn't sure why you did:
> >
> > dev->qdisc = &noop_qdisc;
> >
> > You should probably save the old qdisc in qdisc_sleeping instead
> > and restore it on wakeup - otherwise you always end up with the default qdisc.
>
> I think normally we will have qdisc == qdisc_sleeping. At least this
> is how I read the code in dev_graft_qdisc(). When can they differ?
You are correct.
> > check_carrier should probably just call dev_activate which does all you
> > wanted, no?
>
> I copied the logic from dev_deactivate(), but I didn't want to include
> the waiting parts, since this is not in process context. OK?
Same here.
It does look pretty sane.. Tested?
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
2005-01-31 3:26 ` jamal
@ 2005-01-31 12:16 ` Tommy Christensen
2005-03-13 17:49 ` Hasso Tepper
0 siblings, 1 reply; 67+ messages in thread
From: Tommy Christensen @ 2005-01-31 12:16 UTC (permalink / raw)
To: hadi
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper,
Herbert Xu, netdev, Paul Jakma
On Mon, 2005-01-31 at 04:26, jamal wrote:
> It does look pretty sane.. Tested?
Nope, I don't have access to any relevant HW. Hopefully someone
else can give it some beating. Please.
A simple test could be:
o start tcpdump on host B
o start pinging from host A to B
o unplug cable on host A
o wait a while and then plug the cable back in
This is what I get on an embedded PPC:
01:41:27.712729 IP 192.168.6.22 > 192.168.27.119: icmp 64: echo request seq 7
01:41:28.712697 IP 192.168.6.22 > 192.168.27.119: icmp 64: echo request seq 8
01:41:43.712240 IP 192.168.6.22 > 192.168.27.119: icmp 64: echo request seq 23
01:41:44.712210 IP 192.168.6.22 > 192.168.27.119: icmp 64: echo request seq 24
The interesting point is that the jump in sequence number coincides
with the jump in rx time. The theory is that not all drivers will behave
like this.
Hopefully someone can verify that with e.g. an e1000, and then check
whether the patch I sent makes any difference.
-Tommy
* Re: [patch 4/10] s390: network driver.
2005-01-31 12:16 ` Tommy Christensen
@ 2005-03-13 17:49 ` Hasso Tepper
0 siblings, 0 replies; 67+ messages in thread
From: Hasso Tepper @ 2005-03-13 17:49 UTC (permalink / raw)
To: Tommy Christensen
Cc: hadi, Jeff Garzik, Thomas Spatzier, David S. Miller, Herbert Xu,
netdev, Paul Jakma
Tommy Christensen wrote:
> On Mon, 2005-01-31 at 04:26, jamal wrote:
> > It does look pretty sane.. Tested?
>
> Nope, I don't have access to any relevant HW. Hopefully someone
> else can give it some beating. Please.
Sorry for the long delay. I lost track of this discussion, was busy with other
things, but finally found time to test this.
> Hopefully someone can verify that with e.g. an e1000, and then check
> whether the patch I send makes any difference.
Yes, confirmed that the patch makes a difference and solves the Quagga ospfd
problem (which is where this all started :).
Thank you very much!
PS. The patch is tested with 2.4.29 as well, but it requires the link-watch
stuff to work. The complete set (link-watch + VLAN carrier detection + this
fix) is here -
http://hasso.linux.ee/quagga/2.4.29-link-watch.patch
--
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
[not found] <OF88EC0E9F.DE8FC278-ONC1256F4A.0038D5C0-C1256F4A.00398E11@de.ibm.com>
2004-11-14 1:29 ` [patch 4/10] s390: network driver Jeff Garzik
2004-11-15 7:52 ` Paul Jakma
2004-11-21 8:16 ` Paul Jakma
2004-11-29 15:57 ` Thomas Spatzier
2004-11-29 16:30 ` Paul Jakma
2004-11-29 16:41 ` Thomas Spatzier
2004-11-29 20:27 ` Paul Jakma
2004-11-30 7:22 ` Thomas Spatzier
2004-12-05 6:25 ` Paul Jakma
2004-12-06 11:01 ` Post Network dev questions to netdev Please WAS(Re: " jamal
2004-12-06 11:27 ` jamal
2004-12-06 14:42 ` Hasso Tepper
2004-12-07 1:13 ` Herbert Xu
2004-12-07 2:22 ` jamal
2004-12-10 15:37 ` Paul Jakma
2004-12-14 7:40 ` Thomas Spatzier
2004-12-15 13:50 ` jamal
2004-12-15 15:03 ` Thomas Spatzier
2004-12-19 19:29 ` jamal
2004-12-19 22:29 ` Tommy Christensen
2004-12-19 23:05 ` jamal
2004-12-19 23:46 ` Tommy Christensen
2004-12-20 0:15 ` Jeff Garzik
2004-12-20 14:10 ` jamal
2004-12-20 18:54 ` Jeff Garzik
2004-12-21 0:13 ` Tommy Christensen
2004-12-21 1:19 ` Jeff Garzik
2004-12-22 10:56 ` Thomas Spatzier
2004-12-22 11:07 ` Jeff Garzik
2004-12-22 13:48 ` jamal
2005-01-03 9:10 ` Thomas Spatzier
2005-01-03 15:05 ` jamal
2005-01-04 23:28 ` Jeff Garzik
2005-01-05 3:19 ` jamal
2005-01-05 6:30 ` Paul Jakma
2005-01-05 13:16 ` jamal
2005-01-05 14:29 ` Paul Jakma
2005-01-06 13:55 ` jamal
2005-01-05 15:35 ` Tommy Christensen
2005-01-06 13:58 ` jamal
2005-01-06 15:06 ` Tommy Christensen
2005-01-07 13:32 ` jamal
2005-01-07 15:26 ` Tommy Christensen
2005-01-10 13:18 ` jamal
2005-01-16 23:10 ` jamal
2005-01-17 12:04 ` Hasso Tepper
2005-01-17 22:04 ` Tommy Christensen
2005-01-17 22:13 ` Peter Buckingham
2005-01-17 22:36 ` jamal
2005-01-17 22:53 ` Tommy Christensen
2005-01-17 21:38 ` Tommy Christensen
2005-01-30 23:39 ` Tommy Christensen
2005-01-31 0:09 ` jamal
2005-01-31 0:12 ` jamal
2005-01-31 0:31 ` Tommy Christensen
2005-01-31 3:26 ` jamal
2005-01-31 12:16 ` Tommy Christensen
2005-03-13 17:49 ` Hasso Tepper
2005-01-05 6:26 ` Paul Jakma
2004-12-20 14:16 ` Paul Jakma
2004-12-20 18:56 ` Jeff Garzik
2004-12-26 5:36 ` Herbert Xu
2004-12-19 22:43 ` Jeff Garzik
2004-12-19 23:54 ` Paul Jakma
2004-12-20 14:11 ` jamal
2004-12-07 2:39 ` jamal
2004-12-06 18:44 ` Paul Jakma