* Re: [patch 4/10] s390: network driver.
From: Jeff Garzik @ 2004-11-14 1:29 UTC
To: Thomas Spatzier; +Cc: linux-kernel, Netdev

Thomas Spatzier wrote:
>> You should be using netif_carrier_{on,off} properly, and not drop the
>> packets. When (if) link comes back, you requeue the packets to hardware
>> (or hypervisor or whatever). Your dev->stop() should stop operation and
>> clean up anything left in your send/receive {rings | buffers}.
>
> When we do not drop packets, but call netif_stop_queue, the write queues
> of all sockets associated with the net device are blocked as soon as they
> get full. This causes problems with programs such as the zebra routing
> daemon. So we have to keep the netif queue running in order to not block
> any programs.

This is very, very wrong. You are essentially creating an in-driver
/dev/null simulator. This does nothing but chew CPU cycles by having the
driver drop packets, rather than allowing the system to work as it should.
Queues are DESIGNED to fill up under various conditions.

Would not the zebra routing software have the same problems with cable
pull under an e1000 or tg3 gigabit NIC?

> We also had a look at some other drivers, and the common behaviour seems
> to be that packets are lost if the network cable is pulled out.

Which drivers, specifically, are these? The most popular drivers -- e1000,
tg3, etc. -- do not do this, for very good reasons.

	Jeff

^ permalink raw reply	[flat|nested] 67+ messages in thread
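The pattern Jeff describes can be sketched roughly as below. This is kernel-side code and will not compile outside a kernel tree; the function name `mydrv_link_change` and its call site are hypothetical, invented only to illustrate the netif_carrier_{on,off} / queue-control pairing, not taken from the s390 driver itself.

```c
/* Sketch only: hypothetical link-change handler for a driver that lets
 * the stack hold packets during link loss instead of dropping them. */
static void mydrv_link_change(struct net_device *dev, int link_up)
{
	if (link_up) {
		netif_carrier_on(dev);    /* tell the stack carrier is back */
		netif_wake_queue(dev);    /* resume taking packets from the qdisc */
	} else {
		netif_carrier_off(dev);   /* user space sees IFF_RUNNING cleared */
		netif_stop_queue(dev);    /* stack stops feeding hard_start_xmit */
	}
}
```

With this shape, the driver never manipulates IFF_RUNNING directly; the carrier calls drive that flag, and queued packets stay in the qdisc until the link returns or they are flushed.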
* Re: [patch 4/10] s390: network driver.
From: Paul Jakma @ 2004-11-15 7:52 UTC
To: Jeff Garzik; +Cc: Thomas Spatzier, linux-kernel, Netdev

On Sat, 13 Nov 2004, Jeff Garzik wrote:

> Queues are DESIGNED to fill up under various conditions.

What happens when they are full? Blocking isn't good, at least not from
GNU Zebra / Quagga's POV.

> Would not the zebra routing software have the same problems with
> cable pull under an e1000 or tg3 gigabit NIC?

If they block further writes, yes.

> The most popular drivers -- e1000, tg3, etc. -- do not do this, for
> very good reasons.

The problem is that GNU Zebra / Quagga uses a single raw socket for
certain protocols, eg OSPF. Blocking the socket because one NIC has a
cable pulled is undesirable, and there's no point queueing the packets;
by the time the link comes back, if ever, it's highly unlikely there is
any use in sending them (sending OSPF hellos from X minutes ago on a
link that just came back is useless).

The kernel really shouldn't get too much in the way of an application
that already is fully aware of reliability issues[1] - at least that is
the application's expectation in this case. We could change it to use a
socket per interface on Linux, but it seems a bit unnecessary to me, at
least for raw socket/included-header sockets.

> 	Jeff

1. An application must be, if it uses raw sockets, surely? Even for
non-raw/header-included sockets, eg BGP TCP sockets, a user like GNU
Zebra / Quagga would much prefer packets to be dropped.

regards,
--
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune: My interest is in the future because I am going to spend the
rest of my life there.
* Re: [patch 4/10] s390: network driver.
From: Paul Jakma @ 2004-11-21 8:16 UTC
To: Jeff Garzik; +Cc: Thomas Spatzier, linux-kernel, Netdev

On Mon, 15 Nov 2004, Paul Jakma wrote:

> non-raw/header-included sockets, eg BGP tcp sockets, a user like
> GNU Zebra / Quagga would much prefer packets to be dropped.

Ur... not for TCP, obviously.

Anyway, is there any advice on how applications that use a single socket
for raw/udp should deal with this new behaviour? All of the
link-orientated routing protocol daemons in Quagga/GNU Zebra are going
to break on Linux with this new behaviour.

Should such applications be changed to open a separate socket per
interface? Or could we have a SO_DROP_DONT_QUEUE sockopt to allow a
privileged application to retain the previous behaviour, or some way to
flush the queue for a socket?

Using a socket per interface won't address the problem of sending quite
stale packets when a link comes back after a long time down, AIUI. (Not
a huge problem - but not nice.)

Jeff???

regards,
--
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune: Be incomprehensible. If they can't understand, they can't
disagree.
* Re: [patch 4/10] s390: network driver.
From: Thomas Spatzier @ 2004-11-29 15:57 UTC
To: paul; +Cc: jgarzik, linux-kernel, netdev

> Using a socket per interface won't address the problem of sending
> quite stale packets when a link comes back after a long time down,
> AIUI. (Not a huge problem - but not nice.)
>
> Jeff???

Has there been any outcome of the discussion about whether or not a
device driver should drop packets when the cable is disconnected?

It seems that, from the zebra point of view, as Paul wrote, it would be
better not to block sockets by queueing up packets when there is no
cable connection. I also think that it does not make sense to keep
packets in the queue and then send those packets when the cable is
plugged in again after a possibly long time. There are protocols like
TCP that handle packet loss anyway.

Regards,
Thomas.
* Re: [patch 4/10] s390: network driver.
From: Paul Jakma @ 2004-11-29 16:30 UTC
To: Thomas Spatzier; +Cc: jgarzik, linux-kernel, netdev

On Mon, 29 Nov 2004, Thomas Spatzier wrote:

> Has there been any outcome of the discussion about whether or not a
> device driver should drop packets when the cable is disconnected?

There hasn't.

> It seems that from the zebra point of view, as Paul wrote, it would
> be better to not block sockets by queueing up packets when there is
> no cable connection.

I'd prefer either to get ENOBUFS or to have the kernel drop the packet -
we're privileged apps writing to raw sockets and/or with IP_HDRINCL; the
kernel should assume we know what we're doing (because if we don't,
there's far /worse/ we could do than send packets to an effective
/dev/null).

> I do also think that it does not make sense to keep packets in the
> queue and then send those packets when the cable is plugged in
> again after a possibly long time.

Well, if the kernel is going to queue these packets without notifying
us, we absolutely *must* have some way to flush those queues. Sending
stale packets many minutes after the application generated them could
have serious consequences for routing (eg, think of sending RIP, IPv4
IRDP or v6 RAs which are no longer valid - a client receives them,
installs routes which are long invalid and loses connectivity to some
part of the network).

So even if we moved to a socket per interface, we still need some way to
tell the kernel to flush a socket when we receive link-down over netlink
later. Not trying to queue on sockets where the app has no expectation
of reliability in the first place would be even better ;)

> There are protocols like TCP that handle packet loss anyway.

Yes.
I'd be very interested to hear advice from the kernel gurus (eg "god,
don't be so stupid, do xyz in your application instead"). We can
accommodate whatever the kernel wants as long as it's workable.

> Regards,
> Thomas.

regards,
--
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune: You will pay for your sins. If you have already paid, please
disregard this message.
* Re: [patch 4/10] s390: network driver.
From: Thomas Spatzier @ 2004-11-29 16:41 UTC
To: Paul Jakma; +Cc: jgarzik, linux-kernel, netdev

Paul Jakma <paul@clubi.ie> wrote on 29.11.2004 17:30:23:

> Well, if the kernel is going to queue these packets without notifying
> us, we absolutely *must* have some way to flush those queues. Sending
> stale packets many minutes after the application generated them could
> have serious consequences for routing (eg, think sending RIP, IPv4
> IRDP or v6 RAs which are no longer valid - client receives them and
> installs routes which are long invalid and loses connectivity to some
> part of the network).

Yes, for the examples you mentioned the app had better be notified.
However, AFAICS, there are no such notification mechanisms on a
per-packet basis implemented in the kernel. And I doubt that they are
going to be implemented.

> I'd be very interested to hear advice from the kernel gurus (eg "god,
> don't be so stupid, do xyz in your application instead"). We can
> accommodate whatever the kernel wants as long as it's workable.

Good suggestion; if anyone has an interesting and feasible solution I
will be happy to integrate it. So far, however, I don't see one, and I
would point people worried about lost packets to TCP.

Regards,
Thomas.
* Re: [patch 4/10] s390: network driver.
From: Paul Jakma @ 2004-11-29 20:27 UTC
To: Thomas Spatzier; +Cc: jgarzik, linux-kernel, netdev

On Mon, 29 Nov 2004, Thomas Spatzier wrote:

> Yes, for the examples you mentioned the app had better be notified.
> However, AFAICS, there are no such notification mechanisms on a
> per-packet basis implemented in the kernel. And I doubt that they
> are going to be implemented.

We do get notified. We get a netlink interface update with IFF_RUNNING
unset in the interface flags.

However, it can take time before zebra gets to read it, further time
before zebra sends its own interface update to protocol daemons, and
further time before they might get to read it and update their own
interface state. By which time those daemons may have sent packets down
those interfaces (packets which may become invalid before link comes
back, but which still will be sent and hence which potentially could
disrupt routing).

Ideally, we should be notified synchronously (EBUFS?) or, if that's not
possible, the packet should be dropped (see my presumption below,
though). The other solution is some means for a daemon to be able to
flush queues for a specific interface (but that's a bit ugly).

> Good suggestion, if anyone has an interesting and feasible solution
> I will be happy to integrate it. So far, however, I don't see one
> and I would point people worried about lost packets to TCP.

Surely TCP already was able to take care of retransmits? I'm not
familiar with Linux internals, but how did TCP cope with the previous
driver behaviour? (I get the vague impression that we're talking about a
driver-specific buffer here, rather than the application's socket buffer
- is that correct?)

Jeff, David, enlighten us please! :)

> Regards,
> Thomas.
regards,
--
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune: I can't understand why people are frightened of new ideas. I'm
frightened of the old ones. -- John Cage
* Re: [patch 4/10] s390: network driver.
From: Thomas Spatzier @ 2004-11-30 7:22 UTC
To: Paul Jakma; +Cc: jgarzik, linux-kernel, netdev

Paul Jakma <paul@clubi.ie> wrote on 29.11.2004 21:27:57:

> We do get notified. We get a netlink interface update with
> unset IFF_RUNNING from the interface flags.
>
> However, it can take time before zebra gets to read it, and further
> time before zebra sends its own interface update to protocol daemons,
> and further time before they might get to read and update their own
> interface state. By which time those daemons may have sent packets
> down those interfaces (which packets may become invalid before link
> comes back, but which still will be sent and hence which
> potentially could disrupt routing).

Ok, then some logic could be implemented in userland to take appropriate
actions. It must be ensured that zebra handles the netlink notification
fast enough.

> Ideally, we should be notified synchronously (EBUFS?) or if that's not
> possible, the packet should be dropped (see my presumption below).

In the manpages for send/sendto/sendmsg it says that there is a -ENOBUFS
return value if a socket's write queue is full. It also says:

"Normally, this does not occur in Linux. Packets are just silently
dropped when a device queue overflows."

So, if packets are 'silently dropped' anyway, the fact that we drop them
in our driver (and increment the error count in the net_device_stats
accordingly) should not be a problem.

> Surely TCP already was able to take care of retransmits? I'm not
> familiar with Linux internals, but how did TCP cope with the previous
> driver behaviour?

I think that both behaviours are similar for TCP. TCP waits for ACKs for
each packet. If they do not arrive, a retransmit is done.
Sooner or later the connection will be reset if no responses from the
other side arrive. So the result for both driver behaviours should be
the same.

Regards,
Thomas
* Re: [patch 4/10] s390: network driver.
From: Paul Jakma @ 2004-12-05 6:25 UTC
To: Thomas Spatzier; +Cc: jgarzik, linux-kernel, netdev

On Tue, 30 Nov 2004, Thomas Spatzier wrote:

> Ok, then some logic could be implemented in userland to take
> appropriate actions. It must be ensured that zebra handles the
> netlink notification fast enough.

AIUI, netlink is not synchronous; it most definitely makes no
reliability guarantees (and at the moment, zebra isn't terribly
efficient at reading netlink - large numbers of interfaces will cause
overruns in zebra; fixing this is on the TODO list).

So we can never get rid of the window where a daemon could send a packet
out a link-down interface - we can make that window smaller, but not
eliminate it. Hence we need either a way to flush packets associated
with an (interface, socket) (or just the socket), or we need the kernel
to not accept such packets (and to drop packets it has accepted).

> In the manpages for send/sendto/sendmsg it says that there is a
> -ENOBUFS return value, if a socket's write queue is full.

Yes, ENOBUFS, sorry.

> It also says:
> "Normally, this does not occur in Linux. Packets are just silently
> dropped when a device queue overflows."

This has always been (AFAIK) the behaviour, yes. We started getting
reports of the new queueing behaviour with, iirc, a version of Intel's
e100 driver for 2.4.2x, which was later changed back to the old
behaviour. However, now that the queueing behaviour is apparently the
mandated one, we really need to work out what to do about the
sending-long-stale-packets problem.

> So, if packets are 'silently dropped' anyway, the fact that we drop
> them in our driver (and increment the error count in the
> net_device_stats accordingly) should not be a problem.
It shouldn't, no. The likes of OSPF already specify their own
reliability mechanisms.

> I think that both behaviours are similar for TCP. TCP waits for
> ACKs for each packet. If they do not arrive, a retransmit is done.
> Sooner or later the connection will be reset, if no responses from
> the other side arrive. So the result for both driver behaviours
> should be the same.

But if TCP worked even when drivers dropped packets, then that implies
TCP has its own queue? And that we're talking about a separate driver
packet queue rather than the socket buffer (which is, presumably, where
TCP retains packets until ACKed - I have no idea).

Anyway, we do, I think, need some way to deal with the
sending-stale-packets-on-link-back problem: either a way to flush this
driver queue, or else a guarantee that writes to sockets whose protocol
makes no reliability guarantee will either return ENOBUFS or drop the
packet.

Otherwise we will start getting reports of "Quagga on Linux sent an
ancient {RIP,IRDP,RA} packet when we fixed a switch problem, and it
caused an outage for a section of our network due to bad routes", I
think.

Some comment or advice would be useful. (Am I kill-filed by all of
netdev? Feels like it.)

> Regards,
> Thomas

regards,
--
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune: No directory.
* Post Network dev questions to netdev Please WAS(Re: [patch 4/10] s390: network driver.
From: jamal @ 2004-12-06 11:01 UTC
To: Paul Jakma; +Cc: Thomas Spatzier, jgarzik, linux-kernel, netdev

On Sun, 2004-12-05 at 01:25, Paul Jakma wrote:

> This has always been (AFAIK) the behaviour yes. We started getting
> reports of the new queuing behaviour with, iirc, a version of Intel's
> e100 driver for 2.4.2x, which was later changed back to the old
> behaviour. However now that the queue behaviour is apparently the
> mandated behaviour we really need to work out what to do about the
> sending-long-stale packets problem.

I missed the beginnings of this thread. Seems some patch was posted on
lkml which started this discussion. I am pretty sure what the lkml FAQ
says is to post on netdev. Ok, if you insist on posting on lkml (because
that's the way to glory, good fortune and fame), then please have the
courtesy to post to netdev as well.

Now let's see if we can help. Followups only on netdev.

cheers,
jamal
* Re: [patch 4/10] s390: network driver.
From: jamal @ 2004-12-06 11:27 UTC
To: Paul Jakma; +Cc: Thomas Spatzier, jgarzik, linux-kernel, netdev

On Sun, 2004-12-05 at 01:25, Paul Jakma wrote:
[..]
> Anyway, we do, I think, need some way to deal with the
> sending-stale-packet-on-link-back problem. Either a way to flush this
> driver queue or else a guarantee that writes to sockets whose
> protocol makes no reliability guarantee will either return ENOBUFS or
> drop the packet.
>
> Otherwise we will start getting reports of "Quagga on Linux sent an
> ancient {RIP,IRDP,RA} packet when we fixed a switch problem, and it
> caused an outage for a section of our network due to bad routes", I
> think.
>
> Some comment or advice would be useful. (Am I kill-filed by all of
> netdev? feels like it).

Don't post networking-related patches on other lists. I haven't seen
said patch, but it seems someone is complaining about some behaviour
changing?

In regards to link down and packets being queued: agreed, this is a
little problematic for some apps/transports. TCP is definitely not one
of them. TCP in Linux actually is told if the drop is local; this way it
can make better judgement (and not unnecessarily adjust windows, for
example). SCTP, AFAIK, is the only transport that provides its apps the
opportunity to obsolete messages already sent. I am not sure how well
implemented, or whether it is implemented at all - someone working on
SCTP could comment.

In the case where the netdevice is administratively downed, both the
qdisc and DMA ring packets are flushed. Newer packets will never be
queued, and you should quickly be able to find from your app that the
device is down.
In the case of the netdevice being operationally down - I am hoping this
is what the discussion is about, having jumped onto it - both queues
stay intact. What you can certainly do from user space is admin down/up
the device when you receive a netlink carrier-off notification.

I am struggling to see whether dropping the packet inside the tx path
once the device is operationally down is so blasphemous ... need to get
caffeine first.

cheers,
jamal
* Re: [patch 4/10] s390: network driver.
From: Hasso Tepper @ 2004-12-06 14:42 UTC
To: hadi; +Cc: Paul Jakma, Thomas Spatzier, jgarzik, netdev

jamal wrote:
> Don't post networking-related patches on other lists. I haven't seen
> said patch, but it seems someone is complaining about some behaviour
> changing?

For some strange reason I didn't find the beginning of the thread in the
archive either; the first post seems to be
http://oss.sgi.com/projects/netdev/archive/2004-11/msg01015.html. I'm
trying to summarize.

The approach - one raw socket to send/receive messages no matter how
many interfaces are used - worked for the Quagga/Zebra routing software
daemons till now. If one of these interfaces was operationally down, the
socket wasn't blocked even if the queue was full. In fact "man sendmsg"
still has the text:

ENOBUFS
	The output queue for a network interface was full. This generally
	indicates that the interface has stopped sending, but may be
	caused by transient congestion. (Normally, this does not occur in
	Linux. Packets are just silently dropped when a device queue
	overflows.)

It seems that this is no longer true. The kernel now seems to try as
hard as possible not to lose any data - data is queued, and if the queue
fills up, all related sockets will be blocked to notify the application.
So the one-socket approach doesn't work any more for Quagga/Zebra. No
problem, we can take the "one socket per interface" approach. And we
already have link detection implemented to notify daemons.

But there will still be a potential problem - sending the "interface
down" message from the kernel to the zebra daemon and then to the
routing protocol daemon takes time.
And during this time a daemon can send packets which will sit in the
queue and may cause many problems if sent to the network later (if the
link comes up again). Think about stateless routing protocols like
rip(ng), IPv6 router advertisements etc. They may carry info that's no
longer true, causing routing loops etc.

And to clarify a little bit: no, Quagga/Zebra didn't work perfectly with
the previous approach either. In fact, with link detection and a socket
per interface it will be better than ever, no matter what the kernel's
behaviour is. We just want to make sure that the solution will be
bulletproof.

So, the problem: how can we make sure that no potentially dangerous aged
(routing) info will be in the network?

--
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
* Re: [patch 4/10] s390: network driver.
From: Herbert Xu @ 2004-12-07 1:13 UTC
To: Hasso Tepper; +Cc: hadi, paul, thomas.spatzier, jgarzik, netdev

Hasso Tepper <hasso@estpak.ee> wrote:
>
> It seems that this is no longer true. The kernel now seems to try as
> hard as possible not to lose any data - data is queued, and if the
> queue fills up, all related sockets will be blocked to notify the
> application. So the one-socket approach doesn't work any more for
> Quagga/Zebra. No problem, we can take the "one socket per interface"
> approach. And we already have link detection implemented to notify
> daemons.

I don't see why this should be happening. Can you please provide a
minimal program that reproduces this blocking problem? For example,
something that sends a packet to a downed interface and then sends one
to an interface that's up?

--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [patch 4/10] s390: network driver.
From: jamal @ 2004-12-07 2:22 UTC
To: Herbert Xu; +Cc: Hasso Tepper, paul, thomas.spatzier, jgarzik, netdev

On Mon, 2004-12-06 at 20:13, Herbert Xu wrote:
> Hasso Tepper <hasso@estpak.ee> wrote:
>>
>> It seems that this is no longer true. The kernel now seems to try as
>> hard as possible not to lose any data - data is queued, and if the
>> queue fills up, all related sockets will be blocked to notify the
>> application. So the one-socket approach doesn't work any more for
>> Quagga/Zebra. No problem, we can take the "one socket per interface"
>> approach. And we already have link detection implemented to notify
>> daemons.
>
> I don't see why this should be happening. Can you please provide a
> minimal program that reproduces this blocking problem? For example,
> something that sends a packet to a downed interface and then sends
> one to an interface that's up?

This may have something to do with socket-level changes, maybe? Does
this happen with sockets that are not raw?

Having said that: I think getting rid of obsolete messages is important.
One of the suggested schemes of operation sounds the least brutal.
Jgarzik, I thought about it a little, and it seems to me that allowing
the device driver to drop packets on transmit when the carrier is off is
not that bad. This is almost like pretending the packets were dropped on
the wire. Thoughts?

cheers,
jamal
* Re: [patch 4/10] s390: network driver.
From: Paul Jakma @ 2004-12-10 15:37 UTC
To: Herbert Xu; +Cc: Hasso Tepper, hadi, thomas.spatzier, jgarzik, netdev

On Tue, 7 Dec 2004, Herbert Xu wrote:

> I don't see why this should be happening.

Thomas' original patch was to address this problem. I wonder could he
recap the kernel side of it?

> Can you please provide a minimal program that reproduces this
> blocking problem? For example, something that sends a packet to a
> downed interface and then sends one to an interface that's up?

I will try to get this for you soon.

regards
--
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune: Who was that masked man?
* Re: [patch 4/10] s390: network driver.
From: Thomas Spatzier @ 2004-12-14 7:40 UTC
To: Paul Jakma; +Cc: hadi, Hasso Tepper, Herbert Xu, jgarzik, netdev

Paul Jakma <paul@clubi.ie> wrote on 10.12.2004 16:37:15:

> Thomas' original patch was to address this problem. I wonder could he
> recap the kernel side of this problem?

Here is why we submitted the original patch: we got reports from several
customers that their dynamic routing daemons got hung when one network
interface lost its physical connection. Some debugging showed that the
write queues of sockets went full and got blocked. This was because we
issued a netif_stop_queue when we detected a cable pull or something
similar.

As a solution, we removed the netif_stop_queue calls and just dropped
the packets; in addition, we increment the respective error counts in
the net_device_stats and call netif_carrier_off. This solved the
customer problems and seems to be the right thing for zebra etc.

Regards,
Thomas.
* Re: [patch 4/10] s390: network driver.
From: jamal @ 2004-12-15 13:50 UTC
To: Thomas Spatzier; +Cc: Paul Jakma, Hasso Tepper, Herbert Xu, jgarzik, netdev, David S. Miller

On Tue, 2004-12-14 at 02:40, Thomas Spatzier wrote:

> Paul Jakma <paul@clubi.ie> wrote on 10.12.2004 16:37:15:
>> Thomas' original patch was to address this problem. I wonder could he
>> recap the kernel side of this problem?
>
> Here is why we submitted the original patch: We got reports from
> several customers that their dynamic routing daemons got hung when
> one network interface lost its physical connection. Some debugging
> showed that the write queues of sockets went full and got blocked.
> This was because we issued a netif_stop_queue when we detect a
> cable pull or something.

I did some more thinking in the background, and I wish to change my
opinion. What you see is Very Odd. I think there may be a bug upstream
at the socket layer, or even before that - but it doesn't sound like a
device-level bug. Wasn't someone supposed to send a small proggie to
Herbert?

When you netif_stop_queue, you should never receive packets anymore at
the device level. If you receive any, it's a bug and you should drop
them and bitch violently. In other words, I think what you have at the
moment is a band-aid, not the solution.

> As a solution, we removed the netif_stop_queue calls and just dropped
> the packets + we increment the respective error counts in the
> net_device_stats and call netif_carrier_off.
> This solved the customer problems and seems to be the right thing for
> zebra etc.

We need to fix this issue. Either your driver is doing something wrong,
or something is broken upstream. Can you describe how your driver uses
the netif_start/stop/wake interfaces?

Whoever promised to send that program to Herbert - please do.
cheers,
jamal
* Re: [patch 4/10] s390: network driver.
From: Thomas Spatzier @ 2004-12-15 15:03 UTC
To: hadi; +Cc: David S. Miller, Hasso Tepper, Herbert Xu, jgarzik, netdev, Paul Jakma

jamal <hadi@cyberus.ca> wrote on 15.12.2004 14:50:27:

> When you netif_stop_queue, you should never receive packets anymore
> at the device level. If you receive any, it's a bug and you should
> drop them and bitch violently. In other words, I think what you have
> at the moment is a band-aid, not the solution.

When we do a netif_stop_queue, we do not get any more packets, so this
behaviour is ok. The problem is that the socket write queues fill up
then and get blocked as soon as they are full.

> Can you describe how your driver uses the netif_start/stop/wake
> interfaces?

Before the patch we are talking about: when we detect a cable pull (or
something like it), we call netif_stop_queue and switch off the
IFF_RUNNING flag. Then, when we detect that the cable is plugged in
again, we call netif_wake_queue and switch the IFF_RUNNING flag on.

And with the patch: on cable pull we switch off IFF_RUNNING and call
netif_carrier_off. We still get packets, but drop them. When the cable
is plugged in, we switch on IFF_RUNNING and call netif_carrier_on.

Regards,
Thomas.
* Re: [patch 4/10] s390: network driver.
From: jamal @ 2004-12-19 19:29 UTC (permalink / raw)
To: Thomas Spatzier
Cc: David S. Miller, Hasso Tepper, Herbert Xu, jgarzik, netdev, Paul Jakma

On Wed, 2004-12-15 at 10:03, Thomas Spatzier wrote:
> jamal <hadi@cyberus.ca> wrote on 15.12.2004 14:50:27:
> > When you netif_stop_queue you should never receive packets anymore
> > at the device level. If you receive any, it's a bug and you should
> > drop them and bitch violently. In other words, I think what you have
> > at the moment is a band-aid, not the solution.
>
> When we do a netif_stop_queue, we do not get any more packets.
> So this behaviour is OK. The problem is that the socket write
> queues then fill up and get blocked as soon as they are full.

This is the strange part. Anyone still willing to provide a sample
program that hangs?

> > Can you describe how your driver uses the netif_start/stop/wake
> > interfaces?
>
> Before the patch we are talking about:
> When we detect a cable pull (or something like this) we call
> netif_stop_queue and switch off the IFF_RUNNING flag.
> Then when we detect that the cable is plugged in again, we
> call netif_wake_queue and switch the IFF_RUNNING flag on.

Not too bad, except user space doesn't get async notification.

> And with the patch:
> On cable pull we switch off IFF_RUNNING and call
> netif_carrier_off. We still get packets but drop them.
> When the cable is plugged in we switch on IFF_RUNNING and
> call netif_carrier_on.

OK, that's something you need to change. Why are you setting the
IFF_RUNNING flag? Look at e1000; they do it right:

On link up:
	netif_carrier_on(netdev);
	netif_wake_queue(netdev);
On link down:
	netif_carrier_off(netdev);	/* wonder if these need swapping */
	netif_stop_queue(netdev);

Still, I think we need to resolve the original problem.
And I have a feeling we won't be seeing any program which reproduces
it ;->

cheers,
jamal
* Re: [patch 4/10] s390: network driver.
From: Tommy Christensen @ 2004-12-19 22:29 UTC (permalink / raw)
To: hadi
Cc: Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, jgarzik, netdev, Paul Jakma

jamal wrote:
> On Wed, 2004-12-15 at 10:03, Thomas Spatzier wrote:
> > jamal <hadi@cyberus.ca> wrote on 15.12.2004 14:50:27:
> > > When you netif_stop_queue you should never receive packets anymore
> > > at the device level. If you receive any, it's a bug and you should
> > > drop them and bitch violently. In other words, I think what you
> > > have at the moment is a band-aid, not the solution.
> >
> > When we do a netif_stop_queue, we do not get any more packets.
> > So this behaviour is OK. The problem is that the socket write
> > queues then fill up and get blocked as soon as they are full.
>
> This is the strange part. Anyone still willing to provide a sample
> program that hangs?

I haven't tried this myself, but surely it can happen when the
socket send-buffer is smaller than what can be queued up for
transmission: i.e. in the qdisc queue and device DMA ring.
And even more so, when sending to multiple devices from one socket.

> Look at e1000; they do it right:
>
> On link up:
> 	netif_carrier_on(netdev);
> 	netif_wake_queue(netdev);
> On link down:
> 	netif_carrier_off(netdev);	/* wonder if these need swapping */
> 	netif_stop_queue(netdev);

Well, this is the same as what started this whole thread.
I believe that stopping the queue on link-down events is simply
bad behaviour of the driver.

-Tommy
* Re: [patch 4/10] s390: network driver.
From: jamal @ 2004-12-19 23:05 UTC (permalink / raw)
To: Tommy Christensen
Cc: Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, jgarzik, netdev, Paul Jakma

On Sun, 2004-12-19 at 17:29, Tommy Christensen wrote:
> I haven't tried this myself, but surely it can happen when the
> socket send-buffer is smaller than what can be queued up for
> transmission: i.e. in the qdisc queue and device DMA ring.

Shouldn't this have to do with socket options? If you wish to block
while waiting for send, then you should be allowed.
For a routing protocol that actually is notified that the link went
down, it should probably flush those socket buffers at that point.

> And even more so, when sending to multiple devices from one socket.

Yes, this looks more within reason. I don't know what the answer to
this is. But it would be helpful for someone to create a setup that
reproduces this.

> > Look at e1000; they do it right:
> >
> > On link up:
> > 	netif_carrier_on(netdev);
> > 	netif_wake_queue(netdev);
> > On link down:
> > 	netif_carrier_off(netdev);	/* wonder if these need swapping */
> > 	netif_stop_queue(netdev);
>
> Well, this is the same as what started this whole thread.
> I believe that stopping the queue on link-down events is simply
> bad behaviour of the driver.

From what Thomas was saying, this is not what he was doing. Read his
email.

cheers,
jamal
* Re: [patch 4/10] s390: network driver.
From: Tommy Christensen @ 2004-12-19 23:46 UTC (permalink / raw)
To: hadi
Cc: Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, jgarzik, netdev, Paul Jakma

jamal wrote:
> On Sun, 2004-12-19 at 17:29, Tommy Christensen wrote:
> > I haven't tried this myself, but surely it can happen when the
> > socket send-buffer is smaller than what can be queued up for
> > transmission: i.e. in the qdisc queue and device DMA ring.
>
> Shouldn't this have to do with socket options? If you wish to block
> while waiting for send, then you should be allowed.
> For a routing protocol that actually is notified that the link went
> down, it should probably flush those socket buffers at that point.

OK. So is this the recommendation for these poor souls?

- Use a socket for each device.
- Set the socket buffer (SO_SNDBUF) large enough. E.g. 1 MB?
  Or use non-blocking sockets - just in case.
- If you care about not sending stale packets, it is the
  responsibility of the application to flush the socket on
  link-down events (by down'ing the interface?).

> > Well, this is the same as what started this whole thread.
> > I believe that stopping the queue on link-down events is simply
> > bad behaviour of the driver.
>
> From what Thomas was saying, this is not what he was doing. Read his
> email.

It was at least to the same effect. The key issue is whether the
packets are kept in the queue (qdisc) until the link is back up,
or they are drained (and dropped) by the driver.

-Tommy
* Re: [patch 4/10] s390: network driver.
From: Jeff Garzik @ 2004-12-20 0:15 UTC (permalink / raw)
To: Tommy Christensen
Cc: hadi, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma

Tommy Christensen wrote:
> It was at least to the same effect. The key issue is whether the
> packets are kept in the queue (qdisc) until the link is back up,
> or they are drained (and dropped) by the driver.

They should absolutely not be dropped by the driver. No ifs, ands, or
buts. This sort of problem is NOT solved by modifying hundreds of
drivers.

	Jeff
* Re: [patch 4/10] s390: network driver.
From: jamal @ 2004-12-20 14:10 UTC (permalink / raw)
To: Tommy Christensen
Cc: Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, jgarzik, netdev, Paul Jakma

On Sun, 2004-12-19 at 18:46, Tommy Christensen wrote:
> OK. So is this the recommendation for these poor souls?
>
> - Use a socket for each device.

Sounds sensible (each device with IP enabled is more like it).

> - Set the socket buffer (SO_SNDBUF) large enough. E.g. 1 MB?
>   Or use non-blocking sockets - just in case.

I think we may need a socket "flush socket buffer" signal.

> - If you care about not sending stale packets, it is the
>   responsibility of the application to flush the socket on
>   link-down events (by down'ing the interface?).

Sigh. I am beginning to think this is too complex an approach. It
requires there be a way to automagically clean up the buffers when
things like this happen.

I'm beginning to think that's the simplest way to achieve this: i.e.
not to stop the queue but rather to let the packets continue showing
up and drop them at the driver when the link is down. The netlink
async carrier signal to the app is to be used to reroute instead of
being a signal to flush buffers. In other words the other Thomas got
it right (with the exception of setting the IFF_RUNNING flags).

Jeff?

cheers,
jamal
* Re: [patch 4/10] s390: network driver.
From: Jeff Garzik @ 2004-12-20 18:54 UTC (permalink / raw)
To: hadi
Cc: Tommy Christensen, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma

jamal wrote:
> I'm beginning to think that's the simplest way to achieve this: i.e.
> not to stop the queue but rather to let the packets continue showing
> up and drop them at the driver when the link is down. The netlink
> async carrier signal to the app is to be used to reroute instead of
> being a signal to flush buffers. In other words the other Thomas got
> it right (with the exception of setting the IFF_RUNNING flags).
>
> Jeff?

I haven't heard anything to convince me that the same change should be
deployed across NNN drivers. The drivers already signal the net core
that the link is down; to me, that implies there should be code in
_one_ place that handles this condition, not NNN places.

Further,

* If this policy ("drop them in the driver") ever changes, we must
  again touch NNN drivers.

* Dropping them in the driver but not stopping the queue means that
  the system is allowed to continue to stream data into the driver,
  only for the driver to free it. That will scale -- right up to
  (worst case) 100% CPU, with userland sending packets as fast as it
  can, and the driver dropping packets as fast as it can. The only
  places the net stack currently checks carrier are dev_get_flags()
  and dev_watchdog().

* If you need a hook to flush the in-hardware buffers, add such a
  hook. Don't hack it in like this. Yeah, adding a hook touches NNN
  drivers, but at least the hook is far more self-contained, its
  semantics will be clearer, and it will increase the likelihood that
  the driver changes affect neither the hot path nor the current
  netif_{start,stop}_queue() logic.

	Jeff
* Re: [patch 4/10] s390: network driver.
From: Tommy Christensen @ 2004-12-21 0:13 UTC (permalink / raw)
To: Jeff Garzik
Cc: hadi, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma

Jeff Garzik wrote:
> I haven't heard anything to convince me that the same change should
> be deployed across NNN drivers. The drivers already signal the net
> core that the link is down; to me, that implies there should be code
> in _one_ place that handles this condition, not NNN places.

AFAICS only a handful of (newer) drivers call netif_stop_queue()
directly. Others may do this indirectly if the MAC stops taking
packets from the DMA ring buffer. At least some MACs/drivers certainly
don't.

I've always thought of netif_stop_queue() as the replacement of the
old tbusy flag, signalling a transient condition where the HW is
unable to keep up with the flow of packets. And it seems to be used
for just this in most cases.

Perhaps somebody confused netif_stop_queue() with dev_deactivate()??

OK, another view on this: isn't it problematic to have skb's stuck in
the network stack "indefinitely"? They hold references to a dst_entry
and a sock (and probably more). So how about this for the FAQ:

Q: Why can't I unload the af_packet module?
A: Ohh, you'll have to plug in the darn cable to eth0 first!

*Please* tell me I've got this all wrong.

-Tommy
* Re: [patch 4/10] s390: network driver.
From: Jeff Garzik @ 2004-12-21 1:19 UTC (permalink / raw)
To: Tommy Christensen
Cc: hadi, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma

Tommy Christensen wrote:
> Jeff Garzik wrote:
> > I haven't heard anything to convince me that the same change should
> > be deployed across NNN drivers. The drivers already signal the net
> > core that the link is down; to me, that implies there should be
> > code in _one_ place that handles this condition, not NNN places.
>
> AFAICS only a handful of (newer) drivers call netif_stop_queue()
> directly. Others may do this indirectly if the MAC stops taking
> packets from the DMA ring buffer. At least some MACs/drivers
> certainly don't.

Incorrect. Use of netif_stop_queue() is -required- to signal that the
hardware cannot accept any more skbs from the system. It is used by
far more than a "handful" of drivers, and is required for all but a
few very strange drivers.

> OK, another view on this: isn't it problematic to have skb's stuck
> in the network stack "indefinitely"? They hold references to a
> dst_entry and a sock (and probably more). So how about this for the
> FAQ:
>
> Q: Why can't I unload the af_packet module?
> A: Ohh, you'll have to plug in the darn cable to eth0 first!
>
> *Please* tell me I've got this all wrong.

You've got this all wrong.

	Jeff
* Re: [patch 4/10] s390: network driver.
From: Thomas Spatzier @ 2004-12-22 10:56 UTC (permalink / raw)
To: Jeff Garzik
Cc: David S. Miller, hadi, Hasso Tepper, Herbert Xu, netdev, Paul Jakma, Tommy Christensen

Jeff Garzik <jgarzik@pobox.com> wrote on 20.12.2004 19:54:53:
> I haven't heard anything to convince me that the same change should
> be deployed across NNN drivers. The drivers already signal the net
> core that the link is down; to me, that implies there should be code
> in _one_ place that handles this condition, not NNN places.

This sounds plausible and I'm with Jeff here. For me as the driver
author it's the smallest change. I will put it like this:

on cable gone:
	netif_stop_queue();
	netif_carrier_off();
on cable reconnected:
	netif_carrier_on();
	netif_wake_queue();

Is that OK, i.e. what all drivers do or should do?

For the problems that applications might have (i.e. sockets being
blocked etc.) another solution should be found. And as Jeff pointed
out, this should be a central solution and not be implemented in
drivers.

Regards,
Thomas.
* Re: [patch 4/10] s390: network driver.
From: Jeff Garzik @ 2004-12-22 11:07 UTC (permalink / raw)
To: Thomas Spatzier
Cc: David S. Miller, hadi, Hasso Tepper, Herbert Xu, netdev, Paul Jakma, Tommy Christensen

On Wed, Dec 22, 2004 at 11:56:06AM +0100, Thomas Spatzier wrote:
> on cable reconnected:
> 	netif_carrier_on();
> 	netif_wake_queue();

Side note to all driver authors: make sure you only ever start or wake
the queue if there is truly space available for another skb.

	Jeff
* Re: [patch 4/10] s390: network driver.
From: jamal @ 2004-12-22 13:48 UTC (permalink / raw)
To: Thomas Spatzier
Cc: Jeff Garzik, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma, Tommy Christensen

On Wed, 2004-12-22 at 05:56, Thomas Spatzier wrote:
> Is that OK, i.e. what all drivers do or should do?
>
> For the problems that applications might have (i.e. sockets being
> blocked etc.) another solution should be found.
> And as Jeff pointed out, this should be a central solution and
> not be implemented in drivers.

I think this needs to be resolved too.
It is possible to have a centralized action instead of requiring
drivers to make changes, if we know the device is in the
netif_carrier_off state. What that would require is, on cable gone,
you just say:

	netif_carrier_off();

and the top layer code will junk the packets before they hit the
driver. This way the socket code can continue sending whatever it
wants, but if there's no link, then it's fair to drop those packets?

If this is acceptable I can generate a quick patch.

cheers,
jamal
* Re: [patch 4/10] s390: network driver.
From: Thomas Spatzier @ 2005-01-03 9:10 UTC (permalink / raw)
To: hadi
Cc: David S. Miller, Hasso Tepper, Herbert Xu, Jeff Garzik, netdev, Paul Jakma, Tommy Christensen

jamal <hadi@cyberus.ca> wrote on 22.12.2004 14:48:28:
> I think this needs to be resolved too.
> It is possible to have a centralized action instead of requiring
> drivers to make changes, if we know the device is in the
> netif_carrier_off state. What that would require is, on cable gone,
> you just say:
> 	netif_carrier_off();
> and the top layer code will junk the packets before they hit the
> driver. This way the socket code can continue sending whatever it
> wants, but if there's no link, then it's fair to drop those packets?
>
> If this is acceptable I can generate a quick patch.

Does Jamal's solution sound good for all? Then I would change my
driver to do the following: just call netif_carrier_off() (not
netif_stop_queue).

Then the upper layers will do the proper thing, so I should not get
any more packets. If I still get some, I will drop them.
Did I get this correctly?

BTW: A happy new year to all ;-)

Regards,
Thomas.
* Re: [patch 4/10] s390: network driver.
From: jamal @ 2005-01-03 15:05 UTC (permalink / raw)
To: Thomas Spatzier
Cc: David S. Miller, Hasso Tepper, Herbert Xu, Jeff Garzik, netdev, Paul Jakma, Tommy Christensen

The change is simple if there's consensus to go this path.

cheers,
jamal

On Mon, 2005-01-03 at 04:10, Thomas Spatzier wrote:
> jamal <hadi@cyberus.ca> wrote on 22.12.2004 14:48:28:
> > I think this needs to be resolved too.
> > It is possible to have a centralized action instead of requiring
> > drivers to make changes, if we know the device is in the
> > netif_carrier_off state. What that would require is, on cable
> > gone, you just say:
> > 	netif_carrier_off();
> > and the top layer code will junk the packets before they hit the
> > driver. This way the socket code can continue sending whatever it
> > wants, but if there's no link, then it's fair to drop those
> > packets?
> >
> > If this is acceptable I can generate a quick patch.
>
> Does Jamal's solution sound good for all? Then I would change my
> driver to do the following: just call netif_carrier_off() (not
> netif_stop_queue).
>
> Then the upper layers will do the proper thing, so I should not get
> any more packets. If I still get some, I will drop them.
> Did I get this correctly?
>
> BTW: A happy new year to all ;-)
>
> Regards,
> Thomas.
* Re: [patch 4/10] s390: network driver.
From: Jeff Garzik @ 2005-01-04 23:28 UTC (permalink / raw)
To: hadi
Cc: Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma, Tommy Christensen

jamal wrote:
> The change is simple if there's consensus to go this path.

Can you resend your patch?

My main objection was that any change should be made in the core, not
in individual net drivers.

Another objection was that it seemed that some of the proposed
solutions for clearing the queue on link down were imposing
app-specific policy on the overall kernel.

Aren't there cases where people would -not- want the queue to be
cleared?

	Jeff
* Re: [patch 4/10] s390: network driver.
From: jamal @ 2005-01-05 3:19 UTC (permalink / raw)
To: Jeff Garzik
Cc: Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma, Tommy Christensen

[-- Attachment #1: Type: text/plain, Size: 1005 bytes --]

On Tue, 2005-01-04 at 18:28, Jeff Garzik wrote:
> jamal wrote:
> > The change is simple if there's consensus to go this path.
>
> Can you resend your patch?

I didn't send any patch - but here's one that looks right - I haven't
tried compiling it.

> My main objection was that any change should be made in the core,
> not in individual net drivers.

Attached patch resolves that concern.

> Another objection was that it seemed that some of the proposed
> solutions for clearing the queue on link down were imposing
> app-specific policy on the overall kernel.
>
> Aren't there cases where people would -not- want the queue to be
> cleared?

That's a good question, and your point about imposing policy in the
kernel is valid. I know that I would probably want packets to appear
to be going out when the wire is pulled from underneath me. There are
possibly people who would want it differently - so far we haven't
heard from them. Maybe poll far and wide - or push it in and wait for
them to whine.

cheers,
jamal

[-- Attachment #2: nc.p --]
[-- Type: text/plain, Size: 512 bytes --]

--- 2610-bk1/net/sched/sch_generic.c	2005/01/05 01:21:04	1.1
+++ 2610-bk1/net/sched/sch_generic.c	2005/01/05 01:36:26
@@ -152,7 +152,7 @@
 				spin_lock(&dev->queue_lock);
 				goto collision;
 			}
-		}
+		} /* NETDEV_TX_BUSY - we need to requeue */

 		/* Release the driver */
@@ -162,6 +162,11 @@
 		}
 		spin_lock(&dev->queue_lock);
 		q = dev->qdisc;
+		if (!netif_carrier_ok(dev)) {
+			kfree_skb(skb);
+			q->qstats.drops++;
+			return -1;
+		}
 	}

	/* Device kicked us out :(
* Re: [patch 4/10] s390: network driver.
From: Paul Jakma @ 2005-01-05 6:30 UTC (permalink / raw)
To: jamal
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Tommy Christensen

On Wed, 4 Jan 2005, jamal wrote:
> There are possibly people who would want it differently - so far we
> haven't heard from them. Maybe poll far and wide - or push it in and
> wait for them to whine.

We're whining.

This policy in the Linux kernel makes using Linux dangerous for
certain routing applications (RIP, routing advertisements).

Name a non-TCP application that *does* want this newish kernel policy
and tell me how it breaks without this *new* policy. ;)

> cheers,
> jamal

regards,
--
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
My interest is in the future because I am going to spend the rest of
my life there.
* Re: [patch 4/10] s390: network driver.
From: jamal @ 2005-01-05 13:16 UTC (permalink / raw)
To: Paul Jakma
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Tommy Christensen

On Wed, 2005-01-05 at 01:30, Paul Jakma wrote:
> On Wed, 4 Jan 2005, jamal wrote:
> > There are possibly people who would want it differently - so far
> > we haven't heard from them. Maybe poll far and wide - or push it
> > in and wait for them to whine.
>
> We're whining.
>
> This policy in the Linux kernel makes using Linux dangerous for
> certain routing applications (RIP, routing advertisements).
>
> Name a non-TCP application that *does* want this newish kernel
> policy and tell me how it breaks without this *new* policy. ;)

OK, I am confused - I thought you guys _wanted this_ ;->

The issue is about message obsolescence more than it is about
reliability. Without this, the scenario you played for us before was:

1) => Send a few LSAs from user space; pull the cable before they go
   out (this will happen if you are sending sufficiently large
   amounts of packets, i.e. the network is busy).
2) => User space gets notified via netlink, the device shuts down
   access to DMA, packets are queued anyway and you have no ability
   to say "oops, sorry, take that packet back".
   a) You could do a move to another device at this point, or
   b) a dumb app will continue sending.
3) => Plug the cable back in 2 minutes later; obsolete LSAs are sent,
   followed by any new ones that may follow ...

With the patch, packets in #2 will be dropped. As a matter of fact,
within those two minutes, if stopped, it is probable the device
watchdog timer will kick in and flush the DMA but not the scheduler
queues above it (which is where up to a 1000 stale packets could be
sitting).

What is it that you don't like now?

cheers,
jamal
* Re: [patch 4/10] s390: network driver.
From: Paul Jakma @ 2005-01-05 14:29 UTC (permalink / raw)
To: jamal
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Tommy Christensen

On Wed, 5 Jan 2005, jamal wrote:
> OK, I am confused - I thought you guys _wanted this_ ;->

I'm confused too now. We don't want "queue packets" - that's the
'newish' policy I was referring to. (Newish in quotes, as I'm not sure
from when this behaviour dates.)

> The issue is about message obsolescence more than it is about
> reliability.

Right, exactly. So we do want what you think we want ;) (I think).

> Without this, the scenario you played for us before was:
>
> 1) => Send a few LSAs from user space; pull the cable before they go
>    out (this will happen if you are sending sufficiently large
>    amounts of packets, i.e. the network is busy).
> 2) => User space gets notified via netlink, the device shuts down
>    access to DMA, packets are queued anyway and you have no ability
>    to say "oops, sorry, take that packet back".

Right. Two things:

- We noticed this behaviour because of OSPF.

  Users reported ospfd would cease to send packets on all interfaces
  (with certain drivers) because /one/ interface was link-down. We can
  work around this easily by opening a socket per interface - at
  present we simply punt OSPF packets down a single raw socket and
  rely on IP_HDRINCL to have the kernel route the packet out the
  correct interface (IP_MULTICAST_IF for multicast-destined packets).

- The queueing does not affect OSPF terribly; it would affect other
  protocols though.

  OSPF implements its own 'synchronisation' facilities between
  neighbours and can easily 'detect' obsolescent packets. So the
  obsolescence issue does not affect it: routing information in stale
  packets will not propagate, so they can't do much damage really
  (it's just unnecessary to queue and send such packets).

However, other commonly used protocols are not as robust - mostly
those where a protocol is used to distribute routing information to
passive listeners, e.g.:

- RIP
- IPv4 ICMP-based router discovery (IRDP)
- IPv6 router advertisements

In these cases, the queueing behaviour is potentially dangerous and
could disrupt connectivity by propagating no-longer-valid routing
information.

> a) You could do a move to another device at this point, or
> b) a dumb app will continue sending.
>
> 3) => Plug the cable back in 2 minutes later; obsolete LSAs are
>    sent, followed by any new ones that may follow ...

Right. Except OSPF is robust enough against stale packets. Other
protocols are not.

> With the patch, packets in #2 will be dropped.

Perfect.

> As a matter of fact, within those two minutes, if stopped, it is
> probable the device watchdog timer will kick in and flush the DMA
> but not the scheduler queues above it (which is where up to a 1000
> stale packets could be sitting).

Right, and our argument is that it doesn't make sense to send those
packets. I've never heard of any UDP and/or raw application that
expected a kernel to queue packets if they could not be sent for lack
of link or other problem, and any which did are surely broken by
definition? ;)

> What is it that you don't like now?

Sorry, wires crossed re "new behaviour". The "new new" behaviour in
the patch as you describe it would be perfect.

PS: Another issue: could we have kernel-space IP fragmentation for
IP_HDRINCL sockets please? We currently have to implement
fragmentation ourselves, which seems silly given that the kernel
already has this functionality.

> cheers,
> jamal

regards,
--
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
The early bird who catches the worm works for someone who comes in
late and owns the worm farm.
		-- Travis McGee
* Re: [patch 4/10] s390: network driver.
From: jamal @ 2005-01-06 13:55 UTC (permalink / raw)
To: Paul Jakma
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Tommy Christensen

On Wed, 2005-01-05 at 09:29, Paul Jakma wrote:
> On Wed, 5 Jan 2005, jamal wrote:
[..]
> Sorry, wires crossed re "new behaviour". The "new new" behaviour in
> the patch as you describe it would be perfect.
>
> PS: Another issue: could we have kernel-space IP fragmentation for
> IP_HDRINCL sockets please? We currently have to implement
> fragmentation ourselves, which seems silly given that the kernel
> already has this functionality.

I will let some other fireman grab this bait ;->
If not, I will revisit it after an unknown/random timeout.

cheers,
jamal
* Re: [patch 4/10] s390: network driver.
From: Tommy Christensen @ 2005-01-05 15:35 UTC (permalink / raw)
To: hadi
Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma

[-- Attachment #1: Type: text/plain, Size: 748 bytes --]

jamal wrote:
> On Tue, 2005-01-04 at 18:28, Jeff Garzik wrote:
> > jamal wrote:
> > > The change is simple if there's consensus to go this path.
> >
> > Can you resend your patch?
>
> I didn't send any patch - but here's one that looks right - I
> haven't tried compiling it.

Thank you for diving into this, Jamal.

For the patch to have much effect, we need to check the carrier before
calling hard_start_xmit(). Like in the modified patch below.

> > My main objection was that any change should be made in the core,
> > not in individual net drivers.
>
> Attached patch resolves that concern.

Except for the drivers that call netif_stop_queue() on link-down.
These calls (and the corresponding netif_wake_queue) would have to be
removed.

-Tommy

[-- Attachment #2: sch_generic.c.patch --]
[-- Type: text/plain, Size: 615 bytes --]

--- linux-2.6.10-bk7/net/sched/sch_generic.c	Wed Jan  5 16:27:13 2005
+++ linux-2.6.10-work/net/sched/sch_generic.c	Wed Jan  5 16:28:53 2005
@@ -134,7 +134,7 @@
 	/* And release queue */
 	spin_unlock(&dev->queue_lock);

-	if (!netif_queue_stopped(dev)) {
+	if (!netif_queue_stopped(dev) && netif_carrier_ok(dev)) {
 		int ret;
 		if (netdev_nit)
 			dev_queue_xmit_nit(skb, dev);
@@ -162,6 +162,11 @@
 	}
 	spin_lock(&dev->queue_lock);
 	q = dev->qdisc;
+	if (!netif_carrier_ok(dev)) {
+		kfree_skb(skb);
+		q->qstats.drops++;
+		return -1;
+	}
 }

/* Device kicked us out :(
* Re: [patch 4/10] s390: network driver. 2005-01-05 15:35 ` Tommy Christensen @ 2005-01-06 13:58 ` jamal 2005-01-06 15:06 ` Tommy Christensen 0 siblings, 1 reply; 67+ messages in thread From: jamal @ 2005-01-06 13:58 UTC (permalink / raw) To: Tommy Christensen Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma On Wed, 2005-01-05 at 10:35, Tommy Christensen wrote: > jamal wrote: > > Except for the drivers that call netif_stop_queue() on link-down. These > calls (and the corresponding netif_wake_queue) would have to be removed. If we assume all drivers do: netif_stop then carrier_off then you dont need that extra check. Thats the working assumption i had - maybe a comment is deserving or we could say we dont think that all drivers are going to follow that sequence. cheers, jamal ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-06 13:58 ` jamal @ 2005-01-06 15:06 ` Tommy Christensen 2005-01-07 13:32 ` jamal 0 siblings, 1 reply; 67+ messages in thread From: Tommy Christensen @ 2005-01-06 15:06 UTC (permalink / raw) To: hadi Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma On Thu, 2005-01-06 at 14:58, jamal wrote: > On Wed, 2005-01-05 at 10:35, Tommy Christensen wrote: > > jamal wrote: > > > > > Except for the drivers that call netif_stop_queue() on link-down. These > > calls (and the corresponding netif_wake_queue) would have to be removed. > > If we assume all drivers do: > netif_stop then carrier_off then you dont need that extra check. > Thats the working assumption i had - maybe a comment is deserving or > we could say we dont think that all drivers are going to follow that > sequence. But qdisc_restart() isn't called any more after the queue is stopped. So how do we get to drain the packets? Another approach could be to reset the qdisc (kind of what dev_deactivate does) if the driver stays in queue_stopped and carrier_off for some period of time. -Tommy ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-06 15:06 ` Tommy Christensen @ 2005-01-07 13:32 ` jamal 2005-01-07 15:26 ` Tommy Christensen 0 siblings, 1 reply; 67+ messages in thread From: jamal @ 2005-01-07 13:32 UTC (permalink / raw) To: Tommy Christensen Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma On Thu, 2005-01-06 at 10:06, Tommy Christensen wrote: > But qdisc_restart() isn't called any more after the queue is > stopped. So how do we get to drain the packets? We return -1; so qdisc restart will be called until theres no packets left on the queue. Did i miss something in the state machine there? > Another approach could be to reset the qdisc (kind of what > dev_deactivate does) if the driver stays in queue_stopped > and carrier_off for some period of time. > reseting qdisc could certainly be part of the device watchdog. Note that once the packets are drained, the next thing that will happen if device is stopped for a dev->timeout period is watchdog kicking in and flushing the DMA. So you could reset your qdisc there though i am not sure it will be needed with that patch. cheers, jamal ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-07 13:32 ` jamal @ 2005-01-07 15:26 ` Tommy Christensen 2005-01-10 13:18 ` jamal 0 siblings, 1 reply; 67+ messages in thread From: Tommy Christensen @ 2005-01-07 15:26 UTC (permalink / raw) To: hadi Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma On Fri, 2005-01-07 at 14:32, jamal wrote: > On Thu, 2005-01-06 at 10:06, Tommy Christensen wrote: > > > But qdisc_restart() isn't called any more after the queue is > > stopped. So how do we get to drain the packets? > > We return -1; so qdisc restart will be called until theres no packets > left on the queue. Did i miss something in the state machine there? Yes, qdisc_run does this while (!netif_queue_stopped(dev) && qdisc_restart(dev) < 0) /* NOTHING */; > > Another approach could be to reset the qdisc (kind of what > > dev_deactivate does) if the driver stays in queue_stopped > > and carrier_off for some period of time. > > > > reseting qdisc could certainly be part of the device watchdog. > Note that once the packets are drained, the next thing that will happen > if device is stopped for a dev->timeout period is watchdog kicking in > and flushing the DMA. So you could reset your qdisc there though i am > not sure it will be needed with that patch. Unfortunately the watchdog won't do this when carrier is off, which is fair enough since the chip isn't hung as such. So, we would need another method to flush out stale packets from the DMA ring. Some drivers already do this themselves, and some won't need it because they keep on running even when link is down. -Tommy ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-07 15:26 ` Tommy Christensen @ 2005-01-10 13:18 ` jamal 2005-01-16 23:10 ` jamal 0 siblings, 1 reply; 67+ messages in thread From: jamal @ 2005-01-10 13:18 UTC (permalink / raw) To: Tommy Christensen Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma On Fri, 2005-01-07 at 10:26, Tommy Christensen wrote: > On Fri, 2005-01-07 at 14:32, jamal wrote: > > On Thu, 2005-01-06 at 10:06, Tommy Christensen wrote: [..] > > reseting qdisc could certainly be part of the device watchdog. > > Note that once the packets are drained, the next thing that will happen > > if device is stopped for a dev->timeout period is watchdog kicking in > > and flushing the DMA. So you could reset your qdisc there though i am > > not sure it will be needed with that patch. > > Unfortunately the watchdog won't do this when carrier is off, which > is fair enough since the chip isn't hung as such. We may have to kick the dog in that path as well then. > So, we would need another method to flush out stale packets from > the DMA ring. Some drivers already do this themselves, and some won't > need it because they keep on running even when link is down. Kicking tx_timeout seemed like the right thing to do until I looked at the drivers. Sigh. I just stared at a few drivers and it is not pleasant - and they mostly have the same theme (so you can't say they are inconsistent). Looking at e1000 for example is quite interesting - it actually brings down then up the device ;-> This will both flush the DMA and reset the qdisc but also lose any static routes. It will also reschedule the device;-< So I am not so sure that the dog kicking in a minute later is a good thing at all. We may need a lot of janitorial work to cleanse the drivers - something jgarzik is against. A qdisc_reset, as you suggest, will probably not cure the problem but provide some cosmetics. Thoughts? 
cheers, jamal ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-10 13:18 ` jamal @ 2005-01-16 23:10 ` jamal 2005-01-17 12:04 ` Hasso Tepper 2005-01-17 21:38 ` Tommy Christensen 0 siblings, 2 replies; 67+ messages in thread From: jamal @ 2005-01-16 23:10 UTC (permalink / raw) To: Tommy Christensen Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma Didnt see anymore discussion on this - have we given up on it? cheers, jamal On Mon, 2005-01-10 at 08:18, jamal wrote: > On Fri, 2005-01-07 at 10:26, Tommy Christensen wrote: > > On Fri, 2005-01-07 at 14:32, jamal wrote: > > > On Thu, 2005-01-06 at 10:06, Tommy Christensen wrote: > > [..] > > > > reseting qdisc could certainly be part of the device watchdog. > > > Note that once the packets are drained, the next thing that will happen > > > if device is stopped for a dev->timeout period is watchdog kicking in > > > and flushing the DMA. So you could reset your qdisc there though i am > > > not sure it will be needed with that patch. > > > > Unfortunately the watchdog won't do this when carrier is off, which > > is fair enough since the chip isn't hung as such. > > We may have to kick the dog in that path as well then. > > > So, we would need another method to flush out stale packets from > > the DMA ring. Some drivers already do this themselves, and some won't > > need it because they keep on running even when link is down. > > kicking tx_timeout seems like the right thing to do until i looked at > the drivers. Sigh. I just stared at a few drivers and it is not pleasant > - and they mostly have the same theme (so you cant say they are > inconsistent). > Looking at e1000 for example is quiet interesting - it actually brings > down then up the device ;-> This will both flash the dma and reset qdisc > but also loose any static routes. It will also reschedule the device;-< > So i am not so sure that the dog kicking in a minute later is a good > thing at all. 
We may need a lot of janitorial work to cleanse the > drivers - something jgarzik is against. > > A qdisc_reset, as you suggest, will probably not cure the problem but > provide some cosmetics. Thoughts? > > cheers, > jamal > > > ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-16 23:10 ` jamal @ 2005-01-17 12:04 ` Hasso Tepper 2005-01-17 22:04 ` Tommy Christensen 2005-01-17 21:38 ` Tommy Christensen 1 sibling, 1 reply; 67+ messages in thread From: Hasso Tepper @ 2005-01-17 12:04 UTC (permalink / raw) To: hadi Cc: Tommy Christensen, Jeff Garzik, Thomas Spatzier, David S. Miller, Herbert Xu, netdev, Paul Jakma jamal wrote: > Didnt see anymore discussion on this - have we given up on it? I'm not competent enough about kernel internals to decide how this should be done in the kernel. But I would like to clarify the situation for myself as a Quagga developer. All this work is done to avoid sending stale packets to the network? I.e. there will be no change regarding the socket blocking issue? Detecting carrier on/off and/or a socket per interface is a must for us in user space to avoid socket blocking? I'm just trying to get the full picture so I can put my priorities in place for the next release. Thanks for your support, -- Hasso Tepper Elion Enterprises Ltd. WAN administrator ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-17 12:04 ` Hasso Tepper @ 2005-01-17 22:04 ` Tommy Christensen 2005-01-17 22:13 ` Peter Buckingham 0 siblings, 1 reply; 67+ messages in thread From: Tommy Christensen @ 2005-01-17 22:04 UTC (permalink / raw) To: Hasso Tepper Cc: hadi, Jeff Garzik, Thomas Spatzier, David S. Miller, Herbert Xu, netdev, Paul Jakma Hasso Tepper wrote: > All this work is done to avoid sending stale packets to the network? I.e. > there will be no change regarding the socket blocking issue? Detecting carrier > on/off and/or a socket per interface is a must for us in user space to avoid > socket blocking? I'm just trying to get the full picture so I can put my > priorities in place for the next release. A proper solution would take care of both these issues: "stale packets" and "socket blocking". They are pretty much related. Using a socket for each interface *ought* not to be needed. Carrier detection sounds like a good thing in its own right. Hope this helps, Tommy ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-17 22:04 ` Tommy Christensen @ 2005-01-17 22:13 ` Peter Buckingham 2005-01-17 22:36 ` jamal 2005-01-17 22:53 ` Tommy Christensen 0 siblings, 2 replies; 67+ messages in thread From: Peter Buckingham @ 2005-01-17 22:13 UTC (permalink / raw) To: Tommy Christensen Cc: Hasso Tepper, hadi, Jeff Garzik, Thomas Spatzier, David S. Miller, Herbert Xu, netdev, Paul Jakma Hi, Tommy Christensen wrote: > A proper solution would take care of both these issues: "stale packets" > and "socket blocking". They are pretty much related. > > Using a socket for each interface *ought* not be needed. > Carrier detection sounds like a good thing in its own right. I came across this same problem with multicast/unicast with an e1000. For a quick hack we just check to see whether the carrier is ok, if it's not we just drop the packet. This might do some nasty things with TCP, and it may be better to check to see whether the socket buffer is full, but this works for me here... (this is against a 2.6.9ish kernel) peter Index: net/core/dev.c =================================================================== --- net/core/dev.c +++ net/core/dev.c @@ -1379,6 +1379,11 @@ #ifdef CONFIG_NET_CLS_ACT skb->tc_verd = SET_TC_AT(skb->tc_verd,AT_EGRESS); #endif + if (!netif_carrier_ok(dev)) { + rc = NET_XMIT_DROP; + goto out_kfree_skb; + } + if (q->enqueue) { /* Grab device queue */ spin_lock(&dev->queue_lock); ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-17 22:13 ` Peter Buckingham @ 2005-01-17 22:36 ` jamal 2005-01-17 22:53 ` Tommy Christensen 1 sibling, 0 replies; 67+ messages in thread From: jamal @ 2005-01-17 22:36 UTC (permalink / raw) To: Peter Buckingham Cc: Tommy Christensen, Hasso Tepper, Jeff Garzik, Thomas Spatzier, David S. Miller, Herbert Xu, netdev, Paul Jakma Hi there, The solution that Tommy and myself have been batting around is a lot more cleaner than this (however still doesnt solve the issue 100%) cheers, jamal On Mon, 2005-01-17 at 17:13, Peter Buckingham wrote: > Hi, > > Tommy Christensen wrote: > > A proper solution would take care of both these issues: "stale packets" > > and "socket blocking". They are pretty much related. > > > > Using a socket for each interface *ought* not be needed. > > Carrier detection sounds like a good thing in its own right. > > I came across this same problem with multicast/unicast with an e1000. > For a quick hack we just check to see whether the carrier is ok, if it's > not we just drop the packet. this might do some nasty things with tcp > and it may be better to check to see wither the socket buffer is full, > but this works for me here... (this is against a 2.6.9ish kernel) > > peter > > Index: net/core/dev.c > =================================================================== > --- net/core/dev.c > +++ net/core/dev.c > @@ -1379,6 +1379,11 @@ > #ifdef CONFIG_NET_CLS_ACT > skb->tc_verd = SET_TC_AT(skb->tc_verd,AT_EGRESS); > #endif > + if (!netif_carrier_ok(dev)) { > + rc = NET_XMIT_DROP; > + goto out_kfree_skb; > + } > + > if (q->enqueue) { > /* Grab device queue */ > spin_lock(&dev->queue_lock); > > ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-17 22:13 ` Peter Buckingham 2005-01-17 22:36 ` jamal @ 2005-01-17 22:53 ` Tommy Christensen 1 sibling, 0 replies; 67+ messages in thread From: Tommy Christensen @ 2005-01-17 22:53 UTC (permalink / raw) To: Peter Buckingham Cc: Hasso Tepper, hadi, Jeff Garzik, Thomas Spatzier, David S. Miller, Herbert Xu, netdev, Paul Jakma Peter Buckingham wrote: > Hi, > > Tommy Christensen wrote: > >> A proper solution would take care of both these issues: "stale packets" >> and "socket blocking". They are pretty much related. >> >> Using a socket for each interface *ought* not be needed. >> Carrier detection sounds like a good thing in its own right. > > > I came across this same problem with multicast/unicast with an e1000. > For a quick hack we just check to see whether the carrier is ok, if it's > not we just drop the packet. this might do some nasty things with tcp > and it may be better to check to see wither the socket buffer is full, > but this works for me here... (this is against a 2.6.9ish kernel) Thanks for the input! The downside of this solution is that it doesn't do anything with the packets already queued for tx (in the qdisc queue and in the DMA ring). OTOH it probably works just fine in real life (!). Also, we may not want to handle all devices alike, as this patch does. -Tommy ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-16 23:10 ` jamal 2005-01-17 12:04 ` Hasso Tepper @ 2005-01-17 21:38 ` Tommy Christensen 2005-01-30 23:39 ` Tommy Christensen 1 sibling, 1 reply; 67+ messages in thread From: Tommy Christensen @ 2005-01-17 21:38 UTC (permalink / raw) To: hadi Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma jamal wrote: > Didnt see anymore discussion on this - have we given up on it? It is not forgotten - just stalled ... I'll try to look at it some more. -Tommy ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-17 21:38 ` Tommy Christensen @ 2005-01-30 23:39 ` Tommy Christensen 2005-01-31 0:09 ` jamal 0 siblings, 1 reply; 67+ messages in thread From: Tommy Christensen @ 2005-01-30 23:39 UTC (permalink / raw) To: hadi Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma [-- Attachment #1: Type: text/plain, Size: 394 bytes --] Tommy Christensen wrote: > jamal wrote: > >> Didnt see anymore discussion on this - have we given up on it? > > > It is not forgotten - just stalled ... > I'll try to look at it some more. Sorry for the long delay. Jamal, do you think something like this could work? Flushing of the DMA ring could be added if the need arises. e1000 does this already in the driver on link-down. -Tommy [-- Attachment #2: carrier.patch --] [-- Type: text/plain, Size: 1671 bytes --] diff -ur linux-2.6.11-rc2-bk8/net/core/link_watch.c linux-2.6.11-work/net/core/link_watch.c --- linux-2.6.11-rc2-bk8/net/core/link_watch.c 2005-01-30 22:08:44.000000000 +0100 +++ linux-2.6.11-work/net/core/link_watch.c 2005-01-31 00:17:45.716039247 +0100 @@ -16,6 +16,7 @@ #include <linux/netdevice.h> #include <linux/if.h> #include <net/sock.h> +#include <net/pkt_sched.h> #include <linux/rtnetlink.h> #include <linux/jiffies.h> #include <linux/spinlock.h> @@ -74,6 +75,9 @@ clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state); if (dev->flags & IFF_UP) { + if (netif_carrier_ok(dev) && + dev->qdisc_sleeping != &noop_qdisc) + dev_activate(dev); netdev_state_change(dev); } diff -ur linux-2.6.11-rc2-bk8/net/sched/sch_generic.c linux-2.6.11-work/net/sched/sch_generic.c --- linux-2.6.11-rc2-bk8/net/sched/sch_generic.c 2005-01-30 22:11:50.458216314 +0100 +++ linux-2.6.11-work/net/sched/sch_generic.c 2005-01-31 00:06:47.609769835 +0100 @@ -185,6 +185,7 @@ static void dev_watchdog(unsigned long arg) { struct net_device *dev = (struct net_device *)arg; + int check_carrier = 0; spin_lock(&dev->xmit_lock); if 
(dev->qdisc != &noop_qdisc) { @@ -198,10 +199,23 @@ } if (!mod_timer(&dev->watchdog_timer, jiffies + dev->watchdog_timeo)) dev_hold(dev); - } + } else + check_carrier = 1; } spin_unlock(&dev->xmit_lock); + if (check_carrier) { + spin_lock(&dev->queue_lock); + if (!netif_carrier_ok(dev) && netif_queue_stopped(dev)) { + struct Qdisc *qdisc; + + qdisc = dev->qdisc; + dev->qdisc = &noop_qdisc; + qdisc_reset(qdisc); + } + spin_unlock(&dev->queue_lock); + } + dev_put(dev); } ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-30 23:39 ` Tommy Christensen @ 2005-01-31 0:09 ` jamal 2005-01-31 0:12 ` jamal 2005-01-31 0:31 ` Tommy Christensen 0 siblings, 2 replies; 67+ messages in thread From: jamal @ 2005-01-31 0:09 UTC (permalink / raw) To: Tommy Christensen Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma Tommy, I wasnt sure why you did: dev->qdisc = &noop_qdisc; You should probably save the old qdisc in qdisc_sleeping instead and restore it on wakeup - otherwise you always end with default qdisc. check_carrier should probably just call dev_activate which does all you wanted, no? cheers, jamal ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-31 0:09 ` jamal @ 2005-01-31 0:12 ` jamal 2005-01-31 0:31 ` Tommy Christensen 1 sibling, 0 replies; 67+ messages in thread From: jamal @ 2005-01-31 0:12 UTC (permalink / raw) To: Tommy Christensen Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma On Sun, 2005-01-30 at 19:09, jamal wrote: > Tommy, > > I wasnt sure why you did: > > dev->qdisc = &noop_qdisc; > > You should probably save the old qdisc in qdisc_sleeping instead > and restore it on wakeup - otherwise you always end with default qdisc. > check_carrier should probably just call dev_activate meant dev_deactivate - watch them locks though .. cheers, jamal > which does all you > wanted, no? > > cheers, > jamal ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-31 0:09 ` jamal 2005-01-31 0:12 ` jamal @ 2005-01-31 0:31 ` Tommy Christensen 2005-01-31 3:26 ` jamal 1 sibling, 1 reply; 67+ messages in thread From: Tommy Christensen @ 2005-01-31 0:31 UTC (permalink / raw) To: hadi Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma jamal wrote: > Tommy, > > I wasnt sure why you did: > > dev->qdisc = &noop_qdisc; > > You should probably save the old qdisc in qdisc_sleeping instead > and restore it on wakeup - otherwise you always end with default qdisc. I think normally we will have qdisc == qdisc_sleeping. At least this is how I read the code in dev_graft_qdisc(). When can they differ? > check_carrier should probably just call dev_activate which does all you > wanted, no? I copied the logic from dev_deactivate(), but I didn't want to include the waiting parts, since this is not in process context. OK? -Tommy ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-31 0:31 ` Tommy Christensen @ 2005-01-31 3:26 ` jamal 2005-01-31 12:16 ` Tommy Christensen 0 siblings, 1 reply; 67+ messages in thread From: jamal @ 2005-01-31 3:26 UTC (permalink / raw) To: Tommy Christensen Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma On Sun, 2005-01-30 at 19:31, Tommy Christensen wrote: > jamal wrote: > > Tommy, > > > > I wasnt sure why you did: > > > > dev->qdisc = &noop_qdisc; > > > > You should probably save the old qdisc in qdisc_sleeping instead > > and restore it on wakeup - otherwise you always end with default qdisc. > > I think normally we will have qdisc == qdisc_sleeping. At least this > is how I read the code in dev_graft_qdisc(). When can they differ? You are correct. > > check_carrier should probably just call dev_activate which does all you > > wanted, no? > > I copied the logic from dev_deactivate(), but I didn't want to include > the waiting parts, since this is not in process context. OK? Same here. It does look pretty sane.. Tested? cheers, jamal ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-31 3:26 ` jamal @ 2005-01-31 12:16 ` Tommy Christensen 2005-03-13 17:49 ` Hasso Tepper 0 siblings, 1 reply; 67+ messages in thread From: Tommy Christensen @ 2005-01-31 12:16 UTC (permalink / raw) To: hadi Cc: Jeff Garzik, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Paul Jakma On Mon, 2005-01-31 at 04:26, jamal wrote: > It does look pretty sane.. Tested? Nope, I don't have access to any relevant HW. Hopefully someone else can give it some beating. Please. A simple test could be: o start tcpdump on host B o start pinging from host A to B o unplug cable on host A o wait a while and then plug the cable back in This is what I get on an embedded PPC: 01:41:27.712729 IP 192.168.6.22 > 192.168.27.119: icmp 64: echo request seq 7 01:41:28.712697 IP 192.168.6.22 > 192.168.27.119: icmp 64: echo request seq 8 01:41:43.712240 IP 192.168.6.22 > 192.168.27.119: icmp 64: echo request seq 23 01:41:44.712210 IP 192.168.6.22 > 192.168.27.119: icmp 64: echo request seq 24 The interesting point is that the jump in sequence number coincides with the jump in rx time. Theory is that not all drivers will behave like this. Hopefully someone can verify that with e.g. an e1000, and then check whether the patch I send makes any difference. -Tommy ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-31 12:16 ` Tommy Christensen @ 2005-03-13 17:49 ` Hasso Tepper 0 siblings, 0 replies; 67+ messages in thread From: Hasso Tepper @ 2005-03-13 17:49 UTC (permalink / raw) To: Tommy Christensen Cc: hadi, Jeff Garzik, Thomas Spatzier, David S. Miller, Herbert Xu, netdev, Paul Jakma Tommy Christensen wrote: > On Mon, 2005-01-31 at 04:26, jamal wrote: > > It does look pretty sane.. Tested? > > Nope, I don't have access to any relevant HW. Hopefully someone > else can give it some beating. Please. Sorry for the long delay. I lost track of this discussion, was busy with other things, but finally found time to test this. > Hopefully someone can verify that with e.g. an e1000, and then check > whether the patch I send makes any difference. Yes, confirmed that the patch makes a difference and solves the Quagga ospfd problem (where all this started :). Thank you very much! PS. The patch is tested with 2.4.29 as well, but it requires the link watch stuff to work. The complete version (link-watch + vlan carrier detection + this fix) is here - http://hasso.linux.ee/quagga/2.4.29-link-watch.patch -- Hasso Tepper Elion Enterprises Ltd. WAN administrator ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2005-01-04 23:28 ` Jeff Garzik 2005-01-05 3:19 ` jamal @ 2005-01-05 6:26 ` Paul Jakma 1 sibling, 0 replies; 67+ messages in thread From: Paul Jakma @ 2005-01-05 6:26 UTC (permalink / raw) To: Jeff Garzik Cc: hadi, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev, Tommy Christensen On Tue, 4 Jan 2005, Jeff Garzik wrote: > Aren't there cases where people would -not- want the queue to be > cleared? Why *do* you want the queue to be cleared? (avoiding the NIC /dev/null is the only reason, right?) TCP (AIUI) has its own means to resend. Most (all?) other transports have *never* had an expectation of such reliability. Applications *know* they have to provide their own reliability. Queueing and later sending (by then) stale packets is *bad*. Think: - heartbeat applications - Router or route advertisements (IPv6 RA, IPv4 ICMP IRDP, RIP) - Streaming media (ok, not much damage here, but still there's simply no point queueing while link is down..) Why queue packets on behalf of applications which have no expectation of the kernel doing this (any application expecting such is certainly broken) and which very likely implement their own reliability or packet-loss recovery strategies? Particularly when queueing is quite possibly *detrimental* to what the application is trying to achieve (timely sending of packets). If an application wants reliable sending of packets, it will be using TCP. > Jeff regards, -- Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A Fortune: When the speaker and he to whom he is speaks do not understand, that is metaphysics. -- Voltaire ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2004-12-19 23:46 ` Tommy Christensen 2004-12-20 0:15 ` Jeff Garzik 2004-12-20 14:10 ` jamal @ 2004-12-20 14:16 ` Paul Jakma 2004-12-20 18:56 ` Jeff Garzik 2004-12-26 5:36 ` Herbert Xu 3 siblings, 1 reply; 67+ messages in thread From: Paul Jakma @ 2004-12-20 14:16 UTC (permalink / raw) To: Tommy Christensen Cc: hadi, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, jgarzik, netdev On Mon, 20 Dec 2004, Tommy Christensen wrote: >> For a routing protocol that actually is notified that the link >> went down, it should probably flush those socket buffer at that >> point. Or why not return an error, as soon as possible on the socket, eg ENOBUFS, and discard anything in the queue before that. Make it configurable via a sockopt if you think it'd harm ordinary apps (though, anything that cant deal with ENOBUFS is broken already, really..) or make it apply only to nonblock sockets. > responsibility of the application to flush the socket on > link-down events (by down'ing the interface?). That seems more complex than needs be, for userspace at least. regards, -- Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A Fortune: "Paul Lynde to block..." -- a contestant on "Hollywood Squares" ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2004-12-20 14:16 ` Paul Jakma @ 2004-12-20 18:56 ` Jeff Garzik 0 siblings, 0 replies; 67+ messages in thread From: Jeff Garzik @ 2004-12-20 18:56 UTC (permalink / raw) To: Paul Jakma, hadi Cc: Tommy Christensen, Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, netdev Paul Jakma wrote: >> responsibility of the application to flush the socket on >> link-down events (by down'ing the interface?). > > > That seems more complex than needs be, for userspace at least. It is the responsibility of the kernel to push complexity to userland. Some applications may NOT desire that the socket be flushed. That's an app policy decision. If this is the core issue, then I am even more inclined to think that the kernel is not what needs to be modified here. Jeff ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver. 2004-12-19 23:46 ` Tommy Christensen ` (2 preceding siblings ...) 2004-12-20 14:16 ` Paul Jakma @ 2004-12-26 5:36 ` Herbert Xu 3 siblings, 0 replies; 67+ messages in thread From: Herbert Xu @ 2004-12-26 5:36 UTC (permalink / raw) To: Tommy Christensen Cc: hadi, thomas.spatzier, davem, hasso, herbert, jgarzik, netdev, paul Tommy Christensen <tommy.christensen@tpack.net> wrote: > > OK. So is this the recommendation for these poor souls? > > - Use a socket for each device. Please show us your program first. Then we can figure out whether this is necessary or not. -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver.
  2004-12-19 19:29 ` jamal
  2004-12-19 22:29 ` Tommy Christensen
@ 2004-12-19 22:43 ` Jeff Garzik
  2004-12-19 23:54 ` Paul Jakma
  2 siblings, 0 replies; 67+ messages in thread

From: Jeff Garzik @ 2004-12-19 22:43 UTC (permalink / raw)
To: hadi, David S. Miller, Paul Jakma
Cc: Thomas Spatzier, Hasso Tepper, Herbert Xu, netdev

jamal wrote:
> On Wed, 2004-12-15 at 10:03, Thomas Spatzier wrote:
>
>> jamal <hadi@cyberus.ca> wrote on 15.12.2004 14:50:27:
>>
>>> When you netif_stop_queue you should never receive packets anymore
>>> at the device level. If you receive any its a bug and you should drop
>>> them and bitch violently. In other words i think what you have at the
>>> moment is bandaid not the solution.
>>
>> When we do a netif_stop_queue, we do not get any more packets.
>> So this behaviour is ok. The problem is that the socket write
>> queues fill up then and get blocked as soon as they are full.
>
> This is the strange part. Anyone still willing to provide a sample
> program that hangs?
>
>>> Can you describe how your driver uses the netif_start/stop/wake
>>> interfaces?
>>
>> Before the patch we are talking about:
>> When we detect a cable pull (or something like this) we call
>> netif_stop_queue and switch off the IFF_RUNNING flag.
>> Then when we detect that the cable is plugged in again, we
>> call netif_wake_queue and switch the IFF_RUNNING flag on.
>
> Not too bad except user space doesnt get async notification.
>
>> And with the patch:
>> On cable pull we switch off IFF_RUNNING and call
>> netif_carrier_off. We still get packets but drop them.
>> When the cable is plugged in we switch on IFF_RUNNING and
>> call netif_carrier_on.
>
> Ok, thats something you need to change.
> Why are you setting that IFF_RUNNING flag?
> Look at e1000 they do it right there.
>
> On link up:
> netif_carrier_on(netdev);
> netif_wake_queue(netdev);
>
> On link down:
> netif_carrier_off(netdev);
> netif_stop_queue(netdev);
>
> Still, I think we need to resolve the original problem.
> And I have a feeling we wont be seeing any program which
> reproduces it ;->

I've been watching this thread, and am still waiting to see a good,
isolated test case.

My initial reaction was based on an observation that
(a) the proposed s390 change creates a CPU cycle soaker, a /dev/null
    for skbs.
(b) it really sounds like the userland program is doing something broken

Even if (b) is not true, the change is unacceptable due to (a) regardless.

	Jeff

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver.
  2004-12-19 19:29 ` jamal
  2004-12-19 22:29 ` Tommy Christensen
  2004-12-19 22:43 ` Jeff Garzik
@ 2004-12-19 23:54 ` Paul Jakma
  2004-12-20 14:11 ` jamal
  2 siblings, 1 reply; 67+ messages in thread

From: Paul Jakma @ 2004-12-19 23:54 UTC (permalink / raw)
To: jamal
Cc: Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, jgarzik, netdev

On Sun, 19 Dec 2004, jamal wrote:

> This is the strange part. Anyone still willing to provide a sample
> program that hangs?

I can provide instructions, or you can wait a wee bit - I didnt have
any e1000 hardware to test with (e1000 being one of the drivers which
has this behaviour, AFAIK/TTBOMK) - but a computer with an e1000
arrived Friday. So, give me a bit and I'll try come up with a test
case. A simple UDP send should be enough.

> Still, I think we need to resolve the original problem.
> And I have a feeling we wont be seeing any program which
> reproduces it ;->

Just send a UDP packet - AIUI when link goes down a window exists
where the application can send a packet down the file descriptor
without error, but it will be queued rather than sent, and the fd
eventually blocks/returns EAGAIN. That window is the problem.

I'll test tomorrow.

> cheers,
> jamal

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
You want to know why I kept getting promoted?  Because my mouth knows
more than my brain.
		-- W.G.

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver.
  2004-12-19 23:54 ` Paul Jakma
@ 2004-12-20 14:11 ` jamal
  0 siblings, 0 replies; 67+ messages in thread

From: jamal @ 2004-12-20 14:11 UTC (permalink / raw)
To: Paul Jakma
Cc: Thomas Spatzier, David S. Miller, Hasso Tepper, Herbert Xu, jgarzik, netdev

On Sun, 2004-12-19 at 18:54, Paul Jakma wrote:
> On Sun, 19 Dec 2004, jamal wrote:
>
>> This is the strange part. Anyone still willing to provide a sample
>> program that hangs?
>
> I can provide instructions, or you can wait a wee bit - I didnt have
> any e1000 hardware to test with (e1000 being one of the drivers which
> has this behaviour, AFAIK/TTBOMK) - but a computer with an e1000
> arrived Friday. So, give me a bit and I'll try come up with a test
> case. A simple UDP send should be enough.

When you are ready let me know, we could repeat what i said in my
last email.

cheers,
jamal

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver.
  2004-12-06 14:42 ` Hasso Tepper
  2004-12-07  1:13 ` Herbert Xu
@ 2004-12-07  2:39 ` jamal
  1 sibling, 0 replies; 67+ messages in thread

From: jamal @ 2004-12-07 2:39 UTC (permalink / raw)
To: Hasso Tepper; +Cc: Paul Jakma, Thomas Spatzier, jgarzik, netdev

On Mon, 2004-12-06 at 09:42, Hasso Tepper wrote:
> I'm trying to summarize. The approach - one raw socket to send/receive
> messages no matter how many interfaces are used - worked for
> Quagga/Zebra routing software daemons till now. If some of these
> interfaces was operationally down, the socket wasn't blocked even if
> the queue was full.
>
> In fact "man sendmsg" still has this text:
>
> ENOBUFS
>        The output queue for a network interface was full. This
>        generally indicates that the interface has stopped sending,
>        but may be caused by transient congestion. (Normally, this
>        does not occur in Linux. Packets are just silently dropped
>        when a device queue overflows.)
>
> Seems that it's no longer true. Seems that the kernel is now trying as
> hard as possible not to lose any data - data is queued and if the
> queue gets full, all related sockets will be blocked to notify the
> application.

We need to investigate why this happens. It doesnt sound like good
behavior, nor legitimate. Does this happen only to raw sockets?
Herbert asked for a sample small program that reproduces it.

> So, the one socket approach doesn't work any more for Quagga/Zebra. No
> problem, we can take the "one socket per interface" approach. And we
> already have link detection implemented to notify daemons.

I dont think it would be necessary with the latest changes to
notifications in 2.6.x.

> But there will still be a potential problem - sending the "interface
> down" message from kernel to the zebra daemon and then to the routing
> protocol daemon takes time. And during this time the daemon can send
> packets which will sit in the queue and may cause many problems if
> sent to the network later (if link comes up again). Think about
> stateless routing protocols like rip(ng), ipv6 router advertisements
> etc. They may carry info that's no longer true, causing routing loops
> etc.
>
> And to clarify a little bit: no - Quagga/Zebra didn't work perfectly
> with the previous approach. In fact with link detection and a socket
> per interface it will be better than ever no matter what the kernel
> behaviour is. We just want to make sure that the solution will be
> bulletproof.
>
> So, problem - how can we make sure that no potentially dangerous aged
> (routing) info will be in the network?

I think the idea of having the driver junk packets when link is down
does sound plausible as a solution.

cheers,
jamal

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: [patch 4/10] s390: network driver.
  2004-12-06 11:27 ` jamal
  2004-12-06 14:42 ` Hasso Tepper
@ 2004-12-06 18:44 ` Paul Jakma
  1 sibling, 0 replies; 67+ messages in thread

From: Paul Jakma @ 2004-12-06 18:44 UTC (permalink / raw)
To: jamal; +Cc: Thomas Spatzier, jgarzik, linux-kernel, netdev

On Mon, 6 Dec 2004, jamal wrote:

> Dont post networking related patches on other lists. I havent seen
> said patch, but it seems someone is complaining about some behavior
> changing?

I missed the beginning of the thread too, but saw Jeff's reply to
Thomas on netdev. It appears the original patch was to make the s390
network driver discard packets on link-down. Jeff had replied to say
this was bad, that queues are meant to fill and that this was what
other drivers (e1000, tg3) did.

> In regards to link down and packets being queued. Agreed this is a
> little problematic for some apps/transports.

Tis yes. Particularly for apps using raw and UDP+IP_HDRINCL sockets.
This problem came to light when we got reports of ospfd blocking
because link was down, late in 2.4 with a certain version of the
(iirc) e100 driver. ospfd uses one single socket for all interfaces,
and relies on IP_HDRINCL to have the packet routed out the right
interface. However this approach doesnt play well if the socket can
be blocked completely because of /one/ interface having its link
down.

The behaviour we expected (and got up until now) is to receive either
ENOBUFS or else, if the kernel accepts the packet write, for it to
drop it if it can not be sent.

We can work around that by moving to a socket per interface. However
it still leaves the problem of packets being queued indefinitely
while the link is down and being sent when link comes back. This is
*not* good for RIP, IPv4 IRDP and IPv6 RA.

> In the case the netdevice is administratively downed both the qdisc
> and DMA ring packets are flushed.

What about any packets remaining in the socket buffer? (if that makes
sense - i dont know enough about internals sadly). Are those queued?

> Newer packets will never be queued

That no longer appears to be the case though. The socket blocks, and
/some/ packets are queued (presumably those which still were in the
socket buffer? i dont know exactly..).

> and you should quickly be able to find from your app that
> the device is down.

We can yes, via rtnetlink - but it's impossible to guarantee we'll
know the link is down before we try to write a packet.

> In the case of netdevice being operationally down

? As in 'ip link set dev ... down'?

> - I am hoping this is what the discussion is, having jumped on it -

No, it's for link-down, AIUI.

> both queues stay intact. What you can do is certainly from user
> space admin down/up the device when you receive a netlink carrier
> off notification.

That seems possible, but quite a hack. Something to work at a socket
level would possibly be nicer. (Socket being the primary handle our
application has).

> I am struggling to see whether dropping the packet inside the tx
> path once it is operationally down is so blasphemous ... need to get
> caffeine first.

As long as reliable transports have some other transport-specific
queue, it shouldnt be a problem. For UDP and raw no reliability or
guarantees are expected by applications (least shouldnt be), and
queueing some packets on link-down interferes with application-layer
expectations.

> cheers,
> jamal

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
The UPS doesn't have a battery backup.

^ permalink raw reply	[flat|nested] 67+ messages in thread
Thread overview: 67+ messages
[not found] <OF88EC0E9F.DE8FC278-ONC1256F4A.0038D5C0-C1256F4A.00398E11@de.ibm.com>
2004-11-14 1:29 ` [patch 4/10] s390: network driver Jeff Garzik
2004-11-15 7:52 ` Paul Jakma
2004-11-21 8:16 ` Paul Jakma
2004-11-29 15:57 ` Thomas Spatzier
2004-11-29 16:30 ` Paul Jakma
2004-11-29 16:41 ` Thomas Spatzier
2004-11-29 20:27 ` Paul Jakma
2004-11-30 7:22 ` Thomas Spatzier
2004-12-05 6:25 ` Paul Jakma
2004-12-06 11:01 ` Post Network dev questions to netdev Please WAS(Re: " jamal
2004-12-06 11:27 ` jamal
2004-12-06 14:42 ` Hasso Tepper
2004-12-07 1:13 ` Herbert Xu
2004-12-07 2:22 ` jamal
2004-12-10 15:37 ` Paul Jakma
2004-12-14 7:40 ` Thomas Spatzier
2004-12-15 13:50 ` jamal
2004-12-15 15:03 ` Thomas Spatzier
2004-12-19 19:29 ` jamal
2004-12-19 22:29 ` Tommy Christensen
2004-12-19 23:05 ` jamal
2004-12-19 23:46 ` Tommy Christensen
2004-12-20 0:15 ` Jeff Garzik
2004-12-20 14:10 ` jamal
2004-12-20 18:54 ` Jeff Garzik
2004-12-21 0:13 ` Tommy Christensen
2004-12-21 1:19 ` Jeff Garzik
2004-12-22 10:56 ` Thomas Spatzier
2004-12-22 11:07 ` Jeff Garzik
2004-12-22 13:48 ` jamal
2005-01-03 9:10 ` Thomas Spatzier
2005-01-03 15:05 ` jamal
2005-01-04 23:28 ` Jeff Garzik
2005-01-05 3:19 ` jamal
2005-01-05 6:30 ` Paul Jakma
2005-01-05 13:16 ` jamal
2005-01-05 14:29 ` Paul Jakma
2005-01-06 13:55 ` jamal
2005-01-05 15:35 ` Tommy Christensen
2005-01-06 13:58 ` jamal
2005-01-06 15:06 ` Tommy Christensen
2005-01-07 13:32 ` jamal
2005-01-07 15:26 ` Tommy Christensen
2005-01-10 13:18 ` jamal
2005-01-16 23:10 ` jamal
2005-01-17 12:04 ` Hasso Tepper
2005-01-17 22:04 ` Tommy Christensen
2005-01-17 22:13 ` Peter Buckingham
2005-01-17 22:36 ` jamal
2005-01-17 22:53 ` Tommy Christensen
2005-01-17 21:38 ` Tommy Christensen
2005-01-30 23:39 ` Tommy Christensen
2005-01-31 0:09 ` jamal
2005-01-31 0:12 ` jamal
2005-01-31 0:31 ` Tommy Christensen
2005-01-31 3:26 ` jamal
2005-01-31 12:16 ` Tommy Christensen
2005-03-13 17:49 ` Hasso Tepper
2005-01-05 6:26 ` Paul Jakma
2004-12-20 14:16 ` Paul Jakma
2004-12-20 18:56 ` Jeff Garzik
2004-12-26 5:36 ` Herbert Xu
2004-12-19 22:43 ` Jeff Garzik
2004-12-19 23:54 ` Paul Jakma
2004-12-20 14:11 ` jamal
2004-12-07 2:39 ` jamal
2004-12-06 18:44 ` Paul Jakma