Netdev List
* Re: Congestion Avoidance Monitoring Tools
From: Stephen Hemminger @ 2006-04-21 15:19 UTC (permalink / raw)
  To: piet; +Cc: netdev, linux-net
In-Reply-To: <1145597174.12413.17.camel@piet2.bluelane.com>

On Thu, 20 Apr 2006 22:26:14 -0700
Piet Delaney <piet@bluelane.com> wrote:

> I'm upgrading our 2.6.12 kernel to 2.6.13, which includes significant
> congestion avoidance code additions and changes. I was wondering if
> there are any tools folks can recommend for testing the kernel to make
> sure the congestion avoidance code is operating correctly. For 
> example the displaying of the congestion window as a function of time
> while undergoing convergence. For causing congestion I could modify 
> a kernel to discard packets once in a while on a lab gateway and hit 
> it with iperf. HP's netperf looks interesting. 
> 
> Any suggestions?
> 
> 
> -piet
> 

2.6.13 still had lots of problems, things didn't really get working
right till 2.6.15 or later. Especially with TSO.

I have a tool that uses kprobes; see http://developer.osdl.org/shemminger/prototypes/tcpprobe.tar.gz
I try to keep it up to date with the current kernel and build process; I last used it
on 2.6.16.


* Question on using Linux as a router
From: Serge Goodenko @ 2006-04-21 14:21 UTC (permalink / raw)
  To: netdev

Hi everybody!

I got the following question.

When I use Linux as a router (via IP forwarding), which kernel variables (maybe some queues?) are the closest analogue of a hardware router's input and output buffers? Could this be, say, the backlog queue, or something else?

What I need to get are the sizes and loads of those buffers during transmission.
I know about variables such as sk->sk_rcvbuf and sk->sk_rmem_alloc but they are not used during ip forwarding as the socket (i.e. sock structure) is not even being created for that purpose. As far as I understood these variables in sock structure are mostly used for tcp-level packet processing and they represent the values written in files like /proc/sys/net/core/wmem_default etc. (please correct me if that's wrong), but nevertheless maybe I also can use these values for "routing" buffers (i.e. on ip level)?

thanks in advance,
Serge
MIPT
Moscow, Russia


* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
From: Andy Gospodarek @ 2006-04-21 13:27 UTC (permalink / raw)
  To: Michael Chan
  Cc: Herbert Xu, shawvrana, netdev, auke-jan.h.kok, davem, jgarzik
In-Reply-To: <1145582676.3195.18.camel@rh4>

On Thu, Apr 20, 2006 at 06:24:36PM -0700, Michael Chan wrote:
> On Fri, 2006-04-21 at 12:40 +1000, Herbert Xu wrote:
> 
> > One simple solution is to establish a separate queue for RTNL-holding
> > users or vice versa for non-RTNL holding networking users.  That
> > would allow the drivers to safely flush the non-RTNL queue while
> > holding the RTNL.
> 
> You mean a separate workqueue for net drivers to use instead of the
> keventd_wq? Yeah, I think that'll work. Each driver can also create its
> own workqueue but that may be a bit more wasteful.
> 

Isn't the only possibility for a linkwatch deadlock when the
__LINK_STATE_LINKWATCH_PENDING bit is set in dev->state?

Off the top of my head...

Would it be interesting to change the calls to flush_scheduled_work()
to a new function net_flush_scheduled_work(), with the intent of
eventually creating a new work queue, but temporarily just checking
to make sure there are no linkwatch events pending and, if there are,
allowing them to run first before calling flush_scheduled_work()?

This probably isn't a perfect solution, but I thought I'd throw it out
there and see what you think....

-andy




* Fw: Bug: PPP dropouts in >=2.6.16
From: Andrew Morton @ 2006-04-21  8:08 UTC (permalink / raw)
  To: netdev; +Cc: Nuri Jawad


We do seem to have had a few reports of ppp regressions around this
timeframe.


Begin forwarded message:

Date: Thu, 20 Apr 2006 23:28:24 +0200 (CEST)
From: Nuri Jawad <lkml@jawad.org>
To: linux-kernel@vger.kernel.org
Subject: Bug: PPP dropouts in >=2.6.16


Good evening,

I've recently had problems with my ADSL connection (PPPoE) after upgrading
from 2.6.15.3 to 2.6.16.5. Every now and then, it would stall for 30-50
seconds. According to tcpdump, during these dropouts packets get received
through ppp0 but none that are supposed to be sent appear on the
interface. This is consistent with the modem not sending out any data via
its ATM interface during these periods. I made sure the ISP, modem or
ethernet interface wasn't the problem with a second box running IPCop
(minimal 2.4 router distro, I can log in twice with my ISP) and a telnet
connection to the modem that never stalled.

Upgrading pppd from 2.4.2 to 2.4.4b1 and rp-pppoe from 3.5 (latest Debian
package) to 3.8 had no effect.
I was using Bittorrent and also the folding@home client so the system had
a high load. Stopping those seemed to reduce the frequency of dropouts
but they were still coming up at least every few hours. A fallback to the
old kernel made the problem disappear.

I then tested 24 hours a day this week by pinging my box from 2 hosts and
making sure above load was persistent, while trying different kernels. So
far I can say that:

- <= 2.6.15.7 is not affected, tested with .3, .4 and .7
- >= 2.6.16 is affected, tested with 2.6.16, .1, .5 and .9
- the dropouts last between 30 and 50 seconds but often exactly 41, they
  appear about every half to one hour
- a night yields 20-40 lost packets with 2.6.15 and 700-1100 with 2.6.16
  on my connection

The system:
P4 Northwood with HT enabled, 2 GB RAM, Asus P4C800
Boot/root PATA ICH5, torrents saved to SATA HD through ata-piix
Third drive through sata-promise
Debian GNU/Linux unstable
Kernel config: http://jawad.org/.config

Let me know what I can do to give you more specific information.

Regards,
Nuri


* Re: Congestion Avoidance Monitoring Tools
From: Piet Delaney @ 2006-04-21  7:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Piet Delaney, Tom Young, netdev, linux-net
In-Reply-To: <200604210857.19084.ak@suse.de>

On Fri, 2006-04-21 at 08:57 +0200, Andi Kleen wrote:
> On Friday 21 April 2006 07:59, Tom Young wrote:
> > On Thu, 2006-04-20 at 22:26 -0700, Piet Delaney wrote:
> > > I'm upgrading our 2.6.12 kernel to 2.6.13, which includes significant
> > > congestion avoidance code additions and changes. I was wondering if
> > > there are any tools folks can recommend for testing the kernel to make
> > > sure the congestion avoidance code is operating correctly. For 
> > > example the displaying of the congestion window as a function of time
> > > while undergoing convergence. For causing congestion I could modify 
> > > a kernel to discard packets once in a while on a lab gateway and hit 
> > > it with iperf. HP's netperf looks interesting. 
> > > 
> > > Any suggestions?
> > > 
> > > 
> > > -piet
> > > 
> > 
> > Hi,
> > 
> > Try having a look at the output of 'ss -i' (you may need to update to
> > the latest iproute2 tools). You could either try and parse the text
> > output of that or use the same inet_diag interface that ss uses to poll
> > > for the data at regular intervals.
> 
> Another way is to use tcptrace on a tcpdump file
> (http://jarok.cs.ohiou.edu/software/tcptrace/) 
> It finds a lot of statistics about the dumped TCP connections.

Tcptrace looks pretty good, graphing the sequence numbers as a function
of time was one of the features I'm looking for.

> 
> Newer ethereal also has some TCP plotting functions that are useful.
> They don't display the congestion window directly, but you can see it indirectly.

I never tried the ethereal graph facility, looks like another great
idea.


Thanks,
-piet

> 
> -Andi
-- 
---
piet@bluelane.com



* Re: Congestion Avoidance Monitoring Tools
From: Andi Kleen @ 2006-04-21  6:57 UTC (permalink / raw)
  To: Tom Young; +Cc: piet, netdev, linux-net
In-Reply-To: <1145599196.2534.38.camel@penguin>

On Friday 21 April 2006 07:59, Tom Young wrote:
> On Thu, 2006-04-20 at 22:26 -0700, Piet Delaney wrote:
> > I'm upgrading our 2.6.12 kernel to 2.6.13, which includes significant
> > congestion avoidance code additions and changes. I was wondering if
> > there are any tools folks can recommend for testing the kernel to make
> > sure the congestion avoidance code is operating correctly. For 
> > example the displaying of the congestion window as a function of time
> > while undergoing convergence. For causing congestion I could modify 
> > a kernel to discard packets once in a while on a lab gateway and hit 
> > it with iperf. HP's netperf looks interesting. 
> > 
> > Any suggestions?
> > 
> > 
> > -piet
> > 
> 
> Hi,
> 
> Try having a look at the output of 'ss -i' (you may need to update to
> the latest iproute2 tools). You could either try and parse the text
> output of that or use the same inet_diag interface that ss uses to poll
> > for the data at regular intervals.

Another way is to use tcptrace on a tcpdump file
(http://jarok.cs.ohiou.edu/software/tcptrace/) 
It finds a lot of statistics about the dumped TCP connections.

Newer ethereal also has some TCP plotting functions that are useful.
They don't display the congestion window directly, but you can see it indirectly.

-Andi


* Re: Congestion Avoidance Monitoring Tools
From: Tom Young @ 2006-04-21  5:59 UTC (permalink / raw)
  To: piet; +Cc: netdev, linux-net
In-Reply-To: <1145597174.12413.17.camel@piet2.bluelane.com>

On Thu, 2006-04-20 at 22:26 -0700, Piet Delaney wrote:
> I'm upgrading our 2.6.12 kernel to 2.6.13, which includes significant
> congestion avoidance code additions and changes. I was wondering if
> there are any tools folks can recommend for testing the kernel to make
> sure the congestion avoidance code is operating correctly. For 
> example the displaying of the congestion window as a function of time
> while undergoing convergence. For causing congestion I could modify 
> a kernel to discard packets once in a while on a lab gateway and hit 
> it with iperf. HP's netperf looks interesting. 
> 
> Any suggestions?
> 
> 
> -piet
> 

Hi,

Try having a look at the output of 'ss -i' (you may need to update to
the latest iproute2 tools). You could either try and parse the text
output of that or use the same inet_diag interface that ss uses to poll
for the data at regular intervals.
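
The text-parsing route could look roughly like the sketch below. It assumes
that 'ss -i' prints a "cwnd:<n>" token on the TCP info line, which should be
checked against the output of your own iproute2 version. The sample line and
the helper name extract_cwnd are made up for illustration.

```python
import re

# Assumed format: the TCP info line of `ss -i` contains tokens such as
# "cwnd:10" and "ssthresh:7". Verify against your own iproute2 output.
CWND_RE = re.compile(r"\bcwnd:(\d+)")


def extract_cwnd(ss_output):
    """Return every cwnd value (in segments) found in `ss -i` text output."""
    return [int(m.group(1)) for m in CWND_RE.finditer(ss_output)]


# Hypothetical sample line, not captured from a real run.
SAMPLE = "\t reno wscale:7,7 rto:204 rtt:0.5/0.25 mss:1448 cwnd:10 ssthresh:7\n"

if __name__ == "__main__":
    print(extract_cwnd(SAMPLE))
```

Running 'ss -i' from a loop, feeding its output to a function like this and
recording a timestamp with each result would give the congestion window as a
function of time.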

-- 
Thomas Young
http://cubinlab.ee.unimelb.edu.au/~tyo/
Research Assistant
CUBIN Research Centre - University of Melbourne



* Fw: [Bugme-new] [Bug 6420] New: iptables is complaining with bogus unknown error 18446744073709551615
From: Andrew Morton @ 2006-04-21  6:26 UTC (permalink / raw)
  To: netdev; +Cc: mvolaski, bugme-daemon@kernel-bugs.osdl.org



Begin forwarded message:

Date: Thu, 20 Apr 2006 23:17:58 -0700
From: bugme-daemon@bugzilla.kernel.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 6420] New: iptables is complaining with bogus unknown error 18446744073709551615


http://bugzilla.kernel.org/show_bug.cgi?id=6420

           Summary: iptables is complaining with bogus unknown error
                    18446744073709551615
    Kernel Version: 2.6.17-rc2
            Status: NEW
          Severity: normal
             Owner: laforge@gnumonks.org
         Submitter: mvolaski@aecom.yu.edu


At least since 2.6.16.1, many calls to iptables no longer function, at least
under 64-bit x86, presumably due to a bug in the netfilter kernel code.

The problem is still present in 2.6.17-rc2.

The error from iptables is
iptables: unknown error 18446744073709551615
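
For what it's worth, 18446744073709551615 is 2^64 - 1, which is what -1 looks
like when it is reinterpreted as an unsigned 64-bit integer. The sketch below
only demonstrates that arithmetic; it is not a claim about where in iptables
or netfilter the value actually originates.

```python
import ctypes

# The error number printed by iptables in this report.
REPORTED = 18446744073709551615

# -1 reinterpreted as an unsigned 64-bit value.
as_unsigned = ctypes.c_uint64(-1).value

# The same bit pattern read back as a signed 64-bit value.
as_signed = ctypes.c_int64(REPORTED).value

if __name__ == "__main__":
    print(as_unsigned == REPORTED, as_signed)
```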

Examples of rules that give the error are

1) iptables -A INPUT -i bond0 -s 129.98.90.0/24 -p tcp --dport 548 -j ACCEPT
2) iptables -A INPUT -i bond0 -s 129.98.90.101/32 -p tcp --dport 497 -j ACCEPT
3) iptables -A INPUT -i bond0 -s 129.98.90.227/32 -p tcp --dport 22 -j ACCEPT

Example of a rule that does not give the error:
1) iptables -A INPUT -i bond0 -p ICMP --icmp-type echo-request -s 129.98.90.13/32 -j ACCEPT

The computer is using IPv4 and not IPv6, which has not been compiled into the
kernel.

iptables is version 1.3.5.

Kernel configuration related to iptables follows:

CONFIG_IP_NF_CONNTRACK=m
CONFIG_IP_NF_CT_ACCT=y
CONFIG_IP_NF_CONNTRACK_MARK=y
CONFIG_IP_NF_CONNTRACK_EVENTS=y
CONFIG_IP_NF_CONNTRACK_NETLINK=m
# CONFIG_IP_NF_CT_PROTO_SCTP is not set
CONFIG_IP_NF_FTP=m
# CONFIG_IP_NF_IRC is not set
# CONFIG_IP_NF_NETBIOS_NS is not set
# CONFIG_IP_NF_TFTP is not set
# CONFIG_IP_NF_AMANDA is not set
# CONFIG_IP_NF_PPTP is not set
# CONFIG_IP_NF_H323 is not set
# CONFIG_IP_NF_QUEUE is not set
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_IPRANGE=m
CONFIG_IP_NF_MATCH_TOS=m
CONFIG_IP_NF_MATCH_RECENT=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_DSCP=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_MATCH_OWNER=m
CONFIG_IP_NF_MATCH_ADDRTYPE=m
CONFIG_IP_NF_MATCH_HASHLIMIT=m
CONFIG_IP_NF_FILTER=m
# CONFIG_IP_NF_TARGET_REJECT is not set
CONFIG_IP_NF_TARGET_LOG=m
CONFIG_IP_NF_TARGET_ULOG=m
CONFIG_IP_NF_TARGET_TCPMSS=m
# CONFIG_IP_NF_NAT is not set
CONFIG_IP_NF_MANGLE=m
# CONFIG_IP_NF_TARGET_TOS is not set
# CONFIG_IP_NF_TARGET_ECN is not set
# CONFIG_IP_NF_TARGET_DSCP is not set
# CONFIG_IP_NF_TARGET_TTL is not set
# CONFIG_IP_NF_TARGET_CLUSTERIP is not set
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m

CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
# CONFIG_NETFILTER_XT_TARGET_CONNMARK is not set
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
# CONFIG_NETFILTER_XT_TARGET_NOTRACK is not set
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_REALM=m
CONFIG_NETFILTER_XT_MATCH_SCTP=m
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m


lsmod shows
xt_state                4928  0 
ipt_LOG                 8960  0 
ip_conntrack_ftp       10000  0 
ip_conntrack           57880  2 xt_state,ip_conntrack_ftp
nfnetlink               8520  1 ip_conntrack
iptable_filter          5440  0 
ip_tables              22168  1 iptable_filter
x_tables               17800  3 xt_state,ipt_LOG,ip_tables


This issue has been posted to netfilter bugzilla as
https://bugzilla.netfilter.org/bugzilla/show_bug.cgi?id=467



* Congestion Avoidance Monitoring Tools
From: Piet Delaney @ 2006-04-21  5:26 UTC (permalink / raw)
  To: netdev; +Cc: Piet Delaney, linux-net

I'm upgrading our 2.6.12 kernel to 2.6.13, which includes significant
congestion avoidance code additions and changes. I was wondering if
there are any tools folks can recommend for testing the kernel to make
sure the congestion avoidance code is operating correctly. For 
example the displaying of the congestion window as a function of time
while undergoing convergence. For causing congestion I could modify 
a kernel to discard packets once in a while on a lab gateway and hit 
it with iperf. HP's netperf looks interesting. 

Any suggestions?


-piet

-- 
---
piet@bluelane.com



* Re: [PATCH 0/10] [IOAT] I/OAT patches repost
From: Olof Johansson @ 2006-04-21  4:42 UTC (permalink / raw)
  To: David S. Miller; +Cc: olof, andrew.grover, netdev
In-Reply-To: <20060420.204200.103377406.davem@davemloft.net>

On Thu, Apr 20, 2006 at 08:42:00PM -0700, David S. Miller wrote:

> This is basically why none of the performance gains add up to me.  I
> am thus very concerned that the current non-cache-warming
> implementation may fall flat performance wise.

Ok, I buy your arguments. It does seem unlikely that a DMA offload
without cache warmth will be a net gain. More performance data will
definitely be required.

After digging through PDFs, it seems that the Freescale 85xx (at least,
probably earlier models as well) can warm L2 for the DMA destination
data. However, I don't have any hardware with it to play around
with for benchmarking to see what cache warming might bring (back),
performance-wise.

I think there is still use for a common multi-function DMA framework
across platforms and client components, even if net receive doesn't end
up being {a,the first} user.


-Olof


* Re: [PATCH 0/10] [IOAT] I/OAT patches repost
From: David S. Miller @ 2006-04-21  3:42 UTC (permalink / raw)
  To: olof; +Cc: andrew.grover, netdev
In-Reply-To: <20060421030426.GM26746@pb15.lixom.net>

From: Olof Johansson <olof@lixom.net>
Date: Thu, 20 Apr 2006 22:04:26 -0500

> On Thu, Apr 20, 2006 at 05:27:42PM -0700, David S. Miller wrote:
> > Besides the control overhead of the DMA engines, the biggest thing
> > lost in my opinion is the perfect cache warming that a cpu based copy
> > does from the kernel socket buffer into userspace.
> 
> It's definitely the easiest way to always make sure the right caches
> are warm for the app, that I agree with.
> 
> But, when warming those caches by copying, the data is pulled in through
> a potentially cold cache in the first place. So the cache misses are
> just moved from the copy loop to userspace with dma offload. Or am I
> missing something?

Yes, and it means that the memory bandwidth costs are equivalent
between I/O AT and cpu copy.

In the cpu copy case you eat the read cache miss, but on the write
side you'll prewarm the cache properly.  In the I/O AT case you
eat the same read cost, but the cache will not be prewarmed, so you'll
eat the read cache miss in the application.  It's moving the same
exact cost from one place to another.

The time it takes to get the app to make forward progress (meaning
returned from the recvmsg() system call and back in userspace) must by
definition take at least as long with I/O AT as it does with cpu
copies.  Yet in the I/O AT case, the application must wait that long
and also then take in the delays of the cache misses when it tries to
read the data that the I/O AT engine copied.  Instead of eating the
cache miss cost in the kernel, we eat it in the app because in the I/O
AT case the cpu won't have the user data fresh and loaded into the cpu
cache.

And I say I/O AT must take "at least as long" as cpu copies because
the same memory copy cost is there, and on top of that I/O AT has to
program the DMA controller and touch a _lot_ of other state to get
things going and then wake the task up.  We're talking non-trivial
overheads like grabbing the page mappings out of the page tables using
get_user_pages().  Evgeniy has posted some very nice performance graphs
showing how poorly that function scales.
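
Restated as a rough inequality, where the symbols are introduced here only as
shorthand for the costs described above (T_copy for the memory copy itself,
T_setup for programming the DMA controller, grabbing the page mappings and
waking the task, T_miss for the application's later cache misses):

```latex
\begin{aligned}
T_{\text{cpu}}  &= T_{\text{copy}} \\
T_{\text{ioat}} &= T_{\text{setup}} + T_{\text{copy}} + T_{\text{miss}} \;\ge\; T_{\text{cpu}}
\end{aligned}
```

This holds only under the assumption stated above that both paths pay the same
memory copy cost.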

This is basically why none of the performance gains add up to me.  I
am thus very concerned that the current non-cache-warming
implementation may fall flat performance wise.


* [PATCH] Unregister network device before releasing PCMCIA resources
From: Pavel Roskin @ 2006-04-21  3:17 UTC (permalink / raw)
  To: netdev; +Cc: linux-pcmcia, linville, Dominik Brodowski

From: Pavel Roskin <proski@gnu.org>

This is the right thing to do and it prevents kernel BUG on unload.

Some PCMCIA network drivers use link->dev_node as a flag indicating that
the network device has been successfully registered.  Recent code
changes cause this flag to be 0 after PCMCIA resources have been
released.

Signed-off-by: Pavel Roskin <proski@gnu.org>
---

 drivers/net/wireless/netwave_cs.c  |    4 ++--
 drivers/net/wireless/orinoco_cs.c  |    5 +++--
 drivers/net/wireless/ray_cs.c      |    4 +++-
 drivers/net/wireless/spectrum_cs.c |    5 +++--
 drivers/net/wireless/wavelan_cs.c  |    9 +++++----
 5 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/net/wireless/netwave_cs.c b/drivers/net/wireless/netwave_cs.c
index 9343d97..5d80db2 100644
--- a/drivers/net/wireless/netwave_cs.c
+++ b/drivers/net/wireless/netwave_cs.c
@@ -445,11 +445,11 @@ static void netwave_detach(struct pcmcia
 
 	DEBUG(0, "netwave_detach(0x%p)\n", link);
 
-	netwave_release(link);
-
 	if (link->dev_node)
 		unregister_netdev(dev);
 
+	netwave_release(link);
+
 	free_netdev(dev);
 } /* netwave_detach */
 
diff --git a/drivers/net/wireless/orinoco_cs.c b/drivers/net/wireless/orinoco_cs.c
index 434f7d7..5988305 100644
--- a/drivers/net/wireless/orinoco_cs.c
+++ b/drivers/net/wireless/orinoco_cs.c
@@ -147,14 +147,15 @@ static void orinoco_cs_detach(struct pcm
 {
 	struct net_device *dev = link->priv;
 
-	orinoco_cs_release(link);
-
 	DEBUG(0, PFX "detach: link=%p link->dev_node=%p\n", link, link->dev_node);
 	if (link->dev_node) {
 		DEBUG(0, PFX "About to unregister net device %p\n",
 		      dev);
 		unregister_netdev(dev);
 	}
+
+	orinoco_cs_release(link);
+
 	free_orinocodev(dev);
 }				/* orinoco_cs_detach */
 
diff --git a/drivers/net/wireless/ray_cs.c b/drivers/net/wireless/ray_cs.c
index 879eb42..fac4f1b 100644
--- a/drivers/net/wireless/ray_cs.c
+++ b/drivers/net/wireless/ray_cs.c
@@ -388,13 +388,15 @@ static void ray_detach(struct pcmcia_dev
     this_device = NULL;
     dev = link->priv;
 
+    if (link->dev_node)
+	unregister_netdev(dev);
+
     ray_release(link);
 
     local = (ray_dev_t *)dev->priv;
     del_timer(&local->timer);
 
     if (link->priv) {
-	if (link->dev_node) unregister_netdev(dev);
         free_netdev(dev);
     }
     DEBUG(2,"ray_cs ray_detach ending\n");
diff --git a/drivers/net/wireless/spectrum_cs.c b/drivers/net/wireless/spectrum_cs.c
index f7b77ce..2551938 100644
--- a/drivers/net/wireless/spectrum_cs.c
+++ b/drivers/net/wireless/spectrum_cs.c
@@ -626,14 +626,15 @@ static void spectrum_cs_detach(struct pc
 {
 	struct net_device *dev = link->priv;
 
-	spectrum_cs_release(link);
-
 	DEBUG(0, PFX "detach: link=%p link->dev_node=%p\n", link, link->dev_node);
 	if (link->dev_node) {
 		DEBUG(0, PFX "About to unregister net device %p\n",
 		      dev);
 		unregister_netdev(dev);
 	}
+
+	spectrum_cs_release(link);
+
 	free_orinocodev(dev);
 }				/* spectrum_cs_detach */
 
diff --git a/drivers/net/wireless/wavelan_cs.c b/drivers/net/wireless/wavelan_cs.c
index f7724eb..03c2e16 100644
--- a/drivers/net/wireless/wavelan_cs.c
+++ b/drivers/net/wireless/wavelan_cs.c
@@ -4681,6 +4681,11 @@ #ifdef DEBUG_CALLBACK_TRACE
   printk(KERN_DEBUG "-> wavelan_detach(0x%p)\n", link);
 #endif
 
+  /* Remove ourselves from the kernel list of ethernet devices */
+  /* Warning : can't be called from interrupt, timer or wavelan_close() */
+  if (link->dev_node)
+    unregister_netdev(dev);
+
   /* Some others haven't done their job : give them another chance */
   wv_pcmcia_release(link);
 
@@ -4689,10 +4694,6 @@ #endif
     {
       struct net_device *	dev = (struct net_device *) link->priv;
 
-      /* Remove ourselves from the kernel list of ethernet devices */
-      /* Warning : can't be called from interrupt, timer or wavelan_close() */
-      if (link->dev_node)
-	unregister_netdev(dev);
       link->dev_node = NULL;
       ((net_local *)netdev_priv(dev))->link = NULL;
       ((net_local *)netdev_priv(dev))->dev = NULL;



* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
From: Michael Chan @ 2006-04-21  1:33 UTC (permalink / raw)
  To: Shaw Vrana; +Cc: Herbert Xu, netdev, Auke Kok, David S. Miller
In-Reply-To: <200604201942.07009.shawvrana@acm.org>

On Thu, 2006-04-20 at 19:42 -0700, Shaw Vrana wrote:

> I'll bite!  Here's a patch to add a call to flush_scheduled_work() in 
> e1000_down.  It's against 2.6.16.9.
> 
You're not following our discussion. It is not safe to call
flush_scheduled_work() in a driver's close() because it is holding the
rtnl and can deadlock with linkwatch_event() if it happens to be on the
workqueue.



* Re: [PATCH 0/10] [IOAT] I/OAT patches repost
From: Olof Johansson @ 2006-04-21  3:09 UTC (permalink / raw)
  To: David S. Miller; +Cc: andy.grover, netdev
In-Reply-To: <20060420.174438.15249396.davem@davemloft.net>

On Thu, Apr 20, 2006 at 05:44:38PM -0700, David S. Miller wrote:
> From: Olof Johansson <olof@lixom.net>
> Date: Thu, 20 Apr 2006 18:33:43 -0500
> 
> > On Thu, Apr 20, 2006 at 03:14:15PM -0700, Andrew Grover wrote:
> > > In
> > > addition, there may be workloads (file serving? backup?) where we
> > > could do a skb->page-in-page-cache copy and avoid cache pollution?
> > 
> > Yes, NFS is probably a prime example of where most of the data isn't
> > looked at; just written to disk. I'm not sure how well-optimized the
> > receive path is there already w.r.t. avoiding copying though. I don't
> > remember seeing memcpy and friends being high on the profile when I
> > looked at SPECsfs last.
> 
> If that makes sense then the cpu copy can be made to use non-temporal
> stores.

I'm not sure that would buy anything. I didn't mean caching was
necessarily bad, just that lack of it might not hurt as much under that
specific type of workload.

NFS has to look at RPC/NFS headers anyway, so it will benefit from the
cache being warm.


-Olof


* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
From: shaw @ 2006-04-21  3:05 UTC (permalink / raw)
  To: Michael Chan; +Cc: Herbert Xu, shawvrana, netdev, Auke Kok, David S. Miller
In-Reply-To: <1145578214.3195.6.camel@rh4>

[-- Attachment #1: Type: text/plain, Size: 602 bytes --]

I've replied to this once before, but haven't seen my last two emails on the 
list, so I'm sending again with different settings.  Sorry for the noise.

On Thursday 20 April 2006 17:10, Michael Chan wrote:
> In tg3_remove_one(), we call flush_scheduled_work() in case the
> reset_task is still pending. Here, it is safe to call
> flush_scheduled_work() because we're not holding the rtnl. Again, when
> it runs, nothing bad will happen because it will see netif_running() ==
> 0.

I'll bite!  Here's a patch to add a call to flush_scheduled_work() in 
e1000_down.  It's against 2.6.16.9.

Thanks,
Shaw

[-- Attachment #2: e1000_flush_in_close.patch --]
[-- Type: text/x-diff, Size: 614 bytes --]

diff -u -uprN -X linux-2.6.16.9/Documentation/dontdiff linux-2.6.16.9/drivers/net/e1000/e1000_main.c linux-2.6.16.9-patch/drivers/net/e1000/e1000_main.c
--- linux-2.6.16.9/drivers/net/e1000/e1000_main.c	2006-04-18 23:10:14.000000000 -0700
+++ linux-2.6.16.9-patch/drivers/net/e1000/e1000_main.c	2006-04-20 19:36:55.000000000 -0700
@@ -538,6 +538,7 @@ e1000_down(struct e1000_adapter *adapter
 	del_timer_sync(&adapter->tx_fifo_stall_timer);
 	del_timer_sync(&adapter->watchdog_timer);
 	del_timer_sync(&adapter->phy_info_timer);
+	flush_scheduled_work();	
 
 #ifdef CONFIG_E1000_NAPI
 	netif_poll_disable(netdev);


* Re: [PATCH 0/10] [IOAT] I/OAT patches repost
From: Olof Johansson @ 2006-04-21  3:04 UTC (permalink / raw)
  To: David S. Miller; +Cc: olof, andrew.grover, netdev
In-Reply-To: <20060420.172742.132879746.davem@davemloft.net>

On Thu, Apr 20, 2006 at 05:27:42PM -0700, David S. Miller wrote:
> From: Olof Johansson <olof@lixom.net>
> Date: Thu, 20 Apr 2006 16:33:05 -0500
> 
> > From the wiki:
> > 
> > >    3. Data copied by I/OAT is not cached
> > 
> > This is a I/OAT device limitation and not a global statement of the
> > DMA infrastructure. Other platforms might be able to prime caches
> > with the DMA traffic. Hint flags should be added on either the channel
> > allocation calls, or per-operation calls, depending on where it makes
> > sense driver/client wise.
> 
> This sidesteps the whole question of _which_ cache to warm.  And if
> you choose wrongly, then what?
>
> Besides the control overhead of the DMA engines, the biggest thing
> lost in my opinion is the perfect cache warming that a cpu based copy
> does from the kernel socket buffer into userspace.

It's definitely the easiest way to always make sure the right caches
are warm for the app, that I agree with.

But, when warming those caches by copying, the data is pulled in through
a potentially cold cache in the first place. So the cache misses are
just moved from the copy loop to userspace with dma offload. Or am I
missing something?

> The first thing an application is going to do is touch that data.  So
> I think it's very important to prewarm the caches and the only
> straightforward way I know of to always warm up the correct cpu's
> caches is copy_to_user().

The other way (assuming the hardware supports cache warming) would be
to pass down affinities (or look them up during receive processing,
I'm not sure that's practical the way things work now), and dispatch
on a DMA channel with the right cache affinity. I've got a feeling that
"straightforward" is not a term to use for describing that solution
though.

> Unfortunately, many benchmarks just do raw bandwidth tests sending to
> a receiver that just doesn't even look at the data.  They just return
> from recvmsg() and loop back into it.  This is not what applications
> using networking actually do, so it's important to make sure we look
> intelligently at any benchmarks done and do not fall into the trap of
> saying "even without cache warming it made things faster" when in fact
> the tested receiver did not touch the data at all so was a false test.

Yes, some real-life-like benchmarking is definitely needed. Unfortunately
I'm not in a position where I can do much (and share numbers) at the
moment myself.


-Olof


* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
From: Michael Chan @ 2006-04-21  1:24 UTC (permalink / raw)
  To: Herbert Xu; +Cc: shawvrana, netdev, auke-jan.h.kok, davem, jgarzik
In-Reply-To: <20060421024024.GA29644@gondor.apana.org.au>

On Fri, 2006-04-21 at 12:40 +1000, Herbert Xu wrote:

> One simple solution is to establish a separate queue for RTNL-holding
> users or vice versa for non-RTNL holding networking users.  That
> would allow the drivers to safely flush the non-RTNL queue while
> holding the RTNL.

You mean a separate workqueue for net drivers to use instead of the
keventd_wq? Yeah, I think that'll work. Each driver can also create its
own workqueue but that may be a bit more wasteful.



* Re: Cannot receive multicast packets
From: David Stevens @ 2006-04-21  3:05 UTC (permalink / raw)
  To: Andrew Athan; +Cc: netdev
In-Reply-To: <44482C8D.7010401@cloakmail.com>

Andrew,

>  I did not 
> think the source IP was relevant to the matching code in linux, since 
> there are no source squelching socket options. 
> 
> There are no firewall rules active on this machine, and the packets are 
> definitely visible at the interface (see tcpdump output in my email).

        The source address is not relevant (other than potentially
for firewall rules), and I understand from your original mail that
they are arriving at the machine. The IP TTL is what I wanted to
know there; but "netstat -s" will normally tell you why a packet
was dropped, if it's arriving but not making it through the UDP/IP
stack (as is your case).

> I am going to try upgrading the kernel, and turning off the multicast 
> router kernel options as a next step.  But if you have any other ideas 
> at all, I'm all ears.

        "netstat -s" would be a good start. :-) tcpdump receiving a copy
of the packet does not mean UDP or IP won't drop it, but those drops
are counted.
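
For scripted checks, the same counters can be read from /proc/net/snmp,
which is where "netstat -s" gets them on Linux. The sketch below is only
an illustration: snmp_counter and next_token are hypothetical helpers, and
the parser assumes the usual layout of a "Proto: names..." line followed
by a "Proto: values..." line:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy the next space-separated token on the current line into buf.
 * Returns a pointer just past the token, or NULL at end of line/string.
 */
static const char *next_token(const char *s, char *buf, size_t buflen)
{
    size_t n = 0;

    while (*s == ' ')
        s++;
    if (*s == '\0' || *s == '\n')
        return NULL;
    while (*s != '\0' && *s != ' ' && *s != '\n') {
        if (n + 1 < buflen)
            buf[n++] = *s;
        s++;
    }
    buf[n] = '\0';
    return s;
}

/*
 * Find counter `name` for protocol `proto` in /proc/net/snmp-style text.
 * Returns 0 and stores the value in *out, or -1 if it is not present.
 */
int snmp_counter(const char *text, const char *proto, const char *name,
                 long *out)
{
    size_t plen = strlen(proto);
    const char *hdr = text;

    assert(out != NULL);
    while (hdr != NULL && *hdr != '\0') {
        const char *val = strchr(hdr, '\n');

        if (val == NULL)
            break;
        val++; /* start of the following line */

        /* A header line and its value line both start with "proto:". */
        if (strncmp(hdr, proto, plen) == 0 && hdr[plen] == ':' &&
            strncmp(val, proto, plen) == 0 && val[plen] == ':') {
            const char *h = hdr + plen + 1;
            const char *v = val + plen + 1;
            char field[64], value[64];

            /* Walk names and values in lockstep. */
            while ((h = next_token(h, field, sizeof(field))) != NULL &&
                   (v = next_token(v, value, sizeof(value))) != NULL) {
                if (strcmp(field, name) == 0) {
                    *out = strtol(value, NULL, 10);
                    return 0;
                }
            }
            return -1;
        }
        hdr = val;
    }
    return -1;
}
```

Reading the file into a buffer and calling snmp_counter(buf, "Udp",
"InErrors", &n) before and after the test traffic would show whether UDP
is counting the drops.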

                                                        +-DLS


^ permalink raw reply

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
From: Shaw Vrana @ 2006-04-21  2:42 UTC (permalink / raw)
  To: Michael Chan; +Cc: Herbert Xu, shawvrana, netdev, Auke Kok, David S. Miller
In-Reply-To: <1145578214.3195.6.camel@rh4>

[-- Attachment #1: Type: text/plain, Size: 441 bytes --]

On Thursday 20 April 2006 17:10, Michael Chan wrote:
> In tg3_remove_one(), we call flush_scheduled_work() in case the
> reset_task is still pending. Here, it is safe to call
> flush_scheduled_work() because we're not holding the rtnl. Again, when
> it runs, nothing bad will happen because it will see netif_running() ==
> 0.

I'll bite!  Here's a patch to add a call to flush_scheduled_work() in 
e1000_down.  It's against 2.6.16.9.

Shaw

[-- Attachment #2: e1000_flush_in_close.patch --]
[-- Type: text/x-diff, Size: 614 bytes --]

diff -u -uprN -X linux-2.6.16.9/Documentation/dontdiff linux-2.6.16.9/drivers/net/e1000/e1000_main.c linux-2.6.16.9-patch/drivers/net/e1000/e1000_main.c
--- linux-2.6.16.9/drivers/net/e1000/e1000_main.c	2006-04-18 23:10:14.000000000 -0700
+++ linux-2.6.16.9-patch/drivers/net/e1000/e1000_main.c	2006-04-20 19:36:55.000000000 -0700
@@ -538,6 +538,7 @@ e1000_down(struct e1000_adapter *adapter
 	del_timer_sync(&adapter->tx_fifo_stall_timer);
 	del_timer_sync(&adapter->watchdog_timer);
 	del_timer_sync(&adapter->phy_info_timer);
+	flush_scheduled_work();	
 
 #ifdef CONFIG_E1000_NAPI
 	netif_poll_disable(netdev);

^ permalink raw reply

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
From: Herbert Xu @ 2006-04-21  2:40 UTC (permalink / raw)
  To: Michael Chan; +Cc: shawvrana, netdev, auke-jan.h.kok, davem, jgarzik
In-Reply-To: <E1FWlWa-0007hN-00@gondolin.me.apana.org.au>

On Fri, Apr 21, 2006 at 12:37:36PM +1000, Herbert Xu wrote:
> 
> Rather than dealing with this individually in each driver perhaps we should
> come up with a more centralised solution?

One simple solution is to establish a separate queue for RTNL-holding
users or vice versa for non-RTNL holding networking users.  That
would allow the drivers to safely flush the non-RTNL queue while
holding the RTNL.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
From: Herbert Xu @ 2006-04-21  2:37 UTC (permalink / raw)
  To: Michael Chan; +Cc: herbert, shawvrana, netdev, auke-jan.h.kok, davem, jgarzik
In-Reply-To: <1145578214.3195.6.camel@rh4>

Michael Chan <mchan@broadcom.com> wrote:
> 
> In tg3_remove_one(), we call flush_scheduled_work() in case the
> reset_task is still pending. Here, it is safe to call

Great.

> flush_scheduled_work() because we're not holding the rtnl. Again, when

Hmm, a quick grep indicates that quite a number of drivers call
flush_scheduled_work() in netdev->close or other callbacks under RTNL.
This means that they're all vulnerable to the linkwatch deadlock that
you alluded to.

Rather than dealing with this individually in each driver perhaps we should
come up with a more centralised solution?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 0/10] [IOAT] I/OAT patches repost
From: Herbert Xu @ 2006-04-21  2:23 UTC (permalink / raw)
  To: David S. Miller; +Cc: andy.grover, olof, andrew.grover, netdev
In-Reply-To: <20060420.173853.60273448.davem@davemloft.net>

David S. Miller <davem@davemloft.net> wrote:
> 
> For I/O AT you'd really want to get the DMA engine going as soon
> as you had those packets, but I do not see a clean and reliable way
> to determine the target pages before the app gets back to recvmsg().

The vmsplice() system call proposed by Linus might be a good fit.

http://www.ussg.iu.edu/hypermail/linux/kernel/0604.2/0854.html
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
From: Michael Chan @ 2006-04-21  0:10 UTC (permalink / raw)
  To: Herbert Xu; +Cc: shawvrana, netdev, Auke Kok, David S. Miller
In-Reply-To: <20060421013340.GA29123@gondor.apana.org.au>

On Fri, 2006-04-21 at 11:33 +1000, Herbert Xu wrote:

> Actually, what if the tg3_close is followed by a tg3_open? That could
> produce a spurious reset which I suppose isn't that bad.

Yes, an extra reset. And yes, it isn't too bad.

> Also if the
> module is unloaded bad things will happen as well.

In tg3_remove_one(), we call flush_scheduled_work() in case the
reset_task is still pending. Here, it is safe to call
flush_scheduled_work() because we're not holding the rtnl. Again, when
it runs, nothing bad will happen because it will see netif_running() ==
0.



^ permalink raw reply

* Re: Open ethernet hardware specs
From: Alexey Dobriyan @ 2006-04-21  1:41 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Netdev List, Linux Kernel
In-Reply-To: <4448117E.3010708@garzik.org>

On Thu, Apr 20, 2006 at 06:55:58PM -0400, Jeff Garzik wrote:
> Also, janitors, there are more NIC specs at
> http://gkernel.sourceforge.net/specs/ than are listed on the wiki.  What
> I posted is just a starter list.  If someone were to comb through each
> PDF in the /specs/ sub-directories, and make sure it is linked on the
> wiki, I would be grateful.

Almost done.

P.S.:
http://gkernel.sourceforge.net/specs/via/501designguide.pdf.bz2 is
broken.


^ permalink raw reply

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
From: Herbert Xu @ 2006-04-21  1:33 UTC (permalink / raw)
  To: Michael Chan; +Cc: shawvrana, netdev, Auke Kok, David S. Miller
In-Reply-To: <20060421012701.GA29053@gondor.apana.org.au>

On Fri, Apr 21, 2006 at 11:27:01AM +1000, herbert wrote:
> On Thu, Apr 20, 2006 at 03:36:57PM -0700, Michael Chan wrote:
> >
> > If we're in tg3_close() and the reset task isn't running yet,
> > tg3_close() will proceed. However, when the reset task finally runs,
> > it will see that netif_running() is zero and will just return.
> 
> Yes you're absolutely right.

Actually, what if the tg3_close is followed by a tg3_open? That could
produce a spurious reset which I suppose isn't that bad.  Also if the
module is unloaded bad things will happen as well.  So I still don't
feel too comfortable about leaving it scheduled after a close.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

