Netdev List
* [PATCH] ethernet: call __skb_pull() in eth_type_trans()
From: Changli Gao @ 2010-05-02 22:50 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev, Changli Gao

call __skb_pull() in eth_type_trans().

Since the callers of eth_type_trans() always feed it long enough packets,
we can use __skb_pull() to save some cycles.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
---
 net/ethernet/eth.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 61ec032..aacbaf7 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -162,7 +162,7 @@ __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev)
 
 	skb->dev = dev;
 	skb_reset_mac_header(skb);
-	skb_pull_inline(skb, ETH_HLEN);
+	__skb_pull(skb, ETH_HLEN);
 	eth = eth_hdr(skb);
 
 	if (unlikely(is_multicast_ether_addr(eth->h_dest))) {

^ permalink raw reply related

* Re: [RFC] net-next: remove useless union keyword
From: Eric Dumazet @ 2010-05-03  7:44 UTC (permalink / raw)
  To: Changli Gao; +Cc: David Miller, netdev
In-Reply-To: <1272837206-13223-1-git-send-email-xiaosuo@gmail.com>

On Monday, 03 May 2010 at 05:53 +0800, Changli Gao wrote:
> remove useless union keyword in rtable, rt6_info and dn_route.
> 
> Since there is only one member in a union, the union keyword isn't useful.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----


That's right: in 2.6.20, the next pointer was relocated to the end of
'struct dst_entry' in commits 093c2ca4167cf66f69020329d14138da0da8599b
and 1e19e02ca0c5e33ea73a25127dbe6c3b8fcaac4b.

The union trick is only needed in 'struct dst_entry'.
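
As a rough illustration of why the keyword can go (a userspace sketch, not
kernel code; the fake_dst, rt_with_union and rt_plain types are made up for
this example), a union with a single member has the same size and field
offsets as the bare member on common ABIs:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dst_entry; not the kernel layout. */
struct fake_dst {
	void *next;
	int refcnt;
};

/* Old style: a union that wraps exactly one member. */
struct rt_with_union {
	union {
		struct fake_dst dst;
	} u;
	int rt_flags;
};

/* New style: the member is embedded directly. */
struct rt_plain {
	struct fake_dst dst;
	int rt_flags;
};

/* Returns 1 when both layouts agree on size and field offsets. */
int layouts_match(void)
{
	return sizeof(struct rt_with_union) == sizeof(struct rt_plain) &&
	       offsetof(struct rt_with_union, u.dst) ==
	       offsetof(struct rt_plain, dst) &&
	       offsetof(struct rt_with_union, rt_flags) ==
	       offsetof(struct rt_plain, rt_flags);
}
```

In this sketch only the spelling of the access differs (r->u.dst versus
r->dst); the memory layout is the same.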

Please respin your patch against net-next-2.6

patching file net/ipv6/ip6_output.c
Hunk #1 succeeded at 701 (offset 1 line).
Hunk #3 succeeded at 743 (offset 1 line).
Hunk #5 succeeded at 788 (offset 1 line).
Hunk #7 succeeded at 1159 (offset 1 line).
Hunk #9 succeeded at 1227 (offset 1 line).
Hunk #11 succeeded at 1284 (offset 1 line).
Hunk #13 succeeded at 1506 (offset 1 line).



^ permalink raw reply

* Re: [PATCH] macvtap: add ioctl to modify vnet header size
From: Michael S. Tsirkin @ 2010-05-03  7:55 UTC (permalink / raw)
  To: David Miller; +Cc: arnd, sri, eric.dumazet, netdev, linux-kernel, dlstevens
In-Reply-To: <20100502.233439.77341308.davem@davemloft.net>

On Sun, May 02, 2010 at 11:34:39PM -0700, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Sun, 02 May 2010 23:32:32 -0700 (PDT)
> 
> > From: Arnd Bergmann <arnd@arndb.de>
> > Date: Thu, 29 Apr 2010 16:40:57 +0200
> > 
> >> On Thursday 29 April 2010, Michael S. Tsirkin wrote:
> >>> This adds TUNSETVNETHDRSZ/TUNGETVNETHDRSZ support
> >>> to macvtap.
> >> 
> >> Looks good, thanks Michael!
> >> 
> >>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >> 
> >> Acked-by: Arnd Bergmann <arnd@arndb.de>
> > 
> > Applied to net-next-2.6, thanks.
> 
> Nevermind, reverted:
> 
> drivers/net/macvtap.c: In function 'macvtap_ioctl':
> drivers/net/macvtap.c:679:7: error: 'TUNGETVNETHDRSZ' undeclared (first use in this function)
> drivers/net/macvtap.c:679:7: note: each undeclared identifier is reported only once for each function it appears in
> drivers/net/macvtap.c:685:7: error: 'TUNSETVNETHDRSZ' undeclared (first use in this function)
> 
> What tree is this supposed to build under?  Certainly not net-2.6
> or net-next-2.6

The reason is it needs to be applied on top of the patch that adds the
same header to tun that you acked.  I put this in patch description:
> I plan to merge both patches through vhost tree together
> with mergeable buffer support. Comments?
In other words, this will be included in a pull request that I
intend to send out shortly.

-- 
MST

^ permalink raw reply

* Re: [PATCH] ethernet: call __skb_pull() in eth_type_trans()
From: David Miller @ 2010-05-03  7:59 UTC (permalink / raw)
  To: xiaosuo; +Cc: eric.dumazet, netdev
In-Reply-To: <1272840617-17084-1-git-send-email-xiaosuo@gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Mon,  3 May 2010 06:50:17 +0800

> call __skb_pull() in eth_type_trans().
> 
> Since the callers of eth_type_trans() always feed it long enough packets,
> we can use __skb_pull() to save some cycles.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

Although they should, assuming that some runts won't show up here and
never checking for that condition at all is dangerous.

At least we should have a WARN_ON() check for skb->len < ETH_ZLEN here
or similar.

So many other things get layered into ethernet, which means adding
this length assumption without any checks is bound to lead to
unpleasant surprises for somebody.
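
To make the concern concrete, here is a minimal userspace sketch of the
difference between an unchecked and a checked pull. The toy_skb type and the
toy_* helpers are hypothetical simplifications for illustration only; they
are not the kernel's struct sk_buff, __skb_pull() or skb_pull().

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN 14	/* Ethernet header length in bytes */
#define ETH_ZLEN 60	/* minimum frame length without FCS */

/* Hypothetical, heavily simplified buffer; not struct sk_buff. */
struct toy_skb {
	unsigned char *data;
	unsigned int len;
};

/* Unchecked pull: trusts the caller completely. */
unsigned char *toy_pull_unchecked(struct toy_skb *skb, unsigned int n)
{
	skb->len -= n;		/* wraps around if n > len */
	return skb->data += n;
}

/* Checked pull: refuses to pull more than the buffer holds. */
unsigned char *toy_pull_checked(struct toy_skb *skb, unsigned int n)
{
	if (n > skb->len)
		return NULL;
	return toy_pull_unchecked(skb, n);
}

/* The kind of guard suggested above: flag runts before pulling. */
int toy_is_runt(const struct toy_skb *skb)
{
	return skb->len < ETH_ZLEN;
}
```

With a 10-byte runt, the checked variant returns NULL and leaves the length
alone, while the unchecked one would wrap the unsigned length around.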

^ permalink raw reply

* Re: [PATCH] macvtap: add ioctl to modify vnet header size
From: David Miller @ 2010-05-03  8:00 UTC (permalink / raw)
  To: mst; +Cc: arnd, sri, eric.dumazet, netdev, linux-kernel, dlstevens
In-Reply-To: <20100503075511.GA8298@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Mon, 3 May 2010 10:55:11 +0300

> On Sun, May 02, 2010 at 11:34:39PM -0700, David Miller wrote:
>> What tree is this supposed to build under?  Certainly not net-2.6
>> or net-next-2.6
> 
> The reason is it needs to be applied on top of the patch that adds the
> same header to tun that you acked.  I put this in patch description:
>> I plan to merge both patches through vhost tree together
>> with mergeable buffer support. Comments?
> In other words, this will be included in a pull request that I
> intend to send out shortly.

works for me:

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Performance problem in network namespaces
From: Martín Ferrari @ 2010-05-03  9:25 UTC (permalink / raw)
  To: netdev; +Cc: Mathieu Lacage

Hi,

When running some benchmarks to test the feasibility of using
namespaces for emulating networks, I have found a big drop in
performance when one of the namespaces is performing routing of
packets.

After some searching, we found that the skb is being copied in
ip_forward(). It seems that packets (ICMP and UDP; this does not happen
with TCP) start with a small headroom (16 bytes in our observation), but
skb_cow always asks for at least NET_SKB_PAD (32 on x86) bytes of
headroom, thus triggering this unnecessary memcpy.
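
The arithmetic can be sketched in userspace as follows. This is only a model
of the headroom check, under the assumption that the forwarding path requests
16 bytes; cow_delta() and its clamp_to_pad flag are hypothetical names, and
the value of NET_SKB_PAD is the x86 one mentioned above.

```c
#include <assert.h>

#define NET_SKB_PAD 32	/* value reported above for x86 */

/*
 * Extra headroom that an __skb_cow()-style check would ask for.
 * A non-zero result means the buffer gets reallocated and copied.
 */
unsigned int cow_delta(unsigned int requested, unsigned int available,
		       int clamp_to_pad)
{
	if (clamp_to_pad && requested < NET_SKB_PAD)
		requested = NET_SKB_PAD;	/* the lower bound in question */
	return requested > available ? requested - available : 0;
}
```

With 16 bytes available and 16 requested, the clamped variant reports a
shortfall of 16 bytes (hence the copy), while the unclamped one reports none.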

We made two crude attempts at fixing this, which are surely incorrect,
but hopefully somebody here could come up with a correct solution.

Our attempts are: 1. remove the lower bound in headroom size at
skb_cow, and 2. set needed_headroom in veth.c to NET_SKB_PAD; patches
included below.

Thanks.


diff -Naurp linux-2.6.34-rc5/include/linux/skbuff.h ../linux-2.6.34-rc5/include/linux/skbuff.h
--- linux-2.6.34-rc5/include/linux/skbuff.h	2010-04-20 01:29:56.000000000 +0200
+++ ../linux-2.6.34-rc5/include/linux/skbuff.h	2010-05-03 11:17:13.000000000 +0200
@@ -1526,8 +1526,6 @@ static inline int __skb_cow(struct sk_bu
 {
 	int delta = 0;

-	if (headroom < NET_SKB_PAD)
-		headroom = NET_SKB_PAD;
 	if (headroom > skb_headroom(skb))
 		delta = headroom - skb_headroom(skb);


diff -Naurp linux-2.6.34-rc5/drivers/net/veth.c ../linux-2.6.34-rc5/drivers/net/veth.c
--- linux-2.6.34-rc5/drivers/net/veth.c	2010-04-20 01:29:56.000000000 +0200
+++ ../linux-2.6.34-rc5/drivers/net/veth.c	2010-04-30 11:29:39.000000000 +0200
@@ -303,6 +303,8 @@ static void veth_setup(struct net_device
 	dev->ethtool_ops = &veth_ethtool_ops;
 	dev->features |= NETIF_F_LLTX;
 	dev->destructor = veth_dev_free;
+	/* Try to avoid skb copies when passing packets around */
+	dev->needed_headroom = NET_SKB_PAD;
 }

 /*


-- 
Martín Ferrari

^ permalink raw reply

* Re: [PATCH 1/3] ptp: Added a brand new class driver for ptp clocks.
From: Richard Cochran @ 2010-05-03 10:07 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: netdev
In-Reply-To: <4BDD5910.1040604@grandegger.com>

On Sun, May 02, 2010 at 12:50:56PM +0200, Wolfgang Grandegger wrote:
> 
> As long as the device is in use by an application, no other can access
> it, because the mutex is locked. Other applications may want to read the
> PTP clock time while ptpd is running, though.

Yes, of course. I implemented it that way just to get started. I first
want to concentrate on getting the basic drivers in place (still have
IXP46x and Phyter to do), and then on the ancillary features, like
timers, time stamping external inputs, and so on.

I understand that some fine-grained access control to the PTP clock
would be nice to have, but I am not sure exactly what would work best,
and I would like to save that decision for later...

However, if you have some ideas, please take a look at the list of
features in the documentation, and explain how you would like the access
control to work.

Or better yet, post a patch ;)

Thanks,
Richard

^ permalink raw reply

* [PATCH] IPv4: unresolved multicast route cleanup
From: Andreas Meißner @ 2010-05-03  9:47 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

From: Andreas Meissner <andreas.meissner@sphairon.com>

Fixes the expiration timer for unresolved multicast route entries.
If new multicast routing requests come in faster than the expiration
timeout occurs (e.g. when zapping through multicast TV streams), the
timer is prevented from firing on time for already existing entries.

Signed-off-by: Andreas Meissner <andreas.meissner@sphairon.com>
---
As the single timer is reset to the default whenever a new entry is made,
the timeouts for existing unresolved entries are missed and/or not
updated. As a consequence, new requests are denied once the limit of
unresolved entries has been reached, because old entries live longer than
they are supposed to.
The solution is to reset the timer only for the first unresolved entry
in the multicast routing cache. All other timers are already set and
updated correctly within the timer function itself.
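
The starvation effect can be shown with a toy userspace model. It is only a
sketch: first_fire_time() and its parameters are hypothetical, all entries
share one timeout, and the re-arming done inside the real timer function is
ignored.

```c
#include <assert.h>

/*
 * Returns the time at which the single shared expiry timer first fires,
 * given n entries queued at arrival[0..n-1] (ascending), each with the
 * same timeout. rearm_always models calling mod_timer() for every new
 * entry; otherwise the timer is armed only for the first queued entry.
 * Returns -1 if nothing was queued.
 */
long first_fire_time(const long *arrival, int n, long timeout,
		     int rearm_always)
{
	long fire = -1;
	int queued = 0;

	for (int i = 0; i < n; i++) {
		if (fire >= 0 && arrival[i] >= fire)
			break;	/* timer fired before this entry arrived */
		queued++;
		if (rearm_always || queued == 1)
			fire = arrival[i] + timeout;
	}
	return fire;
}
```

With entries arriving at times 0, 5, 10 and 15 and a timeout of 10, re-arming
on every entry postpones the first expiry to time 25, whereas arming only for
the first entry lets it fire at time 10 as intended.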
---
 ipv4/ipmr.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--- net/ipv4/ipmr.c.orig    2010-05-03 10:55:06.000000000 +0200
+++ net/ipv4/ipmr.c    2010-05-03 10:58:30.000000000 +0200
@@ -753,7 +753,8 @@ ipmr_cache_unresolved(struct net *net, v
         c->next = mfc_unres_queue;
         mfc_unres_queue = c;
 
-        mod_timer(&ipmr_expire_timer, c->mfc_un.unres.expires);
+        if (atomic_read(&net->ipv4.cache_resolve_queue_len) == 1)
+            mod_timer(&ipmr_expire_timer, c->mfc_un.unres.expires);
     }
 
     /*


^ permalink raw reply

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Arjan van de Ven @ 2010-05-03 10:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andi Kleen, David Miller, hadi, xiaosuo, therbert, shemminger,
	netdev, lenb
In-Reply-To: <1272863834.2173.173.camel@edumazet-laptop>

On Mon, 03 May 2010 07:17:14 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Sunday, 02 May 2010 at 20:50 -0700, Arjan van de Ven wrote:
> 
> > we effectively do that. The thing is that C2 is so low cost normally
> > that it's still worth it even at 20k wakeups...
> > 
> > this is where the bios tells us how "heavy" the states are....
> > and 64 usec... is just not very much.
> 
> Maybe its low cost, (apparently, it is, since I can reach ~900.000
> ipis on my 16 cores machine) but multiply this by 16 or 32 or 64
> cpus, and clockevents_notify() cost appears to be a killer, all cpus
> compete on a single lock.
> 
> Maybe this notifier could use RCU ?

could this be an artifact of the local apic stopping in deeper C states?
(which is finally fixed in the Westmere generation)



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Andi Kleen @ 2010-05-03 10:34 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Eric Dumazet, Andi Kleen, David Miller, hadi, xiaosuo, therbert,
	shemminger, netdev, lenb
In-Reply-To: <20100503032227.268613ac@infradead.org>

> > Maybe its low cost, (apparently, it is, since I can reach ~900.000
> > ipis on my 16 cores machine) but multiply this by 16 or 32 or 64
> > cpus, and clockevents_notify() cost appears to be a killer, all cpus
> > compete on a single lock.
> > 
> > Maybe this notifier could use RCU ?
> 
> could this be an artifact of the local apic stopping in deeper C states?
> (which is finally fixed in the Westmere generation)

Yes it is I think.

But I suspect Eric wants a solution for Nehalem.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply

* Re: [PATCHv7] add mergeable buffers support to vhost_net
From: Michael S. Tsirkin @ 2010-05-03 10:34 UTC (permalink / raw)
  To: David L Stevens; +Cc: netdev, kvm, virtualization
In-Reply-To: <1272488232.11307.4.camel@w-dls.beaverton.ibm.com>

On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote:
> This patch adds mergeable receive buffer support to vhost_net.
> 
> Signed-off-by: David L Stevens <dlstevens@us.ibm.com>

I've been doing some more testing before sending out a pull
request, and I see a drastic performance degradation in guest to host
traffic when this is applied but mergeable buffers are not in use
by userspace (existing qemu-kvm userspace).

This is both with and without my patch on top.

Without patch:
[mst@tuck ~]$ sh runtest  2>&1 | tee ser-meregeable-disabled-kernel-only-tun-only.log
Starting netserver at port 12865
set_up_server could not establish a listen endpoint for  port 12865 with family AF_UNSPEC
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00      9107.26   89.20    33.85    0.802   2.436  

With patch:
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00        35.00   2.21     0.62     5.181   11.575 


For ease of testing, I put this on my tree
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-broken

Please take a look.
Thanks!

-- 
MST

^ permalink raw reply

* [PATCH  kernel 2.6.34-rc5] lib8390: to be SMP safe
From: Ken Kawasaki @ 2010-05-03 10:43 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20100307070256.cb86716d.ken_kawasaki@spring.nifty.jp>


lib8390:
	write the value "ENISR_ALL" to register "EN0_IMR"
	after enable_irq_lockdep_irqrestore. 

	This patch avoids frequent transmit errors on SMP systems.


Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>

---

--- linux-2.6.34-rc6/drivers/net/lib8390.c.orig	2010-05-02 16:49:57.000000000 +0900
+++ linux-2.6.34-rc6/drivers/net/lib8390.c	2010-05-02 18:09:18.000000000 +0900
@@ -367,9 +367,9 @@ static netdev_tx_t __ei_start_xmit(struc
 				dev->name, ei_local->tx1, ei_local->tx2, ei_local->lasttx);
 		ei_local->irqlock = 0;
 		netif_stop_queue(dev);
-		ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR);
 		spin_unlock(&ei_local->page_lock);
 		enable_irq_lockdep_irqrestore(dev->irq, &flags);
+		ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR);
 		dev->stats.tx_errors++;
 		return NETDEV_TX_BUSY;
 	}
@@ -407,10 +407,10 @@ static netdev_tx_t __ei_start_xmit(struc
 
 	/* Turn 8390 interrupts back on. */
 	ei_local->irqlock = 0;
-	ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR);
 
 	spin_unlock(&ei_local->page_lock);
 	enable_irq_lockdep_irqrestore(dev->irq, &flags);
+	ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR);
 
 	dev_kfree_skb (skb);
 	dev->stats.tx_bytes += send_length;

^ permalink raw reply

* Re: [patch v2.2 1/4] [PATCH v2.1 1/4] netfilter: xt_ipvs (netfilter matcher for IPVS)
From: Hannes Eder @ 2010-05-03 11:29 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Simon Horman, lvs-devel, netdev, linux-kernel, netfilter,
	Wensong Zhang, Julius Volz, David S. Miller,
	Netfilter Development Mailinglist
In-Reply-To: <4BDC543F.7060500@trash.net>

Thank you for picking this series of patches up again and thanks for
the feedback.

I'll send an updated version in the next few days.

Cheers, -Hannes

On Sat, May 1, 2010 at 18:18, Patrick McHardy <kaber@trash.net> wrote:
> Simon Horman wrote:
>
>> @@ -0,0 +1,25 @@
>> +#ifndef _XT_IPVS_H
>> +#define _XT_IPVS_H 1
>
> You don't need to define a value.
>
>> +config NETFILTER_XT_MATCH_IPVS
>> +     tristate '"ipvs" match support'
>> +     depends on IP_VS
>> +     depends on NETFILTER_ADVANCED
>> +     help
>> +       This option allows you to match against IPVS properties of a packet.
>> +
>> +       If unsure, say N.
>
> You're using conntrack symbols, so this seems to need a dependency
> on NF_CONNTRACK.
>
>> +static bool ipvs_mt_check(const struct xt_mtchk_param *par)
>
> We've changed the signature to "int" in nf-next to be able to
> return errno codes. Please rebase your patches onto nf-next-2.6.git.
>
> Please also CC netfilter-devel at least for those parts that affect
> non-IPVS netfilter.
>
>> +{
>> +     if (par->family != NFPROTO_IPV4
>> +#ifdef CONFIG_IP_VS_IPV6
>> +         && par->family != NFPROTO_IPV6
>> +#endif
>> +             ) {
>> +             pr_info("protocol family %u not supported\n", par->family);
>> +             return false;
>> +     }
>> +
>> +     return true;
>> +}
>
>

^ permalink raw reply

* Re: [net-next-2.6 PATCH 2/2] add ndo_set_port_profile op support for enic dynamic vnics
From: Arnd Bergmann @ 2010-05-03 11:32 UTC (permalink / raw)
  To: Vivek Kashyap; +Cc: Scott Feldman, davem, netdev, chrisw, Jens Osterkamp
In-Reply-To: <alpine.LFD.2.00.1005022119140.16925@vk>

On Monday 03 May 2010, Vivek Kashyap wrote:
> > After a successful pre-associate-with-resource-reservation step, we
> > know that the actual associate step will be both fast and successful.
> > After it completes, the VSI is known to be on the destination
> > and all traffic goes there (replacing the gratuitous ARP method we do
> > today).
> >
> > I don't think we'd ever do a pre-associate without the
> > resource-reservation, but the standard defines both. In theory,
> > we could do a pre-associate at every switch in the data center
> > in order to find out if it's possible to migrate there.
> >
> > If you want to have more details, please look at the draft spec at
> > http://www.ieee802.org/1/files/public/docs2010/bg-joint-evb-0410v1.pdf
> 
> The basic difference is that in 'pre-associate with resource reservation', the
> local buffers and resources needed for the eventual 'associate' are reserved
> at the switch port.  Therefore the associate will not fail with 
> 'insufficient resources'. It might otherwise.

Yes, that's exactly what I wrote. So do you have any idea why we would
ever not want to do the resource reservation?

	Arnd

^ permalink raw reply

* VLAN I/F's and TX queue.
From: Joakim Tjernlund @ 2010-05-03 11:34 UTC (permalink / raw)
  To: netdev


We noticed dropped packets on our VLAN interfaces and I started to look
for a cause. Here is an ifconfig example:

eth0      Link encap:Ethernet  HWaddr 00:AA:BB:CC:DD:EE
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8886910 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8880219 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:1626842951 (1.5 GiB)  TX bytes:1555540810 (1.4 GiB)

eth0.1    Link encap:Ethernet  HWaddr 00:AA:BB:CC:DD:EE
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2163164 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2161943 errors:0 dropped:98 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2467090557 (2.2 GiB)  TX bytes:2480246455 (2.3 GiB)

eth0.1.1  Link encap:Ethernet  HWaddr 00:AA:BB:CC:DD:EE
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2163164 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2161943 errors:0 dropped:98 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2458437901 (2.2 GiB)  TX bytes:2471598683 (2.3 GiB)

Here I note that txqueuelen is 0 for eth0.1/eth0.1.1 and 100 for eth0, and
that only eth0.1 and eth0.1.1 drop packets. It feels as if eth0.1
bypasses eth0's tx queue and passes packets directly to the HW driver. Is that so?
If so, that feels a bit strange and I am not sure how best to
fix this. Any ideas?

Using kernel 2.6.33

     Jocke


^ permalink raw reply

* Re: ep93xx_eth stopps receiving packages
From: Stefan Agner @ 2010-05-03 11:37 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: netdev
In-Reply-To: <20100502104350.GS4586@mail.wantstofly.org>

Quoting Lennert Buytenhek <buytenh@wantstofly.org>:

> On Mon, Apr 19, 2010 at 05:38:13PM +0200, Stefan Agner wrote:
>
>> I'm using Linux 2.6.32.9 on a Technologic Systems TS-7250 SBC board, with
>> the ep93xx_eth driver for networking. On three identical, but independent
>> systems I noted that the system is unreachable after a while. On a serial
>> terminal I noted that only the TX counter counts onward, RX stays where it
>> is,
>> no matter if I try to ping from or to the system. Wireshark tells me exactly
>> that too: I see helpless ARP requests which get answered, but no ICMP. The
>> system doesn't receive the ARP requests, and just sends another one.
>
> (So does the board or does it not respond to ARP requests for its IP?)
The board does not respond to ARP requests for its IP...

>> With a simple program which sends small packages in a fast pace I can
>> reproduce the problem after several seconds (additional CPU load seem to
>> provoke the problem even more). Remove and replug the network cable doesn't
>> solve the problem, but ifup/down does. I don't see any messages in dmesg,
>> memory is still available.
>
> Do you see interrupts increasing in /proc/interrupts when this happens?
No, the interrupt count no longer increases when it happens...

I debugged the problem myself in the meantime; I just had not taken the time
to format and send the patch, sorry! There is a bug when interrupts get
disabled for a longer period and a frame arrives while another one is just
being processed. ep93xx_rx then gets called twice, and the second call marks
too many buffers as released. I corrected this error by releasing the correct
number of buffers when ep93xx_poll ends... Patch follows!

Stefan


-- 
Stefan Agner

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.


^ permalink raw reply

* [PATCH] ep93xx_eth stopps receiving packets
From: Stefan Agner @ 2010-05-03 11:42 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: netdev
In-Reply-To: <20100502104350.GS4586@mail.wantstofly.org>

Receiving small packets at a fast pace leads to not receiving any
packets at all after some time.

After Ethernet packets have arrived, the receive descriptor is incremented
by the number of frames processed. If another packet arrives during
processing, it is processed in another call of ep93xx_rx. This
second call leads to too many receive descriptors being released.

This fix increments, even in this case, the right number of processed
receive descriptors.

Signed-off-by: Stefan Agner <stefan@agner.ch>
---
  drivers/net/arm/ep93xx_eth.c |   10 +++++-----
  1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/arm/ep93xx_eth.c b/drivers/net/arm/ep93xx_eth.c
index 6995169..cd6cd3e 100644
--- a/drivers/net/arm/ep93xx_eth.c
+++ b/drivers/net/arm/ep93xx_eth.c
@@ -311,11 +311,6 @@ err:
  		processed++;
  	}

-	if (processed) {
-		wrw(ep, REG_RXDENQ, processed);
-		wrw(ep, REG_RXSTSENQ, processed);
-	}
-
  	return processed;
  }

@@ -350,6 +345,11 @@ poll_some_more:
  			goto poll_some_more;
  	}

+	if (rx) {
+		wrw(ep, REG_RXDENQ, rx);
+		wrw(ep, REG_RXSTSENQ, rx);
+	}
+
  	return rx;
  }

-- 
1.7.0




^ permalink raw reply related

* Re: [PATCH 1/2] ppp_generic: pull 2 bytes so that PPP_PROTO(skb) is valid
From: Simon Arlott @ 2010-05-03 11:50 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, paulus, linux-ppp
In-Reply-To: <20100502.232520.146109082.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 781 bytes --]

On Mon, May 3, 2010 07:25, David Miller wrote:
> From: Simon Arlott <simon@fire.lp0.eu>
> Date: Fri, 30 Apr 2010 19:41:17 +0100
>> @@ -1572,8 +1572,18 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb)
>>  		return;
>>  	}
>>
>> -	proto = PPP_PROTO(skb);
>> +
>>  	read_lock_bh(&pch->upl);
>> +	if (!pskb_may_pull(skb, 2)) {
>
> This makes the skb->len == 0 test at the beginning completely redundant.
>
> Put your pskb_may_pull(skb, 2) call there and remove the skb->len==0
> check entirely.

If I move pskb_may_pull(skb, 2) up to where skb->len == 0 is then it can't
increment rx_length_errors because it doesn't have the read lock on pch->upl,
so I can only remove the redundant skb->len == 0 if that error count is to
remain.

Updated patch attached.

-- 
Simon Arlott

[-- Attachment #2: 0001-ppp_generic-pull-2-bytes-so-that-PPP_PROTO-skb-is-va.patch --]
[-- Type: application/octet-stream, Size: 2627 bytes --]

From f6d225971143db1ff5353008d20579e1de75f00d Mon Sep 17 00:00:00 2001
From: Simon Arlott <simon@fire.lp0.eu>
Date: Fri, 30 Apr 2010 19:04:33 +0100
Subject: [PATCH 1/2] ppp_generic: pull 2 bytes so that PPP_PROTO(skb) is valid

In ppp_input(), PPP_PROTO(skb) may refer to invalid data in the skb.

If this happens and (proto >= 0xc000 || proto == PPP_CCPFRAG) then
the packet is passed directly to pppd.

This occurs frequently when using PPPoE with an interface MTU
greater than 1500 because the skb is more likely to be non-linear.

The next 2 bytes need to be pulled in ppp_input(). The pull of 2
bytes in ppp_receive_frame() has been removed as it is no longer
required.

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
---
 drivers/net/ppp_generic.c |   29 ++++++++++++++++++-----------
 1 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c
index 6e281bc..75e8903 100644
--- a/drivers/net/ppp_generic.c
+++ b/drivers/net/ppp_generic.c
@@ -1567,13 +1567,22 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb)
 	struct channel *pch = chan->ppp;
 	int proto;
 
-	if (!pch || skb->len == 0) {
+	if (!pch) {
 		kfree_skb(skb);
 		return;
 	}
 
-	proto = PPP_PROTO(skb);
 	read_lock_bh(&pch->upl);
+	if (!pskb_may_pull(skb, 2)) {
+		kfree_skb(skb);
+		if (pch->ppp) {
+			++pch->ppp->dev->stats.rx_length_errors;
+			ppp_receive_error(pch->ppp);
+		}
+		goto done;
+	}
+
+	proto = PPP_PROTO(skb);
 	if (!pch->ppp || proto >= 0xc000 || proto == PPP_CCPFRAG) {
 		/* put it on the channel queue */
 		skb_queue_tail(&pch->file.rq, skb);
@@ -1585,6 +1594,8 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb)
 	} else {
 		ppp_do_recv(pch->ppp, skb, pch);
 	}
+
+done:
 	read_unlock_bh(&pch->upl);
 }
 
@@ -1617,7 +1628,8 @@ ppp_input_error(struct ppp_channel *chan, int code)
 static void
 ppp_receive_frame(struct ppp *ppp, struct sk_buff *skb, struct channel *pch)
 {
-	if (pskb_may_pull(skb, 2)) {
+	/* note: a 0-length skb is used as an error indication */
+	if (skb->len > 0) {
 #ifdef CONFIG_PPP_MULTILINK
 		/* XXX do channel-level decompression here */
 		if (PPP_PROTO(skb) == PPP_MP)
@@ -1625,15 +1637,10 @@ ppp_receive_frame(struct ppp *ppp, struct sk_buff *skb, struct channel *pch)
 		else
 #endif /* CONFIG_PPP_MULTILINK */
 			ppp_receive_nonmp_frame(ppp, skb);
-		return;
+	} else {
+		kfree_skb(skb);
+		ppp_receive_error(ppp);
 	}
-
-	if (skb->len > 0)
-		/* note: a 0-length skb is used as an error indication */
-		++ppp->dev->stats.rx_length_errors;
-
-	kfree_skb(skb);
-	ppp_receive_error(ppp);
 }
 
 static void
-- 
1.7.0.4


^ permalink raw reply related

* Re: [PATCH 3/3] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
From: Kumar Gala @ 2010-05-03 12:35 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Netdev, linuxppc-dev, devicetree-discuss, Sandeep Gopalpet
In-Reply-To: <20100503062617.GA3310@riccoc20.at.omicron.at>


On May 3, 2010, at 1:26 AM, Richard Cochran wrote:

> On Sat, May 01, 2010 at 11:36:12AM -0500, Kumar Gala wrote:
>> Is there a binding document that describes this node you are adding?
> 
> No, but I will add one to Documentation/powerpc/dts-bindings.

Please do so we can review and comment.

- k

^ permalink raw reply

* [PATCH] unix/garbage: kill copy of the skb queue walker
From: Ilpo Järvinen @ 2010-05-03 13:22 UTC (permalink / raw)
  To: David Miller; +Cc: Netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1569 bytes --]

Worse yet, it seems that its arguments were in reverse order. Also
remove one related helper which seems hardly worth keeping.

Compile tested.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/unix/garbage.c |   13 ++-----------
 1 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 14c22c3..c8df6fd 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -153,15 +153,6 @@ void unix_notinflight(struct file *fp)
 	}
 }
 
-static inline struct sk_buff *sock_queue_head(struct sock *sk)
-{
-	return (struct sk_buff *)&sk->sk_receive_queue;
-}
-
-#define receive_queue_for_each_skb(sk, next, skb) \
-	for (skb = sock_queue_head(sk)->next, next = skb->next; \
-	     skb != sock_queue_head(sk); skb = next, next = skb->next)
-
 static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
 			  struct sk_buff_head *hitlist)
 {
@@ -169,7 +160,7 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
 	struct sk_buff *next;
 
 	spin_lock(&x->sk_receive_queue.lock);
-	receive_queue_for_each_skb(x, next, skb) {
+	skb_queue_walk_safe(&x->sk_receive_queue, skb, next) {
 		/*
 		 *	Do we have file descriptors ?
 		 */
@@ -225,7 +216,7 @@ static void scan_children(struct sock *x, void (*func)(struct unix_sock *),
 		 * and perform a scan on them as well.
 		 */
 		spin_lock(&x->sk_receive_queue.lock);
-		receive_queue_for_each_skb(x, next, skb) {
+		skb_queue_walk_safe(&x->sk_receive_queue, skb, next) {
 			u = unix_sk(skb->sk);
 
 			/*
-- 
1.5.6.3

^ permalink raw reply related

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Arjan van de Ven @ 2010-05-03 14:09 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Eric Dumazet, David Miller, hadi, xiaosuo, therbert, shemminger,
	netdev, lenb
In-Reply-To: <20100503103426.GA25809@one.firstfloor.org>

On Mon, 3 May 2010 12:34:26 +0200
Andi Kleen <andi@firstfloor.org> wrote:

> > > Maybe its low cost, (apparently, it is, since I can reach ~900.000
> > > ipis on my 16 cores machine) but multiply this by 16 or 32 or 64
> > > cpus, and clockevents_notify() cost appears to be a killer, all
> > > cpus compete on a single lock.
> > > 
> > > Maybe this notifier could use RCU ?
> > 
> > could this be an artifact of the local apic stopping in deeper C
> > states? (which is finally fixed in the Westmere generation)
> 
> Yes it is I think.
> 
> But I suspect Eric wants a solution for Nehalem.

sure ;-)


So the hard problem is that, on going idle, the local timers need to be
funneled to the external HPET. AFAIK right now we use one channel of
the HPET, with the result that we have one global lock for this.

HPETs have more than one channel (2 or 3 historically; newer chipsets,
IIRC, have a few more), so in principle we can split this lock at least
a little bit. If we can get to one HPET channel per level 3 cache
domain, we'd already make huge progress in terms of the cost of the
contention.
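
The lock-splitting idea can be sketched in userspace C. This is only a toy
model under stated assumptions, not the kernel's clockevents code: the
channel count, the number of CPUs per L3 domain, and every toy_* name are
hypothetical. Each channel carries its own spinlock, so only CPUs that share
an L3 domain contend with each other.

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_NR_CHANNELS 4   /* assumption: usable HPET channels */
#define TOY_CPUS_PER_L3 8   /* assumption: CPUs sharing one L3 cache */

/* Hypothetical per-channel state; not a kernel structure. */
struct toy_channel {
	atomic_flag lock;          /* one spinlock per channel, not one global lock */
	unsigned long next_event;  /* earliest pending expiry; 0 means none */
};

struct toy_channel toy_channels[TOY_NR_CHANNELS];

void toy_channels_init(void)
{
	for (int i = 0; i < TOY_NR_CHANNELS; i++) {
		atomic_flag_clear(&toy_channels[i].lock);
		toy_channels[i].next_event = 0;
	}
}

/* CPUs in the same L3 domain share a channel, and therefore a lock. */
unsigned int toy_cpu_to_channel(unsigned int cpu)
{
	return (cpu / TOY_CPUS_PER_L3) % TOY_NR_CHANNELS;
}

/* Called when a CPU goes idle and hands its next timer expiry to a channel. */
void toy_enter_idle(unsigned int cpu, unsigned long expires)
{
	struct toy_channel *ch = &toy_channels[toy_cpu_to_channel(cpu)];

	assert(expires != 0);  /* 0 is reserved as the "nothing pending" marker */

	while (atomic_flag_test_and_set_explicit(&ch->lock, memory_order_acquire))
		;  /* spin: only CPUs of the same domain contend here */
	if (ch->next_event == 0 || expires < ch->next_event)
		ch->next_event = expires;
	atomic_flag_clear_explicit(&ch->lock, memory_order_release);
}
```

With these assumed constants, CPUs 0-7 map to channel 0 and CPUs 8-15 to
channel 1, so a 16-CPU machine would spread the contention over two locks
instead of one.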



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


* [PATCH v2] ethernet: call __skb_pull() in eth_type_trans()
From: Changli Gao @ 2010-05-03 14:12 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev, Changli Gao

call __skb_pull() in eth_type_trans().

The callers of eth_type_trans() should always feed it long enough packets.
When the length of the packet is less than ETH_ZLEN, a warning message is
printed, and the subsequent behavior is undefined.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/ethernet/eth.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 61ec032..1df31cc 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -162,7 +162,10 @@ __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev)
 
 	skb->dev = dev;
 	skb_reset_mac_header(skb);
-	skb_pull_inline(skb, ETH_HLEN);
+	if (unlikely(skb->len < ETH_ZLEN))
+		dev_warn(&dev->dev, "too small ethernet packet: %u bytes\n",
+			 skb->len);
+	__skb_pull(skb, ETH_HLEN);
 	eth = eth_hdr(skb);
 
 	if (unlikely(is_multicast_ether_addr(eth->h_dest))) {
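
For readers comparing the two helpers, here is a minimal userspace sketch of
the difference between a checked and an unchecked pull. It assumes a
hypothetical, much-simplified struct toy_skb; the toy_* functions only model
the length handling of skb_pull() and __skb_pull() and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN 14  /* Ethernet header length, as in <linux/if_ether.h> */

/* Hypothetical, much-simplified stand-in for struct sk_buff. */
struct toy_skb {
	unsigned char *data;  /* start of the not-yet-parsed bytes */
	unsigned int len;     /* number of bytes available from data */
};

/*
 * Unchecked pull, modeling __skb_pull(): the caller must guarantee that
 * at least n bytes are present. The assert only documents that contract
 * in this sketch; the point of the unchecked variant is that it does not
 * spend cycles on the comparison.
 */
unsigned char *toy_pull_unchecked(struct toy_skb *skb, unsigned int n)
{
	assert(n <= skb->len);
	skb->len -= n;
	skb->data += n;
	return skb->data;
}

/*
 * Checked pull, modeling skb_pull(): returns NULL and leaves the buffer
 * untouched when fewer than n bytes are available.
 */
unsigned char *toy_pull_checked(struct toy_skb *skb, unsigned int n)
{
	if (n > skb->len)
		return NULL;
	return toy_pull_unchecked(skb, n);
}
```

The checked variant costs one extra comparison per packet, which is the
saving the patch is after; the unchecked variant is only safe if every
caller really does pass at least ETH_HLEN bytes.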


* Re: [PATCH] [RFC] C/R: inet4 and inet6 unicast routes (v2)
From: Dan Smith @ 2010-05-03 14:21 UTC (permalink / raw)
  To: hadi; +Cc: Daniel Lezcano, containers, Vlad Yasevich, netdev, David Miller
In-Reply-To: <1272673614.14499.10.camel@bigi>

j> The problem as I see it (with all net structures, not just routes;
j> I was equally pessimistic when I saw those other net structure
j> checkpoint/restore changes) is that you are faced with a herculean,
j> high-maintenance effort. You have a separate piece of code
j> which populates structures that _you_ maintain for attributes that
j> are defined elsewhere by other people. Nobody adding a new
j> attribute that is very important to route restoration, for example,
j> is likely to change your code unless you tie the two together (so
j> changing one forces the coder to change the other). And once
j> people deploy kernels it is hard to change. Historically (for
j> pragmatic reasons) such rich interfaces sit in user space; it is much
j> easier to update user space.

The benefits of doing what we can in userspace are well-understood and
arguing for doing so where it makes sense is, of course, a good idea.

However, it seems to me that the rtnl interface provides us a
reasonable layer of isolation between us and such changes.  Am I
wrong?  The rtnl messages appear to be rather generic and timeless,
and in most cases have a significant amount of flexibility with
respect to allowing advanced attributes to be ignored (which implies
taking the default).
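
To make the "ignore what you do not understand" property concrete, here is a
minimal sketch of a type-length-value parser. It assumes a hypothetical
attribute layout that is only loosely modeled on rtnetlink attributes; the
toy_* names, attribute numbers, and struct toy_route are invented for
illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute header; len counts header plus payload. */
struct toy_attr {
	uint16_t len;
	uint16_t type;
};

#define TOY_ALIGN(n) (((n) + 3u) & ~3u)  /* attributes start on 4-byte bounds */

enum { TOY_ATTR_DST = 1, TOY_ATTR_OIF = 2 };

struct toy_route {
	uint32_t dst;
	uint32_t oif;
};

/* Append one attribute at offset off; returns the offset of the next one. */
size_t toy_put_attr(unsigned char *buf, size_t off, uint16_t type,
		    const void *data, uint16_t dlen)
{
	struct toy_attr hdr;

	assert(dlen <= UINT16_MAX - sizeof(hdr));
	hdr.len = (uint16_t)(sizeof(hdr) + dlen);
	hdr.type = type;
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), data, dlen);
	return off + TOY_ALIGN(hdr.len);
}

/* Returns 0 on success, -1 on a malformed attribute. */
int toy_parse_route(const unsigned char *buf, size_t buflen,
		    struct toy_route *rt)
{
	size_t off = 0;

	rt->dst = 0;  /* defaults used when an attribute is absent */
	rt->oif = 0;

	while (buflen - off >= sizeof(struct toy_attr)) {
		struct toy_attr hdr;
		const unsigned char *payload;
		size_t plen, step;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > buflen - off)
			return -1;
		payload = buf + off + sizeof(hdr);
		plen = hdr.len - sizeof(hdr);

		switch (hdr.type) {
		case TOY_ATTR_DST:
			if (plen != sizeof(rt->dst))
				return -1;
			memcpy(&rt->dst, payload, sizeof(rt->dst));
			break;
		case TOY_ATTR_OIF:
			if (plen != sizeof(rt->oif))
				return -1;
			memcpy(&rt->oif, payload, sizeof(rt->oif));
			break;
		default:
			break;  /* unknown attribute: skip it, keep the default */
		}

		step = TOY_ALIGN(hdr.len);
		if (step > buflen - off)
			break;  /* last attribute, sent without trailing padding */
		off += step;
	}
	return 0;
}
```

Because each attribute carries its own length, a reader built before a new
attribute type existed can still step over it. The cost is that a skipped
attribute silently falls back to the default, which is the trade-off the
quoted objection is concerned about.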

In many other areas of C/R we're not so lucky and don't have a
well-defined interface for dumping that information out of the
kernel...

-- 
Dan Smith
IBM Linux Technology Center
email: danms@us.ibm.com


* [PATCH] sky2: Avoid race in sky2_change_mtu
From: Mike McCormack @ 2010-05-03 14:18 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

netif_stop_queue() does not ensure that all in-progress transmits are
complete, so use netif_tx_disable() instead.

Make sure NAPI polls are disabled first; otherwise NAPI might trigger a TX
restart between the point where we stop the queue and the point where NAPI
is disabled.

Signed-off-by: Mike McCormack <mikem@ring3k.org>
---
 drivers/net/sky2.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 088c797..b839bae 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -2236,8 +2236,8 @@ static int sky2_change_mtu(struct net_device *dev, int new_mtu)
 	sky2_write32(hw, B0_IMSK, 0);
 
 	dev->trans_start = jiffies;	/* prevent tx timeout */
-	netif_stop_queue(dev);
 	napi_disable(&hw->napi);
+	netif_tx_disable(dev);
 
 	synchronize_irq(hw->pdev->irq);
 
-- 
1.5.6.5



* Re: mmotm 2010-04-28 - RCU whinges
From: Valdis.Kletnieks @ 2010-05-03 14:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, Peter Zijlstra, Patrick McHardy, David S. Miller,
	linux-kernel, netfilter-devel, netdev, Paul E. McKenney
In-Reply-To: <1272865137.2173.179.camel@edumazet-laptop>


On Mon, 03 May 2010 07:38:57 +0200, Eric Dumazet said:
> Le dimanche 02 mai 2010 à 13:46 -0400, Valdis.Kletnieks@vt.edu a écrit :
> > On Wed, 28 Apr 2010 16:53:32 PDT, akpm@linux-foundation.org said:
> > > The mm-of-the-moment snapshot 2010-04-28-16-53 has been uploaded to
> > > 
> > >    http://userweb.kernel.org/~akpm/mmotm/
> > 
> > I thought we swatted all these, hit another one...

> Thanks for the report !
> 
> We can use rcu_dereference_protected() in those cases.
> 
> [PATCH] net: Use rcu_dereference_protected in nf_conntrack_ecache
> 
> Writers own nf_ct_ecache_mutex.

I *really* thought we swatted a bunch of these - did the fixes not make it
into linux-next or -mm?  Your patch fixed that one, but then:

[    9.128899] Netfilter messages via NETLINK v0.30.
[    9.128919] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[    9.129108] CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
[    9.129110] nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or
[    9.129113] sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
[    9.129135] ctnetlink v0.93: registering with nfnetlink.
[    9.129452] ip_tables: (C) 2000-2006 Netfilter Core Team
[    9.129506] 
[    9.129507] ===================================================
[    9.129683] [ INFO: suspicious rcu_dereference_check() usage. ]
[    9.129777] ---------------------------------------------------
[    9.129872] net/netfilter/nf_log.c:55 invoked rcu_dereference_check() without protection!
[    9.129969] 
[    9.129969] other info that might help us debug this:
[    9.129970] 
[    9.130232] 
[    9.130232] rcu_scheduler_active = 1, debug_locks = 0
[    9.130407] 1 lock held by swapper/1:
[    9.130525]  #0:  (nf_log_mutex){+.+...}, at: [<ffffffff81481154>] nf_log_register+0x57/0x10f
[    9.130955] 
[    9.130956] stack backtrace:
[    9.131162] Pid: 1, comm: swapper Tainted: G        W   2.6.34-rc5-mmotm0428 #2
[    9.131259] Call Trace:
[    9.131370]  [<ffffffff81064832>] lockdep_rcu_dereference+0xaa/0xb2
[    9.131466]  [<ffffffff814811db>] nf_log_register+0xde/0x10f
[    9.131579]  [<ffffffff81b5ca28>] ? log_tg_init+0x0/0x29
[    9.131689]  [<ffffffff81b5ca4d>] log_tg_init+0x25/0x29
[    9.131800]  [<ffffffff810001ef>] do_one_initcall+0x59/0x14e
[    9.131912]  [<ffffffff81b2e68a>] kernel_init+0x144/0x1ce
[    9.132033]  [<ffffffff81003414>] kernel_thread_helper+0x4/0x10
[    9.132146]  [<ffffffff81598a40>] ? restore_args+0x0/0x30
[    9.132257]  [<ffffffff81b2e546>] ? kernel_init+0x0/0x1ce
[    9.132370]  [<ffffffff81003410>] ? kernel_thread_helper+0x0/0x10
[    9.132513] TCP bic registered




