Netdev List

* Re: [RFC] skb align patch
From: David Miller @ 2009-09-22  5:29 UTC (permalink / raw)
  To: eric.dumazet; +Cc: shemminger, jesse.brandeburg, hawk, netdev
In-Reply-To: <4AB84295.3050509@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 22 Sep 2009 05:20:53 +0200

> Oh I see, you want to optimize the rx (NIC has to do a DMA to write
> packet into host memory and this DMA could be a read /modify/write
> if address is not aligned, instead of a pure write), while I tried
> to align skb to optimize the pktgen tx (NIC has to do a DMA to read
> packet from host), and aligning the skb had no effect.

This is a problem with these kinds of changes.

This patch from Stephen came out of a presentation and discussion
at netconf where the Intel folks showed that if they did a combination
of things it improved NUMA forwarding numbers a lot.

So you couldn't just do NUMA spreading of RX queue memory, or just
do this ALIGN patch, or just eliminate the false sharing from
statistics updates.

You had to do all three to start seeing forwarding rates go up.

So don't worry, this is getting us somewhere to where improvement
shows, but individually each change won't trigger it.

The alignment in this patch is a real big deal for 64 byte forwarding
tests, where the entire packet is a whole PCI-E cacheline.  But not
if it isn't aligned properly.
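
To make the alignment point concrete, here is a small userspace sketch
(not kernel code) that counts how many cacheline-sized blocks a buffer
touches.  The 64-byte line size and the helper names align_up() and
lines_touched() are assumptions chosen for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static uintptr_t align_up(uintptr_t x, uintptr_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Number of line-sized blocks covered by [addr, addr + len), len > 0. */
static size_t lines_touched(uintptr_t addr, size_t len, size_t line)
{
	uintptr_t first = addr / line;
	uintptr_t last = (addr + len - 1) / line;

	return (size_t)(last - first + 1);
}
```

Under these assumptions a 64-byte frame starting at offset 0 covers one
line, while the same frame starting at offset 2 straddles two, which is
the case where the device may have to fall back to a read/modify/write.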

^ permalink raw reply

* Re: [net-2.6 PATCH] igb: resolve namespacecheck warning for igb_hash_mc_addr
From: David Miller @ 2009-09-22  5:37 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, alexander.h.duyck
In-Reply-To: <20090918005219.25329.94906.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 17 Sep 2009 17:52:29 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This patch resolves a warning seen when doing namespace checking via
> "make namespacecheck"
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-2.6 PATCH 1/3] ixgbe: fix sfp_timer clean up in ixgbe_down
From: David Miller @ 2009-09-22  5:37 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, gospo, shannon.nelson, donald.c.skidmore,
	peter.p.waskiewicz.jr
In-Reply-To: <20090918194533.28898.49436.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 18 Sep 2009 12:45:43 -0700

> From: Don Skidmore <donald.c.skidmore@intel.com>
> 
> We weren't stopping the sfp_timer after the device was brought down.
> This patch properly cleans up.
> 
> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
> Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-2.6 PATCH 2/3] ixgbe: Allow tx itr specific settings
From: David Miller @ 2009-09-22  5:37 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, shannon.nelson, peter.p.waskiewicz.jr
In-Reply-To: <20090918194606.28898.37888.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 18 Sep 2009 12:46:06 -0700

> From: Nelson, Shannon <shannon.nelson@intel.com>
> 
> Allow the user to set Tx specific itr values.  This only makes sense
> when there are separate vectors for Tx and Rx.  When the queues are
> doubled up RxTx on the vectors, we still only use the rx itr value.
> 
> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
> Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-2.6 PATCH 3/3] ixgbe: move rx queue RSC configuration to a separate function
From: David Miller @ 2009-09-22  5:37 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, gospo, shannon.nelson, peter.p.waskiewicz.jr,
	donald.c.skidmore
In-Reply-To: <20090918194627.28898.75773.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 18 Sep 2009 12:46:27 -0700

> From: Nelson, Shannon <shannon.nelson@intel.com>
> 
> Shorten ixgbe_configure_rx() and lessen indent depth.
> 
> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
> Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [patch 1/1] net: fix CONFIG_NET=n build on sparc64
From: David Miller @ 2009-09-22  5:38 UTC (permalink / raw)
  To: akpm; +Cc: netdev
In-Reply-To: <200909181952.n8IJqEdD024614@imap1.linux-foundation.org>

From: akpm@linux-foundation.org
Date: Fri, 18 Sep 2009 12:52:13 -0700

> From: Andrew Morton <akpm@linux-foundation.org>
> 
> sparc64 allnoconfig:
> 
> arch/sparc/kernel/built-in.o(.text+0x134e0): In function `sys32_recvfrom':
> : undefined reference to `compat_sys_recvfrom'
> arch/sparc/kernel/built-in.o(.text+0x134e4): In function `sys32_recvfrom':
> : undefined reference to `compat_sys_recvfrom'
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Applied.

^ permalink raw reply

* Re: [PATCH] cpmac: fix compilation errors against undeclared BUS_ID_SIZE
From: David Miller @ 2009-09-22  5:38 UTC (permalink / raw)
  To: florian; +Cc: netdev, ralf, linux-mips
In-Reply-To: <200909191243.09166.florian@openwrt.org>

From: Florian Fainelli <florian@openwrt.org>
Date: Sat, 19 Sep 2009 12:43:08 +0200

> David,
> 
> Ping ? This fixes a build failure. Thank you very much !

Applied, thanks.

^ permalink raw reply

* Re: [PATCH V3 1/2] cpc-usb: Removed driver from staging tree
From: David Miller @ 2009-09-22  5:39 UTC (permalink / raw)
  To: haas; +Cc: netdev, greg, wg, socketcan-core
In-Reply-To: <20090916120415.30391.69148.stgit@localhost.localdomain>

From: Sebastian Haas <haas@ems-wuensche.com>
Date: Wed, 16 Sep 2009 14:04:15 +0200

> This patch prepares for replacing the staging driver cpc-usb with the
> newly developed ems_usb CAN driver.
> 
> Signed-off-by: Sebastian Haas <haas@ems-wuensche.com>
> Acked-by: Greg Kroah-Hartman <gregkh@suse.de>

Applied.

^ permalink raw reply

* Re: [PATCH V3 2/2] ems_usb: Added support for EMS CPC-USB/ARM7 CAN/USB interface
From: David Miller @ 2009-09-22  5:39 UTC (permalink / raw)
  To: haas; +Cc: netdev, greg, wg, socketcan-core
In-Reply-To: <20090916120420.30391.40000.stgit@localhost.localdomain>

From: Sebastian Haas <haas@ems-wuensche.com>
Date: Wed, 16 Sep 2009 14:04:20 +0200

> This patch adds support for one channel CAN/USB interface CPC-USB/ARM7 from
> EMS Dr. Thomas Wuensche (http://www.ems-wuensche.com).
> 
> Signed-off-by: Sebastian Haas <haas@ems-wuensche.com>

Applied.

^ permalink raw reply

* Re: [PATCH 1/2] pktgen: check for link down
From: Stephen Hemminger @ 2009-09-22  5:55 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, Robert Olsson, Jesper Dangaard Brouer, netdev
In-Reply-To: <20090919221844.114e2e23@nehalam>

On Sat, 19 Sep 2009 22:18:44 -0700
Stephen Hemminger <shemminger@vyatta.com> wrote:

> If cable is pulled, pktgen shouldn't continue slamming packets into the
> device.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> --- a/net/core/pktgen.c	2009-09-19 11:20:55.546463176 -0700
> +++ b/net/core/pktgen.c	2009-09-19 11:22:44.810509240 -0700
> @@ -1959,7 +1959,7 @@ static int pktgen_setup_dev(struct pktge
>  	if (odev->type != ARPHRD_ETHER) {
>  		printk(KERN_ERR "pktgen: not an ethernet device: \"%s\"\n", ifname);
>  		err = -EINVAL;
> -	} else if (!netif_running(odev)) {
> +	} else if (!netif_running(odev) || !netif_carrier_ok(odev)) {
>  		printk(KERN_ERR "pktgen: device is down: \"%s\"\n", ifname);
>  		err = -ENETDOWN;
>  	} else {
> @@ -3410,7 +3410,7 @@ static void pktgen_xmit(struct pktgen_de
>  	/* Did we saturate the queue already? */
>  	if (netif_tx_queue_stopped(txq) || netif_tx_queue_frozen(txq)) {
>  		/* If device is down, then all queues are permnantly frozen */
> -		if (netif_running(odev))
> +		if (netif_running(odev) && netif_carrier_ok(odev))
>  			idle(pkt_dev);
>  		else
>  			pktgen_stop_device(pkt_dev);

You can hold off on these two patches; I have a better version
that fixes some other issues. But testing time is limited this week.

^ permalink raw reply

* Re: [PATCH 12/13] TProxy: added IPv6 support to the socket match
From: Balazs Scheidler @ 2009-09-22  6:33 UTC (permalink / raw)
  To: Brian Haley; +Cc: netfilter-devel, netdev
In-Reply-To: <4AB7BEF8.5050800@hp.com>

On Mon, 2009-09-21 at 13:59 -0400, Brian Haley wrote:
> Balazs Scheidler wrote:
> > +static bool
> > +socket_mt6_v1(const struct sk_buff *skb, const struct xt_match_param *par)
> > +{
> > +	struct ipv6hdr *iph = ipv6_hdr(skb);
> > +	struct udphdr _hdr, *hp = NULL;
> > +	struct sock *sk;
> > +	struct in6_addr *daddr, *saddr;
> > +	__be16 dport, sport;
> > +        int thoff;
> > +	u8 tproto;
> > +        const struct xt_socket_mtinfo1 *info = (struct xt_socket_mtinfo1 *) par->matchinfo;
> > +        
> > +        tproto = ipv6_find_hdr(skb, &thoff, -1, NULL);
> > +        if (tproto < 0) {
> > +		pr_debug("socket match: Unable to find transport header in IPv6 packet, dropping\n");
> > +		return NF_DROP;
> > +        }
> > +
> > +	if (tproto == IPPROTO_UDP || tproto == IPPROTO_TCP) {
> > +		hp = skb_header_pointer(skb, thoff,
> > +					sizeof(_hdr), &_hdr);
> > +		if (hp == NULL)
> > +			return false;
> > +
> > +		saddr = &iph->saddr;
> > +		sport = hp->source;
> > +		daddr = &iph->daddr;
> > +		dport = hp->dest;
> > +
> > +	} else if (tproto == IPPROTO_ICMP) {
> > +		if (extract_icmp6_fields(skb, thoff, &tproto, &saddr, &daddr,
> > +					 &sport, &dport))
> > +			return false;
> > +	} else {
> > +		return false;
> > +	}
> 
> Shouldn't this be IPPROTO_ICMPV6?

Yeah, thanks for spotting this. I'm going to have to add ICMP checks to
my test program, or at least retest that functionality manually.
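
For reference, a minimal userspace sketch of the protocol dispatch with
the ICMPv6 constant in place.  The enum and the function name
classify_ipv6_l4() are hypothetical and are not part of the patch; only
the IPPROTO_* constants come from <netinet/in.h>.

```c
#include <netinet/in.h>

/* Hypothetical classification of the transport protocol found in an
 * IPv6 packet; names are illustrative only. */
enum l4_kind { L4_PORTS, L4_ICMPV6, L4_UNSUPPORTED };

static enum l4_kind classify_ipv6_l4(int proto)
{
	switch (proto) {
	case IPPROTO_TCP:
	case IPPROTO_UDP:
		return L4_PORTS;	/* ports read from the L4 header */
	case IPPROTO_ICMPV6:		/* 58, not IPPROTO_ICMP (1) */
		return L4_ICMPV6;
	default:
		return L4_UNSUPPORTED;
	}
}
```

IPPROTO_ICMP and IPPROTO_ICMPV6 are different protocol numbers, so a
check against IPPROTO_ICMP would not match ICMPv6 packets.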

-- 
Bazsi


^ permalink raw reply

* Re: [PATCH 13/13] TProxy: use the interface primary IP address as a default value for --on-ip
From: Balazs Scheidler @ 2009-09-22  6:38 UTC (permalink / raw)
  To: Brian Haley; +Cc: netfilter-devel, netdev
In-Reply-To: <4AB7BF47.2030404@hp.com>

On Mon, 2009-09-21 at 14:00 -0400, Brian Haley wrote:
> Balazs Scheidler wrote: 
> >  #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
> > +
> > +static inline const struct in6_addr *
> > +tproxy_laddr6(struct sk_buff *skb, const struct in6_addr *user_laddr, const struct in6_addr *daddr)
> > +{
> > +	struct inet6_dev *indev;
> > +	struct inet6_ifaddr *ifa;
> > +	struct in6_addr *laddr;
> > +	
> > +        if (!ipv6_addr_any(user_laddr))
> > +                return user_laddr;
> > +	
> > +        laddr = NULL;
> > +        rcu_read_lock();
> > +        indev = __in6_dev_get(skb->dev);
> > +        if (indev && (ifa = indev->addr_list)) {
> > +		laddr = &ifa->addr;
> > +	}
> > +        rcu_read_unlock();
> > +        
> > +        return laddr ? laddr : daddr;
> > +}
> 
> You should call ipv6_dev_get_saddr() to get a source address based on the target
> destination address.

Thanks for this hint, however this is not selecting a source address for
a given destination, rather it selects the address where tproxy is
redirecting the connection in case the user specified no --on-ip
parameter.

e.g. 

ip6tables -A PREROUTING -p tcp --dport 80 -j TPROXY --on-port 50080

This should redirect the connection to the primary IP address of the
incoming interface. In fact I spent 2 hours to figure out how to find
the proper address, and at the end I used the first IP address
configured to the interface, seeing that those addresses are sorted in
'scope' order, e.g. link-local and site-local addresses are at the end
of the list, thus the front should be ok.

Since I'm not that much into IPv6, I'd appreciate some help, is
ipv6_dev_get_saddr(client_ip_address) indeed the best solution here?

-- 
Bazsi

^ permalink raw reply

* Re: [PATCH 02/13] TProxy: add lookup type checks for UDP in nf_tproxy_get_sock_v4()
From: Balazs Scheidler @ 2009-09-22  6:40 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, netdev
In-Reply-To: <alpine.LSU.2.00.0909212320150.31023@obet.zrqbmnf.qr>

On Mon, 2009-09-21 at 23:20 +0200, Jan Engelhardt wrote:
> On Saturday 2009-08-15 14:01, Balazs Scheidler wrote:
> 
> >+	case IPPROTO_UDP:
> >+		sk = udp4_lib_lookup(net, saddr, sport, daddr, dport,
> >+				     in->ifindex);
> 
> You might want to add IPPROTO_UDPLITE in all places.

Well, I preferred not to add those, as I'm unable to test it, and I'd
prefer to submit a patch that really works for UDP and TCP at the
minimum. Further protocols (like SCTP) should be added later IMHO.

-- 
Bazsi


^ permalink raw reply

* Re: [PATCH][RESEND] IPv6: 6rd tunnel mode
From: Alexandre Cassen @ 2009-09-22  6:59 UTC (permalink / raw)
  To: Brian Haley; +Cc: netdev
In-Reply-To: <4AB838F1.1090704@hp.com>

Hi Brian,

On Mon, 2009-09-21 at 22:39 -0400, Brian Haley wrote:
> Hi Alexandre,
> 
> Alexandre Cassen wrote:
> > This patch adds support for 6rd tunnel mode, currently targeting
> > standards track at the IETF.
> > 
> > IPv6 rapid deployment (RFC5569) builds upon mechanisms of 6to4 (RFC3056)
> > to enable a service provider to rapidly deploy IPv6 unicast service
> > to IPv4 sites to which it provides customer premise equipment.  Like
> > 6to4, it utilizes stateless IPv6 in IPv4 encapsulation in order to
> > transit IPv4-only network infrastructure. Unlike 6to4, a 6rd service
> > provider uses an IPv6 prefix of its own in place of the fixed 6to4
> > prefix.
> 
> I couldn't find RFC 5569 (delayed due to IPR rights?), although I did find
> the latest 6rd draft, -03.  It was showing as Informational, not Standards
> track, is that right?  Just curious.

In fact there are currently two drafts:

1) https://datatracker.ietf.org/idtracker/draft-despres-6rd/

   This draft is targeting informational RFC status as an independent
submission. It is currently queued and has been delayed since May for
IPR reasons.

2) http://tools.ietf.org/html/draft-townsley-ipv6-6rd-01

   This draft is targeting standards track, so work is in progress here.

A good summary was given by Mark Townsley at the last IETF meeting:

http://www.ietf.org/proceedings/75/slides/dhc-4.pdf

> > +		case SIOCADD6RD:
> > +		case SIOCCHG6RD:
> > +			if (ip6rd.prefixlen >= 95) {
> > +				err = -EINVAL;
> > +				goto done;
> > +			}
> > +			t->ip6rd_prefix.addr = ip6rd.addr;
> 
> ipv6_addr_copy(&t->ip6rd_prefix.addr, &ip6rd.addr); is the preferred way to
> copy the address.

Agreed. Will fix and resend.

regs,
Alexandre


^ permalink raw reply

* Re: [PATCH 10/13] TProxy: added IPv6 socket lookup function to nf_tproxy_core
From: Jan Engelhardt @ 2009-09-22  8:30 UTC (permalink / raw)
  To: Balazs Scheidler; +Cc: netfilter-devel, netdev
In-Reply-To: <1253548005.12519.10.camel@bzorp.balabit>


On Monday 2009-08-24 14:51, Balazs Scheidler wrote:
>+		case NFT_LOOKUP_LISTENER:
>+			sk = inet6_lookup_listener(net, &tcp_hashinfo,
>+                                                   daddr, ntohs(dport),
>+                                                   in->ifindex);
>+
>+                        /* NOTE: we return listeners even if bound to
>+                         * 0.0.0.0, those are filtered out in

s/0.0.0.0/::/g  :-)

^ permalink raw reply

* Re: [PATCH 12/13] TProxy: added IPv6 support to the socket match
From: Jan Engelhardt @ 2009-09-22  8:33 UTC (permalink / raw)
  To: Balazs Scheidler; +Cc: netfilter-devel, netdev
In-Reply-To: <1253548005.12519.12.camel@bzorp.balabit>

On Monday 2009-08-24 14:52, Balazs Scheidler wrote:

>+static bool
>+socket_mt6_v1(const struct sk_buff *skb, const struct xt_match_param *par)
>+{
>+	struct ipv6hdr *iph = ipv6_hdr(skb);
>+	struct udphdr _hdr, *hp = NULL;
>+	struct sock *sk;
>+	struct in6_addr *daddr, *saddr;
>+	__be16 dport, sport;
>+        int thoff;
>+	u8 tproto;
>+        const struct xt_socket_mtinfo1 *info = (struct xt_socket_mtinfo1 *) par->matchinfo;
>+        
>+        tproto = ipv6_find_hdr(skb, &thoff, -1, NULL);
>+        if (tproto < 0) {
>+		pr_debug("socket match: Unable to find transport header in IPv6 packet, dropping\n");
>+		return NF_DROP;
>+        }
>+
>+	if (tproto == IPPROTO_UDP || tproto == IPPROTO_TCP) {

The tabbing seems off (also noticed this in other patches,
pcregrep for '^ {8}' )

^ permalink raw reply

* [PATCH][RESEND 2] IPv6: 6rd tunnel mode
From: Alexandre Cassen @ 2009-09-22  8:51 UTC (permalink / raw)
  To: netdev

This patch adds support for 6rd tunnel mode, currently targeting
standards track at the IETF.

Patch history :
* http://patchwork.ozlabs.org/patch/26870/
* http://patchwork.ozlabs.org/patch/34026/

IPv6 rapid deployment (RFC5569) builds upon mechanisms of 6to4 (RFC3056)
to enable a service provider to rapidly deploy IPv6 unicast service
to IPv4 sites to which it provides customer premise equipment.  Like
6to4, it utilizes stateless IPv6 in IPv4 encapsulation in order to
transit IPv4-only network infrastructure. Unlike 6to4, a 6rd service
provider uses an IPv6 prefix of its own in place of the fixed 6to4
prefix.

Signed-off-by: Alexandre Cassen <acassen@freebox.fr>
---
 include/linux/if_tunnel.h |   10 +++++
 include/net/ipip.h        |    2 +
 net/ipv6/Kconfig          |   13 +++++++
 net/ipv6/sit.c            |   84 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 109 insertions(+), 0 deletions(-)

diff --git a/include/linux/if_tunnel.h b/include/linux/if_tunnel.h
index 5eb9b0f..0d44376 100644
--- a/include/linux/if_tunnel.h
+++ b/include/linux/if_tunnel.h
@@ -15,6 +15,10 @@
 #define SIOCADDPRL      (SIOCDEVPRIVATE + 5)
 #define SIOCDELPRL      (SIOCDEVPRIVATE + 6)
 #define SIOCCHGPRL      (SIOCDEVPRIVATE + 7)
+#define SIOCGET6RD      (SIOCDEVPRIVATE + 8)
+#define SIOCADD6RD      (SIOCDEVPRIVATE + 9)
+#define SIOCDEL6RD      (SIOCDEVPRIVATE + 10)
+#define SIOCCHG6RD      (SIOCDEVPRIVATE + 11)
 
 #define GRE_CSUM	__cpu_to_be16(0x8000)
 #define GRE_ROUTING	__cpu_to_be16(0x4000)
@@ -51,6 +55,12 @@ struct ip_tunnel_prl {
 /* PRL flags */
 #define	PRL_DEFAULT		0x0001
 
+/* 6RD parms */
+struct ip_tunnel_6rd {
+	struct in6_addr		addr;
+	__u8			prefixlen;
+};
+
 enum
 {
 	IFLA_GRE_UNSPEC,
diff --git a/include/net/ipip.h b/include/net/ipip.h
index 5d3036f..fa92c41 100644
--- a/include/net/ipip.h
+++ b/include/net/ipip.h
@@ -26,6 +26,8 @@ struct ip_tunnel
 
 	struct ip_tunnel_prl_entry	*prl;		/* potential router list */
 	unsigned int			prl_count;	/* # of entries in PRL */
+
+	struct ip_tunnel_6rd	ip6rd_prefix;	/* 6RD SP prefix */
 };
 
 /* ISATAP: default interval between RS in secondy */
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index ead6c7a..78a565b 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -170,6 +170,19 @@ config IPV6_SIT
 
 	  Saying M here will produce a module called sit. If unsure, say Y.
 
+config IPV6_SIT_6RD
+	bool "IPv6: 6rd tunnel mode (EXPERIMENTAL)"
+	depends on IPV6_SIT && EXPERIMENTAL
+	default n
+	---help---
+	IPv6 rapid deployment (RFC5569) builds upon mechanisms of 6to4 (RFC3056)
+	to enable a service provider to rapidly deploy IPv6 unicast service
+	to IPv4 sites to which it provides customer premise equipment.  Like
+	6to4, it utilizes stateless IPv6 in IPv4 encapsulation in order to
+	transit IPv4-only network infrastructure. Unlike 6to4, a 6rd service
+	provider uses an IPv6 prefix of its own in place of the fixed 6to4
+	prefix.
+
 config IPV6_NDISC_NODETYPE
 	bool
 
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 0ae4f64..034acdc 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -604,6 +604,30 @@ static inline __be32 try_6to4(struct in6_addr *v6dst)
 	return dst;
 }
 
+#ifdef CONFIG_IPV6_SIT_6RD
+/* Returns the embedded IPv4 address if the IPv6 address comes from
+   6rd rule */
+
+static inline __be32 try_6rd(struct in6_addr *addr, u8 prefix_len, struct in6_addr *v6dst)
+{
+	__be32 dst = 0;
+
+	/* isolate addr according to mask */
+	if (ipv6_prefix_equal(v6dst, addr, prefix_len)) {
+		unsigned int d32_off, bits;
+
+		d32_off = prefix_len >> 5;
+		bits = (prefix_len & 0x1f);
+
+		dst = (ntohl(v6dst->s6_addr32[d32_off]) << bits);
+		if (bits)
+			dst |= ntohl(v6dst->s6_addr32[d32_off + 1]) >> (32 - bits);
+		dst = htonl(dst);
+	}
+	return dst;
+}
+#endif
+
 /*
  *	This function assumes it is being called from dev_queue_xmit()
  *	and that skb is filled properly by that function.
@@ -657,6 +681,13 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 			goto tx_error;
 	}
 
+#ifdef CONFIG_IPV6_SIT_6RD
+	if (!dst && tunnel->ip6rd_prefix.prefixlen)
+		dst = try_6rd(&tunnel->ip6rd_prefix.addr,
+			      tunnel->ip6rd_prefix.prefixlen,
+			      &iph6->daddr);
+       else
+#endif
 	if (!dst)
 		dst = try_6to4(&iph6->daddr);
 
@@ -848,6 +879,9 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	int err = 0;
 	struct ip_tunnel_parm p;
 	struct ip_tunnel_prl prl;
+#ifdef CONFIG_IPV6_SIT_6RD
+	struct ip_tunnel_6rd ip6rd;
+#endif
 	struct ip_tunnel *t;
 	struct net *net = dev_net(dev);
 	struct sit_net *sitn = net_generic(net, sit_net_id);
@@ -987,6 +1021,56 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 		netdev_state_change(dev);
 		break;
 
+#ifdef CONFIG_IPV6_SIT_6RD
+	case SIOCGET6RD:
+		err = -EINVAL;
+		if (dev == sitn->fb_tunnel_dev)
+			goto done;
+		err = -ENOENT;
+		if (!(t = netdev_priv(dev)))
+			goto done;
+		memcpy(&ip6rd, &t->ip6rd_prefix, sizeof(ip6rd));
+		if (copy_to_user(ifr->ifr_ifru.ifru_data, &ip6rd, sizeof(ip6rd)))
+			err = -EFAULT;
+		else
+			err = 0;
+		break;
+
+	case SIOCADD6RD:
+	case SIOCDEL6RD:
+	case SIOCCHG6RD:
+		err = -EPERM;
+		if (!capable(CAP_NET_ADMIN))
+			goto done;
+		err = -EINVAL;
+		if (dev == sitn->fb_tunnel_dev)
+			goto done;
+		err = -EFAULT;
+		if (copy_from_user(&ip6rd, ifr->ifr_ifru.ifru_data, sizeof(ip6rd)))
+			goto done;
+		err = -ENOENT;
+		if (!(t = netdev_priv(dev)))
+			goto done;
+
+		err = 0;
+		switch (cmd) {
+		case SIOCDEL6RD:
+			memset(&t->ip6rd_prefix, 0, sizeof(ip6rd));
+			break;
+		case SIOCADD6RD:
+		case SIOCCHG6RD:
+			if (ip6rd.prefixlen >= 95) {
+				err = -EINVAL;
+				goto done;
+			}
+			ipv6_addr_copy(&t->ip6rd_prefix.addr, &ip6rd.addr);
+			t->ip6rd_prefix.prefixlen = ip6rd.prefixlen;
+			break;
+		}
+		netdev_state_change(dev);
+		break;
+#endif
+
 	default:
 		err = -EINVAL;
 	}
-- 
1.6.0.4
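
The bit arithmetic used to recover the embedded IPv4 address can be
tried out in userspace with the sketch below.  It is not the kernel
function: the name sixrd_embedded_v4() is hypothetical, it returns the
address in host byte order, it skips the prefix comparison, and the
bound check on prefix_len is an addition of this sketch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Extract the 32 bits that follow the first prefix_len bits of an IPv6
 * address.  Returns the value in host byte order, or 0 if prefix_len
 * leaves no room for 32 more bits. */
static uint32_t sixrd_embedded_v4(const struct in6_addr *v6dst,
				  unsigned int prefix_len)
{
	uint32_t w[4];
	unsigned int off = prefix_len >> 5;	/* index of first 32-bit word */
	unsigned int bits = prefix_len & 0x1f;	/* bit offset inside that word */
	uint32_t v4;

	if (prefix_len > 96)
		return 0;

	memcpy(w, v6dst->s6_addr, sizeof(w));
	v4 = ntohl(w[off]) << bits;
	if (bits)
		v4 |= ntohl(w[off + 1]) >> (32 - bits);
	return v4;
}
```

For example, assuming a /32 prefix of 2001:db8::, the destination
2001:db8:c000:201:: would yield 0xc0000201, i.e. 192.0.2.1.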


^ permalink raw reply related

* Resend: [PATCH] TCP Early Retransmit: reduce required dupacks for triggering fast retrans
From: Christian Samsel @ 2009-09-22  8:59 UTC (permalink / raw)
  To: netdev

This patch implements draft-ietf-tcpm-early-rexmt. The early retransmit 
mechanism allows the transport to reduce the number of duplicate
acknowledgments required to trigger a fast retransmission in case we
don't expect enough dupacks (e.g. because there are not enough
packets in flight and nothing to send). This allows the transport to use
fast retransmit to recover packet losses that would otherwise require
a lengthy retransmission timeout.

See: http://tools.ietf.org/html/draft-ietf-tcpm-early-rexmt-01

Signed-off-by: Christian Samsel <christian.samsel@rwth-aachen.de>

---
 net/ipv4/tcp_input.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index af6d6fa..c0cc4fd 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2913,6 +2913,7 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
  int do_lost = is_dupack || ((flag & FLAG_DATA_SACKED) &&
                                     (tcp_fackets_out(tp) > tp->reordering));
  int fast_rexmit = 0, mib_idx;
+ u32 in_flight;
 
  if (WARN_ON(!tp->packets_out && tp->sacked_out))
          tp->sacked_out = 0;
@@ -3062,6 +3063,21 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
  if (do_lost || (tcp_is_fack(tp) && tcp_head_timedout(sk)))
          tcp_update_scoreboard(sk, fast_rexmit);
  tcp_cwnd_down(sk, flag);
+       
+
+ /* draft-ietf-tcpm-early-rexmt: lowers dup ack threshold to prevent rto
+         * in case we don't expect enough dup ack. if number of outstanding
+         * packets is less than four and there is either no unsent data ready
+         * for transmission or the advertised window does not permit new
+         * segments.
+         */
+ in_flight = tcp_packets_in_flight(tp);
+ if ( in_flight < 4 && (skb_queue_empty(&sk->sk_write_queue) ||
+         tcp_may_send_now(sk) == 0) )
+         tp->reordering = in_flight - 1;
+ else if (tp->reordering != sysctl_tcp_reordering)
+         tp->reordering = sysctl_tcp_reordering;
+
  tcp_xmit_retransmit_queue(sk);
 }
 
-- 
1.6.4.1
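
The threshold rule described above can be sketched in isolation as a
pure function.  This is a userspace illustration, not the patch itself:
the name early_rexmit_dupthresh() and its parameters are hypothetical,
and the lower bound of 1 is a guard added by this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return the duplicate-ACK threshold to use.
 *   in_flight:      number of outstanding segments
 *   can_send_new:   true if unsent data exists and the window allows it
 *   default_thresh: the normal threshold (e.g. 3)
 */
static uint32_t early_rexmit_dupthresh(uint32_t in_flight, bool can_send_new,
				       uint32_t default_thresh)
{
	/* Enough segments outstanding, or new data will generate more
	 * ACKs: keep the normal threshold. */
	if (in_flight >= 4 || can_send_new)
		return default_thresh;

	/* Fewer than four segments outstanding and nothing new to send:
	 * lower the threshold to in_flight - 1.  The clamp to 1 avoids
	 * an unsigned underflow and is an assumption of this sketch. */
	return in_flight > 1 ? in_flight - 1 : 1;
}
```

With three segments in flight and nothing new to send, the threshold
drops from 3 to 2; with five in flight it stays at the default.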


^ permalink raw reply related

* Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Avi Kivity @ 2009-09-22  9:43 UTC (permalink / raw)
  To: Ira W. Snyder
  Cc: Gregory Haskins, Michael S. Tsirkin, netdev, virtualization, kvm,
	linux-kernel, mingo, linux-mm, akpm, hpa, Rusty Russell, s.hetze,
	alacrityvm-devel
In-Reply-To: <20090921214312.GJ7182@ovro.caltech.edu>

On 09/22/2009 12:43 AM, Ira W. Snyder wrote:
>
>> Sure, virtio-ira and he is on his own to make a bus-model under that, or
>> virtio-vbus + vbus-ira-connector to use the vbus framework.  Either
>> model can work, I agree.
>>
>>      
> Yes, I'm having to create my own bus model, a-la lguest, virtio-pci, and
> virtio-s390. It isn't especially easy. I can steal lots of code from the
> lguest bus model, but sometimes it is good to generalize, especially
> after the fourth implementation or so. I think this is what GHaskins tried
> to do.
>    

Yes.  vbus is more finely layered so there is less code duplication.

The virtio layering was more or less dictated by Xen which doesn't have 
shared memory (it uses grant references instead).  As a matter of fact 
lguest, kvm/pci, and kvm/s390 all have shared memory, as you do, so that 
part is duplicated.  It's probably possible to add a virtio-shmem.ko 
library that people who do have shared memory can reuse.

> I've given it some thought, and I think that running vhost-net (or
> similar) on the ppc boards, with virtio-net on the x86 crate server will
> work. The virtio-ring abstraction is almost good enough to work for this
> situation, but I had to re-invent it to work with my boards.
>
> I've exposed a 16K region of memory as PCI BAR1 from my ppc board.
> Remember that this is the "host" system. I used each 4K block as a
> "device descriptor" which contains:
>
> 1) the type of device, config space, etc. for virtio
> 2) the "desc" table (virtio memory descriptors, see virtio-ring)
> 3) the "avail" table (available entries in the desc table)
>    

Won't access to this memory from x86 be slow?  (On the other hand, if
you change it to main memory, access from ppc will be slow...  It really
depends on how your system is tuned.)

> Parts 2 and 3 are repeated three times, to allow for a maximum of three
> virtqueues per device. This is good enough for all current drivers.
>    

The plan is to switch to multiqueue soon.  Will not affect you if your 
boards are uniprocessor or small smp.

> I've gotten plenty of email about this from lots of interested
> developers. There are people who would like this kind of system to just
> work, while having to write just some glue for their device, just like a
> network driver. My hunch is that most people have created some
> proprietary mess that basically works, and left it at that.
>    

So long as you keep the system-dependent features hookable or 
configurable, it should work.

> So, here is a desperate cry for help. I'd like to make this work, and
> I'd really like to see it in mainline. I'm trying to give back to the
> community from which I've taken plenty.
>    

Not sure who you're crying for help to.  Once you get this working, post 
patches.  If the patches are reasonably clean and don't impact 
performance for the main use case, and if you can show the need, I 
expect they'll be merged.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM
From: Michael S. Tsirkin @ 2009-09-22 10:38 UTC (permalink / raw)
  To: Chris Wright
  Cc: Stephen Hemminger, Rusty Russell, virtualization, Xin, Xiaohui,
	kvm@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, hpa@zytor.com,
	mingo@elte.hu, akpm@linux-foundation.org
In-Reply-To: <20090921162718.GM26034@sequoia.sous-sol.org>

On Mon, Sep 21, 2009 at 09:27:18AM -0700, Chris Wright wrote:
> * Stephen Hemminger (shemminger@vyatta.com) wrote:
> > On Mon, 21 Sep 2009 16:37:22 +0930
> > Rusty Russell <rusty@rustcorp.com.au> wrote:
> > 
> > > > > Actually this framework can apply to traditional network adapters which have
> > > > > just one tx/rx queue pair. And applications using the same user/kernel interface
> > > > > can utilize this framework to send/receive network traffic directly thru a tx/rx
> > > > > queue pair in a network adapter.
> > > > > 
> > 
> > More importantly, when virtualization is used with multi-queue
> > NICs the virtio-net NIC is a single CPU bottleneck. The virtio-net
> > NIC should preserve the parallelism (lock free) using multiple
> > receive/transmit queues. The number of queues should equal the
> > number of CPUs.
> 
> Yup, multiqueue virtio is on todo list ;-)
> 
> thanks,
> -chris

Note we'll need multiqueue tap for that to help.

-- 
MST

^ permalink raw reply

* [PATCH] drivers/net/wireless: Use usb_endpoint_dir_out
From: Julia Lawall @ 2009-09-22 11:45 UTC (permalink / raw)
  To: John W. Linville, Ulrich Kunitz, Daniel Drake, linux-wireless,
	netdev, linux-kernel

From: Julia Lawall <julia@diku.dk>

Use the usb_endpoint_dir_out API function.  Note that the use of
USB_TYPE_MASK in the original code is incorrect; that mask selects the
request-type bits rather than the endpoint direction bit, so the test
does not actually check the direction.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
struct usb_endpoint_descriptor *endpoint;
expression E;
@@

- (endpoint->bEndpointAddress & E) == USB_DIR_OUT
+ usb_endpoint_dir_out(endpoint)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 drivers/net/wireless/zd1211rw/zd_usb.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -u -p a/drivers/net/wireless/zd1211rw/zd_usb.c b/drivers/net/wireless/zd1211rw/zd_usb.c
--- a/drivers/net/wireless/zd1211rw/zd_usb.c
+++ b/drivers/net/wireless/zd1211rw/zd_usb.c
@@ -1070,7 +1070,7 @@ static int eject_installer(struct usb_in
 
 	/* Find bulk out endpoint */
 	endpoint = &iface_desc->endpoint[1].desc;
-	if ((endpoint->bEndpointAddress & USB_TYPE_MASK) == USB_DIR_OUT &&
+	if (usb_endpoint_dir_out(endpoint) &&
 	    usb_endpoint_xfer_bulk(endpoint)) {
 		bulk_out_ep = endpoint->bEndpointAddress;
 	} else {
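
As an illustration of why the mask matters, the sketch below compares
the old expression with a direction-bit test in plain userspace C.  The
constant values are reproduced from memory of the kernel's USB chapter 9
header and should be checked there; the SK_ names and both helper
functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, to be verified against the kernel's ch9 header. */
enum {
	SK_USB_DIR_OUT           = 0x00,
	SK_USB_DIR_IN            = 0x80,
	SK_USB_ENDPOINT_DIR_MASK = 0x80,	/* bit 7 of bEndpointAddress */
	SK_USB_TYPE_MASK         = 0x03 << 5,	/* request-type bits */
};

/* The expression removed by the patch. */
static bool old_test(uint8_t addr)
{
	return (addr & SK_USB_TYPE_MASK) == SK_USB_DIR_OUT;
}

/* What a direction check needs to look at. */
static bool dir_out(uint8_t addr)
{
	return (addr & SK_USB_ENDPOINT_DIR_MASK) == SK_USB_DIR_OUT;
}
```

With these values the old expression gives the same answer for an IN
address such as 0x81 and an OUT address such as 0x02, whereas the
direction-bit test tells them apart.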

^ permalink raw reply

* Re: r8169 64-bit DMA support
From: Francois Romieu @ 2009-09-22 11:53 UTC (permalink / raw)
  To: Robert Hancock; +Cc: netdev
In-Reply-To: <4AB6BCEC.3070001@gmail.com>

Robert Hancock <hancockrwd@gmail.com> :
[...]
> It's not clear (from the mails I've read) exactly what was going on in  
> the case that caused this to be added.

Some AMD + r8169 systems simply did not work.

> Normally these days the PCI subsystem is supposed to detect that DAC
> isn't usable on a machine and refuse setting 64-bit DMA masks, it's
> not the driver's responsibility to handle this.
> I'm guessing that when this change was made that detection didn't exist
> though.

Not exactly. DAC had to be explicitly enabled through the CPlusCmd
register.

> Thoughts on whether this default can be changed now ?

The 8168 does not seem to need the CPlusCmd stuff. I'll check it but it
should be possible to enable high DMA without condition for it.

-- 
Ueimor

^ permalink raw reply

* Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM
From: Arnd Bergmann @ 2009-09-22 11:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Chris Wright, Stephen Hemminger, Rusty Russell, virtualization,
	Xin, Xiaohui, kvm@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, hpa@zytor.com,
	mingo@elte.hu, akpm@linux-foundation.org
In-Reply-To: <20090922103807.GA2555@redhat.com>

On Tuesday 22 September 2009, Michael S. Tsirkin wrote:
> > > More importantly, when virtualizations is used with multi-queue
> > > NIC's the virtio-net NIC is a single CPU bottleneck. The virtio-net
> > > NIC should preserve the parallelism (lock free) using multiple
> > > receive/transmit queues. The number of queues should equal the
> > > number of CPUs.
> > 
> > Yup, multiqueue virtio is on todo list ;-)
> > 
> 
> Note we'll need multiqueue tap for that to help.

My idea for that was to open multiple file descriptors to the same
macvtap device and let the kernel figure out the right thing to
do with that. You can do the same with raw packet sockets in case
of vhost_net, but I wouldn't want to add more complexity to the
tun/tap driver for this.

	Arnd <><


^ permalink raw reply

* [PATCH] skge: request IRQ on activating the interface
From: Michal Schmidt @ 2009-09-22 12:01 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger

skge requests IRQ in its probe function. This causes a problem in
the following real-life scenario with two different NICs in the machine:

1. modprobe skge
   The card is detected as eth0 and requests IRQ 17. Directory
   /proc/irq/17/eth0 is created.
2. There is a udev rule which says this interface should be called
   eth1, so udev renames eth0 -> eth1.
3. modprobe 8139too
   The Realtek card is detected as eth0. It will be using IRQ 17 too.
4. ip link set eth0 up
   Now 8139too requests IRQ 17.

The result is:
WARNING: at fs/proc/generic.c:590 proc_register ...
proc_dir_entry '17/eth0' already registered
...

And "ls /proc/irq/17" shows two subdirectories, both called eth0.

Fix it by requesting the IRQ in skge when the interface is activated.
This works, because interfaces can be renamed only while they are down.
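
The ordering problem can be replayed with a toy user-space model. This is
not kernel code: struct toy_irq, toy_request_irq() and toy_scenario() are
hypothetical names, and the model simply assumes that the name handed over
at request time is recorded as a snapshot and never updated by a later
rename. The real kernel warns and still creates the second entry; the toy
returns -1 instead so the clash is easy to check.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_ACTIONS 4
#define TOY_NAME_LEN    16

/* Toy stand-in for the handler names listed under /proc/irq/N/. */
struct toy_irq {
	char names[TOY_MAX_ACTIONS][TOY_NAME_LEN];
	int count;
};

/* Record a snapshot of the name. Returns -1 if that name is already
 * registered on this IRQ (the clash behind the warning), -2 if full. */
static int toy_request_irq(struct toy_irq *irq, const char *name)
{
	int i;

	assert(irq->count >= 0 && irq->count <= TOY_MAX_ACTIONS);
	for (i = 0; i < irq->count; i++)
		if (strcmp(irq->names[i], name) == 0)
			return -1;
	if (irq->count == TOY_MAX_ACTIONS)
		return -2;
	strncpy(irq->names[irq->count], name, TOY_NAME_LEN - 1);
	irq->names[irq->count][TOY_NAME_LEN - 1] = '\0';
	irq->count++;
	return 0;
}

/* Replay the four steps; returns the result of the 8139too request. */
static int toy_scenario(int skge_requests_in_probe)
{
	struct toy_irq irq17 = { .count = 0 };
	char skge_name[TOY_NAME_LEN] = "eth0";

	if (skge_requests_in_probe)
		toy_request_irq(&irq17, skge_name);	/* step 1: snapshot "eth0" */
	strcpy(skge_name, "eth1");			/* step 2: udev rename */
	if (!skge_requests_in_probe)
		toy_request_irq(&irq17, skge_name);	/* patched: at ifup, "eth1" */
	return toy_request_irq(&irq17, "eth0");	/* steps 3-4: 8139too */
}
```

With the request made at probe time, toy_scenario(1) reports the clash;
with the request deferred until the interface is brought up,
toy_scenario(0) succeeds because the snapshot is taken after the rename.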

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
---

 drivers/net/skge.c |   27 +++++++++++++++------------
 1 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index 62e852e..7e90f27 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -105,6 +105,7 @@ static void yukon_init(struct skge_hw *hw, int port);
 static void genesis_mac_init(struct skge_hw *hw, int port);
 static void genesis_link_up(struct skge_port *skge);
 static void skge_set_multicast(struct net_device *dev);
+static irqreturn_t skge_intr(int irq, void *dev_id);
 
 /* Avoid conditionals by using array */
 static const int txqaddr[] = { Q_XA1, Q_XA2 };
@@ -2572,18 +2573,26 @@ static int skge_up(struct net_device *dev)
 	if (netif_msg_ifup(skge))
 		printk(KERN_INFO PFX "%s: enabling interface\n", dev->name);
 
+	err = request_irq(dev->irq, skge_intr, IRQF_SHARED, dev->name, hw);
+	if (err) {
+		dev_err(&hw->pdev->dev, "%s: cannot assign irq %d\n",
+			dev->name, dev->irq);
+		return err;
+	}
+
 	if (dev->mtu > RX_BUF_SIZE)
 		skge->rx_buf_size = dev->mtu + ETH_HLEN;
 	else
 		skge->rx_buf_size = RX_BUF_SIZE;
 
-
 	rx_size = skge->rx_ring.count * sizeof(struct skge_rx_desc);
 	tx_size = skge->tx_ring.count * sizeof(struct skge_tx_desc);
 	skge->mem_size = tx_size + rx_size;
 	skge->mem = pci_alloc_consistent(hw->pdev, skge->mem_size, &skge->dma);
-	if (!skge->mem)
-		return -ENOMEM;
+	if (!skge->mem) {
+		err = -ENOMEM;
+		goto free_irq;
+	}
 
 	BUG_ON(skge->dma & 7);
 
@@ -2646,6 +2655,8 @@ static int skge_up(struct net_device *dev)
  free_pci_mem:
 	pci_free_consistent(hw->pdev, skge->mem_size, skge->mem, skge->dma);
 	skge->mem = NULL;
+ free_irq:
+	free_irq(dev->irq, hw);
 
 	return err;
 }
@@ -2733,6 +2744,7 @@ static int skge_down(struct net_device *dev)
 	kfree(skge->tx_ring.start);
 	pci_free_consistent(hw->pdev, skge->mem_size, skge->mem, skge->dma);
 	skge->mem = NULL;
+	free_irq(dev->irq, hw);
 	return 0;
 }
 
@@ -3974,12 +3986,6 @@ static int __devinit skge_probe(struct pci_dev *pdev,
 		goto err_out_free_netdev;
 	}
 
-	err = request_irq(pdev->irq, skge_intr, IRQF_SHARED, dev->name, hw);
-	if (err) {
-		dev_err(&pdev->dev, "%s: cannot assign irq %d\n",
-		       dev->name, pdev->irq);
-		goto err_out_unregister;
-	}
 	skge_show_addr(dev);
 
 	if (hw->ports > 1 && (dev1 = skge_devinit(hw, 1, using_dac))) {
@@ -3996,8 +4002,6 @@ static int __devinit skge_probe(struct pci_dev *pdev,
 
 	return 0;
 
-err_out_unregister:
-	unregister_netdev(dev);
 err_out_free_netdev:
 	free_netdev(dev);
 err_out_led_off:
@@ -4041,7 +4045,6 @@ static void __devexit skge_remove(struct pci_dev *pdev)
 	skge_write16(hw, B0_LED, LED_STAT_OFF);
 	skge_write8(hw, B0_CTST, CS_RST_SET);
 
-	free_irq(pdev->irq, hw);
 	pci_release_regions(pdev);
 	pci_disable_device(pdev);
 	if (dev1)


^ permalink raw reply related

* [PATCH] 8139cp: fix duplicate loglevel in module load message
From: Alan Jenkins @ 2009-09-22 14:05 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-kernel, Alexander Beregalov

This was introduced by b93d58 "8139*: convert printk() to pr_<foo>()":

[ 2256252443 ] <6>8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)

The "version" string is printed using pr_info(), so it doesn't need to
include a loglevel.
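
A small user-space sketch of why the prefix shows up twice. It assumes the
string is printed as pr_info("%s", version) and models only two things:
pr_info() prepending its own level, and the log code consuming a single
leading level marker. The EX_ macros and toy_* helpers are hypothetical
and the version text is shortened.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EX_KERN_INFO "<6>"	/* textual loglevel prefix, as in the log line above */
#define EX_DRV_NAME  "8139cp"

/* Before the patch: the string carries its own loglevel. */
static const char ex_old_version[] =
	EX_KERN_INFO EX_DRV_NAME ": 10/100 PCI Ethernet driver\n";

/* After the patch: plain text, the level comes from pr_info(). */
static const char ex_new_version[] =
	EX_DRV_NAME ": 10/100 PCI Ethernet driver\n";

/* Toy pr_info(): always prepends the info level to the message. */
static int toy_pr_info(char *buf, size_t len, const char *msg)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s%s", EX_KERN_INFO, msg);
}

/* Toy log reader: strips one leading level marker, nothing more. */
static const char *toy_visible_text(const char *record)
{
	size_t n = strlen(EX_KERN_INFO);

	return strncmp(record, EX_KERN_INFO, n) == 0 ? record + n : record;
}
```

Feeding ex_old_version through toy_pr_info() and toy_visible_text()
leaves a literal "<6>" at the start of the visible text, as in the log
line quoted above; ex_new_version comes out clean.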

Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
CC: Alexander Beregalov <a.beregalov@gmail.com>
---
 drivers/net/8139cp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/8139cp.c b/drivers/net/8139cp.c
index d0dbbf3..6841a9a 100644
--- a/drivers/net/8139cp.c
+++ b/drivers/net/8139cp.c
@@ -87,7 +87,7 @@
 
 /* These identify the driver base version and may not be removed. */
 static char version[] =
-KERN_INFO DRV_NAME ": 10/100 PCI Ethernet driver v" DRV_VERSION " (" DRV_RELDATE ")\n";
+DRV_NAME ": 10/100 PCI Ethernet driver v" DRV_VERSION " (" DRV_RELDATE ")\n";
 
 MODULE_AUTHOR("Jeff Garzik <jgarzik@pobox.com>");
 MODULE_DESCRIPTION("RealTek RTL-8139C+ series 10/100 PCI Ethernet driver");
-- 
1.6.3.2

^ permalink raw reply related
