Netdev List

* Re: [PATCH 1/3] drivers/staging/rtl8187se: Don't pass huge struct by value
From: Stephen Rothwell @ 2011-08-14 23:58 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Greg Kroah-Hartman, devel, linux-kernel, Andrea Merello,
	Andre Nogueira, Lucas De Marchi, David S. Miller, Larry Finger,
	Stefan Weil, Ilia Mirkin, netdev
In-Reply-To: <alpine.LNX.2.00.1108142031330.14271@swampdragon.chaosbits.net>

Hi Jesper,

On Sun, 14 Aug 2011 20:32:49 +0200 (CEST) Jesper Juhl <jj@chaosbits.net> wrote:
>
> the "static inline" and defined in the header bits I didn't do because I 
> was afraid that that was not valid in combination with EXPORT_SYMBOL().

You don't need to/cannot EXPORT_SYMBOL() static inlines in header files.
Any use in modules will pick up the static inline definition from the
header file when they are built.  Unless, of course, someone takes the
addresses of these functions, then you should not inline them.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

^ permalink raw reply
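
The point made in the message above about static inlines can be illustrated with a minimal sketch. It is plain userspace C rather than kernel code, and rtl_clamp, scale_signal and get_clamp are hypothetical names chosen only for this example; the first function stands in for something defined in a shared header.

```c
/*
 * Hypothetical helper that would live in a header.  Each file that
 * includes the header compiles its own private copy, so there is no
 * external symbol left to export.
 */
static inline int rtl_clamp(int val, int lo, int hi)
{
	if (val < lo)
		return lo;
	return val > hi ? hi : val;
}

/* Ordinary call: the compiler is free to expand the body in place. */
static int scale_signal(int raw)
{
	return rtl_clamp(raw, 0, 100);
}

/*
 * Taking the address forces an out-of-line copy in this file.  Code
 * that depends on the function's address is better served by a normal,
 * non-inline function.
 */
typedef int (*clamp_fn)(int, int, int);

static clamp_fn get_clamp(void)
{
	return rtl_clamp;
}
```

Both the direct call and the call through the pointer give the same result; the difference is only in where the code ends up in the object files.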

* Re: [PATCH] virtio-net: Read MAC only after initializing MSI-X
From: Rusty Russell @ 2011-08-15  0:25 UTC (permalink / raw)
  To: Sasha Levin; +Cc: linux-kernel, Michael S. Tsirkin, virtualization, netdev, kvm
In-Reply-To: <1313330252.2422.12.camel@sasha>

On Sun, 14 Aug 2011 16:57:32 +0300, Sasha Levin <levinsasha928@gmail.com> wrote:
> On Sun, 2011-08-14 at 12:23 +0930, Rusty Russell wrote:
> > On Sat, 13 Aug 2011 11:51:01 +0300, Sasha Levin <levinsasha928@gmail.com> wrote:
> > > The MAC of a virtio-net device is located at the first field of the device
> > > specific header. This header is located at offset 20 if the device doesn't
> > > support MSI-X or offset 24 if it does.
> > 
> > Erk.  This means, in general, we have to do virtio_find_single_vq or
> > config->find_vqs before we examine any config options.
> > 
> > Look at virtio_blk, which has the same error.
> > 
> > Solutions in order of best to worst:
> > (1) Enable MSI-X before calling device probe.  This means reserving two
> >     vectors in virtio_pci_probe to ensure we *can* do this, I think.  Michael?
> 
> Do you mean reserving the vectors even before we probed the device for
> MSI-X support? Wouldn't we need 3 vectors then? (config, input, output).

We want three, but *need* two: see vp_find_vqs().  Also, the generic
code doesn't know how many virtqueues we have on the device.

> > (2) Ensure ordering of "find_vqs then access config space" statically.  This
> >     probably means handing the vqs array to virtio_config_val, so no one
> >     can call it before they have their virtqueues.
> 
> Just noticed that only virtio-blk uses virtio_config_val(), while the
> others are still doing 'if(virtio_has_feature()) vdev->config->get()',
> I'll send patches to fix that regardless of what we end up doing here.

Thanks.

> Did you want to pass the vq array to virtio_config_val() just to check
> that they were already found? 

Not if we fix it using method #1...

Thanks,
Rusty.

^ permalink raw reply
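
The ordering problem discussed above comes down to an offset that depends on whether MSI-X is enabled. A minimal sketch, assuming the config region is available as a flat byte array (the names device_cfg_offset and read_mac are hypothetical, and only the offsets 20 and 24 come from the thread):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
	CFG_OFF_NO_MSIX = 20,	/* device-specific header without MSI-X */
	CFG_OFF_MSIX    = 24,	/* device-specific header with MSI-X */
	MAC_LEN         = 6,
};

/* The device-specific header moves once MSI-X has been enabled. */
static size_t device_cfg_offset(bool msix_enabled)
{
	return msix_enabled ? CFG_OFF_MSIX : CFG_OFF_NO_MSIX;
}

/* Copy the MAC out of a snapshot of the config region (hypothetical). */
static void read_mac(const uint8_t *cfg, bool msix_enabled,
		     uint8_t mac[MAC_LEN])
{
	memcpy(mac, cfg + device_cfg_offset(msix_enabled), MAC_LEN);
}
```

Reading with the wrong msix_enabled value returns bytes shifted by four, which is why the MAC must not be read before the MSI-X state is final.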

* Re: [PATCH] tcp: Use LIMIT_NETDEBUG in syn_flood_warning()
From: Tom Herbert @ 2011-08-15  3:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1313129310.2669.19.camel@edumazet-laptop>

> [PATCH] tcp: Use LIMIT_NETDEBUG in syn_flood_warning()
>
> LIMIT_NETDEBUG allows the admin to disable some warning messages :
> echo 0 > /proc/sys/net/core/warnings
>
> Use it to avoid filling syslog on busy servers.
>
> Based on a previous patch from Tom Herbert
>
> Factorize syn_flood_warning() IPv4/IPv6 implementations
>

Acked-by: Tom Herbert <therbert@google.com>

> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Tom Herbert <therbert@google.com>
> ---
>  include/net/tcp.h   |    1 +
>  net/ipv4/tcp_ipv4.c |   14 ++++++--------
>  net/ipv6/tcp_ipv6.c |   17 +----------------
>  3 files changed, 8 insertions(+), 24 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 149a415..964341c 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -460,6 +460,7 @@ extern int tcp_write_wakeup(struct sock *);
>  extern void tcp_send_fin(struct sock *sk);
>  extern void tcp_send_active_reset(struct sock *sk, gfp_t priority);
>  extern int tcp_send_synack(struct sock *);
> +extern void tcp_syn_flood_warning(const struct sk_buff *skb, const char *proto);
>  extern void tcp_push_one(struct sock *, unsigned int mss_now);
>  extern void tcp_send_ack(struct sock *sk);
>  extern void tcp_send_delayed_ack(struct sock *sk);
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 1c12b8e..9e622da 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -808,20 +808,19 @@ static void tcp_v4_reqsk_destructor(struct request_sock *req)
>        kfree(inet_rsk(req)->opt);
>  }
>
> -static void syn_flood_warning(const struct sk_buff *skb)
> +void tcp_syn_flood_warning(const struct sk_buff *skb, const char *proto)
>  {
> -       const char *msg;
> +       const char *msg = "Dropping request";
>
>  #ifdef CONFIG_SYN_COOKIES
>        if (sysctl_tcp_syncookies)
>                msg = "Sending cookies";
> -       else
>  #endif
> -               msg = "Dropping request";
>
> -       pr_info("TCP: Possible SYN flooding on port %d. %s.\n",
> -                               ntohs(tcp_hdr(skb)->dest), msg);
> +       LIMIT_NETDEBUG(KERN_INFO "%s: Possible SYN flooding on port %d. %s.\n",
> +                      proto, ntohs(tcp_hdr(skb)->dest), msg);
>  }
> +EXPORT_SYMBOL(tcp_syn_flood_warning);
>
>  /*
>  * Save and compile IPv4 options into the request_sock if needed.
> @@ -1250,8 +1249,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>         * evidently real one.
>         */
>        if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
> -               if (net_ratelimit())
> -                       syn_flood_warning(skb);
> +               tcp_syn_flood_warning(skb, "TCP");
>  #ifdef CONFIG_SYN_COOKIES
>                if (sysctl_tcp_syncookies) {
>                        want_cookie = 1;
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index d1fb63f..a043386 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -531,20 +531,6 @@ static int tcp_v6_rtx_synack(struct sock *sk, struct request_sock *req,
>        return tcp_v6_send_synack(sk, req, rvp);
>  }
>
> -static inline void syn_flood_warning(struct sk_buff *skb)
> -{
> -#ifdef CONFIG_SYN_COOKIES
> -       if (sysctl_tcp_syncookies)
> -               printk(KERN_INFO
> -                      "TCPv6: Possible SYN flooding on port %d. "
> -                      "Sending cookies.\n", ntohs(tcp_hdr(skb)->dest));
> -       else
> -#endif
> -               printk(KERN_INFO
> -                      "TCPv6: Possible SYN flooding on port %d. "
> -                      "Dropping request.\n", ntohs(tcp_hdr(skb)->dest));
> -}
> -
>  static void tcp_v6_reqsk_destructor(struct request_sock *req)
>  {
>        kfree_skb(inet6_rsk(req)->pktopts);
> @@ -1192,8 +1178,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
>                goto drop;
>
>        if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
> -               if (net_ratelimit())
> -                       syn_flood_warning(skb);
> +               tcp_syn_flood_warning(skb, "TCPv6");
>  #ifdef CONFIG_SYN_COOKIES
>                if (sysctl_tcp_syncookies)
>                        want_cookie = 1;
>
>
>

^ permalink raw reply
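
The behavior the patch above relies on can be modelled in userspace: a message is printed only if an admin-controlled switch is on and a rate limit allows it. This is a rough sketch, not the kernel macro; the struct, the function names and the interval/burst scheme are assumptions made for illustration, suggested by the net_ratelimit() calls the patch removes.

```c
#include <stdio.h>

/* Hypothetical rate limit state: at most 'burst' messages per 'interval'. */
struct ratelimit {
	long interval;	/* window length in arbitrary time units */
	int burst;	/* messages allowed per window */
	int printed;	/* messages already printed in this window */
	long begin;	/* start of the current window */
};

/* Stands in for the admin switch (net.core.warnings in the patch text). */
static int warn_enabled = 1;

static int ratelimit_ok(struct ratelimit *rl, long now)
{
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return 1;
	}
	return 0;
}

/* Returns 1 if the message was printed, 0 if it was suppressed. */
static int limited_warn(struct ratelimit *rl, long now, const char *proto,
			int port, const char *msg)
{
	if (!warn_enabled || !ratelimit_ok(rl, now))
		return 0;
	fprintf(stderr, "%s: Possible SYN flooding on port %d. %s.\n",
		proto, port, msg);
	return 1;
}
```

Checking the switch first means a disabled warning does not consume rate limit budget.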

* [PATCH 06/11] drivers/net/bna: do not use EXTRA_CFLAGS
From: Arnaud Lacombe @ 2011-08-15  5:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arnaud Lacombe, Sam Ravnborg, Rasesh Mody, netdev
In-Reply-To: <1313384834-24433-1-git-send-email-lacombar@gmail.com>

Usage of these flags has been deprecated for nearly 4 years by:

    commit f77bf01425b11947eeb3b5b54685212c302741b8
    Author: Sam Ravnborg <sam@neptun.(none)>
    Date:   Mon Oct 15 22:25:06 2007 +0200

        kbuild: introduce ccflags-y, asflags-y and ldflags-y

Moreover, these flags (at least EXTRA_CFLAGS) have been documented for command
line use. By default, gmake(1) does not override command-line settings, so this is
likely to result in build failures or unexpected behavior.

Replace their usage by Kbuild's `{as,cc,ld}flags-y'.

Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Rasesh Mody <rmody@brocade.com>
Cc: netdev@vger.kernel.org
---
 drivers/net/bna/Makefile |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bna/Makefile b/drivers/net/bna/Makefile
index a5d604d..babc25f 100644
--- a/drivers/net/bna/Makefile
+++ b/drivers/net/bna/Makefile
@@ -8,4 +8,4 @@ obj-$(CONFIG_BNA) += bna.o
 bna-objs := bnad.o bnad_ethtool.o bna_ctrl.o bna_txrx.o
 bna-objs += bfa_ioc.o bfa_ioc_ct.o bfa_cee.o cna_fwimg.o
 
-EXTRA_CFLAGS := -Idrivers/net/bna
+ccflags-y := -Idrivers/net/bna
-- 
1.7.6.153.g78432

^ permalink raw reply related

* Re: [PATCH 06/11] drivers/net/bna: do not use EXTRA_CFLAGS
From: Arnaud Lacombe @ 2011-08-15  5:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arnaud Lacombe, Sam Ravnborg, Rasesh Mody, netdev
In-Reply-To: <1313384834-24433-7-git-send-email-lacombar@gmail.com>

Hi,

On Mon, Aug 15, 2011 at 1:07 AM, Arnaud Lacombe <lacombar@gmail.com> wrote:
> Usage of these flags has been deprecated for nearly 4 years by:
>
>    commit f77bf01425b11947eeb3b5b54685212c302741b8
>    Author: Sam Ravnborg <sam@neptun.(none)>
>    Date:   Mon Oct 15 22:25:06 2007 +0200
>
>        kbuild: introduce ccflags-y, asflags-y and ldflags-y
>
> Moreover, these flags (at least EXTRA_CFLAGS) have been documented for command
> line use. By default, gmake(1) does not override command-line settings, so this is
> likely to result in build failures or unexpected behavior.
>
> Replace their usage by Kbuild's `{as,cc,ld}flags-y'.
>
> Cc: Sam Ravnborg <sam@ravnborg.org>
> Cc: Rasesh Mody <rmody@brocade.com>
> Cc: netdev@vger.kernel.org
Signed-off-by: Arnaud Lacombe <lacombar@gmail.com>

 - Arnaud

> ---
>  drivers/net/bna/Makefile |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/bna/Makefile b/drivers/net/bna/Makefile
> index a5d604d..babc25f 100644
> --- a/drivers/net/bna/Makefile
> +++ b/drivers/net/bna/Makefile
> @@ -8,4 +8,4 @@ obj-$(CONFIG_BNA) += bna.o
>  bna-objs := bnad.o bnad_ethtool.o bna_ctrl.o bna_txrx.o
>  bna-objs += bfa_ioc.o bfa_ioc_ct.o bfa_cee.o cna_fwimg.o
>
> -EXTRA_CFLAGS := -Idrivers/net/bna
> +ccflags-y := -Idrivers/net/bna
> --
> 1.7.6.153.g78432
>
>

^ permalink raw reply

* linux-next: build failure after merge of the final tree (net tree related)
From: Stephen Rothwell @ 2011-08-15  5:20 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: linux-next, linux-kernel, Jeff Kirsher

Hi Dave,

After merging the final tree, today's linux-next build (powerpc
allyesconfig) failed like this:

make[5]: *** No rule to make target `drivers/net/ethernet/toshiba/ethernet/sun/sungem_phy.o', needed by `drivers/net/ethernet/toshiba/built-in.o'.
In file included from drivers/net/ethernet/toshiba/spider_net_ethtool.c:28:0:
drivers/net/ethernet/toshiba/spider_net.h:30:39: fatal error: ./ethernet/sun/sungem_phy.h: No such file or directory
In file included from drivers/net/ethernet/toshiba/spider_net.c:54:0:
drivers/net/ethernet/toshiba/spider_net.h:30:39: fatal error: ./ethernet/sun/sungem_phy.h: No such file or directory

Caused by commit 8df158ac36fa ("toshiba: Move the Toshiba drivers") or
the surrounding commits.

I have just left this failure for today.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

^ permalink raw reply

* [PATCH] net: configurable sysctl parameter "net.core.tcp_lowat" for sk_stream_min_wspace()
From: Jun.Kondo @ 2011-08-15  5:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: omega-g1@ctc-g.co.jp, notsuki, Kozaki, Motokazu, Hajime Taira,
	netdev, TomohikoTAKAHASHI, Kotaro Sakai, ken sugawara

CTC has the following requirements:

1. to ensure high throughput from the beginning of
tcp connection at normal times by acquiring large
default transmission buffer value

2. to limit the block time of the write in order to
prevent the timeout of upper layer applications
even when the connection has low throughput, such
as low rate streaming


The root of the issue:

Requirement 2 cannot be achieved with a configuration that
satisfies requirement 1.

The current behavior is as follows:

A write blocks when the tcp transmission buffer (wmem)
becomes full.
To write again after that, free space of at least
sk_wmem_queued/2 (about one third of the transmission
buffer) must become available.

When the throughput is low, a timeout occurs before that
much buffer space has been freed, which affects the
streaming service.


The effect of the patch:

Setting the sysctl variable "sysctl_tcp_lowat" to an
integer (e.g. 4) changes the required free space to
"sk_wmem_queued >> 4", so the timeout will not occur in
a low throughput network environment.

Also, we think the fixed threshold of sk_wmem_queued/2
(about one third of the transmission buffer) is too
rigid, and it should be configurable.

--------------------------------------------------
--- linux-mainline/include/net/sock.h.orig	2011-07-27 14:26:43.000000000 +0900
+++ linux-mainline/include/net/sock.h	2011-08-15 11:40:20.000000000 +0900
@@ -604,9 +604,11 @@ static inline int sk_acceptq_is_full(str
 /*
  * Compute minimal free write space needed to queue new packets.
  */
+extern __u32 sysctl_tcp_lowat;
+
 static inline int sk_stream_min_wspace(struct sock *sk)
 {
-	return sk->sk_wmem_queued >> 1;
+	return sk->sk_wmem_queued >> sysctl_tcp_lowat;
 }
 
 static inline int sk_stream_wspace(struct sock *sk)
--- linux-mainline/net/core/sock.c.orig	2011-07-24 05:04:06.000000000 +0900
+++ linux-mainline/net/core/sock.c	2011-08-15 11:34:27.000000000 +0900
@@ -217,6 +217,9 @@ __u32 sysctl_rmem_max __read_mostly = SK
 __u32 sysctl_wmem_default __read_mostly = SK_WMEM_MAX;
 __u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX;
 
+__u32 sysctl_tcp_lowat = 1;
+EXPORT_SYMBOL(sysctl_tcp_lowat);
+
 /* Maximal space eaten by iovec or ancillary data plus some space */
 int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*(2*UIO_MAXIOV+512);
 EXPORT_SYMBOL(sysctl_optmem_max);
@@ -1330,6 +1333,8 @@ void __init sk_init(void)
 		sysctl_wmem_max = 131071;
 		sysctl_rmem_max = 131071;
 	}
+
+	sysctl_tcp_lowat = 1;
 }
 
 /*
--- linux-mainline/net/core/sysctl_net_core.c.orig	2011-05-29 06:01:16.000000000 +0900
+++ linux-mainline/net/core/sysctl_net_core.c	2011-08-15 11:05:38.000000000 +0900
@@ -168,6 +168,13 @@ static struct ctl_table net_core_table[]
 		.proc_handler	= rps_sock_flow_sysctl
 	},
 #endif
+	{
+		.procname	= "tcp_lowat",
+		.data		= &sysctl_tcp_lowat,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec
+	},
 #endif /* CONFIG_NET */
 	{
 		.procname	= "netdev_budget",

--------------------------------------------------

------------------------------------------
Jun.Kondo
ITOCHU TECHNO-SOLUTIONS Corporation(CTC)
tel:+81-3-6238-6607
fax:+81-3-5226-2369
------------------------------------------

^ permalink raw reply
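
The threshold arithmetic in the proposal above can be checked with a small userspace model. It assumes a blocked writer is woken once the free space (send buffer size minus queued bytes) reaches queued >> shift; the struct and function names are hypothetical and only the shift expression comes from the patch.

```c
#include <stdbool.h>

struct tx_buf {
	int sndbuf;	/* total send buffer size */
	int queued;	/* bytes currently queued (sk_wmem_queued) */
};

/* Minimal free space required before a writer may continue. */
static int min_wspace(const struct tx_buf *b, unsigned int shift)
{
	return b->queued >> shift;
}

/* Free space currently available in the send buffer. */
static int wspace(const struct tx_buf *b)
{
	return b->sndbuf - b->queued;
}

/* Assumed wakeup condition: enough free space for the chosen shift. */
static bool writer_may_wake(const struct tx_buf *b, unsigned int shift)
{
	return wspace(b) >= min_wspace(b, shift);
}
```

With shift 1 the writer resumes only when the queue has drained to two thirds of the buffer, that is, when one third is free; a larger shift lowers that bar.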

* Re: iproute2: make arpd daemon write pid file on fork
From: Alex Dubov @ 2011-08-15  5:42 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <20110812091547.17c40798@nehalam.ftrdhcpuser.net>

----- Original Message -----

>
>>  The included patch makes arpd write its own pid file after fork, in a common
>>  LSB fashion, so as to better inter-operate with start up scripts. Removal of
>>  stale pid files is handled elsewhere.
>
> I already checked in a version which has the -p pidfile option.
>

Thanks for that.
It will be enough for arpd to fit with the rest of init scripts.


^ permalink raw reply

* [PATCH v2 0/4] rps: Look into tunnels to get hash
From: Tom Herbert @ 2011-08-15  5:44 UTC (permalink / raw)
  To: davem, netdev

In this version fixed calls to sock_rps_save_rxhash in IPv6 with correct
arguments and addressed comments from Eric Dumazet.

The patches in this series are to look into encapsulated packets
to compute the rx hash for RPS.  Before these patches, all packets
received on the same tunnel would wind up on the same RPS CPU-- this
can lead to very poor loading, and make RFS ineffective on these
packets.

This patch supports getting the rxhash out of a GRE encapsulated packet.

A couple of caveats:

- rxhash should be disabled in device to be able to use this.  I believe
probably all NICs would just provide rxhash on the outer packet
2-tuple.
- The l4_rxhash flag was added so that the hash is preserved across the
tunnel and can be set in flow tables by the transport.  It would be nice
if drivers would set this too, so as to provide more useful information to
the stack (like whether the rxhash should be used in the flow table).
Unfortunately, I don't think all drivers will be able to distinguish
the type of hash (2-tuple, 4-tuple, ...) without looking into the
packet.

^ permalink raw reply

* [PATCH v2 1/4] rps: Some minor cleanup in get_rps_cpus
From: Tom Herbert @ 2011-08-15  5:45 UTC (permalink / raw)
  To: davem, netdev

Use some variables for clarity and extensibility.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/core/dev.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index d22ffd7..6578d94 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2528,15 +2528,17 @@ __u32 __skb_get_rxhash(struct sk_buff *skb)
 	const struct ipv6hdr *ip6;
 	const struct iphdr *ip;
 	u8 ip_proto;
-	u32 addr1, addr2, ihl;
+	u32 addr1, addr2;
+	u16 proto;
 	union {
 		u32 v32;
 		u16 v16[2];
 	} ports;
 
 	nhoff = skb_network_offset(skb);
+	proto = skb->protocol;
 
-	switch (skb->protocol) {
+	switch (proto) {
 	case __constant_htons(ETH_P_IP):
 		if (!pskb_may_pull(skb, sizeof(*ip) + nhoff))
 			goto done;
@@ -2548,7 +2550,7 @@ __u32 __skb_get_rxhash(struct sk_buff *skb)
 			ip_proto = ip->protocol;
 		addr1 = (__force u32) ip->saddr;
 		addr2 = (__force u32) ip->daddr;
-		ihl = ip->ihl;
+		nhoff += ip->ihl * 4;
 		break;
 	case __constant_htons(ETH_P_IPV6):
 		if (!pskb_may_pull(skb, sizeof(*ip6) + nhoff))
@@ -2558,7 +2560,7 @@ __u32 __skb_get_rxhash(struct sk_buff *skb)
 		ip_proto = ip6->nexthdr;
 		addr1 = (__force u32) ip6->saddr.s6_addr32[3];
 		addr2 = (__force u32) ip6->daddr.s6_addr32[3];
-		ihl = (40 >> 2);
+		nhoff += 40;
 		break;
 	default:
 		goto done;
@@ -2567,7 +2569,7 @@ __u32 __skb_get_rxhash(struct sk_buff *skb)
 	ports.v32 = 0;
 	poff = proto_ports_offset(ip_proto);
 	if (poff >= 0) {
-		nhoff += ihl * 4 + poff;
+		nhoff += poff;
 		if (pskb_may_pull(skb, nhoff + 4)) {
 			ports.v32 = * (__force u32 *) (skb->data + nhoff);
 			if (ports.v16[1] < ports.v16[0])
-- 
1.7.3.1

^ permalink raw reply related

* Re: ip_rt_bug: 10.0.0.52 -> 255.255.255.255, ?
From: David Miller @ 2011-08-15  5:45 UTC (permalink / raw)
  To: justinmattock; +Cc: linux-kernel, linux-wireless, netdev
In-Reply-To: <4E4898F5.8050605@gmail.com>


First, please contact netdev@vger.kernel.org for networking issues.

Second, this is fixed already:

commit d547f727df86059104af2234804fdd538e112015
Author: Julian Anastasov <ja@ssi.bg>
Date:   Sun Aug 7 22:20:20 2011 -0700

    ipv4: fix the reusing of routing cache entries
    
    	compare_keys and ip_route_input_common rely on
    rt_oif for distinguishing of input and output routes
    with same keys values. But sometimes the input route has
    also same hash chain (keyed by iif != 0) with the output
    routes (keyed by orig_oif=0). Problem visible if running
    with small number of rhash_entries.
    
    	Fix them to use rt_route_iif instead. By this way
    input route can not be returned to users that request
    output route.
    
    	The patch fixes the ip_rt_bug errors that were
    reported in ip_local_out context, mostly for 255.255.255.255
    destinations.
    
    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index e3dec1c..cb7efe0 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -731,6 +731,7 @@ static inline int compare_keys(struct rtable *rt1, struct rtable *rt2)
 		((__force u32)rt1->rt_key_src ^ (__force u32)rt2->rt_key_src) |
 		(rt1->rt_mark ^ rt2->rt_mark) |
 		(rt1->rt_key_tos ^ rt2->rt_key_tos) |
+		(rt1->rt_route_iif ^ rt2->rt_route_iif) |
 		(rt1->rt_oif ^ rt2->rt_oif) |
 		(rt1->rt_iif ^ rt2->rt_iif)) == 0;
 }
@@ -2321,8 +2322,8 @@ int ip_route_input_common(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		if ((((__force u32)rth->rt_key_dst ^ (__force u32)daddr) |
 		     ((__force u32)rth->rt_key_src ^ (__force u32)saddr) |
 		     (rth->rt_iif ^ iif) |
-		     rth->rt_oif |
 		     (rth->rt_key_tos ^ tos)) == 0 &&
+		    rt_is_input_route(rth) &&
 		    rth->rt_mark == skb->mark &&
 		    net_eq(dev_net(rth->dst.dev), net) &&
 		    !rt_is_expired(rth)) {

^ permalink raw reply related
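
The key comparison in the fix above uses an XOR/OR idiom: every field pair is XORed, the results are ORed together, and the keys match only if the combined value is zero. A simplified sketch with a hypothetical struct (the field set is reduced and the names are not the kernel's):

```c
#include <stdbool.h>
#include <stdint.h>

/* Reduced stand-in for the routing cache key fields. */
struct route_key {
	uint32_t dst;
	uint32_t src;
	uint32_t mark;
	int iif;	/* non-zero for input routes, 0 for output routes */
	int oif;
	uint8_t tos;
};

/* Any differing field leaves at least one bit set in the OR of XORs. */
static bool keys_equal(const struct route_key *a, const struct route_key *b)
{
	return ((a->dst ^ b->dst) |
		(a->src ^ b->src) |
		(a->mark ^ b->mark) |
		(uint32_t)(a->tos ^ b->tos) |
		(uint32_t)(a->iif ^ b->iif) |
		(uint32_t)(a->oif ^ b->oif)) == 0;
}
```

Including the input interface in the comparison is what keeps an input route from matching a lookup for an output route with otherwise identical keys.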

* [PATCH v2 2/4] rps: Add flag to skb to indicate rxhash is based on L4 tuple
From: Tom Herbert @ 2011-08-15  5:45 UTC (permalink / raw)
  To: davem, netdev

The l4_rxhash flag was added to the skb structure to indicate
that the rxhash value was computed over the 4 tuple for the
packet which includes the port information in the encapsulated
transport packet.  This is used by the stack to preserve the
rxhash value in __skb_tunnel_rx.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/skbuff.h |    5 +++--
 include/net/dst.h      |    9 ++++++++-
 include/net/sock.h     |   15 ++++++++++++---
 net/core/dev.c         |   10 ++++++----
 net/core/skbuff.c      |    1 +
 net/ipv4/tcp_ipv4.c    |    6 +++---
 net/ipv4/udp.c         |    4 ++--
 net/ipv6/tcp_ipv6.c    |    6 +++---
 net/ipv6/udp.c         |    2 +-
 9 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7b996ed..f902c33 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -414,6 +414,7 @@ struct sk_buff {
 	__u8			ndisc_nodetype:2;
 #endif
 	__u8			ooo_okay:1;
+	__u8			l4_rxhash:1;
 	kmemcheck_bitfield_end(flags2);
 
 	/* 0/13 bit hole */
@@ -572,11 +573,11 @@ extern unsigned int   skb_find_text(struct sk_buff *skb, unsigned int from,
 				    unsigned int to, struct ts_config *config,
 				    struct ts_state *state);
 
-extern __u32 __skb_get_rxhash(struct sk_buff *skb);
+extern void __skb_get_rxhash(struct sk_buff *skb);
 static inline __u32 skb_get_rxhash(struct sk_buff *skb)
 {
 	if (!skb->rxhash)
-		skb->rxhash = __skb_get_rxhash(skb);
+		__skb_get_rxhash(skb);
 
 	return skb->rxhash;
 }
diff --git a/include/net/dst.h b/include/net/dst.h
index 13d507d..4fb6c43 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -325,7 +325,14 @@ static inline void skb_dst_force(struct sk_buff *skb)
 static inline void __skb_tunnel_rx(struct sk_buff *skb, struct net_device *dev)
 {
 	skb->dev = dev;
-	skb->rxhash = 0;
+
+	/*
+	 * Clear rxhash so that we can recalculate the hash for the
+	 * encapsulated packet, unless we have already determined the hash
+	 * over the L4 4-tuple.
+	 */
+	if (!skb->l4_rxhash)
+		skb->rxhash = 0;
 	skb_set_queue_mapping(skb, 0);
 	skb_dst_drop(skb);
 	nf_reset(skb);
diff --git a/include/net/sock.h b/include/net/sock.h
index 8e4062f..5ac682f 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -686,16 +686,25 @@ static inline void sock_rps_reset_flow(const struct sock *sk)
 #endif
 }
 
-static inline void sock_rps_save_rxhash(struct sock *sk, u32 rxhash)
+static inline void sock_rps_save_rxhash(struct sock *sk,
+					const struct sk_buff *skb)
 {
 #ifdef CONFIG_RPS
-	if (unlikely(sk->sk_rxhash != rxhash)) {
+	if (unlikely(sk->sk_rxhash != skb->rxhash)) {
 		sock_rps_reset_flow(sk);
-		sk->sk_rxhash = rxhash;
+		sk->sk_rxhash = skb->rxhash;
 	}
 #endif
 }
 
+static inline void sock_rps_reset_rxhash(struct sock *sk)
+{
+#ifdef CONFIG_RPS
+	sock_rps_reset_flow(sk);
+	sk->sk_rxhash = 0;
+#endif
+}
+
 #define sk_wait_event(__sk, __timeo, __condition)			\
 	({	int __rc;						\
 		release_sock(__sk);					\
diff --git a/net/core/dev.c b/net/core/dev.c
index 6578d94..e485cb3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2519,10 +2519,11 @@ static inline void ____napi_schedule(struct softnet_data *sd,
 
 /*
  * __skb_get_rxhash: calculate a flow hash based on src/dst addresses
- * and src/dst port numbers. Returns a non-zero hash number on success
- * and 0 on failure.
+ * and src/dst port numbers.  Sets rxhash in skb to non-zero hash value
+ * on success, zero indicates no valid hash.  Also, sets l4_rxhash in skb
+ * if hash is a canonical 4-tuple hash over transport ports.
  */
-__u32 __skb_get_rxhash(struct sk_buff *skb)
+void __skb_get_rxhash(struct sk_buff *skb)
 {
 	int nhoff, hash = 0, poff;
 	const struct ipv6hdr *ip6;
@@ -2574,6 +2575,7 @@ __u32 __skb_get_rxhash(struct sk_buff *skb)
 			ports.v32 = * (__force u32 *) (skb->data + nhoff);
 			if (ports.v16[1] < ports.v16[0])
 				swap(ports.v16[0], ports.v16[1]);
+			skb->l4_rxhash = 1;
 		}
 	}
 
@@ -2586,7 +2588,7 @@ __u32 __skb_get_rxhash(struct sk_buff *skb)
 		hash = 1;
 
 done:
-	return hash;
+	skb->rxhash = hash;
 }
 EXPORT_SYMBOL(__skb_get_rxhash);
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 27002df..edb66f3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -529,6 +529,7 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 	new->mac_header		= old->mac_header;
 	skb_dst_copy(new, old);
 	new->rxhash		= old->rxhash;
+	new->l4_rxhash		= old->l4_rxhash;
 #ifdef CONFIG_XFRM
 	new->sp			= secpath_get(old->sp);
 #endif
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 1c12b8e..b3f2611 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1578,7 +1578,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
 #endif
 
 	if (sk->sk_state == TCP_ESTABLISHED) { /* Fast path */
-		sock_rps_save_rxhash(sk, skb->rxhash);
+		sock_rps_save_rxhash(sk, skb);
 		if (tcp_rcv_established(sk, skb, tcp_hdr(skb), skb->len)) {
 			rsk = sk;
 			goto reset;
@@ -1595,7 +1595,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
 			goto discard;
 
 		if (nsk != sk) {
-			sock_rps_save_rxhash(nsk, skb->rxhash);
+			sock_rps_save_rxhash(nsk, skb);
 			if (tcp_child_process(sk, nsk, skb)) {
 				rsk = nsk;
 				goto reset;
@@ -1603,7 +1603,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
 			return 0;
 		}
 	} else
-		sock_rps_save_rxhash(sk, skb->rxhash);
+		sock_rps_save_rxhash(sk, skb);
 
 	if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb), skb->len)) {
 		rsk = sk;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c1d5fac..ebaa96b 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1267,7 +1267,7 @@ int udp_disconnect(struct sock *sk, int flags)
 	sk->sk_state = TCP_CLOSE;
 	inet->inet_daddr = 0;
 	inet->inet_dport = 0;
-	sock_rps_save_rxhash(sk, 0);
+	sock_rps_reset_rxhash(sk);
 	sk->sk_bound_dev_if = 0;
 	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
 		inet_reset_saddr(sk);
@@ -1355,7 +1355,7 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	int rc;
 
 	if (inet_sk(sk)->inet_daddr)
-		sock_rps_save_rxhash(sk, skb->rxhash);
+		sock_rps_save_rxhash(sk, skb);
 
 	rc = ip_queue_rcv_skb(sk, skb);
 	if (rc < 0) {
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index d1fb63f..44a5859 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1628,7 +1628,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 		opt_skb = skb_clone(skb, GFP_ATOMIC);
 
 	if (sk->sk_state == TCP_ESTABLISHED) { /* Fast path */
-		sock_rps_save_rxhash(sk, skb->rxhash);
+		sock_rps_save_rxhash(sk, skb);
 		if (tcp_rcv_established(sk, skb, tcp_hdr(skb), skb->len))
 			goto reset;
 		if (opt_skb)
@@ -1650,7 +1650,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 		 * the new socket..
 		 */
 		if(nsk != sk) {
-			sock_rps_save_rxhash(nsk, skb->rxhash);
+			sock_rps_save_rxhash(nsk, skb);
 			if (tcp_child_process(sk, nsk, skb))
 				goto reset;
 			if (opt_skb)
@@ -1658,7 +1658,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 			return 0;
 		}
 	} else
-		sock_rps_save_rxhash(sk, skb->rxhash);
+		sock_rps_save_rxhash(sk, skb);
 
 	if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb), skb->len))
 		goto reset;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 97e47f0..35bbdc4 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -509,7 +509,7 @@ int udpv6_queue_rcv_skb(struct sock * sk, struct sk_buff *skb)
 	int is_udplite = IS_UDPLITE(sk);
 
 	if (!ipv6_addr_any(&inet6_sk(sk)->daddr))
-		sock_rps_save_rxhash(sk, skb->rxhash);
+		sock_rps_save_rxhash(sk, skb);
 
 	if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb))
 		goto drop;
-- 
1.7.3.1


^ permalink raw reply related
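
The effect of the new flag on tunnel receive can be shown with a tiny model. The struct below is a hypothetical stand-in for the two sk_buff fields involved, and tunnel_rx mirrors only the rxhash part of the changed function.

```c
#include <stdint.h>

/* Stand-in for the two sk_buff fields touched by the patch. */
struct pkt {
	uint32_t rxhash;
	unsigned int l4_rxhash:1;	/* hash already covers the L4 4-tuple */
};

/*
 * On tunnel receive, drop the hash so it is recomputed on the inner
 * packet, unless it was already computed over the 4-tuple.
 */
static void tunnel_rx(struct pkt *p)
{
	if (!p->l4_rxhash)
		p->rxhash = 0;
}
```

A hash that survives decapsulation keeps the flow on the same CPU before and after the tunnel device.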

* [PATCH v2 3/4] rps: Infrastructure in __skb_get_rxhash for deep inspection
From: Tom Herbert @ 2011-08-15  5:46 UTC (permalink / raw)
  To: davem, netdev

Basics for looking for ports in encapsulated packets in tunnels.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/core/dev.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index e485cb3..4bee9a9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -133,6 +133,7 @@
 #include <linux/pci.h>
 #include <linux/inetdevice.h>
 #include <linux/cpu_rmap.h>
+#include <linux/if_tunnel.h>
 
 #include "net-sysfs.h"
 
@@ -2539,6 +2540,7 @@ void __skb_get_rxhash(struct sk_buff *skb)
 	nhoff = skb_network_offset(skb);
 	proto = skb->protocol;
 
+again:
 	switch (proto) {
 	case __constant_htons(ETH_P_IP):
 		if (!pskb_may_pull(skb, sizeof(*ip) + nhoff))
@@ -2567,6 +2569,11 @@ void __skb_get_rxhash(struct sk_buff *skb)
 		goto done;
 	}
 
+	switch (ip_proto) {
+	default:
+		break;
+	}
+
 	ports.v32 = 0;
 	poff = proto_ports_offset(ip_proto);
 	if (poff >= 0) {
-- 
1.7.3.1


^ permalink raw reply related

* [PATCH v2 4/4] rps: Inspect GRE encapsulated packets to get flow hash
From: Tom Herbert @ 2011-08-15  5:46 UTC (permalink / raw)
  To: davem, netdev

Crack open GRE packets in __skb_get_rxhash to compute a 4-tuple hash on
the encapsulated packet.  Note that this is used only when the
__skb_get_rxhash path is taken, in particular only when the device does
not provide the rxhash (i.e. the feature is disabled).

This was tested by creating a single GRE tunnel between two 16 core
AMD machines.  200 netperf TCP_RR streams were run with 1 byte
request and response size.

Without patch: 157497 tps, 50/90/99% latencies 1250/1292/1364 usecs
With patch: 325896 tps, 50/90/99% latencies 603/848/1169 usecs

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/core/dev.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 4bee9a9..a8d91a5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2570,6 +2570,28 @@ again:
 	}
 
 	switch (ip_proto) {
+	case IPPROTO_GRE:
+		if (pskb_may_pull(skb, nhoff + 16)) {
+			u8 *h = skb->data + nhoff;
+			__be16 flags = *(__be16 *)h;
+
+			/*
+			 * Only look inside GRE if version zero and no
+			 * routing
+			 */
+			if (!(flags & (GRE_VERSION|GRE_ROUTING))) {
+				proto = *(__be16 *)(h + 2);
+				nhoff += 4;
+				if (flags & GRE_CSUM)
+					nhoff += 4;
+				if (flags & GRE_KEY)
+					nhoff += 4;
+				if (flags & GRE_SEQ)
+					nhoff += 4;
+				goto again;
+			}
+		}
+		break;
 	default:
 		break;
 	}
-- 
1.7.3.1
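
The header walk in the patch above can be tried out in user space. The following is a minimal sketch, not the kernel code: the sk_ names are hypothetical, the flag values are written in host byte order following the RFC 2784 / RFC 2890 bit layout, and the buffer is parsed byte by byte instead of through __be16 casts. It returns how far to advance past the GRE header, or -1 when the header should not be looked into (non-zero version or routing present).

```c
#include <stddef.h>
#include <stdint.h>

/* GRE flag bits in host byte order (hypothetical names, RFC bit layout). */
#define SK_GRE_CSUM    0x8000u
#define SK_GRE_ROUTING 0x4000u
#define SK_GRE_KEY     0x2000u
#define SK_GRE_SEQ     0x1000u
#define SK_GRE_VERSION 0x0007u

/*
 * Return the offset of the inner packet relative to the start of the GRE
 * header, or -1 if the header is truncated, is not version zero, or
 * carries routing information. On success *inner_proto holds the
 * EtherType of the payload in host byte order.
 */
static int sk_gre_inner_offset(const uint8_t *h, size_t len,
			       uint16_t *inner_proto)
{
	uint16_t flags;
	int off = 4;			/* flags/version + protocol type */

	if (len < 4)
		return -1;

	flags = (uint16_t)((h[0] << 8) | h[1]);
	if (flags & (SK_GRE_VERSION | SK_GRE_ROUTING))
		return -1;

	/* Each optional field that is present adds one 32-bit word. */
	if (flags & SK_GRE_CSUM)
		off += 4;
	if (flags & SK_GRE_KEY)
		off += 4;
	if (flags & SK_GRE_SEQ)
		off += 4;
	if ((size_t)off > len)
		return -1;

	*inner_proto = (uint16_t)((h[2] << 8) | h[3]);
	return off;
}
```

With all three optional fields present the offset is 16 bytes, which matches the nhoff + 16 that the patch asks pskb_may_pull() for.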


^ permalink raw reply related

* Re: [PATCH] net: configurable sysctl parameter "net.core.tcp_lowat" for sk_stream_min_wspace()
From: David Miller @ 2011-08-15  5:47 UTC (permalink / raw)
  To: jun.kondo
  Cc: linux-kernel, omega-g1, notsuki, motokazu.kozaki, htaira, netdev,
	tomohiko.takahashi, kotaro.sakai, ken.sugawara
In-Reply-To: <4E48B0C3.2010203@ctc-g.co.jp>

From: "Jun.Kondo" <jun.kondo@ctc-g.co.jp>
Date: Mon, 15 Aug 2011 14:38:11 +0900

> 2. to limit the block time of the write in order to
> prevent the timeout of upper layer applications
> even when the connection has low throughput, such
> as low rate streaming

Use non-blocking writes if you want this behavior.
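
For readers who want to see what that looks like from the application side, here is a minimal sketch using only POSIX calls (fcntl, write, poll). The helper names sk_set_nonblocking and sk_write_bounded are hypothetical, and the per-wait timeout policy is an assumption about what the application wants, not something taken from the thread.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int sk_set_nonblocking(int fd)
{
	int fl = fcntl(fd, F_GETFL, 0);

	if (fl < 0)
		return -1;
	return fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Try to write len bytes to a non-blocking fd. Whenever the descriptor
 * is not writable, wait at most timeout_ms for it to become writable.
 * Returns the number of bytes written (possibly fewer than len if a wait
 * timed out), or -1 on a hard error.
 */
static ssize_t sk_write_bounded(int fd, const char *buf, size_t len,
				int timeout_ms)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = write(fd, buf + done, len - done);

		if (n > 0) {
			done += (size_t)n;
			continue;
		}
		if (n < 0 && errno == EINTR)
			continue;
		if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
			return -1;

		/* Would block: wait for POLLOUT, but only for so long. */
		struct pollfd pfd = { .fd = fd, .events = POLLOUT };
		int r = poll(&pfd, 1, timeout_ms);

		if (r < 0 && errno == EINTR)
			continue;
		if (r < 0)
			return -1;
		if (r == 0)
			break;		/* timed out, report a short write */
	}
	return (ssize_t)done;
}
```

The caller decides what to do with a short write (retry later, drop, or close the connection), so the upper layer never sits in write() longer than the timeout it chose.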

^ permalink raw reply

* Re: ip_rt_bug: 10.0.0.52 -> 255.255.255.255, ?
From: Justin P. Mattock @ 2011-08-15  5:55 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, linux-wireless, netdev
In-Reply-To: <20110814.224536.1330937123770047026.davem@davemloft.net>

Oh.. guess I will wait for those to be put into the Mainline then. 
Thanks for the info!

>
> First, please contact netdev@vger.kernel.org for networking issues.
>
> Second, this is fixed already:
>
> commit d547f727df86059104af2234804fdd538e112015
> Author: Julian Anastasov<ja@ssi.bg>
> Date:   Sun Aug 7 22:20:20 2011 -0700
>
>      ipv4: fix the reusing of routing cache entries
>
>      	compare_keys and ip_route_input_common rely on
>      rt_oif for distinguishing of input and output routes
>      with same keys values. But sometimes the input route has
>      also same hash chain (keyed by iif != 0) with the output
>      routes (keyed by orig_oif=0). Problem visible if running
>      with small number of rhash_entries.
>
>      	Fix them to use rt_route_iif instead. By this way
>      input route can not be returned to users that request
>      output route.
>
>      	The patch fixes the ip_rt_bug errors that were
>      reported in ip_local_out context, mostly for 255.255.255.255
>      destinations.
>
>      Signed-off-by: Julian Anastasov<ja@ssi.bg>
>      Signed-off-by: David S. Miller<davem@davemloft.net>
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index e3dec1c..cb7efe0 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -731,6 +731,7 @@ static inline int compare_keys(struct rtable *rt1, struct rtable *rt2)
>   		((__force u32)rt1->rt_key_src ^ (__force u32)rt2->rt_key_src) |
>   		(rt1->rt_mark ^ rt2->rt_mark) |
>   		(rt1->rt_key_tos ^ rt2->rt_key_tos) |
> +		(rt1->rt_route_iif ^ rt2->rt_route_iif) |
>   		(rt1->rt_oif ^ rt2->rt_oif) |
>   		(rt1->rt_iif ^ rt2->rt_iif)) == 0;
>   }
> @@ -2321,8 +2322,8 @@ int ip_route_input_common(struct sk_buff *skb, __be32 daddr, __be32 saddr,
>   		if ((((__force u32)rth->rt_key_dst ^ (__force u32)daddr) |
>   		     ((__force u32)rth->rt_key_src ^ (__force u32)saddr) |
>   		     (rth->rt_iif ^ iif) |
> -		     rth->rt_oif |
>   		     (rth->rt_key_tos ^ tos)) == 0&&
> +		    rt_is_input_route(rth)&&
>   		rth->rt_mark == skb->mark&&
>   		net_eq(dev_net(rth->dst.dev), net)&&
>   		!rt_is_expired(rth)) {
>
>
>
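
The quoted compare_keys() change relies on a branch-free comparison idiom: XOR each pair of fields, OR the results together, and test the total against zero. A minimal user-space sketch of that idiom follows. The struct sk_route_key layout and the sk_keys_equal name are hypothetical simplifications; the real struct rtable has more fields and different types.

```c
#include <stdint.h>

/* Simplified, hypothetical stand-in for the routing cache key fields. */
struct sk_route_key {
	uint32_t dst;
	uint32_t src;
	uint32_t mark;
	uint32_t tos;
	uint32_t route_iif;	/* the field added by the quoted fix */
	uint32_t oif;
	uint32_t iif;
};

/*
 * a ^ b is zero only when a == b, so OR-ing all the XORs yields zero
 * only when every field matches. Returns 1 for equal keys, 0 otherwise.
 */
static int sk_keys_equal(const struct sk_route_key *a,
			 const struct sk_route_key *b)
{
	return ((a->dst ^ b->dst) |
		(a->src ^ b->src) |
		(a->mark ^ b->mark) |
		(a->tos ^ b->tos) |
		(a->route_iif ^ b->route_iif) |
		(a->oif ^ b->oif) |
		(a->iif ^ b->iif)) == 0;
}
```

Two keys that agree on everything except route_iif now compare as different, which is the property the commit message describes: an input route cannot be handed to a caller asking for an output route.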

^ permalink raw reply

* Re: linux-next: build failure after merge of the final tree (net tree related)
From: David Miller @ 2011-08-15  5:56 UTC (permalink / raw)
  To: sfr; +Cc: netdev, linux-next, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <20110815152048.3b8f8484faf1891ef2fce266@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Mon, 15 Aug 2011 15:20:48 +1000

> After merging the final tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
> 
> make[5]: *** No rule to make target `drivers/net/ethernet/toshiba/ethernet/sun/sungem_phy.o', needed by `drivers/net/ethernet/toshiba/built-in.o'.
> In file included from drivers/net/ethernet/toshiba/spider_net_ethtool.c:28:0:
> drivers/net/ethernet/toshiba/spider_net.h:30:39: fatal error: ./ethernet/sun/sungem_phy.h: No such file or directory
> In file included from drivers/net/ethernet/toshiba/spider_net.c:54:0:
> drivers/net/ethernet/toshiba/spider_net.h:30:39: fatal error: ./ethernet/sun/sungem_phy.h: No such file or directory
> 
> Caused by commit 8df158ac36fa ("toshiba: Move the Toshiba drivers") or
> the surrounding commits.
> 
> I have just left this failure for today.

This should fix the include problem, but I suspect this thing won't
link.

Jeff, we have to resolve this somehow.  I explained last week that
you can't reference object files outside of the current directory,
as is being done for the spider_net driver in order to get the
sungem_phy.o thing tacked on.

--------------------
>From 2bb698412d8aab0bfc3f269f5ebe8eb67d7cc8f4 Mon Sep 17 00:00:00 2001
From: "David S. Miller" <davem@davemloft.net>
Date: Sun, 14 Aug 2011 22:52:04 -0700
Subject: [PATCH] net: Move sungem_phy.h under include/linux

Fixes build failures of the spider_net driver because it tries
to use a convoluted path to include this header.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/ethernet/sun/sungem.c         |    2 +-
 drivers/net/ethernet/sun/sungem_phy.h     |  132 -----------------------------
 drivers/net/ethernet/toshiba/spider_net.h |    2 +-
 include/linux/sungem_phy.h                |  132 +++++++++++++++++++++++++++++
 4 files changed, 134 insertions(+), 134 deletions(-)
 delete mode 100644 drivers/net/ethernet/sun/sungem_phy.h
 create mode 100644 include/linux/sungem_phy.h

diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
index ade35dd..0f13c5d 100644
--- a/drivers/net/ethernet/sun/sungem.c
+++ b/drivers/net/ethernet/sun/sungem.c
@@ -59,7 +59,7 @@
 #include <asm/pmac_feature.h>
 #endif
 
-#include "sungem_phy.h"
+#include <linux/sungem_phy.h>
 #include "sungem.h"
 
 /* Stripping FCS is causing problems, disabled for now */
diff --git a/drivers/net/ethernet/sun/sungem_phy.h b/drivers/net/ethernet/sun/sungem_phy.h
deleted file mode 100644
index af02f94..0000000
--- a/drivers/net/ethernet/sun/sungem_phy.h
+++ /dev/null
@@ -1,132 +0,0 @@
-#ifndef __SUNGEM_PHY_H__
-#define __SUNGEM_PHY_H__
-
-struct mii_phy;
-
-/* Operations supported by any kind of PHY */
-struct mii_phy_ops
-{
-	int		(*init)(struct mii_phy *phy);
-	int		(*suspend)(struct mii_phy *phy);
-	int		(*setup_aneg)(struct mii_phy *phy, u32 advertise);
-	int		(*setup_forced)(struct mii_phy *phy, int speed, int fd);
-	int		(*poll_link)(struct mii_phy *phy);
-	int		(*read_link)(struct mii_phy *phy);
-	int		(*enable_fiber)(struct mii_phy *phy, int autoneg);
-};
-
-/* Structure used to statically define an mii/gii based PHY */
-struct mii_phy_def
-{
-	u32				phy_id;		/* Concatenated ID1 << 16 | ID2 */
-	u32				phy_id_mask;	/* Significant bits */
-	u32				features;	/* Ethtool SUPPORTED_* defines */
-	int				magic_aneg;	/* Autoneg does all speed test for us */
-	const char*			name;
-	const struct mii_phy_ops*	ops;
-};
-
-enum {
-	BCM54XX_COPPER,
-	BCM54XX_FIBER,
-	BCM54XX_GBIC,
-	BCM54XX_SGMII,
-	BCM54XX_UNKNOWN,
-};
-
-/* An instance of a PHY, partially borrowed from mii_if_info */
-struct mii_phy
-{
-	struct mii_phy_def*	def;
-	u32			advertising;
-	int			mii_id;
-
-	/* 1: autoneg enabled, 0: disabled */
-	int			autoneg;
-
-	/* forced speed & duplex (no autoneg)
-	 * partner speed & duplex & pause (autoneg)
-	 */
-	int			speed;
-	int			duplex;
-	int			pause;
-
-	/* Provided by host chip */
-	struct net_device	*dev;
-	int (*mdio_read) (struct net_device *dev, int mii_id, int reg);
-	void (*mdio_write) (struct net_device *dev, int mii_id, int reg, int val);
-	void			*platform_data;
-};
-
-/* Pass in a struct mii_phy with dev, mdio_read and mdio_write
- * filled, the remaining fields will be filled on return
- */
-extern int mii_phy_probe(struct mii_phy *phy, int mii_id);
-
-
-/* MII definitions missing from mii.h */
-
-#define BMCR_SPD2	0x0040		/* Gigabit enable (bcm54xx)	*/
-#define LPA_PAUSE	0x0400
-
-/* More PHY registers (model specific) */
-
-/* MII BCM5201 MULTIPHY interrupt register */
-#define MII_BCM5201_INTERRUPT			0x1A
-#define MII_BCM5201_INTERRUPT_INTENABLE		0x4000
-
-#define MII_BCM5201_AUXMODE2			0x1B
-#define MII_BCM5201_AUXMODE2_LOWPOWER		0x0008
-
-#define MII_BCM5201_MULTIPHY                    0x1E
-
-/* MII BCM5201 MULTIPHY register bits */
-#define MII_BCM5201_MULTIPHY_SERIALMODE         0x0002
-#define MII_BCM5201_MULTIPHY_SUPERISOLATE       0x0008
-
-/* MII BCM5221 Additional registers */
-#define MII_BCM5221_TEST			0x1f
-#define MII_BCM5221_TEST_ENABLE_SHADOWS		0x0080
-#define MII_BCM5221_SHDOW_AUX_STAT2		0x1b
-#define MII_BCM5221_SHDOW_AUX_STAT2_APD		0x0020
-#define MII_BCM5221_SHDOW_AUX_MODE4		0x1a
-#define MII_BCM5221_SHDOW_AUX_MODE4_IDDQMODE	0x0001
-#define MII_BCM5221_SHDOW_AUX_MODE4_CLKLOPWR	0x0004
-
-/* MII BCM5241 Additional registers */
-#define MII_BCM5241_SHDOW_AUX_MODE4_STANDBYPWR	0x0008
-
-/* MII BCM5400 1000-BASET Control register */
-#define MII_BCM5400_GB_CONTROL			0x09
-#define MII_BCM5400_GB_CONTROL_FULLDUPLEXCAP	0x0200
-
-/* MII BCM5400 AUXCONTROL register */
-#define MII_BCM5400_AUXCONTROL                  0x18
-#define MII_BCM5400_AUXCONTROL_PWR10BASET       0x0004
-
-/* MII BCM5400 AUXSTATUS register */
-#define MII_BCM5400_AUXSTATUS                   0x19
-#define MII_BCM5400_AUXSTATUS_LINKMODE_MASK     0x0700
-#define MII_BCM5400_AUXSTATUS_LINKMODE_SHIFT    8
-
-/* 1000BT control (Marvell & BCM54xx at least) */
-#define MII_1000BASETCONTROL			0x09
-#define MII_1000BASETCONTROL_FULLDUPLEXCAP	0x0200
-#define MII_1000BASETCONTROL_HALFDUPLEXCAP	0x0100
-
-/* Marvell 88E1011 PHY control */
-#define MII_M1011_PHY_SPEC_CONTROL		0x10
-#define MII_M1011_PHY_SPEC_CONTROL_MANUAL_MDIX	0x20
-#define MII_M1011_PHY_SPEC_CONTROL_AUTO_MDIX	0x40
-
-/* Marvell 88E1011 PHY status */
-#define MII_M1011_PHY_SPEC_STATUS		0x11
-#define MII_M1011_PHY_SPEC_STATUS_1000		0x8000
-#define MII_M1011_PHY_SPEC_STATUS_100		0x4000
-#define MII_M1011_PHY_SPEC_STATUS_SPD_MASK	0xc000
-#define MII_M1011_PHY_SPEC_STATUS_FULLDUPLEX	0x2000
-#define MII_M1011_PHY_SPEC_STATUS_RESOLVED	0x0800
-#define MII_M1011_PHY_SPEC_STATUS_TX_PAUSE	0x0008
-#define MII_M1011_PHY_SPEC_STATUS_RX_PAUSE	0x0004
-
-#endif /* __SUNGEM_PHY_H__ */
diff --git a/drivers/net/ethernet/toshiba/spider_net.h b/drivers/net/ethernet/toshiba/spider_net.h
index a891ad0..4ba2135 100644
--- a/drivers/net/ethernet/toshiba/spider_net.h
+++ b/drivers/net/ethernet/toshiba/spider_net.h
@@ -27,7 +27,7 @@
 
 #define VERSION "2.0 B"
 
-#include "./ethernet/sun/sungem_phy.h"
+#include <linux/sungem_phy.h>
 
 extern int spider_net_stop(struct net_device *netdev);
 extern int spider_net_open(struct net_device *netdev);
diff --git a/include/linux/sungem_phy.h b/include/linux/sungem_phy.h
new file mode 100644
index 0000000..af02f94
--- /dev/null
+++ b/include/linux/sungem_phy.h
@@ -0,0 +1,132 @@
+#ifndef __SUNGEM_PHY_H__
+#define __SUNGEM_PHY_H__
+
+struct mii_phy;
+
+/* Operations supported by any kind of PHY */
+struct mii_phy_ops
+{
+	int		(*init)(struct mii_phy *phy);
+	int		(*suspend)(struct mii_phy *phy);
+	int		(*setup_aneg)(struct mii_phy *phy, u32 advertise);
+	int		(*setup_forced)(struct mii_phy *phy, int speed, int fd);
+	int		(*poll_link)(struct mii_phy *phy);
+	int		(*read_link)(struct mii_phy *phy);
+	int		(*enable_fiber)(struct mii_phy *phy, int autoneg);
+};
+
+/* Structure used to statically define an mii/gii based PHY */
+struct mii_phy_def
+{
+	u32				phy_id;		/* Concatenated ID1 << 16 | ID2 */
+	u32				phy_id_mask;	/* Significant bits */
+	u32				features;	/* Ethtool SUPPORTED_* defines */
+	int				magic_aneg;	/* Autoneg does all speed test for us */
+	const char*			name;
+	const struct mii_phy_ops*	ops;
+};
+
+enum {
+	BCM54XX_COPPER,
+	BCM54XX_FIBER,
+	BCM54XX_GBIC,
+	BCM54XX_SGMII,
+	BCM54XX_UNKNOWN,
+};
+
+/* An instance of a PHY, partially borrowed from mii_if_info */
+struct mii_phy
+{
+	struct mii_phy_def*	def;
+	u32			advertising;
+	int			mii_id;
+
+	/* 1: autoneg enabled, 0: disabled */
+	int			autoneg;
+
+	/* forced speed & duplex (no autoneg)
+	 * partner speed & duplex & pause (autoneg)
+	 */
+	int			speed;
+	int			duplex;
+	int			pause;
+
+	/* Provided by host chip */
+	struct net_device	*dev;
+	int (*mdio_read) (struct net_device *dev, int mii_id, int reg);
+	void (*mdio_write) (struct net_device *dev, int mii_id, int reg, int val);
+	void			*platform_data;
+};
+
+/* Pass in a struct mii_phy with dev, mdio_read and mdio_write
+ * filled, the remaining fields will be filled on return
+ */
+extern int mii_phy_probe(struct mii_phy *phy, int mii_id);
+
+
+/* MII definitions missing from mii.h */
+
+#define BMCR_SPD2	0x0040		/* Gigabit enable (bcm54xx)	*/
+#define LPA_PAUSE	0x0400
+
+/* More PHY registers (model specific) */
+
+/* MII BCM5201 MULTIPHY interrupt register */
+#define MII_BCM5201_INTERRUPT			0x1A
+#define MII_BCM5201_INTERRUPT_INTENABLE		0x4000
+
+#define MII_BCM5201_AUXMODE2			0x1B
+#define MII_BCM5201_AUXMODE2_LOWPOWER		0x0008
+
+#define MII_BCM5201_MULTIPHY                    0x1E
+
+/* MII BCM5201 MULTIPHY register bits */
+#define MII_BCM5201_MULTIPHY_SERIALMODE         0x0002
+#define MII_BCM5201_MULTIPHY_SUPERISOLATE       0x0008
+
+/* MII BCM5221 Additional registers */
+#define MII_BCM5221_TEST			0x1f
+#define MII_BCM5221_TEST_ENABLE_SHADOWS		0x0080
+#define MII_BCM5221_SHDOW_AUX_STAT2		0x1b
+#define MII_BCM5221_SHDOW_AUX_STAT2_APD		0x0020
+#define MII_BCM5221_SHDOW_AUX_MODE4		0x1a
+#define MII_BCM5221_SHDOW_AUX_MODE4_IDDQMODE	0x0001
+#define MII_BCM5221_SHDOW_AUX_MODE4_CLKLOPWR	0x0004
+
+/* MII BCM5241 Additional registers */
+#define MII_BCM5241_SHDOW_AUX_MODE4_STANDBYPWR	0x0008
+
+/* MII BCM5400 1000-BASET Control register */
+#define MII_BCM5400_GB_CONTROL			0x09
+#define MII_BCM5400_GB_CONTROL_FULLDUPLEXCAP	0x0200
+
+/* MII BCM5400 AUXCONTROL register */
+#define MII_BCM5400_AUXCONTROL                  0x18
+#define MII_BCM5400_AUXCONTROL_PWR10BASET       0x0004
+
+/* MII BCM5400 AUXSTATUS register */
+#define MII_BCM5400_AUXSTATUS                   0x19
+#define MII_BCM5400_AUXSTATUS_LINKMODE_MASK     0x0700
+#define MII_BCM5400_AUXSTATUS_LINKMODE_SHIFT    8
+
+/* 1000BT control (Marvell & BCM54xx at least) */
+#define MII_1000BASETCONTROL			0x09
+#define MII_1000BASETCONTROL_FULLDUPLEXCAP	0x0200
+#define MII_1000BASETCONTROL_HALFDUPLEXCAP	0x0100
+
+/* Marvell 88E1011 PHY control */
+#define MII_M1011_PHY_SPEC_CONTROL		0x10
+#define MII_M1011_PHY_SPEC_CONTROL_MANUAL_MDIX	0x20
+#define MII_M1011_PHY_SPEC_CONTROL_AUTO_MDIX	0x40
+
+/* Marvell 88E1011 PHY status */
+#define MII_M1011_PHY_SPEC_STATUS		0x11
+#define MII_M1011_PHY_SPEC_STATUS_1000		0x8000
+#define MII_M1011_PHY_SPEC_STATUS_100		0x4000
+#define MII_M1011_PHY_SPEC_STATUS_SPD_MASK	0xc000
+#define MII_M1011_PHY_SPEC_STATUS_FULLDUPLEX	0x2000
+#define MII_M1011_PHY_SPEC_STATUS_RESOLVED	0x0800
+#define MII_M1011_PHY_SPEC_STATUS_TX_PAUSE	0x0008
+#define MII_M1011_PHY_SPEC_STATUS_RX_PAUSE	0x0004
+
+#endif /* __SUNGEM_PHY_H__ */
-- 
1.7.6
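
Two of the comments in the moved header describe small bit manipulations: phy_id is "Concatenated ID1 << 16 | ID2" with phy_id_mask selecting the significant bits, and the BCM5400 AUXSTATUS link mode is a masked and shifted field. A minimal user-space sketch of both follows. The sk_ names and the trimmed-down struct are hypothetical, comparing both sides under the mask is an assumption about how the mask is meant to be applied, and only the two numeric constants are taken from the header.

```c
#include <stddef.h>
#include <stdint.h>

/* Trimmed-down, hypothetical version of struct mii_phy_def. */
struct sk_phy_def {
	uint32_t phy_id;	/* ID1 << 16 | ID2 */
	uint32_t phy_id_mask;	/* significant bits */
	const char *name;
};

/* Values copied from the MII_BCM5400_AUXSTATUS_LINKMODE_* defines. */
#define SK_AUXSTATUS_LINKMODE_MASK  0x0700
#define SK_AUXSTATUS_LINKMODE_SHIFT 8

/* Build the 32-bit PHY ID from the two 16-bit ID registers. */
static uint32_t sk_phy_id(uint16_t id1, uint16_t id2)
{
	return ((uint32_t)id1 << 16) | id2;
}

/* Return the first table entry whose significant bits match id. */
static const struct sk_phy_def *sk_phy_lookup(const struct sk_phy_def *tbl,
					      size_t n, uint32_t id)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t mask = tbl[i].phy_id_mask;

		if ((id & mask) == (tbl[i].phy_id & mask))
			return &tbl[i];
	}
	return NULL;
}

/* Extract the 3-bit link mode field from an AUXSTATUS register value. */
static unsigned int sk_auxstatus_linkmode(uint16_t auxstatus)
{
	return (auxstatus & SK_AUXSTATUS_LINKMODE_MASK) >>
	       SK_AUXSTATUS_LINKMODE_SHIFT;
}
```

A mask such as 0xfffffff0 would let one table entry cover every revision of a chip, since the low nibble is ignored in the comparison.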

^ permalink raw reply related

* Re: ip_rt_bug: 10.0.0.52 -> 255.255.255.255, ?
From: David Miller @ 2011-08-15  5:56 UTC (permalink / raw)
  To: justinmattock; +Cc: linux-kernel, linux-wireless, netdev
In-Reply-To: <4E48B4B6.8000603@gmail.com>

From: "Justin P. Mattock" <justinmattock@gmail.com>
Date: Sun, 14 Aug 2011 22:55:02 -0700

> Oh.. guess I will wait for those to be put into the Mainline
> then. Thanks for the info!

It is in mainline.

^ permalink raw reply

* Re: [net-next 03/12] ethoc: Move the Avionic driver
From: Thierry Reding @ 2011-08-15  6:06 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, netdev, gospo, sassmann
In-Reply-To: <1313222196-10074-4-git-send-email-jeffrey.t.kirsher@intel.com>

[-- Attachment #1: Type: text/plain, Size: 318 bytes --]

* Jeff Kirsher wrote:
> Move the Avionic driver into drivers/net/ethernet/ and make the
> necessary Kconfig and Makefile changes.
> 
> CC: Thierry Reding <thierry.reding@avionic-design.de>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Acked-by: Thierry Reding <thierry.reding@avionic-design.de>

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: ip_rt_bug: 10.0.0.52 -> 255.255.255.255, ?
From: Justin P. Mattock @ 2011-08-15  6:09 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, linux-wireless, netdev
In-Reply-To: <20110814.225633.1354751384914126321.davem@davemloft.net>

On 08/14/2011 10:56 PM, David Miller wrote:
> From: "Justin P. Mattock"<justinmattock@gmail.com>
> Date: Sun, 14 Aug 2011 22:55:02 -0700
>
>> Oh.. guess I will wait for those to be put into the Mainline
>> then. Thanks for the info!
>
> It is in mainline.
>

alright.. I see it now. I pulled on thurs/fri but never went any further 
with the build after that. will clean and build when I get a chance and 
run it.

Thanks again!

Justin P. Mattock

^ permalink raw reply

* Re: [PATCH] tcp: Use LIMIT_NETDEBUG in syn_flood_warning()
From: David Miller @ 2011-08-15  6:39 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, netdev
In-Reply-To: <1313129310.2669.19.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 12 Aug 2011 08:08:30 +0200

> [PATCH] tcp: Use LIMIT_NETDEBUG in syn_flood_warning()
> 
> LIMIT_NETDEBUG allows the admin to disable some warning messages :
> echo 0 > /proc/sys/net/core/warnings
> 
> Use it to avoid filling syslog on busy servers.
> 
> Based on a previous patch from Tom Herbert
> 
> Factorize syn_flood_warning() IPv4/IPv6 implementations
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Tom Herbert <therbert@google.com>

This is a big hammer with no granularity.

I have to give up other potentially interesting log messages (open
request drops, IP frag out-of-memory, etc.) just to get this one to go
away.

I still stand by my original suggestion, print unconditionally and
only once, and also add the statistics counters.
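
A minimal user-space sketch of that suggestion, assuming a per-listener state struct: every event bumps a counter, but the log line is emitted only the first time. The struct, the function name and the message text are hypothetical, and the sketch is not thread-safe; kernel code would need atomic or per-cpu counters instead of a plain increment.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-listener state. */
struct sk_syn_stats {
	unsigned long floods_seen;	/* incremented on every event */
	bool warned;			/* set once the message is printed */
};

/*
 * Record one SYN flood event. The counter always advances; the warning
 * is printed unconditionally, but only on the first event.
 * Returns true if a message was printed by this call.
 */
static bool sk_syn_flood_event(struct sk_syn_stats *st, FILE *log,
			       unsigned int port)
{
	st->floods_seen++;
	if (st->warned)
		return false;

	st->warned = true;
	fprintf(log, "possible SYN flooding on port %u\n", port);
	return true;
}
```

The log stays quiet after the first line, while the counter still tells the admin how often the condition was hit, without having to switch off unrelated warnings.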

^ permalink raw reply

* Bonding problem
From: Eduard Sinelnikov @ 2011-08-15  9:44 UTC (permalink / raw)
  To: netdev, Andy Gospodarek, Jay Vosburgh

[-- Attachment #1: Type: text/plain, Size: 3853 bytes --]

Hi all,

Following the thread:
http://marc.info/?l=linux-netdev&m=131282467512508&w=2

I have created this patch for kernel version 3.0.1, which may fix
the bonding problem.

Patch explanation:
The patch sets all slaves active prior to switching to round robin mode.
This is done to ensure that every possibly active slave will be used in
communication.

Also, I noticed that just changing the bond_xmit_round_robin will only
partially fix the problem,
since slaves with the inactive bit set will not catch any traffic.

I wonder if I should remove the check "bond_is_active_slave(slave)"
in bond_xmit_round_robin.

Please advise.
            Eduard


On Mon, Aug 08, 2011 at 10:06:05AM -0700, Jay Vosburgh wrote:
>
> Andy Gospodarek <andy@greyhouse.net> wrote:
>
> >On Sun, Aug 07, 2011 at 03:00:30PM +0300, Eduard Sinelnikov wrote:
> >> Hi,
> >>
> >> In the kernel 2.6.39.3 ( /drivers/net/bond/bond_main.c).
> >> In the function ‘bond_xmit_roundrobin’
> >> The code check if the bond is active via
> >> ‘bond_is_active_slave(slave)’ Function call.
> >> Which actually checks if the slave is backup or active
> >> What is the meaning of slave being backup in round robin mode?
> >> Correct me if I wrong but in round robin every slave should send a
> >> packet, regardless of being active or backup.
> >>
> >> Thank you,
> >>             Eduard
> >
> >There probably is not a compelling reason to continue to have it.  There
> >may be a reason historically, but I'm not aware what that might be at
> >this point.  For modes other than active-backup, the value of
> >slave->link and slave->backup should always contain a value that
> >indicates the slave is up and available for transmit.
>
> If you read Eduard's other posts regarding this, the actual
> issue is that when changing from another mode into round-robin,
> occasionally slaves will still be marked as "backup" and won't be used:
>

I did notice that one after I sent this first response.

> >Date: Mon, 8 Aug 2011 11:16:39 +0300
> >Subject: On line Bonding configuration change fails
> >From: Eduard Sinelnikov <eduard.sinelnikov@gmail.com>
> >To: netdev@vger.kernel.org
> >Sender: netdev-owner@vger.kernel.org
> >
> >Hi,
> >
> >My configuration is a follows:
> >
> >        | eth0 -------------->
> >Ubuntu  | eth1 -------------->     Switch ------------> Other computer
> >
> >Scenario:
> >• change the bond mode to active/backup
> >• unplug some of the cable
> >• plug-in the unplugged cable
> >• change bond mode to round robin
> >
> >I can see that only one eth1 is sending data. When I unplug it the ping stops.
> >
> >Is it a bug or some mis-configuration?
> >
> >In the kernel ( /drivers/net/bond/bond_main.c).
> >In the function ‘bond_xmit_roundrobin’
> >The code check if the bond is active via
> >‘bond_is_active_slave(slave)’ Function call.
> >Which actually checks if the slave is backup or active
> >What is the meaning of backup in round robin?
> >Correct me if I wrong but in round robin every slave should send a
> >packet, regardless of being active or backup.
>
> So from looking at the code, it seems that the actual problem is
> that when transitioning to round-robin mode, one or more slaves can
> remain marked as "backup," and in round-robin mode, that won't ever
> change.  We could probably work around that by removing the "is_active"
> test (essentially declaring that "is_active" is only valid in
> active-backup mode).  That might produce a few odd messages here and
> there (when removing a slave or during a link failure, for example).
>
> From inspection, the bond_xmit_xor function likely has this same
> problem.
>

Agreed.

> -J
>
> ---
> -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

[-- Attachment #2: bond_patch.patch --]
[-- Type: application/octet-stream, Size: 1471 bytes --]

diff -uprN linux-3.0.1/drivers/net/bonding/bond_sysfs.c linux-3.0.1.bond/drivers/net/bonding/bond_sysfs.c
--- linux-3.0.1/drivers/net/bonding/bond_sysfs.c	2011-08-05 07:59:21.000000000 +0300
+++ linux-3.0.1.bond/drivers/net/bonding/bond_sysfs.c	2011-08-15 11:59:13.346377263 +0300
@@ -290,6 +290,37 @@ static ssize_t bonding_show_mode(struct
 			bond->params.mode);
 }
 
+
+
+// activate all interfaces.
+static void inline bonding_activate_interfaces(struct bonding * bond )
+{
+	struct slave *slave ;
+	int i ;
+
+	
+
+	read_lock(&bond->lock);	
+	
+	bond_for_each_slave(bond, slave, i) {
+	
+		read_lock(&bond->curr_slave_lock);
+	
+		// change the backup to active since there is no meaning of backup in round robin.
+		// Also, change the device state so it can catch traffic.
+		if ((  bond_slave_state(slave) ) || slave->inactive ) {
+			if ((slave->link == BOND_LINK_UP) && IS_UP(slave->dev)) {
+				bond_set_slave_active_flags(slave);
+			}
+		}
+		
+		read_unlock(&bond->curr_slave_lock);
+	}
+	
+	read_unlock(&bond->lock);
+
+} 
+
 static ssize_t bonding_store_mode(struct device *d,
 				  struct device_attribute *attr,
 				  const char *buf, size_t count)
@@ -320,6 +351,10 @@ static ssize_t bonding_store_mode(struct
 		goto out;
 	}
 
+	if (new_value == BOND_MODE_ROUNDROBIN) {
+		bonding_activate_interfaces(bond) ;
+	}
+
 	bond->params.mode = new_value;
 	bond_set_mode_ops(bond, bond->params.mode);
 	pr_info("%s: setting mode to %s (%d).\n",

^ permalink raw reply

* Re: Bonding problem
From: WeipingPan @ 2011-08-15 10:22 UTC (permalink / raw)
  To: Eduard Sinelnikov; +Cc: netdev, Andy Gospodarek, Jay Vosburgh
In-Reply-To: <CANMAZFWYUqDDp+_9EOOubRcvC90zHO9NR7D_N5D=uTjokpgU6A@mail.gmail.com>

On 08/15/2011 05:44 PM, Eduard Sinelnikov wrote:
> Hi all,
>
> Following the thread:
> http://marc.info/?l=linux-netdev&m=131282467512508&w=2
>
> I have created this patch for kernel version 3.0.1, which may fix
> the bonding problem.
>
> Patch explanation:
> The patch sets all slaves active prior to switching to round robin mode.
> This is done to ensure that every possibly active slave will be used in
> communication.
>
> Also, I noticed that just changing the bond_xmit_round_robin will only
> partially fix the problem,
> since slaves with the inactive bit set will not catch any traffic.
>
> I wonder if I should remove the check "bond_is_active_slave(slave)"
> in bond_xmit_round_robin.
>
> Please advise.
>              Eduard
>
>
My patch also restores the backup and inactive flags of the slaves,
and I think it is more generic. :-)

Will send it soon.

thanks
Weiping Pan

> On Mon, Aug 08, 2011 at 10:06:05AM -0700, Jay Vosburgh wrote:
>> Andy Gospodarek<andy@greyhouse.net>  wrote:
>>
>>> On Sun, Aug 07, 2011 at 03:00:30PM +0300, Eduard Sinelnikov wrote:
>>>> Hi,
>>>>
>>>> In the kernel 2.6.39.3 ( /drivers/net/bond/bond_main.c).
>>>> In the function ‘bond_xmit_roundrobin’
>>>> The code check if the bond is active via
>>>> ‘bond_is_active_slave(slave)’ Function call.
>>>> Which actually checks if the slave is backup or active
>>>> What is the meaning of slave being backup in round robin mode?
>>>> Correct me if I wrong but in round robin every slave should send a
>>>> packet, regardless of being active or backup.
>>>>
>>>> Thank you,
>>>>             Eduard
>>> There probably is not a compelling reason to continue to have it.  There
>>> may be a reason historically, but I'm not aware what that might be at
>>> this point.  For modes other than active-backup, the value of
>>> slave->link and slave->backup should always contain a value that
>>> indicates the slave is up and available for transmit.
>> If you read Eduard's other posts regarding this, the actual
>> issue is that when changing from another mode into round-robin,
>> occasionally slaves will still be marked as "backup" and won't be used:
>>
> I did notice that one after I sent this first response.
>
>>> Date: Mon, 8 Aug 2011 11:16:39 +0300
>>> Subject: On line Bonding configuration change fails
>>> From: Eduard Sinelnikov<eduard.sinelnikov@gmail.com>
>>> To: netdev@vger.kernel.org
>>> Sender: netdev-owner@vger.kernel.org
>>>
>>> Hi,
>>>
>>> My configuration is a follows:
>>>
>>>         | eth0 -------------->
>>> Ubuntu  | eth1 -------------->     Switch ------------>  Other computer
>>>
>>> Scenario:
>>> • change the bond mode to active/backup
>>> • unplug some of the cable
>>> • plug-in the unplugged cable
>>> • change bond mode to round robin
>>>
>>> I can see that only one eth1 is sending data. When I unplug it the ping stops.
>>>
>>> Is it a bug or some mis-configuration?
>>>
>>> In the kernel ( /drivers/net/bond/bond_main.c).
>>> In the function ‘bond_xmit_roundrobin’
>>> The code check if the bond is active via
>>> ‘bond_is_active_slave(slave)’ Function call.
>>> Which actually checks if the slave is backup or active
>>> What is the meaning of backup in round robin?
>>> Correct me if I wrong but in round robin every slave should send a
>>> packet, regardless of being active or backup.
>> So from looking at the code, it seems that the actual problem is
>> that when transitioning to round-robin mode, one or more slaves can
>> remain marked as "backup," and in round-robin mode, that won't ever
>> change.  We could probably work around that by removing the "is_active"
>> test (essentially declaring that "is_active" is only valid in
>> active-backup mode).  That might produce a few odd messages here and
>> there (when removing a slave or during a link failure, for example).
>>
>>  From inspection, the bond_xmit_xor function likely has this same
>> problem.
>>
> Agreed.
>
>> -J
>>
>> ---
>> -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


^ permalink raw reply

* [PATCH net-2.6] bonding:restore backup and inactive flag of slave
From: Weiping Pan @ 2011-08-15 10:25 UTC (permalink / raw)
  To: netdev; +Cc: Weiping Pan
In-Reply-To: <CANMAZFWYUqDDp+_9EOOubRcvC90zHO9NR7D_N5D=uTjokpgU6A@mail.gmail.com>

Eduard Sinelnikov (eduard.sinelnikov@gmail.com) found that if we change
the bonding mode from active backup to round robin, some slaves are still
marked "backup" and won't transmit packets.

As Jay Vosburgh (fubar@us.ibm.com) pointed out, we can work around that by
removing the bond_is_active_slave() check, because the "backup" flag is only
meaningful in active backup mode.

But if we simply drop the bond_is_active_slave() check,
transmission will work fine, but we can't maintain the correct value of the
"backup" flag for each slave, though it is meaningless in modes other than
active backup.

I'd like to restore the "backup" and "inactive" flags in bond_open,
so that we keep the correct values for them.

As for bond_is_active_slave(), I'd like to prepare another patch to handle it.

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
---
 drivers/net/bonding/bond_main.c |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 38a83ac..3ed9827 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3419,9 +3419,27 @@ static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
 static int bond_open(struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
+	struct slave *slave;
+	int i;

 	bond->kill_timers = 0;

+	// restore slave->backup and slave->inactive
+	read_lock(&bond->lock);
+	read_lock(&bond->curr_slave_lock);
+	if (bond->slave_cnt > 0) {
+		bond_for_each_slave(bond, slave, i) {
+			if ((bond->params.mode == BOND_MODE_ACTIVEBACKUP)
+				&& (slave != bond->curr_active_slave)) {
+				bond_set_slave_inactive_flags(slave);
+			} else {
+				bond_set_slave_active_flags(slave);
+			}
+		}
+	}
+	read_unlock(&bond->curr_slave_lock);
+	read_unlock(&bond->lock);
+
 	INIT_DELAYED_WORK(&bond->mcast_work, bond_resend_igmp_join_requests_delayed);

 	if (bond_is_lb(bond)) {
-- 
1.7.4.4
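
The failure mode and the fix can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not the bonding driver: all sk_ names are hypothetical, locking is left out, and the round-robin picker simply skips slaves whose link is down or whose backup flag is set, which is the behaviour described in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

enum sk_mode { SK_MODE_ROUNDROBIN, SK_MODE_ACTIVEBACKUP };

struct sk_slave {
	bool link_up;
	bool backup;	/* may be stale after a mode change */
};

struct sk_bond {
	enum sk_mode mode;
	struct sk_slave *slaves;
	size_t n;
	size_t curr_active;	/* index used in active-backup mode */
	size_t rr_next;		/* next candidate for round robin */
};

/*
 * Pick the next slave for transmit in round-robin order, skipping
 * slaves that are down or flagged as backup. Returns the slave index,
 * or -1 if no slave is usable.
 */
static int sk_rr_pick(struct sk_bond *b)
{
	for (size_t tried = 0; tried < b->n; tried++) {
		size_t i = (b->rr_next + tried) % b->n;

		if (b->slaves[i].link_up && !b->slaves[i].backup) {
			b->rr_next = (i + 1) % b->n;
			return (int)i;
		}
	}
	return -1;
}

/*
 * Recompute the backup flag from the current mode: only active-backup
 * mode keeps non-current slaves as backup, every other mode marks all
 * slaves active.
 */
static void sk_restore_flags(struct sk_bond *b)
{
	for (size_t i = 0; i < b->n; i++)
		b->slaves[i].backup = (b->mode == SK_MODE_ACTIVEBACKUP &&
				       i != b->curr_active);
}
```

With two slaves and a backup flag left over from active-backup mode, sk_rr_pick() keeps returning the same index; after sk_restore_flags() it alternates between both.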

^ permalink raw reply related
