Netdev List
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
From: Ivan Zahariev @ 2011-11-17  8:10 UTC
  To: netdev
In-Reply-To: <20111116223330.08de9e52@asterix.rh>

On 17.11.2011 at 02:33, Flavio Leitner wrote:
> On Thu, 17 Nov 2011 00:32:18 +0200
> Ivan Zahariev <famzah@icdsoft.com> wrote:
>
>> On 11/15/2011 11:09 PM, Eric Dumazet wrote:
>>> On Tuesday, 15 November 2011 at 22:23 +0200, Ivan Zahariev wrote:
>>>> Hello,
>>>>
>>>> We have changed nothing in our network infrastructure but only
>>>> upgraded from Linux kernel 2.6.36.2 to 3.0.3. Here is the problem
>>>> we are experiencing:
>>>>
>>>> ICMP redirected routes are cached forever, and they can be cleared
>>>> only by a reboot.
>>>>
>> ### (bug #1) even though we flushed the route cache, the <redirected>
>> route resurrects from somewhere; even without making any TCP requests
>> ### this time what "ip" returns is consistent with the real
>> (incorrect) routing behavior of machine5
>> root@machine5:~# ip route flush cache
>> root@machine5:~# ip route list cache match 8.8.4.4
>> root@machine5:~# ip route get 8.8.4.4
>> 8.8.4.4 via 192.168.0.120 dev eth0  src 192.168.0.244
>>       cache <redirected>  ipid 0x303a
>>
>> ### only a reboot clears the cached <redirected> routes
> IIRC, the cache flush doesn't affect the inetpeer where the
> redirected gateway is now stored, so even after flushing the
> route cache, the inetpeer will restore the old info later.
>
> fbl
OK, I guess my questions now are:
* How can the inetpeer (redirected cache info) be flushed without
rebooting the machine?
* Why does "ip route" return an incorrect route? Example:

### (bug #2) what "ip route" returns is inconsistent, because in reality
we are using the <redirected> route via 192.168.0.120
### note that the number of route lines increased by one
root@machine5:~# ip route list cache match 8.8.4.4
8.8.4.4 from 192.168.0.244 tos lowdelay via 192.168.0.8 dev eth0
     cache  ipid 0x303a
8.8.4.4 tos lowdelay via 192.168.0.8 dev eth0  src 192.168.0.244
     cache  ipid 0x303a
8.8.4.4 via 192.168.0.8 dev eth0  src 192.168.0.244
     cache
8.8.4.4 from 192.168.0.244 tos lowdelay via 192.168.0.8 dev eth0
     cache  ipid 0x303a

### After "ip route flush cache", the output of "ip route" becomes
consistent with the real routing behavior of machine5
root@machine5:~# ip route flush cache
root@machine5:~# ip route list cache match 8.8.4.4
root@machine5:~# ip route get 8.8.4.4
8.8.4.4 via 192.168.0.120 dev eth0  src 192.168.0.244
     cache <redirected>  ipid 0x303a

Thanks.
--Ivan

* Re: [PATCH net-next] r8169: Add 64bit statistics
From: Eric Dumazet @ 2011-11-17  8:11 UTC
  To: Junchang Wang; +Cc: Stephen Hemminger, netdev, romieu, nic swsd
In-Reply-To: <CABoNC82RO2uvn9TfToAygEspUZnNrefXgO6SGpZoSAayCp3QiA@mail.gmail.com>

On Thursday, 17 November 2011 at 15:46 +0800, Junchang Wang wrote:
> > You don't need per-cpu since Tx is locked by dev->xmit_lock and
> > rx is implicitly single threaded by NAPI.
> 
> Thanks.
> 
> >You do need to have
> > two u64_stat_sync entries (one for Tx and one for Rx).
> 
> You mean Rx and Tx may run on different cores at the same time.
> So I need one sync for Tx to protect tx_xxx, and another for Rx to
> protect rx_xxx. Is that right?
> 

Yes, look at sky2.c for a template

drivers/net/ethernet/marvell/sky2.c contains code like that
(different syncp for rx/tx)

TX path:
                        u64_stats_update_begin(&sky2->tx_stats.syncp);
                        ++sky2->tx_stats.packets;
                        sky2->tx_stats.bytes += skb->len;
                        u64_stats_update_end(&sky2->tx_stats.syncp);


RX path:

        u64_stats_update_begin(&sky2->rx_stats.syncp);
        sky2->rx_stats.packets += packets;
        sky2->rx_stats.bytes += bytes;
        u64_stats_update_end(&sky2->rx_stats.syncp);
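The two-syncp layout can be sketched outside the kernel. This is a minimal userspace approximation, assuming C11 atomics; `stats_sync`, `dir_stats`, `nic_stats`, `stats_add()` and `stats_snapshot()` are hypothetical names, not the kernel's u64_stats_sync API or the sky2 code. It only illustrates why Rx and Tx each get their own sequence counter: each counter has exactly one writer, and a reader retries when it overlaps an update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for u64_stats_sync: odd value = update in progress. */
struct stats_sync {
	atomic_uint seq;
};

/* Counters for one direction. Relaxed atomics keep the sketch free of
 * C11 data races; the kernel uses plain u64 fields instead. */
struct dir_stats {
	struct stats_sync syncp;
	_Atomic uint64_t packets;
	_Atomic uint64_t bytes;
};

/* One sync object per direction, so the Rx writer and the Tx writer
 * never touch the same sequence counter. */
struct nic_stats {
	struct dir_stats rx;
	struct dir_stats tx;
};

static void stats_update_begin(struct stats_sync *s)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	assert((seq & 1u) == 0);	/* exactly one writer per sync object */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

static void stats_update_end(struct stats_sync *s)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_release);
}

static void add_relaxed(_Atomic uint64_t *v, uint64_t n)
{
	uint64_t cur = atomic_load_explicit(v, memory_order_relaxed);

	atomic_store_explicit(v, cur + n, memory_order_relaxed);
}

/* Writer side: called only from the single context owning this direction. */
static void stats_add(struct dir_stats *d, uint64_t packets, uint64_t bytes)
{
	stats_update_begin(&d->syncp);
	add_relaxed(&d->packets, packets);
	add_relaxed(&d->bytes, bytes);
	stats_update_end(&d->syncp);
}

/* Reader side: retry until a snapshot was taken with no update in between. */
static void stats_snapshot(struct dir_stats *d, uint64_t *packets,
			   uint64_t *bytes)
{
	unsigned int start;

	do {
		start = atomic_load_explicit(&d->syncp.seq, memory_order_acquire);
		*packets = atomic_load_explicit(&d->packets, memory_order_relaxed);
		*bytes = atomic_load_explicit(&d->bytes, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((start & 1u) ||
		 start != atomic_load_explicit(&d->syncp.seq, memory_order_relaxed));
}
```

With a single shared sync object, the Tx and Rx writers could run on different cores at once and corrupt the sequence count; two objects avoid that without any extra locking.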

* Re: [PATCH 1/5] stmmac: use mdelay on timeout of sw reset
From: David Miller @ 2011-11-17  8:13 UTC
  To: peppe.cavallaro; +Cc: netdev, francesco.virlinzi, srinivas.kandagatla
In-Reply-To: <1321516682-32208-1-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Thu, 17 Nov 2011 08:57:58 +0100

> From: Francesco Virlinzi <francesco.virlinzi@st.com>
> 
> This patch uses an mdelay to manage the timeout on
> sw reset to be independent of cpu_clk.
> 
> Signed-off-by: Francesco Virlinzi <francesco.virlinzi@st.com>
> Reviewed-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

Applied.

* Re: [PATCH 2/5] stmmac: fix advertising 1000Base capabilties for non GMII iface
From: David Miller @ 2011-11-17  8:14 UTC
  To: peppe.cavallaro; +Cc: netdev, francesco.virlinzi, srinivas.kandagatla
In-Reply-To: <1321516682-32208-2-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Thu, 17 Nov 2011 08:57:59 +0100

> From: Srinivas Kandagatla <srinivas.kandagatla@st.com>
> 
> This patch fixes the way to stop the 1000Base advertising
> capabilities for non GMII interfaces.
> 
> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

Applied.

* Re: [PATCH 3/5] stmmac: parameters auto-tuning through HW cap reg
From: David Miller @ 2011-11-17  8:14 UTC
  To: peppe.cavallaro; +Cc: netdev, francesco.virlinzi, srinivas.kandagatla
In-Reply-To: <1321516682-32208-3-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Thu, 17 Nov 2011 08:58:00 +0100

> New GMAC devices (newer than the databook 3.50a) have the
> HW capability register, which reports which features are actually
> supported by the hardware.
> 
> On old devices, much of this information has to be passed through the
> platform, for example: enhanced descriptor structure,
> TX COE etc. These settings are mandatory to properly configure the driver.
> This still remains valid because the driver has to support old
> Synopsys devices, but now it is also able to override them using the
> values from the HW capability register if supported.
> 
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

Applied.

* Re: [PATCH 4/5] stmmac: remove spin_lock in stmmac_ioctl.
From: David Miller @ 2011-11-17  8:14 UTC
  To: peppe.cavallaro; +Cc: netdev, francesco.virlinzi, srinivas.kandagatla
In-Reply-To: <1321516682-32208-4-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Thu, 17 Nov 2011 08:58:01 +0100

> From: Srinivas Kandagatla <srinivas.kandagatla@st.com>
> 
> This patch removes the unneeded spin_lock in stmmac_ioctl while reading and
> writing mdio registers. While holding a spin_lock the code must be
> atomic, which is not true in this case as both mdiobus_read and
> mdiobus_write take mutex locks.
> 
> Without this patch, reading mdio registers via mii-tool results in the
> BUG below:
> mii-tool -vvv eth0
> Using SIOCGMIIPHY=0x8947
> BUG: sleeping function called from invalid context at kernel/mutex.c:287
 ...
> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

Applied.

* Re: [PATCH 5/5] stmmac: fix pm functions avoiding sleep on spinlock
From: David Miller @ 2011-11-17  8:14 UTC
  To: peppe.cavallaro; +Cc: netdev, francesco.virlinzi, srinivas.kandagatla
In-Reply-To: <1321516682-32208-5-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Thu, 17 Nov 2011 08:58:02 +0100

> From: Francesco Virlinzi <francesco.virlinzi@st.com>
> 
> This patch fixes the pm functions to avoid sleeping
> while a spinlock is held.
> 
> Signed-off-by: Francesco Virlinzi <francesco.virlinzi@st.com>
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

Applied.

* Re: [PATCH net-next] IPV6 Fix a crash when trying to replace non existing route
From: David Miller @ 2011-11-17  8:19 UTC
  To: matti.vaittinen; +Cc: netdev
In-Reply-To: <1321514282.1858.125.camel@hakki>

From: Matti Vaittinen <matti.vaittinen@nsn.com>
Date: Thu, 17 Nov 2011 09:18:02 +0200

> 
> This patch fixes a crash that occurs when an attempt is made to change
> a non-existing IPv6 route.
> 
> When a new destination node was inserted in the middle of the FIB6 tree,
> no relevant sanity checks were performed. A later route insertion might
> have been prevented due to an invalid request, leaving a node with no rt
> info in the tree. When this node was accessed, a crash occurred.
> 
> The patch adds the missing checks in fib6_add_1().
> 
> Signed-off-by: Matti Vaittinen <Mazziesaccount@gmail.com>

Applied.

I also added the following patch; I should have caught this in your
original submission.

--------------------
[PATCH] ipv6: Use pr_warn() in ip6_fib.c

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/ip6_fib.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index e7b26dc..424f063 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -451,12 +451,12 @@ static struct fib6_node * fib6_add_1(struct fib6_node *root, void *addr,
 		    !ipv6_prefix_equal(&key->addr, addr, fn->fn_bit)) {
 			if (!allow_create) {
 				if (replace_required) {
-					printk(KERN_WARNING
-					    "IPv6: Can't replace route, no match found\n");
+					pr_warn("IPv6: Can't replace route, "
+						"no match found\n");
 					return ERR_PTR(-ENOENT);
 				}
-				printk(KERN_WARNING
-				    "IPv6: NLM_F_CREATE should be set when creating new route\n");
+				pr_warn("IPv6: NLM_F_CREATE should be set "
+					"when creating new route\n");
 			}
 			goto insert_above;
 		}
@@ -499,11 +499,11 @@ static struct fib6_node * fib6_add_1(struct fib6_node *root, void *addr,
 		 * That would keep IPv6 consistent with IPv4
 		 */
 		if (replace_required) {
-			printk(KERN_WARNING
-			    "IPv6: Can't replace route, no match found\n");
+			pr_warn("IPv6: Can't replace route, no match found\n");
 			return ERR_PTR(-ENOENT);
 		}
-		printk(KERN_WARNING "IPv6: NLM_F_CREATE should be set when creating new route\n");
+		pr_warn("IPv6: NLM_F_CREATE should be set "
+			"when creating new route\n");
 	}
 	/*
 	 *	We walked to the bottom of tree.
@@ -697,7 +697,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 	 */
 	if (!replace) {
 		if (!add)
-			printk(KERN_WARNING "IPv6: NLM_F_CREATE should be set when creating new route\n");
+			pr_warn("IPv6: NLM_F_CREATE should be set when creating new route\n");
 
 add:
 		rt->dst.rt6_next = iter;
@@ -716,7 +716,7 @@ add:
 		if (!found) {
 			if (add)
 				goto add;
-			printk(KERN_WARNING "IPv6: NLM_F_REPLACE set, but no existing node found!\n");
+			pr_warn("IPv6: NLM_F_REPLACE set, but no existing node found!\n");
 			return -ENOENT;
 		}
 		*ins = rt;
@@ -768,7 +768,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info)
 			replace_required = 1;
 	}
 	if (!allow_create && !replace_required)
-		printk(KERN_WARNING "IPv6: RTM_NEWROUTE with no NLM_F_CREATE or NLM_F_REPLACE\n");
+		pr_warn("IPv6: RTM_NEWROUTE with no NLM_F_CREATE or NLM_F_REPLACE\n");
 
 	fn = fib6_add_1(root, &rt->rt6i_dst.addr, sizeof(struct in6_addr),
 		    rt->rt6i_dst.plen, offsetof(struct rt6_info, rt6i_dst),
-- 
1.7.6.4

* Re: [patch net-next 3/3] team: replicate options on register
From: Eric Dumazet @ 2011-11-17  8:32 UTC
  To: Jiri Pirko
  Cc: netdev, davem, bhutchings, shemminger, andy, fbl, jzupka, ivecera
In-Reply-To: <1321477749-1877-4-git-send-email-jpirko@redhat.com>

On Wednesday, 16 November 2011 at 22:09 +0100, Jiri Pirko wrote:

> +
> +int team_options_register(struct team *team,
> +			  const struct team_option *option,
> +			  size_t option_count)
>  {
>  	int i;
> +	struct team_option *dst_opts[option_count];
> +	int err;

This kind of construct will trigger static analyzer alerts...

* [PATCH] ipv4: avoid to double release dst in tcp_v4_connect
From: roy.qing.li @ 2011-11-17  8:33 UTC
  To: netdev

From: RongQing.Li <roy.qing.li@gmail.com> 

When tcp_connect() fails in tcp_v4_connect(), the dst is released
in the error handler of tcp_v4_connect(). However, the dst has already
been stored in sk->sk_dst_cache, so it is released a second time when
the sk is destroyed.

Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
---
 net/ipv4/tcp_ipv4.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a744315..adc2992 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -263,8 +263,10 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 
 	err = tcp_connect(sk);
 	rt = NULL;
-	if (err)
+	if (err) {
+		sk->sk_dst_cache = NULL;
 		goto failure;
+	}
 
 	return 0;
 
-- 
1.7.1
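
The general hazard described in the commit message, one reference being dropped on two different paths, can be sketched in userspace. This is only an illustration under stated assumptions: `fake_dst`, `fake_sock` and the three functions are hypothetical and do not mirror the patch or the kernel's dst handling line by line. The point is that whichever path drops the cached reference should also clear the cached pointer, so that the destroy path finds nothing left to release.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted route entry. */
struct fake_dst {
	int refcnt;
};

/* Hypothetical socket that caches one reference to a route entry. */
struct fake_sock {
	struct fake_dst *dst_cache;
};

static void fake_dst_release(struct fake_dst *d)
{
	if (!d)
		return;
	assert(d->refcnt > 0);	/* a second release of the same reference trips here */
	d->refcnt--;
}

/* Error path: forget the cached pointer, then drop the reference once. */
static void fake_connect_failed(struct fake_sock *sk)
{
	struct fake_dst *d = sk->dst_cache;

	sk->dst_cache = NULL;	/* the destroy path must not see it again */
	fake_dst_release(d);
}

/* Destroy path: releases whatever is still cached, if anything. */
static void fake_sock_destroy(struct fake_sock *sk)
{
	fake_dst_release(sk->dst_cache);
	sk->dst_cache = NULL;
}
```

If `fake_connect_failed()` released the entry without clearing `dst_cache`, the later `fake_sock_destroy()` would release it a second time and hit the assertion.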

* Re: [PATCH 1/6] sky2: fix hang on shutdown (and other irq issues)
From: Sven Joachim @ 2011-11-17  8:46 UTC
  To: Stephen Hemminger; +Cc: davem, netdev
In-Reply-To: <20111116234344.526517614@vyatta.com>

On 2011-11-17 00:42 +0100, Stephen Hemminger wrote:

> There are several problems with the recent change to how IRQs are set up.
>    * synchronize_irq in sky2_shutdown would hang because there
>      was no IRQ set up.
>    * when the device was set to down, some IRQ bits were left enabled, so a
>      hardware error would produce an IRQ with no handler
>    * quick link on the Optima chip set was enabled without a handler
>    * suspend/resume would leave the IRQ on with no handler if the device
>      was down

Unfortunately, this patch does not fix the hang at shutdown for me. :-(

> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>
> ---
> This needs to be applied to net-next and -net
>
> --- a/drivers/net/ethernet/marvell/sky2.c	2011-11-16 15:15:48.212513321 -0800
> +++ b/drivers/net/ethernet/marvell/sky2.c	2011-11-16 15:19:32.898508932 -0800
> @@ -1747,6 +1747,11 @@ static int sky2_up(struct net_device *de
>  
>  	sky2_hw_up(sky2);
>  
> +	if (hw->chip_id == CHIP_ID_YUKON_OPT ||
> +	    hw->chip_id == CHIP_ID_YUKON_PRM ||
> +	    hw->chip_id == CHIP_ID_YUKON_OP_2)
> +		imask |= Y2_IS_PHY_QLNK;	/* enable PHY Quick Link */
> +
>  	/* Enable interrupts from phy/mac for port */
>  	imask = sky2_read32(hw, B0_IMSK);
>  	imask |= portirq_msk[port];
> @@ -2101,15 +2106,21 @@ static int sky2_down(struct net_device *
>  
>  	netif_info(sky2, ifdown, dev, "disabling interface\n");
>  
> -	/* Disable port IRQ */
> -	sky2_write32(hw, B0_IMSK,
> -		     sky2_read32(hw, B0_IMSK) & ~portirq_msk[sky2->port]);
> -	sky2_read32(hw, B0_IMSK);
> -
>  	if (hw->ports == 1) {
> +		sky2_write32(hw, B0_IMSK, 0);
> +		sky2_read32(hw, B0_IMSK);
> +
>  		napi_disable(&hw->napi);
>  		free_irq(hw->pdev->irq, hw);
>  	} else {
> +		u32 imask;
> +
> +		/* Disable port IRQ */
> +		imask  = sky2_read32(hw, B0_IMSK);
> +		imask &= ~portirq_msk[sky2->port];
> +		sky2_write32(hw, B0_IMSK, imask);
> +		sky2_read32(hw, B0_IMSK);
> +
>  		synchronize_irq(hw->pdev->irq);
>  		napi_synchronize(&hw->napi);
>  	}
> @@ -3258,7 +3269,6 @@ static void sky2_reset(struct sky2_hw *h
>  	    hw->chip_id == CHIP_ID_YUKON_PRM ||
>  	    hw->chip_id == CHIP_ID_YUKON_OP_2) {
>  		u16 reg;
> -		u32 msk;
>  
>  		if (hw->chip_id == CHIP_ID_YUKON_OPT && hw->chip_rev == 0) {
>  			/* disable PCI-E PHY power down (set PHY reg 0x80, bit 7 */
> @@ -3281,11 +3291,6 @@ static void sky2_reset(struct sky2_hw *h
>  		sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_ON);
>  		sky2_pci_write16(hw, PSM_CONFIG_REG4, reg);
>  
> -		/* enable PHY Quick Link */
> -		msk = sky2_read32(hw, B0_IMSK);
> -		msk |= Y2_IS_PHY_QLNK;
> -		sky2_write32(hw, B0_IMSK, msk);
> -
>  		/* check if PSMv2 was running before */
>  		reg = sky2_pci_read16(hw, PSM_CONFIG_REG3);
>  		if (reg & PCI_EXP_LNKCTL_ASPMC)
> @@ -3412,7 +3417,9 @@ static void sky2_all_down(struct sky2_hw
>  
>  	sky2_read32(hw, B0_IMSK);
>  	sky2_write32(hw, B0_IMSK, 0);
> -	synchronize_irq(hw->pdev->irq);
> +
> +	if (hw->ports > 1 || netif_running(hw->dev[0]))
> +		synchronize_irq(hw->pdev->irq);
>  	napi_disable(&hw->napi);
>  
>  	for (i = 0; i < hw->ports; i++) {
> @@ -3430,7 +3437,7 @@ static void sky2_all_down(struct sky2_hw
>  
>  static void sky2_all_up(struct sky2_hw *hw)
>  {
> -	u32 imask = Y2_IS_BASE;
> +	u32 imask = 0;
>  	int i;
>  
>  	for (i = 0; i < hw->ports; i++) {
> @@ -3446,11 +3453,13 @@ static void sky2_all_up(struct sky2_hw *
>  		netif_wake_queue(dev);
>  	}
>  
> -	sky2_write32(hw, B0_IMSK, imask);
> -	sky2_read32(hw, B0_IMSK);
> -
> -	sky2_read32(hw, B0_Y2_SP_LISR);
> -	napi_enable(&hw->napi);
> +	if (imask || hw->ports > 1) {
> +		imask |= Y2_IS_BASE;
> +		sky2_write32(hw, B0_IMSK, imask);
> +		sky2_read32(hw, B0_IMSK);
> +		sky2_read32(hw, B0_Y2_SP_LISR);
> +		napi_enable(&hw->napi);
> +	}
>  }
>  
>  static void sky2_restart(struct work_struct *work)

* Re: [PATCH 1/1] net/cadence: enable by default NET_ATMEL
From: Nicolas Ferre @ 2011-11-17  8:56 UTC
  To: Jean-Christophe PLAGNIOL-VILLARD; +Cc: linux-kernel, netdev
In-Reply-To: <1321385790-15056-1-git-send-email-plagnioj@jcrosoft.com>

On 11/15/2011 08:36 PM, Jean-Christophe PLAGNIOL-VILLARD wrote:
> so that the Atmel defconfigs continue to have network support
> as before
> 
> Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
> Cc: Nicolas Ferre <nicolas.ferre@atmel.com>

Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>

> ---
> Hi David,
> 
> 	can we have this for 3.2 so that Atmel continues to work as before?
> 
> Best Regards,
> J.
>  drivers/net/ethernet/cadence/Kconfig |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cadence/Kconfig b/drivers/net/ethernet/cadence/Kconfig
> index 98849a1..b48378a 100644
> --- a/drivers/net/ethernet/cadence/Kconfig
> +++ b/drivers/net/ethernet/cadence/Kconfig
> @@ -7,6 +7,7 @@ config HAVE_NET_MACB
>  
>  config NET_ATMEL
>  	bool "Atmel devices"
> +	default y
>  	depends on HAVE_NET_MACB || (ARM && ARCH_AT91RM9200)
>  	---help---
>  	  If you have a network (Ethernet) card belonging to this class, say Y.


-- 
Nicolas Ferre

* [PATCH v7] Phonet: set the pipe handle using setsockopt
From: Hemant Vilas RAMDASI @ 2011-11-17  9:29 UTC
  To: netdev-owner
  Cc: netdev, remi.denis-courmont, Dinesh Kumar Sharma, Hemant Ramdasi

From: Dinesh Kumar Sharma <dinesh.sharma@stericsson.com>

This provides flexibility to set the pipe handle
using setsockopt. The pipe can be enabled (if disabled) later
using ioctl.

Signed-off-by: Hemant Ramdasi <hemant.ramdasi@stericsson.com>
Signed-off-by: Dinesh Kumar Sharma <dinesh.sharma@stericsson.com>
---
 include/linux/phonet.h |    2 +
 net/phonet/pep.c       |   96 ++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 91 insertions(+), 7 deletions(-)

diff --git a/include/linux/phonet.h b/include/linux/phonet.h
index 6fb1384..1847ef9 100644
--- a/include/linux/phonet.h
+++ b/include/linux/phonet.h
@@ -37,6 +37,7 @@
 #define PNPIPE_ENCAP		1
 #define PNPIPE_IFINDEX		2
 #define PNPIPE_HANDLE		3
+#define PNPIPE_INITSTATE	4
 
 #define PNADDR_ANY		0
 #define PNADDR_BROADCAST	0xFC
@@ -48,6 +49,7 @@
 
 /* ioctls */
 #define SIOCPNGETOBJECT		(SIOCPROTOPRIVATE + 0)
+#define SIOCPNENABLEPIPE	(SIOCPROTOPRIVATE + 13)
 #define SIOCPNADDRESOURCE	(SIOCPROTOPRIVATE + 14)
 #define SIOCPNDELRESOURCE	(SIOCPROTOPRIVATE + 15)
 
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index f17fd84..7acd262 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -533,6 +533,29 @@ static int pep_connresp_rcv(struct sock *sk, struct sk_buff *skb)
 	return pipe_handler_send_created_ind(sk);
 }
 
+static int pep_enableresp_rcv(struct sock *sk, struct sk_buff *skb)
+{
+	struct pnpipehdr *hdr = pnp_hdr(skb);
+
+	if (hdr->error_code != PN_PIPE_NO_ERROR)
+		return -ECONNREFUSED;
+
+	return pep_indicate(sk, PNS_PIPE_ENABLED_IND, 0 /* sub-blocks */,
+		NULL, 0, GFP_ATOMIC);
+
+}
+
+static void pipe_start_flow_control(struct sock *sk)
+{
+	struct pep_sock *pn = pep_sk(sk);
+
+	if (!pn_flow_safe(pn->tx_fc)) {
+		atomic_set(&pn->tx_credits, 1);
+		sk->sk_write_space(sk);
+	}
+	pipe_grant_credits(sk, GFP_ATOMIC);
+}
+
 /* Queue an skb to an actively connected sock.
  * Socket lock must be held. */
 static int pipe_handler_do_rcv(struct sock *sk, struct sk_buff *skb)
@@ -578,13 +601,25 @@ static int pipe_handler_do_rcv(struct sock *sk, struct sk_buff *skb)
 			sk->sk_state = TCP_CLOSE_WAIT;
 			break;
 		}
+		if (pn->init_enable == PN_PIPE_DISABLE)
+			sk->sk_state = TCP_SYN_RECV;
+		else {
+			sk->sk_state = TCP_ESTABLISHED;
+			pipe_start_flow_control(sk);
+		}
+		break;
 
-		sk->sk_state = TCP_ESTABLISHED;
-		if (!pn_flow_safe(pn->tx_fc)) {
-			atomic_set(&pn->tx_credits, 1);
-			sk->sk_write_space(sk);
+	case PNS_PEP_ENABLE_RESP:
+		if (sk->sk_state != TCP_SYN_SENT)
+			break;
+
+		if (pep_enableresp_rcv(sk, skb)) {
+			sk->sk_state = TCP_CLOSE_WAIT;
+			break;
 		}
-		pipe_grant_credits(sk, GFP_ATOMIC);
+
+		sk->sk_state = TCP_ESTABLISHED;
+		pipe_start_flow_control(sk);
 		break;
 
 	case PNS_PEP_DISCONNECT_RESP:
@@ -863,14 +898,32 @@ static int pep_sock_connect(struct sock *sk, struct sockaddr *addr, int len)
 	int err;
 	u8 data[4] = { 0 /* sub-blocks */, PAD, PAD, PAD };
 
-	pn->pipe_handle = 1; /* anything but INVALID_HANDLE */
+	if (pn->pipe_handle == PN_PIPE_INVALID_HANDLE)
+		pn->pipe_handle = 1; /* anything but INVALID_HANDLE */
+
 	err = pipe_handler_request(sk, PNS_PEP_CONNECT_REQ,
-					PN_PIPE_ENABLE, data, 4);
+				pn->init_enable, data, 4);
 	if (err) {
 		pn->pipe_handle = PN_PIPE_INVALID_HANDLE;
 		return err;
 	}
+
 	sk->sk_state = TCP_SYN_SENT;
+
+	return 0;
+}
+
+static int pep_sock_enable(struct sock *sk, struct sockaddr *addr, int len)
+{
+	int err;
+
+	err = pipe_handler_request(sk, PNS_PEP_ENABLE_REQ, PAD,
+				NULL, 0);
+	if (err)
+		return err;
+
+	sk->sk_state = TCP_SYN_SENT;
+
 	return 0;
 }
 
@@ -894,6 +947,19 @@ static int pep_ioctl(struct sock *sk, int cmd, unsigned long arg)
 			answ = 0;
 		release_sock(sk);
 		return put_user(answ, (int __user *)arg);
+		break;
+
+	case SIOCPNENABLEPIPE:
+		lock_sock(sk);
+		if (sk->sk_state == TCP_SYN_SENT)
+			answ =  -EBUSY;
+		else if (sk->sk_state == TCP_ESTABLISHED)
+			answ = -EISCONN;
+		else
+			answ = pep_sock_enable(sk, NULL, 0);
+		release_sock(sk);
+		return answ;
+		break;
 	}
 
 	return -ENOIOCTLCMD;
@@ -959,6 +1025,18 @@ static int pep_setsockopt(struct sock *sk, int level, int optname,
 		}
 		goto out_norel;
 
+	case PNPIPE_HANDLE:
+		if ((sk->sk_state == TCP_CLOSE) &&
+			(val >= 0) && (val < PN_PIPE_INVALID_HANDLE))
+			pn->pipe_handle = val;
+		else
+			err = -EINVAL;
+		break;
+
+	case PNPIPE_INITSTATE:
+		pn->init_enable = !!val;
+		break;
+
 	default:
 		err = -ENOPROTOOPT;
 	}
@@ -994,6 +1072,10 @@ static int pep_getsockopt(struct sock *sk, int level, int optname,
 			return -EINVAL;
 		break;
 
+	case PNPIPE_INITSTATE:
+		val = pn->init_enable;
+		break;
+
 	default:
 		return -ENOPROTOOPT;
 	}
-- 
1.7.4.3

* Re: [PATCH 1/2] net: add network priority cgroup infrastructure
From: WANG Cong @ 2011-11-17  9:29 UTC
  To: netdev
In-Reply-To: <1321476666-8225-2-git-send-email-nhorman@tuxdriver.com>

On Wed, 16 Nov 2011 15:51:05 -0500, Neil Horman wrote:
 
> +static void cgrp_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp)
> +{
> +	struct cgroup_netprio_state *cs;
> +	struct net_device *dev;
> +
> +	cs = cgrp_netprio_state(cgrp);
> +	rtnl_lock();
> +	for_each_netdev(&init_net, dev) {
> +		if (dev->priomap)
> +			dev->priomap->priomap[cs->prioidx] = 0; 
> +	}
> +	rtnl_unlock();
> +	put_prioidx(cs->prioidx);
> +out_free:
> +	kfree(cs);
> +}

'out_free' is unused.


> +
> +static int write_priomap(struct cgroup *cgrp, struct cftype *cft, 
> +			 const char *buffer)
> +{
> +	char *devname = kstrdup(buffer, GFP_KERNEL); 
> +	int ret = -EINVAL;
> +	u32 prioidx = cgrp_netprio_state(cgrp)->prioidx; 
> +	unsigned long priority;
> +	char *priostr;
> +	struct net_device *dev;
> +
> +	devname = kstrdup(buffer, GFP_KERNEL); 


kstrdup() is called twice...


Thanks.
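
Both review comments point at the same pattern: duplicate the buffer exactly once, and let every exit go through a single label that frees it. A minimal userspace sketch follows, assuming the input has the form "<ifname> <priority>"; `parse_priomap()` is a hypothetical helper, with malloc()/free() standing in for kstrdup()/kfree(), and is not the code from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace analogue of write_priomap(): parses
 * "<ifname> <priority>" into ifname[] and *priority. */
static int parse_priomap(const char *buffer, char *ifname, size_t ifname_len,
			 unsigned long *priority)
{
	size_t len = strlen(buffer) + 1;
	unsigned long val;
	char *devname, *priostr, *end;
	int ret = -EINVAL;

	assert(ifname && priority);

	devname = malloc(len);		/* duplicated exactly once */
	if (!devname)
		return -ENOMEM;
	memcpy(devname, buffer, len);

	priostr = strchr(devname, ' ');
	if (!priostr)
		goto out_free;
	*priostr++ = '\0';		/* split name and number in place */

	if (devname[0] == '\0' || strlen(devname) >= ifname_len)
		goto out_free;
	if (*priostr < '0' || *priostr > '9')
		goto out_free;

	errno = 0;
	val = strtoul(priostr, &end, 10);
	if (errno || (*end != '\0' && *end != '\n'))
		goto out_free;

	strcpy(ifname, devname);	/* length was checked above */
	*priority = val;
	ret = 0;
out_free:
	free(devname);			/* the only place the copy is released */
	return ret;
}
```

Because the label is now reached by every path after the allocation, it is no longer unused, and there is no second allocation that could overwrite the first pointer and leak it.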

* Re: [PATCH net-next] r8169: Add 64bit statistics
From: Francois Romieu @ 2011-11-17  9:36 UTC
  To: Junchang Wang; +Cc: nic_swsd, eric.dumazet, netdev
In-Reply-To: <20111117064826.GA4429@Desktop-Junchang>

Junchang Wang <junchangwang@gmail.com> :
> 
> Switch to use ndo_get_stats64 to get 64bit statistics.
> Per cpu data is used to avoid lock operations.

The 816x chipsets have hardware stats registers. The driver already
uses them. Please use them more.

-- 
Ueimor

* Re: [PATCH] r8169: add module param for control of ASPM disable
From: Francois Romieu @ 2011-11-17  9:37 UTC
  To: Todd Broch
  Cc: Matthew Garrett, Realtek linux nic maintainers, netdev,
	Hayes Wang
In-Reply-To: <CA+iF6Rog3ptpmQZzhcRODmZUKN18_uw5t9xfpQjbJ86qKUA0eQ@mail.gmail.com>

Todd Broch <tbroch@chromium.org> :
[...]
>  I've tested on a Mobile Sandybridge platform only as I presently don't have
> access to any other systems w/ the r8169 h/w.

Sorry for the lack of specificity: the XID line from the r8169 driver would
be welcome. It should look like:


[    6.315053] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    6.315287] r8169 0000:03:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    6.315568] r8169 0000:03:00.0: setting latency timer to 64
[    6.315663] r8169 0000:03:00.0: irq 49 for MSI/MSI-X
[    6.315834] r8169 0000:03:00.0: eth0: RTL8168evl/8111evl at 0xffffc90000676000, 00:e0:4c:68:00:1f, XID 0c900800 IRQ 49
                                         ^^^^^^^^^^^^^^^^^^                                           ^^^^^^^^^^^^

Thanks.

-- 
Ueimor

* Re: [patch net-next 3/3] team: replicate options on register
From: Jiri Pirko @ 2011-11-17  9:57 UTC
  To: Eric Dumazet
  Cc: netdev, davem, bhutchings, shemminger, andy, fbl, jzupka, ivecera
In-Reply-To: <1321518736.3274.45.camel@edumazet-laptop>

Thu, Nov 17, 2011 at 09:32:16AM CET, eric.dumazet@gmail.com wrote:
>On Wednesday, 16 November 2011 at 22:09 +0100, Jiri Pirko wrote:
>
>> +
>> +int team_options_register(struct team *team,
>> +			  const struct team_option *option,
>> +			  size_t option_count)
>>  {
>>  	int i;
>> +	struct team_option *dst_opts[option_count];
>> +	int err;
>
>This kind of construct will trigger static analyzer alerts...


I thought this is ok to do in kernel. Should I replace that with
kmalloc/kfree?

>
>
>
>

* Re: [patch net-next 3/3] team: replicate options on register
From: Eric Dumazet @ 2011-11-17 10:00 UTC
  To: Jiri Pirko
  Cc: netdev, davem, bhutchings, shemminger, andy, fbl, jzupka, ivecera
In-Reply-To: <20111117095729.GA2093@minipsycho>

On Thursday, 17 November 2011 at 10:57 +0100, Jiri Pirko wrote:
> Thu, Nov 17, 2011 at 09:32:16AM CET, eric.dumazet@gmail.com wrote:
> >On Wednesday, 16 November 2011 at 22:09 +0100, Jiri Pirko wrote:
> >
> >> +
> >> +int team_options_register(struct team *team,
> >> +			  const struct team_option *option,
> >> +			  size_t option_count)
> >>  {
> >>  	int i;
> >> +	struct team_option *dst_opts[option_count];
> >> +	int err;
> >
> >This kind of construct will trigger static analyzer alerts...
> 
> 
> I thought this is ok to do in kernel. Should I replace that with
> kmalloc/kfree?

If you know the absolute limit of option_count, you could use

struct team_option *dst_opts[OPTION_COUNT_MAX];

If not, then a kmalloc()/kfree() is probably needed ;)
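
The heap-allocated alternative can be sketched in userspace, with calloc()/malloc()/free() standing in for kmalloc()/kfree(). This is a hedged illustration only: `struct option`, `options_replicate()` and `options_free()` are hypothetical names, and the assumption that registration copies each option is a guess from the patch title, not the team driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct team_option. */
struct option {
	const char *name;
	int value;
};

/* Free the first n copies and the pointer array itself. */
static void options_free(struct option **opts, size_t n)
{
	size_t i;

	if (!opts)
		return;
	for (i = 0; i < n; i++)
		free(opts[i]);
	free(opts);
}

/* Copy 'count' options into freshly allocated storage. The pointer array
 * lives on the heap instead of "struct option *dst[count]" on the stack,
 * so its size cannot overrun a small, fixed-size stack. */
static int options_replicate(const struct option *src, size_t count,
			     struct option ***out)
{
	struct option **dst;
	size_t i;

	assert(out != NULL);

	dst = calloc(count ? count : 1, sizeof(*dst));
	if (!dst)
		return -ENOMEM;

	for (i = 0; i < count; i++) {
		dst[i] = malloc(sizeof(*dst[i]));
		if (!dst[i]) {
			options_free(dst, i);	/* undo the partial work */
			return -ENOMEM;
		}
		memcpy(dst[i], &src[i], sizeof(*dst[i]));
	}

	*out = dst;
	return 0;
}
```

The trade-off is one extra allocation and an extra error path; in exchange, stack usage stays constant no matter what `count` a caller passes in.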

* Re: [PATCH 1/3] NET: MIPS: lantiq: make etop ethernet work on ase/ar9
From: John Crispin @ 2011-11-17 11:00 UTC
  To: David Miller; +Cc: netdev
In-Reply-To: <20111116.220220.2004448468476437215.davem@davemloft.net>

On 17/11/11 04:02, David Miller wrote:
> From: John Crispin <blogic@openwrt.org>
> Date: Wed, 16 Nov 2011 15:41:46 +0100
>
>> Extend the driver to handle the different DMA channel layout for AR9 and
>> Amazon-SE SoCs. The patch also adds support for the integrated PHY found
>> on Amazon-SE and the gigabit switch found inside the AR9.
>>
>> Signed-off-by: John Crispin <blogic@openwrt.org>
>> Cc: netdev@vger.kernel.org
> Since these patches (at least partially) modify MIPS files, please submit
> them via the MIPS maintainer.
>
> Feel free to add:
>
> Acked-by: David S. Miller <davem@davemloft.net>
>

Thank you !

* Re: [patch net-next 3/3] team: replicate options on register
From: Eric Dumazet @ 2011-11-17 10:03 UTC
  To: Jiri Pirko
  Cc: netdev, davem, bhutchings, shemminger, andy, fbl, jzupka, ivecera
In-Reply-To: <1321524023.2751.1.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Thursday, 17 November 2011 at 11:00 +0100, Eric Dumazet wrote:

> If you know the absolute limit of option_count, you could use
> 
> struct team_option *dst_opts[OPTION_COUNT_MAX];
> 
> If not, then a kmalloc()/kfree() is probably needed ;)
> 
> 

Sorry, I forgot to include the link explaining Linus's opinion on
variable-length arrays:

https://lkml.org/lkml/2011/10/23/25

* Re: [PATCH 1/6] net: add the nwhwconfig support
From: Giuseppe CAVALLARO @ 2011-11-17 10:17 UTC
  To: David Miller
  Cc: netdev, mamoroso, shiraz.hashim, armando.visconti, stuart.menefy
In-Reply-To: <20111117.030625.229541498275597867.davem@davemloft.net>

Hello David,

On 11/17/2011 9:06 AM, David Miller wrote:
> From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
> Date: Thu, 17 Nov 2011 09:01:40 +0100
>
>> Network drivers support hardware level configuration via utilities such as
>> ifconfig, ethtool and mii-tool. However sometimes these settings need to be
>> adjusted before a file system is available (typically if the root file system
>> uses NFS).
>
> No way, use an initial ramdisk.

Yes, I agree with you that a ramdisk is a solution :-) but this driver is
actually helping many users of stmmac on several platforms.
Hmm, I also think it can be useful.

For example, in my environment, I need to continuously boot a kernel on
ST boxes and mount a rootfs via NFS to have access to the full
(arm/sh - glibc/uclibc) distributions (to use several packages for
networking tests, X system etc).
This driver helps me to go faster in normal development.
I am aware that I could do that starting with a ramdisk, but with extra
steps and configuration.
I mean, I want to pass all my network configuration on the command line,
such as the ip= option, and this driver should not conflict with this
logic.

With this small driver it's possible to force link speed and duplex
modes when the kernel boots, and this helps me with validating HW and
performing tests as well.

In any case, the driver is on the mailing list and other people can test
it and provide feedback and enhancements.

Best Regards
Peppe


* [PATCH] net: use jump_label to shortcut RPS if not setup
From: Eric Dumazet @ 2011-11-17 10:33 UTC
  To: David Miller; +Cc: netdev, Tom Herbert

Most machines don't use RPS/RFS, and pay a fair amount of instructions in
netif_receive_skb() / get_rps_cpu() just to discover RPS/RFS is not
set up.

Add a jump_label named rps_needed.

If no device rps_map or global rps_sock_flow_table is set up,
netif_receive_skb() executes a single instruction instead of many,
including conditional jumps.

jmp +0    (if CONFIG_JUMP_LABEL=y)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
---
 include/linux/netdevice.h  |    5 +++++
 net/core/dev.c             |   14 ++++++--------
 net/core/net-sysfs.c       |    7 +++++--
 net/core/sysctl_net_core.c |    9 +++++++--
 4 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4d5698a..0bbe030 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -214,6 +214,11 @@ enum {
 #include <linux/cache.h>
 #include <linux/skbuff.h>
 
+#ifdef CONFIG_RPS
+#include <linux/jump_label.h>
+extern struct jump_label_key rps_needed;
+#endif
+
 struct neighbour;
 struct neigh_parms;
 struct sk_buff;
diff --git a/net/core/dev.c b/net/core/dev.c
index 26c49d5..4c3942c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2711,6 +2711,8 @@ EXPORT_SYMBOL(__skb_get_rxhash);
 struct rps_sock_flow_table __rcu *rps_sock_flow_table __read_mostly;
 EXPORT_SYMBOL(rps_sock_flow_table);
 
+struct jump_label_key rps_needed __read_mostly;
+
 static struct rps_dev_flow *
 set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 	    struct rps_dev_flow *rflow, u16 next_cpu)
@@ -3359,7 +3361,7 @@ int netif_receive_skb(struct sk_buff *skb)
 		return NET_RX_SUCCESS;
 
 #ifdef CONFIG_RPS
-	{
+	if (static_branch(&rps_needed)) {
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
 		int cpu, ret;
 
@@ -3370,16 +3372,12 @@ int netif_receive_skb(struct sk_buff *skb)
 		if (cpu >= 0) {
 			ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
 			rcu_read_unlock();
-		} else {
-			rcu_read_unlock();
-			ret = __netif_receive_skb(skb);
+			return ret;
 		}
-
-		return ret;
+		rcu_read_unlock();
 	}
-#else
-	return __netif_receive_skb(skb);
 #endif
+	return __netif_receive_skb(skb);
 }
 EXPORT_SYMBOL(netif_receive_skb);
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 602b141..db6c2f8 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -606,9 +606,12 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
 	rcu_assign_pointer(queue->rps_map, map);
 	spin_unlock(&rps_map_lock);
 
-	if (old_map)
+	if (map)
+		jump_label_inc(&rps_needed);
+	if (old_map) {
 		kfree_rcu(old_map, rcu);
-
+		jump_label_dec(&rps_needed);
+	}
 	free_cpumask_var(mask);
 	return len;
 }
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 77a65f0..d05559d 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -68,8 +68,13 @@ static int rps_sock_flow_sysctl(ctl_table *table, int write,
 
 		if (sock_table != orig_sock_table) {
 			rcu_assign_pointer(rps_sock_flow_table, sock_table);
-			synchronize_rcu();
-			vfree(orig_sock_table);
+			if (sock_table)
+				jump_label_inc(&rps_needed);
+			if (orig_sock_table) {
+				jump_label_dec(&rps_needed);
+				synchronize_rcu();
+				vfree(orig_sock_table);
+			}
 		}
 	}
 

^ permalink raw reply related

* Re: [PATCH net-next] r8169: Add 64bit statistics
From: Eric Dumazet @ 2011-11-17 10:51 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Junchang Wang, nic_swsd, netdev
In-Reply-To: <20111117093635.GA9112@electric-eye.fr.zoreil.com>

On Thursday, 17 November 2011 at 10:36 +0100, Francois Romieu wrote:
> Junchang Wang <junchangwang@gmail.com> :
> > 
> > Switch to use ndo_get_stats64 to get 64bit statistics.
> > Per cpu data is used to avoid lock operations.
> 
> The 816x chipsets have hardware stats registers. The driver already
> uses them. Please use them more.
> 

I would like to mention a possible bias.

I know that tg3 includes in its RX counters frames/bytes that were
dropped (because the NAPI handler could not keep up with the load).

They also include the FCS in the byte count.

When using software counters, we can compare the "ethtool -S" and
"ifconfig" ones.

But generally speaking, if hardware stats are available we should use
them and save a few instructions per packet ;)

^ permalink raw reply

* Re: [PATCH net-next] r8169: Add 64bit statistics
From: Junchang Wang @ 2011-11-17 10:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Stephen Hemminger, netdev, romieu, nic swsd
In-Reply-To: <1321517516.3274.40.camel@edumazet-laptop>

> Yes, look at sky2.c for a template
>
> drivers/net/ethernet/marvell/sky2.c contains code like that
> (different syncp for rx/tx)
>
> TX path:
>                        u64_stats_update_begin(&sky2->tx_stats.syncp);
>                        ++sky2->tx_stats.packets;
>                        sky2->tx_stats.bytes += skb->len;
>                        u64_stats_update_end(&sky2->tx_stats.syncp);
>
>
> RX path:
>
>        u64_stats_update_begin(&sky2->rx_stats.syncp);
>        sky2->rx_stats.packets += packets;
>        sky2->rx_stats.bytes += bytes;
>        u64_stats_update_end(&sky2->rx_stats.syncp);
>

Thanks, Eric.

I'm still confused about why we need two sync entries. Please correct
me if I'm wrong.

Take r8169 for example. All statistics entries are updated in
rtl8169_rx_interrupt() or rtl8169_tx_interrupt(). Those two functions
are called in rtl8169_poll().
As far as I know, rtl8169_poll() is protected by the NAPI_STATE_SCHED bit
so that it runs on a single core at any one moment. So it is not
compulsory to use two sync entries.

One benefit of two sync entries is that readers can avoid many retries.
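
The retry point above can be illustrated with a small userspace sketch.
It is not the kernel's u64_stats_sync implementation: struct stats,
struct nic_stats, stats_update() and stats_read() are hypothetical
names, and C11 atomics stand in for the kernel primitives. A single
writer per counter pair is assumed, as in the NAPI poll case. Each
direction has its own sequence counter, so a reader of the RX pair only
retries when an RX update overlaps its read, never because of TX
activity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace analogue of one syncp-protected counter pair. */
struct stats {
	atomic_uint seq;		/* even: stable, odd: update in progress */
	_Atomic uint64_t packets;
	_Atomic uint64_t bytes;
};

/* Separate sequence counters for RX and TX, as in the sky2 excerpt. */
struct nic_stats {
	struct stats rx;
	struct stats tx;
};

/* Writer side; assumes only one writer per struct stats at a time. */
static void stats_update(struct stats *s, uint64_t packets, uint64_t bytes)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	assert((seq & 1) == 0);	/* no update may already be in progress */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->packets,
		atomic_load_explicit(&s->packets, memory_order_relaxed) + packets,
		memory_order_relaxed);
	atomic_store_explicit(&s->bytes,
		atomic_load_explicit(&s->bytes, memory_order_relaxed) + bytes,
		memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side; loops until it has seen a snapshot with no update in between. */
static void stats_read(struct stats *s, uint64_t *packets, uint64_t *bytes)
{
	unsigned int begin, end;

	do {
		begin = atomic_load_explicit(&s->seq, memory_order_acquire);
		*packets = atomic_load_explicit(&s->packets, memory_order_relaxed);
		*bytes = atomic_load_explicit(&s->bytes, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((begin & 1) || begin != end);
}
```

With a single shared sequence counter, every TX update would also
invalidate an in-flight read of the RX pair, which is the extra retry
cost mentioned above.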

Thanks.
-- 
--Junchang

^ permalink raw reply

* Re: [PATCH] net: use jump_label to shortcut RPS if not setup
From: Eric Dumazet @ 2011-11-17 11:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Tom Herbert
In-Reply-To: <1321526027.2751.16.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Thursday, 17 November 2011 at 11:33 +0100, Eric Dumazet wrote:
> Most machines don't use RPS/RFS, and pay a fair amount of instructions in
> netif_receive_skb() / get_rps_cpu() just to discover RPS/RFS is not
> set up.
> 
> Add a jump_label named rps_needed.
> 
> If no device rps_map or global rps_sock_flow_table is set up,
> netif_receive_skb() executes a single instruction instead of many,
> including conditional jumps.
> 
> jmp +0    (if CONFIG_JUMP_LABEL=y)
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Tom Herbert <therbert@google.com>
> ---

I'll send a V2 to take care of netif_rx() as well.
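
The reference-counting scheme in the quoted description can be sketched
in userspace C. This is only an illustration under stated assumptions: a
plain integer replaces the jump label key (so the check is an ordinary
load and compare, not a patched branch), and rps_needed_count,
set_map() and receive_path() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the jump label key: counts active RPS users. */
static int rps_needed_count;

static void rps_needed_inc(void)
{
	rps_needed_count++;
}

static void rps_needed_dec(void)
{
	assert(rps_needed_count > 0);	/* every dec must pair with an earlier inc */
	rps_needed_count--;
}

static bool rps_needed(void)
{
	return rps_needed_count > 0;
}

/*
 * Same ordering as the store_rps_map() hunk: take a reference for the new
 * map (if any) before dropping the one held by the old map (if any), so
 * replacing one map with another never lets the count reach zero.
 */
static void set_map(const void **slot, const void *map)
{
	const void *old_map = *slot;

	*slot = map;
	if (map)
		rps_needed_inc();
	if (old_map)
		rps_needed_dec();
}

/* Returns 1 if a packet would enter the RPS steering code, 0 otherwise. */
static int receive_path(void)
{
	if (rps_needed())
		return 1;
	return 0;
}
```

Installing a map makes receive_path() return 1, replacing it keeps the
result at 1, and clearing it with set_map(&slot, NULL) brings it back
to 0.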

^ permalink raw reply

