Netdev List


* Re: 3.4-rc: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
From: Tomas Papan @ 2012-06-12  6:51 UTC
  To: Francois Romieu; +Cc: netdev
In-Reply-To: <20120612052631.GA14567@electric-eye.fr.zoreil.com>

Hi Francois,

I tried your patch; so far no exception has been generated (uptime 1
hour). I'll keep it running for a day and then get back to you.
Anyway, thanks for the patch.

Regards
Tomas

On Tue, Jun 12, 2012 at 7:26 AM, Francois Romieu <romieu@fr.zoreil.com> wrote:
> Tomas Papan <tomas.papan@gmail.com> :
> [...]
>> [    2.780758] r8169 0000:03:00.0: eth1: RTL8168e/8111e at
>> 0xffffc9000001c000, 80:ee:73:10:ad:44, XID 0c200000 IRQ 46
>
> Let's see if it behaves like RTL_GIGA_MAC_VER_34.
>
> Can you try the patch below ?
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index bbacb37..da46588 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -3766,6 +3766,7 @@ static void rtl_init_rxcfg(struct rtl8169_private *tp)
>        case RTL_GIGA_MAC_VER_22:
>        case RTL_GIGA_MAC_VER_23:
>        case RTL_GIGA_MAC_VER_24:
> +       case RTL_GIGA_MAC_VER_33:
>                RTL_W32(RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
>                break;
>        default:
>
>


* Re: Possible deadlock in ipv6?
From: David Miller @ 2012-06-12  6:54 UTC
  To: eric.dumazet; +Cc: vdavydov, netdev
In-Reply-To: <1338998314.26966.12.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 06 Jun 2012 17:58:34 +0200

> And it seems this neigh_down() can be removed, it's called later
> (after dev->ip6_ptr is cleared)

It is unclear whether we need to do the neigh_down() in both
the 'how' and '!how' cases.  If so then we can't make this change.


* [PATCH net-next v2 5/5] net: sh_eth: use NAPI
From: Shimoda, Yoshihiro @ 2012-06-12  6:59 UTC
  To: netdev; +Cc: SH-Linux

This patch modifies the driver to use NAPI.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
---
 about v2:
  - fix the condition which calls the netif_stop_queue()

 drivers/net/ethernet/renesas/sh_eth.c |  101 +++++++++++++++++++++------------
 drivers/net/ethernet/renesas/sh_eth.h |    3 +
 2 files changed, 67 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index c64a31c..9196777 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -1035,7 +1035,7 @@ static int sh_eth_txfree(struct net_device *ndev)
 }

 /* Packet receive function */
-static int sh_eth_rx(struct net_device *ndev)
+static int sh_eth_rx(struct net_device *ndev, int *work, int budget)
 {
 	struct sh_eth_private *mdp = netdev_priv(ndev);
 	struct sh_eth_rxdesc *rxdesc;
@@ -1047,7 +1047,8 @@ static int sh_eth_rx(struct net_device *ndev)
 	u32 desc_status;

 	rxdesc = &mdp->rx_ring[entry];
-	while (!(rxdesc->status & cpu_to_edmac(mdp, RD_RACT))) {
+	while (!(rxdesc->status & cpu_to_edmac(mdp, RD_RACT)) &&
+	       *work < budget) {
 		desc_status = edmac_to_cpu(mdp, rxdesc->status);
 		pkt_len = rxdesc->frame_length;

@@ -1087,13 +1088,14 @@ static int sh_eth_rx(struct net_device *ndev)
 				skb_reserve(skb, NET_IP_ALIGN);
 			skb_put(skb, pkt_len);
 			skb->protocol = eth_type_trans(skb, ndev);
-			netif_rx(skb);
+			netif_receive_skb(skb);
 			ndev->stats.rx_packets++;
 			ndev->stats.rx_bytes += pkt_len;
 		}
 		rxdesc->status |= cpu_to_edmac(mdp, RD_RACT);
 		entry = (++mdp->cur_rx) % mdp->num_rx_ring;
 		rxdesc = &mdp->rx_ring[entry];
+		(*work)++;
 	}

 	/* Refill the Rx ring buffers. */
@@ -1125,7 +1127,7 @@ static int sh_eth_rx(struct net_device *ndev)

 	/* Restart Rx engine if stopped. */
 	/* If we don't need to check status, don't. -KDU */
-	if (!(sh_eth_read(ndev, EDRRR) & EDRRR_R)) {
+	if (*work < budget && !(sh_eth_read(ndev, EDRRR) & EDRRR_R)) {
 		/* fix the values for the next receiving */
 		mdp->cur_rx = mdp->dirty_rx = (sh_eth_read(ndev, RDFAR) -
 					       sh_eth_read(ndev, RDLAR)) >> 4;
@@ -1281,38 +1283,61 @@ static irqreturn_t sh_eth_interrupt(int irq, void *netdev)

 	/* Get interrpt stat */
 	intr_status = sh_eth_read(ndev, EESR);
-	/* Clear interrupt */
 	if (intr_status & (EESR_FRC | EESR_RMAF | EESR_RRF |
 			EESR_RTLF | EESR_RTSF | EESR_PRE | EESR_CERF |
 			cd->tx_check | cd->eesr_err_check)) {
-		sh_eth_write(ndev, intr_status, EESR);
+		if (napi_schedule_prep(&mdp->napi)) {
+			/* Disable interrupts of the channel */
+			sh_eth_write(ndev, 0, EESIPR);
+			__napi_schedule(&mdp->napi);
+		}
 		ret = IRQ_HANDLED;
-	} else
-		goto other_irq;
-
-	if (intr_status & (EESR_FRC | /* Frame recv*/
-			EESR_RMAF | /* Multi cast address recv*/
-			EESR_RRF  | /* Bit frame recv */
-			EESR_RTLF | /* Long frame recv*/
-			EESR_RTSF | /* short frame recv */
-			EESR_PRE  | /* PHY-LSI recv error */
-			EESR_CERF)){ /* recv frame CRC error */
-		sh_eth_rx(ndev);
 	}

-	/* Tx Check */
-	if (intr_status & cd->tx_check) {
-		sh_eth_txfree(ndev);
-		netif_wake_queue(ndev);
+	spin_unlock(&mdp->lock);
+
+	return ret;
+}
+
+static int sh_eth_poll(struct napi_struct *napi, int budget)
+{
+	struct sh_eth_private *mdp = container_of(napi, struct sh_eth_private,
+						  napi);
+	struct net_device *ndev = mdp->ndev;
+	struct sh_eth_cpu_data *cd = mdp->cd;
+	int work_done = 0, txfree_num;
+	u32 intr_status = sh_eth_read(ndev, EESR);
+
+	/* Clear interrupt flags */
+	sh_eth_write(ndev, intr_status, EESR);
+
+	/* check txdesc */
+	txfree_num = sh_eth_txfree(ndev);
+	if (txfree_num) {
+		netif_tx_lock(ndev);
+		if (netif_queue_stopped(ndev))
+			netif_wake_queue(ndev);
+		netif_tx_unlock(ndev);
 	}

+	/* check rxdesc */
+	sh_eth_rx(ndev, &work_done, budget);
+
+	/* check error flags */
 	if (intr_status & cd->eesr_err_check)
 		sh_eth_error(ndev, intr_status);

-other_irq:
-	spin_unlock(&mdp->lock);
+	/* get current interrupt flags */
+	intr_status = sh_eth_read(ndev, EESR);

-	return ret;
+	/* check whether this driver should call napi_complete() */
+	if (work_done < budget) {
+		napi_complete(napi);
+		/* Enable all interrupts */
+		sh_eth_write(ndev, cd->eesipr_value, EESIPR);
+	}
+
+	return work_done;
 }

 /* PHY state control function */
@@ -1545,6 +1570,7 @@ static int sh_eth_set_ringparam(struct net_device *ndev,
 		/* Stop the chip's Tx and Rx processes. */
 		sh_eth_write(ndev, 0, EDTRR);
 		sh_eth_write(ndev, 0, EDRRR);
+		napi_disable(&mdp->napi);
 		synchronize_irq(ndev->irq);
 	}

@@ -1569,6 +1595,7 @@ static int sh_eth_set_ringparam(struct net_device *ndev,
 	}

 	if (netif_running(ndev)) {
+		napi_enable(&mdp->napi);
 		sh_eth_write(ndev, mdp->cd->eesipr_value, EESIPR);
 		/* Setting the Rx mode will start the Rx process. */
 		sh_eth_write(ndev, EDRRR_R, EDRRR);
@@ -1600,6 +1627,8 @@ static int sh_eth_open(struct net_device *ndev)

 	pm_runtime_get_sync(&mdp->pdev->dev);

+	napi_enable(&mdp->napi);
+
 	ret = request_irq(ndev->irq, sh_eth_interrupt,
 #if defined(CONFIG_CPU_SUBTYPE_SH7763) || \
 	defined(CONFIG_CPU_SUBTYPE_SH7764) || \
@@ -1678,19 +1707,6 @@ static int sh_eth_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 	struct sh_eth_private *mdp = netdev_priv(ndev);
 	struct sh_eth_txdesc *txdesc;
 	u32 entry;
-	unsigned long flags;
-
-	spin_lock_irqsave(&mdp->lock, flags);
-	if ((mdp->cur_tx - mdp->dirty_tx) >= (mdp->num_tx_ring - 4)) {
-		if (!sh_eth_txfree(ndev)) {
-			if (netif_msg_tx_queued(mdp))
-				dev_warn(&ndev->dev, "TxFD exhausted.\n");
-			netif_stop_queue(ndev);
-			spin_unlock_irqrestore(&mdp->lock, flags);
-			return NETDEV_TX_BUSY;
-		}
-	}
-	spin_unlock_irqrestore(&mdp->lock, flags);

 	entry = mdp->cur_tx % mdp->num_tx_ring;
 	mdp->tx_skbuff[entry] = skb;
@@ -1716,6 +1732,12 @@ static int sh_eth_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 	if (!(sh_eth_read(ndev, EDTRR) & sh_eth_get_edtrr_trns(mdp)))
 		sh_eth_write(ndev, sh_eth_get_edtrr_trns(mdp), EDTRR);

+	if ((mdp->cur_tx - mdp->dirty_tx) >= (mdp->num_tx_ring - 4)) {
+		if (netif_msg_tx_queued(mdp))
+			dev_warn(&ndev->dev, "TxFD exhausted.\n");
+		netif_stop_queue(ndev);
+	}
+
 	return NETDEV_TX_OK;
 }

@@ -1739,6 +1761,8 @@ static int sh_eth_close(struct net_device *ndev)
 		phy_disconnect(mdp->phydev);
 	}

+	napi_disable(&mdp->napi);
+
 	free_irq(ndev->irq, ndev);

 	/* Free all the skbuffs in the Rx queue. */
@@ -2368,6 +2392,9 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
 #endif
 	sh_eth_set_default_cpu_data(mdp->cd);

+	mdp->ndev = ndev;
+	netif_napi_add(ndev, &mdp->napi, sh_eth_poll, SH_ETH_NAPI_WEIGHT);
+
 	/* set function */
 	ndev->netdev_ops = &sh_eth_netdev_ops;
 	SET_ETHTOOL_OPS(ndev, &sh_eth_ethtool_ops);
diff --git a/drivers/net/ethernet/renesas/sh_eth.h b/drivers/net/ethernet/renesas/sh_eth.h
index f1dbc27..93dad7b 100644
--- a/drivers/net/ethernet/renesas/sh_eth.h
+++ b/drivers/net/ethernet/renesas/sh_eth.h
@@ -35,6 +35,7 @@
 #define PKT_BUF_SZ		1538
 #define SH_ETH_TSU_TIMEOUT_MS	500
 #define SH_ETH_TSU_CAM_ENTRIES	32
+#define SH_ETH_NAPI_WEIGHT	32

 enum {
 	/* E-DMAC registers */
@@ -728,6 +729,8 @@ struct sh_eth_private {
 	int duplex;
 	int port;		/* for TSU */
 	int vlan_num_ids;	/* for VLAN tag filter */
+	struct napi_struct napi;
+	struct net_device *ndev;

 	unsigned no_ether_link:1;
 	unsigned ether_link_active_low:1;
-- 
1.7.1


* Re: [PATCH] net/sh-eth: Add support selecting MII function for SH7734 and R8A7740
From: Nobuhiro Iwamatsu @ 2012-06-12  7:28 UTC
  To: David Miller; +Cc: netdev
In-Reply-To: <20120611.164717.1390327462404819704.davem@davemloft.net>

David Miller wrote:
> From: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
> Date: Fri,  8 Jun 2012 15:01:51 +0900
> 
>> +	unsigned select_mii:1;	/* EtherC have RMII_MII (MII select register) */
> 
> Nothing tests this value.
> 

Sorry, I sent an old patch. I will send a new patch soon.
Best regards,
   Nobuhiro


* [PATCH v2] net/sh-eth: Add support selecting MII function for SH7734 and R8A7740
From: Nobuhiro Iwamatsu @ 2012-06-12  7:29 UTC
  To: netdev; +Cc: Nobuhiro Iwamatsu

The Ethernet IP of SH7734 and R8A7740 has an MII select register.
The user needs to set its value according to the MII type in use.
This adds a function to change the value of this register.

Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
---
 V2: Fix the check by select_mii.
 drivers/net/ethernet/renesas/sh_eth.c |  106 ++++++++++++++++++++-------------
 drivers/net/ethernet/renesas/sh_eth.h |    1 +
 2 files changed, 65 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index be3c221..5358804 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -49,6 +49,33 @@
 		NETIF_MSG_RX_ERR| \
 		NETIF_MSG_TX_ERR)
 
+#if defined(CONFIG_CPU_SUBTYPE_SH7734) || defined(CONFIG_CPU_SUBTYPE_SH7763) || \
+	defined(CONFIG_ARCH_R8A7740)
+static void sh_eth_select_mii(struct net_device *ndev)
+{
+	u32 value = 0x0;
+	struct sh_eth_private *mdp = netdev_priv(ndev);
+
+	switch (mdp->phy_interface) {
+	case PHY_INTERFACE_MODE_GMII:
+		value = 0x2;
+		break;
+	case PHY_INTERFACE_MODE_MII:
+		value = 0x1;
+		break;
+	case PHY_INTERFACE_MODE_RMII:
+		value = 0x0;
+		break;
+	default:
+		pr_warn("PHY interface mode was not setup. Set to MII.\n");
+		value = 0x1;
+		break;
+	}
+
+	sh_eth_write(ndev, value, RMII_MII);
+}
+#endif
+
 /* There is CPU dependent code */
 #if defined(CONFIG_CPU_SUBTYPE_SH7724)
 #define SH_ETH_RESET_DEFAULT	1
@@ -283,6 +310,7 @@ static struct sh_eth_cpu_data *sh_eth_get_cpu_data(struct sh_eth_private *mdp)
 #elif defined(CONFIG_CPU_SUBTYPE_SH7734) || defined(CONFIG_CPU_SUBTYPE_SH7763)
 #define SH_ETH_HAS_TSU	1
 static void sh_eth_reset_hw_crc(struct net_device *ndev);
+
 static void sh_eth_chip_reset(struct net_device *ndev)
 {
 	struct sh_eth_private *mdp = netdev_priv(ndev);
@@ -292,35 +320,6 @@ static void sh_eth_chip_reset(struct net_device *ndev)
 	mdelay(1);
 }
 
-static void sh_eth_reset(struct net_device *ndev)
-{
-	int cnt = 100;
-
-	sh_eth_write(ndev, EDSR_ENALL, EDSR);
-	sh_eth_write(ndev, sh_eth_read(ndev, EDMR) | EDMR_SRST_GETHER, EDMR);
-	while (cnt > 0) {
-		if (!(sh_eth_read(ndev, EDMR) & 0x3))
-			break;
-		mdelay(1);
-		cnt--;
-	}
-	if (cnt == 0)
-		printk(KERN_ERR "Device reset fail\n");
-
-	/* Table Init */
-	sh_eth_write(ndev, 0x0, TDLAR);
-	sh_eth_write(ndev, 0x0, TDFAR);
-	sh_eth_write(ndev, 0x0, TDFXR);
-	sh_eth_write(ndev, 0x0, TDFFR);
-	sh_eth_write(ndev, 0x0, RDLAR);
-	sh_eth_write(ndev, 0x0, RDFAR);
-	sh_eth_write(ndev, 0x0, RDFXR);
-	sh_eth_write(ndev, 0x0, RDFFR);
-
-	/* Reset HW CRC register */
-	sh_eth_reset_hw_crc(ndev);
-}
-
 static void sh_eth_set_duplex(struct net_device *ndev)
 {
 	struct sh_eth_private *mdp = netdev_priv(ndev);
@@ -377,9 +376,43 @@ static struct sh_eth_cpu_data sh_eth_my_cpu_data = {
 	.tsu		= 1,
 #if defined(CONFIG_CPU_SUBTYPE_SH7734)
 	.hw_crc     = 1,
+	.select_mii = 1,
 #endif
 };
 
+static void sh_eth_reset(struct net_device *ndev)
+{
+	int cnt = 100;
+
+	sh_eth_write(ndev, EDSR_ENALL, EDSR);
+	sh_eth_write(ndev, sh_eth_read(ndev, EDMR) | EDMR_SRST_GETHER, EDMR);
+	while (cnt > 0) {
+		if (!(sh_eth_read(ndev, EDMR) & 0x3))
+			break;
+		mdelay(1);
+		cnt--;
+	}
+	if (cnt == 0)
+		printk(KERN_ERR "Device reset fail\n");
+
+	/* Table Init */
+	sh_eth_write(ndev, 0x0, TDLAR);
+	sh_eth_write(ndev, 0x0, TDFAR);
+	sh_eth_write(ndev, 0x0, TDFXR);
+	sh_eth_write(ndev, 0x0, TDFFR);
+	sh_eth_write(ndev, 0x0, RDLAR);
+	sh_eth_write(ndev, 0x0, RDFAR);
+	sh_eth_write(ndev, 0x0, RDFXR);
+	sh_eth_write(ndev, 0x0, RDFFR);
+
+	/* Reset HW CRC register */
+	sh_eth_reset_hw_crc(ndev);
+
+	/* Select MII mode */
+	if (sh_eth_my_cpu_data.select_mii)
+		sh_eth_select_mii(ndev);
+}
+
 static void sh_eth_reset_hw_crc(struct net_device *ndev)
 {
 	if (sh_eth_my_cpu_data.hw_crc)
@@ -397,19 +430,7 @@ static void sh_eth_chip_reset(struct net_device *ndev)
 	sh_eth_tsu_write(mdp, ARSTR_ARSTR, ARSTR);
 	mdelay(1);
 
-	switch (mdp->phy_interface) {
-	case PHY_INTERFACE_MODE_GMII:
-		mii = 2;
-		break;
-	case PHY_INTERFACE_MODE_MII:
-		mii = 1;
-		break;
-	case PHY_INTERFACE_MODE_RMII:
-	default:
-		mii = 0;
-		break;
-	}
-	sh_eth_write(ndev, mii, RMII_MII);
+	sh_eth_select_mii(ndev);
 }
 
 static void sh_eth_reset(struct net_device *ndev)
@@ -492,6 +513,7 @@ static struct sh_eth_cpu_data sh_eth_my_cpu_data = {
 	.no_trimd	= 1,
 	.no_ade		= 1,
 	.tsu		= 1,
+	.select_mii = 1,
 };
 
 #elif defined(CONFIG_CPU_SUBTYPE_SH7619)
diff --git a/drivers/net/ethernet/renesas/sh_eth.h b/drivers/net/ethernet/renesas/sh_eth.h
index 57b8e1f..d6763b1392 100644
--- a/drivers/net/ethernet/renesas/sh_eth.h
+++ b/drivers/net/ethernet/renesas/sh_eth.h
@@ -757,6 +757,7 @@ struct sh_eth_cpu_data {
 	unsigned no_trimd:1;		/* E-DMAC DO NOT have TRIMD */
 	unsigned no_ade:1;	/* E-DMAC DO NOT have ADE bit in EESR */
 	unsigned hw_crc:1;	/* E-DMAC have CSMR */
+	unsigned select_mii:1;	/* EtherC have RMII_MII (MII select register) */
 };
 
 struct sh_eth_private {
-- 
1.7.10
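
The mapping performed by the new sh_eth_select_mii() helper can be checked in isolation. The sketch below is a user-space model under stated assumptions: `fake_phy_mode` and `fake_mii_select_value` are hypothetical names standing in for the kernel's PHY interface constants and for the value written to RMII_MII. It also makes one behavioural detail of the patch visible: the helper falls back to MII (0x1) for an unset mode, whereas the switch removed from the R8A7740 sh_eth_chip_reset() grouped the default case with RMII (0).

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's PHY interface mode constants. */
enum fake_phy_mode {
	FAKE_MODE_UNSET,
	FAKE_MODE_MII,
	FAKE_MODE_GMII,
	FAKE_MODE_RMII,
};

/* Mirrors the switch in the v2 helper: unknown modes select MII (0x1). */
static uint32_t fake_mii_select_value(enum fake_phy_mode mode)
{
	switch (mode) {
	case FAKE_MODE_GMII:
		return 0x2;
	case FAKE_MODE_MII:
		return 0x1;
	case FAKE_MODE_RMII:
		return 0x0;
	default:
		/* The patch prints a warning here before defaulting to MII. */
		return 0x1;
	}
}
```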


* Re: [PATCH v2] net/sh-eth: Add support selecting MII function for SH7734 and R8A7740
From: David Miller @ 2012-06-12  7:29 UTC
  To: nobuhiro.iwamatsu.yj; +Cc: netdev
In-Reply-To: <1339486142-32480-1-git-send-email-nobuhiro.iwamatsu.yj@renesas.com>

From: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Date: Tue, 12 Jun 2012 16:29:02 +0900

> @@ -492,6 +513,7 @@ static struct sh_eth_cpu_data sh_eth_my_cpu_data = {
>  	.no_trimd	= 1,
>  	.no_ade		= 1,
>  	.tsu		= 1,
> +	.select_mii = 1,
>  };
>  

Indent this new line consistently with those around it.


* Re: Difficulties to get 1Gbps on be2net ethernet card
From: Jean-Michel Hautbois @ 2012-06-12  8:24 UTC
  To: Eric Dumazet; +Cc: Sathya.Perla, netdev
In-Reply-To: <CAL8zT=i6zruaa0-bc-MG4FyhdM8RDtARd2E_o0+G-=p7RsJdGw@mail.gmail.com>

2012/6/8 Jean-Michel Hautbois <jhautbois@gmail.com>:
> 2012/6/8 Eric Dumazet <eric.dumazet@gmail.com>:
>> On Fri, 2012-06-08 at 10:14 +0200, Jean-Michel Hautbois wrote:
>>> 2012/6/8 Eric Dumazet <eric.dumazet@gmail.com>:
>>> > On Thu, 2012-06-07 at 14:54 +0200, Jean-Michel Hautbois wrote:
>>> >
>>> >> eth1      Link encap:Ethernet  HWaddr 68:b5:99:b9:8d:d4
>>> >>           UP BROADCAST RUNNING SLAVE MULTICAST  MTU:4096  Metric:1
>>> >>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>> >>           TX packets:15215387 errors:0 dropped:0 overruns:0 carrier:0
>>> >>           collisions:0 txqueuelen:1000
>>> >>           RX bytes:0 (0.0 B)  TX bytes:61476524359 (57.2 GiB)
>>> >
>>> >> qdisc mq 0: dev eth1 root
>>> >>  Sent 61476524359 bytes 15215387 pkt (dropped 45683472, overlimits 0
>>> >> requeues 17480)
>>> >
>>> > OK, and "tc -s -d cl show dev eth1"
>>> >
>>> > (How many queues are really used)
>>> >
>>> >
>>> >
>>>
>>> tc -s -d cl show dev eth1
>>> class mq :1 root
>>>  Sent 9798071746 bytes 2425410 pkt (dropped 3442405, overlimits 0 requeues 2747)
>>>  backlog 0b 0p requeues 2747
>>> class mq :2 root
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>> class mq :3 root
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>> class mq :4 root
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>> class mq :5 root
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>> class mq :6 root
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>> class mq :7 root
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>> class mq :8 root
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>
>>
>> Do you have the same distribution on old kernels as well ?
>> (ie only queue 0 is used)
>>
>>
>>
>
> On the old kernel, there is nothing returned by this command.
>
> JM

I used perf to get more information.
Here is the result of "perf record -a sleep 10" (I kept only the top entries):
     6.93%    ModuleTester  [kernel.kallsyms]                 [k] copy_user_generic_string
     2.99%         swapper  [kernel.kallsyms]                 [k] mwait_idle
     2.60%          kipmi0  [ipmi_si]                         [k] port_inb
     1.75%         swapper  [kernel.kallsyms]                 [k] rb_prev
     1.63%    ModuleTester  [kernel.kallsyms]                 [k] _raw_spin_lock
     1.43%     NodeManager  [kernel.kallsyms]                 [k] delay_tsc
     0.90%    ModuleTester  [kernel.kallsyms]                 [k] clear_page_c
     0.88%    ModuleTester  [kernel.kallsyms]                 [k] dev_queue_xmit
     0.80%              ip  [kernel.kallsyms]                 [k] snmp_fold_field
     0.73%    ModuleTester  [kernel.kallsyms]                 [k] clflush_cache_range
     0.69%            grep  [kernel.kallsyms]                 [k] page_fault
     0.61%              sh  [kernel.kallsyms]                 [k] page_fault
     0.59%    ModuleTester  [kernel.kallsyms]                 [k] udp_sendmsg
     0.55%    ModuleTester  [kernel.kallsyms]                 [k] _raw_read_lock
     0.53%              sh  [kernel.kallsyms]                 [k] unmap_vmas
     0.52%    ModuleTester  [kernel.kallsyms]                 [k] rb_prev
     0.51%    ModuleTester  [kernel.kallsyms]                 [k] find_busiest_group
     0.49%    ModuleTester  [kernel.kallsyms]                 [k] __ip_make_skb
     0.48%    ModuleTester  [kernel.kallsyms]                 [k] sock_alloc_send_pskb
     0.48%    ModuleTester  libpthread-2.7.so                 [.] pthread_mutex_lock
     0.47%    ModuleTester  [kernel.kallsyms]                 [k] __netif_receive_skb
     0.44%              ip  [kernel.kallsyms]                 [k] find_next_bit
     0.43%         swapper  [kernel.kallsyms]                 [k] clflush_cache_range
     0.41%              ps  [kernel.kallsyms]                 [k] format_decode
     0.41%    ModuleTester  [bonding]                         [k] bond_start_xmit
     0.39%    ModuleTester  [be2net]                          [k] be_xmit
     0.39%    ModuleTester  [kernel.kallsyms]                 [k] __ip_append_data
     0.38%    ModuleTester  [kernel.kallsyms]                 [k] netif_rx
     0.37%         swapper  [be2net]                          [k] be_poll
     0.37%         swapper  [kernel.kallsyms]                 [k] ktime_get
     0.37%              sh  [kernel.kallsyms]                 [k] copy_page_c
     0.36%         swapper  [kernel.kallsyms]                 [k] irq_entries_start
     0.36%    ModuleTester  [kernel.kallsyms]                 [k] __alloc_pages_nodemask
     0.35%    ModuleTester  [kernel.kallsyms]                 [k] __slab_free
     0.35%    ModuleTester  [kernel.kallsyms]                 [k] ip_mc_output
     0.34%    ModuleTester  [kernel.kallsyms]                 [k] skb_release_data
     0.33%              ip  [kernel.kallsyms]                 [k] page_fault
     0.33%    ModuleTester  [kernel.kallsyms]                 [k] udp_send_skb

And here is the perf record -a result without bonding:
     2.49%     ModuleTester  [kernel.kallsyms]               [k] csum_partial_copy_generic
     1.35%     ModuleTester  [kernel.kallsyms]               [k] _raw_spin_lock
     1.29%     ModuleTester  [kernel.kallsyms]               [k] clflush_cache_range
     1.16%       jobprocess  [kernel.kallsyms]               [k] rb_prev
     1.01%       jobprocess  [kernel.kallsyms]               [k] clflush_cache_range
     0.81%     ModuleTester  [be2net]                        [k] be_xmit
     0.78%       jobprocess  [kernel.kallsyms]               [k] __slab_free
     0.77%          swapper  [kernel.kallsyms]               [k] mwait_idle
     0.72%     ModuleTester  [kernel.kallsyms]               [k] __domain_mapping
     0.66%       jobprocess  [kernel.kallsyms]               [k] _raw_spin_lock
     0.59%       jobprocess  [kernel.kallsyms]               [k] _raw_spin_lock_irqsave
     0.56%     ModuleTester  [kernel.kallsyms]               [k] rb_prev
     0.53%          swapper  [kernel.kallsyms]               [k] rb_prev
     0.49%     ModuleTester  [kernel.kallsyms]               [k] sock_wmalloc
     0.47%       jobprocess  [be2net]                        [k] be_poll
     0.47%     ModuleTester  [kernel.kallsyms]               [k] kmem_cache_alloc
     0.47%          swapper  [kernel.kallsyms]               [k] clflush_cache_range
     0.45%           kipmi0  [ipmi_si]                       [k] port_inb
     0.42%          swapper  [kernel.kallsyms]               [k] __slab_free
     0.41%       jobprocess  [kernel.kallsyms]               [k] try_to_wake_up
     0.40%     ModuleTester  [kernel.kallsyms]               [k] kmem_cache_alloc_node
     0.40%       jobprocess  [kernel.kallsyms]               [k] tg_load_down
     0.39%       jobprocess  libodyssey.so.1.8.2             [.] y8_deblocking_luma_vert_edge_h264_sse2
     0.38%       jobprocess  libodyssey.so.1.8.2             [.] y8_deblocking_luma_horz_edge_h264_ssse3
     0.38%     ModuleTester  [kernel.kallsyms]               [k] rb_insert_color
     0.37%       jobprocess  [kernel.kallsyms]               [k] find_iova
     0.37%       jobprocess  [kernel.kallsyms]               [k] find_busiest_group
     0.36%       jobprocess  libpthread-2.7.so               [.] pthread_mutex_lock
     0.35%          swapper  [kernel.kallsyms]               [k] _raw_spin_unlock_irqrestore
     0.34%     ModuleTester  [kernel.kallsyms]               [k] _raw_spin_lock_irqsave
     0.33%     ModuleTester  [kernel.kallsyms]               [k] pfifo_fast_dequeue
     0.32%     ModuleTester  [kernel.kallsyms]               [k] __kmalloc_node_track_caller
     0.32%       jobprocess  [be2net]                        [k] be_tx_compl_process
     0.31%     ModuleTester  [kernel.kallsyms]               [k] ip_fragment
     0.29%          swapper  [kernel.kallsyms]               [k] __hrtimer_start_range_ns
     0.29%       jobprocess  [kernel.kallsyms]               [k] __schedule
     0.29%     ModuleTester  [kernel.kallsyms]               [k] dev_queue_xmit
     0.28%          swapper  [kernel.kallsyms]               [k] __schedule

The first thing I notice is the difference in copy_user_generic_string (it
is only 0.11% in the second measurement; I did not include it here).
I think perf can help find the issue I observe with bonding;
do you have suggestions on which parameters to use?
FYI, with bonding, TX tops out at 640Mbps; without bonding, I can send
2.4Gbps without any trouble.

JM
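
The qdisc counters quoted earlier in this thread (61476524359 bytes and 15215387 packets sent, 45683472 dropped) can be turned into two quick figures. This is only a worked-arithmetic sketch: `drop_ratio` and `avg_pkt_bytes` are hypothetical helper names, and it assumes that dropped packets are not also counted in the sent counter.

```c
#include <stdint.h>

/* Fraction of packets dropped by the qdisc out of all packets offered,
 * assuming offered = sent + dropped. */
static double drop_ratio(uint64_t sent_pkts, uint64_t dropped_pkts)
{
	uint64_t offered = sent_pkts + dropped_pkts;

	return offered ? (double)dropped_pkts / (double)offered : 0.0;
}

/* Mean size in bytes of a transmitted packet. */
static double avg_pkt_bytes(uint64_t sent_bytes, uint64_t sent_pkts)
{
	return sent_pkts ? (double)sent_bytes / (double)sent_pkts : 0.0;
}
```

Under that assumption, roughly three out of four offered packets were dropped on eth1, and the mean transmitted packet is about 4040 bytes, which is consistent with the MTU of 4096 shown in the ifconfig output.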


* Re: [PATCH v2] net/sh-eth: Add support selecting MII function for SH7734 and R8A7740
From: Florian Fainelli @ 2012-06-12  8:40 UTC
  To: Nobuhiro Iwamatsu; +Cc: netdev
In-Reply-To: <1339486142-32480-1-git-send-email-nobuhiro.iwamatsu.yj@renesas.com>

On Tuesday 12 June 2012 16:29:02 Nobuhiro Iwamatsu wrote:
> Ethernet IP of SH7734 and R8A7740 has selecting MII register.
> The user needs to change a value according to MII to be used.
> This adds the function to change the value of this register.
> 
> Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
> ---
>  V2: Fix the check by select_mii.
>  drivers/net/ethernet/renesas/sh_eth.c |  106 ++++++++++++++++++++-------------
>  drivers/net/ethernet/renesas/sh_eth.h |    1 +
>  2 files changed, 65 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
> index be3c221..5358804 100644
> --- a/drivers/net/ethernet/renesas/sh_eth.c
> +++ b/drivers/net/ethernet/renesas/sh_eth.c
> @@ -49,6 +49,33 @@
>  		NETIF_MSG_RX_ERR| \
>  		NETIF_MSG_TX_ERR)
>  
> +#if defined(CONFIG_CPU_SUBTYPE_SH7734) || defined(CONFIG_CPU_SUBTYPE_SH7763) || \
> +	defined(CONFIG_ARCH_R8A7740)
> +static void sh_eth_select_mii(struct net_device *ndev)
> +{
> +	u32 value = 0x0;
> +	struct sh_eth_private *mdp = netdev_priv(ndev);
> +
> +	switch (mdp->phy_interface) {
> +	case PHY_INTERFACE_MODE_GMII:
> +		value = 0x2;
> +		break;
> +	case PHY_INTERFACE_MODE_MII:
> +		value = 0x1;
> +		break;
> +	case PHY_INTERFACE_MODE_RMII:
> +		value = 0x0;
> +		break;
> +	default:
> +		pr_warn("PHY interface mode was not setup. Set to MII.\n");
> +		value = 0x1;
> +		break;
> +	}
> +
> +	sh_eth_write(ndev, value, RMII_MII);
> +}
> +#endif
> +
>  /* There is CPU dependent code */
>  #if defined(CONFIG_CPU_SUBTYPE_SH7724)
>  #define SH_ETH_RESET_DEFAULT	1
> @@ -283,6 +310,7 @@ static struct sh_eth_cpu_data *sh_eth_get_cpu_data(struct sh_eth_private *mdp)
>  #elif defined(CONFIG_CPU_SUBTYPE_SH7734) || defined(CONFIG_CPU_SUBTYPE_SH7763)
>  #define SH_ETH_HAS_TSU	1
>  static void sh_eth_reset_hw_crc(struct net_device *ndev);
> +
>  static void sh_eth_chip_reset(struct net_device *ndev)
>  {
>  	struct sh_eth_private *mdp = netdev_priv(ndev);
> @@ -292,35 +320,6 @@ static void sh_eth_chip_reset(struct net_device *ndev)
>  	mdelay(1);
>  }
>  
> -static void sh_eth_reset(struct net_device *ndev)
> -{
> -	int cnt = 100;
> -
> -	sh_eth_write(ndev, EDSR_ENALL, EDSR);
> -	sh_eth_write(ndev, sh_eth_read(ndev, EDMR) | EDMR_SRST_GETHER, EDMR);
> -	while (cnt > 0) {
> -		if (!(sh_eth_read(ndev, EDMR) & 0x3))
> -			break;
> -		mdelay(1);
> -		cnt--;
> -	}
> -	if (cnt == 0)
> -		printk(KERN_ERR "Device reset fail\n");
> -
> -	/* Table Init */
> -	sh_eth_write(ndev, 0x0, TDLAR);
> -	sh_eth_write(ndev, 0x0, TDFAR);
> -	sh_eth_write(ndev, 0x0, TDFXR);
> -	sh_eth_write(ndev, 0x0, TDFFR);
> -	sh_eth_write(ndev, 0x0, RDLAR);
> -	sh_eth_write(ndev, 0x0, RDFAR);
> -	sh_eth_write(ndev, 0x0, RDFXR);
> -	sh_eth_write(ndev, 0x0, RDFFR);
> -
> -	/* Reset HW CRC register */
> -	sh_eth_reset_hw_crc(ndev);
> -}
> -
>  static void sh_eth_set_duplex(struct net_device *ndev)
>  {
>  	struct sh_eth_private *mdp = netdev_priv(ndev);
> @@ -377,9 +376,43 @@ static struct sh_eth_cpu_data sh_eth_my_cpu_data = {
>  	.tsu		= 1,
>  #if defined(CONFIG_CPU_SUBTYPE_SH7734)
>  	.hw_crc     = 1,
> +	.select_mii = 1,
>  #endif
>  };
>  
> +static void sh_eth_reset(struct net_device *ndev)
> +{
> +	int cnt = 100;
> +
> +	sh_eth_write(ndev, EDSR_ENALL, EDSR);
> +	sh_eth_write(ndev, sh_eth_read(ndev, EDMR) | EDMR_SRST_GETHER, EDMR);
> +	while (cnt > 0) {
> +		if (!(sh_eth_read(ndev, EDMR) & 0x3))
> +			break;
> +		mdelay(1);
> +		cnt--;
> +	}
> +	if (cnt == 0)
> +		printk(KERN_ERR "Device reset fail\n");

This looks like it needs a follow-up fix. Failing to reset the adapter,
logging the failure, and not returning an error is clearly wrong. Since
sh_eth_reset() is called from sh_eth_dev_init(), which does return an int,
the error should be propagated back to the caller.

> +
> +	/* Table Init */
> +	sh_eth_write(ndev, 0x0, TDLAR);
> +	sh_eth_write(ndev, 0x0, TDFAR);
> +	sh_eth_write(ndev, 0x0, TDFXR);
> +	sh_eth_write(ndev, 0x0, TDFFR);
> +	sh_eth_write(ndev, 0x0, RDLAR);
> +	sh_eth_write(ndev, 0x0, RDFAR);
> +	sh_eth_write(ndev, 0x0, RDFXR);
> +	sh_eth_write(ndev, 0x0, RDFFR);
> +
> +	/* Reset HW CRC register */
> +	sh_eth_reset_hw_crc(ndev);
> +
> +	/* Select MII mode */
> +	if (sh_eth_my_cpu_data.select_mii)
> +		sh_eth_select_mii(ndev);
> +}
> +
>  static void sh_eth_reset_hw_crc(struct net_device *ndev)
>  {
>  	if (sh_eth_my_cpu_data.hw_crc)
> @@ -397,19 +430,7 @@ static void sh_eth_chip_reset(struct net_device *ndev)
>  	sh_eth_tsu_write(mdp, ARSTR_ARSTR, ARSTR);
>  	mdelay(1);
>  
> -	switch (mdp->phy_interface) {
> -	case PHY_INTERFACE_MODE_GMII:
> -		mii = 2;
> -		break;
> -	case PHY_INTERFACE_MODE_MII:
> -		mii = 1;
> -		break;
> -	case PHY_INTERFACE_MODE_RMII:
> -	default:
> -		mii = 0;
> -		break;
> -	}
> -	sh_eth_write(ndev, mii, RMII_MII);
> +	sh_eth_select_mii(ndev);
>  }
>  
>  static void sh_eth_reset(struct net_device *ndev)
> @@ -492,6 +513,7 @@ static struct sh_eth_cpu_data sh_eth_my_cpu_data = {
>  	.no_trimd	= 1,
>  	.no_ade		= 1,
>  	.tsu		= 1,
> +	.select_mii = 1,
>  };
>  
>  #elif defined(CONFIG_CPU_SUBTYPE_SH7619)
> diff --git a/drivers/net/ethernet/renesas/sh_eth.h b/drivers/net/ethernet/renesas/sh_eth.h
> index 57b8e1f..d6763b1392 100644
> --- a/drivers/net/ethernet/renesas/sh_eth.h
> +++ b/drivers/net/ethernet/renesas/sh_eth.h
> @@ -757,6 +757,7 @@ struct sh_eth_cpu_data {
>  	unsigned no_trimd:1;		/* E-DMAC DO NOT have TRIMD */
>  	unsigned no_ade:1;	/* E-DMAC DO NOT have ADE bit in EESR */
>  	unsigned hw_crc:1;	/* E-DMAC have CSMR */
> +	unsigned select_mii:1;	/* EtherC have RMII_MII (MII select register) */
>  };
>  
>  struct sh_eth_private {
> -- 
> 1.7.10
> 
-- 
Florian
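
The review comment in this message, that a failed reset should be reported to the caller rather than only logged, can be sketched in user space. Everything named here is hypothetical: `fake_read_fn` stands in for a register read such as sh_eth_read(ndev, EDMR), `fake_wait_reset` for the polling part of sh_eth_reset(), and the choice of -ETIMEDOUT as the error code is an assumption, not something stated in the thread.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical register accessor; the driver would read EDMR instead. */
typedef uint32_t (*fake_read_fn)(void *ctx);

/* Poll until the two low bits clear, or give up after max_tries reads.
 * Returns 0 on success and -ETIMEDOUT on failure so that a caller
 * returning int can pass the error up. */
static int fake_wait_reset(fake_read_fn read_edmr, void *ctx, int max_tries)
{
	int cnt;

	for (cnt = max_tries; cnt > 0; cnt--) {
		if (!(read_edmr(ctx) & 0x3))
			return 0;
		/* kernel code would call mdelay(1) here */
	}

	return -ETIMEDOUT;
}

/* Test double: reports "reset in progress" for a fixed number of reads. */
static uint32_t fake_edmr_countdown(void *ctx)
{
	int *reads_left = ctx;

	if (*reads_left > 0) {
		(*reads_left)--;
		return 0x3;
	}

	return 0x0;
}
```

A caller modelled on sh_eth_dev_init() would then check the return value and bail out early instead of continuing to program a device that never left reset.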


* Re: Difficulties to get 1Gbps on be2net ethernet card
From: Eric Dumazet @ 2012-06-12  8:55 UTC
  To: Jean-Michel Hautbois; +Cc: Sathya.Perla, netdev
In-Reply-To: <CAL8zT=gNtcdyyVcPt5hB6jyF1btzQArEuZgVKWkb0Wd=a4LcVA@mail.gmail.com>

On Tue, 2012-06-12 at 10:24 +0200, Jean-Michel Hautbois wrote:
> 2012/6/8 Jean-Michel Hautbois <jhautbois@gmail.com>:
> > 2012/6/8 Eric Dumazet <eric.dumazet@gmail.com>:
> >> On Fri, 2012-06-08 at 10:14 +0200, Jean-Michel Hautbois wrote:
> >>> 2012/6/8 Eric Dumazet <eric.dumazet@gmail.com>:
> >>> > On Thu, 2012-06-07 at 14:54 +0200, Jean-Michel Hautbois wrote:
> >>> >
> >>> >> eth1      Link encap:Ethernet  HWaddr 68:b5:99:b9:8d:d4
> >>> >>           UP BROADCAST RUNNING SLAVE MULTICAST  MTU:4096  Metric:1
> >>> >>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> >>> >>           TX packets:15215387 errors:0 dropped:0 overruns:0 carrier:0
> >>> >>           collisions:0 txqueuelen:1000
> >>> >>           RX bytes:0 (0.0 B)  TX bytes:61476524359 (57.2 GiB)
> >>> >
> >>> >> qdisc mq 0: dev eth1 root
> >>> >>  Sent 61476524359 bytes 15215387 pkt (dropped 45683472, overlimits 0
> >>> >> requeues 17480)
> >>> >
> >>> > OK, and "tc -s -d cl show dev eth1"
> >>> >
> >>> > (How many queues are really used)
> >>> >
> >>> >
> >>> >
> >>>
> >>> tc -s -d cl show dev eth1
> >>> class mq :1 root
> >>>  Sent 9798071746 bytes 2425410 pkt (dropped 3442405, overlimits 0 requeues 2747)
> >>>  backlog 0b 0p requeues 2747
> >>> class mq :2 root
> >>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> >>>  backlog 0b 0p requeues 0
> >>> class mq :3 root
> >>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> >>>  backlog 0b 0p requeues 0
> >>> class mq :4 root
> >>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> >>>  backlog 0b 0p requeues 0
> >>> class mq :5 root
> >>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> >>>  backlog 0b 0p requeues 0
> >>> class mq :6 root
> >>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> >>>  backlog 0b 0p requeues 0
> >>> class mq :7 root
> >>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> >>>  backlog 0b 0p requeues 0
> >>> class mq :8 root
> >>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> >>>  backlog 0b 0p requeues 0
> >>
> >>
> >> Do you have the same distribution on old kernels as well ?
> >> (ie only queue 0 is used)
> >>
> >>
> >>
> >
> > On the old kernel, there is nothing returned by this command.
> >
> > JM
> 
> I used perf in order to get more information.

What happens if you force some traffic in the other way (say 50.000
(small) packets per second in RX) while doing your tests ?

^ permalink raw reply

* Re: Difficulties to get 1Gbps on be2net ethernet card
From: Jean-Michel Hautbois @ 2012-06-12  9:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Sathya.Perla, netdev
In-Reply-To: <1339491345.22704.22.camel@edumazet-glaptop>

2012/6/12 Eric Dumazet <eric.dumazet@gmail.com>:
> On Tue, 2012-06-12 at 10:24 +0200, Jean-Michel Hautbois wrote:
>> 2012/6/8 Jean-Michel Hautbois <jhautbois@gmail.com>:
>> > 2012/6/8 Eric Dumazet <eric.dumazet@gmail.com>:
>> >> On Fri, 2012-06-08 at 10:14 +0200, Jean-Michel Hautbois wrote:
>> >>> 2012/6/8 Eric Dumazet <eric.dumazet@gmail.com>:
>> >>> > On Thu, 2012-06-07 at 14:54 +0200, Jean-Michel Hautbois wrote:
>> >>> >
>> >>> >> eth1      Link encap:Ethernet  HWaddr 68:b5:99:b9:8d:d4
>> >>> >>           UP BROADCAST RUNNING SLAVE MULTICAST  MTU:4096  Metric:1
>> >>> >>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> >>> >>           TX packets:15215387 errors:0 dropped:0 overruns:0 carrier:0
>> >>> >>           collisions:0 txqueuelen:1000
>> >>> >>           RX bytes:0 (0.0 B)  TX bytes:61476524359 (57.2 GiB)
>> >>> >
>> >>> >> qdisc mq 0: dev eth1 root
>> >>> >>  Sent 61476524359 bytes 15215387 pkt (dropped 45683472, overlimits 0
>> >>> >> requeues 17480)
>> >>> >
>> >>> > OK, and "tc -s -d cl show dev eth1"
>> >>> >
>> >>> > (How many queues are really used)
>> >>> >
>> >>> >
>> >>> >
>> >>>
>> >>> tc -s -d cl show dev eth1
>> >>> class mq :1 root
>> >>>  Sent 9798071746 bytes 2425410 pkt (dropped 3442405, overlimits 0 requeues 2747)
>> >>>  backlog 0b 0p requeues 2747
>> >>> class mq :2 root
>> >>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>> >>>  backlog 0b 0p requeues 0
>> >>> class mq :3 root
>> >>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>> >>>  backlog 0b 0p requeues 0
>> >>> class mq :4 root
>> >>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>> >>>  backlog 0b 0p requeues 0
>> >>> class mq :5 root
>> >>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>> >>>  backlog 0b 0p requeues 0
>> >>> class mq :6 root
>> >>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>> >>>  backlog 0b 0p requeues 0
>> >>> class mq :7 root
>> >>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>> >>>  backlog 0b 0p requeues 0
>> >>> class mq :8 root
>> >>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>> >>>  backlog 0b 0p requeues 0
>> >>
>> >>
>> >> Do you have the same distribution on old kernels as well ?
>> >> (ie only queue 0 is used)
>> >>
>> >>
>> >>
>> >
>> > On the old kernel, there is nothing returned by this command.
>> >
>> > JM
>>
>> I used perf in order to get more information.
>
> What happens if you force some traffic in the other way (say 50.000
> (small) packets per second in RX) while doing your tests ?
>

Can I do that using netperf ?

^ permalink raw reply

* Re: Difficulties to get 1Gbps on be2net ethernet card
From: Eric Dumazet @ 2012-06-12  9:06 UTC (permalink / raw)
  To: Jean-Michel Hautbois; +Cc: Sathya.Perla, netdev
In-Reply-To: <CAL8zT=hKmnAG7kZE18noFNEksibvJpR_6Q9z59nGLhqbs5oxuw@mail.gmail.com>

On Tue, 2012-06-12 at 11:01 +0200, Jean-Michel Hautbois wrote:

> Can I do that using netperf ?


Sure, you could use netperf -t UDP_RR

^ permalink raw reply

* Re: Possible deadlock in ipv6?
From: Eric Dumazet @ 2012-06-12  9:09 UTC (permalink / raw)
  To: David Miller; +Cc: vdavydov, netdev
In-Reply-To: <20120611.235453.953830769326224643.davem@davemloft.net>

On Mon, 2012-06-11 at 23:54 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 06 Jun 2012 17:58:34 +0200
> 
> > And it seems this neigh_down() can be removed, it's called later
> > (after dev->ip6_ptr is cleared)
> 
> It is unclear whether we need to do the neigh_down() in both
> the 'how' and '!how' cases.  If so then we can't make this change.
> 

Hmm...

Is it expected that we send traffic on device dismantle?

If not, we could do:

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index d81d026..16e0ddb 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -681,8 +681,6 @@ static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
 		while ((n = *np) != NULL) {
 			if (!dev || n->dev == dev) {
 				*np = n->next;
-				if (tbl->pdestructor)
-					tbl->pdestructor(n);
 				if (n->dev)
 					dev_put(n->dev);
 				release_net(pneigh_net(n));

^ permalink raw reply related

* Re: Difficulties to get 1Gbps on be2net ethernet card
From: Jean-Michel Hautbois @ 2012-06-12  9:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Sathya.Perla, netdev
In-Reply-To: <1339491979.22704.23.camel@edumazet-glaptop>

2012/6/12 Eric Dumazet <eric.dumazet@gmail.com>:
> On Tue, 2012-06-12 at 11:01 +0200, Jean-Michel Hautbois wrote:
>
>> Can I do that using netperf ?
>
>
> Sure, you could use netperf -t UDP_RR
>
>

It sends, but no change on TX...

^ permalink raw reply

* Re: net/netfilter/nf_conntrack_proto_tcp.c:1606:9: error: ‘struct nf_proto_net’ has no member named ‘user’
From: Pablo Neira Ayuso @ 2012-06-12  9:29 UTC (permalink / raw)
  To: Gao feng; +Cc: David Miller, wfg, netdev
In-Reply-To: <4FD69F5E.3060900@cn.fujitsu.com>

On Tue, Jun 12, 2012 at 09:46:06AM +0800, Gao feng wrote:
> 于 2012年06月12日 08:26, Pablo Neira Ayuso 写道:
> > Hi again David,
> > 
> > On Mon, Jun 11, 2012 at 03:23:44PM -0700, David Miller wrote:
> >> From: Pablo Neira Ayuso <pablo@netfilter.org>
> >> Date: Tue, 12 Jun 2012 00:15:21 +0200
> >>
> >>> Could you please apply the following patch to net-next to resolve
> >>> this? Thanks.
> >>
> >> Applied, but you have to be kidding me with those ifdefs.
> >>
> >> This is exactly the same kind of thing Gao suggested for
> >> the inetpeer code recently and which I flat out rejected.
> >>
> >> You can't pepper foo.c files with ifdefs all over the place.
> > 
> > Would you be OK if I send you patches to move all sysctl part of
> > nf_conntrack_proto_*.c to nf_conntrack_proto_*_sysctl.c
> > 
> > I can also do the same for nf_conntrack_proto.c.
> > 
> > This means more files under the net/netfilter directory, but less
> > ifdef kludges in the code.
> > 
> > Please, have a look at the patch enclosed to this email in case you
> > want to see how it would look like in the end with my proposal.
> 
> I am sorry for all the trouble caused by my negligence.
> 
> >  static int tcpv4_init_net(struct net *net)
> >  {
> >  	int i;
> > @@ -1600,11 +1373,7 @@ static int tcpv4_init_net(struct net *net)
> >  	struct nf_tcp_net *tn = tcp_pernet(net);
> >  	struct nf_proto_net *pn = (struct nf_proto_net *)tn;
> >  
> > -#ifdef CONFIG_SYSCTL
> > -	if (!pn->ctl_table) {
> > -#else
> >  	if (!pn->users++) {
> 
> nf_proto_net.users has a different meaning depending on whether SYSCTL is enabled.
> 
> When SYSCTL is enabled, it tracks whether tcpv4 and tcpv6 have both registered
> the sysctl: it is incremented when sysctl registration succeeds and decremented
> on unregistration. We can regard it as the refcount of ctl_table.
> 
> When SYSCTL is disabled, it is only used to record whether the protocol's
> pernet data has been initialized.

We have to use two different counters for this. The conditional
meaning of that variable is really confusing.

^ permalink raw reply

* Re: [patch] can: c_can: precedence error in c_can_chip_config()
From: Marc Kleine-Budde @ 2012-06-12  9:37 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Dan Carpenter, Wolfgang Grandegger, AnilKumar Ch, David S. Miller,
	Jiri Kosina, linux-can, netdev, kernel-janitors
In-Reply-To: <4FD62E21.3020209@hartkopp.net>

On 06/11/2012 07:42 PM, Oliver Hartkopp wrote:
> On 10.06.2012 19:52, Marc Kleine-Budde wrote:
> 
>> On 06/09/2012 05:56 PM, Dan Carpenter wrote:
>>> (CAN_CTRLMODE_LISTENONLY & CAN_CTRLMODE_LOOPBACK) is (0x02 & 0x01) which
>>> is zero so the condition is never true.  The intent here was to test
>>> that both flags were set.
>>>
>>> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
>>> ---
>>> This is a static checker fix.  I'm not super familiar with the c_can
>>> code.
>>
>> Good catch. Applied to can-next.
>>
>> Marc
>>
> 
> 
> Shouldn't this fix go through the net-tree and stable instead of net-next?

Can I add your Acked-by ... when adding to net?

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


^ permalink raw reply

* Re: [PATCH] usbnet: Activate the halt interrupt endpoint to fix endless "XactErr" error
From: Huajun Li @ 2012-06-12 10:09 UTC (permalink / raw)
  To: Bjørn Mork
  Cc: David Miller, Ming Lei, Alan Stern,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <87sje1podo.fsf-lbf33ChDnrE/G1V5fR+Y7Q@public.gmane.org>

On Tue, Jun 12, 2012 at 4:47 AM, Bjørn Mork <bjorn-yOkvZcmFvRU@public.gmane.org> wrote:
> Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
>> diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
>> index 76f4396..c0bcb61 100644
>> --- a/include/linux/usb/usbnet.h
>> +++ b/include/linux/usb/usbnet.h
>> @@ -62,13 +62,14 @@ struct usbnet {
>>       unsigned long           flags;
>>  #            define EVENT_TX_HALT    0
>>  #            define EVENT_RX_HALT    1
>> -#            define EVENT_RX_MEMORY  2
>> -#            define EVENT_STS_SPLIT  3
>> -#            define EVENT_LINK_RESET 4
>> -#            define EVENT_RX_PAUSED  5
>> -#            define EVENT_DEV_WAKING 6
>> -#            define EVENT_DEV_ASLEEP 7
>> -#            define EVENT_DEV_OPEN   8
>> +#            define EVENT_STS_HALT   2
>> +#            define EVENT_RX_MEMORY  3
>> +#            define EVENT_STS_SPLIT  4
>> +#            define EVENT_LINK_RESET 5
>> +#            define EVENT_RX_PAUSED  6
>> +#            define EVENT_DEV_WAKING 7
>> +#            define EVENT_DEV_ASLEEP 8
>> +#            define EVENT_DEV_OPEN   9
>>  };
>
> Why do you renumber all of these instead of adding the new
> EVENT_STS_HALT to the end of the list?
>

Thanks for your comments.

I think it's nice to sort these mask codes by their purposes.

^ permalink raw reply

* [PATCH] netpoll: Fix skb tail pointer in netpoll_send_udp()
From: Bogdan Hamciuc @ 2012-06-12 10:26 UTC (permalink / raw)
  To: davem; +Cc: netdev, Bogdan Hamciuc

As skb->tail wasn't updated after skb_copy_to_linear_data(), subsequent
calls to skb_realloc_headroom() (as made by an ethernet driver's
ndo_start_xmit routine) would only effectively copy the packet headers,
leaving garbage in the payload.

In the process, removed some unnecessary code.

Signed-off-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com>
---
 net/core/netpoll.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 3d84fb9..9a08068 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -362,22 +362,22 @@ EXPORT_SYMBOL(netpoll_send_skb_on_dev);
 
 void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
 {
-	int total_len, eth_len, ip_len, udp_len;
+	int total_len, ip_len, udp_len;
 	struct sk_buff *skb;
 	struct udphdr *udph;
 	struct iphdr *iph;
 	struct ethhdr *eth;
 
 	udp_len = len + sizeof(*udph);
-	ip_len = eth_len = udp_len + sizeof(*iph);
-	total_len = eth_len + ETH_HLEN + NET_IP_ALIGN;
+	ip_len = udp_len + sizeof(*iph);
+	total_len = ip_len + ETH_HLEN + NET_IP_ALIGN;
 
 	skb = find_skb(np, total_len, total_len - len);
 	if (!skb)
 		return;
 
 	skb_copy_to_linear_data(skb, msg, len);
-	skb->len += len;
+	skb_put(skb, len);
 
 	skb_push(skb, sizeof(*udph));
 	skb_reset_transport_header(skb);
-- 
1.5.6.3
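
The tail-pointer problem described in the commit message can be reproduced
outside the kernel. What follows is only a userspace sketch: struct toy_skb
and every toy_* helper are hypothetical stand-ins invented for illustration,
not kernel APIs. It shows that bumping len alone leaves tail behind, so any
routine that copies the data..tail range picks up the pushed headers but not
the payload.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for struct sk_buff: offsets into one linear buffer. */
struct toy_skb {
	unsigned char buf[256];
	size_t data;	/* offset of the first valid byte */
	size_t tail;	/* offset one past the last valid byte */
	size_t len;	/* number of valid bytes */
};

void toy_skb_init(struct toy_skb *skb, size_t headroom)
{
	memset(skb, 0, sizeof(*skb));
	skb->data = skb->tail = headroom;
}

/* Pre-patch pattern: copy the payload, then bump len only. */
void toy_append_len_only(struct toy_skb *skb, const void *msg, size_t n)
{
	memcpy(skb->buf + skb->data, msg, n);
	skb->len += n;		/* tail is left behind */
}

/* Post-patch pattern: advance tail and len together, as skb_put() does. */
void toy_append_put(struct toy_skb *skb, const void *msg, size_t n)
{
	memcpy(skb->buf + skb->data, msg, n);
	skb->tail += n;
	skb->len += n;
}

/* Prepend n header bytes; the caller guarantees enough headroom. */
void toy_skb_push(struct toy_skb *skb, size_t n)
{
	skb->data -= n;
	skb->len += n;
}

/* A copy routine that trusts data..tail, like a headroom reallocation. */
size_t toy_copy_linear(const struct toy_skb *skb, unsigned char *dst)
{
	size_t n = skb->tail - skb->data;

	memcpy(dst, skb->buf + skb->data, n);
	return n;
}
```

With a 5-byte payload and an 8-byte header pushed in front, the len-only
variant reports len == 13 but toy_copy_linear() copies only the 8 header
bytes; the put variant copies all 13.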

^ permalink raw reply related

* [PATCH] netpoll: Add support for hardware checksumming on egress
From: Bogdan Hamciuc @ 2012-06-12 10:26 UTC (permalink / raw)
  To: davem; +Cc: netdev, Bogdan Hamciuc
In-Reply-To: <1339496765-3093-1-git-send-email-bogdan.hamciuc@freescale.com>

Netpoll used to compute its own csum, but if the device supports
checksum offload, we should let it do the checksum itself.

Signed-off-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com>
---
 net/core/netpoll.c |   14 ++++++++++----
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 9a08068..f5d00b4 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -385,13 +385,19 @@ void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
 	udph->source = htons(np->local_port);
 	udph->dest = htons(np->remote_port);
 	udph->len = htons(udp_len);
-	udph->check = 0;
-	udph->check = csum_tcpudp_magic(np->local_ip,
+
+	/* Only querying the IPv4 csumming capabilities */
+	if (np->dev->features & NETIF_F_IP_CSUM)
+		skb->ip_summed = CHECKSUM_PARTIAL;
+	else {
+		skb->ip_summed = CHECKSUM_NONE;
+		udph->check = csum_tcpudp_magic(np->local_ip,
 					np->remote_ip,
 					udp_len, IPPROTO_UDP,
 					csum_partial(udph, udp_len, 0));
-	if (udph->check == 0)
-		udph->check = CSUM_MANGLED_0;
+		if (udph->check == 0)
+			udph->check = CSUM_MANGLED_0;
+	}
 
 	skb_push(skb, sizeof(*iph));
 	skb_reset_network_header(skb);
-- 
1.5.6.3
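
For readers unfamiliar with the software path kept in the else branch, here is
a rough userspace sketch of what a UDP-over-IPv4 checksum computes. It is not
the kernel's csum_tcpudp_magic()/csum_partial() implementation; sum16() and
udp4_checksum() are hypothetical names, and addresses are assumed to be in
host byte order to keep the arithmetic readable. The final substitution of
0xffff for a computed zero plays the same role as CSUM_MANGLED_0 in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a byte range to a running 32-bit sum as big-endian 16-bit words. */
uint32_t sum16(const uint8_t *p, size_t n, uint32_t sum)
{
	while (n > 1) {
		sum += (uint32_t)p[0] << 8 | p[1];
		p += 2;
		n -= 2;
	}
	if (n)
		sum += (uint32_t)p[0] << 8;	/* pad the odd byte with zero */
	return sum;
}

/*
 * UDP checksum over the IPv4 pseudo-header, UDP header and payload.
 * saddr/daddr are in host byte order (an assumption of this sketch);
 * udp points at the UDP header and udp_len covers header plus payload.
 * The checksum field must be zero when generating a checksum.
 */
uint16_t udp4_checksum(uint32_t saddr, uint32_t daddr,
		       const uint8_t *udp, uint16_t udp_len)
{
	uint32_t sum = 0;
	uint16_t check;

	sum += saddr >> 16;
	sum += saddr & 0xffff;
	sum += daddr >> 16;
	sum += daddr & 0xffff;
	sum += 17;			/* IPPROTO_UDP */
	sum += udp_len;
	sum = sum16(udp, udp_len, sum);

	/* Fold the carries back in (one's complement addition). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	check = (uint16_t)~sum;
	return check ? check : 0xffff;	/* zero means "no checksum" in UDP */
}
```

Running the same function over a packet whose checksum field is already
filled in yields 0xffff, which is one way to sanity-check the result.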

^ permalink raw reply related

* [PATCHv2 net-next] ipv4: Add interface option to enable routing of 127.0.0.0/8
From: Thomas Graf @ 2012-06-12 10:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120611.165740.419299184892679723.davem@davemloft.net>

Routing of 127/8 is traditionally forbidden; we consider
packets from that address block martian when routing and do
not process corresponding ARP requests.

This is a sane default but renders a huge address space
practically unusable.

The RFC states that no address within the 127/8 block should
ever appear on any network anywhere but it does not forbid
the use of such addresses outside of the loopback device in
particular, for example to address a pool of virtual guests
behind a load balancer.

This patch adds a new interface option 'route_localnet'
enabling routing of the 127/8 address block and processing
of ARP requests on a specific interface.

Note that for the feature to work, the default local route
covering 127/8 dev lo needs to be removed.

Example:
  $ sysctl -w net.ipv4.conf.eth0.route_localnet=1
  $ ip route del 127.0.0.0/8 dev lo table local
  $ ip addr add 127.1.0.1/16 dev eth0
  $ ip route flush cache

V2: Fix invalid check to auto flush cache (thanks davem)

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 Documentation/networking/ip-sysctl.txt |    5 +++++
 include/linux/inetdevice.h             |    2 ++
 net/ipv4/arp.c                         |    3 ++-
 net/ipv4/devinet.c                     |    5 ++++-
 net/ipv4/route.c                       |   30 +++++++++++++++++++++---------
 5 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 6f896b9..99d0e05 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -862,6 +862,11 @@ accept_local - BOOLEAN
 	local interfaces over the wire and have them accepted properly.
 	default FALSE
 
+route_localnet - BOOLEAN
+	Do not consider loopback addresses as martian source or destination
+	while routing. This enables the use of 127/8 for local routing purposes.
+	default FALSE
+
 rp_filter - INTEGER
 	0 - No source validation.
 	1 - Strict mode as defined in RFC3704 Strict Reverse Path
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index 597f4a9..67f9dda 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -38,6 +38,7 @@ enum
 	IPV4_DEVCONF_ACCEPT_LOCAL,
 	IPV4_DEVCONF_SRC_VMARK,
 	IPV4_DEVCONF_PROXY_ARP_PVLAN,
+	IPV4_DEVCONF_ROUTE_LOCALNET,
 	__IPV4_DEVCONF_MAX
 };
 
@@ -131,6 +132,7 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev)
 #define IN_DEV_PROMOTE_SECONDARIES(in_dev) \
 					IN_DEV_ORCONF((in_dev), \
 						      PROMOTE_SECONDARIES)
+#define IN_DEV_ROUTE_LOCALNET(in_dev)	IN_DEV_ORCONF(in_dev, ROUTE_LOCALNET)
 
 #define IN_DEV_RX_REDIRECTS(in_dev) \
 	((IN_DEV_FORWARD(in_dev) && \
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index cda37be..2e560f0 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -790,7 +790,8 @@ static int arp_process(struct sk_buff *skb)
  *	Check for bad requests for 127.x.x.x and requests for multicast
  *	addresses.  If this is one such, delete it.
  */
-	if (ipv4_is_loopback(tip) || ipv4_is_multicast(tip))
+	if (ipv4_is_multicast(tip) ||
+	    (!IN_DEV_ROUTE_LOCALNET(in_dev) && ipv4_is_loopback(tip)))
 		goto out;
 
 /*
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 10e15a1..44bf82e 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1500,7 +1500,8 @@ static int devinet_conf_proc(ctl_table *ctl, int write,
 
 		if (cnf == net->ipv4.devconf_dflt)
 			devinet_copy_dflt_conf(net, i);
-		if (i == IPV4_DEVCONF_ACCEPT_LOCAL - 1)
+		if (i == IPV4_DEVCONF_ACCEPT_LOCAL - 1 ||
+		    i == IPV4_DEVCONF_ROUTE_LOCALNET - 1)
 			if ((new_value == 0) && (old_value != 0))
 				rt_cache_flush(net, 0);
 	}
@@ -1617,6 +1618,8 @@ static struct devinet_sysctl_table {
 					      "force_igmp_version"),
 		DEVINET_SYSCTL_FLUSHING_ENTRY(PROMOTE_SECONDARIES,
 					      "promote_secondaries"),
+		DEVINET_SYSCTL_FLUSHING_ENTRY(ROUTE_LOCALNET,
+					      "route_localnet"),
 	},
 };
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 842510d..655506a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1960,9 +1960,13 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		return -EINVAL;
 
 	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr) ||
-	    ipv4_is_loopback(saddr) || skb->protocol != htons(ETH_P_IP))
+	    skb->protocol != htons(ETH_P_IP))
 		goto e_inval;
 
+	if (likely(!IN_DEV_ROUTE_LOCALNET(in_dev)))
+		if (ipv4_is_loopback(saddr))
+			goto e_inval;
+
 	if (ipv4_is_zeronet(saddr)) {
 		if (!ipv4_is_local_multicast(daddr))
 			goto e_inval;
@@ -2203,8 +2207,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	   by fib_lookup.
 	 */
 
-	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr) ||
-	    ipv4_is_loopback(saddr))
+	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
 		goto martian_source;
 
 	if (ipv4_is_lbcast(daddr) || (saddr == 0 && daddr == 0))
@@ -2216,9 +2219,17 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	if (ipv4_is_zeronet(saddr))
 		goto martian_source;
 
-	if (ipv4_is_zeronet(daddr) || ipv4_is_loopback(daddr))
+	if (ipv4_is_zeronet(daddr))
 		goto martian_destination;
 
+	if (likely(!IN_DEV_ROUTE_LOCALNET(in_dev))) {
+		if (ipv4_is_loopback(daddr))
+			goto martian_destination;
+
+		if (ipv4_is_loopback(saddr))
+			goto martian_source;
+	}
+
 	/*
 	 *	Now we are ready to route packet.
 	 */
@@ -2457,9 +2468,14 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 	u16 type = res->type;
 	struct rtable *rth;
 
-	if (ipv4_is_loopback(fl4->saddr) && !(dev_out->flags & IFF_LOOPBACK))
+	in_dev = __in_dev_get_rcu(dev_out);
+	if (!in_dev)
 		return ERR_PTR(-EINVAL);
 
+	if (likely(!IN_DEV_ROUTE_LOCALNET(in_dev)))
+		if (ipv4_is_loopback(fl4->saddr) && !(dev_out->flags & IFF_LOOPBACK))
+			return ERR_PTR(-EINVAL);
+
 	if (ipv4_is_lbcast(fl4->daddr))
 		type = RTN_BROADCAST;
 	else if (ipv4_is_multicast(fl4->daddr))
@@ -2470,10 +2486,6 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 	if (dev_out->flags & IFF_LOOPBACK)
 		flags |= RTCF_LOCAL;
 
-	in_dev = __in_dev_get_rcu(dev_out);
-	if (!in_dev)
-		return ERR_PTR(-EINVAL);
-
 	if (type == RTN_BROADCAST) {
 		flags |= RTCF_BROADCAST | RTCF_LOCAL;
 		fi = NULL;
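
The decision the route.c hunks introduce on the input path can be summarised
in a few lines of plain C. This is a simplified userspace sketch, not kernel
code: is_loopback() and check_loopback() are hypothetical helpers, addresses
are taken in host byte order, and only the loopback part of the martian
checks is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for 127.0.0.0/8; addr is in host byte order in this sketch. */
bool is_loopback(uint32_t addr)
{
	return (addr & 0xff000000u) == 0x7f000000u;
}

enum verdict { ROUTE_OK, MARTIAN_SOURCE, MARTIAN_DESTINATION };

/*
 * Mirrors only the loopback checks from the input path in the patch:
 * with route_localnet off, 127/8 is martian in either direction;
 * with it on, the packet continues to the normal route lookup.
 */
enum verdict check_loopback(uint32_t saddr, uint32_t daddr,
			    bool route_localnet)
{
	if (!route_localnet) {
		if (is_loopback(daddr))
			return MARTIAN_DESTINATION;
		if (is_loopback(saddr))
			return MARTIAN_SOURCE;
	}
	return ROUTE_OK;
}
```

For example, a packet from 127.1.0.1 (0x7f010001) to 10.0.0.1 is rejected as a
martian source by default and accepted once route_localnet is set on the
receiving interface.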

^ permalink raw reply related

* Re: [patch] can: c_can: precedence error in c_can_chip_config()
From: Oliver Hartkopp @ 2012-06-12 11:01 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Dan Carpenter, Wolfgang Grandegger, AnilKumar Ch, David S. Miller,
	Jiri Kosina, linux-can, netdev, kernel-janitors
In-Reply-To: <4FD70DD9.4010009@pengutronix.de>

On 12.06.2012 11:37, Marc Kleine-Budde wrote:

> On 06/11/2012 07:42 PM, Oliver Hartkopp wrote:
>> On 10.06.2012 19:52, Marc Kleine-Budde wrote:
>>
>>> On 06/09/2012 05:56 PM, Dan Carpenter wrote:
>>>> (CAN_CTRLMODE_LISTENONLY & CAN_CTRLMODE_LOOPBACK) is (0x02 & 0x01) which
>>>> is zero so the condition is never true.  The intent here was to test
>>>> that both flags were set.
>>>>
>>>> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
>>>> ---
>>>> This is a static checker fix.  I'm not super familiar with the c_can
>>>> code.
>>>
>>> Good catch. Applied to can-next.
>>>
>>> Marc
>>>
>>
>>
>> Shouldn't this fix go through the net-tree and stable instead of net-next?
> 
> Can I add your Acked-by ... when adding to net?


Yes you can :-)

Oliver
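
The precedence problem discussed in this thread is easy to demonstrate in
isolation. The sketch below is not the c_can driver code: the two flag values
are the ones quoted in the thread, redefined locally so it builds without
kernel headers, and both_set_buggy()/both_set_fixed() are hypothetical helper
names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as quoted in the thread; defined locally for this sketch. */
#define CAN_CTRLMODE_LOOPBACK   0x01
#define CAN_CTRLMODE_LISTENONLY 0x02

/*
 * Buggy form: the two flags are ANDed with each other first, which gives
 * a mask of 0, so the result is false for every possible ctrlmode.
 */
bool both_set_buggy(uint32_t ctrlmode)
{
	return ctrlmode & (CAN_CTRLMODE_LISTENONLY & CAN_CTRLMODE_LOOPBACK);
}

/* Intended form: build the mask with OR and require every bit of it. */
bool both_set_fixed(uint32_t ctrlmode)
{
	const uint32_t mask = CAN_CTRLMODE_LISTENONLY | CAN_CTRLMODE_LOOPBACK;

	return (ctrlmode & mask) == mask;
}
```

With ctrlmode == 0x03 the buggy test is false and the fixed test is true; with
only one of the two flags set, both are false.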

^ permalink raw reply

* Re: net/netfilter/nf_conntrack_proto_tcp.c:1606:9: error: ‘struct nf_proto_net’ has no member named ‘user’
From: Gao feng @ 2012-06-12 11:03 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: David Miller, wfg, netdev
In-Reply-To: <20120612092940.GB30080@1984>

于 2012年06月12日 17:29, Pablo Neira Ayuso 写道:

>> nf_proto_net.users has a different meaning depending on whether SYSCTL is enabled.
>>
>> When SYSCTL is enabled, it tracks whether tcpv4 and tcpv6 have both registered
>> the sysctl: it is incremented when sysctl registration succeeds and decremented
>> on unregistration. We can regard it as the refcount of ctl_table.
>>
>> When SYSCTL is disabled, it is only used to record whether the protocol's
>> pernet data has been initialized.
> 
> We have to use two different counters for this. The conditional
> meaning of that variable is really confusing.
> 
Hi David & Pablo

Please have a look at this patch and tell me if it's OK.
It is based on Pablo's patch.

diff --git a/include/net/netfilter/nf_conntrack_tcp.h b/include/net/netfilter/nf_conntrack_tcp.h
index 8d16ebe..0945446 100644
--- a/include/net/netfilter/nf_conntrack_tcp.h
+++ b/include/net/netfilter/nf_conntrack_tcp.h
@@ -1,8 +1,16 @@
 #ifndef _NF_CONNTRACK_TCP_H_
 #define _NF_CONNTRACK_TCP_H_

-int nf_ct_tcp_kmemdup_sysctl_table(struct nf_proto_net *pn);
-int nf_ct_tcp_compat_kmemdup_sysctl_table(struct nf_proto_net *pn);
-void nf_ct_tcp_compat_kfree_sysctl_table(struct nf_proto_net *pn);
+#ifdef CONFIG_SYSCTL
+int nf_ct_tcpv4_init_sysctl(struct nf_proto_net *pn);
+int nf_ct_tcpv6_init_sysctl(struct nf_proto_net *pn);
+#else
+int nf_ct_tcpv4_init_sysctl(struct nf_proto_net *pn)
+{
+       pn->users++;
+       return 0;
+}

+#define nf_ct_tcpv6_init_sysctl nf_ct_tcpv4_init_sysctl
+#endif
 #endif
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index cdf8b93..367153a 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -1368,12 +1368,11 @@ static const struct nla_policy tcp_timeout_nla_policy[CTA_TIMEOUT_TCP_MAX+1] = {

 static int tcpv4_init_net(struct net *net)
 {
-       int i;
-       int ret = 0;
        struct nf_tcp_net *tn = tcp_pernet(net);
        struct nf_proto_net *pn = (struct nf_proto_net *)tn;

-       if (!pn->users++) {
+       if (!pn->users) {
+               int i;
                for (i = 0; i < TCP_CONNTRACK_TIMEOUT_MAX; i++)
                        tn->timeouts[i] = tcp_timeouts[i];

@@ -1382,25 +1381,16 @@ static int tcpv4_init_net(struct net *net)
                tn->tcp_max_retrans = nf_ct_tcp_max_retrans;
        }

-       ret = nf_ct_tcp_compat_kmemdup_sysctl_table(pn);
-       if (ret < 0)
-               return ret;
-
-       ret = nf_ct_tcp_kmemdup_sysctl_table(pn);
-       if (ret < 0) {
-               kfree(pn->ctl_compat_table);
-               pn->ctl_compat_table = NULL;
-       }
-       return ret;
+       return nf_ct_tcpv4_init_sysctl(pn);
 }

 static int tcpv6_init_net(struct net *net)
 {
-       int i;
        struct nf_tcp_net *tn = tcp_pernet(net);
        struct nf_proto_net *pn = (struct nf_proto_net *)tn;

-       if (!pn->users++) {
+       if (!pn->users) {
+               int i;
                for (i = 0; i < TCP_CONNTRACK_TIMEOUT_MAX; i++)
                        tn->timeouts[i] = tcp_timeouts[i];
                tn->tcp_loose = nf_ct_tcp_loose;
@@ -1408,7 +1398,7 @@ static int tcpv6_init_net(struct net *net)
                tn->tcp_max_retrans = nf_ct_tcp_max_retrans;
        }

-       return nf_ct_tcp_kmemdup_sysctl_table(pn);
+       return nf_ct_tcpv6_init_sysctl(pn);
 }

 struct nf_conntrack_l4proto nf_conntrack_l4proto_tcp4 __read_mostly =
diff --git a/net/netfilter/nf_conntrack_proto_tcp_sysctl.c b/net/netfilter/nf_conntrack_proto_tcp_sysctl.c
index b9e027f..f038de4 100644
--- a/net/netfilter/nf_conntrack_proto_tcp_sysctl.c
+++ b/net/netfilter/nf_conntrack_proto_tcp_sysctl.c
@@ -182,7 +182,7 @@ static struct ctl_table tcp_compat_sysctl_table[] = {
 };
 #endif /* CONFIG_NF_CONNTRACK_PROC_COMPAT */

-int nf_ct_tcp_kmemdup_sysctl_table(struct nf_proto_net *pn)
+static int nf_ct_tcp_kmemdup_sysctl_table(struct nf_proto_net *pn)
 {
        struct nf_tcp_net *tn = (struct nf_tcp_net *)pn;

@@ -211,7 +211,7 @@ int nf_ct_tcp_kmemdup_sysctl_table(struct nf_proto_net *pn)
        return 0;
 }

-int nf_ct_tcp_compat_kmemdup_sysctl_table(struct nf_proto_net *pn)
+static int nf_ct_tcp_compat_kmemdup_sysctl_table(struct nf_proto_net *pn)
 {
 #ifdef CONFIG_NF_CONNTRACK_PROC_COMPAT
        struct nf_tcp_net *tn = (struct nf_tcp_net *)pn;
@@ -245,3 +245,23 @@ void nf_ct_tcp_compat_kfree_sysctl_table(struct nf_proto_net *pn)
        pn->ctl_compat_table = NULL;
 #endif
 }
+
+int nf_ct_tcpv4_init_sysctl(struct nf_proto_net *pn)
+{
+       int ret;
+
+       ret = nf_ct_tcp_compat_kmemdup_sysctl_table(pn);
+       if (ret < 0)
+               return ret;
+
+       ret = nf_ct_tcp_kmemdup_sysctl_table(pn);
+       if (ret < 0)
+               nf_ct_tcp_compat_kfree_sysctl_table(pn);
+
+       return ret;
+}
+
+int nf_ct_tcpv6_init_sysctl(struct nf_proto_net *pn)
+{
+       return nf_ct_tcp_kmemdup_sysctl_table(pn);
+}
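
The init-once pattern this patch relies on (initialise the shared per-net data
only while users is zero, then let the sysctl init step take the reference)
can be sketched in userspace as follows. struct toy_proto_net and the toy_*
functions are hypothetical stand-ins, and the timeout value is arbitrary; the
sketch only illustrates the counting, not the real conntrack code.

```c
/* Toy stand-in for the shared per-net protocol data. */
struct toy_proto_net {
	unsigned int users;	/* how many L3 protocols registered */
	int timeout;		/* example of shared state */
	int init_count;		/* how often the shared state was set up */
};

/* Stand-in for the sysctl init step, which takes the reference. */
int toy_init_sysctl(struct toy_proto_net *pn)
{
	pn->users++;
	return 0;
}

/* Called once for tcpv4 and once for tcpv6 on the same pn. */
int toy_init_net(struct toy_proto_net *pn)
{
	if (!pn->users) {
		/* The first caller sets up the shared defaults. */
		pn->timeout = 300;	/* arbitrary value for the sketch */
		pn->init_count++;
	}
	return toy_init_sysctl(pn);
}
```

Calling toy_init_net() twice on a zeroed structure leaves init_count at 1 and
users at 2, which is the behaviour the users check is meant to give.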

^ permalink raw reply related

* Re: [PATCHv2 net-next] ipv4: Add interface option to enable routing of 127.0.0.0/8
From: Neil Horman @ 2012-06-12 11:14 UTC (permalink / raw)
  To: David Miller, netdev
In-Reply-To: <20120612104401.GH28598@canuck.infradead.org>

On Tue, Jun 12, 2012 at 06:44:01AM -0400, Thomas Graf wrote:
> Routing of 127/8 is traditionally forbidden; we consider
> packets from that address block martian when routing and do
> not process corresponding ARP requests.
> 
> This is a sane default but renders a huge address space
> practically unusable.
> 
> The RFC states that no address within the 127/8 block should
> ever appear on any network anywhere but it does not forbid
> the use of such addresses outside of the loopback device in
> particular, for example to address a pool of virtual guests
> behind a load balancer.
> 
> This patch adds a new interface option 'route_localnet'
> enabling routing of the 127/8 address block and processing
> of ARP requests on a specific interface.
> 
> Note that for the feature to work, the default local route
> covering 127/8 dev lo needs to be removed.
> 
> Example:
>   $ sysctl -w net.ipv4.conf.eth0.route_localnet=1
>   $ ip route del 127.0.0.0/8 dev lo table local
>   $ ip addr add 127.1.0.1/16 dev eth0
>   $ ip route flush cache
> 
> V2: Fix invalid check to auto flush cache (thanks davem)
> 
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
Just out of curiosity, would it be more efficient to implement this by
optionally adding a prohibit route to the local table for 127.0.0.0/8 to every
interface that was brought up, based on whether or not that interface's
route_localnet bool was true?  It would save the additional checks in the
routing path I think.  Not sure how much of a savings that is, but I thought I
would ask.

Regards
Neil

^ permalink raw reply

* Re: [PATCHv2 net-next] ipv4: Add interface option to enable routing of 127.0.0.0/8
From: Thomas Graf @ 2012-06-12 11:31 UTC (permalink / raw)
  To: Neil Horman; +Cc: David Miller, netdev
In-Reply-To: <20120612111444.GA15984@hmsreliant.think-freely.org>

On Tue, Jun 12, 2012 at 07:14:44AM -0400, Neil Horman wrote:
> Just out of curiosity, would it be more efficient to implement this by
> optionally adding a prohibit route to the local table for 127.0.0.0/8 to every
> interface that was brought up, based on whether or not that interface's
> route_localnet bool was true?  It would save the additional checks in the
> routing path I think.  Not sure how much of a savings that is, but I thought I
> would ask.

It's not that simple because we also use the local table for source
address selection and local address verification. So we would have to
exclude/include such routes conditionally based on some route lookup
purpose indicator. Such a prohibit route would have to be valid only
in the output context.

^ permalink raw reply

* Re: [PATCH 2/5] ipv4: Kill ip_rt_frag_needed().
From: Steffen Klassert @ 2012-06-12 11:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120611.160258.866525532025442350.davem@davemloft.net>

On Mon, Jun 11, 2012 at 04:02:58PM -0700, David Miller wrote:
> 
> Here below is the kind of patch I was suggesting we make.  I did a
> simple test to make sure the update MTU code path is taken in
> raw_err().

I can confirm that your patch restores the old behaviour of ping.

> 
> But I'm having second thoughts about whether any of this is a good
> idea.
> 
> UDP works by notifying userspace of PMTU events.  And this is
> mandatory, if we're setting DF we have to get the user to decrease the
> size of its datagram writes below the reported PMTU value.
> 
> As a consequence I believe RAW sockets should also work via
> notifications.
> 
> And therefore it can be argued that in neither case should we update
> the routing cache PMTU information.  

Should be ok as long as all userspace applications that use UDP or
RAW sockets handle pmtu event notifications properly.

ping might be a special case, but now the behaviour of a large
ping (say 1400 bytes on a network that has a router with
mtu 1300 along the path) with IP_PMTUDISC_WANT might depend on
whether the cached pmtu information was updated by a recent
tcp connection.

If we had no tcp connection before, we see the behaviour that
I described in my first mail. All packets have the DF bit set.

If a tcp connection updated the cached pmtu information recently,
the packets don't have the DF bit set. They are fragmented according
to the cached pmtu information instead.

Other applications that do not care about pmtu event notifications
might be in a similar situation. So perhaps we need the kind of
patch you suggested.
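
To put numbers on the fragmented case, here is a small sketch of the fragment
arithmetic. ipv4_fragment_count() is a hypothetical helper, and it assumes a
20-byte IPv4 header without options. Reading the 1400 bytes above as ICMP
data plus an 8-byte ICMP header (also an assumption) gives an IP payload of
1408 bytes.

```c
#include <stdint.h>

/*
 * Number of IPv4 fragments needed for an IP payload of payload_len bytes
 * on a path with the given MTU, assuming a 20-byte header (no options).
 * Returns 0 if the MTU cannot carry even one 8-byte block.
 */
unsigned int ipv4_fragment_count(uint32_t payload_len, uint32_t mtu)
{
	const uint32_t ihl = 20;
	uint32_t per_frag;

	if (mtu < ihl + 8)
		return 0;
	if (payload_len + ihl <= mtu)
		return 1;	/* fits without fragmentation */

	/* Every fragment except the last carries a multiple of 8 bytes. */
	per_frag = (mtu - ihl) & ~7u;
	return (payload_len + per_frag - 1) / per_frag;
}
```

With a cached pmtu of 1300, each fragment carries at most 1280 payload bytes,
so the 1408-byte payload goes out as two fragments; with an MTU of 1500 it
fits in a single packet.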

^ permalink raw reply
