Netdev List
* [PATCH net-next 6/6] bonding: remove useless updating of slave->dev->last_rx
From: Veaceslav Falico @ 2014-01-16  2:05 UTC
  To: netdev; +Cc: Veaceslav Falico, Jay Vosburgh, Andy Gospodarek, David S. Miller
In-Reply-To: <1389837916-5377-1-git-send-email-vfalico@redhat.com>

Now that all the logic is handled via last_arp_rx, we don't need to use
last_rx.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
 drivers/net/bonding/bond_main.c | 3 ---
 include/linux/netdevice.h       | 8 +-------
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index f5ac7e0..f9e5512 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1163,9 +1163,6 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
 	slave = bond_slave_get_rcu(skb->dev);
 	bond = slave->bond;
 
-	if (bond->params.arp_interval)
-		slave->dev->last_rx = jiffies;
-
 	recv_probe = ACCESS_ONCE(bond->recv_probe);
 	if (recv_probe) {
 		ret = recv_probe(skb, bond, slave);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 30f6513..2016f00 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1299,13 +1299,7 @@ struct net_device {
 /*
  * Cache lines mostly used on receive path (including eth_type_trans())
  */
-	unsigned long		last_rx;	/* Time of last Rx
-						 * This should not be set in
-						 * drivers, unless really needed,
-						 * because network stack (bonding)
-						 * use it if/when necessary, to
-						 * avoid dirtying this cache line.
-						 */
+	unsigned long		last_rx;	/* Time of last Rx */
 
 	/* Interface address info used in eth_type_trans() */
 	unsigned char		*dev_addr;	/* hw address, (before bcast
-- 
1.8.4


* [PATCH net-next] bonding: trivial: rename slave->jiffies to ->last_link_up
From: Veaceslav Falico @ 2014-01-16  2:20 UTC
  To: netdev; +Cc: Veaceslav Falico, Jay Vosburgh, Andy Gospodarek

slave->jiffies is updated every time the slave becomes active, which, for
bonding, means that its link is 'up'.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---

Notes:
    On top of
    
    [PATCH net-next 0/6] bonding: only rely on arp packets if arp monitor is used
    
    Sorry for the trivial patch, but I'm always losing some time
    trying to remember what that actually means.

 drivers/net/bonding/bond_main.c | 20 ++++++++++----------
 drivers/net/bonding/bonding.h   |  3 ++-
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index f9e5512..0f613ae 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -846,7 +846,7 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 		return;
 
 	if (new_active) {
-		new_active->jiffies = jiffies;
+		new_active->last_link_up = jiffies;
 
 		if (new_active->link == BOND_LINK_BACK) {
 			if (USES_PRIMARY(bond->params.mode)) {
@@ -1488,7 +1488,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	}
 
 	if (new_slave->link != BOND_LINK_DOWN)
-		new_slave->jiffies = jiffies;
+		new_slave->last_link_up = jiffies;
 	pr_debug("Initial state of slave_dev is BOND_LINK_%s\n",
 		new_slave->link == BOND_LINK_DOWN ? "DOWN" :
 			(new_slave->link == BOND_LINK_UP ? "UP" : "BACK"));
@@ -1923,7 +1923,7 @@ static int bond_miimon_inspect(struct bonding *bond)
 				 * recovered before downdelay expired
 				 */
 				slave->link = BOND_LINK_UP;
-				slave->jiffies = jiffies;
+				slave->last_link_up = jiffies;
 				pr_info("%s: link status up again after %d ms for interface %s.\n",
 					bond->dev->name,
 					(bond->params.downdelay - slave->delay) *
@@ -1998,7 +1998,7 @@ static void bond_miimon_commit(struct bonding *bond)
 
 		case BOND_LINK_UP:
 			slave->link = BOND_LINK_UP;
-			slave->jiffies = jiffies;
+			slave->last_link_up = jiffies;
 
 			if (bond->params.mode == BOND_MODE_8023AD) {
 				/* prevent it from being the active one */
@@ -2345,7 +2345,7 @@ int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond,
 		bond_validate_arp(bond, slave, sip, tip);
 	else if (bond->curr_active_slave &&
 		 time_after(slave_last_arp_rx(bond, bond->curr_active_slave),
-			    bond->curr_active_slave->jiffies))
+			    bond->curr_active_slave->last_link_up))
 		bond_validate_arp(bond, slave, tip, sip);
 
 out_unlock:
@@ -2392,9 +2392,9 @@ static void bond_loadbalance_arp_mon(struct work_struct *work)
 	oldcurrent = ACCESS_ONCE(bond->curr_active_slave);
 	/* see if any of the previous devices are up now (i.e. they have
 	 * xmt and rcv traffic). the curr_active_slave does not come into
-	 * the picture unless it is null. also, slave->jiffies is not needed
-	 * here because we send an arp on each slave and give a slave as
-	 * long as it needs to get the tx/rx within the delta.
+	 * the picture unless it is null. also, slave->last_link_up is not
+	 * needed here because we send an arp on each slave and give a slave
+	 * as long as it needs to get the tx/rx within the delta.
 	 * TODO: what about up/down delay in arp mode? it wasn't here before
 	 *       so it can wait
 	 */
@@ -2516,7 +2516,7 @@ static int bond_ab_arp_inspect(struct bonding *bond)
 		 * active.  This avoids bouncing, as the last receive
 		 * times need a full ARP monitor cycle to be updated.
 		 */
-		if (bond_time_in_interval(bond, slave->jiffies, 2))
+		if (bond_time_in_interval(bond, slave->last_link_up, 2))
 			continue;
 
 		/*
@@ -2709,7 +2709,7 @@ static void bond_ab_arp_probe(struct bonding *bond)
 	new_slave->link = BOND_LINK_BACK;
 	bond_set_slave_active_flags(new_slave);
 	bond_arp_send_all(bond, new_slave);
-	new_slave->jiffies = jiffies;
+	new_slave->last_link_up = jiffies;
 	rcu_assign_pointer(bond->current_arp_slave, new_slave);
 }
 
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 99126b2..9f07af1 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -184,7 +184,8 @@ struct slave {
 	struct net_device *dev; /* first - useful for panic debug */
 	struct bonding *bond; /* our master */
 	int    delay;
-	unsigned long jiffies;
+	/* all three in jiffies */
+	unsigned long last_link_up;
 	unsigned long last_arp_rx;
 	unsigned long target_last_arp_rx[BOND_MAX_ARP_TARGETS];
 	s8     link;    /* one of BOND_LINK_XXXX */
-- 
1.8.4


* Re: [PATCH iproute2 2/2] netem: add 64bit rates support
From: Yang Yingliang @ 2014-01-16  2:28 UTC
  To: Eric Dumazet; +Cc: stephen, netdev
In-Reply-To: <1389801367.31367.345.camel@edumazet-glaptop2.roam.corp.google.com>

On 2014/1/15 23:56, Eric Dumazet wrote:
> On Wed, 2014-01-15 at 17:42 +0800, Yang Yingliang wrote:
>> netem has supported 64bit rates since linux-3.13.
>> Add 64bit rates support to the tc tools.
>>
>> tc qdisc show dev eth0
>> qdisc netem 1: dev eth4 root refcnt 2 limit 1000 rate 35Gbit
>>
>> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
>> ---
>>  tc/q_netem.c | 29 ++++++++++++++++++++++++-----
>>  1 file changed, 24 insertions(+), 5 deletions(-)
>>
[...]
>>  
>>  	if (dist_data) {
>>  		if (addattr_l(n, MAX_DIST * sizeof(dist_data[0]),
>> @@ -522,6 +532,7 @@ static int netem_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
>>  	struct tc_netem_qopt qopt;
>>  	const struct tc_netem_rate *rate = NULL;
>>  	int len = RTA_PAYLOAD(opt) - sizeof(qopt);
>> +	__u64 *rate64 = NULL;
> 
> __u64 rate64 = 0;
> 
>>  	SPRINT_BUF(b1);
>>  
>>  	if (opt == NULL)
>> @@ -572,6 +583,11 @@ static int netem_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
>>  				return -1;
>>  			ecn = RTA_DATA(tb[TCA_NETEM_ECN]);
>>  		}
>> +		if (tb[TCA_NETEM_RATE64]) {
>> +			if (RTA_PAYLOAD(tb[TCA_NETEM_RATE64]) < sizeof(*rate64))
>> +				return -1;
>> +			rate64 = RTA_DATA(tb[TCA_NETEM_RATE64]);
> 
> rate64 = rta_getattr_u64(tb[TCA_NETEM_RATE64]);

It looks better, I'll send a v2.

Thanks!


* [PATCH v6 net-next 2/4] sh_eth: Add support for r7s72100
From: Simon Horman @ 2014-01-16  2:34 UTC
  To: David S. Miller, netdev, linux-sh
  Cc: linux-arm-kernel, Magnus Damm, Sergei Shtylyov, Joe Perches,
	Simon Horman
In-Reply-To: <1389839656-10932-1-git-send-email-horms+renesas@verge.net.au>

The r7s72100 SoC includes a fast ethernet controller.

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
Dave, please consider this for net-next.

v6
* As suggested by Sergei Shtylyov
  - Update changelog
  - Position sh_eth_offset_fast_rz above sh_eth_offset_fast_rcar
  - Place [TR]X[NA]LCR0 in a separate group in sh_eth_offset_fast_rz
  - Simplify logic of sh_eth_is_rz_fast_ether
  - Add comma before 'and' in comment.

v5
* As suggested by Sergei Shtylyov
  - Add the following missing registers to sh_eth_offset_fast_rz:
    RFLR, [TR]X[NA]LCR0.
  - Set the following in r7s72100_data: no_psr, no_ade, hw_crc.
  - Use EDTRR_TRNS_GETHER instead of adding EDTRR_TRNS_RZ_ETHER.
  - Position SH_ETH_REG_FAST_RZ before SH_ETH_REG_FAST_RCAR.
  - Do not remove ',' from before 'and' in EDSR comment

v4
* As requested by David Miller
  - Use a boolean for the return value of sh_eth_is_rz_fast_ether()
  - Correct coding style in sh_eth_get_stats()

v3
* No change

v2
* As suggested by Magnus Damm and Sergei Shtylyov
  - r7s72100 ethernet is not gigabit so do not refer to it as such

* As suggested by Magnus Damm
  - Add RZ specific register layout rather than using the gigabit layout
    which includes registers that do not exist on this chip.

* As suggested by Sergei Shtylyov
  - Do not use sh_eth_chip_reset_r8a7740 as it accesses non-existent
    RMII registers. Instead use sh_eth_chip_reset.
  - Do not use sh_eth_set_rate_gether as it accesses non-existent registers.
  - Do not use reserved LCHNG bit of ECSR
  - Do not use reserved LCHNGIP bit of ECSIPR
  - Document that R8A779x also needs a 16 bit shift of the RFS bits
  - Do not document that the R7S72100 has GECMR, it does not
---
 drivers/net/ethernet/renesas/sh_eth.c | 124 ++++++++++++++++++++++++++++++++--
 drivers/net/ethernet/renesas/sh_eth.h |   3 +-
 2 files changed, 119 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index f12a929..a21be4a 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -144,6 +144,65 @@ static const u16 sh_eth_offset_gigabit[SH_ETH_MAX_REGISTER_OFFSET] = {
 	[FWALCR1]	= 0x00b4,
 };
 
+static const u16 sh_eth_offset_fast_rz[SH_ETH_MAX_REGISTER_OFFSET] = {
+	[EDSR]		= 0x0000,
+	[EDMR]		= 0x0400,
+	[EDTRR]		= 0x0408,
+	[EDRRR]		= 0x0410,
+	[EESR]		= 0x0428,
+	[EESIPR]	= 0x0430,
+	[TDLAR]		= 0x0010,
+	[TDFAR]		= 0x0014,
+	[TDFXR]		= 0x0018,
+	[TDFFR]		= 0x001c,
+	[RDLAR]		= 0x0030,
+	[RDFAR]		= 0x0034,
+	[RDFXR]		= 0x0038,
+	[RDFFR]		= 0x003c,
+	[TRSCER]	= 0x0438,
+	[RMFCR]		= 0x0440,
+	[TFTR]		= 0x0448,
+	[FDR]		= 0x0450,
+	[RMCR]		= 0x0458,
+	[RPADIR]	= 0x0460,
+	[FCFTR]		= 0x0468,
+	[CSMR]		= 0x04E4,
+
+	[ECMR]		= 0x0500,
+	[RFLR]		= 0x0508,
+	[ECSR]		= 0x0510,
+	[ECSIPR]	= 0x0518,
+	[PIR]		= 0x0520,
+	[APR]		= 0x0554,
+	[MPR]		= 0x0558,
+	[PFTCR]		= 0x055c,
+	[PFRCR]		= 0x0560,
+	[TPAUSER]	= 0x0564,
+	[MAHR]		= 0x05c0,
+	[MALR]		= 0x05c8,
+	[CEFCR]		= 0x0740,
+	[FRECR]		= 0x0748,
+	[TSFRCR]	= 0x0750,
+	[TLFRCR]	= 0x0758,
+	[RFCR]		= 0x0760,
+	[MAFCR]		= 0x0778,
+
+	[ARSTR]		= 0x0000,
+	[TSU_CTRST]	= 0x0004,
+	[TSU_VTAG0]	= 0x0058,
+	[TSU_ADSBSY]	= 0x0060,
+	[TSU_TEN]	= 0x0064,
+	[TSU_ADRH0]	= 0x0100,
+	[TSU_ADRL0]	= 0x0104,
+	[TSU_ADRH31]	= 0x01f8,
+	[TSU_ADRL31]	= 0x01fc,
+
+	[TXNLCR0]	= 0x0080,
+	[TXALCR0]	= 0x0084,
+	[RXNLCR0]	= 0x0088,
+	[RXALCR0]	= 0x008C,
+};
+
 static const u16 sh_eth_offset_fast_rcar[SH_ETH_MAX_REGISTER_OFFSET] = {
 	[ECMR]		= 0x0300,
 	[RFLR]		= 0x0308,
@@ -315,6 +374,11 @@ static bool sh_eth_is_gether(struct sh_eth_private *mdp)
 	return mdp->reg_offset == sh_eth_offset_gigabit;
 }
 
+static bool sh_eth_is_rz_fast_ether(struct sh_eth_private *mdp)
+{
+	return mdp->reg_offset == sh_eth_offset_fast_rz;
+}
+
 static void sh_eth_select_mii(struct net_device *ndev)
 {
 	u32 value = 0x0;
@@ -698,6 +762,38 @@ static struct sh_eth_cpu_data r8a7740_data = {
 	.shift_rd0	= 1,
 };
 
+/* R7S72100 */
+static struct sh_eth_cpu_data r7s72100_data = {
+	.chip_reset	= sh_eth_chip_reset,
+	.set_duplex	= sh_eth_set_duplex,
+
+	.register_type	= SH_ETH_REG_FAST_RZ,
+
+	.ecsr_value	= ECSR_ICD,
+	.ecsipr_value	= ECSIPR_ICDIP,
+	.eesipr_value	= 0xff7f009f,
+
+	.tx_check	= EESR_TC1 | EESR_FTC,
+	.eesr_err_check	= EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT |
+			  EESR_RFE | EESR_RDE | EESR_RFRMER | EESR_TFE |
+			  EESR_TDE | EESR_ECI,
+	.fdr_value	= 0x0000070f,
+	.rmcr_value	= RMCR_RNC,
+
+	.no_psr		= 1,
+	.apr		= 1,
+	.mpr		= 1,
+	.tpauser	= 1,
+	.hw_swap	= 1,
+	.rpadir		= 1,
+	.rpadir_value   = 2 << 16,
+	.no_trimd	= 1,
+	.no_ade		= 1,
+	.hw_crc		= 1,
+	.tsu		= 1,
+	.shift_rd0	= 1,
+};
+
 static struct sh_eth_cpu_data sh7619_data = {
 	.register_type	= SH_ETH_REG_FAST_SH3_SH2,
 
@@ -764,7 +860,7 @@ static int sh_eth_reset(struct net_device *ndev)
 	struct sh_eth_private *mdp = netdev_priv(ndev);
 	int ret = 0;
 
-	if (sh_eth_is_gether(mdp)) {
+	if (sh_eth_is_gether(mdp) || sh_eth_is_rz_fast_ether(mdp)) {
 		sh_eth_write(ndev, EDSR_ENALL, EDSR);
 		sh_eth_write(ndev, sh_eth_read(ndev, EDMR) | EDMR_SRST_GETHER,
 			     EDMR);
@@ -875,7 +971,7 @@ static void read_mac_address(struct net_device *ndev, unsigned char *mac)
 
 static unsigned long sh_eth_get_edtrr_trns(struct sh_eth_private *mdp)
 {
-	if (sh_eth_is_gether(mdp))
+	if (sh_eth_is_gether(mdp) || sh_eth_is_rz_fast_ether(mdp))
 		return EDTRR_TRNS_GETHER;
 	else
 		return EDTRR_TRNS_ETHER;
@@ -1038,7 +1134,8 @@ static void sh_eth_ring_format(struct net_device *ndev)
 		/* Rx descriptor address set */
 		if (i == 0) {
 			sh_eth_write(ndev, mdp->rx_desc_dma, RDLAR);
-			if (sh_eth_is_gether(mdp))
+			if (sh_eth_is_gether(mdp) ||
+			    sh_eth_is_rz_fast_ether(mdp))
 				sh_eth_write(ndev, mdp->rx_desc_dma, RDFAR);
 		}
 	}
@@ -1059,7 +1156,8 @@ static void sh_eth_ring_format(struct net_device *ndev)
 		if (i == 0) {
 			/* Tx descriptor address set */
 			sh_eth_write(ndev, mdp->tx_desc_dma, TDLAR);
-			if (sh_eth_is_gether(mdp))
+			if (sh_eth_is_gether(mdp) ||
+			    sh_eth_is_rz_fast_ether(mdp))
 				sh_eth_write(ndev, mdp->tx_desc_dma, TDFAR);
 		}
 	}
@@ -1306,9 +1404,9 @@ static int sh_eth_rx(struct net_device *ndev, u32 intr_status, int *quota)
 
 		/* In case of almost all GETHER/ETHERs, the Receive Frame State
 		 * (RFS) bits in the Receive Descriptor 0 are from bit 9 to
-		 * bit 0. However, in case of the R8A7740's GETHER, the RFS
-		 * bits are from bit 25 to bit 16. So, the driver needs right
-		 * shifting by 16.
+		 * bit 0. However, in case of the R8A7740, R8A779x, and
+		 * R7S72100 the RFS bits are from bit 25 to bit 16. So, the
+		 * driver needs right shifting by 16.
 		 */
 		if (mdp->cd->shift_rd0)
 			desc_status >>= 16;
@@ -2058,6 +2156,9 @@ static struct net_device_stats *sh_eth_get_stats(struct net_device *ndev)
 {
 	struct sh_eth_private *mdp = netdev_priv(ndev);
 
+	if (sh_eth_is_rz_fast_ether(mdp))
+		return &ndev->stats;
+
 	pm_runtime_get_sync(&mdp->pdev->dev);
 
 	ndev->stats.tx_dropped += sh_eth_read(ndev, TROCR);
@@ -2439,6 +2540,11 @@ static int sh_eth_vlan_rx_kill_vid(struct net_device *ndev,
 /* SuperH's TSU register init function */
 static void sh_eth_tsu_init(struct sh_eth_private *mdp)
 {
+	if (sh_eth_is_rz_fast_ether(mdp)) {
+		sh_eth_tsu_write(mdp, 0, TSU_TEN); /* Disable all CAM entry */
+		return;
+	}
+
 	sh_eth_tsu_write(mdp, 0, TSU_FWEN0);	/* Disable forward(0->1) */
 	sh_eth_tsu_write(mdp, 0, TSU_FWEN1);	/* Disable forward(1->0) */
 	sh_eth_tsu_write(mdp, 0, TSU_FCM);	/* forward fifo 3k-3k */
@@ -2558,6 +2664,9 @@ static const u16 *sh_eth_get_register_offset(int register_type)
 	case SH_ETH_REG_GIGABIT:
 		reg_offset = sh_eth_offset_gigabit;
 		break;
+	case SH_ETH_REG_FAST_RZ:
+		reg_offset = sh_eth_offset_fast_rz;
+		break;
 	case SH_ETH_REG_FAST_RCAR:
 		reg_offset = sh_eth_offset_fast_rcar;
 		break;
@@ -2796,6 +2905,7 @@ static struct platform_device_id sh_eth_id_table[] = {
 	{ "sh7757-ether", (kernel_ulong_t)&sh7757_data },
 	{ "sh7757-gether", (kernel_ulong_t)&sh7757_data_giga },
 	{ "sh7763-gether", (kernel_ulong_t)&sh7763_data },
+	{ "r7s72100-ether", (kernel_ulong_t)&r7s72100_data },
 	{ "r8a7740-gether", (kernel_ulong_t)&r8a7740_data },
 	{ "r8a777x-ether", (kernel_ulong_t)&r8a777x_data },
 	{ "r8a7790-ether", (kernel_ulong_t)&r8a779x_data },
diff --git a/drivers/net/ethernet/renesas/sh_eth.h b/drivers/net/ethernet/renesas/sh_eth.h
index 0fe35b7..6075915 100644
--- a/drivers/net/ethernet/renesas/sh_eth.h
+++ b/drivers/net/ethernet/renesas/sh_eth.h
@@ -155,6 +155,7 @@ enum {
 
 enum {
 	SH_ETH_REG_GIGABIT,
+	SH_ETH_REG_FAST_RZ,
 	SH_ETH_REG_FAST_RCAR,
 	SH_ETH_REG_FAST_SH4,
 	SH_ETH_REG_FAST_SH3_SH2
@@ -169,7 +170,7 @@ enum {
 
 /* Register's bits
  */
-/* EDSR : sh7734, sh7757, sh7763, and r8a7740 only */
+/* EDSR : sh7734, sh7757, sh7763, r8a7740, and r7s72100 only */
 enum EDSR_BIT {
 	EDSR_ENT = 0x01, EDSR_ENR = 0x02,
 };
-- 
1.8.4



* [PATCH v6 3/4] ARM: shmobile: r7s72100: Add clock for r7s72100-ether
From: Simon Horman @ 2014-01-16  2:34 UTC
  To: David S. Miller, netdev, linux-sh
  Cc: linux-arm-kernel, Magnus Damm, Sergei Shtylyov, Joe Perches,
	Simon Horman
In-Reply-To: <1389839656-10932-1-git-send-email-horms+renesas@verge.net.au>

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
Dave, I intend to take this change through my tree.

v6
* No change

v5
* Rebase

v3 - v4
* No change

v2
* As suggested by Sergei Shtylyov
  - Add MSTP74 to beginning of enum on a line by itself
* As suggested by Magnus Damm
  - r7s72100 ethernet is not gigabit so do not refer to it as such
---
 arch/arm/mach-shmobile/clock-r7s72100.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/mach-shmobile/clock-r7s72100.c b/arch/arm/mach-shmobile/clock-r7s72100.c
index dd8ce87..0242ca5 100644
--- a/arch/arm/mach-shmobile/clock-r7s72100.c
+++ b/arch/arm/mach-shmobile/clock-r7s72100.c
@@ -27,6 +27,7 @@
 #define FRQCR2		0xfcfe0014
 #define STBCR3		0xfcfe0420
 #define STBCR4		0xfcfe0424
+#define STBCR7		0xfcfe0430
 #define STBCR9		0xfcfe0438
 
 #define PLL_RATE 30
@@ -146,6 +147,7 @@ struct clk div4_clks[DIV4_NR] = {
 };
 
 enum {	MSTP97, MSTP96, MSTP95, MSTP94,
+	MSTP74,
 	MSTP47, MSTP46, MSTP45, MSTP44, MSTP43, MSTP42, MSTP41, MSTP40,
 	MSTP33,	MSTP_NR };
 
@@ -154,6 +156,7 @@ static struct clk mstp_clks[MSTP_NR] = {
 	[MSTP96] = SH_CLK_MSTP8(&peripheral0_clk, STBCR9, 6, 0), /* RIIC1 */
 	[MSTP95] = SH_CLK_MSTP8(&peripheral0_clk, STBCR9, 5, 0), /* RIIC2 */
 	[MSTP94] = SH_CLK_MSTP8(&peripheral0_clk, STBCR9, 4, 0), /* RIIC3 */
+	[MSTP74] = SH_CLK_MSTP8(&peripheral1_clk, STBCR7, 4, 0), /* Ether */
 	[MSTP47] = SH_CLK_MSTP8(&peripheral1_clk, STBCR4, 7, 0), /* SCIF0 */
 	[MSTP46] = SH_CLK_MSTP8(&peripheral1_clk, STBCR4, 6, 0), /* SCIF1 */
 	[MSTP45] = SH_CLK_MSTP8(&peripheral1_clk, STBCR4, 5, 0), /* SCIF2 */
@@ -180,6 +183,7 @@ static struct clk_lookup lookups[] = {
 	CLKDEV_DEV_ID("fcfee400.i2c", &mstp_clks[MSTP96]),
 	CLKDEV_DEV_ID("fcfee800.i2c", &mstp_clks[MSTP95]),
 	CLKDEV_DEV_ID("fcfeec00.i2c", &mstp_clks[MSTP94]),
+	CLKDEV_DEV_ID("r7s72100-ether", &mstp_clks[MSTP74]),
 	CLKDEV_CON_ID("mtu2_fck", &mstp_clks[MSTP33]),
 
 	/* ICK */
-- 
1.8.4



* [PATCH v6 net-next 1/4] sh_eth: Use bool as return type of sh_eth_is_gether()
From: Simon Horman @ 2014-01-16  2:34 UTC
  To: David S. Miller, netdev, linux-sh
  Cc: linux-arm-kernel, Magnus Damm, Sergei Shtylyov, Joe Perches,
	Simon Horman
In-Reply-To: <1389839656-10932-1-git-send-email-horms+renesas@verge.net.au>

Return a boolean and use true and false.

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>

---
Dave, please consider this for net-next.

v6
* Simplify logic as suggested by Joe Perches

v5
* No change

v4
* First post
---
 drivers/net/ethernet/renesas/sh_eth.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index ba1f6c9..f12a929 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -310,12 +310,9 @@ static const u16 sh_eth_offset_fast_sh3_sh2[SH_ETH_MAX_REGISTER_OFFSET] = {
 	[TSU_ADRL31]	= 0x01fc,
 };
 
-static int sh_eth_is_gether(struct sh_eth_private *mdp)
+static bool sh_eth_is_gether(struct sh_eth_private *mdp)
 {
-	if (mdp->reg_offset == sh_eth_offset_gigabit)
-		return 1;
-	else
-		return 0;
+	return mdp->reg_offset == sh_eth_offset_gigabit;
 }
 
 static void sh_eth_select_mii(struct net_device *ndev)
-- 
1.8.4


* [PATCH v6 0/4] Add ethernet support for r7s72100
From: Simon Horman @ 2014-01-16  2:34 UTC
  To: David S. Miller, netdev, linux-sh
  Cc: linux-arm-kernel, Magnus Damm, Sergei Shtylyov, Joe Perches,
	Simon Horman

Hi,

this series adds ethernet support to sh-pfc for the r7s72100 SoC.

This series is based on a merge of:
* The topic/r7s72100-v3.13-rc8-20140115 tag in my renesas tree
* net-next
  - Head revision: 08c93cd99b2f31ba9
    ("Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next")

The first two patches, targeted at net-next, also apply cleanly there.

Changes since v5
* Address feedback from Joe Perches and Sergei Shtylyov as detailed
  in the changelogs of:
  - sh_eth: Use bool as return type of sh_eth_is_gether()
  - sh_eth: Add support for r7s72100

Changes since v4
* Addressed feedback from Sergei Shtylyov as detailed in the changelog
  of "sh_eth: Add support for r7s72100"
* Rebase

Changes since v3
* Use bool as return type of sh_eth_is_gether()
  and sh_eth_is_rz_fast_ether()
* Correct coding style in sh_eth_get_stats()

Changes since v2
* Trivial rebase
* Dropped "RFC" from subject

Changes since v1 are noted in the changelog of each patch.

Simon Horman (4):
  sh_eth: Use bool as return type of sh_eth_is_gether()
  sh_eth: Add support for r7s72100
  ARM: shmobile: r7s72100: Add clock for r7s72100-ether
  ARM: shmobile: genmai: Enable r7s72100-ether

 arch/arm/mach-shmobile/board-genmai.c   |  21 +++++
 arch/arm/mach-shmobile/clock-r7s72100.c |   4 +
 drivers/net/ethernet/renesas/sh_eth.c   | 131 +++++++++++++++++++++++++++++---
 drivers/net/ethernet/renesas/sh_eth.h   |   3 +-
 4 files changed, 146 insertions(+), 13 deletions(-)

-- 
1.8.4


* [PATCH v6 4/4] ARM: shmobile: genmai: Enable r7s72100-ether
From: Simon Horman @ 2014-01-16  2:34 UTC
  To: David S. Miller, netdev, linux-sh
  Cc: linux-arm-kernel, Magnus Damm, Sergei Shtylyov, Joe Perches,
	Simon Horman, Simon Horman
In-Reply-To: <1389839656-10932-1-git-send-email-horms+renesas@verge.net.au>

Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
Dave, I intend to take this change through my tree

v3 - v6
* No change

v2
* As suggested by Magnus Damm and Sergei Shtylyov
  - r7s72100 ethernet is not gigabit so do not refer to it as such

* As suggested by Sergei Shtylyov
  - set no_ether_link as there is no LINK signal documented
    in the manual
---
 arch/arm/mach-shmobile/board-genmai.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/arm/mach-shmobile/board-genmai.c b/arch/arm/mach-shmobile/board-genmai.c
index 3e92e3c..a1f6fe1 100644
--- a/arch/arm/mach-shmobile/board-genmai.c
+++ b/arch/arm/mach-shmobile/board-genmai.c
@@ -20,15 +20,36 @@
 
 #include <linux/kernel.h>
 #include <linux/platform_device.h>
+#include <linux/sh_eth.h>
 #include <mach/common.h>
+#include <mach/irqs.h>
 #include <mach/r7s72100.h>
 #include <asm/mach-types.h>
 #include <asm/mach/arch.h>
 
+/* Ether */
+static const struct sh_eth_plat_data ether_pdata __initconst = {
+	.phy			= 0x00, /* PD60610 */
+	.edmac_endian		= EDMAC_LITTLE_ENDIAN,
+	.phy_interface		= PHY_INTERFACE_MODE_MII,
+	.no_ether_link		= 1
+};
+
+static const struct resource ether_resources[] __initconst = {
+	DEFINE_RES_MEM(0xe8203000, 0x800),
+	DEFINE_RES_MEM(0xe8204800, 0x200),
+	DEFINE_RES_IRQ(gic_iid(359)),
+};
+
 static void __init genmai_add_standard_devices(void)
 {
 	r7s72100_clock_init();
 	r7s72100_add_dt_devices();
+
+	platform_device_register_resndata(&platform_bus, "r7s72100-ether", -1,
+					  ether_resources,
+					  ARRAY_SIZE(ether_resources),
+					  &ether_pdata, sizeof(ether_pdata));
 }
 
 static const char * const genmai_boards_compat_dt[] __initconst = {
-- 
1.8.4


* Re: [RFC PATCH net-next 2/3] virtio_net: Introduce one dummy function virtnet_filter_rfs()
From: Zhi Yong Wu @ 2014-01-16  2:45 UTC
  To: Tom Herbert; +Cc: Linux Netdev List, Eric Dumazet, David Miller, Zhi Yong Wu
In-Reply-To: <CA+mtBx8Z80d5Y1ti0_68UA-x0hZPeqT4wa9knevWoPxX+eyaVA@mail.gmail.com>

On Thu, Jan 16, 2014 at 1:54 AM, Tom Herbert <therbert@google.com> wrote:
> Zhi, this is promising work! I can't wait to see how this impacts
"Zhi" is only part of my first name; please call me "Zhi Yong".
> network virtualization performance :-)
Heh, a lot of work is still missing.
>
> On Wed, Jan 15, 2014 at 6:20 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>
>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>> ---
>>  drivers/net/virtio_net.c |   11 +++++++++++
>>  1 files changed, 11 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 7b17240..046421c 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1295,6 +1295,14 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
>>         return 0;
>>  }
>>
>> +#ifdef CONFIG_RFS_ACCEL
>> +static int virtnet_filter_rfs(struct net_device *net_dev,
>> +               const struct sk_buff *skb, u16 rxq_index, u32 flow_id)
>> +{
> Does this need to be filled out with more stuff?
Yes, the following pieces are missing:
1.) The guest virtio_net driver should have a filter table whose
entries can be expired periodically.
2.) The guest virtio_net driver should pass the rx queue index and filter
info down to the emulated virtio_net NIC in QEMU.
3.) The emulated virtio_net NIC should have its own indirect table to
store the flow to rx queue mapping.
4.) The emulated virtio_net NIC should classify each rx packet to the
selected queue by applying the filter.
5.) Update the virtio spec.
Did I miss anything? If so, please correct me.
For 3.) and 4.), do you have any docs about how they are implemented in
physical NICs, e.g. mlx4_en or sfc?

>
>> +       return 0;
>> +}
>> +#endif /* CONFIG_RFS_ACCEL */
>> +
>>  static const struct net_device_ops virtnet_netdev = {
>>         .ndo_open            = virtnet_open,
>>         .ndo_stop            = virtnet_close,
>> @@ -1309,6 +1317,9 @@ static const struct net_device_ops virtnet_netdev = {
>>  #ifdef CONFIG_NET_POLL_CONTROLLER
>>         .ndo_poll_controller = virtnet_netpoll,
>>  #endif
>> +#ifdef CONFIG_RFS_ACCEL
>> +       .ndo_rx_flow_steer   = virtnet_filter_rfs,
>> +#endif
>>  };
>>
>>  static void virtnet_config_changed_work(struct work_struct *work)
>> --
>> 1.7.6.5
>>



-- 
Regards,

Zhi Yong Wu


* Re: [PATCH 1/2 v3] ixgbe: define IXGBE_MAX_VFS_DRV_LIMIT macro and cleanup const 63
From: Brown, Aaron F @ 2014-01-16  2:51 UTC
  To: ethan.kernel@gmail.com
  Cc: e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org,
	Allan, Bruce W, Brandeburg, Jesse, linux-kernel@vger.kernel.org,
	davem@davemloft.net
In-Reply-To: <CABawtvNSpDU+pjdJOpCB8kyQqA7B9N4g0S_yPgQaZnB3h0+nFg@mail.gmail.com>

On Thu, 2014-01-16 at 09:58 +0800, Ethan Zhao wrote:
> Aaron,
> 
> Is this your net-next repo? If so, I will rebuild the patch against it
> right now.
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next.git
> 
> Thanks,
> Ethan
> 
Only sort of.  Jeff uses it to push patches up, but I don't have an
account there so am simply sending them up the old fashioned way, via
email, hence it is probably not getting updated while Jeff is out.

As long as your patch can apply cleanly to Dave Miller's net-next tree I
should be able to pull it into our internal ones.


* Re: [PATCH net-next 2/2] reciprocal_divide: correction/update of the algorithm
From: Eric Dumazet @ 2014-01-16  3:07 UTC
  To: Daniel Borkmann
  Cc: davem, netdev, linux-kernel, Hannes Frederic Sowa,
	Austin S Hemmelgarn, Jesse Gross, Jamal Hadi Salim,
	Stephen Hemminger, Matt Mackall, Pekka Enberg, Christoph Lameter,
	Andy Gospodarek, Veaceslav Falico, Jay Vosburgh, Jakub Zawadzki
In-Reply-To: <1389828228-30312-3-git-send-email-dborkman@redhat.com>

On Thu, 2014-01-16 at 00:23 +0100, Daniel Borkmann wrote:

> Also, reciprocal_value() and reciprocal_divide() always return 0
> for divisions by 1. This is a bit worrisome as those functions
> also get used in mm/slab.c and lib/flex_array.c, apparently for
> index calculation to access array slots. 

Hi Daniel

This off-by-one limitation is a known one,
and mm/slab.c does not have an issue with it because:

- Minimal object size is not 1 byte, but 8 (or maybe 4)
- We always divide a multiple of the divisor,
  so there is no off-by-one effect.

The little program below does a brute-force check, if needed.

So far, the only relevant issues were BPF and the need for better
documentation of reciprocal_divide() use cases.

(I'll let Jesse comment on the flex_array case)

I am not sure we want to 'fix' things. We tried hard in the past to avoid
divides, so the ones that remain are usually there because the divisor is
not constant, so the reciprocal doesn't help.

(BPF is fixed in David's tree)

Thanks !

#include <stdio.h>

typedef unsigned int u32;
typedef unsigned long long u64;

static u32 reciprocal_value(u32 k)
{
	u64 val = (1LL << 32) + (k - 1);
	val /= k;
	return (u32)val;
}

static inline u32 reciprocal_divide(u32 A, u32 R)
{
	return (u32)(((u64)A * R) >> 32);
}

#define LIMIT 1000*1000*1000

int main()
{
	u32 divisor, dividend, reciprocal, next;
	int res = 0;

	for (divisor = 2; divisor < LIMIT; divisor++) {
		reciprocal = reciprocal_value(divisor);
		for (dividend = 0; ; dividend = next) {
			if (reciprocal_divide(dividend, reciprocal) != (dividend / divisor)) {
				printf("Arg: %u/%u was not properly computed (%u/%u)\n",
					dividend, divisor,
					reciprocal_divide(dividend, reciprocal),
					dividend / divisor);
				res = 1;
				break;
			}
			next = dividend + divisor;
			if (next < dividend)
				break;
		}
	}
	return res;
}


* Re: [PATCH 1/2 v3] ixgbe: define IXGBE_MAX_VFS_DRV_LIMIT macro and cleanup const 63
From: Ethan Zhao @ 2014-01-16  3:08 UTC (permalink / raw)
  To: Brown, Aaron F
  Cc: Kirsher, Jeffrey T, Brandeburg, Jesse, Allan, Bruce W,
	Wyborny, Carolyn, davem@davemloft.net,
	e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <309B89C4C689E141A5FF6A0C5FB2118B7312DF41@ORSMSX101.amr.corp.intel.com>

Aaron,
    OK, Dave Miller's net-next tree.

Thanks,
Ethan

On Thu, Jan 16, 2014 at 10:51 AM, Brown, Aaron F
<aaron.f.brown@intel.com> wrote:
> On Thu, 2014-01-16 at 09:58 +0800, Ethan Zhao wrote:
>> Aaron,
>>
>> Is this your net-next repo? If so, I will rebuild the patch against it
>> right now:
>> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next.git
>>
>> Thanks,
>> Ethan
>>
> Only sort of.  Jeff uses it to push patches up, but I don't have an
> account there so am simply sending them up the old fashioned way, via
> email, hence it is probably not getting updated while Jeff is out.
>
> As long as your patch can apply cleanly to Dave Miller's net-next tree I
> should be able to pull it into our internal ones.

^ permalink raw reply

* [PATCH iproute2 v2 2/2] netem: add 64bit rates support
From: Yang Yingliang @ 2014-01-16  3:09 UTC (permalink / raw)
  To: stephen; +Cc: netdev, eric.dumazet
In-Reply-To: <1389841754-22008-1-git-send-email-yangyingliang@huawei.com>

netem has supported 64-bit rates since linux-3.13.
Add 64-bit rate support to the tc tool.

tc qdisc show dev eth0
qdisc netem 1: dev eth4 root refcnt 2 limit 1000 rate 35Gbit

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 tc/q_netem.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/tc/q_netem.c b/tc/q_netem.c
index 9dd8712..946007c 100644
--- a/tc/q_netem.c
+++ b/tc/q_netem.c
@@ -183,6 +183,7 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 	__s16 *dist_data = NULL;
 	__u16 loss_type = NETEM_LOSS_UNSPEC;
 	int present[__TCA_NETEM_MAX];
+	__u64 rate64 = 0;
 
 	memset(&cor, 0, sizeof(cor));
 	memset(&reorder, 0, sizeof(reorder));
@@ -391,7 +392,7 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 		} else if (matches(*argv, "rate") == 0) {
 			++present[TCA_NETEM_RATE];
 			NEXT_ARG();
-			if (get_rate(&rate.rate, *argv)) {
+			if (get_rate64(&rate64, *argv)) {
 				explain1("rate");
 				return -1;
 			}
@@ -496,9 +497,18 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 		addattr_nest_end(n, start);
 	}
 
-	if (present[TCA_NETEM_RATE] &&
-	    addattr_l(n, 1024, TCA_NETEM_RATE, &rate, sizeof(rate)) < 0)
-		return -1;
+	if (present[TCA_NETEM_RATE]) {
+		if (rate64 >= (1ULL << 32)) {
+			if (addattr_l(n, 1024,
+				      TCA_NETEM_RATE64, &rate64, sizeof(rate64)) < 0)
+				return -1;
+			rate.rate = ~0U;
+		} else {
+			rate.rate = rate64;
+		}
+		if (addattr_l(n, 1024, TCA_NETEM_RATE, &rate, sizeof(rate)) < 0)
+			return -1;
+	}
 
 	if (dist_data) {
 		if (addattr_l(n, MAX_DIST * sizeof(dist_data[0]),
@@ -522,6 +532,7 @@ static int netem_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
 	struct tc_netem_qopt qopt;
 	const struct tc_netem_rate *rate = NULL;
 	int len = RTA_PAYLOAD(opt) - sizeof(qopt);
+	__u64 rate64 = 0;
 	SPRINT_BUF(b1);
 
 	if (opt == NULL)
@@ -572,6 +583,11 @@ static int netem_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
 				return -1;
 			ecn = RTA_DATA(tb[TCA_NETEM_ECN]);
 		}
+		if (tb[TCA_NETEM_RATE64]) {
+			if (RTA_PAYLOAD(tb[TCA_NETEM_RATE64]) < sizeof(rate64))
+				return -1;
+			rate64 = rta_getattr_u64(tb[TCA_NETEM_RATE64]);
+		}
 	}
 
 	fprintf(f, "limit %d", qopt.limit);
@@ -632,7 +648,10 @@ static int netem_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
 	}
 
 	if (rate && rate->rate) {
-		fprintf(f, " rate %s", sprint_rate(rate->rate, b1));
+		if (rate64)
+			fprintf(f, " rate %s", sprint_rate(rate64, b1));
+		else
+			fprintf(f, " rate %s", sprint_rate(rate->rate, b1));
 		if (rate->packet_overhead)
 			fprintf(f, " packetoverhead %d", rate->packet_overhead);
 		if (rate->cell_size)
-- 
1.8.0
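
A minimal sketch of the encoding rule the parse and print hunks above appear to follow: a rate that does not fit in 32 bits is carried in a separate 64-bit attribute while the legacy 32-bit field is saturated, and the reader prefers the 64-bit value when it is present. `struct rate_attrs`, `encode_rate()` and `decode_rate()` are hypothetical names for illustration, not part of iproute2.

```c
#include <stdint.h>

/* Hypothetical container for what the parse path would send to the kernel. */
struct rate_attrs {
	uint32_t rate32;	/* value carried in the legacy 32-bit field */
	int has_rate64;		/* 1 if a separate 64-bit attribute is needed */
	uint64_t rate64;	/* value of that attribute, if present */
};

static struct rate_attrs encode_rate(uint64_t rate)
{
	struct rate_attrs a = { 0, 0, 0 };

	if (rate >= (1ULL << 32)) {
		a.has_rate64 = 1;
		a.rate64 = rate;
		a.rate32 = ~0U;		/* saturate the legacy field */
	} else {
		a.rate32 = (uint32_t)rate;
	}
	return a;
}

/* Mirrors the print path: prefer the 64-bit value when one was received. */
static uint64_t decode_rate(const struct rate_attrs *a)
{
	if (a->has_rate64 && a->rate64)
		return a->rate64;
	return a->rate32;
}
```

With this rule, a reader that only understands the 32-bit field still sees a non-zero (saturated) rate instead of a truncated one.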

^ permalink raw reply related

* [PATCH iproute2 v2 1/2] tbf: support sending burst/mtu to kernel directly
From: Yang Yingliang @ 2014-01-16  3:09 UTC (permalink / raw)
  To: stephen; +Cc: netdev, eric.dumazet
In-Reply-To: <1389841754-22008-1-git-send-email-yangyingliang@huawei.com>

To avoid precision loss when converting burst to buffer in userspace, send
burst/mtu to the kernel directly.

Kernel commit 2e04ad424b ("sch_tbf: add TBF_BURST/TBF_PBURST attribute")
makes the kernel able to handle burst/mtu.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 tc/q_tbf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tc/q_tbf.c b/tc/q_tbf.c
index 2fbfd3b..f3022b6 100644
--- a/tc/q_tbf.c
+++ b/tc/q_tbf.c
@@ -232,12 +232,14 @@ static int tbf_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl
 	tail = NLMSG_TAIL(n);
 	addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
 	addattr_l(n, 2024, TCA_TBF_PARMS, &opt, sizeof(opt));
+	addattr_l(n, 2124, TCA_TBF_BURST, &buffer, sizeof(buffer));
 	if (rate64 >= (1ULL << 32))
 		addattr_l(n, 2124, TCA_TBF_RATE64, &rate64, sizeof(rate64));
 	addattr_l(n, 3024, TCA_TBF_RTAB, rtab, 1024);
 	if (opt.peakrate.rate) {
 		if (prate64 >= (1ULL << 32))
 			addattr_l(n, 3124, TCA_TBF_PRATE64, &prate64, sizeof(prate64));
+		addattr_l(n, 3224, TCA_TBF_PBURST, &mtu, sizeof(mtu));
 		addattr_l(n, 4096, TCA_TBF_PTAB, ptab, 1024);
 	}
 	tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
-- 
1.8.0
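
A minimal sketch of the kind of loss the commit message refers to, assuming the userspace conversion is an integer division of a byte count into clock ticks at a given rate. The function names and the tick resolution are hypothetical and are not tc's actual helpers; the point is only that the round trip through ticks does not always return the original byte count.

```c
#include <stdint.h>

/* Hypothetical: time (in ticks) needed to send `bytes` at `rate` bytes/s. */
static uint32_t bytes_to_ticks(uint64_t rate, uint32_t bytes, uint32_t ticks_per_sec)
{
	return (uint32_t)(((uint64_t)bytes * ticks_per_sec) / rate);
}

/* Hypothetical inverse: bytes that can be sent in `ticks` at `rate` bytes/s. */
static uint32_t ticks_to_bytes(uint64_t rate, uint32_t ticks, uint32_t ticks_per_sec)
{
	return (uint32_t)(((uint64_t)ticks * rate) / ticks_per_sec);
}
```

For example, at 3000 bytes/s with 1000 ticks per second, a 1000-byte burst becomes 333 ticks and converts back to 999 bytes. Sending the byte count itself avoids that rounding.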

^ permalink raw reply related

* [PATCH iproute2 v2 0/2] two improvements in tc
From: Yang Yingliang @ 2014-01-16  3:09 UTC (permalink / raw)
  To: stephen; +Cc: netdev, eric.dumazet

Support sending burst/mtu to kernel directly in tbf.
Support 64bit rate in netem.

v1 -> v2:
  patch 2/2: Use rta_getattr_u64() to get value of rate64.

Yang Yingliang (2):
  tbf: support sending burst/mtu to kernel directly
  netem: add 64bit rates support

 tc/q_netem.c | 29 ++++++++++++++++++++++++-----
 tc/q_tbf.c   |  2 ++
 2 files changed, 26 insertions(+), 5 deletions(-)

-- 
1.8.0

^ permalink raw reply

* Re: [PATCH net-next 1/2] random32: add prandom_u32_lt_N and convert "misuses" of reciprocal_divide
From: Eric Dumazet @ 2014-01-16  3:14 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: davem, netdev, linux-kernel, Jakub Zawadzki, Hannes Frederic Sowa
In-Reply-To: <1389828228-30312-2-git-send-email-dborkman@redhat.com>

On Thu, 2014-01-16 at 00:23 +0100, Daniel Borkmann wrote:

> @@ -1220,7 +1219,7 @@ static unsigned int fanout_demux_hash(struct packet_fanout *f,
>  				      struct sk_buff *skb,
>  				      unsigned int num)
>  {
> -	return reciprocal_divide(skb->rxhash, num);
> +	return (u32)(((u64) skb->rxhash * num) >> 32);
>  }
>  

This is unfortunate.

(This reverts one of your patches: f55d112e529386)

Please add a helper to explain what's going on here, and in the many other
spots where we do this computation (as in get_rps_cpu()).
Few people really understand this.

Or keep reciprocal_divide() as is, and introduce a new set of functions
for people really wanting the precise divides.
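
A minimal sketch of what such a helper could look like. The name `scale_u32_to_range()` is hypothetical and is not taken from the patch; it only wraps the expression from the hunk above and documents what it computes.

```c
#include <stdint.h>

/*
 * Hypothetical helper: map a 32-bit value that is uniformly distributed
 * over [0, 2^32) onto [0, n) without a divide. The result is
 * floor(val * n / 2^32), so it is always strictly less than n for n > 0.
 */
static inline uint32_t scale_u32_to_range(uint32_t val, uint32_t n)
{
	return (uint32_t)(((uint64_t)val * n) >> 32);
}
```

Note that this is a scaling of the hash into a range, not a division of the hash by `n`, which is presumably why a dedicated name helps readers.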

^ permalink raw reply

* Re: [PATCH iproute2 v2 2/2] netem: add 64bit rates support
From: Eric Dumazet @ 2014-01-16  3:17 UTC (permalink / raw)
  To: Yang Yingliang; +Cc: stephen, netdev
In-Reply-To: <1389841754-22008-3-git-send-email-yangyingliang@huawei.com>

On Thu, 2014-01-16 at 11:09 +0800, Yang Yingliang wrote:
> netem has supported 64-bit rates since linux-3.13.
> Add 64-bit rate support to the tc tool.
> 
> tc qdisc show dev eth0
> qdisc netem 1: dev eth4 root refcnt 2 limit 1000 rate 35Gbit
> 
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> ---
>  tc/q_netem.c | 29 ++++++++++++++++++++++++-----
>  1 file changed, 24 insertions(+), 5 deletions(-)

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH 1/2 v3] ixgbe: define IXGBE_MAX_VFS_DRV_LIMIT macro and cleanup const 63
From: Ethan Zhao @ 2014-01-16  4:20 UTC (permalink / raw)
  To: Brown, Aaron F
  Cc: ethan zhao, e1000-devel@lists.sourceforge.net,
	netdev@vger.kernel.org, Allan, Bruce W, Brandeburg, Jesse,
	linux-kernel@vger.kernel.org, davem@davemloft.net
In-Reply-To: <CABawtvPJ1ObkwbKjVtzajQnn_181CTo-jmLc0uGY2nqZw9_jtg@mail.gmail.com>

Aaron,
    I revised those patches for Dave Miller's net-next tree; they build
cleanly and have been resent.

Thanks,
Ethan

On Thu, Jan 16, 2014 at 11:08 AM, Ethan Zhao <ethan.kernel@gmail.com> wrote:
> Aaron,
>     OK, Dave Miller's net-next tree.
>
> Thanks,
> Ethan
>
> On Thu, Jan 16, 2014 at 10:51 AM, Brown, Aaron F
> <aaron.f.brown@intel.com> wrote:
>> On Thu, 2014-01-16 at 09:58 +0800, Ethan Zhao wrote:
>>> Aaron,
>>>
>>> Is this your net-next repo? If so, I will rebuild the patch against it
>>> right now:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next.git
>>>
>>> Thanks,
>>> Ethan
>>>
>> Only sort of.  Jeff uses it to push patches up, but I don't have an
>> account there so am simply sending them up the old fashioned way, via
>> email, hence it is probably not getting updated while Jeff is out.
>>
>> As long as your patch can apply cleanly to Dave Miller's net-next tree I
>> should be able to pull it into our internal ones.

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply

* Re: [RFC PATCH net-next 0/3] virtio_net: add aRFS support
From: Jason Wang @ 2014-01-16  4:23 UTC (permalink / raw)
  To: Zhi Yong Wu; +Cc: netdev, therbert, edumazet, davem, Zhi Yong Wu
In-Reply-To: <1389795654-28381-1-git-send-email-zwu.kernel@gmail.com>

On 01/15/2014 10:20 PM, Zhi Yong Wu wrote:
> From: Zhi Yong Wu<wuzhy@linux.vnet.ibm.com>
>
> HI, folks
>
> The patchset is trying to integrate aRFS support to virtio_net. In this case,
> aRFS will be used to select the RX queue. To make sure that it's going ahead
> in the correct direction, although it is still one RFC and isn't tested, it's
> post out ASAP. Any comment are appreciated, thanks.
>
> If anyone is interested in playing with it, you can get this patchset from my
> dev git on github:
>    git://github.com/wuzhy/kernel.git virtnet_rfs
>
> Zhi Yong Wu (3):
>    virtio_pci: Introduce one new config api vp_get_vq_irq()
>    virtio_net: Introduce one dummy function virtnet_filter_rfs()
>    virtio-net: Add accelerated RFS support
>
>   drivers/net/virtio_net.c      |   67 ++++++++++++++++++++++++++++++++++++++++-
>   drivers/virtio/virtio_pci.c   |   11 +++++++
>   include/linux/virtio_config.h |   12 +++++++
>   3 files changed, 89 insertions(+), 1 deletions(-)
>

Please run get_maintainer.pl before sending the patch. You should at
least cc the virtio maintainer/list for this.

The core aRFS method is a no-op in this RFC, which leaves this series
with little to discuss. You should at least describe the big picture
in the cover letter. I suggest you either post an RFC that runs and
produces the expected result, or just start a thread for the design
discussion.

This method has also been discussed before: you can search for "[net-next RFC
PATCH 5/5] virtio-net: flow director support" in the netdev archive for a
very old prototype that I implemented. It works, and it looks like most of
this RFC has already been done there.

A basic question is whether or not we need this, since not all the mq cards
use aRFS (see ixgbe ATR), and whether or not it brings extra
overhead. For virtio, we want to reduce vmexits as much as possible,
but this aRFS seems to introduce a lot more of them. Building a complex
interface just for a virtual device may not be good; a simple method may
work for most cases.

We really should consider offloading this to a real NIC. VMDq and L2
forwarding offload may help in this case.

^ permalink raw reply

* Re: [PATCH net-next RFC] virtio-net: drop rq->max and rq->num
From: Jason Wang @ 2014-01-16  4:24 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, virtualization, linux-kernel, mst
In-Reply-To: <20140115.164649.533508366980529205.davem@davemloft.net>

On 01/16/2014 08:46 AM, David Miller wrote:
> From: Rusty Russell<rusty@rustcorp.com.au>
> Date: Thu, 16 Jan 2014 10:25:26 +1030
>
>> Rusty Russell<rusty@rustcorp.com.au>  writes:
>>> Jason Wang<jasowang@redhat.com>  writes:
>>>> It looks like there's no need for those two fields:
>>>>
>>>> - Unless there's a failure for the first refill try, rq->max should be always
>>>>    equal to the vring size.
>>>> - rq->num is only used to determine the condition that we need to do the refill,
>>>>    we could check vq->num_free instead.
>>>> - rq->num was required to be increased or decreased explicitly after each
>>>>    get/put which results a bad API.
>>>>
>>>> So this patch removes them both to make the code simpler.
>>> Nice.  These fields date from when the vq struct was opaque.
>>>
>>> Applied,
>>> Rusty.
>> Oops, this doesn't require any core virtio changes, so it's for DaveM:
>>
>> Acked-by: Rusty Russell<rusty@rustcorp.com.au>
> Jason please repost this with Rusty's ACK, thanks.

Sure, will repost.

^ permalink raw reply

* [PATCH 1/2 Net-next] ixgbe: define IXGBE_MAX_VFS_DRV_LIMIT macro and cleanup const 63
From: ethan zhao @ 2014-01-16  4:25 UTC (permalink / raw)
  To: aaron.f.brown, jeffrey.t.kirsher
  Cc: e1000-devel, netdev, ethan.kernel, linux-kernel


Because the ixgbe driver limits the maximum number of VFs that can be enabled
to 63, define a macro IXGBE_MAX_VFS_DRV_LIMIT and replace the constant 63
in the code.

v3: revised for net-next tree.

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Ethan Zhao <ethan.kernel@gmail.com>
---
  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |    4 ++--
  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |    4 ++--
  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h |    5 +++++
  3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index cc06854..bea2cec 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -5028,7 +5028,7 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter)

      /* assign number of SR-IOV VFs */
      if (hw->mac.type != ixgbe_mac_82598EB) {
-        if (max_vfs > 63) {
+        if (max_vfs > IXGBE_MAX_VFS_DRV_LIMIT) {
              adapter->num_vfs = 0;
              e_dev_warn("max_vfs parameter out of range. Not assigning any SR-IOV VFs\n");
          } else {
@@ -7973,7 +7973,7 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
      ixgbe_init_mbx_params_pf(hw);
      memcpy(&hw->mbx.ops, ii->mbx_ops, sizeof(hw->mbx.ops));
      ixgbe_enable_sriov(adapter);
-    pci_sriov_set_totalvfs(pdev, 63);
+    pci_sriov_set_totalvfs(pdev, IXGBE_MAX_VFS_DRV_LIMIT);
  skip_sriov:

  #endif
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 359f6e6..b324260 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -148,7 +148,7 @@ void ixgbe_enable_sriov(struct ixgbe_adapter *adapter)
           * physical function.  If the user requests greater thn
           * 63 VFs then it is an error - reset to default of zero.
           */
-        adapter->num_vfs = min_t(unsigned int, adapter->num_vfs, 63);
+        adapter->num_vfs = min_t(unsigned int, adapter->num_vfs, IXGBE_MAX_VFS_DRV_LIMIT);

          err = pci_enable_sriov(adapter->pdev, adapter->num_vfs);
          if (err) {
@@ -257,7 +257,7 @@ static int ixgbe_pci_sriov_enable(struct pci_dev *dev, int num_vfs)
       * PF.  The PCI bus driver already checks for other values out of
       * range.
       */
-    if (num_vfs > 63) {
+    if (num_vfs > IXGBE_MAX_VFS_DRV_LIMIT) {
          err = -EPERM;
          goto err_out;
      }
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
index 4713f9f..8bd2919 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
@@ -28,6 +28,11 @@
  #ifndef _IXGBE_SRIOV_H_
  #define _IXGBE_SRIOV_H_

+/* The ixgbe driver limits the maximum number of VFs that can be enabled
+ * to 63 (IXGBE_MAX_VF_FUNCTIONS - 1).
+ */
+#define IXGBE_MAX_VFS_DRV_LIMIT  (IXGBE_MAX_VF_FUNCTIONS - 1)
+
  void ixgbe_restore_vf_multicasts(struct ixgbe_adapter *adapter);
  void ixgbe_msg_task(struct ixgbe_adapter *adapter);
  int ixgbe_vf_configuration(struct pci_dev *pdev, unsigned int event_mask);
-- 
1.7.1



^ permalink raw reply related

* [PATCH 2/2 Net-next] ixgbe: set driver_max_VFs should be done before enabling SRIOV
From: ethan zhao @ 2014-01-16  4:27 UTC (permalink / raw)
  To: aaron.f.brown, jeffrey.t.kirsher
  Cc: e1000-devel, netdev, ethan.kernel, linux-kernel


Commit 43dc4e01 ("Limit number of reported VFs to device specific value")
does not work: pci_sriov_set_totalvfs() always returns -EBUSY because the
VFs were already enabled.

ixgbe_enable_sriov()
         pci_enable_sriov()
                 sriov_enable()
                 {
                 ... ..
                 iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
                 pci_cfg_access_lock(dev);
                 ... ...
                 }

pci_sriov_set_totalvfs()
{
... ...
if (dev->sriov->ctrl & PCI_SRIOV_CTRL_VFE)
                 return -EBUSY;
...
}

So driver_max_VFs should be set with pci_sriov_set_totalvfs() before
the VFs are enabled with ixgbe_enable_sriov().

V2: revised for net-next tree.

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Ethan Zhao <ethan.kernel@gmail.com>
---
  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index bea2cec..6e6af0d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7972,8 +7972,8 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
      /* Mailbox */
      ixgbe_init_mbx_params_pf(hw);
      memcpy(&hw->mbx.ops, ii->mbx_ops, sizeof(hw->mbx.ops));
-    ixgbe_enable_sriov(adapter);
      pci_sriov_set_totalvfs(pdev, IXGBE_MAX_VFS_DRV_LIMIT);
+    ixgbe_enable_sriov(adapter);
  skip_sriov:

  #endif
-- 
1.7.1
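
A minimal sketch of the ordering problem described in the commit message, using a toy model rather than the real PCI core. `struct mock_sriov`, `mock_set_totalvfs()`, `mock_enable_sriov()` and `MOCK_SRIOV_CTRL_VFE` are hypothetical stand-ins; only the "refuse with -EBUSY once the VF-enable bit is set" check is taken from the call trace quoted above.

```c
#include <errno.h>

#define MOCK_SRIOV_CTRL_VFE 0x01	/* stand-in for PCI_SRIOV_CTRL_VFE */

/* Hypothetical, heavily reduced model of the SR-IOV state. */
struct mock_sriov {
	unsigned int ctrl;
	int driver_max_vfs;
};

/* Models the check quoted above: refuse once VFs are enabled. */
static int mock_set_totalvfs(struct mock_sriov *iov, int numvfs)
{
	if (iov->ctrl & MOCK_SRIOV_CTRL_VFE)
		return -EBUSY;
	iov->driver_max_vfs = numvfs;
	return 0;
}

/* Models the enable path: only the VFE bit matters for this sketch. */
static void mock_enable_sriov(struct mock_sriov *iov)
{
	iov->ctrl |= MOCK_SRIOV_CTRL_VFE;
}
```

In this model, enabling first and then setting the limit fails with -EBUSY and leaves the limit unset, whereas setting the limit first succeeds, which is the reordering the patch makes.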



^ permalink raw reply related

* Re: [PATCH net-next] tun/macvtap: limit the packets queued through rcvbuf
From: Jason Wang @ 2014-01-16  4:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: davem, netdev, linux-kernel, Vlad Yasevich, John Fastabend,
	Stephen Hemminger, Herbert Xu
In-Reply-To: <20140115072104.GA32078@redhat.com>

On 01/15/2014 03:21 PM, Michael S. Tsirkin wrote:
> On Wed, Jan 15, 2014 at 11:36:01AM +0800, Jason Wang wrote:
>> On 01/14/2014 05:52 PM, Michael S. Tsirkin wrote:
>>> On Tue, Jan 14, 2014 at 04:45:24PM +0800, Jason Wang wrote:
>>>>> On 01/14/2014 04:25 PM, Michael S. Tsirkin wrote:
>>>>>>> On Tue, Jan 14, 2014 at 02:53:07PM +0800, Jason Wang wrote:
>>>>>>>>> We used to limit the number of packets queued through tx_queue_length. This
>>>>>>>>> has several issues:
>>>>>>>>>
>>>>>>>>> - tx_queue_length is the control of qdisc queue length, simply reusing it
>>>>>>>>>    to control the packets queued by device may cause confusion.
>>>>>>>>> - After commit 6acf54f1cf0a6747bac9fea26f34cfc5a9029523 ("macvtap: Add
>>>>>>>>>    support of packet capture on macvtap device."), an unexpected qdisc
>>>>>>>>>    caused by non-zero tx_queue_length will lead to qdisc lock contention
>>>>>>>>>    for multiqueue devices.
>>>>>>>>> - What we really want is to limit the total amount of memory occupied not
>>>>>>>>>    the number of packets.
>>>>>>>>>
>>>>>>>>> So this patch tries to solve the above issues by using socket rcvbuf to
>>>>>>>>> limit the packets could be queued for tun/macvtap. This was done by using
>>>>>>>>> sock_queue_rcv_skb() instead of a direct call to skb_queue_tail(). Also two
>>>>>>>>> new ioctl() were introduced for userspace to change the rcvbuf like what we
>>>>>>>>> have done for sndbuf.
>>>>>>>>>
>>>>>>>>> With this fix, we can safely change the tx_queue_len of macvtap to
>>>>>>>>> zero. This will make multiqueue works without extra lock contention.
>>>>>>>>>
>>>>>>>>> Cc: Vlad Yasevich<vyasevic@redhat.com>
>>>>>>>>> Cc: Michael S. Tsirkin<mst@redhat.com>
>>>>>>>>> Cc: John Fastabend<john.r.fastabend@intel.com>
>>>>>>>>> Cc: Stephen Hemminger<stephen@networkplumber.org>
>>>>>>>>> Cc: Herbert Xu<herbert@gondor.apana.org.au>
>>>>>>>>> Signed-off-by: Jason Wang<jasowang@redhat.com>
>>>>>>> No, I don't think we can change userspace-visible behaviour like that.
>>>>>>>
>>>>>>> This will break any existing user that tries to control
>>>>>>> queue length through sysfs,netlink or device ioctl.
>>>>> But it looks like a buggy API, since tx_queue_len should be for qdisc
>>>>> queue length instead of device itself.
>>> Probably, but it's been like this since 2.6.x time.
>>> Also, qdisc queue is unused for tun so it seemed kind of
>>> reasonable to override tx_queue_len.
>>>
>>>>> If we really want to preserve the
>>>>> behaviour, how about using a new feature flag and change the behaviour
>>>>> only when the device is created (TUNSETIFF) with the new flag?
>>> OK this addresses the issue partially, but there's also an issue
>>> of permissions: tx_queue_len can only be changed if
>>> capable(CAP_NET_ADMIN). OTOH in your patch a regular user
>>> can change the amount of memory consumed per queue
>>> by calling TUNSETRCVBUF.
>> Yes, but we have the same issue for TUNSETSNDBUF.
> To an extent, but TUNSETSNDBUF is different. It limits how much device can queue
> *in the networking stack* but each queue in the stack is also
> limited, when we exceed that we start dropping packets.
> So while with infinite value (which is the default btw)
> you can keep host pretty busy, you will not be able to run
> it out of memory.
>
> The proposed TUNSETRCVBUF would keep configured amount
> of memory around indefinitely so you can run host out of memory.
>
> So assuming all this
> How about an ethtool or netlink command to configure this
> instead?
>

OK, so we can add a net admin check before trying to set rcvbuf. I
think it's better to use an ioctl since we already use one for sndbuf.
Using ethtool means you need a dedicated new ethtool method just for
tuntap, which seems sub-optimal. Netlink looks better, but we should then
implement the other ioctls that way as well.
>>>>>>> Take a look at my patch in msg ID 20140109071721.GD19559@redhat.com
>>>>>>> which gives one way to set tx_queue_len to zero without
>>>>>>> breaking userspace.
>>>>> If I read the patch correctly, it will make no way for the user who
>>>>> really want to change the qdisc queue length for tun.
>>> Why would this matter?  As far as I can see qdisc queue is currently unused.
>>>
>> User may use qdisc to do port mirroring, bandwidth limitation, traffic
>> prioritization or more for a VM. So we do have users and maybe more
>> consider the case of vpn.
> Well it's not used by default at least.
> I remember that we discussed this previously actually.
>
> If all we want to do actually is utilize no_qdisc by default,
> we can simply use Eric's patch:
>
> http://article.gmane.org/gmane.linux.kernel/1279597
>
> and a similar patch for macvtap.
> I tried it at the time and it didn't seem to help performance
> at all, but a lot has changed since, in particular I didn't
> test mq.
>
> If you now have results showing how it's beneficial, pls post them.
>

I will run a test to see the difference.

^ permalink raw reply

* Re: [PATCH net-next v3 2/2] net: Check skb->rxhash in gro_receive
From: Jerry Chu @ 2014-01-16  5:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tom Herbert, David Miller, netdev@vger.kernel.org
In-Reply-To: <1389806708.31367.354.camel@edumazet-glaptop2.roam.corp.google.com>

On Wed, Jan 15, 2014 at 9:25 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2014-01-15 at 08:58 -0800, Tom Herbert wrote:
>> When initializing a gro_list for a packet, first check the rxhash of
>> the incoming skb against that of the skb's in the list. This should be
>> a very strong indicator of whether the flow is going to be matched,
>> and potentially allows a lot of other checks to be short circuited.
>> Use skb_hash_raw so that we don't force the hash to be calculated.
>>
>> Tested by running netperf 200 TCP_STREAMs between two machines with
>> GRO, HW rxhash, and 1G. Saw no performance degradation, slight reduction
>> of time in dev_gro_receive.
>>
>> Signed-off-by: Tom Herbert <therbert@google.com>
>> ---
>>  net/core/dev.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 20c834e..c063c7c 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3818,10 +3818,18 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)
>>  {
>>       struct sk_buff *p;
>>       unsigned int maclen = skb->dev->hard_header_len;
>> +     u32 hash = skb_get_hash_raw(skb);
>>
>>       for (p = napi->gro_list; p; p = p->next) {
>>               unsigned long diffs;
>>
>> +             NAPI_GRO_CB(p)->flush = 0;
>> +
>> +             if (hash != skb_get_hash_raw(p)) {
>> +                     NAPI_GRO_CB(p)->same_flow = 0;
>> +                     continue;
>> +             }
>> +
>>               diffs = (unsigned long)p->dev ^ (unsigned long)skb->dev;
>>               diffs |= p->vlan_tci ^ skb->vlan_tci;
>>               if (maclen == ETH_HLEN)
>> @@ -3832,7 +3840,6 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)
>>                                      skb_gro_mac_header(skb),
>>                                      maclen);
>>               NAPI_GRO_CB(p)->same_flow = !diffs;
>> -             NAPI_GRO_CB(p)->flush = 0;
>>       }
>>  }
>>
>
> Acked-by: Eric Dumazet <edumazet@google.com>
>
> Hmm, this looks like we should clear flush_id in ipv6 handler,
> otherwise we might reuse a flush_id set from a prior gro invocation in
> ipv4 (skb can be reused in napi_reuse_skb())

Good catch! It's also needed for the IPv6-encapsulated-in-IPv4 case (I forgot
that there is no frag/id, hence no such check, for IPv6).

>
> Jerry, what do you think of following fix ?
>
> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> index 1e8683b135bb..598acd76ca4a 100644
> --- a/net/ipv6/ip6_offload.c
> +++ b/net/ipv6/ip6_offload.c
> @@ -256,6 +256,7 @@ static struct sk_buff **ipv6_gro_receive(struct sk_buff **head,
>                 /* flush if Traffic Class fields are different */
>                 NAPI_GRO_CB(p)->flush |= !!(first_word & htonl(0x0FF00000));
>                 NAPI_GRO_CB(p)->flush |= flush;
> +               NAPI_GRO_CB(p)->flush_id = 0;
>         }
>
>         NAPI_GRO_CB(skb)->flush |= flush;

Acked-by: H.K. Jerry Chu <hkchu@google.com>

>
>
>

^ permalink raw reply

* Re: [PATCH net-next 0/6] bonding: only rely on arp packets if arp monitor is used
From: Jay Vosburgh @ 2014-01-16  5:09 UTC (permalink / raw)
  To: Veaceslav Falico; +Cc: netdev, Andy Gospodarek, David S. Miller
In-Reply-To: <1389837916-5377-1-git-send-email-vfalico@redhat.com>

Veaceslav Falico <vfalico@redhat.com> wrote:

>Currently, if arp_validate is off (0), slave_last_rx() returns the
>slave->dev->last_rx, which is always updated on *any* packet received by
>slave, and not only arps. This means that, if the validation of arps is
>off, we're treating *any* incoming packet as a proof of slave being up, and
>not only arps.

	The "any incoming packet" part is intentional.

>This might seem logical at the first glance, however it can cause a lot of
>troubles and false-positives, one example would be:
>
>The arp_ip_target is NOT accessible, however someone in the broadcast domain
>spams with any broadcast traffic. This way bonding will be tricked that the
>slave is still up (as in - can access arp_ip_target), while it's not.

	This type of situation is why arp_validate was added.

	The specific situation was when multiple hosts using bonding
with the ARP monitor were set up behind a common gateway (in the same
Ethernet broadcast domain).  The arp_ip_target is unreachable for
whatever reason.  In that case, the various bonding instances on the
different hosts will each issue broadcast ARP requests, and (in the
absence of arp_validate) those requests would trick the other bonds into
believing that they are up.

	I don't think this patch set will resolve that problem, since
you explicitly permit any incoming ARP to count.

>The documentation for arp_validate also states that *ARPs* will (not) be
>validated if it's on/off, and that the arp monitoring works on arps as
>traffic generators.

	I wrote most of that text in the documentation, and the intent
was not to imply that only ARPs should count for "up-ness" even without
arp_validate enabled.  The intent was to distinguish it from
"non-validate," in which any incoming traffic counted for "up-ness."

	The main reason for preserving the non-validate behavior (any
traffic counts) is for the loadbalance (xor and rr) modes.  In those
modes, the switch decides which slave receives the incoming traffic, and
so it's to our advantage to permit any incoming traffic to count for
"up-ness."  The arp_validate option is not allowed in these modes
because it won't work.

	With these changes, I suspect that the loadbalance ARP monitor
will be less reliable (granted that it's already a
bit dodgy in its dependence on the switch to hit all slaves with
incoming packets regularly).  Particularly if the switch ports are
configured into an Etherchannel ("static link aggregation") group, in
which case only one slave will receive any given frame (broadcast /
multicast traffic will not be duplicated across all slaves).

	I'm not sure that this change (the "only count ARPs even without
arp_validate" bit) won't break existing configurations.  Did you test
the -rr and -xor modes with ARP monitor after your changes (with and
without configuring a channel group on the switch ports)?

>Also, the net_device->last_rx is already used in a lot of drivers (even
>though the comment states to NOT do it :)), and it's also ugly to modify it
>from bonding.

	I didn't check, but I suspect those are mostly leftovers from
the distant past, when the drivers were expected to update last_rx, or
perhaps drivers using it for their own purposes.

	I don't really see an issue in decoupling bonding from the
net_device->last_rx; it's pretty much the same thing that was done for
trans_start some time ago.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


>So, to fix this, remove the last_rx from bonding, *always* call
>bond_arp_rcv() in slave's rx_handler (bond_handle_frame), and if we spot an
>arp there - update the slave->last_arp_rx - and use it instead of
>net_device->last_rx. Finally, rename slave_last_rx() to slave_last_arp_rx()
>to reflect the changes.
>
>As the changes touch really sensitive parts, I've tried to split them as
>much as possible, for easier debugging/bisecting.
>
>CC: Jay Vosburgh <fubar@us.ibm.com>
>CC: Andy Gospodarek <andy@greyhouse.net>
>CC: "David S. Miller" <davem@davemloft.net>
>CC: netdev@vger.kernel.org
>Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
>
>---
> drivers/net/bonding/bond_main.c    | 18 ++++++++----------
> drivers/net/bonding/bond_options.c | 12 ++----------
> drivers/net/bonding/bonding.h      | 16 ++++++----------
> include/linux/netdevice.h          |  8 +-------
> 4 files changed, 17 insertions(+), 37 deletions(-)
>

^ permalink raw reply

