Netdev List
 help / color / mirror / Atom feed
* [PATCH -net v2 0/2] bonding: fix arp_validate desync state & race
From: Nikolay Aleksandrov @ 2013-09-06 22:00 UTC (permalink / raw)
  To: netdev; +Cc: fubar, andy, davem

Hello all,
These two patches aim to fix the possible de-sync state which the bond
can enter if we have arp_validate without arp_interval or the other way
around. They also fix a race condition between arp_validate setting and
mode changing.

Patch 01 - fixes the race condition between store_arp_validate and bond
mode change by using rtnl for sync
Patch 02 - fixes the possible de-sync state by setting/unsetting recv_probe
if arp_interval is set/unset and also if arp_validate is set/unset

v2: Fix the mode check in store_arp_validate

Best regards,
 Nikolay Aleksandrov


Nikolay Aleksandrov (2):
  bonding: fix store_arp_validate race with mode change
  bonding: fix bond_arp_rcv setting and arp validate desync state

 drivers/net/bonding/bond_main.c  |  4 ++--
 drivers/net/bonding/bond_sysfs.c | 31 +++++++++++++++++++++++++------
 drivers/net/bonding/bonding.h    |  1 +
 3 files changed, 28 insertions(+), 8 deletions(-)

-- 
1.8.1.4

^ permalink raw reply

* Re: [RFC PATCHv2 0/4] Add DT support for fixed PHYs
From: Florian Fainelli @ 2013-09-06 20:42 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: David S. Miller, netdev, devicetree, Lior Amsalem,
	Gregory Clement, Ezequiel Garcia, linux-arm-kernel, Mark Rutland,
	Sascha Hauer, Christian Gmeiner
In-Reply-To: <1378480701-12908-1-git-send-email-thomas.petazzoni@free-electrons.com>

Hello Thomas,

Le vendredi 6 septembre 2013 17:18:17 Thomas Petazzoni a écrit :
> Hello,
> 
> Here is a second version of the patch set that adds a Device Tree
> binding and the related code to support fixed PHYs. Marked as RFC,
> this patch set is obviously not intended for merging in 3.12.

Thanks a lot for continuing on this work, I really like the state of it now.

> 
> Since the first version, the changes have been:
> 
>  * Instead of using a 'fixed-link' property inside the Ethernet device
>    DT node, with a fairly cryptic succession of integer values, we now
>    use a PHY subnode under the Ethernet device DT node, with explicit
>    properties to configure the duplex, speed, pause and other PHY
>    properties.
> 
>  * The PHY address is automatically allocated by the kernel and no
>    longer visible in the Device Tree binding.
> 
>  * The PHY device is created directly when the network driver calls
>    of_phy_connect_fixed_link(), and associated to the PHY DT node,
>    which allows the existing of_phy_connect() function to work,
>    without the need to use the deprecated of_phy_connect_fixed_link().
> 
> The things I am not entirely happy with yet are:
> 
>  * The PHY ID is hardcoded to 0xdeadbeef. Ideally, it should be a
>    properly reserved vendor/device identifier, but it isn't clear how
>    to get one allocated for this purpose.

Right, we should try to get something better, but we obviously cannot use an 
already allocated OUI for this. Can we ask the Linux foundation or a Linux-
friendly company to allocate one maybe?

> 
>  * The fixed_phy_register() function in drivers/net/phy/fixed.c has
>    some OF references. So ideally, I would have preferred to put this
>    code in drivers/of/of_mdio.c, but to call get_phy_device(), we need
>    a reference to the mii_bus structure that represents the fixed MDIO
>    bus.

This is not a big deal, not everything in drivers/ is consistent with this, 
and making the fixed MDIO bus globally accessible does not sound too great.

> 
>  * There is some error management missing in fixed_phy_register(), but
>    it can certainly be added easily. This RFC is meant to sort out the
>    general idea.

Do you think you could add these to got beyond the RFC state? The patchset as 
it currently is fine with me if you can address these.
-- 
Florian

^ permalink raw reply

* Re: [PATCH -net 0/2] bonding: fix arp_validate desync state & race
From: Nikolay Aleksandrov @ 2013-09-06 20:10 UTC (permalink / raw)
  To: netdev; +Cc: fubar, davem, andy, mleitner
In-Reply-To: <1378496518-16104-1-git-send-email-nikolay@redhat.com>

On 09/06/2013 09:41 PM, Nikolay Aleksandrov wrote:
> Hello all,
> These two patches aim to fix the possible de-sync state which the bond
> can enter if we have arp_validate without arp_interval or the other way
> around. They also fix a race condition between arp_validate setting and
> mode changing.
> 
> Patch 01 - fixes the race condition between store_arp_validate and bond
> mode change by using rtnl for sync
> Patch 02 - fixes the possible de-sync state by setting/unsetting recv_probe
> if arp_interval is set/unset and also if arp_validate is set/unset
> 
> Best regards,
>  Nikolay Aleksandrov
> 
> 
> Nikolay Aleksandrov (2):
>   bonding: fix store_arp_validate race with mode change
>   bonding: fix bond_arp_rcv setting and arp validate desync state
> 
>  drivers/net/bonding/bond_main.c  |  4 ++--
>  drivers/net/bonding/bond_sysfs.c | 31 +++++++++++++++++++++++++------
>  drivers/net/bonding/bonding.h    |  1 +
>  3 files changed, 28 insertions(+), 8 deletions(-)
> 
Ugh, self-nack I just noticed the mode check in store_arp_validate passes
if new_value is 0.

I'll wait some time to see if there isn't anything else and re-post.

Cheers,
 Nik

^ permalink raw reply

* [PATCH -net 2/2] bonding: fix bond_arp_rcv setting and arp validate desync state
From: Nikolay Aleksandrov @ 2013-09-06 19:41 UTC (permalink / raw)
  To: netdev; +Cc: fubar, davem, andy, mleitner
In-Reply-To: <1378496518-16104-1-git-send-email-nikolay@redhat.com>

We make bond_arp_rcv global so it can be used in bond_sysfs if the bond
interface is up and arp_interval is being changed to a positive value
and cleared otherwise as per Jay's suggestion.
This also fixes a problem where bond_arp_rcv was set even though
arp_validate was disabled while the bond was up by unsetting recv_probe
in bond_store_arp_validate and respectively setting it if enabled.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
I've intentionally left the prototype line >80 chars, let me know if I
should break it.

 drivers/net/bonding/bond_main.c  |  4 ++--
 drivers/net/bonding/bond_sysfs.c | 19 ++++++++++++++++---
 drivers/net/bonding/bonding.h    |  1 +
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 39e5b1c..72df399 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2404,8 +2404,8 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
 	slave->target_last_arp_rx[i] = jiffies;
 }
 
-static int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond,
-			struct slave *slave)
+int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond,
+		 struct slave *slave)
 {
 	struct arphdr *arp = (struct arphdr *)skb->data;
 	unsigned char *arp_ptr;
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 4e38683..9afb1c5 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -349,6 +349,8 @@ static ssize_t bonding_store_mode(struct device *d,
 		goto out;
 	}
 
+	/* don't cache arp_validate between modes */
+	bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
 	bond->params.mode = new_value;
 	bond_set_mode_ops(bond, bond->params.mode);
 	pr_info("%s: setting mode to %s (%d).\n",
@@ -419,8 +421,8 @@ static ssize_t bonding_store_arp_validate(struct device *d,
 					  struct device_attribute *attr,
 					  const char *buf, size_t count)
 {
-	int new_value, ret = count;
 	struct bonding *bond = to_bond(d);
+	int new_value, ret = count;
 
 	if (!rtnl_trylock())
 		return restart_syscall();
@@ -431,7 +433,7 @@ static ssize_t bonding_store_arp_validate(struct device *d,
 		ret = -EINVAL;
 		goto out;
 	}
-	if (new_value && (bond->params.mode != BOND_MODE_ACTIVEBACKUP)) {
+	if (new_value && bond->params.mode != BOND_MODE_ACTIVEBACKUP) {
 		pr_err("%s: arp_validate only supported in active-backup mode.\n",
 		       bond->dev->name);
 		ret = -EINVAL;
@@ -441,6 +443,12 @@ static ssize_t bonding_store_arp_validate(struct device *d,
 		bond->dev->name, arp_validate_tbl[new_value].modename,
 		new_value);
 
+	if (bond->dev->flags & IFF_UP) {
+		if (!new_value)
+			bond->recv_probe = NULL;
+		else if (bond->params.arp_interval)
+			bond->recv_probe = bond_arp_rcv;
+	}
 	bond->params.arp_validate = new_value;
 out:
 	rtnl_unlock();
@@ -561,8 +569,8 @@ static ssize_t bonding_store_arp_interval(struct device *d,
 					  struct device_attribute *attr,
 					  const char *buf, size_t count)
 {
-	int new_value, ret = count;
 	struct bonding *bond = to_bond(d);
+	int new_value, ret = count;
 
 	if (!rtnl_trylock())
 		return restart_syscall();
@@ -605,8 +613,13 @@ static ssize_t bonding_store_arp_interval(struct device *d,
 		 * is called.
 		 */
 		if (!new_value) {
+			if (bond->params.arp_validate)
+				bond->recv_probe = NULL;
 			cancel_delayed_work_sync(&bond->arp_work);
 		} else {
+			/* arp_validate can be set only in active-backup mode */
+			if (bond->params.arp_validate)
+				bond->recv_probe = bond_arp_rcv;
 			cancel_delayed_work_sync(&bond->mii_work);
 			queue_delayed_work(bond->wq, &bond->arp_work, 0);
 		}
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index f7ab161..7ad8bd5 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -430,6 +430,7 @@ static inline bool slave_can_tx(struct slave *slave)
 
 struct bond_net;
 
+int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond, struct slave *slave);
 struct vlan_entry *bond_next_vlan(struct bonding *bond, struct vlan_entry *curr);
 int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb, struct net_device *slave_dev);
 void bond_xmit_slave_id(struct bonding *bond, struct sk_buff *skb, int slave_id);
-- 
1.8.1.4

^ permalink raw reply related

* [PATCH -net 0/2] bonding: fix arp_validate desync state & race
From: Nikolay Aleksandrov @ 2013-09-06 19:41 UTC (permalink / raw)
  To: netdev; +Cc: fubar, davem, andy, mleitner

Hello all,
These two patches aim to fix the possible de-sync state which the bond
can enter if we have arp_validate without arp_interval or the other way
around. They also fix a race condition between arp_validate setting and
mode changing.

Patch 01 - fixes the race condition between store_arp_validate and bond
mode change by using rtnl for sync
Patch 02 - fixes the possible de-sync state by setting/unsetting recv_probe
if arp_interval is set/unset and also if arp_validate is set/unset

Best regards,
 Nikolay Aleksandrov


Nikolay Aleksandrov (2):
  bonding: fix store_arp_validate race with mode change
  bonding: fix bond_arp_rcv setting and arp validate desync state

 drivers/net/bonding/bond_main.c  |  4 ++--
 drivers/net/bonding/bond_sysfs.c | 31 +++++++++++++++++++++++++------
 drivers/net/bonding/bonding.h    |  1 +
 3 files changed, 28 insertions(+), 8 deletions(-)

-- 
1.8.1.4

^ permalink raw reply

* [PATCH -net 1/2] bonding: fix store_arp_validate race with mode change
From: Nikolay Aleksandrov @ 2013-09-06 19:41 UTC (permalink / raw)
  To: netdev; +Cc: fubar, davem, andy, mleitner
In-Reply-To: <1378496518-16104-1-git-send-email-nikolay@redhat.com>

We need to protect store_arp_validate via rtnl because it can race with
mode changing and we can end up having arp_validate set in a mode
different from active-backup.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_sysfs.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index ce46776..4e38683 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -419,27 +419,33 @@ static ssize_t bonding_store_arp_validate(struct device *d,
 					  struct device_attribute *attr,
 					  const char *buf, size_t count)
 {
-	int new_value;
+	int new_value, ret = count;
 	struct bonding *bond = to_bond(d);
 
+	if (!rtnl_trylock())
+		return restart_syscall();
 	new_value = bond_parse_parm(buf, arp_validate_tbl);
 	if (new_value < 0) {
 		pr_err("%s: Ignoring invalid arp_validate value %s\n",
 		       bond->dev->name, buf);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out;
 	}
 	if (new_value && (bond->params.mode != BOND_MODE_ACTIVEBACKUP)) {
 		pr_err("%s: arp_validate only supported in active-backup mode.\n",
 		       bond->dev->name);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out;
 	}
 	pr_info("%s: setting arp_validate to %s (%d).\n",
 		bond->dev->name, arp_validate_tbl[new_value].modename,
 		new_value);
 
 	bond->params.arp_validate = new_value;
+out:
+	rtnl_unlock();
 
-	return count;
+	return ret;
 }
 
 static DEVICE_ATTR(arp_validate, S_IRUGO | S_IWUSR, bonding_show_arp_validate,
-- 
1.8.1.4

^ permalink raw reply related

* [GIT] Networking
From: David Miller @ 2013-09-06 19:40 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


A quick set of fixes, some to deal with fallout from yesterday's
net-next merge.

1) Fix compilation of bnx2x driver with CONFIG_BNX2X_SRIOV disabled, from
   Dmitry Kravkov.

2) Fix a bnx2x regression caused by one of Dave Jones's mistaken braces
   changes, from Eilon Greenstein.

3) Add some protective filtering in the netlink tap code, from Daniel
   Borkmann.

4) Fix TCP congestion window growth regression after timeouts, from
   Yuchung Cheng.

5) Correctly adjust TCP's rcv_ssthresh for out of order packets, from
   Eric Dumazet.

Please pull, thanks a lot!

The following changes since commit 2e032852245b3dcfe5461d7353e34eb6da095ccf:

  Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm (2013-09-05 18:07:32 -0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

for you to fetch changes up to 4e4f1fc226816905c937f9b29dabe351075dfe0f:

  tcp: properly increase rcv_ssthresh for ofo packets (2013-09-06 14:43:49 -0400)

----------------------------------------------------------------
Daniel Borkmann (1):
      net: netlink: filter particular protocols from analyzers

Dmitry Kravkov (1):
      bnx2x: fix broken compilation with CONFIG_BNX2X_SRIOV is not set

Eilon Greenstein (1):
      bnx2x: Restore a call to config_init

Eric Dumazet (1):
      tcp: properly increase rcv_ssthresh for ofo packets

Florian Fainelli (1):
      net: add documentation for BQL helpers

Michael Opdenacker (1):
      mlx5: remove unused MLX5_DEBUG param in Kconfig

Yuchung Cheng (1):
      tcp: fix no cwnd growth after timeout

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c  |  9 +++------
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h |  2 ++
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig   | 10 ----------
 include/linux/netdevice.h                         | 26 ++++++++++++++++++++++++++
 net/ipv4/tcp_input.c                              | 11 ++++++-----
 net/netlink/af_netlink.c                          | 30 ++++++++++++++++++++++++++++++
 6 files changed, 67 insertions(+), 21 deletions(-)

^ permalink raw reply

* Re: GSO/GRO and UDP performance
From: Eric Dumazet @ 2013-09-06 19:32 UTC (permalink / raw)
  To: James Yonan; +Cc: Rick Jones, netdev
In-Reply-To: <522A2C76.10203@openvpn.net>

On Fri, 2013-09-06 at 13:26 -0600, James Yonan wrote:

> Where does the 16+1 come from?  I'm getting my 43x from the ratio of max 
> legal IP packet size (64KB) / internet MTU (1500).  Are you saying that 
> GRO cannot aggregate up to 64 KB?
> 

Yes this is what I said.

Hint : MAX_SKB_FRAGS is the number of fragments per skb

Each aggregated frame consumes at least one fragment.

Hint : some drivers uses more than one fragment per datagram.

-> A fragment in skb does not necessarily contains one and exactly one
datagram

> >> I think we cannot aggregate UDP packets, because UDP lacks sequence
> >> numbers, so reorders would be a problem.
> 
> >> You really need something that is not UDP generic.
> 
> Right -- that's why I'm proposing a hook for UDP GSO/GRO providers that 
> know about specific app-layer protocols and can provide segmentation and 
> aggregation methods for them.  Such a provider would be implemented in a 
> kernel module and would know about the specific app-layer protocol, so 
> it would be able to losslessly segment and aggregate it (i.e. it could 
> use a sequence number from the app-layer protocol).

Its not a choice given by application.

As I said you'll have to make sure all the stack will understand the
meaning of datagram aggregation.

^ permalink raw reply

* Re: GSO/GRO and UDP performance
From: James Yonan @ 2013-09-06 19:26 UTC (permalink / raw)
  To: Rick Jones; +Cc: Eric Dumazet, netdev
In-Reply-To: <522A05E9.3090206@hp.com>

On 06/09/2013 10:42, Rick Jones wrote:
> On 09/06/2013 06:07 AM, Eric Dumazet wrote:
>> On Fri, 2013-09-06 at 03:22 -0600, James Yonan wrote:
>>
>>> So I think that playing well with GSO/GRO is essential to get speedup in
>>> UDP apps because of this 43x multiplier.
>>>
>>
>> Thats not true. GRO cannot aggregate more than 16+1 packets.

Where does the 16+1 come from?  I'm getting my 43x from the ratio of max 
legal IP packet size (64KB) / internet MTU (1500).  Are you saying that 
GRO cannot aggregate up to 64 KB?

>> I think we cannot aggregate UDP packets, because UDP lacks sequence
>> numbers, so reorders would be a problem.

>> You really need something that is not UDP generic.

Right -- that's why I'm proposing a hook for UDP GSO/GRO providers that 
know about specific app-layer protocols and can provide segmentation and 
aggregation methods for them.  Such a provider would be implemented in a 
kernel module and would know about the specific app-layer protocol, so 
it would be able to losslessly segment and aggregate it (i.e. it could 
use a sequence number from the app-layer protocol).

> It may  not be as sexy, and it cannot get the 43x multiplier (just what
> *is* the service demand change on a netperf TCP_STREAM test these days
> between GSO/GRO on and off anyway?)

That's something I haven't really looked too closely at yet.  With 
MAX_GRO_SKBS set to only 8, how well would this really scale?

> but looking for basic path-length reductions would be goodness.

Path is fairly optimized as-is.

Direction 1: udp_encap_recv -> tunnel decapsulation -> netif_rx

Direction 2: ndo_start_xmit -> tunnel encapsulation -> ip_local_out

I've also looked into getting closer to driver TX by using 
dev_queue_xmit instead of ip_local_out.

Even though this is a virtual driver without interrupts, I'm also 
looking at NAPI as a way of getting packet flows into GRO on the RX side.

Bottom line is that I want to saturate 10 GigE with UDP packets without 
breaking a sweat.  ixgbe or other drivers in that class can handle it if 
the per-packet overhead in the network stack can be reduced enough.

James

^ permalink raw reply

* Re: [PATCH 2/2] sh_eth: add device tree support
From: Sergei Shtylyov @ 2013-09-06 19:10 UTC (permalink / raw)
  To: Mark Rutland
  Cc: netdev@vger.kernel.org, rob.herring@calxeda.com, Pawel Moll,
	swarren@wwwdotorg.org, ian.campbell@citrix.com,
	grant.likely@linaro.org, devicetree@vger.kernel.org,
	nobuhiro.iwamatsu.yj@renesas.com, linux-sh@vger.kernel.org,
	rob@landley.net, linux-doc@vger.kernel.org
In-Reply-To: <20130902085225.GF27891@e106331-lin.cambridge.arm.com>

Hello.

On 09/02/2013 12:52 PM, Mark Rutland wrote:

    Don't know why I haven't noticed your mail before...

>> Add support of the device tree probing for the Renesas SH-Mobile SoCs.

>> This work is loosely based on an original patch by Nobuhiro Iwamatsu
>> <nobuhiro.iwamatsu.yj@renesas.com>.

>> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

>> ---
>> This patch is against Dave's 'net-next.git' repo.

>>   Documentation/devicetree/bindings/net/sh_eth.txt |   40 +++++++++++++
>>   drivers/net/ethernet/renesas/sh_eth.c            |   66 ++++++++++++++++++++++-
>>   2 files changed, 105 insertions(+), 1 deletion(-)

>> Index: net-next/Documentation/devicetree/bindings/net/sh_eth.txt
>> ===================================================================
>> --- /dev/null
>> +++ net-next/Documentation/devicetree/bindings/net/sh_eth.txt
>> @@ -0,0 +1,40 @@
>> +* Renesas Electronics SH EtherMAC
>> +
>> +This file provides information on what the device node for the SH EtherMAC
>> +interface contains.
>> +
>> +Required properties:
>> +- compatible: "renesas,gether-r8a7740" if the device is a part of R8A7740 SoC.
>> +	      "renesas,ether-r8a7779" if the device is a part of R8A7778/9 SoCs.
>> +	      "renesas,ether-r8a7790" if the device is a part of R8A7790/1 SoCs.

> What are the functional differences between the blocks in these devices
> that mean they have different compatible strings?

     RA87740 has Gigabit Ether (GEther) with TSU block (I don't know what TSU 
stands for) and the register layout completely different from the other SoCs. 
R8A777x and R8A7790 are 100 Mbit/s Ether without TSU block differing in some 
minor but vital register details between them; they use so called "R-Car" 
register layout contrasted to the "Gigabit" layout of R8A7740.

>> +- reg: offset and length of the register set for the device; if the device has
>> +       TSU registers, you need to specify two register sets here.

> This doesn't explicitly state ordering, and doesn't describe what the
> first register set is (control registers?). If possible, it would be
> nice to refer to the set of registers by the name given in
> documentation; is there any available?

    No, there's not common name (and I only have R8A777x and R8A7790 docs).
There are two subranges for those SoCs: HDMAC and feLic registers at offsets 
0x200 and 0x300 from the "reg" prop start.

> I think we should have something like the below to ensure it's explicit.
> In general we need more consistency in the the way bindings describe reg
> properties.

> 	- reg: offset and length of:
> 	     [1] the control registers of the device (required)
> 	     [2] the TSU registers for the device (optional)

    OK.

>> +- interrupt-parent: the phandle for the interrupt controller that services
>> +		    interrupts for this device.

> Why is that required?

    I'm not sure, maybe it's not.

>> +- interrupts: interrupt mapping for the interrupt source.

> Interrupts are defined in terms of interrupt-specifiers. How about:

> 	- interrupts: an interrupt-specifier for the sole interrupt
> 	              generated by the device.

    OK.

>> +- phy-mode: string, operation mode of the PHY interface (a string that
>> +	    of_get_phy_mode() can understand).

> That looks suspicious. Bindings should *not* refer to Linux internals.
> Instead, we should document the phy-handle and phy-mode properties and

    There's also "phy" variant in use now in addition to the older 
"phy-handle". Sigh...

> how they are meant to be used in a generic binding document (I couldn't
> see a generic document doing this so far...).

    So you want me to create one?

>> +- phy-handle: phandle of the PHY device.
>> +
>> +Optional properties:
>> +- local-mac-address: 6 bytes, MAC address.
>> +- renesas,no-ether-link: specify when a board does not provide a proper LINK
>> +			 signal.
>> +- renesas,ether-link-active-low: specify when the LINK signal is active-low.

> What types are these? I know local-mac-address is a byte-string by
> ePAPR, presumably the last two are empty (boolean)?

    Exactly.

>> +
>> +Example (Armadillo800EVA board):
>> +
>> +	ethernet@e9a00000 {
>> +		compatible = "renesas,gether-r8a7740";
>> +		reg = <0xe9a00000 0x800>, <0xe9a01800 0x800>;
>> +		interrupt-parent = <&gic>;
>> +		interrupts = <0 142 0x4>;
>> +		phy-mode = "mii";
>> +		phy-handle = <&phy0>;
>> +		#address-cells = <1>;
>> +		#size-cells = <0>;
>> +
>> +		phy0: ethernet-phy@0 {
>> +			reg = <0>;
>> +		};

> The binding didn't state anything about sub-nodes. Is it a general
> property of phy bindings that they may be embedded within a consumer's
> node?

    Considering Ethernet PHY subnodes, yes. They hang off MDIO bus controlled 
by the Ethernet MAC registers.

>> +	};
>> Index: net-next/drivers/net/ethernet/renesas/sh_eth.c
>> ===================================================================
>> --- net-next.orig/drivers/net/ethernet/renesas/sh_eth.c
>> +++ net-next/drivers/net/ethernet/renesas/sh_eth.c
[...]
>> @@ -2600,6 +2603,52 @@ static const struct net_device_ops sh_et
>>   	.ndo_change_mtu		= eth_change_mtu,
>>   };
>>
>> +#ifdef CONFIG_OF
>> +static struct sh_eth_plat_data *sh_eth_parse_dt(struct device *dev)
>> +{
>> +	struct device_node *np = dev->of_node;
>> +	struct sh_eth_plat_data *pdata;
>> +	struct device_node *phy;
>> +	const char *mac_addr;
>> +
>> +	pdata = devm_kzalloc(dev, sizeof(*pdata), GFP_KERNEL);
>> +	if (!pdata)
>> +		return NULL;
>> +
>> +	pdata->phy_interface = of_get_phy_mode(np);
>> +
>> +	phy = of_parse_phandle(np, "phy-handle", 0);
>> +	if (!phy || of_property_read_u32(phy, "reg", &pdata->phy)) {

> NAK. You didn't describe the format of the phy node, yet you are reading
> values from it from a logically separate driver.

    See Documentation/devicetree/bindings/net/phy.txt. Although I really 
didn't follow it closely (and it looks pretty obsolete, requiring 
"device_type" and "linux,phandle" props).

> Thanks,
> Mark.

WBR, Sergei


^ permalink raw reply

* Re: [PATCH] tcp: properly increase rcv_ssthresh for ofo packets
From: David Miller @ 2013-09-06 18:48 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, ncardwell
In-Reply-To: <1378488958.31445.47.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 06 Sep 2013 10:35:58 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> TCP receive window handling is multi staged.
> 
> A socket has a memory budget, static or dynamic, in sk_rcvbuf.
> 
> Because we do not really know how this memory budget translates to
> a TCP window (payload), TCP announces a small initial window
> (about 20 MSS).
> 
> When a packet is received, we increase TCP rcv_win depending
> on the payload/truesize ratio of this packet. Good citizen
> packets give a hint that it's reasonable to have rcv_win = sk_rcvbuf/2
> 
> This heuristic takes place in tcp_grow_window()
> 
> Problem is : We currently call tcp_grow_window() only for in-order
> packets.
> 
> This means that reorders or packet losses stop proper grow of
> rcv_win, and senders are unable to benefit from fast recovery,
> or proper reordering level detection.
> 
> Really, a packet being stored in OFO queue is not a bad citizen.
> It should be part of the game as in-order packets.
> 
> In our traces, we very often see sender is limited by linux small
> receive windows, even if linux hosts use autotuning (DRS) and should
> allow rcv_win to grow to ~3MB.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Neal Cardwell <ncardwell@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next v2] net: add documentation for BQL helpers
From: David Miller @ 2013-09-06 18:47 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, eric.dumazet
In-Reply-To: <1378483080-31632-1-git-send-email-f.fainelli@gmail.com>

From: "Florian Fainelli" <f.fainelli@gmail.com>
Date: Fri, 6 Sep 2013 16:58:00 +0100

> Provide a kernel-doc comment documentation for the BQL helpers:
> - netdev_sent_queue
> - netdev_completed_queue
> - netdev_reset_queue
> 
> Similarly to how it is done for the other functions, the documentation
> only covers the function operating on struct net_device and not struct
> netdev_queue.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] mlx5: remove unused MLX5_DEBUG param in Kconfig
From: David Miller @ 2013-09-06 18:47 UTC (permalink / raw)
  To: michael.opdenacker-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8
  Cc: eli-VPRAkNaXOzVWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1378468162-3109-1-git-send-email-michael.opdenacker-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>

From: Michael Opdenacker <michael.opdenacker-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
Date: Fri,  6 Sep 2013 13:49:22 +0200

> This patch proposes to remove the MLX5_DEBUG kernel configuration
> parameter defined in drivers/net/ethernet/mellanox/mlx5/core/Kconfig,
> but used nowhere in the makefiles and source code.
> 
> This could also be fixed by using this parameter,
> but this may be a leftover from driver development...
> 
> Signed-off-by: Michael Opdenacker <michael.opdenacker-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>

Applied.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net] bnx2x: Restore a call to config_init
From: David Miller @ 2013-09-06 18:47 UTC (permalink / raw)
  To: eilong; +Cc: netdev, davej
In-Reply-To: <1378461302.15758.2.camel@lb-tlvb-eilong.il.broadcom.com>

From: "Eilon Greenstein" <eilong@broadcom.com>
Date: Fri, 6 Sep 2013 12:55:02 +0300

> Commit c0a77ec74f295013d7ba3204dd3ed25fccf83cb4 'bnx2x: Add missing braces in
> bnx2x:bnx2x_link_initialize' identified indentation problem, but resolved it
> by adding braces instead of fixing the indentation. The braces now prevents a
> config_init call in some cases, though it should be called regardless of that
> condition. This patch removes the braces and fix the confusing indentation
> that caused this mess.
> 
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] bnx2x: fix broken compilation with CONFIG_BNX2X_SRIOV is not set
From: David Miller @ 2013-09-06 18:47 UTC (permalink / raw)
  To: dmitry; +Cc: netdev, eric.dumazet, ariele
In-Reply-To: <1378452928-6072-1-git-send-email-dmitry@broadcom.com>

From: "Dmitry Kravkov" <dmitry@broadcom.com>
Date: Fri, 6 Sep 2013 10:35:28 +0300

> Since commit 60cad4e67bd6ff400e7ea61fe762b3042b12ae9d
> "bnx2x: VF RSS support - VF side" fails to compile w/o
> CONFIG_BNX2X_SRIOV option.
>  
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Ariel Elior <ariele@broadcom.com>
> Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
> ---
> Done with same formatting used in the file. 

Applied.

^ permalink raw reply

* Re: [PATCH net-next] tcp: fix no cwnd growth after timeout
From: David Miller @ 2013-09-06 18:47 UTC (permalink / raw)
  To: ncardwell; +Cc: ycheng, edumazet, netdev
In-Reply-To: <CADVnQynvfPtsJN-XqpTkbu4Y+=uzYcmv22GpaV=tRYL8w90TUw@mail.gmail.com>

From: Neal Cardwell <ncardwell@google.com>
Date: Thu, 5 Sep 2013 19:56:59 -0400

> On Thu, Sep 5, 2013 at 6:36 PM, Yuchung Cheng <ycheng@google.com> wrote:
>> In commit 0f7cc9a3 "tcp: increase throughput when reordering is high",
>> it only allows cwnd to increase in Open state. This mistakenly disables
>> slow start after timeout (CA_Loss). Moreover cwnd won't grow if the
>> state moves from Disorder to Open later in tcp_fastretrans_alert().
>>
>> Therefore the correct logic should be to allow cwnd to grow as long
>> as the data is received in order in Open, Loss, or even Disorder state.
>>
>> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> 
> Acked-by: Neal Cardwell <ncardwell@google.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net] net: netlink: filter particular protocols from analyzers
From: David Miller @ 2013-09-06 18:47 UTC (permalink / raw)
  To: dborkman; +Cc: netdev, stephen
In-Reply-To: <5228E2A0.4010607@redhat.com>

From: Daniel Borkmann <dborkman@redhat.com>
Date: Thu, 05 Sep 2013 21:59:28 +0200

> On 09/05/2013 09:54 PM, David Miller wrote:
>> From: Daniel Borkmann <dborkman@redhat.com>
>> Date: Thu, 05 Sep 2013 21:48:00 +0200
>>
>>> With socket(PF_PACKET, ..., htons(ETH_P_ALL)) you will already get
>>> all users from the suggested white-list of the patch, which is the
>>> majority of netlink users I believe. Hence, you do not need to have
>>> one socket per protocol. skbs from there should get dragged into
>>> pf_packet via dev_queue_xmit_nit() which works on ptype_all list.
>>
>> What about user level netlink protocols?
> 
> If you are referring to NETLINK_USERSOCK, then we let this pass here,
> so nothing changes.

Ok, I've applied this, thanks Daniel.

^ permalink raw reply

* Re: [net-next v5 8/8] i40e: include i40e in kernel proper
From: Jesse Brandeburg @ 2013-09-06 18:30 UTC (permalink / raw)
  To: David Miller; +Cc: e1000-devel, netdev, gospo, sassmann
In-Reply-To: <20130906.140141.2245821068783126791.davem@davemloft.net>

On Fri, 6 Sep 2013 14:01:41 -0400
David Miller <davem@davemloft.net> wrote:
> Please rename this Kbuild file to the normal Makefile instead of
> trying to be different from every single other driver in the
> networking for the sake of an issue that is your, and your problem
> alone.

Thanks Dave, will do, I'm preparing the patch now.

> You guys should really be grateful that anyone at all not being paid
> to do so is reviewing such a huge body of code for you, rather than
> complaining that all the issues weren't discovered the first time
> this series was posted.

We *are* really grateful for all the effort of any/all reviewers.  I
would like to personally thank you Dave, Joe Perches, Ben Hutchings,
and Stephen Hemminger for the non-trival amount of time spent on
reviewing this patch set.

------------------------------------------------------------------------------
Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more!
Discover the easy way to master current and previous Microsoft technologies
and advance your career. Get an incredible 1,500+ hours of step-by-step
tutorial videos with LearnDevNow. Subscribe today and save!
http://pubads.g.doubleclick.net/gampad/clk?id=58041391&iu=/4140/ostg.clktrk
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply

* Re: [PATCH] bnx2x: avoid atomic allocations during initialization
From: David Miller @ 2013-09-06 18:27 UTC (permalink / raw)
  To: mschmidt; +Cc: netdev, ariele, eilong
In-Reply-To: <1378411989-19775-1-git-send-email-mschmidt@redhat.com>

From: Michal Schmidt <mschmidt@redhat.com>
Date: Thu,  5 Sep 2013 22:13:09 +0200

> During initialization bnx2x allocates significant amounts of memory
> (for rx data, rx SGEs, TPA pool) using atomic allocations.
> 
> I received a report where bnx2x failed to allocate SGEs and it had
> to fall back to TPA-less operation.
> 
> Let's use GFP_KERNEL allocations during initialization, which runs
> in process context. Add gfp_t parameters to functions that are used
> both in initialization and in the receive path.
> 
> Use an unlikely branch in bnx2x_frag_alloc() to avoid atomic allocation
> by netdev_alloc_frag(). The branch is taken several thousands of times
> during initialization, but then never more. Note that fp->rx_frag_size
> is never greater than PAGE_SIZE, so __get_free_page() can be used here.
> 
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

This change looks good to me, if some Broadcom folks could take a look
and ACK/NACK that would be great.

Thanks.

^ permalink raw reply

* Re: [E1000-devel] [net-next v5 8/8] i40e: include i40e in kernel proper
From: David Miller @ 2013-09-06 18:23 UTC (permalink / raw)
  To: joe
  Cc: mitch.a.williams, jesse.brandeburg, stephen, jeffrey.t.kirsher,
	e1000-devel, netdev, gospo, sassmann
In-Reply-To: <1378491615.19204.10.camel@joe-AO722>

From: Joe Perches <joe@perches.com>
Date: Fri, 06 Sep 2013 11:20:15 -0700

> On Fri, 2013-09-06 at 17:28 +0000, Williams, Mitch A wrote:
>> My goal was to make our out-of-tree driver as close as possible -
>> including the makefiles - to the upstream driver. Doing this makes it
>> simpler for us to backport and forward-port patches. It makes it less
>> confusing for us when we're moving from one environment to the other.
> 
> I think using things like
> 
> #define i40e_memset memset
> 
> is just odd.

Agreed, this is rediculous.

^ permalink raw reply

* Re: [E1000-devel] [net-next v5 8/8] i40e: include i40e in kernel proper
From: Joe Perches @ 2013-09-06 18:20 UTC (permalink / raw)
  To: Williams, Mitch A
  Cc: Brandeburg, Jesse, Stephen Hemminger, Kirsher, Jeffrey T,
	davem@davemloft.net, e1000-devel@lists.sourceforge.net,
	netdev@vger.kernel.org, gospo@redhat.com, sassmann@redhat.com
In-Reply-To: <AAEA33E297BCAC4B9BB20A7C2DF0AB8D5C3B0A3D@FMSMSX113.amr.corp.intel.com>

On Fri, 2013-09-06 at 17:28 +0000, Williams, Mitch A wrote:
> My goal was to make our out-of-tree driver as close as possible -
> including the makefiles - to the upstream driver. Doing this makes it
> simpler for us to backport and forward-port patches. It makes it less
> confusing for us when we're moving from one environment to the other.

I think using things like

#define i40e_memset memset

is just odd.

^ permalink raw reply

* Re: [PATCH net] bonding: Avoid possible de-sync with arp_validate
From: Marcelo Ricardo Leitner @ 2013-09-06 18:13 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev, naleksan, andy
In-Reply-To: <7249.1378490193@death.nxdomain>

Em 06-09-2013 14:56, Jay Vosburgh escreveu:
> Marcelo Ricardo Leitner <mleitner@redhat.com> wrote:
>
>> Hi Dave,
>>
>> Please queue this one for -stable trees too.
>>
>> Thanks.
>>
>> ---8<---
>>
>> As bond_store_arp_validation and bond_store_arp_interval doesn't
>> deal with each other, we can't chain both at bond_open, otherwise
>> we are open to this de-sync state, steps in sequence:
>> - bond is down
>> - arp_interval = 0
>> - arp_validate = 1 or 2
>> - bond is up
>> - arp_interval = 1
>>
>> This would lead to bond issuing ARP requests but never listening
>> to the replies, because the tap wasn't attached. After reaching
>> this state, while bond is up, setting arp_validate won't help
>> as it only acts if needed.
>
> 	You need to sign off you patch.

My bad, sorry

>> drivers/net/bonding/bond_main.c | 7 +++----
>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 39e5b1c..805d098 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -3180,11 +3180,10 @@ static int bond_open(struct net_device *bond_dev)
>> 	if (bond->params.miimon)  /* link check interval, in milliseconds. */
>> 		queue_delayed_work(bond->wq, &bond->mii_work, 0);
>>
>> -	if (bond->params.arp_interval) {  /* arp interval, in milliseconds. */
>> +	if (bond->params.arp_interval)    /* arp interval, in milliseconds. */
>> 		queue_delayed_work(bond->wq, &bond->arp_work, 0);
>> -		if (bond->params.arp_validate)
>> -			bond->recv_probe = bond_arp_rcv;
>> -	}
>> +	if (bond->params.arp_validate)
>> +		bond->recv_probe = bond_arp_rcv;
>
> 	Won't this set the recv_probe when arp_interval is 0 (disabled)?
> That's going to make bonding look at a bunch of traffic it's not going
> to do anything with.

Yes.

> 	Why is this better than having bonding_store_arp_interval set
> recv_probe if arp_validate is set when it queues the arp_work?  And,
> presumably, clear the recv_probe if arp_interval is being cleared.
>
> 	-J

Yeah, no good. You're right.

Self-NACK here. Nikolay will submit a better patch later.

Thanks,
Marcelo

>> 	if (bond->params.mode == BOND_MODE_8023AD) {
>> 		queue_delayed_work(bond->wq, &bond->ad_work, 0);
>> --
>> 1.8.3.1
>>
>
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
>
>
>

^ permalink raw reply

* Re: [PATCH net] bonding: Avoid possible de-sync with arp_validate
From: Marcelo Ricardo Leitner @ 2013-09-06 18:02 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, naleksan, fubar, andy
In-Reply-To: <20130906.135836.365469022075105934.davem@davemloft.net>

Em 06-09-2013 14:58, David Miller escreveu:
> From: Marcelo Ricardo Leitner <mleitner@redhat.com>
> Date: Fri,  6 Sep 2013 14:41:53 -0300
>
>> Please queue this one for -stable trees too.
>
> This is very much not the correct way to submit a patch.
>
> First of all, you haven't signed off on the patch.
>
> Second of all, any supplemental information that doesn't belong in the
> commit message needs to go after a "---" dileanator line.

Okay, sorry for that. I'll resubmit as soon as I work on Jay's comments.

Thanks
Marcelo

^ permalink raw reply

* Re: [E1000-devel] [net-next v5 8/8] i40e: include i40e in kernel proper
From: David Miller @ 2013-09-06 18:01 UTC (permalink / raw)
  To: jesse.brandeburg
  Cc: stephen, jeffrey.t.kirsher, e1000-devel, netdev, gospo, sassmann
In-Reply-To: <C0F8C5F1-0D98-4297-ACC9-55BB9C889FEB@intel.com>


I'm kind of getting sick of the "we'll fix it in a follow up patch"
talk.

Please rename this Kbuild file to the normal Makefile instead of
trying to be different from every single other driver in the
networking for the sake of an issue that is your, and your problem
alone.

You guys should really be grateful that anyone at all not being paid
to do so is reviewing such a huge body of code for you, rather than
complaining that all the issues weren't discovered the first time
this series was posted.

Please start being more reasonable about this situation.

Thank you.

^ permalink raw reply

* Re: [PATCH net] bonding: Avoid possible de-sync with arp_validate
From: David Miller @ 2013-09-06 17:58 UTC (permalink / raw)
  To: mleitner; +Cc: netdev, naleksan, fubar, andy
In-Reply-To: <994d2dc994a801fc3aae7aa912266b09ea45d95c.1378489255.git.mleitner@redhat.com>

From: Marcelo Ricardo Leitner <mleitner@redhat.com>
Date: Fri,  6 Sep 2013 14:41:53 -0300

> Please queue this one for -stable trees too.

This is very much not the correct way to submit a patch.

First of all, you haven't signed off on the patch.

Second of all, any supplemental information that doesn't belong in the
commit message needs to go after a "---" dileanator line.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox