Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 2/4 ethtool] ethtool: Support for configurable RSS hash key.
From: Ben Hutchings @ 2014-01-23  5:42 UTC (permalink / raw)
  To: Venkata Duvvuru; +Cc: netdev@vger.kernel.org
In-Reply-To: <BF3270C86E8B1349A26C34E4EC1C44CB2C84CF8C@CMEXMB1.ad.emulex.com>

[-- Attachment #1: Type: text/plain, Size: 2538 bytes --]

On Wed, 2014-01-22 at 12:06 +0000, Venkata Duvvuru wrote:
> > -----Original Message-----
> > From: Ben Hutchings [mailto:ben@decadent.org.uk]
> > Sent: Monday, January 20, 2014 7:13 PM
> > To: Venkata Duvvuru
> > Cc: netdev@vger.kernel.org
> > Subject: Re: [PATCH 2/4 ethtool] ethtool: Support for configurable RSS hash
> > key.
> > 
> > On Mon, 2014-01-20 at 13:28 +0000, Venkata Duvvuru wrote:
> > > Ben, Please ignore my previous reply. My reply options were screwed up in
> > that.
> > >
> > > > -----Original Message-----
> > > > From: Ben Hutchings [mailto:ben@decadent.org.uk]
> > > > Sent: Monday, January 20, 2014 12:06 AM
> > > > To: Venkata Duvvuru
> > > > Cc: netdev@vger.kernel.org
> > > > Subject: Re: [PATCH 2/4 ethtool] ethtool: Support for configurable
> > > > RSS hash key.
> > > >
> > > > On Fri, 2014-01-17 at 13:02 +0000, Venkata Duvvuru wrote:
> > > > > This ethtool patch will primarily implement the parser for the
> > > > > options
> > > > provided by the user for set and get hashkey before invoking the ioctl.
> > > > > This patch also has Ethtool man page changes which describes the
> > > > > Usage of
> > > > set and get hashkey options.
> > > >
> > > > I'd prefer to have this combined with the -x/-X options (and add new
> > > > long options to reflect that they cover the key as well).
> > >
> > > if we add hashkey options to the existing -x/-X (--show-rxfh-indir/ --set-
> > rxfh-indir), I think it won't be appropriate going by the command name.
> > > We could change the command name to something like --show-rssconfig /-
> > -rss-config but I'm afraid would that be backward compatible?
> > [...]
> > 
> > That's why I said 'add new long options'.  The ethtool argument parser allows
> > arbitrarily many aliases for each sub-command.
> 
> Just to make sure that we are in sync
>         { "-x|--show-rxfh-indir|--show-hashkey", 1, do_getrssconfig,
>           "Show RSS configuration" },
> 
>         { "-X|--set-rxfh-indir|--hashkey", 1, do_setrssconfig,
>           "Set RSS configuration",
>           "             equal N | weight W0 W1 ...\n"
>           "             hkey %x:%x:%x:%x:%x:....:%x\n" },
> 
> And equal/weight will be mutually exclusive with hkey.
> Does it makes sense?

No, you should be able to set both the key and the indirection table at
once.  And the new aliases should then be something like --set-rxfh and
--show-rxfh.

Ben.

-- 
Ben Hutchings
compatible: Gracefully accepts erroneous data from any source

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 3/4] ethtool: Support for configurable RSS hash key.
From: Ben Hutchings @ 2014-01-23  5:39 UTC (permalink / raw)
  To: Venkata Duvvuru; +Cc: netdev@vger.kernel.org
In-Reply-To: <BF3270C86E8B1349A26C34E4EC1C44CB2C84CFAE@CMEXMB1.ad.emulex.com>

[-- Attachment #1: Type: text/plain, Size: 1264 bytes --]

On Wed, 2014-01-22 at 12:12 +0000, Venkata Duvvuru wrote:
[...]
> > No, what I mean is:
> > 
> > 1. An RX flow steering filter can specify use of RSS, in which case the value
> > looked up in the indirection is added to the queue number specified in the
> > filter.  This is not yet controllable through RX NFC though there is room for
> > extension there.
> > 
> > 2. Multi-function controllers need multiple RSS contexts (key + indirection
> > table) to support independent use of RSS on each function.
> > But it may also be possible to allocate multiple contexts to a single function.
> > This could be useful in conjunction with 1.  But there would need to be a way
> > to allocate and configure extra contexts first.
> The proposed changes will be incremental so I think this can be done
> in a separate patch. Thoughts?

The ethtool ABI (to userland) has to remain backward-compatible, and it
is preferable if we don't add lots of different structures for this.

So please define the new command structure to include both the key and
indirection table, and some reserved space (documented as 'userland must
set to 0') for future extensions.

Ben.

-- 
Ben Hutchings
compatible: Gracefully accepts erroneous data from any source

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v3] tuntap: Fix for a race in accessing numqueues
From: David Miller @ 2014-01-23  5:33 UTC (permalink / raw)
  To: dominic.curran; +Cc: netdev, jasowang, maxk
In-Reply-To: <1390359803-27989-1-git-send-email-dominic.curran@citrix.com>

From: Dominic Curran <dominic.curran@citrix.com>
Date: Wed, 22 Jan 2014 03:03:23 +0000

> A patch for fixing a race between queue selection and changing queues
> was introduced in commit 92bb73ea2("tuntap: fix a possible race between
> queue selection and changing queues").
> 
> The fix was to prevent the driver from re-reading the tun->numqueues
> more than once within tun_select_queue() using ACCESS_ONCE().
> 
> We have been experiancing 'Divide-by-zero' errors in tun_net_xmit() 
> since we moved from 3.6 to 3.10, and believe that they come from a 
> simular source where the value of tun->numqueues changes to zero 
> between the first and a subsequent read of tun->numqueues.
> 
> The fix is a simular use of ACCESS_ONCE(), as well as a multiply
> instead of a divide in the if statement.
> 
> Signed-off-by: Dominic Curran <dominic.curran@citrix.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next] bridge: Remove unnecessary vlan_put_tag in br_handle_vlan
From: David Miller @ 2014-01-23  5:30 UTC (permalink / raw)
  To: makita.toshiaki; +Cc: stephen, vyasevic, netdev
In-Reply-To: <1390350577-4122-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>

From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Date: Wed, 22 Jan 2014 09:29:37 +0900

> br_handle_vlan() pushes HW accelerated vlan tag into skbuff when outgoing
> port is the bridge device.
> This is unnecessary because __netif_receive_skb_core() can handle skbs
> with HW accelerated vlan tag. In current implementation,
> __netif_receive_skb_core() needs to extract the vlan tag embedded in skb
> data. This could cause low network performance especially when receiving
> frames at a high frame rate on the bridge device.
> 
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] tcp: metrics: Fix rcu-race when deleting multiple entries
From: David Miller @ 2014-01-23  5:26 UTC (permalink / raw)
  To: christoph.paasch; +Cc: netdev
In-Reply-To: <1390307426-10840-1-git-send-email-christoph.paasch@uclouvain.be>

From: Christoph Paasch <christoph.paasch@uclouvain.be>
Date: Tue, 21 Jan 2014 13:30:26 +0100

> In bbf852b96ebdc6d1 I introduced the tmlist, which allows to delete
> multiple entries from the cache that match a specified destination if no
> source-IP is specified.
> 
> However, as the cache is an RCU-list, we should not create this tmlist, as
> it will change the tcpm_next pointer of the element that will be deleted
> and so a thread iterating over the cache's entries while holding the
> RCU-lock might get "redirected" to this tmlist.
> 
> This patch fixes this, by reverting back to the old behavior prior to
> bbf852b96ebdc6d1, which means that we simply change the tcpm_next
> pointer of the previous element (pp) to jump over the one we are
> deleting.
> The difference is that we call kfree_rcu() directly on the cache entry,
> which allows us to delete multiple entries from the list.
> 
> Fixes: bbf852b96ebdc6d1 (tcp: metrics: Delete all entries matching a certain destination)
> Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>

Looks good, applied, thank you.

^ permalink raw reply

* Re: [PATCH] ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
From: David Miller @ 2014-01-23  5:24 UTC (permalink / raw)
  To: pshelar; +Cc: duanj.fnst, daniel.petre, edumazet, netdev
In-Reply-To: <CALnjE+qFPoK_0EnZ_TWe1sx4s166iSjAH_+1r2uENeS4aRBckQ@mail.gmail.com>

From: Pravin Shelar <pshelar@nicira.com>
Date: Tue, 21 Jan 2014 10:33:36 -0800

> Can you clear IPCB in error path for dst_link_failure(). So that we do
> not clear it twice in fast path.

Indeed, please add this memset() before the dst_link_failure() call.

Thanks.

^ permalink raw reply

* [PATCH v2 2/2] net/cxgb4: Don't retrieve stats during recovery
From: Gavin Shan @ 2014-01-23  4:27 UTC (permalink / raw)
  To: netdev; +Cc: dm, ben, davem, Gavin Shan
In-Reply-To: <1390451256-11486-1-git-send-email-shangw@linux.vnet.ibm.com>

We possibly retrieve the adapter's statistics during EEH recovery
and that should be disallowed. Otherwise, it would possibly incur
replicate EEH error and EEH recovery is going to fail eventually.

The patch reuses statistics lock and checks net_device is attached
before going to retrieve statistics, so that the problem can be
avoided.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index c8eafbf..139a704 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4288,7 +4288,15 @@ static struct rtnl_link_stats64 *cxgb_get_stats(struct net_device *dev,
 	struct port_info *p = netdev_priv(dev);
 	struct adapter *adapter = p->adapter;
 
+	/* Block retrieving statistics during EEH error
+	 * recovery. Otherwise, the recovery might fail
+	 * and the PCI device will be removed permanently
+	 */
 	spin_lock(&adapter->stats_lock);
+	if (!netif_device_present(dev)) {
+		spin_unlock(&adapter->stats_lock);
+		return ns;
+	}
 	t4_get_port_stats(adapter, p->tx_chan, &stats);
 	spin_unlock(&adapter->stats_lock);
 
@@ -5496,12 +5504,14 @@ static pci_ers_result_t eeh_err_detected(struct pci_dev *pdev,
 	rtnl_lock();
 	adap->flags &= ~FW_OK;
 	notify_ulds(adap, CXGB4_STATE_START_RECOVERY);
+	spin_lock(&adap->stats_lock);
 	for_each_port(adap, i) {
 		struct net_device *dev = adap->port[i];
 
 		netif_device_detach(dev);
 		netif_carrier_off(dev);
 	}
+	spin_unlock(&adap->stats_lock);
 	if (adap->flags & FULL_INIT_DONE)
 		cxgb_down(adap);
 	rtnl_unlock();
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH v2 1/2] net/cxgb4: Avoid disabling PCI device for towice
From: Gavin Shan @ 2014-01-23  4:27 UTC (permalink / raw)
  To: netdev; +Cc: dm, ben, davem, Gavin Shan

If we have EEH error happens to the adapter and we have to remove
it from the system for some reasons (e.g. more than 5 EEH errors
detected from the device in last hour), the adapter will be disabled
for towice separately by eeh_err_detected() and remove_one(), which
will incur following unexpected backtrace. The patch tries to avoid
it.

WARNING: at drivers/pci/pci.c:1431
CPU: 12 PID: 121 Comm: eehd Not tainted 3.13.0-rc7+ #1
task: c0000001823a3780 ti: c00000018240c000 task.ti: c00000018240c000
NIP: c0000000003c1e40 LR: c0000000003c1e3c CTR: 0000000001764c5c
REGS: c00000018240f470 TRAP: 0700   Not tainted  (3.13.0-rc7+)
MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 28000024  XER: 00000004
CFAR: c000000000706528 SOFTE: 1
GPR00: c0000000003c1e3c c00000018240f6f0 c0000000010fe1f8 0000000000000035
GPR04: 0000000000000000 0000000000000000 00000000003ae509 0000000000000000
GPR08: 000000000000346f 0000000000000000 0000000000000000 0000000000003fef
GPR12: 0000000028000022 c00000000ec93000 c0000000000c11b0 c000000184ac3e40
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 c0000000009398d8 c00000000101f9c0 c0000001860ae000
GPR28: c000000182ba0000 00000000000001f0 c0000001860ae6f8 c0000001860ae000
NIP [c0000000003c1e40] .pci_disable_device+0xd0/0xf0
LR [c0000000003c1e3c] .pci_disable_device+0xcc/0xf0
Call Trace:
[c0000000003c1e3c] .pci_disable_device+0xcc/0xf0 (unreliable)
[d0000000073881c4] .remove_one+0x174/0x320 [cxgb4]
[c0000000003c57e0] .pci_device_remove+0x60/0x100
[c00000000046396c] .__device_release_driver+0x9c/0x120
[c000000000463a20] .device_release_driver+0x30/0x60
[c0000000003bcdb4] .pci_stop_bus_device+0x94/0xd0
[c0000000003bcf48] .pci_stop_and_remove_bus_device+0x18/0x30
[c00000000003f548] .pcibios_remove_pci_devices+0xa8/0x140
[c000000000035c00] .eeh_handle_normal_event+0xa0/0x3c0
[c000000000035f50] .eeh_handle_event+0x30/0x2b0
[c0000000000362c4] .eeh_event_handler+0xf4/0x1b0
[c0000000000c12b8] .kthread+0x108/0x130
[c00000000000a168] .ret_from_kernel_thread+0x5c/0x74

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |    5 +++--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   23 ++++++++++++++++++-----
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 56e0415..b4ca0a1 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -387,8 +387,9 @@ struct work_struct;
 
 enum {                                 /* adapter flags */
 	FULL_INIT_DONE     = (1 << 0),
-	USING_MSI          = (1 << 1),
-	USING_MSIX         = (1 << 2),
+	DEV_ENABLED        = (1 << 1),
+	USING_MSI          = (1 << 2),
+	USING_MSIX         = (1 << 3),
 	FW_OK              = (1 << 4),
 	RSS_TNLALLLOOKUP   = (1 << 5),
 	USING_SOFT_PARAMS  = (1 << 6),
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index fff02ed..c8eafbf 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5505,7 +5505,10 @@ static pci_ers_result_t eeh_err_detected(struct pci_dev *pdev,
 	if (adap->flags & FULL_INIT_DONE)
 		cxgb_down(adap);
 	rtnl_unlock();
-	pci_disable_device(pdev);
+	if ((adap->flags & DEV_ENABLED)) {
+		pci_disable_device(pdev);
+		adap->flags &= ~DEV_ENABLED;
+	}
 out:	return state == pci_channel_io_perm_failure ?
 		PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_NEED_RESET;
 }
@@ -5522,9 +5525,13 @@ static pci_ers_result_t eeh_slot_reset(struct pci_dev *pdev)
 		return PCI_ERS_RESULT_RECOVERED;
 	}
 
-	if (pci_enable_device(pdev)) {
-		dev_err(&pdev->dev, "cannot reenable PCI device after reset\n");
-		return PCI_ERS_RESULT_DISCONNECT;
+	if (!(adap->flags & DEV_ENABLED)) {
+		if (pci_enable_device(pdev)) {
+			dev_err(&pdev->dev, "Cannot reenable PCI "
+					    "device after reset\n");
+			return PCI_ERS_RESULT_DISCONNECT;
+		}
+		adap->flags |= DEV_ENABLED;
 	}
 
 	pci_set_master(pdev);
@@ -5910,6 +5917,9 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto out_disable_device;
 	}
 
+	/* PCI device has been enabled */
+	adapter->flags |= DEV_ENABLED;
+
 	adapter->regs = pci_ioremap_bar(pdev, 0);
 	if (!adapter->regs) {
 		dev_err(&pdev->dev, "cannot map device registers\n");
@@ -6145,7 +6155,10 @@ static void remove_one(struct pci_dev *pdev)
 			iounmap(adapter->bar2);
 		kfree(adapter);
 		pci_disable_pcie_error_reporting(pdev);
-		pci_disable_device(pdev);
+		if ((adapter->flags & DEV_ENABLED)) {
+			pci_disable_device(pdev);
+			adapter->flags &= ~DEV_ENABLED;
+		}
 		pci_release_regions(pdev);
 	} else
 		pci_release_regions(pdev);
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH] team: Don't allow team devices to change network namespaces.
From: Chen Weilong @ 2014-01-23  3:07 UTC (permalink / raw)
  To: jiri, davem; +Cc: netdev

From: Weilong Chen <chenweilong@huawei.com>

Like bonding, team as netdevice doesn't cross netns boundaries.

Team ports and team itself live in same netns.

Signed-off-by: Weilong Chen <chenweilong@huawei.com>
---
 drivers/net/team/team.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index dff24e3..2840742 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -2034,6 +2034,10 @@ static void team_setup(struct net_device *dev)
 
 	dev->features |= NETIF_F_LLTX;
 	dev->features |= NETIF_F_GRO;
+
+	/* Don't allow team devices to change network namespaces. */
+	dev->features |= NETIF_F_NETNS_LOCAL;
+
 	dev->hw_features = TEAM_VLAN_FEATURES |
 			   NETIF_F_HW_VLAN_CTAG_TX |
 			   NETIF_F_HW_VLAN_CTAG_RX |
-- 
1.7.12

^ permalink raw reply related

* Re: Linux kernel patch: elide fib_validate_source() completely when possible - bad side effect?
From: Hannes Frederic Sowa @ 2014-01-23  3:03 UTC (permalink / raw)
  To: David Miller; +Cc: ja, gregory.hoggarth, netdev
In-Reply-To: <20140122.185305.1125981867482223830.davem@davemloft.net>

On Wed, Jan 22, 2014 at 06:53:05PM -0800, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Thu, 23 Jan 2014 03:02:24 +0100
> 
> > On Wed, Jan 22, 2014 at 11:04:52PM +0200, Julian Anastasov wrote:
> >> 	I don't remember someone mentioning about such
> >> side-effect, I guess it is overlooked. IMHO, it is not a good
> >> reason to restore the old behavior. Lets see other opinions.
> > 
> > A routing lookup to check for broadcast is much too expansive, I agree. But
> > can't we just check skb->pkt_type == PACKET_BROADCAST?
> 
> We can't universally know what a broadcast is, because the prefix of
> every network is a local piece of information.
> 
> We might not have the route necessary to know the source address is
> a broadcast if the packet came through more than 1 hop already.

But in this case we only need those local pieces.

E.g. we register all local registered broadcast addresses in a structure
like inet_addr_lst so we only need to check if the packet would leave
this host with a broadcast hardware address. If the packet is forwarded
the router must do the same check as only it knows the local broadcast
addresses. I hope this is correct. ;)

^ permalink raw reply

* Re: Linux kernel patch: elide fib_validate_source() completely when possible - bad side effect?
From: David Miller @ 2014-01-23  2:53 UTC (permalink / raw)
  To: hannes; +Cc: ja, gregory.hoggarth, netdev
In-Reply-To: <20140123020224.GG7269@order.stressinduktion.org>

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 23 Jan 2014 03:02:24 +0100

> On Wed, Jan 22, 2014 at 11:04:52PM +0200, Julian Anastasov wrote:
>> 	I don't remember someone mentioning about such
>> side-effect, I guess it is overlooked. IMHO, it is not a good
>> reason to restore the old behavior. Lets see other opinions.
> 
> A routing lookup to check for broadcast is much too expansive, I agree. But
> can't we just check skb->pkt_type == PACKET_BROADCAST?

We can't universally know what a broadcast is, because the prefix of
every network is a local piece of information.

We might not have the route necessary to know the source address is
a broadcast if the packet came through more than 1 hop already.

^ permalink raw reply

* Re: [PATCH net-next 1/3] bonding: Set the correct value to fail_over_mac at enslavement
From: Ding Tianhong @ 2014-01-23  2:50 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Veaceslav Falico, David S. Miller, Netdev, Andy Gospodarek
In-Reply-To: <26505.1390423887@death.nxdomain>

On 2014/1/23 4:51, Jay Vosburgh wrote:
> Ding Tianhong <dingtianhong@huawei.com> wrote:
> 
>> If the new slave don't support setting the MAC address, there are
>> two ways to handle this situation:
>>
>> 1). If the new slave is the first slave, set bond to the new slave's
>>    MAC address, if the mode is active-backup, set fail_over_mac to
>>    active, otherwise set fail_over_mac to none.
> 
> 	This should be "if the mode is active-backup, set fail_over_mac
> to active, otherwise do not change fail_over_mac."  Setting to none here
> would undo any setting of fail_over_mac that the user had set prior to
> adding the first slave.
> 
I thought about this question for a long time, I still can not think clearly.
ex: the first slave did not support setting MAC, and mode is RR, fail_over_mac=1(user set it
    when load the driver), so the after the enslavement, the fail_over_mac is still 1,
    then the second slave added, if the second slave did not support setting MAC, it will pass
    the check and go ahead, until the dev_set_mac_address(), and return error, so
    I think set it to none may cause the second slave return early, but no substance logic change.
    I will modify this place as your opinion.

>> 2). If the new slave is not the first slave and the fail_over_mac is
>>    active, it means that the slave could work normally in active-backup
>>    mode, otherwise if the fail_over_mac is none, the slave could not
>>    work normally for no active-backup mode, so bond could not ensalve
>>    the new dev.
> 
> 	This (#2) is not a code change, correct?  You're just restating
> the existing behavior of the code, right?
> 
> 	Also, I don't see where this patch set updates the slave removal
> processing where the slave's original MAC is restored.  At present, this
> is done by a test against fail_over_mac, but should be tested against
> the mode and fail_over_mac.
> 
> 	My comment to the prior version of this patchset, again:
> 
> 	The correct way to fix this in general is to permit setting an
> option at any time, but only have it take effect in active-backup mode.
> This minimizes ordering requirements when setting options.
> 
> 	I would instead modify the bond enslave and removal processing
> to check the mode in addition to fail_over_mac when setting a slave's
> MAC during enslavement.  The change active slave processing already only
> calls the fail_over_mac function when in active-backup mode.  This
> should also be a simpler change set.
> 
> 	These comments still apply to this version of the patchset.
> 
> 	-J
> 

OK.

Regards
Ding

>> Cc: Jay Vosburgh <fubar@us.ibm.com>
>> Cc: Veaceslav Falico <vfalico@redhat.com>
>> Cc: Andy Gospodarek <andy@greyhouse.net>
>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>> ---
>> drivers/net/bonding/bond_main.c | 12 ++++++++++--
>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 3220b48..598f100 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -1334,9 +1334,17 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>>
>> 	if (slave_ops->ndo_set_mac_address == NULL) {
>> 		if (!bond_has_slaves(bond)) {
>> -			pr_warning("%s: Warning: The first slave device specified does not support setting the MAC address. Setting fail_over_mac to active.",
>> +			pr_warning("%s: Warning: The first slave device specified does not support setting the MAC address.\n",
>> 				   bond_dev->name);
>> -			bond->params.fail_over_mac = BOND_FOM_ACTIVE;
>> +			if (bond->params.mode == BOND_MODE_ACTIVEBACKUP) {
>> +				bond->params.fail_over_mac = BOND_FOM_ACTIVE;
>> +				pr_warning("%s: Setting fail_over_mac to active for active-backup mode.\n",
>> +					   bond_dev->name);
>> +			} else {
>> +				bond->params.fail_over_mac = BOND_FOM_NONE;
>> +				pr_warning("%s: Setting fail_over_mac to none for no active-backup modes",
>> +					   bond_dev->name);
>> +			}
>> 		} else if (bond->params.fail_over_mac != BOND_FOM_ACTIVE) {
>> 			pr_err("%s: Error: The slave device specified does not support setting the MAC address, but fail_over_mac is not set to active.\n",
>> 			       bond_dev->name);
>> -- 
>> 1.8.0
> 
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
> 
> 
> .
> 

^ permalink raw reply

* RE: [PATCH 2/2] net/cxgb4: Don't retrieve stats during recovery
From: Dimitrios Michailidis @ 2014-01-23  2:17 UTC (permalink / raw)
  To: Gavin Shan; +Cc: netdev@vger.kernel.org
In-Reply-To: <20140123013134.GA5947@shangw.(null)>

On Wed, Jan 22, 2014 at 5:32 PM, Gavin Shan wrote:
> >On Mon, Jan 20, 2014 at 02:05:18PM -0800, Dimitris Michailidis wrote:
> >>On 01/19/2014 07:05 PM, Gavin Shan wrote:
> >>>We possiblly retrieve the adapter's statistics during EEH recovery
> >>>and that should be disallowed. Otherwise, it would possibly incur
> >>>replicate EEH error and EEH recovery is going to fail eventually.
> >>>The patch checks if the PCI device is off-line before statistic
> >>>retrieval.
> >>
> >>The net_devices are detached during EEH so I think
> >>netif_device_present is a better check than pci_channel_offline.  I
> >>am not sure such a test should be left to each driver though.  If you
> >>do end up putting it in the driver it needs better synchronization
> >>with the EEH handlers as Ben mentioned.
> >>
> >
> >Ok. I agree that netif_device_present() is better since the statistics
> >is more net_device specific (other than pci_dev). And it's more accurate
> >to use netif_device_present() based on what we have:
> >
> >	pci_channel_offline()	----------+
> >	eeh_err_detected()		  |
> >		!netif_device_present() --+-----+
> >	<EEH recovery>			  |	|
> >	!pci_channel_offline()	----------+	|
> >	eeh_slot_reset()			|
> >	eeh_resume()				|
> >		netif_device_present()	--------+
> >
> >For the syncrhonization, I think we can just reuse the "adap->stats_lock".
> >Something like this:
> >
> >static pci_ers_result_t eeh_err_detected(struct pci_dev *pdev,
> >					 pci_channel_state_t state)
> >{
> >	:
> >	spin_lock(&adap->stats_lock);
> >        for_each_port(adap, i) {
> >                struct net_device *dev = adap->port[i];
> >
> >                netif_device_detach(dev);
> >                netif_carrier_off(dev);
> >        }
> >	spin_unlock(&adap->stats_lock);
> >	:
> >}
> >
> >static void eeh_resume(struct pci_dev *pdev)
> >{
> >	:
> >	spin_lock(&adap->stats_lock);
> >        for_each_port(adap, i) {
> >                struct net_device *dev = adap->port[i];
> >
> >                if (netif_running(dev)) {
> >                        link_start(dev);
> >                        cxgb_set_rxmode(dev);
> >                }
> >                netif_device_attach(dev);
> >        }
> >	spin_unlock(&adap->stats_lock);
> >	:
> >}

Both link_start and cxgb_set_rxmode here issue blocking commands to FW, these two cannot be under a spinlock.  In fact I don't think you need locking here at all.  The devices can be attached asynchronously relative to the stats code, we don't care if it races.  On detach it matters but not here.

> >static struct rtnl_link_stats64 *cxgb_get_stats(struct net_device *dev,
> >                                                struct rtnl_link_stats64 *ns)
> >{
> >	:
> >        spin_lock(&adapter->stats_lock);
> >	if (!netif_device_present(dev)) {
> >		spin_unlock(&adapter->stats_lock);
> >		return ns;
> >	}
> >        t4_get_port_stats(adapter, p->tx_chan, &stats);
> >        spin_unlock(&adapter->stats_lock);
> >	:
> >}
> >
> 
> Dimitris, Any more comments on this? :-)
 
Just the above.  Thanks.

> If you think it's fine, I'm going to change it like this and send
> out "v2".
> 
> Thanks,
> Gavin
> 
> >>>
> >>>Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> >>>---
> >>>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   11 +++++++++++
> >>>  1 file changed, 11 insertions(+)
> >>>
> >>>diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> >>>index c8eafbf..b0e72fb 100644
> >>>--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> >>>+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> >>>@@ -4288,6 +4288,17 @@ static struct rtnl_link_stats64
> *cxgb_get_stats(struct net_device *dev,
> >>>  	struct port_info *p = netdev_priv(dev);
> >>>  	struct adapter *adapter = p->adapter;
> >>>
> >>>+	/*
> >>>+	 * We possibly retrieve the statistics while the PCI
> >>>+	 * device is off-line. That would cause the recovery
> >>>+	 * on off-lined PCI device going to fail. So it's
> >>>+	 * reasonable to block it during the recovery period.
> >>>+	 */
> >>>+	if (pci_channel_offline(adapter->pdev)) {
> >>>+		memset(ns, 0, sizeof(*ns));
> >>>+		return ns;
> >>>+	}
> >>>+
> >>>  	spin_lock(&adapter->stats_lock);
> >>>  	t4_get_port_stats(adapter, p->tx_chan, &stats);
> >>>  	spin_unlock(&adapter->stats_lock);
> >>>

^ permalink raw reply

* Re: [PATCH] bonding: Don't allow bond devices to change network namespaces.
From: chenweilong @ 2014-01-23  2:41 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: fubar, vfalico, andy, davem, netdev
In-Reply-To: <20140122094106.GB2814@minipsycho.orion>

On 2014/1/22 17:41, Jiri Pirko wrote:
> Wed, Jan 22, 2014 at 10:16:30AM CET, chenweilong@huawei.com wrote:
>> From: Weilong Chen <chenweilong@huawei.com>
>>
>> Like bridge, bonding as netdevice doesn't cross netns boundaries.
>>
>> Bonding ports and bonding itself live in same netns.
> 
> I think should should be done for team as well. Openvs already
> has this. I believe that for vlans it is ok to change ns, right?
> 

OK, I'll write a new patch for 'team'.

For vlan dev, I think it's more complex.
1.Move vlan dev to a new ns, its real dev is in old ns, vlan
  can not work.
2.Move vlan real dev to a new ns, vlan dev is in old ns, vlan
  can not work.
If someone want to move vlan and its real dev to new ns, he must
do 1) and 2) or 2) and 1), both way are breaking the vlan down. I
think move real dev to new ns, and add a new vlan dev will be better.

Maybe I need write another patch for 'vlan'.

>>
>> Signed-off-by: Weilong Chen <chenweilong@huawei.com>
>> ---
>> drivers/net/bonding/bond_main.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index f00dd45..897d153 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -3916,6 +3916,9 @@ void bond_setup(struct net_device *bond_dev)
>> 	 * capable
>> 	 */
>>
>> +	/* Don't allow bond devices to change network namespaces. */
>> +	bond_dev->features |= NETIF_F_NETNS_LOCAL;
>> +
>> 	bond_dev->hw_features = BOND_VLAN_FEATURES |
>> 				NETIF_F_HW_VLAN_CTAG_TX |
>> 				NETIF_F_HW_VLAN_CTAG_RX |
>> -- 
>> 1.7.12
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply

* Re: [PATCH net-next 3/3] bonding: cleanup some redundant code and wrong variables
From: Ding Tianhong @ 2014-01-23  2:32 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, David S. Miller, Netdev
In-Reply-To: <52DFD587.4070109@redhat.com>

On 2014/1/22 22:28, Nikolay Aleksandrov wrote:
> On 01/22/2014 10:22 AM, Ding Tianhong wrote:
>> The dev_set_mac_address() will check the dev->netdev_ops->ndo_set_mac_address,
>> so no need to check it in bond_set_mac_address().
>>
>> Fix the wrong variables for pr_err().
>>
>> Cc: Jay Vosburgh <fubar@us.ibm.com>
>> Cc: Veaceslav Falico <vfalico@redhat.com>
>> Cc: Andy Gospodarek <andy@greyhouse.net>
>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>> ---
>>  drivers/net/bonding/bond_main.c | 10 +---------
>>  1 file changed, 1 insertion(+), 9 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index ce0f5c0..9d92f46 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -3509,15 +3509,7 @@ static int bond_set_mac_address(struct net_device *bond_dev, void *addr)
>>  	 */
>>  
>>  	bond_for_each_slave(bond, slave, iter) {
>> -		const struct net_device_ops *slave_ops = slave->dev->netdev_ops;
>>  		pr_debug("slave %p %s\n", slave, slave->dev->name);
>> -
>> -		if (slave_ops->ndo_set_mac_address == NULL) {
>> -			res = -EOPNOTSUPP;
>> -			pr_debug("EOPNOTSUPP %s\n", slave->dev->name);
>> -			goto unwind;
>> -		}
>> -
>>  		res = dev_set_mac_address(slave->dev, addr);
>>  		if (res) {
>>  			/* TODO: consider downing the slave
>> @@ -4317,7 +4309,7 @@ static int bond_check_params(struct bond_params *params)
>>  						      fail_over_mac_tbl);
>>  		if (fail_over_mac_value == -1) {
>>  			pr_err("Error: invalid fail_over_mac \"%s\"\n",
>> -			       arp_validate == NULL ? "NULL" : arp_validate);
>> +			       fail_over_mac == NULL ? "NULL" : fail_over_mac);
>>  			return -EINVAL;
>>  		}
> My option API changes include a fix for this which also removes the NULL check
> as it's already checked earlier if fail_over_mac is != NULL. If you'd like to
> fix this yourself I think you can also skip the NULL check when printing the error.
> 
yes, it exist a long time, need to update. thanks.

> Cheers,
>  Nik
>>  
>>
> 
> 
> .
> 

^ permalink raw reply

* Re: [PATCH net-next 1/3] bonding: Set the correct value to fail_over_mac at enslavement
From: Ding Tianhong @ 2014-01-23  2:31 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Jay Vosburgh, Veaceslav Falico,
	David S. Miller, Netdev, Andy Gospodarek
In-Reply-To: <52DFD3BB.5010708@redhat.com>

On 2014/1/22 22:20, Nikolay Aleksandrov wrote:
> On 01/22/2014 10:22 AM, Ding Tianhong wrote:
>> If the new slave don't support setting the MAC address, there are
>> two ways to handle this situation:
>>
>> 1). If the new slave is the first slave, set bond to the new slave's
>>     MAC address, if the mode is active-backup, set fail_over_mac to
>>     active, otherwise set fail_over_mac to none.
>> 2). If the new slave is not the first slave and the fail_over_mac is
>>     active, it means that the slave could work normally in active-backup
>>     mode, otherwise if the fail_over_mac is none, the slave could not
>>     work normally for no active-backup mode, so bond could not ensalve
>>     the new dev.
>>
>> Cc: Jay Vosburgh <fubar@us.ibm.com>
>> Cc: Veaceslav Falico <vfalico@redhat.com>
>> Cc: Andy Gospodarek <andy@greyhouse.net>
>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>> ---
>>  drivers/net/bonding/bond_main.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 3220b48..598f100 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -1334,9 +1334,17 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>>  
>>  	if (slave_ops->ndo_set_mac_address == NULL) {
>>  		if (!bond_has_slaves(bond)) {
>> -			pr_warning("%s: Warning: The first slave device specified does not support setting the MAC address. Setting fail_over_mac to active.",
>> +			pr_warning("%s: Warning: The first slave device specified does not support setting the MAC address.\n",
>>  				   bond_dev->name);
>> -			bond->params.fail_over_mac = BOND_FOM_ACTIVE;
>> +			if (bond->params.mode == BOND_MODE_ACTIVEBACKUP) {
>> +				bond->params.fail_over_mac = BOND_FOM_ACTIVE;
>> +				pr_warning("%s: Setting fail_over_mac to active for active-backup mode.\n",
>> +					   bond_dev->name);
>> +			} else {
>> +				bond->params.fail_over_mac = BOND_FOM_NONE;
>> +				pr_warning("%s: Setting fail_over_mac to none for no active-backup modes",
> At least make the warnings consistent: "... for no active-backup mode.\n".
> Also you might consider switching to pr_warn.
> 
ok, thanks.

>> +					   bond_dev->name);
>> +			}
>>  		} else if (bond->params.fail_over_mac != BOND_FOM_ACTIVE) {
>>  			pr_err("%s: Error: The slave device specified does not support setting the MAC address, but fail_over_mac is not set to active.\n",
>>  			       bond_dev->name);
>>
> 
> 
> .
> 

^ permalink raw reply

* Re: [PATCH] SUNRPC: Allow one callback request to be received from two sk_buff
From: shaobingqing @ 2014-01-23  2:23 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Dr Fields James Bruce, Miller David S., linuxnfs, netdev,
	Linux Kernel Mailing List
In-Reply-To: <1BFACA51-087E-4945-851A-FBF0F108604C@primarydata.com>

2014/1/21 Trond Myklebust <trond.myklebust@primarydata.com>:
>
> On Jan 21, 2014, at 3:08, shaobingqing <shaobingqing@bwstor.com.cn> wrote:
>
>> 2014/1/21 Trond Myklebust <trond.myklebust@primarydata.com>:
>>> On Mon, 2014-01-20 at 14:59 +0800, shaobingqing wrote:
>>>> In current code, there only one struct rpc_rqst is prealloced. If one
>>>> callback request is received from two sk_buff, the xprt_alloc_bc_request
>>>> would be execute two times with the same transport->xid. The first time
>>>> xprt_alloc_bc_request will alloc one struct rpc_rqst and the TCP_RCV_COPY_DATA
>>>> bit of transport->tcp_flags will not be cleared. The second time
>>>> xprt_alloc_bc_request could not alloc struct rpc_rqst any more and NULL
>>>> pointer will be returned, then xprt_force_disconnect occur. I think one
>>>> callback request can be allowed to be received from two sk_buff.
>>>>
>>>> Signed-off-by: shaobingqing <shaobingqing@bwstor.com.cn>
>>>> ---
>>>> net/sunrpc/xprtsock.c |   11 +++++++++--
>>>> 1 files changed, 9 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>>>> index ee03d35..606950d 100644
>>>> --- a/net/sunrpc/xprtsock.c
>>>> +++ b/net/sunrpc/xprtsock.c
>>>> @@ -1271,8 +1271,13 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
>>>>      struct sock_xprt *transport =
>>>>                              container_of(xprt, struct sock_xprt, xprt);
>>>>      struct rpc_rqst *req;
>>>> +     static struct rpc_rqst *req_partial;
>>>> +
>>>> +     if (req_partial == NULL)
>>>> +             req = xprt_alloc_bc_request(xprt);
>>>> +     else if (req_partial->rq_xid == transport->tcp_xid)
>>>> +             req = req_partial;
>>>
>>> What happens here if req_partial->rq_xid != transport->tcp_xid? AFAICS,
>>> req will be undefined. Either way, you cannot use a static variable for
>>> storage here: that isn't re-entrant.
>>
>> Because metadata sever only have one slot for backchannel request,
>> req_partial->rq_xid == transport->tcp_xid always happens, if the callback
>> request just being splited in two sk_buffs. But req_partial->rq_xid !=
>> transport->tcp_xid may also happens in some special cases, such as
>> retransmission occurs?
>
> If the server retransmits, then it is broken. The NFSv4.1 protocol does not allow it to retransmit unless the connection breaks.

What I am saying above is bogus. As far as I can see, If one callback
request is splitted into two sk_buffs, the function
xs_tcp_read_callback will be called two times with the same rpc_xprt
and the same xid. If between the two calls there is
another call with the same rpc_xprt, but different xid, we consider it
is another callback request from the same server, in
the condition that there is no retransmission in our enviorenment. But
this might not happen because there is only one
callback slot in each server.

>
>> If one callback request is splited in two sk_buffs, xs_tcp_read_callback
>> will be execute two times. The req_partial should be a static variable,
>> because  the second execution of xs_tcp_read_callback should use
>> the rpc_rqst allocated for the first execution, which saves information
>> copies from the first sk_buff.
>
> No! This is a multi-threaded/process environment which can support multiple connection. It is a bug to use a static variable.

I think I have misunderstood the question. Here a static variable can
not be used. Perhaps, we should define a variable for each
rpc_client  (or rpc xprt).

>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Linux kernel patch: elide fib_validate_source() completely when possible - bad side effect?
From: Hannes Frederic Sowa @ 2014-01-23  2:06 UTC (permalink / raw)
  To: Julian Anastasov, gregory hoggarth, netdev
In-Reply-To: <20140123020224.GG7269@order.stressinduktion.org>

On Thu, Jan 23, 2014 at 03:02:24AM +0100, Hannes Frederic Sowa wrote:
> On Wed, Jan 22, 2014 at 11:04:52PM +0200, Julian Anastasov wrote:
> > 	I don't remember someone mentioning about such
> > side-effect, I guess it is overlooked. IMHO, it is not a good
> > reason to restore the old behavior. Lets see other opinions.
> 
> A routing lookup to check for broadcast is much too expansive, I agree. But
> can't we just check skb->pkt_type == PACKET_BROADCAST?

Ah, sorry, other way around, we need to check the source address. Maybe a
small hash table to look for currently installed broadcast addresses?

Greetings,

  Hannes

^ permalink raw reply

* Re: Linux kernel patch: elide fib_validate_source() completely when possible - bad side effect?
From: Hannes Frederic Sowa @ 2014-01-23  2:02 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: gregory hoggarth, netdev
In-Reply-To: <alpine.LFD.2.11.1401222222010.1855@ja.home.ssi.bg>

On Wed, Jan 22, 2014 at 11:04:52PM +0200, Julian Anastasov wrote:
> 	I don't remember someone mentioning about such
> side-effect, I guess it is overlooked. IMHO, it is not a good
> reason to restore the old behavior. Lets see other opinions.

A routing lookup to check for broadcast is much too expansive, I agree. But
can't we just check skb->pkt_type == PACKET_BROADCAST?

Greetings,

  Hannes

^ permalink raw reply

* Re: [PATCH 0/4] ipv4: small set of fixes
From: David Miller @ 2014-01-23  1:54 UTC (permalink / raw)
  To: popovich_sergei; +Cc: netdev
In-Reply-To: <cover.1390304505.git.popovich_sergei@mail.ru>

From: Sergey Popovich <popovich_sergei@mail.ru>
Date: Tue, 21 Jan 2014 13:48:47 +0200

> In this patch series I present my attempt to address
> some issues in IPv4 implementation.
> 
> Please take time to review my patches.

You're going to have to repost these with the adjustments I've
asked for.

^ permalink raw reply

* Re: [PATCH 4/4] ipv4: mark nexthop as dead when it's subnet becomes unreachable
From: David Miller @ 2014-01-23  1:53 UTC (permalink / raw)
  To: popovich_sergei; +Cc: netdev
In-Reply-To: <40044b636dbf7ae5bba5fe2873451e14438ec170.1390304505.git.popovich_sergei@mail.ru>

From: Sergey Popovich <popovich_sergei@mail.ru>
Date: Tue, 21 Jan 2014 13:48:51 +0200

> Also optimize loop a bit:
>   move handling of force > 1 out of the change_nexthops() loop.

Do not mix bug fixes with optimizations.

^ permalink raw reply

* Re: [PATCH 3/4] ipv4: use SNMP macro assuming softirq context in ip_forward().
From: David Miller @ 2014-01-23  1:52 UTC (permalink / raw)
  To: popovich_sergei; +Cc: netdev
In-Reply-To: <3c324aaf986e6d5a0678cac6d292dd0b96bd9766.1390304505.git.popovich_sergei@mail.ru>

From: Sergey Popovich <popovich_sergei@mail.ru>
Date: Tue, 21 Jan 2014 13:48:50 +0200

> ip_forward() runs entirely in softirq context, we could use
> version of SNMP macro which updates stats counters assuming that.
> 
> Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>

This is not a bug fix, it's an optimization.

^ permalink raw reply

* Re: [PATCH 1/4] ipv4: don't disable interface if last ipv4 address is removed
From: David Miller @ 2014-01-23  1:52 UTC (permalink / raw)
  To: popovich_sergei; +Cc: netdev
In-Reply-To: <2748ead3b814852075f954ea157fe5d7288531e2.1390304505.git.popovich_sergei@mail.ru>

From: Sergey Popovich <popovich_sergei@mail.ru>
Date: Tue, 21 Jan 2014 13:48:48 +0200

> Since
> 
>   commit 876fd05ddbae03166e7037fca957b55bb3be6594
>   Author: Hannes Frederic Sowa <>
>   Date:   Mon Jun 24 00:22:20 2013 +0200
> 
>     ipv6: don't disable interface if last ipv6 address is removed
> 
> we have disabled this behavior for IPv6. Adjust behavior for IPv4.

I do not agree with this change.

IPV4 is a lot different from ipv6, for example ipv4 lacks things like
temporary address assignments and other notifications that links listen
to even if they don't have explicit static addressed assigned.

The ipv4 behavior also has two decades of precedence, and I think you'll
have a very hard time convincing me that nobody depends upon it.

So I'm not applying this patch, sorry.

^ permalink raw reply

* Re: [PATCH net-next v5 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
From: David Miller @ 2014-01-23  1:50 UTC (permalink / raw)
  To: zoltan.kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel,
	jonathan.davies
In-Reply-To: <1390253069-25507-1-git-send-email-zoltan.kiss@citrix.com>

From: Zoltan Kiss <zoltan.kiss@citrix.com>
Date: Mon, 20 Jan 2014 21:24:20 +0000

> A long known problem of the upstream netback implementation that on the TX
> path (from guest to Dom0) it copies the whole packet from guest memory into
> Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a
> huge perfomance penalty. The classic kernel version of netback used grant
> mapping, and to get notified when the page can be unmapped, it used page
> destructors. Unfortunately that destructor is not an upstreamable solution.
> Ian Campbell's skb fragment destructor patch series [1] tried to solve this
> problem, however it seems to be very invasive on the network stack's code,
> and therefore haven't progressed very well.
> This patch series use SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to
> know when the skb is freed up.

This series does not apply to net-next due to some other recent changes.

Please respin, thanks.

^ permalink raw reply

* RE: Freescale FEC packet loss
From: fugang.duan @ 2014-01-23  1:49 UTC (permalink / raw)
  To: Marek Vasut, netdev@vger.kernel.org
  Cc: Frank.Li@freescale.com, Fabio.Estevam@freescale.com,
	Hector Palacios, linux-arm-kernel@lists.infradead.org,
	Detlev Zundel, Eric Nelson
In-Reply-To: <201401222255.29467.marex@denx.de>

Hi, Marek,


From: Marek Vasut <marex@denx.de>
Data: Thursday, January 23, 2014 5:55 AM
>To: netdev@vger.kernel.org
>Cc: Li Frank-B20596; Duan Fugang-B38611; Estevam Fabio-R49496; Hector Palacios;
>linux-arm-kernel@lists.infradead.org; Detlev Zundel; Eric Nelson
>Subject: Freescale FEC packet loss
>
>Hi guys,
>
>I am running stock Linux 3.13 on i.MX6Q SabreLite board. The CPU is i.MX6Q TO
>1.0 .
>
>I am hitting a WARNING when I use the FEC ethernet to transfer data, thus I
>started investigating this problem. TL;DR I am not able to figure this problem
>out, so I am not attaching a patch :-(
>
>Steps to reproduce:
>-------------------
>1) Boot stock Linux 3.13 on i.MX6Q SabreLite board
>2) Plug in an SD card into one of the SD slots (I use the full-size one)
>3) Plug in an USB stick into one of the USB ports (I use the upper one)
>4) Plug in an ethernet cable into the board
>   -> Connect the other side into a gigabit-capable PC
>   -> Let us assume the PC has IP address 192.168.1.1/24 and
>      the board 192.168.1.2/24
>5) Install iperf on both boards
>6) Bring up the ethernet link on both ends:
>   $ ifconfig ...
>7) On the PC, run:
>   $ iperf -s -i 1
>8) Start producing load on the in-CPU busses. Run this on the board:
>   $ sha1sum /dev/mmcblk0 &
>   $ sha1sum /dev/sda &
>9) Now let the board generate ethernet traffic:
>   $ iperf -c 192.168.1.1 -t 1000 -i 1
>
>Now wait about 10 minutes and the system will produce a warning and you will
>observe dips in the transmission speed. You can see the output at the end of
>the email.
>
>I observe that this happens more often when there is a severe load on the
>busses, which in this case I simulate by running the sha1sum on /dev/mmcblk0
>and sha1sum /dev/sda in the background in step 8) . I can also well simulate
>this by running 'sha1sum /dev/mmcblk0 & sha1sum /dev/mmcblk1 &' when I have
>both SD cards populated on the MX6Q sabrelite with the same result (WARNING and
>dips in speed).
>
>There was apparently a thread about this problem already [1] where the person
>used SATA to produce high bus load and had exactly the same result.
>
>One more observation is that I managed to replicate this problem all the way
>back to Linux 3.3.8 , which is one of the first kernel versions where sabrelite
>was supported. I also see this one the Freescale's 3.0.35-4.1.0 .
>
>I have trouble figuring out what this problem is all about. Can you please help
>me? Thank you!
>
>[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2013-
>October/202519.html
>
>root@(none):~# iperf -c 192.168.1.1 -t 1000 -i 1
>------------------------------------------------------------
>Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 43.8 KByte
>(default)
>------------------------------------------------------------
>[  3] local 192.168.1.128 port 36760 connected with 192.168.1.1 port 5001
>[ ID] Interval       Transfer     Bandwidth
>[  3]  0.0- 1.0 sec  25.5 MBytes   214 Mbits/sec
>[  3]  1.0- 2.0 sec  28.9 MBytes   242 Mbits/sec
>[  3]  2.0- 3.0 sec  24.1 MBytes   202 Mbits/sec
>[  3]  3.0- 4.0 sec  21.1 MBytes   177 Mbits/sec
>[  3]  4.0- 5.0 sec  20.2 MBytes   170 Mbits/sec
>[  3]  5.0- 6.0 sec  20.0 MBytes   168 Mbits/sec
>[  3]  6.0- 7.0 sec  20.0 MBytes   168 Mbits/sec
>[  3]  7.0- 8.0 sec  20.1 MBytes   169 Mbits/sec
>[  3]  8.0- 9.0 sec  20.5 MBytes   172 Mbits/sec
>[  3]  9.0-10.0 sec  21.5 MBytes   180 Mbits/sec
>[  3] 10.0-11.0 sec  20.0 MBytes   168 Mbits/sec
>[  3] 11.0-12.0 sec  19.4 MBytes   163 Mbits/sec
>[  3] 12.0-13.0 sec  19.6 MBytes   165 Mbits/sec
>[  3] 13.0-14.0 sec  19.8 MBytes   166 Mbits/sec
>[  3] 14.0-15.0 sec  19.4 MBytes   163 Mbits/sec
>[  275.945247] ------------[ cut here ]------------ [  275.949920] WARNING: CPU:
>3 PID: 1155 at net/sched/sch_generic.c:264
>dev_watchdog+0x280/0x2a4()
>[  275.958679] NETDEV WATCHDOG: eth3 (fec): transmit queue 0 timed out
>[  275.964985] Modules linked in:
>[  275.968096] CPU: 3 PID: 1155 Comm: sha1sum Not tainted 3.13.0 #18
>[  275.974217] Backtrace:
>[  275.976787] [<80012410>] (dump_backtrace+0x0/0x10c) from [<800125ac>]
>(show_stack+0x18/0x1c)
>[  275.985271]  r6:00000108 r5:00000009 r4:00000000 r3:00000000
>[  275.991050] [<80012594>] (show_stack+0x0/0x1c) from [<8060f9c8>]
>(dump_stack+0x80/0x9c)
>[  275.999103] [<8060f948>] (dump_stack+0x0/0x9c) from [<8002703c>]
>(warn_slowpath_common+0x6c/0x90)
>[  276.008017]  r4:bef43e10 r3:bef96c00
>[  276.011663] [<80026fd0>] (warn_slowpath_common+0x0/0x90) from [<80027104>]
>(warn_slowpath_fmt+0x3)
>[  276.021170]  r8:bf098240 r7:bef42000 r6:bf098200 r5:bf068000 r4:00000000
>[  276.028083] [<800270cc>] (warn_slowpath_fmt+0x0/0x40) from [<804e2754>]
>(dev_watchdog+0x280/0x2a4)
>[  276.037095]  r3:bf068000 r2:807d2fb4
>[  276.040734] [<804e24d4>] (dev_watchdog+0x0/0x2a4) from [<80030e1c>]
>(call_timer_fn+0x70/0xec)
>[  276.049313] [<80030dac>] (call_timer_fn+0x0/0xec) from [<80031a90>]
>(run_timer_softirq+0x198/0x23)
>[  276.058407]  r8:00200200 r7:00000000 r6:bef43ec8 r5:bf836000 r4:bf068284
>[  276.065257] [<800318f8>] (run_timer_softirq+0x0/0x230) from [<8002b480>]
>(__do_softirq+0x100/0x25)
>[  276.074338] [<8002b380>] (__do_softirq+0x0/0x254) from [<8002b9a8>]
>(irq_exit+0xb0/0x108)
>[  276.082570] [<8002b8f8>] (irq_exit+0x0/0x108) from [<8000f3dc>]
>(handle_IRQ+0x58/0xb8)
>[  276.090528]  r4:8086ccc8 r3:00000184
>[  276.094162] [<8000f384>] (handle_IRQ+0x0/0xb8) from [<80008640>]
>(gic_handle_irq+0x30/0x64)
>[  276.102552]  r8:5590d9fa r7:f4000100 r6:bef43fb0 r5:8086ce14 r4:f400010c
>r3:000000a0
>[  276.110575] [<80008610>] (gic_handle_irq+0x0/0x64) from [<800133a0>]
>(__irq_usr+0x40/0x60)
>[  276.118871] Exception stack(0xbef43fb0 to 0xbef43ff8)
>[  276.123942] 3fa0:                                     fbe9d585 13e98d33
>6ed9eba1 21e67a57
>[  276.132173] 3fc0: 170dd9fc bc58d89e 5d1b7878 c19f2bf4 5590d9fa 2577bcc4
>7b2ac1ea 6cee44dd [  276.140398] 3fe0: cf486f09 7ea92a58 0000ba1f 0000ab72
>a0000030 ffffffff [  276.147041]  r7:c19f2bf4 r6:ffffffff r5:a0000030
>r4:0000ab72 [  276.152846] ---[ end trace 054500acb8edb763 ]---
>[  3] 15.0-16.0 sec  18.8 MBytes   157 Mbits/sec
>[  3] 16.0-17.0 sec  0.00 Bytes  0.00 bits/sec [  3] 17.0-18.0 sec  0.00 Bytes
>0.00 bits/sec [  3] 18.0-19.0 sec  0.00 Bytes  0.00 bits/sec [  3] 19.0-20.0
>sec  4.25 MBytes  35.7 Mbits/sec
>[  3] 20.0-21.0 sec  19.8 MBytes   166 Mbits/sec
>[  3] 21.0-22.0 sec  19.8 MBytes   166 Mbits/sec
>[  3] 22.0-23.0 sec  19.5 MBytes   164 Mbits/sec
>[  3] 23.0-24.0 sec  8.38 MBytes  70.3 Mbits/sec [  3] 24.0-25.0 sec  0.00
>Bytes  0.00 bits/sec [  3] 25.0-26.0 sec  7.88 MBytes  66.1 Mbits/sec
>[  3] 26.0-27.0 sec  20.1 MBytes   169 Mbits/sec
>[...]
>[  3] 71.0-72.0 sec  23.4 MBytes   196 Mbits/sec
>[  3] 72.0-73.0 sec  12.2 MBytes   103 Mbits/sec
>[  3] 73.0-74.0 sec  0.00 Bytes  0.00 bits/sec [  3] 74.0-75.0 sec  0.00 Bytes
>0.00 bits/sec [  3] 75.0-76.0 sec  10.9 MBytes  91.2 Mbits/sec
>[  3] 76.0-77.0 sec  22.4 MBytes   188 Mbits/sec
>[  3] 77.0-78.0 sec  23.0 MBytes   193 Mbits/sec
>

I will debug the issue when I am free, and then report the result to you.
Thanks for your reporting the issue.

Thanks,
Andy

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox