Netdev List
 help / color / mirror / Atom feed
* Re: Recent Linus' tree, kernel BUG at fs/inode.c:1436!
From: Al Viro @ 2014-12-19 12:01 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: Linux Netdev List, linux-fsdevel
In-Reply-To: <54940D28.8050901@parallels.com>

On Fri, Dec 19, 2014 at 02:34:00PM +0300, Pavel Emelyanov wrote:
> Hi,
> 
> It looks like there's a strange refcount underflow in VFS/socket code.
> The proggie [1] crashes the recent Linus' tree (d790be38 Merge tag
> 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux) 
> with the calltrace [2].
> 
> If in the proggie the psk is replaced with non-socket descriptor the
> issue doesn't appear.

Gyah... mismerge on cherry-pick.  My fault - ->i_fop assignment should've
been removed from sock_alloc_file() in bd9b51.  Could you verify that the
following recovers the things?

diff --git a/net/socket.c b/net/socket.c
index 70bbde6..a2c33a4 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -372,7 +372,6 @@ struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname)
 	path.mnt = mntget(sock_mnt);
 
 	d_instantiate(path.dentry, SOCK_INODE(sock));
-	SOCK_INODE(sock)->i_fop = &socket_file_ops;
 
 	file = alloc_file(&path, FMODE_READ | FMODE_WRITE,
 		  &socket_file_ops);

^ permalink raw reply related

* RE: [RFC iproute2] tc: Show classes in tree view
From: David Laight @ 2014-12-19 12:02 UTC (permalink / raw)
  To: 'Vadim Kochan', netdev@vger.kernel.org
In-Reply-To: <1418989381-6492-1-git-send-email-vadim4j@gmail.com>

From: Vadim Kochan
> Added new '-t[ree]' which shows classes dependency
> in the tree view.
...
> @@ -278,6 +279,8 @@ int main(int argc, char **argv)
>  			++show_raw;
>  		} else if (matches(argv[1], "-pretty") == 0) {
>  			++show_pretty;
> +		} else if (matches(argv[1], "-tree") == 0) {
> +			++show_tree;
>  		} else if (matches(argv[1], "-Version") == 0) {
>  			printf("tc utility, iproute2-ss%s\n", SNAPSHOT);
>  			return 0;

Clearly you've followed the old code, but it is much clearer
to assign 1 (or true) rather than doing an increment
(unless you expect '-tree -tree' to do something extra).

I believe the whole business of using 'foo++' to change a 0 to a 1
harks back to the pdp11 where it was a shorter instruction sequence.

	David

^ permalink raw reply

* Re: Recent Linus' tree, kernel BUG at fs/inode.c:1436!
From: Pavel Emelyanov @ 2014-12-19 12:08 UTC (permalink / raw)
  To: Al Viro; +Cc: Linux Netdev List, linux-fsdevel
In-Reply-To: <20141219120129.GX22149@ZenIV.linux.org.uk>

On 12/19/2014 03:01 PM, Al Viro wrote:
> On Fri, Dec 19, 2014 at 02:34:00PM +0300, Pavel Emelyanov wrote:
>> Hi,
>>
>> It looks like there's a strange refcount underflow in VFS/socket code.
>> The proggie [1] crashes the recent Linus' tree (d790be38 Merge tag
>> 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux) 
>> with the calltrace [2].
>>
>> If in the proggie the psk is replaced with non-socket descriptor the
>> issue doesn't appear.
> 
> Gyah... mismerge on cherry-pick.  My fault - ->i_fop assignment should've
> been removed from sock_alloc_file() in bd9b51.  Could you verify that the
> following recovers the things?
> 
> diff --git a/net/socket.c b/net/socket.c
> index 70bbde6..a2c33a4 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -372,7 +372,6 @@ struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname)
>  	path.mnt = mntget(sock_mnt);
>  
>  	d_instantiate(path.dentry, SOCK_INODE(sock));
> -	SOCK_INODE(sock)->i_fop = &socket_file_ops;
>  
>  	file = alloc_file(&path, FMODE_READ | FMODE_WRITE,
>  		  &socket_file_ops);
> .
> 

Acked-by: Pavel Emelyanov <xemul@parallels.com>

This also makes socket non-open-able back again, which, in turn, was
another issue I was surprised with on the new kernel :)

Thanks,
Pavel


^ permalink raw reply

* [PATCH net-next v3] net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined
From: Hubert Sokolowski @ 2014-12-19 12:14 UTC (permalink / raw)
  To: netdev; +Cc: ray.kinsella

Add checking whether the call to ndo_dflt_fdb_dump is needed.
It is not expected to call ndo_dflt_fdb_dump unconditionally
by some drivers (i.e. qlcnic or macvlan) that defines
own ndo_fdb_dump. Other drivers define own ndo_fdb_dump
and don't want ndo_dflt_fdb_dump to be called at all.
At the same time it is desirable to call the default dump
function on a bridge device.

Following tests for filtering have been performed before
the change and after the patch was applied to make sure
they are the same and it doesn't break the filtering algorithm.

[root@localhost ~]# cd /root/iproute2-3.17.0/bridge
[root@localhost bridge]# modprobe dummy
[root@localhost bridge]# ./bridge fdb add f1:f2:f3:f4:f5:f6 dev dummy0
[root@localhost bridge]# brctl addbr br0
[root@localhost bridge]# brctl addif  br0 dummy0
[root@localhost bridge]# ip link set dev br0 address 02:00:00:12:01:04
[root@localhost bridge]# # show all
[root@localhost bridge]# ./bridge fdb show
33:33:00:00:00:01 dev p2p1 self permanent
01:00:5e:00:00:01 dev p2p1 self permanent
33:33:ff:ac:ce:32 dev p2p1 self permanent
33:33:00:00:02:02 dev p2p1 self permanent
01:00:5e:00:00:fb dev p2p1 self permanent
33:33:00:00:00:01 dev p7p1 self permanent
01:00:5e:00:00:01 dev p7p1 self permanent
33:33:ff:79:50:53 dev p7p1 self permanent
33:33:00:00:02:02 dev p7p1 self permanent
01:00:5e:00:00:fb dev p7p1 self permanent
7e:0c:8b:b1:59:da dev dummy0 master br0 permanent
7e:0c:8b:b1:59:da dev dummy0 vlan 1 master br0 permanent
33:33:00:00:00:01 dev dummy0 self permanent
f1:f2:f3:f4:f5:f6 dev dummy0 self permanent
33:33:00:00:00:01 dev br0 self permanent
02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent
02:00:00:12:01:04 dev br0 master br0 permanent
[root@localhost bridge]# # filter by bridge
[root@localhost bridge]# ./bridge fdb show br br0
7e:0c:8b:b1:59:da dev dummy0 master br0 permanent
7e:0c:8b:b1:59:da dev dummy0 vlan 1 master br0 permanent
33:33:00:00:00:01 dev dummy0 self permanent
f1:f2:f3:f4:f5:f6 dev dummy0 self permanent
33:33:00:00:00:01 dev br0 self permanent
02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent
02:00:00:12:01:04 dev br0 master br0 permanent
[root@localhost bridge]# # filter by port
[root@localhost bridge]# ./bridge fdb show brport dummy0
7e:0c:8b:b1:59:da master br0 permanent
7e:0c:8b:b1:59:da vlan 1 master br0 permanent
33:33:00:00:00:01 self permanent
f1:f2:f3:f4:f5:f6 self permanent
[root@localhost bridge]# # filter by port + bridge
[root@localhost bridge]# ./bridge fdb show br br0 brport dummy0
7e:0c:8b:b1:59:da master br0 permanent
7e:0c:8b:b1:59:da vlan 1 master br0 permanent
33:33:00:00:00:01 self permanent
f1:f2:f3:f4:f5:f6 self permanent
[root@localhost bridge]#

Also following test was performed to proove it fixes the problem
with macvlan driver where dflt_fdb_dump was called twice showing
duplicate self entries:
[root@localhost bridge]# modprobe dummy
[root@localhost bridge]# ip li add link dummy0 mac0 type macvlan
[root@localhost bridge]# ./bridge fdb show dev mac0
33:33:00:00:00:01 self permanent
[root@localhost bridge]#

Signed-off-by: Hubert Sokolowski <hubert.sokolowski@intel.com>
---
 net/core/rtnetlink.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d06107d..d32518c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2693,7 +2693,10 @@ static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
 							 idx);
 		}
 
-		idx = ndo_dflt_fdb_dump(skb, cb, dev, NULL, idx);
+		if ((dev->priv_flags & IFF_EBRIDGE) ||
+		    !(dev->netdev_ops->ndo_fdb_dump))
+			idx = ndo_dflt_fdb_dump(skb, cb, dev, NULL, idx);
+
 		if (dev->netdev_ops->ndo_fdb_dump)
 			idx = dev->netdev_ops->ndo_fdb_dump(skb, cb, bdev, dev,
 							    idx);
-- 
1.9.3

^ permalink raw reply related

* Re: [PATCH 2/3] cmtp: cmtp_add_connection() should verify that it's dealing with l2cap socket
From: Marcel Holtmann @ 2014-12-19 12:48 UTC (permalink / raw)
  To: Al Viro; +Cc: David S. Miller, netdev, linux-bluetooth
In-Reply-To: <1418970059-32486-2-git-send-email-viro@ZenIV.linux.org.uk>

Hi Al,

> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> ... rather than relying on ciptool(8) never passing it anything else.  Give
> it e.g. an AF_UNIX connected socket (from socketpair(2)) and it'll oops,
> trying to evaluate &l2cap_pi(sock->sk)->chan->dst...
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> net/bluetooth/cmtp/core.c | 2 ++
> 1 file changed, 2 insertions(+)

patch has been applied to bluetooth tree.

Regards

Marcel

^ permalink raw reply

* Re: [PATCH 3/3] bnep: bnep_add_connection() should verify that it's dealing with l2cap socket
From: Marcel Holtmann @ 2014-12-19 12:48 UTC (permalink / raw)
  To: Al Viro; +Cc: David S. Miller, netdev, linux-bluetooth
In-Reply-To: <1418970059-32486-3-git-send-email-viro@ZenIV.linux.org.uk>

Hi Al,

> same story as cmtp
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> net/bluetooth/bnep/core.c | 3 +++
> 1 file changed, 3 insertions(+)

patch has been applied to bluetooth tree.

Regards

Marcel

^ permalink raw reply

* Re: [PATCH 1/3] bluetooth: hidp_connection_add() unsafe use of l2cap_pi()
From: Marcel Holtmann @ 2014-12-19 12:49 UTC (permalink / raw)
  To: Al Viro; +Cc: David S. Miller, netdev, linux-bluetooth
In-Reply-To: <1418970059-32486-1-git-send-email-viro@ZenIV.linux.org.uk>

Hi Al,

> it's OK after we'd verified the sockets, but not before that.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> net/bluetooth/hidp/core.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)

patch has been applied to bluetooth tree.

Regards

Marcel

^ permalink raw reply

* Re: [patches] a bunch of old bluetooth fixes
From: Marcel Holtmann @ 2014-12-19 12:57 UTC (permalink / raw)
  To: Al Viro, David S. Miller; +Cc: Network Development, BlueZ development
In-Reply-To: <B0609FEE-1CE5-4618-A0B6-B2B82B1EC74D-kz+m5ild9QBg9hUCZPvPmw@public.gmane.org>

Hi Dave,

>> 	This stuff has been sitting in my queue since March; basically,
>> several places in net/bluetooth assume that they are dealing with
>> l2cap sockets, while it is possible to get an arbitrary socket to those.
>> Results are not pretty.
>> 	* HIDPCONNADD gets an arbitrary user-supplied socket; the code
>> it calls (hidp_connection_add()) verifies that the socket is l2cap one,
>> but before doing so it finds l2cap_pi(ctrl_sock->sk)->chan.  It's not
>> that big a deal (it's only 5 words past the end of struct sock), but
>> it's trivial to avoid and, in theory, we might end up oopsing here if
>> we are very unlucky and it happens to hit an unmapped page just past
>> the actual object ctrl_sock->sk sits in.
>> 	* CMTP counterpart of that doesn't validate the socket at all.
>> It proceeds to
>>       s = __cmtp_get_session(&l2cap_pi(sock->sk)->chan->dst);
>> which can very easily oops - here ->chan is already garbage and we
>> proceed to dereference that.  As with HIDP, one needs CAP_NET_ADMIN to
>> trigger that, but it's really a clear bug.  The only sanity check we
>> do is verifying that nsock->sk->sk_state is equal to BT_CONNECTED,
>> which is not unique to bluetooth, to put it mildly.  It's just 1,
>> so a TCP_ESTABLISHED tcp socket will pass that check just fune.
>> The fix is trivial...
>> 	* BNEP situation is identical to CMTP one.
>> 
>> I've sent these patches back then (March 10), but they seem to have fallen
>> through the cracks.  The bugs are still there and the fixes still apply.
>> If you would prefer me to resend them after -rc1, just tell...
> 
> they must have really fallen through the cracks since I do not even remember them.
> 
> My take is that these should all go in before -rc1 and preferable also make it into stable. While you need CAP_NET_ADMIN capability, there are clear stupid bugs on our side.
> 
> Dave, we can prepare a pull request for these or do you want to take them directly into net tree?

never mind that. We have another bug in 6LoWPAN with wrongly freeing a skb. I took all 3 of Al's patches now and I will ask Johan to send you a pull request for the whole set for net tree inclusion.

Regards

Marcel

^ permalink raw reply

* pull request: bluetooth 2014-12-19
From: Johan Hedberg @ 2014-12-19 14:34 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-bluetooth

[-- Attachment #1: Type: text/plain, Size: 1349 bytes --]

Hi Dave,

Here's one more pull request for 3.19. It contains the socket type
verification fixes from Al Viro as well as an skb double-free fix for
6lowpan from Jukka Rissanen.

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit 00c845dbfe2e966a2efd3818e40f46e286ca1ae6:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2014-12-18 16:41:13 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth.git for-upstream

for you to fetch changes up to 71bb99a02b32b4cc4265118e85f6035ca72923f0:

  Bluetooth: bnep: bnep_add_connection() should verify that it's dealing with l2cap socket (2014-12-19 13:48:27 +0100)

----------------------------------------------------------------
Al Viro (3):
      Bluetooth: hidp_connection_add() unsafe use of l2cap_pi()
      Bluetooth: cmtp: cmtp_add_connection() should verify that it's dealing with l2cap socket
      Bluetooth: bnep: bnep_add_connection() should verify that it's dealing with l2cap socket

Jukka Rissanen (1):
      Bluetooth: 6lowpan: Do not free skb when packet is dropped

 net/bluetooth/6lowpan.c   | 1 -
 net/bluetooth/bnep/core.c | 3 +++
 net/bluetooth/cmtp/core.c | 3 +++
 net/bluetooth/hidp/core.c | 3 ++-
 4 files changed, 8 insertions(+), 2 deletions(-)


[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* [PATCH RESEND] stmmac: Don't init ptp again when resume from suspend/hibernation
From: Huacai Chen @ 2014-12-19 14:38 UTC (permalink / raw)
  To: Giuseppe Cavallaro
  Cc: Srinivas Kandagatla, David S. Miller, netdev, Huacai Chen

Both stmmac_open() and stmmac_resume() call stmmac_hw_setup(), and
stmmac_hw_setup() call stmmac_init_ptp() unconditionally. However, only
stmmac_release() calls stmmac_release_ptp(). Since stmmac_suspend()
doesn't call stmmac_release_ptp(), stmmac_resume() also needn't call
stmmac_init_ptp().

This patch also fix a "scheduling while atomic" problem when resume
from suspend/hibernation. Because stmmac_init_ptp() will trigger
scheduling while stmmac_resume() hold a spinlock.

Callgraph of "scheduling while atomic":
stmmac_resume() --> stmmac_hw_setup() --> stmmac_init_ptp() -->
stmmac_ptp_register() --> ptp_clock_register() --> device_create() -->
device_create_groups_vargs() --> device_add() --> devtmpfs_create_node()
--> wait_for_common() --> schedule_timeout() --> __schedule()

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 118a427..8c6b7c1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1671,7 +1671,7 @@ static void stmmac_init_tx_coalesce(struct stmmac_priv *priv)
  *  0 on success and an appropriate (-)ve integer as defined in errno.h
  *  file on failure.
  */
-static int stmmac_hw_setup(struct net_device *dev)
+static int stmmac_hw_setup(struct net_device *dev, bool init_ptp)
 {
 	struct stmmac_priv *priv = netdev_priv(dev);
 	int ret;
@@ -1708,9 +1708,11 @@ static int stmmac_hw_setup(struct net_device *dev)
 
 	stmmac_mmc_setup(priv);
 
-	ret = stmmac_init_ptp(priv);
-	if (ret && ret != -EOPNOTSUPP)
-		pr_warn("%s: failed PTP initialisation\n", __func__);
+	if (init_ptp) {
+		ret = stmmac_init_ptp(priv);
+		if (ret && ret != -EOPNOTSUPP)
+			pr_warn("%s: failed PTP initialisation\n", __func__);
+	}
 
 #ifdef CONFIG_DEBUG_FS
 	ret = stmmac_init_fs(dev);
@@ -1787,7 +1789,7 @@ static int stmmac_open(struct net_device *dev)
 		goto init_error;
 	}
 
-	ret = stmmac_hw_setup(dev);
+	ret = stmmac_hw_setup(dev, true);
 	if (ret < 0) {
 		pr_err("%s: Hw setup failed\n", __func__);
 		goto init_error;
@@ -3036,7 +3038,7 @@ int stmmac_resume(struct net_device *ndev)
 	netif_device_attach(ndev);
 
 	init_dma_desc_rings(ndev, GFP_ATOMIC);
-	stmmac_hw_setup(ndev);
+	stmmac_hw_setup(ndev, false);
 	stmmac_init_tx_coalesce(priv);
 
 	napi_enable(&priv->napi);
-- 
1.7.7.3

^ permalink raw reply related

* Re: [RFC PATCH net-next v2 1/1] net: Support for switch port configuration
From: Andy Gospodarek @ 2014-12-19 14:50 UTC (permalink / raw)
  To: B Viswanath
  Cc: Jiri Pirko, Roopa Prabhu, Samudrala, Sridhar, John Fastabend,
	Varlese, Marco, netdev@vger.kernel.org, Thomas Graf,
	sfeldma@gmail.com, linux-kernel@vger.kernel.org
In-Reply-To: <CAN+pFwJr7TvJEW4x_HYPgTEvYGfi+YHgp=5vek2YiU1GLiH6_g@mail.gmail.com>

On Fri, Dec 19, 2014 at 03:05:27PM +0530, B Viswanath wrote:
> On 19 December 2014 at 14:53, Jiri Pirko <jiri@resnulli.us> wrote:
> > Fri, Dec 19, 2014 at 10:01:46AM CET, marichika4@gmail.com wrote:
> >>On 19 December 2014 at 13:57, Jiri Pirko <jiri@resnulli.us> wrote:
> >>> Fri, Dec 19, 2014 at 06:14:57AM CET, marichika4@gmail.com wrote:
> >>>>On 19 December 2014 at 05:18, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
> >>>>> On 12/18/14, 3:26 PM, Samudrala, Sridhar wrote:
> >><snipped for ease of reading>
> >>>>>>
> >>>>>>
> >>>>>> We also need an interface to set per-switch attributes. Can this work?
> >>>>>>     bridge link set dev sw0 sw_attr bcast_flooding 1 master
> >>>>>> where sw0 is a bridge representing the hardware switch.
> >>>>>
> >>>>>
> >>>>> Not today. We discussed this @ LPC, and one way to do this would be to have
> >>>>> a device
> >>>>> representing the switch asic. This is in the works.
> >>>>
> >>>>
> >>>>Can I assume that on  platforms which house more than one asic (say
> >>>>two 24 port asics, interconnected via a 10G link or equivalent, to get
> >>>>a 48 port 'switch') , the 'rocker' driver (or similar) should expose
> >>>>them as a single set of ports, and not as two 'switch ports' ?
> >>>
> >>> Well that really depends on particular implementation and drivers. If you
> >>> have 2 pci-e devices, I think you should expose them as 2 entities. For
> >>> sure, you can have the driver to do the masking for you. I don't believe
> >>> that is correct though.
> >>>
> >>
> >>In a platform that houses two asic chips, IMO, the user is still
> >>expected to manage the router as a single entity. The configuration
> >>being applied on both asic devices need to be matching if not
> >>identical, and may not be conflicting. The FDB is to be synchronized
> >>so that (offloaded) switching can happen across the asics. Some of
> >>this stuff is asic specific anyway. Another example is that of the
> >>learning. The (hardware) learning can't be enabled on one asic, while
> >>being disabled on another one. The general use cases I have seen are
> >>all involving managing the 'router' as a single entity.  That the
> >>'router' is implemented with two asics instead of a single asic (with
> >>more ports) is to be treated as an implementation detail.  This is the
> >>usual router management method that exists today.
> >>
> >>I hope I make sense.
> >>
> >>So I am trying to figure out what this single entity that will be used
> >>from a user perspective. It can be a bridge, but our bridges are more
> >>802.1q bridges. We can use the 'self' mode, but then it means that it
> >>should reflect the entire port count, and not just an asic.
> >>
> >>So I was trying to deduce that in our switchdevice model, the best bet
> >>would be to leave the unification to the driver (i.e., to project the
> >>multiple physical asics as a single virtual switch device). Thist
> >
> > Is it possible to have the asic as just single one? Or is it possible to
> > connect asics being multiple chips maybe from multiple vendors together?
> 
> I didn't understand the first question. Some times, it is possible to
> have a single asic replace two, but its a cost factor, and others that
> are involved.
> 
> AFAIK, the answer to the second question is a No. Two asics from
> different vendors may not be connected together. The interconnect
> tends to be proprietary.
> 
> > I believe that answer is "yes" in both cases. Making two separate asics
> > to appear as one for user is not correct in my opinion. Driver should
> > not do such masking. It is unclean, unextendable.
> >
> 
> I am only looking for a single management entity. I am not thinking it
> needs to be at driver level. I am not sure of  any other option apart
> from creating a 'switchdev' that Roopa was mentioning.

This is certainly one of the possible use-cases of creating top-level
switching/routing/offload dev to control all of the ASICs, but not the
primary use of such a device at this point.

Earlier in the thread you asked how one might handle the multi-ASIC
configuration.  I agree with the opinion Jiri offered (that was the
concensus we reached when discussing this in person in Oct) that there
were not plans to provide full support for a multi-chip configuration
where the kernel tracks links between chips and keeps routing tables,
FDB tables, et al up to date automatically to everything works happily
without any userspace intervention.  This would be a nice addition at
some point.

At some point there might be a need to provide something like a
device-tree file or configuration on some platforms to provide a mapping
between ports on an ASIC/offload device as they are referenced in the
hardware, front-panel ports on a specific platform, and netdevs.

Something like this could also provide a way to allow the kernel to
configure the in-kernel FDB/neighbor/FIB code to write the appropriate
static entries to an internal interconnect port on a specific platform.
There are also some interesting use-cases for a feature like this in a
single-chip configuration as well, but an in-kernel solution to that
problem has not been explored at this point.

> >>allows any 'switch' level configurations to the bridge in 'self' mode.
> >>
> >>And  then we would need to consider stacking. Stacking differs from
> >>this multi-asic scenario since  there would be multiple CPU involved.

This would also be grouped into that same TODO category, but based on
the fact that these stacking protocols/frame formats appear to be
vendor-specific it would seem that this is an exercise best left to
userspace.

> >>
> >>Thanks
> >>Vissu
> >>
> >>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >>>>> the body of a message to majordomo@vger.kernel.org
> >>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] bonding: avoid re-entry of bond_release
From: Andy Gospodarek @ 2014-12-19 15:11 UTC (permalink / raw)
  To: Wengang Wang; +Cc: netdev
In-Reply-To: <1418979417-28867-1-git-send-email-wen.gang.wang@oracle.com>

On Fri, Dec 19, 2014 at 04:56:57PM +0800, Wengang Wang wrote:
> If bond_release is run against an interface which is already detached from
> it's master, then there is an error message shown like
> 	"<master name> cannot release <slave name>".
> 
> The call path is:
> 	bond_do_ioctl()
> 		bond_release()
> 			__bond_release_one()
> 
> Though it does not really harm, the message the message is misleading.
> This patch tries to avoid the message.
> 
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
>  drivers/net/bonding/bond_main.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 184c434..4a71bbd 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -3256,7 +3256,10 @@ static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd
>  		break;
>  	case BOND_RELEASE_OLD:
>  	case SIOCBONDRELEASE:
> -		res = bond_release(bond_dev, slave_dev);
> +		if (slave_dev->flags & IFF_SLAVE)
> +			res = bond_release(bond_dev, slave_dev);
> +		else
> +			res = 0;

Functionally this patch is fine, but I would prefer that you simply
change the check in __bond_release_one to not be so noisy.  There is a
check[1] in bond_enslave to see if a slave is already in a bond and that
just prints a message of netdev_dbg (rather than netdev_err) and it
seems that would be appropriate for this type of message.

[1] from bond_enslave():

        /* already enslaved */
        if (slave_dev->flags & IFF_SLAVE) {
                netdev_dbg(bond_dev, "Error: Device was already enslaved\n");
                return -EBUSY;
        }


>  		break;
>  	case BOND_SETHWADDR_OLD:
>  	case SIOCBONDSETHWADDR:
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next RESEND] net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined.
From: Hubert Sokolowski @ 2014-12-19 15:17 UTC (permalink / raw)
  To: Jamal Hadi Salim, vyasevic; +Cc: John Fastabend, Roopa Prabhu, netdev
In-Reply-To: <54935618.4070005@mojatatu.com>

On 18/12/14 22:32, Jamal Hadi Salim wrote:

> Sorry for the latency (head-buried-in-sand in effect)
> On 12/17/14 11:18, Hubert Sokolowski wrote:
>> I have just prepared a patch where I dump uc/mc for bridge devices
>> by looking at (dev->priv_flags & IFF_EBRIDGE), so I have same results
>> as without my changes. This should satisfy Jamal and Roopa.
>> I could send it as v3 of my patch along with the results if you are
>> interested.
> Please do. If you satisfy Vlad's goals then we are all happy.

Posted as v3, please review.
There is still open question I asked sometime ago but never got explained.
It is about the new filter_dev parameter that was added to ndo_fdb_dump:
        int                     (*ndo_fdb_dump)(struct sk_buff *skb,
                                                struct netlink_callback *cb,
                                                struct net_device *dev,
                                                struct net_device *filter_dev,
                                                int idx);

When we call this function for a device, dev pointer is passed as the filter_dev:
                if (dev->netdev_ops->ndo_fdb_dump)
                        idx = dev->netdev_ops->ndo_fdb_dump(skb, cb, bdev, dev,
                                                            idx);

This is not an issue for a bridge device and a device that is not enslaved
in a bridge because bdev == dev, but this can be dangerous in other cases.
Let's assume QLogic NIC has a master device, in this case bdev != dev.
Now look what is happening, dev is passed as filter_dev to:
static int qlcnic_fdb_dump(struct sk_buff *skb, struct netlink_callback *ncb,
                        struct net_device *netdev,
                        struct net_device *filter_dev, int idx)
{
        struct qlcnic_adapter *adapter = netdev_priv(netdev);
...

netdev_priv(netdev) returns a pointer to private struct of the bridge,but the driver
is expecting it's own private stuff.

Should we fix the driver and assume filter_dev is /me and dev is our master
or the parameters were reversed and should be passed as (skb, cb, dev, bdev, idx) ?
Is this something for another patch/discussion?

regards,
Hubert

-- 
Hubert Sokolowski          Intel Corporation

^ permalink raw reply

* Re: [RFC PATCH net-next v2 1/1] net: Support for switch port configuration
From: Roopa Prabhu @ 2014-12-19 16:22 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: B Viswanath, Samudrala, Sridhar, John Fastabend, Varlese, Marco,
	netdev@vger.kernel.org, Thomas Graf, sfeldma@gmail.com,
	linux-kernel@vger.kernel.org
In-Reply-To: <20141219095512.GH1848@nanopsycho.orion>

On 12/19/14, 1:55 AM, Jiri Pirko wrote:
> Fri, Dec 19, 2014 at 10:35:27AM CET, marichika4@gmail.com wrote:
>> On 19 December 2014 at 14:53, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Fri, Dec 19, 2014 at 10:01:46AM CET, marichika4@gmail.com wrote:
>>>> On 19 December 2014 at 13:57, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>> Fri, Dec 19, 2014 at 06:14:57AM CET, marichika4@gmail.com wrote:
>>>>>> On 19 December 2014 at 05:18, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
>>>>>>> On 12/18/14, 3:26 PM, Samudrala, Sridhar wrote:
>>>> <snipped for ease of reading>
>>>>>>>>
>>>>>>>> We also need an interface to set per-switch attributes. Can this work?
>>>>>>>>      bridge link set dev sw0 sw_attr bcast_flooding 1 master
>>>>>>>> where sw0 is a bridge representing the hardware switch.
>>>>>>>
>>>>>>> Not today. We discussed this @ LPC, and one way to do this would be to have
>>>>>>> a device
>>>>>>> representing the switch asic. This is in the works.
>>>>>>
>>>>>> Can I assume that on  platforms which house more than one asic (say
>>>>>> two 24 port asics, interconnected via a 10G link or equivalent, to get
>>>>>> a 48 port 'switch') , the 'rocker' driver (or similar) should expose
>>>>>> them as a single set of ports, and not as two 'switch ports' ?
>>>>> Well that really depends on particular implementation and drivers. If you
>>>>> have 2 pci-e devices, I think you should expose them as 2 entities. For
>>>>> sure, you can have the driver to do the masking for you. I don't believe
>>>>> that is correct though.
>>>>>
>>>> In a platform that houses two asic chips, IMO, the user is still
>>>> expected to manage the router as a single entity. The configuration
>>>> being applied on both asic devices need to be matching if not
>>>> identical, and may not be conflicting. The FDB is to be synchronized
>>>> so that (offloaded) switching can happen across the asics. Some of
>>>> this stuff is asic specific anyway. Another example is that of the
>>>> learning. The (hardware) learning can't be enabled on one asic, while
>>>> being disabled on another one. The general use cases I have seen are
>>>> all involving managing the 'router' as a single entity.  That the
>>>> 'router' is implemented with two asics instead of a single asic (with
>>>> more ports) is to be treated as an implementation detail.  This is the
>>>> usual router management method that exists today.
>>>>
>>>> I hope I make sense.
>>>>
>>>> So I am trying to figure out what this single entity that will be used
>>> >from a user perspective. It can be a bridge, but our bridges are more
>>>> 802.1q bridges. We can use the 'self' mode, but then it means that it
>>>> should reflect the entire port count, and not just an asic.
>>>>
>>>> So I was trying to deduce that in our switchdevice model, the best bet
>>>> would be to leave the unification to the driver (i.e., to project the
>>>> multiple physical asics as a single virtual switch device). Thist
>>> Is it possible to have the asic as just single one? Or is it possible to
>>> connect asics being multiple chips maybe from multiple vendors together?
>> I didn't understand the first question. Some times, it is possible to
> I ment that there is a design with just a single asic of this type,
> instead of a pair.
>
>> have a single asic replace two, but its a cost factor, and others that
>> are involved.
>>
>> AFAIK, the answer to the second question is a No. Two asics from
>> different vendors may not be connected together. The interconnect
>> tends to be proprietary.
> Okay. In that case, it might make sense to mask it on driver level.
>
>
>>> I believe that answer is "yes" in both cases. Making two separate asics
>>> to appear as one for user is not correct in my opinion. Driver should
>>> not do such masking. It is unclean, unextendable.
>>>
>> I am only looking for a single management entity. I am not thinking it
>> needs to be at driver level. I am not sure of  any other option apart
> >from creating a 'switchdev' that Roopa was mentioning.
>
> Well the thing is there is a common desire to make the offloading as
> transparent as possible. For example, have 4 ports of same switch and
> put them into br0. Just like that, without need to do anything else
> than you would do when bridging ordinary NICs. Introducing some
> "management entity" would break this approach.
>
I don't think having a switchdevice breaks this approach. A software 
bridge is not a 1-1 mapping with the asic in all cases.
When its a vlan filtering bridge, yes, it is (In which case all switch 
global l2 non-port specific attributes can be applied to the bridge).

The switch asic can do l2 and l3 too. For a bridge, the switch asic is 
just accelerating l2.
And a switch asic is also capable of l3, acls. A switch device (whether 
accessible to userspace or not)
may become necessary (as discussed in other threads) where you cannot 
resolve a kernel object to a switch port (Global acl rules, unresolved 
route nexthops etc).

^ permalink raw reply

* Re: [PATCH net-next RESEND] net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined.
From: Roopa Prabhu @ 2014-12-19 16:32 UTC (permalink / raw)
  To: Hubert Sokolowski; +Cc: Jamal Hadi Salim, vyasevic, John Fastabend, netdev
In-Reply-To: <5494418B.1000004@intel.com>

On 12/19/14, 7:17 AM, Hubert Sokolowski wrote:
> On 18/12/14 22:32, Jamal Hadi Salim wrote:
>
>> Sorry for the latency (head-buried-in-sand in effect)
>> On 12/17/14 11:18, Hubert Sokolowski wrote:
>>> I have just prepared a patch where I dump uc/mc for bridge devices
>>> by looking at (dev->priv_flags & IFF_EBRIDGE), so I have same results
>>> as without my changes. This should satisfy Jamal and Roopa.
>>> I could send it as v3 of my patch along with the results if you are
>>> interested.
>> Please do. If you satisfy Vlad's goals then we are all happy.
> Posted as v3, please review.
> There is still open question I asked sometime ago but never got explained.
> It is about the new filter_dev parameter that was added to ndo_fdb_dump:
>          int                     (*ndo_fdb_dump)(struct sk_buff *skb,
>                                                  struct netlink_callback *cb,
>                                                  struct net_device *dev,
>                                                  struct net_device *filter_dev,
>                                                  int idx);
>
> When we call this function for a device, dev pointer is passed as the filter_dev:
>                  if (dev->netdev_ops->ndo_fdb_dump)
>                          idx = dev->netdev_ops->ndo_fdb_dump(skb, cb, bdev, dev,
>                                                              idx);
seems like these calls should be fixed. bdev is really dev in this case. 
And filter_dev should be null.
>
> This is not an issue for a bridge device and a device that is not enslaved
> in a bridge because bdev == dev, but this can be dangerous in other cases.
> Let's assume QLogic NIC has a master device, in this case bdev != dev.
> Now look what is happening, dev is passed as filter_dev to:
> static int qlcnic_fdb_dump(struct sk_buff *skb, struct netlink_callback *ncb,
>                          struct net_device *netdev,
>                          struct net_device *filter_dev, int idx)
> {
>          struct qlcnic_adapter *adapter = netdev_priv(netdev);
> ...
>
> netdev_priv(netdev) returns a pointer to private struct of the bridge,but the driver
> is expecting it's own private stuff.
>
> Should we fix the driver and assume filter_dev is /me and dev is our master
> or the parameters were reversed and should be passed as (skb, cb, dev, bdev, idx) ?
> Is this something for another patch/discussion?
>
> regards,
> Hubert
>

^ permalink raw reply

* Re: [RFC iproute2] tc: Show classes in tree view
From: Cong Wang @ 2014-12-19 16:44 UTC (permalink / raw)
  To: Vadim Kochan; +Cc: netdev
In-Reply-To: <1418989381-6492-1-git-send-email-vadim4j@gmail.com>

On Fri, Dec 19, 2014 at 3:43 AM, Vadim Kochan <vadim4j@gmail.com> wrote:
> From: Vadim Kochan <vadim4j@gmail.com>
>
> Added new '-t[ree]' which shows classes dependency
> in the tree view.
>
> Signed-off-by: Vadim Kochan <vadim4j@gmail.com>

A sample output would convince people more easily.

^ permalink raw reply

* Re: [RFC iproute2] tc: Show classes in tree view
From: Vadim Kochan @ 2014-12-19 16:46 UTC (permalink / raw)
  To: Cong Wang; +Cc: Vadim Kochan, netdev
In-Reply-To: <CAHA+R7P0eZzEgx+Oq-X8eqs7NFFLon4HKa+UUgxDpdaM5e0X+g@mail.gmail.com>

On Fri, Dec 19, 2014 at 08:44:30AM -0800, Cong Wang wrote:
> On Fri, Dec 19, 2014 at 3:43 AM, Vadim Kochan <vadim4j@gmail.com> wrote:
> > From: Vadim Kochan <vadim4j@gmail.com>
> >
> > Added new '-t[ree]' which shows classes dependency
> > in the tree view.
> >
> > Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
> 
> A sample output would convince people more easily.

Yeah, good point:

# tc/tc -t class show dev tap0
+---1:2(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b 
|   +---1:40(htb) prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b 
|   +---1:50(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b 
|   +---1:60(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b 
|   
+---1:1(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b 
    +---1:10(htb) prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b 
    +---1:20(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b 
    +---1:30(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b 

# tc/tc -t -s class show dev tap0
+---1:2(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b 
|   |        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
|   |        rate 0bit 0pps backlog 0b 0p requeues 0 
|   |
|   +---1:40(htb) prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b 
|   |             Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
|   |             rate 0bit 0pps backlog 0b 0p requeues 0 
|   |    
|   +---1:50(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b 
|   |             Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
|   |             rate 0bit 0pps backlog 0b 0p requeues 0 
|   |    
|   +---1:60(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b 
|                 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
|                 rate 0bit 0pps backlog 0b 0p requeues 0 
|   
+---1:1(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b 
    |        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
    |        rate 0bit 0pps backlog 0b 0p requeues 0 
    |
    +---1:10(htb) prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b 
    |             Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
    |             rate 0bit 0pps backlog 0b 0p requeues 0 
    |    
    +---1:20(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b 
    |             Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
    |             rate 0bit 0pps backlog 0b 0p requeues 0 
    |    
    +---1:30(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b 
                  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
                  rate 0bit 0pps backlog 0b 0p requeues 0 

^ permalink raw reply

* Re: [patches] a bunch of old bluetooth fixes
From: David Miller @ 2014-12-19 16:59 UTC (permalink / raw)
  To: marcel-kz+m5ild9QBg9hUCZPvPmw
  Cc: viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <07BDA2A2-1560-4F78-A0B2-FC25E312CACE-kz+m5ild9QBg9hUCZPvPmw@public.gmane.org>

From: Marcel Holtmann <marcel-kz+m5ild9QBg9hUCZPvPmw@public.gmane.org>
Date: Fri, 19 Dec 2014 11:30:38 +0100

>> Dave, we can prepare a pull request for these or do you want to take them directly into net tree?
> 
> and in case you decide to take them directly.
> 
> Acked-by: Marcel Holtmann <marcel-kz+m5ild9QBg9hUCZPvPmw@public.gmane.org>

Feel free to create a pull request for me, that works best.

Thanks.

^ permalink raw reply

* Re: [bisected] tg3 broken in 3.18.0?
From: Bjorn Helgaas @ 2014-12-19 17:09 UTC (permalink / raw)
  To: Prashant Sreedharan
  Cc: Nils Holland, Marcelo Ricardo Leitner, Michael Chan, Rajat Jain,
	David Miller, netdev, linux-pci@vger.kernel.org, Rafael Wysocki
In-Reply-To: <1418955047.3433.27.camel@prashant>

On Thu, Dec 18, 2014 at 7:10 PM, Prashant Sreedharan
<prashant@broadcom.com> wrote:
> On Thu, 2014-12-18 at 21:26 +0100, Nils Holland wrote:
>> On Thu, Dec 18, 2014 at 11:28:09AM -0800, Prashant Sreedharan wrote:
>> > On Thu, 2014-12-18 at 12:15 -0700, Bjorn Helgaas wrote:
>> > >
>> > > Any updates from the hardware team?
>> > >
>> > > This is a pretty serious regression, but as far as I can tell, it is
>> > > not a PCI bug.  The device should respond to a config read of vendor
>> > > ID.  If the driver does something that make the read return CRS
>> > > status, I think the driver is responsible for doing whatever delay or
>> > > other fixup is required.
>> > >
>> > > I'm inclined to reassign this bug to the tg3 driver unless you think
>> > > the PCI core is doing something wrong here.
>> > >
>> > > Bjorn
>> >
>> > We were not able to reproduce this issue, could you please check what is
>> > the value of reg 0x70, before the pci_device_is_present call is made ?
>> > if bit 15 is set config access will be retried.
>> >
>> > --- a/drivers/net/ethernet/broadcom/tg3.c
>> > +++ b/drivers/net/ethernet/broadcom/tg3.c
>> > @@ -9025,6 +9025,7 @@ static int tg3_chip_reset(struct tg3 *tp)
>> >         void (*write_op)(struct tg3 *, u32, u32);
>> >         int i, err;
>> >
>> > +       printk(KERN_ERR "config state: %x\n", tr32(TG3PCI_PCISTATE));
>> >         if (!pci_device_is_present(tp->pdev))
>> >                 return -ENODEV;
>>
>> No problem, I gave this a try and here is what I get:
>>
>> [    2.185190] libphy: tg3 mdio bus: probed
>> [    2.229357] tsc: Refined TSC clocksource calibration: 2399.999 MHz
>> [    2.244993] config state: 1292
>> [    2.247136] tg3 0000:02:00.0 eth0: Tigon3 [partno(BCM57780) rev 57780001]
>>         (PCI Express) MAC address 00:19:99:ce:13:a6
>> [    2.249279] tg3 0000:02:00.0 eth0: attached PHY driver [Broadcom BCM57780]
>>         (mii_bus:phy_addr=200:01)
>> [    2.251460] tg3 0000:02:00.0 eth0: RXcsums[1] LinkChgREG[0]
>>         MIirq[0] ASF[0] TSOcap[1]
>> [    2.253672] tg3 0000:02:00.0 eth0: dma_rwctrl[76180000] dma_mask[64-bit]
>> [...]
>> [   12.204692] tg3 0000:02:00.0
>>         enp2s0: No firmware running
>> [   12.206653] config state: 1292
>> [   12.208655] config state: 1292
>>
>> That's all of the three times the new debugging line gets hit when I
>> boot my system using the supplied diagnostic patch.
>>
>> Hope that helps - of course, I'd gladly test any further
>> (diagnostic) patches if required! Also, if I can provide any
>> additional information that might be of value, just ask:-)
>>
> Nils/Marcelo thanks for inputs, since reg 0x70 bit 15 is clear it
> indicates the chip is not setting the config retry bit. We were hoping
> this bit is causing the config access to return CRS but looks like it is
> not.
>
> Btw after forcing the error path (tg3_init_one -> tg3_halt) in the
> driver now we are able to reproduce the problem on 5722 in house. We are
> working with the HW team to narrow this down.
>
> Also it is not clear to me how reverting commit cfa6a7877b17a667 fixes
> the problem.

The full commit is 89665a6a71408796565bfd29cfa6a7877b17a667, and git
works with any unique *prefix* of that.  The current convention is to
use the first 12 characters (I have "[core] abbrev = 12" in my
.git/config).  Unfortunately, suffixes don't work at all.

Anyway, here's why I think 89665a6a7140 makes a difference.  We're in this path:

  pci_device_is_present
    pci_bus_read_dev_vendor_id(..., crs_timeout = 0)
      pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, l)

and for some reason the chip returns 0x00010001 for that 32-bit read.
Before 89665a6a7140, we compared all 32 bits with "*l == 0xffff0001".
This is false, so pci_bus_read_dev_vendor_id() returns true, which
means pci_device_is_present() is also true.

After 89665a6a7140, we compare only the low 16 bits with ((*l &
0xffff) == 0x0001), which is true, so pci_bus_read_dev_vendor_id()
returns false, and pci_device_is_present() is false.

Bjorn

^ permalink raw reply

* Re: [bisected] tg3 broken in 3.18.0?
From: Marcelo Ricardo Leitner @ 2014-12-19 17:16 UTC (permalink / raw)
  To: Bjorn Helgaas, Prashant Sreedharan
  Cc: Nils Holland, Michael Chan, Rajat Jain, David Miller, netdev,
	linux-pci@vger.kernel.org, Rafael Wysocki
In-Reply-To: <CAErSpo4UdxwL=wA8cBJ7_DAdTb2xNeg6600qB0=Jdxv80aVmcg@mail.gmail.com>

On 19-12-2014 15:09, Bjorn Helgaas wrote:
> On Thu, Dec 18, 2014 at 7:10 PM, Prashant Sreedharan
> <prashant@broadcom.com> wrote:
>> On Thu, 2014-12-18 at 21:26 +0100, Nils Holland wrote:
>>> On Thu, Dec 18, 2014 at 11:28:09AM -0800, Prashant Sreedharan wrote:
>>>> On Thu, 2014-12-18 at 12:15 -0700, Bjorn Helgaas wrote:
>>>>>
>>>>> Any updates from the hardware team?
>>>>>
>>>>> This is a pretty serious regression, but as far as I can tell, it is
>>>>> not a PCI bug.  The device should respond to a config read of vendor
>>>>> ID.  If the driver does something that make the read return CRS
>>>>> status, I think the driver is responsible for doing whatever delay or
>>>>> other fixup is required.
>>>>>
>>>>> I'm inclined to reassign this bug to the tg3 driver unless you think
>>>>> the PCI core is doing something wrong here.
>>>>>
>>>>> Bjorn
>>>>
>>>> We were not able to reproduce this issue, could you please check what is
>>>> the value of reg 0x70, before the pci_device_is_present call is made ?
>>>> if bit 15 is set config access will be retried.
>>>>
>>>> --- a/drivers/net/ethernet/broadcom/tg3.c
>>>> +++ b/drivers/net/ethernet/broadcom/tg3.c
>>>> @@ -9025,6 +9025,7 @@ static int tg3_chip_reset(struct tg3 *tp)
>>>>          void (*write_op)(struct tg3 *, u32, u32);
>>>>          int i, err;
>>>>
>>>> +       printk(KERN_ERR "config state: %x\n", tr32(TG3PCI_PCISTATE));
>>>>          if (!pci_device_is_present(tp->pdev))
>>>>                  return -ENODEV;
>>>
>>> No problem, I gave this a try and here is what I get:
>>>
>>> [    2.185190] libphy: tg3 mdio bus: probed
>>> [    2.229357] tsc: Refined TSC clocksource calibration: 2399.999 MHz
>>> [    2.244993] config state: 1292
>>> [    2.247136] tg3 0000:02:00.0 eth0: Tigon3 [partno(BCM57780) rev 57780001]
>>>          (PCI Express) MAC address 00:19:99:ce:13:a6
>>> [    2.249279] tg3 0000:02:00.0 eth0: attached PHY driver [Broadcom BCM57780]
>>>          (mii_bus:phy_addr=200:01)
>>> [    2.251460] tg3 0000:02:00.0 eth0: RXcsums[1] LinkChgREG[0]
>>>          MIirq[0] ASF[0] TSOcap[1]
>>> [    2.253672] tg3 0000:02:00.0 eth0: dma_rwctrl[76180000] dma_mask[64-bit]
>>> [...]
>>> [   12.204692] tg3 0000:02:00.0
>>>          enp2s0: No firmware running
>>> [   12.206653] config state: 1292
>>> [   12.208655] config state: 1292
>>>
>>> That's all of the three times the new debugging line gets hit when I
>>> boot my system using the supplied diagnostic patch.
>>>
>>> Hope that helps - of course, I'd gladly test any further
>>> (diagnostic) patches if required! Also, if I can provide any
>>> additional information that might be of value, just ask:-)
>>>
>> Nils/Marcelo thanks for inputs, since reg 0x70 bit 15 is clear it
>> indicates the chip is not setting the config retry bit. We were hoping
>> this bit is causing the config access to return CRS but looks like it is
>> not.
>>
>> Btw after forcing the error path (tg3_init_one -> tg3_halt) in the
>> driver now we are able to reproduce the problem on 5722 in house. We are
>> working with the HW team to narrow this down.
>>
>> Also it is not clear to me how reverting commit cfa6a7877b17a667 fixes
>> the problem.
>
> The full commit is 89665a6a71408796565bfd29cfa6a7877b17a667, and git
> works with any unique *prefix* of that.  The current convention is to
> use the first 12 characters (I have "[core] abbrev = 12" in my
> .git/config).  Unfortunately, suffixes don't work at all.
>
> Anyway, here's why I think 89665a6a7140 makes a difference.  We're in this path:
>
>    pci_device_is_present
>      pci_bus_read_dev_vendor_id(..., crs_timeout = 0)
>        pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, l)
>
> and for some reason the chip returns 0x00010001 for that 32-bit read.

Actually it returns just 0x00000001, but yeah, that's my understanding too.

   Marcelo

> Before 89665a6a7140, we compared all 32 bits with "*l == 0xffff0001".
> This is false, so pci_bus_read_dev_vendor_id() returns true, which
> means pci_device_is_present() is also true.
>
> After 89665a6a7140, we compare only the low 16 bits with ((*l &
> 0xffff) == 0x0001), which is true, so pci_bus_read_dev_vendor_id()
> returns false, and pci_device_is_present() is false.
>
> Bjorn
>

^ permalink raw reply

* Re: [PATCH] net: eth: xgene: change APM X-Gene SoC platform ethernet to support ACPI
From: David Miller @ 2014-12-19 17:31 UTC (permalink / raw)
  To: fkan; +Cc: netdev, patches, linux-kernel
In-Reply-To: <1418945991-31494-1-git-send-email-fkan@apm.com>

From: Feng Kan <fkan@apm.com>
Date: Thu, 18 Dec 2014 15:39:51 -0800

> +#define NO_MAC_FOUND	0
> +#define RES_ENET_CSR	0
> +#define RES_RING_CSR	1
> +#define RES_RING_CMD	2

Don't define your own magic set of error return semantics.

For example, in the MAC address acquisition case, just verify that
you get an ETH_ALEN length value from the fetch routines and return
-ENODEV or similar if you do not.

^ permalink raw reply

* Re: [PATCH] net: eth: xgene: change APM X-Gene SoC platform ethernet to support ACPI
From: Feng Kan @ 2014-12-19 17:36 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, patches, linux-kernel@vger.kernel.org
In-Reply-To: <20141219.123107.1080751255468272638.davem@davemloft.net>

On Fri, Dec 19, 2014 at 9:31 AM, David Miller <davem@davemloft.net> wrote:
> From: Feng Kan <fkan@apm.com>
> Date: Thu, 18 Dec 2014 15:39:51 -0800
>
>> +#define NO_MAC_FOUND 0
>> +#define RES_ENET_CSR 0
>> +#define RES_RING_CSR 1
>> +#define RES_RING_CMD 2
Is there a issue with the RES defines, do you prefer enum types?

>
> Don't define your own magic set of error return semantics.
>
> For example, in the MAC address acquisition case, just verify that
> you get an ETH_ALEN length value from the fetch routines and return
> -ENODEV or similar if you do not.
Thanks, well change.

^ permalink raw reply

* Re: [PATCH net] r8152: drop the tx packet with invalid length
From: Eric Dumazet @ 2014-12-19 17:36 UTC (permalink / raw)
  To: Hayes Wang, Tom Herbert
  Cc: David Miller, netdev@vger.kernel.org, nic_swsd,
	linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org
In-Reply-To: <0835B3720019904CB8F7AA43166CEEB2ED585A@RTITMBSV03.realtek.com.tw>

On Fri, 2014-12-19 at 06:42 +0000, Hayes Wang wrote:

> Excuse me. I try to implement ndo_gso_check() with kernel 3.18.
> However, I still get packets with gso and their skb lengths are more
> than the acceptable one. Do I miss something?
> 
> +static bool rtl8152_gso_check(struct sk_buff *skb, struct net_device *dev)
> +{
> +	if ((skb->len + sizeof(struct tx_desc)) > agg_buf_sz)
> +		return false;
> +
> +	return true;
> +}
>  
> @@ -5861,6 +5876,9 @@ static const struct net_device_ops rtl8152_netdev_ops = {
>  	.ndo_set_mac_address	= rtl8152_set_mac_address,
>  	.ndo_change_mtu		= rtl8152_change_mtu,
>  	.ndo_validate_addr	= eth_validate_addr,
> +	.ndo_gso_check		= rtl8152_gso_check,
>  };

You are right, it seems ndo_gso_check() is buggy at this moment.

Presumably this method should alter %features so that we really segment
the packets in software.

features &= ~NETIF_F_GSO_MASK;

^ permalink raw reply

* Re: [PATCH] sunvnet: fix a memory leak in vnet_handle_offloads
From: David L Stevens @ 2014-12-19 17:39 UTC (permalink / raw)
  To: roy.qing.li, netdev
In-Reply-To: <1418966375-23188-1-git-send-email-roy.qing.li@gmail.com>


Acked-by: David L Stevens <david.stevens@oracle.com>

On 12/19/2014 12:19 AM, roy.qing.li@gmail.com wrote:
> From: Li RongQing <roy.qing.li@gmail.com>
> 
> when skb_gso_segment returns error, the original skb should be freed
> 
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
> ---
>  drivers/net/ethernet/sun/sunvnet.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
> index 45c408e..d2835bf 100644
> --- a/drivers/net/ethernet/sun/sunvnet.c
> +++ b/drivers/net/ethernet/sun/sunvnet.c
> @@ -1201,6 +1201,7 @@ static int vnet_handle_offloads(struct vnet_port *port, struct sk_buff *skb)
>  		segs = skb_gso_segment(skb, dev->features & ~NETIF_F_TSO);
>  	if (IS_ERR(segs)) {
>  		dev->stats.tx_dropped++;
> +		dev_kfree_skb_any(skb);
>  		return NETDEV_TX_OK;
>  	}
>  
> 

^ permalink raw reply

* Re: [PATCH] net: eth: xgene: change APM X-Gene SoC platform ethernet to support ACPI
From: David Miller @ 2014-12-19 17:40 UTC (permalink / raw)
  To: fkan; +Cc: netdev, patches, linux-kernel
In-Reply-To: <CAL85gmCOLrcTijpEhXx1x5UjHu0sgS5DdGYJxY4mwPFKCHpU-g@mail.gmail.com>

From: Feng Kan <fkan@apm.com>
Date: Fri, 19 Dec 2014 09:36:13 -0800

> On Fri, Dec 19, 2014 at 9:31 AM, David Miller <davem@davemloft.net> wrote:
>> From: Feng Kan <fkan@apm.com>
>> Date: Thu, 18 Dec 2014 15:39:51 -0800
>>
>>> +#define NO_MAC_FOUND 0
>>> +#define RES_ENET_CSR 0
>>> +#define RES_RING_CSR 1
>>> +#define RES_RING_CMD 2
> Is there a issue with the RES defines, do you prefer enum types?

I don't have any problem with those.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox