* Re: Multicast packets being lost (3.10 stable)
From: David Miller @ 2014-12-13 20:37 UTC (permalink / raw)
To: linus.luessing; +Cc: openwrt-devel, netdev, bridge, gregkh, shemming
In-Reply-To: <20141210191633.GA2473@odroid>
From: Linus Lüssing <linus.luessing@c0d3.blue>
Date: Wed, 10 Dec 2014 20:16:33 +0100
> did you have a chance to look into backporting these fixes for
> stable yet?
I am not submitting -stable fixes back to 3.10 any longer, at most
I am doing 4 -stable releases and right now that is 3.18, 3.17,
v3.14, and v3.12
^ permalink raw reply
* Re: [bisected] tg3 broken in 3.18.0?
From: Nils Holland @ 2014-12-13 21:02 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-pci, rajatxjain
rajatxjain@gmail.com
Bcc:
Subject: Re: [bisected] tg3 broken in 3.18.0?
Reply-To:
In-Reply-To: <20141212.201831.186234837340644301.davem@davemloft.net>
On Fri, Dec 12, 2014 at 08:18:31PM -0500, David Miller wrote:
> From: Nils Holland <nholland@tisys.org>
> Date: Sat, 13 Dec 2014 02:14:08 +0100
>
> >
> > My bisect exercise suggests that the following commit is the culprit:
> >
> > 89665a6a71408796565bfd29cfa6a7877b17a667 (PCI: Check only the Vendor
> > ID to identify Configuration Request Retry)
>
> You definitely need to bring this up with the author of that change
> and the relevent list for the PCI subsystem and/or linux-kernel.
I've now already sent an inquiry to Rajat Jain, the author of the
patch in question, and this message here is now also CC'd to
linux-pci@.
With this message, I'd like to add one last result of investigation
I've done today, in the hope that it will aid the folks with more
knowledge to go after the issue.
Basically, I've added a little debug output to tg3.c in the function
tg3_poll_fw(), as that function contained the code that would print
out the "No firmware running" line that was visible in dmesg on those
kernels where tg3 would not work for me. So, I basically had this:
static int tg3_poll_fw(struct tg3 *tp)
{
int i;
u32 val;
netdev_info(tp->dev, "XX: Boom!\n");
[...]
}
Now, I was looking through dmesg searching for occurances of this
debug output, using a standard 3.18.0 kernel (where my tg3 doesn't
work) as well as using a 3.18.0 kernel with
89665a6a71408796565bfd29cfa6a7877b17a667 reverted (where my tg3
works). Here's the results:
[standard 3.18.0 (=problematic)]:
[ 2.197653] libphy: tg3 mdio bus: probed
[ 2.257488] tg3 0000:02:00.0 eth0:
Tigon3 [partno(BCM57780) rev 57780001] (PCI Express) MAC address
00:19:99:ce:13:a6
[ 2.259589] tg3 0000:02:00.0 eth0:
attached PHY driver [Broadcom BCM57780] (mii_bus:phy_addr=200:01)
[ 2.261740] tg3 0000:02:00.0 eth0:
RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[ 2.263912] tg3 0000:02:00.0 eth0:
dma_rwctrl[76180000] dma_mask[64-bit]
[...]
[ 10.028002] tg3 0000:02:00.0: irq 25 for MSI/MSI-X
[ 10.028247] tg3 0000:02:00.0 enp2s0: XX: Boom!
[ 12.157034] tg3 0000:02:00.0 enp2s0: No firmware running
[3.18.0 without above mentioned patch, 3.17.3 is the same, both result
in a working tg3]:
[ 1.397167] libphy: tg3 mdio bus: probed
[ 1.456473] tg3 0000:02:00.0
(unnamed net_device) (uninitialized): XX: Boom!
[ 1.464987] tg3 0000:02:00.0 eth0:
Tigon3 [partno(BCM57780) rev 57780001] (PCI Express) MAC address
00:19:99:ce:13:a6
[ 1.467118] tg3 0000:02:00.0 eth0:
attached PHY driver [Broadcom BCM57780] (mii_bus:phy_addr=200:01)
[ 1.469311] tg3 0000:02:00.0 eth0:
RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[ 1.471500] tg3 0000:02:00.0 eth0:
dma_rwctrl[76180000] dma_mask[64-bit]
[...]
[ 9.631629] tg3 0000:02:00.0: irq 25 for MSI/MSI-X
[ 9.631962] tg3 0000:02:00.0 enp2s0: XX: Boom!
[ 9.634339] tg3 0000:02:00.0 enp2s0: XX: Boom!
[ 9.642741] IPv6:
ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[ 10.479636] tg3 0000:02:00.0
enp2s0: Link is down
[ 11.484498] tg3 0000:02:00.0
enp2s0: Link is up at 100 Mbps, full duplex
As can be seen, there are two tg3-related sections in my dmesg in both
the working and non-working scenarios: At about 1 - 2 secs, the card
seems to begin initializing, and at about 9 - 10 seconds it is (or
should be) ready to establish a network connection.
My debug section, or tg3.c's tg3_poll_fw(), seems to be called thrice
in the working situation: The first hit occurs at 1.456473 where the tg3
device is still reported as "(unnamed net_device) (uninitialized)".
Then, the section gets hit twice again at around 9.63 - at this point
the driver already reports the card as initialized / by its real name.
In the non-working situation, the debug sections seems to be hit only
once, at 10.028247. At this point, the tg3 is already reported as
initialized - just like when it's hit the second and third time in the
working situation.
Bottom line is that commit 89665a6a71408796565bfd29cfa6a7877b17a667
really makes a difference regarding the way the tg3 card is
initialized, which seems to cause the problem.
Greetings,
Nils
^ permalink raw reply
* charity message
From: luv2charitys @ 2014-12-14 2:58 UTC (permalink / raw)
To: netdev
Hello,this is Mr Paul N,i sent you an email on charity work but i am yet
to hear fom you,do reply with this code CHA-2015 to my email address
paulcharity@qq.com i Look forward to hearing from you this time,God
bless Brother Paul
^ permalink raw reply
* Re: [WTF?] random test in netlink_sendmsg()
From: David Miller @ 2014-12-14 4:38 UTC (permalink / raw)
To: viro; +Cc: kaber, netdev, dmitry.tarnyagin
In-Reply-To: <20141213045133.GI22149@ZenIV.linux.org.uk>
From: Al Viro <viro@ZenIV.linux.org.uk>
Date: Sat, 13 Dec 2014 04:51:33 +0000
> On Sat, Dec 13, 2014 at 03:25:00AM +0000, Al Viro wrote:
>> msg->msg_iter.type == KVEC_ITER &&
>
> ITER_IOVEC, that is. And that way it even works... Are you OK with the
> commit below?
No objection, looks perfectly fine.
^ permalink raw reply
* Re: [PATCH net] net/mlx4_en: correct the endianness of doorbell_qpn on big endian platform
From: David Miller @ 2014-12-14 4:43 UTC (permalink / raw)
To: weiyang; +Cc: David.Laight, eric.dumazet, netdev, gideonn, edumazet, amirv
In-Reply-To: <20141213031338.GA12208@richard>
From: Wei Yang <weiyang@linux.vnet.ibm.com>
Date: Sat, 13 Dec 2014 11:13:38 +0800
> On Mon, Dec 08, 2014 at 10:42:37PM +0800, Wei Yang wrote:
> If you prefer this way, I would like to send a new version for this.
> Is it ok for you?
I'm not so sure. There are implications when using the __raw_*()
routines.
In particular, using __raw_{read,write}l() also means that the usual
necessary I/O memory barriers are not being performed.
There are therefore no ordering guarantees between __raw_*() and other
I/O or memory accesses whatsoever.
^ permalink raw reply
* [PATCH] cirrus: cs89x0: fix time comparison
From: Asaf Vertz @ 2014-12-14 8:34 UTC (permalink / raw)
To: davem; +Cc: ebiederm, julia.lawall, himangi774, asaf.vertz, netdev,
linux-kernel
To be future-proof and for better readability the time comparisons are
modified to use time_before, time_after, and time_after_eq instead of
plain, error-prone math.
Signed-off-by: Asaf Vertz <asaf.vertz@tandemg.com>
---
drivers/net/ethernet/cirrus/cs89x0.c | 27 ++++++++++++++-------------
1 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/cirrus/cs89x0.c b/drivers/net/ethernet/cirrus/cs89x0.c
index 9823a0e..965331f 100644
--- a/drivers/net/ethernet/cirrus/cs89x0.c
+++ b/drivers/net/ethernet/cirrus/cs89x0.c
@@ -60,6 +60,7 @@
#include <linux/interrupt.h>
#include <linux/ioport.h>
#include <linux/in.h>
+#include <linux/jiffies.h>
#include <linux/skbuff.h>
#include <linux/spinlock.h>
#include <linux/string.h>
@@ -238,13 +239,13 @@ writereg(struct net_device *dev, u16 regno, u16 value)
static int __init
wait_eeprom_ready(struct net_device *dev)
{
- int timeout = jiffies;
+ unsigned long timeout = jiffies;
/* check to see if the EEPROM is ready,
* a timeout is used just in case EEPROM is ready when
* SI_BUSY in the PP_SelfST is clear
*/
while (readreg(dev, PP_SelfST) & SI_BUSY)
- if (jiffies - timeout >= 40)
+ if (time_after_eq(jiffies, timeout + 40))
return -1;
return 0;
}
@@ -485,7 +486,7 @@ control_dc_dc(struct net_device *dev, int on_not_off)
{
struct net_local *lp = netdev_priv(dev);
unsigned int selfcontrol;
- int timenow = jiffies;
+ unsigned long timenow = jiffies;
/* control the DC to DC convertor in the SelfControl register.
* Note: This is hooked up to a general purpose pin, might not
* always be a DC to DC convertor.
@@ -499,7 +500,7 @@ control_dc_dc(struct net_device *dev, int on_not_off)
writereg(dev, PP_SelfCTL, selfcontrol);
/* Wait for the DC/DC converter to power up - 500ms */
- while (jiffies - timenow < HZ)
+ while (time_before(jiffies, timenow + HZ))
;
}
@@ -514,7 +515,7 @@ send_test_pkt(struct net_device *dev)
0, 0, /* DSAP=0 & SSAP=0 fields */
0xf3, 0 /* Control (Test Req + P bit set) */
};
- long timenow = jiffies;
+ unsigned long timenow = jiffies;
writereg(dev, PP_LineCTL, readreg(dev, PP_LineCTL) | SERIAL_TX_ON);
@@ -525,10 +526,10 @@ send_test_pkt(struct net_device *dev)
iowrite16(ETH_ZLEN, lp->virt_addr + TX_LEN_PORT);
/* Test to see if the chip has allocated memory for the packet */
- while (jiffies - timenow < 5)
+ while (time_before(jiffies, timenow + 5))
if (readreg(dev, PP_BusST) & READY_FOR_TX_NOW)
break;
- if (jiffies - timenow >= 5)
+ if (time_after_eq(jiffies, timenow + 5))
return 0; /* this shouldn't happen */
/* Write the contents of the packet */
@@ -536,7 +537,7 @@ send_test_pkt(struct net_device *dev)
cs89_dbg(1, debug, "Sending test packet ");
/* wait a couple of jiffies for packet to be received */
- for (timenow = jiffies; jiffies - timenow < 3;)
+ for (timenow = jiffies; time_before(jiffies, timenow + 3);)
;
if ((readreg(dev, PP_TxEvent) & TX_SEND_OK_BITS) == TX_OK) {
cs89_dbg(1, cont, "succeeded\n");
@@ -556,7 +557,7 @@ static int
detect_tp(struct net_device *dev)
{
struct net_local *lp = netdev_priv(dev);
- int timenow = jiffies;
+ unsigned long timenow = jiffies;
int fdx;
cs89_dbg(1, debug, "%s: Attempting TP\n", dev->name);
@@ -574,7 +575,7 @@ detect_tp(struct net_device *dev)
/* Delay for the hardware to work out if the TP cable is present
* - 150ms
*/
- for (timenow = jiffies; jiffies - timenow < 15;)
+ for (timenow = jiffies; time_before(jiffies, timenow + 15);)
;
if ((readreg(dev, PP_LineST) & LINK_OK) == 0)
return DETECTED_NONE;
@@ -618,7 +619,7 @@ detect_tp(struct net_device *dev)
if ((lp->auto_neg_cnf & AUTO_NEG_BITS) == AUTO_NEG_ENABLE) {
pr_info("%s: negotiating duplex...\n", dev->name);
while (readreg(dev, PP_AutoNegST) & AUTO_NEG_BUSY) {
- if (jiffies - timenow > 4000) {
+ if (time_after(jiffies, timenow + 4000)) {
pr_err("**** Full / half duplex auto-negotiation timed out ****\n");
break;
}
@@ -1271,7 +1272,7 @@ static void __init reset_chip(struct net_device *dev)
{
#if !defined(CONFIG_MACH_MX31ADS)
struct net_local *lp = netdev_priv(dev);
- int reset_start_time;
+ unsigned long reset_start_time;
writereg(dev, PP_SelfCTL, readreg(dev, PP_SelfCTL) | POWER_ON_RESET);
@@ -1294,7 +1295,7 @@ static void __init reset_chip(struct net_device *dev)
/* Wait until the chip is reset */
reset_start_time = jiffies;
while ((readreg(dev, PP_SelfST) & INIT_DONE) == 0 &&
- jiffies - reset_start_time < 2)
+ time_before(jiffies, reset_start_time + 2))
;
#endif /* !CONFIG_MACH_MX31ADS */
}
--
1.7.0.4
^ permalink raw reply related
* Re: [PATCH iproute2 1/4] lib: Add netns_switch func for change network namespace
From: Jiri Pirko @ 2014-12-14 9:35 UTC (permalink / raw)
To: Vadim Kochan; +Cc: netdev
In-Reply-To: <1418493334-23142-2-git-send-email-vadim4j@gmail.com>
Sat, Dec 13, 2014 at 06:55:31PM CET, vadim4j@gmail.com wrote:
>From: Vadim Kochan <vadim4j@gmail.com>
>
>New netns_switch func moved to the lib/namespace.c from ip/ipnetns.c
>so it can be used from the other tools for fast switching
>network namespace.
>
>Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
^ permalink raw reply
* Re: [PATCH iproute2 2/4] ip: Allow to easy change network namespace
From: Jiri Pirko @ 2014-12-14 9:36 UTC (permalink / raw)
To: Vadim Kochan; +Cc: netdev
In-Reply-To: <1418493334-23142-3-git-send-email-vadim4j@gmail.com>
Sat, Dec 13, 2014 at 06:55:32PM CET, vadim4j@gmail.com wrote:
>From: Vadim Kochan <vadim4j@gmail.com>
>
>Added new '-netns' option to simplify executing following cmd:
>
> ip netns exec NETNS ip OPTIONS COMMAND OBJECT
>
> to
>
> ip -n[etns] NETNS OPTIONS COMMAND OBJECT
>
>e.g.:
>
> ip -net vnet0 link add br0 type bridge
> ip -n vnet0 link
>
>Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
^ permalink raw reply
* Re: [PATCH iproute2 3/4] bridge: Allow to easy change network namespace
From: Jiri Pirko @ 2014-12-14 9:36 UTC (permalink / raw)
To: Vadim Kochan; +Cc: netdev
In-Reply-To: <1418493334-23142-4-git-send-email-vadim4j@gmail.com>
Sat, Dec 13, 2014 at 06:55:33PM CET, vadim4j@gmail.com wrote:
>From: Vadim Kochan <vadim4j@gmail.com>
>
>Added new '-netns' option to simplify executing following cmd:
>
> ip netns exec NETNS bridge OPTIONS COMMAND OBJECT
>
> to
>
> bridge -n[etns] NETNS OPTIONS COMMAND OBJECT
>
>e.g.:
>
> bridge -net vnet0 fdb
>
>Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
^ permalink raw reply
* Re: [PATCH iproute2 4/4] tc: Allow to easy change network namespace
From: Jiri Pirko @ 2014-12-14 9:36 UTC (permalink / raw)
To: Vadim Kochan; +Cc: netdev
In-Reply-To: <1418493334-23142-5-git-send-email-vadim4j@gmail.com>
Sat, Dec 13, 2014 at 06:55:34PM CET, vadim4j@gmail.com wrote:
>From: Vadim Kochan <vadim4j@gmail.com>
>
>Added new '-netns' option to simplify executing following cmd:
>
> ip netns exec NETNS tc OPTIONS COMMAND OBJECT
>
> to
>
> tc -n[etns] NETNS OPTIONS COMMAND OBJECT
>
>e.g.:
>
> tc -net vnet0 qdisc
>
>Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
^ permalink raw reply
* [PATCH iproute2 REGRESSION] ss: Dont show netlink and packet sockets by default
From: Vadim Kochan @ 2014-12-14 9:36 UTC (permalink / raw)
To: netdev; +Cc: Vadim Kochan
From: Vadim Kochan <vadim4j@gmail.com>
Checking by SS_CLOSE state was remowed in:
(45a4770bc0) ss: Remove checking SS_CLOSE state for packet and netlink
which is not really correct because now by default all sockets are seen
when do 'ss'.
Here is most correct fix which considers specified family.
To see netlink sockets:
ss -A netlink
To see packet sockets:
ss -A packet
And ss by default will show only connected/established sockets as it
was before all the time.
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
misc/ss.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/misc/ss.c b/misc/ss.c
index e9927a5..6050ab6 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -2801,6 +2801,9 @@ static int packet_show(struct filter *f)
int ino;
unsigned long long sk;
+ if (preferred_family != AF_PACKET && !(f->states & (1<<SS_CLOSE)))
+ return 0;
+
if (packet_show_netlink(f, NULL) == 0)
return 0;
@@ -3028,6 +3031,9 @@ static int netlink_show(struct filter *f)
int rq, wq, rc;
unsigned long long sk, cb;
+ if (preferred_family != AF_NETLINK && !(f->states & (1<<SS_CLOSE)))
+ return 0;
+
if (!getenv("PROC_NET_NETLINK") && !getenv("PROC_ROOT") &&
netlink_show_netlink(f, NULL) == 0)
return 0;
--
2.1.3
^ permalink raw reply related
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
From: Roopa Prabhu @ 2014-12-14 14:13 UTC (permalink / raw)
To: Jiri Pirko
Cc: sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen, linville,
vyasevic, netdev, davem, shm, gospo
In-Reply-To: <20141211222559.GB1880@nanopsycho.orion>
On 12/11/14, 2:25 PM, Jiri Pirko wrote:
> Thu, Dec 11, 2014 at 07:27:32PM CET, roopa@cumulusnetworks.com wrote:
>> On 12/11/14, 10:07 AM, Jiri Pirko wrote:
>>> Thu, Dec 11, 2014 at 06:59:15PM CET, roopa@cumulusnetworks.com wrote:
>>>> On 12/11/14, 9:11 AM, Jiri Pirko wrote:
>>>>> Thu, Dec 11, 2014 at 05:52:10PM CET, roopa@cumulusnetworks.com wrote:
>>>>>> On 12/10/14, 1:37 AM, Jiri Pirko wrote:
>>>>>>> Wed, Dec 10, 2014 at 10:05:18AM CET, roopa@cumulusnetworks.com wrote:
>>>>>>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>>>
>>>>>>>> This patch adds two new api's netdev_switch_port_bridge_setlink
>>>>>>>> and netdev_switch_port_bridge_dellink to offload bridge port attributes
>>>>>>>> to switch asic
>>>>>>>>
>>>>>>>> (The names of the apis look odd with 'switch_port_bridge',
>>>>>>>> but am more inclined to change the prefix of the api to something else.
>>>>>>>> Will take any suggestions).
>>>>>>>>
>>>>>>>> The api's look at the NETIF_F_HW_NETFUNC_OFFLOAD feature flag to
>>>>>>>> pass bridge port attributes to the port device.
>>>>>>>>
>>>>>>>> If the device has the NETIF_F_HW_NETFUNC_OFFLOAD, but does not support
>>>>>>>> the bridge port attribute offload ndo, call bridge port attribute ndo's on
>>>>>>>> the lowerdevs if supported. This is one way to pass bridge port attributes
>>>>>>>> through stacked netdevs (example when bridge port is a bond and bond slaves
>>>>>>>> are switch ports).
>>>>>>>>
>>>>>>>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>>> ---
>>>>>>>> include/net/switchdev.h | 5 +++-
>>>>>>>> net/switchdev/switchdev.c | 70 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>> 2 files changed, 74 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>>>>>> index 8a6d164..22676b6 100644
>>>>>>>> --- a/include/net/switchdev.h
>>>>>>>> +++ b/include/net/switchdev.h
>>>>>>>> @@ -17,7 +17,10 @@
>>>>>>>> int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>>>> struct netdev_phys_item_id *psid);
>>>>>>>> int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>>>>>>>> -
>>>>>>>> +int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>>>> + struct nlmsghdr *nlh, u16 flags);
>>>>>>>> +int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>>>> + struct nlmsghdr *nlh, u16 flags);
>>>>>>>> #else
>>>>>>>>
>>>>>>>> static inline int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>>>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>>>>>>> index d162b21..62317e1 100644
>>>>>>>> --- a/net/switchdev/switchdev.c
>>>>>>>> +++ b/net/switchdev/switchdev.c
>>>>>>>> @@ -50,3 +50,73 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>>>>>>>> return ops->ndo_switch_port_stp_update(dev, state);
>>>>>>>> }
>>>>>>>> EXPORT_SYMBOL(netdev_switch_port_stp_update);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * netdev_switch_port_bridge_setlink - Notify switch device port of bridge
>>>>>>>> + * port attributes
>>>>>>>> + *
>>>>>>>> + * @dev: port device
>>>>>>>> + * @nlh: netlink msg with bridge port attributes
>>>>>>>> + *
>>>>>>>> + * Notify switch device port of bridge port attributes
>>>>>>>> + */
>>>>>>>> +int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>>>> + struct nlmsghdr *nlh, u16 flags)
>>>>>>>> +{
>>>>>>>> + const struct net_device_ops *ops = dev->netdev_ops;
>>>>>>>> + struct net_device *lower_dev;
>>>>>>>> + struct list_head *iter;
>>>>>>>> + int ret = 0, err = 0;
>>>>>>>> +
>>>>>>>> + if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>>>> + return err;
>>>>>>>> +
>>>>>>>> + if (ops->ndo_bridge_setlink) {
>>>>>>>> + WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>>>> + return ops->ndo_bridge_setlink(dev, nlh, flags);
>>>>>>> You have to change ndo_bridge_setlink in netdevice.h first.
>>>>>>> Otherwise when only this patch is applied (during bisection)
>>>>>>> this won't compile.
>>>>>> ack, will fix it and keep that in mind next time.
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>>> I do not understand why to iterate over lower devices. At this
>>>>>>> stage we don't know a thing about this upper or its lowers. Let
>>>>>>> the uppers (/masters) to decide if this needs to be propagated
>>>>>>> or not.
>>>>>> Jiri, In the stacked devices case, there is no way to propagate the bridge
>>>>>> port attributes to switch device driver today (vlan and other bridge port
>>>>>> attributes). Can you tell me if there is a way ?. no, ndo_vlan* ndo's are not
>>>>>> useful here. Nor we should go and implement ndo_bridge_setlink* in all
>>>>>> devices that can be bridge ports.
>>>>> Hmm. I just think that is cleaner to implement ndo_bridge_setlink in
>>>>> bonding for example and let it propagate the the call to slaves.
>>>> No, that will require bridge attribute support in all drivers. And that is no
>>>> good.
>>> Not all drivers, just all masters which want to support this. Like bond,
>>> team, macvlan etc. That would be the same as for
>>> ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid/ndo_change_mtu etc. I do not
>>> see any problem in that. It is much much clearer over big hammer iterate
>>> over lowers in my opinion.
>> You cannot avoid the lowerdev iteration in any case.
>> If you added it in the individual drivers: bond, macvlan and other drivers
>> will all have to do the same thing.
>> ie Call bridge setlink on lowerdevs.
> I feel that the right way is to let masters propagate that themselves in
> their code.
In this case no. Just because an interface is a port in a bridge, it is
wrong to indicate that the interface driver needs to understand all
bridge port configuration attributes. Note that with what you are asking
for ...all bridge port drivers (bonds, vxlans) will also need to
implement the netdev stp state update api.
> That's it. I might be wrong of course.
>> My patch avoids the need to modify these drivers. Besides it does this only
>> when the OFFLOAD flag is set.
>
> Yep, well in my reply to another patch of you series I expressed my
> feeling that the flag should be really checked in particular switch
> driver, not core. But I might be wrong there as well...
The bridge driver owns these attributes...and he needs to call the
switchdev api to offload.
And the condition for the switchdev api call is the offload flag. And
the offload flag is part of the switchdev api.
The drivers just set it on the netdev, they dont own the offload flag.
So, I don't see a reason why the core should not
know about the flag.
What has been accepted in the kernel currently does not help bridge
driver offloading to switchdev. It does help if you want to manage your
switch device separately like you were already doing with nics. ie going
to switch port driver directly. It does not help the stacked device case
either.
I will resubmit my series with the checkpatch errors you pointed out.
And, am also looking at other ways to solve the problem.
Thanks for the review.
>> It will not stop at adding the ndo_bridge_setlink to bond/macvlan etc. It
>> will be all other ndo_ops we will need for switch asics.
>> It will be l3 tomorrow, if the route is through a bond (But at that point, we
>> may end up having to introduce switch device instead of going to the port.
>> Lets see).
>>
>> Today this patch introduces an abstract way to get to the switch driver by
>> getting to the slave switch port (And only when the OFFLOAD flag is set).
>>
>>
>>>
>>>>> Let every "upper" to handle ndo_bridge_setlink their way. Sometimes it
>>>>> might not make sense to propagate to "lowers".
>>>> This does not really propagate to lowers. It is just trying to get to a
>>>> switch port and from there to the switch driver.
>>>> Example, bond driver does not need to care if its a bridge port. It will
>>>> simply pass the call to its slave which
>>>> might be a switch port.
>>>>
>>>> bond driver does not care if its a bridge port. But the switch driver cares,
>>>> because it knows that the bond was created with switch ports.
>>>>
>>>>
>>>>>> And this allows a switch driver to receive these callbacks if it has marked
>>>>>> the switch port with an offload flag. Your way of using the switch port to
>>>>>> get to the switch driver does not help in these cases.
>>>>> I do not follow how this is related to this case (stacked layout).
>>>>>
>>>>>> The other option is to use the 'switch device (not port)' to get to the
>>>>>> switch driver.
>>>>> That would not help this case (stacked layout) I believe.
>>>>>
>>>>>
>>>>>> This patch shows that you can still do this with the ndo ops.
>>>>>>>> + err = netdev_switch_port_bridge_setlink(lower_dev, nlh, flags);
>>>>>>>> + if (err)
>>>>>>>> + ret = err;
>>>>>>>> + }
>>>>>>> ^^^^^ Indent is off. This should be catched by scripts/checkpatch.pl.
>>>>>>>
>>>>>>>> +
>>>>>>>> + return ret;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL(netdev_switch_port_bridge_setlink);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * netdev_switch_port_bridge_dellink - Notify switch device port of bridge
>>>>>>>> + * attribute delete
>>>>>>>> + *
>>>>>>>> + * @dev: port device
>>>>>>>> + * @nlh: netlink msg with bridge port attributes
>>>>>>>> + *
>>>>>>>> + * Notify switch device port of bridge port attribute delete
>>>>>>>> + */
>>>>>>>> +int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>>>> + struct nlmsghdr *nlh, u16 flags)
>>>>>>>> +{
>>>>>>>> + const struct net_device_ops *ops = dev->netdev_ops;
>>>>>>>> + struct net_device *lower_dev;
>>>>>>>> + struct list_head *iter;
>>>>>>>> + int ret = 0, err = 0;
>>>>>>>> +
>>>>>>>> + if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>>>> + return err;
>>>>>>>> +
>>>>>>>> + if (ops->ndo_bridge_dellink) {
>>>>>>>> + WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>>>> + return ops->ndo_bridge_dellink(dev, nlh, flags);
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>>>> + err = netdev_switch_port_bridge_dellink(lower_dev, nlh, flags);
>>>>>>>> + if (err)
>>>>>>>> + ret = err;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + return ret;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL(netdev_switch_port_bridge_dellink);
>>>>>>>> --
>>>>>>>> 1.7.10.4
>>>>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH net 0/2] mlx4 driver fixes for 3.19-rc1
From: Or Gerlitz @ 2014-12-14 14:18 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Matan Barak, Amir Vadai, Tal Alon, Or Gerlitz
Hi Dave,
Just fixes for two small issues introduced in the 3.19 merge window
Or.
Matan Barak (1):
net/mlx4_core: Fixed memory leak and incorrect refcount in mlx4_load_one
Or Gerlitz (1):
net/mlx4_core: Avoid double dumping of the PF device capabilities
drivers/net/ethernet/mellanox/mlx4/fw.c | 26 +++++++-----
drivers/net/ethernet/mellanox/mlx4/fw.h | 1 +
drivers/net/ethernet/mellanox/mlx4/main.c | 62 ++++++++++++++++-------------
3 files changed, 50 insertions(+), 39 deletions(-)
^ permalink raw reply
* [PATCH net 2/2] net/mlx4_core: Avoid double dumping of the PF device capabilities
From: Or Gerlitz @ 2014-12-14 14:18 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Matan Barak, Amir Vadai, Tal Alon, Or Gerlitz
In-Reply-To: <1418566685-9855-1-git-send-email-ogerlitz@mellanox.com>
To support asymmetric EQ allocations, we should query the device
capabilities prior to enabling SRIOV. As a side effect of adding that,
we are dumping the PF device capabilities twice. Avoid that by moving
the printing into a helper function which is called once.
Fixes: 7ae0e400cd93 ('net/mlx4_core: Flexible (asymmetric) allocation of
EQs and MSI-X vectors for PF/VFs')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/fw.c | 26 +++++++++++++++-----------
drivers/net/ethernet/mellanox/mlx4/fw.h | 1 +
drivers/net/ethernet/mellanox/mlx4/main.c | 1 +
3 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index ef3b95b..51807bb 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -787,11 +787,8 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
if ((1 << (field & 0x3f)) > (PAGE_SIZE / dev_cap->bf_reg_size))
field = 3;
dev_cap->bf_regs_per_page = 1 << (field & 0x3f);
- mlx4_dbg(dev, "BlueFlame available (reg size %d, regs/page %d)\n",
- dev_cap->bf_reg_size, dev_cap->bf_regs_per_page);
} else {
dev_cap->bf_reg_size = 0;
- mlx4_dbg(dev, "BlueFlame not available\n");
}
MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_SG_SQ_OFFSET);
@@ -902,9 +899,6 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
goto out;
}
- mlx4_dbg(dev, "Base MM extensions: flags %08x, rsvd L_Key %08x\n",
- dev_cap->bmme_flags, dev_cap->reserved_lkey);
-
/*
* Each UAR has 4 EQ doorbells; so if a UAR is reserved, then
* we can't use any EQs whose doorbell falls on that page,
@@ -916,6 +910,21 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
else
dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_SYS_EQS;
+out:
+ mlx4_free_cmd_mailbox(dev, mailbox);
+ return err;
+}
+
+void mlx4_dev_cap_dump(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
+{
+ if (dev_cap->bf_reg_size > 0)
+ mlx4_dbg(dev, "BlueFlame available (reg size %d, regs/page %d)\n",
+ dev_cap->bf_reg_size, dev_cap->bf_regs_per_page);
+ else
+ mlx4_dbg(dev, "BlueFlame not available\n");
+
+ mlx4_dbg(dev, "Base MM extensions: flags %08x, rsvd L_Key %08x\n",
+ dev_cap->bmme_flags, dev_cap->reserved_lkey);
mlx4_dbg(dev, "Max ICM size %lld MB\n",
(unsigned long long) dev_cap->max_icm_sz >> 20);
mlx4_dbg(dev, "Max QPs: %d, reserved QPs: %d, entry size: %d\n",
@@ -949,13 +958,8 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
dev_cap->dmfs_high_rate_qpn_base);
mlx4_dbg(dev, "DMFS high rate steer QPn range: %d\n",
dev_cap->dmfs_high_rate_qpn_range);
-
dump_dev_cap_flags(dev, dev_cap->flags);
dump_dev_cap_flags2(dev, dev_cap->flags2);
-
-out:
- mlx4_free_cmd_mailbox(dev, mailbox);
- return err;
}
int mlx4_QUERY_PORT(struct mlx4_dev *dev, int port, struct mlx4_port_cap *port_cap)
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.h b/drivers/net/ethernet/mellanox/mlx4/fw.h
index 794e282..62562b6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.h
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.h
@@ -224,6 +224,7 @@ struct mlx4_set_ib_param {
u32 cap_mask;
};
+void mlx4_dev_cap_dump(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap);
int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap);
int mlx4_QUERY_PORT(struct mlx4_dev *dev, int port, struct mlx4_port_cap *port_cap);
int mlx4_QUERY_FUNC_CAP(struct mlx4_dev *dev, u8 gen_or_port,
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index c2ef266..b935bf3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -305,6 +305,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
mlx4_err(dev, "QUERY_DEV_CAP command failed, aborting\n");
return err;
}
+ mlx4_dev_cap_dump(dev, dev_cap);
if (dev_cap->min_page_sz > PAGE_SIZE) {
mlx4_err(dev, "HCA minimum page size of %d bigger than kernel PAGE_SIZE of %ld, aborting\n",
--
1.7.1
^ permalink raw reply related
* [PATCH net 1/2] net/mlx4_core: Fixed memory leak and incorrect refcount in mlx4_load_one
From: Or Gerlitz @ 2014-12-14 14:18 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Matan Barak, Amir Vadai, Tal Alon, Or Gerlitz
In-Reply-To: <1418566685-9855-1-git-send-email-ogerlitz@mellanox.com>
From: Matan Barak <matanb@mellanox.com>
The current mlx4_load_one has a memory leak as it always allocates
dev_cap, but frees it only on error.
In addition, even if VFs exist when mlx4_load_one is called,
we still need to notify probed VFs that we're loading (by
incrementing pf_loading).
Fixes: a0eacca948d2 ('net/mlx4_core: Refactor mlx4_load_one')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/main.c | 61 +++++++++++++++-------------
1 files changed, 33 insertions(+), 28 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index e25436b..c2ef266 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2488,41 +2488,42 @@ static u64 mlx4_enable_sriov(struct mlx4_dev *dev, struct pci_dev *pdev,
u8 total_vfs, int existing_vfs)
{
u64 dev_flags = dev->flags;
+ int err = 0;
- dev->dev_vfs = kzalloc(
- total_vfs * sizeof(*dev->dev_vfs),
- GFP_KERNEL);
+ atomic_inc(&pf_loading);
+ if (dev->flags & MLX4_FLAG_SRIOV) {
+ if (existing_vfs != total_vfs) {
+ mlx4_err(dev, "SR-IOV was already enabled, but with num_vfs (%d) different than requested (%d)\n",
+ existing_vfs, total_vfs);
+ total_vfs = existing_vfs;
+ }
+ }
+
+ dev->dev_vfs = kzalloc(total_vfs * sizeof(*dev->dev_vfs), GFP_KERNEL);
if (NULL == dev->dev_vfs) {
mlx4_err(dev, "Failed to allocate memory for VFs\n");
goto disable_sriov;
- } else if (!(dev->flags & MLX4_FLAG_SRIOV)) {
- int err = 0;
-
- atomic_inc(&pf_loading);
- if (existing_vfs) {
- if (existing_vfs != total_vfs)
- mlx4_err(dev, "SR-IOV was already enabled, but with num_vfs (%d) different than requested (%d)\n",
- existing_vfs, total_vfs);
- } else {
- mlx4_warn(dev, "Enabling SR-IOV with %d VFs\n", total_vfs);
- err = pci_enable_sriov(pdev, total_vfs);
- }
- if (err) {
- mlx4_err(dev, "Failed to enable SR-IOV, continuing without SR-IOV (err = %d)\n",
- err);
- atomic_dec(&pf_loading);
- goto disable_sriov;
- } else {
- mlx4_warn(dev, "Running in master mode\n");
- dev_flags |= MLX4_FLAG_SRIOV |
- MLX4_FLAG_MASTER;
- dev_flags &= ~MLX4_FLAG_SLAVE;
- dev->num_vfs = total_vfs;
- }
+ }
+
+ if (!(dev->flags & MLX4_FLAG_SRIOV)) {
+ mlx4_warn(dev, "Enabling SR-IOV with %d VFs\n", total_vfs);
+ err = pci_enable_sriov(pdev, total_vfs);
+ }
+ if (err) {
+ mlx4_err(dev, "Failed to enable SR-IOV, continuing without SR-IOV (err = %d)\n",
+ err);
+ goto disable_sriov;
+ } else {
+ mlx4_warn(dev, "Running in master mode\n");
+ dev_flags |= MLX4_FLAG_SRIOV |
+ MLX4_FLAG_MASTER;
+ dev_flags &= ~MLX4_FLAG_SLAVE;
+ dev->num_vfs = total_vfs;
}
return dev_flags;
disable_sriov:
+ atomic_dec(&pf_loading);
dev->num_vfs = 0;
kfree(dev->dev_vfs);
return dev_flags & ~MLX4_FLAG_MASTER;
@@ -2606,8 +2607,10 @@ static int mlx4_load_one(struct pci_dev *pdev, int pci_dev_data,
}
if (total_vfs) {
- existing_vfs = pci_num_vf(pdev);
dev->flags = MLX4_FLAG_MASTER;
+ existing_vfs = pci_num_vf(pdev);
+ if (existing_vfs)
+ dev->flags |= MLX4_FLAG_SRIOV;
dev->num_vfs = total_vfs;
}
}
@@ -2643,6 +2646,7 @@ slave_start:
}
if (mlx4_is_master(dev)) {
+ /* when we hit the goto slave_start below, dev_cap already initialized */
if (!dev_cap) {
dev_cap = kzalloc(sizeof(*dev_cap), GFP_KERNEL);
@@ -2849,6 +2853,7 @@ slave_start:
if (mlx4_is_master(dev) && dev->num_vfs)
atomic_dec(&pf_loading);
+ kfree(dev_cap);
return 0;
err_port:
--
1.7.1
^ permalink raw reply related
* Re: [PATCH iproute2 REGRESSION] ss: Dont show netlink and packet sockets by default
From: Sergei Shtylyov @ 2014-12-14 14:26 UTC (permalink / raw)
To: Vadim Kochan, netdev
In-Reply-To: <1418549765-9466-1-git-send-email-vadim4j@gmail.com>
Hello.
On 12/14/2014 12:36 PM, Vadim Kochan wrote:
> From: Vadim Kochan <vadim4j@gmail.com>
> Checking by SS_CLOSE state was remowed in:
> (45a4770bc0) ss: Remove checking SS_CLOSE state for packet and netlink
> which is not really correct because now by default all sockets are seen
> when do 'ss'.
> Here is most correct fix which considers specified family.
> To see netlink sockets:
> ss -A netlink
> To see packet sockets:
> ss -A packet
> And ss by default will show only connected/established sockets as it
> was before all the time.
>
> Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
> ---
> misc/ss.c | 6 ++++++
> 1 file changed, 6 insertions(+)
> diff --git a/misc/ss.c b/misc/ss.c
> index e9927a5..6050ab6 100644
> --- a/misc/ss.c
> +++ b/misc/ss.c
> @@ -2801,6 +2801,9 @@ static int packet_show(struct filter *f)
> int ino;
> unsigned long long sk;
>
> + if (preferred_family != AF_PACKET && !(f->states & (1<<SS_CLOSE)))
Please surround << with spaces, to be consistent with other operators and
general kernel coding style.
> + return 0;
> +
> if (packet_show_netlink(f, NULL) == 0)
> return 0;
>
> @@ -3028,6 +3031,9 @@ static int netlink_show(struct filter *f)
> int rq, wq, rc;
> unsigned long long sk, cb;
>
> + if (preferred_family != AF_NETLINK && !(f->states & (1<<SS_CLOSE)))
Likewise.
> + return 0;
> +
[...]
WBR, Sergei
^ permalink raw reply
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
From: Jiri Pirko @ 2014-12-14 15:35 UTC (permalink / raw)
To: Roopa Prabhu
Cc: sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen, linville,
vyasevic, netdev, davem, shm, gospo
In-Reply-To: <548D9B14.9010201@cumulusnetworks.com>
Sun, Dec 14, 2014 at 03:13:40PM CET, roopa@cumulusnetworks.com wrote:
>On 12/11/14, 2:25 PM, Jiri Pirko wrote:
>>Thu, Dec 11, 2014 at 07:27:32PM CET, roopa@cumulusnetworks.com wrote:
>>>On 12/11/14, 10:07 AM, Jiri Pirko wrote:
>>>>Thu, Dec 11, 2014 at 06:59:15PM CET, roopa@cumulusnetworks.com wrote:
>>>>>On 12/11/14, 9:11 AM, Jiri Pirko wrote:
>>>>>>Thu, Dec 11, 2014 at 05:52:10PM CET, roopa@cumulusnetworks.com wrote:
>>>>>>>On 12/10/14, 1:37 AM, Jiri Pirko wrote:
>>>>>>>>Wed, Dec 10, 2014 at 10:05:18AM CET, roopa@cumulusnetworks.com wrote:
>>>>>>>>>From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>>>>
>>>>>>>>>This patch adds two new api's netdev_switch_port_bridge_setlink
>>>>>>>>>and netdev_switch_port_bridge_dellink to offload bridge port attributes
>>>>>>>>>to switch asic
>>>>>>>>>
>>>>>>>>>(The names of the apis look odd with 'switch_port_bridge',
>>>>>>>>>but am more inclined to change the prefix of the api to something else.
>>>>>>>>>Will take any suggestions).
>>>>>>>>>
>>>>>>>>>The api's look at the NETIF_F_HW_NETFUNC_OFFLOAD feature flag to
>>>>>>>>>pass bridge port attributes to the port device.
>>>>>>>>>
>>>>>>>>>If the device has the NETIF_F_HW_NETFUNC_OFFLOAD, but does not support
>>>>>>>>>the bridge port attribute offload ndo, call bridge port attribute ndo's on
>>>>>>>>>the lowerdevs if supported. This is one way to pass bridge port attributes
>>>>>>>>>through stacked netdevs (example when bridge port is a bond and bond slaves
>>>>>>>>>are switch ports).
>>>>>>>>>
>>>>>>>>>Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>>>>---
>>>>>>>>>include/net/switchdev.h | 5 +++-
>>>>>>>>>net/switchdev/switchdev.c | 70 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>2 files changed, 74 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>>diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>>>>>>>index 8a6d164..22676b6 100644
>>>>>>>>>--- a/include/net/switchdev.h
>>>>>>>>>+++ b/include/net/switchdev.h
>>>>>>>>>@@ -17,7 +17,10 @@
>>>>>>>>>int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>>>>> struct netdev_phys_item_id *psid);
>>>>>>>>>int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>>>>>>>>>-
>>>>>>>>>+int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>>>>>+ struct nlmsghdr *nlh, u16 flags);
>>>>>>>>>+int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>>>>>+ struct nlmsghdr *nlh, u16 flags);
>>>>>>>>>#else
>>>>>>>>>
>>>>>>>>>static inline int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>>>>>diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>>>>>>>>index d162b21..62317e1 100644
>>>>>>>>>--- a/net/switchdev/switchdev.c
>>>>>>>>>+++ b/net/switchdev/switchdev.c
>>>>>>>>>@@ -50,3 +50,73 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>>>>>>>>> return ops->ndo_switch_port_stp_update(dev, state);
>>>>>>>>>}
>>>>>>>>>EXPORT_SYMBOL(netdev_switch_port_stp_update);
>>>>>>>>>+
>>>>>>>>>+/**
>>>>>>>>>+ * netdev_switch_port_bridge_setlink - Notify switch device port of bridge
>>>>>>>>>+ * port attributes
>>>>>>>>>+ *
>>>>>>>>>+ * @dev: port device
>>>>>>>>>+ * @nlh: netlink msg with bridge port attributes
>>>>>>>>>+ *
>>>>>>>>>+ * Notify switch device port of bridge port attributes
>>>>>>>>>+ */
>>>>>>>>>+int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>>>>>+ struct nlmsghdr *nlh, u16 flags)
>>>>>>>>>+{
>>>>>>>>>+ const struct net_device_ops *ops = dev->netdev_ops;
>>>>>>>>>+ struct net_device *lower_dev;
>>>>>>>>>+ struct list_head *iter;
>>>>>>>>>+ int ret = 0, err = 0;
>>>>>>>>>+
>>>>>>>>>+ if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>>>>>+ return err;
>>>>>>>>>+
>>>>>>>>>+ if (ops->ndo_bridge_setlink) {
>>>>>>>>>+ WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>>>>>+ return ops->ndo_bridge_setlink(dev, nlh, flags);
>>>>>>>> You have to change ndo_bridge_setlink in netdevice.h first.
>>>>>>>> Otherwise when only this patch is applied (during bisection)
>>>>>>>> this won't compile.
>>>>>>>ack, will fix it and keep that in mind next time.
>>>>>>>>>+ }
>>>>>>>>>+
>>>>>>>>>+ netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>>>> I do not understand why to iterate over lower devices. At this
>>>>>>>> stage we don't know a thing about this upper or its lowers. Let
>>>>>>>> the uppers (/masters) to decide if this needs to be propagated
>>>>>>>> or not.
>>>>>>>Jiri, In the stacked devices case, there is no way to propagate the bridge
>>>>>>>port attributes to switch device driver today (vlan and other bridge port
>>>>>>>attributes). Can you tell me if there is a way ?. no, ndo_vlan* ndo's are not
>>>>>>>useful here. Nor we should go and implement ndo_bridge_setlink* in all
>>>>>>>devices that can be bridge ports.
>>>>>>Hmm. I just think that is cleaner to implement ndo_bridge_setlink in
>>>>>>bonding for example and let it propagate the the call to slaves.
>>>>>No, that will require bridge attribute support in all drivers. And that is no
>>>>>good.
>>>>Not all drivers, just all masters which want to support this. Like bond,
>>>>team, macvlan etc. That would be the same as for
>>>>ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid/ndo_change_mtu etc. I do not
>>>>see any problem in that. It is much much clearer over big hammer iterate
>>>>over lowers in my opinion.
>>>You cannot avoid the lowerdev iteration in any case.
>>>If you added it in the individual drivers: bond, macvlan and other drivers
>>>will all have to do the same thing.
>>>ie Call bridge setlink on lowerdevs.
>>I feel that the right way is to let masters propagate that themselves in
>>their code.
>In this case no. Just because an interface is a port in a bridge, it is wrong
>to indicate that the interface driver needs to understand all bridge port
>configuration attributes. Note that with what you are asking for ...all
>bridge port drivers (bonds, vxlans) will also need to implement the netdev
>stp state update api.
I'm very well aware of this fact. But still, I'm convinced that the way
similar things are implemented now, using prapagation inside particular
drivers (bond/team/etc) is the correct way to go. I do not see any
downside in that. But we are running in circles here. I would love to
hear opinion of other people here.
>>That's it. I might be wrong of course.
>
>
>>>My patch avoids the need to modify these drivers. Besides it does this only
>>>when the OFFLOAD flag is set.
>>
>>Yep, well in my reply to another patch of you series I expressed my
>>feeling that the flag should be really checked in particular switch
>>driver, not core. But I might be wrong there as well...
>
>The bridge driver owns these attributes...and he needs to call the switchdev
>api to offload.
>And the condition for the switchdev api call is the offload flag. And the
>offload flag is part of the switchdev api.
>The drivers just set it on the netdev, they dont own the offload flag. So, I
>don't see a reason why the core should not
>know about the flag.
I do not understand the formulation "own the offload flag". What I say
is let the bridge/others to call the switchdev api unconditionally and
let the leaf drivers handle that as they see fit, taking various facts
into account, flags included. This way you avoid the need for flags
inheritance in stacked scenarios. Imagine following example:
bridge - bond --- eth1
--- eth2
eth1 and eth2 are switch ports. Now eth1 has the flag set and eth2 does
not. Should the bond have the flag set or not? And if it has, eth2 need
to check the flag as well to do not offload.
Implementing the inheritance correctly would be a small nightmare. So I
say, why don't just let the leafs to check and decide.
>
>What has been accepted in the kernel currently does not help bridge driver
>offloading to switchdev. It does help if you want to manage your switch
>device separately like you were already doing with nics. ie going to switch
>port driver directly. It does not help the stacked device case either.
>
>
>I will resubmit my series with the checkpatch errors you pointed out.
>
>And, am also looking at other ways to solve the problem.
>
>Thanks for the review.
>
>
>>>It will not stop at adding the ndo_bridge_setlink to bond/macvlan etc. It
>>>will be all other ndo_ops we will need for switch asics.
>>>It will be l3 tomorrow, if the route is through a bond (But at that point, we
>>>may end up having to introduce switch device instead of going to the port.
>>>Lets see).
>>>
>>>Today this patch introduces an abstract way to get to the switch driver by
>>>getting to the slave switch port (And only when the OFFLOAD flag is set).
>>>
>>>
>>>>
>>>>>>Let every "upper" to handle ndo_bridge_setlink their way. Sometimes it
>>>>>>might not make sense to propagate to "lowers".
>>>>>This does not really propagate to lowers. It is just trying to get to a
>>>>>switch port and from there to the switch driver.
>>>>>Example, bond driver does not need to care if its a bridge port. It will
>>>>>simply pass the call to its slave which
>>>>>might be a switch port.
>>>>>
>>>>>bond driver does not care if its a bridge port. But the switch driver cares,
>>>>>because it knows that the bond was created with switch ports.
>>>>>
>>>>>
>>>>>>>And this allows a switch driver to receive these callbacks if it has marked
>>>>>>>the switch port with an offload flag. Your way of using the switch port to
>>>>>>>get to the switch driver does not help in these cases.
>>>>>>I do not follow how this is related to this case (stacked layout).
>>>>>>
>>>>>>>The other option is to use the 'switch device (not port)' to get to the
>>>>>>>switch driver.
>>>>>>That would not help this case (stacked layout) I believe.
>>>>>>
>>>>>>
>>>>>>>This patch shows that you can still do this with the ndo ops.
>>>>>>>>>+ err = netdev_switch_port_bridge_setlink(lower_dev, nlh, flags);
>>>>>>>>>+ if (err)
>>>>>>>>>+ ret = err;
>>>>>>>>>+ }
>>>>>>>> ^^^^^ Indent is off. This should be catched by scripts/checkpatch.pl.
>>>>>>>>
>>>>>>>>>+
>>>>>>>>>+ return ret;
>>>>>>>>>+}
>>>>>>>>>+EXPORT_SYMBOL(netdev_switch_port_bridge_setlink);
>>>>>>>>>+
>>>>>>>>>+/**
>>>>>>>>>+ * netdev_switch_port_bridge_dellink - Notify switch device port of bridge
>>>>>>>>>+ * attribute delete
>>>>>>>>>+ *
>>>>>>>>>+ * @dev: port device
>>>>>>>>>+ * @nlh: netlink msg with bridge port attributes
>>>>>>>>>+ *
>>>>>>>>>+ * Notify switch device port of bridge port attribute delete
>>>>>>>>>+ */
>>>>>>>>>+int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>>>>>+ struct nlmsghdr *nlh, u16 flags)
>>>>>>>>>+{
>>>>>>>>>+ const struct net_device_ops *ops = dev->netdev_ops;
>>>>>>>>>+ struct net_device *lower_dev;
>>>>>>>>>+ struct list_head *iter;
>>>>>>>>>+ int ret = 0, err = 0;
>>>>>>>>>+
>>>>>>>>>+ if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>>>>>+ return err;
>>>>>>>>>+
>>>>>>>>>+ if (ops->ndo_bridge_dellink) {
>>>>>>>>>+ WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>>>>>+ return ops->ndo_bridge_dellink(dev, nlh, flags);
>>>>>>>>>+ }
>>>>>>>>>+
>>>>>>>>>+ netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>>>>>+ err = netdev_switch_port_bridge_dellink(lower_dev, nlh, flags);
>>>>>>>>>+ if (err)
>>>>>>>>>+ ret = err;
>>>>>>>>>+ }
>>>>>>>>>+
>>>>>>>>>+ return ret;
>>>>>>>>>+}
>>>>>>>>>+EXPORT_SYMBOL(netdev_switch_port_bridge_dellink);
>>>>>>>>>--
>>>>>>>>>1.7.10.4
>>>>>>>>>
>>>>>>--
>>>>>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>>>the body of a message to majordomo@vger.kernel.org
>>>>>>More majordomo info at http://vger.kernel.org/majordomo-info.html
>>--
>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>the body of a message to majordomo@vger.kernel.org
>>More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply
* [PATCH net] net: Disallow providing non zero VLAN ID for NIC drivers FDB add flow
From: Or Gerlitz @ 2014-12-14 16:19 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, jiri, gospo, jhs, john.r.fastabend, Or Gerlitz
The current implementations all use dev_uc_add_excl() and such whose API
doesn't support vlans, so we can't make it with NICs HW for now.
Fixes: f6f6424ba773 ('net: make vid as a parameter for ndo_fdb_add/ndo_fdb_del')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 5 +++++
net/core/rtnetlink.c | 5 +++++
2 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 0a7ea4c..a5f2660 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7549,6 +7549,11 @@ static int i40e_ndo_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
if (!(pf->flags & I40E_FLAG_SRIOV_ENABLED))
return -EOPNOTSUPP;
+ if (vid) {
+ pr_info("%s: vlans aren't supported yet for dev_uc|mc_add()\n", dev->name);
+ return -EINVAL;
+ }
+
/* Hardware does not support aging addresses so if a
* ndm_state is given only allow permanent addresses
*/
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d06107d..9cf6fe9 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2368,6 +2368,11 @@ int ndo_dflt_fdb_add(struct ndmsg *ndm,
return err;
}
+ if (vid) {
+ pr_info("%s: vlans aren't supported yet for dev_uc|mc_add()\n", dev->name);
+ return err;
+ }
+
if (is_unicast_ether_addr(addr) || is_link_local_ether_addr(addr))
err = dev_uc_add_excl(dev, addr);
else if (is_multicast_ether_addr(addr))
--
1.7.1
^ permalink raw reply related
* Re: [PATCH iproute2 REGRESSION] ss: Dont show netlink and packet sockets by default
From: vadim4j @ 2014-12-14 17:15 UTC (permalink / raw)
To: Sergei Shtylyov; +Cc: Vadim Kochan, netdev
In-Reply-To: <548D9E25.30100@cogentembedded.com>
On Sun, Dec 14, 2014 at 05:26:45PM +0300, Sergei Shtylyov wrote:
> Hello.
>
> On 12/14/2014 12:36 PM, Vadim Kochan wrote:
>
> >From: Vadim Kochan <vadim4j@gmail.com>
>
> >Checking by SS_CLOSE state was remowed in:
>
> > (45a4770bc0) ss: Remove checking SS_CLOSE state for packet and netlink
>
> >which is not really correct because now by default all sockets are seen
> >when do 'ss'.
>
> >Here is most correct fix which considers specified family.
>
> >To see netlink sockets:
> > ss -A netlink
>
> >To see packet sockets:
> > ss -A packet
>
> >And ss by default will show only connected/established sockets as it
> >was before all the time.
> >
> >Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
> >---
> > misc/ss.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
>
> >diff --git a/misc/ss.c b/misc/ss.c
> >index e9927a5..6050ab6 100644
> >--- a/misc/ss.c
> >+++ b/misc/ss.c
> >@@ -2801,6 +2801,9 @@ static int packet_show(struct filter *f)
> > int ino;
> > unsigned long long sk;
> >
> >+ if (preferred_family != AF_PACKET && !(f->states & (1<<SS_CLOSE)))
>
> Please surround << with spaces, to be consistent with other operators and
> general kernel coding style.
>
> >+ return 0;
> >+
> > if (packet_show_netlink(f, NULL) == 0)
> > return 0;
> >
> >@@ -3028,6 +3031,9 @@ static int netlink_show(struct filter *f)
> > int rq, wq, rc;
> > unsigned long long sk, cb;
> >
> >+ if (preferred_family != AF_NETLINK && !(f->states & (1<<SS_CLOSE)))
>
> Likewise.
>
> >+ return 0;
> >+
> [...]
>
> WBR, Sergei
>
OK, I just returned removed code, but I agree to correct it, thanks.
^ permalink raw reply
* [PATCH iproute2 REGRESSION v2] ss: Dont show netlink and packet sockets by default
From: Vadim Kochan @ 2014-12-14 17:23 UTC (permalink / raw)
To: netdev; +Cc: Vadim Kochan
From: Vadim Kochan <vadim4j@gmail.com>
Checking by SS_CLOSE state was remowed in:
(45a4770bc0) ss: Remove checking SS_CLOSE state for packet and netlink
which is not really correct because now by default all sockets are seen
when do 'ss'.
Here is most correct fix which considers specified family.
To see netlink sockets:
ss -A netlink
To see packet sockets:
ss -A packet
And ss by default will show only connected/established sockets as it
was before all the time.
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
misc/ss.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/misc/ss.c b/misc/ss.c
index e9927a5..8f39eb8 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -2801,6 +2801,9 @@ static int packet_show(struct filter *f)
int ino;
unsigned long long sk;
+ if (preferred_family != AF_PACKET && !(f->states & (1 << SS_CLOSE)))
+ return 0;
+
if (packet_show_netlink(f, NULL) == 0)
return 0;
@@ -3028,6 +3031,9 @@ static int netlink_show(struct filter *f)
int rq, wq, rc;
unsigned long long sk, cb;
+ if (preferred_family != AF_NETLINK && !(f->states & (1 << SS_CLOSE)))
+ return 0;
+
if (!getenv("PROC_NET_NETLINK") && !getenv("PROC_ROOT") &&
netlink_show_netlink(f, NULL) == 0)
return 0;
--
2.1.3
^ permalink raw reply related
* Re: [PATCH ethtool v2 2/3] ethtool: Add copybreak support
From: Ben Hutchings @ 2014-12-14 17:46 UTC (permalink / raw)
To: Govindarajulu Varadarajan; +Cc: netdev, ogerlitz, yevgenyp
In-Reply-To: <1412637141-3205-3-git-send-email-_govind@gmx.com>
[-- Attachment #1: Type: text/plain, Size: 1987 bytes --]
On Tue, 2014-10-07 at 04:42 +0530, Govindarajulu Varadarajan wrote:
> This patch adds support for setting/getting driver's rx_copybreak value.
> copybreak is set/get using new ethtool tunable interface.
>
> This was added to net-next in
> commit: f0db9b073415848709dd59a6394969882f517da9
>
> ethtool: Add generic options for tunables
>
> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
> ---
> ethtool.c | 177 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 177 insertions(+)
>
> diff --git a/ethtool.c b/ethtool.c
> index bf583f3..4045356 100644
> --- a/ethtool.c
> +++ b/ethtool.c
> @@ -179,6 +179,12 @@ static const struct flag_info flags_msglvl[] = {
> { "wol", NETIF_MSG_WOL },
> };
>
> +static const char *tunable_name[] = {
> + [ETHTOOL_ID_UNSPEC] = "Unspec",
> + [ETHTOOL_RX_COPYBREAK] = "rx",
> + [ETHTOOL_TX_COPYBREAK] = "tx",
> +};
Tunables should be named by a string set defined in the kernel.
[...]
> @@ -4055,6 +4228,10 @@ static const struct option {
> " [ rx-mini N ]\n"
> " [ rx-jumbo N ]\n"
> " [ tx N ]\n" },
> + { "-b|--show-copybreak", 1, do_gcopybreak, "Show copybreak values" },
> + { "-B|--set-copybreak", 1, do_scopybreak, "Set copybreak values",
> + " [ rx N]\n"
> + " [ tx N]\n" },
> { "-k|--show-features|--show-offload", 1, do_gfeatures,
> "Get state of protocol offload and other features" },
> { "-K|--features|--offload", 1, do_sfeatures,
[...]
T don't think this is worth two options of its own. You should be able
to add generic get/set-tunable optins. You'll need to get the string
set to find out the names of tunables. When setting a tunable, you'll
need to get it first to find out its type.
Ben.
--
Ben Hutchings
The two most common things in the universe are hydrogen and stupidity.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]
^ permalink raw reply
* Re: [ethtool PATCH 1/1] bug fix: SFP Tx BIAS uses memory wrong offset
From: Ben Hutchings @ 2014-12-14 17:48 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: bwh, netdev
In-Reply-To: <1413367740-11118-1-git-send-email-jhs@emojatatu.com>
[-- Attachment #1: Type: text/plain, Size: 927 bytes --]
On Wed, 2014-10-15 at 06:09 -0400, Jamal Hadi Salim wrote:
> From: Jamal Hadi Salim <jhs@mojatatu.com>
>
> SFF-8472 rev 12.0 indicates the SFP BIAS is at offset 100 of page A2
>
> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Applied, thanks.
Ben.
> ---
> sfpdiag.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sfpdiag.c b/sfpdiag.c
> index 812a2fa..a3dbc9b 100644
> --- a/sfpdiag.c
> +++ b/sfpdiag.c
> @@ -48,7 +48,7 @@
> #define SFF_A2_VCC_HWARN 12
> #define SFF_A2_VCC_LWARN 14
>
> -#define SFF_A2_BIAS 96
> +#define SFF_A2_BIAS 100
> #define SFF_A2_BIAS_HALRM 16
> #define SFF_A2_BIAS_LALRM 18
> #define SFF_A2_BIAS_HWARN 20
--
Ben Hutchings
The two most common things in the universe are hydrogen and stupidity.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]
^ permalink raw reply
* Re: [ethtool][PATCH] Fix build with musl by using more common typedefs
From: Ben Hutchings @ 2014-12-14 18:37 UTC (permalink / raw)
To: Paul Barker; +Cc: Ben Hutchings, netdev, John Spencer
In-Reply-To: <1414323849-5739-2-git-send-email-paul@paulbarker.me.uk>
[-- Attachment #1: Type: text/plain, Size: 1510 bytes --]
On Sun, 2014-10-26 at 11:44 +0000, Paul Barker wrote:
> When using musl as the standard C library, type names such as '__int32_t' are
> not defined. Instead we must use the more commonly defined type names such as
> 'int32_t', which are defined in <stdint.h>.
>
> Signed-off-by: John Spencer <maillist-linux@barfooze.de>
> Signed-off-by: Paul Barker <paul@paulbarker.me.uk>
Applied, thanks.
Ben.
> ---
> internal.h | 13 +++++++------
> 1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/internal.h b/internal.h
> index a9dfae0..262a39f 100644
> --- a/internal.h
> +++ b/internal.h
> @@ -7,6 +7,7 @@
> #include "ethtool-config.h"
> #endif
> #include <stdio.h>
> +#include <stdint.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/types.h>
> @@ -17,16 +18,16 @@
>
> /* ethtool.h expects these to be defined by <linux/types.h> */
> #ifndef HAVE_BE_TYPES
> -typedef __uint16_t __be16;
> -typedef __uint32_t __be32;
> +typedef uint16_t __be16;
> +typedef uint32_t __be32;
> typedef unsigned long long __be64;
> #endif
>
> typedef unsigned long long u64;
> -typedef __uint32_t u32;
> -typedef __uint16_t u16;
> -typedef __uint8_t u8;
> -typedef __int32_t s32;
> +typedef uint32_t u32;
> +typedef uint16_t u16;
> +typedef uint8_t u8;
> +typedef int32_t s32;
>
> #include "ethtool-copy.h"
> #include "net_tstamp-copy.h"
--
Ben Hutchings
The two most common things in the universe are hydrogen and stupidity.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]
^ permalink raw reply
* Re: [PATCH net] net: Disallow providing non zero VLAN ID for NIC drivers FDB add flow
From: Jiri Pirko @ 2014-12-14 19:23 UTC (permalink / raw)
To: Or Gerlitz; +Cc: David S. Miller, netdev, gospo, jhs, john.r.fastabend
In-Reply-To: <1418573945-27840-1-git-send-email-ogerlitz@mellanox.com>
Sun, Dec 14, 2014 at 05:19:05PM CET, ogerlitz@mellanox.com wrote:
>The current implementations all use dev_uc_add_excl() and such whose API
>doesn't support vlans, so we can't make it with NICs HW for now.
>
>Fixes: f6f6424ba773 ('net: make vid as a parameter for ndo_fdb_add/ndo_fdb_del')
Maybe I'm missing something, but this commit did not introduce the
problem. If was there already before when NDA_VLAN was set and ignored.
But other than this. I like the patch
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
>Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>---
> drivers/net/ethernet/intel/i40e/i40e_main.c | 5 +++++
> net/core/rtnetlink.c | 5 +++++
> 2 files changed, 10 insertions(+), 0 deletions(-)
>
>diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
>index 0a7ea4c..a5f2660 100644
>--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
>+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
>@@ -7549,6 +7549,11 @@ static int i40e_ndo_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
> if (!(pf->flags & I40E_FLAG_SRIOV_ENABLED))
> return -EOPNOTSUPP;
>
>+ if (vid) {
>+ pr_info("%s: vlans aren't supported yet for dev_uc|mc_add()\n", dev->name);
>+ return -EINVAL;
>+ }
>+
> /* Hardware does not support aging addresses so if a
> * ndm_state is given only allow permanent addresses
> */
>diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>index d06107d..9cf6fe9 100644
>--- a/net/core/rtnetlink.c
>+++ b/net/core/rtnetlink.c
>@@ -2368,6 +2368,11 @@ int ndo_dflt_fdb_add(struct ndmsg *ndm,
> return err;
> }
>
>+ if (vid) {
>+ pr_info("%s: vlans aren't supported yet for dev_uc|mc_add()\n", dev->name);
>+ return err;
>+ }
>+
> if (is_unicast_ether_addr(addr) || is_link_local_ether_addr(addr))
> err = dev_uc_add_excl(dev, addr);
> else if (is_multicast_ether_addr(addr))
>--
>1.7.1
>
^ permalink raw reply
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
From: Roopa Prabhu @ 2014-12-14 19:41 UTC (permalink / raw)
To: Jiri Pirko
Cc: sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen, linville,
vyasevic, netdev, davem, shm, gospo
In-Reply-To: <20141214153549.GA2174@nanopsycho.orion>
On 12/14/14, 7:35 AM, Jiri Pirko wrote:
> Sun, Dec 14, 2014 at 03:13:40PM CET, roopa@cumulusnetworks.com wrote:
>> On 12/11/14, 2:25 PM, Jiri Pirko wrote:
>>> Thu, Dec 11, 2014 at 07:27:32PM CET, roopa@cumulusnetworks.com wrote:
>>>> On 12/11/14, 10:07 AM, Jiri Pirko wrote:
>>>>> Thu, Dec 11, 2014 at 06:59:15PM CET, roopa@cumulusnetworks.com wrote:
>>>>>> On 12/11/14, 9:11 AM, Jiri Pirko wrote:
>>>>>>> Thu, Dec 11, 2014 at 05:52:10PM CET, roopa@cumulusnetworks.com wrote:
>>>>>>>> On 12/10/14, 1:37 AM, Jiri Pirko wrote:
>>>>>>>>> Wed, Dec 10, 2014 at 10:05:18AM CET, roopa@cumulusnetworks.com wrote:
>>>>>>>>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>>>>>
>>>>>>>>>> This patch adds two new api's netdev_switch_port_bridge_setlink
>>>>>>>>>> and netdev_switch_port_bridge_dellink to offload bridge port attributes
>>>>>>>>>> to switch asic
>>>>>>>>>>
>>>>>>>>>> (The names of the apis look odd with 'switch_port_bridge',
>>>>>>>>>> but am more inclined to change the prefix of the api to something else.
>>>>>>>>>> Will take any suggestions).
>>>>>>>>>>
>>>>>>>>>> The api's look at the NETIF_F_HW_NETFUNC_OFFLOAD feature flag to
>>>>>>>>>> pass bridge port attributes to the port device.
>>>>>>>>>>
>>>>>>>>>> If the device has the NETIF_F_HW_NETFUNC_OFFLOAD, but does not support
>>>>>>>>>> the bridge port attribute offload ndo, call bridge port attribute ndo's on
>>>>>>>>>> the lowerdevs if supported. This is one way to pass bridge port attributes
>>>>>>>>>> through stacked netdevs (example when bridge port is a bond and bond slaves
>>>>>>>>>> are switch ports).
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>>>>>> ---
>>>>>>>>>> include/net/switchdev.h | 5 +++-
>>>>>>>>>> net/switchdev/switchdev.c | 70 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>> 2 files changed, 74 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>>>>>>>> index 8a6d164..22676b6 100644
>>>>>>>>>> --- a/include/net/switchdev.h
>>>>>>>>>> +++ b/include/net/switchdev.h
>>>>>>>>>> @@ -17,7 +17,10 @@
>>>>>>>>>> int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>>>>>> struct netdev_phys_item_id *psid);
>>>>>>>>>> int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>>>>>>>>>> -
>>>>>>>>>> +int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>>>>>> + struct nlmsghdr *nlh, u16 flags);
>>>>>>>>>> +int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>>>>>> + struct nlmsghdr *nlh, u16 flags);
>>>>>>>>>> #else
>>>>>>>>>>
>>>>>>>>>> static inline int netdev_switch_parent_id_get(struct net_device *dev,
>>>>>>>>>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>>>>>>>>> index d162b21..62317e1 100644
>>>>>>>>>> --- a/net/switchdev/switchdev.c
>>>>>>>>>> +++ b/net/switchdev/switchdev.c
>>>>>>>>>> @@ -50,3 +50,73 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>>>>>>>>>> return ops->ndo_switch_port_stp_update(dev, state);
>>>>>>>>>> }
>>>>>>>>>> EXPORT_SYMBOL(netdev_switch_port_stp_update);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * netdev_switch_port_bridge_setlink - Notify switch device port of bridge
>>>>>>>>>> + * port attributes
>>>>>>>>>> + *
>>>>>>>>>> + * @dev: port device
>>>>>>>>>> + * @nlh: netlink msg with bridge port attributes
>>>>>>>>>> + *
>>>>>>>>>> + * Notify switch device port of bridge port attributes
>>>>>>>>>> + */
>>>>>>>>>> +int netdev_switch_port_bridge_setlink(struct net_device *dev,
>>>>>>>>>> + struct nlmsghdr *nlh, u16 flags)
>>>>>>>>>> +{
>>>>>>>>>> + const struct net_device_ops *ops = dev->netdev_ops;
>>>>>>>>>> + struct net_device *lower_dev;
>>>>>>>>>> + struct list_head *iter;
>>>>>>>>>> + int ret = 0, err = 0;
>>>>>>>>>> +
>>>>>>>>>> + if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>>>>>> + return err;
>>>>>>>>>> +
>>>>>>>>>> + if (ops->ndo_bridge_setlink) {
>>>>>>>>>> + WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>>>>>> + return ops->ndo_bridge_setlink(dev, nlh, flags);
>>>>>>>>> You have to change ndo_bridge_setlink in netdevice.h first.
>>>>>>>>> Otherwise when only this patch is applied (during bisection)
>>>>>>>>> this won't compile.
>>>>>>>> ack, will fix it and keep that in mind next time.
>>>>>>>>>> + }
>>>>>>>>>> +
>>>>>>>>>> + netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>>>>> I do not understand why to iterate over lower devices. At this
>>>>>>>>> stage we don't know a thing about this upper or its lowers. Let
>>>>>>>>> the uppers (/masters) to decide if this needs to be propagated
>>>>>>>>> or not.
>>>>>>>> Jiri, In the stacked devices case, there is no way to propagate the bridge
>>>>>>>> port attributes to switch device driver today (vlan and other bridge port
>>>>>>>> attributes). Can you tell me if there is a way ?. no, ndo_vlan* ndo's are not
>>>>>>>> useful here. Nor we should go and implement ndo_bridge_setlink* in all
>>>>>>>> devices that can be bridge ports.
>>>>>>> Hmm. I just think that is cleaner to implement ndo_bridge_setlink in
>>>>>>> bonding for example and let it propagate the the call to slaves.
>>>>>> No, that will require bridge attribute support in all drivers. And that is no
>>>>>> good.
>>>>> Not all drivers, just all masters which want to support this. Like bond,
>>>>> team, macvlan etc. That would be the same as for
>>>>> ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid/ndo_change_mtu etc. I do not
>>>>> see any problem in that. It is much much clearer over big hammer iterate
>>>>> over lowers in my opinion.
>>>> You cannot avoid the lowerdev iteration in any case.
>>>> If you added it in the individual drivers: bond, macvlan and other drivers
>>>> will all have to do the same thing.
>>>> ie Call bridge setlink on lowerdevs.
>>> I feel that the right way is to let masters propagate that themselves in
>>> their code.
>> In this case no. Just because an interface is a port in a bridge, it is wrong
>> to indicate that the interface driver needs to understand all bridge port
>> configuration attributes. Note that with what you are asking for ...all
>> bridge port drivers (bonds, vxlans) will also need to implement the netdev
>> stp state update api.
> I'm very well aware of this fact. But still, I'm convinced that the way
> similar things are implemented now, using prapagation inside particular
> drivers (bond/team/etc) is the correct way to go. I do not see any
> downside in that. But we are running in circles here. I would love to
> hear opinion of other people here.
I know you are aware of this fact ,...but just stating again here that
we are also talking about l3 ops here.
Which don't make sense to be in all drivers. Maybe i am being very
conservative.
And yes, agree we are going in circles. :). and yes, would love to hear
other opinions as well.
>
>
>>> That's it. I might be wrong of course.
>>
>>>> My patch avoids the need to modify these drivers. Besides it does this only
>>>> when the OFFLOAD flag is set.
>>> Yep, well in my reply to another patch of you series I expressed my
>>> feeling that the flag should be really checked in particular switch
>>> driver, not core. But I might be wrong there as well...
>> The bridge driver owns these attributes...and he needs to call the switchdev
>> api to offload.
>> And the condition for the switchdev api call is the offload flag. And the
>> offload flag is part of the switchdev api.
>> The drivers just set it on the netdev, they dont own the offload flag. So, I
>> don't see a reason why the core should not
>> know about the flag.
> I do not understand the formulation "own the offload flag". What I say
> is let the bridge/others to call the switchdev api unconditionally and
> let the leaf drivers handle that as they see fit, taking various facts
> into account, flags included. This way you avoid the need for flags
> inheritance in stacked scenarios. Imagine following example:
>
> bridge - bond --- eth1
> --- eth2
>
> eth1 and eth2 are switch ports. Now eth1 has the flag set and eth2 does
> not. Should the bond have the flag set or not? And if it has, eth2 need
> to check the flag as well to do not offload.
>
> Implementing the inheritance correctly would be a small nightmare. So I
> say, why don't just let the leafs to check and decide.
Note that i wasn't really considering inheritance as the major factor in
our argument.
If that needs to be dropped, i can consider that. But, the flag is there
for a reason: To not unconditionally call lowerdev ndo's.
ie reduce some overhead.
And in the above case, the bond does get the flag..because it has a
lowerdev with the feature flag.
And note that the inheritance does not come with a cost. This is all
existing code. If i hadn't mentioned it, it would probably go unnoticed ;).
I had to just add the below patch. And existing calls to
netdev_update_features in bridge ndo_add/del_slave takes care of
inheriting the feature flags from the slaves.
again, i did not introduce any complexity here. The feature flag is just
there to reduce some overhead in unnecessarily traversing all lowerdevs.
@@ -159,7 +161,8 @@ enum {
*/
#define NETIF_F_ONE_FOR_ALL (NETIF_F_GSO_SOFTWARE | NETIF_F_GSO_ROBUST | \
NETIF_F_SG | NETIF_F_HIGHDMA | \
- NETIF_F_FRAGLIST | NETIF_F_VLAN_CHALLENGED)
+ NETIF_F_FRAGLIST | NETIF_F_VLAN_CHALLENGED | \
+ NETIF_F_HW_NETFUNC_OFFLOAD)
/*
* If one device doesn't support one of these features, then disable it
* for all in netdev_increment_features.
>
>
>> What has been accepted in the kernel currently does not help bridge driver
>> offloading to switchdev. It does help if you want to manage your switch
>> device separately like you were already doing with nics. ie going to switch
>> port driver directly. It does not help the stacked device case either.
>>
>>
>> I will resubmit my series with the checkpatch errors you pointed out.
>>
>> And, am also looking at other ways to solve the problem.
>>
>> Thanks for the review.
>>
>>
>>>> It will not stop at adding the ndo_bridge_setlink to bond/macvlan etc. It
>>>> will be all other ndo_ops we will need for switch asics.
>>>> It will be l3 tomorrow, if the route is through a bond (But at that point, we
>>>> may end up having to introduce switch device instead of going to the port.
>>>> Lets see).
>>>>
>>>> Today this patch introduces an abstract way to get to the switch driver by
>>>> getting to the slave switch port (And only when the OFFLOAD flag is set).
>>>>
>>>>
>>>>>>> Let every "upper" to handle ndo_bridge_setlink their way. Sometimes it
>>>>>>> might not make sense to propagate to "lowers".
>>>>>> This does not really propagate to lowers. It is just trying to get to a
>>>>>> switch port and from there to the switch driver.
>>>>>> Example, bond driver does not need to care if its a bridge port. It will
>>>>>> simply pass the call to its slave which
>>>>>> might be a switch port.
>>>>>>
>>>>>> bond driver does not care if its a bridge port. But the switch driver cares,
>>>>>> because it knows that the bond was created with switch ports.
>>>>>>
>>>>>>
>>>>>>>> And this allows a switch driver to receive these callbacks if it has marked
>>>>>>>> the switch port with an offload flag. Your way of using the switch port to
>>>>>>>> get to the switch driver does not help in these cases.
>>>>>>> I do not follow how this is related to this case (stacked layout).
>>>>>>>
>>>>>>>> The other option is to use the 'switch device (not port)' to get to the
>>>>>>>> switch driver.
>>>>>>> That would not help this case (stacked layout) I believe.
>>>>>>>
>>>>>>>
>>>>>>>> This patch shows that you can still do this with the ndo ops.
>>>>>>>>>> + err = netdev_switch_port_bridge_setlink(lower_dev, nlh, flags);
>>>>>>>>>> + if (err)
>>>>>>>>>> + ret = err;
>>>>>>>>>> + }
>>>>>>>>> ^^^^^ Indent is off. This should be catched by scripts/checkpatch.pl.
>>>>>>>>>
>>>>>>>>>> +
>>>>>>>>>> + return ret;
>>>>>>>>>> +}
>>>>>>>>>> +EXPORT_SYMBOL(netdev_switch_port_bridge_setlink);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * netdev_switch_port_bridge_dellink - Notify switch device port of bridge
>>>>>>>>>> + * attribute delete
>>>>>>>>>> + *
>>>>>>>>>> + * @dev: port device
>>>>>>>>>> + * @nlh: netlink msg with bridge port attributes
>>>>>>>>>> + *
>>>>>>>>>> + * Notify switch device port of bridge port attribute delete
>>>>>>>>>> + */
>>>>>>>>>> +int netdev_switch_port_bridge_dellink(struct net_device *dev,
>>>>>>>>>> + struct nlmsghdr *nlh, u16 flags)
>>>>>>>>>> +{
>>>>>>>>>> + const struct net_device_ops *ops = dev->netdev_ops;
>>>>>>>>>> + struct net_device *lower_dev;
>>>>>>>>>> + struct list_head *iter;
>>>>>>>>>> + int ret = 0, err = 0;
>>>>>>>>>> +
>>>>>>>>>> + if (!(dev->features & NETIF_F_HW_NETFUNC_OFFLOAD))
>>>>>>>>>> + return err;
>>>>>>>>>> +
>>>>>>>>>> + if (ops->ndo_bridge_dellink) {
>>>>>>>>>> + WARN_ON(!ops->ndo_switch_parent_id_get);
>>>>>>>>>> + return ops->ndo_bridge_dellink(dev, nlh, flags);
>>>>>>>>>> + }
>>>>>>>>>> +
>>>>>>>>>> + netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>>>>>>>> + err = netdev_switch_port_bridge_dellink(lower_dev, nlh, flags);
>>>>>>>>>> + if (err)
>>>>>>>>>> + ret = err;
>>>>>>>>>> + }
>>>>>>>>>> +
>>>>>>>>>> + return ret;
>>>>>>>>>> +}
>>>>>>>>>> +EXPORT_SYMBOL(netdev_switch_port_bridge_dellink);
>>>>>>>>>> --
>>>>>>>>>> 1.7.10.4
>>>>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox