Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net] failover: eliminate callback hell
From: Michael S. Tsirkin @ 2018-06-05 19:38 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: kys, haiyangz, davem, sridhar.samudrala, netdev,
	Stephen Hemminger
In-Reply-To: <20180605115305.502a7ebb@xeon-e3>

On Tue, Jun 05, 2018 at 11:53:05AM -0700, Stephen Hemminger wrote:
> > >   * Now, netvsc and net_failover use the same delayed work type
> > >     mechanism for setup. Previously, net_failover code was triggering off
> > >     name change but a similar policy was rejected for netvsc.
> > >     "what is good for the goose is good for the gander"  
> > 
> > I don't really understand what you are saying here.  I think the delayed
> > hack is kind of ugly and seems racy.  Current failover code was rejected
> > by whom?  Why is new one good and for whom?  Did you want to do a name
> > change in netvsc but it was rejected? Could you clarify please?
> 
> See:
>    https://patchwork.ozlabs.org/patch/851711/

Let me try to summarize that:

	You wanted to speed up the delayed link up.  You had an idea to
	additionally take link up when userspace renames the interface (standby
	one which is also the failover for netvsc).

	But userspace might not do any renames, in which case there will
	still be the delay, and so this never got applied.

	Is this a good summary?

Davem said delay should go away completely as it's not robust, and I
think I agree.  So I don't think we should make all failover users use
delay. IIUC failover kept a delay option especially for netvsc to
minimize the surprise factor. Hopefully we can come up with
something more robust and drop that option completely.


> > >   * Set permanent and current address of net_failover device
> > >     to match the primary.
> > > 
> > >   * Carrier should be marked off before registering device
> > >     the net_failover device.  
> > 
> > Are above two bugfixes?
> 
> Yes.

Maybe fix these two as first patches in the set?

> > > Although this patch needs to go into 4.18 (linux-net),  
> > 
> > I'd rather we focused on fixing bugs in 4.18, and left refactoring to
> > 4.19.
> >
> 
> Either we fix or revert the current code in 4.18.
> Sorry, I am not having callback hell code in any vendor or upstream kernel.

I agree callbacks add complexity which often isn't necessary, so
removing them where possible is a good cleanup.  But maybe a patch
shouldn't mix bugfixes, cleanups and behaviour changes all together.  If
nothing else it makes review harder.  Splitting patches up might make it
more likely they can go into 4.18 which seems to be what you want.

HTH,
-- 
MST

^ permalink raw reply

* Re: [PATCH net-next v2 0/2] net: phy: improve PHY suspend/resume
From: Heiner Kallweit @ 2018-06-05 19:39 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org
In-Reply-To: <28a8846a-f7a7-3266-3aec-7f58fe8dac72@gmail.com>

On 02.06.2018 22:27, Heiner Kallweit wrote:
> On 01.06.2018 02:10, Andrew Lunn wrote:
>>> Configuring the different WoL options isn't handled by writing to
>>> the PHY registers but by writing to chip / MAC registers.
>>> Therefore phy_suspend() isn't able to figure out whether WoL is
>>> enabled or not. Only the parent has the full picture.
>>
>> Hi Heiner
>>
>> I think you need to look at your different runtime PM domains.  If i
>> understand the code right, you runtime suspend if there is no
>> link. But for this to work correctly, your PHY needs to keep working.
>> You also cannot assume all accesses to the PHY go via the MAC. Some
>> calls will go direct to the PHY, and they can trigger MDIO bus
>> accesses.  So i think you need two runtime PM domains. MAC and MDIO
>> bus.  Maybe just the pll? An MDIO bus is a device, so it can have its
>> on PM callbacks. It is not clear what you need to resume in order to
>> make MDIO work.
>>
> Thanks for your comments!
> The actual problem is quite small: I get an error at MDIO suspend,
> the PHY however is suspended later by the driver's suspend callback
> anyway. Because the problem is small I'm somewhat reluctant to
> consider bigger changes like introducing different PM domains.
> 
> Primary reason for the error is that the network chip is in PCI D3hot
> at that moment. In addition to that for some of the chips supported by
> the driver also MDIO-relevant PLL's might be disabled.
> 
> By the way:
> When checking PM handling for PHY/MDIO I stumbled across something
> that can be improved IMO, I'll send a patch for your review.
> 
I experimented a little and with adding Runtime PM to MDIO bus and
PHY device I can make it work:
PHY runtime-resumes before entering suspend and resumes its parent
(MDIO bus) which then resumes its parent (PCI device).
However this needs quite some code and is hard to read / understand
w/o reading through this mail thread.

And in general I still have doubts this is the right way. Let's
consider the following scenario:

A network driver configures WoL in its suspend callback
(incl. setting the device to wakeup-enabled).
The suspend callback of the PHY is called before this and therefore
has no clue that WoL will be configured a little later, with the
consequence that it will do an unsolicited power-down.
The network driver then has to detect this and power-up the PHY again.
This doesn't seem to make much sense and still my best idea is to
establish a mechanism that a device can state: I orchestrate PM
of my children.

Heiner

>> It might also help if you do the phy_connect in .ndo_open and
>> disconnect in .ndo_stop. This is a common pattern in drivers. But some
>> also do it is probe and remove.
>>
> Thanks for the hint. I will move phy_connect_direct accordingly.
> 
>>      Andrew
>>
> Heiner
> 

^ permalink raw reply

* [PATCH iproute2-next 1/1] tc: add json support in csum action
From: Keara Leibovitz @ 2018-06-05 19:30 UTC (permalink / raw)
  To: dsahern; +Cc: stephen, netdev, kernel, Keara Leibovitz

Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
---
 tc/m_csum.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/tc/m_csum.c b/tc/m_csum.c
index 8391071d73f2..67481667d9d2 100644
--- a/tc/m_csum.c
+++ b/tc/m_csum.c
@@ -162,6 +162,7 @@ print_csum(struct action_util *au, FILE *f, struct rtattr *arg)
 	char *uflag_5 = "";
 	char *uflag_6 = "";
 	char *uflag_7 = "";
+	char buf[64] = {0};
 
 	int uflag_count = 0;
 
@@ -198,12 +199,15 @@ print_csum(struct action_util *au, FILE *f, struct rtattr *arg)
 		uflag_1 = "?empty";
 	}
 
-	fprintf(f, "csum (%s%s%s%s%s%s%s) ",
-		uflag_1, uflag_2, uflag_3,
-		uflag_4, uflag_5, uflag_6, uflag_7);
+	snprintf(buf, sizeof(buf), "%s%s%s%s%s%s%s",
+		 uflag_1, uflag_2, uflag_3,
+		 uflag_4, uflag_5, uflag_6, uflag_7);
+	print_string(PRINT_ANY, "csum", "csum (%s) ", buf);
+
 	print_action_control(f, "action ", sel->action, "\n");
-	fprintf(f, "\tindex %u ref %d bind %d", sel->index, sel->refcnt,
-		sel->bindcnt);
+	print_uint(PRINT_ANY, "index", "\tindex %u", sel->index);
+	print_int(PRINT_ANY, "ref", " ref %d", sel->refcnt);
+	print_int(PRINT_ANY, "bind", " bind %d", sel->bindcnt);
 
 	if (show_stats) {
 		if (tb[TCA_CSUM_TM]) {
@@ -212,7 +216,7 @@ print_csum(struct action_util *au, FILE *f, struct rtattr *arg)
 			print_tm(f, tm);
 		}
 	}
-	fprintf(f, "\n");
+	print_string(PRINT_FP, NULL, "%s", "\n");
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH] r8169: Reinstate ALDPS and ASPM support
From: Heiner Kallweit @ 2018-06-05 19:41 UTC (permalink / raw)
  To: Florian Fainelli, Bjorn Helgaas, Ryankao
  Cc: Kai Heng Feng, jrg.otte@gmail.com, David Miller, Hayes Wang,
	romieu@fr.zoreil.com, Linux Netdev List,
	Linux Kernel Mailing List, Hau, linux-pci
In-Reply-To: <d50e04b3-87be-f300-b0ea-2da7e3660ebb@gmail.com>

On 05.06.2018 21:27, Florian Fainelli wrote:
> On 06/05/2018 12:17 PM, Heiner Kallweit wrote:
>> On 05.06.2018 21:11, Bjorn Helgaas wrote:
>>> [+cc linux-pci]
>>>
>>> On Tue, Jun 05, 2018 at 12:28:05PM -0500, Bjorn Helgaas wrote:
>>>> On Tue, Jun 05, 2018 at 06:34:09AM +0000, Ryankao wrote:
>>>>> Add realtek folk Hau
>>>>>
>>>>> -----Original Message-----
>>>>> From: Kai Heng Feng [mailto:kai.heng.feng@canonical.com] 
>>>>> Sent: Tuesday, June 05, 2018 1:02 PM
>>>>> To: jrg.otte@gmail.com
>>>>> Cc: David Miller <davem@davemloft.net>; Hayes Wang <hayeswang@realtek.com>; hkallweit1@gmail.com; romieu@fr.zoreil.com; Linux Netdev List <netdev@vger.kernel.org>; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; Ryankao <ryankao@realtek.com>
>>>>> Subject: Re: [PATCH] r8169: Reinstate ALDPS and ASPM support
>>>>>
>>>>> Hi Jörg Otte,
>>>>>
>>>>> Can you give this patch a try?
>>>>>
>>>>> Since you are the only one that reported ALDPS/ASPM regression,
>>>>>
>>>>> And I think this patch should solve the issue you had [1].
>>>>>
>>>>> Hopefully we don't need to go down the rabbit hole of blacklist/whitelist...
>>>>>
>>>>> Kai-Heng
>>>>>
>>>>> [1] https://lkml.org/lkml/2013/1/5/36
>>>>
>>>> I have no idea what ALDPS is.  It's not mentioned in the PCIe spec, so
>>>> presumably it's some Realtek-specific thing.  ASPM is a generic PCIe
>>>> thing.  Changes to these two things should be in separate patches so
>>>> they don't get tangled up.
>>>>
>> ALDPS = Advanced Link Down Power Saving
>> And yes, it's a Realtek feature.
> 
> Link as in Ethernet link or PCI(e) link? Sorry too lazy to let me google
> that for myself :)
> 
Sorry .. Link here refers to the Ethernet PHY.

^ permalink raw reply

* Re: [PATCH iproute2-next 1/1] tc: add json support in csum action
From: David Ahern @ 2018-06-05 19:56 UTC (permalink / raw)
  To: Keara Leibovitz, dsahern; +Cc: stephen, netdev, kernel
In-Reply-To: <1528227039-21831-1-git-send-email-kleib@mojatatu.com>

On 6/5/18 12:30 PM, Keara Leibovitz wrote:

please add some words here. e.g., add example output

> Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
> ---
>  tc/m_csum.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/tc/m_csum.c b/tc/m_csum.c
> index 8391071d73f2..67481667d9d2 100644
> --- a/tc/m_csum.c
> +++ b/tc/m_csum.c
> @@ -162,6 +162,7 @@ print_csum(struct action_util *au, FILE *f, struct rtattr *arg)
>  	char *uflag_5 = "";
>  	char *uflag_6 = "";
>  	char *uflag_7 = "";
> +	char buf[64] = {0};

initialization is not needed. It is set before use.

^ permalink raw reply

* [PATCH 1/1] net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
From: Arun Parameswaran @ 2018-06-05 20:38 UTC (permalink / raw)
  To: Florian Fainelli, Vivien Didelot, Andrew Lunn, David S. Miller
  Cc: netdev, linux-kernel, bcm-kernel-feedback-list, Arun Parameswaran

In the Broadcom Cygnus SoC, the brcm tag needs to be inserted
in between the mac address and the ether type (should use
'DSA_PROTO_TAG_BRCM') for the packets sent to the internal
b53 switch.

Since the Cygnus was added with the BCM58XX device id and the
BCM58XX uses 'DSA_PROTO_TAG_BRCM_PREPEND', the data path is
broken, due to the incorrect brcm tag location.

Add a new b53 device id (BCM583XX) for Cygnus family to fix the
issue. Add the new device id to the BCM58XX family as Cygnus
is similar to the BCM58XX in most other functionalities.

Fixes: 11606039604c ("net: dsa: b53: Support prepended Broadcom tags")

Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
---
 drivers/net/dsa/b53/b53_common.c | 15 ++++++++++++++-
 drivers/net/dsa/b53/b53_priv.h   |  2 ++
 drivers/net/dsa/b53/b53_srab.c   |  4 ++--
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 3da5fca..bbc6cc6 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -684,7 +684,8 @@ static int b53_switch_reset(struct b53_device *dev)
 	 * still use this driver as a library and need to perform the reset
 	 * earlier.
 	 */
-	if (dev->chip_id == BCM58XX_DEVICE_ID) {
+	if (dev->chip_id == BCM58XX_DEVICE_ID ||
+	    dev->chip_id == BCM583XX_DEVICE_ID) {
 		b53_read8(dev, B53_CTRL_PAGE, B53_SOFTRESET, &reg);
 		reg |= SW_RST | EN_SW_RST | EN_CH_RST;
 		b53_write8(dev, B53_CTRL_PAGE, B53_SOFTRESET, reg);
@@ -1880,6 +1881,18 @@ struct b53_chip_data {
 		.jumbo_size_reg = B53_JUMBO_MAX_SIZE,
 	},
 	{
+		.chip_id = BCM583XX_DEVICE_ID,
+		.dev_name = "BCM583xx/11360",
+		.vlans = 4096,
+		.enabled_ports = 0x103,
+		.arl_entries = 4,
+		.cpu_port = B53_CPU_PORT,
+		.vta_regs = B53_VTA_REGS,
+		.duplex_reg = B53_DUPLEX_STAT_GE,
+		.jumbo_pm_reg = B53_JUMBO_PORT_MASK,
+		.jumbo_size_reg = B53_JUMBO_MAX_SIZE,
+	},
+	{
 		.chip_id = BCM7445_DEVICE_ID,
 		.dev_name = "BCM7445",
 		.vlans	= 4096,
diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
index 3b57f47..b232aaa 100644
--- a/drivers/net/dsa/b53/b53_priv.h
+++ b/drivers/net/dsa/b53/b53_priv.h
@@ -62,6 +62,7 @@ enum {
 	BCM53018_DEVICE_ID = 0x53018,
 	BCM53019_DEVICE_ID = 0x53019,
 	BCM58XX_DEVICE_ID = 0x5800,
+	BCM583XX_DEVICE_ID = 0x58300,
 	BCM7445_DEVICE_ID = 0x7445,
 	BCM7278_DEVICE_ID = 0x7278,
 };
@@ -181,6 +182,7 @@ static inline int is5301x(struct b53_device *dev)
 static inline int is58xx(struct b53_device *dev)
 {
 	return dev->chip_id == BCM58XX_DEVICE_ID ||
+		dev->chip_id == BCM583XX_DEVICE_ID ||
 		dev->chip_id == BCM7445_DEVICE_ID ||
 		dev->chip_id == BCM7278_DEVICE_ID;
 }
diff --git a/drivers/net/dsa/b53/b53_srab.c b/drivers/net/dsa/b53/b53_srab.c
index c37ffd1..8247481 100644
--- a/drivers/net/dsa/b53/b53_srab.c
+++ b/drivers/net/dsa/b53/b53_srab.c
@@ -364,7 +364,7 @@ static int b53_srab_write64(struct b53_device *dev, u8 page, u8 reg,
 	{ .compatible = "brcm,bcm53018-srab" },
 	{ .compatible = "brcm,bcm53019-srab" },
 	{ .compatible = "brcm,bcm5301x-srab" },
-	{ .compatible = "brcm,bcm11360-srab", .data = (void *)BCM58XX_DEVICE_ID },
+	{ .compatible = "brcm,bcm11360-srab", .data = (void *)BCM583XX_DEVICE_ID },
 	{ .compatible = "brcm,bcm58522-srab", .data = (void *)BCM58XX_DEVICE_ID },
 	{ .compatible = "brcm,bcm58525-srab", .data = (void *)BCM58XX_DEVICE_ID },
 	{ .compatible = "brcm,bcm58535-srab", .data = (void *)BCM58XX_DEVICE_ID },
@@ -372,7 +372,7 @@ static int b53_srab_write64(struct b53_device *dev, u8 page, u8 reg,
 	{ .compatible = "brcm,bcm58623-srab", .data = (void *)BCM58XX_DEVICE_ID },
 	{ .compatible = "brcm,bcm58625-srab", .data = (void *)BCM58XX_DEVICE_ID },
 	{ .compatible = "brcm,bcm88312-srab", .data = (void *)BCM58XX_DEVICE_ID },
-	{ .compatible = "brcm,cygnus-srab", .data = (void *)BCM58XX_DEVICE_ID },
+	{ .compatible = "brcm,cygnus-srab", .data = (void *)BCM583XX_DEVICE_ID },
 	{ .compatible = "brcm,nsp-srab", .data = (void *)BCM58XX_DEVICE_ID },
 	{ /* sentinel */ },
 };
-- 
1.9.1

^ permalink raw reply related

* [PATCH iproute2-next v2 1/1] tc: add json support in csum action
From: Keara Leibovitz @ 2018-06-05 20:44 UTC (permalink / raw)
  To: dsahern; +Cc: stephen, netdev, kernel, Keara Leibovitz

Add json output support for checksum action.

Example output:

~$ $TC actions add action csum udp continue index 7
~$ $TC actions add action csum icmp iph igmp pipe index 200 cookie 112233
~$ $TC -j actions ls action csum

[{
    "total acts":2
}, {
    "actions": [{
        "order":0,
        "csum":"udp",
        "control_action": {
            "type":"continue"
        },
        "index":7,
        "ref":1,
        "bind":0
    }, {
        "order":1,
        "csum":"iph, icmp, igmp",
        "control_action": {
            "type":"pipe"
        },
        "index":200,
        "ref":1,
        "bind":0,
        "cookie":"112233"
    }]
}]

v2:
    Don't initialized char buf[64];
    Add output example
    
Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
---
 tc/m_csum.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/tc/m_csum.c b/tc/m_csum.c
index 8391071d73f2..0bdbcf361a28 100644
--- a/tc/m_csum.c
+++ b/tc/m_csum.c
@@ -162,6 +162,7 @@ print_csum(struct action_util *au, FILE *f, struct rtattr *arg)
 	char *uflag_5 = "";
 	char *uflag_6 = "";
 	char *uflag_7 = "";
+	char buf[64];
 
 	int uflag_count = 0;
 
@@ -198,12 +199,15 @@ print_csum(struct action_util *au, FILE *f, struct rtattr *arg)
 		uflag_1 = "?empty";
 	}
 
-	fprintf(f, "csum (%s%s%s%s%s%s%s) ",
-		uflag_1, uflag_2, uflag_3,
-		uflag_4, uflag_5, uflag_6, uflag_7);
+	snprintf(buf, sizeof(buf), "%s%s%s%s%s%s%s",
+		 uflag_1, uflag_2, uflag_3,
+		 uflag_4, uflag_5, uflag_6, uflag_7);
+	print_string(PRINT_ANY, "csum", "csum (%s) ", buf);
+
 	print_action_control(f, "action ", sel->action, "\n");
-	fprintf(f, "\tindex %u ref %d bind %d", sel->index, sel->refcnt,
-		sel->bindcnt);
+	print_uint(PRINT_ANY, "index", "\tindex %u", sel->index);
+	print_int(PRINT_ANY, "ref", " ref %d", sel->refcnt);
+	print_int(PRINT_ANY, "bind", " bind %d", sel->bindcnt);
 
 	if (show_stats) {
 		if (tb[TCA_CSUM_TM]) {
@@ -212,7 +216,7 @@ print_csum(struct action_util *au, FILE *f, struct rtattr *arg)
 			print_tm(f, tm);
 		}
 	}
-	fprintf(f, "\n");
+	print_string(PRINT_FP, NULL, "%s", "\n");
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH 1/1] net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
From: Scott Branden @ 2018-06-05 20:55 UTC (permalink / raw)
  To: Arun Parameswaran, Florian Fainelli, Vivien Didelot, Andrew Lunn,
	David S. Miller, Clément Péron
  Cc: netdev, linux-kernel, bcm-kernel-feedback-list
In-Reply-To: <1528231092-31472-1-git-send-email-arun.parameswaran@broadcom.com>

Clement,

Please test this works for your issue.


On 18-06-05 01:38 PM, Arun Parameswaran wrote:
> In the Broadcom Cygnus SoC, the brcm tag needs to be inserted
> in between the mac address and the ether type (should use
> 'DSA_PROTO_TAG_BRCM') for the packets sent to the internal
> b53 switch.
>
> Since the Cygnus was added with the BCM58XX device id and the
> BCM58XX uses 'DSA_PROTO_TAG_BRCM_PREPEND', the data path is
> broken, due to the incorrect brcm tag location.
>
> Add a new b53 device id (BCM583XX) for Cygnus family to fix the
> issue. Add the new device id to the BCM58XX family as Cygnus
> is similar to the BCM58XX in most other functionalities.
>
> Fixes: 11606039604c ("net: dsa: b53: Support prepended Broadcom tags")
>
> Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
Acked-by: Scott Branden <scott.branden@broadcom.com>
> ---
>   drivers/net/dsa/b53/b53_common.c | 15 ++++++++++++++-
>   drivers/net/dsa/b53/b53_priv.h   |  2 ++
>   drivers/net/dsa/b53/b53_srab.c   |  4 ++--
>   3 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
> index 3da5fca..bbc6cc6 100644
> --- a/drivers/net/dsa/b53/b53_common.c
> +++ b/drivers/net/dsa/b53/b53_common.c
> @@ -684,7 +684,8 @@ static int b53_switch_reset(struct b53_device *dev)
>   	 * still use this driver as a library and need to perform the reset
>   	 * earlier.
>   	 */
> -	if (dev->chip_id == BCM58XX_DEVICE_ID) {
> +	if (dev->chip_id == BCM58XX_DEVICE_ID ||
> +	    dev->chip_id == BCM583XX_DEVICE_ID) {
>   		b53_read8(dev, B53_CTRL_PAGE, B53_SOFTRESET, &reg);
>   		reg |= SW_RST | EN_SW_RST | EN_CH_RST;
>   		b53_write8(dev, B53_CTRL_PAGE, B53_SOFTRESET, reg);
> @@ -1880,6 +1881,18 @@ struct b53_chip_data {
>   		.jumbo_size_reg = B53_JUMBO_MAX_SIZE,
>   	},
>   	{
> +		.chip_id = BCM583XX_DEVICE_ID,
> +		.dev_name = "BCM583xx/11360",
> +		.vlans = 4096,
> +		.enabled_ports = 0x103,
> +		.arl_entries = 4,
> +		.cpu_port = B53_CPU_PORT,
> +		.vta_regs = B53_VTA_REGS,
> +		.duplex_reg = B53_DUPLEX_STAT_GE,
> +		.jumbo_pm_reg = B53_JUMBO_PORT_MASK,
> +		.jumbo_size_reg = B53_JUMBO_MAX_SIZE,
> +	},
> +	{
>   		.chip_id = BCM7445_DEVICE_ID,
>   		.dev_name = "BCM7445",
>   		.vlans	= 4096,
> diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
> index 3b57f47..b232aaa 100644
> --- a/drivers/net/dsa/b53/b53_priv.h
> +++ b/drivers/net/dsa/b53/b53_priv.h
> @@ -62,6 +62,7 @@ enum {
>   	BCM53018_DEVICE_ID = 0x53018,
>   	BCM53019_DEVICE_ID = 0x53019,
>   	BCM58XX_DEVICE_ID = 0x5800,
> +	BCM583XX_DEVICE_ID = 0x58300,
>   	BCM7445_DEVICE_ID = 0x7445,
>   	BCM7278_DEVICE_ID = 0x7278,
>   };
> @@ -181,6 +182,7 @@ static inline int is5301x(struct b53_device *dev)
>   static inline int is58xx(struct b53_device *dev)
>   {
>   	return dev->chip_id == BCM58XX_DEVICE_ID ||
> +		dev->chip_id == BCM583XX_DEVICE_ID ||
>   		dev->chip_id == BCM7445_DEVICE_ID ||
>   		dev->chip_id == BCM7278_DEVICE_ID;
>   }
> diff --git a/drivers/net/dsa/b53/b53_srab.c b/drivers/net/dsa/b53/b53_srab.c
> index c37ffd1..8247481 100644
> --- a/drivers/net/dsa/b53/b53_srab.c
> +++ b/drivers/net/dsa/b53/b53_srab.c
> @@ -364,7 +364,7 @@ static int b53_srab_write64(struct b53_device *dev, u8 page, u8 reg,
>   	{ .compatible = "brcm,bcm53018-srab" },
>   	{ .compatible = "brcm,bcm53019-srab" },
>   	{ .compatible = "brcm,bcm5301x-srab" },
> -	{ .compatible = "brcm,bcm11360-srab", .data = (void *)BCM58XX_DEVICE_ID },
> +	{ .compatible = "brcm,bcm11360-srab", .data = (void *)BCM583XX_DEVICE_ID },
>   	{ .compatible = "brcm,bcm58522-srab", .data = (void *)BCM58XX_DEVICE_ID },
>   	{ .compatible = "brcm,bcm58525-srab", .data = (void *)BCM58XX_DEVICE_ID },
>   	{ .compatible = "brcm,bcm58535-srab", .data = (void *)BCM58XX_DEVICE_ID },
> @@ -372,7 +372,7 @@ static int b53_srab_write64(struct b53_device *dev, u8 page, u8 reg,
>   	{ .compatible = "brcm,bcm58623-srab", .data = (void *)BCM58XX_DEVICE_ID },
>   	{ .compatible = "brcm,bcm58625-srab", .data = (void *)BCM58XX_DEVICE_ID },
>   	{ .compatible = "brcm,bcm88312-srab", .data = (void *)BCM58XX_DEVICE_ID },
> -	{ .compatible = "brcm,cygnus-srab", .data = (void *)BCM58XX_DEVICE_ID },
> +	{ .compatible = "brcm,cygnus-srab", .data = (void *)BCM583XX_DEVICE_ID },
>   	{ .compatible = "brcm,nsp-srab", .data = (void *)BCM58XX_DEVICE_ID },
>   	{ /* sentinel */ },
>   };

^ permalink raw reply

* Re: [PATCH 4/4] cpsw: add switchdev support
From: Grygorii Strashko @ 2018-06-05 21:03 UTC (permalink / raw)
  To: Ilias Apalodimas, Florian Fainelli
  Cc: Andrew Lunn, netdev, ivan.khoronzhuk, nsekhar, jiri, ivecera,
	francois.ozog, yogeshs, spatton
In-Reply-To: <20180602103432.GA948@apalos>



On 06/02/2018 05:34 AM, Ilias Apalodimas wrote:
> Hi Florian,
> 
> Thanks for taking time to look into this
> 
> On Fri, Jun 01, 2018 at 02:48:48PM -0700, Florian Fainelli wrote:
>>
>>
>> On 05/24/2018 09:56 PM, Ilias Apalodimas wrote:
>>> On Thu, May 24, 2018 at 06:39:04PM +0200, Andrew Lunn wrote:
>>>> On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote:
>>>>> On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote:
>>>>>> Device tree is supposed to describe the hardware. Using that hardware
>>>>>> in different ways is not something you should describe in DT.
>>>>>>
>>>>> The new switchdev mode is applied with a .config option in the kernel. What you
>>>>> see is pre-existing code, so i am not sure if i should change it in this
>>>>> patchset.
>>>>
>>>> If you break the code up into a library and two drivers, it becomes a
>>>> moot point.
>>> Agree
>>>
>>>>
>>>> But what i don't like here is that the device tree says to do dual
>>>> mac. But you ignore that and do sometime else. I would prefer that if
>>>> DT says dual mac, and switchdev is compiled in, the probe fails with
>>>> EINVAL. Rather than ignore something, make it clear it is invalid.
>>> The switch has 3 modes of operation as is.
>>> 1. switch mode, to enable that you don't need to add anything on
>>> the DTS and linux registers a single netdev interface.
>>> 2. dual mac mode, this is when you need to add dual_emac; on the DTS.
>>> 3. switchdev mode which is controlled by a .config option, since as you
>>> pointed out DTS was not made for controlling config options.
>>>
>>> I agree that this is far from beautiful. If the driver remains as in though,
>>> i'd prefer either keeping what's there or making "switchdev" a DTS option,
>>> following the pre-existing erroneous usage rather than making the device
>>> unusable.  If we end up returning some error and refuse to initialize, users
>>> that remote upgrade their equipment, without taking a good look at changelog,
>>> will loose access to their devices with no means of remotely fixing that.
>>
>> It seems to me that the mistake here is seeing multiple modes of
>> operations for the cpsw. There are not actually many, there is one
>> usage, and then there is what you can and cannot offload.

> CPSW has in fact 2 modes of operation, different FIFO usage/lookup entry(it's
> called ALE in the current driver) by-pass(which is used in dual emac for
> example) and other features. Again Grygorii is better suited to answer the
> exact differences.

dual_mac is HW enabled mode of operation, so having DT option is pretty
reasonable as for me. 
1) when enabled it configures internal FIFOs in special way so both
external Ports became equal in the direction toward to Host port 0.

TRM "The intention of this mode is to allow packets from both ethernet ports
to be written into the FIFO without one port starving the other port."

2) ALE, out of the box, does not support this mode and, as result, two
default vlan have to be created to direct traffic P1->P0 (vlan1) and
P2-P0 (vlan2), but any packets P0->P1 and P0->P2 have to be directed
(and will bypass ALE). This way traffic separated on cpsw egress towards to P0, 

>> The basic
>> premise with switchdev and DSA (which uses switchdev) is that each
>> user-facing port of your switch needs to work as if it were a normal
>> Ethernet NIC, that is what you call dual-MAC I believe. Then, when you
>> create a bridge and you enslave those ports into the bridge, you need to
>> have forwarding done in hardware between these two ports when the
>> SMAC/DMAC are not for the host/CPU/management interface and you must
>> simultaneously still have the host have the ability to send/receive
>> traffic through the bridge device.

TRM
"When operating in dual mac mode the intention is to transfer packets between ports 0 and 1 and ports 0
and 2, but not between ports 1 and 2. Each CPGMAC_SL appears as a single MAC with no bridging
between MAC’s. Each CPGMAC_SL has at least one unique (not the same) mac address."

So, we cant't talk about forwarding in dual_mac mode. Interfaces expected to be 
completely independent without any packet leaking between interfaces.

!! BUT !! CPSW ALE (switch core) still expected to be used, but for packet filtering
- only registered vlans
- only registered mcast/bcast
- ingress mcast/bcast rate limiting (it's actually more like coalescing -
 limits number of mcast/bcast packets per sec. 

And all offloading ALE (val/mdb) entries should always contain two ports in masks:
p1&p0 or p2&p0. Never ever all three ports. 

> Yes dual emac does that. But dual emac configures the port facing VLAN to the
> CPU port as well. So dual emac splits and uses 2 interfaces. VLAN 1 is
> configured on port1 + CPU port and VLAN 2 is confired on port 2 + CPU port
> That's exactly what the current RFC does as well, with the addition of
> registering a sw0p0 (i'll explain why later on this mail)
> A little more detail on the issue we are having. On my description
> sw0p0 -> CPU port, sw0p1 -> port 1 sw0p2 -> port 2. sw0p1/sw0p2 are the ports
> that have PHYs attached.
> 
> When we start in the new switchdev mode all interfaces are added to VLAN 0
> so CPU port + port1 + port2 are all in the same VLAN group. In that case sw0p1
> and sw0p2 are working as you describe. So those 2 interfaces can send/receive
> traffic normally which matches the switchdev case.
> 
> When we add them on a bridge(let's say br0), VLAN1(or any default bridge VLAN)
> is now configured on sw0p1 and sw0p2 but *not* on the CPU port.
>  From this point on the whole fuunctionality just collapses. The switch will
> work and offload traffic between sw0p1/sw0p2 but the bridge won't be able to
> get an ip address (since VLAN1 is not a member of the CPU port and the packet
> gets dropped).
> IGMPv2/V3 messages will never reach the br_multicast.c code to trigger
> switchdev and configure the MDBs on the ports.  i am prety sure there are other
> fail scenarios which i haven't discovered already, but those 2 are the most
> basic ones.  If we add VLAN1 on the CPU port, everything works as intended of
> course.
> 
> That's the reason we registered sw0p0 as the CPU port. It can't do any "real"
> traffic, but you can configure the CPU port independantly and not be forced to
> do an OR on every VLAN add/delete grouping the CPU port with your port command.
> The TL;DR version of this is that the switch is working exactly as switchdev is
> expecting offloading traffic to the hardware when possible as long as the CPU
> port is member of the proper VLANs
> 
> Petr's patch solves this for us (9c86ce2c1ae337fc10568a12aea812ed03de8319).
> We can now do "bridge vlan add dev br0 vid 100 pvid untagged self" and decide
> when to add the CPU port or not.
> 
> There are still a couple of cases that are not covered though, if we don't
> register the CPU port. We cant decide when to forward multicast
> traffic on the CPU port if a join hasn't been sent from br0.
> So let's say you got 2 hosts doing multicast and for whatever reason the host
> wants to see that traffic.
> With the CPU port present you can do a
> "bridge mdb add dev br0 port sw0p0 grp 239.1.1.1 permanent" which will offload
> the traffic to the CPU port and thus the host. If this goes away we are forced
> to send a join.
>   
>> It seems to me like this is entirely doable given that the dual MAC use
>> case is supported already.
>>
>> switchdev is just a stateless framework to get notified from the
>> networking stack about what you can possibly offload in hardware, so
>> having a DTS option gate that is unfortunately wrong because it is
>> really implementing a SW policy in DTS which is not what it is meant for.
> The DTS option for configuration pre-existed, i don't like that either switchdev
> mode is activated by a .config option not DTS(it just overrides whatever config
> you have on the DTS). Far from pretty though fair point, i am open to
> suggestions.

Again this is option describing HW mode which not expected to be changed on the fly.
I'm honestly, propose descope dual_mac and concentrate on switch mode (switchdev) - 
right now I don't see how dual_mac can be supported with switchdev as per above.
The same way as I do not see how we can re-use switchdev with 50% of devices which 
have "only one" user-facing external port (P1 or P2).


-- 
regards,
-grygorii

^ permalink raw reply

* umh build...
From: David Miller @ 2018-06-05 21:03 UTC (permalink / raw)
  To: alexei.starovoitov; +Cc: netdev


Alexei, I tried to build bpfilter on sparc64 and it shows that
CONFIG_OUTPUT_FORMAT is an x86 specific Kconfig value.

And, even if I added it, it's not clear what it should even be set to.

Right now, for example, my userland builds default to 32-bit sparc
even though I'm building a 64-bit kernel.

I think what ends up happening has to be in some way in response to
what kind of "native" binaries HOSTCC is actually building.

^ permalink raw reply

* Re: [PATCH iproute2-next v2 1/1] tc: add json support in csum action
From: Stephen Hemminger @ 2018-06-05 21:10 UTC (permalink / raw)
  To: Keara Leibovitz; +Cc: dsahern, netdev, kernel
In-Reply-To: <1528231459-26546-1-git-send-email-kleib@mojatatu.com>

On Tue,  5 Jun 2018 16:44:19 -0400
Keara Leibovitz <kleib@mojatatu.com> wrote:

> diff --git a/tc/m_csum.c b/tc/m_csum.c
> index 8391071d73f2..0bdbcf361a28 100644
> --- a/tc/m_csum.c
> +++ b/tc/m_csum.c
> @@ -162,6 +162,7 @@ print_csum(struct action_util *au, FILE *f, struct rtattr *arg)
>  	char *uflag_5 = "";
>  	char *uflag_6 = "";
>  	char *uflag_7 = "";
> +	char buf[64];

Looks good.
In iproute2 the common macro for declaring these kind of buffers is:
	SPRINT_BUF(buf);

Doesn't matter that much, but nice to have consistency.

^ permalink raw reply

* Re: [PATCH 0/4] RFC CPSW switchdev mode
From: Grygorii Strashko @ 2018-06-05 21:18 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton
In-Reply-To: <20180603000831.GA14515@lunn.ch>



On 06/02/2018 07:08 PM, Andrew Lunn wrote:
> On Sat, Jun 02, 2018 at 06:28:22PM -0500, Grygorii Strashko wrote:
> 
> Hi Grygorii
> 
> I'm just picking out one thing here... there is lots more good stuff here.
> 
>> Additional headache is PTP: we have on PHC, but both external interfaces P1/P2
>> can timestamp packets.
>   
> This should not be a problem. The Marvell switches have one PHC, but
> each port can time stamp packets using this counter. Each port has its
> own receive and transmit time stamp registers. So i don't think this
> will cause you problems.

I hope you are right - question is always in number of available options
and which one to select - and, most important, explain it to the end user :( 


For example:
phc_index is returned as part of .get_ts_info() = cpsw_get_ts_info(),
so which intf should return phc_index?

Still not tested, so jut hope ...


-- 
regards,
-grygorii

^ permalink raw reply

* Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format
From: Martin KaFai Lau @ 2018-06-05 21:25 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20180419194034.GB3254@kernel.org>

On Thu, Apr 19, 2018 at 04:40:34PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Apr 18, 2018 at 03:55:56PM -0700, Martin KaFai Lau escreveu:
> > This patch introduces BPF Type Format (BTF).
> > 
> > BTF (BPF Type Format) is the meta data format which describes
> > the data types of BPF program/map.  Hence, it basically focus
> > on the C programming language which the modern BPF is primary
> > using.  The first use case is to provide a generic pretty print
> > capability for a BPF map.
> > 
> > A modified pahole that can convert dwarf to BTF is here:
> > https://github.com/iamkafai/pahole/tree/btf
> > (Arnaldo, there is some BTF_KIND numbering changes on
> >  Apr 18th, d61426c1571)
> 
> Thanks for letting me know, I'm starting to look at this,
Hi Arnaldo,

Do you have a chance to take a look and pull it?  The kernel
changes will be in 4.18, so it will be handy if it is available in
the pahole repository.

[ btw, the latest commit (1 commit) should be 94a11b59e592 ].

Thanks,
Martin

> 
> - Arnaldo
>  
> > Please see individual patch for details.
> > 
> > v5:
> > - Remove BTF_KIND_FLOAT and BTF_KIND_FUNC which are not
> >   currently used.  They can be added in the future.
> >   Some bpf_df_xxx() are removed together.
> > - Add comment in patch 7 to clarify that the new bpffs_map_fops
> >   should not be extended further.
> > 
> > v4:
> > - Fix warning (remove unneeded semicolon)
> > - Remove a redundant variable (nr_bytes) from btf_int_check_meta() in
> >   patch 1.  Caught by W=1.
> > 
> > v3:
> > - Rebase to bpf-next
> > - Fix sparse warning (by adding static)
> > - Add BTF header logging: btf_verifier_log_hdr()
> > - Fix the alignment test on btf->type_off
> > - Add tests for the BTF header
> > - Lower the max BTF size to 16MB.  It should be enough
> >   for some time.  We could raise it later if it would
> >   be needed.
> > 
> > v2:
> > - Use kvfree where needed in patch 1 and 2
> > - Also consider BTF_INT_OFFSET() in the btf_int_check_meta()
> >   in patch 1
> > - Fix an incorrect goto target in map_create() during
> >   the btf-error-path in patch 7
> > - re-org some local vars to keep the rev xmas tree in btf.c
> > 
> > Martin KaFai Lau (10):
> >   bpf: btf: Introduce BPF Type Format (BTF)
> >   bpf: btf: Validate type reference
> >   bpf: btf: Check members of struct/union
> >   bpf: btf: Add pretty print capability for data with BTF type info
> >   bpf: btf: Add BPF_BTF_LOAD command
> >   bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd
> >   bpf: btf: Add pretty print support to the basic arraymap
> >   bpf: btf: Sync bpf.h and btf.h to tools/
> >   bpf: btf: Add BTF support to libbpf
> >   bpf: btf: Add BTF tests
> > 
> >  include/linux/bpf.h                          |   20 +-
> >  include/linux/btf.h                          |   48 +
> >  include/uapi/linux/bpf.h                     |   12 +
> >  include/uapi/linux/btf.h                     |  130 ++
> >  kernel/bpf/Makefile                          |    1 +
> >  kernel/bpf/arraymap.c                        |   50 +
> >  kernel/bpf/btf.c                             | 2064 ++++++++++++++++++++++++++
> >  kernel/bpf/inode.c                           |  156 +-
> >  kernel/bpf/syscall.c                         |   51 +-
> >  tools/include/uapi/linux/bpf.h               |   12 +
> >  tools/include/uapi/linux/btf.h               |  130 ++
> >  tools/lib/bpf/Build                          |    2 +-
> >  tools/lib/bpf/bpf.c                          |   92 +-
> >  tools/lib/bpf/bpf.h                          |   16 +
> >  tools/lib/bpf/btf.c                          |  374 +++++
> >  tools/lib/bpf/btf.h                          |   22 +
> >  tools/lib/bpf/libbpf.c                       |  148 +-
> >  tools/lib/bpf/libbpf.h                       |    3 +
> >  tools/testing/selftests/bpf/Makefile         |   26 +-
> >  tools/testing/selftests/bpf/test_btf.c       | 1669 +++++++++++++++++++++
> >  tools/testing/selftests/bpf/test_btf_haskv.c |   48 +
> >  tools/testing/selftests/bpf/test_btf_nokv.c  |   43 +
> >  22 files changed, 5076 insertions(+), 41 deletions(-)
> >  create mode 100644 include/linux/btf.h
> >  create mode 100644 include/uapi/linux/btf.h
> >  create mode 100644 kernel/bpf/btf.c
> >  create mode 100644 tools/include/uapi/linux/btf.h
> >  create mode 100644 tools/lib/bpf/btf.c
> >  create mode 100644 tools/lib/bpf/btf.h
> >  create mode 100644 tools/testing/selftests/bpf/test_btf.c
> >  create mode 100644 tools/testing/selftests/bpf/test_btf_haskv.c
> >  create mode 100644 tools/testing/selftests/bpf/test_btf_nokv.c
> > 
> > -- 
> > 2.9.5

^ permalink raw reply

* Re: [PATCH net] net: sched: cls: Fix offloading when ingress dev is vxlan
From: Jakub Kicinski @ 2018-06-05 21:27 UTC (permalink / raw)
  To: David Miller
  Cc: paulb, jiri, xiyou.wangcong, jhs, netdev, kliteyn, roid, shahark,
	markb, ogerlitz
In-Reply-To: <20180605.150640.12956507851401975.davem@davemloft.net>

On Tue, 05 Jun 2018 15:06:40 -0400 (EDT), David Miller wrote:
> From: Jakub Kicinski <kubakici@wp.pl>
> Date: Tue, 5 Jun 2018 11:57:47 -0700
> 
> > Do we still care about correctness and not breaking backward
> > compatibility?  
> 
> Jakub let me know if you want me to revert this change.

Yes, I think this patch introduces a regression when block is shared
between offload capable and in-capable device, therefore it should be
reverted.  Shared blocks went through a number of review cycles to
ensure such cases are handled correctly.


Longer story about the actual issue which is never explained in the
commit message is this: in kernels 4.10 - 4.14 skip_sw flag was
supported on tunnels in cls_flower only:

static int fl_hw_replace_filter(struct tcf_proto *tp,
[...]
	if (!tc_can_offload(dev)) {
		if (tcf_exts_get_dev(dev, &f->exts, &f->hw_dev) ||
		    (f->hw_dev && !tc_can_offload(f->hw_dev))) {
			f->hw_dev = dev;
			return tc_skip_sw(f->flags) ? -EINVAL : 0;
		}
		dev = f->hw_dev;
		cls_flower.egress_dev = true;
	} else {
		f->hw_dev = dev;
	}


In 4.15 - 4.17 with addition of shared blocks egdev mechanism was
promoted to a generic TC thing supported on all filters but it no
longer works with skip_sw.

I'd argue skip_sw is incorrect for tunnels, because the rule will only
apply to traffic ingressing on ASIC ports, not all traffic which hits
the tunnel netdev.  Therefore we should keep the 4.15 - 4.17 behaviour.

But that's a side note, I don't think we should be breaking offload on
shared blocks whether we decide to support skip_sw on tunnels or not.

^ permalink raw reply

* Re: [PATCH 0/4] RFC CPSW switchdev mode
From: Andrew Lunn @ 2018-06-05 21:28 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton
In-Reply-To: <327df2cb-a0ad-c272-9b03-066d16ac14b6@ti.com>

> I hope you are right - question is always in number of available options
> and which one to select - and, most important, explain it to the end user :( 

The end customer being ptp4linux? At least for Marvell switches, it is
happy about everything except that the switch is a bit slow, so we
need to modify some of the time outs in the configuration file.

> For example:
> phc_index is returned as part of .get_ts_info() = cpsw_get_ts_info(),
> so which intf should return phc_index?

It is not a 1:1 relationship. See:

https://elixir.bootlin.com/linux/latest/source/drivers/net/dsa/mv88e6xxx/hwtstamp.c#L61

All interfaces return the same index.

In fact, for a switch, having a PHC per port would be odd. That would
mean you need to sync the PHCs in order to act as a boundary clock.

    Andrew

^ permalink raw reply

* Re: [PATCH RFC net-next] net: ipvs: Adjust gso_size for IPPROTO_TCP
From: Martin KaFai Lau @ 2018-06-05 21:31 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: netdev, David Ahern, Tom Herbert, Eric Dumazet, Nikita Shirokov,
	kernel-team, lvs-devel
In-Reply-To: <alpine.LFD.2.20.1805051515450.1895@ja.home.ssi.bg>

On Sat, May 05, 2018 at 03:58:25PM +0300, Julian Anastasov wrote:
> 	So, except the RTF_LOCAL check in __ip6_rt_update_pmtu
> we should have no other issues. 
Hi Julian,

Do you have a chance to work on the IPv6 side?

Thanks,
Martin

^ permalink raw reply

* Re: [PATCH 0/4] RFC CPSW switchdev mode
From: Grygorii Strashko @ 2018-06-05 21:31 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton
In-Reply-To: <20180603003742.GC14515@lunn.ch>



On 06/02/2018 07:37 PM, Andrew Lunn wrote:
>> 1) boot, ping no vlan
>>
>> # ip link add name br0 type bridge
>> # echo 0 > /sys/class/net/br0/bridge/default_pvid
>> # ip link set dev eth2 master br0
>> # ip link set dev eth0 master br0
>> # ip link set dev eth1 master br0
>> # ifconfig br0 192.168.1.2
>>
>> *Note*: I've had to disable default_pvid as otherwise linux Bridge adds
>> and offloads default vlan 1, but default configuration for CPSW driver is vid 0.
>> +  CPSW specific - it can't untag packets for P0.
>> Another option I've found:
>> # ip link set dev br0 type bridge vlan_filtering 1.
>> but anyway, I've found it confusing that Linux bridge adds default vlan when vlan_filtering == 0
> 
> There are three different configurations here you need to worry about,
> with respect to vlans:
> 
> # CONFIG_VLAN_8021Q is not set
> 
> So you don't have any vlan support in the kernel.
> 
> CONFIG_VLAN_8021Q=y, vlan_filtering = 0
> 
> So you have vlans, but filtering is off
> 
> CONFIG_VLAN_8021Q=y, vlan_filtering = 1
> 
> So you have vlans, and filtering is on.
> 
> Even with vlan_filtering off, the bridge still does a little with
> vlans.
> 
> And you need all three to work correctly.
> 

Right, thanky for the info, but still (sry, to be annoying) why default vlan is added by bridge
when vlan_filtering == 0?

-- 
regards,
-grygorii

^ permalink raw reply

* Re: [PATCH 4/4] cpsw: add switchdev support
From: Florian Fainelli @ 2018-06-05 21:37 UTC (permalink / raw)
  To: Grygorii Strashko, Ilias Apalodimas
  Cc: Andrew Lunn, netdev, ivan.khoronzhuk, nsekhar, jiri, ivecera,
	francois.ozog, yogeshs, spatton
In-Reply-To: <e5832a2c-da63-d91d-e93a-51b7e84959a0@ti.com>

On 06/05/2018 02:03 PM, Grygorii Strashko wrote:
> 
> 
> On 06/02/2018 05:34 AM, Ilias Apalodimas wrote:
>> Hi Florian,
>>
>> Thanks for taking time to look into this
>>
>> On Fri, Jun 01, 2018 at 02:48:48PM -0700, Florian Fainelli wrote:
>>>
>>>
>>> On 05/24/2018 09:56 PM, Ilias Apalodimas wrote:
>>>> On Thu, May 24, 2018 at 06:39:04PM +0200, Andrew Lunn wrote:
>>>>> On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote:
>>>>>> On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote:
>>>>>>> Device tree is supposed to describe the hardware. Using that hardware
>>>>>>> in different ways is not something you should describe in DT.
>>>>>>>
>>>>>> The new switchdev mode is applied with a .config option in the kernel. What you
>>>>>> see is pre-existing code, so i am not sure if i should change it in this
>>>>>> patchset.
>>>>>
>>>>> If you break the code up into a library and two drivers, it becomes a
>>>>> moot point.
>>>> Agree
>>>>
>>>>>
>>>>> But what i don't like here is that the device tree says to do dual
>>>>> mac. But you ignore that and do sometime else. I would prefer that if
>>>>> DT says dual mac, and switchdev is compiled in, the probe fails with
>>>>> EINVAL. Rather than ignore something, make it clear it is invalid.
>>>> The switch has 3 modes of operation as is.
>>>> 1. switch mode, to enable that you don't need to add anything on
>>>> the DTS and linux registers a single netdev interface.
>>>> 2. dual mac mode, this is when you need to add dual_emac; on the DTS.
>>>> 3. switchdev mode which is controlled by a .config option, since as you
>>>> pointed out DTS was not made for controlling config options.
>>>>
>>>> I agree that this is far from beautiful. If the driver remains as in though,
>>>> i'd prefer either keeping what's there or making "switchdev" a DTS option,
>>>> following the pre-existing erroneous usage rather than making the device
>>>> unusable.  If we end up returning some error and refuse to initialize, users
>>>> that remote upgrade their equipment, without taking a good look at changelog,
>>>> will loose access to their devices with no means of remotely fixing that.
>>>
>>> It seems to me that the mistake here is seeing multiple modes of
>>> operations for the cpsw. There are not actually many, there is one
>>> usage, and then there is what you can and cannot offload.
> 
>> CPSW has in fact 2 modes of operation, different FIFO usage/lookup entry(it's
>> called ALE in the current driver) by-pass(which is used in dual emac for
>> example) and other features. Again Grygorii is better suited to answer the
>> exact differences.
> 
> dual_mac is HW enabled mode of operation, so having DT option is pretty
> reasonable as for me. 
> 1) when enabled it configures internal FIFOs in special way so both
> external Ports became equal in the direction toward to Host port 0.
> 
> TRM "The intention of this mode is to allow packets from both ethernet ports
> to be written into the FIFO without one port starving the other port."
> 
> 2) ALE, out of the box, does not support this mode and, as result, two
> default vlan have to be created to direct traffic P1->P0 (vlan1) and
> P2-P0 (vlan2), but any packets P0->P1 and P0->P2 have to be directed
> (and will bypass ALE). This way traffic separated on cpsw egress towards to P0, 
> 
>>> The basic
>>> premise with switchdev and DSA (which uses switchdev) is that each
>>> user-facing port of your switch needs to work as if it were a normal
>>> Ethernet NIC, that is what you call dual-MAC I believe. Then, when you
>>> create a bridge and you enslave those ports into the bridge, you need to
>>> have forwarding done in hardware between these two ports when the
>>> SMAC/DMAC are not for the host/CPU/management interface and you must
>>> simultaneously still have the host have the ability to send/receive
>>> traffic through the bridge device.
> 
> TRM
> "When operating in dual mac mode the intention is to transfer packets between ports 0 and 1 and ports 0
> and 2, but not between ports 1 and 2. Each CPGMAC_SL appears as a single MAC with no bridging
> between MAC’s. Each CPGMAC_SL has at least one unique (not the same) mac address."
> 
> So, we cant't talk about forwarding in dual_mac mode. Interfaces expected to be 
> completely independent without any packet leaking between interfaces.
> 
> !! BUT !! CPSW ALE (switch core) still expected to be used, but for packet filtering
> - only registered vlans
> - only registered mcast/bcast
> - ingress mcast/bcast rate limiting (it's actually more like coalescing -
>  limits number of mcast/bcast packets per sec. 
> 
> And all offloading ALE (val/mdb) entries should always contain two ports in masks:
> p1&p0 or p2&p0. Never ever all three ports. 
> 
>> Yes dual emac does that. But dual emac configures the port facing VLAN to the
>> CPU port as well. So dual emac splits and uses 2 interfaces. VLAN 1 is
>> configured on port1 + CPU port and VLAN 2 is confired on port 2 + CPU port
>> That's exactly what the current RFC does as well, with the addition of
>> registering a sw0p0 (i'll explain why later on this mail)
>> A little more detail on the issue we are having. On my description
>> sw0p0 -> CPU port, sw0p1 -> port 1 sw0p2 -> port 2. sw0p1/sw0p2 are the ports
>> that have PHYs attached.
>>
>> When we start in the new switchdev mode all interfaces are added to VLAN 0
>> so CPU port + port1 + port2 are all in the same VLAN group. In that case sw0p1
>> and sw0p2 are working as you describe. So those 2 interfaces can send/receive
>> traffic normally which matches the switchdev case.
>>
>> When we add them on a bridge(let's say br0), VLAN1(or any default bridge VLAN)
>> is now configured on sw0p1 and sw0p2 but *not* on the CPU port.
>>  From this point on the whole fuunctionality just collapses. The switch will
>> work and offload traffic between sw0p1/sw0p2 but the bridge won't be able to
>> get an ip address (since VLAN1 is not a member of the CPU port and the packet
>> gets dropped).
>> IGMPv2/V3 messages will never reach the br_multicast.c code to trigger
>> switchdev and configure the MDBs on the ports.  i am prety sure there are other
>> fail scenarios which i haven't discovered already, but those 2 are the most
>> basic ones.  If we add VLAN1 on the CPU port, everything works as intended of
>> course.
>>
>> That's the reason we registered sw0p0 as the CPU port. It can't do any "real"
>> traffic, but you can configure the CPU port independantly and not be forced to
>> do an OR on every VLAN add/delete grouping the CPU port with your port command.
>> The TL;DR version of this is that the switch is working exactly as switchdev is
>> expecting offloading traffic to the hardware when possible as long as the CPU
>> port is member of the proper VLANs
>>
>> Petr's patch solves this for us (9c86ce2c1ae337fc10568a12aea812ed03de8319).
>> We can now do "bridge vlan add dev br0 vid 100 pvid untagged self" and decide
>> when to add the CPU port or not.
>>
>> There are still a couple of cases that are not covered though, if we don't
>> register the CPU port. We cant decide when to forward multicast
>> traffic on the CPU port if a join hasn't been sent from br0.
>> So let's say you got 2 hosts doing multicast and for whatever reason the host
>> wants to see that traffic.
>> With the CPU port present you can do a
>> "bridge mdb add dev br0 port sw0p0 grp 239.1.1.1 permanent" which will offload
>> the traffic to the CPU port and thus the host. If this goes away we are forced
>> to send a join.
>>   
>>> It seems to me like this is entirely doable given that the dual MAC use
>>> case is supported already.
>>>
>>> switchdev is just a stateless framework to get notified from the
>>> networking stack about what you can possibly offload in hardware, so
>>> having a DTS option gate that is unfortunately wrong because it is
>>> really implementing a SW policy in DTS which is not what it is meant for.
>> The DTS option for configuration pre-existed, i don't like that either switchdev
>> mode is activated by a .config option not DTS(it just overrides whatever config
>> you have on the DTS). Far from pretty though fair point, i am open to
>> suggestions.
> 
> Again this is option describing HW mode which not expected to be changed on the fly.
> I'm honestly, propose descope dual_mac and concentrate on switch mode (switchdev) - 
> right now I don't see how dual_mac can be supported with switchdev as per above.
> The same way as I do not see how we can re-use switchdev with 50% of devices which 
> have "only one" user-facing external port (P1 or P2).

This is still not appropriate for Device Tree, because this is
completely orthogonal from one another. Also, I think you tend to
conflate what the switch can do, with in which mode does it start by
default, which are, again, two different things.
-- 
Florian

^ permalink raw reply

* Re: [PATCH 0/4] RFC CPSW switchdev mode
From: Andrew Lunn @ 2018-06-05 21:37 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton
In-Reply-To: <05111013-4986-b1f2-a2d1-9cab48be1ce5@ti.com>

> Right, thanky for the info, but still (sry, to be annoying) why default vlan is added by bridge
> when vlan_filtering == 0?

I have no idea!

I just made sure the Marvell driver works with this.

You might want to do a git blame and find out who added it, and it
might say why.

      Andrew

^ permalink raw reply

* Re: [PATCH 1/1] net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
From: Florian Fainelli @ 2018-06-05 21:41 UTC (permalink / raw)
  To: Arun Parameswaran, Vivien Didelot, Andrew Lunn, David S. Miller
  Cc: netdev, linux-kernel, bcm-kernel-feedback-list,
	Clément Péron
In-Reply-To: <1528231092-31472-1-git-send-email-arun.parameswaran@broadcom.com>

On 06/05/2018 01:38 PM, Arun Parameswaran wrote:
> In the Broadcom Cygnus SoC, the brcm tag needs to be inserted
> in between the mac address and the ether type (should use
> 'DSA_PROTO_TAG_BRCM') for the packets sent to the internal
> b53 switch.
> 
> Since the Cygnus was added with the BCM58XX device id and the
> BCM58XX uses 'DSA_PROTO_TAG_BRCM_PREPEND', the data path is
> broken, due to the incorrect brcm tag location.
> 
> Add a new b53 device id (BCM583XX) for Cygnus family to fix the
> issue. Add the new device id to the BCM58XX family as Cygnus
> is similar to the BCM58XX in most other functionalities.
> 
> Fixes: 11606039604c ("net: dsa: b53: Support prepended Broadcom tags")
> 
> Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com>

Clement originally reported this to me/us:

Reported-by: Clément Péron <peron.clem@gmail.com>

I completely overlooked that when adding support for prepended Broadcom
tags, thanks for the fix Arun!

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

David, can you also queue this up for -stable? Thank you

> ---
>  drivers/net/dsa/b53/b53_common.c | 15 ++++++++++++++-
>  drivers/net/dsa/b53/b53_priv.h   |  2 ++
>  drivers/net/dsa/b53/b53_srab.c   |  4 ++--
>  3 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
> index 3da5fca..bbc6cc6 100644
> --- a/drivers/net/dsa/b53/b53_common.c
> +++ b/drivers/net/dsa/b53/b53_common.c
> @@ -684,7 +684,8 @@ static int b53_switch_reset(struct b53_device *dev)
>  	 * still use this driver as a library and need to perform the reset
>  	 * earlier.
>  	 */
> -	if (dev->chip_id == BCM58XX_DEVICE_ID) {
> +	if (dev->chip_id == BCM58XX_DEVICE_ID ||
> +	    dev->chip_id == BCM583XX_DEVICE_ID) {
>  		b53_read8(dev, B53_CTRL_PAGE, B53_SOFTRESET, &reg);
>  		reg |= SW_RST | EN_SW_RST | EN_CH_RST;
>  		b53_write8(dev, B53_CTRL_PAGE, B53_SOFTRESET, reg);
> @@ -1880,6 +1881,18 @@ struct b53_chip_data {
>  		.jumbo_size_reg = B53_JUMBO_MAX_SIZE,
>  	},
>  	{
> +		.chip_id = BCM583XX_DEVICE_ID,
> +		.dev_name = "BCM583xx/11360",
> +		.vlans = 4096,
> +		.enabled_ports = 0x103,
> +		.arl_entries = 4,
> +		.cpu_port = B53_CPU_PORT,
> +		.vta_regs = B53_VTA_REGS,
> +		.duplex_reg = B53_DUPLEX_STAT_GE,
> +		.jumbo_pm_reg = B53_JUMBO_PORT_MASK,
> +		.jumbo_size_reg = B53_JUMBO_MAX_SIZE,
> +	},
> +	{
>  		.chip_id = BCM7445_DEVICE_ID,
>  		.dev_name = "BCM7445",
>  		.vlans	= 4096,
> diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
> index 3b57f47..b232aaa 100644
> --- a/drivers/net/dsa/b53/b53_priv.h
> +++ b/drivers/net/dsa/b53/b53_priv.h
> @@ -62,6 +62,7 @@ enum {
>  	BCM53018_DEVICE_ID = 0x53018,
>  	BCM53019_DEVICE_ID = 0x53019,
>  	BCM58XX_DEVICE_ID = 0x5800,
> +	BCM583XX_DEVICE_ID = 0x58300,
>  	BCM7445_DEVICE_ID = 0x7445,
>  	BCM7278_DEVICE_ID = 0x7278,
>  };
> @@ -181,6 +182,7 @@ static inline int is5301x(struct b53_device *dev)
>  static inline int is58xx(struct b53_device *dev)
>  {
>  	return dev->chip_id == BCM58XX_DEVICE_ID ||
> +		dev->chip_id == BCM583XX_DEVICE_ID ||
>  		dev->chip_id == BCM7445_DEVICE_ID ||
>  		dev->chip_id == BCM7278_DEVICE_ID;
>  }
> diff --git a/drivers/net/dsa/b53/b53_srab.c b/drivers/net/dsa/b53/b53_srab.c
> index c37ffd1..8247481 100644
> --- a/drivers/net/dsa/b53/b53_srab.c
> +++ b/drivers/net/dsa/b53/b53_srab.c
> @@ -364,7 +364,7 @@ static int b53_srab_write64(struct b53_device *dev, u8 page, u8 reg,
>  	{ .compatible = "brcm,bcm53018-srab" },
>  	{ .compatible = "brcm,bcm53019-srab" },
>  	{ .compatible = "brcm,bcm5301x-srab" },
> -	{ .compatible = "brcm,bcm11360-srab", .data = (void *)BCM58XX_DEVICE_ID },
> +	{ .compatible = "brcm,bcm11360-srab", .data = (void *)BCM583XX_DEVICE_ID },
>  	{ .compatible = "brcm,bcm58522-srab", .data = (void *)BCM58XX_DEVICE_ID },
>  	{ .compatible = "brcm,bcm58525-srab", .data = (void *)BCM58XX_DEVICE_ID },
>  	{ .compatible = "brcm,bcm58535-srab", .data = (void *)BCM58XX_DEVICE_ID },
> @@ -372,7 +372,7 @@ static int b53_srab_write64(struct b53_device *dev, u8 page, u8 reg,
>  	{ .compatible = "brcm,bcm58623-srab", .data = (void *)BCM58XX_DEVICE_ID },
>  	{ .compatible = "brcm,bcm58625-srab", .data = (void *)BCM58XX_DEVICE_ID },
>  	{ .compatible = "brcm,bcm88312-srab", .data = (void *)BCM58XX_DEVICE_ID },
> -	{ .compatible = "brcm,cygnus-srab", .data = (void *)BCM58XX_DEVICE_ID },
> +	{ .compatible = "brcm,cygnus-srab", .data = (void *)BCM583XX_DEVICE_ID },
>  	{ .compatible = "brcm,nsp-srab", .data = (void *)BCM58XX_DEVICE_ID },
>  	{ /* sentinel */ },
>  };
> 


-- 
Florian

^ permalink raw reply

* Re: [PATCH 0/4] RFC CPSW switchdev mode
From: Grygorii Strashko @ 2018-06-05 21:42 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton
In-Reply-To: <20180605212840.GA3796@lunn.ch>



On 06/05/2018 04:28 PM, Andrew Lunn wrote:
>> I hope you are right - question is always in number of available options
>> and which one to select - and, most important, explain it to the end user :(
> 
> The end customer being ptp4linux? At least for Marvell switches, it is
> happy about everything except that the switch is a bit slow, so we
> need to modify some of the time outs in the configuration file.
> 
>> For example:
>> phc_index is returned as part of .get_ts_info() = cpsw_get_ts_info(),
>> so which intf should return phc_index?
> 
> It is not a 1:1 relationship. See:
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/net/dsa/mv88e6xxx/hwtstamp.c#L61
> 
> All interfaces return the same index.
> 
> In fact, for a switch, having a PHC per port would be odd. That would
> mean you need to sync the PHCs in order to act as a boundary clock.

PHC only one, but hw timestamping blocks are per port.

-- 
regards,
-grygorii

^ permalink raw reply

* Re: [PATCH 0/4] RFC CPSW switchdev mode
From: Andrew Lunn @ 2018-06-05 21:55 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton
In-Reply-To: <3531006d-6cdf-f0a1-a0ce-042194aece45@ti.com>

> PHC only one, but hw timestamping blocks are per port.

Yes, same as the Marvell. Per port, there are two receive time stamps
and one transmit time stamp.

    Andrew

^ permalink raw reply

* [PATCH net] net: ipv6: ip6_output: alloc skb with tailroom
From: Alexander Aring @ 2018-06-05 22:04 UTC (permalink / raw)
  To: netdev
  Cc: davem, yoshfuji, david.palma, rabinarayans0828, jhs, stefan,
	linux-wpan, kernel, Alexander Aring

This patch adds care about tailroom length for allocate a skb from ipv6
level stack. In case of 6lowpan we had the problem the skb runs into a
skb_over_panic() in some special length cases. The root was there was no
tailroom allocated for the IEEE 802.15.4 checksum, although we had
the necessary tailroom specified inside the netdev structure.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195059
Reported-by: David Palma <david.palma@ntnu.no>
Reported-by: Rabi Narayan Sahoo <rabinarayans0828@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
---
Hi,

nasty bug, I suppose this is the correct fix to my last question for
what dev->needed_tailroom is designed for.

I added two Reported-by here, David Palma reported this bug one year ago
and I didn't had time to investigate. Interesting is that he told this
bug doesn't occur (in case of 6lowpan 802.15.4) on 32 bit systems.
Maybe alignment is related to the question why it works on 32 bit.

Anyway, a week ago "Rabi Narayan Sahoo" reported the bug again and I
needed to investigate something "why", since I also use a 64 bit vm.

David Palma did a nice job for reproduce this bug and he (I think) lives
at least one year with it, so I put him at first.

Anyway, Rabi Narayan Sahoo was very very close to fix it and found the
right code part which I also found. I read his mail afterwards because
it was received messed on the linux-wpan mailinglist. So it's correct
to give him credits too. :-)

I hope there are no other cases where tailroom is missing.
The second one is not needed to fix my bug but I think we need it there.
Also hh_len is also used inside a skb_resever() in this function,
but this is for headroom only.

- Alex

 net/ipv6/ip6_output.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 7b6d1689087b..b4e521cfe3cf 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1262,6 +1262,7 @@ static int __ip6_append_data(struct sock *sk,
 	int exthdrlen = 0;
 	int dst_exthdrlen = 0;
 	int hh_len;
+	int t_len;
 	int copy;
 	int err;
 	int offset = 0;
@@ -1283,6 +1284,7 @@ static int __ip6_append_data(struct sock *sk,
 	orig_mtu = mtu;
 
 	hh_len = LL_RESERVED_SPACE(rt->dst.dev);
+	t_len = rt->dst.dev->needed_tailroom;
 
 	fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
 			(opt ? opt->opt_nflen : 0);
@@ -1425,13 +1427,13 @@ static int __ip6_append_data(struct sock *sk,
 			}
 			if (transhdrlen) {
 				skb = sock_alloc_send_skb(sk,
-						alloclen + hh_len,
+						alloclen + hh_len + t_len,
 						(flags & MSG_DONTWAIT), &err);
 			} else {
 				skb = NULL;
 				if (refcount_read(&sk->sk_wmem_alloc) + wmem_alloc_delta <=
 				    2 * sk->sk_sndbuf)
-					skb = alloc_skb(alloclen + hh_len,
+					skb = alloc_skb(alloclen + hh_len + t_len,
 							sk->sk_allocation);
 				if (unlikely(!skb))
 					err = -ENOBUFS;
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH bpf-next v4 1/2] trace_helpers.c: Add helpers to poll multiple perf FDs for events
From: Jakub Kicinski @ 2018-06-05 22:16 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: netdev
In-Reply-To: <152821020087.23694.8231039605257373797.stgit@alrua-kau>

On Tue, 05 Jun 2018 16:50:00 +0200, Toke Høiland-Jørgensen wrote:
> Add two new helper functions to trace_helpers that supports polling
> multiple perf file descriptors for events. These are used to the XDP
> perf_event_output example, which needs to work with one perf fd per CPU.
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
> ---
>  tools/testing/selftests/bpf/trace_helpers.c |   47 ++++++++++++++++++++++++++-
>  tools/testing/selftests/bpf/trace_helpers.h |    4 ++
>  2 files changed, 49 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c
> index 3868dcb63420..1e62d89f34cf 100644
> --- a/tools/testing/selftests/bpf/trace_helpers.c
> +++ b/tools/testing/selftests/bpf/trace_helpers.c
> @@ -88,7 +88,7 @@ static int page_size;
>  static int page_cnt = 8;
>  static struct perf_event_mmap_page *header;
>  
> -int perf_event_mmap(int fd)
> +int perf_event_mmap_header(int fd, struct perf_event_mmap_page **header)
>  {
>  	void *base;
>  	int mmap_size;
> @@ -102,10 +102,15 @@ int perf_event_mmap(int fd)
>  		return -1;
>  	}
>  
> -	header = base;
> +	*header = base;
>  	return 0;
>  }
>  
> +int perf_event_mmap(int fd)
> +{
> +	return perf_event_mmap_header(fd, &header);
> +}
> +
>  static int perf_event_poll(int fd)
>  {
>  	struct pollfd pfd = { .fd = fd, .events = POLLIN };
> @@ -163,3 +168,41 @@ int perf_event_poller(int fd, perf_event_print_fn output_fn)
>  
>  	return ret;
>  }
> +
> +int perf_event_poller_multi(int *fds, struct perf_event_mmap_page **headers,
> +			    int num_fds, perf_event_print_fn output_fn)
> +{
> +	enum bpf_perf_event_ret ret;
> +	struct pollfd *pfds;
> +	void *buf = NULL;
> +	size_t len = 0;
> +	int i;
> +
> +	pfds = malloc(sizeof(*pfds) * num_fds);
> +	if (!pfds)
> +		return -1;

nit: this is correct, but better use LIBBPF_PERF_EVENT_ERROR?  This
function is supposed to return enum bpf_perf_event_ret values I assume?
In case someone moves this code to libbpf later on...

> +	memset(pfds, 0, sizeof(*pfds) * num_fds);
> +	for (i = 0; i < num_fds; i++) {
> +		pfds[i].fd = fds[i];
> +		pfds[i].events = POLLIN;
> +	}
> +
> +	for (;;) {
> +		poll(pfds, num_fds, 1000);
> +		for (i = 0; i < num_fds; i++) {
> +			if (pfds[i].revents) {

nit: save yourself a lever of indentation by doing:

if (!pfds[i].revents)
	continue;

and you won't have to go over 80 chars.

> +				ret = bpf_perf_event_read_simple(headers[i], page_cnt * page_size,
> +								page_size, &buf, &len,
> +								bpf_perf_event_print,
> +								output_fn);
> +				if (ret != LIBBPF_PERF_EVENT_CONT)
> +					break;
> +			}
> +		}
> +	}
> +	free(buf);
> +	free(pfds);
> +
> +	return ret;
> +}

^ permalink raw reply

* Re: [PATCH bpf-next v4 2/2] samples/bpf: Add xdp_sample_pkts example
From: Jakub Kicinski @ 2018-06-05 22:24 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: netdev
In-Reply-To: <152821020092.23694.4231922206556589059.stgit@alrua-kau>

On Tue, 05 Jun 2018 16:50:00 +0200, Toke Høiland-Jørgensen wrote:
> Add an example program showing how to sample packets from XDP using the
> perf event buffer. The example userspace program just prints the ethernet
> header for every packet sampled.
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>

> diff --git a/samples/bpf/xdp_sample_pkts_kern.c b/samples/bpf/xdp_sample_pkts_kern.c
> new file mode 100644
> index 000000000000..4560522ca015
> --- /dev/null
> +++ b/samples/bpf/xdp_sample_pkts_kern.c
> @@ -0,0 +1,62 @@
> +#include <linux/ptrace.h>
> +#include <linux/version.h>
> +#include <uapi/linux/bpf.h>
> +#include "bpf_helpers.h"
> +
> +#define SAMPLE_SIZE 64ul
> +#define MAX_CPUS 24

That may be a lil' too few for modern HW with hyper-threading on ;)  
My development machine says:

$ ncpus
28

128, maybe?

> +#define bpf_printk(fmt, ...)					\
> +({								\
> +	       char ____fmt[] = fmt;				\
> +	       bpf_trace_printk(____fmt, sizeof(____fmt),	\
> +				##__VA_ARGS__);			\
> +})
> +
> +struct bpf_map_def SEC("maps") my_map = {
> +	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
> +	.key_size = sizeof(int),
> +	.value_size = sizeof(u32),
> +	.max_entries = MAX_CPUS,
> +};
> +
> +SEC("xdp_sample")
> +int xdp_sample_prog(struct xdp_md *ctx)
> +{
> +	void *data_end = (void *)(long)ctx->data_end;
> +	void *data = (void *)(long)ctx->data;
> +
> +        /* Metadata will be in the perf event before the packet data. */
> +	struct S {
> +		u16 cookie;
> +		u16 pkt_len;
> +	} __attribute__((packed)) metadata;
> +
> +	if (data + SAMPLE_SIZE < data_end) {
> +		/* The XDP perf_event_output handler will use the upper 32 bits
> +		 * of the flags argument as a number of bytes to include of the
> +		 * packet payload in the event data. If the size is too big, the
> +		 * call to bpf_perf_event_output will fail and return -EFAULT.
> +		 *
> +		 * See bpf_xdp_event_output in net/core/filter.c.
> +		 *
> +		 * The BPF_F_CURRENT_CPU flag means that the event output fd
> +		 * will be indexed by the CPU number in the event map.
> +		 */
> +		u64 flags = (SAMPLE_SIZE << 32) | BPF_F_CURRENT_CPU;
> +		int ret;
> +
> +		metadata.cookie = 0xdead;
> +		metadata.pkt_len = (u16)(data_end - data);
> +
> +		ret = bpf_perf_event_output(ctx, &my_map, flags,
> +				      &metadata, sizeof(metadata));
> +		if(ret)

Please run checkpatch --strict on the samples.

> +			bpf_printk("perf_event_output failed: %d\n", ret);
> +	}
> +
> +	return XDP_PASS;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> +u32 _version SEC("version") = LINUX_VERSION_CODE;

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox