Netdev List
 help / color / mirror / Atom feed
* Re: WARNING: CPU: 3 PID: 0 at net/sched/sch_hfsc.c:1388 hfsc_dequeue+0x319/0x350 [sch_hfsc]
From: Cong Wang @ 2018-06-18 19:28 UTC (permalink / raw)
  To: Marco Berizzi; +Cc: Linux Kernel Network Developers
In-Reply-To: <1938577710.309243.1528180788286@mail.libero.it>

[-- Attachment #1: Type: text/plain, Size: 867 bytes --]

On Mon, Jun 4, 2018 at 11:39 PM, Marco Berizzi <pupilla@libero.it> wrote:
>> Il 8 marzo 2018 alle 17.02 Marco Berizzi <pupilla@libero.it> ha scritto:
>>
>> > Marco Berizzi wrote:
>> >
>> > Hello everyone,
>> >
>> > Yesterday I got this error on a slackware linux 4.16-rc4 system
>> > running as a traffic shaping gateway and netfilter nat.
>> > The error has been arisen after a partial ISP network outage,
>> > so unfortunately it will not trivial for me to reproduce it again.
>>
>> Hello everyone,
>>
>> I'm getting this error twice/day, so fortunately I'm able to
>> reproduce it.
>
> I'm getting the same error also with 4.17 (4.15.18 was
> the latest working version):

Can you test the attached patch?

It almost certainly fixes the warning, but I am not sure if
it papers out any other real problem. Please make sure
hfsc still works as expected. :)

Thanks.

[-- Attachment #2: hfsc.diff --]
[-- Type: application/octet-stream, Size: 715 bytes --]

commit 3df066c835cdb6dd5ee093292e1c41ae584fafa3
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Mon Jun 18 11:59:47 2018 -0700

    net_sched: remove a bogus warning in hfsc
    
    Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 3ae9877ea205..3278a76f6861 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1385,8 +1385,8 @@ hfsc_schedule_watchdog(struct Qdisc *sch)
 		if (next_time == 0 || next_time > q->root.cl_cfmin)
 			next_time = q->root.cl_cfmin;
 	}
-	WARN_ON(next_time == 0);
-	qdisc_watchdog_schedule(&q->watchdog, next_time);
+	if (next_time)
+		qdisc_watchdog_schedule(&q->watchdog, next_time);
 }
 
 static int

^ permalink raw reply related

* Re: [PATCH] optoe: driver to read/write SFP/QSFP EEPROMs
From: Don Bollinger @ 2018-06-18 19:13 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Andrew Lunn, Tom Lendacky, Arnd Bergmann, Greg Kroah-Hartman,
	linux-kernel, brandon_chuang, wally_wang, roy_lee, rick_burchett,
	quentin.chang, steven.noble, jeffrey.townsend, scotte, roopa,
	David Ahern, luke.williams, Guohan Lu, Russell King,
	netdev@vger.kernel.org, don
In-Reply-To: <ef0baf6d-1d1f-ba5b-708a-d8179bd1ea3b@gmail.com>

On Thu, Jun 14, 2018 at 08:24:34PM -0700, Florian Fainelli wrote:
> 
> 
> On 06/14/2018 07:26 PM, Don Bollinger wrote:
> > On Tue, Jun 12, 2018 at 08:11:09PM +0200, Andrew Lunn wrote:
> >>> There's an SFP driver under drivers/net/phy.  Can that driver be extended
> >>> to provide this support?  Adding Russel King who developed sfp.c, as well
> >>> at the netdev mailing list.
> >>
> >> I agree, the current SFP code should be used.
> >>
> >> My observations seem to be there are two different ways {Q}SFP are used:
> >>
> >> 1) The Linux kernel has full control, as assumed by the devlink/SFP
> >> frame work. We parse the SFP data to find the capabilities of the SFP
> >> and use it to program the MAC to use the correct mode. The MAC can be
> >> a NIC, but it can also be a switch. DSA is gaining support for
> >> PHYLINK, so SFP modules should just work with most switches which DSA
> >> support.  And there is no reason a plain switchdev switch can not use
> >> PHYLINK.
> >>
> >> 2) Firmware is in control of the PHY layer, but there is a wish to
> >> expose some of the data which is available via i2c from the {Q}SFP to
> >> linux.
> >>
> >> It appears this optoe supports this second case. It does not appear to
> >> support any in kernel API to actually make use of the SFP data in the
> >> kernel.
> >>
> >> We should not be duplicating code. We should share the SFP code for
> >> both use cases above. There is also a Linux standard API for getting
> >> access to this information. ethtool -m/--module-info. Anything which
> >> is exporting {Q}SFP data needs to use this API.
> >>
> >>    Andrew
> > 
> > Actually this is better described by a third use case.  The target
> > switches are PHY-less (see various designs at
> > www.compute.org/wiki/Networking/SpecsAndDesigns). The AS5712 for example
> > says "The AS5712-54X is a PHY-Less design with the SFP+ and QSFP+
> > connections directly attaching to the Serdes interfaces of the Broadcom
> > BCM56854 720G Trident 2 switching silicon..."
> > 
> > The electrical controls of the {Q}SFP devices (TxDisable for example)
> > are organized in a platform dependent way, through CPLD devices, and
> > managed by a platform specific CPLD driver.
> > 
> > The i2c bus is muxed from the CPU to all of the {Q}SFP devices, which
> > are set up as standard linux i2c devices
> > (/sys/bus/i2c/devices/i2c-xxxx).
> > 
> > There is no MDIO bus between the CPU and the {Q}SFP devices.
> > 
> >> 2) Firmware is in control of the PHY layer, but there is a wish to
> >> expose some of the data which is available via i2c from the {Q}SFP to
> >> linux.
> > 
> > So the switch silicon is in control of the PHY layer.  The platform
> > driver is in control of the electrical interfaces.  And the EEPROM data
> > is available via I2C.
> > 
> > And, there isn't actually 'a wish to expose' the EEPROM data to linux
> > (the kernel).  It turns out that none of the NOS partners I'm working
> > with use that data *in the kernel*.  It is all managed from user space.
> > 
> > More generally, I think sfp.c and optoe are not actually trying to
> > accomplish the same thing at all.  sfp.c combines all three functions
> > (PHY, electrical control, EEPROM access).  optoe is only providing EEPROM
> > access, and only to user space.  This is a real need in the white box
> > switch environment, and is not met by sfp.c.  optoe isn't better, sfp.c
> > isn't better, they're just different.
> 
> sfp exposes standard ethtool hooks such as get_module_info() which pass
> the whole data blob to user-space, e.g: ethtool where all of this is
> better interpreted.

Yes.  But ethtool depends on the underlying driver to access the data.
The current sfp.c implementation limits the amount of data that can be
accessed.  optoe makes the entire EEPROM accessible.  I think the right
solution is to call optoe for access to the EEPROM.  A couple of lines
of code in sfp_i2c_read could call optoe to get the data instead of
formatting the i2c_transfer directly.  That change would immediately
give the whole SFP framework access to all the address space of both SFP
and QSFP.  (Same change to sfp_i2c_write.)

> 
> > 
> > Finally, sfp.c does not recognize that SFP devices have data beyond 512
> > bytes, accessible via a page register.  It also does not recognize QSFP
> > devices at all.  QSFP devices have only 256 bytes accessible (one i2c
> > address) before switching to paged access for the remaining data.  The
> > first design requirement for optoe was to access all the available
> > pages, because there is information and controls that we (optics
> > vendors) want to make available to network management applications.
> 
> Patches welcome if you wish to extend sfp.c to support QSFP devices for
> instances.

I would love to collaborate on this.  I don't have an environment
(hardware or software) where I could build or test changes to the sfp
code.

> 
> > 
> > If sfp.c creates a standard linux i2c client for each SFP device, it
> > should be possible to create an optoe managed device 'under' sfp.c to
> > provide access to the full EEPROM address space:
> 
> It's the other way around, SFP relies on a standard Linux i2c bus master
> to exist such that it can read the EEPROM from the standard slave
> address location, same goes with a possibly present PHY.

Great.  Then plugging optoe in should be easy.

> 
> >   # echo optoe2 0x50 >/sys/bus/i2c/devices/i2c-xx/new_device
> > This might prove useful to user space consumers of that data.  We could
> > also easily add a kernel API (eg the nvmem framework) to optoe to provide
> > kernel access.  In other words, sfp.c could assign EEPROM management to
> > optoe, while managing the electrical interfaces.  (This is actually
> > pretty close to how the platfom drivers work in the switch environment.)
> > sfp.c would get SFP page support and QSFP EEPROM access 'for free'.
> 
> That sounds like a possibly good approach.

Thanks

> 
> > 
> >>                       There is also a Linux standard API for getting
> >> access to this information. ethtool -m/--module-info. Anything which
> >> is exporting {Q}SFP data needs to use this API.
> > 
> > optoe simply provides direct access from user space to the full EEPROM
> > data.  There is more data there than ethtool knows about, and in some
> > devices there are proprietary registers that ethtool will never know
> > about.  optoe does not interpret any of the EEPROM content (except the
> > bare minimum to access pages correctly).  optoe also does not get in the
> > way of ethtool.  It could prove to be a handy way for ethtool to access
> > new EEPROM fields in the future.  QSFP-DD/OSFP are coming soon, they
> > will have a different (incompatible) set of new fields to be decoded.
> 
> sfp is the same it only passes the EEPROM information to user-space and
> interprets just what it needs to get the work done.

I offer include/linux/sfp.h as a counter example.  Every byte, every bit
in the spec is spec'ed there.  That's great, but exceeds the mandate of
optoe.  Optoe is just there to get the data in and out of the EEPROM.

> 
> > 
> > Bottom Line:  sfp.c is not a useful starting point for the switch
> > environment I'm working in.  The underlying hardware architecture is
> > quite different.  optoe is not a competing alternative.  Its only
> > function is to provide user-space access to the EEPROM data in {Q}SFP
> > devices.
> 
> I just don't understand why would we want optoe when the standard way to
> expose both EEPROM and diagnostics modules has been through ethtool's
> get_module_info since the basic paradigm is that a network port is a
> net_device instance in the kernel. If that basic device driver model
> does not exist, then it is unclear to me what are the benefits.

We want optoe to access regions of the EEPROM which are paged, and to
access QSFP which only has one I2C address and is also paged.  It is
just the sfp_read/sfp_write function, but reading and writing the whole
EEPROM.  Plugging it into sfp gives that broader access to the ethtool
interface and the rest of the net-device model.

> 
> Would I be completely wrong if I wrote that you are likely working with
> a switch SDK which primarily runs in user-space and so with lack of a
> proper kernel-based device driver for your piece of hardware you are
> attempting to create a driver which is some sort of a "tap" for some
> specific portion of that larger hardware block?

Not completely wrong, but biased.  The switch ASIC and the switch SDK do
not connect to the i2c bus (at least in the architecture I'm working
with).  Whether or not it is in user-space isn't the point.  That it
doesn't access i2c, hence doesn't access the EEPROM, is the reason we
have a 'proper kernel-based device driver'.

> -- 
> Florian
> 

Don

^ permalink raw reply

* Re: 4.14.(44->48) IPv6 RA issue?
From: Ido Schimmel @ 2018-06-18 19:09 UTC (permalink / raw)
  To: valdis.kletnieks; +Cc: netdev, linux-kernel
In-Reply-To: <35958.1529334166@turing-police.cc.vt.edu>

On Mon, Jun 18, 2018 at 11:02:46AM -0400, valdis.kletnieks@vt.edu wrote:
> So I'm trying to troubleshoot an issue on an OpenWRT/Lede based
> router, where IPv6 connectivity totally fails. I've bisected it down to:
> 
> git log --oneline 187da94808a634477b5e5a69109ea0c566dfa64b..73d8a6ab7668173d70adbed45b61be5256c505e
> 73d8a6ab7668 (refs/bisect/bad) base-files: fix UCI config parsing and callback handling
> e52f3e9b1376 kernel: bump 4.14 to 4.14.48
> 7590c3c58f5e (HEAD) scripts: Replace obsolete POSIX tmpnam in slugimage.pl with File::Temp function
> 987900f2de76 hostapd: properly build hostapd-only SSL variants
> 
> and am pretty sure that it's the kernel bump (works with a 4.14.44 kernel,
> breaks with 4.14.48) as the other 3 commits don't go anywhere near IPv6 handling.

...

> Note that eth1 is the uplink towards my ISP.  I've pointed a 'tcpdump -n -i eth1 ip6'
> at it, and see plenty of RA packets come in, but neighbor discovery never completes.
> Looking at the Changelogs for .45->.50 don't show any smoking-gun patches.
> 
> This ring any bells, before I delve deeper into it?

If this is a router and forwarding is enabled, then you need to set
'accept_ra' to '2' [1]. Is this what you have configured? Seems that
OpenWRT recently started to explicitly set 'accept_ra' to '0' [2].

1. https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
2. https://github.com/openwrt/openwrt/commit/bb46520159c0119e829900e29681feea6f297fe0

^ permalink raw reply

* [PATCH iproute2-next 1/1] tc: jsonify nat action
From: Keara Leibovitz @ 2018-06-18 18:57 UTC (permalink / raw)
  To: dsahern; +Cc: stephen, netdev, kernel, Keara Leibovitz

Add json output support for nat action

Example output:

~$ $TC actions add action nat egress 10.10.10.1 20.20.20.2 index 2
~$ $TC actions add action nat ingress 100.100.100.1/32 200.200.200.2 \
	continue index 99
~$ $TC -j actions ls action nat

[{
	"total acts": 2
}, {
	"actions": [{
		"order": 0,
		"type": "nat",
		"direction": "egress",
		"old_addr": "10.10.10.1/32",
		"new_addr": "20.20.20.2",
		"control_action": {
			"type": "pass"
		},
		"index": 2,
		"ref": 1,
		"bind": 0
	}, {
		"order": 1,
		"type": "nat",
		"direction": "ingress",
		"old_addr": "100.100.100.1/32",
		"new_addr": "200.200.200.2",
		"control_action": {
			"type": "continue"
		},
		"index": 99,
		"ref": 1,
		"bind": 0
	}]
}]

Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
---
 tc/m_nat.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/tc/m_nat.c b/tc/m_nat.c
index 653792da91c0..ee0b7520a605 100644
--- a/tc/m_nat.c
+++ b/tc/m_nat.c
@@ -142,9 +142,8 @@ print_nat(struct action_util *au, FILE * f, struct rtattr *arg)
 {
 	struct tc_nat *sel;
 	struct rtattr *tb[TCA_NAT_MAX + 1];
-	char buf1[256];
-	char buf2[256];
-
+	SPRINT_BUF(buf1);
+	SPRINT_BUF(buf2);
 	int len;
 
 	if (arg == NULL)
@@ -153,7 +152,7 @@ print_nat(struct action_util *au, FILE * f, struct rtattr *arg)
 	parse_rtattr_nested(tb, TCA_NAT_MAX, arg);
 
 	if (tb[TCA_NAT_PARMS] == NULL) {
-		fprintf(f, "[NULL nat parameters]");
+		print_string(PRINT_FP, NULL, "%s", "[NULL nat parameters]");
 		return -1;
 	}
 	sel = RTA_DATA(tb[TCA_NAT_PARMS]);
@@ -161,15 +160,22 @@ print_nat(struct action_util *au, FILE * f, struct rtattr *arg)
 	len = ffs(sel->mask);
 	len = len ? 33 - len : 0;
 
-	fprintf(f, " nat %s %s/%d %s", sel->flags & TCA_NAT_FLAG_EGRESS ?
-				       "egress" : "ingress",
-		format_host_r(AF_INET, 4, &sel->old_addr, buf1, sizeof(buf1)),
-		len,
-		format_host_r(AF_INET, 4, &sel->new_addr, buf2, sizeof(buf2)));
-	print_action_control(f, " ", sel->action, "");
+	print_string(PRINT_ANY, "type", " %s ", "nat");
+	print_string(PRINT_ANY, "direction", "%s",
+		     sel->flags & TCA_NAT_FLAG_EGRESS ? "egress" : "ingress");
 
-	fprintf(f, "\n\t index %u ref %d bind %d",
-		sel->index, sel->refcnt, sel->bindcnt);
+	snprintf(buf2, sizeof(buf2), "%s/%d",
+		 format_host_r(AF_INET, 4, &sel->old_addr, buf1, sizeof(buf1)),
+		 len);
+	print_string(PRINT_ANY, "old_addr", " %s", buf2);
+	print_string(PRINT_ANY, "new_addr", " %s",
+		     format_host_r(AF_INET, 4, &sel->new_addr, buf1, sizeof(buf1)));
+
+	print_action_control(f, " ", sel->action, "");
+	print_string(PRINT_FP, NULL, "%s", _SL_);
+	print_uint(PRINT_ANY, "index", "\t index %u", sel->index);
+	print_int(PRINT_ANY, "ref", " ref %d", sel->refcnt);
+	print_int(PRINT_ANY, "bind", " bind %d", sel->bindcnt);
 
 	if (show_stats) {
 		if (tb[TCA_NAT_TM]) {
@@ -179,7 +185,7 @@ print_nat(struct action_util *au, FILE * f, struct rtattr *arg)
 		}
 	}
 
-	fprintf(f, "\n");
+	print_string(PRINT_FP, NULL, "%s", _SL_);
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH v3] bpf: attach type BPF_LIRC_MODE2 should not depend on CONFIG_CGROUP_BPF
From: kbuild test robot @ 2018-06-18 18:46 UTC (permalink / raw)
  To: Sean Young
  Cc: kbuild-all, Daniel Borkmann, Y Song, Matthias Reichl, linux-media,
	LKML, Alexei Starovoitov, Mauro Carvalho Chehab, netdev,
	Devin Heitmueller, Quentin Monnet
In-Reply-To: <20180618171216.gearpr755pm3wot7@gofer.mess.org>

[-- Attachment #1: Type: text/plain, Size: 1124 bytes --]

Hi Sean,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.18-rc1 next-20180618]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Sean-Young/bpf-attach-type-BPF_LIRC_MODE2-should-not-depend-on-CONFIG_CGROUP_BPF/20180619-023056
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from kernel///events/core.c:45:0:
>> include/linux/bpf.h:710:1: error: expected identifier or '(' before '{' token
    {
    ^

vim +710 include/linux/bpf.h

   707	
   708	int sockmap_get_from_fd(const union bpf_attr *attr, int type,
   709				struct bpf_prog *prog);
 > 710	{
   711		return -EINVAL;
   712	}
   713	#endif
   714	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6370 bytes --]

^ permalink raw reply

* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Mathieu Malaterre @ 2018-06-18 18:45 UTC (permalink / raw)
  To: eric.dumazet
  Cc: schwab, David S. Miller, Eric Dumazet, LKML, Christophe LEROY,
	Meelis Roos, netdev, linuxppc-dev
In-Reply-To: <816ef746-5278-1d51-1d9d-55593e377681@gmail.com>

On Mon, Jun 18, 2018 at 8:18 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 06/18/2018 10:54 AM, Andreas Schwab wrote:
> > On Jun 17 2018, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> >> Oh this is silly, please try :
> >>
> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >> index c642304f178ce0a4e1358d59e45032a39f76fb3f..54dd9c18ecad817812898d6f335e1794a07dabbe 100644
> >> --- a/net/core/skbuff.c
> >> +++ b/net/core/skbuff.c
> >> @@ -1845,10 +1845,9 @@ EXPORT_SYMBOL(___pskb_trim);
> >>  int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len)
> >>  {
> >>         if (skb->ip_summed == CHECKSUM_COMPLETE) {
> >> -               int delta = skb->len - len;
> >> +               __wsum csumdiff = skb_checksum(skb, len, skb->len - len, 0);
> >>
> >> -               skb->csum = csum_sub(skb->csum,
> >> -                                    skb_checksum(skb, len, delta, 0));
> >> +               skb->csum = csum_block_sub(skb->csum, csumdiff, len);
> >>         }
> >>         return __pskb_trim(skb, len);
> >>  }
> >
> > That doesn't help either.
> >
> > Andreas.
> >
>
> Then maybe NIC provided csum is not correct.
>
> It does not compute a checksum on all the frame, but part of it.
>
> You could use this patch to double check.
>
> diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
> index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..277859ea73e35271a10b02011120fca248ec8e71 100644
> --- a/drivers/net/ethernet/sun/sungem.c
> +++ b/drivers/net/ethernet/sun/sungem.c
> @@ -857,6 +857,11 @@ static int gem_rx(struct gem *gp, int work_to_do)
>
>                 csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
>                 skb->csum = csum_unfold(csum);
> +               {
> +               __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - ETH_HLEN, 0);
> +               if (csum != csum_fold(rsum))
> +                       pr_err_ratelimited("sungem wrong csum : %x/%x, len %u bytes\n", csum, csum_fold(rsum), len);
> +               }
>                 skb->ip_summed = CHECKSUM_COMPLETE;
>                 skb->protocol = eth_type_trans(skb, gp->dev);
>
>

Here is what I get on my side

[   53.628847] sungem: sungem wrong csum : 4e04/f97, len 64 bytes
[   53.667063] sungem: sungem wrong csum : eea8/6eec, len 149 bytes
[   58.648952] sungem: sungem wrong csum : 2095/3d06, len 64 bytes
[   58.669414] sungem: sungem wrong csum : 5245/b50, len 149 bytes
[   63.674451] sungem: sungem wrong csum : 2d8/5abd, len 149 bytes
[   68.678233] sungem: sungem wrong csum : b8fc/a498, len 149 bytes
[   73.685771] sungem: sungem wrong csum : 374/5a21, len 149 bytes
[   78.689089] sungem: sungem wrong csum : d81/5014, len 149 bytes
[   83.683261] sungem: sungem wrong csum : 4e04/f97, len 64 bytes
[   83.690193] sungem: sungem wrong csum : c2f7/9a9d, len 149 bytes
[   88.692511] sungem: sungem wrong csum : f4d8/68bc, len 149 bytes
[   93.699915] sungem: sungem wrong csum : 1370/4a25, len 149 bytes
[   98.703136] sungem: sungem wrong csum : e1b5/7bdf, len 149 bytes
[  103.704230] sungem: sungem wrong csum : 5321/a74, len 149 bytes
[  108.688912] sungem: sungem wrong csum : 2095/3d06, len 64 bytes
[  108.706559] sungem: sungem wrong csum : ddbc/7fd8, len 149 bytes
[  113.713189] sungem: sungem wrong csum : 5a65/330, len 149 bytes
[  113.891697] sungem: sungem wrong csum : 4e04/f97, len 64 bytes
[  118.717151] sungem: sungem wrong csum : f7c8/65cc, len 149 bytes
[  123.722680] sungem: sungem wrong csum : 3d7a/201b, len 149 bytes
[  128.726524] sungem: sungem wrong csum : c8fd/9497, len 149 bytes
[  133.732045] sungem: sungem wrong csum : de0d/7f87, len 149 bytes
[  135.529163] sungem: sungem wrong csum : 3089/b6dd, len 96 bytes
[  135.529208] eth0: hw csum failure
[  135.529220] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #7
[  135.529226] Call Trace:
[  135.529243] [dffedbe0] [c069ddac]
__skb_checksum_complete+0xf0/0x108 (unreliable)


>
>
>

^ permalink raw reply

* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Mathieu Malaterre @ 2018-06-18 18:29 UTC (permalink / raw)
  To: schwab
  Cc: eric.dumazet, David S. Miller, Eric Dumazet, LKML,
	Christophe LEROY, Meelis Roos, netdev, linuxppc-dev
In-Reply-To: <87o9g8geu0.fsf@igel.home>

On Mon, Jun 18, 2018 at 7:54 PM Andreas Schwab <schwab@linux-m68k.org> wrote:
>
> On Jun 17 2018, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > Oh this is silly, please try :
> >
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index c642304f178ce0a4e1358d59e45032a39f76fb3f..54dd9c18ecad817812898d6f335e1794a07dabbe 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -1845,10 +1845,9 @@ EXPORT_SYMBOL(___pskb_trim);
> >  int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len)
> >  {
> >         if (skb->ip_summed == CHECKSUM_COMPLETE) {
> > -               int delta = skb->len - len;
> > +               __wsum csumdiff = skb_checksum(skb, len, skb->len - len, 0);
> >
> > -               skb->csum = csum_sub(skb->csum,
> > -                                    skb_checksum(skb, len, delta, 0));
> > +               skb->csum = csum_block_sub(skb->csum, csumdiff, len);
> >         }
> >         return __pskb_trim(skb, len);
> >  }
>
> That doesn't help either.

seconded (setup g4+sungem):

[  100.272676] eth0: hw csum failure
[  100.272710] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #6
[  100.272716] Call Trace:
[  100.272733] [dffedbd0] [c069ddb8]
__skb_checksum_complete+0xf0/0x108 (unreliable)
[  100.272752] [dffedbf0] [c078ea28] __udp4_lib_rcv+0x238/0xf98
[  100.272767] [dffedc70] [c0731630] ip_local_deliver_finish+0xa8/0x3c4
[  100.272777] [dffedcb0] [c073243c] ip_local_deliver+0xf0/0x154
[  100.272786] [dffedcf0] [c07328e8] ip_rcv+0x448/0x774
[  100.272800] [dffedd50] [c06aeaec] __netif_receive_skb_core+0x5e8/0x1184
[  100.272811] [dffedde0] [c06bba2c] napi_gro_receive+0x160/0x22c
[  100.272828] [dffede10] [e1571590] gem_poll+0x7fc/0x1ac0 [sungem]
[  100.272837] [dffedee0] [c06bacfc] net_rx_action+0x34c/0x618
[  100.272849] [dffedf60] [c07fd28c] __do_softirq+0x16c/0x5f0
[  100.272863] [dffedfd0] [c0064c7c] irq_exit+0x110/0x1a8
[  100.272877] [dffedff0] [c0016170] call_do_irq+0x24/0x3c
[  100.272890] [c0cf7e80] [c0009a84] do_IRQ+0x98/0x1a0
[  100.272900] [c0cf7eb0] [c001b474] ret_from_except+0x0/0x14
[  100.272911] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
                   LR = arch_cpu_idle+0x30/0x78
[  100.272920] [c0cf7f70] [c0cf6000] 0xc0cf6000 (unreliable)
[  100.272935] [c0cf7f80] [c00a3868] do_idle+0xc4/0x158
[  100.272944] [c0cf7fb0] [c00a3ab4] cpu_startup_entry+0x24/0x28
[  100.272955] [c0cf7fc0] [c0998820] start_kernel+0x47c/0x490
[  100.272963] [c0cf7ff0] [00003444] 0x3444


> Andreas.
>
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."

^ permalink raw reply

* Re: [PATCH bpf-net] bpf: Change bpf_fib_lookup to return lookup status
From: David Ahern @ 2018-06-18 18:27 UTC (permalink / raw)
  To: Martin KaFai Lau, dsahern; +Cc: netdev, borkmann, ast, davem
In-Reply-To: <20180618181123.eczjeb3axd6sao57@kafai-mbp.dhcp.thefacebook.com>

On 6/18/18 12:11 PM, Martin KaFai Lau wrote:
> On Sun, Jun 17, 2018 at 08:18:19AM -0700, dsahern@kernel.org wrote:
>> From: David Ahern <dsahern@gmail.com>
>>
>> For ACLs implemented using either FIB rules or FIB entries, the BPF
>> program needs the FIB lookup status to be able to drop the packet.
> Except BPF_FIB_LKUP_RET_SUCCESS and BPF_FIB_LKUP_RET_NO_NEIGH,  can you
> give an example on how the xdp_prog may decide XDP_PASS vs XDP_DROP based
> on other BPF_FIB_LKUP_RET_*?
> 

	rc = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
	if (rc == 0)
		packet is forwarded, do the redirect

	/* the program is misconfigured -- wrong parameters in struct or flags */
	if (rc < 0)
		....

	/* rc > 0 case */
	switch(rc) {
	case BPF_FIB_LKUP_RET_BLACKHOLE:
	case BPF_FIB_LKUP_RET_UNREACHABLE:
	case BPF_FIB_LKUP_RET_PROHIBIT:
		return XDP_DROP;
	}

For the others it becomes a question of do we share why the stack needs
to be involved? Maybe the program wants to collect stats to show traffic
patterns that can be improved (BPF_FIB_LKUP_RET_FRAG_NEEDED) or support
in the kernel needs to be improved (BPF_FIB_LKUP_RET_UNSUPP_LWT) or an
interface is misconfigured (BPF_FIB_LKUP_RET_FWD_DISABLED).

Arguably BPF_FIB_LKUP_RET_NO_NHDEV is not needed. See below.

>> @@ -2612,6 +2613,19 @@ struct bpf_raw_tracepoint_args {
>>  #define BPF_FIB_LOOKUP_DIRECT  BIT(0)
>>  #define BPF_FIB_LOOKUP_OUTPUT  BIT(1)
>>  
>> +enum {
>> +	BPF_FIB_LKUP_RET_SUCCESS,      /* lookup successful */
>> +	BPF_FIB_LKUP_RET_BLACKHOLE,    /* dest is blackholed */
>> +	BPF_FIB_LKUP_RET_UNREACHABLE,  /* dest is unreachable */
>> +	BPF_FIB_LKUP_RET_PROHIBIT,     /* dest not allowed */
>> +	BPF_FIB_LKUP_RET_NOT_FWDED,    /* pkt is not forwardded */
> BPF_FIB_LKUP_RET_NOT_FWDED is a catch all?
> 

Destination is local. More precisely, the FIB lookup is not unicast so
not forwarded. It could be RTN_LOCAL, RTN_BROADCAST, RTN_ANYCAST, or
RTN_MULTICAST. The next ones -- blackhole, reachable, prohibit -- are
called out.

>> @@ -4252,16 +4277,19 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>>  	if (check_mtu) {
>>  		mtu = ipv6_stub->ip6_mtu_from_fib6(f6i, dst, src);
>>  		if (params->tot_len > mtu)
>> -			return 0;
>> +			return BPF_FIB_LKUP_RET_FRAG_NEEDED;
>>  	}
>>  
>>  	if (f6i->fib6_nh.nh_lwtstate)
>> -		return 0;
>> +		return BPF_FIB_LKUP_RET_UNSUPP_LWT;
>>  
>>  	if (f6i->fib6_flags & RTF_GATEWAY)
>>  		*dst = f6i->fib6_nh.nh_gw;
>>  
>>  	dev = f6i->fib6_nh.nh_dev;
>> +	if (unlikely(!dev))
>> +		return BPF_FIB_LKUP_RET_NO_NHDEV;
> Is this a bug fix?
> 

Difference between IPv4 and IPv6. Making them consistent.

It is a major BUG in the kernel to reach this point in either protocol
to have a unicast route not tied to a device. IPv4 has checks; v6 does
not. I figured this being new code, why not make bpf_ipv{4,6}_fib_lookup
as close to the same as possible.

^ permalink raw reply

* Re: [PATCH] ptp: replace getnstimeofday64() with ktime_get_real_ts64()
From: Richard Cochran @ 2018-06-18 18:26 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Yangbo Lu, y2038, David S. Miller, Fabio Estevam, netdev,
	linux-kernel
In-Reply-To: <20180618142109.3445025-1-arnd@arndb.de>

On Mon, Jun 18, 2018 at 04:20:39PM +0200, Arnd Bergmann wrote:
> getnstimeofday64() is deprecated and getting replaced throughout
> the kernel with ktime_get_*() based helpers for a more consistent
> interface.
> 
> The two functions do the exact same thing, so this is just
> a cosmetic change.

Acked-by: Richard Cochran <richardcochran@gmail.com>

^ permalink raw reply

* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Eric Dumazet @ 2018-06-18 18:18 UTC (permalink / raw)
  To: Andreas Schwab, Eric Dumazet
  Cc: Mathieu Malaterre, David S. Miller, Eric Dumazet, LKML,
	Christophe LEROY, Meelis Roos, netdev, linuxppc-dev
In-Reply-To: <87o9g8geu0.fsf@igel.home>



On 06/18/2018 10:54 AM, Andreas Schwab wrote:
> On Jun 17 2018, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>> Oh this is silly, please try :
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index c642304f178ce0a4e1358d59e45032a39f76fb3f..54dd9c18ecad817812898d6f335e1794a07dabbe 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -1845,10 +1845,9 @@ EXPORT_SYMBOL(___pskb_trim);
>>  int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len)
>>  {
>>         if (skb->ip_summed == CHECKSUM_COMPLETE) {
>> -               int delta = skb->len - len;
>> +               __wsum csumdiff = skb_checksum(skb, len, skb->len - len, 0);
>>  
>> -               skb->csum = csum_sub(skb->csum,
>> -                                    skb_checksum(skb, len, delta, 0));
>> +               skb->csum = csum_block_sub(skb->csum, csumdiff, len);
>>         }
>>         return __pskb_trim(skb, len);
>>  }
> 
> That doesn't help either.
> 
> Andreas.
> 

Then maybe NIC provided csum is not correct.

It does not compute a checksum on all the frame, but part of it.

You could use this patch to double check.

diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..277859ea73e35271a10b02011120fca248ec8e71 100644
--- a/drivers/net/ethernet/sun/sungem.c
+++ b/drivers/net/ethernet/sun/sungem.c
@@ -857,6 +857,11 @@ static int gem_rx(struct gem *gp, int work_to_do)
 
                csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
                skb->csum = csum_unfold(csum);
+               {
+               __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - ETH_HLEN, 0);
+               if (csum != csum_fold(rsum))
+                       pr_err_ratelimited("sungem wrong csum : %x/%x, len %u bytes\n", csum, csum_fold(rsum), len);
+               }
                skb->ip_summed = CHECKSUM_COMPLETE;
                skb->protocol = eth_type_trans(skb, gp->dev);
 

^ permalink raw reply related

* Re: [PATCH bpf-net] bpf: Change bpf_fib_lookup to return lookup status
From: Martin KaFai Lau @ 2018-06-18 18:11 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, borkmann, ast, davem, David Ahern
In-Reply-To: <20180617151819.6824-1-dsahern@kernel.org>

On Sun, Jun 17, 2018 at 08:18:19AM -0700, dsahern@kernel.org wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> For ACLs implemented using either FIB rules or FIB entries, the BPF
> program needs the FIB lookup status to be able to drop the packet.
Except BPF_FIB_LKUP_RET_SUCCESS and BPF_FIB_LKUP_RET_NO_NEIGH,  can you
give an example on how the xdp_prog may decide XDP_PASS vs XDP_DROP based
on other BPF_FIB_LKUP_RET_*?

> Since the bpf_fib_lookup API has not reached a released kernel yet,
> change the return code to contain an encoding of the FIB lookup
> result and return the nexthop device index in the params struct.
> 
> In addition, inform the BPF program of any post FIB lookup reason as
> to why the packet needs to go up the stack.
> 
> Update the sample program per the change in API.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
>  include/uapi/linux/bpf.h   | 28 ++++++++++++++----
>  net/core/filter.c          | 74 ++++++++++++++++++++++++++++++++--------------
>  samples/bpf/xdp_fwd_kern.c |  8 ++---
>  3 files changed, 78 insertions(+), 32 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 59b19b6a40d7..ceb80071c341 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1857,7 +1857,8 @@ union bpf_attr {
>   *		is resolved), the nexthop address is returned in ipv4_dst
>   *		or ipv6_dst based on family, smac is set to mac address of
>   *		egress device, dmac is set to nexthop mac address, rt_metric
> - *		is set to metric from route (IPv4/IPv6 only).
> + *		is set to metric from route (IPv4/IPv6 only), and ifindex
> + *		is set to the device index of the nexthop from the FIB lookup.
>   *
>   *             *plen* argument is the size of the passed in struct.
>   *             *flags* argument can be a combination of one or more of the
> @@ -1873,9 +1874,9 @@ union bpf_attr {
>   *             *ctx* is either **struct xdp_md** for XDP programs or
>   *             **struct sk_buff** tc cls_act programs.
>   *     Return
> - *             Egress device index on success, 0 if packet needs to continue
> - *             up the stack for further processing or a negative error in case
> - *             of failure.
> + *		< 0 if any input argument is invalid
> + *		  0 on success (packet is forwarded and nexthop neighbor exists)
> + *		> 0 one of BPF_FIB_LKUP_RET_ codes on FIB lookup response
>   *
>   * int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct bpf_map *map, void *key, u64 flags)
>   *	Description
> @@ -2612,6 +2613,19 @@ struct bpf_raw_tracepoint_args {
>  #define BPF_FIB_LOOKUP_DIRECT  BIT(0)
>  #define BPF_FIB_LOOKUP_OUTPUT  BIT(1)
>  
> +enum {
> +	BPF_FIB_LKUP_RET_SUCCESS,      /* lookup successful */
> +	BPF_FIB_LKUP_RET_BLACKHOLE,    /* dest is blackholed */
> +	BPF_FIB_LKUP_RET_UNREACHABLE,  /* dest is unreachable */
> +	BPF_FIB_LKUP_RET_PROHIBIT,     /* dest not allowed */
> +	BPF_FIB_LKUP_RET_NOT_FWDED,    /* pkt is not forwardded */
BPF_FIB_LKUP_RET_NOT_FWDED is a catch all?

> +	BPF_FIB_LKUP_RET_FWD_DISABLED, /* fwding is not enabled on ingress */
> +	BPF_FIB_LKUP_RET_UNSUPP_LWT,   /* fwd requires unsupported encap */
> +	BPF_FIB_LKUP_RET_NO_NHDEV,     /* nh device does not exist */
> +	BPF_FIB_LKUP_RET_NO_NEIGH,     /* no neigh entry for nh */
> +	BPF_FIB_LKUP_RET_FRAG_NEEDED,  /* pkt too big to fwd */
> +};
> +
>  struct bpf_fib_lookup {
>  	/* input:  network family for lookup (AF_INET, AF_INET6)
>  	 * output: network family of egress nexthop
> @@ -2625,7 +2639,11 @@ struct bpf_fib_lookup {
>  
>  	/* total length of packet from network header - used for MTU check */
>  	__u16	tot_len;
> -	__u32	ifindex;  /* L3 device index for lookup */
> +
> +	/* input: L3 device index for lookup
> +	 * output: nexthop device index from FIB lookup
> +	 */
> +	__u32	ifindex;
>  
>  	union {
>  		/* inputs to lookup */
> diff --git a/net/core/filter.c b/net/core/filter.c
> index e7f12e9f598c..e758ca487878 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4073,8 +4073,9 @@ static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params,
>  	memcpy(params->smac, dev->dev_addr, ETH_ALEN);
>  	params->h_vlan_TCI = 0;
>  	params->h_vlan_proto = 0;
> +	params->ifindex = dev->ifindex;
>  
> -	return dev->ifindex;
> +	return 0;
>  }
>  #endif
>  
> @@ -4098,7 +4099,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  	/* verify forwarding is enabled on this interface */
>  	in_dev = __in_dev_get_rcu(dev);
>  	if (unlikely(!in_dev || !IN_DEV_FORWARD(in_dev)))
> -		return 0;
> +		return BPF_FIB_LKUP_RET_FWD_DISABLED;
>  
>  	if (flags & BPF_FIB_LOOKUP_OUTPUT) {
>  		fl4.flowi4_iif = 1;
> @@ -4123,7 +4124,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  
>  		tb = fib_get_table(net, tbid);
>  		if (unlikely(!tb))
> -			return 0;
> +			return BPF_FIB_LKUP_RET_NOT_FWDED;
>  
>  		err = fib_table_lookup(tb, &fl4, &res, FIB_LOOKUP_NOREF);
>  	} else {
> @@ -4135,8 +4136,20 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  		err = fib_lookup(net, &fl4, &res, FIB_LOOKUP_NOREF);
>  	}
>  
> -	if (err || res.type != RTN_UNICAST)
> -		return 0;
> +	if (err) {
> +		/* map fib lookup errors to RTN_ type */
> +		if (err == -EINVAL)
> +			return BPF_FIB_LKUP_RET_BLACKHOLE;
> +		if (err == -EHOSTUNREACH)
> +			return BPF_FIB_LKUP_RET_UNREACHABLE;
> +		if (err == -EACCES)
> +			return BPF_FIB_LKUP_RET_PROHIBIT;
> +
> +		return BPF_FIB_LKUP_RET_NOT_FWDED;
> +	}
> +
> +	if (res.type != RTN_UNICAST)
> +		return BPF_FIB_LKUP_RET_NOT_FWDED;
>  
>  	if (res.fi->fib_nhs > 1)
>  		fib_select_path(net, &res, &fl4, NULL);
> @@ -4144,18 +4157,18 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  	if (check_mtu) {
>  		mtu = ip_mtu_from_fib_result(&res, params->ipv4_dst);
>  		if (params->tot_len > mtu)
> -			return 0;
> +			return BPF_FIB_LKUP_RET_FRAG_NEEDED;
>  	}
>  
>  	nh = &res.fi->fib_nh[res.nh_sel];
>  
>  	/* do not handle lwt encaps right now */
>  	if (nh->nh_lwtstate)
> -		return 0;
> +		return BPF_FIB_LKUP_RET_UNSUPP_LWT;
>  
>  	dev = nh->nh_dev;
>  	if (unlikely(!dev))
> -		return 0;
> +		return BPF_FIB_LKUP_RET_NO_NHDEV;
>  
>  	if (nh->nh_gw)
>  		params->ipv4_dst = nh->nh_gw;
> @@ -4166,10 +4179,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  	 * rcu_read_lock_bh is not needed here
>  	 */
>  	neigh = __ipv4_neigh_lookup_noref(dev, (__force u32)params->ipv4_dst);
> -	if (neigh)
> -		return bpf_fib_set_fwd_params(params, neigh, dev);
> +	if (!neigh)
> +		return BPF_FIB_LKUP_RET_NO_NEIGH;
>  
> -	return 0;
> +	return bpf_fib_set_fwd_params(params, neigh, dev);
>  }
>  #endif
>  
> @@ -4190,7 +4203,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  
>  	/* link local addresses are never forwarded */
>  	if (rt6_need_strict(dst) || rt6_need_strict(src))
> -		return 0;
> +		return BPF_FIB_LKUP_RET_NOT_FWDED;
>  
>  	dev = dev_get_by_index_rcu(net, params->ifindex);
>  	if (unlikely(!dev))
> @@ -4198,7 +4211,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  
>  	idev = __in6_dev_get_safely(dev);
>  	if (unlikely(!idev || !net->ipv6.devconf_all->forwarding))
> -		return 0;
> +		return BPF_FIB_LKUP_RET_FWD_DISABLED;
>  
>  	if (flags & BPF_FIB_LOOKUP_OUTPUT) {
>  		fl6.flowi6_iif = 1;
> @@ -4225,7 +4238,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  
>  		tb = ipv6_stub->fib6_get_table(net, tbid);
>  		if (unlikely(!tb))
> -			return 0;
> +			return BPF_FIB_LKUP_RET_NOT_FWDED;
>  
>  		f6i = ipv6_stub->fib6_table_lookup(net, tb, oif, &fl6, strict);
>  	} else {
> @@ -4238,11 +4251,23 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  	}
>  
>  	if (unlikely(IS_ERR_OR_NULL(f6i) || f6i == net->ipv6.fib6_null_entry))
> -		return 0;
> +		return BPF_FIB_LKUP_RET_NOT_FWDED;
> +
> +	if (unlikely(f6i->fib6_flags & RTF_REJECT)) {
> +		switch (f6i->fib6_type) {
> +		case RTN_BLACKHOLE:
> +			return BPF_FIB_LKUP_RET_BLACKHOLE;
> +		case RTN_UNREACHABLE:
> +			return BPF_FIB_LKUP_RET_UNREACHABLE;
> +		case RTN_PROHIBIT:
> +			return BPF_FIB_LKUP_RET_PROHIBIT;
> +		default:
> +			return BPF_FIB_LKUP_RET_NOT_FWDED;
> +		}
> +	}
>  
> -	if (unlikely(f6i->fib6_flags & RTF_REJECT ||
> -	    f6i->fib6_type != RTN_UNICAST))
> -		return 0;
> +	if (f6i->fib6_type != RTN_UNICAST)
> +		return BPF_FIB_LKUP_RET_NOT_FWDED;
>  
>  	if (f6i->fib6_nsiblings && fl6.flowi6_oif == 0)
>  		f6i = ipv6_stub->fib6_multipath_select(net, f6i, &fl6,
> @@ -4252,16 +4277,19 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  	if (check_mtu) {
>  		mtu = ipv6_stub->ip6_mtu_from_fib6(f6i, dst, src);
>  		if (params->tot_len > mtu)
> -			return 0;
> +			return BPF_FIB_LKUP_RET_FRAG_NEEDED;
>  	}
>  
>  	if (f6i->fib6_nh.nh_lwtstate)
> -		return 0;
> +		return BPF_FIB_LKUP_RET_UNSUPP_LWT;
>  
>  	if (f6i->fib6_flags & RTF_GATEWAY)
>  		*dst = f6i->fib6_nh.nh_gw;
>  
>  	dev = f6i->fib6_nh.nh_dev;
> +	if (unlikely(!dev))
> +		return BPF_FIB_LKUP_RET_NO_NHDEV;
Is this a bug fix?

> +
>  	params->rt_metric = f6i->fib6_metric;
>  
>  	/* xdp and cls_bpf programs are run in RCU-bh so rcu_read_lock_bh is
> @@ -4270,10 +4298,10 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  	 */
>  	neigh = ___neigh_lookup_noref(ipv6_stub->nd_tbl, neigh_key_eq128,
>  				      ndisc_hashfn, dst, dev);
> -	if (neigh)
> -		return bpf_fib_set_fwd_params(params, neigh, dev);
> +	if (!neigh)
> +		return BPF_FIB_LKUP_RET_NO_NEIGH;
>  
> -	return 0;
> +	return bpf_fib_set_fwd_params(params, neigh, dev);
>  }
>  #endif
>  
> diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
> index 6673cdb9f55c..a7e94e7ff87d 100644
> --- a/samples/bpf/xdp_fwd_kern.c
> +++ b/samples/bpf/xdp_fwd_kern.c
> @@ -48,9 +48,9 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
>  	struct ethhdr *eth = data;
>  	struct ipv6hdr *ip6h;
>  	struct iphdr *iph;
> -	int out_index;
>  	u16 h_proto;
>  	u64 nh_off;
> +	int rc;
>  
>  	nh_off = sizeof(*eth);
>  	if (data + nh_off > data_end)
> @@ -101,7 +101,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
>  
>  	fib_params.ifindex = ctx->ingress_ifindex;
>  
> -	out_index = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
> +	rc = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
>  
>  	/* verify egress index has xdp support
>  	 * TO-DO bpf_map_lookup_elem(&tx_port, &key) fails with
> @@ -109,7 +109,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
>  	 * NOTE: without verification that egress index supports XDP
>  	 *       forwarding packets are dropped.
>  	 */
> -	if (out_index > 0) {
> +	if (rc == 0) {
>  		if (h_proto == htons(ETH_P_IP))
>  			ip_decrease_ttl(iph);
>  		else if (h_proto == htons(ETH_P_IPV6))
> @@ -117,7 +117,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
>  
>  		memcpy(eth->h_dest, fib_params.dmac, ETH_ALEN);
>  		memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
> -		return bpf_redirect_map(&tx_port, out_index, 0);
> +		return bpf_redirect_map(&tx_port, fib_params.ifindex, 0);
>  	}
>  
>  	return XDP_PASS;
> -- 
> 2.11.0
> 

^ permalink raw reply

* array bounds warning in xfrm_output_resume
From: David Ahern @ 2018-06-18 18:10 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev@vger.kernel.org

Florian:

I am seeing this warning:

$ make O=kbuild/perf -j 24 -s
In file included from /home/dsa/kernel-3.git/include/linux/kernel.h:10:0,
                 from /home/dsa/kernel-3.git/include/linux/list.h:9,
                 from /home/dsa/kernel-3.git/include/linux/module.h:9,
                 from /home/dsa/kernel-3.git/net/xfrm/xfrm_output.c:13:
/home/dsa/kernel-3.git/net/xfrm/xfrm_output.c: In function
‘xfrm_output_resume’:
/home/dsa/kernel-3.git/include/linux/compiler.h:252:20: warning: array
subscript is above array bounds [-Warray-bounds]
   __read_once_size(&(x), __u.__c, sizeof(x));  \
                    ^~~~
/home/dsa/kernel-3.git/include/linux/compiler.h:258:22: note: in
expansion of macro ‘__READ_ONCE’
 #define READ_ONCE(x) __READ_ONCE(x, 1)
                      ^~~~~~~~~~~
/home/dsa/kernel-3.git/include/linux/rcupdate.h:350:48: note: in
expansion of macro ‘READ_ONCE’
  typeof(*p) *________p1 = (typeof(*p) *__force)READ_ONCE(p); \
                                                ^~~~~~~~~
/home/dsa/kernel-3.git/include/linux/rcupdate.h:487:2: note: in
expansion of macro ‘__rcu_dereference_check’
  __rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)
  ^~~~~~~~~~~~~~~~~~~~~~~
/home/dsa/kernel-3.git/include/linux/rcupdate.h:545:28: note: in
expansion of macro ‘rcu_dereference_check’
 #define rcu_dereference(p) rcu_dereference_check(p, 0)
                            ^~~~~~~~~~~~~~~~~~~~~
/home/dsa/kernel-3.git/include/linux/netfilter.h:218:15: note: in
expansion of macro ‘rcu_dereference’
   hook_head = rcu_dereference(net->nf.hooks_arp[hook]);
               ^~~~~~~~~~~~~~~

Line in question is the nf_hook in xfrm_output_resume.
NF_INET_POST_ROUTING = 4 which is greater than NF_ARP_NUMHOOKS = 3

I believe ef57170bbfdd6 is the commit that introduced the warning

^ permalink raw reply

* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Andreas Schwab @ 2018-06-18 17:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mathieu Malaterre, David S. Miller, Eric Dumazet, LKML,
	Christophe LEROY, Meelis Roos, netdev, linuxppc-dev
In-Reply-To: <e1a9b126-3226-72d0-82b2-b69ca5f59ccb@gmail.com>

On Jun 17 2018, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Oh this is silly, please try :
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index c642304f178ce0a4e1358d59e45032a39f76fb3f..54dd9c18ecad817812898d6f335e1794a07dabbe 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1845,10 +1845,9 @@ EXPORT_SYMBOL(___pskb_trim);
>  int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len)
>  {
>         if (skb->ip_summed == CHECKSUM_COMPLETE) {
> -               int delta = skb->len - len;
> +               __wsum csumdiff = skb_checksum(skb, len, skb->len - len, 0);
>  
> -               skb->csum = csum_sub(skb->csum,
> -                                    skb_checksum(skb, len, delta, 0));
> +               skb->csum = csum_block_sub(skb->csum, csumdiff, len);
>         }
>         return __pskb_trim(skb, len);
>  }

That doesn't help either.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply

* Re: [RFC v2, net-next, PATCH 0/4] Add switchdev on TI-CPSW
From: Ilias Apalodimas @ 2018-06-18 17:49 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, f.fainelli, francois.ozog, yogeshs, spatton, Jose.Abreu
In-Reply-To: <20180618173025.GF5865@lunn.ch>

On Mon, Jun 18, 2018 at 07:30:25PM +0200, Andrew Lunn wrote:
> On Mon, Jun 18, 2018 at 07:46:02PM +0300, Ilias Apalodimas wrote:
> > On Mon, Jun 18, 2018 at 06:28:36PM +0200, Andrew Lunn wrote:
> > > > Yes, if the CPU port is added on the VLAN then unregistered multicast traffic
> > > > (and thus IGMP joins) will reach the CPU port and everything will work as
> > > > expected. I think we should not consider this as a "problem" as long as it's
> > > > descibed properly in Documentation. This switch is excected to support this.
> > > 
> > > Back to the two e1000e. What would you expect to happen with them?
> > > Either IGMP snooping needs to work, or your don't do snooping at
> > > all.
> > That's a different use case
> 
> I disagree. That is the exact same use case. I add ports to a bridge
> and i expect the bridge to either do IGMP snooping, or just forward
> all multicast. That is the users expectations. That is how the Linux
> network stack works. If the hardware has limitations you want to try
> to hide them from the user.
Why is this a limitation? Isn't it proper functionality?
If you configure the CPU port as "passthrough" (which again is
the default) then everything works just like e1000e. The user doesn't have to do
anything at all, MDBs are added/deleted based on proper timers/joins etc.
If the user chooses to use the cpu port as a kind of internal L-2 filter, 
that's up to him, but having hardware do the filtering for you isn't something 
i'd call a limitation.

I am not sure what's the case here though. The hardware operates as you want
by defaulti. As added functionality the user can, if he chooses to, add the 
MDBs manually instead of having some piece of code do that for him. 
This was clearly described in the first RFC since it was the only reason to add
a CPU port. Is there a problem with the user controlling these capabilities of
the hardware?
> > > So by default, it just needs to work. You can give the user the option
> > > to shoot themselves in the foot, but they need to actively pull the
> > > trigger to blow their own foot off.
> 
> > Yes it does by default. I don't consider it "foot shooting" though. 
> > If we stop thinking about switches connected to user environments 
> 
> I never think about switches. I think about a block of acceleration
> hardware, which i try to offload Linux networking to.  And if the
> hardware cannot accelerate Linux network functions properly, i don't
> try to offload it. That way it always operates in the same way, and
> the user expectations are clear.
> 
>     Andrew
The acceleration block is working properly here. The user has the ability to
instruct the acceleration block not to accelerate all the traffic but specific
cases he chooses to. Isn't that a valid use case since the hardware supports
that ?

Regards
Ilias

^ permalink raw reply

* Re: [RFC v2, net-next, PATCH 0/4] Add switchdev on TI-CPSW
From: Andrew Lunn @ 2018-06-18 17:30 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, f.fainelli, francois.ozog, yogeshs, spatton, Jose.Abreu
In-Reply-To: <20180618164602.GA26411@apalos>

On Mon, Jun 18, 2018 at 07:46:02PM +0300, Ilias Apalodimas wrote:
> On Mon, Jun 18, 2018 at 06:28:36PM +0200, Andrew Lunn wrote:
> > > Yes, if the CPU port is added on the VLAN then unregistered multicast traffic
> > > (and thus IGMP joins) will reach the CPU port and everything will work as
> > > expected. I think we should not consider this as a "problem" as long as it's
> > > descibed properly in Documentation. This switch is excected to support this.
> > 
> > Back to the two e1000e. What would you expect to happen with them?
> > Either IGMP snooping needs to work, or your don't do snooping at
> > all.
> That's a different use case

I disagree. That is the exact same use case. I add ports to a bridge
and i expect the bridge to either do IGMP snooping, or just forward
all multicast. That is the users expectations. That is how the Linux
network stack works. If the hardware has limitations you want to try
to hide them from the user.

> > So by default, it just needs to work. You can give the user the option
> > to shoot themselves in the foot, but they need to actively pull the
> > trigger to blow their own foot off.

> Yes it does by default. I don't consider it "foot shooting" though. 
> If we stop thinking about switches connected to user environments 

I never think about switches. I think about a block of acceleration
hardware, which i try to offload Linux networking to. And if the
hardware cannot accelerate Linux network functions properly, i don't
try to offload it. That way it always operates in the same way, and
the user expectations are clear.

    Andrew

^ permalink raw reply

* [PATCH v3] bpf: attach type BPF_LIRC_MODE2 should not depend on CONFIG_CGROUP_BPF
From: Sean Young @ 2018-06-18 17:12 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Y Song, Matthias Reichl, linux-media, LKML, Alexei Starovoitov,
	Mauro Carvalho Chehab, netdev, Devin Heitmueller, Quentin Monnet
In-Reply-To: <d2613314-406e-dd7d-1cf0-b5a78a155e3b@iogearbox.net>

If the kernel is compiled with CONFIG_CGROUP_BPF not enabled, it is not
possible to attach, detach or query IR BPF programs to /dev/lircN devices,
making them impossible to use. For embedded devices, it should be possible
to use IR decoding without cgroups or CONFIG_CGROUP_BPF enabled.

This change requires some refactoring, since bpf_prog_{attach,detach,query}
functions are now always compiled, but their code paths for cgroups need
moving out. Rather than a #ifdef CONFIG_CGROUP_BPF in kernel/bpf/syscall.c,
moving them to kernel/bpf/cgroup.c and kernel/bpf/sockmap.c does not
require #ifdefs since that is already conditionally compiled.

Signed-off-by: Sean Young <sean@mess.org>
---
 drivers/media/rc/bpf-lirc.c | 14 +-----
 include/linux/bpf-cgroup.h  | 26 ++++++++++
 include/linux/bpf.h         |  8 +++
 include/linux/bpf_lirc.h    |  5 +-
 kernel/bpf/cgroup.c         | 52 ++++++++++++++++++++
 kernel/bpf/sockmap.c        | 18 +++++++
 kernel/bpf/syscall.c        | 98 ++++++++-----------------------------
 7 files changed, 130 insertions(+), 91 deletions(-)

diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
index 40826bba06b6..fcfab6635f9c 100644
--- a/drivers/media/rc/bpf-lirc.c
+++ b/drivers/media/rc/bpf-lirc.c
@@ -207,29 +207,19 @@ void lirc_bpf_free(struct rc_dev *rcdev)
 	bpf_prog_array_free(rcdev->raw->progs);
 }
 
-int lirc_prog_attach(const union bpf_attr *attr)
+int lirc_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
-	struct bpf_prog *prog;
 	struct rc_dev *rcdev;
 	int ret;
 
 	if (attr->attach_flags)
 		return -EINVAL;
 
-	prog = bpf_prog_get_type(attr->attach_bpf_fd,
-				 BPF_PROG_TYPE_LIRC_MODE2);
-	if (IS_ERR(prog))
-		return PTR_ERR(prog);
-
 	rcdev = rc_dev_get_from_fd(attr->target_fd);
-	if (IS_ERR(rcdev)) {
-		bpf_prog_put(prog);
+	if (IS_ERR(rcdev))
 		return PTR_ERR(rcdev);
-	}
 
 	ret = lirc_bpf_attach(rcdev, prog);
-	if (ret)
-		bpf_prog_put(prog);
 
 	put_device(&rcdev->dev);
 
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 975fb4cf1bb7..79795c5fa7c3 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -188,12 +188,38 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
 									      \
 	__ret;								      \
 })
+int cgroup_bpf_prog_attach(const union bpf_attr *attr,
+			   enum bpf_prog_type ptype, struct bpf_prog *prog);
+int cgroup_bpf_prog_detach(const union bpf_attr *attr,
+			   enum bpf_prog_type ptype);
+int cgroup_bpf_prog_query(const union bpf_attr *attr,
+			  union bpf_attr __user *uattr);
 #else
 
+struct bpf_prog;
 struct cgroup_bpf {};
 static inline void cgroup_bpf_put(struct cgroup *cgrp) {}
 static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; }
 
+static inline int cgroup_bpf_prog_attach(const union bpf_attr *attr,
+					 enum bpf_prog_type ptype,
+					 struct bpf_prog *prog)
+{
+	return -EINVAL;
+}
+
+static inline int cgroup_bpf_prog_detach(const union bpf_attr *attr,
+					 enum bpf_prog_type ptype)
+{
+	return -EINVAL;
+}
+
+static inline int cgroup_bpf_prog_query(const union bpf_attr *attr,
+					union bpf_attr __user *uattr)
+{
+	return -EINVAL;
+}
+
 #define cgroup_bpf_enabled (0)
 #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk) (0)
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk,skb) ({ 0; })
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 995c3b1e59bf..ac4c73d29f96 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -684,6 +684,8 @@ static inline void bpf_map_offload_map_free(struct bpf_map *map)
 struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key);
 struct sock  *__sock_hash_lookup_elem(struct bpf_map *map, void *key);
 int sock_map_prog(struct bpf_map *map, struct bpf_prog *prog, u32 type);
+int sockmap_get_from_fd(const union bpf_attr *attr, int type,
+			struct bpf_prog *prog);
 #else
 static inline struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key)
 {
@@ -702,6 +704,12 @@ static inline int sock_map_prog(struct bpf_map *map,
 {
 	return -EOPNOTSUPP;
 }
+
+int sockmap_get_from_fd(const union bpf_attr *attr, int type,
+			struct bpf_prog *prog);
+{
+	return -EINVAL;
+}
 #endif
 
 #if defined(CONFIG_XDP_SOCKETS)
diff --git a/include/linux/bpf_lirc.h b/include/linux/bpf_lirc.h
index 5f8a4283092d..9d9ff755ec29 100644
--- a/include/linux/bpf_lirc.h
+++ b/include/linux/bpf_lirc.h
@@ -5,11 +5,12 @@
 #include <uapi/linux/bpf.h>
 
 #ifdef CONFIG_BPF_LIRC_MODE2
-int lirc_prog_attach(const union bpf_attr *attr);
+int lirc_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog);
 int lirc_prog_detach(const union bpf_attr *attr);
 int lirc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr);
 #else
-static inline int lirc_prog_attach(const union bpf_attr *attr)
+static inline int lirc_prog_attach(const union bpf_attr *attr,
+				   struct bpf_prog *prog)
 {
 	return -EINVAL;
 }
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index f7c00bd6f8e4..f0ae8a3d01f9 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -428,6 +428,58 @@ int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 	return ret;
 }
 
+int cgroup_bpf_prog_attach(const union bpf_attr *attr,
+			   enum bpf_prog_type ptype, struct bpf_prog *prog)
+{
+	struct cgroup *cgrp;
+	int ret;
+
+	cgrp = cgroup_get_from_fd(attr->target_fd);
+	if (IS_ERR(cgrp))
+		return PTR_ERR(cgrp);
+
+	ret = cgroup_bpf_attach(cgrp, prog, attr->attach_type,
+				attr->attach_flags);
+	cgroup_put(cgrp);
+
+	return ret;
+}
+
+int cgroup_bpf_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
+{
+	struct bpf_prog *prog;
+	struct cgroup *cgrp;
+	int ret;
+
+	cgrp = cgroup_get_from_fd(attr->target_fd);
+	if (IS_ERR(cgrp))
+		return PTR_ERR(cgrp);
+
+	prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype);
+	if (IS_ERR(prog))
+		prog = NULL;
+
+	ret = cgroup_bpf_detach(cgrp, prog, attr->attach_type, 0);
+	if (prog)
+		bpf_prog_put(prog);
+	cgroup_put(cgrp);
+	return ret;
+}
+
+int cgroup_bpf_prog_query(const union bpf_attr *attr,
+			  union bpf_attr __user *uattr)
+{
+	struct cgroup *cgrp;
+	int ret;
+
+	cgrp = cgroup_get_from_fd(attr->query.target_fd);
+	if (IS_ERR(cgrp))
+		return PTR_ERR(cgrp);
+	ret = cgroup_bpf_query(cgrp, attr, uattr);
+	cgroup_put(cgrp);
+	return ret;
+}
+
 /**
  * __cgroup_bpf_run_filter_skb() - Run a program for packet filtering
  * @sk: The socket sending or receiving traffic
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 52a91d816c0e..81d0c55a77aa 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -1915,6 +1915,24 @@ int sock_map_prog(struct bpf_map *map, struct bpf_prog *prog, u32 type)
 	return 0;
 }
 
+int sockmap_get_from_fd(const union bpf_attr *attr, int type,
+			struct bpf_prog *prog)
+{
+	int ufd = attr->target_fd;
+	struct bpf_map *map;
+	struct fd f;
+	int err;
+
+	f = fdget(ufd);
+	map = __bpf_map_get(f);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+
+	err = sock_map_prog(map, prog, attr->attach_type);
+	fdput(f);
+	return err;
+}
+
 static void *sock_map_lookup(struct bpf_map *map, void *key)
 {
 	return NULL;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0fa20624707f..93993c03c9ac 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1489,8 +1489,6 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 	return err;
 }
 
-#ifdef CONFIG_CGROUP_BPF
-
 static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 					     enum bpf_attach_type attach_type)
 {
@@ -1505,40 +1503,6 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 
 #define BPF_PROG_ATTACH_LAST_FIELD attach_flags
 
-static int sockmap_get_from_fd(const union bpf_attr *attr,
-			       int type, bool attach)
-{
-	struct bpf_prog *prog = NULL;
-	int ufd = attr->target_fd;
-	struct bpf_map *map;
-	struct fd f;
-	int err;
-
-	f = fdget(ufd);
-	map = __bpf_map_get(f);
-	if (IS_ERR(map))
-		return PTR_ERR(map);
-
-	if (attach) {
-		prog = bpf_prog_get_type(attr->attach_bpf_fd, type);
-		if (IS_ERR(prog)) {
-			fdput(f);
-			return PTR_ERR(prog);
-		}
-	}
-
-	err = sock_map_prog(map, prog, attr->attach_type);
-	if (err) {
-		fdput(f);
-		if (prog)
-			bpf_prog_put(prog);
-		return err;
-	}
-
-	fdput(f);
-	return 0;
-}
-
 #define BPF_F_ATTACH_MASK \
 	(BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI)
 
@@ -1546,7 +1510,6 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 {
 	enum bpf_prog_type ptype;
 	struct bpf_prog *prog;
-	struct cgroup *cgrp;
 	int ret;
 
 	if (!capable(CAP_NET_ADMIN))
@@ -1583,12 +1546,15 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 		ptype = BPF_PROG_TYPE_CGROUP_DEVICE;
 		break;
 	case BPF_SK_MSG_VERDICT:
-		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_MSG, true);
+		ptype = BPF_PROG_TYPE_SK_MSG;
+		break;
 	case BPF_SK_SKB_STREAM_PARSER:
 	case BPF_SK_SKB_STREAM_VERDICT:
-		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, true);
+		ptype = BPF_PROG_TYPE_SK_SKB;
+		break;
 	case BPF_LIRC_MODE2:
-		return lirc_prog_attach(attr);
+		ptype = BPF_PROG_TYPE_LIRC_MODE2;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -1602,17 +1568,20 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 		return -EINVAL;
 	}
 
-	cgrp = cgroup_get_from_fd(attr->target_fd);
-	if (IS_ERR(cgrp)) {
-		bpf_prog_put(prog);
-		return PTR_ERR(cgrp);
+	switch (ptype) {
+	case BPF_PROG_TYPE_SK_SKB:
+	case BPF_PROG_TYPE_SK_MSG:
+		ret = sockmap_get_from_fd(attr, ptype, prog);
+		break;
+	case BPF_PROG_TYPE_LIRC_MODE2:
+		ret = lirc_prog_attach(attr, prog);
+		break;
+	default:
+		ret = cgroup_bpf_prog_attach(attr, ptype, prog);
 	}
 
-	ret = cgroup_bpf_attach(cgrp, prog, attr->attach_type,
-				attr->attach_flags);
 	if (ret)
 		bpf_prog_put(prog);
-	cgroup_put(cgrp);
 
 	return ret;
 }
@@ -1622,9 +1591,6 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 static int bpf_prog_detach(const union bpf_attr *attr)
 {
 	enum bpf_prog_type ptype;
-	struct bpf_prog *prog;
-	struct cgroup *cgrp;
-	int ret;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -1657,29 +1623,17 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 		ptype = BPF_PROG_TYPE_CGROUP_DEVICE;
 		break;
 	case BPF_SK_MSG_VERDICT:
-		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_MSG, false);
+		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_MSG, NULL);
 	case BPF_SK_SKB_STREAM_PARSER:
 	case BPF_SK_SKB_STREAM_VERDICT:
-		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, false);
+		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, NULL);
 	case BPF_LIRC_MODE2:
 		return lirc_prog_detach(attr);
 	default:
 		return -EINVAL;
 	}
 
-	cgrp = cgroup_get_from_fd(attr->target_fd);
-	if (IS_ERR(cgrp))
-		return PTR_ERR(cgrp);
-
-	prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype);
-	if (IS_ERR(prog))
-		prog = NULL;
-
-	ret = cgroup_bpf_detach(cgrp, prog, attr->attach_type, 0);
-	if (prog)
-		bpf_prog_put(prog);
-	cgroup_put(cgrp);
-	return ret;
+	return cgroup_bpf_prog_detach(attr, ptype);
 }
 
 #define BPF_PROG_QUERY_LAST_FIELD query.prog_cnt
@@ -1687,9 +1641,6 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 static int bpf_prog_query(const union bpf_attr *attr,
 			  union bpf_attr __user *uattr)
 {
-	struct cgroup *cgrp;
-	int ret;
-
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
 	if (CHECK_ATTR(BPF_PROG_QUERY))
@@ -1717,14 +1668,9 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	default:
 		return -EINVAL;
 	}
-	cgrp = cgroup_get_from_fd(attr->query.target_fd);
-	if (IS_ERR(cgrp))
-		return PTR_ERR(cgrp);
-	ret = cgroup_bpf_query(cgrp, attr, uattr);
-	cgroup_put(cgrp);
-	return ret;
+
+	return cgroup_bpf_prog_query(attr, uattr);
 }
-#endif /* CONFIG_CGROUP_BPF */
 
 #define BPF_PROG_TEST_RUN_LAST_FIELD test.duration
 
@@ -2371,7 +2317,6 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_OBJ_GET:
 		err = bpf_obj_get(&attr);
 		break;
-#ifdef CONFIG_CGROUP_BPF
 	case BPF_PROG_ATTACH:
 		err = bpf_prog_attach(&attr);
 		break;
@@ -2381,7 +2326,6 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_PROG_QUERY:
 		err = bpf_prog_query(&attr, uattr);
 		break;
-#endif
 	case BPF_PROG_TEST_RUN:
 		err = bpf_prog_test_run(&attr, uattr);
 		break;
-- 
2.17.1

^ permalink raw reply related

* Re: [RFC v2, net-next, PATCH 0/4] Add switchdev on TI-CPSW
From: Ilias Apalodimas @ 2018-06-18 16:46 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, f.fainelli, francois.ozog, yogeshs, spatton, Jose.Abreu
In-Reply-To: <20180618162836.GD5865@lunn.ch>

On Mon, Jun 18, 2018 at 06:28:36PM +0200, Andrew Lunn wrote:
> > Yes, if the CPU port is added on the VLAN then unregistered multicast traffic
> > (and thus IGMP joins) will reach the CPU port and everything will work as
> > expected. I think we should not consider this as a "problem" as long as it's
> > descibed properly in Documentation. This switch is excected to support this.
> 
> Back to the two e1000e. What would you expect to happen with them?
> Either IGMP snooping needs to work, or your don't do snooping at
> all.
That's a different use case, you don't have a CPU port on e1000 and it 
"just works". You can't do anything to the card and drop the packet. If you
want to have the same example imagine something like "i filter and drop IGMP
messages on one of the 2 e1000e's on the bridge but i expect IGMP to work".
It's a totally different hardware here were this is an option and for TI 
usecases a valid option.

> 
> > What you describe is essentially what happens on "example 2."
> > Enabling the unregistered multicat traffic to be directed to the CPU will cover
> > all use cases that require no user intervention for everything to work. MDBs
> > will automcatically be added/removed to the hardware and traffic will be
> > offloaded.
> > With the current code the user has that possibility, so it's up to him to 
> > decide what mode of configuration he prefers.
> 
> So by default, it just needs to work. You can give the user the option
> to shoot themselves in the foot, but they need to actively pull the
> trigger to blow their own foot off.
Yes it does by default. I don't consider it "foot shooting" though. 
If we stop thinking about switches connected to user environments and think of
industrial ones, my understanding is that this is a common scenarioa that needs
to be supported.

> 
> > > > Setting on/off and IFF_MULTICAST (on eth0/eth1/br0) will affect registered 
> > > > multicast masks programmed in the switch(for port1, port2, cpu port
> > > > respectively).
> > > 
> > > > This muct occur before adding VLANs on the interfaces. If you change the
> > > > flag after the VLAN configuration you need to re-issue the VLAN config 
> > > > commands. 
> > > 
> > > This you should fix. You should be able to get the stack to tell you
> > > about all the configured VLANs, so you can re-program the switch.
> > I was planning fixing these via bridge link commands which would get propagated
> > to the driver for port1/port2 and CPU port. I just didn't find anything
> > relevant to multicast on bridge commands apart from flooding (which is used 
> > properly). I think that the proper way to do this is follow the logic that was
> > introduced by VLANs i.e: 
> > bridge vlan add dev br0 vid 100 pvid untagged self <---- destined for CPU port
> > and apply it for multicast/flooding etc.
> > This requires iproute2 changes first though.
> 
> No, i think you can do this in the driver. The driver just needs to
> ask the stack to tell it about all the VLANs again. Or you walk the
> VLAN tables in the hardware and do the programming based on that. I
> don't see why user space should be involved at all. What would you
> expect with two e1000e's?
We are pretty much describing the same thing i just thought having a bridge
command for it was more appropriate.
After the user removes the multicast flag for an interface i'll just walk VLANs
and adjust accordingly. It's doable and i'll change it for the patch.


Thanks
Ilias

^ permalink raw reply

* Re: [RFC v2, net-next, PATCH 0/4] Add switchdev on TI-CPSW
From: Andrew Lunn @ 2018-06-18 16:28 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, f.fainelli, francois.ozog, yogeshs, spatton, Jose.Abreu
In-Reply-To: <20180618160418.GA25068@apalos>

> Yes, if the CPU port is added on the VLAN then unregistered multicast traffic
> (and thus IGMP joins) will reach the CPU port and everything will work as
> expected. I think we should not consider this as a "problem" as long as it's
> descibed properly in Documentation. This switch is excected to support this.

Back to the two e1000e. What would you expect to happen with them?
Either IGMP snooping needs to work, or your don't do snooping at
all.

> What you describe is essentially what happens on "example 2."
> Enabling the unregistered multicat traffic to be directed to the CPU will cover
> all use cases that require no user intervention for everything to work. MDBs
> will automcatically be added/removed to the hardware and traffic will be
> offloaded.
> With the current code the user has that possibility, so it's up to him to 
> decide what mode of configuration he prefers.

So by default, it just needs to work. You can give the user the option
to shoot themselves in the foot, but they need to actively pull the
trigger to blow their own foot off.

> > > Setting on/off and IFF_MULTICAST (on eth0/eth1/br0) will affect registered 
> > > multicast masks programmed in the switch(for port1, port2, cpu port
> > > respectively).
> > 
> > > This muct occur before adding VLANs on the interfaces. If you change the
> > > flag after the VLAN configuration you need to re-issue the VLAN config 
> > > commands. 
> > 
> > This you should fix. You should be able to get the stack to tell you
> > about all the configured VLANs, so you can re-program the switch.
> I was planning fixing these via bridge link commands which would get propagated
> to the driver for port1/port2 and CPU port. I just didn't find anything
> relevant to multicast on bridge commands apart from flooding (which is used 
> properly). I think that the proper way to do this is follow the logic that was
> introduced by VLANs i.e: 
> bridge vlan add dev br0 vid 100 pvid untagged self <---- destined for CPU port
> and apply it for multicast/flooding etc.
> This requires iproute2 changes first though.

No, i think you can do this in the driver. The driver just needs to
ask the stack to tell it about all the VLANs again. Or you walk the
VLAN tables in the hardware and do the programming based on that. I
don't see why user space should be involved at all. What would you
expect with two e1000e's?

       Andrew

^ permalink raw reply

* Re: [RFC v2, net-next, PATCH 4/4] net/cpsw_switchdev: add switchdev mode of operation on cpsw driver
From: Andrew Lunn @ 2018-06-18 16:16 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, f.fainelli, francois.ozog, yogeshs, spatton, Jose.Abreu
In-Reply-To: <1528974690-31600-5-git-send-email-ilias.apalodimas@linaro.org>

> @@ -2711,6 +2789,10 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
>  	if (of_property_read_bool(node, "dual_emac"))
>  		data->switch_mode = CPSW_DUAL_EMAC;
>  
> +	/* switchdev overrides DTS */
> +	if (IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV))
> +		data->switch_mode = CPSW_SWITCHDEV;
> +

I know you discussed this a bit with Jiri, but i still think if
'dual_mac" is found, you should do dual mac. The DT clearly requests
dual mac, and doing anything else is going to cause confusion.

The device tree binding is ambiguous what should happen when dual-mac
is missing. So i would only enable swithdev mode in that case.

But ideally, it should be a new driver with a new binding.

    Andrew

^ permalink raw reply

* Re: [RFC v2, net-next, PATCH 0/4] Add switchdev on TI-CPSW
From: Ilias Apalodimas @ 2018-06-18 16:04 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, f.fainelli, francois.ozog, yogeshs, spatton, Jose.Abreu
In-Reply-To: <20180618150424.GA5865@lunn.ch>

On Mon, Jun 18, 2018 at 05:04:24PM +0200, Andrew Lunn wrote:
> Hi Ilias
> 
> Thanks for removing the CPU port. That helps a lot moving forward.
> 
> > - Multicast testing client-port1(tagged on vlan 100) server-port1
> > switch-config is provided by TI (https://git.ti.com/switch-config)
> > and is used to verify correct switch configuration.
> > 1. switch-config output
> > 	- type: vlan , vid = 100, untag_force = 0x4, reg_mcast = 0x6,
> > 	unreg_mcast = 0x0, member_list = 0x6
> > Server running on sw0p2: iperf -s -u -B 239.1.1.1 -i 1
> > Client running on sw0p1: iperf -c 239.1.1.1 -u -b 990m -f m -i 5 -t 3600
> > No IGMP reaches the CPU port to add MDBs(since CPU does not receive 
> > unregistered multicast as programmed).
> 
> Is this something you can work around? Is there a TCAM you can program
> to detect IGMP and pass the packets to the CPU? Without receiving
> IGMP, multicast is pretty broken.
Yes it's described in example 2. (i'll explain in detail further down)
> 
> If i understand right, a multicast listener running on the CPU should
> work, since you can add an MDB to receive multicast traffic from the
> two ports. Multicast traffic sent from the CPU also works. What does
> not work is IGMP snooping of traffic between the two switch ports. You
> have no access to the IGMP frames, so cannot snoop. So unless you can
> work around that with a TCAM, i think you just have to blindly pass
> all multicast between the two ports.
Yes, if the CPU port is added on the VLAN then unregistered multicast traffic
(and thus IGMP joins) will reach the CPU port and everything will work as
expected. I think we should not consider this as a "problem" as long as it's
descibed properly in Documentation. This switch is excected to support this.

What you describe is essentially what happens on "example 2."
Enabling the unregistered multicat traffic to be directed to the CPU will cover
all use cases that require no user intervention for everything to work. MDBs
will automcatically be added/removed to the hardware and traffic will be
offloaded.
With the current code the user has that possibility, so it's up to him to 
decide what mode of configuration he prefers.

> 
> > Setting on/off and IFF_MULTICAST (on eth0/eth1/br0) will affect registered 
> > multicast masks programmed in the switch(for port1, port2, cpu port
> > respectively).
> 
> > This muct occur before adding VLANs on the interfaces. If you change the
> > flag after the VLAN configuration you need to re-issue the VLAN config 
> > commands. 
> 
> This you should fix. You should be able to get the stack to tell you
> about all the configured VLANs, so you can re-program the switch.
I was planning fixing these via bridge link commands which would get propagated
to the driver for port1/port2 and CPU port. I just didn't find anything
relevant to multicast on bridge commands apart from flooding (which is used 
properly). I think that the proper way to do this is follow the logic that was
introduced by VLANs i.e: 
bridge vlan add dev br0 vid 100 pvid untagged self <---- destined for CPU port
and apply it for multicast/flooding etc.
This requires iproute2 changes first though.

> 
> > - NFS:
> > The only way for NFS to work is by chrooting to a minimal environment when 
> > switch configuration that will affect connectivity is needed.
> 
> You might want to look at the commit history for DSA. Florian added a
> patch which makes NFS root work with DSA. It might give you clues as
> to what you need to add to make it just work.
I'll have a look. Chrooting is rarely needed in our case anyway. It's only
needed when "loss of connectivity" is bound to take place.
> 
>    Andrew

Thanks
Ilias

^ permalink raw reply

* Re: [PATCH 2/5] net: emaclite: Balance braces in else statement
From: Joe Perches @ 2018-06-18 16:03 UTC (permalink / raw)
  To: Radhey Shyam Pandey, davem, andrew, michal.simek
  Cc: netdev, linux-arm-kernel, linux-kernel
In-Reply-To: <1529322610-27215-3-git-send-email-radhey.shyam.pandey@xilinx.com>

On Mon, 2018-06-18 at 17:20 +0530, Radhey Shyam Pandey wrote:
> Remove else as it is not required with if doing a return.
> Fixes below checkpatch warning.

> WARNING: else is not generally useful after a break or return

checkpatch is stupid and doesn't understand code flow.
Always try to improve code flow instead of merely
following brainless instructions from a script.

So:

> diff --git a/drivers/net/ethernet/xilinx/xilinx_emaclite.c b/drivers/net/ethernet/xilinx/xilinx_emaclite.c
[]
> @@ -569,13 +569,11 @@ static void xemaclite_tx_handler(struct net_device *dev)
>  					(u8 *) lp->deferred_skb->data,
>  					lp->deferred_skb->len) != 0)
>  			return;
> -		else {
> -			dev->stats.tx_bytes += lp->deferred_skb->len;
> -			dev_kfree_skb_irq(lp->deferred_skb);
> -			lp->deferred_skb = NULL;
> -			netif_trans_update(dev); /* prevent tx timeout */
> -			netif_wake_queue(dev);
> -		}
> +		dev->stats.tx_bytes += lp->deferred_skb->len;
> +		dev_kfree_skb_irq(lp->deferred_skb);
> +		lp->deferred_skb = NULL;
> +		netif_trans_update(dev); /* prevent tx timeout */
> +		netif_wake_queue(dev);
>  	}
>  }

If you really want to redo this function, perhaps something like:

static void xemaclite_tx_handler(struct net_device *dev)
{
	struct net_local *lp = netdev_priv(dev);

	dev->stats.tx_packets++;

	if (!lp->deferred_skb)
		return;

	if (xemaclite_send_data(lp, (u8 *)lp->deferred_skb->data,
				lp->deferred_skb->len))
		return;

	dev->stats.tx_bytes += lp->deferred_skb->len;
	dev_kfree_skb_irq(lp->deferred_skb);
	lp->deferred_skb = NULL;
	netif_trans_update(dev); /* prevent tx timeout */
	netif_wake_queue(dev);
}
 
> @@ -1052,13 +1050,13 @@ static bool get_bool(struct platform_device *ofdev, const char *s)
>  {
>  	u32 *p = (u32 *)of_get_property(ofdev->dev.of_node, s, NULL);
>  
> -	if (p) {
> +	if (p)
>  		return (bool)*p;
> -	} else {
> -		dev_warn(&ofdev->dev, "Parameter %s not found,"
> +
> +	dev_warn(&ofdev->dev, "Parameter %s not found,"
>  			"defaulting to false\n", s);
> -		return false;
> -	}
> +
> +	return false;
>  }

And this function has backward logic as the failure paths
are the ones that should return early or use a goto.

Perhaps something like:

static bool get_bool(struct platform_device *ofdev, const char *s)
{
	u32 *p = (u32 *)of_get_property(ofdev->dev.of_node, s, NULL);

	if (!p) {
		dev_warn(&ofdev->dev,
			 "Parameter '%s' not found, defaulting to false\n", s);
		return false;
	}

	return *p;
}

^ permalink raw reply

* Re: [PATCH net-next 3/3] tcp: use monotonic timestamps for PAWS
From: Eric Dumazet @ 2018-06-18 15:59 UTC (permalink / raw)
  To: Arnd Bergmann, Harsh Jain, Herbert Xu, David S. Miller,
	Eric Dumazet, Alexey Kuznetsov, Hideaki YOSHIFUJI, Atul Gupta,
	Gustavo A. R. Silva
  Cc: Lawrence Brakmo, Christoph Paasch, Soheil Hassas Yeganeh, y2038,
	netdev, Ursula Braun, Florian Westphal, Alexei Starovoitov,
	linux-kernel, Priyaranjan Jha, Maciej Żenczykowski,
	Yuchung Cheng, David Ahern, Ivan Delalande, Michael Werner,
	Neal Cardwell, linux-crypto
In-Reply-To: <20180618152242.1566661-3-arnd@arndb.de>



On 06/18/2018 08:22 AM, Arnd Bergmann wrote:
> Using get_seconds() for timestamps is deprecated since it can lead
> to overflows on 32-bit systems. While the interface generally doesn't
> overflow until year 2106, the specific implementation of the TCP PAWS
> algorithm breaks in 2038 when the intermediate signed 32-bit timestamps
> overflow.
>
...

>
>  
>  static inline u32 tcp_cookie_time(void)
> @@ -1361,7 +1362,7 @@ static inline bool tcp_paws_check(const struct tcp_options_received *rx_opt,
>  {
>  	if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win)
>  		return true;
> -	if (unlikely(get_seconds() >= rx_opt->ts_recent_stamp + TCP_PAWS_24DAYS))
> +	if (unlikely(ktime_get_seconds() >= rx_opt->ts_recent_stamp + TCP_PAWS_24DAYS))
>  		return true;
>  	/*
>  	 * Some OSes send SYN and SYNACK messages with tsval=0 tsecr=0,
> @@ -1391,7 +1392,7 @@ static inline bool tcp_paws_reject(const struct tcp_options_received *rx_opt,
>  
>  	   However, we can relax time bounds for RST segments to MSL.
>  	 */
> -	if (rst && get_seconds() >= rx_opt->ts_recent_stamp + TCP_PAWS_MSL)
> +	if (rst && ktime_get_seconds() >= rx_opt->ts_recent_stamp + TCP_PAWS_MSL)
>  		return false;
>  	return true;



Please use the time_after32(), since ktime_get_seconds() is time64_t while ts_recent_stamp is int.

Same remark for tcp_twsk_unique()

Lets clean up this stuff, thanks !

_______________________________________________
Y2038 mailing list
Y2038@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/y2038

^ permalink raw reply

* Re: [PATCH net] net/ipv6: respect rcu grace period before freeing fib6_info
From: David Ahern @ 2018-06-18 15:49 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller; +Cc: netdev, Eric Dumazet
In-Reply-To: <20180618122431.131265-1-edumazet@google.com>

On 6/18/18 6:24 AM, Eric Dumazet wrote:
> syzbot reported use after free that is caused by fib6_info being
> freed without a proper RCU grace period.
> 

...

> Fixes: a64efe142f5e ("net/ipv6: introduce fib6_info struct and helpers")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: David Ahern <dsahern@gmail.com>
> Reported-by: syzbot+9e6d75e3edef427ee888@syzkaller.appspotmail.com
> ---
>  include/net/ip6_fib.h | 5 +++--
>  net/ipv6/ip6_fib.c    | 5 +++--
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 

I wondered if that was needed when flipping to the new data struct.
Apparently so. Thanks for the patch,

Acked-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply

* Re: [PATCH net-next 0/10] xfrm: remove flow cache
From: Kristian Evensen @ 2018-06-18 15:46 UTC (permalink / raw)
  To: Florian Westphal
  Cc: David Miller, Network Development, Steffen Klassert, ilant
In-Reply-To: <CAKfDRXgbA1x9QS7ABiMwKxf11sYqfVggVXskdTNoL-CVDURhRA@mail.gmail.com>

Hi,

On Wed, Jun 13, 2018 at 3:43 PM, Kristian Evensen
<kristian.evensen@gmail.com> wrote:
> Thanks! I will prepare a firmware for one of my devices tonight, start
> testing tomorrow and report back when I have some results.

We have been testing this patch over the weekend and it has a
significant impact on performance. In our tests, the throughput
reduction has been reduced from around -20% to -5%. We also see that
the overall throughput is independent of the number of tunnels, while
before the throughput was reduced as the number of tunnels increased.
We are still running our tests and will report back if we observe
anything new, but so far, so good.

BR,
Kristian

^ permalink raw reply

* Re: Link modes representation in phylib
From: Andrew Lunn @ 2018-06-18 15:40 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Russell King - ARM Linux, Florian Fainelli, netdev,
	Antoine Tenart, thomas.petazzoni@bootlin.com, Gregory CLEMENT,
	Miquel Raynal
In-Reply-To: <20180618170224.321f8264@bootlin.com>

On Mon, Jun 18, 2018 at 05:02:24PM +0200, Maxime Chevallier wrote:
> Hello everyone,
> 
> I'm currently working on adding support for 2.5GBaseT on some Marvell
> PHYs (the marvell10g family, including the 88X3310).
> 
> However, phylib doesn't quite support these modes yet. Its stores the
> different supported and advertised modes in u32 fields, which can't
> contain the relevant values for 2500BaseT mode (and all other modes that
> come after the 31st one).

Hi Maxime

Did you look at phylink? I think it already gets this right.  It could
be, any MAC which needs to use > bit 31 should use phylink, not
phylib.

That narrows the problem down to just the PHY drivers. We might be
able to mass convert those. Or maybe we can consider just doing some
conversion work on PHYs which support > 1Gbps?

	   Andrew

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox