Netdev List
* [PATCH v2] ipv6: Not to probe neighbourless routes
From: Yi Wang @ 2019-08-28  2:19 UTC (permalink / raw)
  To: davem
  Cc: kuznet, yoshfuji, netdev, linux-kernel, xue.zhihong, wang.yi59,
	wang.liang82, Cheng Lin

From: Cheng Lin <cheng.lin130@zte.com.cn>

Originally, Router Reachability Probing required that a neighbour
entry exist. Commit 2152caea7196 ("ipv6: Do not depend on rt->n in
rt6_probe().") removed the requirement for a neighbour entry, and
commit f547fac624be ("ipv6: rate-limit probes for neighbourless
routes") added rate-limiting for neighbourless routes.

Neighbor Discovery for IP version 6 (IPv6) (RFC 4861) says,
"
7.2.5.  Receipt of Neighbor Advertisements

When a valid Neighbor Advertisement is received (either solicited or
unsolicited), the Neighbor Cache is searched for the target's entry.
If no entry exists, the advertisement SHOULD be silently discarded.
There is no need to create an entry if none exists, since the
recipient has apparently not initiated any communication with the
target.
".

rt6_probe() only transmits a Neighbor Solicitation message. When the
Neighbor Advertisement is received, the node does nothing if no
neighbour entry exists.

It is not clear that Router Reachability Probing needs to create a
neighbour entry, so the original behaviour may be the right one.

This patch restores the requirement for a neighbour entry.

Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
---
 include/net/ip6_fib.h | 5 -----
 net/ipv6/route.c      | 6 +-----
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 4b5656c..8c2e022 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -124,11 +124,6 @@ struct rt6_exception {
 
 struct fib6_nh {
 	struct fib_nh_common	nh_common;
-
-#ifdef CONFIG_IPV6_ROUTER_PREF
-	unsigned long		last_probe;
-#endif
-
 	struct rt6_info * __percpu *rt6i_pcpu;
 	struct rt6_exception_bucket __rcu *rt6i_exception_bucket;
 };
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index fd059e0..1839dd7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -639,12 +639,12 @@ static void rt6_probe(struct fib6_nh *fib6_nh)
 	nh_gw = &fib6_nh->fib_nh_gw6;
 	dev = fib6_nh->fib_nh_dev;
 	rcu_read_lock_bh();
-	idev = __in6_dev_get(dev);
 	neigh = __ipv6_neigh_lookup_noref(dev, nh_gw);
 	if (neigh) {
 		if (neigh->nud_state & NUD_VALID)
 			goto out;
 
+		idev = __in6_dev_get(dev);
 		write_lock(&neigh->lock);
 		if (!(neigh->nud_state & NUD_VALID) &&
 		    time_after(jiffies,
@@ -654,13 +654,9 @@ static void rt6_probe(struct fib6_nh *fib6_nh)
 				__neigh_set_probe_once(neigh);
 		}
 		write_unlock(&neigh->lock);
-	} else if (time_after(jiffies, fib6_nh->last_probe +
-				       idev->cnf.rtr_probe_interval)) {
-		work = kmalloc(sizeof(*work), GFP_ATOMIC);
 	}
 
 	if (work) {
-		fib6_nh->last_probe = jiffies;
 		INIT_WORK(&work->work, rt6_probe_deferred);
 		work->target = *nh_gw;
 		dev_hold(dev);
-- 
1.8.3.1


^ permalink raw reply related
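
The RFC 4861 rule quoted in the commit message above can be illustrated
with a minimal userspace sketch. It assumes a toy fixed-size neighbour
cache; every name in it (toy_cache, toy_recv_na, and so on) is
hypothetical and is not a kernel API. It only shows why an advertisement
that answers a probe is dropped when no cache entry exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy neighbour cache; all names are hypothetical, not kernel APIs. */
#define TOY_CACHE_SLOTS 4

struct toy_neigh {
	bool in_use;
	bool reachable;
	unsigned char addr[16];	/* IPv6 address of the neighbour */
};

struct toy_cache {
	struct toy_neigh slot[TOY_CACHE_SLOTS];
};

/* Return the entry for addr, or NULL if the cache has none. */
struct toy_neigh *toy_lookup(struct toy_cache *c, const unsigned char addr[16])
{
	for (size_t i = 0; i < TOY_CACHE_SLOTS; i++)
		if (c->slot[i].in_use && memcmp(c->slot[i].addr, addr, 16) == 0)
			return &c->slot[i];
	return NULL;
}

/* Create an entry, as a node does when it initiates communication.
 * Returns false when the cache is full. */
bool toy_create(struct toy_cache *c, const unsigned char addr[16])
{
	assert(toy_lookup(c, addr) == NULL);
	for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
		if (!c->slot[i].in_use) {
			c->slot[i].in_use = true;
			c->slot[i].reachable = false;
			memcpy(c->slot[i].addr, addr, 16);
			return true;
		}
	}
	return false;
}

/* Receipt of a Neighbor Advertisement (RFC 4861, 7.2.5): update an
 * existing entry, silently discard if there is none.
 * Returns true if the cache was updated. */
bool toy_recv_na(struct toy_cache *c, const unsigned char target[16])
{
	struct toy_neigh *n = toy_lookup(c, target);

	if (!n)
		return false;	/* no entry: advertisement is discarded */
	n->reachable = true;
	return true;
}
```

With an empty cache, toy_recv_na() returns false and nothing is learned;
after toy_create() for the same address it returns true and the entry
becomes reachable.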

* Re: [PATCH v2] ipv6: Not to probe neighbourless routes
From: David Miller @ 2019-08-28  3:13 UTC (permalink / raw)
  To: wang.yi59
  Cc: kuznet, yoshfuji, netdev, linux-kernel, xue.zhihong, wang.liang82,
	cheng.lin130
In-Reply-To: <1566958765-1686-1-git-send-email-wang.yi59@zte.com.cn>


Because you didn't even compile test the previous patch, I want to
know how you did functional testing on this version on current kernel
versions?

^ permalink raw reply

* Re: [PATCH v5 net-next 02/18] ionic: Add hardware init and device commands
From: Jakub Kicinski @ 2019-08-28  3:16 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev, davem
In-Reply-To: <a2ed5049-14c6-749c-9a9b-f826d9a88cb0@pensando.io>

On Tue, 27 Aug 2019 14:22:55 -0700, Shannon Nelson wrote:
> >> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_devlink.c b/drivers/net/ethernet/pensando/ionic/ionic_devlink.c
> >> index e24ef6971cd5..1ca1e33cca04 100644
> >> --- a/drivers/net/ethernet/pensando/ionic/ionic_devlink.c
> >> +++ b/drivers/net/ethernet/pensando/ionic/ionic_devlink.c
> >> @@ -11,8 +11,28 @@
> >>   static int ionic_dl_info_get(struct devlink *dl, struct devlink_info_req *req,
> >>   			     struct netlink_ext_ack *extack)
> >>   {
> >> +	struct ionic *ionic = devlink_priv(dl);
> >> +	struct ionic_dev *idev = &ionic->idev;
> >> +	char buf[16];
> >> +
> >>   	devlink_info_driver_name_put(req, IONIC_DRV_NAME);
> >>   
> >> +	devlink_info_version_running_put(req,
> >> +					 DEVLINK_INFO_VERSION_GENERIC_FW_MGMT,
> >> +					 idev->dev_info.fw_version);  
> > Are you sure this is not the FW that controls the data path?  
> 
> There is only one FW rev to report, and this covers mgmt and data.

Can you add a key for that? Cause this one clearly says management..

> >  
> >> +	snprintf(buf, sizeof(buf), "0x%x", idev->dev_info.asic_type);
> >> +	devlink_info_version_fixed_put(req,
> >> +				       DEVLINK_INFO_VERSION_GENERIC_BOARD_ID,
> >> +				       buf);  
> > Board ID is not ASIC. This is for identifying a board version with all
> > its components which surround the main ASIC.
> >  
> >> +	snprintf(buf, sizeof(buf), "0x%x", idev->dev_info.asic_rev);
> >> +	devlink_info_version_fixed_put(req,
> >> +				       DEVLINK_INFO_VERSION_GENERIC_BOARD_REV,
> >> +				       buf);  
> > ditto  
> 
> Since I don't have any board info available at this point, shall I use 
> my own "asic.id" and "asic.rev" strings, or in this patch shall I add 
> something like this to devlink.h and use them here:
> 
> /* Part number, identifier of asic design */
> #define DEVLINK_INFO_VERSION_GENERIC_ASIC_ID    "asic.id"
> /* Revision of asic design */
> #define DEVLINK_INFO_VERSION_GENERIC_ASIC_REV    "asic.rev"

Yes, please add these to the generic items and document appropriately.

^ permalink raw reply

* Re: [PATCH net-next v2 0/6] net: dsa: explicit programmation of VLAN on CPU ports
From: David Miller @ 2019-08-28  3:17 UTC (permalink / raw)
  To: olteanv; +Cc: vivien.didelot, netdev, f.fainelli, andrew
In-Reply-To: <CA+h21hqPTqzCPPYqF1zvbGhu=yeqPFQ_X77xtfmt1mzENDQ9Dw@mail.gmail.com>

From: Vladimir Oltean <olteanv@gmail.com>
Date: Sun, 25 Aug 2019 21:27:23 +0300

> On Sun, 25 Aug 2019 at 20:25, Vivien Didelot <vivien.didelot@gmail.com> wrote:
>>
>> When a VLAN is programmed on a user port, every switch of the fabric also
>> programs the CPU ports and the DSA links as part of the VLAN. To do that,
>> DSA makes use of bitmaps to prepare all members of a VLAN.
>>
>> While this is expected for DSA links which are used as conduit between
>> interconnected switches, only the dedicated CPU port of the slave must be
>> programmed, not all CPU ports of the fabric. This may also cause problems in
>> other corners of DSA such as the tag_8021q.c driver, which needs to program
>> its ports manually, CPU port included.
>>
>> We need the dsa_port_vlan_{add,del} functions and its dsa_port_vid_{add,del}
>> variants to simply trigger the VLAN programmation without any logic in them,
>> but they may currently skip the operation based on the bridge device state.
>>
>> This patchset gets rid of the bitmap operations, and moves the bridge device
>> check as well as the explicit programmation of CPU ports where they belong,
>> in the slave code.
>>
>> While at it, clear the VLAN flags before programming a CPU port, as it
>> doesn't make sense to forward the PVID flag for example for such ports.
>>
>> Changes in v2: only clear the PVID flag.
 ...
> For the whole series:
> Tested-by: Vladimir Oltean <olteanv@gmail.com>
> Thanks!

Series applied.

^ permalink raw reply

* Re: [PATCH net-next v4 0/3] net: ethernet: mediatek: convert to PHYLINK
From: David Miller @ 2019-08-28  3:19 UTC (permalink / raw)
  To: opensource
  Cc: john, sean.wang, nelson.chang, matthias.bgg, netdev,
	linux-arm-kernel, linux-mediatek, linux-mips, linux, frank-w, sr
In-Reply-To: <20190825174341.20750-1-opensource@vdorst.com>

From: René van Dorst <opensource@vdorst.com>
Date: Sun, 25 Aug 2019 19:43:38 +0200

> These patches convert the mediatek driver to the PHYLINK API.
> 
> v3->v4:
> * Phylink improvements and clean-ups after review
> v2->v3:
> * Phylink improvements and clean-ups after review
> v1->v2:
> * Rebase for mt76x8 changes
> * Phylink improvements and clean-ups after review
> * SGMII port doesn't support 2.5Gbit in SGMII mode only in BASE-X mode.
>   Refactor the code.

Series applied.

^ permalink raw reply

* Re: [PATCH v2 net] Add genphy_c45_config_aneg() function to phy-c45.c
From: David Miller @ 2019-08-28  3:20 UTC (permalink / raw)
  To: marco.hartmann
  Cc: andrew, f.fainelli, hkallweit1, netdev, linux-kernel,
	christian.herber
In-Reply-To: <1566385208-23523-1-git-send-email-marco.hartmann@nxp.com>

From: Marco Hartmann <marco.hartmann@nxp.com>
Date: Wed, 21 Aug 2019 11:00:46 +0000

> Commit 34786005eca3 ("net: phy: prevent PHYs w/o Clause 22 regs from calling
> genphy_config_aneg") introduced a check that aborts phy_config_aneg()
> if the phy is a C45 phy.
> This causes phy_state_machine() to call phy_error() so that the phy
> ends up in PHY_HALTED state.
> 
> Instead of returning -EOPNOTSUPP, call genphy_c45_config_aneg()
> (analogous to the C22 case) so that the state machine can run
> correctly.
> 
> genphy_c45_config_aneg() closely resembles mv3310_config_aneg()
> in drivers/net/phy/marvell10g.c, excluding vendor specific
> configurations for 1000BaseT.
> 
> Fixes: 22b56e827093 ("net: phy: replace genphy_10g_driver with genphy_c45_driver")
> 
> Signed-off-by: Marco Hartmann <marco.hartmann@nxp.com>
> ---
> Changes in v2:
> - corrected commit message
> - reordered variables

Applied to net-next.

^ permalink raw reply

* Re: [PATCH v2 net] Add genphy_c45_config_aneg() function to phy-c45.c
From: David Miller @ 2019-08-28  3:22 UTC (permalink / raw)
  To: marco.hartmann
  Cc: andrew, f.fainelli, hkallweit1, netdev, linux-kernel,
	christian.herber
In-Reply-To: <20190827.202043.766506227116086877.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Tue, 27 Aug 2019 20:20:43 -0700 (PDT)

> Applied to net-next.

My bad, applied to net and queued up for v5.2 -stable.

^ permalink raw reply

* Re: [PATCH v5 net-next 02/18] ionic: Add hardware init and device commands
From: Shannon Nelson @ 2019-08-28  3:26 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, davem
In-Reply-To: <20190827201646.2befe6c3@cakuba.netronome.com>

On 8/27/19 8:16 PM, Jakub Kicinski wrote:
> On Tue, 27 Aug 2019 14:22:55 -0700, Shannon Nelson wrote:
>>>> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_devlink.c b/drivers/net/ethernet/pensando/ionic/ionic_devlink.c
>>>> index e24ef6971cd5..1ca1e33cca04 100644
>>>> --- a/drivers/net/ethernet/pensando/ionic/ionic_devlink.c
>>>> +++ b/drivers/net/ethernet/pensando/ionic/ionic_devlink.c
>>>> @@ -11,8 +11,28 @@
>>>>    static int ionic_dl_info_get(struct devlink *dl, struct devlink_info_req *req,
>>>>    			     struct netlink_ext_ack *extack)
>>>>    {
>>>> +	struct ionic *ionic = devlink_priv(dl);
>>>> +	struct ionic_dev *idev = &ionic->idev;
>>>> +	char buf[16];
>>>> +
>>>>    	devlink_info_driver_name_put(req, IONIC_DRV_NAME);
>>>>    
>>>> +	devlink_info_version_running_put(req,
>>>> +					 DEVLINK_INFO_VERSION_GENERIC_FW_MGMT,
>>>> +					 idev->dev_info.fw_version);
>>> Are you sure this is not the FW that controls the data path?
>> There is only one FW rev to report, and this covers mgmt and data.
> Can you add a key for that? Cause this one clearly says management..

Perhaps something like this?

/* Overall FW version */
#define DEVLINK_INFO_VERSION_GENERIC_FW    "fw"


>> Since I don't have any board info available at this point, shall I use
>> my own "asic.id" and "asic.rev" strings, or in this patch shall I add
>> something like this to devlink.h and use them here:
>>
>> /* Part number, identifier of asic design */
>> #define DEVLINK_INFO_VERSION_GENERIC_ASIC_ID    "asic.id"
>> /* Revision of asic design */
>> #define DEVLINK_INFO_VERSION_GENERIC_ASIC_REV    "asic.rev"
> Yes, please add these to the generic items and document appropriately.

Sure.  Is there any place besides 
Documentation/networking/devlink-info-versions.rst?

sln



^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Masami Hiramatsu @ 2019-08-28  3:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andy Lutomirski, Alexei Starovoitov, Kees Cook, LSM List,
	James Morris, Jann Horn, Peter Zijlstra, Masami Hiramatsu,
	David S. Miller, Daniel Borkmann, Network Development, bpf,
	kernel-team, Linux API
In-Reply-To: <20190827192144.3b38b25a@gandalf.local.home>

On Tue, 27 Aug 2019 19:21:44 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:


> > Here's my proposal for CAP_TRACING, documentation-style:
> > 
> > --- begin ---
> > 
> > CAP_TRACING enables a task to use various kernel features to trace
> > running user programs and the kernel itself.  CAP_TRACING also enables
> > a task to bypass some speculation attack countermeasures.  A task in
> > the init user namespace with CAP_TRACING will be able to tell exactly
> > what kernel code is executed and when, and will be able to read kernel
> > registers and kernel memory.  It will, similarly, be able to read the
> > state of other user tasks.
> > 
> > Specifically, CAP_TRACING allows the following operations.  It may
> > allow more operations in the future:
> > 
> >  - Full use of perf_event_open(), similarly to the effect of
> > kernel.perf_event_paranoid == -1.
> > 
> >  - Loading and attaching tracing BPF programs, including use of BPF
> > raw tracepoints.
> > 
> >  - Use of BPF stack maps.
> > 
> >  - Use of bpf_probe_read() and bpf_trace_printk().
> > 
> >  - Use of unsafe pointer-to-integer conversions in BPF.
> > 
> >  - Bypassing of BPF's speculation attack hardening measures and
> > constant blinding.  (Note: other mechanisms might also allow this.)
> > 
> > CAP_TRACING does not override normal permissions on sysfs or debugfs.
> > This means that, unless a new interface for programming kprobes and
> > such is added, it does not directly allow use of kprobes.
> 
> kprobes can be created in the tracefs filesystem (which is separate from
> debugfs, tracefs just gets automatically mounted
> in /sys/kernel/debug/tracing when debugfs is mounted) from the
> kprobe_events file. /sys/kernel/tracing is just the tracefs
> directory without debugfs, and was created specifically to allow
> tracing to be accessed without opening up the can of worms in debugfs.

I like CAP_TRACING for tracefs. Can we make tracefs itself check
CAP_TRACING before calling the file_ops, or must each tracefs file_ops
handler check it?

> Should we allow CAP_TRACING access to /proc/kallsyms? as it is helpful
> to convert perf and trace-cmd's function pointers into names. Once you
> allow tracing of the kernel, hiding /proc/kallsyms is pretty useless.

Also, there is a blacklist of kprobes under debugfs. If CAP_TRACING is
introduced and allows access to kallsyms, I would like to move the
blacklist under tracefs, or add an alias for the blacklist entry in tracefs.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH net] net: dsa: tag_8021q: Future-proof the reserved fields in the custom VID
From: David Miller @ 2019-08-28  3:32 UTC (permalink / raw)
  To: olteanv; +Cc: f.fainelli, vivien.didelot, andrew, netdev
In-Reply-To: <20190825183212.11426-1-olteanv@gmail.com>

From: Vladimir Oltean <olteanv@gmail.com>
Date: Sun, 25 Aug 2019 21:32:12 +0300

> After witnessing the discussion in https://lkml.org/lkml/2019/8/14/151
> w.r.t. ioctl extensibility, it became clear that the 3 RSV bits inside
> the DSA 802.1Q tag might suffer the same fate and be useless for
> further extension.
> 
> So clearly specify that the reserved bits should currently be
> transmitted as zero and ignored on receive. The DSA tagger already does
> this (and always has), and is the only known user so far (no
> Wireshark dissection plugin, etc). So there should be no incompatibility
> to speak of.
> 
> Fixes: 0471dd429cea ("net: dsa: tag_8021q: Create a stable binary format")
> Signed-off-by: Vladimir Oltean <olteanv@gmail.com>

Applied and queued up for v5.2 -stable.

^ permalink raw reply
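
The rule "transmit reserved bits as zero, ignore them on receive" from
the message above can be sketched as follows. The 12-bit field layout
and all TOY_* names are assumptions made for illustration; they are not
copied from the patch or from tag_8021q.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 12-bit VID layout, for illustration only. */
#define TOY_DIR_MASK	0x0c00u	/* bits 11:10 */
#define TOY_RSV_MASK	0x0230u	/* bits 9 and 5:4, reserved */
#define TOY_SWID_MASK	0x01c0u	/* bits 8:6 */
#define TOY_PORT_MASK	0x000fu	/* bits 3:0 */

/* Sender side: only the defined fields are written, so the reserved
 * bits always go out as zero. */
uint16_t toy_vid_encode(unsigned int dir, unsigned int switch_id,
			unsigned int port)
{
	return (uint16_t)(((dir << 10) & TOY_DIR_MASK) |
			  ((switch_id << 6) & TOY_SWID_MASK) |
			  (port & TOY_PORT_MASK));
}

/* Receiver side: each field is masked out on its own, so whatever is
 * in the reserved bits has no effect on the decoded values. */
unsigned int toy_vid_switch_id(uint16_t vid)
{
	return (vid & TOY_SWID_MASK) >> 6;
}

unsigned int toy_vid_port(uint16_t vid)
{
	return vid & TOY_PORT_MASK;
}
```

Because the decoder never looks at TOY_RSV_MASK, a future sender that
starts using those bits would still be decoded correctly by this
receiver, which is the extensibility property the commit message asks
for.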

* Re: [PATCH v5 net-next 02/18] ionic: Add hardware init and device commands
From: Jakub Kicinski @ 2019-08-28  3:34 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev, davem
In-Reply-To: <93a16cf5-b8a2-8915-4190-b81607058eb2@pensando.io>

On Tue, 27 Aug 2019 20:26:59 -0700, Shannon Nelson wrote:
> On 8/27/19 8:16 PM, Jakub Kicinski wrote:
> > On Tue, 27 Aug 2019 14:22:55 -0700, Shannon Nelson wrote:  
> >>>> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_devlink.c b/drivers/net/ethernet/pensando/ionic/ionic_devlink.c
> >>>> index e24ef6971cd5..1ca1e33cca04 100644
> >>>> --- a/drivers/net/ethernet/pensando/ionic/ionic_devlink.c
> >>>> +++ b/drivers/net/ethernet/pensando/ionic/ionic_devlink.c
> >>>> @@ -11,8 +11,28 @@
> >>>>    static int ionic_dl_info_get(struct devlink *dl, struct devlink_info_req *req,
> >>>>    			     struct netlink_ext_ack *extack)
> >>>>    {
> >>>> +	struct ionic *ionic = devlink_priv(dl);
> >>>> +	struct ionic_dev *idev = &ionic->idev;
> >>>> +	char buf[16];
> >>>> +
> >>>>    	devlink_info_driver_name_put(req, IONIC_DRV_NAME);
> >>>>    
> >>>> +	devlink_info_version_running_put(req,
> >>>> +					 DEVLINK_INFO_VERSION_GENERIC_FW_MGMT,
> >>>> +					 idev->dev_info.fw_version);  
> >>> Are you sure this is not the FW that controls the data path?  
> >> There is only one FW rev to report, and this covers mgmt and data.  
> > Can you add a key for that? Cause this one clearly says management..  
> 
> Perhaps something like this?
> 
> /* Overall FW version */
> #define DEVLINK_INFO_VERSION_GENERIC_FW    "fw"

Sounds reasonable.

> >> Since I don't have any board info available at this point, shall I use
> >> my own "asic.id" and "asic.rev" strings, or in this patch shall I add
> >> something like this to devlink.h and use them here:
> >>
> >> /* Part number, identifier of asic design */
> >> #define DEVLINK_INFO_VERSION_GENERIC_ASIC_ID    "asic.id"
> >> /* Revision of asic design */
> >> #define DEVLINK_INFO_VERSION_GENERIC_ASIC_REV    "asic.rev"  
> > Yes, please add these to the generic items and document appropriately.  
> 
> Sure.  Is there any place besides 
> Documentation/networking/devlink-info-versions.rst?

Nope, that's the one.

^ permalink raw reply
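
A minimal userspace sketch of the key/value reporting discussed in this
sub-thread. The key strings "fw", "asic.id" and "asic.rev" are the ones
proposed above; the struct and function names are hypothetical, and
nothing here calls the real devlink API. The "0x%x" formatting into a
16-byte buffer mirrors the quoted patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Key strings as proposed in the thread; the TOY_ names are made up. */
#define TOY_INFO_VERSION_FW		"fw"
#define TOY_INFO_VERSION_ASIC_ID	"asic.id"
#define TOY_INFO_VERSION_ASIC_REV	"asic.rev"

struct toy_version {
	const char *key;
	char value[16];
};

/* Store a numeric identifier as a hex string under the given key.
 * Returns the snprintf() result: the length the string needs,
 * excluding the terminating NUL. */
int toy_put_hex(struct toy_version *v, const char *key, unsigned int id)
{
	v->key = key;
	return snprintf(v->value, sizeof(v->value), "0x%x", id);
}
```

Calling toy_put_hex(&v, TOY_INFO_VERSION_ASIC_ID, 0x1f) leaves "0x1f" in
v.value. A 32-bit value needs at most 10 characters plus the NUL, so the
16-byte buffer is not truncated on platforms where unsigned int is 32
bits.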

* Re: [PATCH net-next 0/2] Simplify DSA handling of VLAN subinterface offload
From: David Miller @ 2019-08-28  3:46 UTC (permalink / raw)
  To: olteanv; +Cc: f.fainelli, vivien.didelot, andrew, netdev
In-Reply-To: <20190825194630.12404-1-olteanv@gmail.com>

From: Vladimir Oltean <olteanv@gmail.com>
Date: Sun, 25 Aug 2019 22:46:28 +0300

> Depends on Vivien Didelot's patchset:
> https://patchwork.ozlabs.org/project/netdev/list/?series=127197&state=*
> 
> This patchset removes a few strange-looking guards for -EOPNOTSUPP in
> dsa_slave_vlan_rx_add_vid and dsa_slave_vlan_rx_kill_vid, making that
> code path no longer possible.
> 
> It also disables the code path for the sja1105 driver, which does
> support editing the VLAN table, but not hardware-accelerated VLAN
> sub-interfaces, therefore the check in the DSA core would be wrong.
> There was no better DSA callback to do this than .port_enable, i.e.
> at ndo_open time.

Series applied.

^ permalink raw reply

* Re: [PATCH v2 -next] net: mediatek: remove set but not used variable 'status'
From: David Miller @ 2019-08-28  3:48 UTC (permalink / raw)
  To: maowenan
  Cc: nbd, john, sean.wang, nelson.chang, matthias.bgg, kernel-janitors,
	netdev, linux-arm-kernel, linux-mediatek, linux-kernel
In-Reply-To: <20190826013118.22720-1-maowenan@huawei.com>

From: Mao Wenan <maowenan@huawei.com>
Date: Mon, 26 Aug 2019 09:31:18 +0800

> Fixes gcc '-Wunused-but-set-variable' warning:
> drivers/net/ethernet/mediatek/mtk_eth_soc.c: In function mtk_handle_irq:
> drivers/net/ethernet/mediatek/mtk_eth_soc.c:1951:6: warning: variable status set but not used [-Wunused-but-set-variable]
> 
> Fixes: 296c9120752b ("net: ethernet: mediatek: Add MT7628/88 SoC support")
> Signed-off-by: Mao Wenan <maowenan@huawei.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH v3] net: fix skb use after free in netpoll
From: David Miller @ 2019-08-28  3:52 UTC (permalink / raw)
  To: loyou85
  Cc: edumazet, dsterba, dbanerje, fw, davej, tglx, matwey,
	sakari.ailus, netdev, linux-kernel, xiaojunzhao141
In-Reply-To: <1566801964-14691-1-git-send-email-loyou85@gmail.com>

From: Feng Sun <loyou85@gmail.com>
Date: Mon, 26 Aug 2019 14:46:04 +0800

> After commit baeababb5b85d5c4e6c917efe2a1504179438d3b
> ("tun: return NET_XMIT_DROP for dropped packets"),
> when tun_net_xmit drops packets, it will free the skb and return NET_XMIT_DROP,
> netpoll_send_skb_on_dev will run into the following use after free cases:
> 1. retry netpoll_start_xmit with freed skb;
> 2. queue freed skb in npinfo->txq.
> queue_process will also run into use after free case.
> 
> hit netpoll_send_skb_on_dev first case with following kernel log:
 ...
> Signed-off-by: Feng Sun <loyou85@gmail.com>
> Signed-off-by: Xiaojun Zhao <xiaojunzhao141@gmail.com>

Applied and queued up for -stable.

^ permalink raw reply
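
The ownership problem described in the commit message above can be
sketched in userspace as follows. This is a toy model under stated
assumptions, not the netpoll code and not necessarily how the patch
fixes it: the status codes, the skb type and the drivers are all
hypothetical. It shows a retry loop that stops as soon as the transmit
status says the buffer was consumed, whether it was sent or dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy transmit statuses; names are hypothetical. */
enum toy_status { TOY_XMIT_OK, TOY_XMIT_DROP, TOY_XMIT_BUSY };

struct toy_skb { int len; };

typedef enum toy_status (*toy_xmit_fn)(struct toy_skb *skb);

int toy_xmit_calls;	/* counts driver invocations */

/* Driver that drops: it frees the buffer, like the tun case above. */
enum toy_status toy_dropping_xmit(struct toy_skb *skb)
{
	toy_xmit_calls++;
	free(skb);
	return TOY_XMIT_DROP;
}

/* Driver that is busy: it leaves the buffer with the caller. */
enum toy_status toy_busy_xmit(struct toy_skb *skb)
{
	(void)skb;
	toy_xmit_calls++;
	return TOY_XMIT_BUSY;
}

/* OK and DROP both mean the driver took ownership of the buffer. */
bool toy_xmit_consumed(enum toy_status st)
{
	return st == TOY_XMIT_OK || st == TOY_XMIT_DROP;
}

/* Returns true if the driver consumed skb (the caller must not touch
 * it again), false if the caller still owns it and may queue it. */
bool toy_send(struct toy_skb *skb, toy_xmit_fn xmit, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (toy_xmit_consumed(xmit(skb)))
			return true;
	}
	return false;
}
```

With toy_dropping_xmit the loop runs once and returns true, so the freed
buffer is neither retried nor queued. With toy_busy_xmit it exhausts its
tries and returns false, and the caller still owns the buffer.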

* Re: [PATCH net-next 0/3] sctp: add SCTP_ECN_SUPPORTED sockopt
From: David Miller @ 2019-08-28  3:55 UTC (permalink / raw)
  To: nhorman; +Cc: lucien.xin, netdev, linux-sctp, marcelo.leitner
In-Reply-To: <20190826110221.GA7831@hmswarspite.think-freely.org>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon, 26 Aug 2019 07:02:21 -0400

> On Mon, Aug 26, 2019 at 04:30:01PM +0800, Xin Long wrote:
>> This patchset is to make ecn flag per netns and endpoint and then
>> add SCTP_ECN_SUPPORTED sockopt, as does for other feature flags.
>> 
>> Xin Long (3):
>>   sctp: make ecn flag per netns and endpoint
>>   sctp: allow users to set netns ecn flag with sysctl
>>   sctp: allow users to set ep ecn flag by sockopt
 ...
> Series
> Acked-by: Neil Horman <nhorman@tuxdriver.com>

Series applied to net-next, thanks.

^ permalink raw reply

* Re: [PATCH net] net/rds: Fix info leak in rds6_inc_info_copy()
From: David Miller @ 2019-08-28  4:08 UTC (permalink / raw)
  To: ka-cheong.poon; +Cc: netdev, santosh.shilimkar, rds-devel
In-Reply-To: <1566812352-27332-1-git-send-email-ka-cheong.poon@oracle.com>

From: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Date: Mon, 26 Aug 2019 02:39:12 -0700

> The rds6_inc_info_copy() function has a couple struct members which
> are leaking stack information.  The ->tos field should hold actual
> information and the ->flags field needs to be zeroed out.
> 
> Fixes: 3eb450367d08 ("rds: add type of service(tos) infrastructure")
> Fixes: b7ff8b1036f0 ("rds: Extend RDS API for IPv6 support")
> Reported-by: 黄ID蝴蝶 <butterflyhuangxx@gmail.com>
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Applied and queued up for -stable.

^ permalink raw reply
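
The kind of leak described above, where a struct built on the stack is
copied out with some members never written, is commonly avoided by
zeroing the whole struct first. A minimal sketch, assuming a made-up
record layout: the field names tos and flags echo the commit message,
everything else is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record; the layout is invented for illustration. */
struct toy_info {
	uint64_t seq;
	uint32_t len;
	uint8_t tos;
	uint8_t flags;
};

/* Fill the record. The memset() clears flags and any padding bytes,
 * so no stale stack contents survive in the copy handed out. */
void toy_info_fill(struct toy_info *out, uint64_t seq, uint32_t len,
		   uint8_t tos)
{
	memset(out, 0, sizeof(*out));
	out->seq = seq;
	out->len = len;
	out->tos = tos;
}
```

Even if the caller's buffer held garbage beforehand, flags reads back as
zero and tos carries the real value after toy_info_fill().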

* Re: [net-next 00/15][pull request] 100GbE Intel Wired LAN Driver Updates 2019-08-26
From: Jakub Kicinski @ 2019-08-28  4:09 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, netdev, nhorman, sassmann
In-Reply-To: <20190827163832.8362-1-jeffrey.t.kirsher@intel.com>

On Tue, 27 Aug 2019 09:38:17 -0700, Jeff Kirsher wrote:
> This series contains updates to ice driver only.

Looks clear from uAPI perspective. It does mix fixes with -next, 
but I guess that's your call.

Code-wise changes like this are perhaps the low-light:

@@ -2105,7 +2108,10 @@ void ice_trigger_sw_intr(struct ice_hw *hw, struct ice_q_vector *q_vector)
  * @ring: Tx ring to be stopped
  * @txq_meta: Meta data of Tx ring to be stopped
  */
-static int
+#ifndef CONFIG_PCI_IOV
+static
+#endif /* !CONFIG_PCI_IOV */
+int
 ice_vsi_stop_tx_ring(struct ice_vsi *vsi, enum ice_disq_rst_src rst_src,
 		     u16 rel_vmvf_num, struct ice_ring *ring,
 		     struct ice_txq_meta *txq_meta)

^ permalink raw reply

* Re: [net-next 00/15][pull request] 100GbE Intel Wired LAN Driver Updates 2019-08-26
From: Jeff Kirsher @ 2019-08-28  4:17 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, nhorman, sassmann
In-Reply-To: <20190827210928.576c5fef@cakuba.netronome.com>

On Tue, 2019-08-27 at 21:09 -0700, Jakub Kicinski wrote:
> On Tue, 27 Aug 2019 09:38:17 -0700, Jeff Kirsher wrote:
> > This series contains updates to ice driver only.
> 
> Looks clear from uAPI perspective. It does mix fixes with -next, 
> but I guess that's your call.

Yeah, I always debate about sending the fixes to net, but many of them do
not apply cleanly or at all to the previous kernel version since we are
actively adding new features and functionality to this driver.

Once this device gets released, I will be more concerned about getting
fixes into older kernels.

> 
> Code-wise changes like this are perhaps the low-light:
> 
> @@ -2105,7 +2108,10 @@ void ice_trigger_sw_intr(struct ice_hw *hw, struct
> ice_q_vector *q_vector)
>   * @ring: Tx ring to be stopped
>   * @txq_meta: Meta data of Tx ring to be stopped
>   */
> -static int
> +#ifndef CONFIG_PCI_IOV
> +static
> +#endif /* !CONFIG_PCI_IOV */
> +int
>  ice_vsi_stop_tx_ring(struct ice_vsi *vsi, enum ice_disq_rst_src rst_src,
>  		     u16 rel_vmvf_num, struct ice_ring *ring,
>  		     struct ice_txq_meta *txq_meta)


^ permalink raw reply

* Re: [PATCH net v3 0/2] r8152: fix side effect
From: Jakub Kicinski @ 2019-08-28  4:17 UTC (permalink / raw)
  To: Hayes Wang; +Cc: netdev, nic_swsd, linux-kernel
In-Reply-To: <1394712342-15778-320-Taiwan-albertk@realtek.com>

On Wed, 28 Aug 2019 09:51:40 +0800, Hayes Wang wrote:
> v3:
> Update the commit message for patch #1.
> 
> v2:
> Replace patch #2 with "r8152: remove calling netif_napi_del".
> 
> v1:
> The commit 0ee1f4734967 ("r8152: napi hangup fix after disconnect")
> added a check to avoid using napi_disable after netif_napi_del. However,
> the commit ffa9fec30ca0 ("r8152: set RTL8152_UNPLUG only for real
> disconnection") made the check useless.
> 
> Therefore, I revert commit 0ee1f4734967 ("r8152: napi hangup fix
> after disconnect") first, and add another patch to fix it.

LGTM, seems like if we were to add a Fixes tag it'd point to the

ffa9fec30ca0 ("r8152: set RTL8152_UNPLUG only for real disconnection")

commit, then? So only net needs it, v5.2 is fine.

^ permalink raw reply

* Re: [PATCH V3 net 1/2] openvswitch: Properly set L4 keys on "later" IP fragments
From: Gregory Rose @ 2019-08-28  4:19 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: Linux Kernel Network Developers, Joe Stringer
In-Reply-To: <CAOrHB_DXXSoe9rjamp_OSxDonsqTADrbV4GdUdct=uq_eOXN-Q@mail.gmail.com>


On 8/27/2019 5:33 PM, Pravin Shelar wrote:
> On Tue, Aug 27, 2019 at 7:58 AM Greg Rose <gvrose8192@gmail.com> wrote:
>> When IP fragments are reassembled before being sent to conntrack, the
>> key from the last fragment is used.  Unless there are reordering
>> issues, the last fragment received will not contain the L4 ports, so the
>> key for the reassembled datagram won't contain them.  This patch updates
>> the key once we have a reassembled datagram.
>>
>> The handle_fragments() function works on L3 headers so we pull the L3/L4
>> flow key update code from key_extract into a new function
>> 'key_extract_l3l4'.  Then we add another new function
>> ovs_flow_key_update_l3l4() and export it so that it is accessible by
>> handle_fragments() for conntrack packet reassembly.
>>
>> Co-authored by: Justin Pettit <jpettit@ovn.org>
>> Signed-off-by: Greg Rose <gvrose8192@gmail.com>
>>
> Looks good to me.
>
> Acked-by: Pravin B Shelar <pshelar@ovn.org>
>
> Thanks,
> Pravin.

Thanks Pravin.

I missed a dash in the Co-authored-by line.  If that could be fixed up 
on commit then good, otherwise I can resend.

- Greg

^ permalink raw reply

* Re: [PATCH net] tcp: remove empty skb from write queue in error cases
From: David Miller @ 2019-08-28  4:38 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, soheil, ncardwell, eric.dumazet, jbaron, rutsky
In-Reply-To: <20190826161915.81676-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 26 Aug 2019 09:19:15 -0700

> Vladimir Rutsky reported stuck TCP sessions after memory pressure
> events. Edge Trigger epoll() user would never receive an EPOLLOUT
> notification allowing them to retry a sendmsg().
> 
> Jason tested the case of sk_stream_alloc_skb() returning NULL,
> but there are other paths that could lead both sendmsg() and sendpage()
> to return -1 (EAGAIN), with an empty skb queued on the write queue.
> 
> This patch makes sure we remove this empty skb so that
> Jason's code can detect that the queue is empty, and
> call sk->sk_write_space(sk) accordingly.
> 
> Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jason Baron <jbaron@akamai.com>
> Reported-by: Vladimir Rutsky <rutsky@google.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH] net/hamradio/6pack: Fix the size of a sk_buff used in 'sp_bump()'
From: David Miller @ 2019-08-28  4:39 UTC (permalink / raw)
  To: christophe.jaillet; +Cc: ajk, linux-hams, netdev, linux-kernel, kernel-janitors
In-Reply-To: <20190826190209.16795-1-christophe.jaillet@wanadoo.fr>

From: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Date: Mon, 26 Aug 2019 21:02:09 +0200

> We 'allocate' 'count' bytes here. In fact, 'dev_alloc_skb' already adds some
> extra space for padding, so a bit more is allocated.
> 
> However, we use 1 byte for the KISS command, then copy 'count' bytes, so
> count+1 bytes.
> 
> Explicitly allocate and use 1 more byte to be safe.
> 
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
> ---
> This patch should be safe, but it may not be the correct way to fix the
> "buffer overflow". Maybe, the allocated size is correct and we should have:
>    memcpy(ptr, sp->cooked_buf + 1, count - 1);
> or
>    memcpy(ptr, sp->cooked_buf + 1, sp->rcount);
> 
> I've not dug deep enough to understand the link between 'rcount' and
> how 'cooked_buf' is used.

I'm trying to figure out how this code works too.

Why are they skipping over the first byte?  Is that to avoid the
command byte?  If so, then using sp->rcount as the memcpy length makes
sense.

Why is the caller subtracting 2 from the RX buffer count when
calculating sp->rcount?  This makes the situation even more confusing.


^ permalink raw reply
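
The sizing argument in the patch description (one KISS command byte
followed by 'count' payload bytes needs count + 1 bytes) can be sketched
in userspace as below. This only illustrates the arithmetic. It uses
malloc() instead of an sk_buff, the function name is hypothetical, and
it does not settle the open question in the thread about where the copy
should start or whether sp->rcount is the right length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Build a frame of one command byte plus count payload bytes.
 * Returns a malloc()ed buffer of count + 1 bytes and stores that
 * length in *out_len, or returns NULL if allocation fails. */
unsigned char *toy_build_frame(unsigned char cmd,
			       const unsigned char *payload,
			       size_t count, size_t *out_len)
{
	unsigned char *buf = malloc(count + 1);

	if (!buf)
		return NULL;
	buf[0] = cmd;			/* first byte: command */
	memcpy(buf + 1, payload, count);	/* remaining count bytes */
	*out_len = count + 1;
	return buf;
}
```

Allocating only 'count' bytes here would make the memcpy() write one
byte past the end, which is the overflow the patch description is
worried about.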

* Re: [PATCH net-next v5 0/6] net: dsa: mv88e6xxx: Peridot/Topaz SERDES changes
From: David Miller @ 2019-08-28  4:42 UTC (permalink / raw)
  To: marek.behun; +Cc: vivien.didelot, netdev, andrew, f.fainelli, olteanv
In-Reply-To: <20190826213155.14685-1-marek.behun@nic.cz>

From: Marek Behún <marek.behun@nic.cz>
Date: Mon, 26 Aug 2019 23:31:49 +0200

> this is the fifth version of changes for the Topaz/Peridot family of
> switches. The patches apply on net-next.
> Changes since v4:
>  - added Reviewed-by and Tested-by tags on first 2 patches, the others
>    are affected by changes in patch 3/6, so I did not add
>    the tags, except for 5/6, which is just macro renaming
>  - patch 3 was changed: the serdes_get_lane returns 0 on success (lane
>    was discovered), -ENODEV if no lane is present on the port, and
>    another error if some other error occurred. Lane is put into a pointer of
>    type u8
>  - patches 4 and 6 were affected by this (error detecting from
>    serdes_get_lane)
>  - Andrew's complaint about the two additional parameters
>    (allow_over_2500 and make_cmode_writable) was addressed, by Vivien's
>    advice: I put a new method into chip operations structure, named
>    port_set_cmode_writable. This is called from mv88e6xxx_port_setup_mac
>    just before port_set_cmode. The method is implemented for Topaz.
>    The check if cmodes over 2500 should be allowed on given port is now
>    done in the specific port_set_cmode() that requires it, thus the
>    allow_over_2500 argument is not needed
> 
> Again, tested on Turris Mox with Peridot, Topaz, and Peridot + Topaz.

Series applied, thank you.
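
For readers following the v5 changelog, the return convention described for
serdes_get_lane (0 on success, -ENODEV when the port has no lane, another
negative errno otherwise, lane returned through a u8 pointer) might look
roughly like the sketch below. Everything here is hypothetical: the function
names, the port numbers and the lane values are invented for illustration,
and treating -ENODEV as "nothing to do" in the caller is an assumption, not
something stated in the series.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-port lane lookup; not the mv88e6xxx code. */
static int example_get_lane(int port, uint8_t *lane)
{
	if (port < 0)
		return -EINVAL;		/* some other error */
	if (port != 9 && port != 10)
		return -ENODEV;		/* no lane on this port */

	/* Invented lane numbers, for illustration only. */
	*lane = (uint8_t)(port == 9 ? 0x09 : 0x0a);
	return 0;
}

/*
 * Hypothetical caller: a port without a lane is skipped quietly,
 * any other failure is propagated to the caller.
 */
static int example_power_up(int port, int *powered)
{
	uint8_t lane;
	int err;

	*powered = 0;
	err = example_get_lane(port, &lane);
	if (err == -ENODEV)
		return 0;
	if (err)
		return err;

	*powered = 1;	/* a real driver would program 'lane' here */
	return 0;
}
```

Returning the lane through a pointer keeps the return value free for errno
codes, so a caller can tell "no lane" apart from a failed lookup.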


* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Alexei Starovoitov @ 2019-08-28  4:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexei Starovoitov, Kees Cook, LSM List, James Morris, Jann Horn,
	Peter Zijlstra, Masami Hiramatsu, Steven Rostedt, David S. Miller,
	Daniel Borkmann, Network Development, bpf, kernel-team, Linux API
In-Reply-To: <CALCETrVbPPPr=BdPAx=tJKxD3oLXP4OVSgCYrB_E4vb6idELow@mail.gmail.com>

On Tue, Aug 27, 2019 at 05:55:41PM -0700, Andy Lutomirski wrote:
> 
> I was hoping for something in Documentation/admin-guide, not in a
> changelog that's hard to find.

eventually yes.

> >
> > > Changing the capability that some existing operation requires could
> > > break existing programs.  The old capability may need to be accepted
> > > as well.
> >
> > As far as I can see there is no ABI breakage. Please point out
> > which line of the patch may break it.
> 
> As a more or less arbitrary selection:
> 
>  void bpf_prog_kallsyms_add(struct bpf_prog *fp)
>  {
>         if (!bpf_prog_kallsyms_candidate(fp) ||
> -           !capable(CAP_SYS_ADMIN))
> +           !capable(CAP_BPF))
>                 return;
> 
> Before your patch, a task with CAP_SYS_ADMIN could do this.  Now it
> can't.  Per the usual Linux definition of "ABI break", this is an ABI
> break if and only if someone actually did this in a context where they
> have CAP_SYS_ADMIN but not all capabilities.  How confident are you
> that no one does things like this?

Yes. I'm confident that apps don't drop everything and
leave cap_sys_admin only before doing bpf() syscall, since it would
break their own use of networking.
Hence I'm not going to do the cap_syslog-like "deprecated" message mess
because of this unfounded concern.
If I turn out to be wrong we will add this "deprecated mess" later.
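
For reference, the difference between the check in the quoted hunk and a
check that still accepts the old capability can be sketched in userspace as
follows. This is only an illustration: the EX_CAP_* bit values and the
ex_* helpers are made up, and this is not the kernel's capable()
implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; not the kernel's capability numbers. */
#define EX_CAP_SYS_ADMIN (1u << 0)
#define EX_CAP_BPF       (1u << 1)

/* Hypothetical stand-in for capable(): test one bit in a capability set. */
static bool ex_capable(uint32_t caps, uint32_t cap)
{
	return (caps & cap) != 0;
}

/* Strict form, as in the quoted hunk: only the new capability passes. */
static bool ex_allowed_strict(uint32_t caps)
{
	return ex_capable(caps, EX_CAP_BPF);
}

/* Compatible form: the old capability is still accepted as a fallback. */
static bool ex_allowed_compat(uint32_t caps)
{
	return ex_capable(caps, EX_CAP_BPF) ||
	       ex_capable(caps, EX_CAP_SYS_ADMIN);
}
```

The two forms differ only for a task that holds the old capability without
the new one, which is the case the ABI question above is about.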

> 
> From the previous discussion, you want to make progress toward solving
> a lot of problems with CAP_BPF.  One of them was making BPF
> firewalling more generally useful. By making CAP_BPF grant the ability
> to read kernel memory, you will make administrators much more nervous
> to grant CAP_BPF. 

Andy, was your email hacked?
I explained several times that in this proposal 
CAP_BPF _and_ CAP_TRACING _both_ are necessary to read kernel memory.
CAP_BPF alone is _not enough_.

> Similarly, and correct me if I'm wrong, most of
> these capabilities are primarily or only useful for tracing, so I
> don't see why users without CAP_TRACING should get them.
> bpf_trace_printk(), in particular, even has "trace" in its name :)
> 
> Also, if a task has CAP_TRACING, it's expected to be able to trace the
> system -- that's the whole point.  Why shouldn't it be able to use BPF
> to trace the system better?

CAP_TRACING shouldn't be able to do BPF because BPF is not tracing only.

> > For example:
> > BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
> > {
> >         int ret;
> >
> >         ret = probe_kernel_read(dst, unsafe_ptr, size);
> >         if (unlikely(ret < 0))
> >                 memset(dst, 0, size);
> >
> >         return ret;
> > }
> >
> > All of BPF (including prototype of bpf_probe_read) is controlled by CAP_BPF.
> > But the kernel primitive it's using (probe_kernel_read) is controlled by CAP_TRACING.
> > Hence a task needs _both_ CAP_BPF and CAP_TRACING to attach and run a bpf program
> > that uses bpf_probe_read.
> >
> > Similar with all other kernel code that BPF helpers may call directly or indirectly.
> > If there is a way for bpf program to call into piece of code controlled by CAP_TRACING
> > such helper would need CAP_BPF and CAP_TRACING.
> > If bpf helper calls into something that may mangle networking packet
> > such helper would need both CAP_BPF and CAP_NET_ADMIN to execute.
> 
> Why do you want to require CAP_BPF to call into functions like
> bpf_probe_read()?  I understand why you want to limit access to bpf,
> but I think that CAP_TRACING should be sufficient to allow the tracing
> parts of BPF.  After all, a lot of your concerns, especially the ones
> involving speculation, don't really apply to users with CAP_TRACING --
> users with CAP_TRACING can read kernel memory with or without bpf.

Let me try again to explain the concept...

Imagine AUDI logo with 4 circles.
They partially intersect.
The first circle is CAP_TRACING. Second is CAP_BPF. Third is CAP_NET_ADMIN.
Fourth - up to your imagination :)

These capabilities subdivide different parts of root privileges.
CAP_NET_ADMIN is useful on its own.
Just as CAP_TRACING is useful for perf, ftrace, and probably
other tracing things that don't need bpf.

'bpftrace' is using a lot of tracing and a lot of bpf features,
but not all of bpf and not all tracing.
It falls into the intersection of CAP_BPF and CAP_TRACING.

probe_kernel_read is a tracing mechanism.
perf can use it without bpf.
Hence it should be controlled by CAP_TRACING.

bpf_probe_read is a wrapper of that mechanism.
It's a place where BPF and TRACING circles intersect.
A task needs to have both CAP_BPF (to load the program)
and CAP_TRACING (to read kernel memory) to execute bpf_probe_read() helper.
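
The intersecting-circles model can be written down as a small userspace
sketch: each operation lists every capability it needs, and a task may
perform it only if it holds all of them. The DEMO_CAP_* bit values and the
demo_* function names are invented for illustration and do not correspond to
kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; not the kernel's capability numbers. */
#define DEMO_CAP_TRACING   (1u << 0)
#define DEMO_CAP_BPF       (1u << 1)
#define DEMO_CAP_NET_ADMIN (1u << 2)

/* True only if every bit in 'required' is present in 'caps'. */
static bool demo_has_all(uint32_t caps, uint32_t required)
{
	return (caps & required) == required;
}

/* Plain tracing (perf, ftrace) sits in the tracing circle alone. */
static bool demo_may_trace(uint32_t caps)
{
	return demo_has_all(caps, DEMO_CAP_TRACING);
}

/* A bpf_probe_read()-style helper sits where BPF and tracing intersect. */
static bool demo_may_probe_read(uint32_t caps)
{
	return demo_has_all(caps, DEMO_CAP_BPF | DEMO_CAP_TRACING);
}

/* A packet-mangling helper sits where BPF and networking intersect. */
static bool demo_may_mangle_packet(uint32_t caps)
{
	return demo_has_all(caps, DEMO_CAP_BPF | DEMO_CAP_NET_ADMIN);
}
```

Under this model neither CAP_BPF alone nor CAP_TRACING alone is enough for
the probe-read case, while tracing that does not involve bpf needs only
CAP_TRACING.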

> > > > @@ -2080,7 +2083,10 @@ static int bpf_prog_test_run(const union bpf_attr *attr,
> > > >         struct bpf_prog *prog;
> > > >         int ret = -ENOTSUPP;
> > > >
> > > > -       if (!capable(CAP_SYS_ADMIN))
> > > > +       if (!capable(CAP_NET_ADMIN) || !capable(CAP_BPF))
> > > > +               /* test_run callback is available for networking progs only.
> > > > +                * Add cap_bpf_tracing() above when tracing progs become runnable.
> > > > +                */
> > >
> > > I think test_run should probably be CAP_SYS_ADMIN forever.  test_run
> > > is the only way that one can run a bpf program and call helper
> > > functions via the program if one doesn't have permission to attach the
> > > program.
> >
> > Since CAP_BPF + CAP_NET_ADMIN allow attach. It means that a task
> > with these two permissions will have programs running anyway.
> > (traffic will flow through netdev, socket events will happen, etc)
> > Hence no reason to disallow running program via test_run.
> >
> 
> test_run allows fully controlled inputs, in a context where a program
> can trivially flush caches, mistrain branch predictors, etc first.  It
> seems to me that, if a JITted bpf program contains an exploitable
> speculation gadget (MDS, Spectre v1, RSB, or anything else), 

speaking of MDS... I already asked you to help investigate its
applicability with existing bpf exposure. Are you going to do that?

> it will
> be *much* easier to exploit it using test_run than using normal
> network traffic.  Similarly, normal network traffic will have network
> headers that are valid enough to have caused the BPF program to be
> invoked in the first place.  test_run can inject arbitrary garbage.

Please take a look at Jann's var1 exploit. Was it hard to run a bpf prog
in a controlled environment without the test_run command?



* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Alexei Starovoitov @ 2019-08-28  4:47 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Andy Lutomirski, Alexei Starovoitov, Kees Cook,
	LSM List, James Morris, Jann Horn, Peter Zijlstra,
	David S. Miller, Daniel Borkmann, Network Development, bpf,
	kernel-team, Linux API
In-Reply-To: <20190828123041.c0c90c15865897461ee819a2@kernel.org>

On Wed, Aug 28, 2019 at 12:30:41PM +0900, Masami Hiramatsu wrote:
> > kprobes can be created in the tracefs filesystem (which is separate from
> > debugfs, tracefs just gets automatically mounted
> > in /sys/kernel/debug/tracing when debugfs is mounted) from the
> > kprobe_events file. /sys/kernel/tracing is just the tracefs
> > directory without debugfs, and was created specifically to allow
> > tracing to be accessed without opening up the can of worms in debugfs.
> 
> I like the CAP_TRACING for tracefs. Can we make tracefs itself
> check CAP_TRACING and then call the file_ops? Or must each tracefs
> file-ops handler check it?

Thanks for the feedback.
I'll hack a prototype of CAP_TRACING for perf bits that I understand
and you folks will be able to use it in ftrace when initial support lands.
imo the question above is an implementation detail that you can resolve later.
I see it as a followup to initial CAP_TRACING drop.



