Netdev List
 help / color / mirror / Atom feed
* [PATCH v2] ipv6: Not to probe neighbourless routes
From: Yi Wang @ 2019-08-28  2:19 UTC (permalink / raw)
  To: davem
  Cc: kuznet, yoshfuji, netdev, linux-kernel, xue.zhihong, wang.yi59,
	wang.liang82, Cheng Lin

From: Cheng Lin <cheng.lin130@zte.com.cn>

Originally, Router Reachability Probing require a neighbour entry
existed. Commit 2152caea7196 ("ipv6: Do not depend on rt->n in
rt6_probe().") removed the requirement for a neighbour entry. And
commit f547fac624be ("ipv6: rate-limit probes for neighbourless
routes") adds rate-limiting for neighbourless routes.

And, the Neighbor Discovery for IP version 6 (IPv6)(rfc4861) says,
"
7.2.5.  Receipt of Neighbor Advertisements

When a valid Neighbor Advertisement is received (either solicited or
unsolicited), the Neighbor Cache is searched for the target's entry.
If no entry exists, the advertisement SHOULD be silently discarded.
There is no need to create an entry if none exists, since the
recipient has apparently not initiated any communication with the
target.
".

In rt6_probe(), just a Neighbor Solicitation message are transmited.
When receiving a Neighbor Advertisement, the node does nothing in a
Neighborless condition.

Not sure it's needed to create a neighbor entry in Router
Reachability Probing. And the Original way may be the right way.

This patch recover the requirement for a neighbour entry.

Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
---
 include/net/ip6_fib.h | 5 -----
 net/ipv6/route.c      | 6 +-----
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 4b5656c..8c2e022 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -124,11 +124,6 @@ struct rt6_exception {
 
 struct fib6_nh {
 	struct fib_nh_common	nh_common;
-
-#ifdef CONFIG_IPV6_ROUTER_PREF
-	unsigned long		last_probe;
-#endif
-
 	struct rt6_info * __percpu *rt6i_pcpu;
 	struct rt6_exception_bucket __rcu *rt6i_exception_bucket;
 };
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index fd059e0..1839dd7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -639,12 +639,12 @@ static void rt6_probe(struct fib6_nh *fib6_nh)
 	nh_gw = &fib6_nh->fib_nh_gw6;
 	dev = fib6_nh->fib_nh_dev;
 	rcu_read_lock_bh();
-	idev = __in6_dev_get(dev);
 	neigh = __ipv6_neigh_lookup_noref(dev, nh_gw);
 	if (neigh) {
 		if (neigh->nud_state & NUD_VALID)
 			goto out;
 
+		idev = __in6_dev_get(dev);
 		write_lock(&neigh->lock);
 		if (!(neigh->nud_state & NUD_VALID) &&
 		    time_after(jiffies,
@@ -654,13 +654,9 @@ static void rt6_probe(struct fib6_nh *fib6_nh)
 				__neigh_set_probe_once(neigh);
 		}
 		write_unlock(&neigh->lock);
-	} else if (time_after(jiffies, fib6_nh->last_probe +
-				       idev->cnf.rtr_probe_interval)) {
-		work = kmalloc(sizeof(*work), GFP_ATOMIC);
 	}
 
 	if (work) {
-		fib6_nh->last_probe = jiffies;
 		INIT_WORK(&work->work, rt6_probe_deferred);
 		work->target = *nh_gw;
 		dev_hold(dev);
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Steven Rostedt @ 2019-08-28  2:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Alexei Starovoitov, Kees Cook, LSM List,
	James Morris, Jann Horn, Peter Zijlstra, Masami Hiramatsu,
	David S. Miller, Daniel Borkmann, Network Development, bpf,
	kernel-team, Linux API
In-Reply-To: <A95DA1BC-E2A1-4CC3-B17F-36C494FB7540@amacapital.net>

On Tue, 27 Aug 2019 18:12:59 -0700
Andy Lutomirski <luto@amacapital.net> wrote:

> Too many slashes :/
> 
> A group could work for v1.  Maybe all the tools should get updated to use this path?

trace-cmd now does. In fact, if run as root, it will first check if
tracefs is mounted, and if not, it will try to mount it at this
location.

-- Steve

^ permalink raw reply

* RE: [PATCH] r8152: Add rx_buf_sz field to struct r8152
From: Hayes Wang @ 2019-08-28  2:04 UTC (permalink / raw)
  To: Prashant Malani, davem@davemloft.net
  Cc: grundler@chromium.org, netdev@vger.kernel.org, nic_swsd
In-Reply-To: <20190827180146.253431-1-pmalani@chromium.org>

Prashant Malani [mailto:pmalani@chromium.org]
> Sent: Wednesday, August 28, 2019 2:02 AM
> To: Hayes Wang; davem@davemloft.net
> Cc: grundler@chromium.org; netdev@vger.kernel.org; nic_swsd; Prashant
> Malani
> Subject: [PATCH] r8152: Add rx_buf_sz field to struct r8152
> 
> tp->rx_buf_sz is set according to the specific version of HW being used.
> 
> agg_buf_sz was originally added to support LSO (Large Send Offload) and
> then seems to have been co-opted for LRO (Large Receive Offload). But RX
> large buffer size can be larger than TX large buffer size with newer HW.
> Using larger buffers can result in fewer "large RX packets" processed
> by the rest of the networking stack to reduce RX CPU utilization.
> 
> This patch is copied from the r8152 driver (v2.12.0) published by
> Realtek (www.realtek.com).
> 
> Signed-off-by: Prashant Malani <pmalani@chromium.org>
> Reviewed-by: Grant Grundler <grundler@chromium.org>
> ---

It seems as same as the commit ec5791c202ac ("r8152: separate the rx
buffer size").

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/
commit/?id=ec5791c202aca90c1b3b99dff268a995cf2d6aa1

Best Regards,
Hayes


^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Andy Lutomirski @ 2019-08-28  2:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexei Starovoitov, Alexei Starovoitov, Kees Cook, LSM List,
	James Morris, Jann Horn, Peter Zijlstra, Masami Hiramatsu,
	Steven Rostedt, David S. Miller, Daniel Borkmann,
	Network Development, bpf, kernel-team, Linux API
In-Reply-To: <CALCETrVbPPPr=BdPAx=tJKxD3oLXP4OVSgCYrB_E4vb6idELow@mail.gmail.com>

> On Aug 27, 2019, at 5:55 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
> On Tue, Aug 27, 2019 at 5:34 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>

> From the previous discussion, you want to make progress toward solving
> a lot of problems with CAP_BPF.  One of them was making BPF
> firewalling more generally useful. By making CAP_BPF grant the ability
> to read kernel memory, you will make administrators much more nervous
> to grant CAP_BPF.  Similarly, and correct me if I'm wrong, most of
> these capabilities are primarily or only useful for tracing, so I
> don't see why users without CAP_TRACING should get them.
> bpf_trace_printk(), in particular, even has "trace" in its name :)
>
> Also, if a task has CAP_TRACING, it's expected to be able to trace the
> system -- that's the whole point.  Why shouldn't it be able to use BPF
> to trace the system better?

Let me put this a bit differently. Part of the point is that
CAP_TRACING should allow a user or program to trace without being able
to corrupt the system. CAP_BPF as you’ve proposed it *can* likely
crash the system.  For example, CAP_BPF allows bpf_map_get_fd_by_id()
in your patch.  If the system uses a BPF firewall that stores some of
its rules in maps, then bpf_map_get_fd_by_id() can be used to get a
writable fd to the map, which can be used to change the map, thus
preventing network access.  This means that no combination of
CAP_TRACING and CAP_BPF ends up allowing tracing without granting the
ability to reconfigure or otherwise corrupt the system.

^ permalink raw reply

* [PATCH net v3 2/2] r8152: remove calling netif_napi_del
From: Hayes Wang @ 2019-08-28  1:51 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Hayes Wang
In-Reply-To: <1394712342-15778-320-Taiwan-albertk@realtek.com>

Remove unnecessary use of netif_napi_del. This also avoids to call
napi_disable() after netif_napi_del().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index ad3abe26b51b..04137ac373b0 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -5352,7 +5352,6 @@ static int rtl8152_probe(struct usb_interface *intf,
 	return 0;
 
 out1:
-	netif_napi_del(&tp->napi);
 	usb_set_intfdata(intf, NULL);
 out:
 	free_netdev(netdev);
@@ -5367,7 +5366,6 @@ static void rtl8152_disconnect(struct usb_interface *intf)
 	if (tp) {
 		rtl_set_unplug(tp);
 
-		netif_napi_del(&tp->napi);
 		unregister_netdev(tp->netdev);
 		cancel_delayed_work_sync(&tp->hw_phy_work);
 		tp->rtl_ops.unload(tp);
-- 
2.21.0


^ permalink raw reply related

* [PATCH net v3 1/2] Revert "r8152: napi hangup fix after disconnect"
From: Hayes Wang @ 2019-08-28  1:51 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Hayes Wang
In-Reply-To: <1394712342-15778-320-Taiwan-albertk@realtek.com>

This reverts commit 0ee1f4734967af8321ecebaf9c74221ace34f2d5.

The commit 0ee1f4734967 ("r8152: napi hangup fix after
disconnect") adds a check about RTL8152_UNPLUG to determine
if calling napi_disable() is invalid in rtl8152_close(),
when rtl8152_disconnect() is called. This avoids to use
napi_disable() after calling netif_napi_del().

Howver, commit ffa9fec30ca0 ("r8152: set RTL8152_UNPLUG
only for real disconnection") causes that RTL8152_UNPLUG
is not always set when calling rtl8152_disconnect().
Therefore, I have to revert commit 0ee1f4734967 ("r8152:
napi hangup fix after disconnect"), first. And submit
another patch to fix it.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index eee0f5007ee3..ad3abe26b51b 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -4021,8 +4021,7 @@ static int rtl8152_close(struct net_device *netdev)
 #ifdef CONFIG_PM_SLEEP
 	unregister_pm_notifier(&tp->pm_notifier);
 #endif
-	if (!test_bit(RTL8152_UNPLUG, &tp->flags))
-		napi_disable(&tp->napi);
+	napi_disable(&tp->napi);
 	clear_bit(WORK_ENABLE, &tp->flags);
 	usb_kill_urb(tp->intr_urb);
 	cancel_delayed_work_sync(&tp->schedule);
-- 
2.21.0


^ permalink raw reply related

* [PATCH net v3 0/2] r8152: fix side effect
From: Hayes Wang @ 2019-08-28  1:51 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Hayes Wang
In-Reply-To: <1394712342-15778-314-Taiwan-albertk@realtek.com>

v3:
Update the commit message for patch #1.

v2:
Replace patch #2 with "r8152: remove calling netif_napi_del".

v1:
The commit 0ee1f4734967 ("r8152: napi hangup fix after disconnect")
add a check to avoid using napi_disable after netif_napi_del. However,
the commit ffa9fec30ca0 ("r8152: set RTL8152_UNPLUG only for real
disconnection") let the check useless.

Therefore, I revert commit 0ee1f4734967 ("r8152: napi hangup fix
after disconnect") first, and add another patch to fix it.

Hayes Wang (2):
  Revert "r8152: napi hangup fix after disconnect"
  r8152: remove calling netif_napi_del

 drivers/net/usb/r8152.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

-- 
2.21.0


^ permalink raw reply

* [PATCH net-next] net: phy: force phy suspend when calling phy_stop
From: Jian Shen @ 2019-08-28  1:34 UTC (permalink / raw)
  To: andrew, f.fainelli, hkallweit1, davem, sergei.shtylyov
  Cc: netdev, forest.zhouchang, linuxarm

Some ethernet drivers may call phy_start() and phy_stop() from
ndo_open() and ndo_close() respectively.

When network cable is unconnected, and operate like below:
step 1: ifconfig ethX up -> ndo_open -> phy_start ->start
autoneg, and phy is no link.
step 2: ifconfig ethX down -> ndo_close -> phy_stop -> just stop
phy state machine.

This patch forces phy suspend even phydev->link is off.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
---
 drivers/net/phy/phy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index f3adea9..0acd5b4 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -911,8 +911,8 @@ void phy_state_machine(struct work_struct *work)
 		if (phydev->link) {
 			phydev->link = 0;
 			phy_link_down(phydev, true);
-			do_suspend = true;
 		}
+		do_suspend = true;
 		break;
 	}
 
-- 
2.8.1


^ permalink raw reply related

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Andy Lutomirski @ 2019-08-28  1:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andy Lutomirski, Alexei Starovoitov, Kees Cook, LSM List,
	James Morris, Jann Horn, Peter Zijlstra, Masami Hiramatsu,
	David S. Miller, Daniel Borkmann, Network Development, bpf,
	kernel-team, Linux API
In-Reply-To: <20190827204433.3af91faf@gandalf.local.home>



> On Aug 27, 2019, at 5:44 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> On Tue, 27 Aug 2019 16:34:47 -0700
> Andy Lutomirski <luto@kernel.org> wrote:
> 
>>>> CAP_TRACING does not override normal permissions on sysfs or debugfs.
>>>> This means that, unless a new interface for programming kprobes and
>>>> such is added, it does not directly allow use of kprobes.  
>>> 
>>> kprobes can be created in the tracefs filesystem (which is separate from
>>> debugfs, tracefs just gets automatically mounted
>>> in /sys/kernel/debug/tracing when debugfs is mounted) from the
>>> kprobe_events file. /sys/kernel/tracing is just the tracefs
>>> directory without debugfs, and was created specifically to allow
>>> tracing to be access without opening up the can of worms in debugfs.  
>> 
>> I think that, in principle, CAP_TRACING should allow this, but I'm not
>> sure how to achieve that.  I suppose we could set up
>> inode_operations.permission on tracefs, but what exactly would it do?
>> Would it be just like generic_permission() except that it would look
>> at CAP_TRACING instead of CAP_DAC_OVERRIDE?  That is, you can use
>> tracefs if you have CAP_TRACING *or* acl access?  Or would it be:
>> 
>> int tracing_permission(struct inode *inode, int mask)
>> {
>>  if (!capable(CAP_TRACING))
>>    return -EPERM;
>> 
>>  return generic_permission(inode, mask);
>> }
> 
> Perhaps we should make a group for it?
> 

Hmm. That means that you’d need CAP_TRACING and a group. That’s probably not terrible, but it could be annoying.

>> 
>> Which would mean that you need ACL *and* CAP_TRACING, so
>> administrators would change the mode to 777.  That's a bit scary.
>> 
>> And this still doesn't let people even *find* tracefs, since it's
>> hidden in debugfs.
>> 
>> So maybe make CAP_TRACING override ACLs but also add /sys/fs/tracing
>> and mount tracefs there, too, so that regular users can at least find
>> the mountpoint.
> 
> I think you missed what I said. It's not hidden in /sys/kernel/debug.
> If you enable tracefs, you have /sys/kernel/tracing created, and is
> completely separate from debugfs. I only have it *also* automatically
> mounted to /sys/kernel/debug/tracing for backward compatibility
> reasons, as older versions of trace-cmd will only mount debugfs (as
> root), and expect to find it there.
> 
> mount -t tracefs nodev /sys/kernel/tracing

Too many slashes :/

A group could work for v1.  Maybe all the tools should get updated to use this path?

> 
> -- Steve
> 
>> 
>>> 
>>> Should we allow CAP_TRACING access to /proc/kallsyms? as it is helpful
>>> to convert perf and trace-cmd's function pointers into names. Once you
>>> allow tracing of the kernel, hiding /proc/kallsyms is pretty useless.  
>> 
>> I think we should.
> 

^ permalink raw reply

* Re: [RFC PATCH net-next] net: phy: force phy suspend when calling phy_stop
From: shenjian (K) @ 2019-08-28  1:03 UTC (permalink / raw)
  To: Heiner Kallweit, andrew, f.fainelli, davem
  Cc: netdev, forest.zhouchang, linuxarm
In-Reply-To: <04fdbe88-8471-c023-4a0d-890667735737@gmail.com>



在 2019/8/28 3:41, Heiner Kallweit 写道:
> On 27.08.2019 10:29, shenjian (K) wrote:
>>
>>
>> 在 2019/8/27 13:51, Heiner Kallweit 写道:
>>> On 27.08.2019 04:47, Jian Shen wrote:
>>>> Some ethernet drivers may call phy_start() and phy_stop() from
>>>> ndo_open and ndo_close() respectively.
>>>>
>>>> When network cable is unconnected, and operate like below:
>>>> step 1: ifconfig ethX up -> ndo_open -> phy_start ->start
>>>> autoneg, and phy is no link.
>>>> step 2: ifconfig ethX down -> ndo_close -> phy_stop -> just stop
>>>> phy state machine.
>>>> step 3: plugin the network cable, and autoneg complete, then
>>>> LED for link status will be on.
>>>> step 4: ethtool ethX --> see the result of "Link detected" is no.
>>>>
>>> Step 3 and 4 seem to be unrelated to the actual issue.
>>> With which MAC + PHY driver did you observe this?
>>>
>> Thanks Heiner,
>>
>> I tested this on HNS3 driver, with two phy, Marvell 88E1512 and RTL8211.
>>
>> Step 3 and Step 4 is just to describe that the LED of link shows link up,
>> but the port information shows no link.
>>
> ethtool refers to the link at MAC level. Therefore default implementation
> ethtool_op_get_link just returns the result of netif_carrier_ok().
> Also using PHY link status if interface is down doesn't really make sense:
> - phylib state machine isn't running, therefore PHY status doesn't get updated
> - often MAC drivers shut down parts of the MAC on ndo_close, this typically
>   makes the internal MDIO bus unaccessible
> So just remove steps 3 and 4. The patch itself is fine with me.
> 
OK, I will fix the commit log.
Thanks, Heiner.
>>
>>>> This patch forces phy suspend even phydev->link is off.
>>>>
>>>> Signed-off-by: Jian Shen <shenjian15@huawei.com>
>>>> ---
>>>>  drivers/net/phy/phy.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>>>> index f3adea9..0acd5b4 100644
>>>> --- a/drivers/net/phy/phy.c
>>>> +++ b/drivers/net/phy/phy.c
>>>> @@ -911,8 +911,8 @@ void phy_state_machine(struct work_struct *work)
>>>>  		if (phydev->link) {
>>>>  			phydev->link = 0;
>>>>  			phy_link_down(phydev, true);
>>>> -			do_suspend = true;
>>>>  		}
>>>> +		do_suspend = true;
>>>>  		break;
>>>>  	}
>>>>  
>>>>
>>> Heiner
>>>
>>>
>>
>>
> 
> 
> .
> 


^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Andy Lutomirski @ 2019-08-28  0:55 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Alexei Starovoitov, Kees Cook, LSM List,
	James Morris, Jann Horn, Peter Zijlstra, Masami Hiramatsu,
	Steven Rostedt, David S. Miller, Daniel Borkmann,
	Network Development, bpf, kernel-team, Linux API
In-Reply-To: <20190828003447.htgzsxs5oevn3eys@ast-mbp.dhcp.thefacebook.com>

On Tue, Aug 27, 2019 at 5:34 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Aug 27, 2019 at 04:01:08PM -0700, Andy Lutomirski wrote:
> > [adding some security and tracing folks to cc]
> >
> > On Tue, Aug 27, 2019 at 1:52 PM Alexei Starovoitov <ast@kernel.org> wrote:
> > >
> > > Introduce CAP_BPF that allows loading all types of BPF programs,
> > > create most map types, load BTF, iterate programs and maps.
> > > CAP_BPF alone is not enough to attach or run programs.
> > >
> > > Networking:
> > >
> > > CAP_BPF and CAP_NET_ADMIN are necessary to:
> > > - attach to cgroup-bpf hooks like INET_INGRESS, INET_SOCK_CREATE, INET4_CONNECT
> > > - run networking bpf programs (like xdp, skb, flow_dissector)
> > >
> > > Tracing:
> > >
> > > CAP_BPF and perf_paranoid_tracepoint_raw() (which is kernel.perf_event_paranoid == -1)
> > > are necessary to:
> > > - attach bpf program to raw tracepoint
> > > - use bpf_trace_printk() in all program types (not only tracing programs)
> > > - create bpf stackmap
> > >
> > > To attach bpf to perf_events perf_event_open() needs to succeed as usual.
> > >
> > > CAP_BPF controls BPF side.
> > > CAP_NET_ADMIN controls intersection where BPF calls into networking.
> > > perf_paranoid_tracepoint_raw controls intersection where BPF calls into tracing.
> > >
> > > In the future CAP_TRACING could be introduced to control
> > > creation of kprobe/uprobe and attaching bpf to perf_events.
> > > In such case bpf_probe_read() thin wrapper would be controlled by CAP_BPF.
> > > Whereas probe_read() would be controlled by CAP_TRACING.
> > > CAP_TRACING would also control generic kprobe+probe_read.
> > > CAP_BPF and CAP_TRACING would be necessary for tracing bpf programs
> > > that want to use bpf_probe_read.
> >
> > First, some high-level review:
> >
> > Can you write up some clear documentation aimed at administrators that
> > says what CAP_BPF does?  For example, is it expected that CAP_BPF by
> > itself permits reading all kernel memory?
>
> hmm. the answer is in the sentence you quoted right above.

I was hoping for something in Documentation/admin-guide, not in a
changelog that's hard to find.

>
> > Can you give at least one fully described use case where CAP_BPF
> > solves a real-world problem that is not solved by existing mechanisms?
>
> bpftrace binary would be installed with CAP_BPF and CAP_TRACING.
> bcc tools would be installed with CAP_BPF and CAP_TRACING.
> perf binary would be installed with CAP_TRACING only.
> XDP networking daemon would be installed with CAP_BPF and CAP_NET_ADMIN.
> None of them would need full root.

As in just setting these bits in fscaps?  What does this achieve
beyond just installing them setuid-root or with CAP_SYS_ADMIN and
judicious use of capset internally?  For that matter, what prevents
unauthorized users from tracing the system if you do this?  This just
lets anyone trace the system, which seems like a mistake.

Can you clarify your example or give another one?

>
> > Changing the capability that some existing operation requires could
> > break existing programs.  The old capability may need to be accepted
> > as well.
>
> As far as I can see there is no ABI breakage. Please point out
> which line of the patch may break it.

As a more or less arbitrary selection:

 void bpf_prog_kallsyms_add(struct bpf_prog *fp)
 {
        if (!bpf_prog_kallsyms_candidate(fp) ||
-           !capable(CAP_SYS_ADMIN))
+           !capable(CAP_BPF))
                return;

Before your patch, a task with CAP_SYS_ADMIN could do this.  Now it
can't.  Per the usual Linux definition of "ABI break", this is an ABI
break if and only if someone actually did this in a context where they
have CAP_SYS_ADMIN but not all capabilities.  How confident are you
that no one does things like this?
 void bpf_prog_kallsyms_add(struct bpf_prog *fp)
 {
        if (!bpf_prog_kallsyms_candidate(fp) ||
-           !capable(CAP_SYS_ADMIN))
+           !capable(CAP_BPF))
                return;

> > > ---
> > > I would prefer to introduce CAP_TRACING soon, since it
> > > will make tracing and networking permission model symmetrical.
> > >
> >
> > Here's my proposal for CAP_TRACING, documentation-style:
> >
> > --- begin ---
> >
> > CAP_TRACING enables a task to use various kernel features to trace
> > running user programs and the kernel itself.  CAP_TRACING also enables
> > a task to bypass some speculation attack countermeasures.  A task in
> > the init user namespace with CAP_TRACING will be able to tell exactly
> > what kernel code is executed and when, and will be able to read kernel
> > registers and kernel memory.  It will, similarly, be able to read the
> > state of other user tasks.
> >
> > Specifically, CAP_TRACING allows the following operations.  It may
> > allow more operations in the future:
> >
> >  - Full use of perf_event_open(), similarly to the effect of
> > kernel.perf_event_paranoid == -1.
>
> +1
>
> >  - Loading and attaching tracing BPF programs, including use of BPF
> > raw tracepoints.
>
> -1
>
> >  - Use of BPF stack maps.
>
> -1
>
> >  - Use of bpf_probe_read() and bpf_trace_printk().
>
> -1
>
> >  - Use of unsafe pointer-to-integer conversions in BPF.
>
> -1
>
> >  - Bypassing of BPF's speculation attack hardening measures and
> > constant blinding.  (Note: other mechanisms might also allow this.)
>
> -1
> All of the above are allowed by CAP_BPF.
> They are not allowed by CAP_TRACING.

Why?  I don't mean to discount your -1, and you may well have a
compelling reason.  If so, I'll change my proposal.

From the previous discussion, you want to make progress toward solving
a lot of problems with CAP_BPF.  One of them was making BPF
firewalling more generally useful. By making CAP_BPF grant the ability
to read kernel memory, you will make administrators much more nervous
to grant CAP_BPF.  Similarly, and correct me if I'm wrong, most of
these capabilities are primarily or only useful for tracing, so I
don't see why users without CAP_TRACING should get them.
bpf_trace_printk(), in particular, even has "trace" in its name :)

Also, if a task has CAP_TRACING, it's expected to be able to trace the
system -- that's the whole point.  Why shouldn't it be able to use BPF
to trace the system better?

>
> > CAP_TRACING does not override normal permissions on sysfs or debugfs.
> > This means that, unless a new interface for programming kprobes and
> > such is added, it does not directly allow use of kprobes.
>
> kprobes can be created via perf_event_open already.
> So above statement contradicts your first statement.

Hmm.  The way of using perf with kprobes that I'm familiar with is:

# perf probe --add func_name

And this uses "/sys/kernel/debug/tracing//kprobe_events" (with the
double slash!).  Is there indeed another way to do this?

Anyway, I didn't mean to exclude any existing perf_event_open()
mechanism -- what I meant was that, without some extension to my
proposal, /sys/kernel/debug/tracing wouldn't magically become
accessible due to CAP_TRACING.

>
> > If CAP_TRACING, by itself, enables a task to crash or otherwise
> > corrupt the kernel or other tasks, this will be considered a kernel
> > bug.
>
> +1
>
> > CAP_TRACING in a non-init user namespace may, in the future, allow
> > tracing of other tasks in that user namespace or its descendants.  It
> > will not enable kernel tracing or tracing of tasks outside the user
> > namespace in question.
>
> I would avoid describing user ns for now.
> There is enough confusion without it.
>
> > --- end ---
> >
> > Does this sound good?  The idea here is that CAP_TRACING should be
> > very useful even without CAP_BPF, which allows CAP_BPF to be less
> > powerful.
>
> As proposed CAP_BPF does not allow tracing or networking on its own.
> CAP_BPF only controls BPF side.
>
> For example:
> BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
> {
>         int ret;
>
>         ret = probe_kernel_read(dst, unsafe_ptr, size);
>         if (unlikely(ret < 0))
>                 memset(dst, 0, size);
>
>         return ret;
> }
>
> All of BPF (including prototype of bpf_probe_read) is controlled by CAP_BPF.
> But the kernel primitives its using (probe_kernel_read) is controlled by CAP_TRACING.
> Hence a task needs _both_ CAP_BPF and CAP_TRACING to attach and run bpf program
> that uses bpf_probe_read.
>
> Similar with all other kernel code that BPF helpers may call directly or indirectly.
> If there is a way for bpf program to call into piece of code controlled by CAP_TRACING
> such helper would need CAP_BPF and CAP_TRACING.
> If bpf helper calls into something that may mangle networking packet
> such helper would need both CAP_BPF and CAP_NET_ADMIN to execute.

Why do you want to require CAP_BPF to call into functions like
bpf_probe_read()?  I understand why you want to limit access to bpf,
but I think that CAP_TRACING should be sufficient to allow the tracing
parts of BPF.  After all, a lot of your concerns, especially the ones
involving speculation, don't really apply to users with CAP_TRACING --
users with CAP_TRACING can read kernel memory with or without bpf.

>
> > > @@ -2080,7 +2083,10 @@ static int bpf_prog_test_run(const union bpf_attr *attr,
> > >         struct bpf_prog *prog;
> > >         int ret = -ENOTSUPP;
> > >
> > > -       if (!capable(CAP_SYS_ADMIN))
> > > +       if (!capable(CAP_NET_ADMIN) || !capable(CAP_BPF))
> > > +               /* test_run callback is available for networking progs only.
> > > +                * Add cap_bpf_tracing() above when tracing progs become runable.
> > > +                */
> >
> > I think test_run should probably be CAP_SYS_ADMIN forever.  test_run
> > is the only way that one can run a bpf program and call helper
> > functions via the program if one doesn't have permission to attach the
> > program.
>
> Since CAP_BPF + CAP_NET_ADMIN allow attach. It means that a task
> with these two permissions will have programs running anyway.
> (traffic will flow through netdev, socket events will happen, etc)
> Hence no reason to disallow running program via test_run.
>

test_run allows fully controlled inputs, in a context where a program
can trivially flush caches, mistrain branch predictors, etc first.  It
seems to me that, if a JITted bpf program contains an exploitable
speculation gadget (MDS, Spectre v1, RSB, or anything else), it will
be *much* easier to exploit it using test_run than using normal
network traffic.  Similarly, normal network traffic will have network
headers that are valid enough to have caused the BPF program to be
invoked in the first place.  test_run can inject arbitrary garbage.

^ permalink raw reply

* RE: [E1000-devel] SFP+ EEPROM readouts fail on X722 (ethtool -m: Invalid argument)
From: Fujinaka, Todd @ 2019-08-28  0:53 UTC (permalink / raw)
  To: Jakub Jankowski, e1000-devel@lists.sourceforge.net
  Cc: netdev@vger.kernel.org, mhemsley@open-systems.com
In-Reply-To: <34ba28aa-44a5-8a6c-c8c4-b92a16f952ad@toxcorp.com>

Sorry about the top posting, but if I don't do it this way I can't read anything in Outlook (not my preferred MUA).

I think I may have been wrong about things. I'm not as familiar with the x722, and the NVM versions are completely different than the x710 and I was confused.

Even worse, I'm not sure if the x722 is able to read the data from the SFP/SFP+ EEPROM. I remembered that was a feature we requested internally but I don't remember what the progress was.

I'm asking around to see if I can get clarification. I haven't heard anything yet.

Todd Fujinaka
Software Application Engineer
Datacenter Engineering Group
Intel Corporation
todd.fujinaka@intel.com


-----Original Message-----
From: Jakub Jankowski [mailto:shasta@toxcorp.com] 
Sent: Tuesday, August 27, 2019 4:01 PM
To: Fujinaka, Todd <todd.fujinaka@intel.com>; e1000-devel@lists.sourceforge.net
Cc: netdev@vger.kernel.org; mhemsley@open-systems.com
Subject: Re: [E1000-devel] SFP+ EEPROM readouts fail on X722 (ethtool -m: Invalid argument)

Hi,

On 8/27/19 7:56 PM, Fujinaka, Todd wrote:
> The hints should be:
> # ethtool -m eth10
> Cannot get module EEPROM information: Invalid argument # dmesg | tail -n 1 [  445.971974] i40e 0000:3d:00.3 eth10: Module EEPROM memory read not supported. Please update the NVM image.
>
> # ethtool -i eth10
> driver: i40e
> version: 2.9.21
> firmware-version: 3.31 0x80000d31 1.1767.0
>
> And the working case:
> # ethtool -i eth8
> driver: i40e
> version: 2.9.21
> firmware-version: 6.01 0x800035cf 1.1876.0
>
> If you don't see it, 6.01 > 3.31.
The reason why firmware between the two is (that much) different is because the non-working case is from X722 NIC, while the working one is from X710.

> The NVM update tool should be available on downloadcenter.intel.com

Thanks for the pointer to NVM updater. I'd like to offer some additional comments about my experience with the newest one (v4.00):

a) running ./nvmupdate64e (from X722_NVMUpdate_Linux_x64 subdir) errors out without really saying what's wrong:

   # ./nvmupdate64e

   Intel(R) Ethernet NVM Update Tool
   NVMUpdate version 1.30.2.11
   Copyright (C) 2013 - 2017 Intel Corporation.


   WARNING: To avoid damage to your device, do not stop the update or reboot or power off the system during this update.
   Inventory in progress. Please wait [+.........]
   Tool execution completed with the following status: The configuration file could not be opened/read, or a syntax error was discovered in the file
   Press any key to exit.

after enabling logging (-l out.log) a bit more is revealed:

   # tail -n 2 out.log
   Error:   Config file line 2: Not supported config file version.
   Error:   Missing CONFIG VERSION parameter in configuration file.

but that's not entirely true, CONFIG VERSION is set in the default configuration file:

   # head -n 2 nvmupdate.cfg
   CURRENT FAMILY: 1.0.0
   CONFIG VERSION: 1.14.0

so why isn't this understood?
Manually editing nvmupdate.cfg and setting CONFIG VERSION: 1.11.0 seems to make this particular problem go away.

b) Re-doing this with downgraded config version exposes another problem:

   Config file read.
   Error:   Can't open NVM map file [Immediate_offset_2.txt]

and indeed, there is no Immediate_offset_2.txt in NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_Linux_x64/
There is one, however, in
NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_EFIx64/ subdir. 
Copying it over to the _Linux_x64 resolves this particular problem

c) Re-doing this with Immediate_offset_2.txt in place exposes third problem:

   Error:   Can't open NVM image file
[LBG_B2_Wolf_Pass_WFT_X557_P01_PHY_Auto_Detect_P23_NCSI_v3.31_800016DB.bin]

and once again - same story. It exists in NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_EFIx64/ but not NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_Linux_x64/ - had to copy it over.


Once I managed to get all these out of the way, the tool finally ran:

   Num Description                               Ver. DevId S:B Status
   === ======================================== ===== ===== ====== ===============
   01) Intel(R) Ethernet Server Adapter I350-T4  1.99  1521 00:024 Update not available
   02) Intel(R) Ethernet Connection X722 for     3.49  37D2 00:061 Update
       10GBASE-T available
   03) Intel(R) Ethernet Server Adapter I350-T4  1.99  1521 00:175 Update not available


The initial starting point was:

0) firmware-version: 3.31 0x80000d31 1.1767.0

After first update+reboot, this was bumped to:

1) firmware-version: 3.1d 0x800016db 1.1767.0    (but ethtool -m ethX still doesn't work)

So I ran the tool the second time, it said 'Update available' again, but this time:

   Num Description                               Ver. DevId S:B Status
   === ======================================== ===== ===== ====== ===============
   01) Intel(R) Ethernet Server Adapter I350-T4  1.99  1521 00:024 Update not available
   02) Intel(R) Ethernet Connection X722 for     3.29  37D2 00:061 Update
       10GBASE-T available
   03) Intel(R) Ethernet Server Adapter I350-T4  1.99  1521 00:175 Update not available

   Options: Adapter Index List (comma-separated), [A]ll, e[X]it
   Enter selection:02
   Would you like to back up the NVM images? [Y]es/[N]o: Y
   Update in progress. This operation may take several minutes.
   [*******+..]
   Tool execution completed with the following status: <---------- why is there no status printed?
   Press any key to exit.


Checking output log:

   # cat out3.log
   Intel(R) Ethernet NVM Update Tool
   NVMUpdate version 1.30.2.11
   Copyright (C) 2013 - 2017 Intel Corporation.

   ./nvmupdate64e -c nvmupdate.cfg -l out3.log

   Config file read.
   Inventory
   [00:061:00:00]: Intel(R) Ethernet Connection X722 for 10GBASE-T
       Flash inventory started
       Shadow RAM inventory started
   Alternate MAC address is not set
       Shadow RAM inventory finished
       Flash inventory finished
       OROM inventory started
       OROM inventory finished
       PHY NVM inventory started
       PHY NVM inventory finished
   [00:061:00:01]: Intel(R) Ethernet Connection X722 for 10GBASE-T
       Device already inventoried.
   [00:061:00:02]: Intel(R) Ethernet Connection X722 for 10GbE SFP+
       Device already inventoried.
       PHY NVM inventory started
       PHY NVM inventory finished
   [00:061:00:03]: Intel(R) Ethernet Connection X722 for 10GbE SFP+
       Device already inventoried.
   Update
   [00:061:00:00]: Intel(R) Ethernet Connection X722 for 10GBASE-T
       Creating backup images in directory: A4BF0164884A
       Backup images created.
       Flash update started
       NVM image verification started
       Shadow RAM image verification started

   Image differences found at offset 0x3AE [Device=0xF, Buffer=0x0] - 
update required.
   Error:   Flash update failed
   [00:061:00:02]: Intel(R) Ethernet Connection X722 for 10GbE SFP+
   #

However, ethtool -i suggests that firmware was updated to:

2) firmware-version: 4.00 0x80001577 1.1580.0    <------- so it did 
_something_ after all?

At this point, every subsequent attempt to run the NVM updater yields 
the same results: an update is available, but attempting to apply it 
fails with the same message in log.

And my initial issue still persists - ethtool -m <iface> still returns 
"invalid argument" with "Module EEPROM memory read not supported. Please 
update the NVM image" logged in dmesg.

How can I resolve this?

Cheers,
  Jakub.

>
> Todd Fujinaka
> Software Application Engineer
> Datacenter Engineering Group
> Intel Corporation
> todd.fujinaka@intel.com
>
>
> -----Original Message-----
> From: Jakub Jankowski [mailto:shasta@toxcorp.com]
> Sent: Tuesday, August 27, 2019 4:03 AM
> To: e1000-devel@lists.sourceforge.net
> Cc: netdev@vger.kernel.org; shasta@toxcorp.com; mhemsley@open-systems.com
> Subject: [E1000-devel] SFP+ EEPROM readouts fail on X722 (ethtool -m: Invalid argument)
>
> Hi,
>
> We can't get SFP+ EEPROM readouts for X722 to work at all:
>
> # ethtool -m eth10
> Cannot get module EEPROM information: Invalid argument # dmesg | tail -n 1 [  445.971974] i40e 0000:3d:00.3 eth10: Module EEPROM memory read not supported. Please update the NVM image.
> # lspci | grep 3d:00.3
> 3d:00.3 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GbE SFP+ (rev 09)
>
>
> We're running 4.19.65 kernel at the moment, testing using the newest out-of-tree Intel module
>
> # modinfo -F version i40e
> 2.9.21
>
> We also tried:
> - 4.19.65 with in-tree i40e (2.3.2-k)
> - stock Arch Linux (kernel 5.2.5, driver 2.8.20-k) and the results are the same, as shown above.
>
> # ethtool -i eth10
> driver: i40e
> version: 2.9.21
> firmware-version: 3.31 0x80000d31 1.1767.0
> expansion-rom-version:
> bus-info: 0000:3d:00.3
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
> # dmidecode -s baseboard-manufacturer
> Intel Corporation
> # dmidecode -s baseboard-product-name
> S2600WFT
> # dmidecode -s baseboard-version
> H48104-853
>
> # lspci -vvv
> (...)
> 3d:00.3 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GbE SFP+ (rev 09)
> 	DeviceName: Intel PCH Integrated 10 Gigabit Ethernet Controller
> 	Subsystem: Intel Corporation Ethernet Connection X722 for 10GbE SFP+
> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 32 bytes
> 	Interrupt: pin A routed to IRQ 112
> 	NUMA node: 0
> 	Region 0: Memory at ab000000 (64-bit, prefetchable) [size=16M]
> 	Region 3: Memory at b0000000 (64-bit, prefetchable) [size=32K]
> 	Expansion ROM at <ignored> [disabled]
> 	Capabilities: [40] Power Management version 3
> 		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> 	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> 		Address: 0000000000000000  Data: 0000
> 		Masking: 00000000  Pending: 00000000
> 	Capabilities: [70] MSI-X: Enable+ Count=129 Masked-
> 		Vector table: BAR=3 offset=00000000
> 		PBA: BAR=3 offset=00001000
> 	Capabilities: [a0] Express (v2) Endpoint, MSI 00
> 		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> 			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
> 		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-
> 			MaxPayload 256 bytes, MaxReadReq 512 bytes
> 		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
> 		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
> 			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> 		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
> 			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> 		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
> 			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> 			 AtomicOpsCtl: ReqEn-
> 		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> 			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> 	Capabilities: [e0] Vital Product Data
> 		Product Name: Example VPD
> 		Read-only fields:
> 			[V0] Vendor specific:
> 			[RV] Reserved: checksum good, 0 byte(s) reserved
> 		End
> 	Capabilities: [100 v2] Advanced Error Reporting
> 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
> 		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> 		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> 		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
> 			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> 		HeaderLog: 00000000 00000000 00000000 00000000
> 	Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
> 		ARICap:	MFVC- ACS-, Next Function: 0
> 		ARICtl:	MFVC- ACS-, Function Group: 0
> 	Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
> 		IOVCap:	Migration-, Interrupt Message Number: 000
> 		IOVCtl:	Enable- Migration- Interrupt- MSE- ARIHierarchy-
> 		IOVSta:	Migration-
> 		Initial VFs: 32, Total VFs: 32, Number of VFs: 0, Function Dependency Link: 03
> 		VF offset: 109, stride: 1, Device ID: 37cd
> 		Supported Page Size: 00000553, System Page Size: 00000001
> 		Region 0: Memory at 00000000af000000 (64-bit, prefetchable)
> 		Region 3: Memory at 00000000b0020000 (64-bit, prefetchable)
> 		VF Migration: offset: 00000000, BIR: 0
> 	Capabilities: [1a0 v1] Transaction Processing Hints
> 		Device specific mode supported
> 		No steering table available
> 	Capabilities: [1b0 v1] Access Control Services
> 		ACSCap:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
> 		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
> 	Kernel driver in use: i40e
> 	Kernel modules: i40e
>
>
> Same kernel+i40e, same SFP+ module - but on Intel X710, works like a treat:
>
> # lspci | grep X7
> 81:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
> 81:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01) # ethtool -m eth8
> 	Identifier                                : 0x03 (SFP)
> 	Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
> 	Connector                                 : 0x07 (LC)
> 	Transceiver codes                         : 0x10 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00
> 	Transceiver type                          : 10G Ethernet: 10G Base-SR
> 	Transceiver type                          : Ethernet: 1000BASE-SX
> 	Encoding                                  : 0x06 (64B/66B)
> 	BR, Nominal                               : 10300MBd
>           (...)
> # ethtool -i eth8
> driver: i40e
> version: 2.9.21
> firmware-version: 6.01 0x800035cf 1.1876.0
> expansion-rom-version:
> bus-info: 0000:81:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
> #
>
>
> Is this a known problem?
>
>
> Best regards,
>    Jakub
>
>
>
> _______________________________________________
> E1000-devel mailing list
> E1000-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
> To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired


^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Steven Rostedt @ 2019-08-28  0:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexei Starovoitov, Kees Cook, LSM List, James Morris, Jann Horn,
	Peter Zijlstra, Masami Hiramatsu, David S. Miller,
	Daniel Borkmann, Network Development, bpf, kernel-team, Linux API
In-Reply-To: <CALCETrUOHRMkBRJi_s30CjZdOLDGtdMOEgqfgPf+q0x+Fw7LtQ@mail.gmail.com>

On Tue, 27 Aug 2019 16:34:47 -0700
Andy Lutomirski <luto@kernel.org> wrote:

> > > CAP_TRACING does not override normal permissions on sysfs or debugfs.
> > > This means that, unless a new interface for programming kprobes and
> > > such is added, it does not directly allow use of kprobes.  
> >
> > kprobes can be created in the tracefs filesystem (which is separate from
> > debugfs, tracefs just gets automatically mounted
> > in /sys/kernel/debug/tracing when debugfs is mounted) from the
> > kprobe_events file. /sys/kernel/tracing is just the tracefs
> > directory without debugfs, and was created specifically to allow
> > tracing to be access without opening up the can of worms in debugfs.  
> 
> I think that, in principle, CAP_TRACING should allow this, but I'm not
> sure how to achieve that.  I suppose we could set up
> inode_operations.permission on tracefs, but what exactly would it do?
> Would it be just like generic_permission() except that it would look
> at CAP_TRACING instead of CAP_DAC_OVERRIDE?  That is, you can use
> tracefs if you have CAP_TRACING *or* acl access?  Or would it be:
> 
> int tracing_permission(struct inode *inode, int mask)
> {
>   if (!capable(CAP_TRACING))
>     return -EPERM;
> 
>   return generic_permission(inode, mask);
> }

Perhaps we should make a group for it?

> 
> Which would mean that you need ACL *and* CAP_TRACING, so
> administrators would change the mode to 777.  That's a bit scary.
> 
> And this still doesn't let people even *find* tracefs, since it's
> hidden in debugfs.
> 
> So maybe make CAP_TRACING override ACLs but also add /sys/fs/tracing
> and mount tracefs there, too, so that regular users can at least find
> the mountpoint.

I think you missed what I said. It's not hidden in /sys/kernel/debug.
If you enable tracefs, you have /sys/kernel/tracing created, and is
completely separate from debugfs. I only have it *also* automatically
mounted to /sys/kernel/debug/tracing for backward compatibility
reasons, as older versions of trace-cmd will only mount debugfs (as
root), and expect to find it there.

 mount -t tracefs nodev /sys/kernel/tracing

-- Steve

> 
> >
> > Should we allow CAP_TRACING access to /proc/kallsyms? as it is helpful
> > to convert perf and trace-cmd's function pointers into names. Once you
> > allow tracing of the kernel, hiding /proc/kallsyms is pretty useless.  
> 
> I think we should.


^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Alexei Starovoitov @ 2019-08-28  0:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andy Lutomirski, Alexei Starovoitov, Kees Cook, LSM List,
	James Morris, Jann Horn, Peter Zijlstra, Masami Hiramatsu,
	David S. Miller, Daniel Borkmann, Network Development, bpf,
	kernel-team, Linux API
In-Reply-To: <20190827192144.3b38b25a@gandalf.local.home>

On Tue, Aug 27, 2019 at 07:21:44PM -0400, Steven Rostedt wrote:
> 
> At least for CAP_TRACING (if it were to allow read/write access
> to /sys/kernel/tracing), that would be very useful. It would be useful
> to those that basically own their machines, and want to trace their
> applications all the way into the kernel without having to run as full
> root.

+1

The proposal is to have CAP_TRACING to control perf and ftrace.
perf and trace-cmd binaries could be installed with CAP_TRACING and that's
all they need to do full tracing.

I can craft a patch for perf_event_open side and demo CAP_TRACING.
Once that cap bit is ready you can use it on ftrace side?

> Should we allow CAP_TRACING access to /proc/kallsyms? as it is helpful
> to convert perf and trace-cmd's function pointers into names. Once you
> allow tracing of the kernel, hiding /proc/kallsyms is pretty useless.

yep.


^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Alexei Starovoitov @ 2019-08-28  0:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexei Starovoitov, Kees Cook, LSM List, James Morris, Jann Horn,
	Peter Zijlstra, Masami Hiramatsu, Steven Rostedt, David S. Miller,
	Daniel Borkmann, Network Development, bpf, kernel-team, Linux API
In-Reply-To: <CALCETrV8iJv9+Ai11_1_r6MapPhhwt9hjxi=6EoixytabTScqg@mail.gmail.com>

On Tue, Aug 27, 2019 at 04:01:08PM -0700, Andy Lutomirski wrote:
> [adding some security and tracing folks to cc]
> 
> On Tue, Aug 27, 2019 at 1:52 PM Alexei Starovoitov <ast@kernel.org> wrote:
> >
> > Introduce CAP_BPF that allows loading all types of BPF programs,
> > create most map types, load BTF, iterate programs and maps.
> > CAP_BPF alone is not enough to attach or run programs.
> >
> > Networking:
> >
> > CAP_BPF and CAP_NET_ADMIN are necessary to:
> > - attach to cgroup-bpf hooks like INET_INGRESS, INET_SOCK_CREATE, INET4_CONNECT
> > - run networking bpf programs (like xdp, skb, flow_dissector)
> >
> > Tracing:
> >
> > CAP_BPF and perf_paranoid_tracepoint_raw() (which is kernel.perf_event_paranoid == -1)
> > are necessary to:
> > - attach bpf program to raw tracepoint
> > - use bpf_trace_printk() in all program types (not only tracing programs)
> > - create bpf stackmap
> >
> > To attach bpf to perf_events perf_event_open() needs to succeed as usual.
> >
> > CAP_BPF controls BPF side.
> > CAP_NET_ADMIN controls intersection where BPF calls into networking.
> > perf_paranoid_tracepoint_raw controls intersection where BPF calls into tracing.
> >
> > In the future CAP_TRACING could be introduced to control
> > creation of kprobe/uprobe and attaching bpf to perf_events.
> > In such case bpf_probe_read() thin wrapper would be controlled by CAP_BPF.
> > Whereas probe_read() would be controlled by CAP_TRACING.
> > CAP_TRACING would also control generic kprobe+probe_read.
> > CAP_BPF and CAP_TRACING would be necessary for tracing bpf programs
> > that want to use bpf_probe_read.
> 
> First, some high-level review:
> 
> Can you write up some clear documentation aimed at administrators that
> says what CAP_BPF does?  For example, is it expected that CAP_BPF by
> itself permits reading all kernel memory?

hmm. the answer is in the sentence you quoted right above.

> Can you give at least one fully described use case where CAP_BPF
> solves a real-world problem that is not solved by existing mechanisms?

bpftrace binary would be installed with CAP_BPF and CAP_TRACING.
bcc tools would be installed with CAP_BPF and CAP_TRACING.
perf binary would be installed with CAP_TRACING only.
XDP networking daemon would be installed with CAP_BPF and CAP_NET_ADMIN.
None of them would need full root.

> Changing the capability that some existing operation requires could
> break existing programs.  The old capability may need to be accepted
> as well.

As far as I can see there is no ABI breakage. Please point out
which line of the patch may break it.

> I'm inclined to suggest that CAP_TRACING be figured out or rejected
> before something like this gets applied.

that's fair.

> 
> >
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> > I would prefer to introduce CAP_TRACING soon, since it
> > will make tracing and networking permission model symmetrical.
> >
> 
> Here's my proposal for CAP_TRACING, documentation-style:
> 
> --- begin ---
> 
> CAP_TRACING enables a task to use various kernel features to trace
> running user programs and the kernel itself.  CAP_TRACING also enables
> a task to bypass some speculation attack countermeasures.  A task in
> the init user namespace with CAP_TRACING will be able to tell exactly
> what kernel code is executed and when, and will be able to read kernel
> registers and kernel memory.  It will, similarly, be able to read the
> state of other user tasks.
> 
> Specifically, CAP_TRACING allows the following operations.  It may
> allow more operations in the future:
> 
>  - Full use of perf_event_open(), similarly to the effect of
> kernel.perf_event_paranoid == -1.

+1

>  - Loading and attaching tracing BPF programs, including use of BPF
> raw tracepoints.

-1

>  - Use of BPF stack maps.

-1

>  - Use of bpf_probe_read() and bpf_trace_printk().

-1

>  - Use of unsafe pointer-to-integer conversions in BPF.

-1

>  - Bypassing of BPF's speculation attack hardening measures and
> constant blinding.  (Note: other mechanisms might also allow this.)

-1
All of the above are allowed by CAP_BPF.
They are not allowed by CAP_TRACING.

> CAP_TRACING does not override normal permissions on sysfs or debugfs.
> This means that, unless a new interface for programming kprobes and
> such is added, it does not directly allow use of kprobes.

kprobes can be created via perf_event_open already.
So above statement contradicts your first statement.

> If CAP_TRACING, by itself, enables a task to crash or otherwise
> corrupt the kernel or other tasks, this will be considered a kernel
> bug.

+1

> CAP_TRACING in a non-init user namespace may, in the future, allow
> tracing of other tasks in that user namespace or its descendants.  It
> will not enable kernel tracing or tracing of tasks outside the user
> namespace in question.

I would avoid describing user ns for now.
There is enough confusion without it.

> --- end ---
> 
> Does this sound good?  The idea here is that CAP_TRACING should be
> very useful even without CAP_BPF, which allows CAP_BPF to be less
> powerful.

As proposed CAP_BPF does not allow tracing or networking on its own.
CAP_BPF only controls BPF side.

For example:
BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
{
        int ret;

        ret = probe_kernel_read(dst, unsafe_ptr, size);
        if (unlikely(ret < 0))
                memset(dst, 0, size);

        return ret;
}

All of BPF (including prototype of bpf_probe_read) is controlled by CAP_BPF.
But the kernel primitives its using (probe_kernel_read) is controlled by CAP_TRACING.
Hence a task needs _both_ CAP_BPF and CAP_TRACING to attach and run bpf program
that uses bpf_probe_read.

Similar with all other kernel code that BPF helpers may call directly or indirectly.
If there is a way for bpf program to call into piece of code controlled by CAP_TRACING
such helper would need CAP_BPF and CAP_TRACING.
If bpf helper calls into something that may mangle networking packet
such helper would need both CAP_BPF and CAP_NET_ADMIN to execute.

> > @@ -2080,7 +2083,10 @@ static int bpf_prog_test_run(const union bpf_attr *attr,
> >         struct bpf_prog *prog;
> >         int ret = -ENOTSUPP;
> >
> > -       if (!capable(CAP_SYS_ADMIN))
> > +       if (!capable(CAP_NET_ADMIN) || !capable(CAP_BPF))
> > +               /* test_run callback is available for networking progs only.
> > +                * Add cap_bpf_tracing() above when tracing progs become runable.
> > +                */
> 
> I think test_run should probably be CAP_SYS_ADMIN forever.  test_run
> is the only way that one can run a bpf program and call helper
> functions via the program if one doesn't have permission to attach the
> program.  

Since CAP_BPF + CAP_NET_ADMIN allow attach. It means that a task
with these two permissions will have programs running anyway.
(traffic will flow through netdev, socket events will happen, etc)
Hence no reason to disallow running program via test_run.


^ permalink raw reply

* Re: [PATCH V3 net 1/2] openvswitch: Properly set L4 keys on "later" IP fragments
From: Pravin Shelar @ 2019-08-28  0:33 UTC (permalink / raw)
  To: Greg Rose; +Cc: Linux Kernel Network Developers, Joe Stringer
In-Reply-To: <1566917890-22304-1-git-send-email-gvrose8192@gmail.com>

On Tue, Aug 27, 2019 at 7:58 AM Greg Rose <gvrose8192@gmail.com> wrote:
>
> When IP fragments are reassembled before being sent to conntrack, the
> key from the last fragment is used.  Unless there are reordering
> issues, the last fragment received will not contain the L4 ports, so the
> key for the reassembled datagram won't contain them.  This patch updates
> the key once we have a reassembled datagram.
>
> The handle_fragments() function works on L3 headers so we pull the L3/L4
> flow key update code from key_extract into a new function
> 'key_extract_l3l4'.  Then we add a another new function
> ovs_flow_key_update_l3l4() and export it so that it is accessible by
> handle_fragments() for conntrack packet reassembly.
>
> Co-authored by: Justin Pettit <jpettit@ovn.org>
> Signed-off-by: Greg Rose <gvrose8192@gmail.com>
>
Looks good to me.

Acked-by: Pravin B Shelar <pshelar@ovn.org>

Thanks,
Pravin.

^ permalink raw reply

* Re: [PATCH V3 net 2/2] openvswitch: Clear the L4 portion of the key for "later" fragments.
From: Pravin Shelar @ 2019-08-28  0:33 UTC (permalink / raw)
  To: Greg Rose; +Cc: Linux Kernel Network Developers, Joe Stringer, Justin Pettit
In-Reply-To: <1566917890-22304-2-git-send-email-gvrose8192@gmail.com>

On Tue, Aug 27, 2019 at 7:58 AM Greg Rose <gvrose8192@gmail.com> wrote:
>
> From: Justin Pettit <jpettit@ovn.org>
>
> Only the first fragment in a datagram contains the L4 headers.  When the
> Open vSwitch module parses a packet, it always sets the IP protocol
> field in the key, but can only set the L4 fields on the first fragment.
> The original behavior would not clear the L4 portion of the key, so
> garbage values would be sent in the key for "later" fragments.  This
> patch clears the L4 fields in that circumstance to prevent sending those
> garbage values as part of the upcall.
>
> Signed-off-by: Justin Pettit <jpettit@ovn.org>

Acked-by: Pravin B Shelar <pshelar@ovn.org>

Thanks,
Pravin.

^ permalink raw reply

* Re: [PATCH 10/10] mm/oom_debug: Add Enhanced Process Print Information
From: kbuild test robot @ 2019-08-28  0:21 UTC (permalink / raw)
  To: Edward Chron
  Cc: kbuild-all, Andrew Morton, Michal Hocko, Roman Gushchin,
	Johannes Weiner, David Rientjes, Tetsuo Handa, Shakeel Butt,
	linux-mm, linux-kernel, colona, Edward Chron, David S. Miller,
	netdev
In-Reply-To: <20190826193638.6638-11-echron@arista.com>

[-- Attachment #1: Type: text/plain, Size: 972 bytes --]

Hi Edward,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.3-rc6 next-20190827]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Edward-Chron/mm-oom_debug-Add-Debug-base-code/20190827-183210
config: i386-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-10) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: mm/oom_kill_debug.o: in function `timespec_format.constprop.2':
>> oom_kill_debug.c:(.text+0x156): undefined reference to `__udivdi3'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 69549 bytes --]

^ permalink raw reply

* [bpf-next] bpf: fix error check in bpf_tcp_gen_syncookie
From: Petar Penkov @ 2019-08-27 23:46 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Petar Penkov, Stanislav Fomichev

From: Petar Penkov <ppenkov@google.com>

If a SYN cookie is not issued by tcp_v#_gen_syncookie, then the return
value will be exactly 0, rather than <= 0. Let's change the check to
reflect that, especially since mss is an unsigned value and cannot be
negative.

Fixes: 70d66244317e ("bpf: add bpf_tcp_gen_syncookie helper")
Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 net/core/filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 0c1059cdad3d..17bc9af8f156 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5903,7 +5903,7 @@ BPF_CALL_5(bpf_tcp_gen_syncookie, struct sock *, sk, void *, iph, u32, iph_len,
 	default:
 		return -EPROTONOSUPPORT;
 	}
-	if (mss <= 0)
+	if (mss == 0)
 		return -ENOENT;
 
 	return cookie | ((u64)mss << 32);
-- 
2.23.0.187.g17f5b7556c-goog


^ permalink raw reply related

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Andy Lutomirski @ 2019-08-27 23:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andy Lutomirski, Alexei Starovoitov, Kees Cook, LSM List,
	James Morris, Jann Horn, Peter Zijlstra, Masami Hiramatsu,
	David S. Miller, Daniel Borkmann, Network Development, bpf,
	kernel-team, Linux API
In-Reply-To: <20190827192144.3b38b25a@gandalf.local.home>

On Tue, Aug 27, 2019 at 4:21 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 27 Aug 2019 16:01:08 -0700
> Andy Lutomirski <luto@kernel.org> wrote:
>
> > [adding some security and tracing folks to cc]
> >
> > On Tue, Aug 27, 2019 at 1:52 PM Alexei Starovoitov <ast@kernel.org> wrote:
> > >
> > > Introduce CAP_BPF that allows loading all types of BPF programs,
> > > create most map types, load BTF, iterate programs and maps.
> > > CAP_BPF alone is not enough to attach or run programs.
> > >
> > > Networking:
> > >
> > > CAP_BPF and CAP_NET_ADMIN are necessary to:
> > > - attach to cgroup-bpf hooks like INET_INGRESS, INET_SOCK_CREATE, INET4_CONNECT
> > > - run networking bpf programs (like xdp, skb, flow_dissector)
> > >
> > > Tracing:
> > >
> > > CAP_BPF and perf_paranoid_tracepoint_raw() (which is kernel.perf_event_paranoid == -1)
> > > are necessary to:
> > > - attach bpf program to raw tracepoint
> > > - use bpf_trace_printk() in all program types (not only tracing programs)
> > > - create bpf stackmap
> > >
> > > To attach bpf to perf_events perf_event_open() needs to succeed as usual.
> > >
> > > CAP_BPF controls BPF side.
> > > CAP_NET_ADMIN controls intersection where BPF calls into networking.
> > > perf_paranoid_tracepoint_raw controls intersection where BPF calls into tracing.
> > >
> > > In the future CAP_TRACING could be introduced to control
> > > creation of kprobe/uprobe and attaching bpf to perf_events.
> > > In such case bpf_probe_read() thin wrapper would be controlled by CAP_BPF.
> > > Whereas probe_read() would be controlled by CAP_TRACING.
> > > CAP_TRACING would also control generic kprobe+probe_read.
> > > CAP_BPF and CAP_TRACING would be necessary for tracing bpf programs
> > > that want to use bpf_probe_read.
>
> No mention of the tracefs (/sys/kernel/tracing) file?

See below.  Also, I am embarrassed to admit that I just assumed that
/sys/kernel/debug/tracing was just like any other debugfs directory.

>
>
> >
> > Changing the capability that some existing operation requires could
> > break existing programs.  The old capability may need to be accepted
> > as well.
> >
> > I'm inclined to suggest that CAP_TRACING be figured out or rejected
> > before something like this gets applied.
> >
> >
> > >
> > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > ---
> > > I would prefer to introduce CAP_TRACING soon, since it
> > > will make tracing and networking permission model symmetrical.
> > >
> >
> > Here's my proposal for CAP_TRACING, documentation-style:
> >
> > --- begin ---
> >
> > CAP_TRACING enables a task to use various kernel features to trace
> > running user programs and the kernel itself.  CAP_TRACING also enables
> > a task to bypass some speculation attack countermeasures.  A task in
> > the init user namespace with CAP_TRACING will be able to tell exactly
> > what kernel code is executed and when, and will be able to read kernel
> > registers and kernel memory.  It will, similarly, be able to read the
> > state of other user tasks.
> >
> > Specifically, CAP_TRACING allows the following operations.  It may
> > allow more operations in the future:
> >
> >  - Full use of perf_event_open(), similarly to the effect of
> > kernel.perf_event_paranoid == -1.
> >
> >  - Loading and attaching tracing BPF programs, including use of BPF
> > raw tracepoints.
> >
> >  - Use of BPF stack maps.
> >
> >  - Use of bpf_probe_read() and bpf_trace_printk().
> >
> >  - Use of unsafe pointer-to-integer conversions in BPF.
> >
> >  - Bypassing of BPF's speculation attack hardening measures and
> > constant blinding.  (Note: other mechanisms might also allow this.)
> >
> > CAP_TRACING does not override normal permissions on sysfs or debugfs.
> > This means that, unless a new interface for programming kprobes and
> > such is added, it does not directly allow use of kprobes.
>
> kprobes can be created in the tracefs filesystem (which is separate from
> debugfs, tracefs just gets automatically mounted
> in /sys/kernel/debug/tracing when debugfs is mounted) from the
> kprobe_events file. /sys/kernel/tracing is just the tracefs
> directory without debugfs, and was created specifically to allow
> tracing to be access without opening up the can of worms in debugfs.

I think that, in principle, CAP_TRACING should allow this, but I'm not
sure how to achieve that.  I suppose we could set up
inode_operations.permission on tracefs, but what exactly would it do?
Would it be just like generic_permission() except that it would look
at CAP_TRACING instead of CAP_DAC_OVERRIDE?  That is, you can use
tracefs if you have CAP_TRACING *or* acl access?  Or would it be:

int tracing_permission(struct inode *inode, int mask)
{
  if (!capable(CAP_TRACING))
    return -EPERM;

  return generic_permission(inode, mask);
}

Which would mean that you need ACL *and* CAP_TRACING, so
administrators would change the mode to 777.  That's a bit scary.

And this still doesn't let people even *find* tracefs, since it's
hidden in debugfs.

So maybe make CAP_TRACING override ACLs but also add /sys/fs/tracing
and mount tracefs there, too, so that regular users can at least find
the mountpoint.

>
> Should we allow CAP_TRACING access to /proc/kallsyms? as it is helpful
> to convert perf and trace-cmd's function pointers into names. Once you
> allow tracing of the kernel, hiding /proc/kallsyms is pretty useless.

I think we should.

^ permalink raw reply

* Re: [PATCH v5 net-next 04/18] ionic: Add basic lif support
From: Shannon Nelson @ 2019-08-27 23:29 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, davem
In-Reply-To: <20190826214238.07a0eee9@cakuba.netronome.com>

On 8/26/19 9:42 PM, Jakub Kicinski wrote:
> On Mon, 26 Aug 2019 14:33:25 -0700, Shannon Nelson wrote:
>> +static inline bool ionic_is_pf(struct ionic *ionic)
>> +{
>> +	return ionic->pdev &&
>> +	       ionic->pdev->device == PCI_DEVICE_ID_PENSANDO_IONIC_ETH_PF;
>> +}
>> +
>> +static inline bool ionic_is_vf(struct ionic *ionic)
>> +{
>> +	return ionic->pdev &&
>> +	       ionic->pdev->device == PCI_DEVICE_ID_PENSANDO_IONIC_ETH_VF;
>> +}
>> +
>> +static inline bool ionic_is_25g(struct ionic *ionic)
>> +{
>> +	return ionic_is_pf(ionic) &&
>> +	       ionic->pdev->subsystem_device == IONIC_SUBDEV_ID_NAPLES_25;
>> +}
>> +
>> +static inline bool ionic_is_100g(struct ionic *ionic)
>> +{
>> +	return ionic_is_pf(ionic) &&
>> +	       (ionic->pdev->subsystem_device == IONIC_SUBDEV_ID_NAPLES_100_4 ||
>> +		ionic->pdev->subsystem_device == IONIC_SUBDEV_ID_NAPLES_100_8);
>> +}
> Again, a bunch of unused stuff.

More "near future" support code that didn't get stripped.  I'll pull it 
out for now.

sln


^ permalink raw reply

* Re: [PATCH net-next 1/4] r8169: prepare for adding RTL8125 support
From: Andrew Lunn @ 2019-08-27 23:27 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Realtek linux nic maintainers, David Miller,
	netdev@vger.kernel.org, Chun-Hao Lin
In-Reply-To: <66ac2b09-ea87-a4ba-f6f3-1885e9587298@gmail.com>

On Tue, Aug 27, 2019 at 08:41:00PM +0200, Heiner Kallweit wrote:
> This patch prepares the driver for adding RTL8125 support:
> - change type of interrupt mask to u32
> - restrict rtl_is_8168evl_up to RTL8168 chip versions
> - factor out reading MAC address from registers
> - re-add function rtl_get_events
> - move disabling interrupt coalescing to RTL8169/RTL8168 init
> - read different register for PCI commit
> - don't use bit LastFrag in tx descriptor after send, RTL8125 clears it

Hi Heiner

That is a lot of changes in one patch. Although there is no planned
functional change, r8169 has a habit of breaking. Having lots of small
changes would help tracking down which change caused a breakage, via a
git bisect.

So you might want to consider splitting this up into a number of small
patches.

	Andrew

^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Steven Rostedt @ 2019-08-27 23:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexei Starovoitov, Kees Cook, LSM List, James Morris, Jann Horn,
	Peter Zijlstra, Masami Hiramatsu, David S. Miller,
	Daniel Borkmann, Network Development, bpf, kernel-team, Linux API
In-Reply-To: <CALCETrV8iJv9+Ai11_1_r6MapPhhwt9hjxi=6EoixytabTScqg@mail.gmail.com>

On Tue, 27 Aug 2019 16:01:08 -0700
Andy Lutomirski <luto@kernel.org> wrote:

> [adding some security and tracing folks to cc]
> 
> On Tue, Aug 27, 2019 at 1:52 PM Alexei Starovoitov <ast@kernel.org> wrote:
> >
> > Introduce CAP_BPF that allows loading all types of BPF programs,
> > create most map types, load BTF, iterate programs and maps.
> > CAP_BPF alone is not enough to attach or run programs.
> >
> > Networking:
> >
> > CAP_BPF and CAP_NET_ADMIN are necessary to:
> > - attach to cgroup-bpf hooks like INET_INGRESS, INET_SOCK_CREATE, INET4_CONNECT
> > - run networking bpf programs (like xdp, skb, flow_dissector)
> >
> > Tracing:
> >
> > CAP_BPF and perf_paranoid_tracepoint_raw() (which is kernel.perf_event_paranoid == -1)
> > are necessary to:
> > - attach bpf program to raw tracepoint
> > - use bpf_trace_printk() in all program types (not only tracing programs)
> > - create bpf stackmap
> >
> > To attach bpf to perf_events perf_event_open() needs to succeed as usual.
> >
> > CAP_BPF controls BPF side.
> > CAP_NET_ADMIN controls intersection where BPF calls into networking.
> > perf_paranoid_tracepoint_raw controls intersection where BPF calls into tracing.
> >
> > In the future CAP_TRACING could be introduced to control
> > creation of kprobe/uprobe and attaching bpf to perf_events.
> > In such case bpf_probe_read() thin wrapper would be controlled by CAP_BPF.
> > Whereas probe_read() would be controlled by CAP_TRACING.
> > CAP_TRACING would also control generic kprobe+probe_read.
> > CAP_BPF and CAP_TRACING would be necessary for tracing bpf programs
> > that want to use bpf_probe_read.

No mention of the tracefs (/sys/kernel/tracing) file?
  
> 
> First, some high-level review:
> 
> Can you write up some clear documentation aimed at administrators that
> says what CAP_BPF does?  For example, is it expected that CAP_BPF by
> itself permits reading all kernel memory?  Why might one grant it?
> 
> Can you give at least one fully described use case where CAP_BPF
> solves a real-world problem that is not solved by existing mechanisms?

At least for CAP_TRACING (if it were to allow read/write access
to /sys/kernel/tracing), that would be very useful. It would be useful
to those that basically own their machines, and want to trace their
applications all the way into the kernel without having to run as full
root.


> 
> Changing the capability that some existing operation requires could
> break existing programs.  The old capability may need to be accepted
> as well.
> 
> I'm inclined to suggest that CAP_TRACING be figured out or rejected
> before something like this gets applied.
> 
> 
> >
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> > I would prefer to introduce CAP_TRACING soon, since it
> > will make tracing and networking permission model symmetrical.
> >  
> 
> Here's my proposal for CAP_TRACING, documentation-style:
> 
> --- begin ---
> 
> CAP_TRACING enables a task to use various kernel features to trace
> running user programs and the kernel itself.  CAP_TRACING also enables
> a task to bypass some speculation attack countermeasures.  A task in
> the init user namespace with CAP_TRACING will be able to tell exactly
> what kernel code is executed and when, and will be able to read kernel
> registers and kernel memory.  It will, similarly, be able to read the
> state of other user tasks.
> 
> Specifically, CAP_TRACING allows the following operations.  It may
> allow more operations in the future:
> 
>  - Full use of perf_event_open(), similarly to the effect of
> kernel.perf_event_paranoid == -1.
> 
>  - Loading and attaching tracing BPF programs, including use of BPF
> raw tracepoints.
> 
>  - Use of BPF stack maps.
> 
>  - Use of bpf_probe_read() and bpf_trace_printk().
> 
>  - Use of unsafe pointer-to-integer conversions in BPF.
> 
>  - Bypassing of BPF's speculation attack hardening measures and
> constant blinding.  (Note: other mechanisms might also allow this.)
> 
> CAP_TRACING does not override normal permissions on sysfs or debugfs.
> This means that, unless a new interface for programming kprobes and
> such is added, it does not directly allow use of kprobes.

kprobes can be created in the tracefs filesystem (which is separate from
debugfs, tracefs just gets automatically mounted
in /sys/kernel/debug/tracing when debugfs is mounted) from the
kprobe_events file. /sys/kernel/tracing is just the tracefs
directory without debugfs, and was created specifically to allow
tracing to be access without opening up the can of worms in debugfs.

Should we allow CAP_TRACING access to /proc/kallsyms? as it is helpful
to convert perf and trace-cmd's function pointers into names. Once you
allow tracing of the kernel, hiding /proc/kallsyms is pretty useless.

-- Steve

> 
> If CAP_TRACING, by itself, enables a task to crash or otherwise
> corrupt the kernel or other tasks, this will be considered a kernel
> bug.
> 
> CAP_TRACING in a non-init user namespace may, in the future, allow
> tracing of other tasks in that user namespace or its descendants.  It
> will not enable kernel tracing or tracing of tasks outside the user
> namespace in question.
> 
> --- end ---
> 

^ permalink raw reply

* Re: [PATCH net-next v2 3/3] dpaa2-eth: Add pause frame support
From: Andrew Lunn @ 2019-08-27 23:21 UTC (permalink / raw)
  To: Ioana Radulescu; +Cc: netdev, davem, ioana.ciornei
In-Reply-To: <1566915351-32075-3-git-send-email-ruxandra.radulescu@nxp.com>

On Tue, Aug 27, 2019 at 05:15:51PM +0300, Ioana Radulescu wrote:
> Starting with firmware version MC10.18.0, we have support for
> L2 flow control. Asymmetrical configuration (Rx or Tx only) is
> supported, but not pause frame autonegotioation.

> +static int set_pause(struct dpaa2_eth_priv *priv)
> +{
> +	struct device *dev = priv->net_dev->dev.parent;
> +	struct dpni_link_cfg link_cfg = {0};
> +	int err;
> +
> +	/* Get the default link options so we don't override other flags */
> +	err = dpni_get_link_cfg(priv->mc_io, 0, priv->mc_token, &link_cfg);
> +	if (err) {
> +		dev_err(dev, "dpni_get_link_cfg() failed\n");
> +		return err;
> +	}
> +
> +	link_cfg.options |= DPNI_LINK_OPT_PAUSE;
> +	link_cfg.options &= ~DPNI_LINK_OPT_ASYM_PAUSE;
> +	err = dpni_set_link_cfg(priv->mc_io, 0, priv->mc_token, &link_cfg);
> +	if (err) {
> +		dev_err(dev, "dpni_set_link_cfg() failed\n");
> +		return err;
> +	}
> +
> +	priv->link_state.options = link_cfg.options;
> +
> +	return 0;
> +}
> +
>  /* Configure the DPNI object this interface is associated with */
>  static int setup_dpni(struct fsl_mc_device *ls_dev)
>  {
> @@ -2500,6 +2562,13 @@ static int setup_dpni(struct fsl_mc_device *ls_dev)
>  
>  	set_enqueue_mode(priv);
>  
> +	/* Enable pause frame support */
> +	if (dpaa2_eth_has_pause_support(priv)) {
> +		err = set_pause(priv);
> +		if (err)
> +			goto close;

Hi Ioana

So by default you have the MAC do pause, not asym pause?  Generally,
any MAC that can do asym pause does asym pause.

But if this is what you want, it is not wrong.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH v5 net-next 02/18] ionic: Add hardware init and device commands
From: Shannon Nelson @ 2019-08-27 23:17 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, davem
In-Reply-To: <20190827195017.GR2168@lunn.ch>

On 8/27/19 12:50 PM, Andrew Lunn wrote:
> On Tue, Aug 27, 2019 at 10:39:20AM -0700, Shannon Nelson wrote:
>> On 8/26/19 7:26 PM, Andrew Lunn wrote:
>>> On Mon, Aug 26, 2019 at 02:33:23PM -0700, Shannon Nelson wrote:
>>>> +void ionic_debugfs_add_dev(struct ionic *ionic)
>>>> +{
>>>> +	struct dentry *dentry;
>>>> +
>>>> +	dentry = debugfs_create_dir(ionic_bus_info(ionic), ionic_dir);
>>>> +	if (IS_ERR_OR_NULL(dentry))
>>>> +		return;
>>>> +
>>>> +	ionic->dentry = dentry;
>>>> +}
>>> Hi Shannon
>>>
>>> There was recently a big patchset from GregKH which removed all error
>>> checking from drivers calling debugfs calls. I'm pretty sure you don't
>>> need this check here.
>> With this check I end up either with a valid dentry value or NULL in
>> ionic->dentry.  Without this check I possibly get an error value in
>> ionic->dentry, which can get used later as the parent dentry to try to make
>> a new debugfs node.
> Hi Shannon
>
> What you should find is that every debugfs function will have
> something like:
>
> 	if (IS_ERR(dentry))
> 	   return dentry;
> or
> 	if (IS_ERR(parent))
> 	   return parent;
>
> If you know of a API which is missing such protection, i'm sure GregKH
> would like to know. Especially since he just ripped all such
> protection in driver out. Meaning he just broken some drivers if such
> IS_ERR() calls are missing in the debugfs core.
>
Ah, here's the confusion: there's a start_creating() in the tracefs as 
well as in the debugfs, and the tracefs code doesn't have all of those 
checks.

sln



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox