Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH bpf-next 0/4] bpf: precision tracking tests
From: Daniel Borkmann @ 2019-08-27 22:43 UTC (permalink / raw)
  To: Alexei Starovoitov, davem; +Cc: netdev, bpf, kernel-team
In-Reply-To: <20190823055215.2658669-1-ast@kernel.org>

On 8/23/19 7:52 AM, Alexei Starovoitov wrote:
> Add few additional tests for precision tracking in the verifier.
> 
> Alexei Starovoitov (4):
>    bpf: introduce verifier internal test flag
>    tools/bpf: sync bpf.h
>    selftests/bpf: verifier precise tests
>    selftests/bpf: add precision tracking test
> 
>   include/linux/bpf_verifier.h                  |   1 +
>   include/uapi/linux/bpf.h                      |   3 +
>   kernel/bpf/syscall.c                          |   1 +
>   kernel/bpf/verifier.c                         |   5 +-
>   tools/include/uapi/linux/bpf.h                |   3 +
>   tools/testing/selftests/bpf/test_verifier.c   |  68 +++++++--
>   .../testing/selftests/bpf/verifier/precise.c  | 142 ++++++++++++++++++
>   7 files changed, 211 insertions(+), 12 deletions(-)
>   create mode 100644 tools/testing/selftests/bpf/verifier/precise.c
> 

Applied, thanks!

^ permalink raw reply

* Re: [PATCH bpf-next v3 0/4] selftests/bpf: test_progs: misc fixes
From: Daniel Borkmann @ 2019-08-27 22:44 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev, bpf; +Cc: davem, ast, Andrii Nakryiko
In-Reply-To: <20190821234427.179886-1-sdf@google.com>

On 8/22/19 1:44 AM, Stanislav Fomichev wrote:
> * add test__skip to indicate skipped tests
> * remove global success/error counts (use environment)
> * remove asserts from the tests
> * remove unused ret from send_signal test
> 
> v3:
> * QCHECK -> CHECK_FAIL (Daniel Borkmann)
> 
> v2:
> * drop patch that changes output to keep consistent with test_verifier
>    (Alexei Starovoitov)
> * QCHECK instead of test__fail (Andrii Nakryiko)
> * test__skip count number of subtests (Andrii Nakryiko)
> 
> Cc: Andrii Nakryiko <andriin@fb.com>

Applied, thanks!

^ permalink raw reply

* Re: [PATCH bpf-next] selftests/bpf: remove wrong nhoff in flow dissector test
From: Daniel Borkmann @ 2019-08-27 22:44 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev, bpf; +Cc: davem, ast
In-Reply-To: <20190826222712.171177-1-sdf@google.com>

On 8/27/19 12:27 AM, Stanislav Fomichev wrote:
> .nhoff = 0 is (correctly) reset to ETH_HLEN on the next line so let's
> drop it.
> 
> Signed-off-by: Stanislav Fomichev <sdf@google.com>

Applied, thanks!

^ permalink raw reply

* Re: [E1000-devel] SFP+ EEPROM readouts fail on X722 (ethtool -m: Invalid argument)
From: Jakub Jankowski @ 2019-08-27 23:01 UTC (permalink / raw)
  To: Fujinaka, Todd, e1000-devel@lists.sourceforge.net
  Cc: netdev@vger.kernel.org, mhemsley@open-systems.com
In-Reply-To: <9B4A1B1917080E46B64F07F2989DADD69B013DB0@ORSMSX115.amr.corp.intel.com>

Hi,

On 8/27/19 7:56 PM, Fujinaka, Todd wrote:
> The hints should be:
> # ethtool -m eth10
> Cannot get module EEPROM information: Invalid argument # dmesg | tail -n 1 [  445.971974] i40e 0000:3d:00.3 eth10: Module EEPROM memory read not supported. Please update the NVM image.
>
> # ethtool -i eth10
> driver: i40e
> version: 2.9.21
> firmware-version: 3.31 0x80000d31 1.1767.0
>
> And the working case:
> # ethtool -i eth8
> driver: i40e
> version: 2.9.21
> firmware-version: 6.01 0x800035cf 1.1876.0
>
> If you don't see it, 6.01 > 3.31.
The reason why firmware between the two is (that much) different is 
because the non-working case is from X722 NIC, while the working one is 
from X710.

> The NVM update tool should be available on downloadcenter.intel.com

Thanks for the pointer to NVM updater. I'd like to offer some additional 
comments about my experience with the newest one (v4.00):

a) running ./nvmupdate64e (from X722_NVMUpdate_Linux_x64 subdir) errors 
out without really saying what's wrong:

   # ./nvmupdate64e

   Intel(R) Ethernet NVM Update Tool
   NVMUpdate version 1.30.2.11
   Copyright (C) 2013 - 2017 Intel Corporation.


   WARNING: To avoid damage to your device, do not stop the update or 
reboot or power off the system during this update.
   Inventory in progress. Please wait [+.........]
   Tool execution completed with the following status: The configuration 
file could not be opened/read, or a syntax error was discovered in the file
   Press any key to exit.

after enabling logging (-l out.log) a bit more is revealed:

   # tail -n 2 out.log
   Error:   Config file line 2: Not supported config file version.
   Error:   Missing CONFIG VERSION parameter in configuration file.

but that's not entirely true, CONFIG VERSION is set in the default 
configuration file:

   # head -n 2 nvmupdate.cfg
   CURRENT FAMILY: 1.0.0
   CONFIG VERSION: 1.14.0

so why isn't this understood?
Manually editing nvmupdate.cfg and setting CONFIG VERSION: 1.11.0 seems 
to make this particular problem go away.

b) Re-doing this with downgraded config version exposes another problem:

   Config file read.
   Error:   Can't open NVM map file [Immediate_offset_2.txt]

and indeed, there is no Immediate_offset_2.txt in 
NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_Linux_x64/
There is one, however, in 
NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_EFIx64/ subdir. 
Copying it over to the _Linux_x64 resolves this particular problem

c) Re-doing this with Immediate_offset_2.txt in place exposes third problem:

   Error:   Can't open NVM image file 
[LBG_B2_Wolf_Pass_WFT_X557_P01_PHY_Auto_Detect_P23_NCSI_v3.31_800016DB.bin]

and once again - same story. It exists in 
NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_EFIx64/ but not 
NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_Linux_x64/ - had to 
copy it over.


Once I managed to get all these out of the way, the tool finally ran:

   Num Description                               Ver. DevId S:B Status
   === ======================================== ===== ===== ====== 
===============
   01) Intel(R) Ethernet Server Adapter I350-T4  1.99  1521 00:024 
Update not
available
   02) Intel(R) Ethernet Connection X722 for     3.49  37D2 00:061 Update
       10GBASE-T available
   03) Intel(R) Ethernet Server Adapter I350-T4  1.99  1521 00:175 
Update not
available


The initial starting point was:

0) firmware-version: 3.31 0x80000d31 1.1767.0

After first update+reboot, this was bumped to:

1) firmware-version: 3.1d 0x800016db 1.1767.0    (but ethtool -m ethX 
still doesn't work)

So I ran the tool the second time, it said 'Update available' again, but 
this time:

   Num Description                               Ver. DevId S:B Status
   === ======================================== ===== ===== ====== 
===============
   01) Intel(R) Ethernet Server Adapter I350-T4  1.99  1521 00:024 
Update not
available
   02) Intel(R) Ethernet Connection X722 for     3.29  37D2 00:061 Update
       10GBASE-T available
   03) Intel(R) Ethernet Server Adapter I350-T4  1.99  1521 00:175 
Update not
available

   Options: Adapter Index List (comma-separated), [A]ll, e[X]it
   Enter selection:02
   Would you like to back up the NVM images? [Y]es/[N]o: Y
   Update in progress. This operation may take several minutes.
   [*******+..]
   Tool execution completed with the following status: <---------- why 
is there no status printed?
   Press any key to exit.


Checking output log:

   # cat out3.log
   Intel(R) Ethernet NVM Update Tool
   NVMUpdate version 1.30.2.11
   Copyright (C) 2013 - 2017 Intel Corporation.

   ./nvmupdate64e -c nvmupdate.cfg -l out3.log

   Config file read.
   Inventory
   [00:061:00:00]: Intel(R) Ethernet Connection X722 for 10GBASE-T
       Flash inventory started
       Shadow RAM inventory started
   Alternate MAC address is not set
       Shadow RAM inventory finished
       Flash inventory finished
       OROM inventory started
       OROM inventory finished
       PHY NVM inventory started
       PHY NVM inventory finished
   [00:061:00:01]: Intel(R) Ethernet Connection X722 for 10GBASE-T
       Device already inventoried.
   [00:061:00:02]: Intel(R) Ethernet Connection X722 for 10GbE SFP+
       Device already inventoried.
       PHY NVM inventory started
       PHY NVM inventory finished
   [00:061:00:03]: Intel(R) Ethernet Connection X722 for 10GbE SFP+
       Device already inventoried.
   Update
   [00:061:00:00]: Intel(R) Ethernet Connection X722 for 10GBASE-T
       Creating backup images in directory: A4BF0164884A
       Backup images created.
       Flash update started
       NVM image verification started
       Shadow RAM image verification started

   Image differences found at offset 0x3AE [Device=0xF, Buffer=0x0] - 
update required.
   Error:   Flash update failed
   [00:061:00:02]: Intel(R) Ethernet Connection X722 for 10GbE SFP+
   #

However, ethtool -i suggests that firmware was updated to:

2) firmware-version: 4.00 0x80001577 1.1580.0    <------- so it did 
_something_ after all?

At this point, every subsequent attempt to run the NVM updater yields 
the same results: an update is available, but attempting to apply it 
fails with the same message in log.

And my initial issue still persists - ethtool -m <iface> still returns 
"invalid argument" with "Module EEPROM memory read not supported. Please 
update the NVM image" logged in dmesg.

How can I resolve this?

Cheers,
  Jakub.

>
> Todd Fujinaka
> Software Application Engineer
> Datacenter Engineering Group
> Intel Corporation
> todd.fujinaka@intel.com
>
>
> -----Original Message-----
> From: Jakub Jankowski [mailto:shasta@toxcorp.com]
> Sent: Tuesday, August 27, 2019 4:03 AM
> To: e1000-devel@lists.sourceforge.net
> Cc: netdev@vger.kernel.org; shasta@toxcorp.com; mhemsley@open-systems.com
> Subject: [E1000-devel] SFP+ EEPROM readouts fail on X722 (ethtool -m: Invalid argument)
>
> Hi,
>
> We can't get SFP+ EEPROM readouts for X722 to work at all:
>
> # ethtool -m eth10
> Cannot get module EEPROM information: Invalid argument # dmesg | tail -n 1 [  445.971974] i40e 0000:3d:00.3 eth10: Module EEPROM memory read not supported. Please update the NVM image.
> # lspci | grep 3d:00.3
> 3d:00.3 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GbE SFP+ (rev 09)
>
>
> We're running 4.19.65 kernel at the moment, testing using the newest out-of-tree Intel module
>
> # modinfo -F version i40e
> 2.9.21
>
> We also tried:
> - 4.19.65 with in-tree i40e (2.3.2-k)
> - stock Arch Linux (kernel 5.2.5, driver 2.8.20-k) and the results are the same, as shown above.
>
> # ethtool -i eth10
> driver: i40e
> version: 2.9.21
> firmware-version: 3.31 0x80000d31 1.1767.0
> expansion-rom-version:
> bus-info: 0000:3d:00.3
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
> # dmidecode -s baseboard-manufacturer
> Intel Corporation
> # dmidecode -s baseboard-product-name
> S2600WFT
> # dmidecode -s baseboard-version
> H48104-853
>
> # lspci -vvv
> (...)
> 3d:00.3 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GbE SFP+ (rev 09)
> 	DeviceName: Intel PCH Integrated 10 Gigabit Ethernet Controller
> 	Subsystem: Intel Corporation Ethernet Connection X722 for 10GbE SFP+
> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 32 bytes
> 	Interrupt: pin A routed to IRQ 112
> 	NUMA node: 0
> 	Region 0: Memory at ab000000 (64-bit, prefetchable) [size=16M]
> 	Region 3: Memory at b0000000 (64-bit, prefetchable) [size=32K]
> 	Expansion ROM at <ignored> [disabled]
> 	Capabilities: [40] Power Management version 3
> 		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> 	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> 		Address: 0000000000000000  Data: 0000
> 		Masking: 00000000  Pending: 00000000
> 	Capabilities: [70] MSI-X: Enable+ Count=129 Masked-
> 		Vector table: BAR=3 offset=00000000
> 		PBA: BAR=3 offset=00001000
> 	Capabilities: [a0] Express (v2) Endpoint, MSI 00
> 		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> 			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
> 		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-
> 			MaxPayload 256 bytes, MaxReadReq 512 bytes
> 		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
> 		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
> 			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> 		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
> 			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> 		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
> 			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> 			 AtomicOpsCtl: ReqEn-
> 		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> 			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> 	Capabilities: [e0] Vital Product Data
> 		Product Name: Example VPD
> 		Read-only fields:
> 			[V0] Vendor specific:
> 			[RV] Reserved: checksum good, 0 byte(s) reserved
> 		End
> 	Capabilities: [100 v2] Advanced Error Reporting
> 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
> 		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> 		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> 		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
> 			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> 		HeaderLog: 00000000 00000000 00000000 00000000
> 	Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
> 		ARICap:	MFVC- ACS-, Next Function: 0
> 		ARICtl:	MFVC- ACS-, Function Group: 0
> 	Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
> 		IOVCap:	Migration-, Interrupt Message Number: 000
> 		IOVCtl:	Enable- Migration- Interrupt- MSE- ARIHierarchy-
> 		IOVSta:	Migration-
> 		Initial VFs: 32, Total VFs: 32, Number of VFs: 0, Function Dependency Link: 03
> 		VF offset: 109, stride: 1, Device ID: 37cd
> 		Supported Page Size: 00000553, System Page Size: 00000001
> 		Region 0: Memory at 00000000af000000 (64-bit, prefetchable)
> 		Region 3: Memory at 00000000b0020000 (64-bit, prefetchable)
> 		VF Migration: offset: 00000000, BIR: 0
> 	Capabilities: [1a0 v1] Transaction Processing Hints
> 		Device specific mode supported
> 		No steering table available
> 	Capabilities: [1b0 v1] Access Control Services
> 		ACSCap:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
> 		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
> 	Kernel driver in use: i40e
> 	Kernel modules: i40e
>
>
> Same kernel+i40e, same SFP+ module - but on Intel X710, works like a treat:
>
> # lspci | grep X7
> 81:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
> 81:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01) # ethtool -m eth8
> 	Identifier                                : 0x03 (SFP)
> 	Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
> 	Connector                                 : 0x07 (LC)
> 	Transceiver codes                         : 0x10 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00
> 	Transceiver type                          : 10G Ethernet: 10G Base-SR
> 	Transceiver type                          : Ethernet: 1000BASE-SX
> 	Encoding                                  : 0x06 (64B/66B)
> 	BR, Nominal                               : 10300MBd
>           (...)
> # ethtool -i eth8
> driver: i40e
> version: 2.9.21
> firmware-version: 6.01 0x800035cf 1.1876.0
> expansion-rom-version:
> bus-info: 0000:81:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
> #
>
>
> Is this a known problem?
>
>
> Best regards,
>    Jakub
>
>
>
> _______________________________________________
> E1000-devel mailing list
> E1000-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
> To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired


^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Andy Lutomirski @ 2019-08-27 23:01 UTC (permalink / raw)
  To: Alexei Starovoitov, Kees Cook, LSM List, James Morris, Jann Horn,
	Peter Zijlstra, Masami Hiramatsu, Steven Rostedt
  Cc: David S. Miller, Daniel Borkmann, Network Development, bpf,
	kernel-team, Linux API
In-Reply-To: <20190827205213.456318-1-ast@kernel.org>

[adding some security and tracing folks to cc]

On Tue, Aug 27, 2019 at 1:52 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> Introduce CAP_BPF that allows loading all types of BPF programs,
> create most map types, load BTF, iterate programs and maps.
> CAP_BPF alone is not enough to attach or run programs.
>
> Networking:
>
> CAP_BPF and CAP_NET_ADMIN are necessary to:
> - attach to cgroup-bpf hooks like INET_INGRESS, INET_SOCK_CREATE, INET4_CONNECT
> - run networking bpf programs (like xdp, skb, flow_dissector)
>
> Tracing:
>
> CAP_BPF and perf_paranoid_tracepoint_raw() (which is kernel.perf_event_paranoid == -1)
> are necessary to:
> - attach bpf program to raw tracepoint
> - use bpf_trace_printk() in all program types (not only tracing programs)
> - create bpf stackmap
>
> To attach bpf to perf_events perf_event_open() needs to succeed as usual.
>
> CAP_BPF controls BPF side.
> CAP_NET_ADMIN controls intersection where BPF calls into networking.
> perf_paranoid_tracepoint_raw controls intersection where BPF calls into tracing.
>
> In the future CAP_TRACING could be introduced to control
> creation of kprobe/uprobe and attaching bpf to perf_events.
> In such case bpf_probe_read() thin wrapper would be controlled by CAP_BPF.
> Whereas probe_read() would be controlled by CAP_TRACING.
> CAP_TRACING would also control generic kprobe+probe_read.
> CAP_BPF and CAP_TRACING would be necessary for tracing bpf programs
> that want to use bpf_probe_read.

First, some high-level review:

Can you write up some clear documentation aimed at administrators that
says what CAP_BPF does?  For example, is it expected that CAP_BPF by
itself permits reading all kernel memory?  Why might one grant it?

Can you give at least one fully described use case where CAP_BPF
solves a real-world problem that is not solved by existing mechanisms?

Changing the capability that some existing operation requires could
break existing programs.  The old capability may need to be accepted
as well.

I'm inclined to suggest that CAP_TRACING be figured out or rejected
before something like this gets applied.

>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
> I would prefer to introduce CAP_TRACING soon, since it
> will make tracing and networking permission model symmetrical.
>

Here's my proposal for CAP_TRACING, documentation-style:

--- begin ---

CAP_TRACING enables a task to use various kernel features to trace
running user programs and the kernel itself.  CAP_TRACING also enables
a task to bypass some speculation attack countermeasures.  A task in
the init user namespace with CAP_TRACING will be able to tell exactly
what kernel code is executed and when, and will be able to read kernel
registers and kernel memory.  It will, similarly, be able to read the
state of other user tasks.

Specifically, CAP_TRACING allows the following operations.  It may
allow more operations in the future:

 - Full use of perf_event_open(), similarly to the effect of
kernel.perf_event_paranoid == -1.

 - Loading and attaching tracing BPF programs, including use of BPF
raw tracepoints.

 - Use of BPF stack maps.

 - Use of bpf_probe_read() and bpf_trace_printk().

 - Use of unsafe pointer-to-integer conversions in BPF.

 - Bypassing of BPF's speculation attack hardening measures and
constant blinding.  (Note: other mechanisms might also allow this.)

CAP_TRACING does not override normal permissions on sysfs or debugfs.
This means that, unless a new interface for programming kprobes and
such is added, it does not directly allow use of kprobes.

If CAP_TRACING, by itself, enables a task to crash or otherwise
corrupt the kernel or other tasks, this will be considered a kernel
bug.

CAP_TRACING in a non-init user namespace may, in the future, allow
tracing of other tasks in that user namespace or its descendants.  It
will not enable kernel tracing or tracing of tasks outside the user
namespace in question.

--- end ---

Does this sound good?  The idea here is that CAP_TRACING should be
very useful even without CAP_BPF, which allows CAP_BPF to be less
powerful.

> +bool cap_bpf_tracing(void)
> +{
> +       return capable(CAP_SYS_ADMIN) ||
> +              (capable(CAP_BPF) && !perf_paranoid_tracepoint_raw());
> +}

If auditing is on, this will audit the wrong thing.  James, I think a
helper like:

bool ns_either_cap(struct user_ns *ns, int preferred_cap, int other_cap);

would help.  ns_either_cap returns true if either cap is held (i.e.
effective, as usual).  On success, it audits preferred_cap if held and
other_cap otherwise.  On failure, it audits preferred_cap.  Does this
sound right?

Also, for reference, perf_paranoid_tracepoint_raw() is this:

static inline bool perf_paranoid_tracepoint_raw(void)
{
        return sysctl_perf_event_paranoid > -1;
}

so the overall effect of cap_bpf_tracing() is rather odd, and it seems
to control a few things that don't obvious all have similar security
effects.

> @@ -2080,7 +2083,10 @@ static int bpf_prog_test_run(const union bpf_attr *attr,
>         struct bpf_prog *prog;
>         int ret = -ENOTSUPP;
>
> -       if (!capable(CAP_SYS_ADMIN))
> +       if (!capable(CAP_NET_ADMIN) || !capable(CAP_BPF))
> +               /* test_run callback is available for networking progs only.
> +                * Add cap_bpf_tracing() above when tracing progs become runable.
> +                */

I think test_run should probably be CAP_SYS_ADMIN forever.  test_run
is the only way that one can run a bpf program and call helper
functions via the program if one doesn't have permission to attach the
program.  Also, if there's a way to run a speculation attack via a bpf
program, test_run will make it much easier to do in a controlled
environment.  Finally, when debugging bpf programs, developers can use
their own computers or a VM.

^ permalink raw reply

* [PATCH net-next] net/ncsi: add response handlers for PLDM over NC-SI
From: Ben Wei @ 2019-08-27 23:03 UTC (permalink / raw)
  To: David Miller, sam@mendozajonas.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, openbmc@lists.ozlabs.org
  Cc: Ben Wei

This patch adds handlers for PLDM over NC-SI command response.

This enables NC-SI driver recognizes the packet type so the responses don't get dropped as unknown packet type.

PLDM over NC-SI are not handled in kernel driver for now, but can be passed back to user space via Netlink for further handling.

Signed-off-by: Ben Wei <benwei@fb.com>
---
 net/ncsi/ncsi-pkt.h |  5 +++++
 net/ncsi/ncsi-rsp.c | 11 +++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/net/ncsi/ncsi-pkt.h b/net/ncsi/ncsi-pkt.h index a8e9def593f2..80938b338fee 100644
--- a/net/ncsi/ncsi-pkt.h
+++ b/net/ncsi/ncsi-pkt.h
@@ -387,6 +387,9 @@ struct ncsi_aen_hncdsc_pkt {
 #define NCSI_PKT_CMD_OEM	0x50 /* OEM                              */
 #define NCSI_PKT_CMD_PLDM	0x51 /* PLDM request over NCSI over RBT  */
 #define NCSI_PKT_CMD_GPUUID	0x52 /* Get package UUID                 */
+#define NCSI_PKT_CMD_QPNPR	0x56 /* Query Pending NC PLDM request */
+#define NCSI_PKT_CMD_SNPR	0x57 /* Send NC PLDM Reply  */
+
 
 /* NCSI packet responses */
 #define NCSI_PKT_RSP_CIS	(NCSI_PKT_CMD_CIS    + 0x80)
@@ -419,6 +422,8 @@ struct ncsi_aen_hncdsc_pkt {
 #define NCSI_PKT_RSP_OEM	(NCSI_PKT_CMD_OEM    + 0x80)
 #define NCSI_PKT_RSP_PLDM	(NCSI_PKT_CMD_PLDM   + 0x80)
 #define NCSI_PKT_RSP_GPUUID	(NCSI_PKT_CMD_GPUUID + 0x80)
+#define NCSI_PKT_RSP_QPNPR	(NCSI_PKT_CMD_QPNPR   + 0x80)
+#define NCSI_PKT_RSP_SNPR	(NCSI_PKT_CMD_SNPR   + 0x80)
 
 /* NCSI response code/reason */
 #define NCSI_PKT_RSP_C_COMPLETED	0x0000 /* Command Completed        */
diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c index 5254004f2b42..524974af0db6 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -1035,6 +1035,11 @@ static int ncsi_rsp_handler_gpuuid(struct ncsi_request *nr)
 	return 0;
 }
 
+static int ncsi_rsp_handler_pldm(struct ncsi_request *nr) {
+	return 0;
+}
+
 static int ncsi_rsp_handler_netlink(struct ncsi_request *nr)  {
 	struct ncsi_dev_priv *ndp = nr->ndp;
@@ -1088,8 +1093,10 @@ static struct ncsi_rsp_handler {
 	{ NCSI_PKT_RSP_GNPTS,  48, ncsi_rsp_handler_gnpts   },
 	{ NCSI_PKT_RSP_GPS,     8, ncsi_rsp_handler_gps     },
 	{ NCSI_PKT_RSP_OEM,    -1, ncsi_rsp_handler_oem     },
-	{ NCSI_PKT_RSP_PLDM,    0, NULL                     },
-	{ NCSI_PKT_RSP_GPUUID, 20, ncsi_rsp_handler_gpuuid  }
+	{ NCSI_PKT_RSP_PLDM,   -1, ncsi_rsp_handler_pldm    },
+	{ NCSI_PKT_RSP_GPUUID, 20, ncsi_rsp_handler_gpuuid  },
+	{ NCSI_PKT_RSP_QPNPR,  -1, ncsi_rsp_handler_pldm    },
+	{ NCSI_PKT_RSP_SNPR,   -1, ncsi_rsp_handler_pldm    }
 };
 
 int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device *dev,
--
2.17.1


^ permalink raw reply

* Re: [PATCH v2 net] Add genphy_c45_config_aneg() function to phy-c45.c
From: Andrew Lunn @ 2019-08-27 23:10 UTC (permalink / raw)
  To: David Miller
  Cc: marco.hartmann, f.fainelli, hkallweit1, netdev, linux-kernel,
	christian.herber
In-Reply-To: <20190827.150103.723109968950216148.davem@davemloft.net>

On Tue, Aug 27, 2019 at 03:01:03PM -0700, David Miller wrote:
> From: Marco Hartmann <marco.hartmann@nxp.com>
> Date: Wed, 21 Aug 2019 11:00:46 +0000
> 
> > Commit 34786005eca3 ("net: phy: prevent PHYs w/o Clause 22 regs from calling
> > genphy_config_aneg") introduced a check that aborts phy_config_aneg()
> > if the phy is a C45 phy.
> > This causes phy_state_machine() to call phy_error() so that the phy
> > ends up in PHY_HALTED state.
> > 
> > Instead of returning -EOPNOTSUPP, call genphy_c45_config_aneg()
> > (analogous to the C22 case) so that the state machine can run
> > correctly.
> > 
> > genphy_c45_config_aneg() closely resembles mv3310_config_aneg()
> > in drivers/net/phy/marvell10g.c, excluding vendor specific
> > configurations for 1000BaseT.
> > 
> > Fixes: 22b56e827093 ("net: phy: replace genphy_10g_driver with genphy_c45_driver")
> > 
> > Signed-off-by: Marco Hartmann <marco.hartmann@nxp.com>
> 
> Andrew, gentle ping to respond to Heiner who said:

It at least makes it consistent with phy_restart_aneg() and
phy_aneg_done().

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next v2 1/3] dpaa2-eth: Remove support for changing link settings
From: Andrew Lunn @ 2019-08-27 23:11 UTC (permalink / raw)
  To: Ioana Radulescu; +Cc: netdev, davem, ioana.ciornei
In-Reply-To: <1566915351-32075-1-git-send-email-ruxandra.radulescu@nxp.com>

On Tue, Aug 27, 2019 at 05:15:49PM +0300, Ioana Radulescu wrote:
> We only support fixed-link for now, so there is no point in
> offering users the option to change link settings via ethtool.
> 
> Functionally there is no change, since firmware prevents us from
> changing link parameters anyway.
> 
> Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next v2 2/3] dpaa2-eth: Use stored link settings
From: Andrew Lunn @ 2019-08-27 23:12 UTC (permalink / raw)
  To: Ioana Radulescu; +Cc: netdev, davem, ioana.ciornei
In-Reply-To: <1566915351-32075-2-git-send-email-ruxandra.radulescu@nxp.com>

On Tue, Aug 27, 2019 at 05:15:50PM +0300, Ioana Radulescu wrote:
> Whenever a link state change occurs, we get notified and save
> the new link settings in the device's private data. In ethtool
> get_link_ksettings, use the stored state instead of interrogating
> the firmware each time.
> 
> Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH v5 net-next 02/18] ionic: Add hardware init and device commands
From: Shannon Nelson @ 2019-08-27 23:17 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, davem
In-Reply-To: <20190827195017.GR2168@lunn.ch>

On 8/27/19 12:50 PM, Andrew Lunn wrote:
> On Tue, Aug 27, 2019 at 10:39:20AM -0700, Shannon Nelson wrote:
>> On 8/26/19 7:26 PM, Andrew Lunn wrote:
>>> On Mon, Aug 26, 2019 at 02:33:23PM -0700, Shannon Nelson wrote:
>>>> +void ionic_debugfs_add_dev(struct ionic *ionic)
>>>> +{
>>>> +	struct dentry *dentry;
>>>> +
>>>> +	dentry = debugfs_create_dir(ionic_bus_info(ionic), ionic_dir);
>>>> +	if (IS_ERR_OR_NULL(dentry))
>>>> +		return;
>>>> +
>>>> +	ionic->dentry = dentry;
>>>> +}
>>> Hi Shannon
>>>
>>> There was recently a big patchset from GregKH which removed all error
>>> checking from drivers calling debugfs calls. I'm pretty sure you don't
>>> need this check here.
>> With this check I end up either with a valid dentry value or NULL in
>> ionic->dentry.  Without this check I possibly get an error value in
>> ionic->dentry, which can get used later as the parent dentry to try to make
>> a new debugfs node.
> Hi Shannon
>
> What you should find is that every debugfs function will have
> something like:
>
> 	if (IS_ERR(dentry))
> 	   return dentry;
> or
> 	if (IS_ERR(parent))
> 	   return parent;
>
> If you know of a API which is missing such protection, i'm sure GregKH
> would like to know. Especially since he just ripped all such
> protection in driver out. Meaning he just broken some drivers if such
> IS_ERR() calls are missing in the debugfs core.
>
Ah, here's the confusion: there's a start_creating() in the tracefs as 
well as in the debugfs, and the tracefs code doesn't have all of those 
checks.

sln



^ permalink raw reply

* Re: [PATCH net-next v2 3/3] dpaa2-eth: Add pause frame support
From: Andrew Lunn @ 2019-08-27 23:21 UTC (permalink / raw)
  To: Ioana Radulescu; +Cc: netdev, davem, ioana.ciornei
In-Reply-To: <1566915351-32075-3-git-send-email-ruxandra.radulescu@nxp.com>

On Tue, Aug 27, 2019 at 05:15:51PM +0300, Ioana Radulescu wrote:
> Starting with firmware version MC10.18.0, we have support for
> L2 flow control. Asymmetrical configuration (Rx or Tx only) is
> supported, but not pause frame autonegotioation.

> +static int set_pause(struct dpaa2_eth_priv *priv)
> +{
> +	struct device *dev = priv->net_dev->dev.parent;
> +	struct dpni_link_cfg link_cfg = {0};
> +	int err;
> +
> +	/* Get the default link options so we don't override other flags */
> +	err = dpni_get_link_cfg(priv->mc_io, 0, priv->mc_token, &link_cfg);
> +	if (err) {
> +		dev_err(dev, "dpni_get_link_cfg() failed\n");
> +		return err;
> +	}
> +
> +	link_cfg.options |= DPNI_LINK_OPT_PAUSE;
> +	link_cfg.options &= ~DPNI_LINK_OPT_ASYM_PAUSE;
> +	err = dpni_set_link_cfg(priv->mc_io, 0, priv->mc_token, &link_cfg);
> +	if (err) {
> +		dev_err(dev, "dpni_set_link_cfg() failed\n");
> +		return err;
> +	}
> +
> +	priv->link_state.options = link_cfg.options;
> +
> +	return 0;
> +}
> +
>  /* Configure the DPNI object this interface is associated with */
>  static int setup_dpni(struct fsl_mc_device *ls_dev)
>  {
> @@ -2500,6 +2562,13 @@ static int setup_dpni(struct fsl_mc_device *ls_dev)
>  
>  	set_enqueue_mode(priv);
>  
> +	/* Enable pause frame support */
> +	if (dpaa2_eth_has_pause_support(priv)) {
> +		err = set_pause(priv);
> +		if (err)
> +			goto close;

Hi Ioana

So by default you have the MAC do pause, not asym pause?  Generally,
any MAC that can do asym pause does asym pause.

But if this is what you want, it is not wrong.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Steven Rostedt @ 2019-08-27 23:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexei Starovoitov, Kees Cook, LSM List, James Morris, Jann Horn,
	Peter Zijlstra, Masami Hiramatsu, David S. Miller,
	Daniel Borkmann, Network Development, bpf, kernel-team, Linux API
In-Reply-To: <CALCETrV8iJv9+Ai11_1_r6MapPhhwt9hjxi=6EoixytabTScqg@mail.gmail.com>

On Tue, 27 Aug 2019 16:01:08 -0700
Andy Lutomirski <luto@kernel.org> wrote:

> [adding some security and tracing folks to cc]
> 
> On Tue, Aug 27, 2019 at 1:52 PM Alexei Starovoitov <ast@kernel.org> wrote:
> >
> > Introduce CAP_BPF that allows loading all types of BPF programs,
> > create most map types, load BTF, iterate programs and maps.
> > CAP_BPF alone is not enough to attach or run programs.
> >
> > Networking:
> >
> > CAP_BPF and CAP_NET_ADMIN are necessary to:
> > - attach to cgroup-bpf hooks like INET_INGRESS, INET_SOCK_CREATE, INET4_CONNECT
> > - run networking bpf programs (like xdp, skb, flow_dissector)
> >
> > Tracing:
> >
> > CAP_BPF and perf_paranoid_tracepoint_raw() (which is kernel.perf_event_paranoid == -1)
> > are necessary to:
> > - attach bpf program to raw tracepoint
> > - use bpf_trace_printk() in all program types (not only tracing programs)
> > - create bpf stackmap
> >
> > To attach bpf to perf_events perf_event_open() needs to succeed as usual.
> >
> > CAP_BPF controls BPF side.
> > CAP_NET_ADMIN controls intersection where BPF calls into networking.
> > perf_paranoid_tracepoint_raw controls intersection where BPF calls into tracing.
> >
> > In the future CAP_TRACING could be introduced to control
> > creation of kprobe/uprobe and attaching bpf to perf_events.
> > In such case bpf_probe_read() thin wrapper would be controlled by CAP_BPF.
> > Whereas probe_read() would be controlled by CAP_TRACING.
> > CAP_TRACING would also control generic kprobe+probe_read.
> > CAP_BPF and CAP_TRACING would be necessary for tracing bpf programs
> > that want to use bpf_probe_read.

No mention of the tracefs (/sys/kernel/tracing) file?
  
> 
> First, some high-level review:
> 
> Can you write up some clear documentation aimed at administrators that
> says what CAP_BPF does?  For example, is it expected that CAP_BPF by
> itself permits reading all kernel memory?  Why might one grant it?
> 
> Can you give at least one fully described use case where CAP_BPF
> solves a real-world problem that is not solved by existing mechanisms?

At least for CAP_TRACING (if it were to allow read/write access
to /sys/kernel/tracing), that would be very useful. It would be useful
to those that basically own their machines, and want to trace their
applications all the way into the kernel without having to run as full
root.


> 
> Changing the capability that some existing operation requires could
> break existing programs.  The old capability may need to be accepted
> as well.
> 
> I'm inclined to suggest that CAP_TRACING be figured out or rejected
> before something like this gets applied.
> 
> 
> >
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> > I would prefer to introduce CAP_TRACING soon, since it
> > will make tracing and networking permission model symmetrical.
> >  
> 
> Here's my proposal for CAP_TRACING, documentation-style:
> 
> --- begin ---
> 
> CAP_TRACING enables a task to use various kernel features to trace
> running user programs and the kernel itself.  CAP_TRACING also enables
> a task to bypass some speculation attack countermeasures.  A task in
> the init user namespace with CAP_TRACING will be able to tell exactly
> what kernel code is executed and when, and will be able to read kernel
> registers and kernel memory.  It will, similarly, be able to read the
> state of other user tasks.
> 
> Specifically, CAP_TRACING allows the following operations.  It may
> allow more operations in the future:
> 
>  - Full use of perf_event_open(), similarly to the effect of
> kernel.perf_event_paranoid == -1.
> 
>  - Loading and attaching tracing BPF programs, including use of BPF
> raw tracepoints.
> 
>  - Use of BPF stack maps.
> 
>  - Use of bpf_probe_read() and bpf_trace_printk().
> 
>  - Use of unsafe pointer-to-integer conversions in BPF.
> 
>  - Bypassing of BPF's speculation attack hardening measures and
> constant blinding.  (Note: other mechanisms might also allow this.)
> 
> CAP_TRACING does not override normal permissions on sysfs or debugfs.
> This means that, unless a new interface for programming kprobes and
> such is added, it does not directly allow use of kprobes.

kprobes can be created in the tracefs filesystem (which is separate from
debugfs, tracefs just gets automatically mounted
in /sys/kernel/debug/tracing when debugfs is mounted) from the
kprobe_events file. /sys/kernel/tracing is just the tracefs
directory without debugfs, and was created specifically to allow
tracing to be access without opening up the can of worms in debugfs.

Should we allow CAP_TRACING access to /proc/kallsyms? as it is helpful
to convert perf and trace-cmd's function pointers into names. Once you
allow tracing of the kernel, hiding /proc/kallsyms is pretty useless.

-- Steve

> 
> If CAP_TRACING, by itself, enables a task to crash or otherwise
> corrupt the kernel or other tasks, this will be considered a kernel
> bug.
> 
> CAP_TRACING in a non-init user namespace may, in the future, allow
> tracing of other tasks in that user namespace or its descendants.  It
> will not enable kernel tracing or tracing of tasks outside the user
> namespace in question.
> 
> --- end ---
> 

^ permalink raw reply

* Re: [PATCH net-next 1/4] r8169: prepare for adding RTL8125 support
From: Andrew Lunn @ 2019-08-27 23:27 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Realtek linux nic maintainers, David Miller,
	netdev@vger.kernel.org, Chun-Hao Lin
In-Reply-To: <66ac2b09-ea87-a4ba-f6f3-1885e9587298@gmail.com>

On Tue, Aug 27, 2019 at 08:41:00PM +0200, Heiner Kallweit wrote:
> This patch prepares the driver for adding RTL8125 support:
> - change type of interrupt mask to u32
> - restrict rtl_is_8168evl_up to RTL8168 chip versions
> - factor out reading MAC address from registers
> - re-add function rtl_get_events
> - move disabling interrupt coalescing to RTL8169/RTL8168 init
> - read different register for PCI commit
> - don't use bit LastFrag in tx descriptor after send, RTL8125 clears it

Hi Heiner

That is a lot of changes in one patch. Although there is no planned
functional change, r8169 has a habit of breaking. Having lots of small
changes would help tracking down which change caused a breakage, via a
git bisect.

So you might want to consider splitting this up into a number of small
patches.

	Andrew

^ permalink raw reply

* Re: [PATCH v5 net-next 04/18] ionic: Add basic lif support
From: Shannon Nelson @ 2019-08-27 23:29 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, davem
In-Reply-To: <20190826214238.07a0eee9@cakuba.netronome.com>

On 8/26/19 9:42 PM, Jakub Kicinski wrote:
> On Mon, 26 Aug 2019 14:33:25 -0700, Shannon Nelson wrote:
>> +static inline bool ionic_is_pf(struct ionic *ionic)
>> +{
>> +	return ionic->pdev &&
>> +	       ionic->pdev->device == PCI_DEVICE_ID_PENSANDO_IONIC_ETH_PF;
>> +}
>> +
>> +static inline bool ionic_is_vf(struct ionic *ionic)
>> +{
>> +	return ionic->pdev &&
>> +	       ionic->pdev->device == PCI_DEVICE_ID_PENSANDO_IONIC_ETH_VF;
>> +}
>> +
>> +static inline bool ionic_is_25g(struct ionic *ionic)
>> +{
>> +	return ionic_is_pf(ionic) &&
>> +	       ionic->pdev->subsystem_device == IONIC_SUBDEV_ID_NAPLES_25;
>> +}
>> +
>> +static inline bool ionic_is_100g(struct ionic *ionic)
>> +{
>> +	return ionic_is_pf(ionic) &&
>> +	       (ionic->pdev->subsystem_device == IONIC_SUBDEV_ID_NAPLES_100_4 ||
>> +		ionic->pdev->subsystem_device == IONIC_SUBDEV_ID_NAPLES_100_8);
>> +}
> Again, a bunch of unused stuff.

More "near future" support code that didn't get stripped.  I'll pull it 
out for now.

sln


^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Andy Lutomirski @ 2019-08-27 23:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andy Lutomirski, Alexei Starovoitov, Kees Cook, LSM List,
	James Morris, Jann Horn, Peter Zijlstra, Masami Hiramatsu,
	David S. Miller, Daniel Borkmann, Network Development, bpf,
	kernel-team, Linux API
In-Reply-To: <20190827192144.3b38b25a@gandalf.local.home>

On Tue, Aug 27, 2019 at 4:21 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 27 Aug 2019 16:01:08 -0700
> Andy Lutomirski <luto@kernel.org> wrote:
>
> > [adding some security and tracing folks to cc]
> >
> > On Tue, Aug 27, 2019 at 1:52 PM Alexei Starovoitov <ast@kernel.org> wrote:
> > >
> > > Introduce CAP_BPF that allows loading all types of BPF programs,
> > > create most map types, load BTF, iterate programs and maps.
> > > CAP_BPF alone is not enough to attach or run programs.
> > >
> > > Networking:
> > >
> > > CAP_BPF and CAP_NET_ADMIN are necessary to:
> > > - attach to cgroup-bpf hooks like INET_INGRESS, INET_SOCK_CREATE, INET4_CONNECT
> > > - run networking bpf programs (like xdp, skb, flow_dissector)
> > >
> > > Tracing:
> > >
> > > CAP_BPF and perf_paranoid_tracepoint_raw() (which is kernel.perf_event_paranoid == -1)
> > > are necessary to:
> > > - attach bpf program to raw tracepoint
> > > - use bpf_trace_printk() in all program types (not only tracing programs)
> > > - create bpf stackmap
> > >
> > > To attach bpf to perf_events perf_event_open() needs to succeed as usual.
> > >
> > > CAP_BPF controls BPF side.
> > > CAP_NET_ADMIN controls intersection where BPF calls into networking.
> > > perf_paranoid_tracepoint_raw controls intersection where BPF calls into tracing.
> > >
> > > In the future CAP_TRACING could be introduced to control
> > > creation of kprobe/uprobe and attaching bpf to perf_events.
> > > In such case bpf_probe_read() thin wrapper would be controlled by CAP_BPF.
> > > Whereas probe_read() would be controlled by CAP_TRACING.
> > > CAP_TRACING would also control generic kprobe+probe_read.
> > > CAP_BPF and CAP_TRACING would be necessary for tracing bpf programs
> > > that want to use bpf_probe_read.
>
> No mention of the tracefs (/sys/kernel/tracing) file?

See below.  Also, I am embarrassed to admit that I just assumed that
/sys/kernel/debug/tracing was just like any other debugfs directory.

>
>
> >
> > Changing the capability that some existing operation requires could
> > break existing programs.  The old capability may need to be accepted
> > as well.
> >
> > I'm inclined to suggest that CAP_TRACING be figured out or rejected
> > before something like this gets applied.
> >
> >
> > >
> > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > ---
> > > I would prefer to introduce CAP_TRACING soon, since it
> > > will make tracing and networking permission model symmetrical.
> > >
> >
> > Here's my proposal for CAP_TRACING, documentation-style:
> >
> > --- begin ---
> >
> > CAP_TRACING enables a task to use various kernel features to trace
> > running user programs and the kernel itself.  CAP_TRACING also enables
> > a task to bypass some speculation attack countermeasures.  A task in
> > the init user namespace with CAP_TRACING will be able to tell exactly
> > what kernel code is executed and when, and will be able to read kernel
> > registers and kernel memory.  It will, similarly, be able to read the
> > state of other user tasks.
> >
> > Specifically, CAP_TRACING allows the following operations.  It may
> > allow more operations in the future:
> >
> >  - Full use of perf_event_open(), similarly to the effect of
> > kernel.perf_event_paranoid == -1.
> >
> >  - Loading and attaching tracing BPF programs, including use of BPF
> > raw tracepoints.
> >
> >  - Use of BPF stack maps.
> >
> >  - Use of bpf_probe_read() and bpf_trace_printk().
> >
> >  - Use of unsafe pointer-to-integer conversions in BPF.
> >
> >  - Bypassing of BPF's speculation attack hardening measures and
> > constant blinding.  (Note: other mechanisms might also allow this.)
> >
> > CAP_TRACING does not override normal permissions on sysfs or debugfs.
> > This means that, unless a new interface for programming kprobes and
> > such is added, it does not directly allow use of kprobes.
>
> kprobes can be created in the tracefs filesystem (which is separate from
> debugfs, tracefs just gets automatically mounted
> in /sys/kernel/debug/tracing when debugfs is mounted) from the
> kprobe_events file. /sys/kernel/tracing is just the tracefs
> directory without debugfs, and was created specifically to allow
> tracing to be access without opening up the can of worms in debugfs.

I think that, in principle, CAP_TRACING should allow this, but I'm not
sure how to achieve that.  I suppose we could set up
inode_operations.permission on tracefs, but what exactly would it do?
Would it be just like generic_permission() except that it would look
at CAP_TRACING instead of CAP_DAC_OVERRIDE?  That is, you can use
tracefs if you have CAP_TRACING *or* acl access?  Or would it be:

int tracing_permission(struct inode *inode, int mask)
{
  if (!capable(CAP_TRACING))
    return -EPERM;

  return generic_permission(inode, mask);
}

Which would mean that you need ACL *and* CAP_TRACING, so
administrators would change the mode to 777.  That's a bit scary.

And this still doesn't let people even *find* tracefs, since it's
hidden in debugfs.

So maybe make CAP_TRACING override ACLs but also add /sys/fs/tracing
and mount tracefs there, too, so that regular users can at least find
the mountpoint.

>
> Should we allow CAP_TRACING access to /proc/kallsyms? as it is helpful
> to convert perf and trace-cmd's function pointers into names. Once you
> allow tracing of the kernel, hiding /proc/kallsyms is pretty useless.

I think we should.

^ permalink raw reply

* [bpf-next] bpf: fix error check in bpf_tcp_gen_syncookie
From: Petar Penkov @ 2019-08-27 23:46 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Petar Penkov, Stanislav Fomichev

From: Petar Penkov <ppenkov@google.com>

If a SYN cookie is not issued by tcp_v#_gen_syncookie, then the return
value will be exactly 0, rather than <= 0. Let's change the check to
reflect that, especially since mss is an unsigned value and cannot be
negative.

Fixes: 70d66244317e ("bpf: add bpf_tcp_gen_syncookie helper")
Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 net/core/filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 0c1059cdad3d..17bc9af8f156 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5903,7 +5903,7 @@ BPF_CALL_5(bpf_tcp_gen_syncookie, struct sock *, sk, void *, iph, u32, iph_len,
 	default:
 		return -EPROTONOSUPPORT;
 	}
-	if (mss <= 0)
+	if (mss == 0)
 		return -ENOENT;
 
 	return cookie | ((u64)mss << 32);
-- 
2.23.0.187.g17f5b7556c-goog


^ permalink raw reply related

* Re: [PATCH 10/10] mm/oom_debug: Add Enhanced Process Print Information
From: kbuild test robot @ 2019-08-28  0:21 UTC (permalink / raw)
  To: Edward Chron
  Cc: kbuild-all, Andrew Morton, Michal Hocko, Roman Gushchin,
	Johannes Weiner, David Rientjes, Tetsuo Handa, Shakeel Butt,
	linux-mm, linux-kernel, colona, Edward Chron, David S. Miller,
	netdev
In-Reply-To: <20190826193638.6638-11-echron@arista.com>

[-- Attachment #1: Type: text/plain, Size: 972 bytes --]

Hi Edward,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.3-rc6 next-20190827]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Edward-Chron/mm-oom_debug-Add-Debug-base-code/20190827-183210
config: i386-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-10) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: mm/oom_kill_debug.o: in function `timespec_format.constprop.2':
>> oom_kill_debug.c:(.text+0x156): undefined reference to `__udivdi3'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 69549 bytes --]

^ permalink raw reply

* Re: [PATCH V3 net 2/2] openvswitch: Clear the L4 portion of the key for "later" fragments.
From: Pravin Shelar @ 2019-08-28  0:33 UTC (permalink / raw)
  To: Greg Rose; +Cc: Linux Kernel Network Developers, Joe Stringer, Justin Pettit
In-Reply-To: <1566917890-22304-2-git-send-email-gvrose8192@gmail.com>

On Tue, Aug 27, 2019 at 7:58 AM Greg Rose <gvrose8192@gmail.com> wrote:
>
> From: Justin Pettit <jpettit@ovn.org>
>
> Only the first fragment in a datagram contains the L4 headers.  When the
> Open vSwitch module parses a packet, it always sets the IP protocol
> field in the key, but can only set the L4 fields on the first fragment.
> The original behavior would not clear the L4 portion of the key, so
> garbage values would be sent in the key for "later" fragments.  This
> patch clears the L4 fields in that circumstance to prevent sending those
> garbage values as part of the upcall.
>
> Signed-off-by: Justin Pettit <jpettit@ovn.org>

Acked-by: Pravin B Shelar <pshelar@ovn.org>

Thanks,
Pravin.

^ permalink raw reply

* Re: [PATCH V3 net 1/2] openvswitch: Properly set L4 keys on "later" IP fragments
From: Pravin Shelar @ 2019-08-28  0:33 UTC (permalink / raw)
  To: Greg Rose; +Cc: Linux Kernel Network Developers, Joe Stringer
In-Reply-To: <1566917890-22304-1-git-send-email-gvrose8192@gmail.com>

On Tue, Aug 27, 2019 at 7:58 AM Greg Rose <gvrose8192@gmail.com> wrote:
>
> When IP fragments are reassembled before being sent to conntrack, the
> key from the last fragment is used.  Unless there are reordering
> issues, the last fragment received will not contain the L4 ports, so the
> key for the reassembled datagram won't contain them.  This patch updates
> the key once we have a reassembled datagram.
>
> The handle_fragments() function works on L3 headers so we pull the L3/L4
> flow key update code from key_extract into a new function
> 'key_extract_l3l4'.  Then we add a another new function
> ovs_flow_key_update_l3l4() and export it so that it is accessible by
> handle_fragments() for conntrack packet reassembly.
>
> Co-authored by: Justin Pettit <jpettit@ovn.org>
> Signed-off-by: Greg Rose <gvrose8192@gmail.com>
>
Looks good to me.

Acked-by: Pravin B Shelar <pshelar@ovn.org>

Thanks,
Pravin.

^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Alexei Starovoitov @ 2019-08-28  0:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexei Starovoitov, Kees Cook, LSM List, James Morris, Jann Horn,
	Peter Zijlstra, Masami Hiramatsu, Steven Rostedt, David S. Miller,
	Daniel Borkmann, Network Development, bpf, kernel-team, Linux API
In-Reply-To: <CALCETrV8iJv9+Ai11_1_r6MapPhhwt9hjxi=6EoixytabTScqg@mail.gmail.com>

On Tue, Aug 27, 2019 at 04:01:08PM -0700, Andy Lutomirski wrote:
> [adding some security and tracing folks to cc]
> 
> On Tue, Aug 27, 2019 at 1:52 PM Alexei Starovoitov <ast@kernel.org> wrote:
> >
> > Introduce CAP_BPF that allows loading all types of BPF programs,
> > create most map types, load BTF, iterate programs and maps.
> > CAP_BPF alone is not enough to attach or run programs.
> >
> > Networking:
> >
> > CAP_BPF and CAP_NET_ADMIN are necessary to:
> > - attach to cgroup-bpf hooks like INET_INGRESS, INET_SOCK_CREATE, INET4_CONNECT
> > - run networking bpf programs (like xdp, skb, flow_dissector)
> >
> > Tracing:
> >
> > CAP_BPF and perf_paranoid_tracepoint_raw() (which is kernel.perf_event_paranoid == -1)
> > are necessary to:
> > - attach bpf program to raw tracepoint
> > - use bpf_trace_printk() in all program types (not only tracing programs)
> > - create bpf stackmap
> >
> > To attach bpf to perf_events perf_event_open() needs to succeed as usual.
> >
> > CAP_BPF controls BPF side.
> > CAP_NET_ADMIN controls intersection where BPF calls into networking.
> > perf_paranoid_tracepoint_raw controls intersection where BPF calls into tracing.
> >
> > In the future CAP_TRACING could be introduced to control
> > creation of kprobe/uprobe and attaching bpf to perf_events.
> > In such case bpf_probe_read() thin wrapper would be controlled by CAP_BPF.
> > Whereas probe_read() would be controlled by CAP_TRACING.
> > CAP_TRACING would also control generic kprobe+probe_read.
> > CAP_BPF and CAP_TRACING would be necessary for tracing bpf programs
> > that want to use bpf_probe_read.
> 
> First, some high-level review:
> 
> Can you write up some clear documentation aimed at administrators that
> says what CAP_BPF does?  For example, is it expected that CAP_BPF by
> itself permits reading all kernel memory?

hmm. the answer is in the sentence you quoted right above.

> Can you give at least one fully described use case where CAP_BPF
> solves a real-world problem that is not solved by existing mechanisms?

bpftrace binary would be installed with CAP_BPF and CAP_TRACING.
bcc tools would be installed with CAP_BPF and CAP_TRACING.
perf binary would be installed with CAP_TRACING only.
XDP networking daemon would be installed with CAP_BPF and CAP_NET_ADMIN.
None of them would need full root.

> Changing the capability that some existing operation requires could
> break existing programs.  The old capability may need to be accepted
> as well.

As far as I can see there is no ABI breakage. Please point out
which line of the patch may break it.

> I'm inclined to suggest that CAP_TRACING be figured out or rejected
> before something like this gets applied.

that's fair.

> 
> >
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> > I would prefer to introduce CAP_TRACING soon, since it
> > will make tracing and networking permission model symmetrical.
> >
> 
> Here's my proposal for CAP_TRACING, documentation-style:
> 
> --- begin ---
> 
> CAP_TRACING enables a task to use various kernel features to trace
> running user programs and the kernel itself.  CAP_TRACING also enables
> a task to bypass some speculation attack countermeasures.  A task in
> the init user namespace with CAP_TRACING will be able to tell exactly
> what kernel code is executed and when, and will be able to read kernel
> registers and kernel memory.  It will, similarly, be able to read the
> state of other user tasks.
> 
> Specifically, CAP_TRACING allows the following operations.  It may
> allow more operations in the future:
> 
>  - Full use of perf_event_open(), similarly to the effect of
> kernel.perf_event_paranoid == -1.

+1

>  - Loading and attaching tracing BPF programs, including use of BPF
> raw tracepoints.

-1

>  - Use of BPF stack maps.

-1

>  - Use of bpf_probe_read() and bpf_trace_printk().

-1

>  - Use of unsafe pointer-to-integer conversions in BPF.

-1

>  - Bypassing of BPF's speculation attack hardening measures and
> constant blinding.  (Note: other mechanisms might also allow this.)

-1
All of the above are allowed by CAP_BPF.
They are not allowed by CAP_TRACING.

> CAP_TRACING does not override normal permissions on sysfs or debugfs.
> This means that, unless a new interface for programming kprobes and
> such is added, it does not directly allow use of kprobes.

kprobes can be created via perf_event_open already.
So above statement contradicts your first statement.

> If CAP_TRACING, by itself, enables a task to crash or otherwise
> corrupt the kernel or other tasks, this will be considered a kernel
> bug.

+1

> CAP_TRACING in a non-init user namespace may, in the future, allow
> tracing of other tasks in that user namespace or its descendants.  It
> will not enable kernel tracing or tracing of tasks outside the user
> namespace in question.

I would avoid describing user ns for now.
There is enough confusion without it.

> --- end ---
> 
> Does this sound good?  The idea here is that CAP_TRACING should be
> very useful even without CAP_BPF, which allows CAP_BPF to be less
> powerful.

As proposed CAP_BPF does not allow tracing or networking on its own.
CAP_BPF only controls BPF side.

For example:
BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
{
        int ret;

        ret = probe_kernel_read(dst, unsafe_ptr, size);
        if (unlikely(ret < 0))
                memset(dst, 0, size);

        return ret;
}

All of BPF (including prototype of bpf_probe_read) is controlled by CAP_BPF.
But the kernel primitives its using (probe_kernel_read) is controlled by CAP_TRACING.
Hence a task needs _both_ CAP_BPF and CAP_TRACING to attach and run bpf program
that uses bpf_probe_read.

Similar with all other kernel code that BPF helpers may call directly or indirectly.
If there is a way for bpf program to call into piece of code controlled by CAP_TRACING
such helper would need CAP_BPF and CAP_TRACING.
If bpf helper calls into something that may mangle networking packet
such helper would need both CAP_BPF and CAP_NET_ADMIN to execute.

> > @@ -2080,7 +2083,10 @@ static int bpf_prog_test_run(const union bpf_attr *attr,
> >         struct bpf_prog *prog;
> >         int ret = -ENOTSUPP;
> >
> > -       if (!capable(CAP_SYS_ADMIN))
> > +       if (!capable(CAP_NET_ADMIN) || !capable(CAP_BPF))
> > +               /* test_run callback is available for networking progs only.
> > +                * Add cap_bpf_tracing() above when tracing progs become runable.
> > +                */
> 
> I think test_run should probably be CAP_SYS_ADMIN forever.  test_run
> is the only way that one can run a bpf program and call helper
> functions via the program if one doesn't have permission to attach the
> program.  

Since CAP_BPF + CAP_NET_ADMIN allow attach. It means that a task
with these two permissions will have programs running anyway.
(traffic will flow through netdev, socket events will happen, etc)
Hence no reason to disallow running program via test_run.


^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Alexei Starovoitov @ 2019-08-28  0:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andy Lutomirski, Alexei Starovoitov, Kees Cook, LSM List,
	James Morris, Jann Horn, Peter Zijlstra, Masami Hiramatsu,
	David S. Miller, Daniel Borkmann, Network Development, bpf,
	kernel-team, Linux API
In-Reply-To: <20190827192144.3b38b25a@gandalf.local.home>

On Tue, Aug 27, 2019 at 07:21:44PM -0400, Steven Rostedt wrote:
> 
> At least for CAP_TRACING (if it were to allow read/write access
> to /sys/kernel/tracing), that would be very useful. It would be useful
> to those that basically own their machines, and want to trace their
> applications all the way into the kernel without having to run as full
> root.

+1

The proposal is to have CAP_TRACING to control perf and ftrace.
perf and trace-cmd binaries could be installed with CAP_TRACING and that's
all they need to do full tracing.

I can craft a patch for perf_event_open side and demo CAP_TRACING.
Once that cap bit is ready you can use it on ftrace side?

> Should we allow CAP_TRACING access to /proc/kallsyms? as it is helpful
> to convert perf and trace-cmd's function pointers into names. Once you
> allow tracing of the kernel, hiding /proc/kallsyms is pretty useless.

yep.


^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Steven Rostedt @ 2019-08-28  0:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexei Starovoitov, Kees Cook, LSM List, James Morris, Jann Horn,
	Peter Zijlstra, Masami Hiramatsu, David S. Miller,
	Daniel Borkmann, Network Development, bpf, kernel-team, Linux API
In-Reply-To: <CALCETrUOHRMkBRJi_s30CjZdOLDGtdMOEgqfgPf+q0x+Fw7LtQ@mail.gmail.com>

On Tue, 27 Aug 2019 16:34:47 -0700
Andy Lutomirski <luto@kernel.org> wrote:

> > > CAP_TRACING does not override normal permissions on sysfs or debugfs.
> > > This means that, unless a new interface for programming kprobes and
> > > such is added, it does not directly allow use of kprobes.  
> >
> > kprobes can be created in the tracefs filesystem (which is separate from
> > debugfs, tracefs just gets automatically mounted
> > in /sys/kernel/debug/tracing when debugfs is mounted) from the
> > kprobe_events file. /sys/kernel/tracing is just the tracefs
> > directory without debugfs, and was created specifically to allow
> > tracing to be access without opening up the can of worms in debugfs.  
> 
> I think that, in principle, CAP_TRACING should allow this, but I'm not
> sure how to achieve that.  I suppose we could set up
> inode_operations.permission on tracefs, but what exactly would it do?
> Would it be just like generic_permission() except that it would look
> at CAP_TRACING instead of CAP_DAC_OVERRIDE?  That is, you can use
> tracefs if you have CAP_TRACING *or* acl access?  Or would it be:
> 
> int tracing_permission(struct inode *inode, int mask)
> {
>   if (!capable(CAP_TRACING))
>     return -EPERM;
> 
>   return generic_permission(inode, mask);
> }

Perhaps we should make a group for it?

> 
> Which would mean that you need ACL *and* CAP_TRACING, so
> administrators would change the mode to 777.  That's a bit scary.
> 
> And this still doesn't let people even *find* tracefs, since it's
> hidden in debugfs.
> 
> So maybe make CAP_TRACING override ACLs but also add /sys/fs/tracing
> and mount tracefs there, too, so that regular users can at least find
> the mountpoint.

I think you missed what I said. It's not hidden in /sys/kernel/debug.
If you enable tracefs, you have /sys/kernel/tracing created, and is
completely separate from debugfs. I only have it *also* automatically
mounted to /sys/kernel/debug/tracing for backward compatibility
reasons, as older versions of trace-cmd will only mount debugfs (as
root), and expect to find it there.

 mount -t tracefs nodev /sys/kernel/tracing

-- Steve

> 
> >
> > Should we allow CAP_TRACING access to /proc/kallsyms? as it is helpful
> > to convert perf and trace-cmd's function pointers into names. Once you
> > allow tracing of the kernel, hiding /proc/kallsyms is pretty useless.  
> 
> I think we should.


^ permalink raw reply

* RE: [E1000-devel] SFP+ EEPROM readouts fail on X722 (ethtool -m: Invalid argument)
From: Fujinaka, Todd @ 2019-08-28  0:53 UTC (permalink / raw)
  To: Jakub Jankowski, e1000-devel@lists.sourceforge.net
  Cc: netdev@vger.kernel.org, mhemsley@open-systems.com
In-Reply-To: <34ba28aa-44a5-8a6c-c8c4-b92a16f952ad@toxcorp.com>

Sorry about the top posting, but if I don't do it this way I can't read anything in Outlook (not my preferred MUA).

I think I may have been wrong about things. I'm not as familiar with the x722, and the NVM versions are completely different than the x710 and I was confused.

Even worse, I'm not sure if the x722 is able to read the data from the SFP/SFP+ EEPROM. I remembered that was a feature we requested internally but I don't remember what the progress was.

I'm asking around to see if I can get clarification. I haven't heard anything yet.

Todd Fujinaka
Software Application Engineer
Datacenter Engineering Group
Intel Corporation
todd.fujinaka@intel.com


-----Original Message-----
From: Jakub Jankowski [mailto:shasta@toxcorp.com] 
Sent: Tuesday, August 27, 2019 4:01 PM
To: Fujinaka, Todd <todd.fujinaka@intel.com>; e1000-devel@lists.sourceforge.net
Cc: netdev@vger.kernel.org; mhemsley@open-systems.com
Subject: Re: [E1000-devel] SFP+ EEPROM readouts fail on X722 (ethtool -m: Invalid argument)

Hi,

On 8/27/19 7:56 PM, Fujinaka, Todd wrote:
> The hints should be:
> # ethtool -m eth10
> Cannot get module EEPROM information: Invalid argument # dmesg | tail -n 1 [  445.971974] i40e 0000:3d:00.3 eth10: Module EEPROM memory read not supported. Please update the NVM image.
>
> # ethtool -i eth10
> driver: i40e
> version: 2.9.21
> firmware-version: 3.31 0x80000d31 1.1767.0
>
> And the working case:
> # ethtool -i eth8
> driver: i40e
> version: 2.9.21
> firmware-version: 6.01 0x800035cf 1.1876.0
>
> If you don't see it, 6.01 > 3.31.
The reason why firmware between the two is (that much) different is because the non-working case is from X722 NIC, while the working one is from X710.

> The NVM update tool should be available on downloadcenter.intel.com

Thanks for the pointer to NVM updater. I'd like to offer some additional comments about my experience with the newest one (v4.00):

a) running ./nvmupdate64e (from X722_NVMUpdate_Linux_x64 subdir) errors out without really saying what's wrong:

   # ./nvmupdate64e

   Intel(R) Ethernet NVM Update Tool
   NVMUpdate version 1.30.2.11
   Copyright (C) 2013 - 2017 Intel Corporation.


   WARNING: To avoid damage to your device, do not stop the update or reboot or power off the system during this update.
   Inventory in progress. Please wait [+.........]
   Tool execution completed with the following status: The configuration file could not be opened/read, or a syntax error was discovered in the file
   Press any key to exit.

after enabling logging (-l out.log) a bit more is revealed:

   # tail -n 2 out.log
   Error:   Config file line 2: Not supported config file version.
   Error:   Missing CONFIG VERSION parameter in configuration file.

but that's not entirely true, CONFIG VERSION is set in the default configuration file:

   # head -n 2 nvmupdate.cfg
   CURRENT FAMILY: 1.0.0
   CONFIG VERSION: 1.14.0

so why isn't this understood?
Manually editing nvmupdate.cfg and setting CONFIG VERSION: 1.11.0 seems to make this particular problem go away.

b) Re-doing this with downgraded config version exposes another problem:

   Config file read.
   Error:   Can't open NVM map file [Immediate_offset_2.txt]

and indeed, there is no Immediate_offset_2.txt in NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_Linux_x64/
There is one, however, in
NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_EFIx64/ subdir. 
Copying it over to the _Linux_x64 resolves this particular problem

c) Re-doing this with Immediate_offset_2.txt in place exposes third problem:

   Error:   Can't open NVM image file
[LBG_B2_Wolf_Pass_WFT_X557_P01_PHY_Auto_Detect_P23_NCSI_v3.31_800016DB.bin]

and once again - same story. It exists in NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_EFIx64/ but not NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_Linux_x64/ - had to copy it over.


Once I managed to get all these out of the way, the tool finally ran:

   Num Description                               Ver. DevId S:B Status
   === ======================================== ===== ===== ====== ===============
   01) Intel(R) Ethernet Server Adapter I350-T4  1.99  1521 00:024 Update not available
   02) Intel(R) Ethernet Connection X722 for     3.49  37D2 00:061 Update
       10GBASE-T available
   03) Intel(R) Ethernet Server Adapter I350-T4  1.99  1521 00:175 Update not available


The initial starting point was:

0) firmware-version: 3.31 0x80000d31 1.1767.0

After first update+reboot, this was bumped to:

1) firmware-version: 3.1d 0x800016db 1.1767.0    (but ethtool -m ethX still doesn't work)

So I ran the tool the second time, it said 'Update available' again, but this time:

   Num Description                               Ver. DevId S:B Status
   === ======================================== ===== ===== ====== ===============
   01) Intel(R) Ethernet Server Adapter I350-T4  1.99  1521 00:024 Update not available
   02) Intel(R) Ethernet Connection X722 for     3.29  37D2 00:061 Update
       10GBASE-T available
   03) Intel(R) Ethernet Server Adapter I350-T4  1.99  1521 00:175 Update not available

   Options: Adapter Index List (comma-separated), [A]ll, e[X]it
   Enter selection:02
   Would you like to back up the NVM images? [Y]es/[N]o: Y
   Update in progress. This operation may take several minutes.
   [*******+..]
   Tool execution completed with the following status: <---------- why is there no status printed?
   Press any key to exit.


Checking output log:

   # cat out3.log
   Intel(R) Ethernet NVM Update Tool
   NVMUpdate version 1.30.2.11
   Copyright (C) 2013 - 2017 Intel Corporation.

   ./nvmupdate64e -c nvmupdate.cfg -l out3.log

   Config file read.
   Inventory
   [00:061:00:00]: Intel(R) Ethernet Connection X722 for 10GBASE-T
       Flash inventory started
       Shadow RAM inventory started
   Alternate MAC address is not set
       Shadow RAM inventory finished
       Flash inventory finished
       OROM inventory started
       OROM inventory finished
       PHY NVM inventory started
       PHY NVM inventory finished
   [00:061:00:01]: Intel(R) Ethernet Connection X722 for 10GBASE-T
       Device already inventoried.
   [00:061:00:02]: Intel(R) Ethernet Connection X722 for 10GbE SFP+
       Device already inventoried.
       PHY NVM inventory started
       PHY NVM inventory finished
   [00:061:00:03]: Intel(R) Ethernet Connection X722 for 10GbE SFP+
       Device already inventoried.
   Update
   [00:061:00:00]: Intel(R) Ethernet Connection X722 for 10GBASE-T
       Creating backup images in directory: A4BF0164884A
       Backup images created.
       Flash update started
       NVM image verification started
       Shadow RAM image verification started

   Image differences found at offset 0x3AE [Device=0xF, Buffer=0x0] - 
update required.
   Error:   Flash update failed
   [00:061:00:02]: Intel(R) Ethernet Connection X722 for 10GbE SFP+
   #

However, ethtool -i suggests that firmware was updated to:

2) firmware-version: 4.00 0x80001577 1.1580.0    <------- so it did 
_something_ after all?

At this point, every subsequent attempt to run the NVM updater yields 
the same results: an update is available, but attempting to apply it 
fails with the same message in log.

And my initial issue still persists - ethtool -m <iface> still returns 
"invalid argument" with "Module EEPROM memory read not supported. Please 
update the NVM image" logged in dmesg.

How can I resolve this?

Cheers,
  Jakub.

>
> Todd Fujinaka
> Software Application Engineer
> Datacenter Engineering Group
> Intel Corporation
> todd.fujinaka@intel.com
>
>
> -----Original Message-----
> From: Jakub Jankowski [mailto:shasta@toxcorp.com]
> Sent: Tuesday, August 27, 2019 4:03 AM
> To: e1000-devel@lists.sourceforge.net
> Cc: netdev@vger.kernel.org; shasta@toxcorp.com; mhemsley@open-systems.com
> Subject: [E1000-devel] SFP+ EEPROM readouts fail on X722 (ethtool -m: Invalid argument)
>
> Hi,
>
> We can't get SFP+ EEPROM readouts for X722 to work at all:
>
> # ethtool -m eth10
> Cannot get module EEPROM information: Invalid argument # dmesg | tail -n 1 [  445.971974] i40e 0000:3d:00.3 eth10: Module EEPROM memory read not supported. Please update the NVM image.
> # lspci | grep 3d:00.3
> 3d:00.3 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GbE SFP+ (rev 09)
>
>
> We're running 4.19.65 kernel at the moment, testing using the newest out-of-tree Intel module
>
> # modinfo -F version i40e
> 2.9.21
>
> We also tried:
> - 4.19.65 with in-tree i40e (2.3.2-k)
> - stock Arch Linux (kernel 5.2.5, driver 2.8.20-k) and the results are the same, as shown above.
>
> # ethtool -i eth10
> driver: i40e
> version: 2.9.21
> firmware-version: 3.31 0x80000d31 1.1767.0
> expansion-rom-version:
> bus-info: 0000:3d:00.3
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
> # dmidecode -s baseboard-manufacturer
> Intel Corporation
> # dmidecode -s baseboard-product-name
> S2600WFT
> # dmidecode -s baseboard-version
> H48104-853
>
> # lspci -vvv
> (...)
> 3d:00.3 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GbE SFP+ (rev 09)
> 	DeviceName: Intel PCH Integrated 10 Gigabit Ethernet Controller
> 	Subsystem: Intel Corporation Ethernet Connection X722 for 10GbE SFP+
> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 32 bytes
> 	Interrupt: pin A routed to IRQ 112
> 	NUMA node: 0
> 	Region 0: Memory at ab000000 (64-bit, prefetchable) [size=16M]
> 	Region 3: Memory at b0000000 (64-bit, prefetchable) [size=32K]
> 	Expansion ROM at <ignored> [disabled]
> 	Capabilities: [40] Power Management version 3
> 		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> 	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> 		Address: 0000000000000000  Data: 0000
> 		Masking: 00000000  Pending: 00000000
> 	Capabilities: [70] MSI-X: Enable+ Count=129 Masked-
> 		Vector table: BAR=3 offset=00000000
> 		PBA: BAR=3 offset=00001000
> 	Capabilities: [a0] Express (v2) Endpoint, MSI 00
> 		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> 			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
> 		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-
> 			MaxPayload 256 bytes, MaxReadReq 512 bytes
> 		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
> 		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
> 			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> 		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
> 			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> 		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
> 			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> 			 AtomicOpsCtl: ReqEn-
> 		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> 			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> 	Capabilities: [e0] Vital Product Data
> 		Product Name: Example VPD
> 		Read-only fields:
> 			[V0] Vendor specific:
> 			[RV] Reserved: checksum good, 0 byte(s) reserved
> 		End
> 	Capabilities: [100 v2] Advanced Error Reporting
> 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
> 		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> 		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> 		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
> 			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> 		HeaderLog: 00000000 00000000 00000000 00000000
> 	Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
> 		ARICap:	MFVC- ACS-, Next Function: 0
> 		ARICtl:	MFVC- ACS-, Function Group: 0
> 	Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
> 		IOVCap:	Migration-, Interrupt Message Number: 000
> 		IOVCtl:	Enable- Migration- Interrupt- MSE- ARIHierarchy-
> 		IOVSta:	Migration-
> 		Initial VFs: 32, Total VFs: 32, Number of VFs: 0, Function Dependency Link: 03
> 		VF offset: 109, stride: 1, Device ID: 37cd
> 		Supported Page Size: 00000553, System Page Size: 00000001
> 		Region 0: Memory at 00000000af000000 (64-bit, prefetchable)
> 		Region 3: Memory at 00000000b0020000 (64-bit, prefetchable)
> 		VF Migration: offset: 00000000, BIR: 0
> 	Capabilities: [1a0 v1] Transaction Processing Hints
> 		Device specific mode supported
> 		No steering table available
> 	Capabilities: [1b0 v1] Access Control Services
> 		ACSCap:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
> 		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
> 	Kernel driver in use: i40e
> 	Kernel modules: i40e
>
>
> Same kernel+i40e, same SFP+ module - but on Intel X710, works like a treat:
>
> # lspci | grep X7
> 81:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
> 81:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01) # ethtool -m eth8
> 	Identifier                                : 0x03 (SFP)
> 	Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
> 	Connector                                 : 0x07 (LC)
> 	Transceiver codes                         : 0x10 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00
> 	Transceiver type                          : 10G Ethernet: 10G Base-SR
> 	Transceiver type                          : Ethernet: 1000BASE-SX
> 	Encoding                                  : 0x06 (64B/66B)
> 	BR, Nominal                               : 10300MBd
>           (...)
> # ethtool -i eth8
> driver: i40e
> version: 2.9.21
> firmware-version: 6.01 0x800035cf 1.1876.0
> expansion-rom-version:
> bus-info: 0000:81:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
> #
>
>
> Is this a known problem?
>
>
> Best regards,
>    Jakub
>
>
>
> _______________________________________________
> E1000-devel mailing list
> E1000-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
> To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired


^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Andy Lutomirski @ 2019-08-28  0:55 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Alexei Starovoitov, Kees Cook, LSM List,
	James Morris, Jann Horn, Peter Zijlstra, Masami Hiramatsu,
	Steven Rostedt, David S. Miller, Daniel Borkmann,
	Network Development, bpf, kernel-team, Linux API
In-Reply-To: <20190828003447.htgzsxs5oevn3eys@ast-mbp.dhcp.thefacebook.com>

On Tue, Aug 27, 2019 at 5:34 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Aug 27, 2019 at 04:01:08PM -0700, Andy Lutomirski wrote:
> > [adding some security and tracing folks to cc]
> >
> > On Tue, Aug 27, 2019 at 1:52 PM Alexei Starovoitov <ast@kernel.org> wrote:
> > >
> > > Introduce CAP_BPF that allows loading all types of BPF programs,
> > > create most map types, load BTF, iterate programs and maps.
> > > CAP_BPF alone is not enough to attach or run programs.
> > >
> > > Networking:
> > >
> > > CAP_BPF and CAP_NET_ADMIN are necessary to:
> > > - attach to cgroup-bpf hooks like INET_INGRESS, INET_SOCK_CREATE, INET4_CONNECT
> > > - run networking bpf programs (like xdp, skb, flow_dissector)
> > >
> > > Tracing:
> > >
> > > CAP_BPF and perf_paranoid_tracepoint_raw() (which is kernel.perf_event_paranoid == -1)
> > > are necessary to:
> > > - attach bpf program to raw tracepoint
> > > - use bpf_trace_printk() in all program types (not only tracing programs)
> > > - create bpf stackmap
> > >
> > > To attach bpf to perf_events perf_event_open() needs to succeed as usual.
> > >
> > > CAP_BPF controls BPF side.
> > > CAP_NET_ADMIN controls intersection where BPF calls into networking.
> > > perf_paranoid_tracepoint_raw controls intersection where BPF calls into tracing.
> > >
> > > In the future CAP_TRACING could be introduced to control
> > > creation of kprobe/uprobe and attaching bpf to perf_events.
> > > In such case bpf_probe_read() thin wrapper would be controlled by CAP_BPF.
> > > Whereas probe_read() would be controlled by CAP_TRACING.
> > > CAP_TRACING would also control generic kprobe+probe_read.
> > > CAP_BPF and CAP_TRACING would be necessary for tracing bpf programs
> > > that want to use bpf_probe_read.
> >
> > First, some high-level review:
> >
> > Can you write up some clear documentation aimed at administrators that
> > says what CAP_BPF does?  For example, is it expected that CAP_BPF by
> > itself permits reading all kernel memory?
>
> hmm. the answer is in the sentence you quoted right above.

I was hoping for something in Documentation/admin-guide, not in a
changelog that's hard to find.

>
> > Can you give at least one fully described use case where CAP_BPF
> > solves a real-world problem that is not solved by existing mechanisms?
>
> bpftrace binary would be installed with CAP_BPF and CAP_TRACING.
> bcc tools would be installed with CAP_BPF and CAP_TRACING.
> perf binary would be installed with CAP_TRACING only.
> XDP networking daemon would be installed with CAP_BPF and CAP_NET_ADMIN.
> None of them would need full root.

As in just setting these bits in fscaps?  What does this achieve
beyond just installing them setuid-root or with CAP_SYS_ADMIN and
judicious use of capset internally?  For that matter, what prevents
unauthorized users from tracing the system if you do this?  This just
lets anyone trace the system, which seems like a mistake.

Can you clarify your example or give another one?

>
> > Changing the capability that some existing operation requires could
> > break existing programs.  The old capability may need to be accepted
> > as well.
>
> As far as I can see there is no ABI breakage. Please point out
> which line of the patch may break it.

As a more or less arbitrary selection:

 void bpf_prog_kallsyms_add(struct bpf_prog *fp)
 {
        if (!bpf_prog_kallsyms_candidate(fp) ||
-           !capable(CAP_SYS_ADMIN))
+           !capable(CAP_BPF))
                return;

Before your patch, a task with CAP_SYS_ADMIN could do this.  Now it
can't.  Per the usual Linux definition of "ABI break", this is an ABI
break if and only if someone actually did this in a context where they
have CAP_SYS_ADMIN but not all capabilities.  How confident are you
that no one does things like this?
 void bpf_prog_kallsyms_add(struct bpf_prog *fp)
 {
        if (!bpf_prog_kallsyms_candidate(fp) ||
-           !capable(CAP_SYS_ADMIN))
+           !capable(CAP_BPF))
                return;

> > > ---
> > > I would prefer to introduce CAP_TRACING soon, since it
> > > will make tracing and networking permission model symmetrical.
> > >
> >
> > Here's my proposal for CAP_TRACING, documentation-style:
> >
> > --- begin ---
> >
> > CAP_TRACING enables a task to use various kernel features to trace
> > running user programs and the kernel itself.  CAP_TRACING also enables
> > a task to bypass some speculation attack countermeasures.  A task in
> > the init user namespace with CAP_TRACING will be able to tell exactly
> > what kernel code is executed and when, and will be able to read kernel
> > registers and kernel memory.  It will, similarly, be able to read the
> > state of other user tasks.
> >
> > Specifically, CAP_TRACING allows the following operations.  It may
> > allow more operations in the future:
> >
> >  - Full use of perf_event_open(), similarly to the effect of
> > kernel.perf_event_paranoid == -1.
>
> +1
>
> >  - Loading and attaching tracing BPF programs, including use of BPF
> > raw tracepoints.
>
> -1
>
> >  - Use of BPF stack maps.
>
> -1
>
> >  - Use of bpf_probe_read() and bpf_trace_printk().
>
> -1
>
> >  - Use of unsafe pointer-to-integer conversions in BPF.
>
> -1
>
> >  - Bypassing of BPF's speculation attack hardening measures and
> > constant blinding.  (Note: other mechanisms might also allow this.)
>
> -1
> All of the above are allowed by CAP_BPF.
> They are not allowed by CAP_TRACING.

Why?  I don't mean to discount your -1, and you may well have a
compelling reason.  If so, I'll change my proposal.

From the previous discussion, you want to make progress toward solving
a lot of problems with CAP_BPF.  One of them was making BPF
firewalling more generally useful. By making CAP_BPF grant the ability
to read kernel memory, you will make administrators much more nervous
to grant CAP_BPF.  Similarly, and correct me if I'm wrong, most of
these capabilities are primarily or only useful for tracing, so I
don't see why users without CAP_TRACING should get them.
bpf_trace_printk(), in particular, even has "trace" in its name :)

Also, if a task has CAP_TRACING, it's expected to be able to trace the
system -- that's the whole point.  Why shouldn't it be able to use BPF
to trace the system better?

>
> > CAP_TRACING does not override normal permissions on sysfs or debugfs.
> > This means that, unless a new interface for programming kprobes and
> > such is added, it does not directly allow use of kprobes.
>
> kprobes can be created via perf_event_open already.
> So above statement contradicts your first statement.

Hmm.  The way of using perf with kprobes that I'm familiar with is:

# perf probe --add func_name

And this uses "/sys/kernel/debug/tracing//kprobe_events" (with the
double slash!).  Is there indeed another way to do this?

Anyway, I didn't mean to exclude any existing perf_event_open()
mechanism -- what I meant was that, without some extension to my
proposal, /sys/kernel/debug/tracing wouldn't magically become
accessible due to CAP_TRACING.

>
> > If CAP_TRACING, by itself, enables a task to crash or otherwise
> > corrupt the kernel or other tasks, this will be considered a kernel
> > bug.
>
> +1
>
> > CAP_TRACING in a non-init user namespace may, in the future, allow
> > tracing of other tasks in that user namespace or its descendants.  It
> > will not enable kernel tracing or tracing of tasks outside the user
> > namespace in question.
>
> I would avoid describing user ns for now.
> There is enough confusion without it.
>
> > --- end ---
> >
> > Does this sound good?  The idea here is that CAP_TRACING should be
> > very useful even without CAP_BPF, which allows CAP_BPF to be less
> > powerful.
>
> As proposed CAP_BPF does not allow tracing or networking on its own.
> CAP_BPF only controls BPF side.
>
> For example:
> BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
> {
>         int ret;
>
>         ret = probe_kernel_read(dst, unsafe_ptr, size);
>         if (unlikely(ret < 0))
>                 memset(dst, 0, size);
>
>         return ret;
> }
>
> All of BPF (including prototype of bpf_probe_read) is controlled by CAP_BPF.
> But the kernel primitives its using (probe_kernel_read) is controlled by CAP_TRACING.
> Hence a task needs _both_ CAP_BPF and CAP_TRACING to attach and run bpf program
> that uses bpf_probe_read.
>
> Similar with all other kernel code that BPF helpers may call directly or indirectly.
> If there is a way for bpf program to call into piece of code controlled by CAP_TRACING
> such helper would need CAP_BPF and CAP_TRACING.
> If bpf helper calls into something that may mangle networking packet
> such helper would need both CAP_BPF and CAP_NET_ADMIN to execute.

Why do you want to require CAP_BPF to call into functions like
bpf_probe_read()?  I understand why you want to limit access to bpf,
but I think that CAP_TRACING should be sufficient to allow the tracing
parts of BPF.  After all, a lot of your concerns, especially the ones
involving speculation, don't really apply to users with CAP_TRACING --
users with CAP_TRACING can read kernel memory with or without bpf.

>
> > > @@ -2080,7 +2083,10 @@ static int bpf_prog_test_run(const union bpf_attr *attr,
> > >         struct bpf_prog *prog;
> > >         int ret = -ENOTSUPP;
> > >
> > > -       if (!capable(CAP_SYS_ADMIN))
> > > +       if (!capable(CAP_NET_ADMIN) || !capable(CAP_BPF))
> > > +               /* test_run callback is available for networking progs only.
> > > +                * Add cap_bpf_tracing() above when tracing progs become runable.
> > > +                */
> >
> > I think test_run should probably be CAP_SYS_ADMIN forever.  test_run
> > is the only way that one can run a bpf program and call helper
> > functions via the program if one doesn't have permission to attach the
> > program.
>
> Since CAP_BPF + CAP_NET_ADMIN allow attach. It means that a task
> with these two permissions will have programs running anyway.
> (traffic will flow through netdev, socket events will happen, etc)
> Hence no reason to disallow running program via test_run.
>

test_run allows fully controlled inputs, in a context where a program
can trivially flush caches, mistrain branch predictors, etc first.  It
seems to me that, if a JITted bpf program contains an exploitable
speculation gadget (MDS, Spectre v1, RSB, or anything else), it will
be *much* easier to exploit it using test_run than using normal
network traffic.  Similarly, normal network traffic will have network
headers that are valid enough to have caused the BPF program to be
invoked in the first place.  test_run can inject arbitrary garbage.

^ permalink raw reply

* Re: [RFC PATCH net-next] net: phy: force phy suspend when calling phy_stop
From: shenjian (K) @ 2019-08-28  1:03 UTC (permalink / raw)
  To: Heiner Kallweit, andrew, f.fainelli, davem
  Cc: netdev, forest.zhouchang, linuxarm
In-Reply-To: <04fdbe88-8471-c023-4a0d-890667735737@gmail.com>



在 2019/8/28 3:41, Heiner Kallweit 写道:
> On 27.08.2019 10:29, shenjian (K) wrote:
>>
>>
>> 在 2019/8/27 13:51, Heiner Kallweit 写道:
>>> On 27.08.2019 04:47, Jian Shen wrote:
>>>> Some ethernet drivers may call phy_start() and phy_stop() from
>>>> ndo_open and ndo_close() respectively.
>>>>
>>>> When network cable is unconnected, and operate like below:
>>>> step 1: ifconfig ethX up -> ndo_open -> phy_start ->start
>>>> autoneg, and phy is no link.
>>>> step 2: ifconfig ethX down -> ndo_close -> phy_stop -> just stop
>>>> phy state machine.
>>>> step 3: plugin the network cable, and autoneg complete, then
>>>> LED for link status will be on.
>>>> step 4: ethtool ethX --> see the result of "Link detected" is no.
>>>>
>>> Step 3 and 4 seem to be unrelated to the actual issue.
>>> With which MAC + PHY driver did you observe this?
>>>
>> Thanks Heiner,
>>
>> I tested this on HNS3 driver, with two phy, Marvell 88E1512 and RTL8211.
>>
>> Step 3 and Step 4 is just to describe that the LED of link shows link up,
>> but the port information shows no link.
>>
> ethtool refers to the link at MAC level. Therefore default implementation
> ethtool_op_get_link just returns the result of netif_carrier_ok().
> Also using PHY link status if interface is down doesn't really make sense:
> - phylib state machine isn't running, therefore PHY status doesn't get updated
> - often MAC drivers shut down parts of the MAC on ndo_close, this typically
>   makes the internal MDIO bus unaccessible
> So just remove steps 3 and 4. The patch itself is fine with me.
> 
OK, I will fix the commit log.
Thanks, Heiner.
>>
>>>> This patch forces phy suspend even phydev->link is off.
>>>>
>>>> Signed-off-by: Jian Shen <shenjian15@huawei.com>
>>>> ---
>>>>  drivers/net/phy/phy.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>>>> index f3adea9..0acd5b4 100644
>>>> --- a/drivers/net/phy/phy.c
>>>> +++ b/drivers/net/phy/phy.c
>>>> @@ -911,8 +911,8 @@ void phy_state_machine(struct work_struct *work)
>>>>  		if (phydev->link) {
>>>>  			phydev->link = 0;
>>>>  			phy_link_down(phydev, true);
>>>> -			do_suspend = true;
>>>>  		}
>>>> +		do_suspend = true;
>>>>  		break;
>>>>  	}
>>>>  
>>>>
>>> Heiner
>>>
>>>
>>
>>
> 
> 
> .
> 


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox