Netdev List

* Re: Page allocator bottleneck
From: Aaron Lu @ 2018-04-27  8:45 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Linux Kernel Network Developers, linux-mm, Mel Gorman,
	David Miller, Jesper Dangaard Brouer, Eric Dumazet,
	Alexei Starovoitov, Saeed Mahameed, Eran Ben Elisha,
	Andrew Morton, Michal Hocko
In-Reply-To: <20180423131033.GA13792@intel.com>

On Mon, Apr 23, 2018 at 09:10:33PM +0800, Aaron Lu wrote:
> On Mon, Apr 23, 2018 at 11:54:57AM +0300, Tariq Toukan wrote:
> > Hi,
> > 
> > I ran my tests with your patches.
> > Initial BW numbers are significantly higher than I documented back then in
> > this mail-thread.
> > For example, in driver #2 (see original mail thread), with 6 rings, I now
> > get 92Gbps (slightly less than linerate) in comparison to 64Gbps back then.
> > 
> > However, there were many kernel changes since then, I need to isolate your
> > changes. I am not sure I can finish this today, but I will surely get to it
> > next week after I'm back from vacation.
> > 
> > Still, when I increase the scale (more rings, i.e. more cpus), I see that
> > queued_spin_lock_slowpath gets to 60%+ cpu. Still high, but lower than it
> > used to be.
> 
> I wonder if it is on allocation path or free path?

Just FYI, I have pushed two more commits on top of the branch.
They should reduce free path zone lock contention for MIGRATE_UNMOVABLE
pages (most kernel code allocates such pages); you may consider applying them if
free path contention is a problem.

* [PATCH] netfilter: ebtables: handle string from userspace with care
From: Paolo Abeni @ 2018-04-27  8:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: syzbot, fw, coreteam, syzkaller-bugs, netdev

strlcpy() can't be safely used on a user-space provided string,
as it can read beyond the end of the buffer if the string is
not NUL-terminated.

Leveraging the above, syzbot has been able to trigger the following
splat:

BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300
[inline]
BUG: KASAN: stack-out-of-bounds in compat_mtw_from_user
net/bridge/netfilter/ebtables.c:1957 [inline]
BUG: KASAN: stack-out-of-bounds in ebt_size_mwt
net/bridge/netfilter/ebtables.c:2059 [inline]
BUG: KASAN: stack-out-of-bounds in size_entry_mwt
net/bridge/netfilter/ebtables.c:2155 [inline]
BUG: KASAN: stack-out-of-bounds in compat_copy_entries+0x96c/0x14a0
net/bridge/netfilter/ebtables.c:2194
Write of size 33 at addr ffff8801b0abf888 by task syz-executor0/4504

CPU: 0 PID: 4504 Comm: syz-executor0 Not tainted 4.17.0-rc2+ #40
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
  check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
  memcpy+0x37/0x50 mm/kasan/kasan.c:303
  strlcpy include/linux/string.h:300 [inline]
  compat_mtw_from_user net/bridge/netfilter/ebtables.c:1957 [inline]
  ebt_size_mwt net/bridge/netfilter/ebtables.c:2059 [inline]
  size_entry_mwt net/bridge/netfilter/ebtables.c:2155 [inline]
  compat_copy_entries+0x96c/0x14a0 net/bridge/netfilter/ebtables.c:2194
  compat_do_replace+0x483/0x900 net/bridge/netfilter/ebtables.c:2285
  compat_do_ebt_set_ctl+0x2ac/0x324 net/bridge/netfilter/ebtables.c:2367
  compat_nf_sockopt net/netfilter/nf_sockopt.c:144 [inline]
  compat_nf_setsockopt+0x9b/0x140 net/netfilter/nf_sockopt.c:156
  compat_ip_setsockopt+0xff/0x140 net/ipv4/ip_sockglue.c:1279
  inet_csk_compat_setsockopt+0x97/0x120 net/ipv4/inet_connection_sock.c:1041
  compat_tcp_setsockopt+0x49/0x80 net/ipv4/tcp.c:2901
  compat_sock_common_setsockopt+0xb4/0x150 net/core/sock.c:3050
  __compat_sys_setsockopt+0x1ab/0x7c0 net/compat.c:403
  __do_compat_sys_setsockopt net/compat.c:416 [inline]
  __se_compat_sys_setsockopt net/compat.c:413 [inline]
  __ia32_compat_sys_setsockopt+0xbd/0x150 net/compat.c:413
  do_syscall_32_irqs_on arch/x86/entry/common.c:323 [inline]
  do_fast_syscall_32+0x345/0xf9b arch/x86/entry/common.c:394
  entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7fb3cb9
RSP: 002b:00000000fff0c26c EFLAGS: 00000282 ORIG_RAX: 000000000000016e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000000000
RDX: 0000000000000080 RSI: 0000000020000300 RDI: 00000000000005f4
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

The buggy address belongs to the page:
page:ffffea0006c2afc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x2fffc0000000000()
raw: 02fffc0000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: 0000000000000000 ffffea0006c20101 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Fix the issue by replacing the unsafe function with strscpy() and
handling its possible errors.

Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support")
Reported-and-tested-by: syzbot+4e42a04e0bc33cb6c087@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Notes: the other strlcpy() usages in ebtables.c look safe, as the source
string comes from the kernel.
---
 net/bridge/netfilter/ebtables.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 28a4c3490359..6ba639f6c51d 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1954,7 +1954,8 @@ static int compat_mtw_from_user(struct compat_ebt_entry_mwt *mwt,
 	int off, pad = 0;
 	unsigned int size_kern, match_size = mwt->match_size;
 
-	strlcpy(name, mwt->u.name, sizeof(name));
+	if (strscpy(name, mwt->u.name, sizeof(name)) < 0)
+		return -EINVAL;
 
 	if (state->buf_kern_start)
 		dst = state->buf_kern_start + state->buf_kern_offset;
-- 
2.14.3
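The difference between the two helpers can be modelled in userspace. The kernel's strlcpy() begins by computing strlen() of the source, which is what walks off the end of an unterminated user buffer. The sketch below is a hypothetical re-implementation (`sketch_strscpy` is not a kernel symbol) of the documented strscpy() contract: it reads at most `size` bytes of the source, always NUL-terminates the destination, and returns -E2BIG when no terminator was found within the bound.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical userspace model of the kernel's strscpy() contract
 * (not the kernel implementation, which copies word-at-a-time):
 *  - never reads more than `size` bytes from src,
 *  - always NUL-terminates dest when size > 0,
 *  - returns the number of characters copied, or -E2BIG if src
 *    had no NUL within the first `size` bytes (truncation).
 */
static ssize_t sketch_strscpy(char *dest, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size; i++) {
		dest[i] = src[i];
		if (src[i] == '\0')
			return (ssize_t)i;
	}

	/* No terminator inside the bound: truncate and report it. */
	dest[size - 1] = '\0';
	return -E2BIG;
}
```

Mirroring the patch, a caller treats any negative return as malformed input and bails out with -EINVAL rather than continuing with a truncated name.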

* Re: i.MX6S/DL and QCA8334 switch using DSA driver - CPU port not working
From: Michal Vokáč @ 2018-04-27  8:49 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Vivien Didelot, Florian Fainelli
In-Reply-To: <20180426140629.GB15370@lunn.ch>

On 26.4.2018 16:06, Andrew Lunn wrote:
> On Thu, Apr 26, 2018 at 03:37:33PM +0200, Michal Vokáč wrote:
>>
>>   - Linux 4.9.84 (Freescale 4.9-1.0.x-imx branch)
> 
> Hi Michal
> 
> Please use mainline, not the freescale fork. For DSA, there is nothing
> you need in the freescale fork. Once it works with mainline, you can
> then figure out what needs to be done to make the fork work.

Hi Andrew,
Thanks for such a quick reply!

OK, good point, I will go this way - mainline first, fork then.

>> The Freescale branch does not introduce any changes to the DSA nor to the QCA8K
>> drivers from mainline.
> 
> Does it have
> fbbeefdd2104 ("net: fec: Allow reception of frames bigger than 1522 bytes")

Yes, this one was backported to 4.9.74 and is in the freescale branch too.

>> To make the bridge work I need to enable forwarding across all the switch ports
>> at setup.
>>
>> --- a/drivers/net/dsa/qca8k.c
>> +++ b/drivers/net/dsa/qca8k.c
>> @@ -578,12 +578,12 @@ qca8k_setup(struct dsa_switch *ds)
>>     		if (ds->enabled_port_mask & BIT(i))
>>     			qca8k_port_set_status(priv, i, 0);
>> -	/* Forward all unknown frames to CPU port for Linux processing */
>> +	/* Forward all unknown frames to all pors */
>>     	qca8k_write(priv, QCA8K_REG_GLOBAL_FW_CTRL1,
>>     		    BIT(0) << QCA8K_GLOBAL_FW_CTRL1_IGMP_DP_S |
>> -		    BIT(0) << QCA8K_GLOBAL_FW_CTRL1_BC_DP_S |
>> -		    BIT(0) << QCA8K_GLOBAL_FW_CTRL1_MC_DP_S |
>> -		    BIT(0) << QCA8K_GLOBAL_FW_CTRL1_UC_DP_S);
>> +		    0x7f << QCA8K_GLOBAL_FW_CTRL1_BC_DP_S |
>> +		    0x7f << QCA8K_GLOBAL_FW_CTRL1_MC_DP_S |
>> +		    0x7f << QCA8K_GLOBAL_FW_CTRL1_UC_DP_S);
>>     	/* Setup connection between CPU port & user ports */
>>     	for (i = 0; i < DSA_MAX_PORTS; i++) {
>> --
> 
> This is probably because you don't have a working CPU port.  If that
> worked, all unknown frames would be passed to the software bridge. It
> would then either flood them out all ports, or if it knows the
> destination MAC address, out one specific port. That should be enough
> to make the destination reply, at which point the switch learns the
> MAC address, and it is no longer unknown.
> 
> So let's leave this alone for the moment.
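The flood-then-learn behaviour described above can be sketched as a toy forwarding database. Everything here (`struct fdb_entry`, `bridge_forward()`, the fixed-size table, the `FLOOD` marker) is a hypothetical simplification for illustration, not the Linux bridge or qca8k code.

```c
#include <stdint.h>
#include <string.h>

#define FDB_SIZE 16
#define FLOOD    (-1)  /* hypothetical marker: send out all ports except ingress */

struct fdb_entry {
	uint8_t mac[6];
	int port;
	int valid;
};

static struct fdb_entry fdb[FDB_SIZE];

/* Return the port a MAC was learned on, or FLOOD if it is unknown. */
static int fdb_lookup(const uint8_t mac[6])
{
	for (int i = 0; i < FDB_SIZE; i++)
		if (fdb[i].valid && memcmp(fdb[i].mac, mac, 6) == 0)
			return fdb[i].port;
	return FLOOD;
}

/* Record (or refresh) the port a source MAC was seen on. */
static void fdb_learn(const uint8_t mac[6], int port)
{
	int free_slot = -1;

	for (int i = 0; i < FDB_SIZE; i++) {
		if (fdb[i].valid && memcmp(fdb[i].mac, mac, 6) == 0) {
			fdb[i].port = port;  /* station may have moved */
			return;
		}
		if (!fdb[i].valid && free_slot < 0)
			free_slot = i;
	}
	if (free_slot >= 0) {
		memcpy(fdb[free_slot].mac, mac, 6);
		fdb[free_slot].port = port;
		fdb[free_slot].valid = 1;
	}
	/* A real bridge ages out old entries; this toy simply stops learning. */
}

/* Learn the source, then forward to the known port or flood. */
static int bridge_forward(const uint8_t src[6], const uint8_t dst[6], int in_port)
{
	fdb_learn(src, in_port);
	return fdb_lookup(dst);
}
```

The second frame in the test below plays the role of the destination's reply: once it arrives, both addresses are known and nothing needs to be flooded any more.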

First attempt - pure mainline 4.9.84 without my patch.
CPU port not working, bridge not working.

>> But I am still not able to make work the CPU port though.
>>
>>   # udhcpc -i eth2
>>   Sending discover...
>>   [FOREVER]
>>
>> The same for eth1, eth2 and br0.
>>
>> I suspect the problem may be at different levels:
>>
>>   - The RGMII interface is not properly configured
>>    -- at the CPU side, or
>>    -- at the switch chip side.
>>   - Some setup that I have not done needs to be done (in userspace).
> 
> Your user space setup look O.K.
> 
> Try playing with RGMII delays. Set the phy-mode to rgmii-id.

OK, I will try some combinations and also to tune the numbers in the driver.
That part is actually quite confusing to me. phy-mode can be set for the fec
and for the port. For the fec I am now using rgmii as that is what we were
using before and it worked. Though from my understanding of the ethernet
binding doc it totally makes sense to use rgmii-id for the fec.
I tried that and it did not help.

Using rgmii-id for the port is not valid as the qca8k driver does not support
that mode. It only supports rgmii and sgmii. I think this is actually not
correct. When phy-mode is set to rgmii for the port, the qca8k driver configures
internal delays in the switch, so I think it behaves like rgmii-id.

Shouldn't it be:

--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -474,7 +474,7 @@ qca8k_set_pad_ctrl(struct qca8k_priv *priv, int port, int mode)
  	 * PHY or MAC.
  	 */
  	switch (mode) {
-	case PHY_INTERFACE_MODE_RGMII:
+	case PHY_INTERFACE_MODE_RGMII_ID:
  	qca8k_write(priv, reg,
		    QCA8K_PORT_PAD_RGMII_EN |
		    QCA8K_PORT_PAD_RGMII_TX_DELAY(3) |
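The delay question boils down to which side inserts the clock skew RGMII needs, and that it is inserted exactly once per direction. A rough sketch, assuming the convention from the kernel's Ethernet DT binding where phy-mode describes the link from the MAC's point of view: `rgmii-id` asks the PHY-side device to add both RX and TX delays, `rgmii-rxid`/`rgmii-txid` only one, and plain `rgmii` none. The enum and function names are hypothetical, not the kernel's `phy_interface_t` helpers.

```c
#include <stdbool.h>

/* Hypothetical mirror of the four RGMII phy-mode strings. */
enum sketch_rgmii_mode {
	SKETCH_RGMII,       /* "rgmii":      PHY side adds no delay        */
	SKETCH_RGMII_ID,    /* "rgmii-id":   PHY side adds RX and TX delay */
	SKETCH_RGMII_RXID,  /* "rgmii-rxid": PHY side adds RX delay only   */
	SKETCH_RGMII_TXID,  /* "rgmii-txid": PHY side adds TX delay only   */
};

struct sketch_delays {
	bool rx;
	bool tx;
};

/* Which delays the PHY-side device is expected to insert internally. */
static struct sketch_delays phy_side_delays(enum sketch_rgmii_mode mode)
{
	struct sketch_delays d = { false, false };

	switch (mode) {
	case SKETCH_RGMII_ID:
		d.rx = true;
		d.tx = true;
		break;
	case SKETCH_RGMII_RXID:
		d.rx = true;
		break;
	case SKETCH_RGMII_TXID:
		d.tx = true;
		break;
	case SKETCH_RGMII:
		break;  /* delays must come from the MAC or the PCB traces */
	}
	return d;
}
```

Under that reading, a driver that enables both internal delays for plain `rgmii` behaves like `rgmii-id`, which is the mismatch raised above. Whether the switch port or the FEC should add the delay still depends on the board, so treat this as a reasoning aid rather than a fix.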

* Re: [PATCHv3 3/3] tools bpftool: Display license GPL compatible in prog show/list
From: Jiri Olsa @ 2018-04-27  8:58 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jakub Kicinski, Jiri Olsa, Alexei Starovoitov, lkml, netdev,
	Quentin Monnet
In-Reply-To: <dbc91e08-4c99-e01f-eab5-ac235812cee7@iogearbox.net>

On Thu, Apr 26, 2018 at 10:49:25PM +0200, Daniel Borkmann wrote:
> On 04/26/2018 10:18 AM, Jiri Olsa wrote:
> [...]
> > v3 of the last patch attached, the branch is also updated
> > 
> > thanks,
> > jirka
> > 
> > 
> > ---
> > Display the license "gpl" string in bpftool prog command, like:
> > 
> >   # bpftool prog list
> >   5: tracepoint  name func  tag 57cd311f2e27366b  gpl
> >           loaded_at Apr 26/09:37  uid 0
> >           xlated 16B  not jited  memlock 4096B
> > 
> >   # bpftool --json --pretty prog show
> >   [{
> >           "id": 5,
> >           "type": "tracepoint",
> >           "name": "func",
> >           "tag": "57cd311f2e27366b",
> >           "gpl_compatible": true,
> >           "loaded_at": "Apr 26/09:37",
> >           "uid": 0,
> >           "bytes_xlated": 16,
> >           "jited": false,
> >           "bytes_memlock": 4096
> >       }
> >   ]
> > 
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> 
> Ok, v2 from prior two patches and v3 of this one applied to bpf-next. Please
> next time always submit a fresh new series at once, thanks Jiri.

noted, thanks a lot

jirka

* Re: [PATCH bpf-next v4 00/10] bpf: document eBPF helpers and add a script to generate man page
From: Daniel Borkmann @ 2018-04-27  9:08 UTC (permalink / raw)
  To: Quentin Monnet, ast; +Cc: netdev, oss-drivers, linux-doc, linux-man
In-Reply-To: <20180425171701.11048-1-quentin.monnet@netronome.com>

On 04/25/2018 07:16 PM, Quentin Monnet wrote:
> eBPF helper functions can be called from within eBPF programs to perform
> a variety of tasks that would be otherwise hard or impossible to do with
> eBPF itself. There is a growing number of such helper functions in the
> kernel, but documentation is scarce. The main user space header file
> does contain a short commented description of most helpers, but it is
> somewhat outdated and not complete. It is more a "cheat sheet" than a
> real documentation accessible to new eBPF developers.
> 
> This commit attempts to improve the situation by replacing the existing
> overview for the helpers with a more developed description. Furthermore,
> a Python script is added to generate a manual page for eBPF helpers. The
> workflow is the following, and requires the rst2man utility:
> 
>     $ ./scripts/bpf_helpers_doc.py \
>             --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
>     $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
>     $ man /tmp/bpf-helpers.7
> 
> The objective is to keep all documentation related to the helpers in a
> single place, and to be able to generate from here a manual page that
> could be packaged in the man-pages repository and shipped with most
> distributions.
> 
> Additionally, parsing the prototypes of the helper functions could
> hopefully be reused, with a different Printer object, to generate
> header files needed in some eBPF-related projects.
> 
> Regarding the description of each helper, it comprises several items:
> 
> - The function prototype.
> - A description of the function and of its arguments (except for a
>   couple of cases, when there are no arguments and the return value
>   makes the function usage really obvious).
> - A description of return values (if not void).
> 
> Additional items such as the list of compatible eBPF program and map
> types for each helper, Linux kernel version that introduced the helper,
> GPL-only restriction, and commit hash could be added in the future, but
> it was decided on the mailing list to leave them aside for now.
> 
> For several helpers, descriptions are inspired (at times, nearly copied)
> from the commit logs introducing them in the kernel--Many thanks to
> their respective authors! Some sentences were also adapted from comments
> from the reviews, thanks to the reviewers as well. Descriptions were
> completed as much as possible, the objective being to have something easily
> accessible even for people just starting with eBPF. There is probably a bit
> more work to do in this direction for some helpers.
[...]

Applied yesterday night to bpf-next (and now in net-next), thanks Quentin!

* Re: [PATCHv2 bpf-next 0/2] BPF tunnel testsuite
From: Daniel Borkmann @ 2018-04-27  9:10 UTC (permalink / raw)
  To: William Tu, netdev
In-Reply-To: <1524776500-27030-1-git-send-email-u9012063@gmail.com>

On 04/26/2018 11:01 PM, William Tu wrote:
> The patch series provides an end-to-end eBPF tunnel testsuite.  A common topology
> is created below for all types of tunnels:
> 
> Topology:                                                                     
> ---------                                                                     
>      root namespace   |     at_ns0 namespace                                   
>                       |                                                        
>       -----------     |     -----------                                        
>       | tnl dev |     |     | tnl dev |  (overlay network)                     
>       -----------     |     -----------                                        
>       metadata-mode   |     native-mode                                        
>        with bpf       |                                                        
>                       |                                                        
>       ----------      |     ----------                                         
>       |  veth1  | --------- |  veth0  |  (underlay network)                    
>       ----------    peer    ----------                                         
> 	                                                                              
>                                                                                
> Device Configuration                                                          
> --------------------                                                          
>  Root namespace with metadata-mode tunnel + BPF                                
>  Device names and addresses:                                                   
>        veth1 IP: 172.16.1.200, IPv6: 00::22 (underlay)                         
>        tunnel dev <type>11, ex: gre11, IPv4: 10.1.1.200 (overlay)              
>                                                                                
>  Namespace at_ns0 with native tunnel                                           
>  Device names and addresses:                                                   
>        veth0 IPv4: 172.16.1.100, IPv6: 00::11 (underlay)                       
>        tunnel dev <type>00, ex: gre00, IPv4: 10.1.1.100 (overlay)              
>                                                                                
>                                                                                
> End-to-end ping packet flow                                                   
> ---------------------------                                                   
>  Most of the tests start by namespace creation, device configuration,          
>  then ping the underlay and overlay network.  When doing 'ping 10.1.1.100'     
>  from root namespace, the following operations happen:                         
>  1) Route lookup shows 10.1.1.100/24 belongs to tnl dev, fwd to tnl dev.       
>  2) Tnl device's egress BPF program is triggered and set the tunnel metadata,  
>     with remote_ip=172.16.1.200 and others.                                    
>  3) Outer tunnel header is prepended and route the packet to veth1's egress    
>  4) veth0's ingress queue receive the tunneled packet at namespace at_ns0      
>  5) Tunnel protocol handler, ex: vxlan_rcv, decap the packet                   
>  6) Forward the packet to the overlay tnl dev                                  
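Step 1 of the flow is an ordinary longest-prefix route match. A minimal sketch, assuming a hypothetical two-entry table built from the addresses above (10.1.1.0/24 via the tunnel device, 172.16.1.0/24 via veth1, with `gre11` as the example tunnel name); the real lookup is done by the kernel FIB, not by code like this.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_route {
	uint32_t prefix;    /* IPv4 prefix, host byte order */
	unsigned int len;   /* prefix length in bits, 0..32 */
	const char *dev;
};

/* Hypothetical routes for the root namespace in the topology above. */
static const struct sketch_route routes[] = {
	{ (10u << 24) | (1u << 16) | (1u << 8), 24, "gre11" },    /* overlay  */
	{ (172u << 24) | (16u << 16) | (1u << 8), 24, "veth1" },  /* underlay */
};

static uint32_t prefix_mask(unsigned int len)
{
	return len == 0 ? 0 : 0xffffffffu << (32 - len);
}

/* Return the device of the longest matching prefix, or NULL. */
static const char *route_lookup(uint32_t dst)
{
	const struct sketch_route *best = NULL;

	for (size_t i = 0; i < sizeof(routes) / sizeof(routes[0]); i++) {
		uint32_t mask = prefix_mask(routes[i].len);

		if ((dst & mask) == routes[i].prefix &&
		    (!best || routes[i].len > best->len))
			best = &routes[i];
	}
	return best ? best->dev : NULL;
}
```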
> 
> Test Cases
> -----------------------------
>  Tunnel Type |  BPF Programs
> -----------------------------
>  GRE:          gre_set_tunnel, gre_get_tunnel
>  IP6GRE:       ip6gretap_set_tunnel, ip6gretap_get_tunnel
>  ERSPAN:       erspan_set_tunnel, erspan_get_tunnel
>  IP6ERSPAN:    ip4ip6erspan_set_tunnel, ip4ip6erspan_get_tunnel
>  VXLAN:        vxlan_set_tunnel, vxlan_get_tunnel
>  IP6VXLAN:     ip6vxlan_set_tunnel, ip6vxlan_get_tunnel
>  GENEVE:       geneve_set_tunnel, geneve_get_tunnel
>  IP6GENEVE:    ip6geneve_set_tunnel, ip6geneve_get_tunnel
>  IPIP:         ipip_set_tunnel, ipip_get_tunnel
>  IP6IP:        ipip6_set_tunnel, ipip6_get_tunnel,
>                ip6ip6_set_tunnel, ip6ip6_get_tunnel
>  XFRM:         xfrm_get_state

Applied yesterday night to bpf-next (and now in net-next), thanks William!

* Re: [PATCH bpf-next] bpf: fix xdp_generic for bpf_adjust_tail usecase
From: Daniel Borkmann @ 2018-04-27  9:11 UTC (permalink / raw)
  To: Nikita V. Shirokov, Alexei Starovoitov, David S . Miller; +Cc: netdev
In-Reply-To: <20180425141503.25772-1-tehnerd@tehnerd.com>

On 04/25/2018 04:15 PM, Nikita V. Shirokov wrote:
>  when bpf_adjust_tail was introduced for generic xdp, it changed skb's tail
>  pointer, so it was pointing to the new  "end of the packet". however skb's
>  len field wasn't properly modified, so on the wire ethernet frame had
>  original (or even bigger, if adjust_head was used) size. this diff is fixing
>  this.
> 
> Fixes: 198d83bb3 (" bpf: make generic xdp compatible w/
> bpf_xdp_adjust_tail")
> 
> Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com>

Applied yesterday to bpf-next (and now in net-next), thanks Nikita!
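The invariant the fix restores is that the length field and the tail pointer move together. A toy model, assuming a hypothetical `toy_skb` that only tracks `data`, `tail` and `len` (the real `struct sk_buff` and the generic XDP path are more involved):

```c
/* Hypothetical, heavily reduced stand-in for struct sk_buff. */
struct toy_skb {
	unsigned char *data;   /* start of packet data */
	unsigned char *tail;   /* one past the last byte of packet data */
	unsigned int len;      /* must always equal tail - data */
};

/*
 * Apply a new end of packet chosen by an XDP program.
 * Moving only `tail` (the bug described above) would leave `len`
 * stale, so the frame would still go out with its old size.
 */
static void toy_set_end(struct toy_skb *skb, unsigned char *new_end)
{
	skb->tail = new_end;
	skb->len = (unsigned int)(new_end - skb->data);  /* keep len in sync */
}
```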

* Re: [RFC v3 0/5] virtio: support packed ring
From: Tiwei Bie @ 2018-04-27  9:12 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, virtualization, linux-kernel, netdev, wexu,
	jfreimann
In-Reply-To: <5c712aa2-f00e-b472-cdfc-48175aea790d@redhat.com>

On Fri, Apr 27, 2018 at 02:17:51PM +0800, Jason Wang wrote:
> On 2018年04月27日 12:18, Michael S. Tsirkin wrote:
> > On Fri, Apr 27, 2018 at 11:56:05AM +0800, Jason Wang wrote:
> > > On 2018年04月25日 13:15, Tiwei Bie wrote:
> > > > Hello everyone,
> > > > 
> > > > This RFC implements packed ring support in virtio driver.
> > > > 
> > > > Some simple functional tests have been done with Jason's
> > > > packed ring implementation in vhost:
> > > > 
> > > > https://lkml.org/lkml/2018/4/23/12
> > > > 
> > > > Both of ping and netperf worked as expected (with EVENT_IDX
> > > > disabled). But there are below known issues:
> > > > 
> > > > 1. Reloading the guest driver will break the Tx/Rx;
> > > Will have a look at this issue.
> > > 
> > > > 2. Zeroing the flags when detaching a used desc will
> > > >      break the guest -> host path.
> > > I still think zeroing flags is unnecessary or even a bug. At host, I track
> > > last observed avail wrap counter and detect avail like (what is suggested in
> > > the example code in the spec):
> > > 
> > > static bool desc_is_avail(struct vhost_virtqueue *vq, __virtio16 flags)
> > > {
> > >         bool avail = flags & cpu_to_vhost16(vq, DESC_AVAIL);
> > > 
> > >         return avail == vq->avail_wrap_counter;
> > > }
> > > 
> > > So zeroing wrap can not work with this obviously.
> > > 
> > > Thanks
> > I agree. I think what one should do is flip the available bit.
> > 
> 
> But is this flipping a must?
> 
> Thanks

Yeah, that's my question too. It would imply a requirement on
the driver that the only change it can make to the desc status
while running is to mark the desc as avail, and that any other
change to the desc status is not allowed. Similarly, the device
could only mark the desc as used, and any other change to the
desc status would also not be allowed. So the question is: are
there such requirements?

Based on below contents in the spec:

"""
Thus VIRTQ_DESC_F_AVAIL and VIRTQ_DESC_F_USED bits are different
for an available descriptor and equal for a used descriptor.

Note that this observation is mostly useful for sanity-checking
as these are necessary but not sufficient conditions
"""

It seems that it's necessary for devices to check whether
the AVAIL and USED bits are different.

Best regards,
Tiwei Bie
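The spec text quoted above can be expressed as two small predicates. A sketch, assuming the virtio 1.1 packed-ring layout where the AVAIL flag is bit 7 and the USED flag is bit 15 of the descriptor flags; the macro and function names are hypothetical and endianness conversion is left out. The driver makes a descriptor available by setting AVAIL to its wrap counter and USED to the inverse; the device marks it used by setting both bits to its wrap counter. With the extra "bits differ" check, a descriptor whose flags were zeroed never looks available.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions from the virtio 1.1 packed ring layout. */
#define SKETCH_DESC_F_AVAIL (1u << 7)
#define SKETCH_DESC_F_USED  (1u << 15)

/* flags is assumed to be already converted to CPU byte order. */
static bool sketch_desc_is_avail(uint16_t flags, bool avail_wrap_counter)
{
	bool avail = flags & SKETCH_DESC_F_AVAIL;
	bool used = flags & SKETCH_DESC_F_USED;

	/* Available: AVAIL matches the wrap counter and the two bits differ. */
	return avail == avail_wrap_counter && avail != used;
}

static bool sketch_desc_is_used(uint16_t flags, bool used_wrap_counter)
{
	bool avail = flags & SKETCH_DESC_F_AVAIL;
	bool used = flags & SKETCH_DESC_F_USED;

	/* Used: both bits equal each other and the wrap counter. */
	return avail == used && used == used_wrap_counter;
}
```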

* Re: [PATCH] [PATCH bpf-next] samples/bpf/bpf_load.c: remove redundant ret assignment in bpf_load_program()
From: Daniel Borkmann @ 2018-04-27  9:15 UTC (permalink / raw)
  To: Wang Sheng-Hui, ast, netdev
In-Reply-To: <20180425020713.1795-1-shhuiw@foxmail.com>

On 04/25/2018 04:07 AM, Wang Sheng-Hui wrote:
> Two redundant ret assignments removed:
> * 'ret = 1' before the logic 'if (data_maps)': on any error we jump to
>   label 'done', so no 'ret = 1' is needed before the error jump.
> * After the '/* load programs */' part, if everything goes well, then
>   the BPF code will be loaded and 'ret' set to 0 by load_and_attach().
>   If something goes wrong, 'ret' is set to non-0, and the redundant 'ret = 0'
>   after the for clause would cause the error to be skipped.
>   For example, if some BPF code cannot provide supported program types
>   in ELF SEC("unknown"), the for clause will not call load_and_attach()
>   to load the BPF code. 1 should be returned to the caller instead of 0.
> 
> Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>

Applied yesterday to bpf-next (and now in net-next), thanks Nikita!
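The second point in the description is an instance of a general pattern: an unconditional `ret = 0` after a loop overwrites the error the loop (or its initialiser) recorded. A simplified model, where `load_sections()`, `section_supported()` and `load_one()` are hypothetical stand-ins rather than the functions in samples/bpf/bpf_load.c:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical: only "kprobe/" sections are understood by this toy loader. */
static bool section_supported(const char *name)
{
	return strncmp(name, "kprobe/", 7) == 0;
}

/* Hypothetical stand-in for load_and_attach(): 0 on success. */
static int load_one(const char *name)
{
	(void)name;
	return 0;
}

static int load_sections(const char **names, int n)
{
	int ret = 1;  /* stays non-zero if nothing could be loaded */

	for (int i = 0; i < n; i++) {
		if (!section_supported(names[i]))
			continue;
		ret = load_one(names[i]);
		if (ret != 0)
			break;
	}
	/* No "ret = 0;" here: it would hide both failure cases above. */
	return ret;
}
```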

* Re: [PATCH] netfilter: ebtables: handle string from userspace with care
From: Florian Westphal @ 2018-04-27  9:26 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netfilter-devel, syzbot, fw, coreteam, syzkaller-bugs, netdev
In-Reply-To: <8710122d42aa1f3e081812f2abf406973f834982.1524818458.git.pabeni@redhat.com>

Paolo Abeni <pabeni@redhat.com> wrote:
> strlcpy() can't be safely used on a user-space provided string,
> as it can try to read beyond the buffer's end, if the latter is
> not NULL terminated.

Yes.

> Leveraging the above, syzbot has been able to trigger the following
> splat:
> 
> BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300
> [inline]
> BUG: KASAN: stack-out-of-bounds in compat_mtw_from_user
> net/bridge/netfilter/ebtables.c:1957 [inline]
> BUG: KASAN: stack-out-of-bounds in ebt_size_mwt
> net/bridge/netfilter/ebtables.c:2059 [inline]
> BUG: KASAN: stack-out-of-bounds in size_entry_mwt
> net/bridge/netfilter/ebtables.c:2155 [inline]
> BUG: KASAN: stack-out-of-bounds in compat_copy_entries+0x96c/0x14a0
> net/bridge/netfilter/ebtables.c:2194
> Write of size 33 at addr ffff8801b0abf888 by task syz-executor0/4504

Which is weird, I don't understand this report.
The code IS wrong, but it should cause out-of-bounds read (strlen on
src), but not out-of-bounds write.

Yes, I sent a recent patch (dceb48d86b4871984b8ce9ad5057fb2c01aa33de in
nf.git) that would now allow to get rid of the strlcpy and use the
source directly.

* Re: [PATCH net-next] selftests: pmtu: Minimum MTU for vti6 is 68
From: Xin Long @ 2018-04-27  9:33 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: David S . Miller, Steffen Klassert, Alexey Kodanev, Jarod Wilson,
	Sabrina Dubroca, network dev
In-Reply-To: <c2369c8f004006b33007bad40b63c35f50ff3c23.1524764073.git.sbrivio@redhat.com>

On Fri, Apr 27, 2018 at 1:41 AM, Stefano Brivio <sbrivio@redhat.com> wrote:
> A vti6 interface can carry IPv4 packets too.
>
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
>  tools/testing/selftests/net/pmtu.sh | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
> index 1e428781a625..7651fd4d86fe 100755
> --- a/tools/testing/selftests/net/pmtu.sh
> +++ b/tools/testing/selftests/net/pmtu.sh
> @@ -368,7 +368,7 @@ test_pmtu_vti6_link_add_mtu() {
>
>         fail=0
>
> -       min=1280
> +       min=68                  # vti6 can carry IPv4 packets too
>         max=$((65535 - 40))
>         # Check invalid values first
>         for v in $((min - 1)) $((max + 1)); do
> @@ -384,7 +384,7 @@ test_pmtu_vti6_link_add_mtu() {
>         done
>
>         # Now check valid values
> -       for v in 1280 1300 $((65535 - 40)); do
> +       for v in 68 1280 1300 $((65535 - 40)); do
>                 ${ns_a} ip link add vti6_a mtu ${v} type vti6 local ${veth6_a_addr} remote ${veth6_b_addr} key 10
>                 mtu="$(link_get_mtu "${ns_a}" vti6_a)"
>                 ${ns_a} ip link del vti6_a
> --
> 2.15.1
>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
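The bounds the updated test exercises can be written down directly. A sketch, assuming the limits shown in the diff: a lower bound of 68 (the IPv4 minimum MTU, since vti6 may carry IPv4) instead of the IPv6 minimum of 1280, and an upper bound of 65535 minus the 40-byte IPv6 header. The function and macro names are hypothetical.

```c
#include <stdbool.h>

#define SKETCH_IPV4_MIN_MTU  68
#define SKETCH_IPV6_HDR_LEN  40
#define SKETCH_VTI6_MAX_MTU  (65535 - SKETCH_IPV6_HDR_LEN)

/* Mirrors the test: min - 1 and max + 1 must be rejected. */
static bool vti6_mtu_valid(int mtu)
{
	return mtu >= SKETCH_IPV4_MIN_MTU && mtu <= SKETCH_VTI6_MAX_MTU;
}
```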

* Re: [PATCH bpf-next] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
From: Daniel Borkmann @ 2018-04-27  9:44 UTC (permalink / raw)
  To: Leo Yan, Alexei Starovoitov, David S. Miller, Jonathan Corbet,
	netdev, linux-kernel, linux-doc
In-Reply-To: <1524709611-29437-1-git-send-email-leo.yan@linaro.org>

On 04/26/2018 04:26 AM, Leo Yan wrote:
> When CONFIG_BPF_JIT_ALWAYS_ON is enabled, kernel has limitation for
> bpf_jit_enable, so it has fixed value 1 and we cannot set it to 2
> for JIT opcode dumping; this patch is to update the doc for it.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  Documentation/networking/filter.txt | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
> index fd55c7d..feddab9 100644
> --- a/Documentation/networking/filter.txt
> +++ b/Documentation/networking/filter.txt
> @@ -483,6 +483,12 @@ Example output from dmesg:
>  [ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
>  [ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3
>  
> +When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is set to 1 by default
> +and it returns failure if change to any other value from proc node; this is
> +for security consideration to avoid leaking info to unprivileged users. In this
> +case, we can't directly dump JIT opcode image from kernel log, alternatively we
> +need to use bpf tool for the dumping.
> +

Could you change this doc text a bit, I think it's slightly misleading. From the first
sentence one could also interpret that value 0 would leak info to unprivileged users
whereas here we're only talking about the case of value 2. Maybe something roughly like
this to make it more clear:

  When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
  setting any other value than that will result in failure. This is even the case for
  setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
  is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
  generally recommended approach instead.

Thanks,
Daniel
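The behaviour the suggested wording describes can be modelled as a bounded sysctl write. A sketch, assuming (as a simplification) that the handler rejects values outside a [min, max] window and that CONFIG_BPF_JIT_ALWAYS_ON collapses that window to [1, 1]; the function name and the -EINVAL return value are illustrative assumptions, not a copy of the kernel's handler.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of writing /proc/sys/net/core/bpf_jit_enable. */
static int sketch_set_jit_enable(bool jit_always_on, int val, int *jit_enable)
{
	/* 0 = off, 1 = on, 2 = on + dump the JIT image to the kernel log. */
	int min = jit_always_on ? 1 : 0;
	int max = jit_always_on ? 1 : 2;

	if (val < min || val > max)
		return -EINVAL;  /* current value is left unchanged */
	*jit_enable = val;
	return 0;
}
```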

* Re: [PATCH] netfilter: ebtables: handle string from userspace with care
From: Dmitry Vyukov @ 2018-04-27  9:46 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Paolo Abeni, netfilter-devel, syzbot, coreteam, syzkaller-bugs,
	netdev
In-Reply-To: <20180427092622.4ifhb4zjoncwawmi@breakpoint.cc>

On Fri, Apr 27, 2018 at 11:26 AM, Florian Westphal <fw@strlen.de> wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
>> strlcpy() can't be safely used on a user-space provided string,
>> as it can try to read beyond the buffer's end, if the latter is
>> not NULL terminated.
>
> Yes.
>
>> Leveraging the above, syzbot has been able to trigger the following
>> splat:
>>
>> BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300
>> [inline]
>> BUG: KASAN: stack-out-of-bounds in compat_mtw_from_user
>> net/bridge/netfilter/ebtables.c:1957 [inline]
>> BUG: KASAN: stack-out-of-bounds in ebt_size_mwt
>> net/bridge/netfilter/ebtables.c:2059 [inline]
>> BUG: KASAN: stack-out-of-bounds in size_entry_mwt
>> net/bridge/netfilter/ebtables.c:2155 [inline]
>> BUG: KASAN: stack-out-of-bounds in compat_copy_entries+0x96c/0x14a0
>> net/bridge/netfilter/ebtables.c:2194
>> Write of size 33 at addr ffff8801b0abf888 by task syz-executor0/4504
>
> Which is weird, I don't understand this report.
> The code IS wrong, but it should cause out-of-bounds read (strlen on
> src), but not out-of-bounds write.

Please see this for explanation:
https://groups.google.com/d/msg/syzkaller-bugs/-Jyti8zBWjU/6n-fkmXeBAAJ
The stack overwrite actually happens here.

> Yes, I sent a recent patch (dceb48d86b4871984b8ce9ad5057fb2c01aa33de in
> nf.git) that would now allow to get rid of the strlcpy and use the
> source directly.

* Re: [PATCH bpf-next] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
From: Leo Yan @ 2018-04-27  9:49 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, David S. Miller, Jonathan Corbet, netdev,
	linux-kernel, linux-doc
In-Reply-To: <275e03a2-b74e-8f60-4ffe-26c9a79fae9d@iogearbox.net>

On Fri, Apr 27, 2018 at 11:44:44AM +0200, Daniel Borkmann wrote:
> On 04/26/2018 04:26 AM, Leo Yan wrote:
> > When CONFIG_BPF_JIT_ALWAYS_ON is enabled, kernel has limitation for
> > bpf_jit_enable, so it has fixed value 1 and we cannot set it to 2
> > for JIT opcode dumping; this patch is to update the doc for it.
> > 
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  Documentation/networking/filter.txt | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
> > index fd55c7d..feddab9 100644
> > --- a/Documentation/networking/filter.txt
> > +++ b/Documentation/networking/filter.txt
> > @@ -483,6 +483,12 @@ Example output from dmesg:
> >  [ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
> >  [ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3
> >  
> > +When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is set to 1 by default
> > +and it returns failure if change to any other value from proc node; this is
> > +for security consideration to avoid leaking info to unprivileged users. In this
> > +case, we can't directly dump JIT opcode image from kernel log, alternatively we
> > +need to use bpf tool for the dumping.
> > +
> 
> Could you change this doc text a bit? I think it's slightly misleading. From the first
> sentence, one could also interpret that value 0 would leak info to unprivileged users,
> whereas here we're only talking about the case of value 2. Maybe something roughly like
> this would make it clearer:
> 
>   When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
>   setting any other value than that will return in failure. This is even the case for
>   setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
>   is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
>   generally recommended approach instead.

Yeah, your rephrasing is clearer and better.  Will do this and send a
new patch soon.  Thanks for your help.

> Thanks,
> Daniel


* [PATCH v1 net-next] microchip_t1: Add driver for Microchip LAN87XX T1 PHYs
From: Nisar Sayed @ 2018-04-27 15:10 UTC
  To: davem; +Cc: UNGLinuxDriver, netdev

Add driver for Microchip LAN87XX T1 PHYs

This patch adds a driver for Microchip T1 PHYs.
Follow-up patches will add support for T1 PHY
features such as cable diagnostics, signal quality indicator (SQI),
and sleep and wakeup (TC10).

Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com>
---
v0 - v1:
        * Rename microchipT1phy.c file to microchip_t1.c
        * Remove microchipT1phy.h include file
        * Add SPDX license identifier
        * Remove probe and remove functions
        * Update LAN87XX_INTERRUPT_MASK write as suggested
---
 drivers/net/phy/Kconfig        |  5 +++
 drivers/net/phy/Makefile       |  1 +
 drivers/net/phy/microchip_t1.c | 88 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 94 insertions(+)
 create mode 100644 drivers/net/phy/microchip_t1.c

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index bdfbabb..7b0b351 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -354,6 +354,11 @@ config MICROCHIP_PHY
 	help
 	  Supports the LAN88XX PHYs.
 
+config MICROCHIP_T1_PHY
+	tristate "Microchip T1 PHYs"
+	---help---
+	  Supports the LAN87XX PHYs.
+
 config MICROSEMI_PHY
 	tristate "Microsemi PHYs"
 	---help---
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 01acbcb..3d0550b 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -70,6 +70,7 @@ obj-$(CONFIG_MESON_GXL_PHY)	+= meson-gxl.o
 obj-$(CONFIG_MICREL_KS8995MA)	+= spi_ks8995.o
 obj-$(CONFIG_MICREL_PHY)	+= micrel.o
 obj-$(CONFIG_MICROCHIP_PHY)	+= microchip.o
+obj-$(CONFIG_MICROCHIP_T1_PHY)	+= microchip_t1.o
 obj-$(CONFIG_MICROSEMI_PHY)	+= mscc.o
 obj-$(CONFIG_NATIONAL_PHY)	+= national.o
 obj-$(CONFIG_QSEMI_PHY)		+= qsemi.o
diff --git a/drivers/net/phy/microchip_t1.c b/drivers/net/phy/microchip_t1.c
new file mode 100644
index 0000000..1f6f299
--- /dev/null
+++ b/drivers/net/phy/microchip_t1.c
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2018 Microchip Technology
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mii.h>
+#include <linux/phy.h>
+
+/* Interrupt Source Register */
+#define LAN87XX_INTERRUPT_SOURCE                (0x18)
+
+/* Interrupt Mask Register */
+#define LAN87XX_INTERRUPT_MASK                  (0x19)
+#define LAN87XX_MASK_LINK_UP                    (0x0004)
+#define LAN87XX_MASK_LINK_DOWN                  (0x0002)
+
+#define DRIVER_AUTHOR	"Nisar Sayed <nisar.sayed@microchip.com>"
+#define DRIVER_DESC	"Microchip LAN87XX T1 PHY driver"
+
+static int lan87xx_phy_config_intr(struct phy_device *phydev)
+{
+	int rc, val = 0;
+
+	if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
+		/* unmask all source and clear them before enable */
+		rc = phy_write(phydev, LAN87XX_INTERRUPT_MASK, 0x7FFF);
+		rc = phy_read(phydev, LAN87XX_INTERRUPT_SOURCE);
+		val = (LAN87XX_MASK_LINK_UP | LAN87XX_MASK_LINK_DOWN);
+	}
+
+	rc = phy_write(phydev, LAN87XX_INTERRUPT_MASK, val);
+
+	return rc < 0 ? rc : 0;
+}
+
+static int lan87xx_phy_ack_interrupt(struct phy_device *phydev)
+{
+	int rc = phy_read(phydev, LAN87XX_INTERRUPT_SOURCE);
+
+	return rc < 0 ? rc : 0;
+}
+
+static struct phy_driver microchip_t1_phy_driver[] = {
+	{
+		.phy_id         = 0x0007c150,
+		.phy_id_mask    = 0xfffffff0,
+		.name           = "Microchip LAN87xx",
+
+		.features       = SUPPORTED_100baseT_Full,
+		.flags          = PHY_HAS_INTERRUPT,
+
+		.config_init    = genphy_config_init,
+		.config_aneg    = genphy_config_aneg,
+
+		.ack_interrupt  = lan87xx_phy_ack_interrupt,
+		.config_intr    = lan87xx_phy_config_intr,
+
+		.suspend        = genphy_suspend,
+		.resume         = genphy_resume,
+	}
+};
+
+module_phy_driver(microchip_t1_phy_driver);
+
+static struct mdio_device_id __maybe_unused microchip_t1_tbl[] = {
+	{ 0x0007c150, 0xfffffff0 },
+	{ }
+};
+
+MODULE_DEVICE_TABLE(mdio, microchip_t1_tbl);
+
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
+MODULE_LICENSE("GPL");
-- 
2.14.1


* Re: [PATCH 3/3] selftests/bpf: .gitignore: add test_btf
From: Daniel Borkmann @ 2018-04-27  9:53 UTC
  To: Sirio Balmelli, ast; +Cc: netdev
In-Reply-To: <20180426083146.GA14025@vm4>

Hi Sirio,

thanks for your patch!

On 04/26/2018 10:31 AM, Sirio Balmelli wrote:
> Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>
> ---
>  tools/testing/selftests/bpf/.gitignore | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
> index 5e1ab2f..9513c77 100644
> --- a/tools/testing/selftests/bpf/.gitignore
> +++ b/tools/testing/selftests/bpf/.gitignore
> @@ -12,6 +12,7 @@ test_tcpbpf_user
>  test_verifier_log
>  feature
>  test_libbpf_open
> +test_btf

This one is already part of the bpf-next tree, please rebase:

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/.gitignore

>  test_sock
>  test_sock_addr
>  urandom_read

Thanks,
Daniel


* Re: [PATCH 2/3] selftests/bpf: test_xdp_noinline.c: fix 'noinline' macro expansion
From: Daniel Borkmann @ 2018-04-27  9:58 UTC
  To: Sirio Balmelli, ast; +Cc: netdev
In-Reply-To: <20180426083125.GA13968@vm4>

On 04/26/2018 10:31 AM, Sirio Balmelli wrote:
> Compiling with clang 7.0.0 yields:
> test_xdp_noinline.c:470:24: warning: unknown attribute '__attribute__' ignored [-Wunknown-attributes]
> ../../../include/linux/compiler-gcc.h:24:19: note: expanded from macro 'noinline'
>                         ^
> test_xdp_noinline.c:494:24: error: use of undeclared identifier 'noinline'; did you mean 'inline'?
> static __attribute__ ((noinline))
> 
> This appears to be the 'noinline' attribute being itself macro-expanded,
> so the compiler sees '__attribute__ ((__attribute__((noinline))))'.
> 
> Fix using an #ifndef.
> Homogenize function declarations.
> 
> Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>

I think this error is a result of your previous patch, which suddenly pulls in
kernel headers. Otherwise include/linux/compiler-gcc.h would never
have been included. That's why you see the wrong expansion of ...

  __attribute__ ((noinline))

... into ...

  __attribute__ ((__attribute__ ((noinline))))

... since noinline is additionally defined in include/linux/compiler-gcc.h.

> ---
>  tools/testing/selftests/bpf/test_xdp_noinline.c | 79 +++++++++++++------------
>  1 file changed, 42 insertions(+), 37 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/test_xdp_noinline.c b/tools/testing/selftests/bpf/test_xdp_noinline.c
> index 5e4aac7..5b5f3f2 100644
> --- a/tools/testing/selftests/bpf/test_xdp_noinline.c
> +++ b/tools/testing/selftests/bpf/test_xdp_noinline.c
> @@ -15,6 +15,11 @@
>  #include <linux/udp.h>
>  #include "bpf_helpers.h"
>  
> +/* some compiler-specific header might define this */
> +#ifndef noinline
> +#define noinline (__attribute__ ((noinline)))
> +#endif
> +
>  #define bpf_printk(fmt, ...)				\
>  ({							\
>  	char ____fmt[] = fmt;				\
> @@ -55,7 +60,7 @@ static __u32 rol32(__u32 word, unsigned int shift)
>  
>  typedef unsigned int u32;
>  
> -static __attribute__ ((noinline))
> +static noinline
>  u32 jhash(const void *key, u32 length, u32 initval)
>  {
>  	u32 a, b, c;
> @@ -92,7 +97,7 @@ u32 jhash(const void *key, u32 length, u32 initval)
>  	return c;
>  }
>  
> -static __attribute__ ((noinline))
> +static noinline
>  u32 __jhash_nwords(u32 a, u32 b, u32 c, u32 initval)
>  {
>  	a += initval;
> @@ -102,7 +107,7 @@ u32 __jhash_nwords(u32 a, u32 b, u32 c, u32 initval)
>  	return c;
>  }
>  
> -static __attribute__ ((noinline))
> +static noinline
>  u32 jhash_2words(u32 a, u32 b, u32 initval)
>  {
>  	return __jhash_nwords(a, b, 0, initval + JHASH_INITVAL + (2 << 2));
> @@ -239,7 +244,7 @@ static inline __u64 calc_offset(bool is_ipv6, bool is_icmp)
>  	return off;
>  }
>  
> -static __attribute__ ((noinline))
> +static noinline
>  bool parse_udp(void *data, void *data_end,
>  	       bool is_ipv6, struct packet_description *pckt)
>  {
> @@ -261,7 +266,7 @@ bool parse_udp(void *data, void *data_end,
>  	return 1;
>  }
>  
> -static __attribute__ ((noinline))
> +static noinline
>  bool parse_tcp(void *data, void *data_end,
>  	       bool is_ipv6, struct packet_description *pckt)
>  {
> @@ -285,7 +290,7 @@ bool parse_tcp(void *data, void *data_end,
>  	return 1;
>  }
>  
> -static __attribute__ ((noinline))
> +static noinline
>  bool encap_v6(struct xdp_md *xdp, struct ctl_value *cval,
>  	      struct packet_description *pckt,
>  	      struct real_definition *dst, __u32 pkt_bytes)
> @@ -328,7 +333,7 @@ bool encap_v6(struct xdp_md *xdp, struct ctl_value *cval,
>  	return 1;
>  }
>  
> -static __attribute__ ((noinline))
> +static noinline
>  bool encap_v4(struct xdp_md *xdp, struct ctl_value *cval,
>  	      struct packet_description *pckt,
>  	      struct real_definition *dst, __u32 pkt_bytes)
> @@ -382,7 +387,7 @@ bool encap_v4(struct xdp_md *xdp, struct ctl_value *cval,
>  	return 1;
>  }
>  
> -static __attribute__ ((noinline))
> +static noinline
>  bool decap_v6(struct xdp_md *xdp, void **data, void **data_end, bool inner_v4)
>  {
>  	struct eth_hdr *new_eth;
> @@ -403,7 +408,7 @@ bool decap_v6(struct xdp_md *xdp, void **data, void **data_end, bool inner_v4)
>  	return 1;
>  }
>  
> -static __attribute__ ((noinline))
> +static noinline
>  bool decap_v4(struct xdp_md *xdp, void **data, void **data_end)
>  {
>  	struct eth_hdr *new_eth;
> @@ -421,7 +426,7 @@ bool decap_v4(struct xdp_md *xdp, void **data, void **data_end)
>  	return 1;
>  }
>  
> -static __attribute__ ((noinline))
> +static noinline
>  int swap_mac_and_send(void *data, void *data_end)
>  {
>  	unsigned char tmp_mac[6];
> @@ -434,7 +439,7 @@ int swap_mac_and_send(void *data, void *data_end)
>  	return XDP_TX;
>  }
>  
> -static __attribute__ ((noinline))
> +static noinline
>  int send_icmp_reply(void *data, void *data_end)
>  {
>  	struct icmphdr *icmp_hdr;
> @@ -467,7 +472,7 @@ int send_icmp_reply(void *data, void *data_end)
>  	return swap_mac_and_send(data, data_end);
>  }
>  
> -static __attribute__ ((noinline))
> +static noinline
>  int send_icmp6_reply(void *data, void *data_end)
>  {
>  	struct icmp6hdr *icmp_hdr;
> @@ -491,7 +496,7 @@ int send_icmp6_reply(void *data, void *data_end)
>  	return swap_mac_and_send(data, data_end);
>  }
>  
> -static __attribute__ ((noinline))
> +static noinline
>  int parse_icmpv6(void *data, void *data_end, __u64 off,
>  		 struct packet_description *pckt)
>  {
> @@ -516,7 +521,7 @@ int parse_icmpv6(void *data, void *data_end, __u64 off,
>  	return -1;
>  }
>  
> -static __attribute__ ((noinline))
> +static noinline
>  int parse_icmp(void *data, void *data_end, __u64 off,
>  	       struct packet_description *pckt)
>  {
> @@ -543,7 +548,7 @@ int parse_icmp(void *data, void *data_end, __u64 off,
>  	return -1;
>  }
>  
> -static __attribute__ ((noinline))
> +static noinline
>  __u32 get_packet_hash(struct packet_description *pckt,
>  		      bool hash_16bytes)
>  {
> @@ -555,11 +560,11 @@ __u32 get_packet_hash(struct packet_description *pckt,
>  				    24);
>  }
>  
> -__attribute__ ((noinline))
> -static bool get_packet_dst(struct real_definition **real,
> -			   struct packet_description *pckt,
> -			   struct vip_meta *vip_info,
> -			   bool is_ipv6, void *lru_map)
> +static noinline
> +bool get_packet_dst(struct real_definition **real,
> +		    struct packet_description *pckt,
> +		    struct vip_meta *vip_info,
> +		    bool is_ipv6, void *lru_map)
>  {
>  	struct real_pos_lru new_dst_lru = { };
>  	bool hash_16bytes = is_ipv6;
> @@ -608,10 +613,10 @@ static bool get_packet_dst(struct real_definition **real,
>  	return 1;
>  }
>  
> -__attribute__ ((noinline))
> -static void connection_table_lookup(struct real_definition **real,
> -				    struct packet_description *pckt,
> -				    void *lru_map)
> +static noinline
> +void connection_table_lookup(struct real_definition **real,
> +			     struct packet_description *pckt,
> +			     void *lru_map)
>  {
>  
>  	struct real_pos_lru *dst_lru;
> @@ -635,11 +640,11 @@ static void connection_table_lookup(struct real_definition **real,
>   * below function has 6 arguments whereas bpf and llvm allow maximum of 5
>   * but since it's _static_ llvm can optimize one argument away
>   */
> -__attribute__ ((noinline))
> -static int process_l3_headers_v6(struct packet_description *pckt,
> -				 __u8 *protocol, __u64 off,
> -				 __u16 *pkt_bytes, void *data,
> -				 void *data_end)
> +static noinline
> +int process_l3_headers_v6(struct packet_description *pckt,
> +			  __u8 *protocol, __u64 off,
> +			  __u16 *pkt_bytes, void *data,
> +			  void *data_end)
>  {
>  	struct ipv6hdr *ip6h;
>  	__u64 iph_len;
> @@ -666,11 +671,11 @@ static int process_l3_headers_v6(struct packet_description *pckt,
>  	return -1;
>  }
>  
> -__attribute__ ((noinline))
> -static int process_l3_headers_v4(struct packet_description *pckt,
> -				 __u8 *protocol, __u64 off,
> -				 __u16 *pkt_bytes, void *data,
> -				 void *data_end)
> +static noinline
> +int process_l3_headers_v4(struct packet_description *pckt,
> +			  __u8 *protocol, __u64 off,
> +			  __u16 *pkt_bytes, void *data,
> +			  void *data_end)
>  {
>  	struct iphdr *iph;
>  	__u64 iph_len;
> @@ -698,9 +703,9 @@ static int process_l3_headers_v4(struct packet_description *pckt,
>  	return -1;
>  }
>  
> -__attribute__ ((noinline))
> -static int process_packet(void *data, __u64 off, void *data_end,
> -			  bool is_ipv6, struct xdp_md *xdp)
> +static inline

s/inline/noinline/

> +int process_packet(void *data, __u64 off, void *data_end,
> +		   bool is_ipv6, struct xdp_md *xdp)
>  {
>  
>  	struct real_definition *dst = NULL;
> 


* Re: [PATCH bpf-next] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
From: Daniel Borkmann @ 2018-04-27  9:59 UTC
  To: Leo Yan
  Cc: Alexei Starovoitov, David S. Miller, Jonathan Corbet, netdev,
	linux-kernel, linux-doc
In-Reply-To: <20180427094910.GA31015@leoy-ThinkPad-X240s>

On 04/27/2018 11:49 AM, Leo Yan wrote:
> On Fri, Apr 27, 2018 at 11:44:44AM +0200, Daniel Borkmann wrote:
>> On 04/26/2018 04:26 AM, Leo Yan wrote:
>>> When CONFIG_BPF_JIT_ALWAYS_ON is enabled, the kernel restricts
>>> bpf_jit_enable to the fixed value 1, so it cannot be set to 2
>>> for JIT opcode dumping; this patch updates the documentation accordingly.
>>>
>>> Signed-off-by: Leo Yan <leo.yan@linaro.org>
>>> ---
>>>  Documentation/networking/filter.txt | 6 ++++++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
>>> index fd55c7d..feddab9 100644
>>> --- a/Documentation/networking/filter.txt
>>> +++ b/Documentation/networking/filter.txt
>>> @@ -483,6 +483,12 @@ Example output from dmesg:
>>>  [ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
>>>  [ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3
>>>  
>>> +When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is set to 1 by default
>>> +and it returns failure if change to any other value from proc node; this is
>>> +for security consideration to avoid leaking info to unprivileged users. In this
>>> +case, we can't directly dump JIT opcode image from kernel log, alternatively we
>>> +need to use bpf tool for the dumping.
>>> +
>>
>> Could you change this doc text a bit? I think it's slightly misleading. From the first
>> sentence, one could also interpret that value 0 would leak info to unprivileged users,
>> whereas here we're only talking about the case of value 2. Maybe something roughly like
>> this would make it clearer:
>>
>>   When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
>>   setting any other value than that will return in failure. This is even the case for
>>   setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
>>   is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
>>   generally recommended approach instead.
> 
> Yeah, your rephrasing is clearer and better.  Will do this and send a
> new patch soon.  Thanks for your help.

Awesome, thank you!


* [PATCH bpf-next v2] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
From: Leo Yan @ 2018-04-27 10:02 UTC
  To: Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Jonathan Corbet, netdev, linux-kernel, linux-doc
  Cc: Leo Yan

When CONFIG_BPF_JIT_ALWAYS_ON is enabled, the kernel restricts
bpf_jit_enable to the fixed value 1, so it cannot be set to 2
for JIT opcode dumping; this patch updates the documentation accordingly.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 Documentation/networking/filter.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index fd55c7d..5032e12 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -483,6 +483,12 @@ Example output from dmesg:
 [ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
 [ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3
 
+When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
+setting any other value than that will return in failure. This is even the case for
+setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
+is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
+generally recommended approach instead.
+
 In the kernel source tree under tools/bpf/, there's bpf_jit_disasm for
 generating disassembly out of the kernel log's hexdump:
 
-- 
1.9.1


* Re: [PATCH net-next 00/13] sctp: refactor MTU handling
From: Xin Long @ 2018-04-27 10:04 UTC
  To: Marcelo Ricardo Leitner
  Cc: network dev, linux-sctp, Vlad Yasevich, Neil Horman
In-Reply-To: <cover.1524772453.git.marcelo.leitner@gmail.com>

On Fri, Apr 27, 2018 at 3:58 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> Currently MTU handling is spread over SCTP stack. There are multiple
> places doing same/similar calculations and updating them is error prone
> as one spot can easily be left out.
>
> This patchset consolidates it into more concise and consistent code. In
> general, it moves MTU handling from functions with broader objectives,
> such as sctp_assoc_add_peer(), to specific functions.
>
> It's also a preparation for the next patchset, which removes the
> duplication between sctp_make_op_error_space and
> sctp_make_op_error_fixed and relies on sctp_mtu_payload introduced here.
>
> More details on each patch.
>
> Marcelo Ricardo Leitner (13):
>   sctp: remove old and unused SCTP_MIN_PMTU
>   sctp: move transport pathmtu calc away of sctp_assoc_add_peer
>   sctp: remove an if() that is always true
>   sctp: introduce sctp_assoc_set_pmtu
>   sctp: introduce sctp_mtu_payload
>   sctp: introduce sctp_assoc_update_frag_point
>   sctp: remove sctp_assoc_pending_pmtu
>   sctp: introduce sctp_dst_mtu
>   sctp: remove sctp_transport_pmtu_check
>   sctp: re-use sctp_transport_pmtu in sctp_transport_route
>   sctp: honor PMTU_DISABLED when handling icmp
>   sctp: consider idata chunks when setting SCTP_MAXSEG
>   sctp: allow unsetting sockopt MAXSEG
>
>  include/net/sctp/constants.h |  5 ++--
>  include/net/sctp/sctp.h      | 52 ++++++++++++++------------------------
>  include/net/sctp/structs.h   |  2 ++
>  net/sctp/associola.c         | 60 +++++++++++++++++++++++---------------------
>  net/sctp/chunk.c             | 12 +--------
>  net/sctp/output.c            | 28 ++++++++-------------
>  net/sctp/socket.c            | 43 ++++++++++++++-----------------
>  net/sctp/transport.c         | 37 ++++++++++++++-------------
>  8 files changed, 105 insertions(+), 134 deletions(-)
>
> --
> 2.14.3
>
Series
Reviewed-by: Xin Long <lucien.xin@gmail.com>


* Re: [PATCH 1/3] selftests/bpf: Makefile: add includes to fix broken test build
From: Daniel Borkmann @ 2018-04-27 10:04 UTC
  To: Sirio Balmelli, ast; +Cc: netdev
In-Reply-To: <20180426083107.GA13908@vm4>

On 04/26/2018 10:31 AM, Sirio Balmelli wrote:
> several bpf tests fail to build with clang 7.0.0:
> ...
> In file included from ../../../include/uapi/linux/bpf.h:11:
> In file included from ./include/uapi/linux/types.h:5:
> /usr/include/asm-generic/int-ll64.h:11:10: fatal error: 'asm/bitsperlong.h' file not found
> 
> /usr/include/asm-generic/int-ll64.h is from outside the kernel repo,
> probably a good idea to repoint to -I$(ROOT)/include/uapi.
> asm/bitsperlong.h is architecture-specific, cater for this with an
> architecture-specific include -I$(ROOT)/$(ARCH)/include/uapi.
> 
> Re-building now yields:
> ../../../../include/uapi/linux/stddef.h:2:10: fatal error: 'linux/compiler_types.h' file not found
> 
> Fix this with -I$(ROOT)/include
> 
> Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>
> ---
>  tools/testing/selftests/bpf/Makefile | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 0b72cc7..6a8cfaf 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -80,8 +80,14 @@ else
>    CPU ?= generic
>  endif
>  
> -CLANG_FLAGS = -I. -I./include/uapi -I../../../include/uapi \
> -	      -Wno-compare-distinct-pointer-types
> +ARCH := arch/$(subst _64,,$(shell uname -p))
> +ROOT :=../../../..
> +TOOLS :=../../..
> +CLANG_FLAGS = -I. -I./include/uapi \
> +	-I$(TOOLS)/include/uapi -I$(TOOLS)/include \
> +	-I$(ROOT)/$(ARCH)/include/uapi \
> +	-I$(ROOT)/include/uapi -I$(ROOT)/include \
> +	-Wno-compare-distinct-pointer-types

The problem is that this will now pull in all sorts of kernel headers, whereas
before, the includes were limited to tools/include/ and tools/arch/*/include/
respectively; the tools/ infrastructure deliberately carries the headers it
needs under these locations. A bitsperlong.h is already present there, so
please change and respin your fix to reuse that one.

Thanks Sirio!


* Re: [PATCH bpf-next v2] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
From: Daniel Borkmann @ 2018-04-27 10:10 UTC
  To: Leo Yan, Alexei Starovoitov, David S. Miller, Jonathan Corbet,
	netdev, linux-kernel, linux-doc
In-Reply-To: <1524823374-6174-1-git-send-email-leo.yan@linaro.org>

On 04/27/2018 12:02 PM, Leo Yan wrote:
> When CONFIG_BPF_JIT_ALWAYS_ON is enabled, the kernel restricts
> bpf_jit_enable to the fixed value 1, so it cannot be set to 2
> for JIT opcode dumping; this patch updates the documentation accordingly.
> 
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>

Applied to bpf-next, thanks Leo!


* Re: [dm-devel] [PATCH v5] fault-injection: introduce kvmalloc fallback options
From: Mikulas Patocka @ 2018-04-27 10:20 UTC
  To: Michal Hocko
  Cc: Michael S. Tsirkin, John Stoffel, James Bottomley, Michal,
	eric.dumazet, netdev, jasowang, Randy Dunlap, linux-kernel,
	Matthew Wilcox, linux-mm, dm-devel, Vlastimil Babka, Andrew,
	David Rientjes, Morton, virtualization, David Miller, edumazet
In-Reply-To: <20180427082555.GC17484@dhcp22.suse.cz>

On Fri, 27 Apr 2018, Michal Hocko wrote:

> On Thu 26-04-18 18:52:05, Mikulas Patocka wrote:
> > 
> > 
> > On Fri, 27 Apr 2018, Michael S. Tsirkin wrote:
> [...]
> > >    But assuming it's important to control this kind of
> > >    fault injection to be controlled from
> > >    a dedicated menuconfig option, why not the rest of
> > >    faults?
> > 
> > The injected faults cause damage to the user, so there's no point in
> > enabling them by default. The vmalloc fallback should not cause any damage
> > (assuming that the code is correctly written).
> 
> But you want to find those bugs which would BUG_ON easier, so there is a
> risk of harm IIUC

Yes, I want to harm them, but I only want to harm the users using the 
debugging kernel. Testers should be "harmed" by crashes - so that the 
users of production kernels are harmed less.

If someone hits this, he should report it, use the kernel parameter to 
turn it off and continue with the testing.

> and this is not much different than other fault injecting paths.

Fault injection causes misbehavior even in completely bug-free code (for
example, syscalls randomly returning -ENOMEM). Forcing the vmalloc fallback
won't cause misbehavior in bug-free code.

Mikulas


* [PATCH net-next 0/2] netns: uevent filtering
From: Christian Brauner @ 2018-04-27 10:23 UTC
  To: ebiederm, davem, netdev, linux-kernel
  Cc: avagin, ktkhai, serge, gregkh, Christian Brauner

Hey everyone,

This is the new approach to uevent filtering as discussed (see the
threads in [1], [2], and [3]).

This series fixes up the uevent filtering logic:
- uevent filtering logic is simplified
- locking time on uevent_sock_list is minimized
- tagged and untagged kobjects are handled in separate codepaths
- permissions for userspace are fixed for network device uevents in
  network namespaces owned by non-initial user namespaces
  Udev is now able to see those events correctly, which it could not before.
  For example, moving a physical device into a network namespace not
  owned by the initial user namespace previously gave:

  root@xen1:~# udevadm --debug monitor -k
  calling: monitor
  monitor will print the received events for:
  KERNEL - the kernel uevent

  sender uid=65534, message ignored
  sender uid=65534, message ignored
  sender uid=65534, message ignored
  sender uid=65534, message ignored
  sender uid=65534, message ignored

  and now after the discussion and solution in [3] correctly gives:

  root@xen1:~# udevadm --debug monitor -k
  calling: monitor
  monitor will print the received events for:
  KERNEL - the kernel uevent

  KERNEL[625.301042] add      /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/enp1s0f1 (net)
  KERNEL[625.301109] move     /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/enp1s0f1 (net)
  KERNEL[625.301138] move     /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/eth1 (net)
  KERNEL[655.333272] remove /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/eth1 (net)

Thanks!
Christian

[1]: https://lkml.org/lkml/2018/4/4/739
[2]: https://lkml.org/lkml/2018/4/26/767
[3]: https://lkml.org/lkml/2018/4/26/738

Christian Brauner (2):
  uevent: add alloc_uevent_skb() helper
  netns: restrict uevents

 lib/kobject_uevent.c | 175 ++++++++++++++++++++++++++++++-------------
 1 file changed, 123 insertions(+), 52 deletions(-)

-- 
2.17.0


* [PATCH net-next 1/2 v3] uevent: add alloc_uevent_skb() helper
From: Christian Brauner @ 2018-04-27 10:23 UTC
  To: ebiederm, davem, netdev, linux-kernel
  Cc: avagin, ktkhai, serge, gregkh, Christian Brauner
In-Reply-To: <20180427102306.8617-1-christian.brauner@ubuntu.com>

This patch adds alloc_uevent_skb() in preparation for follow-up patches.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
 lib/kobject_uevent.c | 39 ++++++++++++++++++++++++++-------------
 1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 15ea216a67ce..c3cb110f663b 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -296,6 +296,31 @@ static void cleanup_uevent_env(struct subprocess_info *info)
 }
 #endif
 
+static struct sk_buff *alloc_uevent_skb(struct kobj_uevent_env *env,
+					const char *action_string,
+					const char *devpath)
+{
+	struct sk_buff *skb = NULL;
+	char *scratch;
+	size_t len;
+
+	/* allocate message with maximum possible size */
+	len = strlen(action_string) + strlen(devpath) + 2;
+	skb = alloc_skb(len + env->buflen, GFP_KERNEL);
+	if (!skb)
+		return NULL;
+
+	/* add header */
+	scratch = skb_put(skb, len);
+	sprintf(scratch, "%s@%s", action_string, devpath);
+
+	skb_put_data(skb, env->buf, env->buflen);
+
+	NETLINK_CB(skb).dst_group = 1;
+
+	return skb;
+}
+
 static int kobject_uevent_net_broadcast(struct kobject *kobj,
 					struct kobj_uevent_env *env,
 					const char *action_string,
@@ -314,22 +339,10 @@ static int kobject_uevent_net_broadcast(struct kobject *kobj,
 			continue;
 
 		if (!skb) {
-			/* allocate message with the maximum possible size */
-			size_t len = strlen(action_string) + strlen(devpath) + 2;
-			char *scratch;
-
 			retval = -ENOMEM;
-			skb = alloc_skb(len + env->buflen, GFP_KERNEL);
+			skb = alloc_uevent_skb(env, action_string, devpath);
 			if (!skb)
 				continue;
-
-			/* add header */
-			scratch = skb_put(skb, len);
-			sprintf(scratch, "%s@%s", action_string, devpath);
-
-			skb_put_data(skb, env->buf, env->buflen);
-
-			NETLINK_CB(skb).dst_group = 1;
 		}
 
 		retval = netlink_broadcast_filtered(uevent_sock, skb_get(skb),
-- 
2.17.0
