Netdev List
 help / color / mirror / Atom feed
* Re: KASAN: use-after-free Read in rxrpc_queue_local
From: syzbot @ 2019-08-12 18:06 UTC (permalink / raw)
  To: arvid.brodin, davem, dhowells, linux-afs, linux-kernel, netdev,
	syzkaller-bugs, xiyou.wangcong
In-Reply-To: <0000000000007593f4058fea60d8@google.com>

syzbot has bisected this bug to:

commit b9a1e627405d68d475a3c1f35e685ccfb5bbe668
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Thu Jul 4 00:21:13 2019 +0000

     hsr: implement dellink to clean up resources

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=10b4ebce600000
start commit:   125b7e09 net: tc35815: Explicitly check NET_IP_ALIGN is no..
git tree:       net
final crash:    https://syzkaller.appspot.com/x/report.txt?x=12b4ebce600000
console output: https://syzkaller.appspot.com/x/log.txt?x=14b4ebce600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a4c9e9f08e9e8960
dashboard link: https://syzkaller.appspot.com/bug?extid=78e71c5bab4f76a6a719
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=165ec172600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=119d4eba600000

Reported-by: syzbot+78e71c5bab4f76a6a719@syzkaller.appspotmail.com
Fixes: b9a1e627405d ("hsr: implement dellink to clean up resources")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply

* Re: [PATCH net] ibmveth: Convert multicast list size for little-endian systems
From: Thomas Falcon @ 2019-08-12 18:02 UTC (permalink / raw)
  To: Joe Perches, netdev; +Cc: liuhangbin, davem
In-Reply-To: <93581c35c0a8c06434221e628153028105849064.camel@perches.com>


On 8/12/19 12:49 PM, Joe Perches wrote:
> On Mon, 2019-08-12 at 12:43 -0500, Thomas Falcon wrote:
>> The ibm,mac-address-filters property defines the maximum number of
>> addresses the hypervisor's multicast filter list can support. It is
>> encoded as a big-endian integer in the OF device tree, but the virtual
>> ethernet driver does not convert it for use by little-endian systems.
>> As a result, the driver is not behaving as it should on affected systems
>> when a large number of multicast addresses are assigned to the device.
>>
>> Reported-by: Hangbin Liu <liuhangbin@gmail.com>
>> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
>> ---
>>   drivers/net/ethernet/ibm/ibmveth.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>> index d654c23..b50a6cf 100644
>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>> @@ -1645,7 +1645,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
>>   
>>   	adapter->vdev = dev;
>>   	adapter->netdev = netdev;
>> -	adapter->mcastFilterSize = *mcastFilterSize_p;
>> +	adapter->mcastFilterSize = be32_to_cpu(*mcastFilterSize_p);
>>   	adapter->pool_config = 0;
>>   
>>   	netif_napi_add(netdev, &adapter->napi, ibmveth_poll, 16);
> Perhaps to keep sparse happy too: (untested)


Thanks! I will test this and submit a v2 soon.


> ---
>   drivers/net/ethernet/ibm/ibmveth.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
> index d654c234aaf7..90539d7ce565 100644
> --- a/drivers/net/ethernet/ibm/ibmveth.c
> +++ b/drivers/net/ethernet/ibm/ibmveth.c
> @@ -1605,7 +1605,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
>   	struct net_device *netdev;
>   	struct ibmveth_adapter *adapter;
>   	unsigned char *mac_addr_p;
> -	unsigned int *mcastFilterSize_p;
> +	__be32 *mcastFilterSize_p;
>   	long ret;
>   	unsigned long ret_attr;
>   
> @@ -1627,7 +1627,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
>   		return -EINVAL;
>   	}
>   
> -	mcastFilterSize_p = (unsigned int *)vio_get_attribute(dev,
> +	mcastFilterSize_p = (__be32 *)vio_get_attribute(dev,
>   						VETH_MCAST_FILTER_SIZE, NULL);
>   	if (!mcastFilterSize_p) {
>   		dev_err(&dev->dev, "Can't find VETH_MCAST_FILTER_SIZE "
> @@ -1645,7 +1645,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
>   
>   	adapter->vdev = dev;
>   	adapter->netdev = netdev;
> -	adapter->mcastFilterSize = *mcastFilterSize_p;
> +	adapter->mcastFilterSize = be32_to_cpu(*mcastFilterSize_p);
>   	adapter->pool_config = 0;
>   
>   	netif_napi_add(netdev, &adapter->napi, ibmveth_poll, 16);
>
>


^ permalink raw reply

* Re: 5.3-rc3-ish VM crash: RIP: 0010:tcp_trim_head+0x20/0xe0
From: Eric Dumazet @ 2019-08-12 17:56 UTC (permalink / raw)
  To: Sander Eikelenboom, netdev, linux-kernel
In-Reply-To: <27aebb57-0ca9-fba3-092f-39131ad2b648@eikelenboom.it>



On 8/12/19 2:50 PM, Sander Eikelenboom wrote:
> L.S.,
> 
> While testing a somewhere-after-5.3-rc3 kernel (which included the latest net merge (33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9),
> one of my Xen VM's (which gets quite some network load) crashed.
> See below for the stacktrace.
> 
> Unfortunately I haven't got a clear trigger, so bisection doesn't seem to be an option at the moment. 
> I haven't encountered this on 5.2, so it seems to be an regression against 5.2.
> 
> Any ideas ?
> 
> --
> Sander
> 
> 
> [16930.653595] general protection fault: 0000 [#1] SMP NOPTI
> [16930.653624] CPU: 0 PID: 3275 Comm: rsync Not tainted 5.3.0-rc3-20190809-doflr+ #1
> [16930.653657] RIP: 0010:tcp_trim_head+0x20/0xe0
> [16930.653677] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
> [16930.653741] RSP: 0000:ffffc90000003ad8 EFLAGS: 00010286
> [16930.653762] RAX: fffe888005bf62c0 RBX: ffff8880115fb800 RCX: 000000008010000b

crash in " mov    0x20(%rax),%eax"   and RAX=fffe888005bf62c0 (not a valid kernel address)

Look like one bit corruption maybe.

Nothing comes to mind really between 5.2 and 53 that could explain this.

> [16930.653791] RDX: 00000000000005a0 RSI: ffff8880115fb800 RDI: ffff888016b00880
> [16930.653819] RBP: ffff888016b00880 R08: 0000000000000001 R09: 0000000000000000
> [16930.653848] R10: ffff88800ae00800 R11: 00000000bfe632e6 R12: 00000000000005a0
> [16930.653875] R13: 0000000000000001 R14: 00000000bfe62d46 R15: 0000000000000004
> [16930.653913] FS:  00007fe71fe2cb80(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
> [16930.653943] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [16930.653965] CR2: 000055de0f3e7000 CR3: 0000000011f32000 CR4: 00000000000006f0
> [16930.653993] Call Trace:
> [16930.654005]  <IRQ>
> [16930.654018]  tcp_ack+0xbb0/0x1230
> [16930.654033]  tcp_rcv_established+0x2e8/0x630
> [16930.654053]  tcp_v4_do_rcv+0x129/0x1d0
> [16930.654070]  tcp_v4_rcv+0xac9/0xcb0
> [16930.654088]  ip_protocol_deliver_rcu+0x27/0x1b0
> [16930.654109]  ip_local_deliver_finish+0x3f/0x50
> [16930.654128]  ip_local_deliver+0x4d/0xe0
> [16930.654145]  ? ip_protocol_deliver_rcu+0x1b0/0x1b0
> [16930.654163]  ip_rcv+0x4c/0xd0
> [16930.654179]  __netif_receive_skb_one_core+0x79/0x90
> [16930.654200]  netif_receive_skb_internal+0x2a/0xa0
> [16930.654219]  napi_gro_receive+0xe7/0x140
> [16930.654237]  xennet_poll+0x9be/0xae0
> [16930.654254]  net_rx_action+0x136/0x340
> [16930.654271]  __do_softirq+0xdd/0x2cf
> [16930.654287]  irq_exit+0x7a/0xa0
> [16930.654304]  xen_evtchn_do_upcall+0x27/0x40
> [16930.654320]  xen_hvm_callback_vector+0xf/0x20
> [16930.654339]  </IRQ>
> [16930.654349] RIP: 0033:0x55de0d87db99
> [16930.654364] Code: 00 00 48 89 7c 24 f8 45 39 fe 45 0f 42 fe 44 89 7c 24 f4 eb 09 0f 1f 40 00 83 e9 01 74 3e 89 f2 48 63 f8 4c 01 d2 44 38 1c 3a <75> 25 44 38 6c 3a ff 75 1e 41 0f b6 3c 24 40 38 3a 75 14 41 0f b6
> [16930.654432] RSP: 002b:00007ffd5531eec8 EFLAGS: 00000a87 ORIG_RAX: ffffffffffffff0c
> [16930.655004] RAX: 0000000000000002 RBX: 000055de0f3e8e50 RCX: 000000000000007f
> [16930.655034] RDX: 000055de0f3dc2d2 RSI: 0000000000003492 RDI: 0000000000000002
> [16930.655062] RBP: 0000000000007fff R08: 00000000000080ea R09: 00000000000001f0
> [16930.655089] R10: 000055de0f3d8e40 R11: 0000000000000094 R12: 000055de0f3e0f2a
> [16930.655116] R13: 0000000000000010 R14: 0000000000007f16 R15: 0000000000000080
> [16930.655144] Modules linked in:
> [16930.655200] ---[ end trace 533367c95501b645 ]---
> [16930.655223] RIP: 0010:tcp_trim_head+0x20/0xe0
> [16930.655243] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
> [16930.655312] RSP: 0000:ffffc90000003ad8 EFLAGS: 00010286
> [16930.655331] RAX: fffe888005bf62c0 RBX: ffff8880115fb800 RCX: 000000008010000b
> [16930.655360] RDX: 00000000000005a0 RSI: ffff8880115fb800 RDI: ffff888016b00880
> [16930.655387] RBP: ffff888016b00880 R08: 0000000000000001 R09: 0000000000000000
> [16930.655414] R10: ffff88800ae00800 R11: 00000000bfe632e6 R12: 00000000000005a0
> [16930.655441] R13: 0000000000000001 R14: 00000000bfe62d46 R15: 0000000000000004
> [16930.655475] FS:  00007fe71fe2cb80(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
> [16930.655502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [16930.655525] CR2: 000055de0f3e7000 CR3: 0000000011f32000 CR4: 00000000000006f0
> [16930.655553] Kernel panic - not syncing: Fatal exception in interrupt
> [16930.655789] Kernel Offset: disabled
> 

^ permalink raw reply

* Re: [PATCH v3 13/17] mvpp2: no need to check return value of debugfs_create functions
From: Nick Desaulniers @ 2019-08-12 17:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: netdev, David S. Miller, Maxime Chevallier, Nathan Huckleberry
In-Reply-To: <20190810101732.26612-14-gregkh@linuxfoundation.org>

On Sat, Aug 10, 2019 at 3:17 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> When calling debugfs functions, there is no need to ever check the
> return value.  The function can work or not, but the code logic should
> never do something different based on this.

Maybe adding this recommendation to the comment block above the
definition of debugfs_create_dir() in fs/debugfs/inode.c would help
prevent this issue in the future?  What failure means, and how to
proceed can be tricky; more documentation can only help in this
regard.

>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Maxime Chevallier <maxime.chevallier@bootlin.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Cc: Nathan Huckleberry <nhuck@google.com>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
>  .../ethernet/marvell/mvpp2/mvpp2_debugfs.c    | 19 +------------------
>  1 file changed, 1 insertion(+), 18 deletions(-)
>
> diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_debugfs.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_debugfs.c
> index 274fb07362cb..4a3baa7e0142 100644
> --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_debugfs.c
> +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_debugfs.c
> @@ -452,8 +452,6 @@ static int mvpp2_dbgfs_flow_port_init(struct dentry *parent,
>         struct dentry *port_dir;
>
>         port_dir = debugfs_create_dir(port->dev->name, parent);
> -       if (IS_ERR(port_dir))
> -               return PTR_ERR(port_dir);
>
>         port_entry = &port->priv->dbgfs_entries->port_flow_entries[port->id];
>
> @@ -480,8 +478,6 @@ static int mvpp2_dbgfs_flow_entry_init(struct dentry *parent,
>         sprintf(flow_entry_name, "%02d", flow);
>
>         flow_entry_dir = debugfs_create_dir(flow_entry_name, parent);
> -       if (!flow_entry_dir)
> -               return -ENOMEM;
>
>         entry = &priv->dbgfs_entries->flow_entries[flow];
>
> @@ -514,8 +510,6 @@ static int mvpp2_dbgfs_flow_init(struct dentry *parent, struct mvpp2 *priv)
>         int i, ret;
>
>         flow_dir = debugfs_create_dir("flows", parent);
> -       if (!flow_dir)
> -               return -ENOMEM;
>
>         for (i = 0; i < MVPP2_N_PRS_FLOWS; i++) {
>                 ret = mvpp2_dbgfs_flow_entry_init(flow_dir, priv, i);
> @@ -539,8 +533,6 @@ static int mvpp2_dbgfs_prs_entry_init(struct dentry *parent,
>         sprintf(prs_entry_name, "%03d", tid);
>
>         prs_entry_dir = debugfs_create_dir(prs_entry_name, parent);
> -       if (!prs_entry_dir)
> -               return -ENOMEM;
>
>         entry = &priv->dbgfs_entries->prs_entries[tid];
>
> @@ -578,8 +570,6 @@ static int mvpp2_dbgfs_prs_init(struct dentry *parent, struct mvpp2 *priv)
>         int i, ret;
>
>         prs_dir = debugfs_create_dir("parser", parent);
> -       if (!prs_dir)
> -               return -ENOMEM;
>
>         for (i = 0; i < MVPP2_PRS_TCAM_SRAM_SIZE; i++) {
>                 ret = mvpp2_dbgfs_prs_entry_init(prs_dir, priv, i);
> @@ -688,8 +678,6 @@ static int mvpp2_dbgfs_port_init(struct dentry *parent,
>         struct dentry *port_dir;
>
>         port_dir = debugfs_create_dir(port->dev->name, parent);
> -       if (IS_ERR(port_dir))
> -               return PTR_ERR(port_dir);
>
>         debugfs_create_file("parser_entries", 0444, port_dir, port,
>                             &mvpp2_dbgfs_port_parser_fops);
> @@ -716,15 +704,10 @@ void mvpp2_dbgfs_init(struct mvpp2 *priv, const char *name)
>         int ret, i;
>
>         mvpp2_root = debugfs_lookup(MVPP2_DRIVER_NAME, NULL);
> -       if (!mvpp2_root) {
> +       if (!mvpp2_root)
>                 mvpp2_root = debugfs_create_dir(MVPP2_DRIVER_NAME, NULL);
> -               if (IS_ERR(mvpp2_root))
> -                       return;
> -       }
>
>         mvpp2_dir = debugfs_create_dir(name, mvpp2_root);
> -       if (IS_ERR(mvpp2_dir))
> -               return;
>
>         priv->dbgfs_dir = mvpp2_dir;
>         priv->dbgfs_entries = kzalloc(sizeof(*priv->dbgfs_entries), GFP_KERNEL);
> --
> 2.22.0
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply

* Re: [PATCH bpf-next v2 2/4] bpf: support cloning sk storage on accept()
From: Stanislav Fomichev @ 2019-08-12 17:52 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Stanislav Fomichev, netdev, bpf, davem, ast, Martin KaFai Lau,
	Yonghong Song
In-Reply-To: <db5ec323-1126-d461-bc65-27ccc1414589@iogearbox.net>

On 08/12, Daniel Borkmann wrote:
> On 8/9/19 6:10 PM, Stanislav Fomichev wrote:
> > Add new helper bpf_sk_storage_clone which optionally clones sk storage
> > and call it from sk_clone_lock.
> > 
> > Cc: Martin KaFai Lau <kafai@fb.com>
> > Cc: Yonghong Song <yhs@fb.com>
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> [...]
> > +int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
> > +{
> > +	struct bpf_sk_storage *new_sk_storage = NULL;
> > +	struct bpf_sk_storage *sk_storage;
> > +	struct bpf_sk_storage_elem *selem;
> > +	int ret;
> > +
> > +	RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
> > +
> > +	rcu_read_lock();
> > +	sk_storage = rcu_dereference(sk->sk_bpf_storage);
> > +
> > +	if (!sk_storage || hlist_empty(&sk_storage->list))
> > +		goto out;
> > +
> > +	hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
> > +		struct bpf_sk_storage_elem *copy_selem;
> > +		struct bpf_sk_storage_map *smap;
> > +		struct bpf_map *map;
> > +		int refold;
> > +
> > +		smap = rcu_dereference(SDATA(selem)->smap);
> > +		if (!(smap->map.map_flags & BPF_F_CLONE))
> > +			continue;
> > +
> > +		map = bpf_map_inc_not_zero(&smap->map, false);
> > +		if (IS_ERR(map))
> > +			continue;
> > +
> > +		copy_selem = bpf_sk_storage_clone_elem(newsk, smap, selem);
> > +		if (!copy_selem) {
> > +			ret = -ENOMEM;
> > +			bpf_map_put(map);
> > +			goto err;
> > +		}
> > +
> > +		if (new_sk_storage) {
> > +			selem_link_map(smap, copy_selem);
> > +			__selem_link_sk(new_sk_storage, copy_selem);
> > +		} else {
> > +			ret = sk_storage_alloc(newsk, smap, copy_selem);
> > +			if (ret) {
> > +				kfree(copy_selem);
> > +				atomic_sub(smap->elem_size,
> > +					   &newsk->sk_omem_alloc);
> > +				bpf_map_put(map);
> > +				goto err;
> > +			}
> > +
> > +			new_sk_storage = rcu_dereference(copy_selem->sk_storage);
> > +		}
> > +		bpf_map_put(map);
> 
> The map get/put combination /under/ RCU read lock seems a bit odd to me, could
> you exactly describe the race that this would be preventing?
There is a race between sk storage release and sk storage clone.
bpf_sk_storage_map_free uses synchronize_rcu to wait for all existing
users to finish and the new ones are prevented via map's refcnt being
zero; we need to do something like that for the clone.
Martin suggested to use bpf_map_inc_not_zero/bpf_map_put.
If I read everythin correctly, I think without map_inc/map_put we
get the following race:

CPU0                                   CPU1

bpf_map_put
  bpf_sk_storage_map_free(smap)
    synchronize_rcu

    // no more users via bpf or
    // syscall, but clone
    // can still happen

    for each (bucket)
      selem_unlink
        selem_unlink_map(smap)

        // adding anything at
        // this point to the
        // bucket will leak

                                       rcu_read_lock
                                       tcp_v4_rcv
                                         tcp_v4_do_rcv
                                           // sk is lockless TCP_LISTEN
                                           tcp_v4_cookie_check
                                             tcp_v4_syn_recv_sock
                                               bpf_sk_storage_clone
                                                 rcu_dereference(sk->sk_bpf_storage)
                                                 selem_link_map(smap, copy)
                                                 // adding new element to the
                                                 // map -> leak
                                       rcu_read_unlock

      selem_unlink_sk
       sk->sk_bpf_storage = NULL

    synchronize_rcu

^ permalink raw reply

* RE: [PATCH v3 net-next 0/3] net: batched receive in GRO path
From: Ioana Ciocoi Radulescu @ 2019-08-12 17:51 UTC (permalink / raw)
  To: Edward Cree
  Cc: David Miller, netdev, Eric Dumazet,
	linux-net-drivers@solarflare.com
In-Reply-To: <a6faf533-6dd3-d7d7-9464-1fe87d0ac7fc@solarflare.com>

> -----Original Message-----
> From: Edward Cree <ecree@solarflare.com>
> Sent: Friday, August 9, 2019 8:32 PM
> To: Ioana Ciocoi Radulescu <ruxandra.radulescu@nxp.com>
> Cc: David Miller <davem@davemloft.net>; netdev <netdev@vger.kernel.org>;
> Eric Dumazet <eric.dumazet@gmail.com>; linux-net-drivers@solarflare.com
> Subject: Re: [PATCH v3 net-next 0/3] net: batched receive in GRO path
> 
> On 09/08/2019 18:14, Ioana Ciocoi Radulescu wrote:
> > Hi Edward,
> >
> > I'm probably missing a lot of context here, but is there a reason
> > this change targets only the napi_gro_frags() path and not the
> > napi_gro_receive() one?
> > I'm trying to understand what drivers that don't call napi_gro_frags()
> > should do in order to benefit from this batching feature.
> The sfc driver (which is what I have lots of hardware for, so I can
>  test it) uses napi_gro_frags().
> It should be possible to do a similar patch to napi_gro_receive(),
>  if someone wants to put in the effort of writing and testing it.

Rather tricky, since I'm not really familiar with GRO internals and
probably don't understand all the implications of such a change :-/
Any pointers to what I should pay attention to/sensitive areas that
need extra care?

> However, there are many more callers, so more effort required to
>  make sure none of them care whether the return value is GRO_DROP
>  or GRO_NORMAL (since the listified version cannot give that
>  indication).

At a quick glance, there's only one driver that looks at the return
value of napi_gro_receive (drivers/net/ethernet/socionext/netsec.c),
and it only updates interface stats based on it.

> Also, the guidance from Eric is that drivers seeking high performance
>  should use napi_gro_frags(), as this allows GRO to recycle the SKB.

But this guidance is for GRO-able frames only, right? If I try to use
napi_gro_frags() indiscriminately on the Rx path, I get a big
performance penalty in some cases - e.g. forwarding of non-TCP
single buffer frames.

On the other hand, Eric shot down my attempt to differentiate between
TCP and non-TCP frames inside the driver (see 
https://patchwork.ozlabs.org/patch/1135817/#2222236), so I'm not
really sure what's the recommended approach here?

> 
> All of this together means I don't plan to submit such a patch; I
>  would however be happy to review a patch if someone else writes one.

Thanks a lot for the explanations!
Ioana

^ permalink raw reply

* Re: [PATCH net-next,v4 08/12] drivers: net: use flow block API
From: Edward Cree @ 2019-08-12 17:50 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netdev, netfilter-devel
In-Reply-To: <20190709205550.3160-9-pablo@netfilter.org>

On 09/07/2019 21:55, Pablo Neira Ayuso wrote:
> This patch updates flow_block_cb_setup_simple() to use the flow block API.
> Several drivers are also adjusted to use it.
>
> This patch introduces the per-driver list of flow blocks to account for
> blocks that are already in use.
>
> Remove tc_block_offload alias.
>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
> v4: fix typo in list in nfp driver - Jakub Kicinski.
>     Move driver_list handling to the driver code, this list is transitional,
>     until drivers are updated to support multiple subsystems. No more
>     driver_list handling from core.

Pablo, can you explain (because this commit message doesn't) why these per-
 driver lists are needed, and what the information/state is that has module
 (rather than, say, netdevice) scope?

AFAICT none of these drivers otherwise interacts with TC at a module level
 (every block binding, callback etc. is associated with a single netdevice,
 usually through a cb_priv = netdev_priv(net_dev)), so this looks very out
 of place.

(More questions likely to follow as I work my way through trying to re-
 target my in-development driver to this new API.  Also if you have any
 kind of design document explaining how it all fits together, that'd be
 nice, because currently trying to figure it out is giving me a headache.)

-Ed

^ permalink raw reply

* Re: [PATCH net] ibmveth: Convert multicast list size for little-endian systems
From: Joe Perches @ 2019-08-12 17:49 UTC (permalink / raw)
  To: Thomas Falcon, netdev; +Cc: liuhangbin, davem
In-Reply-To: <1565631786-18860-1-git-send-email-tlfalcon@linux.ibm.com>

On Mon, 2019-08-12 at 12:43 -0500, Thomas Falcon wrote:
> The ibm,mac-address-filters property defines the maximum number of
> addresses the hypervisor's multicast filter list can support. It is
> encoded as a big-endian integer in the OF device tree, but the virtual
> ethernet driver does not convert it for use by little-endian systems.
> As a result, the driver is not behaving as it should on affected systems
> when a large number of multicast addresses are assigned to the device.
> 
> Reported-by: Hangbin Liu <liuhangbin@gmail.com>
> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
> ---
>  drivers/net/ethernet/ibm/ibmveth.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
> index d654c23..b50a6cf 100644
> --- a/drivers/net/ethernet/ibm/ibmveth.c
> +++ b/drivers/net/ethernet/ibm/ibmveth.c
> @@ -1645,7 +1645,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
>  
>  	adapter->vdev = dev;
>  	adapter->netdev = netdev;
> -	adapter->mcastFilterSize = *mcastFilterSize_p;
> +	adapter->mcastFilterSize = be32_to_cpu(*mcastFilterSize_p);
>  	adapter->pool_config = 0;
>  
>  	netif_napi_add(netdev, &adapter->napi, ibmveth_poll, 16);

Perhaps to keep sparse happy too: (untested)
---
 drivers/net/ethernet/ibm/ibmveth.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index d654c234aaf7..90539d7ce565 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1605,7 +1605,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
 	struct net_device *netdev;
 	struct ibmveth_adapter *adapter;
 	unsigned char *mac_addr_p;
-	unsigned int *mcastFilterSize_p;
+	__be32 *mcastFilterSize_p;
 	long ret;
 	unsigned long ret_attr;
 
@@ -1627,7 +1627,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
 		return -EINVAL;
 	}
 
-	mcastFilterSize_p = (unsigned int *)vio_get_attribute(dev,
+	mcastFilterSize_p = (__be32 *)vio_get_attribute(dev,
 						VETH_MCAST_FILTER_SIZE, NULL);
 	if (!mcastFilterSize_p) {
 		dev_err(&dev->dev, "Can't find VETH_MCAST_FILTER_SIZE "
@@ -1645,7 +1645,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
 
 	adapter->vdev = dev;
 	adapter->netdev = netdev;
-	adapter->mcastFilterSize = *mcastFilterSize_p;
+	adapter->mcastFilterSize = be32_to_cpu(*mcastFilterSize_p);
 	adapter->pool_config = 0;
 
 	netif_napi_add(netdev, &adapter->napi, ibmveth_poll, 16);



^ permalink raw reply related

* Re: [PATCH bpf-next v4 1/2] xsk: remove AF_XDP socket from map when the socket is released
From: Björn Töpel @ 2019-08-12 17:25 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Netdev, Björn Töpel,
	Karlsson, Magnus, Bruce Richardson, Song Liu, bpf
In-Reply-To: <5ad56a5e-a189-3f56-c85c-24b6c300efd9@iogearbox.net>

On Mon, 12 Aug 2019 at 14:28, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
[...]
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 59b57d708697..c3447bad608a 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -362,6 +362,50 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
> >       dev_put(dev);
> >   }
> >
> > +static struct xsk_map *xsk_get_map_list_entry(struct xdp_sock *xs,
> > +                                           struct xdp_sock ***map_entry)
> > +{
> > +     struct xsk_map *map = NULL;
> > +     struct xsk_map_node *node;
> > +
> > +     *map_entry = NULL;
> > +
> > +     spin_lock_bh(&xs->map_list_lock);
> > +     node = list_first_entry_or_null(&xs->map_list, struct xsk_map_node,
> > +                                     node);
> > +     if (node) {
> > +             WARN_ON(xsk_map_inc(node->map));
>
> Can you elaborate on the refcount usage here and against what scenario it is protecting?
>

Thanks for having a look!

First we access the map_list (under the lock) and pull out the map
which we intend to clean. In order to clear the map entry, we need to
a reference to the map. However, when the map_list_lock is released,
there's a window where the map entry can be cleared and the map can be
destroyed, and making the "map", which is used in
xsk_delete_from_maps, stale. To guarantee existence the additional
refinc is required. Makes sense?

> Do we pretend it never fails on the bpf_map_inc() wrt the WARN_ON(),
> why that (what makes it different from the xsk_map_node_alloc() inc
> above where we do error out)?

Hmm, given that we're in a cleanup (socket release), we can't really
return any error. What would be a more robust way? Retrying? AFAIK the
release ops return an int, but it's not checked/used.

> > +             map = node->map;
> > +             *map_entry = node->map_entry;
> > +     }
> > +     spin_unlock_bh(&xs->map_list_lock);
> > +     return map;
> > +}
> > +
> > +static void xsk_delete_from_maps(struct xdp_sock *xs)
> > +{
> > +     /* This function removes the current XDP socket from all the
> > +      * maps it resides in. We need to take extra care here, due to
> > +      * the two locks involved. Each map has a lock synchronizing
> > +      * updates to the entries, and each socket has a lock that
> > +      * synchronizes access to the list of maps (map_list). For
> > +      * deadlock avoidance the locks need to be taken in the order
> > +      * "map lock"->"socket map list lock". We start off by
> > +      * accessing the socket map list, and take a reference to the
> > +      * map to guarantee existence. Then we ask the map to remove
> > +      * the socket, which tries to remove the socket from the
> > +      * map. Note that there might be updates to the map between
> > +      * xsk_get_map_list_entry() and xsk_map_try_sock_delete().
> > +      */

I tried to clarify here, but I obviously need to do a better job. :-)


Björn

^ permalink raw reply

* Re: [PATCH net] nexthop: use nlmsg_parse_deprecated()
From: David Ahern @ 2019-08-12 17:23 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Eric Dumazet, syzbot, Johannes Berg
In-Reply-To: <20190812113616.51725-1-edumazet@google.com>

On 8/12/19 5:36 AM, Eric Dumazet wrote:
> David missed that commit 8cb081746c03 ("netlink: make validation
> more configurable for future strictness") has renamed nlmsg_parse()

I think the root cause is nlmsg_parse() calling __nla_parse and not
__nlmsg_parse. Users of nlmsg_parse are missing the header validation.

^ permalink raw reply

* Re: [PATCH net-next] net: can: Fix compiling warning
From: Kees Cook @ 2019-08-12 17:19 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Oliver Hartkopp, Patrick Bellasi, linux-sparse, Mao Wenan, davem,
	netdev, linux-kernel, kernel-janitors, Ingo Molnar
In-Reply-To: <20190807105042.GK1974@kadam>

On Wed, Aug 07, 2019 at 01:50:42PM +0300, Dan Carpenter wrote:
> On Tue, Aug 06, 2019 at 06:41:44PM +0200, Oliver Hartkopp wrote:
> > I compiled the code (the original version), but I do not get that "Should it
> > be static?" warning:
> > 
> > user@box:~/net-next$ make C=1
> >   CALL    scripts/checksyscalls.sh
> >   CALL    scripts/atomic/check-atomics.sh
> >   DESCEND  objtool
> >   CHK     include/generated/compile.h
> >   CHECK   net/can/af_can.c
> > ./include/linux/sched.h:609:43: error: bad integer constant expression
> > ./include/linux/sched.h:609:73: error: invalid named zero-width bitfield
> > `value'
> > ./include/linux/sched.h:610:43: error: bad integer constant expression
> > ./include/linux/sched.h:610:67: error: invalid named zero-width bitfield
> > `bucket_id'
> >   CC [M]  net/can/af_can.o
> 
> The sched.h errors suppress Sparse warnings so it's broken/useless now.
> The code looks like this:
> 
> include/linux/sched.h
>    613  struct uclamp_se {
>    614          unsigned int value              : bits_per(SCHED_CAPACITY_SCALE);
>    615          unsigned int bucket_id          : bits_per(UCLAMP_BUCKETS);
>    616          unsigned int active             : 1;
>    617          unsigned int user_defined       : 1;
>    618  };
> 
> bits_per() is zero and Sparse doesn't like zero sized bitfields.

I just noticed these sparse warnings too -- what's happening here? Are
they _supposed_ to be 0-width fields? It doesn't look like it to me:

CONFIG_UCLAMP_BUCKETS_COUNT=5
...
#define UCLAMP_BUCKETS CONFIG_UCLAMP_BUCKETS_COUNT
...
        unsigned int bucket_id          : bits_per(UCLAMP_BUCKETS);

I would expect this to be 3 bits wide. ... Looks like gcc agrees:

struct uclamp_se {
    unsigned int               value:11;             /*     0: 0  4 */
    unsigned int               bucket_id:3;          /*     0:11  4 */
...

So this is a sparse issue?

-- 
Kees Cook

^ permalink raw reply

* [PATCH net-next] net: devlink: remove redundant rtnl lock assert
From: Vlad Buslov @ 2019-08-12 17:02 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov, Jiri Pirko

It is enough for caller of devlink_compat_switch_id_get() to hold the net
device to guarantee that devlink port is not destroyed concurrently. Remove
rtnl lock assertion and modify comment to warn user that they must hold
either rtnl lock or reference to net device. This is necessary to
accommodate future implementation of rtnl-unlocked TC offloads driver
callbacks.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/core/devlink.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index e8f0b891f000..e98e7bd9740b 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -6938,11 +6938,10 @@ int devlink_compat_switch_id_get(struct net_device *dev,
 {
 	struct devlink_port *devlink_port;
 
-	/* RTNL mutex is held here which ensures that devlink_port
-	 * instance cannot disappear in the middle. No need to take
+	/* Caller must hold RTNL mutex or reference to dev, which ensures that
+	 * devlink_port instance cannot disappear in the middle. No need to take
 	 * any devlink lock as only permanent values are accessed.
 	 */
-	ASSERT_RTNL();
 	devlink_port = netdev_to_devlink_port(dev);
 	if (!devlink_port || !devlink_port->attrs.switch_port)
 		return -EOPNOTSUPP;
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH v5 19/29] compat_ioctl: move hci_sock handlers into driver
From: Marcel Holtmann @ 2019-08-12 16:29 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Johan Hedberg,
	David S. Miller, Deepa Dinamani, linux-bluetooth, netdev
In-Reply-To: <20190730195819.901457-7-arnd@arndb.de>

Hi Arnd,

> All these ioctl commands are compatible, so we can handle
> them with a trivial wrapper in hci_sock.c and remove
> the listing in fs/compat_ioctl.c.
> 
> A few of the commands pass integer arguments instead of
> pointers, so for correctness skip the compat_ptr() conversion
> here.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> fs/compat_ioctl.c        | 24 ------------------------
> net/bluetooth/hci_sock.c | 21 ++++++++++++++++++++-
> 2 files changed, 20 insertions(+), 25 deletions(-)

I think it is best if this series is applied as a whole. So whoever takes it

Acked-by: Marcel Holtmann <marcel@holtmann.org>

Regards

Marcel


^ permalink raw reply

* Re: [PATCH v5 18/29] compat_ioctl: move rfcomm handlers into driver
From: Marcel Holtmann @ 2019-08-12 16:29 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Johan Hedberg,
	David S. Miller, Mauro Carvalho Chehab, linux-bluetooth, netdev
In-Reply-To: <20190730195819.901457-6-arnd@arndb.de>

Hi Arnd,

> All these ioctl commands are compatible, so we can handle
> them with a trivial wrapper in rfcomm/sock.c and remove
> the listing in fs/compat_ioctl.c.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> fs/compat_ioctl.c           |  6 ------
> net/bluetooth/rfcomm/sock.c | 14 ++++++++++++--
> 2 files changed, 12 insertions(+), 8 deletions(-)

I think it is best if this series is applied as a whole. So whoever takes it

Acked-by: Marcel Holtmann <marcel@holtmann.org>

Regards

Marcel


^ permalink raw reply

* Re: [patch net-next rfc 3/7] net: rtnetlink: add commands to add and delete alternative ifnames
From: Roopa Prabhu @ 2019-08-12 16:23 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jiri Pirko, David Ahern, netdev, David Miller, Jakub Kicinski,
	Stephen Hemminger, dcbw, Michal Kubecek, Andrew Lunn, parav,
	Saeed Mahameed, mlxsw
In-Reply-To: <20190812084039.2fbd1f01@hermes.lan>

On Mon, Aug 12, 2019 at 8:40 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon, 12 Aug 2019 10:31:39 +0200
> Jiri Pirko <jiri@resnulli.us> wrote:
>
> > Mon, Aug 12, 2019 at 03:37:26AM CEST, dsahern@gmail.com wrote:
> > >On 8/11/19 7:34 PM, David Ahern wrote:
> > >> On 8/10/19 12:30 AM, Jiri Pirko wrote:
> > >>> Could you please write me an example message of add/remove?
> > >>
> > >> altnames are for existing netdevs, yes? existing netdevs have an id and
> > >> a name - 2 existing references for identifying the existing netdev for
> > >> which an altname will be added. Even using the altname as the main
> > >> 'handle' for a setlink change, I see no reason why the GETLINK api can
> > >> not take an the IFLA_ALT_IFNAME and return the full details of the
> > >> device if the altname is unique.
> > >>
> > >> So, what do the new RTM commands give you that you can not do with
> > >> RTM_*LINK?
> > >>
> > >
> > >
> > >To put this another way, the ALT_NAME is an attribute of an object - a
> > >LINK. It is *not* a separate object which requires its own set of
> > >commands for manipulating.
> >
> > Okay, again, could you provide example of a message to add/remove
> > altname using existing setlink message? Thanks!
>
> The existing IFALIAS takes an empty name to do deletion.

yes, if its a single alt name, keeping it similar to IFALIAS will help

^ permalink raw reply

* Re: [PATCH net-next] net: openvswitch: Set OvS recirc_id from tc chain index
From: Pravin Shelar @ 2019-08-12 16:18 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Linux Kernel Network Developers, David S. Miller, Justin Pettit,
	Simon Horman, Marcelo Ricardo Leitner, Vlad Buslov, Jiri Pirko,
	Roi Dayan, Yossi Kuperman, Rony Efraim, Oz Shlomo
In-Reply-To: <b5342e56-4baa-97ab-8694-2f48d012afca@mellanox.com>

On Sun, Aug 11, 2019 at 3:46 AM Paul Blakey <paulb@mellanox.com> wrote:
>
>
> On 8/8/2019 11:53 PM, Pravin Shelar wrote:
> > On Wed, Aug 7, 2019 at 5:08 AM Paul Blakey <paulb@mellanox.com> wrote:
> >> Offloaded OvS datapath rules are translated one to one to tc rules,
> >> for example the following simplified OvS rule:
> >>
> >> recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)
> >>
> >> Will be translated to the following tc rule:
> >>
> >> $ tc filter add dev dev1 ingress \
> >>              prio 1 chain 0 proto ip \
> >>                  flower tcp ct_state -trk \
> >>                  action ct pipe \
> >>                  action goto chain 2
> >>
> >> Received packets will first travel though tc, and if they aren't stolen
> >> by it, like in the above rule, they will continue to OvS datapath.
> >> Since we already did some actions (action ct in this case) which might
> >> modify the packets, and updated action stats, we would like to continue
> >> the proccessing with the correct recirc_id in OvS (here recirc_id(2))
> >> where we left off.
> >>
> >> To support this, introduce a new skb extension for tc, which
> >> will be used for translating tc chain to ovs recirc_id to
> >> handle these miss cases. Last tc chain index will be set
> >> by tc goto chain action and read by OvS datapath.
> >>
> >> Signed-off-by: Paul Blakey <paulb@mellanox.com>
> >> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> >> Acked-by: Jiri Pirko <jiri@mellanox.com>
> >> ---
> >>   include/linux/skbuff.h    | 13 +++++++++++++
> >>   include/net/sch_generic.h |  5 ++++-
> >>   net/core/skbuff.c         |  6 ++++++
> >>   net/openvswitch/flow.c    |  9 +++++++++
> >>   net/sched/Kconfig         | 13 +++++++++++++
> >>   net/sched/act_api.c       |  1 +
> >>   net/sched/cls_api.c       | 12 ++++++++++++
> >>   7 files changed, 58 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >> index 3aef8d8..fb2a792 100644
> >> --- a/include/linux/skbuff.h
> >> +++ b/include/linux/skbuff.h
> >> @@ -279,6 +279,16 @@ struct nf_bridge_info {
> >>   };
> >>   #endif
> >>
> >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> >> +/* Chain in tc_skb_ext will be used to share the tc chain with
> >> + * ovs recirc_id. It will be set to the current chain by tc
> >> + * and read by ovs to recirc_id.
> >> + */
> >> +struct tc_skb_ext {
> >> +       __u32 chain;
> >> +};
> >> +#endif
> >> +
> >>   struct sk_buff_head {
> >>          /* These two members must be first. */
> >>          struct sk_buff  *next;
> >> @@ -4050,6 +4060,9 @@ enum skb_ext_id {
> >>   #ifdef CONFIG_XFRM
> >>          SKB_EXT_SEC_PATH,
> >>   #endif
> >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> >> +       TC_SKB_EXT,
> >> +#endif
> >>          SKB_EXT_NUM, /* must be last */
> >>   };
> >>
> >> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> >> index 6b6b012..871feea 100644
> >> --- a/include/net/sch_generic.h
> >> +++ b/include/net/sch_generic.h
> >> @@ -275,7 +275,10 @@ struct tcf_result {
> >>                          unsigned long   class;
> >>                          u32             classid;
> >>                  };
> >> -               const struct tcf_proto *goto_tp;
> >> +               struct {
> >> +                       const struct tcf_proto *goto_tp;
> >> +                       u32 goto_index;
> >> +               };
> >>
> >>                  /* used in the skb_tc_reinsert function */
> >>                  struct {
> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >> index ea8e8d3..2b40b5a 100644
> >> --- a/net/core/skbuff.c
> >> +++ b/net/core/skbuff.c
> >> @@ -4087,6 +4087,9 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
> >>   #ifdef CONFIG_XFRM
> >>          [SKB_EXT_SEC_PATH] = SKB_EXT_CHUNKSIZEOF(struct sec_path),
> >>   #endif
> >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> >> +       [TC_SKB_EXT] = SKB_EXT_CHUNKSIZEOF(struct tc_skb_ext),
> >> +#endif
> >>   };
> >>
> >>   static __always_inline unsigned int skb_ext_total_length(void)
> >> @@ -4098,6 +4101,9 @@ static __always_inline unsigned int skb_ext_total_length(void)
> >>   #ifdef CONFIG_XFRM
> >>                  skb_ext_type_len[SKB_EXT_SEC_PATH] +
> >>   #endif
> >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> >> +               skb_ext_type_len[TC_SKB_EXT] +
> >> +#endif
> >>                  0;
> >>   }
> >>
> >> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> >> index bc89e16..0287ead 100644
> >> --- a/net/openvswitch/flow.c
> >> +++ b/net/openvswitch/flow.c
> >> @@ -816,6 +816,9 @@ static int key_extract_mac_proto(struct sk_buff *skb)
> >>   int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
> >>                           struct sk_buff *skb, struct sw_flow_key *key)
> >>   {
> >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> >> +       struct tc_skb_ext *tc_ext;
> >> +#endif
> >>          int res, err;
> >>
> >>          /* Extract metadata from packet. */
> >> @@ -848,7 +851,13 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
> >>          if (res < 0)
> >>                  return res;
> >>          key->mac_proto = res;
> >> +
> >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> >> +       tc_ext = skb_ext_find(skb, TC_SKB_EXT);
> >> +       key->recirc_id = tc_ext ? tc_ext->chain : 0;
> >> +#else
> >>          key->recirc_id = 0;
> >> +#endif
> >>
> > Most of cases the config would be turned on, so the ifdef is not that
> > useful. Can you add static key to avoid searching the skb-ext in non
> > offload cases.
>
> Hi,
>
> What do you mean by a static key?
>
https://www.kernel.org/doc/Documentation/static-keys.txt

Static key can be enabled when a flow is added to the tc filter.

^ permalink raw reply

* Re: [PATCH v2 2/2] Documentation: sphinx: Don't parse socket() as identifier reference
From: Mauro Carvalho Chehab @ 2019-08-12 16:11 UTC (permalink / raw)
  To: Jonathan Neuschäfer
  Cc: linux-doc, Jonathan Corbet, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, linux-kernel, netdev, xdp-newbies, bpf
In-Reply-To: <20190812160708.32172-2-j.neuschaefer@gmx.net>

Em Mon, 12 Aug 2019 18:07:05 +0200
Jonathan Neuschäfer <j.neuschaefer@gmx.net> escreveu:

> With the introduction of Documentation/sphinx/automarkup.py, socket() is
> parsed as a reference to the in-kernel definition of socket. Sphinx then
> decides that struct socket is a good match, which is usually not
> intended, when the syscall is meant instead. This was observed in
> Documentation/networking/af_xdp.rst.
> 
> Prevent socket() from being misinterpreted by adding it to the Skipfuncs
> list in automarkup.py.
> 
> Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
> ---
> 
> v2:
> - block socket() in Documentation/sphinx/automarkup.py, as suggested by
>   Jonathan Corbet
> 
> v1:
> - https://lore.kernel.org/lkml/20190810121738.19587-1-j.neuschaefer@gmx.net/
> ---
>  Documentation/sphinx/automarkup.py | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
> index a8798369e8f7..5b6119ff69f4 100644
> --- a/Documentation/sphinx/automarkup.py
> +++ b/Documentation/sphinx/automarkup.py
> @@ -26,7 +26,8 @@ RE_function = re.compile(r'([\w_][\w\d_]+\(\))')
>  # just don't even try with these names.
>  #
>  Skipfuncs = [ 'open', 'close', 'read', 'write', 'fcntl', 'mmap',
> -              'select', 'poll', 'fork', 'execve', 'clone', 'ioctl']
> +              'select', 'poll', 'fork', 'execve', 'clone', 'ioctl',
> +              'socket' ]

Both patches sound OK on my eyes. Yet, I would just fold them into
a single one.

In any case:

Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
> 
>  #
>  # Find all occurrences of function() and try to replace them with
> --
> 2.20.1
> 



Thanks,
Mauro

^ permalink raw reply

* [PATCH v2 2/2] Documentation: sphinx: Don't parse socket() as identifier reference
From: Jonathan Neuschäfer @ 2019-08-12 16:07 UTC (permalink / raw)
  To: linux-doc
  Cc: Jonathan Neuschäfer, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Mauro Carvalho Chehab,
	linux-kernel, netdev, xdp-newbies, bpf
In-Reply-To: <20190812160708.32172-1-j.neuschaefer@gmx.net>

With the introduction of Documentation/sphinx/automarkup.py, socket() is
parsed as a reference to the in-kernel definition of socket. Sphinx then
decides that struct socket is a good match, which is usually not
intended, when the syscall is meant instead. This was observed in
Documentation/networking/af_xdp.rst.

Prevent socket() from being misinterpreted by adding it to the Skipfuncs
list in automarkup.py.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
---

v2:
- block socket() in Documentation/sphinx/automarkup.py, as suggested by
  Jonathan Corbet

v1:
- https://lore.kernel.org/lkml/20190810121738.19587-1-j.neuschaefer@gmx.net/
---
 Documentation/sphinx/automarkup.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
index a8798369e8f7..5b6119ff69f4 100644
--- a/Documentation/sphinx/automarkup.py
+++ b/Documentation/sphinx/automarkup.py
@@ -26,7 +26,8 @@ RE_function = re.compile(r'([\w_][\w\d_]+\(\))')
 # just don't even try with these names.
 #
 Skipfuncs = [ 'open', 'close', 'read', 'write', 'fcntl', 'mmap',
-              'select', 'poll', 'fork', 'execve', 'clone', 'ioctl']
+              'select', 'poll', 'fork', 'execve', 'clone', 'ioctl',
+              'socket' ]

 #
 # Find all occurrences of function() and try to replace them with
--
2.20.1


^ permalink raw reply related

* Re: [PATCH bpf] s390/bpf: fix lcgr instruction encoding
From: Daniel Borkmann @ 2019-08-12 16:06 UTC (permalink / raw)
  To: Ilya Leoshkevich, bpf, netdev; +Cc: gor, heiko.carstens
In-Reply-To: <20190812150332.98109-1-iii@linux.ibm.com>

On 8/12/19 5:03 PM, Ilya Leoshkevich wrote:
> "masking, test in bounds 3" fails on s390, because
> BPF_ALU64_IMM(BPF_NEG, BPF_REG_2, 0) ignores the top 32 bits of
> BPF_REG_2. The reason is that JIT emits lcgfr instead of lcgr.
> The associated comment indicates that the code was intended to emit lcgr
> in the first place, it's just that the wrong opcode was used.
> 
> Fix by using the correct opcode.
> 
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>

Applied, thanks!

^ permalink raw reply

* Re: [patch net-next rfc 3/7] net: rtnetlink: add commands to add and delete alternative ifnames
From: David Ahern @ 2019-08-12 16:01 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Roopa Prabhu, netdev, David Miller, Jakub Kicinski,
	Stephen Hemminger, dcbw, Michal Kubecek, Andrew Lunn, parav,
	Saeed Mahameed, mlxsw
In-Reply-To: <20190812083139.GA2428@nanopsycho>

On 8/12/19 2:31 AM, Jiri Pirko wrote:
> Mon, Aug 12, 2019 at 03:37:26AM CEST, dsahern@gmail.com wrote:
>> On 8/11/19 7:34 PM, David Ahern wrote:
>>> On 8/10/19 12:30 AM, Jiri Pirko wrote:
>>>> Could you please write me an example message of add/remove?
>>>
>>> altnames are for existing netdevs, yes? existing netdevs have an id and
>>> a name - 2 existing references for identifying the existing netdev for
>>> which an altname will be added. Even using the altname as the main
>>> 'handle' for a setlink change, I see no reason why the GETLINK api can
>>> not take an the IFLA_ALT_IFNAME and return the full details of the
>>> device if the altname is unique.
>>>
>>> So, what do the new RTM commands give you that you can not do with
>>> RTM_*LINK?
>>>
>>
>>
>> To put this another way, the ALT_NAME is an attribute of an object - a
>> LINK. It is *not* a separate object which requires its own set of
>> commands for manipulating.
> 
> Okay, again, could you provide example of a message to add/remove
> altname using existing setlink message? Thanks!
> 

Examples from your cover letter with updates

$ ip link set dummy0 altname someothername
$ ip link set dummy0 altname someotherveryveryveryverylongname

$ ip link set dummy0 del altname someothername
$ ip link set dummy0 del altname someotherveryveryveryverylongname

This syntactic sugar to what is really happening:

RTM_NEWLINK, dummy0, IFLA_ALT_IFNAME

if you are allowing many alt names, then yes, you need a flag to say
delete this specific one which is covered by Roopa's nested suggestion.

^ permalink raw reply

* Re: [PATCHv2 net 0/2] Add netdev_level_ratelimited to avoid netdev msg flush
From: Thomas Falcon @ 2019-08-12 15:56 UTC (permalink / raw)
  To: Hangbin Liu, David Miller; +Cc: netdev, joe
In-Reply-To: <20190812073733.GV18865@dhcp-12-139.nay.redhat.com>


On 8/12/19 2:37 AM, Hangbin Liu wrote:
> On Sun, Aug 11, 2019 at 09:08:20PM -0700, David Miller wrote:
>> From: Hangbin Liu <liuhangbin@gmail.com>
>> Date: Fri,  9 Aug 2019 08:29:39 +0800
>>
>>> ibmveth 30000003 env3: h_multicast_ctrl rc=4 when adding an entry to the filter table
>>> error when add more thann 256 memberships in one multicast group. I haven't
>>> found this issue on other driver. It looks like an ibm driver issue and need
>>> to be fixed separately.
>> You need to root cause and fix the reason this message appears so much.
>>
>> Once I let you rate limit the message you will have zero incentive to
>> fix the real problem and fix it.
> Sorry, I'm not familiar with ibmveth driver...
>
> Hi Thomas,
>
> Would you please help check why this issue happens


Hi, thanks for reporting this. I was able to recreate this on my own 
system. The virtual ethernet's multicast filter list size is limited, 
and the driver will check that there is available space before adding 
entries.  The problem is that the size is encoded as big endian, but the 
driver does not convert it for little endian systems after retrieving it 
from the device tree.  As a result the driver is requesting more than 
the hypervisor can allow and getting this error in reply. I will submit 
a patch to correct this soon.

Thanks again,

Tom

> Thanks
> Hangbbin
>

^ permalink raw reply

* Re: [PATCH v3 16/17] ixgbe: no need to check return value of debugfs_create functions
From: Jeff Kirsher @ 2019-08-12 15:49 UTC (permalink / raw)
  To: Greg Kroah-Hartman, netdev; +Cc: David S. Miller, intel-wired-lan
In-Reply-To: <20190810101732.26612-17-gregkh@linuxfoundation.org>

[-- Attachment #1: Type: text/plain, Size: 692 bytes --]

On Sat, 2019-08-10 at 12:17 +0200, Greg Kroah-Hartman wrote:
> When calling debugfs functions, there is no need to ever check the
> return value.  The function can work or not, but the code logic
> should
> never do something different based on this.
> 
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

> ---
>  .../net/ethernet/intel/ixgbe/ixgbe_debugfs.c  | 22 +++++----------
> ----
>  1 file changed, 5 insertions(+), 17 deletions(-)


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v3 15/17] i40e: no need to check return value of debugfs_create functions
From: Jeff Kirsher @ 2019-08-12 15:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, netdev; +Cc: David S. Miller, intel-wired-lan
In-Reply-To: <20190810101732.26612-16-gregkh@linuxfoundation.org>

[-- Attachment #1: Type: text/plain, Size: 692 bytes --]

On Sat, 2019-08-10 at 12:17 +0200, Greg Kroah-Hartman wrote:
> When calling debugfs functions, there is no need to ever check the
> return value.  The function can work or not, but the code logic
> should
> never do something different based on this.
> 
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

> ---
>  .../net/ethernet/intel/i40e/i40e_debugfs.c    | 22 ++++-------------
> --
>  1 file changed, 4 insertions(+), 18 deletions(-)


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v3 14/17] fm10k: no need to check return value of debugfs_create functions
From: Jeff Kirsher @ 2019-08-12 15:46 UTC (permalink / raw)
  To: Greg Kroah-Hartman, netdev; +Cc: David S. Miller, intel-wired-lan
In-Reply-To: <20190810101732.26612-15-gregkh@linuxfoundation.org>

[-- Attachment #1: Type: text/plain, Size: 655 bytes --]

On Sat, 2019-08-10 at 12:17 +0200, Greg Kroah-Hartman wrote:
> When calling debugfs functions, there is no need to ever check the
> return value.  The function can work or not, but the code logic
> should
> never do something different based on this.
> 
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

> ---
>  drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c | 2 --
>  1 file changed, 2 deletions(-)


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH bpf] s390/bpf: fix lcgr instruction encoding
From: Vasily Gorbik @ 2019-08-12 15:46 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: bpf, netdev, heiko.carstens
In-Reply-To: <20190812150332.98109-1-iii@linux.ibm.com>

On Mon, Aug 12, 2019 at 05:03:32PM +0200, Ilya Leoshkevich wrote:
> "masking, test in bounds 3" fails on s390, because
> BPF_ALU64_IMM(BPF_NEG, BPF_REG_2, 0) ignores the top 32 bits of
> BPF_REG_2. The reason is that JIT emits lcgfr instead of lcgr.
> The associated comment indicates that the code was intended to emit lcgr
> in the first place, it's just that the wrong opcode was used.
> 
> Fix by using the correct opcode.
> 
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---
>  arch/s390/net/bpf_jit_comp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
> index e636728ab452..6299156f9738 100644
> --- a/arch/s390/net/bpf_jit_comp.c
> +++ b/arch/s390/net/bpf_jit_comp.c
> @@ -863,7 +863,7 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i
>  		break;
>  	case BPF_ALU64 | BPF_NEG: /* dst = -dst */
>  		/* lcgr %dst,%dst */
> -		EMIT4(0xb9130000, dst_reg, dst_reg);
> +		EMIT4(0xb9030000, dst_reg, dst_reg);
>  		break;
>  	/*
>  	 * BPF_FROM_BE/LE
> -- 
> 2.21.0
> 
Please add
Fixes: 054623105728 ("s390/bpf: Add s390x eBPF JIT compiler backend")
or whatever it should be. With that:
Acked-by: Vasily Gorbik <gor@linux.ibm.com>


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox