Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v6 net-next 15/19] ionic: Add Tx and Rx handling
From: Jakub Kicinski @ 2019-08-29 23:18 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev, davem
In-Reply-To: <20190829182720.68419-16-snelson@pensando.io>

On Thu, 29 Aug 2019 11:27:16 -0700, Shannon Nelson wrote:
> +netdev_tx_t ionic_start_xmit(struct sk_buff *skb, struct net_device *netdev)
> +{
> +	u16 queue_index = skb_get_queue_mapping(skb);
> +	struct ionic_lif *lif = netdev_priv(netdev);
> +	struct ionic_queue *q;
> +	int ndescs;
> +	int err;
> +
> +	if (unlikely(!test_bit(IONIC_LIF_UP, lif->state))) {
> +		dev_kfree_skb(skb);
> +		return NETDEV_TX_OK;
> +	}
> +
> +	if (likely(lif_to_txqcq(lif, queue_index)))
> +		q = lif_to_txq(lif, queue_index);
> +	else
> +		q = lif_to_txq(lif, 0);
> +
> +	ndescs = ionic_tx_descs_needed(q, skb);
> +	if (ndescs < 0)
> +		goto err_out_drop;
> +
> +	if (!ionic_q_has_space(q, ndescs)) {

You should stop the queue in advance, whenever you can't ensure that a
max size frame can be placed on the ring. Requeuing is very expensive
so modern drivers should try to never return NETDEV_TX_BUSY

> +		netif_stop_subqueue(netdev, queue_index);
> +		q->stop++;
> +
> +		/* Might race with ionic_tx_clean, check again */
> +		smp_rmb();
> +		if (ionic_q_has_space(q, ndescs)) {
> +			netif_wake_subqueue(netdev, queue_index);
> +			q->wake++;
> +		} else {
> +			return NETDEV_TX_BUSY;
> +		}
> +	}
> +
> +	if (skb_is_gso(skb))
> +		err = ionic_tx_tso(q, skb);
> +	else
> +		err = ionic_tx(q, skb);
> +
> +	if (err)
> +		goto err_out_drop;
> +
> +	return NETDEV_TX_OK;
> +
> +err_out_drop:
> +	netif_stop_subqueue(netdev, queue_index);

This stopping of the queue is suspicious, if ionic_tx() fails there's
no guarantee the queue will ever be woken up, no?

> +	q->stop++;
> +	q->drop++;
> +	dev_kfree_skb(skb);
> +	return NETDEV_TX_OK;
> +}

^ permalink raw reply

* RE: linux-next: Tree for Aug 29 (mlx5)
From: Haiyang Zhang @ 2019-08-29 23:15 UTC (permalink / raw)
  To: Saeed Mahameed, sfr@canb.auug.org.au, Eran Ben Elisha,
	linux-next@vger.kernel.org, rdunlap@infradead.org
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Leon Romanovsky
In-Reply-To: <82c4fad3fc394693a596597df0d73cc5235f7025.camel@mellanox.com>



> -----Original Message-----
> From: Saeed Mahameed <saeedm@mellanox.com>
> Sent: Thursday, August 29, 2019 4:04 PM
> To: sfr@canb.auug.org.au; Eran Ben Elisha <eranbe@mellanox.com>; linux-
> next@vger.kernel.org; rdunlap@infradead.org; Haiyang Zhang
> <haiyangz@microsoft.com>
> Cc: linux-kernel@vger.kernel.org; netdev@vger.kernel.org; Leon
> Romanovsky <leonro@mellanox.com>
> Subject: Re: linux-next: Tree for Aug 29 (mlx5)
> 
> On Thu, 2019-08-29 at 21:48 +0000, Haiyang Zhang wrote:
> > > -----Original Message-----
> > > From: Saeed Mahameed <saeedm@mellanox.com>
> > > Sent: Thursday, August 29, 2019 2:32 PM
> > > To: sfr@canb.auug.org.au; Eran Ben Elisha <eranbe@mellanox.com>;
> > > linux-
> > > next@vger.kernel.org; rdunlap@infradead.org; Haiyang Zhang
> > > <haiyangz@microsoft.com>
> > > Cc: linux-kernel@vger.kernel.org; netdev@vger.kernel.org; Leon
> > > Romanovsky <leonro@mellanox.com>
> > > Subject: Re: linux-next: Tree for Aug 29 (mlx5)
> > >
> > > On Thu, 2019-08-29 at 12:55 -0700, Randy Dunlap wrote:
> > > > On 8/29/19 12:54 PM, Randy Dunlap wrote:
> > > > > On 8/29/19 4:08 AM, Stephen Rothwell wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > Changes since 20190828:
> > > > > >
> > > > >
> > > > > on x86_64:
> > > > > when CONFIG_PCI_HYPERV=m
> > > >
> > > > and CONFIG_PCI_HYPERV_INTERFACE=m
> > > >
> > >
> > > Haiyang and Eran, I think CONFIG_PCI_HYPERV_INTERFACE was never
> > > supposed to be a module ? it supposed to provide an always available
> > > interface to drivers ..
> > >
> > > Anyway, maybe we need to imply CONFIG_PCI_HYPERV_INTERFACE in
> mlx5.
> >
> > The symbolic dependency by driver mlx5e,  automatically triggers
> > loading of pci_hyperv_interface module. And this module can be loaded
> > in any platforms.
> >
> 
> This only works when both are modules.
> 
> 
> > Currently, mlx5e driver has #if
> > IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
> > around the code using the interface.
> >
> > I agree --
> > Adding "select PCI_HYPERV_INTERFACE" for mlx5e will clean up these
> > #if's.
> >
> 
> No, not "select", "imply".
> 
> if one wants PCI_HYPERV_INTERFACE off, imply will keep it off for you.

This looks good.

- Haiyang

^ permalink raw reply

* Re: [PATCH bpf-next 00/13] bpf: adding map batch processing support
From: Brian Vazquez @ 2019-08-29 23:13 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Yonghong Song, Alexei Starovoitov, bpf, netdev, Daniel Borkmann,
	kernel-team
In-Reply-To: <20190829113932.5c058194@cakuba.netronome.com>

On Thu, Aug 29, 2019 at 11:40 AM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Wed, 28 Aug 2019 23:45:02 -0700, Yonghong Song wrote:
> > Brian Vazquez has proposed BPF_MAP_DUMP command to look up more than one
> > map entries per syscall.
> >   https://lore.kernel.org/bpf/CABCgpaU3xxX6CMMxD+1knApivtc2jLBHysDXw-0E9bQEL0qC3A@mail.gmail.com/T/#t
> >
> > During discussion, we found more use cases can be supported in a similar
> > map operation batching framework. For example, batched map lookup and delete,
> > which can be really helpful for bcc.
> >   https://github.com/iovisor/bcc/blob/master/tools/tcptop.py#L233-L243
> >   https://github.com/iovisor/bcc/blob/master/tools/slabratetop.py#L129-L138
> >
> > Also, in bcc, we have API to delete all entries in a map.
> >   https://github.com/iovisor/bcc/blob/master/src/cc/api/BPFTable.h#L257-L264
> >
> > For map update, batched operations also useful as sometimes applications need
> > to populate initial maps with more than one entry. For example, the below
> > example is from kernel/samples/bpf/xdp_redirect_cpu_user.c:
> >   https://github.com/torvalds/linux/blob/master/samples/bpf/xdp_redirect_cpu_user.c#L543-L550
> >
> > This patch addresses all the above use cases. To make uapi stable, it also
> > covers other potential use cases. Four bpf syscall subcommands are introduced:
> >     BPF_MAP_LOOKUP_BATCH
> >     BPF_MAP_LOOKUP_AND_DELETE_BATCH
> >     BPF_MAP_UPDATE_BATCH
> >     BPF_MAP_DELETE_BATCH
> >
> > In userspace, application can iterate through the whole map one batch
> > as a time, e.g., bpf_map_lookup_batch() in the below:
> >     p_key = NULL;
> >     p_next_key = &key;
> >     while (true) {
> >        err = bpf_map_lookup_batch(fd, p_key, &p_next_key, keys, values,
> >                                   &batch_size, elem_flags, flags);
> >        if (err) ...
> >        if (p_next_key) break; // done
> >        if (!p_key) p_key = p_next_key;
> >     }
> > Please look at individual patches for details of new syscall subcommands
> > and examples of user codes.
> >
> > The testing is also done in a qemu VM environment:
> >       measure_lookup: max_entries 1000000, batch 10, time 342ms
> >       measure_lookup: max_entries 1000000, batch 1000, time 295ms
> >       measure_lookup: max_entries 1000000, batch 1000000, time 270ms
> >       measure_lookup: max_entries 1000000, no batching, time 1346ms
> >       measure_lookup_delete: max_entries 1000000, batch 10, time 433ms
> >       measure_lookup_delete: max_entries 1000000, batch 1000, time 363ms
> >       measure_lookup_delete: max_entries 1000000, batch 1000000, time 357ms
> >       measure_lookup_delete: max_entries 1000000, not batch, time 1894ms
> >       measure_delete: max_entries 1000000, batch, time 220ms
> >       measure_delete: max_entries 1000000, not batch, time 1289ms
> > For a 1M entry hash table, batch size of 10 can reduce cpu time
> > by 70%. Please see patch "tools/bpf: measure map batching perf"
> > for details of test codes.
>
> Hi Yonghong!
>
> great to see this, we have been looking at implementing some way to
> speed up map walks as well.
>
> The direction we were looking in, after previous discussions [1],
> however, was to provide a BPF program which can run the logic entirely
> within the kernel.
>
> We have a rough PoC on the FW side (we can offload the program which
> walks the map, which is pretty neat), but the kernel verifier side
> hasn't really progressed. It will soon.
>
> The rough idea is that the user space provides two programs, "filter"
> and "dumper":
>
>         bpftool map exec id XYZ filter pinned /some/prog \
>                                 dumper pinned /some/other_prog
>
> Both programs get this context:
>
> struct map_op_ctx {
>         u64 key;
>         u64 value;
> }
>
> We need a per-map implementation of the exec side, but roughly maps
> would do:
>
>         LIST_HEAD(deleted);
>
>         for entry in map {
>                 struct map_op_ctx {
>                         .key    = entry->key,
>                         .value  = entry->value,
>                 };
>
>                 act = BPF_PROG_RUN(filter, &map_op_ctx);
>                 if (act & ~ACT_BITS)
>                         return -EINVAL;
>
>                 if (act & DELETE) {
>                         map_unlink(entry);
>                         list_add(entry, &deleted);
>                 }
>                 if (act & STOP)
>                         break;
>         }
>
>         synchronize_rcu();
>
>         for entry in deleted {
>                 struct map_op_ctx {
>                         .key    = entry->key,
>                         .value  = entry->value,
>                 };
>
>                 BPF_PROG_RUN(dumper, &map_op_ctx);
>                 map_free(entry);
>         }
>
Hi Jakub,

how would that approach support percpu maps?

I'm thinking of a scenario where you want to do some calculations on
percpu maps and you are interested on the info on all the cpus not
just the one that is running the bpf program. Currently on a pcpu map
the bpf_map_lookup_elem helper only returns the pointer to the data of
the executing cpu.

> The filter program can't perform any map operations other than lookup,
> otherwise we won't be able to guarantee that we'll walk the entire map
> (if the filter program deletes some entries in a unfortunate order).
>
> If user space just wants a pure dump it can simply load a program which
> dumps the entries into a perf ring.
>
> I'm bringing this up because that mechanism should cover what is
> achieved with this patch set and much more.
>
> In particular for networking workloads where old flows have to be
> pruned from the map periodically it's far more efficient to communicate
> to user space only the flows which timed out (the delete batching from
> this set won't help at all).
>
> With a 2M entry map and this patch set we still won't be able to prune
> once a second on one core.
>
> [1]
> https://lore.kernel.org/netdev/20190813130921.10704-4-quentin.monnet@netronome.com/

^ permalink raw reply

* Re: [PATCH v6 net-next 14/19] ionic: Add initial ethtool support
From: Jakub Kicinski @ 2019-08-29 23:10 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev, davem
In-Reply-To: <20190829182720.68419-15-snelson@pensando.io>

On Thu, 29 Aug 2019 11:27:15 -0700, Shannon Nelson wrote:
> +static int ionic_get_module_eeprom(struct net_device *netdev,
> +				   struct ethtool_eeprom *ee,
> +				   u8 *data)
> +{
> +	struct ionic_lif *lif = netdev_priv(netdev);
> +	struct ionic_dev *idev = &lif->ionic->idev;
> +	struct ionic_xcvr_status *xcvr;
> +	char tbuf[sizeof(xcvr->sprom)];
> +	int count = 10;
> +	u32 len;
> +
> +	/* The NIC keeps the module prom up-to-date in the DMA space
> +	 * so we can simply copy the module bytes into the data buffer.
> +	 */
> +	xcvr = &idev->port_info->status.xcvr;
> +	len = min_t(u32, sizeof(xcvr->sprom), ee->len);
> +
> +	do {
> +		memcpy(data, xcvr->sprom, len);
> +		memcpy(tbuf, xcvr->sprom, len);
> +
> +		/* Let's make sure we got a consistent copy */
> +		if (!memcmp(data, tbuf, len))
> +			break;
> +
> +	} while (--count);

Should this return an error if the image was never consistent?

> +
> +	return 0;
> +}

^ permalink raw reply

* Re: [PATCH v6 net-next 12/19] ionic: Add Rx filter and rx_mode ndo support
From: Jakub Kicinski @ 2019-08-29 23:06 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev, davem
In-Reply-To: <20190829182720.68419-13-snelson@pensando.io>

On Thu, 29 Aug 2019 11:27:13 -0700, Shannon Nelson wrote:
> @@ -588,8 +866,26 @@ static int ionic_set_features(struct net_device *netdev,
>  
>  static int ionic_set_mac_address(struct net_device *netdev, void *sa)
>  {
> -	netdev_info(netdev, "%s: stubbed\n", __func__);
> -	return 0;
> +	struct sockaddr *addr = sa;
> +	u8 *mac;
> +
> +	mac = (u8 *)addr->sa_data;
> +	if (ether_addr_equal(netdev->dev_addr, mac))
> +		return 0;
> +
> +	if (!is_valid_ether_addr(mac))
> +		return -EADDRNOTAVAIL;
> +
> +	if (!is_zero_ether_addr(netdev->dev_addr)) {
> +		netdev_info(netdev, "deleting mac addr %pM\n",
> +			    netdev->dev_addr);
> +		ionic_addr_del(netdev, netdev->dev_addr);
> +	}
> +
> +	memcpy(netdev->dev_addr, mac, netdev->addr_len);
> +	netdev_info(netdev, "updating mac addr %pM\n", mac);
> +
> +	return ionic_addr_add(netdev, mac);
>  }

Please use the eth_prepare_mac_addr_change() and
eth_commit_mac_addr_change() helpers.

^ permalink raw reply

* Re: linux-next: Tree for Aug 29 (mlx5)
From: Saeed Mahameed @ 2019-08-29 23:04 UTC (permalink / raw)
  To: sfr@canb.auug.org.au, Eran Ben Elisha, linux-next@vger.kernel.org,
	rdunlap@infradead.org, haiyangz
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Leon Romanovsky
In-Reply-To: <DM6PR21MB13379A89D3A57DCFD6E0D419CAA20@DM6PR21MB1337.namprd21.prod.outlook.com>

On Thu, 2019-08-29 at 21:48 +0000, Haiyang Zhang wrote:
> > -----Original Message-----
> > From: Saeed Mahameed <saeedm@mellanox.com>
> > Sent: Thursday, August 29, 2019 2:32 PM
> > To: sfr@canb.auug.org.au; Eran Ben Elisha <eranbe@mellanox.com>;
> > linux-
> > next@vger.kernel.org; rdunlap@infradead.org; Haiyang Zhang
> > <haiyangz@microsoft.com>
> > Cc: linux-kernel@vger.kernel.org; netdev@vger.kernel.org; Leon
> > Romanovsky <leonro@mellanox.com>
> > Subject: Re: linux-next: Tree for Aug 29 (mlx5)
> > 
> > On Thu, 2019-08-29 at 12:55 -0700, Randy Dunlap wrote:
> > > On 8/29/19 12:54 PM, Randy Dunlap wrote:
> > > > On 8/29/19 4:08 AM, Stephen Rothwell wrote:
> > > > > Hi all,
> > > > > 
> > > > > Changes since 20190828:
> > > > > 
> > > > 
> > > > on x86_64:
> > > > when CONFIG_PCI_HYPERV=m
> > > 
> > > and CONFIG_PCI_HYPERV_INTERFACE=m
> > > 
> > 
> > Haiyang and Eran, I think CONFIG_PCI_HYPERV_INTERFACE was never
> > supposed to be a module ? it supposed to provide an always
> > available
> > interface to drivers ..
> > 
> > Anyway, maybe we need to imply CONFIG_PCI_HYPERV_INTERFACE in mlx5.
> 
> The symbolic dependency by driver mlx5e,  automatically triggers
> loading of
> pci_hyperv_interface module. And this module can be loaded in any
> platforms.
> 

This only works when both are modules. 


> Currently, mlx5e driver has #if
> IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
> around the code using the interface.
> 
> I agree --
> Adding "select PCI_HYPERV_INTERFACE" for mlx5e will clean up these
> #if's.
> 

No, not "select", "imply".

if one wants PCI_HYPERV_INTERFACE off, imply will keep it off for you.

> Thanks,
> - Haiyang

^ permalink raw reply

* Re: [PATCH bpf-next 05/13] bpf: adding map batch processing support
From: Brian Vazquez @ 2019-08-29 23:01 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20190829064507.2750795-1-yhs@fb.com>

Hi Yonghong!

Thanks for sending this series of patches and starting the discussion.

On Wed, Aug 28, 2019 at 11:45 PM Yonghong Song <yhs@fb.com> wrote:
>
> Brian Vazquez has proposed BPF_MAP_DUMP command to look up more than one
> map entries per syscall.
>   https://lore.kernel.org/bpf/CABCgpaU3xxX6CMMxD+1knApivtc2jLBHysDXw-0E9bQEL0qC3A@mail.gmail.com/T/#t
>
> During discussion, we found more use cases can be supported in a similar
> map operation batching framework. For example, batched map lookup and delete,
> which can be really helpful for bcc.
>   https://github.com/iovisor/bcc/blob/master/tools/tcptop.py#L233-L243
>   https://github.com/iovisor/bcc/blob/master/tools/slabratetop.py#L129-L138
>
> Also, in bcc, we have API to delete all entries in a map.
>   https://github.com/iovisor/bcc/blob/master/src/cc/api/BPFTable.h#L257-L264
>
> For map update, batched operations also useful as sometimes applications need
> to populate initial maps with more than one entry. For example, the below
> example is from kernel/samples/bpf/xdp_redirect_cpu_user.c:
>   https://github.com/torvalds/linux/blob/master/samples/bpf/xdp_redirect_cpu_user.c#L543-L550
>
> This patch addresses all the above use cases. To make uapi stable, it also
> covers other potential use cases. Four bpf syscall subcommands are introduced:
>         BPF_MAP_LOOKUP_BATCH
>         BPF_MAP_LOOKUP_AND_DELETE_BATCH
>         BPF_MAP_UPDATE_BATCH
>         BPF_MAP_DELETE_BATCH
>
> The UAPI attribute structure looks like:
>         struct { /* struct used by BPF_MAP_*_BATCH commands */
>                 __aligned_u64   start_key;      /* input: storing start key,
>                                                  * if NULL, starting from the beginning.
>                                                  */
>                 __aligned_u64   next_start_key; /* output: storing next batch start_key,
>                                                  * if NULL, no next key.
>                                                  */
>                 __aligned_u64   keys;           /* input/output: key buffer */
>                 __aligned_u64   values;         /* input/output: value buffer */
>                 __u32           count;          /* input: # of keys/values in
>                                                  *   or fits in keys[]/values[].
>                                                  * output: how many successful
>                                                  *   lookup/lookup_and_delete
>                                                  *   /delete/update operations.
>                                                  */
>                 __u32           map_fd;
>                 __u64           elem_flags;     /* BPF_MAP_{UPDATE,LOOKUP}_ELEM flags */
>                 __u64           flags;          /* flags for batch operation */
>         } batch;
>
> The 'start_key' and 'next_start_key' are used to BPF_MAP_LOOKUP_BATCH,
> BPF_MAP_LOOKUP_AND_DELETE_BATCH and BPF_MAP_DELETE_BATCH
> to start the operation on 'start_key' and also set the
> next batch start key in 'next_start_key'.
>
> If 'count' is greater than 0 and the return code is 0,
>   . the 'count' may be updated to be smaller if there is less
>     elements than 'count' for the operation. In such cases,
>     'next_start_key' will be set to NULL.
>   . the 'count' remains the same. 'next_start_key' could be NULL
>     or could point to next start_key for batch processing.
>
> If 'count' is greater than 0 and the return code is an error
> other than -EFAULT, the kernel will try to overwrite 'count'
> to contain the number of successful element-level (lookup,
> lookup_and_delete, update and delete) operations. The following
> attributes can be checked:
>   . if 'count' value is modified, the new value will be
>     the number of successful element-level operations.
>   . if 'count' value is modified, the keys[]/values[] will
>     contain correct results for new 'count' number of
>     operations for lookup[_and_delete] and update.
>
> The implementation in this patch mimics what user space
> did, e.g., for lookup_and_delete,
>     while(bpf_get_next_key(map, keyp, next_keyp) == 0) {
>        bpf_map_delete_elem(map, keyp);
>        bpf_map_lookup_elem(map, next_keyp, &value, flags);
>        keyp, next_keyp = next_keyp, keyp;
>     }
> The similar loop is implemented in the kernel, and
> each operation, bpf_get_next_key(), bpf_map_delete_elem()
> and bpf_map_lookup_elem(), uses existing kernel functions
> each of which has its own rcu_read_lock region, bpf_prog_active
> protection, etc.
> Therefore, it is totally possible that after bpf_get_next_key(),
> the bpf_map_delete_elem() or bpf_map_lookup_elem() may fail
> as the key may be deleted concurrently by kernel or
> other user space processes/threads.
> By default, the current implementation permits the loop
> to continue, just like typical user space handling. But
> a flag, BPF_F_ENFORCE_ENOENT, can be specified, so kernel
> will return an error if bpf_map_delete_elem() or
> bpf_map_lookup_elem() failed.
>
> The high-level algorithm for BPF_MAP_LOOKUP_BATCH and
> BPF_MAP_LOOKUP_AND_DELETE_BATCH:
>         if (start_key == NULL and next_start_key == NULL) {
>                 lookup up to 'count' keys in keys[] and fill
>                 corresponding values[], and delete those
>                 keys if BPF_MAP_LOOKUP_AND_DELETE_BATCH.
>         } else if (start_key == NULL && next_start_key != NULL) {
>                 lookup up to 'count' keys from the beginning
>                 of the map and fill keys[]/values[], delete
>                 those keys if BPF_MAP_LOOKUP_AND_DELETE_BATCH.
>                 Set 'next_start_key' for next batch operation.
>         } else if (start_key != NULL && next_start_key != NULL) {
>                 lookup to 'count' keys from 'start_key', inclusive,
>                 and fill keys[]/values[], delete those keys if
>                 BPF_MAP_LOOKUP_AND_DELETE_BATCH.
>                 Set 'next_start_key' for next batch operation.
>         }
>
> The high-level algorithm for BPF_MAP_UPDATE_BATCH:
>         if (count != 0) {
>                 do 'count' number of updates on keys[]/values[].
>         }
>
> The high-level algorithm for BPF_MAP_DELETE_BATCH:
>         if (count == 0) {
>                 if (start_key == NULL) {
>                         delete all elements from map.
>                 } else {
>                         delete all elements from start_key to the end of map.
>                 }
>         } else {
>                 if (start_key == NULL and next_start_key == NULL) {
>                         delete 'count' number of keys in keys[].
>                 } else if (start_key == NULL and next_start_key != NULL) {
>                         delete 'count' number of keys from the
>                         beginning of the map and set 'next_start_key'
>                         properly.
>                 } else if (start_key != NULL and next_start_keeey != NULL) {
>                         delete 'count' number of keys from 'start_key',
>                         and set 'next_start_key' properly.
>                 }
>         }
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  include/uapi/linux/bpf.h |  27 +++
>  kernel/bpf/syscall.c     | 448 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 475 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 5d2fb183ee2d..576688f13e8c 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -107,6 +107,10 @@ enum bpf_cmd {
>         BPF_MAP_LOOKUP_AND_DELETE_ELEM,
>         BPF_MAP_FREEZE,
>         BPF_BTF_GET_NEXT_ID,
> +       BPF_MAP_LOOKUP_BATCH,
> +       BPF_MAP_LOOKUP_AND_DELETE_BATCH,
> +       BPF_MAP_UPDATE_BATCH,
> +       BPF_MAP_DELETE_BATCH,
>  };
>
>  enum bpf_map_type {
> @@ -347,6 +351,9 @@ enum bpf_attach_type {
>  /* flags for BPF_PROG_QUERY */
>  #define BPF_F_QUERY_EFFECTIVE  (1U << 0)
>
> +/* flags for BPF_MAP_*_BATCH */
> +#define BPF_F_ENFORCE_ENOENT   (1U << 0)
> +
>  enum bpf_stack_build_id_status {
>         /* user space need an empty entry to identify end of a trace */
>         BPF_STACK_BUILD_ID_EMPTY = 0,
> @@ -396,6 +403,26 @@ union bpf_attr {
>                 __u64           flags;
>         };
>
> +       struct { /* struct used by BPF_MAP_*_BATCH commands */
> +               __aligned_u64   start_key;      /* input: storing start key,
> +                                                * if NULL, starting from the beginning.
> +                                                */
> +               __aligned_u64   next_start_key; /* output: storing next batch start_key,
> +                                                * if NULL, no next key.
> +                                                */
> +               __aligned_u64   keys;           /* input/output: key buffer */
> +               __aligned_u64   values;         /* input/output: value buffer */
> +               __u32           count;          /* input: # of keys/values in
> +                                                *   or fits in keys[]/values[].
> +                                                * output: how many successful
> +                                                *   lookup/lookup_and_delete
> +                                                *   update/delete operations.
> +                                                */
> +               __u32           map_fd;
> +               __u64           elem_flags;     /* BPF_MAP_*_ELEM flags */
> +               __u64           flags;          /* flags for batch operation */
> +       } batch;
> +
>         struct { /* anonymous struct used by BPF_PROG_LOAD command */
>                 __u32           prog_type;      /* one of enum bpf_prog_type */
>                 __u32           insn_cnt;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 06308f0206e5..8746b55405f9 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -33,6 +33,17 @@
>
>  #define BPF_OBJ_FLAG_MASK   (BPF_F_RDONLY | BPF_F_WRONLY)
>
> +#define BPF_MAP_BATCH_SWAP_KEYS(key1, key2, buf1, buf2)        \
> +       do {                                            \
> +               if (key1 == (buf1)) {                   \
> +                       key1 = buf2;                    \
> +                       key2 = buf1;                    \
> +               } else {                                \
> +                       key1 = buf1;                    \
> +                       key2 = buf2;                    \
> +               }                                       \
> +       } while (0)                                     \
> +
>  DEFINE_PER_CPU(int, bpf_prog_active);
>  static DEFINE_IDR(prog_idr);
>  static DEFINE_SPINLOCK(prog_idr_lock);
> @@ -1183,6 +1194,431 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
>         return err;
>  }
>
> +static void __map_batch_get_attrs(const union bpf_attr *attr,
> +                                 void __user **skey, void __user **nskey,
> +                                 void __user **keys, void __user **values,
> +                                 u32 *max_count, u64 *elem_flags, u64 *flags)
> +{
> +       *skey = u64_to_user_ptr(attr->batch.start_key);
> +       *nskey = u64_to_user_ptr(attr->batch.next_start_key);
> +       *keys = u64_to_user_ptr(attr->batch.keys);
> +       *values = u64_to_user_ptr(attr->batch.values);
> +       *max_count = attr->batch.count;
> +       *elem_flags = attr->batch.elem_flags;
> +       *flags = attr->batch.flags;
> +}
> +
> +static int
> +__map_lookup_delete_batch_key_in_keys(struct bpf_map *map, void *key, void *value,
> +                                     u32 max_count, u32 key_size, u32 value_size,
> +                                     u64 elem_flags, void __user *keys,
> +                                     void __user *values,
> +                                     union bpf_attr __user *uattr,
> +                                     bool ignore_enoent)
> +{
> +       u32 count, missed = 0;
> +       int ret = 0, err;
> +
> +       for (count = 0; count < max_count; count++) {
> +               if (copy_from_user(key, keys + count * key_size, key_size)) {
> +                       ret = -EFAULT;
> +                       break;
> +               }
> +
> +               ret = bpf_map_copy_value(map, key, value, elem_flags);
> +               if (ret) {
> +                       if (ret != -ENOENT || !ignore_enoent)
> +                               break;
> +
> +                       missed++;
> +                       continue;
> +               }
> +
> +
> +               if (copy_to_user(values + count * value_size, value, value_size)) {
> +                       ret = -EFAULT;
> +                       break;
> +               }
> +
> +               ret = bpf_map_delete_elem(map, key);
> +               if (ret) {
> +                       if (ret != -ENOENT || !ignore_enoent)
> +                               break;
> +
> +                       missed++;
> +               }
> +       }
> +
> +       count -= missed;
> +       if ((!ret && missed) || (ret && ret != -EFAULT)) {
> +               err = put_user(count, &uattr->batch.count);
> +               ret = err ? : ret;
> +       }
> +
> +       return ret;
> +}
> +
> +static int map_lookup_and_delete_batch(struct bpf_map *map,
> +                                      const union bpf_attr *attr,
> +                                      union bpf_attr __user *uattr,
> +                                      bool do_delete)
> +{
> +       u32 max_count, count = 0, key_size, roundup_key_size, value_size;
> +       bool ignore_enoent, nextkey_is_null, copied;
> +       void *buf = NULL, *key, *value, *next_key;
> +       void __user *skey, *nskey, *keys, *values;
> +       u64 elem_flags, flags, zero = 0;
> +       int err, ret = 0;
> +
> +       if (map->map_type == BPF_MAP_TYPE_QUEUE ||
> +           map->map_type == BPF_MAP_TYPE_STACK)
> +               return -ENOTSUPP;
> +
> +       __map_batch_get_attrs(attr, &skey, &nskey, &keys, &values, &max_count,
> +                             &elem_flags, &flags);
> +
> +       if (elem_flags & ~BPF_F_LOCK || flags & ~BPF_F_ENFORCE_ENOENT)
> +               return -EINVAL;
> +
> +       if (!max_count)
> +               return 0;
> +
> +       /* The following max_count/skey/nskey combinations are supported:
> +        * max_count != 0 && !skey && !nskey: loop/delete max_count elements in keys[]/values[].
> +        * max_count != 0 && !skey && nskey: loop/delete max_count elements starting from map start.
> +        * max_count != 0 && skey && nskey: loop/delete max_count elements starting from skey.
> +        */
> +       if (skey && !nskey)
> +               return -EINVAL;
> +
> +       /* allocate space for two keys and one value. */
> +       key_size = map->key_size;
> +       roundup_key_size = round_up(map->key_size, 8);
> +       value_size = bpf_map_value_size(map);
> +       buf = kmalloc(roundup_key_size * 2 + value_size, GFP_USER | __GFP_NOWARN);
> +       if (!buf)
> +               return -ENOMEM;
> +
> +       key = buf;
> +       next_key = buf + roundup_key_size;
> +       value = buf + roundup_key_size * 2;
> +       ignore_enoent = !(flags & BPF_F_ENFORCE_ENOENT);
> +
> +       if (!skey && !nskey) {
> +               /* handle cases where keys in keys[] */
> +               ret = __map_lookup_delete_batch_key_in_keys(map, key, value, max_count,
> +                                                           key_size, value_size,
> +                                                           elem_flags, keys, values,
> +                                                           uattr, ignore_enoent);
> +               goto out;
> +       }
> +
> +       /* Get the first key. */
> +       if (!skey) {
> +               ret = bpf_map_get_next_key(map, NULL, key);
> +               if (ret) {
> +                       nextkey_is_null = true;
> +                       goto after_loop;
> +               }
> +       } else if (copy_from_user(key, skey, key_size)) {
> +               ret = -EFAULT;
> +               goto out;
> +       }
> +
> +       /* Copy the first key/value pair */
> +       ret = bpf_map_copy_value(map, key, value, elem_flags);
> +       if (ret) {
> +               if (skey)
> +                       goto out;
> +
> +               nextkey_is_null = true;
> +               goto after_loop;
> +       }
> +
> +       if (copy_to_user(keys, key, key_size) ||
> +           copy_to_user(values, value, value_size)) {
> +               ret = -EFAULT;
> +               goto out;
> +       }
> +
> +       /* We will always try to get next_key first
> +        * and then delete the current key.
> +        */
> +       ret = bpf_map_get_next_key(map, key, next_key);

One of the issues I see in this implementation is that is still
relying on the existing functions and has the same consistency
problems that my attempt had.

The problem happens when you are trying to do batch lookup on a
hashmap and when executing bpf_map_get_next_key(map, key, next_key)
the key is removed, then that call will return the first key and you'd
start iterating the map from the beginning again and retrieve
duplicate information.

Note that sometimes you can also start from the same bucket when the
key is updated while dumping it because it can be inserted on the head
of the bucket so you could potentially revisit elements that you had
already visited.

From previous discussion my understanding was that what we wanted was
to pursue 'atomic' compounded operations first and after that, try to
batch them. Although I don't think there's an easy way of batching and
being consistent at the same time.

^ permalink raw reply

* Re: [PATCH v6 net-next 07/19] ionic: Add basic adminq support
From: Jakub Kicinski @ 2019-08-29 22:52 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev, davem
In-Reply-To: <20190829182720.68419-8-snelson@pensando.io>

On Thu, 29 Aug 2019 11:27:08 -0700, Shannon Nelson wrote:
> +static void ionic_lif_qcq_deinit(struct ionic_lif *lif, struct ionic_qcq *qcq)
> +{
> +	struct ionic_dev *idev = &lif->ionic->idev;
> +	struct device *dev = lif->ionic->dev;
> +
> +	if (!qcq)
> +		return;
> +
> +	ionic_debugfs_del_qcq(qcq);
> +
> +	if (!(qcq->flags & IONIC_QCQ_F_INITED))
> +		return;
> +
> +	if (qcq->flags & IONIC_QCQ_F_INTR) {
> +		ionic_intr_mask(idev->intr_ctrl, qcq->intr.index,
> +				IONIC_INTR_MASK_SET);
> +		synchronize_irq(qcq->intr.vector);
> +		devm_free_irq(dev, qcq->intr.vector, &qcq->napi);

Doesn't free_irq() basically imply synchronize_irq()?

> +		netif_napi_del(&qcq->napi);
> +	}
> +
> +	qcq->flags &= ~IONIC_QCQ_F_INITED;

^ permalink raw reply

* Re: [PATCH v6 net-next 04/19] ionic: Add port management commands
From: Jakub Kicinski @ 2019-08-29 22:46 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev, davem
In-Reply-To: <20190829182720.68419-5-snelson@pensando.io>

On Thu, 29 Aug 2019 11:27:05 -0700, Shannon Nelson wrote:
> The port management commands apply to the physical port
> associated with the PCI device, which might be shared among
> several logical interfaces.
> 
> Signed-off-by: Shannon Nelson <snelson@pensando.io>
> ---
>  drivers/net/ethernet/pensando/ionic/ionic.h   |  4 +
>  .../ethernet/pensando/ionic/ionic_bus_pci.c   | 16 ++++
>  .../net/ethernet/pensando/ionic/ionic_dev.c   | 92 +++++++++++++++++++
>  .../net/ethernet/pensando/ionic/ionic_dev.h   | 13 +++
>  .../net/ethernet/pensando/ionic/ionic_main.c  | 86 +++++++++++++++++
>  5 files changed, 211 insertions(+)
> 
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic.h b/drivers/net/ethernet/pensando/ionic/ionic.h
> index 89ad9c590736..4960effd2bcc 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic.h
> +++ b/drivers/net/ethernet/pensando/ionic/ionic.h
> @@ -42,4 +42,8 @@ int ionic_identify(struct ionic *ionic);
>  int ionic_init(struct ionic *ionic);
>  int ionic_reset(struct ionic *ionic);
>  
> +int ionic_port_identify(struct ionic *ionic);
> +int ionic_port_init(struct ionic *ionic);
> +int ionic_port_reset(struct ionic *ionic);
> +
>  #endif /* _IONIC_H_ */
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c b/drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c
> index 286b4b450a73..804dd43e92a6 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c
> @@ -138,12 +138,27 @@ static int ionic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  		goto err_out_teardown;
>  	}
>  
> +	/* Configure the ports */
> +	err = ionic_port_identify(ionic);
> +	if (err) {
> +		dev_err(dev, "Cannot identify port: %d, aborting\n", err);
> +		goto err_out_reset;
> +	}
> +
> +	err = ionic_port_init(ionic);
> +	if (err) {
> +		dev_err(dev, "Cannot init port: %d, aborting\n", err);
> +		goto err_out_reset;
> +	}
> +
>  	err = ionic_devlink_register(ionic);
>  	if (err)
>  		dev_err(dev, "Cannot register devlink: %d\n", err);
>  
>  	return 0;
>  
> +err_out_reset:
> +	ionic_reset(ionic);
>  err_out_teardown:
>  	ionic_dev_teardown(ionic);
>  err_out_unmap_bars:
> @@ -170,6 +185,7 @@ static void ionic_remove(struct pci_dev *pdev)
>  		return;
>  
>  	ionic_devlink_unregister(ionic);
> +	ionic_port_reset(ionic);
>  	ionic_reset(ionic);
>  	ionic_dev_teardown(ionic);
>  	ionic_unmap_bars(ionic);
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_dev.c b/drivers/net/ethernet/pensando/ionic/ionic_dev.c
> index 0bf1bd6bd7b1..3137776e9191 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_dev.c
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_dev.c
> @@ -134,3 +134,95 @@ void ionic_dev_cmd_reset(struct ionic_dev *idev)
>  
>  	ionic_dev_cmd_go(idev, &cmd);
>  }
> +
> +/* Port commands */
> +void ionic_dev_cmd_port_identify(struct ionic_dev *idev)
> +{
> +	union ionic_dev_cmd cmd = {
> +		.port_init.opcode = IONIC_CMD_PORT_IDENTIFY,
> +		.port_init.index = 0,
> +	};
> +
> +	ionic_dev_cmd_go(idev, &cmd);
> +}
> +
> +void ionic_dev_cmd_port_init(struct ionic_dev *idev)
> +{
> +	union ionic_dev_cmd cmd = {
> +		.port_init.opcode = IONIC_CMD_PORT_INIT,
> +		.port_init.index = 0,
> +		.port_init.info_pa = cpu_to_le64(idev->port_info_pa),
> +	};
> +
> +	ionic_dev_cmd_go(idev, &cmd);
> +}
> +
> +void ionic_dev_cmd_port_reset(struct ionic_dev *idev)
> +{
> +	union ionic_dev_cmd cmd = {
> +		.port_reset.opcode = IONIC_CMD_PORT_RESET,
> +		.port_reset.index = 0,
> +	};
> +
> +	ionic_dev_cmd_go(idev, &cmd);
> +}
> +
> +void ionic_dev_cmd_port_state(struct ionic_dev *idev, u8 state)
> +{
> +	union ionic_dev_cmd cmd = {
> +		.port_setattr.opcode = IONIC_CMD_PORT_SETATTR,
> +		.port_setattr.index = 0,
> +		.port_setattr.attr = IONIC_PORT_ATTR_STATE,
> +		.port_setattr.state = state,
> +	};
> +
> +	ionic_dev_cmd_go(idev, &cmd);
> +}
> +
> +void ionic_dev_cmd_port_speed(struct ionic_dev *idev, u32 speed)
> +{
> +	union ionic_dev_cmd cmd = {
> +		.port_setattr.opcode = IONIC_CMD_PORT_SETATTR,
> +		.port_setattr.index = 0,
> +		.port_setattr.attr = IONIC_PORT_ATTR_SPEED,
> +		.port_setattr.speed = cpu_to_le32(speed),
> +	};
> +
> +	ionic_dev_cmd_go(idev, &cmd);
> +}
> +
> +void ionic_dev_cmd_port_autoneg(struct ionic_dev *idev, u8 an_enable)
> +{
> +	union ionic_dev_cmd cmd = {
> +		.port_setattr.opcode = IONIC_CMD_PORT_SETATTR,
> +		.port_setattr.index = 0,
> +		.port_setattr.attr = IONIC_PORT_ATTR_AUTONEG,
> +		.port_setattr.an_enable = an_enable,
> +	};
> +
> +	ionic_dev_cmd_go(idev, &cmd);
> +}
> +
> +void ionic_dev_cmd_port_fec(struct ionic_dev *idev, u8 fec_type)
> +{
> +	union ionic_dev_cmd cmd = {
> +		.port_setattr.opcode = IONIC_CMD_PORT_SETATTR,
> +		.port_setattr.index = 0,
> +		.port_setattr.attr = IONIC_PORT_ATTR_FEC,
> +		.port_setattr.fec_type = fec_type,
> +	};
> +
> +	ionic_dev_cmd_go(idev, &cmd);
> +}
> +
> +void ionic_dev_cmd_port_pause(struct ionic_dev *idev, u8 pause_type)
> +{
> +	union ionic_dev_cmd cmd = {
> +		.port_setattr.opcode = IONIC_CMD_PORT_SETATTR,
> +		.port_setattr.index = 0,
> +		.port_setattr.attr = IONIC_PORT_ATTR_PAUSE,
> +		.port_setattr.pause_type = pause_type,
> +	};
> +
> +	ionic_dev_cmd_go(idev, &cmd);
> +}

Hm. So you haven't moved those?

> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_dev.h b/drivers/net/ethernet/pensando/ionic/ionic_dev.h
> index 7050545a83aa..81b6910aabc1 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_dev.h
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_dev.h
> @@ -117,6 +117,10 @@ struct ionic_dev {
>  	struct ionic_intr __iomem *intr_ctrl;
>  	u64 __iomem *intr_status;
>  
> +	u32 port_info_sz;
> +	struct ionic_port_info *port_info;
> +	dma_addr_t port_info_pa;
> +
>  	struct ionic_devinfo dev_info;
>  };
>  
> @@ -135,4 +139,13 @@ void ionic_dev_cmd_identify(struct ionic_dev *idev, u8 ver);
>  void ionic_dev_cmd_init(struct ionic_dev *idev);
>  void ionic_dev_cmd_reset(struct ionic_dev *idev);
>  
> +void ionic_dev_cmd_port_identify(struct ionic_dev *idev);
> +void ionic_dev_cmd_port_init(struct ionic_dev *idev);
> +void ionic_dev_cmd_port_reset(struct ionic_dev *idev);
> +void ionic_dev_cmd_port_state(struct ionic_dev *idev, u8 state);
> +void ionic_dev_cmd_port_speed(struct ionic_dev *idev, u32 speed);
> +void ionic_dev_cmd_port_autoneg(struct ionic_dev *idev, u8 an_enable);
> +void ionic_dev_cmd_port_fec(struct ionic_dev *idev, u8 fec_type);
> +void ionic_dev_cmd_port_pause(struct ionic_dev *idev, u8 pause_type);
> +
>  #endif /* _IONIC_DEV_H_ */
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_main.c b/drivers/net/ethernet/pensando/ionic/ionic_main.c
> index 5c311b9241ee..96de2789587d 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_main.c
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_main.c
> @@ -317,6 +317,92 @@ int ionic_reset(struct ionic *ionic)
>  	return err;
>  }
>  
> +int ionic_port_identify(struct ionic *ionic)
> +{
> +	struct ionic_identity *ident = &ionic->ident;
> +	struct ionic_dev *idev = &ionic->idev;
> +	size_t sz;
> +	int err;
> +
> +	mutex_lock(&ionic->dev_cmd_lock);
> +
> +	ionic_dev_cmd_port_identify(idev);
> +	err = ionic_dev_cmd_wait(ionic, DEVCMD_TIMEOUT);
> +	if (!err) {
> +		sz = min(sizeof(ident->port), sizeof(idev->dev_cmd_regs->data));
> +		memcpy_fromio(&ident->port, &idev->dev_cmd_regs->data, sz);
> +	}
> +
> +	mutex_unlock(&ionic->dev_cmd_lock);
> +
> +	return err;
> +}
> +
> +int ionic_port_init(struct ionic *ionic)
> +{
> +	struct ionic_identity *ident = &ionic->ident;
> +	struct ionic_dev *idev = &ionic->idev;
> +	size_t sz;
> +	int err;
> +
> +	if (idev->port_info)
> +		return 0;
> +
> +	idev->port_info_sz = ALIGN(sizeof(*idev->port_info), PAGE_SIZE);
> +	idev->port_info = dma_alloc_coherent(ionic->dev, idev->port_info_sz,
> +					     &idev->port_info_pa,
> +					     GFP_KERNEL);
> +	if (!idev->port_info) {
> +		dev_err(ionic->dev, "Failed to allocate port info, aborting\n");
> +		return -ENOMEM;
> +	}
> +
> +	sz = min(sizeof(ident->port.config), sizeof(idev->dev_cmd_regs->data));
> +
> +	mutex_lock(&ionic->dev_cmd_lock);
> +
> +	memcpy_toio(&idev->dev_cmd_regs->data, &ident->port.config, sz);
> +	ionic_dev_cmd_port_init(idev);
> +	err = ionic_dev_cmd_wait(ionic, DEVCMD_TIMEOUT);
> +
> +	ionic_dev_cmd_port_state(&ionic->idev, IONIC_PORT_ADMIN_STATE_UP);
> +	(void)ionic_dev_cmd_wait(ionic, DEVCMD_TIMEOUT);
> +
> +	mutex_unlock(&ionic->dev_cmd_lock);
> +	if (err) {
> +		dev_err(ionic->dev, "Failed to init port\n");
> +		dma_free_coherent(ionic->dev, idev->port_info_sz,
> +				  idev->port_info, idev->port_info_pa);

idev->port_info = NULL;

> +	}
> +
> +	return err;
> +}
> +
> +int ionic_port_reset(struct ionic *ionic)
> +{
> +	struct ionic_dev *idev = &ionic->idev;
> +	int err;
> +
> +	if (!idev->port_info)
> +		return 0;
> +
> +	mutex_lock(&ionic->dev_cmd_lock);
> +	ionic_dev_cmd_port_reset(idev);
> +	err = ionic_dev_cmd_wait(ionic, DEVCMD_TIMEOUT);
> +	mutex_unlock(&ionic->dev_cmd_lock);
> +
> +	dma_free_coherent(ionic->dev, idev->port_info_sz,
> +			  idev->port_info, idev->port_info_pa);
> +
> +	idev->port_info = NULL;
> +	idev->port_info_pa = 0;
> +
> +	if (err)
> +		dev_err(ionic->dev, "Failed to reset port\n");
> +
> +	return err;
> +}
> +
>  static int __init ionic_init_module(void)
>  {
>  	pr_info("%s %s, ver %s\n",


^ permalink raw reply

* Re: [RESEND PATCH 1/5] dt-bindings: net: Add compatible for BCM4345C5 bluetooth device
From: Rob Herring @ 2019-08-29 22:45 UTC (permalink / raw)
  To: megous
  Cc: Maxime Ripard, Chen-Yu Tsai, Marcel Holtmann, Johan Hedberg,
	Mark Rutland, David S. Miller, netdev, devicetree, linux-kernel,
	linux-arm-kernel, linux-bluetooth, Ondrej Jirman
In-Reply-To: <20190823103139.17687-2-megous@megous.com>

On Fri, 23 Aug 2019 12:31:35 +0200, megous@megous.com wrote:
> From: Ondrej Jirman <megous@megous.com>
> 
> This is present in the AP6526 WiFi/Bluetooth 5.0 module.
> 
> Signed-off-by: Ondrej Jirman <megous@megous.com>
> ---
>  Documentation/devicetree/bindings/net/broadcom-bluetooth.txt | 1 +
>  1 file changed, 1 insertion(+)
> 

Acked-by: Rob Herring <robh@kernel.org>

^ permalink raw reply

* Re: [PATCH] ipv6: Not to probe neighbourless routes
From: kbuild test robot @ 2019-08-29 22:42 UTC (permalink / raw)
  To: Yi Wang
  Cc: kbuild-all, davem, kuznet, yoshfuji, netdev, linux-kernel,
	xue.zhihong, wang.yi59, wang.liang82, Cheng Lin
In-Reply-To: <1566896907-5121-1-git-send-email-wang.yi59@zte.com.cn>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=unknown-8bit, Size: 6132 bytes --]

Hi Yi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.3-rc6 next-20190829]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Yi-Wang/ipv6-Not-to-probe-neighbourless-routes/20190830-025439
config: x86_64-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-11) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   net/ipv6/route.c: In function 'rt6_probe':
>> net/ipv6/route.c:660:10: error: 'struct fib6_nh' has no member named 'last_probe'
      fib6_nh->last_probe = jiffies;
             ^~

vim +660 net/ipv6/route.c

c2f17e827b4199 Hannes Frederic Sowa         2013-10-21  619  
cc3a86c802f0ba David Ahern                  2019-04-09  620  static void rt6_probe(struct fib6_nh *fib6_nh)
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  621  {
f547fac624be53 Sabrina Dubroca              2018-10-12  622  	struct __rt6_probe_work *work = NULL;
5e670d844b2a4e David Ahern                  2018-04-17  623  	const struct in6_addr *nh_gw;
f2c31e32b378a6 Eric Dumazet                 2011-07-29  624  	struct neighbour *neigh;
5e670d844b2a4e David Ahern                  2018-04-17  625  	struct net_device *dev;
f547fac624be53 Sabrina Dubroca              2018-10-12  626  	struct inet6_dev *idev;
5e670d844b2a4e David Ahern                  2018-04-17  627  
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  628  	/*
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  629  	 * Okay, this does not seem to be appropriate
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  630  	 * for now, however, we need to check if it
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  631  	 * is really so; aka Router Reachability Probing.
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  632  	 *
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  633  	 * Router Reachability Probe MUST be rate-limited
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  634  	 * to no more than one per minute.
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  635  	 */
cc3a86c802f0ba David Ahern                  2019-04-09  636  	if (fib6_nh->fib_nh_gw_family)
7ff74a596b6aa4 YOSHIFUJI Hideaki / 吉藤英明 2013-01-17  637  		return;
5e670d844b2a4e David Ahern                  2018-04-17  638  
cc3a86c802f0ba David Ahern                  2019-04-09  639  	nh_gw = &fib6_nh->fib_nh_gw6;
cc3a86c802f0ba David Ahern                  2019-04-09  640  	dev = fib6_nh->fib_nh_dev;
2152caea719657 YOSHIFUJI Hideaki / 吉藤英明 2013-01-17  641  	rcu_read_lock_bh();
5e670d844b2a4e David Ahern                  2018-04-17  642  	neigh = __ipv6_neigh_lookup_noref(dev, nh_gw);
2152caea719657 YOSHIFUJI Hideaki / 吉藤英明 2013-01-17  643  	if (neigh) {
8d6c31bf574177 Martin KaFai Lau             2015-07-24  644  		if (neigh->nud_state & NUD_VALID)
8d6c31bf574177 Martin KaFai Lau             2015-07-24  645  			goto out;
8d6c31bf574177 Martin KaFai Lau             2015-07-24  646  
e3f0c73ec3f208 Cheng Lin                    2019-08-27  647  		idev = __in6_dev_get(dev);
2152caea719657 YOSHIFUJI Hideaki / 吉藤英明 2013-01-17  648  		write_lock(&neigh->lock);
990edb428c2c85 Martin KaFai Lau             2015-07-24  649  		if (!(neigh->nud_state & NUD_VALID) &&
990edb428c2c85 Martin KaFai Lau             2015-07-24  650  		    time_after(jiffies,
dcd1f572954f9d David Ahern                  2018-04-18  651  			       neigh->updated + idev->cnf.rtr_probe_interval)) {
c2f17e827b4199 Hannes Frederic Sowa         2013-10-21  652  			work = kmalloc(sizeof(*work), GFP_ATOMIC);
990edb428c2c85 Martin KaFai Lau             2015-07-24  653  			if (work)
7e980569642811 Jiri Benc                    2013-12-11  654  				__neigh_set_probe_once(neigh);
990edb428c2c85 Martin KaFai Lau             2015-07-24  655  		}
2152caea719657 YOSHIFUJI Hideaki / 吉藤英明 2013-01-17  656  		write_unlock(&neigh->lock);
990edb428c2c85 Martin KaFai Lau             2015-07-24  657  	}
2152caea719657 YOSHIFUJI Hideaki / 吉藤英明 2013-01-17  658  
c2f17e827b4199 Hannes Frederic Sowa         2013-10-21  659  	if (work) {
cc3a86c802f0ba David Ahern                  2019-04-09 @660  		fib6_nh->last_probe = jiffies;
c2f17e827b4199 Hannes Frederic Sowa         2013-10-21  661  		INIT_WORK(&work->work, rt6_probe_deferred);
5e670d844b2a4e David Ahern                  2018-04-17  662  		work->target = *nh_gw;
5e670d844b2a4e David Ahern                  2018-04-17  663  		dev_hold(dev);
5e670d844b2a4e David Ahern                  2018-04-17  664  		work->dev = dev;
c2f17e827b4199 Hannes Frederic Sowa         2013-10-21  665  		schedule_work(&work->work);
c2f17e827b4199 Hannes Frederic Sowa         2013-10-21  666  	}
990edb428c2c85 Martin KaFai Lau             2015-07-24  667  
8d6c31bf574177 Martin KaFai Lau             2015-07-24  668  out:
2152caea719657 YOSHIFUJI Hideaki / 吉藤英明 2013-01-17  669  	rcu_read_unlock_bh();
f2c31e32b378a6 Eric Dumazet                 2011-07-29  670  }
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  671  #else
cc3a86c802f0ba David Ahern                  2019-04-09  672  static inline void rt6_probe(struct fib6_nh *fib6_nh)
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  673  {
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  674  }
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  675  #endif
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  676  

:::::: The code at line 660 was first introduced by commit
:::::: cc3a86c802f0ba9a2627aef256d95ae3b3fa6e91 ipv6: Change rt6_probe to take a fib6_nh

:::::: TO: David Ahern <dsahern@gmail.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 69510 bytes --]

^ permalink raw reply

* Re: [PATCH v6 net-next 02/19] ionic: Add basic framework for IONIC Network device driver
From: Jakub Kicinski @ 2019-08-29 22:38 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev, davem
In-Reply-To: <20190829182720.68419-3-snelson@pensando.io>

On Thu, 29 Aug 2019 11:27:03 -0700, Shannon Nelson wrote:
> This patch adds a basic driver framework for the Pensando IONIC
> network device.  There is no functionality right now other than
> the ability to load and unload.
> 
> Signed-off-by: Shannon Nelson <snelson@pensando.io>

> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_devlink.c b/drivers/net/ethernet/pensando/ionic/ionic_devlink.c
> new file mode 100644
> index 000000000000..6892409cd64b
> --- /dev/null
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_devlink.c
> @@ -0,0 +1,44 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright(c) 2017 - 2019 Pensando Systems, Inc */
> +
> +#include <linux/module.h>
> +#include <linux/netdevice.h>
> +
> +#include "ionic.h"
> +#include "ionic_bus.h"
> +#include "ionic_devlink.h"
> +
> +static int ionic_dl_info_get(struct devlink *dl, struct devlink_info_req *req,
> +			     struct netlink_ext_ack *extack)
> +{
> +	devlink_info_driver_name_put(req, IONIC_DRV_NAME);

This may fail, should the error not be propagated?

> +	return 0;
> +}
> +
> +static const struct devlink_ops ionic_dl_ops = {
> +	.info_get	= ionic_dl_info_get,
> +};
> +
> +struct ionic *ionic_devlink_alloc(struct device *dev)
> +{
> +	struct ionic *ionic;
> +	struct devlink *dl;
> +
> +	dl = devlink_alloc(&ionic_dl_ops, sizeof(struct ionic));
> +	if (!dl) {
> +		dev_warn(dev, "devlink_alloc failed");

missing new line at the end of warning, but the warning is unnecessary,
if memory allocation fails kernel will generate a big OOM splat, anyway

> +		return NULL;
> +	}
> +
> +	ionic = devlink_priv(dl);
> +
> +	return ionic;

return devlink_priv(dl);

> +}
> +
> +void ionic_devlink_free(struct ionic *ionic)
> +{
> +	struct devlink *dl = priv_to_devlink(ionic);
> +
> +	devlink_free(dl);
> +}

^ permalink raw reply

* Re: [PATCH v6 net-next 01/19] devlink: Add new info version tags for ASIC and FW
From: Jakub Kicinski @ 2019-08-29 22:33 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev, davem, Jiri Pirko
In-Reply-To: <20190829182720.68419-2-snelson@pensando.io>

On Thu, 29 Aug 2019 11:27:02 -0700, Shannon Nelson wrote:
> The current tag set is still rather small and needs a couple
> more tags to help with ASIC identification and to have a
> more generic FW version.
> 
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Signed-off-by: Shannon Nelson <snelson@pensando.io>

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>

^ permalink raw reply

* Re: [PATCH] ipv6: Not to probe neighbourless routes
From: kbuild test robot @ 2019-08-29 22:20 UTC (permalink / raw)
  To: Yi Wang
  Cc: kbuild-all, davem, kuznet, yoshfuji, netdev, linux-kernel,
	xue.zhihong, wang.yi59, wang.liang82, Cheng Lin
In-Reply-To: <1566896907-5121-1-git-send-email-wang.yi59@zte.com.cn>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=unknown-8bit, Size: 6124 bytes --]

Hi Yi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.3-rc6 next-20190829]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Yi-Wang/ipv6-Not-to-probe-neighbourless-routes/20190830-025439
config: x86_64-rhel (attached as .config)
compiler: gcc-7 (Debian 7.4.0-11) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   net/ipv6/route.c: In function 'rt6_probe':
>> net/ipv6/route.c:660:10: error: 'struct fib6_nh' has no member named 'last_probe'
      fib6_nh->last_probe = jiffies;
             ^~

vim +660 net/ipv6/route.c

c2f17e827b4199 Hannes Frederic Sowa         2013-10-21  619  
cc3a86c802f0ba David Ahern                  2019-04-09  620  static void rt6_probe(struct fib6_nh *fib6_nh)
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  621  {
f547fac624be53 Sabrina Dubroca              2018-10-12  622  	struct __rt6_probe_work *work = NULL;
5e670d844b2a4e David Ahern                  2018-04-17  623  	const struct in6_addr *nh_gw;
f2c31e32b378a6 Eric Dumazet                 2011-07-29  624  	struct neighbour *neigh;
5e670d844b2a4e David Ahern                  2018-04-17  625  	struct net_device *dev;
f547fac624be53 Sabrina Dubroca              2018-10-12  626  	struct inet6_dev *idev;
5e670d844b2a4e David Ahern                  2018-04-17  627  
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  628  	/*
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  629  	 * Okay, this does not seem to be appropriate
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  630  	 * for now, however, we need to check if it
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  631  	 * is really so; aka Router Reachability Probing.
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  632  	 *
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  633  	 * Router Reachability Probe MUST be rate-limited
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  634  	 * to no more than one per minute.
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  635  	 */
cc3a86c802f0ba David Ahern                  2019-04-09  636  	if (fib6_nh->fib_nh_gw_family)
7ff74a596b6aa4 YOSHIFUJI Hideaki / 吉藤英明 2013-01-17  637  		return;
5e670d844b2a4e David Ahern                  2018-04-17  638  
cc3a86c802f0ba David Ahern                  2019-04-09  639  	nh_gw = &fib6_nh->fib_nh_gw6;
cc3a86c802f0ba David Ahern                  2019-04-09  640  	dev = fib6_nh->fib_nh_dev;
2152caea719657 YOSHIFUJI Hideaki / 吉藤英明 2013-01-17  641  	rcu_read_lock_bh();
5e670d844b2a4e David Ahern                  2018-04-17  642  	neigh = __ipv6_neigh_lookup_noref(dev, nh_gw);
2152caea719657 YOSHIFUJI Hideaki / 吉藤英明 2013-01-17  643  	if (neigh) {
8d6c31bf574177 Martin KaFai Lau             2015-07-24  644  		if (neigh->nud_state & NUD_VALID)
8d6c31bf574177 Martin KaFai Lau             2015-07-24  645  			goto out;
8d6c31bf574177 Martin KaFai Lau             2015-07-24  646  
e3f0c73ec3f208 Cheng Lin                    2019-08-27  647  		idev = __in6_dev_get(dev);
2152caea719657 YOSHIFUJI Hideaki / 吉藤英明 2013-01-17  648  		write_lock(&neigh->lock);
990edb428c2c85 Martin KaFai Lau             2015-07-24  649  		if (!(neigh->nud_state & NUD_VALID) &&
990edb428c2c85 Martin KaFai Lau             2015-07-24  650  		    time_after(jiffies,
dcd1f572954f9d David Ahern                  2018-04-18  651  			       neigh->updated + idev->cnf.rtr_probe_interval)) {
c2f17e827b4199 Hannes Frederic Sowa         2013-10-21  652  			work = kmalloc(sizeof(*work), GFP_ATOMIC);
990edb428c2c85 Martin KaFai Lau             2015-07-24  653  			if (work)
7e980569642811 Jiri Benc                    2013-12-11  654  				__neigh_set_probe_once(neigh);
990edb428c2c85 Martin KaFai Lau             2015-07-24  655  		}
2152caea719657 YOSHIFUJI Hideaki / 吉藤英明 2013-01-17  656  		write_unlock(&neigh->lock);
990edb428c2c85 Martin KaFai Lau             2015-07-24  657  	}
2152caea719657 YOSHIFUJI Hideaki / 吉藤英明 2013-01-17  658  
c2f17e827b4199 Hannes Frederic Sowa         2013-10-21  659  	if (work) {
cc3a86c802f0ba David Ahern                  2019-04-09 @660  		fib6_nh->last_probe = jiffies;
c2f17e827b4199 Hannes Frederic Sowa         2013-10-21  661  		INIT_WORK(&work->work, rt6_probe_deferred);
5e670d844b2a4e David Ahern                  2018-04-17  662  		work->target = *nh_gw;
5e670d844b2a4e David Ahern                  2018-04-17  663  		dev_hold(dev);
5e670d844b2a4e David Ahern                  2018-04-17  664  		work->dev = dev;
c2f17e827b4199 Hannes Frederic Sowa         2013-10-21  665  		schedule_work(&work->work);
c2f17e827b4199 Hannes Frederic Sowa         2013-10-21  666  	}
990edb428c2c85 Martin KaFai Lau             2015-07-24  667  
8d6c31bf574177 Martin KaFai Lau             2015-07-24  668  out:
2152caea719657 YOSHIFUJI Hideaki / 吉藤英明 2013-01-17  669  	rcu_read_unlock_bh();
f2c31e32b378a6 Eric Dumazet                 2011-07-29  670  }
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  671  #else
cc3a86c802f0ba David Ahern                  2019-04-09  672  static inline void rt6_probe(struct fib6_nh *fib6_nh)
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  673  {
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  674  }
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  675  #endif
270972554c91ac YOSHIFUJI Hideaki            2006-03-20  676  

:::::: The code at line 660 was first introduced by commit
:::::: cc3a86c802f0ba9a2627aef256d95ae3b3fa6e91 ipv6: Change rt6_probe to take a fib6_nh

:::::: TO: David Ahern <dsahern@gmail.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 43525 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v5] sched: Add dualpi2 qdisc
From: Bob Briscoe @ 2019-08-29 22:18 UTC (permalink / raw)
  To: Dave Taht
  Cc: Tilmans, Olivier (Nokia - BE/Antwerp), Eric Dumazet,
	Stephen Hemminger, Olga Albisser,
	De Schepper, Koen (Nokia - BE/Antwerp), Henrik Steen,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <CAA93jw5R2QWSngdKX5RSvGR5NEvFTH-Sp5__k+EroxkkQkfzcw@mail.gmail.com>

Dave,

On 28/08/2019 17:55, Dave Taht wrote:
> On Wed, Aug 28, 2019 at 7:00 AM Bob Briscoe <research@bobbriscoe.net> wrote:
>> Olivier, Dave,
>>
>> On 23/08/2019 13:59, Tilmans, Olivier (Nokia - BE/Antwerp) wrote:
>>
>> as best as I can
>> tell (but could be wrong) the NQB idea wants to put something into the
>> l4s fast queue? Or is NQB supposed to
>> be a third queue?
>>
>> NQB is not supported in this release of the code. But FYI, it's not for a third queue.
> At the time of my code review of dualpi I had not gone back to review
> the NQB draft fully.
>
>> We can add support for NQB in the future, by expanding the
>> dualpi2_skb_classify() function. This is however out of scope at the
>> moment as NQB is not yet adopted by the TSV WG. I'd guess we may want more
>> than just the NQB DSCP codepoint in the L queue, which then warrant
>> another way to classify traffic, e.g., using tc filter hints.
> Yes, you'll find find folk are fans of being able to put tc (and ebpf)
> filters in front of various qdiscs for classification, logging, and/or
> dropping behavior.
>
> A fairly typical stanza is here:
> https://github.com/torvalds/linux/blob/master/net/sched/sch_sfq.c#L171
> to line 193.
Yes, I got a student to add hooks for the Linux classification 
architecture (either adding more, or overriding the defaults) a couple 
of years ago, along with creating a classful structure. But his 
unfinished branch got left dangling once he graduated and is now way out 
of date. it's still our intention to take that direction tho.

>
>> The IETF adopted the NQB draft at the meeting just passed in July, but the draft has not yet been updated to reflect that: https://tools.ietf.org/html/draft-white-tsvwg-nqb-02
> Hmmm... no. I think oliver's statement was correct.
>
> NQB was put into the "call for adoption into tsvwg" state (
> https://mailarchive.ietf.org/arch/msg/tsvwg/fjyYQgU9xQCNalwPO7v9-al6mGk
> ) in the tsvwg aug 21st, which
> doesn't mean "adopted by the ietf", either.
You're right.

I've been away from all this for a while. In the tsvwg meeting there 
were perhaps a couple of dozen folks stating support and no-one against, 
so I had (wrongly) extrapolated from that - I should have checked the 
status of the ML discussion first.

> In response to that call
> several folk did put in (rather pithy),
> comments on the current state of the NQB idea and internet draft, starting here:
>
> https://mailarchive.ietf.org/arch/msg/tsvwg/hZGjm899t87YZl9JJUOWQq4KBsk

> While using up ECT1 in the L4S code as an identifier and not as a
> congestion indicator is very controversial for me (
> https://lwn.net/Articles/783673/ ), AND I'd rather it not be baked
> into the linux api for dualpi should this identifier not be chosen by
> the wg (thus my suggestion of a mask or lookup table)...
That ship has sailed. You can consider it controversial if you want, but 
the tsvwg decided to use ECT(1) as an identifier for L4S after long 
discussions back in 2016. Years of a large number of people's work was 
predicated on that decision. So the dualpi2 code reflects the way the 
IETF is approaching this.


>
> ... I also dearly would like both sides of this code - dualpi and tcp
> prague - in a simultaneously testable and high quality state. Without
> that, many core ideas in dualpi cannot be tested, nor objectively
> evaluated against other tcps and qdiscs using rfc3168 behavior along
> the path. Multiple experimental ideas in RFC8311 (such as those in
> section 4.3) have also not been re-evaluated in any context.
We're working on that - top priority now.
>
> Is the known to work reference codebase for "tcp prague" still 3.19 based?
It is, but Olivier recently found the elusive cause of the problem that 
made later versions bursty. So we're getting close.




Bob
>> Bob
>>
>> --
>> ________________________________________________________________
>> Bob Briscoe                               http://bobbriscoe.net/
>
>
> --
>
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-205-9740

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/


^ permalink raw reply

* Re: general protection fault in tls_sk_proto_close (2)
From: John Fastabend @ 2019-08-29 22:18 UTC (permalink / raw)
  To: Jakub Kicinski, John Fastabend
  Cc: Hillf Danton, syzbot, aviadye, borisp, daniel, davejwatson, davem,
	linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <20190829115315.5686c18f@cakuba.netronome.com>

Jakub Kicinski wrote:
> On Thu, 29 Aug 2019 11:48:32 -0700, John Fastabend wrote:
> > Jakub Kicinski wrote:
> > > On Thu, 29 Aug 2019 11:52:00 +0800, Hillf Danton wrote:  
> > > > Alternatively work is done if sock is closed again. Anyway ctx is reset
> > > > under sock's callback lock in write mode.
> > > > 
> > > > --- a/net/tls/tls_main.c
> > > > +++ b/net/tls/tls_main.c
> > > > @@ -295,6 +295,8 @@ static void tls_sk_proto_close(struct so
> > > >  	long timeo = sock_sndtimeo(sk, 0);
> > > >  	bool free_ctx;
> > > >  
> > > > +	if (!ctx)
> > > > +		return;
> > > >  	if (ctx->tx_conf == TLS_SW)
> > > >  		tls_sw_cancel_work_tx(ctx);  
> > > 
> > > That's no bueno, the real socket's close will never get called.  
> > 
> > Seems when we refactored BPF side we dropped the check for ULP on one
> > path so I'll add that back now. It would be nice and seems we are
> > getting closer now that tls side is a bit more dynamic if the ordering
> > didn't matter.
> 
> We'd probably need some more generic way of communicating the changes
> in sk_proto stack, e.g. by moving the update into one of sk_proto
> callbacks? but yes.
> 
> > diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> > index 1330a7442e5b..30d11558740e 100644
> > --- a/net/core/sock_map.c
> > +++ b/net/core/sock_map.c
> > @@ -666,6 +666,8 @@ static int sock_hash_update_common(struct bpf_map *map, void *key,
> >         WARN_ON_ONCE(!rcu_read_lock_held());
> >         if (unlikely(flags > BPF_EXIST))
> >                 return -EINVAL;
> > +       if (unlikely(icsk->icsk_ulp_data))
> > +               return -EINVAL;
> > 
> >         link = sk_psock_init_link();
> >         if (!link)
> 
> Thanks! That looks good, if you feel like submitting officially feel
> free to add my Reviewed-by!

I'll send it out this evening after running the selftests.

.John

^ permalink raw reply

* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: David Miller @ 2019-08-29 22:12 UTC (permalink / raw)
  To: idosch
  Cc: andrew, jiri, horatiu.vultur, alexandre.belloni, UNGLinuxDriver,
	allan.nielsen, ivecera, f.fainelli, netdev, linux-kernel
In-Reply-To: <20190829193613.GA23259@splinter>

From: Ido Schimmel <idosch@idosch.org>
Date: Thu, 29 Aug 2019 22:36:13 +0300

> I fully agree that we should make it easy for users to capture offloaded
> traffic, which is why I suggested patching libpcap. Add a flag to
> capable netdevs that tells libpcap that in order to capture all the
> traffic from this interface it needs to add a tc filter with a trap
> action. That way zero familiarity with tc is required from users.

Why not just make setting promisc mode on the device do this rather than
require all of this tc indirection nonsense?

That's the whole point of this conversation I thought?

^ permalink raw reply

* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: David Miller @ 2019-08-29 22:10 UTC (permalink / raw)
  To: andrew
  Cc: idosch, jiri, horatiu.vultur, alexandre.belloni, UNGLinuxDriver,
	allan.nielsen, ivecera, f.fainelli, netdev, linux-kernel
In-Reply-To: <20190829182957.GA17530@lunn.ch>

From: Andrew Lunn <andrew@lunn.ch>
Date: Thu, 29 Aug 2019 20:29:57 +0200

> We should also think about the different classes of users. Somebody
> using a TOR switch with a NOS is very different to a user of a SOHO
> switch in their WiFi access point. The first probably knows tc very
> well, the second has probably never heard of it, and just wants
> tcpdump to work like on their desktop.

+1

^ permalink raw reply

* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: David Miller @ 2019-08-29 22:08 UTC (permalink / raw)
  To: idosch
  Cc: andrew, jiri, horatiu.vultur, alexandre.belloni, UNGLinuxDriver,
	allan.nielsen, ivecera, f.fainelli, netdev, linux-kernel
In-Reply-To: <20190829175759.GA19471@splinter>

From: Ido Schimmel <idosch@idosch.org>
Date: Thu, 29 Aug 2019 20:57:59 +0300

> On a software switch, when you run tcpdump without '-p', do you incur
> major packet loss? No. Will this happen when you punt several Tbps to
> your CPU on the hardware switch? Yes.
> 
> Extending the definition of promiscuous mode to mean punt all traffic to
> the CPU is wrong, IMO. You will not be able to capture all the packets

This is so illogical, it is mind boggling.

How different is this to using tcpdump/wireshark on a 100GB or 1TB
network interface?

There is no difference.

Please stop portraying switches as special in this regard, they are
not.

^ permalink raw reply

* Re: [PATCH v2 net-next 05/15] net: sgi: ioc3-eth: allocate space for desc rings only once
From: Jakub Kicinski @ 2019-08-29 22:05 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: Ralf Baechle, Paul Burton, James Hogan, David S. Miller,
	linux-mips, linux-kernel, netdev
In-Reply-To: <20190830000058.882feb357058437cddc71315@suse.de>

On Fri, 30 Aug 2019 00:00:58 +0200, Thomas Bogendoerfer wrote:
> On Thu, 29 Aug 2019 14:05:37 -0700
> Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> 
> > On Thu, 29 Aug 2019 17:50:03 +0200, Thomas Bogendoerfer wrote:  
> > > +		if (skb)
> > > +			dev_kfree_skb_any(skb);  
> > 
> > I think dev_kfree_skb_any() accepts NULL  
> 
> yes, I'll drop the if
> 
> > > +
> > > +	/* Allocate and rx ring.  4kb = 512 entries  */
> > > +	ip->rxr = (unsigned long *)get_zeroed_page(GFP_ATOMIC);
> > > +	if (!ip->rxr) {
> > > +		pr_err("ioc3-eth: rx ring allocation failed\n");
> > > +		err = -ENOMEM;
> > > +		goto out_stop;
> > > +	}
> > > +
> > > +	/* Allocate tx rings.  16kb = 128 bufs.  */
> > > +	ip->txr = (struct ioc3_etxd *)__get_free_pages(GFP_KERNEL, 2);
> > > +	if (!ip->txr) {
> > > +		pr_err("ioc3-eth: tx ring allocation failed\n");
> > > +		err = -ENOMEM;
> > > +		goto out_stop;
> > > +	}  
> > 
> > Please just use kcalloc()/kmalloc_array() here,  
> 
> both allocation will be replaced in patch 11 with dma_direct_alloc_pages.
> So I hope I don't need to change it here.

Ah, missed that!

> Out of curiosity does kcalloc/kmalloc_array give me the same guarantees about
> alignment ? rx ring needs to be 4KB aligned, tx ring 16KB aligned.

I don't think so, actually, I was mostly worried you are passing
address from get_page() into kfree() here ;) But patch 11 cures that,
so that's good, too.

> >, and make sure the flags
> > are set to GFP_KERNEL whenever possible. Here and in ioc3_alloc_rings()
> > it looks like GFP_ATOMIC is unnecessary.  
> 
> yes, I'll change it

^ permalink raw reply

* Re: [PATCH bpf-next 01/13] bpf: add bpf_map_value_size and bp_map_copy_value helper functions
From: Song Liu @ 2019-08-29 22:04 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Networking, Alexei Starovoitov, Brian Vazquez,
	Daniel Borkmann, Kernel Team
In-Reply-To: <20190829064502.2750359-1-yhs@fb.com>



> On Aug 28, 2019, at 11:45 PM, Yonghong Song <yhs@fb.com> wrote:
> 
> From: Brian Vazquez <brianvv@google.com>
> 
> Move reusable code from map_lookup_elem to helper functions to avoid code
> duplication in kernel/bpf/syscall.c
> 
> Suggested-by: Stanislav Fomichev <sdf@google.com>
> Signed-off-by: Brian Vazquez <brianvv@google.com>


Acked-by: Song Liu <songliubraving@fb.com>

Yonghong, we also need your SoB. 



^ permalink raw reply

* Re: [PATCH v2 net-next 05/15] net: sgi: ioc3-eth: allocate space for desc rings only once
From: Thomas Bogendoerfer @ 2019-08-29 22:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Ralf Baechle, Paul Burton, James Hogan, David S. Miller,
	linux-mips, linux-kernel, netdev
In-Reply-To: <20190829140537.68abfc9f@cakuba.netronome.com>

On Thu, 29 Aug 2019 14:05:37 -0700
Jakub Kicinski <jakub.kicinski@netronome.com> wrote:

> On Thu, 29 Aug 2019 17:50:03 +0200, Thomas Bogendoerfer wrote:
> > +		if (skb)
> > +			dev_kfree_skb_any(skb);
> 
> I think dev_kfree_skb_any() accepts NULL

yes, I'll drop the if

> > +
> > +	/* Allocate and rx ring.  4kb = 512 entries  */
> > +	ip->rxr = (unsigned long *)get_zeroed_page(GFP_ATOMIC);
> > +	if (!ip->rxr) {
> > +		pr_err("ioc3-eth: rx ring allocation failed\n");
> > +		err = -ENOMEM;
> > +		goto out_stop;
> > +	}
> > +
> > +	/* Allocate tx rings.  16kb = 128 bufs.  */
> > +	ip->txr = (struct ioc3_etxd *)__get_free_pages(GFP_KERNEL, 2);
> > +	if (!ip->txr) {
> > +		pr_err("ioc3-eth: tx ring allocation failed\n");
> > +		err = -ENOMEM;
> > +		goto out_stop;
> > +	}
> 
> Please just use kcalloc()/kmalloc_array() here,

both allocation will be replaced in patch 11 with dma_direct_alloc_pages.
So I hope I don't need to change it here.

Out of curiosity does kcalloc/kmalloc_array give me the same guarantees about
alignment ? rx ring needs to be 4KB aligned, tx ring 16KB aligned.

>, and make sure the flags
> are set to GFP_KERNEL whenever possible. Here and in ioc3_alloc_rings()
> it looks like GFP_ATOMIC is unnecessary.

yes, I'll change it

Thomas.

-- 
SUSE Software Solutions Germany GmbH
HRB 247165 (AG München)
Geschäftsführer: Felix Imendörffer

^ permalink raw reply

* ANNOUNCE: rpld an another RPL implementation for Linux
From: Alexander Aring @ 2019-08-29 21:57 UTC (permalink / raw)
  To: open list:NETWORKING [GENERAL]
  Cc: Michael Richardson, Jamal Hadi Salim, Robert Kaiser,
	Martin Gergeleit, Kai Beckmann, koen, linux-wpan - ML, reubenhwk,
	BlueZ development, Stefan Schmidt, sebastian.meiling,
	Marcel Holtmann, Werner Almesberger, Jukka Rissanen

Hi,

I had some free time, I wanted to know how RPL [0] works so I did a
implementation. It's _very_ basic as it only gives you a "routable"
(is that a word?) thing afterwards in a very constrained setup of RPL
messages.

Took ~1 month to implement it and I reused some great code from radvd
[1]. I released it under the same license (BSD?). Anyway, I know there
exists a lot of memory leaks and the parameters are just crazy as not
practical in a real environment BUT it works.

I changed a little bit the dependencies from radvd (because fancy new things):

- lua for config handling
- libev for event loop handling
- libmnl for netlink handling

The code is available at:

https://github.com/linux-wpan/rpld

With a recent kernel (I think 4.19 and above) and necessary user space
dependencies, just build it and run the start script. It will create
some virtual IEEE 802.15.4 6LoWPAN interfaces and you can run
traceroute from namespace ns0 (which is the RPL DODAG root) to any
other node e.g. namespace ns5. With more knowledge of the scripts you
can change the underlying topology, everybody is welcome to improve
them.

I will work more on it when I have time... to have at least something
running means the real fun can begin (but it was already fun before).

The big thing what everybody wants is source routing, which requires
some control plane for RPL into the kernel to say how and when to put
source routing headers in IPv6. I think somehow I know what's
necessary now... but I didn't implemented it, this simple
implementation just filling up routing tables as RPL supports storing
(routing table) or non-storing (source routing) modes. People tells me
to lookup frrouting to look how they do source routing, I will if I
get the chance.

It doesn't run on Bluetooth yet, I know there exists a lack of UAPI to
figure out the linklayer which is used by 6LoWPAN. I need somehow a
SLAVE_INFO attribute in netlink to figure this out and tell me some
6LoWPAN specific attributes. I am sorry Bluetooth people, but I think
you are also more interested in source routing because I heard
somebody saying it's the more common approach outside (but I never saw
any other RPL implementation than unstrung running).

Also I did something in my masters thesis to make a better parent
selection, if this implementation becomes stable I can look to get
this migrated.

Please, radvd maintainer let me know if everything is okay from your
side. As I said I reused some code from radvd. I also operate on
ICMPv6 sockets. The same to Michael Richardson unstrung [2]. If there
is anything to talk or you have complains, I can change it.

Thanks, I really only wanted to get more knowledge about routing
protocols and how to implement such. Besides all known issues, I still
think it's a good starting point.

- Alex

[0] https://tools.ietf.org/html/rfc6550
[1] https://github.com/reubenhwk/radvd
[2] https://github.com/AnimaGUS-minerva/unstrung

^ permalink raw reply

* Re: [PATCH net-next v2 3/3] net: tls: export protocol version, cipher, tx_conf/rx_conf to socket diag
From: Jakub Kicinski @ 2019-08-29 21:56 UTC (permalink / raw)
  To: Davide Caratti
  Cc: borisp, Eric Dumazet, aviadye, davejwatson, davem, john.fastabend,
	Matthieu Baerts, netdev
In-Reply-To: <22da29aa0d0c683afeba7549cabc64c5e073d308.1567095873.git.dcaratti@redhat.com>

On Thu, 29 Aug 2019 18:48:04 +0200, Davide Caratti wrote:
> When an application configures kernel TLS on top of a TCP socket, it's
> now possible for inet_diag_handler() to collect information regarding the
> protocol version, the cipher type and TX / RX configuration, in case
> INET_DIAG_INFO is requested.
> 
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

> diff --git a/include/net/tls.h b/include/net/tls.h
> index 4997742475cd..990f1d9182a3 100644
> --- a/include/net/tls.h
> +++ b/include/net/tls.h
> @@ -431,6 +431,25 @@ static inline bool is_tx_ready(struct tls_sw_context_tx *ctx)
>  	return READ_ONCE(rec->tx_ready);
>  }
>  
> +static inline u16 tls_user_config(struct tls_context *ctx, bool tx)
> +{
> +	u16 config = tx ? ctx->tx_conf : ctx->rx_conf;
> +
> +	switch (config) {
> +	case TLS_BASE:
> +		return TLS_CONF_BASE;
> +	case TLS_SW:
> +		return TLS_CONF_SW;
> +#ifdef CONFIG_TLS_DEVICE

Recently the TLS_HW define was taken out of the ifdef, so the ifdef
around this is no longer necessary.

> +	case TLS_HW:
> +		return TLS_CONF_HW;
> +#endif
> +	case TLS_HW_RECORD:
> +		return TLS_CONF_HW_RECORD;
> +	}
> +	return 0;
> +}
> +
>  struct sk_buff *
>  tls_validate_xmit_skb(struct sock *sk, struct net_device *dev,
>  		      struct sk_buff *skb);

> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index f8f2d2c3d627..3351a2ace369 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -39,6 +39,7 @@
>  #include <linux/netdevice.h>
>  #include <linux/sched/signal.h>
>  #include <linux/inetdevice.h>
> +#include <linux/inet_diag.h>
>  
>  #include <net/tls.h>
>  
> @@ -835,6 +836,67 @@ static void tls_update(struct sock *sk, struct proto *p)
>  	}
>  }
>  
> +static int tls_get_info(const struct sock *sk, struct sk_buff *skb)
> +{
> +	struct tls_context *ctx;
> +	u16 version, cipher_type;

Unfortunately revere christmas tree will be needed :(

> +	struct nlattr *start;
> +	int err;

^ permalink raw reply

* RE: linux-next: Tree for Aug 29 (mlx5)
From: Haiyang Zhang @ 2019-08-29 21:48 UTC (permalink / raw)
  To: Saeed Mahameed, sfr@canb.auug.org.au, Eran Ben Elisha,
	linux-next@vger.kernel.org, rdunlap@infradead.org
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Leon Romanovsky
In-Reply-To: <c92d20e27268f515e0d4c8a28f92c0da041c2acc.camel@mellanox.com>



> -----Original Message-----
> From: Saeed Mahameed <saeedm@mellanox.com>
> Sent: Thursday, August 29, 2019 2:32 PM
> To: sfr@canb.auug.org.au; Eran Ben Elisha <eranbe@mellanox.com>; linux-
> next@vger.kernel.org; rdunlap@infradead.org; Haiyang Zhang
> <haiyangz@microsoft.com>
> Cc: linux-kernel@vger.kernel.org; netdev@vger.kernel.org; Leon
> Romanovsky <leonro@mellanox.com>
> Subject: Re: linux-next: Tree for Aug 29 (mlx5)
> 
> On Thu, 2019-08-29 at 12:55 -0700, Randy Dunlap wrote:
> > On 8/29/19 12:54 PM, Randy Dunlap wrote:
> > > On 8/29/19 4:08 AM, Stephen Rothwell wrote:
> > > > Hi all,
> > > >
> > > > Changes since 20190828:
> > > >
> > >
> > > on x86_64:
> > > when CONFIG_PCI_HYPERV=m
> >
> > and CONFIG_PCI_HYPERV_INTERFACE=m
> >
> 
> Haiyang and Eran, I think CONFIG_PCI_HYPERV_INTERFACE was never
> supposed to be a module ? it supposed to provide an always available
> interface to drivers ..
> 
> Anyway, maybe we need to imply CONFIG_PCI_HYPERV_INTERFACE in mlx5.

The symbolic dependency by driver mlx5e,  automatically triggers loading of
pci_hyperv_interface module. And this module can be loaded in any platforms.

Currently, mlx5e driver has #if IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
around the code using the interface.

I agree --
Adding "select PCI_HYPERV_INTERFACE" for mlx5e will clean up these #if's.

Thanks,
- Haiyang

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox