Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v5 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions
From: Leon Romanovsky @ 2019-07-09  7:02 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: ariel.elior, jgg, dledford, galpress, linux-rdma, davem, netdev
In-Reply-To: <20190708091503.14723-2-michal.kalderon@marvell.com>

On Mon, Jul 08, 2019 at 12:14:58PM +0300, Michal Kalderon wrote:
> Create some common API's for adding entries to a xa_mmap.
> Searching for an entry and freeing one.
>
> The code was copied from the efa driver almost as is, just renamed
> function to be generic and not efa specific.
>
> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
> ---
>  drivers/infiniband/core/device.c      |   1 +
>  drivers/infiniband/core/rdma_core.c   |   1 +
>  drivers/infiniband/core/uverbs_cmd.c  |   1 +
>  drivers/infiniband/core/uverbs_main.c | 105 ++++++++++++++++++++++++++++++++++
>  include/rdma/ib_verbs.h               |  32 +++++++++++
>  5 files changed, 140 insertions(+)
>
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index 8a6ccb936dfe..a830c2c5d691 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -2521,6 +2521,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
>  	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
>  	SET_DEVICE_OP(dev_ops, map_phys_fmr);
>  	SET_DEVICE_OP(dev_ops, mmap);
> +	SET_DEVICE_OP(dev_ops, mmap_free);
>  	SET_DEVICE_OP(dev_ops, modify_ah);
>  	SET_DEVICE_OP(dev_ops, modify_cq);
>  	SET_DEVICE_OP(dev_ops, modify_device);
> diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
> index ccf4d069c25c..7166741834c8 100644
> --- a/drivers/infiniband/core/rdma_core.c
> +++ b/drivers/infiniband/core/rdma_core.c
> @@ -817,6 +817,7 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
>  	rdma_restrack_del(&ucontext->res);
>
>  	ib_dev->ops.dealloc_ucontext(ucontext);
> +	rdma_user_mmap_entries_remove_free(ucontext);
>  	kfree(ucontext);
>
>  	ufile->ucontext = NULL;
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index 7ddd0e5bc6b3..44c0600245e4 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -254,6 +254,7 @@ static int ib_uverbs_get_context(struct uverbs_attr_bundle *attrs)
>
>  	mutex_init(&ucontext->per_mm_list_lock);
>  	INIT_LIST_HEAD(&ucontext->per_mm_list);
> +	xa_init(&ucontext->mmap_xa);
>
>  	ret = get_unused_fd_flags(O_CLOEXEC);
>  	if (ret < 0)
> diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> index 11c13c1381cf..37507cc27e8c 100644
> --- a/drivers/infiniband/core/uverbs_main.c
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -965,6 +965,111 @@ int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
>  }
>  EXPORT_SYMBOL(rdma_user_mmap_io);
>
> +static inline u64
> +rdma_user_mmap_get_key(const struct rdma_user_mmap_entry *entry)
> +{
> +	return (u64)entry->mmap_page << PAGE_SHIFT;
> +}
> +
> +struct rdma_user_mmap_entry *
> +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len)
> +{
> +	struct rdma_user_mmap_entry *entry;
> +	u64 mmap_page;
> +
> +	mmap_page = key >> PAGE_SHIFT;
> +	if (mmap_page > U32_MAX)
> +		return NULL;
> +
> +	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> +	if (!entry || rdma_user_mmap_get_key(entry) != key ||
> +	    entry->length != len)
> +		return NULL;
> +
> +	ibdev_dbg(ucontext->device,
> +		  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
> +		  entry->obj, key, entry->address, entry->length);
> +
> +	return entry;
> +}
> +EXPORT_SYMBOL(rdma_user_mmap_entry_get);

Please add function description in kernel doc format for all newly EXPORT_SYMBOL()
functions you introduced in RDMA/core.

> +
> +/*
> + * Note this locking scheme cannot support removal of entries, except during
> + * ucontext destruction when the core code guarentees no concurrency.
> + */
> +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, void *obj,
> +				u64 address, u64 length, u8 mmap_flag)
> +{
> +	struct rdma_user_mmap_entry *entry;
> +	u32 next_mmap_page;
> +	int err;
> +
> +	entry = kmalloc(sizeof(*entry), GFP_KERNEL);

It is worth to use kzalloc and not kmalloc.

Thanks

^ permalink raw reply

* [PATCH net-next] netfilter: nft_meta: Fix build error
From: YueHaibing @ 2019-07-09  7:01 UTC (permalink / raw)
  To: davem, pablo, kadlec, fw, roopa, nikolay, wenxu
  Cc: linux-kernel, netdev, bridge, coreteam, netfilter-devel,
	YueHaibing

If NFT_BRIDGE_META is y and NF_TABLES is m, building fails:

net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_get_init':
nft_meta_bridge.c:(.text+0xd0): undefined reference to `nft_parse_register'
nft_meta_bridge.c:(.text+0xec): undefined reference to `nft_validate_register_store'

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 30e103fe24de ("netfilter: nft_meta: move bridge meta keys into nft_meta_bridge")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 net/bridge/netfilter/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/bridge/netfilter/Kconfig b/net/bridge/netfilter/Kconfig
index fbc7085..ca3214f 100644
--- a/net/bridge/netfilter/Kconfig
+++ b/net/bridge/netfilter/Kconfig
@@ -12,6 +12,7 @@ if NF_TABLES_BRIDGE
 
 config NFT_BRIDGE_META
 	tristate "Netfilter nf_table bridge meta support"
+	depends on NF_TABLES
 	help
 	  Add support for bridge dedicated meta key.
 
-- 
2.7.4



^ permalink raw reply related

* Re: [PATCH net-next iproute2 2/3] tc: Introduce tc ct action
From: Paul Blakey @ 2019-07-09  6:58 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Jiri Pirko, Roi Dayan, Yossi Kuperman, Oz Shlomo,
	netdev@vger.kernel.org, David Miller, Aaron Conole, Zhike Wang,
	Justin Pettit, John Hurley, Rony Efraim, nst-kernel@redhat.com,
	Simon Horman
In-Reply-To: <20190708175446.GL3449@localhost.localdomain>


On 7/8/2019 8:54 PM, Marcelo Ricardo Leitner wrote:
> On Sun, Jul 07, 2019 at 11:53:47AM +0300, Paul Blakey wrote:
>> New tc action to send packets to conntrack module, commit
>> them, and set a zone, labels, mark, and nat on the connection.
>>
>> It can also clear the packet's conntrack state by using clear.
>>
>> Usage:
>>     ct clear
>>     ct commit [force] [zone] [mark] [label] [nat]
> Isn't the 'commit' also optional? More like
>      ct [commit [force]] [zone] [mark] [label] [nat]
>
>>     ct [nat] [zone]
>>
>> Signed-off-by: Paul Blakey <paulb@mellanox.com>
>> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>> Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
>> Acked-by: Jiri Pirko <jiri@mellanox.com>
>> Acked-by: Roi Dayan <roid@mellanox.com>
>> ---
> ...
>> +static void
>> +usage(void)
>> +{
>> +	fprintf(stderr,
>> +		"Usage: ct clear\n"
>> +		"	ct commit [force] [zone ZONE] [mark MASKED_MARK] [label MASKED_LABEL] [nat NAT_SPEC]\n"
> Ditto here then.


In commit msg and here, it means there is multiple modes of operation. I 
think it's easier to split those.

"ct clear" to clear it , not other options can be added here.

"ct commit  [force].... " sends to conntrack and commit a connection, 
and only for commit can you specify force mark  label, and nat with 
nat_spec....

and the last one, "ct [nat] [zone ZONE]" is to just send the packet to 
conntrack on some zone [optional], restore nat [optional].


>
>> +		"	ct [nat] [zone ZONE]\n"
>> +		"Where: ZONE is the conntrack zone table number\n"
>> +		"	NAT_SPEC is {src|dst} addr addr1[-addr2] [port port1[-port2]]\n"
>> +		"\n");
>> +	exit(-1);
>> +}
> ...
>
> The validation below doesn't enforce that commit must be there for
> such case.
which case? commit is optional. the above are the three valid patterns.
>> +static int
>> +parse_ct(struct action_util *a, int *argc_p, char ***argv_p, int tca_id,
>> +		struct nlmsghdr *n)
>> +{
>> +	struct tc_ct sel = {};
>> +	char **argv = *argv_p;
>> +	struct rtattr *tail;
>> +	int argc = *argc_p;
>> +	int ct_action = 0;
>> +	int ret;
>> +
>> +	tail = addattr_nest(n, MAX_MSG, tca_id);
>> +
>> +	if (argc && matches(*argv, "ct") == 0)
>> +		NEXT_ARG_FWD();
>> +
>> +	while (argc > 0) {
>> +		if (matches(*argv, "zone") == 0) {
>> +			NEXT_ARG();
>> +
>> +			if (ct_parse_u16(*argv,
>> +					 TCA_CT_ZONE, TCA_CT_UNSPEC, n)) {
>> +				fprintf(stderr, "ct: Illegal \"zone\"\n");
>> +				return -1;
>> +			}
>> +		} else if (matches(*argv, "nat") == 0) {
>> +			ct_action |= TCA_CT_ACT_NAT;
>> +
>> +			NEXT_ARG();
>> +			if (matches(*argv, "src") == 0)
>> +				ct_action |= TCA_CT_ACT_NAT_SRC;
>> +			else if (matches(*argv, "dst") == 0)
>> +				ct_action |= TCA_CT_ACT_NAT_DST;
>> +			else
>> +				continue;
>> +
>> +			NEXT_ARG();
>> +			if (matches(*argv, "addr") != 0)
>> +				usage();
>> +
>> +			NEXT_ARG();
>> +			ret = ct_parse_nat_addr_range(*argv, n);
>> +			if (ret) {
>> +				fprintf(stderr, "ct: Illegal nat address range\n");
>> +				return -1;
>> +			}
>> +
>> +			NEXT_ARG_FWD();
>> +			if (matches(*argv, "port") != 0)
>> +				continue;
>> +
>> +			NEXT_ARG();
>> +			ret = ct_parse_nat_port_range(*argv, n);
>> +			if (ret) {
>> +				fprintf(stderr, "ct: Illegal nat port range\n");
>> +				return -1;
>> +			}
>> +		} else if (matches(*argv, "clear") == 0) {
>> +			ct_action |= TCA_CT_ACT_CLEAR;
>> +		} else if (matches(*argv, "commit") == 0) {
>> +			ct_action |= TCA_CT_ACT_COMMIT;
>> +		} else if (matches(*argv, "force") == 0) {
>> +			ct_action |= TCA_CT_ACT_FORCE;
>> +		} else if (matches(*argv, "index") == 0) {
>> +			NEXT_ARG();
>> +			if (get_u32(&sel.index, *argv, 10)) {
>> +				fprintf(stderr, "ct: Illegal \"index\"\n");
>> +				return -1;
>> +			}
>> +		} else if (matches(*argv, "mark") == 0) {
>> +			NEXT_ARG();
>> +
>> +			ret = ct_parse_mark(*argv, n);
>> +			if (ret) {
>> +				fprintf(stderr, "ct: Illegal \"mark\"\n");
>> +				return -1;
>> +			}
>> +		} else if (matches(*argv, "label") == 0) {
>> +			NEXT_ARG();
>> +
>> +			ret = ct_parse_labels(*argv, n);
>> +			if (ret) {
>> +				fprintf(stderr, "ct: Illegal \"label\"\n");
>> +				return -1;
>> +			}
>> +		} else if (matches(*argv, "help") == 0) {
>> +			usage();
>> +		} else {
>> +			break;
>> +		}
>> +		NEXT_ARG_FWD();
>> +	}
>> +
>> +	if (ct_action & TCA_CT_ACT_CLEAR &&
>> +	    ct_action & ~TCA_CT_ACT_CLEAR) {
>> +		fprintf(stderr, "ct: clear can only be used alone\n");
>> +		return -1;
>> +	}
>> +
>> +	if (ct_action & TCA_CT_ACT_NAT_SRC &&
>> +	    ct_action & TCA_CT_ACT_NAT_DST) {
>> +		fprintf(stderr, "ct: src and dst nat can't be used together\n");
>> +		return -1;
>> +	}
>> +
>> +	if ((ct_action & TCA_CT_ACT_COMMIT) &&
>> +	    (ct_action & TCA_CT_ACT_NAT) &&
>> +	    !(ct_action & (TCA_CT_ACT_NAT_SRC | TCA_CT_ACT_NAT_DST))) {
>> +		fprintf(stderr, "ct: commit and nat must set src or dst\n");
>> +		return -1;
>> +	}
>> +
>> +	if (!(ct_action & TCA_CT_ACT_COMMIT) &&
>> +	    (ct_action & (TCA_CT_ACT_NAT_SRC | TCA_CT_ACT_NAT_DST))) {
>> +		fprintf(stderr, "ct: src or dst is only valid if commit is set\n");
>> +		return -1;
>> +	}
>> +
>> +	parse_action_control_dflt(&argc, &argv, &sel.action, false,
>> +				  TC_ACT_PIPE);
>> +	NEXT_ARG_FWD();
>> +
>> +	addattr16(n, MAX_MSG, TCA_CT_ACTION, ct_action);
>> +	addattr_l(n, MAX_MSG, TCA_CT_PARMS, &sel, sizeof(sel));
>> +	addattr_nest_end(n, tail);
>> +
>> +	*argc_p = argc;
>> +	*argv_p = argv;
>> +	return 0;
>> +}
> ...

^ permalink raw reply

* Re: [PATCH net-next v5 1/4] net/sched: Introduce action ct
From: Paul Blakey @ 2019-07-09  7:00 UTC (permalink / raw)
  To: Florian Westphal, Marcelo Ricardo Leitner
  Cc: Jiri Pirko, Roi Dayan, Yossi Kuperman, Oz Shlomo,
	netdev@vger.kernel.org, David Miller, Aaron Conole, Zhike Wang,
	Rony Efraim, nst-kernel@redhat.com, John Hurley, Simon Horman,
	Justin Pettit
In-Reply-To: <20190708152805.dul3kgu4csr64fqk@breakpoint.cc>


On 7/8/2019 6:28 PM, Florian Westphal wrote:
> Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
>>> +	} else { /* NFPROTO_IPV6 */
>>> +		enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
>>> +
>>> +		memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
>>> +		err = nf_ct_frag6_gather(net, skb, user);
>> This doesn't build without IPv6 enabled.
>> ERROR: "nf_ct_frag6_gather" [net/sched/act_ct.ko] undefined!
>>
>> We need to (copy and pasted):
>>
>> @@ -179,7 +179,9 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb,
>>                  local_bh_enable();
>>                  if (err && err != -EINPROGRESS)
>>                          goto out_free;
>> -       } else { /* NFPROTO_IPV6 */
>> +       }
>> +#if IS_ENABLED(IPV6)
>> +       else { /* NFPROTO_IPV6 */
>>                  enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
> Good catch, but it should be
> #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
> just like ovs conntrack.c ,



Thanks guys.


^ permalink raw reply

* Re: [PATCH v3 net-next 19/19] ionic: Add basic devlink interface
From: Jiri Pirko @ 2019-07-09  6:56 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev
In-Reply-To: <6f9ebbca-4f13-b046-477c-678489e6ffbf@pensando.io>

Tue, Jul 09, 2019 at 12:58:00AM CEST, snelson@pensando.io wrote:
>On 7/8/19 1:03 PM, Jiri Pirko wrote:
>> Mon, Jul 08, 2019 at 09:58:09PM CEST, snelson@pensando.io wrote:
>> > On 7/8/19 12:34 PM, Jiri Pirko wrote:
>> > > Mon, Jul 08, 2019 at 09:25:32PM CEST, snelson@pensando.io wrote:
>> > > > 
>> > > > +
>> > > > +static const struct devlink_ops ionic_dl_ops = {
>> > > > +	.info_get	= ionic_dl_info_get,
>> > > > +};
>> > > > +
>> > > > +int ionic_devlink_register(struct ionic *ionic)
>> > > > +{
>> > > > +	struct devlink *dl;
>> > > > +	struct ionic **ip;
>> > > > +	int err;
>> > > > +
>> > > > +	dl = devlink_alloc(&ionic_dl_ops, sizeof(struct ionic *));
>> > > Oups. Something is wrong with your flow. The devlink alloc is allocating
>> > > the structure that holds private data (per-device data) for you. This is
>> > > misuse :/
>> > > 
>> > > You are missing one parent device struct apparently.
>> > > 
>> > > Oh, I think I see something like it. The unused "struct ionic_devlink".
>> > If I'm not mistaken, the alloc is only allocating enough for a pointer, not
>> > the whole per device struct, and a few lines down from here the pointer to
>> > the new devlink struct is assigned to ionic->dl.  This was based on what I
>> > found in the qed driver's qed_devlink_register(), and it all seems to work.
>> I'm not saying your code won't work. What I say is that you should have
>> a struct for device that would be allocated by devlink_alloc()
>
>Is there a particular reason why?  I appreciate that devlink_alloc() can give
>you this device specific space, just as alloc_etherdev_mq() can, but is there

Yes. Devlink manipulates with the whole device. However,
alloc_etherdev_mq() allocates only net_device. These are 2 different
things. devlink port relates 1:1 to net_device. However, devlink
instance can have multiple ports. What I say is do it correctly.


>a specific reason why this should be used instead of setting up simply a
>pointer to a space that has already been allocated?  There are several
>drivers that are using it the way I've setup here, which happened to be the
>first examples I followed - are they doing something different that makes
>this valid for them?

Nope. I'll look at that and fix.


>
>> 
>> The ionic struct should be associated with devlink_port. That you are
>> missing too.
>
>We don't support any of devlink_port features at this point, just the simple
>device information.

No problem, you can still register devlink_port. You don't have to do
much in order to do so.


>
>sln
>
>> 
>> 
>> > That unused struct ionic_devlink does need to go away, it was superfluous
>> > after working out a better typecast off of devlink_priv().
>> > 
>> > I'll remove the unused struct ionic_devlink, but I think the rest is okay.
>> > 
>> > sln
>> > 
>> > > 
>> > > > +	if (!dl) {
>> > > > +		dev_warn(ionic->dev, "devlink_alloc failed");
>> > > > +		return -ENOMEM;
>> > > > +	}
>> > > > +
>> > > > +	ip = (struct ionic **)devlink_priv(dl);
>> > > > +	*ip = ionic;
>> > > > +	ionic->dl = dl;
>> > > > +
>> > > > +	err = devlink_register(dl, ionic->dev);
>> > > > +	if (err) {
>> > > > +		dev_warn(ionic->dev, "devlink_register failed: %d\n", err);
>> > > > +		goto err_dl_free;
>> > > > +	}
>> > > > +
>> > > > +	return 0;
>> > > > +
>> > > > +err_dl_free:
>> > > > +	ionic->dl = NULL;
>> > > > +	devlink_free(dl);
>> > > > +	return err;
>> > > > +}
>> > > > +
>> > > > +void ionic_devlink_unregister(struct ionic *ionic)
>> > > > +{
>> > > > +	if (!ionic->dl)
>> > > > +		return;
>> > > > +
>> > > > +	devlink_unregister(ionic->dl);
>> > > > +	devlink_free(ionic->dl);
>> > > > +}
>> > > > diff --git a/drivers/net/ethernet/pensando/ionic/ionic_devlink.h b/drivers/net/ethernet/pensando/ionic/ionic_devlink.h
>> > > > new file mode 100644
>> > > > index 000000000000..35528884e29f
>> > > > --- /dev/null
>> > > > +++ b/drivers/net/ethernet/pensando/ionic/ionic_devlink.h
>> > > > @@ -0,0 +1,12 @@
>> > > > +/* SPDX-License-Identifier: GPL-2.0 */
>> > > > +/* Copyright(c) 2017 - 2019 Pensando Systems, Inc */
>> > > > +
>> > > > +#ifndef _IONIC_DEVLINK_H_
>> > > > +#define _IONIC_DEVLINK_H_
>> > > > +
>> > > > +#include <net/devlink.h>
>> > > > +
>> > > > +int ionic_devlink_register(struct ionic *ionic);
>> > > > +void ionic_devlink_unregister(struct ionic *ionic);
>> > > > +
>> > > > +#endif /* _IONIC_DEVLINK_H_ */
>> > > > -- 
>> > > > 2.17.1
>> > > > 
>

^ permalink raw reply

* Re: [PATCH net-next 11/16] qlge: Remove qlge_bq.len & size
From: Benjamin Poirier @ 2019-07-09  6:52 UTC (permalink / raw)
  To: Manish Chopra; +Cc: GR-Linux-NIC-Dev, netdev@vger.kernel.org
In-Reply-To: <DM6PR18MB2697E84A8DD54483832AD202ABFD0@DM6PR18MB2697.namprd18.prod.outlook.com>

On 2019/06/27 10:47, Manish Chopra wrote:
> > 
> > -	for (i = 0; i < qdev->rx_ring_count; i++) {
> > +	for (i = 0; i < qdev->rss_ring_count; i++) {
> >  		struct rx_ring *rx_ring = &qdev->rx_ring[i];
> > 
> > -		if (rx_ring->lbq.queue)
> > -			ql_free_lbq_buffers(qdev, rx_ring);
> > -		if (rx_ring->sbq.queue)
> > -			ql_free_sbq_buffers(qdev, rx_ring);
> > +		ql_free_lbq_buffers(qdev, rx_ring);
> > +		ql_free_sbq_buffers(qdev, rx_ring);
> >  	}
> >  }
> > 
> 
> Seems irrelevant change as per what this patch is supposed to do exactly.

Sure, I've removed that hunk.

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Leon Romanovsky @ 2019-07-09  6:43 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Doug Ledford, Jason Gunthorpe, David Miller, Networking,
	Linux Next Mailing List, Linux Kernel Mailing List,
	Bernard Metzler
In-Reply-To: <20190709135636.4d36e19f@canb.auug.org.au>

On Tue, Jul 09, 2019 at 01:56:36PM +1000, Stephen Rothwell wrote:
> Hi all,
>
> After merging the net-next tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
>
> drivers/infiniband/sw/siw/siw_cm.c: In function 'siw_create_listen':
> drivers/infiniband/sw/siw/siw_cm.c:1978:3: error: implicit declaration of function 'for_ifa'; did you mean 'fork_idle'? [-Werror=implicit-function-declaration]
>    for_ifa(in_dev)
>    ^~~~~~~
>    fork_idle
> drivers/infiniband/sw/siw/siw_cm.c:1978:18: error: expected ';' before '{' token
>    for_ifa(in_dev)
>                   ^
>                   ;
>    {
>    ~
>
> Caused by commit
>
>   6c52fdc244b5 ("rdma/siw: connection management")
>
> from the rdma tree.  I don't know why this didn't fail after I mereged
> that tree.

I had the same question, because I have this fix for a couple of days already.

From 56c9e15ec670af580daa8c3ffde9503af3042d67 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro@mellanox.com>
Date: Sun, 7 Jul 2019 10:43:42 +0300
Subject: [PATCH] Fixup to build SIW issue

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/sw/siw/siw_cm.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c
index 8e618cb7261f..c883bf514341 100644
--- a/drivers/infiniband/sw/siw/siw_cm.c
+++ b/drivers/infiniband/sw/siw/siw_cm.c
@@ -1954,6 +1954,7 @@ static void siw_drop_listeners(struct iw_cm_id *id)
 int siw_create_listen(struct iw_cm_id *id, int backlog)
 {
 	struct net_device *dev = to_siw_dev(id->device)->netdev;
+	const struct in_ifaddr *ifa;
 	int rv = 0, listeners = 0;

 	siw_dbg(id->device, "id 0x%p: backlog %d\n", id, backlog);
@@ -1975,8 +1976,7 @@ int siw_create_listen(struct iw_cm_id *id, int backlog)
 			id, &s_laddr.sin_addr, ntohs(s_laddr.sin_port),
 			&s_raddr->sin_addr, ntohs(s_raddr->sin_port));

-		for_ifa(in_dev)
-		{
+		in_dev_for_each_ifa_rcu(ifa, in_dev) {
 			if (ipv4_is_zeronet(s_laddr.sin_addr.s_addr) ||
 			    s_laddr.sin_addr.s_addr == ifa->ifa_address) {
 				s_laddr.sin_addr.s_addr = ifa->ifa_address;
@@ -1988,7 +1988,6 @@ int siw_create_listen(struct iw_cm_id *id, int backlog)
 					listeners++;
 			}
 		}
-		endfor_ifa(in_dev);
 		in_dev_put(in_dev);
 	} else if (id->local_addr.ss_family == AF_INET6) {
 		struct inet6_dev *in6_dev = in6_dev_get(dev);
--
2.21.0


>
> I have marked that driver as depending on BROKEN for today.
>
> --
> Cheers,
> Stephen Rothwell



^ permalink raw reply related

* Re: [RFC v2] vhost: introduce mdev based hardware vhost backend
From: Tiwei Bie @ 2019-07-09  6:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: Alex Williamson, mst, maxime.coquelin, linux-kernel, kvm,
	virtualization, netdev, dan.daly, cunming.liang, zhihong.wang,
	idos, Rob Miller, Ariel Adam
In-Reply-To: <deae5ede-57e9-41e6-ea42-d84e07ca480a@redhat.com>

On Tue, Jul 09, 2019 at 10:50:38AM +0800, Jason Wang wrote:
> On 2019/7/8 下午2:16, Tiwei Bie wrote:
> > On Fri, Jul 05, 2019 at 08:49:46AM -0600, Alex Williamson wrote:
> > > On Thu, 4 Jul 2019 14:21:34 +0800
> > > Tiwei Bie <tiwei.bie@intel.com> wrote:
> > > > On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> > > > > On 2019/7/3 下午9:08, Tiwei Bie wrote:
> > > > > > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> > > > > > > On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > > > > > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > > > > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > > > > > > > Details about this can be found here:
> > > > > > > > > > 
> > > > > > > > > > https://lwn.net/Articles/750770/
> > > > > > > > > > 
> > > > > > > > > > What's new in this version
> > > > > > > > > > ==========================
> > > > > > > > > > 
> > > > > > > > > > A new VFIO device type is introduced - vfio-vhost. This addressed
> > > > > > > > > > some comments from here:https://patchwork.ozlabs.org/cover/984763/
> > > > > > > > > > 
> > > > > > > > > > Below is the updated device interface:
> > > > > > > > > > 
> > > > > > > > > > Currently, there are two regions of this device: 1) CONFIG_REGION
> > > > > > > > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > > > > > > > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > > > > > > > can be used to notify the device.
> > > > > > > > > > 
> > > > > > > > > > 1. CONFIG_REGION
> > > > > > > > > > 
> > > > > > > > > > The region described by CONFIG_REGION is the main control interface.
> > > > > > > > > > Messages will be written to or read from this region.
> > > > > > > > > > 
> > > > > > > > > > The message type is determined by the `request` field in message
> > > > > > > > > > header. The message size is encoded in the message header too.
> > > > > > > > > > The message format looks like this:
> > > > > > > > > > 
> > > > > > > > > > struct vhost_vfio_op {
> > > > > > > > > > 	__u64 request;
> > > > > > > > > > 	__u32 flags;
> > > > > > > > > > 	/* Flag values: */
> > > > > > > > > >      #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > > > > > > > > > 	__u32 size;
> > > > > > > > > > 	union {
> > > > > > > > > > 		__u64 u64;
> > > > > > > > > > 		struct vhost_vring_state state;
> > > > > > > > > > 		struct vhost_vring_addr addr;
> > > > > > > > > > 	} payload;
> > > > > > > > > > };
> > > > > > > > > > 
> > > > > > > > > > The existing vhost-kernel ioctl cmds are reused as the message
> > > > > > > > > > requests in above structure.
> > > > > > > > > Still a comments like V1. What's the advantage of inventing a new protocol?
> > > > > > > > I'm trying to make it work in VFIO's way..
> > > > > > > > > I believe either of the following should be better:
> > > > > > > > > 
> > > > > > > > > - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
> > > > > > > > > extend it with e.g notify region. The advantages is that all exist userspace
> > > > > > > > > program could be reused without modification (or minimal modification). And
> > > > > > > > > vhost API hides lots of details that is not necessary to be understood by
> > > > > > > > > application (e.g in the case of container).
> > > > > > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > > > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > > > > > using the existing vfio_mdev) for mdev device?
> > > > > > > Can we simply add them into ioctl of mdev_parent_ops?
> > > > > > Right, either way, these ioctls have to be and just need to be
> > > > > > added in the ioctl of the mdev_parent_ops. But another thing we
> > > > > > also need to consider is that which file descriptor the userspace
> > > > > > will do the ioctl() on. So I'm wondering do you mean let the
> > > > > > userspace do the ioctl() on the VFIO device fd of the mdev
> > > > > > device?
> > > > > Yes.
> > > > Got it! I'm not sure what's Alex opinion on this. If we all
> > > > agree with this, I can do it in this way.
> > > > 
> > > > > Is there any other way btw?
> > > > Just a quick thought.. Maybe totally a bad idea. I was thinking
> > > > whether it would be odd to do non-VFIO's ioctls on VFIO's device
> > > > fd. So I was wondering whether it's possible to allow binding
> > > > another mdev driver (e.g. vhost_mdev) to the supported mdev
> > > > devices. The new mdev driver, vhost_mdev, can provide similar
> > > > ways to let userspace open the mdev device and do the vhost ioctls
> > > > on it. To distinguish with the vfio_mdev compatible mdev devices,
> > > > the device API of the new vhost_mdev compatible mdev devices
> > > > might be e.g. "vhost-net" for net?
> > > > 
> > > > So in VFIO case, the device will be for passthru directly. And
> > > > in VHOST case, the device can be used to accelerate the existing
> > > > virtualized devices.
> > > > 
> > > > How do you think?
> > > VFIO really can't prevent vendor specific ioctls on the device file
> > > descriptor for mdevs, but a) we'd want to be sure the ioctl address
> > > space can't collide with ioctls we'd use for vfio defined purposes and
> > > b) maybe the VFIO user API isn't what you want in the first place if
> > > you intend to mostly/entirely ignore the defined ioctl set and replace
> > > them with your own.  In the case of the latter, you're also not getting
> > > the advantages of the existing VFIO userspace code, so why expose a
> > > VFIO device at all.
> > Yeah, I totally agree.
> 
> 
> I guess the original idea is to reuse the VFIO DMA/IOMMU API for this. Then
> we have the chance to reuse vfio codes in qemu for dealing with e.g vIOMMU.

Yeah, you are right. We have several choices here:

#1. We expose a VFIO device, so we can reuse the VFIO container/group
    based DMA API and potentially reuse a lot of VFIO code in QEMU.

    But in this case, we have two choices for the VFIO device interface
    (i.e. the interface on top of VFIO device fd):

    A) we may invent a new vhost protocol (as demonstrated by the code
       in this RFC) on VFIO device fd to make it work in VFIO's way,
       i.e. regions and irqs.

    B) Or as you proposed, instead of inventing a new vhost protocol,
       we can reuse most existing vhost ioctls on the VFIO device fd
       directly. There should be no conflicts between the VFIO ioctls
       (type is 0x3B) and VHOST ioctls (type is 0xAF) currently.

#2. Instead of exposing a VFIO device, we may expose a VHOST device.
    And we will introduce a new mdev driver vhost-mdev to do this.
    It would be natural to reuse the existing kernel vhost interface
    (ioctls) on it as much as possible. But we will need to invent
    some APIs for DMA programming (reusing VHOST_SET_MEM_TABLE is a
    choice, but it's too heavy and doesn't support vIOMMU by itself).

I'm not sure which one is the best choice we all want..
Which one (#1/A, #1/B, or #2) would you prefer?

> 
> 
> > 
> > > The mdev interface does provide a general interface for creating and
> > > managing virtual devices, vfio-mdev is just one driver on the mdev
> > > bus.  Parav (Mellanox) has been doing work on mdev-core to help clean
> > > out vfio-isms from the interface, aiui, with the intent of implementing
> > > another mdev bus driver for using the devices within the kernel.
> > Great to know this! I found below series after some searching:
> > 
> > https://lkml.org/lkml/2019/3/8/821
> > 
> > In above series, the new mlx5_core mdev driver will do the probe
> > by calling mlx5_get_core_dev() first on the parent device of the
> > mdev device. In vhost_mdev, maybe we can also keep track of all
> > the compatible mdev devices and use this info to do the probe.
> 
> 
> I don't get why this is needed. My understanding is if we want to go this
> way, there're actually two parts. 1) Vhost mdev that implements the device
> managements and vhost ioctl. 2) Vhost it self, which can accept mdev fd as
> it backend through VHOST_NET_SET_BACKEND.

I think with vhost-mdev (or with vfio-mdev if we agree to do vhost
ioctls on vfio device fd directly), we don't need to open /dev/vhost-net
(and there is no VHOST_NET_SET_BACKEND needed) at all. Either way,
after getting the fd of the mdev, we just need to do vhost ioctls
on it directly.

> 
> 
> > But we also need a way to allow vfio_mdev driver to distinguish
> > and reject the incompatible mdev devices.
> 
> 
> One issue for this series is that it doesn't consider DMA isolation at all.
> 
> 
> > 
> > > It
> > > seems like this vhost-mdev driver might be similar, using mdev but not
> > > necessarily vfio-mdev to expose devices.  Thanks,
> > Yeah, I also think so!
> 
> 
> I've cced some driver developers for their inputs. I think we need a sample
> parent drivers in the next version for us to understand the full picture.
> 
> 
> Thanks
> 
> 
> > 
> > Thanks!
> > Tiwei
> > 
> > > Alex

^ permalink raw reply

* Re: [PATCH net-next v2 0/4] bnxt_en: Add XDP_REDIRECT support.
From: Ilias Apalodimas @ 2019-07-09  6:30 UTC (permalink / raw)
  To: David Miller; +Cc: michael.chan, gospo, netdev, hawk, ast
In-Reply-To: <20190708.152020.327516269485719584.davem@davemloft.net>

Hi David, 

On Mon, Jul 08, 2019 at 03:20:20PM -0700, David Miller wrote:
> From: Michael Chan <michael.chan@broadcom.com>
> Date: Mon,  8 Jul 2019 17:53:00 -0400
> 
> > This patch series adds XDP_REDIRECT support by Andy Gospodarek.
> 
> Series applied, thanks everyone.

We need a fix on this after merging Ivans patch
commit 1da4bbeffe41ba318812d7590955faee8636668b

page_pool_destroy needs to be explicitely called when shutting down the
interface as it's not automatically called from xdp_rxq_info_unreg()

Thanks
/Ilias

^ permalink raw reply

* Re: [PATCH net-next v2 4/4] bnxt_en: add page_pool support
From: Ilias Apalodimas @ 2019-07-09  6:27 UTC (permalink / raw)
  To: Michael Chan; +Cc: davem, gospo, netdev, hawk, ast
In-Reply-To: <1562622784-29918-5-git-send-email-michael.chan@broadcom.com>

Hi Andy, Michael,

On Mon, Jul 08, 2019 at 05:53:04PM -0400, Michael Chan wrote:
> From: Andy Gospodarek <gospo@broadcom.com>
> 
> This removes contention over page allocation for XDP_REDIRECT actions by
> adding page_pool support per queue for the driver.  The performance for
> XDP_REDIRECT actions scales linearly with the number of cores performing
> redirect actions when using the page pools instead of the standard page
> allocator.
> 
> v2: Fix up the error path from XDP registration, noted by Ilias Apalodimas.
> 
> Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> ---
>  drivers/net/ethernet/broadcom/Kconfig         |  1 +
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 47 +++++++++++++++++++++++----
>  drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  3 ++
>  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  3 +-
>  4 files changed, 47 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
> index 2e4a8c7..e9017ca 100644
> --- a/drivers/net/ethernet/broadcom/Kconfig
> +++ b/drivers/net/ethernet/broadcom/Kconfig
> @@ -199,6 +199,7 @@ config BNXT
>  	select FW_LOADER
>  	select LIBCRC32C
>  	select NET_DEVLINK
> +	select PAGE_POOL
>  	---help---
>  	  This driver supports Broadcom NetXtreme-C/E 10/25/40/50 gigabit
>  	  Ethernet cards.  To compile this driver as a module, choose M here:
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index d8f0846..d25bb38 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -54,6 +54,7 @@
>  #include <net/pkt_cls.h>
>  #include <linux/hwmon.h>
>  #include <linux/hwmon-sysfs.h>
> +#include <net/page_pool.h>
>  
>  #include "bnxt_hsi.h"
>  #include "bnxt.h"
> @@ -668,19 +669,20 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
>  }
>  
>  static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
> +					 struct bnxt_rx_ring_info *rxr,
>  					 gfp_t gfp)
>  {
>  	struct device *dev = &bp->pdev->dev;
>  	struct page *page;
>  
> -	page = alloc_page(gfp);
> +	page = page_pool_dev_alloc_pages(rxr->page_pool);
>  	if (!page)
>  		return NULL;
>  
>  	*mapping = dma_map_page_attrs(dev, page, 0, PAGE_SIZE, bp->rx_dir,
>  				      DMA_ATTR_WEAK_ORDERING);
>  	if (dma_mapping_error(dev, *mapping)) {
> -		__free_page(page);
> +		page_pool_recycle_direct(rxr->page_pool, page);
>  		return NULL;
>  	}
>  	*mapping += bp->rx_dma_offset;
> @@ -716,7 +718,8 @@ int bnxt_alloc_rx_data(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
>  	dma_addr_t mapping;
>  
>  	if (BNXT_RX_PAGE_MODE(bp)) {
> -		struct page *page = __bnxt_alloc_rx_page(bp, &mapping, gfp);
> +		struct page *page =
> +			__bnxt_alloc_rx_page(bp, &mapping, rxr, gfp);
>  
>  		if (!page)
>  			return -ENOMEM;
> @@ -2360,7 +2363,7 @@ static void bnxt_free_rx_skbs(struct bnxt *bp)
>  				dma_unmap_page_attrs(&pdev->dev, mapping,
>  						     PAGE_SIZE, bp->rx_dir,
>  						     DMA_ATTR_WEAK_ORDERING);
> -				__free_page(data);
> +				page_pool_recycle_direct(rxr->page_pool, data);
>  			} else {
>  				dma_unmap_single_attrs(&pdev->dev, mapping,
>  						       bp->rx_buf_use_size,
> @@ -2497,6 +2500,8 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
>  		if (xdp_rxq_info_is_reg(&rxr->xdp_rxq))
>  			xdp_rxq_info_unreg(&rxr->xdp_rxq);
>  
> +		rxr->page_pool = NULL;
> +
>  		kfree(rxr->rx_tpa);
>  		rxr->rx_tpa = NULL;
>  
> @@ -2511,6 +2516,26 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
>  	}
>  }
>  
> +static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
> +				   struct bnxt_rx_ring_info *rxr)
> +{
> +	struct page_pool_params pp = { 0 };
> +
> +	pp.pool_size = bp->rx_ring_size;
> +	pp.nid = dev_to_node(&bp->pdev->dev);
> +	pp.dev = &bp->pdev->dev;
> +	pp.dma_dir = DMA_BIDIRECTIONAL;
> +
> +	rxr->page_pool = page_pool_create(&pp);
> +	if (IS_ERR(rxr->page_pool)) {
> +		int err = PTR_ERR(rxr->page_pool);
> +
> +		rxr->page_pool = NULL;
> +		return err;
> +	}
> +	return 0;
> +}
> +
>  static int bnxt_alloc_rx_rings(struct bnxt *bp)
>  {
>  	int i, rc, agg_rings = 0, tpa_rings = 0;
> @@ -2530,14 +2555,24 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
>  
>  		ring = &rxr->rx_ring_struct;
>  
> +		rc = bnxt_alloc_rx_page_pool(bp, rxr);
> +		if (rc)
> +			return rc;
> +
>  		rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i);
> -		if (rc < 0)
> +		if (rc < 0) {
> +			page_pool_free(rxr->page_pool);

commit 1da4bbeffe41ba318812d7590955faee8636668b has been merged. Can you please
change this to page_pool_destroy(). Please note that this jhas to change for the
'normal' shutdown path as well (calling page_pool_destroy after
xdp_rxq_info_unreg)

> +			rxr->page_pool = NULL;
>  			return rc;
> +		}
>  
>  		rc = xdp_rxq_info_reg_mem_model(&rxr->xdp_rxq,
> -						MEM_TYPE_PAGE_SHARED, NULL);
> +						MEM_TYPE_PAGE_POOL,
> +						rxr->page_pool);
>  		if (rc) {
>  			xdp_rxq_info_unreg(&rxr->xdp_rxq);
> +			page_pool_free(rxr->page_pool);

ditto

> +			rxr->page_pool = NULL;
>  			return rc;
>  		}
>  
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> index 8ac51fa..16694b7 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> @@ -26,6 +26,8 @@
>  #include <net/xdp.h>
>  #include <linux/dim.h>
>  
> +struct page_pool;
> +
>  struct tx_bd {
>  	__le32 tx_bd_len_flags_type;
>  	#define TX_BD_TYPE					(0x3f << 0)
> @@ -799,6 +801,7 @@ struct bnxt_rx_ring_info {
>  	struct bnxt_ring_struct	rx_ring_struct;
>  	struct bnxt_ring_struct	rx_agg_ring_struct;
>  	struct xdp_rxq_info	xdp_rxq;
> +	struct page_pool	*page_pool;
>  };
>  
>  struct bnxt_cp_ring_info {
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> index 12489d2..c6f6f20 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> @@ -15,6 +15,7 @@
>  #include <linux/bpf.h>
>  #include <linux/bpf_trace.h>
>  #include <linux/filter.h>
> +#include <net/page_pool.h>
>  #include "bnxt_hsi.h"
>  #include "bnxt.h"
>  #include "bnxt_xdp.h"
> @@ -191,7 +192,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
>  
>  		if (xdp_do_redirect(bp->dev, &xdp, xdp_prog)) {
>  			trace_xdp_exception(bp->dev, xdp_prog, act);
> -			__free_page(page);
> +			page_pool_recycle_direct(rxr->page_pool, page);
>  			return true;
>  		}
>  
> -- 
> 2.5.1
> 

Thanks and sorry for the inconvenience :(
/Ilias

^ permalink raw reply

* Re: [PATCH net-next,v3 11/11] netfilter: nf_tables: add hardware offload support
From: Jiri Pirko @ 2019-07-09  6:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Pablo Neira Ayuso, netdev, davem, thomas.lendacky, f.fainelli,
	ariel.elior, michael.chan, madalin.bucur, yisen.zhuang,
	salil.mehta, jeffrey.t.kirsher, tariqt, saeedm, jiri, idosch,
	peppe.cavallaro, grygorii.strashko, andrew, vivien.didelot,
	alexandre.torgue, joabreu, linux-net-drivers, ogerlitz,
	Manish.Chopra, marcelo.leitner, mkubecek, venkatkumar.duvvuru,
	maxime.chevallier, cphealy, netfilter-devel
In-Reply-To: <20190708184437.4d29648a@cakuba.netronome.com>

Tue, Jul 09, 2019 at 03:44:37AM CEST, jakub.kicinski@netronome.com wrote:
>On Mon,  8 Jul 2019 18:06:13 +0200, Pablo Neira Ayuso wrote:
>> This patch adds hardware offload support for nftables through the
>> existing netdev_ops->ndo_setup_tc() interface, the TC_SETUP_CLSFLOWER
>> classifier and the flow rule API. This hardware offload support is
>> available for the NFPROTO_NETDEV family and the ingress hook.
>> 
>> Each nftables expression has a new ->offload interface, that is used to
>> populate the flow rule object that is attached to the transaction
>> object.
>> 
>> There is a new per-table NFT_TABLE_F_HW flag, that is set on to offload
>> an entire table, including all of its chains.
>> 
>> This patch supports for basic metadata (layer 3 and 4 protocol numbers),
>> 5-tuple payload matching and the accept/drop actions; this also includes
>> basechain hardware offload only.
>> 
>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>
>Any particular reason to not fence this off with a device feature
>(ethtool -k)?  Then you wouldn't need that per-driver list abomination
>until drivers start advertising it..  IDK if we want the per-device
>offload enable flags or not in general, it seems like a good idea in
>general for admin to be able to disable offload per device 🤷
>
>> +static int nft_flow_offload_rule(struct nft_trans *trans,
>> +				 enum tc_fl_command command)
>> +{
>> +	struct nft_flow_rule *flow = nft_trans_flow_rule(trans);
>> +	struct nft_rule *rule = nft_trans_rule(trans);
>> +	struct tc_cls_flower_offload cls_flower = {};
>> +	struct nft_base_chain *basechain;
>> +	struct netlink_ext_ack extack;
>> +	__be16 proto = ETH_P_ALL;
>> +
>> +	if (!nft_is_base_chain(trans->ctx.chain))
>> +		return -EOPNOTSUPP;
>> +
>> +	basechain = nft_base_chain(trans->ctx.chain);
>> +
>> +	if (flow)
>> +		proto = flow->proto;
>> +
>> +	nft_flow_offload_common_init(&cls_flower.common, proto, &extack);
>> +	cls_flower.command = command;
>> +	cls_flower.cookie = (unsigned long) rule;
>> +	if (flow)
>> +		cls_flower.rule = flow->rule;
>> +
>> +	return nft_setup_cb_call(basechain, TC_SETUP_CLSFLOWER, &cls_flower);
>> +}
>
>Are we 100% okay with using TC cls_flower structures and defines in nft
>code?

Yeah, your right. Should be renamed and moved to "flow offload" as well.

^ permalink raw reply

* RE: [PATCH net-next v6 0/5] devlink: Introduce PCI PF, VF ports and attributes
From: Parav Pandit @ 2019-07-09  6:20 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev@vger.kernel.org, Jiri Pirko, Saeed Mahameed
In-Reply-To: <20190708224012.0280846c@cakuba.netronome.com>

Hi Jakub,

> -----Original Message-----
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Sent: Tuesday, July 9, 2019 11:10 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: netdev@vger.kernel.org; Jiri Pirko <jiri@mellanox.com>; Saeed Mahameed
> <saeedm@mellanox.com>
> Subject: Re: [PATCH net-next v6 0/5] devlink: Introduce PCI PF, VF ports and
> attributes
> 
> On Mon,  8 Jul 2019 23:17:34 -0500, Parav Pandit wrote:
> > This patchset carry forwards the work initiated in [1] and discussion
> > futher concluded at [2].
> >
> > To improve visibility of representor netdevice, its association with
> > PF or VF, physical port, two new devlink port flavours are added as
> > PCI PF and PCI VF ports.
> >
> > A sample eswitch view can be seen below, which will be futher extended
> > to mdev subdevices of a PCI function in future.
> >
> > Patch-1 moves physical port's attribute to new structure
> > Patch-2 enhances netlink response to consider port flavour
> > Patch-3,4 extends devlink port attributes and port flavour
> > Patch-5 extends mlx5 driver to register devlink ports for PF, VF and
> > physical link.
> 
> The coding leaves something to be desired:
> 
> 1) flavour handling in devlink_nl_port_attrs_put() really calls for a
>    switch statement,
> 2) devlink_port_attrs_.*set() can take a pointer to flavour specific
>    structure instead of attr structure for setting the parameters,
> 3) the "ret" variable there is unnecessary,
> 4) there is inconsistency in whether there is an empty line between
>    if (ret) return; after __devlink_port_attrs_set() and attr setting,
> 5) /* Associated PCI VF for of the PCI PF for this port. */ doesn't
>    read great;
> 6) mlx5 functions should preferably have an appropriate prefix - f.e.
>    register_devlink_port() or is_devlink_port_supported().
> 
Those two static helper functions doesn't need mlx5_ prefix.
ndo ops are prefixed appropriately.

> But I'll leave it to Jiri and Dave to decide if its worth a respin :) Functionally I
> think this is okay.
> 
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

^ permalink raw reply

* Re: [PATCH net-next v6 0/5] devlink: Introduce PCI PF, VF ports and attributes
From: Jiri Pirko @ 2019-07-09  6:17 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Parav Pandit, netdev, jiri, saeedm
In-Reply-To: <20190708224012.0280846c@cakuba.netronome.com>

Tue, Jul 09, 2019 at 07:40:12AM CEST, jakub.kicinski@netronome.com wrote:
>On Mon,  8 Jul 2019 23:17:34 -0500, Parav Pandit wrote:
>> This patchset carry forwards the work initiated in [1] and discussion
>> futher concluded at [2].
>> 
>> To improve visibility of representor netdevice, its association with
>> PF or VF, physical port, two new devlink port flavours are added as
>> PCI PF and PCI VF ports.
>> 
>> A sample eswitch view can be seen below, which will be futher extended to
>> mdev subdevices of a PCI function in future.
>> 
>> Patch-1 moves physical port's attribute to new structure
>> Patch-2 enhances netlink response to consider port flavour
>> Patch-3,4 extends devlink port attributes and port flavour
>> Patch-5 extends mlx5 driver to register devlink ports for PF, VF and
>> physical link.
>
>The coding leaves something to be desired:
>
>1) flavour handling in devlink_nl_port_attrs_put() really calls for a
>   switch statement,
>2) devlink_port_attrs_.*set() can take a pointer to flavour specific
>   structure instead of attr structure for setting the parameters,
>3) the "ret" variable there is unnecessary,
>4) there is inconsistency in whether there is an empty line between
>   if (ret) return; after __devlink_port_attrs_set() and attr setting,
>5) /* Associated PCI VF for of the PCI PF for this port. */ doesn't
>   read great;
>6) mlx5 functions should preferably have an appropriate prefix - f.e.
>   register_devlink_port() or is_devlink_port_supported().
>
>But I'll leave it to Jiri and Dave to decide if its worth a respin :)
>Functionally I think this is okay.
>

I'm happy with the set as it is right now. Anyway, if you want your
concerns to be addresses, you should write them to the appropriate code.
This list is hard to follow.


>Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

^ permalink raw reply

* Re: [bpf PATCH v2 0/6] bpf: sockmap/tls fixes
From: Jakub Kicinski @ 2019-07-09  6:13 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf
In-Reply-To: <156261310104.31108.4569969631798277807.stgit@ubuntu3-kvm1>

On Mon, 08 Jul 2019 19:13:29 +0000, John Fastabend wrote:
> Resolve a series of splats discovered by syzbot and an unhash
> TLS issue noted by Eric Dumazet.
> 
> The main issues revolved around interaction between TLS and
> sockmap tear down. TLS and sockmap could both reset sk->prot
> ops creating a condition where a close or unhash op could be
> called forever. A rare race condition resulting from a missing
> rcu sync operation was causing a use after free. Then on the
> TLS side dropping the sock lock and re-acquiring it during the
> close op could hang. Finally, sockmap must be deployed before
> tls for current stack assumptions to be met. This is enforced
> now. A feature series can enable it.
> 
> To fix this first refactor TLS code so the lock is held for the
> entire teardown operation. Then add an unhash callback to ensure
> TLS can not transition from ESTABLISHED to LISTEN state. This
> transition is a similar bug to the one found and fixed previously
> in sockmap. Then apply three fixes to sockmap to fix up races
> on tear down around map free and close. Finally, if sockmap
> is destroyed before TLS we add a new ULP op update to inform
> the TLS stack it should not call sockmap ops. This last one
> appears to be the most commonly found issue from syzbot.

Looks like strparser is not done'd for offload?

About patch 6 - I was recently wondering about the "impossible" syzbot
report where context is not freed and my conclusion was that there
can be someone sitting at lock_sock() in tcp_close() already by the
time we start installing the ULP, so TLS's close will never get called.
The entire replacing of callbacks business is really shaky :(

Perhaps I'm rumbling, I will take a close look after I get some sleep :)

^ permalink raw reply

* Re: [PATCH net-next v6 0/5] devlink: Introduce PCI PF, VF ports and attributes
From: Jakub Kicinski @ 2019-07-09  5:40 UTC (permalink / raw)
  To: Parav Pandit; +Cc: netdev, jiri, saeedm
In-Reply-To: <20190709041739.44292-1-parav@mellanox.com>

On Mon,  8 Jul 2019 23:17:34 -0500, Parav Pandit wrote:
> This patchset carry forwards the work initiated in [1] and discussion
> futher concluded at [2].
> 
> To improve visibility of representor netdevice, its association with
> PF or VF, physical port, two new devlink port flavours are added as
> PCI PF and PCI VF ports.
> 
> A sample eswitch view can be seen below, which will be futher extended to
> mdev subdevices of a PCI function in future.
> 
> Patch-1 moves physical port's attribute to new structure
> Patch-2 enhances netlink response to consider port flavour
> Patch-3,4 extends devlink port attributes and port flavour
> Patch-5 extends mlx5 driver to register devlink ports for PF, VF and
> physical link.

The coding leaves something to be desired:

1) flavour handling in devlink_nl_port_attrs_put() really calls for a
   switch statement,
2) devlink_port_attrs_.*set() can take a pointer to flavour specific
   structure instead of attr structure for setting the parameters,
3) the "ret" variable there is unnecessary,
4) there is inconsistency in whether there is an empty line between
   if (ret) return; after __devlink_port_attrs_set() and attr setting,
5) /* Associated PCI VF for of the PCI PF for this port. */ doesn't
   read great;
6) mlx5 functions should preferably have an appropriate prefix - f.e.
   register_devlink_port() or is_devlink_port_supported().

But I'll leave it to Jiri and Dave to decide if its worth a respin :)
Functionally I think this is okay.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

^ permalink raw reply

* Re: [PATCH v3 net-next 13/19] ionic: Add initial ethtool support
From: Michal Kubecek @ 2019-07-09  5:25 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev
In-Reply-To: <20190708192532.27420-14-snelson@pensando.io>

On Mon, Jul 08, 2019 at 12:25:26PM -0700, Shannon Nelson wrote:
> Add in the basic ethtool callbacks for device information
> and control.
> 
> Signed-off-by: Shannon Nelson <snelson@pensando.io>
> ---
>  drivers/net/ethernet/pensando/ionic/Makefile  |   2 +-
>  .../net/ethernet/pensando/ionic/ionic_dev.h   |   3 +
>  .../ethernet/pensando/ionic/ionic_ethtool.c   | 509 ++++++++++++++++++
>  .../ethernet/pensando/ionic/ionic_ethtool.h   |   9 +
>  .../net/ethernet/pensando/ionic/ionic_lif.c   |   2 +
>  .../net/ethernet/pensando/ionic/ionic_lif.h   |   8 +
>  6 files changed, 532 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
>  create mode 100644 drivers/net/ethernet/pensando/ionic/ionic_ethtool.h
...
> +static int ionic_set_link_ksettings(struct net_device *netdev,
> +				    const struct ethtool_link_ksettings *ks)
> +{
> +	struct lif *lif = netdev_priv(netdev);
> +	struct ionic *ionic = lif->ionic;
> +	struct ionic_dev *idev = &lif->ionic->idev;
> +	u8 fec_type = PORT_FEC_TYPE_NONE;
> +	u32 req_rs, req_fc;
> +	int err = 0;
> +
> +	/* set autoneg */
> +	if (ks->base.autoneg != idev->port_info->config.an_enable) {
> +		mutex_lock(&ionic->dev_cmd_lock);
> +		ionic_dev_cmd_port_autoneg(idev, ks->base.autoneg);
> +		err = ionic_dev_cmd_wait(ionic, devcmd_timeout);
> +		mutex_unlock(&ionic->dev_cmd_lock);
> +		if (err)
> +			return err;
> +
> +		idev->port_info->config.an_enable = ks->base.autoneg;
> +	}
> +
> +	/* set speed */
> +	if (ks->base.speed != le32_to_cpu(idev->port_info->config.speed)) {
> +		mutex_lock(&ionic->dev_cmd_lock);
> +		ionic_dev_cmd_port_speed(idev, ks->base.speed);
> +		err = ionic_dev_cmd_wait(ionic, devcmd_timeout);
> +		mutex_unlock(&ionic->dev_cmd_lock);
> +		if (err)
> +			return err;
> +
> +		idev->port_info->config.speed = cpu_to_le32(ks->base.speed);
> +	}
> +
> +	/* set FEC */
> +	req_rs = ethtool_link_ksettings_test_link_mode(ks, advertising, FEC_RS);
> +	req_fc = ethtool_link_ksettings_test_link_mode(ks, advertising, FEC_BASER);
> +	if (req_rs && req_fc) {
> +		netdev_info(netdev, "Only select one FEC mode at a time\n");
> +		return -EINVAL;
> +
> +	} else if (req_fc &&
> +		   idev->port_info->config.fec_type != PORT_FEC_TYPE_FC) {
> +		fec_type = PORT_FEC_TYPE_FC;
> +	} else if (req_rs &&
> +		   idev->port_info->config.fec_type != PORT_FEC_TYPE_RS) {
> +		fec_type = PORT_FEC_TYPE_RS;
> +	} else if (!(req_rs | req_fc) &&
> +		 idev->port_info->config.fec_type != PORT_FEC_TYPE_NONE) {
> +		fec_type = PORT_FEC_TYPE_NONE;
> +	}
> +
> +	if (fec_type != idev->port_info->config.fec_type) {
> +		mutex_lock(&ionic->dev_cmd_lock);
> +		ionic_dev_cmd_port_fec(idev, PORT_FEC_TYPE_NONE);

The second argument should be fec_type, I believe.

> +		err = ionic_dev_cmd_wait(ionic, devcmd_timeout);
> +		mutex_unlock(&ionic->dev_cmd_lock);
> +		if (err)
> +			return err;
> +
> +		idev->port_info->config.fec_type = fec_type;
> +	}
> +
> +	return 0;
> +}
...
> +static int ionic_set_ringparam(struct net_device *netdev,
> +			       struct ethtool_ringparam *ring)
> +{
> +	struct lif *lif = netdev_priv(netdev);
> +	bool running;
> +	int i, j;
> +
> +	if (ring->rx_mini_pending || ring->rx_jumbo_pending) {
> +		netdev_info(netdev, "Changing jumbo or mini descriptors not supported\n");
> +		return -EINVAL;
> +	}
> +
> +	i = ring->tx_pending & (ring->tx_pending - 1);
> +	j = ring->rx_pending & (ring->rx_pending - 1);
> +	if (i || j) {
> +		netdev_info(netdev, "Descriptor count must be a power of 2\n");
> +		return -EINVAL;
> +	}

You can use is_power_of_2() here (it wouldn't allow 0 but you probably
don't want to allow that either).

Michal

> +
> +	/* if nothing to do return success */
> +	if (ring->tx_pending == lif->ntxq_descs &&
> +	    ring->rx_pending == lif->nrxq_descs)
> +		return 0;
> +
> +	while (test_and_set_bit(LIF_QUEUE_RESET, lif->state))
> +		usleep_range(200, 400);
> +
> +	running = test_bit(LIF_UP, lif->state);
> +	if (running)
> +		ionic_stop(netdev);
> +
> +	lif->ntxq_descs = ring->tx_pending;
> +	lif->nrxq_descs = ring->rx_pending;
> +
> +	if (running)
> +		ionic_open(netdev);
> +	clear_bit(LIF_QUEUE_RESET, lif->state);
> +
> +	return 0;
> +}

^ permalink raw reply

* Re: [PATCH v3 net-next 13/19] ionic: Add initial ethtool support
From: Michal Kubecek @ 2019-07-09  5:15 UTC (permalink / raw)
  To: netdev; +Cc: Andrew Lunn, Shannon Nelson
In-Reply-To: <20190708220406.GB17857@lunn.ch>

On Tue, Jul 09, 2019 at 12:04:06AM +0200, Andrew Lunn wrote:
> > +static int ionic_get_link_ksettings(struct net_device *netdev,
> > +				    struct ethtool_link_ksettings *ks)
> > +{
> > +	struct lif *lif = netdev_priv(netdev);
> > +	struct ionic_dev *idev = &lif->ionic->idev;
> > +	int copper_seen = 0;
> > +
> > +	ethtool_link_ksettings_zero_link_mode(ks, supported);
> > +	ethtool_link_ksettings_zero_link_mode(ks, advertising);
> > +
> > +	switch (le16_to_cpu(idev->port_info->status.xcvr.pid)) {
> > +		/* Copper */
> > +	case XCVR_PID_QSFP_100G_CR4:
> > +		ethtool_link_ksettings_add_link_mode(ks, supported,
> > +						     100000baseCR4_Full);
> > +		copper_seen++;
> > +		break;
> > +	case XCVR_PID_QSFP_40GBASE_CR4:
> > +		ethtool_link_ksettings_add_link_mode(ks, supported,
> > +						     40000baseCR4_Full);
> > +		copper_seen++;
> > +		break;
> > +	case XCVR_PID_SFP_25GBASE_CR_S:
> > +	case XCVR_PID_SFP_25GBASE_CR_L:
> > +	case XCVR_PID_SFP_25GBASE_CR_N:
> > +		ethtool_link_ksettings_add_link_mode(ks, supported,
> > +						     25000baseCR_Full);
> > +		copper_seen++;
> > +		break;
> > +	case XCVR_PID_SFP_10GBASE_AOC:
> > +	case XCVR_PID_SFP_10GBASE_CU:
> > +		ethtool_link_ksettings_add_link_mode(ks, supported,
> > +						     10000baseCR_Full);
> > +		copper_seen++;
> > +		break;
> > +
> > +		/* Fibre */
> > +	case XCVR_PID_QSFP_100G_SR4:
> > +	case XCVR_PID_QSFP_100G_AOC:
> > +		ethtool_link_ksettings_add_link_mode(ks, supported,
> > +						     100000baseSR4_Full);
> > +		break;
> > +	case XCVR_PID_QSFP_100G_LR4:
> > +		ethtool_link_ksettings_add_link_mode(ks, supported,
> > +						     100000baseLR4_ER4_Full);
> > +		break;
> > +	case XCVR_PID_QSFP_100G_ER4:
> > +		ethtool_link_ksettings_add_link_mode(ks, supported,
> > +						     100000baseLR4_ER4_Full);
> > +		break;
> > +	case XCVR_PID_QSFP_40GBASE_SR4:
> > +	case XCVR_PID_QSFP_40GBASE_AOC:
> > +		ethtool_link_ksettings_add_link_mode(ks, supported,
> > +						     40000baseSR4_Full);
> > +		break;
> > +	case XCVR_PID_QSFP_40GBASE_LR4:
> > +		ethtool_link_ksettings_add_link_mode(ks, supported,
> > +						     40000baseLR4_Full);
> > +		break;
> > +	case XCVR_PID_SFP_25GBASE_SR:
> > +	case XCVR_PID_SFP_25GBASE_AOC:
> > +		ethtool_link_ksettings_add_link_mode(ks, supported,
> > +						     25000baseSR_Full);
> > +		break;
> > +	case XCVR_PID_SFP_10GBASE_SR:
> > +		ethtool_link_ksettings_add_link_mode(ks, supported,
> > +						     10000baseSR_Full);
> > +		break;
> > +	case XCVR_PID_SFP_10GBASE_LR:
> > +		ethtool_link_ksettings_add_link_mode(ks, supported,
> > +						     10000baseLR_Full);
> > +		break;
> > +	case XCVR_PID_SFP_10GBASE_LRM:
> > +		ethtool_link_ksettings_add_link_mode(ks, supported,
> > +						     10000baseLRM_Full);
> > +		break;
> > +	case XCVR_PID_SFP_10GBASE_ER:
> > +		ethtool_link_ksettings_add_link_mode(ks, supported,
> > +						     10000baseER_Full);
> > +		break;
> 
> I don't know these link modes too well. But only setting a single bit
> seems odd. What i do know is that an SFP which supports 2500BaseX
> should also be able to support 1000BaseX. So should a 100G SFP also
> support 40G, 25G, 10G etc? The SERDES just runs a slower bitstream
> over the basic bitpipe?
> 
> > +	case XCVR_PID_QSFP_100G_ACC:
> > +	case XCVR_PID_QSFP_40GBASE_ER4:
> > +	case XCVR_PID_SFP_25GBASE_LR:
> > +	case XCVR_PID_SFP_25GBASE_ER:
> > +		dev_info(lif->ionic->dev, "no decode bits for xcvr type pid=%d / 0x%x\n",
> > +			 idev->port_info->status.xcvr.pid,
> > +			 idev->port_info->status.xcvr.pid);
> > +		break;
> 
> Why not add them?
> 
> 
> > +	memcpy(ks->link_modes.advertising, ks->link_modes.supported,
> > +	       sizeof(ks->link_modes.advertising));
> 
> bitmap_copy() would be a better way to do this. You could consider
> adding a helper to ethtool.h.

Also, there is no need to zero initialize ks->link_modes.advertising
above if it's going to be rewritten here anyway.

Michal

^ permalink raw reply

* Re: [PATCH v3] perf cs-etm: Improve completeness for kernel address space
From: Leo Yan @ 2019-07-09  5:11 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Suzuki K Poulose, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Linux Kernel Mailing List, linux-arm-kernel, netdev, bpf,
	Peter Zijlstra, Coresight ML
In-Reply-To: <CANLsYkwjJ57RWEqS9suLm1+JKicG1LzcHtP8k5qTK1d7bw=1MA@mail.gmail.com>

Hi Mathieu,

On Mon, Jul 08, 2019 at 11:33:59AM -0600, Mathieu Poirier wrote:
> On Wed, 19 Jun 2019 at 21:45, Leo Yan <leo.yan@linaro.org> wrote:
> >
> > Arm and arm64 architecture reserve some memory regions prior to the
> > symbol '_stext' and these memory regions later will be used by device
> > module and BPF jit.  The current code misses to consider these memory
> > regions thus any address in the regions will be taken as user space
> > mode, but perf cannot find the corresponding dso with the wrong CPU
> > mode so we misses to generate samples for device module and BPF
> > related trace data.
> >
> > This patch parse the link scripts to get the memory size prior to start
> > address and reduce this size from 'etmq->etm->kernel_start', then can
> > get a fixed up kernel start address which contain memory regions for
> > device module and BPF.  Finally, cs_etm__cpu_mode() can return right
> > mode for these memory regions and perf can successfully generate
> > samples.
> >
> > The reason for parsing the link scripts is Arm architecture changes text
> > offset dependent on different platforms, which define multiple text
> > offsets in $kernel/arch/arm/Makefile.  This offset is decided when build
> > kernel and the final value is extended in the link script, so we can
> > extract the used value from the link script.  We use the same way to
> > parse arm64 link script as well.  If fail to find the link script, the
> > pre start memory size is assumed as zero, in this case it has no any
> > change caused with this patch.
> >
> > Below is detailed info for testing this patch:
> >
> > - Build LLVM/Clang 8.0 or later version;
> >
> > - Configure perf with ~/.perfconfig:
> >
> >   root@debian:~# cat ~/.perfconfig
> >   # this file is auto-generated.
> >   [llvm]
> >           clang-path = /mnt/build/llvm-build/build/install/bin/clang
> >           kbuild-dir = /mnt/linux-kernel/linux-cs-dev/
> >           clang-opt = "-g"
> >           dump-obj = true
> >
> >   [trace]
> >           show_zeros = yes
> >           show_duration = no
> >           no_inherit = yes
> >           show_timestamp = no
> >           show_arg_names = no
> >           args_alignment = 40
> >           show_prefix = yes
> >
> > - Run 'perf trace' command with eBPF event:
> >
> >   root@debian:~# perf trace -e string \
> >       -e $kernel/tools/perf/examples/bpf/augmented_raw_syscalls.c
> >
> > - Read eBPF program memory mapping in kernel:
> >
> >   root@debian:~# echo 1 > /proc/sys/net/core/bpf_jit_kallsyms
> >   root@debian:~# cat /proc/kallsyms | grep -E "bpf_prog_.+_sys_[enter|exit]"
> >   ffff000000086a84 t bpf_prog_f173133dc38ccf87_sys_enter  [bpf]
> >   ffff000000088618 t bpf_prog_c1bd85c092d6e4aa_sys_exit   [bpf]
> >
> > - Launch any program which accesses file system frequently so can hit
> >   the system calls trace flow with eBPF event;
> >
> > - Capture CoreSight trace data with filtering eBPF program:
> >
> >   root@debian:~# perf record -e cs_etm/@20070000.etr/ \
> >           --filter 'filter 0xffff000000086a84/0x800' -a sleep 5s
> >
> > - Annotate for symbol 'bpf_prog_f173133dc38ccf87_sys_enter':
> >
> >   root@debian:~# perf report
> >   Then select 'branches' samples and press 'a' to annotate symbol
> >   'bpf_prog_f173133dc38ccf87_sys_enter', press 'P' to print to the
> >   bpf_prog_f173133dc38ccf87_sys_enter.annotation file:
> >
> >   root@debian:~# cat bpf_prog_f173133dc38ccf87_sys_enter.annotation
> >
> >   bpf_prog_f173133dc38ccf87_sys_enter() bpf_prog_f173133dc38ccf87_sys_enter
> >   Event: branches
> >
> >   Percent      int sys_enter(struct syscall_enter_args *args)
> >                  stp  x29, x30, [sp, #-16]!
> >
> >                 int key = 0;
> >                  mov  x29, sp
> >
> >                        augmented_args = bpf_map_lookup_elem(&augmented_filename_map, &key);
> >                  stp  x19, x20, [sp, #-16]!
> >
> >                        augmented_args = bpf_map_lookup_elem(&augmented_filename_map, &key);
> >                  stp  x21, x22, [sp, #-16]!
> >
> >                  stp  x25, x26, [sp, #-16]!
> >
> >                 return bpf_get_current_pid_tgid();
> >                  mov  x25, sp
> >
> >                 return bpf_get_current_pid_tgid();
> >                  mov  x26, #0x0                         // #0
> >
> >                  sub  sp, sp, #0x10
> >
> >                 return bpf_map_lookup_elem(pids, &pid) != NULL;
> >                  add  x19, x0, #0x0
> >
> >                  mov  x0, #0x0                          // #0
> >
> >                  mov  x10, #0xfffffffffffffff8          // #-8
> >
> >                 if (pid_filter__has(&pids_filtered, getpid()))
> >                  str  w0, [x25, x10]
> >
> >                 probe_read(&augmented_args->args, sizeof(augmented_args->args), args);
> >                  add  x1, x25, #0x0
> >
> >                 probe_read(&augmented_args->args, sizeof(augmented_args->args), args);
> >                  mov  x10, #0xfffffffffffffff8          // #-8
> >
> >                 syscall = bpf_map_lookup_elem(&syscalls, &augmented_args->args.syscall_nr);
> >                  add  x1, x1, x10
> >
> >                 syscall = bpf_map_lookup_elem(&syscalls, &augmented_args->args.syscall_nr);
> >                  mov  x0, #0xffff8009ffffffff           // #-140694538682369
> >
> >                  movk x0, #0x6698, lsl #16
> >
> >                  movk x0, #0x3e00
> >
> >                  mov  x10, #0xffffffffffff1040          // #-61376
> >
> >                 if (syscall == NULL || !syscall->enabled)
> >                  movk x10, #0x1023, lsl #16
> >
> >                 if (syscall == NULL || !syscall->enabled)
> >                  movk x10, #0x0, lsl #32
> >
> >                 loop_iter_first()
> >     3.69       → blr  bpf_prog_f173133dc38ccf87_sys_enter
> >                 loop_iter_first()
> >                  add  x7, x0, #0x0
> >
> >                 loop_iter_first()
> >                  add  x20, x7, #0x0
> >
> >                 int size = probe_read_str(&augmented_filename->value, filename_len, filename_arg);
> >                  mov  x0, #0x1                          // #1
> 
> I'm not sure all this information about annotation should be in the
> changelog.  This patch is about being able to decode traces that
> executed outside the current kernel addresse range and as such simply
> using "perf report" or "perf script" successfully is enough to test
> this set.  Any information that goes beyond that muddies the water.

Agree.  Will remove this in the new patch.

> >   [...]
> >
> > Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> > Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > Cc: Jiri Olsa <jolsa@redhat.com>
> > Cc: Namhyung Kim <namhyung@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
> > Cc: coresight@lists.linaro.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/Makefile.config | 22 ++++++++++++++++++++++
> >  tools/perf/util/cs-etm.c   | 19 ++++++++++++++++++-
> >  2 files changed, 40 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
> > index 51dd00f65709..a58cd5a43a98 100644
> > --- a/tools/perf/Makefile.config
> > +++ b/tools/perf/Makefile.config
> > @@ -418,6 +418,28 @@ ifdef CORESIGHT
> >      endif
> >      LDFLAGS += $(LIBOPENCSD_LDFLAGS)
> >      EXTLIBS += $(OPENCSDLIBS)
> > +    PRE_START_SIZE := 0
> > +    ifneq ($(wildcard $(srctree)/arch/$(SRCARCH)/kernel/vmlinux.lds),)
> > +      ifeq ($(SRCARCH),arm64)
> > +        # Extract info from lds:
> > +        #  . = ((((((((0xffffffffffffffff)) - (((1)) << (48)) + 1) + (0)) + (0x08000000))) + (0x08000000))) + 0x00080000;
> > +        # PRE_START_SIZE := (0x08000000 + 0x08000000 + 0x00080000) = 0x10080000
> > +        PRE_START_SIZE := $(shell egrep ' \. \= \({8}0x[0-9a-fA-F]+\){2}' \
> > +          $(srctree)/arch/$(SRCARCH)/kernel/vmlinux.lds | \
> > +          sed -e 's/[(|)|.|=|+|<|;|-]//g' -e 's/ \+/ /g' -e 's/^[ \t]*//' | \
> > +          awk -F' ' '{printf "0x%x", $$6+$$7+$$8}' 2>/dev/null)
> > +      endif
> > +      ifeq ($(SRCARCH),arm)
> > +        # Extract info from lds:
> > +        #   . = ((0xC0000000)) + 0x00208000;
> > +        # PRE_START_SIZE := 0x00208000
> > +        PRE_START_SIZE := $(shell egrep ' \. \= \({2}0x[0-9a-fA-F]+\){2}' \
> > +          $(srctree)/arch/$(SRCARCH)/kernel/vmlinux.lds | \
> > +          sed -e 's/[(|)|.|=|+|<|;|-]//g' -e 's/ \+/ /g' -e 's/^[ \t]*//' | \
> > +          awk -F' ' '{printf "0x%x", $$2}' 2>/dev/null)
> > +      endif
> > +    endif
> > +    CFLAGS += -DARM_PRE_START_SIZE=$(PRE_START_SIZE)
> 
> It might be useful to do this for arm and arm64 regardless of
> CoreSight but I'll let Arnaldo decide on this.

Ah, good point!  On Arm/arm64 platforms, if we want to parse kernel
address space with considering eBPF/module, the kernel start address
needs to be compensated in the function machine__get_kernel_start(),
so this can let PMU or other events to work correctly.

Will wait a bit if Arnaldo could give out guidance.

> >      $(call detected,CONFIG_LIBOPENCSD)
> >      ifdef CSTRACE_RAW
> >        CFLAGS += -DCS_DEBUG_RAW
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 0c7776b51045..5fa0be3a3904 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -613,10 +613,27 @@ static void cs_etm__free(struct perf_session *session)
> >  static u8 cs_etm__cpu_mode(struct cs_etm_queue *etmq, u64 address)
> >  {
> >         struct machine *machine;
> > +       u64 fixup_kernel_start = 0;
> >
> >         machine = etmq->etm->machine;
> >
> > -       if (address >= etmq->etm->kernel_start) {
> > +       /*
> > +        * Since arm and arm64 specify some memory regions prior to
> > +        * 'kernel_start', kernel addresses can be less than 'kernel_start'.
> > +        *
> > +        * For arm architecture, the 16MB virtual memory space prior to
> > +        * 'kernel_start' is allocated to device modules, a PMD table if
> > +        * CONFIG_HIGHMEM is enabled and a PGD table.
> > +        *
> > +        * For arm64 architecture, the root PGD table, device module memory
> > +        * region and BPF jit region are prior to 'kernel_start'.
> > +        *
> > +        * To reflect the complete kernel address space, compensate these
> > +        * pre-defined regions for kernel start address.
> > +        */
> > +       fixup_kernel_start = etmq->etm->kernel_start - ARM_PRE_START_SIZE;
> > +
> > +       if (address >= fixup_kernel_start) {
> >                 if (machine__is_host(machine))
> >                         return PERF_RECORD_MISC_KERNEL;
> >                 else
> 
> Tested-by: Mathieu Poirier <mathieu.poirier@linaro.org>

Thanks a lot for the reviewing & testing; and very appreciate your much
consumed time :)

Thanks,
Leo Yan

^ permalink raw reply

* Re: [PATCH V5 net-next 4/6] dt-bindings: ptp: Introduce MII time stamping devices.
From: Richard Cochran @ 2019-07-09  5:11 UTC (permalink / raw)
  To: Rob Herring
  Cc: netdev, David Miller, devicetree, Andrew Lunn, Florian Fainelli,
	Jacob Keller, Mark Rutland, Miroslav Lichvar, Willem de Bruijn
In-Reply-To: <20190708213837.GA28934@bogus>

On Mon, Jul 08, 2019 at 03:38:37PM -0600, Rob Herring wrote:
> > +Required properties of the control node:
> > +
> > +- compatible:		"ines,ptp-ctrl"
> 
> This is an IP block that gets integrated into SoCs?

It is an IP block implemented in an FPGA (like the zync or the socfpga).

> It's not very 
> specific given that there could be different versions of the IP block 
> and SoC vendors can integrate various versions of the IP block in their 
> own unique (i.e. buggy) way.

There is a version register where both the interface and FPGA are
versioned.  The driver doesn't presently do anything with that field,
but if newer interfaces appear, then the driver can deal with that
without any DT help.

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH bpf-next] libbpf: fix ptr to u64 conversion warning on 32-bit platforms
From: Yonghong Song @ 2019-07-09  4:30 UTC (permalink / raw)
  To: Andrii Nakryiko, andrii.nakryiko@gmail.com, Alexei Starovoitov,
	daniel@iogearbox.net, bpf@vger.kernel.org, netdev@vger.kernel.org,
	Kernel Team
In-Reply-To: <20190709040007.1665882-1-andriin@fb.com>



On 7/8/19 9:00 PM, Andrii Nakryiko wrote:
> On 32-bit platforms compiler complains about conversion:
> 
> libbpf.c: In function ‘perf_event_open_probe’:
> libbpf.c:4112:17: error: cast from pointer to integer of different
> size [-Werror=pointer-to-int-cast]
>    attr.config1 = (uint64_t)(void *)name; /* kprobe_func or uprobe_path */
>                   ^
> 
> Reported-by: Matt Hart <matthew.hart@linaro.org>
> Fixes: b26500274767 ("libbpf: add kprobe/uprobe attach API")
> Tested-by: Matt Hart <matthew.hart@linaro.org>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

Acked-by: Yonghong Song <yhs@fb.com>

> ---
>   tools/lib/bpf/libbpf.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index ed07789b3e62..794dd5064ae8 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -4126,8 +4126,8 @@ static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
>   	}
>   	attr.size = sizeof(attr);
>   	attr.type = type;
> -	attr.config1 = (uint64_t)(void *)name; /* kprobe_func or uprobe_path */
> -	attr.config2 = offset;		       /* kprobe_addr or probe_offset */
> +	attr.config1 = ptr_to_u64(name); /* kprobe_func or uprobe_path */
> +	attr.config2 = offset;		 /* kprobe_addr or probe_offset */
>   
>   	/* pid filter is meaningful only for uprobes */
>   	pfd = syscall(__NR_perf_event_open, &attr,
> 

^ permalink raw reply

* Re: WARNING in mark_chain_precision
From: syzbot @ 2019-07-09  4:25 UTC (permalink / raw)
  To: aaron.f.brown, andrii.nakryiko, ast, bpf, daniel, davem, hawk,
	intel-wired-lan, jakub.kicinski, jeffrey.t.kirsher,
	john.fastabend, kafai, linux-kernel, netdev, sasha.neftin,
	songliubraving, syzkaller-bugs, xdp-newbies, yhs
In-Reply-To: <CAEf4BzaUEWwGL3k0VeiFYFqyJexQU9cDZWN69jSDpBjP1ZEcpw@mail.gmail.com>

Hello,

syzbot has tested the proposed patch but the reproducer still triggered  
crash:
WARNING in bpf_jit_free

WARNING: CPU: 0 PID: 9077 at kernel/bpf/core.c:851 bpf_jit_free+0x157/0x1b0
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 9077 Comm: kworker/0:3 Not tainted 5.2.0-rc6+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Workqueue: events bpf_prog_free_deferred
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  panic+0x2cb/0x744 kernel/panic.c:219
  __warn.cold+0x20/0x4d kernel/panic.c:576
  report_bug+0x263/0x2b0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:179 [inline]
  fixup_bug arch/x86/kernel/traps.c:174 [inline]
  do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272
  do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:986
RIP: 0010:bpf_jit_free+0x157/0x1b0
Code: 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 5d 48 b8 00 02 00 00  
00 00 ad de 48 39 43 70 0f 84 05 ff ff ff e8 09 7f f4 ff <0f> 0b e9 f9 fe  
ff ff e8 2d 02 2e 00 e9 d9 fe ff ff 48 89 7d e0 e8
RSP: 0018:ffff888084affcb0 EFLAGS: 00010293
RAX: ffff88808a622100 RBX: ffff88809639d580 RCX: ffffffff817b0b0d
RDX: 0000000000000000 RSI: ffffffff817c4557 RDI: ffff88809639d5f0
RBP: ffff888084affcd0 R08: 1ffffffff150daa8 R09: fffffbfff150daa9
R10: fffffbfff150daa8 R11: ffffffff8a86d547 R12: ffffc90001921000
R13: ffff88809639d5e8 R14: ffff8880a0589800 R15: ffff8880ae834d40
  bpf_prog_free_deferred+0x27a/0x350 kernel/bpf/core.c:1982
  process_one_work+0x989/0x1790 kernel/workqueue.c:2269
  worker_thread+0x98/0xe40 kernel/workqueue.c:2415
  kthread+0x354/0x420 kernel/kthread.c:255
  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Kernel Offset: disabled
Rebooting in 86400 seconds..


Tested on:

commit:         b9321614 bpf: fix precision bit propagation for BPF_ST ins..
git tree:       https://github.com/anakryiko/linux bpf-fix-precise-bpf_st
console output: https://syzkaller.appspot.com/x/log.txt?x=112f0dfda00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=6bb3e6e7997c14f9
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)


^ permalink raw reply

* [PATCH net-next v6 5/5] net/mlx5e: Register devlink ports for physical link, PCI PF, VFs
From: Parav Pandit @ 2019-07-09  4:17 UTC (permalink / raw)
  To: netdev; +Cc: jiri, saeedm, jakub.kicinski, Parav Pandit
In-Reply-To: <20190709041739.44292-1-parav@mellanox.com>

Register devlink port of physical port, PCI PF and PCI VF flavour
for each PF, VF when a given devlink instance is in switchdev mode.

Implement ndo_get_devlink_port callback API to make use of registered
devlink ports.
This eliminates ndo_get_phys_port_name() and ndo_get_port_parent_id()
callbacks. Hence, remove them.

An example output with 2 VFs, without a PF and single uplink port is
below.

$devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0 flavour physical
pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0
pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1

Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  | 108 +++++++++++++-----
 .../net/ethernet/mellanox/mlx5/core/en_rep.h  |   1 +
 2 files changed, 78 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 529f8e4b32c6..6810b9fa0705 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -37,6 +37,7 @@
 #include <net/act_api.h>
 #include <net/netevent.h>
 #include <net/arp.h>
+#include <net/devlink.h>
 
 #include "eswitch.h"
 #include "en.h"
@@ -1119,32 +1120,6 @@ static int mlx5e_rep_close(struct net_device *dev)
 	return ret;
 }
 
-static int mlx5e_rep_get_phys_port_name(struct net_device *dev,
-					char *buf, size_t len)
-{
-	struct mlx5e_priv *priv = netdev_priv(dev);
-	struct mlx5e_rep_priv *rpriv = priv->ppriv;
-	struct mlx5_eswitch_rep *rep = rpriv->rep;
-	unsigned int fn;
-	int ret;
-
-	fn = PCI_FUNC(priv->mdev->pdev->devfn);
-	if (fn >= MLX5_MAX_PORTS)
-		return -EOPNOTSUPP;
-
-	if (rep->vport == MLX5_VPORT_UPLINK)
-		ret = snprintf(buf, len, "p%d", fn);
-	else if (rep->vport == MLX5_VPORT_PF)
-		ret = snprintf(buf, len, "pf%d", fn);
-	else
-		ret = snprintf(buf, len, "pf%dvf%d", fn, rep->vport - 1);
-
-	if (ret >= len)
-		return -EOPNOTSUPP;
-
-	return 0;
-}
-
 static int
 mlx5e_rep_setup_tc_cls_flower(struct mlx5e_priv *priv,
 			      struct tc_cls_flower_offload *cls_flower, int flags)
@@ -1298,17 +1273,24 @@ static int mlx5e_uplink_rep_set_vf_vlan(struct net_device *dev, int vf, u16 vlan
 	return 0;
 }
 
+static struct devlink_port *mlx5e_get_devlink_port(struct net_device *dev)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5e_rep_priv *rpriv = priv->ppriv;
+
+	return &rpriv->dl_port;
+}
+
 static const struct net_device_ops mlx5e_netdev_ops_rep = {
 	.ndo_open                = mlx5e_rep_open,
 	.ndo_stop                = mlx5e_rep_close,
 	.ndo_start_xmit          = mlx5e_xmit,
-	.ndo_get_phys_port_name  = mlx5e_rep_get_phys_port_name,
 	.ndo_setup_tc            = mlx5e_rep_setup_tc,
+	.ndo_get_devlink_port = mlx5e_get_devlink_port,
 	.ndo_get_stats64         = mlx5e_rep_get_stats,
 	.ndo_has_offload_stats	 = mlx5e_rep_has_offload_stats,
 	.ndo_get_offload_stats	 = mlx5e_rep_get_offload_stats,
 	.ndo_change_mtu          = mlx5e_rep_change_mtu,
-	.ndo_get_port_parent_id	 = mlx5e_rep_get_port_parent_id,
 };
 
 static const struct net_device_ops mlx5e_netdev_ops_uplink_rep = {
@@ -1316,8 +1298,8 @@ static const struct net_device_ops mlx5e_netdev_ops_uplink_rep = {
 	.ndo_stop                = mlx5e_close,
 	.ndo_start_xmit          = mlx5e_xmit,
 	.ndo_set_mac_address     = mlx5e_uplink_rep_set_mac,
-	.ndo_get_phys_port_name  = mlx5e_rep_get_phys_port_name,
 	.ndo_setup_tc            = mlx5e_rep_setup_tc,
+	.ndo_get_devlink_port = mlx5e_get_devlink_port,
 	.ndo_get_stats64         = mlx5e_get_stats,
 	.ndo_has_offload_stats	 = mlx5e_rep_has_offload_stats,
 	.ndo_get_offload_stats	 = mlx5e_rep_get_offload_stats,
@@ -1330,7 +1312,6 @@ static const struct net_device_ops mlx5e_netdev_ops_uplink_rep = {
 	.ndo_get_vf_config       = mlx5e_get_vf_config,
 	.ndo_get_vf_stats        = mlx5e_get_vf_stats,
 	.ndo_set_vf_vlan         = mlx5e_uplink_rep_set_vf_vlan,
-	.ndo_get_port_parent_id	 = mlx5e_rep_get_port_parent_id,
 	.ndo_set_features        = mlx5e_set_features,
 };
 
@@ -1731,6 +1712,55 @@ static const struct mlx5e_profile mlx5e_uplink_rep_profile = {
 	.max_tc			= MLX5E_MAX_NUM_TC,
 };
 
+static bool
+is_devlink_port_supported(const struct mlx5_core_dev *dev,
+			  const struct mlx5e_rep_priv *rpriv)
+{
+	return rpriv->rep->vport == MLX5_VPORT_UPLINK ||
+	       rpriv->rep->vport == MLX5_VPORT_PF ||
+	       mlx5_eswitch_is_vf_vport(dev->priv.eswitch, rpriv->rep->vport);
+}
+
+static int register_devlink_port(struct mlx5_core_dev *dev,
+				 struct mlx5e_rep_priv *rpriv)
+{
+	struct devlink *devlink = priv_to_devlink(dev);
+	struct mlx5_eswitch_rep *rep = rpriv->rep;
+	struct netdev_phys_item_id ppid = {};
+	int ret;
+
+	if (!is_devlink_port_supported(dev, rpriv))
+		return 0;
+
+	ret = mlx5e_rep_get_port_parent_id(rpriv->netdev, &ppid);
+	if (ret)
+		return ret;
+
+	if (rep->vport == MLX5_VPORT_UPLINK)
+		devlink_port_attrs_set(&rpriv->dl_port,
+				       DEVLINK_PORT_FLAVOUR_PHYSICAL,
+				       PCI_FUNC(dev->pdev->devfn), false, 0,
+				       &ppid.id[0], ppid.id_len);
+	else if (rep->vport == MLX5_VPORT_PF)
+		devlink_port_attrs_pci_pf_set(&rpriv->dl_port,
+					      &ppid.id[0], ppid.id_len,
+					      dev->pdev->devfn);
+	else if (mlx5_eswitch_is_vf_vport(dev->priv.eswitch, rpriv->rep->vport))
+		devlink_port_attrs_pci_vf_set(&rpriv->dl_port,
+					      &ppid.id[0], ppid.id_len,
+					      dev->pdev->devfn,
+					      rep->vport - 1);
+
+	return devlink_port_register(devlink, &rpriv->dl_port, rep->vport);
+}
+
+static void unregister_devlink_port(struct mlx5_core_dev *dev,
+				    struct mlx5e_rep_priv *rpriv)
+{
+	if (is_devlink_port_supported(dev, rpriv))
+		devlink_port_unregister(&rpriv->dl_port);
+}
+
 /* e-Switch vport representors */
 static int
 mlx5e_vport_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep)
@@ -1782,15 +1812,27 @@ mlx5e_vport_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep)
 		goto err_detach_netdev;
 	}
 
+	err = register_devlink_port(dev, rpriv);
+	if (err) {
+		esw_warn(dev, "Failed to register devlink port %d\n",
+			 rep->vport);
+		goto err_neigh_cleanup;
+	}
+
 	err = register_netdev(netdev);
 	if (err) {
 		pr_warn("Failed to register representor netdev for vport %d\n",
 			rep->vport);
-		goto err_neigh_cleanup;
+		goto err_devlink_cleanup;
 	}
 
+	if (is_devlink_port_supported(dev, rpriv))
+		devlink_port_type_eth_set(&rpriv->dl_port, netdev);
 	return 0;
 
+err_devlink_cleanup:
+	unregister_devlink_port(dev, rpriv);
+
 err_neigh_cleanup:
 	mlx5e_rep_neigh_cleanup(rpriv);
 
@@ -1813,9 +1855,13 @@ mlx5e_vport_rep_unload(struct mlx5_eswitch_rep *rep)
 	struct mlx5e_rep_priv *rpriv = mlx5e_rep_to_rep_priv(rep);
 	struct net_device *netdev = rpriv->netdev;
 	struct mlx5e_priv *priv = netdev_priv(netdev);
+	struct mlx5_core_dev *dev = priv->mdev;
 	void *ppriv = priv->ppriv;
 
+	if (is_devlink_port_supported(dev, rpriv))
+		devlink_port_type_clear(&rpriv->dl_port);
 	unregister_netdev(netdev);
+	unregister_devlink_port(dev, rpriv);
 	mlx5e_rep_neigh_cleanup(rpriv);
 	mlx5e_detach_netdev(priv);
 	if (rep->vport == MLX5_VPORT_UPLINK)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
index d4585f3b8cb2..c56e6ee4350c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
@@ -86,6 +86,7 @@ struct mlx5e_rep_priv {
 	struct mlx5_flow_handle *vport_rx_rule;
 	struct list_head       vport_sqs_list;
 	struct mlx5_rep_uplink_priv uplink_priv; /* valid for uplink rep */
+	struct devlink_port dl_port;
 };
 
 static inline
-- 
2.19.2


^ permalink raw reply related

* [PATCH net-next v6 4/5] devlink: Introduce PCI VF port flavour and port attribute
From: Parav Pandit @ 2019-07-09  4:17 UTC (permalink / raw)
  To: netdev; +Cc: jiri, saeedm, jakub.kicinski, Parav Pandit
In-Reply-To: <20190709041739.44292-1-parav@mellanox.com>

In an eswitch, PCI VF may have port which is normally represented using
a representor netdevice.
To have better visibility of eswitch port, its association with VF,
and its representor netdevice, introduce a PCI VF port flavour.

When devlink port flavour is PCI VF, fill up PCI VF attributes of
the port.

Extend port name creation using PCI PF and VF number scheme on best
effort basis, so that vendor drivers can skip defining their own scheme.

$ devlink port show
pci/0000:05:00.0/0: type eth netdev eth0 flavour pcipf pfnum 0
pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0
pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 include/net/devlink.h        | 10 ++++++++++
 include/uapi/linux/devlink.h |  6 ++++++
 net/core/devlink.c           | 38 ++++++++++++++++++++++++++++++++++++
 3 files changed, 54 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 97cef896e4d0..bc36f942a7d5 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -50,6 +50,11 @@ struct devlink_port_pci_pf_attrs {
 	u16 pf;	/* Associated PCI PF for this port. */
 };
 
+struct devlink_port_pci_vf_attrs {
+	u16 pf;	/* Associated PCI PF for this port. */
+	u16 vf;	/* Associated PCI VF for of the PCI PF for this port. */
+};
+
 struct devlink_port_attrs {
 	u8 set:1,
 	   split:1,
@@ -59,6 +64,7 @@ struct devlink_port_attrs {
 	union {
 		struct devlink_port_phys_attrs phys;
 		struct devlink_port_pci_pf_attrs pci_pf;
+		struct devlink_port_pci_vf_attrs pci_vf;
 	};
 };
 
@@ -607,6 +613,10 @@ void devlink_port_attrs_set(struct devlink_port *devlink_port,
 void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port,
 				   const unsigned char *switch_id,
 				   unsigned char switch_id_len, u16 pf);
+void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
+				   const unsigned char *switch_id,
+				   unsigned char switch_id_len,
+				   u16 pf, u16 vf);
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
 			u32 size, u16 ingress_pools_count,
 			u16 egress_pools_count, u16 ingress_tc_count,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index f7323884c3fe..ffc993256527 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -173,6 +173,10 @@ enum devlink_port_flavour {
 				      * the PCI PF. It is an internal
 				      * port that faces the PCI PF.
 				      */
+	DEVLINK_PORT_FLAVOUR_PCI_VF, /* Represents eswitch port
+				      * for the PCI VF. It is an internal
+				      * port that faces the PCI VF.
+				      */
 };
 
 enum devlink_param_cmode {
@@ -342,6 +346,8 @@ enum devlink_attr {
 	DEVLINK_ATTR_FLASH_UPDATE_STATUS_TOTAL,	/* u64 */
 
 	DEVLINK_ATTR_PORT_PCI_PF_NUMBER,	/* u16 */
+	DEVLINK_ATTR_PORT_PCI_VF_NUMBER,	/* u16 */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index d362652a5cc7..4f40aeace902 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -519,6 +519,12 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 		if (nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
 				attrs->pci_pf.pf))
 			return -EMSGSIZE;
+	} else if (devlink_port->attrs.flavour == DEVLINK_PORT_FLAVOUR_PCI_VF) {
+		if (nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
+				attrs->pci_vf.pf) ||
+		    nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_VF_NUMBER,
+				attrs->pci_vf.vf))
+			return -EMSGSIZE;
 	}
 	if (devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PHYSICAL &&
 	    devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_CPU &&
@@ -5832,6 +5838,34 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port,
 }
 EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_pf_set);
 
+/**
+ *	devlink_port_attrs_pci_vf_set - Set PCI VF port attributes
+ *
+ *	@devlink_port: devlink port
+ *	@pf: associated PF for the devlink port instance
+ *	@vf: associated VF of a PF for the devlink port instance
+ *	@switch_id: if the port is part of switch, this is buffer with ID,
+ *	            otherwise this is NULL
+ *	@switch_id_len: length of the switch_id buffer
+ */
+void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
+				   const unsigned char *switch_id,
+				   unsigned char switch_id_len,
+				   u16 pf, u16 vf)
+{
+	struct devlink_port_attrs *attrs = &devlink_port->attrs;
+	int ret;
+
+	ret = __devlink_port_attrs_set(devlink_port,
+				       DEVLINK_PORT_FLAVOUR_PCI_VF,
+				       switch_id, switch_id_len);
+	if (ret)
+		return;
+	attrs->pci_vf.pf = pf;
+	attrs->pci_vf.vf = vf;
+}
+EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_vf_set);
+
 static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
 					     char *name, size_t len)
 {
@@ -5860,6 +5894,10 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
 	case DEVLINK_PORT_FLAVOUR_PCI_PF:
 		n = snprintf(name, len, "pf%u", attrs->pci_pf.pf);
 		break;
+	case DEVLINK_PORT_FLAVOUR_PCI_VF:
+		n = snprintf(name, len, "pf%uvf%u",
+			     attrs->pci_vf.pf, attrs->pci_vf.vf);
+		break;
 	}
 
 	if (n >= len)
-- 
2.19.2


^ permalink raw reply related

* [PATCH net-next v6 1/5] devlink: Refactor physical port attributes
From: Parav Pandit @ 2019-07-09  4:17 UTC (permalink / raw)
  To: netdev; +Cc: jiri, saeedm, jakub.kicinski, Parav Pandit
In-Reply-To: <20190709041739.44292-1-parav@mellanox.com>

To support additional devlink port flavours and to support few common
and few different port attributes, move physical port attributes to a
different structure.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
Changelog:
v5->v6:
 - Addressed comment from Jiri.
 - Changed 'physical' to 'phys'.
v4->v5:
 - Addressed comments from Jiri.
 - Moved check for physical port flavours check to separate patch.
v3->v4:
 - Addressed comments from Jiri.
 - Renamed phys_port to physical to be consistent with pci_pf.
 - Removed port_number from __devlink_port_attrs_set and moved
   assigment to caller function.
 - Used capital letter while moving old comment to new structure.
 - Removed helper function is_devlink_phy_port_num_supported().
v2->v3:
 - Address comments from Jakub.
 - Made port_number and split_port_number applicable only to
   physical port flavours by having in union.
v1->v2:
 - Limited port_num attribute to physical ports
 - Updated PCI PF attribute set API to not have port_number
---
 include/net/devlink.h | 13 ++++++++--
 net/core/devlink.c    | 58 ++++++++++++++++++++++++++++---------------
 2 files changed, 49 insertions(+), 22 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 6625ea068d5e..4538c80fe293 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -38,14 +38,23 @@ struct devlink {
 	char priv[0] __aligned(NETDEV_ALIGN);
 };
 
+struct devlink_port_phys_attrs {
+	u32 port_number; /* Same value as "split group".
+			  * A physical port which is visible to the user
+			  * for a given port flavour.
+			  */
+	u32 split_subport_number;
+};
+
 struct devlink_port_attrs {
 	u8 set:1,
 	   split:1,
 	   switch_port:1;
 	enum devlink_port_flavour flavour;
-	u32 port_number; /* same value as "split group" */
-	u32 split_subport_number;
 	struct netdev_phys_item_id switch_id;
+	union {
+		struct devlink_port_phys_attrs phys;
+	};
 };
 
 struct devlink_port {
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 89c533778135..eacaf37b5108 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -515,14 +515,16 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 		return 0;
 	if (nla_put_u16(msg, DEVLINK_ATTR_PORT_FLAVOUR, attrs->flavour))
 		return -EMSGSIZE;
-	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_NUMBER, attrs->port_number))
+	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_NUMBER,
+			attrs->phys.port_number))
 		return -EMSGSIZE;
 	if (!attrs->split)
 		return 0;
-	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_GROUP, attrs->port_number))
+	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_GROUP,
+			attrs->phys.port_number))
 		return -EMSGSIZE;
 	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_SUBPORT_NUMBER,
-			attrs->split_subport_number))
+			attrs->phys.split_subport_number))
 		return -EMSGSIZE;
 	return 0;
 }
@@ -5738,6 +5740,29 @@ void devlink_port_type_clear(struct devlink_port *devlink_port)
 }
 EXPORT_SYMBOL_GPL(devlink_port_type_clear);
 
+static int __devlink_port_attrs_set(struct devlink_port *devlink_port,
+				    enum devlink_port_flavour flavour,
+				    const unsigned char *switch_id,
+				    unsigned char switch_id_len)
+{
+	struct devlink_port_attrs *attrs = &devlink_port->attrs;
+
+	if (WARN_ON(devlink_port->registered))
+		return -EEXIST;
+	attrs->set = true;
+	attrs->flavour = flavour;
+	if (switch_id) {
+		attrs->switch_port = true;
+		if (WARN_ON(switch_id_len > MAX_PHYS_ITEM_ID_LEN))
+			switch_id_len = MAX_PHYS_ITEM_ID_LEN;
+		memcpy(attrs->switch_id.id, switch_id, switch_id_len);
+		attrs->switch_id.id_len = switch_id_len;
+	} else {
+		attrs->switch_port = false;
+	}
+	return 0;
+}
+
 /**
  *	devlink_port_attrs_set - Set port attributes
  *
@@ -5760,23 +5785,15 @@ void devlink_port_attrs_set(struct devlink_port *devlink_port,
 			    unsigned char switch_id_len)
 {
 	struct devlink_port_attrs *attrs = &devlink_port->attrs;
+	int ret;
 
-	if (WARN_ON(devlink_port->registered))
+	ret = __devlink_port_attrs_set(devlink_port, flavour,
+				       switch_id, switch_id_len);
+	if (ret)
 		return;
-	attrs->set = true;
-	attrs->flavour = flavour;
-	attrs->port_number = port_number;
 	attrs->split = split;
-	attrs->split_subport_number = split_subport_number;
-	if (switch_id) {
-		attrs->switch_port = true;
-		if (WARN_ON(switch_id_len > MAX_PHYS_ITEM_ID_LEN))
-			switch_id_len = MAX_PHYS_ITEM_ID_LEN;
-		memcpy(attrs->switch_id.id, switch_id, switch_id_len);
-		attrs->switch_id.id_len = switch_id_len;
-	} else {
-		attrs->switch_port = false;
-	}
+	attrs->phys.port_number = port_number;
+	attrs->phys.split_subport_number = split_subport_number;
 }
 EXPORT_SYMBOL_GPL(devlink_port_attrs_set);
 
@@ -5792,10 +5809,11 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
 	switch (attrs->flavour) {
 	case DEVLINK_PORT_FLAVOUR_PHYSICAL:
 		if (!attrs->split)
-			n = snprintf(name, len, "p%u", attrs->port_number);
+			n = snprintf(name, len, "p%u", attrs->phys.port_number);
 		else
-			n = snprintf(name, len, "p%us%u", attrs->port_number,
-				     attrs->split_subport_number);
+			n = snprintf(name, len, "p%us%u",
+				     attrs->phys.port_number,
+				     attrs->phys.split_subport_number);
 		break;
 	case DEVLINK_PORT_FLAVOUR_CPU:
 	case DEVLINK_PORT_FLAVOUR_DSA:
-- 
2.19.2


^ permalink raw reply related

* [PATCH net-next v6 2/5] devlink: Return physical port fields only for applicable port flavours
From: Parav Pandit @ 2019-07-09  4:17 UTC (permalink / raw)
  To: netdev; +Cc: jiri, saeedm, jakub.kicinski, Parav Pandit
In-Reply-To: <20190709041739.44292-1-parav@mellanox.com>

Physical port number and split group fields are applicable only to
physical port flavours such as PHYSICAL, CPU and DSA.
Hence limit returning those values in netlink response to such port
flavours.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 net/core/devlink.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index eacaf37b5108..a9c4e5d8a99c 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -515,6 +515,10 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 		return 0;
 	if (nla_put_u16(msg, DEVLINK_ATTR_PORT_FLAVOUR, attrs->flavour))
 		return -EMSGSIZE;
+	if (devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PHYSICAL &&
+	    devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_CPU &&
+	    devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_DSA)
+		return 0;
 	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_NUMBER,
 			attrs->phys.port_number))
 		return -EMSGSIZE;
-- 
2.19.2


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox