Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] [net-next, wrong] make BPFILTER_UMH depend on X86
From: Masahiro Yamada @ 2018-05-31  1:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Arnd Bergmann, David S. Miller, Alexei Starovoitov,
	Linux Kbuild mailing list, netdev, Linux Kernel Mailing List
In-Reply-To: <20180530151736.nzpde2bgzn4koi7f@ast-mbp>

2018-05-31 0:17 GMT+09:00 Alexei Starovoitov <alexei.starovoitov@gmail.com>:
> On Mon, May 28, 2018 at 05:31:01PM +0200, Arnd Bergmann wrote:
>> When build testing across architectures, I run into a build error on
>> all targets other than X86:
>>
>> gcc-8.1.0-nolibc/arm-linux-gnueabi/bin/arm-linux-gnueabi-objdump: net/bpfilter/bpfilter_umh: File format not recognized
>> gcc-8.1.0-nolibc/arm-linux-gnueabi/bin/arm-linux-gnueabi-objcopy:net/bpfilter/bpfilter_umh.o: Invalid bfd target
>>
>> The problem is that 'hostprogs' get built with 'gcc' rather than
>> '$(CROSS_COMPILE)gcc', and my default gcc (as most people's) targets x86.
>>
>> To work around it, adding an X86 dependency gets randconfigs building
>> again on my box.
>>
>> Clearly, this is not a good solution, since it should actually work fine
>> when building native kernels on other architectures but that is now
>> disabled, while cross building an x86 kernel on another host is still
>> broken after my patch.
>>
>> What we probably want here is to try out if the compiler is able to build
>> executables for the target architecture and not build the helper otherwise,
>> at least when compile-testing. No idea how to do that though.
>>
>> Link: http://www.kernel.org/pub/tools/crosstool/
>> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
>> Cc: linux-kbuild@vger.kernel.org
>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>> ---
>>  net/bpfilter/Kconfig | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/net/bpfilter/Kconfig b/net/bpfilter/Kconfig
>> index 60725c5f79db..61cc4fcbb4d0 100644
>> --- a/net/bpfilter/Kconfig
>> +++ b/net/bpfilter/Kconfig
>> @@ -9,6 +9,7 @@ menuconfig BPFILTER
>>  if BPFILTER
>>  config BPFILTER_UMH
>>       tristate "bpfilter kernel module with user mode helper"
>> +     depends on X86 # actually depends on native builds
>
> depends on X86 will break it on arm.
> I think the better short term fix would be to test that HOSTCC == CC
> It doesn't have to be the same compiler. HOSTCC's arch == kernel ARCH
> Not sure how to hack makefile to do that.
> Long term we need to get rid of HOSTCC dependency.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Hmm.
For cross-compiling, we set 'ARCH' via the environment variable or the
command line.

ARCH is not explicitly set, the top-level Makefile sets it to $(SUBARCH)


ARCH ?= $(SUBARCH)


Maybe, we can assume the native build if $(ARCH) and $(SUBARCH) are the same?


-- 
Best Regards
Masahiro Yamada

^ permalink raw reply

* [PATCH net-next] hv_netvsc: fix error return code in netvsc_probe()
From: Wei Yongjun @ 2018-05-31  2:04 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sridhar Samudrala
  Cc: Wei Yongjun, devel, netdev, kernel-janitors

Fix to return a negative error code from the failover register fail
error handling case instead of 0, as done elsewhere in this function.

Fixes: 1ff78076d8dd ("netvsc: refactor notifier/event handling code to use the failover framework")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/hyperv/netvsc_drv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index ebe9642..bef4d55 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2031,8 +2031,10 @@ static int netvsc_probe(struct hv_device *dev,
 	}
 
 	net_device_ctx->failover = failover_register(net, &netvsc_failover_ops);
-	if (IS_ERR(net_device_ctx->failover))
+	if (IS_ERR(net_device_ctx->failover)) {
+		ret = PTR_ERR(net_device_ctx->failover);
 		goto err_failover;
+	}
 
 	return ret;

^ permalink raw reply related

* [PATCH net-next] virtio_net: fix error return code in virtnet_probe()
From: Wei Yongjun @ 2018-05-31  2:05 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang, Sridhar Samudrala
  Cc: Wei Yongjun, virtualization, netdev, kernel-janitors

Fix to return a negative error code from the failover create fail error
handling case instead of 0, as done elsewhere in this function.

Fixes: ba5e4426e80e ("virtio_net: Extend virtio to use VF datapath when available")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8f08a3e..2d55e2a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2935,8 +2935,10 @@ static int virtnet_probe(struct virtio_device *vdev)
 
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
 		vi->failover = net_failover_create(vi->dev);
-		if (IS_ERR(vi->failover))
+		if (IS_ERR(vi->failover)) {
+			err = PTR_ERR(vi->failover);
 			goto free_vqs;
+		}
 	}
 
 	err = register_netdev(dev);

^ permalink raw reply related

* Re: [PATCH net-next] virtio_net: fix error return code in virtnet_probe()
From: Jason Wang @ 2018-05-31  2:05 UTC (permalink / raw)
  To: Wei Yongjun, Michael S. Tsirkin, Sridhar Samudrala
  Cc: virtualization, netdev, kernel-janitors
In-Reply-To: <1527732307-145609-1-git-send-email-weiyongjun1@huawei.com>



On 2018年05月31日 10:05, Wei Yongjun wrote:
> Fix to return a negative error code from the failover create fail error
> handling case instead of 0, as done elsewhere in this function.
>
> Fixes: ba5e4426e80e ("virtio_net: Extend virtio to use VF datapath when available")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> ---
>   drivers/net/virtio_net.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8f08a3e..2d55e2a 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2935,8 +2935,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>   
>   	if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
>   		vi->failover = net_failover_create(vi->dev);
> -		if (IS_ERR(vi->failover))
> +		if (IS_ERR(vi->failover)) {
> +			err = PTR_ERR(vi->failover);
>   			goto free_vqs;
> +		}
>   	}
>   
>   	err = register_netdev(dev);
>

Acked-by: Jason Wang <jasowang@redhat.com>

^ permalink raw reply

* Re: [PATCH net-next v12 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Stephen Hemminger @ 2018-05-31  2:06 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
	alexander.h.duyck, kubakici, jasowang, loseweigh, jiri,
	aaron.f.brown, anjali.singhai
In-Reply-To: <1527180917-39737-3-git-send-email-sridhar.samudrala@intel.com>

On Thu, 24 May 2018 09:55:14 -0700
Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:

> Use the registration/notification framework supported by the generic
> failover infrastructure.
> 
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>

Why was this merged? It was never signed off by any of the netvsc maintainers,
and there were still issues unresolved.

There are also namespaces issues I am fixing and this breaks them.
Will start my patch set with a revert for this. Sorry

^ permalink raw reply

* Re: [PATCH v2] Revert "alx: remove WoL support"
From: AceLan Kao @ 2018-05-31  2:13 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jay Cliburn, Chris Snook, David S . Miller, Rakesh Pandit, netdev,
	Emily Chien, Johannes Berg, Johannes Stezenbach,
	Linux-Kernel@Vger. Kernel. Org
In-Reply-To: <20180530135859.GB27537@lunn.ch>

Hi Andrew,

2018-05-30 21:58 GMT+08:00 Andrew Lunn <andrew@lunn.ch>:
> On Wed, May 30, 2018 at 10:10:08AM +0800, AceLan Kao wrote:
>> This reverts commit bc2bebe8de8ed4ba6482c9cc370b0dd72ffe8cd2.
>>
>> The WoL feature is a must to pass Energy Star 6.1 and above,
>> the power consumption will be measured during S3 with WoL is enabled.
>>
>> Reverting "alx: remove WoL support", and will try to fix the unintentional
>> wake up issue when WoL is enabled.
>
> Hi AceLan
>
> I find this change log entry rather odd.
>
> If i remember correctly, you first argued that you did not want to
> have to distribute out of tree patches.
Yes, once the secure boot is enabled, no dkms driver would be loaded.

>
> It was suggested that you might be able to justify the revert using
> the argument that the cure is worse than the decease. You ignored
I didn't try to ignore it, maybe I misunderstood what you say. I thought
you do not like the driver parameter, so I only revert back the alx wol
feature.

> that, and when with this Energy Star argument. That got shot down by
> DaveM, and told to actually try to find the problem.
To pass Energy Star is my purpose, I'm sorry to not mention it in the beginning.
We used to using dkms for the measurement, but secure boot is coming,
so we need to make wol feature to be built in the kernel.

And I've written to the device owners for help, but they are not care too much
about the wol feature and are not inconvenient for the testing. So I stuck here
until I saw the user report.

>
> So you then come back and said you think the problem is fixed, but
> don't know exactly what fixed it. So DaveM said try again.
That's another user's report, not me, please refer the link below
https://bugzilla.kernel.org/show_bug.cgi?id=61651#c126

We have no wake up issue and can't reproduce this issue at my side.

>
> Now you are back to Energy Star.
>
> I don't get this. It was the fact you said it was probably fixed that
> made DaveM reconsider. That is the argument you should be using in the
> change log. We want to know what testing you have done. See a
> tested-by: from somebody who had the issue which caused the revert,
> and now says the issue is fixed.
Thanks to remind me and sorry for my ignorance, I never think of adding
tested-by: in the comment, I'll be asking the reporter to provide more info
and put his name in the comment.

Hope my explanation is helpful for the misunderstanding.
And I'll submit another v3 patch once I got the info from reporter.

>
> Ideally we would like to know which change actually fixed the issue,
> so it can be added to stable. But that requires somebody to do a long
> git bisect.
>
>     Andrew
Thanks,

Best regards,
AceLan Kao.

^ permalink raw reply

* [PATCH net-next] net/mlx5: Make function mlx5_fpga_tls_send_teardown_cmd() static
From: Wei Yongjun @ 2018-05-31  2:31 UTC (permalink / raw)
  To: Boris Pismenny, Saeed Mahameed, Leon Romanovsky, Ilya Lesokhin
  Cc: Wei Yongjun, netdev, linux-rdma, kernel-janitors

Fixes the following sparse warning:

drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c:199:6: warning:
 symbol 'mlx5_fpga_tls_send_teardown_cmd' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c
index 2104801..c973623 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c
@@ -196,8 +196,8 @@ static void mlx5_fpga_tls_flow_to_cmd(void *flow, void *cmd)
 		 MLX5_GET(tls_flow, flow, direction_sx));
 }
 
-void mlx5_fpga_tls_send_teardown_cmd(struct mlx5_core_dev *mdev, void *flow,
-				     u32 swid, gfp_t flags)
+static void mlx5_fpga_tls_send_teardown_cmd(struct mlx5_core_dev *mdev,
+					    void *flow, u32 swid, gfp_t flags)
 {
 	struct mlx5_teardown_stream_context *ctx;
 	struct mlx5_fpga_dma_buf *buf;

^ permalink raw reply related

* [PATCH net-next] net/smc: fix error return code in smc_setsockopt()
From: Wei Yongjun @ 2018-05-31  2:31 UTC (permalink / raw)
  To: Ursula Braun; +Cc: Wei Yongjun, linux-s390, netdev, kernel-janitors

Fix to return error code -EINVAL instead of 0 if optlen is invalid.

Fixes: 01d2f7e2cdd3 ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 net/smc/af_smc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 2c369d4..973b447 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1420,7 +1420,7 @@ static int smc_setsockopt(struct socket *sock, int level, int optname,
 		return rc;
 
 	if (optlen < sizeof(int))
-		return rc;
+		return -EINVAL;
 	get_user(val, (int __user *)optval);
 
 	lock_sock(sk);

^ permalink raw reply related

* Re: [PATCH net-next v12 1/5] net: Introduce generic failover module
From: Stephen Hemminger @ 2018-05-31  2:52 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
	alexander.h.duyck, kubakici, jasowang, loseweigh, jiri,
	aaron.f.brown, anjali.singhai
In-Reply-To: <00d34f67-f26f-0b20-af3f-2add24ae8a5c@intel.com>

On Fri, 25 May 2018 16:06:58 -0700
"Samudrala, Sridhar" <sridhar.samudrala@intel.com> wrote:

> On 5/25/2018 3:38 PM, Stephen Hemminger wrote:
> > On Thu, 24 May 2018 09:55:13 -0700
> > Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
> >  
> >> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >> index 03ed492c4e14..0f4ba52b641d 100644
> >> --- a/include/linux/netdevice.h
> >> +++ b/include/linux/netdevice.h
> >> @@ -1421,6 +1421,8 @@ struct net_device_ops {
> >>    *	entity (i.e. the master device for bridged veth)
> >>    * @IFF_MACSEC: device is a MACsec device
> >>    * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
> >> + * @IFF_FAILOVER: device is a failover master device
> >> + * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
> >>    */
> >>   enum netdev_priv_flags {
> >>   	IFF_802_1Q_VLAN			= 1<<0,
> >> @@ -1450,6 +1452,8 @@ enum netdev_priv_flags {
> >>   	IFF_PHONY_HEADROOM		= 1<<24,
> >>   	IFF_MACSEC			= 1<<25,
> >>   	IFF_NO_RX_HANDLER		= 1<<26,
> >> +	IFF_FAILOVER			= 1<<27,
> >> +	IFF_FAILOVER_SLAVE		= 1<<28,
> >>   };  
> > Why is FAILOVER any different than other master/slave relationships.
> > I don't think you need to take up precious netdev flag bits for this.  
> 
> These are netdev priv flags.
> Jiri says that IFF_MASTER/IFF_SLAVE are bonding specific flags and cannot be used
> with other failover mechanisms. Team also doesn't use this flags and it has its own
> priv_flags.
> 

This change breaks userspace.
We already have worked with partners to ignore devices marked as IFF_SLAVE,
and IFF_SLAVE is visible to user space API's.

NAK

^ permalink raw reply

* Re: [PATCH net-next v12 1/5] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-05-31  2:58 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
	alexander.h.duyck, kubakici, jasowang, loseweigh, jiri,
	aaron.f.brown, anjali.singhai
In-Reply-To: <20180530225259.2cf3f7be@shemminger-XPS-13-9360>



On 5/30/2018 7:52 PM, Stephen Hemminger wrote:
> On Fri, 25 May 2018 16:06:58 -0700
> "Samudrala, Sridhar" <sridhar.samudrala@intel.com> wrote:
>
>> On 5/25/2018 3:38 PM, Stephen Hemminger wrote:
>>> On Thu, 24 May 2018 09:55:13 -0700
>>> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>>>   
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index 03ed492c4e14..0f4ba52b641d 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>>> @@ -1421,6 +1421,8 @@ struct net_device_ops {
>>>>     *	entity (i.e. the master device for bridged veth)
>>>>     * @IFF_MACSEC: device is a MACsec device
>>>>     * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
>>>> + * @IFF_FAILOVER: device is a failover master device
>>>> + * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
>>>>     */
>>>>    enum netdev_priv_flags {
>>>>    	IFF_802_1Q_VLAN			= 1<<0,
>>>> @@ -1450,6 +1452,8 @@ enum netdev_priv_flags {
>>>>    	IFF_PHONY_HEADROOM		= 1<<24,
>>>>    	IFF_MACSEC			= 1<<25,
>>>>    	IFF_NO_RX_HANDLER		= 1<<26,
>>>> +	IFF_FAILOVER			= 1<<27,
>>>> +	IFF_FAILOVER_SLAVE		= 1<<28,
>>>>    };
>>> Why is FAILOVER any different than other master/slave relationships.
>>> I don't think you need to take up precious netdev flag bits for this.
>> These are netdev priv flags.
>> Jiri says that IFF_MASTER/IFF_SLAVE are bonding specific flags and cannot be used
>> with other failover mechanisms. Team also doesn't use this flags and it has its own
>> priv_flags.
>>
> This change breaks userspace.
> We already have worked with partners to ignore devices marked as IFF_SLAVE,
> and IFF_SLAVE is visible to user space API's.

I specifically made sure not to remove IFF_SLAVE in the netvsc patch.


>
> NAK

^ permalink raw reply

* Re: [PATCH net-next v12 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Samudrala, Sridhar @ 2018-05-31  3:03 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: alexander.h.duyck, virtio-dev, jiri, mst, kubakici, netdev,
	virtualization, loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <20180530220635.206ee6d7@shemminger-XPS-13-9360>



On 5/30/2018 7:06 PM, Stephen Hemminger wrote:
> On Thu, 24 May 2018 09:55:14 -0700
> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>
>> Use the registration/notification framework supported by the generic
>> failover infrastructure.
>>
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> Why was this merged? It was never signed off by any of the netvsc maintainers,
> and there were still issues unresolved.
>
> There are also namespaces issues I am fixing and this breaks them.
> Will start my patch set with a revert for this. Sorry

I would appreciate if you can make the fixes on top of this patch series. I tried hard
to make sure that netvsc functionality and behavior doesn't change.

It is possible that there could be some bugs introduced, but they can be fixed.
Looks like Wei already found a bug and submitted a fix for that.

^ permalink raw reply

* Re: [RFC V5 PATCH 8/8] vhost: event suppression for packed ring
From: Jason Wang @ 2018-05-31  3:09 UTC (permalink / raw)
  To: Wei Xu; +Cc: mst, kvm, virtualization, netdev, linux-kernel, jfreimann,
	tiwei.bie
In-Reply-To: <20180530114200.GA23792@wei-ubt>



On 2018年05月30日 19:42, Wei Xu wrote:
>>   /* This actually signals the guest, using eventfd. */
>>   void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>>   {
>> @@ -2802,10 +2930,34 @@ static bool vhost_enable_notify_packed(struct vhost_dev *dev,
>>   				       struct vhost_virtqueue *vq)
>>   {
>>   	struct vring_desc_packed *d = vq->desc_packed + vq->avail_idx;
>> -	__virtio16 flags;
>> +	__virtio16 flags = RING_EVENT_FLAGS_ENABLE;
>>   	int ret;
>>   
>> -	/* FIXME: disable notification through device area */
>> +	if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
>> +		return false;
>> +	vq->used_flags &= ~VRING_USED_F_NO_NOTIFY;
> 'used_flags' was originally designed for 1.0, why should we pay attetion to it here?
>
> Wei

It was used to recored whether or not we've disabled notification. Then 
we can avoid unnecessary userspace writes or memory barriers.

Thanks

^ permalink raw reply

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
From: Samudrala, Sridhar @ 2018-05-31  3:35 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Michael Chan, David Miller, Netdev, Or Gerlitz
In-Reply-To: <20180530155337.691f1ad4@cakuba>

On 5/30/2018 3:53 PM, Jakub Kicinski wrote:
> On Wed, 30 May 2018 14:23:06 -0700, Samudrala, Sridhar wrote:
>> On 5/29/2018 11:33 PM, Jakub Kicinski wrote:
>>> On Tue, 29 May 2018 23:08:11 -0700, Michael Chan wrote:
>>>> On Tue, May 29, 2018 at 10:56 PM, Jakub Kicinski wrote:
>>>>> On Tue, 29 May 2018 20:19:54 -0700, Michael Chan wrote:
>>>>>> On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar wrote:
>>>>>>> Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
>>>>>>> extended?
>>>>> +1 it's painful to see this feature being added to the legacy
>>>>> API :(  Another duplicated configuration knob.
>>>>>   
>>>>>> I didn't know about that.
>>>>>>   
>>>>>>> Shouldn't we enable this via ethtool on the port representor netdev?
>>>>>> We discussed about this.  ethtool on the VF representor will only work
>>>>>> in switchdev mode and also will not support min/max values.
>>>>> Ethtool channel API may be overdue a rewrite in devlink anyway, but I
>>>>> feel like implementing switchdev mode and rewriting features in devlink
>>>>> may be too much to ask.
>>>> Totally agreed.  And switchdev mode doesn't seem to be that widely
>>>> used at the moment.  Do you have other suggestions besides NDO?
>>> At some points you (Broadcom) were working whole bunch of devlink
>>> configuration options for the PCIe side of the ASIC.  The number of
>>> queues relates to things like number of allocated MSI-X vectors, which
>>> if memory serves me was in your devlink patch set.  In an ideal world
>>> we would try to keep all those in one place :)
>>>
>>> For PCIe config there is always the question of what can be configured
>>> at runtime, and what requires a HW reset.  Therefore that devlink API
>>> which could configure current as well as persistent device settings was
>>> quite nice.  I'm not sure if reallocating queues would ever require
>>> PCIe block reset but maybe...  Certainly it seems the notion of min
>>> queues would make more sense in PCIe configuration devlink API than
>>> ethtool channel API to me as well.
>>>
>>> Queues are in the grey area between netdev and non-netdev constructs.
>>> They make sense both from PCIe resource allocation perspective (i.e.
>>> devlink PCIe settings) and netdev perspective (ethtool) because they
>>> feed into things like qdisc offloads, maybe per-queue stats etc.
>>>
>>> So yes...  IMHO it would be nice to add this to a devlink SR-IOV config
>>> API and/or switchdev representors.  But neither of those are really an
>>> option for you today so IDK :)
>> One reason why 'switchdev' mode is not yet widely used or enabled by default
>> could be due to the requirement to program the flow rules only via slow path.
> Do you mean the fallback traffic requirement?

Yes.

>
>> Would it make sense to relax this requirement and support a mode where port
>> representors are created and let the PF driver implement a default policy that
>> adds flow rules for all the VFs to enable connectivity and let the user
>> add/modify the rules via port representors?
> I definitely share your concerns, stopping a major HW vendor from using
> this new and preferred mode is not helping us make progress.
>
> The problem is that if we allow this diversion, i.e. driver to implement
> some special policy, or pre-populate a bridge in a configuration that
> suits the HW we may condition users to expect that as the standard Linux
> behaviour.  And we will be stuck with it forever even tho your next gen
> HW (ice?) may support correct behaviour.

Yes. ice can support slowpath behavior as required to support OVS offload.
However, i was just wondering if we should have an option to allow switchdev
without slowpath so that the user can use alternate mechanisms to program
the flow rules instead of having to use OVS.


>
> We should perhaps separate switchdev mode from TC flower/OvS offloads.
> Is your objective to implement OvS offload or just switchdev mode?
>
> For OvS without proper fallback behaviour you may struggle.
>
> Switchdev mode could be within your reach even without changing the
> default rules.  What if you spawned all port netdevs (I dislike the
> term representor, sorry, it's confusing people) in down state and then
> refuse to bring them up unless user instantiated a bridge that would
> behave in a way that your HW can support?  If ports are down you won't
> have fallback traffic so no problem to solve.

If we want to use port netdev's admin state to control the link state of the
VFs then this will not work.
We need to only disable TX/RX but admin state and link state need to be
supported on the port netdevs.

^ permalink raw reply

* [PATCH net-next] net: netcp: ethss: remove unnecessary pointer set to NULL
From: YueHaibing @ 2018-05-31  3:48 UTC (permalink / raw)
  To: davem, w-kwok2, m-karicheri2; +Cc: netdev, linux-kernel, YueHaibing

If statement has make sure the 'slave->phy' is NULL

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/ethernet/ti/netcp_ethss.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index 6a728d3..6e455a2 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -3206,7 +3206,6 @@ static void init_secondary_ports(struct gbe_priv *gbe_dev,
 		if (!slave->phy) {
 			dev_err(dev, "phy not found for slave %d\n",
 				slave->slave_num);
-			slave->phy = NULL;
 		} else {
 			dev_dbg(dev, "phy found: id is: 0x%s\n",
 				phydev_name(slave->phy));
-- 
2.7.0

^ permalink raw reply related

* [PATCH net] net/ncsi: Fix array size in dumpit handler
From: Samuel Mendoza-Jonas @ 2018-05-31  4:10 UTC (permalink / raw)
  To: netdev; +Cc: Samuel Mendoza-Jonas, David S . Miller, linux-kernel, openbmc

With CONFIG_CC_STACKPROTECTOR enabled the kernel panics as below when
parsing a NCSI_CMD_PKG_INFO command:

[  150.149711] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: 805cff08
[  150.149711]
[  150.159919] CPU: 0 PID: 1301 Comm: ncsi-netlink Not tainted 4.13.16-468cbec6d2c91239332cb91b1f0a73aafcb6f0c6 #1
[  150.170004] Hardware name: Generic DT based system
[  150.174852] [<80109930>] (unwind_backtrace) from [<80106bc4>] (show_stack+0x20/0x24)
[  150.182641] [<80106bc4>] (show_stack) from [<805d36e4>] (dump_stack+0x20/0x28)
[  150.189888] [<805d36e4>] (dump_stack) from [<801163ac>] (panic+0xdc/0x278)
[  150.196780] [<801163ac>] (panic) from [<801162cc>] (__stack_chk_fail+0x20/0x24)
[  150.204111] [<801162cc>] (__stack_chk_fail) from [<805cff08>] (ncsi_pkg_info_all_nl+0x244/0x258)
[  150.212912] [<805cff08>] (ncsi_pkg_info_all_nl) from [<804f939c>] (genl_lock_dumpit+0x3c/0x54)
[  150.221535] [<804f939c>] (genl_lock_dumpit) from [<804f873c>] (netlink_dump+0xf8/0x284)
[  150.229550] [<804f873c>] (netlink_dump) from [<804f8d44>] (__netlink_dump_start+0x124/0x17c)
[  150.237992] [<804f8d44>] (__netlink_dump_start) from [<804f9880>] (genl_rcv_msg+0x1c8/0x3d4)
[  150.246440] [<804f9880>] (genl_rcv_msg) from [<804f9174>] (netlink_rcv_skb+0xd8/0x134)
[  150.254361] [<804f9174>] (netlink_rcv_skb) from [<804f96a4>] (genl_rcv+0x30/0x44)
[  150.261850] [<804f96a4>] (genl_rcv) from [<804f7790>] (netlink_unicast+0x198/0x234)
[  150.269511] [<804f7790>] (netlink_unicast) from [<804f7ffc>] (netlink_sendmsg+0x368/0x3b0)
[  150.277783] [<804f7ffc>] (netlink_sendmsg) from [<804abea4>] (sock_sendmsg+0x24/0x34)
[  150.285625] [<804abea4>] (sock_sendmsg) from [<804ac1dc>] (___sys_sendmsg+0x244/0x260)
[  150.293556] [<804ac1dc>] (___sys_sendmsg) from [<804ad98c>] (__sys_sendmsg+0x5c/0x9c)
[  150.301400] [<804ad98c>] (__sys_sendmsg) from [<804ad9e4>] (SyS_sendmsg+0x18/0x1c)
[  150.308984] [<804ad9e4>] (SyS_sendmsg) from [<80102640>] (ret_fast_syscall+0x0/0x3c)
[  150.316743] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: 805cff08

This turns out to be because the attrs array in ncsi_pkg_info_all_nl()
is initialised to a length of NCSI_ATTR_MAX which is the maximum
attribute number, not the number of attributes.

Fixes: 955dc68cb9b2 ("net/ncsi: Add generic netlink family")
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
---
 net/ncsi/ncsi-netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ncsi/ncsi-netlink.c b/net/ncsi/ncsi-netlink.c
index 8d7e849d4825..41cede4041d3 100644
--- a/net/ncsi/ncsi-netlink.c
+++ b/net/ncsi/ncsi-netlink.c
@@ -215,7 +215,7 @@ static int ncsi_pkg_info_nl(struct sk_buff *msg, struct genl_info *info)
 static int ncsi_pkg_info_all_nl(struct sk_buff *skb,
 				struct netlink_callback *cb)
 {
-	struct nlattr *attrs[NCSI_ATTR_MAX];
+	struct nlattr *attrs[NCSI_ATTR_MAX + 1];
 	struct ncsi_package *np, *package;
 	struct ncsi_dev_priv *ndp;
 	unsigned int package_id;
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH bpf v3 3/5] selftests/bpf: test_sockmap, fix test timeout
From: Prashant Bhole @ 2018-05-31  4:13 UTC (permalink / raw)
  To: John Fastabend, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, David S . Miller, Shuah Khan,
	netdev, linux-kselftest
In-Reply-To: <df1d9ec2-783a-09f4-29d7-544d20d74465@gmail.com>



On 5/31/2018 4:59 AM, John Fastabend wrote:
> On 05/30/2018 12:29 PM, Alexei Starovoitov wrote:
>> On Wed, May 30, 2018 at 02:56:09PM +0900, Prashant Bhole wrote:
>>> In order to reduce runtime of tests, recently timout for select() call
>>> was reduced from 1sec to 10usec. This was causing many tests failures.
>>> It was caught with failure handling commits in this series.
>>>
>>> Restoring the timeout from 10usec to 1sec
>>>
>>> Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests")
>>> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
>>> ---
>>>   tools/testing/selftests/bpf/test_sockmap.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
>>> index 64f9e25c451f..9d01f5c2abe2 100644
>>> --- a/tools/testing/selftests/bpf/test_sockmap.c
>>> +++ b/tools/testing/selftests/bpf/test_sockmap.c
>>> @@ -345,8 +345,8 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
>>>   		if (err < 0)
>>>   			perror("recv start time: ");
>>>   		while (s->bytes_recvd < total_bytes) {
>>> -			timeout.tv_sec = 0;
>>> -			timeout.tv_usec = 10;
>>> +			timeout.tv_sec = 1;
>>> +			timeout.tv_usec = 0;
>>
>> I've applied the set, but had to revert it, since it takes too long.
>>
>> real	1m40.124s
>> user	0m0.375s
>> sys	0m14.521s
>>
> 
> Dang, I thought it would be a bit longer but not minutes.
> 
>> Myself and Daniel run the test semi-manually when we apply patches.> Adding 2 extra minutes of wait time is unnecessary.
> 
> Yep.
> 
>> Especially since most of it is idle time.
>> Please find a way to fix tests differently.
>> btw I don't see any failures today. Not sure what is being fixed
>> by incresing a timeout.
>>
> 
> Calling these fixes is a bit much, they are primarily improvements.
> 
> The background is, when I originally wrote the tests my goal was to
> exercise the kernel code paths. Because of this I didn't really care if
> the tests actually sent/recv all bytes in the test. (I have long
> running tests using netperf/wrk/apached/etc. for that) But, the manual
> tests do have an option to verify the data if specified. The 'verify'
> option is a bit fragile in that with the right tests (e.g. drop)
> or the certain options (e.g. cork) it can fail which is expected.
> 
> What Prashant added was support to actually verify the data correctly.
> And also fix a few cgroup handling and some pretty printing as well.
> He noticed the low timeout causing issue in these cases though so
> increased it.
> 
> @Prashant, how about increasing this less dramatically because now
> all cork tests are going to stall for 1s unless perfectly aligned.
> How about 100us? Or even better we can conditionally set it based
> on if tx_cork is set. If tx_cork is set use 1us otherwise use 200us
> or something. (1s is really to high in any cases for lo)
> 
> Also capturing some of the above in the cover letter would help
> folks understand the context a bit better.
> 

I did trial and error for timeout values. Currently 1000us for corked 
tests and 1 sec for other tests works fine. I observed broken-pipe error 
at tx side when timeout was < 1000us.

Also tests with apply=1 and higher number of iterations were taking 
time, so reducing iterations reduces the test run time drastically.

real    0m12.968s
user    0m0.219s
sys     0m14.337s

Also I will try to explain background in the cover letter of next series.

-Prashant

^ permalink raw reply

* [PATCH bpf-next v4 1/5] selftests/bpf: test_sockmap, check test failure
From: Prashant Bhole @ 2018-05-31  4:42 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend
  Cc: Prashant Bhole, David S . Miller, Shuah Khan, netdev,
	linux-kselftest
In-Reply-To: <20180531044240.796-1-bhole_prashant_q7@lab.ntt.co.jp>

Test failures are not identified because exit code of RX/TX threads
is not checked. Also threads are not returning correct exit code.

- Return exit code from threads depending on test execution status
- In main thread, check the exit code of RX/TX threads
- Skip error checking for corked tests as they are expected to timeout

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 27 +++++++++++++++++-----
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index eb17fae458e6..7b2008a144cb 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -429,8 +429,8 @@ static int sendmsg_test(struct sockmap_options *opt)
 	struct msg_stats s = {0};
 	int iov_count = opt->iov_count;
 	int iov_buf = opt->iov_length;
+	int rx_status, tx_status;
 	int cnt = opt->rate;
-	int status;
 
 	errno = 0;
 
@@ -442,7 +442,7 @@ static int sendmsg_test(struct sockmap_options *opt)
 	rxpid = fork();
 	if (rxpid == 0) {
 		if (opt->drop_expected)
-			exit(1);
+			exit(0);
 
 		if (opt->sendpage)
 			iov_count = 1;
@@ -463,7 +463,9 @@ static int sendmsg_test(struct sockmap_options *opt)
 				"rx_sendmsg: TX: %zuB %fB/s %fGB/s RX: %zuB %fB/s %fGB/s\n",
 				s.bytes_sent, sent_Bps, sent_Bps/giga,
 				s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
-		exit(1);
+		if (err && txmsg_cork)
+			err = 0;
+		exit(err ? 1 : 0);
 	} else if (rxpid == -1) {
 		perror("msg_loop_rx: ");
 		return errno;
@@ -491,14 +493,27 @@ static int sendmsg_test(struct sockmap_options *opt)
 				"tx_sendmsg: TX: %zuB %fB/s %f GB/s RX: %zuB %fB/s %fGB/s\n",
 				s.bytes_sent, sent_Bps, sent_Bps/giga,
 				s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
-		exit(1);
+		exit(err ? 1 : 0);
 	} else if (txpid == -1) {
 		perror("msg_loop_tx: ");
 		return errno;
 	}
 
-	assert(waitpid(rxpid, &status, 0) == rxpid);
-	assert(waitpid(txpid, &status, 0) == txpid);
+	assert(waitpid(rxpid, &rx_status, 0) == rxpid);
+	assert(waitpid(txpid, &tx_status, 0) == txpid);
+	if (WIFEXITED(rx_status)) {
+		err = WEXITSTATUS(rx_status);
+		if (err) {
+			fprintf(stderr, "rx thread exited with err %d. ", err);
+			goto out;
+		}
+	}
+	if (WIFEXITED(tx_status)) {
+		err = WEXITSTATUS(tx_status);
+		if (err)
+			fprintf(stderr, "tx thread exited with err %d. ", err);
+	}
+out:
 	return err;
 }
 
-- 
2.17.0

^ permalink raw reply related

* [PATCH bpf-next v4 0/5] fix test_sockmap
From: Prashant Bhole @ 2018-05-31  4:42 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend
  Cc: Prashant Bhole, David S . Miller, Shuah Khan, netdev,
	linux-kselftest

test_sockmap was originally written only to exercise kernel code
paths, so there was no strict checking of errors. When the code was
modified to run as selftests, due to lack of error handling it was not
able to detect test failures.

In order to improve, this series fixes error handling, test run time
and data verification.

Also slightly improved test output by printing parameter values (cork,
apply, start, end) so that parameters for all tests are displayed.

Changes in v4:
  - patch1: Ignore RX timoute error only for corked tests
  - patch3: Setting different timeout for corked tests and reduce
      run time by reducing number of iterations in some tests

Changes in v3:
  - Skipped error checking for corked tests

Prashant Bhole (5):
  selftests/bpf: test_sockmap, check test failure
  selftests/bpf: test_sockmap, join cgroup in selftest mode
  selftests/bpf: test_sockmap, timing improvements
  selftests/bpf: test_sockmap, fix data verification
  selftests/bpf: test_sockmap, print additional test options

 tools/testing/selftests/bpf/test_sockmap.c | 87 +++++++++++++++++-----
 1 file changed, 67 insertions(+), 20 deletions(-)

-- 
2.17.0

^ permalink raw reply

* [PATCH bpf-next v4 2/5] selftests/bpf: test_sockmap, join cgroup in selftest mode
From: Prashant Bhole @ 2018-05-31  4:42 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend
  Cc: Prashant Bhole, David S . Miller, Shuah Khan, netdev,
	linux-kselftest
In-Reply-To: <20180531044240.796-1-bhole_prashant_q7@lab.ntt.co.jp>

In case of selftest mode, temporary cgroup environment is created but
cgroup is not joined. It causes test failures. Fixed by joining the
cgroup

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 7b2008a144cb..7f9ca79aadbc 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1344,6 +1344,11 @@ static int __test_suite(char *bpf_file)
 		return cg_fd;
 	}
 
+	if (join_cgroup(CG_PATH)) {
+		fprintf(stderr, "ERROR: failed to join cgroup\n");
+		return -EINVAL;
+	}
+
 	/* Tests basic commands and APIs with range of iov values */
 	txmsg_start = txmsg_end = 0;
 	err = test_txmsg(cg_fd);
-- 
2.17.0

^ permalink raw reply related

* [PATCH bpf-next v4 3/5] selftests/bpf: test_sockmap, timing improvements
From: Prashant Bhole @ 2018-05-31  4:42 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend
  Cc: Prashant Bhole, David S . Miller, Shuah Khan, netdev,
	linux-kselftest
In-Reply-To: <20180531044240.796-1-bhole_prashant_q7@lab.ntt.co.jp>

Currently 10us delay is too low for many tests to succeed. It needs to
be increased. Also, many corked tests are expected to hit rx timeout
irrespective of timeout value.

- This patch sets 1000usec timeout value for corked tests because less
than that causes broken-pipe error in tx thread. Also sets 1 second
timeout for all other tests because less than that results in RX
timeout
- tests with apply=1 and higher number of iterations were taking lot
of time. This patch reduces test run time by reducing iterations.

real    0m12.968s
user    0m0.219s
sys     0m14.337s

Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 7f9ca79aadbc..5cd0550af595 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -345,8 +345,13 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 		if (err < 0)
 			perror("recv start time: ");
 		while (s->bytes_recvd < total_bytes) {
-			timeout.tv_sec = 0;
-			timeout.tv_usec = 10;
+			if (txmsg_cork) {
+				timeout.tv_sec = 0;
+				timeout.tv_usec = 1000;
+			} else {
+				timeout.tv_sec = 1;
+				timeout.tv_usec = 0;
+			}
 
 			/* FD sets */
 			FD_ZERO(&w);
@@ -1025,14 +1030,14 @@ static int test_send(struct sockmap_options *opt, int cgrp)
 
 	opt->iov_length = 1;
 	opt->iov_count = 1;
-	opt->rate = 1024;
+	opt->rate = 512;
 	err = test_exec(cgrp, opt);
 	if (err)
 		goto out;
 
 	opt->iov_length = 256;
 	opt->iov_count = 1024;
-	opt->rate = 10;
+	opt->rate = 2;
 	err = test_exec(cgrp, opt);
 	if (err)
 		goto out;
-- 
2.17.0

^ permalink raw reply related

* [PATCH bpf-next v4 4/5] selftests/bpf: test_sockmap, fix data verification
From: Prashant Bhole @ 2018-05-31  4:42 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend
  Cc: Prashant Bhole, David S . Miller, Shuah Khan, netdev,
	linux-kselftest
In-Reply-To: <20180531044240.796-1-bhole_prashant_q7@lab.ntt.co.jp>

When data verification is enabled, some tests fail because verification is done
incorrectly. Following changes fix it.

- Identify the size of data block to be verified
- Reset verification counter when data block size is reached
- Fixed the value printed in case of verfication failure

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 5cd0550af595..2bc70e1ab2eb 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -337,8 +337,15 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 		int fd_flags = O_NONBLOCK;
 		struct timeval timeout;
 		float total_bytes;
+		int bytes_cnt = 0;
+		int chunk_sz;
 		fd_set w;
 
+		if (opt->sendpage)
+			chunk_sz = iov_length * cnt;
+		else
+			chunk_sz = iov_length * iov_count;
+
 		fcntl(fd, fd_flags);
 		total_bytes = (float)iov_count * (float)iov_length * (float)cnt;
 		err = clock_gettime(CLOCK_MONOTONIC, &s->start);
@@ -393,9 +400,14 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 							errno = -EIO;
 							fprintf(stderr,
 								"detected data corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n",
-								i, j, d[j], k - 1, d[j+1], k + 1);
+								i, j, d[j], k - 1, d[j+1], k);
 							goto out_errno;
 						}
+						bytes_cnt++;
+						if (bytes_cnt == chunk_sz) {
+							k = 0;
+							bytes_cnt = 0;
+						}
 						recv--;
 					}
 				}
-- 
2.17.0

^ permalink raw reply related

* [PATCH bpf-next v4 5/5] selftests/bpf: test_sockmap, print additional test options
From: Prashant Bhole @ 2018-05-31  4:42 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend
  Cc: Prashant Bhole, David S . Miller, Shuah Khan, netdev,
	linux-kselftest
In-Reply-To: <20180531044240.796-1-bhole_prashant_q7@lab.ntt.co.jp>

Print values of test options like apply, cork, start, end so that
individual failed tests can be identified for manual run

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 28 +++++++++++++++-------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 2bc70e1ab2eb..05c8cb71724a 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -876,6 +876,8 @@ static char *test_to_str(int test)
 #define OPTSTRING 60
 static void test_options(char *options)
 {
+	char tstr[OPTSTRING];
+
 	memset(options, 0, OPTSTRING);
 
 	if (txmsg_pass)
@@ -888,14 +890,22 @@ static void test_options(char *options)
 		strncat(options, "redir_noisy,", OPTSTRING);
 	if (txmsg_drop)
 		strncat(options, "drop,", OPTSTRING);
-	if (txmsg_apply)
-		strncat(options, "apply,", OPTSTRING);
-	if (txmsg_cork)
-		strncat(options, "cork,", OPTSTRING);
-	if (txmsg_start)
-		strncat(options, "start,", OPTSTRING);
-	if (txmsg_end)
-		strncat(options, "end,", OPTSTRING);
+	if (txmsg_apply) {
+		snprintf(tstr, OPTSTRING, "apply %d,", txmsg_apply);
+		strncat(options, tstr, OPTSTRING);
+	}
+	if (txmsg_cork) {
+		snprintf(tstr, OPTSTRING, "cork %d,", txmsg_cork);
+		strncat(options, tstr, OPTSTRING);
+	}
+	if (txmsg_start) {
+		snprintf(tstr, OPTSTRING, "start %d,", txmsg_start);
+		strncat(options, tstr, OPTSTRING);
+	}
+	if (txmsg_end) {
+		snprintf(tstr, OPTSTRING, "end %d,", txmsg_end);
+		strncat(options, tstr, OPTSTRING);
+	}
 	if (txmsg_ingress)
 		strncat(options, "ingress,", OPTSTRING);
 	if (txmsg_skb)
@@ -904,7 +914,7 @@ static void test_options(char *options)
 
 static int __test_exec(int cgrp, int test, struct sockmap_options *opt)
 {
-	char *options = calloc(60, sizeof(char));
+	char *options = calloc(OPTSTRING, sizeof(char));
 	int err;
 
 	if (test == SENDPAGE)
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH] samples/bpf: Add xdp_sample_pkts example
From: Song Liu @ 2018-05-31  5:03 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Networking
In-Reply-To: <20180530164524.21760-1-toke@toke.dk>

On Wed, May 30, 2018 at 9:45 AM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> This adds an example program showing how to sample packets from XDP using
> the perf event buffer. The example userspace program just prints the
> ethernet header for every packet sampled.
>
> Most of the userspace code is borrowed from other examples, most notably
> trace_output.
>
> Note that the example only works when everything runs on CPU0; so
> suitable smp_affinity needs to be set on the device. Some drivers seem
> to reset smp_affinity when loading an XDP program, so it may be
> necessary to change it after starting the example userspace program.

Why does this only works when everything runs on CPU0? Is this something
we can improve?

Thanks,
Song

>
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
> ---
>  samples/bpf/Makefile               |   4 +
>  samples/bpf/xdp_sample_pkts_kern.c |  48 ++++++++++++
>  samples/bpf/xdp_sample_pkts_user.c | 147 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 199 insertions(+)
>  create mode 100644 samples/bpf/xdp_sample_pkts_kern.c
>  create mode 100644 samples/bpf/xdp_sample_pkts_user.c
>
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index 1303af1..6f0c6d2 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -52,6 +52,7 @@ hostprogs-y += xdp_adjust_tail
>  hostprogs-y += xdpsock
>  hostprogs-y += xdp_fwd
>  hostprogs-y += task_fd_query
> +hostprogs-y += xdp_sample_pkts
>
>  # Libbpf dependencies
>  LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
> @@ -107,6 +108,7 @@ xdp_adjust_tail-objs := xdp_adjust_tail_user.o
>  xdpsock-objs := bpf_load.o xdpsock_user.o
>  xdp_fwd-objs := bpf_load.o xdp_fwd_user.o
>  task_fd_query-objs := bpf_load.o task_fd_query_user.o $(TRACE_HELPERS)
> +xdp_sample_pkts-objs := bpf_load.o xdp_sample_pkts_user.o $(TRACE_HELPERS)
>
>  # Tell kbuild to always build the programs
>  always := $(hostprogs-y)
> @@ -163,6 +165,7 @@ always += xdp_adjust_tail_kern.o
>  always += xdpsock_kern.o
>  always += xdp_fwd_kern.o
>  always += task_fd_query_kern.o
> +always += xdp_sample_pkts_kern.o
>
>  HOSTCFLAGS += -I$(objtree)/usr/include
>  HOSTCFLAGS += -I$(srctree)/tools/lib/
> @@ -179,6 +182,7 @@ HOSTCFLAGS_spintest_user.o += -I$(srctree)/tools/lib/bpf/
>  HOSTCFLAGS_trace_event_user.o += -I$(srctree)/tools/lib/bpf/
>  HOSTCFLAGS_sampleip_user.o += -I$(srctree)/tools/lib/bpf/
>  HOSTCFLAGS_task_fd_query_user.o += -I$(srctree)/tools/lib/bpf/
> +HOSTCFLAGS_xdp_sample_pkts_user.o += -I$(srctree)/tools/lib/bpf/
>
>  HOST_LOADLIBES         += $(LIBBPF) -lelf
>  HOSTLOADLIBES_tracex4          += -lrt
> diff --git a/samples/bpf/xdp_sample_pkts_kern.c b/samples/bpf/xdp_sample_pkts_kern.c
> new file mode 100644
> index 0000000..c58183a
> --- /dev/null
> +++ b/samples/bpf/xdp_sample_pkts_kern.c
> @@ -0,0 +1,48 @@
> +#include <linux/ptrace.h>
> +#include <linux/version.h>
> +#include <uapi/linux/bpf.h>
> +#include "bpf_helpers.h"
> +
> +#define SAMPLE_SIZE 64ul
> +
> +struct bpf_map_def SEC("maps") my_map = {
> +       .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
> +       .key_size = sizeof(int),
> +       .value_size = sizeof(u32),
> +       .max_entries = 2,
> +};
> +
> +SEC("xdp_sample")
> +int xdp_sample_prog(struct xdp_md *ctx)
> +{
> +       void *data_end = (void *)(long)ctx->data_end;
> +       void *data = (void *)(long)ctx->data;
> +
> +        /* Metadata will be in the perf event before the packet data. */
> +       struct S {
> +               u16 cookie;
> +               u16 pkt_len;
> +       } __attribute__((packed)) metadata;
> +
> +       if (data + SAMPLE_SIZE < data_end) {
> +               /* The XDP perf_event_output handler will use the upper 32 bits
> +                * of the flags argument as a number of bytes to include of the
> +                * packet payload in the event data. If the size is too big, the
> +                * call to bpf_perf_event_output will fail and return -EFAULT.
> +                *
> +                * See bpf_xdp_event_output in net/core/filter.c.
> +                */
> +               u64 flags = SAMPLE_SIZE << 32;
> +
> +               metadata.cookie = 0xdead;
> +               metadata.pkt_len = (u16)(data_end - data);
> +
> +               bpf_perf_event_output(ctx, &my_map, flags,
> +                                     &metadata, sizeof(metadata));
> +       }
> +
> +       return XDP_PASS;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> +u32 _version SEC("version") = LINUX_VERSION_CODE;
> diff --git a/samples/bpf/xdp_sample_pkts_user.c b/samples/bpf/xdp_sample_pkts_user.c
> new file mode 100644
> index 0000000..f996917
> --- /dev/null
> +++ b/samples/bpf/xdp_sample_pkts_user.c
> @@ -0,0 +1,147 @@
> +/* This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <stdlib.h>
> +#include <stdbool.h>
> +#include <string.h>
> +#include <fcntl.h>
> +#include <poll.h>
> +#include <linux/perf_event.h>
> +#include <linux/bpf.h>
> +#include <net/if.h>
> +#include <errno.h>
> +#include <assert.h>
> +#include <sys/syscall.h>
> +#include <sys/ioctl.h>
> +#include <sys/mman.h>
> +#include <time.h>
> +#include <signal.h>
> +#include <libbpf.h>
> +#include "bpf_load.h"
> +#include "bpf_util.h"
> +#include <bpf/bpf.h>
> +
> +#include "perf-sys.h"
> +#include "trace_helpers.h"
> +
> +static int pmu_fd, if_idx = 0;
> +static char *if_name;
> +
> +static int do_attach(int idx, int fd, const char *name)
> +{
> +       int err;
> +
> +       err = bpf_set_link_xdp_fd(idx, fd, 0);
> +       if (err < 0)
> +               printf("ERROR: failed to attach program to %s\n", name);
> +
> +       return err;
> +}
> +
> +static int do_detach(int idx, const char *name)
> +{
> +       int err;
> +
> +       err = bpf_set_link_xdp_fd(idx, -1, 0);
> +       if (err < 0)
> +               printf("ERROR: failed to detach program from %s\n", name);
> +
> +       return err;
> +}
> +
> +#define SAMPLE_SIZE 64
> +
> +static int print_bpf_output(void *data, int size)
> +{
> +       struct {
> +               __u16 cookie;
> +               __u16 pkt_len;
> +               __u8  pkt_data[SAMPLE_SIZE];
> +       } __attribute__((packed)) *e = data;
> +       int i;
> +
> +       if (e->cookie != 0xdead) {
> +               printf("BUG cookie %x sized %d\n",
> +                      e->cookie, size);
> +               return LIBBPF_PERF_EVENT_ERROR;
> +       }
> +
> +       printf("Pkt len: %-5d bytes. Ethernet hdr: ", e->pkt_len);
> +       for (i = 0; i < 14 && i < e->pkt_len; i++)
> +               printf("%02x ", e->pkt_data[i]);
> +       printf("\n");
> +
> +       return LIBBPF_PERF_EVENT_CONT;
> +}
> +
> +static void test_bpf_perf_event(void)
> +{
> +       struct perf_event_attr attr = {
> +               .sample_type = PERF_SAMPLE_RAW,
> +               .type = PERF_TYPE_SOFTWARE,
> +               .config = PERF_COUNT_SW_BPF_OUTPUT,
> +       };
> +       int key = 0;
> +
> +       pmu_fd = sys_perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
> +
> +       assert(pmu_fd >= 0);
> +       assert(bpf_map_update_elem(map_fd[0], &key, &pmu_fd, BPF_ANY) == 0);
> +       ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
> +}
> +
> +static void sig_handler(int signo)
> +{
> +       do_detach(if_idx, if_name);
> +       exit(0);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +       char filename[256];
> +       int ret, err;
> +
> +       if (argc < 2) {
> +               printf("Usage: %s <ifname>\n", argv[0]);
> +               return 1;
> +       }
> +
> +       snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
> +
> +       if (load_bpf_file(filename)) {
> +               printf("%s", bpf_log_buf);
> +               return 1;
> +       }
> +
> +       if_idx = if_nametoindex(argv[1]);
> +       if (!if_idx)
> +               if_idx = strtoul(argv[1], NULL, 0);
> +
> +       if (!if_idx) {
> +               fprintf(stderr, "Invalid ifname\n");
> +               return 1;
> +       }
> +       if_name = argv[1];
> +       err = do_attach(if_idx, prog_fd[0], argv[1]);
> +       if (err)
> +               return err;
> +
> +       if (signal(SIGINT, sig_handler) ||
> +           signal(SIGHUP, sig_handler) ||
> +           signal(SIGTERM, sig_handler)) {
> +               perror("signal");
> +               return 1;
> +       }
> +
> +       test_bpf_perf_event();
> +
> +       if (perf_event_mmap(pmu_fd) < 0)
> +               return 1;
> +
> +       ret = perf_event_poller(pmu_fd, print_bpf_output);
> +       kill(0, SIGINT);
> +       return ret;
> +}
> --
> 2.7.4
>

^ permalink raw reply

* [PATCH -net] net: ethernet: davinci_emac: fix error handling in probe()
From: Dan Carpenter @ 2018-05-31  6:44 UTC (permalink / raw)
  To: Grygorii Strashko, Cyril Chemparathy
  Cc: David S. Miller, Rob Herring, Lukas Wunner, Florian Fainelli,
	Ivan Khoronzhuk, linux-omap, netdev, kernel-janitors

The current error handling code has an issue where it does:

	if (priv->txchan)
		cpdma_chan_destroy(priv->txchan);

The problem is that ->txchan is either valid or an error pointer (which
would lead to an Oops).  I've changed it to use multiple error labels so
that the test can be removed.

Also there were some missing calls to netif_napi_del().

Fixes: 3ef0fdb2342c ("net: davinci_emac: switch to new cpdma layer")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index be0fec17d95d..06d7c9e4dcda 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -1873,7 +1873,7 @@ static int davinci_emac_probe(struct platform_device *pdev)
 	if (IS_ERR(priv->txchan)) {
 		dev_err(&pdev->dev, "error initializing tx dma channel\n");
 		rc = PTR_ERR(priv->txchan);
-		goto no_cpdma_chan;
+		goto err_free_dma;
 	}
 
 	priv->rxchan = cpdma_chan_create(priv->dma, EMAC_DEF_RX_CH,
@@ -1881,14 +1881,14 @@ static int davinci_emac_probe(struct platform_device *pdev)
 	if (IS_ERR(priv->rxchan)) {
 		dev_err(&pdev->dev, "error initializing rx dma channel\n");
 		rc = PTR_ERR(priv->rxchan);
-		goto no_cpdma_chan;
+		goto err_free_txchan;
 	}
 
 	res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
 	if (!res) {
 		dev_err(&pdev->dev, "error getting irq res\n");
 		rc = -ENOENT;
-		goto no_cpdma_chan;
+		goto err_free_rxchan;
 	}
 	ndev->irq = res->start;
 
@@ -1914,7 +1914,7 @@ static int davinci_emac_probe(struct platform_device *pdev)
 		pm_runtime_put_noidle(&pdev->dev);
 		dev_err(&pdev->dev, "%s: failed to get_sync(%d)\n",
 			__func__, rc);
-		goto no_cpdma_chan;
+		goto err_napi_del;
 	}
 
 	/* register the network device */
@@ -1924,7 +1924,7 @@ static int davinci_emac_probe(struct platform_device *pdev)
 		dev_err(&pdev->dev, "error in register_netdev\n");
 		rc = -ENODEV;
 		pm_runtime_put(&pdev->dev);
-		goto no_cpdma_chan;
+		goto err_napi_del;
 	}
 
 
@@ -1937,11 +1937,13 @@ static int davinci_emac_probe(struct platform_device *pdev)
 
 	return 0;
 
-no_cpdma_chan:
-	if (priv->txchan)
-		cpdma_chan_destroy(priv->txchan);
-	if (priv->rxchan)
-		cpdma_chan_destroy(priv->rxchan);
+err_napi_del:
+	netif_napi_del(&priv->napi);
+err_free_rxchan:
+	cpdma_chan_destroy(priv->rxchan);
+err_free_txchan:
+	cpdma_chan_destroy(priv->txchan);
+err_free_dma:
 	cpdma_ctlr_destroy(priv->dma);
 no_pdata:
 	if (of_phy_is_fixed_link(np))

^ permalink raw reply related

* [PATCH net-next v2] net: sched: split tc_ctl_tfilter into three handlers
From: Vlad Buslov @ 2018-05-31  6:52 UTC (permalink / raw)
  To: jiri; +Cc: netdev, jhs, xiyou.wangcong, davem, Vlad Buslov
In-Reply-To: <20180530202533.GE2010@nanopsycho>

tc_ctl_tfilter handles three netlink message types: RTM_NEWTFILTER,
RTM_DELTFILTER, RTM_GETTFILTER. However, implementation of this function
involves a lot of branching on specific message type because most of the
code is message-specific. This significantly complicates adding new
functionality and doesn't provide much benefit of code reuse.

Split tc_ctl_tfilter to three standalone functions that handle filter new,
delete and get requests.

The only truly protocol independent part of tc_ctl_tfilter is code that
looks up queue, class, and block. Refactor this code to standalone
tcf_block_find function that is used by all three new handlers.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
Changes V1 -> V2:
- Rebase on current net-next

 net/sched/cls_api.c | 438 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 293 insertions(+), 145 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 76303c45db19..c06585fb2dc6 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -437,6 +437,78 @@ static struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index)
 	return idr_find(&tn->idr, block_index);
 }
 
+/* Find tcf block.
+ * Set q, parent, cl when appropriate.
+ */
+
+static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
+					u32 *parent, unsigned long *cl,
+					int ifindex, u32 block_index,
+					struct netlink_ext_ack *extack)
+{
+	struct tcf_block *block;
+
+	if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
+		block = tcf_block_lookup(net, block_index);
+		if (!block) {
+			NL_SET_ERR_MSG(extack, "Block of given index was not found");
+			return ERR_PTR(-EINVAL);
+		}
+	} else {
+		const struct Qdisc_class_ops *cops;
+		struct net_device *dev;
+
+		/* Find link */
+		dev = __dev_get_by_index(net, ifindex);
+		if (!dev)
+			return ERR_PTR(-ENODEV);
+
+		/* Find qdisc */
+		if (!*parent) {
+			*q = dev->qdisc;
+			*parent = (*q)->handle;
+		} else {
+			*q = qdisc_lookup(dev, TC_H_MAJ(*parent));
+			if (!*q) {
+				NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
+				return ERR_PTR(-EINVAL);
+			}
+		}
+
+		/* Is it classful? */
+		cops = (*q)->ops->cl_ops;
+		if (!cops) {
+			NL_SET_ERR_MSG(extack, "Qdisc not classful");
+			return ERR_PTR(-EINVAL);
+		}
+
+		if (!cops->tcf_block) {
+			NL_SET_ERR_MSG(extack, "Class doesn't support blocks");
+			return ERR_PTR(-EOPNOTSUPP);
+		}
+
+		/* Do we search for filter, attached to class? */
+		if (TC_H_MIN(*parent)) {
+			*cl = cops->find(*q, *parent);
+			if (*cl == 0) {
+				NL_SET_ERR_MSG(extack, "Specified class doesn't exist");
+				return ERR_PTR(-ENOENT);
+			}
+		}
+
+		/* And the last stroke */
+		block = cops->tcf_block(*q, *cl, extack);
+		if (!block)
+			return ERR_PTR(-EINVAL);
+		if (tcf_block_shared(block)) {
+			NL_SET_ERR_MSG(extack, "This filter block is shared. Please use the block index to manipulate the filters");
+			return ERR_PTR(-EOPNOTSUPP);
+		}
+	}
+
+	return block;
+}
+
 static struct tcf_chain *tcf_block_chain_zero(struct tcf_block *block)
 {
 	return list_first_entry(&block->chain_list, struct tcf_chain, list);
@@ -984,9 +1056,7 @@ static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
 			       q, parent, 0, event, false);
 }
 
-/* Add/change/delete/get a filter node */
-
-static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
+static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			  struct netlink_ext_ack *extack)
 {
 	struct net *net = sock_net(skb->sk);
@@ -1007,8 +1077,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	int err;
 	int tp_created;
 
-	if ((n->nlmsg_type != RTM_GETTFILTER) &&
-	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 replay:
@@ -1026,24 +1095,13 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	cl = 0;
 
 	if (prio == 0) {
-		switch (n->nlmsg_type) {
-		case RTM_DELTFILTER:
-			if (protocol || t->tcm_handle || tca[TCA_KIND]) {
-				NL_SET_ERR_MSG(extack, "Cannot flush filters with protocol, handle or kind set");
-				return -ENOENT;
-			}
-			break;
-		case RTM_NEWTFILTER:
-			/* If no priority is provided by the user,
-			 * we allocate one.
-			 */
-			if (n->nlmsg_flags & NLM_F_CREATE) {
-				prio = TC_H_MAKE(0x80000000U, 0U);
-				prio_allocate = true;
-				break;
-			}
-			/* fall-through */
-		default:
+		/* If no priority is provided by the user,
+		 * we allocate one.
+		 */
+		if (n->nlmsg_flags & NLM_F_CREATE) {
+			prio = TC_H_MAKE(0x80000000U, 0U);
+			prio_allocate = true;
+		} else {
 			NL_SET_ERR_MSG(extack, "Invalid filter command with priority of zero");
 			return -ENOENT;
 		}
@@ -1051,66 +1109,11 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 	/* Find head of filter chain. */
 
-	if (t->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-		block = tcf_block_lookup(net, t->tcm_block_index);
-		if (!block) {
-			NL_SET_ERR_MSG(extack, "Block of given index was not found");
-			err = -EINVAL;
-			goto errout;
-		}
-	} else {
-		const struct Qdisc_class_ops *cops;
-		struct net_device *dev;
-
-		/* Find link */
-		dev = __dev_get_by_index(net, t->tcm_ifindex);
-		if (!dev)
-			return -ENODEV;
-
-		/* Find qdisc */
-		if (!parent) {
-			q = dev->qdisc;
-			parent = q->handle;
-		} else {
-			q = qdisc_lookup(dev, TC_H_MAJ(t->tcm_parent));
-			if (!q) {
-				NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
-				return -EINVAL;
-			}
-		}
-
-		/* Is it classful? */
-		cops = q->ops->cl_ops;
-		if (!cops) {
-			NL_SET_ERR_MSG(extack, "Qdisc not classful");
-			return -EINVAL;
-		}
-
-		if (!cops->tcf_block) {
-			NL_SET_ERR_MSG(extack, "Class doesn't support blocks");
-			return -EOPNOTSUPP;
-		}
-
-		/* Do we search for filter, attached to class? */
-		if (TC_H_MIN(parent)) {
-			cl = cops->find(q, parent);
-			if (cl == 0) {
-				NL_SET_ERR_MSG(extack, "Specified class doesn't exist");
-				return -ENOENT;
-			}
-		}
-
-		/* And the last stroke */
-		block = cops->tcf_block(q, cl, extack);
-		if (!block) {
-			err = -EINVAL;
-			goto errout;
-		}
-		if (tcf_block_shared(block)) {
-			NL_SET_ERR_MSG(extack, "This filter block is shared. Please use the block index to manipulate the filters");
-			err = -EOPNOTSUPP;
-			goto errout;
-		}
+	block = tcf_block_find(net, &q, &parent, &cl,
+			       t->tcm_ifindex, t->tcm_block_index, extack);
+	if (IS_ERR(block)) {
+		err = PTR_ERR(block);
+		goto errout;
 	}
 
 	chain_index = tca[TCA_CHAIN] ? nla_get_u32(tca[TCA_CHAIN]) : 0;
@@ -1119,19 +1122,10 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		err = -EINVAL;
 		goto errout;
 	}
-	chain = tcf_chain_get(block, chain_index,
-			      n->nlmsg_type == RTM_NEWTFILTER);
+	chain = tcf_chain_get(block, chain_index, true);
 	if (!chain) {
 		NL_SET_ERR_MSG(extack, "Cannot find specified filter chain");
-		err = n->nlmsg_type == RTM_NEWTFILTER ? -ENOMEM : -EINVAL;
-		goto errout;
-	}
-
-	if (n->nlmsg_type == RTM_DELTFILTER && prio == 0) {
-		tfilter_notify_chain(net, skb, block, q, parent, n,
-				     chain, RTM_DELTFILTER);
-		tcf_chain_flush(chain);
-		err = 0;
+		err = -ENOMEM;
 		goto errout;
 	}
 
@@ -1152,8 +1146,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			goto errout;
 		}
 
-		if (n->nlmsg_type != RTM_NEWTFILTER ||
-		    !(n->nlmsg_flags & NLM_F_CREATE)) {
+		if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 			NL_SET_ERR_MSG(extack, "Need both RTM_NEWTFILTER and NLM_F_CREATE to create a new filter");
 			err = -ENOENT;
 			goto errout;
@@ -1178,56 +1171,15 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	fh = tp->ops->get(tp, t->tcm_handle);
 
 	if (!fh) {
-		if (n->nlmsg_type == RTM_DELTFILTER && t->tcm_handle == 0) {
-			tcf_chain_tp_remove(chain, &chain_info, tp);
-			tfilter_notify(net, skb, n, tp, block, q, parent, fh,
-				       RTM_DELTFILTER, false);
-			tcf_proto_destroy(tp, extack);
-			err = 0;
-			goto errout;
-		}
-
-		if (n->nlmsg_type != RTM_NEWTFILTER ||
-		    !(n->nlmsg_flags & NLM_F_CREATE)) {
+		if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 			NL_SET_ERR_MSG(extack, "Need both RTM_NEWTFILTER and NLM_F_CREATE to create a new filter");
 			err = -ENOENT;
 			goto errout;
 		}
-	} else {
-		bool last;
-
-		switch (n->nlmsg_type) {
-		case RTM_NEWTFILTER:
-			if (n->nlmsg_flags & NLM_F_EXCL) {
-				if (tp_created)
-					tcf_proto_destroy(tp, NULL);
-				NL_SET_ERR_MSG(extack, "Filter already exists");
-				err = -EEXIST;
-				goto errout;
-			}
-			break;
-		case RTM_DELTFILTER:
-			err = tfilter_del_notify(net, skb, n, tp, block,
-						 q, parent, fh, false, &last,
-						 extack);
-			if (err)
-				goto errout;
-			if (last) {
-				tcf_chain_tp_remove(chain, &chain_info, tp);
-				tcf_proto_destroy(tp, extack);
-			}
-			goto errout;
-		case RTM_GETTFILTER:
-			err = tfilter_notify(net, skb, n, tp, block, q, parent,
-					     fh, RTM_NEWTFILTER, true);
-			if (err < 0)
-				NL_SET_ERR_MSG(extack, "Failed to send filter notify message");
-			goto errout;
-		default:
-			NL_SET_ERR_MSG(extack, "Invalid netlink message type");
-			err = -EINVAL;
-			goto errout;
-		}
+	} else if (n->nlmsg_flags & NLM_F_EXCL) {
+		NL_SET_ERR_MSG(extack, "Filter already exists");
+		err = -EEXIST;
+		goto errout;
 	}
 
 	err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh,
@@ -1252,6 +1204,202 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	return err;
 }
 
+static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
+			  struct netlink_ext_ack *extack)
+{
+	struct net *net = sock_net(skb->sk);
+	struct nlattr *tca[TCA_MAX + 1];
+	struct tcmsg *t;
+	u32 protocol;
+	u32 prio;
+	u32 parent;
+	u32 chain_index;
+	struct Qdisc *q = NULL;
+	struct tcf_chain_info chain_info;
+	struct tcf_chain *chain = NULL;
+	struct tcf_block *block;
+	struct tcf_proto *tp = NULL;
+	unsigned long cl = 0;
+	void *fh = NULL;
+	int err;
+
+	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
+		return -EPERM;
+
+	err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, NULL, extack);
+	if (err < 0)
+		return err;
+
+	t = nlmsg_data(n);
+	protocol = TC_H_MIN(t->tcm_info);
+	prio = TC_H_MAJ(t->tcm_info);
+	parent = t->tcm_parent;
+
+	if (prio == 0 && (protocol || t->tcm_handle || tca[TCA_KIND])) {
+		NL_SET_ERR_MSG(extack, "Cannot flush filters with protocol, handle or kind set");
+		return -ENOENT;
+	}
+
+	/* Find head of filter chain. */
+
+	block = tcf_block_find(net, &q, &parent, &cl,
+			       t->tcm_ifindex, t->tcm_block_index, extack);
+	if (IS_ERR(block)) {
+		err = PTR_ERR(block);
+		goto errout;
+	}
+
+	chain_index = tca[TCA_CHAIN] ? nla_get_u32(tca[TCA_CHAIN]) : 0;
+	if (chain_index > TC_ACT_EXT_VAL_MASK) {
+		NL_SET_ERR_MSG(extack, "Specified chain index exceeds upper limit");
+		err = -EINVAL;
+		goto errout;
+	}
+	chain = tcf_chain_get(block, chain_index, false);
+	if (!chain) {
+		NL_SET_ERR_MSG(extack, "Cannot find specified filter chain");
+		err = -EINVAL;
+		goto errout;
+	}
+
+	if (prio == 0) {
+		tfilter_notify_chain(net, skb, block, q, parent, n,
+				     chain, RTM_DELTFILTER);
+		tcf_chain_flush(chain);
+		err = 0;
+		goto errout;
+	}
+
+	tp = tcf_chain_tp_find(chain, &chain_info, protocol,
+			       prio, false);
+	if (!tp || IS_ERR(tp)) {
+		NL_SET_ERR_MSG(extack, "Filter with specified priority/protocol not found");
+		err = PTR_ERR(tp);
+		goto errout;
+	} else if (tca[TCA_KIND] && nla_strcmp(tca[TCA_KIND], tp->ops->kind)) {
+		NL_SET_ERR_MSG(extack, "Specified filter kind does not match existing one");
+		err = -EINVAL;
+		goto errout;
+	}
+
+	fh = tp->ops->get(tp, t->tcm_handle);
+
+	if (!fh) {
+		if (t->tcm_handle == 0) {
+			tcf_chain_tp_remove(chain, &chain_info, tp);
+			tfilter_notify(net, skb, n, tp, block, q, parent, fh,
+				       RTM_DELTFILTER, false);
+			tcf_proto_destroy(tp, extack);
+			err = 0;
+		} else {
+			NL_SET_ERR_MSG(extack, "Specified filter handle not found");
+			err = -ENOENT;
+		}
+	} else {
+		bool last;
+
+		err = tfilter_del_notify(net, skb, n, tp, block,
+					 q, parent, fh, false, &last,
+					 extack);
+		if (err)
+			goto errout;
+		if (last) {
+			tcf_chain_tp_remove(chain, &chain_info, tp);
+			tcf_proto_destroy(tp, extack);
+		}
+	}
+
+errout:
+	if (chain)
+		tcf_chain_put(chain);
+	return err;
+}
+
+static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
+			  struct netlink_ext_ack *extack)
+{
+	struct net *net = sock_net(skb->sk);
+	struct nlattr *tca[TCA_MAX + 1];
+	struct tcmsg *t;
+	u32 protocol;
+	u32 prio;
+	u32 parent;
+	u32 chain_index;
+	struct Qdisc *q = NULL;
+	struct tcf_chain_info chain_info;
+	struct tcf_chain *chain = NULL;
+	struct tcf_block *block;
+	struct tcf_proto *tp = NULL;
+	unsigned long cl = 0;
+	void *fh = NULL;
+	int err;
+
+	err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, NULL, extack);
+	if (err < 0)
+		return err;
+
+	t = nlmsg_data(n);
+	protocol = TC_H_MIN(t->tcm_info);
+	prio = TC_H_MAJ(t->tcm_info);
+	parent = t->tcm_parent;
+
+	if (prio == 0) {
+		NL_SET_ERR_MSG(extack, "Invalid filter command with priority of zero");
+		return -ENOENT;
+	}
+
+	/* Find head of filter chain. */
+
+	block = tcf_block_find(net, &q, &parent, &cl,
+			       t->tcm_ifindex, t->tcm_block_index, extack);
+	if (IS_ERR(block)) {
+		err = PTR_ERR(block);
+		goto errout;
+	}
+
+	chain_index = tca[TCA_CHAIN] ? nla_get_u32(tca[TCA_CHAIN]) : 0;
+	if (chain_index > TC_ACT_EXT_VAL_MASK) {
+		NL_SET_ERR_MSG(extack, "Specified chain index exceeds upper limit");
+		err = -EINVAL;
+		goto errout;
+	}
+	chain = tcf_chain_get(block, chain_index, false);
+	if (!chain) {
+		NL_SET_ERR_MSG(extack, "Cannot find specified filter chain");
+		err = -EINVAL;
+		goto errout;
+	}
+
+	tp = tcf_chain_tp_find(chain, &chain_info, protocol,
+			       prio, false);
+	if (!tp || IS_ERR(tp)) {
+		NL_SET_ERR_MSG(extack, "Filter with specified priority/protocol not found");
+		err = PTR_ERR(tp);
+		goto errout;
+	} else if (tca[TCA_KIND] && nla_strcmp(tca[TCA_KIND], tp->ops->kind)) {
+		NL_SET_ERR_MSG(extack, "Specified filter kind does not match existing one");
+		err = -EINVAL;
+		goto errout;
+	}
+
+	fh = tp->ops->get(tp, t->tcm_handle);
+
+	if (!fh) {
+		NL_SET_ERR_MSG(extack, "Specified filter handle not found");
+		err = -ENOENT;
+	} else {
+		err = tfilter_notify(net, skb, n, tp, block, q, parent,
+				     fh, RTM_NEWTFILTER, true);
+		if (err < 0)
+			NL_SET_ERR_MSG(extack, "Failed to send filter notify message");
+	}
+
+errout:
+	if (chain)
+		tcf_chain_put(chain);
+	return err;
+}
+
 struct tcf_dump_args {
 	struct tcf_walker w;
 	struct sk_buff *skb;
@@ -1634,9 +1782,9 @@ static int __init tc_filter_init(void)
 	if (err)
 		goto err_register_pernet_subsys;
 
-	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_ctl_tfilter, NULL, 0);
-	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_ctl_tfilter, NULL, 0);
-	rtnl_register(PF_UNSPEC, RTM_GETTFILTER, tc_ctl_tfilter,
+	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL, 0);
+	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL, 0);
+	rtnl_register(PF_UNSPEC, RTM_GETTFILTER, tc_get_tfilter,
 		      tc_dump_tfilter, 0);
 
 	return 0;
-- 
2.7.5

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox