Netdev List
 help / color / mirror / Atom feed
* [PATCH net] net/ncsi: Fix array size in dumpit handler
From: Samuel Mendoza-Jonas @ 2018-05-31  4:10 UTC (permalink / raw)
  To: netdev; +Cc: Samuel Mendoza-Jonas, David S . Miller, linux-kernel, openbmc

With CONFIG_CC_STACKPROTECTOR enabled the kernel panics as below when
parsing a NCSI_CMD_PKG_INFO command:

[  150.149711] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: 805cff08
[  150.149711]
[  150.159919] CPU: 0 PID: 1301 Comm: ncsi-netlink Not tainted 4.13.16-468cbec6d2c91239332cb91b1f0a73aafcb6f0c6 #1
[  150.170004] Hardware name: Generic DT based system
[  150.174852] [<80109930>] (unwind_backtrace) from [<80106bc4>] (show_stack+0x20/0x24)
[  150.182641] [<80106bc4>] (show_stack) from [<805d36e4>] (dump_stack+0x20/0x28)
[  150.189888] [<805d36e4>] (dump_stack) from [<801163ac>] (panic+0xdc/0x278)
[  150.196780] [<801163ac>] (panic) from [<801162cc>] (__stack_chk_fail+0x20/0x24)
[  150.204111] [<801162cc>] (__stack_chk_fail) from [<805cff08>] (ncsi_pkg_info_all_nl+0x244/0x258)
[  150.212912] [<805cff08>] (ncsi_pkg_info_all_nl) from [<804f939c>] (genl_lock_dumpit+0x3c/0x54)
[  150.221535] [<804f939c>] (genl_lock_dumpit) from [<804f873c>] (netlink_dump+0xf8/0x284)
[  150.229550] [<804f873c>] (netlink_dump) from [<804f8d44>] (__netlink_dump_start+0x124/0x17c)
[  150.237992] [<804f8d44>] (__netlink_dump_start) from [<804f9880>] (genl_rcv_msg+0x1c8/0x3d4)
[  150.246440] [<804f9880>] (genl_rcv_msg) from [<804f9174>] (netlink_rcv_skb+0xd8/0x134)
[  150.254361] [<804f9174>] (netlink_rcv_skb) from [<804f96a4>] (genl_rcv+0x30/0x44)
[  150.261850] [<804f96a4>] (genl_rcv) from [<804f7790>] (netlink_unicast+0x198/0x234)
[  150.269511] [<804f7790>] (netlink_unicast) from [<804f7ffc>] (netlink_sendmsg+0x368/0x3b0)
[  150.277783] [<804f7ffc>] (netlink_sendmsg) from [<804abea4>] (sock_sendmsg+0x24/0x34)
[  150.285625] [<804abea4>] (sock_sendmsg) from [<804ac1dc>] (___sys_sendmsg+0x244/0x260)
[  150.293556] [<804ac1dc>] (___sys_sendmsg) from [<804ad98c>] (__sys_sendmsg+0x5c/0x9c)
[  150.301400] [<804ad98c>] (__sys_sendmsg) from [<804ad9e4>] (SyS_sendmsg+0x18/0x1c)
[  150.308984] [<804ad9e4>] (SyS_sendmsg) from [<80102640>] (ret_fast_syscall+0x0/0x3c)
[  150.316743] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: 805cff08

This turns out to be because the attrs array in ncsi_pkg_info_all_nl()
is initialised to a length of NCSI_ATTR_MAX which is the maximum
attribute number, not the number of attributes.

Fixes: 955dc68cb9b2 ("net/ncsi: Add generic netlink family")
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
---
 net/ncsi/ncsi-netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ncsi/ncsi-netlink.c b/net/ncsi/ncsi-netlink.c
index 8d7e849d4825..41cede4041d3 100644
--- a/net/ncsi/ncsi-netlink.c
+++ b/net/ncsi/ncsi-netlink.c
@@ -215,7 +215,7 @@ static int ncsi_pkg_info_nl(struct sk_buff *msg, struct genl_info *info)
 static int ncsi_pkg_info_all_nl(struct sk_buff *skb,
 				struct netlink_callback *cb)
 {
-	struct nlattr *attrs[NCSI_ATTR_MAX];
+	struct nlattr *attrs[NCSI_ATTR_MAX + 1];
 	struct ncsi_package *np, *package;
 	struct ncsi_dev_priv *ndp;
 	unsigned int package_id;
-- 
2.17.0

^ permalink raw reply related

* [PATCH net-next] net: netcp: ethss: remove unnecessary pointer set to NULL
From: YueHaibing @ 2018-05-31  3:48 UTC (permalink / raw)
  To: davem, w-kwok2, m-karicheri2; +Cc: netdev, linux-kernel, YueHaibing

If statement has make sure the 'slave->phy' is NULL

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/ethernet/ti/netcp_ethss.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index 6a728d3..6e455a2 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -3206,7 +3206,6 @@ static void init_secondary_ports(struct gbe_priv *gbe_dev,
 		if (!slave->phy) {
 			dev_err(dev, "phy not found for slave %d\n",
 				slave->slave_num);
-			slave->phy = NULL;
 		} else {
 			dev_dbg(dev, "phy found: id is: 0x%s\n",
 				phydev_name(slave->phy));
-- 
2.7.0

^ permalink raw reply related

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
From: Samudrala, Sridhar @ 2018-05-31  3:35 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Michael Chan, David Miller, Netdev, Or Gerlitz
In-Reply-To: <20180530155337.691f1ad4@cakuba>

On 5/30/2018 3:53 PM, Jakub Kicinski wrote:
> On Wed, 30 May 2018 14:23:06 -0700, Samudrala, Sridhar wrote:
>> On 5/29/2018 11:33 PM, Jakub Kicinski wrote:
>>> On Tue, 29 May 2018 23:08:11 -0700, Michael Chan wrote:
>>>> On Tue, May 29, 2018 at 10:56 PM, Jakub Kicinski wrote:
>>>>> On Tue, 29 May 2018 20:19:54 -0700, Michael Chan wrote:
>>>>>> On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar wrote:
>>>>>>> Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
>>>>>>> extended?
>>>>> +1 it's painful to see this feature being added to the legacy
>>>>> API :(  Another duplicated configuration knob.
>>>>>   
>>>>>> I didn't know about that.
>>>>>>   
>>>>>>> Shouldn't we enable this via ethtool on the port representor netdev?
>>>>>> We discussed about this.  ethtool on the VF representor will only work
>>>>>> in switchdev mode and also will not support min/max values.
>>>>> Ethtool channel API may be overdue a rewrite in devlink anyway, but I
>>>>> feel like implementing switchdev mode and rewriting features in devlink
>>>>> may be too much to ask.
>>>> Totally agreed.  And switchdev mode doesn't seem to be that widely
>>>> used at the moment.  Do you have other suggestions besides NDO?
>>> At some points you (Broadcom) were working whole bunch of devlink
>>> configuration options for the PCIe side of the ASIC.  The number of
>>> queues relates to things like number of allocated MSI-X vectors, which
>>> if memory serves me was in your devlink patch set.  In an ideal world
>>> we would try to keep all those in one place :)
>>>
>>> For PCIe config there is always the question of what can be configured
>>> at runtime, and what requires a HW reset.  Therefore that devlink API
>>> which could configure current as well as persistent device settings was
>>> quite nice.  I'm not sure if reallocating queues would ever require
>>> PCIe block reset but maybe...  Certainly it seems the notion of min
>>> queues would make more sense in PCIe configuration devlink API than
>>> ethtool channel API to me as well.
>>>
>>> Queues are in the grey area between netdev and non-netdev constructs.
>>> They make sense both from PCIe resource allocation perspective (i.e.
>>> devlink PCIe settings) and netdev perspective (ethtool) because they
>>> feed into things like qdisc offloads, maybe per-queue stats etc.
>>>
>>> So yes...  IMHO it would be nice to add this to a devlink SR-IOV config
>>> API and/or switchdev representors.  But neither of those are really an
>>> option for you today so IDK :)
>> One reason why 'switchdev' mode is not yet widely used or enabled by default
>> could be due to the requirement to program the flow rules only via slow path.
> Do you mean the fallback traffic requirement?

Yes.

>
>> Would it make sense to relax this requirement and support a mode where port
>> representors are created and let the PF driver implement a default policy that
>> adds flow rules for all the VFs to enable connectivity and let the user
>> add/modify the rules via port representors?
> I definitely share your concerns, stopping a major HW vendor from using
> this new and preferred mode is not helping us make progress.
>
> The problem is that if we allow this diversion, i.e. driver to implement
> some special policy, or pre-populate a bridge in a configuration that
> suits the HW we may condition users to expect that as the standard Linux
> behaviour.  And we will be stuck with it forever even tho your next gen
> HW (ice?) may support correct behaviour.

Yes. ice can support slowpath behavior as required to support OVS offload.
However, i was just wondering if we should have an option to allow switchdev
without slowpath so that the user can use alternate mechanisms to program
the flow rules instead of having to use OVS.


>
> We should perhaps separate switchdev mode from TC flower/OvS offloads.
> Is your objective to implement OvS offload or just switchdev mode?
>
> For OvS without proper fallback behaviour you may struggle.
>
> Switchdev mode could be within your reach even without changing the
> default rules.  What if you spawned all port netdevs (I dislike the
> term representor, sorry, it's confusing people) in down state and then
> refuse to bring them up unless user instantiated a bridge that would
> behave in a way that your HW can support?  If ports are down you won't
> have fallback traffic so no problem to solve.

If we want to use port netdev's admin state to control the link state of the
VFs then this will not work.
We need to only disable TX/RX but admin state and link state need to be
supported on the port netdevs.

^ permalink raw reply

* Re: [RFC V5 PATCH 8/8] vhost: event suppression for packed ring
From: Jason Wang @ 2018-05-31  3:09 UTC (permalink / raw)
  To: Wei Xu; +Cc: mst, kvm, virtualization, netdev, linux-kernel, jfreimann,
	tiwei.bie
In-Reply-To: <20180530114200.GA23792@wei-ubt>



On 2018年05月30日 19:42, Wei Xu wrote:
>>   /* This actually signals the guest, using eventfd. */
>>   void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>>   {
>> @@ -2802,10 +2930,34 @@ static bool vhost_enable_notify_packed(struct vhost_dev *dev,
>>   				       struct vhost_virtqueue *vq)
>>   {
>>   	struct vring_desc_packed *d = vq->desc_packed + vq->avail_idx;
>> -	__virtio16 flags;
>> +	__virtio16 flags = RING_EVENT_FLAGS_ENABLE;
>>   	int ret;
>>   
>> -	/* FIXME: disable notification through device area */
>> +	if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
>> +		return false;
>> +	vq->used_flags &= ~VRING_USED_F_NO_NOTIFY;
> 'used_flags' was originally designed for 1.0, why should we pay attetion to it here?
>
> Wei

It was used to recored whether or not we've disabled notification. Then 
we can avoid unnecessary userspace writes or memory barriers.

Thanks

^ permalink raw reply

* Re: [PATCH net-next v12 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Samudrala, Sridhar @ 2018-05-31  3:03 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: alexander.h.duyck, virtio-dev, jiri, mst, kubakici, netdev,
	virtualization, loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <20180530220635.206ee6d7@shemminger-XPS-13-9360>



On 5/30/2018 7:06 PM, Stephen Hemminger wrote:
> On Thu, 24 May 2018 09:55:14 -0700
> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>
>> Use the registration/notification framework supported by the generic
>> failover infrastructure.
>>
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> Why was this merged? It was never signed off by any of the netvsc maintainers,
> and there were still issues unresolved.
>
> There are also namespaces issues I am fixing and this breaks them.
> Will start my patch set with a revert for this. Sorry

I would appreciate if you can make the fixes on top of this patch series. I tried hard
to make sure that netvsc functionality and behavior doesn't change.

It is possible that there could be some bugs introduced, but they can be fixed.
Looks like Wei already found a bug and submitted a fix for that.

^ permalink raw reply

* Re: [PATCH net-next v12 1/5] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-05-31  2:58 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
	alexander.h.duyck, kubakici, jasowang, loseweigh, jiri,
	aaron.f.brown, anjali.singhai
In-Reply-To: <20180530225259.2cf3f7be@shemminger-XPS-13-9360>



On 5/30/2018 7:52 PM, Stephen Hemminger wrote:
> On Fri, 25 May 2018 16:06:58 -0700
> "Samudrala, Sridhar" <sridhar.samudrala@intel.com> wrote:
>
>> On 5/25/2018 3:38 PM, Stephen Hemminger wrote:
>>> On Thu, 24 May 2018 09:55:13 -0700
>>> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>>>   
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index 03ed492c4e14..0f4ba52b641d 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>>> @@ -1421,6 +1421,8 @@ struct net_device_ops {
>>>>     *	entity (i.e. the master device for bridged veth)
>>>>     * @IFF_MACSEC: device is a MACsec device
>>>>     * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
>>>> + * @IFF_FAILOVER: device is a failover master device
>>>> + * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
>>>>     */
>>>>    enum netdev_priv_flags {
>>>>    	IFF_802_1Q_VLAN			= 1<<0,
>>>> @@ -1450,6 +1452,8 @@ enum netdev_priv_flags {
>>>>    	IFF_PHONY_HEADROOM		= 1<<24,
>>>>    	IFF_MACSEC			= 1<<25,
>>>>    	IFF_NO_RX_HANDLER		= 1<<26,
>>>> +	IFF_FAILOVER			= 1<<27,
>>>> +	IFF_FAILOVER_SLAVE		= 1<<28,
>>>>    };
>>> Why is FAILOVER any different than other master/slave relationships.
>>> I don't think you need to take up precious netdev flag bits for this.
>> These are netdev priv flags.
>> Jiri says that IFF_MASTER/IFF_SLAVE are bonding specific flags and cannot be used
>> with other failover mechanisms. Team also doesn't use this flags and it has its own
>> priv_flags.
>>
> This change breaks userspace.
> We already have worked with partners to ignore devices marked as IFF_SLAVE,
> and IFF_SLAVE is visible to user space API's.

I specifically made sure not to remove IFF_SLAVE in the netvsc patch.


>
> NAK

^ permalink raw reply

* Re: [PATCH net-next v12 1/5] net: Introduce generic failover module
From: Stephen Hemminger @ 2018-05-31  2:52 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
	alexander.h.duyck, kubakici, jasowang, loseweigh, jiri,
	aaron.f.brown, anjali.singhai
In-Reply-To: <00d34f67-f26f-0b20-af3f-2add24ae8a5c@intel.com>

On Fri, 25 May 2018 16:06:58 -0700
"Samudrala, Sridhar" <sridhar.samudrala@intel.com> wrote:

> On 5/25/2018 3:38 PM, Stephen Hemminger wrote:
> > On Thu, 24 May 2018 09:55:13 -0700
> > Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
> >  
> >> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >> index 03ed492c4e14..0f4ba52b641d 100644
> >> --- a/include/linux/netdevice.h
> >> +++ b/include/linux/netdevice.h
> >> @@ -1421,6 +1421,8 @@ struct net_device_ops {
> >>    *	entity (i.e. the master device for bridged veth)
> >>    * @IFF_MACSEC: device is a MACsec device
> >>    * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
> >> + * @IFF_FAILOVER: device is a failover master device
> >> + * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
> >>    */
> >>   enum netdev_priv_flags {
> >>   	IFF_802_1Q_VLAN			= 1<<0,
> >> @@ -1450,6 +1452,8 @@ enum netdev_priv_flags {
> >>   	IFF_PHONY_HEADROOM		= 1<<24,
> >>   	IFF_MACSEC			= 1<<25,
> >>   	IFF_NO_RX_HANDLER		= 1<<26,
> >> +	IFF_FAILOVER			= 1<<27,
> >> +	IFF_FAILOVER_SLAVE		= 1<<28,
> >>   };  
> > Why is FAILOVER any different than other master/slave relationships.
> > I don't think you need to take up precious netdev flag bits for this.  
> 
> These are netdev priv flags.
> Jiri says that IFF_MASTER/IFF_SLAVE are bonding specific flags and cannot be used
> with other failover mechanisms. Team also doesn't use this flags and it has its own
> priv_flags.
> 

This change breaks userspace.
We already have worked with partners to ignore devices marked as IFF_SLAVE,
and IFF_SLAVE is visible to user space API's.

NAK

^ permalink raw reply

* [PATCH net-next] net/smc: fix error return code in smc_setsockopt()
From: Wei Yongjun @ 2018-05-31  2:31 UTC (permalink / raw)
  To: Ursula Braun; +Cc: Wei Yongjun, linux-s390, netdev, kernel-janitors

Fix to return error code -EINVAL instead of 0 if optlen is invalid.

Fixes: 01d2f7e2cdd3 ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 net/smc/af_smc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 2c369d4..973b447 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1420,7 +1420,7 @@ static int smc_setsockopt(struct socket *sock, int level, int optname,
 		return rc;
 
 	if (optlen < sizeof(int))
-		return rc;
+		return -EINVAL;
 	get_user(val, (int __user *)optval);
 
 	lock_sock(sk);

^ permalink raw reply related

* [PATCH net-next] net/mlx5: Make function mlx5_fpga_tls_send_teardown_cmd() static
From: Wei Yongjun @ 2018-05-31  2:31 UTC (permalink / raw)
  To: Boris Pismenny, Saeed Mahameed, Leon Romanovsky, Ilya Lesokhin
  Cc: Wei Yongjun, netdev, linux-rdma, kernel-janitors

Fixes the following sparse warning:

drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c:199:6: warning:
 symbol 'mlx5_fpga_tls_send_teardown_cmd' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c
index 2104801..c973623 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c
@@ -196,8 +196,8 @@ static void mlx5_fpga_tls_flow_to_cmd(void *flow, void *cmd)
 		 MLX5_GET(tls_flow, flow, direction_sx));
 }
 
-void mlx5_fpga_tls_send_teardown_cmd(struct mlx5_core_dev *mdev, void *flow,
-				     u32 swid, gfp_t flags)
+static void mlx5_fpga_tls_send_teardown_cmd(struct mlx5_core_dev *mdev,
+					    void *flow, u32 swid, gfp_t flags)
 {
 	struct mlx5_teardown_stream_context *ctx;
 	struct mlx5_fpga_dma_buf *buf;

^ permalink raw reply related

* Re: [PATCH v2] Revert "alx: remove WoL support"
From: AceLan Kao @ 2018-05-31  2:13 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jay Cliburn, Chris Snook, David S . Miller, Rakesh Pandit, netdev,
	Emily Chien, Johannes Berg, Johannes Stezenbach,
	Linux-Kernel@Vger. Kernel. Org
In-Reply-To: <20180530135859.GB27537@lunn.ch>

Hi Andrew,

2018-05-30 21:58 GMT+08:00 Andrew Lunn <andrew@lunn.ch>:
> On Wed, May 30, 2018 at 10:10:08AM +0800, AceLan Kao wrote:
>> This reverts commit bc2bebe8de8ed4ba6482c9cc370b0dd72ffe8cd2.
>>
>> The WoL feature is a must to pass Energy Star 6.1 and above,
>> the power consumption will be measured during S3 with WoL is enabled.
>>
>> Reverting "alx: remove WoL support", and will try to fix the unintentional
>> wake up issue when WoL is enabled.
>
> Hi AceLan
>
> I find this change log entry rather odd.
>
> If i remember correctly, you first argued that you did not want to
> have to distribute out of tree patches.
Yes, once the secure boot is enabled, no dkms driver would be loaded.

>
> It was suggested that you might be able to justify the revert using
> the argument that the cure is worse than the decease. You ignored
I didn't try to ignore it, maybe I misunderstood what you say. I thought
you do not like the driver parameter, so I only revert back the alx wol
feature.

> that, and when with this Energy Star argument. That got shot down by
> DaveM, and told to actually try to find the problem.
To pass Energy Star is my purpose, I'm sorry to not mention it in the beginning.
We used to using dkms for the measurement, but secure boot is coming,
so we need to make wol feature to be built in the kernel.

And I've written to the device owners for help, but they are not care too much
about the wol feature and are not inconvenient for the testing. So I stuck here
until I saw the user report.

>
> So you then come back and said you think the problem is fixed, but
> don't know exactly what fixed it. So DaveM said try again.
That's another user's report, not me, please refer the link below
https://bugzilla.kernel.org/show_bug.cgi?id=61651#c126

We have no wake up issue and can't reproduce this issue at my side.

>
> Now you are back to Energy Star.
>
> I don't get this. It was the fact you said it was probably fixed that
> made DaveM reconsider. That is the argument you should be using in the
> change log. We want to know what testing you have done. See a
> tested-by: from somebody who had the issue which caused the revert,
> and now says the issue is fixed.
Thanks to remind me and sorry for my ignorance, I never think of adding
tested-by: in the comment, I'll be asking the reporter to provide more info
and put his name in the comment.

Hope my explanation is helpful for the misunderstanding.
And I'll submit another v3 patch once I got the info from reporter.

>
> Ideally we would like to know which change actually fixed the issue,
> so it can be added to stable. But that requires somebody to do a long
> git bisect.
>
>     Andrew
Thanks,

Best regards,
AceLan Kao.

^ permalink raw reply

* Re: [PATCH net-next v12 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Stephen Hemminger @ 2018-05-31  2:06 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
	alexander.h.duyck, kubakici, jasowang, loseweigh, jiri,
	aaron.f.brown, anjali.singhai
In-Reply-To: <1527180917-39737-3-git-send-email-sridhar.samudrala@intel.com>

On Thu, 24 May 2018 09:55:14 -0700
Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:

> Use the registration/notification framework supported by the generic
> failover infrastructure.
> 
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>

Why was this merged? It was never signed off by any of the netvsc maintainers,
and there were still issues unresolved.

There are also namespaces issues I am fixing and this breaks them.
Will start my patch set with a revert for this. Sorry

^ permalink raw reply

* Re: [PATCH net-next] virtio_net: fix error return code in virtnet_probe()
From: Jason Wang @ 2018-05-31  2:05 UTC (permalink / raw)
  To: Wei Yongjun, Michael S. Tsirkin, Sridhar Samudrala
  Cc: virtualization, netdev, kernel-janitors
In-Reply-To: <1527732307-145609-1-git-send-email-weiyongjun1@huawei.com>



On 2018年05月31日 10:05, Wei Yongjun wrote:
> Fix to return a negative error code from the failover create fail error
> handling case instead of 0, as done elsewhere in this function.
>
> Fixes: ba5e4426e80e ("virtio_net: Extend virtio to use VF datapath when available")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> ---
>   drivers/net/virtio_net.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8f08a3e..2d55e2a 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2935,8 +2935,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>   
>   	if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
>   		vi->failover = net_failover_create(vi->dev);
> -		if (IS_ERR(vi->failover))
> +		if (IS_ERR(vi->failover)) {
> +			err = PTR_ERR(vi->failover);
>   			goto free_vqs;
> +		}
>   	}
>   
>   	err = register_netdev(dev);
>

Acked-by: Jason Wang <jasowang@redhat.com>

^ permalink raw reply

* [PATCH net-next] virtio_net: fix error return code in virtnet_probe()
From: Wei Yongjun @ 2018-05-31  2:05 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang, Sridhar Samudrala
  Cc: Wei Yongjun, virtualization, netdev, kernel-janitors

Fix to return a negative error code from the failover create fail error
handling case instead of 0, as done elsewhere in this function.

Fixes: ba5e4426e80e ("virtio_net: Extend virtio to use VF datapath when available")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8f08a3e..2d55e2a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2935,8 +2935,10 @@ static int virtnet_probe(struct virtio_device *vdev)
 
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
 		vi->failover = net_failover_create(vi->dev);
-		if (IS_ERR(vi->failover))
+		if (IS_ERR(vi->failover)) {
+			err = PTR_ERR(vi->failover);
 			goto free_vqs;
+		}
 	}
 
 	err = register_netdev(dev);

^ permalink raw reply related

* [PATCH net-next] hv_netvsc: fix error return code in netvsc_probe()
From: Wei Yongjun @ 2018-05-31  2:04 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sridhar Samudrala
  Cc: Wei Yongjun, devel, netdev, kernel-janitors

Fix to return a negative error code from the failover register fail
error handling case instead of 0, as done elsewhere in this function.

Fixes: 1ff78076d8dd ("netvsc: refactor notifier/event handling code to use the failover framework")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/hyperv/netvsc_drv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index ebe9642..bef4d55 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2031,8 +2031,10 @@ static int netvsc_probe(struct hv_device *dev,
 	}
 
 	net_device_ctx->failover = failover_register(net, &netvsc_failover_ops);
-	if (IS_ERR(net_device_ctx->failover))
+	if (IS_ERR(net_device_ctx->failover)) {
+		ret = PTR_ERR(net_device_ctx->failover);
 		goto err_failover;
+	}
 
 	return ret;

^ permalink raw reply related

* Re: [PATCH] [net-next, wrong] make BPFILTER_UMH depend on X86
From: Masahiro Yamada @ 2018-05-31  1:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Arnd Bergmann, David S. Miller, Alexei Starovoitov,
	Linux Kbuild mailing list, netdev, Linux Kernel Mailing List
In-Reply-To: <20180530151736.nzpde2bgzn4koi7f@ast-mbp>

2018-05-31 0:17 GMT+09:00 Alexei Starovoitov <alexei.starovoitov@gmail.com>:
> On Mon, May 28, 2018 at 05:31:01PM +0200, Arnd Bergmann wrote:
>> When build testing across architectures, I run into a build error on
>> all targets other than X86:
>>
>> gcc-8.1.0-nolibc/arm-linux-gnueabi/bin/arm-linux-gnueabi-objdump: net/bpfilter/bpfilter_umh: File format not recognized
>> gcc-8.1.0-nolibc/arm-linux-gnueabi/bin/arm-linux-gnueabi-objcopy:net/bpfilter/bpfilter_umh.o: Invalid bfd target
>>
>> The problem is that 'hostprogs' get built with 'gcc' rather than
>> '$(CROSS_COMPILE)gcc', and my default gcc (as most people's) targets x86.
>>
>> To work around it, adding an X86 dependency gets randconfigs building
>> again on my box.
>>
>> Clearly, this is not a good solution, since it should actually work fine
>> when building native kernels on other architectures but that is now
>> disabled, while cross building an x86 kernel on another host is still
>> broken after my patch.
>>
>> What we probably want here is to try out if the compiler is able to build
>> executables for the target architecture and not build the helper otherwise,
>> at least when compile-testing. No idea how to do that though.
>>
>> Link: http://www.kernel.org/pub/tools/crosstool/
>> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
>> Cc: linux-kbuild@vger.kernel.org
>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>> ---
>>  net/bpfilter/Kconfig | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/net/bpfilter/Kconfig b/net/bpfilter/Kconfig
>> index 60725c5f79db..61cc4fcbb4d0 100644
>> --- a/net/bpfilter/Kconfig
>> +++ b/net/bpfilter/Kconfig
>> @@ -9,6 +9,7 @@ menuconfig BPFILTER
>>  if BPFILTER
>>  config BPFILTER_UMH
>>       tristate "bpfilter kernel module with user mode helper"
>> +     depends on X86 # actually depends on native builds
>
> depends on X86 will break it on arm.
> I think the better short term fix would be to test that HOSTCC == CC
> It doesn't have to be the same compiler. HOSTCC's arch == kernel ARCH
> Not sure how to hack makefile to do that.
> Long term we need to get rid of HOSTCC dependency.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Hmm.
For cross-compiling, we set 'ARCH' via the environment variable or the
command line.

ARCH is not explicitly set, the top-level Makefile sets it to $(SUBARCH)


ARCH ?= $(SUBARCH)


Maybe, we can assume the native build if $(ARCH) and $(SUBARCH) are the same?


-- 
Best Regards
Masahiro Yamada

^ permalink raw reply

* Re: [PATCH net-next] net: qcom/emac: fix unused variable
From: YueHaibing @ 2018-05-31  1:34 UTC (permalink / raw)
  To: Timur Tabi, davem; +Cc: netdev, linux-kernel
In-Reply-To: <0db547d4-27c1-a9f7-f443-86bebd831cb2@codeaurora.org>


On 2018/5/30 20:10, Timur Tabi wrote:
> On 5/29/18 5:43 AM, YueHaibing wrote:
>> When CONFIG_ACPI isn't set, variable qdf2400_ops/qdf2432_ops isn't used.
>> drivers/net/ethernet/qualcomm/emac/emac-sgmii.c:284:25: warning: ‘qdf2400_ops’ defined but not used [-Wunused-variable]
>>   static struct sgmii_ops qdf2400_ops = {
>>                           ^~~~~~~~~~~
>> drivers/net/ethernet/qualcomm/emac/emac-sgmii.c:276:25: warning: ‘qdf2432_ops’ defined but not used [-Wunused-variable]
>>   static struct sgmii_ops qdf2432_ops = {
>>                           ^~~~~~~~~~~
>>
>> Move the declaration and functions inside the CONFIG_ACPI ifdef
>> to fix the warning.
>> Signed-off-by: YueHaibing<yuehaibing@huawei.com>
> 
> I already fixed this with:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=d377df784178bf5b0a39e75dc8b1ee86e1abb3f6
>

Oh,I should notice this, thanks.

^ permalink raw reply

* [PATCH bpf-next] xsk: temporarily disable AF_XDP
From: Björn Töpel @ 2018-05-31  0:17 UTC (permalink / raw)
  To: ast, daniel, netdev
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson

From: Björn Töpel <bjorn.topel@intel.com>

Temporarily disable AF_XDP sockets, and hide uapi.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 include/{uapi => }/linux/if_xdp.h | 0
 net/xdp/Kconfig                   | 2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename include/{uapi => }/linux/if_xdp.h (100%)

diff --git a/include/uapi/linux/if_xdp.h b/include/linux/if_xdp.h
similarity index 100%
rename from include/uapi/linux/if_xdp.h
rename to include/linux/if_xdp.h
diff --git a/net/xdp/Kconfig b/net/xdp/Kconfig
index 90e4a7152854..d845606dae7b 100644
--- a/net/xdp/Kconfig
+++ b/net/xdp/Kconfig
@@ -1,5 +1,5 @@
 config XDP_SOCKETS
-	bool "XDP sockets"
+	bool "XDP sockets" if n
 	depends on BPF_SYSCALL
 	default n
 	help
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH net] mlx4_core: restore optimal ICM memory allocation
From: Qing Huang @ 2018-05-30 23:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Eric Dumazet, John Sperbeck, Tarick Bedeir,
	Daniel Jurgens, Zhu Yanjun, Tariq Toukan, linux-rdma,
	Santosh Shilimkar
In-Reply-To: <CANn89i+K3LS7+idPBOkcaBPBdBwz9tWCLqkzbWZA10W2mmR9hA@mail.gmail.com>



On 5/30/2018 2:30 PM, Eric Dumazet wrote:
> On Wed, May 30, 2018 at 5:08 PM Qing Huang<qing.huang@oracle.com>  wrote:
>>
>> On 5/30/2018 1:50 PM, Eric Dumazet wrote:
>>> On Wed, May 30, 2018 at 4:30 PM Qing Huang<qing.huang@oracle.com>  wrote:
>>>> On 5/29/2018 9:11 PM, Eric Dumazet wrote:
>>>>> Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
>>>>> brought a regression caught in our regression suite, thanks to KASAN.
>>>> If KASAN reported issue was really caused by smaller chunk sizes,
>>>> changing allocation
>>>> order dynamically will eventually hit the same issue.
>>> Sigh, you have little idea of what your patch really did...
>>>
>>> The KASAN part only shows the tip of the iceberg, but our main concern
>>> is an increase of memory overhead.
>> Well, the commit log only mentioned KASAN and but the change here didn't
>> seem to solve
>> the issue.
> Can you elaborate ?
>
> My patch solves our problems.
>
> Both the memory overhead and KASAN splats are gone.

If KASAN issue was triggered by using smaller chunks, when under memory 
pressure with lots of fragments,
low order memory allocation will do the similar things. So perhaps in 
your test env, memory allocation
and usage is relatively static, that's probably why using larger chunks 
didn't really utilize low order
allocation code path hence no KASAN issue was spotted.

Smaller chunk size in the mlx4 driver is not supposed to cause any 
memory corruption. We will probably
need to continue to investigate this. Can you provide the test command 
that trigger this issue when running
KASAN kernel so we can try to reproduce it in our lab? It could be that 
upstream code is missing some other
fixes.


>>> Alternative is to revert your patch, since we are now very late in 4.17 cycle.
>>>
>>> Memory usage has grown a lot with your patch, since each 4KB page needs a full
>>> struct mlx4_icm_chunk (256 bytes of overhead !)
>> Going to smaller chunks will have some overhead. It depends on the
>> application though.
>> What's the total increased memory consumption in your env?
> As I explained, your patch adds 256 bytes of overhead per 4KB.
>
> Your changelog did not mentioned that at all, and we discovered this
> the hard way.

If you have some concern regarding memory usage, you should bring this 
up during code review.

Repeated failure and retry for lower order allocations could be bad for 
latency too. This wasn't
mentioned in this commit either.

Like I said, how much overhead really depends on the application. 256 
bytes x chunks may not be
significant on a server with lots of memory.

> That is pretty intolerable, and is a blocker for us, memory is precious.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message tomajordomo@vger.kernel.org
> More majordomo info athttp://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
From: Jakub Kicinski @ 2018-05-30 22:53 UTC (permalink / raw)
  To: Samudrala, Sridhar; +Cc: Michael Chan, David Miller, Netdev, Or Gerlitz
In-Reply-To: <6dd76ffc-4097-0e5e-6f66-78fd178f89c2@intel.com>

On Wed, 30 May 2018 14:23:06 -0700, Samudrala, Sridhar wrote:
> On 5/29/2018 11:33 PM, Jakub Kicinski wrote:
> > On Tue, 29 May 2018 23:08:11 -0700, Michael Chan wrote:  
> >> On Tue, May 29, 2018 at 10:56 PM, Jakub Kicinski wrote:  
> >>> On Tue, 29 May 2018 20:19:54 -0700, Michael Chan wrote:  
> >>>> On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar wrote:  
> >>>>> Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
> >>>>> extended?  
> >>> +1 it's painful to see this feature being added to the legacy
> >>> API :(  Another duplicated configuration knob.
> >>>  
> >>>> I didn't know about that.
> >>>>  
> >>>>> Shouldn't we enable this via ethtool on the port representor netdev?  
> >>>> We discussed about this.  ethtool on the VF representor will only work
> >>>> in switchdev mode and also will not support min/max values.  
> >>> Ethtool channel API may be overdue a rewrite in devlink anyway, but I
> >>> feel like implementing switchdev mode and rewriting features in devlink
> >>> may be too much to ask.  
> >> Totally agreed.  And switchdev mode doesn't seem to be that widely
> >> used at the moment.  Do you have other suggestions besides NDO?  
> > At some points you (Broadcom) were working whole bunch of devlink
> > configuration options for the PCIe side of the ASIC.  The number of
> > queues relates to things like number of allocated MSI-X vectors, which
> > if memory serves me was in your devlink patch set.  In an ideal world
> > we would try to keep all those in one place :)
> >
> > For PCIe config there is always the question of what can be configured
> > at runtime, and what requires a HW reset.  Therefore that devlink API
> > which could configure current as well as persistent device settings was
> > quite nice.  I'm not sure if reallocating queues would ever require
> > PCIe block reset but maybe...  Certainly it seems the notion of min
> > queues would make more sense in PCIe configuration devlink API than
> > ethtool channel API to me as well.
> >
> > Queues are in the grey area between netdev and non-netdev constructs.
> > They make sense both from PCIe resource allocation perspective (i.e.
> > devlink PCIe settings) and netdev perspective (ethtool) because they
> > feed into things like qdisc offloads, maybe per-queue stats etc.
> >
> > So yes...  IMHO it would be nice to add this to a devlink SR-IOV config
> > API and/or switchdev representors.  But neither of those are really an
> > option for you today so IDK :)  
> 
> One reason why 'switchdev' mode is not yet widely used or enabled by default
> could be due to the requirement to program the flow rules only via slow path.

Do you mean the fallback traffic requirement?

> Would it make sense to relax this requirement and support a mode where port
> representors are created and let the PF driver implement a default policy that
> adds flow rules for all the VFs to enable connectivity and let the user
> add/modify the rules via port representors?

I definitely share your concerns, stopping a major HW vendor from using
this new and preferred mode is not helping us make progress.

The problem is that if we allow this diversion, i.e. driver to implement
some special policy, or pre-populate a bridge in a configuration that
suits the HW we may condition users to expect that as the standard Linux
behaviour.  And we will be stuck with it forever even tho your next gen
HW (ice?) may support correct behaviour.

We should perhaps separate switchdev mode from TC flower/OvS offloads.
Is your objective to implement OvS offload or just switchdev mode?  

For OvS without proper fallback behaviour you may struggle.

Switchdev mode could be within your reach even without changing the
default rules.  What if you spawned all port netdevs (I dislike the
term representor, sorry, it's confusing people) in down state and then
refuse to bring them up unless user instantiated a bridge that would
behave in a way that your HW can support?  If ports are down you won't
have fallback traffic so no problem to solve.

^ permalink raw reply

* greetings
From: Miss Zeliha ömer Faruk @ 2018-05-30 22:34 UTC (permalink / raw)




-- 
Hello

I have been trying to contact you. Did you get my business proposal?

Best Regards,
Miss.Zeliha ömer faruk
Esentepe Mahallesi Büyükdere
Caddesi Kristal Kule Binasi
No:215 Sisli - Istanbul, Turke

^ permalink raw reply

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
From: Jakub Kicinski @ 2018-05-30 22:36 UTC (permalink / raw)
  To: Michael Chan; +Cc: Samudrala, Sridhar, David Miller, Netdev
In-Reply-To: <CACKFLi=-QVePYwAiG+boXoH0tdWTaM1ZRyV4qxsj0Au1w5PKBQ@mail.gmail.com>

On Wed, 30 May 2018 00:18:39 -0700, Michael Chan wrote:
> On Tue, May 29, 2018 at 11:33 PM, Jakub Kicinski wrote:
> > At some points you (Broadcom) were working whole bunch of devlink
> > configuration options for the PCIe side of the ASIC.  The number of
> > queues relates to things like number of allocated MSI-X vectors, which
> > if memory serves me was in your devlink patch set.  In an ideal world
> > we would try to keep all those in one place :)  
> 
> Yeah, another colleague is now working with Mellanox on something similar.
> 
> One difference between those devlink parameters and these queue
> parameters is that the former are more permanent and global settings.
> For example, number of VFs or number of MSIX per VF are persistent
> settings once they are set and after PCIe reset.  On the other hand,
> these queue settings are pure run-time settings and may be unique for
> each VF.  These are not stored as there is no room in NVRAM to store
> 128 sets or more of these parameters.

Indeed, I think the API must be flexible as to what is persistent and
what is not because HW will certainly differ in that regard.  And
agreed, queues may be a bit of a stretch here, but worth a try.

> Anyway, let me discuss this with my colleague to see if there is a
> natural fit for these queue parameters in the devlink infrastructure
> that they are working on.

Thank you!

^ permalink raw reply

* [PATCH] rtnetlink: Remove VLA usage
From: Kees Cook @ 2018-05-30 22:20 UTC (permalink / raw)
  To: David S. Miller; +Cc: Florian Westphal, David Ahern, netdev, linux-kernel

In the quest to remove all stack VLA usage from the kernel[1], this
allocates the maximum size expected for all possible types and adds
sanity-checks at both registration and usage to make sure nothing gets
out of sync. This matches the proposed VLA solution for nfnetlink[2]. The
values chosen here were based on finding assignments for .maxtype and
.slave_maxtype and manually counting the enums:

slave_maxtype (max 33):
	IFLA_BRPORT_MAX     33
	IFLA_BOND_SLAVE_MAX  9

maxtype (max 45):
	IFLA_BOND_MAX       28
	IFLA_BR_MAX         45
	__IFLA_CAIF_HSI_MAX  8
	IFLA_CAIF_MAX        4
	IFLA_CAN_MAX        16
	IFLA_GENEVE_MAX     12
	IFLA_GRE_MAX        25
	IFLA_GTP_MAX         5
	IFLA_HSR_MAX         7
	IFLA_IPOIB_MAX       4
	IFLA_IPTUN_MAX      21
	IFLA_IPVLAN_MAX      3
	IFLA_MACSEC_MAX     15
	IFLA_MACVLAN_MAX     7
	IFLA_PPP_MAX         2
	__IFLA_RMNET_MAX     4
	IFLA_VLAN_MAX        6
	IFLA_VRF_MAX         2
	IFLA_VTI_MAX         7
	IFLA_VXLAN_MAX      28
	VETH_INFO_MAX        2
	VXCAN_INFO_MAX       2

This additionally changes maxtype and slave_maxtype fields to unsigned,
since they're only ever using positive values.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
[2] https://patchwork.kernel.org/patch/10439647/

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/net/rtnetlink.h |  4 ++--
 net/core/rtnetlink.c    | 18 ++++++++++++++++--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 14b6b3af8918..0bbaa5488423 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -64,7 +64,7 @@ struct rtnl_link_ops {
 	size_t			priv_size;
 	void			(*setup)(struct net_device *dev);
 
-	int			maxtype;
+	unsigned int		maxtype;
 	const struct nla_policy	*policy;
 	int			(*validate)(struct nlattr *tb[],
 					    struct nlattr *data[],
@@ -92,7 +92,7 @@ struct rtnl_link_ops {
 	unsigned int		(*get_num_tx_queues)(void);
 	unsigned int		(*get_num_rx_queues)(void);
 
-	int			slave_maxtype;
+	unsigned int		slave_maxtype;
 	const struct nla_policy	*slave_policy;
 	int			(*slave_changelink)(struct net_device *dev,
 						    struct net_device *slave_dev,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 45936922d7e2..4ede33719ca9 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -59,6 +59,9 @@
 #include <net/rtnetlink.h>
 #include <net/net_namespace.h>
 
+#define RTNL_MAX_TYPE		48
+#define RTNL_SLAVE_MAX_TYPE	36
+
 struct rtnl_link {
 	rtnl_doit_func		doit;
 	rtnl_dumpit_func	dumpit;
@@ -389,6 +392,11 @@ int rtnl_link_register(struct rtnl_link_ops *ops)
 {
 	int err;
 
+	/* Sanity-check max sizes to avoid stack buffer overflow. */
+	if (WARN_ON(ops->maxtype > RTNL_MAX_TYPE ||
+		    ops->slave_maxtype > RTNL_SLAVE_MAX_TYPE))
+		return -EINVAL;
+
 	rtnl_lock();
 	err = __rtnl_link_register(ops);
 	rtnl_unlock();
@@ -2900,13 +2908,16 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	}
 
 	if (1) {
-		struct nlattr *attr[ops ? ops->maxtype + 1 : 1];
-		struct nlattr *slave_attr[m_ops ? m_ops->slave_maxtype + 1 : 1];
+		struct nlattr *attr[RTNL_MAX_TYPE + 1];
+		struct nlattr *slave_attr[RTNL_SLAVE_MAX_TYPE + 1];
 		struct nlattr **data = NULL;
 		struct nlattr **slave_data = NULL;
 		struct net *dest_net, *link_net = NULL;
 
 		if (ops) {
+			if (ops->maxtype > RTNL_MAX_TYPE)
+				return -EINVAL;
+
 			if (ops->maxtype && linkinfo[IFLA_INFO_DATA]) {
 				err = nla_parse_nested(attr, ops->maxtype,
 						       linkinfo[IFLA_INFO_DATA],
@@ -2923,6 +2934,9 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		}
 
 		if (m_ops) {
+			if (ops->slave_maxtype > RTNL_SLAVE_MAX_TYPE)
+				return -EINVAL;
+
 			if (m_ops->slave_maxtype &&
 			    linkinfo[IFLA_INFO_SLAVE_DATA]) {
 				err = nla_parse_nested(slave_attr,
-- 
2.17.0


-- 
Kees Cook
Pixel Security

^ permalink raw reply related

* Re: [bpf-next V1 PATCH 0/8] bpf/xdp: add flags argument to ndo_xdp_xmit and flag flush operation
From: Song Liu @ 2018-05-30 22:18 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Networking, Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	makita.toshiaki
In-Reply-To: <152770312703.20510.5854417568847239931.stgit@firesoul>

Overall, this set looks good to me. The only suggestion I have is to add more
documentation on the expected behavior of XDP_XMIT_FLUSH in netdevice.h
(as part of 01/08).

Thanks,
Song


On Wed, May 30, 2018 at 11:00 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> As I mentioned in merge commit 10f678683e4 ("Merge branch 'xdp_xmit-bulking'")
> I plan to change the API for ndo_xdp_xmit once more, by adding a flags
> argument, which is done in this patchset.
>
> I know it is late in the cycle (currently at rc7), but it would be
> nice to avoid changing NDOs over several kernel releases, as it is
> annoying to vendors and distro backporters, but it is not strictly
> UAPI so it is allowed (according to Alexei).
>
> The end-goal is getting rid of the ndo_xdp_flush operation, as it will
> make it possible for drivers to implement a TXQ synchronization mechanism
> that is not necessarily derived from the CPU id (smp_processor_id).
>
> This patchset removes all callers of the ndo_xdp_flush operation, but
> it doesn't take the last step of removing it from all drivers.  This
> can be done later, or I can update the patchset on request.
>
> Micro-benchmarks only show a very small performance improvement, for
> map-redirect around ~2 ns, and for non-map redirect ~7 ns.  I've not
> benchmarked this with CONFIG_RETPOLINE, but the performance benefit
> should be more visible given we end-up removing an indirect call.
>
> ---
>
> Jesper Dangaard Brouer (8):
>       xdp: add flags argument to ndo_xdp_xmit API
>       i40e: implement flush flag for ndo_xdp_xmit
>       ixgbe: implement flush flag for ndo_xdp_xmit
>       tun: implement flush flag for ndo_xdp_xmit
>       virtio_net: implement flush flag for ndo_xdp_xmit
>       xdp: done implementing ndo_xdp_xmit flush flag for all drivers
>       bpf/xdp: non-map redirect can avoid calling ndo_xdp_flush
>       bpf/xdp: devmap can avoid calling ndo_xdp_flush
>
>
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   |    9 ++++++++-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h   |    3 ++-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   23 +++++++++++++++++------
>  drivers/net/tun.c                             |   25 ++++++++++++++++++-------
>  drivers/net/virtio_net.c                      |    9 ++++++++-
>  include/linux/netdevice.h                     |    7 ++++---
>  include/net/xdp.h                             |    4 ++++
>  kernel/bpf/devmap.c                           |   20 +++++++-------------
>  net/core/filter.c                             |    3 +--
>  9 files changed, 69 insertions(+), 34 deletions(-)
>
> --

^ permalink raw reply

* [PATCH net-next] net: dsa: mv88e6xxx: Be explicit about DT or pdata
From: Andrew Lunn @ 2018-05-30 22:15 UTC (permalink / raw)
  To: David Miller
  Cc: Vivien Didelot, Florian Fainelli, dan.carpenter, netdev,
	Andrew Lunn

Make it explicit that either device tree is used or platform data.  If
neither is available, abort the probe.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 877b7cb0b6f2 ("net: dsa: mv88e6xxx: Add minimal platform_data support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 12df00f593b7..437cd6eb4faa 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -4389,6 +4389,9 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
 	int port;
 	int err;
 
+	if (!np && !pdata)
+		return -EINVAL;
+
 	if (np)
 		compat_info = of_device_get_match_data(dev);
 
-- 
2.17.0

^ permalink raw reply related

* Re: [bpf-next V1 PATCH 8/8] bpf/xdp: devmap can avoid calling ndo_xdp_flush
From: Song Liu @ 2018-05-30 22:06 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Networking, Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	makita.toshiaki
In-Reply-To: <152770327343.20510.17045843025201198801.stgit@firesoul>

On Wed, May 30, 2018 at 11:01 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> The XDP_REDIRECT map devmap can avoid using ndo_xdp_flush, by instead
> instructing ndo_xdp_xmit to flush via XDP_XMIT_FLUSH flag in
> appropriate places.
>
> Notice after this patch it is possible to remove ndo_xdp_flush
> completely, as this is the last user of ndo_xdp_flush. This is left
> for later patches, to keep driver changes separate.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  kernel/bpf/devmap.c |   20 +++++++-------------
>  1 file changed, 7 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index 04fbd75a5274..9c846a7a8cff 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -217,7 +217,7 @@ void __dev_map_insert_ctx(struct bpf_map *map, u32 bit)
>  }
>
>  static int bq_xmit_all(struct bpf_dtab_netdev *obj,
> -                        struct xdp_bulk_queue *bq)
> +                      struct xdp_bulk_queue *bq, bool flush)

How about we use "int flags" instead of "bool flush" for easier extension?

Thanks,
Song

>  {
>         struct net_device *dev = obj->dev;
>         int sent = 0, drops = 0, err = 0;
> @@ -232,7 +232,8 @@ static int bq_xmit_all(struct bpf_dtab_netdev *obj,
>                 prefetch(xdpf);
>         }
>
> -       sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, 0);
> +       sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q,
> +                                            flush ? XDP_XMIT_FLUSH : 0);
>         if (sent < 0) {
>                 err = sent;
>                 sent = 0;
> @@ -276,7 +277,6 @@ void __dev_map_flush(struct bpf_map *map)
>         for_each_set_bit(bit, bitmap, map->max_entries) {
>                 struct bpf_dtab_netdev *dev = READ_ONCE(dtab->netdev_map[bit]);
>                 struct xdp_bulk_queue *bq;
> -               struct net_device *netdev;
>
>                 /* This is possible if the dev entry is removed by user space
>                  * between xdp redirect and flush op.
> @@ -287,10 +287,7 @@ void __dev_map_flush(struct bpf_map *map)
>                 __clear_bit(bit, bitmap);
>
>                 bq = this_cpu_ptr(dev->bulkq);
> -               bq_xmit_all(dev, bq);
> -               netdev = dev->dev;
> -               if (likely(netdev->netdev_ops->ndo_xdp_flush))
> -                       netdev->netdev_ops->ndo_xdp_flush(netdev);
> +               bq_xmit_all(dev, bq, true);
>         }
>  }
>
> @@ -320,7 +317,7 @@ static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf,
>         struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
>
>         if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
> -               bq_xmit_all(obj, bq);
> +               bq_xmit_all(obj, bq, false);
>
>         /* Ingress dev_rx will be the same for all xdp_frame's in
>          * bulk_queue, because bq stored per-CPU and must be flushed
> @@ -359,8 +356,7 @@ static void *dev_map_lookup_elem(struct bpf_map *map, void *key)
>
>  static void dev_map_flush_old(struct bpf_dtab_netdev *dev)
>  {
> -       if (dev->dev->netdev_ops->ndo_xdp_flush) {
> -               struct net_device *fl = dev->dev;
> +       if (dev->dev->netdev_ops->ndo_xdp_xmit) {
>                 struct xdp_bulk_queue *bq;
>                 unsigned long *bitmap;
>
> @@ -371,9 +367,7 @@ static void dev_map_flush_old(struct bpf_dtab_netdev *dev)
>                         __clear_bit(dev->bit, bitmap);
>
>                         bq = per_cpu_ptr(dev->bulkq, cpu);
> -                       bq_xmit_all(dev, bq);
> -
> -                       fl->netdev_ops->ndo_xdp_flush(dev->dev);
> +                       bq_xmit_all(dev, bq, true);
>                 }
>         }
>  }
>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox