Netdev List
 help / color / mirror / Atom feed
* linux-next: manual merge of the akpm tree with the bpf tree
From: Stephen Rothwell @ 2019-02-12  6:14 UTC (permalink / raw)
  To: Andrew Morton, Daniel Borkmann, Alexei Starovoitov, Networking
  Cc: Linux Next Mailing List, Linux Kernel Mailing List,
	Davidlohr Bueso

[-- Attachment #1: Type: text/plain, Size: 712 bytes --]

Hi all,

Today's linux-next merge of the akpm tree got a conflict in:

  net/xdp/xdp_umem.c

between commit:

  e451eb510684 ("xsk: share the mmap_sem for page pinning")

from the bpf tree and patch:

  "net/xdp/xdp_umem.c: do not use mmap_sem"

from the akpm tree.

I fixed it up (I dropped the akpm tree patch) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
From: Heiner Kallweit @ 2019-02-12  6:51 UTC (permalink / raw)
  To: Andrew Lunn, John David Anglin
  Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <20190212035806.GE19023@lunn.ch>

On 12.02.2019 04:58, Andrew Lunn wrote:
>>> Hi David
>>>
>>> I just tested this on one of my boards. It loops endlessly:
>>>
>>> [   47.173396] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
>>> [   47.182108] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
>>> [   47.190820] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
>>> [   47.199535] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
>>> [   47.208254] mv88e6xxx_g1_irq_thread_work: c881 a8 80   
>>>
>>> These are reg, ctl1, reg & ctl1.
>>>
>>> So there is an unhandled device interrupt.
> 
> Hi Heiner
> 
> Your patch Fixes: 2b3e88ea6528 ("net: phy: improve phy state
> checking") is causing me problems with interrupts for the Marvell
> switches.
> 
Hi Andrew,

what kernel version is it?
And the PHY driver in use is "Marvell 88E6390" ?

> That change means we don't check the PHY device if it caused an
> interrupt when its state is less than UP.
> 
> What i'm seeing is that the PHY is interrupting pretty early on after
> a reboot when the previous boot had the interface up.
> 
So this means that when going down for reboot the interrupts are not
properly masked / disabled? Because (at least for net-next) we enable
interrupts in phy_start() only.


> [   10.125702] Marvell 88E6390 mv88e6xxx-0:02: phy_start_interrupts
> [   10.162798] Marvell 88E6390 mv88e6xxx-0:02: phy_enable_interrupts
> [   10.168931] Marvell 88E6390 mv88e6xxx-0:02: marvell_ack_interrupt
> [   10.180164] Marvell 88E6390 mv88e6xxx-0:02: marvell_config_intr 1
> 
> a little later it interrupts:
> 
> [   12.999717] mv88e6xxx_g1_irq_thread_fn
> [   13.007253] mv88e6xxx_g2_irq_thread_fn: 4 811c 4
> [   13.012015] libphy: __phy_is_started: phydev->state 1 PHY_UP 3
> [   13.017941] Marvell 88E6390 mv88e6xxx-0:02: phy_interrupt: phy_is_started(phydev) 0
> 
> The current code just causes it to be ignored. So the interrupts fires
> again, and again...
> 
I would have more expected the opposite. If the interrupt is ignored
(IRQ_NONE returned), then it doesn't get acked. And if it's not acked
new interrupts should be blocked. Or is it different with this chip?

> If i change to code to call into the PHY driver and let it handle the
> interrupts, things keep running. A little bit later the interface is
> configured up:
> 
> [   15.921326] mv88e6085 gpio-0:00 red: configuring for phy/gmii link mode
> [   15.928693] libphy: __phy_is_started: phydev->state 3 PHY_UP 3
> [   15.929442] IPv6: ADDRCONF(NETDEV_UP): red: link is not ready
> [   15.935596] Marvell 88E6390 mv88e6xxx-0:02: m88e6390_config_aneg
> [   15.935608] Marvell 88E6390 mv88e6xxx-0:02: m88e6390_errata
> 
> [   16.071364] Marvell 88E6390 mv88e6xxx-0:02: m88e1510_config_aneg
> [   16.112362] Marvell 88E6390 mv88e6xxx-0:02: m88e1318_config_aneg
> [   16.151245] Marvell 88E6390 mv88e6xxx-0:02: m88e1121_config_aneg
> [   16.368206] Marvell 88E6390 mv88e6xxx-0:02: PHY state change UP -> NOLINK
> 
> and after another interrupt the link goes up.
> 
> [   19.519840] mv88e6xxx_g1_irq_thread_fn
> [   19.528546] mv88e6xxx_g2_irq_thread_fn: 4 811c 4
> [   19.534152] libphy: __phy_is_started: phydev->state 5 PHY_UP 3
> [   19.540030] Marvell 88E6390 mv88e6xxx-0:02: phy_interrupt: phy_is_started(phydev) 1
> [   19.547721] Marvell 88E6390 mv88e6xxx-0:02: m88e1121_did_interrupt
> [   19.559829] Marvell 88E6390 mv88e6xxx-0:02: marvell_ack_interrupt
> [   19.590753] Marvell 88E6390 mv88e6xxx-0:02: marvell_read_status
> [   19.596712] Marvell 88E6390 mv88e6xxx-0:02: marvell_update_link
> [   19.628387] Marvell 88E6390 mv88e6xxx-0:02: PHY state change NOLINK -> RUNNING
> [   19.628453] mv88e6085 gpio-0:00 red: Link is Up - 1Gbps/Full - flow control off
> [   19.635920] IPv6: ADDRCONF(NETDEV_CHANGE): red: link becomes ready
> 
> I don't yet know why the first interrupt happens, before we configure
> auto-neg, etc. But it is not too unreasonable. We have configured
> interrupts, so it could be reporting link down etc.
> 
> So i think we might need to revert part of this change, call into the
> driver so long as the PHY is not in state PHY_HALTED.
> 
> What do you think?
> 
I will take a closer look later.

>      Andrew
> 
Heiner

^ permalink raw reply

* [PATCH net-next 1/2] devlink: Return right error code in case of errors for region read
From: Parav Pandit @ 2019-02-12  7:09 UTC (permalink / raw)
  To: jiri, davem, netdev; +Cc: parav

devlink_nl_cmd_region_read_dumpit() misses to return right error code on
most error conditions.
Return the right error code on such errors.

Fixes: 4e54795a27f5 ("devlink: Add support for region snapshot read command")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>

---
 net/core/devlink.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 5e2ef5a..8a198ba 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3643,26 +3643,34 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 		goto out;
 
 	devlink = devlink_get_from_attrs(sock_net(cb->skb->sk), attrs);
-	if (IS_ERR(devlink))
+	if (IS_ERR(devlink)) {
+		err = -ENODEV;
 		goto out;
+	}
 
 	mutex_lock(&devlink_mutex);
 	mutex_lock(&devlink->lock);
 
 	if (!attrs[DEVLINK_ATTR_REGION_NAME] ||
-	    !attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID])
+	    !attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) {
+		err = -EINVAL;
 		goto out_unlock;
+	}
 
 	region_name = nla_data(attrs[DEVLINK_ATTR_REGION_NAME]);
 	region = devlink_region_get_by_name(devlink, region_name);
-	if (!region)
+	if (!region) {
+		err = -EINVAL;
 		goto out_unlock;
+	}
 
 	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
 			  &devlink_nl_family, NLM_F_ACK | NLM_F_MULTI,
 			  DEVLINK_CMD_REGION_READ);
-	if (!hdr)
+	if (!hdr) {
+		err = -EMSGSIZE;
 		goto out_unlock;
+	}
 
 	err = devlink_nl_put_handle(skb, devlink);
 	if (err)
@@ -3673,8 +3681,10 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 		goto nla_put_failure;
 
 	chunks_attr = nla_nest_start(skb, DEVLINK_ATTR_REGION_CHUNKS);
-	if (!chunks_attr)
+	if (!chunks_attr) {
+		err = -EMSGSIZE;
 		goto nla_put_failure;
+	}
 
 	if (attrs[DEVLINK_ATTR_REGION_CHUNK_ADDR] &&
 	    attrs[DEVLINK_ATTR_REGION_CHUNK_LEN]) {
@@ -3715,7 +3725,7 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	mutex_unlock(&devlink->lock);
 	mutex_unlock(&devlink_mutex);
 out:
-	return 0;
+	return err;
 }
 
 struct devlink_info_req {
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH net-next 2/2] devlink: Fix list access without lock while reading region
From: Parav Pandit @ 2019-02-12  7:09 UTC (permalink / raw)
  To: jiri, davem, netdev; +Cc: parav

While finding the devlink device during region reading,
devlink device list is accessed and devlink device is
returned without holding a lock. This could lead to user-after-free
accesses.

While at it, add lockdep assert to ensure that all future callers hold
the lock when calling devlink_get_from_attrs().

Fixes: 4e54795a27f5 ("devlink: Add support for region snapshot read command")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/core/devlink.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 8a198ba..8cd773b 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -116,6 +116,8 @@ static struct devlink *devlink_get_from_attrs(struct net *net,
 	busname = nla_data(attrs[DEVLINK_ATTR_BUS_NAME]);
 	devname = nla_data(attrs[DEVLINK_ATTR_DEV_NAME]);
 
+	lockdep_assert_held(&devlink_mutex);
+
 	list_for_each_entry(devlink, &devlink_list, list) {
 		if (strcmp(devlink->dev->bus->name, busname) == 0 &&
 		    strcmp(dev_name(devlink->dev), devname) == 0 &&
@@ -3642,13 +3644,13 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	if (err)
 		goto out;
 
+	mutex_lock(&devlink_mutex);
 	devlink = devlink_get_from_attrs(sock_net(cb->skb->sk), attrs);
 	if (IS_ERR(devlink)) {
 		err = -ENODEV;
-		goto out;
+		goto out_dev;
 	}
 
-	mutex_lock(&devlink_mutex);
 	mutex_lock(&devlink->lock);
 
 	if (!attrs[DEVLINK_ATTR_REGION_NAME] ||
@@ -3723,6 +3725,7 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	genlmsg_cancel(skb, hdr);
 out_unlock:
 	mutex_unlock(&devlink->lock);
+out_dev:
 	mutex_unlock(&devlink_mutex);
 out:
 	return err;
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH net-next 3/3] staging: fsl-dpaa2: ethsw: Remove getting PORT_BRIDGE_FLAGS
From: Greg KH @ 2019-02-12  7:28 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, open list:STAGING SUBSYSTEM, andrew,
	moderated list:ETHERNET BRIDGE, open list, vivien.didelot,
	Ido Schimmel, jiri, David S. Miller
In-Reply-To: <20190211211749.19847-4-f.fainelli@gmail.com>

On Mon, Feb 11, 2019 at 01:17:49PM -0800, Florian Fainelli wrote:
> There is no code that tries to get the attribute
> SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS, remove support for doing that.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
>  drivers/staging/fsl-dpaa2/ethsw/ethsw.c | 5 -----
>  1 file changed, 5 deletions(-)

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply

* Re: [oss-drivers] Re: [RFC 1/3] devlink: add flash update command
From: Jiri Pirko @ 2019-02-12  7:26 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers, mkubecek, andrew
In-Reply-To: <20190212082530.48cbb382@cakuba.netronome.com>

Mon, Feb 11, 2019 at 08:25:30PM CET, jakub.kicinski@netronome.com wrote:
>On Mon, 11 Feb 2019 17:45:13 +0100, Jiri Pirko wrote:
>> Mon, Feb 11, 2019 at 07:59:20AM CET, jakub.kicinski@netronome.com wrote:
>> >Add devlink flash update command. Advanced NICs have firmware
>> >stored in flash and often cryptographically secured. Updating
>> >that flash is handled by management firmware. Ethtool has a
>> >flash update command which served us well, however, it has two
>> >shortcomings:
>> > - it takes rtnl_lock unnecessarily - really flash update has
>> >   nothing to do with networking, so using a networking device
>> >   as a handle is suboptimal, which leads us to the second one:
>> > - it requires a functioning netdev - in case device enters an
>> >   error state and can't spawn a netdev (e.g. communication
>> >   with the device fails) there is no netdev to use as a handle
>> >   for flashing.
>> >
>> >Devlink already has the ability to report the firmware versions,
>> >now with the ability to update the firmware/flash we will be
>> >able to recover devices in bad state.
>> >
>> >To enable easy interoperability with ethtool add the target
>> >partition ID. We may or may not add a different method of
>> >identification, but there is no such immediate need.
>> >
>> >Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> >---
>> > include/net/devlink.h        |  2 ++
>> > include/uapi/linux/devlink.h |  6 ++++++
>> > net/core/devlink.c           | 30 ++++++++++++++++++++++++++++++
>> > 3 files changed, 38 insertions(+)
>> >
>> >diff --git a/include/net/devlink.h b/include/net/devlink.h
>> >index 07660fe4c0e3..55b3478b1291 100644
>> >--- a/include/net/devlink.h
>> >+++ b/include/net/devlink.h
>> >@@ -529,6 +529,8 @@ struct devlink_ops {
>> > 				      struct netlink_ext_ack *extack);
>> > 	int (*info_get)(struct devlink *devlink, struct devlink_info_req *req,
>> > 			struct netlink_ext_ack *extack);
>> >+	int (*flash_update)(struct devlink *devlink, const char *path,
>> >+			    u32 target, struct netlink_ext_ack *extack);
>> > };
>> > 
>> > static inline void *devlink_priv(struct devlink *devlink)
>> >diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>> >index 72d9f7c89190..f4417283fd1b 100644
>> >--- a/include/uapi/linux/devlink.h
>> >+++ b/include/uapi/linux/devlink.h
>> >@@ -103,6 +103,8 @@ enum devlink_command {
>> > 	DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET,
>> > 	DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR,
>> > 
>> >+	DEVLINK_CMD_FLASH_UPDATE,
>> >+
>> > 	/* add new commands above here */
>> > 	__DEVLINK_CMD_MAX,
>> > 	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
>> >@@ -326,6 +328,10 @@ enum devlink_attr {
>> > 	DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS,		/* u64 */
>> > 	DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD,	/* u64 */
>> > 	DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,	/* u8 */
>> >+
>> >+	DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME,	/* string */
>> >+	DEVLINK_ATTR_FLASH_UPDATE_TARGET_ID,	/* u32 */  
>> 
>> Do we need to carry this on? I mean, the ef->region is only checked in 4
>> drivers against ETHTOOL_FLASH_ALL_REGIONS, which is default.
>> There is this bnxt driver which is the only one working with this value.
>> I think that for compat, there should be some id-region mapping
>> provided by driver which the compat layer would use to translate.
>> 
>> I also think that this should be in sync with what is returned in
>> DEVLINK_ATTR_INFO_VERSION_NAME.
>> 
>> For example:
>> $ devlink dev info pci/0000:82:00.0
>> pci/0000:82:00.0:
>>   driver nfp
>>   serial_number 16240145
>>   versions:
>>       fixed:
>>         board.id AMDA0081-0001
>>         board.rev 15
>>         board.vendor SMA
>>         board.model hydrogen
>>       running:
>>         fw.mgmt 010181.010181.0101d4
>>         fw.cpld 0x1030000
>>         fw.app abm-d372b6
>>         fw.undi 0.0.2
>>         chip.init AMDA-0081-0001  20160318164536
>>       stored:
>>         fw.mgmt 010181.010181.0101d4
>>         fw.app abm-d372b6
>>         fw.undi 0.0.2
>>         chip.init AMDA-0081-0001  20160318164536
>> 
>> Now user should be able to use one of the identifiers to flash relevant
>> fw, like:
>> 
>> devlink dev flash pci/0000:82:00.0 XXX fw.mgmt file flash-boot.bin
>> 
>> I'm not sure about the name of "xxx" attribute. Maybe "id":
>> 
>> devlink dev flash pci/0000:82:00.0 id fw.mgmt file flash-boot.bin
>> devlink dev flash pci/0000:82:00.0 id fw.cpld file some-other.bin
>
>Agreed, that looks good!  TBH in case of Netronome the binary
>image contains an identifier so it will update the correct component

Okay. So in case the "component" attr is omitted, there would be some
flag passed to the driver so it would know that the file contains more
component binaries and has to do parsing itself/in-fw.


>automatically.  That's why I say "no immediate need" :)  (How about
>"component" instead of "id", BTW?)

Component is fine by me.


>
>I will drop the target ID, I just added it for full backward compat
>with ethtool, but it may be confusing, given it would be mostly unused.

Ok.


>I'll drop it in non-RFC, do you want me to add the id/component or leave
>it out for now?

I think it would be good to add it and have the api complete.

^ permalink raw reply

* Re: [PATCH v2] net: fix IPv6 prefix route residue
From: Zhiqiang Liu @ 2019-02-12  7:43 UTC (permalink / raw)
  To: David Miller
  Cc: kuznet, yoshfuji, 0xeffeff, edumazet, netdev, mingfangsen,
	zhangwenhao8, wangxiaogang3, zhoukang7, dsahern, thaller,
	maowenan
In-Reply-To: <20190211.203801.1570457113402893162.davem@davemloft.net>

> From: Zhiqiang Liu <liuzhiqiang26@huawei.com>
> Date: Mon, 11 Feb 2019 10:57:46 +0800
> 
>> From: Zhiqiang Liu <liuzhiqiang26@huawei.com>
>>
>> Follow those steps:
>>  # ip addr add 2001:123::1/32 dev eth0
>>  # ip addr add 2001:123:456::2/64 dev eth0
>>  # ip addr del 2001:123::1/32 dev eth0
>>  # ip addr del 2001:123:456::2/64 dev eth0
>> and then prefix route of 2001:123::1/32 will still exist.
>>
>> This is because ipv6_prefix_equal in check_cleanup_prefix_route
>> func does not check whether two IPv6 addresses have the same
>> prefix length. If the prefix of one address starts with another
>> shorter address prefix, even though their prefix lengths are
>> different, the return value of ipv6_prefix_equal is true.
>>
>> Here I add a check of whether two addresses have the same prefix
>> to decide whether their prefixes are equal.
>>
>> Fixes: 5b84efecb7d9 ("ipv6 addrconf: don't cleanup prefix route
>> for IFA_F_NOPREFIXROUTE")
>> Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
>> Reported-by: Wenhao Zhang <zhangwenhao8@huawei.com>
> 
> Applied and queued up for -stable.Thank you for applying the patch.
> 
> Please do not split up long Fixes: tag lines, keep the entire tag on
> one line only.
> 
> I fixed it up for you this time.
> 
> Thanks.
This is the first patch in my life. Thank you and David Ahern again.


^ permalink raw reply

* Re: [PATCH 01/18] MIPS: lantiq: pass struct device to DMA API functions
From: Christoph Hellwig @ 2019-02-12  7:45 UTC (permalink / raw)
  To: Paul Burton
  Cc: Christoph Hellwig, John Crispin, Vinod Koul, Dmitry Tarnyagin,
	Nicolas Ferre, Sudip Mukherjee, Felipe Balbi,
	linux-mips@vger.kernel.org, linux-kernel@vger.kernel.org,
	dmaengine@vger.kernel.org, netdev@vger.kernel.org,
	linux-usb@vger.kernel.org, linux-fbdev@vger.kernel.org,
	alsa-devel@alsa-project.org, iommu@lists.linux-foundation.org
In-Reply-To: <20190207232912.wfgejc5c6d6lk5so@pburton-laptop>

On Thu, Feb 07, 2019 at 11:29:14PM +0000, Paul Burton wrote:
> Would you like this to go through the MIPS tree or elsewhere? If the
> latter:
> 
>     Acked-by: Paul Burton <paul.burton@mips.com>

Please pick it up through the mips tree!

^ permalink raw reply

* [PATCH bpf] xsk: do not remove umem from netdevice on fall-back to copy-mode
From: Björn Töpel @ 2019-02-12  7:51 UTC (permalink / raw)
  To: ast, daniel, netdev
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson,
	jan.sokolowski

From: Björn Töpel <bjorn.topel@intel.com>

Commit c9b47cc1fabc ("xsk: fix bug when trying to use both copy and
zero-copy on one queue id") stores the umem into the netdev._rx
struct. However, the patch incorrectly removed the umem from the
netdev._rx struct when user-space passed "best-effort" mode
(i.e. select the fastest possible option available), and zero-copy
mode was not available. This commit fixes that.

Fixes: c9b47cc1fabc ("xsk: fix bug when trying to use both copy and zero-copy on one queue id")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xdp_umem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 597866e7c441..37e1fe180769 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -125,9 +125,10 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
 	return 0;
 
 err_unreg_umem:
-	xdp_clear_umem_at_qid(dev, queue_id);
 	if (!force_zc)
 		err = 0; /* fallback to copy mode */
+	if (err)
+		xdp_clear_umem_at_qid(dev, queue_id);
 out_rtnl_unlock:
 	rtnl_unlock();
 	return err;
-- 
2.19.1


^ permalink raw reply related

* [PATCH net-next] nfp: flower: remove double new line
From: Jakub Kicinski @ 2019-02-12  8:18 UTC (permalink / raw)
  To: davem; +Cc: netdev, oss-drivers, Jakub Kicinski

Recent cls_flower offload rewrite added a double new line.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/offload.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index fe1469d201af..450d7296fd57 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -187,7 +187,6 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
 		if (ipv4_addrs.mask->dst != cpu_to_be32(~0))
 			return -EOPNOTSUPP;
 
-
 		flow_rule_match_enc_ports(rule, &enc_ports);
 		if (enc_ports.mask->dst != cpu_to_be16(~0))
 			return -EOPNOTSUPP;
-- 
2.19.2


^ permalink raw reply related

* [PATCH bpf-next] bpf: offload: add priv field for drivers
From: Jakub Kicinski @ 2019-02-12  8:20 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski

Currently bpf_offload_dev does not have any priv pointer, forcing
the drivers to work backwards from the netdev in program metadata.
This is not great given programs are conceptually associated with
the offload device, and it means one or two unnecessary deferences.
Add a priv pointer to bpf_offload_dev.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c    |  2 +-
 drivers/net/ethernet/netronome/nfp/bpf/offload.c |  4 +---
 drivers/net/netdevsim/bpf.c                      |  5 +++--
 include/linux/bpf.h                              |  3 ++-
 kernel/bpf/offload.c                             | 10 +++++++++-
 5 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index dccae0319204..275de9f4c61c 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -465,7 +465,7 @@ static int nfp_bpf_init(struct nfp_app *app)
 		app->ctrl_mtu = nfp_bpf_ctrl_cmsg_mtu(bpf);
 	}
 
-	bpf->bpf_dev = bpf_offload_dev_create(&nfp_bpf_dev_ops);
+	bpf->bpf_dev = bpf_offload_dev_create(&nfp_bpf_dev_ops, bpf);
 	err = PTR_ERR_OR_ZERO(bpf->bpf_dev);
 	if (err)
 		goto err_free_neutral_maps;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 55c7dbf8b421..15dce97650a5 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -185,8 +185,6 @@ static void nfp_prog_free(struct nfp_prog *nfp_prog)
 
 static int nfp_bpf_verifier_prep(struct bpf_prog *prog)
 {
-	struct nfp_net *nn = netdev_priv(prog->aux->offload->netdev);
-	struct nfp_app *app = nn->app;
 	struct nfp_prog *nfp_prog;
 	int ret;
 
@@ -197,7 +195,7 @@ static int nfp_bpf_verifier_prep(struct bpf_prog *prog)
 
 	INIT_LIST_HEAD(&nfp_prog->insns);
 	nfp_prog->type = prog->type;
-	nfp_prog->bpf = app->priv;
+	nfp_prog->bpf = bpf_offload_dev_priv(prog->aux->offload->offdev);
 
 	ret = nfp_prog_prepare(nfp_prog, prog->insnsi, prog->len);
 	if (ret)
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 172b271c8bd2..f92c43453ec6 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -248,7 +248,7 @@ static int nsim_bpf_create_prog(struct netdevsim *ns, struct bpf_prog *prog)
 
 static int nsim_bpf_verifier_prep(struct bpf_prog *prog)
 {
-	struct netdevsim *ns = netdev_priv(prog->aux->offload->netdev);
+	struct netdevsim *ns = bpf_offload_dev_priv(prog->aux->offload->offdev);
 
 	if (!ns->bpf_bind_accept)
 		return -EOPNOTSUPP;
@@ -589,7 +589,8 @@ int nsim_bpf_init(struct netdevsim *ns)
 		if (IS_ERR_OR_NULL(ns->sdev->ddir_bpf_bound_progs))
 			return -ENOMEM;
 
-		ns->sdev->bpf_dev = bpf_offload_dev_create(&nsim_bpf_dev_ops);
+		ns->sdev->bpf_dev = bpf_offload_dev_create(&nsim_bpf_dev_ops,
+							   ns);
 		err = PTR_ERR_OR_ZERO(ns->sdev->bpf_dev);
 		if (err)
 			return err;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bd169a7bcc93..d3126ff4994a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -767,8 +767,9 @@ int bpf_map_offload_get_next_key(struct bpf_map *map,
 bool bpf_offload_prog_map_match(struct bpf_prog *prog, struct bpf_map *map);
 
 struct bpf_offload_dev *
-bpf_offload_dev_create(const struct bpf_prog_offload_ops *ops);
+bpf_offload_dev_create(const struct bpf_prog_offload_ops *ops, void *priv);
 void bpf_offload_dev_destroy(struct bpf_offload_dev *offdev);
+void *bpf_offload_dev_priv(struct bpf_offload_dev *offdev);
 int bpf_offload_dev_netdev_register(struct bpf_offload_dev *offdev,
 				    struct net_device *netdev);
 void bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 39dba8c90331..ba635209ae9a 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -35,6 +35,7 @@ static DECLARE_RWSEM(bpf_devs_lock);
 struct bpf_offload_dev {
 	const struct bpf_prog_offload_ops *ops;
 	struct list_head netdevs;
+	void *priv;
 };
 
 struct bpf_offload_netdev {
@@ -669,7 +670,7 @@ void bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev,
 EXPORT_SYMBOL_GPL(bpf_offload_dev_netdev_unregister);
 
 struct bpf_offload_dev *
-bpf_offload_dev_create(const struct bpf_prog_offload_ops *ops)
+bpf_offload_dev_create(const struct bpf_prog_offload_ops *ops, void *priv)
 {
 	struct bpf_offload_dev *offdev;
 	int err;
@@ -688,6 +689,7 @@ bpf_offload_dev_create(const struct bpf_prog_offload_ops *ops)
 		return ERR_PTR(-ENOMEM);
 
 	offdev->ops = ops;
+	offdev->priv = priv;
 	INIT_LIST_HEAD(&offdev->netdevs);
 
 	return offdev;
@@ -700,3 +702,9 @@ void bpf_offload_dev_destroy(struct bpf_offload_dev *offdev)
 	kfree(offdev);
 }
 EXPORT_SYMBOL_GPL(bpf_offload_dev_destroy);
+
+void *bpf_offload_dev_priv(struct bpf_offload_dev *offdev)
+{
+	return offdev->priv;
+}
+EXPORT_SYMBOL_GPL(bpf_offload_dev_priv);
-- 
2.19.2


^ permalink raw reply related

* Re: [net-next PATCH 2/2] net: page_pool: don't use page->private to store dma_addr_t
From: Jesper Dangaard Brouer @ 2019-02-12  8:23 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Netdev, linux-mm, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Matthew Wilcox, Saeed Mahameed, Andrew Morton,
	Mel Gorman, David S. Miller, Tariq Toukan, brouer
In-Reply-To: <CAKgT0Ucw_HGaice7cjM7e_nYuvjU_TKVd54Yc_fHen1pZRkUJw@mail.gmail.com>

On Mon, 11 Feb 2019 11:31:13 -0800
Alexander Duyck <alexander.duyck@gmail.com> wrote:

> On Mon, Feb 11, 2019 at 8:07 AM Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> >
> > As pointed out by David Miller the current page_pool implementation
> > stores dma_addr_t in page->private.
> > This won't work on 32-bit platforms with 64-bit DMA addresses since the
> > page->private is an unsigned long and the dma_addr_t a u64.
> >
> > A previous patch is adding dma_addr_t on struct page to accommodate this.
> > This patch adapts the page_pool related functions to use the newly added
> > struct for storing and retrieving DMA addresses from network drivers.
> >
> > Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >  net/core/page_pool.c |   13 +++++++++----
> >  1 file changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index 43a932cb609b..897a69a1477e 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -136,7 +136,9 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
> >         if (!(pool->p.flags & PP_FLAG_DMA_MAP))
> >                 goto skip_dma_map;
> >
> > -       /* Setup DMA mapping: use page->private for DMA-addr
> > +       /* Setup DMA mapping: use 'struct page' area for storing DMA-addr
> > +        * since dma_addr_t can be either 32 or 64 bits and does not always fit
> > +        * into page private data (i.e 32bit cpu with 64bit DMA caps)
> >          * This mapping is kept for lifetime of page, until leaving pool.
> >          */
> >         dma = dma_map_page(pool->p.dev, page, 0,
> > @@ -146,7 +148,7 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
> >                 put_page(page);
> >                 return NULL;
> >         }
> > -       set_page_private(page, dma); /* page->private = dma; */
> > +       page->dma_addr = dma;
> >
> >  skip_dma_map:
> >         /* When page just alloc'ed is should/must have refcnt 1. */
> > @@ -175,13 +177,16 @@ EXPORT_SYMBOL(page_pool_alloc_pages);
> >  static void __page_pool_clean_page(struct page_pool *pool,
> >                                    struct page *page)
> >  {
> > +       dma_addr_t dma;
> > +
> >         if (!(pool->p.flags & PP_FLAG_DMA_MAP))
> >                 return;
> >
> > +       dma = page->dma_addr;
> >         /* DMA unmap */
> > -       dma_unmap_page(pool->p.dev, page_private(page),
> > +       dma_unmap_page(pool->p.dev, dma,
> >                        PAGE_SIZE << pool->p.order, pool->p.dma_dir);
> > -       set_page_private(page, 0);
> > +       page->dma_addr = 0;
> >  }
> >
> >  /* Return a page to the page allocator, cleaning up our state */  
> 
> This comment is unrelated to this patch specifically, but applies more
> generally to the page_pool use of dma_unmap_page.
> 
> So just looking at this I am pretty sure the use of just
> dma_unmap_page isn't correct here. You should probably be using
> dma_unmap_page_attrs and specifically be passing the attribute
> DMA_ATTR_SKIP_CPU_SYNC so that you can tear down the mapping without
> invalidating the contents of the page.

It is unrelated to this patch, but YES you are right.  I was aware of
this, but it slipped my mind.  You were the one that taught me the
principle page_pool is based on, that we keep the DMA mapping, but
instead let the driver perform the DMA-sync operations.

Thanks for catching this!  I actually think that the current small
ARM64 board we are playing with at the moment (Espressobin) will have a
performance benefit from doing this.


> This is something that will work for most cases but if you run into a
> case where this is used with SWIOTLB in bounce buffer mode you would
> end up potentially corrupting data on the unmap call.

I do have a board Machiattobin, that operate with SWIOTLB bounce
buffers, which it is not suppose to, and something that I'll hopefully
get a round to fix soon.  But we have not implemented use of page_pool
on that board yet. So, thanks for catching this.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [net-next PATCH 1/2] mm: add dma_addr_t to struct page
From: Jesper Dangaard Brouer @ 2019-02-12  8:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: netdev, linux-mm, Toke Høiland-Jørgensen,
	Ilias Apalodimas, willy, Saeed Mahameed, mgorman, David S. Miller,
	Tariq Toukan, brouer
In-Reply-To: <20190211121624.30c601d0fa4c0f972eeaf1c6@linux-foundation.org>

On Mon, 11 Feb 2019 12:16:24 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 11 Feb 2019 17:06:46 +0100 Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> 
> > The page_pool API is using page->private to store DMA addresses.
> > As pointed out by David Miller we can't use that on 32-bit architectures
> > with 64-bit DMA
> > 
> > This patch adds a new dma_addr_t struct to allow storing DMA addresses
> > 
> > ..
> >
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -95,6 +95,14 @@ struct page {
> >  			 */
> >  			unsigned long private;
> >  		};
> > +		struct {	/* page_pool used by netstack */
> > +			/**
> > +			 * @dma_addr: Page_pool need to store DMA-addr, and
> > +			 * cannot use @private, as DMA-mappings can be 64-bit
> > +			 * even on 32-bit Architectures.
> > +			 */  
> 
> This comment is a bit awkward.  The discussion about why it doesn't use
> ->private is uninteresting going forward and is more material for a  
> changelog.
> 
> How about
> 
> 			/**
> 			 * @dma_addr: page_pool requires a 64-bit value even on
> 			 * 32-bit architectures.
> 			 */

Much better, I'll use that!

> Otherwise,
> 
> Acked-by: Andrew Morton <akpm@linux-foundation.org>

Thanks!

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH] ser_gigaset: mark expected switch fall-through
From: Paul Bolle @ 2019-02-12  8:45 UTC (permalink / raw)
  To: Gustavo A. R. Silva
  Cc: gigaset307x-common, netdev, linux-kernel, Kees Cook, Karsten Keil
In-Reply-To: <20190211223444.GA29517@embeddedor>

Gustavo A. R. Silva schreef op ma 11-02-2019 om 16:34 [-0600]:
> In preparation to enabling -Wimplicit-fallthrough, mark switch
> cases where we are expecting to fall through.
> 
> This patch fixes the following warning:
> 
> drivers/isdn/gigaset/ser-gigaset.c: In function ‘gigaset_tty_ioctl’:
> drivers/isdn/gigaset/ser-gigaset.c:627:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
>    switch (arg) {
>    ^~~~~~
> drivers/isdn/gigaset/ser-gigaset.c:638:2: note: here
>   default:
>   ^~~~~~~
> 
> Warning level 3 was used: -Wimplicit-fallthrough=3
> 
> Notice that, in this particular case, the code comment is modified
> in accordance with what GCC is expecting to find.
> 
> This patch is part of the ongoing efforts to enable
> -Wimplicit-fallthrough.
> 
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>

Acked-by: Paul Bolle <pebolle@tiscali.nl>

Thanks,


Paul Bolle


^ permalink raw reply

* [PATCH 0/2] i40e: fix regression that enables AF_XDP ZC unconditionally
From: Björn Töpel @ 2019-02-12  8:52 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, netdev,
	jan.sokolowski

This series addresses a recent AF_XDP zero-copy regression.

In commit f3fef2b6e1cc ("i40e: Remove umem from VSI") a regression was
introduced; When the VSI was reset, the setup code would try to enable
AF_XDP ZC unconditionally (as long as there was a umem placed in the
netdev._rx struct). Here, we add a bitmap to the VSI that tracks if a
certain queue pair has been "zero-copy enabled" via the ndo_bpf. The
bitmap is used in i40e_xsk_umem, and enables zero-copy if and only if
XDP is enabled, the corresponding qid in the bitmap is set and the
umem is non-NULL.

Thanks,
Björn


Björn Töpel (2):
  i40e: move i40e_xsk_umem function
  i40e: add tracking of AF_XDP ZC state for each queue pair

 drivers/net/ethernet/intel/i40e/i40e.h      | 16 ++----------
 drivers/net/ethernet/intel/i40e/i40e_main.c | 28 +++++++++++++++++++++
 drivers/net/ethernet/intel/i40e/i40e_xsk.c  |  3 +++
 3 files changed, 33 insertions(+), 14 deletions(-)

-- 
2.19.1


^ permalink raw reply

* [PATCH 1/2] i40e: move i40e_xsk_umem function
From: Björn Töpel @ 2019-02-12  8:52 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, netdev,
	jan.sokolowski
In-Reply-To: <20190212085205.7848-1-bjorn.topel@gmail.com>

From: Björn Töpel <bjorn.topel@intel.com>

The i40e_xsk_umem function was explicitly inlined in i40e.h. There is
no reason for that, so move it to i40e_main.c instead.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      | 14 --------------
 drivers/net/ethernet/intel/i40e/i40e_main.c | 20 ++++++++++++++++++++
 2 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index d684998ba2b0..cc583ad5236b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -1096,20 +1096,6 @@ static inline bool i40e_enabled_xdp_vsi(struct i40e_vsi *vsi)
 	return !!vsi->xdp_prog;
 }
 
-static inline struct xdp_umem *i40e_xsk_umem(struct i40e_ring *ring)
-{
-	bool xdp_on = i40e_enabled_xdp_vsi(ring->vsi);
-	int qid = ring->queue_index;
-
-	if (ring_is_xdp(ring))
-		qid -= ring->vsi->alloc_queue_pairs;
-
-	if (!xdp_on)
-		return NULL;
-
-	return xdp_get_umem_from_qid(ring->vsi->netdev, qid);
-}
-
 int i40e_create_queue_channel(struct i40e_vsi *vsi, struct i40e_channel *ch);
 int i40e_set_bw_limit(struct i40e_vsi *vsi, u16 seid, u64 max_tx_rate);
 int i40e_add_del_cloud_filter(struct i40e_vsi *vsi,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 44856a84738d..ba1a84a2c8e5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3063,6 +3063,26 @@ static void i40e_config_xps_tx_ring(struct i40e_ring *ring)
 			    ring->queue_index);
 }
 
+/**
+ * i40e_xsk_umem - Retrieve the AF_XDP ZC if XDP and ZC is enabled
+ * @ring: The Tx or Rx ring
+ *
+ * Returns the UMEM or NULL.
+ **/
+static struct xdp_umem *i40e_xsk_umem(struct i40e_ring *ring)
+{
+	bool xdp_on = i40e_enabled_xdp_vsi(ring->vsi);
+	int qid = ring->queue_index;
+
+	if (ring_is_xdp(ring))
+		qid -= ring->vsi->alloc_queue_pairs;
+
+	if (!xdp_on)
+		return NULL;
+
+	return xdp_get_umem_from_qid(ring->vsi->netdev, qid);
+}
+
 /**
  * i40e_configure_tx_ring - Configure a transmit ring context and rest
  * @ring: The Tx ring to configure
-- 
2.19.1


^ permalink raw reply related

* [PATCH 2/2] i40e: add tracking of AF_XDP ZC state for each queue pair
From: Björn Töpel @ 2019-02-12  8:52 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, netdev,
	jan.sokolowski
In-Reply-To: <20190212085205.7848-1-bjorn.topel@gmail.com>

From: Björn Töpel <bjorn.topel@intel.com>

In commit f3fef2b6e1cc ("i40e: Remove umem from VSI") a regression was
introduced; When the VSI was reset, the setup code would try to enable
AF_XDP ZC unconditionally (as long as there was a umem placed in the
netdev._rx struct). Here, we add a bitmap to the VSI that tracks if a
certain queue pair has been "zero-copy enabled" via the ndo_bpf. The
bitmap is used in i40e_xsk_umem, and enables zero-copy if and only if
XDP is enabled, the corresponding qid in the bitmap is set and the
umem is non-NULL.

Fixes: f3fef2b6e1cc ("i40e: Remove umem from VSI")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      |  2 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c | 10 +++++++++-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c  |  3 +++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index cc583ad5236b..d3cc3427caad 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -790,6 +790,8 @@ struct i40e_vsi {
 
 	/* VSI specific handlers */
 	irqreturn_t (*irq_handler)(int irq, void *data);
+
+	unsigned long *af_xdp_zc_qps; /* tracks AF_XDP ZC enabled qps */
 } ____cacheline_internodealigned_in_smp;
 
 struct i40e_netdev_priv {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index ba1a84a2c8e5..0dd00d58c524 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3077,7 +3077,7 @@ static struct xdp_umem *i40e_xsk_umem(struct i40e_ring *ring)
 	if (ring_is_xdp(ring))
 		qid -= ring->vsi->alloc_queue_pairs;
 
-	if (!xdp_on)
+	if (!xdp_on || !test_bit(qid, ring->vsi->af_xdp_zc_qps))
 		return NULL;
 
 	return xdp_get_umem_from_qid(ring->vsi->netdev, qid);
@@ -10076,6 +10076,12 @@ static int i40e_vsi_mem_alloc(struct i40e_pf *pf, enum i40e_vsi_type type)
 	hash_init(vsi->mac_filter_hash);
 	vsi->irqs_ready = false;
 
+	if (type == I40E_VSI_MAIN) {
+		vsi->af_xdp_zc_qps = bitmap_zalloc(pf->num_lan_qps, GFP_KERNEL);
+		if (!vsi->af_xdp_zc_qps)
+			goto err_rings;
+	}
+
 	ret = i40e_set_num_rings_in_vsi(vsi);
 	if (ret)
 		goto err_rings;
@@ -10094,6 +10100,7 @@ static int i40e_vsi_mem_alloc(struct i40e_pf *pf, enum i40e_vsi_type type)
 	goto unlock_pf;
 
 err_rings:
+	bitmap_free(vsi->af_xdp_zc_qps);
 	pf->next_vsi = i - 1;
 	kfree(vsi);
 unlock_pf:
@@ -10174,6 +10181,7 @@ static int i40e_vsi_clear(struct i40e_vsi *vsi)
 	i40e_put_lump(pf->qp_pile, vsi->base_queue, vsi->idx);
 	i40e_put_lump(pf->irq_pile, vsi->base_vector, vsi->idx);
 
+	bitmap_free(vsi->af_xdp_zc_qps);
 	i40e_vsi_free_arrays(vsi, true);
 	i40e_clear_rss_config_user(vsi);
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 96d849460d9b..2737fee338c4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -102,6 +102,8 @@ static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem,
 	if (err)
 		return err;
 
+	set_bit(qid, vsi->af_xdp_zc_qps);
+
 	if_running = netif_running(vsi->netdev) && i40e_enabled_xdp_vsi(vsi);
 
 	if (if_running) {
@@ -143,6 +145,7 @@ static int i40e_xsk_umem_disable(struct i40e_vsi *vsi, u16 qid)
 			return err;
 	}
 
+	clear_bit(qid, vsi->af_xdp_zc_qps);
 	i40e_xsk_umem_dma_unmap(vsi, umem);
 
 	if (if_running) {
-- 
2.19.1


^ permalink raw reply related

* Re: Stack sends oversize UDP packet to the driver
From: Michael Chan @ 2019-02-12  8:55 UTC (permalink / raw)
  To: Mahesh Bandewar (महेश बंडेवार)
  Cc: Daniel Axtens, Netdev, David Miller, Eric Dumazet,
	Willem de Bruijn
In-Reply-To: <CAF2d9jjwuT1WSz7P9QFbjdpp8GhhV-XBLzVC8H94Le_JCxL0fg@mail.gmail.com>

On Fri, Feb 8, 2019 at 12:26 PM Mahesh Bandewar (महेश बंडेवार)
<maheshb@google.com> wrote:
>
> On Wed, Feb 6, 2019 at 8:51 PM Mahesh Bandewar (महेश बंडेवार)
> <maheshb@google.com> wrote:
> >
> > On Tue, Feb 5, 2019 at 11:36 AM Michael Chan <michael.chan@broadcom.com> wrote:
> > > I've looked at this a little more.  The blackhole_dev is not IFF_UP |
> > > IFF_RUNNING, right?  May be that's why the packets are never getting
> > > to the xmit function?
> > Yes, so I added those two flags and ended up writing a test-module for
> > the device (which I will include while posting the patch-series).
> > However, adding those flags is also not sufficient since the qdisc is
> > initialized to noop_qdisc so qdisc enqueue will drop packets before
> > hitting the ndo_start_xmit().
>
> I have another version of the fix (with help from Eric) and this
> should hit the .ndo_start_xmit() of the blackhole_dev. I'm adding
> these flags during the setup and then calling dev_activate() to change
> noop qdisc to null qdisc. Please give this patch set a try and let me
> know if the blackhole_dev xmit path gets exercised in your test
> scenario.

The new version still works in the sense that no oversize packets are
seen in the NIC driver's xmit function.  But I still don't see any
packets hitting the blackhole's xmit function.  I'm not 100% sure but
I think the blackhole dev has no IP address and so the UDP packets are
dropped in ip_finish_output2() because there is no neigh.  Something
like that.

^ permalink raw reply

* Re: [PATCH net-next 2/2] devlink: Fix list access without lock while reading region
From: Sergei Shtylyov @ 2019-02-12  9:01 UTC (permalink / raw)
  To: Parav Pandit, jiri, davem, netdev
In-Reply-To: <1549955377-15828-1-git-send-email-parav@mellanox.com>

Hello!

On 12.02.2019 10:09, Parav Pandit wrote:

> While finding the devlink device during region reading,
> devlink device list is accessed and devlink device is
> returned without holding a lock. This could lead to user-after-free

    Use-after-free, perhaps?

> accesses.
> 
> While at it, add lockdep assert to ensure that all future callers hold
> the lock when calling devlink_get_from_attrs().
> 
> Fixes: 4e54795a27f5 ("devlink: Add support for region snapshot read command")
> Signed-off-by: Parav Pandit <parav@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>
[...]

MBR, Sergei

^ permalink raw reply

* Re: [PATCH net-next 0/3] Remove getting SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
From: Jiri Pirko @ 2019-02-12  8:55 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <20190211211749.19847-1-f.fainelli@gmail.com>

Mon, Feb 11, 2019 at 10:17:46PM CET, f.fainelli@gmail.com wrote:
>Hi all,
>
>AFAICT there is no code that attempts to get the value of the attribute
>SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS while it is used with
>switchdev_port_attr_set().
>
>This is effectively no doing anything and it can slow down future work
>that tries to make modifications in these areas so remove that.
>
>David, there should be no dependency with previous patch series, but
>again, feedback from Ido and Jiri would be welcome in case this was
>added for a reason.

It was originally used by:
switchdev_port_bridge_getlink()
removed by:
commit 29ab586c3d83f81c435e269cace9a1619afb5bbd
Author: Arkadi Sharshevsky <arkadis@mellanox.com>
Date:   Sun Aug 6 16:15:51 2017 +0300

    net: switchdev: Remove bridge bypass support from switchdev

So these are just leftovers. Let's flush them.

^ permalink raw reply

* Re: [PATCH net-next 1/3] mlxsw: spectrum_switchdev: Remove getting PORT_BRIDGE_FLAGS
From: Jiri Pirko @ 2019-02-12  8:55 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <20190211211749.19847-2-f.fainelli@gmail.com>

Mon, Feb 11, 2019 at 10:17:47PM CET, f.fainelli@gmail.com wrote:
>There is no code that will query the SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
>attribute remove support for that.
>
>Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [PATCH net-next 2/3] rocker: Remove getting PORT_BRIDGE_FLAGS
From: Jiri Pirko @ 2019-02-12  8:56 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <20190211211749.19847-3-f.fainelli@gmail.com>

Mon, Feb 11, 2019 at 10:17:48PM CET, f.fainelli@gmail.com wrote:
>There is no code that attempts to get the
>SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS attribute, remove support for that.
>
>Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [PATCH net-next 3/3] staging: fsl-dpaa2: ethsw: Remove getting PORT_BRIDGE_FLAGS
From: Jiri Pirko @ 2019-02-12  8:56 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <20190211211749.19847-4-f.fainelli@gmail.com>

Mon, Feb 11, 2019 at 10:17:49PM CET, f.fainelli@gmail.com wrote:
>There is no code that tries to get the attribute
>SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS, remove support for doing that.
>
>Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [Patch net] team: avoid complex list operations in team_nl_cmd_options_set()
From: Jiri Pirko @ 2019-02-12  9:26 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, syzbot+4d4af685432dc0e56c91, syzbot+68ee510075cf64260cc4,
	Paolo Abeni
In-Reply-To: <20190212055951.6712-1-xiyou.wangcong@gmail.com>

Tue, Feb 12, 2019 at 06:59:51AM CET, xiyou.wangcong@gmail.com wrote:
>The current opt_inst_list operations inside team_nl_cmd_options_set()
>is too complex to track:
>
>    LIST_HEAD(opt_inst_list);
>    nla_for_each_nested(...) {
>        list_for_each_entry(opt_inst, &team->option_inst_list, list) {
>            if (__team_option_inst_tmp_find(&opt_inst_list, opt_inst))
>                continue;
>            list_add(&opt_inst->tmp_list, &opt_inst_list);
>        }
>    }
>    team_nl_send_event_options_get(team, &opt_inst_list);
>
>as while we retrieve 'opt_inst' from team->option_inst_list, it could
>be added to the local 'opt_inst_list' for multiple times. The
>__team_option_inst_tmp_find() doesn't work, as the setter
>team_mode_option_set() still calls team->ops.exit() which uses
>->tmp_list too in __team_options_change_check().
>
>Simplify the list operations by moving the 'opt_inst_list' and
>team_nl_send_event_options_get() into the nla_for_each_nested() loop so
>that it can be guranteed that we won't insert a same list entry for
>multiple times. Therefore, __team_option_inst_tmp_find() can be removed
>too.
>
>Fixes: 4fb0534fb7bb ("team: avoid adding twice the same option to the event list")
>Fixes: 2fcdb2c9e659 ("team: allow to send multiple set events in one message")
>Reported-by: syzbot+4d4af685432dc0e56c91@syzkaller.appspotmail.com
>Reported-by: syzbot+68ee510075cf64260cc4@syzkaller.appspotmail.com
>Cc: Jiri Pirko <jiri@resnulli.us>
>Cc: Paolo Abeni <pabeni@redhat.com>
>Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

Thanks!

^ permalink raw reply

* Re: [net-next PATCH 1/2] mm: add dma_addr_t to struct page
From: Jesper Dangaard Brouer @ 2019-02-12 10:06 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: netdev, linux-mm, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Saeed Mahameed, Andrew Morton, mgorman,
	David S. Miller, Tariq Toukan, brouer, Willem de Bruijn
In-Reply-To: <20190211165551.GD12668@bombadil.infradead.org>

On Mon, 11 Feb 2019 08:55:51 -0800
Matthew Wilcox <willy@infradead.org> wrote:

> On Mon, Feb 11, 2019 at 05:06:46PM +0100, Jesper Dangaard Brouer wrote:
> > The page_pool API is using page->private to store DMA addresses.
> > As pointed out by David Miller we can't use that on 32-bit architectures
> > with 64-bit DMA
> > 
> > This patch adds a new dma_addr_t struct to allow storing DMA addresses
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>  
> 
> Reviewed-by: Matthew Wilcox <willy@infradead.org>
> 
> > +		struct {	/* page_pool used by netstack */
> > +			/**
> > +			 * @dma_addr: Page_pool need to store DMA-addr, and  
> 
> s/need/needs/
> 
> > +			 * cannot use @private, as DMA-mappings can be 64-bit  
> 
> s/DMA-mappings/DMA addresses/
> 
> > +			 * even on 32-bit Architectures.  
> 
> s/A/a/

Yes, that comments needs improvement. I think I'll use AKPMs suggestion.


> > +			 */
> > +			dma_addr_t dma_addr; /* Shares area with @lru */  
> 
> It also shares with @slab_list, @next, @compound_head, @pgmap and
> @rcu_head.  I think it's pointless to try to document which other fields
> something shares space with; the places which do it are a legacy from
> before I rearranged struct page last year.  Anyone looking at this should
> now be able to see "Oh, this is a union, only use the fields which are
> in the union for the type of struct page I have here".

I agree, I'll strip that comment.

 
> Are the pages allocated from this API ever supposed to be mapped to
> userspace?

I would like to know what fields on struct-page we cannot touch if we
want to keep this a possibility?

That said, I hope we don't need to do this. But as I integrate this
further into the netstack code, we might have to support this, or
at-least release the page_pool "state" (currently only DMA-addr) before
the skb_zcopy code path.  First iteration will not do zero-copy stuff,
and later I'll coordinate with Willem how to add this, if needed.

My general opinion is that if an end-user want to have pages mapped to
userspace, then page_pool (MEM_TYPE_PAGE_POOL) is not the right choice,
but instead use MEM_TYPE_ZERO_COPY (see enum xdp_mem_type).  We are
generally working towards allowing NIC drivers to have a different
memory type per RX-ring.


> You also say in the documentation:
> 
>  * If no DMA mapping is done, then it can act as shim-layer that
>  * fall-through to alloc_page.  As no state is kept on the page, the
>  * regular put_page() call is sufficient.
> 
> I think this is probably a dangerous precedent to set.  Better to require
> exactly one call to page_pool_put_page() (with the understanding that the
> refcount may be elevated, so this may not be the final free of the page,
> but the page will no longer be usable for its page_pool purpose).

Yes, this actually how it is implemented today, and the comment should
be improved.  Today __page_pool_put_page() in case of refcount is
elevated do call __page_pool_clean_page() to release page page_pool
state, and is in principle no longer "usable" for page_pool purposes.
BUT I have considered removing this, as it might not fit how want to
use the API. In our current RFC we found a need for (and introduced) a
page_pool_unmap_page() call (that call __page_pool_clean_page()), when
driver hits cases where the code path doesn't have a call-back to
page_pool_put_page() but instead end-up calling put_page().

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox