Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v5 11/17] net: sgi: ioc3-eth: no need to stop queue set_multicast_list
From: Thomas Bogendoerfer @ 2019-08-21 12:40 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jonathan Corbet, Ralf Baechle, Paul Burton, James Hogan,
	Dmitry Torokhov, Lee Jones, David S. Miller, Srinivas Kandagatla,
	Alessandro Zummo, Alexandre Belloni, Greg Kroah-Hartman,
	Jiri Slaby, Evgeniy Polyakov, linux-doc, linux-kernel, linux-mips,
	linux-input, netdev, linux-rtc, linux-serial
In-Reply-To: <20190819170440.37ff18d4@cakuba.netronome.com>

On Mon, 19 Aug 2019 17:04:53 -0700
Jakub Kicinski <jakub.kicinski@netronome.com> wrote:

> On Mon, 19 Aug 2019 18:31:34 +0200, Thomas Bogendoerfer wrote:
> > netif_stop_queue()/netif_wake_qeue() aren't needed for changing
> > multicast filters. Use spinlocks instead for proper protection
> > of private struct.
> > 
>
> I thought it may protect ip->emcr, but that one is accessed with no
> locking from the ioc3_timer() -> ioc3_setup_duplex() path..

it should protect ip->emcr ... I'll add spin_lock/unlock to setup_duplex and
respin the patch.

Thomas.

-- 
SUSE Linux GmbH
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply

* Re: [PATCH bpf-next v2 0/3] xdpsock: allow mmap2 usage for 32bits
From: Daniel Borkmann @ 2019-08-21 12:35 UTC (permalink / raw)
  To: Ivan Khoronzhuk, magnus.karlsson, bjorn.topel
  Cc: davem, hawk, john.fastabend, jakub.kicinski, netdev, bpf,
	xdp-newbies, linux-kernel, jlemon, yhs, andrii.nakryiko
In-Reply-To: <20190815121356.8848-1-ivan.khoronzhuk@linaro.org>

On 8/15/19 2:13 PM, Ivan Khoronzhuk wrote:
> This patchset contains several improvements for af_xdp socket umem
> mappings for 32bit systems. Also, there is one more patch outside of
> this series that on linux-next tree and related to mmap2 af_xdp umem
> offsets: "mm: mmap: increase sockets maximum memory size pgoff for 32bits"
> https://lkml.org/lkml/2019/8/12/549
> 
> Based on bpf-next/master
> 
> Prev: https://lkml.org/lkml/2019/8/13/437
> 
> v2..v1:
> 	- replaced "libbpf: add asm/unistd.h to xsk to get __NR_mmap2" on
> 	 "libbpf: use LFS (_FILE_OFFSET_BITS) instead of direct mmap2
> 	 syscall"
> 	- use vmap along with page_address to avoid overkill
> 	- define mmap syscall trace5 for mmap if defined
> 
> Ivan Khoronzhuk (3):
>    libbpf: use LFS (_FILE_OFFSET_BITS) instead of direct mmap2 syscall
>    xdp: xdp_umem: replace kmap on vmap for umem map
>    samples: bpf: syscal_nrs: use mmap2 if defined
> 
>   net/xdp/xdp_umem.c         | 36 +++++++++++++++++++++++-----
>   samples/bpf/syscall_nrs.c  |  6 +++++
>   samples/bpf/tracex5_kern.c | 13 ++++++++++
>   tools/lib/bpf/Makefile     |  1 +
>   tools/lib/bpf/xsk.c        | 49 +++++++++++---------------------------
>   5 files changed, 64 insertions(+), 41 deletions(-)
> 

Applied, and fixed up typo in last one's subject, thanks!

^ permalink raw reply

* [PATCH net-next] amd-xgbe: use devm_platform_ioremap_resource() to simplify code
From: YueHaibing @ 2019-08-21 12:32 UTC (permalink / raw)
  To: davem, thomas.lendacky; +Cc: linux-kernel, netdev, YueHaibing

Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/ethernet/amd/xgbe/xgbe-platform.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-platform.c b/drivers/net/ethernet/amd/xgbe/xgbe-platform.c
index dce9e59..4ebd241 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-platform.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-platform.c
@@ -301,7 +301,6 @@ static int xgbe_platform_probe(struct platform_device *pdev)
 	struct xgbe_prv_data *pdata;
 	struct device *dev = &pdev->dev;
 	struct platform_device *phy_pdev;
-	struct resource *res;
 	const char *phy_mode;
 	unsigned int phy_memnum, phy_irqnum;
 	unsigned int dma_irqnum, dma_irqend;
@@ -353,8 +352,7 @@ static int xgbe_platform_probe(struct platform_device *pdev)
 	}
 
 	/* Obtain the mmio areas for the device */
-	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	pdata->xgmac_regs = devm_ioremap_resource(dev, res);
+	pdata->xgmac_regs = devm_platform_ioremap_resource(pdev, 0);
 	if (IS_ERR(pdata->xgmac_regs)) {
 		dev_err(dev, "xgmac ioremap failed\n");
 		ret = PTR_ERR(pdata->xgmac_regs);
@@ -363,8 +361,7 @@ static int xgbe_platform_probe(struct platform_device *pdev)
 	if (netif_msg_probe(pdata))
 		dev_dbg(dev, "xgmac_regs = %p\n", pdata->xgmac_regs);
 
-	res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
-	pdata->xpcs_regs = devm_ioremap_resource(dev, res);
+	pdata->xpcs_regs = devm_platform_ioremap_resource(pdev, 1);
 	if (IS_ERR(pdata->xpcs_regs)) {
 		dev_err(dev, "xpcs ioremap failed\n");
 		ret = PTR_ERR(pdata->xpcs_regs);
@@ -373,8 +370,8 @@ static int xgbe_platform_probe(struct platform_device *pdev)
 	if (netif_msg_probe(pdata))
 		dev_dbg(dev, "xpcs_regs  = %p\n", pdata->xpcs_regs);
 
-	res = platform_get_resource(phy_pdev, IORESOURCE_MEM, phy_memnum++);
-	pdata->rxtx_regs = devm_ioremap_resource(dev, res);
+	pdata->rxtx_regs = devm_platform_ioremap_resource(phy_pdev,
+							  phy_memnum++);
 	if (IS_ERR(pdata->rxtx_regs)) {
 		dev_err(dev, "rxtx ioremap failed\n");
 		ret = PTR_ERR(pdata->rxtx_regs);
@@ -383,8 +380,8 @@ static int xgbe_platform_probe(struct platform_device *pdev)
 	if (netif_msg_probe(pdata))
 		dev_dbg(dev, "rxtx_regs  = %p\n", pdata->rxtx_regs);
 
-	res = platform_get_resource(phy_pdev, IORESOURCE_MEM, phy_memnum++);
-	pdata->sir0_regs = devm_ioremap_resource(dev, res);
+	pdata->sir0_regs = devm_platform_ioremap_resource(phy_pdev,
+							  phy_memnum++);
 	if (IS_ERR(pdata->sir0_regs)) {
 		dev_err(dev, "sir0 ioremap failed\n");
 		ret = PTR_ERR(pdata->sir0_regs);
@@ -393,8 +390,8 @@ static int xgbe_platform_probe(struct platform_device *pdev)
 	if (netif_msg_probe(pdata))
 		dev_dbg(dev, "sir0_regs  = %p\n", pdata->sir0_regs);
 
-	res = platform_get_resource(phy_pdev, IORESOURCE_MEM, phy_memnum++);
-	pdata->sir1_regs = devm_ioremap_resource(dev, res);
+	pdata->sir1_regs = devm_platform_ioremap_resource(phy_pdev,
+							  phy_memnum++);
 	if (IS_ERR(pdata->sir1_regs)) {
 		dev_err(dev, "sir1 ioremap failed\n");
 		ret = PTR_ERR(pdata->sir1_regs);
-- 
2.7.4



^ permalink raw reply related

* [PATCH bpf] flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH
From: Jakub Sitnicki @ 2019-08-21 12:17 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team, Petar Penkov, Willem de Bruijn, Lorenz Bauer

Call to bpf_prog_put(), with help of call_rcu(), queues an RCU-callback to
free the program once a grace period has elapsed. The callback can run
together with new RCU readers that started after the last grace period.
New RCU readers can potentially see the "old" to-be-freed or already-freed
pointer to the program object before the RCU update-side NULLs it.

Reorder the operations so that the RCU update-side resets the protected
pointer before the end of the grace period after which the program will be
freed.

Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/core/flow_dissector.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 3e6fedb57bc1..2470b4b404e6 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -142,8 +142,8 @@ int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
 		mutex_unlock(&flow_dissector_mutex);
 		return -ENOENT;
 	}
-	bpf_prog_put(attached);
 	RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
+	bpf_prog_put(attached);
 	mutex_unlock(&flow_dissector_mutex);
 	return 0;
 }
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH bpf-next v2 2/4] selftests/bpf: test_progs: remove global fail/success counts
From: Daniel Borkmann @ 2019-08-21 12:17 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev, bpf; +Cc: davem, ast, Andrii Nakryiko
In-Reply-To: <20190819191752.241637-3-sdf@google.com>

On 8/19/19 9:17 PM, Stanislav Fomichev wrote:
> Now that we have a global per-test/per-environment state, there
> is no longer need to have global fail/success counters (and there
> is no need to save/get the diff before/after the test).

Thanks for the improvements, just a small comment below, otherwise LGTM.

> Introduce QCHECK macro (suggested by Andrii) and covert existing tests
> to it. QCHECK uses new test__fail() to record the failure.
> 
> Cc: Andrii Nakryiko <andriin@fb.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
[...]
> @@ -96,17 +93,25 @@ extern struct ipv6_packet pkt_v6;
>   #define _CHECK(condition, tag, duration, format...) ({			\
>   	int __ret = !!(condition);					\
>   	if (__ret) {							\
> -		error_cnt++;						\
> +		test__fail();						\
>   		printf("%s:FAIL:%s ", __func__, tag);			\
>   		printf(format);						\
>   	} else {							\
> -		pass_cnt++;						\
>   		printf("%s:PASS:%s %d nsec\n",				\
>   		       __func__, tag, duration);			\
>   	}								\
>   	__ret;								\
>   })
>   
> +#define QCHECK(condition) ({						\
> +	int __ret = !!(condition);					\
> +	if (__ret) {							\
> +		test__fail();						\
> +		printf("%s:FAIL:%d ", __func__, __LINE__);		\
> +	}								\
> +	__ret;								\
> +})

I know it's just a tiny nit but the name QCHECK() really doesn't tell me anything
if I don't see its definition. Even just a CHECK_FAIL() might be 'better' and
more aligned with the CHECK() and CHECK_ATTR() we have, at least I don't think
many would automatically derive 'quiet' from the Q prefix [0].

   [0] https://lore.kernel.org/bpf/CAEf4BzbUGiUZBWkTWe2=LfhkXYhQGndN9gR6VTZwfV3eytstUw@mail.gmail.com/

>   #define CHECK(condition, tag, format...) \
>   	_CHECK(condition, tag, duration, format)
>   #define CHECK_ATTR(condition, tag, format...) \
> 


^ permalink raw reply

* Re: [PATCH net-next v2 6/9] net: macsec: hardware offloading infrastructure
From: Igor Russkikh @ 2019-08-21 12:01 UTC (permalink / raw)
  To: Antoine Tenart, Sabrina Dubroca
  Cc: Andrew Lunn, davem@davemloft.net, f.fainelli@gmail.com,
	hkallweit1@gmail.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, thomas.petazzoni@bootlin.com,
	alexandre.belloni@bootlin.com, allan.nielsen@microchip.com,
	camelia.groza@nxp.com, Simon Edelhaus, Pavel Belous
In-Reply-To: <20190821100106.GA3006@kwain>


> Right. I did not see how *not* to strip the sectag in the h/w back then,
> I'll have another look because that would improve things a lot.
> 
> @all: do other MACsec offloading hardware allow not to stip the sectag?

I've just checked this, and see an action option in our HW classifier to keep
macsec header with optional error information added. But we've never
experimented configuring this honestly, I don't think we should rely in general
design that such a feature is widely available in HW.

Regards,
  Igor

^ permalink raw reply

* Re: [PATCH net-next 1/1] fec: add C45 MDIO read/write support
From: Marco Hartmann @ 2019-08-21 11:44 UTC (permalink / raw)
  To: Andy Duan, davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Christian Herber
In-Reply-To: <VI1PR0402MB3600576CFF2392A71B1DA99CFFAB0@VI1PR0402MB3600.eurprd04.prod.outlook.com>

On 20.08.19 04:08, Andy Duan wrote:
> From: Marco Hartmann Sent: Tuesday, August 20, 2019 1:11 AM
>> IEEE 802.3ae clause 45 defines a modified MDIO protocol that uses a two
>> staged access model in order to increase the address space.
>>
>> This patch adds support for C45 MDIO read and write accesses, which are
>> used whenever the MII_ADDR_C45 flag in the regnum argument is set.
>> In case it is not set, C22 accesses are used as before.
>>
>> Co-developed-by: Christian Herber <christian.herber@nxp.com>
>> Signed-off-by: Christian Herber <christian.herber@nxp.com>
>> Signed-off-by: Marco Hartmann <marco.hartmann@nxp.com>
>> ---
>>   drivers/net/ethernet/freescale/fec_main.c | 65
>> ++++++++++++++++++++++++++++---
>>   1 file changed, 59 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/freescale/fec_main.c
>> b/drivers/net/ethernet/freescale/fec_main.c
>> index c01d3ec3e9af..73f8f9a149a1 100644
>> --- a/drivers/net/ethernet/freescale/fec_main.c
>> +++ b/drivers/net/ethernet/freescale/fec_main.c
>> @@ -208,8 +208,11 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet
>> MAC address");
>>
>>   /* FEC MII MMFR bits definition */
>>   #define FEC_MMFR_ST		(1 << 30)
>> +#define FEC_MMFR_ST_C45		(0)
>>   #define FEC_MMFR_OP_READ	(2 << 28)
>> +#define FEC_MMFR_OP_READ_C45	(3 << 28)
>>   #define FEC_MMFR_OP_WRITE	(1 << 28)
>> +#define FEC_MMFR_OP_ADDR_WRITE	(0)
>>   #define FEC_MMFR_PA(v)		((v & 0x1f) << 23)
>>   #define FEC_MMFR_RA(v)		((v & 0x1f) << 18)
>>   #define FEC_MMFR_TA		(2 << 16)
>> @@ -1767,7 +1770,7 @@ static int fec_enet_mdio_read(struct mii_bus *bus,
>> int mii_id, int regnum)
>>   	struct fec_enet_private *fep = bus->priv;
>>   	struct device *dev = &fep->pdev->dev;
>>   	unsigned long time_left;
>> -	int ret = 0;
>> +	int ret = 0, frame_start, frame_addr, frame_op;
> 
> Add bool variable:
> 
> bool is_c45 = !!(regnum & MII_ADDR_C45);
>>
>>   	ret = pm_runtime_get_sync(dev);
>>   	if (ret < 0)
>> @@ -1775,9 +1778,36 @@ static int fec_enet_mdio_read(struct mii_bus
>> *bus, int mii_id, int regnum)
>>
>>   	reinit_completion(&fep->mdio_done);
>>
>> +	if (MII_ADDR_C45 & regnum) {
> if (is_c45)
> 
>> +		frame_start = FEC_MMFR_ST_C45;
>> +
>> +		/* write address */
>> +		frame_addr = (regnum >> 16);
>> +		writel(frame_start | FEC_MMFR_OP_ADDR_WRITE |
>> +		       FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(frame_addr) |
>> +		       FEC_MMFR_TA | (regnum & 0xFFFF),
>> +		       fep->hwp + FEC_MII_DATA);
>> +
>> +		/* wait for end of transfer */
>> +		time_left = wait_for_completion_timeout(&fep->mdio_done,
>> +				usecs_to_jiffies(FEC_MII_TIMEOUT));
>> +		if (time_left == 0) {
>> +			netdev_err(fep->netdev, "MDIO address write timeout\n");
>> +			ret  = -ETIMEDOUT;
> 
> Should be:
> goto out;
>> +		}
>> +
>> +		frame_op = FEC_MMFR_OP_READ_C45;
>> +
>> +	} else {
>> +		/* C22 read */
>> +		frame_op = FEC_MMFR_OP_READ;
>> +		frame_start = FEC_MMFR_ST;
>> +		frame_addr = regnum;
>> +	}
>> +
>>   	/* start a read op */
>> -	writel(FEC_MMFR_ST | FEC_MMFR_OP_READ |
>> -		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
>> +	writel(frame_start | frame_op |
>> +		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(frame_addr) |
>>   		FEC_MMFR_TA, fep->hwp + FEC_MII_DATA);
>>
>>   	/* wait for end of transfer */
>> @@ -1804,7 +1834,7 @@ static int fec_enet_mdio_write(struct mii_bus *bus,
>> int mii_id, int regnum,
>>   	struct fec_enet_private *fep = bus->priv;
>>   	struct device *dev = &fep->pdev->dev;
>>   	unsigned long time_left;
>> -	int ret;
>> +	int ret, frame_start, frame_addr;
>>
>>   	ret = pm_runtime_get_sync(dev);
>>   	if (ret < 0)
>> @@ -1814,9 +1844,32 @@ static int fec_enet_mdio_write(struct mii_bus
>> *bus, int mii_id, int regnum,
> 
> bool is_c45 = !!(regnum & MII_ADDR_C45);
>>
>>   	reinit_completion(&fep->mdio_done);
>>
>> +	if (MII_ADDR_C45 & regnum) {
> 
> if (!is_c45) {
>> +		frame_start = FEC_MMFR_ST_C45;
>> +
>> +		/* write address */
>> +		frame_addr = (regnum >> 16);
>> +		writel(frame_start | FEC_MMFR_OP_ADDR_WRITE |
>> +		       FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(frame_addr) |
>> +		       FEC_MMFR_TA | (regnum & 0xFFFF),
>> +		       fep->hwp + FEC_MII_DATA);
>> +
>> +		/* wait for end of transfer */
>> +		time_left = wait_for_completion_timeout(&fep->mdio_done,
>> +			usecs_to_jiffies(FEC_MII_TIMEOUT));
>> +		if (time_left == 0) {
>> +			netdev_err(fep->netdev, "MDIO address write timeout\n");
>> +			ret  = -ETIMEDOUT;
> Like mdio read, it should be:
> goto out;
>> +		}
>> +	} else {
>> +		/* C22 write */
>> +		frame_start = FEC_MMFR_ST;
>> +		frame_addr = regnum;
>> +	}
>> +
>>   	/* start a write op */
>> -	writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
>> -		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
>> +	writel(frame_start | FEC_MMFR_OP_WRITE |
>> +		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(frame_addr) |
>>   		FEC_MMFR_TA | FEC_MMFR_DATA(value),
>>   		fep->hwp + FEC_MII_DATA);
>>
>> --
>> 2.7.4
> 

Thank you for your feedback,
the fixes are included in v2 of the patch.

Regards,
Marco

^ permalink raw reply

* [PATCH v2 net-next] net: fec: add C45 MDIO read/write support
From: Marco Hartmann @ 2019-08-21 11:43 UTC (permalink / raw)
  To: Andy Duan, davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Christian Herber, Marco Hartmann

IEEE 802.3ae clause 45 defines a modified MDIO protocol that uses a two
staged access model in order to increase the address space.

This patch adds support for C45 MDIO read and write accesses, which are
used whenever the MII_ADDR_C45 flag in the regnum argument is set.
In case it is not set, C22 accesses are used as before.

Signed-off-by: Marco Hartmann <marco.hartmann@nxp.com>
---
Changes in v2:
- use bool variable is_c45
- add missing goto statements
---
---
 drivers/net/ethernet/freescale/fec_main.c | 70 ++++++++++++++++++++++++++++---
 1 file changed, 64 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index c01d3ec3e9af..cb3ce27fb27a 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -208,8 +208,11 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
 
 /* FEC MII MMFR bits definition */
 #define FEC_MMFR_ST		(1 << 30)
+#define FEC_MMFR_ST_C45		(0)
 #define FEC_MMFR_OP_READ	(2 << 28)
+#define FEC_MMFR_OP_READ_C45	(3 << 28)
 #define FEC_MMFR_OP_WRITE	(1 << 28)
+#define FEC_MMFR_OP_ADDR_WRITE	(0)
 #define FEC_MMFR_PA(v)		((v & 0x1f) << 23)
 #define FEC_MMFR_RA(v)		((v & 0x1f) << 18)
 #define FEC_MMFR_TA		(2 << 16)
@@ -1767,7 +1770,8 @@ static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
 	struct fec_enet_private *fep = bus->priv;
 	struct device *dev = &fep->pdev->dev;
 	unsigned long time_left;
-	int ret = 0;
+	int ret = 0, frame_start, frame_addr, frame_op;
+	bool is_c45 = !!(regnum & MII_ADDR_C45);
 
 	ret = pm_runtime_get_sync(dev);
 	if (ret < 0)
@@ -1775,9 +1779,37 @@ static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
 
 	reinit_completion(&fep->mdio_done);
 
+	if (is_c45) {
+		frame_start = FEC_MMFR_ST_C45;
+
+		/* write address */
+		frame_addr = (regnum >> 16);
+		writel(frame_start | FEC_MMFR_OP_ADDR_WRITE |
+		       FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(frame_addr) |
+		       FEC_MMFR_TA | (regnum & 0xFFFF),
+		       fep->hwp + FEC_MII_DATA);
+
+		/* wait for end of transfer */
+		time_left = wait_for_completion_timeout(&fep->mdio_done,
+				usecs_to_jiffies(FEC_MII_TIMEOUT));
+		if (time_left == 0) {
+			netdev_err(fep->netdev, "MDIO address write timeout\n");
+			ret = -ETIMEDOUT;
+			goto out;
+		}
+
+		frame_op = FEC_MMFR_OP_READ_C45;
+
+	} else {
+		/* C22 read */
+		frame_op = FEC_MMFR_OP_READ;
+		frame_start = FEC_MMFR_ST;
+		frame_addr = regnum;
+	}
+
 	/* start a read op */
-	writel(FEC_MMFR_ST | FEC_MMFR_OP_READ |
-		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
+	writel(frame_start | frame_op |
+		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(frame_addr) |
 		FEC_MMFR_TA, fep->hwp + FEC_MII_DATA);
 
 	/* wait for end of transfer */
@@ -1804,7 +1836,8 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
 	struct fec_enet_private *fep = bus->priv;
 	struct device *dev = &fep->pdev->dev;
 	unsigned long time_left;
-	int ret;
+	int ret, frame_start, frame_addr;
+	bool is_c45 = !!(regnum & MII_ADDR_C45);
 
 	ret = pm_runtime_get_sync(dev);
 	if (ret < 0)
@@ -1814,9 +1847,33 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
 
 	reinit_completion(&fep->mdio_done);
 
+	if (is_c45) {
+		frame_start = FEC_MMFR_ST_C45;
+
+		/* write address */
+		frame_addr = (regnum >> 16);
+		writel(frame_start | FEC_MMFR_OP_ADDR_WRITE |
+		       FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(frame_addr) |
+		       FEC_MMFR_TA | (regnum & 0xFFFF),
+		       fep->hwp + FEC_MII_DATA);
+
+		/* wait for end of transfer */
+		time_left = wait_for_completion_timeout(&fep->mdio_done,
+			usecs_to_jiffies(FEC_MII_TIMEOUT));
+		if (time_left == 0) {
+			netdev_err(fep->netdev, "MDIO address write timeout\n");
+			ret = -ETIMEDOUT;
+			goto out;
+		}
+	} else {
+		/* C22 write */
+		frame_start = FEC_MMFR_ST;
+		frame_addr = regnum;
+	}
+
 	/* start a write op */
-	writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
-		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
+	writel(frame_start | FEC_MMFR_OP_WRITE |
+		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(frame_addr) |
 		FEC_MMFR_TA | FEC_MMFR_DATA(value),
 		fep->hwp + FEC_MII_DATA);
 
@@ -1828,6 +1885,7 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
 		ret  = -ETIMEDOUT;
 	}
 
+out:
 	pm_runtime_mark_last_busy(dev);
 	pm_runtime_put_autosuspend(dev);
 
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH bpf-next 2/2] tools: bpftool: add "bpftool map freeze" subcommand
From: Daniel Borkmann @ 2019-08-21 11:40 UTC (permalink / raw)
  To: Quentin Monnet, Alexei Starovoitov; +Cc: bpf, netdev, oss-drivers
In-Reply-To: <20190821085219.30387-3-quentin.monnet@netronome.com>

On 8/21/19 10:52 AM, Quentin Monnet wrote:
> Add a new subcommand to freeze maps from user space.
> 
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
>   .../bpf/bpftool/Documentation/bpftool-map.rst |  9 +++++
>   tools/bpf/bpftool/bash-completion/bpftool     |  4 +--
>   tools/bpf/bpftool/map.c                       | 34 ++++++++++++++++++-
>   3 files changed, 44 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> index 61d1d270eb5e..1c0f7146aab0 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
> @@ -36,6 +36,7 @@ MAP COMMANDS
>   |	**bpftool** **map pop**        *MAP*
>   |	**bpftool** **map enqueue**    *MAP* **value** *VALUE*
>   |	**bpftool** **map dequeue**    *MAP*
> +|	**bpftool** **map freeze**     *MAP*
>   |	**bpftool** **map help**
>   |
>   |	*MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
> @@ -127,6 +128,14 @@ DESCRIPTION
>   	**bpftool map dequeue**  *MAP*
>   		  Dequeue and print **value** from the queue.
>   
> +	**bpftool map freeze**  *MAP*
> +		  Freeze the map as read-only from user space. Entries from a
> +		  frozen map can not longer be updated or deleted with the
> +		  **bpf\ ()** system call. This operation is not reversible,
> +		  and the map remains immutable from user space until its
> +		  destruction. However, read and write permissions for BPF
> +		  programs to the map remain unchanged.

That is not correct, programs that are loaded into the system /after/ the map
has been frozen cannot modify values either, thus read-only from both sides.

Thanks,
Daniel

^ permalink raw reply

* [PATCH] trivial: netns: fix typo in 'struct net.passive' description
From: Mike Rapoport @ 2019-08-21 11:29 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Mike Rapoport

Replace 'decided' with 'decide' so that comment would be

/* To decide when the network namespace should be freed. */

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 include/net/net_namespace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index cb668bc2692d..ab40d7afdc54 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -52,7 +52,7 @@ struct bpf_prog;
 #define NETDEV_HASHENTRIES (1 << NETDEV_HASHBITS)
 
 struct net {
-	refcount_t		passive;	/* To decided when the network
+	refcount_t		passive;	/* To decide when the network
 						 * namespace should be freed.
 						 */
 	refcount_t		count;		/* To decided when the network
-- 
2.7.4


^ permalink raw reply related

* [PATCH v2 net] Add genphy_c45_config_aneg() function to phy-c45.c
From: Marco Hartmann @ 2019-08-21 11:00 UTC (permalink / raw)
  To: andrew@lunn.ch, f.fainelli@gmail.com, hkallweit1@gmail.com,
	davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Marco Hartmann, Christian Herber

Commit 34786005eca3 ("net: phy: prevent PHYs w/o Clause 22 regs from calling
genphy_config_aneg") introduced a check that aborts phy_config_aneg()
if the phy is a C45 phy.
This causes phy_state_machine() to call phy_error() so that the phy
ends up in PHY_HALTED state.

Instead of returning -EOPNOTSUPP, call genphy_c45_config_aneg()
(analogous to the C22 case) so that the state machine can run
correctly.

genphy_c45_config_aneg() closely resembles mv3310_config_aneg()
in drivers/net/phy/marvell10g.c, excluding vendor specific
configurations for 1000BaseT.

Fixes: 22b56e827093 ("net: phy: replace genphy_10g_driver with genphy_c45_driver")

Signed-off-by: Marco Hartmann <marco.hartmann@nxp.com>
---
Changes in v2:
- corrected commit message
- reordered variables
---
---
 drivers/net/phy/phy-c45.c | 26 ++++++++++++++++++++++++++
 drivers/net/phy/phy.c     |  2 +-
 include/linux/phy.h       |  1 +
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
index 58bb25e4af10..7935593debb1 100644
--- a/drivers/net/phy/phy-c45.c
+++ b/drivers/net/phy/phy-c45.c
@@ -523,6 +523,32 @@ int genphy_c45_read_status(struct phy_device *phydev)
 }
 EXPORT_SYMBOL_GPL(genphy_c45_read_status);
 
+/**
+ * genphy_c45_config_aneg - restart auto-negotiation or forced setup
+ * @phydev: target phy_device struct
+ *
+ * Description: If auto-negotiation is enabled, we configure the
+ *   advertising, and then restart auto-negotiation.  If it is not
+ *   enabled, then we force a configuration.
+ */
+int genphy_c45_config_aneg(struct phy_device *phydev)
+{
+	bool changed = false;
+	int ret;
+
+	if (phydev->autoneg == AUTONEG_DISABLE)
+		return genphy_c45_pma_setup_forced(phydev);
+
+	ret = genphy_c45_an_config_aneg(phydev);
+	if (ret < 0)
+		return ret;
+	if (ret > 0)
+		changed = true;
+
+	return genphy_c45_check_and_restart_aneg(phydev, changed);
+}
+EXPORT_SYMBOL_GPL(genphy_c45_config_aneg);
+
 /* The gen10g_* functions are the old Clause 45 stub */
 
 int gen10g_config_aneg(struct phy_device *phydev)
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index f3adea9ef400..74c4e15ebe52 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -507,7 +507,7 @@ static int phy_config_aneg(struct phy_device *phydev)
 	 * allowed to call genphy_config_aneg()
 	 */
 	if (phydev->is_c45 && !(phydev->c45_ids.devices_in_package & BIT(0)))
-		return -EOPNOTSUPP;
+		return genphy_c45_config_aneg(phydev);
 
 	return genphy_config_aneg(phydev);
 }
diff --git a/include/linux/phy.h b/include/linux/phy.h
index d26779f1fb6b..a7ecbe0e55aa 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1117,6 +1117,7 @@ int genphy_c45_an_disable_aneg(struct phy_device *phydev);
 int genphy_c45_read_mdix(struct phy_device *phydev);
 int genphy_c45_pma_read_abilities(struct phy_device *phydev);
 int genphy_c45_read_status(struct phy_device *phydev);
+int genphy_c45_config_aneg(struct phy_device *phydev);
 
 /* The gen10g_* functions are the old Clause 45 stub */
 int gen10g_config_aneg(struct phy_device *phydev);
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH spi for-5.4 1/5] spi: Use an abbreviated pointer to ctlr->cur_msg in __spi_pump_messages
From: Mark Brown @ 2019-08-21 11:01 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Hubert Feurstein, mlichvar, Richard Cochran, Andrew Lunn,
	Florian Fainelli, linux-spi, netdev
In-Reply-To: <CA+h21hpgi7-dJ-QoRBEQorcRyEuhqhKixpFK5fGxOnZxTHi-4g@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 795 bytes --]

On Tue, Aug 20, 2019 at 10:36:43PM +0300, Vladimir Oltean wrote:
> On Tue, 20 Aug 2019 at 21:21, Mark Brown <broonie@kernel.org> wrote:

> > On Sun, Aug 18, 2019 at 09:25:56PM +0300, Vladimir Oltean wrote:

> > >       /* Extract head of queue */
> > > -     ctlr->cur_msg =
> > > -             list_first_entry(&ctlr->queue, struct spi_message, queue);
> > > +     mesg = list_first_entry(&ctlr->queue, struct spi_message, queue);
> > > +     ctlr->cur_msg = mesg;

> > Why mesg when the existing code uses msg as an abbreviation here?

> Does it matter? I took from spi_finalize_current_message which also uses mesg.

It's particularly visible when it's on the same line, flags up a
question about if things are the same.  Other things not being great
doesn't preclude making this one better.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: Regression fix for bpf in v5.3 (was Re: [RFC PATCH] bpf: handle 32-bit zext during constant blinding)
From: Jiong Wang @ 2019-08-21 10:55 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Naveen N. Rao, Alexei Starovoitov, Daniel Borkmann, Jiong Wang,
	bpf, linuxppc-dev, netdev, linux-kernel
In-Reply-To: <87d0gy6cj6.fsf@concordia.ellerman.id.au>


Michael Ellerman writes:

> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
>> Since BPF constant blinding is performed after the verifier pass, there
>> are certain ALU32 instructions inserted which don't have a corresponding
>> zext instruction inserted after. This is causing a kernel oops on
>> powerpc and can be reproduced by running 'test_cgroup_storage' with
>> bpf_jit_harden=2.
>>
>> Fix this by emitting BPF_ZEXT during constant blinding if
>> prog->aux->verifier_zext is set.
>>
>> Fixes: a4b1d3c1ddf6cb ("bpf: verifier: insert zero extension according to analysis result")
>> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
>> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
>> ---
>> This approach (the location where zext is being introduced below, in 
>> particular) works for powerpc, but I am not entirely sure if this is 
>> sufficient for other architectures as well. This is broken on v5.3-rc4.
>
> Any comment on this?

Have commented on https://marc.info/?l=linux-netdev&m=156637836024743&w=2

The fix looks correct to me on "BPF_LD | BPF_IMM | BPF_DW", but looks
unnecessary on two other places. It would be great if you or Naveen could
confirm it.

Thanks.

Regards,
Jiong

> This is a regression in v5.3, which results in a kernel crash, it would
> be nice to get it fixed before the release please?
>
> cheers
>
>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>> index 8191a7db2777..d84146e6fd9e 100644
>> --- a/kernel/bpf/core.c
>> +++ b/kernel/bpf/core.c
>> @@ -890,7 +890,8 @@ int bpf_jit_get_func_addr(const struct bpf_prog *prog,
>>  
>>  static int bpf_jit_blind_insn(const struct bpf_insn *from,
>>  			      const struct bpf_insn *aux,
>> -			      struct bpf_insn *to_buff)
>> +			      struct bpf_insn *to_buff,
>> +			      bool emit_zext)
>>  {
>>  	struct bpf_insn *to = to_buff;
>>  	u32 imm_rnd = get_random_int();
>> @@ -939,6 +940,8 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
>>  		*to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
>>  		*to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
>>  		*to++ = BPF_ALU32_REG(from->code, from->dst_reg, BPF_REG_AX);
>> +		if (emit_zext)
>> +			*to++ = BPF_ZEXT_REG(from->dst_reg);
>>  		break;
>>  
>>  	case BPF_ALU64 | BPF_ADD | BPF_K:
>> @@ -992,6 +995,10 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
>>  			off -= 2;
>>  		*to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
>>  		*to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
>> +		if (emit_zext) {
>> +			*to++ = BPF_ZEXT_REG(BPF_REG_AX);
>> +			off--;
>> +		}
>>  		*to++ = BPF_JMP32_REG(from->code, from->dst_reg, BPF_REG_AX,
>>  				      off);
>>  		break;
>> @@ -1005,6 +1012,8 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
>>  	case 0: /* Part 2 of BPF_LD | BPF_IMM | BPF_DW. */
>>  		*to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ aux[0].imm);
>>  		*to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
>> +		if (emit_zext)
>> +			*to++ = BPF_ZEXT_REG(BPF_REG_AX);
>>  		*to++ = BPF_ALU64_REG(BPF_OR,  aux[0].dst_reg, BPF_REG_AX);
>>  		break;
>>  
>> @@ -1088,7 +1097,8 @@ struct bpf_prog *bpf_jit_blind_constants(struct bpf_prog *prog)
>>  		    insn[1].code == 0)
>>  			memcpy(aux, insn, sizeof(aux));
>>  
>> -		rewritten = bpf_jit_blind_insn(insn, aux, insn_buff);
>> +		rewritten = bpf_jit_blind_insn(insn, aux, insn_buff,
>> +						clone->aux->verifier_zext);
>>  		if (!rewritten)
>>  			continue;
>>  
>> -- 
>> 2.22.0


^ permalink raw reply

* Re: [PATCH net-next 1/1] Add genphy_c45_config_aneg() function to phy-c45.c
From: Marco Hartmann @ 2019-08-21 10:55 UTC (permalink / raw)
  To: Heiner Kallweit, Christian Herber, andrew@lunn.ch,
	f.fainelli@gmail.com, davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <3b16b8b6-7a9f-0376-ba73-96d23262dd6e@gmail.com>

On 19.08.19 21:51, Heiner Kallweit wrote:
> On 19.08.2019 19:52, Marco Hartmann wrote:
>> and call it from phy_config_aneg().
>>
> Here something went wrong.
> 
>> commit 34786005eca3 ("net: phy: prevent PHYs w/o Clause 22 regs from
>> calling genphy_config_aneg") introduced a check that aborts
>> phy_config_aneg() if the phy is a C45 phy.
>> This causes phy_state_machine() to call phy_error() so that the phy
>> ends up in PHY_HALTED state.
>>
>> Instead of returning -EOPNOTSUPP, call genphy_c45_config_aneg()
>> (analogous to the C22 case) so that the state machine can run
>> correctly.
>>
>> genphy_c45_config_aneg() closely resembles mv3310_config_aneg()
>> in drivers/net/phy/marvell10g.c, excluding vendor specific
>> configurations for 1000BaseT.
>>
>> Fixes: 34786005eca3 ("net: phy: prevent PHYs w/o Clause 22 regs from
>> calling genphy_config_aneg")
>>
> This tag seems to be the wrong one. This change was done before
> genphy_c45_driver was added. Most likely tag should be:
> 22b56e827093 ("net: phy: replace genphy_10g_driver with genphy_c45_driver")
> And because it's a fix applying to previous kernel versions it should
> be annotated "net", not "net-next".
> 
You are correct, I fixed the tag and annotation.

>> Signed-off-by: Marco Hartmann <marco.hartmann@nxp.com>
>> ---
>>   drivers/net/phy/phy-c45.c | 26 ++++++++++++++++++++++++++
>>   drivers/net/phy/phy.c     |  2 +-
>>   include/linux/phy.h       |  1 +
>>   3 files changed, 28 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
>> index b9d4145781ca..fa9062fd9122 100644
>> --- a/drivers/net/phy/phy-c45.c
>> +++ b/drivers/net/phy/phy-c45.c
>> @@ -509,6 +509,32 @@ int genphy_c45_read_status(struct phy_device *phydev)
>>   }
>>   EXPORT_SYMBOL_GPL(genphy_c45_read_status);
>>   
>> +/**
>> + * genphy_c45_config_aneg - restart auto-negotiation or forced setup
>> + * @phydev: target phy_device struct
>> + *
>> + * Description: If auto-negotiation is enabled, we configure the
>> + *   advertising, and then restart auto-negotiation.  If it is not
>> + *   enabled, then we force a configuration.
>> + */
>> +int genphy_c45_config_aneg(struct phy_device *phydev)
>> +{
>> +	int ret;
>> +	bool changed = false;
> 
> Reverse xmas tree please.
> 
ok.

>> [...]
> 
> Overall looks good to me. For a single patch you don't have to provide
> a cover letter.
> 

Thank you for your feedback,
I will provide a v2 of the patch with the above fixes.

Regards,
Marco

^ permalink raw reply

* Re: [PATCH net-next v4] sched: Add dualpi2 qdisc
From: Tilmans, Olivier (Nokia - BE/Antwerp) @ 2019-08-21 10:53 UTC (permalink / raw)
  To: Eric Dumazet, Stephen Hemminger, Olga Albisser,
	De Schepper, Koen (Nokia - BE/Antwerp), Bob Briscoe, Henrik Steen,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20190821081150.31838-1-olivier.tilmans@nokia-bell-labs.com>

> +static s64 __scale_delta(s64 diff)
> +{
> +	do_div(diff, (1 << (ALPHA_BETA_GRANULARITY + 1)) - 1);
> +	return diff;
> +}
[...]
> +	delta = __scale_delta(((s64)qdelay - q->pi2.target) * q->pi2.alpha);
> +	delta += __scale_delta(((s64)qdelay - qdelay_old) * q->pi2.beta);

I just noticed that ensuring 64b divide compatibility across platforms 
using do_div() introduced this bug, as do_div() works with unsigned operands.

This will be fixed in a later v5 with the following patch:

---
 net/sched/sch_dualpi2.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/sched/sch_dualpi2.c b/net/sched/sch_dualpi2.c
index a6452aa82018..c6c851499d35 100644
--- a/net/sched/sch_dualpi2.c
+++ b/net/sched/sch_dualpi2.c
@@ -385,7 +385,7 @@ static struct sk_buff *dualpi2_qdisc_dequeue(struct Qdisc *sch)
 	return skb;
 }
 
-static s64 __scale_delta(s64 diff)
+static s64 __scale_delta(u64 diff)
 {
 	do_div(diff, (1 << (ALPHA_BETA_GRANULARITY + 1)) - 1);
 	return diff;
@@ -406,16 +406,18 @@ static u32 calculate_probability(struct Qdisc *sch)
 	/* Alpha and beta take at most 32b, i.e, the delay difference would
 	 * overflow for queueing delay differences > ~4.2sec.
 	 */
-	delta = __scale_delta(((s64)qdelay - q->pi2.target) * q->pi2.alpha);
-	delta += __scale_delta(((s64)qdelay - qdelay_old) * q->pi2.beta);
-	new_prob = delta + q->pi2.prob;
+	delta = ((s64)qdelay - q->pi2.target) * q->pi2.alpha;
+	delta += ((s64)qdelay - qdelay_old) * q->pi2.beta;
 	/* Prevent overflow */
 	if (delta > 0) {
+		new_prob = __scale_delta(delta) + q->pi2.prob;
 		if (new_prob < q->pi2.prob)
 			new_prob = MAX_PROB;
+	} else {
+		new_prob = q->pi2.prob - __scale_delta(delta * -1);
 		/* Prevent underflow */
-	} else if (new_prob > q->pi2.prob) {
-		new_prob = 0;
+		if (new_prob > q->pi2.prob)
+			new_prob = 0;
 	}
 	/* If we do not drop on overload, ensure we cap the L4S probability to
 	 * 100% to keep window fairness when overflowing.
-- 

Sorry for this.


Best,
Olivier

^ permalink raw reply related

* Re: [PATCH v2 2/4] bpf: fix 'struct pt_reg' typo in documentation
From: Quentin Monnet @ 2019-08-21 10:28 UTC (permalink / raw)
  To: Peter Wu, Alexei Starovoitov, Daniel Borkmann; +Cc: netdev, bpf
In-Reply-To: <20190820230900.23445-3-peter@lekensteyn.nl>

2019-08-21 00:08 UTC+0100 ~ Peter Wu <peter@lekensteyn.nl>
> There is no 'struct pt_reg'.
> 
> Signed-off-by: Peter Wu <peter@lekensteyn.nl>

Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

Thanks for the fix!
Quentin

^ permalink raw reply

* Re: [PATCH net-next v3 4/4] net: fec: add support for PTP system timestamping for MDIO devices
From: Vladimir Oltean @ 2019-08-21 10:28 UTC (permalink / raw)
  To: Hubert Feurstein
  Cc: netdev, lkml, Andrew Lunn, Richard Cochran, Miroslav Lichvar,
	Fugang Duan, David S. Miller
In-Reply-To: <20190820084833.6019-5-hubert.feurstein@vahle.at>

On Tue, 20 Aug 2019 at 11:49, Hubert Feurstein <h.feurstein@gmail.com> wrote:
>
> From: Hubert Feurstein <h.feurstein@gmail.com>
>
> In order to improve the synchronisation precision of phc2sys (from
> the linuxptp project) for devices like switches which are attached
> to the MDIO bus, it is necessary the get the system timestamps as
> close as possible to the access which causes the PTP timestamp
> register to be snapshotted in the switch hardware. Usually this is
> triggered by an MDIO write access, the snapshotted timestamp is then
> transferred by several MDIO reads.
>
> The ptp_read_system_*ts functions already check the ptp_sts pointer.
>
> Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
> ---
>  drivers/net/ethernet/freescale/fec_main.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index c01d3ec3e9af..dd1253683ac0 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -1815,10 +1815,12 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
>         reinit_completion(&fep->mdio_done);
>
>         /* start a write op */
> +       ptp_read_system_prets(bus->ptp_sts);
>         writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
>                 FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
>                 FEC_MMFR_TA | FEC_MMFR_DATA(value),
>                 fep->hwp + FEC_MII_DATA);
> +       ptp_read_system_postts(bus->ptp_sts);
>

How do you know the core will not service an interrupt here?
Why are you not disabling (postponing) local interrupts after this
critical section? (which you were in the RFC)
If the argument is that you didn't notice any issue with phc2sys -N 5,
that's not a good argument. "Unlikely for a condition to happen" does
not mean deterministic.
Here is an example of the system servicing an interrupt during the
transmission of a 12-byte SPI transfer (proof that they can occur
anywhere where they aren't disabled):
https://drive.google.com/file/d/1rUZpfkBKHJGwQN4orFUWks_5i70wn-mj/view?usp=sharing

>         /* wait for end of transfer */
>         time_left = wait_for_completion_timeout(&fep->mdio_done,
> @@ -1956,7 +1958,7 @@ static int fec_enet_mii_init(struct platform_device *pdev)
>         struct fec_enet_private *fep = netdev_priv(ndev);
>         struct device_node *node;
>         int err = -ENXIO;
> -       u32 mii_speed, holdtime;
> +       u32 mii_speed, mii_period, holdtime;
>
>         /*
>          * The i.MX28 dual fec interfaces are not equal.
> @@ -1993,6 +1995,7 @@ static int fec_enet_mii_init(struct platform_device *pdev)
>          * document.
>          */
>         mii_speed = DIV_ROUND_UP(clk_get_rate(fep->clk_ipg), 5000000);
> +       mii_period = div_u64((u64)mii_speed * 2 * NSEC_PER_SEC, clk_get_rate(fep->clk_ipg));
>         if (fep->quirks & FEC_QUIRK_ENET_MAC)
>                 mii_speed--;
>         if (mii_speed > 63) {
> @@ -2034,6 +2037,8 @@ static int fec_enet_mii_init(struct platform_device *pdev)
>                 pdev->name, fep->dev_id + 1);
>         fep->mii_bus->priv = fep;
>         fep->mii_bus->parent = &pdev->dev;
> +       fep->mii_bus->flags = MII_BUS_F_PTP_STS_SUPPORTED;
> +       fep->mii_bus->ptp_sts_offset = 32 * mii_period;
>
>         node = of_get_child_by_name(pdev->dev.of_node, "mdio");
>         err = of_mdiobus_register(fep->mii_bus, node);
> --
> 2.22.1
>

Regards,
-Vladimir

^ permalink raw reply

* Regression fix for bpf in v5.3 (was Re: [RFC PATCH] bpf: handle 32-bit zext during constant blinding)
From: Michael Ellerman @ 2019-08-21 10:25 UTC (permalink / raw)
  To: Naveen N. Rao, Alexei Starovoitov, Daniel Borkmann, Jiong Wang
  Cc: bpf, linuxppc-dev, netdev, linux-kernel
In-Reply-To: <20190813171018.28221-1-naveen.n.rao@linux.vnet.ibm.com>

"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
> Since BPF constant blinding is performed after the verifier pass, there
> are certain ALU32 instructions inserted which don't have a corresponding
> zext instruction inserted after. This is causing a kernel oops on
> powerpc and can be reproduced by running 'test_cgroup_storage' with
> bpf_jit_harden=2.
>
> Fix this by emitting BPF_ZEXT during constant blinding if
> prog->aux->verifier_zext is set.
>
> Fixes: a4b1d3c1ddf6cb ("bpf: verifier: insert zero extension according to analysis result")
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> ---
> This approach (the location where zext is being introduced below, in 
> particular) works for powerpc, but I am not entirely sure if this is 
> sufficient for other architectures as well. This is broken on v5.3-rc4.

Any comment on this?

This is a regression in v5.3, which results in a kernel crash, it would
be nice to get it fixed before the release please?

cheers

> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 8191a7db2777..d84146e6fd9e 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -890,7 +890,8 @@ int bpf_jit_get_func_addr(const struct bpf_prog *prog,
>  
>  static int bpf_jit_blind_insn(const struct bpf_insn *from,
>  			      const struct bpf_insn *aux,
> -			      struct bpf_insn *to_buff)
> +			      struct bpf_insn *to_buff,
> +			      bool emit_zext)
>  {
>  	struct bpf_insn *to = to_buff;
>  	u32 imm_rnd = get_random_int();
> @@ -939,6 +940,8 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
>  		*to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
>  		*to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
>  		*to++ = BPF_ALU32_REG(from->code, from->dst_reg, BPF_REG_AX);
> +		if (emit_zext)
> +			*to++ = BPF_ZEXT_REG(from->dst_reg);
>  		break;
>  
>  	case BPF_ALU64 | BPF_ADD | BPF_K:
> @@ -992,6 +995,10 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
>  			off -= 2;
>  		*to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
>  		*to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
> +		if (emit_zext) {
> +			*to++ = BPF_ZEXT_REG(BPF_REG_AX);
> +			off--;
> +		}
>  		*to++ = BPF_JMP32_REG(from->code, from->dst_reg, BPF_REG_AX,
>  				      off);
>  		break;
> @@ -1005,6 +1012,8 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
>  	case 0: /* Part 2 of BPF_LD | BPF_IMM | BPF_DW. */
>  		*to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ aux[0].imm);
>  		*to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
> +		if (emit_zext)
> +			*to++ = BPF_ZEXT_REG(BPF_REG_AX);
>  		*to++ = BPF_ALU64_REG(BPF_OR,  aux[0].dst_reg, BPF_REG_AX);
>  		break;
>  
> @@ -1088,7 +1097,8 @@ struct bpf_prog *bpf_jit_blind_constants(struct bpf_prog *prog)
>  		    insn[1].code == 0)
>  			memcpy(aux, insn, sizeof(aux));
>  
> -		rewritten = bpf_jit_blind_insn(insn, aux, insn_buff);
> +		rewritten = bpf_jit_blind_insn(insn, aux, insn_buff,
> +						clone->aux->verifier_zext);
>  		if (!rewritten)
>  			continue;
>  
> -- 
> 2.22.0

^ permalink raw reply

* Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
From: Miroslav Lichvar @ 2019-08-21 10:19 UTC (permalink / raw)
  To: Hubert Feurstein
  Cc: Andrew Lunn, netdev, lkml, Richard Cochran, Florian Fainelli,
	Heiner Kallweit, Vladimir Oltean, David S. Miller
In-Reply-To: <CAFfN3gXtkv=YjoQixN+MdZ9vLZRPBMwg1mefuBTHFf1_QENPsg@mail.gmail.com>

On Wed, Aug 21, 2019 at 11:53:12AM +0200, Hubert Feurstein wrote:
> Am Mi., 21. Aug. 2019 um 10:07 Uhr schrieb Miroslav Lichvar
> > Because those reports/statistics are important in calculation of
> > maximum error. If someone had a requirement for a clock to be accurate
> > to 1.5 microseconds and the ioctl returned a delay indicating a
> > sufficient accuracy when in reality it could be worse, that would be a
> > problem.
> >
> Ok, I understand your point. But including the MDIO completion into
> delay calculation
> will indicate a much wore precision as it actually is.

That's ok. It's the same with PCIe devices. It takes about 500 ns to
read a PCI register, so we know in the worst case the offset is
accurate to 250 ns. It's probably much better, maybe to 50 ns, but we
don't really know. We don't know how much asymmetry there is in the
PCIe delay (it certainly is not zero), or how much time the NIC
actually needs to respond to the command and when exactly it reads the
clock.

> When the MDIO
> driver implements
> the PTP system timestamping as follows ...
> 
>   ptp_read_system_prets(bus->ptp_sts);
>   writel(value, mdio-reg)
>   ptp_read_system_postts(bus->ptp_sts);
> 
> ... then we catch already the error caused by interrupts which hit the
> pre/post_ts section.
> Now we only have the additional error of one MDIO clock cycle
> (~400ns). Because I expect
> the MDIO controller to shift out the MDIO frame on the next MDIO clock
> cycle.

Is this always the case?

> So if I subtract
> one MDIO clock cycle from pre_ts and add one MDIO clock cycle to
> post_ts the error indication
> would be sufficiently corrected IMHO.

If I understand it correctly, this ignores the time needed for the
frame to be received, decoded and the clock to be read.

-- 
Miroslav Lichvar

^ permalink raw reply

* Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
From: Vladimir Oltean @ 2019-08-21 10:19 UTC (permalink / raw)
  To: Hubert Feurstein
  Cc: Miroslav Lichvar, Andrew Lunn, netdev, lkml, Richard Cochran,
	Florian Fainelli, Heiner Kallweit, David S. Miller
In-Reply-To: <CAFfN3gXtkv=YjoQixN+MdZ9vLZRPBMwg1mefuBTHFf1_QENPsg@mail.gmail.com>

On Wed, 21 Aug 2019 at 12:53, Hubert Feurstein <h.feurstein@gmail.com> wrote:
>
> Am Mi., 21. Aug. 2019 um 10:07 Uhr schrieb Miroslav Lichvar
> <mlichvar@redhat.com>:
> > > Currently I do not see the benefit from this. The original intention was to
> > > compensate for the remaining offset as good as possible.
> >
> > That's ok, but IMHO the change should not break the assumptions of
> > existing application and users.
> >
> > > The current code
> > > of phc2sys uses the delay only for the filtering of the measurement record
> > > with the shortest delay and for reporting and statistics. Why not simple shift
> > > the timestamps with the offset to the point where we expect the PHC timestamp
> > > to be captured, and we have a very good result compared to where we came
> > > from.
> >
> > Because those reports/statistics are important in calculation of
> > maximum error. If someone had a requirement for a clock to be accurate
> > to 1.5 microseconds and the ioctl returned a delay indicating a
> > sufficient accuracy when in reality it could be worse, that would be a
> > problem.
> >
> Ok, I understand your point. But including the MDIO completion into
> delay calculation
> will indicate a much wore precision as it actually is. When the MDIO
> driver implements
> the PTP system timestamping as follows ...
>
>   ptp_read_system_prets(bus->ptp_sts);
>   writel(value, mdio-reg)
>   ptp_read_system_postts(bus->ptp_sts);
>
> ... then we catch already the error caused by interrupts which hit the
> pre/post_ts section.
> Now we only have the additional error of one MDIO clock cycle
> (~400ns). Because I expect
> the MDIO controller to shift out the MDIO frame on the next MDIO clock
> cycle. So if I subtract

How do you know?
The MDIO controller is a memory-mapped peripheral so there will be a
transaction on the core <-> peripheral interconnect in the SoC.
Depending on the system load, the transaction might not be
instantaneous as you think. Additionally there will be clock domain
crossings in the MDIO controller when transferring the command from
the platform clock to the peripheral clock, which might also add some
jitter.
MDIO frames may also begin with 32 bits of preamble, with some
controllers being able to suppress it. Have you checked/accounted for
this?
The only reliable moment when you know the MDIO command has completed
is when the controller says it has. If there is additional jitter in
waiting for the completion event caused by the GIC and the scheduling
of the ISR, then just switch the driver to poll mode, like everybody
else.

> one MDIO clock cycle from pre_ts and add one MDIO clock cycle to
> post_ts the error indication
> would be sufficiently corrected IMHO. And then we can shift both
> timestamps in the switch driver
> (in the gettimex handler) to compensate the switch depending offset.
> What do you think?
>
> Hubert

Regards,
-Vladimir

^ permalink raw reply

* Re: [PATCH RFC ipsec-next 0/7] ipsec: add TCP encapsulation support (RFC 8229)
From: Sabrina Dubroca @ 2019-08-21 10:17 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: netdev, Herbert Xu
In-Reply-To: <20190821065911.GO2879@gauss3.secunet.de>

2019-08-21, 08:59:11 +0200, Steffen Klassert wrote:
> On Fri, Aug 16, 2019 at 04:18:14PM +0200, Sabrina Dubroca wrote:
> > Hi Steffen,
> > 
> > 2019-06-25, 12:11:33 +0200, Sabrina Dubroca wrote:
> > > This patchset introduces support for TCP encapsulation of IKE and ESP
> > > messages, as defined by RFC 8229 [0]. It is an evolution of what
> > > Herbert Xu proposed in January 2018 [1] that addresses the main
> > > criticism against it, by not interfering with the TCP implementation
> > > at all. The networking stack now has infrastructure for this: TCP ULPs
> > > and Stream Parsers.
> > 
> > Have you had a chance to look at this?  I was going to rebase and
> > resend, but the patches still apply to ipsec-next and net-next (patch
> > 2 is already in net-next as commit bd95e678e0f6).
> 
> I had a look and I have no general objection against this. If you
> think the patchset is ready for inclusion, just remove the RFC and
> resend it. I'll have a closer on it look then.

Ok, thanks, I'll repost.

-- 
Sabrina

^ permalink raw reply

* Re: [PATCH net] ixgbe: fix double clean of tx descriptors with xdp
From: Eelco Chaudron @ 2019-08-21 10:09 UTC (permalink / raw)
  To: Ilya Maximets
  Cc: netdev, linux-kernel, bpf, David S. Miller, Björn Töpel,
	Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Jeff Kirsher, intel-wired-lan, William Tu
In-Reply-To: <20190820151611.10727-1-i.maximets@samsung.com>



On 20 Aug 2019, at 17:16, Ilya Maximets wrote:

> Tx code doesn't clear the descriptor status after cleaning.
> So, if the budget is larger than number of used elems in a ring, some
> descriptors will be accounted twice and xsk_umem_complete_tx will move
> prod_tail far beyond the prod_head breaking the comletion queue ring.
>
> Fix that by limiting the number of descriptors to clean by the number
> of used descriptors in the tx ring.
>
> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---
>
> Not tested yet because of lack of available hardware.
> So, testing is very welcome.
>
Hi Ilya, this patch fixes the issue I reported earlier on the Open 
vSwitch mailing list regarding complete queue overrun.

Tested-by: Eelco Chaudron <echaudro@redhat.com>

<SNIP>

^ permalink raw reply

* pull-request: mac80211-next 2019-08-21
From: Johannes Berg @ 2019-08-21 10:04 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-wireless

Hi Dave,

For -next, we have more changes, but as described in the tag
they really just fall into a few groups of changes :-)

Please pull and let me know if there's any problem.

Thanks,
johannes



The following changes since commit 8c40f3b212a373be843a29db608b462af5c3ed5d:

  Merge tag 'mlx5-updates-2019-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux (2019-08-20 22:59:45 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git tags/mac80211-next-for-davem-2019-08-21

for you to fetch changes up to 48cb39522a9d4d4680865e40a88f975a1cee6abc:

  mac80211: minstrel_ht: improve rate probing for devices with static fallback (2019-08-21 11:10:13 +0200)

----------------------------------------------------------------
Here are a few groups of changes:
 * EDMG channel support (60 GHz, just a single patch)
 * initial 6/7 GHz band support (Arend)
 * association timestamp recording (Ben)
 * rate control improvements for better performance with
   the mt76 driver (Felix)
 * various fixes for previous HE support changes (John)

----------------------------------------------------------------
Alexei Avshalom Lazar (1):
      nl80211: Add support for EDMG channels

Arend van Spriel (8):
      nl80211: add 6GHz band definition to enum nl80211_band
      cfg80211: add 6GHz UNII band definitions
      cfg80211: util: add 6GHz channel to freq conversion and vice versa
      cfg80211: extend ieee80211_operating_class_to_band() for 6GHz
      cfg80211: add 6GHz in code handling array with NUM_NL80211_BANDS entries
      cfg80211: use same IR permissive rules for 6GHz band
      cfg80211: ibss: use 11a mandatory rates for 6GHz band operation
      cfg80211: apply same mandatory rate flags for 5GHz and 6GHz

Ben Greear (2):
      cfg80211: Support assoc-at timer in sta-info
      mac80211: add assoc-at support

Felix Fietkau (4):
      mac80211: minstrel_ht: fix per-group max throughput rate initialization
      mac80211: minstrel_ht: reduce unnecessary rate probing attempts
      mac80211: minstrel_ht: fix default max throughput rate indexes
      mac80211: minstrel_ht: improve rate probing for devices with static fallback

John Crispin (5):
      mac80211: fix TX legacy rate reporting when tx_status_ext is used
      mac80211: fix bad guard when reporting legacy rates
      mac80211: 80Mhz was not reported properly when using tx_status_ext
      mac80211: add missing length field increment when generating Radiotap header
      mac80211: fix possible NULL pointerderef in obss pd code

 drivers/net/wireless/ath/wil6210/cfg80211.c |   2 +-
 include/net/cfg80211.h                      |  88 ++++++++-
 include/uapi/linux/nl80211.h                |  29 +++
 net/mac80211/he.c                           |   3 +-
 net/mac80211/mlme.c                         |   2 +-
 net/mac80211/rc80211_minstrel.h             |   1 +
 net/mac80211/rc80211_minstrel_ht.c          | 277 ++++++++++++++++++++++++----
 net/mac80211/rc80211_minstrel_ht.h          |  12 ++
 net/mac80211/sta_info.c                     |   3 +
 net/mac80211/sta_info.h                     |   2 +
 net/mac80211/status.c                       |  31 ++--
 net/mac80211/tx.c                           |   1 +
 net/wireless/chan.c                         | 162 +++++++++++++++-
 net/wireless/ibss.c                         |  16 +-
 net/wireless/nl80211.c                      |  39 ++++
 net/wireless/reg.c                          |  21 ++-
 net/wireless/trace.h                        |   3 +-
 net/wireless/util.c                         |  56 +++++-
 18 files changed, 684 insertions(+), 64 deletions(-)


^ permalink raw reply

* Re: [PATCH net-next v2 6/9] net: macsec: hardware offloading infrastructure
From: Antoine Tenart @ 2019-08-21 10:01 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Antoine Tenart, Igor Russkikh, Andrew Lunn, davem@davemloft.net,
	f.fainelli@gmail.com, hkallweit1@gmail.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	thomas.petazzoni@bootlin.com, alexandre.belloni@bootlin.com,
	allan.nielsen@microchip.com, camelia.groza@nxp.com,
	Simon Edelhaus, Pavel Belous
In-Reply-To: <20190820144119.GA28714@bistromath.localdomain>

Hi Sabrina,

On Tue, Aug 20, 2019 at 04:41:19PM +0200, Sabrina Dubroca wrote:
> 2019-08-20, 12:01:40 +0200, Antoine Tenart wrote:
> > So it seems the ability to enable or disable the offloading on a given
> > interface is the main missing feature. I'll add that, however I'll
> > probably (at least at first):
> > 
> > - Have the interface to be fully offloaded or fully handled in s/w (with
> >   errors being thrown if a given configuration isn't supported). Having
> >   both at the same time on a given interface would be tricky because of
> >   the MACsec validation parameter.
> > 
> > - Won't allow to enable/disable the offloading of there are rules in
> >   place, as we're not sure the same rules would be accepted by the other
> >   implementation.
> 
> That's probably quite problematic actually, because to do that you
> need to be able to resync the state between software and hardware,
> particularly packet numbers. So, yeah, we're better off having it
> completely blocked until we have a working implementation (if that
> ever happens).
> 
> However, that means in case the user wants to set up something that's
> not offloadable, they'll have to:
>  - configure the offloaded version until EOPNOTSUPP
>  - tear everything down
>  - restart from scratch without offloading
> 
> That's inconvenient.

That's right, the user might have to replay the whole configuration if
on rule failed to match the h/w requirements. It's inconvenient, but I
think it's better to be safe until we have (if that happens) a working
implementation of synchronizing the rules' state.

> Talking about packet numbers, can you describe how PN exhaustion is
> handled?  I couldn't find much about packet numbers at all in the
> driver patches (I hope the hw doesn't wrap around from 2^32-1 to 0 on
> the same SA).  At some point userspace needs to know that we're
> getting close to 2^32 and that it's time to re-key.  Since the whole
> TX path of the software implementation is bypassed, it looks like the
> PN (as far as drivers/net/macsec.c is concerned) never increases, so
> userspace can't know when to negotiate a new SA.

That's a very good point. It actually was on my todo list for the next
version (I wanted to discuss the other points first). We would also
need to sync the stats at some point.

> > I'm not sure if we should allow to mix the implementations on a given
> > physical interface (by having two MACsec interfaces attached) as the
> > validation would be impossible to do (we would have no idea if a
> > packet was correctly handled by the offloading part or just not being
> > a MACsec packet in the first place, in Rx).
> 
> That's something that really bothers me with this proposal. Quoting
> from the commit message:
> 
> > the packets seen by the networking stack on both the physical and
> > MACsec virtual interface are exactly the same

That bothers me as well.

> If the HW/driver is expected to strip the sectag, I don't see how we
> could ever have multiple secy's at the same time and demultiplex
> properly between them. That's part of the reason why I chose to have
> virtual interfaces: without them, picking the right secy on TX gets
> weird.
> 
> AFAICT, it means that any filters (tc, tcpdump) need to be different
> between offloaded and non-offloaded cases.
> 
> And it's going to be confusing to the administrator when they look at
> tcpdumps expecting to see MACsec frames.

Right. I did not see how *not* to strip the sectag in the h/w back then,
I'll have another look because that would improve things a lot.

@all: do other MACsec offloading hardware allow not to stip the sectag?

> I don't see how this implementation handles non-macsec traffic (on TX,
> that would be packets sent directly through the real interface, for
> example by wpa_supplicant - on RX, incoming MKA traffic for
> wpa_supplicant). Unless I missed something, incoming MKA traffic will
> end up on the macsec interface as well as the lower interface (not
> entirely critical, as long as wpa_supplicant can grab it on the lower
> device, but not consistent with the software implementation).

That's right, as we have no way to tell if an Rx packet was MACsec or
non-MACsec traffic, both will end up on both interfaces. Some h/w may be
able to insert a custom header (or may allow not to strip the sectag),
but I did not find anything related to this in mine (I'll double check).

> How does the driver distinguish traffic that should pass through
> unmodified from traffic that the HW needs to encapsulate and encrypt?

At least in PHYs, packets go in a classification unit (that can match on
multiple parts of the packet, given the hardware capabilities, eg. the
MAC addresses). The result of the match is an action, which can be
"bypass the MACsec block" or "go through it (which links the packet to a
given configuration)". This is done in Rx and in Tx, and this is how the
h/w block will know what to encapsulate and encrypt.

Thanks,
Antoine

-- 
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* pull-request: mac80211 2019-08-21
From: Johannes Berg @ 2019-08-21 10:00 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-wireless

Hi Dave,

I have here for you a few fixes; three, to be specific. Nothing that
warrants real discussion or urgency though.

Please pull and let me know if there's any problem.

Thanks,
johannes



The following changes since commit a1c4cd67840ef80f6ca5f73326fa9a6719303a95:

  net: fix __ip_mc_inc_group usage (2019-08-20 12:48:06 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git tags/mac80211-for-davem-2019-08-21

for you to fetch changes up to 0d31d4dbf38412f5b8b11b4511d07b840eebe8cb:

  Revert "cfg80211: fix processing world regdomain when non modular" (2019-08-21 10:43:03 +0200)

----------------------------------------------------------------
Just three fixes:
 * extended key ID key installation
 * regulatory processing
 * possible memory leak in an error path

----------------------------------------------------------------
Alexander Wetzel (1):
      cfg80211: Fix Extended Key ID key install checks

Hodaszi, Robert (1):
      Revert "cfg80211: fix processing world regdomain when non modular"

Johannes Berg (1):
      mac80211: fix possible sta leak

 net/mac80211/cfg.c  |  9 +++++----
 net/wireless/reg.c  |  2 +-
 net/wireless/util.c | 23 ++++++++++++++---------
 3 files changed, 20 insertions(+), 14 deletions(-)


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox