Netdev List
 help / color / mirror / Atom feed
* [PATCH 5/8] net: ixp4xx: Spelling s/XSacle/XScale/
From: Geert Uytterhoeven @ 2019-07-31 13:22 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, linux-kernel, Geert Uytterhoeven
In-Reply-To: <20190731132216.17194-1-geert+renesas@glider.be>

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/net/ethernet/xscale/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/xscale/Kconfig b/drivers/net/ethernet/xscale/Kconfig
index 2f354ba029a61491..cd0a8f46e7c6c143 100644
--- a/drivers/net/ethernet/xscale/Kconfig
+++ b/drivers/net/ethernet/xscale/Kconfig
@@ -13,7 +13,7 @@ config NET_VENDOR_XSCALE
 
 	  Note that the answer to this question does not directly affect the
 	  kernel: saying N will just cause the configurator to skip all
-	  the questions about XSacle IXP devices. If you say Y, you will be
+	  the questions about XScale IXP devices. If you say Y, you will be
 	  asked for your specific card in the following questions.
 
 if NET_VENDOR_XSCALE
-- 
2.17.1


^ permalink raw reply related

* [PATCH 2/8] net: amd: Spelling s/case/cause/
From: Geert Uytterhoeven @ 2019-07-31 13:22 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, linux-kernel, Geert Uytterhoeven
In-Reply-To: <20190731132216.17194-1-geert+renesas@glider.be>

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/net/ethernet/amd/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amd/Kconfig b/drivers/net/ethernet/amd/Kconfig
index de4950d2022e0a6d..9f965cdfff5c99db 100644
--- a/drivers/net/ethernet/amd/Kconfig
+++ b/drivers/net/ethernet/amd/Kconfig
@@ -14,7 +14,7 @@ config NET_VENDOR_AMD
 	  say Y.
 
 	  Note that the answer to this question does not directly affect
-	  the kernel: saying N will just case the configurator to skip all
+	  the kernel: saying N will just cause the configurator to skip all
 	  the questions regarding AMD chipsets. If you say Y, you will be asked
 	  for your specific chipset/driver in the following questions.
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH 7/8] net: packetengines: Fix manufacturer spelling and capitalization
From: Geert Uytterhoeven @ 2019-07-31 13:22 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, linux-kernel, Geert Uytterhoeven
In-Reply-To: <20190731132216.17194-1-geert+renesas@glider.be>

Use "Packet Engines" consistently.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/net/ethernet/packetengines/Kconfig  | 6 +++---
 drivers/net/ethernet/packetengines/Makefile | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/packetengines/Kconfig b/drivers/net/ethernet/packetengines/Kconfig
index 8161e308e64b0f16..ead3750b4489d1bf 100644
--- a/drivers/net/ethernet/packetengines/Kconfig
+++ b/drivers/net/ethernet/packetengines/Kconfig
@@ -1,10 +1,10 @@
 # SPDX-License-Identifier: GPL-2.0-only
 #
-# Packet engine device configuration
+# Packet Engines device configuration
 #
 
 config NET_VENDOR_PACKET_ENGINES
-	bool "Packet Engine devices"
+	bool "Packet Engines devices"
 	default y
 	depends on PCI
 	---help---
@@ -12,7 +12,7 @@ config NET_VENDOR_PACKET_ENGINES
 
 	  Note that the answer to this question doesn't directly affect the
 	  kernel: saying N will just cause the configurator to skip all
-	  the questions about packet engine devices. If you say Y, you will
+	  the questions about Packet Engines devices. If you say Y, you will
 	  be asked for your specific card in the following questions.
 
 if NET_VENDOR_PACKET_ENGINES
diff --git a/drivers/net/ethernet/packetengines/Makefile b/drivers/net/ethernet/packetengines/Makefile
index 1553c9cfc254d6f8..cf054b796d111231 100644
--- a/drivers/net/ethernet/packetengines/Makefile
+++ b/drivers/net/ethernet/packetengines/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 #
-# Makefile for the Packet Engine network device drivers.
+# Makefile for the Packet Engines network device drivers.
 #
 
 obj-$(CONFIG_HAMACHI) += hamachi.o
-- 
2.17.1


^ permalink raw reply related

* [PATCH 4/8] net: broadcom: Fix manufacturer name in Kconfig help text
From: Geert Uytterhoeven @ 2019-07-31 13:22 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, linux-kernel, Geert Uytterhoeven
In-Reply-To: <20190731132216.17194-1-geert+renesas@glider.be>

The help text refers to AMD instead of Broadcom, presumably because it
was copied from the former.

Fixes: adfc5217e9db68d3 ("broadcom: Move the Broadcom drivers")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/net/ethernet/broadcom/Kconfig | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index e9017caf024dcf33..e24f5d2b6afe3547 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -14,9 +14,9 @@ config NET_VENDOR_BROADCOM
 	  say Y.
 
 	  Note that the answer to this question does not directly affect
-	  the kernel: saying N will just case the configurator to skip all
-	  the questions regarding AMD chipsets. If you say Y, you will be asked
-	  for your specific chipset/driver in the following questions.
+	  the kernel: saying N will just cause the configurator to skip all
+	  the questions regarding Broadcom chipsets. If you say Y, you will
+	  be asked for your specific chipset/driver in the following questions.
 
 if NET_VENDOR_BROADCOM
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH 8/8] net: samsung: Spelling s/case/cause/
From: Geert Uytterhoeven @ 2019-07-31 13:22 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, linux-kernel, Geert Uytterhoeven
In-Reply-To: <20190731132216.17194-1-geert+renesas@glider.be>

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/net/ethernet/samsung/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/samsung/Kconfig b/drivers/net/ethernet/samsung/Kconfig
index 027938017579130f..e92a178a76df0849 100644
--- a/drivers/net/ethernet/samsung/Kconfig
+++ b/drivers/net/ethernet/samsung/Kconfig
@@ -11,7 +11,7 @@ config NET_VENDOR_SAMSUNG
 	  say Y.
 
 	  Note that the answer to this question does not directly affect
-	  the kernel: saying N will just case the configurator to skip all
+	  the kernel: saying N will just cause the configurator to skip all
 	  the questions about Samsung chipsets. If you say Y, you will be asked
 	  for your specific chipset/driver in the following questions.
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH 6/8] net: nixge: Spelling s/Instrument/Instruments/
From: Geert Uytterhoeven @ 2019-07-31 13:22 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, linux-kernel, Geert Uytterhoeven
In-Reply-To: <20190731132216.17194-1-geert+renesas@glider.be>

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/net/ethernet/ni/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ni/Kconfig b/drivers/net/ethernet/ni/Kconfig
index 70b1a03c0953e696..01229190132d484f 100644
--- a/drivers/net/ethernet/ni/Kconfig
+++ b/drivers/net/ethernet/ni/Kconfig
@@ -11,7 +11,7 @@ config NET_VENDOR_NI
 
 	  Note that the answer to this question doesn't directly affect the
 	  kernel: saying N will just cause the configurator to skip all
-	  the questions about National Instrument devices.
+	  the questions about National Instruments devices.
 	  If you say Y, you will be asked for your specific device in the
 	  following questions.
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH 0/8] net: Manufacturer names and spelling fixes
From: Geert Uytterhoeven @ 2019-07-31 13:22 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, linux-kernel, Geert Uytterhoeven

	Hi David,

This is a set of fixes for (some blatantly) wrong manufacturer names and
various spelling issues, mostly in Kconfig help texts.

Thanks!

Geert Uytterhoeven (8):
  net: 8390: Fix manufacturer name in Kconfig help text
  net: amd: Spelling s/case/cause/
  net: apple: Fix manufacturer name in Kconfig help text
  net: broadcom: Fix manufacturer name in Kconfig help text
  net: ixp4xx: Spelling s/XSacle/XScale/
  net: nixge: Spelling s/Instrument/Instruments/
  net: packetengines: Fix manufacturer spelling and capitalization
  net: samsung: Spelling s/case/cause/

 drivers/net/ethernet/8390/Kconfig           | 4 ++--
 drivers/net/ethernet/amd/Kconfig            | 2 +-
 drivers/net/ethernet/apple/Kconfig          | 4 ++--
 drivers/net/ethernet/broadcom/Kconfig       | 6 +++---
 drivers/net/ethernet/ni/Kconfig             | 2 +-
 drivers/net/ethernet/packetengines/Kconfig  | 6 +++---
 drivers/net/ethernet/packetengines/Makefile | 2 +-
 drivers/net/ethernet/samsung/Kconfig        | 2 +-
 drivers/net/ethernet/xscale/Kconfig         | 2 +-
 9 files changed, 15 insertions(+), 15 deletions(-)

-- 
2.17.1

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH bpf-next 1/9] selftests/bpf: prevent headers to be compiled as C code
From: Ilya Leoshkevich @ 2019-07-31 13:21 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Stanislav Fomichev, Andrii Nakryiko, bpf, Networking,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team
In-Reply-To: <CAEf4Bzbj6RWUo8Q7wgMnbL=T7V2yK2=gbdO3sSfxJ71Gp6jeYA@mail.gmail.com>

> Am 27.07.2019 um 20:53 schrieb Andrii Nakryiko <andrii.nakryiko@gmail.com>:
> 
> On Fri, Jul 26, 2019 at 3:01 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>> 
>> On 07/26, Andrii Nakryiko wrote:
>>> On Fri, Jul 26, 2019 at 2:21 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>>>> 
>>>> On 07/26, Andrii Nakryiko wrote:
>>>>> Apprently listing header as a normal dependency for a binary output
>>>>> makes it go through compilation as if it was C code. This currently
>>>>> works without a problem, but in subsequent commits causes problems for
>>>>> differently generated test.h for test_progs. Marking those headers as
>>>>> order-only dependency solves the issue.
>>>> Are you sure it will not result in a situation where
>>>> test_progs/test_maps is not regenerated if tests.h is updated.
>>>> 
>>>> If I read the following doc correctly, order deps make sense for
>>>> directories only:
>>>> https://www.gnu.org/software/make/manual/html_node/Prerequisite-Types.html
>>>> 
>>>> Can you maybe double check it with:
>>>> * make
>>>> * add new prog_tests/test_something.c
>>>> * make
>>>> to see if the binary is regenerated with test_something.c?
>>> 
>>> Yeah, tested that, it triggers test_progs rebuild.
>>> 
>>> Ordering is still preserved, because test.h is dependency of
>>> test_progs.c, which is dependency of test_progs binary, so that's why
>>> it works.
>>> 
>>> As to why .h file is compiled as C file, I have no idea and ideally
>>> that should be fixed somehow.
>> I guess that's because it's a prerequisite and we have a target that
>> puts all prerequisites when calling CC:
>> 
>> test_progs: a.c b.c tests.h
>>        gcc a.c b.c tests.h -o test_progs
>> 
>> So gcc compiles each input file.
> 
> If that's really a definition of the rule, then it seems not exactly
> correct. What if some of the prerequisites are some object files,
> directories, etc. I'd say the rule should only include .c files. I'll
> add it to my TODO list to go and check how all this is defined
> somewhere, but for now I'm leaving everything as is in this patch.
> 

I believe it’s an implicit built-in rule, which is defined by make
itself here:

https://git.savannah.gnu.org/cgit/make.git/tree/default.c?h=4.2.1#n131

The observed behavior arises because these rules use $^ all over the
place. My 2c is that it would be better to make the rule explicit,
because it would cost just an extra line, but it would be immediately
obvious why the code is correct w.r.t. rebuilds.

Best regards,
Ilya

^ permalink raw reply

* [PATCH v2] isdn: hfcsusb: Fix mISDN driver crash caused by transfer buffer on the stack
From: Juliana Rodrigueiro @ 2019-07-31 13:17 UTC (permalink / raw)
  To: isdn, netdev; +Cc: thomas.jarosch

Since linux 4.9 it is not possible to use buffers on the stack for DMA transfers.

During usb probe the driver crashes with "transfer buffer is on stack" message.

This fix k-allocates a buffer to be used on "read_reg_atomic", which is a macro
that calls "usb_control_msg" under the hood.

Kernel 4.19 backtrace:

usb_hcd_submit_urb+0x3e5/0x900
? sched_clock+0x9/0x10
? log_store+0x203/0x270
? get_random_u32+0x6f/0x90
? cache_alloc_refill+0x784/0x8a0
usb_submit_urb+0x3b4/0x550
usb_start_wait_urb+0x4e/0xd0
usb_control_msg+0xb8/0x120
hfcsusb_probe+0x6bc/0xb40 [hfcsusb]
usb_probe_interface+0xc2/0x260
really_probe+0x176/0x280
driver_probe_device+0x49/0x130
__driver_attach+0xa9/0xb0
? driver_probe_device+0x130/0x130
bus_for_each_dev+0x5a/0x90
driver_attach+0x14/0x20
? driver_probe_device+0x130/0x130
bus_add_driver+0x157/0x1e0
driver_register+0x51/0xe0
usb_register_driver+0x5d/0x120
? 0xf81ed000
hfcsusb_drv_init+0x17/0x1000 [hfcsusb]
do_one_initcall+0x44/0x190
? free_unref_page_commit+0x6a/0xd0
do_init_module+0x46/0x1c0
load_module+0x1dc1/0x2400
sys_init_module+0xed/0x120
do_fast_syscall_32+0x7a/0x200
entry_SYSENTER_32+0x6b/0xbe

Signed-off-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com>
---
Changes in v2:
   - Ordered the local variables declaration from longest to shortest.

 drivers/isdn/hardware/mISDN/hfcsusb.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/isdn/hardware/mISDN/hfcsusb.c b/drivers/isdn/hardware/mISDN/hfcsusb.c
index 6d05946b445e..fe1117a21f6b 100644
--- a/drivers/isdn/hardware/mISDN/hfcsusb.c
+++ b/drivers/isdn/hardware/mISDN/hfcsusb.c
@@ -1704,13 +1704,23 @@ hfcsusb_stop_endpoint(struct hfcsusb *hw, int channel)
 static int
 setup_hfcsusb(struct hfcsusb *hw)
 {
+	void *dmabuf = kmalloc(sizeof(u_char), GFP_KERNEL);
 	u_char b;
+	int ret;
 
 	if (debug & DBG_HFC_CALL_TRACE)
 		printk(KERN_DEBUG "%s: %s\n", hw->name, __func__);
 
+	if (!dmabuf)
+		return -ENOMEM;
+
+	ret = read_reg_atomic(hw, HFCUSB_CHIP_ID, dmabuf);
+
+	memcpy(&b, dmabuf, sizeof(u_char));
+	kfree(dmabuf);
+
 	/* check the chip id */
-	if (read_reg_atomic(hw, HFCUSB_CHIP_ID, &b) != 1) {
+	if (ret != 1) {
 		printk(KERN_DEBUG "%s: %s: cannot read chip id\n",
 		       hw->name, __func__);
 		return 1;
-- 
2.17.2





^ permalink raw reply related

* [PATCH] net: mediatek: Drop unneeded dependency on NET_VENDOR_MEDIATEK
From: Geert Uytterhoeven @ 2019-07-31 13:12 UTC (permalink / raw)
  To: Felix Fietkau, John Crispin, Sean Wang, Nelson Chang,
	David S . Miller, Matthias Brugger
  Cc: netdev, linux-arm-kernel, linux-mediatek, Geert Uytterhoeven

The whole block is protected by "if NET_VENDOR_MEDIATEK", so there is
no need for individual driver config symbols to duplicate this
dependency.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/net/ethernet/mediatek/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig
index 263cd0909fe0de39..1f7fff81f24dbb96 100644
--- a/drivers/net/ethernet/mediatek/Kconfig
+++ b/drivers/net/ethernet/mediatek/Kconfig
@@ -9,7 +9,6 @@ if NET_VENDOR_MEDIATEK
 
 config NET_MEDIATEK_SOC
 	tristate "MediaTek SoC Gigabit Ethernet support"
-	depends on NET_VENDOR_MEDIATEK
 	select PHYLIB
 	---help---
 	  This driver supports the gigabit ethernet MACs in the
-- 
2.17.1


^ permalink raw reply related

* pull-request: mac80211 2019-07-31
From: Johannes Berg @ 2019-07-31 12:49 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-wireless

Hi Dave,

We have few fixes, most importantly probably the NETIF_F_LLTX revert,
we thought we were now more layered like VLAN or such since we do all
of the queue control internally, but it caused problems, evidently not.

Please pull and let me know if there's any problem.

Thanks,
johannes



The following changes since commit 47d858d0bdcd47cc1c6c9eeca91b091dd9e55637:

  ipip: validate header length in ipip_tunnel_xmit (2019-07-25 17:23:40 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git tags/mac80211-for-davem-2019-07-31

for you to fetch changes up to eef347f846ee8f7296a6f84e3866c057ca6bcce0:

  Revert "mac80211: set NETIF_F_LLTX when using intermediate tx queues" (2019-07-30 14:52:50 +0200)

----------------------------------------------------------------
Just a few fixes:
 * revert NETIF_F_LLTX usage as it caused problems
 * avoid warning on WMM parameters from AP that are too short
 * fix possible null-ptr dereference in hwsim
 * fix interface combinations with 4-addr and crypto control

----------------------------------------------------------------
Brian Norris (1):
      mac80211: don't WARN on short WMM parameters from AP

Jia-Ju Bai (1):
      mac80211_hwsim: Fix possible null-pointer dereferences in hwsim_dump_radio_nl()

Johannes Berg (1):
      Revert "mac80211: set NETIF_F_LLTX when using intermediate tx queues"

Manikanta Pubbisetty (1):
      {nl,mac}80211: fix interface combinations on crypto controlled devices

 drivers/net/wireless/mac80211_hwsim.c |  8 +++++---
 include/net/cfg80211.h                | 15 +++++++++++++++
 net/mac80211/iface.c                  |  1 -
 net/mac80211/mlme.c                   | 10 ++++++++++
 net/mac80211/util.c                   |  7 +++----
 net/wireless/core.c                   |  6 ++----
 net/wireless/nl80211.c                |  4 +---
 net/wireless/util.c                   | 27 +++++++++++++++++++++++++--
 8 files changed, 61 insertions(+), 17 deletions(-)


^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH bpf-next v4 03/11] libbpf: add flags to umem config
From: Björn Töpel @ 2019-07-31 12:45 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, Jakub Kicinski,
	Jonathan Lemon, Saeed Mahameed, Maxim Mikityanskiy,
	Stephen Hemminger, Bruce Richardson, ciara.loftus,
	intel-wired-lan, bpf
In-Reply-To: <20190730085400.10376-4-kevin.laatz@intel.com>

On Tue, 30 Jul 2019 at 19:43, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patch adds a 'flags' field to the umem_config and umem_reg structs.
> This will allow for more options to be added for configuring umems.
>
> The first use for the flags field is to add a flag for unaligned chunks
> mode. These flags can either be user-provided or filled with a default.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
>
> ---
> v2:
>   - Removed the headroom check from this patch. It has moved to the
>     previous patch.
>
> v4:
>   - modified chunk flag define
> ---
>  tools/include/uapi/linux/if_xdp.h | 9 +++++++--
>  tools/lib/bpf/xsk.c               | 3 +++
>  tools/lib/bpf/xsk.h               | 2 ++
>  3 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
> index faaa5ca2a117..a691802d7915 100644
> --- a/tools/include/uapi/linux/if_xdp.h
> +++ b/tools/include/uapi/linux/if_xdp.h
> @@ -17,6 +17,10 @@
>  #define XDP_COPY       (1 << 1) /* Force copy-mode */
>  #define XDP_ZEROCOPY   (1 << 2) /* Force zero-copy mode */
>
> +/* Flags for xsk_umem_config flags */
> +#define XDP_UMEM_UNALIGNED_CHUNK_FLAG_SHIFT 15
> +#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << XDP_UMEM_UNALIGNED_CHUNK_FLAG_SHIFT)
> +
>  struct sockaddr_xdp {
>         __u16 sxdp_family;
>         __u16 sxdp_flags;
> @@ -49,8 +53,9 @@ struct xdp_mmap_offsets {
>  #define XDP_OPTIONS                    8
>
>  struct xdp_umem_reg {
> -       __u64 addr; /* Start of packet data area */
> -       __u64 len; /* Length of packet data area */
> +       __u64 addr;     /* Start of packet data area */
> +       __u64 len:48;   /* Length of packet data area */
> +       __u64 flags:16; /* Flags for umem */

So, the flags member moved from struct sockaddr_xdp to struct
xdp_umem_reg. Makes sense. However, I'm not a fan of the bitfield. Why
not just add the flags member after the last member (headroom) and
deal with it in xsk.c/xsk_setsockopt and libbpf? The bitfield
preserves the size, but makes it hard to read/error prone IMO.


Björn


>         __u32 chunk_size;
>         __u32 headroom;
>  };
> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> index 5007b5d4fd2c..5e7e4d420ee0 100644
> --- a/tools/lib/bpf/xsk.c
> +++ b/tools/lib/bpf/xsk.c
> @@ -116,6 +116,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
>                 cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
>                 cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
>                 cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
> +               cfg->flags = XSK_UMEM__DEFAULT_FLAGS;
>                 return;
>         }
>
> @@ -123,6 +124,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
>         cfg->comp_size = usr_cfg->comp_size;
>         cfg->frame_size = usr_cfg->frame_size;
>         cfg->frame_headroom = usr_cfg->frame_headroom;
> +       cfg->flags = usr_cfg->flags;
>  }
>
>  static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
> @@ -182,6 +184,7 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
>         mr.len = size;
>         mr.chunk_size = umem->config.frame_size;
>         mr.headroom = umem->config.frame_headroom;
> +       mr.flags = umem->config.flags;
>
>         err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
>         if (err) {
> diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> index 833a6e60d065..44a03d8c34b9 100644
> --- a/tools/lib/bpf/xsk.h
> +++ b/tools/lib/bpf/xsk.h
> @@ -170,12 +170,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
>  #define XSK_UMEM__DEFAULT_FRAME_SHIFT    12 /* 4096 bytes */
>  #define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
>  #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
> +#define XSK_UMEM__DEFAULT_FLAGS 0
>
>  struct xsk_umem_config {
>         __u32 fill_size;
>         __u32 comp_size;
>         __u32 frame_size;
>         __u32 frame_headroom;
> +       __u32 flags;
>  };
>
>  /* Flags for the libbpf_flags field. */
> --
> 2.17.1
>
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply

* Re: [PATCH V2 4/9] vhost: reset invalidate_count in vhost_set_vring_num_addr()
From: Jason Gunthorpe @ 2019-07-31 12:41 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, kvm, virtualization, netdev, linux-kernel, linux-mm
In-Reply-To: <20190731084655.7024-5-jasowang@redhat.com>

On Wed, Jul 31, 2019 at 04:46:50AM -0400, Jason Wang wrote:
> The vhost_set_vring_num_addr() could be called in the middle of
> invalidate_range_start() and invalidate_range_end(). If we don't reset
> invalidate_count after the un-registering of MMU notifier, the
> invalidate_cont will run out of sync (e.g never reach zero). This will
> in fact disable the fast accessor path. Fixing by reset the count to
> zero.
> 
> Reported-by: Michael S. Tsirkin <mst@redhat.com>

Did Michael report this as well?

> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
>  drivers/vhost/vhost.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 2a3154976277..2a7217c33668 100644
> +++ b/drivers/vhost/vhost.c
> @@ -2073,6 +2073,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
>  		d->has_notifier = false;
>  	}
>  
> +	/* reset invalidate_count in case we are in the middle of
> +	 * invalidate_start() and invalidate_end().
> +	 */
> +	vq->invalidate_count = 0;
>  	vhost_uninit_vq_maps(vq);
>  #endif
>  

^ permalink raw reply

* Re: [PATCH V2 7/9] vhost: do not use RCU to synchronize MMU notifier with worker
From: Jason Gunthorpe @ 2019-07-31 12:39 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, kvm, virtualization, netdev, linux-kernel, linux-mm
In-Reply-To: <20190731084655.7024-8-jasowang@redhat.com>

On Wed, Jul 31, 2019 at 04:46:53AM -0400, Jason Wang wrote:
> We used to use RCU to synchronize MMU notifier with worker. This leads
> calling synchronize_rcu() in invalidate_range_start(). But on a busy
> system, there would be many factors that may slow down the
> synchronize_rcu() which makes it unsuitable to be called in MMU
> notifier.
> 
> A solution is SRCU but its overhead is obvious with the expensive full
> memory barrier. Another choice is to use seqlock, but it doesn't
> provide a synchronization method between readers and writers. The last
> choice is to use vq mutex, but it need to deal with the worst case
> that MMU notifier must be blocked and wait for the finish of swap in.
> 
> So this patch switches use a counter to track whether or not the map
> was used. The counter was increased when vq try to start or finish
> uses the map. This means, when it was even, we're sure there's no
> readers and MMU notifier is synchronized. When it was odd, it means
> there's a reader we need to wait it to be even again then we are
> synchronized. 

You just described a seqlock.

We've been talking about providing this as some core service from mmu
notifiers because nearly every use of this API needs it.

IMHO this gets the whole thing backwards, the common pattern is to
protect the 'shadow pte' data with a seqlock (usually open coded),
such that the mmu notififer side has the write side of that lock and
the read side is consumed by the thread accessing or updating the SPTE.


> Reported-by: Michael S. Tsirkin <mst@redhat.com>
> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
>  drivers/vhost/vhost.c | 145 ++++++++++++++++++++++++++----------------
>  drivers/vhost/vhost.h |   7 +-
>  2 files changed, 94 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index cfc11f9ed9c9..db2c81cb1e90 100644
> +++ b/drivers/vhost/vhost.c
> @@ -324,17 +324,16 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
>  
>  	spin_lock(&vq->mmu_lock);
>  	for (i = 0; i < VHOST_NUM_ADDRS; i++) {
> -		map[i] = rcu_dereference_protected(vq->maps[i],
> -				  lockdep_is_held(&vq->mmu_lock));
> +		map[i] = vq->maps[i];
>  		if (map[i]) {
>  			vhost_set_map_dirty(vq, map[i], i);
> -			rcu_assign_pointer(vq->maps[i], NULL);
> +			vq->maps[i] = NULL;
>  		}
>  	}
>  	spin_unlock(&vq->mmu_lock);
>  
> -	/* No need for synchronize_rcu() or kfree_rcu() since we are
> -	 * serialized with memory accessors (e.g vq mutex held).
> +	/* No need for synchronization since we are serialized with
> +	 * memory accessors (e.g vq mutex held).
>  	 */
>  
>  	for (i = 0; i < VHOST_NUM_ADDRS; i++)
> @@ -362,6 +361,44 @@ static bool vhost_map_range_overlap(struct vhost_uaddr *uaddr,
>  	return !(end < uaddr->uaddr || start > uaddr->uaddr - 1 + uaddr->size);
>  }
>  
> +static void inline vhost_vq_access_map_begin(struct vhost_virtqueue *vq)
> +{
> +	int ref = READ_ONCE(vq->ref);

Is a lock/single threaded supposed to be held for this?

> +
> +	smp_store_release(&vq->ref, ref + 1);
> +	/* Make sure ref counter is visible before accessing the map */
> +	smp_load_acquire(&vq->ref);

release/acquire semantics are intended to protect blocks of related
data, so reading something with acquire and throwing away the result
is nonsense.

> +}
> +
> +static void inline vhost_vq_access_map_end(struct vhost_virtqueue *vq)
> +{
> +	int ref = READ_ONCE(vq->ref);

If the write to vq->ref is not locked this algorithm won't work, if it
is locked the READ_ONCE is not needed.

> +	/* Make sure vq access is done before increasing ref counter */
> +	smp_store_release(&vq->ref, ref + 1);
> +}
> +
> +static void inline vhost_vq_sync_access(struct vhost_virtqueue *vq)
> +{
> +	int ref;
> +
> +	/* Make sure map change was done before checking ref counter */
> +	smp_mb();

This is probably smp_rmb after reading ref, and if you are setting ref
with smp_store_release then this should be smp_load_acquire() without
an explicit mb.

> +	ref = READ_ONCE(vq->ref);
> +	if (ref & 0x1) {
> +		/* When ref change, we are sure no reader can see
> +		 * previous map */
> +		while (READ_ONCE(vq->ref) == ref) {
> +			set_current_state(TASK_RUNNING);
> +			schedule();
> +		}
> +	}

This is basically read_seqcount_begin()' with a schedule instead of
cpu_relax


> +	/* Make sure ref counter was checked before any other
> +	 * operations that was dene on map. */
> +	smp_mb();

should be in a smp_load_acquire()

> +}
> +
>  static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
>  				      int index,
>  				      unsigned long start,
> @@ -376,16 +413,15 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
>  	spin_lock(&vq->mmu_lock);
>  	++vq->invalidate_count;
>  
> -	map = rcu_dereference_protected(vq->maps[index],
> -					lockdep_is_held(&vq->mmu_lock));
> +	map = vq->maps[index];
>  	if (map) {
>  		vhost_set_map_dirty(vq, map, index);
> -		rcu_assign_pointer(vq->maps[index], NULL);
> +		vq->maps[index] = NULL;
>  	}
>  	spin_unlock(&vq->mmu_lock);
>  
>  	if (map) {
> -		synchronize_rcu();
> +		vhost_vq_sync_access(vq);

What prevents racing with vhost_vq_access_map_end here?

>  		vhost_map_unprefetch(map);
>  	}
>  }

Overall I don't like it. 

We are trying to get rid of these botique mmu notifier patterns in
drivers. 

Jason

^ permalink raw reply

* [PATCH 2/2] cnic: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-07-31 12:22 UTC (permalink / raw)
  Cc: David S . Miller, netdev, linux-kernel, Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
 drivers/net/ethernet/broadcom/cnic.c    | 26 ++++++++++++-------------
 drivers/net/ethernet/broadcom/cnic_if.h |  6 +++---
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index 57dc3cbff36e..215777d63cda 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -141,22 +141,22 @@ static int cnic_uio_close(struct uio_info *uinfo, struct inode *inode)
 
 static inline void cnic_hold(struct cnic_dev *dev)
 {
-	atomic_inc(&dev->ref_count);
+	refcount_inc(&dev->ref_count);
 }
 
 static inline void cnic_put(struct cnic_dev *dev)
 {
-	atomic_dec(&dev->ref_count);
+	refcount_dec(&dev->ref_count);
 }
 
 static inline void csk_hold(struct cnic_sock *csk)
 {
-	atomic_inc(&csk->ref_count);
+	refcount_inc(&csk->ref_count);
 }
 
 static inline void csk_put(struct cnic_sock *csk)
 {
-	atomic_dec(&csk->ref_count);
+	refcount_dec(&csk->ref_count);
 }
 
 static struct cnic_dev *cnic_from_netdev(struct net_device *netdev)
@@ -177,12 +177,12 @@ static struct cnic_dev *cnic_from_netdev(struct net_device *netdev)
 
 static inline void ulp_get(struct cnic_ulp_ops *ulp_ops)
 {
-	atomic_inc(&ulp_ops->ref_count);
+	refcount_inc(&ulp_ops->ref_count);
 }
 
 static inline void ulp_put(struct cnic_ulp_ops *ulp_ops)
 {
-	atomic_dec(&ulp_ops->ref_count);
+	refcount_dec(&ulp_ops->ref_count);
 }
 
 static void cnic_ctx_wr(struct cnic_dev *dev, u32 cid_addr, u32 off, u32 val)
@@ -494,7 +494,7 @@ int cnic_register_driver(int ulp_type, struct cnic_ulp_ops *ulp_ops)
 	}
 	read_unlock(&cnic_dev_lock);
 
-	atomic_set(&ulp_ops->ref_count, 0);
+	refcount_set(&ulp_ops->ref_count, 0);
 	rcu_assign_pointer(cnic_ulp_tbl[ulp_type], ulp_ops);
 	mutex_unlock(&cnic_lock);
 
@@ -545,12 +545,12 @@ int cnic_unregister_driver(int ulp_type)
 
 	mutex_unlock(&cnic_lock);
 	synchronize_rcu();
-	while ((atomic_read(&ulp_ops->ref_count) != 0) && (i < 20)) {
+	while ((refcount_read(&ulp_ops->ref_count) != 0) && (i < 20)) {
 		msleep(100);
 		i++;
 	}
 
-	if (atomic_read(&ulp_ops->ref_count) != 0)
+	if (refcount_read(&ulp_ops->ref_count) != 0)
 		pr_warn("%s: Failed waiting for ref count to go to zero\n",
 			__func__);
 	return 0;
@@ -3596,7 +3596,7 @@ static int cnic_cm_create(struct cnic_dev *dev, int ulp_type, u32 cid,
 	}
 
 	csk1 = &cp->csk_tbl[l5_cid];
-	if (atomic_read(&csk1->ref_count))
+	if (refcount_read(&csk1->ref_count))
 		return -EAGAIN;
 
 	if (test_and_set_bit(SK_F_INUSE, &csk1->flags))
@@ -3651,7 +3651,7 @@ static int cnic_cm_destroy(struct cnic_sock *csk)
 	csk_hold(csk);
 	clear_bit(SK_F_INUSE, &csk->flags);
 	smp_mb__after_atomic();
-	while (atomic_read(&csk->ref_count) != 1)
+	while (refcount_read(&csk->ref_count) != 1)
 		msleep(1);
 	cnic_cm_cleanup(csk);
 
@@ -5432,11 +5432,11 @@ static void cnic_free_dev(struct cnic_dev *dev)
 {
 	int i = 0;
 
-	while ((atomic_read(&dev->ref_count) != 0) && i < 10) {
+	while ((refcount_read(&dev->ref_count) != 0) && i < 10) {
 		msleep(100);
 		i++;
 	}
-	if (atomic_read(&dev->ref_count) != 0)
+	if (refcount_read(&dev->ref_count) != 0)
 		netdev_err(dev->netdev, "Failed waiting for ref count to go to zero\n");
 
 	netdev_info(dev->netdev, "Removed CNIC device\n");
diff --git a/drivers/net/ethernet/broadcom/cnic_if.h b/drivers/net/ethernet/broadcom/cnic_if.h
index 789e5c7e9311..5232a05ac7ba 100644
--- a/drivers/net/ethernet/broadcom/cnic_if.h
+++ b/drivers/net/ethernet/broadcom/cnic_if.h
@@ -300,7 +300,7 @@ struct cnic_sock {
 #define SK_F_CLOSING		7
 #define SK_F_HW_ERR		8
 
-	atomic_t ref_count;
+	refcount_t ref_count;
 	u32 state;
 	struct kwqe kwqe1;
 	struct kwqe kwqe2;
@@ -335,7 +335,7 @@ struct cnic_dev {
 #define CNIC_F_CNIC_UP		1
 #define CNIC_F_BNX2_CLASS	3
 #define CNIC_F_BNX2X_CLASS	4
-	atomic_t	ref_count;
+	refcount_t	ref_count;
 	u8		mac_addr[ETH_ALEN];
 
 	int		max_iscsi_conn;
@@ -378,7 +378,7 @@ struct cnic_ulp_ops {
 				  char *data, u16 data_size);
 	int (*cnic_get_stats)(void *ulp_ctx);
 	struct module *owner;
-	atomic_t ref_count;
+	refcount_t ref_count;
 };
 
 int cnic_register_driver(int ulp_type, struct cnic_ulp_ops *ulp_ops);
-- 
2.20.1


^ permalink raw reply related

* [PATCH 1/2] bnxt_en: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-07-31 12:22 UTC (permalink / raw)
  Cc: Michael Chan, David S . Miller, netdev, linux-kernel,
	Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 8 ++++----
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
index fc77caf0a076..eb7ed34639e2 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
@@ -49,7 +49,7 @@ static int bnxt_register_dev(struct bnxt_en_dev *edev, int ulp_id,
 			return -ENOMEM;
 	}
 
-	atomic_set(&ulp->ref_count, 0);
+	refcount_set(&ulp->ref_count, 0);
 	ulp->handle = handle;
 	rcu_assign_pointer(ulp->ulp_ops, ulp_ops);
 
@@ -87,7 +87,7 @@ static int bnxt_unregister_dev(struct bnxt_en_dev *edev, int ulp_id)
 	synchronize_rcu();
 	ulp->max_async_event_id = 0;
 	ulp->async_events_bmap = NULL;
-	while (atomic_read(&ulp->ref_count) != 0 && i < 10) {
+	while (refcount_read(&ulp->ref_count) != 0 && i < 10) {
 		msleep(100);
 		i++;
 	}
@@ -246,12 +246,12 @@ static int bnxt_send_msg(struct bnxt_en_dev *edev, int ulp_id,
 
 static void bnxt_ulp_get(struct bnxt_ulp *ulp)
 {
-	atomic_inc(&ulp->ref_count);
+	refcount_inc(&ulp->ref_count);
 }
 
 static void bnxt_ulp_put(struct bnxt_ulp *ulp)
 {
-	atomic_dec(&ulp->ref_count);
+	refcount_dec(&ulp->ref_count);
 }
 
 void bnxt_ulp_stop(struct bnxt *bp)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
index cd78453d0bf0..fc4aa582d190 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
@@ -52,7 +52,7 @@ struct bnxt_ulp {
 	u16		max_async_event_id;
 	u16		msix_requested;
 	u16		msix_base;
-	atomic_t	ref_count;
+	refcount_t	ref_count;
 };
 
 struct bnxt_en_dev {
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH] net: sctp: Rename fallthrough label to unhandled
From: Neil Horman @ 2019-07-31 12:16 UTC (permalink / raw)
  To: Joe Perches
  Cc: Vlad Yasevich, Marcelo Ricardo Leitner, David S. Miller,
	linux-sctp, netdev, linux-kernel
In-Reply-To: <eac3fe457d553a2b366e1c1898d47ae8c048087c.camel@perches.com>

On Wed, Jul 31, 2019 at 04:32:43AM -0700, Joe Perches wrote:
> On Wed, 2019-07-31 at 07:19 -0400, Neil Horman wrote:
> > On Tue, Jul 30, 2019 at 10:04:37PM -0700, Joe Perches wrote:
> > > fallthrough may become a pseudo reserved keyword so this only use of
> > > fallthrough is better renamed to allow it.
> > > 
> > > Signed-off-by: Joe Perches <joe@perches.com>
> > Are you referring to the __attribute__((fallthrough)) statement that gcc
> > supports?  If so the compiler should by all rights be able to differentiate
> > between a null statement attribute and a explicit goto and label without the
> > need for renaming here.  Or are you referring to something else?
> 
> Hi.
> 
> I sent after this a patch that adds
> 
> # define fallthrough                    __attribute__((__fallthrough__))
> 
> https://lore.kernel.org/patchwork/patch/1108577/
> 
> So this rename is a prerequisite to adding this #define.
> 
why not just define __fallthrough instead, like we do for all the other
attributes we alias (i.e. __read_mostly, __protected_by, __unused, __exception,
etc)

Neil

> > > diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> []
> > > @@ -2152,7 +2152,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
> > >  	case SCTP_PARAM_SET_PRIMARY:
> > >  		if (net->sctp.addip_enable)
> > >  			break;
> > > -		goto fallthrough;
> > > +		goto unhandled;
> 
> etc...
> 
> 
> 

^ permalink raw reply

* [PATCH nf,v2] netfilter: nf_tables: map basechain priority to hardware priority
From: Pablo Neira Ayuso @ 2019-07-31 12:16 UTC (permalink / raw)
  To: netfilter-devel
  Cc: marcelo.leitner, wenxu, jiri, saeedm, gerlitz.or, paulb, netdev,
	jakub.kicinski

This patch adds initial support for offloading basechains using the
priority range from -8192 to 8191.

The software priority -8192 is mapped to the hardware priority
0xC000 + 1. tcf_auto_prio() uses 0xC000 if the user specifies no
priority, then it subtracts 1 for each new tcf_proto object. This patch
reserves the hardware priority range from 0xC000 to 0xFFFF for
netfilter.

The software to hardware priority mapping is not exposed to userspace,
so it should be possible to update this / extend the range of supported
priorities later on.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
v2: fix indent in priority assignment - Marcelo Leitner.
    Missing << 16 bitshift.

 include/net/netfilter/nf_tables_offload.h |  2 ++
 net/netfilter/nf_tables_api.c             |  4 ++++
 net/netfilter/nf_tables_offload.c         | 31 ++++++++++++++++++++++++++++---
 3 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/include/net/netfilter/nf_tables_offload.h b/include/net/netfilter/nf_tables_offload.h
index 3196663a10e3..2d497394021e 100644
--- a/include/net/netfilter/nf_tables_offload.h
+++ b/include/net/netfilter/nf_tables_offload.h
@@ -73,4 +73,6 @@ int nft_flow_rule_offload_commit(struct net *net);
 	(__reg)->key		= __key;				\
 	memset(&(__reg)->mask, 0xff, (__reg)->len);
 
+u16 nft_chain_offload_priority(struct nft_base_chain *basechain);
+
 #endif
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 605a7cfe7ca7..9cf0fecf5cb9 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1662,6 +1662,10 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
 
 		chain->flags |= NFT_BASE_CHAIN | flags;
 		basechain->policy = NF_ACCEPT;
+		if (chain->flags & NFT_CHAIN_HW_OFFLOAD &&
+		    !nft_chain_offload_priority(basechain))
+			return -EOPNOTSUPP;
+
 		flow_block_init(&basechain->flow_block);
 	} else {
 		chain = kzalloc(sizeof(*chain), GFP_KERNEL);
diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
index 64f5fd5f240e..a79efd49c8a8 100644
--- a/net/netfilter/nf_tables_offload.c
+++ b/net/netfilter/nf_tables_offload.c
@@ -103,10 +103,11 @@ void nft_offload_update_dependency(struct nft_offload_ctx *ctx,
 }
 
 static void nft_flow_offload_common_init(struct flow_cls_common_offload *common,
-					 __be16 proto,
-					struct netlink_ext_ack *extack)
+					 __be16 proto, int priority,
+					 struct netlink_ext_ack *extack)
 {
 	common->protocol = proto;
+	common->prio = priority << 16;
 	common->extack = extack;
 }
 
@@ -124,6 +125,28 @@ static int nft_setup_cb_call(struct nft_base_chain *basechain,
 	return 0;
 }
 
+/* Available priorities for hardware offload range: -8192..8191 */
+#define NFT_BASECHAIN_OFFLOAD_PRIO_MAX		(SHRT_MAX / 4)
+#define NFT_BASECHAIN_OFFLOAD_PRIO_MIN		(SHRT_MIN / 4)
+#define NFT_BASECHAIN_OFFLOAD_PRIO_RANGE	(USHRT_MAX / 4)
+/* tcf_auto_prio() uses 0xC000 as base, then subtract one for each new chain. */
+#define NFT_BASECHAIN_OFFLOAD_HW_PRIO_BASE	(0xC000 + 1)
+
+u16 nft_chain_offload_priority(struct nft_base_chain *basechain)
+{
+	u16 prio;
+
+	if (basechain->ops.priority < NFT_BASECHAIN_OFFLOAD_PRIO_MIN ||
+	    basechain->ops.priority > NFT_BASECHAIN_OFFLOAD_PRIO_MAX)
+		return 0;
+
+	/* map netfilter chain priority to hardware priority. */
+	prio = basechain->ops.priority + NFT_BASECHAIN_OFFLOAD_PRIO_MAX +
+		NFT_BASECHAIN_OFFLOAD_HW_PRIO_BASE;
+
+	return prio;
+}
+
 static int nft_flow_offload_rule(struct nft_trans *trans,
 				 enum flow_cls_command command)
 {
@@ -142,7 +165,9 @@ static int nft_flow_offload_rule(struct nft_trans *trans,
 	if (flow)
 		proto = flow->proto;
 
-	nft_flow_offload_common_init(&cls_flow.common, proto, &extack);
+	nft_flow_offload_common_init(&cls_flow.common, proto,
+				     nft_chain_offload_priority(basechain),
+				     &extack);
 	cls_flow.command = command;
 	cls_flow.cookie = (unsigned long) rule;
 	if (flow)
-- 
2.11.0


^ permalink raw reply related

* [PATCH 4/8] netfilter: add include guard to xt_connlabel.h
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190731115157.27020-1-pablo@netfilter.org>

From: Masahiro Yamada <yamada.masahiro@socionext.com>

Add a header include guard just in case.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/xt_connlabel.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/uapi/linux/netfilter/xt_connlabel.h b/include/uapi/linux/netfilter/xt_connlabel.h
index 2312f0ec07b2..323f0dfc2a4e 100644
--- a/include/uapi/linux/netfilter/xt_connlabel.h
+++ b/include/uapi/linux/netfilter/xt_connlabel.h
@@ -1,4 +1,8 @@
 /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _UAPI_XT_CONNLABEL_H
+#define _UAPI_XT_CONNLABEL_H
+
 #include <linux/types.h>
 
 #define XT_CONNLABEL_MAXBIT 127
@@ -11,3 +15,5 @@ struct xt_connlabel_mtinfo {
 	__u16 bit;
 	__u16 options;
 };
+
+#endif /* _UAPI_XT_CONNLABEL_H */
-- 
2.11.0


^ permalink raw reply related

* [PATCH 8/8] netfilter: ebtables: also count base chain policies
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190731115157.27020-1-pablo@netfilter.org>

From: Florian Westphal <fw@strlen.de>

ebtables doesn't include the base chain policies in the rule count,
so we need to add them manually when we call into the x_tables core
to allocate space for the comapt offset table.

This lead syzbot to trigger:
WARNING: CPU: 1 PID: 9012 at net/netfilter/x_tables.c:649
xt_compat_add_offset.cold+0x11/0x36 net/netfilter/x_tables.c:649

Reported-by: syzbot+276ddebab3382bbf72db@syzkaller.appspotmail.com
Fixes: 2035f3ff8eaa ("netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/netfilter/ebtables.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index fd84b48e48b5..c8177a89f52c 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1770,20 +1770,28 @@ static int compat_calc_entry(const struct ebt_entry *e,
 	return 0;
 }
 
+static int ebt_compat_init_offsets(unsigned int number)
+{
+	if (number > INT_MAX)
+		return -EINVAL;
+
+	/* also count the base chain policies */
+	number += NF_BR_NUMHOOKS;
+
+	return xt_compat_init_offsets(NFPROTO_BRIDGE, number);
+}
 
 static int compat_table_info(const struct ebt_table_info *info,
 			     struct compat_ebt_replace *newinfo)
 {
 	unsigned int size = info->entries_size;
 	const void *entries = info->entries;
+	int ret;
 
 	newinfo->entries_size = size;
-	if (info->nentries) {
-		int ret = xt_compat_init_offsets(NFPROTO_BRIDGE,
-						 info->nentries);
-		if (ret)
-			return ret;
-	}
+	ret = ebt_compat_init_offsets(info->nentries);
+	if (ret)
+		return ret;
 
 	return EBT_ENTRY_ITERATE(entries, size, compat_calc_entry, info,
 							entries, newinfo);
@@ -2234,11 +2242,9 @@ static int compat_do_replace(struct net *net, void __user *user,
 
 	xt_compat_lock(NFPROTO_BRIDGE);
 
-	if (tmp.nentries) {
-		ret = xt_compat_init_offsets(NFPROTO_BRIDGE, tmp.nentries);
-		if (ret < 0)
-			goto out_unlock;
-	}
+	ret = ebt_compat_init_offsets(tmp.nentries);
+	if (ret < 0)
+		goto out_unlock;
 
 	ret = compat_copy_entries(entries_tmp, tmp.entries_size, &state);
 	if (ret < 0)
-- 
2.11.0


^ permalink raw reply related

* [PATCH 6/8] netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and hash:ip,mac sets
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190731115157.27020-1-pablo@netfilter.org>

From: Stefano Brivio <sbrivio@redhat.com>

In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address
for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the
KADT functions for sets matching on MAC addreses the copy of source or
destination MAC address depending on the configured match.

This was done correctly for hash:mac, but for hash:ip,mac and
bitmap:ip,mac, copying and pasting the same code block presents an
obvious problem: in these two set types, the MAC address is the second
dimension, not the first one, and we are actually selecting the MAC
address depending on whether the first dimension (IP address) specifies
source or destination.

Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags.

This way, mixing source and destination matches for the two dimensions
of ip,mac set types works as expected. With this setup:

  ip netns add A
  ip link add veth1 type veth peer name veth2 netns A
  ip addr add 192.0.2.1/24 dev veth1
  ip -net A addr add 192.0.2.2/24 dev veth2
  ip link set veth1 up
  ip -net A link set veth2 up

  dst=$(ip netns exec A cat /sys/class/net/veth2/address)

  ip netns exec A ipset create test_bitmap bitmap:ip,mac range 192.0.0.0/16
  ip netns exec A ipset add test_bitmap 192.0.2.1,${dst}
  ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP

  ip netns exec A ipset create test_hash hash:ip,mac
  ip netns exec A ipset add test_hash 192.0.2.1,${dst}
  ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP

ipset correctly matches a test packet:

  # ping -c1 192.0.2.2 >/dev/null
  # echo $?
  0

Reported-by: Chen Yi <yiche@redhat.com>
Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
---
 net/netfilter/ipset/ip_set_bitmap_ipmac.c | 2 +-
 net/netfilter/ipset/ip_set_hash_ipmac.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_bitmap_ipmac.c b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
index ca7ac4a25ada..1d4e63326e68 100644
--- a/net/netfilter/ipset/ip_set_bitmap_ipmac.c
+++ b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
@@ -226,7 +226,7 @@ bitmap_ipmac_kadt(struct ip_set *set, const struct sk_buff *skb,
 
 	e.id = ip_to_id(map, ip);
 
-	if (opt->flags & IPSET_DIM_ONE_SRC)
+	if (opt->flags & IPSET_DIM_TWO_SRC)
 		ether_addr_copy(e.ether, eth_hdr(skb)->h_source);
 	else
 		ether_addr_copy(e.ether, eth_hdr(skb)->h_dest);
diff --git a/net/netfilter/ipset/ip_set_hash_ipmac.c b/net/netfilter/ipset/ip_set_hash_ipmac.c
index eb1443408320..24d8f4df4230 100644
--- a/net/netfilter/ipset/ip_set_hash_ipmac.c
+++ b/net/netfilter/ipset/ip_set_hash_ipmac.c
@@ -93,7 +93,7 @@ hash_ipmac4_kadt(struct ip_set *set, const struct sk_buff *skb,
 	    (skb_mac_header(skb) + ETH_HLEN) > skb->data)
 		return -EINVAL;
 
-	if (opt->flags & IPSET_DIM_ONE_SRC)
+	if (opt->flags & IPSET_DIM_TWO_SRC)
 		ether_addr_copy(e.ether, eth_hdr(skb)->h_source);
 	else
 		ether_addr_copy(e.ether, eth_hdr(skb)->h_dest);
-- 
2.11.0


^ permalink raw reply related

* [PATCH 5/8] netfilter: ipset: Actually allow destination MAC address for hash:ip,mac sets too
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190731115157.27020-1-pablo@netfilter.org>

From: Stefano Brivio <sbrivio@redhat.com>

In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address
for mac and ipmac sets"), ipset.git commit 1543514c46a7, I removed the
KADT check that prevents matching on destination MAC addresses for
hash:mac sets, but forgot to remove the same check for hash:ip,mac set.

Drop this check: functionality is now commented in man pages and there's
no reason to restrict to source MAC address matching anymore.

Reported-by: Chen Yi <yiche@redhat.com>
Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
---
 net/netfilter/ipset/ip_set_hash_ipmac.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_ipmac.c b/net/netfilter/ipset/ip_set_hash_ipmac.c
index faf59b6a998f..eb1443408320 100644
--- a/net/netfilter/ipset/ip_set_hash_ipmac.c
+++ b/net/netfilter/ipset/ip_set_hash_ipmac.c
@@ -89,10 +89,6 @@ hash_ipmac4_kadt(struct ip_set *set, const struct sk_buff *skb,
 	struct hash_ipmac4_elem e = { .ip = 0, { .foo[0] = 0, .foo[1] = 0 } };
 	struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set);
 
-	 /* MAC can be src only */
-	if (!(opt->flags & IPSET_DIM_TWO_SRC))
-		return 0;
-
 	if (skb_mac_header(skb) < skb->head ||
 	    (skb_mac_header(skb) + ETH_HLEN) > skb->data)
 		return -EINVAL;
-- 
2.11.0


^ permalink raw reply related

* [PATCH 7/8] netfilter: ipset: Fix rename concurrency with listing
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190731115157.27020-1-pablo@netfilter.org>

From: Jozsef Kadlecsik <kadlec@netfilter.org>

Shijie Luo reported that when stress-testing ipset with multiple concurrent
create, rename, flush, list, destroy commands, it can result

ipset <version>: Broken LIST kernel message: missing DATA part!

error messages and broken list results. The problem was the rename operation
was not properly handled with respect of listing. The patch fixes the issue.

Reported-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
---
 net/netfilter/ipset/ip_set_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 2e151856ad99..e64d5f9a89dd 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1161,7 +1161,7 @@ static int ip_set_rename(struct net *net, struct sock *ctnl,
 		return -ENOENT;
 
 	write_lock_bh(&ip_set_ref_lock);
-	if (set->ref != 0) {
+	if (set->ref != 0 || set->ref_netlink != 0) {
 		ret = -IPSET_ERR_REFERENCED;
 		goto out;
 	}
-- 
2.11.0


^ permalink raw reply related

* [PATCH 2/8] netfilter: nf_tables: Make nft_meta expression more robust
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190731115157.27020-1-pablo@netfilter.org>

From: Phil Sutter <phil@nwl.cc>

nft_meta_get_eval()'s tendency to bail out setting NFT_BREAK verdict in
situations where required data is missing leads to unexpected behaviour
with inverted checks like so:

| meta iifname != eth0 accept

This rule will never match if there is no input interface (or it is not
known) which is not intuitive and, what's worse, breaks consistency of
iptables-nft with iptables-legacy.

Fix this by falling back to placing a value in dreg which never matches
(avoiding accidental matches), i.e. zero for interface index and an
empty string for interface name.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/netfilter/nft_meta_bridge.c |  6 +-----
 net/netfilter/nft_meta.c               | 16 ++++------------
 2 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/net/bridge/netfilter/nft_meta_bridge.c b/net/bridge/netfilter/nft_meta_bridge.c
index bed66f536b34..a98dec2cf0cf 100644
--- a/net/bridge/netfilter/nft_meta_bridge.c
+++ b/net/bridge/netfilter/nft_meta_bridge.c
@@ -30,13 +30,9 @@ static void nft_meta_bridge_get_eval(const struct nft_expr *expr,
 	switch (priv->key) {
 	case NFT_META_BRI_IIFNAME:
 		br_dev = nft_meta_get_bridge(in);
-		if (!br_dev)
-			goto err;
 		break;
 	case NFT_META_BRI_OIFNAME:
 		br_dev = nft_meta_get_bridge(out);
-		if (!br_dev)
-			goto err;
 		break;
 	case NFT_META_BRI_IIFPVID: {
 		u16 p_pvid;
@@ -64,7 +60,7 @@ static void nft_meta_bridge_get_eval(const struct nft_expr *expr,
 		goto out;
 	}
 
-	strncpy((char *)dest, br_dev->name, IFNAMSIZ);
+	strncpy((char *)dest, br_dev ? br_dev->name : "", IFNAMSIZ);
 	return;
 out:
 	return nft_meta_get_eval(expr, regs, pkt);
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index f1b1d948c07b..f69afb9ff3cb 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -60,24 +60,16 @@ void nft_meta_get_eval(const struct nft_expr *expr,
 		*dest = skb->mark;
 		break;
 	case NFT_META_IIF:
-		if (in == NULL)
-			goto err;
-		*dest = in->ifindex;
+		*dest = in ? in->ifindex : 0;
 		break;
 	case NFT_META_OIF:
-		if (out == NULL)
-			goto err;
-		*dest = out->ifindex;
+		*dest = out ? out->ifindex : 0;
 		break;
 	case NFT_META_IIFNAME:
-		if (in == NULL)
-			goto err;
-		strncpy((char *)dest, in->name, IFNAMSIZ);
+		strncpy((char *)dest, in ? in->name : "", IFNAMSIZ);
 		break;
 	case NFT_META_OIFNAME:
-		if (out == NULL)
-			goto err;
-		strncpy((char *)dest, out->name, IFNAMSIZ);
+		strncpy((char *)dest, out ? out->name : "", IFNAMSIZ);
 		break;
 	case NFT_META_IIFTYPE:
 		if (in == NULL)
-- 
2.11.0


^ permalink raw reply related

* [PATCH 3/8] netfilter: nft_meta_bridge: Eliminate 'out' label
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190731115157.27020-1-pablo@netfilter.org>

From: Phil Sutter <phil@nwl.cc>

The label is used just once and the code it points at is not reused, no
point in keeping it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/netfilter/nft_meta_bridge.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/bridge/netfilter/nft_meta_bridge.c b/net/bridge/netfilter/nft_meta_bridge.c
index a98dec2cf0cf..1804e867f715 100644
--- a/net/bridge/netfilter/nft_meta_bridge.c
+++ b/net/bridge/netfilter/nft_meta_bridge.c
@@ -57,13 +57,11 @@ static void nft_meta_bridge_get_eval(const struct nft_expr *expr,
 		return;
 	}
 	default:
-		goto out;
+		return nft_meta_get_eval(expr, regs, pkt);
 	}
 
 	strncpy((char *)dest, br_dev ? br_dev->name : "", IFNAMSIZ);
 	return;
-out:
-	return nft_meta_get_eval(expr, regs, pkt);
 err:
 	regs->verdict.code = NFT_BREAK;
 }
-- 
2.11.0


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox