Netdev List
 help / color / mirror / Atom feed
* [PATCH] octeontx2-af: Use GFP_ATOMIC under spin lock
From: Wei Yongjun @ 2018-10-25  1:42 UTC (permalink / raw)
  To: Sunil Goutham, Linu Cherian, Geetha sowjanya, Jerin Jacob
  Cc: Wei Yongjun, netdev, kernel-janitors

The function nix_update_mce_list() is called from
nix_update_bcast_mce_list(), and a spin lock is held
here, so we should use GFP_ATOMIC instead.

Fixes: 4b05528ebf0c ("octeontx2-af: Update bcast list upon NIXLF alloc/free")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
index 8890c95..a618e48 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
@@ -1294,7 +1294,7 @@ static int nix_update_mce_list(struct nix_mce_list *mce_list,
 		return 0;
 
 	/* Add a new one to the list, at the tail */
-	mce = kzalloc(sizeof(*mce), GFP_KERNEL);
+	mce = kzalloc(sizeof(*mce), GFP_ATOMIC);
 	if (!mce)
 		return -ENOMEM;
 	mce->idx = idx;

^ permalink raw reply related

* BUG: KMSAN: uninit-value in selinux_socket_bind, selinux_socket_connect_helper
From: Kyungtae Kim @ 2018-10-25 10:35 UTC (permalink / raw)
  To: davem; +Cc: lifeasageek, syzkaller, threeearcat, linux-kernel, netdev

We report two crashes related (in v4.19-rc8) :
"BUG: KMSAN: uninit-value in selinux_socket_bind"
"BUG: KMSAN: uninit-value in selinux_socket_connect_helper”

kernel config: https://kt0755.github.io/etc/config-4.19-rc2.kmsan
repro: https://kt0755.github.io/etc/repro.b0e55.c

Since both crashes share the same issue, we just explain one of the two.
When the third argument of bind() (i.e., addrlen) is zero, in __sys_bind(),
data copy from user sockaddr to kernel sockaddr does not occur
(net/socket.c:186).
However, a subsequent function selinux_socket_bind() tries to read
the kernel sockaddr (address->sa_family) that was not initialized at all.

Crash log1
==================================================================
BUG: KMSAN: uninit-value in selinux_socket_bind+0x61b/0x1040
security/selinux/hooks.c:4643
CPU: 0 PID: 19070 Comm: syz-executor6 Not tainted 4.19.0-rc8+ #18
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x305/0x460 lib/dump_stack.c:113
 kmsan_report+0x1a2/0x2e0 mm/kmsan/kmsan.c:917
 __msan_warning+0x7d/0xe0 mm/kmsan/kmsan_instr.c:500
 selinux_socket_bind+0x61b/0x1040 security/selinux/hooks.c:4643
 security_socket_bind+0x127/0x200 security/security.c:1390
 __sys_bind+0x577/0x7e0 net/socket.c:1479
 __do_sys_bind net/socket.c:1494 [inline]
 __se_sys_bind+0x8d/0xb0 net/socket.c:1492
 __x64_sys_bind+0x4a/0x70 net/socket.c:1492
 do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x4497b9
Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fdddf1e9c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 00007fdddf1ea6cc RCX: 00000000004497b9
RDX: 0000000000000000 RSI: 0000000020000040 RDI: 0000000000000013
RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000004c8 R14: 00000000006e8568 R15: 00007fdddf1ea700

Local variable description: ----address@__sys_bind
Variable was created at:
 __sys_bind+0x89/0x7e0 net/socket.c:1470
 __do_sys_bind net/socket.c:1494 [inline]
 __se_sys_bind+0x8d/0xb0 net/socket.c:1492
==================================================================

Crash log2
==================================================================
BUG: KMSAN: uninit-value in selinux_socket_connect_helper+0x55c/0x960
security/selinux/hooks.c:4775
CPU: 0 PID: 8234 Comm: syz-executor2 Not tainted 4.19.0-rc8+ #18
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x305/0x460 lib/dump_stack.c:113
 kmsan_report+0x1a2/0x2e0 mm/kmsan/kmsan.c:917
 __msan_warning+0x7d/0xe0 mm/kmsan/kmsan_instr.c:500
 selinux_socket_connect_helper+0x55c/0x960 security/selinux/hooks.c:4775
 selinux_socket_connect+0xbe/0x180 security/selinux/hooks.c:4834
 security_socket_connect+0x127/0x200 security/security.c:1395
 __sys_connect+0x577/0x850 net/socket.c:1660
 __do_sys_connect net/socket.c:1675 [inline]
 __se_sys_connect+0x8d/0xb0 net/socket.c:1672
 __x64_sys_connect+0x4a/0x70 net/socket.c:1672
 do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x4497b9
Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ff660d67c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 00007ff660d686cc RCX: 00000000004497b9
RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000013
RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000ae0 R14: 00000000006e8b80 R15: 00007ff660d68700

Local variable description: ----address@__sys_connect
Variable was created at:
 __sys_connect+0x89/0x850 net/socket.c:1647
 __do_sys_connect net/socket.c:1675 [inline]
 __se_sys_connect+0x8d/0xb0 net/socket.c:1672
==================================================================

We provide a simple patch below to stop them.
There are a few more lines that invoke move_addr_to_kernel(), but the
two of them
(bind and connect) seem to be the only cases to be corrected.


diff --git a/net/socket.c b/net/socket.c
index 390a8ec..de0931c2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1475,7 +1475,7 @@ int __sys_bind(int fd, struct sockaddr __user
*umyaddr, int addrlen)
        sock = sockfd_lookup_light(fd, &err, &fput_needed);
        if (sock) {
                err = move_addr_to_kernel(umyaddr, addrlen, &address);
-               if (err >= 0) {
+               if (err > 0) {
                        err = security_socket_bind(sock,
                                                   (struct sockaddr *)&address,
                                                   addrlen);
@@ -1653,7 +1653,7 @@ int __sys_connect(int fd, struct sockaddr __user
*uservaddr, int addrlen)
        if (!sock)
                goto out;
        err = move_addr_to_kernel(uservaddr, addrlen, &address);
-       if (err < 0)
+       if (err <= 0)
                goto out_put;

        err =


Thanks,
Kyungtae Kim

^ permalink raw reply related

* [PATCH] dp83867: fix speed 10 in sgmii mode
From: Max Uvarov @ 2018-10-25 11:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: andrew, f.fainelli, netdev, Max Uvarov

For support 10Mps sped in SGMII mode DP83867_10M_SGMII_RATE_ADAPT bit
of DP83867_10M_SGMII_CFG register has to be cleared by software.
That does not affect speeds 100 and 1000 so can be done on init.

Signed-off-by: Max Uvarov <muvarov@gmail.com>
---
 http://www.ti.com/lit/ds/symlink/dp83867e.pdf
 page 87 for register bits
 and
 page 23 last sentance that bit needs to be cleared.

 drivers/net/phy/dp83867.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index ab58224f897f..1df4da3f70a9 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -37,6 +37,7 @@
 #define DP83867_STRAP_STS1	0x006E
 #define DP83867_RGMIIDCTL	0x0086
 #define DP83867_IO_MUX_CFG	0x0170
+#define DP83867_10M_SGMII_CFG  0x016F
 
 #define DP83867_SW_RESET	BIT(15)
 #define DP83867_SW_RESTART	BIT(14)
@@ -79,6 +80,9 @@
 /* CFG4 bits */
 #define DP83867_CFG4_PORT_MIRROR_EN              BIT(0)
 
+/* 10M_SGMII_CFG bits */
+#define DP83867_10M_SGMII_RATE_ADAPT		 BIT(7)
+
 enum {
 	DP83867_PORT_MIRROING_KEEP,
 	DP83867_PORT_MIRROING_EN,
@@ -285,6 +289,24 @@ static int dp83867_config_init(struct phy_device *phydev)
 		}
 	}
 
+	if (phydev->interface == PHY_INTERFACE_MODE_SGMII) {
+		/* For support SPEED_10 in SGMII mode
+		 * DP83867_10M_SGMII_RATE_ADAPT bit
+		 * has to be cleared by software. That
+		 * does not affect SPEED_100 and
+		 * SPEED_1000.
+		 */
+		val = phy_read_mmd(phydev, DP83867_DEVADDR,
+				   DP83867_10M_SGMII_CFG);
+		val &= ~DP83867_10M_SGMII_RATE_ADAPT;
+		ret = phy_write_mmd(phydev, DP83867_DEVADDR,
+				    DP83867_10M_SGMII_CFG, val);
+		if (ret) {
+			WARN_ONCE(1, "dp83867: error DP83867_10M_SGMII_CFG read\n");
+			return ret;
+		}
+	}
+
 	/* Enable Interrupt output INT_OE in CFG3 register */
 	if (phy_interrupt_is_valid(phydev)) {
 		val = phy_read(phydev, DP83867_CFG3);
-- 
2.17.1

^ permalink raw reply related

* Re: BUG: KMSAN: uninit-value in selinux_socket_bind, selinux_socket_connect_helper
From: Alexander Potapenko @ 2018-10-25 11:07 UTC (permalink / raw)
  To: kt0755
  Cc: David Miller, lifeasageek, syzkaller, threeearcat, LKML,
	Networking, alexey.kodanev
In-Reply-To: <CAEAjamv0P63dTXEjzbDp6ahaNRuUiRPWTnj_N2RV7monBk4bHw@mail.gmail.com>

On Thu, Oct 25, 2018 at 12:35 PM Kyungtae Kim <kt0755@gmail.com> wrote:
>
> We report two crashes related (in v4.19-rc8) :
> "BUG: KMSAN: uninit-value in selinux_socket_bind"
> "BUG: KMSAN: uninit-value in selinux_socket_connect_helper”
>
> kernel config: https://kt0755.github.io/etc/config-4.19-rc2.kmsan
> repro: https://kt0755.github.io/etc/repro.b0e55.c
>
> Since both crashes share the same issue, we just explain one of the two.
> When the third argument of bind() (i.e., addrlen) is zero, in __sys_bind(),
> data copy from user sockaddr to kernel sockaddr does not occur
> (net/socket.c:186).
> However, a subsequent function selinux_socket_bind() tries to read
> the kernel sockaddr (address->sa_family) that was not initialized at all.
>
> Crash log1
> ==================================================================
> BUG: KMSAN: uninit-value in selinux_socket_bind+0x61b/0x1040
> security/selinux/hooks.c:4643
> CPU: 0 PID: 19070 Comm: syz-executor6 Not tainted 4.19.0-rc8+ #18
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x305/0x460 lib/dump_stack.c:113
>  kmsan_report+0x1a2/0x2e0 mm/kmsan/kmsan.c:917
>  __msan_warning+0x7d/0xe0 mm/kmsan/kmsan_instr.c:500
>  selinux_socket_bind+0x61b/0x1040 security/selinux/hooks.c:4643
>  security_socket_bind+0x127/0x200 security/security.c:1390
>  __sys_bind+0x577/0x7e0 net/socket.c:1479
>  __do_sys_bind net/socket.c:1494 [inline]
>  __se_sys_bind+0x8d/0xb0 net/socket.c:1492
>  __x64_sys_bind+0x4a/0x70 net/socket.c:1492
>  do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
>  entry_SYSCALL_64_after_hwframe+0x63/0xe7
> RIP: 0033:0x4497b9
> Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> 01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007fdddf1e9c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
> RAX: ffffffffffffffda RBX: 00007fdddf1ea6cc RCX: 00000000004497b9
> RDX: 0000000000000000 RSI: 0000000020000040 RDI: 0000000000000013
> RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
> R13: 00000000000004c8 R14: 00000000006e8568 R15: 00007fdddf1ea700
>
> Local variable description: ----address@__sys_bind
> Variable was created at:
>  __sys_bind+0x89/0x7e0 net/socket.c:1470
>  __do_sys_bind net/socket.c:1494 [inline]
>  __se_sys_bind+0x8d/0xb0 net/socket.c:1492
> ==================================================================
>
> Crash log2
> ==================================================================
> BUG: KMSAN: uninit-value in selinux_socket_connect_helper+0x55c/0x960
> security/selinux/hooks.c:4775
> CPU: 0 PID: 8234 Comm: syz-executor2 Not tainted 4.19.0-rc8+ #18
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x305/0x460 lib/dump_stack.c:113
>  kmsan_report+0x1a2/0x2e0 mm/kmsan/kmsan.c:917
>  __msan_warning+0x7d/0xe0 mm/kmsan/kmsan_instr.c:500
>  selinux_socket_connect_helper+0x55c/0x960 security/selinux/hooks.c:4775
>  selinux_socket_connect+0xbe/0x180 security/selinux/hooks.c:4834
>  security_socket_connect+0x127/0x200 security/security.c:1395
>  __sys_connect+0x577/0x850 net/socket.c:1660
>  __do_sys_connect net/socket.c:1675 [inline]
>  __se_sys_connect+0x8d/0xb0 net/socket.c:1672
>  __x64_sys_connect+0x4a/0x70 net/socket.c:1672
>  do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
>  entry_SYSCALL_64_after_hwframe+0x63/0xe7
> RIP: 0033:0x4497b9
> Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> 01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007ff660d67c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
> RAX: ffffffffffffffda RBX: 00007ff660d686cc RCX: 00000000004497b9
> RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000013
> RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
> R13: 0000000000000ae0 R14: 00000000006e8b80 R15: 00007ff660d68700
>
> Local variable description: ----address@__sys_connect
> Variable was created at:
>  __sys_connect+0x89/0x850 net/socket.c:1647
>  __do_sys_connect net/socket.c:1675 [inline]
>  __se_sys_connect+0x8d/0xb0 net/socket.c:1672
> ==================================================================
>
> We provide a simple patch below to stop them.
> There are a few more lines that invoke move_addr_to_kernel(), but the
> two of them
> (bind and connect) seem to be the only cases to be corrected.
>
>
> diff --git a/net/socket.c b/net/socket.c
> index 390a8ec..de0931c2 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -1475,7 +1475,7 @@ int __sys_bind(int fd, struct sockaddr __user
> *umyaddr, int addrlen)
>         sock = sockfd_lookup_light(fd, &err, &fput_needed);
>         if (sock) {
>                 err = move_addr_to_kernel(umyaddr, addrlen, &address);
> -               if (err >= 0) {
> +               if (err > 0) {
>                         err = security_socket_bind(sock,
>                                                    (struct sockaddr *)&address,
>                                                    addrlen);
> @@ -1653,7 +1653,7 @@ int __sys_connect(int fd, struct sockaddr __user
> *uservaddr, int addrlen)
>         if (!sock)
>                 goto out;
>         err = move_addr_to_kernel(uservaddr, addrlen, &address);
> -       if (err < 0)
> +       if (err <= 0)
>                 goto out_put;
>
>         err =
I believe a better fix would be to add an |addrlen| check to
selinux_socket_bind(), which is illegally assuming
|address->sa_family| contains any reasonable data regardless of actual
address length.
I suspect the bug has been introduced in
https://github.com/torvalds/linux/commit/0f8db8cc73df60b3de9a5eebd8f117b56eff5b03
>
> Thanks,
> Kyungtae Kim
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply

* RE: [PATCH 3/3] usb: renesas_usbhs: Remove dummy runtime PM callbacks
From: Yoshihiro Shimoda @ 2018-10-25  2:45 UTC (permalink / raw)
  To: Jarkko Nikula, linux-pm@vger.kernel.org
  Cc: linux-i2c@vger.kernel.org, Wolfram Sang, netdev@vger.kernel.org,
	David S . Miller, Sergei Shtylyov,
	linux-renesas-soc@vger.kernel.org, linux-usb@vger.kernel.org
In-Reply-To: <20181024135134.28456-4-jarkko.nikula@linux.intel.com>

Hi Jarkko,

> From: Jarkko Nikula, Sent: Wednesday, October 24, 2018 10:52 PM
> 
> Platform drivers don't need dummy runtime PM callbacks that just return
> success in order to have runtime PM happening. This has changed since
> following commits:
> 
> 05aa55dddb9e ("PM / Runtime: Lenient generic runtime pm callbacks")
> 543f2503a956 ("PM / platform_bus: Allow runtime PM by default")
> 8b313a38ecff ("PM / Platform: Use generic runtime PM callbacks directly")
> 
> Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
> ---
> Only build tested.

Thank you for the patch!

Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>

> ---
>  drivers/usb/renesas_usbhs/common.c | 14 --------------
>  1 file changed, 14 deletions(-)
> 
> diff --git a/drivers/usb/renesas_usbhs/common.c b/drivers/usb/renesas_usbhs/common.c
> index a3e1290d682d..0e760f228dd8 100644
> --- a/drivers/usb/renesas_usbhs/common.c
> +++ b/drivers/usb/renesas_usbhs/common.c
> @@ -874,23 +874,9 @@ static int usbhsc_resume(struct device *dev)
>  	return 0;
>  }
> 
> -static int usbhsc_runtime_nop(struct device *dev)
> -{
> -	/* Runtime PM callback shared between ->runtime_suspend()
> -	 * and ->runtime_resume(). Simply returns success.
> -	 *
> -	 * This driver re-initializes all registers after
> -	 * pm_runtime_get_sync() anyway so there is no need
> -	 * to save and restore registers here.
> -	 */

I guess this code came from i2c-sh_mobile.c or sh_eth.c
and we didn't realize this code didn't need at that time (Oct 10 2011).

Best regards,
Yoshihiro Shimoda

> -	return 0;
> -}
> -
>  static const struct dev_pm_ops usbhsc_pm_ops = {
>  	.suspend		= usbhsc_suspend,
>  	.resume			= usbhsc_resume,
> -	.runtime_suspend	= usbhsc_runtime_nop,
> -	.runtime_resume		= usbhsc_runtime_nop,
>  };
> 
>  static struct platform_driver renesas_usbhs_driver = {
> --
> 2.19.1

^ permalink raw reply

* Re: [PATCH ghak90 (was ghak32) V4 03/10] audit: log container info of syscalls
From: Richard Guy Briggs @ 2018-10-25 12:22 UTC (permalink / raw)
  To: Paul Moore
  Cc: containers, linux-api, linux-audit, linux-fsdevel, linux-kernel,
	netdev, netfilter-devel, ebiederm, luto, carlos, dhowells, viro,
	simo, Eric Paris, Serge Hallyn
In-Reply-To: <166a9dae538.280e.85c95baa4474aabc7814e68940a78392@paul-moore.com>

On 2018-10-25 07:13, Paul Moore wrote:
> On October 25, 2018 1:43:16 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> > On 2018-10-24 16:55, Paul Moore wrote:
> >> On Wed, Oct 24, 2018 at 11:15 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> >>> On 2018-10-19 19:16, Paul Moore wrote:
> >>>> On Sun, Aug 5, 2018 at 4:32 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> >
> 
> ...
> 
> >
> >>>> However, I do care about the "op" field in this record.  It just
> >>>> doesn't make any sense; the way you are using it it is more of a
> >>>> context field than an operations field, and even then why is the
> >>>> context important from a logging and/or security perspective?  Drop it
> >>>> please.
> >>>
> >>> I'll rename it to whatever you like.  I'd suggest "ref=".  The reason I
> >>> think it is important is there are multiple sources that aren't always
> >>> obvious from the other records to which it is associated.  In the case
> >>> of ptrace and signals, there can be many target tasks listed (OBJ_PID)
> >>> with no other way to distinguish the matching audit container identifier
> >>> records all for one event.  This is in addition to the default syscall
> >>> container identifier record.  I'm not currently happy with the text
> >>> content to link the two, but that should be solvable (most obvious is
> >>> taret PID).  Throwing away this information seems shortsighted.
> >>
> >> It would be helpful if you could generate real audit events
> >> demonstrating the problems you are describing, as well as a more
> >> standard syscall event, so we can discuss some possible solutions.
> >
> > If the auditted process is in a container and it ptraces or signals
> > another process in a container, there will be two AUDIT_CONTAINER
> > records for the same event that won't be identified as to which record
> > belongs to which process or other record (SYSCALL vs 1+ OBJ_PID
> > records).  There could be many signals recorded, each with their own
> > OBJ_PID record.  The first is stored in the audit context and additional
> > ones are stored in a chained struct that can accommodate 16 entries each.
> >
> > (See audit_signal_info(), __audit_ptrace().)
> >
> > (As a side note, on code inspection it appears that a signal target
> > would get overwritten by a ptrace action if they were to happen in that
> > order.)
> 
> As requested above, please respond with real audit events generated by
> this patchset so that we can discuss possible solutions.

Ok, then we should be developping a test to test ptrace and signal
auditting in general since we don't have current experience/evidence
that those even work (or rip them out if not).

> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: [PATCH ghak90 (was ghak32) V4 03/10] audit: log container info of syscalls
From: Richard Guy Briggs @ 2018-10-25 12:27 UTC (permalink / raw)
  To: Paul Moore
  Cc: luto, carlos, linux-api, containers, linux-kernel, viro, dhowells,
	linux-audit, netfilter-devel, ebiederm, simo, netdev,
	linux-fsdevel, Eric Paris, Serge Hallyn
In-Reply-To: <CAHC9VhR02siURgadQs0EikcOm0d_TPAi4jJYCx4NkA=wQ5rjSA@mail.gmail.com>

On 2018-10-25 06:49, Paul Moore wrote:
> On Thu, Oct 25, 2018 at 2:06 AM Steve Grubb <sgrubb@redhat.com> wrote:
> > On Wed, 24 Oct 2018 20:42:55 -0400
> > Richard Guy Briggs <rgb@redhat.com> wrote:
> > > On 2018-10-24 16:55, Paul Moore wrote:
> > > > On Wed, Oct 24, 2018 at 11:15 AM Richard Guy Briggs
> > > > <rgb@redhat.com> wrote:
> > > > > On 2018-10-19 19:16, Paul Moore wrote:
> > > > > > On Sun, Aug 5, 2018 at 4:32 AM Richard Guy Briggs
> > > > > > <rgb@redhat.com> wrote:
> 
> ...
> 
> > > > > > > +/*
> > > > > > > + * audit_log_contid - report container info
> > > > > > > + * @tsk: task to be recorded
> > > > > > > + * @context: task or local context for record
> > > > > > > + * @op: contid string description
> > > > > > > + */
> > > > > > > +int audit_log_contid(struct task_struct *tsk,
> > > > > > > +                            struct audit_context *context,
> > > > > > > char *op) +{
> > > > > > > +       struct audit_buffer *ab;
> > > > > > > +
> > > > > > > +       if (!audit_contid_set(tsk))
> > > > > > > +               return 0;
> > > > > > > +       /* Generate AUDIT_CONTAINER record with container ID
> > > > > > > */
> > > > > > > +       ab = audit_log_start(context, GFP_KERNEL,
> > > > > > > AUDIT_CONTAINER);
> > > > > > > +       if (!ab)
> > > > > > > +               return -ENOMEM;
> > > > > > > +       audit_log_format(ab, "op=%s contid=%llu",
> > > > > > > +                        op, audit_get_contid(tsk));
> > > > > > > +       audit_log_end(ab);
> > > > > > > +       return 0;
> > > > > > > +}
> > > > > > > +EXPORT_SYMBOL(audit_log_contid);
> > > > > >
> > > > > > As discussed in the previous iteration of the patch, I prefer
> > > > > > AUDIT_CONTAINER_ID here over AUDIT_CONTAINER.  If you feel
> > > > > > strongly about keeping it as-is with AUDIT_CONTAINER I suppose
> > > > > > I could live with that, but it is isn't my first choice.
> > > > >
> > > > > I don't have a strong opinion on this one, mildly preferring the
> > > > > shorter one only because it is shorter.
> > > >
> > > > We already have multiple AUDIT_CONTAINER* record types, so it seems
> > > > as though we should use "AUDIT_CONTAINER" as a prefix of sorts,
> > > > rather than a type itself.
> > >
> > > I'm fine with that.  I'd still like to hear Steve's input.  He had
> > > stronger opinions than me.
> >
> > The creation event should be separate and distinct from the continuing
> > use when its used as a supplemental record. IOW, binding the ID to a
> > container is part of the lifecycle and needs to be kept distinct.
> 
> Steve's comment is pretty ambiguous when it comes to AUDIT_CONTAINER
> vs AUDIT_CONTAINER_ID, but one could argue that AUDIT_CONTAINER_ID
> helps distinguish the audit container id marking record and gets to
> what I believe is the spirit of Steve's comment.  Taking this in
> context with my previous remarks, let's switch to using
> AUDIT_CONTAINER_ID.

I suspect Steve is mixing up AUDIT_CONTAINER_OP with AUDIT_CONTAINER_ID,
confusing the fact that they are two seperate records.  As a summary,
the suggested records are:
	CONTAINER_OP	audit container identifier creation
	CONTAINER	audit container identifier aux record to an event

and what Paul is suggesting (which is fine by me) is:
	CONTAINER_OP	audit container identifier creation event
	CONTAINER_ID	audit container identifier aux record to an event

Steve, please indicate you are fine with this.

> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: [PATCH] octeontx2-af: Use GFP_ATOMIC under spin lock
From: Sunil Kovvuri @ 2018-10-25  4:18 UTC (permalink / raw)
  To: weiyongjun1
  Cc: Sunil Goutham, Linu Cherian, Geetha sowjanya, jerinj,
	Linux Netdev List, kernel-janitors
In-Reply-To: <1540431746-176759-1-git-send-email-weiyongjun1@huawei.com>

On Thu, Oct 25, 2018 at 7:02 AM Wei Yongjun <weiyongjun1@huawei.com> wrote:
>
> The function nix_update_mce_list() is called from
> nix_update_bcast_mce_list(), and a spin lock is held
> here, so we should use GFP_ATOMIC instead.
>
> Fixes: 4b05528ebf0c ("octeontx2-af: Update bcast list upon NIXLF alloc/free")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> ---
>  drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
> index 8890c95..a618e48 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
> @@ -1294,7 +1294,7 @@ static int nix_update_mce_list(struct nix_mce_list *mce_list,
>                 return 0;
>
>         /* Add a new one to the list, at the tail */
> -       mce = kzalloc(sizeof(*mce), GFP_KERNEL);
> +       mce = kzalloc(sizeof(*mce), GFP_ATOMIC);
>         if (!mce)
>                 return -ENOMEM;
>         mce->idx = idx;
>

Thanks for pointing this and for the patch.
FYI, this driver is no where near to complete, after net-next is open we will
start submitting rest of the patches. There is a subsequent patch
which changes most of the locks to mutex from spinlock, which will
fix this issue as well.

It would be great if above patch is ignored for now, as in it's
current state driver is
not complete and usable. Otherwise also fine, I will change change/fix
this again.

Regards,
Sunil.

^ permalink raw reply

* Re: [PATCH RFC v3 0/3] Rlimit for module space
From: Michal Hocko @ 2018-10-25 14:18 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: daniel@iogearbox.net, jeyu@kernel.org, ard.biesheuvel@linaro.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	jannh@google.com, keescook@chromium.org, arjan@linux.intel.com,
	netdev@vger.kernel.org, tglx@linutronix.de,
	linux-mips@linux-mips.org, linux-s390@vger.kernel.org,
	x86@kernel.org, kristen@linux.intel.com, Dock, Deneen T,
	catalin.marinas@arm.com, mingo@redhat.com, will.deacon@arm.com
In-Reply-To: <d1fec827d028168047eafbac56e8e47d37cf7fb5.camel@intel.com>

On Thu 25-10-18 01:01:44, Edgecombe, Rick P wrote:
[...]
> FWIW, cgroups seems like a better solution than rlimit to me too. Maybe you all
> already know, but it looks like the cgroups vmalloc charge is done in the main
> page allocator and counts against the whole kernel memory limit.

I am not sure I understand what you are saying but let me clarify that
vmalloc memory is accounted against memory cgroup associated with the
user context calling vmalloc. All you need to do is to add __GFP_ACCOUNT
to the gfp mask. The only challenge here is the charged memory life
cycle. When does it get deallocated? In the worst case the memory is not
tight to any user context and as such it doesn't get deallocated by
killing all processes which could lead to memcg limit exhaustion.

> A user may want
> to have a higher kernel limit than the module space size, so it seems it isn't
> enough by itself and some new limit would need to be added.

If there is another restriction on this memory then you are right.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH 1/2] Bluetooth: Add quirk for reading BD_ADDR from fwnode property
From: Balakrishna Godavarthi @ 2018-10-25 14:36 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: Marcel Holtmann, Johan Hedberg, David S . Miller, Loic Poulain,
	linux-bluetooth, netdev, linux-kernel, Brian Norris,
	Dmitry Grinberg
In-Reply-To: <20181025002134.256791-2-mka@chromium.org>

On 2018-10-25 05:51, Matthias Kaehlcke wrote:
> Add HCI_QUIRK_USE_BDADDR_PROPERTY to allow controllers to retrieve
> the public Bluetooth address from the firmware node property
> 'local-bd-address'. If quirk is set and the property does not exist
> or is invalid the controller is marked as unconfigured.
> 
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> ---
> hci_dev_get_bd_addr_from_property() currently assumes that the
> firmware node with 'local-bd-address' is from hdev->dev.parent, not
> sure if this universally true. However if it is true for existing
> device that might use this interface we can assume this for now
> (unless there is a clear solution now), and cross the bridge of
> finding an alternative when we actually encounter the situation.
> One option could be to look for the first parent that has a fwnode.
> ---
>  include/net/bluetooth/hci.h | 12 +++++++++++
>  net/bluetooth/hci_core.c    | 42 +++++++++++++++++++++++++++++++++++++
>  net/bluetooth/mgmt.c        |  6 ++++--
>  3 files changed, 58 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
> index cdd9f1fe7cfa..a5d748099752 100644
> --- a/include/net/bluetooth/hci.h
> +++ b/include/net/bluetooth/hci.h
> @@ -158,6 +158,18 @@ enum {
>  	 */
>  	HCI_QUIRK_INVALID_BDADDR,
> 
> +	/* When this quirk is set, the public Bluetooth address
> +	 * initially reported by HCI Read BD Address command
> +	 * is considered invalid. The public BD Address can be
> +	 * specified in the fwnode property 'local-bd-address'.
> +	 * If this property does not exist or is invalid controller
> +	 * configuration is required before this device can be used.
> +	 *
> +	 * This quirk can be set before hci_register_dev is called or
> +	 * during the hdev->setup vendor callback.
> +	 */
> +	HCI_QUIRK_USE_BDADDR_PROPERTY,
> +
>  	/* When this quirk is set, the duplicate filtering during
>  	 * scanning is based on Bluetooth devices addresses. To allow
>  	 * RSSI based updates, restart scanning if needed.
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 74b29c7d841c..97214262c4fb 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -30,6 +30,7 @@
>  #include <linux/rfkill.h>
>  #include <linux/debugfs.h>
>  #include <linux/crypto.h>
> +#include <linux/property.h>
>  #include <asm/unaligned.h>
> 
>  #include <net/bluetooth/bluetooth.h>
> @@ -1355,9 +1356,40 @@ int hci_inquiry(void __user *arg)
>  	return err;
>  }
> 
> +/**
> + * hci_dev_get_bd_addr_from_property - Get the Bluetooth Device 
> Address
> + *				       (BD_ADDR) for a HCI device from
> + *				       a firmware node property.
> + * @hdev:	The HCI device
> + *
> + * Search the firmware node for 'local-bd-address'.
> + *
> + * All-zero BD addresses are rejected, because those could be 
> properties
> + * that exist in the firmware tables, but were not updated by the 
> firmware. For
> + * example, the DTS could define 'local-bd-address', with zero BD 
> addresses.
> + */
> +static int hci_dev_get_bd_addr_from_property(struct hci_dev *hdev)
> +{
> +	struct fwnode_handle *fwnode = dev_fwnode(hdev->dev.parent);
> +	bdaddr_t ba;
> +	int ret;
> +
> +	ret = fwnode_property_read_u8_array(fwnode, "local-bd-address",
> +					    (u8 *)&ba, sizeof(ba));
> +	if (ret < 0)
> +		return ret;
> +	if (!bacmp(&ba, BDADDR_ANY))
> +		return -ENODATA;
> +
> +	hdev->public_addr = ba;
> +
> +	return 0;
> +}
> +
>  static int hci_dev_do_open(struct hci_dev *hdev)
>  {
>  	int ret = 0;
> +	bool bd_addr_set = false;
> 
>  	BT_DBG("%s %p", hdev->name, hdev);
> 
> @@ -1422,6 +1454,16 @@ static int hci_dev_do_open(struct hci_dev *hdev)
>  		if (hdev->setup)
>  			ret = hdev->setup(hdev);
> 
> +		if (test_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks)) {
> +			if (!hci_dev_get_bd_addr_from_property(hdev))
> +				if (hdev->set_bdaddr &&
> +				    !hdev->set_bdaddr(hdev, &hdev->public_addr))
> +					bd_addr_set = true;
> +
> +			if (!bd_addr_set)
> +				hci_dev_set_flag(hdev, HCI_UNCONFIGURED);
> +		}
> +
>  		/* The transport driver can set these quirks before
>  		 * creating the HCI device or in its setup callback.
>  		 *
> diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
> index 3bdc8f3ca259..3d9edb752403 100644
> --- a/net/bluetooth/mgmt.c
> +++ b/net/bluetooth/mgmt.c
> @@ -551,7 +551,8 @@ static bool is_configured(struct hci_dev *hdev)
>  	    !hci_dev_test_flag(hdev, HCI_EXT_CONFIGURED))
>  		return false;
> 
> -	if (test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) &&
> +	if ((test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) ||
> +	     test_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks)) &&
>  	    !bacmp(&hdev->public_addr, BDADDR_ANY))
>  		return false;
> 
> @@ -566,7 +567,8 @@ static __le32 get_missing_options(struct hci_dev 
> *hdev)
>  	    !hci_dev_test_flag(hdev, HCI_EXT_CONFIGURED))
>  		options |= MGMT_OPTION_EXTERNAL_CONFIG;
> 
> -	if (test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) &&
> +	if ((test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) ||
> +	     test_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks)) &&
>  	    !bacmp(&hdev->public_addr, BDADDR_ANY))
>  		options |= MGMT_OPTION_PUBLIC_ADDRESS;

Looks fine to me.

Reviewed-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>
-- 
Regards
Balakrishna.

^ permalink raw reply

* Re: [PATCH 2/2] Bluetooth: btqcomsmd: use HCI_QUIRK_USE_BDADDR_PROPERTY
From: Balakrishna Godavarthi @ 2018-10-25 14:40 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: Marcel Holtmann, Johan Hedberg, David S . Miller, Loic Poulain,
	linux-bluetooth, netdev, linux-kernel, Brian Norris,
	Dmitry Grinberg
In-Reply-To: <20181025002134.256791-3-mka@chromium.org>

Hi Matthias,

On 2018-10-25 05:51, Matthias Kaehlcke wrote:
> Use the HCI_QUIRK_USE_BDADDR_PROPERTY quirk to let the HCI
> core handle the reading of 'local-bd-address'. With this there
> is no need to set HCI_QUIRK_INVALID_BDADDR, the case of a
> non-existing or invalid fwnode property is handled by the core
> code.
> 
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> ---
> I couldn't actually test the changes in this driver since I
> don't have a device with this controller. Could someone
> from Qualcomm help with this?
> ---
>  drivers/bluetooth/btqcomsmd.c | 28 +++-------------------------
>  1 file changed, 3 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/bluetooth/btqcomsmd.c 
> b/drivers/bluetooth/btqcomsmd.c
> index 7df3eed1ef5e..e5841602c4f1 100644
> --- a/drivers/bluetooth/btqcomsmd.c
> +++ b/drivers/bluetooth/btqcomsmd.c
> @@ -125,23 +125,10 @@ static int btqcomsmd_setup(struct hci_dev *hdev)
>  		return PTR_ERR(skb);
>  	kfree_skb(skb);
> 
> -	/* Devices do not have persistent storage for BD address. If no
> -	 * BD address has been retrieved during probe, mark the device
> -	 * as having an invalid BD address.
> +	/* Devices do not have persistent storage for BD address. Retrieve
> +	 * it from the firmware node property.
>  	 */
> -	if (!bacmp(&btq->bdaddr, BDADDR_ANY)) {
> -		set_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks);
> -		return 0;
> -	}
> -
> -	/* When setting a configured BD address fails, mark the device
> -	 * as having an invalid BD address.
> -	 */
> -	err = qca_set_bdaddr_rome(hdev, &btq->bdaddr);
> -	if (err) {
> -		set_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks);
> -		return 0;
> -	}
> +	set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks);
> 
>  	return 0;
>  }
> @@ -169,15 +156,6 @@ static int btqcomsmd_probe(struct platform_device 
> *pdev)
>  	if (IS_ERR(btq->cmd_channel))
>  		return PTR_ERR(btq->cmd_channel);
> 
> -	/* The local-bd-address property is usually injected by the
> -	 * bootloader which has access to the allocated BD address.
> -	 */
> -	if (!of_property_read_u8_array(pdev->dev.of_node, "local-bd-address",
> -				       (u8 *)&btq->bdaddr, sizeof(bdaddr_t))) {
> -		dev_info(&pdev->dev, "BD address %pMR retrieved from device-tree",
> -			 &btq->bdaddr);
> -	}
> -
>  	hdev = hci_alloc_dev();
>  	if (!hdev)
>  		return -ENOMEM;

nit: can be remove unused "bdaddr_t bdaddr" variable.
https://elixir.bootlin.com/linux/v4.19-rc8/source/drivers/bluetooth/btqcomsmd.c#L31
Apart from this, It looks ok to me.

Reviewed-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>
-- 
Regards
Balakrishna.

^ permalink raw reply

* Re: [PATCH net-next v8 28/28] net: WireGuard secure network tunnel
From: Jason A. Donenfeld @ 2018-10-25 14:43 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: LKML, Netdev, Linux Crypto Mailing List, David Miller,
	Greg Kroah-Hartman
In-Reply-To: <20181020224706.GC14816@lunn.ch>

Hi Andrew,

Thanks for the review. Comments and fix links are inline below.

On Sun, Oct 21, 2018 at 12:47 AM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > +#define choose_node(parent, key)                                               \
> > +     parent->bit[(key[parent->bit_at_a] >> parent->bit_at_b) & 1]
> This should be a function, not a macro.

That prevents it from being used as an lvalue, which is mostly where
it's used, with the rcu-protected structure, so this will remain a
macro. I'll make it upper-case though.

> > +
> > +static void node_free_rcu(struct rcu_head *rcu)
> > +{
> > +     kfree(container_of(rcu, struct allowedips_node, rcu));
> > +}
> > +
> > +#define push_rcu(stack, p, len) ({                                             \
> > +             if (rcu_access_pointer(p)) {                                   \
> > +                     WARN_ON(IS_ENABLED(DEBUG) && (len) >= 128);            \
> > +                     stack[(len)++] = rcu_dereference_raw(p);               \
> > +             }                                                              \
> > +             true;                                                          \
> > +     })
>
> This also looks like it could be a function.

You're right. I've changed this now. That produces slightly worse code
on gcc 8.2, but that's not really hot path anyway, so it doesn't
matter.

> > +static void root_free_rcu(struct rcu_head *rcu)
> > +{
> > +     struct allowedips_node *node, *stack[128] = {
> > +             container_of(rcu, struct allowedips_node, rcu) };
> > +     unsigned int len = 1;
> > +
> > +     while (len > 0 && (node = stack[--len]) &&
> > +            push_rcu(stack, node->bit[0], len) &&
> > +            push_rcu(stack, node->bit[1], len))
> > +             kfree(node);
> > +}
> > +
> > +#define ref(p) rcu_access_pointer(p)
> > +#define deref(p) rcu_dereference_protected(*(p), lockdep_is_held(lock))
>
> Macros should be uppercase, or better still, functions.
>
> > +#define push(p) ({                                                             \
> > +             WARN_ON(IS_ENABLED(DEBUG) && len >= 128);                      \
> > +             stack[len++] = p;                                              \
> > +     })
>
> This one definitely should be upper case, to warn readers it has
> unexpected side effects.

I've made these little helper macros uppercase.

>
> > +
> > +static void walk_remove_by_peer(struct allowedips_node __rcu **top,
> > +                             struct wg_peer *peer, struct mutex *lock)
> > +{
> > +     struct allowedips_node __rcu **stack[128], **nptr;
> > +     struct allowedips_node *node, *prev;
> > +     unsigned int len;
> > +
> > +     if (unlikely(!peer || !ref(*top)))
> > +             return;
> > +
> > +     for (prev = NULL, len = 0, push(top); len > 0; prev = node) {
> > +             nptr = stack[len - 1];
> > +             node = deref(nptr);
> > +             if (!node) {
> > +                     --len;
> > +                     continue;
> > +             }
> > +             if (!prev || ref(prev->bit[0]) == node ||
> > +                 ref(prev->bit[1]) == node) {
> > +                     if (ref(node->bit[0]))
> > +                             push(&node->bit[0]);
> > +                     else if (ref(node->bit[1]))
> > +                             push(&node->bit[1]);
> > +             } else if (ref(node->bit[0]) == prev) {
> > +                     if (ref(node->bit[1]))
> > +                             push(&node->bit[1]);
> > +             } else {
> > +                     if (rcu_dereference_protected(node->peer,
> > +                             lockdep_is_held(lock)) == peer) {
> > +                             RCU_INIT_POINTER(node->peer, NULL);
> > +                             if (!node->bit[0] || !node->bit[1]) {
> > +                                     rcu_assign_pointer(*nptr,
> > +                                     deref(&node->bit[!ref(node->bit[0])]));
> > +                                     call_rcu_bh(&node->rcu, node_free_rcu);
> > +                                     node = deref(nptr);
> > +                             }
> > +                     }
> > +                     --len;
> > +             }
> > +     }
> > +}
> > +
> > +#undef ref
> > +#undef deref
> > +#undef push
> > +
> > +static __always_inline unsigned int fls128(u64 a, u64 b)
> > +{
> > +     return a ? fls64(a) + 64U : fls64(b);
> > +}
>
> Does the compiler actually get this wrong? Not inline it?

Actually for this function (in contrast with all of the other ones
marked as such, which really must have the annotation), recent gcc
gets it right, so I've removed the annotation. Nice catch.

> > +
> > +static __always_inline u8 common_bits(const struct allowedips_node *node,
> > +                                   const u8 *key, u8 bits)
> > +{
> > +     if (bits == 32)
> > +             return 32U - fls(*(const u32 *)node->bits ^ *(const u32 *)key);
> > +     else if (bits == 128)
> > +             return 128U - fls128(
> > +                     *(const u64 *)&node->bits[0] ^ *(const u64 *)&key[0],
> > +                     *(const u64 *)&node->bits[8] ^ *(const u64 *)&key[8]);
> > +     return 0;
> > +}
> > +
> > +/* This could be much faster if it actually just compared the common bits
> > + * properly, by precomputing a mask bswap(~0 << (32 - cidr)), and the rest, but
> > + * it turns out that common_bits is already super fast on modern processors,
> > + * even taking into account the unfortunate bswap. So, we just inline it like
> > + * this instead.
> > + */
> > +#define prefix_matches(node, key, bits)                                        \
> > +     (common_bits(node, key, bits) >= (node)->cidr)
>
> Could be a function.

I've made this a function now and confirmed codegen did not change
significantly.

> > +/* Returns a strong reference to a peer */
> > +static __always_inline struct wg_peer *
> > +lookup(struct allowedips_node __rcu *root, u8 bits, const void *be_ip)
> > +{
> > +     u8 ip[16] __aligned(__alignof(u64));
>
> You virtually never see aligned stack variables. This needs some sort
> of comment.

I've added a comment. The reason, if you're curious, is so that we can
pass it to fls64 on all platforms.

> > +__attribute__((nonnull(1))) static bool
> > +node_placement(struct allowedips_node __rcu *trie, const u8 *key, u8 cidr,
> > +            u8 bits, struct allowedips_node **rnode, struct mutex *lock)
> > +{
> > +     struct allowedips_node *node = rcu_dereference_protected(trie,
> > +                                             lockdep_is_held(lock));
> > +     struct allowedips_node *parent = NULL;
> > +     bool exact = false;
>
> Should there be a WARN_ON(!key) here, since the attribute will only
> detect problems at compile time, and maybe some runtime cases will get
> passed it?

Actually no, it's only used in one place, and that function is pretty
clear about checking beforehand, and even this function checks
implicitly. Looking at my commit log, I had added the annotation to
make clang-analyzer happy, because it couldn't (at the time) deduce
things correctly from rcu_access_pointer(p) checks. It looks like
recent clang-analyzer has fixed that, and besides I'm not sure the
kernel is in the business of adding annotations to work around
static-analyzer-bug-of-the-week. So I've gotten rid of the annotation.
Thanks for bringing my attention to it.

> > +             net_dbg_ratelimited("%s: Could not decrypt invalid cookie response\n",
> > +                                 wg->dev->name);
>
> It might be worth adding a netdev_dbg_ratelimited(), which takes a
> netdev as its first parameter, just line netdev_dbg().

That sounds like it could be useful. Though, I'm trying hard to make
the first patch submission _not_ need to touch any of the rest of the
networking stack. I've actually got a small list of interesting
networking stack changes that might be useful for WireGuard, but I
think I'd prefer to submit these after this is all merged, and each
one separately for a separate discussion, if that's okay with you.


> > +
> > +#if defined(CONFIG_PM_SLEEP) && !defined(CONFIG_ANDROID)
>
> I don't see any other code which uses this combination. Why is this
> needed?

WireGuard clears private key material before going to sleep, so that
ephemeral keys never live longer in ram than their expiration date.
This works well for mostly everything, except Android devices where
crazy out-of-tree wireless drivers (such as qcacld) will actually wake
and sleep the phone constantly, for a few milliseconds each wake, to
handle incoming packets. In this case, we must not zero out ephemeral
keys, so that these packets can actually be decrypted. There are no
other systems that have this characteristic, as far as I can tell, and
I'm certainly not willing to add a runtime configuration nob for that,
and it turns out that matching on CONFIG_ANDROID as such does the
right thing all of the time. If the landscape becomes more
complicated, we can visit it, probably in the form of passing reasons
through to pm_notification and related scaffolding. But in lieu of
this infrastructure someday existing, I prefer to do the
simple-and-always-works-anyway solution.

> > +     while (skb_queue_len(&peer->staged_packet_queue) > MAX_STAGED_PACKETS)
> > +             dev_kfree_skb(__skb_dequeue(&peer->staged_packet_queue));
>
> It would be good to have some stats counters in here. Maybe the
> standard interface statistics will cover it, otherwise ethtool -S.

Good point, I need to increment the drop counters there. I'll add
that; good catch.

Willy and I actually discussed around a year ago adding a series of
wireguard-specific error counters for all the various unusual things
that can happen. This is something I'll probably investigate in
earnest post-merge, as it's a bit of a "bell and whistle" thing, and
the existing counters now are already sufficient for most purposes.

> You should also get this code looked at by some of the queueing
> people. Rather than discarding, you might want to be applying back
> pressure to slow down the sender application?

Actually Toke, DaveT, and I have agonized quite a bit over the
queueing mechanisms in WireGuard, as well as two GSoC summers looking
at it with students. There's been lots of flent and rrul and tweaking
involved, and the entire queueing system has gone through probably 15
different revisions of trying things out and experimenting. At this
point, I think the way we're doing queueing and when we're dropping,
and those various factors, are taken care of pretty well. Future work
still involves considering shoehorning some fq_codel in a similar way
that wireless does, and a few other tweeks, but I think the current
system is good and stable and very much sufficient.

The particular instance you're pointing out though, with
staged_packet_queue, is not the relevant queue and isn't what you have
in mind with regards to backpressure. This is simply a very short
lived thing for dealing with packets before a handshake is made, and
for handling xmit being used from multiple cores, but it's not _the_
queueing system that's relevant in the actual buffering and latency
calculations.

A little bit of the queueing stuff is discussed in the paper I
presented at Netdev 2.2, if you're interested:
https://www.wireguard.com/papers/wireguard-netdev22.pdf Some of that
is outdated by now, but should give you some idea.

> > +static void kdf(u8 *first_dst, u8 *second_dst, u8 *third_dst, const u8 *data,
> > +             size_t first_len, size_t second_len, size_t third_len,
> > +             size_t data_len, const u8 chaining_key[NOISE_HASH_LEN])
> > +{
> > +     u8 output[BLAKE2S_HASH_SIZE + 1];
> > +     u8 secret[BLAKE2S_HASH_SIZE];
> > +
> > +     WARN_ON(IS_ENABLED(DEBUG) &&
> > +             (first_len > BLAKE2S_HASH_SIZE ||
> > +              second_len > BLAKE2S_HASH_SIZE ||
> > +              third_len > BLAKE2S_HASH_SIZE ||
> > +              ((second_len || second_dst || third_len || third_dst) &&
> > +               (!first_len || !first_dst)) ||
> > +              ((third_len || third_dst) && (!second_len || !second_dst))));
>
> Maybe split this up into a number of WARN_ON()s. At the moment, if it
> fires, you have little idea why, there are so many comparisons. Also,
> is this on the hot path? I guess not, since this is keys, not
> packets. Do you need to care about DEBUG here?

This is on the hot path, actually. Well, it's not on path of data
packets, but I do consider handshake packets to be fairly "warm". I'm
not especially keen on splitting that up into multiple warn_ons,
mostly because if that is ever hint, something is seriously wrong, and
the whole flow would likely then need auditing anyway.

Regards,
Jason

^ permalink raw reply

* Re: [PATCH ghak90 (was ghak32) V4 03/10] audit: log container info of syscalls
From: Paul Moore @ 2018-10-25  6:13 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, linux-audit, linux-fsdevel, linux-kernel,
	netdev, netfilter-devel, ebiederm, luto, carlos, dhowells, viro,
	simo, Eric Paris, Serge Hallyn
In-Reply-To: <20181025004255.zl7p7j6gztouh2hh@madcap2.tricolour.ca>

On October 25, 2018 1:43:16 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2018-10-24 16:55, Paul Moore wrote:
>> On Wed, Oct 24, 2018 at 11:15 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>>> On 2018-10-19 19:16, Paul Moore wrote:
>>>> On Sun, Aug 5, 2018 at 4:32 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>

...

>
>>>> However, I do care about the "op" field in this record.  It just
>>>> doesn't make any sense; the way you are using it it is more of a
>>>> context field than an operations field, and even then why is the
>>>> context important from a logging and/or security perspective?  Drop it
>>>> please.
>>>
>>> I'll rename it to whatever you like.  I'd suggest "ref=".  The reason I
>>> think it is important is there are multiple sources that aren't always
>>> obvious from the other records to which it is associated.  In the case
>>> of ptrace and signals, there can be many target tasks listed (OBJ_PID)
>>> with no other way to distinguish the matching audit container identifier
>>> records all for one event.  This is in addition to the default syscall
>>> container identifier record.  I'm not currently happy with the text
>>> content to link the two, but that should be solvable (most obvious is
>>> taret PID).  Throwing away this information seems shortsighted.
>>
>> It would be helpful if you could generate real audit events
>> demonstrating the problems you are describing, as well as a more
>> standard syscall event, so we can discuss some possible solutions.
>
> If the auditted process is in a container and it ptraces or signals
> another process in a container, there will be two AUDIT_CONTAINER
> records for the same event that won't be identified as to which record
> belongs to which process or other record (SYSCALL vs 1+ OBJ_PID
> records).  There could be many signals recorded, each with their own
> OBJ_PID record.  The first is stored in the audit context and additional
> ones are stored in a chained struct that can accommodate 16 entries each.
>
> (See audit_signal_info(), __audit_ptrace().)
>
> (As a side note, on code inspection it appears that a signal target
> would get overwritten by a ptrace action if they were to happen in that
> order.)

As requested above, please respond with real audit events generated by this patchset so that we can discuss possible solutions.

^ permalink raw reply

* Re: [PATCH net] net: sched: Remove TCA_OPTIONS from policy
From: Jiri Pirko @ 2018-10-25  6:31 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, davem, pupilla, David Ahern
In-Reply-To: <20181024153249.15374-1-dsahern@kernel.org>

Wed, Oct 24, 2018 at 05:32:49PM CEST, dsahern@kernel.org wrote:
>From: David Ahern <dsahern@gmail.com>
>
>Marco reported an error with hfsc:
>root@Calimero:~# tc qdisc add dev eth0 root handle 1:0 hfsc default 1
>Error: Attribute failed policy validation.
>
>Apparently a few implementations pass TCA_OPTIONS as a binary instead
>of nested attribute, so drop TCA_OPTIONS from the policy.

Yeah, this is nice example of a case, where I think it wouldn't hurt to
be a bit more strict. Apparently, the userspace app is buggy. It should
be fixed. Note that I'm aware of the bw compatibility.


>
>Fixes: 8b4c3cdd9dd8 ("net: sched: Add policy validation for tc attributes")
>Reported-by: Marco Berizzi <pupilla@libero.it>
>Signed-off-by: David Ahern <dsahern@gmail.com>
>---
> net/sched/sch_api.c | 1 -
> 1 file changed, 1 deletion(-)
>
>diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
>index 3dc0acf54245..be7cd140b2a3 100644
>--- a/net/sched/sch_api.c
>+++ b/net/sched/sch_api.c
>@@ -1309,7 +1309,6 @@ check_loop_fn(struct Qdisc *q, unsigned long cl, struct qdisc_walker *w)
> 
> const struct nla_policy rtm_tca_policy[TCA_MAX + 1] = {
> 	[TCA_KIND]		= { .type = NLA_STRING },
>-	[TCA_OPTIONS]		= { .type = NLA_NESTED },

A future developer might add this again. A comment why not would be good
here.


> 	[TCA_RATE]		= { .type = NLA_BINARY,
> 				    .len = sizeof(struct tc_estimator) },
> 	[TCA_STAB]		= { .type = NLA_NESTED },
>-- 
>2.11.0
>

^ permalink raw reply

* Re: [PATCH] netfilter: conntrack: fix calculation of next bucket number in early_drop
From: Vasiliy Khoruzhick @ 2018-10-25  6:37 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	David S. Miller, netfilter-devel, coreteam, netdev, linux-kernel,
	stable
In-Reply-To: <a2fa755a-048b-459a-b526-2fe27cbb6574@arista.com>

On Wed, Oct 24, 2018 at 9:52 PM, Dmitry Safonov <dima@arista.com> wrote:
> On 10/25/18 4:48 AM, Vasily Khoruzhick wrote:
>>
>> If there's no entry to drop in bucket that corresponds to the hash,
>> early_drop() should look for it in other buckets. But since it increments
>> hash instead of bucket number, it actually looks in the same bucket 8
>> times: hsize is 16k by default (14 bits) and hash is 32-bit value, so
>> reciprocal_scale(hash, hsize) returns the same value for hash..hash+7 in
>> most cases.
>>
>> Fix it by increasing bucket number instead of hash and rename _hash
>> to bucket to avoid future confusion.
>>
>> Fixes: 3e86638e9a0b ("netfilter: conntrack: consider ct netns in
>> early_drop logic")
>> Cc: <stable@vger.kernel.org> # v4.7+
>> Signed-off-by: Vasily Khoruzhick <vasilykh@arista.com>
>
>
> Nice work!
>
> Reviewed-by: Dmitry Safonov <dima@arista.com>

Oops, found a bug in it - 'bucket' should be moved outside of the
loop. I'll send v2 tomorrow morning.

>
>> ---
>>   net/netfilter/nf_conntrack_core.c | 11 +++++++----
>>   1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/netfilter/nf_conntrack_core.c
>> b/net/netfilter/nf_conntrack_core.c
>> index ca1168d67fac..a04af246b184 100644
>> --- a/net/netfilter/nf_conntrack_core.c
>> +++ b/net/netfilter/nf_conntrack_core.c
>> @@ -1073,19 +1073,22 @@ static unsigned int early_drop_list(struct net
>> *net,
>>         return drops;
>>   }
>>   -static noinline int early_drop(struct net *net, unsigned int _hash)
>> +static noinline int early_drop(struct net *net, unsigned int hash)
>>   {
>>         unsigned int i;
>>         for (i = 0; i < NF_CT_EVICTION_RANGE; i++) {
>>                 struct hlist_nulls_head *ct_hash;
>> -               unsigned int hash, hsize, drops;
>> +               unsigned int bucket, hsize, drops;
>>                 rcu_read_lock();
>>                 nf_conntrack_get_ht(&ct_hash, &hsize);
>> -               hash = reciprocal_scale(_hash++, hsize);
>> +               if (!i)
>> +                       bucket = reciprocal_scale(hash, hsize);
>> +               else
>> +                       bucket = (bucket + 1) % hsize;
>>   -             drops = early_drop_list(net, &ct_hash[hash]);
>> +               drops = early_drop_list(net, &ct_hash[bucket]);
>>                 rcu_read_unlock();
>>                 if (drops) {
>>
>
> --
>           Dima

^ permalink raw reply

* Re: [PATCH ghak90 (was ghak32) V4 03/10] audit: log container info of syscalls
From: Steve Grubb @ 2018-10-25 15:57 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: Paul Moore, simo, carlos, linux-api, containers, linux-kernel,
	dhowells, linux-audit, netfilter-devel, ebiederm, luto, netdev,
	linux-fsdevel, Eric Paris, Serge Hallyn, viro
In-Reply-To: <20181025122732.4j4rbychjse3gemt@madcap2.tricolour.ca>

On Thu, 25 Oct 2018 08:27:32 -0400
Richard Guy Briggs <rgb@redhat.com> wrote:

> On 2018-10-25 06:49, Paul Moore wrote:
> > On Thu, Oct 25, 2018 at 2:06 AM Steve Grubb <sgrubb@redhat.com>
> > wrote:  
> > > On Wed, 24 Oct 2018 20:42:55 -0400
> > > Richard Guy Briggs <rgb@redhat.com> wrote:  
> > > > On 2018-10-24 16:55, Paul Moore wrote:  
> > > > > On Wed, Oct 24, 2018 at 11:15 AM Richard Guy Briggs
> > > > > <rgb@redhat.com> wrote:  
> > > > > > On 2018-10-19 19:16, Paul Moore wrote:  
> > > > > > > On Sun, Aug 5, 2018 at 4:32 AM Richard Guy Briggs
> > > > > > > <rgb@redhat.com> wrote:  
> > 
> > ...
> >   
> > > > > > > > +/*
> > > > > > > > + * audit_log_contid - report container info
> > > > > > > > + * @tsk: task to be recorded
> > > > > > > > + * @context: task or local context for record
> > > > > > > > + * @op: contid string description
> > > > > > > > + */
> > > > > > > > +int audit_log_contid(struct task_struct *tsk,
> > > > > > > > +                            struct audit_context
> > > > > > > > *context, char *op) +{
> > > > > > > > +       struct audit_buffer *ab;
> > > > > > > > +
> > > > > > > > +       if (!audit_contid_set(tsk))
> > > > > > > > +               return 0;
> > > > > > > > +       /* Generate AUDIT_CONTAINER record with
> > > > > > > > container ID */
> > > > > > > > +       ab = audit_log_start(context, GFP_KERNEL,
> > > > > > > > AUDIT_CONTAINER);
> > > > > > > > +       if (!ab)
> > > > > > > > +               return -ENOMEM;
> > > > > > > > +       audit_log_format(ab, "op=%s contid=%llu",
> > > > > > > > +                        op, audit_get_contid(tsk));
> > > > > > > > +       audit_log_end(ab);
> > > > > > > > +       return 0;
> > > > > > > > +}
> > > > > > > > +EXPORT_SYMBOL(audit_log_contid);  
> > > > > > >
> > > > > > > As discussed in the previous iteration of the patch, I
> > > > > > > prefer AUDIT_CONTAINER_ID here over AUDIT_CONTAINER.  If
> > > > > > > you feel strongly about keeping it as-is with
> > > > > > > AUDIT_CONTAINER I suppose I could live with that, but it
> > > > > > > is isn't my first choice.  
> > > > > >
> > > > > > I don't have a strong opinion on this one, mildly
> > > > > > preferring the shorter one only because it is shorter.  
> > > > >
> > > > > We already have multiple AUDIT_CONTAINER* record types, so it
> > > > > seems as though we should use "AUDIT_CONTAINER" as a prefix
> > > > > of sorts, rather than a type itself.  
> > > >
> > > > I'm fine with that.  I'd still like to hear Steve's input.  He
> > > > had stronger opinions than me.  
> > >
> > > The creation event should be separate and distinct from the
> > > continuing use when its used as a supplemental record. IOW,
> > > binding the ID to a container is part of the lifecycle and needs
> > > to be kept distinct.  
> > 
> > Steve's comment is pretty ambiguous when it comes to AUDIT_CONTAINER
> > vs AUDIT_CONTAINER_ID, but one could argue that AUDIT_CONTAINER_ID
> > helps distinguish the audit container id marking record and gets to
> > what I believe is the spirit of Steve's comment.  Taking this in
> > context with my previous remarks, let's switch to using
> > AUDIT_CONTAINER_ID.  
> 
> I suspect Steve is mixing up AUDIT_CONTAINER_OP with
> AUDIT_CONTAINER_ID, confusing the fact that they are two seperate
> records.  As a summary, the suggested records are:
> 	CONTAINER_OP	audit container identifier creation
> 	CONTAINER	audit container identifier aux record to an
> event
> 
> and what Paul is suggesting (which is fine by me) is:
> 	CONTAINER_OP	audit container identifier creation event
> 	CONTAINER_ID	audit container identifier aux record to
> an event
> 
> Steve, please indicate you are fine with this.

I thought it was:

CONTAINER_ID audit container identifier creation event
CONTAINER audit container identifier aux record to an event

Or vice versa. Don't mix up creation of the identifier with operations.

-Steve

^ permalink raw reply

* [PATCH v3 0/2] net: qcom/emac: add shared mdio bus support
From: Wang Dongsheng @ 2018-10-25  8:15 UTC (permalink / raw)
  To: timur, andrew; +Cc: netdev, Wang Dongsheng, yu.zheng, f.fainelli

The emac include MDIO controller, and the motherboard has more than one
PHY connected to an MDIO bus. So share the shared mii_bus for others MAC
device that not has MDIO bus connected.

Based on ACPI, since "phy-handle" cannot directly point to a _DSD
sub-package, so we use "phy-handle" to point an internal MDIO device port.
The port describes the phy address.

Tested: QDF2400 (ACPI), buildin/insmod/rmmod

V3:
 - Add "phy-handle" support.
 - Remove all of DT changes.

V2:
 - Separate patch.

Wang Dongsheng (2):
  net: qcom/emac: split phy_config to mdio bus create and get phy device
  net: qcom/emac: add phy-handle support for ACPI

 drivers/net/ethernet/qualcomm/emac/emac-phy.c | 183 ++++++++++++++----
 1 file changed, 142 insertions(+), 41 deletions(-)

-- 
2.18.0

^ permalink raw reply

* RE: [PATCH v3 0/2] net: qcom/emac: add shared mdio bus support
From: Wang, Dongsheng @ 2018-10-25  8:16 UTC (permalink / raw)
  To: timur@kernel.org, andrew@lunn.ch
  Cc: netdev@vger.kernel.org, Zheng, Joey, f.fainelli@gmail.com
In-Reply-To: <20181025081503.15683-1-dongsheng.wang@hxt-semitech.com>

Sorry, please ignore this patch.

Cheers,
-Dongsheng


> -----Original Message-----
> From: Wang, Dongsheng
> Sent: Thursday, October 25, 2018 4:15 PM
> To: timur@kernel.org; andrew@lunn.ch
> Cc: netdev@vger.kernel.org; Wang, Dongsheng
> <dongsheng.wang@hxt-semitech.com>; Zheng, Joey
> <yu.zheng@hxt-semitech.com>; f.fainelli@gmail.com
> Subject: [PATCH v3 0/2] net: qcom/emac: add shared mdio bus support
> 
> The emac include MDIO controller, and the motherboard has more than one
> PHY connected to an MDIO bus. So share the shared mii_bus for others MAC
> device that not has MDIO bus connected.
> 
> Based on ACPI, since "phy-handle" cannot directly point to a _DSD sub-package,
> so we use "phy-handle" to point an internal MDIO device port.
> The port describes the phy address.
> 
> Tested: QDF2400 (ACPI), buildin/insmod/rmmod
> 
> V3:
>  - Add "phy-handle" support.
>  - Remove all of DT changes.
> 
> V2:
>  - Separate patch.
> 
> Wang Dongsheng (2):
>   net: qcom/emac: split phy_config to mdio bus create and get phy device
>   net: qcom/emac: add phy-handle support for ACPI
> 
>  drivers/net/ethernet/qualcomm/emac/emac-phy.c | 183 ++++++++++++++----
>  1 file changed, 142 insertions(+), 41 deletions(-)
> 
> --
> 2.18.0

^ permalink raw reply

* Re: [PATCH nf] netfilter: ipv6: fix oops when defragmenting locally generated fragments
From: Pablo Neira Ayuso @ 2018-10-25  8:18 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netfilter-devel, lorenzo, zenczykowski, edumazet, netdev, maze
In-Reply-To: <20181023144716.19746-1-fw@strlen.de>

On Tue, Oct 23, 2018 at 04:47:16PM +0200, Florian Westphal wrote:
> Unlike ipv4 and normal ipv6 defrag, netfilter ipv6 defragmentation did
> not save/restore skb->dst.
> 
> This causes oops when handling locally generated ipv6 fragments, as
> output path needs a valid dst.

Applied, thanks!

^ permalink raw reply

* Re: [PATCH bpf-next 6/6] selftests/bpf: test_verifier, check bpf_map_lookup_elem access in bpf prog
From: Naresh Kamboju @ 2018-10-25  8:54 UTC (permalink / raw)
  To: liu.song.a23, bhole_prashant_q7
  Cc: ast, Daniel Borkmann, jakub.kicinski, David S. Miller,
	quentin.monnet, netdev, open list:KERNEL SELFTEST FRAMEWORK
In-Reply-To: <CAPhsuW72jhD+962NjSyxPrMhoeE9d24ArEVm0oDsP4FV46nNVA@mail.gmail.com>

On Tue, 9 Oct 2018 at 12:32, Song Liu <liu.song.a23@gmail.com> wrote:
>
> On Mon, Oct 8, 2018 at 6:07 PM Prashant Bhole
> <bhole_prashant_q7@lab.ntt.co.jp> wrote:
> >
> > map_lookup_elem isn't supported by certain map types like:
> > - BPF_MAP_TYPE_PROG_ARRAY
> > - BPF_MAP_TYPE_STACK_TRACE
> > - BPF_MAP_TYPE_XSKMAP
> > - BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH
> > Let's add verfier tests to check whether verifier prevents
> > bpf_map_lookup_elem call on above programs from bpf program.
> >
> > Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
> > Acked-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Song Liu <songliubraving@fb.com>
>
> > ---
> >  tools/testing/selftests/bpf/test_verifier.c | 121 +++++++++++++++++++-
> >  1 file changed, 120 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> > index 65ae44c85d27..cf4cd32b6772 100644
> > --- a/tools/testing/selftests/bpf/test_verifier.c
> > +++ b/tools/testing/selftests/bpf/test_verifier.c
> > @@ -48,7 +48,7 @@
> >
> >  #define MAX_INSNS      BPF_MAXINSNS
> >  #define MAX_FIXUPS     8
> > -#define MAX_NR_MAPS    8
> > +#define MAX_NR_MAPS    13
> >  #define POINTER_VALUE  0xcafe4all
> >  #define TEST_DATA_LEN  64
> >
> > @@ -65,6 +65,10 @@ struct bpf_test {
> >         int fixup_map_hash_48b[MAX_FIXUPS];
> >         int fixup_map_hash_16b[MAX_FIXUPS];
> >         int fixup_map_array_48b[MAX_FIXUPS];
> > +       int fixup_map_sockmap[MAX_FIXUPS];
> > +       int fixup_map_sockhash[MAX_FIXUPS];
> > +       int fixup_map_xskmap[MAX_FIXUPS];
> > +       int fixup_map_stacktrace[MAX_FIXUPS];
> >         int fixup_prog1[MAX_FIXUPS];
> >         int fixup_prog2[MAX_FIXUPS];
> >         int fixup_map_in_map[MAX_FIXUPS];
> > @@ -4541,6 +4545,85 @@ static struct bpf_test tests[] = {
> >                 .errstr = "invalid access to packet",
> >                 .prog_type = BPF_PROG_TYPE_SCHED_CLS,
> >         },
> > +       {
> > +               "prevent map lookup in sockmap",
> > +               .insns = {
> > +                       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> > +                       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> > +                       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> > +                       BPF_LD_MAP_FD(BPF_REG_1, 0),
> > +                       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> > +                                    BPF_FUNC_map_lookup_elem),
> > +                       BPF_EXIT_INSN(),
> > +               },
> > +               .fixup_map_sockmap = { 3 },
> > +               .result = REJECT,
> > +               .errstr = "cannot pass map_type 15 into func bpf_map_lookup_elem",
> > +               .prog_type = BPF_PROG_TYPE_SOCK_OPS,
> > +       },
> > +       {
> > +               "prevent map lookup in sockhash",
> > +               .insns = {
> > +                       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> > +                       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> > +                       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> > +                       BPF_LD_MAP_FD(BPF_REG_1, 0),
> > +                       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> > +                                    BPF_FUNC_map_lookup_elem),
> > +                       BPF_EXIT_INSN(),
> > +               },
> > +               .fixup_map_sockhash = { 3 },
> > +               .result = REJECT,
> > +               .errstr = "cannot pass map_type 18 into func bpf_map_lookup_elem",
> > +               .prog_type = BPF_PROG_TYPE_SOCK_OPS,
> > +       },
> > +       {
> > +               "prevent map lookup in xskmap",
> > +               .insns = {
> > +                       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> > +                       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> > +                       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> > +                       BPF_LD_MAP_FD(BPF_REG_1, 0),
> > +                       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> > +                                    BPF_FUNC_map_lookup_elem),
> > +                       BPF_EXIT_INSN(),
> > +               },
> > +               .fixup_map_xskmap = { 3 },
> > +               .result = REJECT,
> > +               .errstr = "cannot pass map_type 17 into func bpf_map_lookup_elem",
> > +               .prog_type = BPF_PROG_TYPE_XDP,
> > +       },
> > +       {
> > +               "prevent map lookup in stack trace",
> > +               .insns = {
> > +                       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> > +                       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> > +                       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> > +                       BPF_LD_MAP_FD(BPF_REG_1, 0),
> > +                       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> > +                                    BPF_FUNC_map_lookup_elem),
> > +                       BPF_EXIT_INSN(),
> > +               },
> > +               .fixup_map_stacktrace = { 3 },
> > +               .result = REJECT,
> > +               .errstr = "cannot pass map_type 7 into func bpf_map_lookup_elem",
> > +               .prog_type = BPF_PROG_TYPE_PERF_EVENT,
> > +       },
> > +       {
> > +               "prevent map lookup in prog array",
> > +               .insns = {
> > +                       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> > +                       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> > +                       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> > +                       BPF_LD_MAP_FD(BPF_REG_1, 0),
> > +                       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> > +                                    BPF_FUNC_map_lookup_elem),
> > +                       BPF_EXIT_INSN(),
> > +               },
> > +               .fixup_prog2 = { 3 },
> > +               .result = REJECT,
> > +               .errstr = "cannot pass map_type 3 into func bpf_map_lookup_elem",
> > +       },
> >         {
> >                 "valid map access into an array with a constant",
> >                 .insns = {
> > @@ -13515,6 +13598,10 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_map_type prog_type,
> >         int *fixup_map_hash_48b = test->fixup_map_hash_48b;
> >         int *fixup_map_hash_16b = test->fixup_map_hash_16b;
> >         int *fixup_map_array_48b = test->fixup_map_array_48b;
> > +       int *fixup_map_sockmap = test->fixup_map_sockmap;
> > +       int *fixup_map_sockhash = test->fixup_map_sockhash;
> > +       int *fixup_map_xskmap = test->fixup_map_xskmap;
> > +       int *fixup_map_stacktrace = test->fixup_map_stacktrace;
> >         int *fixup_prog1 = test->fixup_prog1;
> >         int *fixup_prog2 = test->fixup_prog2;
> >         int *fixup_map_in_map = test->fixup_map_in_map;
> > @@ -13603,6 +13690,38 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_map_type prog_type,
> >                         fixup_percpu_cgroup_storage++;
> >                 } while (*fixup_percpu_cgroup_storage);
> >         }
> > +       if (*fixup_map_sockmap) {
> > +               map_fds[9] = create_map(BPF_MAP_TYPE_SOCKMAP, sizeof(int),
> > +                                       sizeof(int), 1);
> > +               do {
> > +                       prog[*fixup_map_sockmap].imm = map_fds[9];
> > +                       fixup_map_sockmap++;
> > +               } while (*fixup_map_sockmap);
> > +       }
> > +       if (*fixup_map_sockhash) {
> > +               map_fds[10] = create_map(BPF_MAP_TYPE_SOCKHASH, sizeof(int),
> > +                                       sizeof(int), 1);
> > +               do {
> > +                       prog[*fixup_map_sockhash].imm = map_fds[10];
> > +                       fixup_map_sockhash++;
> > +               } while (*fixup_map_sockhash);
> > +       }
> > +       if (*fixup_map_xskmap) {
> > +               map_fds[11] = create_map(BPF_MAP_TYPE_XSKMAP, sizeof(int),
> > +                                       sizeof(int), 1);
> > +               do {
> > +                       prog[*fixup_map_xskmap].imm = map_fds[11];
> > +                       fixup_map_xskmap++;
> > +               } while (*fixup_map_xskmap);
> > +       }

selftests: bpf: test_verifier sockmap, sockhash, xskmap failed on
mainline and next
(from 4.19.0-rc7-next-20181011 to till date )
Are we missing any pre-required kernel configs ?

Test log,
------------
selftests: bpf: test_verifier
<>
#274/p prevent map lookup in sockmap Failed to create hash map
'Invalid argument'!
FAIL
Unexpected error message!
EXP: cannot pass map_type 15 into func bpf_map_lookup_elem
RES: fd -1 is not pointing to valid bpf_map
fd -1 is not pointing to valid bpf_map
#275/p prevent map lookup in sockhash Failed to create hash map
'Invalid argument'!
FAIL
Unexpected error message!
EXP: cannot pass map_type 18 into func bpf_map_lookup_elem
RES: fd -1 is not pointing to valid bpf_map
fd -1 is not pointing to valid bpf_map
#276/p prevent map lookup in xskmap Failed to create hash map 'Invalid
argument'!
FAIL
Unexpected error message!
EXP: cannot pass map_type 17 into func bpf_map_lookup_elem
RES: fd -1 is not pointing to valid bpf_map
fd -1 is not pointing to valid bpf_map
<>
Summary: 962 PASSED, 0 SKIPPED, 3 FAILED
not ok 1..1 selftests: bpf: test_verifier [FAIL]
selftests: bpf_test_verifier [FAIL]

-mainline results history,
https://qa-reports.linaro.org/lkft/linux-next-oe/tests/kselftest/bpf_test_verifier

-next results history,
https://qa-reports.linaro.org/lkft/linux-next-oe/tests/kselftest/bpf_test_verifier

Test case full log,
https://lkft.validation.linaro.org/scheduler/job/461881#L1655

Best regards
Naresh Kamboju

> > +       if (*fixup_map_stacktrace) {
> > +               map_fds[12] = create_map(BPF_MAP_TYPE_STACK_TRACE, sizeof(u32),
> > +                                        sizeof(u64), 1);
> > +               do {
> > +                       prog[*fixup_map_stacktrace].imm = map_fds[12];
> > +                       fixup_map_stacktrace++;
> > +               } while (fixup_map_stacktrace);
> > +       }
> >  }
> >
> >  static void do_test_single(struct bpf_test *test, bool unpriv,
> > --
> > 2.17.1
> >
> >

^ permalink raw reply

* Re: Regression: kernel 4.14 an later very slow with many ipsec tunnels
From: David Miller @ 2018-10-25 17:34 UTC (permalink / raw)
  To: linux
  Cc: netdev, fw, steffen.klassert, linux-kernel, torvalds,
	christophe.gouault, gregkh
In-Reply-To: <2766296.15tpkxTHJV@stwm.de>

From: Wolfgang Walter <linux@stwm.de>
Date: Thu, 25 Oct 2018 11:38:19 +0200

> there is now a new 4.19 which still has the big performance regression when 
> many ipsec tunnels are configured (throughput and latency get worse by 10 to 
> 50 times) which makes any kernel > 4.9 unusable for our routers.
> 
> I still don't understand why a revert of the flow cache removal at least for 
> the longterm kernels is that a bad option (maybe as a compile time option), 
> especially as there is no workaround available.

You do know that the flow cache is DDoS targettable, right?

That's why we removed it, we did not make the change lightly.

Adding a DDoS vector back into the kernel is not an option sorry.

Please work diligently with Florian and others to try and find ways to
soften the performance hit.

Thank you.

^ permalink raw reply

* Re: [PATCH ghak90 (was ghak32) V4 03/10] audit: log container info of syscalls
From: Richard Guy Briggs @ 2018-10-25 17:38 UTC (permalink / raw)
  To: Steve Grubb
  Cc: Paul Moore, simo, carlos, linux-api, containers, linux-kernel,
	dhowells, linux-audit, netfilter-devel, ebiederm, luto, netdev,
	linux-fsdevel, Eric Paris, Serge Hallyn, viro
In-Reply-To: <20181025175745.5b2b13e9@ivy-bridge>

On 2018-10-25 17:57, Steve Grubb wrote:
> On Thu, 25 Oct 2018 08:27:32 -0400
> Richard Guy Briggs <rgb@redhat.com> wrote:
> 
> > On 2018-10-25 06:49, Paul Moore wrote:
> > > On Thu, Oct 25, 2018 at 2:06 AM Steve Grubb <sgrubb@redhat.com>
> > > wrote:  
> > > > On Wed, 24 Oct 2018 20:42:55 -0400
> > > > Richard Guy Briggs <rgb@redhat.com> wrote:  
> > > > > On 2018-10-24 16:55, Paul Moore wrote:  
> > > > > > On Wed, Oct 24, 2018 at 11:15 AM Richard Guy Briggs
> > > > > > <rgb@redhat.com> wrote:  
> > > > > > > On 2018-10-19 19:16, Paul Moore wrote:  
> > > > > > > > On Sun, Aug 5, 2018 at 4:32 AM Richard Guy Briggs
> > > > > > > > <rgb@redhat.com> wrote:  
> > > 
> > > ...
> > >   
> > > > > > > > > +/*
> > > > > > > > > + * audit_log_contid - report container info
> > > > > > > > > + * @tsk: task to be recorded
> > > > > > > > > + * @context: task or local context for record
> > > > > > > > > + * @op: contid string description
> > > > > > > > > + */
> > > > > > > > > +int audit_log_contid(struct task_struct *tsk,
> > > > > > > > > +                            struct audit_context
> > > > > > > > > *context, char *op) +{
> > > > > > > > > +       struct audit_buffer *ab;
> > > > > > > > > +
> > > > > > > > > +       if (!audit_contid_set(tsk))
> > > > > > > > > +               return 0;
> > > > > > > > > +       /* Generate AUDIT_CONTAINER record with
> > > > > > > > > container ID */
> > > > > > > > > +       ab = audit_log_start(context, GFP_KERNEL,
> > > > > > > > > AUDIT_CONTAINER);
> > > > > > > > > +       if (!ab)
> > > > > > > > > +               return -ENOMEM;
> > > > > > > > > +       audit_log_format(ab, "op=%s contid=%llu",
> > > > > > > > > +                        op, audit_get_contid(tsk));
> > > > > > > > > +       audit_log_end(ab);
> > > > > > > > > +       return 0;
> > > > > > > > > +}
> > > > > > > > > +EXPORT_SYMBOL(audit_log_contid);  
> > > > > > > >
> > > > > > > > As discussed in the previous iteration of the patch, I
> > > > > > > > prefer AUDIT_CONTAINER_ID here over AUDIT_CONTAINER.  If
> > > > > > > > you feel strongly about keeping it as-is with
> > > > > > > > AUDIT_CONTAINER I suppose I could live with that, but it
> > > > > > > > is isn't my first choice.  
> > > > > > >
> > > > > > > I don't have a strong opinion on this one, mildly
> > > > > > > preferring the shorter one only because it is shorter.  
> > > > > >
> > > > > > We already have multiple AUDIT_CONTAINER* record types, so it
> > > > > > seems as though we should use "AUDIT_CONTAINER" as a prefix
> > > > > > of sorts, rather than a type itself.  
> > > > >
> > > > > I'm fine with that.  I'd still like to hear Steve's input.  He
> > > > > had stronger opinions than me.  
> > > >
> > > > The creation event should be separate and distinct from the
> > > > continuing use when its used as a supplemental record. IOW,
> > > > binding the ID to a container is part of the lifecycle and needs
> > > > to be kept distinct.  
> > > 
> > > Steve's comment is pretty ambiguous when it comes to AUDIT_CONTAINER
> > > vs AUDIT_CONTAINER_ID, but one could argue that AUDIT_CONTAINER_ID
> > > helps distinguish the audit container id marking record and gets to
> > > what I believe is the spirit of Steve's comment.  Taking this in
> > > context with my previous remarks, let's switch to using
> > > AUDIT_CONTAINER_ID.  
> > 
> > I suspect Steve is mixing up AUDIT_CONTAINER_OP with
> > AUDIT_CONTAINER_ID, confusing the fact that they are two seperate
> > records.  As a summary, the suggested records are:
> > 	CONTAINER_OP	audit container identifier creation
> > 	CONTAINER	audit container identifier aux record to an
> > event
> > 
> > and what Paul is suggesting (which is fine by me) is:
> > 	CONTAINER_OP	audit container identifier creation event
> > 	CONTAINER_ID	audit container identifier aux record to
> > an event
> > 
> > Steve, please indicate you are fine with this.
> 
> I thought it was:

It *was*.  It was changed at Paul's request in this v3 thread:
	https://www.redhat.com/archives/linux-audit/2018-July/msg00087.html

And listed in the examples and changelog to this v4 patchset:
	https://www.redhat.com/archives/linux-audit/2018-July/msg00178.html

It is also listed in this userspace patchset update v4 (which should
also have had a changelog added to it, note to self...):
	https://www.redhat.com/archives/linux-audit/2018-July/msg00189.html

I realize it is hard to keep up with all the detail changes in these
patchsets...

> CONTAINER_ID audit container identifier creation event
> CONTAINER audit container identifier aux record to an event
> 
> Or vice versa. Don't mix up creation of the identifier with operations.

Exactly what I'm trying to avoid...  Worded another way: "Don't mix up
the creation operation with routine reporting of the identifier in
events."  Steve, can you and Paul discuss and agree on what they should
be called?  I don't have a horse in this race, but I need to record the
result of that run.  ;-)

> -Steve

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: [PATCH bpf-next 6/6] selftests/bpf: test_verifier, check bpf_map_lookup_elem access in bpf prog
From: Prashant Bhole @ 2018-10-25  9:09 UTC (permalink / raw)
  To: Naresh Kamboju, liu.song.a23
  Cc: ast, Daniel Borkmann, jakub.kicinski, David S. Miller,
	quentin.monnet, netdev, open list:KERNEL SELFTEST FRAMEWORK
In-Reply-To: <CA+G9fYtT78hr4i6JDdeKBNoO4G_NMSVWUTyxMoU4_1ZfqGdthg@mail.gmail.com>



On 10/25/2018 5:54 PM, Naresh Kamboju wrote:
> On Tue, 9 Oct 2018 at 12:32, Song Liu <liu.song.a23@gmail.com> wrote:
>>
>> On Mon, Oct 8, 2018 at 6:07 PM Prashant Bhole
>> <bhole_prashant_q7@lab.ntt.co.jp> wrote:
>>>
>>> map_lookup_elem isn't supported by certain map types like:
>>> - BPF_MAP_TYPE_PROG_ARRAY
>>> - BPF_MAP_TYPE_STACK_TRACE
>>> - BPF_MAP_TYPE_XSKMAP
>>> - BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH
>>> Let's add verfier tests to check whether verifier prevents
>>> bpf_map_lookup_elem call on above programs from bpf program.
>>>
>>> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
>>> Acked-by: Alexei Starovoitov <ast@kernel.org>
>> Acked-by: Song Liu <songliubraving@fb.com>
>>
>>> ---
>>>   tools/testing/selftests/bpf/test_verifier.c | 121 +++++++++++++++++++-
>>>   1 file changed, 120 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
>>> index 65ae44c85d27..cf4cd32b6772 100644
>>> --- a/tools/testing/selftests/bpf/test_verifier.c
>>> +++ b/tools/testing/selftests/bpf/test_verifier.c
>>> @@ -48,7 +48,7 @@
>>>
>>>   #define MAX_INSNS      BPF_MAXINSNS
>>>   #define MAX_FIXUPS     8
>>> -#define MAX_NR_MAPS    8
>>> +#define MAX_NR_MAPS    13
>>>   #define POINTER_VALUE  0xcafe4all
>>>   #define TEST_DATA_LEN  64
>>>
>>> @@ -65,6 +65,10 @@ struct bpf_test {
>>>          int fixup_map_hash_48b[MAX_FIXUPS];
>>>          int fixup_map_hash_16b[MAX_FIXUPS];
>>>          int fixup_map_array_48b[MAX_FIXUPS];
>>> +       int fixup_map_sockmap[MAX_FIXUPS];
>>> +       int fixup_map_sockhash[MAX_FIXUPS];
>>> +       int fixup_map_xskmap[MAX_FIXUPS];
>>> +       int fixup_map_stacktrace[MAX_FIXUPS];
>>>          int fixup_prog1[MAX_FIXUPS];
>>>          int fixup_prog2[MAX_FIXUPS];
>>>          int fixup_map_in_map[MAX_FIXUPS];
>>> @@ -4541,6 +4545,85 @@ static struct bpf_test tests[] = {
>>>                  .errstr = "invalid access to packet",
>>>                  .prog_type = BPF_PROG_TYPE_SCHED_CLS,
>>>          },
>>> +       {
>>> +               "prevent map lookup in sockmap",
>>> +               .insns = {
>>> +                       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
>>> +                       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
>>> +                       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
>>> +                       BPF_LD_MAP_FD(BPF_REG_1, 0),
>>> +                       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
>>> +                                    BPF_FUNC_map_lookup_elem),
>>> +                       BPF_EXIT_INSN(),
>>> +               },
>>> +               .fixup_map_sockmap = { 3 },
>>> +               .result = REJECT,
>>> +               .errstr = "cannot pass map_type 15 into func bpf_map_lookup_elem",
>>> +               .prog_type = BPF_PROG_TYPE_SOCK_OPS,
>>> +       },
>>> +       {
>>> +               "prevent map lookup in sockhash",
>>> +               .insns = {
>>> +                       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
>>> +                       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
>>> +                       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
>>> +                       BPF_LD_MAP_FD(BPF_REG_1, 0),
>>> +                       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
>>> +                                    BPF_FUNC_map_lookup_elem),
>>> +                       BPF_EXIT_INSN(),
>>> +               },
>>> +               .fixup_map_sockhash = { 3 },
>>> +               .result = REJECT,
>>> +               .errstr = "cannot pass map_type 18 into func bpf_map_lookup_elem",
>>> +               .prog_type = BPF_PROG_TYPE_SOCK_OPS,
>>> +       },
>>> +       {
>>> +               "prevent map lookup in xskmap",
>>> +               .insns = {
>>> +                       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
>>> +                       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
>>> +                       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
>>> +                       BPF_LD_MAP_FD(BPF_REG_1, 0),
>>> +                       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
>>> +                                    BPF_FUNC_map_lookup_elem),
>>> +                       BPF_EXIT_INSN(),
>>> +               },
>>> +               .fixup_map_xskmap = { 3 },
>>> +               .result = REJECT,
>>> +               .errstr = "cannot pass map_type 17 into func bpf_map_lookup_elem",
>>> +               .prog_type = BPF_PROG_TYPE_XDP,
>>> +       },
>>> +       {
>>> +               "prevent map lookup in stack trace",
>>> +               .insns = {
>>> +                       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
>>> +                       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
>>> +                       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
>>> +                       BPF_LD_MAP_FD(BPF_REG_1, 0),
>>> +                       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
>>> +                                    BPF_FUNC_map_lookup_elem),
>>> +                       BPF_EXIT_INSN(),
>>> +               },
>>> +               .fixup_map_stacktrace = { 3 },
>>> +               .result = REJECT,
>>> +               .errstr = "cannot pass map_type 7 into func bpf_map_lookup_elem",
>>> +               .prog_type = BPF_PROG_TYPE_PERF_EVENT,
>>> +       },
>>> +       {
>>> +               "prevent map lookup in prog array",
>>> +               .insns = {
>>> +                       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
>>> +                       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
>>> +                       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
>>> +                       BPF_LD_MAP_FD(BPF_REG_1, 0),
>>> +                       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
>>> +                                    BPF_FUNC_map_lookup_elem),
>>> +                       BPF_EXIT_INSN(),
>>> +               },
>>> +               .fixup_prog2 = { 3 },
>>> +               .result = REJECT,
>>> +               .errstr = "cannot pass map_type 3 into func bpf_map_lookup_elem",
>>> +       },
>>>          {
>>>                  "valid map access into an array with a constant",
>>>                  .insns = {
>>> @@ -13515,6 +13598,10 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_map_type prog_type,
>>>          int *fixup_map_hash_48b = test->fixup_map_hash_48b;
>>>          int *fixup_map_hash_16b = test->fixup_map_hash_16b;
>>>          int *fixup_map_array_48b = test->fixup_map_array_48b;
>>> +       int *fixup_map_sockmap = test->fixup_map_sockmap;
>>> +       int *fixup_map_sockhash = test->fixup_map_sockhash;
>>> +       int *fixup_map_xskmap = test->fixup_map_xskmap;
>>> +       int *fixup_map_stacktrace = test->fixup_map_stacktrace;
>>>          int *fixup_prog1 = test->fixup_prog1;
>>>          int *fixup_prog2 = test->fixup_prog2;
>>>          int *fixup_map_in_map = test->fixup_map_in_map;
>>> @@ -13603,6 +13690,38 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_map_type prog_type,
>>>                          fixup_percpu_cgroup_storage++;
>>>                  } while (*fixup_percpu_cgroup_storage);
>>>          }
>>> +       if (*fixup_map_sockmap) {
>>> +               map_fds[9] = create_map(BPF_MAP_TYPE_SOCKMAP, sizeof(int),
>>> +                                       sizeof(int), 1);
>>> +               do {
>>> +                       prog[*fixup_map_sockmap].imm = map_fds[9];
>>> +                       fixup_map_sockmap++;
>>> +               } while (*fixup_map_sockmap);
>>> +       }
>>> +       if (*fixup_map_sockhash) {
>>> +               map_fds[10] = create_map(BPF_MAP_TYPE_SOCKHASH, sizeof(int),
>>> +                                       sizeof(int), 1);
>>> +               do {
>>> +                       prog[*fixup_map_sockhash].imm = map_fds[10];
>>> +                       fixup_map_sockhash++;
>>> +               } while (*fixup_map_sockhash);
>>> +       }
>>> +       if (*fixup_map_xskmap) {
>>> +               map_fds[11] = create_map(BPF_MAP_TYPE_XSKMAP, sizeof(int),
>>> +                                       sizeof(int), 1);
>>> +               do {
>>> +                       prog[*fixup_map_xskmap].imm = map_fds[11];
>>> +                       fixup_map_xskmap++;
>>> +               } while (*fixup_map_xskmap);
>>> +       }
> 
> selftests: bpf: test_verifier sockmap, sockhash, xskmap failed on
> mainline and next
> (from 4.19.0-rc7-next-20181011 to till date )
> Are we missing any pre-required kernel configs ?

sockmap/hashmap is dependent on CONFIG_BPF_STREAM_PARSER and xskmap is 
dependent on CONFIG_XDP_SOCKETS.

Reference: include/linux/bpf_types.h

-Prashant

> 
> Test log,
> ------------
> selftests: bpf: test_verifier
> <>
> #274/p prevent map lookup in sockmap Failed to create hash map
> 'Invalid argument'!
> FAIL
> Unexpected error message!
> EXP: cannot pass map_type 15 into func bpf_map_lookup_elem
> RES: fd -1 is not pointing to valid bpf_map
> fd -1 is not pointing to valid bpf_map
> #275/p prevent map lookup in sockhash Failed to create hash map
> 'Invalid argument'!
> FAIL
> Unexpected error message!
> EXP: cannot pass map_type 18 into func bpf_map_lookup_elem
> RES: fd -1 is not pointing to valid bpf_map
> fd -1 is not pointing to valid bpf_map
> #276/p prevent map lookup in xskmap Failed to create hash map 'Invalid
> argument'!
> FAIL
> Unexpected error message!
> EXP: cannot pass map_type 17 into func bpf_map_lookup_elem
> RES: fd -1 is not pointing to valid bpf_map
> fd -1 is not pointing to valid bpf_map
> <>
> Summary: 962 PASSED, 0 SKIPPED, 3 FAILED
> not ok 1..1 selftests: bpf: test_verifier [FAIL]
> selftests: bpf_test_verifier [FAIL]
> 
> -mainline results history,
> https://qa-reports.linaro.org/lkft/linux-next-oe/tests/kselftest/bpf_test_verifier
> 
> -next results history,
> https://qa-reports.linaro.org/lkft/linux-next-oe/tests/kselftest/bpf_test_verifier
> 
> Test case full log,
> https://lkft.validation.linaro.org/scheduler/job/461881#L1655
> 
> Best regards
> Naresh Kamboju
> 
>>> +       if (*fixup_map_stacktrace) {
>>> +               map_fds[12] = create_map(BPF_MAP_TYPE_STACK_TRACE, sizeof(u32),
>>> +                                        sizeof(u64), 1);
>>> +               do {
>>> +                       prog[*fixup_map_stacktrace].imm = map_fds[12];
>>> +                       fixup_map_stacktrace++;
>>> +               } while (fixup_map_stacktrace);
>>> +       }
>>>   }
>>>
>>>   static void do_test_single(struct bpf_test *test, bool unpriv,
>>> --
>>> 2.17.1
>>>
>>>
> 
> 

^ permalink raw reply

* Re: Regression in 4.19 net/phy/realtek: garbled sysfs output
From: Holger Hoffstätte @ 2018-10-25  9:29 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Netdev, Jassi Brar, David S. Miller, Heiner Kallweit
In-Reply-To: <20181024201219.GB19440@lunn.ch>

On 10/24/18 22:12, Andrew Lunn wrote:
> On Wed, Oct 24, 2018 at 09:36:02PM +0200, Holger Hoffstätte wrote:
>> Hi,
>>
>> Since 4.19 r8169 depends on phylib:
>>
>> $lsmod | grep r8169
>> r8169                  81920  0
>> libphy                 57344  2 r8169,realtek
>>
>> Unfortunately this now gives me the following sysfs error:
>>
>> $cd /sys/module/realtek/drivers
>> $ls -l
>> ls: cannot access 'mdio_bus:RTL8201F 10/100Mbps Ethernet': No such file or directory
>> total 0
>> lrwxrwxrwx 1 root root 0 Oct 24 21:09 'mdio_bus:RTL8201CP Ethernet' -> '../../../bus/mdio_bus/drivers/RTL8201CP Ethernet'
>> l????????? ? ?    ?    ?            ? 'mdio_bus:RTL8201F 10/100Mbps Ethernet'
>> lrwxrwxrwx 1 root root 0 Oct 24 21:09 'mdio_bus:RTL8211 Gigabit Ethernet' -> '../../../bus/mdio_bus/drivers/RTL8211 Gigabit Ethernet'
>> [..]
>>
>> Apparently the forward slash in "10/100Mbps Ethernet" is interpreted as
>> directory separator that leads nowhere, and was introduced in commit
>> 513588dd44b ("net: phy: realtek: add RTL8201F phy-id and functions").
>>
>> Would it be acceptable to change the name simply to "RTL8201F Ethernet"?
> 
> Hi Holger
> 
> Or use "RTL8201F Fast Ethernet"

Yes, even better since it's correct. :)
As expected changing the name .name entry fixes the sysfs behaviour.

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index 7fc8508b5..271e8adc3 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -220,7 +220,7 @@ static struct phy_driver realtek_drvs[] = {
  		.flags          = PHY_HAS_INTERRUPT,
  	}, {
  		.phy_id		= 0x001cc816,
-		.name		= "RTL8201F 10/100Mbps Ethernet",
+		.name		= "RTL8201F Fast Ethernet",
  		.phy_id_mask	= 0x001fffff,
  		.features	= PHY_BASIC_FEATURES,
  		.flags		= PHY_HAS_INTERRUPT,

> I wonder if other drivers have similar problems?
> 
> davicom.c:      .name           = "Davicom DM9161B/C",
> intel-xway.c:           .name           = "Intel XWAY PHY11G (PEF 7071/PEF 7072) v1.3",
> intel-xway.c:           .name           = "Intel XWAY PHY11G (PEF 7071/PEF 7072) v1.4",
> intel-xway.c:           .name           = "Intel XWAY PHY11G (PEF 7071/PEF 7072) v1.5 / v1.6",
> intel-xway.c:           .name           = "Intel XWAY PHY22F (PEF 7061) v1.5 / v1.6",
> smsc.c:	 .name	       = "SMSC LAN8710/LAN8720",

I'm open to suggestions about how to rename those identifiers.
"|" seems to work but IMHO looks a bit weird:

"Davicom DM9161B/C" -> "Davicom DM9161B|C"
"(PEF 7071/PEF 7072) v1.5 / v1.6" -> "(PEF 7071|7072) v1.5|6"

We can go full regex, which will probably get me voted off the island:
"(PEF 7071/PEF 7072) v1.5 / v1.6" -> "(PEF {7071,7072}) v1.{5,6}"

Cast your votes now!

cheers,
Holger

^ permalink raw reply related

* Re: [PATCH net-next 0/4] net: ethernet: ti: cpsw: fix vlan mcast
From: David Miller @ 2018-10-25 18:34 UTC (permalink / raw)
  To: ivan.khoronzhuk
  Cc: grygorii.strashko, linux-omap, netdev, linux-kernel,
	alexander.h.duyck, bjorn
In-Reply-To: <20181024221059.21834-1-ivan.khoronzhuk@linaro.org>

From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Date: Thu, 25 Oct 2018 01:10:55 +0300

> The cpsw holds separate mcast entires for vlan entries. At this moment
> driver adds only not vlan mcast addresses, omitting vlan/mcast entries.
> As result mcast for vlans doesn't work. It can be fixed by adding same
> mcast entries for every created vlan, but this patchseries uses more
> sophisticated way and allows to create mcast entries only for vlans
> that really require it. Generic functions from this series can be
> reused for fixing vlan and macvlan unicast.

This is a bug fix but targetted at net-next, and indeed it is quite
invasive as it adds new core infrastructure and converts the generic
vlan code over to using it.

Unfortunately net-next is closed.

So if you want this bug fixed in mainline you will have to come up
with a less invasive fix, and resubmit this net-next approach when the
net-next tree opens back up.

Thank you.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox