Netdev List
 help / color / mirror / Atom feed
* [PATCH net v4] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
From: Duoming Zhou @ 2022-04-27  1:14 UTC (permalink / raw)
  To: krzysztof.kozlowski, pabeni, linux-kernel
  Cc: davem, gregkh, alexander.deucher, akpm, broonie, netdev, kuba,
	linma, Duoming Zhou

There are destructive operations such as nfcmrvl_fw_dnld_abort and
gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
gpio and so on could be destructed while the upper layer functions such as
nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
to double-free, use-after-free and null-ptr-deref bugs.

There are three situations that could lead to double-free bugs.

The first situation is shown below:

   (Thread 1)                 |      (Thread 2)
nfcmrvl_fw_dnld_start         |
 ...                          |  nfcmrvl_nci_unregister_dev
 release_firmware()           |   nfcmrvl_fw_dnld_abort
  kfree(fw) //(1)             |    fw_dnld_over
                              |     release_firmware
  ...                         |      kfree(fw) //(2)
                              |     ...

The second situation is shown below:

   (Thread 1)                 |      (Thread 2)
nfcmrvl_fw_dnld_start         |
 ...                          |
 mod_timer                    |
 (wait a time)                |
 fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
   fw_dnld_over               |   nfcmrvl_fw_dnld_abort
    release_firmware          |    fw_dnld_over
     kfree(fw) //(1)          |     release_firmware
     ...                      |      kfree(fw) //(2)

The third situation is shown below:

       (Thread 1)               |       (Thread 2)
nfcmrvl_nci_recv_frame          |
 if(..->fw_download_in_progress)|
  nfcmrvl_fw_dnld_recv_frame    |
   queue_work                   |
                                |
fw_dnld_rx_work                 | nfcmrvl_nci_unregister_dev
 fw_dnld_over                   |  nfcmrvl_fw_dnld_abort
  release_firmware              |   fw_dnld_over
   kfree(fw) //(1)              |    release_firmware
                                |     kfree(fw) //(2)

The firmware struct is deallocated in position (1) and deallocated
in position (2) again.

The crash trace triggered by POC is like below:

[  122.640457] BUG: KASAN: double-free or invalid-free in fw_dnld_over+0x28/0xf0
[  122.640457] Call Trace:
[  122.640457]  <TASK>
[  122.640457]  kfree+0xb0/0x330
[  122.640457]  fw_dnld_over+0x28/0xf0
[  122.640457]  nfcmrvl_nci_unregister_dev+0x61/0x70
[  122.640457]  nci_uart_tty_close+0x87/0xd0
[  122.640457]  tty_ldisc_kill+0x3e/0x80
[  122.640457]  tty_ldisc_hangup+0x1b2/0x2c0
[  122.640457]  __tty_hangup.part.0+0x316/0x520
[  122.640457]  tty_release+0x200/0x670
[  122.640457]  __fput+0x110/0x410
[  122.640457]  task_work_run+0x86/0xd0
[  122.640457]  exit_to_user_mode_prepare+0x1aa/0x1b0
[  122.640457]  syscall_exit_to_user_mode+0x19/0x50
[  122.640457]  do_syscall_64+0x48/0x90
[  122.640457]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  122.640457] RIP: 0033:0x7f68433f6beb

What's more, there are also use-after-free and null-ptr-deref bugs
in nfcmrvl_fw_dnld_start. If we deallocate firmware struct, gpio or
set null to the members of priv->fw_dnld in nfcmrvl_nci_unregister_dev,
then, we dereference firmware, gpio or the members of priv->fw_dnld in
nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen.

This patch reorders destructive operations after nci_unregister_device
and adds bool variable protected by device_lock to synchronize between
cleanup routine and firmware download routine. The process is shown below.

       (Thread 1)               |       (Thread 2)
nfcmrvl_nci_unregister_dev      |
  nci_unregister_device         |
    nfc_unregister_device       | nfc_fw_download
      device_lock()             |
      ...                       |
      nfc_download = false;     |   ...
      device_unlock()           |
  ...                           |   device_lock()
  (destructive operations)      |   if(.. || !nfc_download)
                                |     goto error;
                                | error:
                                |   device_unlock()

If the device is detaching, the download function will goto error.
If the download function is executing, nci_unregister_device will
wait until download function is finished.

Fixes: 3194c6870158 ("NFC: nfcmrvl: add firmware download support")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
Changes in v4:
  - Change bool variable nfc_download to static.

 drivers/nfc/nfcmrvl/main.c | 2 +-
 net/nfc/core.c             | 6 +++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/nfc/nfcmrvl/main.c b/drivers/nfc/nfcmrvl/main.c
index 2fcf545012b..1a5284de434 100644
--- a/drivers/nfc/nfcmrvl/main.c
+++ b/drivers/nfc/nfcmrvl/main.c
@@ -183,6 +183,7 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
 {
 	struct nci_dev *ndev = priv->ndev;
 
+	nci_unregister_device(ndev);
 	if (priv->ndev->nfc_dev->fw_download_in_progress)
 		nfcmrvl_fw_dnld_abort(priv);
 
@@ -191,7 +192,6 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
 	if (gpio_is_valid(priv->config.reset_n_io))
 		gpio_free(priv->config.reset_n_io);
 
-	nci_unregister_device(ndev);
 	nci_free_device(ndev);
 	kfree(priv);
 }
diff --git a/net/nfc/core.c b/net/nfc/core.c
index dc7a2404efd..1d91334ee86 100644
--- a/net/nfc/core.c
+++ b/net/nfc/core.c
@@ -25,6 +25,8 @@
 #define NFC_CHECK_PRES_FREQ_MS	2000
 
 int nfc_devlist_generation;
+/* nfc_download: used to judge whether nfc firmware download could start */
+static bool nfc_download;
 DEFINE_MUTEX(nfc_devlist_mutex);
 
 /* NFC device ID bitmap */
@@ -38,7 +40,7 @@ int nfc_fw_download(struct nfc_dev *dev, const char *firmware_name)
 
 	device_lock(&dev->dev);
 
-	if (!device_is_registered(&dev->dev)) {
+	if (!device_is_registered(&dev->dev) || !nfc_download) {
 		rc = -ENODEV;
 		goto error;
 	}
@@ -1134,6 +1136,7 @@ int nfc_register_device(struct nfc_dev *dev)
 			dev->rfkill = NULL;
 		}
 	}
+	nfc_download = true;
 	device_unlock(&dev->dev);
 
 	rc = nfc_genl_device_added(dev);
@@ -1166,6 +1169,7 @@ void nfc_unregister_device(struct nfc_dev *dev)
 		rfkill_unregister(dev->rfkill);
 		rfkill_destroy(dev->rfkill);
 	}
+	nfc_download = false;
 	device_unlock(&dev->dev);
 
 	if (dev->ops->check_presence) {
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH net v3] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
From: kernel test robot @ 2022-04-27  1:02 UTC (permalink / raw)
  To: Duoming Zhou, krzysztof.kozlowski, pabeni, linux-kernel
  Cc: kbuild-all, davem, kuba, netdev, gregkh, alexander.deucher,
	broonie, akpm, linma, Duoming Zhou
In-Reply-To: <20220426155911.77761-1-duoming@zju.edu.cn>

Hi Duoming,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Duoming-Zhou/nfc-nfcmrvl-main-reorder-destructive-operations-in-nfcmrvl_nci_unregister_dev-to-avoid-bugs/20220427-000533
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 24cbdb910bb62b5be3865275e5682be1a7708c0f
config: i386-randconfig-s001-20220425 (https://download.01.org/0day-ci/archive/20220427/202204270807.CiWK3t7m-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/intel-lab-lkp/linux/commit/87bb704da970003fb8a7091317e0e5a1b1f2b009
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Duoming-Zhou/nfc-nfcmrvl-main-reorder-destructive-operations-in-nfcmrvl_nci_unregister_dev-to-avoid-bugs/20220427-000533
        git checkout 87bb704da970003fb8a7091317e0e5a1b1f2b009
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash net/nfc/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> net/nfc/core.c:28:6: sparse: sparse: symbol 'nfc_download' was not declared. Should it be static?

Please review and possibly fold the followup patch.

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply

* Re: [PATCH v2 net] net: Use this_cpu_inc() to increment net->core_stats
From: patchwork-bot+netdevbpf @ 2022-04-27  1:00 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: kuba, netdev, edumazet, davem, pabeni, tglx, peterz
In-Reply-To: <YmbO0pxgtKpCw4SY@linutronix.de>

Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 25 Apr 2022 18:39:46 +0200 you wrote:
> The macro dev_core_stats_##FIELD##_inc() disables preemption and invokes
> netdev_core_stats_alloc() to return a per-CPU pointer.
> netdev_core_stats_alloc() will allocate memory on its first invocation
> which breaks on PREEMPT_RT because it requires non-atomic context for
> memory allocation.
> 
> This can be avoided by enabling preemption in netdev_core_stats_alloc()
> assuming the caller always disables preemption.
> 
> [...]

Here is the summary with links:
  - [v2,net] net: Use this_cpu_inc() to increment net->core_stats
    https://git.kernel.org/netdev/net/c/6510ea973d8d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK
From: patchwork-bot+netdevbpf @ 2022-04-27  1:00 UTC (permalink / raw)
  To: Martin Blumenstingl
  Cc: netdev, hauke, linux-kernel, andrew, vivien.didelot, olteanv,
	davem, kuba, pabeni, stable, jan
In-Reply-To: <20220425152027.2220750-1-martin.blumenstingl@googlemail.com>

Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 25 Apr 2022 17:20:27 +0200 you wrote:
> Commit 4b5923249b8fa4 ("net: dsa: lantiq_gswip: Configure all remaining
> GSWIP_MII_CFG bits") added all known bits in the GSWIP_MII_CFGp
> register. It helped bring this register into a well-defined state so the
> driver has to rely less on the bootloader to do things right.
> Unfortunately it also sets the GSWIP_MII_CFG_RMII_CLK bit without any
> possibility to configure it. Upon further testing it turns out that all
> boards which are supported by the GSWIP driver in OpenWrt which use an
> RMII PHY have a dedicated oscillator on the board which provides the
> 50MHz RMII reference clock.
> 
> [...]

Here is the summary with links:
  - [net] net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK
    https://git.kernel.org/netdev/net/c/71cffebf6358

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* [RFC PATCH] nfc: nfcmrvl: main: nfc_download can be static
From: kernel test robot @ 2022-04-27  0:57 UTC (permalink / raw)
  To: Duoming Zhou, krzysztof.kozlowski, pabeni, linux-kernel
  Cc: kbuild-all, davem, kuba, netdev, gregkh, alexander.deucher,
	broonie, akpm, linma, Duoming Zhou
In-Reply-To: <20220426155911.77761-1-duoming@zju.edu.cn>

net/nfc/core.c:28:6: warning: symbol 'nfc_download' was not declared. Should it be static?

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
---
 net/nfc/core.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/nfc/core.c b/net/nfc/core.c
index da8199f67d42c..9035935e93d80 100644
--- a/net/nfc/core.c
+++ b/net/nfc/core.c
@@ -25,7 +25,7 @@
 #define NFC_CHECK_PRES_FREQ_MS	2000
 
 int nfc_devlist_generation;
-bool nfc_download;
+static bool nfc_download;
 DEFINE_MUTEX(nfc_devlist_mutex);
 
 /* NFC device ID bitmap */

^ permalink raw reply related

* Re: [PATCH v1] net: stmmac: dwmac-imx: comment spelling fix
From: patchwork-bot+netdevbpf @ 2022-04-27  0:50 UTC (permalink / raw)
  To: Marcel Ziswiler
  Cc: netdev, marcel.ziswiler, alexandre.torgue, davem, festevam,
	fugang.duan, peppe.cavallaro, kuba, joabreu, mcoquelin.stm32,
	linux-imx, pabeni, kernel, s.hauer, shawnguo, linux-arm-kernel,
	linux-kernel, linux-stm32
In-Reply-To: <20220425154856.169499-1-marcel@ziswiler.com>

Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 25 Apr 2022 17:48:56 +0200 you wrote:
> From: Marcel Ziswiler <marcel.ziswiler@toradex.com>
> 
> Fix spelling in comment.
> 
> Fixes: 94abdad6974a ("net: ethernet: dwmac: add ethernet glue logic for NXP imx8 chip")
> Signed-off-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
> 
> [...]

Here is the summary with links:
  - [v1] net: stmmac: dwmac-imx: comment spelling fix
    https://git.kernel.org/netdev/net-next/c/b1190d5175ac

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH 0/2] net: Remove unused __SLOW_DOWN_IO
From: patchwork-bot+netdevbpf @ 2022-04-27  0:50 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: kuba, davem, pabeni, 3chas3, linux-atm-general, netdev,
	linux-kernel, bhelgaas
In-Reply-To: <20220425212644.1659070-1-helgaas@kernel.org>

Hello:

This series was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 25 Apr 2022 16:26:42 -0500 you wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> Remove unused mentions of __SLOW_DOWN_IO.
> 
> Bjorn Helgaas (2):
>   net: wan: atp: remove unused eeprom_delay()
>   net: remove comments that mention obsolete __SLOW_DOWN_IO
> 
> [...]

Here is the summary with links:
  - [1/2] net: wan: atp: remove unused eeprom_delay()
    https://git.kernel.org/netdev/net-next/c/dac173db114d
  - [2/2] net: remove comments that mention obsolete __SLOW_DOWN_IO
    https://git.kernel.org/netdev/net-next/c/e39f63fe0d94

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next] net: dsa: mt753x: fix pcs conversion regression
From: patchwork-bot+netdevbpf @ 2022-04-27  0:50 UTC (permalink / raw)
  To: Russell King
  Cc: Landen.Chao, dqfext, sean.wang, daniel, andrew, vivien.didelot,
	f.fainelli, olteanv, davem, kuba, pabeni, matthias.bgg, netdev,
	linux-arm-kernel, linux-mediatek
In-Reply-To: <E1nj6FW-007WZB-5Y@rmk-PC.armlinux.org.uk>

Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 25 Apr 2022 22:28:02 +0100 you wrote:
> Daniel Golle reports that the conversion of mt753x to phylink PCS caused
> an oops as below.
> 
> The problem is with the placement of the PCS initialisation, which
> occurs after mt7531_setup() has been called. However, burited in this
> function is a call to setup the CPU port, which requires the PCS
> structure to be already setup.
> 
> [...]

Here is the summary with links:
  - [net-next] net: dsa: mt753x: fix pcs conversion regression
    https://git.kernel.org/netdev/net-next/c/fae463084032

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next] net: tls: fix async vs NIC crypto offload
From: patchwork-bot+netdevbpf @ 2022-04-27  0:50 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, pabeni, gal, borisp, john.fastabend, daniel
In-Reply-To: <20220425233309.344858-1-kuba@kernel.org>

Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 25 Apr 2022 16:33:09 -0700 you wrote:
> When NIC takes care of crypto (or the record has already
> been decrypted) we forget to update darg->async. ->async
> is supposed to mean whether record is async capable on
> input and whether record has been queued for async crypto
> on output.
> 
> Reported-by: Gal Pressman <gal@nvidia.com>
> Fixes: 3547a1f9d988 ("tls: rx: use async as an in-out argument")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> 
> [...]

Here is the summary with links:
  - [net-next] net: tls: fix async vs NIC crypto offload
    https://git.kernel.org/netdev/net-next/c/c706b2b5ed74

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH v2 net-next] net: generalize skb freeing deferral to per-cpu lists
From: patchwork-bot+netdevbpf @ 2022-04-27  0:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, pabeni, netdev, edumazet
In-Reply-To: <20220422201237.416238-1-eric.dumazet@gmail.com>

Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 22 Apr 2022 13:12:37 -0700 you wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Logic added in commit f35f821935d8 ("tcp: defer skb freeing after socket
> lock is released") helped bulk TCP flows to move the cost of skbs
> frees outside of critical section where socket lock was held.
> 
> But for RPC traffic, or hosts with RFS enabled, the solution is far from
> being ideal.
> 
> [...]

Here is the summary with links:
  - [v2,net-next] net: generalize skb freeing deferral to per-cpu lists
    https://git.kernel.org/netdev/net-next/c/68822bdf76f1

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* 答复: [PATCH bpf-next] samples/bpf: detach xdp prog when program exits unexpectedly in xdp_rxq_info_user
From: shaozhengchao @ 2022-04-27  0:45 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, open list, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	john fastabend, Andrii Nakryiko, Martin Lau, Song Liu,
	Yonghong Song, KP Singh, weiyongjun (A), yuehaibing
In-Reply-To: <CAEf4BzY=VajaEH_09FX0-wuPFCJ6te=shZa0jj2Jc7ukL-WzMQ@mail.gmail.com>

Thank you for your reply.

I will fix the compilation warning firstly, and then convert this sample to skeleton. Thank you very much.

-----邮件原件-----
发件人: Andrii Nakryiko [mailto:andrii.nakryiko@gmail.com] 
发送时间: 2022年4月26日 12:35
收件人: shaozhengchao <shaozhengchao@huawei.com>
抄送: bpf <bpf@vger.kernel.org>; Networking <netdev@vger.kernel.org>; open list <linux-kernel@vger.kernel.org>; Alexei Starovoitov <ast@kernel.org>; Daniel Borkmann <daniel@iogearbox.net>; David S. Miller <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; Jesper Dangaard Brouer <hawk@kernel.org>; john fastabend <john.fastabend@gmail.com>; Andrii Nakryiko <andrii@kernel.org>; Martin Lau <kafai@fb.com>; Song Liu <songliubraving@fb.com>; Yonghong Song <yhs@fb.com>; KP Singh <kpsingh@kernel.org>; weiyongjun (A) <weiyongjun1@huawei.com>; yuehaibing <yuehaibing@huawei.com>
主题: Re: [PATCH bpf-next] samples/bpf: detach xdp prog when program exits unexpectedly in xdp_rxq_info_user

On Thu, Apr 21, 2022 at 5:07 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
>
> When xdp_rxq_info_user program exits unexpectedly, it doesn't detach 
> xdp prog of device, and other xdp prog can't be attached to the 
> device. So call init_exit() to detach xdp prog when program exits unexpectedly.
>
> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
> ---
>  samples/bpf/xdp_rxq_info_user.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
>

you are introducing a new compilation warning, please fix it


/data/users/andriin/linux/samples/bpf/xdp_rxq_info_user.c: In function
‘options2str’:
/data/users/andriin/linux/samples/bpf/xdp_rxq_info_user.c:153:1:
warning: control reaches end of non-void function [-Wreturn-type]
  153 | }
      | ^

It also would be good to instead use bpf_link-based XDP attachment that would be auto-detached automatically on process crash
(bpf_link_create() FTW). Please consider also converting this sample to skeleton (and then bpf_program__attach_xdp() as high-level alternative to bpf_link_create()).

> diff --git a/samples/bpf/xdp_rxq_info_user.c 
> b/samples/bpf/xdp_rxq_info_user.c index f2d90cba5164..6378007e085a 
> 100644
> --- a/samples/bpf/xdp_rxq_info_user.c
> +++ b/samples/bpf/xdp_rxq_info_user.c
> @@ -18,7 +18,7 @@ static const char *__doc__ = " XDP RX-queue info extract example\n\n"
>  #include <getopt.h>
>  #include <net/if.h>
>  #include <time.h>

[...]

^ permalink raw reply

* Re: [PATCH v35 08/29] LSM: Use lsmblob in security_secctx_to_secid
From: John Johansen @ 2022-04-27  0:38 UTC (permalink / raw)
  To: Casey Schaufler, casey.schaufler, jmorris, linux-security-module,
	selinux
  Cc: linux-audit, keescook, penguin-kernel, paul, stephen.smalley.work,
	linux-kernel, netdev, netfilter-devel
In-Reply-To: <20220418145945.38797-9-casey@schaufler-ca.com>

On 4/18/22 07:59, Casey Schaufler wrote:
> Change the security_secctx_to_secid interface to use a lsmblob
> structure in place of the single u32 secid in support of
> module stacking. Change its callers to do the same.
> 
> The security module hook is unchanged, still passing back a secid.
> The infrastructure passes the correct entry from the lsmblob.
> 
> Acked-by: Paul Moore <paul@paul-moore.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
> Cc: netdev@vger.kernel.org
> Cc: netfilter-devel@vger.kernel.org
> To: Pablo Neira Ayuso <pablo@netfilter.org>

Reviewed-by: John Johansen <john.johansen@canonical.com>

> ---
>  include/linux/security.h          | 26 ++++++++++++++++++--
>  kernel/cred.c                     |  4 +---
>  net/netfilter/nft_meta.c          | 10 ++++----
>  net/netfilter/xt_SECMARK.c        |  7 +++++-
>  net/netlabel/netlabel_unlabeled.c | 23 +++++++++++-------
>  security/security.c               | 40 ++++++++++++++++++++++++++-----
>  6 files changed, 85 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 68ab0add23d3..57879f0b9f89 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -199,6 +199,27 @@ static inline bool lsmblob_equal(const struct lsmblob *bloba,
>  extern int lsm_name_to_slot(char *name);
>  extern const char *lsm_slot_to_name(int slot);
>  
> +/**
> + * lsmblob_value - find the first non-zero value in an lsmblob structure.
> + * @blob: Pointer to the data
> + *
> + * This needs to be used with extreme caution, as the cases where
> + * it is appropriate are rare.
> + *
> + * Return the first secid value set in the lsmblob.
> + * There should only be one.
> + */
> +static inline u32 lsmblob_value(const struct lsmblob *blob)
> +{
> +	int i;
> +
> +	for (i = 0; i < LSMBLOB_ENTRIES; i++)
> +		if (blob->secid[i])
> +			return blob->secid[i];
> +
> +	return 0;
> +}
> +
>  /* These functions are in security/commoncap.c */
>  extern int cap_capable(const struct cred *cred, struct user_namespace *ns,
>  		       int cap, unsigned int opts);
> @@ -529,7 +550,8 @@ int security_setprocattr(const char *lsm, const char *name, void *value,
>  int security_netlink_send(struct sock *sk, struct sk_buff *skb);
>  int security_ismaclabel(const char *name);
>  int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen);
> -int security_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid);
> +int security_secctx_to_secid(const char *secdata, u32 seclen,
> +			     struct lsmblob *blob);
>  void security_release_secctx(char *secdata, u32 seclen);
>  void security_inode_invalidate_secctx(struct inode *inode);
>  int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen);
> @@ -1384,7 +1406,7 @@ static inline int security_secid_to_secctx(u32 secid, char **secdata, u32 *secle
>  
>  static inline int security_secctx_to_secid(const char *secdata,
>  					   u32 seclen,
> -					   u32 *secid)
> +					   struct lsmblob *blob)
>  {
>  	return -EOPNOTSUPP;
>  }
> diff --git a/kernel/cred.c b/kernel/cred.c
> index 3925d38f49f4..adea727744f4 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -791,14 +791,12 @@ EXPORT_SYMBOL(set_security_override);
>  int set_security_override_from_ctx(struct cred *new, const char *secctx)
>  {
>  	struct lsmblob blob;
> -	u32 secid;
>  	int ret;
>  
> -	ret = security_secctx_to_secid(secctx, strlen(secctx), &secid);
> +	ret = security_secctx_to_secid(secctx, strlen(secctx), &blob);
>  	if (ret < 0)
>  		return ret;
>  
> -	lsmblob_init(&blob, secid);
>  	return set_security_override(new, &blob);
>  }
>  EXPORT_SYMBOL(set_security_override_from_ctx);
> diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
> index ac4859241e17..fc0028c9e33d 100644
> --- a/net/netfilter/nft_meta.c
> +++ b/net/netfilter/nft_meta.c
> @@ -860,21 +860,21 @@ static const struct nla_policy nft_secmark_policy[NFTA_SECMARK_MAX + 1] = {
>  
>  static int nft_secmark_compute_secid(struct nft_secmark *priv)
>  {
> -	u32 tmp_secid = 0;
> +	struct lsmblob blob;
>  	int err;
>  
> -	err = security_secctx_to_secid(priv->ctx, strlen(priv->ctx), &tmp_secid);
> +	err = security_secctx_to_secid(priv->ctx, strlen(priv->ctx), &blob);
>  	if (err)
>  		return err;
>  
> -	if (!tmp_secid)
> +	if (!lsmblob_is_set(&blob))
>  		return -ENOENT;
>  
> -	err = security_secmark_relabel_packet(tmp_secid);
> +	err = security_secmark_relabel_packet(lsmblob_value(&blob));
>  	if (err)
>  		return err;
>  
> -	priv->secid = tmp_secid;
> +	priv->secid = lsmblob_value(&blob);
>  	return 0;
>  }
>  
> diff --git a/net/netfilter/xt_SECMARK.c b/net/netfilter/xt_SECMARK.c
> index 498a0bf6f044..87ca3a537d1c 100644
> --- a/net/netfilter/xt_SECMARK.c
> +++ b/net/netfilter/xt_SECMARK.c
> @@ -42,13 +42,14 @@ secmark_tg(struct sk_buff *skb, const struct xt_secmark_target_info_v1 *info)
>  
>  static int checkentry_lsm(struct xt_secmark_target_info_v1 *info)
>  {
> +	struct lsmblob blob;
>  	int err;
>  
>  	info->secctx[SECMARK_SECCTX_MAX - 1] = '\0';
>  	info->secid = 0;
>  
>  	err = security_secctx_to_secid(info->secctx, strlen(info->secctx),
> -				       &info->secid);
> +				       &blob);
>  	if (err) {
>  		if (err == -EINVAL)
>  			pr_info_ratelimited("invalid security context \'%s\'\n",
> @@ -56,6 +57,10 @@ static int checkentry_lsm(struct xt_secmark_target_info_v1 *info)
>  		return err;
>  	}
>  
> +	/* xt_secmark_target_info can't be changed to use lsmblobs because
> +	 * it is exposed as an API. Use lsmblob_value() to get the one
> +	 * value that got set by security_secctx_to_secid(). */
> +	info->secid = lsmblob_value(&blob);
>  	if (!info->secid) {
>  		pr_info_ratelimited("unable to map security context \'%s\'\n",
>  				    info->secctx);
> diff --git a/net/netlabel/netlabel_unlabeled.c b/net/netlabel/netlabel_unlabeled.c
> index 8490e46359ae..f3e2cde76919 100644
> --- a/net/netlabel/netlabel_unlabeled.c
> +++ b/net/netlabel/netlabel_unlabeled.c
> @@ -880,7 +880,7 @@ static int netlbl_unlabel_staticadd(struct sk_buff *skb,
>  	void *addr;
>  	void *mask;
>  	u32 addr_len;
> -	u32 secid;
> +	struct lsmblob blob;
>  	struct netlbl_audit audit_info;
>  
>  	/* Don't allow users to add both IPv4 and IPv6 addresses for a
> @@ -904,13 +904,18 @@ static int netlbl_unlabel_staticadd(struct sk_buff *skb,
>  	ret_val = security_secctx_to_secid(
>  		                  nla_data(info->attrs[NLBL_UNLABEL_A_SECCTX]),
>  				  nla_len(info->attrs[NLBL_UNLABEL_A_SECCTX]),
> -				  &secid);
> +				  &blob);
>  	if (ret_val != 0)
>  		return ret_val;
>  
> +	/* netlbl_unlhsh_add will be changed to pass a struct lsmblob *
> +	 * instead of a u32 later in this patch set. security_secctx_to_secid()
> +	 * will only be setting one entry in the lsmblob struct, so it is
> +	 * safe to use lsmblob_value() to get that one value. */
> +
>  	return netlbl_unlhsh_add(&init_net,
> -				 dev_name, addr, mask, addr_len, secid,
> -				 &audit_info);
> +				 dev_name, addr, mask, addr_len,
> +				 lsmblob_value(&blob), &audit_info);
>  }
>  
>  /**
> @@ -931,7 +936,7 @@ static int netlbl_unlabel_staticadddef(struct sk_buff *skb,
>  	void *addr;
>  	void *mask;
>  	u32 addr_len;
> -	u32 secid;
> +	struct lsmblob blob;
>  	struct netlbl_audit audit_info;
>  
>  	/* Don't allow users to add both IPv4 and IPv6 addresses for a
> @@ -953,13 +958,15 @@ static int netlbl_unlabel_staticadddef(struct sk_buff *skb,
>  	ret_val = security_secctx_to_secid(
>  		                  nla_data(info->attrs[NLBL_UNLABEL_A_SECCTX]),
>  				  nla_len(info->attrs[NLBL_UNLABEL_A_SECCTX]),
> -				  &secid);
> +				  &blob);
>  	if (ret_val != 0)
>  		return ret_val;
>  
> +	/* security_secctx_to_secid() will only put one secid into the lsmblob
> +	 * so it's safe to use lsmblob_value() to get the secid. */
>  	return netlbl_unlhsh_add(&init_net,
> -				 NULL, addr, mask, addr_len, secid,
> -				 &audit_info);
> +				 NULL, addr, mask, addr_len,
> +				 lsmblob_value(&blob), &audit_info);
>  }
>  
>  /**
> diff --git a/security/security.c b/security/security.c
> index e9f1487af0e5..f814a41c5d9f 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -2211,10 +2211,22 @@ int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
>  }
>  EXPORT_SYMBOL(security_secid_to_secctx);
>  
> -int security_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid)
> +int security_secctx_to_secid(const char *secdata, u32 seclen,
> +			     struct lsmblob *blob)
>  {
> -	*secid = 0;
> -	return call_int_hook(secctx_to_secid, 0, secdata, seclen, secid);
> +	struct security_hook_list *hp;
> +	int rc;
> +
> +	lsmblob_init(blob, 0);
> +	hlist_for_each_entry(hp, &security_hook_heads.secctx_to_secid, list) {
> +		if (WARN_ON(hp->lsmid->slot < 0 || hp->lsmid->slot >= lsm_slot))
> +			continue;
> +		rc = hp->hook.secctx_to_secid(secdata, seclen,
> +					      &blob->secid[hp->lsmid->slot]);
> +		if (rc != 0)
> +			return rc;
> +	}
> +	return 0;
>  }
>  EXPORT_SYMBOL(security_secctx_to_secid);
>  
> @@ -2365,10 +2377,26 @@ int security_socket_getpeersec_stream(struct socket *sock, char __user *optval,
>  				optval, optlen, len);
>  }
>  
> -int security_socket_getpeersec_dgram(struct socket *sock, struct sk_buff *skb, u32 *secid)
> +int security_socket_getpeersec_dgram(struct socket *sock, struct sk_buff *skb,
> +				     u32 *secid)
>  {
> -	return call_int_hook(socket_getpeersec_dgram, -ENOPROTOOPT, sock,
> -			     skb, secid);
> +	struct security_hook_list *hp;
> +	int rc = -ENOPROTOOPT;
> +
> +	/*
> +	 * Only one security module should provide a real hook for
> +	 * this. A stub or bypass like is used in BPF should either
> +	 * (somehow) leave rc unaltered or return -ENOPROTOOPT.
> +	 */
> +	hlist_for_each_entry(hp, &security_hook_heads.socket_getpeersec_dgram,
> +			     list) {
> +		if (WARN_ON(hp->lsmid->slot < 0 || hp->lsmid->slot >= lsm_slot))
> +			continue;
> +		rc = hp->hook.socket_getpeersec_dgram(sock, skb, secid);
> +		if (rc != -ENOPROTOOPT)
> +			break;
> +	}
> +	return rc;
>  }
>  EXPORT_SYMBOL(security_socket_getpeersec_dgram);
>  


^ permalink raw reply

* Re: [PATCH net-next] net: dsa: mt753x: fix pcs conversion regression
From: Jakub Kicinski @ 2022-04-27  0:34 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Landen Chao, DENG Qingfang, Sean Wang, Daniel Golle, Andrew Lunn,
	Vivien Didelot, Florian Fainelli, Vladimir Oltean,
	David S. Miller, Paolo Abeni, Matthias Brugger, netdev,
	linux-arm-kernel, linux-mediatek
In-Reply-To: <E1nj6FW-007WZB-5Y@rmk-PC.armlinux.org.uk>

On Mon, 25 Apr 2022 22:28:02 +0100 Russell King (Oracle) wrote:
> Reported-by: Daniel Golle <daniel@makrotopia.org>
> Tested-by: Daniel Golle <daniel@makrotopia.org>
> Fixes: cbd1f243bc41 ("net: dsa: mt7530: partially convert to phylink_pcs")
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
> 
> Fixes line added so people know which net-next commit is affected;
> this is only in net-next at present. If you think it's unnecessary,
> please remove when applying the patch, or let me know and I will
> resend.

Not at all, FWIW it's very useful.

^ permalink raw reply

* [PATCH -next] rtw89: remove unneeded semicolon
From: Yang Li @ 2022-04-27  0:31 UTC (permalink / raw)
  To: davem
  Cc: kuba, pabeni, kvalo, pkshih, linux-wireless, netdev, linux-kernel,
	Yang Li, Abaci Robot

Eliminate the following coccicheck warning:
./drivers/net/wireless/realtek/rtw89/rtw8852c.c:2556:2-3: Unneeded
semicolon

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
---
 drivers/net/wireless/realtek/rtw89/rtw8852c.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/realtek/rtw89/rtw8852c.c b/drivers/net/wireless/realtek/rtw89/rtw8852c.c
index 858611c64e6b..275568468212 100644
--- a/drivers/net/wireless/realtek/rtw89/rtw8852c.c
+++ b/drivers/net/wireless/realtek/rtw89/rtw8852c.c
@@ -2553,7 +2553,7 @@ do {								\
 	default:
 		val = arg.gnt_bt.data;
 		break;
-	};
+	}
 
 	__write_ctrl(R_AX_PWR_COEXT_CTRL, B_AX_TXAGC_BT_MASK, val,
 		     B_AX_TXAGC_BT_EN, arg.ctrl_gnt_bt != 0xffff);
-- 
2.20.1.7.g153144c


^ permalink raw reply related

* Re: [PATCH bpf-next v5 3/8] bpf: per-cgroup lsm flavor
From: Martin KaFai Lau @ 2022-04-27  0:10 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, ast, daniel, andrii
In-Reply-To: <20220419190053.3395240-4-sdf@google.com>

On Tue, Apr 19, 2022 at 12:00:48PM -0700, Stanislav Fomichev wrote:
> +static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[])
> +{
> +	enum bpf_cgroup_storage_type stype;
> +
> +	for_each_cgroup_storage_type(stype)
> +		bpf_cgroup_storage_unlink(storages[stype]);
> +}
> +
>  /* Called when bpf_cgroup_link is auto-detached from dying cgroup.
>   * It drops cgroup and bpf_prog refcounts, and marks bpf_link as defunct. It
>   * doesn't free link memory, which will eventually be done by bpf_link's
> @@ -166,6 +256,16 @@ static void bpf_cgroup_link_auto_detach(struct bpf_cgroup_link *link)
>  	link->cgroup = NULL;
>  }
>  
> +static void bpf_cgroup_lsm_shim_release(struct bpf_prog *prog,
> +					enum cgroup_bpf_attach_type atype)
> +{
> +	if (prog->aux->cgroup_atype < CGROUP_LSM_START ||
> +	    prog->aux->cgroup_atype > CGROUP_LSM_END)
> +		return;
> +
> +	bpf_trampoline_unlink_cgroup_shim(prog);
> +}
> +
>  /**
>   * cgroup_bpf_release() - put references of all bpf programs and
>   *                        release all cgroup bpf data
> @@ -190,10 +290,18 @@ static void cgroup_bpf_release(struct work_struct *work)
>  
>  		hlist_for_each_entry_safe(pl, pltmp, progs, node) {
>  			hlist_del(&pl->node);
> -			if (pl->prog)
> +			if (pl->prog) {
> +				if (atype == BPF_LSM_CGROUP)
> +					bpf_cgroup_lsm_shim_release(pl->prog,
> +								    atype);
>  				bpf_prog_put(pl->prog);
> -			if (pl->link)
> +			}
> +			if (pl->link) {
> +				if (atype == BPF_LSM_CGROUP)
> +					bpf_cgroup_lsm_shim_release(pl->link->link.prog,
> +								    atype);
>  				bpf_cgroup_link_auto_detach(pl->link);
> +			}
>  			kfree(pl);
>  			static_branch_dec(&cgroup_bpf_enabled_key[atype]);
>  		}
> @@ -506,6 +614,7 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>  	struct bpf_prog *old_prog = NULL;
>  	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
>  	struct bpf_cgroup_storage *new_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
> +	struct bpf_attach_target_info tgt_info = {};
>  	enum cgroup_bpf_attach_type atype;
>  	struct bpf_prog_list *pl;
>  	struct hlist_head *progs;
> @@ -522,9 +631,35 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>  		/* replace_prog implies BPF_F_REPLACE, and vice versa */
>  		return -EINVAL;
>  
> -	atype = to_cgroup_bpf_attach_type(type);
> -	if (atype < 0)
> -		return -EINVAL;
> +	if (type == BPF_LSM_CGROUP) {
> +		struct bpf_prog *p = prog ? : link->link.prog;
> +
> +		if (replace_prog) {
> +			/* Reusing shim from the original program.
> +			 */
> +			if (replace_prog->aux->attach_btf_id !=
> +			    p->aux->attach_btf_id)
> +				return -EINVAL;
> +
> +			atype = replace_prog->aux->cgroup_atype;
> +		} else {
> +			err = bpf_check_attach_target(NULL, p, NULL,
> +						      p->aux->attach_btf_id,
> +						      &tgt_info);
> +			if (err)
> +				return -EINVAL;
> +
> +			atype = bpf_lsm_attach_type_get(p->aux->attach_btf_id);
> +			if (atype < 0)
> +				return atype;
> +		}
> +
> +		p->aux->cgroup_atype = atype;
> +	} else {
> +		atype = to_cgroup_bpf_attach_type(type);
> +		if (atype < 0)
> +			return -EINVAL;
> +	}
>  
>  	progs = &cgrp->bpf.progs[atype];
>  
> @@ -580,13 +715,26 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>  	if (err)
>  		goto cleanup;
>  
> +	bpf_cgroup_storages_link(new_storage, cgrp, type);
After looking between this patch 3 and the next patch 4, I can't
quite think this through quickly, so it may be faster to ask :)

I have questions on the ordering between update_effective_progs(),
bpf_cgroup_storages_link(), and bpf_trampoline_link_cgroup_shim().

Why bpf_cgroup_storages_link() has to be moved up and done before
bpf_trampoline_link_cgroup_shim() ?

> +
> +	if (type == BPF_LSM_CGROUP && !old_prog) {
> +		struct bpf_prog *p = prog ? : link->link.prog;
> +
> +		err = bpf_trampoline_link_cgroup_shim(p, &tgt_info);
update_effective_progs() was done a few lines above, so the effective[atype]
array has the new_prog now.

If bpf_trampoline_link_cgroup_shim() did fail to add the
shim_prog to the trampoline, the new_prog will still be left in
effective[atype].  There is no shim_prog to execute effective[].
However, there may be places that access effective[]. e.g.
__cgroup_bpf_query() although I think to_cgroup_bpf_attach_type()
is not handling BPF_LSM_CGROUP now.  More on __cgroup_bpf_query()
later.

Doing bpf_trampoline_link_cgroup_shim() just before activate_effective_progs() ?

Have you thought about what is needed to support __cgroup_bpf_query() ?
bpf_attach_type and cgroup_bpf_attach_type is no longer a 1:1 relationship.
Looping through cgroup_lsm_atype_usecnt[] and output them under BPF_LSM_CGROUP ?
Same goes for local_storage.  All lsm-cgrp attaching to different
attach_btf_id sharing one local_storage because the key is only
cgroup-id and attach_type.  Is it enough to start with that
first and the key could be extended later with a new map_flag?
This is related to the API.

> +		if (err)
> +			goto cleanup_trampoline;
> +	}
> +
>  	if (old_prog)
>  		bpf_prog_put(old_prog);
>  	else
>  		static_branch_inc(&cgroup_bpf_enabled_key[atype]);
> -	bpf_cgroup_storages_link(new_storage, cgrp, type);
> +
>  	return 0;
>  
> +cleanup_trampoline:
> +	bpf_cgroup_storages_unlink(new_storage);
> +
>  cleanup:
>  	if (old_prog) {
>  		pl->prog = old_prog;
> @@ -678,9 +826,13 @@ static int __cgroup_bpf_replace(struct cgroup *cgrp,
>  	struct hlist_head *progs;
>  	bool found = false;
>  
> -	atype = to_cgroup_bpf_attach_type(link->type);
> -	if (atype < 0)
> -		return -EINVAL;
> +	if (link->type == BPF_LSM_CGROUP) {
> +		atype = link->link.prog->aux->cgroup_atype;
> +	} else {
> +		atype = to_cgroup_bpf_attach_type(link->type);
> +		if (atype < 0)
> +			return -EINVAL;
> +	}
>  
>  	progs = &cgrp->bpf.progs[atype];
>  
> @@ -696,6 +848,9 @@ static int __cgroup_bpf_replace(struct cgroup *cgrp,
>  	if (!found)
>  		return -ENOENT;
>  
> +	if (link->type == BPF_LSM_CGROUP)
> +		new_prog->aux->cgroup_atype = atype;
> +
>  	old_prog = xchg(&link->link.prog, new_prog);
>  	replace_effective_prog(cgrp, atype, link);
>  	bpf_prog_put(old_prog);
> @@ -779,9 +934,15 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>  	u32 flags;
>  	int err;
>  
> -	atype = to_cgroup_bpf_attach_type(type);
> -	if (atype < 0)
> -		return -EINVAL;
> +	if (type == BPF_LSM_CGROUP) {
> +		struct bpf_prog *p = prog ? : link->link.prog;
> +
> +		atype = p->aux->cgroup_atype;
> +	} else {
> +		atype = to_cgroup_bpf_attach_type(type);
> +		if (atype < 0)
> +			return -EINVAL;
> +	}
>  
>  	progs = &cgrp->bpf.progs[atype];
>  	flags = cgrp->bpf.flags[atype];
> @@ -803,6 +964,10 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>  	if (err)
>  		goto cleanup;
>  
> +	if (type == BPF_LSM_CGROUP)
> +		bpf_cgroup_lsm_shim_release(prog ? : link->link.prog,
> +					    atype);
> +
>  	/* now can actually delete it from this cgroup list */
>  	hlist_del(&pl->node);
>  

[ ... ]

> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> index 0c4fd194e801..c76dfa4ea2d9 100644
> --- a/kernel/bpf/trampoline.c
> +++ b/kernel/bpf/trampoline.c
> @@ -11,6 +11,8 @@
>  #include <linux/rcupdate_wait.h>
>  #include <linux/module.h>
>  #include <linux/static_call.h>
> +#include <linux/bpf_verifier.h>
> +#include <linux/bpf_lsm.h>
>  
>  /* dummy _ops. The verifier will operate on target program's ops. */
>  const struct bpf_verifier_ops bpf_extension_verifier_ops = {
> @@ -485,6 +487,149 @@ int bpf_trampoline_unlink_prog(struct bpf_prog *prog, struct bpf_trampoline *tr)
>  	return err;
>  }
>  
> +#if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL)
> +static struct bpf_prog *cgroup_shim_alloc(const struct bpf_prog *prog,
> +					  bpf_func_t bpf_func)
> +{
> +	struct bpf_prog *p;
> +
> +	p = bpf_prog_alloc(1, 0);
> +	if (!p)
> +		return NULL;
> +
> +	p->jited = false;
> +	p->bpf_func = bpf_func;
> +
> +	p->aux->cgroup_atype = prog->aux->cgroup_atype;
> +	p->aux->attach_func_proto = prog->aux->attach_func_proto;
> +	p->aux->attach_btf_id = prog->aux->attach_btf_id;
> +	p->aux->attach_btf = prog->aux->attach_btf;
> +	btf_get(p->aux->attach_btf);
> +	p->type = BPF_PROG_TYPE_LSM;
> +	p->expected_attach_type = BPF_LSM_MAC;
> +	bpf_prog_inc(p);
> +
> +	return p;
> +}
> +
> +static struct bpf_prog *cgroup_shim_find(struct bpf_trampoline *tr,
> +					 bpf_func_t bpf_func)
> +{
> +	const struct bpf_prog_aux *aux;
> +	int kind;
> +
> +	for (kind = 0; kind < BPF_TRAMP_MAX; kind++) {
> +		hlist_for_each_entry(aux, &tr->progs_hlist[kind], tramp_hlist) {
> +			struct bpf_prog *p = aux->prog;
> +
> +			if (p->bpf_func == bpf_func)
> +				return p;
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
> +				    struct bpf_attach_target_info *tgt_info)
> +{
> +	struct bpf_prog *shim_prog = NULL;
> +	struct bpf_trampoline *tr;
> +	bpf_func_t bpf_func;
> +	u64 key;
> +	int err;
> +
> +	key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
> +					 prog->aux->attach_btf_id);
> +
> +	err = bpf_lsm_find_cgroup_shim(prog, &bpf_func);
> +	if (err)
> +		return err;
> +
> +	tr = bpf_trampoline_get(key, tgt_info);
> +	if (!tr)
> +		return  -ENOMEM;
> +
> +	mutex_lock(&tr->mutex);
> +
> +	shim_prog = cgroup_shim_find(tr, bpf_func);
> +	if (shim_prog) {
> +		/* Reusing existing shim attached by the other program.
> +		 */
> +		bpf_prog_inc(shim_prog);
> +		mutex_unlock(&tr->mutex);
> +		return 0;
> +	}
> +
> +	/* Allocate and install new shim.
> +	 */
> +
> +	shim_prog = cgroup_shim_alloc(prog, bpf_func);
> +	if (!shim_prog) {
> +		err = -ENOMEM;
> +		goto out;
> +	}
> +
> +	err = __bpf_trampoline_link_prog(shim_prog, tr);
> +	if (err)
> +		goto out;
> +
> +	mutex_unlock(&tr->mutex);
> +
> +	return 0;
> +out:
> +	if (shim_prog)
> +		bpf_prog_put(shim_prog);
> +
> +	mutex_unlock(&tr->mutex);
> +	return err;
> +}
> +
> +void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog)
> +{
> +	struct bpf_prog *shim_prog;
> +	struct bpf_trampoline *tr;
> +	bpf_func_t bpf_func;
> +	u64 key;
> +	int err;
> +
> +	key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
> +					 prog->aux->attach_btf_id);
> +
> +	err = bpf_lsm_find_cgroup_shim(prog, &bpf_func);
> +	if (err)
> +		return;
> +
> +	tr = bpf_trampoline_lookup(key);
> +	if (!tr)
> +		return;
> +
> +	mutex_lock(&tr->mutex);
> +
> +	shim_prog = cgroup_shim_find(tr, bpf_func);
> +	if (shim_prog) {
> +		/* We use shim_prog refcnt for tracking whether to
> +		 * remove the shim program from the trampoline.
> +		 * Trampoline's mutex is held while refcnt is
> +		 * added/subtracted so we don't need to care about
> +		 * potential races.
> +		 */
> +
> +		if (atomic64_read(&shim_prog->aux->refcnt) == 1)
> +			WARN_ON_ONCE(__bpf_trampoline_unlink_prog(shim_prog, tr));
> +
> +		bpf_prog_put(shim_prog);
> +	}
> +
> +	mutex_unlock(&tr->mutex);
> +
> +	bpf_trampoline_put(tr); /* bpf_trampoline_lookup */
> +
> +	if (shim_prog)
> +		bpf_trampoline_put(tr);
> +}
> +#endif
> +

^ permalink raw reply

* Re: [PATCH RFC 4/5] net/tls: Add support for PF_TLSH (a TLS handshake listener)
From: Jakub Kicinski @ 2022-04-27  0:03 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Sagi Grimberg, Chuck Lever, netdev, linux-nfs, linux-nvme,
	linux-cifs, linux-fsdevel, ak, borisp, simo
In-Reply-To: <40bc060f-f359-081d-9ba7-fae531cf2cd6@suse.de>

On Tue, 26 Apr 2022 17:58:39 +0200 Hannes Reinecke wrote:
> > Plus there are more protocols being actively worked on (QUIC, PSP etc.)
> > Having per ULP special sauce to invoke a user space helper is not the
> > paradigm we chose, and the time as inopportune as ever to change that.  
> 
> Which is precisely what we hope to discuss at LSF.
> (Yes, I know, probably not the best venue to discuss network stuff ...)

Indeed.

> Each approach has its drawbacks:
> 
> - Establishing sockets from userspace will cause issues during 
> reconnection, as then someone (aka the kernel) will have to inform 
> userspace that a new connection will need to be established.
> (And that has to happen while the root filesystem is potentially 
> inaccessible, so you can't just call arbitrary commands here)
> (Especially call_usermodehelper() is out of the game)

Indeed, we may need _some_ form of a notification mechanism and that's
okay. Can be a (more generic) socket, can be something based on existing
network storage APIs (IDK what you have there).

My thinking was that establishing the session in user space would be
easiest. We wouldn't need all the special getsockopt()s which AFAIU
work around part of the handshake being done in the kernel, and which,
I hope we can agree, are not beautiful.

> - Having ULP helpers (as with this design) mitigates that problem 
> somewhat in the sense that you can mlock() that daemon and having it 
> polling on an intermediate socket; that solves the notification problem.
> But you have to have ULP special sauce here to make it work.

TBH I don't see how this is much different to option 1 in terms of
constraints & requirements on the user space agent. We can implement
option 1 over a socket-like interface, too, and that'll carry
notifications all the same.

> - Moving everything in kernel is ... possible. But then you have yet 
> another security-relevant piece of code in the kernel which needs to be 
> audited, CVEd etc. Not to mention the usual policy discussion whether it 
> really belongs into the kernel.

Yeah, if that gets posted it'd be great if it includes removing me from
the TLS maintainers 'cause I want to sleep at night ;)

> So I don't really see any obvious way to go; best we can do is to pick 
> the least ugly :-(

True, I'm sure we can find some middle ground between 1 and 2.
Preferably implemented in a way where the mechanism is separated 
from the fact it's carrying TLS handshake requests, so that it can
carry something else tomorrow.

^ permalink raw reply

* Re: [RFC v1 1/3] net: dsa: mt753x: make reset optional
From: Vladimir Oltean @ 2022-04-26 23:57 UTC (permalink / raw)
  To: Frank Wunderlich
  Cc: linux-mediatek, linux-rockchip, Frank Wunderlich, Rob Herring,
	Krzysztof Kozlowski, Heiko Stuebner, Sean Wang, Landen Chao,
	DENG Qingfang, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Matthias Brugger,
	Peter Geis, devicetree, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <20220426134924.30372-2-linux@fw-web.de>

On Tue, Apr 26, 2022 at 03:49:22PM +0200, Frank Wunderlich wrote:
> From: Frank Wunderlich <frank-w@public-files.de>
> 
> Currently a reset line is required, but on BPI-R2-Pro board
> this reset is shared with the gmac and prevents the switch to
> be initialized because mdio is not ready fast enough after
> the reset.
> 
> So make the reset optional to allow shared reset lines.

What does it mean "to allow shared reset lines"? Allow as in "allow them
to sit there, unused"?

> Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
> ---
>  drivers/net/dsa/mt7530.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
> index 19f0035d4410..ccf4cb944167 100644
> --- a/drivers/net/dsa/mt7530.c
> +++ b/drivers/net/dsa/mt7530.c
> @@ -2134,7 +2134,7 @@ mt7530_setup(struct dsa_switch *ds)
>  		reset_control_assert(priv->rstc);
>  		usleep_range(1000, 1100);
>  		reset_control_deassert(priv->rstc);
> -	} else {
> +	} else if (priv->reset) {

I don't really understand this patch. gpiod_set_value_cansleep() can
tolerate NULL GPIO descriptors.

>  		gpiod_set_value_cansleep(priv->reset, 0);
>  		usleep_range(1000, 1100);
>  		gpiod_set_value_cansleep(priv->reset, 1);
> @@ -2276,7 +2276,7 @@ mt7531_setup(struct dsa_switch *ds)
>  		reset_control_assert(priv->rstc);
>  		usleep_range(1000, 1100);
>  		reset_control_deassert(priv->rstc);
> -	} else {
> +	} else if (priv->reset) {
>  		gpiod_set_value_cansleep(priv->reset, 0);
>  		usleep_range(1000, 1100);
>  		gpiod_set_value_cansleep(priv->reset, 1);
> @@ -3272,8 +3272,7 @@ mt7530_probe(struct mdio_device *mdiodev)
>  		priv->reset = devm_gpiod_get_optional(&mdiodev->dev, "reset",
>  						      GPIOD_OUT_LOW);
>  		if (IS_ERR(priv->reset)) {
> -			dev_err(&mdiodev->dev, "Couldn't get our reset line\n");
> -			return PTR_ERR(priv->reset);
> +			dev_warn(&mdiodev->dev, "Couldn't get our reset line\n");

I certainly don't understand why you're suppressing the pointer-encoded
errors here. The function used is devm_gpiod_get_optional(), which
returns NULL for a missing reset-gpios, not IS_ERR(something). The
IS_ERR(something) is actually important to not ignore, maybe it's
IS_ERR(-EPROBE_DEFER). And this change breaks waiting for the descriptor
to become available.

>  		}
>  	}
>  
> -- 
> 2.25.1
> 

So what doesn't work without this patch, exactly?

^ permalink raw reply

* Re: [PATCH bpf-next v6 6/6] bpf: Allow the new syncookie helpers to work with SKBs
From: Andrii Nakryiko @ 2022-04-26 23:47 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Networking, Tariq Toukan, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Petar Penkov, Lorenz Bauer, Eric Dumazet,
	Hideaki YOSHIFUJI, David Ahern, Shuah Khan,
	Jesper Dangaard Brouer, Nathan Chancellor, Nick Desaulniers,
	Joe Stringer, Florent Revest, open list:KERNEL SELFTEST FRAMEWORK,
	Toke Høiland-Jørgensen, Kumar Kartikeya Dwivedi,
	Florian Westphal, pabeni
In-Reply-To: <20220422172422.4037988-7-maximmi@nvidia.com>

On Fri, Apr 22, 2022 at 10:25 AM Maxim Mikityanskiy <maximmi@nvidia.com> wrote:
>
> This commits allows the new BPF helpers to work in SKB context (in TC
> BPF programs): bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6}.
>
> The sample application and selftest are updated to support the TC mode.
> It's not the recommended mode of operation, because the SKB is already
> created at this point, and it's unlikely that the BPF program will
> provide any substantional speedup compared to regular SYN cookies or
> synproxy.
>
> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  net/core/filter.c                             |  10 ++
>  .../selftests/bpf/prog_tests/xdp_synproxy.c   |  53 +++++--
>  .../selftests/bpf/progs/xdp_synproxy_kern.c   | 141 +++++++++++++-----
>  tools/testing/selftests/bpf/xdp_synproxy.c    |  94 +++++++++---
>  4 files changed, 230 insertions(+), 68 deletions(-)
>

[...]

>
> -       return hdr.tcp->syn ? syncookie_handle_syn(&hdr, ctx, data, data_end) :
> -                             syncookie_handle_ack(&hdr);
> +       return hdr->tcp->syn ? syncookie_handle_syn(hdr, ctx, data, data_end, xdp) :
> +                              syncookie_handle_ack(hdr);
> +}
> +
> +SEC("xdp/syncookie")

SEC("xdp")? libbpf will reject SEC("xdp/syncookie") in strict libbpf 1.0 mode

> +int syncookie_xdp(struct xdp_md *ctx)
> +{
> +       void *data_end = (void *)(long)ctx->data_end;
> +       void *data = (void *)(long)ctx->data;
> +       struct header_pointers hdr;
> +       int ret;
> +
> +       ret = syncookie_part1(ctx, data, data_end, &hdr, true);
> +       if (ret != XDP_TX)
> +               return ret;
> +
> +       data_end = (void *)(long)ctx->data_end;
> +       data = (void *)(long)ctx->data;
> +
> +       return syncookie_part2(ctx, data, data_end, &hdr, true);
> +}

[...]

^ permalink raw reply

* Re: [PATCH RFC 4/5] net/tls: Add support for PF_TLSH (a TLS handshake listener)
From: Jakub Kicinski @ 2022-04-26 23:47 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: netdev, Linux NFS Mailing List, linux-nvme@lists.infradead.org,
	linux-cifs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	ak@tempesta-tech.com, borisp@nvidia.com, simo@redhat.com
In-Reply-To: <BA6BB8F6-3A2A-427B-A5D7-30B5F778B7E0@oracle.com>

On Tue, 26 Apr 2022 15:58:29 +0000 Chuck Lever III wrote:
> > On Apr 26, 2022, at 10:55 AM, Jakub Kicinski <kuba@kernel.org> wrote:
> >> The RPC-with-TLS standard allows unencrypted RPC traffic on the connection
> >> before sending ClientHello. I think we'd like to stick with creating the
> >> socket in the kernel, for this reason and for the reasons Hannes mentions
> >> in his reply.  
> > 
> > Umpf, I presume that's reviewed by security people in IETF so I guess
> > it's done right this time (tm).  
> 
> > Your wording seems careful not to imply that you actually need that,
> > tho. Am I over-interpreting?  
> 
> RPC-with-TLS requires one RPC as a "starttls" token. That could be
> done in user space as part of the handshake, but it is currently
> done in the kernel to enable the user agent to be shared with other
> kernel consumers of TLS. Keep in mind that we already have two
> real consumers: NVMe and RPC-with-TLS; and possibly QUIC.
> 
> You asserted earlier that creating sockets in user space "scales
> better" but did not provide any data. Can we see some? How well
> does it need to scale for storage protocols that use long-lived
> connections?

I meant scale with the number of possible crypto protocols, 
I mentioned three there.

> Also, why has no-one mentioned the NBD on TLS implementation to
> us before? I will try to review that code soon.

Oops, maybe that thing had never seen the light of a public mailing
list then :S Dave Watson was working on it at Facebook, but he since
moved to greener pastures.

> > This set does not even have selftests.  
> 
> I can include unit tests with the prototype. Someone needs to
> educate me on what is the preferred unit test paradigm for this
> type of subsystem. Examples in the current kernel code base would
> help too.

Whatever level of testing makes you as an engineer comfortable
with saying "this test suite is sufficient"? ;)

For TLS we have tools/testing/selftests/net/tls.c - it's hardly
an example of excellence but, you know, it catches bugs here and 
there.

> > Plus there are more protocols being actively worked on (QUIC, PSP etc.)
> > Having per ULP special sauce to invoke a user space helper is not the
> > paradigm we chose, and the time as inopportune as ever to change that.  
> 
> When we started discussing TLS handshake requirements with some
> community members several years ago, creating the socket in
> kernel and passing it up to a user agent was the suggested design.
> Has that recommendation changed since then?

Hm, do you remember who you discussed it with? Would be good 
to loop those folks in. I wasn't involved at the beginning of the 
TLS work, I know second hand that HW offload and nbd were involved 
and that the design went thru some serious re-architecting along 
the way. In the beginning there was a separate socket for control
records, and that was nacked.

But also (and perhaps most importantly) I'm not really objecting 
to creating the socket in the kernel. I'm primarily objecting to 
a second type of a special TLS socket which has TLS semantics.

> I'd prefer an in-kernel handshake implementation over a user
> space one (even one that is sharable amongst transports and ULPs
> as my proposal is intended to be). However, so far we've been told
> that an in-kernel handshake implementation is a non-starter.
> 
> But in the abstract, we agree that having a single TLS handshake
> mechanism for kernel consumers is preferable.

For some definition of "we" which doesn't not include me?

^ permalink raw reply

* Re: [Patch net-next] net: dsa: ksz9477: move get_stats64 to ksz_common.c
From: Vladimir Oltean @ 2022-04-26 23:42 UTC (permalink / raw)
  To: Arun Ramadoss
  Cc: linux-kernel, netdev, Paolo Abeni, Jakub Kicinski,
	David S. Miller, Florian Fainelli, Vivien Didelot, Andrew Lunn,
	UNGLinuxDriver, Woojung Huh, Oleksij Rempel
In-Reply-To: <20220426091048.9311-1-arun.ramadoss@microchip.com>

On Tue, Apr 26, 2022 at 02:40:48PM +0530, Arun Ramadoss wrote:
> The mib counters for the ksz9477 is same for the ksz9477 switch and
> LAN937x switch. Hence moving it to ksz_common.c file in order to have it
> generic function. The DSA hook get_stats64 now can call ksz_get_stats64.
> 
> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
> ---

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>

^ permalink raw reply

* Re: [PATCH v2 net-next 07/10] net: dsa: request drivers to perform FDB isolation
From: Vladimir Oltean @ 2022-04-26 23:17 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Hans Schultz, netdev@vger.kernel.org, Jakub Kicinski,
	David S. Miller, Florian Fainelli, Vivien Didelot,
	Vladimir Oltean, Kurt Kanzenbach, Hauke Mehrtens, Woojung Huh,
	UNGLinuxDriver@microchip.com, Sean Wang, Landen Chao,
	DENG Qingfang, Claudiu Manoil, Alexandre Belloni, Linus Walleij,
	Alvin Šipraga, George McCollister
In-Reply-To: <YmgaX4On/2j3lJf/@lunn.ch>

On Tue, Apr 26, 2022 at 06:14:23PM +0200, Andrew Lunn wrote:
> > > @@ -941,23 +965,29 @@ struct dsa_switch_ops {
> > >  	 * Forwarding database
> > >  	 */
> > >  	int	(*port_fdb_add)(struct dsa_switch *ds, int port,
> > > -				const unsigned char *addr, u16 vid);
> > > +				const unsigned char *addr, u16 vid,
> > > +				struct dsa_db db);
> > 
> > Hi! Wouldn't it be better to have a struct that has all the functions
> > parameters in one instead of adding further parameters to these
> > functions?
> > 
> > I am asking because I am also needing to add a parameter to
> > port_fdb_add(), and it would be more future oriented to have a single
> > function parameter as a struct, so that it is easier to add parameters
> > to these functions without havíng to change the prototype of the
> > function every time.
> 
> Hi Hans
> 
> Please trim the text to only what is relevant when replying. It is
> easy to miss comments when having to Page Down, Page Down, Page down,
> to find comments.
> 
>    Andrew

Agreed, had to scroll too much.

Hans, what extra argument do you want to add to port_fdb_add?
A static/dynamic, I suppose, similar to what exists in port_fdb_dump?

But we surely wouldn't pass _all_ parameters of port_fdb_add through
some giant struct args_of_port_fdb_add, would we?  Not ds, port, db,
just what is naturally grouped together as an FDB entry: addr, vid,
maybe your new static/dynamic thing.

If we group the addr and vid in port_fdb_add into a structure that
represents an FDB entry, what do we do about those in port_fdb_del?
Group those as well, maybe for consistency?

Same question for port_fdb_dump and its dsa_fdb_dump_cb_t: would you
change it for uniformity, or would you keep it the way it is to reduce
the churn? I mean it's a perfectly viable candidate for argument
grouping, but your stated goal _is_ to reduce churn.

But if we add the static/dynamic boolean to this structure, does it make
sense on deletion? And if it doesn't, why have we changed the prototype
of port_fdb_del to include it?

Restated: do we want to treat the "static/dynamic" info as a property of
an FDB entry (i.e. a member of the structure), or as the way in which a
certain FDB entry can be added to hardware (case in which it is relevant
only to _add and to _dump)?  After all, an FDB entry for {addr, vid}
learned statically, and another FDB entry for the same {addr, vid} but
learned dynamically, are fundamentally the same object.

And if we don't go with a big struct args_of_port_fdb_add (which would
be silly if we did), who guarantees that the argument list of port_fdb_add
won't grow in the future anyway? Like in the example I just gave above,
where "static/dynamic" doesn't fully appear to group naturally with
"addr" and "vid", and would probably still be a separate boolean,
rendering the whole point moot.

And even grouping only those last 2 together is maybe debatable due to
practical reasons - where do we declare this small structure? We have a
struct switchdev_notifier_fdb_info with some more stuff that we
deliberately do not want to expose, and {addr, vid} are all that remain.

Although maybe there are benefits to having a small {addr, vid} structure
defined somewhere publicly, too, and referenced consistently by driver
code. Like for example to avoid bad patterns from proliferating.
Currently we have "const unsigned char *addr, u16 vid", so on a 64 bit
machine, this is a pointer plus an unsigned short, 10 bytes, padded up
by the compiler, maybe to 16. But the Ethernet addresses are 6 bytes,
those are shorter than the pointer itself, so on a 64-bit machine,
having "unsigned char addr[ETH_ALEN], u16 vid" makes a lot more space,
saves some real memory.

Anyway, I'm side tracking. If you want to make changes, propose a
patch, but come up with a more real argument than "reducing churn"
(while effectively producing churn).

To give you a counter example, phylink_mac_config() used to have the
opposite problem, the arguments were all neatly packed into a struct
phylink_link_state, but as the kerneldocs from include/linux/phylink.h
put it, not all members of that structure were guaranteed to contain
valid values. So there were bugs due to people not realizing this, and
consequently, phylink_mac_link_up() took the opposite approach, of
explicitly passing all known parameters of the resolved link state as
individual arguments. Now there are 10 arguments to that function, but
somehow at least this appears to have worked better, in the sense that
there isn't an explanatory note saying what's valid and what isn't.

^ permalink raw reply

* Re: [PATCH bpf-next 2/4] libbpf: Add helpers for pinning bpf prog through bpf object skeleton
From: Andrii Nakryiko @ 2022-04-26 23:15 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh, netdev, bpf
In-Reply-To: <CALOAHbAb6VH_fHAE3_tCMK0pBJCdM9PPg9pfHoye+2jq+N7DYQ@mail.gmail.com>

On Tue, Apr 26, 2022 at 8:59 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Mon, Apr 25, 2022 at 9:57 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 4/23/22 4:00 PM, Yafang Shao wrote:
> > > Currently there're helpers for allowing to open/load/attach BPF object
> > > through BPF object skeleton. Let's also add helpers for pinning through
> > > BPF object skeleton. It could simplify BPF userspace code which wants to
> > > pin the progs into bpffs.
> >
> > Please elaborate some more on your use case/rationale for the commit message,
> > do you have orchestration code that will rely on these specifically?
> >
>
> We have a bpf manager on our production environment to maintain the
> bpf programs, some of which need to be pinned in bpffs, for example
> tracing bpf programs, perf_event programs and other bpf hooks added by
> ourselves for performance tuning.  These bpf programs don't need a
> user agent, while they really work like a kernel module, that is why
> we pin them. For these kinds of bpf programs, the bpf manager can help
> to simplify the development and deployment.  Take the improvement on
> development for example,  the user doesn't need to write userspace
> code while he focuses on the kernel side only, and then bpf manager
> will do all the other things. Below is a simple example,
>    Step1, gen the skeleton for the user provided bpf object file,
>               $ bpftool gen skeleton  test.bpf.o > simple.skel.h
>    Step2, Compile the bpf object file into a runnable binary
>               #include "simple.skel.h"
>
>               #define SIMPLE_BPF_PIN(name, path)  \
>               ({                                                              \
>                   struct name##_bpf *obj;                      \
>                   int err = 0;                                            \
>                                                                               \
>                   obj = name##_bpf__open();                \
>                    if (!obj) {                                              \
>                        err = -errno;                                    \
>                        goto cleanup;                                 \
>                     }                                                         \
>                                                                               \
>                     err = name##_bpf__load(obj);           \
>                     if (err)                                                 \
>                         goto cleanup;                                 \
>                                                                                \
>                      err = name##_bpf__attach(obj);       \
>                      if (err)                                                \
>                          goto cleanup;                                \
>                                                                                \
>                      err = name##_bpf__pin_prog(obj, path);      \
>                      if (err)                                                \
>                          goto cleanup;                                \
>                                                                               \
>                       goto end;                                         \
>                                                                               \
>                   cleanup:                                              \
>                       name##_bpf__destroy(obj);            \
>                   end:                                                     \
>                       err;                                                  \
>                    })
>
>                    SIMPLE_BPF_PIN(test, "/sys/fs/bpf");
>
>                As the userspace code of FD-based bpf objects are all
> the same,  so we can abstract them as above.  The pathset means to add
> the non-exist "name##_bpf__pin_prog(obj, path)" for it.
>

Your BPF manager is user-space code that you control, right? I'm not
sure how skeleton is helpful here given your BPF manager is generic
and doesn't work with any specific skeleton, if I understand the idea.
But let's assume that you use skeleton to also embed BPF ELF bytes and
pass them to your manager for "activation". Once you open and load
bpf_object, your BPF manager can generically iterate all BPF programs
using bpf_object_for_each_program(), attempt to attach them with
bpf_program__attach() (see how bpf_object__attach_skeleton is handling
non-auto-attachable programs) and immediately pin the link (no need to
even store it, you can destroy it after pinning immediately). All this
is using generic libbpf APIs and requires no code generation. But keep
in mind that not all struct bpf_link in libbpf are pinnable (not all
links have FD-based BPF link in kernel associated with them), so
you'll have to deal with that somehow (and what you didn't do in this
patch for libbpf implementation).

> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > ---
> > >   tools/lib/bpf/libbpf.c   | 59 ++++++++++++++++++++++++++++++++++++++++
> > >   tools/lib/bpf/libbpf.h   |  4 +++
> > >   tools/lib/bpf/libbpf.map |  2 ++
> > >   3 files changed, 65 insertions(+)
> > >

[...]

^ permalink raw reply

* Re: [PATCH net-next v6 04/13] net: wwan: t7xx: Add port proxy infrastructure
From: Sergey Ryazanov @ 2022-04-26 23:06 UTC (permalink / raw)
  To: Veleta, Moises
  Cc: Ricardo Martinez, netdev@vger.kernel.org,
	linux-wireless@vger.kernel.org, Jakub Kicinski, David Miller,
	Johannes Berg, Loic Poulain, Kumar, M Chetan,
	Devegowda, Chandrashekar, linuxwwan,
	chiranjeevi.rapolu@linux.intel.com, Liu, Haijun, Hanania, Amir,
	Andy Shevchenko, Sharma, Dinesh, Lee, Eliot,
	Jarvinen, Ilpo Johannes, Bossart, Pierre-louis,
	Sethuraman, Muralidharan, Mishra, Soumya Prakash,
	Kancharla, Sreehari, Sahu, Madhusmita
In-Reply-To: <MWHPR1101MB231920A2152DC2FC7B2FBB3CE0FB9@MWHPR1101MB2319.namprd11.prod.outlook.com>

Hello Moises,

On Tue, Apr 26, 2022 at 10:46 PM Veleta, Moises <moises.veleta@intel.com> wrote:
> From: Sergey Ryazanov <ryazanov.s.a@gmail.com>
> Sent: Monday, April 25, 2022 4:53 PM
> To: Ricardo Martinez <ricardo.martinez@linux.intel.com>
> Cc: netdev@vger.kernel.org <netdev@vger.kernel.org>; linux-wireless@vger.kernel.org <linux-wireless@vger.kernel.org>; Jakub Kicinski <kuba@kernel.org>; David Miller <davem@davemloft.net>; Johannes Berg <johannes@sipsolutions.net>; Loic Poulain <loic.poulain@linaro.org>; Kumar, M Chetan <m.chetan.kumar@intel.com>; Devegowda, Chandrashekar <chandrashekar.devegowda@intel.com>; linuxwwan <linuxwwan@intel.com>; chiranjeevi.rapolu@linux.intel.com <chiranjeevi.rapolu@linux.intel.com>; Liu, Haijun <haijun.liu@mediatek.com>; Hanania, Amir <amir.hanania@intel.com>; Andy Shevchenko <andriy.shevchenko@linux.intel.com>; Sharma, Dinesh <dinesh.sharma@intel.com>; Lee, Eliot <eliot.lee@intel.com>; Jarvinen, Ilpo Johannes <ilpo.johannes.jarvinen@intel.com>; Veleta, Moises <moises.veleta@intel.com>; Bossart, Pierre-louis <pierre-louis.bossart@intel.com>; Sethuraman, Muralidharan <muralidharan.sethuraman@intel.com>; Mishra, Soumya Prakash <soumya.prakash.mishra@intel.com>; Kancharla, Sreehari <sreehari.kancharla@intel.com>; Sahu, Madhusmita <madhusmita.sahu@intel.com>
> Subject: Re: [PATCH net-next v6 04/13] net: wwan: t7xx: Add port proxy infrastructure
>
> On Fri, Apr 8, 2022 at 1:37 AM Ricardo Martinez
> <ricardo.martinez@linux.intel.com> wrote:
>> Port-proxy provides a common interface to interact with different types
>> of ports. Ports export their configuration via `struct t7xx_port` and
>> operate as defined by `struct port_ops`.
>>
>> Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
>> Co-developed-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
>> Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
>> Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
>> Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
>>
>> From a WWAN framework perspective:
>> Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
>
> [skipped]
>
>> +int t7xx_port_enqueue_skb(struct t7xx_port *port, struct sk_buff *skb)
>> +{
>> +       unsigned long flags;
>> +
>> +       spin_lock_irqsave(&port->rx_wq.lock, flags);
>> +       if (port->rx_skb_list.qlen >= port->rx_length_th) {
>> +               spin_unlock_irqrestore(&port->rx_wq.lock, flags);
>
> Probably skb should be freed here before returning. The caller assumes
> that skb will be consumed even in case of error.
>
> [MV] We do not drop port ctrl messages. We keep them and try again later.
> Whereas WWAN skbs are dropped if conditions are met.

I missed that the WWAN port returns no error when it drops the skb.
And then I concluded that any failure to process the CCCI message
should be accomplished with the skb freeing. Now the handling of CCCI
messages is more clear to me. Thank you for the clarifications!

To avoid similar misinterpretation in the future, I thought that the
skb freeing in the WWAN port worth a comment. Something to describe
that despite dropping the message, the return code is zero, indicating
skb consumption. Similarly in this (t7xx_port_enqueue_skb) function.
Something like: "return an error here, the caller will try again
later". And maybe in t7xx_cldma_gpd_rx_from_q() near the loop break
after the .skb_recv() failure test. Something like: "break message
processing for now will try this message later".

What do you think?

>> +               return -ENOBUFS;
>> +       }
>> +       __skb_queue_tail(&port->rx_skb_list, skb);
>> +       spin_unlock_irqrestore(&port->rx_wq.lock, flags);
>> +
>> +       wake_up_all(&port->rx_wq);
>> +       return 0;
>> +}
>
> [skipped]
>
>> +static int t7xx_port_proxy_recv_skb(struct cldma_queue *queue, struct sk_buff *skb)
>> +{
>> +       struct ccci_header *ccci_h = (struct ccci_header *)skb->data;
>> +       struct t7xx_pci_dev *t7xx_dev = queue->md_ctrl->t7xx_dev;
>> +       struct t7xx_fsm_ctl *ctl = t7xx_dev->md->fsm_ctl;
>> +       struct device *dev = queue->md_ctrl->dev;
>> +       struct t7xx_port_conf *port_conf;
>> +       struct t7xx_port *port;
>> +       u16 seq_num, channel;
>> +       int ret;
>> +
>> +       if (!skb)
>> +               return -EINVAL;
>> +
>> +       channel = FIELD_GET(CCCI_H_CHN_FLD, le32_to_cpu(ccci_h->status));
>> +       if (t7xx_fsm_get_md_state(ctl) == MD_STATE_INVALID) {
>> +               dev_err_ratelimited(dev, "Packet drop on channel 0x%x, modem not ready\n", channel);
>> +               goto drop_skb;
>> +       }
>> +
>> +       port = t7xx_port_proxy_find_port(t7xx_dev, queue, channel);
>> +       if (!port) {
>> +               dev_err_ratelimited(dev, "Packet drop on channel 0x%x, port not found\n", channel);
>> +               goto drop_skb;
>> +       }
>> +
>> +       seq_num = t7xx_port_next_rx_seq_num(port, ccci_h);
>> +       port_conf = port->port_conf;
>> +       skb_pull(skb, sizeof(*ccci_h));
>> +
>> +       ret = port_conf->ops->recv_skb(port, skb);
>> +       if (ret) {
>> +               skb_push(skb, sizeof(*ccci_h));
>
> Header can not be pushed back here, since the .recv_skb() callback
> consumes (frees) skb even in case of error. See
> t7xx_port_wwan_recv_skb() and my comment in t7xx_port_enqueue_skb().
> Pushing the header back after failure will trigger the use-after-free
> error.
>
> [MV] Since t7xx_port_wwan_recv_skb returns 0 when dropping a skb, it skips this push operation and
> will not trigger the error stated. Inside of that function an error is printed.
>
>> +               return ret;
>> +       }
>> +
>> +       port->seq_nums[MTK_RX] = seq_num;
>
> The expected sequence number updated only on successful .recv_skb()
> exit. This will trigger the out-of-order seqno warning on a next
> message after a .recv_skb() failure. Is this intentional behaviour?
>
> Maybe the expected sequence number should be updated before the
> .recv_skb() call? Or even the sequence number update should be moved
> to t7xx_port_next_rx_seq_num()?
>
> [MV] For t7xx_port_wwan_recv_skb, it drops skb and seq_nums is updated.
> for t7xx_port_enqueue_skb, it retains skb and can be processed again later, skipping this update upon error.

Now this is clear to me. Thanks.

>> +       return 0;
>> +
>> +drop_skb:
>> +       dev_kfree_skb_any(skb);
>> +       return 0;
>> +}

-- 
Sergey

^ permalink raw reply

* Re: [PATCH net-next v9 2/2] net: ethernet: Add driver for Sunplus SP7021
From: Francois Romieu @ 2022-04-26 22:52 UTC (permalink / raw)
  To: Wells Lu
  Cc: davem, kuba, robh+dt, netdev, devicetree, linux-kernel, p.zabel,
	pabeni, krzk+dt, roopa, andrew, edumazet, wells.lu
In-Reply-To: <1650882640-7106-3-git-send-email-wellslutw@gmail.com>

Wells Lu <wellslutw@gmail.com> :
[...]
> +int spl2sw_rx_poll(struct napi_struct *napi, int budget)
> +{
[...]
> +	wmb();	/* make sure settings are effective. */
> +	mask = readl(comm->l2sw_reg_base + L2SW_SW_INT_MASK_0);
> +	mask &= ~MAC_INT_RX;
> +	writel(mask, comm->l2sw_reg_base + L2SW_SW_INT_MASK_0);
> +
> +	napi_complete(napi);
> +	return 0;
> +}
> +
> +int spl2sw_tx_poll(struct napi_struct *napi, int budget)
> +{
[...]
> +	wmb();			/* make sure settings are effective. */
> +	mask = readl(comm->l2sw_reg_base + L2SW_SW_INT_MASK_0);
> +	mask &= ~MAC_INT_TX;
> +	writel(mask, comm->l2sw_reg_base + L2SW_SW_INT_MASK_0);
> +
> +	napi_complete(napi);
> +	return 0;
> +}
> +
> +irqreturn_t spl2sw_ethernet_interrupt(int irq, void *dev_id)
> +{
[...]
> +	if (status & MAC_INT_RX) {
> +		/* Disable RX interrupts. */
> +		mask = readl(comm->l2sw_reg_base + L2SW_SW_INT_MASK_0);
> +		mask |= MAC_INT_RX;
> +		writel(mask, comm->l2sw_reg_base + L2SW_SW_INT_MASK_0);
[...]
> +		napi_schedule(&comm->rx_napi);
> +	}
> +
> +	if (status & MAC_INT_TX) {
> +		/* Disable TX interrupts. */
> +		mask = readl(comm->l2sw_reg_base + L2SW_SW_INT_MASK_0);
> +		mask |= MAC_INT_TX;
> +		writel(mask, comm->l2sw_reg_base + L2SW_SW_INT_MASK_0);
> +
> +		if (unlikely(status & MAC_INT_TX_DES_ERR)) {
[...]
> +		} else {
> +			napi_schedule(&comm->tx_napi);
> +		}
> +	}

The readl/writel sequence in rx_poll (or tx_poll) races with the irq
handler performing MAC_INT_TX (or MAC_INT_RX) work. If the readl
returns the same value to both callers, one of the writel will be
overwritten.

-- 
Ueimor

^ permalink raw reply

* Re: [PATCH bpf-next v5 3/8] bpf: per-cgroup lsm flavor
From: Stanislav Fomichev @ 2022-04-26 22:44 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, bpf, ast, daniel, andrii
In-Reply-To: <20220426062705.siikmhdj5vhsffjq@kafai-mbp.dhcp.thefacebook.com>

On Mon, Apr 25, 2022 at 11:27 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Tue, Apr 19, 2022 at 12:00:48PM -0700, Stanislav Fomichev wrote:
> > Allow attaching to lsm hooks in the cgroup context.
> >
> > Attaching to per-cgroup LSM works exactly like attaching
> > to other per-cgroup hooks. New BPF_LSM_CGROUP is added
> > to trigger new mode; the actual lsm hook we attach to is
> > signaled via existing attach_btf_id.
> >
> > For the hooks that have 'struct socket' as its first argument,
> > we use the cgroup associated with that socket. For the rest,
> > we use 'current' cgroup (this is all on default hierarchy == v2 only).
> > Note that for the hooks that work on 'struct sock' we still
> > take the cgroup from 'current' because most of the time,
> > the 'sock' argument is not properly initialized.
> This paragraph is out-dated.

Ack, will update that last part about sock, thanks!

> > Behind the scenes, we allocate a shim program that is attached
> > to the trampoline and runs cgroup effective BPF programs array.
> > This shim has some rudimentary ref counting and can be shared
> > between several programs attaching to the same per-cgroup lsm hook.
> >
> > Note that this patch bloats cgroup size because we add 211
> > cgroup_bpf_attach_type(s) for simplicity sake. This will be
> > addressed in the subsequent patch.
> >
> > Also note that we only add non-sleepable flavor for now. To enable
> > sleepable use-cases, BPF_PROG_RUN_ARRAY_CG has to grab trace rcu,
> s/BPF_PROG_RUN_ARRAY_CG/bpf_prog_run_array_cg/

Sure, thanks!

> > shim programs have to be freed via trace rcu, cgroup_bpf.effective
> > should be also trace-rcu-managed + maybe some other changes that
> > I'm not aware of.
> >
>
> [ ... ]
>
> > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > index 064eccba641d..2161cba1fe0c 100644
> > --- a/kernel/bpf/bpf_lsm.c
> > +++ b/kernel/bpf/bpf_lsm.c
> > @@ -16,6 +16,7 @@
> >  #include <linux/bpf_local_storage.h>
> >  #include <linux/btf_ids.h>
> >  #include <linux/ima.h>
> > +#include <linux/bpf-cgroup.h>
> >
> >  /* For every LSM hook that allows attachment of BPF programs, declare a nop
> >   * function where a BPF program can be attached.
> > @@ -35,6 +36,68 @@ BTF_SET_START(bpf_lsm_hooks)
> >  #undef LSM_HOOK
> >  BTF_SET_END(bpf_lsm_hooks)
> >
> > +/* List of LSM hooks that should operate on 'current' cgroup regardless
> > + * of function signature.
> > + */
> > +BTF_SET_START(bpf_lsm_current_hooks)
> > +/* operate on freshly allocated sk without any cgroup association */
> > +BTF_ID(func, bpf_lsm_sk_alloc_security)
> > +BTF_ID(func, bpf_lsm_sk_free_security)
> > +BTF_SET_END(bpf_lsm_current_hooks)
> > +
> > +int bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
> > +                          bpf_func_t *bpf_func)
> > +{
> > +     const struct btf_type *first_arg_type;
> > +     const struct btf_type *sock_type;
> > +     const struct btf_type *sk_type;
> > +     const struct btf *btf_vmlinux;
> > +     const struct btf_param *args;
> > +     s32 type_id;
> > +
> > +     if (!prog->aux->attach_func_proto ||
> > +         !btf_type_is_func_proto(prog->aux->attach_func_proto))
> > +             return -EINVAL;
> > +
> > +     if (btf_type_vlen(prog->aux->attach_func_proto) < 1 ||
> > +         btf_id_set_contains(&bpf_lsm_current_hooks,
> > +                             prog->aux->attach_btf_id)) {
> > +             *bpf_func = __cgroup_bpf_run_lsm_current;
> > +             return 0;
> > +     }
> > +
> > +     args = btf_params(prog->aux->attach_func_proto);
> > +
> > +     btf_vmlinux = bpf_get_btf_vmlinux();
> > +     if (!btf_vmlinux)
> Remove this check and other similar checks because the btf_vmlinux has
> been successfully parsed during the prog load time.

Good point, will remove.

> > +             return -EINVAL;
> > +
> > +     type_id = btf_find_by_name_kind(btf_vmlinux, "socket", BTF_KIND_STRUCT);
> > +     if (type_id < 0)
> > +             return -EINVAL;
> > +     sock_type = btf_type_by_id(btf_vmlinux, type_id);
> > +
> > +     type_id = btf_find_by_name_kind(btf_vmlinux, "sock", BTF_KIND_STRUCT);
> > +     if (type_id < 0)
> > +             return -EINVAL;
> > +     sk_type = btf_type_by_id(btf_vmlinux, type_id);
> > +
> > +     first_arg_type = btf_type_resolve_ptr(btf_vmlinux, args[0].type, NULL);
> > +     if (first_arg_type == sock_type)
> > +             *bpf_func = __cgroup_bpf_run_lsm_socket;
> > +     else if (first_arg_type == sk_type)
> > +             *bpf_func = __cgroup_bpf_run_lsm_sock;
> > +     else
> > +             *bpf_func = __cgroup_bpf_run_lsm_current;
> > +
> > +     return 0;
> > +}
> > +
> > +int bpf_lsm_hook_idx(u32 btf_id)
> > +{
> > +     return btf_id_set_index(&bpf_lsm_hooks, btf_id);
> > +}
> > +
> >  int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
> >                       const struct bpf_prog *prog)
> >  {
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 0918a39279f6..4199de31f49c 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -4971,6 +4971,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> >
> >       if (arg == nr_args) {
> >               switch (prog->expected_attach_type) {
> > +             case BPF_LSM_CGROUP:
> >               case BPF_LSM_MAC:
> >               case BPF_TRACE_FEXIT:
> >                       /* When LSM programs are attached to void LSM hooks
> > @@ -6396,6 +6397,16 @@ static int btf_id_cmp_func(const void *a, const void *b)
> >       return *pa - *pb;
> >  }
> >
> > +int btf_id_set_index(const struct btf_id_set *set, u32 id)
> > +{
> > +     const u32 *p;
> > +
> > +     p = bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func);
> > +     if (!p)
> > +             return -1;
> > +     return p - set->ids;
> > +}
> > +
> >  bool btf_id_set_contains(const struct btf_id_set *set, u32 id)
> >  {
> >       return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL;
> > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > index aaf9e36f2736..ba0e0c7a661d 100644
> > --- a/kernel/bpf/cgroup.c
> > +++ b/kernel/bpf/cgroup.c
> > @@ -14,6 +14,9 @@
> >  #include <linux/string.h>
> >  #include <linux/bpf.h>
> >  #include <linux/bpf-cgroup.h>
> > +#include <linux/btf_ids.h>
> > +#include <linux/bpf_lsm.h>
> > +#include <linux/bpf_verifier.h>
> >  #include <net/sock.h>
> >  #include <net/bpf_sk_storage.h>
> >
> > @@ -88,6 +91,85 @@ bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
> >       return run_ctx.retval;
> >  }
> >
> > +unsigned int __cgroup_bpf_run_lsm_sock(const void *ctx,
> > +                                    const struct bpf_insn *insn)
> > +{
> > +     const struct bpf_prog *shim_prog;
> > +     struct sock *sk;
> > +     struct cgroup *cgrp;
> > +     int ret = 0;
> > +     u64 *regs;
> > +
> > +     regs = (u64 *)ctx;
> > +     sk = (void *)(unsigned long)regs[BPF_REG_0];
> > +     /*shim_prog = container_of(insn, struct bpf_prog, insnsi);*/
> > +     shim_prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > +
> > +     cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> > +     if (likely(cgrp))
> > +             ret = bpf_prog_run_array_cg(&cgrp->bpf,
> > +                                         shim_prog->aux->cgroup_atype,
> > +                                         ctx, bpf_prog_run, 0);
> > +     return ret;
> > +}
> > +
> > +unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> > +                                      const struct bpf_insn *insn)
> > +{
> > +     const struct bpf_prog *shim_prog;
> > +     struct socket *sock;
> > +     struct cgroup *cgrp;
> > +     int ret = 0;
> > +     u64 *regs;
> > +
> > +     regs = (u64 *)ctx;
> > +     sock = (void *)(unsigned long)regs[BPF_REG_0];
> > +     /*shim_prog = container_of(insn, struct bpf_prog, insnsi);*/
> > +     shim_prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > +
> > +     cgrp = sock_cgroup_ptr(&sock->sk->sk_cgrp_data);
> > +     if (likely(cgrp))
> > +             ret = bpf_prog_run_array_cg(&cgrp->bpf,
> > +                                         shim_prog->aux->cgroup_atype,
> > +                                         ctx, bpf_prog_run, 0);
> > +     return ret;
> > +}
> > +
> > +unsigned int __cgroup_bpf_run_lsm_current(const void *ctx,
> > +                                       const struct bpf_insn *insn)
> > +{
> > +     const struct bpf_prog *shim_prog;
> > +     struct cgroup *cgrp;
> > +     int ret = 0;
> > +
> > +     if (unlikely(!current))
> > +             return 0;
> > +
> > +     /*shim_prog = container_of(insn, struct bpf_prog, insnsi);*/
> > +     shim_prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > +
> > +     rcu_read_lock();
> > +     cgrp = task_dfl_cgroup(current);
> > +     if (likely(cgrp))
> > +             ret = bpf_prog_run_array_cg(&cgrp->bpf,
> > +                                         shim_prog->aux->cgroup_atype,
> > +                                         ctx, bpf_prog_run, 0);
> > +     rcu_read_unlock();
> > +     return ret;
> > +}
> > +
> > +#ifdef CONFIG_BPF_LSM
> > +static enum cgroup_bpf_attach_type bpf_lsm_attach_type_get(u32 attach_btf_id)
> > +{
> > +     return CGROUP_LSM_START + bpf_lsm_hook_idx(attach_btf_id);
> > +}
> > +#else
> > +static enum cgroup_bpf_attach_type bpf_lsm_attach_type_get(u32 attach_btf_id)
> > +{
> > +     return -EOPNOTSUPP;
> > +}
> > +#endif /* CONFIG_BPF_LSM */
> > +
> >  void cgroup_bpf_offline(struct cgroup *cgrp)
> >  {
> >       cgroup_get(cgrp);
> > @@ -155,6 +237,14 @@ static void bpf_cgroup_storages_link(struct bpf_cgroup_storage *storages[],
> >               bpf_cgroup_storage_link(storages[stype], cgrp, attach_type);
> >  }
> >
> > +static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[])
> > +{
> > +     enum bpf_cgroup_storage_type stype;
> > +
> > +     for_each_cgroup_storage_type(stype)
> > +             bpf_cgroup_storage_unlink(storages[stype]);
> > +}
> > +
> >  /* Called when bpf_cgroup_link is auto-detached from dying cgroup.
> >   * It drops cgroup and bpf_prog refcounts, and marks bpf_link as defunct. It
> >   * doesn't free link memory, which will eventually be done by bpf_link's
> > @@ -166,6 +256,16 @@ static void bpf_cgroup_link_auto_detach(struct bpf_cgroup_link *link)
> >       link->cgroup = NULL;
> >  }
> >
> > +static void bpf_cgroup_lsm_shim_release(struct bpf_prog *prog,
> > +                                     enum cgroup_bpf_attach_type atype)
> > +{
> > +     if (prog->aux->cgroup_atype < CGROUP_LSM_START ||
> > +         prog->aux->cgroup_atype > CGROUP_LSM_END)
> These checks are unnecessary.  cgroup_atype was set by the kernel
> during attach and it could not be invalid during detach.
>
> Remove this helper and directly call bpf_trampoline_unlink_cgroup_shim(prog)
> instead.

True. I'll remove the checks, I'll still leave
bpf_cgroup_lsm_shim_release around because in the next patch I add
bpf_lsm_attach_type_put here (seems than copy-pasting
bpf_lsm_attach_type_put elsewhere?)

> > +             return;
> > +
> > +     bpf_trampoline_unlink_cgroup_shim(prog);
> > +}
> > +
> >  /**
> >   * cgroup_bpf_release() - put references of all bpf programs and
> >   *                        release all cgroup bpf data
> > @@ -190,10 +290,18 @@ static void cgroup_bpf_release(struct work_struct *work)
> >
> >               hlist_for_each_entry_safe(pl, pltmp, progs, node) {
> >                       hlist_del(&pl->node);
> > -                     if (pl->prog)
> > +                     if (pl->prog) {
> > +                             if (atype == BPF_LSM_CGROUP)
> > +                                     bpf_cgroup_lsm_shim_release(pl->prog,
> > +                                                                 atype);
> >                               bpf_prog_put(pl->prog);
> > -                     if (pl->link)
> > +                     }
> > +                     if (pl->link) {
> > +                             if (atype == BPF_LSM_CGROUP)
> > +                                     bpf_cgroup_lsm_shim_release(pl->link->link.prog,
> > +                                                                 atype);
> >                               bpf_cgroup_link_auto_detach(pl->link);
> > +                     }
> >                       kfree(pl);
> >                       static_branch_dec(&cgroup_bpf_enabled_key[atype]);
> >               }
> > @@ -506,6 +614,7 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
> >       struct bpf_prog *old_prog = NULL;
> >       struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
> >       struct bpf_cgroup_storage *new_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
> > +     struct bpf_attach_target_info tgt_info = {};
> >       enum cgroup_bpf_attach_type atype;
> >       struct bpf_prog_list *pl;
> >       struct hlist_head *progs;
> > @@ -522,9 +631,35 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
> >               /* replace_prog implies BPF_F_REPLACE, and vice versa */
> >               return -EINVAL;
> >
> > -     atype = to_cgroup_bpf_attach_type(type);
> > -     if (atype < 0)
> > -             return -EINVAL;
> > +     if (type == BPF_LSM_CGROUP) {
> > +             struct bpf_prog *p = prog ? : link->link.prog;
> > +
> > +             if (replace_prog) {
> > +                     /* Reusing shim from the original program.
> > +                      */
> > +                     if (replace_prog->aux->attach_btf_id !=
> > +                         p->aux->attach_btf_id)
> > +                             return -EINVAL;
> > +
> > +                     atype = replace_prog->aux->cgroup_atype;
> > +             } else {
> > +                     err = bpf_check_attach_target(NULL, p, NULL,
> > +                                                   p->aux->attach_btf_id,
> > +                                                   &tgt_info);
> > +                     if (err)
> > +                             return -EINVAL;
> > +
> > +                     atype = bpf_lsm_attach_type_get(p->aux->attach_btf_id);
> > +                     if (atype < 0)
> > +                             return atype;
> > +             }
> > +
> > +             p->aux->cgroup_atype = atype;
> > +     } else {
> > +             atype = to_cgroup_bpf_attach_type(type);
> > +             if (atype < 0)
> > +                     return -EINVAL;
> > +     }
> >
> >       progs = &cgrp->bpf.progs[atype];
> >
> > @@ -580,13 +715,26 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
> >       if (err)
> >               goto cleanup;
> >
> > +     bpf_cgroup_storages_link(new_storage, cgrp, type);
> It looks everything is ready for the cgrp local_storage.
> How about also allowing bpf_get_local_storage() for BPF_LSM_CGROUP?

Definitely, we want local storage to work, let me actually exercise it
in the selftest and add the missing kernel bits to enable it.

> > +
> > +     if (type == BPF_LSM_CGROUP && !old_prog) {
> > +             struct bpf_prog *p = prog ? : link->link.prog;
> > +
> > +             err = bpf_trampoline_link_cgroup_shim(p, &tgt_info);
> > +             if (err)
> > +                     goto cleanup_trampoline;
> > +     }
> > +
> >       if (old_prog)
> >               bpf_prog_put(old_prog);
> >       else
> >               static_branch_inc(&cgroup_bpf_enabled_key[atype]);
> > -     bpf_cgroup_storages_link(new_storage, cgrp, type);
> > +
> >       return 0;
> >
> > +cleanup_trampoline:
> > +     bpf_cgroup_storages_unlink(new_storage);
> > +
> >  cleanup:
> >       if (old_prog) {
> >               pl->prog = old_prog;
> > @@ -678,9 +826,13 @@ static int __cgroup_bpf_replace(struct cgroup *cgrp,
> >       struct hlist_head *progs;
> >       bool found = false;
> >
> > -     atype = to_cgroup_bpf_attach_type(link->type);
> > -     if (atype < 0)
> > -             return -EINVAL;
> > +     if (link->type == BPF_LSM_CGROUP) {
> > +             atype = link->link.prog->aux->cgroup_atype;
> > +     } else {
> > +             atype = to_cgroup_bpf_attach_type(link->type);
> > +             if (atype < 0)
> > +                     return -EINVAL;
> > +     }
> >
> >       progs = &cgrp->bpf.progs[atype];
> >
> > @@ -696,6 +848,9 @@ static int __cgroup_bpf_replace(struct cgroup *cgrp,
> >       if (!found)
> >               return -ENOENT;
> >
> > +     if (link->type == BPF_LSM_CGROUP)
> > +             new_prog->aux->cgroup_atype = atype;
> Does it also need to check attach_btf_id between the
> new_prog and the old prog?

Ah, forgot this one, good catch, thanks!

> > +
> >       old_prog = xchg(&link->link.prog, new_prog);
> >       replace_effective_prog(cgrp, atype, link);
> >       bpf_prog_put(old_prog);
> > @@ -779,9 +934,15 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> >       u32 flags;
> >       int err;
> >
> > -     atype = to_cgroup_bpf_attach_type(type);
> > -     if (atype < 0)
> > -             return -EINVAL;
> > +     if (type == BPF_LSM_CGROUP) {
> > +             struct bpf_prog *p = prog ? : link->link.prog;
> > +
> > +             atype = p->aux->cgroup_atype;
> > +     } else {
> > +             atype = to_cgroup_bpf_attach_type(type);
> > +             if (atype < 0)
> > +                     return -EINVAL;
> > +     }
> >
> >       progs = &cgrp->bpf.progs[atype];
> >       flags = cgrp->bpf.flags[atype];
> > @@ -803,6 +964,10 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> >       if (err)
> >               goto cleanup;
> >
> > +     if (type == BPF_LSM_CGROUP)
> > +             bpf_cgroup_lsm_shim_release(prog ? : link->link.prog,
> > +                                         atype);
> After looking at find_detach_entry(),
> the pl->prog may not be the same as the prog or link->link.prog here.

I'm assuming you're talking about "allow detaching with invalid FD
(prog==NULL) in legacy mode", right? I did miss that, so I will use
pl->prog instead, thanks!

> > +
> >       /* now can actually delete it from this cgroup list */
> >       hlist_del(&pl->node);
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index e9621cfa09f2..d94f4951154e 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -3139,6 +3139,11 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
> >               return prog->enforce_expected_attach_type &&
> >                       prog->expected_attach_type != attach_type ?
> >                       -EINVAL : 0;
> > +     case BPF_PROG_TYPE_LSM:
> > +             if (prog->expected_attach_type != BPF_LSM_CGROUP)
> > +                     return -EINVAL;
> > +             return 0;
> > +
> >       default:
> >               return 0;
> >       }
> > @@ -3194,6 +3199,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
> >               return BPF_PROG_TYPE_SK_LOOKUP;
> >       case BPF_XDP:
> >               return BPF_PROG_TYPE_XDP;
> > +     case BPF_LSM_CGROUP:
> > +             return BPF_PROG_TYPE_LSM;
> >       default:
> >               return BPF_PROG_TYPE_UNSPEC;
> >       }
> > @@ -3247,6 +3254,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
> >       case BPF_PROG_TYPE_CGROUP_SOCKOPT:
> >       case BPF_PROG_TYPE_CGROUP_SYSCTL:
> >       case BPF_PROG_TYPE_SOCK_OPS:
> > +     case BPF_PROG_TYPE_LSM:
> >               ret = cgroup_bpf_prog_attach(attr, ptype, prog);
> >               break;
> >       default:
> > @@ -3284,6 +3292,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
> >       case BPF_PROG_TYPE_CGROUP_SOCKOPT:
> >       case BPF_PROG_TYPE_CGROUP_SYSCTL:
> >       case BPF_PROG_TYPE_SOCK_OPS:
> > +     case BPF_PROG_TYPE_LSM:
> >               return cgroup_bpf_prog_detach(attr, ptype);
> >       default:
> >               return -EINVAL;
> > @@ -4317,6 +4326,7 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
> >       case BPF_PROG_TYPE_CGROUP_DEVICE:
> >       case BPF_PROG_TYPE_CGROUP_SYSCTL:
> >       case BPF_PROG_TYPE_CGROUP_SOCKOPT:
> > +     case BPF_PROG_TYPE_LSM:
> >               ret = cgroup_bpf_link_attach(attr, prog);
> >               break;
> >       case BPF_PROG_TYPE_TRACING:
> > diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> > index 0c4fd194e801..c76dfa4ea2d9 100644
> > --- a/kernel/bpf/trampoline.c
> > +++ b/kernel/bpf/trampoline.c
> > @@ -11,6 +11,8 @@
> >  #include <linux/rcupdate_wait.h>
> >  #include <linux/module.h>
> >  #include <linux/static_call.h>
> > +#include <linux/bpf_verifier.h>
> > +#include <linux/bpf_lsm.h>
> >
> >  /* dummy _ops. The verifier will operate on target program's ops. */
> >  const struct bpf_verifier_ops bpf_extension_verifier_ops = {
> > @@ -485,6 +487,149 @@ int bpf_trampoline_unlink_prog(struct bpf_prog *prog, struct bpf_trampoline *tr)
> >       return err;
> >  }
> >
> > +#if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL)
> > +static struct bpf_prog *cgroup_shim_alloc(const struct bpf_prog *prog,
> > +                                       bpf_func_t bpf_func)
> > +{
> > +     struct bpf_prog *p;
> > +
> > +     p = bpf_prog_alloc(1, 0);
> > +     if (!p)
> > +             return NULL;
> > +
> > +     p->jited = false;
> > +     p->bpf_func = bpf_func;
> > +
> > +     p->aux->cgroup_atype = prog->aux->cgroup_atype;
> > +     p->aux->attach_func_proto = prog->aux->attach_func_proto;
> > +     p->aux->attach_btf_id = prog->aux->attach_btf_id;
> > +     p->aux->attach_btf = prog->aux->attach_btf;
> > +     btf_get(p->aux->attach_btf);
> > +     p->type = BPF_PROG_TYPE_LSM;
> > +     p->expected_attach_type = BPF_LSM_MAC;
> > +     bpf_prog_inc(p);
> > +
> > +     return p;
> > +}
> > +
> > +static struct bpf_prog *cgroup_shim_find(struct bpf_trampoline *tr,
> > +                                      bpf_func_t bpf_func)
> > +{
> > +     const struct bpf_prog_aux *aux;
> > +     int kind;
> > +
> > +     for (kind = 0; kind < BPF_TRAMP_MAX; kind++) {
> > +             hlist_for_each_entry(aux, &tr->progs_hlist[kind], tramp_hlist) {
> > +                     struct bpf_prog *p = aux->prog;
> > +
> > +                     if (p->bpf_func == bpf_func)
> > +                             return p;
> > +             }
> > +     }
> > +
> > +     return NULL;
> > +}
> > +
> > +int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
> > +                                 struct bpf_attach_target_info *tgt_info)
> > +{
> > +     struct bpf_prog *shim_prog = NULL;
> > +     struct bpf_trampoline *tr;
> > +     bpf_func_t bpf_func;
> > +     u64 key;
> > +     int err;
> > +
> > +     key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
> > +                                      prog->aux->attach_btf_id);
> > +
> > +     err = bpf_lsm_find_cgroup_shim(prog, &bpf_func);
> > +     if (err)
> > +             return err;
> > +
> > +     tr = bpf_trampoline_get(key, tgt_info);
> > +     if (!tr)
> > +             return  -ENOMEM;
> > +
> > +     mutex_lock(&tr->mutex);
> > +
> > +     shim_prog = cgroup_shim_find(tr, bpf_func);
> > +     if (shim_prog) {
> > +             /* Reusing existing shim attached by the other program.
> > +              */
> nit.  Avoid extra line in '*/' for one liner comment.  Didn't
> see this convention in other bpf codes.

Ack, will do!

> Also, how about the earlier comment regarding to another __bpf_prog_enter
> and __bpf_prog_exit for BPF_LSM_CGROUP that don't do the active counts
> and stats collection ?

Ugh, I totally forgot that we've agreed to remove those active counts
for the lsm-cgroup case. Let me actually add that part..

> > +             bpf_prog_inc(shim_prog);
> > +             mutex_unlock(&tr->mutex);
> > +             return 0;
> > +     }
> > +
> > +     /* Allocate and install new shim.
> > +      */
> Same here.

Ack. Thank you for another round of review!

> > +
> > +     shim_prog = cgroup_shim_alloc(prog, bpf_func);
> > +     if (!shim_prog) {
> > +             err = -ENOMEM;
> > +             goto out;
> > +     }
> > +
> > +     err = __bpf_trampoline_link_prog(shim_prog, tr);
> > +     if (err)
> > +             goto out;
> > +
> > +     mutex_unlock(&tr->mutex);
> > +
> > +     return 0;
> > +out:
> > +     if (shim_prog)
> > +             bpf_prog_put(shim_prog);
> > +
> > +     mutex_unlock(&tr->mutex);
> > +     return err;
> > +}
> > +
> > +void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog)
> > +{
> > +     struct bpf_prog *shim_prog;
> > +     struct bpf_trampoline *tr;
> > +     bpf_func_t bpf_func;
> > +     u64 key;
> > +     int err;
> > +
> > +     key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
> > +                                      prog->aux->attach_btf_id);
> > +
> > +     err = bpf_lsm_find_cgroup_shim(prog, &bpf_func);
> > +     if (err)
> > +             return;
> > +
> > +     tr = bpf_trampoline_lookup(key);
> > +     if (!tr)
> > +             return;
> > +
> > +     mutex_lock(&tr->mutex);
> > +
> > +     shim_prog = cgroup_shim_find(tr, bpf_func);
> > +     if (shim_prog) {
> > +             /* We use shim_prog refcnt for tracking whether to
> > +              * remove the shim program from the trampoline.
> > +              * Trampoline's mutex is held while refcnt is
> > +              * added/subtracted so we don't need to care about
> > +              * potential races.
> > +              */
> > +
> > +             if (atomic64_read(&shim_prog->aux->refcnt) == 1)
> > +                     WARN_ON_ONCE(__bpf_trampoline_unlink_prog(shim_prog, tr));
> > +
> > +             bpf_prog_put(shim_prog);
> > +     }
> > +
> > +     mutex_unlock(&tr->mutex);
> > +
> > +     bpf_trampoline_put(tr); /* bpf_trampoline_lookup */
> > +
> > +     if (shim_prog)
> > +             bpf_trampoline_put(tr);
> > +}
> > +#endif
> > +
> >  struct bpf_trampoline *bpf_trampoline_get(u64 key,
> >                                         struct bpf_attach_target_info *tgt_info)
> >  {
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 9c1a02b82ecd..cc84954846d7 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -14197,6 +14197,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
> >               fallthrough;
> >       case BPF_MODIFY_RETURN:
> >       case BPF_LSM_MAC:
> > +     case BPF_LSM_CGROUP:
> >       case BPF_TRACE_FENTRY:
> >       case BPF_TRACE_FEXIT:
> >               if (!btf_type_is_func(t)) {
> > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > index d14b10b85e51..bbe48a2dd852 100644
> > --- a/tools/include/uapi/linux/bpf.h
> > +++ b/tools/include/uapi/linux/bpf.h
> > @@ -998,6 +998,7 @@ enum bpf_attach_type {
> >       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
> >       BPF_PERF_EVENT,
> >       BPF_TRACE_KPROBE_MULTI,
> > +     BPF_LSM_CGROUP,
> >       __MAX_BPF_ATTACH_TYPE
> >  };
> >
> > --
> > 2.36.0.rc0.470.gd361397f0d-goog
> >

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox