Netdev List


* Re: [PATCH net 0/2] net/smc: Two fixes for smc fallback
From: Karsten Graul @ 2022-04-23 10:26 UTC (permalink / raw)
  To: Wen Gu, davem, kuba; +Cc: linux-s390, netdev, linux-kernel
In-Reply-To: <1650614179-11529-1-git-send-email-guwen@linux.alibaba.com>

On 22/04/2022 09:56, Wen Gu wrote:
> This patch set includes two fixes for smc fallback:
> 
> Patch 1/2 introduces some simple helpers to wrap the replacement
> and restore of clcsock's callback functions. Make sure that only
> the original callbacks will be saved and not overwritten.
> 
> Patch 2/2 fixes a slab-out-of-bounds issue reported by syzbot, where
> smc_fback_error_report() accesses the already freed smc sock (see
> https://lore.kernel.org/r/00000000000013ca8105d7ae3ada@google.com/).
> The patch fixes it by resetting sk_user_data and restoring the clcsock
> callback functions promptly in the fallback situation.

Thank you for the analysis and the fix!

For the series:
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
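
The "save only the original callback" idea from patch 1/2 can be
illustrated with a minimal userspace sketch. This is not the SMC code:
struct demo_sock, struct demo_saved and all demo_* functions are
hypothetical names, and a single callback stands in for the several
clcsock callbacks the series deals with.

```c
#include <stddef.h>

/* Hypothetical stand-in for a socket with one replaceable callback. */
struct demo_sock {
	void (*state_change)(struct demo_sock *sk);
};

/* Hypothetical storage for the saved original callback. */
struct demo_saved {
	void (*state_change)(struct demo_sock *sk);
};

/* Install new_cb; remember the original only on the first replacement. */
void demo_replace_cb(struct demo_sock *sk, struct demo_saved *saved,
		     void (*new_cb)(struct demo_sock *sk))
{
	if (!saved->state_change)
		saved->state_change = sk->state_change;
	sk->state_change = new_cb;
}

/* Put the original back (if one was saved) and forget the saved copy. */
void demo_restore_cb(struct demo_sock *sk, struct demo_saved *saved)
{
	if (!saved->state_change)
		return;
	sk->state_change = saved->state_change;
	saved->state_change = NULL;
}

/* Dummy callbacks used only to tell the pointers apart. */
void demo_orig_cb(struct demo_sock *sk) { (void)sk; }
void demo_smc_cb(struct demo_sock *sk) { (void)sk; }
void demo_fback_cb(struct demo_sock *sk) { (void)sk; }
```

Replacing the callback twice and then restoring once brings back
demo_orig_cb, because the second replacement does not overwrite the
saved pointer.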


* Re: [PATCH] net: set proper memcg for net_init hooks allocations
From: kernel test robot @ 2022-04-23 10:31 UTC (permalink / raw)
  To: Vasily Averin, Vlastimil Babka, Shakeel Butt
  Cc: llvm, kbuild-all, kernel, Florian Westphal, linux-kernel,
	Roman Gushchin, Michal Hocko, cgroups, netdev, Jakub Kicinski,
	Paolo Abeni
In-Reply-To: <6f38e02b-9af3-4dcf-9000-1118a04b13c7@openvz.org>

Hi Vasily,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.18-rc3 next-20220422]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Vasily-Averin/net-set-proper-memcg-for-net_init-hooks-allocations/20220423-160759
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git c00c5e1d157bec0ef0b0b59aa5482eb8dc7e8e49
config: riscv-randconfig-r042-20220422 (https://download.01.org/0day-ci/archive/20220423/202204231806.8O86U791-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 5bd87350a5ae429baf8f373cb226a57b62f87280)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/3b379e5391e36e13b9f36305aa6d233fb03d4e58
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Vasily-Averin/net-set-proper-memcg-for-net_init-hooks-allocations/20220423-160759
        git checkout 3b379e5391e36e13b9f36305aa6d233fb03d4e58
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash drivers/gpu/drm/exynos/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from drivers/gpu/drm/exynos/exynos_drm_dma.c:15:
   In file included from drivers/gpu/drm/exynos/exynos_drm_drv.h:16:
   In file included from include/drm/drm_crtc.h:28:
   In file included from include/linux/i2c.h:19:
   In file included from include/linux/regulator/consumer.h:35:
   In file included from include/linux/suspend.h:5:
   In file included from include/linux/swap.h:9:
   include/linux/memcontrol.h:1773:21: error: call to undeclared function 'css_tryget'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           } while (memcg && !css_tryget(&memcg->css));
                              ^
   include/linux/memcontrol.h:1773:38: error: incomplete definition of type 'struct mem_cgroup'
           } while (memcg && !css_tryget(&memcg->css));
                                          ~~~~~^
   include/linux/mm_types.h:31:8: note: forward declaration of 'struct mem_cgroup'
   struct mem_cgroup;
          ^
>> drivers/gpu/drm/exynos/exynos_drm_dma.c:55:35: warning: implicit conversion from 'unsigned long long' to 'unsigned int' changes value from 18446744073709551615 to 4294967295 [-Wconstant-conversion]
           dma_set_max_seg_size(subdrv_dev, DMA_BIT_MASK(32));
           ~~~~~~~~~~~~~~~~~~~~             ^~~~~~~~~~~~~~~~
   include/linux/dma-mapping.h:76:40: note: expanded from macro 'DMA_BIT_MASK'
   #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
                                          ^~~~~
   1 warning and 2 errors generated.
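
The value quoted in the -Wconstant-conversion warning above
(18446744073709551615) is that of the ~0ULL arm of DMA_BIT_MASK(), while
for n = 32 the macro evaluates to the other arm. A minimal userspace
sketch of that arithmetic follows; only the macro definition is taken
from the output above, demo_mask64() and demo_fits_in_u32() are
hypothetical helpers.

```c
#include <stdint.h>

/* Same definition as quoted from include/linux/dma-mapping.h above. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: the mask for n bits (0..64) as a 64-bit value. */
uint64_t demo_mask64(unsigned int n)
{
	return DMA_BIT_MASK(n);
}

/* Hypothetical helper: does v survive truncation to 32 bits unchanged? */
int demo_fits_in_u32(uint64_t v)
{
	return v == (uint64_t)(uint32_t)v;
}
```

With these helpers, demo_mask64(32) is 4294967295 and fits in 32 bits,
whereas demo_mask64(64) does not.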


vim +55 drivers/gpu/drm/exynos/exynos_drm_dma.c

67fbf3a3ef8443 Andrzej Hajda    2018-10-12  33  
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  34  /*
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  35   * drm_iommu_attach_device- attach device to iommu mapping
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  36   *
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  37   * @drm_dev: DRM device
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  38   * @subdrv_dev: device to be attach
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  39   *
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  40   * This function should be called by sub drivers to attach it to iommu
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  41   * mapping.
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  42   */
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  43  static int drm_iommu_attach_device(struct drm_device *drm_dev,
07dc3678bacc2a Marek Szyprowski 2020-03-09  44  				struct device *subdrv_dev, void **dma_priv)
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  45  {
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  46  	struct exynos_drm_private *priv = drm_dev->dev_private;
b9c633882de460 Marek Szyprowski 2020-06-01  47  	int ret = 0;
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  48  
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  49  	if (get_dma_ops(priv->dma_dev) != get_dma_ops(subdrv_dev)) {
6f83d20838c099 Inki Dae         2019-04-15  50  		DRM_DEV_ERROR(subdrv_dev, "Device %s lacks support for IOMMU\n",
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  51  			  dev_name(subdrv_dev));
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  52  		return -EINVAL;
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  53  	}
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  54  
ddfd4ab6bb0883 Marek Szyprowski 2020-07-07 @55  	dma_set_max_seg_size(subdrv_dev, DMA_BIT_MASK(32));
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  56  	if (IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)) {
07dc3678bacc2a Marek Szyprowski 2020-03-09  57  		/*
07dc3678bacc2a Marek Szyprowski 2020-03-09  58  		 * Keep the original DMA mapping of the sub-device and
07dc3678bacc2a Marek Szyprowski 2020-03-09  59  		 * restore it on Exynos DRM detach, otherwise the DMA
07dc3678bacc2a Marek Szyprowski 2020-03-09  60  		 * framework considers it as IOMMU-less during the next
07dc3678bacc2a Marek Szyprowski 2020-03-09  61  		 * probe (in case of deferred probe or modular build)
07dc3678bacc2a Marek Szyprowski 2020-03-09  62  		 */
07dc3678bacc2a Marek Szyprowski 2020-03-09  63  		*dma_priv = to_dma_iommu_mapping(subdrv_dev);
07dc3678bacc2a Marek Szyprowski 2020-03-09  64  		if (*dma_priv)
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  65  			arm_iommu_detach_device(subdrv_dev);
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  66  
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  67  		ret = arm_iommu_attach_device(subdrv_dev, priv->mapping);
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  68  	} else if (IS_ENABLED(CONFIG_IOMMU_DMA)) {
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  69  		ret = iommu_attach_device(priv->mapping, subdrv_dev);
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  70  	}
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  71  
b9c633882de460 Marek Szyprowski 2020-06-01  72  	return ret;
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  73  }
67fbf3a3ef8443 Andrzej Hajda    2018-10-12  74  

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* Re: [PATCH net-next 0/8] mptcp: TCP fallback for established connections
From: patchwork-bot+netdevbpf @ 2022-04-23 11:00 UTC (permalink / raw)
  To: Mat Martineau; +Cc: netdev, davem, kuba, pabeni, matthieu.baerts, mptcp
In-Reply-To: <20220422215543.545732-1-mathew.j.martineau@linux.intel.com>

Hello:

This series was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Fri, 22 Apr 2022 14:55:35 -0700 you wrote:
> RFC 8684 allows some MPTCP connections to fall back to regular TCP when
> the MPTCP DSS checksum detects middlebox interference, there is only a
> single subflow, and there is no unacknowledged out-of-sequence
> data. When this condition is detected, the stack sends a MPTCP DSS
> option with an "infinite mapping" to signal that a fallback is
> happening, and the peers will stop sending MPTCP options in their TCP
> headers. The Linux MPTCP stack has not yet supported this type of
> fallback, instead closing the connection when the MPTCP checksum fails.
> 
> [...]
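
The three fallback conditions listed in the cover letter can be written
as a single predicate. This is only a sketch of the stated rule, not the
MPTCP implementation: struct demo_mptcp_state, its fields and
demo_can_fallback() are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical snapshot of connection state; field names are illustrative. */
struct demo_mptcp_state {
	bool csum_failed;        /* DSS checksum detected middlebox interference */
	unsigned int subflows;   /* number of subflows on the connection */
	bool unacked_ooo_data;   /* unacknowledged out-of-sequence data exists */
};

/* True only when all three conditions from the cover letter hold. */
bool demo_can_fallback(const struct demo_mptcp_state *s)
{
	return s->csum_failed && s->subflows == 1 && !s->unacked_ooo_data;
}
```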

Here is the summary with links:
  - [net-next,1/8] mptcp: don't send RST for single subflow
    https://git.kernel.org/netdev/net-next/c/1761fed25678
  - [net-next,2/8] mptcp: add the fallback check
    https://git.kernel.org/netdev/net-next/c/0348c690ed37
  - [net-next,3/8] mptcp: track and update contiguous data status
    https://git.kernel.org/netdev/net-next/c/0530020a7c8f
  - [net-next,4/8] mptcp: infinite mapping sending
    https://git.kernel.org/netdev/net-next/c/1e39e5a32ad7
  - [net-next,5/8] mptcp: infinite mapping receiving
    https://git.kernel.org/netdev/net-next/c/f8d4bcacff3b
  - [net-next,6/8] mptcp: add mib for infinite map sending
    https://git.kernel.org/netdev/net-next/c/104125b82e5c
  - [net-next,7/8] mptcp: dump infinite_map field in mptcp_dump_mpext
    https://git.kernel.org/netdev/net-next/c/d9fdd02d4265
  - [net-next,8/8] selftests: mptcp: add infinite map mibs check
    https://git.kernel.org/netdev/net-next/c/8bd03be3418c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* [PATCH] brcmfmac: use ISO3166 country code and 0 rev as fallback on brcmfmac43602 chips
From: Hamid Zamani @ 2022-04-23 11:12 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Arend van Spriel, Franky Lin, Hante Meuleman, Shawn Guo,
	Hans de Goede, Soeren Moch, linux-wireless,
	brcm80211-dev-list.pdl, SHA-cyfmac-dev-list, netdev, linux-kernel,
	Hamid Zamani

This uses ISO3166 country code and 0 rev on brcmfmac43602 chips.
Without this patch 80 MHz width is not selected on 5 GHz channels.

Commit a21bf90e927f ("brcmfmac: use ISO3166 country code and 0 rev as
fallback on some devices") provides a way to specify chips for using the
fallback case.

Before commit 151a7c12c4fc ("Revert "brcmfmac: use ISO3166 country code
and 0 rev as fallback"") brcmfmac43602 devices worked correctly and for
this specific case 80 MHz width was selected.

Signed-off-by: Hamid Zamani <hzamani.cs91@gmail.com>
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index f0ad1e23f3c8..360b103fe898 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -7481,6 +7481,7 @@ static bool brmcf_use_iso3166_ccode_fallback(struct brcmf_pub *drvr)
 {
 	switch (drvr->bus_if->chip) {
 	case BRCM_CC_4345_CHIP_ID:
+	case BRCM_CC_43602_CHIP_ID:
 		return true;
 	default:
 		return false;
-- 
2.35.1


* Re: [PATCH net-next 0/8] DSA selftests
From: patchwork-bot+netdevbpf @ 2022-04-23 11:20 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, kuba, davem, pabeni, f.fainelli, andrew, vivien.didelot,
	olteanv, claudiu.manoil, alexandre.belloni, UNGLinuxDriver,
	tobias, mattias.forsblad, roopa, nikolay, jiri, idosch, troglobit,
	kabel, ansuelsmth, dqfext, kurt
In-Reply-To: <20220422101504.3729309-1-vladimir.oltean@nxp.com>

Hello:

This series was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Fri, 22 Apr 2022 13:14:56 +0300 you wrote:
> When working on complex new features or reworks it becomes increasingly
> difficult to ensure there aren't regressions being introduced, and
> therefore it would be nice if we could go over the functionality we
> already have and write some tests for it.
> 
> Verbally I know from Tobias Waldekranz that he has been working on some
> selftests for DSA, yet I have never seen them, so here I am adding some
> tests I have written which have been useful for me. The list is by no
> means complete (it only covers elementary functionality), but it's still
> good to have as a starting point. I also borrowed some refactoring
> changes from Joachim Wiberg that he submitted for his "net: bridge:
> forwarding of unknown IPv4/IPv6/MAC BUM traffic" series, but not the
> entirety of his selftests. I now think that his selftests have some
> overlap with bridge_vlan_unaware.sh and bridge_vlan_aware.sh and they
> should be more tightly integrated with each other - yet I didn't do that
> either :). Another issue I had with his selftests was that they jumped
> straight ahead to configure brport flags on br0 (a radical new idea
> still at RFC status) while we have bigger problems, and we don't have
> nearly enough coverage for the *existing* functionality.
> 
> [...]

Here is the summary with links:
  - [net-next,1/8] selftests: forwarding: add option to run tests with stable MAC addresses
    https://git.kernel.org/netdev/net-next/c/b343734ee265
  - [net-next,2/8] selftests: forwarding: add TCPDUMP_EXTRA_FLAGS to lib.sh
    https://git.kernel.org/netdev/net-next/c/fe32dffdcd33
  - [net-next,3/8] selftests: forwarding: multiple instances in tcpdump helper
    https://git.kernel.org/netdev/net-next/c/6182c5c5098f
  - [net-next,4/8] selftests: forwarding: add helpers for IP multicast group joins/leaves
    https://git.kernel.org/netdev/net-next/c/f23cddc72294
  - [net-next,5/8] selftests: forwarding: add helper for retrieving IPv6 link-local address of interface
    https://git.kernel.org/netdev/net-next/c/a5114df6c613
  - [net-next,6/8] selftests: forwarding: add a no_forwarding.sh test
    https://git.kernel.org/netdev/net-next/c/476a4f05d9b8
  - [net-next,7/8] selftests: forwarding: add a test for local_termination.sh
    https://git.kernel.org/netdev/net-next/c/90b9566aa5cd
  - [net-next,8/8] selftests: drivers: dsa: add a subset of forwarding selftests
    https://git.kernel.org/netdev/net-next/c/07c8a2dd69f6

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* [PATCH net] virtio_net: fix wrong buf address calculation when using xdp
From: Nikolay Aleksandrov @ 2022-04-23 11:26 UTC (permalink / raw)
  To: netdev
  Cc: kuba, davem, Nikolay Aleksandrov, stable, Jason Wang, Xuan Zhuo,
	Daniel Borkmann, Michael S. Tsirkin, virtualization

We received a report[1] of kernel crashes when Cilium is used in XDP
mode with virtio_net after updating to newer kernels. After
investigating the reason it turned out that when using mergeable bufs
with an XDP program which adjusts xdp.data or xdp.data_meta, page_to_skb()
calculates the build_skb address wrong because the offset can become less
than the headroom, so it gets the address of the previous page (-X bytes
depending on how much lower the offset is):
 page_to_skb: page addr ffff9eb2923e2000 buf ffff9eb2923e1ffc offset 252 headroom 256

This is a pr_err() I added at the beginning of page_to_skb() which clearly
shows an offset that is less than the headroom after adding 4 bytes of
metadata via an XDP prog. The calculations done are:
 receive_mergeable():
 headroom = VIRTIO_XDP_HEADROOM; // VIRTIO_XDP_HEADROOM == 256 bytes
 offset = xdp.data - page_address(xdp_page) -
          vi->hdr_len - metasize;

 page_to_skb():
 p = page_address(page) + offset;
 ...
 buf = p - headroom;

Now buf ends up 4 bytes before the page's starting address, as can be seen
above, and is later set as skb->head and skb->data by build_skb(). Depending
on what's done with the skb (most often when it's freed) we get all kinds
of corruptions and BUG_ON() triggers in mm[2]. The story of the faulty
commit is interesting because the patch was sent and applied twice (it
seems the first one got lost during merge back in 5.13 window). The
first version of the patch that was applied as:
 commit 7bf64460e3b2 ("virtio-net: get build_skb() buf by data ptr")
was actually correct because it calculated the page starting address
without relying on offset or headroom, but then the second version that
was applied as:
 commit 8fb7da9e9907 ("virtio_net: get build_skb() buf by data ptr")
was wrong and added the above calculation.
An example xdp prog[3] is below.
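
Separately from the reproducer, the address arithmetic described above
can be checked in userspace with the values from the pr_err() line. This
is a sketch that assumes 4 KiB pages; the demo_* names and the
DEMO_PAGE_* constants are hypothetical and only mirror the two
calculations (the old "buf = p - headroom" and the page-mask variant in
the diff below).

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on the x86-64 test machine in the traces. */
#define DEMO_PAGE_SIZE 4096ULL
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/* Old calculation: subtract the headroom from the data pointer. */
uint64_t demo_buf_old(uint64_t page_addr, uint64_t offset, uint64_t headroom)
{
	uint64_t p = page_addr + offset;

	return p - headroom;
}

/* Patched calculation: round p down to its page when headroom is set. */
uint64_t demo_buf_new(uint64_t page_addr, uint64_t offset, uint64_t headroom)
{
	uint64_t p = page_addr + offset;

	return headroom ? (p & DEMO_PAGE_MASK) : p;
}
```

With page address ffff9eb2923e2000, offset 252 and headroom 256, the old
calculation gives ffff9eb2923e1ffc (4 bytes into the previous page) and
the patched one gives the page address itself.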

[1] https://github.com/cilium/cilium/issues/19453

[2] Two of the many traces:
 [   40.437400] BUG: Bad page state in process swapper/0  pfn:14940
 [   40.916726] BUG: Bad page state in process systemd-resolve  pfn:053b7
 [   41.300891] kernel BUG at include/linux/mm.h:720!
 [   41.301801] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
 [   41.302784] CPU: 1 PID: 1181 Comm: kubelet Kdump: loaded Tainted: G    B   W         5.18.0-rc1+ #37
 [   41.304458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
 [   41.306018] RIP: 0010:page_frag_free+0x79/0xe0
 [   41.306836] Code: 00 00 75 ea 48 8b 07 a9 00 00 01 00 74 e0 48 8b 47 48 48 8d 50 ff a8 01 48 0f 45 fa eb d0 48 c7 c6 18 b8 30 a6 e8 d7 f8 fc ff <0f> 0b 48 8d 78 ff eb bc 48 8b 07 a9 00 00 01 00 74 3a 66 90 0f b6
 [   41.310235] RSP: 0018:ffffac05c2a6bc78 EFLAGS: 00010292
 [   41.311201] RAX: 000000000000003e RBX: 0000000000000000 RCX: 0000000000000000
 [   41.312502] RDX: 0000000000000001 RSI: ffffffffa6423004 RDI: 00000000ffffffff
 [   41.313794] RBP: ffff993c98823600 R08: 0000000000000000 R09: 00000000ffffdfff
 [   41.315089] R10: ffffac05c2a6ba68 R11: ffffffffa698ca28 R12: ffff993c98823600
 [   41.316398] R13: ffff993c86311ebc R14: 0000000000000000 R15: 000000000000005c
 [   41.317700] FS:  00007fe13fc56740(0000) GS:ffff993cdd900000(0000) knlGS:0000000000000000
 [   41.319150] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [   41.320152] CR2: 000000c00008a000 CR3: 0000000014908000 CR4: 0000000000350ee0
 [   41.321387] Call Trace:
 [   41.321819]  <TASK>
 [   41.322193]  skb_release_data+0x13f/0x1c0
 [   41.322902]  __kfree_skb+0x20/0x30
 [   41.343870]  tcp_recvmsg_locked+0x671/0x880
 [   41.363764]  tcp_recvmsg+0x5e/0x1c0
 [   41.384102]  inet_recvmsg+0x42/0x100
 [   41.406783]  ? sock_recvmsg+0x1d/0x70
 [   41.428201]  sock_read_iter+0x84/0xd0
 [   41.445592]  ? 0xffffffffa3000000
 [   41.462442]  new_sync_read+0x148/0x160
 [   41.479314]  ? 0xffffffffa3000000
 [   41.496937]  vfs_read+0x138/0x190
 [   41.517198]  ksys_read+0x87/0xc0
 [   41.535336]  do_syscall_64+0x3b/0x90
 [   41.551637]  entry_SYSCALL_64_after_hwframe+0x44/0xae
 [   41.568050] RIP: 0033:0x48765b
 [   41.583955] Code: e8 4a 35 fe ff eb 88 cc cc cc cc cc cc cc cc e8 fb 7a fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
 [   41.632818] RSP: 002b:000000c000a2f5b8 EFLAGS: 00000212 ORIG_RAX: 0000000000000000
 [   41.664588] RAX: ffffffffffffffda RBX: 000000c000062000 RCX: 000000000048765b
 [   41.681205] RDX: 0000000000005e54 RSI: 000000c000e66000 RDI: 0000000000000016
 [   41.697164] RBP: 000000c000a2f608 R08: 0000000000000001 R09: 00000000000001b4
 [   41.713034] R10: 00000000000000b6 R11: 0000000000000212 R12: 00000000000000e9
 [   41.728755] R13: 0000000000000001 R14: 000000c000a92000 R15: ffffffffffffffff
 [   41.744254]  </TASK>
 [   41.758585] Modules linked in: br_netfilter bridge veth netconsole virtio_net

 and

 [   33.524802] BUG: Bad page state in process systemd-network  pfn:11e60
 [   33.528617] page ffffe05dc0147b00 ffffe05dc04e7a00 ffff8ae9851ec000 (1) len 82 offset 252 metasize 4 hroom 0 hdr_len 12 data ffff8ae9851ec10c data_meta ffff8ae9851ec108 data_end ffff8ae9851ec14e
 [   33.529764] page:000000003792b5ba refcount:0 mapcount:-512 mapping:0000000000000000 index:0x0 pfn:0x11e60
 [   33.532463] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
 [   33.532468] raw: 000fffffc0000000 0000000000000000 dead000000000122 0000000000000000
 [   33.532470] raw: 0000000000000000 0000000000000000 00000000fffffdff 0000000000000000
 [   33.532471] page dumped because: nonzero mapcount
 [   33.532472] Modules linked in: br_netfilter bridge veth netconsole virtio_net
 [   33.532479] CPU: 0 PID: 791 Comm: systemd-network Kdump: loaded Not tainted 5.18.0-rc1+ #37
 [   33.532482] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
 [   33.532484] Call Trace:
 [   33.532496]  <TASK>
 [   33.532500]  dump_stack_lvl+0x45/0x5a
 [   33.532506]  bad_page.cold+0x63/0x94
 [   33.532510]  free_pcp_prepare+0x290/0x420
 [   33.532515]  free_unref_page+0x1b/0x100
 [   33.532518]  skb_release_data+0x13f/0x1c0
 [   33.532524]  kfree_skb_reason+0x3e/0xc0
 [   33.532527]  ip6_mc_input+0x23c/0x2b0
 [   33.532531]  ip6_sublist_rcv_finish+0x83/0x90
 [   33.532534]  ip6_sublist_rcv+0x22b/0x2b0

[3] XDP program to reproduce (xdp_pass.c):
 #include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>

 SEC("xdp_pass")
 int xdp_pkt_pass(struct xdp_md *ctx)
 {
          bpf_xdp_adjust_head(ctx, -(int)32);
          return XDP_PASS;
 }

 char _license[] SEC("license") = "GPL";

 compile: clang -O2 -g -Wall -target bpf -c xdp_pass.c -o xdp_pass.o
 load on virtio_net: ip link set enp1s0 xdpdrv obj xdp_pass.o sec xdp_pass

CC: stable@vger.kernel.org
CC: Jason Wang <jasowang@redhat.com>
CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: virtualization@lists.linux-foundation.org
Fixes: 8fb7da9e9907 ("virtio_net: get build_skb() buf by data ptr")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
---
 drivers/net/virtio_net.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 87838cbe38cf..0687dd88e97f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -434,9 +434,13 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	 * Buffers with headroom use PAGE_SIZE as alloc size, see
 	 * add_recvbuf_mergeable() + get_mergeable_buf_len()
 	 */
-	truesize = headroom ? PAGE_SIZE : truesize;
+	if (headroom) {
+		truesize = PAGE_SIZE;
+		buf = (char *)((unsigned long)p & PAGE_MASK);
+	} else {
+		buf = p;
+	}
 	tailroom = truesize - headroom;
-	buf = p - headroom;
 
 	len -= hdr_len;
 	offset += hdr_padded_len;
-- 
2.35.1



* Re: [syzbot] WARNING: kmalloc bug in bpf
From: Dmitry Vyukov @ 2022-04-23 12:13 UTC (permalink / raw)
  To: syzbot
  Cc: andrii, ast, bpf, daniel, davem, hawk, jiri, john.fastabend,
	kafai, kpsingh, kuba, leonro, linux-kernel, netdev,
	songliubraving, syzkaller-bugs, torvalds, yhs
In-Reply-To: <000000000000ec946105dd42bcd6@google.com>

On Fri, 22 Apr 2022 at 20:53, syzbot
<syzbot+cecf5b7071a0dfb76530@syzkaller.appspotmail.com> wrote:
>
> syzbot suspects this issue was fixed by commit:
>
> commit 0708a0afe291bdfe1386d74d5ec1f0c27e8b9168
> Author: Daniel Borkmann <daniel@iogearbox.net>
> Date:   Fri Mar 4 14:26:32 2022 +0000
>
>     mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls
>
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1499c6fcf00000
> start commit:   1d5a47424040 sfc: The RX page_ring is optional
> git tree:       net
> kernel config:  https://syzkaller.appspot.com/x/.config?x=1a86c22260afac2f
> dashboard link: https://syzkaller.appspot.com/bug?extid=cecf5b7071a0dfb76530
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=176738e7b00000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=13b4508db00000
>
> If the result looks correct, please mark the issue as fixed by replying with:
>
> #syz fix: mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls
>
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Looks reasonable:

#syz fix: mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls


* Re: [PATCH v3] cw1200: fix incorrect check to determine if no element is found in list
From: Kalle Valo @ 2022-04-23 12:39 UTC (permalink / raw)
  To: Xiaomeng Tong
  Cc: pizza, davem, kuba, pabeni, linville, linux-wireless, netdev,
	linux-kernel, Xiaomeng Tong
In-Reply-To: <20220413091723.17596-1-xiam0nd.tong@gmail.com>

Xiaomeng Tong <xiam0nd.tong@gmail.com> wrote:

> The bug is here: "} else if (item) {".
> 
> The list iterator value will *always* be set and non-NULL by
> list_for_each_entry(), so it is incorrect to assume that the iterator
> value will be NULL if the list is empty or no element is found in list.
> 
> Use a new variable 'iter' as the list iterator, and use the old variable
> 'item' as a dedicated pointer to the found element. This:
> 1. fixes the bug, because 'item' is now NULL only if no element is found.
> 2. avoids changing all the uses of 'item' after the loop.
> 3. allows limiting the scope of the list iterator 'iter' to *only inside*
>    the traversal loop by simply declaring 'iter' inside the loop in the
>    future, as usage of the iterator outside of the list_for_each_entry
>    is considered harmful. https://lkml.org/lkml/2022/2/17/1032
> 
> Fixes: a910e4a94f692 ("cw1200: add driver for the ST-E CW1100 & CW1200 WLAN chipsets")
> Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>

Can someone review this, please?

Patch set to Deferred.
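
For reviewers, the iterator pattern described in the commit message can
be sketched in userspace as follows. This is not the cw1200 code: the
list helpers are minimal stand-ins for the kernel's list.h, and struct
demo_item and all demo_* names are hypothetical.

```c
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's circular list head. */
struct list_head { struct list_head *next, *prev; };

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified list_for_each_entry(): pos is never NULL, even after the loop. */
#define demo_list_for_each_entry(pos, head, type, member)              \
	for ((pos) = demo_container_of((head)->next, type, member);     \
	     &(pos)->member != (head);                                  \
	     (pos) = demo_container_of((pos)->member.next, type, member))

struct demo_item {
	int id;
	struct list_head link;
};

void demo_list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

void demo_list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Returns the matching element, or NULL when nothing matches. */
struct demo_item *demo_find(struct list_head *head, int id)
{
	struct demo_item *item = NULL;	/* set only on a match */
	struct demo_item *iter;		/* loop cursor, used only in the loop */

	demo_list_for_each_entry(iter, head, struct demo_item, link) {
		if (iter->id == id) {
			item = iter;
			break;
		}
	}
	return item;
}
```

Testing 'item' after the loop is then meaningful, which testing the
cursor itself would not be.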

-- 
https://patchwork.kernel.org/project/linux-wireless/patch/20220413091723.17596-1-xiam0nd.tong@gmail.com/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



* Re: wl18xx: debugfs: use pm_runtime_resume_and_get() instead of pm_runtime_get_sync()
From: Kalle Valo @ 2022-04-23 12:43 UTC (permalink / raw)
  To: cgel.zte
  Cc: davem, kuba, pabeni, linux-wireless, netdev, linux-kernel,
	Minghao Chi, Zeal Robot
In-Reply-To: <20220413093356.2538192-1-chi.minghao@zte.com.cn>

cgel.zte@gmail.com wrote:

> From: Minghao Chi <chi.minghao@zte.com.cn>
> 
> Using pm_runtime_resume_and_get is more appropriate
> for simplifying code
> 
> Reported-by: Zeal Robot <zealci@zte.com.cn>
> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>

Patch applied to wireless-next.git, thanks.

8e95061b5b9c wl18xx: debugfs: use pm_runtime_resume_and_get() instead of pm_runtime_get_sync()

-- 
https://patchwork.kernel.org/project/linux-wireless/patch/20220413093356.2538192-1-chi.minghao@zte.com.cn/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



* Re: wl12xx: use pm_runtime_resume_and_get() instead of pm_runtime_get_sync()
From: Kalle Valo @ 2022-04-23 12:45 UTC (permalink / raw)
  To: cgel.zte
  Cc: davem, kuba, linux-wireless, netdev, linux-kernel, Minghao Chi,
	Zeal Robot
In-Reply-To: <20220420090214.2588618-1-chi.minghao@zte.com.cn>

cgel.zte@gmail.com wrote:

> From: Minghao Chi <chi.minghao@zte.com.cn>
> 
> Using pm_runtime_resume_and_get() to replace pm_runtime_get_sync and
> pm_runtime_put_noidle. This change is just to simplify the code, no
> actual functional changes.
> 
> Reported-by: Zeal Robot <zealci@zte.com.cn>
> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>

Patch applied to wireless-next.git, thanks.

54d5ecc1710e wl12xx: use pm_runtime_resume_and_get() instead of pm_runtime_get_sync()
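
The simplification described in the commit message can be modelled in
userspace as below. This is a sketch of the usage-count behaviour only,
not the kernel's runtime PM code: struct demo_dev and the demo_*
functions are hypothetical stand-ins for the pm_runtime_*() calls named
in their comments.

```c
/* Hypothetical model of a device's runtime-PM usage counter. */
struct demo_dev {
	int usage_count;
	int resume_error;	/* 0 on success, negative errno-style value on failure */
};

/* Models pm_runtime_get_sync(): the count is raised even if resume fails. */
int demo_get_sync(struct demo_dev *dev)
{
	dev->usage_count++;
	return dev->resume_error;
}

/* Models pm_runtime_put_noidle(): drop the count without idling the device. */
void demo_put_noidle(struct demo_dev *dev)
{
	dev->usage_count--;
}

/* Models pm_runtime_resume_and_get(): the count is dropped again on failure. */
int demo_resume_and_get(struct demo_dev *dev)
{
	int ret = demo_get_sync(dev);

	if (ret < 0) {
		demo_put_noidle(dev);
		return ret;
	}
	return 0;
}
```

A caller of demo_get_sync() has to call demo_put_noidle() itself on
error, while a caller of demo_resume_and_get() can simply return the
error.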

-- 
https://patchwork.kernel.org/project/linux-wireless/patch/20220420090214.2588618-1-chi.minghao@zte.com.cn/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



* Re: wl12xx: scan: use pm_runtime_resume_and_get() instead of pm_runtime_get_sync()
From: Kalle Valo @ 2022-04-23 12:46 UTC (permalink / raw)
  To: cgel.zte
  Cc: davem, kuba, linux-wireless, netdev, linux-kernel, Minghao Chi,
	Zeal Robot
In-Reply-To: <20220420090247.2588680-1-chi.minghao@zte.com.cn>

cgel.zte@gmail.com wrote:

> From: Minghao Chi <chi.minghao@zte.com.cn>
> 
> Using pm_runtime_resume_and_get() to replace pm_runtime_get_sync and
> pm_runtime_put_noidle. This change is just to simplify the code, no
> actual functional changes.
> 
> Reported-by: Zeal Robot <zealci@zte.com.cn>
> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>

Patch applied to wireless-next.git, thanks.

c94e36908467 wl12xx: scan: use pm_runtime_resume_and_get() instead of pm_runtime_get_sync()

-- 
https://patchwork.kernel.org/project/linux-wireless/patch/20220420090247.2588680-1-chi.minghao@zte.com.cn/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



* [PATCH] dt-bindings: can: renesas,rcar-canfd: Document RZ/G2UL support
From: Biju Das @ 2022-04-23 13:07 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Rob Herring, Krzysztof Kozlowski
  Cc: Biju Das, Fabrizio Castro, linux-can, netdev, devicetree,
	Geert Uytterhoeven, Chris Paterson, Biju Das,
	Prabhakar Mahadev Lad, linux-renesas-soc

Add CANFD binding documentation for Renesas R9A07G043 (RZ/G2UL) SoC.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
---
 .../devicetree/bindings/net/can/renesas,rcar-canfd.yaml          | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/can/renesas,rcar-canfd.yaml b/Documentation/devicetree/bindings/net/can/renesas,rcar-canfd.yaml
index f98c53dc1894..96259868df43 100644
--- a/Documentation/devicetree/bindings/net/can/renesas,rcar-canfd.yaml
+++ b/Documentation/devicetree/bindings/net/can/renesas,rcar-canfd.yaml
@@ -32,6 +32,7 @@ properties:
 
       - items:
           - enum:
+              - renesas,r9a07g043-canfd    # RZ/G2UL
               - renesas,r9a07g044-canfd    # RZ/G2{L,LC}
               - renesas,r9a07g054-canfd    # RZ/V2L
           - const: renesas,rzg2l-canfd     # RZ/G2L family
-- 
2.25.1



* [PATCH 2/2] net: dsa: mv88e6xxx: Handle single-chip-address OF property
From: Nathan Rossi @ 2022-04-23 13:14 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Nathan Rossi, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, David S. Miller, Jakub Kicinski, Paolo Abeni
In-Reply-To: <20220423131427.237160-1-nathan@nathanrossi.com>

Handle the parsing and use of single chip addressing when the switch has
the single-chip-address property defined. This allows for specifying the
switch as using single chip addressing even when mdio address 0 is used
by another device on the bus. This is a feature of some switches (e.g.
the MV88E6341/MV88E6141) where the switch shares the bus only responding
to the higher 16 addresses.

Signed-off-by: Nathan Rossi <nathan@nathanrossi.com>
---
 drivers/net/dsa/mv88e6xxx/smi.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/smi.c b/drivers/net/dsa/mv88e6xxx/smi.c
index a990271b74..1eb31c1563 100644
--- a/drivers/net/dsa/mv88e6xxx/smi.c
+++ b/drivers/net/dsa/mv88e6xxx/smi.c
@@ -171,9 +171,12 @@ static const struct mv88e6xxx_bus_ops mv88e6xxx_smi_indirect_ops = {
 int mv88e6xxx_smi_init(struct mv88e6xxx_chip *chip,
 		       struct mii_bus *bus, int sw_addr)
 {
+	struct device_node *np = chip->dev->of_node;
+
 	if (chip->info->dual_chip)
 		chip->smi_ops = &mv88e6xxx_smi_dual_direct_ops;
-	else if (sw_addr == 0)
+	else if (sw_addr == 0 ||
+		 (np && of_property_read_bool(np, "single-chip-address")))
 		chip->smi_ops = &mv88e6xxx_smi_direct_ops;
 	else if (chip->info->multi_chip)
 		chip->smi_ops = &mv88e6xxx_smi_indirect_ops;
---
2.35.2

^ permalink raw reply related
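
The selection order in the hunk above can be sketched outside the kernel as follows. This is a minimal model, not the driver code: `struct chip_model`, `enum smi_ops_kind`, and `select_smi_ops()` are hypothetical names, the boolean field stands in for `of_property_read_bool(np, "single-chip-address")`, and the `SMI_OPS_NONE` fallthrough is an assumption about what happens when no branch matches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's three SMI ops tables. */
enum smi_ops_kind {
	SMI_OPS_NONE,		/* assumption: no branch matched */
	SMI_OPS_DUAL_DIRECT,
	SMI_OPS_DIRECT,
	SMI_OPS_INDIRECT,
};

/* Hypothetical model of the fields the patch consults. */
struct chip_model {
	bool dual_chip;			/* models chip->info->dual_chip */
	bool multi_chip;		/* models chip->info->multi_chip */
	bool single_chip_address;	/* models the OF property lookup */
};

/* Same branch order as the patched mv88e6xxx_smi_init(). */
static enum smi_ops_kind select_smi_ops(const struct chip_model *chip,
					int sw_addr)
{
	if (chip->dual_chip)
		return SMI_OPS_DUAL_DIRECT;
	if (sw_addr == 0 || chip->single_chip_address)
		return SMI_OPS_DIRECT;
	if (chip->multi_chip)
		return SMI_OPS_INDIRECT;
	return SMI_OPS_NONE;
}
```

With the property set, a non-zero switch address still selects direct access, which is the case the patch is meant to enable.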

* [PATCH 1/2] dt-bindings: net: dsa: marvell: Add single-chip-address property
From: Nathan Rossi @ 2022-04-23 13:14 UTC (permalink / raw)
  To: netdev, devicetree, linux-kernel
  Cc: Nathan Rossi, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Rob Herring, Krzysztof Kozlowski

Some Marvell DSA devices can be accessed in a single chip addressing
mode. This is currently configured by setting the address of the switch
to 0. However, switches in this configuration do not respond to address
0, only responding to higher addresses (fixed addresses based on the
switch model) for the individual ports/etc. This is a feature to allow
for other phys to exist on the same mdio bus.

This change defines a 'single-chip-address' property in order to
explicitly define that the chip is accessed in this mode. This allows
for a switch to have an address defined other than 0, so that address
0 can be used for another mdio device.

Signed-off-by: Nathan Rossi <nathan@nathanrossi.com>
---
 Documentation/devicetree/bindings/net/dsa/marvell.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/dsa/marvell.txt b/Documentation/devicetree/bindings/net/dsa/marvell.txt
index 2363b41241..5c7304274c 100644
--- a/Documentation/devicetree/bindings/net/dsa/marvell.txt
+++ b/Documentation/devicetree/bindings/net/dsa/marvell.txt
@@ -46,6 +46,8 @@ Optional properties:
 - mdio?		: Container of PHYs and devices on the external MDIO
 			  bus. The node must contains a compatible string of
 			  "marvell,mv88e6xxx-mdio-external"
+- single-chip-address	: Device is configured to use single chip addressing
+			  mode.

 Example:

---
2.35.2

^ permalink raw reply related

* Re: [PATCH net] sctp: check asoc strreset_chunk in sctp_generate_reconf_event
From: Marcelo Ricardo Leitner @ 2022-04-23 13:18 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, davem, kuba, Neil Horman
In-Reply-To: <3000f8b12920ae81b84dceead6dcc90bb00c0403.1650487961.git.lucien.xin@gmail.com>

On Wed, Apr 20, 2022 at 04:52:41PM -0400, Xin Long wrote:
> A null pointer reference issue can be triggered when the response of a
> stream reconf request arrives after the timer is triggered, such as:
> 
>   send Incoming SSN Reset Request --->
>   CPU0:
>    reconf timer is triggered,
>    go to the handler code before hold sk lock
>                             <--- reply with Outgoing SSN Reset Request
>   CPU1:
>    process Outgoing SSN Reset Request,
>    and set asoc->strreset_chunk to NULL
>   CPU0:
>    continue the handler code, hold sk lock,
>    and try to hold asoc->strreset_chunk, crash!
> 
> In Ying Xu's testing, the call trace is:
> 
>   [ ] BUG: kernel NULL pointer dereference, address: 0000000000000010
>   [ ] RIP: 0010:sctp_chunk_hold+0xe/0x40 [sctp]
>   [ ] Call Trace:
>   [ ]  <IRQ>
>   [ ]  sctp_sf_send_reconf+0x2c/0x100 [sctp]
>   [ ]  sctp_do_sm+0xa4/0x220 [sctp]
>   [ ]  sctp_generate_reconf_event+0xbd/0xe0 [sctp]
>   [ ]  call_timer_fn+0x26/0x130
> 
> This patch is to fix it by returning from the timer handler if asoc
> strreset_chunk is already set to NULL.

Right. The timer callback didn't have a check on whether it was still
needed or not, and per the description above, it would simply try to
handle it twice then.

> 
> Fixes: 7b9438de0cd4 ("sctp: add stream reconf timer")
> Reported-by: Ying Xu <yinxu@redhat.com>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

^ permalink raw reply
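
The check described above can be modelled in plain C. This is only a sketch of the idea (bail out of the timer handler when the chunk is already gone), not the kernel code: `struct chunk_model`, `struct asoc_model`, and `reconf_timer_fired()` are hypothetical names, and the locking that the real handler does is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted chunk. */
struct chunk_model {
	int refcnt;
};

/* Hypothetical stand-in for the association state touched by the timer. */
struct asoc_model {
	struct chunk_model *strreset_chunk;
};

/*
 * Models the timer handler body. Returns true if the retransmit path ran,
 * false if the handler returned early because the response had already
 * been processed and the chunk pointer was cleared.
 */
static bool reconf_timer_fired(struct asoc_model *asoc)
{
	/* In the kernel, the socket lock would be held at this point. */
	if (!asoc->strreset_chunk)
		return false;

	/* Models taking a reference before handing the chunk on. */
	asoc->strreset_chunk->refcnt++;
	return true;
}
```

Without the early return, the reference increment would dereference a NULL pointer in the interleaving shown in the quoted message.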

* [PATCH] net: dsa: mv88e6xxx: Skip cmode writable for mv88e6*41 if unchanged
From: Nathan Rossi @ 2022-04-23 13:20 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Nathan Rossi, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, David S. Miller, Jakub Kicinski, Paolo Abeni

The mv88e6341_port_set_cmode function always calls the set writable
regardless of whether the current cmode is different from the desired
cmode. This is problematic for specific configurations of the mv88e6341
and mv88e6141 (in single chip addressing mode?) where the hidden
registers are not accessible. This causes the set_cmode_writable to
fail, and causes teardown of the switch despite the cmode already being
configured in the correct mode (via external configuration).

This change adds checking of the current cmode compared to the desired
mode and returns if already in the desired mode. This skips the
set_cmode_writable setup if the port is already configured in the
desired mode, avoiding any issues with access of hidden registers.

Signed-off-by: Nathan Rossi <nathan@nathanrossi.com>
---
 drivers/net/dsa/mv88e6xxx/port.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
index 795b312876..f2e9c8cae3 100644
--- a/drivers/net/dsa/mv88e6xxx/port.c
+++ b/drivers/net/dsa/mv88e6xxx/port.c
@@ -713,6 +713,7 @@ int mv88e6341_port_set_cmode(struct mv88e6xxx_chip *chip, int port,
 			     phy_interface_t mode)
 {
 	int err;
+	u8 cmode = chip->ports[port].cmode;

 	if (port != 5)
 		return -EOPNOTSUPP;
@@ -724,6 +725,23 @@ int mv88e6341_port_set_cmode(struct mv88e6xxx_chip *chip, int port,
 	case PHY_INTERFACE_MODE_XAUI:
 	case PHY_INTERFACE_MODE_RXAUI:
 		return -EINVAL;
+
+	/* Check before setting writable. Such that on devices that are already
+	 * correctly configured, no attempt is made to make the cmode writable
+	 * as it may fail.
+	 */
+	case PHY_INTERFACE_MODE_1000BASEX:
+		if (cmode == MV88E6XXX_PORT_STS_CMODE_1000BASEX)
+			return 0;
+		break;
+	case PHY_INTERFACE_MODE_SGMII:
+		if (cmode == MV88E6XXX_PORT_STS_CMODE_SGMII)
+			return 0;
+		break;
+	case PHY_INTERFACE_MODE_2500BASEX:
+		if (cmode == MV88E6XXX_PORT_STS_CMODE_2500BASEX)
+			return 0;
+		break;
 	default:
 		break;
 	}
---
2.35.2

^ permalink raw reply related
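
The early-return test added by the patch above amounts to a small lookup. A minimal sketch, assuming placeholder values: the enum names and numeric constants below are hypothetical and are not the driver's `MV88E6XXX_PORT_STS_CMODE_*` values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the PHY interface modes handled by the patch. */
enum phy_mode_model {
	MODE_1000BASEX,
	MODE_SGMII,
	MODE_2500BASEX,
	MODE_OTHER,
};

/* Placeholder cmode values, not taken from the driver headers. */
enum cmode_model {
	CMODE_1000BASEX = 1,
	CMODE_SGMII = 2,
	CMODE_2500BASEX = 3,
};

/*
 * Returns true when the cached cmode already matches the requested mode,
 * i.e. when the caller can skip making the cmode writable.
 */
static bool cmode_already_set(enum phy_mode_model mode, unsigned char cmode)
{
	switch (mode) {
	case MODE_1000BASEX:
		return cmode == CMODE_1000BASEX;
	case MODE_SGMII:
		return cmode == CMODE_SGMII;
	case MODE_2500BASEX:
		return cmode == CMODE_2500BASEX;
	default:
		return false;
	}
}
```

Any mode outside the three listed cases never short-circuits, matching the `default: break;` in the patch.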

* Re: [PATCH] net: dsa: mv88e6xxx: Skip cmode writable for mv88e6*41 if unchanged
From: Marek Behún @ 2022-04-23 13:25 UTC (permalink / raw)
  To: Nathan Rossi
  Cc: netdev, linux-kernel, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, Vladimir Oltean, David S. Miller,
	Jakub Kicinski, Paolo Abeni
In-Reply-To: <20220423132035.238704-1-nathan@nathanrossi.com>

On Sat, 23 Apr 2022 13:20:35 +0000
Nathan Rossi <nathan@nathanrossi.com> wrote:

> The mv88e6341_port_set_cmode function always calls the set writable
> regardless of whether the current cmode is different from the desired
> cmode. This is problematic for specific configurations of the mv88e6341
> and mv88e6141 (in single chip addressing mode?) where the hidden
> registers are not accessible.

I don't have a problem with skipping setting cmode writable if cmode is
not being changed. But hidden registers should be accessible regardless
of whether you are using single chip addressing mode or not. You need
to find why it isn't working for you, this is a bug.

Marek

^ permalink raw reply

* Re: [PATCH] net: unexport csum_and_copy_{from,to}_user
From: Michael Ellerman @ 2022-04-23 13:42 UTC (permalink / raw)
  To: Christoph Hellwig, akpm
  Cc: x86, linux-alpha, linux-m68k, linuxppc-dev, linux-kernel, netdev
In-Reply-To: <20220421070440.1282704-1-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:
> csum_and_copy_from_user and csum_and_copy_to_user are exported by
> a few architectures, but not actually used in modular code.  Drop
> the exports.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  arch/alpha/lib/csum_partial_copy.c   | 1 -
>  arch/m68k/lib/checksum.c             | 2 --
>  arch/powerpc/lib/checksum_wrappers.c | 2 --
>  arch/x86/lib/csum-wrappers_64.c      | 2 --
>  4 files changed, 7 deletions(-)

Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)

cheers

^ permalink raw reply

* Re: [PATCH net] net: Use this_cpu_inc() to increment net->core_stats
From: Eric Dumazet @ 2022-04-23 13:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sebastian Andrzej Siewior, netdev, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Thomas Gleixner
In-Reply-To: <20220423092439.GY2731@worktop.programming.kicks-ass.net>

On Sat, Apr 23, 2022 at 2:24 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Apr 21, 2022 at 06:51:31PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2022-04-21 09:06:05 [-0700], Eric Dumazet wrote:
> > > On Thu, Apr 21, 2022 at 7:00 AM Sebastian Andrzej Siewior
> > > <bigeasy@linutronix.de> wrote:
> > > >
> > >
> > > >                 for_each_possible_cpu(i) {
> > > >                         core_stats = per_cpu_ptr(p, i);
> > > > -                       storage->rx_dropped += local_read(&core_stats->rx_dropped);
> > > > -                       storage->tx_dropped += local_read(&core_stats->tx_dropped);
> > > > -                       storage->rx_nohandler += local_read(&core_stats->rx_nohandler);
> > > > +                       storage->rx_dropped += core_stats->rx_dropped;
> > > > +                       storage->tx_dropped += core_stats->tx_dropped;
> > > > +                       storage->rx_nohandler += core_stats->rx_nohandler;
> > >
> > > I think that one of the reasons for me to use  local_read() was that
> > > it provided what was needed to avoid future syzbot reports.
> >
> > syzbot report due to a plain read of a per-CPU variable which might be
> > modified?
> >
> > > Perhaps use READ_ONCE() here ?
> > >
> > > Yes, we have many similar folding loops that are  simply assuming
> > > compiler won't do stupid things.
> >
> > I wasn't sure about that and added PeterZ to do some yelling here just
> > in case. And yes, we have other sites doing exactly that. In
> >    Documentation/core-api/this_cpu_ops.rst
> > there is nothing about remote-READ-access (only that there should be no
> > writes (due to parallel this_cpu_inc() on the local CPU)). I know that a
> > 32bit write can be optimized in two 16bit writes in certain cases but a
> > read is a read.
> > PeterZ? :)
>
> Eric is right. READ_ONCE() is 'required' to ensure the compiler doesn't
> split the load and KCSAN knows about these things.

More details can be found in https://lwn.net/Articles/793253/

Thanks !

^ permalink raw reply
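
The folding loop discussed above, with each remote read marked so the compiler performs a single load, can be sketched in userspace C. This is a model under stated assumptions: `read_once_ulong()` is a hypothetical helper that imitates the effect of the kernel's READ_ONCE() with a volatile access, and a plain array stands in for the per-CPU data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU counters. */
struct core_stats_model {
	unsigned long rx_dropped;
	unsigned long tx_dropped;
	unsigned long rx_nohandler;
};

/*
 * Imitates READ_ONCE() for one type: the volatile access tells the
 * compiler to perform exactly one load and not to split or repeat it.
 */
static inline unsigned long read_once_ulong(const unsigned long *p)
{
	return *(const volatile unsigned long *)p;
}

/* Sums the counters of every "CPU" slot into storage. */
static void fold_core_stats(const struct core_stats_model *percpu,
			    size_t ncpus,
			    struct core_stats_model *storage)
{
	for (size_t i = 0; i < ncpus; i++) {
		storage->rx_dropped += read_once_ulong(&percpu[i].rx_dropped);
		storage->tx_dropped += read_once_ulong(&percpu[i].tx_dropped);
		storage->rx_nohandler +=
			read_once_ulong(&percpu[i].rx_nohandler);
	}
}
```

The sketch is single-threaded, so it only shows the shape of the loop; the concurrency argument is the one made in the thread.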

* Re: [PATCH net] virtio_net: fix wrong buf address calculation when using xdp
From: Xuan Zhuo @ 2022-04-23 13:31 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: kuba, davem, Nikolay Aleksandrov, stable, Jason Wang,
	Daniel Borkmann, Michael S. Tsirkin, virtualization, netdev
In-Reply-To: <20220423112612.2292774-1-razor@blackwall.org>

On Sat, 23 Apr 2022 14:26:12 +0300, Nikolay Aleksandrov <razor@blackwall.org> wrote:
> We received a report[1] of kernel crashes when Cilium is used in XDP
> mode with virtio_net after updating to newer kernels. After
> investigating the reason it turned out that when using mergeable bufs
> with an XDP program which adjusts xdp.data or xdp.data_meta page_to_buf()
> calculates the build_skb address wrong because the offset can become less
> than the headroom so it gets the address of the previous page (-X bytes
> depending on how lower offset is):
>  page_to_skb: page addr ffff9eb2923e2000 buf ffff9eb2923e1ffc offset 252 headroom 256
>
> This is a pr_err() I added in the beginning of page_to_skb which clearly
> shows offset that is less than headroom by adding 4 bytes of metadata
> via an xdp prog. The calculations done are:
>  receive_mergeable():
>  headroom = VIRTIO_XDP_HEADROOM; // VIRTIO_XDP_HEADROOM == 256 bytes
>  offset = xdp.data - page_address(xdp_page) -
>           vi->hdr_len - metasize;
>
>  page_to_skb():
>  p = page_address(page) + offset;
>  ...
>  buf = p - headroom;
>
> Now buf goes -4 bytes from the page's starting address as can be seen
> above which is set as skb->head and skb->data by build_skb later. Depending
> on what's done with the skb (when it's freed most often) we get all kinds
> of corruptions and BUG_ON() triggers in mm[2]. The story of the faulty
> commit is interesting because the patch was sent and applied twice (it
> seems the first one got lost during merge back in 5.13 window). The
> first version of the patch that was applied as:
>  commit 7bf64460e3b2 ("virtio-net: get build_skb() buf by data ptr")
> was actually correct because it calculated the page starting address
> without relying on offset or headroom, but then the second version that
> was applied as:
>  commit 8fb7da9e9907 ("virtio_net: get build_skb() buf by data ptr")
> was wrong and added the above calculation.
> An example xdp prog[3] is below.
>
> [1] https://github.com/cilium/cilium/issues/19453
>
> [2] Two of the many traces:
>  [   40.437400] BUG: Bad page state in process swapper/0  pfn:14940
>  [   40.916726] BUG: Bad page state in process systemd-resolve  pfn:053b7
>  [   41.300891] kernel BUG at include/linux/mm.h:720!
>  [   41.301801] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>  [   41.302784] CPU: 1 PID: 1181 Comm: kubelet Kdump: loaded Tainted: G    B   W         5.18.0-rc1+ #37
>  [   41.304458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
>  [   41.306018] RIP: 0010:page_frag_free+0x79/0xe0
>  [   41.306836] Code: 00 00 75 ea 48 8b 07 a9 00 00 01 00 74 e0 48 8b 47 48 48 8d 50 ff a8 01 48 0f 45 fa eb d0 48 c7 c6 18 b8 30 a6 e8 d7 f8 fc ff <0f> 0b 48 8d 78 ff eb bc 48 8b 07 a9 00 00 01 00 74 3a 66 90 0f b6
>  [   41.310235] RSP: 0018:ffffac05c2a6bc78 EFLAGS: 00010292
>  [   41.311201] RAX: 000000000000003e RBX: 0000000000000000 RCX: 0000000000000000
>  [   41.312502] RDX: 0000000000000001 RSI: ffffffffa6423004 RDI: 00000000ffffffff
>  [   41.313794] RBP: ffff993c98823600 R08: 0000000000000000 R09: 00000000ffffdfff
>  [   41.315089] R10: ffffac05c2a6ba68 R11: ffffffffa698ca28 R12: ffff993c98823600
>  [   41.316398] R13: ffff993c86311ebc R14: 0000000000000000 R15: 000000000000005c
>  [   41.317700] FS:  00007fe13fc56740(0000) GS:ffff993cdd900000(0000) knlGS:0000000000000000
>  [   41.319150] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  [   41.320152] CR2: 000000c00008a000 CR3: 0000000014908000 CR4: 0000000000350ee0
>  [   41.321387] Call Trace:
>  [   41.321819]  <TASK>
>  [   41.322193]  skb_release_data+0x13f/0x1c0
>  [   41.322902]  __kfree_skb+0x20/0x30
>  [   41.343870]  tcp_recvmsg_locked+0x671/0x880
>  [   41.363764]  tcp_recvmsg+0x5e/0x1c0
>  [   41.384102]  inet_recvmsg+0x42/0x100
>  [   41.406783]  ? sock_recvmsg+0x1d/0x70
>  [   41.428201]  sock_read_iter+0x84/0xd0
>  [   41.445592]  ? 0xffffffffa3000000
>  [   41.462442]  new_sync_read+0x148/0x160
>  [   41.479314]  ? 0xffffffffa3000000
>  [   41.496937]  vfs_read+0x138/0x190
>  [   41.517198]  ksys_read+0x87/0xc0
>  [   41.535336]  do_syscall_64+0x3b/0x90
>  [   41.551637]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>  [   41.568050] RIP: 0033:0x48765b
>  [   41.583955] Code: e8 4a 35 fe ff eb 88 cc cc cc cc cc cc cc cc e8 fb 7a fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
>  [   41.632818] RSP: 002b:000000c000a2f5b8 EFLAGS: 00000212 ORIG_RAX: 0000000000000000
>  [   41.664588] RAX: ffffffffffffffda RBX: 000000c000062000 RCX: 000000000048765b
>  [   41.681205] RDX: 0000000000005e54 RSI: 000000c000e66000 RDI: 0000000000000016
>  [   41.697164] RBP: 000000c000a2f608 R08: 0000000000000001 R09: 00000000000001b4
>  [   41.713034] R10: 00000000000000b6 R11: 0000000000000212 R12: 00000000000000e9
>  [   41.728755] R13: 0000000000000001 R14: 000000c000a92000 R15: ffffffffffffffff
>  [   41.744254]  </TASK>
>  [   41.758585] Modules linked in: br_netfilter bridge veth netconsole virtio_net
>
>  and
>
>  [   33.524802] BUG: Bad page state in process systemd-network  pfn:11e60
>  [   33.528617] page ffffe05dc0147b00 ffffe05dc04e7a00 ffff8ae9851ec000 (1) len 82 offset 252 metasize 4 hroom 0 hdr_len 12 data ffff8ae9851ec10c data_meta ffff8ae9851ec108 data_end ffff8ae9851ec14e
>  [   33.529764] page:000000003792b5ba refcount:0 mapcount:-512 mapping:0000000000000000 index:0x0 pfn:0x11e60
>  [   33.532463] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
>  [   33.532468] raw: 000fffffc0000000 0000000000000000 dead000000000122 0000000000000000
>  [   33.532470] raw: 0000000000000000 0000000000000000 00000000fffffdff 0000000000000000
>  [   33.532471] page dumped because: nonzero mapcount
>  [   33.532472] Modules linked in: br_netfilter bridge veth netconsole virtio_net
>  [   33.532479] CPU: 0 PID: 791 Comm: systemd-network Kdump: loaded Not tainted 5.18.0-rc1+ #37
>  [   33.532482] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
>  [   33.532484] Call Trace:
>  [   33.532496]  <TASK>
>  [   33.532500]  dump_stack_lvl+0x45/0x5a
>  [   33.532506]  bad_page.cold+0x63/0x94
>  [   33.532510]  free_pcp_prepare+0x290/0x420
>  [   33.532515]  free_unref_page+0x1b/0x100
>  [   33.532518]  skb_release_data+0x13f/0x1c0
>  [   33.532524]  kfree_skb_reason+0x3e/0xc0
>  [   33.532527]  ip6_mc_input+0x23c/0x2b0
>  [   33.532531]  ip6_sublist_rcv_finish+0x83/0x90
>  [   33.532534]  ip6_sublist_rcv+0x22b/0x2b0
>
> [3] XDP program to reproduce(xdp_pass.c):
>  #include <linux/bpf.h>
>  #include <bpf/bpf_helpers.h>
>
>  SEC("xdp_pass")
>  int xdp_pkt_pass(struct xdp_md *ctx)
>  {
>           bpf_xdp_adjust_head(ctx, -(int)32);
>           return XDP_PASS;
>  }
>
>  char _license[] SEC("license") = "GPL";
>
>  compile: clang -O2 -g -Wall -target bpf -c xdp_pass.c -o xdp_pass.o
>  load on virtio_net: ip link set enp1s0 xdpdrv obj xdp_pass.o sec xdp_pass
>
> CC: stable@vger.kernel.org
> CC: Jason Wang <jasowang@redhat.com>
> CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> CC: Daniel Borkmann <daniel@iogearbox.net>
> CC: "Michael S. Tsirkin" <mst@redhat.com>
> CC: virtualization@lists.linux-foundation.org
> Fixes: 8fb7da9e9907 ("virtio_net: get build_skb() buf by data ptr")
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> ---
>  drivers/net/virtio_net.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 87838cbe38cf..0687dd88e97f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -434,9 +434,13 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>  	 * Buffers with headroom use PAGE_SIZE as alloc size, see
>  	 * add_recvbuf_mergeable() + get_mergeable_buf_len()
>  	 */
> -	truesize = headroom ? PAGE_SIZE : truesize;
> +	if (headroom) {
> +		truesize = PAGE_SIZE;
> +		buf = (char *)((unsigned long)p & PAGE_MASK);

The reason for not doing this is that buf and p may not be on the same page, and
buf is probably not page-aligned.

The implementation of virtio-net merge is add_recvbuf_mergeable(), which
allocates a large block of memory at one time, and allocates from it each time.
Although in xdp mode, each allocation is page_size, it does not guarantee that
each allocation is page-aligned.

The problem here is that the value of headroom is wrong. The packet is
laid out like this:

from device    | headroom          | virtio-net hdr | data |
after xdp      | headroom  |  virtio-net hdr | meta | data |

The page_address(page) + offset we pass to page_to_skb() points to the
virtio-net hdr.

So I think it might be better to change it this way.

Thanks.


diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 87838cbe38cf..086ae835ec86 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1012,7 +1012,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
                                head_skb = page_to_skb(vi, rq, xdp_page, offset,
                                                       len, PAGE_SIZE, false,
                                                       metasize,
-                                                      VIRTIO_XDP_HEADROOM);
+                                                      VIRTIO_XDP_HEADROOM - metasize);
                                return head_skb;
                        }
                        break;

^ permalink raw reply related
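
The arithmetic quoted in this thread can be checked with plain integers. The sketch below only replays the numbers from the quoted debug output (offset 252, headroom 256, hdr_len 12, metasize 4); `calc_offset()` and `calc_buf()` are hypothetical helpers that mirror the two quoted expressions, and the result says nothing about which of the two proposed fixes is preferable.

```c
#include <assert.h>
#include <stdint.h>

/* VIRTIO_XDP_HEADROOM as stated in the quoted message. */
#define XDP_HEADROOM_MODEL 256u

/* Mirrors: offset = xdp.data - page_address(xdp_page) - hdr_len - metasize */
static uint64_t calc_offset(uint64_t page, uint64_t data,
			    unsigned int hdr_len, unsigned int metasize)
{
	return data - page - hdr_len - metasize;
}

/* Mirrors: p = page_address(page) + offset; buf = p - headroom */
static uint64_t calc_buf(uint64_t page, uint64_t offset,
			 unsigned int headroom)
{
	return page + offset - headroom;
}
```

With the values from the second trace (page `ffff8ae9851ec000`, data `ffff8ae9851ec10c`), `calc_offset()` gives 252, `calc_buf()` with a headroom of 256 lands 4 bytes before the page, and a headroom of 256 - 4 lands exactly on the page start.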

* Re: [PATCH net-next 03/28] sfc: Copy shared files needed for Siena
From: Jakub Kicinski @ 2022-04-23 13:50 UTC (permalink / raw)
  To: Martin Habets; +Cc: pabeni, davem, netdev, ecree.xilinx
In-Reply-To: <165063946292.27138.5733728538967332821.stgit@palantir17.mph.net>

On Fri, 22 Apr 2022 15:57:43 +0100 Martin Habets wrote:
> From: Martin Habets <martinh@xilinx.com>
> 
> No changes are done, those will be done with subsequent commits.

This ginormous patch does not make it thru the mail systems.
I'm guessing there is a (perfectly reasonable) 1MB limit somewhere.

I think you can also rework the series and combine the pure rename
patches. Having the renames by header file does not substantially 
help review.

Try to stay under the 15 patch limit.

^ permalink raw reply

* Re: [PATCH v0] nfc: nci: add flush_workqueue to prevent uaf
From: Guenter Roeck @ 2022-04-23 13:52 UTC (permalink / raw)
  To: Lin Ma; +Cc: krzk, davem, kuba, pabeni, netdev, linux-kernel, mudongliangabcd
In-Reply-To: <524c4fb6.6e33.1803cf85ae9.Coremail.linma@zju.edu.cn>

On Mon, Apr 18, 2022 at 09:59:10PM +0800, Lin Ma wrote:
> Hello Guenter,
> 
> > I have been wondering about this and the same code further below.
> > What prevents the command timer from firing after the call to
> > flush_workqueue() ?
> > 
> > Thanks,
> > Guenter
> > 
> 
> From my understanding, once flush_workqueue() is executed, the work queued in
> ndev->cmd_wq will be taken care of.
> 
> That is, once the flush_workqueue() is finished, it promises there is no executing or 
> pending nci_cmd_work() ever.
> 
> static void nci_cmd_work(struct work_struct *work)
> {
>     // ...
> 		mod_timer(&ndev->cmd_timer,
> 			  jiffies + msecs_to_jiffies(NCI_CMD_TIMEOUT));
>     // ...
> }
> 
> The command timer can still fire because of the mod_timer() here. That is why the
> del_timer_sync() is necessary after the flush_workqueue().
> 
> One very puzzling part is that you may find that the timer queues the work again
> 
> /* NCI command timer function */
> static void nci_cmd_timer(struct timer_list *t)
> {
>     // ...
> 	queue_work(ndev->cmd_wq, &ndev->cmd_work);
> }
> 
> But I found that this is okay because there are no packets in the ndev->cmd_q buffers, hence
> even if there is a queued nci_cmd_work(), it simply checks the queue and returns.
> 
> That is, the old race picture as below
> 
> > Thread-1                           Thread-2
> >                                  | nci_dev_up()
> >                                  |   nci_open_device()
> >                                  |     __nci_request(nci_reset_req)
> >                                  |       nci_send_cmd
> >                                  |         queue_work(cmd_work)
> > nci_unregister_device()          |
> >   nci_close_device()             | ...
> >     del_timer_sync(cmd_timer)[1] |
> > ...                              | Worker
> > nci_free_device()                | nci_cmd_work()
> >   kfree(ndev)[3]                 |   mod_timer(cmd_timer)[2]
> 
> is impossible now because the patched flush_workqueue() make the race like below
> 
> > Thread-1                           Thread-2
> >                                  | nci_dev_up()
> >                                  |   nci_open_device()
> >                                  |     __nci_request(nci_reset_req)
> >                                  |       nci_send_cmd
> >                                  |         queue_work(cmd_work)
> > nci_unregister_device()          |
> >   nci_close_device()             | ...
> >     flush_workqueue()[patch]     | Worker
> >                                  | nci_cmd_work()
> >                                  |   mod_timer(cmd_timer)[2]
> >     // work over then return
> >     del_timer_sync(cmd_timer)[1] |
> >                                  | Timer
> >                                  | nci_cmd_timer()
> >                                  | 
> >     // timer over then return    |
> > ...                              |
> > nci_free_device()                | 
> >   kfree(ndev)[3]                 | 
> 
> 
> With the above reasoning and the given fact that my POC didn't raise the UAF, I think the
> flush_workqueue() + del_timer_sync() combination is okay to hinder this race.
> 
> Tell me if there is anything wrong.
> 

Thanks a lot for the detailed explanation and analysis.
I agree with your conclusion.

Guenter

^ permalink raw reply

* Re: [PATCH] net: dsa: mv88e6xxx: Skip cmode writable for mv88e6*41 if unchanged
From: Nathan Rossi @ 2022-04-23 13:59 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, linux-kernel, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, Vladimir Oltean, David S. Miller,
	Jakub Kicinski, Paolo Abeni
In-Reply-To: <20220423152523.1f38e2d8@thinkpad>

On Sat, 23 Apr 2022 at 23:25, Marek Behún <kabel@kernel.org> wrote:
>
> On Sat, 23 Apr 2022 13:20:35 +0000
> Nathan Rossi <nathan@nathanrossi.com> wrote:
>
> > The mv88e6341_port_set_cmode function always calls the set writable
> > regardless of whether the current cmode is different from the desired
> > cmode. This is problematic for specific configurations of the mv88e6341
> > and mv88e6141 (in single chip addressing mode?) where the hidden
> > registers are not accessible.
>
> I don't have a problem with skipping setting cmode writable if cmode is
> not being changed. But hidden registers should be accessible regardless
> of whether you are using single chip addressing mode or not. You need
> to find why it isn't working for you, this is a bug.

I did try to debug the hidden register access, unfortunately with the
device I have the hidden registers do not behave correctly. It simply
times out waiting for the busy bit to change. I was not sure the
reason why and suspected it was something specific to the single mode,
and unfortunately the only information I have regarding these
registers is the kernel code itself. Perhaps it is specific to some
other pin configuration or the specific chip revision? If you have any
additional information for these hidden registers it would be very
helpful in debugging. For reference the device is a MV88E6141,
manufactured in 2019 week 47 (in a Netgate SG-3100).

Thanks,
Nathan

>
> Marek

^ permalink raw reply

* [PATCH bpf-next 0/4] bpf: Generate helpers for pinning through bpf object skeleton
From: Yafang Shao @ 2022-04-23 14:00 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh
  Cc: netdev, bpf, Yafang Shao

Currently there are helpers to open/load/attach a BPF object through
the BPF object skeleton. Let's also add helpers for pinning through the
BPF object skeleton. This could simplify BPF userspace code that wants
to pin the progs into bpffs.

After this change, with command 'bpftool gen skeleton XXX.bpf.o', the
helpers for pinning BPF prog will be generated in BPF object skeleton.

The new helpers are named __{pin, unpin}_prog because they only pin
bpf progs. A user who also wants to pin bpf maps can use
LIBBPF_PIN_BY_NAME.

Yafang Shao (4):
  libbpf: Define DEFAULT_BPFFS
  libbpf: Add helpers for pinning bpf prog through bpf object skeleton
  bpftool: Fix incorrect return in generated detach helper
  bpftool: Generate helpers for pinning prog through bpf object skeleton

 tools/bpf/bpftool/gen.c     | 18 ++++++++++-
 tools/lib/bpf/bpf_helpers.h |  2 +-
 tools/lib/bpf/libbpf.c      | 61 ++++++++++++++++++++++++++++++++++++-
 tools/lib/bpf/libbpf.h      | 10 ++++--
 tools/lib/bpf/libbpf.map    |  2 ++
 5 files changed, 88 insertions(+), 5 deletions(-)

-- 
2.17.1

^ permalink raw reply

* [PATCH bpf-next 3/4] bpftool: Fix incorrect return in generated detach helper
From: Yafang Shao @ 2022-04-23 14:00 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh
  Cc: netdev, bpf, Yafang Shao
In-Reply-To: <20220423140058.54414-1-laoar.shao@gmail.com>

bpf_object__detach_skeleton() has no return value, so the generated
detach helper should not return it. This is a purely formal fix.

Fixes: 5dc7a8b21144 ("bpftool, selftests/bpf: Embed object file inside skeleton")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 tools/bpf/bpftool/gen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/gen.c b/tools/bpf/bpftool/gen.c
index 7678af364793..8f76d8d9996c 100644
--- a/tools/bpf/bpftool/gen.c
+++ b/tools/bpf/bpftool/gen.c
@@ -1171,7 +1171,7 @@ static int do_skeleton(int argc, char **argv)
 		static inline void					    \n\
 		%1$s__detach(struct %1$s *obj)				    \n\
 		{							    \n\
-			return bpf_object__detach_skeleton(obj->skeleton);  \n\
+			bpf_object__detach_skeleton(obj->skeleton);	    \n\
 		}							    \n\
 		",
 		obj_name
-- 
2.17.1


^ permalink raw reply related
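
The shape of the generated helper after this fix can be illustrated with a small stand-alone model. All names here (`struct skeleton_model`, `struct obj_model`, `detach_skeleton_model()`, `obj_model__detach()`) are hypothetical; they only mimic a void wrapper around a void callee. ISO C does not allow a `return` statement with an expression in a function whose return type is void, which is why the wrapper calls the callee as a plain statement.

```c
#include <assert.h>

/* Hypothetical stand-in for the skeleton state. */
struct skeleton_model {
	int attached;
};

/* Hypothetical stand-in for the generated object wrapper. */
struct obj_model {
	struct skeleton_model *skeleton;
};

/* Models a detach routine that returns nothing. */
static void detach_skeleton_model(struct skeleton_model *s)
{
	s->attached = 0;
}

/* Void wrapper: call the void callee as a statement, do not return it. */
static inline void obj_model__detach(struct obj_model *obj)
{
	detach_skeleton_model(obj->skeleton);
}
```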
