Netdev List
 help / color / mirror / Atom feed
* Re: [net] 03d56978dd: BUG:Bad_page_map_in_process
From: Joanne Koong @ 2022-08-12 19:01 UTC (permalink / raw)
  To: Yujie Liu
  Cc: 0day robot, LKML, netdev, dccp, lkp, Paolo Abeni, Eric Dumazet,
	Jakub Kicinski, Martin KaFai Lau, David Miller, kernel test robot
In-Reply-To: <CAJnrk1Y4WOHohQFp_+_OUuAQfKS7BewgmKp4V+MF3EMGTxcR=w@mail.gmail.com>

On Tue, Aug 9, 2022 at 9:52 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Fri, Aug 5, 2022 at 12:30 AM Yujie Liu <yujie.liu@intel.com> wrote:
> >
> > Hi Joanne,
> >
> > On 7/28/2022 07:41, Joanne Koong wrote:
> > > On Sun, Jul 24, 2022 at 7:05 AM kernel test robot <oliver.sang@intel.com> wrote:
> > >>
> > >>
> > >>
> > >> Greeting,
> > >>
> > >> FYI, we noticed the following commit (built with gcc-11):
> > >>
> > >> commit: 03d56978dd246147e151916e4dc72af7bc24d5c9 ("[PATCH net-next v3 1/3] net: Add a bhash2 table hashed by port + address")
> > >> url: https://github.com/intel-lab-lkp/linux/commits/Joanne-Koong/Add-a-second-bind-table-hashed-by-port-address/20220723-035903
> > >> base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git 949d6b405e6160ae44baea39192d67b39cb7eeac
> > >> patch link: https://lore.kernel.org/netdev/20220722195406.1304948-2-joannelkoong@gmail.com
> > >>
> > >> in testcase: boot
> > >>
> > >> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> > >>
> > >> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > >>
> > >>
> > >>
> > >> If you fix the issue, kindly add following tag
> > >> Reported-by: kernel test robot <oliver.sang@intel.com>
> > >>
> > >>
> > >> [  103.871133][  T486] BUG: Bad page map in process rsync  pte:ffff92f93b759508 pmd:13fc1e067
> > >> [  103.873143][  T486] addr:00007f9fe52a2000 vm_flags:00000075 anon_vma:0000000000000000 mapping:ffff92f928adcb58 index:1a1
> > >> [  103.875128][  T486] file:libcrypto.so.1.1 fault:filemap_fault mmap:generic_file_mmap read_folio:simple_read_folio
> > >> [  103.877339][  T486] CPU: 0 PID: 486 Comm: rsync Not tainted 5.19.0-rc7-01443-g03d56978dd24 #1
> > >> [  103.879032][  T486] Call Trace:
> > >> [  103.879742][  T486]  <TASK>
> > >> [  103.880329][  T486]  ? simple_write_end+0x140/0x140
> > >> [  103.881338][  T486]  dump_stack_lvl+0x3b/0x53
> > >> [  103.882274][  T486]  ? __filemap_get_folio+0x780/0x780
> > >> [  103.883270][  T486]  print_bad_pte.cold+0x15b/0x1c5
> > >> [  103.884202][  T486]  vm_normal_page+0x65/0x140
> > >> [  103.885062][  T486]  zap_pte_range+0x23b/0x9c0
> > >> [  103.885897][  T486]  unmap_page_range+0x263/0x5c0
> > >> [  103.886846][  T486]  unmap_vmas+0x121/0x200
> > >> [  103.887628][  T486]  exit_mmap+0xb5/0x240
> > >> [  103.888401][  T486]  mmput+0x3b/0x140
> > >> [  103.889134][  T486]  exit_mm+0xff/0x180
> > >> [  103.889877][  T486]  do_exit+0x100/0x400
> > >> [  103.890661][  T486]  do_group_exit+0x3e/0x100
> > >> [  103.891514][  T486]  __x64_sys_exit_group+0x18/0x40
> > >> [  103.892494][  T486]  do_syscall_64+0x5d/0x80
> > >> [  103.893294][  T486]  ? do_user_addr_fault+0x257/0x6c0
> > >> [  103.894238][  T486]  ? lock_release+0x6e/0x100
> > >> [  103.895171][  T486]  ? up_read+0x12/0x40
> > >> [  103.896036][  T486]  ? exc_page_fault+0xb2/0x2c0
> > >> [  103.897021][  T486]  entry_SYSCALL_64_after_hwframe+0x5d/0xc7
> > >> [  103.898243][  T486] RIP: 0033:0x7f9fe5007699
> > >> [  103.899149][  T486] Code: Unable to access opcode bytes at RIP 0x7f9fe500766f.
> > >> [  103.900511][  T486] RSP: 002b:00007fff7e32c3a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> > >> [  103.902027][  T486] RAX: ffffffffffffffda RBX: 00007f9fe50fc610 RCX: 00007f9fe5007699
> > >> [  103.903477][  T486] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
> > >> [  103.904943][  T486] RBP: 0000000000000000 R08: ffffffffffffff80 R09: 0000000000000001
> > >> [  103.906384][  T486] R10: 000000000000000b R11: 0000000000000246 R12: 00007f9fe50fc610
> > >> [  103.907823][  T486] R13: 0000000000000001 R14: 00007f9fe50fcae8 R15: 0000000000000000
> > >> [  103.909290][  T486]  </TASK>
> > >> [  103.910423][  T486] Disabling lock debugging due to kernel taint
> > >> [  107.503093][  T508] BUG: Bad page map in process rsync  pte:ffff92f93b7fe508 pmd:13aa1c067
> > >> [  107.504948][  T508] addr:00007fced9aa2000 vm_flags:00000075 anon_vma:0000000000000000 mapping:ffff92f92891ab58 index:9a
> > >> [  107.507070][  T508] file:libzstd.so.1.4.8 fault:filemap_fault mmap:generic_file_mmap read_folio:simple_read_folio
> > >> [  107.508825][  T508] CPU: 0 PID: 508 Comm: rsync Tainted: G    B             5.19.0-rc7-01443-g03d56978dd24 #1
> > >> [  107.510762][  T508] Call Trace:
> > >> [  107.511458][  T508]  <TASK>
> > >> [  107.512058][  T508]  ? simple_write_end+0x140/0x140
> > >> [  107.513072][  T508]  dump_stack_lvl+0x3b/0x53
> > >> [  107.513990][  T508]  ? __filemap_get_folio+0x780/0x780
> > >> [  107.519166][  T508]  print_bad_pte.cold+0x15b/0x1c5
> > >> [  107.520032][  T508]  vm_normal_page+0x65/0x140
> > >> [  107.520802][  T508]  zap_pte_range+0x23b/0x9c0
> > >> [  107.521548][  T508]  unmap_page_range+0x263/0x5c0
> > >> [  107.522355][  T508]  unmap_vmas+0x121/0x200
> > >> [  107.523247][  T508]  exit_mmap+0xb5/0x240
> > >> [  107.524107][  T508]  mmput+0x3b/0x140
> > >> [  107.524908][  T508]  exit_mm+0xff/0x180
> > >> [  107.525716][  T508]  do_exit+0x100/0x400
> > >> [  107.526613][  T508]  do_group_exit+0x3e/0x100
> > >> [  107.527541][  T508]  __x64_sys_exit_group+0x18/0x40
> > >> [  107.528450][  T508]  do_syscall_64+0x5d/0x80
> > >> [  107.529368][  T508]  ? up_read+0x12/0x40
> > >> [  107.530228][  T508]  ? do_user_addr_fault+0x257/0x6c0
> > >> [  107.531121][  T508]  ? rcu_read_lock_sched_held+0x5/0x40
> > >> [  107.532046][  T508]  ? exc_page_fault+0xb2/0x2c0
> > >> [  107.532843][  T508]  entry_SYSCALL_64_after_hwframe+0x5d/0xc7
> > >> [  107.533866][  T508] RIP: 0033:0x7fced95ff699
> > >> [  107.534781][  T508] Code: Unable to access opcode bytes at RIP 0x7fced95ff66f.
> > >> [  107.536225][  T508] RSP: 002b:00007fff162474c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> > >> [  107.537871][  T508] RAX: ffffffffffffffda RBX: 00007fced96f4610 RCX: 00007fced95ff699
> > >> [  107.539506][  T508] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
> > >> [  107.541126][  T508] RBP: 0000000000000000 R08: ffffffffffffff80 R09: 0000000000000001
> > >> [  107.542743][  T508] R10: 000000000000000b R11: 0000000000000246 R12: 00007fced96f4610
> > >> [  107.544310][  T508] R13: 0000000000000001 R14: 00007fced96f4ae8 R15: 0000000000000000
> > >> [  107.545881][  T508]  </TASK>
> > >>
> > >>
> > >>
> > >> To reproduce:
> > >>
> > >>          # build kernel
> > >>          cd linux
> > >>          cp config-5.19.0-rc7-01443-g03d56978dd24 .config
> > >>          make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
> > >>          make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
> > >>          cd <mod-install-dir>
> > >>          find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
> > >>
> > >>
> > >>          git clone https://github.com/intel/lkp-tests.git
> > >>          cd lkp-tests
> > >>          bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
> > >>
> > >>          # if come across any failure that blocks the test,
> > >>          # please remove ~/.lkp and /lkp dir to run from a clean state.
> > >>
> > > I ran this in a loop ~20 times but I'm not able to repro the crash.
> > > This is a snippet of what I see (and I can also attach or paste the
> > > entire log if that would be helpful):
> > >
> > > I examined more closely the changes between v2 and v3 and I don't see
> > > anything that would lead to this error either (I'm assuming  v2 is
> > > okay because this report wasn't generated for it). Looking at the
> > > stack trace too, I'm not seeing anything that sticks out (eg this
> > > looks like a memory mapping failure and bhash2 didn't modify mapping
> > > or paging code).
> >
> > We chose commit 949d6b405e61 (net: add missing includes and forward
> > declarations under net/) as base, which used to be the head of
> > net-next/master branch then, and apply your v3 patches on top of it.
> > So the test result is a comparison between 949d6b405e61 and v3.
> >
> > Refer to the bug info:
> >
> > [  103.871133][  T486] BUG: Bad page map in process rsync  pte:ffff92f93b759508 pmd:13fc1e067
> >
> > The BUG happens in rsync, and it reminds me that we have some extra
> > steps when running the test in our infrastructure. We will use some
> > commands such as `wget` and `rsync` to transfer the test result to
> > our server, but these steps are not included when reproducing locally.
> >
> > Then I come up with an idea that maybe the kernel can boot successfully,
> > but the v3 patch may have some impacts on the command involving network
> > operations.
> >
> > Could you please help to apply below hack on the latest version of
> > lkp-tests, and retry to see if can reproduce the crash? It is just
> > a meaningless `wget` command to involve network in local test and align
> > with the steps in our testing environment.
>
> I will try to repro this this week. I'll let you know what I find.

I applied the wget change you suggested and was able to reproduce the crash.

This is happening because in the case where there is a connect() call
on address 0 on an unbound socket, the socket gets added to the bind
bucket twice. The first happens in inet_bhash2_update_saddr() and the
second happens when __inet_hash_connect() calls inet_bind_hash(). The
fix is to update the bhash2 table only if the socket is already bound.

I will submit v4 with this fix added. There is already a selftest
("sk_connect_zero_addr") in the 3rd patch that simulates this case but
it doesn't trigger the bad page table entry state when unmapping.

Thanks for reporting.

>
> >
> > diff --git a/lib/upload.sh b/lib/upload.sh
> > index 257b498db..e8801736e 100755
> > --- a/lib/upload.sh
> > +++ b/lib/upload.sh
> > @@ -181,7 +181,8 @@ upload_files()
> >                  fi
> >          else
> >                  # 9pfs, copy directly
> > -               upload_files_copy "$@"
> > +               wget 127.0.0.1
> >                  return
> >          fi
> >   }
> >
> > After applying above hack, I've tried to run 20 times on base and v3 patch
> > respectively. All runs of base are good, but there are 8 crash runs of v3.
> >
> > Reproducing steps:
> >
> >         cd linux
> >         git remote add net-next https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git
> >         git fetch net-next master
> >         git checkout 949d6b405e61 # checkout to base
> >         git am <v3.patch>
> >
> >         cp config-5.19.0-rc7-01443-g03d56978dd24 .config # config file is attached
> >         make ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
> >         mkdir <mod-install-dir>
> >         make ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
> >         cd <mod-install-dir>
> >         find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
> >
> >          git clone https://github.com/intel/lkp-tests.git
> >          cd lkp-tests
> >         # apply the hack mentioned above
> >          bin/lkp qemu -k <bzImage> -m <mod-install-dir>/modules.cgz job-script # job-script is attached in this email
> >
> > --
> > Best Regards,
> > Yujie
> >
> > >
> > > I don't think this bug report is related to the bhash2 changes. But
> > > please let me know if you disagree.
> > >
> > > Thanks,
> > > Joanne
> > >
> > >>
> > >>
> > >> --
> > >> 0-DAY CI Kernel Test Service
> > >> https://01.org/lkp
> > >>
> > >>

^ permalink raw reply

* Re: [PATCH v5 3/5] Bluetooth: Add support for hci devcoredump
From: kernel test robot @ 2022-08-12 18:59 UTC (permalink / raw)
  To: Manish Mandlik, Arend van Spriel, Greg Kroah-Hartman, marcel,
	luiz.dentz
  Cc: kbuild-all, Johannes Berg, Dan Williams, Jason Gunthorpe,
	Signed-off-by : Manish Mandlik, linux-bluetooth, Thomas Gleixner,
	Rafael J . Wysocki, chromeos-bluetooth-upstreaming, Won Chung,
	Abhishek Pandit-Subedi, Eric Dumazet, Jakub Kicinski,
	Johan Hedberg, Paolo Abeni, linux-kernel, netdev
In-Reply-To: <20220810085753.v5.3.Iaf638bb9f885f5880ab1b4e7ae2f73dd53a54661@changeid>

Hi Manish,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bluetooth/master]
[also build test WARNING on bluetooth-next/master driver-core/driver-core-testing linus/master next-20220812]
[cannot apply to v5.19]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Manish-Mandlik/sysfs-Add-attribute-info-for-sys-devices-coredump_disabled/20220811-000313
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth.git master
config: loongarch-randconfig-r022-20220811 (https://download.01.org/0day-ci/archive/20220813/202208130238.wyNvbcE7-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/6fe2192077ebdca91aef91e907f79d9e38960a21
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Manish-Mandlik/sysfs-Add-attribute-info-for-sys-devices-coredump_disabled/20220811-000313
        git checkout 6fe2192077ebdca91aef91e907f79d9e38960a21
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=loongarch SHELL=/bin/bash net/bluetooth/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from net/bluetooth/coredump.c:8:
   net/bluetooth/coredump.c: In function 'hci_devcoredump_rx':
>> include/net/bluetooth/bluetooth.h:255:17: warning: format '%u' expects argument of type 'unsigned int', but argument 4 has type 'size_t' {aka 'long unsigned int'} [-Wformat=]
     255 |         BT_INFO("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__)
         |                 ^~~~~~
   include/net/bluetooth/bluetooth.h:242:41: note: in definition of macro 'BT_INFO'
     242 | #define BT_INFO(fmt, ...)       bt_info(fmt "\n", ##__VA_ARGS__)
         |                                         ^~~
   net/bluetooth/coredump.c:298:25: note: in expansion of macro 'bt_dev_info'
     298 |                         bt_dev_info(hdev,
         |                         ^~~~~~~~~~~
>> include/net/bluetooth/bluetooth.h:255:17: warning: format '%u' expects argument of type 'unsigned int', but argument 4 has type 'size_t' {aka 'long unsigned int'} [-Wformat=]
     255 |         BT_INFO("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__)
         |                 ^~~~~~
   include/net/bluetooth/bluetooth.h:242:41: note: in definition of macro 'BT_INFO'
     242 | #define BT_INFO(fmt, ...)       bt_info(fmt "\n", ##__VA_ARGS__)
         |                                         ^~~
   net/bluetooth/coredump.c:317:25: note: in expansion of macro 'bt_dev_info'
     317 |                         bt_dev_info(hdev,
         |                         ^~~~~~~~~~~
   net/bluetooth/coredump.c: In function 'hci_devcoredump_timeout':
>> include/net/bluetooth/bluetooth.h:255:17: warning: format '%u' expects argument of type 'unsigned int', but argument 4 has type 'size_t' {aka 'long unsigned int'} [-Wformat=]
     255 |         BT_INFO("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__)
         |                 ^~~~~~
   include/net/bluetooth/bluetooth.h:242:41: note: in definition of macro 'BT_INFO'
     242 | #define BT_INFO(fmt, ...)       bt_info(fmt "\n", ##__VA_ARGS__)
         |                                         ^~~
   net/bluetooth/coredump.c:364:9: note: in expansion of macro 'bt_dev_info'
     364 |         bt_dev_info(hdev, "Devcoredump timeout with size %u (expect %u)",
         |         ^~~~~~~~~~~


vim +255 include/net/bluetooth/bluetooth.h

9b392e0e0b6d02 Luiz Augusto von Dentz 2022-03-03  253  
6f558b70fb39fc Loic Poulain           2015-08-30  254  #define bt_dev_info(hdev, fmt, ...)				\
9b392e0e0b6d02 Luiz Augusto von Dentz 2022-03-03 @255  	BT_INFO("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__)
594b31ea7dc610 Frederic Danis         2015-09-23  256  #define bt_dev_warn(hdev, fmt, ...)				\
9b392e0e0b6d02 Luiz Augusto von Dentz 2022-03-03  257  	BT_WARN("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__)
6f558b70fb39fc Loic Poulain           2015-08-30  258  #define bt_dev_err(hdev, fmt, ...)				\
9b392e0e0b6d02 Luiz Augusto von Dentz 2022-03-03  259  	BT_ERR("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__)
6f558b70fb39fc Loic Poulain           2015-08-30  260  #define bt_dev_dbg(hdev, fmt, ...)				\
9b392e0e0b6d02 Luiz Augusto von Dentz 2022-03-03  261  	BT_DBG("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__)
6f558b70fb39fc Loic Poulain           2015-08-30  262  

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply

* Re: [RFC v2] net: introduce OpenVPN Data Channel Offload (ovpn-dco)
From: Stephen Hemminger @ 2022-08-12 18:44 UTC (permalink / raw)
  To: Sergey Ryazanov
  Cc: Antonio Quartulli, netdev, David Miller, Jakub Kicinski,
	open list
In-Reply-To: <CAHNKnsQnHAdxC-XhC9RP-cFp0d-E4YGb+7ie3WymXVL9N-QS6A@mail.gmail.com>

On Fri, 12 Aug 2022 21:34:33 +0300
Sergey Ryazanov <ryazanov.s.a@gmail.com> wrote:

> What is the purpose of creating and destroying interfaces via RTNL,
> but performing all other operations using the dedicated netlink
> protocol?
> 
> RTNL interface usually implemented for some standalone interface
> types, e.g. VLAN, GRE, etc. Here we need a userspace application
> anyway to be able to use the network device to forward traffic, and
> the module implements the dedicated GENL protocol. So why not just
> introduce OVPN_CMD_NEW_IFACE and OVPN_CMD_DEL_IFACE commands to the
> GENL interface? It looks like this will simplify the userspace part by
> using the single GENL interface for any management operations.

RTNL is netlink. The standard way to create network devices should
be available with newlink message as in:

 # ip link add dev myvpn type ovpn <options>

^ permalink raw reply

* Re: [RFC v2] net: introduce OpenVPN Data Channel Offload (ovpn-dco)
From: Sergey Ryazanov @ 2022-08-12 18:34 UTC (permalink / raw)
  To: Antonio Quartulli; +Cc: netdev, David Miller, Jakub Kicinski, open list
In-Reply-To: <20220803153152.11189-1-antonio@openvpn.net>

Hello Antonio,

first of all, I would like to say thank you for taking care of this.
Such useful software as OpenVPN deserves a kernel accelerated rate,
and probably a lot of users, including me, are waiting for such
feature.

On Wed, Aug 3, 2022 at 6:37 PM Antonio Quartulli <antonio@openvpn.net> wrote:
> OpenVPN is a userspace software existing since around 2005 that allows
> users to create secure tunnels.
>
> So far OpenVPN has implemented all operations in userspace, which
> implies several back and forth between kernel and user land in order to
> process packets (encapsulate/decapsulate, encrypt/decrypt, rerouting..).
>
> With ovpn-dco, we intend to move the fast path (data channel) entirely
> in kernel space and thus improve user measured throughput over the
> tunnel.
>
> ovpn-dco is implemented as a simple virtual network device driver, that
> can be manipulated by means of the standard RTNL APIs. A device of kind
> 'ovpn-dco' allows only IPv4/6 traffic and can be of type:
> * P2P (peer-to-peer): any packet sent over the interface will be
>   encapsulated and transmitted to the other side (typical OpenVPN
>   client behaviour);
> * P2MP (point-to-multipoint): packets sent over the interface are
>   transmitted to peers based on existing routes (typical OpenVPN
>   server behaviour).
>
> After the interface has been created, OpenVPN in userspace can
> configure it using a new Netlink API. Specifically it is possible
> to manage peers, configure per-peer keys and exchange packets with
> userspace.
>
> The OpenVPN control channel is multiplexed over the same transport
> socket by means of OP codes. Anything that is not DATA_V2 (OpenVPN
> OP code for data traffic) is sent to userspace and handled there.
> This way the ovpn-dco codebase is kept as compact as possible while
> focusing on handling data traffic only.
>
> Any OpenVPN control feature (like cipher negotiation, TLS handshake,
> rekeying, etc.) is still fully handled by the userspace process.
>
> When userspace establishes a new connection with a peer, it first
> performs the handshake and then passes the socket to ovpn-dco, which
> takes ownership. From this moment on ovpn-dco will handle data traffic
> for the new peer. When control packets are received on the link, they
> are forwarded to userspace via Netlink.
>
> (this approach is somewhat inspired by hostapd+mac80211)
>
> Some events (like peer deletion) are sent to a Netlink multicast group.
>
> Although it wasn't easy to convince the community, ovpn-dco implements
> only a limited number of the data-channel features supported by the
> userspace program.
>
> Each feature that made it to ovpn-dco was attentively vetted to
> avoid carrying too much legacy along with us (and to give a clear cut to
> old and probalby-not-so-useful features).
>
> Notably, only encryption using AEAD ciphers (specifically
> ChaCha20Poly1305 and AES-GCM) was implemented. Supporting any other
> cipher out there was not deemed useful.
>
> As explained above, in case of P2MP mode, OpenVPN will use the main system
> routing table to decide which packet goes to which peer. This implies
> that no routing table was re-implemented in ovpn-dco.
>
> This kernel module can be enabled by selecting the CONFIG_OVPN_DCO entry
> in the networking drivers section.
>
> Cc: David Miller <davem@davemloft.net>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
> ---
>
> Changes from v1:
> * use net/netdev print helpers when possible
> * properly set min/max_mtu
> * get rid of ndo_change_mtu
> * don't set version in ethtool output
> * ensure can be compiled also when no IPv6 support exists
>
> ---
>  MAINTAINERS                        |    8 +
>  drivers/net/Kconfig                |   19 +
>  drivers/net/Makefile               |    1 +
>  drivers/net/ovpn-dco/Makefile      |   21 +
>  drivers/net/ovpn-dco/addr.h        |   41 +
>  drivers/net/ovpn-dco/bind.c        |   62 ++
>  drivers/net/ovpn-dco/bind.h        |   67 ++
>  drivers/net/ovpn-dco/crypto.c      |  154 ++++
>  drivers/net/ovpn-dco/crypto.h      |  144 ++++
>  drivers/net/ovpn-dco/crypto_aead.c |  367 +++++++++
>  drivers/net/ovpn-dco/crypto_aead.h |   27 +
>  drivers/net/ovpn-dco/main.c        |  271 +++++++
>  drivers/net/ovpn-dco/main.h        |   32 +
>  drivers/net/ovpn-dco/netlink.c     | 1143 ++++++++++++++++++++++++++++
>  drivers/net/ovpn-dco/netlink.h     |   22 +
>  drivers/net/ovpn-dco/ovpn.c        |  600 +++++++++++++++
>  drivers/net/ovpn-dco/ovpn.h        |   43 ++
>  drivers/net/ovpn-dco/ovpnstruct.h  |   59 ++
>  drivers/net/ovpn-dco/peer.c        |  906 ++++++++++++++++++++++
>  drivers/net/ovpn-dco/peer.h        |  168 ++++
>  drivers/net/ovpn-dco/pktid.c       |  127 ++++
>  drivers/net/ovpn-dco/pktid.h       |  116 +++
>  drivers/net/ovpn-dco/proto.h       |  101 +++
>  drivers/net/ovpn-dco/rcu.h         |   21 +
>  drivers/net/ovpn-dco/skb.h         |   54 ++
>  drivers/net/ovpn-dco/sock.c        |  134 ++++
>  drivers/net/ovpn-dco/sock.h        |   54 ++
>  drivers/net/ovpn-dco/stats.c       |   20 +
>  drivers/net/ovpn-dco/stats.h       |   67 ++
>  drivers/net/ovpn-dco/tcp.c         |  326 ++++++++
>  drivers/net/ovpn-dco/tcp.h         |   38 +
>  drivers/net/ovpn-dco/udp.c         |  343 +++++++++
>  drivers/net/ovpn-dco/udp.h         |   25 +
>  include/net/netlink.h              |    1 +
>  include/uapi/linux/ovpn_dco.h      |  265 +++++++
>  include/uapi/linux/udp.h           |    1 +
>  36 files changed, 5848 insertions(+)

It is very hard to review almost 6 thousand lines of code at once.
Some issues may be overlooked. I would like to ask you to do the
following submission as a series of patches. It is easier to review
the code separated into patches by logically completed functional
blocks. E.g., module skeleton, management API framework, crypto
framework, minimal data handling utils, UDP transport support, TCP
transport support, other supplementary features like ethtools API,
statistics, etc. This was just an example, you better know internal
code dependencies and a best way to introduce the functionality.

> diff --git a/MAINTAINERS b/MAINTAINERS
> index 1920d82db83e..7cb16007dd5c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -15103,6 +15103,14 @@ T:     git git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git
>  F:     Documentation/filesystems/overlayfs.rst
>  F:     fs/overlayfs/
>
> +OVPN-DCO NETWORK DRIVER
> +M:     Antonio Quartulli <antonio@openvpn.net>
> +L:     openvpn-devel@lists.sourceforge.net (moderated for non-subscribers)
> +L:     netdev@vger.kernel.org
> +S:     Maintained
> +F:     drivers/net/ovpn-dco/
> +F:     include/uapi/linux/ovpn_dco.h
> +
>  P54 WIRELESS DRIVER
>  M:     Christian Lamparter <chunkeey@googlemail.com>
>  L:     linux-wireless@vger.kernel.org
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index 94c889802566..349866bd4448 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -116,6 +116,25 @@ config WIREGUARD_DEBUG
>
>           Say N here unless you know what you're doing.
>
> +config OVPN_DCO
> +       tristate "OpenVPN data channel offload"

Just curious, why do you need this "DCO" suffix? It is not some
commonly recognized abbreviation. Why not just call this module
"OpenVPN"? Or are you planning to move some other components into the
kernel and reserve some naming space?

> +       depends on NET && INET
> +       select NET_UDP_TUNNEL
> +       select DST_CACHE
> +       select CRYPTO
> +       select CRYPTO_AES
> +       select CRYPTO_GCM
> +       select CRYPTO_CHACHA20POLY1305
> +       help
> +         This module enhances the performance of an OpenVPN connection by
> +         allowing the user to offload the data channel processing to
> +         kernelspace.
> +         Connection handshake, parameters negotiation and other non-data
> +         related mechanisms are still performed in userspace.
> +
> +         The OpenVPN userspace software at version 2.6 or higher is required
> +         to use this functionality.
> +
>  config EQUALIZER
>         tristate "EQL (serial line load balancing) support"
>         help

[skipped]

> +/**
> + * ovpn_num_queues - define number of queues to allocate per device
> + *
> + * The value returned by this function is used to decide how many RX and TX
> + * queues to allocate when creating the netdev object
> + *
> + * Return the number of queues to allocate
> + */
> +static unsigned int ovpn_num_queues(void)
> +{
> +       return num_online_cpus();
> +}
> +
> +static struct rtnl_link_ops ovpn_link_ops __read_mostly = {
> +       .kind                   = DRV_NAME,
> +       .priv_size              = sizeof(struct ovpn_struct),
> +       .setup                  = ovpn_setup,
> +       .policy                 = ovpn_policy,
> +       .maxtype                = IFLA_OVPN_MAX,
> +       .newlink                = ovpn_newlink,
> +       .dellink                = ovpn_dellink,

What is the purpose of creating and destroying interfaces via RTNL,
but performing all other operations using the dedicated netlink
protocol?

RTNL interface usually implemented for some standalone interface
types, e.g. VLAN, GRE, etc. Here we need a userspace application
anyway to be able to use the network device to forward traffic, and
the module implements the dedicated GENL protocol. So why not just
introduce OVPN_CMD_NEW_IFACE and OVPN_CMD_DEL_IFACE commands to the
GENL interface? It looks like this will simplify the userspace part by
using the single GENL interface for any management operations.

> +       .get_num_tx_queues      = ovpn_num_queues,
> +       .get_num_rx_queues      = ovpn_num_queues,

What is the benefit of requesting multiple queues if the xmit callback
places all packets from those kernel queues into the single internal
queue anyway?

> +};
> +
> +static int __init ovpn_init(void)
> +{
> +       int err = 0;
> +
> +       pr_info("%s %s -- %s\n", DRV_DESCRIPTION, DRV_VERSION, DRV_COPYRIGHT);

Is this log line really necessary for the regular module usage?

> +
> +       /* init RTNL link ops */
> +       err = rtnl_link_register(&ovpn_link_ops);
> +       if (err) {
> +               pr_err("ovpn: can't register RTNL link ops\n");
> +               goto err;
> +       }
> +
> +       err = ovpn_netlink_register();
> +       if (err) {
> +               pr_err("ovpn: can't register netlink family\n");
> +               goto err_rtnl_unregister;
> +       }
> +
> +       return 0;
> +
> +err_rtnl_unregister:
> +       rtnl_link_unregister(&ovpn_link_ops);
> +err:
> +       pr_err("ovpn: initialization failed, error status=%d\n", err);
> +       return err;
> +}

[skipped]

> +static const struct genl_ops ovpn_netlink_ops[] = {
> +       {
> +               .cmd = OVPN_CMD_NEW_PEER,
> +               .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,

AFAIK, the "don't validate strict" flag is for compatibility with old
users of earlier existing subsystems. For new GENL families, it is
better to avoid using this flag and strictly implement the netlink
protocol.

> +               .flags = GENL_ADMIN_PERM,
> +               .doit = ovpn_netlink_new_peer,
> +       },
>
>         ...
>
> +};

[skipped]

> +static int ovpn_transport_to_userspace(struct ovpn_struct *ovpn, const struct ovpn_peer *peer,
> +                                      struct sk_buff *skb)
> +{
> +       int ret;
> +
> +       ret = skb_linearize(skb);
> +       if (ret < 0)
> +               return ret;
> +
> +       ret = ovpn_netlink_send_packet(ovpn, peer, skb->data, skb->len);
> +       if (ret < 0)
> +               return ret;

Another interesting decision. Why are you transporting the control
messages via Netlink? Why not just pass them to userspace via an
already existing TCP/UDP socket, like the LT2P module do, for example?
Such design usually requires less changes to the userspace application
since it is still able to process control messages as earlier by
reading them from the socket.

> +       consume_skb(skb);
> +       return 0;
> +}

[skipped]

> +/* Net device start xmit
> + */
> +netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev)
> +{
> +       struct ovpn_struct *ovpn = netdev_priv(dev);
> +       struct sk_buff *segments, *tmp, *curr, *next;
> +       struct sk_buff_head skb_list;
> +       __be16 proto;
> +       int ret;
> +
> +       /* reset netfilter state */
> +       nf_reset_ct(skb);
> +
> +       /* verify IP header size in network packet */
> +       proto = ovpn_ip_check_protocol(skb);
> +       if (unlikely(!proto || skb->protocol != proto)) {
> +               net_dbg_ratelimited("%s: dropping malformed payload packet\n",
> +                                   dev->name);
> +               goto drop;
> +       }
> +
> +       if (skb_is_gso(skb)) {
> +               segments = skb_gso_segment(skb, 0);
> +               if (IS_ERR(segments)) {
> +                       ret = PTR_ERR(segments);
> +                       net_dbg_ratelimited("%s: cannot segment packet: %d\n", dev->name, ret);
> +                       goto drop;
> +               }
> +
> +               consume_skb(skb);
> +               skb = segments;
> +       }
> +
> +       /* from this moment on, "skb" might be a list */
> +
> +       __skb_queue_head_init(&skb_list);
> +       skb_list_walk_safe(skb, curr, next) {
> +               skb_mark_not_on_list(curr);
> +
> +               tmp = skb_share_check(curr, GFP_ATOMIC);
> +               if (unlikely(!tmp)) {
> +                       kfree_skb_list(next);
> +                       net_dbg_ratelimited("%s: skb_share_check failed\n", dev->name);
> +                       goto drop_list;
> +               }
> +
> +               __skb_queue_tail(&skb_list, tmp);
> +       }
> +       skb_list.prev->next = NULL;
> +
> +       ovpn_queue_skb(ovpn, skb_list.next, NULL);
> +
> +       return NETDEV_TX_OK;
> +
> +drop_list:
> +       skb_queue_walk_safe(&skb_list, curr, next)
> +               kfree_skb(curr);
> +drop:
> +       skb_tx_error(skb);
> +       kfree_skb_list(skb);
> +       return NET_XMIT_DROP;
> +}

[skipped]

> diff --git a/include/uapi/linux/ovpn_dco.h b/include/uapi/linux/ovpn_dco.h
> new file mode 100644
> index 000000000000..6afee8b3fedd
> --- /dev/null
> +++ b/include/uapi/linux/ovpn_dco.h
> @@ -0,0 +1,265 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + *  OpenVPN data channel accelerator
> + *
> + *  Copyright (C) 2019-2022 OpenVPN, Inc.
> + *
> + *  Author:    James Yonan <james@openvpn.net>
> + *             Antonio Quartulli <antonio@openvpn.net>
> + */
> +
> +#ifndef _UAPI_LINUX_OVPN_DCO_H_
> +#define _UAPI_LINUX_OVPN_DCO_H_
> +
> +#define OVPN_NL_NAME "ovpn-dco"
> +
> +#define OVPN_NL_MULTICAST_GROUP_PEERS "peers"
> +
> +/**
> + * enum ovpn_nl_commands - supported netlink commands
> + */
> +enum ovpn_nl_commands {
> +       /**
> +        * @OVPN_CMD_UNSPEC: unspecified command to catch errors
> +        */
> +       OVPN_CMD_UNSPEC = 0,
> +
> +       /**
> +        * @OVPN_CMD_NEW_PEER: Configure peer with its crypto keys
> +        */
> +       OVPN_CMD_NEW_PEER,
> +
> +       /**
> +        * @OVPN_CMD_SET_PEER: Tweak parameters for an existing peer
> +        */
> +       OVPN_CMD_SET_PEER,
> +
> +       /**
> +        * @OVPN_CMD_DEL_PEER: Remove peer from internal table
> +        */
> +       OVPN_CMD_DEL_PEER,
> +
> +       OVPN_CMD_NEW_KEY,
> +
> +       OVPN_CMD_SWAP_KEYS,
> +
> +       OVPN_CMD_DEL_KEY,
> +
> +       /**
> +        * @OVPN_CMD_REGISTER_PACKET: Register for specific packet types to be
> +        * forwarded to userspace
> +        */
> +       OVPN_CMD_REGISTER_PACKET,
> +
> +       /**
> +        * @OVPN_CMD_PACKET: Send a packet from userspace to kernelspace. Also
> +        * used to send to userspace packets for which a process had registered
> +        * with OVPN_CMD_REGISTER_PACKET
> +        */
> +       OVPN_CMD_PACKET,
> +
> +       /**
> +        * @OVPN_CMD_GET_PEER: Retrieve the status of a peer or all peers
> +        */
> +       OVPN_CMD_GET_PEER,
> +};
> +
> +enum ovpn_cipher_alg {
> +       /**
> +        * @OVPN_CIPHER_ALG_NONE: No encryption - reserved for debugging only
> +        */
> +       OVPN_CIPHER_ALG_NONE = 0,
> +       /**
> +        * @OVPN_CIPHER_ALG_AES_GCM: AES-GCM AEAD cipher with any allowed key size
> +        */
> +       OVPN_CIPHER_ALG_AES_GCM,
> +       /**
> +        * @OVPN_CIPHER_ALG_CHACHA20_POLY1305: ChaCha20Poly1305 AEAD cipher
> +        */
> +       OVPN_CIPHER_ALG_CHACHA20_POLY1305,
> +};
> +
> +enum ovpn_del_peer_reason {
> +       __OVPN_DEL_PEER_REASON_FIRST,
> +       OVPN_DEL_PEER_REASON_TEARDOWN = __OVPN_DEL_PEER_REASON_FIRST,
> +       OVPN_DEL_PEER_REASON_USERSPACE,
> +       OVPN_DEL_PEER_REASON_EXPIRED,
> +       OVPN_DEL_PEER_REASON_TRANSPORT_ERROR,
> +       __OVPN_DEL_PEER_REASON_AFTER_LAST
> +};
> +
> +enum ovpn_key_slot {
> +       __OVPN_KEY_SLOT_FIRST,
> +       OVPN_KEY_SLOT_PRIMARY = __OVPN_KEY_SLOT_FIRST,
> +       OVPN_KEY_SLOT_SECONDARY,
> +       __OVPN_KEY_SLOT_AFTER_LAST,
> +};
> +
> +enum ovpn_netlink_attrs {
> +       OVPN_ATTR_UNSPEC = 0,
> +       OVPN_ATTR_IFINDEX,
> +       OVPN_ATTR_NEW_PEER,
> +       OVPN_ATTR_SET_PEER,
> +       OVPN_ATTR_DEL_PEER,

What is the purpose of introducing separate attributes for each
NEW/SET/GET/DEL operation? Why not just use a single OVPN_ATTR_PEER
attribute?

BTW, generic netlink for some time allows you to have a dedicated set
of attributes (and corresponding policies) for each message. So, if
you have different object types (e.g. peers, interfaces, keys) you can
avoid creating a common set of attributes to cover them all at once,
but just create several attribute sets, one set per each object type
with corresponding policies (see policy field of the genl_ops struct).

> +       OVPN_ATTR_NEW_KEY,
> +       OVPN_ATTR_SWAP_KEYS,
> +       OVPN_ATTR_DEL_KEY,
> +       OVPN_ATTR_PACKET,
> +       OVPN_ATTR_GET_PEER,
> +
> +       __OVPN_ATTR_AFTER_LAST,
> +       OVPN_ATTR_MAX = __OVPN_ATTR_AFTER_LAST - 1,
> +};
> +
> +enum ovpn_netlink_key_dir_attrs {
> +       OVPN_KEY_DIR_ATTR_UNSPEC = 0,
> +       OVPN_KEY_DIR_ATTR_CIPHER_KEY,
> +       OVPN_KEY_DIR_ATTR_NONCE_TAIL,
> +
> +       __OVPN_KEY_DIR_ATTR_AFTER_LAST,
> +       OVPN_KEY_DIR_ATTR_MAX = __OVPN_KEY_DIR_ATTR_AFTER_LAST - 1,
> +};
> +
> +enum ovpn_netlink_new_key_attrs {
> +       OVPN_NEW_KEY_ATTR_UNSPEC = 0,
> +       OVPN_NEW_KEY_ATTR_PEER_ID,
> +       OVPN_NEW_KEY_ATTR_KEY_SLOT,
> +       OVPN_NEW_KEY_ATTR_KEY_ID,
> +       OVPN_NEW_KEY_ATTR_CIPHER_ALG,
> +       OVPN_NEW_KEY_ATTR_ENCRYPT_KEY,
> +       OVPN_NEW_KEY_ATTR_DECRYPT_KEY,
> +
> +       __OVPN_NEW_KEY_ATTR_AFTER_LAST,
> +       OVPN_NEW_KEY_ATTR_MAX = __OVPN_NEW_KEY_ATTR_AFTER_LAST - 1,
> +};
> +
> +enum ovpn_netlink_del_key_attrs {
> +       OVPN_DEL_KEY_ATTR_UNSPEC = 0,
> +       OVPN_DEL_KEY_ATTR_PEER_ID,
> +       OVPN_DEL_KEY_ATTR_KEY_SLOT,
> +
> +       __OVPN_DEL_KEY_ATTR_AFTER_LAST,
> +       OVPN_DEL_KEY_ATTR_MAX = __OVPN_DEL_KEY_ATTR_AFTER_LAST - 1,
> +};
> +
> +enum ovpn_netlink_swap_keys_attrs {
> +       OVPN_SWAP_KEYS_ATTR_UNSPEC = 0,
> +       OVPN_SWAP_KEYS_ATTR_PEER_ID,
> +
> +       __OVPN_SWAP_KEYS_ATTR_AFTER_LAST,
> +       OVPN_SWAP_KEYS_ATTR_MAX = __OVPN_SWAP_KEYS_ATTR_AFTER_LAST - 1,
> +
> +};
> +
> +enum ovpn_netlink_new_peer_attrs {
> +       OVPN_NEW_PEER_ATTR_UNSPEC = 0,
> +       OVPN_NEW_PEER_ATTR_PEER_ID,
> +       OVPN_NEW_PEER_ATTR_SOCKADDR_REMOTE,
> +       OVPN_NEW_PEER_ATTR_SOCKET,
> +       OVPN_NEW_PEER_ATTR_IPV4,
> +       OVPN_NEW_PEER_ATTR_IPV6,
> +       OVPN_NEW_PEER_ATTR_LOCAL_IP,
> +
> +       __OVPN_NEW_PEER_ATTR_AFTER_LAST,
> +       OVPN_NEW_PEER_ATTR_MAX = __OVPN_NEW_PEER_ATTR_AFTER_LAST - 1,
> +};
> +
> +enum ovpn_netlink_set_peer_attrs {
> +       OVPN_SET_PEER_ATTR_UNSPEC = 0,
> +       OVPN_SET_PEER_ATTR_PEER_ID,
> +       OVPN_SET_PEER_ATTR_KEEPALIVE_INTERVAL,
> +       OVPN_SET_PEER_ATTR_KEEPALIVE_TIMEOUT,
> +
> +       __OVPN_SET_PEER_ATTR_AFTER_LAST,
> +       OVPN_SET_PEER_ATTR_MAX = __OVPN_SET_PEER_ATTR_AFTER_LAST - 1,
> +};
> +
> +enum ovpn_netlink_del_peer_attrs {
> +       OVPN_DEL_PEER_ATTR_UNSPEC = 0,
> +       OVPN_DEL_PEER_ATTR_REASON,
> +       OVPN_DEL_PEER_ATTR_PEER_ID,
> +
> +       __OVPN_DEL_PEER_ATTR_AFTER_LAST,
> +       OVPN_DEL_PEER_ATTR_MAX = __OVPN_DEL_PEER_ATTR_AFTER_LAST - 1,
> +};
> +
> +enum ovpn_netlink_get_peer_attrs {
> +       OVPN_GET_PEER_ATTR_UNSPEC = 0,
> +       OVPN_GET_PEER_ATTR_PEER_ID,
> +
> +       __OVPN_GET_PEER_ATTR_AFTER_LAST,
> +       OVPN_GET_PEER_ATTR_MAX = __OVPN_GET_PEER_ATTR_AFTER_LAST - 1,
> +};

What is the reason to create a separate set of attributes per each
operation? In my experience, it is easier to use a common set of
attributes for all operations on the same object type. At least it is
easier to manage one enum instead of four. And you are always sure
that attributes with the same semantics (e.g. remote IP) have the same
id in any GET/SET message.

> +enum ovpn_netlink_get_peer_response_attrs {
> +       OVPN_GET_PEER_RESP_ATTR_UNSPEC = 0,
> +       OVPN_GET_PEER_RESP_ATTR_PEER_ID,
> +       OVPN_GET_PEER_RESP_ATTR_SOCKADDR_REMOTE,
> +       OVPN_GET_PEER_RESP_ATTR_IPV4,
> +       OVPN_GET_PEER_RESP_ATTR_IPV6,
> +       OVPN_GET_PEER_RESP_ATTR_LOCAL_IP,
> +       OVPN_GET_PEER_RESP_ATTR_LOCAL_PORT,
> +       OVPN_GET_PEER_RESP_ATTR_KEEPALIVE_INTERVAL,
> +       OVPN_GET_PEER_RESP_ATTR_KEEPALIVE_TIMEOUT,
> +       OVPN_GET_PEER_RESP_ATTR_RX_BYTES,
> +       OVPN_GET_PEER_RESP_ATTR_TX_BYTES,
> +       OVPN_GET_PEER_RESP_ATTR_RX_PACKETS,
> +       OVPN_GET_PEER_RESP_ATTR_TX_PACKETS,
> +
> +       __OVPN_GET_PEER_RESP_ATTR_AFTER_LAST,
> +       OVPN_GET_PEER_RESP_ATTR_MAX = __OVPN_GET_PEER_RESP_ATTR_AFTER_LAST - 1,
> +};
> +
> +enum ovpn_netlink_peer_stats_attrs {
> +       OVPN_PEER_STATS_ATTR_UNSPEC = 0,
> +       OVPN_PEER_STATS_BYTES,
> +       OVPN_PEER_STATS_PACKETS,
> +
> +       __OVPN_PEER_STATS_ATTR_AFTER_LAST,
> +       OVPN_PEER_STATS_ATTR_MAX = __OVPN_PEER_STATS_ATTR_AFTER_LAST - 1,
> +};
> +
> +enum ovpn_netlink_peer_attrs {
> +       OVPN_PEER_ATTR_UNSPEC = 0,
> +       OVPN_PEER_ATTR_PEER_ID,
> +       OVPN_PEER_ATTR_SOCKADDR_REMOTE,
> +       OVPN_PEER_ATTR_IPV4,
> +       OVPN_PEER_ATTR_IPV6,
> +       OVPN_PEER_ATTR_LOCAL_IP,
> +       OVPN_PEER_ATTR_KEEPALIVE_INTERVAL,
> +       OVPN_PEER_ATTR_KEEPALIVE_TIMEOUT,
> +       OVPN_PEER_ATTR_ENCRYPT_KEY,
> +       OVPN_PEER_ATTR_DECRYPT_KEY,
> +       OVPN_PEER_ATTR_RX_STATS,
> +       OVPN_PEER_ATTR_TX_STATS,
> +
> +       __OVPN_PEER_ATTR_AFTER_LAST,
> +       OVPN_PEER_ATTR_MAX = __OVPN_PEER_ATTR_AFTER_LAST - 1,
> +};
> +
> +enum ovpn_netlink_packet_attrs {
> +       OVPN_PACKET_ATTR_UNSPEC = 0,
> +       OVPN_PACKET_ATTR_PACKET,
> +       OVPN_PACKET_ATTR_PEER_ID,
> +
> +       __OVPN_PACKET_ATTR_AFTER_LAST,
> +       OVPN_PACKET_ATTR_MAX = __OVPN_PACKET_ATTR_AFTER_LAST - 1,
> +};
> +
> +enum ovpn_ifla_attrs {
> +       IFLA_OVPN_UNSPEC = 0,
> +       IFLA_OVPN_MODE,
> +
> +       __IFLA_OVPN_AFTER_LAST,
> +       IFLA_OVPN_MAX = __IFLA_OVPN_AFTER_LAST - 1,
> +};
> +
> +enum ovpn_mode {
> +       __OVPN_MODE_FIRST = 0,
> +       OVPN_MODE_P2P = __OVPN_MODE_FIRST,
> +       OVPN_MODE_MP,
> +
> +       __OVPN_MODE_AFTER_LAST,
> +};
> +
> +#endif /* _UAPI_LINUX_OVPN_DCO_H_ */

--
BR,
Sergey

^ permalink raw reply

* Re: [PATCH v2] net: moxa: pass pdev instead of ndev to DMA functions
From: Andrew Lunn @ 2022-08-12 17:56 UTC (permalink / raw)
  To: Sergei Antonov
  Cc: netdev, Jakub Kicinski, Pavel Skripkin, David S . Miller,
	Paolo Abeni, Florian Fainelli
In-Reply-To: <20220812171339.2271788-1-saproj@gmail.com>

On Fri, Aug 12, 2022 at 08:13:39PM +0300, Sergei Antonov wrote:
> dma_map_single() calls fail in moxart_mac_setup_desc_ring() and
> moxart_mac_start_xmit() which leads to an incessant output of this:
> 
> [   16.043925] moxart-ethernet 92000000.mac eth0: DMA mapping error
> [   16.050957] moxart-ethernet 92000000.mac eth0: DMA mapping error
> [   16.058229] moxart-ethernet 92000000.mac eth0: DMA mapping error
> 
> Passing pdev to DMA is a common approach among net drivers.
> 
> Changes in v2:
> - Reject an approach to inherit DMA masks from the platform device.
> 
> Signed-off-by: Sergei Antonov <saproj@gmail.com>
> Suggested-by: Andrew Lunn <andrew@lunn.ch>
> CC: Jakub Kicinski <kuba@kernel.org>
> CC: Pavel Skripkin <paskripkin@gmail.com>
> CC: David S. Miller <davem@davemloft.net>
> CC: Paolo Abeni <pabeni@redhat.com>
> CC: Florian Fainelli <f.fainelli@gmail.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH bpf-next 13/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup
From: Roman Gushchin @ 2022-08-12 17:40 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, jolsa, Johannes Weiner, Michal Hocko,
	Shakeel Butt, Muchun Song, Andrew Morton, netdev, bpf, Linux MM
In-Reply-To: <CALOAHbAj7BymBV7KhzxLfMPue8666V+24TOfqG0XTE4euWyR4Q@mail.gmail.com>

On Fri, Aug 12, 2022 at 08:35:19AM +0800, Yafang Shao wrote:
> On Fri, Aug 12, 2022 at 12:16 AM Roman Gushchin
> <roman.gushchin@linux.dev> wrote:
> >
> > On Wed, Aug 10, 2022 at 03:18:38PM +0000, Yafang Shao wrote:
> > > Introduce new helper get_obj_cgroup_from_cgroup() to get obj_cgroup from
> > > a specific cgroup.
> > >
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > ---
> > >  include/linux/memcontrol.h |  1 +
> > >  mm/memcontrol.c            | 41 +++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 42 insertions(+)
> > >
> > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > > index 2f0a611..901a921 100644
> > > --- a/include/linux/memcontrol.h
> > > +++ b/include/linux/memcontrol.h
> > > @@ -1713,6 +1713,7 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
> > >  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order);
> > >  void __memcg_kmem_uncharge_page(struct page *page, int order);
> > >
> > > +struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp);
> > >  struct obj_cgroup *get_obj_cgroup_from_current(void);
> > >  struct obj_cgroup *get_obj_cgroup_from_page(struct page *page);
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 618c366..762cffa 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -2908,6 +2908,47 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
> > >       return objcg;
> > >  }
> > >
> > > +static struct obj_cgroup *get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
> > > +{
> > > +     struct obj_cgroup *objcg;
> > > +
> > > +     if (memcg_kmem_bypass())
> > > +             return NULL;
> > > +
> > > +     rcu_read_lock();
> > > +     objcg = __get_obj_cgroup_from_memcg(memcg);
> > > +     rcu_read_unlock();
> > > +     return objcg;
> >
> > This code doesn't make sense to me. What does rcu read lock protect here?
> 
> To protect rcu_dereference(memcg->objcg);.
> Doesn't it need the read rcu lock ?

No, it's not how rcu works. Please, take a look at the docs here:
https://docs.kernel.org/RCU/whatisRCU.html#whatisrcu .
In particular, it describes this specific case very well.

In 2 words, you don't protect the rcu_dereference() call, you protect the pointer
you get, cause it's valid only inside the rcu read section. After rcu_read_unlock()
it might point at a random data, because the protected object can be already freed.

Thanks!

^ permalink raw reply

* [PATCH net 4/4] iavf: Fix deadlock in initialization
From: Tony Nguyen @ 2022-08-12 17:23 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet
  Cc: Ivan Vecera, netdev, anthony.l.nguyen, sassmann, Konrad Jankowski
In-Reply-To: <20220812172309.853230-1-anthony.l.nguyen@intel.com>

From: Ivan Vecera <ivecera@redhat.com>

Fix deadlock that occurs when iavf interface is a part of failover
configuration.

1. Mutex crit_lock is taken at the beginning of iavf_watchdog_task()
2. Function iavf_init_config_adapter() is called when adapter
   state is __IAVF_INIT_CONFIG_ADAPTER
3. iavf_init_config_adapter() calls register_netdevice() that emits
   NETDEV_REGISTER event
4. Notifier function failover_event() then calls
   net_failover_slave_register() that calls dev_open()
5. dev_open() calls iavf_open() that tries to take crit_lock in
   end-less loop

Stack trace:
...
[  790.251876]  usleep_range_state+0x5b/0x80
[  790.252547]  iavf_open+0x37/0x1d0 [iavf]
[  790.253139]  __dev_open+0xcd/0x160
[  790.253699]  dev_open+0x47/0x90
[  790.254323]  net_failover_slave_register+0x122/0x220 [net_failover]
[  790.255213]  failover_slave_register.part.7+0xd2/0x180 [failover]
[  790.256050]  failover_event+0x122/0x1ab [failover]
[  790.256821]  notifier_call_chain+0x47/0x70
[  790.257510]  register_netdevice+0x20f/0x550
[  790.258263]  iavf_watchdog_task+0x7c8/0xea0 [iavf]
[  790.259009]  process_one_work+0x1a7/0x360
[  790.259705]  worker_thread+0x30/0x390

To fix the situation we should check the current adapter state after
first unsuccessful mutex_trylock() and return with -EBUSY if it is
__IAVF_INIT_CONFIG_ADAPTER.

Fixes: 226d528512cf ("iavf: fix locking of critical sections")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 95d4348e7579..f39440ad5c50 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -4088,8 +4088,17 @@ static int iavf_open(struct net_device *netdev)
 		return -EIO;
 	}
 
-	while (!mutex_trylock(&adapter->crit_lock))
+	while (!mutex_trylock(&adapter->crit_lock)) {
+		/* If we are in __IAVF_INIT_CONFIG_ADAPTER state the crit_lock
+		 * is already taken and iavf_open is called from an upper
+		 * device's notifier reacting on NETDEV_REGISTER event.
+		 * We have to leave here to avoid dead lock.
+		 */
+		if (adapter->state == __IAVF_INIT_CONFIG_ADAPTER)
+			return -EBUSY;
+
 		usleep_range(500, 1000);
+	}
 
 	if (adapter->state != __IAVF_DOWN) {
 		err = -EBUSY;
-- 
2.35.1


^ permalink raw reply related

* [PATCH net 2/4] iavf: Fix NULL pointer dereference in iavf_get_link_ksettings
From: Tony Nguyen @ 2022-08-12 17:23 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet
  Cc: Przemyslaw Patynowski, netdev, anthony.l.nguyen,
	Jedrzej Jagielski, Marek Szlosek
In-Reply-To: <20220812172309.853230-1-anthony.l.nguyen@intel.com>

From: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>

Fix possible NULL pointer dereference, due to freeing of adapter->vf_res
in iavf_init_get_resources. Previous commit introduced a regression,
where receiving IAVF_ERR_ADMIN_QUEUE_NO_WORK from iavf_get_vf_config
would free adapter->vf_res. However, netdev is still registered, so
ethtool_ops can be called. Calling iavf_get_link_ksettings with no vf_res,
will result with:
[ 9385.242676] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 9385.242683] #PF: supervisor read access in kernel mode
[ 9385.242686] #PF: error_code(0x0000) - not-present page
[ 9385.242690] PGD 0 P4D 0
[ 9385.242696] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
[ 9385.242701] CPU: 6 PID: 3217 Comm: pmdalinux Kdump: loaded Tainted: G S          E     5.18.0-04958-ga54ce3703613-dirty #1
[ 9385.242708] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.11.0 11/02/2019
[ 9385.242710] RIP: 0010:iavf_get_link_ksettings+0x29/0xd0 [iavf]
[ 9385.242745] Code: 00 0f 1f 44 00 00 b8 01 ef ff ff 48 c7 46 30 00 00 00 00 48 c7 46 38 00 00 00 00 c6 46 0b 00 66 89 46 08 48 8b 87 68 0e 00 00 <f6> 40 08 80 75 50 8b 87 5c 0e 00 00 83 f8 08 74 7a 76 1d 83 f8 20
[ 9385.242749] RSP: 0018:ffffc0560ec7fbd0 EFLAGS: 00010246
[ 9385.242755] RAX: 0000000000000000 RBX: ffffc0560ec7fc08 RCX: 0000000000000000
[ 9385.242759] RDX: ffffffffc0ad4550 RSI: ffffc0560ec7fc08 RDI: ffffa0fc66674000
[ 9385.242762] RBP: 00007ffd1fb2bf50 R08: b6a2d54b892363ee R09: ffffa101dc14fb00
[ 9385.242765] R10: 0000000000000000 R11: 0000000000000004 R12: ffffa0fc66674000
[ 9385.242768] R13: 0000000000000000 R14: ffffa0fc66674000 R15: 00000000ffffffa1
[ 9385.242771] FS:  00007f93711a2980(0000) GS:ffffa0fad72c0000(0000) knlGS:0000000000000000
[ 9385.242775] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9385.242778] CR2: 0000000000000008 CR3: 0000000a8e61c003 CR4: 00000000003706e0
[ 9385.242781] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9385.242784] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9385.242787] Call Trace:
[ 9385.242791]  <TASK>
[ 9385.242793]  ethtool_get_settings+0x71/0x1a0
[ 9385.242814]  __dev_ethtool+0x426/0x2f40
[ 9385.242823]  ? slab_post_alloc_hook+0x4f/0x280
[ 9385.242836]  ? kmem_cache_alloc_trace+0x15d/0x2f0
[ 9385.242841]  ? dev_ethtool+0x59/0x170
[ 9385.242848]  dev_ethtool+0xa7/0x170
[ 9385.242856]  dev_ioctl+0xc3/0x520
[ 9385.242866]  sock_do_ioctl+0xa0/0xe0
[ 9385.242877]  sock_ioctl+0x22f/0x320
[ 9385.242885]  __x64_sys_ioctl+0x84/0xc0
[ 9385.242896]  do_syscall_64+0x3a/0x80
[ 9385.242904]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 9385.242918] RIP: 0033:0x7f93702396db
[ 9385.242923] Code: 73 01 c3 48 8b 0d ad 57 38 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d 57 38 00 f7 d8 64 89 01 48
[ 9385.242927] RSP: 002b:00007ffd1fb2bf18 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 9385.242932] RAX: ffffffffffffffda RBX: 000055671b1d2fe0 RCX: 00007f93702396db
[ 9385.242935] RDX: 00007ffd1fb2bf20 RSI: 0000000000008946 RDI: 0000000000000007
[ 9385.242937] RBP: 00007ffd1fb2bf20 R08: 0000000000000003 R09: 0030763066307330
[ 9385.242940] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd1fb2bf80
[ 9385.242942] R13: 0000000000000007 R14: 0000556719f6de90 R15: 00007ffd1fb2c1b0
[ 9385.242948]  </TASK>
[ 9385.242949] Modules linked in: iavf(E) xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables rfkill nfnetlink vfat fat irdma ib_uverbs ib_core intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support ice irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl i40e pcspkr intel_cstate joydev mei_me intel_uncore mxm_wmi mei ipmi_ssif lpc_ich ipmi_si acpi_power_meter xfs libcrc32c mgag200 i2c_algo_bit drm_shmem_helper drm_kms_helper sd_mod t10_pi crc64_rocksoft crc64 syscopyarea sg sysfillrect sysimgblt fb_sys_fops drm ixgbe ahci libahci libata crc32c_intel mdio dca wmi dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse
[ 9385.243065]  [last unloaded: iavf]

Dereference happens in if (ADV_LINK_SUPPORT(adapter)) statement

Fixes: 209f2f9c7181 ("iavf: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2 negotiation")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
This was initially sent [1] and has had the -EAGAIN removed. It was an
extra safety check, where the adjusted goto (alone) has been tested to
resolve the issue.

[1] https://lore.kernel.org/netdev/20220712182636.53a957cc@kernel.org/

 drivers/net/ethernet/intel/iavf/iavf_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 45d097a164ad..6aa3eff0da2c 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2367,7 +2367,7 @@ static void iavf_init_get_resources(struct iavf_adapter *adapter)
 	err = iavf_get_vf_config(adapter);
 	if (err == -EALREADY) {
 		err = iavf_send_vf_config_msg(adapter);
-		goto err_alloc;
+		goto err;
 	} else if (err == -EINVAL) {
 		/* We only get -EINVAL if the device is in a very bad
 		 * state or if we've been disabled for previous bad
-- 
2.35.1


^ permalink raw reply related

* [PATCH net 3/4] iavf: Fix reset error handling
From: Tony Nguyen @ 2022-08-12 17:23 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet
  Cc: Przemyslaw Patynowski, netdev, anthony.l.nguyen,
	Jedrzej Jagielski, Marek Szlosek
In-Reply-To: <20220812172309.853230-1-anthony.l.nguyen@intel.com>

From: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>

Do not call iavf_close in iavf_reset_task error handling. Doing so can
lead to double call of napi_disable, which can lead to deadlock there.
Removing VF would lead to iavf_remove task being stuck, because it
requires crit_lock, which is held by iavf_close.
Call iavf_disable_vf if reset fail, so that driver will clean up
remaining invalid resources.
During rapid VF resets, HW can fail to setup VF mailbox. Wrong
error handling can lead to iavf_remove being stuck with:
[ 5218.999087] iavf 0000:82:01.0: Failed to init adminq: -53
...
[ 5267.189211] INFO: task repro.sh:11219 blocked for more than 30 seconds.
[ 5267.189520]       Tainted: G S          E     5.18.0-04958-ga54ce3703613-dirty #1
[ 5267.189764] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5267.190062] task:repro.sh        state:D stack:    0 pid:11219 ppid:  8162 flags:0x00000000
[ 5267.190347] Call Trace:
[ 5267.190647]  <TASK>
[ 5267.190927]  __schedule+0x460/0x9f0
[ 5267.191264]  schedule+0x44/0xb0
[ 5267.191563]  schedule_preempt_disabled+0x14/0x20
[ 5267.191890]  __mutex_lock.isra.12+0x6e3/0xac0
[ 5267.192237]  ? iavf_remove+0xf9/0x6c0 [iavf]
[ 5267.192565]  iavf_remove+0x12a/0x6c0 [iavf]
[ 5267.192911]  ? _raw_spin_unlock_irqrestore+0x1e/0x40
[ 5267.193285]  pci_device_remove+0x36/0xb0
[ 5267.193619]  device_release_driver_internal+0xc1/0x150
[ 5267.193974]  pci_stop_bus_device+0x69/0x90
[ 5267.194361]  pci_stop_and_remove_bus_device+0xe/0x20
[ 5267.194735]  pci_iov_remove_virtfn+0xba/0x120
[ 5267.195130]  sriov_disable+0x2f/0xe0
[ 5267.195506]  ice_free_vfs+0x7d/0x2f0 [ice]
[ 5267.196056]  ? pci_get_device+0x4f/0x70
[ 5267.196496]  ice_sriov_configure+0x78/0x1a0 [ice]
[ 5267.196995]  sriov_numvfs_store+0xfe/0x140
[ 5267.197466]  kernfs_fop_write_iter+0x12e/0x1c0
[ 5267.197918]  new_sync_write+0x10c/0x190
[ 5267.198404]  vfs_write+0x24e/0x2d0
[ 5267.198886]  ksys_write+0x5c/0xd0
[ 5267.199367]  do_syscall_64+0x3a/0x80
[ 5267.199827]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 5267.200317] RIP: 0033:0x7f5b381205c8
[ 5267.200814] RSP: 002b:00007fff8c7e8c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 5267.201981] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5b381205c8
[ 5267.202620] RDX: 0000000000000002 RSI: 00005569420ee900 RDI: 0000000000000001
[ 5267.203426] RBP: 00005569420ee900 R08: 000000000000000a R09: 00007f5b38180820
[ 5267.204327] R10: 000000000000000a R11: 0000000000000246 R12: 00007f5b383c06e0
[ 5267.205193] R13: 0000000000000002 R14: 00007f5b383bb880 R15: 0000000000000002
[ 5267.206041]  </TASK>
[ 5267.206970] Kernel panic - not syncing: hung_task: blocked tasks
[ 5267.207809] CPU: 48 PID: 551 Comm: khungtaskd Kdump: loaded Tainted: G S          E     5.18.0-04958-ga54ce3703613-dirty #1
[ 5267.208726] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.11.0 11/02/2019
[ 5267.209623] Call Trace:
[ 5267.210569]  <TASK>
[ 5267.211480]  dump_stack_lvl+0x33/0x42
[ 5267.212472]  panic+0x107/0x294
[ 5267.213467]  watchdog.cold.8+0xc/0xbb
[ 5267.214413]  ? proc_dohung_task_timeout_secs+0x30/0x30
[ 5267.215511]  kthread+0xf4/0x120
[ 5267.216459]  ? kthread_complete_and_exit+0x20/0x20
[ 5267.217505]  ret_from_fork+0x22/0x30
[ 5267.218459]  </TASK>

Fixes: f0db78928783 ("i40evf: use netdev variable in reset task")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 6aa3eff0da2c..95d4348e7579 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -3086,12 +3086,15 @@ static void iavf_reset_task(struct work_struct *work)
 
 	return;
 reset_err:
+	if (running) {
+		set_bit(__IAVF_VSI_DOWN, adapter->vsi.state);
+		iavf_free_traffic_irqs(adapter);
+	}
+	iavf_disable_vf(adapter);
+
 	mutex_unlock(&adapter->client_lock);
 	mutex_unlock(&adapter->crit_lock);
-	if (running)
-		iavf_change_state(adapter, __IAVF_RUNNING);
 	dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n");
-	iavf_close(netdev);
 }
 
 /**
-- 
2.35.1


^ permalink raw reply related

* [PATCH net 1/4] iavf: Fix adminq error handling
From: Tony Nguyen @ 2022-08-12 17:23 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet
  Cc: Przemyslaw Patynowski, netdev, anthony.l.nguyen,
	Jedrzej Jagielski, Marek Szlosek
In-Reply-To: <20220812172309.853230-1-anthony.l.nguyen@intel.com>

From: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>

iavf_alloc_asq_bufs/iavf_alloc_arq_bufs allocates with dma_alloc_coherent
memory for VF mailbox.
Free DMA regions for both ASQ and ARQ in case error happens during
configuration of ASQ/ARQ registers.
Without this change it is possible to see when unloading interface:
74626.583369: dma_debug_device_change: device driver has pending DMA allocations while released from device [count=32]
One of leaked entries details: [device address=0x0000000b27ff9000] [size=4096 bytes] [mapped with DMA_BIDIRECTIONAL] [mapped as coherent]

Fixes: d358aa9a7a2d ("i40evf: init code and hardware support")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_adminq.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_adminq.c b/drivers/net/ethernet/intel/iavf/iavf_adminq.c
index cd4e6a22d0f9..9ffbd24d83cb 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_adminq.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_adminq.c
@@ -324,6 +324,7 @@ static enum iavf_status iavf_config_arq_regs(struct iavf_hw *hw)
 static enum iavf_status iavf_init_asq(struct iavf_hw *hw)
 {
 	enum iavf_status ret_code = 0;
+	int i;
 
 	if (hw->aq.asq.count > 0) {
 		/* queue already initialized */
@@ -354,12 +355,17 @@ static enum iavf_status iavf_init_asq(struct iavf_hw *hw)
 	/* initialize base registers */
 	ret_code = iavf_config_asq_regs(hw);
 	if (ret_code)
-		goto init_adminq_free_rings;
+		goto init_free_asq_bufs;
 
 	/* success! */
 	hw->aq.asq.count = hw->aq.num_asq_entries;
 	goto init_adminq_exit;
 
+init_free_asq_bufs:
+	for (i = 0; i < hw->aq.num_asq_entries; i++)
+		iavf_free_dma_mem(hw, &hw->aq.asq.r.asq_bi[i]);
+	iavf_free_virt_mem(hw, &hw->aq.asq.dma_head);
+
 init_adminq_free_rings:
 	iavf_free_adminq_asq(hw);
 
@@ -383,6 +389,7 @@ static enum iavf_status iavf_init_asq(struct iavf_hw *hw)
 static enum iavf_status iavf_init_arq(struct iavf_hw *hw)
 {
 	enum iavf_status ret_code = 0;
+	int i;
 
 	if (hw->aq.arq.count > 0) {
 		/* queue already initialized */
@@ -413,12 +420,16 @@ static enum iavf_status iavf_init_arq(struct iavf_hw *hw)
 	/* initialize base registers */
 	ret_code = iavf_config_arq_regs(hw);
 	if (ret_code)
-		goto init_adminq_free_rings;
+		goto init_free_arq_bufs;
 
 	/* success! */
 	hw->aq.arq.count = hw->aq.num_arq_entries;
 	goto init_adminq_exit;
 
+init_free_arq_bufs:
+	for (i = 0; i < hw->aq.num_arq_entries; i++)
+		iavf_free_dma_mem(hw, &hw->aq.arq.r.arq_bi[i]);
+	iavf_free_virt_mem(hw, &hw->aq.arq.dma_head);
 init_adminq_free_rings:
 	iavf_free_adminq_arq(hw);
 
-- 
2.35.1


^ permalink raw reply related

* [PATCH net 0/4][pull request] Intel Wired LAN Driver Updates 2022-08-12 (iavf)
From: Tony Nguyen @ 2022-08-12 17:23 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet; +Cc: Tony Nguyen, netdev

This series contains updates to iavf driver only.

Przemyslaw frees memory for admin queues in initialization error paths,
prevents freeing of vf_res which is causing null pointer dereference,
and adjusts calls in error path of reset to avoid iavf_close() which
could cause deadlock.

Ivan Vecera avoids deadlock that can occur when driver if part of
failover.

The following are changes since commit 40b4ac880e21d917da7f3752332fa57564a4c202:
  net: lan966x: fix checking for return value of platform_get_irq_byname()
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 40GbE

Ivan Vecera (1):
  iavf: Fix deadlock in initialization

Przemyslaw Patynowski (3):
  iavf: Fix adminq error handling
  iavf: Fix NULL pointer dereference in iavf_get_link_ksettings
  iavf: Fix reset error handling

 drivers/net/ethernet/intel/iavf/iavf_adminq.c | 15 +++++++++++--
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 22 ++++++++++++++-----
 2 files changed, 30 insertions(+), 7 deletions(-)

-- 
2.35.1


^ permalink raw reply

* [PATCH v2] net: moxa: pass pdev instead of ndev to DMA functions
From: Sergei Antonov @ 2022-08-12 17:13 UTC (permalink / raw)
  To: netdev
  Cc: Sergei Antonov, Andrew Lunn, Jakub Kicinski, Pavel Skripkin,
	David S . Miller, Paolo Abeni, Florian Fainelli

dma_map_single() calls fail in moxart_mac_setup_desc_ring() and
moxart_mac_start_xmit() which leads to an incessant output of this:

[   16.043925] moxart-ethernet 92000000.mac eth0: DMA mapping error
[   16.050957] moxart-ethernet 92000000.mac eth0: DMA mapping error
[   16.058229] moxart-ethernet 92000000.mac eth0: DMA mapping error

Passing pdev to DMA is a common approach among net drivers.

Changes in v2:
- Reject an approach to inherit DMA masks from the platform device.

Signed-off-by: Sergei Antonov <saproj@gmail.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Pavel Skripkin <paskripkin@gmail.com>
CC: David S. Miller <davem@davemloft.net>
CC: Paolo Abeni <pabeni@redhat.com>
CC: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/moxa/moxart_ether.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/moxa/moxart_ether.c b/drivers/net/ethernet/moxa/moxart_ether.c
index a3214a762e4b..f11f1cb92025 100644
--- a/drivers/net/ethernet/moxa/moxart_ether.c
+++ b/drivers/net/ethernet/moxa/moxart_ether.c
@@ -77,7 +77,7 @@ static void moxart_mac_free_memory(struct net_device *ndev)
 	int i;
 
 	for (i = 0; i < RX_DESC_NUM; i++)
-		dma_unmap_single(&ndev->dev, priv->rx_mapping[i],
+		dma_unmap_single(&priv->pdev->dev, priv->rx_mapping[i],
 				 priv->rx_buf_size, DMA_FROM_DEVICE);
 
 	if (priv->tx_desc_base)
@@ -147,11 +147,11 @@ static void moxart_mac_setup_desc_ring(struct net_device *ndev)
 		       desc + RX_REG_OFFSET_DESC1);
 
 		priv->rx_buf[i] = priv->rx_buf_base + priv->rx_buf_size * i;
-		priv->rx_mapping[i] = dma_map_single(&ndev->dev,
+		priv->rx_mapping[i] = dma_map_single(&priv->pdev->dev,
 						     priv->rx_buf[i],
 						     priv->rx_buf_size,
 						     DMA_FROM_DEVICE);
-		if (dma_mapping_error(&ndev->dev, priv->rx_mapping[i]))
+		if (dma_mapping_error(&priv->pdev->dev, priv->rx_mapping[i]))
 			netdev_err(ndev, "DMA mapping error\n");
 
 		moxart_desc_write(priv->rx_mapping[i],
@@ -240,7 +240,7 @@ static int moxart_rx_poll(struct napi_struct *napi, int budget)
 		if (len > RX_BUF_SIZE)
 			len = RX_BUF_SIZE;
 
-		dma_sync_single_for_cpu(&ndev->dev,
+		dma_sync_single_for_cpu(&priv->pdev->dev,
 					priv->rx_mapping[rx_head],
 					priv->rx_buf_size, DMA_FROM_DEVICE);
 		skb = netdev_alloc_skb_ip_align(ndev, len);
@@ -294,7 +294,7 @@ static void moxart_tx_finished(struct net_device *ndev)
 	unsigned int tx_tail = priv->tx_tail;
 
 	while (tx_tail != tx_head) {
-		dma_unmap_single(&ndev->dev, priv->tx_mapping[tx_tail],
+		dma_unmap_single(&priv->pdev->dev, priv->tx_mapping[tx_tail],
 				 priv->tx_len[tx_tail], DMA_TO_DEVICE);
 
 		ndev->stats.tx_packets++;
@@ -358,9 +358,9 @@ static netdev_tx_t moxart_mac_start_xmit(struct sk_buff *skb,
 
 	len = skb->len > TX_BUF_SIZE ? TX_BUF_SIZE : skb->len;
 
-	priv->tx_mapping[tx_head] = dma_map_single(&ndev->dev, skb->data,
+	priv->tx_mapping[tx_head] = dma_map_single(&priv->pdev->dev, skb->data,
 						   len, DMA_TO_DEVICE);
-	if (dma_mapping_error(&ndev->dev, priv->tx_mapping[tx_head])) {
+	if (dma_mapping_error(&priv->pdev->dev, priv->tx_mapping[tx_head])) {
 		netdev_err(ndev, "DMA mapping error\n");
 		goto out_unlock;
 	}
@@ -379,7 +379,7 @@ static netdev_tx_t moxart_mac_start_xmit(struct sk_buff *skb,
 		len = ETH_ZLEN;
 	}
 
-	dma_sync_single_for_device(&ndev->dev, priv->tx_mapping[tx_head],
+	dma_sync_single_for_device(&priv->pdev->dev, priv->tx_mapping[tx_head],
 				   priv->tx_buf_size, DMA_TO_DEVICE);
 
 	txdes1 = TX_DESC1_LTS | TX_DESC1_FTS | (len & TX_DESC1_BUF_SIZE_MASK);
@@ -493,7 +493,7 @@ static int moxart_mac_probe(struct platform_device *pdev)
 	priv->tx_buf_size = TX_BUF_SIZE;
 	priv->rx_buf_size = RX_BUF_SIZE;
 
-	priv->tx_desc_base = dma_alloc_coherent(&pdev->dev, TX_REG_DESC_SIZE *
+	priv->tx_desc_base = dma_alloc_coherent(p_dev, TX_REG_DESC_SIZE *
 						TX_DESC_NUM, &priv->tx_base,
 						GFP_DMA | GFP_KERNEL);
 	if (!priv->tx_desc_base) {
@@ -501,7 +501,7 @@ static int moxart_mac_probe(struct platform_device *pdev)
 		goto init_fail;
 	}
 
-	priv->rx_desc_base = dma_alloc_coherent(&pdev->dev, RX_REG_DESC_SIZE *
+	priv->rx_desc_base = dma_alloc_coherent(p_dev, RX_REG_DESC_SIZE *
 						RX_DESC_NUM, &priv->rx_base,
 						GFP_DMA | GFP_KERNEL);
 	if (!priv->rx_desc_base) {
-- 
2.32.0


^ permalink raw reply related

* [syzbot] memory leak in netlink_policy_dump_add_policy
From: syzbot @ 2022-08-12 17:04 UTC (permalink / raw)
  To: davem, edumazet, kuba, linux-kernel, netdev, pabeni,
	syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    4e23eeebb2e5 Merge tag 'bitmap-6.0-rc1' of https://github...
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=165f4f6a080000
kernel config:  https://syzkaller.appspot.com/x/.config?x=3a433c7a2539f51c
dashboard link: https://syzkaller.appspot.com/bug?extid=dc54d9ba8153b216cae0
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1443be71080000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11e5918e080000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+dc54d9ba8153b216cae0@syzkaller.appspotmail.com

executing program
executing program
executing program
BUG: memory leak
unreferenced object 0xffff888113093f00 (size 192):
  comm "syz-executor228", pid 3636, jiffies 4294947950 (age 12.750s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 0a 00 00 00 00 00 00 00  ................
    40 53 fd 84 ff ff ff ff 40 01 00 00 00 00 00 00  @S......@.......
  backtrace:
    [<ffffffff83a0e378>] kmalloc include/linux/slab.h:600 [inline]
    [<ffffffff83a0e378>] kzalloc include/linux/slab.h:733 [inline]
    [<ffffffff83a0e378>] alloc_state net/netlink/policy.c:104 [inline]
    [<ffffffff83a0e378>] netlink_policy_dump_add_policy+0x198/0x1f0 net/netlink/policy.c:135
    [<ffffffff83a0d78d>] ctrl_dumppolicy_start+0x15d/0x290 net/netlink/genetlink.c:1173
    [<ffffffff83a0abf8>] genl_start+0x148/0x210 net/netlink/genetlink.c:596
    [<ffffffff83a0756a>] __netlink_dump_start+0x20a/0x440 net/netlink/af_netlink.c:2370
    [<ffffffff83a0a38e>] genl_family_rcv_msg_dumpit+0x15e/0x190 net/netlink/genetlink.c:678
    [<ffffffff83a0b1d5>] genl_family_rcv_msg net/netlink/genetlink.c:772 [inline]
    [<ffffffff83a0b1d5>] genl_rcv_msg+0x225/0x2c0 net/netlink/genetlink.c:792
    [<ffffffff83a09807>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2501
    [<ffffffff83a0a214>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
    [<ffffffff83a08977>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
    [<ffffffff83a08977>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
    [<ffffffff83a08e36>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
    [<ffffffff8385aea6>] sock_sendmsg_nosec net/socket.c:714 [inline]
    [<ffffffff8385aea6>] sock_sendmsg+0x56/0x80 net/socket.c:734
    [<ffffffff8385b40c>] ____sys_sendmsg+0x36c/0x390 net/socket.c:2482
    [<ffffffff8385fd08>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
    [<ffffffff8385fe98>] __sys_sendmsg+0x88/0x100 net/socket.c:2565
    [<ffffffff845d8535>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<ffffffff845d8535>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    [<ffffffff84600087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd



---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [RFC net-next 0/4] ynl: YAML netlink protocol descriptions
From: Florian Fainelli @ 2022-08-12 17:00 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, davem, edumazet, pabeni
  Cc: sdf, jacob.e.keller, vadfed, johannes, jiri, dsahern, stephen, fw,
	linux-doc
In-Reply-To: <20220811022304.583300-1-kuba@kernel.org>

On 8/10/22 19:23, Jakub Kicinski wrote:
> Netlink seems simple and reasonable to those who understand it.
> It appears cumbersome and arcane to those who don't.
> 
> This RFC introduces machine readable netlink protocol descriptions
> in YAML, in an attempt to make creation of truly generic netlink
> libraries a possibility. Truly generic netlink library here means
> a library which does not require changes to support a new family
> or a new operation.
> 
> Each YAML spec lists attributes and operations the family supports.
> The specs are fully standalone, meaning that there is no dependency
> on existing uAPI headers in C. Numeric values of all attribute types,
> operations, enums, and defines and listed in the spec (or unambiguous).
> This property removes the need to manually translate the headers for
> languages which are not compatible with C.
> 
> The expectation is that the spec can be used to either dynamically
> translate between whatever types the high level language likes (see
> the Python example below) or codegen a complete libarary / bindings
> for a netlink family at compilation time (like popular RPC libraries
> do).
> 
> Currently only genetlink is supported, but the "old netlink" should
> be supportable as well (I don't need it myself).
> 
> On the kernel side the YAML spec can be used to generate:
>   - the C uAPI header
>   - documentation of the protocol as a ReST file
>   - policy tables for input attribute validation
>   - operation tables
> 
> We can also codegen parsers and dump helpers, but right now the level
> of "creativity & cleverness" when it comes to netlink parsing is so
> high it's quite hard to generalize it for most families without major
> refactoring.
> 
> Being able to generate the header, documentation and policy tables
> should balance out the extra effort of writing the YAML spec.
> 
> Here is a Python example I promised earlier:
> 
>    ynl = YnlFamily("path/to/ethtool.yaml")
>    channels = ynl.channels_get({'header': {'dev_name': 'eni1np1'}})
> 
> If the call was successful "channels" will hold a standard Python dict,
> e.g.:
> 
>    {'header': {'dev_index': 6, 'dev_name': 'eni1np1'},
>     'combined_max': 1,
>     'combined_count': 1}
> 
> for a netdevsim device with a single combined queue.
> 
> YnlFamily is an implementation of a YAML <> netlink translator (patch 3).
> It takes a path to the YAML spec - hopefully one day we will make
> the YAMLs themselves uAPI and distribute them like we distribute
> C headers. Or get them distributed to a standard search path another
> way. Until then, the YNL library needs a full path to the YAML spec and
> application has to worry about the distribution of those.
> 
> The YnlFamily reads all the info it needs from the spec, resolves
> the genetlink family id, and creates methods based on the spec.
> channels_get is such a dynamically-generated method (i.e. grep for
> channels_get in the python code shows nothing). The method can be called
> passing a standard Python dict as an argument. YNL will look up each key
> in the YAML spec and render the appropriate binary (netlink TLV)
> representation of the value. It then talks thru a netlink socket
> to the kernel, and deserilizes the response, converting the netlink
> TLVs into Python types and constructing a dictionary.
> 
> Again, the YNL code is completely generic and has no knowledge specific
> to ethtool. It's fairly simple an incomplete (in terms of types
> for example), I wrote it this afternoon. I'm also pretty bad at Python,
> but it's the only language I can type which allows the method
> magic, so please don't judge :) I have a rather more complete codegen
> for C, with support for notifications, kernel -> user policy/type
> verification, resolving extack attr offsets into a path
> of attribute names etc, etc. But that stuff needs polishing and
> is less suitable for an RFC.
> 
> The ability for a high level language like Python to talk to the kernel
> so easily, without ctypes, manually packing structs, copy'n'pasting
> values for defines etc. excites me more than C codegen, anyway.

This is really cool BTW, and it makes a lot of sense to me that we are 
moving that way, especially with Rust knocking at the door. I will try 
to do a more thorough review, than "cool, I like it".
-- 
Florian

^ permalink raw reply

* Re: [GIT PULL] virtio: fatures, fixes
From: pr-tracker-bot @ 2022-08-12 16:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Linus Torvalds, kvm, virtualization, netdev, linux-kernel,
	alvaro.karsz, colin.i.king, colin.king, dan.carpenter, david,
	elic, eperezma, gautam.dawar, gshan, hdegoede, hulkci, jasowang,
	jiaming, kangjie.xu, lingshan.zhu, liubo03, michael.christie, mst,
	pankaj.gupta, peng.fan, quic_mingxue, robin.murphy, sgarzare,
	suwan.kim027, syoshida, xieyongji, xuanzhuo, xuqiang36
In-Reply-To: <20220812114250-mutt-send-email-mst@kernel.org>

The pull request you sent on Fri, 12 Aug 2022 11:42:50 -0400:

> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/7a53e17accce9d310d2e522dfc701d8da7ccfa65

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply

* Re: Query on reads being flagged as direct writes...
From: Stanislav Fomichev @ 2022-08-12 16:58 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Lina Wang, Linux NetDev, BPF Mailing List, Jesper Dangaard Brouer,
	Thomas Graf, Alexei Starovoitov
In-Reply-To: <CAHo-OoxwQ3fO3brKw0MSNcQtW5Ynr8LUJoANU_TFeOAQkP1RAA@mail.gmail.com>

On Fri, Aug 12, 2022 at 5:06 AM Maciej Żenczykowski
<zenczykowski@gmail.com> wrote:
>
> From kernel/bpf/verifier.c with some simplifications (removed some of
> the cases to make this shorter):
>
> static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
> const struct bpf_call_arg_meta *meta, enum bpf_access_type t)
> {
>   enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
>   switch (prog_type) {
>     /* Program types only with direct read access go here! */
>     case BPF_PROG_TYPE_CGROUP_SKB: (and some others)
>       if (t == BPF_WRITE) return false;
>       fallthrough;
>     /* Program types with direct read + write access go here! */
>     case BPF_PROG_TYPE_SCHED_CLS: (and some others)
>       if (meta) return meta->pkt_access;
>       env->seen_direct_write = true;
>       return true;
>     case BPF_PROG_TYPE_CGROUP_SOCKOPT:
>       if (t == BPF_WRITE) env->seen_direct_write = true;
>       return true;
>   }
> }
>
> why does the above set env->seen_direct_write to true even when t !=
> BPF_WRITE, even for programs that only allow (per the comment) direct
> read access.
>
> Is this working correctly?  Is there some gotcha this is papering over?
>
> Should 'env->seen_direct_write = true; return true;' be changed into
> 'fallthrough' so that write is only set if t == BPF_WRITE?
>
> This matters because 'env->seen_direct_write = true' then triggers an
> unconditional unclone in the bpf prologue, which I'd like to avoid
> unless I actually need to modify the packet (with
> bpf_skb_store_bytes)...
>
> may_access_direct_pkt_data() has two call sites, in one it only gets
> called with BPF_WRITE so it's ok, but the other one is in
> check_func_arg():
>
> if (type_is_pkt_pointer(type) && !may_access_direct_pkt_data(env,
> meta, BPF_READ)) { verbose(env, "helper access to the packet is not
> allowed\n"); return -EACCES; }
>
> and I'm not really following what this does, but it seems like bpf
> helper read access to the packet triggers unclone?

There seems to be a set of helpers (pkt_access=true) which accept
direct packet pointers and are known to be doing only reads of the skb
data (safe without clone).
You seem to be hitting the case where you're passing that packet
pointer to one of the "unsafe" (pkt_acces=false) helpers which
triggers that seen_direct_write=true condition.
So it seems like it's by design? Which helper are you calling? Maybe
that one should also have pkt_access=true?

Tangential: I wish there was an explicit BPF_F_MAY_ATTEMPT_TO_CLONE
flag that gates this auto-clone. I think at some point we also
accidentally hit it :-(

> (side note: all packets ingressing from the rndis gadget driver are
> clones due to how it deals with usb packet deaggregation [not to be
> mistaken with lro/tso])
>
> Confused...

^ permalink raw reply

* Re: [PATCH] net: moxa: inherit DMA masks to make dma_map_single() work
From: Florian Fainelli @ 2022-08-12 16:58 UTC (permalink / raw)
  To: Andrew Lunn, Sergei Antonov
  Cc: netdev, Jakub Kicinski, Pavel Skripkin, David S . Miller,
	Guobin Huang, Paolo Abeni
In-Reply-To: <YvaCE0lqfOi+tE5X@lunn.ch>

On 8/12/22 09:38, Andrew Lunn wrote:
> On Fri, Aug 12, 2022 at 07:35:43PM +0300, Sergei Antonov wrote:
>> On Fri, 12 Aug 2022 at 19:13, Andrew Lunn <andrew@lunn.ch> wrote:
>>>> +     /* Inherit the DMA masks from the platform device */
>>>> +     ndev->dev.dma_mask = p_dev->dma_mask;
>>>> +     ndev->dev.coherent_dma_mask = p_dev->coherent_dma_mask;
>>>
>>> There is only one other ethernet driver which does this. What you see
>>> much more often is:
>>>
>>> alacritech/slicoss.c:   paddr = dma_map_single(&sdev->pdev->dev, skb->data, maplen,
>>> neterion/s2io.c:                                dma_map_single(&sp->pdev->dev, ba->ba_1,
>>> dlink/dl2k.c:                       cpu_to_le64(dma_map_single(&np->pdev->dev, skb->data,
>>> micrel/ksz884x.c:               dma_buf->dma = dma_map_single(&hw_priv->pdev->dev, skb->data,
>>
>> Also works. Do you recommend to create a v2 of the patch?
> 
> Yes please. It makes things easier to maintain if every driver does
> the same thing.

Yes this is a common pattern to store a device reference pointing to 
&pdev->dev into your network device private structure fetched via 
netdev_priv().

Alternatively, we could sort of try to settle on a common pattern where 
we utilize &dev->parent->dev thanks to having called SET_NETDEV_DEV(), 
that might be more universal?
-- 
Florian

^ permalink raw reply

* Re: [PATCH bpf-next 13/15] mm, memcg: Add new helper get_obj_cgroup_from_cgroup
From: Shakeel Butt @ 2022-08-12 16:57 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	songmuchun, akpm, netdev, bpf, linux-mm
In-Reply-To: <20220810151840.16394-14-laoar.shao@gmail.com>

On Wed, Aug 10, 2022 at 03:18:38PM +0000, Yafang Shao wrote:
> Introduce new helper get_obj_cgroup_from_cgroup() to get obj_cgroup from
> a specific cgroup.

Can you please add couple of lines on why you need objcg?

> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  include/linux/memcontrol.h |  1 +
>  mm/memcontrol.c            | 41 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 2f0a611..901a921 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1713,6 +1713,7 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
>  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order);
>  void __memcg_kmem_uncharge_page(struct page *page, int order);
>  
> +struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp);
>  struct obj_cgroup *get_obj_cgroup_from_current(void);
>  struct obj_cgroup *get_obj_cgroup_from_page(struct page *page);
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 618c366..762cffa 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2908,6 +2908,47 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
>  	return objcg;
>  }
>  
> +static struct obj_cgroup *get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
> +{
> +	struct obj_cgroup *objcg;
> +
> +	if (memcg_kmem_bypass())
> +		return NULL;
> +
> +	rcu_read_lock();
> +	objcg = __get_obj_cgroup_from_memcg(memcg);
> +	rcu_read_unlock();
> +	return objcg;
> +}
> +
> +struct obj_cgroup *get_obj_cgroup_from_cgroup(struct cgroup *cgrp)
> +{
> +	struct cgroup_subsys_state *css;
> +	struct mem_cgroup *memcg;
> +	struct obj_cgroup *objcg;
> +
> +	rcu_read_lock();
> +	css = rcu_dereference(cgrp->subsys[memory_cgrp_id]);
> +	if (!css || !css_tryget_online(css)) {
> +		rcu_read_unlock();
> +		cgroup_put(cgrp);
> +		return ERR_PTR(-EINVAL);
> +	}
> +	rcu_read_unlock();
> +	cgroup_put(cgrp);

The above put seems out of place and buggy.

> +
> +	memcg = mem_cgroup_from_css(css);
> +	if (!memcg) {
> +		css_put(css);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	objcg = get_obj_cgroup_from_memcg(memcg);
> +	css_put(css);
> +
> +	return objcg;
> +}
> +
>  __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
>  {
>  	struct obj_cgroup *objcg = NULL;
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* Re: [PATCH v2 0/5] iio/hwmon/mfd/leds/net/power/ASoC: dt-bindings: few stale maintainers cleanup
From: Rob Herring @ 2022-08-12 16:54 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Michael Hennerich, Jean Delvare, Guenter Roeck,
	Krzysztof Kozlowski, Jonathan Cameron, Lars-Peter Clausen,
	Pavel Machek, Tim Harvey, Lee Jones, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Sebastian Reichel,
	Liam Girdwood, Mark Brown, Andrew Davis, linux-hwmon, devicetree,
	linux-kernel, linux-iio, linux-fbdev, linux-leds, netdev,
	linux-pm, alsa-devel
In-Reply-To: <20220809162752.10186-1-krzysztof.kozlowski@linaro.org>

On Tue, Aug 09, 2022 at 07:27:47PM +0300, Krzysztof Kozlowski wrote:
> Hi,
> 
> Changes since v1
> ================
> 1. Patch #5: Drop also Ricardo Rivera-Matos and assign TI bindings to Andrew Davis
> 2. Add acks.
> 
> A question
> ==========
> 
> Several of the bindings here had only one maintainer and history does not
> always point to a new one (although I did not perform extensive digging). I
> added subsystem maintainer, because dtschema requires an entry with valid email address.
> 
> This is not the best choice as simply subsystem maintainer might not have the
> actual device (or its datasheets or any interest in it).
> 
> Maybe we could add some "orphaned" entry in such case?

It would need to be obvious to not use for a new binding.

> 
> Best regards,
> Krzysztof
> 
> Krzysztof Kozlowski (5):
>   dt-bindings: iio: Drop Joachim Eastwood
>   dt-bindings: iio: Drop Bogdan Pricop
>   dt-bindings: Drop Beniamin Bia and Stefan Popa
>   dt-bindings: Drop Robert Jones
>   dt-bindings: Drop Dan Murphy and Ricardo Rivera-Matos

Series applied for 6.0-rc1.

Rob

^ permalink raw reply

* Re: [PATCH] net: moxa: inherit DMA masks to make dma_map_single() work
From: Andrew Lunn @ 2022-08-12 16:38 UTC (permalink / raw)
  To: Sergei Antonov
  Cc: netdev, Jakub Kicinski, Pavel Skripkin, David S . Miller,
	Guobin Huang, Paolo Abeni
In-Reply-To: <CABikg9wm=8rbBFP0vaVHpGBJfXOi4k0bvwK7F+agMXEPfFn2RQ@mail.gmail.com>

On Fri, Aug 12, 2022 at 07:35:43PM +0300, Sergei Antonov wrote:
> On Fri, 12 Aug 2022 at 19:13, Andrew Lunn <andrew@lunn.ch> wrote:
> > > +     /* Inherit the DMA masks from the platform device */
> > > +     ndev->dev.dma_mask = p_dev->dma_mask;
> > > +     ndev->dev.coherent_dma_mask = p_dev->coherent_dma_mask;
> >
> > There is only one other ethernet driver which does this. What you see
> > much more often is:
> >
> > alacritech/slicoss.c:   paddr = dma_map_single(&sdev->pdev->dev, skb->data, maplen,
> > neterion/s2io.c:                                dma_map_single(&sp->pdev->dev, ba->ba_1,
> > dlink/dl2k.c:                       cpu_to_le64(dma_map_single(&np->pdev->dev, skb->data,
> > micrel/ksz884x.c:               dma_buf->dma = dma_map_single(&hw_priv->pdev->dev, skb->data,
> 
> Also works. Do you recommend to create a v2 of the patch?

Yes please. It makes things easier to maintain if every driver does
the same thing.

    Andrew

^ permalink raw reply

* Re: [PATCH] net: moxa: inherit DMA masks to make dma_map_single() work
From: Sergei Antonov @ 2022-08-12 16:35 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, Jakub Kicinski, Pavel Skripkin, David S . Miller,
	Guobin Huang, Paolo Abeni
In-Reply-To: <YvZ8NwzGV/9QDInR@lunn.ch>

On Fri, 12 Aug 2022 at 19:13, Andrew Lunn <andrew@lunn.ch> wrote:
> > +     /* Inherit the DMA masks from the platform device */
> > +     ndev->dev.dma_mask = p_dev->dma_mask;
> > +     ndev->dev.coherent_dma_mask = p_dev->coherent_dma_mask;
>
> There is only one other ethernet driver which does this. What you see
> much more often is:
>
> alacritech/slicoss.c:   paddr = dma_map_single(&sdev->pdev->dev, skb->data, maplen,
> neterion/s2io.c:                                dma_map_single(&sp->pdev->dev, ba->ba_1,
> dlink/dl2k.c:                       cpu_to_le64(dma_map_single(&np->pdev->dev, skb->data,
> micrel/ksz884x.c:               dma_buf->dma = dma_map_single(&hw_priv->pdev->dev, skb->data,

Also works. Do you recommend to create a v2 of the patch?

^ permalink raw reply

* Re: [PATCH net] net: phy: Warn about incorrect mdio_bus_phy_resume() state
From: Florian Fainelli @ 2022-08-12 16:32 UTC (permalink / raw)
  To: Marek Szyprowski, netdev, Steve Glendinning
  Cc: opendmb, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	open list
In-Reply-To: <27016cc0-f228-748b-ea03-800dda4e5f0c@samsung.com>

On 8/12/22 04:19, Marek Szyprowski wrote:
> Hi All,
> 
> On 02.08.2022 01:34, Florian Fainelli wrote:
>> Calling mdio_bus_phy_resume() with neither the PHY state machine set to
>> PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication
>> that we can produce a race condition looking like this:
>>
>> CPU0						CPU1
>> bcmgenet_resume
>>    -> phy_resume
>>      -> phy_init_hw
>>    -> phy_start
>>      -> phy_resume
>>                                                   phy_start_aneg()
>> mdio_bus_phy_resume
>>    -> phy_resume
>>       -> phy_write(..., BMCR_RESET)
>>        -> usleep()                                  -> phy_read()
>>
>> with the phy_resume() function triggering a PHY behavior that might have
>> to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix
>> brcm_fet_config_init()") for instance) that ultimately leads to an error
>> reading from the PHY.
>>
>> Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> 
> This patch, as probably intended, triggers a warning during system
> suspend/resume cycle in the SMSC911x driver. I've observed it on ARM
> Juno R1 board on the kernel compiled from next-202208010:
> 
>    ------------[ cut here ]------------
>    WARNING: CPU: 1 PID: 398 at drivers/net/phy/phy_device.c:323
> mdio_bus_phy_resume+0x34/0xc8
>    Modules linked in: smsc911x cpufreq_powersave cpufreq_conservative
> crct10dif_ce ip_tables x_tables ipv6 [last unloaded: smsc911x]
>    CPU: 1 PID: 398 Comm: rtcwake Not tainted 5.19.0+ #940
>    Hardware name: ARM Juno development board (r1) (DT)
>    pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>    pc : mdio_bus_phy_resume+0x34/0xc8
>    lr : dpm_run_callback+0x74/0x350
>    ...
>    Call trace:
>     mdio_bus_phy_resume+0x34/0xc8
>     dpm_run_callback+0x74/0x350
>     device_resume+0xb8/0x258
>     dpm_resume+0x120/0x4a8
>     dpm_resume_end+0x14/0x28
>     suspend_devices_and_enter+0x164/0xa60
>     pm_suspend+0x25c/0x3a8
>     state_store+0x84/0x108
>     kobj_attr_store+0x14/0x28
>     sysfs_kf_write+0x60/0x70
>     kernfs_fop_write_iter+0x124/0x1a8
>     new_sync_write+0xd0/0x190
>     vfs_write+0x208/0x478
>     ksys_write+0x64/0xf0
>     __arm64_sys_write+0x14/0x20
>     invoke_syscall+0x40/0xf8
>     el0_svc_common.constprop.3+0x8c/0x120
>     do_el0_svc+0x28/0xc8
>     el0_svc+0x48/0xd0
>     el0t_64_sync_handler+0x94/0xb8
>     el0t_64_sync+0x15c/0x160
>    irq event stamp: 24406
>    hardirqs last  enabled at (24405): [<ffff8000090c4734>]
> _raw_spin_unlock_irqrestore+0x8c/0x90
>    hardirqs last disabled at (24406): [<ffff8000090b3164>] el1_dbg+0x24/0x88
>    softirqs last  enabled at (24144): [<ffff800008010488>] _stext+0x488/0x5cc
>    softirqs last disabled at (24139): [<ffff80000809bf98>]
> irq_exit_rcu+0x168/0x1a8
>    ---[ end trace 0000000000000000 ]---
> 
> I hope the above information will help fixing the driver.

Yes this is catching an actual issue in the driver in that the PHY state 
machine is still running while the system is trying to suspend. We could 
go about fixing it in a different number of ways, though I believe this 
one is probably correct enough to work and fix the warning:

diff --git a/drivers/net/ethernet/smsc/smsc911x.c 
b/drivers/net/ethernet/smsc/smsc911x.c
index 3bf20211cceb..e9c0668a4dc0 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -1037,6 +1037,8 @@ static int smsc911x_mii_probe(struct net_device *dev)
                 return ret;
         }

+       /* Indicate that the MAC is responsible for managing PHY PM */
+       phydev->mac_managed_pm = true;
         phy_attached_info(phydev);

         phy_set_max_speed(phydev, SPEED_100);
@@ -2587,6 +2589,8 @@ static int smsc911x_suspend(struct device *dev)
         if (netif_running(ndev)) {
                 netif_stop_queue(ndev);
                 netif_device_detach(ndev);
+               if (!device_may_wakeup(dev))
+                       phy_suspend(dev->phydev);
         }

         /* enable wake on LAN, energy detection and the external PME
@@ -2628,6 +2632,8 @@ static int smsc911x_resume(struct device *dev)
         if (netif_running(ndev)) {
                 netif_device_attach(ndev);
                 netif_start_queue(ndev);
+               if (!device_may_wakeup(dev))
+                       phy_resume(dev->phydev);
         }

         return 0;

-- 
Florian

^ permalink raw reply related

* Re: [PATCH v2] wifi: cfg80211: Fix UAF in ieee80211_scan_rx()
From: Siddh Raman Pant @ 2022-08-12 16:27 UTC (permalink / raw)
  To: Greg KH
  Cc: johannes berg, david s. miller, eric dumazet, jakub kicinski,
	paolo abeni, netdev, syzbot+6cb476b7c69916a0caca, linux-wireless,
	linux-kernel, syzbot+f9acff9bf08a845f225d,
	syzbot+9250865a55539d384347, linux-kernel-mentees
In-Reply-To: <YvZxfpY4JUqvsOG5@kroah.com>

On Fri, 12 Aug 2022 20:57:58 +0530  Greg KH  wrote:
> rcu just delays freeing of an object, you might just be delaying the
> race condition.  Just moving a single object to be freed with rcu feels
> very odd if you don't have another reference somewhere.

As mentioned in patch message, in net/mac80211/scan.c, we have:
        void ieee80211_scan_rx(struct ieee80211_local *local, struct sk_buff *skb)
        {
                ...
                scan_req = rcu_dereference(local->scan_req);
                sched_scan_req = rcu_dereference(local->sched_scan_req);

                if (scan_req)
                        scan_req_flags = scan_req->flags;
                ...
        }

So scan_req is probably supposed to be protected by RCU.

Also, in ieee80211_local's definition at net/mac80211/ieee80211_i.h, we have:
        struct cfg80211_scan_request __rcu *scan_req;

Thus, scan_req is indeed supposed to be protected by RCU, which this patch
addresses by adding a RCU head to the type's struct, and using kfree_rcu().

The above snippet is where the UAF happens (you can refer to syzkaller's log),
because __cfg80211_scan_done() is called and frees the pointer.

Thanks,
Siddh


^ permalink raw reply

* Re: [RFC net-next 4/4] ynl: add a sample user for ethtool
From: Stanislav Fomichev @ 2022-08-12 16:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, edumazet, pabeni, jacob.e.keller, vadfed, johannes,
	jiri, dsahern, stephen, fw, linux-doc
In-Reply-To: <20220811163111.56d83702@kernel.org>

On Thu, Aug 11, 2022 at 4:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 11 Aug 2022 15:55:44 -0700 Stanislav Fomichev wrote:
> > > We could, I guess. To be clear this controls the count, IOW:
> > >
> > > enum {
> > >         PREFIX_A_BLA_ATTR = 1,
> > >         PREFIX_A_ANOTHER_ATTR,
> > >         PREFIX_A_AND_ONEMORE,
> > >         __PFREIX_A_CNT, // <--- This thing
> > > };
> > > #define PREFIX_A_MAX (__PFREIX_A_CNT - 1)
> > >
> > > It's not used in the generated code, only if we codegen the uAPI,
> > > AFAIR. So we'd need a way to tell the generator of the uAPI about
> > > the situation, anyway. I could be misremembering.
> >
> > My worry is that we'll have more hacks like these and it's hard, as a
> > spec reader/writer, to figure out that they exist..
> > So I was wondering if it's "easier" (from the spec reader/writer pov)
> > to have some c-header-fixup: section where we can have plain
> > c-preprocessor hacks like these (where we need to redefine something
> > to match the old behavior).
>
> Let me think about it some more. My main motivation is people writing
> new families, I haven't sent too much time worrying about the existing
> ones with all their quirks. It's entirely possible that the uAPI quirks
> can just go and we won't generate uAPI for existing families as it
> doesn't buy us anything.

Ack. Although, we have to have some existing examples for people to
write new families. So you might still have to convert something :-)

> > Coming from stubby/grpc, I was expecting to see words like
> > message/field/struct. The question is what's more confusing: sticking
> > with netlink naming or trying to map grpc/thrift concepts on top of
> > what we have. (I'm assuming more people know about grpc/thrift than
> > netlink)
> >
> > messages: # or maybe 'attribute-sets' ?
> >   - name: channels
> >     ...
>
> Still not convinced about messages, as it makes me think that every
> "space" is then a definition of a message rather than just container
> for field definitions with independent ID spaces.
>
> Attribute-sets sounds good, happy to rename.
>
> Another thought I just had was to call it something like "data-types"
> or "field-types" or "type-spaces". To indicate the split into "data"
> and "actions"/"operations"?

I like attribute-set better than attribute-space :-)

> > operations:
> >   - name: channel_get
> >     message: channels
> >     do:
> >       request:
> >         fields:
> >         - header
> >         - rx_max
> >
> > Or maybe all we really need is a section in the doc called 'Netlink
> > for gRPC/Thrift users' where we map these concepts:
> > - attribute-spaces (attribute-sets?) -> messages
> > - attributes -> fields
>
> Excellent idea!
>
> > > Dunno, that'd mean that the Python method is called
> > > ETHTOOL_MSG_CHANNELS_GET rather than just channels_get.
> > > I don't want to force all languages to use the C naming.
> > > The C naming just leads to silly copy'n'paste issues like
> > > f329a0ebeab.
> >
> > Can we have 'name:' and 'long-name:' or 'c-name:' or 'full-name' ?
> >
> > - name: header
> >    attributes:
> >     - name: dev_index
> >       full-name: ETHTOOL_A_HEADER_DEV_INDEX
> >       val:
> >       type:
> >
> > Suppose I'm rewriting my c application from uapi to some generated (in
> > the future) python-like channels_get() method. If I can grep for
> > ETHTOOL_MSG_CHANNELS_GET, that would save me a bunch of time figuring
> > out what the new canonical wrapper is.
> >
> > Also, maybe, at some point we'll have:
> > - name: dev_index
> >   c-name: ETHTOOL_A_HEADER_DEV_INDEX
> >   java-name: headerDevIndex
>
> Herm, looking at my commits where I started going with the C codegen
> (which I haven't posted here) is converting the values to the same
> format as keys (i.e. YAML/JSON style with dashes). So the codegen does:
>
>         c_name = attr['name']
>         if c_name in c_keywords:
>                 c_name += '_'
>         c_name = c_name.replace('-', '_')
>
> So the name would be "dev-index", C will make that dev_index, Java will
> make that devIndex (or whatever) etc.
>
> I really don't want people to have to prefix the names because that's
> creating more work. We can slap a /* header.dev_index */ comment in
> the generated uAPI, for the grep? Dunno..

Yeah, dunno as well, not sure how much of the per-language knowledge
you should bake into the tool itself.. I think my confusion mostly
comes from the fact that 'name' is mixed together with 'name-prefix'
and one is 'low_caps' while the other one is 'ALL_CAPS'. Too much
magic :-)

Thinking out loud: maybe these transformations should all go via some
extra/separate yaml (or separate sections)? Like:

fixup-attribute-sets:
  - name: header
    canonical-c-name: "ETHTOOL_A_HEADER_{name.upper()}"
  - name: channels
    canonical-c-name: "ETHTOOL_A_CHANNELS_{name.upper()}"

fixup-operations:
  canonical-c-name: "ETHTOOL_MSG_{name.upper()}"

  # in case the "generic" catch-all above doesn't work
  - name: channgels_get
     canonical-c-name: "ETHTOOL_MSG_CHANNELS_GET"

We can call it "compatibility" yaml and put all sorts of weird stuff in it.
New users hopefully don't need to care about it and don't need to
write any of the -prefix/-suffix stuff.


> > > Good catch, I'm aware. I was planning to add a "header constants"
> > > section or some such. A section in "headers" which defines the
> > > constants which C code will get from the headers.
> >
> > Define as in 're-define' or define as in 'you need to include some
> > other header for this to work'?
> >
> > const:
> >   - name: ALTIFNAMSIZ
> >     val: 128
>
> This one. In most cases the constant is defined in the same uAPI header
> as the proto so we're good. But there's IFNAMSIZ and friends which are
> shared.
>
> > which then does
> >
> > #ifndef
> > #define ALTIFNAMSIZ 128
> > #else
> > static_assert(ALTIFNAMSIZ == 128)
> > #endif
> >
> > ?
> >
> > or:
> >
> > external-const:
> >   - name: ALTIFNAMSIZ
> >     header: include/uapi/linux/if.h
> >
> > which then might generate the following:
> >
> > #include <include/uapi/linux/if.h>
> > #ifndef ALTIFNAMSIZ
> > #error "include/uapi/linux/if.h does not define ALTIFNAMSIZ"
> > #endif
> >
> > > For Python it does not matter, as we don't have to size arrays.
> >
> > Hm, I was expecting the situation to be the opposite :-) Because if
> > you really have to know this len in python, how do you resolve
> > ALTIFNAMSIZ?
>
> Why does Python need to know the length of the string tho?
> On receive if kernel gives you a longer name - great, no problem.
> On send the kernel will tell you so also meh.

I was thinking that you wanted to do some client size validation as
well? As in sizeof(rx_buf) > ALTIFNAMSIZ -> panic? But I agree that
there is really no value in that, the kernel will do the validation
anyway..

> In C the struct has a char bla[FIXED_SIZE] so if we get an oversized
> string we're pooped, that's my point, dunno what other practical use
> the string sizing has.
>
> > The simplest thing to do might be to require these headers to be
> > hermetic (i.e., redefine all consts the spec cares about internally)?
>
> That's what I'm thinking if they are actually needed. But it only C
> cares we can just slap the right includes and not worry. Dunno if other
> languages are similarly string-challenged.

^ permalink raw reply

* Re: [PATCH] net: moxa: inherit DMA masks to make dma_map_single() work
From: Andrew Lunn @ 2022-08-12 16:13 UTC (permalink / raw)
  To: Sergei Antonov
  Cc: netdev, Jakub Kicinski, Pavel Skripkin, David S . Miller,
	Guobin Huang, Paolo Abeni
In-Reply-To: <20220812154820.2225457-1-saproj@gmail.com>

On Fri, Aug 12, 2022 at 06:48:20PM +0300, Sergei Antonov wrote:
> dma_map_single() calls fail in moxart_mac_setup_desc_ring() and
> moxart_mac_start_xmit() which leads to an incessant output of this:
> 
> [   16.043925] moxart-ethernet 92000000.mac eth0: DMA mapping error
> [   16.050957] moxart-ethernet 92000000.mac eth0: DMA mapping error
> [   16.058229] moxart-ethernet 92000000.mac eth0: DMA mapping error
> 
> To make dma_map_single() work, inherit DMA masks from the platform device.
> 
> Signed-off-by: Sergei Antonov <saproj@gmail.com>
> CC: Jakub Kicinski <kuba@kernel.org>
> CC: Pavel Skripkin <paskripkin@gmail.com>
> CC: David S. Miller <davem@davemloft.net>
> CC: Guobin Huang <huangguobin4@huawei.com>
> CC: Paolo Abeni <pabeni@redhat.com>
> ---
>  drivers/net/ethernet/moxa/moxart_ether.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/ethernet/moxa/moxart_ether.c b/drivers/net/ethernet/moxa/moxart_ether.c
> index a3214a762e4b..de99211d85c2 100644
> --- a/drivers/net/ethernet/moxa/moxart_ether.c
> +++ b/drivers/net/ethernet/moxa/moxart_ether.c
> @@ -537,6 +537,10 @@ static int moxart_mac_probe(struct platform_device *pdev)
>  	ndev->priv_flags |= IFF_UNICAST_FLT;
>  	ndev->irq = irq;
>  
> +	/* Inherit the DMA masks from the platform device */
> +	ndev->dev.dma_mask = p_dev->dma_mask;
> +	ndev->dev.coherent_dma_mask = p_dev->coherent_dma_mask;

There is only one other ethernet driver which does this. What you see
much more often is:

alacritech/slicoss.c:	paddr = dma_map_single(&sdev->pdev->dev, skb->data, maplen,
neterion/s2io.c:				dma_map_single(&sp->pdev->dev, ba->ba_1,
dlink/dl2k.c:			    cpu_to_le64(dma_map_single(&np->pdev->dev, skb->data,
micrel/ksz884x.c:		dma_buf->dma = dma_map_single(&hw_priv->pdev->dev, skb->data,

	Andrew

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox