Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] iphase: Adding a null pointer check
From: Jakub Kicinski @ 2023-11-09  2:40 UTC (permalink / raw)
  To: Andrey Shumilin
  Cc: 3chas3, linux-atm-general, netdev, linux-kernel, lvc-project,
	khoroshilov, ykarpov, vmerzlyakov, vefanov
In-Reply-To: <20231107123600.14529-1-shum.sdl@nppct.ru>

On Tue,  7 Nov 2023 15:36:00 +0300 Andrey Shumilin wrote:
> The pointer <dev->desc_tbl[i].iavcc> is dereferenced on line 195.
> Further in the code, it is checked for null on line 204.
> It is proposed to add a check before dereferencing the pointer.

How do you know this is the right way to address the problem.
Much easier to debug a crash than misbehaving driver.

This is ancient code, leave it be.
-- 
pw-bot: reject
pv-bot: nit

^ permalink raw reply

* Re: [PATCH] boning: use a read-write lock in bonding_show_bonds()
From: Haifeng Xu @ 2023-11-09  2:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: j.vosburgh, andy, davem, kuba, pabeni, netdev, linux-kernel
In-Reply-To: <CANn89iJnjp8YYYLqtfAGg6PU9iiSrKbMU43wgDkuEVqX8kSCmA@mail.gmail.com>



On 2023/11/8 22:19, Eric Dumazet wrote:
> On Wed, Nov 8, 2023 at 7:47 AM Haifeng Xu <haifeng.xu@shopee.com> wrote:
>>
>> call stack:
> 
> These stacks should either be removed from the changelog, or moved
> _after_ the description
> of the problem. These are normal looking call stacks, you are not
> fixing a crash or deadlock.
> 
>> ......
>> PID: 210933  TASK: ffff92424e5ec080  CPU: 13  COMMAND: "kworker/u96:2"
>> [ffffa7a8e96bbac0] __schedule at ffffffffb0719898
>> [ffffa7a8e96bbb48] schedule at ffffffffb0719e9e
>> [ffffa7a8e96bbb68] rwsem_down_write_slowpath at ffffffffafb3167a
>> [ffffa7a8e96bbc00] down_write at ffffffffb071bfc1
>> [ffffa7a8e96bbc18] kernfs_remove_by_name_ns at ffffffffafe3593e
>> [ffffa7a8e96bbc48] sysfs_unmerge_group at ffffffffafe38922
>> [ffffa7a8e96bbc68] dpm_sysfs_remove at ffffffffb021c96a
>> [ffffa7a8e96bbc80] device_del at ffffffffb0209af8
>> [ffffa7a8e96bbcd0] netdev_unregister_kobject at ffffffffb04a6b0e
>> [ffffa7a8e96bbcf8] unregister_netdevice_many at ffffffffb046d3d9
>> [ffffa7a8e96bbd60] default_device_exit_batch at ffffffffb046d8d1
>> [ffffa7a8e96bbdd0] ops_exit_list at ffffffffb045e21d
>> [ffffa7a8e96bbe00] cleanup_net at ffffffffb045ea46
>> [ffffa7a8e96bbe60] process_one_work at ffffffffafad94bb
>> [ffffa7a8e96bbeb0] worker_thread at ffffffffafad96ad
>> [ffffa7a8e96bbf10] kthread at ffffffffafae132a
>> [ffffa7a8e96bbf50] ret_from_fork at ffffffffafa04b92
>>
>> 290858 PID: 278176  TASK: ffff925deb39a040  CPU: 32  COMMAND: "node-exporter"
>> [ffffa7a8d14dbb80] __schedule at ffffffffb0719898
>> [ffffa7a8d14dbc08] schedule at ffffffffb0719e9e
>> [ffffa7a8d14dbc28] schedule_preempt_disabled at ffffffffb071a24e
>> [ffffa7a8d14dbc38] __mutex_lock at ffffffffb071af28
>> [ffffa7a8d14dbcb8] __mutex_lock_slowpath at ffffffffb071b1a3
>> [ffffa7a8d14dbcc8] mutex_lock at ffffffffb071b1e2
>> [ffffa7a8d14dbce0] rtnl_lock at ffffffffb047f4b5
>> [ffffa7a8d14dbcf0] bonding_show_bonds at ffffffffc079b1a1 [bonding]
>> [ffffa7a8d14dbd20] class_attr_show at ffffffffb02117ce
>> [ffffa7a8d14dbd30] sysfs_kf_seq_show at ffffffffafe37ba1
>> [ffffa7a8d14dbd50] kernfs_seq_show at ffffffffafe35c07
>> [ffffa7a8d14dbd60] seq_read_iter at ffffffffafd9fce0
>> [ffffa7a8d14dbdc0] kernfs_fop_read_iter at ffffffffafe36a10
>> [ffffa7a8d14dbe00] new_sync_read at ffffffffafd6de23
>> [ffffa7a8d14dbe90] vfs_read at ffffffffafd6e64e
>> [ffffa7a8d14dbed0] ksys_read at ffffffffafd70977
>> [ffffa7a8d14dbf10] __x64_sys_read at ffffffffafd70a0a
>> [ffffa7a8d14dbf20] do_syscall_64 at ffffffffb070bf1c
>> [ffffa7a8d14dbf50] entry_SYSCALL_64_after_hwframe at ffffffffb080007c
>> ......
>>
>> Problem description:
>>
>> Thread 210933 holds the rtnl_mutex and tries to acquire the kernfs_rwsem,
>> but there are many readers which hold the kernfs_rwsem, so it has to sleep
>> for a long time to wait the readers release the lock. Thread 278176 and any
>> other threads which call bonding_show_bonds() also need to wait because
>> they try to accuire the rtnl_mutex.
> 
> acquire
> 
>>
>> bonding_show_bonds() uses rtnl_mutex to protect the bond_list traversal.
>> However, the addition and deletion of bond_list are only performed in
>> bond_init()/bond_uninit(), so we can intoduce a separate read-write lock
> 
> introduce
> 
>> to synchronize bond list mutation.
>>
>> What's the benefits of this change?
>>
>> 1) All threads which call bonding_show_bonds() only wait when the
>> registration or unregistration of bond device happens.
>>
>> 2) There are many other users of rtnl_mutex, so bonding_show_bonds()
>> won't compete with them.
>>
>> In a word, this change reduces the lock contention of rtnl_mutex.
>>
> 
> This looks good to me, but please note:
> 
> 1) This is net-next material, please resend next week, because
> net-next is currently closed during the merge window.
> 
> 2) Using a spell checker would point few typos (including in the title
> "boning" -> "bonding")
> 
> Thanks.

Thanks for your review, I 'll send a new patch next week.

^ permalink raw reply

* Re: [PATCH net 1/5] netfilter: add missing module descriptions
From: patchwork-bot+netdevbpf @ 2023-11-09  2:50 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, kuba, pabeni, edumazet, fw,
	kadlec
In-Reply-To: <20231108155802.84617-2-pablo@netfilter.org>

Hello:

This series was applied to netdev/net.git (main)
by Pablo Neira Ayuso <pablo@netfilter.org>:

On Wed,  8 Nov 2023 16:57:58 +0100 you wrote:
> From: Florian Westphal <fw@strlen.de>
> 
> W=1 builds warn on missing MODULE_DESCRIPTION, add them.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> 
> [...]

Here is the summary with links:
  - [net,1/5] netfilter: add missing module descriptions
    https://git.kernel.org/netdev/net/c/94090b23f3f7
  - [net,2/5] netfilter: nf_tables: remove catchall element in GC sync path
    https://git.kernel.org/netdev/net/c/93995bf4af2c
  - [net,3/5] ipvs: add missing module descriptions
    https://git.kernel.org/netdev/net/c/17cd01e4d1e3
  - [net,4/5] netfilter: xt_recent: fix (increase) ipv6 literal buffer length
    https://git.kernel.org/netdev/net/c/7b308feb4fd2
  - [net,5/5] netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses
    https://git.kernel.org/netdev/net/c/80abbe8a8263

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net 0/4][pull request] Intel Wired LAN Driver Updates 2023-11-06 (ice)
From: patchwork-bot+netdevbpf @ 2023-11-09  2:50 UTC (permalink / raw)
  To: Tony Nguyen; +Cc: davem, kuba, pabeni, edumazet, netdev
In-Reply-To: <20231107004844.655549-1-anthony.l.nguyen@intel.com>

Hello:

This series was applied to netdev/net.git (main)
by Tony Nguyen <anthony.l.nguyen@intel.com>:

On Mon,  6 Nov 2023 16:48:38 -0800 you wrote:
> This series contains updates to ice driver only.
> 
> Dave removes SR-IOV LAG attribute for only the interface being disabled
> to allow for proper unwinding of all interfaces.
> 
> Michal Schmidt changes some LAG allocations from GFP_KERNEL to GFP_ATOMIC
> due to non-allowed sleeping.
> 
> [...]

Here is the summary with links:
  - [net,1/4] ice: Fix SRIOV LAG disable on non-compliant aggregate
    https://git.kernel.org/netdev/net/c/3e39da4fa16c
  - [net,2/4] ice: lag: in RCU, use atomic allocation
    https://git.kernel.org/netdev/net/c/e1db8c2a01d7
  - [net,3/4] ice: Fix VF-VF filter rules in switchdev mode
    https://git.kernel.org/netdev/net/c/8b3c8c55ccbc
  - [net,4/4] ice: Fix VF-VF direction matching in drop rule in switchdev
    https://git.kernel.org/netdev/net/c/68c51db3a16d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net 0/2][pull request] Intel Wired LAN Driver Updates 2023-11-06 (i40e)
From: patchwork-bot+netdevbpf @ 2023-11-09  2:50 UTC (permalink / raw)
  To: Tony Nguyen; +Cc: davem, kuba, pabeni, edumazet, netdev, ivecera, jiri
In-Reply-To: <20231107003600.653796-1-anthony.l.nguyen@intel.com>

Hello:

This series was applied to netdev/net.git (main)
by Tony Nguyen <anthony.l.nguyen@intel.com>:

On Mon,  6 Nov 2023 16:35:57 -0800 you wrote:
> This series contains updates to i40e driver only.
> 
> Ivan Vecera resolves a couple issues with devlink; removing a call to
> devlink_port_type_clear() and ensuring devlink port is unregistered
> after the net device.
> 
> The following are changes since commit c1ed833e0b3b7b9edc82b97b73b2a8a10ceab241:
>   Merge branch 'smc-fixes'
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 40GbE
> 
> [...]

Here is the summary with links:
  - [net,1/2] i40e: Do not call devlink_port_type_clear()
    https://git.kernel.org/netdev/net/c/e96fe283c6f4
  - [net,2/2] i40e: Fix devlink port unregistering
    https://git.kernel.org/netdev/net/c/aa54d846f361

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net V9 1/2] ptp: ptp_read should not release queue
From: patchwork-bot+netdevbpf @ 2023-11-09  2:50 UTC (permalink / raw)
  To: Edward Adam Davis
  Cc: richardcochran, davem, habetsm.xilinx, jeremy, linux-kernel,
	netdev, reibax, syzbot+df3f3ef31f60781fa911
In-Reply-To: <tencent_18747D76F1675A3C633772960237544AAA09@qq.com>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue,  7 Nov 2023 16:00:40 +0800 you wrote:
> Firstly, queue is not the memory allocated in ptp_read;
> Secondly, other processes may block at ptp_read and wait for conditions to be
> met to perform read operations.
> 
> Acked-by: Richard Cochran <richardcochran@gmail.com>
> Reported-and-tested-by: syzbot+df3f3ef31f60781fa911@syzkaller.appspotmail.com
> Fixes: 8f5de6fb2453 ("ptp: support multiple timestamp event readers")
> Signed-off-by: Edward Adam Davis <eadavis@qq.com>
> 
> [...]

Here is the summary with links:
  - [net,V9,1/2] ptp: ptp_read should not release queue
    https://git.kernel.org/netdev/net/c/b714ca2ccf6a
  - [net,V9,2/2] ptp: fix corrupted list in ptp_open
    https://git.kernel.org/netdev/net/c/1bea2c3e6df8

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] net: kcm: fill in MODULE_DESCRIPTION()
From: patchwork-bot+netdevbpf @ 2023-11-09  2:50 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, dhowells
In-Reply-To: <20231108020305.537293-1-kuba@kernel.org>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue,  7 Nov 2023 18:03:05 -0800 you wrote:
> W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: dhowells@redhat.com
> 
> I've updated the cc_maintainers test as promised at netconf.
> Let's see if it works...
> 
> [...]

Here is the summary with links:
  - [net] net: kcm: fill in MODULE_DESCRIPTION()
    https://git.kernel.org/netdev/net/c/31356547e331

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] net_sched: sch_fq: better validate TCA_FQ_WEIGHTS and TCA_FQ_PRIOMAP
From: patchwork-bot+netdevbpf @ 2023-11-09  2:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, kuba, pabeni, willemb, jhs, xiyou.wangcong, toke, jiri,
	netdev, eric.dumazet, syzkaller
In-Reply-To: <20231107160440.1992526-1-edumazet@google.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue,  7 Nov 2023 16:04:40 +0000 you wrote:
> syzbot was able to trigger the following report while providing
> too small TCA_FQ_WEIGHTS attribute [1]
> 
> Fix is to use NLA_POLICY_EXACT_LEN() to ensure user space
> provided correct sizes.
> 
> Apply the same fix to TCA_FQ_PRIOMAP.
> 
> [...]

Here is the summary with links:
  - [net] net_sched: sch_fq: better validate TCA_FQ_WEIGHTS and TCA_FQ_PRIOMAP
    https://git.kernel.org/netdev/net/c/f1a3b283f852

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [RFC PATCH v3 07/12] page-pool: device memory support
From: Mina Almasry @ 2023-11-09  3:20 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: netdev, linux-kernel, linux-arch, linux-kselftest, linux-media,
	dri-devel, linaro-mm-sig, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer,
	Ilias Apalodimas, Arnd Bergmann, David Ahern, Willem de Bruijn,
	Shuah Khan, Sumit Semwal, Christian König, Shakeel Butt,
	Jeroen de Borst, Praveen Kaligineedi
In-Reply-To: <d4309392-711a-75b0-7bf0-9e7de8fd527e@huawei.com>

On Wed, Nov 8, 2023 at 2:56 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/11/8 5:56, Mina Almasry wrote:
> > On Tue, Nov 7, 2023 at 12:00 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >>
> >> On 2023/11/6 10:44, Mina Almasry wrote:
> >>> Overload the LSB of struct page* to indicate that it's a page_pool_iov.
> >>>
> >>> Refactor mm calls on struct page* into helpers, and add page_pool_iov
> >>> handling on those helpers. Modify callers of these mm APIs with calls to
> >>> these helpers instead.
> >>>
> >>> In areas where struct page* is dereferenced, add a check for special
> >>> handling of page_pool_iov.
> >>>
> >>> Signed-off-by: Mina Almasry <almasrymina@google.com>
> >>>
> >>> ---
> >>>  include/net/page_pool/helpers.h | 74 ++++++++++++++++++++++++++++++++-
> >>>  net/core/page_pool.c            | 63 ++++++++++++++++++++--------
> >>>  2 files changed, 118 insertions(+), 19 deletions(-)
> >>>
> >>> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> >>> index b93243c2a640..08f1a2cc70d2 100644
> >>> --- a/include/net/page_pool/helpers.h
> >>> +++ b/include/net/page_pool/helpers.h
> >>> @@ -151,6 +151,64 @@ static inline struct page_pool_iov *page_to_page_pool_iov(struct page *page)
> >>>       return NULL;
> >>>  }
> >>>
> >>> +static inline int page_pool_page_ref_count(struct page *page)
> >>> +{
> >>> +     if (page_is_page_pool_iov(page))
> >>> +             return page_pool_iov_refcount(page_to_page_pool_iov(page));
> >>
> >> We have added a lot of 'if' for the devmem case, it would be better to
> >> make it more generic so that we can have more unified metadata handling
> >> for normal page and devmem. If we add another memory type here, do we
> >> need another 'if' here?
> >
> > Maybe, not sure. I'm guessing new memory types will either be pages or
> > iovs, so maybe no new if statements needed.
> >
> >> That is part of the reason I suggested using a more unified metadata for
> >> all the types of memory chunks used by page_pool.
> >
> > I think your suggestion was to use struct pages for devmem. That was
> > thoroughly considered and intensely argued about in the initial
> > conversations regarding devmem and the initial RFC, and from the
> > conclusions there it's extremely clear to me that devmem struct pages
> > are categorically a no-go.
>
> Not exactly, I was wondering if adding a more abstract structure specificly
> for page pool makes any sense, and each mem type can add its own specific
> fields, net stack only see and handle the common fields so that it does not
> care about specific mem type, and each provider only see the and handle the
> specific fields belonging to it most of the time.
>
> Ideally something like beleow:
>
> struct netmem {
>         /* common fields */
>         refcount_t refcount;
>         struct page_pool *pp;
>         ......
>
>         union {
>                 struct devmem{
>                         struct dmabuf_genpool_chunk_owner *owner;
>                 };
>
>                 struct other_mem{
>                         ...
>                         ...
>                 };
>         };
> };
>
> But untill we completely decouple the 'struct page' from the net stack,
> the above seems undoable in the near term.

Agreed everything above is undoable.

> But we might be able to do something as folio is doing now, mm subsystem
> is still seeing 'struct folio/page', but other subsystem like slab is using
> 'struct slab', and there is still some common fields shared between
> 'struct folio' and 'struct slab'.
>

In my eyes this is almost exactly what I suggested in RFC v1 and got
immediately nacked with no room to negotiate. What we did for v1 is to
allocate struct pages for dma-buf to make dma-bufs look like struct
page to mm subsystem. Almost exactly what you're describing above.
It's a no-go. I don't think renaming struct page to netmem is going to
move the needle (it also re-introduces code-churn). What I feel like I
learnt is that dma-bufs are not struct pages and can't be made to look
like one, I think.

> As the netmem patchset, is devmem able to reuse the below 'struct netmem'
> and rename it to 'struct page_pool_iov'?

I don't think so. For the reasons above, but also practically it
immediately falls apart. Consider this field in netmem:

+ * @flags: The same as the page flags.  Do not use directly.

dma-buf don't have or support page-flags, and making dma-buf looks
like they support page flags or any page-like features (other than
dma_addr) seems extremely unacceptable to mm folks.

> So that 'struct page' for normal
> memory and 'struct page_pool_iov' for devmem share the common fields used
> by page pool and net stack?

Are you suggesting that we'd cast a netmem* to a page* and call core
mm APIs on it? It's basically what was happening with RFC v1, where
things that are not struct pages were made to look like struct pages.

Also, there isn't much upside for what you're suggesting, I think. For
example I can align the refcount variable in struct page_pool_iov with
the refcount in struct page so that this works:

put_page((struct page*)ppiov);

but it's a disaster. Because put_page() will call __put_page() if the
page is freed, and __put_page() will try to return the page to the
buddy allocator!

>  And we might be able to reuse the 'flags',
> '_pp_mapping_pad' and '_mapcount' for specific mem provider, which is enough
> for the devmem only requiring a single pointer to point to it's
> owner?
>

All the above seems quite similar to RFC v1 again, using netmem
instead of struct page. In RFC v1 we re-used zone_device_data() for
the dma-buf owner equivalent.

> https://lkml.kernel.org/netdev/20230105214631.3939268-2-willy@infradead.org/
>
> +/**
> + * struct netmem - A memory allocation from a &struct page_pool.
> + * @flags: The same as the page flags.  Do not use directly.
> + * @pp_magic: Magic value to avoid recycling non page_pool allocated pages.
> + * @pp: The page pool this netmem was allocated from.
> + * @dma_addr: Call netmem_get_dma_addr() to read this value.
> + * @dma_addr_upper: Might need to be 64-bit on 32-bit architectures.
> + * @pp_frag_count: For frag page support, not supported in 32-bit
> + *   architectures with 64-bit DMA.
> + * @_mapcount: Do not access this member directly.
> + * @_refcount: Do not access this member directly.  Read it using
> + *   netmem_ref_count() and manipulate it with netmem_get() and netmem_put().
> + *
> + * This struct overlays struct page for now.  Do not modify without a
> + * good understanding of the issues.
> + */
> +struct netmem {
> +       unsigned long flags;
> +       unsigned long pp_magic;
> +       struct page_pool *pp;
> +       /* private: no need to document this padding */
> +       unsigned long _pp_mapping_pad;  /* aliases with folio->mapping */
> +       /* public: */
> +       unsigned long dma_addr;
> +       union {
> +               unsigned long dma_addr_upper;
> +               atomic_long_t pp_frag_count;
> +       };
> +       atomic_t _mapcount;
> +       atomic_t _refcount;
> +};
>
> If we do that, it seems we might be able to allow net stack and page pool to see
> the metadata for devmem chunk as 'struct page', and may be able to aovid most of
> the 'if' checking in net stack and page pool?
>
> >
> > --
> > Thanks,
> > Mina
> >
> > .
> >



-- 
Thanks,
Mina

^ permalink raw reply

* Re: [PATCH net-next 05/15] net: page_pool: record pools per netdev
From: Mina Almasry @ 2023-11-09  3:28 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, hawk, ilias.apalodimas
In-Reply-To: <20231025131740.489fdfcf@kernel.org>

On Wed, Oct 25, 2023 at 1:17 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 25 Oct 2023 12:56:44 -0700 Mina Almasry wrote:
> > > +#if IS_ENABLED(CONFIG_PAGE_POOL)
> > > +       /** @page_pools: page pools created for this netdevice */
> > > +       struct hlist_head       page_pools;
> > > +#endif
> >
> > I wonder if this per netdev field is really necessary. Is it not
> > possible to do the same simply looping over the  (global) page_pools
> > xarray? Or is that too silly of an idea. I guess on some systems you
> > may end up with 100s or 1000s of active or orphaned page pools and
> > then globally iterating over the whole page_pools xarray can be really
> > slow..
>
> I think we want the per-netdev hlist either way, on netdev
> unregistration we need to find its pools to clear the pointers.
> At which point we can as well use that to dump the pools.
>
> I don't see a strong reason to use one approach over the other.
> Note that other objects like napi and queues (WIP patches) also walk
> netdevs and dump sub-objects from them.
>

Sorry for the very late reply.

I'm not sure we need a per-netdev hlist either way, seems to me
looping over the global list and filterying by pp->netdev is the
equivalent of looping over the per-netdev hlist, I think.

The reason I was suggesting to only do the global list is because when
the same info is stored in 2 places you may end up with the info going
out of sync. Here, netdev->page_pools needs to match the entries in
each pp->netdev. But, I think your approach is more reasonable as
looping over the global array at all times may be a pain if there are
lots of page_pools on the system. I also can't find any issues with
how you're keeping netdev->page_pools & the page_pools->netdev, so
FWIW:

Reviewed-by: Mina Almasry <almasrymina@google.com>

> > > @@ -48,6 +49,7 @@ struct pp_alloc_cache {
> > >   * @pool_size: size of the ptr_ring
> > >   * @nid:       NUMA node id to allocate from pages from
> > >   * @dev:       device, for DMA pre-mapping purposes
> > > + * @netdev:    netdev this pool will serve (leave as NULL if none or multiple)
> >
> > Is this an existing use case (page_pools that serve null or multiple
> > netdevs), or a future use case? My understanding is that currently
> > page_pools serve at most 1 rx-queue. Spot checking a few drivers that
> > seems to be true.
>
> I think I saw one embedded driver for a switch-like device which has
> queues servicing all ports, and therefore netdevs.
> We'd need some help from people using such devices to figure out what
> the right way to represent them is, and what extra bits of
> functionality they need.
>
> > I'm guessing 1 is _always_ loopback?
>
> AFAIK, yes. I should probably use LOOPBACK_IFINDEX, to make it clearer.



-- 
Thanks,
Mina

^ permalink raw reply

* Re: [PATCH net-next v2 2/5] virtio-net: separate rx/tx coalescing moderation cmds
From: Jason Wang @ 2023-11-09  4:18 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, virtualization, Michael S. Tsirkin, Eric Dumazet,
	Paolo Abeni, David S. Miller, Jesper Dangaard Brouer,
	John Fastabend, Alexei Starovoitov, Simon Horman, Jakub Kicinski,
	Liu, Yujie
In-Reply-To: <a8dd10692a8291029afe283808a5b9051e1400b5.1698929590.git.hengqi@linux.alibaba.com>

On Thu, Nov 2, 2023 at 9:10 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> This patch separates the rx and tx global coalescing moderation
> commands to support netdim switches in subsequent patches.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> ---
> v1->v2:
> - Add 'i' definition.
>
>  drivers/net/virtio_net.c | 31 ++++++++++++++++++++++++++++---
>  1 file changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0ad2894e6a5e..0285301caf78 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3266,10 +3266,10 @@ static int virtnet_get_link_ksettings(struct net_device *dev,
>         return 0;
>  }
>
> -static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
> -                                      struct ethtool_coalesce *ec)
> +static int virtnet_send_tx_notf_coal_cmds(struct virtnet_info *vi,
> +                                         struct ethtool_coalesce *ec)
>  {
> -       struct scatterlist sgs_tx, sgs_rx;
> +       struct scatterlist sgs_tx;
>         int i;
>
>         vi->ctrl->coal_tx.tx_usecs = cpu_to_le32(ec->tx_coalesce_usecs);
> @@ -3289,6 +3289,15 @@ static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
>                 vi->sq[i].intr_coal.max_packets = ec->tx_max_coalesced_frames;
>         }
>
> +       return 0;
> +}
> +
> +static int virtnet_send_rx_notf_coal_cmds(struct virtnet_info *vi,
> +                                         struct ethtool_coalesce *ec)
> +{
> +       struct scatterlist sgs_rx;
> +       int i;
> +
>         vi->ctrl->coal_rx.rx_usecs = cpu_to_le32(ec->rx_coalesce_usecs);
>         vi->ctrl->coal_rx.rx_max_packets = cpu_to_le32(ec->rx_max_coalesced_frames);
>         sg_init_one(&sgs_rx, &vi->ctrl->coal_rx, sizeof(vi->ctrl->coal_rx));
> @@ -3309,6 +3318,22 @@ static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
>         return 0;
>  }
>
> +static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
> +                                      struct ethtool_coalesce *ec)
> +{
> +       int err;
> +
> +       err = virtnet_send_tx_notf_coal_cmds(vi, ec);
> +       if (err)
> +               return err;
> +
> +       err = virtnet_send_rx_notf_coal_cmds(vi, ec);
> +       if (err)
> +               return err;
> +
> +       return 0;
> +}
> +
>  static int virtnet_send_ctrl_coal_vq_cmd(struct virtnet_info *vi,
>                                          u16 vqn, u32 max_usecs, u32 max_packets)
>  {
> --
> 2.19.1.6.gb485710b
>
>


^ permalink raw reply

* Re: [PATCH net-next v2 4/5] virtio-net: support rx netdim
From: Jason Wang @ 2023-11-09  4:43 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, virtualization, Michael S. Tsirkin, Eric Dumazet,
	Paolo Abeni, David S. Miller, Jesper Dangaard Brouer,
	John Fastabend, Alexei Starovoitov, Simon Horman, Jakub Kicinski,
	Liu, Yujie
In-Reply-To: <12c77098b73313eea8fdc88a3d1d20611444827d.1698929590.git.hengqi@linux.alibaba.com>

On Thu, Nov 2, 2023 at 9:10 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> By comparing the traffic information in the complete napi processes,
> let the virtio-net driver automatically adjust the coalescing
> moderation parameters of each receive queue.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> ---
> v1->v2:
> - Improved the judgment of dim switch conditions.
> - Fix the safe problem of work thread.

Nit: it's better to be more concrete here, e.g what kind of "safe
problem" it is.


>
>  drivers/net/virtio_net.c | 187 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 165 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 69fe09e99b3c..5473aa1ee5cd 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -19,6 +19,7 @@
>  #include <linux/average.h>
>  #include <linux/filter.h>
>  #include <linux/kernel.h>
> +#include <linux/dim.h>
>  #include <net/route.h>
>  #include <net/xdp.h>
>  #include <net/net_failover.h>
> @@ -172,6 +173,17 @@ struct receive_queue {
>
>         struct virtnet_rq_stats stats;
>
> +       /* The number of rx notifications */
> +       u16 calls;
> +
> +       /* Is dynamic interrupt moderation enabled? */
> +       bool dim_enabled;
> +
> +       /* Dynamic Interrupt Moderation */
> +       struct dim dim;
> +
> +       u32 packets_in_napi;
> +
>         struct virtnet_interrupt_coalesce intr_coal;
>
>         /* Chain pages by the private ptr. */
> @@ -305,6 +317,9 @@ struct virtnet_info {
>         u8 duplex;
>         u32 speed;
>
> +       /* Is rx dynamic interrupt moderation enabled? */
> +       bool rx_dim_enabled;
> +
>         /* Interrupt coalescing settings */
>         struct virtnet_interrupt_coalesce intr_coal_tx;
>         struct virtnet_interrupt_coalesce intr_coal_rx;
> @@ -2001,6 +2016,7 @@ static void skb_recv_done(struct virtqueue *rvq)
>         struct virtnet_info *vi = rvq->vdev->priv;
>         struct receive_queue *rq = &vi->rq[vq2rxq(rvq)];
>
> +       rq->calls++;
>         virtqueue_napi_schedule(&rq->napi, rvq);
>  }
>
> @@ -2141,6 +2157,26 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
>         }
>  }
>
> +static void virtnet_rx_dim_work(struct work_struct *work);
> +
> +static void virtnet_rx_dim_update(struct virtnet_info *vi, struct receive_queue *rq)
> +{
> +       struct dim_sample cur_sample = {};
> +
> +       if (!rq->packets_in_napi)
> +               return;
> +
> +       u64_stats_update_begin(&rq->stats.syncp);
> +       dim_update_sample(rq->calls,
> +                         u64_stats_read(&rq->stats.packets),
> +                         u64_stats_read(&rq->stats.bytes),
> +                         &cur_sample);
> +       u64_stats_update_end(&rq->stats.syncp);
> +
> +       net_dim(&rq->dim, cur_sample);
> +       rq->packets_in_napi = 0;
> +}
> +
>  static int virtnet_poll(struct napi_struct *napi, int budget)
>  {
>         struct receive_queue *rq =
> @@ -2149,17 +2185,22 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
>         struct send_queue *sq;
>         unsigned int received;
>         unsigned int xdp_xmit = 0;
> +       bool napi_complete;
>
>         virtnet_poll_cleantx(rq);
>
>         received = virtnet_receive(rq, budget, &xdp_xmit);
> +       rq->packets_in_napi += received;
>
>         if (xdp_xmit & VIRTIO_XDP_REDIR)
>                 xdp_do_flush();
>
>         /* Out of packets? */
> -       if (received < budget)
> -               virtqueue_napi_complete(napi, rq->vq, received);
> +       if (received < budget) {
> +               napi_complete = virtqueue_napi_complete(napi, rq->vq, received);
> +               if (napi_complete && rq->dim_enabled)
> +                       virtnet_rx_dim_update(vi, rq);
> +       }
>
>         if (xdp_xmit & VIRTIO_XDP_TX) {
>                 sq = virtnet_xdp_get_sq(vi);
> @@ -2179,6 +2220,7 @@ static void virtnet_disable_queue_pair(struct virtnet_info *vi, int qp_index)
>         virtnet_napi_tx_disable(&vi->sq[qp_index].napi);
>         napi_disable(&vi->rq[qp_index].napi);
>         xdp_rxq_info_unreg(&vi->rq[qp_index].xdp_rxq);
> +       cancel_work_sync(&vi->rq[qp_index].dim.work);
>  }
>
>  static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
> @@ -2196,6 +2238,9 @@ static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
>         if (err < 0)
>                 goto err_xdp_reg_mem_model;
>
> +       INIT_WORK(&vi->rq[qp_index].dim.work, virtnet_rx_dim_work);

Any reason we need to do the INIT_WORK here but not probe?

> +       vi->rq[qp_index].dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
> +
>         virtnet_napi_enable(vi->rq[qp_index].vq, &vi->rq[qp_index].napi);
>         virtnet_napi_tx_enable(vi, vi->sq[qp_index].vq, &vi->sq[qp_index].napi);
>
> @@ -2393,8 +2438,10 @@ static int virtnet_rx_resize(struct virtnet_info *vi,
>
>         qindex = rq - vi->rq;
>
> -       if (running)
> +       if (running) {
>                 napi_disable(&rq->napi);
> +               cancel_work_sync(&rq->dim.work);
> +       }
>
>         err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_buf);
>         if (err)
> @@ -2403,8 +2450,10 @@ static int virtnet_rx_resize(struct virtnet_info *vi,
>         if (!try_fill_recv(vi, rq, GFP_KERNEL))
>                 schedule_delayed_work(&vi->refill, 0);
>
> -       if (running)
> +       if (running) {
> +               INIT_WORK(&rq->dim.work, virtnet_rx_dim_work);
>                 virtnet_napi_enable(rq->vq, &rq->napi);
> +       }
>         return err;
>  }
>
> @@ -3341,24 +3390,51 @@ static int virtnet_send_tx_notf_coal_cmds(struct virtnet_info *vi,
>  static int virtnet_send_rx_notf_coal_cmds(struct virtnet_info *vi,
>                                           struct ethtool_coalesce *ec)
>  {
> +       bool rx_ctrl_dim_on = !!ec->use_adaptive_rx_coalesce;
>         struct scatterlist sgs_rx;
> +       bool switch_dim, update;
>         int i;
>
> -       vi->ctrl->coal_rx.rx_usecs = cpu_to_le32(ec->rx_coalesce_usecs);
> -       vi->ctrl->coal_rx.rx_max_packets = cpu_to_le32(ec->rx_max_coalesced_frames);
> -       sg_init_one(&sgs_rx, &vi->ctrl->coal_rx, sizeof(vi->ctrl->coal_rx));
> -
> -       if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
> -                                 VIRTIO_NET_CTRL_NOTF_COAL_RX_SET,
> -                                 &sgs_rx))
> -               return -EINVAL;
> +       switch_dim = rx_ctrl_dim_on != vi->rx_dim_enabled;
> +       if (switch_dim) {
> +               if (rx_ctrl_dim_on) {
> +                       if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_VQ_NOTF_COAL)) {
> +                               vi->rx_dim_enabled = true;
> +                               for (i = 0; i < vi->max_queue_pairs; i++)
> +                                       vi->rq[i].dim_enabled = true;
> +                       } else {
> +                               return -EOPNOTSUPP;
> +                       }
> +               } else {
> +                       vi->rx_dim_enabled = false;
> +                       for (i = 0; i < vi->max_queue_pairs; i++)
> +                               vi->rq[i].dim_enabled = false;
> +               }
> +       } else {
> +               if (ec->rx_coalesce_usecs != vi->intr_coal_rx.max_usecs ||
> +                   ec->rx_max_coalesced_frames != vi->intr_coal_rx.max_packets)
> +                       update = true;
>
> -       /* Save parameters */
> -       vi->intr_coal_rx.max_usecs = ec->rx_coalesce_usecs;
> -       vi->intr_coal_rx.max_packets = ec->rx_max_coalesced_frames;
> -       for (i = 0; i < vi->max_queue_pairs; i++) {
> -               vi->rq[i].intr_coal.max_usecs = ec->rx_coalesce_usecs;
> -               vi->rq[i].intr_coal.max_packets = ec->rx_max_coalesced_frames;
> +               if (vi->rx_dim_enabled) {
> +                       if (update)
> +                               return -EINVAL;

update could be used without initialization?

Btw under what condition could we reach here?

Thanks

> +               } else {
> +                       vi->ctrl->coal_rx.rx_usecs = cpu_to_le32(ec->rx_coalesce_usecs);
> +                       vi->ctrl->coal_rx.rx_max_packets = cpu_to_le32(ec->rx_max_coalesced_frames);
> +                       sg_init_one(&sgs_rx, &vi->ctrl->coal_rx, sizeof(vi->ctrl->coal_rx));
> +
> +                       if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
> +                                                 VIRTIO_NET_CTRL_NOTF_COAL_RX_SET,
> +                                                 &sgs_rx))
> +                               return -EINVAL;
> +
> +                       vi->intr_coal_rx.max_usecs = ec->rx_coalesce_usecs;
> +                       vi->intr_coal_rx.max_packets = ec->rx_max_coalesced_frames;
> +                       for (i = 0; i < vi->max_queue_pairs; i++) {
> +                               vi->rq[i].intr_coal.max_usecs = ec->rx_coalesce_usecs;
> +                               vi->rq[i].intr_coal.max_packets = ec->rx_max_coalesced_frames;
> +                       }
> +               }
>         }
>
>         return 0;
> @@ -3380,15 +3456,54 @@ static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
>         return 0;
>  }
>
> +static int virtnet_send_rx_notf_coal_vq_cmds(struct virtnet_info *vi,
> +                                            struct ethtool_coalesce *ec,
> +                                            u16 queue)
> +{
> +       bool rx_ctrl_dim_on = !!ec->use_adaptive_rx_coalesce;
> +       bool cur_rx_dim = vi->rq[queue].dim_enabled;
> +       u32 max_usecs, max_packets;
> +       bool switch_dim, update;
> +       int err;
> +
> +       switch_dim = rx_ctrl_dim_on != cur_rx_dim;
> +       if (switch_dim) {
> +               if (rx_ctrl_dim_on)
> +                       vi->rq[queue].dim_enabled = true;
> +               else
> +                       vi->rq[queue].dim_enabled = false;
> +       } else {
> +               max_usecs = vi->rq[queue].intr_coal.max_usecs;
> +               max_packets = vi->rq[queue].intr_coal.max_packets;
> +               if (ec->rx_coalesce_usecs != max_usecs ||
> +                   ec->rx_max_coalesced_frames != max_packets)
> +                       update = true;
> +
> +               if (cur_rx_dim) {
> +                       if (update)
> +                               return -EINVAL;
> +               } else {
> +                       if (!update)
> +                               return 0;
> +
> +                       err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, queue,
> +                                                              ec->rx_coalesce_usecs,
> +                                                              ec->rx_max_coalesced_frames);
> +                       if (err)
> +                               return err;
> +               }
> +       }
> +
> +       return 0;
> +}
> +
>  static int virtnet_send_notf_coal_vq_cmds(struct virtnet_info *vi,
>                                           struct ethtool_coalesce *ec,
>                                           u16 queue)
>  {
>         int err;
>
> -       err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, queue,
> -                                              ec->rx_coalesce_usecs,
> -                                              ec->rx_max_coalesced_frames);
> +       err = virtnet_send_rx_notf_coal_vq_cmds(vi, ec, queue);
>         if (err)
>                 return err;
>
> @@ -3401,6 +3516,32 @@ static int virtnet_send_notf_coal_vq_cmds(struct virtnet_info *vi,
>         return 0;
>  }
>
> +static void virtnet_rx_dim_work(struct work_struct *work)
> +{
> +       struct dim *dim = container_of(work, struct dim, work);
> +       struct receive_queue *rq = container_of(dim,
> +                       struct receive_queue, dim);
> +       struct virtnet_info *vi = rq->vq->vdev->priv;
> +       struct net_device *dev = vi->dev;
> +       struct dim_cq_moder update_moder;
> +       int qnum = rq - vi->rq, err;
> +
> +       update_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
> +       if (update_moder.usec != vi->rq[qnum].intr_coal.max_usecs ||
> +           update_moder.pkts != vi->rq[qnum].intr_coal.max_packets) {
> +               rtnl_lock();
> +               err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
> +                                                      update_moder.usec,
> +                                                      update_moder.pkts);
> +               if (err)
> +                       pr_debug("%s: Failed to send dim parameters on rxq%d\n",
> +                                dev->name, (int)(rq - vi->rq));
> +               rtnl_unlock();
> +       }
> +
> +       dim->state = DIM_START_MEASURE;
> +}
> +
>  static int virtnet_coal_params_supported(struct ethtool_coalesce *ec)
>  {
>         /* usecs coalescing is supported only if VIRTIO_NET_F_NOTF_COAL
> @@ -3482,6 +3623,7 @@ static int virtnet_get_coalesce(struct net_device *dev,
>                 ec->tx_coalesce_usecs = vi->intr_coal_tx.max_usecs;
>                 ec->tx_max_coalesced_frames = vi->intr_coal_tx.max_packets;
>                 ec->rx_max_coalesced_frames = vi->intr_coal_rx.max_packets;
> +               ec->use_adaptive_rx_coalesce = vi->rx_dim_enabled;
>         } else {
>                 ec->rx_max_coalesced_frames = 1;
>
> @@ -3539,6 +3681,7 @@ static int virtnet_get_per_queue_coalesce(struct net_device *dev,
>                 ec->tx_coalesce_usecs = vi->sq[queue].intr_coal.max_usecs;
>                 ec->tx_max_coalesced_frames = vi->sq[queue].intr_coal.max_packets;
>                 ec->rx_max_coalesced_frames = vi->rq[queue].intr_coal.max_packets;
> +               ec->use_adaptive_rx_coalesce = vi->rq[queue].dim_enabled;
>         } else {
>                 ec->rx_max_coalesced_frames = 1;
>
> @@ -3664,7 +3807,7 @@ static int virtnet_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info)
>
>  static const struct ethtool_ops virtnet_ethtool_ops = {
>         .supported_coalesce_params = ETHTOOL_COALESCE_MAX_FRAMES |
> -               ETHTOOL_COALESCE_USECS,
> +               ETHTOOL_COALESCE_USECS | ETHTOOL_COALESCE_USE_ADAPTIVE_RX,
>         .get_drvinfo = virtnet_get_drvinfo,
>         .get_link = ethtool_op_get_link,
>         .get_ringparam = virtnet_get_ringparam,
> --
> 2.19.1.6.gb485710b
>


^ permalink raw reply

* Re: [PATCH net-next v2 5/5] virtio-net: return -EOPNOTSUPP for adaptive-tx
From: Jason Wang @ 2023-11-09  4:45 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, virtualization, Michael S. Tsirkin, Eric Dumazet,
	Paolo Abeni, David S. Miller, Jesper Dangaard Brouer,
	John Fastabend, Alexei Starovoitov, Simon Horman, Jakub Kicinski,
	Liu, Yujie
In-Reply-To: <4d57c072ca7d12034a1be4d9284e2be5988e1330.1698929590.git.hengqi@linux.alibaba.com>

On Thu, Nov 2, 2023 at 9:10 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> We do not currently support tx dim, so respond to -EOPNOTSUPP.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> ---
> v1->v2:
> - Use -EOPNOTSUPP instead of specific implementation.
>
>  drivers/net/virtio_net.c | 29 ++++++++++++++++++++++++++---
>  1 file changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 5473aa1ee5cd..03edeadd0725 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3364,9 +3364,15 @@ static int virtnet_get_link_ksettings(struct net_device *dev,
>  static int virtnet_send_tx_notf_coal_cmds(struct virtnet_info *vi,
>                                           struct ethtool_coalesce *ec)
>  {
> +       bool tx_ctrl_dim_on = !!ec->use_adaptive_tx_coalesce;
>         struct scatterlist sgs_tx;
>         int i;
>
> +       if (tx_ctrl_dim_on) {
> +               pr_debug("Failed to enable adaptive-tx, which is not supported\n");
> +               return -EOPNOTSUPP;
> +       }

When can we hit this?

Thanks

> +
>         vi->ctrl->coal_tx.tx_usecs = cpu_to_le32(ec->tx_coalesce_usecs);
>         vi->ctrl->coal_tx.tx_max_packets = cpu_to_le32(ec->tx_max_coalesced_frames);
>         sg_init_one(&sgs_tx, &vi->ctrl->coal_tx, sizeof(vi->ctrl->coal_tx));
> @@ -3497,6 +3503,25 @@ static int virtnet_send_rx_notf_coal_vq_cmds(struct virtnet_info *vi,
>         return 0;
>  }
>
> +static int virtnet_send_tx_notf_coal_vq_cmds(struct virtnet_info *vi,
> +                                            struct ethtool_coalesce *ec,
> +                                            u16 queue)
> +{
> +       bool tx_ctrl_dim_on = !!ec->use_adaptive_tx_coalesce;
> +       int err;
> +
> +       if (tx_ctrl_dim_on) {
> +               pr_debug("Enabling adaptive-tx for txq%d is not supported\n", queue);
> +               return -EOPNOTSUPP;
> +       }
> +
> +       err = virtnet_send_tx_ctrl_coal_vq_cmd(vi, queue,
> +                                              ec->tx_coalesce_usecs,
> +                                              ec->tx_max_coalesced_frames);
> +
> +       return err;
> +}
> +
>  static int virtnet_send_notf_coal_vq_cmds(struct virtnet_info *vi,
>                                           struct ethtool_coalesce *ec,
>                                           u16 queue)
> @@ -3507,9 +3532,7 @@ static int virtnet_send_notf_coal_vq_cmds(struct virtnet_info *vi,
>         if (err)
>                 return err;
>
> -       err = virtnet_send_tx_ctrl_coal_vq_cmd(vi, queue,
> -                                              ec->tx_coalesce_usecs,
> -                                              ec->tx_max_coalesced_frames);
> +       err = virtnet_send_tx_notf_coal_vq_cmds(vi, ec, queue);
>         if (err)
>                 return err;
>
> --
> 2.19.1.6.gb485710b
>


^ permalink raw reply

* [PATCH net 1/1] net: stmmac: fix MAC and phylink mismatch issue after resume with STMMAC_FLAG_USE_PHY_WOL enabled
From: Gan Yi Fang @ 2023-11-09  5:00 UTC (permalink / raw)
  To: Alexandre Torgue, Jose Abreu, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin, Russell King,
	Joakim Zhang, netdev, linux-stm32, linux-arm-kernel, linux-kernel
  Cc: Looi Hong Aun, Voon Weifeng, Song Yoong Siang, Gan Yi Fang

From: "Gan, Yi Fang" <yi.fang.gan@intel.com>

The issue happened when flag STMMAC_FLAG_USE_PHY_WOL is enabled.
It can be reproduced with steps below:
1. Advertise only one speed on the host
2. Enable the WoL on the host
3. Suspend the host
4. Wake up the host

When the WoL is disabled, both the PHY and MAC will suspend and wake up
with everything configured well. When WoL is enabled, the PHY needs to be
stay awake to receive the signal from remote client but MAC will enter
suspend mode.

When the MAC resumes from suspend, phylink_resume() will call
phylink_start() to start the phylink instance which will trigger the
phylink machine to invoke the mac_link_up callback function. The
stmmac_mac_link_up() will configure the MAC_CTRL_REG based on the current
link state. Then the stmmac_hw_setup() will be called to configure the MAC.

This sequence might cause mismatch of the link state between MAC and
phylink. This patch moves the phylink_resume() after stmamc_hw_setup() to
ensure the MAC is initialized before phylink is being configured.

As phylink_resume() is called all the time, refactor the code and
remove the redundant check.

Fixes: 90702dcd19c0 ("net: stmmac: fix MAC not working when system resume back with WoL active")
Cc: <stable@vger.kernel.org> # 5.15+
Signed-off-by: Gan, Yi Fang <yi.fang.gan@intel.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 3e50fd53a617..9b009fa5478f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -7844,16 +7844,6 @@ int stmmac_resume(struct device *dev)
 			return ret;
 	}

-	rtnl_lock();
-	if (device_may_wakeup(priv->device) && priv->plat->pmt) {
-		phylink_resume(priv->phylink);
-	} else {
-		phylink_resume(priv->phylink);
-		if (device_may_wakeup(priv->device))
-			phylink_speed_up(priv->phylink);
-	}
-	rtnl_unlock();
-
 	rtnl_lock();
 	mutex_lock(&priv->lock);

@@ -7868,6 +7858,11 @@ int stmmac_resume(struct device *dev)

 	stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw);

+	phylink_resume(priv->phylink);
+
+	if (device_may_wakeup(priv->device) && !(priv->plat->pmt))
+		phylink_speed_up(priv->phylink);
+
 	stmmac_enable_all_queues(priv);
 	stmmac_enable_all_dma_irq(priv);

-- 
2.34.1

^ permalink raw reply related

* RE: [PATCH net-next v2 1/1] net: stmmac: check CBS input values before configuration
From: Gan, Yi Fang @ 2023-11-09  5:08 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Alexandre Torgue, Jose Abreu, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
	netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, Looi, Hong Aun, Voon, Weifeng,
	Song, Yoong Siang, Sit, Michael Wei Hong
In-Reply-To: <58132260-81d0-4f0c-90b6-0c97c7a6a2f5@lunn.ch>

Hi Andrew,

The values are not specific to stmmac driver.
It is more making sense to implement the checking at higher level.
Let's close the review for this patch.


Best Regards,
Fang

> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: Wednesday, November 1, 2023 8:36 PM
> To: Gan, Yi Fang <yi.fang.gan@intel.com>
> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>; Jose Abreu
> <joabreu@synopsys.com>; David S . Miller <davem@davemloft.net>; Eric
> Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
> Abeni <pabeni@redhat.com>; Maxime Coquelin
> <mcoquelin.stm32@gmail.com>; netdev@vger.kernel.org; linux-stm32@st-md-
> mailman.stormreply.com; linux-arm-kernel@lists.infradead.org; linux-
> kernel@vger.kernel.org; Looi, Hong Aun <hong.aun.looi@intel.com>; Voon,
> Weifeng <weifeng.voon@intel.com>; Song, Yoong Siang
> <yoong.siang.song@intel.com>; Sit, Michael Wei Hong
> <michael.wei.hong.sit@intel.com>
> Subject: Re: [PATCH net-next v2 1/1] net: stmmac: check CBS input values
> before configuration
> 
> On Wed, Nov 01, 2023 at 02:19:20PM +0800, Gan Yi Fang wrote:
> > From: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
> >
> > Add check for below conditions before proceeding to configuration.
> > A message will be prompted if the input value is invalid.
> >
> > Idleslope minus sendslope should equal speed_div.
> > Idleslope is always a positive value including zero.
> > Sendslope is always a negative value including zero.
> > Hicredit is always a positive value including zero.
> > Locredit is always a negative value including zero.
> 
> Which of these conditional are specific to stmmac, and which are generic to
> CBS? Anything which is generic to CBS i would expect to be checked at a higher
> level, rather than in every driver implementing CBS.
> 
> 	Andrew

^ permalink raw reply

* [syzbot] [net?] BUG: unable to handle kernel paging request in nsim_bpf
From: syzbot @ 2023-11-09  5:10 UTC (permalink / raw)
  To: ast, bpf, daniel, davem, edumazet, hawk, john.fastabend, kuba,
	linux-kernel, netdev, pabeni, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    8de1e7afcc1c Merge branch 'for-next/core' into for-kernelci
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=158c647b680000
kernel config:  https://syzkaller.appspot.com/x/.config?x=3e6feaeda5dcbc27
dashboard link: https://syzkaller.appspot.com/bug?extid=44c2416196b7c607f226
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=104da6eb680000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14df3787680000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/0f00907f9764/disk-8de1e7af.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/0502fe78c60d/vmlinux-8de1e7af.xz
kernel image: https://storage.googleapis.com/syzbot-assets/192135168cc0/Image-8de1e7af.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+44c2416196b7c607f226@syzkaller.appspotmail.com

netdevsim netdevsim0 netdevsim1: set [1, 0] type 2 family 0 port 6081 - 0
netdevsim netdevsim0 netdevsim2: set [1, 0] type 2 family 0 port 6081 - 0
netdevsim netdevsim0 netdevsim3: set [1, 0] type 2 family 0 port 6081 - 0
Unable to handle kernel paging request at virtual address dfff800000000003
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
Mem abort info:
  ESR = 0x0000000096000005
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x05: level 1 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[dfff800000000003] address between user and kernel address ranges
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 6085 Comm: syz-executor153 Not tainted 6.6.0-rc7-syzkaller-g8de1e7afcc1c #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : nsim_setup_prog_hw_checks drivers/net/netdevsim/bpf.c:320 [inline]
pc : nsim_bpf+0x1e0/0xae0 drivers/net/netdevsim/bpf.c:562
lr : nsim_bpf+0x8c/0xae0 drivers/net/netdevsim/bpf.c:554
sp : ffff800096c67790
x29: ffff800096c677a0 x28: dfff800000000000 x27: ffff700012d8cf00
x26: dfff800000000000 x25: ffff800096c67a00 x24: 0000000000000008
x23: ffff800096c67820 x22: 0000000000000018 x21: ffff800096c67820
x20: ffff0000d3834cc0 x19: ffff0000d3834000 x18: ffff800096c67580
x17: ffff8000805c1258 x16: ffff80008030c738 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000003
x11: ffff0000d4ab3780 x10: 00000000000000bc x9 : ffff800085ce8bf0
x8 : 0000000000000003 x7 : 0000000000000000 x6 : 0000000000000000
x5 : ffff800092dee000 x4 : 0000000000000000 x3 : ffff80008030c754
x2 : 0000000000000000 x1 : ffff80009001ef50 x0 : 0000000000000001
Call trace:
 nsim_setup_prog_hw_checks drivers/net/netdevsim/bpf.c:320 [inline]
 nsim_bpf+0x1e0/0xae0 drivers/net/netdevsim/bpf.c:562
 dev_xdp_install+0x124/0x2f0 net/core/dev.c:9199
 dev_xdp_attach+0xa4c/0xcc8 net/core/dev.c:9351
 dev_xdp_attach_link net/core/dev.c:9370 [inline]
 bpf_xdp_link_attach+0x300/0x710 net/core/dev.c:9540
 link_create+0x2c0/0x68c kernel/bpf/syscall.c:4954
 __sys_bpf+0x4d4/0x5dc kernel/bpf/syscall.c:5414
 __do_sys_bpf kernel/bpf/syscall.c:5448 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:5446 [inline]
 __arm64_sys_bpf+0x80/0x98 kernel/bpf/syscall.c:5446
 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
 invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51
 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136
 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155
 el0_svc+0x54/0x158 arch/arm64/kernel/entry-common.c:678
 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:696
 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:595
Code: 96b3720d f94002c8 91006116 d343fec8 (387a6908) 
---[ end trace 0000000000000000 ]---
----------------
Code disassembly (best guess):
   0:	96b3720d 	bl	0xfffffffffacdc834
   4:	f94002c8 	ldr	x8, [x22]
   8:	91006116 	add	x22, x8, #0x18
   c:	d343fec8 	lsr	x8, x22, #3
* 10:	387a6908 	ldrb	w8, [x8, x26] <-- trapping instruction


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply

* [PATCH net-next 1/1] net: stmmac: Add support for HW-accelerated VLAN stripping
From: Gan Yi Fang @ 2023-11-09  5:38 UTC (permalink / raw)
  To: Alexandre Torgue, Jose Abreu, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin, Russell King,
	Andrew Halaney, Simon Horman, Bartosz Golaszewski, Shenwei Wang,
	Russell King, Johannes Zink, Jochen Henneberg, netdev,
	linux-stm32, linux-arm-kernel, linux-kernel
  Cc: Looi Hong Aun, Voon Weifeng, Song Yoong Siang, Gan Yi Fang

From: "Gan, Yi Fang" <yi.fang.gan@intel.com>

Current implementation supports driver level VLAN tag stripping only.
The features is always on if CONFIG_VLAN_8021Q is enabled in kernel
config and is not user configurable.

This patch add support to MAC level VLAN tag stripping and can be
configured through ethtool. If the rx-vlan-offload is off, the VLAN tag
will be stripped by driver. If the rx-vlan-offload is on, the VLAN tag
will be stripped by MAC.

Command: ethtool -K enp0s30f4 rx-vlan-offload off | on

Signed-off-by: Gan, Yi Fang <yi.fang.gan@intel.com>
Signed-off-by: Lai Peter Jun Ann <jun.ann.lai@intel.com>
---
 drivers/net/ethernet/stmicro/stmmac/common.h  |  1 +
 .../net/ethernet/stmicro/stmmac/dwmac-intel.c |  3 +-
 .../net/ethernet/stmicro/stmmac/dwmac4_core.c | 35 +++++++++++++++++++
 .../ethernet/stmicro/stmmac/dwmac4_descs.c    | 13 +++++++
 drivers/net/ethernet/stmicro/stmmac/hwif.h    | 15 ++++++++
 .../net/ethernet/stmicro/stmmac/stmmac_main.c | 24 ++++++++++++-
 include/linux/stmmac.h                        |  1 +
 7 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index e3f650e88f82..6b935922054d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -580,6 +580,7 @@ struct mac_device_info {
 	u32 vlan_filter[32];
 	bool vlan_fail_q_en;
 	u8 vlan_fail_q;
+	bool hw_vlan_en;
 };
 
 struct stmmac_rx_routing {
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
index 60283543ffc8..651fee867aac 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
@@ -457,7 +457,8 @@ static int intel_mgbe_common_data(struct pci_dev *pdev,
 	plat->has_gmac = 0;
 	plat->has_gmac4 = 1;
 	plat->force_sf_dma_mode = 0;
-	plat->flags |= (STMMAC_FLAG_TSO_EN | STMMAC_FLAG_SPH_DISABLE);
+	plat->flags |= (STMMAC_FLAG_TSO_EN | STMMAC_FLAG_SPH_DISABLE |
+			STMMAC_FLAG_HW_VLAN_EN);
 
 	/* Multiplying factor to the clk_eee_i clock time
 	 * period to make it closer to 100 ns. This value
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index c6ff1fa0e04d..6a8d7873b456 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -1134,6 +1134,35 @@ static int dwmac4_config_l4_filter(struct mac_device_info *hw, u32 filter_no,
 	return 0;
 }
 
+static void dwmac4_rx_hw_vlan(struct mac_device_info *hw,
+			      struct dma_desc *rx_desc, struct sk_buff *skb)
+{
+	if (hw->desc->get_rx_vlan_valid(rx_desc)) {
+		u16 vid = (u16)hw->desc->get_rx_vlan_tci(rx_desc);
+
+		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vid);
+	}
+}
+
+static void dwmac4_set_hw_vlan_mode(struct mac_device_info *hw)
+{
+	void __iomem *ioaddr = hw->pcsr;
+	u32 value = readl(ioaddr + GMAC_VLAN_TAG);
+
+	value &= ~GMAC_VLAN_TAG_CTRL_EVLS_MASK;
+
+	if (hw->hw_vlan_en)
+		/* Always strip VLAN on Receive */
+		value |= GMAC_VLAN_TAG_STRIP_ALL;
+	else
+		/* Do not strip VLAN on Receive */
+		value |= GMAC_VLAN_TAG_STRIP_NONE;
+
+	/* Enable outer VLAN Tag in Rx DMA descriptor */
+	value |= GMAC_VLAN_TAG_CTRL_EVLRXS;
+	writel(value, ioaddr + GMAC_VLAN_TAG);
+}
+
 const struct stmmac_ops dwmac4_ops = {
 	.core_init = dwmac4_core_init,
 	.phylink_get_caps = dwmac4_phylink_get_caps,
@@ -1175,6 +1204,8 @@ const struct stmmac_ops dwmac4_ops = {
 	.add_hw_vlan_rx_fltr = dwmac4_add_hw_vlan_rx_fltr,
 	.del_hw_vlan_rx_fltr = dwmac4_del_hw_vlan_rx_fltr,
 	.restore_hw_vlan_rx_fltr = dwmac4_restore_hw_vlan_rx_fltr,
+	.rx_hw_vlan = dwmac4_rx_hw_vlan,
+	.set_hw_vlan_mode = dwmac4_set_hw_vlan_mode,
 };
 
 const struct stmmac_ops dwmac410_ops = {
@@ -1224,6 +1255,8 @@ const struct stmmac_ops dwmac410_ops = {
 	.add_hw_vlan_rx_fltr = dwmac4_add_hw_vlan_rx_fltr,
 	.del_hw_vlan_rx_fltr = dwmac4_del_hw_vlan_rx_fltr,
 	.restore_hw_vlan_rx_fltr = dwmac4_restore_hw_vlan_rx_fltr,
+	.rx_hw_vlan = dwmac4_rx_hw_vlan,
+	.set_hw_vlan_mode = dwmac4_set_hw_vlan_mode,
 };
 
 const struct stmmac_ops dwmac510_ops = {
@@ -1277,6 +1310,8 @@ const struct stmmac_ops dwmac510_ops = {
 	.add_hw_vlan_rx_fltr = dwmac4_add_hw_vlan_rx_fltr,
 	.del_hw_vlan_rx_fltr = dwmac4_del_hw_vlan_rx_fltr,
 	.restore_hw_vlan_rx_fltr = dwmac4_restore_hw_vlan_rx_fltr,
+	.rx_hw_vlan = dwmac4_rx_hw_vlan,
+	.set_hw_vlan_mode = dwmac4_set_hw_vlan_mode,
 };
 
 static u32 dwmac4_get_num_vlan(void __iomem *ioaddr)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
index 89a14084c611..a01d71dfed6c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
@@ -198,6 +198,17 @@ static int dwmac4_get_tx_ls(struct dma_desc *p)
 		>> TDES3_LAST_DESCRIPTOR_SHIFT;
 }
 
+static inline int dwmac4_wrback_get_rx_vlan_tci(struct dma_desc *p)
+{
+	return (le32_to_cpu(p->des0) & RDES0_VLAN_TAG_MASK);
+}
+
+static inline bool dwmac4_wrback_get_rx_vlan_valid(struct dma_desc *p)
+{
+	return ((le32_to_cpu(p->des3) & RDES3_LAST_DESCRIPTOR) &&
+		(le32_to_cpu(p->des3) & RDES3_RDES0_VALID));
+}
+
 static int dwmac4_wrback_get_rx_frame_len(struct dma_desc *p, int rx_coe)
 {
 	return (le32_to_cpu(p->des3) & RDES3_PACKET_SIZE_MASK);
@@ -551,6 +562,8 @@ const struct stmmac_desc_ops dwmac4_desc_ops = {
 	.set_tx_owner = dwmac4_set_tx_owner,
 	.set_rx_owner = dwmac4_set_rx_owner,
 	.get_tx_ls = dwmac4_get_tx_ls,
+	.get_rx_vlan_tci = dwmac4_wrback_get_rx_vlan_tci,
+	.get_rx_vlan_valid = dwmac4_wrback_get_rx_vlan_valid,
 	.get_rx_frame_len = dwmac4_wrback_get_rx_frame_len,
 	.enable_tx_timestamp = dwmac4_rd_enable_tx_timestamp,
 	.get_tx_timestamp_status = dwmac4_wrback_get_tx_timestamp_status,
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.h b/drivers/net/ethernet/stmicro/stmmac/hwif.h
index b95d3e137813..5a079dae1380 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.h
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.h
@@ -56,6 +56,10 @@ struct stmmac_desc_ops {
 	void (*set_tx_ic)(struct dma_desc *p);
 	/* Last tx segment reports the transmit status */
 	int (*get_tx_ls)(struct dma_desc *p);
+	/* Get the tag of the descriptor */
+	int (*get_rx_vlan_tci)(struct dma_desc *p);
+	/* Get the valid status of descriptor */
+	bool (*get_rx_vlan_valid)(struct dma_desc *p);
 	/* Return the transmit status looking at the TDES1 */
 	int (*tx_status)(struct stmmac_extra_stats *x,
 			 struct dma_desc *p, void __iomem *ioaddr);
@@ -117,6 +121,10 @@ struct stmmac_desc_ops {
 	stmmac_do_void_callback(__priv, desc, set_tx_ic, __args)
 #define stmmac_get_tx_ls(__priv, __args...) \
 	stmmac_do_callback(__priv, desc, get_tx_ls, __args)
+#define stmmac_get_rx_vlan_tci(__priv, __args...) \
+	stmmac_do_callback(__priv, desc, get_rx_vlan_tci, __args)
+#define stmmac_get_rx_vlan_valid(__priv, __args...) \
+	stmmac_do_callback(__priv, desc, get_rx_vlan_valid, __args)
 #define stmmac_tx_status(__priv, __args...) \
 	stmmac_do_callback(__priv, desc, tx_status, __args)
 #define stmmac_get_tx_len(__priv, __args...) \
@@ -388,6 +396,9 @@ struct stmmac_ops {
 	void (*update_vlan_hash)(struct mac_device_info *hw, u32 hash,
 				 __le16 perfect_match, bool is_double);
 	void (*enable_vlan)(struct mac_device_info *hw, u32 type);
+	void (*rx_hw_vlan)(struct mac_device_info *hw, struct dma_desc *rx_desc,
+			   struct sk_buff *skb);
+	void (*set_hw_vlan_mode)(struct mac_device_info *hw);
 	int (*add_hw_vlan_rx_fltr)(struct net_device *dev,
 				   struct mac_device_info *hw,
 				   __be16 proto, u16 vid);
@@ -497,6 +508,10 @@ struct stmmac_ops {
 	stmmac_do_void_callback(__priv, mac, update_vlan_hash, __args)
 #define stmmac_enable_vlan(__priv, __args...) \
 	stmmac_do_void_callback(__priv, mac, enable_vlan, __args)
+#define stmmac_rx_hw_vlan(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, rx_hw_vlan, __args)
+#define stmmac_set_hw_vlan_mode(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, set_hw_vlan_mode, __args)
 #define stmmac_add_hw_vlan_rx_fltr(__priv, __args...) \
 	stmmac_do_callback(__priv, mac, add_hw_vlan_rx_fltr, __args)
 #define stmmac_del_hw_vlan_rx_fltr(__priv, __args...) \
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 3e50fd53a617..62299ec5179f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3469,6 +3469,9 @@ static int stmmac_hw_setup(struct net_device *dev, bool ptp_register)
 	/* Start the ball rolling... */
 	stmmac_start_all_dma(priv);
 
+	if (priv->hw->hw_vlan_en)
+		stmmac_set_hw_vlan_mode(priv, priv->hw);
+
 	if (priv->dma_cap.fpesel) {
 		stmmac_fpe_start_wq(priv);
 
@@ -5508,7 +5511,14 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
 		/* Got entire packet into SKB. Finish it. */
 
 		stmmac_get_rx_hwtstamp(priv, p, np, skb);
-		stmmac_rx_vlan(priv->dev, skb);
+
+		if (priv->hw->hw_vlan_en)
+			/* MAC level stripping. */
+			stmmac_rx_hw_vlan(priv, priv->hw, p, skb);
+		else
+			/* Driver level stripping. */
+			stmmac_rx_vlan(priv->dev, skb);
+
 		skb->protocol = eth_type_trans(skb, priv->dev);
 
 		if (unlikely(!coe))
@@ -5817,6 +5827,14 @@ static int stmmac_set_features(struct net_device *netdev,
 			stmmac_enable_sph(priv, priv->ioaddr, sph_en, chan);
 	}
 
+	if ((features & NETIF_F_HW_VLAN_CTAG_RX) &&
+	    (priv->plat->flags & STMMAC_FLAG_HW_VLAN_EN))
+		priv->hw->hw_vlan_en = true;
+	else
+		priv->hw->hw_vlan_en = false;
+
+	stmmac_set_hw_vlan_mode(priv, priv->hw);
+
 	return 0;
 }
 
@@ -7146,6 +7164,8 @@ static int stmmac_hw_init(struct stmmac_priv *priv)
 			 "Enable RX Mitigation via HW Watchdog Timer\n");
 	}
 
+	priv->hw->hw_vlan_en = (priv->plat->flags & STMMAC_FLAG_HW_VLAN_EN);
+
 	return 0;
 }
 
@@ -7515,6 +7535,8 @@ int stmmac_dvr_probe(struct device *device,
 #ifdef STMMAC_VLAN_TAG_USED
 	/* Both mac100 and gmac support receive VLAN tag detection */
 	ndev->features |= NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_HW_VLAN_STAG_RX;
+	ndev->hw_features |= NETIF_F_HW_VLAN_CTAG_RX;
+
 	if (priv->dma_cap.vlhash) {
 		ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
 		ndev->features |= NETIF_F_HW_VLAN_STAG_FILTER;
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 0b4658a7eceb..1cf78e6bca5e 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -220,6 +220,7 @@ struct dwmac4_addrs {
 #define STMMAC_FLAG_RX_CLK_RUNS_IN_LPI		BIT(10)
 #define STMMAC_FLAG_EN_TX_LPI_CLOCKGATING	BIT(11)
 #define STMMAC_FLAG_HWTSTAMP_CORRECT_LATENCY	BIT(12)
+#define STMMAC_FLAG_HW_VLAN_EN			BIT(13)
 
 struct plat_stmmacenet_data {
 	int bus_id;
-- 
2.34.1


^ permalink raw reply related

* [PATCH v4] net: mvneta: fix calls to page_pool_get_stats
From: Sven Auhagen @ 2023-11-09  5:59 UTC (permalink / raw)
  To: netdev
  Cc: thomas.petazzoni, brouer, lorenzo, ilias.apalodimas, mcroce, leon,
	kuba

Calling page_pool_get_stats in the mvneta driver without checks
leads to kernel crashes.
First the page pool is only available if the bm is not used.
The page pool is also not allocated when the port is stopped.
It can also be not allocated in case of errors.

The current implementation leads to the following crash calling
ethstats on a port that is down or when calling it at the wrong moment:

ble to handle kernel NULL pointer dereference at virtual address 00000070
[00000070] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Hardware name: Marvell Armada 380/385 (Device Tree)
PC is at page_pool_get_stats+0x18/0x1cc
LR is at mvneta_ethtool_get_stats+0xa0/0xe0 [mvneta]
pc : [<c0b413cc>]    lr : [<bf0a98d8>]    psr: a0000013
sp : f1439d48  ip : f1439dc0  fp : 0000001d
r10: 00000100  r9 : c4816b80  r8 : f0d75150
r7 : bf0b400c  r6 : c238f000  r5 : 00000000  r4 : f1439d68
r3 : c2091040  r2 : ffffffd8  r1 : f1439d68  r0 : 00000000
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 066b004a  DAC: 00000051
Register r0 information: NULL pointer
Register r1 information: 2-page vmalloc region starting at 0xf1438000 allocated at kernel_clone+0x9c/0x390
Register r2 information: non-paged memory
Register r3 information: slab kmalloc-2k start c2091000 pointer offset 64 size 2048
Register r4 information: 2-page vmalloc region starting at 0xf1438000 allocated at kernel_clone+0x9c/0x390
Register r5 information: NULL pointer
Register r6 information: slab kmalloc-cg-4k start c238f000 pointer offset 0 size 4096
Register r7 information: 15-page vmalloc region starting at 0xbf0a8000 allocated at load_module+0xa30/0x219c
Register r8 information: 1-page vmalloc region starting at 0xf0d75000 allocated at ethtool_get_stats+0x138/0x208
Register r9 information: slab task_struct start c4816b80 pointer offset 0
Register r10 information: non-paged memory
Register r11 information: non-paged memory
Register r12 information: 2-page vmalloc region starting at 0xf1438000 allocated at kernel_clone+0x9c/0x390
Process snmpd (pid: 733, stack limit = 0x38de3a88)
Stack: (0xf1439d48 to 0xf143a000)
9d40:                   000000c0 00000001 c238f000 bf0b400c f0d75150 c4816b80
9d60: 00000100 bf0a98d8 00000000 00000000 00000000 00000000 00000000 00000000
9d80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9da0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9dc0: 00000dc0 5335509c 00000035 c238f000 bf0b2214 01067f50 f0d75000 c0b9b9c8
9de0: 0000001d 00000035 c2212094 5335509c c4816b80 c238f000 c5ad6e00 01067f50
9e00: c1b0be80 c4816b80 00014813 c0b9d7f0 00000000 00000000 0000001d 0000001d
9e20: 00000000 00001200 00000000 00000000 c216ed90 c73943b8 00000000 00000000
9e40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9e60: 00000000 c0ad9034 00000000 00000000 00000000 00000000 00000000 00000000
9e80: 00000000 00000000 00000000 5335509c c1b0be80 f1439ee4 00008946 c1b0be80
9ea0: 01067f50 f1439ee3 00000000 00000046 b6d77ae0 c0b383f0 00008946 becc83e8
9ec0: c1b0be80 00000051 0000000b c68ca480 c7172d00 c0ad8ff0 f1439ee3 cf600e40
9ee0: 01600e40 32687465 00000000 00000000 00000000 01067f50 00000000 00000000
9f00: 00000000 5335509c 00008946 00008946 00000000 c68ca480 becc83e8 c05e2de0
9f20: f1439fb0 c03002f0 00000006 5ac3c35a c4816b80 00000006 b6d77ae0 c030caf0
9f40: c4817350 00000014 f1439e1c 0000000c 00000000 00000051 01000000 00000014
9f60: 00003fec f1439edc 00000001 c0372abc b6d77ae0 c0372abc cf600e40 5335509c
9f80: c21e6800 01015c9c 0000000b 00008946 00000036 c03002f0 c4816b80 00000036
9fa0: b6d77ae0 c03000c0 01015c9c 0000000b 0000000b 00008946 becc83e8 00000000
9fc0: 01015c9c 0000000b 00008946 00000036 00000035 010678a0 b6d797ec b6d77ae0
9fe0: b6dbf738 becc838c b6d186d7 b6baa858 40000030 0000000b 00000000 00000000
 page_pool_get_stats from mvneta_ethtool_get_stats+0xa0/0xe0 [mvneta]
 mvneta_ethtool_get_stats [mvneta] from ethtool_get_stats+0x154/0x208
 ethtool_get_stats from dev_ethtool+0xf48/0x2480
 dev_ethtool from dev_ioctl+0x538/0x63c
 dev_ioctl from sock_ioctl+0x49c/0x53c
 sock_ioctl from sys_ioctl+0x134/0xbd8
 sys_ioctl from ret_fast_syscall+0x0/0x1c
Exception stack(0xf1439fa8 to 0xf1439ff0)
9fa0:                   01015c9c 0000000b 0000000b 00008946 becc83e8 00000000
9fc0: 01015c9c 0000000b 00008946 00000036 00000035 010678a0 b6d797ec b6d77ae0
9fe0: b6dbf738 becc838c b6d186d7 b6baa858
Code: e28dd004 e1a05000 e2514000 0a00006a (e5902070)

This commit adds the proper checks before calling page_pool_get_stats.

Fixes: b3fc79225f05 ("net: mvneta: add support for page_pool_get_stats")
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Reported-by: Paulo Da Silva <Paulo.DaSilva@kyberna.com>
---

Change from v3:
	* Move the page pool check back to mvneta

Change from v2:
	* Fix the fixes tag

Change from v1:
	* Add cover letter
	* Move the page pool check in mvneta to the ethtool stats
	  function

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 8b0f12a0e0f2..bbb5d972657a 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -4734,13 +4734,16 @@ static void mvneta_ethtool_get_strings(struct net_device *netdev, u32 sset,
 {
 	if (sset == ETH_SS_STATS) {
 		int i;
+		struct mvneta_port *pp = netdev_priv(netdev);
 
 		for (i = 0; i < ARRAY_SIZE(mvneta_statistics); i++)
 			memcpy(data + i * ETH_GSTRING_LEN,
 			       mvneta_statistics[i].name, ETH_GSTRING_LEN);
 
-		data += ETH_GSTRING_LEN * ARRAY_SIZE(mvneta_statistics);
-		page_pool_ethtool_stats_get_strings(data);
+		if (!pp->bm_priv) {
+			data += ETH_GSTRING_LEN * ARRAY_SIZE(mvneta_statistics);
+			page_pool_ethtool_stats_get_strings(data);
+		}
 	}
 }
 
@@ -4858,8 +4861,10 @@ static void mvneta_ethtool_pp_stats(struct mvneta_port *pp, u64 *data)
 	struct page_pool_stats stats = {};
 	int i;
 
-	for (i = 0; i < rxq_number; i++)
-		page_pool_get_stats(pp->rxqs[i].page_pool, &stats);
+	for (i = 0; i < rxq_number; i++) {
+		if (pp->rxqs[i].page_pool)
+			page_pool_get_stats(pp->rxqs[i].page_pool, &stats);
+	}
 
 	page_pool_ethtool_stats_get(data, &stats);
 }
@@ -4875,14 +4880,21 @@ static void mvneta_ethtool_get_stats(struct net_device *dev,
 	for (i = 0; i < ARRAY_SIZE(mvneta_statistics); i++)
 		*data++ = pp->ethtool_stats[i];
 
-	mvneta_ethtool_pp_stats(pp, data);
+	if (!pp->bm_priv && !pp->is_stopped)
+		mvneta_ethtool_pp_stats(pp, data);
 }
 
 static int mvneta_ethtool_get_sset_count(struct net_device *dev, int sset)
 {
-	if (sset == ETH_SS_STATS)
-		return ARRAY_SIZE(mvneta_statistics) +
-		       page_pool_ethtool_stats_get_count();
+	if (sset == ETH_SS_STATS) {
+		int count = ARRAY_SIZE(mvneta_statistics);
+		struct mvneta_port *pp = netdev_priv(dev);
+
+		if (!pp->bm_priv)
+			count += page_pool_ethtool_stats_get_count();
+
+		return count;
+	}
 
 	return -EOPNOTSUPP;
 }
-- 
2.42.0


^ permalink raw reply related

* Re: [PATCH net-next v2 04/21] virtio_net: move core structures to virtio_net.h
From: Jason Wang @ 2023-11-09  6:03 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michael S. Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	virtualization, bpf
In-Reply-To: <20231107031227.100015-5-xuanzhuo@linux.alibaba.com>

On Tue, Nov 7, 2023 at 11:13 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Move some core structures (send_queue, receive_queue, virtnet_info)
> definitions and the relative structures definitions into the
> virtio_net.h file.
>
> That will be used by the other c code files.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


^ permalink raw reply

* Re: [PATCH net-next v2 05/21] virtio_net: add prefix virtnet to all struct inside virtio_net.h
From: Jason Wang @ 2023-11-09  6:04 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michael S. Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	virtualization, bpf
In-Reply-To: <20231107031227.100015-6-xuanzhuo@linux.alibaba.com>

On Tue, Nov 7, 2023 at 11:13 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> We move some structures to the header file, but these structures do not
> prefixed with virtnet. This patch adds virtnet for these.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


^ permalink raw reply

* Re: [PATCH net-next v2 08/21] virtio_net: sq support premapped mode
From: Jason Wang @ 2023-11-09  6:37 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michael S. Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	virtualization, bpf
In-Reply-To: <20231107031227.100015-9-xuanzhuo@linux.alibaba.com>

On Tue, Nov 7, 2023 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> If the xsk is enabling, the xsk tx will share the send queue.
> But the xsk requires that the send queue use the premapped mode.
> So the send queue must support premapped mode.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio/main.c       | 163 ++++++++++++++++++++++++++++----
>  drivers/net/virtio/virtio_net.h |  16 ++++
>  2 files changed, 163 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> index 16e75c08639e..f052db459156 100644
> --- a/drivers/net/virtio/main.c
> +++ b/drivers/net/virtio/main.c
> @@ -46,6 +46,7 @@ module_param(napi_tx, bool, 0644);
>  #define VIRTIO_XDP_REDIR       BIT(1)
>
>  #define VIRTIO_XDP_FLAG        BIT(0)
> +#define VIRTIO_XMIT_DATA_MASK (VIRTIO_XDP_FLAG)
>
>  #define VIRTNET_DRIVER_VERSION "1.0.0"
>
> @@ -167,6 +168,29 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
>         return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
>  }
>
> +static inline void *virtnet_sq_unmap(struct virtnet_sq *sq, void *data)
> +{
> +       struct virtnet_sq_dma *next, *head;
> +
> +       head = (void *)((unsigned long)data & ~VIRTIO_XMIT_DATA_MASK);
> +
> +       data = head->data;
> +
> +       while (head) {
> +               virtqueue_dma_unmap_single_attrs(sq->vq, head->addr, head->len,
> +                                                DMA_TO_DEVICE, 0);
> +
> +               next = head->next;
> +
> +               head->next = sq->dmainfo.free;
> +               sq->dmainfo.free = head;
> +
> +               head = next;
> +       }
> +
> +       return data;
> +}
> +
>  static void __free_old_xmit(struct virtnet_sq *sq, bool in_napi,
>                             u64 *bytes, u64 *packets)
>  {
> @@ -175,14 +199,24 @@ static void __free_old_xmit(struct virtnet_sq *sq, bool in_napi,
>
>         while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>                 if (!is_xdp_frame(ptr)) {
> -                       struct sk_buff *skb = ptr;
> +                       struct sk_buff *skb;
> +
> +                       if (sq->do_dma)
> +                               ptr = virtnet_sq_unmap(sq, ptr);
> +
> +                       skb = ptr;
>
>                         pr_debug("Sent skb %p\n", skb);
>
>                         *bytes += skb->len;
>                         napi_consume_skb(skb, in_napi);
>                 } else {
> -                       struct xdp_frame *frame = ptr_to_xdp(ptr);
> +                       struct xdp_frame *frame;
> +
> +                       if (sq->do_dma)
> +                               ptr = virtnet_sq_unmap(sq, ptr);
> +
> +                       frame = ptr_to_xdp(ptr);
>
>                         *bytes += xdp_get_frame_len(frame);
>                         xdp_return_frame(frame);
> @@ -567,22 +601,104 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
>         return buf;
>  }
>
> -static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> +static int virtnet_sq_set_premapped(struct virtnet_sq *sq)
>  {
> -       int i;
> +       struct virtnet_sq_dma *d;
> +       int err, size, i;
>
> -       /* disable for big mode */
> -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> -               return;
> +       size = virtqueue_get_vring_size(sq->vq);
> +
> +       size += MAX_SKB_FRAGS + 2;

Btw, the dmainfo seems per sg? If I'm correct, how can vq_size +
MAX_SKB_FRAGS + 2 work?

> +
> +       sq->dmainfo.head = kcalloc(size, sizeof(*sq->dmainfo.head), GFP_KERNEL);
> +       if (!sq->dmainfo.head)
> +               return -ENOMEM;
> +
> +       err = virtqueue_set_dma_premapped(sq->vq);
> +       if (err) {
> +               kfree(sq->dmainfo.head);
> +               return err;
> +       }

Allocating after set_dma_premapped() seems easier.

Btw, is there a benchmark of TX PPS just for this patch to demonstrate
the impact of the performance?

> +
> +       sq->dmainfo.free = NULL;
> +
> +       sq->do_dma = true;
> +
> +       for (i = 0; i < size; ++i) {
> +               d = &sq->dmainfo.head[i];
> +
> +               d->next = sq->dmainfo.free;
> +               sq->dmainfo.free = d;
> +       }
> +
> +       return 0;
> +}
> +
> +static void virtnet_set_premapped(struct virtnet_info *vi)
> +{
> +       int i;
>
>         for (i = 0; i < vi->max_queue_pairs; i++) {
> -               if (virtqueue_set_dma_premapped(vi->rq[i].vq))
> -                       continue;
> +               virtnet_sq_set_premapped(&vi->sq[i]);
>
> -               vi->rq[i].do_dma = true;
> +               /* disable for big mode */
> +               if (vi->mergeable_rx_bufs || !vi->big_packets) {
> +                       if (!virtqueue_set_dma_premapped(vi->rq[i].vq))
> +                               vi->rq[i].do_dma = true;

How about sticking a virtnet_rq_set_premapped() and calling it here?

It seems more clean.

Btw, the big mode support for pre mapping is still worthwhile
regardless whether or not XDP is supported. It has a page pool so we
can avoid redundant DMA map/unmap there.

> +               }
>         }
>  }
>
> +static struct virtnet_sq_dma *virtnet_sq_map_sg(struct virtnet_sq *sq, int nents, void *data)
> +{
> +       struct virtnet_sq_dma *d, *head;
> +       struct scatterlist *sg;
> +       int i;
> +
> +       head = NULL;
> +
> +       for_each_sg(sq->sg, sg, nents, i) {
> +               sg->dma_address = virtqueue_dma_map_single_attrs(sq->vq, sg_virt(sg),
> +                                                                sg->length,
> +                                                                DMA_TO_DEVICE, 0);
> +               if (virtqueue_dma_mapping_error(sq->vq, sg->dma_address))
> +                       goto err;
> +
> +               d = sq->dmainfo.free;
> +               sq->dmainfo.free = d->next;
> +
> +               d->addr = sg->dma_address;
> +               d->len = sg->length;
> +
> +               d->next = head;
> +               head = d;
> +       }
> +
> +       head->data = data;
> +
> +       return (void *)((unsigned long)head | ((unsigned long)data & VIRTIO_XMIT_DATA_MASK));

So head contains a pointer to data, any reason we still need to pack a
data pointer here?


> +err:
> +       virtnet_sq_unmap(sq, head);
> +       return NULL;
> +}
> +
> +static int virtnet_add_outbuf(struct virtnet_sq *sq, u32 num, void *data)
> +{
> +       int ret;
> +
> +       if (sq->do_dma) {
> +               data = virtnet_sq_map_sg(sq, num, data);
> +               if (!data)
> +                       return -ENOMEM;
> +       }
> +
> +       ret = virtqueue_add_outbuf(sq->vq, sq->sg, num, data, GFP_ATOMIC);
> +       if (ret && sq->do_dma)
> +               virtnet_sq_unmap(sq, data);
> +
> +       return ret;
> +}
> +
>  static void free_old_xmit(struct virtnet_sq *sq, bool in_napi)
>  {
>         u64 bytes, packets = 0;
> @@ -686,8 +802,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
>                             skb_frag_size(frag), skb_frag_off(frag));
>         }
>
> -       err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
> -                                  xdp_to_ptr(xdpf), GFP_ATOMIC);
> +       err = virtnet_add_outbuf(sq, nr_frags + 1, xdp_to_ptr(xdpf));
>         if (unlikely(err))
>                 return -ENOSPC; /* Caller handle free/refcnt */
>
> @@ -2126,7 +2241,8 @@ static int xmit_skb(struct virtnet_sq *sq, struct sk_buff *skb)
>                         return num_sg;
>                 num_sg++;
>         }
> -       return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> +
> +       return virtnet_add_outbuf(sq, num_sg, skb);
>  }
>
>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> @@ -3818,6 +3934,8 @@ static void virtnet_free_queues(struct virtnet_info *vi)
>         for (i = 0; i < vi->max_queue_pairs; i++) {
>                 __netif_napi_del(&vi->rq[i].napi);
>                 __netif_napi_del(&vi->sq[i].napi);
> +
> +               kfree(vi->sq[i].dmainfo.head);
>         }
>
>         /* We called __netif_napi_del(),
> @@ -3866,10 +3984,23 @@ static void free_receive_page_frags(struct virtnet_info *vi)
>
>  static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf)
>  {
> -       if (!is_xdp_frame(buf))
> +       struct virtnet_info *vi = vq->vdev->priv;
> +       struct virtnet_sq *sq;
> +       int i = vq2rxq(vq);
> +
> +       sq = &vi->sq[i];
> +
> +       if (!is_xdp_frame(buf)) {
> +               if (sq->do_dma)
> +                       buf = virtnet_sq_unmap(sq, buf);
> +
>                 dev_kfree_skb(buf);
> -       else
> +       } else {
> +               if (sq->do_dma)
> +                       buf = virtnet_sq_unmap(sq, buf);
> +
>                 xdp_return_frame(ptr_to_xdp(buf));
> +       }
>  }
>
>  static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf)
> @@ -4075,7 +4206,7 @@ static int init_vqs(struct virtnet_info *vi)
>         if (ret)
>                 goto err_free;
>
> -       virtnet_rq_set_premapped(vi);
> +       virtnet_set_premapped(vi);
>
>         cpus_read_lock();
>         virtnet_set_affinity(vi);
> diff --git a/drivers/net/virtio/virtio_net.h b/drivers/net/virtio/virtio_net.h
> index d814341d9f97..ce806afb6d64 100644
> --- a/drivers/net/virtio/virtio_net.h
> +++ b/drivers/net/virtio/virtio_net.h
> @@ -48,6 +48,18 @@ struct virtnet_rq_dma {
>         u16 need_sync;
>  };
>
> +struct virtnet_sq_dma {
> +       struct virtnet_sq_dma *next;
> +       dma_addr_t addr;
> +       u32 len;
> +       void *data;

I think we need to seek a way to reuse what has been stored by virtio
core. It should be much more efficient.

Thanks

> +};
> +
> +struct virtnet_sq_dma_head {
> +       struct virtnet_sq_dma *free;
> +       struct virtnet_sq_dma *head;
> +};
> +
>  /* Internal representation of a send virtqueue */
>  struct virtnet_sq {
>         /* Virtqueue associated with this virtnet_sq */
> @@ -67,6 +79,10 @@ struct virtnet_sq {
>
>         /* Record whether sq is in reset state. */
>         bool reset;
> +
> +       bool do_dma;
> +
> +       struct virtnet_sq_dma_head dmainfo;
>  };
>
>  /* Internal representation of a receive virtqueue */
> --
> 2.32.0.3.g01195cf9f
>


^ permalink raw reply

* Re: [PATCH v4] net: mvneta: fix calls to page_pool_get_stats
From: Lorenzo Bianconi @ 2023-11-09  7:48 UTC (permalink / raw)
  To: Sven Auhagen
  Cc: netdev, thomas.petazzoni, brouer, ilias.apalodimas, mcroce, leon,
	kuba
In-Reply-To: <4wba22pa6sxknqfxve42xevswz4wfu637p5gyyeq546tmzudzu@4z3kphfrpm64>

[-- Attachment #1: Type: text/plain, Size: 7843 bytes --]

> Calling page_pool_get_stats in the mvneta driver without checks
> leads to kernel crashes.
> First the page pool is only available if the bm is not used.
> The page pool is also not allocated when the port is stopped.
> It can also be not allocated in case of errors.
> 
> The current implementation leads to the following crash calling
> ethstats on a port that is down or when calling it at the wrong moment:
> 
> ble to handle kernel NULL pointer dereference at virtual address 00000070
> [00000070] *pgd=00000000
> Internal error: Oops: 5 [#1] SMP ARM
> Hardware name: Marvell Armada 380/385 (Device Tree)
> PC is at page_pool_get_stats+0x18/0x1cc
> LR is at mvneta_ethtool_get_stats+0xa0/0xe0 [mvneta]
> pc : [<c0b413cc>]    lr : [<bf0a98d8>]    psr: a0000013
> sp : f1439d48  ip : f1439dc0  fp : 0000001d
> r10: 00000100  r9 : c4816b80  r8 : f0d75150
> r7 : bf0b400c  r6 : c238f000  r5 : 00000000  r4 : f1439d68
> r3 : c2091040  r2 : ffffffd8  r1 : f1439d68  r0 : 00000000
> Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> Control: 10c5387d  Table: 066b004a  DAC: 00000051
> Register r0 information: NULL pointer
> Register r1 information: 2-page vmalloc region starting at 0xf1438000 allocated at kernel_clone+0x9c/0x390
> Register r2 information: non-paged memory
> Register r3 information: slab kmalloc-2k start c2091000 pointer offset 64 size 2048
> Register r4 information: 2-page vmalloc region starting at 0xf1438000 allocated at kernel_clone+0x9c/0x390
> Register r5 information: NULL pointer
> Register r6 information: slab kmalloc-cg-4k start c238f000 pointer offset 0 size 4096
> Register r7 information: 15-page vmalloc region starting at 0xbf0a8000 allocated at load_module+0xa30/0x219c
> Register r8 information: 1-page vmalloc region starting at 0xf0d75000 allocated at ethtool_get_stats+0x138/0x208
> Register r9 information: slab task_struct start c4816b80 pointer offset 0
> Register r10 information: non-paged memory
> Register r11 information: non-paged memory
> Register r12 information: 2-page vmalloc region starting at 0xf1438000 allocated at kernel_clone+0x9c/0x390
> Process snmpd (pid: 733, stack limit = 0x38de3a88)
> Stack: (0xf1439d48 to 0xf143a000)
> 9d40:                   000000c0 00000001 c238f000 bf0b400c f0d75150 c4816b80
> 9d60: 00000100 bf0a98d8 00000000 00000000 00000000 00000000 00000000 00000000
> 9d80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9da0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9dc0: 00000dc0 5335509c 00000035 c238f000 bf0b2214 01067f50 f0d75000 c0b9b9c8
> 9de0: 0000001d 00000035 c2212094 5335509c c4816b80 c238f000 c5ad6e00 01067f50
> 9e00: c1b0be80 c4816b80 00014813 c0b9d7f0 00000000 00000000 0000001d 0000001d
> 9e20: 00000000 00001200 00000000 00000000 c216ed90 c73943b8 00000000 00000000
> 9e40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9e60: 00000000 c0ad9034 00000000 00000000 00000000 00000000 00000000 00000000
> 9e80: 00000000 00000000 00000000 5335509c c1b0be80 f1439ee4 00008946 c1b0be80
> 9ea0: 01067f50 f1439ee3 00000000 00000046 b6d77ae0 c0b383f0 00008946 becc83e8
> 9ec0: c1b0be80 00000051 0000000b c68ca480 c7172d00 c0ad8ff0 f1439ee3 cf600e40
> 9ee0: 01600e40 32687465 00000000 00000000 00000000 01067f50 00000000 00000000
> 9f00: 00000000 5335509c 00008946 00008946 00000000 c68ca480 becc83e8 c05e2de0
> 9f20: f1439fb0 c03002f0 00000006 5ac3c35a c4816b80 00000006 b6d77ae0 c030caf0
> 9f40: c4817350 00000014 f1439e1c 0000000c 00000000 00000051 01000000 00000014
> 9f60: 00003fec f1439edc 00000001 c0372abc b6d77ae0 c0372abc cf600e40 5335509c
> 9f80: c21e6800 01015c9c 0000000b 00008946 00000036 c03002f0 c4816b80 00000036
> 9fa0: b6d77ae0 c03000c0 01015c9c 0000000b 0000000b 00008946 becc83e8 00000000
> 9fc0: 01015c9c 0000000b 00008946 00000036 00000035 010678a0 b6d797ec b6d77ae0
> 9fe0: b6dbf738 becc838c b6d186d7 b6baa858 40000030 0000000b 00000000 00000000
>  page_pool_get_stats from mvneta_ethtool_get_stats+0xa0/0xe0 [mvneta]
>  mvneta_ethtool_get_stats [mvneta] from ethtool_get_stats+0x154/0x208
>  ethtool_get_stats from dev_ethtool+0xf48/0x2480
>  dev_ethtool from dev_ioctl+0x538/0x63c
>  dev_ioctl from sock_ioctl+0x49c/0x53c
>  sock_ioctl from sys_ioctl+0x134/0xbd8
>  sys_ioctl from ret_fast_syscall+0x0/0x1c
> Exception stack(0xf1439fa8 to 0xf1439ff0)
> 9fa0:                   01015c9c 0000000b 0000000b 00008946 becc83e8 00000000
> 9fc0: 01015c9c 0000000b 00008946 00000036 00000035 010678a0 b6d797ec b6d77ae0
> 9fe0: b6dbf738 becc838c b6d186d7 b6baa858
> Code: e28dd004 e1a05000 e2514000 0a00006a (e5902070)
> 
> This commit adds the proper checks before calling page_pool_get_stats.
> 
> Fixes: b3fc79225f05 ("net: mvneta: add support for page_pool_get_stats")
> Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
> Reported-by: Paulo Da Silva <Paulo.DaSilva@kyberna.com>
> ---

Hi Sven,

first of all thx for fixing it. Just minor comments inline.

Regards,
Lorenzo

> 
> Change from v3:
> 	* Move the page pool check back to mvneta
> 
> Change from v2:
> 	* Fix the fixes tag
> 
> Change from v1:
> 	* Add cover letter
> 	* Move the page pool check in mvneta to the ethtool stats
> 	  function
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index 8b0f12a0e0f2..bbb5d972657a 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -4734,13 +4734,16 @@ static void mvneta_ethtool_get_strings(struct net_device *netdev, u32 sset,
>  {
>  	if (sset == ETH_SS_STATS) {
>  		int i;
> +		struct mvneta_port *pp = netdev_priv(netdev);

nit: reverse christmas tree here (just if you need to repost)

>  
>  		for (i = 0; i < ARRAY_SIZE(mvneta_statistics); i++)
>  			memcpy(data + i * ETH_GSTRING_LEN,
>  			       mvneta_statistics[i].name, ETH_GSTRING_LEN);
>  
> -		data += ETH_GSTRING_LEN * ARRAY_SIZE(mvneta_statistics);
> -		page_pool_ethtool_stats_get_strings(data);
> +		if (!pp->bm_priv) {
> +			data += ETH_GSTRING_LEN * ARRAY_SIZE(mvneta_statistics);
> +			page_pool_ethtool_stats_get_strings(data);
> +		}
>  	}
>  }
>  
> @@ -4858,8 +4861,10 @@ static void mvneta_ethtool_pp_stats(struct mvneta_port *pp, u64 *data)
>  	struct page_pool_stats stats = {};
>  	int i;
>  
> -	for (i = 0; i < rxq_number; i++)
> -		page_pool_get_stats(pp->rxqs[i].page_pool, &stats);
> +	for (i = 0; i < rxq_number; i++) {
> +		if (pp->rxqs[i].page_pool)
> +			page_pool_get_stats(pp->rxqs[i].page_pool, &stats);
> +	}
>  
>  	page_pool_ethtool_stats_get(data, &stats);
>  }
> @@ -4875,14 +4880,21 @@ static void mvneta_ethtool_get_stats(struct net_device *dev,
>  	for (i = 0; i < ARRAY_SIZE(mvneta_statistics); i++)
>  		*data++ = pp->ethtool_stats[i];
>  
> -	mvneta_ethtool_pp_stats(pp, data);
> +	if (!pp->bm_priv && !pp->is_stopped)

do we need to check pp->is_stopped here? (we already check if page_pool
pointer is NULL in mvneta_ethtool_pp_stats).
Moreover in mvneta_ethtool_get_sset_count() and in mvneta_ethtool_get_strings()
we just check pp->bm_priv pointer. Are the stats disaligned in this case?

> +		mvneta_ethtool_pp_stats(pp, data);
>  }
>  
>  static int mvneta_ethtool_get_sset_count(struct net_device *dev, int sset)
>  {
> -	if (sset == ETH_SS_STATS)
> -		return ARRAY_SIZE(mvneta_statistics) +
> -		       page_pool_ethtool_stats_get_count();
> +	if (sset == ETH_SS_STATS) {
> +		int count = ARRAY_SIZE(mvneta_statistics);
> +		struct mvneta_port *pp = netdev_priv(dev);
> +
> +		if (!pp->bm_priv)
> +			count += page_pool_ethtool_stats_get_count();
> +
> +		return count;
> +	}
>  
>  	return -EOPNOTSUPP;
>  }
> -- 
> 2.42.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH 15/22] arch: vdso: consolidate gettime prototypes
From: Christophe Leroy @ 2023-11-09  7:50 UTC (permalink / raw)
  To: Arnd Bergmann, Arnd Bergmann, Andrew Morton,
	linux-kernel@vger.kernel.org, Masahiro Yamada,
	linux-kbuild@vger.kernel.org
  Cc: Matt Turner, Vineet Gupta, Russell King, Catalin Marinas,
	Will Deacon, Steven Rostedt, Masami Hiramatsu, Mark Rutland,
	guoren, Peter Zijlstra, Ard Biesheuvel, Huacai Chen, Greg Ungerer,
	Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Michael Ellerman,
	Nicholas Piggin, Geoff Levand, Palmer Dabbelt, Heiko Carstens,
	John Paul Adrian Glaubitz, David S . Miller, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, x86@kernel.org, Helge Deller,
	Sudip Mukherjee, Greg Kroah-Hartman, Timur Tabi, Kent Overstreet,
	David Woodhouse, Naveen N. Rao, Anil S Keshavamurthy, Kees Cook,
	Vincenzo Frascino, Juri Lelli, Vincent Guittot, Nathan Chancellor,
	Nick Desaulniers, Nicolas Schier, Alexander Viro,
	Uwe Kleine-König, linux-alpha@vger.kernel.org,
	linux-snps-arc@lists.infradead.org,
	linux-arm-kernel@lists.infradead.org,
	linux-trace-kernel@vger.kernel.org, linux-csky@vger.kernel.org,
	loongarch@lists.linux.dev, linux-m68k@lists.linux-m68k.org,
	linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
	linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, Netdev,
	linux-parisc@vger.kernel.org, linux-usb@vger.kernel.org,
	linux-fbdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-bcachefs@vger.kernel.org, linux-mtd@lists.infradead.org
In-Reply-To: <d94de5b8-db92-4055-9484-f2666973c02a@app.fastmail.com>



Le 08/11/2023 à 20:37, Arnd Bergmann a écrit :
> On Wed, Nov 8, 2023, at 19:31, Christophe Leroy wrote:
>> Le 08/11/2023 à 13:58, Arnd Bergmann a écrit :
> 
>> powerpc has functions doing more or less the same, they are called
>> __c_kernel_clock_gettime() and alike with their prototypes siting in
>> arch/powerpc/include/asm/vdso/gettimeofday.h
>>
>> Should those prototypes be moved to include/vdso/gettime.h too and
>> eventually renamed, or are they considered too powerpc specific ?
> 
> I don't actually know, my initial interpretation was that
> these function names are part of the user ABI for the vdso,
> but I never looked closely enough at how vdso works to
> be sure what the actual ABI is.
> 
> If __c_kernel_clock_gettime() etc are not part of the user-facing
> ABI, I think renaming them for consistency with the other
> architectures would be best.
> 

User-facing ABI function is __kernel_clock_gettime(), defined in 
arch/powerpc/kernel/vdso/gettimeofday.S , see man vdso.
There is no prototype defined for it anywhere, obviously that's 
undetected because it is assembly. Should a prototype be added somewhere 
anyway ?

__kernel_clock_gettime() sets up a stack frame, retrieves the address of 
the datapage then calls __c_kernel_clock_gettime() which then calls 
__cvdso_clock_gettime_data() which is part of the generic CVDSO.

Maybe that's too different from what other architectures do ?

Christophe

^ permalink raw reply

* Re: [RFC net-next] net: don't dump stack on queue timeout
From: Daniele Palmas @ 2023-11-09  7:40 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, syzbot+d55372214aff0faa1f1f, jhs,
	xiyou.wangcong, jiri
In-Reply-To: <20231109000901.949152-1-kuba@kernel.org>

Hello Jakub,

Il giorno gio 9 nov 2023 alle ore 01:09 Jakub Kicinski
<kuba@kernel.org> ha scritto:
>
> The top syzbot report for networking (#14 for the entire kernel)
> is the queue timeout splat. We kept it around for a long time,
> because in real life it provides pretty strong signal that
> something is wrong with the driver or the device.
>
> Removing it is also likely to break monitoring for those who
> track it as a kernel warning.
>
> Nevertheless, WARN()ings are best suited for catching kernel
> programming bugs. If a Tx queue gets starved due to a pause
> storm, priority configuration, or other weirdness - that's
> obviously a problem, but not a problem we can fix at
> the kernel level.
>
> Bite the bullet and convert the WARN() to a print.
>

I can't comment on other scenarios, but at least for mobile broadband
I think this could be a useful change.

For example, I can see the splat with MBIM modems when radio link
failure happens, something for which the host can't really do
anything. So, the main result of using WARN is to scare the users who
are not aware of the reasons behind it and create unneeded support
requests...

Thanks,
Daniele

^ permalink raw reply

* Re: PRP with VLAN support - or how to contribute to a Linux network driver
From: Heiko Gerstung @ 2023-11-09  8:08 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev@vger.kernel.org
In-Reply-To: <6ab3289a-2ede-41a3-8c57-b01df37bab7f@lunn.ch>

[-- Attachment #1: Type: text/plain, Size: 2026 bytes --]

Am 08.11.23, 16:17 schrieb "Andrew Lunn" <andrew@lunn.ch <mailto:andrew@lunn.ch>>:

>> I would like to discuss if it makes sense to remove the PRP
>> functionality from the HSR driver (which is based on the bridge
>> kernel module AFAICS) and instead implement PRP as a separate module
>> (based on the Bonding driver, which would make more sense for PRP).

> Seems like nobody replied. I don't know PRP or HSR, so i can only make
> general remarks.

Thank you for responding!

> The general policy is that we don't rip something out and replace it
> with new code. We try to improve what already exists to meet the
> demands. This is partially because of backwards compatibility. There
> could be users using the code as is. You cannot break that. Can you
> step by step modify the current code to make use of bonding, and in
> the process show you don't break the current use cases? 

Understood. I am not sure if we can change the hsr driver to gradually use a more bonding-like approach for prp and I believe this might not be required, as long as we can get VLAN support into it. 

> You also need to consider offloading to hardware. The bridge code has infrastructure
> to offload. Does the bond driver? I've no idea about that.

I do not know this either but would expect that the nature of bonding would not require offloading support (I do not see a potential for efficiency/performance improvements here, unlike HSR or PRP). 

>> Hoping for advise what the next steps could be. Happy to discuss
>> this off-list as it may not be of interest for most people.

> You probably want to get together with others who are interested in
> PRP and HSR. linutronix, ti, microchip, etc.

Yes, would love to do that and my hope was that I would find them here. I am not familiar with the "orphaned" status for a kernel module, but I would have expected that one of the mentioned parties interested in PRP/HSR would have adopted the module. 

> Andrew

Again, thanks a lot for your comments and remarks, very useful.

Heiko

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 8165 bytes --]

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox