* Re: [PATCH net-next v2 3/3] net: dsa: realtek: support reset controller
From: Vladimir Oltean @ 2023-11-02 14:04 UTC (permalink / raw)
To: Luiz Angelo Daros de Luca
Cc: netdev, linus.walleij, alsi, andrew, vivien.didelot, f.fainelli,
davem, kuba, pabeni, robh+dt, krzk+dt, arinc.unal
In-Reply-To: <CAJq09z6f3AA4t7t+FvdRg9wS9DftNbibu6pssUAPA3u4qih0rg@mail.gmail.com>
On Wed, Nov 01, 2023 at 04:55:19PM -0300, Luiz Angelo Daros de Luca wrote:
> Hi Vladimir,
>
> > realtek-mdio is an MDIO driver while realtek-smi is a platform driver
> > implementing a bitbang protocol. They might never be used together in
> > a system to share RAM and not even installed together in small
> > systems. If I do need to share the code, I would just link it twice.
> > Would something like this be acceptable?
> >
> > drivers/net/dsa/realtek/Makefile
> > -obj-$(CONFIG_NET_DSA_REALTEK_MDIO) += realtek-mdio.o
> > -obj-$(CONFIG_NET_DSA_REALTEK_SMI) += realtek-smi.o
> > +obj-$(CONFIG_NET_DSA_REALTEK_MDIO) += realtek-mdio.o realtek_common.o
> > +obj-$(CONFIG_NET_DSA_REALTEK_SMI) += realtek-smi.o realtek_common.o
You cannot link realtek_common.o into multiple .ko files. It generates
build warnings and it is being phased out.
https://patchwork.kernel.org/project/netdevbpf/cover/20221119225650.1044591-1-alobakin@pm.me/
> Just a follow up.
>
> It is not that simple to include a .c file into an existing single
> file module. It looks like you need to rename the original file as all
> linked objects must not conflict with the module name. The kernel
> build seems to create a new object file for each module. Is there a
> clearer way? I think #include a common .c file would not be
> acceptable.
>
> I tested your shared module suggestion. It is the clearest one but it
> also increased the overall size quite a bit. Even linking two objects
> seems to use the double of space used by the functions alone. These
> are some values (mips)
>
> drivers/net/dsa/realtek/realtek_common.o 5744 without exports
> drivers/net/dsa/realtek/realtek_common.o 33472 exporting the two reset functions (assert/deassert)
>
> drivers/net/dsa/realtek/realtek-mdio.o 18756 without the reset funcs (to be used as a module)
> drivers/net/dsa/realtek/realtek-mdio.o 20480 including the realtek_common.c (#include <realtek_common.c>)
> drivers/net/dsa/realtek/realtek-mdio.o 22696 linking the realtek_common.o
>
> drivers/net/dsa/realtek/realtek-smi.o 30712 without the reset funcs (to be used as a module)
> drivers/net/dsa/realtek/realtek-smi.o 34604 linking the realtek_common.o
>
> drivers/net/dsa/realtek/realtek-mdio.ko 28800 without the reset funcs (it will use realtek_common.ko)
> drivers/net/dsa/realtek/realtek-mdio.ko 32736 linking the realtek_common.o
>
> drivers/net/dsa/realtek/realtek-smi.ko 40708 without the reset funcs (it will use realtek_common.ko)
> drivers/net/dsa/realtek/realtek-smi.ko 44612 linking the realtek_common.o
>
> In summary, we get about 1.5kb of code with the extra functions,
> almost 4kb if we link a common object containing the functions and
> 33kb if we use a module for those two functions.
>
> I can go with any option. I just need to know which one would be
> accepted to update my patches.
> 1) keep duplicated functions on each file
Disadvantage: need to update the same functions twice, it becomes
possible for them to diverge, increases maintenance difficulty.
> 2) share the code including the .c on both
Including a .c file with a preprocessor #include is fragile, has to be
placed very carefully, etc. So it is also a time bomb from a maintenance
PoV. Maybe a header with static inline function definitions would be
worth considering, although I don't think it's common practice to do
this.
> 3) share the code linking a common object in both modules (and
> renaming existing .c files)
Sharing object files is being phased out.
> 4) create a new module used by both modules.
I am suspicious of the numbers you provided above. What needs to be
compared is the reduction in size of realtek_mdio.ko and realtek_smi.ko,
compared to the size of the new realtek_common.ko. Also, this starts
being more and more worthwhile as more code goes into realtek_common.ko,
and I also mentioned a common probe/remove/shutdown as being viable
candidates. Looking at your figures, I'm not sure at which ones I need
to look in order to compute that metric.
> The devices that would use this driver have very restricted storage
> space. Every kbyte counts.
Well, in that case you could compile the code as built into the kernel,
and the module overhead would go away.
^ permalink raw reply
* Re: [PATCH bpf-next v6 11/18] ice: put XDP meta sources assignment under a static key condition
From: Larysa Zaremba @ 2023-11-02 13:48 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
netdev, Willem de Bruijn, Alexei Starovoitov, Tariq Toukan,
Saeed Mahameed, toke
In-Reply-To: <ZUOitlGHC0Kgrh5i@boxer>
On Thu, Nov 02, 2023 at 02:23:02PM +0100, Maciej Fijalkowski wrote:
> On Tue, Oct 31, 2023 at 03:22:31PM +0100, Larysa Zaremba wrote:
> > On Sat, Oct 28, 2023 at 09:55:52PM +0200, Maciej Fijalkowski wrote:
> > > On Mon, Oct 23, 2023 at 11:35:46AM +0200, Larysa Zaremba wrote:
> > > > On Fri, Oct 20, 2023 at 06:32:13PM +0200, Maciej Fijalkowski wrote:
> > > > > On Thu, Oct 12, 2023 at 07:05:17PM +0200, Larysa Zaremba wrote:
> > > > > > Usage of XDP hints requires putting additional information after the
> > > > > > xdp_buff. In basic case, only the descriptor has to be copied on a
> > > > > > per-packet basis, because xdp_buff permanently resides before per-ring
> > > > > > metadata (cached time and VLAN protocol ID).
> > > > > >
> > > > > > However, in ZC mode, xdp_buffs come from a pool, so memory after such
> > > > > > buffer does not contain any reliable information, so everything has to be
> > > > > > copied, damaging the performance.
> > > > > >
> > > > > > Introduce a static key to enable meta sources assignment only when attached
> > > > > > XDP program is device-bound.
> > > > > >
> > > > > > This patch eliminates a 6% performance drop in ZC mode, which was a result
> > > > > > of addition of XDP hints to the driver.
> > > > > >
> > > > > > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > > > > > ---
> > > > > > drivers/net/ethernet/intel/ice/ice.h | 1 +
> > > > > > drivers/net/ethernet/intel/ice/ice_main.c | 14 ++++++++++++++
> > > > > > drivers/net/ethernet/intel/ice/ice_txrx.c | 3 ++-
> > > > > > drivers/net/ethernet/intel/ice/ice_xsk.c | 3 +++
> > > > > > 4 files changed, 20 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
> > > > > > index 3d0f15f8b2b8..76d22be878a4 100644
> > > > > > --- a/drivers/net/ethernet/intel/ice/ice.h
> > > > > > +++ b/drivers/net/ethernet/intel/ice/ice.h
> > > > > > @@ -210,6 +210,7 @@ enum ice_feature {
> > > > > > };
> > > > > >
> > > > > > DECLARE_STATIC_KEY_FALSE(ice_xdp_locking_key);
> > > > > > +DECLARE_STATIC_KEY_FALSE(ice_xdp_meta_key);
> > > > > >
> > > > > > struct ice_channel {
> > > > > > struct list_head list;
> > > > > > diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> > > > > > index 47e8920e1727..ee0df86d34b7 100644
> > > > > > --- a/drivers/net/ethernet/intel/ice/ice_main.c
> > > > > > +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> > > > > > @@ -48,6 +48,9 @@ MODULE_PARM_DESC(debug, "netif level (0=none,...,16=all)");
> > > > > > DEFINE_STATIC_KEY_FALSE(ice_xdp_locking_key);
> > > > > > EXPORT_SYMBOL(ice_xdp_locking_key);
> > > > > >
> > > > > > +DEFINE_STATIC_KEY_FALSE(ice_xdp_meta_key);
> > > > > > +EXPORT_SYMBOL(ice_xdp_meta_key);
> > > > > > +
> > > > > > /**
> > > > > > * ice_hw_to_dev - Get device pointer from the hardware structure
> > > > > > * @hw: pointer to the device HW structure
> > > > > > @@ -2634,6 +2637,11 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
> > > > > > return -ENOMEM;
> > > > > > }
> > > > > >
> > > > > > +static bool ice_xdp_prog_has_meta(struct bpf_prog *prog)
> > > > > > +{
> > > > > > + return prog && prog->aux->dev_bound;
> > > > > > +}
> > > > > > +
> > > > > > /**
> > > > > > * ice_vsi_assign_bpf_prog - set or clear bpf prog pointer on VSI
> > > > > > * @vsi: VSI to set the bpf prog on
> > > > > > @@ -2644,10 +2652,16 @@ static void ice_vsi_assign_bpf_prog(struct ice_vsi *vsi, struct bpf_prog *prog)
> > > > > > struct bpf_prog *old_prog;
> > > > > > int i;
> > > > > >
> > > > > > + if (ice_xdp_prog_has_meta(prog))
> > > > > > + static_branch_inc(&ice_xdp_meta_key);
> > > > >
> > > > > i thought boolean key would be enough but inc/dec should serve properly
> > > > > for example prog hotswap cases.
> > > > >
> > > >
> > > > My thought process on using counting instead of boolean was: there can be
> > > > several PFs that use the same driver, so therefore we need to keep track of how
> > > > many od them use hints.
> > >
> > > Very good point. This implies that if PF0 has hints-enabled prog loaded,
> > > PF1 with non-hints prog will "suffer" from it.
> > >
> > > Sorry for such a long delays in responses but I was having a hard time
> > > making up my mind about it. In the end I have come up to some conclusions.
> > > I know the timing for sending this response is not ideal, but I need to
> > > get this off my chest and bring discussion back to life:)
> > >
> > > IMHO having static keys to eliminate ZC overhead does not scale. I assume
> > > every other driver would have to follow that.
> > >
> > > XSK pool allows us to avoid initializing various things per each packet.
> > > Instead, taking xdp_rxq_info as an example, each xdp_buff from pool has
> > > xdp_rxq_info assigned at init time. With this in mind, we should have some
> > > mechanism to set hints-specific things in xdp_buff_xsk::cb, at init time
> > > as well. Such mechanism should not require us to expose driver's private
> > > xdp_buff hints containers (such as ice_pkt_ctx) to XSK pool.
> > >
> > > Right now you moved phctime down to ice_pkt_ctx and to me that's the main
> > > reason we have to copy ice_pkt_ctx to each xdp_buff on ZC. What if we keep
> > > the cached_phctime at original offset in ring but ice_pkt_ctx would get a
> > > pointer to that?
> > >
> > > This would allow us to init the pointer in each xdp_buff from XSK pool at
> > > init time. I have come up with a way to program that via so called XSK
> > > meta descriptors. Each desc would have data to write onto cb, offset
> > > within cb and amount of bytes to write/copy.
> > >
> > > I'll share the diff below but note that I didn't measure how much lower
> > > the performance is degraded. My icelake machine where I used to measure
> > > performance-sensitive code got broke. For now we can't escape initing
> > > eop_desc per each xdp_buff, but I moved it to alloc side, as we mangle
> > > descs there anyway.
> > >
> > > I think mlx5 could benefit from that approach as well with initing the rq
> > > ptr at init time.
> > >
> > > Diff does mostly these things:
> > > - move cached_phctime to old place in ice_rx_ring and add ptr to that in
> > > ice_pkt_ctx
> > > - introduce xsk_pool_set_meta()
> > > - use it from ice side.
> > >
> >
> > Thank you for the code! I will probably send v7 with such changes. Are you OK,
> > if patch with core changes would go with you as an author?
>
> Yes or I can produce a patch and share, up to you.
>
I have already started, your diff does not compile, so I took some creative
liberty. Will send you patches for verification this week.
> >
> > But also, I see a minor problem with that switching VLAN protocol does not
> > trigger buffer allocation, so we have to point to that too, this probably means
> > moving cached time back and finding 16 extra bits in CL3. Single pointer to
> > {cached time, vlan_proto} would be copied to be after xdp_buff.
>
> It's not that it has to trigger buffer allocation, we could stop the
> interface if pool is present and update vlan proto on pool's xdp_buffs
> (from quick glance i don't see that we're stopping iface for setting vlan
> features) but that sounds like more of a hassle to do...
>
> So yeah maybe let's just have a ptr in ice_pkt_ctx as well.
>
> [...]
^ permalink raw reply
* Re: [PATCH bpf-next v3 1/2] bpf: add skcipher API support to TC/XDP programs
From: Vadim Fedorenko @ 2023-11-02 13:44 UTC (permalink / raw)
To: Martin KaFai Lau, Song Liu
Cc: bpf, netdev, linux-crypto, Jakub Kicinski, Andrii Nakryiko,
Alexei Starovoitov, Mykola Lysenko, Vadim Fedorenko,
David S. Miller, Herbert Xu
In-Reply-To: <6947046d-27e3-90ee-3419-0b480af0abb0@linux.dev>
On 01/11/2023 23:41, Martin KaFai Lau wrote:
> On 11/1/23 3:50 PM, Vadim Fedorenko wrote:
>>>> +static void *__bpf_dynptr_data_ptr(const struct bpf_dynptr_kern *ptr)
>>>> +{
>>>> + enum bpf_dynptr_type type;
>>>> +
>>>> + if (!ptr->data)
>>>> + return NULL;
>>>> +
>>>> + type = bpf_dynptr_get_type(ptr);
>>>> +
>>>> + switch (type) {
>>>> + case BPF_DYNPTR_TYPE_LOCAL:
>>>> + case BPF_DYNPTR_TYPE_RINGBUF:
>>>> + return ptr->data + ptr->offset;
>>>> + case BPF_DYNPTR_TYPE_SKB:
>>>> + return skb_pointer_if_linear(ptr->data, ptr->offset,
>>>> __bpf_dynptr_size(ptr));
>>>> + case BPF_DYNPTR_TYPE_XDP:
>>>> + {
>>>> + void *xdp_ptr = bpf_xdp_pointer(ptr->data, ptr->offset,
>>>> __bpf_dynptr_size(ptr));
>>>
>>> I suspect what it is doing here (for skb and xdp in particular) is
>>> very similar to bpf_dynptr_slice. Please check if
>>> bpf_dynptr_slice(ptr, 0, NULL, sz) will work.
>>>
>>
>> Well, yes, it's simplified version of bpf_dynptr_slice. The problem is
>> that bpf_dynptr_slice bpf_kfunc which cannot be used in another
>> bpf_kfunc. Should I refactor the code to use it in both places? Like
>
> Sorry, scrolled too fast in my earlier reply :(
>
> I am not aware of this limitation. What error does it have?
> The bpf_dynptr_slice_rdwr kfunc() is also calling the bpf_dynptr_slice()
> kfunc.
>
>> create __bpf_dynptr_slice() which will be internal part of bpf_kfunc?
Apparently Song has a patch to expose these bpf_dynptr_slice* functions
ton in-kernel users.
https://lore.kernel.org/bpf/20231024235551.2769174-2-song@kernel.org/
Should I wait for it to be merged before sending next version?
^ permalink raw reply
* Re: [PATCH net-next v2 8/9] microchip: lan865x: add driver support for Microchip's LAN865X MACPHY
From: Ramón Nordin Rodriguez @ 2023-11-02 13:40 UTC (permalink / raw)
To: Andrew Lunn
Cc: Parthiban Veerasooran, davem, edumazet, kuba, pabeni, robh+dt,
krzysztof.kozlowski+dt, conor+dt, corbet, steen.hegelund, rdunlap,
horms, casper.casan, netdev, devicetree, linux-kernel, linux-doc,
horatiu.vultur, Woojung.Huh, Nicolas.Ferre, UNGLinuxDriver,
Thorsten.Kummermehr
In-Reply-To: <f95b42ef-b7e0-44dc-b7c8-9353e9edc2df@lunn.ch>
Hi Andrew
> > spi-max-frequency = <50000000>;
>
> That is a pretty high frequency. It is actually running at that speed?
I have not verified that we're actually hitting 50M.
>
> Have you tried lower speed?
Sorry for being too brief about the things I've tested. But yes, I've
tested running at 15MHz with the same or similar enough results that
I've not been able to discern any difference.
Meaningful to mention here as well would be that I've tested with and
without DMA as well.
> > With this setup I'm getting a maximum throughput of about 90kB/s.
> > If I increase the chunk size / oa-cps to 64 I get a might higher
> > throughput ~900kB/s, but after 0-2s I get dump below (or similar).
>
> Is this tcp traffic? Lost packets will have a big impact. You might
> want to look at the link peer with tcpdump and look for retries. Also,
> look if there are frames with bad checksums.
>
As you assume, this was with TCP. I'll do a run with tcpdump and see if
I can compile any intelligent statistics from that.
^ permalink raw reply
* Re: [PATCH net-next v2 9/9] dt-bindings: net: add Microchip's LAN865X 10BASE-T1S MACPHY
From: Parthiban.Veerasooran @ 2023-11-02 13:31 UTC (permalink / raw)
To: krzysztof.kozlowski
Cc: netdev, devicetree, linux-kernel, linux-doc, Horatiu.Vultur,
Woojung.Huh, Nicolas.Ferre, UNGLinuxDriver, Thorsten.Kummermehr,
davem, edumazet, kuba, pabeni, robh+dt, krzysztof.kozlowski+dt,
conor+dt, corbet, Steen.Hegelund, rdunlap, horms, casper.casan,
andrew
In-Reply-To: <08ad71a0-daf1-4588-a97c-95f3ee5df84e@linaro.org>
Hi Krzysztof
On 30/10/23 8:15 pm, Krzysztof Kozlowski wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>
> On 30/10/2023 14:16, Parthiban.Veerasooran@microchip.com wrote:
>> Hi Krzysztof,
>>
>> On 24/10/23 1:33 pm, Krzysztof Kozlowski wrote:
>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>>>
>>> On 23/10/2023 17:46, Parthiban Veerasooran wrote:
>>>> Add DT bindings for Microchip's LAN865X 10BASE-T1S MACPHY. The LAN8650/1
>>>> combines a Media Access Controller (MAC) and an Ethernet PHY to enable
>>>> 10BASE‑T1S networks. The Ethernet Media Access Controller (MAC) module
>>>> implements a 10 Mbps half duplex Ethernet MAC, compatible with the IEEE
>>>> 802.3 standard and a 10BASE-T1S physical layer transceiver integrated
>>>> into the LAN8650/1. The communication between the Host and the MAC-PHY is
>>>> specified in the OPEN Alliance 10BASE-T1x MACPHY Serial Interface (TC6).
>>>
>>> It does not look like you tested the bindings, at least after quick
>>> look. Please run `make dt_binding_check` (see
>>> Documentation/devicetree/bindings/writing-schema.rst for instructions).
>>> Maybe you need to update your dtschema and yamllint.
>> Sorry, somehow I missed doing this check. Will fix this in the next
>> revision.
>
> Please also build your driver.
Didn't see any error in the driver compilation.
>
>>>
>>>>
>>>> Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
>>>> ---
>>>> .../bindings/net/microchip,lan865x.yaml | 101 ++++++++++++++++++
>>>> MAINTAINERS | 1 +
>>>> 2 files changed, 102 insertions(+)
>>>> create mode 100644 Documentation/devicetree/bindings/net/microchip,lan865x.yaml
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/net/microchip,lan865x.yaml b/Documentation/devicetree/bindings/net/microchip,lan865x.yaml
>>>> new file mode 100644
>>>> index 000000000000..974622dd6846
>>>> --- /dev/null
>>>> +++ b/Documentation/devicetree/bindings/net/microchip,lan865x.yaml
>>>> @@ -0,0 +1,101 @@
>>>> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
>>>> +%YAML 1.2
>>>> +---
>>>> +$id: http://devicetree.org/schemas/net/microchip,lan865x.yaml#
>>>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>>>> +
>>>> +title: Microchip LAN8650/1 10BASE-T1S MACPHY Ethernet Controllers
>>>> +
>>>> +maintainers:
>>>> + - Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
>>>> +
>>>> +description:
>>>> + The LAN8650/1 combines a Media Access Controller (MAC) and an Ethernet
>>>> + PHY to enable 10BASE‑T1S networks. The Ethernet Media Access Controller
>>>> + (MAC) module implements a 10 Mbps half duplex Ethernet MAC, compatible
>>>> + with the IEEE 802.3 standard and a 10BASE-T1S physical layer transceiver
>>>> + integrated into the LAN8650/1. The communication between the Host and
>>>> + the MAC-PHY is specified in the OPEN Alliance 10BASE-T1x MACPHY Serial
>>>> + Interface (TC6).
>>>> +
>>>> + Specifications about the LAN8650/1 can be found at:
>>>> + https://www.microchip.com/en-us/product/lan8650
>>>> +
>>>> +allOf:
>>>> + - $ref: ethernet-controller.yaml#
>>>> +
>>>> +properties:
>>>> + compatible:
>>>> + const: microchip,lan865x
>>>
>>> No wildcards in compatibles.
>> Ah ok missed it. So it will become like below isn't it?
>>
>> properties:
>> compatible:
>> enum:
>> - microchip,lan8650
>> - microchip,lan8651
>
> Rather enum for lan8650 and items for lan8651 with fallback. Aren't they
> compatible, since you wanted to use wildcard?
So do you mean like below?
properties:
compatible:
oneOf:
- items:
- const: microchip,lan8650
- const: microchip,lan8651
- enum:
- microchip,lan8650
Best Regards,
Parthiban V
>
>
> Best regards,
> Krzysztof
>
^ permalink raw reply
* Re: [PATCH net] virtio/vsock: Fix uninit-value in virtio_transport_recv_pkt()
From: Shigeru Yoshida @ 2023-11-02 13:30 UTC (permalink / raw)
To: kuba
Cc: stefanha, sgarzare, davem, edumazet, pabeni, kvm, virtualization,
netdev, linux-kernel
In-Reply-To: <20231101222045.5f7cca01@kernel.org>
On Wed, 1 Nov 2023 22:20:45 -0700, Jakub Kicinski wrote:
> On Fri, 27 Oct 2023 00:01:54 +0900 Shigeru Yoshida wrote:
>> This issue occurs because the `buf_alloc` and `fwd_cnt` fields of the
>> `struct virtio_vsock_hdr` are not initialized when a new skb is allocated
>> in `virtio_transport_alloc_skb()`. This patch resolves the issue by
>> initializing these fields during allocation.
>
> We didn't manage to apply this before the merge window, and now the
> trees have converged. Patch no longer applies cleanly to net.
> Please rebase & repost.
I got it. I'll rebase the patch to the latest net tree and resend it.
Thanks,
Shigeru
^ permalink raw reply
* Re: [PATCH net v1 1/3] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
From: D. Wythe @ 2023-11-02 13:28 UTC (permalink / raw)
To: Wenjia Zhang, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
In-Reply-To: <72b57457-43e1-49f7-9670-08bbf03231e1@linux.ibm.com>
On 11/2/23 6:34 PM, Wenjia Zhang wrote:
>
>
> On 02.11.23 06:52, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> Considering scenario:
>>
>> smc_cdc_rx_handler
>> __smc_release
>> sock_set_flag
>> smc_close_active()
>> sock_set_flag
>>
>> __set_bit(DEAD) __set_bit(DONE)
>>
>> Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
>> if the DEAD flag lost, the state SMC_CLOSED will be never be reached
>> in smc_close_passive_work:
>>
>> if (sock_flag(sk, SOCK_DEAD) &&
>> smc_close_sent_any_close(conn)) {
>> sk->sk_state = SMC_CLOSED;
>> } else {
>> /* just shutdown, but not yet closed locally */
>> sk->sk_state = SMC_APPFINCLOSEWAIT;
>> }
>>
>> Replace sock_set_flags or __set_bit to set_bit will fix this problem.
>> Since set_bit is atomic.
>>
>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>> Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
>
> Fixes tag?
ops, i forget that. 🙁
I will fix it in next version.
Thanks.
D. Wythe
^ permalink raw reply
* RE: [PATCH net-next v2 3/4] octeon_ep: implement xmit_more in transmit
From: Shinas Rasheed @ 2023-11-02 13:24 UTC (permalink / raw)
To: David Laight, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
Cc: Haseeb Gani, Vimlesh Kumar, egallen@redhat.com,
mschmidt@redhat.com, pabeni@redhat.com, horms@kernel.org,
kuba@kernel.org, davem@davemloft.net, wizhao@redhat.com,
konguyen@redhat.com, Veerasenareddy Burru, Sathesh B Edara,
Eric Dumazet
In-Reply-To: <9631475a8ba94c1682696d219c632538@AcuMS.aculab.com>
Hi David,
> -----Original Message-----
> From: David Laight <David.Laight@ACULAB.COM>
> Sent: Monday, October 30, 2023 9:00 PM
> To: Shinas Rasheed <srasheed@marvell.com>; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org
> > > > - /* Ring Doorbell to notify the NIC there is a new packet */
> > > > - writel(1, iq->doorbell_reg);
> > > > - iq->stats.instr_posted++;
> > > > + /* Ring Doorbell to notify the NIC of new packets */
> > > > + writel(iq->fill_cnt, iq->doorbell_reg);
> > > > + iq->stats.instr_posted += iq->fill_cnt;
> > > > + iq->fill_cnt = 0;
> > > > return NETDEV_TX_OK;
> > >
> > > Does that really need the count?
> > > A 'doorbell' register usually just tells the MAC engine
> > > to go and look at the transmit ring.
> > > It then continues to process transmits until it fails
> > > to find a packet.
> > > So if the transmit is active you don't need to set the bit.
> > > (Although that is actually rather hard to detect.)
> >
> > The way the octeon hardware works is that it expects number of newly
> updated packets
> > to be written to the doorbell register,which effectively increments the
> doorbell
> > count which shall be decremented by hardware as it reads these packets.
> So in essence,
> > the doorbell count also indicates outstanding packets to be read by
> hardware.
>
> Unusual - I wouldn't call that a doorbell register.
I double checked in earlier implementations as well as the reference manuals.
This is how the hardware register is prescribed to be used.
> If you do writes for every packet then the hardware can get on with
> sending the first packet and might be able to do bulk reads
> for the next packet(s) when that finishes.
>
> The extra code you are adding could easily (waving hands)
> be more expensive than the posted PCIe write.
> (Especially if you have to add an atomic operation.)
>
> Unless, of course, you have to wait for it to send that batch
> of packets before you can give it any more.
> Which would be rather entirely broken and would really require
> you do the write in the end-of-transit path.
The atomic operation is replaced in the very next patch. Other than that,
what is it that you suggest we should 'fix' in this implementation of handling xmit_more?
> The loop is something like:
> while (get_packet()) {
> per_cpu->xmit_more = !queue_empty();
> if (transmit_packet() != TX_OK)
> break;
> }
> So if a packet is added while all the transmit setup code is running
> it isn't detected.
> I managed to repeatedly get that to loop when xmit_more wasn't set
> and in a driver where the 'doorbell' write wasn't entirely trivial.
How are you suggesting we handle or diagnose such a case, in the driver? Can you provide an example
reference which handles adequately this special case?
When the net-next opens again, I can submit the patches again with the added changes you suggested. Thanks for reviewing!
Shinas
^ permalink raw reply
* Re: [PATCH bpf-next v6 11/18] ice: put XDP meta sources assignment under a static key condition
From: Maciej Fijalkowski @ 2023-11-02 13:23 UTC (permalink / raw)
To: Larysa Zaremba
Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
kpsingh, sdf, haoluo, jolsa, David Ahern, Jakub Kicinski,
Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
Alexander Lobakin, Magnus Karlsson, Maryam Tahhan, xdp-hints,
netdev, Willem de Bruijn, Alexei Starovoitov, Tariq Toukan,
Saeed Mahameed, toke
In-Reply-To: <ZUENpw5GVqcL0GJc@lzaremba-mobl.ger.corp.intel.com>
On Tue, Oct 31, 2023 at 03:22:31PM +0100, Larysa Zaremba wrote:
> On Sat, Oct 28, 2023 at 09:55:52PM +0200, Maciej Fijalkowski wrote:
> > On Mon, Oct 23, 2023 at 11:35:46AM +0200, Larysa Zaremba wrote:
> > > On Fri, Oct 20, 2023 at 06:32:13PM +0200, Maciej Fijalkowski wrote:
> > > > On Thu, Oct 12, 2023 at 07:05:17PM +0200, Larysa Zaremba wrote:
> > > > > Usage of XDP hints requires putting additional information after the
> > > > > xdp_buff. In basic case, only the descriptor has to be copied on a
> > > > > per-packet basis, because xdp_buff permanently resides before per-ring
> > > > > metadata (cached time and VLAN protocol ID).
> > > > >
> > > > > However, in ZC mode, xdp_buffs come from a pool, so memory after such
> > > > > buffer does not contain any reliable information, so everything has to be
> > > > > copied, damaging the performance.
> > > > >
> > > > > Introduce a static key to enable meta sources assignment only when attached
> > > > > XDP program is device-bound.
> > > > >
> > > > > This patch eliminates a 6% performance drop in ZC mode, which was a result
> > > > > of addition of XDP hints to the driver.
> > > > >
> > > > > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > > > > ---
> > > > > drivers/net/ethernet/intel/ice/ice.h | 1 +
> > > > > drivers/net/ethernet/intel/ice/ice_main.c | 14 ++++++++++++++
> > > > > drivers/net/ethernet/intel/ice/ice_txrx.c | 3 ++-
> > > > > drivers/net/ethernet/intel/ice/ice_xsk.c | 3 +++
> > > > > 4 files changed, 20 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
> > > > > index 3d0f15f8b2b8..76d22be878a4 100644
> > > > > --- a/drivers/net/ethernet/intel/ice/ice.h
> > > > > +++ b/drivers/net/ethernet/intel/ice/ice.h
> > > > > @@ -210,6 +210,7 @@ enum ice_feature {
> > > > > };
> > > > >
> > > > > DECLARE_STATIC_KEY_FALSE(ice_xdp_locking_key);
> > > > > +DECLARE_STATIC_KEY_FALSE(ice_xdp_meta_key);
> > > > >
> > > > > struct ice_channel {
> > > > > struct list_head list;
> > > > > diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> > > > > index 47e8920e1727..ee0df86d34b7 100644
> > > > > --- a/drivers/net/ethernet/intel/ice/ice_main.c
> > > > > +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> > > > > @@ -48,6 +48,9 @@ MODULE_PARM_DESC(debug, "netif level (0=none,...,16=all)");
> > > > > DEFINE_STATIC_KEY_FALSE(ice_xdp_locking_key);
> > > > > EXPORT_SYMBOL(ice_xdp_locking_key);
> > > > >
> > > > > +DEFINE_STATIC_KEY_FALSE(ice_xdp_meta_key);
> > > > > +EXPORT_SYMBOL(ice_xdp_meta_key);
> > > > > +
> > > > > /**
> > > > > * ice_hw_to_dev - Get device pointer from the hardware structure
> > > > > * @hw: pointer to the device HW structure
> > > > > @@ -2634,6 +2637,11 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
> > > > > return -ENOMEM;
> > > > > }
> > > > >
> > > > > +static bool ice_xdp_prog_has_meta(struct bpf_prog *prog)
> > > > > +{
> > > > > + return prog && prog->aux->dev_bound;
> > > > > +}
> > > > > +
> > > > > /**
> > > > > * ice_vsi_assign_bpf_prog - set or clear bpf prog pointer on VSI
> > > > > * @vsi: VSI to set the bpf prog on
> > > > > @@ -2644,10 +2652,16 @@ static void ice_vsi_assign_bpf_prog(struct ice_vsi *vsi, struct bpf_prog *prog)
> > > > > struct bpf_prog *old_prog;
> > > > > int i;
> > > > >
> > > > > + if (ice_xdp_prog_has_meta(prog))
> > > > > + static_branch_inc(&ice_xdp_meta_key);
> > > >
> > > > i thought boolean key would be enough but inc/dec should serve properly
> > > > for example prog hotswap cases.
> > > >
> > >
> > > My thought process on using counting instead of boolean was: there can be
> > > several PFs that use the same driver, so therefore we need to keep track of how
> > > many od them use hints.
> >
> > Very good point. This implies that if PF0 has hints-enabled prog loaded,
> > PF1 with non-hints prog will "suffer" from it.
> >
> > Sorry for such a long delays in responses but I was having a hard time
> > making up my mind about it. In the end I have come up to some conclusions.
> > I know the timing for sending this response is not ideal, but I need to
> > get this off my chest and bring discussion back to life:)
> >
> > IMHO having static keys to eliminate ZC overhead does not scale. I assume
> > every other driver would have to follow that.
> >
> > XSK pool allows us to avoid initializing various things per each packet.
> > Instead, taking xdp_rxq_info as an example, each xdp_buff from pool has
> > xdp_rxq_info assigned at init time. With this in mind, we should have some
> > mechanism to set hints-specific things in xdp_buff_xsk::cb, at init time
> > as well. Such mechanism should not require us to expose driver's private
> > xdp_buff hints containers (such as ice_pkt_ctx) to XSK pool.
> >
> > Right now you moved phctime down to ice_pkt_ctx and to me that's the main
> > reason we have to copy ice_pkt_ctx to each xdp_buff on ZC. What if we keep
> > the cached_phctime at original offset in ring but ice_pkt_ctx would get a
> > pointer to that?
> >
> > This would allow us to init the pointer in each xdp_buff from XSK pool at
> > init time. I have come up with a way to program that via so called XSK
> > meta descriptors. Each desc would have data to write onto cb, offset
> > within cb and amount of bytes to write/copy.
> >
> > I'll share the diff below but note that I didn't measure how much lower
> > the performance is degraded. My icelake machine where I used to measure
> > performance-sensitive code got broke. For now we can't escape initing
> > eop_desc per each xdp_buff, but I moved it to alloc side, as we mangle
> > descs there anyway.
> >
> > I think mlx5 could benefit from that approach as well with initing the rq
> > ptr at init time.
> >
> > Diff does mostly these things:
> > - move cached_phctime to old place in ice_rx_ring and add ptr to that in
> > ice_pkt_ctx
> > - introduce xsk_pool_set_meta()
> > - use it from ice side.
> >
>
> Thank you for the code! I will probably send v7 with such changes. Are you OK,
> if patch with core changes would go with you as an author?
Yes or I can produce a patch and share, up to you.
>
> But also, I see a minor problem with that switching VLAN protocol does not
> trigger buffer allocation, so we have to point to that too, this probably means
> moving cached time back and finding 16 extra bits in CL3. Single pointer to
> {cached time, vlan_proto} would be copied to be after xdp_buff.
It's not that it has to trigger buffer allocation, we could stop the
interface if pool is present and update vlan proto on pool's xdp_buffs
(from quick glance i don't see that we're stopping iface for setting vlan
features) but that sounds like more of a hassle to do...
So yeah maybe let's just have a ptr in ice_pkt_ctx as well.
[...]
^ permalink raw reply
* [GIT PULL] Landlock updates for v6.7
From: Mickaël Salaün @ 2023-11-02 13:13 UTC (permalink / raw)
To: Linus Torvalds
Cc: Mickaël Salaün, Günther Noack, Paul Moore,
Willem de Bruijn, artem.kuzin, yusongping, linux-kernel,
linux-security-module, netdev, netfilter-devel
Hi Linus,
This PR adds initial network support for Landlock (TCP bind and connect
access control), contributed by Konstantin Meskhidze [1]. Please pull
these changes for v6.7-rc1 . These 13 commits merged cleanly with your
master branch and the LSM/dev branch [2]. The kernel code has been
tested in the latest linux-next releases for a month (next-20231003 [3])
but the related patch series has since been updated (while keeping the
same kernel code): extended tests, improved documentation and commit
messages. I rebased the latest patch series (with some cosmetic fixes)
on v6.6-rc7 and added two more tests.
A Landlock ruleset can now handle two new access rights:
LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP. When
handled, the related actions are denied unless explicitly allowed by a
Landlock network rule for a specific port.
The related patch series has been reviewed for almost two years, it has
evolved a lot and we now have reached a decent design, code and testing.
The refactored kernel code and the new test helpers also bring the
foundation to support more network protocols.
Test coverage for security/landlock is 92.4% of 710 lines according to
gcc/gcov-13, and it was 93.1% of 597 lines before this series. The
decrease in coverage is due to code refactoring to make the ruleset
management more generic (i.e. dealing with inodes and ports) that also
added new WARN_ON_ONCE() checks not possible to test from user space.
syzkaller has been updated accordingly [4], and such patched instance
(tailored to Landlock) has been running for a month, covering all the
new network-related code [5].
Link: https://lore.kernel.org/r/20231026014751.414649-1-konstantin.meskhidze@huawei.com [1]
Link: https://lore.kernel.org/r/CAHC9VhS1wwgH6NNd+cJz4MYogPiRV8NyPDd1yj5SpaxeUB4UVg@mail.gmail.com [2]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next-history.git/commit/?id=c8dc5ee69d3a [3]
Link: https://github.com/google/syzkaller/pull/4266 [4]
Link: https://storage.googleapis.com/syzbot-assets/82e8608dec36/ci-upstream-linux-next-kasan-gce-root-ab577164.html#security%2flandlock%2fnet.c [5]
Regards,
Mickaël
--
The following changes since commit 05d3ef8bba77c1b5f98d941d8b2d4aeab8118ef1:
Linux 6.6-rc7 (2023-10-22 12:11:21 -1000)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux.git tags/landlock-6.7-rc1
for you to fetch changes up to f12f8f84509a084399444c4422661345a15cc713:
selftests/landlock: Add tests for FS topology changes with network rules (2023-10-27 17:53:31 +0200)
----------------------------------------------------------------
Landlock updates for v6.7-rc1
----------------------------------------------------------------
Konstantin Meskhidze (11):
landlock: Make ruleset's access masks more generic
landlock: Refactor landlock_find_rule/insert_rule helpers
landlock: Refactor merge/inherit_ruleset helpers
landlock: Move and rename layer helpers
landlock: Refactor layer helpers
landlock: Refactor landlock_add_rule() syscall
landlock: Support network rules with TCP bind and connect
selftests/landlock: Share enforce_ruleset() helper
selftests/landlock: Add network tests
samples/landlock: Support TCP restrictions
landlock: Document network support
Mickaël Salaün (2):
landlock: Allow FS topology changes for domains without such rule type
selftests/landlock: Add tests for FS topology changes with network rules
Documentation/userspace-api/landlock.rst | 99 +-
include/uapi/linux/landlock.h | 55 +
samples/landlock/sandboxer.c | 115 +-
security/landlock/Kconfig | 1 +
security/landlock/Makefile | 2 +
security/landlock/fs.c | 232 ++--
security/landlock/limits.h | 6 +
security/landlock/net.c | 200 +++
security/landlock/net.h | 33 +
security/landlock/ruleset.c | 405 ++++--
security/landlock/ruleset.h | 185 ++-
security/landlock/setup.c | 2 +
security/landlock/syscalls.c | 158 ++-
tools/testing/selftests/landlock/base_test.c | 2 +-
tools/testing/selftests/landlock/common.h | 13 +
tools/testing/selftests/landlock/config | 4 +
tools/testing/selftests/landlock/fs_test.c | 69 +-
tools/testing/selftests/landlock/net_test.c | 1738 ++++++++++++++++++++++++++
18 files changed, 2967 insertions(+), 352 deletions(-)
create mode 100644 security/landlock/net.c
create mode 100644 security/landlock/net.h
create mode 100644 tools/testing/selftests/landlock/net_test.c
^ permalink raw reply
* [PATCH net-next v2 4/5] virtio-net: support rx netdim
From: Heng Qi @ 2023-11-02 13:09 UTC (permalink / raw)
To: netdev, virtualization
Cc: Jason Wang, Michael S. Tsirkin, Eric Dumazet, Paolo Abeni,
David S. Miller, Jesper Dangaard Brouer, John Fastabend,
Alexei Starovoitov, Simon Horman, Jakub Kicinski, Liu, Yujie
In-Reply-To: <cover.1698929590.git.hengqi@linux.alibaba.com>
By comparing the traffic information in the complete napi processes,
let the virtio-net driver automatically adjust the coalescing
moderation parameters of each receive queue.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
---
v1->v2:
- Improved the judgment of dim switch conditions.
- Fix the safe problem of work thread.
drivers/net/virtio_net.c | 187 ++++++++++++++++++++++++++++++++++-----
1 file changed, 165 insertions(+), 22 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 69fe09e99b3c..5473aa1ee5cd 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -19,6 +19,7 @@
#include <linux/average.h>
#include <linux/filter.h>
#include <linux/kernel.h>
+#include <linux/dim.h>
#include <net/route.h>
#include <net/xdp.h>
#include <net/net_failover.h>
@@ -172,6 +173,17 @@ struct receive_queue {
struct virtnet_rq_stats stats;
+ /* The number of rx notifications */
+ u16 calls;
+
+ /* Is dynamic interrupt moderation enabled? */
+ bool dim_enabled;
+
+ /* Dynamic Interrupt Moderation */
+ struct dim dim;
+
+ u32 packets_in_napi;
+
struct virtnet_interrupt_coalesce intr_coal;
/* Chain pages by the private ptr. */
@@ -305,6 +317,9 @@ struct virtnet_info {
u8 duplex;
u32 speed;
+ /* Is rx dynamic interrupt moderation enabled? */
+ bool rx_dim_enabled;
+
/* Interrupt coalescing settings */
struct virtnet_interrupt_coalesce intr_coal_tx;
struct virtnet_interrupt_coalesce intr_coal_rx;
@@ -2001,6 +2016,7 @@ static void skb_recv_done(struct virtqueue *rvq)
struct virtnet_info *vi = rvq->vdev->priv;
struct receive_queue *rq = &vi->rq[vq2rxq(rvq)];
+ rq->calls++;
virtqueue_napi_schedule(&rq->napi, rvq);
}
@@ -2141,6 +2157,26 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
}
}
+static void virtnet_rx_dim_work(struct work_struct *work);
+
+static void virtnet_rx_dim_update(struct virtnet_info *vi, struct receive_queue *rq)
+{
+ struct dim_sample cur_sample = {};
+
+ if (!rq->packets_in_napi)
+ return;
+
+ u64_stats_update_begin(&rq->stats.syncp);
+ dim_update_sample(rq->calls,
+ u64_stats_read(&rq->stats.packets),
+ u64_stats_read(&rq->stats.bytes),
+ &cur_sample);
+ u64_stats_update_end(&rq->stats.syncp);
+
+ net_dim(&rq->dim, cur_sample);
+ rq->packets_in_napi = 0;
+}
+
static int virtnet_poll(struct napi_struct *napi, int budget)
{
struct receive_queue *rq =
@@ -2149,17 +2185,22 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
struct send_queue *sq;
unsigned int received;
unsigned int xdp_xmit = 0;
+ bool napi_complete;
virtnet_poll_cleantx(rq);
received = virtnet_receive(rq, budget, &xdp_xmit);
+ rq->packets_in_napi += received;
if (xdp_xmit & VIRTIO_XDP_REDIR)
xdp_do_flush();
/* Out of packets? */
- if (received < budget)
- virtqueue_napi_complete(napi, rq->vq, received);
+ if (received < budget) {
+ napi_complete = virtqueue_napi_complete(napi, rq->vq, received);
+ if (napi_complete && rq->dim_enabled)
+ virtnet_rx_dim_update(vi, rq);
+ }
if (xdp_xmit & VIRTIO_XDP_TX) {
sq = virtnet_xdp_get_sq(vi);
@@ -2179,6 +2220,7 @@ static void virtnet_disable_queue_pair(struct virtnet_info *vi, int qp_index)
virtnet_napi_tx_disable(&vi->sq[qp_index].napi);
napi_disable(&vi->rq[qp_index].napi);
xdp_rxq_info_unreg(&vi->rq[qp_index].xdp_rxq);
+ cancel_work_sync(&vi->rq[qp_index].dim.work);
}
static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
@@ -2196,6 +2238,9 @@ static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
if (err < 0)
goto err_xdp_reg_mem_model;
+ INIT_WORK(&vi->rq[qp_index].dim.work, virtnet_rx_dim_work);
+ vi->rq[qp_index].dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+
virtnet_napi_enable(vi->rq[qp_index].vq, &vi->rq[qp_index].napi);
virtnet_napi_tx_enable(vi, vi->sq[qp_index].vq, &vi->sq[qp_index].napi);
@@ -2393,8 +2438,10 @@ static int virtnet_rx_resize(struct virtnet_info *vi,
qindex = rq - vi->rq;
- if (running)
+ if (running) {
napi_disable(&rq->napi);
+ cancel_work_sync(&rq->dim.work);
+ }
err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_buf);
if (err)
@@ -2403,8 +2450,10 @@ static int virtnet_rx_resize(struct virtnet_info *vi,
if (!try_fill_recv(vi, rq, GFP_KERNEL))
schedule_delayed_work(&vi->refill, 0);
- if (running)
+ if (running) {
+ INIT_WORK(&rq->dim.work, virtnet_rx_dim_work);
virtnet_napi_enable(rq->vq, &rq->napi);
+ }
return err;
}
@@ -3341,24 +3390,51 @@ static int virtnet_send_tx_notf_coal_cmds(struct virtnet_info *vi,
static int virtnet_send_rx_notf_coal_cmds(struct virtnet_info *vi,
struct ethtool_coalesce *ec)
{
+ bool rx_ctrl_dim_on = !!ec->use_adaptive_rx_coalesce;
struct scatterlist sgs_rx;
+ bool switch_dim, update;
int i;
- vi->ctrl->coal_rx.rx_usecs = cpu_to_le32(ec->rx_coalesce_usecs);
- vi->ctrl->coal_rx.rx_max_packets = cpu_to_le32(ec->rx_max_coalesced_frames);
- sg_init_one(&sgs_rx, &vi->ctrl->coal_rx, sizeof(vi->ctrl->coal_rx));
-
- if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
- VIRTIO_NET_CTRL_NOTF_COAL_RX_SET,
- &sgs_rx))
- return -EINVAL;
+ switch_dim = rx_ctrl_dim_on != vi->rx_dim_enabled;
+ if (switch_dim) {
+ if (rx_ctrl_dim_on) {
+ if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_VQ_NOTF_COAL)) {
+ vi->rx_dim_enabled = true;
+ for (i = 0; i < vi->max_queue_pairs; i++)
+ vi->rq[i].dim_enabled = true;
+ } else {
+ return -EOPNOTSUPP;
+ }
+ } else {
+ vi->rx_dim_enabled = false;
+ for (i = 0; i < vi->max_queue_pairs; i++)
+ vi->rq[i].dim_enabled = false;
+ }
+ } else {
+ if (ec->rx_coalesce_usecs != vi->intr_coal_rx.max_usecs ||
+ ec->rx_max_coalesced_frames != vi->intr_coal_rx.max_packets)
+ update = true;
- /* Save parameters */
- vi->intr_coal_rx.max_usecs = ec->rx_coalesce_usecs;
- vi->intr_coal_rx.max_packets = ec->rx_max_coalesced_frames;
- for (i = 0; i < vi->max_queue_pairs; i++) {
- vi->rq[i].intr_coal.max_usecs = ec->rx_coalesce_usecs;
- vi->rq[i].intr_coal.max_packets = ec->rx_max_coalesced_frames;
+ if (vi->rx_dim_enabled) {
+ if (update)
+ return -EINVAL;
+ } else {
+ vi->ctrl->coal_rx.rx_usecs = cpu_to_le32(ec->rx_coalesce_usecs);
+ vi->ctrl->coal_rx.rx_max_packets = cpu_to_le32(ec->rx_max_coalesced_frames);
+ sg_init_one(&sgs_rx, &vi->ctrl->coal_rx, sizeof(vi->ctrl->coal_rx));
+
+ if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
+ VIRTIO_NET_CTRL_NOTF_COAL_RX_SET,
+ &sgs_rx))
+ return -EINVAL;
+
+ vi->intr_coal_rx.max_usecs = ec->rx_coalesce_usecs;
+ vi->intr_coal_rx.max_packets = ec->rx_max_coalesced_frames;
+ for (i = 0; i < vi->max_queue_pairs; i++) {
+ vi->rq[i].intr_coal.max_usecs = ec->rx_coalesce_usecs;
+ vi->rq[i].intr_coal.max_packets = ec->rx_max_coalesced_frames;
+ }
+ }
}
return 0;
@@ -3380,15 +3456,54 @@ static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
return 0;
}
+static int virtnet_send_rx_notf_coal_vq_cmds(struct virtnet_info *vi,
+ struct ethtool_coalesce *ec,
+ u16 queue)
+{
+ bool rx_ctrl_dim_on = !!ec->use_adaptive_rx_coalesce;
+ bool cur_rx_dim = vi->rq[queue].dim_enabled;
+ u32 max_usecs, max_packets;
+ bool switch_dim, update;
+ int err;
+
+ switch_dim = rx_ctrl_dim_on != cur_rx_dim;
+ if (switch_dim) {
+ if (rx_ctrl_dim_on)
+ vi->rq[queue].dim_enabled = true;
+ else
+ vi->rq[queue].dim_enabled = false;
+ } else {
+ max_usecs = vi->rq[queue].intr_coal.max_usecs;
+ max_packets = vi->rq[queue].intr_coal.max_packets;
+ if (ec->rx_coalesce_usecs != max_usecs ||
+ ec->rx_max_coalesced_frames != max_packets)
+ update = true;
+
+ if (cur_rx_dim) {
+ if (update)
+ return -EINVAL;
+ } else {
+ if (!update)
+ return 0;
+
+ err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, queue,
+ ec->rx_coalesce_usecs,
+ ec->rx_max_coalesced_frames);
+ if (err)
+ return err;
+ }
+ }
+
+ return 0;
+}
+
static int virtnet_send_notf_coal_vq_cmds(struct virtnet_info *vi,
struct ethtool_coalesce *ec,
u16 queue)
{
int err;
- err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, queue,
- ec->rx_coalesce_usecs,
- ec->rx_max_coalesced_frames);
+ err = virtnet_send_rx_notf_coal_vq_cmds(vi, ec, queue);
if (err)
return err;
@@ -3401,6 +3516,32 @@ static int virtnet_send_notf_coal_vq_cmds(struct virtnet_info *vi,
return 0;
}
+static void virtnet_rx_dim_work(struct work_struct *work)
+{
+ struct dim *dim = container_of(work, struct dim, work);
+ struct receive_queue *rq = container_of(dim,
+ struct receive_queue, dim);
+ struct virtnet_info *vi = rq->vq->vdev->priv;
+ struct net_device *dev = vi->dev;
+ struct dim_cq_moder update_moder;
+ int qnum = rq - vi->rq, err;
+
+ update_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
+ if (update_moder.usec != vi->rq[qnum].intr_coal.max_usecs ||
+ update_moder.pkts != vi->rq[qnum].intr_coal.max_packets) {
+ rtnl_lock();
+ err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
+ update_moder.usec,
+ update_moder.pkts);
+ if (err)
+ pr_debug("%s: Failed to send dim parameters on rxq%d\n",
+ dev->name, (int)(rq - vi->rq));
+ rtnl_unlock();
+ }
+
+ dim->state = DIM_START_MEASURE;
+}
+
static int virtnet_coal_params_supported(struct ethtool_coalesce *ec)
{
/* usecs coalescing is supported only if VIRTIO_NET_F_NOTF_COAL
@@ -3482,6 +3623,7 @@ static int virtnet_get_coalesce(struct net_device *dev,
ec->tx_coalesce_usecs = vi->intr_coal_tx.max_usecs;
ec->tx_max_coalesced_frames = vi->intr_coal_tx.max_packets;
ec->rx_max_coalesced_frames = vi->intr_coal_rx.max_packets;
+ ec->use_adaptive_rx_coalesce = vi->rx_dim_enabled;
} else {
ec->rx_max_coalesced_frames = 1;
@@ -3539,6 +3681,7 @@ static int virtnet_get_per_queue_coalesce(struct net_device *dev,
ec->tx_coalesce_usecs = vi->sq[queue].intr_coal.max_usecs;
ec->tx_max_coalesced_frames = vi->sq[queue].intr_coal.max_packets;
ec->rx_max_coalesced_frames = vi->rq[queue].intr_coal.max_packets;
+ ec->use_adaptive_rx_coalesce = vi->rq[queue].dim_enabled;
} else {
ec->rx_max_coalesced_frames = 1;
@@ -3664,7 +3807,7 @@ static int virtnet_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info)
static const struct ethtool_ops virtnet_ethtool_ops = {
.supported_coalesce_params = ETHTOOL_COALESCE_MAX_FRAMES |
- ETHTOOL_COALESCE_USECS,
+ ETHTOOL_COALESCE_USECS | ETHTOOL_COALESCE_USE_ADAPTIVE_RX,
.get_drvinfo = virtnet_get_drvinfo,
.get_link = ethtool_op_get_link,
.get_ringparam = virtnet_get_ringparam,
--
2.19.1.6.gb485710b
^ permalink raw reply related
* [PATCH net-next v2 5/5] virtio-net: return -EOPNOTSUPP for adaptive-tx
From: Heng Qi @ 2023-11-02 13:09 UTC (permalink / raw)
To: netdev, virtualization
Cc: Jason Wang, Michael S. Tsirkin, Eric Dumazet, Paolo Abeni,
David S. Miller, Jesper Dangaard Brouer, John Fastabend,
Alexei Starovoitov, Simon Horman, Jakub Kicinski, Liu, Yujie
In-Reply-To: <cover.1698929590.git.hengqi@linux.alibaba.com>
We do not currently support tx dim, so respond to -EOPNOTSUPP.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
---
v1->v2:
- Use -EOPNOTSUPP instead of specific implementation.
drivers/net/virtio_net.c | 29 ++++++++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5473aa1ee5cd..03edeadd0725 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3364,9 +3364,15 @@ static int virtnet_get_link_ksettings(struct net_device *dev,
static int virtnet_send_tx_notf_coal_cmds(struct virtnet_info *vi,
struct ethtool_coalesce *ec)
{
+ bool tx_ctrl_dim_on = !!ec->use_adaptive_tx_coalesce;
struct scatterlist sgs_tx;
int i;
+ if (tx_ctrl_dim_on) {
+ pr_debug("Failed to enable adaptive-tx, which is not supported\n");
+ return -EOPNOTSUPP;
+ }
+
vi->ctrl->coal_tx.tx_usecs = cpu_to_le32(ec->tx_coalesce_usecs);
vi->ctrl->coal_tx.tx_max_packets = cpu_to_le32(ec->tx_max_coalesced_frames);
sg_init_one(&sgs_tx, &vi->ctrl->coal_tx, sizeof(vi->ctrl->coal_tx));
@@ -3497,6 +3503,25 @@ static int virtnet_send_rx_notf_coal_vq_cmds(struct virtnet_info *vi,
return 0;
}
+static int virtnet_send_tx_notf_coal_vq_cmds(struct virtnet_info *vi,
+ struct ethtool_coalesce *ec,
+ u16 queue)
+{
+ bool tx_ctrl_dim_on = !!ec->use_adaptive_tx_coalesce;
+ int err;
+
+ if (tx_ctrl_dim_on) {
+ pr_debug("Enabling adaptive-tx for txq%d is not supported\n", queue);
+ return -EOPNOTSUPP;
+ }
+
+ err = virtnet_send_tx_ctrl_coal_vq_cmd(vi, queue,
+ ec->tx_coalesce_usecs,
+ ec->tx_max_coalesced_frames);
+
+ return err;
+}
+
static int virtnet_send_notf_coal_vq_cmds(struct virtnet_info *vi,
struct ethtool_coalesce *ec,
u16 queue)
@@ -3507,9 +3532,7 @@ static int virtnet_send_notf_coal_vq_cmds(struct virtnet_info *vi,
if (err)
return err;
- err = virtnet_send_tx_ctrl_coal_vq_cmd(vi, queue,
- ec->tx_coalesce_usecs,
- ec->tx_max_coalesced_frames);
+ err = virtnet_send_tx_notf_coal_vq_cmds(vi, ec, queue);
if (err)
return err;
--
2.19.1.6.gb485710b
^ permalink raw reply related
* [PATCH net-next v2 3/5] virtio-net: extract virtqueue coalescig cmd for reuse
From: Heng Qi @ 2023-11-02 13:09 UTC (permalink / raw)
To: netdev, virtualization
Cc: Jason Wang, Michael S. Tsirkin, Eric Dumazet, Paolo Abeni,
David S. Miller, Jesper Dangaard Brouer, John Fastabend,
Alexei Starovoitov, Simon Horman, Jakub Kicinski, Liu, Yujie
In-Reply-To: <cover.1698929590.git.hengqi@linux.alibaba.com>
Extract commands to set virtqueue coalescing parameters for reuse
by ethtool -Q, vq resize and netdim.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 106 +++++++++++++++++++++++----------------
1 file changed, 64 insertions(+), 42 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0285301caf78..69fe09e99b3c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2849,6 +2849,58 @@ static void virtnet_cpu_notif_remove(struct virtnet_info *vi)
&vi->node_dead);
}
+static int virtnet_send_ctrl_coal_vq_cmd(struct virtnet_info *vi,
+ u16 vqn, u32 max_usecs, u32 max_packets)
+{
+ struct scatterlist sgs;
+
+ vi->ctrl->coal_vq.vqn = cpu_to_le16(vqn);
+ vi->ctrl->coal_vq.coal.max_usecs = cpu_to_le32(max_usecs);
+ vi->ctrl->coal_vq.coal.max_packets = cpu_to_le32(max_packets);
+ sg_init_one(&sgs, &vi->ctrl->coal_vq, sizeof(vi->ctrl->coal_vq));
+
+ if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
+ VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET,
+ &sgs))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int virtnet_send_rx_ctrl_coal_vq_cmd(struct virtnet_info *vi,
+ u16 queue, u32 max_usecs,
+ u32 max_packets)
+{
+ int err;
+
+ err = virtnet_send_ctrl_coal_vq_cmd(vi, rxq2vq(queue),
+ max_usecs, max_packets);
+ if (err)
+ return err;
+
+ vi->rq[queue].intr_coal.max_usecs = max_usecs;
+ vi->rq[queue].intr_coal.max_packets = max_packets;
+
+ return 0;
+}
+
+static int virtnet_send_tx_ctrl_coal_vq_cmd(struct virtnet_info *vi,
+ u16 queue, u32 max_usecs,
+ u32 max_packets)
+{
+ int err;
+
+ err = virtnet_send_ctrl_coal_vq_cmd(vi, txq2vq(queue),
+ max_usecs, max_packets);
+ if (err)
+ return err;
+
+ vi->sq[queue].intr_coal.max_usecs = max_usecs;
+ vi->sq[queue].intr_coal.max_packets = max_packets;
+
+ return 0;
+}
+
static void virtnet_get_ringparam(struct net_device *dev,
struct ethtool_ringparam *ring,
struct kernel_ethtool_ringparam *kernel_ring,
@@ -2906,14 +2958,11 @@ static int virtnet_set_ringparam(struct net_device *dev,
* through the VIRTIO_NET_CTRL_NOTF_COAL_TX_SET command, or, if the driver
* did not set any TX coalescing parameters, to 0.
*/
- err = virtnet_send_ctrl_coal_vq_cmd(vi, txq2vq(i),
- vi->intr_coal_tx.max_usecs,
- vi->intr_coal_tx.max_packets);
+ err = virtnet_send_tx_ctrl_coal_vq_cmd(vi, i,
+ vi->intr_coal_tx.max_usecs,
+ vi->intr_coal_tx.max_packets);
if (err)
return err;
-
- vi->sq[i].intr_coal.max_usecs = vi->intr_coal_tx.max_usecs;
- vi->sq[i].intr_coal.max_packets = vi->intr_coal_tx.max_packets;
}
if (ring->rx_pending != rx_pending) {
@@ -2922,14 +2971,11 @@ static int virtnet_set_ringparam(struct net_device *dev,
return err;
/* The reason is same as the transmit virtqueue reset */
- err = virtnet_send_ctrl_coal_vq_cmd(vi, rxq2vq(i),
- vi->intr_coal_rx.max_usecs,
- vi->intr_coal_rx.max_packets);
+ err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, i,
+ vi->intr_coal_rx.max_usecs,
+ vi->intr_coal_rx.max_packets);
if (err)
return err;
-
- vi->rq[i].intr_coal.max_usecs = vi->intr_coal_rx.max_usecs;
- vi->rq[i].intr_coal.max_packets = vi->intr_coal_rx.max_packets;
}
}
@@ -3334,48 +3380,24 @@ static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
return 0;
}
-static int virtnet_send_ctrl_coal_vq_cmd(struct virtnet_info *vi,
- u16 vqn, u32 max_usecs, u32 max_packets)
-{
- struct scatterlist sgs;
-
- vi->ctrl->coal_vq.vqn = cpu_to_le16(vqn);
- vi->ctrl->coal_vq.coal.max_usecs = cpu_to_le32(max_usecs);
- vi->ctrl->coal_vq.coal.max_packets = cpu_to_le32(max_packets);
- sg_init_one(&sgs, &vi->ctrl->coal_vq, sizeof(vi->ctrl->coal_vq));
-
- if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
- VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET,
- &sgs))
- return -EINVAL;
-
- return 0;
-}
-
static int virtnet_send_notf_coal_vq_cmds(struct virtnet_info *vi,
struct ethtool_coalesce *ec,
u16 queue)
{
int err;
- err = virtnet_send_ctrl_coal_vq_cmd(vi, rxq2vq(queue),
- ec->rx_coalesce_usecs,
- ec->rx_max_coalesced_frames);
+ err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, queue,
+ ec->rx_coalesce_usecs,
+ ec->rx_max_coalesced_frames);
if (err)
return err;
- vi->rq[queue].intr_coal.max_usecs = ec->rx_coalesce_usecs;
- vi->rq[queue].intr_coal.max_packets = ec->rx_max_coalesced_frames;
-
- err = virtnet_send_ctrl_coal_vq_cmd(vi, txq2vq(queue),
- ec->tx_coalesce_usecs,
- ec->tx_max_coalesced_frames);
+ err = virtnet_send_tx_ctrl_coal_vq_cmd(vi, queue,
+ ec->tx_coalesce_usecs,
+ ec->tx_max_coalesced_frames);
if (err)
return err;
- vi->sq[queue].intr_coal.max_usecs = ec->tx_coalesce_usecs;
- vi->sq[queue].intr_coal.max_packets = ec->tx_max_coalesced_frames;
-
return 0;
}
--
2.19.1.6.gb485710b
^ permalink raw reply related
* [PATCH net-next v2 2/5] virtio-net: separate rx/tx coalescing moderation cmds
From: Heng Qi @ 2023-11-02 13:09 UTC (permalink / raw)
To: netdev, virtualization
Cc: Jason Wang, Michael S. Tsirkin, Eric Dumazet, Paolo Abeni,
David S. Miller, Jesper Dangaard Brouer, John Fastabend,
Alexei Starovoitov, Simon Horman, Jakub Kicinski, Liu, Yujie
In-Reply-To: <cover.1698929590.git.hengqi@linux.alibaba.com>
This patch separates the rx and tx global coalescing moderation
commands to support netdim switches in subsequent patches.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
---
v1->v2:
- Add 'i' definition.
drivers/net/virtio_net.c | 31 ++++++++++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0ad2894e6a5e..0285301caf78 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3266,10 +3266,10 @@ static int virtnet_get_link_ksettings(struct net_device *dev,
return 0;
}
-static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
- struct ethtool_coalesce *ec)
+static int virtnet_send_tx_notf_coal_cmds(struct virtnet_info *vi,
+ struct ethtool_coalesce *ec)
{
- struct scatterlist sgs_tx, sgs_rx;
+ struct scatterlist sgs_tx;
int i;
vi->ctrl->coal_tx.tx_usecs = cpu_to_le32(ec->tx_coalesce_usecs);
@@ -3289,6 +3289,15 @@ static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
vi->sq[i].intr_coal.max_packets = ec->tx_max_coalesced_frames;
}
+ return 0;
+}
+
+static int virtnet_send_rx_notf_coal_cmds(struct virtnet_info *vi,
+ struct ethtool_coalesce *ec)
+{
+ struct scatterlist sgs_rx;
+ int i;
+
vi->ctrl->coal_rx.rx_usecs = cpu_to_le32(ec->rx_coalesce_usecs);
vi->ctrl->coal_rx.rx_max_packets = cpu_to_le32(ec->rx_max_coalesced_frames);
sg_init_one(&sgs_rx, &vi->ctrl->coal_rx, sizeof(vi->ctrl->coal_rx));
@@ -3309,6 +3318,22 @@ static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
return 0;
}
+static int virtnet_send_notf_coal_cmds(struct virtnet_info *vi,
+ struct ethtool_coalesce *ec)
+{
+ int err;
+
+ err = virtnet_send_tx_notf_coal_cmds(vi, ec);
+ if (err)
+ return err;
+
+ err = virtnet_send_rx_notf_coal_cmds(vi, ec);
+ if (err)
+ return err;
+
+ return 0;
+}
+
static int virtnet_send_ctrl_coal_vq_cmd(struct virtnet_info *vi,
u16 vqn, u32 max_usecs, u32 max_packets)
{
--
2.19.1.6.gb485710b
^ permalink raw reply related
* [PATCH net-next v2 1/5] virtio-net: returns whether napi is complete
From: Heng Qi @ 2023-11-02 13:09 UTC (permalink / raw)
To: netdev, virtualization
Cc: Jason Wang, Michael S. Tsirkin, Eric Dumazet, Paolo Abeni,
David S. Miller, Jesper Dangaard Brouer, John Fastabend,
Alexei Starovoitov, Simon Horman, Jakub Kicinski, Liu, Yujie
In-Reply-To: <cover.1698929590.git.hengqi@linux.alibaba.com>
rx netdim needs to count the traffic during a complete napi process,
and start updating and comparing samples to make decisions after
the napi ends. Let virtqueue_napi_complete() return true if napi is done,
otherwise vice versa.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d16f592c2061..0ad2894e6a5e 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -431,7 +431,7 @@ static void virtqueue_napi_schedule(struct napi_struct *napi,
}
}
-static void virtqueue_napi_complete(struct napi_struct *napi,
+static bool virtqueue_napi_complete(struct napi_struct *napi,
struct virtqueue *vq, int processed)
{
int opaque;
@@ -440,9 +440,13 @@ static void virtqueue_napi_complete(struct napi_struct *napi,
if (napi_complete_done(napi, processed)) {
if (unlikely(virtqueue_poll(vq, opaque)))
virtqueue_napi_schedule(napi, vq);
+ else
+ return true;
} else {
virtqueue_disable_cb(vq);
}
+
+ return false;
}
static void skb_xmit_done(struct virtqueue *vq)
--
2.19.1.6.gb485710b
^ permalink raw reply related
* [PATCH net-next v2 0/5] virtio-net: support dynamic coalescing moderation
From: Heng Qi @ 2023-11-02 13:09 UTC (permalink / raw)
To: netdev, virtualization
Cc: Jason Wang, Michael S. Tsirkin, Eric Dumazet, Paolo Abeni,
David S. Miller, Jesper Dangaard Brouer, John Fastabend,
Alexei Starovoitov, Simon Horman, Jakub Kicinski, Liu, Yujie
Now, virtio-net already supports per-queue moderation parameter
setting. Based on this, we use the linux dimlib to support
dynamic coalescing moderation for virtio-net.
Due to some scheduling issues, we only support and test the rx dim.
Some test results:
I. Sockperf UDP
=================================================
1. Env
rxq_0 with affinity to cpu_0.
2. Cmd
client: taskset -c 0 sockperf tp -p 8989 -i $IP -t 10 -m 16B
server: taskset -c 0 sockperf sr -p 8989
3. Result
dim off: 1143277.00 rxpps, throughput 17.844 MBps, cpu is 100%.
dim on: 1124161.00 rxpps, throughput 17.610 MBps, cpu is 83.5%.
=================================================
II. Redis
=================================================
1. Env
There are 8 rxqs, and rxq_i with affinity to cpu_i.
2. Result
When all cpus are 100%, ops/sec of memtier_benchmark client is
dim off: 978437.23
dim on: 1143638.28
=================================================
III. Nginx
=================================================
1. Env
There are 8 rxqs and rxq_i with affinity to cpu_i.
2. Result
When all cpus are 100%, requests/sec of wrk client is
dim off: 877931.67
dim on: 1019160.31
=================================================
IV. Latency of sockperf udp
=================================================
1. Rx cmd
taskset -c 0 sockperf sr -p 8989
2. Tx cmd
taskset -c 0 sockperf pp -i ${ip} -p 8989 -t 10
After running this cmd 5 times and averaging the results,
3. Result
dim off: 17.7735 usec
dim on: 18.0110 usec
=================================================
Changelog:
v1->v2:
- Patch(2/5): a minor fix.
- Patch(4/5):
- improved the judgment of dim switch conditions.
- fix safe problem of work thread.
- Patch(5/5): Drop the tx dim implementation.
Heng Qi (5):
virtio-net: returns whether napi is complete
virtio-net: separate rx/tx coalescing moderation cmds
virtio-net: extract virtqueue coalescig cmd for reuse
virtio-net: support rx netdim
virtio-net: return -EOPNOTSUPP for adaptive-tx
drivers/net/virtio_net.c | 331 ++++++++++++++++++++++++++++++++-------
1 file changed, 274 insertions(+), 57 deletions(-)
--
2.19.1.6.gb485710b
^ permalink raw reply
* Re: [Patch v13 4/9] wifi: mac80211: Add support for WBRF features
From: Johannes Berg @ 2023-11-02 13:04 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: Ma Jun, amd-gfx, lenb, davem, edumazet, kuba, pabeni,
alexander.deucher, Lijo.Lazar, mario.limonciello, Netdev,
linux-wireless, LKML, linux-doc, platform-driver-x86, majun,
Evan Quan
In-Reply-To: <e42c5484-d66-e41a-8b2e-a1fa4495ce2@linux.intel.com>
On Thu, 2023-11-02 at 14:24 +0200, Ilpo Järvinen wrote:
> On Thu, 2 Nov 2023, Johannes Berg wrote:
> > On Thu, 2023-11-02 at 13:55 +0200, Ilpo Järvinen wrote:
> >
> > > > +static void get_chan_freq_boundary(u32 center_freq, u32 bandwidth, u64 *start, u64 *end)
> > > > +{
> > > > + bandwidth = MHZ_TO_KHZ(bandwidth);
> > > > + center_freq = MHZ_TO_KHZ(center_freq);
> > >
> > > Please use include/linux/units.h ones for these too.
> >
> > Now we're feature creeping though - this has existed for *years* in the
> > wireless stack with many instances? We can convert them over, I guess,
> > but not sure that makes much sense here - we'd want to add such macros
> > to units.h, but ... moving them can be independent of this patch?
>
> What new macros you're talking about?
Sorry, I got confused - for some reason I was pretty sure something here
was already being added to units.h in this patchset.
> Nothing new needs to be added
> as there's already KHZ_PER_MHZ so these would just be:
>
> bandwidth *= KHZ_PER_MHZ;
> center_freq *= KHZ_PER_MHZ;
Sure, and in this case that's probably pretty much equivalent. But
having a MHZ_TO_KHZ() macro isn't inherently *bad*, and I'm not sure
you're objection to it on anything other than "it's not defined in
units.h".
> Everything can of course be postponed by the argument that some
> subsystem specific mechanism has been there before the generic one
> but the end of that road won't be pretty... What I was trying to do
> here was to point out the new stuff introduced by this series into the
> direction of the generic thing.
I just think that the better course of action would be to eventually
move MHZ_TO_KHZ() to units.h ...
johannes
^ permalink raw reply
* Re: [PATCH net-next] net, sched: Fix SKB_NOT_DROPPED_YET splat under debug config
From: Jamal Hadi Salim @ 2023-11-02 12:47 UTC (permalink / raw)
To: Paolo Abeni; +Cc: Daniel Borkmann, kuba, idosch, netdev, bpf
In-Reply-To: <5ca2062477738b804ce805847f7aec024ad5988c.camel@redhat.com>
On Thu, Nov 2, 2023 at 6:17 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Fri, 2023-10-27 at 20:21 +0200, Daniel Borkmann wrote:
> > On 10/27/23 7:24 PM, Jamal Hadi Salim wrote:
> > > On Fri, Oct 27, 2023 at 9:51 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > > >
> > > > Ido reported:
> > > >
> > > > [...] getting the following splat [1] with CONFIG_DEBUG_NET=y and this
> > > > reproducer [2]. Problem seems to be that classifiers clear 'struct
> > > > tcf_result::drop_reason', thereby triggering the warning in
> > > > __kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0). [...]
> > > >
> > > > [1]
> > > > WARNING: CPU: 0 PID: 181 at net/core/skbuff.c:1082 kfree_skb_reason+0x38/0x130
> > > > Modules linked in:
> > > > CPU: 0 PID: 181 Comm: mausezahn Not tainted 6.6.0-rc6-custom-ge43e6d9582e0 #682
> > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
> > > > RIP: 0010:kfree_skb_reason+0x38/0x130
> > > > [...]
> > > > Call Trace:
> > > > <IRQ>
> > > > __netif_receive_skb_core.constprop.0+0x837/0xdb0
> > > > __netif_receive_skb_one_core+0x3c/0x70
> > > > process_backlog+0x95/0x130
> > > > __napi_poll+0x25/0x1b0
> > > > net_rx_action+0x29b/0x310
> > > > __do_softirq+0xc0/0x29b
> > > > do_softirq+0x43/0x60
> > > > </IRQ>
> > > >
> > > > [2]
> > > > #!/bin/bash
> > > >
> > > > ip link add name veth0 type veth peer name veth1
> > > > ip link set dev veth0 up
> > > > ip link set dev veth1 up
> > > > tc qdisc add dev veth1 clsact
> > > > tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
> > > > mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1
> > > >
> > > > What happens is that inside most classifiers the tcf_result is copied over
> > > > from a filter template e.g. *res = f->res which then implicitly overrides
> > > > the prior SKB_DROP_REASON_TC_{INGRESS,EGRESS} default drop code which was
> > > > set via sch_handle_{ingress,egress}() for kfree_skb_reason().
> > > >
> > > > Add a small helper tcf_set_result() and convert classifiers over to it.
> > > > The latter leaves the drop code intact and classifiers, actions as well
> > > > as the action engine in tcf_exts_exec() can then in future make use of
> > > > tcf_set_drop_reason(), too.
> > > >
> > > > Tested that the splat is fixed under CONFIG_DEBUG_NET=y with the repro.
> > > >
> > > > Fixes: 54a59aed395c ("net, sched: Make tc-related drop reason more flexible")
> > > > Reported-by: Ido Schimmel <idosch@idosch.org>
> > > > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> > > > Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> > > > Cc: Jakub Kicinski <kuba@kernel.org>
> > > > Link: https://lore.kernel.org/netdev/ZTjY959R+AFXf3Xy@shredder
> > > > ---
> > > > include/net/pkt_cls.h | 12 ++++++++++++
> > > > net/sched/cls_basic.c | 2 +-
> > > > net/sched/cls_bpf.c | 2 +-
> > > > net/sched/cls_flower.c | 2 +-
> > > > net/sched/cls_fw.c | 2 +-
> > > > net/sched/cls_matchall.c | 2 +-
> > > > net/sched/cls_route.c | 4 ++--
> > > > net/sched/cls_u32.c | 2 +-
> > > > 8 files changed, 20 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> > > > index a76c9171db0e..31d8e8587824 100644
> > > > --- a/include/net/pkt_cls.h
> > > > +++ b/include/net/pkt_cls.h
> > > > @@ -160,6 +160,18 @@ static inline void tcf_set_drop_reason(struct tcf_result *res,
> > > > res->drop_reason = reason;
> > > > }
> > > >
> > > > +static inline void tcf_set_result(struct tcf_result *to,
> > > > + const struct tcf_result *from)
> > > > +{
> > > > + /* tcf_result's drop_reason which is the last member must be
> > > > + * preserved and cannot be copied from the cls'es tcf_result
> > > > + * template given this is carried all the way and potentially
> > > > + * set to a concrete tc drop reason upon error or intentional
> > > > + * drop. See tcf_set_drop_reason() locations.
> > > > + */
> > > > + memcpy(to, from, offsetof(typeof(*to), drop_reason));
> > > > +}
> > >
> > > I believe our bigger issue here is we are using this struct now for
> > > both policy set by the control plane and for runtime decisions
> >
> > Hm, but that was also either way in the original rfc.
> >
> > > (drop_reason) - whereas the original assumption was this struct only
> > > held set policy. In retrospect we should have put the verdict(which is
> > > policy) here and return the error code (as was in the first patch). I
> > > am also not sure humans would not make a mistake on "this field must
> > > be at the end of the struct". Can we put some assert (or big comment
> > > on the struct) to make sure someone does not overwrite this field?
> >
> > Yeah that can be done.
>
> FTR, I agree the comment or even better a build_bug_on() somewhere
> should be better.
Paolo - Did you see the patch i posted? Ido/Daniel?
cheers,
jamal
>
> Thanks!
>
> Paolo
>
^ permalink raw reply
* Re: [PATCH net-next v2 8/9] microchip: lan865x: add driver support for Microchip's LAN865X MACPHY
From: Andrew Lunn @ 2023-11-02 12:39 UTC (permalink / raw)
To: Ramón Nordin Rodriguez
Cc: Parthiban Veerasooran, davem, edumazet, kuba, pabeni, robh+dt,
krzysztof.kozlowski+dt, conor+dt, corbet, steen.hegelund, rdunlap,
horms, casper.casan, netdev, devicetree, linux-kernel, linux-doc,
horatiu.vultur, Woojung.Huh, Nicolas.Ferre, UNGLinuxDriver,
Thorsten.Kummermehr
In-Reply-To: <ZUOUGf-PMGo8z1s-@debian>
> spe1: ethernet@1{
> compatible = "microchip,lan865x";
> reg = <1>;
> interrupt-parent = <&gpio5>;
> interrupts = <0 IRQ_TYPE_EDGE_FALLING>;
> spi-max-frequency = <50000000>;
That is a pretty high frequency. It is actually running at that speed?
Have you tried lower speed?
> oa-tc6{
> #address-cells = <1>;
> #size-cells = <0>;
> oa-cps = <32>;
> oa-prote;
> oa-dprac;
> };
> };
> };
>
> With this setup I'm getting a maximum throughput of about 90kB/s.
> If I increase the chunk size / oa-cps to 64 I get a might higher
> throughput ~900kB/s, but after 0-2s I get dump below (or similar).
Is this tcp traffic? Lost packets will have a big impact. You might
want to look at the link peer with tcpdump and look for retries. Also,
look if there are frames with bad checksums.
Andrew
^ permalink raw reply
* Re: [PATCH net v2] net: dsa: tag_rtl4_a: Bump min packet size
From: Vladimir Oltean @ 2023-11-02 12:32 UTC (permalink / raw)
To: Linus Walleij
Cc: Andrew Lunn, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <CACRpkdb=16uLhsXhktLCwUByDAMv9Arg2zzCA+oJW2HBJ35-Bg@mail.gmail.com>
On Tue, Oct 31, 2023 at 08:02:29PM +0100, Linus Walleij wrote:
> On Tue, Oct 31, 2023 at 5:34 PM Vladimir Oltean <olteanv@gmail.com> wrote:
> > So on the gemini-dlink-dir-685.dts platform, you can also use &gmac1 as
> > a plain Ethernet port, right?
>
> As a port it exist on the SoC yes but it is not connected physically
> to anything.
>
> &gmac0 is connected to the switch, and the switch has all the PHYs.
(...)
> If you by remote end mean the end of a physical cable there is
> no way I can do that, as I have no PHY on gmac1.
>
> (I don't know if I misunderstand the question...)
No, you aren't.
> But I have other Gemini platforms, so I will try to do it on one
> of them! Let's see if I can do this thing....
Ok.
^ permalink raw reply
* Re: [PATCH v3] net: r8169: Disable multicast filter for RTL8168H and RTL8107E
From: patchwork-bot+netdevbpf @ 2023-11-02 12:30 UTC (permalink / raw)
To: Patrick Thompson
Cc: netdev, hau, davem, edumazet, hkallweit1, kuba, pabeni,
linux-kernel, nic_swsd
In-Reply-To: <20231030205031.177855-1-ptf@google.com>
Hello:
This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:
On Mon, 30 Oct 2023 16:50:14 -0400 you wrote:
> RTL8168H and RTL8107E ethernet adapters erroneously filter unicast
> eapol packets unless allmulti is enabled. These devices correspond to
> RTL_GIGA_MAC_VER_46 and VER_48. Add an exception for VER_46 and VER_48
> in the same way that VER_35 has an exception.
>
> Fixes: 6e1d0b898818 ("r8169:add support for RTL8168H and RTL8107E")
> Signed-off-by: Patrick Thompson <ptf@google.com>
>
> [...]
Here is the summary with links:
- [v3] net: r8169: Disable multicast filter for RTL8168H and RTL8107E
https://git.kernel.org/netdev/net/c/efa5f1311c49
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH net 1/7] net: hns3: fix add VLAN fail issue
From: Jijie Shao @ 2023-11-02 12:29 UTC (permalink / raw)
To: Paolo Abeni, yisen.zhuang, salil.mehta, davem, edumazet, kuba
Cc: shaojijie, shenjian15, wangjie125, liuyonglong, netdev,
linux-kernel
In-Reply-To: <6603e0480feea2e7a28a865705da52bb99679a35.camel@redhat.com>
on 2023/11/2 18:31, Paolo Abeni wrote:
> On Sat, 2023-10-28 at 10:59 +0800, Jijie Shao wrote:
>
> /* when port base vlan enabled, we use port base vlan as the vlan
> * filter entry. In this case, we don't update vlan filter table
> @@ -10481,7 +10482,9 @@ int hclge_set_vlan_filter(struct hnae3_handle *handle, __be16 proto,
> * and try to remove it from hw later, to be consistence
> * with stack
> */
> + mutex_lock(&hdev->vport_lock);
> set_bit(vlan_id, vport->vlan_del_fail_bmap);
> + mutex_unlock(&hdev->vport_lock);
> It looks like that the 'hclge_rm_vport_vlan_table()' call a few lines
> above will now happen with the vport_lock unlocked.
>
> That looks racy and would deserve at least a comment explaining why
> it's safe.
>
> Thanks,
>
> Paolo
Yeah, this is a problem that causes the function to be invoked when it
is not locked. We'll get lock before the function be invoked in V2
Thanks, Jijie
^ permalink raw reply
* Re: [PATCH v2] netfilter: nf_tables: prevent OOB access in nft_byteorder_eval
From: Dan Carpenter @ 2023-11-02 12:27 UTC (permalink / raw)
To: Florian Westphal
Cc: Thadeu Lima de Souza Cascardo, netfilter-devel, netdev,
Pablo Neira Ayuso, Harshit Mogalapalli
In-Reply-To: <20231102102846.GE6174@breakpoint.cc>
On Thu, Nov 02, 2023 at 11:28:46AM +0100, Florian Westphal wrote:
> Dan Carpenter <dan.carpenter@linaro.org> wrote:
> > This patch is correct, but shouldn't we fix the code for 64 bit writes
> > as well?
>
> Care to send a patch?
>
Sure. Will do.
regads,
dan carpenter
^ permalink raw reply
* RE: [PATCH 1/2][net-next] skbuff: move netlink_large_alloc_large_skb() to skbuff.c
From: Li,Rongqing @ 2023-11-02 12:09 UTC (permalink / raw)
To: Yunsheng Lin, netdev@vger.kernel.org
In-Reply-To: <50622ac2-0939-af35-5d62-c56249e7bd26@huawei.com>
> -----Original Message-----
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Sent: Thursday, November 2, 2023 7:02 PM
> To: Li,Rongqing <lirongqing@baidu.com>; netdev@vger.kernel.org
> Subject: Re: [PATCH 1/2][net-next] skbuff: move netlink_large_alloc_large_skb()
> to skbuff.c
>
> On 2023/11/2 14:28, Li RongQing wrote:
> > move netlink_alloc_large_skb and netlink_skb_destructor to skbuff.c
> > and rename them more generic, so they can be used elsewhere large
> > non-contiguous physical memory is needed
> >
> > Signed-off-by: Li RongQing <lirongqing@baidu.com>
> > ---
> > include/linux/skbuff.h | 3 +++
> > net/core/skbuff.c | 40
> ++++++++++++++++++++++++++++++++++++++++
> > net/netlink/af_netlink.c | 41
> > ++---------------------------------------
> > 3 files changed, 45 insertions(+), 39 deletions(-)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index
> > 4174c4b..774a401 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -5063,5 +5063,8 @@ static inline void skb_mark_for_recycle(struct
> > sk_buff *skb) ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter
> *iter,
> > ssize_t maxsize, gfp_t gfp);
> >
> > +
> > +void large_skb_destructor(struct sk_buff *skb); struct sk_buff
> > +*alloc_large_skb(unsigned int size, int broadcast);
> > #endif /* __KERNEL__ */
> > #endif /* _LINUX_SKBUFF_H */
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c index
> > 4570705..20ffcd5 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -6917,3 +6917,43 @@ ssize_t skb_splice_from_iter(struct sk_buff *skb,
> struct iov_iter *iter,
> > return spliced ?: ret;
> > }
> > EXPORT_SYMBOL(skb_splice_from_iter);
> > +
> > +void large_skb_destructor(struct sk_buff *skb) {
> > + if (is_vmalloc_addr(skb->head)) {
> > + if (!skb->cloned ||
> > + !atomic_dec_return(&(skb_shinfo(skb)->dataref)))
> > + vfree(skb->head);
> > +
> > + skb->head = NULL;
>
> There seems to be an assumption that skb returned from
> netlink_alloc_large_skb() is not expecting the frag page for shinfo->frags*, as the
> above NULL setting will bypass most of the handling in skb_release_data(),then
> how can we ensure that the user is not breaking the assumption if we make it
> more generic?
>
How about to add WARN_ON(skb_shinfo(skb)-> nr_frags) to find this condition
-Li RongQing
>
^ permalink raw reply
* Re: [Patch v13 4/9] wifi: mac80211: Add support for WBRF features
From: Ilpo Järvinen @ 2023-11-02 12:24 UTC (permalink / raw)
To: Johannes Berg
Cc: Ma Jun, amd-gfx, lenb, davem, edumazet, kuba, pabeni,
alexander.deucher, Lijo.Lazar, mario.limonciello, Netdev,
linux-wireless, LKML, linux-doc, platform-driver-x86, majun,
Evan Quan
In-Reply-To: <b080757463a1f55a38484e3ea39fd3697e98409e.camel@sipsolutions.net>
[-- Attachment #1: Type: text/plain, Size: 1145 bytes --]
On Thu, 2 Nov 2023, Johannes Berg wrote:
> On Thu, 2023-11-02 at 13:55 +0200, Ilpo Järvinen wrote:
>
> > > +static void get_chan_freq_boundary(u32 center_freq, u32 bandwidth, u64 *start, u64 *end)
> > > +{
> > > + bandwidth = MHZ_TO_KHZ(bandwidth);
> > > + center_freq = MHZ_TO_KHZ(center_freq);
> >
> > Please use include/linux/units.h ones for these too.
>
> Now we're feature creeping though - this has existed for *years* in the
> wireless stack with many instances? We can convert them over, I guess,
> but not sure that makes much sense here - we'd want to add such macros
> to units.h, but ... moving them can be independent of this patch?
What new macros you're talking about? Nothing new needs to be added
as there's already KHZ_PER_MHZ so these would just be:
bandwidth *= KHZ_PER_MHZ;
center_freq *= KHZ_PER_MHZ;
Everything can of course be postponed by the argument that some
subsystem specific mechanism has been there before the generic one
but the end of that road won't be pretty... What I was trying to do
here was to point out the new stuff introduced by this series into the
direction of the generic thing.
--
i.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox