Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v2] wimax/i2400m: fix a memory leak bug
From: Perez-Gonzalez, Inaky @ 2019-08-18 22:22 UTC (permalink / raw)
  To: David Miller
  Cc: wenwen@cs.uga.edu, linux-wimax, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190818.141158.218871786116375619.davem@davemloft.net>

This driver should be orphaned. 

While I can’t certainly say nobody is using it, the HW has not been sold for years and it hasn’t been brought to current LK standards. 

If your assesment is the code shall not be used, it’s then another argument towards disconnecting it. 

> On Aug 18, 2019, at 14:12, David Miller <davem@davemloft.net> wrote:
> 
> From: Wenwen Wang <wenwen@cs.uga.edu>
> Date: Thu, 15 Aug 2019 15:29:51 -0500
> 
>> In i2400m_barker_db_init(), 'options_orig' is allocated through kstrdup()
>> to hold the original command line options. Then, the options are parsed.
>> However, if an error occurs during the parsing process, 'options_orig' is
>> not deallocated, leading to a memory leak bug. To fix this issue, free
>> 'options_orig' before returning the error.
>> 
>> Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
> 
> Applied, but... looking at the rest of this file I hope nobody is actually
> running this code.

^ permalink raw reply

* Re: regression in ath10k dma allocation
From: Tobias Klausmann @ 2019-08-18 22:38 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Nicolin Chen, Christoph Hellwig, kvalo, davem, ath10k,
	linux-wireless, netdev, linux-kernel, m.szyprowski, robin.murphy,
	iommu, tobias.klausmann
In-Reply-To: <20190818031328.11848-1-hdanton@sina.com>


On 18.08.19 05:13, Hillf Danton wrote:
> On Sat, 17 Aug 2019 00:42:48 +0200 Tobias Klausmann wrote:
>> Hi Nicolin,
>>
>> On 17.08.19 00:25, Nicolin Chen wrote:
>>> Hi Tobias
>>>
>>> On Fri, Aug 16, 2019 at 10:16:45PM +0200, Tobias Klausmann wrote:
>>>>> do you have CONFIG_DMA_CMA set in your config?  If not please make sure
>>>>> you have this commit in your testing tree, and if the problem still
>>>>> persists it would be a little odd and we'd have to dig deeper:
>>>>>
>>>>> commit dd3dcede9fa0a0b661ac1f24843f4a1b1317fdb6
>>>>> Author: Nicolin Chen <nicoleotsuka@gmail.com>
>>>>> Date:   Wed May 29 17:54:25 2019 -0700
>>>>>
>>>>>        dma-contiguous: fix !CONFIG_DMA_CMA version of dma_{alloc, free}_contiguous()
>>>> yes CONFIG_DMA_CMA is set (=y, see attached config), the commit you mention
>>>> above is included, if you have any hints how to go forward, please let me
>>>> know!
>>> For CONFIG_DMA_CMA=y, by judging the log with error code -12, I
>>> feel this one should work for you. Would you please check if it
>>> is included or try it out otherwise?
>>>
>>> dma-contiguous: do not overwrite align in dma_alloc_contiguous()
>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=c6622a425acd1d2f3a443cd39b490a8777b622d7
>>
>> Thanks for the hint, yet the commit is included and does not fix the
>> problem!
>>
Hi Hillf,

i just tested you first hunk (which comes from kernel/dma/direct.c if 
i'm not mistaken), it did not compile on its own, yet with a tiny bit of 
work it did, and it does indeed solve the regression. But if using that 
is the "right" way to do it, not sure, but its not on me to decide.

Anyway: Thanks for the hint,

Tobias


> Hi Tobias
>
> Two minor diffs below in hope that they might make sense.
>
> 1, fallback unless dma coherent ok.
>
> --- a/kernel/dma/contiguous.c
> +++ b/kernel/dma/contiguous.c
> @@ -246,6 +246,10 @@ struct page *dma_alloc_contiguous(struct
>   		size_t cma_align = min_t(size_t, align, CONFIG_CMA_ALIGNMENT);
>   
>   		page = cma_alloc(cma, count, cma_align, gfp & __GFP_NOWARN);
> +		if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
> +			dma_free_contiguous(dev, page, size);
> +			page = NULL;
> +		}
>   	}
>   
>   	/* Fallback allocation of normal pages */
> --
>
> 2, cleanup: cma unless contiguous
>
> --- a/kernel/dma/contiguous.c
> +++ b/kernel/dma/contiguous.c
> @@ -234,18 +234,13 @@ struct page *dma_alloc_contiguous(struct
>   	size_t count = PAGE_ALIGN(size) >> PAGE_SHIFT;
>   	size_t align = get_order(PAGE_ALIGN(size));
>   	struct page *page = NULL;
> -	struct cma *cma = NULL;
> -
> -	if (dev && dev->cma_area)
> -		cma = dev->cma_area;
> -	else if (count > 1)
> -		cma = dma_contiguous_default_area;
>   
>   	/* CMA can be used only in the context which permits sleeping */
> -	if (cma && gfpflags_allow_blocking(gfp)) {
> +	if (count > 1 && gfpflags_allow_blocking(gfp)) {
>   		size_t cma_align = min_t(size_t, align, CONFIG_CMA_ALIGNMENT);
>   
> -		page = cma_alloc(cma, count, cma_align, gfp & __GFP_NOWARN);
> +		page = cma_alloc(dev_get_cma_area(dev), count, cma_align,
> +							gfp & __GFP_NOWARN);
>   		if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
>   			dma_free_contiguous(dev, page, size);
>   			page = NULL;
> --
>

^ permalink raw reply

* Re: [PATCH v6 1/4] dt-bindings: net: phy: Add subnode for LED configuration
From: Andrew Lunn @ 2019-08-19  0:29 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: Pavel Machek, David S . Miller, Rob Herring, Mark Rutland,
	Florian Fainelli, Heiner Kallweit, netdev, devicetree,
	linux-kernel, Douglas Anderson
In-Reply-To: <20190816220411.GX250418@google.com>

On Fri, Aug 16, 2019 at 03:04:11PM -0700, Matthias Kaehlcke wrote:
> On Fri, Aug 16, 2019 at 10:13:38PM +0200, Pavel Machek wrote:
> > Hi!
> > 
> > Please Cc led mailing lists on led issues.
> 
> sorry for missing this
> 
> > On Tue 2019-08-13 12:11:44, Matthias Kaehlcke wrote:
> > > The LED behavior of some Ethernet PHYs is configurable. Add an
> > > optional 'leds' subnode with a child node for each LED to be
> > > configured. The binding aims to be compatible with the common
> > > LED binding (see devicetree/bindings/leds/common.txt).
> > > 
> > > A LED can be configured to be:
> > > 
> > > - 'on' when a link is active, some PHYs allow configuration for
> > >   certain link speeds
> > >   speeds
> > > - 'off'
> > > - blink on RX/TX activity, some PHYs allow configuration for
> > >   certain link speeds
> > > 
> > > For the configuration to be effective it needs to be supported by
> > > the hardware and the corresponding PHY driver.
> > > 
> > > Suggested-by: Andrew Lunn <andrew@lunn.ch>
> > > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> > 
> > > @@ -173,5 +217,20 @@ examples:
> > >              reset-gpios = <&gpio1 4 1>;
> > >              reset-assert-us = <1000>;
> > >              reset-deassert-us = <2000>;
> > > +
> > > +            leds {
> > > +                #address-cells = <1>;
> > > +                #size-cells = <0>;
> > > +
> > > +                led@0 {
> > > +                    reg = <0>;
> > > +                    linux,default-trigger = "phy-link-1g";
> > > +                };
> > 
> > Because this affects us.
> > 
> > Is the LED software controllable?
> 
> it might be for certain PHYs, integration with the LED framework is
> not part of this series.
> 
> > Or can it do limited subset of triggers you listed?
> 
> it depends on the PHY. The one in this series (RTL8211E) only supports
> a limited subset of the listed triggers.

Hi Pavel

At the moment, there is no integration with the LED
subsystem. However, i would like to be prepared for it in the
future. It will also require some extensions to the LED subsystem.
These triggers are hardware triggers. We would need to add support for
LED trigger offload to the hardware, not have Linux blink the LED in
software. Plus we need per LED triggers, not only global triggers.
Most Ethernet PHYs also allow on/off state to be set, so they could be
software controllable, and support the generic triggers Linux has.

It has been on my mind to do this for a while, but i've never had the
time. It should also be applicable to other subsystems which have
hardware to blink LEDs. Some serial ports can control LEDs for Rx/Tx
and flow control. Maybe disk/RAID controllers are configuration how
they blink there LEDs?

      Andrew

^ permalink raw reply

* Re: [PATCH v6 4/4] net: phy: realtek: Add LED configuration support for RTL8211E
From: Andrew Lunn @ 2019-08-19  0:37 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Matthias Kaehlcke, jacek.anaszewski, linux-leds, dmurphy,
	David S . Miller, Rob Herring, Mark Rutland, Florian Fainelli,
	Heiner Kallweit, netdev, devicetree, linux-kernel,
	Douglas Anderson
In-Reply-To: <20190817140502.GA5878@amd>

> Yes, I believe the integration is neccessary. Using same binding is
> neccessary for that, but not sufficient. For example, we need
> compatible trigger names, too.

Hi Pavel

Please could you explain what you mean by compatible trigger names?

> So... I'd really like to see proper integration is possible before we
> merge this.

Please let me turn that around. What do you see as being impossible at
the moment? What do we need to convince you about?

    Thanks
	Andrew

^ permalink raw reply

* Re: [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure
From: Andrew Lunn @ 2019-08-19  0:43 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Mark Brown, Hubert Feurstein, mlichvar, Richard Cochran,
	Florian Fainelli, linux-spi, netdev
In-Reply-To: <CA+h21hrZbun_j+oABJFP+P+V3zHP2x0mAhv-1ocF38miCvZHew@mail.gmail.com>

> MDIO bus controllers are in a similar situation (with Hubert's patch)
> but at least there the frame size is fixed and I haven't heard of an
> MDIO controller to use DMA.

Linux does not have any DMA driver MDIO busses, as far as i know. It
does not make sense, since you are only transferring 16bits of
data. The vast majority are polled completion, but there is one which
generates an interrupt on completion.

	Andrew

^ permalink raw reply

* [PATCH] sock: fix potential memory leak in proto_register()
From: zhanglin @ 2019-08-19  1:35 UTC (permalink / raw)
  To: davem
  Cc: ast, daniel, kafai, songliubraving, yhs, willemb, edumazet,
	deepa.kernel, arnd, dh.herrmann, gnault, netdev, linux-kernel,
	bpf, xue.zhihong, wang.yi59, jiang.xuexin, zhanglin

If protocols registered exceeded PROTO_INUSE_NR, prot will be
added to proto_list, but no available bit left for prot in
proto_inuse_idx.

Signed-off-by: zhanglin <zhang.lin16@zte.com.cn>
---
 net/core/sock.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index bc3512f230a3..25388d429f6a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3139,16 +3139,17 @@ static __init int net_inuse_init(void)
 
 core_initcall(net_inuse_init);
 
-static void assign_proto_idx(struct proto *prot)
+static int assign_proto_idx(struct proto *prot)
 {
 	prot->inuse_idx = find_first_zero_bit(proto_inuse_idx, PROTO_INUSE_NR);
 
 	if (unlikely(prot->inuse_idx == PROTO_INUSE_NR - 1)) {
 		pr_err("PROTO_INUSE_NR exhausted\n");
-		return;
+		return -ENOSPC;
 	}
 
 	set_bit(prot->inuse_idx, proto_inuse_idx);
+	return 0;
 }
 
 static void release_proto_idx(struct proto *prot)
@@ -3243,18 +3244,24 @@ int proto_register(struct proto *prot, int alloc_slab)
 	}
 
 	mutex_lock(&proto_list_mutex);
+	if (assign_proto_idx(prot)) {
+		mutex_unlock(&proto_list_mutex);
+		goto out_free_timewait_sock_slab_name;
+	}
 	list_add(&prot->node, &proto_list);
-	assign_proto_idx(prot);
 	mutex_unlock(&proto_list_mutex);
 	return 0;
 
 out_free_timewait_sock_slab_name:
-	kfree(prot->twsk_prot->twsk_slab_name);
+	if (alloc_slab && prot->twsk_prot)
+		kfree(prot->twsk_prot->twsk_slab_name);
 out_free_request_sock_slab:
-	req_prot_cleanup(prot->rsk_prot);
+	if (alloc_slab) {
+		req_prot_cleanup(prot->rsk_prot);
 
-	kmem_cache_destroy(prot->slab);
-	prot->slab = NULL;
+		kmem_cache_destroy(prot->slab);
+		prot->slab = NULL;
+	}
 out:
 	return -ENOBUFS;
 }
-- 
2.17.1


^ permalink raw reply related

* [GIT] Networking
From: David Miller @ 2019-08-19  2:46 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) Fix jmp to 1st instruction in x64 JIT, from Alexei Starovoitov.

2) Severl kTLS fixes in mlx5 driver, from Tariq Toukan.

3) Fix severe performance regression due to lack of SKB coalescing
   of fragments during local delivery, from Guillaume Nault.

4) Error path memory leak in sch_taprio, from Ivan Khoronzhuk.

5) Fix batched events in skbedit packet action, from Roman Mashak.

6) Propagate VLAN TX offload to hw_enc_features in bond and team drivers,
   from Yue Haibing.

7) RXRPC local endpoint refcounting fix and read after free in
   rxrpc_queue_local(), from David Howells.

8) Fix endian bug in ibmveth multicast list handling, from Thomas
   Falcon.

9) Oops, make nlmsg_parse() wrap around the correct function,
   __nlmsg_parse not __nla_parse().  Fix from David Ahern.

10) Memleak in sctp_scend_reset_streams(), fro Zheng Bin.

11) Fix memory leak in cxgb4, from Wenwen Wang.

12) Yet another race in AF_PACKET, from Eric Dumazet.

13) Fix false detection of retransmit failures in tipc, from Tuong
    Lien.

14) Use after free in ravb_tstamp_skb, from Tho Vu.

Please pull, thanks a lot!

The following changes since commit 33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2019-08-06 17:11:59 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git 

for you to fetch changes up to cfef46d692efd852a0da6803f920cc756eea2855:

  ravb: Fix use-after-free ravb_tstamp_skb (2019-08-18 14:19:14 -0700)

----------------------------------------------------------------
Alexei Starovoitov (2):
      bpf: fix x64 JIT code generation for jmp to 1st insn
      selftests/bpf: tests for jmp to 1st insn

Anders Roxell (1):
      selftests: net: tcp_fastopen_backup_key.sh: fix shellcheck issue

Andrii Nakryiko (2):
      libbpf: fix erroneous multi-closing of BTF FD
      libbpf: set BTF FD for prog only when there is supported .BTF.ext data

André Draszik (1):
      net: phy: at803x: stop switching phy delay config needlessly

Aya Levin (3):
      net/mlx5e: Fix false negative indication on tx reporter CQE recovery
      net/mlx5e: Fix error flow of CQE recovery on tx reporter
      net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter

Balakrishna Godavarthi (1):
      Bluetooth: btqca: Reset download type to default

Chen-Yu Tsai (1):
      net: dsa: Check existence of .port_mdb_add callback before calling it

Chris Packham (1):
      tipc: initialise addr_trail_end when setting node addresses

Claire Chang (1):
      Bluetooth: btqca: release_firmware after qca_inject_cmd_complete_event

Daniel Borkmann (3):
      Merge branch 'bpf-bpftool-pinning-error-msg'
      sock: make cookie generation global instead of per netns
      bpf: sync bpf.h to tools infrastructure

David Ahern (2):
      netdevsim: Restore per-network namespace accounting for fib entries
      netlink: Fix nlmsg_parse as a wrapper for strict message parsing

David Howells (5):
      rxrpc: Fix local endpoint refcounting
      rxrpc: Don't bother generating maxSkew in the ACK packet
      rxrpc: Fix local refcounting
      rxrpc: Fix local endpoint replacement
      rxrpc: Fix read-after-free in rxrpc_queue_local()

David S. Miller (12):
      Merge tag 'batadv-net-for-davem-20190808' of git://git.open-mesh.org/linux-merge
      Merge branch 'skbedit-batch-fixes'
      Merge tag 'rxrpc-fixes-20190809' of git://git.kernel.org/.../dhowells/linux-fs
      Merge branch 'Fix-collisions-in-socket-cookie-generation'
      Merge tag 'mlx5-fixes-2019-08-08' of git://git.kernel.org/.../saeed/linux
      Merge git://git.kernel.org/.../bpf/bpf
      Merge tag 'mlx5-fixes-2019-08-15' of git://git.kernel.org/.../saeed/linux
      Merge git://git.kernel.org/.../pablo/nf
      Merge tag 'rxrpc-fixes-20190814' of git://git.kernel.org/.../dhowells/linux-fs
      Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth
      Merge branch 'bnxt_en-Bug-fixes'
      Merge branch 'flow_offload-hardware-priority-fixes'

Denis Efremov (2):
      MAINTAINERS: PHY LIBRARY: Update files in the record
      MAINTAINERS: r8169: Update path to the driver

Dexuan Cui (1):
      hv_netvsc: Fix a warning of suspicious RCU usage

Dirk Morris (1):
      netfilter: conntrack: Use consistent ct id hash calculation

Eran Ben Elisha (1):
      net/mlx5e: Fix compatibility issue with ethtool flash device

Eric Dumazet (1):
      net/packet: fix race in tpacket_snd()

Fabian Henneke (1):
      Bluetooth: hidp: Let hidp_send_message return number of queued bytes

Florian Westphal (2):
      selftests: netfilter: extend flowtable test script for ipsec
      netfilter: nf_flow_table: fix offload for flows that are subject to xfrm

Fuqian Huang (1):
      net: tundra: tsi108: use spin_lock_irqsave instead of spin_lock_irq in IRQ context

Guillaume Nault (1):
      inet: frags: re-introduce skb coalescing for local delivery

Harish Bandi (1):
      Bluetooth: hci_qca: Send VS pre shutdown command.

Heiner Kallweit (1):
      net: phy: consider AN_RESTART status when reading link status

Huy Nguyen (2):
      net/mlx5: Support inner header match criteria for non decap flow action
      net/mlx5e: Only support tx/rx pause setting for port owner

Ivan Khoronzhuk (1):
      net: sched: sch_taprio: fix memleak in error path for sched list parse

Jakub Kicinski (4):
      net/tls: prevent skb_orphan() from leaking TLS plain text with offload
      tools: bpftool: fix error message (prog -> object)
      tools: bpftool: add error message on pin failure
      net/tls: swap sk_write_space on close

John Fastabend (1):
      net: tls, fix sk_write_space NULL write when tx disabled

Jonathan Neuschäfer (1):
      net: nps_enet: Fix function names in doc comments

Julian Wiedmann (1):
      s390/qeth: serialize cmd reply with concurrent timeout

Manish Chopra (1):
      bnx2x: Fix VF's VLAN reconfiguration in reload.

Marcel Holtmann (1):
      Bluetooth: Add debug setting for changing minimum encryption key size

Matthias Kaehlcke (2):
      Bluetooth: btqca: Add a short delay before downloading the NVM
      Bluetooth: btqca: Use correct byte format for opcode of injected command

Maxim Mikityanskiy (2):
      net/mlx5e: Use flow keys dissector to parse packets for ARFS
      net/mlx5e: Fix a race with XSKICOSQ in XSK wakeup flow

Michael Chan (2):
      bnxt_en: Fix VNIC clearing logic for 57500 chips.
      bnxt_en: Improve RX doorbell sequence.

Mohamad Heib (1):
      net/mlx5e: ethtool, Avoid setting speed to 56GBASE when autoneg off

Nathan Chancellor (1):
      net: tc35815: Explicitly check NET_IP_ALIGN is not zero in tc35815_rx

Pablo Neira Ayuso (6):
      netfilter: nf_tables: use-after-free in failing rule with bound set
      netfilter: nf_flow_table: conntrack picks up expired flows
      netfilter: nf_flow_table: teardown flow timeout race
      netfilter: nft_flow_offload: skip tcp rst and fin packets
      net: sched: use major priority number as hardware priority
      netfilter: nf_tables: map basechain priority to hardware priority

Petr Machata (1):
      mlxsw: spectrum_ptp: Keep unmatched entries in a linked list

Rocky Liao (1):
      Bluetooth: hci_qca: Skip 1 error print in device_want_to_sleep()

Roman Mashak (2):
      net sched: update skbedit action for batched events operations
      tc-testing: updated skbedit action tests with batch create/delete

Ross Lagerwall (1):
      xen/netback: Reset nr_frags before freeing skb

Somnath Kotur (1):
      bnxt_en: Fix to include flow direction in L2 key

Stephen Hemminger (3):
      docs: admin-guide: remove references to IPX and token-ring
      net: docs: replace IPX in tuntap documentation
      net: cavium: fix driver name

Sven Eckelmann (2):
      batman-adv: Fix netlink dumping of all mcast_flags buckets
      batman-adv: Fix deletion of RTR(4|6) mcast list entries

Taehee Yoo (1):
      ixgbe: fix possible deadlock in ixgbe_service_task()

Takshak Chahande (1):
      libbpf : make libbpf_num_possible_cpus function thread safe

Tariq Toukan (5):
      net/mlx5: crypto, Fix wrong offset in encryption key command
      net/mlx5: kTLS, Fix wrong TIS opmod constants
      net/mlx5e: kTLS, Fix progress params context WQE layout
      net/mlx5e: kTLS, Fix tisn field name
      net/mlx5e: kTLS, Fix tisn field placement

Tho Vu (1):
      ravb: Fix use-after-free ravb_tstamp_skb

Thomas Falcon (2):
      ibmveth: Convert multicast list size for little-endian system
      ibmvnic: Unmap DMA address of TX descriptor buffers after use

Tuong Lien (1):
      tipc: fix false detection of retransmit failures

Vasundhara Volam (2):
      bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails
      bnxt_en: Suppress HWRM errors for HWRM_NVM_GET_VARIABLE command

Venkat Duvvuru (1):
      bnxt_en: Use correct src_fid to determine direction of the flow

Wei Yongjun (2):
      Bluetooth: btusb: Fix error return code in btusb_mtk_setup_firmware()
      Bluetooth: hci_qca: Use kfree_skb() instead of kfree()

Wenwen Wang (8):
      net/mlx4_en: fix a memory leak bug
      cxgb4: fix a memory leak bug
      liquidio: add cleanup in octeon_setup_iq()
      net: myri10ge: fix memory leaks
      lan78xx: Fix memory leaks
      cx82310_eth: fix a memory leak bug
      net: kalmia: fix memory leaks
      wimax/i2400m: fix a memory leak bug

Xin Long (1):
      sctp: fix the transport error_count check

YueHaibing (3):
      bonding: Add vlan tx offload to hw_enc_features
      net: dsa: sja1105: remove set but not used variables 'tx_vid' and 'rx_vid'
      team: Add vlan tx offload to hw_enc_features

zhengbin (1):
      sctp: fix memleak in sctp_send_reset_streams

 Documentation/admin-guide/sysctl/net.rst                         |  29 +---------------
 Documentation/networking/tls-offload.rst                         |  18 ----------
 Documentation/networking/tuntap.txt                              |   4 +--
 MAINTAINERS                                                      |   4 +--
 arch/x86/net/bpf_jit_comp.c                                      |   9 ++---
 drivers/bluetooth/btqca.c                                        |  29 ++++++++++++++--
 drivers/bluetooth/btqca.h                                        |   7 ++++
 drivers/bluetooth/btusb.c                                        |   4 ++-
 drivers/bluetooth/hci_qca.c                                      |   9 +++--
 drivers/net/bonding/bond_main.c                                  |   2 ++
 drivers/net/dsa/sja1105/sja1105_main.c                           |   4 ---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c                  |   7 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h                  |   2 ++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c                 |  17 +++++++---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c                        |  36 +++++++++++++-------
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c                |   9 +++--
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c                |  12 +++----
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c                     |   8 ++---
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.h                     |   6 ++--
 drivers/net/ethernet/cavium/common/cavium_ptp.c                  |   2 +-
 drivers/net/ethernet/cavium/liquidio/request_manager.c           |   4 ++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c               |   4 ++-
 drivers/net/ethernet/ezchip/nps_enet.h                           |   4 +--
 drivers/net/ethernet/ibm/ibmveth.c                               |   9 ++---
 drivers/net/ethernet/ibm/ibmvnic.c                               |  11 ++-----
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c                    |   5 +--
 drivers/net/ethernet/mellanox/mlx4/en_rx.c                       |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h                     |  11 +++++--
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c         |  19 +++++------
 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c           |   3 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h          |   6 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c       |  10 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c                |  97 +++++++++++++++++++-----------------------------------
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c             |  46 ++++++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c                |   1 -
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c                  |  33 ++++++++++++-------
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h                |   4 +--
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c       |  12 +++----
 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c          |   9 +++++
 drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.c             |   1 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c               |   2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c               | 138 +++++++++++++++++++++++++++++++----------------------------------------------
 drivers/net/ethernet/mscc/ocelot_flower.c                        |  12 ++-----
 drivers/net/ethernet/myricom/myri10ge/myri10ge.c                 |   2 +-
 drivers/net/ethernet/netronome/nfp/flower/qos_conf.c             |   2 +-
 drivers/net/ethernet/renesas/ravb_main.c                         |   8 +++--
 drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c                  |   2 +-
 drivers/net/ethernet/toshiba/tc35815.c                           |   2 +-
 drivers/net/ethernet/tundra/tsi108_eth.c                         |   5 +--
 drivers/net/hyperv/netvsc_drv.c                                  |   9 +++--
 drivers/net/netdevsim/dev.c                                      |  63 ++++++++++++++---------------------
 drivers/net/netdevsim/fib.c                                      | 102 ++++++++++++++++++++++++++++++++++-----------------------
 drivers/net/netdevsim/netdev.c                                   |   9 ++++-
 drivers/net/netdevsim/netdevsim.h                                |  10 +++---
 drivers/net/phy/at803x.c                                         |  32 +++++-------------
 drivers/net/phy/phy-c45.c                                        |  14 ++++++++
 drivers/net/phy/phy_device.c                                     |  12 ++++++-
 drivers/net/team/team.c                                          |   2 ++
 drivers/net/usb/cx82310_eth.c                                    |   3 +-
 drivers/net/usb/kalmia.c                                         |   6 ++--
 drivers/net/usb/lan78xx.c                                        |   8 +++--
 drivers/net/wimax/i2400m/fw.c                                    |   4 ++-
 drivers/net/xen-netback/netback.c                                |   2 ++
 drivers/s390/net/qeth_core.h                                     |   1 +
 drivers/s390/net/qeth_core_main.c                                |  20 ++++++++++++
 include/linux/mlx5/device.h                                      |   4 +--
 include/linux/mlx5/mlx5_ifc.h                                    |   5 ++-
 include/linux/skbuff.h                                           |   8 +++++
 include/linux/socket.h                                           |   3 ++
 include/net/bluetooth/hci_core.h                                 |   1 +
 include/net/inet_frag.h                                          |   2 +-
 include/net/net_namespace.h                                      |   1 -
 include/net/netfilter/nf_tables.h                                |   9 +++--
 include/net/netfilter/nf_tables_offload.h                        |   2 ++
 include/net/netlink.h                                            |   5 ++-
 include/net/pkt_cls.h                                            |   2 +-
 include/net/sock.h                                               |  10 +++++-
 include/trace/events/rxrpc.h                                     |   6 ++--
 include/uapi/linux/bpf.h                                         |   4 +--
 net/batman-adv/multicast.c                                       |   8 +++--
 net/bluetooth/hci_core.c                                         |   1 +
 net/bluetooth/hci_debugfs.c                                      |  31 ++++++++++++++++++
 net/bluetooth/hidp/core.c                                        |   9 +++--
 net/bluetooth/l2cap_core.c                                       |   2 +-
 net/core/sock.c                                                  |  19 ++++++++---
 net/core/sock_diag.c                                             |   3 +-
 net/dsa/switch.c                                                 |   3 ++
 net/ieee802154/6lowpan/reassembly.c                              |   2 +-
 net/ipv4/inet_fragment.c                                         |  39 +++++++++++++++-------
 net/ipv4/ip_fragment.c                                           |   8 ++++-
 net/ipv4/tcp.c                                                   |   3 ++
 net/ipv4/tcp_bpf.c                                               |   6 +++-
 net/ipv4/tcp_output.c                                            |   3 ++
 net/ipv6/netfilter/nf_conntrack_reasm.c                          |   2 +-
 net/ipv6/reassembly.c                                            |   2 +-
 net/netfilter/nf_conntrack_core.c                                |  16 ++++-----
 net/netfilter/nf_flow_table_core.c                               |  43 +++++++++++++++++-------
 net/netfilter/nf_flow_table_ip.c                                 |  43 ++++++++++++++++++++++++
 net/netfilter/nf_tables_api.c                                    |  19 ++++++++---
 net/netfilter/nf_tables_offload.c                                |  17 ++++++++--
 net/netfilter/nft_flow_offload.c                                 |   9 +++--
 net/packet/af_packet.c                                           |   7 ++++
 net/rxrpc/af_rxrpc.c                                             |   6 ++--
 net/rxrpc/ar-internal.h                                          |   8 +++--
 net/rxrpc/call_event.c                                           |  15 ++++-----
 net/rxrpc/input.c                                                |  59 ++++++++++++++++-----------------
 net/rxrpc/local_object.c                                         | 103 +++++++++++++++++++++++++++++++++++----------------------
 net/rxrpc/output.c                                               |   3 +-
 net/rxrpc/recvmsg.c                                              |   6 ++--
 net/sched/act_skbedit.c                                          |  12 +++++++
 net/sched/sch_taprio.c                                           |   3 +-
 net/sctp/sm_sideeffect.c                                         |   2 +-
 net/sctp/stream.c                                                |   1 +
 net/tipc/addr.c                                                  |   1 +
 net/tipc/link.c                                                  |  92 ++++++++++++++++++++++++++++-----------------------
 net/tipc/msg.h                                                   |   8 +++--
 net/tls/tls_device.c                                             |   9 +++--
 net/tls/tls_main.c                                               |   2 ++
 tools/bpf/bpftool/common.c                                       |   8 +++--
 tools/include/uapi/linux/bpf.h                                   |  11 ++++---
 tools/lib/bpf/libbpf.c                                           |  33 ++++++++++---------
 tools/testing/selftests/bpf/verifier/loops1.c                    |  28 ++++++++++++++++
 tools/testing/selftests/net/tcp_fastopen_backup_key.sh           |   2 +-
 tools/testing/selftests/netfilter/nft_flowtable.sh               |  48 +++++++++++++++++++++++++++
 tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json |  47 ++++++++++++++++++++++++++
 125 files changed, 1156 insertions(+), 688 deletions(-)

^ permalink raw reply

* [PATCH net-next] r8152: fix accessing skb after napi_gro_receive
From: Hayes Wang @ 2019-08-19  3:15 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Hayes Wang
In-Reply-To: <1394712342-15778-299-albertk@realtek.com>

Fix accessing skb after napi_gro_receive which is caused by
commit 47922fcde536 ("r8152: support skb_add_rx_frag").

Fixes: 47922fcde536 ("r8152: support skb_add_rx_frag")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 40d18e866269..b1db6df6f4ab 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -2094,10 +2094,10 @@ static int rx_bottom(struct r8152 *tp, int budget)
 			skb->protocol = eth_type_trans(skb, netdev);
 			rtl_rx_vlan_tag(rx_desc, skb);
 			if (work_done < budget) {
-				napi_gro_receive(napi, skb);
 				work_done++;
 				stats->rx_packets++;
 				stats->rx_bytes += skb->len;
+				napi_gro_receive(napi, skb);
 			} else {
 				__skb_queue_tail(&tp->rx_queue, skb);
 			}
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH v2] tun: fix use-after-free when register netdev failed
From: Jason Wang @ 2019-08-19  3:17 UTC (permalink / raw)
  To: Yang Yingliang, netdev; +Cc: eric.dumazet, xiyou.wangcong, davem, weiyongjun1
In-Reply-To: <1565953224-104941-1-git-send-email-yangyingliang@huawei.com>


On 2019/8/16 下午7:00, Yang Yingliang wrote:
> I got a UAF repport in tun driver when doing fuzzy test:
>
> [  466.269490] ==================================================================
> [  466.271792] BUG: KASAN: use-after-free in tun_chr_read_iter+0x2ca/0x2d0
> [  466.271806] Read of size 8 at addr ffff888372139250 by task tun-test/2699
> [  466.271810]
> [  466.271824] CPU: 1 PID: 2699 Comm: tun-test Not tainted 5.3.0-rc1-00001-g5a9433db2614-dirty #427
> [  466.271833] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [  466.271838] Call Trace:
> [  466.271858]  dump_stack+0xca/0x13e
> [  466.271871]  ? tun_chr_read_iter+0x2ca/0x2d0
> [  466.271890]  print_address_description+0x79/0x440
> [  466.271906]  ? vprintk_func+0x5e/0xf0
> [  466.271920]  ? tun_chr_read_iter+0x2ca/0x2d0
> [  466.271935]  __kasan_report+0x15c/0x1df
> [  466.271958]  ? tun_chr_read_iter+0x2ca/0x2d0
> [  466.271976]  kasan_report+0xe/0x20
> [  466.271987]  tun_chr_read_iter+0x2ca/0x2d0
> [  466.272013]  do_iter_readv_writev+0x4b7/0x740
> [  466.272032]  ? default_llseek+0x2d0/0x2d0
> [  466.272072]  do_iter_read+0x1c5/0x5e0
> [  466.272110]  vfs_readv+0x108/0x180
> [  466.299007]  ? compat_rw_copy_check_uvector+0x440/0x440
> [  466.299020]  ? fsnotify+0x888/0xd50
> [  466.299040]  ? __fsnotify_parent+0xd0/0x350
> [  466.299064]  ? fsnotify_first_mark+0x1e0/0x1e0
> [  466.304548]  ? vfs_write+0x264/0x510
> [  466.304569]  ? ksys_write+0x101/0x210
> [  466.304591]  ? do_preadv+0x116/0x1a0
> [  466.304609]  do_preadv+0x116/0x1a0
> [  466.309829]  do_syscall_64+0xc8/0x600
> [  466.309849]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  466.309861] RIP: 0033:0x4560f9
> [  466.309875] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> [  466.309889] RSP: 002b:00007ffffa5166e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000127
> [  466.322992] RAX: ffffffffffffffda RBX: 0000000000400460 RCX: 00000000004560f9
> [  466.322999] RDX: 0000000000000003 RSI: 00000000200008c0 RDI: 0000000000000003
> [  466.323007] RBP: 00007ffffa516700 R08: 0000000000000004 R09: 0000000000000000
> [  466.323014] R10: 0000000000000000 R11: 0000000000000206 R12: 000000000040cb10
> [  466.323021] R13: 0000000000000000 R14: 00000000006d7018 R15: 0000000000000000
> [  466.323057]
> [  466.323064] Allocated by task 2605:
> [  466.335165]  save_stack+0x19/0x80
> [  466.336240]  __kasan_kmalloc.constprop.8+0xa0/0xd0
> [  466.337755]  kmem_cache_alloc+0xe8/0x320
> [  466.339050]  getname_flags+0xca/0x560
> [  466.340229]  user_path_at_empty+0x2c/0x50
> [  466.341508]  vfs_statx+0xe6/0x190
> [  466.342619]  __do_sys_newstat+0x81/0x100
> [  466.343908]  do_syscall_64+0xc8/0x600
> [  466.345303]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  466.347034]
> [  466.347517] Freed by task 2605:
> [  466.348471]  save_stack+0x19/0x80
> [  466.349476]  __kasan_slab_free+0x12e/0x180
> [  466.350726]  kmem_cache_free+0xc8/0x430
> [  466.351874]  putname+0xe2/0x120
> [  466.352921]  filename_lookup+0x257/0x3e0
> [  466.354319]  vfs_statx+0xe6/0x190
> [  466.355498]  __do_sys_newstat+0x81/0x100
> [  466.356889]  do_syscall_64+0xc8/0x600
> [  466.358037]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  466.359567]
> [  466.360050] The buggy address belongs to the object at ffff888372139100
> [  466.360050]  which belongs to the cache names_cache of size 4096
> [  466.363735] The buggy address is located 336 bytes inside of
> [  466.363735]  4096-byte region [ffff888372139100, ffff88837213a100)
> [  466.367179] The buggy address belongs to the page:
> [  466.368604] page:ffffea000dc84e00 refcount:1 mapcount:0 mapping:ffff8883df1b4f00 index:0x0 compound_mapcount: 0
> [  466.371582] flags: 0x2fffff80010200(slab|head)
> [  466.372910] raw: 002fffff80010200 dead000000000100 dead000000000122 ffff8883df1b4f00
> [  466.375209] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
> [  466.377778] page dumped because: kasan: bad access detected
> [  466.379730]
> [  466.380288] Memory state around the buggy address:
> [  466.381844]  ffff888372139100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  466.384009]  ffff888372139180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  466.386131] >ffff888372139200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  466.388257]                                                  ^
> [  466.390234]  ffff888372139280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  466.392512]  ffff888372139300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  466.394667] ==================================================================
>
> tun_chr_read_iter() accessed the memory which freed by free_netdev()
> called by tun_set_iff():
>
>         CPUA                                     CPUB
>     tun_set_iff()
>       alloc_netdev_mqs()
>       tun_attach()
>                                             tun_chr_read_iter()
>                                               tun_get()
>       register_netdevice() <-- inject error
>       tun_detach_all()
>         synchronize_net()
>                                               tun_do_read()
>                                                 tun_ring_recv()
>                                                   schedule()
>       free_netdev()
>         netdev_freemem()
>                                               tun_put()
>                                                 dev_put() <-- UAF
>
> Move tun_set_real_num_queues() out of tun_attach() and call it
> before register_netdevice() in tun_set_iff().
>
> Call tun_attach() after register_netdevice() to make sure tfile->tun
> is not published until the netdevice is registered. So the read/write
> thread can not use the tun pointer that may freed by free_netdev().
> (The tun and dev pointer are allocated by alloc_netdev_mqs(), they can
> be freed by netdev_freemem().)
>
> ---
> Changes in v2:
>  - add a param in tun_set_real_num_queues()
>  - move tun_set_real_num_queues() out of tun_attach()
>  - call tun_set_real_num_queues() before register_netdevice()
>  - call tun_attach() after register_netdevice()
> ---
>
> Cc: Yang Yingliang <yangyingliang@huawei.com>
> Fixes: eb0fb363f920 ("tuntap: attach queue 0 before registering netdevice")
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Suggested-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> ---
>  drivers/net/tun.c | 38 +++++++++++++++++++++-----------------
>  1 file changed, 21 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index db16d7a13e00..a19f864c5f8d 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -626,10 +626,11 @@ static inline bool tun_not_capable(struct tun_struct *tun)
>  		!ns_capable(net->user_ns, CAP_NET_ADMIN);
>  }
>  
> -static void tun_set_real_num_queues(struct tun_struct *tun)
> +static void tun_set_real_num_queues(struct tun_struct *tun,
> +				    unsigned int numqueues)
>  {
> -	netif_set_real_num_tx_queues(tun->dev, tun->numqueues);
> -	netif_set_real_num_rx_queues(tun->dev, tun->numqueues);
> +	netif_set_real_num_tx_queues(tun->dev, numqueues);
> +	netif_set_real_num_rx_queues(tun->dev, numqueues);
>  }
>  
>  static void tun_disable_queue(struct tun_struct *tun, struct tun_file *tfile)
> @@ -708,7 +709,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
>  		tun_flow_delete_by_queue(tun, tun->numqueues + 1);
>  		/* Drop read queue */
>  		tun_queue_purge(tfile);
> -		tun_set_real_num_queues(tun);
> +		tun_set_real_num_queues(tun, tun->numqueues);
>  	} else if (tfile->detached && clean) {
>  		tun = tun_enable_queue(tfile);
>  		sock_put(&tfile->sk);
> @@ -873,7 +874,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
>  	rcu_assign_pointer(tfile->tun, tun);
>  	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>  	tun->numqueues++;
> -	tun_set_real_num_queues(tun);
>  out:
>  	return err;
>  }
> @@ -2734,6 +2734,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>  		if (err < 0)
>  			return err;
>  
> +		tun_set_real_num_queues(tun, tun->numqueues);
> +
>  		if (tun->flags & IFF_MULTI_QUEUE &&
>  		    (tun->numqueues + tun->numdisabled > 1)) {
>  			/* One or more queue has already been attached, no need
> @@ -2828,14 +2830,18 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>  			      (ifr->ifr_flags & TUN_FEATURES);
>  
>  		INIT_LIST_HEAD(&tun->disabled);
> -		err = tun_attach(tun, file, false, ifr->ifr_flags & IFF_NAPI,
> -				 ifr->ifr_flags & IFF_NAPI_FRAGS);
> -		if (err < 0)
> -			goto err_free_flow;
> +
> +		tun_set_real_num_queues(tun, tun->numqueues + 1);


This looks tricky, why not simply call netif_set_real_num_tx/rx_queues()
here?

Thanks


>  
>  		err = register_netdevice(tun->dev);
>  		if (err < 0)
> -			goto err_detach;
> +			/* register_netdevice() already called tun_free_netdev() */
> +			goto err_free_dev;
> +
> +		err = tun_attach(tun, file, false, ifr->ifr_flags & IFF_NAPI,
> +				 ifr->ifr_flags & IFF_NAPI_FRAGS);
> +		if (err < 0)
> +			goto err_unregister;
>  	}
>  
>  	netif_carrier_on(tun->dev);
> @@ -2851,14 +2857,10 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>  	strcpy(ifr->ifr_name, tun->dev->name);
>  	return 0;
>  
> -err_detach:
> -	tun_detach_all(dev);
> -	/* register_netdevice() already called tun_free_netdev() */
> -	goto err_free_dev;
> +err_unregister:
> +	unregister_netdevice(dev);
> +	return err;
>  
> -err_free_flow:
> -	tun_flow_uninit(tun);
> -	security_tun_dev_free_security(tun->security);
>  err_free_stat:
>  	free_percpu(tun->pcpu_stats);
>  err_free_dev:
> @@ -2979,6 +2981,8 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
>  			goto unlock;
>  		ret = tun_attach(tun, file, false, tun->flags & IFF_NAPI,
>  				 tun->flags & IFF_NAPI_FRAGS);
> +		if (!ret)
> +			tun_set_real_num_queues(tun, tun->numqueues);
>  	} else if (ifr->ifr_flags & IFF_DETACH_QUEUE) {
>  		tun = rtnl_dereference(tfile->tun);
>  		if (!tun || !(tun->flags & IFF_MULTI_QUEUE) || tfile->detached)

^ permalink raw reply

* Re: [PATCH v2] rtw88: pci: Move a mass of jobs in hw IRQ to soft IRQ
From: Jian-Hong Pan @ 2019-08-19  3:26 UTC (permalink / raw)
  To: Tony Chuang
  Cc: Kalle Valo, David S . Miller, linux-wireless@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux@endlessm.com
In-Reply-To: <F7CD281DE3E379468C6D07993EA72F84D18932B7@RTITMBSVM04.realtek.com.tw>

Tony Chuang <yhchuang@realtek.com> 於 2019年8月16日 週五 下午6:44寫道：
>
> > From: Jian-Hong Pan
> >
> > There is a mass of jobs between spin lock and unlock in the hardware
> > IRQ which will occupy much time originally. To make system work more
> > efficiently, this patch moves the jobs to the soft IRQ (bottom half) to
> > reduce the time in hardware IRQ.
> >
> > Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
> > ---
> > v2:
> >  Change the spin_lock_irqsave/unlock_irqrestore to spin_lock/unlock in
> >  rtw_pci_interrupt_handler. Because the interrupts are already disabled
> >  in the hardware interrupt handler.
> >
> >  drivers/net/wireless/realtek/rtw88/pci.c | 33 +++++++++++++++++++-----
> >  1 file changed, 27 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/net/wireless/realtek/rtw88/pci.c
> > b/drivers/net/wireless/realtek/rtw88/pci.c
> > index 00ef229552d5..0740140d7e46 100644
> > --- a/drivers/net/wireless/realtek/rtw88/pci.c
> > +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> > @@ -866,12 +866,28 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq,
> > void *dev)
> >  {
> >       struct rtw_dev *rtwdev = dev;
> >       struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
> > -     u32 irq_status[4];
> >
> >       spin_lock(&rtwpci->irq_lock);
> >       if (!rtwpci->irq_enabled)
> >               goto out;
> >
> > +     /* disable RTW PCI interrupt to avoid more interrupts before the end of
> > +      * thread function
> > +      */
> > +     rtw_pci_disable_interrupt(rtwdev, rtwpci);
>
> So basically it's to prevent back-to-back interrupts.
>
> Nothing wrong about it, I just wondering why we don't like
> back-to-back interrupts. Does it means that those interrupts
> fired between irq_handler and threadfin would increase
> much more time to consume them.

1. It is one of the reasons.  Besides, the hardware interrupt has
higher priority than soft IRQ.  If next hardware interrupt is coming
faster then the soft IRQ (BH), the software IRQ may always become
pended.
2. Keep the data's state until the interrupt is fully dealt.
3. The process of request_threaded_irq is documented:
https://www.kernel.org/doc/htmldocs/kernel-api/API-request-threaded-irq.html

> > +out:
> > +     spin_unlock(&rtwpci->irq_lock);
> > +
> > +     return IRQ_WAKE_THREAD;
> > +}
> > +
> > +static irqreturn_t rtw_pci_interrupt_threadfn(int irq, void *dev)
> > +{
> > +     struct rtw_dev *rtwdev = dev;
> > +     struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
> > +     unsigned long flags;
> > +     u32 irq_status[4];
> > +
> >       rtw_pci_irq_recognized(rtwdev, rtwpci, irq_status);
> >
> >       if (irq_status[0] & IMR_MGNTDOK)
> > @@ -891,8 +907,11 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq,
> > void *dev)
> >       if (irq_status[0] & IMR_ROK)
> >               rtw_pci_rx_isr(rtwdev, rtwpci, RTW_RX_QUEUE_MPDU);
> >
> > -out:
> > -     spin_unlock(&rtwpci->irq_lock);
> > +     /* all of the jobs for this interrupt have been done */
> > +     spin_lock_irqsave(&rtwpci->irq_lock, flags);
>
> I suggest to protect the ISRs. Because next patches will require
> to check if the TX DMA path is empty. This means I will also add
> this rtwpci->irq_lock to the TX path, and check if the skb_queue
> does not have any pending SKBs not DMAed successfully.

Ah ... Okay for the TX path

> > +     if (rtw_flag_check(rtwdev, RTW_FLAG_RUNNING))
>
> Why check the flag here? Is there any racing or something?
> Otherwise it looks to break the symmetry.

According to backtrace, I notice rtw_pci_stop will be invoked in
rtw_power_off,  rtw_core_stop which clears the running state.
rtw_core_stop is invoked by rtw_enter_ips due to IEEE80211_CONF_IDLE.
If the stop command comes between the hardware interrupt and soft IRQ
(BH) and there is no start command again, I think user wants the WiFi
stop instead of becoming start again.

Jian-Hong Pan

^ permalink raw reply

* [PATCH] net/ncsi: add control packet payload to NC-SI commands from netlink
From: Ben Wei @ 2019-08-19  4:33 UTC (permalink / raw)
  To: Ben Wei
  Cc: openbmc@lists.ozlabs.org, Samuel Mendoza-Jonas, David S. Miller,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Ben Wei

This patch adds support for NCSI_CMD_SEND_CMD netlink command to load packet data payload (up to 16 bytes) to standard NC-SI commands.

Packet data will be loaded from NCSI_ATTR_DATA attribute similar to NC-SI OEM commands

Signed-off-by: Ben Wei <benwei@fb.com>
---
 net/ncsi/internal.h     | 7 ++++---
 net/ncsi/ncsi-netlink.c | 9 +++++++++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h index 0b3f0673e1a2..4ff442faf5dc 100644
--- a/net/ncsi/internal.h
+++ b/net/ncsi/internal.h
@@ -328,9 +328,10 @@ struct ncsi_cmd_arg {
 	unsigned short       payload;     /* Command packet payload length */
 	unsigned int         req_flags;   /* NCSI request properties       */
 	union {
-		unsigned char  bytes[16]; /* Command packet specific data  */
-		unsigned short words[8];
-		unsigned int   dwords[4];
+#define NCSI_MAX_DATA_BYTES 16
+		unsigned char  bytes[NCSI_MAX_DATA_BYTES]; /* Command packet specific data  */
+		unsigned short words[NCSI_MAX_DATA_BYTES / sizeof(unsigned short)];
+		unsigned int   dwords[NCSI_MAX_DATA_BYTES / sizeof(unsigned int)];
 	};
 	unsigned char        *data;       /* NCSI OEM data                 */
 	struct genl_info     *info;       /* Netlink information           */
diff --git a/net/ncsi/ncsi-netlink.c b/net/ncsi/ncsi-netlink.c index 8b386d766e7d..7d2a43f30eb1 100644
--- a/net/ncsi/ncsi-netlink.c
+++ b/net/ncsi/ncsi-netlink.c
@@ -459,6 +459,15 @@ static int ncsi_send_cmd_nl(struct sk_buff *msg, struct genl_info *info)
 	nca.payload = ntohs(hdr->length);
 	nca.data = data + sizeof(*hdr);
 
+	if (nca.payload <= NCSI_MAX_DATA_BYTES) {
+		memcpy(nca.bytes, nca.data, nca.payload);
+	} else {
+		netdev_info(ndp->ndev.dev, "NCSI:payload size %u exceeds max %u\n",
+			    nca.payload, NCSI_MAX_DATA_BYTES);
+		ret = -EINVAL;
+		goto out_netlink;
+	}
+
 	ret = ncsi_xmit_cmd(&nca);
 out_netlink:
 	if (ret != 0) {
--
2.17.1


^ permalink raw reply

* [PATCH] test_bpf: Fix a new clang warning about xor-ing two numbers
From: Nathan Chancellor @ 2019-08-19  4:34 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, netdev, bpf,
	linux-kernel, clang-built-linux, Nathan Chancellor

r369217 in clang added a new warning about potential misuse of the xor
operator as an exponentiation operator:

../lib/test_bpf.c:870:13: warning: result of '10 ^ 300' is 294; did you
mean '1e300'? [-Wxor-used-as-pow]
                { { 4, 10 ^ 300 }, { 20, 10 ^ 300 } },
                       ~~~^~~~~
                       1e300
../lib/test_bpf.c:870:13: note: replace expression with '0xA ^ 300' to
silence this warning
../lib/test_bpf.c:870:31: warning: result of '10 ^ 300' is 294; did you
mean '1e300'? [-Wxor-used-as-pow]
                { { 4, 10 ^ 300 }, { 20, 10 ^ 300 } },
                                         ~~~^~~~~
                                         1e300
../lib/test_bpf.c:870:31: note: replace expression with '0xA ^ 300' to
silence this warning

The commit link for this new warning has some good logic behind wanting
to add it but this instance appears to be a false positive. Adopt its
suggestion to silence the warning but not change the code. According to
the differential review link in the clang commit, GCC may eventually
adopt this warning as well.

Link: https://github.com/ClangBuiltLinux/linux/issues/643
Link: https://github.com/llvm/llvm-project/commit/920890e26812f808a74c60ebc14cc636dac661c1
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---

I highly doubt that 1e300 was intented but if it was (or something else
was), please let me know. Commit history wasn't entirely clear on why
this expression was used over just a raw number.

 lib/test_bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index c41705835cba..5ef3eccee27c 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -867,7 +867,7 @@ static struct bpf_test tests[] = {
 		},
 		CLASSIC,
 		{ },
-		{ { 4, 10 ^ 300 }, { 20, 10 ^ 300 } },
+		{ { 4, 0xA ^ 300 }, { 20, 0xA ^ 300 } },
 	},
 	{
 		"SPILL_FILL",
-- 
2.23.0


^ permalink raw reply related

* [PATCH] Kconfig: Fix the reference to the IDT77105 Phy driver in the description of ATM_NICSTAR_USE_IDT77105
From: Christophe JAILLET @ 2019-08-19  5:04 UTC (permalink / raw)
  To: 3chas3
  Cc: linux-atm-general, netdev, linux-kernel, kernel-janitors,
	Christophe JAILLET

This should be IDT77105, not IDT77015.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
 drivers/atm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/atm/Kconfig b/drivers/atm/Kconfig
index 2e2efa577437..8c37294f1d1e 100644
--- a/drivers/atm/Kconfig
+++ b/drivers/atm/Kconfig
@@ -200,7 +200,7 @@ config ATM_NICSTAR_USE_SUNI
 	  make the card work).
 
 config ATM_NICSTAR_USE_IDT77105
-	bool "Use IDT77015 PHY driver (25Mbps)"
+	bool "Use IDT77105 PHY driver (25Mbps)"
 	depends on ATM_NICSTAR
 	help
 	  Support for the PHYsical layer chip in ForeRunner LE25 cards. In
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH] test_bpf: Fix a new clang warning about xor-ing two numbers
From: Yonghong Song @ 2019-08-19  5:51 UTC (permalink / raw)
  To: Nathan Chancellor, Alexei Starovoitov, Daniel Borkmann
  Cc: Martin Lau, Song Liu, netdev@vger.kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org, clang-built-linux@googlegroups.com
In-Reply-To: <20190819043419.68223-1-natechancellor@gmail.com>



On 8/18/19 9:34 PM, Nathan Chancellor wrote:
> r369217 in clang added a new warning about potential misuse of the xor
> operator as an exponentiation operator:
> 
> ../lib/test_bpf.c:870:13: warning: result of '10 ^ 300' is 294; did you
> mean '1e300'? [-Wxor-used-as-pow]
>                  { { 4, 10 ^ 300 }, { 20, 10 ^ 300 } },
>                         ~~~^~~~~
>                         1e300
> ../lib/test_bpf.c:870:13: note: replace expression with '0xA ^ 300' to
> silence this warning
> ../lib/test_bpf.c:870:31: warning: result of '10 ^ 300' is 294; did you
> mean '1e300'? [-Wxor-used-as-pow]
>                  { { 4, 10 ^ 300 }, { 20, 10 ^ 300 } },
>                                           ~~~^~~~~
>                                           1e300
> ../lib/test_bpf.c:870:31: note: replace expression with '0xA ^ 300' to
> silence this warning
> 
> The commit link for this new warning has some good logic behind wanting
> to add it but this instance appears to be a false positive. Adopt its
> suggestion to silence the warning but not change the code. According to
> the differential review link in the clang commit, GCC may eventually
> adopt this warning as well.
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/643
> Link: https://github.com/llvm/llvm-project/commit/920890e26812f808a74c60ebc14cc636dac661c1
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>

Verified that latest trunk clang indeed has this warning, and below 
change indeed fixed the warning in the correct way.

Acked-by: Yonghong Song <yhs@fb.com>

> ---
> 
> I highly doubt that 1e300 was intented but if it was (or something else
> was), please let me know. Commit history wasn't entirely clear on why
> this expression was used over just a raw number.
> 
>   lib/test_bpf.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/test_bpf.c b/lib/test_bpf.c
> index c41705835cba..5ef3eccee27c 100644
> --- a/lib/test_bpf.c
> +++ b/lib/test_bpf.c
> @@ -867,7 +867,7 @@ static struct bpf_test tests[] = {
>   		},
>   		CLASSIC,
>   		{ },
> -		{ { 4, 10 ^ 300 }, { 20, 10 ^ 300 } },
> +		{ { 4, 0xA ^ 300 }, { 20, 0xA ^ 300 } },
>   	},
>   	{
>   		"SPILL_FILL",
> 

^ permalink raw reply

* Re: [PATCH net-next] net: openvswitch: Set OvS recirc_id from tc chain
From: Paul Blakey @ 2019-08-19  6:20 UTC (permalink / raw)
  To: Pravin B Shelar, netdev@vger.kernel.org, David S. Miller,
	Justin Pettit, Simon Horman, Marcelo Ricardo Leitner, Vlad Buslov
  Cc: Jiri Pirko, Roi Dayan, Yossi Kuperman, Rony Efraim, Oz Shlomo
In-Reply-To: <1566144059-8247-1-git-send-email-paulb@mellanox.com>


On 8/18/2019 7:00 PM, Paul Blakey wrote:
> What do you guys say about the following diff on top of the last one?
> Use static key, and also have OVS_DP_CMD_SET command probe/enable the feature.
>
> This will allow userspace to probe the feature, and selectivly enable it via the
> OVS_DP_CMD_SET command.
>
> Thansk,
> Paul.
>
>
> ---
>   include/uapi/linux/openvswitch.h |  3 +++
>   net/openvswitch/datapath.c       | 29 +++++++++++++++++++++++++----
>   net/openvswitch/datapath.h       |  2 ++
>   net/openvswitch/flow.c           |  6 ++++--
>   4 files changed, 34 insertions(+), 6 deletions(-)
>
> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> index f271f1e..1887a45 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -123,6 +123,9 @@ struct ovs_vport_stats {
>   /* Allow datapath to associate multiple Netlink PIDs to each vport */
>   #define OVS_DP_F_VPORT_PIDS	(1 << 1)
>   
> +/* Allow tc offload recirc sharing */
> +#define OVS_DP_F_TC_RECIRC_SHARING	(1 << 2)
> +
>   /* Fixed logical ports. */
>   #define OVSP_LOCAL      ((__u32)0)
>   
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 892287d..589b4f1 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -1541,10 +1541,27 @@ static void ovs_dp_reset_user_features(struct sk_buff *skb, struct genl_info *in
>   	dp->user_features = 0;
>   }
>   
> -static void ovs_dp_change(struct datapath *dp, struct nlattr *a[])
> +DEFINE_STATIC_KEY_FALSE(tc_recirc_sharing_support);
> +
> +static int ovs_dp_change(struct datapath *dp, struct nlattr *a[])
>   {
> +	u32 user_features;
> +
>   	if (a[OVS_DP_ATTR_USER_FEATURES])
> -		dp->user_features = nla_get_u32(a[OVS_DP_ATTR_USER_FEATURES]);
> +		user_features = nla_get_u32(a[OVS_DP_ATTR_USER_FEATURES]);
> +
> +#if !IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> +	if (user_features & OVS_DP_F_TC_RECIRC_SHARING)
> +		return -EOPNOTSUPP;
> +#endif
> +	dp->user_features = user_features;
> +
> +	if (dp->user_features & OVS_DP_F_TC_RECIRC_SHARING)
> +		static_branch_enable(&tc_recirc_sharing_support);
> +	else
> +		static_branch_disable(&tc_recirc_sharing_support);
> +
> +	return 0;
>   }
>   
>   static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
> @@ -1606,7 +1623,9 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
>   	parms.port_no = OVSP_LOCAL;
>   	parms.upcall_portids = a[OVS_DP_ATTR_UPCALL_PID];
>   
> -	ovs_dp_change(dp, a);
> +	err = ovs_dp_change(dp, a);
> +	if (err)
> +		goto err_destroy_meters;
>   
>   	/* So far only local changes have been made, now need the lock. */
>   	ovs_lock();
> @@ -1732,7 +1751,9 @@ static int ovs_dp_cmd_set(struct sk_buff *skb, struct genl_info *info)
>   	if (IS_ERR(dp))
>   		goto err_unlock_free;
>   
> -	ovs_dp_change(dp, info->attrs);
> +	err = ovs_dp_change(dp, info->attrs);
> +	if (err)
> +		goto err_unlock_free;
>   
>   	err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
>   				   info->snd_seq, 0, OVS_DP_CMD_SET);
> diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
> index 751d34a..81e85dd 100644
> --- a/net/openvswitch/datapath.h
> +++ b/net/openvswitch/datapath.h
> @@ -218,6 +218,8 @@ static inline struct datapath *get_dp(struct net *net, int dp_ifindex)
>   extern struct notifier_block ovs_dp_device_notifier;
>   extern struct genl_family dp_vport_genl_family;
>   
> +DECLARE_STATIC_KEY_FALSE(tc_recirc_sharing_support);
> +
>   void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key);
>   void ovs_dp_detach_port(struct vport *);
>   int ovs_dp_upcall(struct datapath *, struct sk_buff *,
> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index 0287ead..c0ac7c9 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -853,8 +853,10 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>   	key->mac_proto = res;
>   
>   #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> -	tc_ext = skb_ext_find(skb, TC_SKB_EXT);
> -	key->recirc_id = tc_ext ? tc_ext->chain : 0;
> +	if (static_branch_unlikely(&tc_recirc_sharing_support)) {
> +		tc_ext = skb_ext_find(skb, TC_SKB_EXT);
> +		key->recirc_id = tc_ext ? tc_ext->chain : 0;
> +	}

Here should be

else

       key->recirc_id = 0

>   #else
>   	key->recirc_id = 0;
>   #endif

^ permalink raw reply

* Re: Re: [PATCH net-next 0/1] Add BASE-T1 PHY support
From: Christian Herber @ 2019-08-19  6:32 UTC (permalink / raw)
  To: Heiner Kallweit, davem@davemloft.net
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <8c15b855-6947-9930-c3df-71a64fbff33b@gmail.com>

On 16.08.2019 22:59, Heiner Kallweit wrote:
> On 15.08.2019 17:32, Christian Herber wrote:
>> This patch adds basic support for BASE-T1 PHYs in the framework.
>> BASE-T1 PHYs main area of application are automotive and industrial.
>> BASE-T1 is standardized in IEEE 802.3, namely
>> - IEEE 802.3bw: 100BASE-T1
>> - IEEE 802.3bp 1000BASE-T1
>> - IEEE 802.3cg: 10BASE-T1L and 10BASE-T1S
>>
>> There are no products which contain BASE-T1 and consumer type PHYs like
>> 1000BASE-T. However, devices exist which combine 100BASE-T1 and 1000BASE-T1
>> PHYs with auto-negotiation.
> 
> Is this meant in a way that *currently* there are no PHY's combining Base-T1
> with normal Base-T modes? Or are there reasons why this isn't possible in
> general? I'm asking because we have PHY's combining copper and fiber, and e.g.
> the mentioned Aquantia PHY that combines NBase-T with 1000Base-T2.
> 
>>
>> The intention of this patch is to make use of the existing Clause 45 functions.
>> BASE-T1 adds some additional registers e.g. for aneg control, which follow a
>> similiar register layout as the existing devices. The bits which are used in
>> BASE-T1 specific registers are the same as in basic registers, thus the
>> existing functions can be resued, with get_aneg_ctrl() selecting the correct
>> register address.
>>
> If Base-T1 can't be combined with other modes then at a first glance I see no
> benefit in defining new registers e.g. for aneg control, and the standard ones
> are unused. Why not using the standard registers? Can you shed some light on that?
> 
> Are the new registers internally shadowed to the standard location?
> That's something I've seen on other PHY's: one register appears in different
> places in different devices.
> 
>> The current version of ethtool has been prepared for 100/1000BASE-T1 and works
>> with this patch. 10BASE-T1 needs to be added to ethtool.
>>
>> Christian Herber (1):
>>    Added BASE-T1 PHY support to PHY Subsystem
>>
>>   drivers/net/phy/phy-c45.c    | 113 +++++++++++++++++++++++++++++++----
>>   drivers/net/phy/phy-core.c   |   4 +-
>>   include/uapi/linux/ethtool.h |   2 +
>>   include/uapi/linux/mdio.h    |  21 +++++++
>>   4 files changed, 129 insertions(+), 11 deletions(-)
>>
> 
> Heiner
> 

Hi Heiner,

I do not think the Aquantia part you are describing is publicly 
documented, so i cannot comment on that part.
There are multiple reasons why e.g. xBASE-T1 plus 1000BASE-T is 
unlikely. First, the is no use-case known to me, where this would be 
required. Second, there is no way that you can do an auto-negotiation 
between the two, as these both have their own auto-neg defined (Clause 
28/73 vs. Clause 98). Thirdly, if you would ever have a product with 
both, I believe it would just include two full PHYs and a way to select 
which flavor you want. Of course, this is the theory until proven 
otherwise, but to me it is sufficient to use a single driver.

The registers are different in the fields they include. It is just that 
the flags which are used by the Linux driver, like restarting auto-neg, 
are at the same position.

Christian


^ permalink raw reply

* Re: [PATCH bpf-next] libbpf: remove zc variable as it is not used
From: Magnus Karlsson @ 2019-08-19  6:35 UTC (permalink / raw)
  To: Jonathan Lemon, Maxim Mikityanskiy
  Cc: Yonghong Song, Magnus Karlsson, Björn Töpel,
	Alexei Starovoitov, Daniel Borkmann, Network Development, bpf
In-Reply-To: <2B143E7F-EE34-4298-B628-E2F669F89896@gmail.com>

On Sat, Aug 17, 2019 at 12:02 AM Jonathan Lemon
<jonathan.lemon@gmail.com> wrote:
>
>
>
> On 16 Aug 2019, at 8:37, Yonghong Song wrote:
>
> > On 8/16/19 3:26 AM, Magnus Karlsson wrote:
> >> The zc is not used in the xsk part of libbpf, so let us remove it.
> >> Not
> >> good to have dead code lying around.
> >>
> >> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> >> Reported-by: Yonghong Song <yhs@fb.com> > ---
> >>   tools/lib/bpf/xsk.c | 3 ---
> >>   1 file changed, 3 deletions(-)
> >>
> >> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> >> index 680e630..9687da9 100644
> >> --- a/tools/lib/bpf/xsk.c
> >> +++ b/tools/lib/bpf/xsk.c
> >> @@ -65,7 +65,6 @@ struct xsk_socket {
> >>      int xsks_map_fd;
> >>      __u32 queue_id;
> >>      char ifname[IFNAMSIZ];
> >> -    bool zc;
> >>   };
> >>
> >>   struct xsk_nl_info {
> >> @@ -608,8 +607,6 @@ int xsk_socket__create(struct xsk_socket
> >> **xsk_ptr, const char *ifname,
> >>              goto out_mmap_tx;
> >>      }
> >>
> >> -    xsk->zc = opts.flags & XDP_OPTIONS_ZEROCOPY;
> >
> > Since opts.flags usage is removed. Do you think it makes sense to
> > remove
> >          optlen = sizeof(opts);
> >          err = getsockopt(xsk->fd, SOL_XDP, XDP_OPTIONS, &opts,
> > &optlen);
> >          if (err) {
> >                  err = -errno;
> >                  goto out_mmap_tx;
> >          }
> > as well since nobody then uses opts?
>
> IIRC, this was added specifically in
> 2761ed4b6e192820760d5ba913834b2ba05fd08c
> so that userland code could know whether the socket was operating in
> zero-copy
> mode or not.

Thanks for reminding me Jonathan.

Roping in Maxim here since he wrote the patch. Was this something you
planned on using but the functionality that needed it was removed? The
patch set did go through a number of changes in the libbpf area, if I
remember correctly.

There are two options: either we remove it, or we add an interface in
xsk.h so that people can use it. I vote for the latter since I think
it could be useful. The sample app could use it at least :-).

/Magnus

> --
> Jonathan

^ permalink raw reply

* Re:  Re: [PATCH net-next 1/1] Added BASE-T1 PHY support to PHY Subsystem
From: Christian Herber @ 2019-08-19  6:40 UTC (permalink / raw)
  To: Heiner Kallweit, davem@davemloft.net
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <95eda63a-8202-3f49-c86a-e418162b2811@gmail.com>

On 16.08.2019 23:13, Heiner Kallweit wrote:
> 
> On 15.08.2019 17:32, Christian Herber wrote:
>> BASE-T1 is a category of Ethernet PHYs.
>> They use a single copper pair for transmission.
>> This patch add basic support for this category of PHYs.
>> It coveres the discovery of abilities and basic configuration.
>> It includes setting fixed speed and enabling auto-negotiation.
>> BASE-T1 devices should always Clause-45 managed.
>> Therefore, this patch extends phy-c45.c.
>> While for some functions like auto-neogtiation different registers are
>> used, the layout of these registers is the same for the used fields.
>> Thus, much of the logic of basic Clause-45 devices can be reused.
>>
>> Signed-off-by: Christian Herber <christian.herber@nxp.com>
>> ---
>>   drivers/net/phy/phy-c45.c    | 113 +++++++++++++++++++++++++++++++----
>>   drivers/net/phy/phy-core.c   |   4 +-
>>   include/uapi/linux/ethtool.h |   2 +
>>   include/uapi/linux/mdio.h    |  21 +++++++
>>   4 files changed, 129 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
>> index b9d4145781ca..9ff0b8c785de 100644
>> --- a/drivers/net/phy/phy-c45.c
>> +++ b/drivers/net/phy/phy-c45.c
>> @@ -8,13 +8,23 @@
>>   #include <linux/mii.h>
>>   #include <linux/phy.h>
>>
>> +#define IS_100BASET1(phy) (linkmode_test_bit( \
>> +                        ETHTOOL_LINK_MODE_100baseT1_Full_BIT, \
>> +                        (phy)->supported))
>> +#define IS_1000BASET1(phy) (linkmode_test_bit( \
>> +                         ETHTOOL_LINK_MODE_1000baseT1_Full_BIT, \
>> +                         (phy)->supported))
>> +
> Why is there no such macro for 10BaseT1?
> 
>> +static u32 get_aneg_ctrl(struct phy_device *phydev);
>> +static u32 get_aneg_stat(struct phy_device *phydev);
>> +
>>   /**
>>    * genphy_c45_setup_forced - configures a forced speed
>>    * @phydev: target phy_device struct
>>    */
>>   int genphy_c45_pma_setup_forced(struct phy_device *phydev)
>>   {
>> -     int ctrl1, ctrl2, ret;
>> +     int ctrl1, ctrl2, base_t1_ctrl = 0, ret;
>>
>>        /* Half duplex is not supported */
>>        if (phydev->duplex != DUPLEX_FULL)
>> @@ -28,6 +38,16 @@ int genphy_c45_pma_setup_forced(struct phy_device *phydev)
>>        if (ctrl2 < 0)
>>                return ctrl2;
>>
>> +     if (IS_100BASET1(phydev) || IS_1000BASET1(phydev)) {
> 
> 10BaseT1 doesn't need to be considered here?
> And maybe it would be better to have a flag for BaseT1 at phy_device level
> (based on bit MDIO_PMA_EXTABLE_BASET1?) instead of checking whether certain
> T1 modes are supported. Then we would be more future-proof in case of new
> T1 modes.
> 
>> +             base_t1_ctrl = phy_read_mmd(phydev,
>> +                                         MDIO_MMD_PMAPMD,
>> +                                         MDIO_PMA_BASET1CTRL);
>> +             if (base_t1_ctrl < 0)
>> +                     return base_t1_ctrl;
>> +
>> +             base_t1_ctrl &= ~(MDIO_PMA_BASET1CTRL_TYPE);
>> +     }
>> +
>>        ctrl1 &= ~MDIO_CTRL1_SPEEDSEL;
>>        /*
>>         * PMA/PMD type selection is 1.7.5:0 not 1.7.3:0.  See 45.2.1.6.1
>> @@ -41,12 +61,21 @@ int genphy_c45_pma_setup_forced(struct phy_device *phydev)
>>                break;
>>        case SPEED_100:
>>                ctrl1 |= MDIO_PMA_CTRL1_SPEED100;
>> -             ctrl2 |= MDIO_PMA_CTRL2_100BTX;
>> +             if (IS_100BASET1(phydev)) {
>> +                     ctrl2 |= MDIO_PMA_CTRL2_BT1;
>> +                     base_t1_ctrl |= MDIO_PMA_BASET1CTRL_TYPE_100BT1;
>> +             } else {
>> +                     ctrl2 |= MDIO_PMA_CTRL2_100BTX;
>> +             }
>>                break;
>>        case SPEED_1000:
>>                ctrl1 |= MDIO_PMA_CTRL1_SPEED1000;
>> -             /* Assume 1000base-T */
>> -             ctrl2 |= MDIO_PMA_CTRL2_1000BT;
>> +             if (IS_1000BASET1(phydev)) {
>> +                     ctrl2 |= MDIO_PMA_CTRL2_BT1;
>> +                     base_t1_ctrl |= MDIO_PMA_BASET1CTRL_TYPE_1000BT1;
>> +             } else {
>> +                     ctrl2 |= MDIO_PMA_CTRL2_1000BT;
>> +             }
>>                break;
>>        case SPEED_2500:
>>                ctrl1 |= MDIO_CTRL1_SPEED2_5G;
>> @@ -75,6 +104,14 @@ int genphy_c45_pma_setup_forced(struct phy_device *phydev)
>>        if (ret < 0)
>>                return ret;
>>
>> +     if (IS_100BASET1(phydev) || IS_1000BASET1(phydev)) {
>> +             ret = phy_write_mmd(phydev,
>> +                                 MDIO_MMD_PMAPMD,
>> +                                 MDIO_PMA_BASET1CTRL,
>> +                                 base_t1_ctrl);
>> +             if (ret < 0)
>> +                     return ret;
>> +     }
>>        return genphy_c45_an_disable_aneg(phydev);
>>   }
>>   EXPORT_SYMBOL_GPL(genphy_c45_pma_setup_forced);
>> @@ -135,8 +172,7 @@ EXPORT_SYMBOL_GPL(genphy_c45_an_config_aneg);
>>    */
>>   int genphy_c45_an_disable_aneg(struct phy_device *phydev)
>>   {
>> -
>> -     return phy_clear_bits_mmd(phydev, MDIO_MMD_AN, MDIO_CTRL1,
>> +     return phy_clear_bits_mmd(phydev, MDIO_MMD_AN, get_aneg_ctrl(phydev),
>>                                  MDIO_AN_CTRL1_ENABLE | MDIO_AN_CTRL1_RESTART);
>>   }
>>   EXPORT_SYMBOL_GPL(genphy_c45_an_disable_aneg);
>> @@ -151,7 +187,7 @@ EXPORT_SYMBOL_GPL(genphy_c45_an_disable_aneg);
>>    */
>>   int genphy_c45_restart_aneg(struct phy_device *phydev)
>>   {
>> -     return phy_set_bits_mmd(phydev, MDIO_MMD_AN, MDIO_CTRL1,
>> +     return phy_set_bits_mmd(phydev, MDIO_MMD_AN, get_aneg_ctrl(phydev),
>>                                MDIO_AN_CTRL1_ENABLE | MDIO_AN_CTRL1_RESTART);
>>   }
>>   EXPORT_SYMBOL_GPL(genphy_c45_restart_aneg);
>> @@ -171,7 +207,7 @@ int genphy_c45_check_and_restart_aneg(struct phy_device *phydev, bool restart)
>>
>>        if (!restart) {
>>                /* Configure and restart aneg if it wasn't set before */
>> -             ret = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_CTRL1);
>> +             ret = phy_read_mmd(phydev, MDIO_MMD_AN, get_aneg_ctrl(phydev));
>>                if (ret < 0)
>>                        return ret;
>>
>> @@ -199,7 +235,7 @@ EXPORT_SYMBOL_GPL(genphy_c45_check_and_restart_aneg);
>>    */
>>   int genphy_c45_aneg_done(struct phy_device *phydev)
>>   {
>> -     int val = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_STAT1);
>> +     int val = phy_read_mmd(phydev, MDIO_MMD_AN, get_aneg_stat(phydev));
>>
>>        return val < 0 ? val : val & MDIO_AN_STAT1_COMPLETE ? 1 : 0;
>>   }
>> @@ -385,7 +421,9 @@ EXPORT_SYMBOL_GPL(genphy_c45_read_mdix);
>>    * PMA Extended Abilities (1.11) register, indicating 1000BASET an 10G related
>>    * modes. If bit 1.11.14 is set, then the list is also extended with the modes
>>    * in the 2.5G/5G PMA Extended register (1.21), indicating if 2.5GBASET and
>> - * 5GBASET are supported.
>> + * 5GBASET are supported. If bit 1.11.11 is set, then the list is also extended
>> + * with the modes in the BASE-T1 PMA Extended register (1.18), indicating if
>> + * 10/100/1000BASET-1 are supported.
>>    */
>>   int genphy_c45_pma_read_abilities(struct phy_device *phydev)
>>   {
>> @@ -470,6 +508,29 @@ int genphy_c45_pma_read_abilities(struct phy_device *phydev)
>>                                         phydev->supported,
>>                                         val & MDIO_PMA_NG_EXTABLE_5GBT);
>>                }
>> +
>> +             if (val & MDIO_PMA_EXTABLE_BASET1) {
>> +                     val = phy_read_mmd(phydev, MDIO_MMD_PMAPMD,
>> +                                        MDIO_PMA_BASET1_EXTABLE);
>> +                     if (val < 0)
>> +                             return val;
>> +
>> +                     linkmode_mod_bit(ETHTOOL_LINK_MODE_100baseT1_Full_BIT,
>> +                                      phydev->supported,
>> +                                      val & MDIO_PMA_BASET1_EXTABLE_100BT1);
>> +
>> +                     linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseT1_Full_BIT,
>> +                                      phydev->supported,
>> +                                      val & MDIO_PMA_BASET1_EXTABLE_1000BT1);
>> +
>> +                     linkmode_mod_bit(ETHTOOL_LINK_MODE_10baseT1L_Full_BIT,
>> +                                      phydev->supported,
>> +                                      val & MDIO_PMA_BASET1_EXTABLE_10BT1L);
>> +
>> +                     linkmode_mod_bit(ETHTOOL_LINK_MODE_10baseT1S_Full_BIT,
>> +                                      phydev->supported,
>> +                                      val & MDIO_PMA_BASET1_EXTABLE_10BT1S);
>> +             }
>>        }
>>
>>        return 0;
>> @@ -509,6 +570,38 @@ int genphy_c45_read_status(struct phy_device *phydev)
>>   }
>>   EXPORT_SYMBOL_GPL(genphy_c45_read_status);
>>
>> +/**
>> + * get_aneg_ctrl - Get the register address for auto-
>> + * negotiation control register
>> + * @phydev: target phy_device struct
>> + *
>> + */
>> +static u32 get_aneg_ctrl(struct phy_device *phydev)
>> +{
>> +     u32 ctrl = MDIO_CTRL1;
>> +
>> +     if (IS_100BASET1(phydev) || IS_1000BASET1(phydev))
>> +             ctrl = MDIO_AN_BT1_CTRL;
>> +
> AFAICS 10BaseT1 has separate aneg registers (526/527).
> To be considered here?
> 
>> +     return ctrl;
>> +}
>> +
>> +/**
>> + * get_aneg_ctrl - Get the register address for auto-
>> + * negotiation status register
>> + * @phydev: target phy_device struct
>> + *
>> + */
>> +static u32 get_aneg_stat(struct phy_device *phydev)
>> +{
>> +     u32 stat = MDIO_STAT1;
>> +
>> +     if (IS_100BASET1(phydev) || IS_1000BASET1(phydev))
>> +             stat = MDIO_AN_BT1_STAT;
>> +
>> +     return stat;
>> +}
>> +
>>   /* The gen10g_* functions are the old Clause 45 stub */
>>
>>   int gen10g_config_aneg(struct phy_device *phydev)
>> diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
>> index 369903d9b6ec..b50576f7709a 100644
>> --- a/drivers/net/phy/phy-core.c
>> +++ b/drivers/net/phy/phy-core.c
>> @@ -8,7 +8,7 @@
>>
>>   const char *phy_speed_to_str(int speed)
>>   {
>> -     BUILD_BUG_ON_MSG(__ETHTOOL_LINK_MODE_MASK_NBITS != 69,
>> +     BUILD_BUG_ON_MSG(__ETHTOOL_LINK_MODE_MASK_NBITS != 71,
>>                "Enum ethtool_link_mode_bit_indices and phylib are out of sync. "
>>                "If a speed or mode has been added please update phy_speed_to_str "
>>                "and the PHY settings array.\n");
>> @@ -140,6 +140,8 @@ static const struct phy_setting settings[] = {
>>        /* 10M */
>>        PHY_SETTING(     10, FULL,     10baseT_Full             ),
>>        PHY_SETTING(     10, HALF,     10baseT_Half             ),
>> +     PHY_SETTING(     10, FULL,     10baseT1L_Full           ),
>> +     PHY_SETTING(     10, FULL,     10baseT1S_Full           ),
>>   };
>>   #undef PHY_SETTING
>>
>> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
>> index dd06302aa93e..e429cc8da31a 100644
>> --- a/include/uapi/linux/ethtool.h
>> +++ b/include/uapi/linux/ethtool.h
>> @@ -1485,6 +1485,8 @@ enum ethtool_link_mode_bit_indices {
>>        ETHTOOL_LINK_MODE_200000baseCR4_Full_BIT         = 66,
>>        ETHTOOL_LINK_MODE_100baseT1_Full_BIT             = 67,
>>        ETHTOOL_LINK_MODE_1000baseT1_Full_BIT            = 68,
>> +     ETHTOOL_LINK_MODE_10baseT1L_Full_BIT             = 69,
>> +     ETHTOOL_LINK_MODE_10baseT1S_Full_BIT             = 70,
>>
>>        /* must be last entry */
>>        __ETHTOOL_LINK_MODE_MASK_NBITS
>> diff --git a/include/uapi/linux/mdio.h b/include/uapi/linux/mdio.h
>> index 0a552061ff1c..6fd5ff632b8e 100644
>> --- a/include/uapi/linux/mdio.h
>> +++ b/include/uapi/linux/mdio.h
>> @@ -43,6 +43,7 @@
>>   #define MDIO_PKGID1          14      /* Package identifier */
>>   #define MDIO_PKGID2          15
>>   #define MDIO_AN_ADVERTISE    16      /* AN advertising (base page) */
>> +#define MDIO_PMA_BASET1_EXTABLE      18      /* BASE-T1 PMA/PMD extended ability */
>>   #define MDIO_AN_LPA          19      /* AN LP abilities (base page) */
>>   #define MDIO_PCS_EEE_ABLE    20      /* EEE Capability register */
>>   #define MDIO_PMA_NG_EXTABLE  21      /* 2.5G/5G PMA/PMD extended ability */
>> @@ -57,11 +58,16 @@
>>   #define MDIO_PMA_10GBT_SNR   133     /* 10GBASE-T SNR margin, lane A.
>>                                         * Lanes B-D are numbered 134-136. */
>>   #define MDIO_PMA_10GBR_FECABLE       170     /* 10GBASE-R FEC ability */
>> +#define MDIO_PMA_BASET1CTRL     2100 /* BASE-T1 PMA/PMD control */
>>   #define MDIO_PCS_10GBX_STAT1 24      /* 10GBASE-X PCS status 1 */
>>   #define MDIO_PCS_10GBRT_STAT1        32      /* 10GBASE-R/-T PCS status 1 */
>>   #define MDIO_PCS_10GBRT_STAT2        33      /* 10GBASE-R/-T PCS status 2 */
>>   #define MDIO_AN_10GBT_CTRL   32      /* 10GBASE-T auto-negotiation control */
>>   #define MDIO_AN_10GBT_STAT   33      /* 10GBASE-T auto-negotiation status */
>> +#define MDIO_AN_BT1_CTRL     512     /* BASE-T1 auto-negotiation control */
>> +#define MDIO_AN_BT1_STAT     513     /* BASE-T1 auto-negotiation status */
>> +#define MDIO_AN_10BT1_CTRL   526     /* 10BASE-T1 auto-negotiation control */
>> +#define MDIO_AN_10BT1_STAT   527     /* 10BASE-T1 auto-negotiation status */
>>
>>   /* LASI (Link Alarm Status Interrupt) registers, defined by XENPAK MSA. */
>>   #define MDIO_PMA_LASI_RXCTRL 0x9000  /* RX_ALARM control */
>> @@ -151,6 +157,7 @@
>>   #define MDIO_PMA_CTRL2_100BTX                0x000e  /* 100BASE-TX type */
>>   #define MDIO_PMA_CTRL2_10BT          0x000f  /* 10BASE-T type */
>>   #define MDIO_PMA_CTRL2_2_5GBT                0x0030  /* 2.5GBaseT type */
>> +#define MDIO_PMA_CTRL2_BT1           0x003D  /* BASE-T1 type */
>>   #define MDIO_PMA_CTRL2_5GBT          0x0031  /* 5GBaseT type */
>>   #define MDIO_PCS_CTRL2_TYPE          0x0003  /* PCS type selection */
>>   #define MDIO_PCS_CTRL2_10GBR         0x0000  /* 10GBASE-R type */
>> @@ -205,8 +212,16 @@
>>   #define MDIO_PMA_EXTABLE_1000BKX     0x0040  /* 1000BASE-KX ability */
>>   #define MDIO_PMA_EXTABLE_100BTX              0x0080  /* 100BASE-TX ability */
>>   #define MDIO_PMA_EXTABLE_10BT                0x0100  /* 10BASE-T ability */
>> +#define MDIO_PMA_EXTABLE_BASET1              0x0800  /* BASE-T1 ability */
>>   #define MDIO_PMA_EXTABLE_NBT         0x4000  /* 2.5/5GBASE-T ability */
>>
>> +/* PMA BASE-T1 control register. */
>> +#define MDIO_PMA_BASET1CTRL_TYPE         0x000f /* PMA/PMD BASE-T1 type sel. */
>> +#define MDIO_PMA_BASET1CTRL_TYPE_100BT1  0x0000 /* 100BASE-T1 */
>> +#define MDIO_PMA_BASET1CTRL_TYPE_1000BT1 0x0001 /* 1000BASE-T1 */
>> +#define MDIO_PMA_BASET1CTRL_TYPE_10BT1L  0x0002 /* 10BASE-T1L */
>> +#define MDIO_PMA_BASET1CTRL_TYPE_10BT1S  0x0003 /* 10BASE-T1S */
>> +
>>   /* PHY XGXS lane state register. */
>>   #define MDIO_PHYXS_LNSTAT_SYNC0              0x0001
>>   #define MDIO_PHYXS_LNSTAT_SYNC1              0x0002
>> @@ -281,6 +296,12 @@
>>   #define MDIO_PMA_NG_EXTABLE_2_5GBT   0x0001  /* 2.5GBASET ability */
>>   #define MDIO_PMA_NG_EXTABLE_5GBT     0x0002  /* 5GBASET ability */
>>
>> +/* BASE-T1 Extended abilities register. */
>> +#define MDIO_PMA_BASET1_EXTABLE_100BT1   0x0001  /* 100BASE-T1 ability */
>> +#define MDIO_PMA_BASET1_EXTABLE_1000BT1  0x0002  /* 1000BASE-T1 ability */
>> +#define MDIO_PMA_BASET1_EXTABLE_10BT1L   0x0004  /* 10BASE-T1L ability */
>> +#define MDIO_PMA_BASET1_EXTABLE_10BT1S   0x0008  /* 10BASE-T1S ability */
>> +
>>   /* LASI RX_ALARM control/status registers. */
>>   #define MDIO_PMA_LASI_RX_PHYXSLFLT   0x0001  /* PHY XS RX local fault */
>>   #define MDIO_PMA_LASI_RX_PCSLFLT     0x0008  /* PCS RX local fault */
>>
> 
> 
I have already prepared my next patch with a global is_baset1_capable 
property in the phy device. It is intentional that I did not introduce 
IS_10BASET1 macro. I should have probably explained in the cover letter. 
Will do for v2.

100 and 1000BASE-T1 are already more mature than 10BASE-T1. For 
10BASE-T1, the only support added right now is the detection of the 
capability, i.e. reporting to ethtool. I would suggest enabling more 
support for 10BASE-T1 once there is silicon available on a larger scale 
and usage is more clear.

^ permalink raw reply

* [PATCH net-next v2] r8152: divide the tx and rx bottom functions
From: Hayes Wang @ 2019-08-19  6:40 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Hayes Wang
In-Reply-To: <1394712342-15778-301-Taiwan-albertk@realtek.com>

Move the tx bottom function from NAPI to a new tasklet. Then, for
multi-cores, the bottom functions of tx and rx may be run at same
time with different cores. This is used to improve performance.

On x86, Tx/Rx 943/943 Mbits/sec -> 945/944.
For arm platform, Tx/Rx: 917/917 Mbits/sec -> 933/933.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
v2: add the performance number in the commit message.
---
 drivers/net/usb/r8152.c | 39 ++++++++++++++++++++++++++-------------
 1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 40d18e866269..3ed9f8e082c9 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -619,7 +619,7 @@ enum rtl8152_flags {
 	RTL8152_LINK_CHG,
 	SELECTIVE_SUSPEND,
 	PHY_RESET,
-	SCHEDULE_NAPI,
+	SCHEDULE_TASKLET,
 	GREEN_ETHERNET,
 	DELL_TB_RX_AGG_BUG,
 };
@@ -733,6 +733,7 @@ struct r8152 {
 #ifdef CONFIG_PM_SLEEP
 	struct notifier_block pm_notifier;
 #endif
+	struct tasklet_struct tx_tl;
 
 	struct rtl_ops {
 		void (*init)(struct r8152 *);
@@ -1401,7 +1402,7 @@ static void write_bulk_callback(struct urb *urb)
 		return;
 
 	if (!skb_queue_empty(&tp->tx_queue))
-		napi_schedule(&tp->napi);
+		tasklet_schedule(&tp->tx_tl);
 }
 
 static void intr_callback(struct urb *urb)
@@ -2179,8 +2180,12 @@ static void tx_bottom(struct r8152 *tp)
 	} while (res == 0);
 }
 
-static void bottom_half(struct r8152 *tp)
+static void bottom_half(unsigned long data)
 {
+	struct r8152 *tp;
+
+	tp = (struct r8152 *)data;
+
 	if (test_bit(RTL8152_UNPLUG, &tp->flags))
 		return;
 
@@ -2192,7 +2197,7 @@ static void bottom_half(struct r8152 *tp)
 	if (!netif_carrier_ok(tp->netdev))
 		return;
 
-	clear_bit(SCHEDULE_NAPI, &tp->flags);
+	clear_bit(SCHEDULE_TASKLET, &tp->flags);
 
 	tx_bottom(tp);
 }
@@ -2203,16 +2208,12 @@ static int r8152_poll(struct napi_struct *napi, int budget)
 	int work_done;
 
 	work_done = rx_bottom(tp, budget);
-	bottom_half(tp);
 
 	if (work_done < budget) {
 		if (!napi_complete_done(napi, work_done))
 			goto out;
 		if (!list_empty(&tp->rx_done))
 			napi_schedule(napi);
-		else if (!skb_queue_empty(&tp->tx_queue) &&
-			 !list_empty(&tp->tx_free))
-			napi_schedule(napi);
 	}
 
 out:
@@ -2366,11 +2367,11 @@ static netdev_tx_t rtl8152_start_xmit(struct sk_buff *skb,
 
 	if (!list_empty(&tp->tx_free)) {
 		if (test_bit(SELECTIVE_SUSPEND, &tp->flags)) {
-			set_bit(SCHEDULE_NAPI, &tp->flags);
+			set_bit(SCHEDULE_TASKLET, &tp->flags);
 			schedule_delayed_work(&tp->schedule, 0);
 		} else {
 			usb_mark_last_busy(tp->udev);
-			napi_schedule(&tp->napi);
+			tasklet_schedule(&tp->tx_tl);
 		}
 	} else if (skb_queue_len(&tp->tx_queue) > tp->tx_qlen) {
 		netif_stop_queue(netdev);
@@ -4020,9 +4021,11 @@ static void set_carrier(struct r8152 *tp)
 	} else {
 		if (netif_carrier_ok(netdev)) {
 			netif_carrier_off(netdev);
+			tasklet_disable(&tp->tx_tl);
 			napi_disable(napi);
 			tp->rtl_ops.disable(tp);
 			napi_enable(napi);
+			tasklet_enable(&tp->tx_tl);
 			netif_info(tp, link, netdev, "carrier off\n");
 		}
 	}
@@ -4055,10 +4058,10 @@ static void rtl_work_func_t(struct work_struct *work)
 	if (test_and_clear_bit(RTL8152_SET_RX_MODE, &tp->flags))
 		_rtl8152_set_rx_mode(tp->netdev);
 
-	/* don't schedule napi before linking */
-	if (test_and_clear_bit(SCHEDULE_NAPI, &tp->flags) &&
+	/* don't schedule tasket before linking */
+	if (test_and_clear_bit(SCHEDULE_TASKLET, &tp->flags) &&
 	    netif_carrier_ok(tp->netdev))
-		napi_schedule(&tp->napi);
+		tasklet_schedule(&tp->tx_tl);
 
 	mutex_unlock(&tp->control);
 
@@ -4144,6 +4147,7 @@ static int rtl8152_open(struct net_device *netdev)
 		goto out_unlock;
 	}
 	napi_enable(&tp->napi);
+	tasklet_enable(&tp->tx_tl);
 
 	mutex_unlock(&tp->control);
 
@@ -4171,6 +4175,7 @@ static int rtl8152_close(struct net_device *netdev)
 #ifdef CONFIG_PM_SLEEP
 	unregister_pm_notifier(&tp->pm_notifier);
 #endif
+	tasklet_disable(&tp->tx_tl);
 	if (!test_bit(RTL8152_UNPLUG, &tp->flags))
 		napi_disable(&tp->napi);
 	clear_bit(WORK_ENABLE, &tp->flags);
@@ -4440,6 +4445,7 @@ static int rtl8152_pre_reset(struct usb_interface *intf)
 		return 0;
 
 	netif_stop_queue(netdev);
+	tasklet_disable(&tp->tx_tl);
 	napi_disable(&tp->napi);
 	clear_bit(WORK_ENABLE, &tp->flags);
 	usb_kill_urb(tp->intr_urb);
@@ -4483,6 +4489,7 @@ static int rtl8152_post_reset(struct usb_interface *intf)
 	}
 
 	napi_enable(&tp->napi);
+	tasklet_enable(&tp->tx_tl);
 	netif_wake_queue(netdev);
 	usb_submit_urb(tp->intr_urb, GFP_KERNEL);
 
@@ -4636,10 +4643,12 @@ static int rtl8152_system_suspend(struct r8152 *tp)
 
 		clear_bit(WORK_ENABLE, &tp->flags);
 		usb_kill_urb(tp->intr_urb);
+		tasklet_disable(&tp->tx_tl);
 		napi_disable(napi);
 		cancel_delayed_work_sync(&tp->schedule);
 		tp->rtl_ops.down(tp);
 		napi_enable(napi);
+		tasklet_enable(&tp->tx_tl);
 	}
 
 	return 0;
@@ -5499,6 +5508,8 @@ static int rtl8152_probe(struct usb_interface *intf,
 	mutex_init(&tp->control);
 	INIT_DELAYED_WORK(&tp->schedule, rtl_work_func_t);
 	INIT_DELAYED_WORK(&tp->hw_phy_work, rtl_hw_phy_work_func_t);
+	tasklet_init(&tp->tx_tl, bottom_half, (unsigned long)tp);
+	tasklet_disable(&tp->tx_tl);
 
 	netdev->netdev_ops = &rtl8152_netdev_ops;
 	netdev->watchdog_timeo = RTL8152_TX_TIMEOUT;
@@ -5585,6 +5596,7 @@ static int rtl8152_probe(struct usb_interface *intf,
 
 out1:
 	netif_napi_del(&tp->napi);
+	tasklet_kill(&tp->tx_tl);
 	usb_set_intfdata(intf, NULL);
 out:
 	free_netdev(netdev);
@@ -5601,6 +5613,7 @@ static void rtl8152_disconnect(struct usb_interface *intf)
 
 		netif_napi_del(&tp->napi);
 		unregister_netdev(tp->netdev);
+		tasklet_kill(&tp->tx_tl);
 		cancel_delayed_work_sync(&tp->hw_phy_work);
 		tp->rtl_ops.unload(tp);
 		free_netdev(tp->netdev);
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH bpf-next] libbpf: remove zc variable as it is not used
From: Maxim Mikityanskiy @ 2019-08-19  6:47 UTC (permalink / raw)
  To: Magnus Karlsson, Jonathan Lemon
  Cc: Yonghong Song, Magnus Karlsson, Björn Töpel,
	Alexei Starovoitov, Daniel Borkmann, Network Development, bpf
In-Reply-To: <CAJ8uoz1hY0P+xypkJYYi775SeSXnrrPSM5v0yTf3G+d2a3OhJg@mail.gmail.com>

On 2019-08-19 09:35, Magnus Karlsson wrote:
> On Sat, Aug 17, 2019 at 12:02 AM Jonathan Lemon
> <jonathan.lemon@gmail.com> wrote:
>>
>>
>>
>> On 16 Aug 2019, at 8:37, Yonghong Song wrote:
>>
>>> On 8/16/19 3:26 AM, Magnus Karlsson wrote:
>>>> The zc is not used in the xsk part of libbpf, so let us remove it.
>>>> Not
>>>> good to have dead code lying around.
>>>>
>>>> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
>>>> Reported-by: Yonghong Song <yhs@fb.com> > ---
>>>>    tools/lib/bpf/xsk.c | 3 ---
>>>>    1 file changed, 3 deletions(-)
>>>>
>>>> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
>>>> index 680e630..9687da9 100644
>>>> --- a/tools/lib/bpf/xsk.c
>>>> +++ b/tools/lib/bpf/xsk.c
>>>> @@ -65,7 +65,6 @@ struct xsk_socket {
>>>>       int xsks_map_fd;
>>>>       __u32 queue_id;
>>>>       char ifname[IFNAMSIZ];
>>>> -    bool zc;
>>>>    };
>>>>
>>>>    struct xsk_nl_info {
>>>> @@ -608,8 +607,6 @@ int xsk_socket__create(struct xsk_socket
>>>> **xsk_ptr, const char *ifname,
>>>>               goto out_mmap_tx;
>>>>       }
>>>>
>>>> -    xsk->zc = opts.flags & XDP_OPTIONS_ZEROCOPY;
>>>
>>> Since opts.flags usage is removed. Do you think it makes sense to
>>> remove
>>>           optlen = sizeof(opts);
>>>           err = getsockopt(xsk->fd, SOL_XDP, XDP_OPTIONS, &opts,
>>> &optlen);
>>>           if (err) {
>>>                   err = -errno;
>>>                   goto out_mmap_tx;
>>>           }
>>> as well since nobody then uses opts?
>>
>> IIRC, this was added specifically in
>> 2761ed4b6e192820760d5ba913834b2ba05fd08c
>> so that userland code could know whether the socket was operating in
>> zero-copy
>> mode or not.
> 
> Thanks for reminding me Jonathan.
> 
> Roping in Maxim here since he wrote the patch. Was this something you
> planned on using but the functionality that needed it was removed? The
> patch set did go through a number of changes in the libbpf area, if I
> remember correctly.
> 
> There are two options: either we remove it, or we add an interface in
> xsk.h so that people can use it. I vote for the latter since I think
> it could be useful. The sample app could use it at least :-).

+1, let's expose it and make xdpsock read and print it. I added this 
flag to libbpf when I added the ability to query it from the kernel, but 
I didn't pay attention that struct xsk_socket was private to libbpf, and 
xsk->zc couldn't be accessed externally.

> /Magnus
> 
>> --
>> Jonathan


^ permalink raw reply

* Re: [PATCH net-next v7 5/6] flow_offload: support get multi-subsystem block
From: Vlad Buslov @ 2019-08-19  7:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vlad Buslov, wenxu, David Miller, Jiri Pirko, pablo@netfilter.org,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20190816105627.57c1c2aa@cakuba.netronome.com>


On Fri 16 Aug 2019 at 20:56, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> On Fri, 16 Aug 2019 15:04:44 +0000, Vlad Buslov wrote:
>> >> [  401.511871] RSP: 002b:00007ffca2a9fad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>> >> [  401.511875] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fad892d30f8
>> >> [  401.511878] RDX: 0000000000000002 RSI: 000055afeb072a90 RDI: 0000000000000001
>> >> [  401.511881] RBP: 000055afeb072a90 R08: 00000000ffffffff R09: 000000000000000a
>> >> [  401.511884] R10: 000055afeb058710 R11: 0000000000000246 R12: 0000000000000002
>> >> [  401.511887] R13: 00007fad893a8780 R14: 0000000000000002 R15: 00007fad893a3740
>> >>
>> >> I don't think it is correct approach to try to call these callbacks with
>> >> rcu protection because:
>> >>
>> >> - Cls API uses sleeping locks that cannot be used in rcu read section
>> >>   (hence the included trace).
>> >>
>> >> - It assumes that all implementation of classifier ops reoffload() don't
>> >>   sleep.
>> >>
>> >> - And that all driver offload callbacks (both block and classifier
>> >>   setup) don't sleep, which is not the case.
>> >>
>> >> I don't see any straightforward way to fix this, besides using some
>> >> other locking mechanism to protect block_ing_cb_list.
>> >>
>> >> Regards,
>> >> Vlad
>> >
>> > Maybe get the  mutex flow_indr_block_ing_cb_lock for both lookup, add, delete?
>> >
>> > the callbacks_lists. the add and delete is work only on modules init case. So the
>> >
>> > lookup is also not frequently(ony [un]register) and can protect with the locks.
>>
>> That should do the job. I'll send the patch.
>
> Hi Vlad!
>
> While looking into this, would you mind also add the missing
> flow_block_cb_is_busy() calls in the indirect handlers in the drivers?
>
> LMK if you're too busy, I don't want this to get forgotten :)

Hi Jakub,

I've checked the code and it looks like only nfp driver is affected:

- I added check in nfp to lookup cb_priv with
  nfp_flower_indr_block_cb_priv_lookup() and call
  flow_block_cb_is_busy() if cb_priv exists.

- In mlx5 en_rep.c there is already a check that indr_priv exists, so
  trying to lookup block_cb->cb_indent==indr_priv is redundant.

- Switch drivers (mlxsw and ocelot) take reference to block_cb on
  FLOW_BLOCK_BIND, so they should not require any modifications.

Tell me if I missed anything. Sending the patch for nfp.

Regards,
Vlad

^ permalink raw reply

* Re: [PATCH v2] tun: fix use-after-free when register netdev failed
From: Yang Yingliang @ 2019-08-19  7:29 UTC (permalink / raw)
  To: Jason Wang, netdev; +Cc: eric.dumazet, xiyou.wangcong, davem, weiyongjun1
In-Reply-To: <1b8175f3-7781-923b-5a24-d473f6efd33d@redhat.com>



On 2019/8/19 11:17, Jason Wang wrote:
> On 2019/8/16 下午7:00, Yang Yingliang wrote:
[...]
>>   
>>   		INIT_LIST_HEAD(&tun->disabled);
>> -		err = tun_attach(tun, file, false, ifr->ifr_flags & IFF_NAPI,
>> -				 ifr->ifr_flags & IFF_NAPI_FRAGS);
>> -		if (err < 0)
>> -			goto err_free_flow;
>> +
>> +		tun_set_real_num_queues(tun, tun->numqueues + 1);
>
> This looks tricky, why not simply call netif_set_real_num_tx/rx_queues()
> here?
OK, I will do some test, then send a v3 patch.


Thanks,
Yang

>
> Thanks
>
>
>>   
>>   		err = register_netdevice(tun->dev);
>>   		if (err < 0)
>> -			goto err_detach;
>> +			/* register_netdevice() already called tun_free_netdev() */
>> +			goto err_free_dev;
>> +
>> +		err = tun_attach(tun, file, false, ifr->ifr_flags & IFF_NAPI,
>> +				 ifr->ifr_flags & IFF_NAPI_FRAGS);
>> +		if (err < 0)
>> +			goto err_unregister;
>>   	}
>>   
>>   	netif_carrier_on(tun->dev);
>> @@ -2851,14 +2857,10 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>>   	strcpy(ifr->ifr_name, tun->dev->name);
>>   	return 0;
>>   
>> -err_detach:
>> -	tun_detach_all(dev);
>> -	/* register_netdevice() already called tun_free_netdev() */
>> -	goto err_free_dev;
>> +err_unregister:
>> +	unregister_netdevice(dev);
>> +	return err;
>>   
>> -err_free_flow:
>> -	tun_flow_uninit(tun);
>> -	security_tun_dev_free_security(tun->security);
>>   err_free_stat:
>>   	free_percpu(tun->pcpu_stats);
>>   err_free_dev:
>> @@ -2979,6 +2981,8 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
>>   			goto unlock;
>>   		ret = tun_attach(tun, file, false, tun->flags & IFF_NAPI,
>>   				 tun->flags & IFF_NAPI_FRAGS);
>> +		if (!ret)
>> +			tun_set_real_num_queues(tun, tun->numqueues);
>>   	} else if (ifr->ifr_flags & IFF_DETACH_QUEUE) {
>>   		tun = rtnl_dereference(tfile->tun);
>>   		if (!tun || !(tun->flags & IFF_MULTI_QUEUE) || tfile->detached)
> .
>



^ permalink raw reply

* [PATCH net] nfp: flower: verify that block cb is not busy before binding
From: Vlad Buslov @ 2019-08-19  7:33 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov

When processing FLOW_BLOCK_BIND command on indirect block, check that flow
block cb is not busy.

Fixes: 0d4fd02e7199 ("net: flow_offload: add flow_block_cb_is_busy() and use it")
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 drivers/net/ethernet/netronome/nfp/flower/offload.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index e209f150c5f2..9917d64694c6 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -1416,6 +1416,13 @@ nfp_flower_setup_indr_tc_block(struct net_device *netdev, struct nfp_app *app,
 
 	switch (f->command) {
 	case FLOW_BLOCK_BIND:
+		cb_priv = nfp_flower_indr_block_cb_priv_lookup(app, netdev);
+		if (cb_priv &&
+		    flow_block_cb_is_busy(nfp_flower_setup_indr_block_cb,
+					  cb_priv,
+					  &nfp_block_cb_list))
+			return -EBUSY;
+
 		cb_priv = kmalloc(sizeof(*cb_priv), GFP_KERNEL);
 		if (!cb_priv)
 			return -ENOMEM;
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH bpf-next] libbpf: remove zc variable as it is not used
From: Magnus Karlsson @ 2019-08-19  7:35 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Jonathan Lemon, Yonghong Song, Magnus Karlsson,
	Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Network Development, bpf
In-Reply-To: <f88c052e-bc75-9e70-6e94-3dc581baf5f4@mellanox.com>

On Mon, Aug 19, 2019 at 8:47 AM Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
>
> On 2019-08-19 09:35, Magnus Karlsson wrote:
> > On Sat, Aug 17, 2019 at 12:02 AM Jonathan Lemon
> > <jonathan.lemon@gmail.com> wrote:
> >>
> >>
> >>
> >> On 16 Aug 2019, at 8:37, Yonghong Song wrote:
> >>
> >>> On 8/16/19 3:26 AM, Magnus Karlsson wrote:
> >>>> The zc is not used in the xsk part of libbpf, so let us remove it.
> >>>> Not
> >>>> good to have dead code lying around.
> >>>>
> >>>> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> >>>> Reported-by: Yonghong Song <yhs@fb.com> > ---
> >>>>    tools/lib/bpf/xsk.c | 3 ---
> >>>>    1 file changed, 3 deletions(-)
> >>>>
> >>>> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> >>>> index 680e630..9687da9 100644
> >>>> --- a/tools/lib/bpf/xsk.c
> >>>> +++ b/tools/lib/bpf/xsk.c
> >>>> @@ -65,7 +65,6 @@ struct xsk_socket {
> >>>>       int xsks_map_fd;
> >>>>       __u32 queue_id;
> >>>>       char ifname[IFNAMSIZ];
> >>>> -    bool zc;
> >>>>    };
> >>>>
> >>>>    struct xsk_nl_info {
> >>>> @@ -608,8 +607,6 @@ int xsk_socket__create(struct xsk_socket
> >>>> **xsk_ptr, const char *ifname,
> >>>>               goto out_mmap_tx;
> >>>>       }
> >>>>
> >>>> -    xsk->zc = opts.flags & XDP_OPTIONS_ZEROCOPY;
> >>>
> >>> Since opts.flags usage is removed. Do you think it makes sense to
> >>> remove
> >>>           optlen = sizeof(opts);
> >>>           err = getsockopt(xsk->fd, SOL_XDP, XDP_OPTIONS, &opts,
> >>> &optlen);
> >>>           if (err) {
> >>>                   err = -errno;
> >>>                   goto out_mmap_tx;
> >>>           }
> >>> as well since nobody then uses opts?
> >>
> >> IIRC, this was added specifically in
> >> 2761ed4b6e192820760d5ba913834b2ba05fd08c
> >> so that userland code could know whether the socket was operating in
> >> zero-copy
> >> mode or not.
> >
> > Thanks for reminding me Jonathan.
> >
> > Roping in Maxim here since he wrote the patch. Was this something you
> > planned on using but the functionality that needed it was removed? The
> > patch set did go through a number of changes in the libbpf area, if I
> > remember correctly.
> >
> > There are two options: either we remove it, or we add an interface in
> > xsk.h so that people can use it. I vote for the latter since I think
> > it could be useful. The sample app could use it at least :-).
>
> +1, let's expose it and make xdpsock read and print it. I added this
> flag to libbpf when I added the ability to query it from the kernel, but
> I didn't pay attention that struct xsk_socket was private to libbpf, and
> xsk->zc couldn't be accessed externally.

Good. I will then add it.

Please discard this patch. I will add this interface in a new patch.

/Magnus

> > /Magnus
> >
> >> --
> >> Jonathan
>

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH bpf-next 0/5] Add support for SKIP_BPF flag for AF_XDP sockets
From: Björn Töpel @ 2019-08-19  7:39 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Samudrala, Sridhar, Björn Töpel, Karlsson, Magnus,
	Netdev, bpf, intel-wired-lan, maciej.fijalkowski, tom.herbert
In-Reply-To: <331CAEDB-776A-4928-93EF-F45F1339848F@gmail.com>

On Sat, 17 Aug 2019 at 00:08, Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> On 16 Aug 2019, at 6:32, Björn Töpel wrote:
[...]
> >
> > Today, from a driver perspective, to enable XDP you pass a struct
> > bpf_prog pointer via the ndo_bpf. The program get executed in
> > BPF_PROG_RUN (via bpf_prog_run_xdp) from include/linux/filter.h.
> >
> > I think it's possible to achieve what you're doing w/o *any* driver
> > modification. Pass a special, invalid, pointer to the driver (say
> > (void *)0x1 or smth more elegant), which has a special handling in
> > BPF_RUN_PROG e.g. setting a per-cpu state and return XDP_REDIRECT. The
> > per-cpu state is picked up in xdp_do_redirect and xdp_flush.
> >
> > An approach like this would be general, and apply to all modes
> > automatically.
> >
> > Thoughts?
>
> All the default program does is check that the map entry contains a xsk,
> and call bpf_redirect_map().  So this is pretty much the same as above,
> without any special case handling.
>
> Why would this be so expensive?  Is the JIT compilation time being
> counted?

No, not the JIT compilation time, only the fast-path. The gain is from
removing the indirect call (hitting a retpoline) when calling the XDP
program, and reducing code from xdp_do_redirect/xdp_flush.

But, as Jakub pointed out, the XDP batching work by Maciej, might
reduce the retpoline impact quite a bit.


Björn

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox