* [PATCH v7 net-next 0/5] add PLCA RS support and onsemi NCN26000
From: Piergiorgio Beruto @ 2022-12-17 0:48 UTC (permalink / raw)
To: Andrew Lunn, Heiner Kallweit, Russell King, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-kernel, netdev, Oleksij Rempel
This patchset adds support for getting/setting the Physical Layer
Collision Avoidace (PLCA) Reconciliation Sublayer (RS) configuration and
status on Ethernet PHYs that supports it.
PLCA is a feature that provides improved media-access performance in terms
of throughput, latency and fairness for multi-drop (P2MP) half-duplex PHYs.
PLCA is defined in Clause 148 of the IEEE802.3 specifications as amended
by 802.3cg-2019. Currently, PLCA is supported by the 10BASE-T1S single-pair
Ethernet PHY defined in the same standard and related amendments. The OPEN
Alliance SIG TC14 defines additional specifications for the 10BASE-T1S PHY,
including a standard register map for PHYs that embeds the PLCA RS (see
PLCA management registers at https://www.opensig.org/about/specifications/).
The changes proposed herein add the appropriate ethtool netlink interface
for configuring the PLCA RS on PHYs that supports it. A separate patchset
further modifies the ethtool userspace program to show and modify the
configuration/status of the PLCA RS.
Additionally, this patchset adds support for the onsemi NCN26000
Industrial Ethernet 10BASE-T1S PHY that uses the newly added PLCA
infrastructure.
Piergiorgio Beruto (5):
net/ethtool: add netlink interface for the PLCA RS
drivers/net/phy: add the link modes for the 10BASE-T1S Ethernet PHY
drivers/net/phy: add connection between ethtool and phylib for PLCA
drivers/net/phy: add helpers to get/set PLCA configuration
drivers/net/phy: add driver for the onsemi NCN26000 10BASE-T1S PHY
Documentation/networking/ethtool-netlink.rst | 138 ++++++++++
MAINTAINERS | 14 +
drivers/net/phy/Kconfig | 7 +
drivers/net/phy/Makefile | 1 +
drivers/net/phy/mdio-open-alliance.h | 46 ++++
drivers/net/phy/ncn26000.c | 171 ++++++++++++
drivers/net/phy/phy-c45.c | 183 ++++++++++++
drivers/net/phy/phy-core.c | 5 +-
drivers/net/phy/phy.c | 172 ++++++++++++
drivers/net/phy/phy_device.c | 17 ++
drivers/net/phy/phylink.c | 6 +-
include/linux/ethtool.h | 12 +
include/linux/phy.h | 83 ++++++
include/uapi/linux/ethtool.h | 3 +
include/uapi/linux/ethtool_netlink.h | 25 ++
net/ethtool/Makefile | 2 +-
net/ethtool/common.c | 8 +
net/ethtool/netlink.c | 29 ++
net/ethtool/netlink.h | 6 +
net/ethtool/plca.c | 276 +++++++++++++++++++
20 files changed, 1201 insertions(+), 3 deletions(-)
create mode 100644 drivers/net/phy/mdio-open-alliance.h
create mode 100644 drivers/net/phy/ncn26000.c
create mode 100644 net/ethtool/plca.c
--
2.37.4
^ permalink raw reply
* Re: [PATCH v6] vmxnet3: Add XDP support.
From: Alexander Duyck @ 2022-12-17 0:37 UTC (permalink / raw)
To: William Tu; +Cc: netdev, tuc, gyang, doshir
In-Reply-To: <CALDO+SaKd74n_PDB+kN0KQWt_Zh9NjzPAm-kWfie4Vd+jOeCZw@mail.gmail.com>
On Fri, Dec 16, 2022 at 2:53 PM William Tu <u9012063@gmail.com> wrote:
>
> Hi Alexander,
> Thanks for taking a look at this patch!
>
> On Fri, Dec 16, 2022 at 9:14 AM Alexander H Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > On Fri, 2022-12-16 at 06:41 -0800, William Tu wrote:
> > > The patch adds native-mode XDP support: XDP DROP, PASS, TX, and REDIRECT.
> > >
> > > Background:
> > > The vmxnet3 rx consists of three rings: ring0, ring1, and dataring.
> > > For r0 and r1, buffers at r0 are allocated using alloc_skb APIs and dma
> > > mapped to the ring's descriptor. If LRO is enabled and packet size larger
> > > than 3K, VMXNET3_MAX_SKB_BUF_SIZE, then r1 is used to mapped the rest of
> > > the buffer larger than VMXNET3_MAX_SKB_BUF_SIZE. Each buffer in r1 is
> > > allocated using alloc_page. So for LRO packets, the payload will be in one
> > > buffer from r0 and multiple from r1, for non-LRO packets, only one
> > > descriptor in r0 is used for packet size less than 3k.
> > >
> > > When receiving a packet, the first descriptor will have the sop (start of
> > > packet) bit set, and the last descriptor will have the eop (end of packet)
> > > bit set. Non-LRO packets will have only one descriptor with both sop and
> > > eop set.
> > >
> > > Other than r0 and r1, vmxnet3 dataring is specifically designed for
> > > handling packets with small size, usually 128 bytes, defined in
> > > VMXNET3_DEF_RXDATA_DESC_SIZE, by simply copying the packet from the backend
> > > driver in ESXi to the ring's memory region at front-end vmxnet3 driver, in
> > > order to avoid memory mapping/unmapping overhead. In summary, packet size:
> > > A. < 128B: use dataring
> > > B. 128B - 3K: use ring0 (VMXNET3_RX_BUF_SKB)
> > > C. > 3K: use ring0 and ring1 (VMXNET3_RX_BUF_SKB + VMXNET3_RX_BUF_PAGE)
> > > As a result, the patch adds XDP support for packets using dataring
> > > and r0 (case A and B), not the large packet size when LRO is enabled.
> > >
> > > XDP Implementation:
> > > When user loads and XDP prog, vmxnet3 driver checks configurations, such
> > > as mtu, lro, and re-allocate the rx buffer size for reserving the extra
> > > headroom, XDP_PACKET_HEADROOM, for XDP frame. The XDP prog will then be
> > > associated with every rx queue of the device. Note that when using dataring
> > > for small packet size, vmxnet3 (front-end driver) doesn't control the
> > > buffer allocation, as a result the XDP frame's headroom is zero.
> > >
> > > The receive side of XDP is implemented for case A and B, by invoking the
> > > bpf program at vmxnet3_rq_rx_complete and handle its returned action.
> > > The new vmxnet3_run_xdp function handles the difference of using dataring
> > > or ring0, and decides the next journey of the packet afterward.
> > >
> > > For TX, vmxnet3 has split header design. Outgoing packets are parsed
> > > first and protocol headers (L2/L3/L4) are copied to the backend. The
> > > rest of the payload are dma mapped. Since XDP_TX does not parse the
> > > packet protocol, the entire XDP frame is using dma mapped for the
> > > transmission. Because the actual TX is deferred until txThreshold, it's
> > > possible that an rx buffer is being overwritten by incoming burst of rx
> > > packets, before tx buffer being transmitted. As a result, we allocate new
> > > skb and introduce a copy.
> > >
> > > Performance:
> > > Tested using two VMs inside one ESXi machine, using single core on each
> > > vmxnet3 device, sender using DPDK testpmd tx-mode attached to vmxnet3
> > > device, sending 64B or 512B packet.
> > >
> > > VM1 txgen:
> > > $ dpdk-testpmd -l 0-3 -n 1 -- -i --nb-cores=3 \
> > > --forward-mode=txonly --eth-peer=0,<mac addr of vm2>
> > > option: add "--txonly-multi-flow"
> > > option: use --txpkts=512 or 64 byte
> > >
> > > VM2 running XDP:
> > > $ ./samples/bpf/xdp_rxq_info -d ens160 -a <options> --skb-mode
> > > $ ./samples/bpf/xdp_rxq_info -d ens160 -a <options>
> > > options: XDP_DROP, XDP_PASS, XDP_TX
> > >
> > > To test REDIRECT to cpu 0, use
> > > $ ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e drop
> > >
> > > Single core performance comparison with skb-mode.
> > > 64B: skb-mode -> native-mode (with this patch)
> > > XDP_DROP: 932Kpps -> 2.0Mpps
> > > XDP_PASS: 284Kpps -> 314Kpps
> > > XDP_TX: 591Kpps -> 1.8Mpps
> > > REDIRECT: 453Kpps -> 501Kpps
> > >
> > > 512B: skb-mode -> native-mode (with this patch)
> > > XDP_DROP: 890Kpps -> 1.3Mpps
> > > XDP_PASS: 284Kpps -> 314Kpps
> > > XDP_TX: 555Kpps -> 1.2Mpps
> > > REDIRECT: 670Kpps -> 430Kpps
> > >
> >
> > I hadn't noticed it before. Based on this it looks like native mode is
> > performing worse then skb-mode for redirect w/ 512B packets? Have you
> > looked into why that might be?
>
> yes, I noticed it but don't know why, maybe it's due to extra copy and page
> allocation like you said below. I will dig deeper.
>
> >
> > My main concern would be that you are optimizing for recyling in the Tx
> > and Redirect paths, when you might be better off just releasing the
> > buffers and batch allocating new pages in your Rx path.
>
> right, are you talking about using the page pool allocator, ex: slide 8 below
> https://legacy.netdevconf.info/0x14/pub/slides/10/add-xdp-on-driver.pdf
> I tried it before but then I found I have to replace lots of existing vmxnet3
> code, basically replacing all the rx/tx buffer allocation code with new
> page pool api, even without XDP. I'd love to give it a try, do you think it's
> worth doing it?
It might be. It is hard for me to say without knowing more about the
driver itself. However if i were doing a driver from scratch that
supported XDP I would probably go that route. Having to refactor an
existing driver is admittedly going to be more work.
>
> >
> > > + __netif_tx_unlock(nq);
> > > + return err;
> > > +}
> > > +
> > > +int
> > > +vmxnet3_xdp_xmit(struct net_device *dev,
> > > + int n, struct xdp_frame **frames, u32 flags)
> > > +{
> > > + struct vmxnet3_adapter *adapter;
> > > + struct vmxnet3_tx_queue *tq;
> > > + struct netdev_queue *nq;
> > > + int i, err, cpu;
> > > + int nxmit = 0;
> > > + int tq_number;
> > > +
> > > + adapter = netdev_priv(dev);
> > > +
> > > + if (unlikely(test_bit(VMXNET3_STATE_BIT_QUIESCED, &adapter->state)))
> > > + return -ENETDOWN;
> > > + if (unlikely(test_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state)))
> > > + return -EINVAL;
> > > +
> > > + tq_number = adapter->num_tx_queues;
> > > + cpu = smp_processor_id();
> > > + tq = &adapter->tx_queue[cpu % tq_number];
> > > + if (tq->stopped)
> > > + return -ENETDOWN;
> > > +
> > > + nq = netdev_get_tx_queue(adapter->netdev, tq->qid);
> > > +
> > > + __netif_tx_lock(nq, cpu);
> > > + for (i = 0; i < n; i++) {
> > > + err = vmxnet3_xdp_xmit_frame(adapter, frames[i], tq);
> > > + if (err) {
> > > + tq->stats.xdp_xmit_err++;
> > > + break;
> > > + }
> > > + nxmit++;
> > > + }
> > > +
> > > + tq->stats.xdp_xmit += nxmit;
> > > + __netif_tx_unlock(nq);
> > > +
> >
> > Are you doing anything to free the frames after you transmit them? If I
> > am not mistaken you are just copying them over into skbs aren't you, so
> > what is freeing the frames after that?
>
> The frames will be free at vmxnet3_tq_cleanup() at dev_kfree_skb_any(tbi->skb);
> Because at the vmxnet3_xdp_xmit_frame the allocated skb is saved at tbi->skb,
> so it can be freed at tq cleanup.
The frames I am referring to are the xdp_frame, not the skb.
Specifically your function is copying the data out. So in the redirect
case I think you might be leaking pages. That is one of the reasons
why I was thinking it might be better to just push the data all the
way through.
In the other email you sent me the call xdp_return_frame was used to
free the frame. That is what would be expected in this function after
you cleaned the data out and placed it in an skbuff in
vmxnet3_xdp_xmit_frame.
> >
> > > + return nxmit;
> > > +}
> > > +
> > > +static int
> > > +__vmxnet3_run_xdp(struct vmxnet3_rx_queue *rq, void *data, int data_len,
> > > + int headroom, int frame_sz, bool *need_xdp_flush,
> > > + struct sk_buff *skb)
> > > +{
> > > + struct xdp_frame *xdpf;
> > > + void *buf_hard_start;
> > > + struct xdp_buff xdp;
> > > + struct page *page;
> > > + void *orig_data;
> > > + int err, delta;
> > > + int delta_len;
> > > + u32 act;
> > > +
> > > + buf_hard_start = data;
> > > + xdp_init_buff(&xdp, frame_sz, &rq->xdp_rxq);
> > > + xdp_prepare_buff(&xdp, buf_hard_start, headroom, data_len, true);
> > > + orig_data = xdp.data;
> > > +
> > > + act = bpf_prog_run_xdp(rq->xdp_bpf_prog, &xdp);
> > > + rq->stats.xdp_packets++;
> > > +
> > > + switch (act) {
> > > + case XDP_DROP:
> > > + rq->stats.xdp_drops++;
> > > + break;
> > > + case XDP_PASS:
> > > + /* bpf prog might change len and data position.
> > > + * dataring does not use skb so not support this.
> > > + */
> > > + delta = xdp.data - orig_data;
> > > + delta_len = (xdp.data_end - xdp.data) - data_len;
> > > + if (skb) {
> > > + skb_reserve(skb, delta);
> > > + skb_put(skb, delta_len);
> > > + }
> > > + break;
> > > + case XDP_TX:
> > > + xdpf = xdp_convert_buff_to_frame(&xdp);
> > > + if (!xdpf ||
> > > + vmxnet3_xdp_xmit_back(rq->adapter, xdpf)) {
> > > + rq->stats.xdp_drops++;
> > > + } else {
> > > + rq->stats.xdp_tx++;
> > > + }
> > > + break;
> > > + case XDP_ABORTED:
> > > + trace_xdp_exception(rq->adapter->netdev, rq->xdp_bpf_prog,
> > > + act);
> > > + rq->stats.xdp_aborted++;
> > > + break;
> > > + case XDP_REDIRECT:
> > > + page = alloc_page(GFP_ATOMIC);
> > > + if (!page) {
> > > + rq->stats.rx_buf_alloc_failure++;
> > > + return XDP_DROP;
> > > + }
> >
> > So I think I see the problem I had questions about here. If I am not
> > mistaken you are copying the buffer to this page, and then copying this
> > page to an skb right? I think you might be better off just writing off
> > the Tx/Redirect pages and letting them go through their respective
> > paths and just allocating new pages instead assuming these pages were
> > consumed.
>
> I'm not sure I understand, can you elaborate?
>
> For XDP_TX, I'm doing 1 extra copy, copying to the newly allocated skb
> in vmxnet3_xdp_xmit_back.
> For XDP_REDIREC, I allocate a page and copy to the page and call
> xdp_do_redirect, there is no copying to skb again. If I don't allocate a
> new page, it always crashes at
> [62020.425932] BUG: Bad page state in process cpumap/0/map:29 pfn:107548
> [62020.440905] kernel BUG at include/linux/mm.h:757!
> VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
What I was referring to was one copy here, and then another copy in
your vmxnet3_xdp_xmit_frame() function with the page you allocated
here possibly being leaked.
Though based on the other trace you provided it would look like you
are redirecting to something else as your code currently doesn't
reference xdp_return_frame which is what I was referring to earlier in
terms of missing the logic to free the frame in your transmit path.
^ permalink raw reply
* [PATCH wpan-next v2 6/6] mac802154: Handle passive scanning
From: Miquel Raynal @ 2022-12-17 0:02 UTC (permalink / raw)
To: Alexander Aring, Stefan Schmidt, linux-wpan
Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
Guilhem Imberton, Thomas Petazzoni, David S. Miller,
Jakub Kicinski, Paolo Abeni, Eric Dumazet, netdev, Miquel Raynal
In-Reply-To: <20221217000226.646767-1-miquel.raynal@bootlin.com>
Implement the core hooks in order to provide the softMAC layer support
for passive scans. Scans are requested by the user and can be aborted.
Changing channels manually is prohibited during scans.
The implementation uses a workqueue triggered at a certain interval
depending on the symbol duration for the current channel and the
duration order provided. More advanced drivers with internal scheduling
capabilities might require additional care but there is none mainline
yet.
Received beacons during a passive scan are processed in a work queue and
their result forwarded to the upper layer.
Active scanning is not supported yet.
Co-developed-by: David Girault <david.girault@qorvo.com>
Signed-off-by: David Girault <david.girault@qorvo.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
include/linux/ieee802154.h | 4 +
include/net/cfg802154.h | 16 ++
net/mac802154/Makefile | 2 +-
net/mac802154/cfg.c | 31 ++++
net/mac802154/ieee802154_i.h | 37 ++++-
net/mac802154/iface.c | 3 +
net/mac802154/main.c | 16 +-
net/mac802154/rx.c | 36 ++++-
net/mac802154/scan.c | 289 +++++++++++++++++++++++++++++++++++
9 files changed, 428 insertions(+), 6 deletions(-)
create mode 100644 net/mac802154/scan.c
diff --git a/include/linux/ieee802154.h b/include/linux/ieee802154.h
index b22e4147d334..140f61ec0f5f 100644
--- a/include/linux/ieee802154.h
+++ b/include/linux/ieee802154.h
@@ -47,6 +47,10 @@
/* Duration in superframe order */
#define IEEE802154_MAX_SCAN_DURATION 14
#define IEEE802154_ACTIVE_SCAN_DURATION 15
+/* Superframe duration in slots */
+#define IEEE802154_SUPERFRAME_PERIOD 16
+/* Various periods expressed in symbols */
+#define IEEE802154_SLOT_PERIOD 60
#define IEEE802154_LIFS_PERIOD 40
#define IEEE802154_SIFS_PERIOD 12
#define IEEE802154_MAX_SIFS_FRAME_SIZE 18
diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h
index 5d4750d24f13..090e1ad64a55 100644
--- a/include/net/cfg802154.h
+++ b/include/net/cfg802154.h
@@ -314,6 +314,22 @@ struct cfg802154_scan_request {
struct wpan_phy *wpan_phy;
};
+/**
+ * struct cfg802154_mac_pkt - MAC packet descriptor (beacon/command)
+ * @node: MAC packets to process list member
+ * @skb: the received sk_buff
+ * @sdata: the interface on which @skb was received
+ * @page: page configuration when @skb was received
+ * @channel: channel configuration when @skb was received
+ */
+struct cfg802154_mac_pkt {
+ struct list_head node;
+ struct sk_buff *skb;
+ struct ieee802154_sub_if_data *sdata;
+ u8 page;
+ u8 channel;
+};
+
struct ieee802154_llsec_key_id {
u8 mode;
u8 id;
diff --git a/net/mac802154/Makefile b/net/mac802154/Makefile
index 4059295fdbf8..43d1347b37ee 100644
--- a/net/mac802154/Makefile
+++ b/net/mac802154/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-$(CONFIG_MAC802154) += mac802154.o
mac802154-objs := main.o rx.o tx.o mac_cmd.o mib.o \
- iface.o llsec.o util.o cfg.o trace.o
+ iface.o llsec.o util.o cfg.o scan.o trace.o
CFLAGS_trace.o := -I$(src)
diff --git a/net/mac802154/cfg.c b/net/mac802154/cfg.c
index 469d6e8dd2dd..187cebcaf233 100644
--- a/net/mac802154/cfg.c
+++ b/net/mac802154/cfg.c
@@ -114,6 +114,10 @@ ieee802154_set_channel(struct wpan_phy *wpan_phy, u8 page, u8 channel)
wpan_phy->current_channel == channel)
return 0;
+ /* Refuse to change channels during a scanning operation */
+ if (mac802154_is_scanning(local))
+ return -EBUSY;
+
ret = drv_set_channel(local, page, channel);
if (!ret) {
wpan_phy->current_page = page;
@@ -261,6 +265,31 @@ ieee802154_set_ackreq_default(struct wpan_phy *wpan_phy,
return 0;
}
+static int mac802154_trigger_scan(struct wpan_phy *wpan_phy,
+ struct cfg802154_scan_request *request)
+{
+ struct ieee802154_sub_if_data *sdata;
+
+ sdata = IEEE802154_WPAN_DEV_TO_SUB_IF(request->wpan_dev);
+
+ ASSERT_RTNL();
+
+ return mac802154_trigger_scan_locked(sdata, request);
+}
+
+static int mac802154_abort_scan(struct wpan_phy *wpan_phy,
+ struct wpan_dev *wpan_dev)
+{
+ struct ieee802154_local *local = wpan_phy_priv(wpan_phy);
+ struct ieee802154_sub_if_data *sdata;
+
+ sdata = IEEE802154_WPAN_DEV_TO_SUB_IF(wpan_dev);
+
+ ASSERT_RTNL();
+
+ return mac802154_abort_scan_locked(local, sdata);
+}
+
#ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL
static void
ieee802154_get_llsec_table(struct wpan_phy *wpan_phy,
@@ -468,6 +497,8 @@ const struct cfg802154_ops mac802154_config_ops = {
.set_max_frame_retries = ieee802154_set_max_frame_retries,
.set_lbt_mode = ieee802154_set_lbt_mode,
.set_ackreq_default = ieee802154_set_ackreq_default,
+ .trigger_scan = mac802154_trigger_scan,
+ .abort_scan = mac802154_abort_scan,
#ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL
.get_llsec_table = ieee802154_get_llsec_table,
.lock_llsec_table = ieee802154_lock_llsec_table,
diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
index aeadee543a9c..0e4db967bd1d 100644
--- a/net/mac802154/ieee802154_i.h
+++ b/net/mac802154/ieee802154_i.h
@@ -21,6 +21,10 @@
#include "llsec.h"
+enum ieee802154_ongoing {
+ IEEE802154_IS_SCANNING = BIT(0),
+};
+
/* mac802154 device private data */
struct ieee802154_local {
struct ieee802154_hw hw;
@@ -43,15 +47,26 @@ struct ieee802154_local {
struct list_head interfaces;
struct mutex iflist_mtx;
- /* This one is used for scanning and other jobs not to be interfered
- * with serial driver.
- */
+ /* Data related workqueue */
struct workqueue_struct *workqueue;
+ /* MAC commands related workqueue */
+ struct workqueue_struct *mac_wq;
struct hrtimer ifs_timer;
+ /* Scanning */
+ u8 scan_page;
+ u8 scan_channel;
+ struct cfg802154_scan_request __rcu *scan_req;
+ struct delayed_work scan_work;
+
+ /* Asynchronous tasks */
+ struct list_head rx_beacon_list;
+ struct work_struct rx_beacon_work;
+
bool started;
bool suspended;
+ unsigned long ongoing;
struct tasklet_struct tasklet;
struct sk_buff_head skb_queue;
@@ -226,6 +241,22 @@ void mac802154_unlock_table(struct net_device *dev);
int mac802154_wpan_update_llsec(struct net_device *dev);
+/* PAN management handling */
+void mac802154_scan_worker(struct work_struct *work);
+int mac802154_trigger_scan_locked(struct ieee802154_sub_if_data *sdata,
+ struct cfg802154_scan_request *request);
+int mac802154_abort_scan_locked(struct ieee802154_local *local,
+ struct ieee802154_sub_if_data *sdata);
+int mac802154_process_beacon(struct ieee802154_local *local,
+ struct sk_buff *skb,
+ u8 page, u8 channel);
+void mac802154_rx_beacon_worker(struct work_struct *work);
+
+static inline bool mac802154_is_scanning(struct ieee802154_local *local)
+{
+ return test_bit(IEEE802154_IS_SCANNING, &local->ongoing);
+}
+
/* interface handling */
int ieee802154_iface_init(void);
void ieee802154_iface_exit(void);
diff --git a/net/mac802154/iface.c b/net/mac802154/iface.c
index 7de2f843379c..a5958d323ea3 100644
--- a/net/mac802154/iface.c
+++ b/net/mac802154/iface.c
@@ -302,6 +302,9 @@ static int mac802154_slave_close(struct net_device *dev)
ASSERT_RTNL();
+ if (mac802154_is_scanning(local))
+ mac802154_abort_scan_locked(local, sdata);
+
netif_stop_queue(dev);
local->open_count--;
diff --git a/net/mac802154/main.c b/net/mac802154/main.c
index 12a13a850fdf..b1111279e06d 100644
--- a/net/mac802154/main.c
+++ b/net/mac802154/main.c
@@ -89,6 +89,7 @@ ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops)
local->ops = ops;
INIT_LIST_HEAD(&local->interfaces);
+ INIT_LIST_HEAD(&local->rx_beacon_list);
mutex_init(&local->iflist_mtx);
tasklet_setup(&local->tasklet, ieee802154_tasklet_handler);
@@ -96,6 +97,8 @@ ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops)
skb_queue_head_init(&local->skb_queue);
INIT_WORK(&local->sync_tx_work, ieee802154_xmit_sync_worker);
+ INIT_DELAYED_WORK(&local->scan_work, mac802154_scan_worker);
+ INIT_WORK(&local->rx_beacon_work, mac802154_rx_beacon_worker);
/* init supported flags with 802.15.4 default ranges */
phy->supported.max_minbe = 8;
@@ -185,6 +188,7 @@ static void ieee802154_setup_wpan_phy_pib(struct wpan_phy *wpan_phy)
int ieee802154_register_hw(struct ieee802154_hw *hw)
{
struct ieee802154_local *local = hw_to_local(hw);
+ char mac_wq_name[IFNAMSIZ + 10] = {};
struct net_device *dev;
int rc = -ENOSYS;
@@ -195,6 +199,13 @@ int ieee802154_register_hw(struct ieee802154_hw *hw)
goto out;
}
+ snprintf(mac_wq_name, IFNAMSIZ + 10, "%s-mac-cmds", wpan_phy_name(local->phy));
+ local->mac_wq = create_singlethread_workqueue(mac_wq_name);
+ if (!local->mac_wq) {
+ rc = -ENOMEM;
+ goto out_wq;
+ }
+
hrtimer_init(&local->ifs_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
local->ifs_timer.function = ieee802154_xmit_ifs_timer;
@@ -224,7 +235,7 @@ int ieee802154_register_hw(struct ieee802154_hw *hw)
rc = wpan_phy_register(local->phy);
if (rc < 0)
- goto out_wq;
+ goto out_mac_wq;
rtnl_lock();
@@ -243,6 +254,8 @@ int ieee802154_register_hw(struct ieee802154_hw *hw)
out_phy:
wpan_phy_unregister(local->phy);
+out_mac_wq:
+ destroy_workqueue(local->mac_wq);
out_wq:
destroy_workqueue(local->workqueue);
out:
@@ -263,6 +276,7 @@ void ieee802154_unregister_hw(struct ieee802154_hw *hw)
rtnl_unlock();
+ destroy_workqueue(local->mac_wq);
destroy_workqueue(local->workqueue);
wpan_phy_unregister(local->phy);
}
diff --git a/net/mac802154/rx.c b/net/mac802154/rx.c
index 97bb4401dd3e..da0628ee3c89 100644
--- a/net/mac802154/rx.c
+++ b/net/mac802154/rx.c
@@ -29,12 +29,31 @@ static int ieee802154_deliver_skb(struct sk_buff *skb)
return netif_receive_skb(skb);
}
+void mac802154_rx_beacon_worker(struct work_struct *work)
+{
+ struct ieee802154_local *local =
+ container_of(work, struct ieee802154_local, rx_beacon_work);
+ struct cfg802154_mac_pkt *mac_pkt;
+
+ mac_pkt = list_first_entry_or_null(&local->rx_beacon_list,
+ struct cfg802154_mac_pkt, node);
+ if (!mac_pkt)
+ return;
+
+ mac802154_process_beacon(local, mac_pkt->skb, mac_pkt->page, mac_pkt->channel);
+
+ list_del(&mac_pkt->node);
+ kfree_skb(mac_pkt->skb);
+ kfree(mac_pkt);
+}
+
static int
ieee802154_subif_frame(struct ieee802154_sub_if_data *sdata,
struct sk_buff *skb, const struct ieee802154_hdr *hdr)
{
- struct wpan_dev *wpan_dev = &sdata->wpan_dev;
struct wpan_phy *wpan_phy = sdata->local->hw.phy;
+ struct wpan_dev *wpan_dev = &sdata->wpan_dev;
+ struct cfg802154_mac_pkt *mac_pkt;
__le16 span, sshort;
int rc;
@@ -106,6 +125,21 @@ ieee802154_subif_frame(struct ieee802154_sub_if_data *sdata,
switch (mac_cb(skb)->type) {
case IEEE802154_FC_TYPE_BEACON:
+ dev_dbg(&sdata->dev->dev, "BEACON received\n");
+ if (!mac802154_is_scanning(sdata->local))
+ goto fail;
+
+ mac_pkt = kzalloc(sizeof(*mac_pkt), GFP_ATOMIC);
+ if (!mac_pkt)
+ goto fail;
+
+ mac_pkt->skb = skb_get(skb);
+ mac_pkt->sdata = sdata;
+ mac_pkt->page = sdata->local->scan_page;
+ mac_pkt->channel = sdata->local->scan_channel;
+ list_add_tail(&mac_pkt->node, &sdata->local->rx_beacon_list);
+ queue_work(sdata->local->mac_wq, &sdata->local->rx_beacon_work);
+ return NET_RX_SUCCESS;
case IEEE802154_FC_TYPE_ACK:
case IEEE802154_FC_TYPE_MAC_CMD:
goto fail;
diff --git a/net/mac802154/scan.c b/net/mac802154/scan.c
new file mode 100644
index 000000000000..2d3d2d5a4e34
--- /dev/null
+++ b/net/mac802154/scan.c
@@ -0,0 +1,289 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * IEEE 802.15.4 scanning management
+ *
+ * Copyright (C) 2021 Qorvo US, Inc
+ * Authors:
+ * - David Girault <david.girault@qorvo.com>
+ * - Miquel Raynal <miquel.raynal@bootlin.com>
+ */
+
+#include <linux/module.h>
+#include <linux/rtnetlink.h>
+#include <net/mac802154.h>
+
+#include "ieee802154_i.h"
+#include "driver-ops.h"
+#include "../ieee802154/nl802154.h"
+
+/*
+ * mac802154_scan_cleanup_locked() must be called upon scan completion or abort.
+ * - Completions are asynchronous, not locked by the rtnl and decided by the
+ * scan worker.
+ * - Aborts are decided by userspace, and locked by the rtnl.
+ *
+ * Concurrent modifications to the PHY, the interfaces or the hardware is in
+ * general prevented by the rtnl. So in most cases we don't need additional
+ * protection.
+ *
+ * However, the scan worker get's triggered without anybody noticing and thus we
+ * must ensure the presence of the devices as well as data consistency:
+ * - The sub-interface and device driver module get both their reference
+ * counters incremented whenever we start a scan, so they cannot disappear
+ * during operation.
+ * - Data consistency is achieved by the use of rcu protected pointers.
+ */
+static int mac802154_scan_cleanup_locked(struct ieee802154_local *local,
+ struct ieee802154_sub_if_data *sdata,
+ bool aborted)
+{
+ struct wpan_dev *wpan_dev = &sdata->wpan_dev;
+ struct wpan_phy *wpan_phy = local->phy;
+ struct cfg802154_scan_request *request;
+ u8 arg;
+
+ /* Prevent any further use of the scan request */
+ clear_bit(IEEE802154_IS_SCANNING, &local->ongoing);
+ cancel_delayed_work(&local->scan_work);
+ request = rcu_replace_pointer(local->scan_req, NULL, 1);
+ if (!request)
+ return 0;
+ kfree_rcu(request);
+
+ /* Advertize first, while we know the devices cannot be removed */
+ if (aborted)
+ arg = NL802154_SCAN_DONE_REASON_ABORTED;
+ else
+ arg = NL802154_SCAN_DONE_REASON_FINISHED;
+ nl802154_scan_done(wpan_phy, wpan_dev, arg);
+
+ /* Cleanup software stack */
+ ieee802154_mlme_op_post(local);
+
+ /* Set the hardware back in its original state */
+ drv_set_channel(local, wpan_phy->current_page,
+ wpan_phy->current_channel);
+ ieee802154_configure_durations(wpan_phy, wpan_phy->current_page,
+ wpan_phy->current_channel);
+ drv_stop(local);
+ synchronize_net();
+ sdata->required_filtering = sdata->iface_default_filtering;
+ drv_start(local, sdata->required_filtering, &local->addr_filt);
+
+ return 0;
+}
+
+int mac802154_abort_scan_locked(struct ieee802154_local *local,
+ struct ieee802154_sub_if_data *sdata)
+{
+ ASSERT_RTNL();
+
+ if (!mac802154_is_scanning(local))
+ return -ESRCH;
+
+ return mac802154_scan_cleanup_locked(local, sdata, true);
+}
+
+static unsigned int mac802154_scan_get_channel_time(u8 duration_order,
+ u8 symbol_duration)
+{
+ u64 base_super_frame_duration = (u64)symbol_duration *
+ IEEE802154_SUPERFRAME_PERIOD * IEEE802154_SLOT_PERIOD;
+
+ return usecs_to_jiffies(base_super_frame_duration *
+ (BIT(duration_order) + 1));
+}
+
+static void mac802154_flush_queued_beacons(struct ieee802154_local *local)
+{
+ struct cfg802154_mac_pkt *mac_pkt, *tmp;
+
+ list_for_each_entry_safe(mac_pkt, tmp, &local->rx_beacon_list, node) {
+ list_del(&mac_pkt->node);
+ kfree_skb(mac_pkt->skb);
+ kfree(mac_pkt);
+ }
+}
+
+static void
+mac802154_scan_get_next_channel(struct ieee802154_local *local,
+ struct cfg802154_scan_request *scan_req,
+ u8 *channel)
+{
+ (*channel)++;
+ *channel = find_next_bit((const unsigned long *)&scan_req->channels,
+ IEEE802154_MAX_CHANNEL + 1,
+ *channel);
+}
+
+static int mac802154_scan_find_next_chan(struct ieee802154_local *local,
+ struct cfg802154_scan_request *scan_req,
+ u8 page, u8 *channel)
+{
+ mac802154_scan_get_next_channel(local, scan_req, channel);
+ if (*channel > IEEE802154_MAX_CHANNEL)
+ return -EINVAL;
+
+ return 0;
+}
+
+void mac802154_scan_worker(struct work_struct *work)
+{
+ struct ieee802154_local *local =
+ container_of(work, struct ieee802154_local, scan_work.work);
+ struct cfg802154_scan_request *scan_req;
+ struct ieee802154_sub_if_data *sdata;
+ unsigned int scan_duration = 0;
+ struct wpan_phy* wpan_phy;
+ u8 scan_req_duration;
+ u8 page, channel;
+ int ret;
+
+ /* Ensure the device receiver is turned off when changing channels
+ * because there is no atomic way to change the channel and know on
+ * which one a beacon might have been received.
+ */
+ drv_stop(local);
+ synchronize_net();
+ mac802154_flush_queued_beacons(local);
+
+ rcu_read_lock();
+ scan_req = rcu_dereference(local->scan_req);
+ if (unlikely(!scan_req)) {
+ rcu_read_unlock();
+ return;
+ }
+
+ sdata = IEEE802154_WPAN_DEV_TO_SUB_IF(scan_req->wpan_dev);
+
+ /* Wait an arbitrary amount of time in case we cannot use the device */
+ if (local->suspended || !ieee802154_sdata_running(sdata)) {
+ rcu_read_unlock();
+ queue_delayed_work(local->mac_wq, &local->scan_work,
+ msecs_to_jiffies(1000));
+ return;
+ }
+
+ wpan_phy = scan_req->wpan_phy;
+ scan_req_duration = scan_req->duration;
+
+ /* Look for the next valid chan */
+ page = local->scan_page;
+ channel = local->scan_channel;
+ do {
+ ret = mac802154_scan_find_next_chan(local, scan_req, page, &channel);
+ if (ret) {
+ rcu_read_unlock();
+ goto end_scan;
+ }
+ } while (!ieee802154_chan_is_valid(scan_req->wpan_phy, page, channel));
+
+ rcu_read_unlock();
+
+ /* Bypass the stack on purpose when changing the channel */
+ rtnl_lock();
+ ret = drv_set_channel(local, page, channel);
+ rtnl_unlock();
+ if (ret) {
+ dev_err(&sdata->dev->dev,
+ "Channel change failure during scan, aborting (%d)\n", ret);
+ goto end_scan;
+ }
+
+ local->scan_page = page;
+ local->scan_channel = channel;
+
+ rtnl_lock();
+ ret = drv_start(local, IEEE802154_FILTERING_3_SCAN, &local->addr_filt);
+ rtnl_unlock();
+ if (ret) {
+ dev_err(&sdata->dev->dev,
+ "Restarting failure after channel change, aborting (%d)\n", ret);
+ goto end_scan;
+ }
+
+ ieee802154_configure_durations(wpan_phy, page, channel);
+ scan_duration = mac802154_scan_get_channel_time(scan_req_duration,
+ wpan_phy->symbol_duration);
+ dev_dbg(&sdata->dev->dev,
+ "Scan page %u channel %u for %ums\n",
+ page, channel, jiffies_to_msecs(scan_duration));
+ queue_delayed_work(local->mac_wq, &local->scan_work, scan_duration);
+ return;
+
+end_scan:
+ rtnl_lock();
+ mac802154_scan_cleanup_locked(local, sdata, false);
+ rtnl_unlock();
+}
+
+int mac802154_trigger_scan_locked(struct ieee802154_sub_if_data *sdata,
+ struct cfg802154_scan_request *request)
+{
+ struct ieee802154_local *local = sdata->local;
+
+ ASSERT_RTNL();
+
+ if (mac802154_is_scanning(local))
+ return -EBUSY;
+
+ /* TODO: support other scanning type */
+ if (request->type != NL802154_SCAN_PASSIVE)
+ return -EOPNOTSUPP;
+
+ /* Store scanning parameters */
+ rcu_assign_pointer(local->scan_req, request);
+
+ /* Software scanning requires to set promiscuous mode, so we need to
+ * pause the Tx queue during the entire operation.
+ */
+ ieee802154_mlme_op_pre(local);
+
+ sdata->required_filtering = IEEE802154_FILTERING_3_SCAN;
+ local->scan_page = request->page;
+ local->scan_channel = -1;
+ set_bit(IEEE802154_IS_SCANNING, &local->ongoing);
+
+ nl802154_scan_started(request->wpan_phy, request->wpan_dev);
+
+ queue_delayed_work(local->mac_wq, &local->scan_work, 0);
+
+ return 0;
+}
+
+int mac802154_process_beacon(struct ieee802154_local *local,
+ struct sk_buff *skb,
+ u8 page, u8 channel)
+{
+ struct ieee802154_beacon_hdr *bh = (void *)skb->data;
+ struct ieee802154_addr *src = &mac_cb(skb)->source;
+ struct cfg802154_scan_request *scan_req;
+ struct ieee802154_coord_desc desc;
+
+ if (skb->len != sizeof(*bh))
+ return -EINVAL;
+
+ if (unlikely(src->mode == IEEE802154_ADDR_NONE))
+ return -EINVAL;
+
+ dev_dbg(&skb->dev->dev,
+ "BEACON received on page %u channel %u\n",
+ page, channel);
+
+ memcpy(&desc.addr, src, sizeof(desc.addr));
+ desc.page = page;
+ desc.channel = channel;
+ desc.link_quality = mac_cb(skb)->lqi;
+ desc.superframe_spec = get_unaligned_le16(skb->data);
+ desc.gts_permit = bh->gts_permit;
+
+ trace_802154_scan_event(&desc);
+
+ rcu_read_lock();
+ scan_req = rcu_dereference(local->scan_req);
+ if (likely(scan_req))
+ nl802154_scan_event(scan_req->wpan_phy, scan_req->wpan_dev, &desc);
+ rcu_read_unlock();
+
+ return 0;
+}
--
2.34.1
^ permalink raw reply related
* [PATCH wpan-next v2 5/6] mac802154: Add MLME Tx locked helpers
From: Miquel Raynal @ 2022-12-17 0:02 UTC (permalink / raw)
To: Alexander Aring, Stefan Schmidt, linux-wpan
Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
Guilhem Imberton, Thomas Petazzoni, David S. Miller,
Jakub Kicinski, Paolo Abeni, Eric Dumazet, netdev, Miquel Raynal
In-Reply-To: <20221217000226.646767-1-miquel.raynal@bootlin.com>
These have the exact same behavior as before, except they expect the
rtnl to be already taken (and will complain otherwise). This allows
performing MLME transmissions from different contexts.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
net/mac802154/ieee802154_i.h | 6 ++++++
net/mac802154/tx.c | 42 +++++++++++++++++++++++++-----------
2 files changed, 35 insertions(+), 13 deletions(-)
diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
index 509e0172fe82..aeadee543a9c 100644
--- a/net/mac802154/ieee802154_i.h
+++ b/net/mac802154/ieee802154_i.h
@@ -141,10 +141,16 @@ int ieee802154_mlme_op_pre(struct ieee802154_local *local);
int ieee802154_mlme_tx(struct ieee802154_local *local,
struct ieee802154_sub_if_data *sdata,
struct sk_buff *skb);
+int ieee802154_mlme_tx_locked(struct ieee802154_local *local,
+ struct ieee802154_sub_if_data *sdata,
+ struct sk_buff *skb);
void ieee802154_mlme_op_post(struct ieee802154_local *local);
int ieee802154_mlme_tx_one(struct ieee802154_local *local,
struct ieee802154_sub_if_data *sdata,
struct sk_buff *skb);
+int ieee802154_mlme_tx_one_locked(struct ieee802154_local *local,
+ struct ieee802154_sub_if_data *sdata,
+ struct sk_buff *skb);
netdev_tx_t
ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
netdev_tx_t
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 9d8d43cf1e64..2a6f1ed763c9 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -137,34 +137,37 @@ int ieee802154_mlme_op_pre(struct ieee802154_local *local)
return ieee802154_sync_and_hold_queue(local);
}
-int ieee802154_mlme_tx(struct ieee802154_local *local,
- struct ieee802154_sub_if_data *sdata,
- struct sk_buff *skb)
+int ieee802154_mlme_tx_locked(struct ieee802154_local *local,
+ struct ieee802154_sub_if_data *sdata,
+ struct sk_buff *skb)
{
- int ret;
-
/* Avoid possible calls to ->ndo_stop() when we asynchronously perform
* MLME transmissions.
*/
- rtnl_lock();
+ ASSERT_RTNL();
/* Ensure the device was not stopped, otherwise error out */
- if (!local->open_count) {
- rtnl_unlock();
+ if (!local->open_count)
return -ENETDOWN;
- }
/* Warn if the ieee802154 core thinks MLME frames can be sent while the
* net interface expects this cannot happen.
*/
- if (WARN_ON_ONCE(!netif_running(sdata->dev))) {
- rtnl_unlock();
+ if (WARN_ON_ONCE(!netif_running(sdata->dev)))
return -ENETDOWN;
- }
ieee802154_tx(local, skb);
- ret = ieee802154_sync_queue(local);
+ return ieee802154_sync_queue(local);
+}
+int ieee802154_mlme_tx(struct ieee802154_local *local,
+ struct ieee802154_sub_if_data *sdata,
+ struct sk_buff *skb)
+{
+ int ret;
+
+ rtnl_lock();
+ ret = ieee802154_mlme_tx_locked(local, sdata, skb);
rtnl_unlock();
return ret;
@@ -188,6 +191,19 @@ int ieee802154_mlme_tx_one(struct ieee802154_local *local,
return ret;
}
+int ieee802154_mlme_tx_one_locked(struct ieee802154_local *local,
+ struct ieee802154_sub_if_data *sdata,
+ struct sk_buff *skb)
+{
+ int ret;
+
+ ieee802154_mlme_op_pre(local);
+ ret = ieee802154_mlme_tx_locked(local, sdata, skb);
+ ieee802154_mlme_op_post(local);
+
+ return ret;
+}
+
static bool ieee802154_queue_is_stopped(struct ieee802154_local *local)
{
return test_bit(WPAN_PHY_FLAG_STATE_QUEUE_STOPPED, &local->phy->flags);
--
2.34.1
^ permalink raw reply related
* [PATCH wpan-next v2 4/6] mac802154: Prepare forcing specific symbol duration
From: Miquel Raynal @ 2022-12-17 0:02 UTC (permalink / raw)
To: Alexander Aring, Stefan Schmidt, linux-wpan
Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
Guilhem Imberton, Thomas Petazzoni, David S. Miller,
Jakub Kicinski, Paolo Abeni, Eric Dumazet, netdev, Miquel Raynal
In-Reply-To: <20221217000226.646767-1-miquel.raynal@bootlin.com>
The scan logic will bypass the whole ->set_channel() logic from the top
by calling the driver hook to just switch between channels when
required.
We can no longer rely on the "current" page/channel settings to set the
right symbol duration. Let's add these as new parameters to allow
providing the page/channel couple that we want.
There is no functional change.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
include/net/cfg802154.h | 3 ++-
net/mac802154/cfg.c | 2 +-
net/mac802154/main.c | 20 +++++++++++---------
3 files changed, 14 insertions(+), 11 deletions(-)
diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h
index 11bedfa96371..5d4750d24f13 100644
--- a/include/net/cfg802154.h
+++ b/include/net/cfg802154.h
@@ -483,6 +483,7 @@ static inline const char *wpan_phy_name(struct wpan_phy *phy)
return dev_name(&phy->dev);
}
-void ieee802154_configure_durations(struct wpan_phy *phy);
+void ieee802154_configure_durations(struct wpan_phy *phy,
+ unsigned int page, unsigned int channel);
#endif /* __NET_CFG802154_H */
diff --git a/net/mac802154/cfg.c b/net/mac802154/cfg.c
index dc2d918fac68..469d6e8dd2dd 100644
--- a/net/mac802154/cfg.c
+++ b/net/mac802154/cfg.c
@@ -118,7 +118,7 @@ ieee802154_set_channel(struct wpan_phy *wpan_phy, u8 page, u8 channel)
if (!ret) {
wpan_phy->current_page = page;
wpan_phy->current_channel = channel;
- ieee802154_configure_durations(wpan_phy);
+ ieee802154_configure_durations(wpan_phy, page, channel);
}
return ret;
diff --git a/net/mac802154/main.c b/net/mac802154/main.c
index 3ed31daf7b9c..12a13a850fdf 100644
--- a/net/mac802154/main.c
+++ b/net/mac802154/main.c
@@ -113,32 +113,33 @@ ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops)
}
EXPORT_SYMBOL(ieee802154_alloc_hw);
-void ieee802154_configure_durations(struct wpan_phy *phy)
+void ieee802154_configure_durations(struct wpan_phy *phy,
+ unsigned int page, unsigned int channel)
{
u32 duration = 0;
- switch (phy->current_page) {
+ switch (page) {
case 0:
- if (BIT(phy->current_channel) & 0x1)
+ if (BIT(channel) & 0x1)
/* 868 MHz BPSK 802.15.4-2003: 20 ksym/s */
duration = 50 * NSEC_PER_USEC;
- else if (BIT(phy->current_channel) & 0x7FE)
+ else if (BIT(channel) & 0x7FE)
/* 915 MHz BPSK 802.15.4-2003: 40 ksym/s */
duration = 25 * NSEC_PER_USEC;
- else if (BIT(phy->current_channel) & 0x7FFF800)
+ else if (BIT(channel) & 0x7FFF800)
/* 2400 MHz O-QPSK 802.15.4-2006: 62.5 ksym/s */
duration = 16 * NSEC_PER_USEC;
break;
case 2:
- if (BIT(phy->current_channel) & 0x1)
+ if (BIT(channel) & 0x1)
/* 868 MHz O-QPSK 802.15.4-2006: 25 ksym/s */
duration = 40 * NSEC_PER_USEC;
- else if (BIT(phy->current_channel) & 0x7FE)
+ else if (BIT(channel) & 0x7FE)
/* 915 MHz O-QPSK 802.15.4-2006: 62.5 ksym/s */
duration = 16 * NSEC_PER_USEC;
break;
case 3:
- if (BIT(phy->current_channel) & 0x3FFF)
+ if (BIT(channel) & 0x3FFF)
/* 2.4 GHz CSS 802.15.4a-2007: 1/6 Msym/s */
duration = 6 * NSEC_PER_USEC;
break;
@@ -201,7 +202,8 @@ int ieee802154_register_hw(struct ieee802154_hw *hw)
ieee802154_setup_wpan_phy_pib(local->phy);
- ieee802154_configure_durations(local->phy);
+ ieee802154_configure_durations(local->phy, local->phy->current_page,
+ local->phy->current_channel);
if (!(hw->flags & IEEE802154_HW_CSMA_PARAMS)) {
local->phy->supported.min_csma_backoffs = 4;
--
2.34.1
^ permalink raw reply related
* [PATCH wpan-next v2 3/6] ieee802154: Introduce a helper to validate a channel
From: Miquel Raynal @ 2022-12-17 0:02 UTC (permalink / raw)
To: Alexander Aring, Stefan Schmidt, linux-wpan
Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
Guilhem Imberton, Thomas Petazzoni, David S. Miller,
Jakub Kicinski, Paolo Abeni, Eric Dumazet, netdev, Miquel Raynal
In-Reply-To: <20221217000226.646767-1-miquel.raynal@bootlin.com>
This helper for now only checks if the page member and channel member
are valid (in the specification range) and supported (by checking the
device capabilities). Soon two new parameters will be introduced and
having this helper will let us only modify its content rather than
modifying the logic everywhere else in the subsystem.
There is not functional change.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
include/net/cfg802154.h | 11 +++++++++++
net/ieee802154/nl802154.c | 3 +--
2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h
index 76d4f95e9974..11bedfa96371 100644
--- a/include/net/cfg802154.h
+++ b/include/net/cfg802154.h
@@ -246,6 +246,17 @@ static inline void wpan_phy_net_set(struct wpan_phy *wpan_phy, struct net *net)
write_pnet(&wpan_phy->_net, net);
}
+static inline bool ieee802154_chan_is_valid(struct wpan_phy *phy,
+ u8 page, u8 channel)
+{
+ if (page > IEEE802154_MAX_PAGE ||
+ channel > IEEE802154_MAX_CHANNEL ||
+ !(phy->supported.channels[page] & BIT(channel)))
+ return false;
+
+ return true;
+}
+
/**
* struct ieee802154_addr - IEEE802.15.4 device address
* @mode: Address mode from frame header. Can be one of:
diff --git a/net/ieee802154/nl802154.c b/net/ieee802154/nl802154.c
index 64c6c33b28a9..1d703251f74a 100644
--- a/net/ieee802154/nl802154.c
+++ b/net/ieee802154/nl802154.c
@@ -976,8 +976,7 @@ static int nl802154_set_channel(struct sk_buff *skb, struct genl_info *info)
channel = nla_get_u8(info->attrs[NL802154_ATTR_CHANNEL]);
/* check 802.15.4 constraints */
- if (page > IEEE802154_MAX_PAGE || channel > IEEE802154_MAX_CHANNEL ||
- !(rdev->wpan_phy.supported.channels[page] & BIT(channel)))
+ if (!ieee802154_chan_is_valid(&rdev->wpan_phy, page, channel))
return -EINVAL;
return rdev_set_channel(rdev, page, channel);
--
2.34.1
^ permalink raw reply related
* [PATCH wpan-next v2 2/6] ieee802154: Define a beacon frame header
From: Miquel Raynal @ 2022-12-17 0:02 UTC (permalink / raw)
To: Alexander Aring, Stefan Schmidt, linux-wpan
Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
Guilhem Imberton, Thomas Petazzoni, David S. Miller,
Jakub Kicinski, Paolo Abeni, Eric Dumazet, netdev, Miquel Raynal
In-Reply-To: <20221217000226.646767-1-miquel.raynal@bootlin.com>
This definition will be used when adding support for scanning and defines
the content of a beacon frame header as in the 802.15.4 specification.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
include/net/ieee802154_netdev.h | 36 +++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/include/net/ieee802154_netdev.h b/include/net/ieee802154_netdev.h
index 4c33a20ea57f..2f2196049a86 100644
--- a/include/net/ieee802154_netdev.h
+++ b/include/net/ieee802154_netdev.h
@@ -38,6 +38,42 @@
#include <net/cfg802154.h>
+struct ieee802154_beacon_hdr {
+#if defined(__LITTLE_ENDIAN_BITFIELD)
+ u16 beacon_order:4,
+ superframe_order:4,
+ final_cap_slot:4,
+ battery_life_ext:1,
+ reserved0:1,
+ pan_coordinator:1,
+ assoc_permit:1;
+ u8 gts_count:3,
+ gts_reserved:4,
+ gts_permit:1;
+ u8 pend_short_addr_count:3,
+ reserved1:1,
+ pend_ext_addr_count:3,
+ reserved2:1;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+ u16 assoc_permit:1,
+ pan_coordinator:1,
+ reserved0:1,
+ battery_life_ext:1,
+ final_cap_slot:4,
+ superframe_order:4,
+ beacon_order:4;
+ u8 gts_permit:1,
+ gts_reserved:4,
+ gts_count:3;
+ u8 reserved2:1,
+ pend_ext_addr_count:3,
+ reserved1:1,
+ pend_short_addr_count:3;
+#else
+#error "Please fix <asm/byteorder.h>"
+#endif
+} __packed;
+
struct ieee802154_sechdr {
#if defined(__LITTLE_ENDIAN_BITFIELD)
u8 level:3,
--
2.34.1
^ permalink raw reply related
* [PATCH wpan-next v2 1/6] ieee802154: Add support for user scanning requests
From: Miquel Raynal @ 2022-12-17 0:02 UTC (permalink / raw)
To: Alexander Aring, Stefan Schmidt, linux-wpan
Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
Guilhem Imberton, Thomas Petazzoni, David S. Miller,
Jakub Kicinski, Paolo Abeni, Eric Dumazet, netdev, Miquel Raynal
In-Reply-To: <20221217000226.646767-1-miquel.raynal@bootlin.com>
The ieee802154 layer should be able to scan a set of channels in order
to look for beacons advertizing PANs. Supporting this involves adding
two user commands: triggering scans and aborting scans. The user should
also be notified when a new beacon is received and also upon scan
termination.
A scan request structure is created to list the requirements and to be
accessed asynchronously when changing channels or receiving beacons.
Mac layers may now implement the ->trigger_scan() and ->abort_scan()
hooks.
Co-developed-by: David Girault <david.girault@qorvo.com>
Signed-off-by: David Girault <david.girault@qorvo.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
include/linux/ieee802154.h | 3 +
include/net/cfg802154.h | 25 +++++
include/net/nl802154.h | 58 ++++++++++
net/ieee802154/nl802154.c | 220 +++++++++++++++++++++++++++++++++++++
net/ieee802154/nl802154.h | 3 +
net/ieee802154/rdev-ops.h | 28 +++++
net/ieee802154/trace.h | 40 +++++++
7 files changed, 377 insertions(+)
diff --git a/include/linux/ieee802154.h b/include/linux/ieee802154.h
index 0303eb84d596..b22e4147d334 100644
--- a/include/linux/ieee802154.h
+++ b/include/linux/ieee802154.h
@@ -44,6 +44,9 @@
#define IEEE802154_SHORT_ADDR_LEN 2
#define IEEE802154_PAN_ID_LEN 2
+/* Duration in superframe order */
+#define IEEE802154_MAX_SCAN_DURATION 14
+#define IEEE802154_ACTIVE_SCAN_DURATION 15
#define IEEE802154_LIFS_PERIOD 40
#define IEEE802154_SIFS_PERIOD 12
#define IEEE802154_MAX_SIFS_FRAME_SIZE 18
diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h
index d09c393d229f..76d4f95e9974 100644
--- a/include/net/cfg802154.h
+++ b/include/net/cfg802154.h
@@ -18,6 +18,7 @@
struct wpan_phy;
struct wpan_phy_cca;
+struct cfg802154_scan_request;
#ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL
struct ieee802154_llsec_device_key;
@@ -67,6 +68,10 @@ struct cfg802154_ops {
struct wpan_dev *wpan_dev, bool mode);
int (*set_ackreq_default)(struct wpan_phy *wpan_phy,
struct wpan_dev *wpan_dev, bool ackreq);
+ int (*trigger_scan)(struct wpan_phy *wpan_phy,
+ struct cfg802154_scan_request *request);
+ int (*abort_scan)(struct wpan_phy *wpan_phy,
+ struct wpan_dev *wpan_dev);
#ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL
void (*get_llsec_table)(struct wpan_phy *wpan_phy,
struct wpan_dev *wpan_dev,
@@ -278,6 +283,26 @@ struct ieee802154_coord_desc {
bool gts_permit;
};
+/**
+ * struct cfg802154_scan_request - Scan request
+ *
+ * @type: type of scan to be performed
+ * @page: page on which to perform the scan
+ * @channels: channels in te %page to be scanned
+ * @duration: time spent on each channel, calculated with:
+ * aBaseSuperframeDuration * (2 ^ duration + 1)
+ * @wpan_dev: the wpan device on which to perform the scan
+ * @wpan_phy: the wpan phy on which to perform the scan
+ */
+struct cfg802154_scan_request {
+ enum nl802154_scan_types type;
+ u8 page;
+ u32 channels;
+ u8 duration;
+ struct wpan_dev *wpan_dev;
+ struct wpan_phy *wpan_phy;
+};
+
struct ieee802154_llsec_key_id {
u8 mode;
u8 id;
diff --git a/include/net/nl802154.h b/include/net/nl802154.h
index b79a89d5207c..c267fa1c5aac 100644
--- a/include/net/nl802154.h
+++ b/include/net/nl802154.h
@@ -73,6 +73,9 @@ enum nl802154_commands {
NL802154_CMD_DEL_SEC_LEVEL,
NL802154_CMD_SCAN_EVENT,
+ NL802154_CMD_TRIGGER_SCAN,
+ NL802154_CMD_ABORT_SCAN,
+ NL802154_CMD_SCAN_DONE,
/* add new commands above here */
@@ -134,6 +137,13 @@ enum nl802154_attrs {
NL802154_ATTR_NETNS_FD,
NL802154_ATTR_COORDINATOR,
+ NL802154_ATTR_SCAN_TYPE,
+ NL802154_ATTR_SCAN_FLAGS,
+ NL802154_ATTR_SCAN_CHANNELS,
+ NL802154_ATTR_SCAN_PREAMBLE_CODES,
+ NL802154_ATTR_SCAN_MEAN_PRF,
+ NL802154_ATTR_SCAN_DURATION,
+ NL802154_ATTR_SCAN_DONE_REASON,
/* add attributes here, update the policy in nl802154.c */
@@ -259,6 +269,54 @@ enum nl802154_coord {
NL802154_COORD_MAX,
};
+/**
+ * enum nl802154_scan_types - Scan types
+ *
+ * @__NL802154_SCAN_INVALID: scan type number 0 is reserved
+ * @NL802154_SCAN_ED: An ED scan allows a device to obtain a measure of the peak
+ * energy in each requested channel
+ * @NL802154_SCAN_ACTIVE: Locate any coordinator transmitting Beacon frames using
+ * a Beacon Request command
+ * @NL802154_SCAN_PASSIVE: Locate any coordinator transmitting Beacon frames
+ * @NL802154_SCAN_ORPHAN: Relocate coordinator following a loss of synchronisation
+ * @NL802154_SCAN_ENHANCED_ACTIVE: Same as Active using Enhanced Beacon Request
+ * command instead of Beacon Request command
+ * @NL802154_SCAN_RIT_PASSIVE: Passive scan for RIT Data Request command frames
+ * instead of Beacon frames
+ * @NL802154_SCAN_ATTR_MAX: Maximum SCAN attribute number
+ */
+enum nl802154_scan_types {
+ __NL802154_SCAN_INVALID,
+ NL802154_SCAN_ED,
+ NL802154_SCAN_ACTIVE,
+ NL802154_SCAN_PASSIVE,
+ NL802154_SCAN_ORPHAN,
+ NL802154_SCAN_ENHANCED_ACTIVE,
+ NL802154_SCAN_RIT_PASSIVE,
+
+ /* keep last */
+ NL802154_SCAN_ATTR_MAX,
+};
+
+/**
+ * enum nl802154_scan_done_reasons - End of scan reasons
+ *
+ * @__NL802154_SCAN_DONE_REASON_INVALID: scan done reason number 0 is reserved.
+ * @NL802154_SCAN_DONE_REASON_FINISHED: The scan just finished naturally after
+ * going through all the requested and possible (complex) channels.
+ * @NL802154_SCAN_DONE_REASON_ABORTED: The scan was aborted upon user request.
+ * a Beacon Request command
+ * @NL802154_SCAN_DONE_REASON_MAX: Maximum scan done reason attribute number.
+ */
+enum nl802154_scan_done_reasons {
+ __NL802154_SCAN_DONE_REASON_INVALID,
+ NL802154_SCAN_DONE_REASON_FINISHED,
+ NL802154_SCAN_DONE_REASON_ABORTED,
+
+ /* keep last */
+ NL802154_SCAN_DONE_REASON_MAX,
+};
+
/**
* enum nl802154_cca_modes - cca modes
*
diff --git a/net/ieee802154/nl802154.c b/net/ieee802154/nl802154.c
index 80dc73182785..64c6c33b28a9 100644
--- a/net/ieee802154/nl802154.c
+++ b/net/ieee802154/nl802154.c
@@ -221,6 +221,13 @@ static const struct nla_policy nl802154_policy[NL802154_ATTR_MAX+1] = {
[NL802154_ATTR_COORDINATOR] = { .type = NLA_NESTED },
+ [NL802154_ATTR_SCAN_TYPE] = { .type = NLA_U8 },
+ [NL802154_ATTR_SCAN_CHANNELS] = { .type = NLA_U32 },
+ [NL802154_ATTR_SCAN_PREAMBLE_CODES] = { .type = NLA_U64 },
+ [NL802154_ATTR_SCAN_MEAN_PRF] = { .type = NLA_U8 },
+ [NL802154_ATTR_SCAN_DURATION] = { .type = NLA_U8 },
+ [NL802154_ATTR_SCAN_DONE_REASON] = { .type = NLA_U8 },
+
#ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL
[NL802154_ATTR_SEC_ENABLED] = { .type = NLA_U8, },
[NL802154_ATTR_SEC_OUT_LEVEL] = { .type = NLA_U32, },
@@ -1384,6 +1391,203 @@ int nl802154_scan_event(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev,
}
EXPORT_SYMBOL_GPL(nl802154_scan_event);
+static int nl802154_trigger_scan(struct sk_buff *skb, struct genl_info *info)
+{
+ struct cfg802154_registered_device *rdev = info->user_ptr[0];
+ struct net_device *dev = info->user_ptr[1];
+ struct wpan_dev *wpan_dev = dev->ieee802154_ptr;
+ struct wpan_phy *wpan_phy = &rdev->wpan_phy;
+ struct cfg802154_scan_request *request;
+ u8 type;
+ int err;
+
+ /* Monitors are not allowed to perform scans */
+ if (wpan_dev->iftype == NL802154_IFTYPE_MONITOR)
+ return -EPERM;
+
+ request = kzalloc(sizeof(*request), GFP_KERNEL);
+ if (!request)
+ return -ENOMEM;
+
+ request->wpan_dev = wpan_dev;
+ request->wpan_phy = wpan_phy;
+
+ type = nla_get_u8(info->attrs[NL802154_ATTR_SCAN_TYPE]);
+ switch (type) {
+ case NL802154_SCAN_PASSIVE:
+ request->type = type;
+ break;
+ default:
+ pr_err("Unsupported scan type: %d\n", type);
+ err = -EINVAL;
+ goto free_request;
+ }
+
+ if (info->attrs[NL802154_ATTR_PAGE]) {
+ request->page = nla_get_u8(info->attrs[NL802154_ATTR_PAGE]);
+ if (request->page > IEEE802154_MAX_PAGE) {
+ pr_err("Invalid page %d > %d\n",
+ request->page, IEEE802154_MAX_PAGE);
+ err = -EINVAL;
+ goto free_request;
+ }
+ } else {
+ /* Use current page by default */
+ request->page = wpan_phy->current_page;
+ }
+
+ if (info->attrs[NL802154_ATTR_SCAN_CHANNELS]) {
+ request->channels = nla_get_u32(info->attrs[NL802154_ATTR_SCAN_CHANNELS]);
+ if (request->channels >= BIT(IEEE802154_MAX_CHANNEL + 1)) {
+ pr_err("Invalid channels bitfield %x ≥ %lx\n",
+ request->channels,
+ BIT(IEEE802154_MAX_CHANNEL + 1));
+ err = -EINVAL;
+ goto free_request;
+ }
+ } else {
+ /* Scan all supported channels by default */
+ request->channels = wpan_phy->supported.channels[request->page];
+ }
+
+ if (info->attrs[NL802154_ATTR_SCAN_PREAMBLE_CODES] ||
+ info->attrs[NL802154_ATTR_SCAN_MEAN_PRF]) {
+ pr_err("Preamble codes and mean PRF not supported yet\n");
+ err = -EINVAL;
+ goto free_request;
+ }
+
+ if (info->attrs[NL802154_ATTR_SCAN_DURATION]) {
+ request->duration = nla_get_u8(info->attrs[NL802154_ATTR_SCAN_DURATION]);
+ if (request->duration > IEEE802154_MAX_SCAN_DURATION) {
+ pr_err("Duration is out of range\n");
+ err = -EINVAL;
+ goto free_request;
+ }
+ } else {
+ /* Use maximum duration order by default */
+ request->duration = IEEE802154_MAX_SCAN_DURATION;
+ }
+
+ if (wpan_dev->netdev)
+ dev_hold(wpan_dev->netdev);
+
+ err = rdev_trigger_scan(rdev, request);
+ if (err) {
+ pr_err("Failure starting scanning (%d)\n", err);
+ goto free_device;
+ }
+
+ return 0;
+
+free_device:
+ if (wpan_dev->netdev)
+ dev_put(wpan_dev->netdev);
+free_request:
+ kfree(request);
+
+ return err;
+}
+
+static int nl802154_prep_scan_msg(struct sk_buff *msg,
+ struct cfg802154_registered_device *rdev,
+ struct wpan_dev *wpan_dev, u32 portid,
+ u32 seq, int flags, u8 cmd, u8 arg)
+{
+ void *hdr;
+
+ hdr = nl802154hdr_put(msg, portid, seq, flags, cmd);
+ if (!hdr)
+ return -ENOBUFS;
+
+ if (nla_put_u32(msg, NL802154_ATTR_WPAN_PHY, rdev->wpan_phy_idx))
+ goto nla_put_failure;
+
+ if (wpan_dev->netdev &&
+ nla_put_u32(msg, NL802154_ATTR_IFINDEX, wpan_dev->netdev->ifindex))
+ goto nla_put_failure;
+
+ if (nla_put_u64_64bit(msg, NL802154_ATTR_WPAN_DEV,
+ wpan_dev_id(wpan_dev), NL802154_ATTR_PAD))
+ goto nla_put_failure;
+
+ if (cmd == NL802154_CMD_SCAN_DONE &&
+ nla_put_u8(msg, NL802154_ATTR_SCAN_DONE_REASON, arg))
+ goto nla_put_failure;
+
+ genlmsg_end(msg, hdr);
+
+ return 0;
+
+nla_put_failure:
+ genlmsg_cancel(msg, hdr);
+
+ return -EMSGSIZE;
+}
+
+static int nl802154_send_scan_msg(struct cfg802154_registered_device *rdev,
+ struct wpan_dev *wpan_dev, u8 cmd, u8 arg)
+{
+ struct sk_buff *msg;
+ int ret;
+
+ msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!msg)
+ return -ENOMEM;
+
+ ret = nl802154_prep_scan_msg(msg, rdev, wpan_dev, 0, 0, 0, cmd, arg);
+ if (ret < 0) {
+ nlmsg_free(msg);
+ return ret;
+ }
+
+ return genlmsg_multicast_netns(&nl802154_fam,
+ wpan_phy_net(&rdev->wpan_phy), msg, 0,
+ NL802154_MCGRP_SCAN, GFP_KERNEL);
+}
+
+int nl802154_scan_started(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev)
+{
+ struct cfg802154_registered_device *rdev = wpan_phy_to_rdev(wpan_phy);
+ int err;
+
+ /* Ignore errors when there are no listeners */
+ err = nl802154_send_scan_msg(rdev, wpan_dev, NL802154_CMD_TRIGGER_SCAN, 0);
+ if (err == -ESRCH)
+ err = 0;
+
+ return err;
+}
+EXPORT_SYMBOL_GPL(nl802154_scan_started);
+
+int nl802154_scan_done(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev,
+ enum nl802154_scan_done_reasons reason)
+{
+ struct cfg802154_registered_device *rdev = wpan_phy_to_rdev(wpan_phy);
+ int err;
+
+ /* Ignore errors when there are no listeners */
+ err = nl802154_send_scan_msg(rdev, wpan_dev, NL802154_CMD_SCAN_DONE, reason);
+ if (err == -ESRCH)
+ err = 0;
+
+ if (wpan_dev->netdev)
+ dev_put(wpan_dev->netdev);
+
+ return err;
+}
+EXPORT_SYMBOL_GPL(nl802154_scan_done);
+
+static int nl802154_abort_scan(struct sk_buff *skb, struct genl_info *info)
+{
+ struct cfg802154_registered_device *rdev = info->user_ptr[0];
+ struct net_device *dev = info->user_ptr[1];
+ struct wpan_dev *wpan_dev = dev->ieee802154_ptr;
+
+ /* Resources are released in the notification helper above */
+ return rdev_abort_scan(rdev, wpan_dev);
+}
+
#ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL
static const struct nla_policy nl802154_dev_addr_policy[NL802154_DEV_ADDR_ATTR_MAX + 1] = {
[NL802154_DEV_ADDR_ATTR_PAN_ID] = { .type = NLA_U16 },
@@ -2472,6 +2676,22 @@ static const struct genl_ops nl802154_ops[] = {
.internal_flags = NL802154_FLAG_NEED_NETDEV |
NL802154_FLAG_NEED_RTNL,
},
+ {
+ .cmd = NL802154_CMD_TRIGGER_SCAN,
+ .doit = nl802154_trigger_scan,
+ .flags = GENL_ADMIN_PERM,
+ .internal_flags = NL802154_FLAG_NEED_NETDEV |
+ NL802154_FLAG_CHECK_NETDEV_UP |
+ NL802154_FLAG_NEED_RTNL,
+ },
+ {
+ .cmd = NL802154_CMD_ABORT_SCAN,
+ .doit = nl802154_abort_scan,
+ .flags = GENL_ADMIN_PERM,
+ .internal_flags = NL802154_FLAG_NEED_NETDEV |
+ NL802154_FLAG_CHECK_NETDEV_UP |
+ NL802154_FLAG_NEED_RTNL,
+ },
#ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL
{
.cmd = NL802154_CMD_SET_SEC_PARAMS,
diff --git a/net/ieee802154/nl802154.h b/net/ieee802154/nl802154.h
index 89b805500032..cfa7134be747 100644
--- a/net/ieee802154/nl802154.h
+++ b/net/ieee802154/nl802154.h
@@ -6,5 +6,8 @@ int nl802154_init(void);
void nl802154_exit(void);
int nl802154_scan_event(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev,
struct ieee802154_coord_desc *desc);
+int nl802154_scan_started(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev);
+int nl802154_scan_done(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev,
+ enum nl802154_scan_done_reasons reason);
#endif /* __IEEE802154_NL802154_H */
diff --git a/net/ieee802154/rdev-ops.h b/net/ieee802154/rdev-ops.h
index 598f5af49775..e171d74c3251 100644
--- a/net/ieee802154/rdev-ops.h
+++ b/net/ieee802154/rdev-ops.h
@@ -209,6 +209,34 @@ rdev_set_ackreq_default(struct cfg802154_registered_device *rdev,
return ret;
}
+static inline int rdev_trigger_scan(struct cfg802154_registered_device *rdev,
+ struct cfg802154_scan_request *request)
+{
+ int ret;
+
+ if (!rdev->ops->trigger_scan)
+ return -EOPNOTSUPP;
+
+ trace_802154_rdev_trigger_scan(&rdev->wpan_phy, request);
+ ret = rdev->ops->trigger_scan(&rdev->wpan_phy, request);
+ trace_802154_rdev_return_int(&rdev->wpan_phy, ret);
+ return ret;
+}
+
+static inline int rdev_abort_scan(struct cfg802154_registered_device *rdev,
+ struct wpan_dev *wpan_dev)
+{
+ int ret;
+
+ if (!rdev->ops->abort_scan)
+ return -EOPNOTSUPP;
+
+ trace_802154_rdev_abort_scan(&rdev->wpan_phy, wpan_dev);
+ ret = rdev->ops->abort_scan(&rdev->wpan_phy, wpan_dev);
+ trace_802154_rdev_return_int(&rdev->wpan_phy, ret);
+ return ret;
+}
+
#ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL
/* TODO this is already a nl802154, so move into ieee802154 */
static inline void
diff --git a/net/ieee802154/trace.h b/net/ieee802154/trace.h
index 19c2e5d60e76..e5405f737ded 100644
--- a/net/ieee802154/trace.h
+++ b/net/ieee802154/trace.h
@@ -295,6 +295,46 @@ TRACE_EVENT(802154_rdev_set_ackreq_default,
WPAN_DEV_PR_ARG, BOOL_TO_STR(__entry->ackreq))
);
+TRACE_EVENT(802154_rdev_trigger_scan,
+ TP_PROTO(struct wpan_phy *wpan_phy,
+ struct cfg802154_scan_request *request),
+ TP_ARGS(wpan_phy, request),
+ TP_STRUCT__entry(
+ WPAN_PHY_ENTRY
+ __field(u8, page)
+ __field(u32, channels)
+ __field(u8, duration)
+ ),
+ TP_fast_assign(
+ WPAN_PHY_ASSIGN;
+ __entry->page = request->page;
+ __entry->channels = request->channels;
+ __entry->duration = request->duration;
+ ),
+ TP_printk(WPAN_PHY_PR_FMT ", scan, page: %d, channels: %x, duration %d",
+ WPAN_PHY_PR_ARG, __entry->page, __entry->channels, __entry->duration)
+);
+
+DECLARE_EVENT_CLASS(802154_wdev_template,
+ TP_PROTO(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev),
+ TP_ARGS(wpan_phy, wpan_dev),
+ TP_STRUCT__entry(
+ WPAN_PHY_ENTRY
+ WPAN_DEV_ENTRY
+ ),
+ TP_fast_assign(
+ WPAN_PHY_ASSIGN;
+ WPAN_DEV_ASSIGN;
+ ),
+ TP_printk(WPAN_PHY_PR_FMT ", " WPAN_DEV_PR_FMT,
+ WPAN_PHY_PR_ARG, WPAN_DEV_PR_ARG)
+);
+
+DEFINE_EVENT(802154_wdev_template, 802154_rdev_abort_scan,
+ TP_PROTO(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev),
+ TP_ARGS(wpan_phy, wpan_dev)
+);
+
TRACE_EVENT(802154_rdev_return_int,
TP_PROTO(struct wpan_phy *wpan_phy, int ret),
TP_ARGS(wpan_phy, ret),
--
2.34.1
^ permalink raw reply related
* [PATCH wpan-next v2 0/6] IEEE 802.15.4 passive scan support
From: Miquel Raynal @ 2022-12-17 0:02 UTC (permalink / raw)
To: Alexander Aring, Stefan Schmidt, linux-wpan
Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
Guilhem Imberton, Thomas Petazzoni, David S. Miller,
Jakub Kicinski, Paolo Abeni, Eric Dumazet, netdev, Miquel Raynal
Hello,
We now have the infrastructure to report beacons/PANs, we also have the
capability to transmit MLME commands synchronously. It is time to use
these to implement a proper scan implementation.
There are a few side-changes which are necessary for the soft MAC scan
implementation to compile/work, but nothing big. The two main changes
are:
* The introduction of a user API for managing scans.
* The soft MAC implementation of a scan.
In all the past, current and future submissions, David and Romuald from
Qorvo are credited in various ways (main author, co-author,
suggested-by) depending of the amount of rework that was involved on
each patch, reflecting as much as possible the open-source guidelines we
follow in the kernel. All this effort is made possible thanks to Qorvo
Inc which is pushing towards a featureful upstream WPAN support.
Example of output:
# iwpan monitor
coord1 (phy #1): scan started
coord1 (phy #1): beacon received: PAN 0xabcd, addr 0xb2bcc36ac5570abe
coord1 (phy #1): scan finished
coord1 (phy #1): scan started
coord1 (phy #1): scan aborted
Cheers,
Miquèl
Changes in v2:
* Different way to forward the reason why a scan was terminated, instead
of providing two different "commands" we provide the same "scan done"
command and aside an attribute, saying whether the scan was aborted by
the user or terminated by itself at the end of the required list of
(complex) channels to scan.
Miquel Raynal (6):
ieee802154: Add support for user scanning requests
ieee802154: Define a beacon frame header
ieee802154: Introduce a helper to validate a channel
mac802154: Prepare forcing specific symbol duration
mac802154: Add MLME Tx locked helpers
mac802154: Handle passive scanning
include/linux/ieee802154.h | 7 +
include/net/cfg802154.h | 55 +++++-
include/net/ieee802154_netdev.h | 36 ++++
include/net/nl802154.h | 58 +++++++
net/ieee802154/nl802154.c | 223 +++++++++++++++++++++++-
net/ieee802154/nl802154.h | 3 +
net/ieee802154/rdev-ops.h | 28 ++++
net/ieee802154/trace.h | 40 +++++
net/mac802154/Makefile | 2 +-
net/mac802154/cfg.c | 33 +++-
net/mac802154/ieee802154_i.h | 43 ++++-
net/mac802154/iface.c | 3 +
net/mac802154/main.c | 36 ++--
net/mac802154/rx.c | 36 +++-
net/mac802154/scan.c | 289 ++++++++++++++++++++++++++++++++
net/mac802154/tx.c | 42 +++--
16 files changed, 902 insertions(+), 32 deletions(-)
create mode 100644 net/mac802154/scan.c
--
2.34.1
^ permalink raw reply
* [PATCH wpan] mac802154: Fix possible double free upon parsing error
From: Miquel Raynal @ 2022-12-16 23:57 UTC (permalink / raw)
To: Alexander Aring, Stefan Schmidt, linux-wpan
Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
Guilhem Imberton, Thomas Petazzoni, David S. Miller,
Jakub Kicinski, Paolo Abeni, Eric Dumazet, netdev, Miquel Raynal,
Dan Carpenter
Commit 4d1c7d87030b ("mac802154: Move an skb free within the rx path")
tried to simplify error handling within the receive path by moving the
kfree_skb() call at the very end of the top-level function but missed
one kfree_skb() called upon frame parsing error. Prevent this possible
double free from happening.
Fixes: 4d1c7d87030b ("mac802154: Move an skb free within the rx path")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
net/mac802154/rx.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/net/mac802154/rx.c b/net/mac802154/rx.c
index c2aae2a6d6a6..97bb4401dd3e 100644
--- a/net/mac802154/rx.c
+++ b/net/mac802154/rx.c
@@ -213,7 +213,6 @@ __ieee802154_rx_handle_packet(struct ieee802154_local *local,
ret = ieee802154_parse_frame_start(skb, &hdr);
if (ret) {
pr_debug("got invalid frame\n");
- kfree_skb(skb);
return;
}
--
2.34.1
^ permalink raw reply related
* Re: [PATCH v6] vmxnet3: Add XDP support.
From: William Tu @ 2022-12-16 23:55 UTC (permalink / raw)
To: Alexander H Duyck; +Cc: netdev, tuc, gyang, doshir
In-Reply-To: <CALDO+SaKd74n_PDB+kN0KQWt_Zh9NjzPAm-kWfie4Vd+jOeCZw@mail.gmail.com>
On Fri, Dec 16, 2022 at 2:53 PM William Tu <u9012063@gmail.com> wrote:
>
> Hi Alexander,
> Thanks for taking a look at this patch!
>
> On Fri, Dec 16, 2022 at 9:14 AM Alexander H Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > On Fri, 2022-12-16 at 06:41 -0800, William Tu wrote:
> > > The patch adds native-mode XDP support: XDP DROP, PASS, TX, and REDIRECT.
> > >
> > > Background:
> > > The vmxnet3 rx consists of three rings: ring0, ring1, and dataring.
> > > For r0 and r1, buffers at r0 are allocated using alloc_skb APIs and dma
> > > mapped to the ring's descriptor. If LRO is enabled and packet size larger
> > > than 3K, VMXNET3_MAX_SKB_BUF_SIZE, then r1 is used to mapped the rest of
> > > the buffer larger than VMXNET3_MAX_SKB_BUF_SIZE. Each buffer in r1 is
> > > allocated using alloc_page. So for LRO packets, the payload will be in one
> > > buffer from r0 and multiple from r1, for non-LRO packets, only one
> > > descriptor in r0 is used for packet size less than 3k.
> > >
> > > When receiving a packet, the first descriptor will have the sop (start of
> > > packet) bit set, and the last descriptor will have the eop (end of packet)
> > > bit set. Non-LRO packets will have only one descriptor with both sop and
> > > eop set.
> > >
> > > Other than r0 and r1, vmxnet3 dataring is specifically designed for
> > > handling packets with small size, usually 128 bytes, defined in
> > > VMXNET3_DEF_RXDATA_DESC_SIZE, by simply copying the packet from the backend
> > > driver in ESXi to the ring's memory region at front-end vmxnet3 driver, in
> > > order to avoid memory mapping/unmapping overhead. In summary, packet size:
> > > A. < 128B: use dataring
> > > B. 128B - 3K: use ring0 (VMXNET3_RX_BUF_SKB)
> > > C. > 3K: use ring0 and ring1 (VMXNET3_RX_BUF_SKB + VMXNET3_RX_BUF_PAGE)
> > > As a result, the patch adds XDP support for packets using dataring
> > > and r0 (case A and B), not the large packet size when LRO is enabled.
> > >
> > > XDP Implementation:
> > > When user loads and XDP prog, vmxnet3 driver checks configurations, such
> > > as mtu, lro, and re-allocate the rx buffer size for reserving the extra
> > > headroom, XDP_PACKET_HEADROOM, for XDP frame. The XDP prog will then be
> > > associated with every rx queue of the device. Note that when using dataring
> > > for small packet size, vmxnet3 (front-end driver) doesn't control the
> > > buffer allocation, as a result the XDP frame's headroom is zero.
> > >
> > > The receive side of XDP is implemented for case A and B, by invoking the
> > > bpf program at vmxnet3_rq_rx_complete and handle its returned action.
> > > The new vmxnet3_run_xdp function handles the difference of using dataring
> > > or ring0, and decides the next journey of the packet afterward.
> > >
> > > For TX, vmxnet3 has split header design. Outgoing packets are parsed
> > > first and protocol headers (L2/L3/L4) are copied to the backend. The
> > > rest of the payload are dma mapped. Since XDP_TX does not parse the
> > > packet protocol, the entire XDP frame is using dma mapped for the
> > > transmission. Because the actual TX is deferred until txThreshold, it's
> > > possible that an rx buffer is being overwritten by incoming burst of rx
> > > packets, before tx buffer being transmitted. As a result, we allocate new
> > > skb and introduce a copy.
> > >
> > > Performance:
> > > Tested using two VMs inside one ESXi machine, using single core on each
> > > vmxnet3 device, sender using DPDK testpmd tx-mode attached to vmxnet3
> > > device, sending 64B or 512B packet.
> > >
> > > VM1 txgen:
> > > $ dpdk-testpmd -l 0-3 -n 1 -- -i --nb-cores=3 \
> > > --forward-mode=txonly --eth-peer=0,<mac addr of vm2>
> > > option: add "--txonly-multi-flow"
> > > option: use --txpkts=512 or 64 byte
> > >
> > > VM2 running XDP:
> > > $ ./samples/bpf/xdp_rxq_info -d ens160 -a <options> --skb-mode
> > > $ ./samples/bpf/xdp_rxq_info -d ens160 -a <options>
> > > options: XDP_DROP, XDP_PASS, XDP_TX
> > >
> > > To test REDIRECT to cpu 0, use
> > > $ ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e drop
> > >
> > > Single core performance comparison with skb-mode.
> > > 64B: skb-mode -> native-mode (with this patch)
> > > XDP_DROP: 932Kpps -> 2.0Mpps
> > > XDP_PASS: 284Kpps -> 314Kpps
> > > XDP_TX: 591Kpps -> 1.8Mpps
> > > REDIRECT: 453Kpps -> 501Kpps
> > >
> > > 512B: skb-mode -> native-mode (with this patch)
> > > XDP_DROP: 890Kpps -> 1.3Mpps
> > > XDP_PASS: 284Kpps -> 314Kpps
> > > XDP_TX: 555Kpps -> 1.2Mpps
> > > REDIRECT: 670Kpps -> 430Kpps
> > >
> >
> > I hadn't noticed it before. Based on this it looks like native mode is
> > performing worse then skb-mode for redirect w/ 512B packets? Have you
> > looked into why that might be?
>
> yes, I noticed it but don't know why, maybe it's due to extra copy and page
> allocation like you said below. I will dig deeper.
>
> >
> > My main concern would be that you are optimizing for recyling in the Tx
> > and Redirect paths, when you might be better off just releasing the
> > buffers and batch allocating new pages in your Rx path.
>
> right, are you talking about using the page pool allocator, ex: slide 8 below
> https://legacy.netdevconf.info/0x14/pub/slides/10/add-xdp-on-driver.pdf
> I tried it before but then I found I have to replace lots of existing vmxnet3
> code, basically replacing all the rx/tx buffer allocation code with new
> page pool api, even without XDP. I'd love to give it a try, do you think it's
> worth doing it?
>
> >
> > > Limitations:
> > > a. LRO will be disabled when users load XDP program
> > > b. MTU will be checked and limit to
> > > VMXNET3_MAX_SKB_BUF_SIZE(3K) - XDP_PACKET_HEADROOM(256) -
> > > SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
> > >
> > > Signed-off-by: William Tu <u9012063@gmail.com>
> > > ---
> > > v5 -> v6:
> > > work on feedbacks from Alexander Duyck
> > > - remove unused skb parameter at vmxnet3_xdp_xmit
> > > - add trace point for XDP_ABORTED
> > > - avoid TX packet buffer being overwritten by allocatin
> > > new skb and memcpy (TX performance drop from 2.3 to 1.8Mpps)
> > > - update some of the performance number and a demo video
> > > https://youtu.be/I3nx5wQDTXw
> > > - pass VMware internal regression test using non-ENS: vmxnet3v2,
> > > vmxnet3v3, vmxnet3v4, so remove RFC tag.
> > >
> > > v4 -> v5:
> > > - move XDP code to separate file: vmxnet3_xdp.{c, h},
> > > suggested by Guolin
> > > - expose vmxnet3_rq_create_all and vmxnet3_adjust_rx_ring_size
> > > - more test using samples/bpf/{xdp1, xdp2, xdp_adjust_tail}
> > > - add debug print
> > > - rebase on commit 65e6af6cebe
> > >
> > > v3 -> v4:
> > > - code refactoring and improved commit message
> > > - make dataring and non-dataring case clear
> > > - in XDP_PASS, handle xdp.data and xdp.data_end change after
> > > bpf program executed
> > > - now still working on internal testing
> > > - v4 applied on net-next commit 65e6af6cebef
> > >
> > > v2 -> v3:
> > > - code refactoring: move the XDP processing to the front
> > > of the vmxnet3_rq_rx_complete, and minimize the places
> > > of changes to existing code.
> > > - Performance improvement over BUF_SKB (512B) due to skipping
> > > skb allocation when DROP and TX.
> > >
> > > v1 -> v2:
> > > - Avoid skb allocation for small packet size (when dataring is used)
> > > - Use rcu_read_lock unlock instead of READ_ONCE
> > > - Peroformance improvement over v1
> > > - Merge xdp drop, tx, pass, and redirect into 1 patch
> > >
> > > I tested the patch using below script:
> > > while [ true ]; do
> > > timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_DROP --skb-mode
> > > timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_DROP
> > > timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_PASS --skb-mode
> > > timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_PASS
> > > timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_TX --skb-mode
> > > timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_TX
> > > timeout 20 ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e drop
> > > timeout 20 ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e pass
> > > done
> > > ---
> > > drivers/net/vmxnet3/Makefile | 2 +-
> > > drivers/net/vmxnet3/vmxnet3_drv.c | 48 ++-
> > > drivers/net/vmxnet3/vmxnet3_ethtool.c | 14 +
> > > drivers/net/vmxnet3/vmxnet3_int.h | 20 ++
> > > drivers/net/vmxnet3/vmxnet3_xdp.c | 458 ++++++++++++++++++++++++++
> > > drivers/net/vmxnet3/vmxnet3_xdp.h | 39 +++
> > > 6 files changed, 573 insertions(+), 8 deletions(-)
> > > create mode 100644 drivers/net/vmxnet3/vmxnet3_xdp.c
> > > create mode 100644 drivers/net/vmxnet3/vmxnet3_xdp.h
> > >
> > > diff --git a/drivers/net/vmxnet3/Makefile b/drivers/net/vmxnet3/Makefile
> > > index a666a88ac1ff..f82870c10205 100644
> > > --- a/drivers/net/vmxnet3/Makefile
> > > +++ b/drivers/net/vmxnet3/Makefile
> > > @@ -32,4 +32,4 @@
> > >
> > > obj-$(CONFIG_VMXNET3) += vmxnet3.o
> > >
> > > -vmxnet3-objs := vmxnet3_drv.o vmxnet3_ethtool.o
> > > +vmxnet3-objs := vmxnet3_drv.o vmxnet3_ethtool.o vmxnet3_xdp.o
> > > diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
> > > index d3e7b27eb933..b55fec2ac2bf 100644
> > > --- a/drivers/net/vmxnet3/vmxnet3_drv.c
> > > +++ b/drivers/net/vmxnet3/vmxnet3_drv.c
> > > @@ -28,6 +28,7 @@
> > > #include <net/ip6_checksum.h>
> > >
> > > #include "vmxnet3_int.h"
> > > +#include "vmxnet3_xdp.h"
> > >
> > > char vmxnet3_driver_name[] = "vmxnet3";
> > > #define VMXNET3_DRIVER_DESC "VMware vmxnet3 virtual NIC driver"
> > > @@ -351,7 +352,6 @@ vmxnet3_unmap_pkt(u32 eop_idx, struct vmxnet3_tx_queue *tq,
> > > BUG_ON(VMXNET3_TXDESC_GET_EOP(&(tq->tx_ring.base[eop_idx].txd)) != 1);
> > >
> > > skb = tq->buf_info[eop_idx].skb;
> > > - BUG_ON(skb == NULL);
> > > tq->buf_info[eop_idx].skb = NULL;
> > >
> > > VMXNET3_INC_RING_IDX_ONLY(eop_idx, tq->tx_ring.size);
> > > @@ -592,6 +592,9 @@ vmxnet3_rq_alloc_rx_buf(struct vmxnet3_rx_queue *rq, u32 ring_idx,
> > > rbi->skb = __netdev_alloc_skb_ip_align(adapter->netdev,
> > > rbi->len,
> > > GFP_KERNEL);
> > > + if (adapter->xdp_enabled)
> > > + skb_reserve(rbi->skb, XDP_PACKET_HEADROOM);
> > > +
> > > if (unlikely(rbi->skb == NULL)) {
> > > rq->stats.rx_buf_alloc_failure++;
> > > break;
> > > @@ -1404,6 +1407,8 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
> > > struct Vmxnet3_RxDesc rxCmdDesc;
> > > struct Vmxnet3_RxCompDesc rxComp;
> > > #endif
> > > + bool need_flush = 0;
> > > +
> > > vmxnet3_getRxComp(rcd, &rq->comp_ring.base[rq->comp_ring.next2proc].rcd,
> > > &rxComp);
> > > while (rcd->gen == rq->comp_ring.gen) {
> > > @@ -1444,6 +1449,19 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
> > > goto rcd_done;
> > > }
> > >
> > > + if (unlikely(rcd->sop && rcd->eop && adapter->xdp_enabled)) {
> > > + int act = vmxnet3_process_xdp(adapter, rq, rcd, rbi,
> > > + rxd, &need_flush);
> > > + ctx->skb = NULL;
> > > + switch (act) {
> > > + case VMXNET3_XDP_TAKEN:
> > > + goto rcd_done;
> > > + case VMXNET3_XDP_CONTINUE:
> > > + default:
> > > + break;
> > > + }
> > > + }
> > > +
> > > if (rcd->sop) { /* first buf of the pkt */
> > > bool rxDataRingUsed;
> > > u16 len;
> > > @@ -1483,6 +1501,10 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
> > > goto rcd_done;
> > > }
> > >
> > > + if (adapter->xdp_enabled && !rxDataRingUsed)
> > > + skb_reserve(new_skb,
> > > + XDP_PACKET_HEADROOM);
> > > +
> > > if (rxDataRingUsed) {
> > > size_t sz;
> > >
> > > @@ -1730,6 +1752,8 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
> > > vmxnet3_getRxComp(rcd,
> > > &rq->comp_ring.base[rq->comp_ring.next2proc].rcd, &rxComp);
> > > }
> > > + if (need_flush)
> > > + xdp_do_flush_map();
> > >
> > > return num_pkts;
> > > }
> > > @@ -1776,6 +1800,7 @@ vmxnet3_rq_cleanup(struct vmxnet3_rx_queue *rq,
> > >
> > > rq->comp_ring.gen = VMXNET3_INIT_GEN;
> > > rq->comp_ring.next2proc = 0;
> > > + rq->xdp_bpf_prog = NULL;
> > > }
> > >
> > >
> > > @@ -1788,7 +1813,6 @@ vmxnet3_rq_cleanup_all(struct vmxnet3_adapter *adapter)
> > > vmxnet3_rq_cleanup(&adapter->rx_queue[i], adapter);
> > > }
> > >
> > > -
> > > static void vmxnet3_rq_destroy(struct vmxnet3_rx_queue *rq,
> > > struct vmxnet3_adapter *adapter)
> > > {
> > > @@ -1832,6 +1856,8 @@ static void vmxnet3_rq_destroy(struct vmxnet3_rx_queue *rq,
> > > kfree(rq->buf_info[0]);
> > > rq->buf_info[0] = NULL;
> > > rq->buf_info[1] = NULL;
> > > +
> > > + vmxnet3_unregister_xdp_rxq(rq);
> > > }
> > >
> > > static void
> > > @@ -1893,6 +1919,10 @@ vmxnet3_rq_init(struct vmxnet3_rx_queue *rq,
> > > }
> > > vmxnet3_rq_alloc_rx_buf(rq, 1, rq->rx_ring[1].size - 1, adapter);
> > >
> > > + /* always register, even if no XDP prog used */
> > > + if (vmxnet3_register_xdp_rxq(rq, adapter))
> > > + return -EINVAL;
> > > +
> > > /* reset the comp ring */
> > > rq->comp_ring.next2proc = 0;
> > > memset(rq->comp_ring.base, 0, rq->comp_ring.size *
> > > @@ -1989,7 +2019,7 @@ vmxnet3_rq_create(struct vmxnet3_rx_queue *rq, struct vmxnet3_adapter *adapter)
> > > }
> > >
> > >
> > > -static int
> > > +int
> > > vmxnet3_rq_create_all(struct vmxnet3_adapter *adapter)
> > > {
> > > int i, err = 0;
> > > @@ -2585,7 +2615,8 @@ vmxnet3_setup_driver_shared(struct vmxnet3_adapter *adapter)
> > > if (adapter->netdev->features & NETIF_F_RXCSUM)
> > > devRead->misc.uptFeatures |= UPT1_F_RXCSUM;
> > >
> > > - if (adapter->netdev->features & NETIF_F_LRO) {
> > > + if (adapter->netdev->features & NETIF_F_LRO &&
> > > + !adapter->xdp_enabled) {
> > > devRead->misc.uptFeatures |= UPT1_F_LRO;
> > > devRead->misc.maxNumRxSG = cpu_to_le16(1 + MAX_SKB_FRAGS);
> > > }
> > > @@ -3026,7 +3057,7 @@ vmxnet3_free_pci_resources(struct vmxnet3_adapter *adapter)
> > > }
> > >
> > >
> > > -static void
> > > +void
> > > vmxnet3_adjust_rx_ring_size(struct vmxnet3_adapter *adapter)
> > > {
> > > size_t sz, i, ring0_size, ring1_size, comp_size;
> > > @@ -3035,7 +3066,8 @@ vmxnet3_adjust_rx_ring_size(struct vmxnet3_adapter *adapter)
> > > if (adapter->netdev->mtu <= VMXNET3_MAX_SKB_BUF_SIZE -
> > > VMXNET3_MAX_ETH_HDR_SIZE) {
> > > adapter->skb_buf_size = adapter->netdev->mtu +
> > > - VMXNET3_MAX_ETH_HDR_SIZE;
> > > + VMXNET3_MAX_ETH_HDR_SIZE +
> > > + vmxnet3_xdp_headroom(adapter);
> > > if (adapter->skb_buf_size < VMXNET3_MIN_T0_BUF_SIZE)
> > > adapter->skb_buf_size = VMXNET3_MIN_T0_BUF_SIZE;
> > >
> > > @@ -3563,7 +3595,6 @@ vmxnet3_reset_work(struct work_struct *data)
> > > clear_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state);
> > > }
> > >
> > > -
> > > static int
> > > vmxnet3_probe_device(struct pci_dev *pdev,
> > > const struct pci_device_id *id)
> > > @@ -3585,6 +3616,8 @@ vmxnet3_probe_device(struct pci_dev *pdev,
> > > #ifdef CONFIG_NET_POLL_CONTROLLER
> > > .ndo_poll_controller = vmxnet3_netpoll,
> > > #endif
> > > + .ndo_bpf = vmxnet3_xdp,
> > > + .ndo_xdp_xmit = vmxnet3_xdp_xmit,
> > > };
> > > int err;
> > > u32 ver;
> > > @@ -3900,6 +3933,7 @@ vmxnet3_probe_device(struct pci_dev *pdev,
> > > goto err_register;
> > > }
> > >
> > > + adapter->xdp_enabled = false;
> > > vmxnet3_check_link(adapter, false);
> > > return 0;
> > >
> > > diff --git a/drivers/net/vmxnet3/vmxnet3_ethtool.c b/drivers/net/vmxnet3/vmxnet3_ethtool.c
> > > index 18cf7c723201..6f542236b26e 100644
> > > --- a/drivers/net/vmxnet3/vmxnet3_ethtool.c
> > > +++ b/drivers/net/vmxnet3/vmxnet3_ethtool.c
> > > @@ -76,6 +76,10 @@ vmxnet3_tq_driver_stats[] = {
> > > copy_skb_header) },
> > > { " giant hdr", offsetof(struct vmxnet3_tq_driver_stats,
> > > oversized_hdr) },
> > > + { " xdp xmit", offsetof(struct vmxnet3_tq_driver_stats,
> > > + xdp_xmit) },
> > > + { " xdp xmit err", offsetof(struct vmxnet3_tq_driver_stats,
> > > + xdp_xmit_err) },
> > > };
> > >
> > > /* per rq stats maintained by the device */
> > > @@ -106,6 +110,16 @@ vmxnet3_rq_driver_stats[] = {
> > > drop_fcs) },
> > > { " rx buf alloc fail", offsetof(struct vmxnet3_rq_driver_stats,
> > > rx_buf_alloc_failure) },
> > > + { " xdp packets", offsetof(struct vmxnet3_rq_driver_stats,
> > > + xdp_packets) },
> > > + { " xdp tx", offsetof(struct vmxnet3_rq_driver_stats,
> > > + xdp_tx) },
> > > + { " xdp redirects", offsetof(struct vmxnet3_rq_driver_stats,
> > > + xdp_redirects) },
> > > + { " xdp drops", offsetof(struct vmxnet3_rq_driver_stats,
> > > + xdp_drops) },
> > > + { " xdp aborted", offsetof(struct vmxnet3_rq_driver_stats,
> > > + xdp_aborted) },
> > > };
> > >
> > > /* global stats maintained by the driver */
> > > diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
> > > index 3367db23aa13..5cf4033930d8 100644
> > > --- a/drivers/net/vmxnet3/vmxnet3_int.h
> > > +++ b/drivers/net/vmxnet3/vmxnet3_int.h
> > > @@ -56,6 +56,8 @@
> > > #include <linux/if_arp.h>
> > > #include <linux/inetdevice.h>
> > > #include <linux/log2.h>
> > > +#include <linux/bpf.h>
> > > +#include <linux/skbuff.h>
> > >
> > > #include "vmxnet3_defs.h"
> > >
> > > @@ -217,6 +219,9 @@ struct vmxnet3_tq_driver_stats {
> > > u64 linearized; /* # of pkts linearized */
> > > u64 copy_skb_header; /* # of times we have to copy skb header */
> > > u64 oversized_hdr;
> > > +
> > > + u64 xdp_xmit;
> > > + u64 xdp_xmit_err;
> > > };
> > >
> > > struct vmxnet3_tx_ctx {
> > > @@ -285,6 +290,12 @@ struct vmxnet3_rq_driver_stats {
> > > u64 drop_err;
> > > u64 drop_fcs;
> > > u64 rx_buf_alloc_failure;
> > > +
> > > + u64 xdp_packets; /* Total packets processed by XDP. */
> > > + u64 xdp_tx;
> > > + u64 xdp_redirects;
> > > + u64 xdp_drops;
> > > + u64 xdp_aborted;
> > > };
> > >
> > > struct vmxnet3_rx_data_ring {
> > > @@ -307,6 +318,8 @@ struct vmxnet3_rx_queue {
> > > struct vmxnet3_rx_buf_info *buf_info[2];
> > > struct Vmxnet3_RxQueueCtrl *shared;
> > > struct vmxnet3_rq_driver_stats stats;
> > > + struct bpf_prog __rcu *xdp_bpf_prog;
> > > + struct xdp_rxq_info xdp_rxq;
> > > } __attribute__((__aligned__(SMP_CACHE_BYTES)));
> > >
> > > #define VMXNET3_DEVICE_MAX_TX_QUEUES 32
> > > @@ -415,6 +428,7 @@ struct vmxnet3_adapter {
> > > u16 tx_prod_offset;
> > > u16 rx_prod_offset;
> > > u16 rx_prod2_offset;
> > > + bool xdp_enabled;
> > > };
> > >
> > > #define VMXNET3_WRITE_BAR0_REG(adapter, reg, val) \
> > > @@ -490,6 +504,12 @@ vmxnet3_tq_destroy_all(struct vmxnet3_adapter *adapter);
> > > void
> > > vmxnet3_rq_destroy_all(struct vmxnet3_adapter *adapter);
> > >
> > > +int
> > > +vmxnet3_rq_create_all(struct vmxnet3_adapter *adapter);
> > > +
> > > +void
> > > +vmxnet3_adjust_rx_ring_size(struct vmxnet3_adapter *adapter);
> > > +
> > > netdev_features_t
> > > vmxnet3_fix_features(struct net_device *netdev, netdev_features_t features);
> > >
> > > diff --git a/drivers/net/vmxnet3/vmxnet3_xdp.c b/drivers/net/vmxnet3/vmxnet3_xdp.c
> > > new file mode 100644
> > > index 000000000000..afb2d43b5464
> > > --- /dev/null
> > > +++ b/drivers/net/vmxnet3/vmxnet3_xdp.c
> > > @@ -0,0 +1,458 @@
> > > +// SPDX-License-Identifier: GPL-2.0-or-later
> > > +/*
> > > + * Linux driver for VMware's vmxnet3 ethernet NIC.
> > > + * Copyright (C) 2008-2023, VMware, Inc. All Rights Reserved.
> > > + * Maintained by: pv-drivers@vmware.com
> > > + *
> > > + */
> > > +
> > > +#include "vmxnet3_int.h"
> > > +#include "vmxnet3_xdp.h"
> > > +
> > > +static void
> > > +vmxnet3_xdp_exchange_program(struct vmxnet3_adapter *adapter,
> > > + struct bpf_prog *prog)
> > > +{
> > > + struct vmxnet3_rx_queue *rq;
> > > + int i;
> > > +
> > > + for (i = 0; i < adapter->num_rx_queues; i++) {
> > > + rq = &adapter->rx_queue[i];
> > > + rcu_assign_pointer(rq->xdp_bpf_prog, prog);
> > > + }
> > > + if (prog)
> > > + adapter->xdp_enabled = true;
> > > + else
> > > + adapter->xdp_enabled = false;
> > > +}
> > > +
> > > +static int
> > > +vmxnet3_xdp_set(struct net_device *netdev, struct netdev_bpf *bpf,
> > > + struct netlink_ext_ack *extack)
> > > +{
> > > + struct vmxnet3_adapter *adapter = netdev_priv(netdev);
> > > + struct bpf_prog *new_bpf_prog = bpf->prog;
> > > + struct bpf_prog *old_bpf_prog;
> > > + bool need_update;
> > > + bool running;
> > > + int err = 0;
> > > +
> > > + if (new_bpf_prog && netdev->mtu > VMXNET3_XDP_MAX_MTU) {
> > > + NL_SET_ERR_MSG_MOD(extack, "MTU too large for XDP");
> > > + return -EOPNOTSUPP;
> > > + }
> > > +
> > > + old_bpf_prog = READ_ONCE(adapter->rx_queue[0].xdp_bpf_prog);
> > > + if (!new_bpf_prog && !old_bpf_prog) {
> > > + adapter->xdp_enabled = false;
> > > + return 0;
> > > + }
> > > + running = netif_running(netdev);
> > > + need_update = !!old_bpf_prog != !!new_bpf_prog;
> > > +
> > > + if (running && need_update)
> > > + vmxnet3_quiesce_dev(adapter);
> > > +
> > > + vmxnet3_xdp_exchange_program(adapter, new_bpf_prog);
> > > + if (old_bpf_prog)
> > > + bpf_prog_put(old_bpf_prog);
> > > +
> > > + if (running && need_update) {
> > > + vmxnet3_reset_dev(adapter);
> > > + vmxnet3_rq_destroy_all(adapter);
> > > + vmxnet3_adjust_rx_ring_size(adapter);
> > > + err = vmxnet3_rq_create_all(adapter);
> > > + if (err) {
> > > + NL_SET_ERR_MSG_MOD(extack,
> > > + "failed to re-create rx queues for XDP.");
> > > + err = -EOPNOTSUPP;
> > > + goto out;
> > > + }
> > > + err = vmxnet3_activate_dev(adapter);
> > > + if (err) {
> > > + NL_SET_ERR_MSG_MOD(extack,
> > > + "failed to activate device for XDP.");
> > > + err = -EOPNOTSUPP;
> > > + goto out;
> > > + }
> > > + clear_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state);
> > > + }
> > > +out:
> > > + return err;
> > > +}
> > > +
> > > +/* This is the main xdp call used by kernel to set/unset eBPF program. */
> > > +int
> > > +vmxnet3_xdp(struct net_device *netdev, struct netdev_bpf *bpf)
> > > +{
> > > + switch (bpf->command) {
> > > + case XDP_SETUP_PROG:
> > > + netdev_dbg(netdev, "XDP: set program to ");
> > > + return vmxnet3_xdp_set(netdev, bpf, bpf->extack);
> > > + default:
> > > + return -EINVAL;
> > > + }
> > > + return 0;
> > > +}
> > > +
> > > +int
> > > +vmxnet3_xdp_headroom(struct vmxnet3_adapter *adapter)
> > > +{
> > > + if (adapter->xdp_enabled)
> > > + return VMXNET3_XDP_ROOM;
> > > + else
> > > + return 0;
> > > +}
> > > +
> > > +void
> > > +vmxnet3_unregister_xdp_rxq(struct vmxnet3_rx_queue *rq)
> > > +{
> > > + xdp_rxq_info_unreg_mem_model(&rq->xdp_rxq);
> > > + xdp_rxq_info_unreg(&rq->xdp_rxq);
> > > +}
> > > +
> > > +int
> > > +vmxnet3_register_xdp_rxq(struct vmxnet3_rx_queue *rq,
> > > + struct vmxnet3_adapter *adapter)
> > > +{
> > > + int err;
> > > +
> > > + err = xdp_rxq_info_reg(&rq->xdp_rxq, adapter->netdev, rq->qid, 0);
> > > + if (err < 0)
> > > + return err;
> > > +
> > > + err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, MEM_TYPE_PAGE_SHARED,
> > > + NULL);
> > > + if (err < 0) {
> > > + xdp_rxq_info_unreg(&rq->xdp_rxq);
> > > + return err;
> > > + }
> > > + return 0;
> > > +}
> > > +
> > > +static int
> > > +vmxnet3_xdp_xmit_frame(struct vmxnet3_adapter *adapter,
> > > + struct xdp_frame *xdpf,
> > > + struct vmxnet3_tx_queue *tq)
> > > +{
> > > + struct vmxnet3_tx_buf_info *tbi = NULL;
> > > + union Vmxnet3_GenericDesc *gdesc;
> > > + struct vmxnet3_tx_ctx ctx;
> > > + int tx_num_deferred;
> > > + struct sk_buff *skb;
> > > + u32 buf_size;
> > > + int ret = 0;
> > > + u32 dw2;
> > > +
> > > + if (vmxnet3_cmd_ring_desc_avail(&tq->tx_ring) == 0) {
> > > + tq->stats.tx_ring_full++;
> > > + ret = -ENOMEM;
> > > + goto exit;
> > > + }
> > > +
> > > + skb = __netdev_alloc_skb(adapter->netdev, xdpf->len, GFP_KERNEL);
> > > + if (unlikely(!skb))
> > > + goto exit;
> > > +
> >
> > Any reason for this not to return an error. Seems like this might be
> > better off being the ENOMEM w/ the no descriptor case being something
> > like ENOSPC.
>
> my mistake, will fix it, thanks!
>
> >
> > Also at some point you may want to look at supporting handling page
> > fragments or XDP frames as allocating an skb for an XDP transmit can be
> > expensive.
>
> OK! I haven't thought about that yet, will keep it in mind.
>
> >
> > > + memcpy(skb->data, xdpf->data, xdpf->len);
> > > + dw2 = (tq->tx_ring.gen ^ 0x1) << VMXNET3_TXD_GEN_SHIFT;
> > > + dw2 |= xdpf->len;
> > > + ctx.sop_txd = tq->tx_ring.base + tq->tx_ring.next2fill;
> > > + gdesc = ctx.sop_txd;
> > > +
> > > + buf_size = xdpf->len;
> > > + tbi = tq->buf_info + tq->tx_ring.next2fill;
> > > + tbi->map_type = VMXNET3_MAP_SINGLE;
> > > + tbi->dma_addr = dma_map_single(&adapter->pdev->dev,
> > > + skb->data, buf_size,
> > > + DMA_TO_DEVICE);
> > > + if (dma_mapping_error(&adapter->pdev->dev, tbi->dma_addr)) {
> > > + ret = -EFAULT;
> > > + goto exit;
> > > + }
> >
> > Don't you need to free the skb here?
>
> Yes, will fix it.
> >
> > > + tbi->len = buf_size;
> > > +
> > > + gdesc = tq->tx_ring.base + tq->tx_ring.next2fill;
> > > + WARN_ON_ONCE(gdesc->txd.gen == tq->tx_ring.gen);
> > > +
> > > + gdesc->txd.addr = cpu_to_le64(tbi->dma_addr);
> > > + gdesc->dword[2] = cpu_to_le32(dw2);
> > > +
> > > + /* Setup the EOP desc */
> > > + gdesc->dword[3] = cpu_to_le32(VMXNET3_TXD_CQ | VMXNET3_TXD_EOP);
> > > +
> > > + gdesc->txd.om = 0;
> > > + gdesc->txd.msscof = 0;
> > > + gdesc->txd.hlen = 0;
> > > + gdesc->txd.ti = 0;
> > > +
> > > + tx_num_deferred = le32_to_cpu(tq->shared->txNumDeferred);
> > > + tq->shared->txNumDeferred += 1;
> > > + tx_num_deferred++;
> > > +
> > > + vmxnet3_cmd_ring_adv_next2fill(&tq->tx_ring);
> > > +
> > > + /* set the last buf_info for the pkt */
> > > + tbi->skb = skb;
> > > + tbi->sop_idx = ctx.sop_txd - tq->tx_ring.base;
> > > +
> > > + dma_wmb();
> > > + gdesc->dword[2] = cpu_to_le32(le32_to_cpu(gdesc->dword[2]) ^
> > > + VMXNET3_TXD_GEN);
> > > + if (tx_num_deferred >= le32_to_cpu(tq->shared->txThreshold)) {
> > > + tq->shared->txNumDeferred = 0;
> > > + VMXNET3_WRITE_BAR0_REG(adapter,
> > > + VMXNET3_REG_TXPROD + tq->qid * 8,
> > > + tq->tx_ring.next2fill);
> > > + }
> > > +exit:
> > > + return ret;
> > > +}
> > > +
> > > +static int
> > > +vmxnet3_xdp_xmit_back(struct vmxnet3_adapter *adapter,
> > > + struct xdp_frame *xdpf)
> > > +{
> > > + struct vmxnet3_tx_queue *tq;
> > > + struct netdev_queue *nq;
> > > + int err = 0, cpu;
> > > + int tq_number;
> > > +
> > > + tq_number = adapter->num_tx_queues;
> > > + cpu = smp_processor_id();
> > > + tq = &adapter->tx_queue[cpu % tq_number];
> > > + if (tq->stopped)
> > > + return -ENETDOWN;
> > > +
> > > + nq = netdev_get_tx_queue(adapter->netdev, tq->qid);
> > > +
> > > + __netif_tx_lock(nq, cpu);
> > > + err = vmxnet3_xdp_xmit_frame(adapter, xdpf, tq);
> > > + if (err)
> > > + goto exit;
> > > +
> > > +exit:
> >
> > What is the point of the label/goto? I don't see another spot that
> > references it.
> Thanks, will fix it!
>
> >
> > > + __netif_tx_unlock(nq);
> > > + return err;
> > > +}
> > > +
> > > +int
> > > +vmxnet3_xdp_xmit(struct net_device *dev,
> > > + int n, struct xdp_frame **frames, u32 flags)
> > > +{
> > > + struct vmxnet3_adapter *adapter;
> > > + struct vmxnet3_tx_queue *tq;
> > > + struct netdev_queue *nq;
> > > + int i, err, cpu;
> > > + int nxmit = 0;
> > > + int tq_number;
> > > +
> > > + adapter = netdev_priv(dev);
> > > +
> > > + if (unlikely(test_bit(VMXNET3_STATE_BIT_QUIESCED, &adapter->state)))
> > > + return -ENETDOWN;
> > > + if (unlikely(test_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state)))
> > > + return -EINVAL;
> > > +
> > > + tq_number = adapter->num_tx_queues;
> > > + cpu = smp_processor_id();
> > > + tq = &adapter->tx_queue[cpu % tq_number];
> > > + if (tq->stopped)
> > > + return -ENETDOWN;
> > > +
> > > + nq = netdev_get_tx_queue(adapter->netdev, tq->qid);
> > > +
> > > + __netif_tx_lock(nq, cpu);
> > > + for (i = 0; i < n; i++) {
> > > + err = vmxnet3_xdp_xmit_frame(adapter, frames[i], tq);
> > > + if (err) {
> > > + tq->stats.xdp_xmit_err++;
> > > + break;
> > > + }
> > > + nxmit++;
> > > + }
> > > +
> > > + tq->stats.xdp_xmit += nxmit;
> > > + __netif_tx_unlock(nq);
> > > +
> >
> > Are you doing anything to free the frames after you transmit them? If I
> > am not mistaken you are just copying them over into skbs aren't you, so
> > what is freeing the frames after that?
>
> The frames will be free at vmxnet3_tq_cleanup() at dev_kfree_skb_any(tbi->skb);
> Because at the vmxnet3_xdp_xmit_frame the allocated skb is saved at tbi->skb,
> so it can be freed at tq cleanup.
>
> >
> > > + return nxmit;
> > > +}
> > > +
> > > +static int
> > > +__vmxnet3_run_xdp(struct vmxnet3_rx_queue *rq, void *data, int data_len,
> > > + int headroom, int frame_sz, bool *need_xdp_flush,
> > > + struct sk_buff *skb)
> > > +{
> > > + struct xdp_frame *xdpf;
> > > + void *buf_hard_start;
> > > + struct xdp_buff xdp;
> > > + struct page *page;
> > > + void *orig_data;
> > > + int err, delta;
> > > + int delta_len;
> > > + u32 act;
> > > +
> > > + buf_hard_start = data;
> > > + xdp_init_buff(&xdp, frame_sz, &rq->xdp_rxq);
> > > + xdp_prepare_buff(&xdp, buf_hard_start, headroom, data_len, true);
> > > + orig_data = xdp.data;
> > > +
> > > + act = bpf_prog_run_xdp(rq->xdp_bpf_prog, &xdp);
> > > + rq->stats.xdp_packets++;
> > > +
> > > + switch (act) {
> > > + case XDP_DROP:
> > > + rq->stats.xdp_drops++;
> > > + break;
> > > + case XDP_PASS:
> > > + /* bpf prog might change len and data position.
> > > + * dataring does not use skb so not support this.
> > > + */
> > > + delta = xdp.data - orig_data;
> > > + delta_len = (xdp.data_end - xdp.data) - data_len;
> > > + if (skb) {
> > > + skb_reserve(skb, delta);
> > > + skb_put(skb, delta_len);
> > > + }
> > > + break;
> > > + case XDP_TX:
> > > + xdpf = xdp_convert_buff_to_frame(&xdp);
> > > + if (!xdpf ||
> > > + vmxnet3_xdp_xmit_back(rq->adapter, xdpf)) {
> > > + rq->stats.xdp_drops++;
> > > + } else {
> > > + rq->stats.xdp_tx++;
> > > + }
> > > + break;
> > > + case XDP_ABORTED:
> > > + trace_xdp_exception(rq->adapter->netdev, rq->xdp_bpf_prog,
> > > + act);
> > > + rq->stats.xdp_aborted++;
> > > + break;
> > > + case XDP_REDIRECT:
> > > + page = alloc_page(GFP_ATOMIC);
> > > + if (!page) {
> > > + rq->stats.rx_buf_alloc_failure++;
> > > + return XDP_DROP;
> > > + }
> >
> > So I think I see the problem I had questions about here. If I am not
> > mistaken you are copying the buffer to this page, and then copying this
> > page to an skb right? I think you might be better off just writing off
> > the Tx/Redirect pages and letting them go through their respective
> > paths and just allocating new pages instead assuming these pages were
> > consumed.
>
> I'm not sure I understand, can you elaborate?
>
> For XDP_TX, I'm doing 1 extra copy, copying to the newly allocated skb
> in vmxnet3_xdp_xmit_back.
> For XDP_REDIREC, I allocate a page and copy to the page and call
> xdp_do_redirect, there is no copying to skb again. If I don't allocate a
> new page, it always crashes at
> [62020.425932] BUG: Bad page state in process cpumap/0/map:29 pfn:107548
> [62020.440905] kernel BUG at include/linux/mm.h:757!
> VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
just to provide more detail about the crash.
It happens when I run
./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e drop
with the following stack
[62020.473606] RIP: 0010:page_frag_free+0xb6/0x140
[62020.572992] __xdp_return+0xad/0x260
[62020.577546] xdp_return_frame+0x2c/0xa0
[62020.582190] cpu_map_bpf_prog_run_xdp.isra.0+0xa3/0x320
Thanks
William
^ permalink raw reply
* Re: Possible race with xsk_flush
From: Shawn Bohrer @ 2022-12-16 23:42 UTC (permalink / raw)
To: Magnus Karlsson; +Cc: netdev, bpf, bjorn, magnus.karlsson, kernel-team
In-Reply-To: <CAJ8uoz1GKvoaM0DCo1Ki8q=LHR1cjrNC=1BK7chTKKW9Po5F5A@mail.gmail.com>
On Fri, Dec 16, 2022 at 11:05:19AM +0100, Magnus Karlsson wrote:
> To summarize, we are expecting this ordering:
>
> CPU 0 __xsk_rcv_zc()
> CPU 0 __xsk_map_flush()
> CPU 2 __xsk_rcv_zc()
> CPU 2 __xsk_map_flush()
>
> But we are seeing this order:
>
> CPU 0 __xsk_rcv_zc()
> CPU 2 __xsk_rcv_zc()
> CPU 0 __xsk_map_flush()
> CPU 2 __xsk_map_flush()
>
> Here is the veth NAPI poll loop:
>
> static int veth_poll(struct napi_struct *napi, int budget)
> {
> struct veth_rq *rq =
> container_of(napi, struct veth_rq, xdp_napi);
> struct veth_stats stats = {};
> struct veth_xdp_tx_bq bq;
> int done;
>
> bq.count = 0;
>
> xdp_set_return_frame_no_direct();
> done = veth_xdp_rcv(rq, budget, &bq, &stats);
>
> if (done < budget && napi_complete_done(napi, done)) {
> /* Write rx_notify_masked before reading ptr_ring */
> smp_store_mb(rq->rx_notify_masked, false);
> if (unlikely(!__ptr_ring_empty(&rq->xdp_ring))) {
> if (napi_schedule_prep(&rq->xdp_napi)) {
> WRITE_ONCE(rq->rx_notify_masked, true);
> __napi_schedule(&rq->xdp_napi);
> }
> }
> }
>
> if (stats.xdp_tx > 0)
> veth_xdp_flush(rq, &bq);
> if (stats.xdp_redirect > 0)
> xdp_do_flush();
> xdp_clear_return_frame_no_direct();
>
> return done;
> }
>
> Something I have never seen before is that there is
> napi_complete_done() and a __napi_schedule() before xdp_do_flush().
> Let us check if this has something to do with it. So two suggestions
> to be executed separately:
>
> * Put a probe at the __napi_schedule() above and check if it gets
> triggered before this problem
> * Move the "if (stats.xdp_redirect > 0) xdp_do_flush();" to just
> before "if (done < budget && napi_complete_done(napi, done)) {"
I've built a kernel moving xdp_do_flush() before napi_complete_done()
and will leave this running over the weekend. I also spent a while
trying to understand the napi code to see if that reordering seemed
like it might cause the bug. I'll admit I still don't fully undestand
the napi code or how the bug can happen. We'll see if this appears to
fix the issue, but since it is somewhat hard to reproduce I'd love for
someone to be able to explain why it fixes it.
Thanks,
Shawn Bohrer
^ permalink raw reply
* Re: [RFC net-next 01/15] devlink: move code to a dedicated directory
From: Jacob Keller @ 2022-12-16 23:26 UTC (permalink / raw)
To: Jakub Kicinski, Jiri Pirko; +Cc: davem, netdev, edumazet, pabeni, leon
In-Reply-To: <20221216103216.6a98725b@kernel.org>
On 12/16/2022 10:32 AM, Jakub Kicinski wrote:
> On Fri, 16 Dec 2022 10:10:41 +0100 Jiri Pirko wrote:
>>>> Hmm, as devlink is not really designed to be only networking thing,
>>>> perhaps this is good opportunity to move out of net/ and change the
>>>> config name to "CONFIG_DEVLINK" ?
>>>
>>> Nothing against it, but don't think it belongs in this patch.
>>> So I call scope creep.
>>
>> Yeah, but I mean, since you move it from /net/core to /net/, why not
>> just move it to / ?
>
> It still needs to depend on NET for sockets, right?
Its going to depend on whatever Generic Netlink depends on yea.
^ permalink raw reply
* Re: [PATCH v6] vmxnet3: Add XDP support.
From: William Tu @ 2022-12-16 22:53 UTC (permalink / raw)
To: Alexander H Duyck; +Cc: netdev, tuc, gyang, doshir
In-Reply-To: <c4c62a4563d6669ca3c5d5efee0e54bc8742f27d.camel@gmail.com>
Hi Alexander,
Thanks for taking a look at this patch!
On Fri, Dec 16, 2022 at 9:14 AM Alexander H Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Fri, 2022-12-16 at 06:41 -0800, William Tu wrote:
> > The patch adds native-mode XDP support: XDP DROP, PASS, TX, and REDIRECT.
> >
> > Background:
> > The vmxnet3 rx consists of three rings: ring0, ring1, and dataring.
> > For r0 and r1, buffers at r0 are allocated using alloc_skb APIs and dma
> > mapped to the ring's descriptor. If LRO is enabled and packet size larger
> > than 3K, VMXNET3_MAX_SKB_BUF_SIZE, then r1 is used to mapped the rest of
> > the buffer larger than VMXNET3_MAX_SKB_BUF_SIZE. Each buffer in r1 is
> > allocated using alloc_page. So for LRO packets, the payload will be in one
> > buffer from r0 and multiple from r1, for non-LRO packets, only one
> > descriptor in r0 is used for packet size less than 3k.
> >
> > When receiving a packet, the first descriptor will have the sop (start of
> > packet) bit set, and the last descriptor will have the eop (end of packet)
> > bit set. Non-LRO packets will have only one descriptor with both sop and
> > eop set.
> >
> > Other than r0 and r1, vmxnet3 dataring is specifically designed for
> > handling packets with small size, usually 128 bytes, defined in
> > VMXNET3_DEF_RXDATA_DESC_SIZE, by simply copying the packet from the backend
> > driver in ESXi to the ring's memory region at front-end vmxnet3 driver, in
> > order to avoid memory mapping/unmapping overhead. In summary, packet size:
> > A. < 128B: use dataring
> > B. 128B - 3K: use ring0 (VMXNET3_RX_BUF_SKB)
> > C. > 3K: use ring0 and ring1 (VMXNET3_RX_BUF_SKB + VMXNET3_RX_BUF_PAGE)
> > As a result, the patch adds XDP support for packets using dataring
> > and r0 (case A and B), not the large packet size when LRO is enabled.
> >
> > XDP Implementation:
> > When user loads and XDP prog, vmxnet3 driver checks configurations, such
> > as mtu, lro, and re-allocate the rx buffer size for reserving the extra
> > headroom, XDP_PACKET_HEADROOM, for XDP frame. The XDP prog will then be
> > associated with every rx queue of the device. Note that when using dataring
> > for small packet size, vmxnet3 (front-end driver) doesn't control the
> > buffer allocation, as a result the XDP frame's headroom is zero.
> >
> > The receive side of XDP is implemented for case A and B, by invoking the
> > bpf program at vmxnet3_rq_rx_complete and handle its returned action.
> > The new vmxnet3_run_xdp function handles the difference of using dataring
> > or ring0, and decides the next journey of the packet afterward.
> >
> > For TX, vmxnet3 has split header design. Outgoing packets are parsed
> > first and protocol headers (L2/L3/L4) are copied to the backend. The
> > rest of the payload are dma mapped. Since XDP_TX does not parse the
> > packet protocol, the entire XDP frame is using dma mapped for the
> > transmission. Because the actual TX is deferred until txThreshold, it's
> > possible that an rx buffer is being overwritten by incoming burst of rx
> > packets, before tx buffer being transmitted. As a result, we allocate new
> > skb and introduce a copy.
> >
> > Performance:
> > Tested using two VMs inside one ESXi machine, using single core on each
> > vmxnet3 device, sender using DPDK testpmd tx-mode attached to vmxnet3
> > device, sending 64B or 512B packet.
> >
> > VM1 txgen:
> > $ dpdk-testpmd -l 0-3 -n 1 -- -i --nb-cores=3 \
> > --forward-mode=txonly --eth-peer=0,<mac addr of vm2>
> > option: add "--txonly-multi-flow"
> > option: use --txpkts=512 or 64 byte
> >
> > VM2 running XDP:
> > $ ./samples/bpf/xdp_rxq_info -d ens160 -a <options> --skb-mode
> > $ ./samples/bpf/xdp_rxq_info -d ens160 -a <options>
> > options: XDP_DROP, XDP_PASS, XDP_TX
> >
> > To test REDIRECT to cpu 0, use
> > $ ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e drop
> >
> > Single core performance comparison with skb-mode.
> > 64B: skb-mode -> native-mode (with this patch)
> > XDP_DROP: 932Kpps -> 2.0Mpps
> > XDP_PASS: 284Kpps -> 314Kpps
> > XDP_TX: 591Kpps -> 1.8Mpps
> > REDIRECT: 453Kpps -> 501Kpps
> >
> > 512B: skb-mode -> native-mode (with this patch)
> > XDP_DROP: 890Kpps -> 1.3Mpps
> > XDP_PASS: 284Kpps -> 314Kpps
> > XDP_TX: 555Kpps -> 1.2Mpps
> > REDIRECT: 670Kpps -> 430Kpps
> >
>
> I hadn't noticed it before. Based on this it looks like native mode is
> performing worse then skb-mode for redirect w/ 512B packets? Have you
> looked into why that might be?
yes, I noticed it but don't know why, maybe it's due to extra copy and page
allocation like you said below. I will dig deeper.
>
> My main concern would be that you are optimizing for recyling in the Tx
> and Redirect paths, when you might be better off just releasing the
> buffers and batch allocating new pages in your Rx path.
right, are you talking about using the page pool allocator, ex: slide 8 below
https://legacy.netdevconf.info/0x14/pub/slides/10/add-xdp-on-driver.pdf
I tried it before but then I found I have to replace lots of existing vmxnet3
code, basically replacing all the rx/tx buffer allocation code with new
page pool api, even without XDP. I'd love to give it a try, do you think it's
worth doing it?
>
> > Limitations:
> > a. LRO will be disabled when users load XDP program
> > b. MTU will be checked and limit to
> > VMXNET3_MAX_SKB_BUF_SIZE(3K) - XDP_PACKET_HEADROOM(256) -
> > SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
> >
> > Signed-off-by: William Tu <u9012063@gmail.com>
> > ---
> > v5 -> v6:
> > work on feedbacks from Alexander Duyck
> > - remove unused skb parameter at vmxnet3_xdp_xmit
> > - add trace point for XDP_ABORTED
> > - avoid TX packet buffer being overwritten by allocatin
> > new skb and memcpy (TX performance drop from 2.3 to 1.8Mpps)
> > - update some of the performance number and a demo video
> > https://youtu.be/I3nx5wQDTXw
> > - pass VMware internal regression test using non-ENS: vmxnet3v2,
> > vmxnet3v3, vmxnet3v4, so remove RFC tag.
> >
> > v4 -> v5:
> > - move XDP code to separate file: vmxnet3_xdp.{c, h},
> > suggested by Guolin
> > - expose vmxnet3_rq_create_all and vmxnet3_adjust_rx_ring_size
> > - more test using samples/bpf/{xdp1, xdp2, xdp_adjust_tail}
> > - add debug print
> > - rebase on commit 65e6af6cebe
> >
> > v3 -> v4:
> > - code refactoring and improved commit message
> > - make dataring and non-dataring case clear
> > - in XDP_PASS, handle xdp.data and xdp.data_end change after
> > bpf program executed
> > - now still working on internal testing
> > - v4 applied on net-next commit 65e6af6cebef
> >
> > v2 -> v3:
> > - code refactoring: move the XDP processing to the front
> > of the vmxnet3_rq_rx_complete, and minimize the places
> > of changes to existing code.
> > - Performance improvement over BUF_SKB (512B) due to skipping
> > skb allocation when DROP and TX.
> >
> > v1 -> v2:
> > - Avoid skb allocation for small packet size (when dataring is used)
> > - Use rcu_read_lock unlock instead of READ_ONCE
> > - Peroformance improvement over v1
> > - Merge xdp drop, tx, pass, and redirect into 1 patch
> >
> > I tested the patch using below script:
> > while [ true ]; do
> > timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_DROP --skb-mode
> > timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_DROP
> > timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_PASS --skb-mode
> > timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_PASS
> > timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_TX --skb-mode
> > timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_TX
> > timeout 20 ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e drop
> > timeout 20 ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e pass
> > done
> > ---
> > drivers/net/vmxnet3/Makefile | 2 +-
> > drivers/net/vmxnet3/vmxnet3_drv.c | 48 ++-
> > drivers/net/vmxnet3/vmxnet3_ethtool.c | 14 +
> > drivers/net/vmxnet3/vmxnet3_int.h | 20 ++
> > drivers/net/vmxnet3/vmxnet3_xdp.c | 458 ++++++++++++++++++++++++++
> > drivers/net/vmxnet3/vmxnet3_xdp.h | 39 +++
> > 6 files changed, 573 insertions(+), 8 deletions(-)
> > create mode 100644 drivers/net/vmxnet3/vmxnet3_xdp.c
> > create mode 100644 drivers/net/vmxnet3/vmxnet3_xdp.h
> >
> > diff --git a/drivers/net/vmxnet3/Makefile b/drivers/net/vmxnet3/Makefile
> > index a666a88ac1ff..f82870c10205 100644
> > --- a/drivers/net/vmxnet3/Makefile
> > +++ b/drivers/net/vmxnet3/Makefile
> > @@ -32,4 +32,4 @@
> >
> > obj-$(CONFIG_VMXNET3) += vmxnet3.o
> >
> > -vmxnet3-objs := vmxnet3_drv.o vmxnet3_ethtool.o
> > +vmxnet3-objs := vmxnet3_drv.o vmxnet3_ethtool.o vmxnet3_xdp.o
> > diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
> > index d3e7b27eb933..b55fec2ac2bf 100644
> > --- a/drivers/net/vmxnet3/vmxnet3_drv.c
> > +++ b/drivers/net/vmxnet3/vmxnet3_drv.c
> > @@ -28,6 +28,7 @@
> > #include <net/ip6_checksum.h>
> >
> > #include "vmxnet3_int.h"
> > +#include "vmxnet3_xdp.h"
> >
> > char vmxnet3_driver_name[] = "vmxnet3";
> > #define VMXNET3_DRIVER_DESC "VMware vmxnet3 virtual NIC driver"
> > @@ -351,7 +352,6 @@ vmxnet3_unmap_pkt(u32 eop_idx, struct vmxnet3_tx_queue *tq,
> > BUG_ON(VMXNET3_TXDESC_GET_EOP(&(tq->tx_ring.base[eop_idx].txd)) != 1);
> >
> > skb = tq->buf_info[eop_idx].skb;
> > - BUG_ON(skb == NULL);
> > tq->buf_info[eop_idx].skb = NULL;
> >
> > VMXNET3_INC_RING_IDX_ONLY(eop_idx, tq->tx_ring.size);
> > @@ -592,6 +592,9 @@ vmxnet3_rq_alloc_rx_buf(struct vmxnet3_rx_queue *rq, u32 ring_idx,
> > rbi->skb = __netdev_alloc_skb_ip_align(adapter->netdev,
> > rbi->len,
> > GFP_KERNEL);
> > + if (adapter->xdp_enabled)
> > + skb_reserve(rbi->skb, XDP_PACKET_HEADROOM);
> > +
> > if (unlikely(rbi->skb == NULL)) {
> > rq->stats.rx_buf_alloc_failure++;
> > break;
> > @@ -1404,6 +1407,8 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
> > struct Vmxnet3_RxDesc rxCmdDesc;
> > struct Vmxnet3_RxCompDesc rxComp;
> > #endif
> > + bool need_flush = 0;
> > +
> > vmxnet3_getRxComp(rcd, &rq->comp_ring.base[rq->comp_ring.next2proc].rcd,
> > &rxComp);
> > while (rcd->gen == rq->comp_ring.gen) {
> > @@ -1444,6 +1449,19 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
> > goto rcd_done;
> > }
> >
> > + if (unlikely(rcd->sop && rcd->eop && adapter->xdp_enabled)) {
> > + int act = vmxnet3_process_xdp(adapter, rq, rcd, rbi,
> > + rxd, &need_flush);
> > + ctx->skb = NULL;
> > + switch (act) {
> > + case VMXNET3_XDP_TAKEN:
> > + goto rcd_done;
> > + case VMXNET3_XDP_CONTINUE:
> > + default:
> > + break;
> > + }
> > + }
> > +
> > if (rcd->sop) { /* first buf of the pkt */
> > bool rxDataRingUsed;
> > u16 len;
> > @@ -1483,6 +1501,10 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
> > goto rcd_done;
> > }
> >
> > + if (adapter->xdp_enabled && !rxDataRingUsed)
> > + skb_reserve(new_skb,
> > + XDP_PACKET_HEADROOM);
> > +
> > if (rxDataRingUsed) {
> > size_t sz;
> >
> > @@ -1730,6 +1752,8 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
> > vmxnet3_getRxComp(rcd,
> > &rq->comp_ring.base[rq->comp_ring.next2proc].rcd, &rxComp);
> > }
> > + if (need_flush)
> > + xdp_do_flush_map();
> >
> > return num_pkts;
> > }
> > @@ -1776,6 +1800,7 @@ vmxnet3_rq_cleanup(struct vmxnet3_rx_queue *rq,
> >
> > rq->comp_ring.gen = VMXNET3_INIT_GEN;
> > rq->comp_ring.next2proc = 0;
> > + rq->xdp_bpf_prog = NULL;
> > }
> >
> >
> > @@ -1788,7 +1813,6 @@ vmxnet3_rq_cleanup_all(struct vmxnet3_adapter *adapter)
> > vmxnet3_rq_cleanup(&adapter->rx_queue[i], adapter);
> > }
> >
> > -
> > static void vmxnet3_rq_destroy(struct vmxnet3_rx_queue *rq,
> > struct vmxnet3_adapter *adapter)
> > {
> > @@ -1832,6 +1856,8 @@ static void vmxnet3_rq_destroy(struct vmxnet3_rx_queue *rq,
> > kfree(rq->buf_info[0]);
> > rq->buf_info[0] = NULL;
> > rq->buf_info[1] = NULL;
> > +
> > + vmxnet3_unregister_xdp_rxq(rq);
> > }
> >
> > static void
> > @@ -1893,6 +1919,10 @@ vmxnet3_rq_init(struct vmxnet3_rx_queue *rq,
> > }
> > vmxnet3_rq_alloc_rx_buf(rq, 1, rq->rx_ring[1].size - 1, adapter);
> >
> > + /* always register, even if no XDP prog used */
> > + if (vmxnet3_register_xdp_rxq(rq, adapter))
> > + return -EINVAL;
> > +
> > /* reset the comp ring */
> > rq->comp_ring.next2proc = 0;
> > memset(rq->comp_ring.base, 0, rq->comp_ring.size *
> > @@ -1989,7 +2019,7 @@ vmxnet3_rq_create(struct vmxnet3_rx_queue *rq, struct vmxnet3_adapter *adapter)
> > }
> >
> >
> > -static int
> > +int
> > vmxnet3_rq_create_all(struct vmxnet3_adapter *adapter)
> > {
> > int i, err = 0;
> > @@ -2585,7 +2615,8 @@ vmxnet3_setup_driver_shared(struct vmxnet3_adapter *adapter)
> > if (adapter->netdev->features & NETIF_F_RXCSUM)
> > devRead->misc.uptFeatures |= UPT1_F_RXCSUM;
> >
> > - if (adapter->netdev->features & NETIF_F_LRO) {
> > + if (adapter->netdev->features & NETIF_F_LRO &&
> > + !adapter->xdp_enabled) {
> > devRead->misc.uptFeatures |= UPT1_F_LRO;
> > devRead->misc.maxNumRxSG = cpu_to_le16(1 + MAX_SKB_FRAGS);
> > }
> > @@ -3026,7 +3057,7 @@ vmxnet3_free_pci_resources(struct vmxnet3_adapter *adapter)
> > }
> >
> >
> > -static void
> > +void
> > vmxnet3_adjust_rx_ring_size(struct vmxnet3_adapter *adapter)
> > {
> > size_t sz, i, ring0_size, ring1_size, comp_size;
> > @@ -3035,7 +3066,8 @@ vmxnet3_adjust_rx_ring_size(struct vmxnet3_adapter *adapter)
> > if (adapter->netdev->mtu <= VMXNET3_MAX_SKB_BUF_SIZE -
> > VMXNET3_MAX_ETH_HDR_SIZE) {
> > adapter->skb_buf_size = adapter->netdev->mtu +
> > - VMXNET3_MAX_ETH_HDR_SIZE;
> > + VMXNET3_MAX_ETH_HDR_SIZE +
> > + vmxnet3_xdp_headroom(adapter);
> > if (adapter->skb_buf_size < VMXNET3_MIN_T0_BUF_SIZE)
> > adapter->skb_buf_size = VMXNET3_MIN_T0_BUF_SIZE;
> >
> > @@ -3563,7 +3595,6 @@ vmxnet3_reset_work(struct work_struct *data)
> > clear_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state);
> > }
> >
> > -
> > static int
> > vmxnet3_probe_device(struct pci_dev *pdev,
> > const struct pci_device_id *id)
> > @@ -3585,6 +3616,8 @@ vmxnet3_probe_device(struct pci_dev *pdev,
> > #ifdef CONFIG_NET_POLL_CONTROLLER
> > .ndo_poll_controller = vmxnet3_netpoll,
> > #endif
> > + .ndo_bpf = vmxnet3_xdp,
> > + .ndo_xdp_xmit = vmxnet3_xdp_xmit,
> > };
> > int err;
> > u32 ver;
> > @@ -3900,6 +3933,7 @@ vmxnet3_probe_device(struct pci_dev *pdev,
> > goto err_register;
> > }
> >
> > + adapter->xdp_enabled = false;
> > vmxnet3_check_link(adapter, false);
> > return 0;
> >
> > diff --git a/drivers/net/vmxnet3/vmxnet3_ethtool.c b/drivers/net/vmxnet3/vmxnet3_ethtool.c
> > index 18cf7c723201..6f542236b26e 100644
> > --- a/drivers/net/vmxnet3/vmxnet3_ethtool.c
> > +++ b/drivers/net/vmxnet3/vmxnet3_ethtool.c
> > @@ -76,6 +76,10 @@ vmxnet3_tq_driver_stats[] = {
> > copy_skb_header) },
> > { " giant hdr", offsetof(struct vmxnet3_tq_driver_stats,
> > oversized_hdr) },
> > + { " xdp xmit", offsetof(struct vmxnet3_tq_driver_stats,
> > + xdp_xmit) },
> > + { " xdp xmit err", offsetof(struct vmxnet3_tq_driver_stats,
> > + xdp_xmit_err) },
> > };
> >
> > /* per rq stats maintained by the device */
> > @@ -106,6 +110,16 @@ vmxnet3_rq_driver_stats[] = {
> > drop_fcs) },
> > { " rx buf alloc fail", offsetof(struct vmxnet3_rq_driver_stats,
> > rx_buf_alloc_failure) },
> > + { " xdp packets", offsetof(struct vmxnet3_rq_driver_stats,
> > + xdp_packets) },
> > + { " xdp tx", offsetof(struct vmxnet3_rq_driver_stats,
> > + xdp_tx) },
> > + { " xdp redirects", offsetof(struct vmxnet3_rq_driver_stats,
> > + xdp_redirects) },
> > + { " xdp drops", offsetof(struct vmxnet3_rq_driver_stats,
> > + xdp_drops) },
> > + { " xdp aborted", offsetof(struct vmxnet3_rq_driver_stats,
> > + xdp_aborted) },
> > };
> >
> > /* global stats maintained by the driver */
> > diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
> > index 3367db23aa13..5cf4033930d8 100644
> > --- a/drivers/net/vmxnet3/vmxnet3_int.h
> > +++ b/drivers/net/vmxnet3/vmxnet3_int.h
> > @@ -56,6 +56,8 @@
> > #include <linux/if_arp.h>
> > #include <linux/inetdevice.h>
> > #include <linux/log2.h>
> > +#include <linux/bpf.h>
> > +#include <linux/skbuff.h>
> >
> > #include "vmxnet3_defs.h"
> >
> > @@ -217,6 +219,9 @@ struct vmxnet3_tq_driver_stats {
> > u64 linearized; /* # of pkts linearized */
> > u64 copy_skb_header; /* # of times we have to copy skb header */
> > u64 oversized_hdr;
> > +
> > + u64 xdp_xmit;
> > + u64 xdp_xmit_err;
> > };
> >
> > struct vmxnet3_tx_ctx {
> > @@ -285,6 +290,12 @@ struct vmxnet3_rq_driver_stats {
> > u64 drop_err;
> > u64 drop_fcs;
> > u64 rx_buf_alloc_failure;
> > +
> > + u64 xdp_packets; /* Total packets processed by XDP. */
> > + u64 xdp_tx;
> > + u64 xdp_redirects;
> > + u64 xdp_drops;
> > + u64 xdp_aborted;
> > };
> >
> > struct vmxnet3_rx_data_ring {
> > @@ -307,6 +318,8 @@ struct vmxnet3_rx_queue {
> > struct vmxnet3_rx_buf_info *buf_info[2];
> > struct Vmxnet3_RxQueueCtrl *shared;
> > struct vmxnet3_rq_driver_stats stats;
> > + struct bpf_prog __rcu *xdp_bpf_prog;
> > + struct xdp_rxq_info xdp_rxq;
> > } __attribute__((__aligned__(SMP_CACHE_BYTES)));
> >
> > #define VMXNET3_DEVICE_MAX_TX_QUEUES 32
> > @@ -415,6 +428,7 @@ struct vmxnet3_adapter {
> > u16 tx_prod_offset;
> > u16 rx_prod_offset;
> > u16 rx_prod2_offset;
> > + bool xdp_enabled;
> > };
> >
> > #define VMXNET3_WRITE_BAR0_REG(adapter, reg, val) \
> > @@ -490,6 +504,12 @@ vmxnet3_tq_destroy_all(struct vmxnet3_adapter *adapter);
> > void
> > vmxnet3_rq_destroy_all(struct vmxnet3_adapter *adapter);
> >
> > +int
> > +vmxnet3_rq_create_all(struct vmxnet3_adapter *adapter);
> > +
> > +void
> > +vmxnet3_adjust_rx_ring_size(struct vmxnet3_adapter *adapter);
> > +
> > netdev_features_t
> > vmxnet3_fix_features(struct net_device *netdev, netdev_features_t features);
> >
> > diff --git a/drivers/net/vmxnet3/vmxnet3_xdp.c b/drivers/net/vmxnet3/vmxnet3_xdp.c
> > new file mode 100644
> > index 000000000000..afb2d43b5464
> > --- /dev/null
> > +++ b/drivers/net/vmxnet3/vmxnet3_xdp.c
> > @@ -0,0 +1,458 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + * Linux driver for VMware's vmxnet3 ethernet NIC.
> > + * Copyright (C) 2008-2023, VMware, Inc. All Rights Reserved.
> > + * Maintained by: pv-drivers@vmware.com
> > + *
> > + */
> > +
> > +#include "vmxnet3_int.h"
> > +#include "vmxnet3_xdp.h"
> > +
> > +static void
> > +vmxnet3_xdp_exchange_program(struct vmxnet3_adapter *adapter,
> > + struct bpf_prog *prog)
> > +{
> > + struct vmxnet3_rx_queue *rq;
> > + int i;
> > +
> > + for (i = 0; i < adapter->num_rx_queues; i++) {
> > + rq = &adapter->rx_queue[i];
> > + rcu_assign_pointer(rq->xdp_bpf_prog, prog);
> > + }
> > + if (prog)
> > + adapter->xdp_enabled = true;
> > + else
> > + adapter->xdp_enabled = false;
> > +}
> > +
> > +static int
> > +vmxnet3_xdp_set(struct net_device *netdev, struct netdev_bpf *bpf,
> > + struct netlink_ext_ack *extack)
> > +{
> > + struct vmxnet3_adapter *adapter = netdev_priv(netdev);
> > + struct bpf_prog *new_bpf_prog = bpf->prog;
> > + struct bpf_prog *old_bpf_prog;
> > + bool need_update;
> > + bool running;
> > + int err = 0;
> > +
> > + if (new_bpf_prog && netdev->mtu > VMXNET3_XDP_MAX_MTU) {
> > + NL_SET_ERR_MSG_MOD(extack, "MTU too large for XDP");
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + old_bpf_prog = READ_ONCE(adapter->rx_queue[0].xdp_bpf_prog);
> > + if (!new_bpf_prog && !old_bpf_prog) {
> > + adapter->xdp_enabled = false;
> > + return 0;
> > + }
> > + running = netif_running(netdev);
> > + need_update = !!old_bpf_prog != !!new_bpf_prog;
> > +
> > + if (running && need_update)
> > + vmxnet3_quiesce_dev(adapter);
> > +
> > + vmxnet3_xdp_exchange_program(adapter, new_bpf_prog);
> > + if (old_bpf_prog)
> > + bpf_prog_put(old_bpf_prog);
> > +
> > + if (running && need_update) {
> > + vmxnet3_reset_dev(adapter);
> > + vmxnet3_rq_destroy_all(adapter);
> > + vmxnet3_adjust_rx_ring_size(adapter);
> > + err = vmxnet3_rq_create_all(adapter);
> > + if (err) {
> > + NL_SET_ERR_MSG_MOD(extack,
> > + "failed to re-create rx queues for XDP.");
> > + err = -EOPNOTSUPP;
> > + goto out;
> > + }
> > + err = vmxnet3_activate_dev(adapter);
> > + if (err) {
> > + NL_SET_ERR_MSG_MOD(extack,
> > + "failed to activate device for XDP.");
> > + err = -EOPNOTSUPP;
> > + goto out;
> > + }
> > + clear_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state);
> > + }
> > +out:
> > + return err;
> > +}
> > +
> > +/* This is the main xdp call used by kernel to set/unset eBPF program. */
> > +int
> > +vmxnet3_xdp(struct net_device *netdev, struct netdev_bpf *bpf)
> > +{
> > + switch (bpf->command) {
> > + case XDP_SETUP_PROG:
> > + netdev_dbg(netdev, "XDP: set program to ");
> > + return vmxnet3_xdp_set(netdev, bpf, bpf->extack);
> > + default:
> > + return -EINVAL;
> > + }
> > + return 0;
> > +}
> > +
> > +int
> > +vmxnet3_xdp_headroom(struct vmxnet3_adapter *adapter)
> > +{
> > + if (adapter->xdp_enabled)
> > + return VMXNET3_XDP_ROOM;
> > + else
> > + return 0;
> > +}
> > +
> > +void
> > +vmxnet3_unregister_xdp_rxq(struct vmxnet3_rx_queue *rq)
> > +{
> > + xdp_rxq_info_unreg_mem_model(&rq->xdp_rxq);
> > + xdp_rxq_info_unreg(&rq->xdp_rxq);
> > +}
> > +
> > +int
> > +vmxnet3_register_xdp_rxq(struct vmxnet3_rx_queue *rq,
> > + struct vmxnet3_adapter *adapter)
> > +{
> > + int err;
> > +
> > + err = xdp_rxq_info_reg(&rq->xdp_rxq, adapter->netdev, rq->qid, 0);
> > + if (err < 0)
> > + return err;
> > +
> > + err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, MEM_TYPE_PAGE_SHARED,
> > + NULL);
> > + if (err < 0) {
> > + xdp_rxq_info_unreg(&rq->xdp_rxq);
> > + return err;
> > + }
> > + return 0;
> > +}
> > +
> > +static int
> > +vmxnet3_xdp_xmit_frame(struct vmxnet3_adapter *adapter,
> > + struct xdp_frame *xdpf,
> > + struct vmxnet3_tx_queue *tq)
> > +{
> > + struct vmxnet3_tx_buf_info *tbi = NULL;
> > + union Vmxnet3_GenericDesc *gdesc;
> > + struct vmxnet3_tx_ctx ctx;
> > + int tx_num_deferred;
> > + struct sk_buff *skb;
> > + u32 buf_size;
> > + int ret = 0;
> > + u32 dw2;
> > +
> > + if (vmxnet3_cmd_ring_desc_avail(&tq->tx_ring) == 0) {
> > + tq->stats.tx_ring_full++;
> > + ret = -ENOMEM;
> > + goto exit;
> > + }
> > +
> > + skb = __netdev_alloc_skb(adapter->netdev, xdpf->len, GFP_KERNEL);
> > + if (unlikely(!skb))
> > + goto exit;
> > +
>
> Any reason for this not to return an error. Seems like this might be
> better off being the ENOMEM w/ the no descriptor case being something
> like ENOSPC.
my mistake, will fix it, thanks!
>
> Also at some point you may want to look at supporting handling page
> fragments or XDP frames as allocating an skb for an XDP transmit can be
> expensive.
OK! I haven't thought about that yet, will keep it in mind.
>
> > + memcpy(skb->data, xdpf->data, xdpf->len);
> > + dw2 = (tq->tx_ring.gen ^ 0x1) << VMXNET3_TXD_GEN_SHIFT;
> > + dw2 |= xdpf->len;
> > + ctx.sop_txd = tq->tx_ring.base + tq->tx_ring.next2fill;
> > + gdesc = ctx.sop_txd;
> > +
> > + buf_size = xdpf->len;
> > + tbi = tq->buf_info + tq->tx_ring.next2fill;
> > + tbi->map_type = VMXNET3_MAP_SINGLE;
> > + tbi->dma_addr = dma_map_single(&adapter->pdev->dev,
> > + skb->data, buf_size,
> > + DMA_TO_DEVICE);
> > + if (dma_mapping_error(&adapter->pdev->dev, tbi->dma_addr)) {
> > + ret = -EFAULT;
> > + goto exit;
> > + }
>
> Don't you need to free the skb here?
Yes, will fix it.
>
> > + tbi->len = buf_size;
> > +
> > + gdesc = tq->tx_ring.base + tq->tx_ring.next2fill;
> > + WARN_ON_ONCE(gdesc->txd.gen == tq->tx_ring.gen);
> > +
> > + gdesc->txd.addr = cpu_to_le64(tbi->dma_addr);
> > + gdesc->dword[2] = cpu_to_le32(dw2);
> > +
> > + /* Setup the EOP desc */
> > + gdesc->dword[3] = cpu_to_le32(VMXNET3_TXD_CQ | VMXNET3_TXD_EOP);
> > +
> > + gdesc->txd.om = 0;
> > + gdesc->txd.msscof = 0;
> > + gdesc->txd.hlen = 0;
> > + gdesc->txd.ti = 0;
> > +
> > + tx_num_deferred = le32_to_cpu(tq->shared->txNumDeferred);
> > + tq->shared->txNumDeferred += 1;
> > + tx_num_deferred++;
> > +
> > + vmxnet3_cmd_ring_adv_next2fill(&tq->tx_ring);
> > +
> > + /* set the last buf_info for the pkt */
> > + tbi->skb = skb;
> > + tbi->sop_idx = ctx.sop_txd - tq->tx_ring.base;
> > +
> > + dma_wmb();
> > + gdesc->dword[2] = cpu_to_le32(le32_to_cpu(gdesc->dword[2]) ^
> > + VMXNET3_TXD_GEN);
> > + if (tx_num_deferred >= le32_to_cpu(tq->shared->txThreshold)) {
> > + tq->shared->txNumDeferred = 0;
> > + VMXNET3_WRITE_BAR0_REG(adapter,
> > + VMXNET3_REG_TXPROD + tq->qid * 8,
> > + tq->tx_ring.next2fill);
> > + }
> > +exit:
> > + return ret;
> > +}
> > +
> > +static int
> > +vmxnet3_xdp_xmit_back(struct vmxnet3_adapter *adapter,
> > + struct xdp_frame *xdpf)
> > +{
> > + struct vmxnet3_tx_queue *tq;
> > + struct netdev_queue *nq;
> > + int err = 0, cpu;
> > + int tq_number;
> > +
> > + tq_number = adapter->num_tx_queues;
> > + cpu = smp_processor_id();
> > + tq = &adapter->tx_queue[cpu % tq_number];
> > + if (tq->stopped)
> > + return -ENETDOWN;
> > +
> > + nq = netdev_get_tx_queue(adapter->netdev, tq->qid);
> > +
> > + __netif_tx_lock(nq, cpu);
> > + err = vmxnet3_xdp_xmit_frame(adapter, xdpf, tq);
> > + if (err)
> > + goto exit;
> > +
> > +exit:
>
> What is the point of the label/goto? I don't see another spot that
> references it.
Thanks, will fix it!
>
> > + __netif_tx_unlock(nq);
> > + return err;
> > +}
> > +
> > +int
> > +vmxnet3_xdp_xmit(struct net_device *dev,
> > + int n, struct xdp_frame **frames, u32 flags)
> > +{
> > + struct vmxnet3_adapter *adapter;
> > + struct vmxnet3_tx_queue *tq;
> > + struct netdev_queue *nq;
> > + int i, err, cpu;
> > + int nxmit = 0;
> > + int tq_number;
> > +
> > + adapter = netdev_priv(dev);
> > +
> > + if (unlikely(test_bit(VMXNET3_STATE_BIT_QUIESCED, &adapter->state)))
> > + return -ENETDOWN;
> > + if (unlikely(test_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state)))
> > + return -EINVAL;
> > +
> > + tq_number = adapter->num_tx_queues;
> > + cpu = smp_processor_id();
> > + tq = &adapter->tx_queue[cpu % tq_number];
> > + if (tq->stopped)
> > + return -ENETDOWN;
> > +
> > + nq = netdev_get_tx_queue(adapter->netdev, tq->qid);
> > +
> > + __netif_tx_lock(nq, cpu);
> > + for (i = 0; i < n; i++) {
> > + err = vmxnet3_xdp_xmit_frame(adapter, frames[i], tq);
> > + if (err) {
> > + tq->stats.xdp_xmit_err++;
> > + break;
> > + }
> > + nxmit++;
> > + }
> > +
> > + tq->stats.xdp_xmit += nxmit;
> > + __netif_tx_unlock(nq);
> > +
>
> Are you doing anything to free the frames after you transmit them? If I
> am not mistaken you are just copying them over into skbs aren't you, so
> what is freeing the frames after that?
The frames will be free at vmxnet3_tq_cleanup() at dev_kfree_skb_any(tbi->skb);
Because at the vmxnet3_xdp_xmit_frame the allocated skb is saved at tbi->skb,
so it can be freed at tq cleanup.
>
> > + return nxmit;
> > +}
> > +
> > +static int
> > +__vmxnet3_run_xdp(struct vmxnet3_rx_queue *rq, void *data, int data_len,
> > + int headroom, int frame_sz, bool *need_xdp_flush,
> > + struct sk_buff *skb)
> > +{
> > + struct xdp_frame *xdpf;
> > + void *buf_hard_start;
> > + struct xdp_buff xdp;
> > + struct page *page;
> > + void *orig_data;
> > + int err, delta;
> > + int delta_len;
> > + u32 act;
> > +
> > + buf_hard_start = data;
> > + xdp_init_buff(&xdp, frame_sz, &rq->xdp_rxq);
> > + xdp_prepare_buff(&xdp, buf_hard_start, headroom, data_len, true);
> > + orig_data = xdp.data;
> > +
> > + act = bpf_prog_run_xdp(rq->xdp_bpf_prog, &xdp);
> > + rq->stats.xdp_packets++;
> > +
> > + switch (act) {
> > + case XDP_DROP:
> > + rq->stats.xdp_drops++;
> > + break;
> > + case XDP_PASS:
> > + /* bpf prog might change len and data position.
> > + * dataring does not use skb so not support this.
> > + */
> > + delta = xdp.data - orig_data;
> > + delta_len = (xdp.data_end - xdp.data) - data_len;
> > + if (skb) {
> > + skb_reserve(skb, delta);
> > + skb_put(skb, delta_len);
> > + }
> > + break;
> > + case XDP_TX:
> > + xdpf = xdp_convert_buff_to_frame(&xdp);
> > + if (!xdpf ||
> > + vmxnet3_xdp_xmit_back(rq->adapter, xdpf)) {
> > + rq->stats.xdp_drops++;
> > + } else {
> > + rq->stats.xdp_tx++;
> > + }
> > + break;
> > + case XDP_ABORTED:
> > + trace_xdp_exception(rq->adapter->netdev, rq->xdp_bpf_prog,
> > + act);
> > + rq->stats.xdp_aborted++;
> > + break;
> > + case XDP_REDIRECT:
> > + page = alloc_page(GFP_ATOMIC);
> > + if (!page) {
> > + rq->stats.rx_buf_alloc_failure++;
> > + return XDP_DROP;
> > + }
>
> So I think I see the problem I had questions about here. If I am not
> mistaken you are copying the buffer to this page, and then copying this
> page to an skb right? I think you might be better off just writing off
> the Tx/Redirect pages and letting them go through their respective
> paths and just allocating new pages instead assuming these pages were
> consumed.
I'm not sure I understand, can you elaborate?
For XDP_TX, I'm doing 1 extra copy, copying to the newly allocated skb
in vmxnet3_xdp_xmit_back.
For XDP_REDIREC, I allocate a page and copy to the page and call
xdp_do_redirect, there is no copying to skb again. If I don't allocate a
new page, it always crashes at
[62020.425932] BUG: Bad page state in process cpumap/0/map:29 pfn:107548
[62020.440905] kernel BUG at include/linux/mm.h:757!
VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
Thanks!
William
^ permalink raw reply
* Re: [PATCH] net: ks8851: Drop IRQ threading
From: Dmitry Torokhov @ 2022-12-16 22:32 UTC (permalink / raw)
To: Marek Vasut
Cc: Eric Dumazet, netdev, David S. Miller, Geoff Levand,
Jakub Kicinski, Linus Walleij, Paolo Abeni, Petr Machata,
Wolfram Sang
In-Reply-To: <7a50a241-0a93-3e44-bcc7-b9e07c62d616@denx.de>
On Fri, Dec 16, 2022 at 11:19:27PM +0100, Marek Vasut wrote:
> On 12/16/22 22:54, Dmitry Torokhov wrote:
> > On Fri, Dec 16, 2022 at 02:23:04PM +0100, Eric Dumazet wrote:
> > > On Fri, Dec 16, 2022 at 1:47 PM Marek Vasut <marex@denx.de> wrote:
> > > >
> > > > Request non-threaded IRQ in the KSZ8851 driver, this fixes the following warning:
> > > > "
> > > > NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
> > >
> > > This changelog is a bit terse.
> > >
> > > Why can other drivers use request_threaded_irq(), but not this one ?
> >
> > This one actually *has* to use threading, as (as far as I can see) the
> > "lock" that is being taken in ks8851_irq for the SPI variant of the
> > device is actually a mutex, so we have to be able to sleep in the
> > interrupt handler...
>
> So maybe we should use threaded one for the SPI variant and non-threaded one
> for the Parallel bus variant ?
I do not think the threading itself is the issue. I did a quick search
and "Non-RCU local softirq work is pending" seems to be a somewhat
common issue in network drivers. I think you should follow for example
thread in https://lore.kernel.org/all/87y28b9nyn.ffs@tglx/t/ and collect
the trace data and bug tglx and Paul. I see you are even on CC in that
thread...
Thanks.
--
Dmitry
^ permalink raw reply
* Re: [PATCH net] devlink: protect devlink dump by the instance lock
From: Jacob Keller @ 2022-12-16 22:32 UTC (permalink / raw)
To: Jiri Pirko, Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, jiri, moshe
In-Reply-To: <Y5w3y1BU8cLAMTny@nanopsycho>
On 12/16/2022 1:18 AM, Jiri Pirko wrote:
> Fri, Dec 16, 2022 at 05:44:47AM CET, kuba@kernel.org wrote:
>> On Thu, 15 Dec 2022 20:41:22 -0800 Jakub Kicinski wrote:
>>> Take the instance lock around devlink_nl_fill() when dumping,
>>> doit takes it already.
>>>
>>> We are only dumping basic info so in the worst case we were risking
>>> data races around the reload statistics. Also note that the reloads
>>> themselves had not been under the instance lock until recently, so
>>> the selection of the Fixes tag is inherently questionable.
>>>
>>> Fixes: a254c264267e ("devlink: Add reload stats")
>>
>> On second thought, the drivers can't call reload, so until we got rid
>> of the big bad mutex there could have been no race. I'll swap the tag
>> for:
>>
>> Fixes: d3efc2a6a6d8 ("net: devlink: remove devlink_mutex")
>>
>> when/if applying.
>
> You are right.
>
> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
>
> Thanks!
Agreed!
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
^ permalink raw reply
* Re: [PATCH] net: ks8851: Drop IRQ threading
From: Marek Vasut @ 2022-12-16 22:19 UTC (permalink / raw)
To: Dmitry Torokhov, Eric Dumazet
Cc: netdev, David S. Miller, Geoff Levand, Jakub Kicinski,
Linus Walleij, Paolo Abeni, Petr Machata, Wolfram Sang
In-Reply-To: <Y5zpMILXRnW2+dBU@google.com>
On 12/16/22 22:54, Dmitry Torokhov wrote:
> On Fri, Dec 16, 2022 at 02:23:04PM +0100, Eric Dumazet wrote:
>> On Fri, Dec 16, 2022 at 1:47 PM Marek Vasut <marex@denx.de> wrote:
>>>
>>> Request non-threaded IRQ in the KSZ8851 driver, this fixes the following warning:
>>> "
>>> NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
>>
>> This changelog is a bit terse.
>>
>> Why can other drivers use request_threaded_irq(), but not this one ?
>
> This one actually *has* to use threading, as (as far as I can see) the
> "lock" that is being taken in ks8851_irq for the SPI variant of the
> device is actually a mutex, so we have to be able to sleep in the
> interrupt handler...
So maybe we should use threaded one for the SPI variant and non-threaded
one for the Parallel bus variant ?
^ permalink raw reply
* [PATCH] octeontx2_pf: Select NET_DEVLINK when enabling OCTEONTX2_PF
From: Paul Gazzillo @ 2022-12-16 21:54 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Randy Dunlap, Zheng Bin, Yang Yingliang, Suman Ghosh, netdev,
linux-kernel
Cc: Paul Gazzillo
When using COMPILE_TEST, the driver controlled by OCTEONTX2_PF does
not select NET_DEVLINK while the related OCTEONTX2_AF driver does.
This means that when OCTEONTX2_PF is enabled from a default
configuration, linker errors will occur due to undefined references to
code controlled by NET_DEVLINK.
1. make.cross ARCH=x86_64 defconfig
2. make.cross ARCH=x86_64 menuconfig
3. Enable COMPILE_TEST
General setup --->
Compile also drivers which will not load
4. Enable OCTEONTX2_PF
Device Drivers --->
Network device support --->
Ethernet driver support --->
Marvell OcteonTX2 NIC Physical Function driver
5. Exit and save configuration. NET_DEVLINK will still be disabled.
6. make.cross ARCH=x86_64 several linker errors, for example,
ld: drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.o:
in function `otx2_register_dl':
otx2_devlink.c:(.text+0x142): undefined reference to `devlink_alloc_ns'
This fix adds "select NET_DEVLINK" link to OCTEONTX2_PF's Kconfig
specification to match OCTEONTX2_AF.
Signed-off-by: Paul Gazzillo <paul@pgazz.com>
---
drivers/net/ethernet/marvell/octeontx2/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig b/drivers/net/ethernet/marvell/octeontx2/Kconfig
index 3f982ccf2c85..639893d87055 100644
--- a/drivers/net/ethernet/marvell/octeontx2/Kconfig
+++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig
@@ -31,6 +31,7 @@ config NDC_DIS_DYNAMIC_CACHING
config OCTEONTX2_PF
tristate "Marvell OcteonTX2 NIC Physical Function driver"
select OCTEONTX2_MBOX
+ select NET_DEVLINK
depends on (64BIT && COMPILE_TEST) || ARM64
depends on PCI
depends on PTP_1588_CLOCK_OPTIONAL
--
2.25.1
^ permalink raw reply related
* Re: [PATCH] net: ks8851: Drop IRQ threading
From: Dmitry Torokhov @ 2022-12-16 21:54 UTC (permalink / raw)
To: Eric Dumazet
Cc: Marek Vasut, netdev, David S. Miller, Geoff Levand,
Jakub Kicinski, Linus Walleij, Paolo Abeni, Petr Machata,
Wolfram Sang
In-Reply-To: <CANn89i+08T_1pDZ-FWikarVq=5q4MVAx=+mRkSqeinfb10OdOg@mail.gmail.com>
On Fri, Dec 16, 2022 at 02:23:04PM +0100, Eric Dumazet wrote:
> On Fri, Dec 16, 2022 at 1:47 PM Marek Vasut <marex@denx.de> wrote:
> >
> > Request non-threaded IRQ in the KSZ8851 driver, this fixes the following warning:
> > "
> > NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
>
> This changelog is a bit terse.
>
> Why can other drivers use request_threaded_irq(), but not this one ?
This one actually *has* to use threading, as (as far as I can see) the
"lock" that is being taken in ks8851_irq for the SPI variant of the
device is actually a mutex, so we have to be able to sleep in the
interrupt handler...
>
>
> > "
> >
> > Signed-off-by: Marek Vasut <marex@denx.de>
> > ---
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Geoff Levand <geoff@infradead.org>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Linus Walleij <linus.walleij@linaro.org>
> > Cc: Paolo Abeni <pabeni@redhat.com>
> > Cc: Petr Machata <petrm@nvidia.com>
> > Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
> > To: netdev@vger.kernel.org
> > ---
> > drivers/net/ethernet/micrel/ks8851_common.c | 4 +---
> > 1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c
> > index cfbc900d4aeb9..1eba4ba0b95cf 100644
> > --- a/drivers/net/ethernet/micrel/ks8851_common.c
> > +++ b/drivers/net/ethernet/micrel/ks8851_common.c
> > @@ -443,9 +443,7 @@ static int ks8851_net_open(struct net_device *dev)
> > unsigned long flags;
> > int ret;
> >
> > - ret = request_threaded_irq(dev->irq, NULL, ks8851_irq,
> > - IRQF_TRIGGER_LOW | IRQF_ONESHOT,
> > - dev->name, ks);
> > + ret = request_irq(dev->irq, ks8851_irq, IRQF_TRIGGER_LOW, dev->name, ks);
> > if (ret < 0) {
> > netdev_err(dev, "failed to get irq\n");
> > return ret;
> > --
> > 2.35.1
> >
Thanks.
--
Dmitry
^ permalink raw reply
* Re: [PATCH net-next v3] net: ipa: Remove redundant comparison with zero
From: Kang Minchul @ 2022-12-16 21:39 UTC (permalink / raw)
To: Alex Elder
Cc: Alex Elder, David S . Miller, Jakub Kicinski, netdev,
linux-kernel
In-Reply-To: <941d8fa9-9c7b-8e06-e87a-b1c9ed80a639@ieee.org>
2022년 12월 17일 (토) 오전 6:34, Alex Elder <elder@ieee.org>님이 작성:
>
> On 12/16/22 3:11 PM, Kang Minchul wrote:
> > platform_get_irq_byname() returns non-zero IRQ number on success,
> > and negative error number on failure.
> >
> > So comparing return value with zero is unnecessary.
> >
> > Signed-off-by: Kang Minchul <tegongkang@gmail.com>
> > ---
> > drivers/net/ipa/gsi.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ipa/gsi.c b/drivers/net/ipa/gsi.c
> > index bea2da1c4c51..e05e94bd9ba0 100644
> > --- a/drivers/net/ipa/gsi.c
> > +++ b/drivers/net/ipa/gsi.c
> > @@ -1302,7 +1302,7 @@ static int gsi_irq_init(struct gsi *gsi, struct platform_device *pdev)
> > int ret;
> >
> > ret = platform_get_irq_byname(pdev, "gsi");
> > - if (ret <= 0)
> > + if (ret < 0)
> > return ret ? : -EINVAL;
>
> In doing this, you assume platform_get_irq_byname() never
> returns 0. I accept that assumption now. But in that case,
> ret will *always* be non-zero if the branch is taken, so the
> next line should simply be:
>
> return ret;
>
> There are two other places in the IPA driver that follow exactly
> the same pattern of getting the IRQ number and handling the
> possibility that the value returned is (erroneously) 0. If
> you're making this change in one place, please do so in all
> of them.
>
> Frankly though, I think this change adds little value, and I
> would be content to just leave everything as-is...
>
> -Alex
>
> >
> > gsi->irq = ret;
>
Hmm, yes.
I think you're right.
I don't think this code is worth much either.
Please ignore this patch and leave everything as-is.
Thanks for your feedback though :)
Kang Minchul
^ permalink raw reply
* Re: [PATCH net-next v3] net: ipa: Remove redundant comparison with zero
From: Alex Elder @ 2022-12-16 21:34 UTC (permalink / raw)
To: Kang Minchul, Alex Elder, David S . Miller, Jakub Kicinski
Cc: netdev, linux-kernel
In-Reply-To: <20221216211124.154459-1-tegongkang@gmail.com>
On 12/16/22 3:11 PM, Kang Minchul wrote:
> platform_get_irq_byname() returns non-zero IRQ number on success,
> and negative error number on failure.
>
> So comparing return value with zero is unnecessary.
>
> Signed-off-by: Kang Minchul <tegongkang@gmail.com>
> ---
> drivers/net/ipa/gsi.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ipa/gsi.c b/drivers/net/ipa/gsi.c
> index bea2da1c4c51..e05e94bd9ba0 100644
> --- a/drivers/net/ipa/gsi.c
> +++ b/drivers/net/ipa/gsi.c
> @@ -1302,7 +1302,7 @@ static int gsi_irq_init(struct gsi *gsi, struct platform_device *pdev)
> int ret;
>
> ret = platform_get_irq_byname(pdev, "gsi");
> - if (ret <= 0)
> + if (ret < 0)
> return ret ? : -EINVAL;
In doing this, you assume platform_get_irq_byname() never
returns 0. I accept that assumption now. But in that case,
ret will *always* be non-zero if the branch is taken, so the
next line should simply be:
return ret;
There are two other places in the IPA driver that follow exactly
the same pattern of getting the IRQ number and handling the
possibility that the value returned is (erroneously) 0. If
you're making this change in one place, please do so in all
of them.
Frankly though, I think this change adds little value, and I
would be content to just leave everything as-is...
-Alex
>
> gsi->irq = ret;
^ permalink raw reply
* Re: [PATCH 4.19 000/338] 4.19.238-rc1 review
From: Trond Myklebust @ 2022-12-16 21:24 UTC (permalink / raw)
To: Michael Trimarchi
Cc: Neil Brown, Naresh Kamboju, Greg Kroah-Hartman,
Linux Kernel Mailing List, stable@vger.kernel.org, Linus Torvalds,
Andrew Morton, linux@roeck-us.net, shuah@kernel.org,
patches@kernelci.org, lkft-triage@lists.linaro.org, pavel@denx.de,
jonathanh@nvidia.com, f.fainelli@gmail.com,
sudipm.mukherjee@gmail.com, slade@sladewatkins.com, Netdev,
David S. Miller, Jakub Kicinski, Paolo Abeni, Trond Myklebust,
linux-nfs@vger.kernel.org, Anna Schumaker
In-Reply-To: <Y5y5n8JoGZNt1otY@panicking>
> On Dec 16, 2022, at 13:31, Michael Trimarchi <michael@amarulasolutions.com> wrote:
>
> [You don't often get email from michael@amarulasolutions.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
>
> Hi Neil
>
> On Tue, Apr 26, 2022 at 12:29:55PM +1000, NeilBrown wrote:
>> On Thu, 21 Apr 2022, Naresh Kamboju wrote:
>>> On Mon, 18 Apr 2022 at 14:09, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>>>>
>>>> On Thu, 14 Apr 2022 at 18:45, Greg Kroah-Hartman
>>>> <gregkh@linuxfoundation.org> wrote:
>>>>>
>>>>> This is the start of the stable review cycle for the 4.19.238 release.
>>>>> There are 338 patches in this series, all will be posted as a response
>>>>> to this one. If anyone has any issues with these being applied, please
>>>>> let me know.
>>>>>
>>>>> Responses should be made by Sat, 16 Apr 2022 11:07:54 +0000.
>>>>> Anything received after that time might be too late.
>>>>>
>>>>> The whole patch series can be found in one patch at:
>>>>> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.238-rc1.gz
>>>>> or in the git tree and branch at:
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
>>>>> and the diffstat can be found below.
>>>>>
>>>>> thanks,
>>>>>
>>>>> greg k-h
>>>>
>>>>
>>>> Following kernel warning noticed on arm64 Juno-r2 while booting
>>>> stable-rc 4.19.238. Here is the full test log link [1].
>>>>
>>>> [ 0.000000] Booting Linux on physical CPU 0x0000000100 [0x410fd033]
>>>> [ 0.000000] Linux version 4.19.238 (tuxmake@tuxmake) (gcc version
>>>> 11.2.0 (Debian 11.2.0-18)) #1 SMP PREEMPT @1650206156
>>>> [ 0.000000] Machine model: ARM Juno development board (r2)
>>>> <trim>
>>>> [ 18.499895] ================================
>>>> [ 18.504172] WARNING: inconsistent lock state
>>>> [ 18.508451] 4.19.238 #1 Not tainted
>>>> [ 18.511944] --------------------------------
>>>> [ 18.516222] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
>>>> [ 18.522242] kworker/u12:3/60 [HC0[0]:SC0[0]:HE1:SE1] takes:
>>>> [ 18.527826] (____ptrval____)
>>>> (&(&xprt->transport_lock)->rlock){+.?.}, at: xprt_destroy+0x70/0xe0
>>>> [ 18.536648] {IN-SOFTIRQ-W} state was registered at:
>>>> [ 18.541543] lock_acquire+0xc8/0x23c
>>
>> Prior to Linux 5.3, ->transport_lock needs spin_lock_bh() and
>> spin_unlock_bh().
>>
>
> We get the same deadlock or similar one and we think that
> can be connected to this thread on 4.19.243. For us is a bit
> difficult to hit but we are going to apply this change
>
> net: sunrpc: Fix deadlock in xprt_destroy
>
> Prior to Linux 5.3, ->transport_lock needs spin_lock_bh() and
> spin_unlock_bh().
>
> Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com>
> ---
> net/sunrpc/xprt.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index d05fa7c36d00..b1abf4848bbc 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -1550,9 +1550,9 @@ static void xprt_destroy(struct rpc_xprt *xprt)
> * is cleared. We use ->transport_lock to ensure the mod_timer()
> * can only run *before* del_time_sync(), never after.
> */
> - spin_lock(&xprt->transport_lock);
> + spin_lock_bh(&xprt->transport_lock);
> del_timer_sync(&xprt->timer);
> - spin_unlock(&xprt->transport_lock);
> + spin_unlock_bh(&xprt->transport_lock);
>
> /*
> * Destroy sockets etc from the system workqueue so they can
> —
Agreed. When backporting to kernels that are older than 5.3.x, the transport lock needs to be taken using the bh-safe spin lock variants.
Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com <mailto:trond.myklebust@hammerspace.com>>
_________________________________
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
^ permalink raw reply
* [PATCH net-next v3] net: ipa: Remove redundant comparison with zero
From: Kang Minchul @ 2022-12-16 21:11 UTC (permalink / raw)
To: Alex Elder, David S . Miller, Jakub Kicinski
Cc: netdev, linux-kernel, Kang Minchul
platform_get_irq_byname() returns non-zero IRQ number on success,
and negative error number on failure.
So comparing return value with zero is unnecessary.
Signed-off-by: Kang Minchul <tegongkang@gmail.com>
---
drivers/net/ipa/gsi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ipa/gsi.c b/drivers/net/ipa/gsi.c
index bea2da1c4c51..e05e94bd9ba0 100644
--- a/drivers/net/ipa/gsi.c
+++ b/drivers/net/ipa/gsi.c
@@ -1302,7 +1302,7 @@ static int gsi_irq_init(struct gsi *gsi, struct platform_device *pdev)
int ret;
ret = platform_get_irq_byname(pdev, "gsi");
- if (ret <= 0)
+ if (ret < 0)
return ret ? : -EINVAL;
gsi->irq = ret;
--
2.34.1
^ permalink raw reply related
* Re: [PATCH] ath9k: use proper statements in conditionals
From: Toke Høiland-Jørgensen @ 2022-12-16 21:06 UTC (permalink / raw)
To: Kalle Valo
Cc: Arnd Bergmann, Pavel Skripkin, Arnd Bergmann, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Tetsuo Handa,
linux-wireless, netdev, linux-kernel
In-Reply-To: <87359fkt1q.fsf@kernel.org>
Kalle Valo <kvalo@kernel.org> writes:
> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>
>> Arnd Bergmann <arnd@kernel.org> writes:
>>
>>> From: Arnd Bergmann <arnd@arndb.de>
>>>
>>> A previous cleanup patch accidentally broke some conditional
>>> expressions by replacing the safe "do {} while (0)" constructs
>>> with empty macros. gcc points this out when extra warnings
>>> are enabled:
>>>
>>> drivers/net/wireless/ath/ath9k/hif_usb.c: In function 'ath9k_skb_queue_complete':
>>> drivers/net/wireless/ath/ath9k/hif_usb.c:251:57: error: suggest braces around empty body in an 'else' statement [-Werror=empty-body]
>>> 251 | TX_STAT_INC(hif_dev, skb_failed);
>>>
>>> Make both sets of macros proper expressions again.
>>>
>>> Fixes: d7fc76039b74 ("ath9k: htc: clean up statistics macros")
>>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>>
>> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
>
> I'll queue this to v6.2.
Sounds good, thanks!
-Toke
^ permalink raw reply
* RE: [PATCH ethtool-next v1 0/3] add netlink support for rss get
From: Mogilappagari, Sudheer @ 2022-12-16 21:06 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev@vger.kernel.org, mkubecek@suse.cz, Samudrala, Sridhar,
Nguyen, Anthony L
In-Reply-To: <20221214195544.65896858@kernel.org>
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Subject: Re: [PATCH ethtool-next v1 0/3] add netlink support for rss
> get
>
> On Wed, 14 Dec 2022 15:54:15 -0800 Sudheer Mogilappagari wrote:
> > These patches add netlink based handler to fetch RSS information
> using
> > "ethtool -x <eth> [context %d]" command.
>
> Can we please support JSON output from the start?
> I can't stress enough how useful json output is in practice.
Sure. Will send out v2 with JSON support.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox