* [PATCH AUTOSEL 6.6 27/31] tsnep: Fix tsnep_request_irq() format-overflow warning
From: Sasha Levin @ 2023-11-07 12:06 UTC
To: linux-kernel, stable
Cc: Gerhard Engleder, kernel test robot, Jacob Keller, Jakub Kicinski,
Sasha Levin, davem, edumazet, pabeni, maciej.fijalkowski, hawk,
alexanderduyck, netdev
In-Reply-To: <20231107120704.3756327-1-sashal@kernel.org>
From: Gerhard Engleder <gerhard@engleder-embedded.com>
[ Upstream commit 00e984cb986b31e9313745e51daceaa1e1eb7351 ]
The compiler warns about a possible format overflow in tsnep_request_irq():
drivers/net/ethernet/engleder/tsnep_main.c:884:55: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=]
sprintf(queue->name, "%s-rx-%d", name,
^
drivers/net/ethernet/engleder/tsnep_main.c:881:55: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=]
sprintf(queue->name, "%s-tx-%d", name,
^
drivers/net/ethernet/engleder/tsnep_main.c:878:49: warning: '-txrx-' directive writing 6 bytes into a region of size between 5 and 25 [-Wformat-overflow=]
sprintf(queue->name, "%s-txrx-%d", name,
^~~~~~
Actually, an overflow cannot happen. The name is limited to IFNAMSIZ
because netdev_name() is called during ndo_open(). queue_index is a
single character because fewer than 10 queues are supported.
Fix the warning by using snprintf(). Additionally, increase the buffer
to 32 bytes, because those 7 additional bytes were unused anyway.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310182028.vmDthIUa-lkp@intel.com/
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231023183856.58373-1-gerhard@engleder-embedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/engleder/tsnep.h | 2 +-
drivers/net/ethernet/engleder/tsnep_main.c | 12 ++++++------
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/engleder/tsnep.h b/drivers/net/ethernet/engleder/tsnep.h
index 6e14c918e3fb7..f188fba021a62 100644
--- a/drivers/net/ethernet/engleder/tsnep.h
+++ b/drivers/net/ethernet/engleder/tsnep.h
@@ -143,7 +143,7 @@ struct tsnep_rx {
struct tsnep_queue {
struct tsnep_adapter *adapter;
- char name[IFNAMSIZ + 9];
+ char name[IFNAMSIZ + 16];
struct tsnep_tx *tx;
struct tsnep_rx *rx;
diff --git a/drivers/net/ethernet/engleder/tsnep_main.c b/drivers/net/ethernet/engleder/tsnep_main.c
index 8b992dc9bb52b..38da2d6c250e6 100644
--- a/drivers/net/ethernet/engleder/tsnep_main.c
+++ b/drivers/net/ethernet/engleder/tsnep_main.c
@@ -1779,14 +1779,14 @@ static int tsnep_request_irq(struct tsnep_queue *queue, bool first)
dev = queue->adapter;
} else {
if (queue->tx && queue->rx)
- sprintf(queue->name, "%s-txrx-%d", name,
- queue->rx->queue_index);
+ snprintf(queue->name, sizeof(queue->name), "%s-txrx-%d",
+ name, queue->rx->queue_index);
else if (queue->tx)
- sprintf(queue->name, "%s-tx-%d", name,
- queue->tx->queue_index);
+ snprintf(queue->name, sizeof(queue->name), "%s-tx-%d",
+ name, queue->tx->queue_index);
else
- sprintf(queue->name, "%s-rx-%d", name,
- queue->rx->queue_index);
+ snprintf(queue->name, sizeof(queue->name), "%s-rx-%d",
+ name, queue->rx->queue_index);
handler = tsnep_irq_txrx;
dev = queue;
}
--
2.42.0
* [PATCH AUTOSEL 6.6 21/31] vsock: read from socket's error queue
From: Sasha Levin @ 2023-11-07 12:06 UTC
To: linux-kernel, stable
Cc: Arseniy Krasnov, Stefano Garzarella, David S . Miller,
Sasha Levin, edumazet, kuba, pabeni, dhowells, alexander,
virtualization, netdev
In-Reply-To: <20231107120704.3756327-1-sashal@kernel.org>
From: Arseniy Krasnov <avkrasnov@salutedevices.com>
[ Upstream commit 49dbe25adac42d3e06f65d1420946bec65896222 ]
This adds handling of the MSG_ERRQUEUE input flag in the receive call. This
flag is used to read the socket's error queue instead of the data queue. A
possible use of the error queue is receiving completions for transmissions
sent with the MSG_ZEROCOPY flag. This patch also adds new defines:
'SOL_VSOCK' and 'VSOCK_RECVERR'.
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/socket.h | 1 +
include/uapi/linux/vm_sockets.h | 17 +++++++++++++++++
net/vmw_vsock/af_vsock.c | 6 ++++++
3 files changed, 24 insertions(+)
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 39b74d83c7c4a..cfcb7e2c3813f 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -383,6 +383,7 @@ struct ucred {
#define SOL_MPTCP 284
#define SOL_MCTP 285
#define SOL_SMC 286
+#define SOL_VSOCK 287
/* IPX options */
#define IPX_TYPE 1
diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index c60ca33eac594..ed07181d4eff9 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -191,4 +191,21 @@ struct sockaddr_vm {
#define IOCTL_VM_SOCKETS_GET_LOCAL_CID _IO(7, 0xb9)
+/* MSG_ZEROCOPY notifications are encoded in the standard error format,
+ * sock_extended_err. See Documentation/networking/msg_zerocopy.rst in
+ * kernel source tree for more details.
+ */
+
+/* 'cmsg_level' field value of 'struct cmsghdr' for notification parsing
+ * when MSG_ZEROCOPY flag is used on transmissions.
+ */
+
+#define SOL_VSOCK 287
+
+/* 'cmsg_type' field value of 'struct cmsghdr' for notification parsing
+ * when MSG_ZEROCOPY flag is used on transmissions.
+ */
+
+#define VSOCK_RECVERR 1
+
#endif /* _UAPI_VM_SOCKETS_H */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 020cf17ab7e47..ccd8cefeea7ba 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -89,6 +89,7 @@
#include <linux/types.h>
#include <linux/bitops.h>
#include <linux/cred.h>
+#include <linux/errqueue.h>
#include <linux/init.h>
#include <linux/io.h>
#include <linux/kernel.h>
@@ -110,6 +111,7 @@
#include <linux/workqueue.h>
#include <net/sock.h>
#include <net/af_vsock.h>
+#include <uapi/linux/vm_sockets.h>
static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
static void vsock_sk_destruct(struct sock *sk);
@@ -2134,6 +2136,10 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
int err;
sk = sock->sk;
+
+ if (unlikely(flags & MSG_ERRQUEUE))
+ return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, VSOCK_RECVERR);
+
vsk = vsock_sk(sk);
err = 0;
--
2.42.0
* [PATCH AUTOSEL 6.6 20/31] net: sfp: add quirk for FS's 2.5G copper SFP
From: Sasha Levin @ 2023-11-07 12:06 UTC
To: linux-kernel, stable
Cc: Raju Lakkaraju, Paolo Abeni, Sasha Levin, linux, andrew,
hkallweit1, davem, edumazet, kuba, netdev
In-Reply-To: <20231107120704.3756327-1-sashal@kernel.org>
From: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
[ Upstream commit e27aca3760c08b7b05aea71068bd609aa93e7b35 ]
Add a quirk for a copper SFP that identifies itself as "FS" "SFP-2.5G-T".
This module's PHY is inaccessible, and the module can only run at
2500base-X with the host, without negotiation. Add a quirk to enable the
2500base-X interface mode with 2500base-T support and disable autonegotiation.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Link: https://lore.kernel.org/r/20230925080059.266240-1-Raju.Lakkaraju@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/phy/sfp.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index a50038a452507..3679a43f4eb02 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -468,6 +468,9 @@ static const struct sfp_quirk sfp_quirks[] = {
SFP_QUIRK("HUAWEI", "MA5671A", sfp_quirk_2500basex,
sfp_fixup_ignore_tx_fault),
+ // FS 2.5G Base-T
+ SFP_QUIRK_M("FS", "SFP-2.5G-T", sfp_quirk_oem_2_5g),
+
// Lantech 8330-262D-E can operate at 2500base-X, but incorrectly report
// 2500MBd NRZ in their EEPROM
SFP_QUIRK_M("Lantech", "8330-262D-E", sfp_quirk_2500basex),
--
2.42.0
* [PATCH AUTOSEL 6.6 17/31] net: annotate data-races around sk->sk_dst_pending_confirm
From: Sasha Levin @ 2023-11-07 12:06 UTC
To: linux-kernel, stable
Cc: Eric Dumazet, David S . Miller, Sasha Levin, kuba, pabeni,
dsahern, kuniyu, wuyun.abel, leitao, alexander, dhowells, netdev
In-Reply-To: <20231107120704.3756327-1-sashal@kernel.org>
From: Eric Dumazet <edumazet@google.com>
[ Upstream commit eb44ad4e635132754bfbcb18103f1dcb7058aedd ]
This field can be read or written without the socket lock being held.
Add annotations to avoid load/store tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/net/sock.h | 6 +++---
net/core/sock.c | 2 +-
net/ipv4/tcp_output.c | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 97f7fbcbf61ed..7753354d59c0b 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2181,7 +2181,7 @@ static inline void __dst_negative_advice(struct sock *sk)
if (ndst != dst) {
rcu_assign_pointer(sk->sk_dst_cache, ndst);
sk_tx_queue_clear(sk);
- sk->sk_dst_pending_confirm = 0;
+ WRITE_ONCE(sk->sk_dst_pending_confirm, 0);
}
}
}
@@ -2198,7 +2198,7 @@ __sk_dst_set(struct sock *sk, struct dst_entry *dst)
struct dst_entry *old_dst;
sk_tx_queue_clear(sk);
- sk->sk_dst_pending_confirm = 0;
+ WRITE_ONCE(sk->sk_dst_pending_confirm, 0);
old_dst = rcu_dereference_protected(sk->sk_dst_cache,
lockdep_sock_is_held(sk));
rcu_assign_pointer(sk->sk_dst_cache, dst);
@@ -2211,7 +2211,7 @@ sk_dst_set(struct sock *sk, struct dst_entry *dst)
struct dst_entry *old_dst;
sk_tx_queue_clear(sk);
- sk->sk_dst_pending_confirm = 0;
+ WRITE_ONCE(sk->sk_dst_pending_confirm, 0);
old_dst = xchg((__force struct dst_entry **)&sk->sk_dst_cache, dst);
dst_release(old_dst);
}
diff --git a/net/core/sock.c b/net/core/sock.c
index 16584e2dd6481..bfaf47b3f3c7c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -600,7 +600,7 @@ struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie)
INDIRECT_CALL_INET(dst->ops->check, ip6_dst_check, ipv4_dst_check,
dst, cookie) == NULL) {
sk_tx_queue_clear(sk);
- sk->sk_dst_pending_confirm = 0;
+ WRITE_ONCE(sk->sk_dst_pending_confirm, 0);
RCU_INIT_POINTER(sk->sk_dst_cache, NULL);
dst_release(dst);
return NULL;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f0723460753c5..9ccfdc825004d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1331,7 +1331,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
skb->destructor = skb_is_tcp_pure_ack(skb) ? __sock_wfree : tcp_wfree;
refcount_add(skb->truesize, &sk->sk_wmem_alloc);
- skb_set_dst_pending_confirm(skb, sk->sk_dst_pending_confirm);
+ skb_set_dst_pending_confirm(skb, READ_ONCE(sk->sk_dst_pending_confirm));
/* Build TCP header and checksum it. */
th = (struct tcphdr *)skb->data;
--
2.42.0
* [PATCH AUTOSEL 6.6 16/31] net: annotate data-races around sk->sk_tx_queue_mapping
From: Sasha Levin @ 2023-11-07 12:06 UTC
To: linux-kernel, stable
Cc: Eric Dumazet, David S . Miller, Sasha Levin, kuba, pabeni, netdev
In-Reply-To: <20231107120704.3756327-1-sashal@kernel.org>
From: Eric Dumazet <edumazet@google.com>
[ Upstream commit 0bb4d124d34044179b42a769a0c76f389ae973b6 ]
This field can be read or written without the socket lock being held.
Add annotations to avoid load/store tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/net/sock.h | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 92f7ea62a9159..97f7fbcbf61ed 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2006,21 +2006,33 @@ static inline void sk_tx_queue_set(struct sock *sk, int tx_queue)
/* sk_tx_queue_mapping accept only upto a 16-bit value */
if (WARN_ON_ONCE((unsigned short)tx_queue >= USHRT_MAX))
return;
- sk->sk_tx_queue_mapping = tx_queue;
+ /* Paired with READ_ONCE() in sk_tx_queue_get() and
+ * other WRITE_ONCE() because socket lock might be not held.
+ */
+ WRITE_ONCE(sk->sk_tx_queue_mapping, tx_queue);
}
#define NO_QUEUE_MAPPING USHRT_MAX
static inline void sk_tx_queue_clear(struct sock *sk)
{
- sk->sk_tx_queue_mapping = NO_QUEUE_MAPPING;
+ /* Paired with READ_ONCE() in sk_tx_queue_get() and
+ * other WRITE_ONCE() because socket lock might be not held.
+ */
+ WRITE_ONCE(sk->sk_tx_queue_mapping, NO_QUEUE_MAPPING);
}
static inline int sk_tx_queue_get(const struct sock *sk)
{
- if (sk && sk->sk_tx_queue_mapping != NO_QUEUE_MAPPING)
- return sk->sk_tx_queue_mapping;
+ if (sk) {
+ /* Paired with WRITE_ONCE() in sk_tx_queue_clear()
+ * and sk_tx_queue_set().
+ */
+ int val = READ_ONCE(sk->sk_tx_queue_mapping);
+ if (val != NO_QUEUE_MAPPING)
+ return val;
+ }
return -1;
}
--
2.42.0
* [PATCH AUTOSEL 6.6 12/31] net: sfp: add quirk for Fiberstone GPON-ONU-34-20BI
From: Sasha Levin @ 2023-11-07 12:05 UTC
To: linux-kernel, stable
Cc: Christian Marangi, Paolo Abeni, Sasha Levin, linux, andrew,
hkallweit1, davem, edumazet, kuba, netdev
In-Reply-To: <20231107120704.3756327-1-sashal@kernel.org>
From: Christian Marangi <ansuelsmth@gmail.com>
[ Upstream commit d387e34fec407f881fdf165b5d7ec128ebff362f ]
Fiberstone GPON-ONU-34-20BI can operate at 2500base-X, but reports 1.2GBd
NRZ in its EEPROM.
The module also requires the ignore tx fault fixup, similar to the Huawei
MA5671A, as it gets disabled on error messages when serial redirection is
enabled.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://lore.kernel.org/r/20230919124720.8210-1-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/phy/sfp.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index 4ecfac2278651..a50038a452507 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -452,6 +452,11 @@ static const struct sfp_quirk sfp_quirks[] = {
// Rollball protocol to talk to the PHY.
SFP_QUIRK_F("FS", "SFP-10G-T", sfp_fixup_fs_10gt),
+ // Fiberstore GPON-ONU-34-20BI can operate at 2500base-X, but report 1.2GBd
+ // NRZ in their EEPROM
+ SFP_QUIRK("FS", "GPON-ONU-34-20BI", sfp_quirk_2500basex,
+ sfp_fixup_ignore_tx_fault),
+
SFP_QUIRK_F("HALNy", "HL-GSFP", sfp_fixup_halny_gsfp),
// HG MXPD-483II-F 2.5G supports 2500Base-X, but incorrectly reports
--
2.42.0
* [PATCH AUTOSEL 6.6 05/31] atl1c: Work around the DMA RX overflow issue
From: Sasha Levin @ 2023-11-07 12:05 UTC
To: linux-kernel, stable
Cc: Sieng-Piaw Liew, Paolo Abeni, Sasha Levin, chris.snook, davem,
edumazet, kuba, horms, pavan.chebbi, trix, ruc_gongyuanjun,
netdev
In-Reply-To: <20231107120704.3756327-1-sashal@kernel.org>
From: Sieng-Piaw Liew <liew.s.piaw@gmail.com>
[ Upstream commit 86565682e9053e5deb128193ea9e88531bbae9cf ]
This is based on alx driver commit 881d0327db37 ("net: alx: Work around
the DMA RX overflow issue").
The alx and atl1c drivers had an RX overflow error, which is why a custom
allocator was created to avoid certain addresses. A simpler workaround was
later created for the alx driver, but not for atl1c due to the lack of a
tester.
Instead of using a custom allocator, check the allocated skb address and
use skb_reserve() to move away from the problematic 0x...fc0 address.
Tested on AR8131 on Acer 4540.
Signed-off-by: Sieng-Piaw Liew <liew.s.piaw@gmail.com>
Link: https://lore.kernel.org/r/20230912010711.12036-1-liew.s.piaw@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/atheros/atl1c/atl1c.h | 3 -
.../net/ethernet/atheros/atl1c/atl1c_main.c | 67 +++++--------------
2 files changed, 16 insertions(+), 54 deletions(-)
diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c.h b/drivers/net/ethernet/atheros/atl1c/atl1c.h
index 43d821fe7a542..63ba64dbb7310 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c.h
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c.h
@@ -504,15 +504,12 @@ struct atl1c_rrd_ring {
u16 next_to_use;
u16 next_to_clean;
struct napi_struct napi;
- struct page *rx_page;
- unsigned int rx_page_offset;
};
/* board specific private data structure */
struct atl1c_adapter {
struct net_device *netdev;
struct pci_dev *pdev;
- unsigned int rx_frag_size;
struct atl1c_hw hw;
struct atl1c_hw_stats hw_stats;
struct mii_if_info mii; /* MII interface info */
diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index 940c5d1ff9cfc..74b78164cf74a 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -483,15 +483,10 @@ static int atl1c_set_mac_addr(struct net_device *netdev, void *p)
static void atl1c_set_rxbufsize(struct atl1c_adapter *adapter,
struct net_device *dev)
{
- unsigned int head_size;
int mtu = dev->mtu;
adapter->rx_buffer_len = mtu > AT_RX_BUF_SIZE ?
roundup(mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN, 8) : AT_RX_BUF_SIZE;
-
- head_size = SKB_DATA_ALIGN(adapter->rx_buffer_len + NET_SKB_PAD + NET_IP_ALIGN) +
- SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
- adapter->rx_frag_size = roundup_pow_of_two(head_size);
}
static netdev_features_t atl1c_fix_features(struct net_device *netdev,
@@ -964,7 +959,6 @@ static void atl1c_init_ring_ptrs(struct atl1c_adapter *adapter)
static void atl1c_free_ring_resources(struct atl1c_adapter *adapter)
{
struct pci_dev *pdev = adapter->pdev;
- int i;
dma_free_coherent(&pdev->dev, adapter->ring_header.size,
adapter->ring_header.desc, adapter->ring_header.dma);
@@ -977,12 +971,6 @@ static void atl1c_free_ring_resources(struct atl1c_adapter *adapter)
kfree(adapter->tpd_ring[0].buffer_info);
adapter->tpd_ring[0].buffer_info = NULL;
}
- for (i = 0; i < adapter->rx_queue_count; ++i) {
- if (adapter->rrd_ring[i].rx_page) {
- put_page(adapter->rrd_ring[i].rx_page);
- adapter->rrd_ring[i].rx_page = NULL;
- }
- }
}
/**
@@ -1754,48 +1742,11 @@ static inline void atl1c_rx_checksum(struct atl1c_adapter *adapter,
skb_checksum_none_assert(skb);
}
-static struct sk_buff *atl1c_alloc_skb(struct atl1c_adapter *adapter,
- u32 queue, bool napi_mode)
-{
- struct atl1c_rrd_ring *rrd_ring = &adapter->rrd_ring[queue];
- struct sk_buff *skb;
- struct page *page;
-
- if (adapter->rx_frag_size > PAGE_SIZE) {
- if (likely(napi_mode))
- return napi_alloc_skb(&rrd_ring->napi,
- adapter->rx_buffer_len);
- else
- return netdev_alloc_skb_ip_align(adapter->netdev,
- adapter->rx_buffer_len);
- }
-
- page = rrd_ring->rx_page;
- if (!page) {
- page = alloc_page(GFP_ATOMIC);
- if (unlikely(!page))
- return NULL;
- rrd_ring->rx_page = page;
- rrd_ring->rx_page_offset = 0;
- }
-
- skb = build_skb(page_address(page) + rrd_ring->rx_page_offset,
- adapter->rx_frag_size);
- if (likely(skb)) {
- skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
- rrd_ring->rx_page_offset += adapter->rx_frag_size;
- if (rrd_ring->rx_page_offset >= PAGE_SIZE)
- rrd_ring->rx_page = NULL;
- else
- get_page(page);
- }
- return skb;
-}
-
static int atl1c_alloc_rx_buffer(struct atl1c_adapter *adapter, u32 queue,
bool napi_mode)
{
struct atl1c_rfd_ring *rfd_ring = &adapter->rfd_ring[queue];
+ struct atl1c_rrd_ring *rrd_ring = &adapter->rrd_ring[queue];
struct pci_dev *pdev = adapter->pdev;
struct atl1c_buffer *buffer_info, *next_info;
struct sk_buff *skb;
@@ -1814,13 +1765,27 @@ static int atl1c_alloc_rx_buffer(struct atl1c_adapter *adapter, u32 queue,
while (next_info->flags & ATL1C_BUFFER_FREE) {
rfd_desc = ATL1C_RFD_DESC(rfd_ring, rfd_next_to_use);
- skb = atl1c_alloc_skb(adapter, queue, napi_mode);
+ /* When DMA RX address is set to something like
+ * 0x....fc0, it will be very likely to cause DMA
+ * RFD overflow issue.
+ *
+ * To work around it, we apply rx skb with 64 bytes
+ * longer space, and offset the address whenever
+ * 0x....fc0 is detected.
+ */
+ if (likely(napi_mode))
+ skb = napi_alloc_skb(&rrd_ring->napi, adapter->rx_buffer_len + 64);
+ else
+ skb = netdev_alloc_skb(adapter->netdev, adapter->rx_buffer_len + 64);
if (unlikely(!skb)) {
if (netif_msg_rx_err(adapter))
dev_warn(&pdev->dev, "alloc rx buffer failed\n");
break;
}
+ if (((unsigned long)skb->data & 0xfff) == 0xfc0)
+ skb_reserve(skb, 64);
+
/*
* Make buffer alignment 2 beyond a 16 byte boundary
* this will result in a 16 byte aligned IP header after
--
2.42.0
* [PATCH AUTOSEL 6.6 04/31] wifi: mac80211: don't return unset power in ieee80211_get_tx_power()
From: Sasha Levin @ 2023-11-07 12:05 UTC
To: linux-kernel, stable
Cc: Ping-Ke Shih, Zong-Zhe Yang, Johannes Berg, Sasha Levin, johannes,
davem, edumazet, kuba, pabeni, linux-wireless, netdev
In-Reply-To: <20231107120704.3756327-1-sashal@kernel.org>
From: Ping-Ke Shih <pkshih@realtek.com>
[ Upstream commit e160ab85166e77347d0cbe5149045cb25e83937f ]
We can get a UBSAN warning if ieee80211_get_tx_power() returns the
INT_MIN value mac80211 internally uses for "unset power level".
UBSAN: signed-integer-overflow in net/wireless/nl80211.c:3816:5
-2147483648 * 100 cannot be represented in type 'int'
CPU: 0 PID: 20433 Comm: insmod Tainted: G WC OE
Call Trace:
dump_stack+0x74/0x92
ubsan_epilogue+0x9/0x50
handle_overflow+0x8d/0xd0
__ubsan_handle_mul_overflow+0xe/0x10
nl80211_send_iface+0x688/0x6b0 [cfg80211]
[...]
cfg80211_register_wdev+0x78/0xb0 [cfg80211]
cfg80211_netdev_notifier_call+0x200/0x620 [cfg80211]
[...]
ieee80211_if_add+0x60e/0x8f0 [mac80211]
ieee80211_register_hw+0xda5/0x1170 [mac80211]
In this case, simply return an error instead, to indicate
that no data is available.
Cc: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://lore.kernel.org/r/20230203023636.4418-1-pkshih@realtek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/mac80211/cfg.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 0e3a1753a51c6..715da615f0359 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -3121,6 +3121,10 @@ static int ieee80211_get_tx_power(struct wiphy *wiphy,
else
*dbm = sdata->vif.bss_conf.txpower;
+ /* INT_MIN indicates no power level was set yet */
+ if (*dbm == INT_MIN)
+ return -EINVAL;
+
return 0;
}
--
2.42.0
* Re: [RFC Draft net-next] docs: netdev: add section on using lei to manage netdev mail volume
From: Matthieu Baerts @ 2023-11-07 12:06 UTC
To: David Wei, Jakub Kicinski, davem
Cc: netdev, workflows, linux-doc, pabeni, corbet, edumazet
In-Reply-To: <9ee972b4-b3ff-4201-b22e-c76080cb8f6e@davidwei.uk>
Hi David,
On 06/11/2023 17:57, David Wei wrote:
> On 2023-11-06 03:24, Matthieu Baerts wrote:
>> On 05/11/2023 19:50, David Wei wrote:
>>> As a beginner to netdev I found the volume of mail to be overwhelming. I only
>>> want to focus on core netdev changes and ignore most driver changes. I found a
>>> way to do this using lei, filtering the mailing list using lore's query
>>> language and writing the results into an IMAP server.
>>
>> I agree that the volume of mail is too high with a variety of subjects.
>> That's why it is very important to CC the right people (as mentioned by
>> Patchwork [1] ;) )
>>
>> [1]
>> https://patchwork.kernel.org/project/netdevbpf/patch/20231105185014.2523447-1-dw@davidwei.uk/
>
> Sorry and noted, I've now CC'd maintainers mentioned by Patchwork.
Thanks!
>>> This patch is an RFC draft of updating the maintainer-netdev documentation with
>>> this information in the hope of helping out others in the future.
>>
>> Note that I'm also using lei to filter emails, e.g. to be notified when
>> someone sends a patch modifying this maintainer-netdev.rst file! [2]
>>
>> But I don't think this issue of "busy mailing list" is specific to
>> netdev. It seems that "lei" is already mentioned in another part of the
>> doc [3]. Maybe this part can be improved? Or the netdev doc could add a
>> reference to the existing part?
>
> I think "busy mailing list" is especially bad for netdev. There are many
> tutorials for setting up lei, but my ideal goal is a copy + paste
> command specifically for netdev that outputs into an IMAP server for
> beginners to use. As opposed to writing something more generic.
I see. I don't know if many people are in this case, but having this
example will certainly help people adapt it to their case!
>> (Maybe such info should be present elsewhere, e.g. on vger [4] or lore)
>>
>> [2]
>> https://lore.kernel.org/netdev/?q=%28dfn%3ADocumentation%2Fnetworking%2Fnetdev-FAQ.rst+OR+dfn%3ADocumentation%2Fprocess%2Fmaintainer-netdev.rst%29+AND+rt%3A1.month.ago..
>> [3]
>> https://docs.kernel.org/maintainer/feature-and-driver-maintainers.html#mailing-list-participation
>
> This document is aimed at kernel maintainers. My concern is that
> beginners would not find or read this document.
Indeed.
>> [4] http://vger.kernel.org/vger-lists.html
>
> It would be nice to add a link in the netdev list "Info" section. Do you
> know how to update it?
No, sorry. Maybe Jakub or DaveM can help?
> How about keeping a netdev specific sample lei query in
> maintainer-netdev and refer to it from [4]?
Fine by me, but best to check with Netdev maintainers :)
(...)
> It would be ideal if we could express dfn:^net/*. I contacted the public
> inbox folks and they said it is not supported :(
Thank you for having asked them and Konstantin. That's a shame we cannot
use regex. Maybe later.
Cheers,
Matt
* [PATCH net] net: ti: icss-iep: fix setting counter value
From: Diogo Ivo @ 2023-11-07 12:00 UTC
To: davem, edumazet, kuba, pabeni, horms, danishanwar, vigneshr,
rogerq, grygorii.strashko, m-karicheri2
Cc: Diogo Ivo, jan.kiszka, netdev, baocheng.su
Currently, icss_iep_set_counter() writes the upper 32 bits of the
counter value to both the lower and upper counter registers. Fix this
by writing the lower 32 bits to the lower register.
Fixes: c1e0230eeaab ("net: ti: icss-iep: Add IEP driver")
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
---
drivers/net/ethernet/ti/icssg/icss_iep.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/ti/icssg/icss_iep.c b/drivers/net/ethernet/ti/icssg/icss_iep.c
index 4cf2a52e4378..3025e9c18970 100644
--- a/drivers/net/ethernet/ti/icssg/icss_iep.c
+++ b/drivers/net/ethernet/ti/icssg/icss_iep.c
@@ -177,7 +177,7 @@ static void icss_iep_set_counter(struct icss_iep *iep, u64 ns)
if (iep->plat_data->flags & ICSS_IEP_64BIT_COUNTER_SUPPORT)
writel(upper_32_bits(ns), iep->base +
iep->plat_data->reg_offs[ICSS_IEP_COUNT_REG1]);
- writel(upper_32_bits(ns), iep->base + iep->plat_data->reg_offs[ICSS_IEP_COUNT_REG0]);
+ writel(lower_32_bits(ns), iep->base + iep->plat_data->reg_offs[ICSS_IEP_COUNT_REG0]);
}
static void icss_iep_update_to_next_boundary(struct icss_iep *iep, u64 start_ns);
--
2.42.1
* [PATCH net] page_pool: Add myself as page pool reviewer in MAINTAINERS
From: Yunsheng Lin @ 2023-11-07 11:34 UTC
To: davem, kuba, pabeni
Cc: netdev, linux-kernel, Yunsheng Lin, Jesper Dangaard Brouer,
Ilias Apalodimas
I have added frag support to page pool, made some improvements
to it recently, and reviewed some related patches too.
So add myself as a reviewer so that future patches will be Cc'ed
to my email.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Jesper Dangaard Brouer <hawk@kernel.org>
CC: Ilias Apalodimas <ilias.apalodimas@linaro.org>
CC: David S. Miller <davem@davemloft.net>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Paolo Abeni <pabeni@redhat.com>
CC: Netdev <netdev@vger.kernel.org>
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 14e1194faa4b..5d20efb9021a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16242,6 +16242,7 @@ F: mm/truncate.c
PAGE POOL
M: Jesper Dangaard Brouer <hawk@kernel.org>
M: Ilias Apalodimas <ilias.apalodimas@linaro.org>
+R:	Yunsheng Lin <linyunsheng@huawei.com>
L: netdev@vger.kernel.org
S: Supported
F: Documentation/networking/page_pool.rst
--
2.33.0
* [PATCH] iphase: Adding a null pointer check
From: Andrey Shumilin @ 2023-11-07 11:24 UTC
To: 3chas3
Cc: Andrey Shumilin, linux-atm-general, netdev, linux-kernel,
lvc-project
The pointer <dev->desc_tbl[i].iavcc> is dereferenced on line 195.
Later in the code, it is checked for NULL on line 204.
Add a check before the pointer is dereferenced.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Andrey Shumilin <shum.sdl@nppct.ru>
---
drivers/atm/iphase.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/atm/iphase.c b/drivers/atm/iphase.c
index 324148686953..596422fbfacc 100644
--- a/drivers/atm/iphase.c
+++ b/drivers/atm/iphase.c
@@ -192,6 +192,11 @@ static u16 get_desc (IADEV *dev, struct ia_vcc *iavcc) {
i++;
continue;
}
+ if (!(iavcc_r = dev->desc_tbl[i].iavcc)) {
+ printk("Fatal err, desc table vcc or skb is NULL\n");
+ i++;
+ continue;
+ }
ltimeout = dev->desc_tbl[i].iavcc->ltimeout;
delta = jiffies - dev->desc_tbl[i].timestamp;
if (delta >= ltimeout) {
--
2.30.2
* [PATCH v2 net 7/7] net/sched: taprio: enable cycle time adjustment for current entry
From: Faizal Rahim @ 2023-11-07 11:20 UTC
To: Vladimir Oltean, Vinicius Costa Gomes, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel
In-Reply-To: <20231107112023.676016-1-faizal.abdul.rahim@linux.intel.com>
Handle cycle time adjustments for the currently active entry
when the new admin base time occurs soon, either within the
current entry or the next one.
The changes cover:
1. Negative cycle correction (truncation)
Occurs when the new admin base time falls before the expiry of the
currently running entry.
2. Positive cycle correction (extension)
Occurs when the new admin base time falls within the next entry
and the current entry is the last entry of the cycle. In this case,
the changes in taprio_start_sched() extend the schedule, preventing
the old oper schedule from resuming and getting truncated in the next
advance_sched() call.
3. A new helper, update_gate_close_time(), updates the
gate_close_time of the current entry when a cycle correction
is applied.
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
---
net/sched/sch_taprio.c | 72 +++++++++++++++++++++++++++++++-----------
1 file changed, 53 insertions(+), 19 deletions(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index c60e9e7ac193..56743754d42e 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1379,41 +1379,75 @@ static void setup_first_end_time(struct taprio_sched *q,
rcu_assign_pointer(q->current_entry, NULL);
}
+static void update_gate_close_time(struct sched_entry *current_entry,
+ ktime_t new_end_time,
+ int num_tc)
+{
+ int tc;
+
+ for (tc = 0; tc < num_tc; tc++) {
+ if (current_entry->gate_mask & BIT(tc))
+ current_entry->gate_close_time[tc] = new_end_time;
+ }
+}
+
static void taprio_start_sched(struct Qdisc *sch,
ktime_t new_base_time,
- struct sched_gate_list *new)
+ struct sched_gate_list *admin)
{
struct taprio_sched *q = qdisc_priv(sch);
+ ktime_t expires = hrtimer_get_expires(&q->advance_timer);
+ struct net_device *dev = qdisc_dev(q->root);
+ struct sched_entry *curr_entry = NULL;
struct sched_gate_list *oper = NULL;
- ktime_t expires, start;
if (FULL_OFFLOAD_IS_ENABLED(q->flags))
return;
oper = rcu_dereference_protected(q->oper_sched,
lockdep_is_held(&q->current_entry_lock));
+ curr_entry = rcu_dereference_protected(q->current_entry,
+ lockdep_is_held(&q->current_entry_lock));
- expires = hrtimer_get_expires(&q->advance_timer);
- if (expires == 0)
- expires = KTIME_MAX;
+ if (hrtimer_active(&q->advance_timer)) {
+ oper->cycle_time_correction =
+ get_cycle_time_correction(oper, new_base_time,
+ curr_entry->end_time,
+ curr_entry);
- /* If the new schedule starts before the next expiration, we
- * reprogram it to the earliest one, so we change the admin
- * schedule to the operational one at the right time.
- */
- start = min_t(ktime_t, new_base_time, expires);
-
- if (expires != KTIME_MAX &&
- ktime_compare(start, new_base_time) == 0) {
- /* Since timer was changed to align to the new admin schedule,
- * setting the variable below to a non-initialized value will
- * indicate to advance_sched() to call switch_schedules() after
- * this timer expires.
+ if (cycle_corr_active(oper->cycle_time_correction)) {
+ /* This is the last entry we are running from oper,
+ * subsequent entry will take from the new admin.
+ */
+ ktime_t now = taprio_get_time(q);
+ u64 gate_duration_left = ktime_sub(new_base_time, now);
+ struct qdisc_size_table *stab =
+ rtnl_dereference(q->root->stab);
+ int num_tc = netdev_get_num_tc(dev);
+
+ oper->cycle_end_time = new_base_time;
+ curr_entry->end_time = new_base_time;
+ curr_entry->correction_active = true;
+
+ update_open_gate_duration(curr_entry, oper, num_tc,
+ gate_duration_left);
+ update_gate_close_time(curr_entry, new_base_time, num_tc);
+ taprio_update_queue_max_sdu(q, oper, stab);
+ taprio_set_budgets(q, oper, curr_entry);
+ }
+ }
+
+ if (!hrtimer_active(&q->advance_timer) ||
+ cycle_corr_active(oper->cycle_time_correction)) {
+ /* Use new admin base time if :
+ * 1. there's no active oper
+ * 2. there's active oper and we will change to the new admin
+ * schedule after the current entry from oper ends
*/
- oper->cycle_time_correction = 0;
+ expires = new_base_time;
}
- hrtimer_start(&q->advance_timer, start, HRTIMER_MODE_ABS);
+ hrtimer_start(&q->advance_timer, expires, HRTIMER_MODE_ABS);
}
static void taprio_set_picos_per_byte(struct net_device *dev,
--
2.25.1
^ permalink raw reply related
* [PATCH v2 net 6/7] net/sched: taprio: fix q->current_entry is NULL before its expiry
From: Faizal Rahim @ 2023-11-07 11:20 UTC (permalink / raw)
To: Vladimir Oltean, Vinicius Costa Gomes, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel
In-Reply-To: <20231107112023.676016-1-faizal.abdul.rahim@linux.intel.com>
Fix the issue of prematurely setting q->current_entry to NULL in the
setup_first_end_time() function when a new admin schedule arrives
while the oper schedule is still running but hasn't transitioned yet.
This premature setting causes problems because any reference to
q->current_entry, such as in taprio_dequeue(), will result in NULL
during this period, which is incorrect. q->current_entry should remain
valid until the currently running entry expires.
To address this issue, only set q->current_entry to NULL when there is
no oper schedule currently running.
Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
---
net/sched/sch_taprio.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 01b114edec30..c60e9e7ac193 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1375,7 +1375,8 @@ static void setup_first_end_time(struct taprio_sched *q,
first->gate_close_time[tc] = ktime_add_ns(base, first->gate_duration[tc]);
}
- rcu_assign_pointer(q->current_entry, NULL);
+ if (!hrtimer_active(&q->advance_timer))
+ rcu_assign_pointer(q->current_entry, NULL);
}
static void taprio_start_sched(struct Qdisc *sch,
--
2.25.1
^ permalink raw reply related
* [PATCH v2 net 5/7] net/sched: taprio: fix delayed switching to new schedule after timer expiry
From: Faizal Rahim @ 2023-11-07 11:20 UTC (permalink / raw)
To: Vladimir Oltean, Vinicius Costa Gomes, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel
In-Reply-To: <20231107112023.676016-1-faizal.abdul.rahim@linux.intel.com>
If a new GCL is triggered and the new admin base time falls before the
expiry of advance_timer (current running entry from oper),
taprio_start_sched() resets the current advance_timer expiry to the
new admin base time. However, upon expiry, advance_sched() doesn't
immediately switch to the admin schedule. It continues running entries
from the old oper schedule, and only switches to the new admin schedule
much later. Ideally, if the advance_timer is shortened to align with the
new admin base time, advance_sched() should trigger switch_schedules()
at the very beginning when the timer expires.
To resolve this issue, set the cycle_time_correction to a non-initialized
value in taprio_start_sched(). advance_sched() will use it to initiate
switch_schedules() at the beginning.
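The timer decision made in taprio_start_sched() by this patch can be modelled in isolation. This is a sketch under simplifying assumptions: times are plain `int64_t` nanoseconds, `KTIME_MAX_MODEL` stands in for `KTIME_MAX`, and the hypothetical `switch_on_expiry` flag stands in for writing a non-sentinel value into `cycle_time_correction`.

```c
#include <stdbool.h>
#include <stdint.h>

#define KTIME_MAX_MODEL INT64_MAX /* stand-in for KTIME_MAX */

struct timer_decision {
	int64_t start;         /* when the advance timer should fire */
	bool switch_on_expiry; /* advance_sched() must switch schedules */
};

static struct timer_decision pick_timer_start(int64_t expires,
					      int64_t new_base_time)
{
	struct timer_decision d;

	if (expires == 0) /* timer was never armed */
		expires = KTIME_MAX_MODEL;

	/* Fire at whichever comes first. */
	d.start = new_base_time < expires ? new_base_time : expires;

	/* A running timer pulled in to the admin base time means the
	 * next expiry must switch to the admin schedule immediately. */
	d.switch_on_expiry = expires != KTIME_MAX_MODEL &&
			     d.start == new_base_time;
	return d;
}
```

With a timer due at 1000 ns and an admin base time of 700 ns, the timer is moved to 700 ns and the switch flag is set; if the admin base time is later than the pending expiry, the timer is left alone.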
Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
---
net/sched/sch_taprio.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index f18a5fe12f0c..01b114edec30 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1379,14 +1379,19 @@ static void setup_first_end_time(struct taprio_sched *q,
}
static void taprio_start_sched(struct Qdisc *sch,
- ktime_t start, struct sched_gate_list *new)
+ ktime_t new_base_time,
+ struct sched_gate_list *new)
{
struct taprio_sched *q = qdisc_priv(sch);
- ktime_t expires;
+ struct sched_gate_list *oper = NULL;
+ ktime_t expires, start;
if (FULL_OFFLOAD_IS_ENABLED(q->flags))
return;
+ oper = rcu_dereference_protected(q->oper_sched,
+ lockdep_is_held(&q->current_entry_lock));
+
expires = hrtimer_get_expires(&q->advance_timer);
if (expires == 0)
expires = KTIME_MAX;
@@ -1395,7 +1400,17 @@ static void taprio_start_sched(struct Qdisc *sch,
* reprogram it to the earliest one, so we change the admin
* schedule to the operational one at the right time.
*/
- start = min_t(ktime_t, start, expires);
+ start = min_t(ktime_t, new_base_time, expires);
+
+ if (expires != KTIME_MAX &&
+ ktime_compare(start, new_base_time) == 0) {
+ /* Since timer was changed to align to the new admin schedule,
+ * setting the variable below to a non-initialized value will
+ * indicate to advance_sched() to call switch_schedules() after
+ * this timer expires.
+ */
+ oper->cycle_time_correction = 0;
+ }
hrtimer_start(&q->advance_timer, start, HRTIMER_MODE_ABS);
}
--
2.25.1
^ permalink raw reply related
* [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval
From: Faizal Rahim @ 2023-11-07 11:20 UTC (permalink / raw)
To: Vladimir Oltean, Vinicius Costa Gomes, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel
In-Reply-To: <20231107112023.676016-1-faizal.abdul.rahim@linux.intel.com>
Retrieve adjusted cycle_time and interval values through new APIs.
Note that in some cases where the original values are required,
such as in dump_schedule() and setup_first_end_time(), the
cycle_time and interval fields are still accessed directly instead
of through the new APIs.
Add a new field, correction_active, to struct sched_entry to
track the entry's correction state. This field is required by
flows such as find_entry_to_transmit() -> get_interval_end_time(),
which retrieve the interval of every entry. During positive cycle
time correction, it is known that the last entry's interval needs
correcting. For negative correction, however, the affected entry
is not known in advance, which is why this new field is necessary.
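The accessor pattern can be illustrated with a small standalone model. It assumes simplified, hypothetical `struct gate_list_model` and `struct entry_model` types and uses `INT64_MIN` as a stand-in for the `S64_MIN` sentinel; the logic mirrors get_interval() and get_cycle_time() from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define INIT_CYCLE_TIME_CORRECTION INT64_MIN /* stand-in for S64_MIN */

struct gate_list_model {
	int64_t cycle_time;
	int64_t cycle_time_correction;
};

struct entry_model {
	uint32_t interval;
	bool correction_active; /* this entry absorbs the correction */
};

static bool cycle_corr_active(int64_t corr)
{
	return corr != INIT_CYCLE_TIME_CORRECTION;
}

/* Only the corrected entry reports a changed interval. */
static uint32_t get_interval(const struct entry_model *e,
			     const struct gate_list_model *oper)
{
	if (e->correction_active)
		return (uint32_t)(e->interval + oper->cycle_time_correction);
	return e->interval;
}

/* The whole cycle shrinks or grows by the correction when active. */
static int64_t get_cycle_time(const struct gate_list_model *oper)
{
	if (cycle_corr_active(oper->cycle_time_correction))
		return oper->cycle_time + oper->cycle_time_correction;
	return oper->cycle_time;
}
```

With a 1000 ns cycle and a correction of -100 ns, get_cycle_time() returns 900, the flagged 300 ns entry reports 200, and every other entry keeps its original interval.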
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
---
net/sched/sch_taprio.c | 50 ++++++++++++++++++++++++++++++------------
1 file changed, 36 insertions(+), 14 deletions(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 119dec3bbe88..f18a5fe12f0c 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -61,6 +61,7 @@ struct sched_entry {
u32 gate_mask;
u32 interval;
u8 command;
+ bool correction_active;
};
struct sched_gate_list {
@@ -215,6 +216,31 @@ static void switch_schedules(struct taprio_sched *q,
*admin = NULL;
}
+static bool cycle_corr_active(s64 cycle_time_correction)
+{
+ if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
+ return false;
+ else
+ return true;
+}
+
+u32 get_interval(const struct sched_entry *entry,
+ const struct sched_gate_list *oper)
+{
+ if (entry->correction_active)
+ return entry->interval + oper->cycle_time_correction;
+ else
+ return entry->interval;
+}
+
+s64 get_cycle_time(const struct sched_gate_list *oper)
+{
+ if (cycle_corr_active(oper->cycle_time_correction))
+ return oper->cycle_time + oper->cycle_time_correction;
+ else
+ return oper->cycle_time;
+}
+
/* Get how much time has been already elapsed in the current cycle. */
static s32 get_cycle_time_elapsed(struct sched_gate_list *sched, ktime_t time)
{
@@ -222,7 +248,7 @@ static s32 get_cycle_time_elapsed(struct sched_gate_list *sched, ktime_t time)
s32 time_elapsed;
time_since_sched_start = ktime_sub(time, sched->base_time);
- div_s64_rem(time_since_sched_start, sched->cycle_time, &time_elapsed);
+ div_s64_rem(time_since_sched_start, get_cycle_time(sched), &time_elapsed);
return time_elapsed;
}
@@ -235,8 +261,9 @@ static ktime_t get_interval_end_time(struct sched_gate_list *sched,
s32 cycle_elapsed = get_cycle_time_elapsed(sched, intv_start);
ktime_t intv_end, cycle_ext_end, cycle_end;
- cycle_end = ktime_add_ns(intv_start, sched->cycle_time - cycle_elapsed);
- intv_end = ktime_add_ns(intv_start, entry->interval);
+ cycle_end = ktime_add_ns(intv_start,
+ get_cycle_time(sched) - cycle_elapsed);
+ intv_end = ktime_add_ns(intv_start, get_interval(entry, sched));
cycle_ext_end = ktime_add(cycle_end, sched->cycle_time_extension);
if (ktime_before(intv_end, cycle_end))
@@ -259,14 +286,6 @@ static int duration_to_length(struct taprio_sched *q, u64 duration)
return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
}
-static bool cycle_corr_active(s64 cycle_time_correction)
-{
- if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
- return false;
- else
- return true;
-}
-
/* Sets sched->max_sdu[] and sched->max_frm_len[] to the minimum between the
* q->max_sdu[] requested by the user and the max_sdu dynamically determined by
* the maximum open gate durations at the given link speed.
@@ -351,7 +370,7 @@ static struct sched_entry *find_entry_to_transmit(struct sk_buff *skb,
if (!sched)
return NULL;
- cycle = sched->cycle_time;
+ cycle = get_cycle_time(sched);
cycle_elapsed = get_cycle_time_elapsed(sched, time);
curr_intv_end = ktime_sub_ns(time, cycle_elapsed);
cycle_end = ktime_add_ns(curr_intv_end, cycle);
@@ -365,7 +384,7 @@ static struct sched_entry *find_entry_to_transmit(struct sk_buff *skb,
break;
if (!(entry->gate_mask & BIT(tc)) ||
- packet_transmit_time > entry->interval)
+ packet_transmit_time > get_interval(entry, sched))
continue;
txtime = entry->next_txtime;
@@ -543,7 +562,8 @@ static long get_packet_txtime(struct sk_buff *skb, struct Qdisc *sch)
* interval starts.
*/
if (ktime_after(transmit_end_time, interval_end))
- entry->next_txtime = ktime_add(interval_start, sched->cycle_time);
+ entry->next_txtime =
+ ktime_add(interval_start, get_cycle_time(sched));
} while (sched_changed || ktime_after(transmit_end_time, interval_end));
entry->next_txtime = transmit_end_time;
@@ -1045,6 +1065,7 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
oper->cycle_end_time = new_base_time;
end_time = new_base_time;
+ next->correction_active = true;
update_open_gate_duration(next, oper, num_tc,
new_gate_duration);
@@ -1146,6 +1167,7 @@ static int fill_sched_entry(struct taprio_sched *q, struct nlattr **tb,
}
entry->interval = interval;
+ entry->correction_active = false;
return 0;
}
--
2.25.1
^ permalink raw reply related
* [PATCH v2 net 3/7] net/sched: taprio: update impacted fields during cycle time adjustment
From: Faizal Rahim @ 2023-11-07 11:20 UTC (permalink / raw)
To: Vladimir Oltean, Vinicius Costa Gomes, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel
In-Reply-To: <20231107112023.676016-1-faizal.abdul.rahim@linux.intel.com>
Update impacted fields in advance_sched() if cycle_corr_active()
is true, which indicates that the next entry is the last one it
will run from oper.
Update impacted fields:
1. gate_duration[tc], max_open_gate_duration[tc]
Create a new API, update_open_gate_duration(). The API sets the
duration based on the last remaining entry, whereas the original
value was computed with multiple entries in consideration.
2. gate_close_time[tc]
Update the next entry's gate close time according to the new admin
base time.
3. max_sdu[tc], budget[tc]
Do not set these to their maximum values, because only a single
entry is left to run from oper before changing to the new admin
schedule.
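The budget rule in item 3 can be sketched as a standalone function. This is a simplification that assumes the caller already knows whether correction is active and passes the link speed as picoseconds per byte, in the spirit of taprio_set_budgets(); the function name `budget_for_tc` is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define PSEC_PER_NSEC 1000ULL

/* Byte budget of one traffic class for one gate-open window. */
static int budget_for_tc(uint64_t gate_duration_ns, int64_t cycle_time_ns,
			 bool correction_active, uint64_t picos_per_byte)
{
	/* A gate open for the whole cycle normally never closes and gets
	 * an unlimited budget, but during correction it really does close
	 * at the new admin base time, so it must be bounded. */
	if ((int64_t)gate_duration_ns == cycle_time_ns && !correction_active)
		return INT_MAX;

	/* Bytes that fit in the open window at the current link speed
	 * (assumes the result fits in an int). */
	return (int)(gate_duration_ns * PSEC_PER_NSEC / picos_per_byte);
}
```

At 1 Gbit/s (8000 ps per byte), a gate open for the entire 1000 ns cycle gets `INT_MAX` normally but only 125 bytes while correction is active.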
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
---
net/sched/sch_taprio.c | 49 +++++++++++++++++++++++++++++++++++++++---
1 file changed, 46 insertions(+), 3 deletions(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index ed32654b46f5..119dec3bbe88 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -288,7 +288,8 @@ static void taprio_update_queue_max_sdu(struct taprio_sched *q,
/* TC gate never closes => keep the queueMaxSDU
* selected by the user
*/
- if (sched->max_open_gate_duration[tc] == sched->cycle_time) {
+ if (sched->max_open_gate_duration[tc] == sched->cycle_time &&
+ !cycle_corr_active(sched->cycle_time_correction)) {
max_sdu_dynamic = U32_MAX;
} else {
u32 max_frm_len;
@@ -684,7 +685,8 @@ static void taprio_set_budgets(struct taprio_sched *q,
for (tc = 0; tc < num_tc; tc++) {
/* Traffic classes which never close have infinite budget */
- if (entry->gate_duration[tc] == sched->cycle_time)
+ if (entry->gate_duration[tc] == sched->cycle_time &&
+ !cycle_corr_active(sched->cycle_time_correction))
budget = INT_MAX;
else
budget = div64_u64((u64)entry->gate_duration[tc] * PSEC_PER_NSEC,
@@ -896,6 +898,32 @@ static bool should_restart_cycle(const struct sched_gate_list *oper,
return false;
}
+/* Open gate duration were calculated at the beginning with consideration of
+ * multiple entries. If cycle time correction is active, there's only a single
+ * remaining entry left from oper to run.
+ * Update open gate duration based on this last entry.
+ */
+static void update_open_gate_duration(struct sched_entry *entry,
+ struct sched_gate_list *oper,
+ int num_tc,
+ u64 open_gate_duration)
+{
+ int tc;
+
+ if (!entry || !oper)
+ return;
+
+ for (tc = 0; tc < num_tc; tc++) {
+ if (entry->gate_mask & BIT(tc)) {
+ entry->gate_duration[tc] = open_gate_duration;
+ oper->max_open_gate_duration[tc] = open_gate_duration;
+ } else {
+ entry->gate_duration[tc] = 0;
+ oper->max_open_gate_duration[tc] = 0;
+ }
+ }
+}
+
static bool should_change_sched(struct sched_gate_list *oper)
{
bool change_to_admin_sched = false;
@@ -1010,13 +1038,28 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
/* The next entry is the last entry we will run from
* oper, subsequent ones will take from the new admin
*/
+ u64 new_gate_duration =
+ next->interval + oper->cycle_time_correction;
+ struct qdisc_size_table *stab =
+ rtnl_dereference(q->root->stab);
+
oper->cycle_end_time = new_base_time;
end_time = new_base_time;
+
+ update_open_gate_duration(next, oper, num_tc,
+ new_gate_duration);
+ taprio_update_queue_max_sdu(q, oper, stab);
}
}
for (tc = 0; tc < num_tc; tc++) {
- if (next->gate_duration[tc] == oper->cycle_time)
+ if (cycle_corr_active(oper->cycle_time_correction) &&
+ (next->gate_mask & BIT(tc)))
+ /* Set to the new base time, ensuring a smooth transition
+ * to the new schedule when the next entry finishes.
+ */
+ next->gate_close_time[tc] = end_time;
+ else if (next->gate_duration[tc] == oper->cycle_time)
next->gate_close_time[tc] = KTIME_MAX;
else
next->gate_close_time[tc] = ktime_add_ns(entry->end_time,
--
2.25.1
^ permalink raw reply related
* [PATCH v2 net 2/7] net/sched: taprio: fix cycle time adjustment for next entry
From: Faizal Rahim @ 2023-11-07 11:20 UTC (permalink / raw)
To: Vladimir Oltean, Vinicius Costa Gomes, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel
In-Reply-To: <20231107112023.676016-1-faizal.abdul.rahim@linux.intel.com>
According to IEEE Std. 802.1Q-2018 section Q.5 CycleTimeExtension:
"the Cycle Time Extension variable allows this extension of the last old
cycle to be done in a defined way. If the last complete old cycle would
normally end less than OperCycleTimeExtension nanoseconds before the new
base time, then the last complete cycle before AdminBaseTime is reached
is extended so that it ends at AdminBaseTime."
The current taprio implementation does not extend the last old cycle in
the defined manner specified in the Qbv Spec. This is part of the fix
covered in this patch.
Here are the changes made:
1. A new API, get_cycle_time_correction(), has been added to return the
correction value. If it returns a non-initialized value, the next
entry's schedule must be adjusted, and once the next entry's duration
completes, entries will be loaded from the new admin schedule.
2. Store correction values in cycle_time_correction:
a) Positive correction value/extension
We calculate the correction between the last operational cycle and the
new admin base time. Note that for positive correction to take place,
the next entry must be the last entry from oper, the new admin base
time must fall within the next cycle of the old oper, and the gap
between the entry's end time and the new admin base time must be
smaller than cycle_time_extension.
b) Negative correction value
The new admin base time starts earlier than the next entry's end time.
c) Zero correction value
The new admin base time aligns exactly with the old cycle.
3. When cycle_time_correction is set to a non-initialized value, it is
used to:
a) Indicate that cycle correction is active to trigger adjustments in
affected fields like interval and cycle_time. A new API,
cycle_corr_active(), has been created to help with this purpose.
b) Transition to the new admin schedule at the beginning of
advance_sched(). A new API, should_change_sched(), has been introduced
for this purpose.
4. Remove the previous should_change_schedules() API. The newly
introduced should_change_sched() API serves a completely different
purpose than the one that was removed.
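The classification in item 2 can be modelled as a pure function of a few timestamps. This sketch assumes plain `int64_t` nanoseconds, passes "is the next entry the last of the cycle" in as a flag instead of walking the entry list, and uses `INT64_MIN` for the sentinel; it follows the conditions of get_cycle_time_correction() and should_extend_cycle() but is not the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define INIT_CYCLE_TIME_CORRECTION INT64_MIN /* "no correction" sentinel */

static int64_t cycle_time_correction(int64_t new_base_time,
				     int64_t entry_end_time,
				     int64_t cycle_end_time,
				     int64_t cycle_time,
				     int64_t cycle_time_extension,
				     bool is_last_entry)
{
	int64_t next_cycle_end = cycle_end_time + cycle_time;

	/* b) / c): admin base time at or before the entry's end:
	 * truncate (negative) or align exactly (zero). */
	if (new_base_time <= entry_end_time)
		return new_base_time - entry_end_time;

	/* a): extend the cycle's last entry, but only if the gap is
	 * shorter than the extension limit and still within the next
	 * cycle of the old schedule. */
	if (cycle_time_extension > 0 && is_last_entry &&
	    new_base_time < next_cycle_end &&
	    new_base_time - entry_end_time < cycle_time_extension)
		return new_base_time - entry_end_time;

	return INIT_CYCLE_TIME_CORRECTION; /* keep running oper as-is */
}
```

With a 1000 ns cycle ending at 1000 ns and a 200 ns extension limit, an admin base time of 700 ns yields -300, 1150 ns yields +150 for the last entry, and 1300 ns yields no correction because the 300 ns gap exceeds the limit.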
Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
---
net/sched/sch_taprio.c | 105 +++++++++++++++++++++++++++--------------
1 file changed, 70 insertions(+), 35 deletions(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index dee103647823..ed32654b46f5 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -259,6 +259,14 @@ static int duration_to_length(struct taprio_sched *q, u64 duration)
return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
}
+static bool cycle_corr_active(s64 cycle_time_correction)
+{
+ if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
+ return false;
+ else
+ return true;
+}
+
/* Sets sched->max_sdu[] and sched->max_frm_len[] to the minimum between the
* q->max_sdu[] requested by the user and the max_sdu dynamically determined by
* the maximum open gate durations at the given link speed.
@@ -888,38 +896,59 @@ static bool should_restart_cycle(const struct sched_gate_list *oper,
return false;
}
-static bool should_change_schedules(const struct sched_gate_list *admin,
- const struct sched_gate_list *oper,
- ktime_t end_time)
+static bool should_change_sched(struct sched_gate_list *oper)
{
- ktime_t next_base_time, extension_time;
+ bool change_to_admin_sched = false;
- if (!admin)
- return false;
+ if (oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
+ /* The recent entry ran is the last one from oper */
+ change_to_admin_sched = true;
+ oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
+ }
- next_base_time = sched_base_time(admin);
+ return change_to_admin_sched;
+}
- /* This is the simple case, the end_time would fall after
- * the next schedule base_time.
- */
- if (ktime_compare(next_base_time, end_time) <= 0)
+static bool should_extend_cycle(const struct sched_gate_list *oper,
+ ktime_t new_base_time,
+ ktime_t entry_end_time,
+ const struct sched_entry *entry)
+{
+ ktime_t next_cycle_end_time = ktime_add_ns(oper->cycle_end_time,
+ oper->cycle_time);
+ bool extension_supported = oper->cycle_time_extension > 0 ? true : false;
+ s64 extension_limit = oper->cycle_time_extension;
+
+ if (extension_supported &&
+ list_is_last(&entry->list, &oper->entries) &&
+ ktime_before(new_base_time, next_cycle_end_time) &&
+ ktime_sub(new_base_time, entry_end_time) < extension_limit)
return true;
+ else
+ return false;
+}
- /* This is the cycle_time_extension case, if the end_time
- * plus the amount that can be extended would fall after the
- * next schedule base_time, we can extend the current schedule
- * for that amount.
- */
- extension_time = ktime_add_ns(end_time, oper->cycle_time_extension);
+static s64 get_cycle_time_correction(const struct sched_gate_list *oper,
+ ktime_t new_base_time,
+ ktime_t entry_end_time,
+ const struct sched_entry *entry)
+{
+ s64 correction = INIT_CYCLE_TIME_CORRECTION;
- /* FIXME: the IEEE 802.1Q-2018 Specification isn't clear about
- * how precisely the extension should be made. So after
- * conformance testing, this logic may change.
- */
- if (ktime_compare(next_base_time, extension_time) <= 0)
- return true;
+ if (!entry || !oper)
+ return correction;
- return false;
+ if (ktime_compare(new_base_time, entry_end_time) <= 0) {
+ /* negative or zero correction */
+ correction = ktime_sub(new_base_time, entry_end_time);
+ } else if (ktime_after(new_base_time, entry_end_time) &&
+ should_extend_cycle(oper, new_base_time,
+ entry_end_time, entry)) {
+ /* positive correction */
+ correction = ktime_sub(new_base_time, entry_end_time);
+ }
+
+ return correction;
}
static enum hrtimer_restart advance_sched(struct hrtimer *timer)
@@ -942,10 +971,8 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
admin = rcu_dereference_protected(q->admin_sched,
lockdep_is_held(&q->current_entry_lock));
- if (!oper || oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
- oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
+ if (!oper || should_change_sched(oper))
switch_schedules(q, &admin, &oper);
- }
/* This can happen in two cases: 1. this is the very first run
* of this function (i.e. we weren't running any schedule
@@ -972,6 +999,22 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
end_time = ktime_add_ns(entry->end_time, next->interval);
end_time = min_t(ktime_t, end_time, oper->cycle_end_time);
+ if (admin) {
+ ktime_t new_base_time = sched_base_time(admin);
+
+ oper->cycle_time_correction =
+ get_cycle_time_correction(oper, new_base_time,
+ end_time, next);
+
+ if (cycle_corr_active(oper->cycle_time_correction)) {
+ /* The next entry is the last entry we will run from
+ * oper, subsequent ones will take from the new admin
+ */
+ oper->cycle_end_time = new_base_time;
+ end_time = new_base_time;
+ }
+ }
+
for (tc = 0; tc < num_tc; tc++) {
if (next->gate_duration[tc] == oper->cycle_time)
next->gate_close_time[tc] = KTIME_MAX;
@@ -980,14 +1023,6 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
next->gate_duration[tc]);
}
- if (should_change_schedules(admin, oper, end_time)) {
- /* Set things so the next time this runs, the new
- * schedule runs.
- */
- end_time = sched_base_time(admin);
- oper->cycle_time_correction = 0;
- }
-
next->end_time = end_time;
taprio_set_budgets(q, oper, next);
--
2.25.1
^ permalink raw reply related
* [PATCH v2 net 1/7] net/sched: taprio: fix too early schedules switching
From: Faizal Rahim @ 2023-11-07 11:20 UTC (permalink / raw)
To: Vladimir Oltean, Vinicius Costa Gomes, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel
In-Reply-To: <20231107112023.676016-1-faizal.abdul.rahim@linux.intel.com>
In the current taprio code for dynamic schedule change,
admin/oper schedule switching happens immediately when
should_change_schedules() is true. As a result, the last entry of
the old admin schedule is no longer valid from
taprio_dequeue_from_txq()'s perspective.
To solve this, we have to delay the switch_schedules() call via
the new cycle_time_correction variable. The variable serves 2
purposes:
1. Upon entering advance_sched(), if the value is set to a
non-initialized value, it indicates that we need to change
schedule.
2. Store the cycle time correction value which will be used for
negative or positive correction.
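The dual use of cycle_time_correction relies on a sentinel (`S64_MIN`) that can never be a real correction. A minimal sketch of the "pending switch" half, assuming a hypothetical `struct oper_model` and `INT64_MIN` in place of `S64_MIN`: the flag is consumed on the timer expiry that acts on it. Keeping the reset inside a helper that is only reached for a non-NULL oper, as should_change_sched() in patch 2/7 does through `!oper || ...` short-circuiting, avoids dereferencing a NULL oper on the very first run.

```c
#include <stdbool.h>
#include <stdint.h>

/* INT64_MIN can never be a real correction, so it means "nothing pending". */
#define INIT_CYCLE_TIME_CORRECTION INT64_MIN

struct oper_model {
	int64_t cycle_time_correction;
};

/* Called on timer expiry with a non-NULL oper: report and consume
 * a pending schedule switch. */
static bool should_change_sched(struct oper_model *oper)
{
	if (oper->cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
		return false;

	/* Any other value, including 0, requests the switch once. */
	oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
	return true;
}
```

Note that a correction of 0 (exact alignment) still requests a switch, which is why a separate sentinel is needed rather than treating 0 as "inactive".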
Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
---
net/sched/sch_taprio.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 2e1949de4171..dee103647823 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -41,6 +41,7 @@ static struct static_key_false taprio_have_working_mqprio;
#define TXTIME_ASSIST_IS_ENABLED(flags) ((flags) & TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST)
#define FULL_OFFLOAD_IS_ENABLED(flags) ((flags) & TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD)
#define TAPRIO_FLAGS_INVALID U32_MAX
+#define INIT_CYCLE_TIME_CORRECTION S64_MIN
struct sched_entry {
/* Durations between this GCL entry and the GCL entry where the
@@ -75,6 +76,7 @@ struct sched_gate_list {
ktime_t cycle_end_time;
s64 cycle_time;
s64 cycle_time_extension;
+ s64 cycle_time_correction;
s64 base_time;
};
@@ -940,8 +942,10 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
admin = rcu_dereference_protected(q->admin_sched,
lockdep_is_held(&q->current_entry_lock));
- if (!oper)
+ if (!oper || oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
+ oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
switch_schedules(q, &admin, &oper);
+ }
/* This can happen in two cases: 1. this is the very first run
* of this function (i.e. we weren't running any schedule
@@ -981,7 +985,7 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
* schedule runs.
*/
end_time = sched_base_time(admin);
- switch_schedules(q, &admin, &oper);
+ oper->cycle_time_correction = 0;
}
next->end_time = end_time;
@@ -1174,6 +1178,7 @@ static int parse_taprio_schedule(struct taprio_sched *q, struct nlattr **tb,
}
taprio_calculate_gate_durations(q, new);
+ new->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
return 0;
}
--
2.25.1
^ permalink raw reply related
* [PATCH v2 net 0/7] qbv cycle time extension/truncation
From: Faizal Rahim @ 2023-11-07 11:20 UTC (permalink / raw)
To: Vladimir Oltean, Vinicius Costa Gomes, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel
According to IEEE Std. 802.1Q-2018 section Q.5 CycleTimeExtension,
the Cycle Time Extension variable allows this extension of the last old
cycle to be done in a defined way. If the last complete old cycle would
normally end less than OperCycleTimeExtension nanoseconds before the new
base time, then the last complete cycle before AdminBaseTime is reached
is extended so that it ends at AdminBaseTime.
Changes in v2:
- Added 's64 cycle_time_correction' in 'struct sched_gate_list'.
- Removed sched_changed created in v1 since the new cycle_time_correction
field can also serve to indicate the need for a schedule change.
- Added 'bool correction_active' in 'struct sched_entry' to represent
the correction state from the entry's perspective and return corrected
interval value when active.
- Fix cycle time correction logic for the next entry in advance_sched()
- Fix and implement proper cycle time correction logic for the current
entry in taprio_start_sched()
v1 at:
https://lore.kernel.org/lkml/20230530082541.495-1-muhammad.husaini.zulkifli@intel.com/
Faizal Rahim (7):
net/sched: taprio: fix too early schedules switching
net/sched: taprio: fix cycle time adjustment for next entry
net/sched: taprio: update impacted fields during cycle time adjustment
net/sched: taprio: get corrected value of cycle_time and interval
net/sched: taprio: fix delayed switching to new schedule after timer
expiry
net/sched: taprio: fix q->current_entry is NULL before its expiry
net/sched: taprio: enable cycle time adjustment for current entry
net/sched/sch_taprio.c | 263 ++++++++++++++++++++++++++++++++---------
1 file changed, 209 insertions(+), 54 deletions(-)
--
2.25.1
^ permalink raw reply
* Re: [PATCH v3 04/13] mm/execmem, arch: convert remaining overrides of module_alloc to execmem
From: Will Deacon @ 2023-11-07 10:44 UTC (permalink / raw)
To: Mike Rapoport
Cc: linux-kernel, Andrew Morton, Björn Töpel,
Catalin Marinas, Christophe Leroy, David S. Miller, Dinh Nguyen,
Heiko Carstens, Helge Deller, Huacai Chen, Kent Overstreet,
Luis Chamberlain, Mark Rutland, Michael Ellerman, Nadav Amit,
Naveen N. Rao, Palmer Dabbelt, Puranjay Mohan, Rick Edgecombe,
Russell King, Song Liu, Steven Rostedt, Thomas Bogendoerfer,
Thomas Gleixner, bpf, linux-arm-kernel, linux-mips, linux-mm,
linux-modules, linux-parisc, linux-riscv, linux-s390,
linux-trace-kernel, linuxppc-dev, loongarch, netdev, sparclinux,
x86
In-Reply-To: <20231030070053.GL2824@kernel.org>
On Mon, Oct 30, 2023 at 09:00:53AM +0200, Mike Rapoport wrote:
> On Thu, Oct 26, 2023 at 11:24:39AM +0100, Will Deacon wrote:
> > On Thu, Oct 26, 2023 at 11:58:00AM +0300, Mike Rapoport wrote:
> > > On Mon, Oct 23, 2023 at 06:14:20PM +0100, Will Deacon wrote:
> > > > On Mon, Sep 18, 2023 at 10:29:46AM +0300, Mike Rapoport wrote:
> > > > > diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
> > > > > index dd851297596e..cd6320de1c54 100644
> > > > > --- a/arch/arm64/kernel/module.c
> > > > > +++ b/arch/arm64/kernel/module.c
>
> ...
>
> > > > > - if (module_direct_base) {
> > > > > - p = __vmalloc_node_range(size, MODULE_ALIGN,
> > > > > - module_direct_base,
> > > > > - module_direct_base + SZ_128M,
> > > > > - GFP_KERNEL | __GFP_NOWARN,
> > > > > - PAGE_KERNEL, 0, NUMA_NO_NODE,
> > > > > - __builtin_return_address(0));
> > > > > - }
> > > > > + module_init_limits();
> > > >
> > > > Hmm, this used to be run from subsys_initcall(), but now you're running
> > > > it _really_ early, before random_init(), so randomization of the module
> > > > space is no longer going to be very random if we don't have early entropy
> > > > from the firmware or the CPU, which is likely to be the case on most SoCs.
> > >
> > > Well, it will be as random as KASLR. Won't that be enough?
> >
> > I don't think that's true -- we have the 'kaslr-seed' property for KASLR,
> > but I'm not seeing anything like that for the module randomisation and I
> > also don't see why we need to set these limits so early.
>
> x86 needs execmem initialized before ftrace_init() so I thought it would be
> best to setup execmem along with most of MM in mm_core_init().
>
> I'll move execmem initialization for !x86 to a later point, say
> core_initcall.
Thanks, Mike.
Will
^ permalink raw reply
* Re: [PATCH] s390/qeth: Fix typo 'weed' in comment
From: Alexandra Winter @ 2023-11-07 10:06 UTC (permalink / raw)
To: Kuan-Wei Chiu, wenjia, hca, gor, agordeev
Cc: borntraeger, svens, linux-s390, netdev, linux-kernel
In-Reply-To: <20231106222059.1475375-1-visitorckw@gmail.com>
On 06.11.23 23:20, Kuan-Wei Chiu wrote:
> Replace 'weed' with 'we' in the comment.
>
> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> ---
> drivers/s390/net/qeth_core_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
> index 6af2511e070c..cf8506d0f185 100644
> --- a/drivers/s390/net/qeth_core_main.c
> +++ b/drivers/s390/net/qeth_core_main.c
> @@ -3675,7 +3675,7 @@ static void qeth_flush_queue(struct qeth_qdio_out_q *queue)
> static void qeth_check_outbound_queue(struct qeth_qdio_out_q *queue)
> {
> /*
> - * check if weed have to switch to non-packing mode or if
> + * check if we have to switch to non-packing mode or if
> * we have to get a pci flag out on the queue
> */
> if ((atomic_read(&queue->used_buffers) <= QETH_LOW_WATERMARK_PACK) ||
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
* [PATCH net v3 4/4] net: ethernet: cortina: Checksum only TCP and UDP
From: Linus Walleij @ 2023-11-07 9:54 UTC
To: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Michał Mirosław, Vladimir Oltean,
Andrew Lunn
Cc: linux-arm-kernel, netdev, linux-kernel, Linus Walleij
In-Reply-To: <20231107-gemini-largeframe-fix-v3-0-e3803c080b75@linaro.org>
It is a bit odd that frames that are neither TCP nor UDP
(such as ICMP or ARP) are still sent to the checksumming
engine, where they are clearly just ignored.
Rewrite the logic slightly so that we first check whether the
frame is TCP or UDP, and if it is not, bypass the checksum
engine.
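A minimal userspace model of the resulting decision, to make the two bypass conditions explicit. The MODEL_-prefixed constants, the enum and pick_csum_path() are made up for this sketch and are not driver symbols; their values mirror ETH_FRAME_LEN, ETH_P_IP, ETH_P_IPV6, IPPROTO_TCP and IPPROTO_UDP. One deliberate difference from the diff: the sketch only reads the L4 protocol for the two IP ethertypes, whereas the patch's else branch treats every non-IPv4 frame as IPv6.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for kernel constants (hypothetical names, real values). */
#define MODEL_ETH_FRAME_LEN 1514
#define MODEL_ETH_P_IP      0x0800
#define MODEL_ETH_P_IPV6    0x86DD
#define MODEL_IPPROTO_TCP   6
#define MODEL_IPPROTO_UDP   17

enum csum_path {
	CSUM_PATH_BYPASS, /* software checksum if needed, then TSS bypass bit */
	CSUM_PATH_HW_TCP, /* hardware engine computes the TCP checksum */
	CSUM_PATH_HW_UDP, /* hardware engine computes the UDP checksum */
};

/* Only TCP or UDP frames shorter than ETH_FRAME_LEN go to the engine. */
static enum csum_path pick_csum_path(unsigned int len, uint16_t l3proto,
				     uint8_t l4proto)
{
	bool tcp = false, udp = false;

	/* Look at the L4 protocol only for IPv4 and IPv6 frames. */
	if (l3proto == MODEL_ETH_P_IP || l3proto == MODEL_ETH_P_IPV6) {
		tcp = l4proto == MODEL_IPPROTO_TCP;
		udp = l4proto == MODEL_IPPROTO_UDP;
	}

	if (len >= MODEL_ETH_FRAME_LEN || (!tcp && !udp))
		return CSUM_PATH_BYPASS;
	return tcp ? CSUM_PATH_HW_TCP : CSUM_PATH_HW_UDP;
}
```

With these rules an ICMP or ARP frame of any size, and a TCP or UDP frame of ETH_FRAME_LEN bytes or more, ends up on the bypass path.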
Reported-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
drivers/net/ethernet/cortina/gemini.c | 35 +++++++++++++++++++++--------------
1 file changed, 21 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 78287cfcbf63..1bf07505653b 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -1144,6 +1144,7 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
skb_frag_t *skb_frag;
dma_addr_t mapping;
unsigned short mtu;
+ bool tcp, udp;
void *buffer;
int ret;
@@ -1160,7 +1161,18 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
word3 |= mtu;
}
- if (skb->len >= ETH_FRAME_LEN) {
+ /* Check if the protocol is TCP or UDP */
+ tcp = false;
+ udp = false;
+ if (skb->protocol == htons(ETH_P_IP)) {
+ tcp = ip_hdr(skb)->protocol == IPPROTO_TCP;
+ udp = ip_hdr(skb)->protocol == IPPROTO_UDP;
+ } else { /* IPv6 */
+ tcp = ipv6_hdr(skb)->nexthdr == IPPROTO_TCP;
+ udp = ipv6_hdr(skb)->nexthdr == IPPROTO_UDP;
+ }
+
+ if (skb->len >= ETH_FRAME_LEN || (!tcp && !udp)) {
/* Hardware offloaded checksumming isn't working on frames
* bigger than 1514 bytes. A hypothesis about this is that the
* checksum buffer is only 1518 bytes, so when the frames get
@@ -1168,6 +1180,9 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
* overwritten by the FCS.
*
* Just use software checksumming and bypass on bigger frames.
+ *
+ * Bypass the checksumming engine for any protocols that are
+ * not TCP or UDP.
*/
if (skb->ip_summed == CHECKSUM_PARTIAL) {
ret = skb_checksum_help(skb);
@@ -1176,22 +1191,14 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
}
word1 |= TSS_BYPASS_BIT;
} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
- int tcp = 0;
-
- /* We do not switch off the checksumming on non TCP/UDP
- * frames: as is shown from tests, the checksumming engine
- * is smart enough to see that a frame is not actually TCP
- * or UDP and then just pass it through without any changes
- * to the frame.
+ /* If we get here we are dealing with a TCP or UDP frame
+ * which is small enough to be processed by the checksumming
+ * engine.
*/
- if (skb->protocol == htons(ETH_P_IP)) {
+ if (skb->protocol == htons(ETH_P_IP))
word1 |= TSS_IP_CHKSUM_BIT;
- tcp = ip_hdr(skb)->protocol == IPPROTO_TCP;
- } else { /* IPv6 */
+ else
word1 |= TSS_IPV6_ENABLE_BIT;
- tcp = ipv6_hdr(skb)->nexthdr == IPPROTO_TCP;
- }
-
word1 |= tcp ? TSS_TCP_CHKSUM_BIT : TSS_UDP_CHKSUM_BIT;
}
--
2.34.1
* [PATCH net v3 3/4] net: ethernet: cortina: Handle large frames
From: Linus Walleij @ 2023-11-07 9:54 UTC
To: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Michał Mirosław, Vladimir Oltean,
Andrew Lunn
Cc: linux-arm-kernel, netdev, linux-kernel, Linus Walleij
In-Reply-To: <20231107-gemini-largeframe-fix-v3-0-e3803c080b75@linaro.org>
The Gemini ethernet controller provides hardware checksumming
for frames up to 1514 bytes including ethernet headers but not
FCS.
If we start sending bigger frames (after first bumping up the MTU
on both interfaces sending and receiving the frames), truncated
packets start to appear on the target, as in this tcpdump
resulting from ping -s 1474:
23:34:17.241983 14:d6:4d:a8:3c:4f (oui Unknown) > bc:ae:c5:6b:a8:3d (oui Unknown),
ethertype IPv4 (0x0800), length 1514: truncated-ip - 2 bytes missing!
(tos 0x0, ttl 64, id 32653, offset 0, flags [DF], proto ICMP (1), length 1502)
OpenWrt.lan > Fecusia: ICMP echo request, id 1672, seq 50, length 1482
If we bypass the hardware checksumming and provide a software
fallback, everything starts working fine up to the max TX MTU
of 2047 bytes, for example ping -s2000 192.168.1.2:
00:44:29.587598 bc:ae:c5:6b:a8:3d (oui Unknown) > 14:d6:4d:a8:3c:4f (oui Unknown),
ethertype IPv4 (0x0800), length 2042:
(tos 0x0, ttl 64, id 51828, offset 0, flags [none], proto ICMP (1), length 2028)
Fecusia > OpenWrt.lan: ICMP echo reply, id 1683, seq 4, length 2008
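The lengths in both captures follow from the ping payload size. A minimal sketch, assuming a 20-byte IPv4 header without options, an 8-byte ICMP echo header and a 14-byte untagged Ethernet header; the helper names are hypothetical. For ping -s 1474 the untruncated frame would be 1516 bytes, two more than the 1514-byte limit stated above, which is consistent with the "2 bytes missing" in the first capture.

```c
/* Header sizes assumed for the pings in the captures (hypothetical names). */
#define IPV4_HDR_LEN       20   /* IPv4 header without options */
#define ICMP_ECHO_HDR      8    /* ICMP echo request/reply header */
#define ETH_HDR_LEN        14   /* untagged Ethernet header */
#define GEMINI_HW_CSUM_MAX 1514 /* limit stated in the commit message, no FCS */

/* Total length carried in the IPv4 header for `ping -s payload`. */
static unsigned int ping_ip_len(unsigned int payload)
{
	return IPV4_HDR_LEN + ICMP_ECHO_HDR + payload;
}

/* Ethernet frame length without FCS, as tcpdump's "length" field reports it. */
static unsigned int ping_frame_len(unsigned int payload)
{
	return ETH_HDR_LEN + ping_ip_len(payload);
}
```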
The bit that enables bypassing the hardware checksum (like all of
the "TSS" bits) is undocumented in the hardware reference manual.
The entire hardware checksum unit appears undocumented. The need
for the "bypass" bit was found by trial and error.
Since no hardware checksum will happen, we slot in a software
checksum fallback.
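In the fallback, skb_checksum_help() fills in the transport checksum in software. The arithmetic underneath is the Internet checksum of RFC 1071; for CHECKSUM_PARTIAL skbs the stack has already seeded the pseudo-header sum into the checksum field. The sketch below shows only the ones'-complement sum over a flat buffer. It is an illustration, not the kernel's implementation, which uses architecture-optimised csum_partial() and handles fragmented skbs; inet_csum() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: add big-endian 16-bit words in ones'
 * complement arithmetic, then invert. The 32-bit accumulator is safe for
 * buffers far larger than any Ethernet frame. */
static uint16_t inet_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)data[i] << 8 | data[i + 1];
	if (len & 1) /* pad an odd trailing byte with a zero low byte */
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16) /* fold carries back into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Running the sum over data that already contains a correct checksum yields 0, which is how a receiver verifies it.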
Check whether the checksum needs to be computed on the skb, in
hardware or in software, using == CHECKSUM_PARTIAL instead of
!= CHECKSUM_NONE, which is an incomplete check according to
<linux/skbuff.h>.
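For reference, <linux/skbuff.h> defines four ip_summed states. On transmit only CHECKSUM_PARTIAL asks the driver to insert a checksum: CHECKSUM_UNNECESSARY is documented as meaning the same as CHECKSUM_NONE on output, and CHECKSUM_COMPLETE as not used on output. The sketch mirrors those constants under hypothetical MODEL_ names to show where the old and new tests disagree.

```c
#include <stdbool.h>

/* Mirrors of the ip_summed values in <linux/skbuff.h> (hypothetical names). */
enum model_ip_summed {
	MODEL_CHECKSUM_NONE = 0,
	MODEL_CHECKSUM_UNNECESSARY = 1,
	MODEL_CHECKSUM_COMPLETE = 2,
	MODEL_CHECKSUM_PARTIAL = 3,
};

/* Old test: every state except NONE was sent to the checksum engine. */
static bool needs_csum_old(enum model_ip_summed s)
{
	return s != MODEL_CHECKSUM_NONE;
}

/* New test: only PARTIAL requires a checksum from hardware or software. */
static bool needs_csum_new(enum model_ip_summed s)
{
	return s == MODEL_CHECKSUM_PARTIAL;
}
```

The two tests agree for NONE and PARTIAL and differ only for UNNECESSARY and COMPLETE, the states the old check mishandled.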
We delete the code disabling the hardware checksum for large
MTUs: it is suboptimal because it also disables hardware
checksumming on small packets, which the checksumming engine
can handle just fine, and that is a waste of resources.
On the D-Link DIR-685 router this fixes a bug on the conduit
interface to the RTL8366RB DSA switch: as the switch needs room
for its tag, the MTU on the conduit interface is increased to
1504, so when the router sends 1500-byte packets, these get an
extra 4 bytes of DSA tag and the transfer fails because of the
erroneous hardware checksumming, affecting basic functionality
such as the LuCI web interface.
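The sizes in the DIR-685 case can be checked with a little arithmetic. A minimal sketch, assuming the 4-byte tag adds exactly 4 bytes to the frame the conduit driver transmits; the MODEL_ names and helpers are hypothetical. It mirrors the patch's >= comparison, so an untagged full-size frame of exactly ETH_FRAME_LEN (1514) bytes also takes the software path, even though the code comment speaks of frames "bigger than 1514 bytes".

```c
#include <stdbool.h>

#define MODEL_ETH_HLEN      14   /* destination MAC + source MAC + ethertype */
#define MODEL_ETH_FRAME_LEN 1514 /* ETH_HLEN + 1500-byte payload, no FCS */
#define MODEL_DSA_TAG_LEN   4    /* tag size stated in the commit message */

/* Frame length seen by the conduit driver for an L3 packet of l3_len bytes. */
static unsigned int conduit_frame_len(unsigned int l3_len, unsigned int tag_len)
{
	return MODEL_ETH_HLEN + tag_len + l3_len;
}

/* Same comparison as the patch: frames of ETH_FRAME_LEN bytes or more get a
 * software checksum and are sent with the bypass bit set. */
static bool takes_sw_path(unsigned int frame_len)
{
	return frame_len >= MODEL_ETH_FRAME_LEN;
}
```

A tagged 1500-byte IP packet becomes a 1518-byte frame, four bytes past the limit the hardware checksum unit handles.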
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
drivers/net/ethernet/cortina/gemini.c | 34 +++++++++++++++++++++++-----------
1 file changed, 23 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index b21a94b4ab5c..78287cfcbf63 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -1145,6 +1145,7 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
dma_addr_t mapping;
unsigned short mtu;
void *buffer;
+ int ret;
mtu = ETH_HLEN;
mtu += netdev->mtu;
@@ -1159,9 +1160,30 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
word3 |= mtu;
}
- if (skb->ip_summed != CHECKSUM_NONE) {
+ if (skb->len >= ETH_FRAME_LEN) {
+ /* Hardware offloaded checksumming isn't working on frames
+ * bigger than 1514 bytes. A hypothesis about this is that the
+ * checksum buffer is only 1518 bytes, so when the frames get
+ * bigger they get truncated, or the last few bytes get
+ * overwritten by the FCS.
+ *
+ * Just use software checksumming and bypass on bigger frames.
+ */
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ ret = skb_checksum_help(skb);
+ if (ret)
+ return ret;
+ }
+ word1 |= TSS_BYPASS_BIT;
+ } else if (skb->ip_summed == CHECKSUM_PARTIAL) {
int tcp = 0;
+ /* We do not switch off the checksumming on non TCP/UDP
+ * frames: as is shown from tests, the checksumming engine
+ * is smart enough to see that a frame is not actually TCP
+ * or UDP and then just pass it through without any changes
+ * to the frame.
+ */
if (skb->protocol == htons(ETH_P_IP)) {
word1 |= TSS_IP_CHKSUM_BIT;
tcp = ip_hdr(skb)->protocol == IPPROTO_TCP;
@@ -1978,15 +2000,6 @@ static int gmac_change_mtu(struct net_device *netdev, int new_mtu)
return 0;
}
-static netdev_features_t gmac_fix_features(struct net_device *netdev,
- netdev_features_t features)
-{
- if (netdev->mtu + ETH_HLEN + VLAN_HLEN > MTU_SIZE_BIT_MASK)
- features &= ~GMAC_OFFLOAD_FEATURES;
-
- return features;
-}
-
static int gmac_set_features(struct net_device *netdev,
netdev_features_t features)
{
@@ -2212,7 +2225,6 @@ static const struct net_device_ops gmac_351x_ops = {
.ndo_set_mac_address = gmac_set_mac_address,
.ndo_get_stats64 = gmac_get_stats64,
.ndo_change_mtu = gmac_change_mtu,
- .ndo_fix_features = gmac_fix_features,
.ndo_set_features = gmac_set_features,
};
--
2.34.1
* [PATCH net v3 2/4] net: ethernet: cortina: Fix max RX frame define
From: Linus Walleij @ 2023-11-07 9:54 UTC
To: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Michał Mirosław, Vladimir Oltean,
Andrew Lunn
Cc: linux-arm-kernel, netdev, linux-kernel, Linus Walleij
In-Reply-To: <20231107-gemini-largeframe-fix-v3-0-e3803c080b75@linaro.org>
According to the datasheet, enumerator 3 is 1548 bytes,
not 1542.
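To show why the label matters: any code that picks an encoding by comparing a required length against these byte counts will misjudge what enumerator 3 accepts if the label is wrong. The sketch lists the encodings from gemini.h after this patch (leaving out enumerator 6, CONFIG0_MAXLEN_1518__6) and uses a hypothetical "smallest encoding that fits" policy; the driver's own selection logic is not shown in this excerpt, and all names here are made up.

```c
#include <stddef.h>

/* RX max frame length encodings from gemini.h, sorted by size. */
struct maxlen_enc {
	unsigned int bytes; /* maximum frame length in bytes */
	unsigned int val;   /* value written to the CONFIG0 field */
};

static const struct maxlen_enc maxlen_encs[] = {
	{ 1518, 1 },
	{ 1522, 2 },
	{ 1536, 0 },
	{ 1548, 3 }, /* labelled 1542 before this patch */
	{ 9212, 4 },
	{ 10236, 5 },
};

/* Hypothetical policy: smallest encoding that fits len bytes, or -1. */
static int pick_maxlen_val(unsigned int len)
{
	size_t i;

	for (i = 0; i < sizeof(maxlen_encs) / sizeof(maxlen_encs[0]); i++)
		if (maxlen_encs[i].bytes >= len)
			return (int)maxlen_encs[i].val;
	return -1;
}
```

With the corrected label, a 1545-byte requirement maps to enumerator 3; with the old 1542 label the same policy would have skipped to the 9212-byte setting.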
Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
drivers/net/ethernet/cortina/gemini.c | 4 ++--
drivers/net/ethernet/cortina/gemini.h | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index ed9701f8ad9a..b21a94b4ab5c 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -432,8 +432,8 @@ static const struct gmac_max_framelen gmac_maxlens[] = {
.val = CONFIG0_MAXLEN_1536,
},
{
- .max_l3_len = 1542,
- .val = CONFIG0_MAXLEN_1542,
+ .max_l3_len = 1548,
+ .val = CONFIG0_MAXLEN_1548,
},
{
.max_l3_len = 9212,
diff --git a/drivers/net/ethernet/cortina/gemini.h b/drivers/net/ethernet/cortina/gemini.h
index 201b4efe2937..24bb989981f2 100644
--- a/drivers/net/ethernet/cortina/gemini.h
+++ b/drivers/net/ethernet/cortina/gemini.h
@@ -787,7 +787,7 @@ union gmac_config0 {
#define CONFIG0_MAXLEN_1536 0
#define CONFIG0_MAXLEN_1518 1
#define CONFIG0_MAXLEN_1522 2
-#define CONFIG0_MAXLEN_1542 3
+#define CONFIG0_MAXLEN_1548 3
#define CONFIG0_MAXLEN_9k 4 /* 9212 */
#define CONFIG0_MAXLEN_10k 5 /* 10236 */
#define CONFIG0_MAXLEN_1518__6 6
--
2.34.1