* [PATCH 4.14 18/43] brcmfmac: sdio: Fix undefined behavior due to shift overflowing the constant
From: Greg Kroah-Hartman @ 2022-04-26 8:21 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Borislav Petkov, Arend van Spriel,
Franky Lin, Hante Meuleman, Kalle Valo, David S. Miller,
Jakub Kicinski, brcm80211-dev-list.pdl, netdev, Arend van Spriel,
Sasha Levin
In-Reply-To: <20220426081734.509314186@linuxfoundation.org>
From: Borislav Petkov <bp@alien8.de>
[ Upstream commit 6fb3a5868b2117611f41e421e10e6a8c2a13039a ]
Fix:
drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c: In function ‘brcmf_sdio_drivestrengthinit’:
drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c:3798:2: error: case label does not reduce to an integer constant
case SDIOD_DRVSTR_KEY(BRCM_CC_43143_CHIP_ID, 17):
^~~~
drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c:3809:2: error: case label does not reduce to an integer constant
case SDIOD_DRVSTR_KEY(BRCM_CC_43362_CHIP_ID, 13):
^~~~
See https://lore.kernel.org/r/YkwQ6%2BtIH8GQpuct@zn.tnic for the gory
details as to why it triggers with older gccs only.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arend van Spriel <aspriel@gmail.com>
Cc: Franky Lin <franky.lin@broadcom.com>
Cc: Hante Meuleman <hante.meuleman@broadcom.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: brcm80211-dev-list.pdl@broadcom.com
Cc: netdev@vger.kernel.org
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/Ykx0iRlvtBnKqtbG@zn.tnic
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index d198a8780b96..8fa4ffff7c32 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -552,7 +552,7 @@ enum brcmf_sdio_frmtype {
BRCMF_SDIO_FT_SUB,
};
-#define SDIOD_DRVSTR_KEY(chip, pmu) (((chip) << 16) | (pmu))
+#define SDIOD_DRVSTR_KEY(chip, pmu) (((unsigned int)(chip) << 16) | (pmu))
/* SDIO Pad drive strength to select value mappings */
struct sdiod_drive_str {
--
2.35.1
^ permalink raw reply related
* [PATCH 4.14 01/43] etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overead
From: Greg Kroah-Hartman @ 2022-04-26 8:20 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Marc Kleine-Budde, Jakub Kicinski,
David S. Miller, netdev, Kees Cook, Khem Raj
In-Reply-To: <20220426081734.509314186@linuxfoundation.org>
From: Kees Cook <keescook@chromium.org>
commit 2618a0dae09ef37728dab89ff60418cbe25ae6bd upstream.
With GCC 12, -Wstringop-overread was warning about an implicit cast from
char[6] to char[8]. However, the extra 2 bytes are always thrown away,
alignment doesn't matter, and the risk of hitting the edge of unallocated
memory has been accepted, so this prototype can just be converted to a
regular char *. Silences:
net/core/dev.c: In function ‘bpf_prog_run_generic_xdp’: net/core/dev.c:4618:21: warning: ‘ether_addr_equal_64bits’ reading 8 bytes from a region of size 6 [-Wstringop-overread]
4618 | orig_host = ether_addr_equal_64bits(eth->h_dest, > skb->dev->dev_addr);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/core/dev.c:4618:21: note: referencing argument 1 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’}
net/core/dev.c:4618:21: note: referencing argument 2 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’}
In file included from net/core/dev.c:91: include/linux/etherdevice.h:375:20: note: in a call to function ‘ether_addr_equal_64bits’
375 | static inline bool ether_addr_equal_64bits(const u8 addr1[6+2],
| ^~~~~~~~~~~~~~~~~~~~~~~
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://lore.kernel.org/netdev/20220212090811.uuzk6d76agw2vv73@pengutronix.de
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Khem Raj <raj.khem@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/etherdevice.h | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -130,7 +130,7 @@ static inline bool is_multicast_ether_ad
#endif
}
-static inline bool is_multicast_ether_addr_64bits(const u8 addr[6+2])
+static inline bool is_multicast_ether_addr_64bits(const u8 *addr)
{
#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
#ifdef __BIG_ENDIAN
@@ -344,8 +344,7 @@ static inline bool ether_addr_equal(cons
* Please note that alignment of addr1 & addr2 are only guaranteed to be 16 bits.
*/
-static inline bool ether_addr_equal_64bits(const u8 addr1[6+2],
- const u8 addr2[6+2])
+static inline bool ether_addr_equal_64bits(const u8 *addr1, const u8 *addr2)
{
#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
u64 fold = (*(const u64 *)addr1) ^ (*(const u64 *)addr2);
^ permalink raw reply
* [PATCH 4.9 13/24] brcmfmac: sdio: Fix undefined behavior due to shift overflowing the constant
From: Greg Kroah-Hartman @ 2022-04-26 8:21 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Borislav Petkov, Arend van Spriel,
Franky Lin, Hante Meuleman, Kalle Valo, David S. Miller,
Jakub Kicinski, brcm80211-dev-list.pdl, netdev, Arend van Spriel,
Sasha Levin
In-Reply-To: <20220426081731.370823950@linuxfoundation.org>
From: Borislav Petkov <bp@alien8.de>
[ Upstream commit 6fb3a5868b2117611f41e421e10e6a8c2a13039a ]
Fix:
drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c: In function ‘brcmf_sdio_drivestrengthinit’:
drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c:3798:2: error: case label does not reduce to an integer constant
case SDIOD_DRVSTR_KEY(BRCM_CC_43143_CHIP_ID, 17):
^~~~
drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c:3809:2: error: case label does not reduce to an integer constant
case SDIOD_DRVSTR_KEY(BRCM_CC_43362_CHIP_ID, 13):
^~~~
See https://lore.kernel.org/r/YkwQ6%2BtIH8GQpuct@zn.tnic for the gory
details as to why it triggers with older gccs only.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arend van Spriel <aspriel@gmail.com>
Cc: Franky Lin <franky.lin@broadcom.com>
Cc: Hante Meuleman <hante.meuleman@broadcom.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: brcm80211-dev-list.pdl@broadcom.com
Cc: netdev@vger.kernel.org
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/Ykx0iRlvtBnKqtbG@zn.tnic
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index 998a4bd6db78..d8f34883c096 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -547,7 +547,7 @@ enum brcmf_sdio_frmtype {
BRCMF_SDIO_FT_SUB,
};
-#define SDIOD_DRVSTR_KEY(chip, pmu) (((chip) << 16) | (pmu))
+#define SDIOD_DRVSTR_KEY(chip, pmu) (((unsigned int)(chip) << 16) | (pmu))
/* SDIO Pad drive strength to select value mappings */
struct sdiod_drive_str {
--
2.35.1
^ permalink raw reply related
* [PATCH 4.9 01/24] etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overead
From: Greg Kroah-Hartman @ 2022-04-26 8:20 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Marc Kleine-Budde, Jakub Kicinski,
David S. Miller, netdev, Kees Cook, Khem Raj
In-Reply-To: <20220426081731.370823950@linuxfoundation.org>
From: Kees Cook <keescook@chromium.org>
commit 2618a0dae09ef37728dab89ff60418cbe25ae6bd upstream.
With GCC 12, -Wstringop-overread was warning about an implicit cast from
char[6] to char[8]. However, the extra 2 bytes are always thrown away,
alignment doesn't matter, and the risk of hitting the edge of unallocated
memory has been accepted, so this prototype can just be converted to a
regular char *. Silences:
net/core/dev.c: In function ‘bpf_prog_run_generic_xdp’: net/core/dev.c:4618:21: warning: ‘ether_addr_equal_64bits’ reading 8 bytes from a region of size 6 [-Wstringop-overread]
4618 | orig_host = ether_addr_equal_64bits(eth->h_dest, > skb->dev->dev_addr);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/core/dev.c:4618:21: note: referencing argument 1 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’}
net/core/dev.c:4618:21: note: referencing argument 2 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’}
In file included from net/core/dev.c:91: include/linux/etherdevice.h:375:20: note: in a call to function ‘ether_addr_equal_64bits’
375 | static inline bool ether_addr_equal_64bits(const u8 addr1[6+2],
| ^~~~~~~~~~~~~~~~~~~~~~~
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://lore.kernel.org/netdev/20220212090811.uuzk6d76agw2vv73@pengutronix.de
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Khem Raj <raj.khem@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/etherdevice.h | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -125,7 +125,7 @@ static inline bool is_multicast_ether_ad
#endif
}
-static inline bool is_multicast_ether_addr_64bits(const u8 addr[6+2])
+static inline bool is_multicast_ether_addr_64bits(const u8 *addr)
{
#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
#ifdef __BIG_ENDIAN
@@ -339,8 +339,7 @@ static inline bool ether_addr_equal(cons
* Please note that alignment of addr1 & addr2 are only guaranteed to be 16 bits.
*/
-static inline bool ether_addr_equal_64bits(const u8 addr1[6+2],
- const u8 addr2[6+2])
+static inline bool ether_addr_equal_64bits(const u8 *addr1, const u8 *addr2)
{
#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
u64 fold = (*(const u64 *)addr1) ^ (*(const u64 *)addr2);
^ permalink raw reply
* Re: [Patch net-next] net: phy: LAN937x: add interrupt support for link detection
From: patchwork-bot+netdevbpf @ 2022-04-26 8:20 UTC (permalink / raw)
To: Arun Ramadoss
Cc: linux-kernel, netdev, pabeni, kuba, davem, linux, hkallweit1,
andrew, UNGLinuxDriver
In-Reply-To: <20220423154727.29052-1-arun.ramadoss@microchip.com>
Hello:
This patch was applied to netdev/net-next.git (master)
by Paolo Abeni <pabeni@redhat.com>:
On Sat, 23 Apr 2022 21:17:27 +0530 you wrote:
> Added the config_intr and handle_interrupt for the LAN937x phy which is
> same as the LAN87xx phy.
>
> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
> ---
> drivers/net/phy/microchip_t1.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> [...]
Here is the summary with links:
- [net-next] net: phy: LAN937x: add interrupt support for link detection
https://git.kernel.org/netdev/net-next/c/fb0a43f5bd45
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* [PATCH net-next 2/2] net: tcp: add skb drop reasons to route_req()
From: menglong8.dong @ 2022-04-26 8:07 UTC (permalink / raw)
To: kuba
Cc: rostedt, mingo, davem, yoshfuji, dsahern, pabeni, benbjiang,
flyingpeng, imagedong, edumazet, kafai, talalahmad, keescook,
mengensun, dongli.zhang, linux-kernel, netdev
In-Reply-To: <20220426080709.6504-1-imagedong@tencent.com>
From: Menglong Dong <imagedong@tencent.com>
Add skb drop reasons to the route_req() in struct tcp_request_sock_ops.
Following functions are involved:
tcp_v4_route_req()
tcp_v6_route_req()
subflow_v4_route_req()
subflow_v6_route_req()
And the new reason SKB_DROP_REASON_SECURITY is added, which is used when
skb is dropped by LSM.
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
---
include/linux/skbuff.h | 1 +
include/net/tcp.h | 3 ++-
include/trace/events/skb.h | 1 +
net/ipv4/tcp_input.c | 2 +-
net/ipv4/tcp_ipv4.c | 14 +++++++++++---
net/ipv6/tcp_ipv6.c | 14 +++++++++++---
net/mptcp/subflow.c | 10 ++++++----
7 files changed, 33 insertions(+), 12 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f33b3636bbce..5909759e1b95 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -473,6 +473,7 @@ enum skb_drop_reason {
SKB_DROP_REASON_TCP_REQQFULLDROP, /* request queue of the listen
* socket is full
*/
+ SKB_DROP_REASON_SECURITY, /* dropped by LSM */
SKB_DROP_REASON_MAX,
};
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 679b1964d494..01f841611895 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2075,7 +2075,8 @@ struct tcp_request_sock_ops {
struct dst_entry *(*route_req)(const struct sock *sk,
struct sk_buff *skb,
struct flowi *fl,
- struct request_sock *req);
+ struct request_sock *req,
+ enum skb_drop_reason *reason);
u32 (*init_seq)(const struct sk_buff *skb);
u32 (*init_ts_off)(const struct net *net, const struct sk_buff *skb);
int (*send_synack)(const struct sock *sk, struct dst_entry *dst,
diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index de6c93670437..aff57cd43e85 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -82,6 +82,7 @@
EM(SKB_DROP_REASON_PKT_TOO_BIG, PKT_TOO_BIG) \
EM(SKB_DROP_REASON_LISTENOVERFLOWS, LISTENOVERFLOWS) \
EM(SKB_DROP_REASON_TCP_REQQFULLDROP, TCP_REQQFULLDROP) \
+ EM(SKB_DROP_REASON_SECURITY, SECURITY) \
EMe(SKB_DROP_REASON_MAX, MAX)
#undef EM
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e0bbbd624246..2c158593dc37 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6928,7 +6928,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
/* Note: tcp_v6_init_req() might override ir_iif for link locals */
inet_rsk(req)->ir_iif = inet_request_bound_dev_if(sk, skb);
- dst = af_ops->route_req(sk, skb, &fl, req);
+ dst = af_ops->route_req(sk, skb, &fl, req, &reason);
if (!dst)
goto drop_and_free;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index b8daf49f54a5..12acf4823488 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1424,14 +1424,22 @@ static void tcp_v4_init_req(struct request_sock *req,
static struct dst_entry *tcp_v4_route_req(const struct sock *sk,
struct sk_buff *skb,
struct flowi *fl,
- struct request_sock *req)
+ struct request_sock *req,
+ enum skb_drop_reason *reason)
{
+ struct dst_entry *dst;
+
tcp_v4_init_req(req, sk, skb);
- if (security_inet_conn_request(sk, skb, req))
+ if (security_inet_conn_request(sk, skb, req)) {
+ SKB_DR_SET(*reason, SECURITY);
return NULL;
+ }
- return inet_csk_route_req(sk, &fl->u.ip4, req);
+ dst = inet_csk_route_req(sk, &fl->u.ip4, req);
+ if (!dst)
+ SKB_DR_SET(*reason, IP_OUTNOROUTES);
+ return dst;
}
struct request_sock_ops tcp_request_sock_ops __read_mostly = {
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 782df529ff69..d69fef0e0892 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -802,14 +802,22 @@ static void tcp_v6_init_req(struct request_sock *req,
static struct dst_entry *tcp_v6_route_req(const struct sock *sk,
struct sk_buff *skb,
struct flowi *fl,
- struct request_sock *req)
+ struct request_sock *req,
+ enum skb_drop_reason *reason)
{
+ struct dst_entry *dst;
+
tcp_v6_init_req(req, sk, skb);
- if (security_inet_conn_request(sk, skb, req))
+ if (security_inet_conn_request(sk, skb, req)) {
+ SKB_DR_SET(*reason, SECURITY);
return NULL;
+ }
- return inet6_csk_route_req(sk, &fl->u.ip6, req, IPPROTO_TCP);
+ dst = inet6_csk_route_req(sk, &fl->u.ip6, req, IPPROTO_TCP);
+ if (!dst)
+ SKB_DR_SET(*reason, IP_OUTNOROUTES);
+ return dst;
}
struct request_sock_ops tcp6_request_sock_ops __read_mostly = {
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index aba260f547da..03d07165cda6 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -283,7 +283,8 @@ EXPORT_SYMBOL_GPL(mptcp_subflow_init_cookie_req);
static struct dst_entry *subflow_v4_route_req(const struct sock *sk,
struct sk_buff *skb,
struct flowi *fl,
- struct request_sock *req)
+ struct request_sock *req,
+ enum skb_drop_reason *reason)
{
struct dst_entry *dst;
int err;
@@ -291,7 +292,7 @@ static struct dst_entry *subflow_v4_route_req(const struct sock *sk,
tcp_rsk(req)->is_mptcp = 1;
subflow_init_req(req, sk);
- dst = tcp_request_sock_ipv4_ops.route_req(sk, skb, fl, req);
+ dst = tcp_request_sock_ipv4_ops.route_req(sk, skb, fl, req, reason);
if (!dst)
return NULL;
@@ -309,7 +310,8 @@ static struct dst_entry *subflow_v4_route_req(const struct sock *sk,
static struct dst_entry *subflow_v6_route_req(const struct sock *sk,
struct sk_buff *skb,
struct flowi *fl,
- struct request_sock *req)
+ struct request_sock *req,
+ enum skb_drop_reason *reason)
{
struct dst_entry *dst;
int err;
@@ -317,7 +319,7 @@ static struct dst_entry *subflow_v6_route_req(const struct sock *sk,
tcp_rsk(req)->is_mptcp = 1;
subflow_init_req(req, sk);
- dst = tcp_request_sock_ipv6_ops.route_req(sk, skb, fl, req);
+ dst = tcp_request_sock_ipv6_ops.route_req(sk, skb, fl, req, reason);
if (!dst)
return NULL;
--
2.36.0
^ permalink raw reply related
* [PATCH net-next 1/2] net: add skb drop reasons to inet connect request
From: menglong8.dong @ 2022-04-26 8:07 UTC (permalink / raw)
To: kuba
Cc: rostedt, mingo, davem, yoshfuji, dsahern, pabeni, benbjiang,
flyingpeng, imagedong, edumazet, kafai, talalahmad, keescook,
mengensun, dongli.zhang, linux-kernel, netdev
In-Reply-To: <20220426080709.6504-1-imagedong@tencent.com>
From: Menglong Dong <imagedong@tencent.com>
The 'conn_request()' in struct inet_connection_sock_af_ops is used to
process connection requesting for TCP/DCCP. Take TCP for example, it
is just 'tcp_v4_conn_request()'.
When non-zero value is returned by 'tcp_v4_conn_request()', the skb
will be freed by kfree_skb() and a 'reset' packet will be send.
Otherwise, it will be freed normally.
In this code path, 'consume_skb()' is used in many abnormal cases, such
as the accept queue of the listen socket full, which should be
'kfree_skb()'.
Therefore, we make a little change to the 'conn_request()' interface.
When 0 is returned, we call 'consume_skb()' as usual; when negative is
returned, we call 'kfree_skb()' and send a 'reset' as usual; when
positive is returned, which has not happened yet, we do nothing, and
skb will be freed in 'conn_request()'. Then, we can use drop reasons
in 'conn_request()'.
Following new drop reasons are added:
SKB_DROP_REASON_LISTENOVERFLOWS
SKB_DROP_REASON_TCP_REQQFULLDROP
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
---
include/linux/skbuff.h | 4 ++++
include/trace/events/skb.h | 2 ++
net/dccp/input.c | 12 +++++-------
net/ipv4/tcp_input.c | 21 +++++++++++++--------
net/ipv4/tcp_ipv4.c | 3 ++-
5 files changed, 26 insertions(+), 16 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 84d78df60453..f33b3636bbce 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -469,6 +469,10 @@ enum skb_drop_reason {
SKB_DROP_REASON_PKT_TOO_BIG, /* packet size is too big (maybe exceed
* the MTU)
*/
+ SKB_DROP_REASON_LISTENOVERFLOWS, /* accept queue of the listen socket is full */
+ SKB_DROP_REASON_TCP_REQQFULLDROP, /* request queue of the listen
+ * socket is full
+ */
SKB_DROP_REASON_MAX,
};
diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index a477bf907498..de6c93670437 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -80,6 +80,8 @@
EM(SKB_DROP_REASON_IP_INADDRERRORS, IP_INADDRERRORS) \
EM(SKB_DROP_REASON_IP_INNOROUTES, IP_INNOROUTES) \
EM(SKB_DROP_REASON_PKT_TOO_BIG, PKT_TOO_BIG) \
+ EM(SKB_DROP_REASON_LISTENOVERFLOWS, LISTENOVERFLOWS) \
+ EM(SKB_DROP_REASON_TCP_REQQFULLDROP, TCP_REQQFULLDROP) \
EMe(SKB_DROP_REASON_MAX, MAX)
#undef EM
diff --git a/net/dccp/input.c b/net/dccp/input.c
index 2cbb757a894f..ed20dfe83f66 100644
--- a/net/dccp/input.c
+++ b/net/dccp/input.c
@@ -574,8 +574,7 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
struct dccp_sock *dp = dccp_sk(sk);
struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb);
const int old_state = sk->sk_state;
- bool acceptable;
- int queued = 0;
+ int err, queued = 0;
/*
* Step 3: Process LISTEN state
@@ -606,13 +605,12 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
*/
rcu_read_lock();
local_bh_disable();
- acceptable = inet_csk(sk)->icsk_af_ops->conn_request(sk, skb) >= 0;
+ err = inet_csk(sk)->icsk_af_ops->conn_request(sk, skb);
local_bh_enable();
rcu_read_unlock();
- if (!acceptable)
- return 1;
- consume_skb(skb);
- return 0;
+ if (!err)
+ consume_skb(skb);
+ return err < 0;
}
if (dh->dccph_type == DCCP_PKT_RESET)
goto discard;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index daff631b9486..e0bbbd624246 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6411,7 +6411,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
struct inet_connection_sock *icsk = inet_csk(sk);
const struct tcphdr *th = tcp_hdr(skb);
struct request_sock *req;
- int queued = 0;
+ int err, queued = 0;
bool acceptable;
SKB_DR(reason);
@@ -6438,14 +6438,13 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
*/
rcu_read_lock();
local_bh_disable();
- acceptable = icsk->icsk_af_ops->conn_request(sk, skb) >= 0;
+ err = icsk->icsk_af_ops->conn_request(sk, skb);
local_bh_enable();
rcu_read_unlock();
- if (!acceptable)
- return 1;
- consume_skb(skb);
- return 0;
+ if (!err)
+ consume_skb(skb);
+ return err < 0;
}
SKB_DR_SET(reason, TCP_FLAGS);
goto discard;
@@ -6878,6 +6877,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
bool want_cookie = false;
struct dst_entry *dst;
struct flowi fl;
+ SKB_DR(reason);
/* TW buckets are converted to open requests without
* limitations, they conserve resources and peer is
@@ -6886,12 +6886,15 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
if ((net->ipv4.sysctl_tcp_syncookies == 2 ||
inet_csk_reqsk_queue_is_full(sk)) && !isn) {
want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name);
- if (!want_cookie)
+ if (!want_cookie) {
+ SKB_DR_SET(reason, TCP_REQQFULLDROP);
goto drop;
+ }
}
if (sk_acceptq_is_full(sk)) {
NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
+ SKB_DR_SET(reason, LISTENOVERFLOWS);
goto drop;
}
@@ -6947,6 +6950,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
*/
pr_drop_req(req, ntohs(tcp_hdr(skb)->source),
rsk_ops->family);
+ SKB_DR_SET(reason, TCP_REQQFULLDROP);
goto drop_and_release;
}
@@ -7006,7 +7010,8 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
drop_and_free:
__reqsk_free(req);
drop:
+ kfree_skb_reason(skb, reason);
tcp_listendrop(sk);
- return 0;
+ return 1;
}
EXPORT_SYMBOL(tcp_conn_request);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 157265aecbed..b8daf49f54a5 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1470,7 +1470,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
drop:
tcp_listendrop(sk);
- return 0;
+ kfree_skb_reason(skb, SKB_DROP_REASON_IP_INADDRERRORS);
+ return 1;
}
EXPORT_SYMBOL(tcp_v4_conn_request);
--
2.36.0
^ permalink raw reply related
* [PATCH net-next 0/2] net: tcp: add skb drop reasons to connect request
From: menglong8.dong @ 2022-04-26 8:07 UTC (permalink / raw)
To: kuba
Cc: rostedt, mingo, davem, yoshfuji, dsahern, pabeni, benbjiang,
flyingpeng, imagedong, edumazet, kafai, talalahmad, keescook,
mengensun, dongli.zhang, linux-kernel, netdev
From: Menglong Dong <imagedong@tencent.com>
Seems now the reasons of skb drop in TCP layer are almost supported,
except the path of connect requesting. So let's just finish it.
The TCP connect requesting is processed by
'inet_csk(sk)->icsk_af_ops->conn_request()'. Yeah, it's a function
pointer, so it's not easy to add function param to it. Luckily, it's
return value can be reused. For now, 0 means a call of 'consume_skb()'
and -1 means 'kfree_skb()', with a RESET be send. Therefore, we can
free skb with 'kfree_skb_reason()' in 'conn_request()' and return 1.
While 1 is returned, we do nothing outside. This work is done in the
1th patch.
And in the 2th patch, skb drop reasons are added to route_req() in
struct tcp_request_sock_ops by adding a function param to it.
Following new skb drop reasons are added:
SKB_DROP_REASON_LISTENOVERFLOWS
SKB_DROP_REASON_TCP_REQQFULLDROP
SKB_DROP_REASON_SECURITY
Menglong Dong (2):
net: add skb drop reasons to inet connect request
net: tcp: add skb drop reasons to route_req()
include/linux/skbuff.h | 5 +++++
include/net/tcp.h | 3 ++-
include/trace/events/skb.h | 3 +++
net/dccp/input.c | 12 +++++-------
net/ipv4/tcp_input.c | 23 ++++++++++++++---------
net/ipv4/tcp_ipv4.c | 17 +++++++++++++----
net/ipv6/tcp_ipv6.c | 14 +++++++++++---
net/mptcp/subflow.c | 10 ++++++----
8 files changed, 59 insertions(+), 28 deletions(-)
--
2.36.0
^ permalink raw reply
* Re: [PATCH net-next] net: tls: fix async vs NIC crypto offload
From: Gal Pressman @ 2022-04-26 8:06 UTC (permalink / raw)
To: Jakub Kicinski, davem; +Cc: netdev, pabeni, borisp, john.fastabend, daniel
In-Reply-To: <20220425233309.344858-1-kuba@kernel.org>
On 26/04/2022 02:33, Jakub Kicinski wrote:
> When NIC takes care of crypto (or the record has already
> been decrypted) we forget to update darg->async. ->async
> is supposed to mean whether record is async capable on
> input and whether record has been queued for async crypto
> on output.
>
> Reported-by: Gal Pressman <gal@nvidia.com>
> Fixes: 3547a1f9d988 ("tls: rx: use async as an in-out argument")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thanks Jakub!
Tested-by: Gal Pressman <gal@nvidia.com>
^ permalink raw reply
* Re: [RFC PATCH bpf-next 0/2] bpf: Introduce ternary search tree for string key
From: Hou Tao @ 2022-04-26 8:03 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Alexei Starovoitov, Yonghong Song, Daniel Borkmann,
Martin KaFai Lau, Andrii Nakryiko, Song Liu, KP Singh,
David S . Miller, Jakub Kicinski, John Fastabend, Networking, bpf
In-Reply-To: <CAEf4Bzbfct4G7AjVjbaL8LvSGy0NQWeEjoR1BCfeZzdmYx8Tpw@mail.gmail.com>
Hi,
On 4/15/2022 5:25 AM, Andrii Nakryiko wrote:
> On Wed, Apr 13, 2022 at 6:03 PM Hou Tao <houtao1@huawei.com> wrote:
>> Hi,
>>
>> (I send my previous reply in HTML mode mistakenly and the mail list doesn't
>> receive it, so send it again in the plain text mode.)
>>
>> On 4/13/2022 12:09 PM, Andrii Nakryiko wrote:
>>> On Fri, Apr 8, 2022 at 8:08 PM Hou Tao <houtao1@huawei.com> wrote:
>>>> Hi,
>>>>
>>>> On 4/7/2022 1:38 AM, Andrii Nakryiko wrote:
>>>>> On Thu, Mar 31, 2022 at 5:04 AM Hou Tao <houtao1@huawei.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> The initial motivation for the patchset is due to the suggestion of Alexei.
>>>>>> During the discuss of supporting of string key in hash-table, he saw the
>>>>>> space efficiency of ternary search tree under our early test and suggest
>>>>>> us to post it as a new bpf map [1].
>>>>>>
>>>>>> Ternary search tree is a special trie where nodes are arranged in a
>>>>>> manner similar to binary search tree, but with up to three children
>>>>>> rather than two. The three children correpond to nodes whose value is
>>>>>> less than, equal to, and greater than the value of current node
>>>>>> respectively.
>>>>>>
>>>>>> In ternary search tree map, only the valid content of string is saved.
>>>>>> The trailing null byte and unused bytes after it are not saved. If there
>>>>>> are common prefixes between these strings, the prefix is only saved once.
>>>>>> Compared with other space optimized trie (e.g. HAT-trie, succinct trie),
>>>>>> the advantage of ternary search tree is simple and being writeable.
>> snip
>>>>> Have you heard and tried qp-trie ([0]) by any chance? It is elegant
>>>>> and simple data structure. By all the available benchmarks it handily
>>>>> beats Red-Black trees in terms of memory usage and performance (though
>>>>> it of course depends on the data set, just like "memory compression"
>>>>> for ternary tree of yours depends on large set of common prefixes).
>>>>> qp-trie based BPF map seems (at least on paper) like a better
>>>>> general-purpose BPF map that is dynamically sized (avoiding current
>>>>> HASHMAP limitations) and stores keys in sorted order (and thus allows
>>>>> meaningful ordered iteration *and*, importantly for longest prefix
>>>>> match tree, allows efficient prefix matches). I did a quick experiment
>>>>> about a month ago trying to replace libbpf's internal use of hashmap
>>>>> with qp-trie for BTF string dedup and it was slightly slower than
>>>>> hashmap (not surprisingly, though, because libbpf over-sizes hashmap
>>>>> to avoid hash collisions and long chains in buckets), but it was still
>>>>> very decent even in that scenario. So I've been mulling the idea of
>>>>> implementing BPF map based on qp-trie elegant design and ideas, but
>>>>> can't find time to do this.
>>>> I have heard about it when check the space efficient of HAT trie [0], because
>>>> qp-trie needs to save the whole string key in the leaf node and its space
>>>> efficiency can not be better than ternary search tree for strings with common
>>>> prefix, so I did not consider about it. But I will do some benchmarks to check
>>>> the lookup performance and space efficiency of qp-trie and tst for string with
>>>> common prefix and strings without much common prefix.
>>>> If qp-trie is better, I think I can take the time to post it as a bpf map if you
>>>> are OK with that.
>>> You can probably always craft a data set where prefix sharing is so
>>> prevalent that space savings are very significant. But I think for a
>>> lot of real-world data it won't be as extreme and qp-trie might be
>>> very comparable (if not more memory-efficient) due to very compact
>>> node layout (which was the point of qp-trie). So I'd be really curious
>>> to see some comparisons. Would be great if you can try both!
>> It is a bit surprised to me that qp-trie has better memory efficiency (and
>> better lookup performance sometimes) compared with tst when there are not so
>> many common prefix between input strings (All tests below are conducted by
>> implementing the data structure in user-space):
> Thanks for a quick follow up and a benchmark!
>
> Low memory use is probably due to the minimal amount of pointers and
> extra metadata used per node in qp-trie. qp-trie approach is very
> lean, which is why I was recommending trying it out.
>
>> * all unique symbols in /proc/kallsyms (171428 sorted symbols, 4.2MB in total)
>>
>> | qp-trie | tst | hash |
>> total memory used (MB) | 8.6 | 11.2 | 22.3 |
>> total update time (us) | 94623 | 87396 | 24477 |
>> total lookup time (us) | 50681 | 67395 | 22842 |
>>
>> * all strings in BTF string area (115980 unsorted strings, 2MB in total)
>>
>> | qp-trie | tst | hash |
>> total memory used (MB) | 5.0 | 7.3 | 13.5 |
>> total update time (us) | 67764 | 43484 | 16462 |
>> total lookup time (us) | 33732 | 31612 | 16462 |
>>
>> * all strings in BTF string area (115980 sorted string, 2MB in total)
>>
>> | qp-trie | tst | hash |
>> total memory used (MB) | 5.0 | 7.3 | 13.5 |
>> total update time (us) | 58745 | 57756 | 16210 |
>> total lookup time (us) | 26922 | 40850 | 16896 |
>>
>> * all files under Linux kernel (2.7MB, 74359 files generated by find utility
>> with "./" stripped)
>>
>> | qp-trie | tst | hash |
>> total memory used (MB) | 4.6 | 5.2 | 11.6 |
>> total update time (us) | 50422 | 28842 | 15255 |
>> total lookup time (us) | 22543 | 18252 | 11836 |
> Seems like lookup time is more or less on par (and for kallsyms
> noticeably faster), but update is sometimes a bit slower. I don't know
> if you did your own code or used open-source implementation, but keep
> in mind that performance of qp-trie very much depends on fast
> __builtin_popcount, so make sure you are using proper -march when
> compiling. See [0]
>
> [0] https://stackoverflow.com/questions/52161596/why-is-builtin-popcount-slower-than-my-own-bit-counting-function
I used the open source code from github [0] directly. And after adding
-march=native, both the lookup and update performance of qp-trie are improved.
And the lookup performance of qp-trie is always better than tst, but the update
performance of qp-trie is still worse than tst.
[0]: https://github.com/fanf2/qp.git
>> When the length of common prefix increases, ternary search tree becomes better
>> than qp-trie.
>>
>> * all files under Linux kernel with a comm prefix (e.g. "/home/houtao")
>>
>> | qp-trie | tst | hash |
>> total memory used (MB) | 5.5 | 5.2 | 12.2 |
>> total update time (us) | 51558 | 29835 | 15345 |
>> total lookup time (us) | 23121 | 19638 | 11540 |
>>
>> Because the lengthy prefix is not so common, and for string map I think the
>> memory efficiency and lookup performance is more importance than update
>> performance, so maybe qp-trie is a better choice for string map ? Any suggestions ?
>>
> I'm biased :) But I like the idea of qp-trie as a general purpose
> ordered and dynamically sized BPF map. It makes no assumption about
> data being string-like and sharing common prefixes. It can be made to
> work just as fine with any array of bytes, making it very suitable as
> a generic lookup table map. Note that upstream implementation does
> assume zero-terminated strings and no key being a prefix of another
> key. But all that can be removed. For fixed-length keys this can never
> happen by construction, for variable-length keys (and we'll be able to
> support this finally with bpf_dynptr's help very soon), we can record
> length of the key in each leaf and use that during comparisons.
Using the trailing zero byte to make sure no key will be a prefix of another is
simple, but I will check whether or not there is other way to make the bytes
array key work out. Alexei had suggest me to use the length of key as part of
key just like bpf_lpm_trie_key does, maybe i can try it first.
>
> Also note that qp-trie can be internally used by BPF_MAP_TYPE_LPM_TRIE
> very efficiently and speed it up considerable in the process (and
> especially to get rid of the global lock).
>
> So if you were to invest in a proper full-featured production
> implementation of a BPF map, I'd start with qp-trie. From available
> benchmarks it's both faster and more memory efficient than Red-Black
> trees, which could be an alternative underlying implementation of such
> ordered and "resizable" map.
Thanks for your suggestions. I will give it a try.
Regards,
Tao
>> Regards,
>> Tao
>>>>> This prefix sharing is nice when you have a lot of long common
>>>>> prefixes, but I'm a bit skeptical that as a general-purpose BPF data
>>>>> structure it's going to be that beneficial. 192 bytes of common
>>>>> prefixes seems like a very unusual dataset :)
>>>> Yes. The case with common prefix I known is full file path.
>>>>> More specifically about TST implementation in your paches. One global
>>>>> per-map lock I think is a very big downside. We have LPM trie which is
>>>>> very slow in big part due to global lock. It might be possible to
>>>>> design more granular schema for TST, but this whole in-place splitting
>>>>> logic makes this harder. I think qp-trie can be locked in a granular
>>>>> fashion much more easily by having a "hand over hand" locking: lock
>>>>> parent, find child, lock child, unlock parent, move into child node.
>>>>> Something like that would be more scalable overall, especially if the
>>>>> access pattern is not focused on a narrow set of nodes.
>>>> Yes. The global lock is a problem but the splitting is not in-place. I will try
>>>> to figure out whether the lock can be more scalable after the benchmark test
>>>> between qp-trie and tst.
>>> Great, looking forward!
>>>
>>>> Regards,
>>>> Tao
>>>>
>>>> [0]: https://github.com/Tessil/hat-trie
>>>>> Anyways, I love data structures and this one is an interesting idea.
>>>>> But just my few cents of "production-readiness" for general-purpose
>>>>> data structures for BPF.
>>>>>
>>>>> [0] https://dotat.at/prog/qp/README.html
>>>>>
>>>>>> Regards,
>>>>>> Tao
>>>>>>
>>>>>> [1]: https://lore.kernel.org/bpf/CAADnVQJUJp3YBcpESwR3Q1U6GS1mBM=Vp-qYuQX7eZOaoLjdUA@mail.gmail.com/
>>>>>>
>>>>>> Hou Tao (2):
>>>>>> bpf: Introduce ternary search tree for string key
>>>>>> selftests/bpf: add benchmark for ternary search tree map
>>>>>>
>>>>>> include/linux/bpf_types.h | 1 +
>>>>>> include/uapi/linux/bpf.h | 1 +
>>>>>> kernel/bpf/Makefile | 1 +
>>>>>> kernel/bpf/bpf_tst.c | 411 +++++++++++++++++
>>>>>> tools/include/uapi/linux/bpf.h | 1 +
>>>>>> tools/testing/selftests/bpf/Makefile | 5 +-
>>>>>> tools/testing/selftests/bpf/bench.c | 6 +
>>>>>> .../selftests/bpf/benchs/bench_tst_map.c | 415 ++++++++++++++++++
>>>>>> .../selftests/bpf/benchs/run_bench_tst.sh | 54 +++
>>>>>> tools/testing/selftests/bpf/progs/tst_bench.c | 70 +++
>>>>>> 10 files changed, 964 insertions(+), 1 deletion(-)
>>>>>> create mode 100644 kernel/bpf/bpf_tst.c
>>>>>> create mode 100644 tools/testing/selftests/bpf/benchs/bench_tst_map.c
>>>>>> create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_tst.sh
>>>>>> create mode 100644 tools/testing/selftests/bpf/progs/tst_bench.c
>>>>>>
>>>>>> --
>>>>>> 2.31.1
>>>>>>
>>>>> .
>>> .
> .
^ permalink raw reply
* Re: [PATCH net-next v6 08/13] net: wwan: t7xx: Add data path interface
From: Sergey Ryazanov @ 2022-04-26 8:00 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: Ricardo Martinez, Netdev, linux-wireless, Jakub Kicinski,
David Miller, Johannes Berg, Loic Poulain, M Chetan Kumar,
chandrashekar.devegowda, Intel Corporation, chiranjeevi.rapolu,
Haijun Liu (刘海军), amir.hanania,
Andy Shevchenko, dinesh.sharma, eliot.lee, moises.veleta,
pierre-louis.bossart, muralidharan.sethuraman,
Soumya.Prakash.Mishra, sreehari.kancharla, madhusmita.sahu
In-Reply-To: <d829315b-79ca-ff88-c76-e352d8fb5b5b@linux.intel.com>
On Tue, Apr 26, 2022 at 10:30 AM Ilpo Järvinen
<ilpo.jarvinen@linux.intel.com> wrote:
> On Tue, 26 Apr 2022, Sergey Ryazanov wrote:
>> On Fri, Apr 8, 2022 at 1:37 AM Ricardo Martinez
>> <ricardo.martinez@linux.intel.com> wrote:
>>> Data Path Modem AP Interface (DPMAIF) HIF layer provides methods
>>> for initialization, ISR, control and event handling of TX/RX flows.
>>>
>>> DPMAIF TX
>>> Exposes the 'dmpaif_tx_send_skb' function which can be used by the
>>> network device to transmit packets.
>>> The uplink data management uses a Descriptor Ring Buffer (DRB).
>>> First DRB entry is a message type that will be followed by 1 or more
>>> normal DRB entries. Message type DRB will hold the skb information
>>> and each normal DRB entry holds a pointer to the skb payload.
>>>
>>> DPMAIF RX
>>> The downlink buffer management uses Buffer Address Table (BAT) and
>>> Packet Information Table (PIT) rings.
>>> The BAT ring holds the address of skb data buffer for the HW to use,
>>> while the PIT contains metadata about a whole network packet including
>>> a reference to the BAT entry holding the data buffer address.
>>> The driver reads the PIT and BAT entries written by the modem, when
>>> reaching a threshold, the driver will reload the PIT and BAT rings.
>>>
>>> Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
>>> Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
>>> Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
>>> Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
>>>
>>> From a WWAN framework perspective:
>>> Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
>>
>> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
>>
>> and a small question below.
>>
>>> diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_rx.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_rx.c
>>> ...
>>> +static bool t7xx_alloc_and_map_skb_info(const struct dpmaif_ctrl *dpmaif_ctrl,
>>> + const unsigned int size, struct dpmaif_bat_skb *cur_skb)
>>> +{
>>> + dma_addr_t data_bus_addr;
>>> + struct sk_buff *skb;
>>> + size_t data_len;
>>> +
>>> + skb = __dev_alloc_skb(size, GFP_KERNEL);
>>> + if (!skb)
>>> + return false;
>>> +
>>> + data_len = skb_end_pointer(skb) - skb->data;
>>
>> Earlier you use a nice t7xx_skb_data_area_size() function here, but
>> now you opencode it. Is it a consequence of t7xx_common.h removing?
>>
>> I would even encourage you to make this function common and place it
>> into include/linux/skbuff.h with a dedicated patch and call it
>> something like skb_data_size(). Surprisingly, there is no such helper
>> function in the kernel, and several other drivers will benefit from
>> it:
>
> I agree other than the name. IMHO, skb_data_size sounds too much "data
> size" which it exactly isn't but just how large the memory area is (we
> already have "datalen" anyway and on language level, those two don't sound
> different at all). The memory area allocated may or may not have actual
> data in it, I suggested adding "area" into it.
I agree, using the "area" word in the helper name gives more clue
about its purpose, thanks.
--
Sergey
^ permalink raw reply
* Re: [PATCH] net: dsa: mv88e6xxx: Skip cmode writable for mv88e6*41 if unchanged
From: Paolo Abeni @ 2022-04-26 7:50 UTC (permalink / raw)
To: Marek Behún, Nathan Rossi
Cc: netdev, linux-kernel, Andrew Lunn, Vivien Didelot,
Florian Fainelli, Vladimir Oltean, David S. Miller,
Jakub Kicinski
In-Reply-To: <20220423152523.1f38e2d8@thinkpad>
Hello,
On Sat, 2022-04-23 at 15:25 +0200, Marek Behún wrote:
> On Sat, 23 Apr 2022 13:20:35 +0000
> Nathan Rossi <nathan@nathanrossi.com> wrote:
>
> > The mv88e6341_port_set_cmode function always calls the set writable
> > regardless of whether the current cmode is different from the desired
> > cmode. This is problematic for specific configurations of the mv88e6341
> > and mv88e6141 (in single chip adddressing mode?) where the hidden
> > registers are not accessible.
>
> I don't have a problem with skipping setting cmode writable if cmode is
> not being changed. But hidden registers should be accessible regardless
> of whether you are using single chip addressing mode or not. You need
> to find why it isn't working for you, this is a bug.
For the records, I read the above as requiring a fix the root cause, so
I'm not applying this patch. Please correct me if I'm wrong.
Thanks,
Paolo
^ permalink raw reply
* Re: [PATCH v0] mctp: defer the kfree of object mdev->addrs
From: patchwork-bot+netdevbpf @ 2022-04-26 7:50 UTC (permalink / raw)
To: Lin Ma; +Cc: jk, matt, davem, kuba, pabeni, netdev, linux-kernel
In-Reply-To: <20220422114340.32346-1-linma@zju.edu.cn>
Hello:
This patch was applied to netdev/net.git (master)
by Paolo Abeni <pabeni@redhat.com>:
On Fri, 22 Apr 2022 19:43:40 +0800 you wrote:
> The function mctp_unregister() reclaims the device's relevant resource
> when a netcard detaches. However, a running routine may be unaware of
> this and cause the use-after-free of the mdev->addrs object.
>
> The race condition can be demonstrated below
>
> cleanup thread another thread
> |
> unregister_netdev() | mctp_sendmsg()
> ... | ...
> mctp_unregister() | rt = mctp_route_lookup()
> ... | mctl_local_output()
> kfree(mdev->addrs) | ...
> | saddr = rt->dev->addrs[0];
> |
>
> [...]
Here is the summary with links:
- [v0] mctp: defer the kfree of object mdev->addrs
https://git.kernel.org/netdev/net/c/b561275d633b
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH net-next RESEND] net: phy: marvell-88x2222: set proper phydev->port
From: Ivan Bornyakov @ 2022-04-26 7:46 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Jakub Kicinski,
Paolo Abeni, netdev, linux-kernel, system
In-Reply-To: <YmcTkkNcDrtdcGTM@shell.armlinux.org.uk>
On Mon, Apr 25, 2022 at 10:33:06PM +0100, Russell King (Oracle) wrote:
> On Mon, Apr 25, 2022 at 07:16:37AM +0300, Ivan Bornyakov wrote:
> > phydev->port was not set and always reported as PORT_TP.
> > Set phydev->port according to inserted SFP module.
> >
> > Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
> > ---
> > drivers/net/phy/marvell-88x2222.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/net/phy/marvell-88x2222.c b/drivers/net/phy/marvell-88x2222.c
> > index ec4f1407a78c..9f971b37ec35 100644
> > --- a/drivers/net/phy/marvell-88x2222.c
> > +++ b/drivers/net/phy/marvell-88x2222.c
> > @@ -603,6 +603,7 @@ static int mv2222_sfp_insert(void *upstream, const struct sfp_eeprom_id *id)
> > dev = &phydev->mdio.dev;
> >
> > sfp_parse_support(phydev->sfp_bus, id, sfp_supported);
> > + phydev->port = sfp_parse_port(phydev->sfp_bus, id, sfp_supported);
> > sfp_interface = sfp_select_interface(phydev->sfp_bus, sfp_supported);
> >
> > dev_info(dev, "%s SFP module inserted\n", phy_modes(sfp_interface));
> > @@ -639,6 +640,7 @@ static void mv2222_sfp_remove(void *upstream)
> >
> > priv->line_interface = PHY_INTERFACE_MODE_NA;
> > linkmode_zero(priv->supported);
> > + phydev->port = PORT_OTHER;
>
> Can this PHY be used in dual-media mode, auto-switching between copper
> and fibre?
I believe, it can not.
> If so, is PORT_OTHER actually appropriate here, or should
> the old value be saved when the module is inserted and restored when
> it's removed?
Would PORT_NONE be appropriate? Saving old value would be ok also.
^ permalink raw reply
* [PATCH net 1/1] net: stmmac: disable Split Header (SPH) for Intel platforms
From: Tan Tee Min @ 2022-04-26 7:45 UTC (permalink / raw)
To: Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu,
David S . Miller, Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
Michael Sit Wei Hong, Xiaoliang Yang, Wong Vee Khee, Tan Tee Min,
Ling Pei Lee, Bhupesh Sharma, Kurt Kanzenbach
Cc: netdev, linux-stm32, linux-arm-kernel, linux-kernel, stable,
Voon Wei Feng, Song Yoong Siang, Ong, Boon Leong, Tan Tee Min
Based on DesignWare Ethernet QoS datasheet, we are seeing the limitation
of Split Header (SPH) feature is not supported for Ipv4 fragmented packet.
This SPH limitation will cause ping failure when the packets size exceed
the MTU size. For example, the issue happens once the basic ping packet
size is larger than the configured MTU size and the data is lost inside
the fragmented packet, replaced by zeros/corrupted values, and leads to
ping fail.
So, disable the Split Header for Intel platforms.
Cc: <stable@vger.kernel.org> # 5.10.x
Suggested-by: Ong, Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
---
drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c | 1 +
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
include/linux/stmmac.h | 1 +
3 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
index 63754a9c4ba7..0b0be0898ac5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
@@ -454,6 +454,7 @@ static int intel_mgbe_common_data(struct pci_dev *pdev,
plat->has_gmac4 = 1;
plat->force_sf_dma_mode = 0;
plat->tso_en = 1;
+ plat->sph_disable = 1;
/* Multiplying factor to the clk_eee_i clock time
* period to make it closer to 100 ns. This value
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 4a4b3651ab3e..2525a80353b7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -7021,7 +7021,7 @@ int stmmac_dvr_probe(struct device *device,
dev_info(priv->device, "TSO feature enabled\n");
}
- if (priv->dma_cap.sphen) {
+ if (priv->dma_cap.sphen && !priv->plat->sph_disable) {
ndev->hw_features |= NETIF_F_GRO;
priv->sph_cap = true;
priv->sph = priv->sph_cap;
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 24eea1b05ca2..29917850f079 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -270,5 +270,6 @@ struct plat_stmmacenet_data {
int msi_rx_base_vec;
int msi_tx_base_vec;
bool use_phy_wol;
+ bool sph_disable;
};
#endif
--
2.25.1
^ permalink raw reply related
* Re: [PATCH v2 net-next] net: generalize skb freeing deferral to per-cpu lists
From: Paolo Abeni @ 2022-04-26 7:38 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller, Jakub Kicinski; +Cc: netdev, Eric Dumazet
In-Reply-To: <20220422201237.416238-1-eric.dumazet@gmail.com>
Hello,
I'm sorry for the late feedback. I have only a possibly relevant point
below.
On Fri, 2022-04-22 at 13:12 -0700, Eric Dumazet wrote:
[...]
> @@ -6571,6 +6577,28 @@ static int napi_threaded_poll(void *data)
> return 0;
> }
>
> +static void skb_defer_free_flush(struct softnet_data *sd)
> +{
> + struct sk_buff *skb, *next;
> + unsigned long flags;
> +
> + /* Paired with WRITE_ONCE() in skb_attempt_defer_free() */
> + if (!READ_ONCE(sd->defer_list))
> + return;
> +
> + spin_lock_irqsave(&sd->defer_lock, flags);
> + skb = sd->defer_list;
I *think* that this read can possibly be fused with the previous one,
and another READ_ONCE() should avoid that.
BTW it looks like this version gives slightly better results than the
previous one, perhpas due to the single-liked list usage?
Thanks!
Paolo
^ permalink raw reply
* Re: [PATCH bpf-next v3 2/7] ftrace: Fix deadloop caused by direct call in ftrace selftest
From: Xu Kuohai @ 2022-04-26 7:36 UTC (permalink / raw)
To: Steven Rostedt
Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
Catalin Marinas, Will Deacon, Ingo Molnar, Daniel Borkmann,
Alexei Starovoitov, Zi Shen Lim, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, David S . Miller, Hideaki YOSHIFUJI, David Ahern,
Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, hpa,
Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer, Mark Rutland,
Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
Delyan Kratunov, Kumar Kartikeya Dwivedi
In-Reply-To: <20220425110512.538ce0bf@gandalf.local.home>
On 4/25/2022 11:05 PM, Steven Rostedt wrote:
> On Sun, 24 Apr 2022 11:40:23 -0400
> Xu Kuohai <xukuohai@huawei.com> wrote:
>
>> diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
>> index abcadbe933bb..d2eff2b1d743 100644
>> --- a/kernel/trace/trace_selftest.c
>> +++ b/kernel/trace/trace_selftest.c
>> @@ -785,8 +785,24 @@ static struct fgraph_ops fgraph_ops __initdata = {
>> };
>>
>> #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>> +#ifdef CONFIG_ARM64
>
> Please find a way to add this in arm specific code. Do not add architecture
> defines in generic code.
>
> You could add:
>
> #ifndef ARCH_HAVE_FTRACE_DIRECT_TEST_FUNC
> noinline __noclone static void trace_direct_tramp(void) { }
> #endif
>
> here, and in arch/arm64/include/ftrace.h
>
> #define ARCH_HAVE_FTRACE_DIRECT_TEST_FUNC
>
> and define your test function in the arm64 specific code.
>
> -- Steve
>
>
will move this to arch/arm64/ in v4, thanks.
>
>
>> +extern void trace_direct_tramp(void);
>> +
>> +asm (
>> +" .pushsection .text, \"ax\", @progbits\n"
>> +" .type trace_direct_tramp, %function\n"
>> +" .global trace_direct_tramp\n"
>> +"trace_direct_tramp:"
>> +" mov x10, x30\n"
>> +" mov x30, x9\n"
>> +" ret x10\n"
>> +" .size trace_direct_tramp, .-trace_direct_tramp\n"
>> +" .popsection\n"
>> +);
>> +#else
>> noinline __noclone static void trace_direct_tramp(void) { }
>> #endif
>> +#endif
>>
>> /*
>> * Pretty much the same than for the function tracer from which the selftest
>
> .
^ permalink raw reply
* Re: [PATCH net-next v6 08/13] net: wwan: t7xx: Add data path interface
From: Ilpo Järvinen @ 2022-04-26 7:29 UTC (permalink / raw)
To: Sergey Ryazanov
Cc: Ricardo Martinez, Netdev, linux-wireless, Jakub Kicinski,
David Miller, Johannes Berg, Loic Poulain, M Chetan Kumar,
chandrashekar.devegowda, Intel Corporation, chiranjeevi.rapolu,
Haijun Liu (刘海军), amir.hanania,
Andy Shevchenko, dinesh.sharma, eliot.lee, moises.veleta,
pierre-louis.bossart, muralidharan.sethuraman,
Soumya.Prakash.Mishra, sreehari.kancharla, madhusmita.sahu
In-Reply-To: <CAHNKnsTr3aq1sgHnZQFL7-0uHMp3Wt4PMhVgTMSAiiXT=8p35A@mail.gmail.com>
On Tue, 26 Apr 2022, Sergey Ryazanov wrote:
> On Fri, Apr 8, 2022 at 1:37 AM Ricardo Martinez
> <ricardo.martinez@linux.intel.com> wrote:
> > Data Path Modem AP Interface (DPMAIF) HIF layer provides methods
> > for initialization, ISR, control and event handling of TX/RX flows.
> >
> > DPMAIF TX
> > Exposes the 'dmpaif_tx_send_skb' function which can be used by the
> > network device to transmit packets.
> > The uplink data management uses a Descriptor Ring Buffer (DRB).
> > First DRB entry is a message type that will be followed by 1 or more
> > normal DRB entries. Message type DRB will hold the skb information
> > and each normal DRB entry holds a pointer to the skb payload.
> >
> > DPMAIF RX
> > The downlink buffer management uses Buffer Address Table (BAT) and
> > Packet Information Table (PIT) rings.
> > The BAT ring holds the address of skb data buffer for the HW to use,
> > while the PIT contains metadata about a whole network packet including
> > a reference to the BAT entry holding the data buffer address.
> > The driver reads the PIT and BAT entries written by the modem, when
> > reaching a threshold, the driver will reload the PIT and BAT rings.
> >
> > Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
> > Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
> > Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
> > Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
> >
> > From a WWAN framework perspective:
> > Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
>
> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
>
> and a small question below.
>
> > diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_rx.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_rx.c
> > ...
> > +static bool t7xx_alloc_and_map_skb_info(const struct dpmaif_ctrl *dpmaif_ctrl,
> > + const unsigned int size, struct dpmaif_bat_skb *cur_skb)
> > +{
> > + dma_addr_t data_bus_addr;
> > + struct sk_buff *skb;
> > + size_t data_len;
> > +
> > + skb = __dev_alloc_skb(size, GFP_KERNEL);
> > + if (!skb)
> > + return false;
> > +
> > + data_len = skb_end_pointer(skb) - skb->data;
>
> Earlier you use a nice t7xx_skb_data_area_size() function here, but
> now you opencode it. Is it a consequence of t7xx_common.h removing?
>
> I would even encourage you to make this function common and place it
> into include/linux/skbuff.h with a dedicated patch and call it
> something like skb_data_size(). Surprisingly, there is no such helper
> function in the kernel, and several other drivers will benefit from
> it:
I agree other than the name. IMHO, skb_data_size sounds too much "data
size" which it exactly isn't but just how large the memory area is (we
already have "datalen" anyway and on language level, those two don't sound
different at all). The memory area allocated may or may not have actual
data in it, I suggested adding "area" into it.
--
i.
^ permalink raw reply
* Re: [PATCH perf/core 4/5] perf tools: Register perfkprobe libbpf section handler
From: Jiri Olsa @ 2022-04-26 6:58 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Jiri Olsa, Arnaldo Carvalho de Melo, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, linux-perf-use., Networking,
bpf, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
Peter Zijlstra, Martin KaFai Lau, Song Liu, Yonghong Song,
John Fastabend, Ian Rogers
In-Reply-To: <CAEf4BzaGLRYiQtT4_HV1ntAV0Br2yyRo5sZiebVAt9QJ8WVF5g@mail.gmail.com>
On Mon, Apr 25, 2022 at 11:22:54PM -0700, Andrii Nakryiko wrote:
> On Fri, Apr 22, 2022 at 3:01 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Perf is using section name to declare special kprobe arguments,
> > which no longer works with current libbpf, that either requires
> > certain form of the section name or allows to register custom
> > handler.
> >
> > Adding support for 'perfkprobe/' section name handler to take
> > care of perf kprobe programs.
> >
> > The handler servers two purposes:
> > - allows perf programs to have special arguments in section name
> > - allows perf to use pre-load callback where we can attach init
> > code (zeroing all argument registers) to each perf program
> >
> > The second is essential part of new prologue generation code,
> > that's coming in following patch.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> > tools/perf/util/bpf-loader.c | 50 ++++++++++++++++++++++++++++++++++++
> > 1 file changed, 50 insertions(+)
> >
> > diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
> > index f8ad581ea247..92dd8cc18edb 100644
> > --- a/tools/perf/util/bpf-loader.c
> > +++ b/tools/perf/util/bpf-loader.c
> > @@ -86,6 +86,7 @@ bpf_perf_object__next(struct bpf_perf_object *prev)
> > (perf_obj) = (tmp), (tmp) = bpf_perf_object__next(tmp))
> >
> > static bool libbpf_initialized;
> > +static int libbpf_sec_handler;
> >
> > static int bpf_perf_object__add(struct bpf_object *obj)
> > {
> > @@ -99,12 +100,61 @@ static int bpf_perf_object__add(struct bpf_object *obj)
> > return perf_obj ? 0 : -ENOMEM;
> > }
> >
> > +static struct bpf_insn prologue_init_insn[] = {
> > + BPF_MOV64_IMM(BPF_REG_0, 0),
> > + BPF_MOV64_IMM(BPF_REG_1, 0),
> > + BPF_MOV64_IMM(BPF_REG_2, 0),
> > + BPF_MOV64_IMM(BPF_REG_3, 0),
> > + BPF_MOV64_IMM(BPF_REG_4, 0),
> > + BPF_MOV64_IMM(BPF_REG_5, 0),
> > +};
> > +
> > +#define LIBBPF_SEC_PREFIX "perfkprobe/"
>
> libbpf allows to register fallback handler that will handle any SEC()
> definition besides the ones that libbpf handles. Would that work in
> this case instead of adding a custom prefix handler here? See
> prog_tests/custom_sec_handlers.c for example:
>
> fallback_id = libbpf_register_prog_handler(NULL,
> BPF_PROG_TYPE_SYSCALL, 0, &opts);
nice, I did not see that.. that should be better, no need for the prefix
thanks,
jirka
^ permalink raw reply
* Re: [PATCH net-next 00/11] mlxsw: extend line card model by devices and info
From: Jiri Pirko @ 2022-04-26 6:57 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Ido Schimmel, Ido Schimmel, netdev, davem, pabeni, jiri, petrm,
dsahern, andrew, mlxsw
In-Reply-To: <20220425125218.7caa473f@kernel.org>
Mon, Apr 25, 2022 at 09:52:18PM CEST, kuba@kernel.org wrote:
>On Mon, 25 Apr 2022 22:39:57 +0300 Ido Schimmel wrote:
>> > :/ what is a line card device? You must provide document what you're
>> > doing, this:
>> >
>> > .../networking/devlink/devlink-linecard.rst | 4 +
>> >
>> > is not enough.
>> >
>> > How many operations and attributes are you going to copy&paste?
>> >
>> > Is linking devlink instances into a hierarchy a better approach?
>>
>> In this particular case, these devices are gearboxes. They are running
>> their own firmware and we want user space to be able to query and update
>> the running firmware version.
>
>Nothing too special, then, we don't create "devices" for every
>component of the system which can have a separate FW. That's where
>"components" are intended to be used..
*
Sure, that is why I re-used components :)
But you have to somehow link the component to the particular gearbox on
particular line card. Say, you need to flash GB on line card 8. This is
basically providing a way to expose this relationship to user. Also, the
"lc info" shows the FW version for gearboxes. As Ido mentioned, the GB
versions could be listed in "devlink dev info" in theory. But then, you
need to somehow expose the relationship with line card as well.
I don't see any simpler iface than this.
>
>> The idea (implemented in the next patchset) is to let these devices
>> expose their own "component name", which can then be plugged into the
>> existing flash command:
>>
>> $ devlink lc show pci/0000:01:00.0 lc 8
>> pci/0000:01:00.0:
>> lc 8 state active type 16x100G
>> supported_types:
>> 16x100G
>> devices:
>> device 0 flashable true component lc8_dev0
>> device 1 flashable false
>> device 2 flashable false
>> device 3 flashable false
>> $ devlink dev flash pci/0000:01:00.0 file some_file.mfa2 component lc8_dev0
>
>IDK if it's just me or this assumes deep knowledge of the system.
>I don't understand why we need to list devices 1-3 at all. And they
>don't even have names. No information is exposed.
There are 4 gearboxes on the line card. They share the same flash. So if
you flash gearbox 0, the rest will use the same FW.
I'm exposing them for the sake of completeness. Also, the interface
needs to be designed as a list anyway, as different line cards may have
separate flash per gearbox.
What's is the harm in exposing devices 1-3? If you insist, we can hide
them.
>
>There are many components on any networking device, including plenty
>40G-R4 -> 25G-R1 gearboxes out there.
>
>> Registering a separate devlink instance for these devices sounds like an
>> overkill to me. If you are not OK with a separate command (e.g.,
>> DEVLINK_CMD_LINECARD_INFO_GET), then extending DEVLINK_CMD_INFO_GET is
>> also an option. We discussed this during internal review, but felt that
>> the current approach is cleaner.
>
>I don't know what you have queued, so if you don't need a full devlink
>instance (IOW line cards won't need more individual config) that's fine.
Yeah, incoparable, the devlink dev and line card device - gearbox.
>For just FW flashing you can list many devices and update the
>components... no need to introduce new objects or uAPI.
Please see * above.
>
>> > Would you mind if I revert this?
>>
>> I can't stop you, but keep in mind that it's already late here and that
>> tomorrow I'm AFK (reserve duty) and won't be able to tag it. Jiri should
>> be available to continue this discussion tomorrow morning, so probably
>> best to wait for his feedback.
>
>Sure, no rush.
^ permalink raw reply
* Re: [PATCH perf/core 1/5] libbpf: Add bpf_program__set_insns function
From: Jiri Olsa @ 2022-04-26 6:57 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Daniel Borkmann, Jiri Olsa, Arnaldo Carvalho de Melo,
Alexei Starovoitov, Andrii Nakryiko, linux-perf-use., Networking,
bpf, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
Peter Zijlstra, Martin KaFai Lau, Song Liu, Yonghong Song,
John Fastabend, Ian Rogers
In-Reply-To: <CAEf4BzZOKosYRHwK2CfZzpTUcDdrLXPXbYax++Q_PHCMcNdqCw@mail.gmail.com>
On Mon, Apr 25, 2022 at 11:19:09PM -0700, Andrii Nakryiko wrote:
> On Mon, Apr 25, 2022 at 9:22 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 4/22/22 12:00 PM, Jiri Olsa wrote:
> > > Adding bpf_program__set_insns that allows to set new
> > > instructions for program.
> > >
> > > Also moving bpf_program__attach_kprobe_multi_opts on
> > > the proper name sorted place in map file.
>
> would make sense to fix it as a separate patch, it has nothing to do
> with bpf_program__set_insns() API itself
np
>
> > >
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > ---
> > > tools/lib/bpf/libbpf.c | 8 ++++++++
> > > tools/lib/bpf/libbpf.h | 12 ++++++++++++
> > > tools/lib/bpf/libbpf.map | 3 ++-
> > > 3 files changed, 22 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > index 809fe209cdcc..284790d81c1b 100644
> > > --- a/tools/lib/bpf/libbpf.c
> > > +++ b/tools/lib/bpf/libbpf.c
> > > @@ -8457,6 +8457,14 @@ size_t bpf_program__insn_cnt(const struct bpf_program *prog)
> > > return prog->insns_cnt;
> > > }
> > >
> > > +void bpf_program__set_insns(struct bpf_program *prog,
> > > + struct bpf_insn *insns, size_t insns_cnt)
> > > +{
> > > + free(prog->insns);
> > > + prog->insns = insns;
> > > + prog->insns_cnt = insns_cnt;
>
> let's not store user-provided pointer here. Please realloc prog->insns
> as necessary and copy over insns into it.
>
> Also let's at least add the check for prog->loaded and return -EBUSY
> in such a case. And of course this API should return int, not void.
ok, will change
>
> > > +}
> > > +
> > > int bpf_program__set_prep(struct bpf_program *prog, int nr_instances,
> > > bpf_program_prep_t prep)
> > > {
> > > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > > index 05dde85e19a6..b31ad58d335f 100644
> > > --- a/tools/lib/bpf/libbpf.h
> > > +++ b/tools/lib/bpf/libbpf.h
> > > @@ -323,6 +323,18 @@ struct bpf_insn;
> > > * different.
> > > */
> > > LIBBPF_API const struct bpf_insn *bpf_program__insns(const struct bpf_program *prog);
> > > +
> > > +/**
> > > + * @brief **bpf_program__set_insns()** can set BPF program's underlying
> > > + * BPF instructions.
> > > + * @param prog BPF program for which to return instructions
> > > + * @param insn a pointer to an array of BPF instructions
> > > + * @param insns_cnt number of `struct bpf_insn`'s that form
> > > + * specified BPF program
> > > + */
>
> This API makes me want to cry... but I can't come up with anything
> better for perf's use case.
>
> But it can only more or less safely and sanely be used from the
> prog_prepare_load_fn callback, so please add a big warning here saying
> that this is a very advanced libbpf API and the user needs to know
> what they are doing and this should be used from prog_prepare_load_fn
> callback only. If bpf_program__set_insns() is called before
> prog_prepare_load_fn any map/subprog/etc relocation will most probably
> fail or corrupt BPF program code.
will add the warnings
>
> > > +LIBBPF_API void bpf_program__set_insns(struct bpf_program *prog,
> > > + struct bpf_insn *insns, size_t insns_cnt);
>
> s/insns_cnt/insn_cnt/
>
> > > +
> >
> > Iiuc, patch 2 should be squashed into this one given they logically belong to the
> > same change?
> >
> > Fwiw, I think the API description should be elaborated a bit more, in particular that
> > the passed-in insns need to be from allocated dynamic memory which is later on passed
> > to free(), and maybe also constraints at which point in time bpf_program__set_insns()
> > may be called.. (as well as high-level description on potential use cases e.g. around
> > patch 4).
>
> Yep, patch #1 is kind of broken without patch #2, so let's combine them.
ok
thanks,
jirka
^ permalink raw reply
* Re: [PATCH bpf-next 3/4] bpftool: Fix incorrect return in generated detach helper
From: Andrii Nakryiko @ 2022-04-26 6:47 UTC (permalink / raw)
To: Yafang Shao
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
Song Liu, Yonghong Song, john fastabend, KP Singh, Networking,
bpf
In-Reply-To: <20220423140058.54414-4-laoar.shao@gmail.com>
On Sat, Apr 23, 2022 at 7:02 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> There is no return value of bpf_object__detach_skeleton(), so we'd
> better not return it, that is formal.
>
> Fixes: 5dc7a8b21144 ("bpftool, selftests/bpf: Embed object file inside skeleton")
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
> tools/bpf/bpftool/gen.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/bpf/bpftool/gen.c b/tools/bpf/bpftool/gen.c
> index 7678af364793..8f76d8d9996c 100644
> --- a/tools/bpf/bpftool/gen.c
> +++ b/tools/bpf/bpftool/gen.c
> @@ -1171,7 +1171,7 @@ static int do_skeleton(int argc, char **argv)
> static inline void \n\
> %1$s__detach(struct %1$s *obj) \n\
> { \n\
> - return bpf_object__detach_skeleton(obj->skeleton); \n\
> + bpf_object__detach_skeleton(obj->skeleton); \n\
It's not incorrect to return the result of void-returning function in
another void-returning function. C compiler allows this and we rely on
this property very explicitly in macros like BPF_PROG and BPF_KPROBE.
So if anything, it's not a fix, at best improvement, but even then
quite optional.
> } \n\
> ",
> obj_name
> --
> 2.17.1
>
^ permalink raw reply
* Re: [PATCH net-next 00/11] mlxsw: extend line card model by devices and info
From: Jiri Pirko @ 2022-04-26 6:47 UTC (permalink / raw)
To: Ido Schimmel
Cc: Jakub Kicinski, Ido Schimmel, netdev, davem, pabeni, jiri, petrm,
dsahern, andrew, mlxsw
In-Reply-To: <Ymb5DQonnrnIBG3c@shredder>
Mon, Apr 25, 2022 at 09:39:57PM CEST, idosch@idosch.org wrote:
>On Mon, Apr 25, 2022 at 09:00:21AM -0700, Jakub Kicinski wrote:
>> On Mon, 25 Apr 2022 06:44:20 +0300 Ido Schimmel wrote:
>> > This patchset is extending the line card model by three items:
>> > 1) line card devices
>> > 2) line card info
>> > 3) line card device info
>> >
>> > First three patches are introducing the necessary changes in devlink
>> > core.
>> >
>> > Then, all three extensions are implemented in mlxsw alongside with
>> > selftest.
>>
>> :/ what is a line card device? You must provide document what you're
>> doing, this:
>>
>> .../networking/devlink/devlink-linecard.rst | 4 +
>>
>> is not enough.
>>
>> How many operations and attributes are you going to copy&paste?
>>
>> Is linking devlink instances into a hierarchy a better approach?
>
>In this particular case, these devices are gearboxes. They are running
>their own firmware and we want user space to be able to query and update
>the running firmware version.
>
>The idea (implemented in the next patchset) is to let these devices
>expose their own "component name", which can then be plugged into the
>existing flash command:
>
> $ devlink lc show pci/0000:01:00.0 lc 8
> pci/0000:01:00.0:
> lc 8 state active type 16x100G
> supported_types:
> 16x100G
> devices:
> device 0 flashable true component lc8_dev0
> device 1 flashable false
> device 2 flashable false
> device 3 flashable false
> $ devlink dev flash pci/0000:01:00.0 file some_file.mfa2 component lc8_dev0
>
>Registering a separate devlink instance for these devices sounds like an
It is not a separate devlink device, not removetely. A devlink device is
attached to some bus on the host, it contains entities like netdevices,
etc.
Line card devices, on contrary, are accessible over ASIC FW interface,
they reside on line cards. ASIC FW is using build-in SDK to communicate
with them. There is really nothing to expose, except for the face they
are there, with some FW version and later on (follow-up patchset) to be
able to flash FW on them.
It's a gearbox. I found it odd to name it gearbox as in theory, there
might be some other single purpose device on the line card of other type
than gearbox. Therefore, "device". I admit it is a bit misleading. Idea
for a better name?
>overkill to me. If you are not OK with a separate command (e.g.,
>DEVLINK_CMD_LINECARD_INFO_GET), then extending DEVLINK_CMD_INFO_GET is
>also an option. We discussed this during internal review, but felt that
We would need to add all the line card hierarchy into info_get. That
would look a bit odd to me.
>the current approach is cleaner.
>
>> Would you mind if I revert this?
Let's see what you need to change and change it in place, if possible.
>
>I can't stop you, but keep in mind that it's already late here and that
>tomorrow I'm AFK (reserve duty) and won't be able to tag it. Jiri should
>be available to continue this discussion tomorrow morning, so probably
>best to wait for his feedback.
^ permalink raw reply
* Re: [PATCH bpf-next 1/4] libbpf: Define DEFAULT_BPFFS
From: Andrii Nakryiko @ 2022-04-26 6:45 UTC (permalink / raw)
To: Yafang Shao
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
Song Liu, Yonghong Song, john fastabend, KP Singh, Networking,
bpf
In-Reply-To: <20220423140058.54414-2-laoar.shao@gmail.com>
On Sat, Apr 23, 2022 at 7:01 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> Let's use a macro DEFAULT_BPFFS instead of the hard-coded "/sys/fs/bpf".
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
> tools/lib/bpf/bpf_helpers.h | 2 +-
> tools/lib/bpf/libbpf.c | 2 +-
> tools/lib/bpf/libbpf.h | 6 ++++--
> 3 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> index 44df982d2a5c..9161ebcd3466 100644
> --- a/tools/lib/bpf/bpf_helpers.h
> +++ b/tools/lib/bpf/bpf_helpers.h
> @@ -137,7 +137,7 @@ struct bpf_map_def {
>
> enum libbpf_pin_type {
> LIBBPF_PIN_NONE,
> - /* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
> + /* PIN_BY_NAME: pin maps by name (in DEFAULT_BPFFS by default) */
how is this improving things? now I need to grep some more to find out
what's the value of DEFAULT_BPFFS is
> LIBBPF_PIN_BY_NAME,
> };
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 9a213aaaac8a..13fcf91e9e0e 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -2180,7 +2180,7 @@ static int build_map_pin_path(struct bpf_map *map, const char *path)
> int len;
>
> if (!path)
> - path = "/sys/fs/bpf";
> + path = DEFAULT_BPFFS;
>
> len = snprintf(buf, PATH_MAX, "%s/%s", path, bpf_map__name(map));
> if (len < 0)
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index cdbfee60ea3e..3784867811a4 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -28,6 +28,8 @@ LIBBPF_API __u32 libbpf_major_version(void);
> LIBBPF_API __u32 libbpf_minor_version(void);
> LIBBPF_API const char *libbpf_version_string(void);
>
> +#define DEFAULT_BPFFS "/sys/fs/bpf"
> +
> enum libbpf_errno {
> __LIBBPF_ERRNO__START = 4000,
>
> @@ -91,7 +93,7 @@ struct bpf_object_open_opts {
> bool relaxed_core_relocs;
> /* maps that set the 'pinning' attribute in their definition will have
> * their pin_path attribute set to a file in this directory, and be
> - * auto-pinned to that path on load; defaults to "/sys/fs/bpf".
> + * auto-pinned to that path on load; defaults to DEFAULT_BPFFS.
> */
> const char *pin_root_path;
>
> @@ -190,7 +192,7 @@ bpf_object__open_xattr(struct bpf_object_open_attr *attr);
>
> enum libbpf_pin_type {
> LIBBPF_PIN_NONE,
> - /* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
> + /* PIN_BY_NAME: pin maps by name (in DEFAULT_BPFFS by default) */
> LIBBPF_PIN_BY_NAME,
> };
>
> --
> 2.17.1
>
^ permalink raw reply
* Re: [PATCH bpf-next 0/4] bpf: Generate helpers for pinning through bpf object skeleton
From: Andrii Nakryiko @ 2022-04-26 6:45 UTC (permalink / raw)
To: Yafang Shao
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
Song Liu, Yonghong Song, john fastabend, KP Singh, Networking,
bpf
In-Reply-To: <20220423140058.54414-1-laoar.shao@gmail.com>
On Sat, Apr 23, 2022 at 7:01 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> Currently there're helpers for allowing to open/load/attach BPF object
> through BPF object skeleton. Let's also add helpers for pinning through
> BPF object skeleton. It could simplify BPF userspace code which wants to
> pin the progs into bpffs.
>
> After this change, with command 'bpftool gen skeleton XXX.bpf.o', the
> helpers for pinning BPF prog will be generated in BPF object skeleton.
>
> The new helpers are named with __{pin, unpin}_prog, because it only pins
> bpf progs. If the user also wants to pin bpf maps, he can use
> LIBBPF_PIN_BY_NAME.
API says it's pinning programs, but really it's trying to pin links.
But those links might not even be created for non-auto-attachable
programs, and for others users might or might not set
<skel>.links.<prog_name> links.
There are lots of questions about this new functionality... But the
main one is why do we need it? What does it bring that's hard to do
otherwise?
>
> Yafang Shao (4):
> libbpf: Define DEFAULT_BPFFS
> libbpf: Add helpers for pinning bpf prog through bpf object skeleton
> bpftool: Fix incorrect return in generated detach helper
> bpftool: Generate helpers for pinning prog through bpf object skeleton
>
> tools/bpf/bpftool/gen.c | 18 ++++++++++-
> tools/lib/bpf/bpf_helpers.h | 2 +-
> tools/lib/bpf/libbpf.c | 61 ++++++++++++++++++++++++++++++++++++-
> tools/lib/bpf/libbpf.h | 10 ++++--
> tools/lib/bpf/libbpf.map | 2 ++
> 5 files changed, 88 insertions(+), 5 deletions(-)
>
> --
> 2.17.1
>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox