* [PATCH net 1/5] net: can: j1939: enhanced error handling for tightly received RTS messages in xtp_rx_rts_session_new
2024-06-21 11:23 [PATCH net 0/5] pull-request: can 2024-06-21 Marc Kleine-Budde
@ 2024-06-21 11:23 ` Marc Kleine-Budde
2024-06-22 1:20 ` patchwork-bot+netdevbpf
2024-06-21 11:23 ` [PATCH net 2/5] net: can: j1939: Initialize unused data in j1939_send_one() Marc Kleine-Budde
` (3 subsequent siblings)
4 siblings, 1 reply; 7+ messages in thread
From: Marc Kleine-Budde @ 2024-06-21 11:23 UTC (permalink / raw)
To: netdev
Cc: davem, kuba, linux-can, kernel, Oleksij Rempel,
syzbot+daa36413a5cedf799ae4, stable, Marc Kleine-Budde
From: Oleksij Rempel <o.rempel@pengutronix.de>
This patch enhances error handling in scenarios with RTS (Request to
Send) messages arriving closely. It replaces the less informative WARN_ON_ONCE
backtraces with a new error handling method. This provides clearer error
messages and allows for the early termination of problematic sessions.
Previously, sessions were only released at the end of j1939_xtp_rx_rts().
Potentially this could be reproduced with something like:
testj1939 -r vcan0:0x80 &
while true; do
# send first RTS
cansend vcan0 18EC8090#1014000303002301;
# send second RTS
cansend vcan0 18EC8090#1014000303002301;
# send abort
cansend vcan0 18EC8090#ff00000000002301;
done
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Reported-by: syzbot+daa36413a5cedf799ae4@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20231117124959.961171-1-o.rempel@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
net/can/j1939/transport.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c
index fe3df23a2595..c6569f98d251 100644
--- a/net/can/j1939/transport.c
+++ b/net/can/j1939/transport.c
@@ -1593,8 +1593,8 @@ j1939_session *j1939_xtp_rx_rts_session_new(struct j1939_priv *priv,
struct j1939_sk_buff_cb skcb = *j1939_skb_to_cb(skb);
struct j1939_session *session;
const u8 *dat;
+ int len, ret;
pgn_t pgn;
- int len;
netdev_dbg(priv->ndev, "%s\n", __func__);
@@ -1653,7 +1653,22 @@ j1939_session *j1939_xtp_rx_rts_session_new(struct j1939_priv *priv,
session->tskey = priv->rx_tskey++;
j1939_sk_errqueue(session, J1939_ERRQUEUE_RX_RTS);
- WARN_ON_ONCE(j1939_session_activate(session));
+ ret = j1939_session_activate(session);
+ if (ret) {
+ /* Entering this scope indicates an issue with the J1939 bus.
+ * Possible scenarios include:
+ * - A time lapse occurred, and a new session was initiated
+ * due to another packet being sent correctly. This could
+ * have been caused by too long interrupt, debugger, or being
+ * out-scheduled by another task.
+ * - The bus is receiving numerous erroneous packets, either
+ * from a malfunctioning device or during a test scenario.
+ */
+ netdev_alert(priv->ndev, "%s: 0x%p: concurrent session with same addr (%02x %02x) is already active.\n",
+ __func__, session, skcb.addr.sa, skcb.addr.da);
+ j1939_session_put(session);
+ return NULL;
+ }
return session;
}
base-commit: 8851346912a1fa33e7a5966fe51f07313b274627
--
2.43.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH net 2/5] net: can: j1939: Initialize unused data in j1939_send_one()
2024-06-21 11:23 [PATCH net 0/5] pull-request: can 2024-06-21 Marc Kleine-Budde
2024-06-21 11:23 ` [PATCH net 1/5] net: can: j1939: enhanced error handling for tightly received RTS messages in xtp_rx_rts_session_new Marc Kleine-Budde
@ 2024-06-21 11:23 ` Marc Kleine-Budde
2024-06-21 11:23 ` [PATCH net 3/5] net: can: j1939: recover socket queue on CAN bus error during BAM transmission Marc Kleine-Budde
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Marc Kleine-Budde @ 2024-06-21 11:23 UTC (permalink / raw)
To: netdev
Cc: davem, kuba, linux-can, kernel, Shigeru Yoshida,
syzbot+5681e40d297b30f5b513, Oleksij Rempel, stable,
Marc Kleine-Budde
From: Shigeru Yoshida <syoshida@redhat.com>
syzbot reported kernel-infoleak in raw_recvmsg() [1]. j1939_send_one()
creates full frame including unused data, but it doesn't initialize
it. This causes the kernel-infoleak issue. Fix this by initializing
unused data.
[1]
BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline]
BUG: KMSAN: kernel-infoleak in copy_to_user_iter lib/iov_iter.c:24 [inline]
BUG: KMSAN: kernel-infoleak in iterate_ubuf include/linux/iov_iter.h:29 [inline]
BUG: KMSAN: kernel-infoleak in iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
BUG: KMSAN: kernel-infoleak in iterate_and_advance include/linux/iov_iter.h:271 [inline]
BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185
instrument_copy_to_user include/linux/instrumented.h:114 [inline]
copy_to_user_iter lib/iov_iter.c:24 [inline]
iterate_ubuf include/linux/iov_iter.h:29 [inline]
iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
iterate_and_advance include/linux/iov_iter.h:271 [inline]
_copy_to_iter+0x366/0x2520 lib/iov_iter.c:185
copy_to_iter include/linux/uio.h:196 [inline]
memcpy_to_msg include/linux/skbuff.h:4113 [inline]
raw_recvmsg+0x2b8/0x9e0 net/can/raw.c:1008
sock_recvmsg_nosec net/socket.c:1046 [inline]
sock_recvmsg+0x2c4/0x340 net/socket.c:1068
____sys_recvmsg+0x18a/0x620 net/socket.c:2803
___sys_recvmsg+0x223/0x840 net/socket.c:2845
do_recvmmsg+0x4fc/0xfd0 net/socket.c:2939
__sys_recvmmsg net/socket.c:3018 [inline]
__do_sys_recvmmsg net/socket.c:3041 [inline]
__se_sys_recvmmsg net/socket.c:3034 [inline]
__x64_sys_recvmmsg+0x397/0x490 net/socket.c:3034
x64_sys_call+0xf6c/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:300
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was created at:
slab_post_alloc_hook mm/slub.c:3804 [inline]
slab_alloc_node mm/slub.c:3845 [inline]
kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888
kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577
__alloc_skb+0x35b/0x7a0 net/core/skbuff.c:668
alloc_skb include/linux/skbuff.h:1313 [inline]
alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6504
sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2795
sock_alloc_send_skb include/net/sock.h:1842 [inline]
j1939_sk_alloc_skb net/can/j1939/socket.c:878 [inline]
j1939_sk_send_loop net/can/j1939/socket.c:1142 [inline]
j1939_sk_sendmsg+0xc0a/0x2730 net/can/j1939/socket.c:1277
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:745
____sys_sendmsg+0x877/0xb60 net/socket.c:2584
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
__sys_sendmsg net/socket.c:2667 [inline]
__do_sys_sendmsg net/socket.c:2676 [inline]
__se_sys_sendmsg net/socket.c:2674 [inline]
__x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674
x64_sys_call+0xc4b/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:47
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Bytes 12-15 of 16 are uninitialized
Memory access of size 16 starts at ffff888120969690
Data copied to user address 00000000200017c0
CPU: 1 PID: 5050 Comm: syz-executor198 Not tainted 6.9.0-rc5-syzkaller-00031-g71b1543c83d6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Reported-and-tested-by: syzbot+5681e40d297b30f5b513@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=5681e40d297b30f5b513
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Link: https://lore.kernel.org/all/20240517035953.2617090-1-syoshida@redhat.com
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
net/can/j1939/main.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c
index a6fb89fa6278..7e8a20f2fc42 100644
--- a/net/can/j1939/main.c
+++ b/net/can/j1939/main.c
@@ -30,10 +30,6 @@ MODULE_ALIAS("can-proto-" __stringify(CAN_J1939));
/* CAN_HDR: #bytes before can_frame data part */
#define J1939_CAN_HDR (offsetof(struct can_frame, data))
-/* CAN_FTR: #bytes beyond data part */
-#define J1939_CAN_FTR (sizeof(struct can_frame) - J1939_CAN_HDR - \
- sizeof(((struct can_frame *)0)->data))
-
/* lowest layer */
static void j1939_can_recv(struct sk_buff *iskb, void *data)
{
@@ -342,7 +338,7 @@ int j1939_send_one(struct j1939_priv *priv, struct sk_buff *skb)
memset(cf, 0, J1939_CAN_HDR);
/* make it a full can frame again */
- skb_put(skb, J1939_CAN_FTR + (8 - dlc));
+ skb_put_zero(skb, 8 - dlc);
canid = CAN_EFF_FLAG |
(skcb->priority << 26) |
--
2.43.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH net 5/5] can: mcp251xfd: fix infinite loop when xmit fails
2024-06-21 11:23 [PATCH net 0/5] pull-request: can 2024-06-21 Marc Kleine-Budde
` (3 preceding siblings ...)
2024-06-21 11:23 ` [PATCH net 4/5] can: kvaser_usb: fix return value for hif_usb_send_regout Marc Kleine-Budde
@ 2024-06-21 11:23 ` Marc Kleine-Budde
4 siblings, 0 replies; 7+ messages in thread
From: Marc Kleine-Budde @ 2024-06-21 11:23 UTC (permalink / raw)
To: netdev
Cc: davem, kuba, linux-can, kernel, Vitor Soares, stable,
Marc Kleine-Budde
From: Vitor Soares <vitor.soares@toradex.com>
When the mcp251xfd_start_xmit() function fails, the driver stops
processing messages, and the interrupt routine does not return,
running indefinitely even after killing the running application.
Error messages:
[ 441.298819] mcp251xfd spi2.0 can0: ERROR in mcp251xfd_start_xmit: -16
[ 441.306498] mcp251xfd spi2.0 can0: Transmit Event FIFO buffer not empty. (seq=0x000017c7, tef_tail=0x000017cf, tef_head=0x000017d0, tx_head=0x000017d3).
... and repeat forever.
The issue can be triggered when multiple devices share the same SPI
interface. And there is concurrent access to the bus.
The problem occurs because tx_ring->head increments even if
mcp251xfd_start_xmit() fails. Consequently, the driver skips one TX
package while still expecting a response in
mcp251xfd_handle_tefif_one().
Resolve the issue by starting a workqueue to write the tx obj
synchronously if err = -EBUSY. In case of another error, decrement
tx_ring->head, remove skb from the echo stack, and drop the message.
Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Cc: stable@vger.kernel.org
Signed-off-by: Vitor Soares <vitor.soares@toradex.com>
Link: https://lore.kernel.org/all/20240517134355.770777-1-ivitro@gmail.com
[mkl: use more imperative wording in patch description]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
.../net/can/spi/mcp251xfd/mcp251xfd-core.c | 14 ++++-
drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c | 55 ++++++++++++++++---
drivers/net/can/spi/mcp251xfd/mcp251xfd.h | 5 ++
3 files changed, 65 insertions(+), 9 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
index 1d9057dc44f2..bf1589aef1fc 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
@@ -1618,11 +1618,20 @@ static int mcp251xfd_open(struct net_device *ndev)
clear_bit(MCP251XFD_FLAGS_DOWN, priv->flags);
can_rx_offload_enable(&priv->offload);
+ priv->wq = alloc_ordered_workqueue("%s-mcp251xfd_wq",
+ WQ_FREEZABLE | WQ_MEM_RECLAIM,
+ dev_name(&spi->dev));
+ if (!priv->wq) {
+ err = -ENOMEM;
+ goto out_can_rx_offload_disable;
+ }
+ INIT_WORK(&priv->tx_work, mcp251xfd_tx_obj_write_sync);
+
err = request_threaded_irq(spi->irq, NULL, mcp251xfd_irq,
IRQF_SHARED | IRQF_ONESHOT,
dev_name(&spi->dev), priv);
if (err)
- goto out_can_rx_offload_disable;
+ goto out_destroy_workqueue;
err = mcp251xfd_chip_interrupts_enable(priv);
if (err)
@@ -1634,6 +1643,8 @@ static int mcp251xfd_open(struct net_device *ndev)
out_free_irq:
free_irq(spi->irq, priv);
+ out_destroy_workqueue:
+ destroy_workqueue(priv->wq);
out_can_rx_offload_disable:
can_rx_offload_disable(&priv->offload);
set_bit(MCP251XFD_FLAGS_DOWN, priv->flags);
@@ -1661,6 +1672,7 @@ static int mcp251xfd_stop(struct net_device *ndev)
hrtimer_cancel(&priv->tx_irq_timer);
mcp251xfd_chip_interrupts_disable(priv);
free_irq(ndev->irq, priv);
+ destroy_workqueue(priv->wq);
can_rx_offload_disable(&priv->offload);
mcp251xfd_timestamp_stop(priv);
mcp251xfd_chip_stop(priv, CAN_STATE_STOPPED);
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
index 160528d3cc26..b1de8052a45c 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
@@ -131,6 +131,39 @@ mcp251xfd_tx_obj_from_skb(const struct mcp251xfd_priv *priv,
tx_obj->xfer[0].len = len;
}
+static void mcp251xfd_tx_failure_drop(const struct mcp251xfd_priv *priv,
+ struct mcp251xfd_tx_ring *tx_ring,
+ int err)
+{
+ struct net_device *ndev = priv->ndev;
+ struct net_device_stats *stats = &ndev->stats;
+ unsigned int frame_len = 0;
+ u8 tx_head;
+
+ tx_ring->head--;
+ stats->tx_dropped++;
+ tx_head = mcp251xfd_get_tx_head(tx_ring);
+ can_free_echo_skb(ndev, tx_head, &frame_len);
+ netdev_completed_queue(ndev, 1, frame_len);
+ netif_wake_queue(ndev);
+
+ if (net_ratelimit())
+ netdev_err(priv->ndev, "ERROR in %s: %d\n", __func__, err);
+}
+
+void mcp251xfd_tx_obj_write_sync(struct work_struct *work)
+{
+ struct mcp251xfd_priv *priv = container_of(work, struct mcp251xfd_priv,
+ tx_work);
+ struct mcp251xfd_tx_obj *tx_obj = priv->tx_work_obj;
+ struct mcp251xfd_tx_ring *tx_ring = priv->tx;
+ int err;
+
+ err = spi_sync(priv->spi, &tx_obj->msg);
+ if (err)
+ mcp251xfd_tx_failure_drop(priv, tx_ring, err);
+}
+
static int mcp251xfd_tx_obj_write(const struct mcp251xfd_priv *priv,
struct mcp251xfd_tx_obj *tx_obj)
{
@@ -162,6 +195,11 @@ static bool mcp251xfd_tx_busy(const struct mcp251xfd_priv *priv,
return false;
}
+static bool mcp251xfd_work_busy(struct work_struct *work)
+{
+ return work_busy(work);
+}
+
netdev_tx_t mcp251xfd_start_xmit(struct sk_buff *skb,
struct net_device *ndev)
{
@@ -175,7 +213,8 @@ netdev_tx_t mcp251xfd_start_xmit(struct sk_buff *skb,
if (can_dev_dropped_skb(ndev, skb))
return NETDEV_TX_OK;
- if (mcp251xfd_tx_busy(priv, tx_ring))
+ if (mcp251xfd_tx_busy(priv, tx_ring) ||
+ mcp251xfd_work_busy(&priv->tx_work))
return NETDEV_TX_BUSY;
tx_obj = mcp251xfd_get_tx_obj_next(tx_ring);
@@ -193,13 +232,13 @@ netdev_tx_t mcp251xfd_start_xmit(struct sk_buff *skb,
netdev_sent_queue(priv->ndev, frame_len);
err = mcp251xfd_tx_obj_write(priv, tx_obj);
- if (err)
- goto out_err;
-
- return NETDEV_TX_OK;
-
- out_err:
- netdev_err(priv->ndev, "ERROR in %s: %d\n", __func__, err);
+ if (err == -EBUSY) {
+ netif_stop_queue(ndev);
+ priv->tx_work_obj = tx_obj;
+ queue_work(priv->wq, &priv->tx_work);
+ } else if (err) {
+ mcp251xfd_tx_failure_drop(priv, tx_ring, err);
+ }
return NETDEV_TX_OK;
}
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
index 24510b3b8020..b35bfebd23f2 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
@@ -633,6 +633,10 @@ struct mcp251xfd_priv {
struct mcp251xfd_rx_ring *rx[MCP251XFD_FIFO_RX_NUM];
struct mcp251xfd_tx_ring tx[MCP251XFD_FIFO_TX_NUM];
+ struct workqueue_struct *wq;
+ struct work_struct tx_work;
+ struct mcp251xfd_tx_obj *tx_work_obj;
+
DECLARE_BITMAP(flags, __MCP251XFD_FLAGS_SIZE__);
u8 rx_ring_num;
@@ -952,6 +956,7 @@ void mcp251xfd_skb_set_timestamp(const struct mcp251xfd_priv *priv,
void mcp251xfd_timestamp_init(struct mcp251xfd_priv *priv);
void mcp251xfd_timestamp_stop(struct mcp251xfd_priv *priv);
+void mcp251xfd_tx_obj_write_sync(struct work_struct *work);
netdev_tx_t mcp251xfd_start_xmit(struct sk_buff *skb,
struct net_device *ndev);
--
2.43.0
^ permalink raw reply related [flat|nested] 7+ messages in thread