* Re: [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl
From: Stefano Garzarella @ 2026-06-16 14:26 UTC
To: Andrey Drobyshev
Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den
In-Reply-To: <129f5833-3a7f-4b2d-a965-20903e4e2fb5@virtuozzo.com>
On Tue, Jun 16, 2026 at 05:10:38PM +0300, Andrey Drobyshev wrote:
>On 6/16/26 4:48 PM, Stefano Garzarella wrote:
>> On Fri, Jun 12, 2026 at 07:57:16PM +0300, Andrey Drobyshev wrote:
>>> From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>>>
>>> This ioctl is needed for QEMU's CPR (checkpoint-restore) migration of
>>> the guest with vhost-vsock device. For this to work, we need to reset
>>> the device ownership on the source side by calling RESET_OWNER, and then
>>> claim it on the dest side by calling SET_OWNER. We expect not to lose any
>>> AF_VSOCK connection while this happens.
>>>
>>> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>>> ---
>>> drivers/vhost/vsock.c | 28 ++++++++++++++++++++++++++++
>>> 1 file changed, 28 insertions(+)
>>>
>>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>>> index b12221ce6faf..e629886e5cf8 100644
>>> --- a/drivers/vhost/vsock.c
>>> +++ b/drivers/vhost/vsock.c
>>> @@ -894,6 +894,32 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
>>> return -EFAULT;
>>> }
>>>
>>> +static int vhost_vsock_reset_owner(struct vhost_vsock *vsock)
>>> +{
>>> + struct vhost_iotlb *umem;
>>> + long err;
>>> +
>>> + mutex_lock(&vsock->dev.mutex);
>>> + err = vhost_dev_check_owner(&vsock->dev);
>>> + if (err)
>>> + goto done;
>>> + umem = vhost_dev_reset_owner_prepare();
>>> + if (!umem) {
>>> + err = -ENOMEM;
>>> + goto done;
>>> + }
>>> + /* Follows vhost_vsock_dev_release closely except for guest_cid drop */
>>> + vsock_for_each_connected_socket(&vhost_transport.transport,
>>> + vhost_vsock_reset_orphans);
>>
>> In vhost_vsock_reset_orphans() we have:
>>
>> rcu_read_lock();
>>
>> /* If the peer is still valid, no need to reset connection */
>> if (vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk))) {
>> rcu_read_unlock();
>> return;
>> }
>>
>> IIUC we are not removing the guest cid from the hash table, so this
>> check will be always true, and nothing is done.
>>
>> So, is this call really useful?
>>
>
>You're right, and it's most probably an artifact from mimicking the
>vhost_vsock_dev_release() implementation, as mentioned in the comment.
>In our case this whole iteration is a no-op, so we'd better remove it.
>
>BTW, earlier I received some feedback from the Sashiko AI reviewer, which
>also spotted that same issue (and some more interesting races):
>
>https://sashiko.dev/#/patchset/20260612165718.433546-1-andrey.drobyshev@virtuozzo.com
Oh, they seem similar to the Claude comments I included in my reply to
patch 3.
Yeah, we should take a look; they seem to be real issues.
>
>Apparently it only CCs its reviews to kvm@vger.kernel.org, so you can't
>see them right away. Just wanted to let you know to save you some time
>here. I'll send a v2 addressing the Sashiko remarks. But of course it
>would be great if you spot some more issues here.
>
Thanks for pointing that out, but in general I try to do my reviews
before looking at AI reviews (either Sashiko or Claude run locally) to
avoid being too biased.
Thanks,
Stefano
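Stefano's point can be illustrated with a simplified user-space model (not the kernel code). All names are hypothetical: `cid_model_get()` stands in for vhost_vsock_get(), and `cid_model_reset_orphan()` mirrors the early return in vhost_vsock_reset_orphans(). Because the proposed RESET_OWNER keeps the guest CID hashed, the scan never resets anything, whereas after a dev_release()-style CID removal it would.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CID hash: a single guest entry. */
struct cid_model {
	unsigned int guest_cid;
	bool hashed;    /* true while the CID is in the hash table */
};

/* Plays the role of vhost_vsock_get(): NULL if the CID is not hashed. */
static struct cid_model *cid_model_get(struct cid_model *m, unsigned int cid)
{
	return (m->hashed && m->guest_cid == cid) ? m : NULL;
}

/* Mirrors the early return in vhost_vsock_reset_orphans().
 * Returns true if the connection to remote_cid would be reset. */
static bool cid_model_reset_orphan(struct cid_model *m, unsigned int remote_cid)
{
	if (cid_model_get(m, remote_cid))
		return false;   /* peer still valid: nothing to do */
	return true;
}

/* RESET_OWNER as proposed: the guest CID is deliberately kept hashed. */
static void cid_model_reset_owner(struct cid_model *m)
{
	(void)m;
}

/* dev_release(): the CID is removed from the hash before the scan. */
static void cid_model_release(struct cid_model *m)
{
	m->hashed = false;
}
```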
* [PATCH net] net: ena: clean up XDP TX queues when regular TX setup fails
From: Dawei Feng @ 2026-06-16 14:24 UTC
To: akiyano
Cc: darinzon, andrew+netdev, davem, edumazet, kuba, pabeni, ast,
daniel, hawk, john.fastabend, sdf, sameehj, netdev, linux-kernel,
bpf, jianhao.xu, Dawei Feng, stable
create_queues_with_size_backoff() creates XDP TX queues before setting
up the regular TX path. If the subsequent allocation or creation of
regular TX queues fails, the error handling paths omit the teardown of the
XDP TX queues, leading to a resource leak.
Fix this by explicitly destroying the XDP TX queue subset at the two
missing failure points.
The bug was first flagged by an experimental analysis tool we are
developing for kernel memory-management bugs while analyzing
v6.13-rc1. The tool is still under development and is not yet publicly
available. Manual inspection confirms that the bug is still
present in v7.1-rc7.
An x86_64 allyesconfig build showed no new warnings. As we do not have
an ENA device to test with, no runtime testing could be performed.
Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action")
Cc: stable@vger.kernel.org
Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
---
drivers/net/ethernet/amazon/ena/ena_netdev.c | 23 ++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 92d149d4f091..5d05020a6d05 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -752,6 +752,18 @@ static void ena_destroy_all_tx_queues(struct ena_adapter *adapter)
}
}
+static void ena_destroy_xdp_tx_queues(struct ena_adapter *adapter)
+{
+ u16 ena_qid;
+ int i;
+
+ for (i = adapter->xdp_first_ring;
+ i < adapter->xdp_first_ring + adapter->xdp_num_queues; i++) {
+ ena_qid = ENA_IO_TXQ_IDX(i);
+ ena_com_destroy_io_queue(adapter->ena_dev, ena_qid);
+ }
+}
+
static void ena_destroy_all_rx_queues(struct ena_adapter *adapter)
{
u16 ena_qid;
@@ -2078,14 +2090,21 @@ static int create_queues_with_size_backoff(struct ena_adapter *adapter)
rc = ena_setup_tx_resources_in_range(adapter,
0,
adapter->num_io_queues);
- if (rc)
+ if (rc) {
+ ena_destroy_xdp_tx_queues(adapter);
+ ena_free_all_io_tx_resources_in_range(adapter,
+ adapter->xdp_first_ring,
+ adapter->xdp_num_queues);
goto err_setup_tx;
+ }
rc = ena_create_io_tx_queues_in_range(adapter,
0,
adapter->num_io_queues);
- if (rc)
+ if (rc) {
+ ena_destroy_xdp_tx_queues(adapter);
goto err_create_tx_queues;
+ }
rc = ena_setup_all_rx_resources(adapter);
if (rc)
--
2.34.1
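The unwind rule the patch applies (once the XDP TX queues exist, every later failure must tear them down) can be sketched as a small user-space model. The counters and helpers are hypothetical stand-ins for the ENA queue objects, and the model collapses the ring-resource free and the queue destroy done in the first error path into a single step.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical counters standing in for the ENA queue objects. */
struct ena_model {
	int xdp_tx_live;       /* XDP TX queues currently created */
	int tx_live;           /* regular TX queues currently created */
	bool fail_tx_setup;    /* inject a failure into regular TX setup */
	bool fail_tx_create;   /* inject a failure into regular TX creation */
};

static void ena_model_create_xdp_tx(struct ena_model *a, int n)
{
	a->xdp_tx_live += n;
}

static void ena_model_destroy_xdp_tx(struct ena_model *a)
{
	a->xdp_tx_live = 0;
}

static int ena_model_setup_tx(struct ena_model *a)
{
	return a->fail_tx_setup ? -ENOMEM : 0;
}

static int ena_model_create_tx(struct ena_model *a, int n)
{
	if (a->fail_tx_create)
		return -ENOMEM;
	a->tx_live += n;
	return 0;
}

/* Fixed flow: once XDP TX queues exist, each later failure tears them
 * down before returning. */
static int ena_model_create_queues(struct ena_model *a, int n_xdp, int n_tx)
{
	int rc;

	ena_model_create_xdp_tx(a, n_xdp);

	rc = ena_model_setup_tx(a);
	if (rc) {
		ena_model_destroy_xdp_tx(a);   /* missing before the fix */
		return rc;
	}

	rc = ena_model_create_tx(a, n_tx);
	if (rc) {
		ena_model_destroy_xdp_tx(a);   /* missing before the fix */
		return rc;
	}
	return 0;
}
```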
* Re: [PATCH 4/4] vhost/vsock: re-scan TX virtqueue on device start
From: Stefano Garzarella @ 2026-06-16 14:23 UTC
To: Andrey Drobyshev
Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den
In-Reply-To: <20260612165718.433546-5-andrey.drobyshev@virtuozzo.com>
On Fri, Jun 12, 2026 at 07:57:18PM +0300, Andrey Drobyshev wrote:
>During QEMU CPR live-update (and VHOST_RESET_OWNER in general) the guest
>keeps running while the host drops and later re-attaches vhost backends.
>If the guest adds a buffer to the TX virtqueue (guest->host) and kicks
>while the backend is temporarily NULL (between vhost_vsock_drop_backends()
>and the next vhost_vsock_start()), then the kick is delivered to the
>vhost worker, handle_tx_kick() sees a NULL backend and returns, and the
>kick signal is consumed. The buffer is then left in the ring.
>
>Then upon device start vhost_vsock_start() only re-kicks the RX send
>worker, never the TX VQ, so the buffer is processed only if the guest
>happens to kick again. But if the guest itself is now waiting for data
>from the host, it will never kick TX VQ again, and we end up in a
>deadlock.
>
>The deadlock reproduces during an active host->guest socat data transfer
>across multiple consecutive CPR live-updates.
>
>To fix this, in vhost_vsock_start(), after kicking the RX send worker, also
>queue the TX vq poll so any buffers the guest enqueued while we were paused
>get scanned.
Again, it seems like we're fixing an issue that existed before this
series, but IIUC without support for VHOST_RESET_OWNER this could never
have happened, so the wording should be changed to make it clear that
this can only happen with the new VHOST_RESET_OWNER support.
In addition, this patch must also be applied before the
VHOST_RESET_OWNER support or merged into it.
>
>Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>---
> drivers/vhost/vsock.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index bcaba36becd7..1fcfe71d18be 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -655,6 +655,12 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
> */
> vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX], &vsock->send_pkt_work);
>
>+ /*
>+ * Some packets might've also been queued in TX VQ. Re-scan it here,
>+ * mirroring the RX send-worker kick above.
>+ */
Can we also mention that this is related to VHOST_RESET_OWNER?
Thanks,
Stefano
>+ vhost_poll_queue(&vsock->vqs[VSOCK_VQ_TX].poll);
>+
> mutex_unlock(&vsock->dev.mutex);
> return 0;
>
>--
>2.47.1
>
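A minimal single-threaded model of the lost kick, using hypothetical names: `tx_model_handle_kick()` mirrors the NULL-backend early return in handle_tx_kick(), and the `rescan_tx` flag in `tx_model_start()` plays the role of the added vhost_poll_queue() call. In the kernel the re-scan is queued to the worker rather than run inline; this sketch calls the handler directly for simplicity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the TX virtqueue and its backend pointer. */
struct tx_model {
	void *backend;   /* NULL between drop_backends() and start() */
	int pending;     /* buffers the guest added but the host has not consumed */
	int processed;   /* buffers the host has consumed */
};

/* Mirrors handle_tx_kick(): with no backend, the kick is simply consumed. */
static void tx_model_handle_kick(struct tx_model *vq)
{
	if (!vq->backend)
		return;         /* buffers stay in the ring */
	vq->processed += vq->pending;
	vq->pending = 0;
}

/* Guest adds one buffer and kicks. */
static void tx_model_guest_send(struct tx_model *vq)
{
	vq->pending++;
	tx_model_handle_kick(vq);
}

/* Device start; rescan_tx models the added vhost_poll_queue() call. */
static void tx_model_start(struct tx_model *vq, void *backend, bool rescan_tx)
{
	vq->backend = backend;
	if (rescan_tx)
		tx_model_handle_kick(vq);
}
```

Without the re-scan, the buffer sent while paused stays pending after start until the guest kicks again; with it, start drains the ring.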
* Re: [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause
From: Stefano Garzarella @ 2026-06-16 14:18 UTC
To: Andrey Drobyshev
Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den
In-Reply-To: <20260612165718.433546-4-andrey.drobyshev@virtuozzo.com>
On Fri, Jun 12, 2026 at 07:57:17PM +0300, Andrey Drobyshev wrote:
>From: "Denis V. Lunev" <den@openvz.org>
>
>Earlier commit ("ms/vhost/vsock: Refuse the connection immediately when
Please follow
https://docs.kernel.org/process/submitting-patches.html#describe-your-changes
for how to refer to a commit.
>guest isn't ready") added a fast-fail in vhost_transport_send_pkt(). It
>rejects every host send with -EHOSTUNREACH until the destination calls
>SET_RUNNING(1). The fast-fail condition checks whether the device's backends
>are dropped, and if they are, the guest is considered not ready.
Okay, so it's not a regression: without this series, that patch does not
introduce any regression, right?
If that's the case, I'll change the wording in the cover letter.
>
>However, there might be other reasons for backends to be nulled. In
>particular, when QEMU is performing CPR (checkpoint-restore) migration,
>device ownership is RESET and then SET again, which drops and reattaches
>the backends. If we end up connecting during this window, an
>AF_VSOCK client gets -EHOSTUNREACH, which is wrong.
Please add this change before starting to support VHOST_RESET_OWNER
ioctl in vhost-vsock, otherwise we are breaking the bisectability.
>
>Add a cpr_paused flag set inside vhost_vsock_drop_backends() when the
>backend was previously live, cleared by vhost_vsock_start(). When set,
>vhost_transport_send_pkt() queues the skb instead of fast-failing; the
>existing kick of send_pkt_work in vhost_vsock_start() drains it on
>resume. A device that has never run keeps cpr_paused == false and the
>boot-time fast-fail behaviour is preserved.
>
>Pair the cpr_paused store with the backend store using an
>smp_wmb()/smp_rmb() pair so a concurrent sender on a weakly-ordered
>architecture never observes (NULL backend, !paused).
>
>Signed-off-by: Denis V. Lunev <den@openvz.org>
>---
> drivers/vhost/vsock.c | 22 +++++++++++++++++++---
> 1 file changed, 19 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index e629886e5cf8..bcaba36becd7 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -61,6 +61,7 @@ struct vhost_vsock {
>
> u32 guest_cid;
> bool seqpacket_allow;
>+ bool cpr_paused; /* between stop and next start */
> };
>
> static u32 vhost_transport_get_local_cid(void)
>@@ -311,11 +312,17 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
> * the mutex would be too expensive in this hot path, and we already have
> * all the outcomes covered: if the backend becomes NULL right after the check,
> * vhost_transport_do_send_pkt() will check it under the mutex anyway.
>+ *
>+ * Don't fast-fail if cpr_paused is set, keep queueing skbs instead.
>+ * The kick in vhost_vsock_start() will drain them on resume.
> */
> if (unlikely(!data_race(vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])))) {
>- rcu_read_unlock();
>- kfree_skb(skb);
>- return -EHOSTUNREACH;
>+ smp_rmb(); /* pairs with smp_wmb() in start/drop_backends */
>+ if (!READ_ONCE(vsock->cpr_paused)) {
Can we avoid this, which is not very readable, and instead use a single
variable to control the fast-fail entirely?
I mean replacing both cpr_paused + backend-pointer with a single
`started` flag: set it to false at open, true on start via
smp_store_release(), back to false on normal stop, and leave it true
during CPR pause.
The reader in send_pkt can do just:
if (!smp_load_acquire(&vsock->started))
return -EHOSTUNREACH;
WDYT?
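The single-flag idea can be sketched in user-space C11, where `memory_order_release` and `memory_order_acquire` play the roles of smp_store_release() and smp_load_acquire(). The struct and function names are hypothetical, and the queue counter is unprotected only because the sketch is single-threaded; the kernel's skb queue has its own lock.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: one flag replaces cpr_paused + the backend check. */
struct run_model {
	atomic_bool started;
	int queued;   /* skbs handed to the send worker (single-threaded here) */
};

static void run_model_open(struct run_model *v)
{
	atomic_init(&v->started, false);
	v->queued = 0;
}

/* Start: the release store publishes the setup done before it. */
static void run_model_start(struct run_model *v)
{
	atomic_store_explicit(&v->started, true, memory_order_release);
}

/* Normal stop (SET_RUNNING(0)): re-enables the fast-fail. */
static void run_model_stop(struct run_model *v)
{
	atomic_store_explicit(&v->started, false, memory_order_release);
}

/* CPR pause: backends go away, but 'started' is deliberately left true,
 * so senders keep queueing and the next start drains them. */
static void run_model_cpr_pause(struct run_model *v)
{
	(void)v;
}

/* Reader side in send_pkt: a single acquire load decides the fast-fail. */
static int run_model_send_pkt(struct run_model *v)
{
	if (!atomic_load_explicit(&v->started, memory_order_acquire))
		return -EHOSTUNREACH;
	v->queued++;
	return 0;
}
```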
>+ rcu_read_unlock();
>+ kfree_skb(skb);
>+ return -EHOSTUNREACH;
>+ }
That said, Claude is reporting a potential issue here that I think we
should consider:
After VHOST_RESET_OWNER, the guest CID stays in the hash, so
vhost_transport_send_pkt() can still find the vsock, skip the
fast-fail (cpr_paused=true), and call vhost_vq_work_queue() while
vhost_workers_free() is freeing workers without a synchronize_rcu(),
risking a use-after-free. Also, any send_pkt_work queued between
the last flush and worker teardown gets its VHOST_WORK_QUEUED bit
stuck (the vhost task exits without draining), deadlocking
host→guest traffic after restart.
A synchronize_rcu() in vhost_workers_free() between the
rcu_assign_pointer(NULL) loop and the destroy loop would close the
use-after-free, and reinitializing send_pkt_work via
vhost_work_init() after vhost_dev_reset_owner() returns would clear
the stuck QUEUED bit.
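The stuck-bit part of that report can be modelled in a few lines. This is a hypothetical single-threaded sketch: a one-slot `pending` pointer stands in for the worker's work list and a plain bool for the VHOST_WORK_QUEUED bit (the kernel uses atomic bitops). Once the old worker is torn down without draining, the flag stays set and queueing onto the new worker becomes a no-op until the flag is cleared again, which is what re-running vhost_work_init() would do.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item with a QUEUED-style flag. */
struct qmodel_work {
	bool queued;   /* set when queued, cleared when a worker runs it */
	int runs;
};

/* Hypothetical worker with a one-slot work list. */
struct qmodel_worker {
	struct qmodel_work *pending;
};

/* Queueing is skipped if the work is already marked queued. */
static bool qmodel_queue(struct qmodel_worker *w, struct qmodel_work *work)
{
	if (work->queued)
		return false;          /* already queued somewhere: no-op */
	work->queued = true;
	w->pending = work;
	return true;
}

/* Worker runs its pending work and clears the flag. */
static void qmodel_run(struct qmodel_worker *w)
{
	struct qmodel_work *work = w->pending;

	if (!work)
		return;
	w->pending = NULL;
	work->queued = false;
	work->runs++;
}

/* Teardown without draining: the pending work is dropped and its
 * queued flag is never cleared. */
static void qmodel_free_worker(struct qmodel_worker *w)
{
	w->pending = NULL;
}

/* Analogue of re-running vhost_work_init() after the reset. */
static void qmodel_work_init(struct qmodel_work *work)
{
	work->queued = false;
}
```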
> }
>
> if (virtio_vsock_skb_reply(skb))
>@@ -640,6 +647,9 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
> mutex_unlock(&vq->mutex);
> }
>
>+ smp_wmb(); /* pairs with smp_rmb() in send_pkt */
>+ WRITE_ONCE(vsock->cpr_paused, false);
>+
> /* Some packets may have been queued before the device was started,
> * let's kick the send worker to send them.
> */
>@@ -671,6 +681,11 @@ static void vhost_vsock_drop_backends(struct vhost_vsock *vsock)
>
> lockdep_assert_held(&vsock->dev.mutex);
>
>+ if (vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])) {
>+ WRITE_ONCE(vsock->cpr_paused, true);
>+ smp_wmb(); /* pairs with smp_rmb() in send_pkt */
>+ }
Why here and not in vhost_vsock_reset_owner()?
Also, having this here will set it to true with
VHOST_VSOCK_SET_RUNNING(0) as well; is that right?
Thanks,
Stefano
>+
> for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
> vq = &vsock->vqs[i];
>
>@@ -728,6 +743,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
>
> vsock->guest_cid = 0; /* no CID assigned yet */
> vsock->seqpacket_allow = false;
>+ vsock->cpr_paused = false;
>
> atomic_set(&vsock->queued_replies, 0);
>
>--
>2.47.1
>
* Re: [PATCH RFC 3/9] net: stmmac: qcom-ethqos: fix RGMII_ID mode to use DLL bypass
From: Konrad Dybcio @ 2026-06-16 14:14 UTC
To: Andrew Lunn, Mohd Ayaan Anwar, Bjorn Andersson,
Bartosz Golaszewski, Eric Chanudet, Lucas Karpinski,
Andrew Halaney
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Richard Cochran, Bjorn Andersson, Konrad Dybcio, Maxime Coquelin,
Alexandre Torgue, Russell King, linux-arm-msm, netdev, devicetree,
linux-kernel, linux-stm32, linux-arm-kernel
In-Reply-To: <82705420-771d-41bf-a4d9-ed94dff86ff0@lunn.ch>
On 6/15/26 6:48 PM, Andrew Lunn wrote:
> On Mon, Jun 15, 2026 at 09:24:07AM +0530, Mohd Ayaan Anwar wrote:
>> Hello Andrew,
>> On Thu, Jun 11, 2026 at 10:54:37PM +0200, Andrew Lunn wrote:
>>> On Fri, Jun 12, 2026 at 12:06:59AM +0530, Mohd Ayaan Anwar wrote:
>>>> When "rgmii-id" is selected the PHY supplies both TX and RX delays, so
>>>> the MAC must not add its own. The driver currently falls through to the
>>>> generic DLL initialisation path which programs it to add a delay.
>>>>
>>>> Power down the DLL and set DDR bypass mode for RGMII_ID, then program
>>>> the IO_MACRO via a new ethqos_rgmii_id_macro_init() helper. Also fix
>>>> ethqos_set_clk_tx_rate() to not double the clock rate in bypass mode at
>>>> 100M/10M, and remove RGMII_ID from the phase-shift suppression in
>>>> ethqos_rgmii_macro_init() since RGMII_ID no longer reaches that path.
>>>
>>> I'm curious how this works at the moment? Do no boards make use of
>>> RGMII ID? Are all current boards broken?
>>
>> Searching through the DTS, I found that we have two boards using "rgmii"
>> (qcs404-evb-4000.dts and sa8155-adp.dts) and another board using
>> "rgmii-txid" (sa8540p-ride.dts). No board uses RGMII ID.
>
> So this causes problems. We cannot break existing boards, yet it would
> be good to fix the current broken behaviour.
These are a funny bunch. QCS404 is stuck in a perpetual cycle of
"no one has the hardware" and "someone has the hw but zero interest or
time". I think we considered it for removal at one point.
I'm not sure to what degree the two SA8xxx boards are used. They
may have been stuck in some sort of limbo. Maybe Bjorn knows?
Also +Cc'ing some of the folks who contributed to them in the past.
Konrad
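For reference, the phy-mode strings discussed in this thread can be read as follows on the PHY side, per the common DT binding convention: `rgmii-id` means the PHY inserts both delays, `rgmii-txid` and `rgmii-rxid` only the TX or RX delay, and plain `rgmii` none. The helper below is a hypothetical sketch of that table, not code from the qcom-ethqos driver.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result: which RGMII delays the PHY is expected to insert. */
struct rgmii_delays {
	bool phy_tx;
	bool phy_rx;
};

/* Returns false for strings that are not RGMII phy-modes. */
static bool rgmii_phy_delays(const char *mode, struct rgmii_delays *d)
{
	if (strcmp(mode, "rgmii") == 0)
		*d = (struct rgmii_delays){ false, false };
	else if (strcmp(mode, "rgmii-id") == 0)
		*d = (struct rgmii_delays){ true, true };
	else if (strcmp(mode, "rgmii-txid") == 0)
		*d = (struct rgmii_delays){ true, false };
	else if (strcmp(mode, "rgmii-rxid") == 0)
		*d = (struct rgmii_delays){ false, true };
	else
		return false;
	return true;
}
```

Under this reading, whenever the PHY inserts a delay in a given direction, the MAC must not add its own in that direction, which is the point of the quoted patch.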
* [PATCH net] net: serialize netif_running() check in enqueue_to_backlog()
From: Eric Dumazet @ 2026-06-16 14:13 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
Eric Dumazet, syzbot+965506b59a2de0b6905c, Julian Anastasov
Syzbot reported a KASAN slab-use-after-free in fib_rules_lookup().
The root cause is a race condition where packets can escape the backlog
flushing during device unregistration (e.g., during netns exit).
Commit e9e4dd3267d0 ("net: do not process device backlog during unregistration")
introduced a lockless netif_running() check in enqueue_to_backlog() to
prevent queuing packets to an unregistering device.
However, this creates a TOCTOU race window. A lockless transmitter
(like veth_xmit) can pass the check before dev_close() clears IFF_UP.
If the transmitter is then delayed, flush_all_backlogs() can run and
finish before the transmitter grabs the backlog lock and queues the
packet. The packet then escapes the flush and triggers a UAF later
when processed.
Fix this by moving the netif_running() check inside the backlog lock.
This serializes the check with the flush work (which also grabs the lock).
We then either queue the packet before the flush runs (so it gets flushed),
or check netif_running() after the flush/close completes (so it gets dropped).
Fixes: e9e4dd3267d0 ("net: do not process device backlog during unregistration")
Reported-by: syzbot+965506b59a2de0b6905c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a315824.b0403584.28d0ff.0000.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
---
net/core/dev.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 731e661d7be6574d5eca4a600e0a5623be4c2485..f81ce83fb3250d591ffa5eeb4c3067f8b75a54ca 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5381,8 +5381,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
u32 tail;
reason = SKB_DROP_REASON_DEV_READY;
- if (unlikely(!netif_running(skb->dev)))
- goto bad_dev;
sd = &per_cpu(softnet_data, cpu);
@@ -5394,6 +5392,10 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
backlog_lock_irq_save(sd, &flags);
qlen = skb_queue_len(&sd->input_pkt_queue);
if (likely(qlen <= max_backlog)) {
+ if (unlikely(!netif_running(skb->dev))) {
+ backlog_unlock_irq_restore(sd, flags);
+ goto bad_dev;
+ }
if (!qlen) {
/* Schedule NAPI for backlog device. We can use
* non atomic operation as we own the queue lock.
--
2.54.0.1189.g8c84645362-goog
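The ordering argument in the commit message can be sketched in user space. This is a hypothetical single-threaded model, so it shows the two permitted outcomes rather than reproducing the race: a pthread mutex stands in for the backlog lock, an atomic bool for netif_running(), and `bl_flush()` for flush_all_backlogs(). With the check taken under the lock, an enqueue either lands before the flush (and is flushed) or runs after it and sees the device down.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one CPU backlog and its device state. */
struct bl_model {
	pthread_mutex_t lock;     /* stands in for the backlog lock */
	atomic_bool dev_running;  /* stands in for netif_running(skb->dev) */
	int qlen;                 /* packets in the backlog (under lock) */
	int dropped;              /* packets dropped (under lock) */
};

static void bl_init(struct bl_model *b)
{
	pthread_mutex_init(&b->lock, NULL);
	atomic_init(&b->dev_running, true);
	b->qlen = 0;
	b->dropped = 0;
}

/* Fixed enqueue: the running check is made while holding the same lock
 * the flush takes, so no flush can run between "checked" and "queued". */
static bool bl_enqueue(struct bl_model *b)
{
	bool queued = false;

	pthread_mutex_lock(&b->lock);
	if (atomic_load(&b->dev_running)) {
		b->qlen++;
		queued = true;
	} else {
		b->dropped++;
	}
	pthread_mutex_unlock(&b->lock);
	return queued;
}

/* dev_close() stand-in: clears the running state without the backlog lock. */
static void bl_dev_close(struct bl_model *b)
{
	atomic_store(&b->dev_running, false);
}

/* flush_all_backlogs() stand-in: drains the backlog under the lock. */
static void bl_flush(struct bl_model *b)
{
	pthread_mutex_lock(&b->lock);
	b->qlen = 0;
	pthread_mutex_unlock(&b->lock);
}
```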
* Re: [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl
From: Andrey Drobyshev @ 2026-06-16 14:10 UTC
To: Stefano Garzarella
Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den
In-Reply-To: <ajFRRmA9req1muX6@sgarzare-redhat>
On 6/16/26 4:48 PM, Stefano Garzarella wrote:
> On Fri, Jun 12, 2026 at 07:57:16PM +0300, Andrey Drobyshev wrote:
>> From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>>
>> This ioctl is needed for QEMU's CPR (checkpoint-restore) migration of
>> the guest with vhost-vsock device. For this to work, we need to reset
>> the device ownership on the source side by calling RESET_OWNER, and then
>> claim it on the dest side by calling SET_OWNER. We expect not to lose any
>> AF_VSOCK connection while this happens.
>>
>> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>> ---
>> drivers/vhost/vsock.c | 28 ++++++++++++++++++++++++++++
>> 1 file changed, 28 insertions(+)
>>
>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> index b12221ce6faf..e629886e5cf8 100644
>> --- a/drivers/vhost/vsock.c
>> +++ b/drivers/vhost/vsock.c
>> @@ -894,6 +894,32 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
>> return -EFAULT;
>> }
>>
>> +static int vhost_vsock_reset_owner(struct vhost_vsock *vsock)
>> +{
>> + struct vhost_iotlb *umem;
>> + long err;
>> +
>> + mutex_lock(&vsock->dev.mutex);
>> + err = vhost_dev_check_owner(&vsock->dev);
>> + if (err)
>> + goto done;
>> + umem = vhost_dev_reset_owner_prepare();
>> + if (!umem) {
>> + err = -ENOMEM;
>> + goto done;
>> + }
>> + /* Follows vhost_vsock_dev_release closely except for guest_cid drop */
>> + vsock_for_each_connected_socket(&vhost_transport.transport,
>> + vhost_vsock_reset_orphans);
>
> In vhost_vsock_reset_orphans() we have:
>
> rcu_read_lock();
>
> /* If the peer is still valid, no need to reset connection */
> if (vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk))) {
> rcu_read_unlock();
> return;
> }
>
> IIUC we are not removing the guest cid from the hash table, so this
> check will be always true, and nothing is done.
>
> So, is this call really useful?
>
You're right, and it's most probably an artifact from mimicking the
vhost_vsock_dev_release() implementation, as mentioned in the comment.
In our case this whole iteration is a no-op, so we'd better remove it.
BTW, earlier I received some feedback from the Sashiko AI reviewer, which
also spotted that same issue (and some more interesting races):
https://sashiko.dev/#/patchset/20260612165718.433546-1-andrey.drobyshev@virtuozzo.com
Apparently it only CCs its reviews to kvm@vger.kernel.org, so you can't
see them right away. Just wanted to let you know to save you some time
here. I'll send a v2 addressing the Sashiko remarks. But of course it
would be great if you spot some more issues here.
>> + vhost_vsock_drop_backends(vsock);
>> + vhost_vsock_flush(vsock);
>> + vhost_dev_stop(&vsock->dev);
>> + vhost_dev_reset_owner(&vsock->dev, umem);
>> +done:
>> + mutex_unlock(&vsock->dev.mutex);
>> + return err;
>> +}
>> +
>> static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
>> unsigned long arg)
>> {
>> @@ -937,6 +963,8 @@ static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
>> return -EOPNOTSUPP;
>> vhost_set_backend_features(&vsock->dev, features);
>> return 0;
>> + case VHOST_RESET_OWNER:
>> + return vhost_vsock_reset_owner(vsock);
>> default:
>> mutex_lock(&vsock->dev.mutex);
>> r = vhost_dev_ioctl(&vsock->dev, ioctl, argp);
>> --
>> 2.47.1
>>
>
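From userspace, the handoff described in the commit message reduces to two ioctls on the same vhost-vsock file descriptor. The sketch below assumes the fd is passed from the source to the destination process (as CPR does) and omits everything else QEMU does around it (memory table, vrings, VHOST_VSOCK_SET_RUNNING); the function names are hypothetical.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Source side: drop ownership of the vhost-vsock fd before the handoff.
 * Returns 0 or a negative errno. */
static int cpr_source_release(int vhost_fd)
{
	if (ioctl(vhost_fd, VHOST_RESET_OWNER) < 0)
		return -errno;
	return 0;
}

/* Destination side: claim ownership of the inherited fd.
 * Returns 0 or a negative errno. */
static int cpr_dest_claim(int vhost_fd)
{
	if (ioctl(vhost_fd, VHOST_SET_OWNER) < 0)
		return -errno;
	return 0;
}
```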
* [syzbot] [net?] KASAN: slab-use-after-free Read in fib_rules_lookup
From: syzbot @ 2026-06-16 14:05 UTC
To: davem, dsahern, edumazet, horms, idosch, kuba, linux-kernel,
netdev, pabeni, syzkaller-bugs
Hello,
syzbot found the following issue on:
HEAD commit: 72dfa4700f78 net: dsa: sja1105: fix lastused timestamp in ..
git tree: net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=15794bd2580000
kernel config: https://syzkaller.appspot.com/x/.config?x=a0842261b62cdea8
dashboard link: https://syzkaller.appspot.com/bug?extid=965506b59a2de0b6905c
compiler: Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
Unfortunately, I don't have any reproducer for this issue yet.
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/d4e16f50a97c/disk-72dfa470.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/6cd4a736e796/vmlinux-72dfa470.xz
kernel image: https://storage.googleapis.com/syzbot-assets/548b0011c8e8/bzImage-72dfa470.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+965506b59a2de0b6905c@syzkaller.appspotmail.com
bond0 (unregistering): Released all slaves
bond1 (unregistering): Released all slaves
bond2 (unregistering): (slave dummy0): Releasing active interface
bond2 (unregistering): Released all slaves
==================================================================
BUG: KASAN: slab-use-after-free in fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
Read of size 8 at addr ffff88804ec4c680 by task kworker/u8:21/12641
CPU: 0 UID: 0 PID: 12641 Comm: kworker/u8:21 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
Workqueue: netns cleanup_net
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_address_description+0x55/0x1e0 mm/kasan/report.c:378
print_report+0x58/0x70 mm/kasan/report.c:482
kasan_report+0x117/0x150 mm/kasan/report.c:595
fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
__fib_lookup+0x106/0x210 net/ipv4/fib_rules.c:96
ip_route_output_key_hash_rcu+0x294/0x2720 net/ipv4/route.c:2811
ip_route_output_key_hash+0x18d/0x2a0 net/ipv4/route.c:2702
__ip_route_output_key include/net/route.h:169 [inline]
ip_route_output_flow+0x2a/0x150 net/ipv4/route.c:2929
ip4_datagram_release_cb+0x89d/0xbe0 net/ipv4/datagram.c:118
release_sock+0x206/0x260 net/core/sock.c:3861
inet_shutdown+0x2b1/0x390 net/ipv4/af_inet.c:950
udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
fou_release net/ipv4/fou_core.c:562 [inline]
fou_exit_net+0x17d/0x1f0 net/ipv4/fou_core.c:1230
ops_exit_list net/core/net_namespace.c:199 [inline]
ops_undo_list+0x43d/0x8d0 net/core/net_namespace.c:252
cleanup_net+0x572/0x810 net/core/net_namespace.c:702
process_one_work kernel/workqueue.c:3314 [inline]
process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
kthread+0x389/0x470 kernel/kthread.c:436
ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
Allocated by task 19121:
kasan_save_stack mm/kasan/common.c:57 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
__kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
kasan_kmalloc include/linux/kasan.h:263 [inline]
__do_kmalloc_node mm/slub.c:5296 [inline]
__kmalloc_node_track_caller_noprof+0x4d7/0x7b0 mm/slub.c:5408
kmemdup_noprof+0x2b/0x70 mm/util.c:138
kmemdup_noprof include/linux/fortify-string.h:763 [inline]
fib_rules_register+0x2f/0x400 net/core/fib_rules.c:170
fib4_rules_init+0x21/0x160 net/ipv4/fib_rules.c:508
ip_fib_net_init net/ipv4/fib_frontend.c:1578 [inline]
fib_net_init+0x17a/0x3e0 net/ipv4/fib_frontend.c:1628
ops_init+0x35d/0x5d0 net/core/net_namespace.c:137
setup_net+0x118/0x350 net/core/net_namespace.c:446
copy_net_ns+0x4f9/0x720 net/core/net_namespace.c:579
create_new_namespaces+0x3f0/0x6b0 kernel/nsproxy.c:132
unshare_nsproxy_namespaces+0x149/0x190 kernel/nsproxy.c:234
ksys_unshare+0x57d/0xa00 kernel/fork.c:3242
__do_sys_unshare kernel/fork.c:3316 [inline]
__se_sys_unshare kernel/fork.c:3314 [inline]
__x64_sys_unshare+0x38/0x50 kernel/fork.c:3314
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 12641:
kasan_save_stack mm/kasan/common.c:57 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:584
poison_slab_object mm/kasan/common.c:253 [inline]
__kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
kasan_slab_free include/linux/kasan.h:235 [inline]
slab_free_hook mm/slub.c:2689 [inline]
__rcu_free_sheaf_prepare+0x12d/0x2a0 mm/slub.c:2940
rcu_free_sheaf+0x31/0x200 mm/slub.c:5850
rcu_do_batch kernel/rcu/tree.c:2617 [inline]
rcu_core+0x78b/0x10a0 kernel/rcu/tree.c:2869
handle_softirqs+0x225/0x840 kernel/softirq.c:622
do_softirq+0x76/0xd0 kernel/softirq.c:523
__local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
unregister_netdevice_many_notify+0x1874/0x2150 net/core/dev.c:12445
ops_exit_rtnl_list net/core/net_namespace.c:187 [inline]
ops_undo_list+0x391/0x8d0 net/core/net_namespace.c:248
cleanup_net+0x572/0x810 net/core/net_namespace.c:702
process_one_work kernel/workqueue.c:3314 [inline]
process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
kthread+0x389/0x470 kernel/kthread.c:436
ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
The buggy address belongs to the object at ffff88804ec4c600
which belongs to the cache kmalloc-192 of size 192
The buggy address is located 128 bytes inside of
freed 192-byte region [ffff88804ec4c600, ffff88804ec4c6c0)
The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4ec4c
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 00fff00000000000 ffff88813fe163c0 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800100010 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xd2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 13856, tgid 13853 (syz.3.2144), ts 351172300879, free_ts 351133053454
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
prep_new_page mm/page_alloc.c:1861 [inline]
get_page_from_freelist+0x24ae/0x2530 mm/page_alloc.c:3941
__alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
alloc_slab_page mm/slub.c:3278 [inline]
allocate_slab+0x77/0x660 mm/slub.c:3467
new_slab mm/slub.c:3525 [inline]
refill_objects+0x336/0x3d0 mm/slub.c:7272
refill_sheaf mm/slub.c:2816 [inline]
__pcs_replace_empty_main+0x320/0x720 mm/slub.c:4652
alloc_from_pcs mm/slub.c:4750 [inline]
slab_alloc_node mm/slub.c:4884 [inline]
__do_kmalloc_node mm/slub.c:5295 [inline]
__kmalloc_noprof+0x464/0x750 mm/slub.c:5308
kmalloc_noprof include/linux/slab.h:954 [inline]
kzalloc_noprof include/linux/slab.h:1188 [inline]
new_dir fs/proc/proc_sysctl.c:966 [inline]
get_subdir fs/proc/proc_sysctl.c:1010 [inline]
sysctl_mkdir_p fs/proc/proc_sysctl.c:1320 [inline]
__register_sysctl_table+0xc02/0x1370 fs/proc/proc_sysctl.c:1395
neigh_sysctl_register+0x9b1/0xa90 net/core/neighbour.c:3915
addrconf_sysctl_register+0xb3/0x1c0 net/ipv6/addrconf.c:7396
ipv6_add_dev+0xd26/0x13a0 net/ipv6/addrconf.c:460
addrconf_notify+0x771/0x1050 net/ipv6/addrconf.c:3679
notifier_call_chain+0x1a5/0x3d0 kernel/notifier.c:85
call_netdevice_notifiers_extack net/core/dev.c:2288 [inline]
call_netdevice_notifiers net/core/dev.c:2302 [inline]
register_netdevice+0x18db/0x1f00 net/core/dev.c:11474
macsec_newlink+0x706/0x1200 drivers/net/macsec.c:4218
rtnl_newlink_create+0x310/0xb00 net/core/rtnetlink.c:3905
page last free pid 12657 tgid 12657 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
__free_pages_prepare mm/page_alloc.c:1397 [inline]
__free_frozen_pages+0xc0d/0xd20 mm/page_alloc.c:2938
__tlb_remove_table_free mm/mmu_gather.c:228 [inline]
tlb_remove_table_rcu+0x85/0x100 mm/mmu_gather.c:291
rcu_do_batch kernel/rcu/tree.c:2617 [inline]
rcu_core+0x78b/0x10a0 kernel/rcu/tree.c:2869
handle_softirqs+0x225/0x840 kernel/softirq.c:622
__do_softirq kernel/softirq.c:656 [inline]
invoke_softirq kernel/softirq.c:496 [inline]
__irq_exit_rcu+0xca/0x220 kernel/softirq.c:735
irq_exit_rcu+0x9/0x30 kernel/softirq.c:752
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1061 [inline]
sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1061
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
Memory state around the buggy address:
ffff88804ec4c580: 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88804ec4c600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88804ec4c680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
^
ffff88804ec4c700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88804ec4c780: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
==================================================================
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
* Re: [PATCH net] nfc: pn533: prevent division by zero in the listen mode timer
From: Simon Horman @ 2026-06-16 14:02 UTC
To: Yinhao Hu
Cc: netdev, David Heidelberg, Krzysztof Kozlowski, Jakub Kicinski,
Dan Carpenter, dzm91, hust-os-kernel-patches
In-Reply-To: <20260615103547.1599528-1-dddddd@hust.edu.cn>
On Mon, Jun 15, 2026 at 03:35:47AM -0700, Yinhao Hu wrote:
> The listen-mode timer handler advances the polling state machine through
> pn533_poll_next_mod(), which computes:
>
> dev->poll_mod_curr = (dev->poll_mod_curr + 1) % dev->poll_mod_count;
>
> pn533_poll_reset_mod_list() clears dev->poll_mod_count without first
> stopping that timer: pn533_dep_link_down() deletes no timer at all, and
> pn533_stop_poll() uses timer_delete(), which does not wait for a handler
> already running on another CPU. When the handler runs after the count
> has been zeroed, it divides by zero:
>
> Oops: divide error: 0000 [#1] SMP
> RIP: 0010:pn533_listen_mode_timer+0x9b/0x110
>
> Delete the timer synchronously in pn533_poll_reset_mod_list(), the single
> place that clears the list, so the handler can no longer run past a reset.
> Also return early when poll_mod_count is already zero, covering the window
> where pn533_wq_poll() re-arms the timer just before a reset.
>
> Fixes: 6fbbdc16be38 ("NFC: Implement pn533 polling loop")
> Signed-off-by: Yinhao Hu <dddddd@hust.edu.cn>
> ---
> drivers/nfc/pn533/pn533.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/nfc/pn533/pn533.c b/drivers/nfc/pn533/pn533.c
> index d7bdbc82e2ba..88df99001b4a 100644
> --- a/drivers/nfc/pn533/pn533.c
> +++ b/drivers/nfc/pn533/pn533.c
> @@ -951,6 +951,7 @@ static inline void pn533_poll_next_mod(struct pn533 *dev)
>
> static void pn533_poll_reset_mod_list(struct pn533 *dev)
> {
> + timer_delete_sync(&dev->listen_timer);
> dev->poll_mod_count = 0;
> }
>
> @@ -1235,6 +1236,10 @@ static void pn533_listen_mode_timer(struct timer_list *t)
> {
> struct pn533 *dev = timer_container_of(dev, t, listen_timer);
>
> + /* Polling may have been stopped while the timer was pending. */
> + if (!dev->poll_mod_count)
> + return;
> +
I am concerned that access to poll_mod_count is not synchronised and thus
this may not work as intended.
> dev->cancel_listen = 1;
>
> pn533_poll_next_mod(dev);
> --
> 2.43.0
>
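One way to see the synchronisation concern: the new check reads dev->poll_mod_count, and pn533_poll_next_mod() then reads it again for the modulo, so without synchronisation a reset landing between the two reads could still divide by zero. The userspace model below is a sketch, not driver code. It loads the count once and uses that snapshot for both the check and the division, with C11 atomics standing in for the kernel's READ_ONCE(). `struct poll_state` and `poll_next_mod_snapshot()` are hypothetical names. The snapshot only removes the division by zero. It does not stop the handler from acting on a list that is being torn down, so ordering against the reset (e.g. the patch's timer_delete_sync()) still matters.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the two struct pn533 fields involved. */
struct poll_state {
	_Atomic unsigned char mod_count; /* models dev->poll_mod_count */
	unsigned char mod_curr;          /* models dev->poll_mod_curr */
};

/*
 * Advance to the next polling modulation. Returns 0 on success, or -1 if
 * polling has been stopped (count is zero). The count is loaded exactly
 * once, so the zero check and the modulo always see the same value even
 * if another thread clears it concurrently.
 */
static int poll_next_mod_snapshot(struct poll_state *s)
{
	unsigned char count =
		atomic_load_explicit(&s->mod_count, memory_order_relaxed);

	if (count == 0)
		return -1;

	s->mod_curr = (unsigned char)((s->mod_curr + 1) % count);
	assert(s->mod_curr < count);
	return 0;
}
```

With a count of 3 and a current index of 2, one call wraps the index to 0. Once the count has been cleared, the call returns -1 and leaves the index untouched.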
* Re: [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration
From: Andrey Drobyshev @ 2026-06-16 14:01 UTC
To: Stefano Garzarella
Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den
In-Reply-To: <ajFLvKcT-A0wvLYW@sgarzare-redhat>
Hello Stefano,
On 6/16/26 4:35 PM, Stefano Garzarella wrote:
> Hi Andrey,
> thanks for the series!
>
> On Fri, Jun 12, 2026 at 07:57:14PM +0300, Andrey Drobyshev wrote:
>> Host<-->guest connections via AF_VSOCK sockets aren't supposed to
>> outlive VM migration, since the VM is moving to another host. However,
>> there's a special case: QEMU live update, or CPR
>> (checkpoint-restore) migration. In this case, the VM remains on the same
>> host, and we'd like such connections to persist.
>
> In the spec we have VIRTIO_VSOCK_EVENT_TRANSPORT_RESET which is usually
> sent by the device after a migration.
>
> IIUC the specs don't say this has to be done all the time, so we don't
> need to change anything in the specs, right?
>
> We just need to avoid sending it (which I think is what we're doing
> here... I still need to look at the patches).
>
Sending this exact ioctl is guarded by one of my patches in the QEMU
counterpart series:
https://lore.kernel.org/qemu-devel/20260612165110.431376-6-andrey.drobyshev@virtuozzo.com/
So we indeed avoid sending it on the migration target in case of CPR migration.
>>
>> For this to work, we need to be able to transfer device ownership from
>> source QEMU to dest QEMU. Namely, source needs to reset ownership by
>> issuing VHOST_RESET_OWNER ioctl, and then target has to claim it by
>> calling VHOST_SET_OWNER.
>>
>> Since VHOST_RESET_OWNER isn't yet implemented for vhost-vsock, let's add
>> such an implementation (patches 1-2). Also fix a regression introduced by
>> the earlier commit [1] (patch 3), and fix the deadlock bug (patch 4).
>
> If it's a regression, should we fix it separately?
>
> Or is it related to this series?
>
Probably my wording wasn't quite correct. I posted this patch here
because we found the problem while testing this particular
functionality, i.e. vsock data transfer + CPR migration. The
problem was introduced by a recent commit, which is fine on its own but
breaks the CPR case.
>>
>> There's a complementary series for QEMU [0] adding support of vhost-vsock
>> devices during CPR migration.
>>
>> NOTE: this series needs to be applied on top of Michael's vhost/linux-next
>> tree, as it contains the relevant commit [1], not yet present in the master branch.
>>
>> I've tested this (patched QEMU + patched kernel) approximately as follows:
>>
>> * Run listener in the guest:
>> socat -u VSOCK-LISTEN:9999 - >/tmp/recv.bin
>>
>> * Run data transfer from host to guest:
>> socat -u FILE:/root/bigfile.bin VSOCK-CONNECT:CID:9999
>>
>> * Perform CPR migration during transfer (either cpr-exec or cpr-transfer)
>> * Check that file hash sum matches
>>
>> [0] https://lore.kernel.org/qemu-devel/20260612165110.431376-1-andrey.drobyshev@virtuozzo.com
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?id=bb26ed5f3a8b
>>
>> Andrey Drobyshev (1):
>> vhost/vsock: re-scan TX virtqueue on device start
>>
>> Denis V. Lunev (1):
>> vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause
>>
>> Pavel Tikhomirov (2):
>> vhost/vsock: split out vhost_vsock_drop_backends helper
>> vhost/vsock: add VHOST_RESET_OWNER ioctl
>>
>> drivers/vhost/vsock.c | 80 +++++++++++++++++++++++++++++++++++++------
>> 1 file changed, 69 insertions(+), 11 deletions(-)
>>
>> --
>> 2.47.1
>>
>
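For readers unfamiliar with the ownership handoff described in the cover letter, here is a minimal userspace sketch of the two ioctl calls. It assumes the vhost-vsock fd has already been handed from the source QEMU to the destination by the CPR mechanism; the fd passing itself is not shown. The helper names are hypothetical. VHOST_RESET_OWNER and VHOST_SET_OWNER are the existing UAPI requests from <linux/vhost.h>. Without this series, vhost-vsock does not implement VHOST_RESET_OWNER, so the release call is expected to fail. vhost_dev_check_owner() compares the device owner's mm with the caller's, so the reset has to come from the process that currently owns the device.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/*
 * Source side: drop ownership of the vhost-vsock device.
 * Must run in the process that currently owns it.
 * Returns 0 or a negative errno value.
 */
static int cpr_release_vhost_owner(int vhost_fd)
{
	if (ioctl(vhost_fd, VHOST_RESET_OWNER) < 0)
		return -errno;
	return 0;
}

/*
 * Destination side: claim ownership of the same (inherited or passed) fd.
 * vhost refuses SET_OWNER with -EBUSY while another owner is still set.
 */
static int cpr_claim_vhost_owner(int vhost_fd)
{
	if (ioctl(vhost_fd, VHOST_SET_OWNER) < 0)
		return -errno;
	return 0;
}
```

vhost_dev_reset_owner() resets the memory table and virtqueue state. After claiming the device, the destination would therefore presumably redo its usual memory-table and vring setup before restarting the device.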
* Re: [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl
From: Stefano Garzarella @ 2026-06-16 13:48 UTC
To: Andrey Drobyshev
Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den
In-Reply-To: <20260612165718.433546-3-andrey.drobyshev@virtuozzo.com>
On Fri, Jun 12, 2026 at 07:57:16PM +0300, Andrey Drobyshev wrote:
>From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>
>This ioctl is needed for QEMU's CPR (checkpoint-restore) migration of
>the guest with vhost-vsock device. For this to work, we need to reset
>the device ownership on the source side by calling RESET_OWNER, and then
>claim it on the dest side by calling SET_OWNER. We expect not to lose any
>AF_VSOCK connection while this happens.
>
>Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>---
> drivers/vhost/vsock.c | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index b12221ce6faf..e629886e5cf8 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -894,6 +894,32 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
> return -EFAULT;
> }
>
>+static int vhost_vsock_reset_owner(struct vhost_vsock *vsock)
>+{
>+ struct vhost_iotlb *umem;
>+ long err;
>+
>+ mutex_lock(&vsock->dev.mutex);
>+ err = vhost_dev_check_owner(&vsock->dev);
>+ if (err)
>+ goto done;
>+ umem = vhost_dev_reset_owner_prepare();
>+ if (!umem) {
>+ err = -ENOMEM;
>+ goto done;
>+ }
>+ /* Follows vhost_vsock_dev_release closely except for guest_cid drop */
>+ vsock_for_each_connected_socket(&vhost_transport.transport,
>+ vhost_vsock_reset_orphans);
In vhost_vsock_reset_orphans() we have:
rcu_read_lock();
/* If the peer is still valid, no need to reset connection */
if (vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk))) {
rcu_read_unlock();
return;
}
IIUC we are not removing the guest cid from the hash table, so this
check will always be true, and nothing is done.
So, is this call really useful?
>+ vhost_vsock_drop_backends(vsock);
>+ vhost_vsock_flush(vsock);
>+ vhost_dev_stop(&vsock->dev);
>+ vhost_dev_reset_owner(&vsock->dev, umem);
>+done:
>+ mutex_unlock(&vsock->dev.mutex);
>+ return err;
>+}
>+
> static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
> unsigned long arg)
> {
>@@ -937,6 +963,8 @@ static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
> return -EOPNOTSUPP;
> vhost_set_backend_features(&vsock->dev, features);
> return 0;
>+ case VHOST_RESET_OWNER:
>+ return vhost_vsock_reset_owner(vsock);
> default:
> mutex_lock(&vsock->dev.mutex);
> r = vhost_dev_ioctl(&vsock->dev, ioctl, argp);
>--
>2.47.1
>
* [PATCH net v4 2/2] ipv6: account for fraggap on the paged allocation path
From: Wongi Lee @ 2026-06-16 13:46 UTC
To: netdev
Cc: David Ahern, Ido Schimmel, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, asml.silence, dhowells,
willemb, Jungwoo Lee
In-Reply-To: <ajFQn6yh43eDeQm9@DESKTOP-19IMU7U.localdomain>
In __ip6_append_data(), when the paged-allocation branch is taken
(MSG_MORE / NETIF_F_SG / large fraglen), alloclen and pagedlen are
computed as
alloclen = fragheaderlen + transhdrlen;
pagedlen = datalen - transhdrlen;
datalen already includes fraggap (datalen = length + fraggap). When
fraggap is non-zero, this is not the first skb and transhdrlen is zero.
The fraggap bytes carried over from the previous skb are copied just past
the fragment headers in the new skb's linear area. The linear area is
therefore undersized by fraggap bytes while pagedlen is overstated by the
same amount, and the copy writes past skb->end into the trailing
skb_shared_info.
An unprivileged user can trigger this via a UDPv6 socket using
MSG_MORE together with MSG_SPLICE_PAGES.
The bad accounting was introduced by commit 773ba4fe9104 ("ipv6:
avoid partial copy for zc"). Before commit ce650a166335 ("udp6: Fix
__ip6_append_data()'s handling of MSG_SPLICE_PAGES"), the negative
copy value caused -EINVAL to be returned. That later commit allowed
MSG_SPLICE_PAGES to proceed in this case, making the corruption
triggerable.
The non-paged branch sets alloclen to fraglen, which already accounts
for fraggap because datalen does. Bring the paged branch in line by
adding fraggap to alloclen and subtracting it from pagedlen.
After this adjustment, copy no longer collapses to -fraggap on the
paged path, so remove the stale comment describing that old arithmetic.
Since a negative copy is no longer expected for a valid MSG_SPLICE_PAGES
case, remove the MSG_SPLICE_PAGES exception from the negative copy check.
Fixes: 773ba4fe9104 ("ipv6: avoid partial copy for zc")
Signed-off-by: Jungwoo Lee <jwlee2217@gmail.com>
Signed-off-by: Wongi Lee <qw3rtyp0@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
net/ipv6/ip6_output.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index c14adcdd4396..13463c95c7a7 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1668,8 +1668,8 @@ static int __ip6_append_data(struct sock *sk,
!(rt->dst.dev->features & NETIF_F_SG)))
alloclen = fraglen;
else {
- alloclen = fragheaderlen + transhdrlen;
- pagedlen = datalen - transhdrlen;
+ alloclen = fragheaderlen + transhdrlen + fraggap;
+ pagedlen = datalen - transhdrlen - fraggap;
}
alloclen += alloc_extra;
@@ -1684,10 +1684,7 @@ static int __ip6_append_data(struct sock *sk,
fraglen = datalen + fragheaderlen;
copy = datalen - transhdrlen - fraggap - pagedlen;
- /* [!] NOTE: copy may be negative if pagedlen>0
- * because then the equation may reduces to -fraggap.
- */
- if (copy < 0 && !(flags & MSG_SPLICE_PAGES)) {
+ if (copy < 0) {
err = -EINVAL;
goto error;
}
--
2.34.1
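The size arithmetic in the commit message can be checked with a small standalone model. This is a sketch that only mirrors the quoted expressions: datalen = length + fraggap, the old and new alloclen/pagedlen, and the copy formula. alloc_extra, the MTU clamp on datalen, MSG_SPLICE_PAGES handling and the real skb layout are left out, and the function names are hypothetical. For a follow-on skb (fraggap > 0, transhdrlen == 0), the old formulas reserve fraggap bytes too few in the linear area and make copy equal to -fraggap. The fixed ones reserve exactly the headers plus fraggap and make copy zero.

```c
/*
 * Hypothetical model of the paged branch of __ip6_append_data() (and its
 * IPv4 twin). Only the size expressions from the commit message are
 * reproduced; alloc_extra, the MTU clamp and MSG_SPLICE_PAGES are omitted.
 */
struct frag_sizes {
	int alloclen; /* linear bytes allocated (before alloc_extra) */
	int pagedlen; /* bytes planned for page fragments */
	int copy;     /* bytes getfrag() would copy into the linear area */
};

static struct frag_sizes paged_sizes(int fragheaderlen, int transhdrlen,
				     int length, int fraggap, int fixed)
{
	struct frag_sizes s;
	int datalen = length + fraggap; /* datalen already includes fraggap */

	if (fixed) {
		s.alloclen = fragheaderlen + transhdrlen + fraggap;
		s.pagedlen = datalen - transhdrlen - fraggap;
	} else {
		s.alloclen = fragheaderlen + transhdrlen;
		s.pagedlen = datalen - transhdrlen;
	}
	s.copy = datalen - transhdrlen - fraggap - s.pagedlen;
	return s;
}

/*
 * Linear bytes actually written: headers, the fraggap bytes carried over
 * from the previous skb, and any positive getfrag() copy.
 */
static int linear_bytes_needed(int fragheaderlen, int transhdrlen,
			       int fraggap, int copy)
{
	return fragheaderlen + transhdrlen + fraggap + (copy > 0 ? copy : 0);
}
```

Take fragheaderlen = 40, transhdrlen = 0, length = 1000 and fraggap = 5. The old formulas give alloclen = 40 and copy = -5, while 45 linear bytes are written. The fixed ones give alloclen = 45, pagedlen = 1000 and copy = 0.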
* Re: [PATCH 1/4] vhost/vsock: split out vhost_vsock_drop_backends helper
From: Stefano Garzarella @ 2026-06-16 13:42 UTC
To: Andrey Drobyshev
Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den
In-Reply-To: <20260612165718.433546-2-andrey.drobyshev@virtuozzo.com>
On Fri, Jun 12, 2026 at 07:57:15PM +0300, Andrey Drobyshev wrote:
>From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>
>Split the actual backend dropping part from vhost_vsock_stop. We're
>going to need it for the VHOST_RESET_OWNER implementation in the
>following patch, when vsock->dev.mutex is already taken and owner is
>checked.
>
>Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>---
> drivers/vhost/vsock.c | 26 +++++++++++++++++---------
> 1 file changed, 17 insertions(+), 9 deletions(-)
LGTM!
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index 9aaab6bb8061..b12221ce6faf 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -664,9 +664,24 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
> return ret;
> }
>
>-static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
>+static void vhost_vsock_drop_backends(struct vhost_vsock *vsock)
> {
>+ struct vhost_virtqueue *vq;
> size_t i;
>+
>+ lockdep_assert_held(&vsock->dev.mutex);
>+
>+ for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
>+ vq = &vsock->vqs[i];
>+
>+ mutex_lock(&vq->mutex);
>+ vhost_vq_set_backend(vq, NULL);
>+ mutex_unlock(&vq->mutex);
>+ }
>+}
>+
>+static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
>+{
> int ret = 0;
>
> mutex_lock(&vsock->dev.mutex);
>@@ -677,14 +692,7 @@ static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
> goto err;
> }
>
>- for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
>- struct vhost_virtqueue *vq = &vsock->vqs[i];
>-
>- mutex_lock(&vq->mutex);
>- vhost_vq_set_backend(vq, NULL);
>- mutex_unlock(&vq->mutex);
>- }
>-
>+ vhost_vsock_drop_backends(vsock);
> err:
> mutex_unlock(&vsock->dev.mutex);
> return ret;
>--
>2.47.1
>
* [PATCH net v4 1/2] ipv4: account for fraggap on the paged allocation path
From: Wongi Lee @ 2026-06-16 13:38 UTC
To: netdev
Cc: David Ahern, Ido Schimmel, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, asml.silence, dhowells,
willemb, Jungwoo Lee
In-Reply-To: <ajFQn6yh43eDeQm9@DESKTOP-19IMU7U.localdomain>
In __ip_append_data(), when the paged-allocation branch is taken,
alloclen and pagedlen are computed as
alloclen = fragheaderlen + transhdrlen;
pagedlen = datalen - transhdrlen;
datalen already includes fraggap, but the fraggap bytes carried over
from the previous skb are copied into the new skb's linear area at
offset transhdrlen by the subsequent skb_copy_and_csum_bits(). The
linear area is therefore undersized by fraggap bytes while pagedlen is
overstated by the same amount.
The non-paged branch sets alloclen to fraglen, which already accounts
for fraggap because datalen does. Bring the paged branch in line by
adding fraggap to alloclen and subtracting it from pagedlen.
After this adjustment, copy no longer collapses to -fraggap on the
paged path, so remove the stale comment describing that old arithmetic.
Fixes: 8eb77cc73977 ("ipv4: avoid partial copy for zc")
Signed-off-by: Jungwoo Lee <jwlee2217@gmail.com>
Signed-off-by: Wongi Lee <qw3rtyp0@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
net/ipv4/ip_output.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 5bcd73cbdb41..ec790bad1679 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1117,8 +1117,8 @@ static int __ip_append_data(struct sock *sk,
!(rt->dst.dev->features & NETIF_F_SG)))
alloclen = fraglen;
else {
- alloclen = fragheaderlen + transhdrlen;
- pagedlen = datalen - transhdrlen;
+ alloclen = fragheaderlen + transhdrlen + fraggap;
+ pagedlen = datalen - transhdrlen - fraggap;
}
alloclen += alloc_extra;
@@ -1165,9 +1165,6 @@ static int __ip_append_data(struct sock *sk,
}
copy = datalen - transhdrlen - fraggap - pagedlen;
- /* [!] NOTE: copy will be negative if pagedlen>0
- * because then the equation reduces to -fraggap.
- */
if (copy > 0 &&
INDIRECT_CALL_1(getfrag, ip_generic_getfrag,
from, data + transhdrlen, offset,
--
2.34.1
* Re: [PATCH v2] net: macb: add TX stall timeout callback to recover from lost TSTART write
From: Nicolai Buchwitz @ 2026-06-16 13:37 UTC
To: Andrea della Porta
Cc: netdev, Theo Lebrun, Nicolas Ferre, Claudiu Beznea, Andrew Lunn,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel, linux-arm-kernel, linux-rpi-kernel, Lukasz Raczylo,
Steffen Jaeckel
In-Reply-To: <468f480454a314303bac6a54780b153f689f2267.1781598350.git.andrea.porta@suse.com>
On 16.6.2026 15:23, Andrea della Porta wrote:
> From: Lukasz Raczylo <lukasz@raczylo.com>
>
> The MACB found in the Raspberry Pi RP1 suffers from sporadic stalls on
> the TX queue.
> While the exact root cause is not yet fully understood, it is likely
> related to a hardware issue where a TSTART write to the NCR register
> is missed, preventing the transmission from being kicked off.
>
> Implement a timeout callback to handle TX queue stalls, triggering the
> existing restart mechanism to recover.
>
> Link: https://lore.kernel.org/all/20260514215459.36109-1-lukasz@raczylo.com/
> Fixes: dc110d1b23564 ("net: cadence: macb: Add support for Raspberry Pi RP1 ethernet controller")
> Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
> Co-developed-by: Steffen Jaeckel <sjaeckel@suse.de>
> Signed-off-by: Steffen Jaeckel <sjaeckel@suse.de>
> Co-developed-by: Andrea della Porta <andrea.porta@suse.com>
> Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
> ---
>
> CHANGES IN v2:
>
> - dropped the rate-limited log message
> - avoid incrementing tx_error as this is per packet
>
> ---
> drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index a12aa21244e83..fd282a1700fb9 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -4522,6 +4522,13 @@ static int macb_setup_tc(struct net_device *dev, enum tc_setup_type type,
> }
> }
>
> +static void macb_tx_timeout(struct net_device *dev, unsigned int q)
> +{
> + struct macb *bp = netdev_priv(dev);
> +
> + macb_tx_restart(&bp->queues[q]);
> +}
> +
> static const struct net_device_ops macb_netdev_ops = {
> .ndo_open = macb_open,
> .ndo_stop = macb_close,
> @@ -4540,6 +4547,7 @@ static const struct net_device_ops macb_netdev_ops = {
> .ndo_hwtstamp_set = macb_hwtstamp_set,
> .ndo_hwtstamp_get = macb_hwtstamp_get,
> .ndo_setup_tc = macb_setup_tc,
> + .ndo_tx_timeout = macb_tx_timeout,
> };
>
> /* Configure peripheral capabilities according to device tree
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Thanks,
Nicolai
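Some context on when the new callback fires. The core TX watchdog (dev_watchdog() in net/sched/sch_generic.c) is only armed for devices that provide .ndo_tx_timeout. It treats a queue as timed out when the queue is stopped and its last transmit timestamp is older than dev->watchdog_timeo. The following is a simplified userspace model of that decision, not the kernel code. `struct txq_model`, `find_stalled_queue()` and the tick values are hypothetical, and `ticks_after()` mirrors the wrap-safe comparison of the kernel's time_after().

```c
#include <stdbool.h>

/*
 * Wrap-safe "a is later than b" for a free-running tick counter,
 * written like the kernel's time_after().
 */
static bool ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical per-queue state seen by the watchdog. */
struct txq_model {
	bool stopped;              /* queue stopped, e.g. TX ring full */
	unsigned long trans_start; /* tick of the last transmit on this queue */
};

/*
 * Return the index of the first queue that is stopped and has not
 * transmitted for more than timeo ticks, or -1 if none. In the kernel,
 * the watchdog would then call ndo_tx_timeout(dev, index), which with
 * this patch becomes macb_tx_restart(&bp->queues[index]).
 */
static int find_stalled_queue(const struct txq_model *q, int nq,
			      unsigned long now, unsigned long timeo)
{
	for (int i = 0; i < nq; i++)
		if (q[i].stopped && ticks_after(now, q[i].trans_start + timeo))
			return i;
	return -1;
}
```

A queue that is stopped but still within its timeout is not reported. A queue that is running is never reported, however old its timestamp.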
* [PATCH net-next v6 2/2] dinghai: add hardware register access and PCI capability scanning
From: han.junyang @ 2026-06-16 13:35 UTC
To: andrew+netdev, davem, edumazet, kuba, pabeni, horms
Cc: linux-kernel, netdev, han.junyang, ran.ming, han.chengfei,
zhang.yanze
In-Reply-To: <20260616212106742_trNLb7r-FL04eDlJO8tT@zte.com.cn>
From: Junyang Han <han.junyang@zte.com.cn>
Implement PCI configuration space access, BAR mapping, capability
scanning (common/notify/device), and hardware queue register
definitions for the DingHai PF device.
Signed-off-by: Junyang Han <han.junyang@zte.com.cn>
---
drivers/net/ethernet/zte/dinghai/dh_queue.h | 71 ++++
drivers/net/ethernet/zte/dinghai/en_pf.c | 439 ++++++++++++++++++++
drivers/net/ethernet/zte/dinghai/en_pf.h | 66 +++
3 files changed, 576 insertions(+)
create mode 100644 drivers/net/ethernet/zte/dinghai/dh_queue.h
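The range checks in zxdh_pf_map_capability() (added to en_pf.c in the diff) can be exercised in isolation with a small arithmetic model. This is a hedged sketch, not driver code. It takes the capability's offset and length as plain integers instead of reading PCI config space, and it does not call pci_iomap_range(). `cap_window_compute()` and `struct cap_window` are hypothetical names. The order of the checks is kept: short capability, 32-bit wrap of start + offset, alignment, clamp to size, then the minlen-past-BAR check. The final sum is done in 64 bits rather than size_t. The per-queue notify path calls the same routine with minlen 2, align 2, start = off * notify_offset_multiplier and size 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result: where the mapping starts in the BAR and its length. */
struct cap_window {
	uint32_t offset; /* BAR offset after adding start */
	uint32_t length; /* bytes to map, clamped to size */
};

static bool cap_window_compute(uint32_t cap_offset, uint32_t cap_length,
			       size_t minlen, uint32_t align, uint32_t start,
			       uint32_t size, uint64_t bar_len,
			       struct cap_window *w)
{
	uint32_t offset = cap_offset;
	uint32_t length = cap_length;

	assert(align != 0 && (align & (align - 1)) == 0); /* power of two */

	if (length <= start || length - start < minlen)
		return false;               /* capability too short */
	length -= start;

	if (start + offset < offset)
		return false;               /* 32-bit wrap-around */
	offset += start;

	if (offset & (align - 1))
		return false;               /* misaligned window */

	if (length > size)
		length = size;              /* never map more than asked */

	if ((uint64_t)offset + minlen > bar_len)
		return false;               /* minimum window runs past the BAR */

	w->offset = offset;
	w->length = length;
	return true;
}
```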
diff --git a/drivers/net/ethernet/zte/dinghai/dh_queue.h b/drivers/net/ethernet/zte/dinghai/dh_queue.h
new file mode 100644
index 000000000000..5067c73fed33
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/dh_queue.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * ZTE DingHai Ethernet driver - PCI capability definitions
+ * Copyright (c) 2022-2026, ZTE Corporation.
+ */
+
+#ifndef __DH_QUEUE_H__
+#define __DH_QUEUE_H__
+
+/* Vector value used to disable MSI for queue */
+#define ZXDH_MSI_NO_VECTOR 0xff
+
+/* Status byte for guest to report progress, and synchronize features */
+/* We have seen device and processed generic fields */
+#define ZXDH_CONFIG_S_ACKNOWLEDGE 1
+/* We have found a driver for the device. */
+#define ZXDH_CONFIG_S_DRIVER 2
+/* Driver has used its parts of the config, and is happy */
+#define ZXDH_CONFIG_S_DRIVER_OK 4
+/* Driver has finished configuring features */
+#define ZXDH_CONFIG_S_FEATURES_OK 8
+/* Device entered invalid state, driver must reset it */
+#define ZXDH_CONFIG_S_NEEDS_RESET 0x40
+/* We've given up on this device */
+#define ZXDH_CONFIG_S_FAILED 0x80
+
+/* This is the PCI capability header: */
+struct zxdh_pf_pci_cap {
+ __u8 cap_vndr; /* Generic PCI field: PCI_CAP_ID_VNDR */
+ __u8 cap_next; /* Generic PCI field: next ptr. */
+ __u8 cap_len; /* Generic PCI field: capability length */
+ __u8 cfg_type; /* Identifies the structure. */
+ __u8 bar; /* Where to find it. */
+ __u8 id; /* Multiple capabilities of the same type */
+ __u8 padding[2]; /* Pad to full dword. */
+ __le32 offset; /* Offset within bar. */
+ __le32 length; /* Length of the structure, in bytes. */
+};
+
+/* Fields in ZXDH_PF_PCI_CAP_COMMON_CFG: */
+struct zxdh_pf_pci_common_cfg {
+ /* About the whole device. */
+ __le32 device_feature_select; /* read-write */
+ __le32 device_feature; /* read-only */
+ __le32 guest_feature_select; /* read-write */
+ __le32 guest_feature; /* read-write */
+ __le16 msix_config; /* read-write */
+ __le16 num_queues; /* read-only */
+ __u8 device_status; /* read-write */
+ __u8 config_generation; /* read-only */
+
+ /* About a specific virtqueue. */
+ __le16 queue_select; /* read-write */
+ __le16 queue_size; /* read-write, power of 2. */
+ __le16 queue_msix_vector; /* read-write */
+ __le16 queue_enable; /* read-write */
+ __le16 queue_notify_off; /* read-only */
+ __le32 queue_desc_lo; /* read-write */
+ __le32 queue_desc_hi; /* read-write */
+ __le32 queue_avail_lo; /* read-write */
+ __le32 queue_avail_hi; /* read-write */
+ __le32 queue_used_lo; /* read-write */
+ __le32 queue_used_hi; /* read-write */
+};
+
+struct zxdh_pf_pci_notify_cap {
+ struct zxdh_pf_pci_cap cap;
+ __le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
+};
+
+#endif /* __DH_QUEUE_H__ */
diff --git a/drivers/net/ethernet/zte/dinghai/en_pf.c b/drivers/net/ethernet/zte/dinghai/en_pf.c
index 99f2a8af5bf4..401876623689 100644
--- a/drivers/net/ethernet/zte/dinghai/en_pf.c
+++ b/drivers/net/ethernet/zte/dinghai/en_pf.c
@@ -9,6 +9,7 @@
#include <net/devlink.h>
#include <linux/dma-mapping.h>
#include "en_pf.h"
+#include "dh_queue.h"
MODULE_AUTHOR("Junyang Han <han.junyang@zte.com.cn>");
MODULE_DESCRIPTION("ZTE DingHai series Ethernet driver");
@@ -90,6 +91,444 @@ void dh_pf_pci_close(struct dh_core_dev *dev)
pci_disable_device(dev->pdev);
}
+int zxdh_pf_pci_find_capability(struct pci_dev *pdev, u8 cfg_type,
+ u32 ioresource_types, int *bars)
+{
+ int pos;
+ u8 type;
+ u8 bar;
+
+ for (pos = pci_find_capability(pdev, PCI_CAP_ID_VNDR); pos > 0;
+ pos = pci_find_next_capability(pdev, pos, PCI_CAP_ID_VNDR)) {
+ pci_read_config_byte(pdev,
+ pos + offsetof(struct zxdh_pf_pci_cap,
+ cfg_type), &type);
+ pci_read_config_byte(pdev,
+ pos + offsetof(struct zxdh_pf_pci_cap, bar), &bar);
+
+ /* ignore structures with reserved BAR values */
+ if (bar > ZXDH_PF_MAX_BAR_VAL)
+ continue;
+
+ if (type == cfg_type) {
+ if (pci_resource_len(pdev, bar) &&
+ pci_resource_flags(pdev, bar) & ioresource_types) {
+ *bars |= (1 << bar);
+ return pos;
+ }
+ }
+ }
+
+ return 0;
+}
+
+void __iomem *zxdh_pf_map_capability(struct dh_core_dev *dh_dev, int off,
+ size_t minlen, u32 align,
+ u32 start, u32 size,
+ size_t *len, resource_size_t *pa,
+ u32 *bar_off)
+{
+ struct pci_dev *pdev = dh_dev->pdev;
+ void __iomem *p;
+ u32 offset;
+ u32 length;
+ u8 bar;
+
+ pci_read_config_byte(pdev,
+ off + offsetof(struct zxdh_pf_pci_cap, bar), &bar);
+ pci_read_config_dword(pdev,
+ off + offsetof(struct zxdh_pf_pci_cap,
+ offset), &offset);
+ pci_read_config_dword(pdev,
+ off + offsetof(struct zxdh_pf_pci_cap,
+ length), &length);
+
+ if (bar_off)
+ *bar_off = offset;
+
+ if (length <= start) {
+ dev_err(dh_dev->device, "bad capability len %u (>%u expected)\n",
+ length, start);
+ return NULL;
+ }
+
+ if (length - start < minlen) {
+ dev_err(dh_dev->device, "bad capability len %u (>=%zu expected)\n",
+ length, minlen);
+ return NULL;
+ }
+
+ length -= start;
+ if (start + offset < offset) {
+ dev_err(dh_dev->device, "map wrap-around %u+%u\n", start, offset);
+ return NULL;
+ }
+
+ offset += start;
+ if (offset & (align - 1)) {
+ dev_err(dh_dev->device, "offset %u not aligned to %u\n", offset, align);
+ return NULL;
+ }
+
+ if (length > size)
+ length = size;
+
+ if (len)
+ *len = length;
+
+ if (minlen + offset < minlen ||
+ minlen + offset > pci_resource_len(pdev, bar)) {
+ dev_err(dh_dev->device,
+ "map custom queue %zu@%u out of range on bar %i length %lu\n",
+ minlen, offset, bar,
+ (unsigned long)pci_resource_len(pdev, bar));
+ return NULL;
+ }
+
+ p = pci_iomap_range(pdev, bar, offset, length);
+ if (!p) {
+ dev_err(dh_dev->device, "unable to map custom queue %u@%u on bar %i\n",
+ length, offset, bar);
+ } else if (pa) {
+ *pa = pci_resource_start(pdev, bar) + offset;
+ }
+
+ return p;
+}
+
+int zxdh_pf_common_cfg_init(struct dh_core_dev *dh_dev)
+{
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+ struct pci_dev *pdev = dh_dev->pdev;
+ int common;
+
+ /* check for a common config: if not, use legacy mode (bar 0). */
+ common = zxdh_pf_pci_find_capability(pdev, ZXDH_PCI_CAP_COMMON_CFG,
+ IORESOURCE_IO | IORESOURCE_MEM,
+ &pf_dev->modern_bars);
+ if (common == 0) {
+ dev_err(dh_dev->device,
+ "missing capabilities %i, leaving for legacy driver\n",
+ common);
+ return -ENODEV;
+ }
+
+ pf_dev->common = zxdh_pf_map_capability(dh_dev, common,
+ sizeof(struct zxdh_pf_pci_common_cfg),
+ ZXDH_PF_ALIGN4, 0,
+ sizeof(struct zxdh_pf_pci_common_cfg),
+ NULL, NULL, NULL);
+ if (!pf_dev->common) {
+ dev_err(dh_dev->device, "pf_dev->common is null\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+int zxdh_pf_notify_cfg_init(struct dh_core_dev *dh_dev)
+{
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+ struct pci_dev *pdev = dh_dev->pdev;
+ u32 notify_length;
+ u32 notify_offset;
+ int notify;
+
+ /* If common is there, these should be too... */
+ notify = zxdh_pf_pci_find_capability(pdev, ZXDH_PCI_CAP_NOTIFY_CFG,
+ IORESOURCE_IO | IORESOURCE_MEM,
+ &pf_dev->modern_bars);
+ if (notify == 0) {
+ dev_err(dh_dev->device, "missing capabilities %i\n", notify);
+ return -EINVAL;
+ }
+
+ pci_read_config_dword(pdev,
+ notify + offsetof(struct zxdh_pf_pci_notify_cap,
+ notify_off_multiplier),
+ &pf_dev->notify_offset_multiplier);
+ pci_read_config_dword(pdev,
+ notify + offsetof(struct zxdh_pf_pci_notify_cap,
+ cap.length), &notify_length);
+ pci_read_config_dword(pdev,
+ notify + offsetof(struct zxdh_pf_pci_notify_cap,
+ cap.offset), &notify_offset);
+
+ /* We don't know how many VQs we'll map, ahead of the time.
+ * If notify length is small, map it all now. Otherwise,
+ * map each VQ individually later.
+ */
+ if (notify_length + (notify_offset % PAGE_SIZE) <= PAGE_SIZE) {
+ pf_dev->notify_base = zxdh_pf_map_capability(dh_dev, notify,
+ ZXDH_PF_MAP_MINLEN2,
+ ZXDH_PF_ALIGN2, 0,
+ notify_length,
+ &pf_dev->notify_len,
+ &pf_dev->notify_pa, NULL);
+ if (!pf_dev->notify_base) {
+ dev_err(dh_dev->device, "pf_dev->notify_base is null\n");
+ return -EINVAL;
+ }
+ } else {
+ pf_dev->notify_map_cap = notify;
+ }
+
+ return 0;
+}
+
+int zxdh_pf_device_cfg_init(struct dh_core_dev *dh_dev)
+{
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+ struct pci_dev *pdev = dh_dev->pdev;
+ int device;
+
+ /* Device capability is only mandatory for
+ * devices that have device-specific configuration.
+ */
+ device = zxdh_pf_pci_find_capability(pdev, ZXDH_PCI_CAP_DEVICE_CFG,
+ IORESOURCE_IO | IORESOURCE_MEM,
+ &pf_dev->modern_bars);
+
+ /* we don't know how much we should map,
+ * but PAGE_SIZE is more than enough for all existing devices.
+ */
+ if (device) {
+ pf_dev->device = zxdh_pf_map_capability(dh_dev, device, 0,
+ ZXDH_PF_ALIGN4, 0, PAGE_SIZE,
+ &pf_dev->device_len, NULL,
+ &pf_dev->dev_cfg_bar_off);
+ if (!pf_dev->device) {
+ dev_err(dh_dev->device, "pf_dev->device is null\n");
+ return -EINVAL;
+ }
+ }
+ return 0;
+}
+
+void zxdh_pf_modern_cfg_uninit(struct dh_core_dev *dh_dev)
+{
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+ struct pci_dev *pdev = dh_dev->pdev;
+
+ if (pf_dev->device)
+ pci_iounmap(pdev, pf_dev->device);
+ if (pf_dev->notify_base)
+ pci_iounmap(pdev, pf_dev->notify_base);
+ pci_iounmap(pdev, pf_dev->common);
+}
+
+int zxdh_pf_modern_cfg_init(struct dh_core_dev *dh_dev)
+{
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+ struct pci_dev *pdev = dh_dev->pdev;
+ int ret;
+
+ ret = zxdh_pf_common_cfg_init(dh_dev);
+ if (ret) {
+ dev_err(dh_dev->device, "zxdh_pf_common_cfg_init failed: %d\n", ret);
+ return -EINVAL;
+ }
+
+ ret = zxdh_pf_notify_cfg_init(dh_dev);
+ if (ret) {
+ dev_err(dh_dev->device, "zxdh_pf_notify_cfg_init failed: %d\n", ret);
+ goto err_map_notify;
+ }
+
+ ret = zxdh_pf_device_cfg_init(dh_dev);
+ if (ret) {
+ dev_err(dh_dev->device, "zxdh_pf_device_cfg_init failed: %d\n", ret);
+ goto err_map_device;
+ }
+
+ return 0;
+
+err_map_device:
+ if (pf_dev->notify_base)
+ pci_iounmap(pdev, pf_dev->notify_base);
+err_map_notify:
+ pci_iounmap(pdev, pf_dev->common);
+ return -EINVAL;
+}
+
+u16 zxdh_pf_get_queue_notify_off(struct dh_core_dev *dh_dev,
+ u16 phy_index, u16 index)
+{
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+
+ if (pf_dev->packed_status)
+ iowrite16(phy_index, &pf_dev->common->queue_select);
+ else
+ iowrite16(index, &pf_dev->common->queue_select);
+
+ return ioread16(&pf_dev->common->queue_notify_off);
+}
+
+void __iomem *zxdh_pf_map_vq_notify(struct dh_core_dev *dh_dev,
+ u16 phy_index, u16 index,
+ resource_size_t *pa)
+{
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+ u16 off;
+
+ off = zxdh_pf_get_queue_notify_off(dh_dev, phy_index, index);
+
+ if (pf_dev->notify_base) {
+ /* offset should not wrap */
+ if ((u64)off *
+ pf_dev->notify_offset_multiplier + 2 > pf_dev->notify_len) {
+ dev_err(dh_dev->device,
+ "bad notification offset %u (x %u) for queue %u > %zd",
+ off, pf_dev->notify_offset_multiplier, phy_index,
+ pf_dev->notify_len);
+ return NULL;
+ }
+
+ if (pa)
+ *pa = pf_dev->notify_pa + off * pf_dev->notify_offset_multiplier;
+
+ return pf_dev->notify_base + off * pf_dev->notify_offset_multiplier;
+ } else {
+ return zxdh_pf_map_capability(dh_dev, pf_dev->notify_map_cap, 2, 2,
+ off * pf_dev->notify_offset_multiplier,
+ 2, NULL, pa, NULL);
+ }
+}
+
+void zxdh_pf_unmap_vq_notify(struct dh_core_dev *dh_dev, void __iomem *priv)
+{
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+
+ if (!pf_dev->notify_base)
+ pci_iounmap(dh_dev->pdev, priv);
+}
+
+void zxdh_pf_set_status(struct dh_core_dev *dh_dev, u8 status)
+{
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+
+ iowrite8(status, &pf_dev->common->device_status);
+}
+
+u8 zxdh_pf_get_status(struct dh_core_dev *dh_dev)
+{
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+
+ return ioread8(&pf_dev->common->device_status);
+}
+
+u8 zxdh_pf_get_cfg_gen(struct dh_core_dev *dh_dev)
+{
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+ u8 config_generation;
+
+ config_generation = ioread8(&pf_dev->common->config_generation);
+
+ return config_generation;
+}
+
+void zxdh_pf_get_vf_mac(struct dh_core_dev *dh_dev, u8 *mac, int vf_id)
+{
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+ u32 DEV_MAC_L;
+ u16 DEV_MAC_H;
+
+ if (pf_dev->pf_sriov_cap_base) {
+ DEV_MAC_L = ioread32(pf_dev->pf_sriov_cap_base +
+ (pf_dev->sriov_bar_size) * vf_id +
+ pf_dev->dev_cfg_bar_off);
+ mac[0] = DEV_MAC_L & 0xff;
+ mac[1] = (DEV_MAC_L >> 8) & 0xff;
+ mac[2] = (DEV_MAC_L >> 16) & 0xff;
+ mac[3] = (DEV_MAC_L >> 24) & 0xff;
+ DEV_MAC_H = ioread16(pf_dev->pf_sriov_cap_base +
+ (pf_dev->sriov_bar_size) * vf_id +
+ pf_dev->dev_cfg_bar_off +
+ ZXDH_DEV_MAC_HIGH_OFFSET);
+ mac[4] = DEV_MAC_H & 0xff;
+ mac[5] = (DEV_MAC_H >> 8) & 0xff;
+ }
+}
+
+void zxdh_pf_set_vf_mac_reg(struct zxdh_pf_device *pf_dev,
+ u8 *mac, int vf_id)
+{
+ u32 DEV_MAC_L;
+ u16 DEV_MAC_H;
+
+ if (pf_dev->pf_sriov_cap_base) {
+ DEV_MAC_L = mac[0] | (mac[1] << 8) |
+ (mac[2] << 16) | (mac[3] << 24);
+ DEV_MAC_H = mac[4] | (mac[5] << 8);
+ iowrite32(DEV_MAC_L, (pf_dev->pf_sriov_cap_base +
+ (pf_dev->sriov_bar_size) * vf_id +
+ pf_dev->dev_cfg_bar_off));
+ iowrite16(DEV_MAC_H, (pf_dev->pf_sriov_cap_base +
+ (pf_dev->sriov_bar_size) * vf_id +
+ pf_dev->dev_cfg_bar_off +
+ ZXDH_DEV_MAC_HIGH_OFFSET));
+ }
+}
+
+void zxdh_pf_set_vf_mac(struct dh_core_dev *dh_dev, u8 *mac, int vf_id)
+{
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+
+ zxdh_pf_set_vf_mac_reg(pf_dev, mac, vf_id);
+}
+
+void zxdh_set_mac(struct dh_core_dev *dh_dev, u8 *mac)
+{
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+ u32 DEV_MAC_L;
+ u16 DEV_MAC_H;
+
+ DEV_MAC_L = mac[0] | (mac[1] << 8) | (mac[2] << 16) | (mac[3] << 24);
+ DEV_MAC_H = mac[4] | (mac[5] << 8);
+ iowrite32(DEV_MAC_L, pf_dev->device);
+ iowrite16(DEV_MAC_H, pf_dev->device + ZXDH_DEV_MAC_HIGH_OFFSET);
+}
+
+void zxdh_get_mac(struct dh_core_dev *dh_dev, u8 *mac)
+{
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+ u32 DEV_MAC_L;
+ u16 DEV_MAC_H;
+
+ DEV_MAC_L = ioread32(pf_dev->device);
+ mac[0] = DEV_MAC_L & 0xff;
+ mac[1] = (DEV_MAC_L >> 8) & 0xff;
+ mac[2] = (DEV_MAC_L >> 16) & 0xff;
+ mac[3] = (DEV_MAC_L >> 24) & 0xff;
+ DEV_MAC_H = ioread16(pf_dev->device + ZXDH_DEV_MAC_HIGH_OFFSET);
+ mac[4] = DEV_MAC_H & 0xff;
+ mac[5] = (DEV_MAC_H >> 8) & 0xff;
+}
+
+u64 zxdh_pf_get_features(struct dh_core_dev *dh_dev)
+{
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+ u64 device_feature;
+
+ iowrite32(0, &pf_dev->common->device_feature_select);
+ device_feature = ioread32(&pf_dev->common->device_feature);
+ iowrite32(1, &pf_dev->common->device_feature_select);
+ device_feature |= ((u64)ioread32(&pf_dev->common->device_feature)
+ << 32);
+
+ return device_feature;
+}
+
+void zxdh_pf_set_features(struct dh_core_dev *dh_dev, u64 features)
+{
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+
+ iowrite32(0, &pf_dev->common->guest_feature_select);
+ iowrite32((u32)features, &pf_dev->common->guest_feature);
+ iowrite32(1, &pf_dev->common->guest_feature_select);
+ iowrite32(features >> 32, &pf_dev->common->guest_feature);
+}
+
static int dh_pf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct zxdh_pf_device *pf_dev;
diff --git a/drivers/net/ethernet/zte/dinghai/en_pf.h b/drivers/net/ethernet/zte/dinghai/en_pf.h
index 80ff1b860b83..434d18944924 100644
--- a/drivers/net/ethernet/zte/dinghai/en_pf.h
+++ b/drivers/net/ethernet/zte/dinghai/en_pf.h
@@ -17,6 +17,24 @@
#define ZXDH_PF_DEVICE_ID 0x8040
#define ZXDH_VF_DEVICE_ID 0x8041
+/* Common configuration */
+#define ZXDH_PCI_CAP_COMMON_CFG 1
+/* Notifications */
+#define ZXDH_PCI_CAP_NOTIFY_CFG 2
+/* ISR access */
+#define ZXDH_PCI_CAP_ISR_CFG 3
+/* Device specific configuration */
+#define ZXDH_PCI_CAP_DEVICE_CFG 4
+/* PCI configuration access */
+#define ZXDH_PCI_CAP_PCI_CFG 5
+
+#define ZXDH_PF_MAX_BAR_VAL 0x5
+#define ZXDH_PF_ALIGN4 4
+#define ZXDH_PF_ALIGN2 2
+#define ZXDH_PF_MAP_MINLEN2 2
+
+#define ZXDH_DEV_MAC_HIGH_OFFSET 4
+
enum dh_coredev_type {
DH_COREDEV_PF,
DH_COREDEV_VF,
@@ -36,7 +54,26 @@ struct dh_core_dev {
};
struct zxdh_pf_device {
+ struct zxdh_pf_pci_common_cfg __iomem *common;
+ /* Device-specific data (non-legacy mode) */
+ void __iomem *device;
+ /* Base of vq notifications (non-legacy mode). */
+ void __iomem *notify_base;
+ void __iomem *pf_sriov_cap_base;
+ /* Physical base of vq notifications */
+ resource_size_t notify_pa;
+ /* So we can sanity-check accesses. */
+ size_t notify_len;
+ size_t device_len;
+ /* Capability for when we need to map notifications per-vq. */
+ s32 notify_map_cap;
+ /* Multiply queue_notify_off by this value. (non-legacy mode). */
+ u32 notify_offset_multiplier;
+ s32 modern_bars;
void __iomem *pci_ioremap_addr[6];
+ u64 sriov_bar_size;
+ u32 dev_cfg_bar_off;
+ bool packed_status;
bool bar_chan_valid;
bool vepa;
struct mutex irq_lock; /* Protects IRQ operations */
@@ -61,5 +98,34 @@ static inline void dh_core_free_priv(struct dh_core_dev *dh_dev)
((pdev)->device == ZXDH_VF_DEVICE_ID ? DH_COREDEV_VF : DH_COREDEV_PF)
void dh_pf_pci_close(struct dh_core_dev *dev);
+int zxdh_pf_pci_find_capability(struct pci_dev *pdev, u8 cfg_type,
+ u32 ioresource_types, int *bars);
+void __iomem *zxdh_pf_map_capability(struct dh_core_dev *dh_dev, int off,
+ size_t minlen, u32 align,
+ u32 start, u32 size,
+ size_t *len, resource_size_t *pa,
+ u32 *bar_off);
+int zxdh_pf_common_cfg_init(struct dh_core_dev *dh_dev);
+int zxdh_pf_notify_cfg_init(struct dh_core_dev *dh_dev);
+int zxdh_pf_device_cfg_init(struct dh_core_dev *dh_dev);
+void zxdh_pf_modern_cfg_uninit(struct dh_core_dev *dh_dev);
+int zxdh_pf_modern_cfg_init(struct dh_core_dev *dh_dev);
+u16 zxdh_pf_get_queue_notify_off(struct dh_core_dev *dh_dev,
+ u16 phy_index, u16 index);
+void __iomem *zxdh_pf_map_vq_notify(struct dh_core_dev *dh_dev,
+ u16 phy_index, u16 index,
+ resource_size_t *pa);
+void zxdh_pf_unmap_vq_notify(struct dh_core_dev *dh_dev, void __iomem *priv);
+void zxdh_pf_set_status(struct dh_core_dev *dh_dev, u8 status);
+u8 zxdh_pf_get_status(struct dh_core_dev *dh_dev);
+u8 zxdh_pf_get_cfg_gen(struct dh_core_dev *dh_dev);
+void zxdh_pf_get_vf_mac(struct dh_core_dev *dh_dev, u8 *mac, int vf_id);
+void zxdh_pf_set_vf_mac_reg(struct zxdh_pf_device *pf_dev,
+ u8 *mac, int vf_id);
+void zxdh_pf_set_vf_mac(struct dh_core_dev *dh_dev, u8 *mac, int vf_id);
+void zxdh_set_mac(struct dh_core_dev *dh_dev, u8 *mac);
+void zxdh_get_mac(struct dh_core_dev *dh_dev, u8 *mac);
+u64 zxdh_pf_get_features(struct dh_core_dev *dh_dev);
+void zxdh_pf_set_features(struct dh_core_dev *dh_dev, u64 features);
#endif /* __ZXDH_EN_PF_H__ */
--
2.27.0
^ permalink raw reply related
* Re: [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration
From: Stefano Garzarella @ 2026-06-16 13:35 UTC (permalink / raw)
To: Andrey Drobyshev
Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den
In-Reply-To: <20260612165718.433546-1-andrey.drobyshev@virtuozzo.com>
Hi Andrey,
thanks for the series!
On Fri, Jun 12, 2026 at 07:57:14PM +0300, Andrey Drobyshev wrote:
>Host<-->guest connections via AF_VSOCK sockets aren't supposed to
>outlive VM migration, since the VM is moving to another host. However,
>there's a special case: QEMU live-update, or CPR
>(checkpoint-restore) migration. In this case, the VM remains on the same
>host, and we'd like such connections to persist.
In the spec we have VIRTIO_VSOCK_EVENT_TRANSPORT_RESET which is usually
sent by the device after a migration.
IIUC the specs don't say this has to be done all the time, so we don't
need to change anything in the specs, right?
We just need to avoid sending it (which I think is what we're doing
here... I still need to look at the patches).
>
>For this to work, we need to be able to transfer device ownership from
>source QEMU to dest QEMU. Namely, source needs to reset ownership by
>issuing VHOST_RESET_OWNER ioctl, and then target has to claim it by
>calling VHOST_SET_OWNER.
>
>Since VHOST_RESET_OWNER isn't yet implemented for vhost-vsock, let's add
>such an implementation (patches 1-2). Also fix a regression introduced by
>the earlier commit [1] (patch 3), and fix the deadlock bug (patch 4).
If it's a regression, should we fix it separately?
Or is it related to this series?
>
>There's a complementary series for QEMU [0] adding support of vhost-vsock
>devices during CPR migration.
>
>NOTE: this series needs to be applied on top of Michael's vhost/linux-next
>tree as it contains relevant commit [1], not yet present in master branch.
>
>I've tested this (patched QEMU + patched kernel) approximately as follows:
>
> * Run listener in the guest:
> socat -u VSOCK-LISTEN:9999 - >/tmp/recv.bin
>
> * Run data transfer from host to guest:
> socat -u FILE:/root/bigfile.bin VSOCK-CONNECT:CID:9999
>
> * Perform CPR migration during transfer (either cpr-exec or cpr-transfer)
> * Check that file hash sum matches
>
>[0] https://lore.kernel.org/qemu-devel/20260612165110.431376-1-andrey.drobyshev@virtuozzo.com
>[1] https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?id=bb26ed5f3a8b
>
>Andrey Drobyshev (1):
> vhost/vsock: re-scan TX virtqueue on device start
>
>Denis V. Lunev (1):
> vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause
>
>Pavel Tikhomirov (2):
> vhost/vsock: split out vhost_vsock_drop_backends helper
> vhost/vsock: add VHOST_RESET_OWNER ioctl
>
> drivers/vhost/vsock.c | 80 +++++++++++++++++++++++++++++++++++++------
> 1 file changed, 69 insertions(+), 11 deletions(-)
>
>--
>2.47.1
>
^ permalink raw reply
* Re: [PATCH bpf v2 1/2] bpf: Fix partial copy of non-linear test_run output
From: Paul Chaignon @ 2026-06-16 13:33 UTC (permalink / raw)
To: Sun Jian
Cc: bpf, netdev, linux-kselftest, linux-kernel, ast, daniel, andrii,
martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, davem,
edumazet, kuba, pabeni, horms, shuah, hawk, john.fastabend, sdf,
toke, lorenzo
In-Reply-To: <20260616093103.471444-2-sun.jian.kdev@gmail.com>
On Tue, Jun 16, 2026 at 05:31:02PM +0800, Sun Jian wrote:
> For non-linear test_run output, bpf_test_finish() derives the linear
> data copy length from copy_size - frag_size. This only matches the
> linear data length when copy_size is the full packet size.
>
> When userspace provides a short data_out buffer, copy_size is clamped to
> that buffer size. If copy_size is smaller than frag_size, the computed
> length becomes negative and bpf_test_finish() returns -ENOSPC before
> copying the packet prefix or updating data_size_out.
>
> Compute the linear data length from the packet layout instead, and clamp
> the linear copy length to copy_size. This preserves the expected
> partial-copy semantics: return -ENOSPC, copy the packet prefix that fits
> in data_out, and report the full packet length through data_size_out.
>
> Fixes: 7855e0db150ad ("bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature")
> Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
> ---
> net/bpf/test_run.c | 11 ++++-------
> 1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 2bc04feadfab..976e8fa31bc9 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -453,19 +453,16 @@ static int bpf_test_finish(const union bpf_attr *kattr,
> }
>
> if (data_out) {
> - int len = sinfo ? copy_size - frag_size : copy_size;
> -
> - if (len < 0) {
> - err = -ENOSPC;
> - goto out;
> - }
> + u32 head_len = size - frag_size;
> + u32 len = min(copy_size, head_len);
>
> if (copy_to_user(data_out, data, len))
> goto out;
>
> if (sinfo) {
> - int i, offset = len;
> + u32 offset = len;
> u32 data_len;
> + int i;
That doesn't look needed.
>
> for (i = 0; i < sinfo->nr_frags; i++) {
> skb_frag_t *frag = &sinfo->frags[i];
> --
> 2.43.0
>
^ permalink raw reply
* [PATCH net v4 0/2] ipv4/ipv6: account for fraggap on paged allocation paths
From: Wongi Lee @ 2026-06-16 13:33 UTC (permalink / raw)
To: netdev
Cc: David Ahern, Ido Schimmel, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, asml.silence, dhowells,
willemb, Jungwoo Lee
Fix fraggap accounting in the paged-allocation paths of IPv4 and IPv6.
The IPv6 patch is the v4 update of the previously posted patch. The IPv4
patch handles the same code pattern (by Ido).
v3->v4
- Remove the MSG_SPLICE_PAGES exception from the IPv6 negative copy check.
- Clarify where the fraggap bytes are copied in the commit messages.
- Add Reviewed-by tags.
v2->v3
- Add the IPv4 counterpart.
- Mention that the IPv6 corruption became triggerable after ce650a166335.
- Remove the stale comments about copy becoming -fraggap when pagedlen > 0.
- Add missing Cc entries.
v1->v2:
- Fix mail format.
v3: https://lore.kernel.org/netdev/aiq3f7UZGFp0F3MV@DESKTOP-19IMU7U.localdomain/
v2: https://lore.kernel.org/netdev/aigx83czv+UJZA0d@DESKTOP-19IMU7U.localdomain/
v1: https://lore.kernel.org/netdev/aibiIYMAwUErTw5U@DESKTOP-19IMU7U.localdomain/
Wongi Lee (2):
ipv4: account for fraggap on the paged allocation path
ipv6: account for fraggap on the paged allocation path
net/ipv4/ip_output.c | 7 ++-----
net/ipv6/ip6_output.c | 9 +++------
2 files changed, 5 insertions(+), 11 deletions(-)
--
2.34.1
^ permalink raw reply
* [PATCH net-next v6 1/2] dinghai: add ZTE network driver support
From: han.junyang @ 2026-06-16 13:30 UTC (permalink / raw)
To: andrew+netdev, davem, edumazet, kuba, pabeni, horms
Cc: linux-kernel, netdev, han.junyang, ran.ming, han.chengfei,
zhang.yanze
In-Reply-To: <20260616212106742_trNLb7r-FL04eDlJO8tT@zte.com.cn>
From: Junyang Han <han.junyang@zte.com.cn>
Add basic framework for ZTE DingHai ethernet PF driver, including
Kconfig/Makefile build support and PCIe device probe/remove skeleton.
Signed-off-by: Junyang Han <han.junyang@zte.com.cn>
---
MAINTAINERS | 6 +
drivers/net/ethernet/Kconfig | 1 +
drivers/net/ethernet/Makefile | 1 +
drivers/net/ethernet/zte/Kconfig | 20 +++
drivers/net/ethernet/zte/Makefile | 6 +
drivers/net/ethernet/zte/dinghai/Kconfig | 34 ++++
drivers/net/ethernet/zte/dinghai/Makefile | 10 ++
drivers/net/ethernet/zte/dinghai/en_pf.c | 183 ++++++++++++++++++++++
drivers/net/ethernet/zte/dinghai/en_pf.h | 65 ++++++++
9 files changed, 326 insertions(+)
create mode 100644 drivers/net/ethernet/zte/Kconfig
create mode 100644 drivers/net/ethernet/zte/Makefile
create mode 100644 drivers/net/ethernet/zte/dinghai/Kconfig
create mode 100644 drivers/net/ethernet/zte/dinghai/Makefile
create mode 100644 drivers/net/ethernet/zte/dinghai/en_pf.c
create mode 100644 drivers/net/ethernet/zte/dinghai/en_pf.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 2fb1c75afd16..73692b09bf7b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -29440,6 +29440,12 @@ S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git
F: sound/hda/codecs/senarytech.c
+ZTE DINGHAI ETHERNET DRIVER
+M: Junyang Han <han.junyang@zte.com.cn>
+L: netdev@vger.kernel.org
+S: Maintained
+F: drivers/net/ethernet/zte/
+
THE REST
M: Linus Torvalds <torvalds@linux-foundation.org>
L: linux-kernel@vger.kernel.org
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index b8f70e2a1763..c2b6996b0cfe 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -188,5 +188,6 @@ source "drivers/net/ethernet/wangxun/Kconfig"
source "drivers/net/ethernet/wiznet/Kconfig"
source "drivers/net/ethernet/xilinx/Kconfig"
source "drivers/net/ethernet/xircom/Kconfig"
+source "drivers/net/ethernet/zte/Kconfig"
endif # ETHERNET
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 57344fec6ce0..a34bcbd4df4e 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -104,3 +104,4 @@ obj-$(CONFIG_NET_VENDOR_XIRCOM) += xircom/
obj-$(CONFIG_NET_VENDOR_SYNOPSYS) += synopsys/
obj-$(CONFIG_NET_VENDOR_PENSANDO) += pensando/
obj-$(CONFIG_OA_TC6) += oa_tc6.o
+obj-$(CONFIG_NET_VENDOR_ZTE) += zte/
diff --git a/drivers/net/ethernet/zte/Kconfig b/drivers/net/ethernet/zte/Kconfig
new file mode 100644
index 000000000000..b95c2fc7db77
--- /dev/null
+++ b/drivers/net/ethernet/zte/Kconfig
@@ -0,0 +1,20 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# ZTE driver configuration
+#
+
+config NET_VENDOR_ZTE
+ bool "ZTE devices"
+ default y
+ help
+ If you have a network (Ethernet) card belonging to this class, say Y.
+ Note that the answer to this question doesn't directly affect the
+ kernel: saying N will just cause the configurator to skip all
+ the questions about ZTE cards. If you say Y, you will be asked
+ for your specific card in the following questions.
+
+if NET_VENDOR_ZTE
+
+source "drivers/net/ethernet/zte/dinghai/Kconfig"
+
+endif # NET_VENDOR_ZTE
diff --git a/drivers/net/ethernet/zte/Makefile b/drivers/net/ethernet/zte/Makefile
new file mode 100644
index 000000000000..cd9929b61559
--- /dev/null
+++ b/drivers/net/ethernet/zte/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the ZTE device drivers
+#
+
+obj-$(CONFIG_DINGHAI) += dinghai/
diff --git a/drivers/net/ethernet/zte/dinghai/Kconfig b/drivers/net/ethernet/zte/dinghai/Kconfig
new file mode 100644
index 000000000000..94b5bd9b3c50
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/Kconfig
@@ -0,0 +1,34 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# ZTE DingHai Ethernet driver configuration
+#
+
+config DINGHAI
+ bool "ZTE DingHai Ethernet driver"
+ depends on NET_VENDOR_ZTE && PCI
+ select NET_DEVLINK
+ help
+ This driver supports ZTE DingHai Ethernet devices.
+
+ DingHai is a high-performance Ethernet controller that supports
+ multiple features including hardware offloading, SR-IOV, and
+ advanced virtualization capabilities.
+
+ If you say Y here, you can select specific driver variants below.
+
+ If unsure, say N.
+
+if DINGHAI
+
+config DINGHAI_PF
+ tristate "ZTE DingHai PF (Physical Function) driver"
+ help
+ This driver supports ZTE DingHai PCI Express Ethernet
+ adapters (PF).
+
+ To compile this driver as a module, choose M here. The module
+ will be named dinghai10e.
+
+ If unsure, say N.
+
+endif # DINGHAI
diff --git a/drivers/net/ethernet/zte/dinghai/Makefile b/drivers/net/ethernet/zte/dinghai/Makefile
new file mode 100644
index 000000000000..f55a8de518be
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for ZTE DingHai Ethernet driver
+#
+
+ccflags-y += -I$(src)
+
+obj-$(CONFIG_DINGHAI_PF) += dinghai10e.o
+dinghai10e-y := en_pf.o
+
diff --git a/drivers/net/ethernet/zte/dinghai/en_pf.c b/drivers/net/ethernet/zte/dinghai/en_pf.c
new file mode 100644
index 000000000000..99f2a8af5bf4
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/en_pf.c
@@ -0,0 +1,183 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * ZTE DingHai Ethernet driver
+ * Copyright (c) 2022-2026, ZTE Corporation.
+ */
+
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <net/devlink.h>
+#include <linux/dma-mapping.h>
+#include "en_pf.h"
+
+MODULE_AUTHOR("Junyang Han <han.junyang@zte.com.cn>");
+MODULE_DESCRIPTION("ZTE DingHai series Ethernet driver");
+MODULE_LICENSE("GPL");
+
+static const struct devlink_ops dh_pf_devlink_ops = {};
+
+static const struct pci_device_id dh_pf_pci_table[] = {
+ { PCI_DEVICE(ZXDH_PF_VENDOR_ID, ZXDH_PF_DEVICE_ID), 0 },
+ { PCI_DEVICE(ZXDH_PF_VENDOR_ID, ZXDH_VF_DEVICE_ID), 0 },
+ { 0, }
+};
+
+MODULE_DEVICE_TABLE(pci, dh_pf_pci_table);
+
+static int dh_pf_pci_init(struct dh_core_dev *dev)
+{
+ struct zxdh_pf_device *pf_dev = dev->priv;
+ int ret;
+
+ pci_set_drvdata(dev->pdev, dev);
+
+ ret = pci_enable_device(dev->pdev);
+ if (ret) {
+ dev_err(dev->device, "pci_enable_device failed: %d\n", ret);
+ return ret;
+ }
+
+ ret = dma_set_mask_and_coherent(dev->device, DMA_BIT_MASK(64));
+ if (ret) {
+ ret = dma_set_mask_and_coherent(dev->device, DMA_BIT_MASK(32));
+ if (ret) {
+ dev_err(dev->device, "dma_set_mask_and_coherent failed: %d\n", ret);
+ goto err_pci;
+ }
+ }
+
+ ret = pci_request_selected_regions(dev->pdev,
+ pci_select_bars(dev->pdev, IORESOURCE_MEM),
+ "dh-pf");
+ if (ret) {
+ dev_err(dev->device, "pci_request_selected_regions failed: %d\n", ret);
+ goto err_pci;
+ }
+
+ pci_set_master(dev->pdev);
+ ret = pci_save_state(dev->pdev);
+ if (ret) {
+ dev_err(dev->device, "pci_save_state failed: %d\n", ret);
+ goto err_pci_save_state;
+ }
+
+ pf_dev->pci_ioremap_addr[0] =
+ ioremap(pci_resource_start(dev->pdev, 0),
+ pci_resource_len(dev->pdev, 0));
+ if (!pf_dev->pci_ioremap_addr[0]) {
+ ret = -ENOMEM;
+ dev_err(dev->device, "dh pf pci ioremap failed\n");
+ goto err_pci_save_state;
+ }
+
+ return 0;
+
+err_pci_save_state:
+ pci_release_selected_regions(dev->pdev,
+ pci_select_bars(dev->pdev, IORESOURCE_MEM));
+err_pci:
+ pci_disable_device(dev->pdev);
+ return ret;
+}
+
+void dh_pf_pci_close(struct dh_core_dev *dev)
+{
+ struct zxdh_pf_device *pf_dev = dev->priv;
+
+ iounmap(pf_dev->pci_ioremap_addr[0]);
+ pci_release_selected_regions(dev->pdev,
+ pci_select_bars(dev->pdev, IORESOURCE_MEM));
+ pci_disable_device(dev->pdev);
+}
+
+static int dh_pf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct zxdh_pf_device *pf_dev;
+ struct dh_core_dev *dh_dev;
+ struct devlink *devlink;
+ int ret;
+
+ devlink = devlink_alloc(&dh_pf_devlink_ops, sizeof(struct dh_core_dev),
+ &pdev->dev);
+ if (!devlink) {
+ dev_err(&pdev->dev, "dh_pf devlink alloc failed\n");
+ return -ENOMEM;
+ }
+
+ dh_dev = devlink_priv(devlink);
+ dh_dev->device = &pdev->dev;
+ dh_dev->pdev = pdev;
+ dh_dev->devlink = devlink;
+
+ pf_dev = dh_core_alloc_priv(dh_dev, sizeof(*pf_dev));
+ if (!pf_dev) {
+ dev_err(&pdev->dev, "dh_pf_dev alloc failed\n");
+ ret = -ENOMEM;
+ goto err_pf_dev;
+ }
+
+ pf_dev->bar_chan_valid = false;
+ pf_dev->vepa = false;
+ mutex_init(&dh_dev->lock);
+ mutex_init(&pf_dev->irq_lock);
+
+ dh_dev->coredev_type = GET_COREDEV_TYPE(pdev);
+
+ ret = dh_pf_pci_init(dh_dev);
+ if (ret) {
+ dev_err(&pdev->dev, "dh_pf_pci_init failed: %d\n", ret);
+ goto err_cfg_init;
+ }
+
+ devlink_register(devlink);
+
+ return 0;
+
+err_cfg_init:
+ mutex_destroy(&pf_dev->irq_lock);
+ mutex_destroy(&dh_dev->lock);
+ dh_core_free_priv(dh_dev);
+err_pf_dev:
+ devlink_free(devlink);
+ return ret;
+}
+
+static void dh_pf_remove(struct pci_dev *pdev)
+{
+ struct dh_core_dev *dh_dev = pci_get_drvdata(pdev);
+ struct devlink *devlink = priv_to_devlink(dh_dev);
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+
+ devlink_unregister(devlink);
+ dh_pf_pci_close(dh_dev);
+ mutex_destroy(&pf_dev->irq_lock);
+ mutex_destroy(&dh_dev->lock);
+ dh_core_free_priv(dh_dev);
+ devlink_free(devlink);
+ pci_set_drvdata(pdev, NULL);
+}
+
+static void dh_pf_shutdown(struct pci_dev *pdev)
+{
+ struct dh_core_dev *dh_dev = pci_get_drvdata(pdev);
+ struct devlink *devlink = priv_to_devlink(dh_dev);
+ struct zxdh_pf_device *pf_dev = dh_dev->priv;
+
+ devlink_unregister(devlink);
+ dh_pf_pci_close(dh_dev);
+ mutex_destroy(&pf_dev->irq_lock);
+ mutex_destroy(&dh_dev->lock);
+ dh_core_free_priv(dh_dev);
+ devlink_free(devlink);
+ pci_set_drvdata(pdev, NULL);
+}
+
+static struct pci_driver dh_pf_driver = {
+ .name = "dinghai10e",
+ .id_table = dh_pf_pci_table,
+ .probe = dh_pf_probe,
+ .remove = dh_pf_remove,
+ .shutdown = dh_pf_shutdown,
+};
+
+module_pci_driver(dh_pf_driver);
diff --git a/drivers/net/ethernet/zte/dinghai/en_pf.h b/drivers/net/ethernet/zte/dinghai/en_pf.h
new file mode 100644
index 000000000000..80ff1b860b83
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/en_pf.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * ZTE DingHai Ethernet driver - PF header
+ * Copyright (c) 2022-2026, ZTE Corporation.
+ */
+
+#ifndef __ZXDH_EN_PF_H__
+#define __ZXDH_EN_PF_H__
+
+#include <linux/types.h>
+#include <linux/pci.h>
+#include <linux/mutex.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+
+#define ZXDH_PF_VENDOR_ID 0x1cf2
+#define ZXDH_PF_DEVICE_ID 0x8040
+#define ZXDH_VF_DEVICE_ID 0x8041
+
+enum dh_coredev_type {
+ DH_COREDEV_PF,
+ DH_COREDEV_VF,
+ DH_COREDEV_SF,
+ DH_COREDEV_MPF
+};
+
+struct devlink;
+
+struct dh_core_dev {
+ struct device *device;
+ enum dh_coredev_type coredev_type;
+ struct pci_dev *pdev;
+ struct devlink *devlink;
+ struct mutex lock; /* Protects device configuration */
+ void *priv;
+};
+
+struct zxdh_pf_device {
+ void __iomem *pci_ioremap_addr[6];
+ bool bar_chan_valid;
+ bool vepa;
+ struct mutex irq_lock; /* Protects IRQ operations */
+};
+
+static inline void *dh_core_alloc_priv(struct dh_core_dev *dh_dev,
+ size_t size)
+{
+ void *priv = kzalloc(size, GFP_KERNEL);
+
+ if (priv)
+ dh_dev->priv = priv;
+ return priv;
+}
+
+static inline void dh_core_free_priv(struct dh_core_dev *dh_dev)
+{
+ kfree(dh_dev->priv);
+}
+
+#define GET_COREDEV_TYPE(pdev) \
+ ((pdev)->device == ZXDH_VF_DEVICE_ID ? DH_COREDEV_VF : DH_COREDEV_PF)
+
+void dh_pf_pci_close(struct dh_core_dev *dev);
+
+#endif /* __ZXDH_EN_PF_H__ */
--
2.27.0
^ permalink raw reply related
* Re: [PATCH net v2] appletalk: fix TOCTOU race in atalk_sendmsg
From: Simon Horman @ 2026-06-16 13:22 UTC (permalink / raw)
To: Yizhou Zhao
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Kees Cook, Kito Xu, linux-kernel, Yuxiang Yang,
Ao Wang, Xuewei Feng, Qi Li, Ke Xu, stable
In-Reply-To: <20260615090635.1549-1-zhaoyz24@mails.tsinghua.edu.cn>
On Mon, Jun 15, 2026 at 05:06:33PM +0800, Yizhou Zhao wrote:
> atalk_sendmsg() looks up an AppleTalk route, stores the returned
> atalk_route and net_device pointers, and then drops the socket lock
> around sock_alloc_send_skb(). The route pointer returned by
> atrtr_find() is only protected while atalk_routes_lock is held; after
> that lock is dropped, a concurrent SIOCDELRT or device-down path can
> unlink the route, drop the device reference, and free the route.
>
> When sendmsg resumes, it can still dereference the stale route and
> device pointers while building or transmitting the packet. A KASAN
> reproducer using AF_APPLETALK sockets and SIOCADDRT/SIOCDELRT reports
> slab-use-after-free reads in atalk_sendmsg(), with the object allocated
> by atrtr_create() and freed by atrtr_delete().
>
> Fix this by splitting the route lookup into a helper that is called with
> atalk_routes_lock already held. atalk_sendmsg() now performs route
> lookup, copies the route fields it needs, and takes references to the
> selected devices with netdev_hold() while still holding
> atalk_routes_lock. After the lock is dropped and skb allocation sleeps,
> the send path uses only the copied route data and the held net_device
> references, which are released with netdev_put() before returning.
>
> This preserves the existing route selection behaviour, including the
> separate loopback route used for broadcast loopback, while removing the
> dangling route/device window.
>
> Fixes: 60d9f461a20b ("appletalk: remove the BKL")
> Cc: stable@vger.kernel.org
> Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
> Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
> Reported-by: Ao Wang <wangao@seu.edu.cn>
> Reported-by: Xuewei Feng <fengxw06@126.com>
> Reported-by: Qi Li <qli01@tsinghua.edu.cn>
> Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
> Assisted-by: GLM:GLM-5.1
> Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
> ---
> Changes in v2:
> - Use netdev_hold()/netdev_put() instead of dev_hold()/dev_put().
> - Drop explicit NULL checks before releasing temporary device refs.
> - Link to v1: https://lore.kernel.org/netdev/20260610052315.64504-1-zhaoyz24@mails.tsinghua.edu.cn/
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply
* [PATCH net-next v6 0/2] Add ZTE DingHai Ethernet PF driver
From: han.junyang @ 2026-06-16 13:21 UTC (permalink / raw)
To: andrew+netdev, davem, edumazet, kuba, pabeni, horms
Cc: linux-kernel, netdev, han.junyang, ran.ming, han.chengfei,
zhang.yanze
From: Junyang Han <han.junyang@zte.com.cn>
This series adds initial support for the ZTE DingHai Ethernet controller,
a high-performance PCIe Ethernet device supporting SR-IOV, hardware
offloading, and advanced virtualization features.
Changes from v5:
- Drop dev_info() log spam.
- Propagate the real error code from dh_pf_pci_init() in
dh_pf_probe() instead of hard-coding -ENOMEM.
- Register devlink only after dh_pf_pci_init() succeeds, and
in dh_pf_remove()/dh_pf_shutdown() unregister devlink
before tearing down PCI/mutex/priv.
- Drop the "dh_dev->priv = NULL" assignment from
dh_core_free_priv().
Changes from v4:
- Fix sparse warning: add __iomem annotation to priv pointer
- Fix Clang format warning
- Use "dinghai:" as patch subject prefix
- Ensure proper patch threading
Note: Sent manually due to temporary git send-email unavailability
in our environment. Will use git send-email or b4 for future
submissions. Apologies for any inconvenience.
Changes from v3:
- Merged patches 1 and 2:
Combined initial framework with logging infrastructure
for better code organization and reduced patch count. This was done because
the logging infrastructure now uses Linux's built-in dev_err(), dev_info(),
dev_warn(), etc. macros instead of a custom logging system.
- Removed unnecessary variable initialization:
Fixed "don't initialise variables".
- Fixed variable declaration order:
Applied "Reverse Christmas tree" ordering with variables
declared from longest to shortest line length.
- Code quality improvements:
Fixed all checkpatch.pl issues (alignment, formatting, etc.).
Changes from v2:
- Address maintainer feedback from v2 review:
* Remove meaningless initialization
* Change dh_pf_pci_table to static const for better encapsulation
* Simplify MODULE_DESCRIPTION for brevity
- Coding style improvements:
* Ensure all lines are within 80-column limit
* Use kernel types (u32/u8) consistently throughout
* Improve code readability with better formatting
Changes from v1 (addressing feedback from Andrew Lunn):
- Update copyright years to 2022-2026
- Remove DRV_VERSION, MODULE_VERSION and related boilerplate
- Fix MODULE_AUTHOR to use person with email address
- Use module_pci_driver() instead of manual init/exit
- Remove empty suspend/resume callbacks
- Replace char priv[] flexible array with void *priv + kzalloc
- Switch logging from printk wrappers to dev_*() based macros
- Remove dh_helper.h and dh_log.c, simplify to dh_log.h only
- Fix variable declaration ordering (reverse Christmas tree)
- Remove unnecessary NULL check in remove and pf_dev=NULL in probe
- Fix indentation and remove unnecessary type casts
- Use kernel idiomatic "if (ret)" style
This is the initial submission and only includes the PF (Physical Function)
driver. The VF (Virtual Function) driver will be submitted separately.
Junyang Han (2):
dinghai: add ZTE network driver support
dinghai: add hardware register access and PCI capability scanning
MAINTAINERS | 6 +
drivers/net/ethernet/Kconfig | 1 +
drivers/net/ethernet/Makefile | 1 +
drivers/net/ethernet/zte/Kconfig | 20 +
drivers/net/ethernet/zte/Makefile | 6 +
drivers/net/ethernet/zte/dinghai/Kconfig | 34 ++
drivers/net/ethernet/zte/dinghai/Makefile | 10 +
drivers/net/ethernet/zte/dinghai/dh_queue.h | 71 +++
drivers/net/ethernet/zte/dinghai/en_pf.c | 622 ++++++++++++++++++++
drivers/net/ethernet/zte/dinghai/en_pf.h | 131 +++++
10 files changed, 902 insertions(+)
create mode 100644 drivers/net/ethernet/zte/Kconfig
create mode 100644 drivers/net/ethernet/zte/Makefile
create mode 100644 drivers/net/ethernet/zte/dinghai/Kconfig
create mode 100644 drivers/net/ethernet/zte/dinghai/Makefile
create mode 100644 drivers/net/ethernet/zte/dinghai/dh_queue.h
create mode 100644 drivers/net/ethernet/zte/dinghai/en_pf.c
create mode 100644 drivers/net/ethernet/zte/dinghai/en_pf.h
--
2.27.0
* [PATCH v2] net: macb: add TX stall timeout callback to recover from lost TSTART write
From: Andrea della Porta @ 2026-06-16 13:23 UTC
To: netdev, Theo Lebrun, Andrea della Porta, Nicolas Ferre,
Claudiu Beznea, Andrew Lunn, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, linux-kernel, linux-arm-kernel,
linux-rpi-kernel, Nicolai Buchwitz
Cc: Lukasz Raczylo, Steffen Jaeckel
From: Lukasz Raczylo <lukasz@raczylo.com>
The MACB found in the Raspberry Pi RP1 suffers from sporadic stalls on
the TX queue.
While the exact root cause is not yet fully understood, it is likely
related to a hardware issue where a TSTART write to the NCR register
is missed, preventing the transmission from being kicked off.
Implement a timeout callback to handle TX queue stalls, triggering the
existing restart mechanism to recover.
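As a rough illustration of how the callback gets invoked: to our understanding of net/sched/sch_generic.c, the core netdev watchdog is only armed for devices that provide ndo_tx_timeout (defaulting watchdog_timeo to 5 seconds if the driver leaves it unset). On each pass it calls the handler for the first TX queue that is stopped and whose trans_start is older than watchdog_timeo. The sketch below is a simplified userspace model of that check; the struct, the tick values and the model_* helpers are hypothetical and not kernel API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-queue state; not the kernel's struct netdev_queue. */
struct model_txq {
	bool stopped;              /* stand-in for netif_xmit_stopped() */
	unsigned long trans_start; /* tick of the last transmit start */
	unsigned int timeouts;     /* number of timeout handler calls */
};

/* Wrap-safe "a is later than b", modelled on the kernel's time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Stand-in for the driver's ndo_tx_timeout (macb_tx_timeout() in the
 * patch, which calls macb_tx_restart()); the model only records the call.
 */
static void model_tx_timeout(struct model_txq *txq)
{
	txq->timeouts++;
}

/* One watchdog pass: invoke the handler for the first stopped queue whose
 * last transmit is more than timeo ticks old. Returns that queue's index,
 * or -1 if no queue timed out.
 */
static int model_watchdog(struct model_txq *q, size_t nq,
			  unsigned long now, unsigned long timeo)
{
	size_t i;

	for (i = 0; i < nq; i++) {
		if (!q[i].stopped)
			continue;
		if (model_time_after(now, q[i].trans_start + timeo)) {
			model_tx_timeout(&q[i]);
			return (int)i;
		}
	}
	return -1;
}
```

A queue that is stopped but still inside the timeout window is left alone, so in this model a missed TSTART is only recovered once the watchdog interval has elapsed.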
Link: https://lore.kernel.org/all/20260514215459.36109-1-lukasz@raczylo.com/
Fixes: dc110d1b23564 ("net: cadence: macb: Add support for Raspberry Pi RP1 ethernet controller")
Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
Co-developed-by: Steffen Jaeckel <sjaeckel@suse.de>
Signed-off-by: Steffen Jaeckel <sjaeckel@suse.de>
Co-developed-by: Andrea della Porta <andrea.porta@suse.com>
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
---
CHANGES IN v2:
- dropped the rate-limited log message
- avoid incrementing tx_error as this is per packet
---
drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index a12aa21244e83..fd282a1700fb9 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -4522,6 +4522,13 @@ static int macb_setup_tc(struct net_device *dev, enum tc_setup_type type,
}
}
+static void macb_tx_timeout(struct net_device *dev, unsigned int q)
+{
+ struct macb *bp = netdev_priv(dev);
+
+ macb_tx_restart(&bp->queues[q]);
+}
+
static const struct net_device_ops macb_netdev_ops = {
.ndo_open = macb_open,
.ndo_stop = macb_close,
@@ -4540,6 +4547,7 @@ static const struct net_device_ops macb_netdev_ops = {
.ndo_hwtstamp_set = macb_hwtstamp_set,
.ndo_hwtstamp_get = macb_hwtstamp_get,
.ndo_setup_tc = macb_setup_tc,
+ .ndo_tx_timeout = macb_tx_timeout,
};
/* Configure peripheral capabilities according to device tree
--
2.35.3
* Re: [PATCH bpf v2 2/2] selftests/bpf: Cover partial copy of non-linear test_run output
From: Paul Chaignon @ 2026-06-16 13:17 UTC
To: Sun Jian
Cc: bpf, netdev, linux-kselftest, linux-kernel, ast, daniel, andrii,
martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, davem,
edumazet, kuba, pabeni, horms, shuah, hawk, john.fastabend, sdf,
toke, lorenzo
In-Reply-To: <20260616093103.471444-3-sun.jian.kdev@gmail.com>
On Tue, Jun 16, 2026 at 05:31:03PM +0800, Sun Jian wrote:
> prog_run_opts already verifies that BPF_PROG_TEST_RUN returns -ENOSPC
> for a short data_out buffer while still reporting the full output size
> through data_size_out.
>
> Add the same coverage for non-linear test_run output. Use pass-through
> TC and XDP programs with a 9000-byte packet, a 64-byte linear data area,
> and a 100-byte data_out buffer. The expected output spans both the linear
> data and the first fragment.
>
> Verify that test_run returns -ENOSPC, reports the full packet length
> through data_size_out, and copies the packet prefix into data_out for
> both non-linear skb and XDP frags paths.
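The expected buffer contents can be worked out by hand: with a 64-byte linear area and a 100-byte data_out, the first 64 bytes come from the linear data and the remaining 36 from the first fragment, while data_size_out still reports 9000. A minimal userspace model of that contract follows. It assumes the non-linear part is one contiguous fragment (the kernel may spread it over several frags), model_copy_out() is a hypothetical helper, and it is not the implementation in net/bpf/test_run.c.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of the data_out contract checked by the test: copy at most out_len
 * bytes (linear part first, then the fragment), always report the full
 * packet length, and return -ENOSPC when the buffer was too small.
 */
static int model_copy_out(const uint8_t *lin, size_t lin_len,
			  const uint8_t *frag, size_t frag_len,
			  uint8_t *out, size_t out_len, uint32_t *size_out)
{
	size_t total = lin_len + frag_len;
	size_t copy = out_len < total ? out_len : total;
	size_t from_lin = copy < lin_len ? copy : lin_len;

	memcpy(out, lin, from_lin);			/* linear prefix */
	memcpy(out + from_lin, frag, copy - from_lin);	/* fragment part */
	*size_out = (uint32_t)total;

	return copy < total ? -ENOSPC : 0;
}
```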
>
> Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
> ---
> .../selftests/bpf/prog_tests/prog_run_opts.c | 72 +++++++++++++++++++
> .../selftests/bpf/progs/test_pkt_access.c | 12 ++++
> 2 files changed, 84 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c b/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
> index 01f1d1b6715a..71af1ff02023 100644
> --- a/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
> +++ b/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
> @@ -4,6 +4,10 @@
>
> #include "test_pkt_access.skel.h"
>
> +#define NONLINEAR_PKT_LEN 9000
> +#define NONLINEAR_LINEAR_DATA_LEN 64
> +#define SHORT_OUT_LEN 100
> +
> static const __u32 duration;
>
> static void check_run_cnt(int prog_fd, __u64 run_cnt)
> @@ -20,6 +24,71 @@ static void check_run_cnt(int prog_fd, __u64 run_cnt)
> "incorrect number of repetitions, want %llu have %llu\n", run_cnt, info.run_cnt);
> }
>
> +static void init_pkt(__u8 *pkt, size_t len)
> +{
> + size_t i;
> +
> + for (i = 0; i < len; i++)
> + pkt[i] = i & 0xff;
> +}
> +
> +static void test_skb_nonlinear_data_out_partial(struct test_pkt_access *skel)
> +{
> + LIBBPF_OPTS(bpf_test_run_opts, topts);
> + __u8 pkt[NONLINEAR_PKT_LEN];
> + __u8 out[SHORT_OUT_LEN];
> + struct __sk_buff skb = {};
> + int prog_fd, err;
> +
> + init_pkt(pkt, sizeof(pkt));
Can't we reuse pkt_v4 by reducing the linear area to ETH_HLEN?
> + memset(out, 0xa5, sizeof(out));
Why is this needed?
> +
> + skb.data_end = NONLINEAR_LINEAR_DATA_LEN;
> +
> + topts.data_in = pkt;
> + topts.data_size_in = sizeof(pkt);
> + topts.data_out = out;
> + topts.data_size_out = sizeof(out);
> + topts.ctx_in = &skb;
> + topts.ctx_size_in = sizeof(skb);
> +
> + prog_fd = bpf_program__fd(skel->progs.tc_pass_prog);
> + err = bpf_prog_test_run_opts(prog_fd, &topts);
> +
> + ASSERT_EQ(err, -ENOSPC, "skb_nonlinear_partial_err");
> + ASSERT_EQ(topts.data_size_out, sizeof(pkt), "skb_nonlinear_partial_data_size_out");
> + ASSERT_OK(memcmp(out, pkt, sizeof(out)), "skb_nonlinear_partial_data_out");
> +}
> +
> +static void test_xdp_nonlinear_data_out_partial(struct test_pkt_access *skel)
> +{
> + LIBBPF_OPTS(bpf_test_run_opts, topts);
> + __u8 pkt[NONLINEAR_PKT_LEN];
> + __u8 out[SHORT_OUT_LEN];
> + struct xdp_md ctx = {};
> + int prog_fd, err;
> +
> + init_pkt(pkt, sizeof(pkt));
> + memset(out, 0xa5, sizeof(out));
> +
> + ctx.data = 0;
> + ctx.data_end = NONLINEAR_LINEAR_DATA_LEN;
> +
> + topts.data_in = pkt;
> + topts.data_size_in = sizeof(pkt);
> + topts.data_out = out;
> + topts.data_size_out = sizeof(out);
> + topts.ctx_in = &ctx;
> + topts.ctx_size_in = sizeof(ctx);
> +
> + prog_fd = bpf_program__fd(skel->progs.xdp_frags_pass_prog);
> + err = bpf_prog_test_run_opts(prog_fd, &topts);
> +
> + ASSERT_EQ(err, -ENOSPC, "xdp_nonlinear_partial_err");
> + ASSERT_EQ(topts.data_size_out, sizeof(pkt), "xdp_nonlinear_partial_data_size_out");
> + ASSERT_OK(memcmp(out, pkt, sizeof(out)), "xdp_nonlinear_partial_data_out");
> +}
> +
> void test_prog_run_opts(void)
> {
> struct test_pkt_access *skel;
> @@ -69,6 +138,9 @@ void test_prog_run_opts(void)
> run_cnt += topts.repeat;
> check_run_cnt(prog_fd, run_cnt);
>
> + test_skb_nonlinear_data_out_partial(skel);
> + test_xdp_nonlinear_data_out_partial(skel);
> +
> cleanup:
> if (skel)
> test_pkt_access__destroy(skel);
> diff --git a/tools/testing/selftests/bpf/progs/test_pkt_access.c b/tools/testing/selftests/bpf/progs/test_pkt_access.c
> index bce7173152c6..cd284401eebd 100644
> --- a/tools/testing/selftests/bpf/progs/test_pkt_access.c
> +++ b/tools/testing/selftests/bpf/progs/test_pkt_access.c
> @@ -150,3 +150,15 @@ int test_pkt_access(struct __sk_buff *skb)
>
> return TC_ACT_UNSPEC;
> }
> +
> +SEC("tc")
> +int tc_pass_prog(struct __sk_buff *skb)
> +{
> + return TC_ACT_OK;
> +}
Once we're reusing pkt_v4, maybe we can also reuse the existing BPF
program?
> +
> +SEC("xdp.frags")
> +int xdp_frags_pass_prog(struct xdp_md *ctx)
> +{
> + return XDP_PASS;
> +}
> --
> 2.43.0
>
* Re: [PATCH net v3 2/2] ipv6: account for fraggap on the paged allocation path
From: Wongi Lee @ 2026-06-16 13:11 UTC
To: Jakub Kicinski
Cc: netdev, David Ahern, Ido Schimmel, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, asml.silence, dhowells, willemb,
Jungwoo Lee
In-Reply-To: <20260615133945.6e94c2d9@kernel.org>
On Mon, Jun 15, 2026 at 01:39:45PM -0700, Jakub Kicinski wrote:
> On Thu, 11 Jun 2026 22:34:13 +0900 Wongi Lee wrote:
> > copy = datalen - transhdrlen - fraggap - pagedlen;
> > - /* [!] NOTE: copy may be negative if pagedlen>0
> > - * because then the equation may reduces to -fraggap.
> > - */
> > if (copy < 0 && !(flags & MSG_SPLICE_PAGES)) {
>
> You remove the comment because copy can never be negative with
> pagedlen>0 now, can we not remove "!(flags & MSG_SPLICE_PAGES)"
> as well then?
Yes, I checked the arithmetic and I agree that the MSG_SPLICE_PAGES
exception is no longer needed after fraggap is accounted for in pagedlen.
I will remove the exception in v4 and address Ido's commit message
comment as well.
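The arithmetic can be checked in isolation. The removed comment says copy reduces to -fraggap when pagedlen > 0, which matches pagedlen = datalen - transhdrlen. If pagedlen also subtracts fraggap (an assumption here, since the hunk that sets pagedlen is not quoted in this thread), copy becomes exactly zero on the paged path, so the copy < 0 check can no longer fire there. The two helpers are hypothetical reconstructions for illustration only.

```c
/* Names follow the quoted __ip6_append_data() hunk. Only the "copy" line
 * is quoted; both pagedlen formulas are reconstructions.
 */

/* Before: pagedlen ignores fraggap, so copy reduces to -fraggap. */
static int copy_before(int datalen, int transhdrlen, int fraggap)
{
	int pagedlen = datalen - transhdrlen;

	return datalen - transhdrlen - fraggap - pagedlen;
}

/* After (assumed): fraggap is accounted for in pagedlen, so copy is 0. */
static int copy_after(int datalen, int transhdrlen, int fraggap)
{
	int pagedlen = datalen - transhdrlen - fraggap;

	return datalen - transhdrlen - fraggap - pagedlen;
}
```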