* [Stable-10.0.11 80/86] ui/sdl2: Set GL ES profile before creating initial GL context
From: Michael Tokarev @ 2026-06-24 5:02 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Ryan Zhang, Marc-André Lureau, Michael Tokarev
In-Reply-To: <qemu-stable-10.0.11-20260624080222@cover.tls.msk.ru>
From: Ryan Zhang <mailto>
When the user selects GLES via '-display sdl,gl=es', we need to set
SDL_GL_CONTEXT_PROFILE_MASK to SDL_GL_CONTEXT_PROFILE_ES before
calling SDL_GL_CreateContext(). This ensures SDL_GL_LoadLibrary()
loads the correct GLES driver instead of the desktop OpenGL driver.
This fixes the following assertion failure:
qemu-system-aarch64: /usr/src/debug/libepoxy/1.5.10/src/dispatch_common.c:872:
epoxy_get_proc_address: Assertion
`0 && "Couldn't find current GLX or EGL context.\n"' failed.
sdl2_gl_create_context() already sets the profile mask correctly for
ES mode, but the initial context created in sdl2_window_create() is
missing the same treatment.
Fixes: da3f7a3ab9ea0091955b58f8909dfcee01f4043e ("ui/sdl: try to instantiate the matching opengl renderer")
Signed-off-by: ryan.zhang@nxp.com
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <DU2PR04MB9018BB3650BA218438C01F2A83042@DU2PR04MB9018.eurprd04.prod.outlook.com>
(cherry picked from commit 490a3e1867f025c68fa13db766b5c8da16c6eca4)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/ui/sdl2.c b/ui/sdl2.c
index b00e421f7f..c798654c99 100644
--- a/ui/sdl2.c
+++ b/ui/sdl2.c
@@ -109,6 +109,8 @@ void sdl2_window_create(struct sdl2_console *scon)
if (scon->opts->gl == DISPLAY_GL_MODE_ES) {
driver = "opengles2";
+ SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK,
+ SDL_GL_CONTEXT_PROFILE_ES);
}
SDL_SetHint(SDL_HINT_RENDER_DRIVER, driver);
--
2.47.3
^ permalink raw reply related
* Re: [PATCH v5 08/13] scsi: fnic: Handle NVMe LS frames in FDLS
From: sashiko-bot @ 2026-06-24 5:13 UTC (permalink / raw)
To: Karan Tilak Kumar; +Cc: linux-scsi
In-Reply-To: <20260624044334.3079-9-kartilak@cisco.com>
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
New issues:
- [High] Using non-synchronous cancel_delayed_work() in fdls_reset_oxid_pool() allows concurrent lockless execution of the retry worker, leading to state corruption and OXID double-allocations.
- [High] Mismatched lock/unlock flags cause IRQ state corruption when acquiring fnic_lock with local flags and releasing it with the stale global fnic->lock_flags.
- [High] Using non-sync timer_delete() in nvfnic_ls_rsp_recv() causes a UAF in the timer callback.
Pre-existing issues:
- [High] Dropping a lock inside list_for_each_entry_safe() allows concurrent deletion of the 'next' pointer.
- [High] Module unload/teardown sequence fails to synchronously cancel the oxid_pool delayed works, leading to a UAF crash.
--
commit 04dceb559b15b01cb194adfb5c26b4d828e90ace
Author: Karan Tilak Kumar <kartilak@cisco.com>
scsi: fnic: Handle NVMe LS frames in FDLS
Classify NVMe LS request OXIDs, route NVMe LS responses and ABTS frames
through the FCS receive path, and reset NVMe exchanges when FDLS tears
down target ports.
> diff --git a/drivers/scsi/fnic/fdls_disc.c b/drivers/scsi/fnic/fdls_disc.c
> --- a/drivers/scsi/fnic/fdls_disc.c
> +++ b/drivers/scsi/fnic/fdls_disc.c
[ ... ]
> @@ -387,10 +387,25 @@ static bool fdls_is_oxid_tgt_req(uint16_t oxid)
>
> static void fdls_reset_oxid_pool(struct fnic_iport_s *iport)
> {
> struct fnic_oxid_pool_s *oxid_pool = &iport->oxid_pool;
> + struct reclaim_entry_s *reclaim_entry, *next;
>
> + cancel_delayed_work(&oxid_pool->oxid_reclaim_work);
> + cancel_delayed_work(&oxid_pool->schedule_oxid_free_retry);
> + list_for_each_entry_safe(reclaim_entry, next,
> + &oxid_pool->oxid_reclaim_list, links) {
> + list_del(&reclaim_entry->links);
> + kfree(reclaim_entry);
> + }
> + bitmap_clear(oxid_pool->pending_schedule_free, 0, FNIC_OXID_POOL_SZ);
> + bitmap_clear(oxid_pool->bitmap, 0, FNIC_OXID_POOL_SZ);
[Severity: High]
Can this race with the lockless retry worker? fdls_reset_oxid_pool() uses
non-synchronous cancel_delayed_work() and immediately clears the bitmaps.
If fdls_schedule_oxid_free_retry_work() is traversing
pending_schedule_free locklessly on another CPU, could it read a set bit
just before the reset clears the bitmap, allocate memory, and block on
fnic_lock?
Once the reset finishes and unlocks, the worker might inject a stale
reclaim entry into the now empty list and schedule oxid_reclaim_work. When
the reclaim work fires, could it clear a bit in oxid_pool->bitmap that may
have been reallocated to a new active I/O request, causing an OXID
collision?
[Severity: High]
This is a pre-existing issue, but does the module unload/teardown sequence
fail to synchronously cancel these oxid_pool delayed works?
If fnic_remove() initiates driver teardown and frees the fnic struct without
calling cancel_delayed_work_sync(), flush_workqueue(fnic_event_queue) won't
synchronize them because they are queued on the system workqueue. Could
fdls_reclaim_oxid_handler() execute after the fnic structure is freed,
causing a use-after-free crash?
> oxid_pool->next_idx = 0;
> }
[ ... ]
> @@ -1288,6 +1303,10 @@ bool fdls_delete_tport(struct fnic_iport_s *iport, struct fnic_tport_s *tport)
> spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags);
> fnic_rport_exch_reset(iport->fnic, tport->fcid);
> spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags);
> + } else if (IS_FNIC_NVME_INITIATOR(fnic)) {
> + spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags);
> + nvfnic_exch_reset(iport, tport);
> + spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags);
> }
[Severity: High]
This is a pre-existing issue, but does dropping fnic_lock here allow
concurrent modification of the tport_list?
When fnic_fdls_link_down() iterates over tport_list using
list_for_each_entry_safe(), it caches the next pointer. Since
fdls_delete_tport() unlocks the list briefly, could another context
process a concurrent event and delete the cached next entry?
If the cached next entry is freed while the lock is dropped, won't the
loop dereference a freed pointer when it continues?
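One common way to avoid a stale cached 'next' pointer is to unlink the entries of interest onto a private list while the lock is held, and only then do the work that needs the lock dropped. The following is a minimal user-space sketch of that idea, not the fnic code: struct port, model_lock() and model_unlock() are hypothetical stand-ins, and the lock is only modeled by a counter.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a target port; not the fnic structure. */
struct port {
    int id;
    int registered;
    struct port *next;
};

/* Counter standing in for spin_lock_irqsave()/spin_unlock_irqrestore(). */
static int lock_depth;
static void model_lock(void)   { lock_depth++; }
static void model_unlock(void) { lock_depth--; }

/* Allocate one list node; aborts on allocation failure. */
static struct port *port_new(int id, int registered, struct port *next)
{
    struct port *p = malloc(sizeof(*p));

    if (!p)
        abort();
    p->id = id;
    p->registered = registered;
    p->next = next;
    return p;
}

/*
 * Unlink every registered port from *head onto a private list while the
 * lock is held. Returns the private list (in reversed order).
 */
static struct port *detach_registered(struct port **head)
{
    struct port *detached = NULL;
    struct port **pp = head;

    model_lock();
    while (*pp) {
        struct port *p = *pp;

        if (p->registered) {
            *pp = p->next;        /* unlink from the shared list */
            p->next = detached;   /* push onto the private list */
            detached = p;
        } else {
            pp = &p->next;
        }
    }
    model_unlock();
    return detached;
}

/*
 * Process the private list with the lock dropped. No other context can
 * reach these entries any more, so no cached pointer can go stale.
 * Returns the number of entries processed and freed.
 */
static int process_detached(struct port *detached)
{
    int count = 0;

    while (detached) {
        struct port *next = detached->next;

        /* ... blocking per-port teardown would go here ... */
        free(detached);
        detached = next;
        count++;
    }
    return count;
}
```

Whether this shape fits the driver depends on who else looks up tports by walking tport_list during teardown; that is not visible in the quoted hunk.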
[Severity: High]
Does this sequence corrupt the hardware IRQ masking state?
When callers like fdls_tport_timer_callback() or fnic_tport_event_handler()
acquire the fnic_lock using a local flags variable:
    spin_lock_irqsave(&fnic->fnic_lock, flags);
and then invoke helpers like fdls_delete_tport() or
nvfnic_process_ls_abts_rsp() which temporarily drop and reacquire the lock
using the struct member fnic->lock_flags:
    spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags);
Because fnic->lock_flags was never initialized by the original caller, does
it restore an arbitrary stale IRQ state? Reacquiring the lock then overwrites
fnic->lock_flags, causing the caller to unlock with the original local flags
and severely corrupt the IRQ state.
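The concern can be illustrated with a small user-space model. This is only a sketch under stated assumptions: irq_enabled models the CPU interrupt-enable bit, struct dev and the model_* and helper_* functions are hypothetical, and real irqsave flags are an opaque architecture-specific value rather than 0 or 1.

```c
#include <stdbool.h>

/* Model of the CPU interrupt-enable bit; true means interrupts enabled. */
static bool irq_enabled = true;

/* Model of a driver object with a shared flags field (like fnic->lock_flags). */
struct dev {
    unsigned long lock_flags;
};

/* Model of spin_lock_irqsave(): remember the current state, then disable. */
static void model_lock_irqsave(unsigned long *flags)
{
    *flags = irq_enabled ? 1UL : 0UL;
    irq_enabled = false;
}

/* Model of spin_unlock_irqrestore(): restore whatever was saved. */
static void model_unlock_irqrestore(unsigned long flags)
{
    irq_enabled = (flags != 0);
}

/*
 * Helper that drops and retakes the lock through the shared field.
 * Returns the interrupt state observed while the lock was dropped.
 */
static bool helper_shared(struct dev *d)
{
    bool seen;

    model_unlock_irqrestore(d->lock_flags);   /* possibly stale value */
    seen = irq_enabled;
    model_lock_irqsave(&d->lock_flags);
    return seen;
}

/*
 * Helper that drops and retakes the lock through the caller's own flags.
 * Returns the interrupt state observed while the lock was dropped.
 */
static bool helper_passed(unsigned long *flags)
{
    bool seen;

    model_unlock_irqrestore(*flags);
    seen = irq_enabled;
    model_lock_irqsave(flags);
    return seen;
}
```

In this model, a caller that entered with interrupts disabled and a stale lock_flags of 1 sees interrupts enabled inside helper_shared(), while helper_passed() keeps them disabled. Passing the flags down to the helper is one possible shape for a fix; the patch itself does not say which approach the driver would take.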
[ ... ]
> diff --git a/drivers/scsi/fnic/fnic_fcs.c b/drivers/scsi/fnic/fnic_fcs.c
> --- a/drivers/scsi/fnic/fnic_fcs.c
> +++ b/drivers/scsi/fnic/fnic_fcs.c
[ ... ]
> @@ -313,8 +319,14 @@ void fnic_handle_frame(struct work_struct *work)
> fchdr_offset = (cur_frame->rx_ethhdr_stripped) ?
> 0 : FNIC_ETH_FCOE_HDRS_OFFSET;
>
> - fnic_fdls_recv_frame(&fnic->iport, cur_frame->fp,
> - cur_frame->frame_len, fchdr_offset);
> + fchdr = (struct fc_frame_header *)((u8 *)cur_frame->fp + fchdr_offset);
> + if (IS_FNIC_NVME_INITIATOR(fnic) && fnic_is_nvme_frame(fchdr)) {
> + nvfnic_ls_rsp_recv(&fnic->iport, fchdr,
> + cur_frame->frame_len - fchdr_offset);
[Severity: High]
Does calling nvfnic_ls_rsp_recv() here introduce a use-after-free risk
in its timer callback?
Inside nvfnic_ls_rsp_recv(), it executes with fnic_lock held and invokes
timer_delete(&nvfnic_ls_req->ls_req_timer) without synchronization. If the
timer callback nvfnic_ls_req_timeout() is concurrently executing on another
CPU, it will spin-wait on fnic_lock.
nvfnic_ls_rsp_recv() then drops the lock and calls lsreq->done(), which
commonly causes the transport layer to free the request and its private
payload. Once the lock is dropped, will the spinning timer callback unblock,
acquire the lock, and inspect the now-freed nvfnic_ls_req memory?
> + } else {
> + fnic_fdls_recv_frame(&fnic->iport, cur_frame->fp,
> + cur_frame->frame_len, fchdr_offset);
> + }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260624044334.3079-1-kartilak@cisco.com?part=8
^ permalink raw reply
* Re: [PATCH v5 06/13] scsi: fnic: Add the NVMe/FC transport path
From: sashiko-bot @ 2026-06-24 5:12 UTC (permalink / raw)
To: Karan Tilak Kumar; +Cc: linux-scsi
In-Reply-To: <20260624044334.3079-7-kartilak@cisco.com>
Thank you for your contribution! Sashiko AI review found 11 potential issue(s) to consider:
New issues:
- [Critical] Premature freeing of I/O tags during ABTS completion leads to tag reuse and data corruption.
- [High] Non-sync timer cancel in nvfnic_ls_rsp_recv leads to Use-After-Free of lsreq and fnic structures.
- [High] Asynchronous deletion timeouts cause Use-After-Free of tport and lport objects.
- [High] Missing cancellation of nvme_io_cmpl_work allows a work item to execute after the fnic structure is freed.
- [High] Leaked LS requests and active timers when remote port unregistration fails.
- [High] A failed abort request combined with a concurrent completion causes an I/O to hang forever.
- [High] Concurrent I/O completion causes Use-After-Free of fcp_req when _terminate_tport_ios drops the lock.
- [High] Accessing lsreq->private after dropping the lock in nvfnic_ls_req_abort causes Use-After-Free.
- [Medium] Uninitialized sgl_mapped state leads to unbalanced dma_unmap_single calls for zero-length NVMe requests.
- [Medium] Permanent reference leak of the PCI device (pdev->dev) during localport registration.
Pre-existing issues:
- [High] Dropping the spinlock inside list_for_each_entry_safe() permits concurrent list corruption.
--
commit 78a3171c3dafe1bc2912707cce47ec94e43860f3
Author: Karan Tilak Kumar <kartilak@cisco.com>
scsi: fnic: Add the NVMe/FC transport path
This patch introduces the NVMe/FC transport path for the fnic driver.
It includes local and remote port registration, I/O submission, tag
management, abort handling, and cleanup logic.
Closes: https://lore.kernel.org/oe-kbuild-all/202605280430.wTYAqI3A-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202605280619.pmobiDWp-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202605280519.Jd4fmgAZ-lkp@intel.com/
> diff --git a/drivers/scsi/fnic/fnic_main.c b/drivers/scsi/fnic/fnic_main.c
> index 0d7828be244da..df8c8ebbc32b9 100644
> --- a/drivers/scsi/fnic/fnic_main.c
> +++ b/drivers/scsi/fnic/fnic_main.c
[ ... ]
> @@ -1215,6 +1286,11 @@ static void fnic_remove(struct pci_dev *pdev)
> */
> fnic_cleanup(fnic);
>
> + if (IS_FNIC_NVME_INITIATOR(fnic)) {
> + sbitmap_free(&fnic->nvfnic_tag_map);
> + fnic_free_ioreq_tables_mq(fnic);
> + }
> +
> spin_lock_irqsave(&fnic_list_lock, flags);
> list_del(&fnic->list);
> spin_unlock_irqrestore(&fnic_list_lock, flags);
[Severity: High]
Does this code leave work items running after the fnic structure is freed?
Looking at fnic_remove(), we call fnic_cleanup(fnic) which drains the hardware
completion queues, potentially finding completions and queueing work items via
queue_work(fnic_cmpl_queue, &fnic->nvme_io_cmpl_work).
Since there is no cancel_work_sync() before kfree(fnic), could a re-queued
work item execute after the memory is freed and cause a use-after-free crash?
> diff --git a/drivers/scsi/fnic/fnic_nvme.c b/drivers/scsi/fnic/fnic_nvme.c
> new file mode 100644
> index 0000000000000..8374464e4fcc8
> --- /dev/null
> +++ b/drivers/scsi/fnic/fnic_nvme.c
[ ... ]
> +void nvfnic_dma_unmap_sgl(struct fnic *fnic, struct fnic_io_req *io_req)
> +{
> + if (io_req->sgl_mapped) {
> + dma_unmap_single(&fnic->pdev->dev, io_req->sgl_list_pa,
> + sizeof(io_req->sgl_list[0]) * io_req->sgl_cnt,
> + DMA_TO_DEVICE);
> + io_req->sgl_mapped = 0;
> + io_req->sgl_list_pa = 0;
> + }
> +}
[Severity: Medium]
Could this lead to an unbalanced DMA unmap?
In nvfnic_fcpio_send(), io_req->sgl_mapped is not explicitly zeroed during
initialization:
    nvfnic_fcpio_send() {
        ...
        io_req->sgl_list_pa = 0;
        io_req->wq = hw_queue_handle;
        ...
    }
If a previous I/O set sgl_mapped to 1, and the current I/O has zero length,
the mapping step is skipped but sgl_mapped might still be 1. Will this cause
nvfnic_dma_unmap_sgl() to unmap a 0 address with 0 length?
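A reduced model of the state handling in question, assuming the request memory can be recycled with old field values still present. The struct and the req_* helpers are hypothetical; only the field names are taken from the quoted patch, and unmap_calls stands in for dma_unmap_single().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced request; field names mirror the quoted patch. */
struct io_req_model {
    bool sgl_mapped;
    unsigned long sgl_list_pa;
    size_t sgl_cnt;
};

/* Counts how often the model would have called dma_unmap_single(). */
static int unmap_calls;

/* Per-I/O initialization that also resets the mapped flag. */
static void req_init(struct io_req_model *r)
{
    r->sgl_mapped = false;   /* the reset the review asks about */
    r->sgl_list_pa = 0;
    r->sgl_cnt = 0;
}

/* Map the SGL list; zero-length requests skip the mapping entirely. */
static void req_map(struct io_req_model *r, size_t sgl_cnt)
{
    if (sgl_cnt == 0)
        return;
    r->sgl_list_pa = 0x1000;   /* placeholder for a DMA address */
    r->sgl_cnt = sgl_cnt;
    r->sgl_mapped = true;
}

/* Unmap only what this I/O actually mapped. */
static void req_unmap(struct io_req_model *r)
{
    if (r->sgl_mapped) {
        unmap_calls++;
        r->sgl_mapped = false;
        r->sgl_list_pa = 0;
    }
}
```

With the flag cleared in req_init(), a zero-length request on recycled memory performs no unmap, and a mapped request is unmapped exactly once even if req_unmap() runs twice.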
[ ... ]
> +void nvfnic_fcpio_nvme_fast_cmpl_handler(struct fnic *fnic,
> + struct fcpio_fw_req *desc)
> +{
> + u8 type;
> + u8 hdr_status;
> + struct fcpio_tag ftag;
> + u32 id;
> + struct fnic_io_req *io_req;
[ ... ]
> + fcpio_header_dec(&desc->hdr, &type, &hdr_status, &ftag);
> + fcpio_tag_id_dec(&ftag, &id);
> + tag = id & FNIC_TAG_MASK;
> +
> + if (tag >= fnic->fnic_max_tag_id) {
> + FNIC_NVME_DBG(KERN_ERR, fnic,
> + "Tag out of range tag: 0x%x hdr status: %s\n", tag,
> + fnic_fcpio_status_to_str(hdr_status));
> + return;
> + }
> + spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags);
> +
> + io_req = nvfnic_find_io_req_by_tag(fnic, tag);
> +
> + WARN_ON_ONCE(!io_req);
> + if (!io_req) {
[ ... ]
> + /* firmware completed the io */
> + io_req->io_completed = 1;
> + if (io_req->cmd_state == FNIC_IOREQ_ABTS_PENDING) {
> + /*
> + * set the FNIC_IO_DONE so that this doesn't get
> + * flagged as 'out of order' if it was not aborted
> + */
> + io_req->cmd_flags |= FNIC_IO_DONE;
> + io_req->cmd_flags |= FNIC_IO_ABTS_PENDING;
> + if (hdr_status == FCPIO_ABORTED)
> + io_req->cmd_flags |= FNIC_IO_ABORTED;
> + spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags);
> +
> + FNIC_NVME_DBG(KERN_INFO, fnic,
> + "icmnd abts hdr:%d %s tag:0x%x io:%p",
> + hdr_status, fnic_fcpio_status_to_str(hdr_status),
> + id, io_req);
> + return;
> + }
[Severity: High]
Could an I/O hang indefinitely here if the abort request fails to queue?
When aborting an I/O, nvfnic_fcpio_abort() sets the state to
FNIC_IOREQ_ABTS_PENDING and drops the lock. If a hardware completion arrives
during this window, nvfnic_fcpio_nvme_fast_cmpl_handler() observes
ABTS_PENDING and intentionally drops the completion.
If the subsequent nvfnic_queue_abort_io_req() call fails, the state is
reverted back to FNIC_IOREQ_CMD_PENDING. Because the completion was
permanently discarded and the hardware was not told to abort, does this
leave the I/O hanging forever?
[ ... ]
> +void nvfnic_fcpio_nvme_itmf_cmpl_handler(struct fnic *fnic,
> + struct fcpio_fw_req *desc)
> +{
> + u8 type;
> + u8 hdr_status;
[ ... ]
> + /* If the status is IO not found consider it as success.
> + * NVME sends abort even if rport is down in which case
> + * we will get FCPIO_TIMEOUT. Consider this as success.
> + */
> + if ((hdr_status == FCPIO_IO_NOT_FOUND) ||
> + (hdr_status == FCPIO_TIMEOUT) ||
> + (hdr_status == FCPIO_ITMF_REJECTED))
> + io_req->abts_state = FCPIO_SUCCESS;
> +
> + io_req->cmd_flags |= FNIC_IO_ABT_TERM_DONE;
> +
> +
> + io_req->fcp_req->transferred_length = 0;
> + io_req->fcp_req->rcv_rsplen = 0;
> + if (io_req->abts_state == FCPIO_SUCCESS)
> + io_req->fcp_req->status = NVME_SC_ABORT_REQ;
> + else
> + io_req->fcp_req->status = NVME_SC_INTERNAL;
> +
> + nvfnic_release_nvme_ioreq_buf(iport, io_req);
> + if (io_req->done)
> + io_req->done(io_req);
> + spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags);
> +}
[Severity: Critical]
Does this allow a tag to be reused before the hardware has actually completed
the original request, potentially corrupting a new I/O?
When an ITMF (abort) completion times out (FCPIO_TIMEOUT), the tag is
forcefully freed via io_req->done() and can be reallocated to a new request.
Since the driver matches completions only by the tag without a generation
count, if the late completion for the original I/O arrives, will
nvfnic_fcpio_nvme_fast_cmpl_handler() match it to the new request and
prematurely complete it with stale data?
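For illustration, a generation count can be folded into the identifier handed to the hardware so that a late completion for a recycled tag no longer matches. This is a sketch only: the cookie layout, MODEL_TAGS and all function names are hypothetical, and it assumes the firmware echoes back the full 32-bit value, which the quoted code does not confirm.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TAGS 4u   /* hypothetical pool size */

struct slot {
    bool in_use;
    uint16_t gen;       /* bumped on every allocation of this tag */
};

static struct slot slots[MODEL_TAGS];

/* Pack generation (high 16 bits) and tag index (low 16 bits). */
static uint32_t cookie_make(uint16_t gen, uint16_t tag)
{
    return ((uint32_t)gen << 16) | tag;
}

/*
 * Mark a tag as allocated and return the cookie for this use of it.
 * The caller guarantees tag < MODEL_TAGS.
 */
static uint32_t tag_alloc(uint16_t tag)
{
    slots[tag].in_use = true;
    slots[tag].gen++;
    return cookie_make(slots[tag].gen, tag);
}

/* Release a tag; the generation is kept so old cookies stay invalid. */
static void tag_free(uint16_t tag)
{
    slots[tag].in_use = false;
}

/* Accept a completion only if the tag is live and the generation matches. */
static bool completion_matches(uint32_t cookie)
{
    uint16_t tag = (uint16_t)(cookie & 0xffffu);
    uint16_t gen = (uint16_t)(cookie >> 16);

    if (tag >= MODEL_TAGS)
        return false;
    return slots[tag].in_use && slots[tag].gen == gen;
}
```

A 16-bit generation wraps after 65536 reuses of the same tag, so this narrows the window rather than closing it; the alternative is to keep the tag reserved until the hardware has definitely finished with it.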
[ ... ]
> +bool _terminate_tport_ios(struct sbitmap *map, unsigned int tag,
> + void *data)
> +{
> + struct fnic_tport_s *tport = data;
> + struct fnic_iport_s *iport = tport->iport;
> + struct fnic *fnic = iport->fnic;
> + struct fnic_io_req *io_req;
> + struct nvmefc_fcp_req *fcp_req = NULL;
> + struct nvme_fc_local_port *lport = iport->nv_lport;
> + struct nvme_fc_remote_port *rport = tport->nv_rport;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&fnic->fnic_lock, flags);
> + io_req = fnic->sw_copy_wq[0].io_req_table[tag];
> + if (!io_req) {
> + spin_unlock_irqrestore(&fnic->fnic_lock, flags);
> + return true;
> + }
> +
> + if (io_req->tport != tport) {
> + spin_unlock_irqrestore(&fnic->fnic_lock, flags);
> + return true;
> + }
> +
> + FNIC_NVME_DBG(KERN_INFO, fnic,
> + "Terminate tag: 0x%x (tport fcid 0x%x)\n",
> + io_req->tag, io_req->tport->fcid);
> + fcp_req = io_req->fcp_req;
> + spin_unlock_irqrestore(&fnic->fnic_lock, flags);
> +
> + nvfnic_fcpio_abort(lport, rport, NULL, fcp_req);
> + return true;
> +}
[Severity: High]
Can fcp_req be accessed after it has been freed?
Here, fcp_req is extracted from io_req under the lock. The lock is then dropped
without modifying io_req->cmd_state. During this unlocked window, a concurrent
hardware completion could run, process the I/O, invoke io_req->done(), and
return the request to the midlayer, which might free or reuse fcp_req.
When nvfnic_fcpio_abort() is subsequently called, could it dereference the
now-freed fcp_req structure?
[ ... ]
> +void nvfnic_nvme_zero_devloss_tports(struct fnic *fnic)
> +{
> + struct fnic_tport_s *tport, *next;
> +
> + spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags);
> + list_for_each_entry_safe(tport, next, &fnic->iport.tport_list, links) {
> + if (tport->flags & FNIC_FDLS_NVME_REGISTERED) {
> + spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags);
> + nvme_fc_set_remoteport_devloss(tport->nv_rport, 0);
> + spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags);
> + }
> + }
> + spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags);
> +}
[Severity: High]
This is a pre-existing issue, but dropping the lock inside
list_for_each_entry_safe() allows concurrent list corruption.
list_for_each_entry_safe() protects against deletion of the *current* entry
by caching the 'next' pointer. However, while the fnic_lock is dropped,
concurrent threads can remove and free the cached 'next' entry from the list.
When the loop resumes and accesses 'next', does this result in a
use-after-free? This pattern also exists in fnic_delete_fcp_tports().
[ ... ]
> +void nvfnic_ls_rsp_recv(struct fnic_iport_s *iport,
> + struct fc_frame_header *fchdr, int len)
> +{
> + uint8_t *fcid;
[ ... ]
> + nvfnic_ls_req->state = FNIC_LS_REQ_CMD_COMPLETE;
> +
> + list_del_init(&nvfnic_ls_req->list);
> + lsreq->private = NULL;
> + fdls_free_oxid(iport, oxid, &nvfnic_ls_req->oxid);
> + timer_delete(&nvfnic_ls_req->ls_req_timer);
> +
> + if (status == 0) {
> + FNIC_NVME_DBG(KERN_DEBUG, fnic,
> + "tport:0x%x lsreq:0x%x completed\n",
> + tport_fcid, oxid);
> +
> + /* Copy the Response */
> + memcpy(lsreq->rspaddr, (uint8_t *)fchdr + sizeof(*fchdr),
> + rsp_len);
> + }
> +
> + spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags);
> + lsreq->done(lsreq, status);
> + spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags);
> +}
[Severity: High]
Could the timer callback race with lsreq->done() and cause a use-after-free?
Because timer_delete() is used instead of timer_delete_sync(), the timer
callback might be concurrently executing on another CPU spinning on fnic_lock.
When the lock is dropped here, lsreq->done() is called, returning ownership
to the NVMe FC midlayer which frees lsreq.
Once the lock is released, won't the spinning timer callback wake up and
dereference the freed lsreq memory?
[ ... ]
> +void nvfnic_ls_req_abort(struct nvme_fc_local_port *lport,
> + struct nvme_fc_remote_port *rport,
> + struct nvmefc_ls_req *lsreq)
> +{
> + struct fnic_iport_s *iport = lport->private;
> + struct fnic *fnic = iport->fnic;
> + struct fnic_tport_s *tport;
> + struct nvfnic_ls_req *nvfnic_ls_req;
> + uint16_t oxid;
> + int timeout;
> +
> + spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags);
[ ... ]
> + nvfnic_ls_req->state = FNIC_LS_REQ_CMD_ABTS_STARTED;
> + spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags);
> + timer_delete_sync(&nvfnic_ls_req->ls_req_timer);
> +
> + spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags);
> + nvfnic_ls_req = lsreq->private;
> +
> + if ((nvfnic_ls_req == NULL) ||
> + (nvfnic_ls_req->state == FNIC_LS_REQ_CMD_ABTS_PENDING)) {
[Severity: High]
Does this dereference lsreq->private after the lock is reacquired, when lsreq
might already be freed?
The fnic_lock is dropped to call timer_delete_sync(). During this window,
nvfnic_terminate_tport_ls_reqs() could process the request and invoke
lsreq->done(), returning it to the midlayer which then frees it.
When the lock is reacquired, nvfnic_ls_req = lsreq->private accesses the freed
lsreq memory. Can this result in a use-after-free crash?
[ ... ]
> +void nvfnic_delete_tport(struct fnic_iport_s *iport,
> + struct fnic_tport_s *tport,
> + unsigned long flags)
> +{
> + struct fnic *fnic = iport->fnic;
> + int ret;
> + unsigned int time_wait = FNIC_NVME_TPORT_REMOVE_WAIT;
> + unsigned int time_remain;
> + DECLARE_COMPLETION_ONSTACK(tm_done);
> + unsigned int fcid;
> + int count = 0;
> +
> + if (!tport)
> + return;
> +
> + fcid = tport->fcid;
> + fdls_set_tport_state(tport, FDLS_TGT_STATE_OFFLINE);
> +
> + FNIC_NVME_DBG(KERN_DEBUG, fnic,
> + "0x%x: scheduled deletion for tport: 0x%x\n",
> + iport->fcid, tport->fcid);
> +
> + if (!(tport->flags & FNIC_FDLS_NVME_REGISTERED)) {
> + FNIC_NVME_DBG(KERN_ERR, fnic,
> + "0x%x: tport: 0x%x not registered. Freeing\n",
> + iport->fcid, tport->fcid);
> + list_del(&tport->links);
> + kfree(tport);
> + return;
> + }
> +
> + tport->tport_del_done = &tm_done;
> +
> + tport->flags |= FNIC_FDLS_TPORT_DELETED;
> + spin_unlock_irqrestore(&fnic->fnic_lock, flags);
> + ret = nvme_fc_unregister_remoteport(tport->nv_rport);
> + if (ret) {
> + FNIC_NVME_DBG(KERN_ERR, fnic,
> + "tport: 0x%x unregister failed %d\n",
> + tport->fcid, ret);
> + spin_lock_irqsave(&fnic->fnic_lock, flags);
> + tport->tport_del_done = NULL;
> + list_del(&tport->links);
> + kfree(tport);
> + return;
> + }
[Severity: High]
Does this leak pending LS requests and leave their timers active?
If nvme_fc_unregister_remoteport() fails, kfree(tport) is called directly.
This bypasses nvfnic_terminate_tport_ls_reqs(), which leaks all nvfnic_ls_req
structures left on tport->ls_req_list.
Since their timers are never stopped, won't nvfnic_ls_req_timeout()
eventually fire and access the freed tport structure?
> + time_remain = wait_for_completion_timeout(tport->tport_del_done,
> + msecs_to_jiffies(time_wait));
> +
> + FNIC_NVME_DBG(KERN_DEBUG, fnic,
> + "tport: 0x%x wait for deletion done\n",
> + tport->fcid);
> +
> + spin_lock_irqsave(&fnic->fnic_lock, flags);
> + tport->tport_del_done = NULL;
> +
> + if (!time_remain) {
> + FNIC_NVME_DBG(KERN_ERR, fnic,
> + "tport: 0x%x nvme midlayer completion timed out\n",
> + tport->fcid);
> +
> + spin_unlock_irqrestore(&fnic->fnic_lock, flags);
> + nvfnic_cleanup_tport_io(fnic, tport);
> + spin_lock_irqsave(&fnic->fnic_lock, flags);
> + tport->flags &= ~FNIC_FDLS_NVME_TPORT_CLEANUP_PENDING;
> + list_del(&tport->links);
> + kfree(tport);
[Severity: High]
If this timeout expires, will the NVMe FC midlayer still call the
remoteport_delete callback and cause a use-after-free?
When the timeout expires, the driver forcefully calls kfree(tport). However,
the NVMe FC midlayer retains pointers to these structures and will eventually
execute the asynchronous remoteport_delete callback.
When nvfnic_remote_port_delete() is called later, it dereferences
rport->private. Since tport was already freed here, does this lead to a
use-after-free crash?
[ ... ]
> +int nvfnic_add_lport(struct fnic *fnic)
> +{
> + struct nvme_fc_port_info pinfo;
> + struct fnic_iport_s *iport = &fnic->iport;
> + int ret = 0;
[ ... ]
> + ret = nvme_fc_register_localport(&pinfo, iport->nv_tmpl,
> + get_device(&fnic->pdev->dev),
> + &iport->nv_lport);
> + if (ret) {
> + FNIC_NVME_DBG(KERN_ERR, fnic,
> + "Failed to add wwpn: 0x%llx ret: %d\n",
> + iport->wwpn, ret);
> + kfree(iport->nv_tmpl);
> + return ret;
> + }
[Severity: Medium]
Does this create a permanent reference leak of the PCI device?
The get_device(&fnic->pdev->dev) call takes a reference that is passed to
the NVMe FC midlayer, which internally takes its own reference when registering
the localport and puts it upon unregistration.
Because this driver-level get_device() is never balanced with a put_device()
call by fnic, will this leak prevent PCI device teardown or hot-unplug?
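The reference accounting described above can be modeled with a plain counter. This sketch takes the review's premise as given, namely that the midlayer takes its own reference on successful registration and drops it on unregistration; struct dev_model and every function here are hypothetical stand-ins, not the driver-core or NVMe-FC API.

```c
/* Model of a reference-counted device; not struct device. */
struct dev_model {
    int refs;
};

/* Model of get_device(): take a reference and return the device. */
static struct dev_model *model_get(struct dev_model *d)
{
    d->refs++;
    return d;
}

/* Model of put_device(): drop a reference. */
static void model_put(struct dev_model *d)
{
    d->refs--;
}

/* Model of a midlayer registration that takes its own reference on success. */
static int model_register(struct dev_model *d, int fail)
{
    if (fail)
        return -1;
    model_get(d);
    return 0;
}

/* Model of unregistration: drops only the midlayer's reference. */
static void model_unregister(struct dev_model *d)
{
    model_put(d);
}

/* Pattern from the quoted hunk: an extra get that nothing ever puts. */
static int add_port_leaky(struct dev_model *d, int fail)
{
    return model_register(model_get(d), fail);
}

/* Balanced variant: pass the device and let the midlayer manage its own ref. */
static int add_port_balanced(struct dev_model *d, int fail)
{
    return model_register(d, fail);
}
```

In the model, the leaky variant leaves the count one higher after a register/unregister cycle, and also after a failed registration.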
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260624044334.3079-1-kartilak@cisco.com?part=6
^ permalink raw reply
* [Stable-10.2.4 93/95] ppc/pnv_phb3: Error out on invalid config access
From: Michael Tokarev @ 2026-06-24 5:05 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Aditya Gupta, Zexiang Zhang, Michael S. Tsirkin,
Michael Tokarev
In-Reply-To: <qemu-stable-10.2.4-20260624080454@cover.tls.msk.ru>
From: Aditya Gupta <adityag@linux.ibm.com>
The PHB in Power8 supports 8-byte registers, so the ops structure
'pnv_phb3_reg_ops' allows 8-byte accesses.
Both 'pnv_phb3_reg_read' & 'pnv_phb3_reg_write' pass their arguments as is
to 'pnv_phb3_config_{read,write}' if the offset is PHB_CONFIG_DATA.
When called with a size of 8, this triggers the following assertion failure in
'pci_host_config_read_common' & 'pci_host_config_write_common':
assert(len <= 4);
Validate that size is <= 4 before reaching pci_host_config_{read,write}_common.
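The shape of the check can be shown with a stand-alone model. This is a sketch, not the QEMU code: model_config_read_common() and model_reg_read() are hypothetical stand-ins, and the only behavior taken from the patch is the size limit and the all-ones return value for a rejected read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit config register content. */
static uint32_t model_cfg_space = 0x12345678u;

/*
 * Stand-in for the common PCI host config read. Like the real helper
 * named in the commit message, it asserts that len <= 4.
 */
static uint64_t model_config_read_common(unsigned len)
{
    assert(len <= 4);
    if (len == 4) {
        return model_cfg_space;
    }
    /* Return only the low len bytes. */
    return model_cfg_space & ((1u << (len * 8)) - 1u);
}

/*
 * Register read handler: reject oversized config accesses before they
 * can reach the assertion, returning all-ones as the patch does.
 */
static uint64_t model_reg_read(unsigned size)
{
    if (size > 4) {
        return ~0ull;
    }
    return model_config_read_common(size);
}
```

An 8-byte read through model_reg_read() returns all-ones instead of aborting, while 1, 2 and 4 byte reads go through unchanged.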
Resolves: https://gitlab.com/qemu-project/qemu/-/work_items/3334
Reported-by: Zexiang Zhang <chan9yan9@gmail.com>
Fixes: 9ae1329ee2fe ("ppc/pnv: Add models for POWER8 PHB3 PCIe Host bridge")
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260326190438.734239-2-adityag@linux.ibm.com>
(cherry picked from commit 218109781209f9d77242b2cdf743acac8bc3b893)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/hw/pci-host/pnv_phb3.c b/hw/pci-host/pnv_phb3.c
index 5d8383fac3..6cd66e705f 100644
--- a/hw/pci-host/pnv_phb3.c
+++ b/hw/pci-host/pnv_phb3.c
@@ -475,6 +475,11 @@ void pnv_phb3_reg_write(void *opaque, hwaddr off, uint64_t val, unsigned size)
/* Special case configuration data */
if ((off & 0xfffc) == PHB_CONFIG_DATA) {
+ if (size > 4) {
+ phb3_error(phb, "Invalid config access, offset: 0x%"PRIx64" size: %d",
+ off, size);
+ return;
+ }
pnv_phb3_config_write(phb, off & 0x3, size, val);
return;
}
@@ -597,6 +602,11 @@ uint64_t pnv_phb3_reg_read(void *opaque, hwaddr off, unsigned size)
uint64_t val;
if ((off & 0xfffc) == PHB_CONFIG_DATA) {
+ if (size > 4) {
+ phb3_error(phb, "Invalid config access, offset: 0x%"PRIx64" size: %d",
+ off, size);
+ return ~0ull;
+ }
return pnv_phb3_config_read(phb, off & 0x3, size);
}
--
2.47.3
^ permalink raw reply related
* Re: [PATCH v3] x86/pci-dma: add a SWIOTLB_ANY flag to lift the low mem limitation
From: Borislav Petkov @ 2026-06-24 5:11 UTC (permalink / raw)
To: Miao, Jun
Cc: tglx@kernel.org, mingo@redhat.com, dave.hansen@linux.intel.com,
m.szyprowski@samsung.com, robin.murphy@arm.com, x86@kernel.org,
linux-kernel@vger.kernel.org, aakarsh.jain@oss.qualcomm.com,
Edgecombe, Rick P, Tom Lendacky, Michael Roth, Nikunj A Dadhania
In-Reply-To: <PH7PR11MB8455779DDDCF779C19F2412E9AED2@PH7PR11MB8455.namprd11.prod.outlook.com>
On Wed, Jun 24, 2026 at 03:23:39AM +0000, Miao, Jun wrote:
> >On Wed, Jun 24, 2026 at 01:53:39AM +0000, Miao, Jun wrote:
> >> Good suggestion and thank you for testing the ADM SEV-SNP.
> >
> >I don't think anyone tested it on SNP yet.
>
> I only know that SEV-SNP is a more advanced, third-generation feature.
> Are we using "SEV" here as a shorthand for these?
You have SEV, SEV-ES and SEV-SNP in the order of their appearance and in the
order in which they have gained additional features. SNP is the one which has
addressed most, if not all, confidential VM attack vectors. And that's the
one I care about as the other two are just the prerequisites to the SNP thing.
In my opinion only anyway.
> When using confidential VMs, users want to pass through both the high-speed
> network interface card (NIC) or an 8-GPUs setup into the CVMs. During data
> transfer, the SWIOTLB bounce buffer becomes a critical "hot path" acting as
> an intermediary for convertor between private and shared memory.
> consequently, the capacity requirement increases—otherwise, network or data
> transfer performance would be adversely affected.
Yes, that makes more sense. Pls add it to the commit message.
> What I mean to convey is that in TEE environments based on AMD SEV or Intel TDX,
> the core issue is the lack of trust in the hypervisor's VMM.
Then say it this way. Trusted hypervisor sounds like we trust the HV. Which we
absolutely do not.
> When using confidential VMs, users want to pass through both the high-speed
> network interface card (NIC) or an 8-GPUs setup into the CVMs. During data
> transfer, the SWIOTLB bounce buffer becomes a critical "hot path" acting as
> an intermediary for convertor between private and shared memory.
> consequently, the capacity requirement increases—otherwise, network or data
> transfer performance would be adversely affected
>
> Confidential VMs include AMD SEV and Intel TDX guests want to allocate
> a swiotlb buffer that is not restricted to low memory in TEE.
Sounds better, yes.
Except that we still need to test it on SNP.
Adding some folks on Cc who can do that and take a look at your patch and vet
it for SNP - my guest is still broken. :-\
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply
* [Stable-10.0.11 82/86] linux-user/xtensa: save/restore FP registers across signal delivery
From: Michael Tokarev @ 2026-06-24 5:02 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Matt Turner, Richard Henderson, Helge Deller,
Michael Tokarev
In-Reply-To: <qemu-stable-10.0.11-20260624080222@cover.tls.msk.ru>
From: Matt Turner <mattst88@gmail.com>
Add support for saving and restoring f0-f15 across signal delivery.
The target_xtensa_xtregs_fp struct carries 32-bit f-regs for cores
with XTENSA_OPTION_FP_COPROCESSOR; target_xtensa_xtregs_dfp carries
64-bit f-regs for cores with XTENSA_OPTION_DFP_COPROCESSOR.
Lock the xtregs region via lock_user before reading on sigreturn,
since sc_xtregs is a user-space pointer that may lie outside the
locked sigframe.
Signed-off-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Helge Deller <deller@gmx.de>
(cherry picked from commit 6858e3a71cc41510937bec0950eb4e42e33ba5f2)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/linux-user/xtensa/signal.c b/linux-user/xtensa/signal.c
index 6514b8dd57..fb1a3b86c1 100644
--- a/linux-user/xtensa/signal.c
+++ b/linux-user/xtensa/signal.c
@@ -21,6 +21,7 @@
#include "user-internals.h"
#include "signal-common.h"
#include "linux-user/trace.h"
+#include "target/xtensa/cpu.h"
struct target_sigcontext {
abi_ulong sc_pc;
@@ -43,10 +44,25 @@ struct target_ucontext {
target_sigset_t tuc_sigmask;
};
+struct target_xtensa_xtregs_fp {
+ uint32_t f[16];
+ uint32_t fcr;
+ uint32_t fsr;
+};
+
+struct target_xtensa_xtregs_dfp {
+ uint64_t f[16];
+ uint32_t fcr;
+ uint32_t fsr;
+};
+
struct target_rt_sigframe {
target_siginfo_t info;
struct target_ucontext uc;
- /* TODO: xtregs */
+ union {
+ struct target_xtensa_xtregs_fp fp;
+ struct target_xtensa_xtregs_dfp dfp;
+ } xtregs;
uint8_t retcode[6];
abi_ulong window[4];
};
@@ -107,6 +123,7 @@ static int flush_window_regs(CPUXtensaState *env)
}
static int setup_sigcontext(struct target_rt_sigframe *frame,
+ abi_ulong frame_addr,
CPUXtensaState *env)
{
struct target_sigcontext *sc = &frame->uc.tuc_mcontext;
@@ -123,8 +140,25 @@ static int setup_sigcontext(struct target_rt_sigframe *frame,
for (i = 0; i < 16; ++i) {
__put_user(env->regs[i], sc->sc_a + i);
}
- __put_user(0, &sc->sc_xtregs);
- /* TODO: xtregs */
+ if (xtensa_option_enabled(env->config, XTENSA_OPTION_DFP_COPROCESSOR)) {
+ for (i = 0; i < 16; ++i) {
+ __put_user(env->fregs[i].f64, &frame->xtregs.dfp.f[i]);
+ }
+ __put_user(env->uregs[FCR], &frame->xtregs.dfp.fcr);
+ __put_user(cpu_get_fsr(env), &frame->xtregs.dfp.fsr);
+ __put_user(frame_addr + offsetof(struct target_rt_sigframe, xtregs),
+ &sc->sc_xtregs);
+ } else if (xtensa_option_enabled(env->config, XTENSA_OPTION_FP_COPROCESSOR)) {
+ for (i = 0; i < 16; ++i) {
+ __put_user(env->fregs[i].f32[FP_F32_LOW], &frame->xtregs.fp.f[i]);
+ }
+ __put_user(env->uregs[FCR], &frame->xtregs.fp.fcr);
+ __put_user(cpu_get_fsr(env), &frame->xtregs.fp.fsr);
+ __put_user(frame_addr + offsetof(struct target_rt_sigframe, xtregs),
+ &sc->sc_xtregs);
+ } else {
+ __put_user(0, &sc->sc_xtregs);
+ }
return 1;
}
@@ -190,7 +224,7 @@ void setup_rt_frame(int sig, struct target_sigaction *ka,
__put_user(0, &frame->uc.tuc_flags);
__put_user(0, &frame->uc.tuc_link);
target_save_altstack(&frame->uc.tuc_stack, env);
- if (!setup_sigcontext(frame, env)) {
+ if (!setup_sigcontext(frame, frame_addr, env)) {
unlock_user_struct(frame, frame_addr, 0);
goto give_sigsegv;
}
@@ -244,8 +278,8 @@ give_sigsegv:
return;
}
-static void restore_sigcontext(CPUXtensaState *env,
- struct target_rt_sigframe *frame)
+static int restore_sigcontext(CPUXtensaState *env,
+ struct target_rt_sigframe *frame)
{
struct target_sigcontext *sc = &frame->uc.tuc_mcontext;
uint32_t ps;
@@ -267,7 +301,51 @@ static void restore_sigcontext(CPUXtensaState *env,
for (i = 0; i < 16; ++i) {
__get_user(env->regs[i], sc->sc_a + i);
}
- /* TODO: xtregs */
+ {
+ abi_ulong xtregs_addr;
+
+ __get_user(xtregs_addr, &sc->sc_xtregs);
+ if (xtregs_addr) {
+ if (xtensa_option_enabled(env->config,
+ XTENSA_OPTION_DFP_COPROCESSOR)) {
+ struct target_xtensa_xtregs_dfp *xtregs;
+ uint32_t fcr, fsr;
+
+ xtregs = lock_user(VERIFY_READ, xtregs_addr,
+ sizeof(*xtregs), 1);
+ if (!xtregs) {
+ return 0;
+ }
+ for (i = 0; i < 16; ++i) {
+ __get_user(env->fregs[i].f64, &xtregs->f[i]);
+ }
+ __get_user(fcr, &xtregs->fcr);
+ __get_user(fsr, &xtregs->fsr);
+ unlock_user(xtregs, xtregs_addr, 0);
+ cpu_set_fcr(env, fcr);
+ cpu_set_fsr(env, fsr);
+ } else if (xtensa_option_enabled(env->config,
+ XTENSA_OPTION_FP_COPROCESSOR)) {
+ struct target_xtensa_xtregs_fp *xtregs;
+ uint32_t fcr, fsr;
+
+ xtregs = lock_user(VERIFY_READ, xtregs_addr,
+ sizeof(*xtregs), 1);
+ if (!xtregs) {
+ return 0;
+ }
+ for (i = 0; i < 16; ++i) {
+ __get_user(env->fregs[i].f32[FP_F32_LOW], &xtregs->f[i]);
+ }
+ __get_user(fcr, &xtregs->fcr);
+ __get_user(fsr, &xtregs->fsr);
+ unlock_user(xtregs, xtregs_addr, 0);
+ cpu_set_fcr(env, fcr);
+ cpu_set_fsr(env, fsr);
+ }
+ }
+ }
+ return 1;
}
long do_rt_sigreturn(CPUXtensaState *env)
@@ -283,7 +361,9 @@ long do_rt_sigreturn(CPUXtensaState *env)
target_to_host_sigset(&set, &frame->uc.tuc_sigmask);
set_sigmask(&set);
- restore_sigcontext(env, frame);
+ if (!restore_sigcontext(env, frame)) {
+ goto badframe;
+ }
target_restore_altstack(&frame->uc.tuc_stack, env);
unlock_user_struct(frame, frame_addr, 0);
--
2.47.3
^ permalink raw reply related
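For readers following the xtensa patch above, here is a minimal standalone sketch of how the guest address stored in sc_xtregs can be derived from the frame address. The two register structs mirror the layouts added by the patch; `demo_sigframe`, its `header` field, and `xtregs_guest_addr()` are hypothetical stand-ins, not QEMU definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the two xtregs structs in the patch. */
struct xtregs_fp  { uint32_t f[16]; uint32_t fcr; uint32_t fsr; };
struct xtregs_dfp { uint64_t f[16]; uint32_t fcr; uint32_t fsr; };

/* Hypothetical frame: 'header' stands in for siginfo + ucontext. */
struct demo_sigframe {
    uint32_t header[8];
    union {
        struct xtregs_fp fp;
        struct xtregs_dfp dfp;
    } xtregs;
    uint8_t retcode[6];
};

/* Guest address of the xtregs area for a frame placed at frame_addr. */
static uint32_t xtregs_guest_addr(uint32_t frame_addr)
{
    return frame_addr + (uint32_t)offsetof(struct demo_sigframe, xtregs);
}
```

The union is sized for the larger (64-bit) layout, so a single pointer can serve either coprocessor option; which member is valid is decided by the core configuration, as in the patch.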
* Re: [PATCH v5 07/13] scsi: fnic: Route completions and resets by initiator role
From: sashiko-bot @ 2026-06-24 5:11 UTC (permalink / raw)
To: Karan Tilak Kumar; +Cc: linux-scsi
In-Reply-To: <20260624044334.3079-8-kartilak@cisco.com>
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [High] Asynchronous NVMe IO completion work queued during driver teardown executes after the fnic structure is freed, leading to a Use-After-Free.
Pre-existing issues:
- [Critical] The firmware reset completion handler calls `schedule_timeout()` from a hardirq context, leading to a "scheduling while atomic" kernel panic.
- [High] Using a shared struct member (`fnic->lock_flags`) to store IRQ state for `spin_lock_irqsave` violates locking semantics and can corrupt IRQ state across CPUs.
--
commit 0746859ac4c31fbfb285f20a291564287cfa0119
Author: Karan Tilak Kumar <kartilak@cisco.com>
scsi: fnic: Route completions and resets by initiator role
Dispatch FCPIO command, response, and ITMF completions to the FCP or
NVMe handlers based on the configured role. Read the NVMe queue-depth
and timeout retry fields from firmware config, clean up NVMe I/O on
firmware reset, and skip SCSI-only cleanup for initiator roles.
[ ... ]
> diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
> index 04ab384033b19..9607684bc6107 100644
> --- a/drivers/scsi/fnic/fnic_scsi.c
> +++ b/drivers/scsi/fnic/fnic_scsi.c
[ ... ]
> @@ -734,6 +735,8 @@ static int fnic_fcpio_fw_reset_cmpl_handler(struct fnic *fnic,
> /* Clean up all outstanding io requests */
> if (IS_FNIC_FCP_INITIATOR(fnic))
> fnic_cleanup_io(fnic, SCSI_NO_TAG);
[Severity: Critical]
This is a pre-existing issue, but does calling fnic_cleanup_io() from a
hardware interrupt context trigger a "scheduling while atomic" panic?
If a hardware interrupt handler like fnic_isr_msix_wq_copy() processes
a FCPIO_RESET_CMPL event:
fnic_wq_copy_cmpl_handler() ->
fnic_fcpio_fw_reset_cmpl_handler() ->
fnic_cleanup_io() ->
schedule_timeout(msecs_to_jiffies(100))
Calling schedule_timeout() yields the CPU to the scheduler, which is
invalid from hardirq context. Will this crash the system upon firmware
reset completion?
> + else if (IS_FNIC_NVME_INITIATOR(fnic))
> + nvfnic_cleanup_all_nvme_ios(fnic);
>
> atomic64_set(&fnic->fnic_stats.fw_stats.active_fw_reqs, 0);
> atomic64_set(&fnic->fnic_stats.io_stats.active_ios, 0);
[ ... ]
> @@ -1457,11 +1461,22 @@ static int fnic_fcpio_cmpl_handler(struct vnic_dev *vdev,
> break;
>
> case FCPIO_ICMND_CMPL: /* fw completed a command */
> - fnic_fcpio_icmnd_cmpl_handler(fnic, cq_index, desc);
> + if (IS_FNIC_FCP_INITIATOR(fnic))
> + fnic_fcpio_icmnd_cmpl_handler(fnic, cq_index, desc);
> + else if (IS_FNIC_NVME_INITIATOR(fnic))
> + nvfnic_fcpio_nvme_fast_cmpl_handler(fnic, desc);
> + break;
> +
> + case FCPIO_NVME_ERSP_HW_CMPL: /* fw completed NVMe ERSP */
> + if (IS_FNIC_NVME_INITIATOR(fnic))
> + nvfnic_fcpio_ersp_cmpl_handler(fnic, desc, 1);
[Severity: High]
This is a pre-existing issue, but does calling these NVMe completion handlers
corrupt IRQ state across CPUs?
Looking at nvfnic_fcpio_nvme_fast_cmpl_handler() and
nvfnic_fcpio_ersp_cmpl_handler() in fnic_nvme.c, they both use a shared
structure member to save IRQ state:
spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags);
Because fnic->lock_flags is shared rather than a local stack variable,
concurrent threads acquiring the lock on different CPUs will overwrite
each other's saved IRQ states. When spin_unlock_irqrestore() is called,
the wrong IRQ state is restored.
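The hazard described here can be modelled in plain userspace C. This is only an illustration under stated assumptions: `model_cpu`, `model_irq_save()` and `model_irq_restore()` are hypothetical stand-ins for per-CPU interrupt state and are not kernel APIs. The interleaving assumed is that CPU1 saves its state while CPU0 still holds the lock.

```c
#include <stdbool.h>

/* Userspace model only: each "CPU" has its own interrupt-enable bit. */
struct model_cpu { bool irqs_enabled; };

/* Record the caller's interrupt state in *flags, then disable. */
static void model_irq_save(struct model_cpu *cpu, bool *flags)
{
    *flags = cpu->irqs_enabled;
    cpu->irqs_enabled = false;
}

/* Put back whatever state the caller hands in. */
static void model_irq_restore(struct model_cpu *cpu, bool flags)
{
    cpu->irqs_enabled = flags;
}

/* CPU0's final state when both CPUs save into one shared slot. */
static bool cpu0_state_with_shared_flags(void)
{
    struct model_cpu cpu0 = { true }, cpu1 = { false };
    bool shared_flags;

    model_irq_save(&cpu0, &shared_flags);   /* CPU0 takes the lock */
    model_irq_save(&cpu1, &shared_flags);   /* CPU1 begins waiting, overwrites */
    model_irq_restore(&cpu0, shared_flags); /* CPU0 unlocks with CPU1's state */
    return cpu0.irqs_enabled;
}

/* Same interleaving, but each CPU keeps its flags on its own stack. */
static bool cpu0_state_with_local_flags(void)
{
    struct model_cpu cpu0 = { true }, cpu1 = { false };
    bool flags0, flags1;

    model_irq_save(&cpu0, &flags0);
    model_irq_save(&cpu1, &flags1);
    model_irq_restore(&cpu0, flags0);
    (void)flags1;
    return cpu0.irqs_enabled;
}
```

In this model the shared slot leaves CPU0 with interrupts disabled even though it entered with them enabled, while the local-variable version restores CPU0's own state.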
> break;
>
> case FCPIO_ITMF_CMPL: /* fw completed itmf (abort cmd, lun reset)*/
> - fnic_fcpio_itmf_cmpl_handler(fnic, cq_index, desc);
> + if (IS_FNIC_FCP_INITIATOR(fnic))
> + fnic_fcpio_itmf_cmpl_handler(fnic, cq_index, desc);
> + else if (IS_FNIC_NVME_INITIATOR(fnic))
> + nvfnic_fcpio_nvme_itmf_cmpl_handler(fnic, desc);
> break;
[ ... ]
> @@ -1650,6 +1665,15 @@ void fnic_wq_copy_cleanup_handler(struct vnic_wq_copy *wq,
> unsigned long start_time = 0;
> uint16_t hwq;
>
> + /*
> + * Clean up outstanding NVMe requests if firmware reset did not
> + * complete them before WQ copy cleanup.
> + */
> + if (IS_FNIC_NVME_INITIATOR(fnic)) {
> + nvfnic_cleanup_all_nvme_ios(fnic);
> + return;
> + }
> +
[Severity: High]
Can this unconditional cleanup lead to a use-after-free during driver
teardown?
The fnic_remove() sequence calls nvfnic_nvme_unload() which flushes
fnic->nvme_io_cmpl_work. It then calls fnic_cleanup() which invokes this
handler:
fnic_remove()
nvfnic_nvme_unload()
flush_work(&fnic->nvme_io_cmpl_work);
fnic_cleanup()
fnic_wq_copy_cleanup_handler()
nvfnic_cleanup_all_nvme_ios()
io_req->done()
If there are uncompleted requests, nvfnic_cleanup_all_nvme_ios() will
forcibly complete them, and io_req->done() queues new async work referencing
&fnic->nvme_io_cmpl_work.
Since fnic_remove() then frees the fnic structure without another flush,
won't the asynchronously executing work item dereference the freed fnic
pointer?
> /* get the tag reference */
> fcpio_tag_id_dec(&desc->hdr.tag, &id);
> id &= FNIC_TAG_MASK;
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260624044334.3079-1-kartilak@cisco.com?part=7
^ permalink raw reply
* [Stable-10.0.11 83/86] linux-user/xtensa: fix unlock of uninitialized frame pointer on sigreturn
From: Michael Tokarev @ 2026-06-24 5:02 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Matt Turner, Helge Deller, Michael Tokarev
In-Reply-To: <qemu-stable-10.0.11-20260624080222@cover.tls.msk.ru>
From: Matt Turner <mattst88@gmail.com>
If lock_user_struct fails, frame is uninitialized but the badframe
label unconditionally calls unlock_user_struct on it. Handle the
lock failure inline so badframe is only reached with a valid lock.
Signed-off-by: Matt Turner <mattst88@gmail.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
(cherry picked from commit 54e08dbe8f2aeca57e3b1a5eab09a9fec88c1c67)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/linux-user/xtensa/signal.c b/linux-user/xtensa/signal.c
index fb1a3b86c1..49a4d4bb53 100644
--- a/linux-user/xtensa/signal.c
+++ b/linux-user/xtensa/signal.c
@@ -356,7 +356,8 @@ long do_rt_sigreturn(CPUXtensaState *env)
trace_user_do_rt_sigreturn(env, frame_addr);
if (!lock_user_struct(VERIFY_READ, frame, frame_addr, 1)) {
- goto badframe;
+ force_sig(TARGET_SIGSEGV);
+ return -QEMU_ESIGRETURN;
}
target_to_host_sigset(&set, &frame->uc.tuc_sigmask);
set_sigmask(&set);
--
2.47.3
^ permalink raw reply related
* Re: [PATCH v5 09/13] scsi: fnic: Send NVMe LS requests through FDLS
From: sashiko-bot @ 2026-06-24 5:11 UTC (permalink / raw)
To: Karan Tilak Kumar; +Cc: linux-scsi
In-Reply-To: <20260624044334.3079-10-kartilak@cisco.com>
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
New issues:
- [Critical] A race condition between NVMe-FC LS request submission and abort paths allows mod_timer to be called on an uninitialized timer.
Pre-existing issues:
- [Critical] Using the shared struct member fnic->lock_flags for spin_lock_irqsave leads to IRQ flag corruption.
- [High] Missing synchronous timer cancellation in LS response processing leads to UAF in timer callback.
- [High] Dereferencing lsreq->private after dropping lock in nvfnic_ls_req_abort() causes UAF.
--
commit 6b90bcf4cce6c8a0c7e6c0e98d68dca9d9d15359
Author: Karan Tilak Kumar <kartilak@cisco.com>
scsi: fnic: Send NVMe LS requests through FDLS
This commit adds the FC frame wrapper for NVMe LS requests and builds LS
request frames from the NVMe-FC transport callback. It allocates OXIDs,
tracks outstanding LS requests on the target port, arms request timers,
and registers the LS request callback in the NVMe FC port template.
Closes: https://lore.kernel.org/oe-kbuild-all/202605280619.pmobiDWp-lkp@intel.com/
> diff --git a/drivers/scsi/fnic/fnic_nvme.c b/drivers/scsi/fnic/fnic_nvme.c
> index 8374464e4fcc8..16e2f0add5ce6 100644
> --- a/drivers/scsi/fnic/fnic_nvme.c
> +++ b/drivers/scsi/fnic/fnic_nvme.c
> @@ -1306,6 +1306,129 @@ void nvfnic_ls_req_timeout(struct timer_list *t)
> ls_req->done(ls_req, -ETIMEDOUT);
> }
[Severity: Critical]
This is a pre-existing issue, but does nvfnic_ls_req_timeout() use the
shared struct member fnic->lock_flags for spin_lock_irqsave()?
If spin_lock_irqsave() saves the local IRQ state into a shared variable,
and multiple CPUs attempt to acquire the lock concurrently, won't they
overwrite fnic->lock_flags with their own IRQ state?
When releasing the lock, a CPU would restore the IRQ state written by
another CPU instead of its own, potentially leading to permanent IRQ
disablement and hard lockups.
[Severity: High]
This isn't a bug introduced by this patch, but is it possible for a
timeout and an LS response to race and cause a use-after-free here?
If a concurrent completion path like nvfnic_ls_rsp_recv() completes an
LS request using timer_delete() (which is not synchronous) and then calls
lsreq->done(), the NVMe FC transport will free the lsreq memory.
If nvfnic_ls_req_timeout() is executing concurrently on another CPU, won't
it acquire the lock and dereference the already freed lsreq memory?
> +
> +/**
> + * nvfnic_ls_req_send - Send NVMe FC link service (LS) request
[ ... ]
> + if (fdls_alloc_oxid(iport, FNIC_FRAME_TYPE_NVME_LS,
> + &nvfnic_ls_req->oxid) == FNIC_UNASSIGNED_OXID) {
> + FNIC_FCS_DBG(KERN_INFO, fnic,
> + "0x%x: Failed to allocate OXID to send NVME LS REQ",
> + iport->fcid);
> + mempool_free(frame, fnic->frame_pool);
> + spin_unlock_irqrestore(&fnic->fnic_lock, flags);
> + return -EAGAIN;
> + }
> +
> + timer_setup(&nvfnic_ls_req->ls_req_timer, nvfnic_ls_req_timeout,
> + 0UL);
[Severity: Critical]
Is there a race condition here between the LS request submission and the
abort path?
In the NVMe-FC midlayer, __nvme_fc_send_ls_req() adds the LS request to
rport->ls_req_list before calling the ls_req callback (nvfnic_ls_req_send).
If nvme_fc_abort_lsops() executes asynchronously right after the list
insertion but before nvfnic_ls_req_send() initializes the timer here,
will it invoke nvfnic_ls_req_abort() on a request where lsreq->private
is still purely zero-initialized memory?
If so, nvfnic_ls_req_abort() would unconditionally call mod_timer() on an
uninitialized timer, which could lead to a NULL pointer dereference or
timer wheel corruption.
> +
> + nvfnic_ls_req->fnic = fnic;
> + nvfnic_ls_req->tport = tport;
[ ... ]
> @@ -1606,7 +1729,7 @@ nvme_fc_port_template nvfnic_port = {
> .remoteport_delete = nvfnic_remote_port_delete,
> .create_queue = nvfnic_create_queue,
> .delete_queue = NULL,
> - .ls_req = NULL,
> + .ls_req = nvfnic_ls_req_send,
> .ls_abort = nvfnic_ls_req_abort,
[Severity: High]
This is a pre-existing issue, but does dropping the lock in
nvfnic_ls_req_abort() open a window for a use-after-free?
In nvfnic_ls_req_abort(), fnic->fnic_lock is dropped to safely call
timer_delete_sync(). During this unlocked window, if a concurrent
completion path calls lsreq->done(), the transport will free the lsreq
memory.
When timer_delete_sync() returns and nvfnic_ls_req_abort() reacquires the
lock, it executes nvfnic_ls_req = lsreq->private. Does this blindly
dereference the freed lsreq structure?
> .fcp_io = nvfnic_fcpio_send,
> .fcp_abort = nvfnic_fcpio_abort,
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260624044334.3079-1-kartilak@cisco.com?part=9
^ permalink raw reply
* [Stable-10.2.4 88/95] hw/9pfs: consolidate name validation with check_name()
From: Michael Tokarev @ 2026-06-24 5:05 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable, Christian Schoenebeck, Michael Tokarev
In-Reply-To: <qemu-stable-10.2.4-20260624080454@cover.tls.msk.ru>
From: Christian Schoenebeck <qemu_oss@crudebyte.com>
Add a new, shared helper function check_name() that consolidates the name
validation logic (illegal name check and "." / ".." rejection) currently
spread over multiple 9p handlers, unnecessarily duplicating code.
This is pure refactoring with no behavior change. The existing error code
semantics are preserved: rename operations return -EISDIR, create
operations return -EEXIST.
Note: These current error codes actually differ from native Linux system
calls (e.g. rename() returns -EBUSY, open(O_CREAT) returns -EISDIR). The 9P
protocol does not mandate specific error codes for these validation errors.
Hence consolidating to a single error code (e.g., -EINVAL) for all cases
could be considered in the future for the sake of simplicity.
This change reduces code duplication across 9 functions:
- v9fs_lcreate
- v9fs_create
- v9fs_symlink
- v9fs_link
- v9fs_rename
- v9fs_renameat
- v9fs_wstat
- v9fs_mknod
- v9fs_mkdir
Link: https://lore.kernel.org/qemu-devel/0573103880129eb543f07b68c77e86f2f572f6bf.1780072238.git.qemu_oss@crudebyte.com
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
(cherry picked from commit 116db2986b11c914217bbd1547815b6c7efb944a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index abc54e169c..5b3626221c 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -1823,6 +1823,25 @@ static bool name_is_illegal(const char *name)
return !*name || strchr(name, '/') != NULL;
}
+static int check_name(const char *name, V9fsPDU *pdu)
+{
+ int request_type = pdu->id;
+
+ if (name_is_illegal(name)) {
+ return -ENOENT;
+ }
+ if (!strcmp(name, ".") || !strcmp(name, "..")) {
+ /*
+ * TODO: The different error codes here are just there to preserve
+ * pre-existing behaviour of 9p server. In future it might make sense to
+ * consolidate this and e.g. just return -EINVAL for everyone.
+ */
+ return (request_type == P9_TRENAME || request_type == P9_TRENAMEAT ||
+ request_type == P9_TWSTAT) ? -EISDIR : -EEXIST;
+ }
+ return 0;
+}
+
static bool same_stat_id(const struct stat *a, const struct stat *b)
{
return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
@@ -2173,13 +2192,8 @@ static void coroutine_fn v9fs_lcreate(void *opaque)
}
trace_v9fs_lcreate(pdu->tag, pdu->id, dfid, flags, mode, gid);
- if (name_is_illegal(name.data)) {
- err = -ENOENT;
- goto out_nofid;
- }
-
- if (!strcmp(".", name.data) || !strcmp("..", name.data)) {
- err = -EEXIST;
+ err = check_name(name.data, pdu);
+ if (err < 0) {
goto out_nofid;
}
@@ -2861,13 +2875,8 @@ static void coroutine_fn v9fs_create(void *opaque)
}
trace_v9fs_create(pdu->tag, pdu->id, fid, name.data, perm, mode);
- if (name_is_illegal(name.data)) {
- err = -ENOENT;
- goto out_nofid;
- }
-
- if (!strcmp(".", name.data) || !strcmp("..", name.data)) {
- err = -EEXIST;
+ err = check_name(name.data, pdu);
+ if (err < 0) {
goto out_nofid;
}
@@ -3055,13 +3064,8 @@ static void coroutine_fn v9fs_symlink(void *opaque)
}
trace_v9fs_symlink(pdu->tag, pdu->id, dfid, name.data, symname.data, gid);
- if (name_is_illegal(name.data)) {
- err = -ENOENT;
- goto out_nofid;
- }
-
- if (!strcmp(".", name.data) || !strcmp("..", name.data)) {
- err = -EEXIST;
+ err = check_name(name.data, pdu);
+ if (err < 0) {
goto out_nofid;
}
@@ -3148,13 +3152,8 @@ static void coroutine_fn v9fs_link(void *opaque)
}
trace_v9fs_link(pdu->tag, pdu->id, dfid, oldfid, name.data);
- if (name_is_illegal(name.data)) {
- err = -ENOENT;
- goto out_nofid;
- }
-
- if (!strcmp(".", name.data) || !strcmp("..", name.data)) {
- err = -EEXIST;
+ err = check_name(name.data, pdu);
+ if (err < 0) {
goto out_nofid;
}
@@ -3385,13 +3384,8 @@ static void coroutine_fn v9fs_rename(void *opaque)
goto out_nofid;
}
- if (name_is_illegal(name.data)) {
- err = -ENOENT;
- goto out_nofid;
- }
-
- if (!strcmp(".", name.data) || !strcmp("..", name.data)) {
- err = -EISDIR;
+ err = check_name(name.data, pdu);
+ if (err < 0) {
goto out_nofid;
}
@@ -3526,14 +3520,12 @@ static void coroutine_fn v9fs_renameat(void *opaque)
goto out_err;
}
- if (name_is_illegal(old_name.data) || name_is_illegal(new_name.data)) {
- err = -ENOENT;
+ err = check_name(old_name.data, pdu);
+ if (err < 0) {
goto out_err;
}
-
- if (!strcmp(".", old_name.data) || !strcmp("..", old_name.data) ||
- !strcmp(".", new_name.data) || !strcmp("..", new_name.data)) {
- err = -EISDIR;
+ err = check_name(new_name.data, pdu);
+ if (err < 0) {
goto out_err;
}
@@ -3638,12 +3630,8 @@ static void coroutine_fn v9fs_wstat(void *opaque)
err = -EOPNOTSUPP;
goto out;
}
- if (name_is_illegal(v9stat.name.data)) {
- err = -ENOENT;
- goto out;
- }
- if (!strcmp(".", v9stat.name.data) || !strcmp("..", v9stat.name.data)) {
- err = -EISDIR;
+ err = check_name(v9stat.name.data, pdu);
+ if (err < 0) {
goto out;
}
@@ -3776,13 +3764,8 @@ static void coroutine_fn v9fs_mknod(void *opaque)
}
trace_v9fs_mknod(pdu->tag, pdu->id, fid, mode, major, minor);
- if (name_is_illegal(name.data)) {
- err = -ENOENT;
- goto out_nofid;
- }
-
- if (!strcmp(".", name.data) || !strcmp("..", name.data)) {
- err = -EEXIST;
+ err = check_name(name.data, pdu);
+ if (err < 0) {
goto out_nofid;
}
@@ -3938,13 +3921,8 @@ static void coroutine_fn v9fs_mkdir(void *opaque)
}
trace_v9fs_mkdir(pdu->tag, pdu->id, fid, name.data, mode, gid);
- if (name_is_illegal(name.data)) {
- err = -ENOENT;
- goto out_nofid;
- }
-
- if (!strcmp(".", name.data) || !strcmp("..", name.data)) {
- err = -EEXIST;
+ err = check_name(name.data, pdu);
+ if (err < 0) {
goto out_nofid;
}
--
2.47.3
^ permalink raw reply related
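As a standalone illustration of the validation order the patch consolidates, the sketch below reproduces the same checks outside QEMU. `demo_request` and `demo_check_name()` are hypothetical; the real helper takes a V9fsPDU and inspects its request id instead.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical request classes; the patch keys off pdu->id instead. */
enum demo_request { DEMO_CREATE, DEMO_RENAME };

static int demo_check_name(const char *name, enum demo_request req)
{
    /* Empty names and names containing '/' are rejected first. */
    if (!*name || strchr(name, '/') != NULL) {
        return -ENOENT;
    }
    /* "." and ".." keep the per-request error code split. */
    if (!strcmp(name, ".") || !strcmp(name, "..")) {
        return req == DEMO_RENAME ? -EISDIR : -EEXIST;
    }
    return 0;
}
```

Callers then only need a single `err < 0` test, which is the shape every handler in the diff converges on.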
* [Stable-10.2.4 72/95] target/riscv/cpu_helper.c: allow LOAD_ADDR_MIS promotion to AMO fault
From: Michael Tokarev @ 2026-06-24 5:05 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Daniel Henrique Barboza, Alistair Francis, Chao Liu,
Michael Tokarev
In-Reply-To: <qemu-stable-10.2.4-20260624080454@cover.tls.msk.ru>
From: Daniel Henrique Barboza <daniel.barboza@oss.qualcomm.com>
promote_load_fault() is missing the promotion of misaligned AMO load
addresses, i.e. RISCV_EXCP_LOAD_ADDR_MIS should be promoted to
RISCV_EXCP_STORE_AMO_ADDR_MIS when RISCV_UW2_ALWAYS_STORE_AMO is set (i.e.
always_storeamo is true).
All other load AMO faults are already being covered.
Cc: qemu-stable@nongnu.org
Fixes: 98f21c30f5 ("target/riscv: AMO operations always raise store/AMO fault")
Resolves: https://gitlab.com/qemu-project/qemu/-/work_items/3503
Signed-off-by: Daniel Henrique Barboza <daniel.barboza@oss.qualcomm.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Chao Liu <chao.liu.zevorn@gmail.com>
Message-ID: <20260522181353.429782-1-daniel.barboza@oss.qualcomm.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit d85a4ec06a65ccdd5c7d0f00b3e6695fc14a547a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 024213fb4b..1bd6649f23 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -2124,6 +2124,9 @@ static target_ulong promote_load_fault(target_ulong orig_cause)
case RISCV_EXCP_LOAD_PAGE_FAULT:
return RISCV_EXCP_STORE_PAGE_FAULT;
+
+ case RISCV_EXCP_LOAD_ADDR_MIS:
+ return RISCV_EXCP_STORE_AMO_ADDR_MIS;
}
/* if no promotion, return original cause */
--
2.47.3
^ permalink raw reply related
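A minimal sketch of the promotion table with the new case included. The `DEMO_*` names and `demo_promote_load_fault()` are hypothetical; the numeric values are the standard RISC-V exception cause codes, and the sketch covers only the misaligned, access-fault, and page-fault cases.

```c
/* Standard RISC-V exception cause codes (names are hypothetical). */
enum demo_cause {
    DEMO_LOAD_ADDR_MIS      = 4,
    DEMO_LOAD_ACCESS_FAULT  = 5,
    DEMO_STORE_AMO_ADDR_MIS = 6,
    DEMO_STORE_AMO_ACCESS   = 7,
    DEMO_LOAD_PAGE_FAULT    = 13,
    DEMO_STORE_PAGE_FAULT   = 15
};

/* Map a load fault to its store/AMO counterpart; others pass through. */
static int demo_promote_load_fault(int orig_cause)
{
    switch (orig_cause) {
    case DEMO_LOAD_ACCESS_FAULT:
        return DEMO_STORE_AMO_ACCESS;
    case DEMO_LOAD_PAGE_FAULT:
        return DEMO_STORE_PAGE_FAULT;
    case DEMO_LOAD_ADDR_MIS:   /* the case the patch adds */
        return DEMO_STORE_AMO_ADDR_MIS;
    }
    /* if no promotion, return original cause */
    return orig_cause;
}
```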
* [Stable-10.0.11 64/86] virtio: Allow to fill a whole virtqueue in order
From: Michael Tokarev @ 2026-06-24 5:02 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Eugenio Pérez, Michael S. Tsirkin,
Michael Tokarev
In-Reply-To: <qemu-stable-10.0.11-20260624080222@cover.tls.msk.ru>
From: Eugenio Pérez <eperezma@redhat.com>
Because the loop condition 'steps < max_steps' already keeps steps one
below max_steps, the correct upper bound for max_steps is the queue length,
not the largest possible remainder of % vq->vring.num.
Fixes: b44135daa37 ("virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260304173535.2702587-2-eperezma@redhat.com>
(cherry picked from commit eceff0982f97cc79a26883b93f8eac05cd126dd8)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 34ef7b86d9..215c95eccf 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -937,7 +937,7 @@ static void virtqueue_ordered_fill(VirtQueue *vq, const VirtQueueElement *elem,
* We shouldn't need to increase 'i' by more than or equal to
* the distance between used_idx and last_avail_idx (max_steps).
*/
- max_steps = (vq->last_avail_idx - vq->used_idx) % vq->vring.num;
+ max_steps = MIN(vq->last_avail_idx - vq->used_idx, vq->vring.num);
/* Search for element in vq->used_elems */
while (steps < max_steps) {
--
2.47.3
^ permalink raw reply related
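The arithmetic behind this fix can be checked in isolation. The sketch below compares the two formulas for a completely full queue; the function names and the explicit uint16_t cast (used here to model index wraparound) are assumptions for illustration, not the QEMU code.

```c
#include <stdint.h>

#define DEMO_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Old formula: a full queue (distance == num) collapses to 0 steps. */
static unsigned max_steps_modulo(uint16_t last_avail_idx, uint16_t used_idx,
                                 unsigned num)
{
    return (uint16_t)(last_avail_idx - used_idx) % num;
}

/* New formula: the distance is capped at num instead of wrapping to 0. */
static unsigned max_steps_min(uint16_t last_avail_idx, uint16_t used_idx,
                              unsigned num)
{
    return DEMO_MIN((unsigned)(uint16_t)(last_avail_idx - used_idx), num);
}
```

With num = 256 and 256 elements in flight, the modulo form yields 0, so the search loop would never run, while the MIN form yields 256.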
* [Stable-10.0.11 72/86] disas/riscv.c: fix inst_length()
From: Michael Tokarev @ 2026-06-24 5:02 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Daniel Henrique Barboza, Alistair Francis,
Michael Tokarev
In-Reply-To: <qemu-stable-10.0.11-20260624080222@cover.tls.msk.ru>
From: Daniel Henrique Barboza <daniel.barboza@oss.qualcomm.com>
inst_length() can return 0 if 'inst' happens to not match any known
encoding (like [1]). Returning 0 is not desirable, even for unknown
encodings, given that it will cause a loop in target_disas() later on.
The most recent version of the RISC-V unpriv spec ditched the
sophisticated instruction-length encoding. We're now supporting only
16-bit and 32-bit length instructions, where:
"All the 32-bit instructions in the base ISA have their lowest two bits
set to 11. The optional compressed 16-bit instruction-set extensions
have their lowest two bits equal to 00, 01, or 10."
So the code is now simpler, never returning 0, and in fact it's the same
thing we're already doing in insn_len() from target/riscv/internals.h.
Due to include shenanigans we can't use that function in disas/riscv.c,
but I believe we can cut ourselves some slack this time and not lose
sleep over one line of duplicated logic. We're documenting it though!
[1] https://gitlab.com/qemu-project/qemu/-/work_items/3479
Resolves: https://gitlab.com/qemu-project/qemu/-/work_items/3479
Signed-off-by: Daniel Henrique Barboza <daniel.barboza@oss.qualcomm.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20260527200355.2068879-2-daniel.barboza@oss.qualcomm.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 758dce9c98af4f3ef26eada48a484a7d60258636)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/disas/riscv.c b/disas/riscv.c
index 980da7f7e1..2d8add3212 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -5083,26 +5083,10 @@ static bool check_constraints(rv_decode *dec, const rvc_constraint *c)
return true;
}
-/* instruction length */
-
+/* Same as insn_len() from target/riscv/internals.h */
static size_t inst_length(rv_inst inst)
{
- /* NOTE: supports maximum instruction size of 64-bits */
-
- /*
- * instruction length coding
- *
- * aa - 16 bit aa != 11
- * bbb11 - 32 bit bbb != 111
- * 011111 - 48 bit
- * 0111111 - 64 bit
- */
-
- return (inst & 0b11) != 0b11 ? 2
- : (inst & 0b11100) != 0b11100 ? 4
- : (inst & 0b111111) == 0b011111 ? 6
- : (inst & 0b1111111) == 0b0111111 ? 8
- : 0;
+ return (inst & 3) == 3 ? 4 : 2;
}
/* format instruction */
--
2.47.3
^ permalink raw reply related
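To see why the old decoder could return 0, the two versions can be compared side by side. This is a standalone sketch: the `demo_` function names are hypothetical, and the binary literals from the diff are written as equivalent hex constants.

```c
#include <stddef.h>
#include <stdint.h>

/* Old length decoding, with the binary masks rewritten in hex. */
static size_t demo_old_inst_length(uint64_t inst)
{
    return (inst & 0x03) != 0x03 ? 2
         : (inst & 0x1c) != 0x1c ? 4
         : (inst & 0x3f) == 0x1f ? 6
         : (inst & 0x7f) == 0x3f ? 8
         : 0;
}

/* New decoding: low two bits 11 means 32-bit, anything else 16-bit. */
static size_t demo_new_inst_length(uint64_t inst)
{
    return (inst & 3) == 3 ? 4 : 2;
}
```

An encoding whose low seven bits are all ones (0x7f) falls through every branch of the old version and yields 0, which is the case that stalls the disassembly loop; the new version returns 4 for it.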
* Re: [PATCH v5 02/13] scsi: fnic: Use fnic_num for non-SCSI identifiers
From: sashiko-bot @ 2026-06-24 5:11 UTC (permalink / raw)
To: Karan Tilak Kumar; +Cc: linux-scsi
In-Reply-To: <20260624044334.3079-3-kartilak@cisco.com>
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Unconditional use of fnic_num in fnic_fcpio_ack_handler trace events breaks event correlation for FCP initiators.
--
commit 899ac498b451c2fff04272de1bf2b696b5332d70
Author: Karan Tilak Kumar <kartilak@cisco.com>
scsi: fnic: Use fnic_num for non-SCSI identifiers
This commit updates the driver to use the fnic instance number for
NVMe-facing FDMI and debugfs entries, and for trace events. It aims to
reserve SCSI host numbers strictly for FCP initiator paths.
[ ... ]
> diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
> index 0759540f66753..b92260583c67b 100644
> --- a/drivers/scsi/fnic/fnic_scsi.c
> +++ b/drivers/scsi/fnic/fnic_scsi.c
> @@ -894,7 +894,7 @@ static inline void fnic_fcpio_ack_handler(struct fnic *fnic,
>
> spin_unlock_irqrestore(&fnic->wq_copy_lock[wq_index], flags);
> FNIC_TRACE(fnic_fcpio_ack_handler,
> - fnic->host->host_no, 0, 0, ox_id_tag[2], ox_id_tag[3],
> + fnic->fnic_num, 0, 0, ox_id_tag[2], ox_id_tag[3],
[Severity: Medium]
Will this unconditional use of fnic_num break trace event correlation for
FCP initiators?
Other FCP-specific trace events continue to use the SCSI midlayer-allocated
host_no. By changing fnic_fcpio_ack_handler() to use fnic_num
unconditionally, a single FCP I/O sequence will interleave two different
identifier values in the trace buffer (e.g., host_no 2 vs fnic_num 0) on
systems with other SCSI controllers.
Should this trace call check IS_FNIC_FCP_INITIATOR(fnic) to log the host_no
for FCP traffic, similar to the debugfs and FDMI changes?
> ox_id_tag[4], ox_id_tag[5]);
> }
>
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260624044334.3079-1-kartilak@cisco.com?part=2
^ permalink raw reply
* [Stable-10.0.11 62/86] libvhost-user: fix buffer overflow in virtqueue_read_indirect_desc()
From: Michael Tokarev @ 2026-06-24 5:02 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Stefano Garzarella, DARKNAVY,
Daniel P. Berrangé, Michael S. Tsirkin, Michael Tokarev
In-Reply-To: <qemu-stable-10.0.11-20260624080222@cover.tls.msk.ru>
From: Stefano Garzarella <sgarzare@redhat.com>
virtqueue_read_indirect_desc() copies an indirect descriptor table
into a buffer in chunks when the table crosses a memory region
boundary. The destination is a struct vring_desc pointer but is
advanced by a byte count, so each increment moves the pointer by
read_len elements instead of read_len bytes, writing beyond the
buffer.
Use a char pointer for the destination so that the arithmetic
advances correctly.
While at it, change the source from a struct vring_desc pointer
to a void pointer: when the table is split across regions,
vu_gpa_to_va() can return a pointer into the middle of a descriptor,
so casting it to a struct vring_desc pointer is wrong. The pointer is
only used as a memcpy() source, so a void pointer is fine.
Fixes: CVE-2026-6425
Fixes: 293084a719 ("libvhost-user: Support across-memory-boundary access")
Cc: qemu-stable@nongnu.org
Reported-by: DARKNAVY <vr@darknavy.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260417132645.121192-2-sgarzare@redhat.com>
(cherry picked from commit 85805ee887be4adddb1f1a34f526968adc95e582)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 9c630c2170..014d210748 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -2391,8 +2391,9 @@ static int
virtqueue_read_indirect_desc(VuDev *dev, struct vring_desc *desc,
uint64_t addr, size_t len)
{
- struct vring_desc *ori_desc;
+ char *dst_desc = (char *)desc;
uint64_t read_len;
+ void *ori_desc;
if (len > (VIRTQUEUE_MAX_SIZE * sizeof(struct vring_desc))) {
return -1;
@@ -2409,10 +2410,10 @@ virtqueue_read_indirect_desc(VuDev *dev, struct vring_desc *desc,
return -1;
}
- memcpy(desc, ori_desc, read_len);
+ memcpy(dst_desc, ori_desc, read_len);
len -= read_len;
addr += read_len;
- desc += read_len;
+ dst_desc += read_len;
}
return 0;
--
2.47.3
^ permalink raw reply related
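The pointer-arithmetic pitfall is easy to reproduce outside libvhost-user. In the sketch below, `demo_desc` has the same field layout as a split-ring descriptor, and `demo_chunked_copy()` and `demo_struct_step_bytes()` are hypothetical helpers written for this illustration; `chunk` is assumed to be non-zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16-byte descriptor, same field layout as a split-ring vring_desc. */
struct demo_desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };

/* Copy len bytes into table in pieces of at most chunk bytes (chunk > 0). */
static void demo_chunked_copy(struct demo_desc *table, const void *src,
                              size_t len, size_t chunk)
{
    char *dst = (char *)table;   /* byte-granular destination cursor */
    const char *from = src;

    while (len) {
        size_t n = len < chunk ? len : chunk;

        memcpy(dst, from, n);
        dst += n;                /* advances n bytes, not n descriptors */
        from += n;
        len -= n;
    }
}

/* Bytes skipped if a struct demo_desc pointer were advanced by n instead. */
static size_t demo_struct_step_bytes(size_t n)
{
    return n * sizeof(struct demo_desc);
}
```

Advancing a `struct demo_desc *` by a 20-byte chunk length would skip 320 bytes rather than 20, which is the overflow the commit message describes.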
* [Stable-10.0.11 85/86] hw/pci: Replace assert with bounds check and return
From: Michael Tokarev @ 2026-06-24 5:02 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Aditya Gupta, Zexiang Zhang, Michael S. Tsirkin,
Michael Tokarev
In-Reply-To: <qemu-stable-10.0.11-20260624080222@cover.tls.msk.ru>
From: Aditya Gupta <adityag@linux.ibm.com>
As reported in https://gitlab.com/qemu-project/qemu/-/work_items/3334,
callers of 'pci_host_config_{read,write}_common' can pass a length of 8,
causing an assertion failure.
The original issue with pnv_phb3 triggering the assert was fixed in a
previous commit.
Instead of asserting on an invalid length, check that the length is valid
(<= 4); otherwise return early (with the all-ones failure value in the read path).
Reported-by: Zexiang Zhang <chan9yan9@gmail.com>
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260326190438.734239-3-adityag@linux.ibm.com>
(cherry picked from commit c7209c56718107fefd5deae140a450954d31ff2b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
index 56f7f28a1a..37a8d186d6 100644
--- a/hw/pci/pci_host.c
+++ b/hw/pci/pci_host.c
@@ -81,7 +81,12 @@ void pci_host_config_write_common(PCIDevice *pci_dev, uint32_t addr,
return;
}
- assert(len <= 4);
+ if (len > 4) {
+ PCI_DPRINTF("%s: invalid length access: addr " HWADDR_FMT_plx " \
+ len %d val %"PRIx32"\n", __func__, addr, len, val);
+ return;
+ }
+
/* non-zero functions are only exposed when function 0 is present,
* allowing direct removal of unexposed functions.
*/
@@ -106,7 +111,12 @@ uint32_t pci_host_config_read_common(PCIDevice *pci_dev, uint32_t addr,
return ~0x0;
}
- assert(len <= 4);
+ if (len > 4) {
+ PCI_DPRINTF("%s: invalid length access: addr " HWADDR_FMT_plx " \
+ len %d val %"PRIx32"\n", __func__, addr, len, val);
+ return ~0x0;
+ }
+
/* non-zero functions are only exposed when function 0 is present,
* allowing direct removal of unexposed functions.
*/
--
2.47.3
^ permalink raw reply related
* [Stable-10.2.4 70/95] amd_iommu: Reject non-decreasing NextLevel in fetch_pte()
From: Michael Tokarev @ 2026-06-24 5:05 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Alejandro Jimenez, Sairaj Kodilkar,
Michael S. Tsirkin, Michael Tokarev
In-Reply-To: <qemu-stable-10.2.4-20260624080454@cover.tls.msk.ru>
From: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
The AMD-Vi specification requires that the NextLevel field of a page table
entry must not be greater than or equal to the level of the current page table entry.
Enforce this to avoid infinite page walk loops on corrupted or buggy guest
page tables.
The initial implementation of fetch_pte() did not implement this check, but
was not vulnerable since the page walk code explicitly decremented the level
instead of retrieving it from the page table entry.
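The effect of the check can be sketched in isolation. The function below walks a sequence of NextLevel values only (no guest memory, no PTE format), and walk_levels, WALK_OK and WALK_ENTRY_INV are hypothetical names standing in for the real walker and its -AMDVI_FR_PT_ENTRY_INV result.

```c
#include <stddef.h>
#include <stdint.h>

#define WALK_OK        0
#define WALK_ENTRY_INV (-1) /* hypothetical stand-in for -AMDVI_FR_PT_ENTRY_INV */

/*
 * next_levels[i] is the NextLevel field found in the i-th entry visited.
 * NextLevel 0 or 7 ends the walk (leaf or large PTE).
 */
int walk_levels(uint8_t top_level, const uint8_t *next_levels, size_t n)
{
    uint8_t level = top_level;

    for (size_t i = 0; i < n; i++) {
        uint8_t next = next_levels[i];

        if (next == 0 || next == 7) {
            return WALK_OK;
        }
        /* Next level must be strictly lower than the current one. */
        if (level <= next) {
            return WALK_ENTRY_INV;
        }
        level = next;
    }
    /* Ran out of entries without reaching a leaf. */
    return WALK_ENTRY_INV;
}
```

Because the level strictly decreases on every iteration, the walk is bounded by the top level regardless of what the guest writes into its tables.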
Cc: qemu-stable@nongnu.org
Reviewed-by: Sairaj Kodilkar <sarunkod@amd.com>
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260330212817.992673-3-alejandro.j.jimenez@oracle.com>
(cherry picked from commit 291aa70ad254b6c48012dbfd16a4af0978ea1b84)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 86dab42624..a7991b2790 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -771,6 +771,10 @@ static uint64_t fetch_pte(AMDVIAddressSpace *as, hwaddr address, uint64_t dte,
break;
}
+ /* Next level must always be less than current level */
+ if (pt_level <= next_pt_level) {
+ return -AMDVI_FR_PT_ENTRY_INV;
+ }
pt_level = next_pt_level;
/*
--
2.47.3
^ permalink raw reply related
* [Stable-10.2.4 71/95] virtio: Allow to fill a whole virtqueue in order
From: Michael Tokarev @ 2026-06-24 5:05 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Eugenio Pérez, Michael S. Tsirkin,
Michael Tokarev
In-Reply-To: <qemu-stable-10.2.4-20260624080454@cover.tls.msk.ru>
From: Eugenio Pérez <eperezma@redhat.com>
Because the loop condition 'while (steps < max_steps)' already stops one
step short of max_steps, the right upper bound for max_steps is the queue
length, not the largest possible remainder of % vq->vring.num.
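A small sketch of the two computations shows where they differ. The function names are hypothetical, and the (uint16_t) cast is an assumption added here so that index wraparound behaves predictably in a standalone example; it is not part of the patch.

```c
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Old computation: a completely full queue wraps to 0 steps. */
unsigned max_steps_modulo(uint16_t last_avail_idx, uint16_t used_idx,
                          unsigned num)
{
    return (uint16_t)(last_avail_idx - used_idx) % num;
}

/* New computation: clamp to the queue size instead of wrapping. */
unsigned max_steps_clamped(uint16_t last_avail_idx, uint16_t used_idx,
                           unsigned num)
{
    return MIN((unsigned)(uint16_t)(last_avail_idx - used_idx), num);
}
```

With a 256-entry queue and 256 elements in flight, the modulo form returns 0, so the search loop never runs; the clamped form returns 256 and every slot can be visited.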
Fixes: b44135daa37 ("virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260304173535.2702587-2-eperezma@redhat.com>
(cherry picked from commit eceff0982f97cc79a26883b93f8eac05cd126dd8)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index deb7c6695e..4712a9a822 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -963,7 +963,7 @@ static void virtqueue_ordered_fill(VirtQueue *vq, const VirtQueueElement *elem,
* We shouldn't need to increase 'i' by more than or equal to
* the distance between used_idx and last_avail_idx (max_steps).
*/
- max_steps = (vq->last_avail_idx - vq->used_idx) % vq->vring.num;
+ max_steps = MIN(vq->last_avail_idx - vq->used_idx, vq->vring.num);
/* Search for element in vq->used_elems */
while (steps < max_steps) {
--
2.47.3
^ permalink raw reply related
* [Stable-10.2.4 76/95] disas/riscv.c: add 'cbo' insns to disassembler
From: Michael Tokarev @ 2026-06-24 5:05 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Daniel Henrique Barboza, Chao Liu, Richard Henderson,
Alistair Francis, Michael Tokarev
In-Reply-To: <qemu-stable-10.2.4-20260624080454@cover.tls.msk.ru>
From: Daniel Henrique Barboza <daniel.barboza@oss.qualcomm.com>
We forgot to add 'cbo' insns to disas/riscv.c. The result is that the
disassembler recognizes all of them as 'lq', an insn that happens to
share the same opcode space.
While we're at it, reorder the cbo_* entries in insn32.decode by opcode
order instead of by insn name.
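As a rough illustration of the decoding problem, the sketch below classifies a 32-bit instruction word using the bit patterns listed for cbo_* in insn32.decode. The enum and function names are hypothetical. The shift of 20 is derived from reading those patterns (the five-bit selector occupies bits 24:20, directly above the rs1 field) and is an assumption of this sketch; it is not copied from the disas/riscv.c hunk, which shifts by 17.

```c
#include <stdint.h>

typedef enum {
    SK_OP_OTHER,     /* not in the funct3=010 MISC-MEM space at all */
    SK_OP_LQ,        /* fallback inside that space, as in the patch */
    SK_OP_CBO_INVAL,
    SK_OP_CBO_CLEAN,
    SK_OP_CBO_FLUSH,
    SK_OP_CBO_ZERO
} sk_op;

sk_op classify_misc_mem(uint32_t inst)
{
    /* opcode 0001111 in bits 6:0, funct3 010 in bits 14:12 */
    if ((inst & 0x7f) != 0x0f || ((inst >> 12) & 0x7) != 2) {
        return SK_OP_OTHER;
    }
    /* cbo.* patterns have bits 31:25 and rd (bits 11:7) all zero */
    if ((inst >> 25) != 0 || ((inst >> 7) & 0x1f) != 0) {
        return SK_OP_LQ;
    }
    /* five-bit selector in bits 24:20 */
    switch ((inst >> 20) & 0x1f) {
    case 0:  return SK_OP_CBO_INVAL;
    case 1:  return SK_OP_CBO_CLEAN;
    case 2:  return SK_OP_CBO_FLUSH;
    case 4:  return SK_OP_CBO_ZERO;
    default: return SK_OP_LQ;
    }
}
```

For example, the word 0x0045200f (selector 4, rs1 = x10) is classified as cbo.zero, while a word with a non-zero rd field falls back to lq.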
Resolves: https://gitlab.com/qemu-project/qemu/-/work_items/3480
Signed-off-by: Daniel Henrique Barboza <daniel.barboza@oss.qualcomm.com>
Reviewed-by: Chao Liu <chao.liu.zevorn@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20260519204714.1376551-1-daniel.barboza@oss.qualcomm.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 9273cda72293ed2f414a0f239c4ac78fb7838c0e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/disas/riscv.c b/disas/riscv.c
index 85cd2a9c2a..980da7f7e1 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -984,6 +984,10 @@ typedef enum {
rv_op_ssamoswap_d = 953,
rv_op_c_sspush = 954,
rv_op_c_sspopchk = 955,
+ rv_op_cbo_inval = 956,
+ rv_op_cbo_clean = 957,
+ rv_op_cbo_flush = 958,
+ rv_op_cbo_zero = 959,
} rv_op;
/* register names */
@@ -2254,6 +2258,10 @@ const rv_opcode_data rvi_opcode_data[] = {
rv_op_sspush, 0 },
{ "c.sspopchk", rv_codec_cmop_ss, rv_fmt_rs1, NULL, rv_op_sspopchk,
rv_op_sspopchk, 0 },
+ { "cbo.inval", rv_codec_r, rv_fmt_rs1, NULL, 0, 0, 0 },
+ { "cbo.clean", rv_codec_r, rv_fmt_rs1, NULL, 0, 0, 0 },
+ { "cbo.flush", rv_codec_r, rv_fmt_rs1, NULL, 0, 0, 0 },
+ { "cbo.zero", rv_codec_r, rv_fmt_rs1, NULL, 0, 0, 0 },
};
/* CSR names */
@@ -2875,7 +2883,26 @@ static void decode_inst_opcode(rv_decode *dec, rv_isa isa)
switch ((inst >> 12) & 0b111) {
case 0: op = rv_op_fence; break;
case 1: op = rv_op_fence_i; break;
- case 2: op = rv_op_lq; break;
+ case 2:
+ /*
+ * 'lq' shares the "(...) 010 ..... 0001111" opcode space
+ * with 'cbo' insns. Check the next 5 bits to select
+ * what we want:
+ *
+ * cbo_inval 0000000 00000 ..... 010 00000 0001111
+ * cbo_clean 0000000 00001 ..... 010 00000 0001111
+ * cbo_flush 0000000 00010 ..... 010 00000 0001111
+ * cbo_zero 0000000 00100 ..... 010 00000 0001111
+ *
+ * Anything that doesn't match these will default to 'lq'.
+ */
+ switch ((inst >> 17) & 0b11111) {
+ case 0: op = rv_op_cbo_inval; break;
+ case 1: op = rv_op_cbo_clean; break;
+ case 2: op = rv_op_cbo_flush; break;
+ case 4: op = rv_op_cbo_zero; break;
+ default: op = rv_op_lq; break;
+ }
}
break;
case 4:
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index cd23b1f3a9..564b745c48 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -199,9 +199,9 @@ ldu ............ ..... 111 ..... 0000011 @i
{
[
# *** RV32 Zicbom Standard Extension ***
+ cbo_inval 0000000 00000 ..... 010 00000 0001111 @sfence_vm
cbo_clean 0000000 00001 ..... 010 00000 0001111 @sfence_vm
cbo_flush 0000000 00010 ..... 010 00000 0001111 @sfence_vm
- cbo_inval 0000000 00000 ..... 010 00000 0001111 @sfence_vm
# *** RV32 Zicboz Standard Extension ***
cbo_zero 0000000 00100 ..... 010 00000 0001111 @sfence_vm
--
2.47.3
^ permalink raw reply related
* [Stable-10.0.11 66/86] target/riscv/csr.c: do not allow mstatus MPV/GVA writes
From: Michael Tokarev @ 2026-06-24 5:02 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Daniel Henrique Barboza, Alistair Francis,
Michael Tokarev
In-Reply-To: <qemu-stable-10.0.11-20260624080222@cover.tls.msk.ru>
From: Daniel Henrique Barboza <daniel.barboza@oss.qualcomm.com>
The priv spec states the following about mstatus.MPV:
"The MPV bit (Machine Previous Virtualization Mode) is written by the
implementation whenever a trap is taken into M-mode."
And, about mstatus.GVA:
"Field GVA (Guest Virtual Address) is written by the implementation
whenever a trap is taken into M-mode."
Both are written during riscv_cpu_do_interrupt(). They're not supposed
to be written by userspace. As far as write_mstatus goes these fields
are read only. The same applies for mstatush.MPV/mstatush.GVA.
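The mechanism behind "read only as far as write_mstatus goes" is a write mask: only bits present in the mask take the new value. A minimal sketch follows; the SK_ constants and csr_masked_write are hypothetical names, and the bit positions are assumed from the RV64 mstatus layout for illustration only.

```c
#include <stdint.h>

/* Bit positions assumed from the RV64 mstatus layout; illustrative only. */
#define SK_MSTATUS_MIE (1ULL << 3)
#define SK_MSTATUS_GVA (1ULL << 38)
#define SK_MSTATUS_MPV (1ULL << 39)

/* Only bits set in 'mask' take the written value; all others keep 'old'. */
uint64_t csr_masked_write(uint64_t old, uint64_t val, uint64_t mask)
{
    return (old & ~mask) | (val & mask);
}
```

Leaving MPV and GVA out of the mask means a CSR write can neither set nor clear them; they change only when the trap path writes them directly.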
Fixes: 03dd405dd5 ("target/riscv: Support MSTATUS.MPV/GVA only when RVH is enabled")
Signed-off-by: Daniel Henrique Barboza <daniel.barboza@oss.qualcomm.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20260514194537.2416243-2-daniel.barboza@oss.qualcomm.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 18645f19578955ec5ff2c40cd2c8753d6bc460c2)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index a62b16feae..5e1b840087 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -2014,9 +2014,6 @@ static RISCVException write_mstatus(CPURISCVState *env, int csrno,
}
if (xl != MXL_RV32 || env->debugger) {
- if (riscv_has_ext(env, RVH)) {
- mask |= MSTATUS_MPV | MSTATUS_GVA;
- }
if ((val & MSTATUS64_UXL) != 0) {
mask |= MSTATUS64_UXL;
}
@@ -2053,7 +2050,7 @@ static RISCVException write_mstatush(CPURISCVState *env, int csrno,
target_ulong val)
{
uint64_t valh = (uint64_t)val << 32;
- uint64_t mask = riscv_has_ext(env, RVH) ? MSTATUS_MPV | MSTATUS_GVA : 0;
+ uint64_t mask = 0;
if (riscv_cpu_cfg(env)->ext_smdbltrp) {
mask |= MSTATUS_MDT;
--
2.47.3
^ permalink raw reply related
* [linux-next:master 8727/15099] drivers/firmware/imx/se_ctrl.c:476:2-7: WARNING: invalid free of devm_ allocated data
From: kernel test robot @ 2026-06-24 5:09 UTC (permalink / raw)
To: Pankaj Gupta; +Cc: oe-kbuild-all, Frank Li
tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
head: 4e5dfb7c84012007c3c7061126491bbc92d71bf1
commit: 2768fdfd55852f0973ddb0866eef14321df2fa47 [8727/15099] firmware: drivers: imx: adds miscdev
config: powerpc64-randconfig-r052-20260624 (https://download.01.org/0day-ci/archive/20260624/202606241320.9gDd2s75-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project a9b492db3d50683e446cd1a5c9ffaf4e92cb77a7)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606241320.9gDd2s75-lkp@intel.com/
cocci warnings: (new ones prefixed by >>)
>> drivers/firmware/imx/se_ctrl.c:476:2-7: WARNING: invalid free of devm_ allocated data
drivers/firmware/imx/se_ctrl.c:487:2-7: WARNING: invalid free of devm_ allocated data
vim +476 drivers/firmware/imx/se_ctrl.c
460
461 static int init_device_context(struct se_if_priv *priv, int ch_id,
462 struct se_if_device_ctx **new_dev_ctx)
463 {
464 struct se_if_device_ctx *dev_ctx;
465 int ret = 0;
466
467 dev_ctx = kzalloc_obj(*dev_ctx, GFP_KERNEL);
468
469 if (!dev_ctx)
470 return -ENOMEM;
471
472 dev_ctx->devname = kasprintf(GFP_KERNEL, "%s0_ch%d",
473 get_se_if_name(priv->if_defs->se_if_type),
474 ch_id);
475 if (!dev_ctx->devname) {
> 476 kfree(dev_ctx);
477 return -ENOMEM;
478 }
479
480 mutex_init(&dev_ctx->fops_lock);
481 dev_ctx->priv = priv;
482 *new_dev_ctx = dev_ctx;
483
484 ret = init_se_shared_mem(dev_ctx);
485 if (ret < 0) {
486 kfree(dev_ctx->devname);
487 kfree(dev_ctx);
488 *new_dev_ctx = NULL;
489 }
490
491 list_add_tail(&dev_ctx->link, &priv->dev_ctx_list);
492 priv->active_devctx_count++;
493
494 return ret;
495 }
496
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply
* [Stable-10.2.4 79/95] target/riscv/cpu_helper.c: add PMA access fault
From: Michael Tokarev @ 2026-06-24 5:05 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Daniel Henrique Barboza, Alistair Francis, Chao Liu,
Michael Tokarev
In-Reply-To: <qemu-stable-10.2.4-20260624080454@cover.tls.msk.ru>
From: Daniel Henrique Barboza <daniel.barboza@oss.qualcomm.com>
We're not doing anything special w.r.t. PMA (Physical Memory Attributes)
related faults, handling them like regular faults that will eventually
turn out to be regular page faults.
Turns out we can't do that. Priv spec section "Virtual Address
Translation Process" mentions:
"If a store to the PTE at address a+va.vpn[i]×PTESIZE would violate a
PMA or PMP check, raise an access-fault exception corresponding to the
original access type."
This means that we should handle PMA violations with access faults, like
we're already doing with PMP. One clear code path where we should throw
a PMA failure, exposed by [1], is the error return from
address_space_ld* call.
There's a separate issue with the error code returned by these calls (they
always return DECODE_ERROR, even for 'rejected' reads), which we're going
to work around by assuming that we did a good job with the PTE address
sanitization beforehand and interpreting the error here as PMA-related.
This is of course not ideal, but fixing this QEMU API is out of scope for
this work.
All this said, we'll set the new pmp_pma_violation flag when we have
either a PMP or a PMA fault, and everything else shall fall into place.
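The resulting selection logic can be sketched on its own. The enums and pick_exception below are hypothetical and deliberately ignore two-stage translation; they only show that a PMA failure is folded into the same flag as a PMP failure and therefore yields an access fault rather than a page fault.

```c
typedef enum {
    SK_T_SUCCESS, SK_T_FAIL, SK_T_PMP_FAIL, SK_T_G_STAGE_FAIL, SK_T_PMA_FAIL
} sk_translate;

typedef enum { SK_ACC_FETCH, SK_ACC_LOAD, SK_ACC_STORE } sk_access;

typedef enum {
    SK_EXC_INST_ACCESS_FAULT, SK_EXC_LOAD_ACCESS_FAULT, SK_EXC_STORE_ACCESS_FAULT,
    SK_EXC_INST_PAGE_FAULT, SK_EXC_LOAD_PAGE_FAULT, SK_EXC_STORE_PAGE_FAULT
} sk_exc;

/* Single-stage only: two-stage (guest) faults are left out of this sketch. */
sk_exc pick_exception(sk_translate ret, sk_access access)
{
    int pmp_pma_violation = (ret == SK_T_PMP_FAIL || ret == SK_T_PMA_FAIL);

    switch (access) {
    case SK_ACC_FETCH:
        return pmp_pma_violation ? SK_EXC_INST_ACCESS_FAULT
                                 : SK_EXC_INST_PAGE_FAULT;
    case SK_ACC_LOAD:
        return pmp_pma_violation ? SK_EXC_LOAD_ACCESS_FAULT
                                 : SK_EXC_LOAD_PAGE_FAULT;
    default: /* SK_ACC_STORE */
        return pmp_pma_violation ? SK_EXC_STORE_ACCESS_FAULT
                                 : SK_EXC_STORE_PAGE_FAULT;
    }
}
```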
[1] https://gitlab.com/qemu-project/qemu/-/work_items/3502
Resolves: https://gitlab.com/qemu-project/qemu/-/work_items/3502
Signed-off-by: Daniel Henrique Barboza <daniel.barboza@oss.qualcomm.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Chao Liu <chao.liu.zevorn@gmail.com>
Message-ID: <20260522172502.320529-1-daniel.barboza@oss.qualcomm.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 8e090ecfa9e5d9c225a9f6aae1e39586c182afa7)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index ee93558668..faf8e1d151 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -131,7 +131,8 @@ enum {
TRANSLATE_SUCCESS,
TRANSLATE_FAIL,
TRANSLATE_PMP_FAIL,
- TRANSLATE_G_STAGE_FAIL
+ TRANSLATE_G_STAGE_FAIL,
+ TRANSLATE_PMA_FAIL,
};
/* Extension context status */
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 97a8fe308d..e2da93700d 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -1374,7 +1374,22 @@ static int get_physical_address(CPURISCVState *env, hwaddr *physical,
}
if (res != MEMTX_OK) {
- return TRANSLATE_FAIL;
+ /*
+ * The result of address_space_* APIs above does not take into
+ * consideration reject reads, putting all errors in the same
+ * cathegory (DECODE_ERROR), although there's a clear
+ * distinction between a rejected read versus other errors
+ * (see memory_region_dispatch_read() ->
+ * memory_region_access_valid()). This is something that
+ * we might have to deal with core QEMU logic some other
+ * day.
+ *
+ * For this particular error path, given that we made checks
+ * w.r.t legal PTE address before calling those APIs, we'll
+ * assume that anything != MEMTX_OK means a rejected read,
+ * i.e. a PMA error.
+ */
+ return TRANSLATE_PMA_FAIL;
}
if (riscv_cpu_sxl(env) == MXL_RV32) {
@@ -1654,7 +1669,8 @@ static int get_physical_address(CPURISCVState *env, hwaddr *physical,
}
static void raise_mmu_exception(CPURISCVState *env, target_ulong address,
- MMUAccessType access_type, bool pmp_violation,
+ MMUAccessType access_type,
+ bool pmp_pma_violation,
bool first_stage, bool two_stage,
bool two_stage_indirect)
{
@@ -1662,7 +1678,7 @@ static void raise_mmu_exception(CPURISCVState *env, target_ulong address,
switch (access_type) {
case MMU_INST_FETCH:
- if (pmp_violation) {
+ if (pmp_pma_violation) {
cs->exception_index = RISCV_EXCP_INST_ACCESS_FAULT;
} else if (env->virt_enabled && !first_stage) {
cs->exception_index = RISCV_EXCP_INST_GUEST_PAGE_FAULT;
@@ -1671,7 +1687,7 @@ static void raise_mmu_exception(CPURISCVState *env, target_ulong address,
}
break;
case MMU_DATA_LOAD:
- if (pmp_violation) {
+ if (pmp_pma_violation) {
cs->exception_index = RISCV_EXCP_LOAD_ACCESS_FAULT;
} else if (two_stage && !first_stage) {
cs->exception_index = RISCV_EXCP_LOAD_GUEST_ACCESS_FAULT;
@@ -1680,7 +1696,7 @@ static void raise_mmu_exception(CPURISCVState *env, target_ulong address,
}
break;
case MMU_DATA_STORE:
- if (pmp_violation) {
+ if (pmp_pma_violation) {
cs->exception_index = RISCV_EXCP_STORE_AMO_ACCESS_FAULT;
} else if (two_stage && !first_stage) {
cs->exception_index = RISCV_EXCP_STORE_GUEST_AMO_ACCESS_FAULT;
@@ -1806,7 +1822,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
vaddr im_address;
hwaddr pa = 0;
int prot, prot2, prot_pmp;
- bool pmp_violation = false;
+ bool pmp_pma_violation = false;
bool first_stage_error = true;
bool two_stage_lookup = mmuidx_2stage(mmu_idx);
bool two_stage_indirect_error = false;
@@ -1907,8 +1923,8 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
}
}
- if (ret == TRANSLATE_PMP_FAIL) {
- pmp_violation = true;
+ if (ret == TRANSLATE_PMP_FAIL || ret == TRANSLATE_PMA_FAIL) {
+ pmp_pma_violation = true;
}
if (ret == TRANSLATE_SUCCESS) {
@@ -1935,7 +1951,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
cpu_check_watchpoint(cs, address, size, MEMTXATTRS_UNSPECIFIED,
wp_access, retaddr);
- raise_mmu_exception(env, address, access_type, pmp_violation,
+ raise_mmu_exception(env, address, access_type, pmp_pma_violation,
first_stage_error, two_stage_lookup,
two_stage_indirect_error);
cpu_loop_exit_restore(cs, retaddr);
--
2.47.3
^ permalink raw reply related
* Re: [PATCH] linux-firmware: upgrade 20260519 -> 20260622
From: Vivek Puar @ 2026-06-24 5:09 UTC (permalink / raw)
To: openembedded-core
In-Reply-To: <lf3gxnthrsftrp3mtqjywqnc5barhfz77uw4l2xshld7gcyzuc@7hzndercqhvl>
On Tue, Jun 23, 2026 at 11:19 PM, Dmitry Baryshkov wrote:
>
> You can also delete REMOVE_UNLICENSED and all special handling for it.
Some LICENSES are still present in linux-firmware.git; once the cleanup is done, we can delete REMOVE_UNLICENSED from the recipe.
>
> s/lpaicp/audio/
Updated in PATCH v2.
Regards,
Vivek
^ permalink raw reply
* [Stable-10.2.4 67/95] libvhost-user: fix buffer overflow in virtqueue_read_indirect_desc()
From: Michael Tokarev @ 2026-06-24 5:05 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Stefano Garzarella, DARKNAVY,
Daniel P. Berrangé, Michael S. Tsirkin, Michael Tokarev
In-Reply-To: <qemu-stable-10.2.4-20260624080454@cover.tls.msk.ru>
From: Stefano Garzarella <sgarzare@redhat.com>
virtqueue_read_indirect_desc() copies an indirect descriptor table
into a buffer in chunks when the table crosses a memory region
boundary. The destination is a struct vring_desc pointer but is
advanced by a byte count, so each increment moves the pointer by
read_len elements instead of read_len bytes, writing beyond the
buffer.
Use a char pointer for the destination so that the arithmetic
advances correctly.
While at it, change the source from a struct vring_desc pointer
to a void pointer: when the table is split across regions,
vu_gpa_to_va() can return a pointer into the middle of a descriptor,
so casting it to a struct vring_desc pointer is wrong. The pointer is
only used as a memcpy() source, so a void pointer is fine.
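The pointer arithmetic at the heart of this bug can be shown with a standalone sketch. struct sk_desc mirrors the 16-byte descriptor layout but is a local, hypothetical type, as are copy_in_chunks and struct_stride_bytes; the chunk size stands in for a memory-region boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same 16-byte shape as a vring descriptor; local to this sketch. */
struct sk_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/* Copy 'len' bytes from src into dst in chunks of at most 'chunk' bytes. */
void copy_in_chunks(struct sk_desc *dst, const void *src,
                    size_t len, size_t chunk)
{
    char *out = (char *)dst;   /* byte-granular destination cursor */
    const char *in = src;

    if (chunk == 0) {
        return;
    }
    while (len) {
        size_t n = len < chunk ? len : chunk;

        memcpy(out, in, n);
        out += n;              /* advances by n bytes, as intended */
        in += n;
        len -= n;
    }
}

/* Distance in bytes that "p += n" covers when p is a struct sk_desc *. */
size_t struct_stride_bytes(size_t n)
{
    return n * sizeof(struct sk_desc);
}
```

With a struct pointer, adding a 16-byte read length would move the cursor 256 bytes (16 elements of 16 bytes each), which is how the second chunk ended up past the end of the buffer.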
Fixes: CVE-2026-6425
Fixes: 293084a719 ("libvhost-user: Support across-memory-boundary access")
Cc: qemu-stable@nongnu.org
Reported-by: DARKNAVY <vr@darknavy.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260417132645.121192-2-sgarzare@redhat.com>
(cherry picked from commit 85805ee887be4adddb1f1a34f526968adc95e582)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 9c630c2170..014d210748 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -2391,8 +2391,9 @@ static int
virtqueue_read_indirect_desc(VuDev *dev, struct vring_desc *desc,
uint64_t addr, size_t len)
{
- struct vring_desc *ori_desc;
+ char *dst_desc = (char *)desc;
uint64_t read_len;
+ void *ori_desc;
if (len > (VIRTQUEUE_MAX_SIZE * sizeof(struct vring_desc))) {
return -1;
@@ -2409,10 +2410,10 @@ virtqueue_read_indirect_desc(VuDev *dev, struct vring_desc *desc,
return -1;
}
- memcpy(desc, ori_desc, read_len);
+ memcpy(dst_desc, ori_desc, read_len);
len -= read_len;
addr += read_len;
- desc += read_len;
+ dst_desc += read_len;
}
return 0;
--
2.47.3
^ permalink raw reply related
* [Stable-10.2.4 69/95] amd_iommu: Follow root pointer before page walk and use 1-based levels
From: Michael Tokarev @ 2026-06-24 5:05 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Alejandro Jimenez, David Hoppenbrouwers,
Sairaj Kodilkar, Michael S. Tsirkin, Michael Tokarev
In-Reply-To: <qemu-stable-10.2.4-20260624080454@cover.tls.msk.ru>
From: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
DTE[Mode] and PTE NextLevel encode page table levels as 1-based values, but
fetch_pte() currently uses a 0-based level counter, making the logic
harder to follow and requiring conversions between DTE mode and level.
Switch the page table walk logic to use 1-based level accounting in
fetch_pte() and the relevant macro helpers. To further simplify the page
walking loop, split the root page table access from the walk i.e. rework
fetch_pte() to follow the DTE Page Table Root Pointer and retrieve the top
level pagetable entry before entering the loop, then iterate only over the
PDE/PTE entries.
The reworked algorithm fixes a page walk bug where the page size was
calculated for the next level before checking if the current PTE was already
a leaf/hugepage. That caused hugepage mappings to be reported as 4K pages,
leading to performance degradation and failures in some setups.
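The 1-based level accounting can be checked with a few lines of arithmetic. The helpers below mirror the shape of the reworked PT_LEVEL_SHIFT and PT_LEVEL_INDEX macros; the function names are hypothetical, and computing the page size as 1 << shift is an assumption about PTE_LEVEL_PAGE_SIZE, which is not shown in the hunk.

```c
#include <stdint.h>

/* 1-based: level 1 is the leaf table mapping 4K pages. Valid for 1..6. */
unsigned pt_level_shift(unsigned level)
{
    return 12 + (level - 1) * 9;
}

/* Assumed page size covered by one entry at the given level. */
uint64_t pt_level_page_size(unsigned level)
{
    return 1ULL << pt_level_shift(level);
}

/* 9-bit IOVA group used to index the table at the given level. */
unsigned pt_level_index(unsigned level, uint64_t iova)
{
    return (unsigned)((iova >> pt_level_shift(level)) & 0x1ff);
}
```

Under this numbering an entry at level 1 covers 4K, level 2 covers 2M and level 3 covers 1G, so the page size recorded for the entry being examined matches its own level rather than the level below it.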
Fixes: a74bb3110a5b ("amd_iommu: Add helpers to walk AMD v1 Page Table format")
Cc: qemu-stable@nongnu.org
Reported-by: David Hoppenbrouwers <qemu@demindiro.com>
Reviewed-By: David Hoppenbrouwers <qemu@demindiro.com>
Reviewed-by: Sairaj Kodilkar <sarunkod@amd.com>
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260330212817.992673-2-alejandro.j.jimenez@oracle.com>
(cherry picked from commit 786550e2d38a92e90c13eb9d57e3a72d7ab38d51)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index c4b3a9c78d..86dab42624 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -648,6 +648,52 @@ static uint64_t large_pte_page_size(uint64_t pte)
return PTE_LARGE_PAGE_SIZE(pte);
}
+/*
+ * Validate DTE fields and extract permissions and top level data required to
+ * initiate the page table walk.
+ *
+ * On success, returns 0 and stores:
+ * - top_level: highest page-table level encoded in DTE[Mode]
+ * - dte_perms: effective permissions from the DTE
+ *
+ * On failure, returns -AMDVI_FR_PT_ROOT_INV. This includes cases where:
+ * - DTE permissions disallow read AND write
+ * - DTE[Mode] is invalid for translation
+ * - IOVA exceeds the address width supported by DTE[Mode]
+ * In all such cases a page walk must be aborted.
+ */
+static uint64_t amdvi_get_top_pt_level_and_perms(hwaddr address, uint64_t dte,
+ uint8_t *top_level,
+ IOMMUAccessFlags *dte_perms)
+{
+ *dte_perms = amdvi_get_perms(dte);
+ if (*dte_perms == IOMMU_NONE) {
+ return -AMDVI_FR_PT_ROOT_INV;
+ }
+
+ /* Verifying a valid mode is encoded in DTE */
+ *top_level = get_pte_translation_mode(dte);
+
+ /*
+ * Page Table Root pointer is only valid for GPA->SPA translation on
+ * supported modes.
+ */
+ if (*top_level == 0 || *top_level > 6) {
+ return -AMDVI_FR_PT_ROOT_INV;
+ }
+
+ /*
+ * If IOVA is larger than the max supported by the highest pgtable level,
+ * there is nothing to do.
+ */
+ if (address > PT_LEVEL_MAX_ADDR(*top_level)) {
+ /* IOVA too large for the current DTE */
+ return -AMDVI_FR_PT_ROOT_INV;
+ }
+
+ return 0;
+}
+
/*
* Helper function to fetch a PTE using AMD v1 pgtable format.
* On successful page walk, returns 0 and pte parameter points to a valid PTE.
@@ -662,40 +708,49 @@ static uint64_t large_pte_page_size(uint64_t pte)
static uint64_t fetch_pte(AMDVIAddressSpace *as, hwaddr address, uint64_t dte,
uint64_t *pte, hwaddr *page_size)
{
- IOMMUAccessFlags perms = amdvi_get_perms(dte);
-
- uint8_t level, mode;
uint64_t pte_addr;
+ uint8_t pt_level, next_pt_level;
+ IOMMUAccessFlags perms;
+ int ret;
- *pte = dte;
*page_size = 0;
- if (perms == IOMMU_NONE) {
- return -AMDVI_FR_PT_ROOT_INV;
- }
-
/*
- * The Linux kernel driver initializes the default mode to 3, corresponding
- * to a 39-bit GPA space, where each entry in the pagetable translates to a
- * 1GB (2^30) page size.
+ * Verify the DTE is properly configured before page walk, and extract
+ * top pagetable level and permissions.
*/
- level = mode = get_pte_translation_mode(dte);
- assert(mode > 0 && mode < 7);
+ ret = amdvi_get_top_pt_level_and_perms(address, dte, &pt_level, &perms);
+ if (ret < 0) {
+ return ret;
+ }
/*
- * If IOVA is larger than the max supported by the current pgtable level,
- * there is nothing to do.
+ * Retrieve the top pagetable entry by following the DTE Page Table Root
+ * Pointer and indexing the top level table using the IOVA from the request.
*/
- if (address > PT_LEVEL_MAX_ADDR(mode - 1)) {
- /* IOVA too large for the current DTE */
+ pte_addr = NEXT_PTE_ADDR(dte, pt_level, address);
+ *pte = amdvi_get_pte_entry(as->iommu_state, pte_addr, as->devfn);
+
+ if (*pte == (uint64_t)-1) {
+ /*
+ * A returned PTE of -1 here indicates a failure to read the top level
+ * page table from guest memory. A page walk is not possible and page
+ * size must be returned as 0.
+ */
return -AMDVI_FR_PT_ROOT_INV;
}
- do {
- level -= 1;
+ /*
+ * Calculate page size for the top level page table entry.
+ * This ensures correct results for a single level Page Table setup.
+ */
+ *page_size = PTE_LEVEL_PAGE_SIZE(pt_level);
- /* Update the page_size */
- *page_size = PTE_LEVEL_PAGE_SIZE(level);
+ /*
+ * The root page table entry and its level have been determined. Begin the
+ * page walk.
+ */
+ while (pt_level > 0) {
/* Permission bits are ANDed at every level, including the DTE */
perms &= amdvi_get_perms(*pte);
@@ -708,37 +763,34 @@ static uint64_t fetch_pte(AMDVIAddressSpace *as, hwaddr address, uint64_t dte,
return 0;
}
+ next_pt_level = PTE_NEXT_LEVEL(*pte);
+
/* Large or Leaf PTE found */
- if (PTE_NEXT_LEVEL(*pte) == 7 || PTE_NEXT_LEVEL(*pte) == 0) {
+ if (next_pt_level == 0 || next_pt_level == 7) {
/* Leaf PTE found */
break;
}
+ pt_level = next_pt_level;
+
/*
- * Index the pgtable using the IOVA bits corresponding to current level
- * and walk down to the lower level.
+ * The current entry is a Page Directory Entry. Descend to the lower
+ * page table level encoded in current pte, and index the new table
+ * using the appropriate IOVA bits to retrieve the new entry.
*/
- pte_addr = NEXT_PTE_ADDR(*pte, level, address);
+ *page_size = PTE_LEVEL_PAGE_SIZE(pt_level);
+
+ pte_addr = NEXT_PTE_ADDR(*pte, pt_level, address);
*pte = amdvi_get_pte_entry(as->iommu_state, pte_addr, as->devfn);
if (*pte == (uint64_t)-1) {
- /*
- * A returned PTE of -1 indicates a failure to read the page table
- * entry from guest memory.
- */
- if (level == mode - 1) {
- /* Failure to retrieve the Page Table from Root Pointer */
- *page_size = 0;
- return -AMDVI_FR_PT_ROOT_INV;
- } else {
- /* Failure to read PTE. Page walk skips a page_size chunk */
- return -AMDVI_FR_PT_ENTRY_INV;
- }
+ /* Failure to read PTE. Page walk skips a page_size chunk */
+ return -AMDVI_FR_PT_ENTRY_INV;
}
- } while (level > 0);
+ }
+
+ assert(PTE_NEXT_LEVEL(*pte) == 0 || PTE_NEXT_LEVEL(*pte) == 7);
- assert(PTE_NEXT_LEVEL(*pte) == 0 || PTE_NEXT_LEVEL(*pte) == 7 ||
- level == 0);
/*
* Page walk ends when Next Level field on PTE shows that either a leaf PTE
* or a series of large PTEs have been reached. In the latter case, even if
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index 302ccca512..7af3c742b7 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -186,17 +186,16 @@
#define IOMMU_PTE_PRESENT(pte) ((pte) & AMDVI_PTE_PR)
-/* Using level=0 for leaf PTE at 4K page size */
-#define PT_LEVEL_SHIFT(level) (12 + ((level) * 9))
+/* Using level=1 for leaf PTE at 4K page size */
+#define PT_LEVEL_SHIFT(level) (12 + (((level) - 1) * 9))
/* Return IOVA bit group used to index the Page Table at specific level */
#define PT_LEVEL_INDEX(level, iova) (((iova) >> PT_LEVEL_SHIFT(level)) & \
GENMASK64(8, 0))
-/* Return the max address for a specified level i.e. max_oaddr */
-#define PT_LEVEL_MAX_ADDR(x) (((x) < 5) ? \
- ((1ULL << PT_LEVEL_SHIFT((x + 1))) - 1) : \
- (~(0ULL)))
+/* Return the maximum output address for a specified page table level */
+#define PT_LEVEL_MAX_ADDR(level) (((level) > 5) ? (~(0ULL)) : \
+ ((1ULL << PT_LEVEL_SHIFT((level) + 1)) - 1))
/* Extract the NextLevel field from PTE/PDE */
#define PTE_NEXT_LEVEL(pte) (((pte) & AMDVI_PTE_NEXT_LEVEL_MASK) >> 9)
--
2.47.3
^ permalink raw reply related