All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Philippe Mathieu-Daudé" <philmd@linaro.org>
Cc: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>,
	qemu-devel@nongnu.org,
	Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	Mauro Matteo Cascella <mcascell@redhat.com>,
	qemu-stable@nongnu.org,
	Corentin BAYET <corentin.bayet@reversetactics.com>
Subject: Re: [PATCH-for-10.1? v2] pcie_sriov: Fix configuration and state synchronization
Date: Mon, 28 Jul 2025 06:19:18 -0400	[thread overview]
Message-ID: <20250728061838-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <0e122e2a-898f-46ca-b484-291b0091d928@linaro.org>

On Mon, Jul 28, 2025 at 12:09:34PM +0200, Philippe Mathieu-Daudé wrote:
> Hi Akihiko, Michael,
> 
> On 27/7/25 08:50, Akihiko Odaki wrote:
> > Fix issues in PCIe SR-IOV configuration register handling that caused
> > inconsistent internal state due to improper write mask handling and
> > incorrect migration behavior.
> > 
> > Two main problems were identified:
> > 
> > 1. VF Enable bit write mask handling:
> >     pcie_sriov_config_write() incorrectly assumed that its val parameter
> >     was already masked, causing it to ignore the actual write mask.
> >     This led to the VF Enable bit being processed even when masked,
> >     resulting in incorrect VF registration/unregistration. It is
> >     identified as CVE-2025-54567.
> > 
> > 2. Migration state inconsistency:
> >     pcie_sriov_pf_post_load() unconditionally called register_vfs()
> >     regardless of the VF Enable bit state, creating inconsistent
> >     internal state when VFs should not be enabled. Additionally,
> >     it failed to properly update the NumVFs write mask based on
> >     the current configuration. It is identified as CVE-2025-54566.
> > 
> > Root cause analysis revealed that both functions relied on incorrect
> > special-case assumptions instead of properly reading and consuming
> > the actual configuration values. This change introduces a unified
> > consume_config() function that reads actual configuration values and
> > synchronize the internal state without special-case assumptions.
> > 
> > The solution only adds register read overhead in non-hot-path code
> > while ensuring correct SR-IOV state management across configuration
> > writes and migration scenarios.
> > 
> > Fixes: 5e7dd17e4348 ("pcie_sriov: Remove num_vfs from PCIESriovPF")
> > Fixes: f9efcd47110d ("pcie_sriov: Register VFs after migration")
> > Fixes: CVE-2025-54566
> > Fixes: CVE-2025-54567
> > Cc: qemu-stable@nongnu.org
> > Reported-by: Corentin BAYET <corentin.bayet@reversetactics.com>
> > Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> > ---
> > Changes in v2:
> > - Changed to perform the VFEnable write mask update only when the bit is
> >    cleared. It clarifies the intention is to prevent setting the bit
> >    (i.e., the bit is currently cleared) when the NumVF holds an invalid
> >    value. The code execution when the bit is set will be also a bit
> >    shorter.
> > - Added references to the relevant CVEs.
> > - Link to v1: https://lore.kernel.org/qemu-devel/20250713-wmask-v1-1-4c744cdb32c0@rsg.ci.i.u-tokyo.ac.jp
> > ---
> >   hw/pci/pcie_sriov.c | 42 +++++++++++++++++++++++-------------------
> >   1 file changed, 23 insertions(+), 19 deletions(-)
> > 
> > diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
> > index 3ad18744f4a8ed2b35144fafcdc8e7e00fec3672..8a4bf0d6f7c0c6e9ec30df2e9bc55967e48cf6c3 100644
> > --- a/hw/pci/pcie_sriov.c
> > +++ b/hw/pci/pcie_sriov.c
> > @@ -64,6 +64,27 @@ static void unregister_vfs(PCIDevice *dev)
> >       pci_set_word(dev->wmask + dev->exp.sriov_cap + PCI_SRIOV_NUM_VF, 0xffff);
> >   }
> > +static void consume_config(PCIDevice *dev)
> > +{
> > +    uint8_t *cfg = dev->config + dev->exp.sriov_cap;
> > +
> > +    if (pci_get_word(cfg + PCI_SRIOV_CTRL) & PCI_SRIOV_CTRL_VFE) {
> > +        register_vfs(dev);
> > +    } else {
> > +        uint8_t *wmask = dev->wmask + dev->exp.sriov_cap;
> > +        uint16_t num_vfs = pci_get_word(cfg + PCI_SRIOV_NUM_VF);
> > +        uint16_t wmask_val = PCI_SRIOV_CTRL_MSE | PCI_SRIOV_CTRL_ARI;
> > +
> > +        unregister_vfs(dev);
> > +
> > +        if (num_vfs <= pci_get_word(cfg + PCI_SRIOV_TOTAL_VF)) {
> > +            wmask_val |= PCI_SRIOV_CTRL_VFE;
> > +        }
> > +
> > +        pci_set_word(wmask + PCI_SRIOV_CTRL, wmask_val);
> > +    }
> > +}
> > +
> >   static bool pcie_sriov_pf_init_common(PCIDevice *dev, uint16_t offset,
> >                                         uint16_t vf_dev_id, uint16_t init_vfs,
> >                                         uint16_t total_vfs, uint16_t vf_offset,
> > @@ -416,30 +437,13 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
> >       trace_sriov_config_write(dev->name, PCI_SLOT(dev->devfn),
> >                                PCI_FUNC(dev->devfn), off, val, len);
> > -    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> > -        if (val & PCI_SRIOV_CTRL_VFE) {
> > -            register_vfs(dev);
> > -        } else {
> > -            unregister_vfs(dev);
> > -        }
> > -    } else if (range_covers_byte(off, len, PCI_SRIOV_NUM_VF)) {
> > -        uint8_t *cfg = dev->config + sriov_cap;
> > -        uint8_t *wmask = dev->wmask + sriov_cap;
> > -        uint16_t num_vfs = pci_get_word(cfg + PCI_SRIOV_NUM_VF);
> > -        uint16_t wmask_val = PCI_SRIOV_CTRL_MSE | PCI_SRIOV_CTRL_ARI;
> > -
> > -        if (num_vfs <= pci_get_word(cfg + PCI_SRIOV_TOTAL_VF)) {
> > -            wmask_val |= PCI_SRIOV_CTRL_VFE;
> > -        }
> > -
> > -        pci_set_word(wmask + PCI_SRIOV_CTRL, wmask_val);
> > -    }
> > +    consume_config(dev);
> 
> As usual, this would be simpler to review extracting consume_config() in
> a preliminary patch, then the real fix.
> 
> >   }
> >   void pcie_sriov_pf_post_load(PCIDevice *dev)
> >   {
> >       if (dev->exp.sriov_cap) {
> > -        register_vfs(dev);
> > +        consume_config(dev);
> >       }
> >   }
> 
> Michael, do we want this for the 10.1 release?
> 
> Regards,
> 
> Phil.

Think so, I'm testing a pull with this included.



  reply	other threads:[~2025-07-28 10:40 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-27  6:50 [PATCH v2] pcie_sriov: Fix configuration and state synchronization Akihiko Odaki
2025-07-28 10:09 ` [PATCH-for-10.1? " Philippe Mathieu-Daudé
2025-07-28 10:19   ` Michael S. Tsirkin [this message]
2025-07-28 13:42     ` Philippe Mathieu-Daudé

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250728061838-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=corentin.bayet@reversetactics.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mcascell@redhat.com \
    --cc=odaki@rsg.ci.i.u-tokyo.ac.jp \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-stable@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.