All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Roger Pau Monné" <roger.pau@citrix.com>
To: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
Cc: xen-devel@lists.xenproject.org,
	Jason Andryuk <jandryuk@gmail.com>,
	Jan Beulich <jbeulich@suse.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>, Wei Liu <wl@xen.org>
Subject: Re: [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table
Date: Tue, 28 Mar 2023 13:28:44 +0200	[thread overview]
Message-ID: <ZCLNQGXvUBxZbIGS@Air-de-Roger> (raw)
In-Reply-To: <20230325024924.882883-2-marmarek@invisiblethingslab.com>

On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote:
> Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers
> on the same page as MSI-X table. Device model (especially one in
> stubdomain) cannot really handle those, as direct writes to that page is
> refused (page is on the mmio_ro_ranges list). Instead, add internal ioreq
> server that handle those writes.
> 
> Doing this, requires correlating write location with guest view
> of MSI-X table address. Since QEMU doesn't map MSI-X table to the guest,
> it requires msixtbl_entry->gtable, which is HVM-only. Similar feature
> for PV would need to be done separately.
> 
> This can be also used to read Pending Bit Array, if it lives on the same
> page, making QEMU not needing /dev/mem access at all (especially helpful
> with lockdown enabled in dom0). If PBA lives on another page, QEMU will
> map it to the guest directly.
> If PBA lives on the same page, forbid writes and log a message.
> Technically, writes outside of PBA could be allowed, but at this moment
> the precise location of PBA isn't saved.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> ---
> v2:
>  - adjust commit message
>  - pass struct domain to msixtbl_page_handler_get_hwaddr()
>  - reduce local variables used only once
>  - log a warning if write is forbidden if MSI-X and PBA lives on the same
>    page
>  - do not passthrough unaligned accesses
>  - handle accesses both before and after MSI-X table
> ---
>  xen/arch/x86/hvm/vmsi.c        | 154 +++++++++++++++++++++++++++++++++
>  xen/arch/x86/include/asm/msi.h |   5 ++
>  xen/arch/x86/msi.c             |  38 ++++++++
>  3 files changed, 197 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index 9c82bf9b4ec2..9293009a4075 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -438,6 +438,152 @@ static const struct hvm_io_ops msixtbl_mmio_ops = {
>      .write = _msixtbl_write,
>  };
>  
> +static void __iomem *msixtbl_page_handler_get_hwaddr(
> +    const struct domain *d,
> +    uint64_t address,
> +    bool write)
> +{
> +    const struct pci_dev *pdev = NULL;
> +    const struct msixtbl_entry *entry;
> +    int adj_type;

unsigned AFAICT.

> +
> +    rcu_read_lock(&msixtbl_rcu_lock);
> +    /*
> +     * Check if it's on the same page as the end of the MSI-X table, but
> +     * outside of the table itself.
> +     */
> +    list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list )
> +    {
> +        if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable) &&
> +             address < entry->gtable )
> +        {
> +            adj_type = ADJ_IDX_FIRST;
> +            pdev = entry->pdev;
> +            break;
> +        }
> +        if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable + entry->table_len) &&

Should be entry->gtable + entry->table_len - 1, or else you are
off-by-one.

> +             address >= entry->gtable + entry->table_len )
> +        {
> +            adj_type = ADJ_IDX_LAST;
> +            pdev = entry->pdev;
> +            break;
> +        }
> +    }
> +    rcu_read_unlock(&msixtbl_rcu_lock);
> +
> +    if ( !pdev )
> +        return NULL;
> +
> +    ASSERT(pdev->msix);
> +
> +    if ( !pdev->msix->adj_access_table_idx[adj_type] )
> +    {
> +        gdprintk(XENLOG_WARNING,
> +                 "Page for adjacent MSI-X table access not initialized for %pp\n",
> +                 pdev);
> +
> +        return NULL;
> +    }
> +
> +    /* If PBA lives on the same page too, forbid writes. */
> +    if ( write && (
> +        (adj_type == ADJ_IDX_LAST &&
> +         pdev->msix->table.last == pdev->msix->pba.first) ||
> +        (adj_type == ADJ_IDX_FIRST &&
> +         pdev->msix->table.first == pdev->msix->pba.last)) )
> +    {
> +        gdprintk(XENLOG_WARNING,
> +                 "MSI-X table and PBA of %pp live on the same page, "
> +                 "writing to other registers there is not implemented\n",
> +                 pdev);

Don't you actually need to check that the passed address falls into the
PBA array?  PBA can be sharing the same page as the MSI-X table, but
that doesn't imply there aren't holes also not used by either the PBA
or the MSI-X table in such page.

> +        return NULL;
> +    }
> +
> +    return fix_to_virt(pdev->msix->adj_access_table_idx[adj_type]) +
> +        (address & (PAGE_SIZE - 1));

You can use PAGE_OFFSET().

> +
> +}
> +
> +static bool cf_check msixtbl_page_accept(
> +        const struct hvm_io_handler *handler, const ioreq_t *r)
> +{
> +    ASSERT(r->type == IOREQ_TYPE_COPY);
> +
> +    return msixtbl_page_handler_get_hwaddr(
> +            current->domain, r->addr, r->dir == IOREQ_WRITE);

I think you want to accept it also if it's a write to the PBA, and
just drop it.  You should always pass write=false and then drop it in
msixtbl_page_write() if it falls in the PBA region (but still return
X86EMUL_OKAY).

> +}
> +
> +static int cf_check msixtbl_page_read(
> +        const struct hvm_io_handler *handler,
> +        uint64_t address, uint32_t len, uint64_t *pval)

Why use hvm_io_ops and not hvm_mmio_ops?  You only care about MMIO
accesses here.

> +{
> +    void __iomem *hwaddr;

const

I would also initialize *pval = ~0ul for safety.

> +
> +    if ( address & (len - 1) )
> +        return X86EMUL_UNHANDLEABLE;

You can use IS_ALIGNED for clarity.  In my fix for this for vPCI Jan
asked to split unaligned accesses into 1byte ones, but I think for
guests it's better to just drop them unless there's a specific case
that we need to handle.

Also you should return X86EMUL_OKAY here, as the address is handled
here, but the current way to handle it is to drop the access.

Printing a debug message can be helpful in case unaligned accesses
need to be implemented in order to support some device.

> +
> +    hwaddr = msixtbl_page_handler_get_hwaddr(
> +            current->domain, address, false);
> +
> +    if ( !hwaddr )
> +        return X86EMUL_UNHANDLEABLE;

Maybe X86EMUL_RETRY, since the page was there in the accept handler.

> +
> +    switch ( len )
> +    {
> +    case 1:
> +        *pval = readb(hwaddr);
> +        break;
> +    case 2:
> +        *pval = readw(hwaddr);
> +        break;
> +    case 4:
> +        *pval = readl(hwaddr);
> +        break;
> +    case 8:
> +        *pval = readq(hwaddr);
> +        break;

Nit: we usually add a newline after every break;

> +    default:
> +        return X86EMUL_UNHANDLEABLE;

I would rather use ASSERT_UNREACHABLE() here and fallthrough to the
return below.  Seeing such size is likely an indication of something
else gone very wrong, better to just terminate the access at once.

> +    }
> +    return X86EMUL_OKAY;
> +}
> +
> +static int cf_check msixtbl_page_write(
> +        const struct hvm_io_handler *handler,
> +        uint64_t address, uint32_t len, uint64_t val)
> +{
> +    void __iomem *hwaddr = msixtbl_page_handler_get_hwaddr(
> +            current->domain, address, true);
> +

You don't seem to check whether the access is aligned here?

Otherwise I have mostly the same comments as in msixtbl_page_read().

> +    if ( !hwaddr )
> +        return X86EMUL_UNHANDLEABLE;
> +
> +    switch ( len ) {
> +        case 1:
> +            writeb(val, hwaddr);
> +            break;
> +        case 2:
> +            writew(val, hwaddr);
> +            break;
> +        case 4:
> +            writel(val, hwaddr);
> +            break;
> +        case 8:
> +            writeq(val, hwaddr);
> +            break;
> +        default:
> +            return X86EMUL_UNHANDLEABLE;
> +    }
> +    return X86EMUL_OKAY;
> +
> +}
> +
> +static const struct hvm_io_ops msixtbl_mmio_page_ops = {
> +    .accept = msixtbl_page_accept,
> +    .read = msixtbl_page_read,
> +    .write = msixtbl_page_write,
> +};
> +
>  static void add_msixtbl_entry(struct domain *d,
>                                struct pci_dev *pdev,
>                                uint64_t gtable,
> @@ -593,6 +739,14 @@ void msixtbl_init(struct domain *d)
>          handler->type = IOREQ_TYPE_COPY;
>          handler->ops = &msixtbl_mmio_ops;
>      }
> +
> +    /* passthrough access to other registers on the same page */
> +    handler = hvm_next_io_handler(d);
> +    if ( handler )
> +    {
> +        handler->type = IOREQ_TYPE_COPY;
> +        handler->ops = &msixtbl_mmio_page_ops;
> +    }
>  }
>  
>  void msixtbl_pt_cleanup(struct domain *d)
> diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
> index a53ade95c9ad..d13cf1c1f873 100644
> --- a/xen/arch/x86/include/asm/msi.h
> +++ b/xen/arch/x86/include/asm/msi.h
> @@ -207,6 +207,10 @@ struct msg_address {
>                                         PCI_MSIX_ENTRY_SIZE + \
>                                         (~PCI_MSIX_BIRMASK & (PAGE_SIZE - 1)))
>  
> +/* indexes in adj_access_table_idx[] below */
> +#define ADJ_IDX_FIRST 0
> +#define ADJ_IDX_LAST  1
> +
>  struct arch_msix {
>      unsigned int nr_entries, used_entries;
>      struct {
> @@ -214,6 +218,7 @@ struct arch_msix {
>      } table, pba;
>      int table_refcnt[MAX_MSIX_TABLE_PAGES];
>      int table_idx[MAX_MSIX_TABLE_PAGES];
> +    int adj_access_table_idx[2];
>      spinlock_t table_lock;
>      bool host_maskall, guest_maskall;
>      domid_t warned;
> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> index d0bf63df1def..680853f84685 100644
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -961,6 +961,34 @@ static int msix_capability_init(struct pci_dev *dev,
>                  domain_crash(d);
>              /* XXX How to deal with existing mappings? */
>          }
> +
> +        /*
> +         * If the MSI-X table doesn't start at the page boundary, map the first page for
> +         * passthrough accesses.
> +         */

I think you should initialize
msix->adj_access_table_idx[ADJ_IDX_{FIRST,LAST}] to -1?

> +        if ( table_paddr & (PAGE_SIZE - 1) )
> +        {
> +            int idx = msix_get_fixmap(msix, table_paddr, table_paddr);
> +
> +            if ( idx >= 0 )
> +                msix->adj_access_table_idx[ADJ_IDX_FIRST] = idx;
> +            else
> +                gprintk(XENLOG_ERR, "Failed to map first MSI-X table page: %d\n", idx);
> +        }
> +        /*
> +         * If the MSI-X table doesn't span full page(s), map the last page for
> +         * passthrough accesses.
> +         */
> +        if ( (table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE) & (PAGE_SIZE - 1) )
> +        {
> +            uint64_t entry_paddr = table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE;
> +            int idx = msix_get_fixmap(msix, table_paddr, entry_paddr);
> +
> +            if ( idx >= 0 )
> +                msix->adj_access_table_idx[ADJ_IDX_LAST] = idx;
> +            else
> +                gprintk(XENLOG_ERR, "Failed to map last MSI-X table page: %d\n", idx);
> +        }
>      }
>      WARN_ON(msix->table.first != (table_paddr >> PAGE_SHIFT));
>      ++msix->used_entries;
> @@ -1090,6 +1118,16 @@ static void _pci_cleanup_msix(struct arch_msix *msix)
>              WARN();
>          msix->table.first = 0;
>          msix->table.last = 0;
> +        if ( msix->adj_access_table_idx[ADJ_IDX_FIRST] )
> +        {
> +            msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_FIRST]);
> +            msix->adj_access_table_idx[ADJ_IDX_FIRST] = 0;
> +        }
> +        if ( msix->adj_access_table_idx[ADJ_IDX_LAST] )
> +        {
> +            msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_LAST]);
> +            msix->adj_access_table_idx[ADJ_IDX_LAST] = 0;

Isn't 0 a valid idx?  You should check for >= 0 and also set
to -1 once the fixmap entry has been put.

Thanks, Roger.


  parent reply	other threads:[~2023-03-28 11:29 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-25  2:49 [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model Marek Marczykowski-Górecki
2023-03-25  2:49 ` [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table Marek Marczykowski-Górecki
2023-03-27 10:47   ` Andrew Cooper
2023-03-28 11:28   ` Roger Pau Monné [this message]
2023-03-28 12:05     ` Marek Marczykowski-Górecki
2023-03-28 12:34       ` Jan Beulich
2023-03-28 12:52         ` Marek Marczykowski-Górecki
2023-03-28 13:03           ` Jan Beulich
2023-03-28 13:22             ` Roger Pau Monné
2023-04-03  4:21             ` Marek Marczykowski-Górecki
2023-04-03 11:09               ` Jan Beulich
2023-03-28 12:52       ` Roger Pau Monné
2023-03-25  2:49 ` [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot Marek Marczykowski-Górecki
2023-03-28 11:37   ` Roger Pau Monné
2023-03-28 12:07     ` Marek Marczykowski-Górecki
2023-03-28 12:38       ` Jan Beulich
2023-03-28 12:54   ` Jan Beulich
2023-03-28 13:04     ` Marek Marczykowski-Górecki
2023-03-28 13:17       ` Jason Andryuk
2023-03-28 17:38         ` Jason Andryuk
2023-03-28 13:23       ` Jan Beulich
2023-03-28 13:27         ` Roger Pau Monné
2023-03-28 13:32           ` Jason Andryuk
2023-03-28 13:35             ` Jan Beulich
2023-03-28 13:43               ` Jason Andryuk
2023-03-28 13:54                 ` Jan Beulich
2023-03-28 15:08                   ` Jason Andryuk
2023-03-28 15:39                     ` Jan Beulich
2023-03-27 10:12 ` [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model Roger Pau Monné
2023-03-27 10:26   ` Marek Marczykowski-Górecki
2023-03-27 10:51     ` Roger Pau Monné
2023-03-27 11:34       ` Marek Marczykowski-Górecki
2023-03-27 13:29         ` Roger Pau Monné
2023-03-27 14:20           ` Marek Marczykowski-Górecki
2023-03-27 14:37             ` Roger Pau Monné
2023-03-27 15:32   ` Jan Beulich
2023-03-27 15:42     ` Roger Pau Monné

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZCLNQGXvUBxZbIGS@Air-de-Roger \
    --to=roger.pau@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jandryuk@gmail.com \
    --cc=jbeulich@suse.com \
    --cc=marmarek@invisiblethingslab.com \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.