All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Roger Pau Monné" <roger.pau@citrix.com>
To: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
Cc: xen-devel@lists.xenproject.org, Jan Beulich <jbeulich@suse.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [PATCH v7 2/6] x86/hvm: Allow access to registers on the same page as MSI-X table
Date: Thu, 9 May 2024 09:35:29 +0200	[thread overview]
Message-ID: <Zjx8wf3m6AlQFbLT@macbook> (raw)
In-Reply-To: <Zjv0Ea3hQyzFwpmc@mail-itl>

On Wed, May 08, 2024 at 11:52:17PM +0200, Marek Marczykowski-Górecki wrote:
> On Wed, May 08, 2024 at 06:09:48PM +0200, Roger Pau Monné wrote:
> > On Tue, May 07, 2024 at 02:44:02PM +0200, Marek Marczykowski-Górecki wrote:
> > > Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers
> > > on the same page as MSI-X table. Device model (especially one in
> > > stubdomain) cannot really handle those, as direct writes to that page is
> > > refused (page is on the mmio_ro_ranges list). Instead, extend
> > > msixtbl_mmio_ops to handle such accesses too.
> > > 
> > > Doing this, requires correlating read/write location with guest
> > > MSI-X table address. Since QEMU doesn't map MSI-X table to the guest,
> > > it requires msixtbl_entry->gtable, which is HVM-only. Similar feature
> > > for PV would need to be done separately.
> > > 
> > > This will be also used to read Pending Bit Array, if it lives on the same
> > > page, making QEMU not needing /dev/mem access at all (especially helpful
> > > with lockdown enabled in dom0). If PBA lives on another page, QEMU will
> > > map it to the guest directly.
> > > If PBA lives on the same page, discard writes and log a message.
> > > Technically, writes outside of PBA could be allowed, but at this moment
> > > the precise location of PBA isn't saved, and also no known device abuses
> > > the spec in this way (at least yet).
> > > 
> > > To access those registers, msixtbl_mmio_ops need the relevant page
> > > mapped. MSI handling already has infrastructure for that, using fixmap,
> > > so try to map first/last page of the MSI-X table (if necessary) and save
> > > their fixmap indexes. Note that msix_get_fixmap() does reference
> > > counting and reuses existing mapping, so just call it directly, even if
> > > the page was mapped before. Also, it uses a specific range of fixmap
> > > indexes which doesn't include 0, so use 0 as default ("not mapped")
> > > value - which simplifies code a bit.
> > > 
> > > Based on assumption that all MSI-X page accesses are handled by Xen, do
> > > not forward adjacent accesses to other hypothetical ioreq servers, even
> > > if the access wasn't handled for some reason (failure to map pages etc).
> > > Relevant places log a message about that already.
> > > 
> > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > 
> > Thanks, just a couple of minor comments, I think the only relevant one
> > is that you can drop ADJACENT_DONT_HANDLE unless there's something
> > I'm missing.  The rest are mostly cosmetic, but if you have to respin
> > and agree with them might be worth addressing.
> > 
> > Sorry for giving this feedback so late in the process, I should have
> > attempted to review earlier versions.
> > 
> > > ---
> > > Changes in v7:
> > > - simplify logic based on assumption that all access to MSI-X pages are
> > >   handled by Xen (Roger)
> > > - move calling adjacent_handle() into adjacent_{read,write}() (Roger)
> > > - move range check into msixtbl_addr_to_desc() (Roger)
> > > - fix off-by-one when initializing adj_access_idx[ADJ_IDX_LAST] (Roger)
> > > - no longer distinguish between unhandled write due to PBA nearby and
> > >   other reasons
> > > - add missing break after ASSERT_UNREACHABLE (Jan)
> > > Changes in v6:
> > > - use MSIX_CHECK_WARN macro
> > > - extend assert on fixmap_idx
> > > - add break in default label, after ASSERT_UNREACHABLE(), and move
> > >   setting default there
> > > - style fixes
> > > Changes in v5:
> > > - style fixes
> > > - include GCC version in the commit message
> > > - warn only once (per domain, per device) about failed adjacent access
> > > Changes in v4:
> > > - drop same_page parameter of msixtbl_find_entry(), distinguish two
> > >   cases in relevant callers
> > > - rename adj_access_table_idx to adj_access_idx
> > > - code style fixes
> > > - drop alignment check in adjacent_{read,write}() - all callers already
> > >   have it earlier
> > > - delay mapping first/last MSI-X pages until preparing device for a
> > >   passthrough
> > > v3:
> > >  - merge handling into msixtbl_mmio_ops
> > >  - extend commit message
> > > v2:
> > >  - adjust commit message
> > >  - pass struct domain to msixtbl_page_handler_get_hwaddr()
> > >  - reduce local variables used only once
> > >  - log a warning if write is forbidden if MSI-X and PBA lives on the same
> > >    page
> > >  - do not passthrough unaligned accesses
> > >  - handle accesses both before and after MSI-X table
> > > ---
> > >  xen/arch/x86/hvm/vmsi.c        | 205 ++++++++++++++++++++++++++++++++--
> > >  xen/arch/x86/include/asm/msi.h |   5 +-
> > >  xen/arch/x86/msi.c             |  42 +++++++-
> > >  3 files changed, 242 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> > > index 999917983789..f7b7b4998b5e 100644
> > > --- a/xen/arch/x86/hvm/vmsi.c
> > > +++ b/xen/arch/x86/hvm/vmsi.c
> > > @@ -180,6 +180,10 @@ static bool msixtbl_initialised(const struct domain *d)
> > >      return d->arch.hvm.msixtbl_list.next;
> > >  }
> > >  
> > > +/*
> > > + * Lookup an msixtbl_entry on the same page as given addr. It's up to the
> > > + * caller to check if address is strictly part of the table - if relevant.
> > > + */
> > >  static struct msixtbl_entry *msixtbl_find_entry(
> > >      struct vcpu *v, unsigned long addr)
> > >  {
> > > @@ -187,8 +191,8 @@ static struct msixtbl_entry *msixtbl_find_entry(
> > >      struct domain *d = v->domain;
> > >  
> > >      list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list )
> > > -        if ( addr >= entry->gtable &&
> > > -             addr < entry->gtable + entry->table_len )
> > > +        if ( PFN_DOWN(addr) >= PFN_DOWN(entry->gtable) &&
> > > +             PFN_DOWN(addr) <= PFN_DOWN(entry->gtable + entry->table_len - 1) )
> > >              return entry;
> > >  
> > >      return NULL;
> > > @@ -203,6 +207,10 @@ static struct msi_desc *msixtbl_addr_to_desc(
> > >      if ( !entry || !entry->pdev )
> > >          return NULL;
> > >  
> > > +    if ( addr < entry->gtable ||
> > > +         addr >= entry->gtable + entry->table_len )
> > > +        return NULL;
> > > +
> > >      nr_entry = (addr - entry->gtable) / PCI_MSIX_ENTRY_SIZE;
> > >  
> > >      list_for_each_entry( desc, &entry->pdev->msi_list, list )
> > > @@ -213,6 +221,152 @@ static struct msi_desc *msixtbl_addr_to_desc(
> > >      return NULL;
> > >  }
> > >  
> > > +/*
> > > + * Returns:
> > > + *  - ADJACENT_DONT_HANDLE if no handling should be done
> > > + *  - a fixmap idx to use for handling
> > > + */
> > > +#define ADJACENT_DONT_HANDLE UINT_MAX
> > 
> > Isn't it fine to just return 0 to signal that the access is not
> > handled?
> > 
> > fixmap index 0 is reserved anyway (see FIX_RESERVED), so could be used
> > for this purpose and then you don't need to introduce
> > ADJACENT_DONT_HANDLE?
> 
> It was this way before in v2 and you asked me to not use 0 for this
> purpose...

Sorry, I think I didn't realize fixmap idx 0 was reserved, and hence
can be used to signal no idx.

> > > +
> > > +    if ( !msix->adj_access_idx[adj_type] )
> > > +    {
> > > +        if ( MSIX_CHECK_WARN(msix, entry->pdev->domain->domain_id,
> > > +                             adjacent_not_initialized) )
> > > +            gprintk(XENLOG_WARNING,
> > > +                    "Page for adjacent(%d) MSI-X table access not initialized for %pp (addr %#lx, gtable %#lx\n",
> > 
> > Do you really need to log an error here?  There's an error already
> > printed in msix_capability_init() if the adjacent pages can't be
> > mapped.
> 
> IMO it's better to keep this message, otherwise it might be pretty hard
> to debug not working device - a message buried somewhere on startup
> might be hard to correlate with an issue much later.

Would you mind starting the entry with the SBDF then?

"%pp: MSI-X adjacent memory not mapped, dropping access to %#lx\n"

Or similar.

> > > +    fixmap_idx = adjacent_handle(entry, address, false);
> > > +
> > > +    if ( fixmap_idx == ADJACENT_DONT_HANDLE )
> > > +    {
> > > +        *pval = ~0UL;
> > > +        return X86EMUL_OKAY;
> > > +    }
> > 
> > FWIW, I find it safer to unconditionally init *pval = ~0UL at the
> > start of the function, and then the return here and in the default
> > switch statement case can avoid setting it.  It's less easy to return
> > without the variable being set.
> 
> It was this way in v5, but Jan asked me to move it to only relevant
> branch.

Hm, I see, we had this discussion with Jan in the past.  I'm fine this
way if you prefer, but I think it's less robust.

> > > @@ -374,16 +550,25 @@ static bool cf_check msixtbl_range(
> > >  {
> > >      struct vcpu *curr = current;
> > >      unsigned long addr = r->addr;
> > > -    const struct msi_desc *desc;
> > > +    const struct msixtbl_entry *entry;
> > > +    bool ret = false;
> > >  
> > >      ASSERT(r->type == IOREQ_TYPE_COPY);
> > >  
> > >      rcu_read_lock(&msixtbl_rcu_lock);
> > > -    desc = msixtbl_addr_to_desc(msixtbl_find_entry(curr, addr), addr);
> > > +    entry = msixtbl_find_entry(curr, addr);
> > > +    if ( entry )
> > > +    {
> > > +        if ( addr < entry->gtable || addr >= entry->gtable + entry->table_len )
> > > +            /* Possibly handle adjacent access. */
> > > +            ret = true;
> > > +        else
> > > +            ret = msixtbl_addr_to_desc(entry, addr) != NULL;
> > > +    }
> > 
> > You could probably put all this into a single condition:
> > 
> > if ( entry &&
> >       /* Adjacent access. */
> >      (addr < entry->gtable || addr >= entry->gtable + entry->table_len ||
> >       /* Otherwise check if there's a matching msi_desc. */
> >       msixtbl_addr_to_desc(entry, addr)) )
> >     ret = true;
> > 
> > That's IMO easier to read by setting ret once only.
> 
> Is multi-line "if" mixed with comments really easier to follow?

It is for me, because ret gets set in a single place, it's a single
branch to analyze and reduces indentation.

Ultimately it's a question of taste, so would leave that up to you as
the author of the code.  I also dislike the 'ret =
msixtbl_addr_to_desc(entry, addr) != NULL' expression, but again it's
a question of taste.

Thanks, Roger.


  reply	other threads:[~2024-05-09  7:35 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-07 12:44 [PATCH v7 0/6] MSI-X support with qemu in stubdomain, and other related changes Marek Marczykowski-Górecki
2024-05-07 12:44 ` [PATCH v7 1/6] x86/msi: Extend per-domain/device warning mechanism Marek Marczykowski-Górecki
2024-05-07 12:44 ` [PATCH v7 2/6] x86/hvm: Allow access to registers on the same page as MSI-X table Marek Marczykowski-Górecki
2024-05-08 16:09   ` Roger Pau Monné
2024-05-08 21:52     ` Marek Marczykowski-Górecki
2024-05-09  7:35       ` Roger Pau Monné [this message]
2024-05-07 12:44 ` [PATCH v7 3/6] automation: prevent QEMU access to /dev/mem in PCI passthrough tests Marek Marczykowski-Górecki
2024-05-07 12:44 ` [PATCH v7 4/6] automation: switch to a wifi card on ADL system Marek Marczykowski-Górecki
2024-05-07 12:44 ` [PATCH v7 5/6] [DO NOT APPLY] switch to qemu fork Marek Marczykowski-Górecki
2024-05-07 12:44 ` [PATCH v7 6/6] [DO NOT APPLY] switch to alternative artifact repo Marek Marczykowski-Górecki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zjx8wf3m6AlQFbLT@macbook \
    --to=roger.pau@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=marmarek@invisiblethingslab.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.