From: Alex Williamson <alex.williamson@redhat.com>
To: Reinette Chatre <reinette.chatre@intel.com>
Cc: <jgg@nvidia.com>, <yishaih@nvidia.com>,
<shameerali.kolothum.thodi@huawei.com>, <kevin.tian@intel.com>,
<tglx@linutronix.de>, <darwi@linutronix.de>,
<kvm@vger.kernel.org>, <dave.jiang@intel.com>,
<jing2.liu@intel.com>, <ashok.raj@intel.com>,
<fenghua.yu@intel.com>, <tom.zanussi@linux.intel.com>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x
Date: Mon, 3 Apr 2023 21:18:41 -0600
Message-ID: <20230403211841.0e206b67.alex.williamson@redhat.com>
In-Reply-To: <57a8c701-bf97-fddd-9ac0-fc4d09e3cb16@intel.com>
On Mon, 3 Apr 2023 15:50:54 -0700
Reinette Chatre <reinette.chatre@intel.com> wrote:
> Hi Alex,
>
> On 4/3/2023 1:22 PM, Alex Williamson wrote:
> > On Mon, 3 Apr 2023 10:31:23 -0700
> > Reinette Chatre <reinette.chatre@intel.com> wrote:
> >
> >> Hi Alex,
> >>
> >> On 3/31/2023 3:24 PM, Alex Williamson wrote:
> >>> On Fri, 31 Mar 2023 10:49:16 -0700
> >>> Reinette Chatre <reinette.chatre@intel.com> wrote:
> >>>> On 3/30/2023 3:42 PM, Alex Williamson wrote:
> >>>>> On Thu, 30 Mar 2023 16:40:50 -0600
> >>>>> Alex Williamson <alex.williamson@redhat.com> wrote:
> >>>>>
> >>>>>> On Tue, 28 Mar 2023 14:53:34 -0700
> >>>>>> Reinette Chatre <reinette.chatre@intel.com> wrote:
> >>>>>>
> >>
> >> ...
> >>
> >>>>>>> + msix_map.index = vector;
> >>>>>>> + msix_map.virq = irq;
> >>>>>>> + pci_msix_free_irq(pdev, msix_map);
> >>>>>>> + }
> >>>>>>> + vfio_pci_memory_unlock_and_restore(vdev, cmd);
> >>>>>>> out_put_eventfd_ctx:
> >>>>>>> eventfd_ctx_put(trigger);
> >>>>>>> out_free_name:
> >>>>>>> kfree(ctx->name);
> >>>>>>> ctx->name = NULL;
> >>>>>>> +out_free_ctx:
> >>>>>>> + if (allow_dyn_alloc && new_ctx)
> >>>>>>> + vfio_irq_ctx_free(vdev, ctx, vector);
> >>>>>>> return ret;
> >>>>>>> }
> >>>>>>>
> >>>>>>
> >>>>>> Do we really need the new_ctx test in the above cases? Thanks,
> >>>>
> >>>> new_ctx is not required for correctness but instead is used to keep
> >>>> the code symmetric.
> >>>> Specifically, if the user enables MSI-X without providing triggers and
> >>>> then later assigns triggers, then an error path without new_ctx would unwind
> >>>> more than was done in this function: it would free the context that
> >>>> was allocated within vfio_msi_enable().
> >>>
> >>> Seems like we already have that asymmetry, if a trigger is unset we'll
> >>> free the ctx allocated by vfio_msi_enable(). Tracking which are
> >>
> >> Apologies, but could you please elaborate on where the asymmetry is? I am
> >> not able to see a flow in this solution where the ctx allocated by
> >> vfio_msi_enable() is freed if the trigger is unset.
> >
> > The user first calls SET_IRQS to enable MSI-X with some number of
> > vectors with (potentially) an eventfd for each vector. The user later
> > calls SET_IRQS passing a -1 eventfd for one or more of the vectors with
> > an eventfd initialized in the prior step. Given that we find the ctx,
> > the ctx has a trigger, and assuming dynamic allocation is supported, the
> > ctx is freed and vfio_msi_set_vector_signal() returns w/o allocating a
> > new ctx. We've de-allocated both the irq and context initialized from
> > vfio_msi_enable().
>
> This is correct. The comment I responded to was in regards to an unset
> trigger. The flow you describe is when a trigger is set. Not that
> it changes your point though, which is that vfio_msi_set_vector_signal()
> frees memory allocated by vfio_msi_enable(). This is clear to me. This
> is intended behavior. My concern is/was with the error path where a function
> failing may not be expected to change state, you address that concern below.
>
> >>> allocated where is unnecessarily complex, how about a policy that
> >>
> >> I do not see this as tracking where allocations are made. Instead I
> >> see it as containing/compartmentalizing state changes with the goal of
> >> making the code easier to understand and maintain. Specifically, new_ctx
> >> is used so that if vfio_msi_set_vector_signal() fails, the state
> >> before and after vfio_msi_set_vector_signal() will be the same.
> >
> > That's not really possible given how we teardown the existing ctx
> > before configuring the new one and unwind to disable contexts in
> > vfio_msi_set_block()
>
> Very unlikely indeed. I agree.
>
> >> I do agree that it makes vfio_msi_set_vector_signal() more complex
> >> and I can remove new_ctx if you find that this is unnecessary after
> >> considering the motivations behind its use.
> >
> > If the goal is to allow the user to swap one eventfd for another, where
> > the result will always be the new eventfd on success or the old eventfd
> > on error, I don't see that this code does that, or that we've ever
> > attempted to make such a guarantee. If the ioctl errors, I think the
> > eventfds are generally deconfigured. We certainly have the unwind code
> > that we discussed earlier that deconfigures all the vectors previously
> > touched in the loop (which seems to be another path where we could
> > de-allocate from the set of initial ctxs).
>
> Thank you for your patience in hearing and addressing my concerns. I plan
> to remove new_ctx in the next version.
>
> >>> devices supporting vdev->has_dyn_msix only ever have active contexts
> >>> allocated? Thanks,
> >>
> >> What do you see as an "active context"? A policy that is currently enforced
> >> is that an allocated context always has an allocated interrupt associated
> >> with it. I do not see how this could be expanded to also require an
> >> enabled interrupt because interrupt enabling requires a trigger that
> >> may not be available.
> >
> > A context is essentially meant to track a trigger, i.e. an eventfd
> > provided by the user. In the static case all the irqs are necessarily
> > pre-allocated, therefore we had no reason to consider a dynamic array
> > for the contexts. However, a given context is really only "active" if
> > it has a trigger, otherwise it's just a placeholder. When the
> > placeholder is filled by an eventfd, the pre-allocated irq is enabled.
>
> I see.
>
> >
> > This proposal seems to be a hybrid approach, pre-allocating some
> > initial set of irqs and contexts and expecting the differentiation to
> > occur only when new vectors are added, though we have some disagreement
> > about this per above. Unfortunately I don't see an API to enable MSI-X
> > without some vectors, so some pre-allocation of irqs seems to be
> > required regardless.
>
> Right. pci_alloc_irq_vectors() or equivalent continues to be needed to
> enable MSI-X. Even so, it does seem possible (within vfio_msi_enable())
> to just allocate one vector using pci_alloc_irq_vectors()
> and then immediately free it using pci_msix_free_irq(). What do you think?

QEMU does something similar, but I think it can really only be described
as a hack. In this case I think we can work with the vectors being
allocated, since that's essentially the static path.

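For reference, the sequence proposed above (allocate one vector only to
get MSI-X enabled, then free it right away) can be modeled in userspace as
below. The sim_* functions and struct sim_dev are hypothetical stand-ins
that only track "MSI-X enabled" and "which vectors are allocated"; they
are not pci_alloc_irq_vectors()/pci_msix_free_irq() and touch no hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SIM_MAX_VECTORS 8

/* Hypothetical userspace stand-in for a device's MSI-X state. */
struct sim_dev {
	bool msix_enabled;
	bool allocated[SIM_MAX_VECTORS];
};

/*
 * Models pci_alloc_irq_vectors(): allocates vectors 0..nvec-1 and
 * enables MSI-X. Enabling requires at least one vector.
 */
static int sim_alloc_irq_vectors(struct sim_dev *dev, int nvec)
{
	if (nvec < 1 || nvec > SIM_MAX_VECTORS)
		return -EINVAL;
	for (int i = 0; i < nvec; i++)
		dev->allocated[i] = true;
	dev->msix_enabled = true;
	return nvec;
}

/* Models pci_msix_free_irq(): frees one vector, MSI-X stays enabled. */
static void sim_msix_free_irq(struct sim_dev *dev, int index)
{
	assert(index >= 0 && index < SIM_MAX_VECTORS);
	dev->allocated[index] = false;
}

static int sim_count_allocated(const struct sim_dev *dev)
{
	int n = 0;

	for (int i = 0; i < SIM_MAX_VECTORS; i++)
		n += dev->allocated[i];
	return n;
}

/* The proposed sequence: allocate one vector only to enable, then free it. */
static int sim_enable_msix_without_vectors(struct sim_dev *dev)
{
	int ret = sim_alloc_irq_vectors(dev, 1);

	if (ret < 0)
		return ret;
	sim_msix_free_irq(dev, 0);
	return 0;
}
```

The end state is MSI-X enabled with zero vectors allocated; the static
path instead simply keeps the vectors from the initial allocation.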
> If I understand correctly this can be done without allocating any context
> and leave MSI-X enabled without any interrupts allocated. This could be a
> way to accomplish the "active context" policy for dynamic allocation.
> This is not a policy that can be applied broadly to interrupt contexts though
> because MSI and non-dynamic MSI-X could still have contexts with allocated
> interrupts without eventfd.

I think we could come up with wrappers that handle all cases, for
example:

int vfio_pci_alloc_irq(struct vfio_pci_core_device *vdev,
		       unsigned int vector, int irq_type)
{
	struct pci_dev *pdev = vdev->pdev;
	struct msi_map map;
	int irq;

	/* INTx has a single, already configured interrupt */
	if (irq_type == VFIO_PCI_INTX_IRQ_INDEX)
		return pdev->irq ?: -EINVAL;

	/* Already allocated, or no way to allocate more: report as is */
	irq = pci_irq_vector(pdev, vector);
	if (irq > 0 || irq_type == VFIO_PCI_MSI_IRQ_INDEX ||
	    !vdev->has_dyn_msix)
		return irq;

	/* Dynamic MSI-X: allocate the interrupt for this vector now */
	map = pci_msix_alloc_irq_at(pdev, vector, NULL);
	if (map.index < 0)
		return map.index;

	return map.virq;
}

void vfio_pci_free_irq(struct vfio_pci_core_device *vdev,
		       unsigned int vector, int irq_type)
{
	struct pci_dev *pdev = vdev->pdev;
	struct msi_map map;
	int irq;

	/* Only dynamically allocated MSI-X interrupts are freed here */
	if (irq_type != VFIO_PCI_MSIX_IRQ_INDEX ||
	    !vdev->has_dyn_msix)
		return;

	irq = pci_irq_vector(pdev, vector);
	if (WARN_ON(irq < 0))
		return;

	map.index = vector;
	map.virq = irq;
	pci_msix_free_irq(pdev, map);
}

At that point, maybe we'd check whether it makes sense to embed the irq
alloc/free within the ctx alloc/free.

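To picture that embedding, combined with the idea of only allocating a
context once there is an eventfd to fill it, here is a userspace model.
Everything in it is hypothetical: sim_alloc_irq()/sim_free_irq() follow
the same decisions as the wrappers above, fixed arrays stand in for the
dynamic ctx storage, irq numbers are fake (100 + vector), and INTx is
left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define SIM_NVEC 8

enum sim_irq_type { SIM_MSI, SIM_MSIX };

/* Hypothetical context: exists only while the user has a trigger set. */
struct sim_ctx {
	int trigger;	/* stands in for the user's eventfd */
	int irq;	/* fake Linux irq number */
};

struct sim_vdev {
	bool has_dyn_msix;
	bool irq_allocated[SIM_NVEC];	/* models pci_irq_vector() > 0 */
	struct sim_ctx *ctx[SIM_NVEC];	/* stands in for dynamic ctx storage */
};

static int sim_alloc_irq(struct sim_vdev *vdev, unsigned int vector,
			 enum sim_irq_type type)
{
	if (vector >= SIM_NVEC)
		return -EINVAL;
	if (vdev->irq_allocated[vector])
		return 100 + (int)vector;
	if (type == SIM_MSI || !vdev->has_dyn_msix)
		return -EINVAL;
	vdev->irq_allocated[vector] = true;	/* dynamic MSI-X allocation */
	return 100 + (int)vector;
}

static void sim_free_irq(struct sim_vdev *vdev, unsigned int vector,
			 enum sim_irq_type type)
{
	if (type != SIM_MSIX || !vdev->has_dyn_msix || vector >= SIM_NVEC)
		return;
	vdev->irq_allocated[vector] = false;
}

/* ctx alloc with the irq alloc embedded: a new ctx is always active. */
static int sim_ctx_alloc(struct sim_vdev *vdev, unsigned int vector,
			 enum sim_irq_type type, int trigger)
{
	struct sim_ctx *ctx;
	int irq;

	if (vector >= SIM_NVEC || vdev->ctx[vector])
		return -EINVAL;
	irq = sim_alloc_irq(vdev, vector, type);
	if (irq < 0)
		return irq;
	ctx = calloc(1, sizeof(*ctx));
	if (!ctx) {
		sim_free_irq(vdev, vector, type);
		return -ENOMEM;
	}
	ctx->trigger = trigger;
	ctx->irq = irq;
	vdev->ctx[vector] = ctx;
	return 0;
}

/* ctx free with the irq free embedded; static irqs stay allocated. */
static void sim_ctx_free(struct sim_vdev *vdev, unsigned int vector,
			 enum sim_irq_type type)
{
	if (vector >= SIM_NVEC || !vdev->ctx[vector])
		return;
	free(vdev->ctx[vector]);
	vdev->ctx[vector] = NULL;
	sim_free_irq(vdev, vector, type);
}
```

With this shape the has_dyn_msix difference stays inside the irq
wrappers, and the ctx paths are the same for static and dynamic MSI-X.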
> > But if non-active contexts were only placeholders in the pre-dynamic
> > world and we now manage them via a dynamic array, why is there any
> > pre-allocation of contexts without knowing the nature of the eventfd to
> > fill it? We could have more commonality between cases if contexts are
> > always dynamically allocated, which might simplify differentiation of
> > the has_dyn_msix cases largely to wrappers allocating and freeing irqs.
> > Thanks,
>
> Thank you very much for your guidance. I will digest this some more and
> see how wrappers could be used. In the meantime, while trying to think how
> to unify this code I do think there is an issue in this patch in that
> the get_cached_msi_msg()/pci_write_msi_msg()
> should not be in an else branch.
>
> Specifically, I think it needs to be:
> 	if (msix) {
> 		if (irq == -EINVAL) {
> 			/* dynamically allocate interrupt */
> 		}
> 		get_cached_msi_msg(irq, &msg);
> 		pci_write_msi_msg(irq, &msg);
> 	}

Yes, that's looked wrong to me all along; I think that resolves it.
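
A small userspace model of the corrected ordering, where the cached
message write runs for every MSI-X vector whether its irq was already
allocated or was allocated on the spot. sim_dyn_alloc() and
sim_restore_msg() are hypothetical stand-ins for the dynamic allocation
and for the get_cached_msi_msg()/pci_write_msi_msg() pair; they only
count how often they run, and irq numbers are fake.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Counters so a caller can observe which steps ran. */
struct sim_state {
	int dyn_allocs;		/* times the dynamic-allocation branch ran */
	int msg_restores;	/* times the cached MSI message was rewritten */
};

static int sim_dyn_alloc(struct sim_state *s, unsigned int vector)
{
	s->dyn_allocs++;
	return 200 + (int)vector;	/* fake irq number */
}

static void sim_restore_msg(struct sim_state *s, int irq)
{
	(void)irq;
	s->msg_restores++;
}

/*
 * irq is what pci_irq_vector() would have returned: -EINVAL means no
 * interrupt is allocated for this vector yet.
 */
static int sim_setup_vector(struct sim_state *s, bool msix,
			    unsigned int vector, int irq)
{
	if (msix) {
		if (irq == -EINVAL) {
			irq = sim_dyn_alloc(s, vector);
			if (irq < 0)
				return irq;
		}
		/* Not in an else branch: runs for old and new irqs alike. */
		sim_restore_msg(s, irq);
	}
	return irq;
}
```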
Thanks,
Alex