Re: [RFC V1 RESEND 2/6] PCI/MSI: Dynamic allocation of MSI-X vectors by group

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Megha Dey <megha.dey@intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: bhelgaas@google.com, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, marc.zyngier@arm.com,
	ashok.raj@intel.com, jacob.jun.pan@linux.intel.com
Subject: Re: [RFC V1 RESEND 2/6] PCI/MSI: Dynamic allocation of MSI-X vectors by group
Date: Mon, 12 Aug 2019 10:47:12 -0700	[thread overview]
Message-ID: <1565632032.7042.12.camel@intel.com> (raw)
In-Reply-To: <alpine.DEB.2.21.1908071525390.24014@nanos.tec.linutronix.de>

On Wed, 2019-08-07 at 15:56 +0200, Thomas Gleixner wrote:
> Megha,
> 
> On Tue, 6 Aug 2019, Megha Dey wrote:
> > 
> > On Sat, 2019-06-29 at 09:59 +0200, Thomas Gleixner wrote:
> > > 
> > > On Fri, 21 Jun 2019, Megha Dey wrote:
> > Totally agreed. The request to add a dynamic MSI-X infrastructure
> > came
> > from some driver teams internally and currently they do not have
> > bandwidth to come up with relevant test cases. <sigh>
> Hahahaha.
> 
> > 
> > But we hope that this patch set could serve as a precursor to the
> > interrupt message store (IMS) patch set, and we can use this patch
> > set
> > as the baseline for the IMS patches.
> If IMS needs the same functionality, then we need to think about it
> slightly differently because IMS is not necessarily tied to PCI.
>  
> IMS has some similarity to the ARM GIC ITS stuff IIRC, which already
> provides these things outside of PCI. Marc?
> 
> We probably need some generic infrastructure for this so PCI and
> everything
> else can use it.

Yeah agreed, I will look at the ARM GIC ITS to see how we can
consolidate this code.

> 
> > 
> > > 
> > > > 
> > > > +		/*
> > > > +		 * Save the pointer to the first msi_desc
> > > > entry of
> > > > every
> > > > +		 * MSI-X group. This pointer is used by other
> > > > functions
> > > > +		 * as the starting point to iterate through
> > > > each
> > > > of the
> > > > +		 * entries in that particular group.
> > > > +		 */
> > > > +		if (!i)
> > > > +			dev->dev.grp_first_desc =
> > > > list_last_entry
> > > > +			(dev_to_msi_list(&dev->dev), struct
> > > > msi_desc, list);
> > > How is that supposed to work? The pointer gets overwritten on
> > > every
> > > invocation of that interface. I assume this is merily an
> > > intermediate
> > > storage for setup. Shudder.
> > > 
> > Yes, you are right.
> > 
> > The grp_first_desc is simply a temporary storage to store the
> > first msi_desc entry of every group, which can be used by other
> > functions to iterate through the entries belonging to that group
> > only,
> > using the for_each_pci_msi_entry/ for_each_msi_entry_from macro. It
> > is
> > not the cleanest of solutions, I agree.
> Yeah, it's too ugly to exist.
> 

will get rid of it.

> > 
> > With your proposal of supporting a separate group list, I don't
> > think
> > there will be a need to use this kind of temporary storage
> > variable.
> Exactly.
> 
> > 
> > > 
> > > > 
> > > > -	for_each_pci_msi_entry(entry, dev) {
> > > > +	for_each_pci_msi_entry_from(entry, dev) {
> > >   > +/* Iterate through MSI entries of device dev starting from a
> > > given desc */
> > >   > +#define for_each_msi_entry_from(desc,
> > > dev)                             \
> > >   > +       desc =
> > > (*dev).grp_first_desc;                                   \
> > >   > +       list_for_each_entry_from((desc),
> > > dev_to_msi_list((dev)),
> > > list)  \
> > > 
> > > So this hides the whole group stuff behind a hideous iterator.
> > > 
> > > for_each_pci_msi_entry_from() ? from what? from the device? Sane
> > > iterators
> > > which have a _from naming, have also a from argument.
> > > 
> > This was meant to be "iterate over all the entries belonging to a
> > group", sorry if that was not clear. 
> > 
> > The current 'for_each_pci_msi_entry' macro iterates through all the
> > msi_desc entries belonging to a particular device. Since we have a
> > piecewise allocation of the MSI-X vectors with this change, we
> > would
> > want to iterate only through the newly added entries, i.e the
> > entries
> > allocated to the current group.
> I understand that, but please make macros and function names so they
> are
> halfways self explaining and intuitive.
> 

yeah, point taken.

>  > In V2, I will introduce a new macro,
> 'for_each_pci_msi_entry_group',
> > 
> > which will only iterate through the msi_desc entries belonging to a
> > particular group.
> for_each_pci_msi_entry_group()
> 
> is ambiguous. It could mean to iterate over the groups. 
> 
> for_each_pci_msi_entry_in_group()
> 
> avoids that.

Yes, agreed.
>  
> > 
> > > 
> > > > 
> > > > -	ret = msix_setup_entries(dev, base, entries, nvec,
> > > > affd);
> > > > +	ret = msix_setup_entries(dev, dev->base, entries,
> > > > nvec,
> > > > affd, group);
> > > >  	if (ret)
> > > >  		return ret;
> > > Any error exit in this function will leave MSIx disabled. That
> > > means
> > > if
> > > this is a subsequent group allocation which fails for whatever
> > > reason, this
> > > will render all existing and possibly already in use interrupts
> > > unusable.
> > > 
> > Hmmm yeah, I hadn't thought about this!
> > 
> > So according to the code, we must 'Ensure MSI-X is disabled while
> > it is
> > set up'. MSI-X would be disabled until the setup of the new vectors
> > is
> > complete, even if we do not take the error exit right?
> > 
> > Earlier this was not a problem since we disable the MSI-X, setup
> > all
> > the vectors at once, and then enable the MSI-X once and for all. 
> > 
> > I am not sure how to avoid disabling of MSI-X here.
> The problem with your code is that is keeps it disabled in case of an
> error, which makes all existing users (groups) starve.
> 
> But, yes there is also the question what happens during the time when
> interrupts are raised on already configured devices exactly during
> the time
> where MSI-X is disabled temporarily to setup a new group. I fear that
> will
> end up with lost interrupts and/or spurious interrupts via the legacy
> INT[ABCD]. That really needs to be investigated _before_ we go there.
> 

Hmm, yeah that is my concern, I do not know what the behavior would be.
Any experiments that I can run to know more?

> > 
> > > 
> > > > 
> > > >  static int __pci_enable_msix_range(struct pci_dev *dev,
> > > >  				   struct msix_entry *entries,
> > > > int
> > > > minvec,
> > > > -				   int maxvec, struct
> > > > irq_affinity
> > > > *affd)
> > > > +				   int maxvec, struct
> > > > irq_affinity
> > > > *affd,
> > > > +				   bool one_shot, int group)
> > > >  {
> > > >  	int rc, nvec = maxvec;
> > > >  
> > > >  	if (maxvec < minvec)
> > > >  		return -ERANGE;
> > > >  
> > > > -	if (WARN_ON_ONCE(dev->msix_enabled))
> > > > -		return -EINVAL;
> > > So any misbehaving PCI driver can now call into this without
> > > being
> > > caught.
> > > 
> > I do not understand what misbehaving PCI driver means :(
> The one which calls into that interface _AFTER_ msix is enabled. We
> catch
> that right now and reject it.
> 

Right.
> > 
> > Basically this statement is what denies multiple MSI-X vector
> > allocations, and I wanted to remove it so that we could do just
> > that.
> > 
> > Please let me know how I could change this.
> There are several ways to do that, but it needs to be made
> conditionally on
> things like 'device has group mode support' ...
>  

Ok.
> > 
> > > 
> > > If you want to support group based allocations, then the PCI/MSI
> > > facility
> > > has to be refactored from ground up.
> > > 
> > >   1) Introduce the concept of groups by adding a group list head
> > > to
> > > struct
> > >      pci_dev. Ideally you create a new struct pci_dev_msi or
> > > whatever
> > > where
> > >      all this muck goes into.
> > > 
> > I think we can use the existing list_head 'msi_list' in the struct
> > device for this, instead of having a new list_head for the group.
> > So
> > now instead of msi_list being a list of all the msi_desc entries,
> > it
> > will have a list of the different groups associated with the
> > device.
> > 
> > IMHO, since IMS is non PCI compliant, having this group_list_head
> > would
> > be better off in struct device than struct pci_dev, which would
> > enable
> > code reuse.
> Sure, but then we really need to look at the IMS requirements in
> order not
> to rewrite this whole thing over and over.
> 

Yeah, since IMS is non PCI compliant, if we make all these changes at
the struct device level unlike the struct pci_dev level, I think we can
reuse the same code for all MSI/ MSI like interrupts.

Having said that I will send out an RFC of the initial IMS code, to
avoid any sort of code duplication.
> > 
> > > 
> > >   2) Change the existing code to treat the current allocation
> > > mode as
> > > a
> > >      group allocation. Keep the entries in a new struct
> > > msi_entry_group and
> > >      have a group id, list head and the entries in there.
> > > 
> > I am thinking of something like this, please let me know if this is
> > what you are suggesting:
> > 
> > 1. Introduce a new msi_entry_group struct:
> > struct msi_entry_grp {
> >   int group_id; // monotonically increasing group_id
> >   int num_vecs; // number of msi_desc entries per group
> >   struct list_head group_list; // Added to msi_list in struct
> > device
> >   struct list_head entry_list; // list of msi_desc entries for this
> > grp
> > }
> Looks about right.

Ok! 
> > 
> > 2. Add a new 'for_each_pci_msi_entry_group' macro. This macro
> > should
> > only iterate through the msi_desc entries belonging to a group.
> See above.
>  

will update this is V2.

> > 
> > 3. The existing for_each_pci_msi_entry, needs to be modified so
> > that it
> > is backward compatible. This macro should still be able to iterate
> > through all the entries in all the groups. 
> I'm not sure. It might be just the thing which iterates over group 0,
> which
> is the default for all devices which do not use/support group mode,
> but
> let's see.
>  

Yeah, I am currently trying to get this approach to work i.e. reworking
the for_each_pci_msi_entry macro to be group aware (not gotten it to
work thus far, though).

> Thanks,
> 
> 	tglx

next prev parent reply	other threads:[~2019-08-12 17:25 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-22  0:19 [RFC V1 RESEND 0/6] Introduce dynamic allocation/freeing of MSI-X vectors Megha Dey
2019-06-22  0:19 ` [RFC V1 RESEND 1/6] PCI/MSI: New structures/macros for dynamic MSI-X allocation Megha Dey
2019-06-22  0:19 ` [RFC V1 RESEND 2/6] PCI/MSI: Dynamic allocation of MSI-X vectors by group Megha Dey
2019-06-29  7:59   ` Thomas Gleixner
2019-08-06 19:05     ` Megha Dey
2019-08-07 13:56       ` Thomas Gleixner
2019-08-07 14:18         ` Marc Zyngier
2019-08-11  7:20           ` Thomas Gleixner
2019-08-12 17:54             ` Megha Dey
2019-08-12 17:48           ` Megha Dey
2019-08-12 17:47         ` Megha Dey [this message]
2021-01-07 22:30     ` Jacob Keller
2019-06-22  0:19 ` [RFC V1 RESEND 3/6] x86: Introduce the dynamic teardown function Megha Dey
2019-06-29  8:01   ` Thomas Gleixner
2019-08-06 19:06     ` Megha Dey
2019-06-22  0:19 ` [RFC V1 RESEND 4/6] PCI/MSI: Introduce new structure to manage MSI-x entries Megha Dey
2019-06-22  0:19 ` [RFC V1 RESEND 5/6] PCI/MSI: Free MSI-X resources by group Megha Dey
2019-06-29  8:08   ` Thomas Gleixner
2019-08-06 19:09     ` Megha Dey
2019-08-11  7:18       ` Thomas Gleixner
2019-08-12 18:13         ` Megha Dey
2019-06-22  0:19 ` [RFC V1 RESEND 6/6] Documentation: PCI/MSI: Document dynamic MSI-X infrastructure Megha Dey
2019-08-02  0:24 ` [RFC V1 RESEND 0/6] Introduce dynamic allocation/freeing of MSI-X vectors Bjorn Helgaas
2019-08-06 19:12   ` Megha Dey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1565632032.7042.12.camel@intel.com \
    --to=megha.dey@intel.com \
    --cc=ashok.raj@intel.com \
    --cc=bhelgaas@google.com \
    --cc=jacob.jun.pan@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=marc.zyngier@arm.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.