From: Bjorn Helgaas <helgaas@kernel.org>
To: Nitesh Narayan Lal <nitesh@redhat.com>
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
linux-pci@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
frederic@kernel.org, mtosatti@redhat.com, sassmann@redhat.com,
jesse.brandeburg@intel.com, lihong.yang@intel.com,
jeffrey.t.kirsher@intel.com, jacob.e.keller@intel.com,
jlelli@redhat.com, hch@infradead.org, bhelgaas@google.com,
mike.marciniszyn@intel.com, dennis.dalessandro@intel.com,
thomas.lendacky@amd.com, jerinj@marvell.com,
mathias.nyman@intel.com, jiri@nvidia.com, mingo@redhat.com,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org
Subject: Re: [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
Date: Thu, 24 Sep 2020 17:59:08 -0500 [thread overview]
Message-ID: <20200924225908.GA2367591@bjorn-Precision-5520> (raw)
In-Reply-To: <493a6fe5-f33f-85b2-6149-809f3cb4390f@redhat.com>
On Thu, Sep 24, 2020 at 05:39:07PM -0400, Nitesh Narayan Lal wrote:
>
> On 9/24/20 4:45 PM, Bjorn Helgaas wrote:
> > Possible subject:
> >
> > PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs
>
> Will switch to this.
>
> > On Wed, Sep 23, 2020 at 02:11:26PM -0400, Nitesh Narayan Lal wrote:
> >> This patch limits the pci_alloc_irq_vectors, max_vecs argument that is
> >> passed on by the caller based on the housekeeping online CPUs (that are
> >> meant to perform managed IRQ jobs).
> >>
> >> A minimum of the max_vecs passed and housekeeping online CPUs is derived
> >> to ensure that we don't create excess vectors as that can be problematic
> >> specifically in an RT environment. In cases where the min_vecs exceeds the
> >> housekeeping online CPUs, max vecs is restricted based on the min_vecs
> >> instead. The proposed change is required because for an RT environment
> >> unwanted IRQs are moved to the housekeeping CPUs from isolated CPUs to
> >> keep the latency overhead to a minimum. If the number of housekeeping CPUs
> >> is significantly lower than that of the isolated CPUs we can run into
> >> failures while moving these IRQs to housekeeping CPUs due to per CPU
> >> vector limit.
> > Does this capture enough of the log?
> >
> > If we have isolated CPUs dedicated for use by real-time tasks, we
> > try to move IRQs to housekeeping CPUs to reduce overhead on the
> > isolated CPUs.
>
> How about:
> "
> If we have isolated CPUs or CPUs running in nohz_full mode for the purpose
> of real-time, we try to move IRQs to housekeeping CPUs to reduce latency
> overhead on these real-time CPUs.
> "
>
> What do you think?
It's OK, but from the PCI core perspective, "nohz_full mode" doesn't
really mean anything. I think it's a detail that should be inside the
"housekeeping CPU" abstraction.
> > If we allocate too many IRQ vectors, moving them all to housekeeping
> > CPUs may exceed per-CPU vector limits.
> >
> > When we have isolated CPUs, limit the number of vectors allocated by
> > pci_alloc_irq_vectors() to the minimum number required by the
> > driver, or to one per housekeeping CPU if that is larger
>
> I think this is good, I can adopt this.
>
> > .
> >
> >> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> >> ---
> >> include/linux/pci.h | 15 +++++++++++++++
> >> 1 file changed, 15 insertions(+)
> >>
> >> diff --git a/include/linux/pci.h b/include/linux/pci.h
> >> index 835530605c0d..cf9ca9410213 100644
> >> --- a/include/linux/pci.h
> >> +++ b/include/linux/pci.h
> >> @@ -38,6 +38,7 @@
> >> #include <linux/interrupt.h>
> >> #include <linux/io.h>
> >> #include <linux/resource_ext.h>
> >> +#include <linux/sched/isolation.h>
> >> #include <uapi/linux/pci.h>
> >>
> >> #include <linux/pci_ids.h>
> >> @@ -1797,6 +1798,20 @@ static inline int
> >> pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> >> unsigned int max_vecs, unsigned int flags)
> >> {
> >> + unsigned int hk_cpus = hk_num_online_cpus();
> >> +
> >> + /*
> >> + * For a real-time environment, try to be conservative and at max only
> >> + * ask for the same number of vectors as there are housekeeping online
> >> + * CPUs. In case, the min_vecs requested exceeds the housekeeping
> >> + * online CPUs, restrict the max_vecs based on the min_vecs instead.
> >> + */
> >> + if (hk_cpus != num_online_cpus()) {
> >> + if (min_vecs > hk_cpus)
> >> + max_vecs = min_vecs;
> >> + else
> >> + max_vecs = min_t(int, max_vecs, hk_cpus);
> >> + }
> > Is the below basically the same?
> >
> > /*
> > * If we have isolated CPUs for use by real-time tasks,
> > * minimize overhead on those CPUs by moving IRQs to the
> > * remaining "housekeeping" CPUs. Limit vector usage to keep
> > * housekeeping CPUs from running out of IRQ vectors.
> > */
>
> How about the following as a comment:
>
> "
> If we have isolated CPUs or CPUs running in nohz_full mode for real-time,
> latency overhead is minimized on those CPUs by moving the IRQ vectors to
> the housekeeping CPUs. Limit the number of vectors to keep housekeeping
> CPUs from running out of IRQ vectors.
>
> "
>
> > if (housekeeping_cpus < num_online_cpus()) {
> > if (housekeeping_cpus < min_vecs)
> > max_vecs = min_vecs;
> > else if (housekeeping_cpus < max_vecs)
> > max_vecs = housekeeping_cpus;
> > }
>
> The only reason I went with hk_cpus instead of housekeeping_cpus is because
> at other places in the kernel I found a similar naming convention (eg.
> hk_mask, hk_flags etc.).
> But if housekeeping_cpus makes things more clear, I can switch to that
> instead.
>
> Although after Frederic and Peter's suggestion the previous call will change
> to something like:
>
> "
> housekeeping_cpus = housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ);
> "
>
> Which should still falls in the that 80 chars per line limit.
I don't really care whether it's "hk_cpus" or "housekeeping_cpus" as
long as "housekeeping" appears in the code somewhere. If we call
"housekeeping_num_online_cpus()" that should be enough.
> > My comment isn't quite right because this patch only limits the number
> > of vectors; it doesn't actually *move* IRQs to the housekeeping CPUs.
>
> Yeap it doesn't move IRQs to the housekeeping CPUs.
>
> > I don't know where the move happens (or maybe you just avoid assigning
> > IRQs to isolated CPUs, and I don't know how that happens either).
>
> This can happen in the userspace, either manually or by some application
> such as tuned.
Some brief hint about this might be helpful.
> >> return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
> >> NULL);
> >> }
> >> --
> >> 2.18.2
> >>
> --
> Thanks
> Nitesh
>
next prev parent reply other threads:[~2020-09-24 22:59 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-23 18:11 [PATCH v2 0/4] isolation: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs Nitesh Narayan Lal
2020-09-24 8:40 ` peterz
2020-09-24 12:09 ` Frederic Weisbecker
2020-09-24 12:23 ` Nitesh Narayan Lal
2020-09-24 12:24 ` Peter Zijlstra
2020-09-24 12:11 ` Frederic Weisbecker
2020-09-24 12:26 ` Nitesh Narayan Lal
2020-09-24 12:46 ` Peter Zijlstra
2020-09-24 13:45 ` Nitesh Narayan Lal
2020-09-24 20:47 ` Bjorn Helgaas
2020-09-24 21:52 ` Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 2/4] sched/isolation: Extend nohz_full to isolate managed IRQs Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 3/4] i40e: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per " Nitesh Narayan Lal
2020-09-24 20:45 ` Bjorn Helgaas
2020-09-24 21:39 ` Nitesh Narayan Lal
2020-09-24 22:59 ` Bjorn Helgaas [this message]
2020-09-24 23:40 ` Nitesh Narayan Lal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200924225908.GA2367591@bjorn-Precision-5520 \
--to=helgaas@kernel.org \
--cc=bhelgaas@google.com \
--cc=dennis.dalessandro@intel.com \
--cc=frederic@kernel.org \
--cc=hch@infradead.org \
--cc=intel-wired-lan@lists.osuosl.org \
--cc=jacob.e.keller@intel.com \
--cc=jeffrey.t.kirsher@intel.com \
--cc=jerinj@marvell.com \
--cc=jesse.brandeburg@intel.com \
--cc=jiri@nvidia.com \
--cc=jlelli@redhat.com \
--cc=juri.lelli@redhat.com \
--cc=lihong.yang@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=mathias.nyman@intel.com \
--cc=mike.marciniszyn@intel.com \
--cc=mingo@redhat.com \
--cc=mtosatti@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=nitesh@redhat.com \
--cc=peterz@infradead.org \
--cc=sassmann@redhat.com \
--cc=thomas.lendacky@amd.com \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox