public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Nitesh Narayan Lal <nitesh@redhat.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-pci@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	frederic@kernel.org, mtosatti@redhat.com, sassmann@redhat.com,
	jesse.brandeburg@intel.com, lihong.yang@intel.com,
	jeffrey.t.kirsher@intel.com, jacob.e.keller@intel.com,
	jlelli@redhat.com, hch@infradead.org, bhelgaas@google.com,
	mike.marciniszyn@intel.com, dennis.dalessandro@intel.com,
	thomas.lendacky@amd.com, jerinj@marvell.com,
	mathias.nyman@intel.com, jiri@nvidia.com, mingo@redhat.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org
Subject: Re: [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
Date: Thu, 24 Sep 2020 19:40:49 -0400	[thread overview]
Message-ID: <176efdda-7d23-dea2-ef41-5df7cae56233@redhat.com> (raw)
In-Reply-To: <20200924225908.GA2367591@bjorn-Precision-5520>


[-- Attachment #1.1: Type: text/plain, Size: 6343 bytes --]


On 9/24/20 6:59 PM, Bjorn Helgaas wrote:
> On Thu, Sep 24, 2020 at 05:39:07PM -0400, Nitesh Narayan Lal wrote:
>> On 9/24/20 4:45 PM, Bjorn Helgaas wrote:
>>> Possible subject:
>>>
>>>   PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs
>> Will switch to this.
>>
>>> On Wed, Sep 23, 2020 at 02:11:26PM -0400, Nitesh Narayan Lal wrote:
>>>> This patch limits the pci_alloc_irq_vectors, max_vecs argument that is
>>>> passed on by the caller based on the housekeeping online CPUs (that are
>>>> meant to perform managed IRQ jobs).
>>>>
>>>> A minimum of the max_vecs passed and housekeeping online CPUs is derived
>>>> to ensure that we don't create excess vectors as that can be problematic
>>>> specifically in an RT environment. In cases where the min_vecs exceeds the
>>>> housekeeping online CPUs, max vecs is restricted based on the min_vecs
>>>> instead. The proposed change is required because for an RT environment
>>>> unwanted IRQs are moved to the housekeeping CPUs from isolated CPUs to
>>>> keep the latency overhead to a minimum. If the number of housekeeping CPUs
>>>> is significantly lower than that of the isolated CPUs we can run into
>>>> failures while moving these IRQs to housekeeping CPUs due to per CPU
>>>> vector limit.
>>> Does this capture enough of the log?
>>>
>>>   If we have isolated CPUs dedicated for use by real-time tasks, we
>>>   try to move IRQs to housekeeping CPUs to reduce overhead on the
>>>   isolated CPUs.
>> How about:
>> "
>> If we have isolated CPUs or CPUs running in nohz_full mode for the purpose
>> of real-time, we try to move IRQs to housekeeping CPUs to reduce latency
>> overhead on these real-time CPUs.
>> "
>>
>> What do you think?
> It's OK, but from the PCI core perspective, "nohz_full mode" doesn't
> really mean anything.  I think it's a detail that should be inside the
> "housekeeping CPU" abstraction.

I get your point, in that case I will probably stick to your original
suggestion just replacing the term "overhead" with "latency overhead".

>
>>>   If we allocate too many IRQ vectors, moving them all to housekeeping
>>>   CPUs may exceed per-CPU vector limits.
>>>
>>>   When we have isolated CPUs, limit the number of vectors allocated by
>>>   pci_alloc_irq_vectors() to the minimum number required by the
>>>   driver, or to one per housekeeping CPU if that is larger
>> I think this is good, I can adopt this.
>>
>>> .
>>>
>>>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>>>> ---
>>>>  include/linux/pci.h | 15 +++++++++++++++
>>>>  1 file changed, 15 insertions(+)
>>>>
>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>> index 835530605c0d..cf9ca9410213 100644
>>>> --- a/include/linux/pci.h
>>>> +++ b/include/linux/pci.h
>>>> @@ -38,6 +38,7 @@
>>>>  #include <linux/interrupt.h>
>>>>  #include <linux/io.h>
>>>>  #include <linux/resource_ext.h>
>>>> +#include <linux/sched/isolation.h>
>>>>  #include <uapi/linux/pci.h>
>>>>  
>>>>  #include <linux/pci_ids.h>
>>>> @@ -1797,6 +1798,20 @@ static inline int
>>>>  pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
>>>>  		      unsigned int max_vecs, unsigned int flags)
>>>>  {
>>>> +	unsigned int hk_cpus = hk_num_online_cpus();
>>>> +
>>>> +	/*
>>>> +	 * For a real-time environment, try to be conservative and at max only
>>>> +	 * ask for the same number of vectors as there are housekeeping online
>>>> +	 * CPUs. In case, the min_vecs requested exceeds the housekeeping
>>>> +	 * online CPUs, restrict the max_vecs based on the min_vecs instead.
>>>> +	 */
>>>> +	if (hk_cpus != num_online_cpus()) {
>>>> +		if (min_vecs > hk_cpus)
>>>> +			max_vecs = min_vecs;
>>>> +		else
>>>> +			max_vecs = min_t(int, max_vecs, hk_cpus);
>>>> +	}
>>> Is the below basically the same?
>>>
>>> 	/*
>>> 	 * If we have isolated CPUs for use by real-time tasks,
>>> 	 * minimize overhead on those CPUs by moving IRQs to the
>>> 	 * remaining "housekeeping" CPUs.  Limit vector usage to keep
>>> 	 * housekeeping CPUs from running out of IRQ vectors.
>>> 	 */
>> How about the following as a comment:
>>
>> "
>> If we have isolated CPUs or CPUs running in nohz_full mode for real-time,
>> latency overhead is minimized on those CPUs by moving the IRQ vectors to
>> the housekeeping CPUs. Limit the number of vectors to keep housekeeping
>> CPUs from running out of IRQ vectors.
>>
>> "
>>
>>> 	if (housekeeping_cpus < num_online_cpus()) {
>>> 		if (housekeeping_cpus < min_vecs)
>>> 			max_vecs = min_vecs;
>>> 		else if (housekeeping_cpus < max_vecs)
>>> 			max_vecs = housekeeping_cpus;
>>> 	}
>> The only reason I went with hk_cpus instead of housekeeping_cpus is because
>> at other places in the kernel I found a similar naming convention (eg.
>> hk_mask, hk_flags etc.).
>> But if housekeeping_cpus makes things more clear, I can switch to that
>> instead.
>>
>> Although after Frederic and Peter's suggestion the previous call will change
>> to something like:
>>
>> "
>> housekeeping_cpus = housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ);
>> "
>>
>> Which should still falls in the that 80 chars per line limit.
> I don't really care whether it's "hk_cpus" or "housekeeping_cpus" as
> long as "housekeeping" appears in the code somewhere.  If we call
> "housekeeping_num_online_cpus()" that should be enough.

Got it, in that case we are good here.

>
>>> My comment isn't quite right because this patch only limits the number
>>> of vectors; it doesn't actually *move* IRQs to the housekeeping CPUs.
>> Yeap it doesn't move IRQs to the housekeeping CPUs.
>>
>>> I don't know where the move happens (or maybe you just avoid assigning
>>> IRQs to isolated CPUs, and I don't know how that happens either).
>> This can happen in the userspace, either manually or by some application
>> such as tuned.
> Some brief hint about this might be helpful.

Sure, I will try to come up with something on the lines what you suggested
i.e., it also includes the information that the IRQs are moved from the
userspace.


>
>>>>  	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
>>>>  					      NULL);
>>>>  }
>>>> -- 
>>>> 2.18.2
>>>>
>> -- 
>> Thanks
>> Nitesh
>>
>
>
-- 
Thanks
Nitesh


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

      reply	other threads:[~2020-09-24 23:41 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-23 18:11 [PATCH v2 0/4] isolation: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs Nitesh Narayan Lal
2020-09-24  8:40   ` peterz
2020-09-24 12:09     ` Frederic Weisbecker
2020-09-24 12:23       ` Nitesh Narayan Lal
2020-09-24 12:24       ` Peter Zijlstra
2020-09-24 12:11   ` Frederic Weisbecker
2020-09-24 12:26     ` Nitesh Narayan Lal
2020-09-24 12:46   ` Peter Zijlstra
2020-09-24 13:45     ` Nitesh Narayan Lal
2020-09-24 20:47   ` Bjorn Helgaas
2020-09-24 21:52     ` Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 2/4] sched/isolation: Extend nohz_full to isolate managed IRQs Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 3/4] i40e: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per " Nitesh Narayan Lal
2020-09-24 20:45   ` Bjorn Helgaas
2020-09-24 21:39     ` Nitesh Narayan Lal
2020-09-24 22:59       ` Bjorn Helgaas
2020-09-24 23:40         ` Nitesh Narayan Lal [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=176efdda-7d23-dea2-ef41-5df7cae56233@redhat.com \
    --to=nitesh@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=dennis.dalessandro@intel.com \
    --cc=frederic@kernel.org \
    --cc=hch@infradead.org \
    --cc=helgaas@kernel.org \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=jacob.e.keller@intel.com \
    --cc=jeffrey.t.kirsher@intel.com \
    --cc=jerinj@marvell.com \
    --cc=jesse.brandeburg@intel.com \
    --cc=jiri@nvidia.com \
    --cc=jlelli@redhat.com \
    --cc=juri.lelli@redhat.com \
    --cc=lihong.yang@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=mathias.nyman@intel.com \
    --cc=mike.marciniszyn@intel.com \
    --cc=mingo@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=sassmann@redhat.com \
    --cc=thomas.lendacky@amd.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox