Linux Documentation


* Re: [PATCH] cpuset: Enforce that a child's cpus must be a subset of the parent
From: Peter Zijlstra @ 2018-05-31  7:43 UTC (permalink / raw)
  To: Zefan Li
  Cc: Waiman Long, Tejun Heo, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi
In-Reply-To: <5B0F4F09.9050100@huawei.com>

On Thu, May 31, 2018 at 09:25:29AM +0800, Zefan Li wrote:
> Hi Waiman,
> 
> On 2018/5/30 21:46, Waiman Long wrote:
> > It was found that the cpuset.cpus could contain CPUs that are not listed
> > in their parent's cpu list as shown by the command sequence below:
> > 
> >   # echo "+cpuset" >cgroup.subtree_control
> >   # mkdir g1
> >   # echo 0-5 >g1/cpuset.cpus
> >   # mkdir g1/g11
> >   # echo "+cpuset" > g1/cgroup.subtree_control
> >   # echo 6-11 >g1/g11/cpuset.cpus
> >   # grep -R . g1 | grep "\.cpus"
> >   g1/cpuset.cpus:0-5
> >   g1/cpuset.cpus.effective:0-5
> >   g1/g11/cpuset.cpus:6-11
> >   g1/g11/cpuset.cpus.effective:0-5
> > 
> > As the intersection of g11's cpus and that of g1 is empty, the effective
> > cpus of g11 is just that of g1. The check in update_cpumask() is now
> > corrected to make sure that cpus in a child cpus must be a subset of
> > its parent's cpus. The error "write error: Invalid argument" will now
> > be reported in the above case.
> > 
> 
> We made the distinction between user-configured CPUs and effective CPUs
> in commit 7e88291beefbb758, so actually it's not a bug.

Why though; that makes no sense whatsoever.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH] cpuset: Enforce that a child's cpus must be a subset of the parent
From: Zefan Li @ 2018-05-31  8:12 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi
In-Reply-To: <5B0F4F09.9050100@huawei.com>

On 2018/5/31 9:25, Zefan Li wrote:
> Hi Waiman,
> 
> On 2018/5/30 21:46, Waiman Long wrote:
>> It was found that the cpuset.cpus could contain CPUs that are not listed
>> in their parent's cpu list as shown by the command sequence below:
>>
>>   # echo "+cpuset" >cgroup.subtree_control
>>   # mkdir g1
>>   # echo 0-5 >g1/cpuset.cpus
>>   # mkdir g1/g11
>>   # echo "+cpuset" > g1/cgroup.subtree_control
>>   # echo 6-11 >g1/g11/cpuset.cpus
>>   # grep -R . g1 | grep "\.cpus"
>>   g1/cpuset.cpus:0-5
>>   g1/cpuset.cpus.effective:0-5
>>   g1/g11/cpuset.cpus:6-11
>>   g1/g11/cpuset.cpus.effective:0-5
>>
>> As the intersection of g11's cpus and that of g1 is empty, the effective
>> cpus of g11 is just that of g1. The check in update_cpumask() is now
>> corrected to make sure that cpus in a child cpus must be a subset of
>> its parent's cpus. The error "write error: Invalid argument" will now
>> be reported in the above case.
>>
> 
> We made the distinction between user-configured CPUs and effective CPUs
> in commit 7e88291beefbb758, so actually it's not a bug.
> 

I remember the original reason is to support restoration of the original
cpu after cpu offline->online. We use user-configured CPUs to remember
if the cpu should be restored in the cpuset after it's onlined.
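
A minimal sketch of the distinction discussed above, not the kernel
implementation. It assumes a 32-bit mask with one bit per CPU; cpumask32,
cpu_range(), effective_cpus() and is_subset() are hypothetical names used
only for illustration. The effective set is the intersection of the
user-configured cpus with the parent's effective cpus, falling back to the
parent's set when that intersection is empty, as in the g1/g11 example.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per CPU: bit n set means CPU n is in the set (illustrative only). */
typedef uint32_t cpumask32;

/* Build a mask covering CPUs first..last inclusive, with last <= 31. */
cpumask32 cpu_range(unsigned int first, unsigned int last)
{
	assert(first <= last && last <= 31);

	cpumask32 upto_last = (last == 31) ? UINT32_MAX
					   : ((UINT32_C(1) << (last + 1)) - 1);
	cpumask32 below_first = (UINT32_C(1) << first) - 1;

	return upto_last & ~below_first;
}

/*
 * Effective CPUs as described in the thread: configured cpus restricted to
 * what the parent can actually use. An empty result inherits the parent's
 * effective set. The configured mask itself is never modified.
 */
cpumask32 effective_cpus(cpumask32 configured, cpumask32 parent_effective)
{
	cpumask32 both = configured & parent_effective;

	return both ? both : parent_effective;
}

/* The stricter rule the patch proposes: no configured CPU outside the parent. */
int is_subset(cpumask32 child, cpumask32 parent)
{
	return (child & ~parent) == 0;
}
```

With g1 configured as cpu_range(0, 5) and g11 as cpu_range(6, 11),
is_subset() fails while effective_cpus() still yields 0-5, which matches
the grep output in the report. Because the configured mask is kept as
written, recomputing effective_cpus() after a CPU comes back online
restores that CPU without any user action.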


* Re: [PATCH] cpuset: Enforce that a child's cpus must be a subset of the parent
From: Peter Zijlstra @ 2018-05-31  8:26 UTC (permalink / raw)
  To: Zefan Li
  Cc: Waiman Long, Tejun Heo, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi
In-Reply-To: <5B0FAE72.1090204@huawei.com>

On Thu, May 31, 2018 at 04:12:34PM +0800, Zefan Li wrote:
> On 2018/5/31 9:25, Zefan Li wrote:
> > Hi Waiman,
> > 
> > On 2018/5/30 21:46, Waiman Long wrote:
> >> It was found that the cpuset.cpus could contain CPUs that are not listed
> >> in their parent's cpu list as shown by the command sequence below:
> >>
> >>   # echo "+cpuset" >cgroup.subtree_control
> >>   # mkdir g1
> >>   # echo 0-5 >g1/cpuset.cpus
> >>   # mkdir g1/g11
> >>   # echo "+cpuset" > g1/cgroup.subtree_control
> >>   # echo 6-11 >g1/g11/cpuset.cpus
> >>   # grep -R . g1 | grep "\.cpus"
> >>   g1/cpuset.cpus:0-5
> >>   g1/cpuset.cpus.effective:0-5
> >>   g1/g11/cpuset.cpus:6-11
> >>   g1/g11/cpuset.cpus.effective:0-5
> >>
> >> As the intersection of g11's cpus and that of g1 is empty, the effective
> >> cpus of g11 is just that of g1. The check in update_cpumask() is now
> >> corrected to make sure that cpus in a child cpus must be a subset of
> >> its parent's cpus. The error "write error: Invalid argument" will now
> >> be reported in the above case.
> >>
> > 
> > We made the distinction between user-configured CPUs and effective CPUs
> > in commit 7e88291beefbb758, so actually it's not a bug.
> > 
> 
> I remember the original reason is to support restoration of the original
> cpu after cpu offline->online. We use user-configured CPUs to remember
> if the cpu should be restored in the cpuset after it's onlined.

AFAICT you can do that and still have the child a subset of the parent,
no?

* Re: [PATCH] cpuset: Enforce that a child's cpus must be a subset of the parent
From: Juri Lelli @ 2018-05-31  8:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Zefan Li, Waiman Long, Tejun Heo, Johannes Weiner, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Patrick Bellasi
In-Reply-To: <20180531082613.GF12180@hirez.programming.kicks-ass.net>

On 31/05/18 10:26, Peter Zijlstra wrote:
> On Thu, May 31, 2018 at 04:12:34PM +0800, Zefan Li wrote:
> > On 2018/5/31 9:25, Zefan Li wrote:
> > > Hi Waiman,
> > > 
> > > On 2018/5/30 21:46, Waiman Long wrote:
> > >> It was found that the cpuset.cpus could contain CPUs that are not listed
> > >> in their parent's cpu list as shown by the command sequence below:
> > >>
> > >>   # echo "+cpuset" >cgroup.subtree_control
> > >>   # mkdir g1
> > >>   # echo 0-5 >g1/cpuset.cpus
> > >>   # mkdir g1/g11
> > >>   # echo "+cpuset" > g1/cgroup.subtree_control
> > >>   # echo 6-11 >g1/g11/cpuset.cpus
> > >>   # grep -R . g1 | grep "\.cpus"
> > >>   g1/cpuset.cpus:0-5
> > >>   g1/cpuset.cpus.effective:0-5
> > >>   g1/g11/cpuset.cpus:6-11
> > >>   g1/g11/cpuset.cpus.effective:0-5
> > >>
> > >> As the intersection of g11's cpus and that of g1 is empty, the effective
> > >> cpus of g11 is just that of g1. The check in update_cpumask() is now
> > >> corrected to make sure that cpus in a child cpus must be a subset of
> > >> its parent's cpus. The error "write error: Invalid argument" will now
> > >> be reported in the above case.
> > >>
> > > 
> > > We made the distinction between user-configured CPUs and effective CPUs
> > > in commit 7e88291beefbb758, so actually it's not a bug.
> > > 
> > 
> > I remember the original reason is to support restoration of the original
> > cpu after cpu offline->online. We use user-configured CPUs to remember
> > if the cpu should be restored in the cpuset after it's onlined.
> 
> AFAICT you can do that and still have the child a subset of the parent,
> no?

Plus this is not hotplug, but a user decision. It could make sense to
keep .cpus unmodified after hotplug events, but does it make sense to
let the user choose cpus outside the parent domain?

* Re: [PATCH] cpuset: Enforce that a child's cpus must be a subset of the parent
From: Zefan Li @ 2018-05-31  8:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Tejun Heo, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi
In-Reply-To: <20180531082613.GF12180@hirez.programming.kicks-ass.net>

On 2018/5/31 16:26, Peter Zijlstra wrote:
> On Thu, May 31, 2018 at 04:12:34PM +0800, Zefan Li wrote:
>> On 2018/5/31 9:25, Zefan Li wrote:
>>> Hi Waiman,
>>>
>>> On 2018/5/30 21:46, Waiman Long wrote:
>>>> It was found that the cpuset.cpus could contain CPUs that are not listed
>>>> in their parent's cpu list as shown by the command sequence below:
>>>>
>>>>   # echo "+cpuset" >cgroup.subtree_control
>>>>   # mkdir g1
>>>>   # echo 0-5 >g1/cpuset.cpus
>>>>   # mkdir g1/g11
>>>>   # echo "+cpuset" > g1/cgroup.subtree_control
>>>>   # echo 6-11 >g1/g11/cpuset.cpus
>>>>   # grep -R . g1 | grep "\.cpus"
>>>>   g1/cpuset.cpus:0-5
>>>>   g1/cpuset.cpus.effective:0-5
>>>>   g1/g11/cpuset.cpus:6-11
>>>>   g1/g11/cpuset.cpus.effective:0-5
>>>>
>>>> As the intersection of g11's cpus and that of g1 is empty, the effective
>>>> cpus of g11 is just that of g1. The check in update_cpumask() is now
>>>> corrected to make sure that cpus in a child cpus must be a subset of
>>>> its parent's cpus. The error "write error: Invalid argument" will now
>>>> be reported in the above case.
>>>>
>>>
>>> We made the distinction between user-configured CPUs and effective CPUs
>>> in commit 7e88291beefbb758, so actually it's not a bug.
>>>
>>
>> I remember the original reason is to support restoration of the original
>> cpu after cpu offline->online. We use user-configured CPUs to remember
>> if the cpu should be restored in the cpuset after it's onlined.
> 
> AFAICT you can do that and still have the child a subset of the parent,
> no?
> .

Sure. IIRC this was suggested by Tejun as he had done something similar to devcgroup.


* Re: [PATCH v9 2/7] cpuset: Add new v2 cpuset.sched.domain_root flag
From: Peter Zijlstra @ 2018-05-31  9:49 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi
In-Reply-To: <1527601294-3444-3-git-send-email-longman@redhat.com>

On Tue, May 29, 2018 at 09:41:29AM -0400, Waiman Long wrote:
> +  cpuset.sched.domain_root
> +	A read-write single value file which exists on non-root
> +	cpuset-enabled cgroups.  It is a binary value flag that accepts
> +	either "0" (off) or "1" (on).  This flag is set by the parent
> +	and is not delegatable.

What does "is not delegatable" mean?

I think you used to say "is owned by the parent", which I took to mean
file ownership is that of the parent directory (..) and not of the
current (.), which is slightly odd but works.

So if you chown a cgroup to a user, that user will not be able to change
the file of its 'root' (which will actually be the root in the case of a
container), but it _can_ change this file for any sub-cgroups it
creates, right?

So in that respect the feature is delegatable: a container can create
sub-partitions. It just cannot change its 'root' partition, which is
consistent with a real root.

The only inconsistency left is then that the real root does not have
the file at all, vs a container root having it, but not accessible.

* Re: [PATCH v2 1/7] PCI: endpoint: Add MSI-X interfaces
From: Kishon Vijay Abraham I @ 2018-05-31 10:30 UTC (permalink / raw)
  To: Gustavo Pimentel, bhelgaas, lorenzo.pieralisi, Joao.Pinto,
	jingoohan1, adouglas, jesper.nilsson
  Cc: linux-pci, linux-doc, linux-kernel
In-Reply-To: <1c07395b28f8b33ce370bd6f15bb1e7f2a95ba45.1526576613.git.gustavo.pimentel@synopsys.com>



On Thursday 17 May 2018 10:39 PM, Gustavo Pimentel wrote:
> Add PCI_EPC_IRQ_MSIX type.
> 
> Add MSI-X callbacks signatures to the ops structure.
> 
> Add sysfs interface for set/get MSI-X capability maximum number.
> 
> Change pci_epc_raise_irq() signature, namely the interrupt_num variable type
> from u8 to u16 to accommodate 2048 maximum MSI-X interrupts.
> 
> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>

Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
> ---
> Change v1->v2:
>  - Nothing changed, just to follow the patch set version.
> 
>  drivers/pci/endpoint/pci-ep-cfs.c   | 24 +++++++++++++++
>  drivers/pci/endpoint/pci-epc-core.c | 59 ++++++++++++++++++++++++++++++++++++-
>  include/linux/pci-epc.h             | 13 ++++++--
>  include/linux/pci-epf.h             |  1 +
>  4 files changed, 94 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/endpoint/pci-ep-cfs.c b/drivers/pci/endpoint/pci-ep-cfs.c
> index 018ea34..d1288a0 100644
> --- a/drivers/pci/endpoint/pci-ep-cfs.c
> +++ b/drivers/pci/endpoint/pci-ep-cfs.c
> @@ -286,6 +286,28 @@ static ssize_t pci_epf_msi_interrupts_show(struct config_item *item,
>  		       to_pci_epf_group(item)->epf->msi_interrupts);
>  }
>  
> +static ssize_t pci_epf_msix_interrupts_store(struct config_item *item,
> +					     const char *page, size_t len)
> +{
> +	u16 val;
> +	int ret;
> +
> +	ret = kstrtou16(page, 0, &val);
> +	if (ret)
> +		return ret;
> +
> +	to_pci_epf_group(item)->epf->msix_interrupts = val;
> +
> +	return len;
> +}
> +
> +static ssize_t pci_epf_msix_interrupts_show(struct config_item *item,
> +					    char *page)
> +{
> +	return sprintf(page, "%d\n",
> +		       to_pci_epf_group(item)->epf->msix_interrupts);
> +}
> +
>  PCI_EPF_HEADER_R(vendorid)
>  PCI_EPF_HEADER_W_u16(vendorid)
>  
> @@ -327,6 +349,7 @@ CONFIGFS_ATTR(pci_epf_, subsys_vendor_id);
>  CONFIGFS_ATTR(pci_epf_, subsys_id);
>  CONFIGFS_ATTR(pci_epf_, interrupt_pin);
>  CONFIGFS_ATTR(pci_epf_, msi_interrupts);
> +CONFIGFS_ATTR(pci_epf_, msix_interrupts);
>  
>  static struct configfs_attribute *pci_epf_attrs[] = {
>  	&pci_epf_attr_vendorid,
> @@ -340,6 +363,7 @@ static struct configfs_attribute *pci_epf_attrs[] = {
>  	&pci_epf_attr_subsys_id,
>  	&pci_epf_attr_interrupt_pin,
>  	&pci_epf_attr_msi_interrupts,
> +	&pci_epf_attr_msix_interrupts,
>  	NULL,
>  };
>  
> diff --git a/drivers/pci/endpoint/pci-epc-core.c b/drivers/pci/endpoint/pci-epc-core.c
> index b0ee427..a23aa75 100644
> --- a/drivers/pci/endpoint/pci-epc-core.c
> +++ b/drivers/pci/endpoint/pci-epc-core.c
> @@ -137,7 +137,7 @@ EXPORT_SYMBOL_GPL(pci_epc_start);
>   * Invoke to raise an MSI or legacy interrupt
>   */
>  int pci_epc_raise_irq(struct pci_epc *epc, u8 func_no,
> -		      enum pci_epc_irq_type type, u8 interrupt_num)
> +		      enum pci_epc_irq_type type, u16 interrupt_num)
>  {
>  	int ret;
>  	unsigned long flags;
> @@ -218,6 +218,63 @@ int pci_epc_set_msi(struct pci_epc *epc, u8 func_no, u8 interrupts)
>  EXPORT_SYMBOL_GPL(pci_epc_set_msi);
>  
>  /**
> + * pci_epc_get_msix() - get the number of MSI-X interrupt numbers allocated
> + * @epc: the EPC device to which MSI-X interrupts was requested
> + * @func_no: the endpoint function number in the EPC device
> + *
> + * Invoke to get the number of MSI-X interrupts allocated by the RC
> + */
> +int pci_epc_get_msix(struct pci_epc *epc, u8 func_no)
> +{
> +	int interrupt;
> +	unsigned long flags;
> +
> +	if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions)
> +		return 0;
> +
> +	if (!epc->ops->get_msix)
> +		return 0;
> +
> +	spin_lock_irqsave(&epc->lock, flags);
> +	interrupt = epc->ops->get_msix(epc, func_no);
> +	spin_unlock_irqrestore(&epc->lock, flags);
> +
> +	if (interrupt < 0)
> +		return 0;
> +
> +	return interrupt + 1;
> +}
> +EXPORT_SYMBOL_GPL(pci_epc_get_msix);
> +
> +/**
> + * pci_epc_set_msix() - set the number of MSI-X interrupt numbers required
> + * @epc: the EPC device on which MSI-X has to be configured
> + * @func_no: the endpoint function number in the EPC device
> + * @interrupts: number of MSI-X interrupts required by the EPF
> + *
> + * Invoke to set the required number of MSI-X interrupts.
> + */
> +int pci_epc_set_msix(struct pci_epc *epc, u8 func_no, u16 interrupts)
> +{
> +	int ret;
> +	unsigned long flags;
> +
> +	if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions ||
> +	    interrupts < 1 || interrupts > 2048)
> +		return -EINVAL;
> +
> +	if (!epc->ops->set_msix)
> +		return 0;
> +
> +	spin_lock_irqsave(&epc->lock, flags);
> +	ret = epc->ops->set_msix(epc, func_no, interrupts - 1);
> +	spin_unlock_irqrestore(&epc->lock, flags);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(pci_epc_set_msix);
> +
> +/**
>   * pci_epc_unmap_addr() - unmap CPU address from PCI address
>   * @epc: the EPC device on which address is allocated
>   * @func_no: the endpoint function number in the EPC device
> diff --git a/include/linux/pci-epc.h b/include/linux/pci-epc.h
> index 243eaa5..c73abc2 100644
> --- a/include/linux/pci-epc.h
> +++ b/include/linux/pci-epc.h
> @@ -17,6 +17,7 @@ enum pci_epc_irq_type {
>  	PCI_EPC_IRQ_UNKNOWN,
>  	PCI_EPC_IRQ_LEGACY,
>  	PCI_EPC_IRQ_MSI,
> +	PCI_EPC_IRQ_MSIX,
>  };
>  
>  /**
> @@ -30,6 +31,10 @@ enum pci_epc_irq_type {
>   *	     capability register
>   * @get_msi: ops to get the number of MSI interrupts allocated by the RC from
>   *	     the MSI capability register
> + * @set_msix: ops to set the requested number of MSI-X interrupts in the
> + *	     MSI-X capability register
> + * @get_msix: ops to get the number of MSI-X interrupts allocated by the RC
> + *	     from the MSI-X capability register
>   * @raise_irq: ops to raise a legacy or MSI interrupt
>   * @start: ops to start the PCI link
>   * @stop: ops to stop the PCI link
> @@ -48,8 +53,10 @@ struct pci_epc_ops {
>  			      phys_addr_t addr);
>  	int	(*set_msi)(struct pci_epc *epc, u8 func_no, u8 interrupts);
>  	int	(*get_msi)(struct pci_epc *epc, u8 func_no);
> +	int	(*set_msix)(struct pci_epc *epc, u8 func_no, u16 interrupts);
> +	int	(*get_msix)(struct pci_epc *epc, u8 func_no);
>  	int	(*raise_irq)(struct pci_epc *epc, u8 func_no,
> -			     enum pci_epc_irq_type type, u8 interrupt_num);
> +			     enum pci_epc_irq_type type, u16 interrupt_num);
>  	int	(*start)(struct pci_epc *epc);
>  	void	(*stop)(struct pci_epc *epc);
>  	struct module *owner;
> @@ -144,8 +151,10 @@ void pci_epc_unmap_addr(struct pci_epc *epc, u8 func_no,
>  			phys_addr_t phys_addr);
>  int pci_epc_set_msi(struct pci_epc *epc, u8 func_no, u8 interrupts);
>  int pci_epc_get_msi(struct pci_epc *epc, u8 func_no);
> +int pci_epc_set_msix(struct pci_epc *epc, u8 func_no, u16 interrupts);
> +int pci_epc_get_msix(struct pci_epc *epc, u8 func_no);
>  int pci_epc_raise_irq(struct pci_epc *epc, u8 func_no,
> -		      enum pci_epc_irq_type type, u8 interrupt_num);
> +		      enum pci_epc_irq_type type, u16 interrupt_num);
>  int pci_epc_start(struct pci_epc *epc);
>  void pci_epc_stop(struct pci_epc *epc);
>  struct pci_epc *pci_epc_get(const char *epc_name);
> diff --git a/include/linux/pci-epf.h b/include/linux/pci-epf.h
> index f7d6f48..9bb1f31 100644
> --- a/include/linux/pci-epf.h
> +++ b/include/linux/pci-epf.h
> @@ -119,6 +119,7 @@ struct pci_epf {
>  	struct pci_epf_header	*header;
>  	struct pci_epf_bar	bar[6];
>  	u8			msi_interrupts;
> +	u16			msix_interrupts;
>  	u8			func_no;
>  
>  	struct pci_epc		*epc;
> 
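
A minimal userspace sketch of the count encoding used by the quoted
pci_epc_set_msix() and pci_epc_get_msix(): the patch accepts 1 to 2048
interrupts, hands "interrupts - 1" to the controller op, and adds 1 back
when reading. The helper names msix_count_to_field() and
msix_field_to_count() are hypothetical and exist only for illustration;
they are not part of the patch or of the endpoint framework.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MSIX_MAX_VECTORS 2048	/* upper bound checked by the quoted patch */

/*
 * Hypothetical helper: convert a vector count (1..2048) into the N-1 value
 * that the quoted pci_epc_set_msix() passes to the set_msix op.
 * Returns 0 on success and -EINVAL for an out-of-range count.
 */
int msix_count_to_field(unsigned int count, uint16_t *field)
{
	assert(field != NULL);

	if (count < 1 || count > MSIX_MAX_VECTORS)
		return -EINVAL;

	*field = (uint16_t)(count - 1);
	return 0;
}

/*
 * Hypothetical helper: the inverse mapping, mirroring "interrupt + 1" in
 * the quoted pci_epc_get_msix(). A negative value from the op is reported
 * as 0 interrupts, as in the patch.
 */
unsigned int msix_field_to_count(int field)
{
	if (field < 0)
		return 0;

	return (unsigned int)field + 1;
}
```

The largest encoded value, 2047, does not fit in a u8, which is consistent
with the patch widening interrupt_num and the interrupts argument to u16.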

* [GIT PULL 0/7] perf/urgent fixes
From: Arnaldo Carvalho de Melo @ 2018-05-31 10:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Clark Williams, linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, Adrian Hunter, Agustin Vega-Frias,
	Alexander Shishkin, Andi Kleen, coresight, Daniel Borkmann,
	David Ahern, Ganapatrao Kulkarni, Heiko Carstens, He Kuang,
	Hendrik Brueckner, Jin Yao, Jiri Olsa, Jonathan Corbet, Kan Liang,
	kim.phillips, Kim Phillips, Lakshman Annadorai, Leo Yan,
	linux-arm-kernel, linux-doc, Martin Schwidefsky, Mathieu Poirier,
	Mike Leach, Namhyung Kim, netdev, Peter Zijlstra, Robert Walker,
	Shaokun Zhang, Simon Que, Stephane Eranian, Thomas Richter,
	Tor Jeremiassen, Wang Nan, Will Deacon, YueHaibing,
	Arnaldo Carvalho de Melo

Hi Ingo,

	Please consider pulling,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit f3903c9161f0d636a7b0ff03841628928457e64c:

  Merge tag 'perf-urgent-for-mingo-4.17-20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2018-05-15 08:20:45 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-urgent-for-mingo-4.17-20180531

for you to fetch changes up to 18a7057420f8b67f15d17087bf5c0863db752c8b:

  perf tools: Fix perf.data format description of NRCPUS header (2018-05-30 15:40:26 -0300)

----------------------------------------------------------------
perf/urgent fixes:

- Fix 'perf test Session topology' segfault on s390 (Thomas Richter)

- Fix NULL return handling in bpf__prepare_load() (YueHaibing)

- Fix indexing on Coresight ETM packet queue decoder (Mathieu Poirier)

- Fix perf.data format description of NRCPUS header (Arnaldo Carvalho de Melo)

- Update perf.data documentation section on cpu topology (Thomas Richter)

- Handle uncore event aliases in small groups properly (Kan Liang)

- Add missing perf_sample.addr into python sample dictionary (Leo Yan)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Arnaldo Carvalho de Melo (1):
      perf tools: Fix perf.data format description of NRCPUS header

Kan Liang (1):
      perf parse-events: Handle uncore event aliases in small groups properly

Leo Yan (1):
      perf script python: Add addr into perf sample dict

Mathieu Poirier (1):
      perf cs-etm: Fix indexing for decoder packet queue

Thomas Richter (2):
      perf test: "Session topology" dumps core on s390
      perf data: Update documentation section on cpu topology

YueHaibing (1):
      perf bpf: Fix NULL return handling in bpf__prepare_load()

 tools/perf/Documentation/perf.data-file-format.txt |  10 +-
 tools/perf/tests/topology.c                        |  30 ++++-
 tools/perf/util/bpf-loader.c                       |   6 +-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c    |  12 +-
 tools/perf/util/evsel.h                            |   1 +
 tools/perf/util/parse-events.c                     | 130 ++++++++++++++++++++-
 tools/perf/util/parse-events.h                     |   7 +-
 tools/perf/util/parse-events.y                     |   8 +-
 .../util/scripting-engines/trace-event-python.c    |   2 +
 9 files changed, 185 insertions(+), 21 deletions(-)

Test results:

The first ones are container (docker) based builds of tools/perf with
and without libelf support.  Where clang is available, it is also used
to build perf with/without libelf, and building with LIBCLANGLLVM=1
(built-in clang) with gcc and clang when clang and its devel libraries
are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from an HTTP server to
build them inside the container, to make it easier to build in a container
cluster. Those will come back later.

Several are cross builds (the ones with -x-ARCH and the android one), and those
may not have all the features built, due to the lack of multi-arch devel
packages, which so far are available and used on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
with a variety of command line event specifications to then intercept the
sys_perf_event syscall to check that the perf_event_attr fields are set up as
expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build tools/perf/
with a variety of feature sets, exercising the build with an incomplete set of
features as well as with a complete one. It is planned to have it run on each
of the containers mentioned above, using some container orchestration
infrastructure. Get in contact if interested in helping to get this in place.

   1 alpine:3.4                    : Ok   gcc (Alpine 5.3.0) 5.3.0
   2 alpine:3.5                    : Ok   gcc (Alpine 6.2.1) 6.2.1 20160822
   3 alpine:3.6                    : Ok   gcc (Alpine 6.3.0) 6.3.0
   4 alpine:3.7                    : Ok   gcc (Alpine 6.4.0) 6.4.0
   5 alpine:edge                   : Ok   gcc (Alpine 6.4.0) 6.4.0
   6 amazonlinux:1                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
   7 amazonlinux:2                 : Ok   gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
   8 android-ndk:r12b-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
   9 android-ndk:r15c-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  10 centos:5                      : Ok   gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
  11 centos:6                      : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
  12 centos:7                      : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
  13 debian:7                      : Ok   gcc (Debian 4.7.2-5) 4.7.2
  14 debian:8                      : Ok   gcc (Debian 4.9.2-10+deb8u1) 4.9.2
  15 debian:9                      : Ok   gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
  16 debian:experimental           : Ok   gcc (Debian 7.3.0-19) 7.3.0
  17 debian:experimental-x-arm64   : Ok   aarch64-linux-gnu-gcc (Debian 7.3.0-19) 7.3.0
  18 debian:experimental-x-mips    : Ok   mips-linux-gnu-gcc (Debian 7.3.0-19) 7.3.0
  19 debian:experimental-x-mips64  : Ok   mips64-linux-gnuabi64-gcc (Debian 7.3.0-18) 7.3.0
  20 debian:experimental-x-mipsel  : Ok   mipsel-linux-gnu-gcc (Debian 7.3.0-19) 7.3.0
  21 fedora:20                     : Ok   gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
  22 fedora:21                     : Ok   gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
  23 fedora:22                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  24 fedora:23                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  25 fedora:24                     : Ok   gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
  26 fedora:24-x-ARC-uClibc        : Ok   arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
  27 fedora:25                     : Ok   gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1)
  28 fedora:26                     : Ok   gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2)
  29 fedora:27                     : Ok   gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
  30 fedora:28                     : Ok   gcc (GCC) 8.1.1 20180502 (Red Hat 8.1.1-1)
  31 fedora:rawhide                : Ok   gcc (GCC) 8.0.1 20180324 (Red Hat 8.0.1-0.20)
  32 gentoo-stage3-amd64:latest    : Ok   gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0
  33 mageia:5                      : Ok   gcc (GCC) 4.9.2
  34 mageia:6                      : Ok   gcc (Mageia 5.5.0-1.mga6) 5.5.0
  35 opensuse:42.1                 : Ok   gcc (SUSE Linux) 4.8.5
  36 opensuse:42.2                 : Ok   gcc (SUSE Linux) 4.8.5
  37 opensuse:42.3                 : Ok   gcc (SUSE Linux) 4.8.5
  38 opensuse:tumbleweed           : Ok   gcc (SUSE Linux) 7.3.1 20180323 [gcc-7-branch revision 258812]
  39 oraclelinux:6                 : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18.0.7)
  40 oraclelinux:7                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28.0.1)
  41 ubuntu:12.04.5                : Ok   gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
  42 ubuntu:14.04.4                : Ok   gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
  43 ubuntu:14.04.4-x-linaro-arm64 : Ok   aarch64-linux-gnu-gcc (Linaro GCC 5.5-2017.10) 5.5.0
  44 ubuntu:16.04                  : Ok   gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  45 ubuntu:16.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  46 ubuntu:16.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  47 ubuntu:16.04-x-powerpc        : Ok   powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  48 ubuntu:16.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  49 ubuntu:16.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  50 ubuntu:16.04-x-s390           : Ok   s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  51 ubuntu:16.10                  : Ok   gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005
  52 ubuntu:17.04                  : Ok   gcc (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406
  53 ubuntu:17.10                  : Ok   gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0
  54 ubuntu:18.04                  : Ok   gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0

  # git log --oneline -1
  18a7057420f8 (HEAD -> perf/urgent) perf tools: Fix perf.data format description of NRCPUS header
  # perf --version
  perf version 4.17.rc5.g18a7057
  # uname -a
  Linux jouet 4.17.0-rc5 #21 SMP Mon May 14 15:35:35 -03 2018 x86_64 x86_64 x86_64 GNU/Linux
  # perf test
   1: vmlinux symtab matches kallsyms                       : Ok
   2: Detect openat syscall event                           : Ok
   3: Detect openat syscall event on all cpus               : Ok
   4: Read samples using the mmap interface                 : Ok
   5: Test data source output                               : Ok
   6: Parse event definition strings                        : Ok
   7: Simple expression parser                              : Ok
   8: PERF_RECORD_* events & perf_sample fields             : Ok
   9: Parse perf pmu format                                 : Ok
  10: DSO data read                                         : Ok
  11: DSO data cache                                        : Ok
  12: DSO data reopen                                       : Ok
  13: Roundtrip evsel->name                                 : Ok
  14: Parse sched tracepoints fields                        : Ok
  15: syscalls:sys_enter_openat event fields                : Ok
  16: Setup struct perf_event_attr                          : Ok
  17: Match and link multiple hists                         : Ok
  18: 'import perf' in python                               : Ok
  19: Breakpoint overflow signal handler                    : Ok
  20: Breakpoint overflow sampling                          : Ok
  21: Breakpoint accounting                                 : Ok
  22: Number of exit events of a simple workload            : Ok
  23: Software clock events period values                   : Ok
  24: Object code reading                                   : Ok
  25: Sample parsing                                        : Ok
  26: Use a dummy software event to keep tracking           : Ok
  27: Parse with no sample_id_all bit set                   : Ok
  28: Filter hist entries                                   : Ok
  29: Lookup mmap thread                                    : Ok
  30: Share thread mg                                       : Ok
  31: Sort output of hist entries                           : Ok
  32: Cumulate child hist entries                           : Ok
  33: Track with sched_switch                               : Ok
  34: Filter fds with revents mask in a fdarray             : Ok
  35: Add fd to a fdarray, making it autogrow               : Ok
  36: kmod_path__parse                                      : Ok
  37: Thread map                                            : Ok
  38: LLVM search and compile                               :
  38.1: Basic BPF llvm compile                              : Ok
  38.2: kbuild searching                                    : Ok
  38.3: Compile source for BPF prologue generation          : Ok
  38.4: Compile source for BPF relocation                   : Ok
  39: Session topology                                      : Ok
  40: BPF filter                                            :
  40.1: Basic BPF filtering                                 : Ok
  40.2: BPF pinning                                         : Ok
  40.3: BPF prologue generation                             : Ok
  40.4: BPF relocation checker                              : Ok
  41: Synthesize thread map                                 : Ok
  42: Remove thread map                                     : Ok
  43: Synthesize cpu map                                    : Ok
  44: Synthesize stat config                                : Ok
  45: Synthesize stat                                       : Ok
  46: Synthesize stat round                                 : Ok
  47: Synthesize attr update                                : Ok
  48: Event times                                           : Ok
  49: Read backward ring buffer                             : Ok
  50: Print cpu map                                         : Ok
  51: Probe SDT events                                      : Ok
  52: is_printable_array                                    : Ok
  53: Print bitmap                                          : Ok
  54: perf hooks                                            : Ok
  55: builtin clang support                                 : Skip (not compiled in)
  56: unit_number__scnprintf                                : Ok
  57: mem2node                                              : Ok
  58: x86 rdpmc                                             : Ok
  59: Convert perf time to TSC                              : Ok
  60: DWARF unwind                                          : Ok
  61: x86 instruction decoder - new instructions            : Ok
  62: Use vfs_getname probe to get syscall args filenames   : Ok
  63: Check open filename arg using perf trace + vfs_getname: Ok
  64: probe libc's inet_pton & backtrace it with ping       : Ok
  65: Add vfs_getname probe to get syscall args filenames   : Ok
  #

  $ make -C tools/perf build-test
  make: Entering directory '/home/acme/git/perf/tools/perf'
  - tarpkg: ./tests/perf-targz-src-pkg .
                   make_pure_O: make
                 make_static_O: make LDFLAGS=-static
           make_no_libpython_O: make NO_LIBPYTHON=1
                    make_doc_O: make doc
  make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
              make_no_libbpf_O: make NO_LIBBPF=1
           make_no_backtrace_O: make NO_BACKTRACE=1
            make_install_bin_O: make install-bin
            make_no_auxtrace_O: make NO_AUXTRACE=1
                  make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
                   make_tags_O: make tags
         make_install_prefix_O: make install prefix=/tmp/krava
              make_clean_all_O: make clean all
             make_no_libperl_O: make NO_LIBPERL=1
             make_util_map_o_O: make util/map.o
        make_with_babeltrace_O: make LIBBABELTRACE=1
                 make_perf_o_O: make perf.o
           make_no_libunwind_O: make NO_LIBUNWIND=1
                make_no_newt_O: make NO_NEWT=1
            make_no_libaudit_O: make NO_LIBAUDIT=1
                make_no_gtk2_O: make NO_GTK2=1
                make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1
             make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
           make_no_libbionic_O: make NO_LIBBIONIC=1
         make_with_clangllvm_O: make LIBCLANGLLVM=1
   make_install_prefix_slash_O: make install prefix=/tmp/krava/
             make_no_libnuma_O: make NO_LIBNUMA=1
               make_no_slang_O: make NO_SLANG=1
                make_install_O: make install
              make_no_libelf_O: make NO_LIBELF=1
       make_util_pmu_bison_o_O: make util/pmu-bison.o
                   make_help_O: make help
                  make_debug_O: make DEBUG=1
            make_no_demangle_O: make NO_DEMANGLE=1
  OK
  make: Leaving directory '/home/acme/git/perf/tools/perf'
  $

* [PATCH 6/7] perf script python: Add addr into perf sample dict
From: Arnaldo Carvalho de Melo @ 2018-05-31 10:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Clark Williams, linux-kernel, linux-perf-users, Leo Yan,
	Alexander Shishkin, Jiri Olsa, Jonathan Corbet, Mathieu Poirier,
	Mike Leach, Namhyung Kim, Peter Zijlstra, Robert Walker,
	Tor Jeremiassen, coresight, kim.phillips, linux-arm-kernel,
	linux-doc, Arnaldo Carvalho de Melo
In-Reply-To: <20180531103220.24684-1-acme@kernel.org>

From: Leo Yan <leo.yan@linaro.org>

ARM CoreSight auxtrace uses 'sample->addr' to record the target address
for branch instructions, so the data of 'sample->addr' is required for
tracing data analysis.

This commit adds 'sample->addr' to the perf sample dict so that Python
scripts can use it when parsing events.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Walker <robert.walker@arm.com>
Cc: Tor Jeremiassen <tor@ti.com>
Cc: coresight@lists.linaro.org
Cc: kim.phillips@arm.co
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1527497103-3593-3-git-send-email-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/scripting-engines/trace-event-python.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 10dd5fce082b..7f8afacd08ee 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -531,6 +531,8 @@ static PyObject *get_perf_sample_dict(struct perf_sample *sample,
 			PyLong_FromUnsignedLongLong(sample->period));
 	pydict_set_item_string_decref(dict_sample, "phys_addr",
 			PyLong_FromUnsignedLongLong(sample->phys_addr));
+	pydict_set_item_string_decref(dict_sample, "addr",
+			PyLong_FromUnsignedLongLong(sample->addr));
 	set_sample_read_in_dict(dict_sample, sample, evsel);
 	pydict_set_item_string_decref(dict, "sample", dict_sample);
 
-- 
2.14.3


* Re: [GIT PULL 0/7] perf/urgent fixes
From: Ingo Molnar @ 2018-05-31 10:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter,
	Agustin Vega-Frias, Alexander Shishkin, Andi Kleen, coresight,
	Daniel Borkmann, David Ahern, Ganapatrao Kulkarni, Heiko Carstens,
	He Kuang, Hendrik Brueckner, Jin Yao, Jiri Olsa, Jonathan Corbet,
	Kan Liang, kim.phillips, Kim Phillips, Lakshman Annadorai,
	Leo Yan, linux-arm-kernel, linux-doc, Martin Schwidefsky,
	Mathieu Poirier, Mike Leach, Namhyung Kim, netdev, Peter Zijlstra,
	Robert Walker, Shaokun Zhang, Simon Que, Stephane Eranian,
	Thomas Richter, Tor Jeremiassen, Wang Nan, Will Deacon,
	YueHaibing, Arnaldo Carvalho de Melo
In-Reply-To: <20180531103220.24684-1-acme@kernel.org>


* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo,
> 
> 	Please consider pulling,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit f3903c9161f0d636a7b0ff03841628928457e64c:
> 
>   Merge tag 'perf-urgent-for-mingo-4.17-20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2018-05-15 08:20:45 +0200)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-urgent-for-mingo-4.17-20180531
> 
> for you to fetch changes up to 18a7057420f8b67f15d17087bf5c0863db752c8b:
> 
>   perf tools: Fix perf.data format description of NRCPUS header (2018-05-30 15:40:26 -0300)
> 
> ----------------------------------------------------------------
> perf/urgent fixes:
> 
> - Fix 'perf test Session topology' segfault on s390 (Thomas Richter)
> 
> - Fix NULL return handling in bpf__prepare_load() (YueHaibing)
> 
> - Fix indexing on Coresight ETM packet queue decoder (Mathieu Poirier)
> 
> - Fix perf.data format description of NRCPUS header (Arnaldo Carvalho de Melo)
> 
> - Update perf.data documentation section on cpu topology
> 
> - Handle uncore event aliases in small groups properly (Kan Liang)
> 
> - Add missing perf_sample.addr into python sample dictionary (Leo Yan)
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> ----------------------------------------------------------------
> Arnaldo Carvalho de Melo (1):
>       perf tools: Fix perf.data format description of NRCPUS header
> 
> Kan Liang (1):
>       perf parse-events: Handle uncore event aliases in small groups properly
> 
> Leo Yan (1):
>       perf script python: Add addr into perf sample dict
> 
> Mathieu Poirier (1):
>       perf cs-etm: Fix indexing for decoder packet queue
> 
> Thomas Richter (2):
>       perf test: "Session topology" dumps core on s390
>       perf data: Update documentation section on cpu topology
> 
> YueHaibing (1):
>       perf bpf: Fix NULL return handling in bpf__prepare_load()
> 
>  tools/perf/Documentation/perf.data-file-format.txt |  10 +-
>  tools/perf/tests/topology.c                        |  30 ++++-
>  tools/perf/util/bpf-loader.c                       |   6 +-
>  tools/perf/util/cs-etm-decoder/cs-etm-decoder.c    |  12 +-
>  tools/perf/util/evsel.h                            |   1 +
>  tools/perf/util/parse-events.c                     | 130 ++++++++++++++++++++-
>  tools/perf/util/parse-events.h                     |   7 +-
>  tools/perf/util/parse-events.y                     |   8 +-
>  .../util/scripting-engines/trace-event-python.c    |   2 +
>  9 files changed, 185 insertions(+), 21 deletions(-)

Pulled, thanks a lot Arnaldo!

	Ingo

* Re: [PATCH v9 3/7] cpuset: Add cpuset.sched.load_balance flag to v2
From: Peter Zijlstra @ 2018-05-31 10:44 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi
In-Reply-To: <1527601294-3444-4-git-send-email-longman@redhat.com>

On Tue, May 29, 2018 at 09:41:30AM -0400, Waiman Long wrote:
> diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
> index e7534c5..681a809 100644
> --- a/Documentation/cgroup-v2.txt
> +++ b/Documentation/cgroup-v2.txt
> @@ -1542,6 +1542,32 @@ Cpuset Interface Files
>  	Further changes made to "cpuset.cpus" is allowed as long as
>  	the first condition above is still true.
>  
> +	A parent scheduling domain root cgroup cannot distribute all
> +	its CPUs to its child scheduling domain root cgroups

This I think wants to be in the previous patch

>                                                            unless
> +	its load balancing flag is turned off.

And this is indeed for here.

* Re: [PATCH v2 2/7] PCI: dwc: Add MSI-X callbacks handler
From: Kishon Vijay Abraham I @ 2018-05-31 10:49 UTC (permalink / raw)
  To: Gustavo Pimentel, bhelgaas, lorenzo.pieralisi, Joao.Pinto,
	jingoohan1, adouglas, jesper.nilsson
  Cc: linux-pci, linux-doc, linux-kernel
In-Reply-To: <24e9cdc20d57aaea0bcb62879824c57004df46c9.1526576613.git.gustavo.pimentel@synopsys.com>

Hi,

On Thursday 17 May 2018 10:39 PM, Gustavo Pimentel wrote:
> Change pcie_raise_irq() signature, namely the interrupt_num variable type
> from u8 to u16 to accommodate 2048 maximum MSI-X interrupts.
> 
> Add PCIe config space capability search function.
> 
> Add sysfs set/get interface to allow the change of EP MSI-X maximum number.
> 
> Add EP MSI-X callback for triggering interrupts.
> 
> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
> ---
> Change v1->v2:
>  - Nothing changed, just to follow the patch set version.
> 
>  drivers/pci/dwc/pci-dra7xx.c           |   2 +-
>  drivers/pci/dwc/pcie-artpec6.c         |   2 +-
>  drivers/pci/dwc/pcie-designware-ep.c   | 146 ++++++++++++++++++++++++++++++++-
>  drivers/pci/dwc/pcie-designware-plat.c |   4 +-
>  drivers/pci/dwc/pcie-designware.h      |  14 +++-
>  5 files changed, 163 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
> index f688204..bdf948b 100644
> --- a/drivers/pci/dwc/pci-dra7xx.c
> +++ b/drivers/pci/dwc/pci-dra7xx.c
> @@ -370,7 +370,7 @@ static void dra7xx_pcie_raise_msi_irq(struct dra7xx_pcie *dra7xx,
>  }
>  
>  static int dra7xx_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> -				 enum pci_epc_irq_type type, u8 interrupt_num)
> +				 enum pci_epc_irq_type type, u16 interrupt_num)
>  {
>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>  	struct dra7xx_pcie *dra7xx = to_dra7xx_pcie(pci);
> diff --git a/drivers/pci/dwc/pcie-artpec6.c b/drivers/pci/dwc/pcie-artpec6.c
> index 321b56c..9a2474b 100644
> --- a/drivers/pci/dwc/pcie-artpec6.c
> +++ b/drivers/pci/dwc/pcie-artpec6.c
> @@ -428,7 +428,7 @@ static void artpec6_pcie_ep_init(struct dw_pcie_ep *ep)
>  }
>  
>  static int artpec6_pcie_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> -				  enum pci_epc_irq_type type, u8 interrupt_num)
> +				  enum pci_epc_irq_type type, u16 interrupt_num)
>  {
>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);

I think you should change pci_epc_raise_irq (in previous patch) and the above
two changes in a separate patch. You can also include pcie-cadence-ep.c along
with that.
>  
> diff --git a/drivers/pci/dwc/pcie-designware-ep.c b/drivers/pci/dwc/pcie-designware-ep.c
> index 1eec441..e5f2377 100644
> --- a/drivers/pci/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/dwc/pcie-designware-ep.c
> @@ -40,6 +40,39 @@ void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar)
>  	__dw_pcie_ep_reset_bar(pci, bar, 0);
>  }
>  
> +u8 __dw_pcie_ep_find_next_cap(struct dw_pcie *pci, u8 cap_ptr,
> +			      u8 cap)
> +{
> +	u8 cap_id, next_cap_ptr;
> +	u16 reg;
> +
> +	reg = dw_pcie_readw_dbi(pci, cap_ptr);
> +	next_cap_ptr = (reg & 0xff00) >> 8;
> +	cap_id = (reg & 0x00ff);
> +
> +	if (!next_cap_ptr || cap_id > PCI_CAP_ID_MAX)
> +		return 0;
> +
> +	if (cap_id == cap)
> +		return cap_ptr;
> +
> +	return __dw_pcie_ep_find_next_cap(pci, next_cap_ptr, cap);
> +}
> +
> +u8 dw_pcie_ep_find_capability(struct dw_pcie *pci, u8 cap)
> +{
> +	u8 next_cap_ptr;
> +	u16 reg;
> +
> +	reg = dw_pcie_readw_dbi(pci, PCI_CAPABILITY_LIST);
> +	next_cap_ptr = (reg & 0x00ff);
> +
> +	if (!next_cap_ptr)
> +		return 0;
> +
> +	return __dw_pcie_ep_find_next_cap(pci, next_cap_ptr, cap);
> +}
> +
>  static int dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no,
>  				   struct pci_epf_header *hdr)
>  {
> @@ -241,8 +274,47 @@ static int dw_pcie_ep_set_msi(struct pci_epc *epc, u8 func_no, u8 encode_int)
>  	return 0;
>  }
>  
> +static int dw_pcie_ep_get_msix(struct pci_epc *epc, u8 func_no)
> +{
> +	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> +	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +	u32 val, reg;
> +
> +	if (!ep->msix_cap)
> +		return 0;

return -EINVAL?

or pci_epc_get_msix() will return 1.
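
The point about the return value can be illustrated with a small,
self-contained sketch. It is not the driver code: the function names are
hypothetical, -1 stands in for -EINVAL, and the "+ 1" decoding in the caller
is an assumption taken from the review comment above. The two masks are the
standard PCI_MSIX_FLAGS_QSIZE and PCI_MSIX_FLAGS_ENABLE values. The MSI-X
table size is stored as N - 1, so a callback that returns 0 for "no
capability" cannot be told apart from "one vector".

```c
#include <assert.h>
#include <stdint.h>

#define MSIX_FLAGS_QSIZE  0x07ff /* PCI_MSIX_FLAGS_QSIZE: table size, stored as N - 1 */
#define MSIX_FLAGS_ENABLE 0x8000 /* PCI_MSIX_FLAGS_ENABLE */

/* Hypothetical helper: store a vector count (1..2048) in Message Control. */
uint16_t msix_set_size(uint16_t flags, unsigned int n)
{
	assert(n >= 1 && n <= 2048);
	flags &= (uint16_t)~MSIX_FLAGS_QSIZE;
	flags |= (uint16_t)(n - 1);
	return flags;
}

/*
 * Hypothetical stand-in for a get_msix callback: returns the encoded
 * table size (N - 1), or a negative value when there is no capability
 * or MSI-X is not enabled.
 */
int msix_get_encoded(int has_cap, uint16_t flags)
{
	if (!has_cap)
		return -1;	/* stands in for -EINVAL */
	if (!(flags & MSIX_FLAGS_ENABLE))
		return -1;
	return flags & MSIX_FLAGS_QSIZE;
}

/* Hypothetical caller: turns the encoded value back into a vector count. */
int msix_vector_count(int encoded)
{
	if (encoded < 0)
		return 0;	/* error: report no vectors */
	return encoded + 1;
}
```

With this decoding, msix_vector_count(0) is 1, which is why a plain
"return 0" for a missing capability would be reported as one usable vector.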
> +
> +	reg = ep->msix_cap + PCI_MSIX_FLAGS;
> +	val = dw_pcie_readw_dbi(pci, reg);
> +	if (!(val & PCI_MSIX_FLAGS_ENABLE))
> +		return -EINVAL;
> +
> +	val &= PCI_MSIX_FLAGS_QSIZE;
> +
> +	return val;
> +}
> +
> +static int dw_pcie_ep_set_msix(struct pci_epc *epc, u8 func_no, u16 interrupts)
> +{
> +	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> +	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +	u32 val, reg;
> +
> +	if (!ep->msix_cap)
> +		return 0;

here too return -EINVAL.
> +
> +	reg = ep->msix_cap + PCI_MSIX_FLAGS;
> +	val = dw_pcie_readw_dbi(pci, reg);
> +	val &= ~PCI_MSIX_FLAGS_QSIZE;
> +	val |= interrupts;
> +	dw_pcie_dbi_ro_wr_en(pci);
> +	dw_pcie_writew_dbi(pci, reg, val);
> +	dw_pcie_dbi_ro_wr_dis(pci);
> +
> +	return 0;
> +}
> +
>  static int dw_pcie_ep_raise_irq(struct pci_epc *epc, u8 func_no,
> -				enum pci_epc_irq_type type, u8 interrupt_num)
> +				enum pci_epc_irq_type type, u16 interrupt_num)
>  {
>  	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
>  
> @@ -282,6 +354,8 @@ static const struct pci_epc_ops epc_ops = {
>  	.unmap_addr		= dw_pcie_ep_unmap_addr,
>  	.set_msi		= dw_pcie_ep_set_msi,
>  	.get_msi		= dw_pcie_ep_get_msi,
> +	.set_msix		= dw_pcie_ep_set_msix,
> +	.get_msix		= dw_pcie_ep_get_msix,
>  	.raise_irq		= dw_pcie_ep_raise_irq,
>  	.start			= dw_pcie_ep_start,
>  	.stop			= dw_pcie_ep_stop,
> @@ -322,6 +396,64 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
>  	return 0;
>  }
>  
> +int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> +			     u16 interrupt_num)
> +{
> +	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +	struct pci_epc *epc = ep->epc;
> +	u16 tbl_offset, bir;
> +	u32 bar_addr_upper, bar_addr_lower;
> +	u32 msg_addr_upper, msg_addr_lower;
> +	u32 reg, msg_data, vec_ctrl;
> +	u64 tbl_addr, msg_addr, reg_u64;
> +	void __iomem *msix_tbl;
> +	int ret;
> +
> +	reg = ep->msix_cap + PCI_MSIX_TABLE;
> +	tbl_offset = dw_pcie_readl_dbi(pci, reg);
> +	bir = (tbl_offset & PCI_MSIX_TABLE_BIR);
> +	tbl_offset &= PCI_MSIX_TABLE_OFFSET;
> +	tbl_offset >>= 3;
> +
> +	reg = PCI_BASE_ADDRESS_0 + (4 * bir);
> +	bar_addr_lower = dw_pcie_readl_dbi(pci, reg);
> +	reg_u64 = (bar_addr_lower & PCI_BASE_ADDRESS_MEM_TYPE_MASK);
> +	if (reg_u64 == PCI_BASE_ADDRESS_MEM_TYPE_64)
> +		bar_addr_upper = dw_pcie_readl_dbi(pci, reg + 4);
> +	else
> +		bar_addr_upper = 0;

You can skip the else if you use something like below:

bar_addr_upper = 0;
bar_addr_lower = dw_pcie_readl_dbi(pci, reg);
reg_u64 = (bar_addr_lower & PCI_BASE_ADDRESS_MEM_TYPE_MASK);
if (reg_u64 == PCI_BASE_ADDRESS_MEM_TYPE_64)
	bar_addr_upper = dw_pcie_readl_dbi(pci, reg + 4);
> +
> +	tbl_addr = ((u64) bar_addr_upper) << 32 | bar_addr_lower;
> +	tbl_addr += (tbl_offset + ((interrupt_num - 1) * PCI_MSIX_ENTRY_SIZE));
> +	tbl_addr &= PCI_BASE_ADDRESS_MEM_MASK;
> +
> +	msix_tbl = ioremap_nocache(ep->phys_base + tbl_addr, ep->addr_size);

Why do you want to ioremap the entire address region?
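
One way to read this question: only the single 16-byte table entry for the
vector being raised needs to be mapped. The sketch below computes where that
entry lives. It is an illustration only: the struct and function names are
hypothetical, and it assumes the Table Offset/BIR register layout from the
PCI spec (BIR in bits 2:0, the remaining bits an 8-byte-aligned byte offset).
Note that it treats the masked register value as a byte offset directly,
which is my reading of the spec; the quoted patch additionally shifts it
right by 3.

```c
#include <assert.h>
#include <stdint.h>

#define MSIX_TABLE_BIR    0x00000007u /* PCI_MSIX_TABLE_BIR */
#define MSIX_TABLE_OFFSET 0xfffffff8u /* PCI_MSIX_TABLE_OFFSET */
#define MSIX_ENTRY_SIZE   16u         /* PCI_MSIX_ENTRY_SIZE */

/* Hypothetical result type: where one MSI-X table entry lives. */
struct msix_entry_loc {
	unsigned int bir;	/* index of the BAR that holds the table */
	uint64_t offset;	/* byte offset of the entry inside that BAR */
	uint64_t size;		/* number of bytes that need to be mapped */
};

/*
 * table_reg is the raw 32-bit Table Offset/BIR register;
 * interrupt_num is 1-based, as in the quoted patch.
 */
struct msix_entry_loc msix_locate_entry(uint32_t table_reg,
					unsigned int interrupt_num)
{
	struct msix_entry_loc loc;

	assert(interrupt_num >= 1 && interrupt_num <= 2048);

	loc.bir = table_reg & MSIX_TABLE_BIR;
	loc.offset = (uint64_t)(table_reg & MSIX_TABLE_OFFSET) +
		     (uint64_t)(interrupt_num - 1) * MSIX_ENTRY_SIZE;
	loc.size = MSIX_ENTRY_SIZE;
	return loc;
}
```

A mapping of loc.size bytes at the BAR base plus loc.offset would then cover
the address, data and vector control words of that one entry.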
> +	if (!msix_tbl)
> +		return -EINVAL;
> +
> +	msg_addr_lower = readl(msix_tbl + PCI_MSIX_ENTRY_LOWER_ADDR);
> +	msg_addr_upper = readl(msix_tbl + PCI_MSIX_ENTRY_UPPER_ADDR);
> +	msg_addr = ((u64) msg_addr_upper) << 32 | msg_addr_lower;
> +	msg_data = readl(msix_tbl + PCI_MSIX_ENTRY_DATA);
> +	vec_ctrl = readl(msix_tbl + PCI_MSIX_ENTRY_VECTOR_CTRL);
> +
> +	if (vec_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT)
> +		return -EPERM;
> +
> +	iounmap(msix_tbl);
> +
> +	ret = dw_pcie_ep_map_addr(epc, func_no, ep->msix_mem_phys, msg_addr,
> +				  epc->mem->page_size);
> +	if (ret)
> +		return ret;
> +
> +	writel(msg_data, ep->msix_mem);
> +
> +	dw_pcie_ep_unmap_addr(epc, func_no, ep->msix_mem_phys);
> +
> +	return 0;
> +}
> +
>  void dw_pcie_ep_exit(struct dw_pcie_ep *ep)
>  {
>  	struct pci_epc *epc = ep->epc;
> @@ -329,6 +461,9 @@ void dw_pcie_ep_exit(struct dw_pcie_ep *ep)
>  	pci_epc_mem_free_addr(epc, ep->msi_mem_phys, ep->msi_mem,
>  			      epc->mem->page_size);
>  
> +	pci_epc_mem_free_addr(epc, ep->msix_mem_phys, ep->msix_mem,
> +			      epc->mem->page_size);
> +
>  	pci_epc_mem_exit(epc);
>  }
>  
> @@ -410,6 +545,15 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
>  		dev_err(dev, "Failed to reserve memory for MSI\n");
>  		return -ENOMEM;
>  	}
> +	ep->msi_cap = dw_pcie_ep_find_capability(pci, PCI_CAP_ID_MSI);

msi_cap is not used anywhere else.

Thanks
Kishon
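
For readers following the capability search in the quoted patch, here is a
minimal, self-contained sketch of the same walk over a simulated
configuration space. It is not the driver code: find_capability(),
sample_cfg and the hop limit are made up for illustration, while the
capability IDs and the 0x34 pointer offset are the standard PCI values.
One deliberate difference: the sketch compares the capability ID before
looking at the next pointer, so a capability that happens to be last in the
list (next pointer 0) is still found. As far as I can tell, the recursive
version quoted above returns 0 in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_CAPABILITY_LIST 0x34 /* PCI_CAPABILITY_LIST */
#define CAP_ID_PM           0x01 /* PCI_CAP_ID_PM */
#define CAP_ID_MSI          0x05 /* PCI_CAP_ID_MSI */
#define CAP_ID_EXP          0x10 /* PCI_CAP_ID_EXP */
#define CAP_ID_MSIX         0x11 /* PCI_CAP_ID_MSIX */
#define CAP_WALK_LIMIT      48   /* hypothetical guard against looping lists */

/* Simulated config space: PM at 0x40 -> MSI at 0x50 -> MSI-X at 0x70 (last). */
const uint8_t sample_cfg[256] = {
	[CFG_CAPABILITY_LIST] = 0x40,
	[0x40] = CAP_ID_PM,   [0x41] = 0x50,
	[0x50] = CAP_ID_MSI,  [0x51] = 0x70,
	[0x70] = CAP_ID_MSIX, [0x71] = 0x00,
};

/* Returns the config-space offset of capability 'cap', or 0 if absent. */
uint8_t find_capability(const uint8_t *cfg, uint8_t cap)
{
	uint8_t ptr;
	int hops = CAP_WALK_LIMIT;

	assert(cfg != NULL);

	/* The low two bits of a capability pointer are reserved. */
	ptr = cfg[CFG_CAPABILITY_LIST] & 0xfc;

	/* Capabilities live above the 64-byte standard header. */
	while (ptr >= 0x40 && hops-- > 0) {
		if (cfg[ptr] == cap)		/* byte 0: capability ID */
			return ptr;
		ptr = cfg[ptr + 1] & 0xfc;	/* byte 1: next pointer */
	}

	return 0;
}
```

With the sample layout, looking up CAP_ID_MSIX yields 0x70 even though it is
the final entry, and looking up CAP_ID_EXP yields 0 because it is not present.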

* Re: [PATCH v2 3/7] PCI: cadence: Update cdns_pcie_ep_raise_irq function signature
From: Kishon Vijay Abraham I @ 2018-05-31 10:51 UTC (permalink / raw)
  To: Gustavo Pimentel, bhelgaas, lorenzo.pieralisi, Joao.Pinto,
	jingoohan1, adouglas, jesper.nilsson
  Cc: linux-pci, linux-doc, linux-kernel
In-Reply-To: <4994f263efbf6a2cb952d3d9839fb3b1737efde9.1526576613.git.gustavo.pimentel@synopsys.com>

Hi Alan,

On Thursday 17 May 2018 10:39 PM, Gustavo Pimentel wrote:
> Change cdns_pcie_ep_raise_irq() signature, namely the interrupt_num
> variable type from u8 to u16 to accommodate 2048 maximum MSI-X
> interrupts.
> 
> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
> Acked-by: Alan Douglas <adouglas@cadence.com>

Do you want to add MSI-X support to cadence PCIe?

Thanks
Kishon

* Re: [PATCH v9 3/7] cpuset: Add cpuset.sched.load_balance flag to v2
From: Peter Zijlstra @ 2018-05-31 10:54 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi
In-Reply-To: <1527601294-3444-4-git-send-email-longman@redhat.com>

On Tue, May 29, 2018 at 09:41:30AM -0400, Waiman Long wrote:

> +  cpuset.sched.load_balance
> +	A read-write single value file which exists on non-root
> +	cpuset-enabled cgroups.  It is a binary value flag that accepts
> +	either "0" (off) or "1" (on).  This flag is set by the parent
> +	and is not delegatable.  It is on by default in the root cgroup.
> +
> +	When it is on, tasks within this cpuset will be load-balanced
> +	by the kernel scheduler.  Tasks will be moved from CPUs with
> +	high load to other CPUs within the same cpuset with less load
> +	periodically.
> +
> +	When it is off, there will be no load balancing among CPUs on
> +	this cgroup.  Tasks will stay in the CPUs they are running on
> +	and will not be moved to other CPUs.

That is not entirely accurate I'm afraid (unless the patch makes it so,
I've yet to check). When you disable load-balancing on a cgroup you'll
get whatever balancing is left for the partition you happen to end up
in.

Take for instance workqueue thingies, they use kthread_bind_mask()
(IIRC) and thus end up with PF_NO_SETAFFINITY so cpusets (or any other
cgroups really) do not have effect on them (long standing complaint).

So take for instance the unbound numa enabled workqueue threads, those
will land in whatever partition and get balanced there.

* Re: [PATCH v2 4/7] PCI: dwc: Rework MSI callbacks handler
From: Kishon Vijay Abraham I @ 2018-05-31 10:54 UTC (permalink / raw)
  To: Gustavo Pimentel, bhelgaas, lorenzo.pieralisi, Joao.Pinto,
	jingoohan1, adouglas, jesper.nilsson
  Cc: linux-pci, linux-doc, linux-kernel
In-Reply-To: <9558d4fcd6f888599e3b70133f1f242c7a6664ee.1526576613.git.gustavo.pimentel@synopsys.com>

Hi,

On Thursday 17 May 2018 10:39 PM, Gustavo Pimentel wrote:
> Remove duplicate defines from pcie-designware.h that are already
> available in include/uapi/linux/pci_regs.h.
> 
> Add pci_epc_set_msi() maximum 32 interrupts validation.
> 
> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
> ---
> Change v1->v2:
>  - Nothing changed, just to follow the patch set version.
> 
>  drivers/pci/dwc/pcie-designware-ep.c | 49 ++++++++++++++++++++++++------------
>  drivers/pci/dwc/pcie-designware.h    | 11 --------
>  drivers/pci/endpoint/pci-epc-core.c  |  3 ++-
>  3 files changed, 35 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/pci/dwc/pcie-designware-ep.c b/drivers/pci/dwc/pcie-designware-ep.c
> index e5f2377..a4baa0d 100644
> --- a/drivers/pci/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/dwc/pcie-designware-ep.c
> @@ -246,29 +246,38 @@ static int dw_pcie_ep_map_addr(struct pci_epc *epc, u8 func_no,
>  
>  static int dw_pcie_ep_get_msi(struct pci_epc *epc, u8 func_no)
>  {
> -	int val;
>  	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +	u32 val, reg;
> +
> +	if (!ep->msi_cap)

Ah, msi_cap is used here.
> +		return 0;

return -EINVAL.
>  
> -	val = dw_pcie_readw_dbi(pci, MSI_MESSAGE_CONTROL);
> -	if (!(val & MSI_CAP_MSI_EN_MASK))
> +	reg = ep->msi_cap + PCI_MSI_FLAGS;
> +	val = dw_pcie_readw_dbi(pci, reg);
> +	if (!(val & PCI_MSI_FLAGS_ENABLE))
>  		return -EINVAL;
>  
> -	val = (val & MSI_CAP_MME_MASK) >> MSI_CAP_MME_SHIFT;
> +	val = (val & PCI_MSI_FLAGS_QSIZE) >> 4;
> +
>  	return val;
>  }
>  
> -static int dw_pcie_ep_set_msi(struct pci_epc *epc, u8 func_no, u8 encode_int)
> +static int dw_pcie_ep_set_msi(struct pci_epc *epc, u8 func_no, u8 interrupts)
>  {
> -	int val;
>  	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +	u32 val, reg;
>  
> -	val = dw_pcie_readw_dbi(pci, MSI_MESSAGE_CONTROL);
> -	val &= ~MSI_CAP_MMC_MASK;
> -	val |= (encode_int << MSI_CAP_MMC_SHIFT) & MSI_CAP_MMC_MASK;
> +	if (!ep->msi_cap)
> +		return 0;

return -EINVAL.
> +
> +	reg = ep->msi_cap + PCI_MSI_FLAGS;
> +	val = dw_pcie_readw_dbi(pci, reg);
> +	val &= ~PCI_MSI_FLAGS_QMASK;
> +	val |= (interrupts << 1) & PCI_MSI_FLAGS_QMASK;
>  	dw_pcie_dbi_ro_wr_en(pci);
> -	dw_pcie_writew_dbi(pci, MSI_MESSAGE_CONTROL, val);
> +	dw_pcie_writew_dbi(pci, reg, val);
>  	dw_pcie_dbi_ro_wr_dis(pci);
>  
>  	return 0;
> @@ -367,21 +376,29 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
>  	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>  	struct pci_epc *epc = ep->epc;
>  	u16 msg_ctrl, msg_data;
> -	u32 msg_addr_lower, msg_addr_upper;
> +	u32 msg_addr_lower, msg_addr_upper, reg;
>  	u64 msg_addr;
>  	bool has_upper;
>  	int ret;
>  
> +	if (!ep->msi_cap)
> +		return 0;

return -EINVAL.

Thanks
Kishon
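
As background for the MSI get/set callbacks discussed above, here is a small
sketch of how the MSI Message Control fields encode vector counts as powers
of two. The three masks are the standard PCI_MSI_FLAGS_* values; the function
names are hypothetical, -1 stands in for -EINVAL, and the "1 << order"
decoding in the caller is an assumption about how the EPC core interprets the
callback's return value.

```c
#include <assert.h>
#include <stdint.h>

#define MSI_FLAGS_ENABLE 0x0001 /* PCI_MSI_FLAGS_ENABLE */
#define MSI_FLAGS_QMASK  0x000e /* PCI_MSI_FLAGS_QMASK: Multiple Message Capable, bits 3:1 */
#define MSI_FLAGS_QSIZE  0x0070 /* PCI_MSI_FLAGS_QSIZE: Multiple Message Enable, bits 6:4 */

/* Smallest order such that (1 << order) >= n, for n in 1..32. */
unsigned int msi_order(unsigned int n)
{
	unsigned int order = 0;

	assert(n >= 1 && n <= 32);
	while ((1u << order) < n)
		order++;
	return order;
}

/* Endpoint side: advertise support for n vectors in the MMC field. */
uint16_t msi_set_capable(uint16_t flags, unsigned int n)
{
	flags &= (uint16_t)~MSI_FLAGS_QMASK;
	flags |= (uint16_t)((msi_order(n) << 1) & MSI_FLAGS_QMASK);
	return flags;
}

/* Returns the MME field (log2 of enabled vectors), or -1 if MSI is off. */
int msi_get_enabled_order(uint16_t flags)
{
	if (!(flags & MSI_FLAGS_ENABLE))
		return -1;	/* stands in for -EINVAL */
	return (flags & MSI_FLAGS_QSIZE) >> 4;
}

/* Hypothetical caller: converts the order back into a vector count. */
int msi_vector_count(int order)
{
	return order < 0 ? order : 1 << order;
}
```

Because order 0 decodes to one vector, a callback that returns 0 when the
capability is missing would look like a device with one MSI vector enabled,
which seems to be the reason for the -EINVAL requests in this review.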

* Re: [PATCH v2 5/7] PCI: dwc: Add legacy interrupt callback handler
From: Kishon Vijay Abraham I @ 2018-05-31 10:56 UTC (permalink / raw)
  To: Gustavo Pimentel, bhelgaas, lorenzo.pieralisi, Joao.Pinto,
	jingoohan1, adouglas, jesper.nilsson
  Cc: linux-pci, linux-doc, linux-kernel
In-Reply-To: <c94a66476c391f8bdc781384e6d096453c2c885b.1526576613.git.gustavo.pimentel@synopsys.com>



On Thursday 17 May 2018 10:39 PM, Gustavo Pimentel wrote:
> Add a legacy interrupt callback handler. Currently the DesignWare IP
> doesn't allow triggering legacy interrupts.
> 
> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>

Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
> ---
> Change v1->v2:
>  - Nothing changed, just to follow the patch set version.
> 
>  drivers/pci/dwc/pcie-designware-ep.c   | 10 ++++++++++
>  drivers/pci/dwc/pcie-designware-plat.c |  3 +--
>  drivers/pci/dwc/pcie-designware.h      |  6 ++++++
>  3 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/dwc/pcie-designware-ep.c b/drivers/pci/dwc/pcie-designware-ep.c
> index a4baa0d..9822127 100644
> --- a/drivers/pci/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/dwc/pcie-designware-ep.c
> @@ -370,6 +370,16 @@ static const struct pci_epc_ops epc_ops = {
>  	.stop			= dw_pcie_ep_stop,
>  };
>  
> +int dw_pcie_ep_raise_legacy_irq(struct dw_pcie_ep *ep, u8 func_no)
> +{
> +	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +	struct device *dev = pci->dev;
> +
> +	dev_err(dev, "EP cannot trigger legacy IRQs\n");
> +
> +	return -EINVAL;
> +}
> +
>  int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
>  			     u8 interrupt_num)
>  {
> diff --git a/drivers/pci/dwc/pcie-designware-plat.c b/drivers/pci/dwc/pcie-designware-plat.c
> index 654dcb5..90a8c95 100644
> --- a/drivers/pci/dwc/pcie-designware-plat.c
> +++ b/drivers/pci/dwc/pcie-designware-plat.c
> @@ -84,8 +84,7 @@ static int dw_plat_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
>  
>  	switch (type) {
>  	case PCI_EPC_IRQ_LEGACY:
> -		dev_err(pci->dev, "EP cannot trigger legacy IRQs\n");
> -		return -EINVAL;
> +		return dw_pcie_ep_raise_legacy_irq(ep, func_no);
>  	case PCI_EPC_IRQ_MSI:
>  		return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num);
>  	case PCI_EPC_IRQ_MSIX:
> diff --git a/drivers/pci/dwc/pcie-designware.h b/drivers/pci/dwc/pcie-designware.h
> index a0ab12f..69e6e17 100644
> --- a/drivers/pci/dwc/pcie-designware.h
> +++ b/drivers/pci/dwc/pcie-designware.h
> @@ -350,6 +350,7 @@ static inline int dw_pcie_allocate_domains(struct pcie_port *pp)
>  void dw_pcie_ep_linkup(struct dw_pcie_ep *ep);
>  int dw_pcie_ep_init(struct dw_pcie_ep *ep);
>  void dw_pcie_ep_exit(struct dw_pcie_ep *ep);
> +int dw_pcie_ep_raise_legacy_irq(struct dw_pcie_ep *ep, u8 func_no);
>  int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
>  			     u8 interrupt_num);
>  int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> @@ -369,6 +370,11 @@ static inline void dw_pcie_ep_exit(struct dw_pcie_ep *ep)
>  {
>  }
>  
> +static inline int dw_pcie_ep_raise_legacy_irq(struct dw_pcie_ep *ep, u8 func_no)
> +{
> +	return 0;
> +}
> +
>  static inline int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
>  					   u8 interrupt_num)
>  {
> 

* Re: [PATCH v2 6/7] misc: pci_endpoint_test: Add MSI-X support
From: Kishon Vijay Abraham I @ 2018-05-31 11:27 UTC (permalink / raw)
  To: Gustavo Pimentel, bhelgaas, lorenzo.pieralisi, Joao.Pinto,
	jingoohan1, adouglas, jesper.nilsson
  Cc: linux-pci, linux-doc, linux-kernel
In-Reply-To: <30034b904a27c405b894a34922e59315dcae68ab.1526576613.git.gustavo.pimentel@synopsys.com>

Hi,

On Thursday 17 May 2018 10:39 PM, Gustavo Pimentel wrote:
> Add MSI-X support and update driver documentation accordingly.
> 
> Add a new driver parameter to allow interrupt type selection.
> 
> Add 2 new IOCTL commands:
>  - Allow to reconfigure driver IRQ type in runtime.
>  - Allow to retrieve current driver IRQ type configured.
> 
> Change Legacy/MSI/MSI-X test process, by having in a BAR:
>  - Interrupt type triggered (added).
>  - Interrupt ID number (moved from the command section).
> 
> Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>

It's better to change the subject to pci-epf-test/pci_endpoint_test: Add MSI-X
support

IMO this patch should be split into multiple patches:

*) Clean up PCI_ENDPOINT_TEST memspace (by moving the interrupt number away from
the command section)
*) Use the irq_type module param
*) Add MSI-X support
*) Add 2 ioctl commands (I'm not convinced about adding new ioctls though)

Thanks
Kishon

* Re: [PATCH v9 3/7] cpuset: Add cpuset.sched.load_balance flag to v2
From: Peter Zijlstra @ 2018-05-31 12:26 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi,
	Thomas Gleixner
In-Reply-To: <1527601294-3444-4-git-send-email-longman@redhat.com>

On Tue, May 29, 2018 at 09:41:30AM -0400, Waiman Long wrote:
> The sched.load_balance flag is needed to enable CPU isolation similar to
> what can be done with the "isolcpus" kernel boot parameter. Its value
> can only be changed in a scheduling domain with no child cpusets. On
> a non-scheduling domain cpuset, the value of sched.load_balance is
> inherited from its parent. This is to make sure that all the cpusets
> within the same scheduling domain or partition has the same load
> balancing state.
> 
> This flag is set by the parent and is not delegatable.

> +  cpuset.sched.domain_root
> +	A read-write single value file which exists on non-root
> +	cpuset-enabled cgroups.  It is a binary value flag that accepts
> +	either "0" (off) or "1" (on).  This flag is set by the parent
> +	and is not delegatable.
> +
> +	If set, it indicates that the current cgroup is the root of a
> +	new scheduling domain or partition that comprises itself and
> +	all its descendants except those that are scheduling domain
> +	roots themselves and their descendants.  The root cgroup is
> +	always a scheduling domain root.
> +
> +	There are constraints on where this flag can be set.  It can
> +	only be set in a cgroup if all the following conditions are true.
> +
> +	1) The "cpuset.cpus" is not empty and the list of CPUs are
> +	   exclusive, i.e. they are not shared by any of its siblings.
> +	2) The parent cgroup is also a scheduling domain root.
> +	3) There is no child cgroups with cpuset enabled.  This is
> +	   for eliminating corner cases that have to be handled if such
> +	   a condition is allowed.
> +
> +	Setting this flag will take the CPUs away from the effective
> +	CPUs of the parent cgroup.  Once it is set, this flag cannot
> +	be cleared if there are any child cgroups with cpuset enabled.
> +	Further changes made to "cpuset.cpus" is allowed as long as
> +	the first condition above is still true.
> +
> +	A parent scheduling domain root cgroup cannot distribute all
> +	its CPUs to its child scheduling domain root cgroups unless
> +	its load balancing flag is turned off.
> +
> +  cpuset.sched.load_balance
> +	A read-write single value file which exists on non-root
> +	cpuset-enabled cgroups.  It is a binary value flag that accepts
> +	either "0" (off) or "1" (on).  This flag is set by the parent
> +	and is not delegatable.  It is on by default in the root cgroup.
> +
> +	When it is on, tasks within this cpuset will be load-balanced
> +	by the kernel scheduler.  Tasks will be moved from CPUs with
> +	high load to other CPUs within the same cpuset with less load
> +	periodically.
> +
> +	When it is off, there will be no load balancing among CPUs on
> +	this cgroup.  Tasks will stay in the CPUs they are running on
> +	and will not be moved to other CPUs.
> +
> +	The load balancing state of a cgroup can only be changed on a
> +	scheduling domain root cgroup with no cpuset-enabled children.
> +	All cgroups within a scheduling domain or partition must have
> +	the same load balancing state.	As descendant cgroups of a
> +	scheduling domain root are created, they inherit the same load
> +	balancing state of their root.

I still find all that a bit weird.

So load_balance=0 basically changes a partition into a
'fully-partitioned partition' with the seemingly random side-effect that
now sub-partitions are allowed to consume all CPUs.

The rationale, only given in the Changelog above, seems to be to allow
'easy' emulation of isolcpus.

I'm still not convinced this is a useful knob to have. You can do
fully-partitioned by simply creating a lot of 1 cpu partitions.

So this one knob does two separate things, both of which seem, to me,
redundant.

Can we please get better rationale for this?

* Re: [PATCH] cpuset: Enforce that a child's cpus must be a subset of the parent
From: Waiman Long @ 2018-05-31 13:22 UTC (permalink / raw)
  To: Zefan Li, Peter Zijlstra
  Cc: Tejun Heo, Johannes Weiner, Ingo Molnar, cgroups, linux-kernel,
	linux-doc, kernel-team, pjt, luto, Mike Galbraith, torvalds,
	Roman Gushchin, Juri Lelli, Patrick Bellasi
In-Reply-To: <5B0FB58C.9030705@huawei.com>

On 05/31/2018 04:42 AM, Zefan Li wrote:
> On 2018/5/31 16:26, Peter Zijlstra wrote:
>> On Thu, May 31, 2018 at 04:12:34PM +0800, Zefan Li wrote:
>>> On 2018/5/31 9:25, Zefan Li wrote:
>>>> Hi Waiman,
>>>>
>>>> On 2018/5/30 21:46, Waiman Long wrote:
>>>>> It was found that the cpuset.cpus could contain CPUs that are not listed
>>>>> in their parent's cpu list as shown by the command sequence below:
>>>>>
>>>>>   # echo "+cpuset" >cgroup.subtree_control
>>>>>   # mkdir g1
>>>>>   # echo 0-5 >g1/cpuset.cpus
>>>>>   # mkdir g1/g11
>>>>>   # echo "+cpuset" > g1/cgroup.subtree_control
>>>>>   # echo 6-11 >g1/g11/cpuset.cpus
>>>>>   # grep -R . g1 | grep "\.cpus"
>>>>>   g1/cpuset.cpus:0-5
>>>>>   g1/cpuset.cpus.effective:0-5
>>>>>   g1/g11/cpuset.cpus:6-11
>>>>>   g1/g11/cpuset.cpus.effective:0-5
>>>>>
>>>>> As the intersection of g11's cpus and that of g1 is empty, the effective
>>>>> cpus of g11 is just that of g1. The check in update_cpumask() is now
>>>>> corrected to make sure that cpus in a child cpus must be a subset of
>>>>> its parent's cpus. The error "write error: Invalid argument" will now
>>>>> be reported in the above case.
>>>>>
>>>> We made the distinction between user-configured CPUs and effective CPUs
>>>> in commit 7e88291beefbb758, so actually it's not a bug.
>>>>
>>> I remember the original reason is to support restoration of the original
>>> cpu after cpu offline->online. We use user-configured CPUs to remember
>>> if the cpu should be restored in the cpuset after it's onlined.
>> AFAICT you can do that and still have the child a subset of the parent,
>> no?
>> .
> Sure. IIRC this was suggested by Tejun as he had done something similar to devcgroup.
>
OK, let's wait until Tejun has time to chime in. To me, it just looks
weird to be able to do that.

Another corner case that is not handled is when cpus_allowed is empty.
In this case, it falls back to the parent's effective cpus. On the other
hand, it can also be argued that an empty cpus_allowed is a transient
state and a cpuset shouldn't have cpus undefined while creating children.
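
For illustration only (this is not kernel code), the fallback behaviour
discussed above can be modelled with plain bitmasks. The names cpu_bits_t,
cpu_range() and effective_cpus() are hypothetical, and the sketch assumes
the intersection is taken against the parent's effective CPUs:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: one bit per CPU, enough for CPUs 0..63. */
typedef uint64_t cpu_bits_t;

/* Build a mask with bits lo..hi (inclusive) set; requires lo <= hi < 64. */
static cpu_bits_t cpu_range(unsigned lo, unsigned hi)
{
	assert(lo <= hi && hi < 64);
	cpu_bits_t upto_hi = (hi == 63) ? ~(cpu_bits_t)0
					: (((cpu_bits_t)1 << (hi + 1)) - 1);
	cpu_bits_t below_lo = ((cpu_bits_t)1 << lo) - 1;

	return upto_hi & ~below_lo;
}

/*
 * Effective CPUs as described in the thread: the configured CPUs
 * intersected with the parent's effective CPUs.  When the intersection
 * is empty (which includes an empty configured mask), fall back to the
 * parent's effective CPUs.
 */
static cpu_bits_t effective_cpus(cpu_bits_t configured,
				 cpu_bits_t parent_effective)
{
	cpu_bits_t both = configured & parent_effective;

	return both ? both : parent_effective;
}
```

With g1 at 0-5 and g11 configured as 6-11, effective_cpus() returns 0-5,
matching the cpuset.cpus.effective output in the original report. An empty
configured mask takes the same path.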

Cheers,
Longman



* Re: [PATCH v9 3/7] cpuset: Add cpuset.sched.load_balance flag to v2
From: Waiman Long @ 2018-05-31 13:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi
In-Reply-To: <20180531105416.GI12180@hirez.programming.kicks-ass.net>

On 05/31/2018 06:54 AM, Peter Zijlstra wrote:
> On Tue, May 29, 2018 at 09:41:30AM -0400, Waiman Long wrote:
>
>> +  cpuset.sched.load_balance
>> +	A read-write single value file which exists on non-root
>> +	cpuset-enabled cgroups.  It is a binary value flag that accepts
>> +	either "0" (off) or "1" (on).  This flag is set by the parent
>> +	and is not delegatable.  It is on by default in the root cgroup.
>> +
>> +	When it is on, tasks within this cpuset will be load-balanced
>> +	by the kernel scheduler.  Tasks will be moved from CPUs with
>> +	high load to other CPUs within the same cpuset with less load
>> +	periodically.
>> +
>> +	When it is off, there will be no load balancing among CPUs on
>> +	this cgroup.  Tasks will stay in the CPUs they are running on
>> +	and will not be moved to other CPUs.
> That is not entirely accurate I'm afraid (unless the patch makes it so,
> I've yet to check). When you disable load-balancing on a cgroup you'll
> get whatever balancing is left for the partition you happen to end up
> in.
>
> Take for instance workqueue thingies, they use kthread_bind_mask()
> (IIRC) and thus end up with PF_NO_SETAFFINITY so cpusets (or any other
> cgroups really) do not have effect on them (long standing complaint).
>
> So take for instance the unbound numa enabled workqueue threads, those
> will land in whatever partition and get balanced there.

Thanks for the clarification. The patch doesn't make any changes in the
scheduler. I was trying to say what the flag does. I will update the
documentation about this nuance.

Cheers,
Longman


* Re: [PATCH v9 3/7] cpuset: Add cpuset.sched.load_balance flag to v2
From: Waiman Long @ 2018-05-31 13:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi,
	Thomas Gleixner
In-Reply-To: <20180531122638.GJ12180@hirez.programming.kicks-ass.net>

On 05/31/2018 08:26 AM, Peter Zijlstra wrote:
> On Tue, May 29, 2018 at 09:41:30AM -0400, Waiman Long wrote:
>> The sched.load_balance flag is needed to enable CPU isolation similar to
>> what can be done with the "isolcpus" kernel boot parameter. Its value
>> can only be changed in a scheduling domain with no child cpusets. On
>> a non-scheduling domain cpuset, the value of sched.load_balance is
>> inherited from its parent. This is to make sure that all the cpusets
>> within the same scheduling domain or partition has the same load
>> balancing state.
>>
>> This flag is set by the parent and is not delegatable.
>> +  cpuset.sched.domain_root
>> +	A read-write single value file which exists on non-root
>> +	cpuset-enabled cgroups.  It is a binary value flag that accepts
>> +	either "0" (off) or "1" (on).  This flag is set by the parent
>> +	and is not delegatable.
>> +
>> +	If set, it indicates that the current cgroup is the root of a
>> +	new scheduling domain or partition that comprises itself and
>> +	all its descendants except those that are scheduling domain
>> +	roots themselves and their descendants.  The root cgroup is
>> +	always a scheduling domain root.
>> +
>> +	There are constraints on where this flag can be set.  It can
>> +	only be set in a cgroup if all the following conditions are true.
>> +
>> +	1) The "cpuset.cpus" is not empty and the list of CPUs are
>> +	   exclusive, i.e. they are not shared by any of its siblings.
>> +	2) The parent cgroup is also a scheduling domain root.
>> +	3) There is no child cgroups with cpuset enabled.  This is
>> +	   for eliminating corner cases that have to be handled if such
>> +	   a condition is allowed.
>> +
>> +	Setting this flag will take the CPUs away from the effective
>> +	CPUs of the parent cgroup.  Once it is set, this flag cannot
>> +	be cleared if there are any child cgroups with cpuset enabled.
>> +	Further changes made to "cpuset.cpus" is allowed as long as
>> +	the first condition above is still true.
>> +
>> +	A parent scheduling domain root cgroup cannot distribute all
>> +	its CPUs to its child scheduling domain root cgroups unless
>> +	its load balancing flag is turned off.
>> +
>> +  cpuset.sched.load_balance
>> +	A read-write single value file which exists on non-root
>> +	cpuset-enabled cgroups.  It is a binary value flag that accepts
>> +	either "0" (off) or "1" (on).  This flag is set by the parent
>> +	and is not delegatable.  It is on by default in the root cgroup.
>> +
>> +	When it is on, tasks within this cpuset will be load-balanced
>> +	by the kernel scheduler.  Tasks will be moved from CPUs with
>> +	high load to other CPUs within the same cpuset with less load
>> +	periodically.
>> +
>> +	When it is off, there will be no load balancing among CPUs on
>> +	this cgroup.  Tasks will stay in the CPUs they are running on
>> +	and will not be moved to other CPUs.
>> +
>> +	The load balancing state of a cgroup can only be changed on a
>> +	scheduling domain root cgroup with no cpuset-enabled children.
>> +	All cgroups within a scheduling domain or partition must have
>> +	the same load balancing state.	As descendant cgroups of a
>> +	scheduling domain root are created, they inherit the same load
>> +	balancing state of their root.
> I still find all that a bit weird.
>
> So load_balance=0 basically changes a partition into a
> 'fully-partitioned partition' with the seemingly random side-effect that
> now sub-partitions are allowed to consume all CPUs.

Are you suggesting that we should allow sub-partitions to consume all the
CPUs no matter the load balance state? I can live with that if you think
it is more logical.

> The rationale, only given in the Changelog above, seems to be to allow
> 'easy' emulation of isolcpus.
>
> I'm still not convinced this is a useful knob to have. You can do
> fully-partitioned by simply creating a lot of 1 cpu partitions.

That is certainly true. However, I think there is some additional
overhead on the scheduler side in maintaining those 1-cpu partitions. Right?

> So this one knob does two separate things, both of which seem, to me,
> redundant.
>
> Can we please get better rationale for this?

I am fine getting rid of the load_balance flag if this is the consensus.
However, we do need to come up with a good migration story for those
users that need the isolcpus capability. I think Mike was the one asking
for supporting isolcpus. So Mike, what is your take on that?

Cheers,
Longman



* Re: [PATCH v9 3/7] cpuset: Add cpuset.sched.load_balance flag to v2
From: Peter Zijlstra @ 2018-05-31 15:20 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi,
	Thomas Gleixner
In-Reply-To: <42cc1f44-2355-1c0c-b575-49c863303c42@redhat.com>

On Thu, May 31, 2018 at 09:54:27AM -0400, Waiman Long wrote:
> On 05/31/2018 08:26 AM, Peter Zijlstra wrote:

> > I still find all that a bit weird.
> >
> > So load_balance=0 basically changes a partition into a
> > 'fully-partitioned partition' with the seemingly random side-effect that
> > now sub-partitions are allowed to consume all CPUs.
> 
> Are you suggesting that we should allow sub-partitions to consume all the
> CPUs no matter the load balance state? I can live with that if you think
> it is more logical.

I'm on the fence myself; the only thing I'm fairly sure of is that tying
this particular behaviour to the load-balance knob seems off.

> > The rationale, only given in the Changelog above, seems to be to allow
> > 'easy' emulation of isolcpus.
> >
> > I'm still not convinced this is a useful knob to have. You can do
> > fully-partitioned by simply creating a lot of 1 cpu partitions.
> 
> That is certainly true. However, I think there is some additional
> overhead on the scheduler side in maintaining those 1-cpu partitions. Right?

cpuset-controller as such doesn't have much overhead scheduler wise,
cpu-controller OTOH does, and there depth is the predominant factor, so
many sibling groups should not matter there either.

> > So this one knob does two separate things, both of which seem, to me,
> > redundant.
> >
> > Can we please get better rationale for this?
> 
> I am fine getting rid of the load_balance flag if this is the consensus.
> However, we do need to come up with a good migration story for those
> users that need the isolcpus capability. I think Mike was the one asking
> for supporting isolcpus. So Mike, what is your take on that?

So I don't strictly mind having a knob that does the 'fully-partitioned
partition' thing -- however odd that sounds -- but I feel we should have
a solid use-case for it.

I also think we should not mix the 'consume all' thing with the
'fully-partitioned' thing, as they are otherwise unrelated.

* Re: [PATCH v9 3/7] cpuset: Add cpuset.sched.load_balance flag to v2
From: Waiman Long @ 2018-05-31 15:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi,
	Thomas Gleixner
In-Reply-To: <20180531152050.GK12180@hirez.programming.kicks-ass.net>

On 05/31/2018 11:20 AM, Peter Zijlstra wrote:
> On Thu, May 31, 2018 at 09:54:27AM -0400, Waiman Long wrote:
>> On 05/31/2018 08:26 AM, Peter Zijlstra wrote:
>>> I still find all that a bit weird.
>>>
>>> So load_balance=0 basically changes a partition into a
>>> 'fully-partitioned partition' with the seemingly random side-effect that
>>> now sub-partitions are allowed to consume all CPUs.
>> Are you suggesting that we should allow sub-partitions to consume all the
>> CPUs no matter the load balance state? I can live with that if you think
>> it is more logical.
> I'm on the fence myself; the only thing I'm fairly sure of is that tying
> this particular behaviour to the load-balance knob seems off.

The main reason for doing it this way is that I don't want to have a
load-balanced partition with no cpu in it. How about we just don't allow
consume-all at all? Each partition must have at least 1 cpu.

>
>>> The rationale, only given in the Changelog above, seems to be to allow
>>> 'easy' emulation of isolcpus.
>>>
>>> I'm still not convinced this is a useful knob to have. You can do
>>> fully-partitioned by simply creating a lot of 1 cpu partitions.
>> That is certainly true. However, I think there is some additional
>> overhead on the scheduler side in maintaining those 1-cpu partitions. Right?
> cpuset-controller as such doesn't have much overhead scheduler wise,
> cpu-controller OTOH does, and there depth is the predominant factor, so
> many sibling groups should not matter there either.
>
>>> So this one knob does two separate things, both of which seem, to me,
>>> redundant.
>>>
>>> Can we please get better rationale for this?
>> I am fine getting rid of the load_balance flag if this is the consensus.
>> However, we do need to come up with a good migration story for those
>> users that need the isolcpus capability. I think Mike was the one asking
>> for supporting isolcpus. So Mike, what is your take on that?
> So I don't strictly mind having a knob that does the 'fully-partitioned
> partition' thing -- however odd that sounds -- but I feel we should have
> a solid use-case for it.
>
> I also think we should not mix the 'consume all' thing with the
> 'fully-partitioned' thing, as they are otherwise unrelated.

The "consume all" and "fully-partitioned" look the same to me. Are you
talking about allocating all the CPUs in a partition to sub-partitions
so that there is no CPU left in the parent partition?

Cheers,
Longman



* Re: [PATCH] cpuset: Enforce that a child's cpus must be a subset of the parent
From: Tejun Heo @ 2018-05-31 15:58 UTC (permalink / raw)
  To: Waiman Long
  Cc: Zefan Li, Peter Zijlstra, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi
In-Reply-To: <4dc718bc-4bd5-4998-853b-9c6ba67b89a0@redhat.com>

Hello,

On Thu, May 31, 2018 at 09:22:23AM -0400, Waiman Long wrote:
> >>>>> As the intersection of g11's cpus and that of g1 is empty, the effective
> >>>>> cpus of g11 is just that of g1. The check in update_cpumask() is now
> >>>>> corrected to make sure that cpus in a child cpus must be a subset of
> >>>>> its parent's cpus. The error "write error: Invalid argument" will now
> >>>>> be reported in the above case.
> >>>>>
> >>>> We made the distinction between user-configured CPUs and effective CPUs
> >>>> in commit 7e88291beefbb758, so actually it's not a bug.
> >>>>
> >>> I remember the original reason is to support restoration of the original
> >>> cpu after cpu offline->online. We use user-configured CPUs to remember
> >>> if the cpu should be restored in the cpuset after it's onlined.
> >> AFAICT you can do that and still have the child a subset of the parent,
> >> no?
> >> .
> > Sure. IIRC this was suggested by Tejun as he had done something similar to devcgroup.
> >
> OK, let's wait until Tejun has time to chime in. To me, it just looks
> weird to be able to do that.
> 
> Another corner case that is not handled is when cpus_allowed is empty.
> In this case, it falls back to the parent's effective cpus. On the other
> hand, it can also be argued that an empty cpus_allowed is a transient
> state and a cpuset shouldn't have cpus undefined while creating children.

Tying together what's configured and what's applied may feel
attractive on the surface but it's a long term headache.

* It's inconsistent with what other controllers are doing.  All the
  limit resource configs declare the upper bound the specific cgroup
  can consume regardless of what's actually available to it.  They
  limit but don't guarantee access.

* Which decouples a given cgroup's configurations from its ancestors',
  which allows an ancestor to take away resources that it granted
  before and then also give them back later.  No matter what you do,
  if you couple configs of cgroup hierarchy, you end up restricting
  what an ancestor can do to its sub-hierarchy, which can quickly
  become a difficult operational headache.

So, let's please stay away from it even if that means a bit of
overhead in terms of interface.
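
As a rough illustration of the decoupling argument (a sketch, not the
cpuset implementation): if the configured mask is remembered separately
from the effective mask, an ancestor can shrink and later regrow without
the child's configuration being lost. The struct and function names below
are hypothetical. The fallback to the parent's mask on an empty
intersection follows the behaviour reported earlier in this thread, and
the sketch assumes a parent's effective mask is never empty:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: one bit per CPU. */
typedef uint64_t cpu_bits_t;

struct cgroup_sketch {
	cpu_bits_t configured;	/* what the user wrote; never rewritten here */
	cpu_bits_t effective;	/* what is actually applied */
};

/* Decoupled model: recompute effective from the remembered configuration. */
static void refresh_effective(struct cgroup_sketch *child,
			      cpu_bits_t parent_effective)
{
	/* Assumption of this sketch: the parent always has some CPU. */
	assert(parent_effective != 0);

	cpu_bits_t both = child->configured & parent_effective;

	child->effective = both ? both : parent_effective;
}

/*
 * Coupled model: a parent may only change its mask if every child's
 * configuration still fits inside it.
 */
static bool coupled_parent_write_ok(cpu_bits_t new_parent,
				    cpu_bits_t child_configured)
{
	return (child_configured & ~new_parent) == 0;
}
```

With a child configured for CPUs 2-3, shrinking the parent from 0-3 to 0-1
and back restores the child's effective mask to 2-3 in the decoupled
model. Under the coupled rule the same shrink is refused outright.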

Thanks.

-- 
tejun

* Re: [PATCH v9 3/7] cpuset: Add cpuset.sched.load_balance flag to v2
From: Peter Zijlstra @ 2018-05-31 16:08 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi,
	Thomas Gleixner
In-Reply-To: <ebe7f0c1-b2dc-57ad-e5cb-7edf0fee4626@redhat.com>

On Thu, May 31, 2018 at 11:36:39AM -0400, Waiman Long wrote:
> > I'm on the fence myself; the only thing I'm fairly sure of is that tying
> > this particular behaviour to the load-balance knob seems off.
> 
> The main reason for doing it this way is that I don't want to have a
> load-balanced partition with no cpu in it. How about we just don't allow
> consume-all at all? Each partition must have at least 1 cpu.

I suspect that might be sufficient. It certainly is for the use-cases
I'm aware of. You always want a system/control set which runs the
regular busy work of running a system.

Then you have one (or more) partitions to run your 'important' work.

> > I also think we should not mix the 'consume all' thing with the
> > 'fully-partitioned' thing, as they are otherwise unrelated.
>
> The "consume all" and "fully-partitioned" look the same to me. Are you
> talking about allocating all the CPUs in a partition to sub-partitions
> so that there is no CPU left in the parent partition?

Not sure what you're asking. "consume all" is allowing sub-partitions to
allocate all CPUs of the parent, such that there are none left.

"fully-partitioned" is N cpus but no load-balancing, also equivalent to
N 1 CPU partitions.

They are distinct things. Disabling load-balancing should not affect how
many CPUs can be allocated to sub-partitions, the moment you hit 1 CPU
the load balancing is effectively off already. Going down to 0 CPUs
isn't a problem for the load-balancer, it wasn't doing anything anyway.
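
To make the distinction concrete, here is a small model (hypothetical
names, not kernel code) in which the two properties are computed from
unrelated inputs: "consume all" depends only on which CPUs the
sub-partitions took, and "fully-partitioned" depends only on the
load-balance setting:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: one bit per CPU. */
typedef uint64_t cpu_bits_t;

/* Number of CPUs set in a mask. */
static unsigned cpu_count(cpu_bits_t mask)
{
	unsigned n = 0;

	for (; mask; mask &= mask - 1)	/* clear the lowest set bit */
		n++;
	return n;
}

/* CPUs left in the parent partition after sub-partitions take theirs. */
static cpu_bits_t parent_remaining(cpu_bits_t parent,
				   cpu_bits_t sub_partitions)
{
	/* Sub-partitions may only take CPUs the parent actually has. */
	assert((sub_partitions & ~parent) == 0);
	return parent & ~sub_partitions;
}

/* "Consume all": the sub-partitions leave the parent with no CPU. */
static bool consumes_all(cpu_bits_t parent, cpu_bits_t sub_partitions)
{
	return parent_remaining(parent, sub_partitions) == 0;
}

/*
 * "Fully-partitioned": how many independent balancing domains the
 * partition behaves like.  With balancing on it is one domain; with
 * balancing off each CPU stands alone, like N 1 CPU partitions.
 */
static unsigned balancing_domains(cpu_bits_t cpus, bool load_balance)
{
	unsigned n = cpu_count(cpus);

	if (n == 0)
		return 0;
	return load_balance ? 1 : n;
}
```

For a single-CPU partition balancing_domains() gives 1 either way, which
is the "effectively off already" case. consumes_all() never looks at the
load-balance flag.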

So the question is whether someone really needs the one partition without
balancing over N separate partitions.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox