LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH kernel v3 2/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page
From: David Gibson @ 2018-07-06  5:06 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm-ppc, Alex Williamson, Michael Ellerman,
	Paul Mackerras
In-Reply-To: <20180705151904.17de7322@aik.ozlabs.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 6662 bytes --]

On Thu, Jul 05, 2018 at 03:19:04PM +1000, Alexey Kardashevskiy wrote:
> On Thu, 5 Jul 2018 12:42:20 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Jul 04, 2018 at 03:00:52PM +1000, Alexey Kardashevskiy wrote:
> > > A VM which has:
> > >  - a DMA capable device passed through to it (eg. network card);
> > >  - running a malicious kernel that ignores H_PUT_TCE failure;
> > >  - capability of using IOMMU pages bigger that physical pages
> > > can create an IOMMU mapping that exposes (for example) 16MB of
> > > the host physical memory to the device when only 64K was allocated to the VM.
> > > 
> > > The remaining 16MB - 64K will be some other content of host memory, possibly
> > > including pages of the VM, but also pages of host kernel memory, host
> > > programs or other VMs.
> > > 
> > > The attacking VM does not control the location of the page it can map,
> > > and is only allowed to map as many pages as it has pages of RAM.
> > > 
> > > We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that
> > > an IOMMU page is contained in the physical page so the PCI hardware won't
> > > get access to unassigned host memory; however this check is missing in
> > > the KVM fastpath (H_PUT_TCE accelerated code). We were lucky so far and
> > > did not hit this yet as the very first time when the mapping happens
> > > we do not have tbl::it_userspace allocated yet and fall back to
> > > the userspace which in turn calls VFIO IOMMU driver, this fails and
> > > the guest does not retry,
> > > 
> > > This stores the smallest preregistered page size in the preregistered
> > > region descriptor and changes the mm_iommu_xxx API to check this against
> > > the IOMMU page size. This only allows huge pages use if the entire
> > > preregistered block is backed with huge pages which are completely
> > > contained the preregistered chunk; otherwise this defaults to PAGE_SIZE.
> > > 
> > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>  
> > 
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > 
> > On the grounds that I think this version is safe, which the old one
> > wasn't.  However it still has some flaws..
> > 
> > [snip]
> > > @@ -125,7 +126,8 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
> > >  {
> > >  	struct mm_iommu_table_group_mem_t *mem;
> > >  	long i, j, ret = 0, locked_entries = 0;
> > > -	struct page *page = NULL;
> > > +	unsigned int pageshift;
> > > +	struct page *page = NULL, *head = NULL;
> > >  
> > >  	mutex_lock(&mem_list_mutex);
> > >  
> > > @@ -159,6 +161,7 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
> > >  		goto unlock_exit;
> > >  	}
> > >  
> > > +	mem->pageshift = 64;
> > >  	mem->hpas = vzalloc(array_size(entries, sizeof(mem->hpas[0])));
> > >  	if (!mem->hpas) {
> > >  		kfree(mem);
> > > @@ -199,9 +202,35 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
> > >  			}
> > >  		}
> > >  populate:
> > > +		pageshift = PAGE_SHIFT;
> > > +		if (PageCompound(page)) {
> > > +			/* Make sure huge page is contained completely */
> > > +			struct page *tmphead = compound_head(page);
> > > +			unsigned int n = compound_order(tmphead);
> > > +
> > > +			if (!head) {
> > > +				/* Is it a head of a huge page? */
> > > +				if (page == tmphead) {
> > > +					head = tmphead;
> > > +					pageshift += n;
> > > +				}
> > > +			} else if (head == tmphead) {
> > > +				/* Still same huge page, good */
> > > +				pageshift += n;
> > > +
> > > +				/* End of the huge page */
> > > +				if (page - head == (1UL << n) - 1)
> > > +					head = NULL;
> > > +			}
> > > +		}
> > > +		mem->pageshift = min(mem->pageshift, pageshift);
> > >  		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
> > >  	}
> > >  
> > > +	/* We have an incomplete huge page, default to PAGE_SHIFT */
> > > +	if (head)
> > > +		mem->pageshift = PAGE_SHIFT;
> > > +  
> > 
> > So, if the user attempts to prereg a region which starts or ends in
> > the middle of a hugepage, this logic will clamp the region's max page
> > shift down to PAGE_SHIFT.  That's safe, but not optimal.
> > 
> > Suppose userspace had an area backed with 16MiB hugepages, and wanted
> > to pre-reg a window that was 2MiB aligned, but not 16MiB aligned.  It
> > would still be safe to allow 2MiB TCEs, but the code above would clamp
> > it down to 64kiB (or 4kiB).
> > 
> > The code to do it is also pretty convoluted.
> > 
> > I think you'd be better off initializing mem->pageshift to the largest
> > possible natural alignment of the region:
> > 	mem->pageshift = ctz64(ua | (entries << PAGE_SHIFT));
> > 
> > Then it should just be sufficient to clamp pageshift down to
> > compound_order() + PAGE_SHIFT for each entry.
> 
> 
> I like this better, just one question - does hugetlbfs guarantee the @ua
> alignment if backed with an actual huge page?

So, yeah it does, as you determined.  And it has to - I don't know of
any MMU that allows for large pages that aren't naturally aligned, so
the uas would have to be aligned to actually map the pages into
userspace.

But... there's another more subtle case that I'm less sure about.
What you're actually checking for here is a compound page on the
physical side.  A hugetlbfs mapping in userspace is the main case
where I'd expect that, but, I'm not absolutely certain there can't be
some other case where a compound page is used to back a normal 64k
mapping in a user process.  If that is possible, it would probably
also be possible for the UA to end up misaligned with the compound
page's natural alignment.

I don't know of any case where that could happen, but I'm far from
confident it doesn't exist.  Things to consider:
   - mapping hugetlbfs, then mremap()ing part of it
   - a SHARED mapping, where it's aligned in one process and gets
     THPed, but is not aligned in the other
   - mmap() from a device or subsystem that provides some kind
     of IO or special memory that's handled with compound pages on the
     kernel side, but is just mapped into userspace with regular 64k
     PTEs
   - One process mapping libhugetlbfs, then another (say a debugger_
     attempting to map the first process's address space via
     /proc/pid/mem)
   - ..and that's just the ones I could think of quickly

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH kernel v4 2/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page
From: David Gibson @ 2018-07-06  5:13 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm-ppc, Alex Williamson, Michael Ellerman,
	Paul Mackerras
In-Reply-To: <20180705080133.18690-3-aik@ozlabs.ru>

[-- Attachment #1: Type: text/plain, Size: 6235 bytes --]

On Thu, Jul 05, 2018 at 06:01:33PM +1000, Alexey Kardashevskiy wrote:
> A VM which has:
>  - a DMA capable device passed through to it (eg. network card);
>  - running a malicious kernel that ignores H_PUT_TCE failure;
>  - capability of using IOMMU pages bigger that physical pages
> can create an IOMMU mapping that exposes (for example) 16MB of
> the host physical memory to the device when only 64K was allocated to
> the VM.
> 
> The remaining 16MB - 64K will be some other content of host memory,
> possibly including pages of the VM, but also pages of host kernel memory,
> host programs or other VMs.
> 
> The attacking VM does not control the location of the page it can map,
> and is only allowed to map as many pages as it has pages of RAM.
> 
> We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that
> an IOMMU page is contained in the physical page so the PCI hardware won't
> get access to unassigned host memory; however this check is missing in
> the KVM fastpath (H_PUT_TCE accelerated code). We were lucky so far and
> did not hit this yet as the very first time when the mapping happens
> we do not have tbl::it_userspace allocated yet and fall back to
> the userspace which in turn calls VFIO IOMMU driver, this fails and
> the guest does not retry,
> 
> This stores the smallest preregistered page size in the preregistered
> region descriptor and changes the mm_iommu_xxx API to check this against
> the IOMMU page size. This calculates maximum page size as a minimum of
> the natural region alignment and compound page size.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

It's certainly better than what we have, though a couple of comments
still:

[snip]
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index abb4364..11e1029 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -27,6 +27,7 @@ struct mm_iommu_table_group_mem_t {
>  	struct rcu_head rcu;
>  	unsigned long used;
>  	atomic64_t mapped;
> +	unsigned int pageshift;
>  	u64 ua;			/* userspace address */
>  	u64 entries;		/* number of entries in hpas[] */
>  	u64 *hpas;		/* vmalloc'ed */
> @@ -125,7 +126,8 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
>  {
>  	struct mm_iommu_table_group_mem_t *mem;
>  	long i, j, ret = 0, locked_entries = 0;
> -	struct page *page = NULL;
> +	unsigned int pageshift;
> +	struct page *page = NULL, *head = NULL;
>  
>  	mutex_lock(&mem_list_mutex);
>  
> @@ -159,6 +161,7 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
>  		goto unlock_exit;
>  	}
>  
> +	mem->pageshift = __builtin_ctzl(ua | (entries << PAGE_SHIFT));

This could definitely do with a comment saying what this is trying to
calculate.

Explicitly calling a _builtin_ is also kinda horrid.  I wrote my
sample thinking of qemu where there's a standard and widely used
ctz64().  Not sure if there's something else we should be using in kernelspace.

>  	mem->hpas = vzalloc(array_size(entries, sizeof(mem->hpas[0])));
>  	if (!mem->hpas) {
>  		kfree(mem);
> @@ -199,9 +202,17 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
>  			}
>  		}
>  populate:
> +		pageshift = PAGE_SHIFT;
> +		if (PageCompound(page))
> +			pageshift += compound_order(compound_head(page));

So, as I said in reply to the earlier version, I'm not 100% sure there
isn't some way a compound_page could end up mapped unaligned in
userspace (with smaller userspace mappings of a larger underlying
physical page).  A WARN_ON() and fallback to assuming pageshift =
PAGE_SHIFT in that case would probably be a good idea.

> +		mem->pageshift = min(mem->pageshift, pageshift);
>  		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
>  	}
>  
> +	/* We have an incomplete huge page, default to PAGE_SHIFT */
> +	if (head)
> +		mem->pageshift = PAGE_SHIFT;
> +
>  	atomic64_set(&mem->mapped, 1);
>  	mem->used = 1;
>  	mem->ua = ua;
> @@ -349,7 +360,7 @@ struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
>  EXPORT_SYMBOL_GPL(mm_iommu_find);
>  
>  long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
> -		unsigned long ua, unsigned long *hpa)
> +		unsigned long ua, unsigned int pageshift, unsigned long *hpa)
>  {
>  	const long entry = (ua - mem->ua) >> PAGE_SHIFT;
>  	u64 *va = &mem->hpas[entry];
> @@ -357,6 +368,9 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
>  	if (entry >= mem->entries)
>  		return -EFAULT;
>  
> +	if (pageshift > mem->pageshift)
> +		return -EFAULT;
> +
>  	*hpa = *va | (ua & ~PAGE_MASK);
>  
>  	return 0;
> @@ -364,7 +378,7 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
>  EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);
>  
>  long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
> -		unsigned long ua, unsigned long *hpa)
> +		unsigned long ua, unsigned int pageshift, unsigned long *hpa)
>  {
>  	const long entry = (ua - mem->ua) >> PAGE_SHIFT;
>  	void *va = &mem->hpas[entry];
> @@ -373,6 +387,9 @@ long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
>  	if (entry >= mem->entries)
>  		return -EFAULT;
>  
> +	if (pageshift > mem->pageshift)
> +		return -EFAULT;
> +
>  	pa = (void *) vmalloc_to_phys(va);
>  	if (!pa)
>  		return -EFAULT;
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 2da5f05..7cd63b0 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -467,7 +467,7 @@ static int tce_iommu_prereg_ua_to_hpa(struct tce_container *container,
>  	if (!mem)
>  		return -EINVAL;
>  
> -	ret = mm_iommu_ua_to_hpa(mem, tce, phpa);
> +	ret = mm_iommu_ua_to_hpa(mem, tce, shift, phpa);
>  	if (ret)
>  		return -EINVAL;
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH kernel v4 2/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page
From: Michael Ellerman @ 2018-07-06  6:00 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev
  Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Alex Williamson,
	Paul Mackerras
In-Reply-To: <20180705080133.18690-3-aik@ozlabs.ru>

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index abb4364..11e1029 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -159,6 +161,7 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
>  		goto unlock_exit;
>  	}
>  
> +	mem->pageshift = __builtin_ctzl(ua | (entries << PAGE_SHIFT));

__builtin_ctzl(0) is undefined, so are we guaranteed that
(ua | (entries << PAGE_SHIFT)) is never zero? I couldn't convince
myself.

> @@ -199,9 +202,17 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
>  			}
>  		}
>  populate:
> +		pageshift = PAGE_SHIFT;
> +		if (PageCompound(page))
> +			pageshift += compound_order(compound_head(page));
> +		mem->pageshift = min(mem->pageshift, pageshift);
>  		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
>  	}
>  
> +	/* We have an incomplete huge page, default to PAGE_SHIFT */
> +	if (head)
> +		mem->pageshift = PAGE_SHIFT;
> +

You never set head AFIACS? (other than in the initialiser)

cheers

^ permalink raw reply

* Re: [PATCH kernel v4 2/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page
From: David Gibson @ 2018-07-06  6:15 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Alexey Kardashevskiy, linuxppc-dev, kvm-ppc, Alex Williamson,
	Paul Mackerras
In-Reply-To: <874lhclx8h.fsf@concordia.ellerman.id.au>

[-- Attachment #1: Type: text/plain, Size: 1726 bytes --]

On Fri, Jul 06, 2018 at 04:00:30PM +1000, Michael Ellerman wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
> > diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> > index abb4364..11e1029 100644
> > --- a/arch/powerpc/mm/mmu_context_iommu.c
> > +++ b/arch/powerpc/mm/mmu_context_iommu.c
> > @@ -159,6 +161,7 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
> >  		goto unlock_exit;
> >  	}
> >  
> > +	mem->pageshift = __builtin_ctzl(ua | (entries << PAGE_SHIFT));
> 
> __builtin_ctzl(0) is undefined, so are we guaranteed that
> (ua | (entries << PAGE_SHIFT)) is never zero? I couldn't convince
> myself.

We certainly don't need to care about that case, since registering a
zero-length region (entries == 0) is no-op.  But maybe we need to
filter it above.

> > @@ -199,9 +202,17 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
> >  			}
> >  		}
> >  populate:
> > +		pageshift = PAGE_SHIFT;
> > +		if (PageCompound(page))
> > +			pageshift += compound_order(compound_head(page));
> > +		mem->pageshift = min(mem->pageshift, pageshift);
> >  		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
> >  	}
> >  
> > +	/* We have an incomplete huge page, default to PAGE_SHIFT */
> > +	if (head)
> > +		mem->pageshift = PAGE_SHIFT;
> > +
> 
> You never set head AFIACS? (other than in the initialiser)

That looks like a leftover from the previous version.

> 
> cheers
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCHv3 0/4] drivers/base: bugfix for supplier<-consumer ordering in device_kset
From: Rafael J. Wysocki @ 2018-07-06  8:47 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Rafael J. Wysocki, Pingfan Liu, Linux Kernel Mailing List,
	Greg Kroah-Hartman, Grygorii Strashko, Christoph Hellwig,
	Bjorn Helgaas, Dave Young, Linux PCI, linuxppc-dev, Linux PM,
	Kishon Vijay Abraham I
In-Reply-To: <20180706083603.GA9063@wunner.de>

On Fri, Jul 6, 2018 at 10:36 AM, Lukas Wunner <lukas@wunner.de> wrote:
> [cc += Kishon Vijay Abraham]
>
> On Thu, Jul 05, 2018 at 11:18:28AM +0200, Rafael J. Wysocki wrote:
>> OK, so calling devices_kset_move_last() from really_probe() clearly is
>> a mistake.
>>
>> I'm not really sure what the intention of it was as the changelog of
>> commit 52cdbdd49853d doesn't really explain that (why would it be
>> insufficient without that change?)
>
> It seems 52cdbdd49853d fixed an issue with boards which have an MMC
> whose reset pin needs to be driven high on shutdown, lest the MMC
> won't be found on the next boot.
>
> The boards' devicetrees use a kludge wherein the reset pin is modelled
> as a regulator.  The regulator is enabled when the MMC probes and
> disabled on driver unbind and shutdown.  As a result, the pin is driven
> low on shutdown and the MMC is not found on the next boot.
>
> To fix this, another kludge was invented wherein the GPIO expander
> driving the reset pin unconditionally drives all its pins high on
> shutdown, see pcf857x_shutdown() in drivers/gpio/gpio-pcf857x.c
> (commit adc284755055, "gpio: pcf857x: restore the initial line state
> of all pcf lines").
>
> For this kludge to work, the GPIO expander's ->shutdown hook needs to
> be executed after the MMC expander's ->shutdown hook.
>
> Commit 52cdbdd49853d achieved that by reordering devices_kset according
> to the probe order.  Apparently the MMC probes after the GPIO expander,
> possibly because it returns -EPROBE_DEFER if the vmmc regulator isn't
> available yet, see mmc_regulator_get_supply().
>
> Note, I'm just piecing the information together from git history,
> I'm not responsible for these kludges.  (I'm innocent!)

Sure enough. :-)

In any case, calling devices_kset_move_last() in really_probe() is
plain broken and if its only purpose was to address a single, arguably
kludgy, use case, let's just get rid of it in the first place IMO.

> @Pingfan Liu, if you just remove the call to devices_kset_move_last()
> from really_probe(), does the issue go away?

I would think so from the description of the problem (elsewhere in this thread).

Thanks,
Rafael

^ permalink raw reply

* [PATCH v3 0/2] powerpc: Detection and scheduler optimization for POWER9 bigcore
From: Gautham R. Shenoy @ 2018-07-06  9:05 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Michael Neuling,
	Vaidyanathan Srinivasan, Akshay Adiga, Shilpasri G Bhat,
	Oliver O'Halloran, Nicholas Piggin, Murilo Opsfelder Araujo
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Hi,

This is the third iteration of the patchset to add support for
big-core on POWER9.

The previous versions can be found here:

v2: https://lkml.org/lkml/2018/7/3/401
v1: https://lkml.org/lkml/2018/5/11/245

Changes : v2 --> v3
    - Set sane values in the tg->property, tg->nr_groups inside
    parse_thread_groups before returning due to an error.
    - Define a helper function to determine whether a CPU device node
      is a big-core or not.
    - Updated the comments around the functions to describe the
      arguments passed to them.

These changes were suggested by Murilo Opsfelder Araujo.

v1 --> v2
    - Added comments explaining the "ibm,thread-groups" device tree property.
    - Uses cleaner device-tree parsing functions to parse the u32 arrays.
    - Adds a sysfs file listing the small-core siblings for every CPU.
    - Enables the scheduler optimization by setting the CPU_FTR_ASYM_SMT bit
      in the cur_cpu_spec->cpu_features on detecting the presence
      of interleaved big-core.
    - Handles the corner case where there is only a single thread-group
      or when there is a single thread in a thread-group.

Description:
~~~~~~~~~~~~~~~~~~~~
A pair of IBM POWER9 SMT4 cores can be fused together to form a
big-core with 8 SMT threads. This can be discovered via the
"ibm,thread-groups" CPU property in the device tree which will
indicate which group of threads that share the L1 cache, translation
cache and instruction data flow.  If there are multiple such group of
threads, then the core is a big-core. Furthermore, the thread-ids of
such a big-core is obtained by interleaving the thread-ids of the
component SMT4 cores.

Eg: Threads in the pair of component SMT4 cores of an interleaved
big-core are numbered {0,2,4,6} and {1,3,5,7} respectively.

On such a big-core, when multiple tasks are scheduled to run on the
big-core, we get the best performance when the tasks are spread across
the pair of SMT4 cores.

The Linux scheduler supports a flag called "SD_ASYM_PACKING" which
when set in the SMT sched-domain, biases the load-balancing of the
tasks on the smaller numbered threads in the core. On an big-core
whose threads are interleavings of the threads of the small cores,
enabling SD_ASYM_PACKING in the SMT sched-domain automatically results
in spreading the tasks uniformly across the associated pair of SMT4
cores, thereby yielding better performance.

This patchset contains two patches which on detecting the presence of
interleaved big-cores will enable the the CPU_FTR_ASYM_SMT bit in the
cur_cpu_spec->cpu_feature.

Patch 1: adds support to detect the presence of
big-cores and reports the small-core siblings of each CPU X
via the sysfs file "/sys/devices/system/cpu/cpuX/big_core_siblings".

Patch 2: checks if the thread-ids of the component small-cores are
interleaved, in which case we enable the the CPU_FTR_ASYM_SMT bit in
the cur_cpu_spec->cpu_features which results in the SD_ASYM_PACKING
flag being set at the SMT level sched-domain.

Results:
~~~~~~~~~~~~~~~~~
Experimental results for ebizzy with 2 threads, bound to a single big-core
show a marked improvement with this patchset over the 4.18-rc3 vanilla
kernel.

The result of 100 such runs for 4.18-rc3 kernel and the 4.18-rc3 +
big-core-patches are as follows

4.18-rc3:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        records/s    :  # samples  : Histogram
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[0 -       1000000]  :      0      : #
[1000000 - 2000000]  :      6      : ##
[2000000 - 3000000]  :      8      : ##
[3000000 - 4000000]  :      15     : ####
[4000000 - 5000000]  :      0      : #
[5000000 - 6000000]  :      71     : ###############

4.18-rc3 + big-core-patches
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        records/s    :  # samples  : Histogram
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[0 -       1000000]  :      0      : #
[1000000 - 2000000]  :      0      : #
[2000000 - 3000000]  :      9      : ##
[3000000 - 4000000]  :      0      : #
[4000000 - 5000000]  :      0      : #
[5000000 - 6000000]  :      91     : ###################

Gautham R. Shenoy (2):
  powerpc: Detect the presence of big-cores via "ibm,thread-groups"
  powerpc: Enable CPU_FTR_ASYM_SMT for interleaved big-cores

 Documentation/ABI/testing/sysfs-devices-system-cpu |   8 +
 arch/powerpc/include/asm/cputhreads.h              |  22 ++
 arch/powerpc/kernel/setup-common.c                 | 221 ++++++++++++++++++++-
 arch/powerpc/kernel/sysfs.c                        |  35 ++++
 4 files changed, 285 insertions(+), 1 deletion(-)

-- 
1.9.4

^ permalink raw reply

* [PATCH v3 2/2] powerpc: Enable CPU_FTR_ASYM_SMT for interleaved big-cores
From: Gautham R. Shenoy @ 2018-07-06  9:05 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Michael Neuling,
	Vaidyanathan Srinivasan, Akshay Adiga, Shilpasri G Bhat,
	Oliver O'Halloran, Nicholas Piggin, Murilo Opsfelder Araujo
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy
In-Reply-To: <cover.1530867876.git.ego@linux.vnet.ibm.com>

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

A pair of IBM POWER9 SMT4 cores can be fused together to form a big-core
with 8 SMT threads. This can be discovered via the "ibm,thread-groups"
CPU property in the device tree which will indicate which group of
threads that share the L1 cache, translation cache and instruction data
flow. If there are multiple such group of threads, then the core is a
big-core.

Furthermore, if the thread-ids of the threads of the big-core can be
obtained by interleaving the thread-ids of the thread-groups
(component small core), then such a big-core is called an interleaved
big-core.

Eg: Threads in the pair of component SMT4 cores of an interleaved
big-core are numbered {0,2,4,6} and {1,3,5,7} respectively.

The SMT4 cores forming a big-core are more or less independent
units. Thus when multiple tasks are scheduled to run on the fused
core, we get the best performance when the tasks are spread across the
pair of SMT4 cores.

This patch enables CPU_FTR_ASYM_SMT bit in the cpu-features on
detecting the presence of interleaved big-cores at boot up. This will
will bias the load-balancing of tasks on smaller numbered threads,
which will automatically result in spreading the tasks uniformly
across the associated pair of SMT4 cores.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/setup-common.c | 67 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 989edc1..c200fb8 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -581,6 +581,56 @@ int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
 	return -1;
 }
 
+/*
+ * check_interleaved_big_core - Checks if the thread group tg
+ * corresponds to a big-core whose threads are interleavings of the
+ * threads of the component small cores.
+ *
+ * @tg: A thread-group struct for the core.
+ *
+ * Returns true if the core is a interleaved big-core.
+ * Returns false otherwise.
+ */
+static inline bool check_interleaved_big_core(struct thread_groups *tg)
+{
+	int nr_groups;
+	int threads_per_group;
+	int cur_cpu, next_cpu, i, j;
+
+	nr_groups = tg->nr_groups;
+	threads_per_group = tg->threads_per_group;
+
+	if (tg->property != 1)
+		return false;
+
+	if (nr_groups < 2 || threads_per_group < 2)
+		return false;
+
+	/*
+	 * In case of an interleaved big-core, the thread-ids of the
+	 * big-core can be obtained by interleaving the the thread-ids
+	 * of the component small
+	 *
+	 * Eg: On a 8-thread big-core with two SMT4 small cores, the
+	 * threads of the two component small cores will be
+	 * {0, 2, 4, 6} and {1, 3, 5, 7}.
+	 */
+	for (i = 0; i < nr_groups; i++) {
+		int group_start = i * threads_per_group;
+
+		for (j = 0; j < threads_per_group - 1; j++) {
+			int cur_idx = group_start + j;
+
+			cur_cpu = tg->thread_list[cur_idx];
+			next_cpu = tg->thread_list[cur_idx + 1];
+			if (next_cpu != cur_cpu + nr_groups)
+				return false;
+		}
+	}
+
+	return true;
+}
+
 /**
  * setup_cpu_maps - initialize the following cpu maps:
  *                  cpu_possible_mask
@@ -604,6 +654,7 @@ void __init smp_setup_cpu_maps(void)
 	struct device_node *dn;
 	int cpu = 0;
 	int nthreads = 1;
+	bool has_interleaved_big_cores = true;
 
 	has_big_cores = true;
 	DBG("smp_setup_cpu_maps()\n");
@@ -657,6 +708,12 @@ void __init smp_setup_cpu_maps(void)
 
 		if (has_big_cores && !dt_has_big_core(dn, &tg)) {
 			has_big_cores = false;
+			has_interleaved_big_cores = false;
+		}
+
+		if (has_interleaved_big_cores) {
+			has_interleaved_big_cores =
+				check_interleaved_big_core(&tg);
 		}
 
 		if (cpu >= nr_cpu_ids) {
@@ -713,7 +770,15 @@ void __init smp_setup_cpu_maps(void)
 	vdso_data->processorCount = num_present_cpus();
 #endif /* CONFIG_PPC64 */
 
-        /* Initialize CPU <=> thread mapping/
+	if (has_interleaved_big_cores) {
+		int key = __builtin_ctzl(CPU_FTR_ASYM_SMT);
+
+		cur_cpu_spec->cpu_features |= CPU_FTR_ASYM_SMT;
+		static_branch_enable(&cpu_feature_keys[key]);
+		pr_info("Detected interleaved big-cores\n");
+	}
+
+	/* Initialize CPU <=> thread mapping/
 	 *
 	 * WARNING: We assume that the number of threads is the same for
 	 * every CPU in the system. If that is not the case, then some code
-- 
1.9.4

^ permalink raw reply related

* [PATCH v3 1/2] powerpc: Detect the presence of big-cores via "ibm, thread-groups"
From: Gautham R. Shenoy @ 2018-07-06  9:05 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Michael Neuling,
	Vaidyanathan Srinivasan, Akshay Adiga, Shilpasri G Bhat,
	Oliver O'Halloran, Nicholas Piggin, Murilo Opsfelder Araujo
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy
In-Reply-To: <cover.1530867876.git.ego@linux.vnet.ibm.com>

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

On IBM POWER9, the device tree exposes a property array identifed by
"ibm,thread-groups" which will indicate which groups of threads share a
particular set of resources.

As of today we only have one form of grouping identifying the group of
threads in the core that share the L1 cache, translation cache and
instruction data flow.

This patch defines the helper function to parse the contents of
"ibm,thread-groups" and a new structure to contain the parsed output.

The patch also creates the sysfs file named "small_core_siblings" that
returns the physical ids of the threads in the core that share the L1
cache, translation cache and instruction data flow.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu |   8 ++
 arch/powerpc/include/asm/cputhreads.h              |  22 +++
 arch/powerpc/kernel/setup-common.c                 | 154 +++++++++++++++++++++
 arch/powerpc/kernel/sysfs.c                        |  35 +++++
 4 files changed, 219 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 9c5e7732..62f24de 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -487,3 +487,11 @@ Description:	Information about CPU vulnerabilities
 		"Not affected"	  CPU is not affected by the vulnerability
 		"Vulnerable"	  CPU is affected and no mitigation in effect
 		"Mitigation: $M"  CPU is affected and mitigation $M is in effect
+
+What: 		/sys/devices/system/cpu/cpu[0-9]+/small_core_siblings
+Date:		05-Jul-2018
+KernelVersion:	v4.18.0
+Contact:	Gautham R. Shenoy <ego@linux.vnet.ibm.com>
+Description:	List of Physical ids of CPUs which share the the L1 cache,
+		translation cache and instruction data-flow with this CPU.
+Values:		Comma separated list of decimal integers.
diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h
index d71a909..33226d7 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -23,11 +23,13 @@
 extern int threads_per_core;
 extern int threads_per_subcore;
 extern int threads_shift;
+extern bool has_big_cores;
 extern cpumask_t threads_core_mask;
 #else
 #define threads_per_core	1
 #define threads_per_subcore	1
 #define threads_shift		0
+#define has_big_cores		0
 #define threads_core_mask	(*get_cpu_mask(0))
 #endif
 
@@ -69,12 +71,32 @@ static inline cpumask_t cpu_online_cores_map(void)
 	return cpu_thread_mask_to_cores(cpu_online_mask);
 }
 
+#define MAX_THREAD_LIST_SIZE	8
+struct thread_groups {
+	unsigned int property;
+	unsigned int nr_groups;
+	unsigned int threads_per_group;
+	unsigned int thread_list[MAX_THREAD_LIST_SIZE];
+};
+
 #ifdef CONFIG_SMP
 int cpu_core_index_of_thread(int cpu);
 int cpu_first_thread_of_core(int core);
+int parse_thread_groups(struct device_node *dn, struct thread_groups *tg);
+int get_cpu_thread_group_start(int cpu, struct thread_groups *tg);
 #else
 static inline int cpu_core_index_of_thread(int cpu) { return cpu; }
 static inline int cpu_first_thread_of_core(int core) { return core; }
+static inline int parse_thread_groups(struct device_node *dn,
+				      struct thread_groups *tg)
+{
+	return -ENODATA;
+}
+
+static inline int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
+{
+	return -1;
+}
 #endif
 
 static inline int cpu_thread_in_core(int cpu)
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 40b44bb..989edc1 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -402,10 +402,12 @@ void __init check_for_initrd(void)
 #ifdef CONFIG_SMP
 
 int threads_per_core, threads_per_subcore, threads_shift;
+bool has_big_cores;
 cpumask_t threads_core_mask;
 EXPORT_SYMBOL_GPL(threads_per_core);
 EXPORT_SYMBOL_GPL(threads_per_subcore);
 EXPORT_SYMBOL_GPL(threads_shift);
+EXPORT_SYMBOL_GPL(has_big_cores);
 EXPORT_SYMBOL_GPL(threads_core_mask);
 
 static void __init cpu_init_thread_core_maps(int tpc)
@@ -433,6 +435,152 @@ static void __init cpu_init_thread_core_maps(int tpc)
 
 u32 *cpu_to_phys_id = NULL;
 
+/*
+ * parse_thread_groups: Parses the "ibm,thread-groups" device tree
+ *                      property for the CPU device node @dn and stores
+ *                      the parsed output in the thread_groups
+ *                      structure @tg.
+ *
+ * @dn: The device node of the CPU device.
+ * @tg: Pointer to a thread group structure into which the parsed
+ *     output of "ibm,thread-groups" is stored.
+ *
+ * ibm,thread-groups[0..N-1] array defines which group of threads in
+ * the CPU-device node can be grouped together based on the property.
+ *
+ * ibm,thread-groups[0] tells us the property based on which the
+ * threads are being grouped together. If this value is 1, it implies
+ * that the threads in the same group share L1, translation cache.
+ *
+ * ibm,thread-groups[1] tells us how many such thread groups exist.
+ *
+ * ibm,thread-groups[2] tells us the number of threads in each such
+ * group.
+ *
+ * ibm,thread-groups[3..N-1] is the list of threads identified by
+ * "ibm,ppc-interrupt-server#s" arranged as per their membership in
+ * the grouping.
+ *
+ * Example: If ibm,thread-groups = [1,2,4,5,6,7,8,9,10,11,12] it
+ * implies that there are 2 groups of 4 threads each, where each group
+ * of threads share L1, translation cache.
+ *
+ * The "ibm,ppc-interrupt-server#s" of the first group is {5,6,7,8}
+ * and the "ibm,ppc-interrupt-server#s" of the second group is {9, 10,
+ * 11, 12} structure
+ *
+ * Returns 0 on success, -EINVAL if the property does not exist,
+ * -ENODATA if property does not have a value, and -EOVERFLOW if the
+ * property data isn't large enough.
+ */
+int parse_thread_groups(struct device_node *dn,
+			struct thread_groups *tg)
+{
+	unsigned int nr_groups, threads_per_group, property;
+	int i;
+	u32 thread_group_array[3 + MAX_THREAD_LIST_SIZE];
+	u32 *thread_list;
+	size_t total_threads;
+	int ret;
+
+	ret = of_property_read_u32_array(dn, "ibm,thread-groups",
+					 thread_group_array, 3);
+
+	if (ret)
+		goto out_err;
+
+	property = thread_group_array[0];
+	nr_groups = thread_group_array[1];
+	threads_per_group = thread_group_array[2];
+	total_threads = nr_groups * threads_per_group;
+
+	ret = of_property_read_u32_array(dn, "ibm,thread-groups",
+					 thread_group_array,
+					 3 + total_threads);
+	if (ret)
+		goto out_err;
+
+	thread_list = &thread_group_array[3];
+
+	for (i = 0 ; i < total_threads; i++)
+		tg->thread_list[i] = thread_list[i];
+
+	tg->property = property;
+	tg->nr_groups = nr_groups;
+	tg->threads_per_group = threads_per_group;
+
+	return 0;
+out_err:
+	tg->property = 0;
+	tg->nr_groups = 0;
+	tg->threads_per_group = 0;
+	return ret;
+}
+
+/*
+ * dt_has_big_core : Parses the device tree property
+ *		    "ibm,thread-groups" for device node pointed by @dn
+ *		    and stores the parsed output in the structure
+ *		    pointed to by @tg.  Then checks if the output in
+ *		    @tg corresponds to a big-core.
+ *
+ * @dn: Device node pointer of the CPU node being checked for a
+ *      big-core.
+ * @tg: Pointer to thread_groups struct in which parsed output of
+ *      "ibm,thread-groups" is recorded.
+ *
+ * Returns true if the @dn points to a big-core.
+ * Returns false if there is an error in parsing "ibm,thread-groups"
+ * or the parsed output doesn't correspond to a big-core.
+ */
+static inline bool dt_has_big_core(struct device_node *dn,
+				   struct thread_groups *tg)
+{
+	if (parse_thread_groups(dn, tg))
+		return false;
+
+	if (tg->property != 1)
+		return false;
+
+	if (tg->nr_groups < 1)
+		return false;
+
+	return true;
+}
+
+/*
+ * get_cpu_thread_group_start : Searches the thread group in tg->thread_list
+ *                              that @cpu belongs to.
+ *
+ * @cpu : The logical CPU whose thread group is being searched.
+ * @tg : The thread-group structure of the CPU node which @cpu belongs
+ *       to.
+ *
+ * Returns the index to tg->thread_list that points to the the start
+ * of the thread_group that @cpu belongs to.
+ *
+ * Returns -1 if cpu doesn't belong to any of the groups pointed to by
+ * tg->thread_list.
+ */
+int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
+{
+	int hw_cpu_id = get_hard_smp_processor_id(cpu);
+	int i, j;
+
+	for (i = 0; i < tg->nr_groups; i++) {
+		int group_start = i * tg->threads_per_group;
+
+		for (j = 0; j < tg->threads_per_group; j++) {
+			int idx = group_start + j;
+
+			if (tg->thread_list[idx] == hw_cpu_id)
+				return group_start;
+		}
+	}
+
+	return -1;
+}
+
 /**
  * setup_cpu_maps - initialize the following cpu maps:
  *                  cpu_possible_mask
@@ -457,6 +605,7 @@ void __init smp_setup_cpu_maps(void)
 	int cpu = 0;
 	int nthreads = 1;
 
+	has_big_cores = true;
 	DBG("smp_setup_cpu_maps()\n");
 
 	cpu_to_phys_id = __va(memblock_alloc(nr_cpu_ids * sizeof(u32),
@@ -467,6 +616,7 @@ void __init smp_setup_cpu_maps(void)
 		const __be32 *intserv;
 		__be32 cpu_be;
 		int j, len;
+		struct thread_groups tg;
 
 		DBG("  * %pOF...\n", dn);
 
@@ -505,6 +655,10 @@ void __init smp_setup_cpu_maps(void)
 			cpu++;
 		}
 
+		if (has_big_cores && !dt_has_big_core(dn, &tg)) {
+			has_big_cores = false;
+		}
+
 		if (cpu >= nr_cpu_ids) {
 			of_node_put(dn);
 			break;
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 755dc98..f5717de 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -18,6 +18,7 @@
 #include <asm/smp.h>
 #include <asm/pmc.h>
 #include <asm/firmware.h>
+#include <asm/cputhreads.h>
 
 #include "cacheinfo.h"
 #include "setup.h"
@@ -1025,6 +1026,33 @@ static ssize_t show_physical_id(struct device *dev,
 }
 static DEVICE_ATTR(physical_id, 0444, show_physical_id, NULL);
 
+static ssize_t show_small_core_siblings(struct device *dev,
+					struct device_attribute *attr,
+					char *buf)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, dev);
+	struct device_node *dn = of_get_cpu_node(cpu->dev.id, NULL);
+	struct thread_groups tg;
+	int i, j;
+	ssize_t ret = 0;
+
+	if (parse_thread_groups(dn, &tg))
+		return -ENODATA;
+
+	i = get_cpu_thread_group_start(cpu->dev.id, &tg);
+
+	if (i == -1)
+		return -ENODATA;
+
+	for (j = 0; j < tg.threads_per_group - 1; j++)
+		ret += sprintf(buf + ret, "%d,", tg.thread_list[i + j]);
+
+	ret += sprintf(buf + ret, "%d\n", tg.thread_list[i + j]);
+
+	return ret;
+}
+static DEVICE_ATTR(small_core_siblings, 0444, show_small_core_siblings, NULL);
+
 static int __init topology_init(void)
 {
 	int cpu, r;
@@ -1048,6 +1076,13 @@ static int __init topology_init(void)
 			register_cpu(c, cpu);
 
 			device_create_file(&c->dev, &dev_attr_physical_id);
+
+			if (has_big_cores) {
+				const struct device_attribute *attr =
+				       &dev_attr_small_core_siblings;
+
+			       device_create_file(&c->dev, attr);
+			}
 		}
 	}
 	r = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "powerpc/topology:online",
-- 
1.9.4

^ permalink raw reply related

* Re: [PATCH v6 8/8] powernv/pseries: consolidate code for mce early handling.
From: Nicholas Piggin @ 2018-07-06  9:40 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: linuxppc-dev, Aneesh Kumar K.V, Laurent Dufour, Michal Suchanek,
	Michael Ellerman
In-Reply-To: <153072717270.29016.18207683951100257477.stgit@jupiter.in.ibm.com>

On Wed, 04 Jul 2018 23:30:12 +0530
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> Now that other platforms also implements real mode mce handler,
> lets consolidate the code by sharing existing powernv machine check
> early code. Rename machine_check_powernv_early to
> machine_check_common_early and reuse the code.
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/exceptions-64s.S |   56 +++++++---------------------------
>  1 file changed, 11 insertions(+), 45 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 0038596b7906..3e877ec55d50 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -243,14 +243,13 @@ EXC_REAL_BEGIN(machine_check, 0x200, 0x100)
>  	SET_SCRATCH0(r13)		/* save r13 */
>  	EXCEPTION_PROLOG_0(PACA_EXMC)
>  BEGIN_FTR_SECTION
> -	b	machine_check_powernv_early
> +	b	machine_check_common_early
>  FTR_SECTION_ELSE
>  	b	machine_check_pSeries_0
>  ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
>  EXC_REAL_END(machine_check, 0x200, 0x100)
>  EXC_VIRT_NONE(0x4200, 0x100)
> -TRAMP_REAL_BEGIN(machine_check_powernv_early)
> -BEGIN_FTR_SECTION
> +TRAMP_REAL_BEGIN(machine_check_common_early)
>  	EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
>  	/*
>  	 * Register contents:
> @@ -306,7 +305,9 @@ BEGIN_FTR_SECTION
>  	/* Save r9 through r13 from EXMC save area to stack frame. */
>  	EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
>  	mfmsr	r11			/* get MSR value */
> +BEGIN_FTR_SECTION
>  	ori	r11,r11,MSR_ME		/* turn on ME bit */
> +END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>  	ori	r11,r11,MSR_RI		/* turn on RI bit */
>  	LOAD_HANDLER(r12, machine_check_handle_early)
>  1:	mtspr	SPRN_SRR0,r12
> @@ -325,7 +326,6 @@ BEGIN_FTR_SECTION
>  	andc	r11,r11,r10		/* Turn off MSR_ME */
>  	b	1b
>  	b	.	/* prevent speculative execution */
> -END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>  
>  TRAMP_REAL_BEGIN(machine_check_pSeries)
>  	.globl machine_check_fwnmi
> @@ -333,7 +333,7 @@ machine_check_fwnmi:
>  	SET_SCRATCH0(r13)		/* save r13 */
>  	EXCEPTION_PROLOG_0(PACA_EXMC)
>  BEGIN_FTR_SECTION
> -	b	machine_check_pSeries_early
> +	b	machine_check_common_early
>  END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
>  machine_check_pSeries_0:
>  	EXCEPTION_PROLOG_1(PACA_EXMC, KVMTEST_PR, 0x200)
> @@ -346,45 +346,6 @@ machine_check_pSeries_0:
>  
>  TRAMP_KVM_SKIP(PACA_EXMC, 0x200)
>  
> -TRAMP_REAL_BEGIN(machine_check_pSeries_early)
> -BEGIN_FTR_SECTION
> -	EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
> -	mr	r10,r1			/* Save r1 */
> -	ld	r1,PACAMCEMERGSP(r13)	/* Use MC emergency stack */
> -	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack frame		*/
> -	mfspr	r11,SPRN_SRR0		/* Save SRR0 */
> -	mfspr	r12,SPRN_SRR1		/* Save SRR1 */
> -	EXCEPTION_PROLOG_COMMON_1()
> -	EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
> -	EXCEPTION_PROLOG_COMMON_3(0x200)
> -	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	BRANCH_LINK_TO_FAR(machine_check_early) /* Function call ABI */
> -
> -	/* Move original SRR0 and SRR1 into the respective regs */
> -	ld	r9,_MSR(r1)
> -	mtspr	SPRN_SRR1,r9
> -	ld	r3,_NIP(r1)
> -	mtspr	SPRN_SRR0,r3
> -	ld	r9,_CTR(r1)
> -	mtctr	r9
> -	ld	r9,_XER(r1)
> -	mtxer	r9
> -	ld	r9,_LINK(r1)
> -	mtlr	r9
> -	REST_GPR(0, r1)
> -	REST_8GPRS(2, r1)
> -	REST_GPR(10, r1)
> -	ld	r11,_CCR(r1)
> -	mtcr	r11
> -	REST_GPR(11, r1)
> -	REST_2GPRS(12, r1)
> -	/* restore original r1. */
> -	ld	r1,GPR1(r1)
> -	SET_SCRATCH0(r13)		/* save r13 */
> -	EXCEPTION_PROLOG_0(PACA_EXMC)
> -	b	machine_check_pSeries_0
> -END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
> -
>  EXC_COMMON_BEGIN(machine_check_common)
>  	/*
>  	 * Machine check is different because we use a different
> @@ -483,6 +444,9 @@ EXC_COMMON_BEGIN(machine_check_handle_early)
>  	bl	machine_check_early
>  	std	r3,RESULT(r1)	/* Save result */
>  	ld	r12,_MSR(r1)
> +BEGIN_FTR_SECTION
> +	bne	9f			/* pSeries: continue to V mode. */
> +END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)

Should this be "b 9f" ? Although...

>  
>  #ifdef	CONFIG_PPC_P7_NAP
>  	/*
> @@ -564,7 +528,9 @@ EXC_COMMON_BEGIN(machine_check_handle_early)
>  9:
>  	/* Deliver the machine check to host kernel in V mode. */
>  	MACHINE_CHECK_HANDLER_WINDUP
> -	b	machine_check_pSeries
> +	SET_SCRATCH0(r13)		/* save r13 */
> +	EXCEPTION_PROLOG_0(PACA_EXMC)
> +	b	machine_check_pSeries_0

I'm not sure that's quite right. You're missing out testing the result
of the early handler call? Is this buggy in existing code too? We
should be testing that result in all cases, shouldn't we? But it doesn't
seem like we are.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCHv3 2/4] drivers/base: utilize device tree info to shutdown devices
From: Rafael J. Wysocki @ 2018-07-06  9:53 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-kernel, Greg Kroah-Hartman, Rafael J . Wysocki,
	Grygorii Strashko, Christoph Hellwig, Bjorn Helgaas, Dave Young,
	linux-pci, linuxppc-dev
In-Reply-To: <CAFgQCTsGC8NH+5kXYD7vykJeUsaAzCAjgxSP-R0EqA3YvgsY0w@mail.gmail.com>

On Friday, July 6, 2018 5:02:15 AM CEST Pingfan Liu wrote:
> On Thu, Jul 5, 2018 at 6:13 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > On Tuesday, July 3, 2018 8:50:40 AM CEST Pingfan Liu wrote:
> > > commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> > > places an assumption of supplier<-consumer order on the process of probe.
> > > But it turns out to break down the parent <- child order in some scene.
> > > E.g in pci, a bridge is enabled by pci core, and behind it, the devices
> > > have been probed. Then comes the bridge's module, which enables extra
> > > feature(such as hotplug) on this bridge. This will break the
> > > parent<-children order and cause failure when "kexec -e" in some scenario.
> > >
> > > The detailed description of the scenario:
> > > An IBM Power9 machine on which, two drivers portdrv_pci and shpchp(a mod)
> > > match the PCI_CLASS_BRIDGE_PCI, but neither of them success to probe due
> > > to some issue. For this case, the bridge is moved after its children in
> > > devices_kset. Then, when "kexec -e", a ata-disk behind the bridge can not
> > > write back buffer in flight due to the former shutdown of the bridge which
> > > clears the BusMaster bit.
> > >
> > > It is a little hard to impose both "parent<-child" and "supplier<-consumer"
> > > order on devices_kset. Take the following scene:
> > > step0: before a consumer's probing, (note child_a is supplier of consumer_a)
> > >   [ consumer-X, child_a, ...., child_z] [... consumer_a, ..., consumer_z, ...] supplier-X
> > >                                          ^^^^^^^^^^ affected range ^^^^^^^^^^
> > > step1: when probing, moving consumer-X after supplier-X
> > >   [ child_a, ...., child_z] [.... consumer_a, ..., consumer_z, ...] supplier-X, consumer-X
> > > step2: the children of consumer-X should be re-ordered to maintain the seq
> > >   [... consumer_a, ..., consumer_z, ....] supplier-X  [consumer-X, child_a, ...., child_z]
> > > step3: the consumer_a should be re-ordered to maintain the seq
> > >   [... consumer_z, ...] supplier-X [ consumer-X, child_a, consumer_a ..., child_z]
> > >
> > > It requires two nested recursion to drain out all out-of-order item in
> > > "affected range". To avoid such complicated code, this patch suggests
> > > to utilize the info in device tree, instead of using the order of
> > > devices_kset during shutdown. It iterates the device tree, and firstly
> > > shutdown a device's children and consumers. After this patch, the buggy
> > > commit is hollow and left to clean.
> > >
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> > > Cc: Christoph Hellwig <hch@infradead.org>
> > > Cc: Bjorn Helgaas <helgaas@kernel.org>
> > > Cc: Dave Young <dyoung@redhat.com>
> > > Cc: linux-pci@vger.kernel.org
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > ---
> > >  drivers/base/core.c    | 48 +++++++++++++++++++++++++++++++++++++++++++-----
> > >  include/linux/device.h |  1 +
> > >  2 files changed, 44 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > index a48868f..684b994 100644
> > > --- a/drivers/base/core.c
> > > +++ b/drivers/base/core.c
> > > @@ -1446,6 +1446,7 @@ void device_initialize(struct device *dev)
> > >       INIT_LIST_HEAD(&dev->links.consumers);
> > >       INIT_LIST_HEAD(&dev->links.suppliers);
> > >       dev->links.status = DL_DEV_NO_DRIVER;
> > > +     dev->shutdown = false;
> > >  }
> > >  EXPORT_SYMBOL_GPL(device_initialize);
> > >
> > > @@ -2811,7 +2812,6 @@ static void __device_shutdown(struct device *dev)
> > >        * lock is to be held
> > >        */
> > >       parent = get_device(dev->parent);
> > > -     get_device(dev);
> >
> > Why is the get_/put_device() not needed any more?
> >
> They are moved upper layer into device_for_each_child_shutdown().
> Since there is lock breakage in __device_shutdown(), resorting to
> ref++ to protect the ancestor.  And I think the
> get_device(dev->parent) can be deleted either.

Wouldn't that break USB?

> > >       /*
> > >        * Make sure the device is off the kset list, in the
> > >        * event that dev->*->shutdown() doesn't remove it.
> > > @@ -2842,23 +2842,60 @@ static void __device_shutdown(struct device *dev)
> > >                       dev_info(dev, "shutdown\n");
> > >               dev->driver->shutdown(dev);
> > >       }
> > > -
> > > +     dev->shutdown = true;
> > >       device_unlock(dev);
> > >       if (parent)
> > >               device_unlock(parent);
> > >
> > > -     put_device(dev);
> > >       put_device(parent);
> > >       spin_lock(&devices_kset->list_lock);
> > >  }
> > >
> > > +/* shutdown dev's children and consumer firstly, then itself */
> > > +static int device_for_each_child_shutdown(struct device *dev)
> >
> > Confusing name.
> >
> > What about device_shutdown_subordinate()?
> >
> Fine. My understanding of words is not exact.
> 
> > > +{
> > > +     struct klist_iter i;
> > > +     struct device *child;
> > > +     struct device_link *link;
> > > +
> > > +     /* already shutdown, then skip this sub tree */
> > > +     if (dev->shutdown)
> > > +             return 0;
> > > +
> > > +     if (!dev->p)
> > > +             goto check_consumers;
> > > +
> > > +     /* there is breakage of lock in __device_shutdown(), and the redundant
> > > +      * ref++ on srcu protected consumer is harmless since shutdown is not
> > > +      * hot path.
> > > +      */
> > > +     get_device(dev);
> > > +
> > > +     klist_iter_init(&dev->p->klist_children, &i);
> > > +     while ((child = next_device(&i)))
> > > +             device_for_each_child_shutdown(child);
> >
> > Why don't you use device_for_each_child() here?
> >
> OK, I will try use it.

Well, hold on.

> > > +     klist_iter_exit(&i);
> > > +
> > > +check_consumers:
> > > +     list_for_each_entry_rcu(link, &dev->links.consumers, s_node) {
> > > +             if (!link->consumer->shutdown)
> > > +                     device_for_each_child_shutdown(link->consumer);
> > > +     }
> > > +
> > > +     __device_shutdown(dev);
> > > +     put_device(dev);
> >
> > Possible reference counter imbalance AFAICS.
> >
> Yes, get_device() should be ahead of "if (!dev->p)". Is anything  else I miss?

Yes, that's it.

> > > +     return 0;
> > > +}
> >
> > Well, instead of doing this dance, we might as well walk dpm_list here as it
> > is in the right order.
> >
> Sorry, do you mean that using the same way to manage the dpm_list?

No, I mean to use dpm_list instead of devices_kset for shutdown.

They should be in the same order anyway if all is correct.

> > Of course, that would require dpm_list to be available for CONFIG_PM unset,
> > but it may be a better approach long term.
> >
> > > +
> > >  /**
> > >   * device_shutdown - call ->shutdown() on each device to shutdown.
> > >   */
> > >  void device_shutdown(void)
> > >  {
> > >       struct device *dev;
> > > +     int idx;
> > >
> > > +     idx = device_links_read_lock();
> > >       spin_lock(&devices_kset->list_lock);
> > >       /*
> > >        * Walk the devices list backward, shutting down each in turn.
> > > @@ -2866,11 +2903,12 @@ void device_shutdown(void)
> > >        * devices offline, even as the system is shutting down.
> > >        */
> > >       while (!list_empty(&devices_kset->list)) {
> > > -             dev = list_entry(devices_kset->list.prev, struct device,
> > > +             dev = list_entry(devices_kset->list.next, struct device,
> > >                               kobj.entry);
> > > -             __device_shutdown(dev);
> > > +             device_for_each_child_shutdown(dev);
> > >       }
> > >       spin_unlock(&devices_kset->list_lock);
> > > +     device_links_read_unlock(idx);
> > >  }
> > >
> > >  /*
> > > diff --git a/include/linux/device.h b/include/linux/device.h
> > > index 055a69d..8a0f784 100644
> > > --- a/include/linux/device.h
> > > +++ b/include/linux/device.h
> > > @@ -1003,6 +1003,7 @@ struct device {
> > >       bool                    offline:1;
> > >       bool                    of_node_reused:1;
> > >       bool                    dma_32bit_limit:1;
> > > +     bool                    shutdown:1; /* one direction: false->true */
> > >  };
> > >
> > >  static inline struct device *kobj_to_dev(struct kobject *kobj)
> > >
> >
> > If the device_kset_move_last() in really_probe() is the only problem,
> > I'd rather try to fix that one in the first place.
> >
> > Why is it needed?
> >
> I had tried, but it turns out not easy to archive. The code is
> https://patchwork.kernel.org/patch/10485195/. And I make a detailed
> description of the algorithm in this patch's commit log. To be more
> detailed, we face the potential out of order issue in really_probe()
> like : 0th. [ consumer-X, child_a, ...., child_z] [... consumer_a,
> ..., consumer_z, ...] supplier-X //(note child_a is supplier of
> consumer_a).  To address all the potential out of order item in the
> affected section [... consumer_a, ..., consumer_z, ...],  it will
> incur two nested recursions.  1st, moving  consumer-X and its
> descendants after supplier-X,  2nd, moving consumer_a after child_a,
> 3rd. the 2nd step may pose the same situation of 0th.  Besides the two
> interleaved recursion,  the breakage of spin lock requires more effort
> to protect the item from disappearing in linked-list  (which I did not
> implement in the https://patchwork.kernel.org/patch/10485195/). Hence
> I turn to this cheap method.

So I think that we simply need to drop the devices_kset_move_last() call
from really_probe() as it is plain incorrect and the use case for it is
questionable at best.

And the use case it is supposed to address should be addressed differently.

Thanks,
Rafael

^ permalink raw reply

* Re: [PATCHv3 0/4] drivers/base: bugfix for supplier<-consumer ordering in device_kset
From: Kishon Vijay Abraham I @ 2018-07-06 10:02 UTC (permalink / raw)
  To: Lukas Wunner, Rafael J. Wysocki, linux-omap, Grygorii Strashko
  Cc: Pingfan Liu, Linux Kernel Mailing List, Greg Kroah-Hartman,
	Grygorii Strashko, Christoph Hellwig, Bjorn Helgaas, Dave Young,
	Linux PCI, linuxppc-dev, Linux PM
In-Reply-To: <20180706083603.GA9063@wunner.de>

+Grygorii, linux-omap

On Friday 06 July 2018 02:06 PM, Lukas Wunner wrote:
> [cc += Kishon Vijay Abraham]
> 
> On Thu, Jul 05, 2018 at 11:18:28AM +0200, Rafael J. Wysocki wrote:
>> OK, so calling devices_kset_move_last() from really_probe() clearly is
>> a mistake.
>>
>> I'm not really sure what the intention of it was as the changelog of
>> commit 52cdbdd49853d doesn't really explain that (why would it be
>> insufficient without that change?)
> 
> It seems 52cdbdd49853d fixed an issue with boards which have an MMC
> whose reset pin needs to be driven high on shutdown, lest the MMC
> won't be found on the next boot.
> 
> The boards' devicetrees use a kludge wherein the reset pin is modelled
> as a regulator.  The regulator is enabled when the MMC probes and
> disabled on driver unbind and shutdown.  As a result, the pin is driven
> low on shutdown and the MMC is not found on the next boot.
> 
> To fix this, another kludge was invented wherein the GPIO expander
> driving the reset pin unconditionally drives all its pins high on
> shutdown, see pcf857x_shutdown() in drivers/gpio/gpio-pcf857x.c
> (commit adc284755055, "gpio: pcf857x: restore the initial line state
> of all pcf lines").
> 
> For this kludge to work, the GPIO expander's ->shutdown hook needs to
> be executed after the MMC expander's ->shutdown hook.
> 
> Commit 52cdbdd49853d achieved that by reordering devices_kset according
> to the probe order.  Apparently the MMC probes after the GPIO expander,
> possibly because it returns -EPROBE_DEFER if the vmmc regulator isn't
> available yet, see mmc_regulator_get_supply().
> 
> Note, I'm just piecing the information together from git history,
> I'm not responsible for these kludges.  (I'm innocent!)
> 
> @Pingfan Liu, if you just remove the call to devices_kset_move_last()
> from really_probe(), does the issue go away?
> 
> If so, it might be best to remove that call and model the dependency
> with a call to device_link_add() in mmc_regulator_get_supply().

hmm.. had a quick look on this and looks like struct regulator is a regulator
frameworks internal data structure, so its members are not accessible outside.
struct regulator's device pointer is required for device_link_add().

Thanks
Kishon

^ permalink raw reply

* RE: [PATCH 0/7 v5] Support for fsl-mc bus and its devices in SMMU
From: Nipun Gupta @ 2018-07-06 11:18 UTC (permalink / raw)
  To: joro@8bytes.org
  Cc: robin.murphy@arm.com, will.deacon@arm.com,
	gregkh@linuxfoundation.org, hch@lst.de, m.szyprowski@samsung.com,
	shawnguo@kernel.org, frowand.list@gmail.com,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linuxppc-dev@lists.ozlabs.org, linux-pci@vger.kernel.org,
	Bharat Bhushan, stuyoder@gmail.com, Leo Li
In-Reply-To: <20180706111327.7gkerhkk6udfeirr@8bytes.org>


Thanks Joro,

I will be sending it by mid of next week.

Regards,
Nipun

> -----Original Message-----
> From: joro@8bytes.org [mailto:joro@8bytes.org]
> Sent: Friday, July 6, 2018 4:43 PM
> To: Nipun Gupta <nipun.gupta@nxp.com>
> Cc: robin.murphy@arm.com; will.deacon@arm.com;
> gregkh@linuxfoundation.org; hch@lst.de; m.szyprowski@samsung.com;
> shawnguo@kernel.org; frowand.list@gmail.com; iommu@lists.linux-
> foundation.org; linux-kernel@vger.kernel.org; devicetree@vger.kernel.org;
> linux-arm-kernel@lists.infradead.org; linuxppc-dev@lists.ozlabs.org; linu=
x-
> pci@vger.kernel.org; Bharat Bhushan <bharat.bhushan@nxp.com>;
> stuyoder@gmail.com; Leo Li <leoyang.li@nxp.com>
> Subject: Re: [PATCH 0/7 v5] Support for fsl-mc bus and its devices in SMM=
U
>=20
> Hi Nipun,
>=20
> On Thu, Jun 21, 2018 at 03:59:27AM +0000, Nipun Gupta wrote:
> > Will this patch-set be taken for the next kernel release (and via which=
 tree)?
>=20
> I can take this through IOMMU tree if nobody objects. Please work out
> the last remaining comment on patch 7 with Robin and then re-send with
> all Acks and Reviewed-by: tags included.
>=20
> Oh, and please use 'iommu/of:' prefix instead of 'iommu: of:' and follow
> that sheme for the other patches/prefixes as well.
>=20
> Thanks,
>=20
> 	Joerg

^ permalink raw reply

* Re: [PATCH 0/7 v5] Support for fsl-mc bus and its devices in SMMU
From: joro @ 2018-07-06 11:13 UTC (permalink / raw)
  To: Nipun Gupta
  Cc: robin.murphy@arm.com, will.deacon@arm.com,
	gregkh@linuxfoundation.org, hch@lst.de, m.szyprowski@samsung.com,
	shawnguo@kernel.org, frowand.list@gmail.com,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linuxppc-dev@lists.ozlabs.org, linux-pci@vger.kernel.org,
	Bharat Bhushan, stuyoder@gmail.com, Leo Li
In-Reply-To: <HE1PR0401MB24250CEFBC0B7CC919C4388AE6760@HE1PR0401MB2425.eurprd04.prod.outlook.com>

Hi Nipun,

On Thu, Jun 21, 2018 at 03:59:27AM +0000, Nipun Gupta wrote:
> Will this patch-set be taken for the next kernel release (and via which tree)?

I can take this through IOMMU tree if nobody objects. Please work out
the last remaining comment on patch 7 with Robin and then re-send with
all Acks and Reviewed-by: tags included.

Oh, and please use 'iommu/of:' prefix instead of 'iommu: of:' and follow
that sheme for the other patches/prefixes as well.

Thanks,

	Joerg

^ permalink raw reply

* [PATCH v4 0/3] hwmon: Add attributes to enable/disable sensors
From: Shilpasri G Bhat @ 2018-07-06 11:56 UTC (permalink / raw)
  To: linux, mpe; +Cc: linuxppc-dev, linux-hwmon, linux-kernel, ego, Shilpasri G Bhat

This patch series adds new attribute to enable or disable a sensor in
runtime.

v3 : https://lkml.org/lkml/2018/7/5/476
v2 : https://lkml.org/lkml/2018/7/4/263
v1 : https://lkml.org/lkml/2018/3/22/214

Shilpasri G Bhat (3):
  powernv:opal-sensor-groups: Add support to enable sensor groups
  hwmon: ibmpowernv: Add attributes to enable/disable sensor groups
  hwmon: Document the sensor enable attribute

 Documentation/hwmon/ibmpowernv                     |  43 +++-
 Documentation/hwmon/sysfs-interface                |  62 ++++++
 arch/powerpc/include/asm/opal-api.h                |   1 +
 arch/powerpc/include/asm/opal.h                    |   2 +
 .../powerpc/platforms/powernv/opal-sensor-groups.c |  28 +++
 arch/powerpc/platforms/powernv/opal-wrappers.S     |   1 +
 drivers/hwmon/ibmpowernv.c                         | 225 ++++++++++++++++++---
 7 files changed, 328 insertions(+), 34 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* [PATCH v4 1/3] powernv:opal-sensor-groups: Add support to enable sensor groups
From: Shilpasri G Bhat @ 2018-07-06 11:56 UTC (permalink / raw)
  To: linux, mpe; +Cc: linuxppc-dev, linux-hwmon, linux-kernel, ego, Shilpasri G Bhat
In-Reply-To: <1530878166-12589-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com>

Adds support to enable/disable a sensor group at runtime. This
can be used to select the sensor groups that needs to be copied to
main memory by OCC. Sensor groups like power, temperature, current,
voltage, frequency, utilization can be enabled/disabled at runtime.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal-api.h                |  1 +
 arch/powerpc/include/asm/opal.h                    |  2 ++
 .../powerpc/platforms/powernv/opal-sensor-groups.c | 28 ++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-wrappers.S     |  1 +
 4 files changed, 32 insertions(+)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 3bab299..56a94a1 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -206,6 +206,7 @@
 #define OPAL_NPU_SPA_CLEAR_CACHE		160
 #define OPAL_NPU_TL_SET				161
 #define OPAL_SENSOR_READ_U64			162
+#define OPAL_SENSOR_GROUP_ENABLE		163
 #define OPAL_PCI_GET_PBCQ_TUNNEL_BAR		164
 #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR		165
 #define OPAL_LAST				165
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index e1b2910..fc0550e 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -292,6 +292,7 @@ int64_t opal_imc_counters_init(uint32_t type, uint64_t address,
 int opal_get_power_shift_ratio(u32 handle, int token, u32 *psr);
 int opal_set_power_shift_ratio(u32 handle, int token, u32 psr);
 int opal_sensor_group_clear(u32 group_hndl, int token);
+int opal_sensor_group_enable(u32 group_hndl, int token, bool enable);
 
 s64 opal_signal_system_reset(s32 cpu);
 s64 opal_quiesce(u64 shutdown_type, s32 cpu);
@@ -326,6 +327,7 @@ extern int opal_async_wait_response_interruptible(uint64_t token,
 		struct opal_msg *msg);
 extern int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data);
 extern int opal_get_sensor_data_u64(u32 sensor_hndl, u64 *sensor_data);
+extern int sensor_group_enable(u32 grp_hndl, bool enable);
 
 struct rtc_time;
 extern time64_t opal_get_boot_time(void);
diff --git a/arch/powerpc/platforms/powernv/opal-sensor-groups.c b/arch/powerpc/platforms/powernv/opal-sensor-groups.c
index 541c9ea..f7d04b6 100644
--- a/arch/powerpc/platforms/powernv/opal-sensor-groups.c
+++ b/arch/powerpc/platforms/powernv/opal-sensor-groups.c
@@ -32,6 +32,34 @@ struct sg_attr {
 	struct sg_attr *sgattrs;
 } *sgs;
 
+int sensor_group_enable(u32 handle, bool enable)
+{
+	struct opal_msg msg;
+	int token, ret;
+
+	token = opal_async_get_token_interruptible();
+	if (token < 0)
+		return token;
+
+	ret = opal_sensor_group_enable(handle, token, enable);
+	if (ret == OPAL_ASYNC_COMPLETION) {
+		ret = opal_async_wait_response(token, &msg);
+		if (ret) {
+			pr_devel("Failed to wait for the async response\n");
+			ret = -EIO;
+			goto out;
+		}
+		ret = opal_error_code(opal_get_async_rc(msg));
+	} else {
+		ret = opal_error_code(ret);
+	}
+
+out:
+	opal_async_release_token(token);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sensor_group_enable);
+
 static ssize_t sg_store(struct kobject *kobj, struct kobj_attribute *attr,
 			const char *buf, size_t count)
 {
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index a8d9b40..8268a1e 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -327,3 +327,4 @@ OPAL_CALL(opal_npu_tl_set,			OPAL_NPU_TL_SET);
 OPAL_CALL(opal_pci_get_pbcq_tunnel_bar,		OPAL_PCI_GET_PBCQ_TUNNEL_BAR);
 OPAL_CALL(opal_pci_set_pbcq_tunnel_bar,		OPAL_PCI_SET_PBCQ_TUNNEL_BAR);
 OPAL_CALL(opal_sensor_read_u64,			OPAL_SENSOR_READ_U64);
+OPAL_CALL(opal_sensor_group_enable,		OPAL_SENSOR_GROUP_ENABLE);
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v4 2/3] hwmon: ibmpowernv: Add attributes to enable/disable sensor groups
From: Shilpasri G Bhat @ 2018-07-06 11:56 UTC (permalink / raw)
  To: linux, mpe; +Cc: linuxppc-dev, linux-hwmon, linux-kernel, ego, Shilpasri G Bhat
In-Reply-To: <1530878166-12589-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com>

On-Chip-Controller(OCC) is an embedded micro-processor in POWER9 chip
which measures various system and chip level sensors. These sensors
comprises of environmental sensors (like power, temperature, current
and voltage) and performance sensors (like utilization, frequency).
All these sensors are copied to main memory at a regular interval of
100ms. OCC provides a way to select a group of sensors that is copied
to the main memory to increase the update frequency of selected sensor
groups. When a sensor-group is disabled, OCC will not copy it to main
memory and those sensors read 0 values.

This patch provides support for enabling/disabling the sensor groups
like power, temperature, current and voltage. This patch adds new
per-senor sysfs attribute to disable and enable them.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
---
Changes from v3:
- Add 'enable' attribute for each sensor sub-group

 Documentation/hwmon/ibmpowernv |  43 +++++++-
 drivers/hwmon/ibmpowernv.c     | 225 +++++++++++++++++++++++++++++++++++------
 2 files changed, 234 insertions(+), 34 deletions(-)

diff --git a/Documentation/hwmon/ibmpowernv b/Documentation/hwmon/ibmpowernv
index 8826ba2..5646825 100644
--- a/Documentation/hwmon/ibmpowernv
+++ b/Documentation/hwmon/ibmpowernv
@@ -33,9 +33,48 @@ fanX_input		Measured RPM value.
 fanX_min		Threshold RPM for alert generation.
 fanX_fault		0: No fail condition
 			1: Failing fan
+
 tempX_input		Measured ambient temperature.
 tempX_max		Threshold ambient temperature for alert generation.
-inX_input		Measured power supply voltage
+tempX_highest		Historical maximum temperature
+tempX_lowest		Historical minimum temperature
+tempX_enable		Enable/disable all temperature sensors belonging to the
+			sub-group. In POWER9, this attribute corresponds to
+			each OCC. Using this attribute each OCC can be asked to
+			disable/enable all of its temperature sensors.
+			1: Enable
+			0: Disable
+
+inX_input		Measured power supply voltage (millivolt)
 inX_fault		0: No fail condition.
 			1: Failing power supply.
-power1_input		System power consumption (microWatt)
+inX_highest		Historical maximum voltage
+inX_lowest		Historical minimum voltage
+inX_enable		Enable/disable all voltage sensors belonging to the
+			sub-group. In POWER9, this attribute corresponds to
+			each OCC. Using this attribute each OCC can be asked to
+			disable/enable all of its voltage sensors.
+			1: Enable
+			0: Disable
+
+powerX_input		Power consumption (microWatt)
+powerX_input_highest	Historical maximum power
+powerX_input_lowest	Historical minimum power
+powerX_enable		Enable/disable all power sensors belonging to the
+			sub-group. In POWER9, this attribute corresponds to
+			each OCC. Using this attribute each OCC can be asked to
+			disable/enable all of its power sensors.
+			1: Enable
+			0: Disable
+
+currX_input		Measured current (milliampere)
+currX_highest		Historical maximum current
+currX_lowest		Historical minimum current
+currX_enable		Enable/disable all current sensors belonging to the
+			sub-group. In POWER9, this attribute corresponds to
+			each OCC. Using this attribute each OCC can be asked to
+			disable/enable all of its current sensors.
+			1: Enable
+			0: Disable
+
+energyX_input		Cumulative energy (microJoule)
diff --git a/drivers/hwmon/ibmpowernv.c b/drivers/hwmon/ibmpowernv.c
index f829dad..33cf7d2 100644
--- a/drivers/hwmon/ibmpowernv.c
+++ b/drivers/hwmon/ibmpowernv.c
@@ -90,11 +90,23 @@ struct sensor_data {
 	char label[MAX_LABEL_LEN];
 	char name[MAX_ATTR_LEN];
 	struct device_attribute dev_attr;
+	struct sensor_group_data *sgdata;
+};
+
+struct sensor_group_data {
+	struct mutex mutex;
+	const __be32 *phandles;
+	u32 gid;
+	u32 nr_phandle;
+	enum sensors type;
+	bool enable;
 };
 
 struct platform_data {
 	const struct attribute_group *attr_groups[MAX_SENSOR_TYPE + 1];
+	struct sensor_group_data *sg_data;
 	u32 sensors_count; /* Total count of sensors from each group */
+	u32 nr_sensor_groups; /* Total number of sensor sub-groups */
 };
 
 static ssize_t show_sensor(struct device *dev, struct device_attribute *devattr,
@@ -105,6 +117,9 @@ static ssize_t show_sensor(struct device *dev, struct device_attribute *devattr,
 	ssize_t ret;
 	u64 x;
 
+	if (sdata->sgdata && !sdata->sgdata->enable)
+		return -ENODATA;
+
 	ret =  opal_get_sensor_data_u64(sdata->id, &x);
 
 	if (ret)
@@ -120,6 +135,46 @@ static ssize_t show_sensor(struct device *dev, struct device_attribute *devattr,
 	return sprintf(buf, "%llu\n", x);
 }
 
+static ssize_t show_enable(struct device *dev,
+			   struct device_attribute *devattr, char *buf)
+{
+	struct sensor_data *sdata = container_of(devattr, struct sensor_data,
+						 dev_attr);
+
+	return sprintf(buf, "%u\n", sdata->sgdata->enable);
+}
+
+static ssize_t store_enable(struct device *dev,
+			    struct device_attribute *devattr,
+			    const char *buf, size_t count)
+{
+	struct sensor_data *sdata = container_of(devattr, struct sensor_data,
+						 dev_attr);
+	struct sensor_group_data *sg = sdata->sgdata;
+	bool data;
+	int ret;
+
+	ret = kstrtobool(buf, &data);
+	if (ret)
+		return ret;
+
+	ret = mutex_lock_interruptible(&sg->mutex);
+	if (ret)
+		return ret;
+
+	if (data != sg->enable) {
+		ret =  sensor_group_enable(sg->gid, data);
+		if (!ret)
+			sg->enable = data;
+	}
+
+	if (!ret)
+		ret = count;
+
+	mutex_unlock(&sg->mutex);
+	return ret;
+}
+
 static ssize_t show_label(struct device *dev, struct device_attribute *devattr,
 			  char *buf)
 {
@@ -292,12 +347,99 @@ static u32 get_sensor_hwmon_index(struct sensor_data *sdata,
 	return ++sensor_groups[sdata->type].hwmon_index;
 }
 
+static int init_sensor_group_data(struct platform_device *pdev,
+				  struct platform_data *pdata)
+{
+	struct sensor_group_data *sg_data;
+	struct device_node *groups, *sg;
+	enum sensors type;
+	int count = 0, ret = 0;
+
+	groups = of_find_node_by_path("/ibm,opal/sensor-groups");
+	if (!groups)
+		return ret;
+
+	for_each_child_of_node(groups, sg) {
+		type = get_sensor_type(sg);
+		if (type != MAX_SENSOR_TYPE)
+			pdata->nr_sensor_groups++;
+	}
+
+	if (!pdata->nr_sensor_groups)
+		goto out;
+
+	sg_data = devm_kcalloc(&pdev->dev, pdata->nr_sensor_groups,
+			       sizeof(*sg_data), GFP_KERNEL);
+	if (!sg_data) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	for_each_child_of_node(groups, sg) {
+		const __be32 *phandles;
+		int len, gid;
+
+		type = get_sensor_type(sg);
+		if (type == MAX_SENSOR_TYPE)
+			continue;
+
+		if (of_property_read_u32(sg, "sensor-group-id", &gid))
+			continue;
+
+		phandles = of_get_property(sg, "sensors", &len);
+		if (!phandles)
+			continue;
+
+		len /= sizeof(u32);
+		if (!len)
+			continue;
+
+		sensor_groups[type].attr_count++;
+		sg_data[count].gid = gid;
+		sg_data[count].type = type;
+		sg_data[count].nr_phandle = len;
+		sg_data[count].phandles = phandles;
+		mutex_init(&sg_data[count].mutex);
+		sg_data[count++].enable = false;
+	}
+	pdata->sg_data = sg_data;
+out:
+	of_node_put(groups);
+	return ret;
+}
+
+static struct sensor_group_data *get_sensor_group(struct platform_data *pdata,
+						  struct device_node *np,
+						  enum sensors type)
+{
+	struct sensor_group_data *sg_data = pdata->sg_data;
+	int i, j;
+
+	for (i = 0; i < pdata->nr_sensor_groups; i++) {
+		const __be32 *phandles = sg_data[i].phandles;
+
+		if (type != sg_data[i].type)
+			continue;
+
+		for (j = 0; j < sg_data[i].nr_phandle; j++)
+			if (be32_to_cpu(phandles[j]) == np->phandle)
+				return &sg_data[i];
+	}
+
+	return NULL;
+}
+
 static int populate_attr_groups(struct platform_device *pdev)
 {
 	struct platform_data *pdata = platform_get_drvdata(pdev);
 	const struct attribute_group **pgroups = pdata->attr_groups;
 	struct device_node *opal, *np;
 	enum sensors type;
+	int ret;
+
+	ret = init_sensor_group_data(pdev, pdata);
+	if (ret)
+		return ret;
 
 	opal = of_find_node_by_path("/ibm,opal/sensors");
 	for_each_child_of_node(opal, np) {
@@ -344,7 +486,10 @@ static int populate_attr_groups(struct platform_device *pdev)
 static void create_hwmon_attr(struct sensor_data *sdata, const char *attr_name,
 			      ssize_t (*show)(struct device *dev,
 					      struct device_attribute *attr,
-					      char *buf))
+					      char *buf),
+			    ssize_t (*store)(struct device *dev,
+					     struct device_attribute *attr,
+					     const char *buf, size_t count))
 {
 	snprintf(sdata->name, MAX_ATTR_LEN, "%s%d_%s",
 		 sensor_groups[sdata->type].name, sdata->hwmon_index,
@@ -352,23 +497,33 @@ static void create_hwmon_attr(struct sensor_data *sdata, const char *attr_name,
 
 	sysfs_attr_init(&sdata->dev_attr.attr);
 	sdata->dev_attr.attr.name = sdata->name;
-	sdata->dev_attr.attr.mode = S_IRUGO;
 	sdata->dev_attr.show = show;
+	if (store) {
+		sdata->dev_attr.store = store;
+		sdata->dev_attr.attr.mode = 0664;
+	} else {
+		sdata->dev_attr.attr.mode = 0444;
+	}
 }
 
 static void populate_sensor(struct sensor_data *sdata, int od, int hd, int sid,
 			    const char *attr_name, enum sensors type,
 			    const struct attribute_group *pgroup,
+			    struct sensor_group_data *sgdata,
 			    ssize_t (*show)(struct device *dev,
 					    struct device_attribute *attr,
-					    char *buf))
+					    char *buf),
+			    ssize_t (*store)(struct device *dev,
+					     struct device_attribute *attr,
+					     const char *buf, size_t count))
 {
 	sdata->id = sid;
 	sdata->type = type;
 	sdata->opal_index = od;
 	sdata->hwmon_index = hd;
-	create_hwmon_attr(sdata, attr_name, show);
+	create_hwmon_attr(sdata, attr_name, show, store);
 	pgroup->attrs[sensor_groups[type].attr_count++] = &sdata->dev_attr.attr;
+	sdata->sgdata = sgdata;
 }
 
 static char *get_max_attr(enum sensors type)
@@ -403,24 +558,23 @@ static int create_device_attrs(struct platform_device *pdev)
 	const struct attribute_group **pgroups = pdata->attr_groups;
 	struct device_node *opal, *np;
 	struct sensor_data *sdata;
-	u32 sensor_id;
-	enum sensors type;
 	u32 count = 0;
-	int err = 0;
+	u32 group_attr_id[MAX_SENSOR_TYPE] = {0};
 
-	opal = of_find_node_by_path("/ibm,opal/sensors");
 	sdata = devm_kcalloc(&pdev->dev,
 			     pdata->sensors_count, sizeof(*sdata),
 			     GFP_KERNEL);
-	if (!sdata) {
-		err = -ENOMEM;
-		goto exit_put_node;
-	}
+	if (!sdata)
+		return -ENOMEM;
 
+	opal = of_find_node_by_path("/ibm,opal/sensors");
 	for_each_child_of_node(opal, np) {
+		struct sensor_group_data *sgdata;
 		const char *attr_name;
-		u32 opal_index;
+		u32 opal_index, hw_id;
+		u32 sensor_id;
 		const char *label;
+		enum sensors type;
 
 		if (np->name == NULL)
 			continue;
@@ -456,14 +610,12 @@ static int create_device_attrs(struct platform_device *pdev)
 			opal_index = INVALID_INDEX;
 		}
 
-		sdata[count].opal_index = opal_index;
-		sdata[count].hwmon_index =
-			get_sensor_hwmon_index(&sdata[count], sdata, count);
-
-		create_hwmon_attr(&sdata[count], attr_name, show_sensor);
-
-		pgroups[type]->attrs[sensor_groups[type].attr_count++] =
-				&sdata[count++].dev_attr.attr;
+		hw_id = get_sensor_hwmon_index(&sdata[count], sdata, count);
+		sgdata = get_sensor_group(pdata, np, type);
+		populate_sensor(&sdata[count], opal_index, hw_id, sensor_id,
+				attr_name, type, pgroups[type], sgdata,
+				show_sensor, NULL);
+		count++;
 
 		if (!of_property_read_string(np, "label", &label)) {
 			/*
@@ -474,35 +626,43 @@ static int create_device_attrs(struct platform_device *pdev)
 			 */
 
 			make_sensor_label(np, &sdata[count], label);
-			populate_sensor(&sdata[count], opal_index,
-					sdata[count - 1].hwmon_index,
+			populate_sensor(&sdata[count], opal_index, hw_id,
 					sensor_id, "label", type, pgroups[type],
-					show_label);
+					NULL, show_label, NULL);
 			count++;
 		}
 
 		if (!of_property_read_u32(np, "sensor-data-max", &sensor_id)) {
 			attr_name = get_max_attr(type);
-			populate_sensor(&sdata[count], opal_index,
-					sdata[count - 1].hwmon_index,
+			populate_sensor(&sdata[count], opal_index, hw_id,
 					sensor_id, attr_name, type,
-					pgroups[type], show_sensor);
+					pgroups[type], sgdata, show_sensor,
+					NULL);
 			count++;
 		}
 
 		if (!of_property_read_u32(np, "sensor-data-min", &sensor_id)) {
 			attr_name = get_min_attr(type);
-			populate_sensor(&sdata[count], opal_index,
-					sdata[count - 1].hwmon_index,
+			populate_sensor(&sdata[count], opal_index, hw_id,
 					sensor_id, attr_name, type,
-					pgroups[type], show_sensor);
+					pgroups[type], sgdata, show_sensor,
+					NULL);
+			count++;
+		}
+
+		if (sgdata && !sgdata->enable) {
+			sgdata->enable = true;
+			hw_id = ++group_attr_id[type];
+			populate_sensor(&sdata[count], opal_index, hw_id,
+					sgdata->gid, "enable", type,
+					pgroups[type], sgdata, show_enable,
+					store_enable);
 			count++;
 		}
 	}
 
-exit_put_node:
 	of_node_put(opal);
-	return err;
+	return 0;
 }
 
 static int ibmpowernv_probe(struct platform_device *pdev)
@@ -517,6 +677,7 @@ static int ibmpowernv_probe(struct platform_device *pdev)
 
 	platform_set_drvdata(pdev, pdata);
 	pdata->sensors_count = 0;
+	pdata->nr_sensor_groups = 0;
 	err = populate_attr_groups(pdev);
 	if (err)
 		return err;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v4 3/3] hwmon: Document the sensor enable attribute
From: Shilpasri G Bhat @ 2018-07-06 11:56 UTC (permalink / raw)
  To: linux, mpe; +Cc: linuxppc-dev, linux-hwmon, linux-kernel, ego, Shilpasri G Bhat
In-Reply-To: <1530878166-12589-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com>

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
---
 Documentation/hwmon/sysfs-interface | 62 +++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/Documentation/hwmon/sysfs-interface b/Documentation/hwmon/sysfs-interface
index fc337c3..f865dd7 100644
--- a/Documentation/hwmon/sysfs-interface
+++ b/Documentation/hwmon/sysfs-interface
@@ -171,6 +171,15 @@ in[0-*]_label	Suggested voltage channel label.
 		user-space.
 		RO
 
+in[0-*]_enable
+		Enable or disable the sensors.
+		When disabled the sensor read will return -ENODATA. This
+		attribute can be used as per-sensor or per-sub-group attribute
+		depending on what is supported by the hardware.
+		1: Enable
+		0: Disable
+		RW
+
 cpu[0-*]_vid	CPU core reference voltage.
 		Unit: millivolt
 		RO
@@ -236,6 +245,15 @@ fan[1-*]_label	Suggested fan channel label.
 		In all other cases, the label is provided by user-space.
 		RO
 
+fan[1-*]_enable
+		Enable or disable the sensors.
+		When disabled the sensor read will return -ENODATA. This
+		attribute can be used as per-sensor or per-sub-group attribute
+		depending on what is supported by the hardware.
+		1: Enable
+		0: Disable
+		RW
+
 Also see the Alarms section for status flags associated with fans.
 
 
@@ -409,6 +427,15 @@ temp_reset_history
 		Reset temp_lowest and temp_highest for all sensors
 		WO
 
+temp[1-*]_enable
+		Enable or disable the sensors.
+		When disabled the sensor read will return -ENODATA. This
+		attribute can be used as per-sensor or per-sub-group attribute
+		depending on what is supported by the hardware.
+		1: Enable
+		0: Disable
+		RW
+
 Some chips measure temperature using external thermistors and an ADC, and
 report the temperature measurement as a voltage. Converting this voltage
 back to a temperature (or the other way around for limits) requires
@@ -468,6 +495,15 @@ curr_reset_history
 		Reset currX_lowest and currX_highest for all sensors
 		WO
 
+curr[1-*]_enable
+		Enable or disable the sensors.
+		When disabled the sensor read will return -ENODATA. This
+		attribute can be used as per-sensor or per-sub-group attribute
+		depending on what is supported by the hardware.
+		1: Enable
+		0: Disable
+		RW
+
 Also see the Alarms section for status flags associated with currents.
 
 *********
@@ -566,6 +602,15 @@ power[1-*]_crit			Critical maximum power.
 				Unit: microWatt
 				RW
 
+power[1-*]_enable		Enable or disable the sensors.
+				When disabled the sensor read will return
+				-ENODATA. This attribute can be used as
+				per-sensor or per-sub-group attribute depending
+				on what is supported by the hardware.
+				1: Enable
+				0: Disable
+				RW
+
 Also see the Alarms section for status flags associated with power readings.
 
 **********
@@ -576,6 +621,14 @@ energy[1-*]_input		Cumulative energy use
 				Unit: microJoule
 				RO
 
+energy[1-*]_enable		Enable or disable the sensors.
+				When disabled the sensor read will return
+				-ENODATA. This attribute can be used as
+				per-sensor or per-sub-group attribute depending
+				on what is supported by the hardware.
+				1: Enable
+				0: Disable
+				RW
 
 ************
 * Humidity *
@@ -586,6 +639,15 @@ humidity[1-*]_input		Humidity
 				RO
 
 
+humidity[1-*]_enable		Enable or disable the sensors
+				When disabled the sensor read will return
+				-ENODATA. This attribute can be used as
+				per-sensor or per-sub-group attribute depending
+				on what is supported by the hardware.
+				1: Enable
+				0: Disable
+				RW
+
 **********
 * Alarms *
 **********
-- 
1.8.3.1

^ permalink raw reply related

* RE: [PATCH 1/7 v5] Docs: dt: add fsl-mc iommu-map device-tree binding
From: Nipun Gupta @ 2018-07-06 12:10 UTC (permalink / raw)
  To: Robin Murphy, will.deacon@arm.com, robh+dt@kernel.org,
	robh@kernel.org, mark.rutland@arm.com, catalin.marinas@arm.com,
	gregkh@linuxfoundation.org, Laurentiu Tudor, bhelgaas@google.com
  Cc: hch@lst.de, joro@8bytes.org, m.szyprowski@samsung.com,
	shawnguo@kernel.org, frowand.list@gmail.com,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linuxppc-dev@lists.ozlabs.org, linux-pci@vger.kernel.org,
	Bharat Bhushan, stuyoder@gmail.com, Leo Li
In-Reply-To: <cd70dc56-ce6a-f6ef-1362-143d9c31bb46@arm.com>

DQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogUm9iaW4gTXVycGh5IFtt
YWlsdG86cm9iaW4ubXVycGh5QGFybS5jb21dDQo+IFNlbnQ6IFR1ZXNkYXksIEp1bHkgMywgMjAx
OCA4OjEwIFBNDQo+IFRvOiBOaXB1biBHdXB0YSA8bmlwdW4uZ3VwdGFAbnhwLmNvbT47IHdpbGwu
ZGVhY29uQGFybS5jb207DQo+IHJvYmgrZHRAa2VybmVsLm9yZzsgcm9iaEBrZXJuZWwub3JnOyBt
YXJrLnJ1dGxhbmRAYXJtLmNvbTsNCj4gY2F0YWxpbi5tYXJpbmFzQGFybS5jb207IGdyZWdraEBs
aW51eGZvdW5kYXRpb24ub3JnOyBMYXVyZW50aXUgVHVkb3INCj4gPGxhdXJlbnRpdS50dWRvckBu
eHAuY29tPjsgYmhlbGdhYXNAZ29vZ2xlLmNvbQ0KPiBDYzogaGNoQGxzdC5kZTsgam9yb0A4Ynl0
ZXMub3JnOyBtLnN6eXByb3dza2lAc2Ftc3VuZy5jb207DQo+IHNoYXduZ3VvQGtlcm5lbC5vcmc7
IGZyb3dhbmQubGlzdEBnbWFpbC5jb207IGlvbW11QGxpc3RzLmxpbnV4LQ0KPiBmb3VuZGF0aW9u
Lm9yZzsgbGludXgta2VybmVsQHZnZXIua2VybmVsLm9yZzsgZGV2aWNldHJlZUB2Z2VyLmtlcm5l
bC5vcmc7DQo+IGxpbnV4LWFybS1rZXJuZWxAbGlzdHMuaW5mcmFkZWFkLm9yZzsgbGludXhwcGMt
ZGV2QGxpc3RzLm96bGFicy5vcmc7IGxpbnV4LQ0KPiBwY2lAdmdlci5rZXJuZWwub3JnOyBCaGFy
YXQgQmh1c2hhbiA8YmhhcmF0LmJodXNoYW5AbnhwLmNvbT47DQo+IHN0dXlvZGVyQGdtYWlsLmNv
bTsgTGVvIExpIDxsZW95YW5nLmxpQG54cC5jb20+DQo+IFN1YmplY3Q6IFJlOiBbUEFUQ0ggMS83
IHY1XSBEb2NzOiBkdDogYWRkIGZzbC1tYyBpb21tdS1tYXAgZGV2aWNlLXRyZWUNCj4gYmluZGlu
Zw0KPiANCj4gT24gMjAvMDUvMTggMTQ6NDksIE5pcHVuIEd1cHRhIHdyb3RlOg0KPiA+IFRoZSBl
eGlzdGluZyBJT01NVSBiaW5kaW5ncyBjYW5ub3QgYmUgdXNlZCB0byBzcGVjaWZ5IHRoZSByZWxh
dGlvbnNoaXANCj4gPiBiZXR3ZWVuIGZzbC1tYyBkZXZpY2VzIGFuZCBJT01NVXMuIFRoaXMgcGF0
Y2ggYWRkcyBhIGdlbmVyaWMgYmluZGluZyBmb3INCj4gPiBtYXBwaW5nIGZzbC1tYyBkZXZpY2Vz
IHRvIElPTU1VcywgdXNpbmcgaW9tbXUtbWFwIHByb3BlcnR5Lg0KPiA+DQo+ID4gU2lnbmVkLW9m
Zi1ieTogTmlwdW4gR3VwdGEgPG5pcHVuLmd1cHRhQG54cC5jb20+DQo+ID4gUmV2aWV3ZWQtYnk6
IFJvYiBIZXJyaW5nIDxyb2JoQGtlcm5lbC5vcmc+DQo+ID4gLS0tDQo+ID4gICAuLi4vZGV2aWNl
dHJlZS9iaW5kaW5ncy9taXNjL2ZzbCxxb3JpcS1tYy50eHQgICAgICB8IDM5DQo+ICsrKysrKysr
KysrKysrKysrKysrKysNCj4gPiAgIDEgZmlsZSBjaGFuZ2VkLCAzOSBpbnNlcnRpb25zKCspDQo+
ID4NCj4gPiBkaWZmIC0tZ2l0IGEvRG9jdW1lbnRhdGlvbi9kZXZpY2V0cmVlL2JpbmRpbmdzL21p
c2MvZnNsLHFvcmlxLW1jLnR4dA0KPiBiL0RvY3VtZW50YXRpb24vZGV2aWNldHJlZS9iaW5kaW5n
cy9taXNjL2ZzbCxxb3JpcS1tYy50eHQNCj4gPiBpbmRleCA2NjExYTdjLi44Y2JlZDRmIDEwMDY0
NA0KPiA+IC0tLSBhL0RvY3VtZW50YXRpb24vZGV2aWNldHJlZS9iaW5kaW5ncy9taXNjL2ZzbCxx
b3JpcS1tYy50eHQNCj4gPiArKysgYi9Eb2N1bWVudGF0aW9uL2RldmljZXRyZWUvYmluZGluZ3Mv
bWlzYy9mc2wscW9yaXEtbWMudHh0DQo+ID4gQEAgLTksNiArOSwyNSBAQCBibG9ja3MgdGhhdCBj
YW4gYmUgdXNlZCB0byBjcmVhdGUgZnVuY3Rpb25hbCBoYXJkd2FyZQ0KPiBvYmplY3RzL2Rldmlj
ZXMNCj4gPiAgIHN1Y2ggYXMgbmV0d29yayBpbnRlcmZhY2VzLCBjcnlwdG8gYWNjZWxlcmF0b3Ig
aW5zdGFuY2VzLCBMMiBzd2l0Y2hlcywNCj4gPiAgIGV0Yy4NCj4gPg0KPiA+ICtGb3IgYW4gb3Zl
cnZpZXcgb2YgdGhlIERQQUEyIGFyY2hpdGVjdHVyZSBhbmQgZnNsLW1jIGJ1cyBzZWU6DQo+ID4g
K2RyaXZlcnMvc3RhZ2luZy9mc2wtbWMvUkVBRE1FLnR4dA0KPiANCj4gTml0OiBMb29rcyBsaWtl
IHRoYXQncyBEb2N1bWVudGF0aW9uL25ldHdvcmtpbmcvZHBhYTIvb3ZlcnZpZXcucnN0IG5vdy4N
Cj4gDQo+ID4gKw0KPiA+ICtBcyBkZXNjcmliZWQgaW4gdGhlIGFib3ZlIG92ZXJ2aWV3LCBhbGwg
RFBBQTIgb2JqZWN0cyBpbiBhIERQUkMgc2hhcmUgdGhlDQo+ID4gK3NhbWUgaGFyZHdhcmUgImlz
b2xhdGlvbiBjb250ZXh0IiBhbmQgYSAxMC1iaXQgdmFsdWUgY2FsbGVkIGFuIElDSUQNCj4gPiAr
KGlzb2xhdGlvbiBjb250ZXh0IGlkKSBpcyBleHByZXNzZWQgYnkgdGhlIGhhcmR3YXJlIHRvIGlk
ZW50aWZ5DQo+ID4gK3RoZSByZXF1ZXN0ZXIuDQo+ID4gKw0KPiA+ICtUaGUgZ2VuZXJpYyAnaW9t
bXVzJyBwcm9wZXJ0eSBpcyBpbnN1ZmZpY2llbnQgdG8gZGVzY3JpYmUgdGhlIHJlbGF0aW9uc2hp
cA0KPiA+ICtiZXR3ZWVuIElDSURzIGFuZCBJT01NVXMsIHNvIGFuIGlvbW11LW1hcCBwcm9wZXJ0
eSBpcyB1c2VkIHRvDQo+IGRlZmluZQ0KPiA+ICt0aGUgc2V0IG9mIHBvc3NpYmxlIElDSURzIHVu
ZGVyIGEgcm9vdCBEUFJDIGFuZCBob3cgdGhleSBtYXAgdG8NCj4gPiArYW4gSU9NTVUuDQo+ID4g
Kw0KPiA+ICtGb3IgZ2VuZXJpYyBJT01NVSBiaW5kaW5ncywgc2VlDQo+ID4gK0RvY3VtZW50YXRp
b24vZGV2aWNldHJlZS9iaW5kaW5ncy9pb21tdS9pb21tdS50eHQuDQo+ID4gKw0KPiA+ICtGb3Ig
YXJtLXNtbXUgYmluZGluZywgc2VlOg0KPiA+ICtEb2N1bWVudGF0aW9uL2RldmljZXRyZWUvYmlu
ZGluZ3MvaW9tbXUvYXJtLHNtbXUudHh0Lg0KPiA+ICsNCj4gPiAgIFJlcXVpcmVkIHByb3BlcnRp
ZXM6DQo+ID4NCj4gPiAgICAgICAtIGNvbXBhdGlibGUNCj4gPiBAQCAtODgsMTQgKzEwNywzNCBA
QCBTdWItbm9kZXM6DQo+ID4gICAgICAgICAgICAgICAgIFZhbHVlIHR5cGU6IDxwaGFuZGxlPg0K
PiA+ICAgICAgICAgICAgICAgICBEZWZpbml0aW9uOiBTcGVjaWZpZXMgdGhlIHBoYW5kbGUgdG8g
dGhlIFBIWSBkZXZpY2Ugbm9kZSBhc3NvY2lhdGVkDQo+ID4gICAgICAgICAgICAgICAgICAgICAg
ICAgICAgIHdpdGggdGhlIHRoaXMgZHBtYWMuDQo+ID4gK09wdGlvbmFsIHByb3BlcnRpZXM6DQo+
ID4gKw0KPiA+ICstIGlvbW11LW1hcDogTWFwcyBhbiBJQ0lEIHRvIGFuIElPTU1VIGFuZCBhc3Nv
Y2lhdGVkIGlvbW11LQ0KPiBzcGVjaWZpZXINCj4gPiArICBkYXRhLg0KPiA+ICsNCj4gPiArICBU
aGUgcHJvcGVydHkgaXMgYW4gYXJiaXRyYXJ5IG51bWJlciBvZiB0dXBsZXMgb2YNCj4gPiArICAo
aWNpZC1iYXNlLGlvbW11LGlvbW11LWJhc2UsbGVuZ3RoKS4NCj4gPiArDQo+ID4gKyAgQW55IElD
SUQgaSBpbiB0aGUgaW50ZXJ2YWwgW2ljaWQtYmFzZSwgaWNpZC1iYXNlICsgbGVuZ3RoKSBpcw0K
PiA+ICsgIGFzc29jaWF0ZWQgd2l0aCB0aGUgbGlzdGVkIElPTU1VLCB3aXRoIHRoZSBpb21tdS1z
cGVjaWZpZXINCj4gPiArICAoaSAtIGljaWQtYmFzZSArIGlvbW11LWJhc2UpLg0KPiA+DQo+ID4g
ICBFeGFtcGxlOg0KPiA+DQo+ID4gKyAgICAgICAgc21tdTogaW9tbXVANTAwMDAwMCB7DQo+ID4g
KyAgICAgICAgICAgICAgIGNvbXBhdGlibGUgPSAiYXJtLG1tdS01MDAiOw0KPiA+ICsgICAgICAg
ICAgICAgICAjaW9tbXUtY2VsbHMgPSA8Mj47DQo+IA0KPiBUaGlzIHNob3VsZCBiZSAxIGlmIHN0
cmVhbS1tYXRjaC1tYXNrIGlzIHByZXNlbnQuIEJhZCBleGFtcGxlIGlzIGJhZCA6KQ0KDQpBZ3Jl
ZSA6KS4gSWxsIHVwZGF0ZSBpdC4NCg0KPiANCj4gUm9iaW4uDQo+IA0KPiA+ICsgICAgICAgICAg
ICAgICBzdHJlYW0tbWF0Y2gtbWFzayA9IDwweDdDMDA+Ow0KPiA+ICsgICAgICAgICAgICAgICAu
Li4NCj4gPiArICAgICAgICB9Ow0KPiA+ICsNCj4gPiAgICAgICAgICAgZnNsX21jOiBmc2wtbWNA
ODBjMDAwMDAwIHsNCj4gPiAgICAgICAgICAgICAgICAgICBjb21wYXRpYmxlID0gImZzbCxxb3Jp
cS1tYyI7DQo+ID4gICAgICAgICAgICAgICAgICAgcmVnID0gPDB4MDAwMDAwMDggMHgwYzAwMDAw
MCAwIDB4NDA+LCAgICAvKiBNQyBwb3J0YWwgYmFzZSAqLw0KPiA+ICAgICAgICAgICAgICAgICAg
ICAgICAgIDwweDAwMDAwMDAwIDB4MDgzNDAwMDAgMCAweDQwMDAwPjsgLyogTUMgY29udHJvbCBy
ZWcgKi8NCj4gPiAgICAgICAgICAgICAgICAgICBtc2ktcGFyZW50ID0gPCZpdHM+Ow0KPiA+ICsg
ICAgICAgICAgICAgICAgLyogZGVmaW5lIG1hcCBmb3IgSUNJRHMgMjMtNjQgKi8NCj4gPiArICAg
ICAgICAgICAgICAgIGlvbW11LW1hcCA9IDwyMyAmc21tdSAyMyA0MT47DQo+ID4gICAgICAgICAg
ICAgICAgICAgI2FkZHJlc3MtY2VsbHMgPSA8Mz47DQo+ID4gICAgICAgICAgICAgICAgICAgI3Np
emUtY2VsbHMgPSA8MT47DQo+ID4NCj4gPg0K

^ permalink raw reply

* RE: [PATCH 5/7 v5] bus: fsl-mc: support dma configure for devices on fsl-mc bus
From: Nipun Gupta @ 2018-07-06 12:17 UTC (permalink / raw)
  To: Robin Murphy, will.deacon@arm.com, robh+dt@kernel.org,
	robh@kernel.org, mark.rutland@arm.com, catalin.marinas@arm.com,
	gregkh@linuxfoundation.org, Laurentiu Tudor, bhelgaas@google.com
  Cc: hch@lst.de, joro@8bytes.org, m.szyprowski@samsung.com,
	shawnguo@kernel.org, frowand.list@gmail.com,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linuxppc-dev@lists.ozlabs.org, linux-pci@vger.kernel.org,
	Bharat Bhushan, stuyoder@gmail.com, Leo Li
In-Reply-To: <b4cf1e86-0297-97ed-d10e-9b430ecb0bc7@arm.com>

DQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogUm9iaW4gTXVycGh5IFtt
YWlsdG86cm9iaW4ubXVycGh5QGFybS5jb21dDQo+IFNlbnQ6IFR1ZXNkYXksIEp1bHkgMywgMjAx
OCA5OjQ0IFBNDQo+IFRvOiBOaXB1biBHdXB0YSA8bmlwdW4uZ3VwdGFAbnhwLmNvbT47IHdpbGwu
ZGVhY29uQGFybS5jb207DQo+IHJvYmgrZHRAa2VybmVsLm9yZzsgcm9iaEBrZXJuZWwub3JnOyBt
YXJrLnJ1dGxhbmRAYXJtLmNvbTsNCj4gY2F0YWxpbi5tYXJpbmFzQGFybS5jb207IGdyZWdraEBs
aW51eGZvdW5kYXRpb24ub3JnOyBMYXVyZW50aXUgVHVkb3INCj4gPGxhdXJlbnRpdS50dWRvckBu
eHAuY29tPjsgYmhlbGdhYXNAZ29vZ2xlLmNvbQ0KPiBDYzogaGNoQGxzdC5kZTsgam9yb0A4Ynl0
ZXMub3JnOyBtLnN6eXByb3dza2lAc2Ftc3VuZy5jb207DQo+IHNoYXduZ3VvQGtlcm5lbC5vcmc7
IGZyb3dhbmQubGlzdEBnbWFpbC5jb207IGlvbW11QGxpc3RzLmxpbnV4LQ0KPiBmb3VuZGF0aW9u
Lm9yZzsgbGludXgta2VybmVsQHZnZXIua2VybmVsLm9yZzsgZGV2aWNldHJlZUB2Z2VyLmtlcm5l
bC5vcmc7DQo+IGxpbnV4LWFybS1rZXJuZWxAbGlzdHMuaW5mcmFkZWFkLm9yZzsgbGludXhwcGMt
ZGV2QGxpc3RzLm96bGFicy5vcmc7IGxpbnV4LQ0KPiBwY2lAdmdlci5rZXJuZWwub3JnOyBCaGFy
YXQgQmh1c2hhbiA8YmhhcmF0LmJodXNoYW5AbnhwLmNvbT47DQo+IHN0dXlvZGVyQGdtYWlsLmNv
bTsgTGVvIExpIDxsZW95YW5nLmxpQG54cC5jb20+DQo+IFN1YmplY3Q6IFJlOiBbUEFUQ0ggNS83
IHY1XSBidXM6IGZzbC1tYzogc3VwcG9ydCBkbWEgY29uZmlndXJlIGZvciBkZXZpY2VzDQo+IG9u
IGZzbC1tYyBidXMNCj4gDQo+IE9uIDIwLzA1LzE4IDE0OjQ5LCBOaXB1biBHdXB0YSB3cm90ZToN
Cj4gPiBUaGlzIHBhdGNoIGFkZHMgc3VwcG9ydCBvZiBkbWEgY29uZmlndXJhdGlvbiBmb3IgZGV2
aWNlcyBvbiBmc2wtbWMNCj4gPiBidXMgdXNpbmcgJ2RtYV9jb25maWd1cmUnIGNhbGxiYWNrIGZv
ciBidXNzZXMuIEFsc28sIGRpcmVjdGx5IGNhbGxpbmcNCj4gPiBhcmNoX3NldHVwX2RtYV9vcHMg
aXMgcmVtb3ZlZCBmcm9tIHRoZSBmc2wtbWMgYnVzLg0KPiANCj4gTG9va3MgbGlrZSB0aGlzIGlz
IHRoZSBmaW5hbCBhcmNoX3NldHVwX2RtYV9vcHMgb2ZmZW5kZXIsIHlheSENCj4gDQo+ID4gU2ln
bmVkLW9mZi1ieTogTmlwdW4gR3VwdGEgPG5pcHVuLmd1cHRhQG54cC5jb20+DQo+ID4gUmV2aWV3
ZWQtYnk6IExhdXJlbnRpdSBUdWRvciA8bGF1cmVudGl1LnR1ZG9yQG54cC5jb20+DQo+ID4gLS0t
DQo+ID4gICBkcml2ZXJzL2J1cy9mc2wtbWMvZnNsLW1jLWJ1cy5jIHwgMTUgKysrKysrKysrKyst
LS0tDQo+ID4gICAxIGZpbGUgY2hhbmdlZCwgMTEgaW5zZXJ0aW9ucygrKSwgNCBkZWxldGlvbnMo
LSkNCj4gPg0KPiA+IGRpZmYgLS1naXQgYS9kcml2ZXJzL2J1cy9mc2wtbWMvZnNsLW1jLWJ1cy5j
IGIvZHJpdmVycy9idXMvZnNsLW1jL2ZzbC1tYy0NCj4gYnVzLmMNCj4gPiBpbmRleCA1ZDgyNjZj
Li5mYTQzYzdkIDEwMDY0NA0KPiA+IC0tLSBhL2RyaXZlcnMvYnVzL2ZzbC1tYy9mc2wtbWMtYnVz
LmMNCj4gPiArKysgYi9kcml2ZXJzL2J1cy9mc2wtbWMvZnNsLW1jLWJ1cy5jDQo+ID4gQEAgLTEy
Nyw2ICsxMjcsMTYgQEAgc3RhdGljIGludCBmc2xfbWNfYnVzX3VldmVudChzdHJ1Y3QgZGV2aWNl
ICpkZXYsDQo+IHN0cnVjdCBrb2JqX3VldmVudF9lbnYgKmVudikNCj4gPiAgIAlyZXR1cm4gMDsN
Cj4gPiAgIH0NCj4gPg0KPiA+ICtzdGF0aWMgaW50IGZzbF9tY19kbWFfY29uZmlndXJlKHN0cnVj
dCBkZXZpY2UgKmRldikNCj4gPiArew0KPiA+ICsJc3RydWN0IGRldmljZSAqZG1hX2RldiA9IGRl
djsNCj4gPiArDQo+ID4gKwl3aGlsZSAoZGV2X2lzX2ZzbF9tYyhkbWFfZGV2KSkNCj4gPiArCQlk
bWFfZGV2ID0gZG1hX2Rldi0+cGFyZW50Ow0KPiA+ICsNCj4gPiArCXJldHVybiBvZl9kbWFfY29u
ZmlndXJlKGRldiwgZG1hX2Rldi0+b2Zfbm9kZSwgMCk7DQo+ID4gK30NCj4gPiArDQo+ID4gICBz
dGF0aWMgc3NpemVfdCBtb2RhbGlhc19zaG93KHN0cnVjdCBkZXZpY2UgKmRldiwgc3RydWN0IGRl
dmljZV9hdHRyaWJ1dGUNCj4gKmF0dHIsDQo+ID4gICAJCQkgICAgIGNoYXIgKmJ1ZikNCj4gPiAg
IHsNCj4gPiBAQCAtMTQ4LDYgKzE1OCw3IEBAIHN0cnVjdCBidXNfdHlwZSBmc2xfbWNfYnVzX3R5
cGUgPSB7DQo+ID4gICAJLm5hbWUgPSAiZnNsLW1jIiwNCj4gPiAgIAkubWF0Y2ggPSBmc2xfbWNf
YnVzX21hdGNoLA0KPiA+ICAgCS51ZXZlbnQgPSBmc2xfbWNfYnVzX3VldmVudCwNCj4gPiArCS5k
bWFfY29uZmlndXJlICA9IGZzbF9tY19kbWFfY29uZmlndXJlLA0KPiA+ICAgCS5kZXZfZ3JvdXBz
ID0gZnNsX21jX2Rldl9ncm91cHMsDQo+ID4gICB9Ow0KPiA+ICAgRVhQT1JUX1NZTUJPTF9HUEwo
ZnNsX21jX2J1c190eXBlKTsNCj4gPiBAQCAtNjMzLDEwICs2NDQsNiBAQCBpbnQgZnNsX21jX2Rl
dmljZV9hZGQoc3RydWN0IGZzbF9tY19vYmpfZGVzYw0KPiAqb2JqX2Rlc2MsDQo+ID4gICAJCQln
b3RvIGVycm9yX2NsZWFudXBfZGV2Ow0KPiA+ICAgCX0NCj4gPg0KPiA+IC0JLyogT2JqZWN0cyBh
cmUgY29oZXJlbnQsIHVubGVzcyAnbm8gc2hhcmVhYmlsaXR5JyBmbGFnIHNldC4gKi8NCj4gPiAt
CWlmICghKG9ial9kZXNjLT5mbGFncyAmDQo+IEZTTF9NQ19PQkpfRkxBR19OT19NRU1fU0hBUkVB
QklMSVRZKSkNCj4gDQo+IEFsdGhvdWdoIGl0IHNlZW1zIHdlIGRvIGVuZCB1cCB3aXRob3V0IGFu
eSBoYW5kbGluZyBvZiB0aGlzDQo+ICJub24tY29oZXJlbnQgb2JqZWN0IGJlaGluZCBjb2hlcmVu
dCBNQyIgY2FzZSwgYW5kIEknbSBub3Qgc3VyZSBob3cNCj4gZWFzaWx5IHRoYXQgY291bGQgYmUg
YWNjb21tb2RhdGVkIGJ5IGdlbmVyaWMgY29kZS4uLiA6LyBIb3cgaW1wb3J0YW50IGlzDQo+IHRo
ZSBxdWlyaz8NCg0KV2UgaGF2ZSBhbGwgdGhlIGRldmljZXMgYXMgY29oZXJlbnQgaW4gb3VyIFNv
QydzIG5vdyA6KSBTbyB0aGlzIGlzIGZpbmUuDQpXZSBoYXZlIGludGVybmFsbHkgZGlzY3Vzc2Vk
IGl0Lg0KDQpSZWdhcmRzLA0KTmlwdW4NCg0KPiANCj4gUm9iaW4uDQo+IA0KPiA+IC0JCWFyY2hf
c2V0dXBfZG1hX29wcygmbWNfZGV2LT5kZXYsIDAsIDAsIE5VTEwsIHRydWUpOw0KPiA+IC0NCj4g
PiAgIAkvKg0KPiA+ICAgCSAqIFRoZSBkZXZpY2Utc3BlY2lmaWMgcHJvYmUgY2FsbGJhY2sgd2ls
bCBnZXQgaW52b2tlZCBieSBkZXZpY2VfYWRkKCkNCj4gPiAgIAkgKi8NCj4gPg0K

^ permalink raw reply

* RE: [PATCH 7/7 v5] arm64: dts: ls208xa: comply with the iommu map binding for fsl_mc
From: Nipun Gupta @ 2018-07-06 12:18 UTC (permalink / raw)
  To: Robin Murphy, will.deacon@arm.com, robh+dt@kernel.org,
	robh@kernel.org, mark.rutland@arm.com, catalin.marinas@arm.com,
	gregkh@linuxfoundation.org, Laurentiu Tudor, bhelgaas@google.com
  Cc: hch@lst.de, joro@8bytes.org, m.szyprowski@samsung.com,
	shawnguo@kernel.org, frowand.list@gmail.com,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linuxppc-dev@lists.ozlabs.org, linux-pci@vger.kernel.org,
	Bharat Bhushan, stuyoder@gmail.com, Leo Li
In-Reply-To: <88626adc-6f5b-86df-88a0-620d258f8053@arm.com>

DQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogUm9iaW4gTXVycGh5IFtt
YWlsdG86cm9iaW4ubXVycGh5QGFybS5jb21dDQo+IFNlbnQ6IFR1ZXNkYXksIEp1bHkgMywgMjAx
OCAxMDowNiBQTQ0KPiBUbzogTmlwdW4gR3VwdGEgPG5pcHVuLmd1cHRhQG54cC5jb20+OyB3aWxs
LmRlYWNvbkBhcm0uY29tOw0KPiByb2JoK2R0QGtlcm5lbC5vcmc7IHJvYmhAa2VybmVsLm9yZzsg
bWFyay5ydXRsYW5kQGFybS5jb207DQo+IGNhdGFsaW4ubWFyaW5hc0Bhcm0uY29tOyBncmVna2hA
bGludXhmb3VuZGF0aW9uLm9yZzsgTGF1cmVudGl1IFR1ZG9yDQo+IDxsYXVyZW50aXUudHVkb3JA
bnhwLmNvbT47IGJoZWxnYWFzQGdvb2dsZS5jb20NCj4gQ2M6IGhjaEBsc3QuZGU7IGpvcm9AOGJ5
dGVzLm9yZzsgbS5zenlwcm93c2tpQHNhbXN1bmcuY29tOw0KPiBzaGF3bmd1b0BrZXJuZWwub3Jn
OyBmcm93YW5kLmxpc3RAZ21haWwuY29tOyBpb21tdUBsaXN0cy5saW51eC0NCj4gZm91bmRhdGlv
bi5vcmc7IGxpbnV4LWtlcm5lbEB2Z2VyLmtlcm5lbC5vcmc7IGRldmljZXRyZWVAdmdlci5rZXJu
ZWwub3JnOw0KPiBsaW51eC1hcm0ta2VybmVsQGxpc3RzLmluZnJhZGVhZC5vcmc7IGxpbnV4cHBj
LWRldkBsaXN0cy5vemxhYnMub3JnOyBsaW51eC0NCj4gcGNpQHZnZXIua2VybmVsLm9yZzsgQmhh
cmF0IEJodXNoYW4gPGJoYXJhdC5iaHVzaGFuQG54cC5jb20+Ow0KPiBzdHV5b2RlckBnbWFpbC5j
b207IExlbyBMaSA8bGVveWFuZy5saUBueHAuY29tPg0KPiBTdWJqZWN0OiBSZTogW1BBVENIIDcv
NyB2NV0gYXJtNjQ6IGR0czogbHMyMDh4YTogY29tcGx5IHdpdGggdGhlIGlvbW11IG1hcA0KPiBi
aW5kaW5nIGZvciBmc2xfbWMNCj4gDQo+IE9uIDIwLzA1LzE4IDE0OjQ5LCBOaXB1biBHdXB0YSB3
cm90ZToNCj4gPiBmc2wtbWMgYnVzIHN1cHBvcnQgdGhlIG5ldyBpb21tdS1tYXAgcHJvcGVydHku
IENvbXBseSB0byB0aGlzIGJpbmRpbmcNCj4gPiBmb3IgZnNsX21jIGJ1cy4NCj4gPg0KPiA+IFNp
Z25lZC1vZmYtYnk6IE5pcHVuIEd1cHRhIDxuaXB1bi5ndXB0YUBueHAuY29tPg0KPiA+IFJldmll
d2VkLWJ5OiBMYXVyZW50aXUgVHVkb3IgPGxhdXJlbnRpdS50dWRvckBueHAuY29tPg0KPiA+IC0t
LQ0KPiA+ICAgYXJjaC9hcm02NC9ib290L2R0cy9mcmVlc2NhbGUvZnNsLWxzMjA4eGEuZHRzaSB8
IDYgKysrKystDQo+ID4gICAxIGZpbGUgY2hhbmdlZCwgNSBpbnNlcnRpb25zKCspLCAxIGRlbGV0
aW9uKC0pDQo+ID4NCj4gPiBkaWZmIC0tZ2l0IGEvYXJjaC9hcm02NC9ib290L2R0cy9mcmVlc2Nh
bGUvZnNsLWxzMjA4eGEuZHRzaQ0KPiBiL2FyY2gvYXJtNjQvYm9vdC9kdHMvZnJlZXNjYWxlL2Zz
bC1sczIwOHhhLmR0c2kNCj4gPiBpbmRleCAxMzdlZjRkLi42MDEwNTA1IDEwMDY0NA0KPiA+IC0t
LSBhL2FyY2gvYXJtNjQvYm9vdC9kdHMvZnJlZXNjYWxlL2ZzbC1sczIwOHhhLmR0c2kNCj4gPiAr
KysgYi9hcmNoL2FybTY0L2Jvb3QvZHRzL2ZyZWVzY2FsZS9mc2wtbHMyMDh4YS5kdHNpDQo+ID4g
QEAgLTE4NCw2ICsxODQsNyBAQA0KPiA+ICAgCQkjYWRkcmVzcy1jZWxscyA9IDwyPjsNCj4gPiAg
IAkJI3NpemUtY2VsbHMgPSA8Mj47DQo+ID4gICAJCXJhbmdlczsNCj4gPiArCQlkbWEtcmFuZ2Vz
ID0gPDB4MCAweDAgMHgwIDB4MCAweDEwMDAwIDB4MDAwMDAwMDA+Ow0KPiA+DQo+ID4gICAJCWNs
b2NrZ2VuOiBjbG9ja2luZ0AxMzAwMDAwIHsNCj4gPiAgIAkJCWNvbXBhdGlibGUgPSAiZnNsLGxz
MjA4MGEtY2xvY2tnZW4iOw0KPiA+IEBAIC0zNTcsNiArMzU4LDggQEANCj4gPiAgIAkJCXJlZyA9
IDwweDAwMDAwMDA4IDB4MGMwMDAwMDAgMCAweDQwPiwJIC8qIE1DIHBvcnRhbA0KPiBiYXNlICov
DQo+ID4gICAJCQkgICAgICA8MHgwMDAwMDAwMCAweDA4MzQwMDAwIDAgMHg0MDAwMD47IC8qIE1D
DQo+IGNvbnRyb2wgcmVnICovDQo+ID4gICAJCQltc2ktcGFyZW50ID0gPCZpdHM+Ow0KPiA+ICsJ
CQlpb21tdS1tYXAgPSA8MCAmc21tdSAwIDA+OwkvKiBUaGlzIGlzIGZpeGVkLXVwIGJ5DQo+IHUt
Ym9vdCAqLw0KPiA+ICsJCQlkbWEtY29oZXJlbnQ7DQo+ID4gICAJCQkjYWRkcmVzcy1jZWxscyA9
IDwzPjsNCj4gPiAgIAkJCSNzaXplLWNlbGxzID0gPDE+Ow0KPiA+DQo+ID4gQEAgLTQ2MCw2ICs0
NjMsOCBAQA0KPiA+ICAgCQkJY29tcGF0aWJsZSA9ICJhcm0sbW11LTUwMCI7DQo+ID4gICAJCQly
ZWcgPSA8MCAweDUwMDAwMDAgMCAweDgwMDAwMD47DQo+ID4gICAJCQkjZ2xvYmFsLWludGVycnVw
dHMgPSA8MTI+Ow0KPiA+ICsJCQkjaW9tbXUtY2VsbHMgPSA8MT47DQo+ID4gKwkJCXN0cmVhbS1t
YXRjaC1tYXNrID0gPDB4N0MwMD47DQo+ID4gICAJCQlpbnRlcnJ1cHRzID0gPDAgMTMgND4sIC8q
IGdsb2JhbCBzZWN1cmUgZmF1bHQgKi8NCj4gPiAgIAkJCQkgICAgIDwwIDE0IDQ+LCAvKiBjb21i
aW5lZCBzZWN1cmUgaW50ZXJydXB0ICovDQo+ID4gICAJCQkJICAgICA8MCAxNSA0PiwgLyogZ2xv
YmFsIG5vbi1zZWN1cmUgZmF1bHQgKi8NCj4gPiBAQCAtNTAyLDcgKzUwNyw2IEBADQo+ID4gICAJ
CQkJICAgICA8MCAyMDQgND4sIDwwIDIwNSA0PiwNCj4gPiAgIAkJCQkgICAgIDwwIDIwNiA0Piwg
PDAgMjA3IDQ+LA0KPiA+ICAgCQkJCSAgICAgPDAgMjA4IDQ+LCA8MCAyMDkgND47DQo+ID4gLQkJ
CW1tdS1tYXN0ZXJzID0gPCZmc2xfbWMgMHgzMDAgMD47DQo+IA0KPiBTaW5jZSB3ZSdyZSBpbiBo
ZXJlLCBpcyB0aGUgU01NVSBpdHNlbGYgYWxzbyBjb2hlcmVudD8gSWYgaXQgaXMsIHlvdQ0KPiBw
cm9iYWJseSB3YW50IHRvIHNheSBzbyBhbmQgYXZvaWQgdGhlIG92ZXJoZWFkIG9mIHBvaW50bGVz
c2x5IGNsZWFuaW5nDQo+IGNhY2hlIGxpbmVzIG9uIGV2ZXJ5IHBhZ2UgdGFibGUgdXBkYXRlLg0K
DQpZZXMsIGRtYS1jb2hlcmVudCBwcm9wZXJ0eSBpcyBhbHNvIHJlcXVpcmVkIGhlcmUuIEkgbWlz
c2VkIGl0IHNvbWVob3cuDQpUaGFua3MgZm9yIHBvaW50aW5nIHRoaXMuDQoNClJlZ2FyZHMsDQpO
aXB1bg0KDQo+IA0KPiBSb2Jpbi4NCj4gDQo+ID4gICAJCX07DQo+ID4NCj4gPiAgIAkJZHNwaTog
ZHNwaUAyMTAwMDAwIHsNCj4gPg0K

^ permalink raw reply

* Re: [PATCHv3 0/4] drivers/base: bugfix for supplier<-consumer ordering in device_kset
From: Pingfan Liu @ 2018-07-06 13:52 UTC (permalink / raw)
  To: lukas
  Cc: rafael, linux-kernel, Greg Kroah-Hartman, Grygorii Strashko,
	Christoph Hellwig, Bjorn Helgaas, Dave Young, linux-pci,
	linuxppc-dev, linux-pm, kishon
In-Reply-To: <20180706083603.GA9063@wunner.de>

On Fri, Jul 6, 2018 at 4:36 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> [cc += Kishon Vijay Abraham]
>
> On Thu, Jul 05, 2018 at 11:18:28AM +0200, Rafael J. Wysocki wrote:
> > OK, so calling devices_kset_move_last() from really_probe() clearly is
> > a mistake.
> >
> > I'm not really sure what the intention of it was as the changelog of
> > commit 52cdbdd49853d doesn't really explain that (why would it be
> > insufficient without that change?)
>
> It seems 52cdbdd49853d fixed an issue with boards which have an MMC
> whose reset pin needs to be driven high on shutdown, lest the MMC
> won't be found on the next boot.
>
> The boards' devicetrees use a kludge wherein the reset pin is modelled
> as a regulator.  The regulator is enabled when the MMC probes and
> disabled on driver unbind and shutdown.  As a result, the pin is driven
> low on shutdown and the MMC is not found on the next boot.
>
> To fix this, another kludge was invented wherein the GPIO expander
> driving the reset pin unconditionally drives all its pins high on
> shutdown, see pcf857x_shutdown() in drivers/gpio/gpio-pcf857x.c
> (commit adc284755055, "gpio: pcf857x: restore the initial line state
> of all pcf lines").
>
> For this kludge to work, the GPIO expander's ->shutdown hook needs to
> be executed after the MMC expander's ->shutdown hook.
>
> Commit 52cdbdd49853d achieved that by reordering devices_kset according
> to the probe order.  Apparently the MMC probes after the GPIO expander,
> possibly because it returns -EPROBE_DEFER if the vmmc regulator isn't
> available yet, see mmc_regulator_get_supply().
>
> Note, I'm just piecing the information together from git history,
> I'm not responsible for these kludges.  (I'm innocent!)
>
Thanks for your exploration, very clearly. I had tried, but failed
since these commits are contributed with different authors. I am not
familiar with ARM and dts, so had thought
really_probe()->devices_kset_move_last() is used to address a very
popular "supplier<-consumer" order issue in smart phone, based on the
configuration hard-coded in "bios(or counterpart in ARM).

> @Pingfan Liu, if you just remove the call to devices_kset_move_last()
> from really_probe(), does the issue go away?
>
Yes, it goes away.

> If so, it might be best to remove that call and model the dependency
> with a call to device_link_add() in mmc_regulator_get_supply().
> Another idea would be to automatically add a device_link in the
> driver core if -EPROBE_DEFER is returned.
>
Maybe the first one is better, as it is already used by other drivers.

Thanks,
Pingfan

^ permalink raw reply

* Re: [PATCHv3 0/4] drivers/base: bugfix for supplier<-consumer ordering in device_kset
From: Pingfan Liu @ 2018-07-06 13:55 UTC (permalink / raw)
  To: rafael
  Cc: lukas, linux-kernel, Greg Kroah-Hartman, Grygorii Strashko,
	Christoph Hellwig, Bjorn Helgaas, Dave Young, linux-pci,
	linuxppc-dev, linux-pm, kishon
In-Reply-To: <CAJZ5v0j6+m8NGMneG45tF9L1s6ahtFo91wPCO=3YOTEyAOL+VQ@mail.gmail.com>

On Fri, Jul 6, 2018 at 4:47 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, Jul 6, 2018 at 10:36 AM, Lukas Wunner <lukas@wunner.de> wrote:
> > [cc += Kishon Vijay Abraham]
> >
> > On Thu, Jul 05, 2018 at 11:18:28AM +0200, Rafael J. Wysocki wrote:
> >> OK, so calling devices_kset_move_last() from really_probe() clearly is
> >> a mistake.
> >>
> >> I'm not really sure what the intention of it was as the changelog of
> >> commit 52cdbdd49853d doesn't really explain that (why would it be
> >> insufficient without that change?)
> >
> > It seems 52cdbdd49853d fixed an issue with boards which have an MMC
> > whose reset pin needs to be driven high on shutdown, lest the MMC
> > won't be found on the next boot.
> >
> > The boards' devicetrees use a kludge wherein the reset pin is modelled
> > as a regulator.  The regulator is enabled when the MMC probes and
> > disabled on driver unbind and shutdown.  As a result, the pin is driven
> > low on shutdown and the MMC is not found on the next boot.
> >
> > To fix this, another kludge was invented wherein the GPIO expander
> > driving the reset pin unconditionally drives all its pins high on
> > shutdown, see pcf857x_shutdown() in drivers/gpio/gpio-pcf857x.c
> > (commit adc284755055, "gpio: pcf857x: restore the initial line state
> > of all pcf lines").
> >
> > For this kludge to work, the GPIO expander's ->shutdown hook needs to
> > be executed after the MMC expander's ->shutdown hook.
> >
> > Commit 52cdbdd49853d achieved that by reordering devices_kset according
> > to the probe order.  Apparently the MMC probes after the GPIO expander,
> > possibly because it returns -EPROBE_DEFER if the vmmc regulator isn't
> > available yet, see mmc_regulator_get_supply().
> >
> > Note, I'm just piecing the information together from git history,
> > I'm not responsible for these kludges.  (I'm innocent!)
>
> Sure enough. :-)
>
> In any case, calling devices_kset_move_last() in really_probe() is
> plain broken and if its only purpose was to address a single, arguably
> kludgy, use case, let's just get rid of it in the first place IMO.
>
Yes, if it is only used for a single use case.

> > @Pingfan Liu, if you just remove the call to devices_kset_move_last()
> > from really_probe(), does the issue go away?
>
> I would think so from the description of the problem (elsewhere in this thread).
>
Yes.

Thanks,
Pingfan

^ permalink raw reply

* Re: [PATCH V2 00/10] KASan ppc64 support
From: Aneesh Kumar K.V @ 2018-07-06 14:11 UTC (permalink / raw)
  To: Christophe LEROY, Andrey Ryabinin; +Cc: paulus, linuxppc-dev, LKML
In-Reply-To: <7d459964-ce68-db1c-3d28-ae206d5ebdb0@c-s.fr>

Christophe LEROY <christophe.leroy@c-s.fr> writes:

> Hi Aneesh,
>
> Are you still working on support for KASan for ppc64 ?
>

Haven't got time to work on this. The hash memory layout makes it
a bit complicated to implement this.

-aneesh

^ permalink raw reply

* Re: [PATCH V2 00/10] KASan ppc64 support
From: Christophe LEROY @ 2018-07-06 14:13 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Andrey Ryabinin; +Cc: paulus, linuxppc-dev, LKML
In-Reply-To: <87h8lch2t2.fsf@linux.vnet.ibm.com>



Le 06/07/2018 à 16:11, Aneesh Kumar K.V a écrit :
> Christophe LEROY <christophe.leroy@c-s.fr> writes:
> 
>> Hi Aneesh,
>>
>> Are you still working on support for KASan for ppc64 ?
>>
> 
> Haven't got time to work on this. The hash memory layout makes it
> a bit complicated to implement this.
> 

Ok, maybe would be easier to start with nohash.

Is there some literature somewhere about what an arch has to implement 
to use KAsan ?

Thanks,
Christophe

^ permalink raw reply

* Re: [RFC PATCH 1/2] dma-mapping: Clean up dma_set_*mask() hooks
From: Robin Murphy @ 2018-07-06 14:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: m.szyprowski, iommu, linux-arm-kernel, linux-ia64, linuxppc-dev
In-Reply-To: <20180705193738.GB28905@lst.de>

On 05/07/18 20:37, Christoph Hellwig wrote:
> On Wed, Jul 04, 2018 at 06:50:11PM +0100, Robin Murphy wrote:
>> Arch-specific implementions for dma_set_{coherent_,}mask() currently
>> rely on an inconsistent mix of arch-defined Kconfig symbols and macro
>> overrides. Now that we have a nice centralised home for DMA API gubbins,
>> let's consolidate these loose ends under consistent config options.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>
>> Here's hoping the buildbot comes by to point out what I've inevitably
>> missed, although I did check a cursory cross-compile of ppc64_defconfig
>> to iron out the obvious howlers.
> 
> The patch looks sensible to me, although I was hoping to get rid of these
> hooks in this or the next merge window as they are a horrible bad idea.
> 
>> The motivation here is that I'm looking at adding set_mask overrides
>> for arm64, and having discovered a bit of a mess it seemed prudent to
>> clean up before ingraining it any more.
> 
> What are you trying to do?  I really don't want to see more users of
> the hooks as they are are a horribly bad idea.

I really need to fix the ongoing problem we have where, due to funky 
integrations, devices suffer some downstream addressing limit (described 
by DT dma-ranges or ACPI IORT/_DMA) which we carefully set up in 
dma_configure(), but then just gets lost when the driver probes and 
innocently calls dma_set_mask() with something wider. I think it's 
effectively the generalised case of the VIA 32-bit quirk, if I 
understand that one correctly.

The approach that seemed to me to be safest is largely based on the one 
proposed in a thread from ages ago[1]; namely to make dma_configure() 
better at distinguishing firmware-specified masks from bus defaults, 
capture the firmware mask in dev->archdata during arch_setup_dma_ops(), 
then use the custom set_mask routines to ensure any subsequent updates 
never exceed that. It doesn't seem possible to make this work robustly 
without storing *some* additional per-device data, and for that archdata 
is a lesser evil than struct device itself. Plus even though it's not 
actually an arch-specific issue it feels like there's such a risk of 
breaking other platforms that I'm reticent to even try handling it 
entirely in generic code.

Robin.

[1] 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306507.html

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox